The Phrasal Lexicon

(Becker, 1975) at Theoretical Issues in NLP (an ACL workshop)

This paper is rather tongue-in-cheek, and it’s a real delight to read. It argues that phrases, not just words, represent free-standing concepts. We manipulate words and phrases to assemble new meanings. It has just enough of a point to it that it doesn’t deserve publication in SIGBOVIK, and there’s a twist ending that I won’t give away.

Becker starts off by saying that we, unlike physicists, deal with a messy set of problems. When we make simplifying assumptions about our sentences, they stop behaving anything like real language. This brings us to the first idea:


Any theory (or partial theory) of the English Language that is expounded in the English Language must account for (or at least apply to) the text of its own exposition.

Stray thought: Has anyone read the Wikipedia article on foreshadowing? It’s stunningly terse.

Anyway, he uses this criterion to wipe the slate of theoretical linguistics clean. It (no doubt) generalizes to languages other than English. He further attacks theoretical linguists for having lost ground to computational methods. (That feels like he’s just begging for enemies at this workshop.)

Now to the meat of the paper. He presents six categories that phrases can take on, avoiding the dismissive “That’s just an idiom”:

  1. Polywords: multi-word phrases that are locked in. “to blow up” == “to explode”. “for good” == “forever”
  2. Phrasal constraints: slightly less locked in. Something can happen “by pure coincidence” or “by sheer coincidence”.
  3. Deictic locutions: Tools for altering the flow of conversation, like “for that matter” or “by the way”.
  4. Sentence builders: essentially templatic sentences. [A] gave [B] a (long) song and dance about [topic].
  5. Situational utterances: Complete sentences that are largely phatic, and often socially expected. “You’re too kind” and “How can I ever repay you?” qualify.
  6. Verbatim texts: more-or-less memorized quotes. “Better late than never” or “I’ve got ninety-nine problems, but [subject] ain’t one.”
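Becker doesn’t give an algorithm, but a couple of these categories map naturally onto simple data structures. Below is a hypothetical Python sketch, assuming a phrasal lexicon can be modeled as a dictionary keyed by word tuples (for polywords and deictic locutions) and a sentence builder as a format string with slots. The names LEXICON, segment, gloss, and song_and_dance are my own illustrations, not anything from the paper, and greedy longest-match lookup is just one plausible way to prefer whole phrases over single words.

```python
# Hypothetical toy phrasal lexicon: multi-word entries are looked up as units.
# Keys are word tuples; values are rough glosses taken from the examples above.
LEXICON = {
    ("blow", "up"): "explode",               # polyword
    ("for", "good"): "forever",              # polyword
    ("by", "the", "way"): "(topic shift)",   # deictic locution
    ("for", "that", "matter"): "(topic shift)",
}
MAX_LEN = max(len(key) for key in LEXICON)


def segment(words):
    """Greedy longest-match: at each position, take the longest lexicon phrase,
    falling back to a single word when no phrase matches."""
    units, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in LEXICON:
                units.append(" ".join(chunk))
                i += n
                break
        else:  # no phrase matched: emit the single word
            units.append(words[i])
            i += 1
    return units


def gloss(units):
    """Replace each recognized phrase with its gloss; leave other words alone."""
    return [LEXICON.get(tuple(unit.split()), unit) for unit in units]


# Sentence builder: a template with slots, plus one optional modifier.
SONG_AND_DANCE = "{a} gave {b} a {long}song and dance about {topic}."


def song_and_dance(a, b, topic, long=False):
    """Fill the '[A] gave [B] a (long) song and dance about [topic]' template."""
    return SONG_AND_DANCE.format(a=a, b=b, topic=topic, long="long " if long else "")


if __name__ == "__main__":
    units = segment("the boiler might blow up by the way".split())
    print(units)
    print(gloss(units))
    print(song_and_dance("Sam", "the landlord", "the rent", long=True))
```

Running it, `segment` keeps “blow up” and “by the way” together as single units, and the builder produces “Sam gave the landlord a long song and dance about the rent.” Phrasal constraints (“pure”/“sheer”) and situational utterances would need richer entries than a flat dictionary, which is rather Becker’s point.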

He posits that memorization and mimicry are how we get around problems in general, and that our use of phrases is merely one instance of this. In this way, the paper connects nicely to work on multi-agent generation of language and to the importance of phrases in phrase-based MT.

The bottom line:

  • (Look at that, another phrase.) Becker nicely gives his own bottom line.


    Elegance and truth are inversely related.

Written on March 8, 2018