Parsing and Hypergraphs
(Klein and Manning, 2004) at New developments in parsing technology
My favorite things to combine: graph theory and NLP. Today’s paper takes a look at parsing from the lens of Dijkstra’s algorithm for shortest paths. The authors plainly state the key idea of this paper:
There is intuitively very little difference between (a) combining subtrees to form a tree, (b) combining hypotheses to form a conclusion, and (c) visiting all tail nodes of a hyperarc before traversing to a head node.
They use this view to propose a parsing algorithm.
Directed hypergraphs are like normal directed graphs, except each edge (or arc) can connect multiple source nodes to multiple target nodes. (I find this language clearer than ‘head’ and ‘tail’.) Klein and Manning rely only on edges with only multiple sources, which are called B-arcs. Now, in the logical deduction view, each vertex (or node) represents a proposition, so B-arcs represent Horn clauses. The simplest example is a definite clause: check off all the boxes and you know the result is true. “Laptop is packed” and “charger is packed” and “teeth are brushed” means “ready to go to work”. Now, a path in the graph encodes a deduction—a sequence of these edges used to reach a result.
Let’s connect this to parsing. Parsing starts off with a set of production rules—a grammar, \(G\)— and a lattice \(L\) saying which words occur at which spans of the sentence. So let’s initialize each of the \(O(n^2)\) spans as a node. We’ll also create an arc for each production rule connecting its descendants to the span it contains. Finally, we create a special source node, which connects to each word. Now, a parse of a sentence becomes a path from the source node to the root edge.
To score these, we associate a score with each edge in, say, the max-plus semiring. Then you run a shortest-path algorithm (like Dijkstra’s) on the graph to get scores.
There’s more to this paper about efficiency tricks they employed and how there are connections to Knuth’s notion of superior grammars and do deterministic finite-state automata, but the big idea is the connection. That’s the part that will change how you think about each problem.