Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings
(He et al., 2017) at ACL
A good day is when an NLP paper talks about network science. (Except when they reinvent ideas.)
As an aside, I’m in a great mood today. I hope you are, too.
A symmetric collaborative dialogue is a situation in which both agents play the same role with respect to their environment, each holding its own private knowledge, and they must communicate to achieve a shared goal. In this particular experimental setting, the two agents share some common item and must communicate to discover it.
By asking “do you have anyone who went to columbia?” [sic], B is suggesting that she has some Columbia friends…. Such conversational implicature is lost when interpreting the utterance as simply an information request.
In this task, the authors take crowdsourced conversations as source data; in each conversation, the two agents try to uncover their mutual friend.
To achieve this, the authors take the knowledge graphs stored by each agent and embed them. Each entity that the agent knows about is added to its knowledge graph as a node, and edges representing predicates connect the nodes. For instance, `Friends(John, Mary)` would be represented as an edge between John and Mary, labeled “friends”.
To stick more closely to their formalism, nodes are split among three categories: items, attributes, and entities. A relation like `(item1, hasSchool, columbia)` is pretty straightforward: you just need to know that `item1` is a unique identifier for an item, used instead of a name. A separate relation like `hasName` could then be used to record that `item1`’s name is `mia` or something. (Because after all, I work with two Matthews, three Ryans, and two Jasons. Both my thesis advisor at SMU and my advisor here at Hopkins are named David.)
You wind up with a bipartite graph. One set is the set of entities, and the other is the combined set of attributes and items; that is, attributes and items never touch directly, only through entities. (It may help to think of it as three sets, where entities, the attribute values, are the connectors between items and attributes. Though this is also totally expressible in a good old SQL database.)
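Here’s a minimal sketch, not from the paper’s code, of how that item/attribute/entity graph might be built with networkx. The friend lists, attribute names, and edge labels are invented for illustration.

```python
# A minimal sketch (not the authors' code) of the item / attribute / entity
# knowledge graph described above. The example items are made up.
import networkx as nx

# One agent's private list of "friends": one dict per item.
items = [
    {"hasName": "mia", "hasSchool": "columbia", "hasCompany": "google"},
    {"hasName": "john", "hasSchool": "columbia", "hasCompany": "apple"},
]

G = nx.Graph()
for i, item in enumerate(items):
    item_node = f"item{i + 1}"
    G.add_node(item_node, node_type="item")
    for attribute, entity in item.items():
        # Attribute and entity nodes are shared across items.
        G.add_node(attribute, node_type="attribute")
        G.add_node(entity, node_type="entity")
        # Entities sit between items and attributes, so the graph is
        # bipartite: {items, attributes} on one side, entities on the other.
        G.add_edge(item_node, entity, label=attribute)
        G.add_edge(entity, attribute, label="instanceOf")

# Both item1 (mia) and item2 (john) connect to the entity "columbia".
print(sorted(G.neighbors("columbia")))  # ['hasSchool', 'item1', 'item2']
```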
So how do you embed a graph, that is, create fixed-size vector representations of its nodes? Well, you take certain features of each node, then apply a transformation. This project used both dialogue features and the structure of the knowledge base.
- Node (\(v\) for “vertex”) features: things like degree, node type, and whether it’s been mentioned this turn \(t\). All of these together make a vector \(F_t(v)\).
- Mention vectors. These contain information about the conversation up to turn \(t\) about the node \(v\). The process is as follows:
    - Concatenate the embeddings of this turn’s utterances.
    - If \(v\) is mentioned this turn, apply a transformation function to the previous turn’s mention vector and this utterance. This gives a value between 0 and 1. Use this to compute a weighted sum between the utterance embedding and the old values. Otherwise, just keep the previous turn’s representation. (If this sounds a lot like an RNN to you, you win!)
EDIT: The mention vector they compute is actually an example of exponential smoothing. Each time step’s smoothing factor \(\lambda_t\) is learned by a neural network. If \(v\) is mentioned this turn, we use the smoothed value; otherwise, we just propagate the old value forward. (The update is written out right after this list.)
- Recursive node embeddings: When you have a keyword like “columbia”, you can propagate it through the network, making embeddings for each depth \(k\) that it reaches. The idea is that if you have an item that went to “columbia”, you’d want to pull in other relevant attributes of that item, in case they’re queried next. (There’s a rough sketch of this step after the list, too.)
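To spell out the mention-vector update (this is my reconstruction from the description above, so the notation may not match the paper’s exactly), write \(M_t(v)\) for node \(v\)’s mention vector after turn \(t\) and \(\tilde{u}_t\) for the embedding of turn \(t\)’s utterances:

\[
M_t(v) =
\begin{cases}
\lambda_t \, M_{t-1}(v) + (1 - \lambda_t)\, \tilde{u}_t & \text{if } v \text{ is mentioned at turn } t, \\
M_{t-1}(v) & \text{otherwise,}
\end{cases}
\]

where \(\lambda_t \in [0, 1]\) is produced by the learned gating network from \(M_{t-1}(v)\) and \(\tilde{u}_t\).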
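And here is a rough, purely illustrative sketch of the node features \(F_t(v)\) and the depth-\(k\) propagation. It is not the paper’s model: the aggregation below is a plain mean over neighbors with fixed random weight matrices, whereas the paper learns its parameters end to end.

```python
# Illustrative sketch of per-node features F_t(v) and depth-k "recursive"
# embeddings. NOT the paper's exact model: mean-over-neighbors aggregation
# and random weights stand in for the learned components.
import numpy as np
import networkx as nx

# Tiny toy graph in the item / attribute / entity scheme described earlier.
G = nx.Graph()
G.add_nodes_from(["item1", "item2"], node_type="item")
G.add_nodes_from(["hasSchool", "hasName"], node_type="attribute")
G.add_nodes_from(["columbia", "mia", "john"], node_type="entity")
G.add_edges_from([("item1", "columbia"), ("item1", "mia"),
                  ("item2", "columbia"), ("item2", "john"),
                  ("columbia", "hasSchool"), ("mia", "hasName"), ("john", "hasName")])

def node_features(G, v, mentioned):
    """F_t(v): degree, one-hot node type, and a was-it-mentioned-this-turn flag."""
    onehot = {"item": [1, 0, 0], "attribute": [0, 1, 0], "entity": [0, 0, 1]}
    return np.array([G.degree(v)] + onehot[G.nodes[v]["node_type"]] + [float(v in mentioned)])

def recursive_embeddings(G, mentioned, depth=2, dim=8):
    """Spread information k hops out: each pass mixes a node's vector with its neighbors'."""
    rng = np.random.default_rng(0)
    feats = {v: node_features(G, v, mentioned) for v in G.nodes}
    W_in = rng.normal(size=(dim, 5))       # stand-in for learned input weights
    W_k = rng.normal(size=(dim, 2 * dim))  # stand-in for learned propagation weights
    emb = {v: np.tanh(W_in @ f) for v, f in feats.items()}
    for _ in range(depth):
        emb = {v: np.tanh(W_k @ np.concatenate(
                  [emb[v], np.mean([emb[u] for u in G.neighbors(v)], axis=0)]))
               for v in G.nodes}
    return emb

# After "columbia" is mentioned, items connected to it (and, at depth 2, their
# other attribute values) pick up that signal.
embeddings = recursive_embeddings(G, mentioned={"columbia"})
```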
So how do these embeddings get turned into utterances in a conversation? The authors use another network, an LSTM that takes the node embeddings as input. They pass its output into a distribution over both words and node names.
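To make that last step a bit more concrete, here is a toy sketch of a generator whose single softmax covers both ordinary words and node names. It is not the paper’s architecture: their LSTM attends over the per-node embeddings, while this sketch collapses them into one graph-summary vector just to show the shape of the idea.

```python
# Toy sketch (not the paper's generator) of an LSTM whose output distribution
# ranges over both ordinary vocabulary words and knowledge-graph node names.
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    def __init__(self, vocab_size, num_nodes, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size + num_nodes, emb_dim)
        self.lstm = nn.LSTM(emb_dim + hidden, hidden, batch_first=True)
        # One softmax over the union of word tokens and node names.
        self.out = nn.Linear(hidden, vocab_size + num_nodes)

    def forward(self, tokens, graph_summary):
        # tokens: (batch, seq) ids; graph_summary: (batch, hidden) vector
        # standing in for the dynamic node embeddings (the paper attends over
        # per-node embeddings instead of using a single summary).
        x = self.embed(tokens)
        ctx = graph_summary.unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.lstm(torch.cat([x, ctx], dim=-1))
        return torch.log_softmax(self.out(h), dim=-1)

gen = ToyGenerator(vocab_size=100, num_nodes=20)
logp = gen(torch.randint(0, 120, (1, 5)), torch.zeros(1, 64))  # shape (1, 5, 120)
```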