‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions

(Winn and Muresan, 2018) at ACL

Sometimes one word on a matter isn’t enough. To describe color, we resort to compositional names: “light blue”, “cherry red”. This paper looks at a group of these: comparative names. Those are names like “duller olive” or “slightly paler”.

When naming a target color that is similar to a distractor, we use comparative adjectives (Monroe et al., 2017). This relates to the notion of codability, or entropy: it takes more bits to distinguish between the two colors, because our broad-level categories don’t manage to distinguish them. The work of Winn and Muresan seeks to ground these comparisons. They want to match each descriptor up to a vector, transforming “blue” + “lighter (given that the color is blue)” = “lighter blue”.

The authors used the data of the xkcd color survey, which elicited 954 color names from 500 namings and 222,500 user sessions. Many of these were compositional (though not comparative), and each label had on average 600 datapoints. (I’m curious about the distribution and should dig into the data. No doubt ‘red’ was way more common than ‘filemot’.) Winn and Muresan distilled the data down to 415 tuples like (ʙʟᴜᴇ, ‘lighter’, ʟɪɢʜᴛ ʙʟᴜᴇ), where ʙʟᴜᴇ actually is a set of RGB datapoint. There are 79 unique reference labels (like ʙʟᴜᴇ) and 81 unique comparatives—relatively slim pickings.

The authors’ work centers around one straightforward assumption: that you can compare the averages of the sets to learn the positions. With that, all you’d need is to average each cluster and compare a vector from centroid to centroid—the noise averages out and the signal emerges.

A concern that I have methodologically with a lot of color research is that distances in RGB space don’t align with perceptual judgments of similarity (see enter CIE LAB coordinates), so measurements of human judgments in this space are often hard to interpret. But RGB coordinates were used in the data collection, and there’s a lot of color-and-language work that uses this; it’s natural to follow precedent set by prior work.

The final grounding component of the work considers grounding in language-in-use, going beyond the word–coordinates mappings above. The goal becomes to predict the coordinates of a color in color space, based on the word embeddings of the defining phrase (e.g., [vec("ligher"), vec("blue")]). Because we’re using embeddings in a shared space (and not one-hot word IDs), the model has the ability to predict coordinates for unseen words or combinations. The authors also evaluate with Delta-E, which does account for the “perceptual judgments of similarity” I mention above, and their Figure 4 shows how Delta-E doesn’t match up to RGB-based distance. I think if I were to follow up on this, I would predict in a perceptual space as well, so that the structure of the perceptual color space could bias the learning process.

network architecture diagram

The bottom line

Nobody has done work on grounding comparatives before. It’s great to see color work brought to the forefront at ACL, and I think it serves as a great inroads for people curious about issues of color and computational modeling of language.

[This blog post has been revised since its original posting in 2018.]

Written on October 16, 2018