Word Vectors and Dot Products

A really cool thing about vectors is that they are used to represent and compare a lot of different things that don’t, at first glance, appear to be mathematically representable or comparable. And a lot of this power comes from working with vectors that are “bigger” than the 2-component vectors we have looked at thus far.

\(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\end{bmatrix}\) \(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\end{bmatrix}\)

For example, we could have a vector with 26 components. Some would say that this is a vector with 26 dimensions, but I don’t see the need to talk about dimensions—for the most part, if we’re talking about 26-component vectors, we’re probably not talking about dimensions in any helpful sense, except to help us look smart.

Above are two possible 26-component vectors. We can say that the vector on the left represents the word pelican. The vector on the right represents the word clap. Each component counts how many times one letter, from a to z, appears in the word. So, a vector may not be unique to the word it represents. The one on the left could also be the vector for capelin, a kind of fish, or panicle, which is a loose cluster of flowers.
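The encoding just described can be sketched in a few lines of Python. This is only an illustration: the function name `letter_vector` is made up here, and it assumes words are written with the plain letters a to z (anything else is ignored).

```python
from string import ascii_lowercase


def letter_vector(word):
    """Return a 26-component list: how many times each letter a..z occurs."""
    word = word.lower()
    return [word.count(letter) for letter in ascii_lowercase]


pelican = letter_vector("pelican")

# Anagrams use the same letters the same number of times,
# so they end up with identical vectors.
print(letter_vector("capelin") == pelican)
print(letter_vector("panicle") == pelican)

# A repeated letter shows up as a component larger than 1.
print(letter_vector("sausage")[0])  # the "a" component
```

Because the vector only records counts, the order of the letters in the word is lost, which is exactly why pelican, capelin, and panicle cannot be told apart.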

The words, however, are similar in that the longer word pelican contains all the letters of the shorter word clap. We might be able to see this similarity show up if we measure the cosine between the two vectors. The cosine can be had, recall, by determining the dot product of the vectors (multiply each pair of corresponding elements and add all the products) and dividing the result by the product of their lengths (the lengths being, in each case, the square root of the sum of the squared components, \(\mathtt{\sqrt{c_1^2 + c_2^2 + \cdots + c_{26}^2}}\)). Pelican has seven distinct letters and clap has four, so what we get for these two vectors is: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{4}} \approx 0.756}\]

This is reasonably close to 1. The angle measure between the two words would be about 41°. Now let’s compare pelican and plenty. These two words are also fairly similar, with the same 4-letter overlap, but should yield a smaller cosine because plenty brings in letters (t and y) that pelican lacks. Confirm for yourself, but for these two words, with seven distinct letters in pelican and six in plenty, I get: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{6}} \approx 0.617}\]

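And that’s about a 52-degree angle between the words. A more dissimilar word, like sausage (whose a and s components are both 2), shares only a and e with pelican. Its dot product with pelican is 3 and its length is \(\mathtt{\sqrt{11}}\), which gives a cosine of \(\mathtt{\frac{3}{\sqrt{7}\sqrt{11}} \approx 0.342}\), or about a 70° angle.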

So, we see that with vectors we can apply a numeric measurement to the similarity of words, with anagrams having a cosine of 1 and words that share no letters at all being at right angles to each other (having a dot product and cosine of 0).
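If you would rather let a computer do the arithmetic, the whole procedure (letter counts, dot product, lengths, cosine, angle) might look something like the Python sketch below. The names `letter_vector`, `cosine`, and `angle_degrees` are invented for this example, and it assumes every word contains at least one letter from a to z; an empty word would have length 0 and cause a division by zero.

```python
import math
from string import ascii_lowercase


def letter_vector(word):
    """Return a 26-component list of letter counts, a through z."""
    word = word.lower()
    return [word.count(letter) for letter in ascii_lowercase]


def cosine(word_a, word_b):
    """Dot product of the two letter vectors divided by the product of their lengths."""
    a, b = letter_vector(word_a), letter_vector(word_b)
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)


def angle_degrees(word_a, word_b):
    """Angle between the two letter vectors, in degrees."""
    # Clamp to 1.0 so floating-point rounding never pushes acos out of range.
    return math.degrees(math.acos(min(1.0, cosine(word_a, word_b))))


# "capelin" is an anagram of pelican; "moth" shares no letters with it.
for other in ("clap", "plenty", "sausage", "capelin", "moth"):
    print(other,
          round(cosine("pelican", other), 4),
          round(angle_degrees("pelican", other), 1))
```

The last two words in the loop are there to test the extremes: an anagram should come out with a cosine of (essentially) 1, and a word with no letters in common should come out at a right angle.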

Published by

Josh Fisher

Instructional designer, software development in K-12 mathematics education.