[sorry this post is a little sloppy – I had a hard stop at 7:00 pm and wanted to get it out the door.]
During the last day of my machine learning class we discussed “word2vec.” I’d heard of it before because of a couple of tweets and blog posts from Jordan Ellenberg. For example:
I was reviewing this blog post during one of the breaks and stumbled on this passage:
During class I couldn’t figure out how to think through this problem, but on the bike ride home I had an idea that seemed to work.
Higher dimensional geometry is strange, though, so I don’t actually know whether I’ve arrived at a “close enough” answer for the right reasons or just by coincidence. (I’ve done a few projects about higher dimensions with the boys that range from fun to mind blowing.)
One strange thing that jumps out immediately from the statement on Ellenberg’s blog is that it must somehow be hard to find vectors in high dimensions that meet at angles close to 0 degrees or 180 degrees. But why?
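One way to see the concentration numerically is a quick sketch (Python here, rather than the Mathematica I used later; the setup of ±1 vectors compared against (1, 1, . . . , 1) matches the construction below). For ±1 entries the cosine of the angle with the all-ones vector is just the average of the entries, so we can sample it directly:

```python
import random

random.seed(0)

def random_cosines(d, trials=10_000):
    """Cosine of the angle between a random +/-1 vector and (1, 1, ..., 1).

    dot = sum of the entries, and both vectors have length sqrt(d),
    so the cosine is (sum of entries) / d -- the average entry.
    """
    return [sum(random.choice((-1, 1)) for _ in range(d)) / d
            for _ in range(trials)]

# In 3 dimensions, cosines of 1 or -1 (parallel/anti-parallel) show up
# constantly; in 300 dimensions the cosines pile up near 0, i.e. near
# 90 degrees, and extreme angles essentially never appear.
for d in (3, 300):
    cosines = random_cosines(d)
    print(d, max(abs(c) for c in cosines))
```

In low dimensions you hit perfectly parallel vectors all the time; in 300 dimensions, even 10,000 samples stay within a few standard deviations of 90 degrees.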
My solution to the 3 million points on a 300 dimensional sphere problem on the way home went something like this:
(1) Start with a 2x2x . . . x2 cube centered at the origin, so every vertex has coordinates that are either +1 or -1 in every dimension.
(2) Assume that the first vector is the 300-dimensional vector (1, 1, . . . , 1).
(3) Now form 3 million 300-dimensional vectors by choosing +1 or -1 at random for each position in the vector.
(4) By the dot product formula for vectors, the smallest (most negative) dot product will produce the largest angle, so now we just have to figure out how many -1’s the vector with the most -1’s should have.
(5) The number of -1’s in each vector follows a binomial distribution with a mean of 150 and a standard deviation of sqrt(300 · 1/2 · 1/2) = sqrt(75), or about 8.66.
(6) To get to a 1 in 3 million chance we have to go out about 5 standard deviations (so about 43 away from the mean), but just to double check I ran this handy dandy Mathematica code:
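(The original Mathematica snippet didn’t survive here; this Python sketch does the same check with the exact binomial probabilities instead of a simulation. The cutoff values around 192 are the ones discussed below.)

```python
from math import comb

# Number of -1's in a random +/-1 vector of length 300 is Binomial(300, 1/2).
n, trials = 300, 3_000_000

def expected_count(k):
    """Expected number of the 3,000,000 vectors with exactly k entries of -1."""
    return trials * comb(n, k) / 2**n

for k in (190, 191, 192, 193, 194):
    print(k, expected_count(k))
```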
Sure enough, with 3,000,000 trials we expect about 1 vector to have 192 or 193 -1’s. Let’s say 192.
(7) So, that vector has 192 -1’s and 108 +1’s, which is 84 more -1’s than +1’s, and its dot product with our vector (1, 1, . . . , 1) is 108 - 192 = -84. Since both vectors have length sqrt(300), the cosine of the angle between them is -84/300 = -0.28.
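Step (7) is just arithmetic, but it is easy to double check (a quick sketch, not from the original post):

```python
import math

d = 300
minus_ones = 192
plus_ones = d - minus_ones          # 108

# Dot product with (1, 1, ..., 1): each +1 contributes +1, each -1 contributes -1.
dot = plus_ones - minus_ones        # -84

# Both vectors have length sqrt(300), so the product of the lengths is 300.
cosine = dot / d                    # -0.28
angle = math.degrees(math.acos(cosine))
print(dot, cosine, angle)
```

The angle comes out a little over 106 degrees, i.e. only modestly past perpendicular even for the single most extreme vector out of 3 million.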
So, close to what Ellenberg stated in his blog.
I’m not sure that this method is 100% right, but I think it captures the general idea. It certainly helped me understand why it is hard for random vectors to be parallel (or anti-parallel) in high dimensions.