[sorry this post is a little sloppy – I had a hard stop at 7:00 pm and wanted to get it out the door.]
During the last day of my machine learning class we discussed “word2vec.” I’d heard of it before because of a couple of tweets and blog posts from Jordan Ellenberg. For example:
Jordan Ellenberg’s Messing Around with word2vec
I was reviewing this blog post during one of the breaks and stumbled on this passage:
During class I couldn’t figure out how to think through this problem, but on the bike ride home I had an idea that seemed to work.
I don’t actually know if I’ve arrived at a “close enough” answer just by coincidence, because higher-dimensional geometry is strange. Here are a few example projects that I’ve done with the boys, which range from fun to mind-blowing:
Did you know that there is a 30-60-90 triangle in a hyper-cube?
Carl Sagan on the 4th dimension
Using snap cubes to talk about the 4th dimension
Counting Geometric Properties in 4 and 6 dimensions
A Strange Problem I overheard Bjorn Poonen discussing
Bjorn Poonen’s n-dimensional sphere problem with kids
A fun surprise in Bjorn Poonen’s n-dimensional sphere problem
One thing that jumps out immediately from the statement in Ellenberg’s blog is that it must somehow be hard to find vectors in higher dimensions that meet at angles close to 0 degrees or 180 degrees. But why?
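One concrete way to see how strong that effect is: pick pairs of random unit vectors in 300 dimensions and look at the cosines of the angles between them. The quick Mathematica sketch below (with a little helper randomUnit defined just for the illustration) does exactly that, and the cosines pile up near 0 – that is, near 90 degrees – with a spread of roughly 1/√300, or about 0.058, so cosines anywhere near +1 or -1 essentially never show up:

  (* cosines of the angles between pairs of random unit vectors in 300 dimensions *)
  n = 300; trials = 10000;
  randomUnit[d_] := Module[{v = RandomVariate[NormalDistribution[0, 1], d]}, v/Norm[v]];
  cosines = Table[randomUnit[n].randomUnit[n], {trials}];
  {Mean[cosines], StandardDeviation[cosines], Min[cosines], Max[cosines]}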
My solution to the 3-million-points-on-a-300-dimensional-sphere problem, worked out on the way home, went something like this:
(1) Use a 2 x 2 x . . . x 2 cube centered at the origin, so that every vertex has coordinates that are either +1 or -1 in each dimension.
(2) Assume that the first vector is the 300-dimensional vector (1, 1, . . . , 1).
(3) Now form 3 million 300-dimensional vectors with +1 or -1 randomly chosen for each position in the vector.
(4) By the dot product formula for vectors, the smallest (most negative) dot product will produce the largest angle, so now we just have to figure out how many -1’s the vector with the most -1’s should have.
(5) The number of -1’s in our vectors should be described by the binomial distribution with n = 300 and p = 1/2, so a mean of 150 and a standard deviation of √(300 x 1/2 x 1/2) = √75, or about 8.66.
(6) To get to a 1 in 3 million chance we have to go out about 5 standard deviations (so about 43 away from the mean), but just to double check I ran this handy dandy Mathematica code:
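(What follows is a sketch of that check rather than the exact code: it uses Mathematica’s built-in BinomialDistribution to count how many of the 3,000,000 vectors we’d expect to have exactly k coordinates equal to -1, for k near the 5-standard-deviation mark.)

  (* expected number, out of 3,000,000 random ±1 vectors, with exactly k coordinates equal to -1 *)
  expected[k_] := 3000000*PDF[BinomialDistribution[300, 1/2], k];
  TableForm[Table[{k, N[expected[k]]}, {k, 189, 195}]]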
Sure enough, with 3,000,000 trials we expect about 1 vector to have 192 to 193 -1’s. Let’s say 192.
(7) So that vector is going to have 84 more -1’s than +1’s (192 versus 108), which means the dot product with our vector (1, 1, . . . , 1) is going to be 108 - 192 = -84. That means the cosine of the angle between the two vectors will be -84/300 = -0.28.
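As a quick check on the whole chain of reasoning, the extreme cosine can also be simulated directly. The dot product of a random ±1 vector with (1, 1, . . . , 1) is just its coordinate sum, which is 2 x (number of +1’s) - 300, so the 3,000,000 dot products can be drawn straight from a Binomial(300, 1/2) without ever building the vectors. Here is a sketch of that simulation (not the original calculation):

  (* simulate 3,000,000 dot products with (1,1,...,1): each is 2*(number of +1's) - 300 *)
  sums = 2*RandomVariate[BinomialDistribution[300, 1/2], 3000000] - 300;
  minCos = N[Min[sums]/300];
  {minCos, ArcCos[minCos]/Degree}   (* most negative cosine and the corresponding angle in degrees *)

On a typical run the most negative cosine lands right around -0.28 to -0.29, an angle of roughly 106 or 107 degrees.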
So, close to what Ellenberg stated in his blog.
I’m not sure that this method is 100% right, but I think it captures the general idea. It certainly helped me understand why it is hard for random vectors to be parallel (or anti-parallel) in high dimensions.
If you want to choose a random point from a uniform distribution on the (n–1)-sphere, the usual approach is to first draw n samples from a one-dimensional normal distribution to use as the Cartesian coordinates of a point in n-dimensional space, and then renormalize by dividing each coordinate by the distance to the origin (the square root of the sum of the squares of the coordinates).
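In code, that recipe is just a couple of lines; here is a sketch of it in Mathematica (n = 300 chosen to match the post, but any dimension works):

  (* uniform random point on the unit sphere in n dimensions: n standard normals, rescaled to length 1 *)
  n = 300;
  x = RandomVariate[NormalDistribution[0, 1], n];
  point = x/Norm[x];
  Norm[point]   (* equals 1 up to rounding error *)

It works because the multivariate normal distribution is rotationally symmetric, so after rescaling to length 1 every direction on the sphere is equally likely.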
Interesting – I wouldn’t have thought of that on the bike ride home, that’s for sure!
Love this post and the one other comment on it.
An explanation of how you came to the correct choice for the binomial distribution would be a helpful addition, though – it’s been a minute.