This morning Nassim Taleb posed this problem on Twitter:

A later tweet clarified that the problem was asking for “at least 3.”

I wanted to write up my solution because I think that it is important to share the thought process rather than just the answer to a question. I’ve written up solutions to similar problems before and was originally inspired to write posts like this one after seeing Tim Gowers “live blog” a solution to an IMO problem.

A couple of notes to be clear:

(Edit) (0) Make sure to read all the way to the bottom as I now have corrected a counting mistake

(1) I’m not sure this solution is correct (EDIT – with the final correction at the bottom I am now much more confident),

(2) Incredibly, two keyboards broke while typing up this post, so sorry if there are typos. It is a miracle that the latex formulas work.

Anyway . . .

I understood the problem to be the same question as this one:

You have 8 balls and 16 boxes numbered 1 through 16. For each ball you select a box uniformly at random and place the ball in the selected box. After placing the 8th ball in one of the boxes, what is the probability that at least one box contains at least 3 balls.

My first inclination was to do a quick back of the envelope calculation using the Poisson and Binomial distributions:

So, assuming a Poisson distribution for the number of balls in each box (with expected value 1/2), the probability of 3 or more balls in a box is about 1.44%. The probability that no boxes have 3 is (1 – 0.0143877)^16 which is about 80%. So the probability of at least one box having 3 or more balls would be around 20%.

The same calculation assuming a binomial distribution with 8 balls having a 1/16 chance of going in a box leads to a guess of around 16% for the probability of a box having at least 3 balls.

I knew that neither of these approaches was correct, but I thought that starting this way would give me a sense of what the right answer would be.

Next I asked how many different ways there were to put 8 balls into 16 boxes. This question is a classic “stars and bars” question from combinatorics. The answer is which is 490,314.

So, how can I makes sense of these 490,314 different ways to put the balls in the boxes?

My first thought here was to look at the partitions of 8, since no matter how the balls are distributed, the sum of the number of balls in each box has to be 8. Turns out there are 22 ways to partition 8:

Since there are only 5 partitions with only 2’s and 1’s, I figured I’d just count the different ways that those partitions could occur in the 16 boxes.

So, for example:

(1) [1,1,1,1,1,1,1,1] can occur in = 12,870 ways

(2) [2,1,1,1,1,1,1] can occur in * = 80,080 ways

The full calculations for the partitions with only 2’s and 1’s and the ones that include numbers 3 or greater are here:

So, I had to check these calculations a bunch of times since the ones we want to exclude add up to only 258,570 out of 490,314 -> not even close to 80%. But, checking and checking and checking kept getting me to the same answer.

The total ways to distribute the balls in ways that at least one box gets 3 or more balls is 231,744, and thus the probability of a box getting at least 3 balls is 231,744 / 490,314 which is roughly 47.3%. This probability is much higher than I would have guessed.

Later I saw this follow up tweet from Taleb:

I spent the rest of the day trying to come up with a better formulation, but really was not that successful. The best I could come up with was a formulation using generating functions.

The total number of ways to distribute the 8 balls into 16 boxes is given by the coefficient of in the polynomial:

Luckily that number is 490,314.

Similarly, the total number of ways to distribute the 8 balls into 16 boxes with at most 2 balls per box is given by the coefficient of in the polynomial:

and that number is also what we found above -> 258,570.

So, we get the same answer as above.

The generating function approach also gives a way to solve the general problem in the second part of Taleb’s tweet. We’d just look at the coefficients of in these two polynomials:

, and

.

Then do the same calculation as above.

However, although this approach with generating functions is easy to describe, it really isn’t much easier to calculate, so I doubt it was the reformulation that Taleb had in mind.

(EDIT that goes from here to the end)

[I’ve had to steal a keyboard from my son since one broken keyboard doesn’t type symbols like $, [], and {} making latex formulas impossible and the other doesn’t type the letter t. 4:00 am problems!]

At 4:00 am I woke up and realized an error in the approach. By that time there were a few comments on twitter also pointing out the under counting.

the error in counting comes in assuming the each distribution of balls coming from a partition of 8 is equally likely. That’s not the case. The distribution [1,1,1,1,1,1,1,1] can indeed occur in ways, but there are also 8! (8 factorial, I mean, in case it displays incorrectly) arrangements for each of those ways.

However, the distribution [2,1,1,1,1,1,1] which occurs in * has 8! / 2! different arrangements.

So, factoring in that under counting leads to the following correct count for the cases with only 1’s and 2’s:

The 3,567,874,400 cases are out of a total of 16^8 cases and thus the probability of having a 3 in more than one box is given by:

So, the result is (happily) closer to the back of the envelope result that’d get and I don’t have to worry that the Poisson / Binomial approximations are way off!

Now, back to bed ðŸ™‚