Yesterday I saw a great introductory stats question thanks to a tweet from Ole Peters.

In case it doesn’t come through in the tweet, here’s the problem:

You flip a fair coin 20 times. If this sequence contains at least one HHHH, I pay you $100. If it contains at least one HHHT, you pay me $100. If it contains neither, nobody wins.

The question, essentially, is this: would you like to play this game?

I introduced the game to my son and asked him what he thought.

My son thought that the sequence HHHH would appear more often than HHHT. Next we turned to a short Mathematica program that I wrote to explore the game.
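The Mathematica code isn't reproduced in this post. As a rough stand-in, here is a minimal Python sketch of the same kind of experiment: flip 20 fair coins many times and tally which patterns show up. The names `classify` and `simulate_game`, the trial count, and the seed are illustrative assumptions, not the original program.

```python
import random
from collections import Counter


def classify(seq):
    """Label one flip sequence by which of the two patterns it contains."""
    has_hhhh = "HHHH" in seq
    has_hhht = "HHHT" in seq
    if has_hhhh and has_hhht:
        return "both"        # both payments happen, so they cancel
    if has_hhhh:
        return "HHHH only"   # the player wins $100
    if has_hhht:
        return "HHHT only"   # the player loses $100
    return "neither"         # nobody wins


def simulate_game(trials=100_000, flips=20, seed=None):
    """Play the game `trials` times and count each outcome."""
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    counts = Counter()
    for _ in range(trials):
        seq = "".join(rng.choice("HT") for _ in range(flips))
        counts[classify(seq)] += 1
    return counts


if __name__ == "__main__":
    trials = 100_000
    counts = simulate_game(trials, seed=1)
    for outcome in ("HHHH only", "HHHT only", "both", "neither"):
        print(f"{outcome:>10}: {counts[outcome] / trials:.3f}")
```

Separating "only" from "both" matters because the two $100 payments cancel whenever both patterns appear, so only the two "alone" cases decide who comes out ahead.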

Next we talked about the surprise: HHHT was much more likely to appear than HHHH, and more than 10x as likely to appear without the other pattern. The idea here was a little hard for him to see, but eventually he was able to figure out why HHHH was so unlikely to occur alone.

Finally, we went back to the whiteboard to talk through the details one more time. What I was trying to talk about here – and unfortunately not doing a great job of articulating – was:

(i) Why does HHHH occur alone so infrequently,

(ii) Why do the sequences HHHH and HHHT occur together so much, and

(iii) Why does HHHT occur alone much more frequently than HHHH?
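One way to check all three questions at once is to skip the random sampling and count every one of the 2^20 equally likely sequences. The observation behind the counts is that any run of three or more heads that is followed by a tail produces an HHHT. So HHHH can appear alone only when the sequence ends in a run of four or more heads and nothing earlier has three heads in a row, while HHHT alone just needs a run of exactly three heads followed by a tail somewhere. The sketch below is a hedged illustration of that argument; the function names `exact_counts` and `hhhh_only_by_runs` are assumptions made up for this example.

```python
from collections import Counter


def exact_counts(flips=20):
    """Count all 2**flips sequences by which patterns they contain."""
    counts = Counter()
    for n in range(2 ** flips):
        # Write n in binary, padded to `flips` digits: 1 is H, 0 is T.
        bits = format(n, f"0{flips}b")
        has_hhhh = "1111" in bits
        has_hhht = "1110" in bits
        if has_hhhh and has_hhht:
            counts["both"] += 1
        elif has_hhhh:
            counts["HHHH only"] += 1
        elif has_hhht:
            counts["HHHT only"] += 1
        else:
            counts["neither"] += 1
    return counts


def hhhh_only_by_runs(flips=20):
    """Count 'HHHH only' sequences using the run argument instead of pattern search."""
    total = 0
    for n in range(2 ** flips):
        bits = format(n, f"0{flips}b")
        prefix = bits.rstrip("1")            # everything before the final run of heads
        final_run = flips - len(prefix)      # length of the run of heads at the very end
        # HHHH alone: a final run of 4+ heads, and no 3 heads in a row before it.
        if final_run >= 4 and "111" not in prefix:
            total += 1
    return total


if __name__ == "__main__":
    flips = 20
    counts = exact_counts(flips)
    total = 2 ** flips
    for outcome in ("HHHH only", "HHHT only", "both", "neither"):
        print(f"{outcome:>10}: {counts[outcome]:>7}  ({counts[outcome] / total:.4f})")
    # Expected payoff to the player: +$100 for HHHH alone, -$100 for HHHT alone.
    payoff = 100 * (counts["HHHH only"] - counts["HHHT only"]) / total
    print(f"expected payoff per game: ${payoff:.2f}")
    # The run-based count should agree with the pattern-search count.
    print("run argument agrees:", hhhh_only_by_runs(flips) == counts["HHHH only"])
```

The enumeration loops over about a million sequences, so it should finish in a few seconds. If the run argument is right, the two ways of counting "HHHH only" agree, and the counts should reproduce the gap of more than 10x between HHHT alone and HHHH alone.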

I think this is an absolutely amazing introductory statistics problem for kids to think through. It is a really neat problem all by itself, but it also helps kids see that analyzing a time series of data – even a simple one – can be surprisingly subtle!