Working through a neat problem from Martin Weissman’s An Illustrated Theory of Numbers

I just got back from a work trip to Sydney and I’m going to blame jet lag for goofing up the videos. Because I forgot to zoom out after zooming in during the first video, this is really more of an audio project than a video one!

Today we returned to Martin Weissman’s An Illustrated Theory of Numbers. Flipping through the chapter on prime numbers (which is incredible!) I ran across a problem dealing with the set of numbers {1, 4, 7, 10, 13, \ldots } and thought it would be a great one to talk through with the boys.

It was really fun as you will hear . . .

I started by introducing the problem and also making it impossible to see what we were doing:

Next we started playing with the first part of the problem. What we talk through here is this idea from number theory: If two numbers A and B are in our set, and A = B*C, then C is also in the set.

The boys looked at a few examples initially and noticed that lots of numbers in the set didn’t factor in the set. Then they noticed that the problem was really a problem about modular arithmetic -> every element of the set is congruent to 1 mod 3, and if A and B are both congruent to 1 mod 3 with A = B*C, then C must be congruent to 1 mod 3 as well.

The next part of the problem we played with was going through an exercise similar to the “Sieve of Eratosthenes” procedure to find the “primes” in our set:

Finally, we took a look at the part of the problem that caught my attention -> find elements of our set that factor into irreducible elements in non-unique ways.

My older son found one example -> 100 = 10*10 = 25*4.
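Our project was all done by hand, but if you want to explore the set yourself, here’s a minimal sketch in Python (my own illustration, not part of the project) that finds the first few irreducible elements and checks the pieces of my son’s example:

```python
# The set is {1, 4, 7, 10, ...} -> positive integers that are 1 mod 3.
# An element is "irreducible" here if it isn't a product of two smaller
# elements of the set (ignoring the unit 1).
def irreducible(n):
    return not any(n % d == 0 and (n // d) % 3 == 1
                   for d in range(4, int(n**0.5) + 1) if d % 3 == 1)

# The first few irreducibles: 4, 7, 10, 13, 19, 22, 25, 31, ...
print([n for n in range(4, 101, 3) if irreducible(n)])

# 4, 10, and 25 are all irreducible, so 100 = 10*10 = 25*4 really is a
# non-unique factorization within the set.
print(irreducible(4), irreducible(10), irreducible(25))  # True True True
```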

This property of our set shows that unique factorization into primes is actually a pretty special property of the integers.

Sorry for the filming screw up – fortunately the visuals for this project were quite a bit less important than average. I’m excited to play around in the project chapter this week – I really love this book!

A fascinating limit to share with both calculus and probability students

I learned an amazing identity from Nassim Taleb earlier in the week. It has taken me a few days to understand it and I wanted to write a quick post just to get my thoughts down on paper before I forget them.

The identity is (sorry for the poor latex formatting):

\lim_{N\to\infty} e^{-N}\left(1 + N + \frac{N^2}{2!} + \dots + \frac{N^N}{N!}\right) = \frac{1}{2}

There are at least 2 things I think are interesting for students in this identity:

(1) An easy mistake would be to think that the limit is 1.

A great question to ask a student is: why isn’t this expression just e^{-N} multiplied by the definition of e^{N}? (The subtlety is that the series for e^{N} is truncated at the N-th term, and the truncation point moves along with N, so the partial sum is not simply e^{N} in the limit.)

(2) The very slick probability-related proof here (scroll up to “the probabilistic way”):

The idea is to view the expression in the limit as the probability that a Poisson random variable with mean N takes a value of N or less. Then, think of that variable as a sum of N independent Poisson random variables, each with mean 1. By the central limit theorem, this sum converges to a normal distribution with mean N, so exactly half of the distribution will be less than N (in the limit).
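A quick numerical check is easy because the expression inside the limit is exactly the CDF of a Poisson distribution with mean N evaluated at N. Here’s a minimal sketch in Python (using scipy, which is my own choice and not part of the original discussion):

```python
# e^{-N} (1 + N + N^2/2! + ... + N^N/N!) equals poisson.cdf(N, N),
# so we can watch it drift down toward 1/2 as N grows.
from scipy.stats import poisson

for N in [10, 100, 1000, 10000]:
    print(N, poisson.cdf(N, N))
```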

This probabilistic interpretation of the limit turns a really difficult computation into a neat and illuminating application of the central limit theorem.

Playing with insurance claims from a Pareto distribution

I’m in Sydney for a week for work. I don’t sleep so well on planes and brought a couple of books that I’ve been meaning to get to forever. One is a book on modeling extreme / rare events:

I didn’t make it too far into the book though, because on page 35 the book makes the (not at all surprising) claim that classical risk theory has to be adjusted because most insurance claims are not accurately modeled with “nice” claim distributions. Rather than dive straight into the theory, I decided to just play around with different claim distributions to see what happens in different claim distribution settings.

My little experiment on the airplane went like this:

Suppose that you have 1,000 claim data points from a distribution. You take a way-too-simple approach and decide you’ll charge a price equal to the mean plus 1/2 of a standard deviation from this distribution to cover each of the next 1,000 claims. Also, your insurance company will start with capital equal to 10x the initial premium.

For various distributions of claims, what are the chances that you end up eventually going insolvent?

Before writing more, though, let me give a simple example. Say that the initial set of claims were simple 50/50 coin flips. You pay $1 for a head and $0 for tails. Here your historical data set of 1,000 claims will have a mean of $0.50 and a standard deviation of $0.50.

Per the convention I set out above, for the next 1,000 coin flips you’ll receive $0.75 of premium to cover each flip and start with $7.50 of capital.

It doesn’t take too much thinking about this situation to see that your chances of going insolvent are nearly 0. One way to go broke would be to start with 30 flips in a row coming up heads (collecting $22.50 in premium and paying out $30 in claims) . . . so not much to worry about here.

Here’s a picture showing what a typical scenario looks like when the initial 1,000 claims are drawn from a normal distribution with a mean of $20 and a standard deviation of 5. In this case, the mean of the claims was $19.77 and half the standard deviation was $2.48. You’ll collect $22.25 to cover each of the next 1,000 claims and start with $222.50 of capital. The graph shows how your capital grows over time covering the next 1,000 claims:

[Graph: capital growing steadily over the next 1,000 claims in the normal-distribution scenario]

But what if the distributions aren’t so nice? My choices of not-so-nice claim distributions were Pareto distributions with minimum value 1 and tail exponents of 1.2 and 1.8. Would sampling 1,000 “claims” from these distributions and naively calculating the sample mean and standard deviations protect you from insolvency?
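My airplane code isn’t shown here, but a single run of the experiment looks roughly like this in Python (the numpy-based sampling and the helper name are my own choices, just a sketch of the procedure described above):

```python
import numpy as np

rng = np.random.default_rng()

def one_run(alpha=1.2, n=1000):
    # Historical claims: Pareto with minimum value 1 and tail exponent
    # alpha. numpy's pareto() is the shifted (Lomax) form, so add 1 to
    # put the minimum at 1.
    history = 1 + rng.pareto(alpha, n)
    premium = history.mean() + 0.5 * history.std()
    capital = 10 * premium  # start with 10x the premium
    # Collect the premium and pay each of the next n claims as they arrive.
    for claim in 1 + rng.pareto(alpha, n):
        capital += premium - claim
        if capital < 0:
            return False  # insolvent
    return True

runs = 1000
survived = sum(one_run() for _ in range(runs))
print(f"{runs - survived} of {runs} runs went insolvent")
```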

Sometimes the experiment went well – for this run (using a tail exponent of 1.2) the mean of the 1,000 sample claims was $4.15, 1/2 of the standard deviation was 5.85. So, you charged roughly $10 to cover each new claim and started with $100 of capital. It was a bumpy ride, but you did make money over time:

[Graph: a bumpy but ultimately profitable run with tail exponent 1.2]

But it wasn’t always so nice – here the sample mean was $4.44, 1/2 of the sample standard deviation was $7.27, and the (roughly) 400th claim sank the company:

[Graph: a run with tail exponent 1.2 in which the company went insolvent around the 400th claim]

With tons of time to kill on the 17 hour flight, I decided to run two really long experiments. I picked 1,000 different sets of 1,000 initial claims for both the distribution with tail exponent 1.2 and the one with tail exponent 1.8. For each of those 1,000 initial data sets, I ran 1,000 individual experiments to see how often you went insolvent with the naive mean / standard deviation approach.

The two graphs below have the standard deviation of the initial set of claims on the x-axis and the number of trials out of 1,000 in which the company went insolvent on the y-axis.

For tail exponent 1.2 the experimental results looked like this:

[Graph: insolvencies per 1,000 trials vs. standard deviation of the initial claims, tail exponent 1.2]

For tail exponent 1.8 they looked like this:

[Graph: insolvencies per 1,000 trials vs. standard deviation of the initial claims, tail exponent 1.8]

The red curve is a rough fit to the data with an equation of the form y = C / x^{1.2} in the first graph and of the form y = C / x^{1.8} in the second graph.

I think the data show a couple of neat ideas from this experiment. The first surprising thing about Pareto distributions is how much the sample standard deviation varies, even when you are looking at 1,000 samples from the distribution.
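To see just how jumpy the sample standard deviation is, here’s a small illustration (my own, separate from the airplane experiment) that draws ten independent sets of 1,000 claims from the tail exponent 1.2 distribution:

```python
import numpy as np

rng = np.random.default_rng()
# Ten sample standard deviations, each from 1,000 draws of a Pareto
# distribution with minimum 1 and tail exponent 1.2. The true variance
# is infinite, so these values can differ by orders of magnitude.
for _ in range(10):
    print(round((1 + rng.pareto(1.2, 1000)).std(), 1))
```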

The idea that 1,000 samples might not be even close to enough to get any sense at all about a distribution is one that has been kicking around in my head since I saw this video from Nassim Taleb in 2015 (relevant discussion begins around 8:40). The wide variation in the sample standard deviations is actually a different issue than the one Taleb is talking about (he’s discussing trying to get your arms around the mean of the distribution), but I still think it helps illustrate that sometimes 1,000 data points isn’t close to enough to start understanding what is going on.

The other thing to notice is that the number of insolvencies is roughly inversely proportional to the standard deviation you see raised to the tail exponent power. The problem here is that the standard deviation is low when there have been no large “claims,” which makes it hard to estimate the tail exponent. Also, if you were ever actually trying to estimate large claim frequency, you’d almost never have 1,000 data points.

Playing with water rockets on Father’s Day

My younger son has a water rocket project for his 6th grade science class. I thought that playing around with his rocket would make for a fun little project this morning because there was an interesting (sort of) math-related question he wanted to answer -> what amount of water would propel the rocket to the maximum height?

What I ended up learning was that being a good 6th grade science teacher is probably a bit out of reach for me . . . .

Here are a few trials (and fails) as we tested various amounts of water:

Here’s our final attempt with 500 ml of water in the rocket and some extra help from me to prevent the rocket from falling over.

To wrap up we went inside and looked at what data we had.

So, maybe not our best project ever in terms of getting results, but a fun one anyway. Definitely a fun way to spend an hour on Father’s Day.

Helping kids understand when the Central Limit Theorem applies and when it doesn’t

My older son is studying a bit of introductory statistics right now. I was a little surprised to see this statement in his book:

“. . . you will learn that if you repeat an experiment a large number of times, the graph of the average outcome is approximately the shape of a bell curve.”

I certainly don’t expect middle school / high school textbooks to be 100% mathematically precise, but a little more precision here would have been nice.

For today’s project I decided to show them one example where the statement was true and one where it wasn’t.

For the first example I chose an exercise from the book -> the situation here is a basketball player taking 164 shots and having a 64.2% chance of making each of those shots.

Here’s our discussion of that problem (sorry that we were a little clumsy with the camera):
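If you want to see the bell curve emerge for yourself, here’s a short sketch (my own setup, not the book’s) that repeats the 164-shot experiment many times:

```python
import numpy as np

rng = np.random.default_rng()
# Repeat the 164-shot, 64.2% experiment 100,000 times and look at the
# distribution of made shots.
made = rng.binomial(164, 0.642, size=100_000)
print(made.mean())  # close to 164 * 0.642, about 105.3
print(made.std())   # close to sqrt(164 * 0.642 * 0.358), about 6.1
# A histogram of `made` is approximately a bell curve, just as the
# book's statement suggests.
```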

Next we revisited the archery problem that we studied previously. Here’s the problem:

Sharing an advanced expected value problem from Nassim Taleb with kids

Here’s our discussion of this problem today. It is fascinating to see that even with 100,000 trials both the mean and standard deviation of the outcomes jump all over the place.
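The Taleb problem itself isn’t restated here, but you can see the same jumpy behavior with any sufficiently fat-tailed distribution. As a stand-in (my own choice, not the problem from the project), here’s a Pareto distribution with tail exponent 1.1, so the variance is infinite:

```python
import numpy as np

rng = np.random.default_rng()
# Five runs of 100,000 samples each -> the sample mean and standard
# deviation never settle down, unlike in the basketball example.
for _ in range(5):
    x = 1 + rng.pareto(1.1, 100_000)
    print(round(x.mean(), 2), round(x.std(), 2))
```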

I think it is really important to understand the difference between these two different types of experiments. Both situations are really important for understanding the world we live in!

How a kid approaches a challenging problem

We stumbled on this problem in the book my older son is studying over the summer:

A game involves flipping a fair coin up to 10 times. For each “head” you get 1 point, but if you ever get two “tails” in a row the game ends and you get no points.

(i) What is the probability of finishing the game with a positive score?

(ii) What is your expected score when you play this game?

The problem gave my son some trouble. It took a few days for us to get to working through the problem as a project, but we finally talked through it last night.

Here’s how the conversation went:

(1) First I introduced the problem and my son talked about what he knew. There is a mistake in this part of the project that carries all the way through until the end. The number of winning sequences with 5 “heads” is 6 rather than 2. Sorry for not catching this mistake live.

(2) Next we tried to tackle the part where my son was stuck. His thinking here is a great example of how a kid struggling with a tough math problem thinks.

(3) Now that we made progress on one of the tough cases, we tackled the other two:

(4) Now that we had all of the cases worked out, we moved on to trying to answer the original questions in the problem. He got a little stuck for a minute here, but was able to work through the difficulty. This part, too, is a nice example of how a kid thinks through a tough math problem.

(5) Now we wrote a little Mathematica program to check our answers. We noticed that we were slightly off and found the mistake in the 5 heads case after this video.
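Our Mathematica program isn’t shown in the write-up, but an equivalent check (here in Python, my own version) just enumerates all 2^10 flip sequences directly:

```python
from itertools import product

total = 2**10
positive = 0
expected = 0.0
for seq in product("HT", repeat=10):
    s = "".join(seq)
    # If "TT" ever appears the game ends with 0 points, so only
    # sequences that never show two tails in a row can score.
    if "TT" not in s:
        score = s.count("H")
        positive += (score > 0)
        expected += score / total

print(positive / total)  # probability of finishing with a positive score
print(expected)          # expected score
```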

I really like this problem. There’s even a secret way that the Fibonacci numbers are hiding in it. I haven’t shown that solution to my son yet, though.

Helping kids understand the math of unfair algorithms – inspired by a Cathy O’Neil talk

Last week I saw Cathy O’Neil talk at Harvard:

Part of the talk was on how algorithms – and black box algorithms, in particular – can create unfair outcomes. O’Neil goes into this topic in much more detail (but also in a very easy to read and understand way) in her book Weapons of Math Destruction.

The talk was part of a conference honoring Harvard math professor Barry Mazur who was O’Neil’s PhD advisor. At the end of the talk one of the questions from the audience was (essentially): What can someone who has a focus on academic math do to help the public understand some of the problems inherent in the algorithms that shape our lives?

O’Neil said (again, essentially) that a good approach would be to find ways to communicate the mathematical ideas to the public in ways that were not “gobbledygook.”

Although I’m not an academic mathematician, this exchange was on my mind and I decided to try out a simple idea that I hoped would help the boys understand how small changes can lead to very unequal outcomes. There are no equations in this project, just our new ball dropping machine.

First I asked the boys to look at the result of several trials of the machine dropping balls and tell me what they saw. As always, it is really interesting to hear how kids describe mathematical ideas:

Next I tilted the board a bit by putting a thin piece of plastic under one side. I asked the boys to guess what would happen to the ball distribution now. They gave their guesses and we looked at what happened.

One nice thing was that my younger son noticed that the tails of the distribution were changed quite a bit, but the overall distribution changed less than he was expecting:

I’m sorry this part ran long, but hopefully it was a good conversation.

To finish up the project I tried to connect the changes in the tails of the distribution with some of the ideas that O’Neil talked about on Thursday. One thing that I really wanted to illustrate was how small changes in our machine (a small tilt) led to large changes in the tails of our distribution.

I hope this project is a useful way to illustrate one of O’Neil’s main points to kids. Algorithms can create unfairness in ways that are hard to detect. Even a small “tilt” that doesn’t appear to impact the overall distribution very much can lead to big changes in the tails. If we are making decisions in the tails – admitting the “top” 10% of kids into a school, firing the “bottom” 10% of employees, or trying to predict future behavior of a portion of a population, say – that small tilt can be magnified tremendously.
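For readers who want a little math behind that intuition, here’s a simulation sketch of the idea (the 20 rows and the 50% -> 55% tilt are numbers I made up, not measurements of our machine):

```python
import numpy as np

rng = np.random.default_rng()
for p in (0.50, 0.55):
    # Each ball takes 20 left/right steps, so its final bin is binomial.
    bins = rng.binomial(20, p, size=100_000)
    tail = (bins >= 16).mean()  # fraction landing in the far-right bins
    print(f"p = {p}: average bin {bins.mean():.1f}, far-right tail {tail:.4f}")
# The average bin barely moves (about 10 -> 11), but the far-right tail
# roughly triples -> the same effect we saw with the tilted machine.
```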

It may not be so easy for kids to understand the math behind the distributions or the ways the distributions change, but they can understand the idea when they see the balls dropping in this little machine.

A challenging but worthwhile probability problem for kids

Alexander Bogomolny shared a great problem from the 1982 AHSME yesterday:

I remember this problem from when I was studying for the AHSME back in the mid 1980s. I thought it would be fun to talk through this problem with my older son – it has some great lessons. One lesson in particular is that there is a difference between counting paths and calculating probabilities. It was most likely this problem that taught me that lesson 30+ years ago!

So, here’s my son’s initial reaction to the problem:

Next we talked through how to calculate the probabilities. This calculation gave him more trouble than I was expecting. He really was searching for a rule for the probabilities that would work in all situations – but the situations are different depending on where you are in the grid!

Despite the difficulty, I’m glad we talked through the problem.

(also, sorry about the phone ringing in the middle of the video!)

So, definitely a challenging problem, but also a good one to help kids begin to understand some ideas about probability.

A few intro calculus ideas to help explain why we study basic properties of sums

My older son is doing some review this summer in the Integrated CME Mathematics III book. The topic in the 2nd chapter of the book is sequences and series. I thought it would be fun to show him where (at least some of) this math leads. So tonight we talked about some basic ideas in calculus.

First I introduced the topic and reviewed some of the basic ideas of sequences and series:

Now we used the ideas from the first part to find the area under the curve y = x by approximating with rectangles:

To wrap up we extended the idea to find the area under the curve y = x^2 from x = 0 to x = 2. It was fun to see that the basic ideas seemed to make sense to him.
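For anyone following along, the calculation for y = x^2 goes roughly like this (my reconstruction of the steps we talked through): with n rectangles of width 2/n and right endpoints at x = 2k/n, the area is approximately

\sum_{k=1}^{n} \frac{2}{n} \left( \frac{2k}{n} \right)^2 = \frac{8}{n^3} \sum_{k=1}^{n} k^2 = \frac{8}{n^3} \cdot \frac{n(n+1)(2n+1)}{6} \to \frac{8}{3}

as n \to \infty.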

I was really happy with how this project went. Putting these ideas together to calculate the area under a curve – even a simple curve – is a big step. It might be fun to try a few more examples like these before moving on to the next chapter.

Revisiting Graham’s number

We had only a short time this morning for a project. At this point I should know better than to rush things, but I don’t!

Based on some twitter conversations this week I thought it would be fun to revisit talking about Graham’s number. We’ve done several projects on Graham’s number in the past, but not for at least a few years.

To get started, I asked the boys what they remembered:

Next we talked about one of the simple properties of Graham’s number (and power towers) -> they get big really quickly!

Here we talked about why 3^3^3^3^3 already satisfies one of the usual “large” properties listed for Graham’s number. Namely, you couldn’t write down all the digits of this number even if you put 1 digit in each Planck volume of the universe:
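As a quick sanity check on that claim, here’s my own back-of-the-envelope digit count in Python:

```python
import math

# 3^3^3 = 3^27 is about 7.6 trillion, so 3^3^3^3 = 3^(3^27) has about
# 3^27 * log10(3), roughly 3.6 trillion, digits. That many digits would
# still fit in the universe, but the next level, 3^3^3^3^3, has about
# 10^(3.6 trillion) digits, vastly more than the roughly 10^185 Planck
# volumes in the observable universe.
print(3**27)                  # 7625597484987
print(3**27 * math.log10(3))  # about 3.6e12 digits in 3^3^3^3
```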

Next we talked through how to find the last couple of digits of Graham’s number. This part of the project is something that I thought would go quickly, but didn’t at all. Still, it is pretty amazing that you can find the last few digits even though there’s next to nothing you can say about Graham’s number.
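Our on-camera approach isn’t reproduced here, but here’s a sketch of the underlying idea in Python -> the last digits of a tall enough power tower of 3s stabilize, and Graham’s number is such a tower:

```python
def tower_mod(height, m):
    # 3^3^...^3 (a tower of `height` threes) mod m. Reducing each
    # exponent mod 10**10 is safe for small moduli m like 100 because
    # the multiplicative order of 3 mod m divides 10**10.
    if height == 1:
        return 3 % m
    return pow(3, tower_mod(height - 1, 10**10), m)

# The last two digits stabilize at 87 once the tower is 3 or more
# levels tall, so Graham's number also ends in ...87.
for height in (3, 4, 5, 6):
    print(height, tower_mod(height, 100))
```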

If you search for “Graham’s Number” on my blog or in Google, you’ll find some other ideas that are fun to explore with kids. I highly recommend Evelyn Lamb’s article, too:

Evelyn Lamb’s amazing article about Graham’s number