and today seemed like a good day to revisit it. You need Mathematica to do this project yourself, but I think it is also really interesting to hear what the kids have to say about the maps.

Here’s their first reaction to an animation showing total (population adjusted) cases in the US over time:

I wasn’t happy with the color scheme I chose for the first map, so the main work we did for the project today was making a new map with an improved color scheme. That work required us to look carefully at the data and study the distribution of the population weighted counts by US county. Here’s the new map and how the boys described that work:

This project was a nice way for kids to think about how to present and interpret data. Thanks to Mads Bahrami and to Wolfram for making the original work public.

Last week I saw this fascinating tweet from Scott Gottlieb:

Bernstein out with important report today looking at correlation between mobility trends and Covid outbreaks; predicts states like Arizona, Arkansas, Alabama, Mississippi, North Carolina, South Carolina are likely to see intensification in the epidemic on top of recent increases. pic.twitter.com/P7VZl7Pzjz

— Scott Gottlieb, MD (@ScottGottliebMD) June 19, 2020

I thought the charts in the tweet would be great for a discussion of the pandemic in the US with kids, so I gave it a shot this morning. We walked through the charts one at a time (my kids just finished 8th and 10th grade). Here’s what they had to say:

Chart 1:

The boys were able to understand why separating out the New York area from the rest of the country made sense. They were concerned about states making decisions about reopening when the cases in the country (ex NY area) were not declining:

The next chart in Gottlieb’s tweet was one I thought was particularly interesting:

I was really interested to see if the kids could understand why the projections based on the aggregate data would be different than the combined state by state projections.

For the third chart, the boys had some really interesting things to say. I was really happy that they noticed that the color scheme changed chart to chart.

Finally, to try to connect some of the ideas we talked about, we went to the FT’s website to study some of the trends in positive tests in the US. We had a good discussion about a few states and then a really nice discussion about log plots. It was great to hear what kids see in all of these charts:

I’m really happy with how this project went – it is nice to hear what kids have to say about different data sets related to the corona virus. Obviously not all of the information about the corona virus is going to be accessible to kids, but talking through a few of the ideas that are accessible will really help them understand the pandemic, and the decisions we have to make around the pandemic, much better.

The Washington Post did a nice article last week on measuring the number of deaths related to the corona virus in the US. I learned about it from this tweet from Keith Devlin:

Today I had the boys read the article and we talked through several of the ideas they thought were interesting. Here are their initial thoughts and also their thoughts about how you would count the excess deaths from the graph shown in the cover pic from the article:

My younger son mentioned two ideas that caught his eye in the article – the difference between Republican / Democrat states and the difference in outcomes with large and medium lockdowns. We talked about those ideas here:

My older son had two things that he thought were interesting – the reporting delays and how the article counted the excess deaths vs the corona virus deaths:

Following those discussions we downloaded some data from the CDC’s website to see if we could match the Washington Post’s numbers. We could for Massachusetts, but were off by a bit for Indiana. Not sure why – the trouble of filming this stuff live – but the main idea was just to show the boys how to check the numbers in articles like these (and why checking is important):

This was a fun project – I think the analysis of excess deaths is a helpful way to understand how bad the pandemic is. I’m glad the Washington Post published this article.

In the video he references some of his earlier work that was really eye-opening to me. Specifically the discussion around 8:50 in this video:

Tonight I had the boys watch the new video and then we discussed the property of heavy tail distributions that Nassim talked about -> especially that the sample mean for heavy tail distributions is likely going to be below the true mean.

We started with some basic ideas from Nassim’s video and then looked at the alpha = 2 case:

Next we looked at the alpha = 1.2 case. Here we began to see clearly how the sample mean underestimates the true mean of the distribution:

Finally, we looked at the alpha = 1.03 case that Nassim mentions in his video and found that almost always the sample mean underestimated the true mean. We also saw that even with 100,000 samples, our sample mean was not even close to the true mean of the distribution.
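For readers without Mathematica, here is a minimal Python sketch of this kind of experiment. It is not the program from the videos: the Pareto distribution with minimum value 1, the function name, the sample size and the number of trials are all assumptions chosen for illustration. For that distribution the true mean is alpha / (alpha - 1) whenever alpha > 1.

```python
import random


def fraction_below_true_mean(alpha, n_samples=1000, n_trials=200, seed=0):
    """Fraction of trials whose sample mean falls below the true mean.

    Samples come from a Pareto distribution with minimum value 1 and tail
    parameter alpha (an assumed setup), whose true mean is alpha / (alpha - 1).
    """
    rng = random.Random(seed)
    true_mean = alpha / (alpha - 1)
    below = 0
    for _ in range(n_trials):
        total = sum(rng.paretovariate(alpha) for _ in range(n_samples))
        if total / n_samples < true_mean:
            below += 1
    return below / n_trials


if __name__ == "__main__":
    for alpha in (2, 1.2, 1.03):
        print(alpha, fraction_below_true_mean(alpha))
```

Running it for alpha = 2, 1.2 and 1.03 gives a rough feel for how often the sample mean comes in under the true mean as the tail gets heavier.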

This was a fun project. A little on the advanced side for kids, but my main hope is that they start to appreciate that an important part of any statistical analysis is understanding the kinds of distributions that you are likely to be dealing with.

Jordan Ellenberg had a great article about corona virus testing in the NYT last week:

The idea of group testing is fascinating all by itself, but it also has some great math lessons for kids in it. I thought it would be fun to introduce some of those ideas to the boys this morning.

We started with a quick explanation of group testing and then looked at a simple case – a group of 16 with 1 person having the virus:

Now we looked at a slightly more complicated case – 100 people and 10 have the virus. It turned out to be a little more difficult to understand than I was expecting, but they made some great progress understanding the ideas as we talked through this case:
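The two cases above can be sketched in Python. This is only an illustration under stated assumptions, not the approach from the videos or from Ellenberg's article: the first function assumes we know that exactly one person is infected and repeatedly tests half of the remaining group, and the second uses a simple two-stage scheme (test each group, then test everyone in a positive group individually). The function names and the group size are hypothetical.

```python
import random


def find_single_infected(n_people, infected_index):
    """Find the one infected person by repeatedly testing half the group.

    Assumes exactly one person is infected, so a negative pooled test on one
    half means the infected person must be in the other half.
    Returns (index found, number of pooled tests used).
    """
    lo, hi = 0, n_people
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one pooled test on people lo .. mid - 1
        if lo <= infected_index < mid:
            hi = mid
        else:
            lo = mid
    return lo, tests


def two_stage_pooling_tests(n_people, n_infected, group_size, seed=0):
    """Count tests when each group is pooled first and every member of a
    positive group is then tested individually."""
    rng = random.Random(seed)
    infected = set(rng.sample(range(n_people), n_infected))
    tests = 0
    for start in range(0, n_people, group_size):
        group = range(start, min(start + group_size, n_people))
        tests += 1  # the pooled test for this group
        if any(person in infected for person in group):
            tests += len(group)  # follow-up individual tests
    return tests


if __name__ == "__main__":
    print(find_single_infected(16, 11))
    print(two_stage_pooling_tests(100, 10, group_size=5))
```

With 16 people and one infection the halving strategy always needs 4 pooled tests instead of 16 individual ones. With 100 people, 10 infections and groups of 5, the two-stage scheme needs between 30 and 70 tests, depending on how the infections are spread across the groups.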

Next I had them read and study Ellenberg’s article for about 10 min. Here are their reactions and some of the ideas they thought were important.

It was really cool to hear their ideas about the article. This project helped me understand that the group testing idea is harder to grasp than I realized. After a few examples, though, I think Ellenberg’s article was accessible to the boys and helped them understand how / why group testing could be an important step in dealing with the pandemic.

Last week I saw a terrific twitter thread from Natalie Dean, who is an assistant professor of biostatistics at the University of Florida:

A toy example of why test sensitivity and specificity matter in serosurveys.

Imagine population seroprevalence is 4%. Test sensitivity is 80%. Specificity is 99.9%. For a random sample of 3000 participants, you would expect ~100 positives, 3% of which will be false positives. pic.twitter.com/TMyCZGNXkq

Today I used the ideas in Dean’s tweets for a project with my kids. We started by taking a close look at the first example to make sure that we understood all of the concepts and calculations:
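The arithmetic behind the first example can be written out as a short Python sketch. The function name is made up for illustration; the inputs are the numbers from Dean's tweet.

```python
def expected_serosurvey_counts(n, prevalence, sensitivity, specificity):
    """Expected true and false positives in a random sample of n people."""
    true_positives = n * prevalence * sensitivity
    false_positives = n * (1 - prevalence) * (1 - specificity)
    return true_positives, false_positives


if __name__ == "__main__":
    tp, fp = expected_serosurvey_counts(3000, 0.04, 0.80, 0.999)
    total = tp + fp
    print(f"expected positives: {total:.1f}")
    print(f"share that are false positives: {fp / total:.1%}")
```

With 3000 participants this gives 96 expected true positives and about 2.9 expected false positives, so roughly 99 positives in total with about 3% of them false, which matches the tweet.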

Having talked through the concepts in the first example we were now able to take a detailed look at Dean’s second example. This example is really helpful in understanding why sensitivity and specificity are such important ideas:

Now I asked the boys what they thought we could learn from these two examples. Then we looked at Dean’s conclusions:

Finally, I asked the boys to create an example that was similar to Dean’s example – so similar numbers of positives in the test, but with different percentages of the population having the disease. They spent about 10 min playing around in Mathematica and found a good one. Here they talk through what they found (sorry about the tilt in the camera – not sure what happened!):

When I saw Dean’s twitter thread, my hope was that the ideas would be accessible to kids. I think both of my kids were able to learn some important ideas from Dean’s twitter thread, so I’m extremely grateful that she’s taking the time to make these ideas accessible to the public. Definitely follow her to keep up with the latest ideas and research about the corona virus.

Working through some of the ideas made for a welcome distraction yesterday afternoon. When I finished up I thought that there was an important idea in the paper that kids could understand even if the math behind the result was beyond them. That result is here:

So, I started today’s Family Math project by explaining this result in a way that hopefully kids can understand:

Next we went to a short program I wrote in Mathematica to explore this result – here’s how I introduced that program to the boys:

Now I had my younger son vary the number of initial trials we had – my program had been picking 20. We checked whether, after picking “n” numbers, the chance that a new selection would be bigger than the maximum of those n numbers was roughly 1/n.

Finally, I had my older son vary the tail parameter of the Pareto distribution. There’s a little surprise that came from my not updating the program correctly – an accidental good experiment / programming lesson for the boys – but eventually we were able to find that our experiment matched Nassim’s result:
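Here is a small Python sketch of an experiment along these lines. It is not the Mathematica program from the videos: the function name, the number of trials and the use of a Pareto distribution with minimum value 1 are assumptions for illustration.

```python
import random


def chance_new_draw_beats_max(n, alpha, n_trials=20000, seed=0):
    """Estimate the chance that one new Pareto draw exceeds the maximum
    of n earlier draws from the same distribution."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        current_max = max(rng.paretovariate(alpha) for _ in range(n))
        if rng.paretovariate(alpha) > current_max:
            wins += 1
    return wins / n_trials


if __name__ == "__main__":
    for alpha in (1.5, 3.0):
        print(alpha, chance_new_draw_beats_max(20, alpha))
```

For independent draws from the same continuous distribution, each of the n + 1 numbers is equally likely to be the largest, so the exact chance is 1/(n + 1), which is close to 1/n. The tail parameter should not change the answer, and that is a useful check that a program like this has been updated correctly.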

It is fun to try to explain results like this to kids. Again, there’s no chance that they can follow the underlying math, but they can certainly see the ideas from the computer experiment.

The overall idea of Nassim’s paper is also an important statistical point for kids (and everyone!) to understand -> what you see and what you don’t see are both important!

I saw a neat twitter thread from Zachary Binney last week:

The FDA has approved the first antibody test for COVID-19, from Cellex. It theoretically tells you if you've had it & are, as far as we know, immune for some time.

Sensitivity is 93.8%, specificity 95.6%. Sounds great, right?

Well, sort of. (1/6)

— Zachary Binney, PhD (@zbinney_NFLinj) April 2, 2020

The ideas in Binney’s thread are really important if you want to understand testing, so I thought I’d share them with the boys this morning. We started by looking at the thread and then going to Wikipedia to get a few definitions:

Now we went through a few specific examples. For all three we assumed the test was 95% accurate. In our first example we assumed that 5% of the population would have a disease. What is your chance of having the disease if you test positive?

Next we looked at what would happen if only 1% of the population has the disease (sorry the camera wasn’t showing the bottom of the white board here 😦 ):

Finally we looked at what would happen if 30% of the population had the disease:
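The three whiteboard examples can be checked with a few lines of Python. This sketch assumes that "95% accurate" means the test has both a sensitivity and a specificity of 95%; the function name is made up for illustration.

```python
def chance_of_disease_given_positive(prevalence, sensitivity=0.95, specificity=0.95):
    """Bayes' rule: P(disease | positive test)."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    for prevalence in (0.05, 0.01, 0.30):
        chance = chance_of_disease_given_positive(prevalence)
        print(f"prevalence {prevalence:.0%}: {chance:.1%}")
```

Under that assumption a positive test means a 50% chance of having the disease when 5% of the population has it, about 16% when 1% has it, and about 89% when 30% has it.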

The problem we are looking at here is a pretty famous one in probability and statistics. Binney’s twitter thread made for a great opportunity to show how the ideas aren’t just theory or problem set problems, too.

Select three points uniformly at random inside of a unit square. What is the expected area of the circle passing through those three points?

This question turns out to have a lot of nice surprises. The first is that exploring the idea of how to find the circle is a great project for kids. The second is that the distribution of circle areas is fascinating.

I started the project today by having the kids explore how to find the inscribed and circumscribed circles of a triangle using paper folding techniques.

My younger son went first showing how to find the incircle:

My older son went next showing how to find the circumcircle:

With that introduction we went to the whiteboard to talk through the problem that Steve Phelps shared yesterday. I asked the boys to give me their guess about the average area of the circle passing through three random points in the unit square. Their guesses – and reasoning – were really interesting:

Now that we’d talked through some of the introductory ideas in the problem, we talked about how to find the area of a circle passing through three specific points. The fun surprise here is that finding this circle isn’t as hard as it seems initially:

Following the sketch of how to find the circle in the last video, I thought I’d show them a way to find the area of this circle using ideas from coordinate geometry and linear algebra – topics that my younger son and older son have been studying recently. Not everything came to mind right away for the boys, but that’s fine – I wasn’t trying to put them on the spot, but just show them how ideas they are learning about now come into play on this problem:

Finally, we went to the computer to look at some simulations. The kids noticed almost immediately that the mean of the results was heavily influenced by the maximum area – that’s exactly the idea of “extremistan” that Nassim Taleb talks about!
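A simulation like this one can be sketched in Python. This is not the program from the video. It assumes the standard circumradius formula R = abc / (4K), where a, b, c are the side lengths and K is the triangle's area computed from a 2x2 determinant; the function names and the number of trials are made up for illustration.

```python
import math
import random
import statistics


def circumcircle_area(p, q, r):
    """Area of the circle through the points p, q and r."""
    a = math.dist(q, r)
    b = math.dist(p, r)
    c = math.dist(p, q)
    # Twice the triangle's area, from the 2x2 determinant of edge vectors.
    twice_area = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
    if twice_area == 0:
        return math.inf  # collinear points: no finite circle
    radius = a * b * c / (2 * twice_area)  # R = abc / (4K) with K = twice_area / 2
    return math.pi * radius ** 2


def simulate_areas(n_trials=10000, seed=0):
    """Circumcircle areas for random triples of points in the unit square."""
    rng = random.Random(seed)
    areas = []
    for _ in range(n_trials):
        points = [(rng.random(), rng.random()) for _ in range(3)]
        areas.append(circumcircle_area(*points))
    return areas


if __name__ == "__main__":
    areas = simulate_areas()
    print("median:", statistics.median(areas))
    print("mean:  ", statistics.mean(areas))
    print("max:   ", max(areas))
```

Comparing the median, mean and maximum from a run shows how a handful of nearly collinear triples can dominate the average.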

This project is a great way for kids to explore a statistical sampling problem that doesn’t obey the central limit theorem!

I really love the problem that Phelps posted! It is such a great way to combine fascinating and fundamental ideas from geometry and statistics.

I saw a great thread on twitter last week – actually in the reverse order in which the tweets appeared. First I saw this tweet from Vincent Pantaloni:

With the N candies experiment, mark K=17, recapture n=17 among which k=2 are marked, you can calculate a first estimate using k/n=K/N which gives N=K/(k/n)=17/(2/17)=144. https://t.co/ZPw7QW5tGa

I thought it would be fun to do a project on this idea with the boys. Unlike a few (or maybe most) of our introductory statistics exercises, the program here was likely going to be too hard for the kids to write themselves, so I just wrote it myself and the boys played with it at the end.

To start I had them look at Pantaloni’s tweet:

Next we looked at Webb’s tweet – this one requires a bit more explanation, but the boys were able to understand what Webb’s animation was showing:

Now I spent 5 min explaining how the program I wrote worked. Since my simulation was quite a bit more simplified than the prior two (and also didn’t have any animation), I wanted to be sure they understood what I was doing before we dove in:

For the first run of my simulation, we looked at 5000 trials of a pond with 1,500 fish and sampling from 4% of the pond. From the conversation here you can hear that the boys are gaining a pretty good understanding of the process and are also able to make sense of the distribution of outcomes:

Finally, we looked at 5000 trials of a pond with 750 fish and sampling from 16% of the pond. Again the boys did a nice job explaining the results.
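Here is a minimal Python sketch of a mark and recapture simulation for anyone who wants to try the two runs above. It is not the program from the videos. It assumes the same number of fish is marked and then recaptured, uses the estimate N = K n / k from Pantaloni's tweet (often called the Lincoln-Petersen estimate), and simply skips trials in which no marked fish are recaptured. All function names are made up for illustration.

```python
import random
import statistics


def estimate_population(marked, recaptured, marked_in_recapture):
    """Estimate N from k / n = K / N, i.e. N = K * n / k."""
    return marked * recaptured / marked_in_recapture


def simulate_pond(pond_size, sample_fraction, n_trials=5000, seed=0):
    """Population estimates from repeated mark and recapture trials.

    Trials with no marked fish in the recapture give no estimate and are
    skipped (an assumption of this sketch).
    """
    rng = random.Random(seed)
    sample_size = round(pond_size * sample_fraction)
    estimates = []
    for _ in range(n_trials):
        marked = set(rng.sample(range(pond_size), sample_size))
        recapture = rng.sample(range(pond_size), sample_size)
        k = sum(1 for fish in recapture if fish in marked)
        if k > 0:
            estimates.append(estimate_population(sample_size, sample_size, k))
    return estimates


if __name__ == "__main__":
    print(estimate_population(17, 17, 2))  # the candy example from the tweet
    for pond_size, fraction in ((1500, 0.04), (750, 0.16)):
        estimates = simulate_pond(pond_size, fraction)
        print(pond_size, fraction, len(estimates), statistics.median(estimates))
```

With 1,500 fish and a 4% sample, only about 2.4 marked fish are expected in each recapture, so the estimates jump around a lot and some trials give no estimate at all. With 750 fish and a 16% sample, about 19 marked fish are expected, and the estimates cluster much more tightly around the true pond size.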

At the end we talked about why this sort of sampling problem can be really difficult.