Last week, via an Evelyn Lamb article, I saw an amazing new result about primes by two mathematicians at Stanford, Robert J. Lemke Oliver and Kannan Soundararajan:

Peculiar Pattern found in “Random” Prime Numbers by Evelyn Lamb

Erica Klarreich at Quanta Magazine also wrote a fantastic article about the result:

Mathematicians Discover Prime Conspiracy by Erica Klarreich

and there’s also a neat discussion of the result on Terry Tao’s blog:

Terry Tao’s blog post about the new result

After seeing the two articles (I only saw Tao’s blog post today), I thought it would be fun to play around with some similar ideas, and I chose to look at the last digits of triples of consecutive primes. Over the course of the week I used a simple Mathematica program to count how often each triple of last digits occurs among consecutive primes in the first 10 billion primes. Right from the start I found something I didn’t expect: the counts seemed to pair the triples of last digits quite naturally into groups of 2.
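The original counting was done in Mathematica, and that program isn't shown here. As a rough illustration of the same idea, here is a minimal Python sketch under a few assumptions: it sieves primes up to a small limit (nowhere near 10 billion primes), it drops 2, 3 and 5 so that every last digit is 1, 3, 7 or 9, and it counts each overlapping window of three consecutive primes once.

```python
from collections import Counter


def primes_up_to(n):
    """Return all primes <= n using a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross off multiples of p, starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def last_digit_triples(primes):
    """Count last-digit triples (d1, d2, d3) over consecutive prime triples."""
    digits = [p % 10 for p in primes]
    # Overlapping windows: (p1, p2, p3), (p2, p3, p4), ...
    return Counter(zip(digits, digits[1:], digits[2:]))


if __name__ == "__main__":
    # Assumption: start at 7 so only the digits 1, 3, 7, 9 occur.
    primes = [p for p in primes_up_to(1_000_000) if p > 5]
    counts = last_digit_triples(primes)
    for triple, n in counts.most_common(5):
        print(triple, n)
```

A window of N consecutive primes yields N − 2 overlapping triples, which is a handy sanity check on any set of counts.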

For example, for 3 consecutive primes in the first 10 billion prime numbers the last digits (3,7,1) occur 178,500,881 times and the last digits (9,3,7) occur 178,500,928 times. Another example of the strange grouping is that the triple (1,1,3) occurs 147,750,170 times and the triple (7,9,9) occurs 147,761,746 times. Weird – what’s causing this clustering?
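One thing worth noticing is that both quoted pairs are related by the same map: negate each last digit mod 10 and reverse the order, so (a, b, c) goes to (−c, −b, −a) mod 10. This map also turns the gaps (b − a, c − b) mod 10 into (c − b, b − a). This is only a pattern matched against the two examples above, not the explanation the paper's authors sent, and it hasn't been checked here against every pair in the spreadsheet. The sketch below (the function name `partner` is hypothetical) applies the map and compares the quoted counts.

```python
def partner(triple):
    """Map (a, b, c) to (-c, -b, -a) mod 10: negate each digit, reverse order."""
    return tuple((-d) % 10 for d in reversed(triple))


# The two pairs quoted above, with their counts in the first 10 billion primes.
quoted = {
    (3, 7, 1): 178_500_881, (9, 3, 7): 178_500_928,
    (1, 1, 3): 147_750_170, (7, 9, 9): 147_761_746,
}

for t in [(3, 7, 1), (1, 1, 3)]:
    u = partner(t)
    rel = abs(quoted[u] - quoted[t]) / quoted[t]
    print(t, "->", u, f"relative difference {rel:.1e}")
```

Applying the map twice returns the original triple, so it splits the triples into pairs (plus any triple that is its own partner).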

All of my data is in the Google doc linked below. I’m sorry that the data in the Google doc isn’t organized very well. I was just playing around for myself, but thought that it might be fun to share anyway:

My Google doc with all of the data I collected this week

I didn’t really study any number theory in college or graduate school, so I have essentially no way to know whether something like the pairing of counts for the last digits of consecutive prime triples is an easy-to-prove fact or an impossible-to-prove fact. After thinking about the strange groups of two for a few days without any decent ideas, I sent an e-mail to the authors of the new paper and asked them for help. They wrote back last night, which was super cool, and provided a (possibly) easy way to think about it. I sort of can’t believe that they wrote back, but I’m really excited to spend a bit more time trying to understand their explanation.

Receiving their e-mail got me even more interested in and excited about their paper, so I spent several hours today going through it one more time. The results and conjectures are general enough to apply to consecutive triples, and that led me to see whether the paper could help me get a better understanding of the data I’d collected. Happily, I was able to understand a bit more of the paper the second time through.

With sort of an “I know enough to be dangerous” understanding, I attempted to predict the counts of the various last-digit triples in the next 1 billion primes (so, the last digits of three consecutive primes from the 10 billionth prime to the 11 billionth prime). My guesses are in column R and column U of the “Approximations” tab in my Google doc. The results should be in tomorrow morning 🙂

One fun thing about the two sets of guesses is that the guesses for all of the triples add up to almost exactly 1 billion! Since I’m looking at 1 billion primes, the sum should be 1 billion, but I didn’t take that constraint into account (not directly, anyway) when I was playing with the numbers.
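If one wanted to build that constraint in directly, a simple option would be to rescale every guess by the same factor so the total comes out exactly right. This is a hypothetical post-processing step, not something the guesses in the spreadsheet went through; the function name and the toy numbers in the check are made up for illustration.

```python
def rescale_to_total(guesses, total):
    """Scale every predicted count by one common factor so they sum to `total`.

    `guesses` maps a last-digit triple to a predicted count. The relative
    sizes of the guesses are unchanged; only the overall level moves.
    """
    s = sum(guesses.values())
    return {k: v * total / s for k, v in guesses.items()}
```

Because the factor is common to all triples, this keeps the pairing and ordering of the guesses intact and only corrects the grand total.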

One other bit of structure I noticed in the data after re-reading the paper today was a different kind of clustering. Triples with all three digits the same have the lowest counts; triples with two of the same digit in a row have (generally) the next lowest counts; triples with two digits that are the same but not in a row have (generally) the next lowest after that; and triples with three different digits have (generally) the highest counts. *I think* their paper predicts this ordering.
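The four clusters above depend only on which positions of a triple repeat, so they are easy to label mechanically. A minimal sketch, with hypothetical names for the four groups, sorts all 64 triples of the digits 1, 3, 7, 9 into them:

```python
from itertools import product


def shape(triple):
    """Classify a last-digit triple by which positions repeat."""
    a, b, c = triple
    if a == b == c:
        return "all same"          # e.g. (1, 1, 1)
    if a == b or b == c:
        return "adjacent repeat"   # e.g. (1, 1, 3) or (3, 1, 1)
    if a == c:
        return "separated repeat"  # e.g. (1, 3, 1)
    return "all different"         # e.g. (3, 7, 1)


groups = {}
for t in product((1, 3, 7, 9), repeat=3):
    groups.setdefault(shape(t), []).append(t)
print({k: len(v) for k, v in groups.items()})
```

The group sizes are 4, 24, 12 and 24, so sorting a count table by `shape` first makes the cluster ordering easy to eyeball.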

So, a really fun week of playing around with prime numbers. There are still a few things to think about – the e-mail from the paper’s authors, and seeing if there’s any way to improve the predictions – but I’m extremely happy with how this little side project went this week. Haven’t had that much fun learning new math in a long time 🙂

I thought the 3-same, 2-same, all-different clustering you mention toward the end was actually the most obvious clustering, since it follows directly from their paper (just further confirmation of their claim). BUT the exact order within those clusters might be interesting to parse out further (why do 9,1,7 and 3,9,1 come at the end and not earlier on, or maybe they already understand the order?)

Also, the few anomalies where certain pairings (by modulo) aren’t actually the closest ones numerically seem to need further exploration?
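One way to flag those anomalies systematically is to find, for each triple, the triple whose count is numerically nearest and check whether it is the modular partner. The sketch below assumes the pairing is the "negate each digit mod 10 and reverse" map, which fits the two examples in the post but is a guess here; `counts` stands for a dict from triples to counts, like one tab of the spreadsheet.

```python
def partner(triple):
    """Hypothesized pairing: negate each digit mod 10 and reverse the order."""
    return tuple((-d) % 10 for d in reversed(triple))


def anomalies(counts):
    """Return triples whose numerically nearest neighbour is not their partner.

    Ties in distance are broken by dict order (min keeps the first one).
    """
    out = []
    for t, n in counts.items():
        others = [u for u in counts if u != t]
        nearest = min(others, key=lambda u: abs(counts[u] - n))
        if nearest != partner(t):
            out.append(t)
    return out
```

Running this on the cumulative counts at each billion-prime checkpoint would show whether the anomalies shrink as more primes are added.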

Finally, I’m not clear: what is the difference in how you made the R & U predictions?

I hadn’t fully appreciated how the paper applied to triples, or that the calculation of the C1 and C2 constants depended on whether consecutive digits are the same. That idea didn’t click until today (and I actually don’t know how to calculate the C2 constants).

The difference between the two predictions is that the second ignored the data from the first billion primes. That data always looked off when I made charts, though it wasn’t actually too far off the paper’s predictions.

The predictions themselves try to calculate the C2 constants from the data and also assume that the next term will have a factor of 1/(log(x) * log(log(x))). That approximation seemed to fit the data pretty well (and it seemed like the natural choice for the next term in the series).
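Here is a sketch of what fitting C2 plus one extra term from data could look like, under an assumed model form. The share of a triple, divided by its equidistributed share 1/64, is written as 1 + C1·loglog(x)/log(x) + C2/log(x) + C3/(log(x)·loglog(x)). C1 is treated as known (from the paper), and C2 and C3 are fitted by ordinary least squares. The name C3 for the coefficient of the guessed extra term, the exact model, and reading x as the size of the primes counted so far are all assumptions for illustration; this is not the spreadsheet's actual method.

```python
import math


def basis(x):
    """Shapes of the fitted terms at x: 1/log x and 1/(log x * loglog x)."""
    lx = math.log(x)
    return 1.0 / lx, 1.0 / (lx * math.log(lx))


def fit_corrections(xs, ratios, c1):
    """Least-squares fit of (c2, c3) in the hypothetical model

        ratio(x) = 1 + c1*loglog(x)/log(x) + c2/log(x) + c3/(log(x)*loglog(x)),

    where ratio(x) is the observed share of a triple divided by 1/64
    and c1 is taken as known.
    """
    # Accumulate the 2x2 normal equations  [s11 s12; s12 s22] [c2; c3] = [t1; t2].
    s11 = s12 = s22 = t1 = t2 = 0.0
    for x, r in zip(xs, ratios):
        lx = math.log(x)
        y = r - 1.0 - c1 * math.log(lx) / lx  # what the fitted terms must explain
        u, v = basis(x)
        s11 += u * u
        s12 += u * v
        s22 += v * v
        t1 += u * y
        t2 += v * y
    # Solve the 2x2 system with Cramer's rule.
    det = s11 * s22 - s12 * s12
    c2 = (t1 * s22 - s12 * t2) / det
    c3 = (s11 * t2 - s12 * t1) / det
    return c2, c3
```

Because 1/log x and 1/(log x · loglog x) change very slowly and look alike over a short range of x, the fit is better conditioned when the data points span several orders of magnitude.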

Sorry if there are lots of typos – I’m on my phone.

Thanks… and now, looking through the 10 billion data, the pairing “anomalies” I referenced above (which I think were present at the cumulative 7 or 8 billion mark) seem to have washed out. That’s nice! I’ll never understand this, but at least it has a certain elegance to it.

Just glancing at your final results, I can’t tell if one prediction did significantly better than the other (though it looks like maybe U was a little better overall, but pretty mixed???)

Did you see KW Regan’s follow-up to all this? (The meat of it is beyond my grasp, but there’s still some interesting discussion.)

Any chance you’ve heard by now (via e-mail) from anyone else working on different iterations of this whole approach?

My guesses weren’t so good, but it caused me to go back and read the paper more carefully (You’d think I could just read it carefully the first time . . . .)

I now can compute the first couple of terms in the series for triples of the form (a,a,a) and (a,b,a). Computing the C2 term (using the paper’s lingo) is still beyond my reach.

Hopefully I’ll have results up to 20 billion in the next couple of days.

20 billion!… you’re possessed! ;-)) …can’t imagine much will change (except maybe ‘pairings’ getting even closer together?). Are you just trying to hone your predictive model, or thinking something new might appear?

I’d love to figure out something about the next term in the series. My first thought was that it would be 1/(ln(x)*ln(ln(x))), but the first cut at that guess wasn’t all that convincing.

Trying to guess at the next term has helped me understand why the mathematicians have been mentioning how slow the convergence is.
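A quick way to see why convergence is so slow: a correction of size 1/log x only halves when x is squared, because log(x²) = 2·log x. A loglog(x)/log(x) term shrinks by even less than half under squaring, since loglog(x²) = log 2 + loglog x. The sketch below just tabulates these two sizes at a few values of x; the function names are illustrative.

```python
import math


def inv_log(x):
    """Size of a 1/log(x) correction term at x."""
    return 1.0 / math.log(x)


def loglog_over_log(x):
    """Size of a loglog(x)/log(x) correction term at x."""
    lx = math.log(x)
    return math.log(lx) / lx


for x in (1e10, 1e11, 1e20, 1e40):
    print(f"x = {x:.0e}: 1/log x = {inv_log(x):.4f}, "
          f"loglog x/log x = {loglog_over_log(x):.4f}")
```

Going from 10^10 to 10^11 barely moves either quantity, which matches the experience that adding billions more primes only slowly sharpens the picture.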