Last digit counts for the first 1 billion twin primes

I’ve been playing around with the last digits of consecutive twin primes for a few weeks. The idea was inspired by the paper of Lemke Oliver and Soundararajan on consecutive primes, described by Evelyn Lamb here:

Evelyn Lamb’s article on the new result about last digits of prime numbers

All of the data I mention below is in this google doc in the “Improved Twin Prime Sheet” tab:

My google doc with twin prime counts – see the “Improved Twin Prime Sheet” tab

As of today (April 21, 2016) I’ve looked at the last digits of consecutive twin primes through the first 20 billion primes. The 20 billionth prime is 518,649,879,439 and there are 1,020,112,181 twin primes up to that point.

Here’s what the counts look like so far:

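So, for example, in consecutive twin primes, the last digit pattern (1,3) (1,3) appears just over 108 million times. That pattern means a twin prime pair ending in 1 and 3 followed by another twin prime pair ending in 1 and 3. Every twin prime pair above (5,7) ends in (1,3), (7,9) or (9,1), which is why there are 9 possible patterns. The average for each of the 9 counts is just over 113 million.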
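For anyone who wants to try this kind of count at a much smaller scale, here is a minimal Python sketch. It is not the program used for the counts in this post (the post doesn't describe that program); the function names `twin_prime_pairs` and `last_digit_transition_counts` are made up for illustration, and the simple sieve only makes sense for limits in the millions, not the hundreds of billions.

```python
from collections import Counter


def twin_prime_pairs(limit):
    """Return all twin prime pairs (p, p + 2) with p + 2 <= limit."""
    # Plain sieve of Eratosthenes: sieve[n] == 1 means n is prime.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Cross off n*n, n*n + n, ... up to limit.
            sieve[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [(p, p + 2) for p in range(2, limit - 1) if sieve[p] and sieve[p + 2]]


def last_digit_transition_counts(limit):
    """Count last digit patterns of consecutive twin prime pairs above (5, 7)."""
    # Assumption: (3, 5) and (5, 7) are skipped, since every later pair
    # ends in (1, 3), (7, 9) or (9, 1).
    pairs = [(p, q) for p, q in twin_prime_pairs(limit) if p > 5]
    counts = Counter()
    for (p1, q1), (p2, q2) in zip(pairs, pairs[1:]):
        counts[((p1 % 10, q1 % 10), (p2 % 10, q2 % 10))] += 1
    return counts


if __name__ == "__main__":
    for pattern, count in sorted(last_digit_transition_counts(1_000_000).items()):
        print(pattern, count)
```

Each key in the result is a pattern such as `((1, 3), (1, 3))`, so there are at most 9 keys.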

The recent paper on last digits of consecutive primes gives some insight into how the counts for consecutive primes differ from the average count. The largest term in their formula for the deviation looks like \ln (\ln (x)) / \ln(x). Given the theoretical formulas for the distribution of twin primes, I expected that the deviation for the consecutive twin prime counts might be described by a term of the form \ln (\ln(x) ) / \ln(x)^2. Here, btw, x is the last prime counted, so about 518 billion.
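To get a feel for the size of these two expressions at the current value of x, here is a small Python sketch. It only evaluates the bare expressions; the post doesn't state the constant that multiplies them, so no scaling is assumed, and the variable names are just illustrative.

```python
import math

# The 20 billionth prime, as quoted in this post.
x = 518_649_879_439

log_x = math.log(x)

# Leading term for consecutive primes: ln(ln(x)) / ln(x)
consecutive_prime_term = math.log(log_x) / log_x

# Guessed analogue for consecutive twin primes: ln(ln(x)) / ln(x)^2
twin_prime_term = math.log(log_x) / log_x ** 2

print(f"ln(x)                = {log_x:.4f}")
print(f"ln(ln(x)) / ln(x)    = {consecutive_prime_term:.5f}")
print(f"ln(ln(x)) / ln(x)^2  = {twin_prime_term:.6f}")
```

Both terms shrink very slowly as x grows, which is why deviations like these are still visible after counting billions of primes.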

Sure enough, the deviations from the mean seemed similar to this term:

Subtracting away this term we are left with the following errors:

What caught my eye here is that four of the error terms are much larger than the other ones (and this difference was clear in each of the counts from 1 billion up to 20 billion primes).

Here is where things got strange.  Playing around with those four large errors, I noticed that they were weirdly close to the average count divided by 100*e.    So, as of the last count there were just over 1 billion twin primes, the average count for each of the 9 potential last digit pairs was around 113 million, and dividing that number by 100*e I’m left with 416,976.   The average magnitude of the 4 large errors was 414,953:
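The arithmetic in that paragraph can be checked with a few lines of Python. This sketch uses the twin prime total quoted above as a stand-in for the number of consecutive patterns (the two differ by at most a couple, which doesn't matter at this scale); the variable names are illustrative.

```python
import math

twin_primes = 1_020_112_181   # twin primes up to the 20 billionth prime (from the post)
patterns = 9                  # possible last digit patterns for consecutive twin primes

average_count = twin_primes / patterns
offset = average_count / (100 * math.e)   # the average count divided by 100*e

observed_large_error = 414_953            # average magnitude of the 4 large errors (from the post)
ratio = offset / observed_large_error

print(f"average count        = {average_count:,.0f}")
print(f"average / (100*e)    = {offset:,.0f}")
print(f"ratio to observed    = {ratio:.4f}")
```

The ratio comes out within about half a percent of 1.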

This pattern was also evident in all of the counts from prime number 1 billion up to the 20 billionth prime.  Here’s a chart of the ratio of the 1/(100*e) term to the average of these 4 large errors.  The ratio is surprisingly close to 1 at every stage.

This was a big surprise because (if the pattern holds) it would mean that there are slightly more consecutive twin primes with last digits (9,1) (9,1) than there are of the form (1,3) (1,3) or (7,9) (7,9). The ratio of the (1,3) (1,3) count to the (9,1) (9,1) count would be 1 - 1/(100*e), so just slightly less than 1.

My understanding of the k-tuples conjecture of Hardy and Littlewood is super limited (and that lack of understanding can’t be emphasized enough), but as I understand it, the conjecture implies that the counts of the different last digits of consecutive primes would all be equal as the counts tended to infinity. So, if this strange 1/(100*e) term sticks around, that would mean that consecutive twin primes are just a little bit different from consecutive primes.

I don’t know enough number theory to even begin to understand why any of these deviations actually happen. Following the ideas from the paper of Lemke Oliver and Soundararajan and correcting for this strange 1/(100*e) term, though, you get pretty amazing approximations to the last digit counts of consecutive twin primes. Here are the errors in the counts so far. Remember that the average count is 113 million; just correcting for the \ln(\ln(x)) / \ln(x)^2 term and this 1/(100*e) term reduces the errors to around 1 part in 1000:

Again, I don’t know enough number theory to understand (much less explain) the structure that I’m seeing, but it sure has been interesting to play around with it and see some of this structure emerge.