I’m in Sydney for a week for work. I don’t sleep so well on planes and brought a couple of books that I’ve been meaning to get to forever. One is a book on modeling extreme / rare events:
I didn’t make it too far into the book, though, because on page 35 it makes the (not at all surprising) claim that classical risk theory has to be adjusted since most insurance claims are not accurately modeled by “nice” claim distributions. Rather than dive straight into the theory, I decided to just play around with different claim distributions to see what happens.
My little experiment on the airplane went like this:
Suppose that you have 1,000 claim data points from a distribution. You take a way-too-simple approach and decide you’ll charge a price equal to the sample mean plus 1/2 of the sample standard deviation to cover each of the next 1,000 claims. Also, your insurance company will start with capital equal to 10x the initial premium.
For various distributions of claims, what are the chances that you end up eventually going insolvent?
Before writing more, though, let me give a simple example. Say that the initial set of claims was simple 50/50 coin flips: you pay $1 for heads and $0 for tails. Here your historical data set of 1,000 claims will have a mean of $0.50 and a standard deviation of $0.50.
Per the convention I set out above, for the next 1,000 coin flips you’ll receive $0.75 of premium to cover each flip and start with $7.50 of capital.
It doesn’t take too much thinking about this situation to see that your chances of going insolvent are nearly 0. One way to go insolvent would be to start with 30 flips in a row coming up heads (collecting $22.50 in premium and paying out $30 in claims), which is roughly a 1-in-a-billion event . . . so not much to worry about here.
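If you want to check that with a quick simulation, here’s a rough Python sketch (the $0.75 premium and $7.50 starting capital from above are hard-coded, and a run counts as insolvent if capital ever hits $0):

```python
import numpy as np

def coin_flip_ruin_probability(trials=100_000, n_claims=1_000, seed=0):
    """Fraction of simulated runs where the coin-flip insurer goes insolvent."""
    rng = np.random.default_rng(seed)
    premium, starting_capital = 0.75, 7.50      # mean + 0.5*sd, and 10x the premium
    ruined = 0
    for _ in range(trials):
        claims = rng.integers(0, 2, size=n_claims)           # $1 for heads, $0 for tails
        capital = starting_capital + np.cumsum(premium - claims)
        if capital.min() <= 0:                                # e.g. 30 heads in a row
            ruined += 1
    return ruined / trials

print(coin_flip_ruin_probability())   # essentially 0
```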
Here’s a picture showing what a typical scenario looks like when the initial 1,000 claims are drawn from a normal distribution with a mean of $20 and a standard deviation of $5. In this case, the mean of the claims was $19.77 and half the standard deviation was $2.48. You’ll collect $22.25 to cover each of the next 1,000 claims and start with $222.50 of capital. The graph shows how your capital grows over time covering the next 1,000 claims:

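Recreating that kind of picture only takes a few lines. Here’s a rough Python sketch (your random draws, and so your premium and capital path, will differ from the run above):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 historical claims from a normal distribution with mean $20 and sd $5
history = rng.normal(20, 5, size=1_000)
premium = history.mean() + 0.5 * history.std()   # about $22.25 in the run above
starting_capital = 10 * premium                  # about $222.50

# Cover the next 1,000 claims and track capital after each one
claims = rng.normal(20, 5, size=1_000)
capital = starting_capital + np.cumsum(premium - claims)

print(premium, starting_capital, capital.min(), capital[-1])
# plotting `capital` (e.g. with matplotlib) reproduces the kind of graph shown here
```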
But what if the distributions aren’t so nice? My choices for not-so-nice claim distributions were Pareto distributions with minimum value 1 and tail exponents of 1.2 and 1.8. Would sampling 1,000 “claims” from these distributions and naively calculating the sample mean and standard deviation protect you from insolvency?
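For concreteness, here’s roughly how you can draw those “claims” in Python (numpy’s pareto gives a Lomax variate, so adding 1 turns it into a classical Pareto with minimum value 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_claims(tail_exponent, n, rng):
    """Classical Pareto samples with minimum value 1 and the given tail exponent."""
    return rng.pareto(tail_exponent, size=n) + 1.0

# Naive premium ingredients for each tail exponent:
# sample mean + half the sample standard deviation
for alpha in (1.2, 1.8):
    claims = pareto_claims(alpha, 1_000, rng)
    print(alpha, claims.mean(), 0.5 * claims.std())
```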
Sometimes the experiment went well – for this run (using a tail exponent of 1.2) the mean of the 1,000 sample claims was $4.15 and 1/2 of the standard deviation was $5.85. So, you charged roughly $10 to cover each new claim and started with $100 of capital. It was a bumpy ride, but you did make money over time:

But it wasn’t always so nice – here the sample mean was $4.44, 1/2 of the sample standard deviation was $7.27, and the (roughly) 400th claim sank the company:

With tons of time to kill on the 17-hour flight, I decided to run two really long experiments. I picked 1,000 different sets of 1,000 initial claims for both the distribution with tail exponent 1.2 and the one with tail exponent 1.8. For each of those 1,000 initial data sets, I ran 1,000 individual experiments to see how often you went insolvent with the naive mean / standard deviation approach.
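In code, the whole experiment looks roughly like this Python sketch (with 1,000 x 1,000 runs per tail exponent it takes a while, so shrink the counts if you just want to play around):

```python
import numpy as np

def insolvency_count(tail_exponent, n_history=1_000, n_claims=1_000, n_trials=1_000, rng=None):
    """For one initial data set, return (sample sd of the initial claims,
    number of the n_trials runs that went insolvent)."""
    rng = rng or np.random.default_rng()
    history = rng.pareto(tail_exponent, size=n_history) + 1.0   # Pareto with minimum value 1
    premium = history.mean() + 0.5 * history.std()
    starting_capital = 10 * premium

    insolvent = 0
    for _ in range(n_trials):
        claims = rng.pareto(tail_exponent, size=n_claims) + 1.0
        capital = starting_capital + np.cumsum(premium - claims)
        if capital.min() <= 0:
            insolvent += 1
    return history.std(), insolvent

rng = np.random.default_rng(0)
results_12 = [insolvency_count(1.2, rng=rng) for _ in range(1_000)]
results_18 = [insolvency_count(1.8, rng=rng) for _ in range(1_000)]
```

Each entry of results_12 and results_18 is a (standard deviation, insolvency count) pair, which is exactly what’s plotted below.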
The two graphs below have the standard deviation of the initial set of claims on the x-axis and the number of trials out of 1,000 in which the company went insolvent on the y-axis.
For tail exponent 1.2 the experimental results looked like this:

For tail exponent 1.8 they looked like this:

The red curve in each graph is a rough fit to the data, with one functional form for the first graph and a different one for the second.
This experiment illustrates a couple of neat ideas, I think. The first surprising thing about these Pareto distributions is how much the sample standard deviation varies, even when you are looking at 1,000 samples from the distribution.
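You can see that variability directly just by repeating the sampling a few times, for example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample standard deviation of 1,000 draws from a Pareto distribution
# with tail exponent 1.2 and minimum value 1, repeated 20 times
sds = [(rng.pareto(1.2, size=1_000) + 1.0).std() for _ in range(20)]
print([round(s, 1) for s in sds])
# The values bounce around wildly: for tail exponents below 2 the true variance
# is infinite, so the sample standard deviation never settles down.
```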
The idea that 1,000 samples might not be even close to enough to get any sense at all about a distribution is one that has been kicking around in my head since I saw this video from Nassim Taleb in 2015 (relevant discussion begins around 8:40). The wide variation in the sample standard deviations is actually a different issue than the one Taleb is talking about (he’s talking about trying to get your arms around the mean of the distribution), but I still think it helps illustrate that sometimes 1,000 data points isn’t close to enough to start understanding what is going on.
The other thing to notice is that the number of insolvencies is roughly inversely proportional to the sample standard deviation raised to the tail exponent power. The problem here is that the sample standard deviation is low precisely when there have been no large “claims” in the data, which makes it hard to estimate the tail exponent. Also, if you were ever actually trying to estimate large claim frequency, you’d almost never have 1,000 data points.
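If you want to check that relationship yourself, one rough way is to rerun a scaled-down version of the experiment and fit a curve of the form c / (standard deviation)^p to the (standard deviation, insolvency count) pairs. Here’s a sketch (the smaller trial counts and scipy’s curve_fit are just for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def one_experiment(tail_exponent, n=1_000, n_trials=200):
    """One initial data set: return (sample sd, number of insolvent trials)."""
    history = rng.pareto(tail_exponent, size=n) + 1.0
    premium = history.mean() + 0.5 * history.std()
    starting_capital = 10 * premium
    insolvent = sum(
        (starting_capital + np.cumsum(premium - (rng.pareto(tail_exponent, size=n) + 1.0))).min() <= 0
        for _ in range(n_trials)
    )
    return history.std(), insolvent

pairs = np.array([one_experiment(1.2) for _ in range(200)])
sds, counts = pairs[:, 0], pairs[:, 1]

# Fit counts ~ c / sd**p and see whether p comes out anywhere near the tail exponent
(c, p), _ = curve_fit(lambda sd, c, p: c / sd**p, sds, counts, p0=[100.0, 1.0])
print(c, p)
```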