Plot for the blog post

#collapse
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots()
sns.distplot(np.random.poisson(2, 10000), kde=False, ax=axes, color='red', norm_hist=True)
sns.despine(fig)
plt.savefig(r"./my_icons/statistics/distributions/poisson.png")

Poisson distribution

We say $y$ has a Poisson distribution if its probability mass function is of the form

$p(y|\lambda)=\frac{\lambda^y e^{-\lambda}}{y!}$ for $y = 0, 1, 2,...$
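As a quick sanity check, we can compare this formula with scipy's implementation (the choice of $\lambda = 2$ here is arbitrary):

```python
import numpy as np
from scipy.special import factorial
from scipy.stats import poisson

lam = 2.0
y = np.arange(10)

# pmf computed directly from the formula above
by_hand = lam**y * np.exp(-lam) / factorial(y)

print(np.allclose(by_hand, poisson.pmf(y, lam)))  # True
```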

This distribution is useful for modeling counts of rare events. $\lambda$ is often interpreted as a rate, since the data is usually measured relative to some baseline, for example the number of infected per 1000 inhabitants or the number of defective machines per hour. An important assumption is that draws from the distribution are independent. The infection example arguably violates this: observing one infection changes what we expect for the next draw, since infections spread from person to person, so it shouldn't be modeled by a Poisson distribution.

So if you want to model your data as if it were drawn from a hypothetical Poisson distribution, you need to check whether your observations are actually independent of one another.

Another implicit assumption is that the rate parameter doesn't change over the observation period. This is especially important when dealing with counts over time.

Properties of the distribution

The mean and the variance of the Poisson distribution are the same:

$E(y)=Var(y)=\lambda$

from scipy.stats import poisson

poisson.mean(3) == poisson.var(3)
True

MLE - Estimator

The maximum likelihood estimator for $\lambda$ is simply the sample mean:

$\hat{\lambda}=\frac{1}{n} \sum_{i=1}^ny_i=\bar{y}$
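This follows from maximizing the log-likelihood of an iid sample:

$\log L(\lambda) = \sum_{i=1}^n \big( y_i \log \lambda - \lambda - \log(y_i!) \big)$

$\frac{\partial \log L(\lambda)}{\partial \lambda} = \frac{1}{\lambda} \sum_{i=1}^n y_i - n = 0 \quad \Rightarrow \quad \hat{\lambda} = \frac{1}{n} \sum_{i=1}^n y_i$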

Asymptotics

Forgetting the general likelihood theory for a second: since the estimator is just the mean of iid realisations, we can directly apply the central limit theorem to construct the asymptotic distribution of the estimator, namely

$\sqrt{n} \, \frac{\bar{y} - \lambda}{\sqrt{\lambda}} \overset{d}{\to} N(0,1)$

where I used the fact that the expected value and the variance are equal.

To convince ourselves we can write a small simulation. Using numpy broadcasting, the Monte Carlo simulation takes essentially one line of code!
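A minimal sketch of that simulation (the sample size, number of replications, and seed are my choices): draw a `(reps, n)` matrix of Poisson samples, take the mean along the sample axis, and standardise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, lam = 1000, 5000, 2.0

# each row is one sample of size n; the row means are iid copies of lambda-hat
z = np.sqrt(n) * (rng.poisson(lam, size=(reps, n)).mean(axis=1) - lam) / np.sqrt(lam)

print(z.mean(), z.std())  # should be close to 0 and 1
```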

Confidence interval for $\hat{\lambda}$

Wikipedia gives the following confidence interval for a sample from a Poisson distribution:

$ci_{lower} = \frac{1}{2} \chi^2(\frac{\alpha}{2}, 2 k)$

$ci_{upper} = \frac{1}{2} \chi^2(1 - \frac{\alpha}{2}, 2 k + 2)$

Note that $k$ is the total number of events. This means that, given a sample $y_1, y_2, \ldots, y_n$, we take the sum $k = \sum_i y_i$ to get the total number of events, calculate the confidence interval for this single "draw" from the artificial distribution, and then divide the limits by $n$ to transform them back to our scale. I learned this "trick" from this blog post.

Let's test this by simulating some data, say from a Poisson distribution with $\lambda = 2$ and $n = 100$:

from scipy.stats import chi2

def conf_int(n, lambda_, alpha=0.05):
    """Confidence interval as in https://www.wikiwand.com/en/Poisson_distribution"""
    events = lambda_ * n  # total number of events k = n * ybar
    a = .5 * chi2.ppf(alpha / 2, 2 * events)
    b = .5 * chi2.ppf(1 - alpha / 2, 2 * events + 2)
    return a / n, b / n  # transform back to the rate scale

# Draw 1000 samples, calculate a confidence interval for each and check
# whether the true lambda is covered -> expected frequency is around 0.95.
n = 100
lambda1 = 2
iterations = 1000

count = 0
for i in range(iterations):
    sample = np.random.poisson(lambda1, n)
    c, d = conf_int(n, sample.mean())
    if c < lambda1 < d:
        count += 1

print(count / iterations)
0.96

Plot of the density for various levels of $\lambda$
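The plotting code is not shown in the source; a sketch of what it could look like (the grid and the particular $\lambda$ values are my choice):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

fig, axes = plt.subplots()
for lam in [1, 4, 10, 30]:
    y = np.arange(0, 50)
    # pmf of the Poisson distribution for each value of lambda
    axes.plot(y, poisson.pmf(y, lam), marker='o', markersize=3, label=f"$\\lambda = {lam}$")
axes.legend()
```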

We see that for small $\lambda$ the Poisson distribution is quite right-skewed, but for larger values it becomes more symmetric, and for very large values it can be approximated quite well by a normal distribution with mean $\lambda$ and standard deviation $\sqrt{\lambda}$. Note that the Poisson distribution is discrete while the normal distribution is continuous!

The connection between the binomial and Poisson distribution

Let's plot the binomial distribution for $p=0.05$ and $n=50$ and do the same for the Poisson distribution with $\lambda = p n$.
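A sketch of that comparison (again assuming scipy.stats is available; the plotting layout is my choice):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, poisson

n, p = 50, 0.05
y = np.arange(0, 12)

# side-by-side bars for the two pmfs
fig, axes = plt.subplots()
axes.bar(y - 0.2, binom.pmf(y, n, p), width=0.4, label="Binomial(n=50, p=0.05)")
axes.bar(y + 0.2, poisson.pmf(y, n * p), width=0.4, label=r"Poisson($\lambda = np = 2.5$)")
axes.legend()

# the largest pointwise difference between the two pmfs is already small
print(np.abs(binom.pmf(y, n, p) - poisson.pmf(y, n * p)).max())
```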

We see that the Poisson distribution is a good approximation even for relatively small $n$.

Conclusion: In a real-world example, say we model the spread of a disease in 10 cities: if there are around 100 infected per city and the total population of each city is 200, a binomial distribution is the right model. If the population is upwards of 1,000,000, use the Poisson distribution.

Real world example

I got the number of people who drowned in Germany per year from 1995 to 2019. Since one can expect the number of drownings each year to be independent, the data can be modeled by a Poisson distribution. Hence $\lambda$ can be interpreted as the mean number of drownings per year.

# prepare data
import pandas as pd

year = list(range(1995, 2020))

drown_data = [680, 509, 602, 477, 597, 429, 520, 598, 644, 470,
              477, 606, 482, 475, 474, 438, 410, 383, 446, 392,
              488, 537, 404, 504, 417]

data = pd.DataFrame(dict(year = year, events = drown_data))

# calculate confidence intervals for the bar plot
lower, upper = conf_int(data.shape[0], np.array(data.events))

# matplotlib expects error bars as distances from the bar height,
# first row the lower errors, second row the upper errors
ci = np.concatenate([lower, upper]).reshape(2, -1)
ci = np.abs(np.array(data.events).reshape(1, -1) - ci)

fig, axes = plt.subplots(figsize=(10, 10))
plt.xticks(rotation=45)
sns.barplot(x='year', y='events', data=data, ax=axes, ci=None, yerr=ci)
sns.despine(fig)

The estimated number of drownings per year and its confidence interval are given by

est_lambda = data.events.mean()

print("Estimated lambda parameter", est_lambda)

conf_int(data.shape[0], est_lambda)
Estimated lambda parameter 498.36
(489.64710889281446, 507.1890140859974)

We are now ready to answer questions such as: what is the probability that there are more than 1000 drownings in a two-year interval? This can be calculated as follows:

First we bring lambda to the correct scale: 498.36 mean drownings in 1 year -> 498.36 * 2 mean drownings in a two-year period.

Then we calculate the probability as $P(y > 1000) = 1 - P(y \leq 1000)$, where $y$ is Poisson distributed with the adjusted lambda parameter.

1 - poisson.cdf(1000, est_lambda * 2)
0.45028889930312177

Another example: we saw that there are only a few years with more than 500 deaths. After how many years, on average, will we again see a year with more than 500 deaths?

Answer: First calculate the probability $P(y > 500)$.

prob = 1 - poisson.cdf(500, est_lambda); prob
0.45888036873418414

Since the number of years until the next such year follows a geometric distribution with success probability $P(y > 500)$, the expected waiting time is $1 / P(y > 500)$, i.e. around 2 years.

1 / prob
2.179217216806393

Bayesian estimation of the rate parameter

If we want to be more Bayesian, we can do this quite easily in PyMC3. In the following I take a flat prior for the mean rate over the interval $[0, 1000]$.
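The sampling cell itself is not shown in the source, but with this flat prior the posterior is available in closed form, so we can check what the sampler should recover: for $y_i \sim \text{Poisson}(\lambda)$ with a flat prior (ignoring the truncation at 0 and 1000, which carries essentially no posterior mass here), the posterior is $\text{Gamma}(\sum_i y_i + 1,\ n)$. A minimal sketch of this cross-check using scipy:

```python
from scipy.stats import gamma

drown_data = [680, 509, 602, 477, 597, 429, 520, 598, 644, 470,
              477, 606, 482, 475, 474, 438, 410, 383, 446, 392,
              488, 537, 404, 504, 417]

n = len(drown_data)
shape = sum(drown_data) + 1              # Gamma shape: total events + 1 (flat prior)
posterior = gamma(a=shape, scale=1 / n)  # rate n corresponds to scale 1/n

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.ppf([0.025, 0.975]))
```

The posterior mean is very close to the MLE, and the credible interval is essentially the frequentist confidence interval from above, which is what we expect under a flat prior with this much data.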
