The Poisson Distribution
1. Introduction
The Poisson distribution is the workhorse for count data — the number of times something happens in a fixed interval of time, space, or volume. Named after French mathematician Siméon Denis Poisson (1837), it appears everywhere: call-centre arrivals, radioactive decay counts, hospital admissions, insurance claims, defects in manufactured goods, and species counts in ecology plots.
Its appeal is simplicity: the entire distribution is described by a single parameter, \(\lambda\), which is simultaneously the mean and the variance.
2. Mathematical Definition
2.1 Probability Mass Function (PMF)
The probability of observing exactly \(k\) events in an interval is:
\[P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots\]
where \(\lambda > 0\) is the expected rate of events per interval.
2.2 Cumulative Distribution Function (CDF)
\[F(k; \lambda) = P(X \leq k) = e^{-\lambda} \sum_{j=0}^{k} \frac{\lambda^j}{j!}\]
2.3 Key Moments
| Statistic | Value |
|---|---|
| Mean | \(\lambda\) |
| Variance | \(\lambda\) |
| Skewness | \(1 / \sqrt{\lambda}\) |
| Kurtosis | \(1 / \lambda\) |
The mean equals the variance — this is the single most important diagnostic property of the Poisson distribution.
3. Visualising the Effect of \(\lambda\)
library(ggplot2)
library(dplyr)
library(tidyr)
x_grid <- 0:30
lam_vals <- c(1, 3, 7, 15)
pmf_df <- lapply(lam_vals, function(l) {
data.frame(k = x_grid, prob = dpois(x_grid, l), lambda = paste0("λ = ", l))
}) |> bind_rows()
ggplot(pmf_df, aes(x = k, y = prob)) +
geom_segment(aes(xend = k, yend = 0), color = "#4E9AF1", linewidth = 0.9) +
geom_point(color = "#4E9AF1", size = 2) +
facet_wrap(~ lambda, scales = "free") +
labs(
title = "Poisson PMF for Selected λ Values",
x = "Count k",
y = "P(X = k)"
) +
theme_minimal(base_size = 13)

As \(\lambda\) increases the distribution shifts right, spreads out, and becomes increasingly symmetric — eventually approximating a Normal distribution.
4. R Functions
R provides four standard functions for the Poisson distribution:
lambda <- 5
# P(X = 3): probability of exactly 3 events
cat("P(X = 3): ", round(dpois(3, lambda), 5), "\n")
## P(X = 3): 0.14037
# P(X <= 3): cumulative probability
cat("P(X <= 3): ", round(ppois(3, lambda), 5), "\n")
## P(X <= 3): 0.26503
# P(X > 3): upper tail (lower.tail = FALSE avoids cancellation)
cat("P(X > 3): ", round(ppois(3, lambda, lower.tail = FALSE), 5), "\n")
## P(X > 3): 0.73497
# qpois: smallest k such that P(X <= k) >= 0.90
cat("90th percentile: ", qpois(0.90, lambda), "\n")
## 90th percentile: 8
# rpois: random samples
set.seed(42)
samples <- rpois(n = 1000, lambda = lambda)
cat("Sample mean: ", round(mean(samples), 4), "\n")
## Sample mean: 4.914
cat("Sample variance: ", round(var(samples), 4), "\n")
## Sample variance: 4.8975
cat("Var / Mean ratio: ", round(var(samples) / mean(samples), 4),
" (should be ≈ 1)\n")
## Var / Mean ratio: 0.9966 (should be ≈ 1)
5. When to Use the Poisson Distribution
Use the Poisson distribution when all five of the following conditions hold:
| Condition | Description |
|---|---|
| Count data | You are counting discrete occurrences, not measuring a continuous quantity |
| Fixed interval | Events are counted over a fixed unit of time, space, or volume |
| Independence | One event does not make another more or less likely |
| Constant rate | The average rate \(\lambda\) does not vary across intervals |
| No upper bound | There is no theoretical maximum number of events |
And as a data-driven check: sample mean ≈ sample variance.
Classic examples
scenarios <- data.frame(
Scenario = c("Hospital ED arrivals / hour",
"Typos per page in a manuscript",
"Radioactive decay events / second",
"Insurance claims / policy / year",
"Manufacturing defects / unit",
"Bacteria colonies per agar plate"),
`Fixed interval` = c("1 hour", "1 page", "1 second",
"1 year", "1 unit", "1 plate"),
`Typical λ` = c("12", "0.4", "0.001", "0.05", "1.5", "8")
)
knitr::kable(scenarios, align = "lcc")
| Scenario | Fixed.interval | Typical.λ |
|---|---|---|
| Hospital ED arrivals / hour | 1 hour | 12 |
| Typos per page in a manuscript | 1 page | 0.4 |
| Radioactive decay events / second | 1 second | 0.001 |
| Insurance claims / policy / year | 1 year | 0.05 |
| Manufacturing defects / unit | 1 unit | 1.5 |
| Bacteria colonies per agar plate | 1 plate | 8 |
6. When NOT to Use the Poisson Distribution
6.1 Overdispersion — Variance > Mean
The most common violation. Real-world count data often show overdispersion: the variance exceeds the mean because events cluster or because individuals differ in their underlying risk. The Poisson model then underestimates variability, producing falsely narrow confidence intervals and inflated test statistics.
set.seed(42)
# Ideal Poisson data
pois_data <- rpois(500, lambda = 5)
# Overdispersed data: Negative Binomial with same mean but extra variation
od_data <- rnbinom(500, mu = 5, size = 0.8)
results <- data.frame(
Dataset = c("Poisson(λ=5)", "Overdispersed (NB)"),
Mean = c(mean(pois_data), mean(od_data)),
Variance = c(var(pois_data), var(od_data)),
`Var/Mean` = c(var(pois_data) / mean(pois_data),
var(od_data) / mean(od_data))
)
knitr::kable(results, digits = 2,
col.names = c("Dataset", "Mean", "Variance", "Var / Mean"))
| Dataset | Mean | Variance | Var / Mean |
|---|---|---|---|
| Poisson(λ=5) | 4.92 | 4.96 | 1.01 |
| Overdispersed (NB) | 4.95 | 34.27 | 6.93 |
A Var/Mean ratio well above 1 is a red flag for overdispersion. Values of 2–10 are common in ecological and healthcare count data.
k_range <- 0:max(od_data)
obs_counts <- as.numeric(table(factor(od_data, levels = k_range)))
exp_counts <- dpois(k_range, mean(od_data)) * length(od_data)
bar_df <- data.frame(
k = rep(k_range, 2),
freq = c(obs_counts, exp_counts),
Type = rep(c("Observed", "Poisson expected"), each = length(k_range))
)
ggplot(bar_df, aes(x = k, y = freq, fill = Type)) +
geom_col(position = "dodge", width = 0.7) +
scale_fill_manual(values = c("Observed" = "#4E9AF1", "Poisson expected" = "#E63946")) +
coord_cartesian(xlim = c(0, 25)) +
labs(
title = "Overdispersed Count Data vs Poisson Expectation",
subtitle = "Poisson underestimates high counts and the zero count",
x = "Count k",
y = "Frequency",
fill = NULL
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")

6.2 Zero-Inflation — More Zeros Than Expected
Some processes generate excess structural zeros on top of the Poisson zeros: e.g., non-smokers (structural zero) vs. smokers who happened to smoke zero cigarettes this week (Poisson zero). Forcing a Poisson fit misses the spike at zero entirely.
set.seed(42)
lambda3 <- 4
n <- 500
# 35% structural zeros + Poisson(4) for the rest
structural_zero <- rbinom(n, 1, 0.35)
zi_counts <- ifelse(structural_zero == 1, 0, rpois(n, lambda3))
cat("Expected zeros under Poisson(4): ", round(dpois(0, lambda3) * 100, 1), "%\n")
## Expected zeros under Poisson(4): 1.8 %
cat("Observed zeros in ZIP data: ", round(mean(zi_counts == 0) * 100, 1), "%\n")
## Observed zeros in ZIP data: 35.4 %
k_range_zi <- 0:max(zi_counts)
obs_prop_zi <- as.numeric(table(factor(zi_counts, levels = k_range_zi))) / n
pois_prop_zi <- dpois(k_range_zi, lambda3)
zi_df <- data.frame(
k = rep(k_range_zi, 2),
prob = c(obs_prop_zi, pois_prop_zi),
Type = rep(c("Observed (ZIP)", "Poisson(4)"), each = length(k_range_zi))
)
ggplot(zi_df, aes(x = k, y = prob, fill = Type)) +
geom_col(position = "dodge", width = 0.7) +
scale_fill_manual(values = c("Observed (ZIP)" = "#4E9AF1", "Poisson(4)" = "#E63946")) +
labs(
title = "Zero-Inflated Data vs Poisson Expectation",
subtitle = "35% structural zeros — Poisson cannot account for the spike at zero",
x = "Count k",
y = "Proportion",
fill = NULL
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")

6.3 Non-Constant Rate or Non-Independence
- Contagion / clustering: one event triggers others (infectious disease, financial contagion). Events are not independent.
- Heterogeneous exposure: individuals have different underlying risk. Model with covariates in a Poisson GLM, or switch to a mixture distribution.
- Time-varying rate: arrivals at an ER peak at noon, not at 3 am. Include time as a covariate or use an inhomogeneous Poisson process.
6.4 Bounded Counts — Use Binomial Instead
If there is a fixed number of trials \(n\) and you are counting successes, the Binomial is the correct model. Poisson approximates Binomial only when \(n\) is large and \(p\) is small.
| Situation | Correct model |
|---|---|
| \(k\) successes in \(n\) fixed trials | Binomial |
| \(k\) events in a fixed interval, no upper bound | Poisson |
| \(k\) events, Var >> Mean | Negative Binomial |
| \(k\) events, excess zeros | Zero-Inflated Poisson |
7. Diagnosing: Is Poisson Appropriate?
7.1 Mean–Variance Ratio
Compute the index of dispersion \(I = s^2 / \bar{x}\):
- \(I \approx 1\) → Poisson consistent
- \(I \gg 1\) → overdispersion → consider Negative Binomial
- \(I \ll 1\) → underdispersion (rare) → consider Conway–Maxwell–Poisson
7.2 GLM Residual Deviance
Fit a Poisson GLM and check whether residual deviance ≈ degrees of freedom:
pois_glm <- glm(od_data ~ 1, family = poisson)
disp_ratio <- pois_glm$deviance / pois_glm$df.residual
cat("Residual deviance: ", round(pois_glm$deviance, 1), "\n")
## Residual deviance: 3019
cat("Degrees of freedom: ", pois_glm$df.residual, "\n")
## Degrees of freedom: 499
cat("Deviance / df ratio: ", round(disp_ratio, 2), "\n")
## Deviance / df ratio: 6.05
cat("(Well-fitted Poisson → ratio ≈ 1)\n")
## (Well-fitted Poisson → ratio ≈ 1)
7.3 Formal Dispersion Test
Under Poisson, \((n-1)s^2/\bar{x} \sim \chi^2_{n-1}\):
dispersion_test <- function(x) {
n <- length(x)
stat <- (n - 1) * var(x) / mean(x)
p_val <- pchisq(stat, df = n - 1, lower.tail = FALSE)
cat(sprintf(" Stat: %.1f | df: %d | p-value: %.4f\n", stat, n - 1, p_val))
}
cat("Poisson data:\n"); dispersion_test(pois_data)
## Poisson data:
## Stat: 503.0 | df: 499 | p-value: 0.4417
cat("Overdispersed:\n"); dispersion_test(od_data)
## Overdispersed:
## Stat: 3456.1 | df: 499 | p-value: 0.0000
8. Alternatives and When to Use Them
8.1 Negative Binomial — Overdispersion
The Negative Binomial adds a dispersion parameter \(\theta\) (size). As \(\theta \to \infty\) it converges to Poisson. Use it whenever the Var/Mean ratio is substantially above 1.
\[P(X = k) = \binom{k + \theta - 1}{k} \left(\frac{\theta}{\theta + \mu}\right)^\theta \left(\frac{\mu}{\theta + \mu}\right)^k\]
library(MASS)
pois_fit <- glm(od_data ~ 1, family = poisson)
nb_fit <- glm.nb(od_data ~ 1)
cat("Poisson AIC: ", round(AIC(pois_fit), 1), "\n")
## Poisson AIC: 4332.2
cat("Neg. Binomial AIC: ", round(AIC(nb_fit), 1), "\n\n")
## Neg. Binomial AIC: 2688.3
theta <- nb_fit$theta
mu_nb <- exp(coef(nb_fit)[[1]])
cat("NB estimated μ: ", round(mu_nb, 3), "\n")
## NB estimated μ: 4.948
cat("NB estimated θ: ", round(theta, 3), " (small θ = strong overdispersion)\n")
## NB estimated θ: 0.787 (small θ = strong overdispersion)
k_grid <- 0:30
model_df <- bind_rows(
data.frame(k = k_grid,
prob = dpois(k_grid, mean(od_data)),
Model = "Poisson"),
data.frame(k = k_grid,
prob = dnbinom(k_grid, mu = mu_nb, size = theta),
Model = "Neg. Binomial")
)
obs_prop <- as.numeric(table(factor(od_data, levels = k_grid))) / length(od_data)
ggplot() +
geom_col(data = data.frame(k = k_grid, prob = obs_prop),
aes(x = k, y = prob), fill = "gray80", color = "white") +
geom_line(data = model_df,
aes(x = k, y = prob, color = Model), linewidth = 1.3) +
scale_color_manual(values = c("Poisson" = "#4E9AF1", "Neg. Binomial" = "#E63946")) +
coord_cartesian(xlim = c(0, 25)) +
labs(
title = "Poisson vs Negative Binomial: Fitting Overdispersed Data",
subtitle = "Gray bars = observed proportions",
x = "Count k",
y = "Probability",
color = NULL
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")

The Negative Binomial tracks the long right tail that Poisson misses entirely.
8.2 Zero-Inflated Models — Excess Zeros
Zero-Inflated Poisson (ZIP) mixes a point mass at zero with a Poisson component:
\[P(X = k) = \begin{cases} \pi + (1-\pi)e^{-\lambda} & k = 0 \\ (1-\pi)\dfrac{\lambda^k e^{-\lambda}}{k!} & k \geq 1 \end{cases}\]
where \(\pi\) is the probability of
a structural zero. The pscl package provides
zeroinfl() for both ZIP and ZINB models. Use
ZINB when you have excess zeros and
overdispersion in the non-zero counts.
8.3 Quasi-Poisson — A GLM Patch
In a GLM context, keep the Poisson log-link but relax the variance assumption to \(\text{Var}(Y) = \phi\lambda\), estimating \(\phi\) from the data. Coefficients are identical to Poisson; standard errors are inflated by \(\sqrt{\hat\phi}\).
qp_fit <- glm(od_data ~ 1, family = quasipoisson)
cat("Quasi-Poisson estimated φ:", round(summary(qp_fit)$dispersion, 3), "\n")
## Quasi-Poisson estimated φ: 6.926
Quasi-Poisson is a convenient patch but does not model the mechanism behind overdispersion. Prefer Negative Binomial when interpretability matters.
8.4 Summary of Alternatives
| Problem | Recommended model |
|---|---|
| Var >> Mean (overdispersion) | Negative Binomial |
| Excess structural zeros | Zero-Inflated Poisson (ZIP) |
| Excess zeros + overdispersion | Zero-Inflated Neg. Binomial (ZINB) |
| Overdispersion, GLM context | Quasi-Poisson |
| Var < Mean (underdispersion) | Conway–Maxwell–Poisson |
| Fixed upper bound, known \(n\) | Binomial |
| Waiting time between events | Exponential |
9. Connections to Other Distributions
9.1 Poisson as the Limit of the Binomial
When \(n\) is large and \(p\) is small such that \(\lambda = np\) remains moderate, \(\text{Binomial}(n, p) \approx \text{Poisson}(\lambda)\).
n_b <- 1000; p_b <- 0.005; lam_b <- n_b * p_b
x_range <- 0:20
approx_df <- bind_rows(
data.frame(k = x_range, prob = dbinom(x_range, n_b, p_b),
Distribution = paste0("Binomial(", n_b, ", ", p_b, ")")),
data.frame(k = x_range, prob = dpois(x_range, lam_b),
Distribution = paste0("Poisson(λ = ", lam_b, ")"))
)
ggplot(approx_df, aes(x = k, y = prob, color = Distribution)) +
geom_line(linewidth = 1.1) +
geom_point(size = 2) +
scale_color_manual(values = c("#4E9AF1", "#E63946")) +
labs(
title = "Poisson as a Limit of the Binomial",
subtitle = paste0("n = ", n_b, ", p = ", p_b, " → λ = np = ", lam_b),
x = "Count k",
y = "P(X = k)",
color = NULL
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")

9.2 Normal Approximation for Large \(\lambda\)
For large \(\lambda\), skewness \(1/\sqrt{\lambda} \to 0\) and \(\text{Poisson}(\lambda) \approx \text{Normal}(\lambda, \lambda)\).
lam_large <- 30
x_seq <- 0:60
norm_df <- bind_rows(
data.frame(x = x_seq, y = dpois(x_seq, lam_large),
Dist = paste0("Poisson(λ = ", lam_large, ")")),
data.frame(x = x_seq, y = dnorm(x_seq, lam_large, sqrt(lam_large)),
Dist = paste0("Normal(μ = ", lam_large, ", σ = √", lam_large, ")"))
)
ggplot(norm_df, aes(x = x, y = y, color = Dist)) +
geom_line(linewidth = 1.2) +
scale_color_manual(values = c("#4E9AF1", "#E63946")) +
labs(
title = "Normal Approximation for Large λ",
x = "k",
y = "Probability / Density",
color = NULL
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")

9.3 Exponential Inter-Arrival Times
If events arrive as a Poisson process with rate \(\lambda\) per unit time, the waiting time between consecutive events follows \(\text{Exponential}(1/\lambda)\). The two distributions are paired: Poisson counts events; Exponential measures the gaps between them.
10. Summary
| Topic | Key Points |
|---|---|
| Use when | Counting occurrences, fixed interval, independence, constant rate, no upper bound, Var ≈ Mean |
| Don’t use when | Var >> Mean, excess zeros, clustered events, fixed \(n\) trials |
| Diagnose with | Var/Mean ratio, deviance/df from Poisson GLM, dispersion test |
| Alternatives | Neg. Binomial (overdispersion), ZIP/ZINB (excess zeros), Quasi-Poisson (GLM patch), Binomial (bounded) |
| R functions | dpois, ppois, qpois,
rpois, glm(family=poisson),
MASS::glm.nb |
| Special connections | Limit of Binomial (large \(n\), small \(p\)); Normal approximation (large \(\lambda\)); Exponential inter-arrivals |