Understanding the Weibull Distribution


1. Introduction

The Weibull distribution is one of the most widely used distributions in reliability engineering, survival analysis, and failure-time modelling. Named after Swedish engineer Waloddi Weibull (1951), it is prized for its flexibility: by tuning just two parameters it can mimic an exponential, a normal-like bell, or a heavily skewed shape.

Why use the Weibull?

  • Models time-to-failure of mechanical and electronic components
  • Widely used in survival analysis (time-to-event data)
  • Flexible enough to model early failures (infant mortality), random failures, or wear-out failures
  • Foundation of accelerated life testing and warranty analysis

2. Mathematical Definition

2.1 Probability Density Function (PDF)

The two-parameter Weibull PDF is:

\[f(x; k, \lambda) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \geq 0\]

where:

Parameter Symbol Role
Shape \(k > 0\) Controls the shape of the distribution
Scale \(\lambda > 0\) Stretches or compresses the distribution along the x-axis

2.2 Cumulative Distribution Function (CDF)

\[F(x; k, \lambda) = 1 - e^{-(x/\lambda)^k}\]

2.3 Survival (Reliability) Function

\[S(x) = 1 - F(x) = e^{-(x/\lambda)^k}\]

2.4 Hazard Function

The hazard function (instantaneous failure rate) is:

\[h(x) = \frac{f(x)}{S(x)} = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\]

This is one of the Weibull’s most powerful features — the shape of the hazard is entirely determined by \(k\).


3. The Role of Shape Parameter \(k\)

library(ggplot2)
library(dplyr)
library(tidyr)

# Define a grid of x values and shape parameters to compare
x     <- seq(0.01, 4, length.out = 500)
k_vals <- c(0.5, 1, 1.5, 2, 3.5)
lambda <- 1  # fix scale = 1 for clarity

pdf_data <- lapply(k_vals, function(k) {
  data.frame(
    x   = x,
    pdf = dweibull(x, shape = k, scale = lambda),
    k   = paste0("k = ", k)
  )
}) |> bind_rows()

ggplot(pdf_data, aes(x = x, y = pdf, color = k)) +
  geom_line(linewidth = 1.1) +
  labs(
    title    = "Weibull PDF for Different Shape Parameters (λ = 1)",
    x        = "x",
    y        = "Density f(x)",
    color    = "Shape k"
  ) +
  theme_minimal(base_size = 13) +
  coord_cartesian(ylim = c(0, 2))

Interpreting \(k\)

\(k\) value Hazard shape Real-world meaning
\(k < 1\) Decreasing hazard Infant mortality — failures most likely early
\(k = 1\) Constant hazard Random failures — reduces to Exponential
\(k \approx 2.5\) Increasing hazard Approximates a Normal distribution
\(k > 1\) Increasing hazard Wear-out failures — risk grows with age

4. The Role of Scale Parameter \(\lambda\)

lambda_vals <- c(0.5, 1, 2, 3)
k_fixed     <- 2  # fix shape = 2

scale_data <- lapply(lambda_vals, function(lam) {
  data.frame(
    x      = x,
    pdf    = dweibull(x, shape = k_fixed, scale = lam),
    lambda = paste0("λ = ", lam)
  )
}) |> bind_rows()

ggplot(scale_data, aes(x = x, y = pdf, color = lambda)) +
  geom_line(linewidth = 1.1) +
  labs(
    title = "Weibull PDF for Different Scale Parameters (k = 2)",
    x     = "x",
    y     = "Density f(x)",
    color = "Scale λ"
  ) +
  theme_minimal(base_size = 13)

Key insight: \(\lambda\) is the characteristic life — the value of \(x\) at which \(F(x) \approx 63.2\%\) of items have failed, regardless of \(k\).


5. R Functions for the Weibull

R provides four built-in functions for the Weibull distribution:

# Parameters
k   <- 2      # shape
lam <- 3      # scale

# --- dweibull: probability density at x = 2 ---
d_val <- dweibull(x = 2, shape = k, scale = lam)
cat("PDF at x=2:      ", round(d_val, 5), "\n")
## PDF at x=2:       0.28497
# --- pweibull: cumulative probability P(X <= 2) ---
p_val <- pweibull(q = 2, shape = k, scale = lam)
cat("P(X <= 2):       ", round(p_val, 5), "\n")
## P(X <= 2):        0.35882
# --- qweibull: quantile (inverse CDF) — 50th percentile ---
q_val <- qweibull(p = 0.5, shape = k, scale = lam)
cat("Median (50th %): ", round(q_val, 5), "\n")
## Median (50th %):  2.49766
# --- rweibull: random samples ---
set.seed(42)
samples <- rweibull(n = 1000, shape = k, scale = lam)
cat("Sample mean:     ", round(mean(samples), 5), "\n")
## Sample mean:      2.7224
cat("Sample SD:       ", round(sd(samples), 5), "\n")
## Sample SD:        1.42954

6. Key Summary Statistics

For a Weibull(\(k\), \(\lambda\)) random variable \(X\):

Statistic Formula
Mean \(\lambda \, \Gamma\!\left(1 + \frac{1}{k}\right)\)
Variance \(\lambda^2 \left[\Gamma\!\left(1 + \frac{2}{k}\right) - \left(\Gamma\!\left(1 + \frac{1}{k}\right)\right)^2\right]\)
Median \(\lambda (\ln 2)^{1/k}\)
Mode (k>1) \(\lambda \left(\frac{k-1}{k}\right)^{1/k}\)
weibull_stats <- function(k, lambda) {
  mean_val   <- lambda * gamma(1 + 1/k)
  var_val    <- lambda^2 * (gamma(1 + 2/k) - (gamma(1 + 1/k))^2)
  median_val <- lambda * (log(2))^(1/k)
  mode_val   <- if (k > 1) lambda * ((k - 1) / k)^(1/k) else 0
  
  cat(sprintf("Weibull(k=%.1f, λ=%.1f)\n", k, lambda))
  cat(sprintf("  Mean:   %.4f\n", mean_val))
  cat(sprintf("  SD:     %.4f\n", sqrt(var_val)))
  cat(sprintf("  Median: %.4f\n", median_val))
  cat(sprintf("  Mode:   %.4f\n\n", mode_val))
}

weibull_stats(0.5, 1)
## Weibull(k=0.5, λ=1.0)
##   Mean:   2.0000
##   SD:     4.4721
##   Median: 0.4805
##   Mode:   0.0000
weibull_stats(1.0, 1)
## Weibull(k=1.0, λ=1.0)
##   Mean:   1.0000
##   SD:     1.0000
##   Median: 0.6931
##   Mode:   0.0000
weibull_stats(2.0, 1)
## Weibull(k=2.0, λ=1.0)
##   Mean:   0.8862
##   SD:     0.4633
##   Median: 0.8326
##   Mode:   0.7071
weibull_stats(3.5, 1)
## Weibull(k=3.5, λ=1.0)
##   Mean:   0.8997
##   SD:     0.2847
##   Median: 0.9006
##   Mode:   0.9083

7. Hazard Functions Visualised

The hazard function is critical in reliability — it tells us how the instantaneous risk of failure changes over time.

hazard_data <- lapply(k_vals, function(k) {
  h <- dweibull(x, shape = k, scale = 1) / pweibull(x, shape = k, scale = 1, lower.tail = FALSE)
  data.frame(x = x, hazard = h, k = paste0("k = ", k))
}) |> bind_rows()

ggplot(hazard_data, aes(x = x, y = hazard, color = k)) +
  geom_line(linewidth = 1.1) +
  coord_cartesian(ylim = c(0, 5)) +
  labs(
    title  = "Weibull Hazard Function h(x) for Different Shape Values (λ = 1)",
    x      = "Time x",
    y      = "Hazard h(x)",
    color  = "Shape k"
  ) +
  theme_minimal(base_size = 13)


8. Fitting a Weibull Distribution to Data

8.1 Simulate Data

set.seed(123)
true_k   <- 2.5
true_lam <- 4.0

failure_times <- rweibull(n = 200, shape = true_k, scale = true_lam)
summary(failure_times)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.508   2.504   3.526   3.484   4.445   8.897

8.2 MLE Estimation with MASS::fitdistr

library(MASS)

fit <- fitdistr(failure_times, densfun = "weibull")
print(fit)
##      shape       scale  
##   2.6844687   3.9167731 
##  (0.1464182) (0.1085882)
cat("\nTrue parameters:  k =", true_k, "  λ =", true_lam, "\n")
## 
## True parameters:  k = 2.5   λ = 4
cat("Estimated:        k =", round(fit$estimate["shape"], 3),
    "  λ =", round(fit$estimate["scale"], 3), "\n")
## Estimated:        k = 2.684   λ = 3.917

8.3 Goodness-of-Fit: Density Overlay

k_hat   <- fit$estimate["shape"]
lam_hat <- fit$estimate["scale"]

ggplot(data.frame(x = failure_times), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 25,
                 fill = "#4E9AF1", color = "white", alpha = 0.7) +
  stat_function(fun = dweibull, args = list(shape = k_hat, scale = lam_hat),
                color = "#E63946", linewidth = 1.3, linetype = "solid") +
  stat_function(fun = dweibull, args = list(shape = true_k, scale = true_lam),
                color = "#2A9D8F", linewidth = 1.1, linetype = "dashed") +
  labs(
    title    = "Fitted Weibull vs True Distribution",
    subtitle = "Red = fitted MLE | Green dashed = true distribution",
    x        = "Failure Time",
    y        = "Density"
  ) +
  theme_minimal(base_size = 13)

8.4 Q-Q Plot

# Compute theoretical quantiles under fitted model
probs     <- ppoints(length(failure_times))
theor_q   <- qweibull(probs, shape = k_hat, scale = lam_hat)
observed_q <- sort(failure_times)

ggplot(data.frame(theoretical = theor_q, observed = observed_q),
       aes(x = theoretical, y = observed)) +
  geom_point(alpha = 0.5, color = "#4E9AF1") +
  geom_abline(slope = 1, intercept = 0, color = "#E63946", linewidth = 1) +
  labs(
    title = "Weibull Q-Q Plot",
    x     = "Theoretical Quantiles",
    y     = "Sample Quantiles"
  ) +
  theme_minimal(base_size = 13)

Points close to the red line indicate a good Weibull fit.


9. Real-World Example: Bearing Failures

A factory records the operating hours until 50 ball bearings fail. We model these using the Weibull distribution.

set.seed(99)
# Simulate bearing lifetimes (hours)
bearing_hours <- rweibull(50, shape = 3, scale = 2000)

# Fit
bearing_fit <- fitdistr(bearing_hours, "weibull")
k_b   <- bearing_fit$estimate["shape"]
lam_b <- bearing_fit$estimate["scale"]

cat("Estimated shape (k):  ", round(k_b, 3), "\n")
## Estimated shape (k):   2.883
cat("Estimated scale (λ):  ", round(lam_b, 3), "\n\n")
## Estimated scale (λ):   2070.18
# Probability of failure before 1500 hours
p_fail_1500 <- pweibull(1500, shape = k_b, scale = lam_b)
cat(sprintf("P(failure before 1500 hrs) = %.1f%%\n", p_fail_1500 * 100))
## P(failure before 1500 hrs) = 32.6%
# B10 life — time at which 10% of bearings have failed
b10 <- qweibull(0.10, shape = k_b, scale = lam_b)
cat(sprintf("B10 life (10%% fail):        %.0f hours\n", b10))
## B10 life (10% fail):        948 hours
# Mean time to failure
mttf <- lam_b * gamma(1 + 1/k_b)
cat(sprintf("Mean Time to Failure (MTTF): %.0f hours\n", mttf))
## Mean Time to Failure (MTTF): 1846 hours
t_seq <- seq(0, 4000, length.out = 300)
surv  <- pweibull(t_seq, shape = k_b, scale = lam_b, lower.tail = FALSE)

ggplot(data.frame(t = t_seq, S = surv), aes(x = t, y = S)) +
  geom_line(color = "#2A9D8F", linewidth = 1.3) +
  geom_hline(yintercept = 0.90, linetype = "dashed", color = "#E63946") +
  geom_vline(xintercept = b10,  linetype = "dashed", color = "#E63946") +
  annotate("text", x = b10 + 100, y = 0.95,
           label = paste0("B10 = ", round(b10, 0), " hrs"), hjust = 0) +
  labs(
    title = "Bearing Survival (Reliability) Curve",
    x     = "Operating Hours",
    y     = "Survival Probability S(t)"
  ) +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal(base_size = 13)


10. Connections to Other Distributions

Special case Parameters Equivalent distribution
\(k = 1\) Any \(\lambda\) Exponential(\(1/\lambda\))
\(k = 2\) Any \(\lambda\) Rayleigh distribution
\(k \approx 3.6\) Any \(\lambda\) Approximately Normal
\(k = 1, \lambda = 1\) Standard Exponential
x2 <- seq(0.001, 5, length.out = 500)

special <- bind_rows(
  data.frame(x = x2,
             y = dweibull(x2, shape = 1, scale = 1),
             dist = "Weibull(k=1) = Exponential"),
  data.frame(x = x2,
             y = dweibull(x2, shape = 2, scale = 1),
             dist = "Weibull(k=2) = Rayleigh"),
  data.frame(x = x2,
             y = dweibull(x2, shape = 3.6, scale = 1),
             dist = "Weibull(k=3.6) ≈ Normal")
)

ggplot(special, aes(x = x, y = y, color = dist)) +
  geom_line(linewidth = 1.2) +
  labs(title = "Weibull Special Cases", x = "x", y = "Density", color = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top")


11. Summary

Topic Key Points
Shape \(k\) \(k<1\): decreasing hazard; \(k=1\): constant; \(k>1\): increasing hazard
Scale \(\lambda\) Characteristic life; 63.2% of items failed at \(x = \lambda\)
Flexibility Encompasses Exponential, Rayleigh, and near-Normal as special cases
R functions dweibull, pweibull, qweibull, rweibull, MASS::fitdistr
Applications Reliability, survival analysis, warranty, accelerated life testing

Session Info

sessionInfo()
## R version 4.5.3 (2026-03-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Phoenix
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MASS_7.3-65   tidyr_1.3.2   dplyr_1.2.1   ggplot2_4.0.2
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.7.1        cli_3.6.5          knitr_1.51         rlang_1.1.7       
##  [5] xfun_0.56          purrr_1.2.1        generics_0.1.4     S7_0.2.1          
##  [9] jsonlite_2.0.0     labeling_0.4.3     glue_1.8.0         htmltools_0.5.9   
## [13] sass_0.4.10        scales_1.4.0       rmarkdown_2.30     grid_4.5.3        
## [17] tibble_3.3.1       evaluate_1.0.5     jquerylib_0.1.4    fastmap_1.2.0     
## [21] yaml_2.3.12        lifecycle_1.0.5    compiler_4.5.3     RColorBrewer_1.1-3
## [25] pkgconfig_2.0.3    rstudioapi_0.18.0  farver_2.1.2       digest_0.6.39     
## [29] R6_2.6.1           tidyselect_1.2.1   pillar_1.11.1      magrittr_2.0.4    
## [33] bslib_0.10.0       withr_3.0.2        tools_4.5.3        gtable_0.3.6      
## [37] cachem_1.1.0