Weibull Distribution in Clinical Medicine: Three Applied Examples


Overview

The Weibull distribution is exceptionally well-suited to clinical and biomedical research because its shape parameter \(k\) carries direct interpretive meaning: whether the hazard of an event is decreasing (\(k < 1\)), constant (\(k = 1\)), or increasing (\(k > 1\)) over time. Below are three worked examples — each with simulated data, model fitting, and publication-style plots.


Example 1: Survival Analysis in Oncology

Background

In a randomised clinical trial for advanced non-small-cell lung cancer (NSCLC), patients are followed until death. We fit a parametric Weibull survival model to estimate median overall survival and extrapolate beyond the trial window — a requirement for health-technology assessment (e.g., NICE submissions in the UK).

A shape parameter \(k > 1\) is expected: the hazard of death rises over time as disease progresses.

Simulate Trial Data

set.seed(2024)
n <- 300

# Treatment arm: better survival (larger scale = longer life)
treat <- data.frame(
  time   = rweibull(n, shape = 1.6, scale = 18),
  status = 1,          # 1 = event (death) observed
  arm    = "Treatment"
)

# Control arm: shorter survival
control <- data.frame(
  time   = rweibull(n, shape = 1.6, scale = 11),
  status = 1,
  arm    = "Control"
)

# Add some random censoring (administrative cut-off at 36 months)
df_onco <- bind_rows(treat, control) |>
  mutate(
    time   = pmin(time, 36),
    status = if_else(time == 36, 0, status)   # 0 = censored
  )

head(df_onco)
##        time status       arm
## 1  6.120470      1 Treatment
## 2 19.501032      1 Treatment
## 3  9.914616      1 Treatment
## 4  9.493456      1 Treatment
## 5 15.448686      1 Treatment
## 6  9.416640      1 Treatment

Fit Weibull Survival Models

# Fit Weibull model per arm using survreg (scale = 1/shape in survreg)
fit_treat   <- survreg(Surv(time, status) ~ 1,
                       data   = filter(df_onco, arm == "Treatment"),
                       dist   = "weibull")

fit_control <- survreg(Surv(time, status) ~ 1,
                       data   = filter(df_onco, arm == "Control"),
                       dist   = "weibull")

# survreg parameterises differently: shape = 1/scale, scale_param = exp(Intercept)
extract_weibull <- function(fit) {
  list(
    shape = 1 / fit$scale,
    scale = exp(coef(fit)[["(Intercept)"]])
  )
}

params_t <- extract_weibull(fit_treat)
params_c <- extract_weibull(fit_control)

cat("Treatment arm  — shape k:", round(params_t$shape, 3),
    "  scale λ:", round(params_t$scale, 3), "\n")
## Treatment arm  — shape k: 1.495   scale λ: 19.133
cat("Control arm    — shape k:", round(params_c$shape, 3),
    "  scale λ:", round(params_c$scale, 3), "\n")
## Control arm    — shape k: 1.522   scale λ: 10.051
# Median survival = λ * (ln2)^(1/k)
med_t <- params_t$scale * log(2)^(1 / params_t$shape)
med_c <- params_c$scale * log(2)^(1 / params_c$shape)
cat("\nEstimated median OS — Treatment:", round(med_t, 1), "months\n")
## 
## Estimated median OS — Treatment: 15 months
cat("Estimated median OS — Control:  ", round(med_c, 1), "months\n")
## Estimated median OS — Control:   7.9 months

Kaplan–Meier vs Weibull Fit

t_seq <- seq(0, 48, length.out = 300)   # extrapolate to 48 months (beyond 36-month trial)

weibull_surv <- bind_rows(
  data.frame(
    t   = t_seq,
    S   = pweibull(t_seq, shape = params_t$shape, scale = params_t$scale, lower.tail = FALSE),
    arm = "Treatment"
  ),
  data.frame(
    t   = t_seq,
    S   = pweibull(t_seq, shape = params_c$shape, scale = params_c$scale, lower.tail = FALSE),
    arm = "Control"
  )
)

# Kaplan–Meier
km_fit <- survfit(Surv(time, status) ~ arm, data = df_onco)
km_df  <- data.frame(
  t      = km_fit$time,
  S      = km_fit$surv,
  arm    = rep(names(km_fit$strata), km_fit$strata)
) |>
  mutate(arm = gsub("arm=", "", arm))

ggplot() +
  geom_step(data = km_df,
            aes(x = t, y = S, color = arm), linewidth = 0.8, alpha = 0.6) +
  geom_line(data = weibull_surv,
            aes(x = t, y = S, color = arm), linewidth = 1.2, linetype = "dashed") +
  geom_vline(xintercept = 36, linetype = "dotted", color = "grey50") +
  annotate("text", x = 36.5, y = 0.85, label = "Trial cut-off\n(36 months)",
           hjust = 0, size = 3.5, color = "grey40") +
  scale_y_continuous(labels = percent, limits = c(0, 1)) +
  scale_color_manual(values = c("Treatment" = "#2A9D8F", "Control" = "#E63946")) +
  labs(
    title    = "Overall Survival: Kaplan–Meier (solid) vs Weibull Fit (dashed)",
    subtitle = "Dashed lines extrapolate beyond the trial window for HTA modelling",
    x        = "Time (months)",
    y        = "Survival Probability",
    color    = "Arm"
  ) +
  theme_minimal(base_size = 13)

Hazard Over Time

hz_df <- bind_rows(
  data.frame(
    t       = t_seq,
    hazard  = dweibull(t_seq, shape = params_t$shape, scale = params_t$scale) /
              pweibull(t_seq, shape = params_t$shape, scale = params_t$scale, lower.tail = FALSE),
    arm     = "Treatment"
  ),
  data.frame(
    t       = t_seq,
    hazard  = dweibull(t_seq, shape = params_c$shape, scale = params_c$scale) /
              pweibull(t_seq, shape = params_c$shape, scale = params_c$scale, lower.tail = FALSE),
    arm     = "Control"
  )
)

ggplot(hz_df, aes(x = t, y = hazard, color = arm)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(values = c("Treatment" = "#2A9D8F", "Control" = "#E63946")) +
  labs(
    title   = "Weibull Hazard Function — NSCLC Trial",
    subtitle = paste0("k > 1 in both arms confirms increasing hazard of death over time"),
    x       = "Time (months)",
    y       = "Hazard h(t)",
    color   = "Arm"
  ) +
  theme_minimal(base_size = 13)

Clinical takeaway: The fitted Weibull (\(k \approx 1.6\)) confirms an increasing hazard of death over time — consistent with progressive disease. The dashed extrapolation beyond 36 months is what HTA bodies use to compute lifetime QALYs and cost-effectiveness.


Example 2: Time to Hospital Readmission (Heart Failure)

Background

Thirty-day readmission for heart failure is a key quality metric penalised under the US Hospital Readmissions Reduction Program (HRRP). Readmission risk is highest in the first few days after discharge, then declines — a decreasing hazard pattern suited to \(k < 1\).

We use a Weibull Accelerated Failure Time (AFT) model to identify patient-level predictors of early readmission.

Simulate Discharge Cohort

set.seed(42)
n <- 500

# Covariates
age          <- round(rnorm(n, mean = 72, sd = 10))
prior_admits <- rpois(n, lambda = 1.5)       # number of admissions in past year
ef_reduced   <- rbinom(n, 1, prob = 0.55)    # 1 = reduced ejection fraction

# Linear predictor (AFT scale): older, more prior admissions, reduced EF = faster readmission
lp <- -0.015 * age - 0.20 * prior_admits - 0.30 * ef_reduced

# Baseline: Weibull with k=0.7 (decreasing hazard), scale=25 days
scale_i <- 25 * exp(lp)   # AFT: covariates multiply the time scale

time_to_readmit <- rweibull(n, shape = 0.7, scale = pmax(scale_i, 1))

# Censored at 30 days (not readmitted within window)
status <- as.integer(time_to_readmit <= 30)
time   <- pmin(time_to_readmit, 30)

df_hf <- data.frame(time, status, age, prior_admits, ef_reduced)

cat("Readmitted within 30 days:", sum(status), "/", n,
    sprintf("(%.1f%%)\n", 100 * mean(status)))
## Readmitted within 30 days: 477 / 500 (95.4%)

Fit Weibull AFT Model

aft_fit <- survreg(
  Surv(time, status) ~ age + prior_admits + ef_reduced,
  data = df_hf,
  dist = "weibull"
)

summary(aft_fit)
## 
## Call:
## survreg(formula = Surv(time, status) ~ age + prior_admits + ef_reduced, 
##     data = df_hf, dist = "weibull")
##                Value Std. Error     z       p
## (Intercept)   2.9225     0.5061  5.77 7.7e-09
## age          -0.0133     0.0069 -1.93  0.0535
## prior_admits -0.1750     0.0556 -3.15  0.0016
## ef_reduced   -0.1739     0.1347 -1.29  0.1967
## Log(scale)    0.3815     0.0368 10.37 < 2e-16
## 
## Scale= 1.46 
## 
## Weibull distribution
## Loglik(model)= -1282.3   Loglik(intercept only)= -1290.4
##  Chisq= 16.32 on 3 degrees of freedom, p= 0.00097 
## Number of Newton-Raphson Iterations: 5 
## n= 500
# In AFT models, exp(coef) is the Time Ratio (TR):
# TR < 1 means the covariate *accelerates* time to event (shortens readmission-free time)
coefs <- coef(aft_fit)[-1]   # drop intercept
TR    <- exp(coefs)

data.frame(
  Covariate    = c("Age (per year)", "Prior admissions (per +1)", "Reduced EF (vs preserved)"),
  Coefficient  = round(coefs, 3),
  Time_Ratio   = round(TR, 3),
  Interpretation = c(
    "Each additional year reduces readmission-free time",
    "Each prior admission shortens time to next readmission",
    "Reduced EF shortens time to readmission"
  )
)
##                              Covariate Coefficient Time_Ratio
## age                     Age (per year)      -0.013      0.987
## prior_admits Prior admissions (per +1)      -0.175      0.839
## ef_reduced   Reduced EF (vs preserved)      -0.174      0.840
##                                                      Interpretation
## age              Each additional year reduces readmission-free time
## prior_admits Each prior admission shortens time to next readmission
## ef_reduced                  Reduced EF shortens time to readmission

Predicted Readmission-Free Survival by Risk Profile

k_aft  <- 1 / aft_fit$scale   # shape

# Define three patient profiles
profiles <- data.frame(
  age          = c(60, 75, 85),
  prior_admits = c(0,  2,  4),
  ef_reduced   = c(0,  1,  1),
  label        = c("Low risk\n(60y, 0 prior, preserved EF)",
                   "Moderate risk\n(75y, 2 prior, reduced EF)",
                   "High risk\n(85y, 4 prior, reduced EF)")
)

t_days <- seq(0, 30, length.out = 200)

pred_df <- lapply(seq_len(nrow(profiles)), function(i) {
  p       <- profiles[i, ]
  log_mu  <- coef(aft_fit)[1] +
              coef(aft_fit)["age"]          * p$age +
              coef(aft_fit)["prior_admits"] * p$prior_admits +
              coef(aft_fit)["ef_reduced"]   * p$ef_reduced
  scale_p <- exp(log_mu)
  data.frame(
    t     = t_days,
    S     = pweibull(t_days, shape = k_aft, scale = scale_p, lower.tail = FALSE),
    label = p$label
  )
}) |> bind_rows()

ggplot(pred_df, aes(x = t, y = S, color = label)) +
  geom_line(linewidth = 1.3) +
  scale_y_continuous(labels = percent, limits = c(0, 1)) +
  scale_color_manual(values = c("#2A9D8F", "#F4A261", "#E63946")) +
  labs(
    title    = "Predicted Readmission-Free Survival — Heart Failure Cohort",
    subtitle = "Weibull AFT model (k < 1 → decreasing hazard: risk highest immediately post-discharge)",
    x        = "Days Since Discharge",
    y        = "P(Not Readmitted)",
    color    = "Patient Profile"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.text = element_text(size = 9))

Hazard of Readmission Over 30 Days

# Illustrate the decreasing hazard for an average patient
avg_log_mu <- coef(aft_fit)[1] +
              coef(aft_fit)["age"]          * 72 +
              coef(aft_fit)["prior_admits"] * 1.5 +
              coef(aft_fit)["ef_reduced"]   * 0.55
avg_scale  <- exp(avg_log_mu)

haz_df <- data.frame(
  t      = t_days[-1],
  hazard = dweibull(t_days[-1], shape = k_aft, scale = avg_scale) /
           pweibull(t_days[-1], shape = k_aft, scale = avg_scale, lower.tail = FALSE)
)

ggplot(haz_df, aes(x = t, y = hazard)) +
  geom_line(color = "#E63946", linewidth = 1.3) +
  labs(
    title    = "Hazard of Readmission Over Time — Average Heart Failure Patient",
    subtitle  = "k < 1 confirms decreasing hazard: highest risk in first 1–3 days post-discharge",
    x        = "Days Since Discharge",
    y        = "Hazard h(t)"
  ) +
  theme_minimal(base_size = 13)

Clinical takeaway: The shape \(k < 1\) confirms that readmission risk is highest immediately after discharge and falls rapidly — reinforcing the value of intensive 48–72 hour post-discharge follow-up calls and rapid outpatient visits for high-risk patients.


Example 3: Implantable Medical Device Reliability — ICD Battery Life

Background

Implantable cardioverter-defibrillators (ICDs) must be replaced when the battery depletes. The timing of replacements follows a Weibull distribution with \(k > 1\) — depletion risk increases with device age. Manufacturers and clinicians use Weibull analysis to:

  • Predict the distribution of replacement surgeries across a device cohort
  • Estimate B10 life (age at which 10% of devices have depleted)
  • Plan elective replacement interval (ERI) alerts

Simulate Device Cohort

set.seed(77)
n_devices <- 400

# ICD battery life in years; k > 1 (wear-out failure pattern)
true_k   <- 4.5    # steep wear-out
true_lam <- 7.2    # characteristic life ~7.2 years

battery_life <- rweibull(n_devices, shape = true_k, scale = true_lam)

# Administrative censoring: study ends at 10 years
status_icd <- as.integer(battery_life <= 10)
time_icd   <- pmin(battery_life, 10)

cat("Devices depleted within 10 years:", sum(status_icd), "/", n_devices, "\n")
## Devices depleted within 10 years: 390 / 400
cat("Mean observed follow-up: ", round(mean(time_icd), 2), "years\n")
## Mean observed follow-up:  6.64 years

Fit Weibull Model

fit_icd <- fitdistr(battery_life, "weibull")   # use uncensored for simplicity
k_icd   <- fit_icd$estimate["shape"]
lam_icd <- fit_icd$estimate["scale"]

cat("Estimated shape k:  ", round(k_icd, 3), "\n")
## Estimated shape k:   4.439
cat("Estimated scale λ:  ", round(lam_icd, 3), "years\n\n")
## Estimated scale λ:   7.283 years
# Key reliability metrics
b10  <- qweibull(0.10, shape = k_icd, scale = lam_icd)
b50  <- qweibull(0.50, shape = k_icd, scale = lam_icd)
mttf <- lam_icd * gamma(1 + 1 / k_icd)

cat(sprintf("B10 life  (10%% depleted): %.2f years\n", b10))
## B10 life  (10% depleted): 4.39 years
cat(sprintf("B50 life  (median):       %.2f years\n", b50))
## B50 life  (median):       6.71 years
cat(sprintf("MTTF (mean battery life): %.2f years\n", mttf))
## MTTF (mean battery life): 6.64 years

Battery Reliability Curve with Key Milestones

t_icd <- seq(0, 12, length.out = 400)
S_icd <- pweibull(t_icd, shape = k_icd, scale = lam_icd, lower.tail = FALSE)
F_icd <- 1 - S_icd

rel_df <- data.frame(t = t_icd, S = S_icd, F = F_icd)

ggplot(rel_df, aes(x = t)) +
  geom_line(aes(y = S), color = "#2A9D8F", linewidth = 1.3) +
  geom_line(aes(y = F), color = "#E63946", linewidth = 1.3, linetype = "dashed") +
  # B10 annotation
  geom_vline(xintercept = b10, linetype = "dotted", color = "#E76F51") +
  geom_hline(yintercept = 0.90, linetype = "dotted", color = "#E76F51") +
  annotate("text", x = b10 + 0.1, y = 0.60,
           label = paste0("B10 = ", round(b10, 1), " yrs\n(10% depleted)"),
           hjust = 0, size = 3.5, color = "#E76F51") +
  # B50 annotation
  geom_vline(xintercept = b50, linetype = "dotted", color = "#264653") +
  annotate("text", x = b50 + 0.1, y = 0.25,
           label = paste0("Median = ", round(b50, 1), " yrs"),
           hjust = 0, size = 3.5, color = "#264653") +
  scale_y_continuous(labels = percent, limits = c(0, 1)) +
  scale_x_continuous(breaks = 0:12) +
  labs(
    title    = "ICD Battery Reliability and Failure Curves",
    subtitle = "Green = Reliability S(t)  |  Red dashed = Cumulative failures F(t)",
    x        = "Device Age (years)",
    y        = "Probability"
  ) +
  theme_minimal(base_size = 13)

Distribution of Replacement Surgeries Over Time

# Density = expected replacements per unit time across the cohort
pdf_df <- data.frame(
  t   = t_icd,
  pdf = n_devices * dweibull(t_icd, shape = k_icd, scale = lam_icd)
)

ggplot(pdf_df, aes(x = t, y = pdf)) +
  geom_area(fill = "#4E9AF1", alpha = 0.3) +
  geom_line(color = "#4E9AF1", linewidth = 1.3) +
  geom_vline(xintercept = b10,  linetype = "dashed", color = "#E76F51") +
  geom_vline(xintercept = mttf, linetype = "dashed", color = "#264653") +
  annotate("text", x = b10  - 0.1, y = 55, label = paste0("B10\n", round(b10, 1), "y"),
           hjust = 1, size = 3.5, color = "#E76F51") +
  annotate("text", x = mttf + 0.1, y = 55, label = paste0("MTTF\n", round(mttf, 1), "y"),
           hjust = 0, size = 3.5, color = "#264653") +
  labs(
    title    = "Projected ICD Replacement Surgery Load Over Time",
    subtitle = paste0("Cohort of ", n_devices, " devices | k = ", round(k_icd, 2),
                      " confirms wear-out failure (k > 1)"),
    x        = "Device Age (years)",
    y        = "Expected Replacements per Year"
  ) +
  theme_minimal(base_size = 13)

Hazard of Battery Depletion

haz_icd <- data.frame(
  t      = t_icd[-1],
  hazard = dweibull(t_icd[-1], shape = k_icd, scale = lam_icd) /
           pweibull(t_icd[-1], shape = k_icd, scale = lam_icd, lower.tail = FALSE)
)

ggplot(haz_icd, aes(x = t, y = hazard)) +
  geom_line(color = "#E63946", linewidth = 1.3) +
  labs(
    title    = "Hazard of ICD Battery Depletion Over Time",
    subtitle = "Steeply increasing hazard (k > 1) — typical wear-out failure pattern",
    x        = "Device Age (years)",
    y        = "Hazard h(t)"
  ) +
  theme_minimal(base_size = 13)

Clinical takeaway: With B10 life at 4.4 years, elective replacement interval (ERI) alerts should activate before this threshold. The steeply rising hazard (\(k \approx 4.5\)) means that once batteries start depleting in a cohort, the surgical workload accumulates rapidly — informing how many electrophysiology slots to reserve.


Comparison Across Examples

comparison <- data.frame(
  Example        = c("NSCLC Survival", "HF Readmission", "ICD Battery"),
  `Shape k`      = c("~1.6", "~0.7", "~4.5"),
  `Hazard shape` = c("Increasing", "Decreasing", "Steeply increasing"),
  `Clinical meaning` = c(
    "Death risk grows as disease progresses",
    "Readmission risk highest right after discharge",
    "Depletion risk rises sharply with device age"
  ),
  `Key output`   = c("Median OS, HTA extrapolation", "B10 = 10% readmitted by day X", "B10 life, MTTF, surgical load")
)

knitr::kable(comparison, col.names = c("Example", "Shape k", "Hazard", "Clinical Meaning", "Key Output"),
             caption = "Summary of Weibull Applications in Clinical Medicine")
Summary of Weibull Applications in Clinical Medicine
Example Shape k Hazard Clinical Meaning Key Output
NSCLC Survival ~1.6 Increasing Death risk grows as disease progresses Median OS, HTA extrapolation
HF Readmission ~0.7 Decreasing Readmission risk highest right after discharge B10 = 10% readmitted by day X
ICD Battery ~4.5 Steeply increasing Depletion risk rises sharply with device age B10 life, MTTF, surgical load

Session Info

sessionInfo()
## R version 4.5.3 (2026-03-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Phoenix
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_1.4.0   MASS_7.3-65    survival_3.8-6 tidyr_1.3.2    dplyr_1.2.1   
## [6] ggplot2_4.0.2 
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-4       gtable_0.3.6       jsonlite_2.0.0     compiler_4.5.3    
##  [5] tidyselect_1.2.1   jquerylib_0.1.4    splines_4.5.3      yaml_2.3.12       
##  [9] fastmap_1.2.0      lattice_0.22-9     R6_2.6.1           labeling_0.4.3    
## [13] generics_0.1.4     knitr_1.51         tibble_3.3.1       bslib_0.10.0      
## [17] pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.1.7        cachem_1.1.0      
## [21] xfun_0.56          sass_0.4.10        S7_0.2.1           cli_3.6.5         
## [25] withr_3.0.2        magrittr_2.0.4     digest_0.6.39      grid_4.5.3        
## [29] rstudioapi_0.18.0  lifecycle_1.0.5    vctrs_0.7.1        evaluate_1.0.5    
## [33] glue_1.8.0         farver_2.1.2       rmarkdown_2.30     purrr_1.2.1       
## [37] tools_4.5.3        pkgconfig_2.0.3    htmltools_0.5.9