A primer on Bayesian statistics, with an application to mortality rate estimation
Peter Hoff
University of Washington
Outline
- Subjective probability
- Practical aspects
- Application to mortality rate estimation
- Summary
Probability does not exist
"The abandonment of superstitious beliefs about the existence of Phlogiston, the Cosmic Ether, Absolute Space and Time, ... or of Fairies and Witches, was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception. Probabilistic reasoning - always to be understood as subjective - merely stems from our being uncertain about something. It makes no difference whether the uncertainty relates to an unforeseeable future, or to an unnoticed past, or to a past doubtfully reported... The only relevant thing is uncertainty - the extent of our own knowledge and ignorance. The actual fact of whether or not the events considered are in some sense determined, or known by other people... is of no consequence." (de Finetti)
The canonical coin example: Suppose I am about to flip a coin. Are there correct versions of Pr_me(heads)? Pr_you(heads)?
Now it lands in my hand. I see it and you don't.
Pr_me(heads) = 1 if I see heads, 0 if I see tails.
Pr_you(heads) = ? Is it 1 if I see heads, 0 if I see tails?
What happens in a coin toss?
[Figure: the outcome of a toss is determined by the coin's linear momentum and angular momentum.]
Subjective information
Now consider
Pr(heads)
Pr(heads | linear momentum, angular momentum)
Pr(heads | linear momentum, angular momentum, knowledge of physics)
Pr_me(heads | coin has been flipped)
Pr_you(heads | coin has been flipped)
de Finetti and others argued that probability, if it is to have any meaning at all, is a measure of your information about an uncertain event. Information about an event may vary from person to person, so there is no correct or objective probability, only subjective probability.
Bayesian inference
sample space: Y = all possible datasets, from which y is to be sampled.
parameter space: Θ = all possible parameter values, from which we hope to identify the truth.
Bayesian learning begins with joint beliefs about y and θ, expressed in terms of probabilities:
1. For each θ ∈ Θ, our prior distribution p(θ) describes our belief that θ represents the true population characteristics.
2. For each θ ∈ Θ and y ∈ Y, our sampling model p(y|θ) describes our belief that y would be the outcome of our study if we knew θ to be true.
Bayesian updating
Once we obtain the data y, the last step is to update our beliefs about θ:
3. For each θ ∈ Θ, our posterior distribution p(θ|y) describes our belief that θ is the true value, having observed dataset y.
The posterior distribution is obtained from the prior distribution and sampling model via Bayes' rule:
p(θ|y) = p(y|θ) p(θ) / ∫_Θ p(y|θ̃) p(θ̃) dθ̃
Note 1: Bayes' rule does not tell us what our beliefs should be; it tells us how they should change after seeing new information.
Note 2: Bayes' rule is normative, not necessarily descriptive of actual learning.
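Bayes' rule can be illustrated numerically by discretizing Θ. Below is a minimal sketch; the uniform grid prior and the binomial sampling model are illustrative choices made for this example, not the slide's setup.

```python
# Discrete (grid) version of Bayes' rule: posterior ∝ prior × likelihood,
# normalized by a sum that stands in for the integral over Θ.
from math import comb

def posterior_grid(prior, likelihood):
    unnorm = {t: p * likelihood(t) for t, p in prior.items()}
    z = sum(unnorm.values())  # discrete analogue of ∫ p(y|θ)p(θ) dθ
    return {t: u / z for t, u in unnorm.items()}

# Uniform prior over a grid of values for θ
grid = [i / 100 for i in range(1, 100)]
prior = {t: 1 / len(grid) for t in grid}

# Binomial sampling model: y successes out of n = 20 trials
n, y = 20, 0
likelihood = lambda t: comb(n, y) * t**y * (1 - t)**(n - y)

post = posterior_grid(prior, likelihood)

# Observing y = 0 moves belief toward small values of θ
assert post[0.01] > prior[0.01] and post[0.5] < prior[0.5]
```

The same function works for any prior and likelihood on a grid, which is the sense in which Bayes' rule only prescribes how beliefs change, not what they start as.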
Bayesian inference and Bayesian methods Bayesian inference is the process of optimal inductive learning via Bayes rule. Bayesian methods are data analysis tools based on Bayesian inference. Bayesian methods provide: parameter estimates with good statistical properties; parsimonious descriptions of observed data; predictions for missing data and forecasts of future data; a computational framework for model estimation, selection and validation.
Rare event estimation
θ = proportion of infected children in a school population
y = number infected in a sample of size 20
Θ = [0, 1], Y = {0, 1, ..., 20}
Sampling model: Y|θ ~ binomial(20, θ)
Prior distribution: θ ~ beta(2, 20)
[Figure: binomial(20, θ) distributions of the number infected in the sample, for θ = 0.05, 0.10, 0.20 (left); beta(2, 20) prior density over the percentage infected in the population (right).]
Prior summaries: E[θ] = 0.10, mode[θ] = 0.05, Pr(θ < 0.10) = 0.64, Pr(0.05 < θ < 0.20) = 0.66.
Posterior inference
Suppose we observe Y = 0: θ | {Y = 0} ~ beta(2, 40)
[Figure: prior density p(θ) and posterior density p(θ|y) over the percentage infected in the population.]
Posterior summaries: E[θ | Y = 0] = 0.05, mode[θ | Y = 0] = 0.025, Pr(θ < 0.10 | Y = 0) = 0.93.
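The prior and posterior summaries quoted on these slides can be checked exactly: for integer shape parameters, the beta CDF reduces to a binomial sum. A sketch (the identity below and the rounding are the only assumptions):

```python
# For integer a, b, the beta(a, b) CDF satisfies
#   I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j),
# a standard beta/binomial identity, so the slide's numbers can be recomputed.
from math import comb

def beta_cdf(x, a, b):
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

beta_mean = lambda a, b: a / (a + b)
beta_mode = lambda a, b: (a - 1) / (a + b - 2)

# Prior: theta ~ beta(2, 20)
assert abs(beta_mean(2, 20) - 0.10) < 0.01         # E[theta] ≈ 0.10
assert beta_mode(2, 20) == 0.05
assert round(beta_cdf(0.10, 2, 20), 2) == 0.64
assert round(beta_cdf(0.20, 2, 20) - beta_cdf(0.05, 2, 20), 2) == 0.66

# Posterior after y = 0 of n = 20: theta | y ~ beta(2 + 0, 20 + 20) = beta(2, 40)
assert round(beta_mean(2, 40), 2) == 0.05
assert beta_mode(2, 40) == 0.025
assert round(beta_cdf(0.10, 2, 40), 2) == 0.93
```

The conjugate update beta(a, b) → beta(a + y, b + n − y) is what makes the posterior another beta distribution here.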
Sensitivity analysis
E[θ | Y = y] = (a + y) / (a + b + n)
             = n/(a + b + n) · ȳ + (a + b)/(a + b + n) · a/(a + b)
             = n/(w + n) · ȳ + w/(w + n) · θ0,
where θ0 = a/(a + b) is the prior mean and w = a + b acts as a prior sample size.
[Figure: contour plots of posterior quantities as functions of θ0 and w.]
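A quick numerical check of this weighted-average identity; the values of θ0, w, n, and y below are illustrative.

```python
# The posterior mean (a + y)/(a + b + n) equals a weighted average of the
# data mean ybar and the prior mean theta0, with weights n and w = a + b.
theta0, w = 0.10, 22          # prior mean and prior "sample size"
a, b = theta0 * w, (1 - theta0) * w
n, y = 20, 0
ybar = y / n

direct = (a + y) / (a + b + n)
weighted = n / (w + n) * ybar + w / (w + n) * theta0
assert abs(direct - weighted) < 1e-12
```

As w grows the posterior mean is pulled toward θ0, which is exactly what the sensitivity analysis varies.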
Alternative approaches
Wald interval: ȳ ± 1.96 √(ȳ(1 − ȳ)/n)
- has correct asymptotic frequentist coverage
- has about 80% frequentist coverage if n = 20 (depending on the true θ)
- has 0% coverage if y = 0 (unless θ = 0)
Adjusted Wald interval: θ̂ ± 1.96 √(θ̂(1 − θ̂)/n), where θ̂ = n/(n + 4) · ȳ + 4/(n + 4) · 1/2
- has coverage probability much closer to the nominal level
- is related to a Bayesian interval based on θ ~ beta(2, 2)
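These coverage claims can be checked exactly at n = 20 by summing binomial probabilities over all 21 possible outcomes, with no simulation; the choice of true θ = 0.10 below is an arbitrary illustration.

```python
# Exact frequentist coverage of the Wald and adjusted Wald intervals at n = 20,
# computed by enumerating y = 0, ..., n and summing binomial probabilities.
from math import comb, sqrt

def coverage(theta, n, interval):
    return sum(comb(n, y) * theta**y * (1 - theta)**(n - y)
               for y in range(n + 1)
               if interval(y, n)[0] <= theta <= interval(y, n)[1])

def wald(y, n):
    ybar = y / n
    m = 1.96 * sqrt(ybar * (1 - ybar) / n)
    return ybar - m, ybar + m

def adj_wald(y, n):
    th = n / (n + 4) * (y / n) + 4 / (n + 4) * 0.5   # shrink ybar toward 1/2
    m = 1.96 * sqrt(th * (1 - th) / n)
    return th - m, th + m

cov_w = coverage(0.10, 20, wald)
cov_a = coverage(0.10, 20, adj_wald)
# The adjusted interval's coverage is closer to the nominal 95% level,
# partly because at y = 0 the Wald interval degenerates to the point {0}.
assert abs(cov_a - 0.95) < abs(cov_w - 0.95)
```

Varying theta in the call to coverage reproduces the "depending on the true θ" caveat on the slide.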
Mortality rate estimation (joint work with Jacob Markus)

cohort \ year:   1        2       ...   T−1       T
0-5:             y_{1,1}  y_{1,2} ...   y_{1,T−1} y_{1,T}
6-10:            y_{2,1}  y_{2,2} ...   y_{2,T−1} y_{2,T}
11-15:           y_{3,1}  y_{3,2} ...   y_{3,T−1} y_{3,T}
...

y_{i,t} = number of reported deaths for the ith cohort in year t. It is likely that y_{i,t} is an underestimate of the true number of deaths.
Data:
reported death counts: {y_{i,t}, i = 1, ..., m, t = 1, ..., T}
census counts: {n_{1,1}, ..., n_{m,1}} and {n_{1,T}, ..., n_{m,T}}
Based on these data, how do we describe our uncertainty about the true numbers of deaths? age-specific death rates? reporting rates?
A simple model for a single cohort
n_t is the number of people at risk at the beginning of year t;
d_t is the number of mortalities in year t;
y_t is the number of reported mortalities in year t.
n_t ≥ d_t ≥ y_t
y_t | d_t ~ binomial(d_t, θ_y)
d_t | n_t ~ binomial(n_t, θ_d)
n_t ~ p(n_t | n_{t−1}, d_{t−1}, ψ)
[Graphical model: reported deaths y_1, ..., y_T depend on true deaths d_1, ..., d_T, which depend on populations at risk n_1, ..., n_T.]
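A forward simulation from this model is a useful sanity check. The sketch below assumes the unspecified population model p(n_t | n_{t−1}, d_{t−1}, ψ) is Poisson(ψ(n_{t−1} − d_{t−1})), as on the later multiple-cohort slide, and uses normal approximations to the binomial and Poisson draws to keep the simulation fast; all function names are illustrative.

```python
# Forward simulation of the single-cohort model:
#   d_t | n_t ~ binomial(n_t, theta_d)    (true deaths)
#   y_t | d_t ~ binomial(d_t, theta_y)    (reported deaths)
#   n_{t+1}  ~ Poisson(psi * (n_t - d_t)) (assumed population model)
import random
from math import sqrt

def poisson_approx(lam, rng):
    """Normal approximation to a Poisson draw (adequate for the large means here)."""
    return max(0, round(rng.gauss(lam, sqrt(lam))))

def binomial_approx(n, p, rng):
    """Normal approximation to a binomial draw, clipped into {0, ..., n}."""
    return min(n, max(0, round(rng.gauss(n * p, sqrt(n * p * (1 - p))))))

def simulate_cohort(n1, theta_d, theta_y, psi, T, rng):
    n, d, y = [n1], [], []
    for t in range(T):
        d.append(binomial_approx(n[t], theta_d, rng))   # true deaths in year t
        y.append(binomial_approx(d[t], theta_y, rng))   # reported deaths in year t
        if t < T - 1:                                   # next year's population at risk
            n.append(poisson_approx(psi * (n[t] - d[t]), rng))
    return n, d, y

rng = random.Random(1)
n, d, y = simulate_cohort(1_000_000, 0.025, 0.90, 0.95, 10, rng)
assert all(nt >= dt >= yt for nt, dt, yt in zip(n, d, y))
```

With these parameter values the population shrinks by a factor of roughly ψ(1 − θ_d) ≈ 0.926 per year, so n_10 lands near half a million, in line with the simulation study on a later slide.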
Prior distributions and parameter estimation
Y = {y_t, t = 1, ..., T}    N_c = {n_1, n_T}
D = {d_t, t = 1, ..., T}    N_o = {n_2, ..., n_{T−1}}
φ = {θ_y, θ_d, ψ}
p(φ, D, N_o | N_c, Y) = p(D, N_o, N_c, Y | φ) p(φ) / ∫ p(N_c, Y | φ) p(φ) dφ
                      ∝ p(D, N_o, N_c, Y | φ) p(φ)
Need to specify the prior distribution p(φ) for φ = {θ_y, θ_d, ψ}.
Beta and gamma prior distributions
Parameter spaces: θ_y ∈ [0, 1], θ_d ∈ [0, 1], ψ ∈ [0, ∞)
We use beta distributions for θ_y and θ_d and gamma distributions for ψ:
θ ~ beta(θ_0, w): E[θ] = θ_0, V[θ] = θ_0(1 − θ_0)/w
ψ ~ gamma(ψ_0, w): E[ψ] = ψ_0, V[ψ] = ψ_0/w
[Figure: beta prior densities p(θ) and gamma prior densities p(ψ) for several values of the precision parameter w.]
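For computation it helps to convert this mean/precision parameterization into standard shape parameters. A sketch, with one caveat: taking a + b = w gives the beta variance exactly as θ_0(1 − θ_0)/(w + 1), which the /w on the slide approximates for large w.

```python
# Convert (mean, precision) parameterizations to standard shape parameters.
def beta_shapes(theta0, w):
    # theta ~ beta(a, b) with a + b = w and mean a/(a + b) = theta0
    return theta0 * w, (1 - theta0) * w

def gamma_shape_rate(psi0, w):
    # psi ~ gamma(shape, rate) with mean shape/rate = psi0
    # and variance shape/rate^2 = psi0/w (exact)
    return psi0 * w, w

a, b = beta_shapes(0.10, 22)
assert abs(a / (a + b) - 0.10) < 1e-12              # E[theta] = theta0

shape, rate = gamma_shape_rate(0.95, 40)
assert abs(shape / rate - 0.95) < 1e-12             # E[psi] = psi0
assert abs(shape / rate**2 - 0.95 / 40) < 1e-12     # V[psi] = psi0 / w
```

Larger w concentrates both priors around their means, which is what the simulation study varies next.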
Simulation study
Simulated data: n_1 = 1,000,000, θ_d = 0.025, θ_y = 0.90, ψ = 0.95, giving n_10 = 499,516.
We center the priors for θ_d and ψ around their correct values, but with w ∈ {10, 20, 40, 80, 160, 320}.
Posterior distributions of θ_d, θ_y, ψ, {d_1, ..., d_10}, and {n_1, ..., n_10} are approximated using an MCMC algorithm.
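One ingredient of such an MCMC algorithm can be sketched: given the current imputed latent states D and N, beta priors on θ_y and θ_d give conjugate beta full conditionals (the rest of the sampler, which updates D and N, is not shown). The small data vectors below are made up for illustration.

```python
# Conjugate full conditionals for the reporting and death rates, given the
# currently imputed deaths d_t and populations n_t (one step of a Gibbs sampler).
import random

def sample_theta_y(a, b, d, y, rng):
    # theta_y | D, Y ~ beta(a + sum(y), b + sum(d) - sum(y))
    return rng.betavariate(a + sum(y), b + sum(d) - sum(y))

def sample_theta_d(a, b, n, d, rng):
    # theta_d | N, D ~ beta(a + sum(d), b + sum(n) - sum(d))
    return rng.betavariate(a + sum(d), b + sum(n) - sum(d))

rng = random.Random(1)
n = [1000, 900, 810]   # illustrative populations at risk
d = [50, 45, 40]       # illustrative true deaths
y = [45, 40, 36]       # illustrative reported deaths

draws = [sample_theta_d(1, 1, n, d, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
# Under a uniform beta(1, 1) prior, the posterior mean of theta_d is
# (1 + sum(d)) / (2 + sum(n)), and the Monte Carlo mean should be close to it.
assert abs(mean - (1 + sum(d)) / (2 + sum(n))) < 0.01
```

In the full algorithm these draws alternate with updates of the latent {d_t} and {n_t}, which is where the real computational work lies.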
Simulation results (unknown ψ)
[Figure: posterior distributions of θ_d (left) and of the number of deaths in each of years 1-10 (right).]
Simulation results (known ψ)
[Figure: posterior distributions of θ_d (left) and of the number of deaths in each of years 1-10 (right).]
Simulation results
[Figure: posterior distributions of θ_y under the two scenarios.]
Multiple age cohorts
n_{i,t} = population of cohort i at time t
d_{i,t} = deaths in cohort i at time t
y_{i,t} = reported deaths in cohort i at time t
θ_y = 0.90, ψ = 0.95,
θ_d = [ θ_1  θ_2  θ_3  ...
        θ_2  θ_3  θ_4  ...
        θ_3  θ_4  θ_5  ...
        ...              ]
so the death rate for cohort i at time t depends on age.
For estimation, we use priors centered around incorrect values:
E[θ_y] = 0.95, E[ψ] = 1.00, E[θ_{d,i,t}] = 0.11
Simulated data
Given n_{66,1}, ..., n_{75,1} and θ_d,
d_{i,t} ~ binomial(n_{i,t}, θ_{d[i+t−1]})
y_{i,t} ~ binomial(d_{i,t}, θ_y)
n_{i,t+1} ~ Poisson(ψ[n_{i,t} − d_{i,t}])
Priors were centered around incorrect parameter values.
[Figure: simulated populations (in millions) by age cohort (left) and age-specific death rates by age (right).]
Simulation results
[Figure: posterior distributions of the age-specific death rates, by age.]
Simulation results
[Figure: posterior distribution of the reporting rate θ_y.]
Summary
- Bayesian inference uses probability to represent uncertainty.
- Bayesian methods derived from Bayesian inference:
  - give stable estimates when the information in the data is low;
  - allow for estimation in large stochastic systems;
  - accommodate missing data and different data sources.
- Possibly a useful tool for combining sources of demographic information.