A primer on Bayesian statistics, with an application to mortality rate estimation
Peter Hoff, University of Washington

Outline
- Subjective probability
- Practical aspects
- Application to mortality rate estimation
- Summary

Probability does not exist
"The abandonment of superstitious beliefs about the existence of Phlogiston, the Cosmic Ether, Absolute Space and Time, ... or of Fairies and Witches, was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception. Probabilistic reasoning - always to be understood as subjective - merely stems from our being uncertain about something. It makes no difference whether the uncertainty relates to an unforeseeable future, or to an unnoticed past, or to a past doubtfully reported... The only relevant thing is uncertainty - the extent of our own knowledge and ignorance. The actual fact of whether or not the events considered are in some sense determined, or known by other people... is of no consequence." - de Finetti

The canonical coin example: Suppose I am about to flip a coin. Are there correct versions of Pr_me(heads)? Pr_you(heads)? Now it lands in my hand. I see it and you don't. Pr_me(heads) = 1 if I see heads, 0 if I see tails. Pr_you(heads) = ?

What happens in a coin toss? [Diagram: the physics of a coin toss, with the coin's angular momentum and linear momentum labeled.]

Subjective information
Now consider:
Pr(heads)
Pr(heads | linear momentum, angular momentum)
Pr(heads | linear momentum, angular momentum, knowledge of physics)
Pr_me(heads | coin has been flipped)
Pr_you(heads | coin has been flipped)
de Finetti and others argued that probability, if it is to have any meaning at all, is a measure of your information about an uncertain event. Information about an event may vary from person to person, so there is no correct or objective probability, only subjective probability.

Bayesian inference
Sample space: Y = all possible datasets, from which y is to be sampled.
Parameter space: Θ = all possible parameter values, from which we hope to identify the truth.
Bayesian learning begins with joint beliefs about y and θ, expressed in terms of probabilities:
1. For each θ ∈ Θ, our prior distribution p(θ) describes our belief that θ represents the true population characteristics.
2. For each θ ∈ Θ and y ∈ Y, our sampling model p(y | θ) describes our belief that y would be the outcome of our study if we knew θ to be true.

Bayesian updating
Once we obtain the data y, the last step is to update our beliefs about θ:
3. For each θ ∈ Θ, our posterior distribution p(θ | y) describes our belief that θ is the true value, having observed dataset y.
The posterior distribution is obtained from the prior distribution and sampling model via Bayes' rule:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int_\Theta p(y \mid \tilde\theta)\, p(\tilde\theta)\, d\tilde\theta}$$

Note 1: Bayes' rule does not tell us what our beliefs should be; it tells us how they should change after seeing new information.
Note 2: Bayes' rule is normative, not necessarily descriptive of actual learning.
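As a concrete illustration (mine, not from the talk), Bayes' rule can be applied numerically by discretizing Θ; the sketch below uses the binomial sampling model and beta(2, 20) prior from the slides that follow:

```python
import numpy as np
from scipy.stats import binom, beta

# Discretize Theta = [0, 1] and apply Bayes' rule numerically.
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

prior = beta.pdf(theta, 2, 20)  # p(theta): the beta(2, 20) prior used below
lik = binom.pmf(0, 20, theta)   # p(y | theta) with y = 0 infected out of n = 20

post = prior * lik              # numerator of Bayes' rule
post /= (post * dtheta).sum()   # denominator: integral over Theta

# E[theta | y] ~ 0.048, matching the exact beta(2, 40) posterior
print((theta * post * dtheta).sum())
```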

Bayesian inference and Bayesian methods
Bayesian inference is the process of optimal inductive learning via Bayes' rule. Bayesian methods are data analysis tools based on Bayesian inference.
Bayesian methods provide:
- parameter estimates with good statistical properties;
- parsimonious descriptions of observed data;
- predictions for missing data and forecasts of future data;
- a computational framework for model estimation, selection and validation.

Rare event estimation
θ = percentage of infected children in a school population
y = number infected in a sample of size 20
Θ = [0, 1], Y = {0, 1, ..., 20}
Sampling model: Y | θ ~ binomial(20, θ)
Prior distribution: θ ~ beta(2, 20)

[Figure: left, binomial(20, θ) distributions of the number infected in the sample for θ = 0.05, 0.10, 0.20; right, the beta(2, 20) prior density over the percentage infected in the population.]

Prior summaries: E[θ] = 0.10, mode[θ] = 0.05, Pr(θ < 0.10) = 0.64, Pr(0.05 < θ < 0.20) = 0.66.
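A quick numerical check of these prior summaries (my own sketch), using SciPy's standard beta(a, b) parameterization:

```python
from scipy.stats import beta

prior = beta(2, 20)
print(prior.mean())                       # 0.0909, which the slide rounds to E[theta] = 0.10
print((2 - 1) / (2 + 20 - 2))             # beta mode (a-1)/(a+b-2) = 0.05
print(prior.cdf(0.10))                    # Pr(theta < 0.10) ~ 0.64
print(prior.cdf(0.20) - prior.cdf(0.05))  # Pr(0.05 < theta < 0.20) ~ 0.66
```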

Posterior inference
Suppose we observe Y = 0. Then θ | {Y = 0} ~ beta(2, 40).

[Figure: the prior density p(θ) and posterior density p(θ | y) over the percentage infected in the population.]

Posterior summaries: E[θ | Y = 0] = 0.05, mode[θ | Y = 0] = 0.025, Pr(θ < 0.10 | Y = 0) = 0.93.
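The update behind this slide is beta-binomial conjugacy: a beta(a, b) prior with Y | θ ~ binomial(n, θ) gives a beta(a + y, b + n − y) posterior. A minimal sketch for y = 0, n = 20:

```python
from scipy.stats import beta

a, b, n, y = 2, 20, 20, 0
post = beta(a + y, b + n - y)  # conjugacy: posterior is beta(2, 40)
print(post.mean())             # E[theta | Y = 0] = 2/42 ~ 0.05
print(1 / 40)                  # posterior mode (a+y-1)/(a+b+n-2) = 0.025
print(post.cdf(0.10))          # Pr(theta < 0.10 | Y = 0) ~ 0.93
```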

Sensitivity analysis

$$E[\theta \mid Y = y] = \frac{a + y}{a + b + n} = \frac{n}{a + b + n}\,\bar y + \frac{a + b}{a + b + n}\cdot\frac{a}{a + b} = \frac{n}{w + n}\,\bar y + \frac{w}{w + n}\,\theta_0$$

where θ_0 = a/(a + b) is the prior mean and w = a + b acts as a prior sample size.

[Figure: contour plots over the prior mean θ_0 (0 to 0.5) and prior sample size w (5 to 25): left, contours of the posterior expectation; right, contours of a posterior probability.]
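A small sketch (mine) of this weighted-average form, which makes the sensitivity analysis easy to reproduce for any (θ_0, w):

```python
def posterior_mean(ybar, n, theta0, w):
    # Weighted average of the data mean ybar and the prior mean theta0,
    # with weights n/(w+n) and w/(w+n).
    return (n / (w + n)) * ybar + (w / (w + n)) * theta0

# y = 0 out of n = 20; vary the prior sample size w, holding theta0 = 0.10
for w in (5, 10, 20, 40):
    print(w, posterior_mean(0.0, 20, 0.10, w))
```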

Alternative approaches
Wald interval: ȳ ± 1.96 √(ȳ(1 − ȳ)/n)
- has correct asymptotic frequentist coverage;
- has about 80% frequentist coverage if n = 20 (depending on the true θ);
- has 0% coverage if y = 0 (unless θ = 0).
Adjusted Wald interval: θ̂ ± 1.96 √(θ̂(1 − θ̂)/n), where θ̂ = (n/(n + 4)) ȳ + (4/(n + 4)) · (1/2)
- has coverage probability much closer to the nominal level;
- is related to a Bayesian interval based on θ ~ beta(2, 2).
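A side-by-side sketch (my code) of the two intervals; the adjusted interval shrinks ȳ toward 1/2 by adding four pseudo-observations, as described above:

```python
import numpy as np

def wald(y, n, z=1.96):
    ybar = y / n
    se = np.sqrt(ybar * (1 - ybar) / n)
    return ybar - z * se, ybar + z * se

def adjusted_wald(y, n, z=1.96):
    # Shrink the sample proportion toward 1/2 with 4 pseudo-observations.
    that = (n / (n + 4)) * (y / n) + (4 / (n + 4)) * 0.5
    se = np.sqrt(that * (1 - that) / n)
    return that - z * se, that + z * se

print(wald(0, 20))           # degenerate: (0.0, 0.0), hence 0% coverage at y = 0
print(adjusted_wald(0, 20))  # a usable interval even when y = 0
```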

Mortality rate estimation (joint work with Jacob Markus)

cohort \ year   1         2         ...   T-1        T
0-5             y_{1,1}   y_{1,2}   ...   y_{1,T-1}  y_{1,T}
6-10            y_{2,1}   y_{2,2}   ...   y_{2,T-1}  y_{2,T}
11-15           y_{3,1}   y_{3,2}   ...   y_{3,T-1}  y_{3,T}

y_{i,t} = number of reported deaths for the ith cohort in year t. It is likely that y_{i,t} is an underestimate of the true number of deaths.

Data:
reported death counts: {y_{i,t} : i = 1, ..., m, t = 1, ..., T}
census counts: {n_{1,1}, ..., n_{m,1}} and {n_{1,T}, ..., n_{m,T}}

Based on these data, how do we describe our uncertainty about the true number of deaths? age-specific death rates? reporting rates?

A simple model for a single cohort
n_t is the number of people at risk at the beginning of year t;
d_t is the number of mortalities in year t;
y_t is the number of reported mortalities in year t;
so n_t ≥ d_t ≥ y_t.

y_t ~ binomial(d_t, θ_y)
d_t ~ binomial(n_t, θ_d)
n_t ~ p(n_t | n_{t-1}, d_{t-1}, ψ)

[Diagram: a graphical model over t = 1, ..., T in which each y_t depends on d_t, each d_t on n_t, and each n_t on (n_{t-1}, d_{t-1}).]
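A forward simulation of this single-cohort model (my sketch; I assume the Poisson population dynamics n_{t+1} ~ Poisson(ψ[n_t − d_t]) that appear later in the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
theta_d, theta_y, psi = 0.025, 0.90, 0.95
T = 10

ns = [1_000_000]  # n_1
ds, ys = [], []
for t in range(T):
    d = rng.binomial(ns[-1], theta_d)  # d_t ~ binomial(n_t, theta_d)
    y = rng.binomial(d, theta_y)       # y_t ~ binomial(d_t, theta_y)
    ds.append(d); ys.append(y)
    if t < T - 1:
        ns.append(rng.poisson(psi * (ns[-1] - d)))  # n_{t+1}

print(ns[0], ns[-1], ys)  # only the censuses n_1, n_T and the reports y_t are observed
```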

Prior distributions and parameter estimation
Y = {y_t : t = 1, ..., T}      (reported death counts, observed)
D = {d_t : t = 1, ..., T}      (true death counts, unobserved)
N_c = {n_1, n_T}               (census counts, observed)
N_o = {n_2, ..., n_{T-1}}      (intermediate population sizes, unobserved)
φ = {θ_y, θ_d, ψ}

$$p(\phi, D, N_o \mid N_c, Y) = \frac{p(D, N_o, N_c, Y \mid \phi)\, p(\phi)}{\int p(N_c, Y \mid \phi)\, p(\phi)\, d\phi} \propto p(D, N_o, N_c, Y \mid \phi)\, p(\phi)$$

We need to specify the prior distribution p(φ) for φ = {θ_y, θ_d, ψ}.
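MCMC for this posterior only needs the unnormalized joint density. A sketch (mine) of its log, again assuming the Poisson dynamics; the specific beta and gamma priors in the last lines are illustrative placeholders, not the talk's choices:

```python
import numpy as np
from scipy.stats import binom, poisson, beta, gamma

def log_joint(ns, ds, ys, theta_y, theta_d, psi):
    """log p(D, N_o, N_c, Y | phi) + log p(phi); ns has length T+1, ds and ys length T."""
    lp = binom.logpmf(ys, ds, theta_y).sum()                  # y_t | d_t
    lp += binom.logpmf(ds, ns[:-1], theta_d).sum()            # d_t | n_t
    lp += poisson.logpmf(ns[1:], psi * (ns[:-1] - ds)).sum()  # n_{t+1} | n_t, d_t (assumed)
    lp += beta.logpdf(theta_y, 2, 2) + beta.logpdf(theta_d, 2, 2)  # placeholder priors
    lp += gamma.logpdf(psi, 10, scale=0.1)                    # placeholder prior, mean 1
    return lp
```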

Beta and gamma prior distributions
Parameter spaces: θ_y ∈ [0, 1], θ_d ∈ [0, 1], ψ ∈ [0, ∞).
We use beta distributions for θ_y and θ_d and gamma distributions for ψ:
θ ~ beta(θ_0, w):  E[θ] = θ_0,  V[θ] = θ_0(1 − θ_0)/w
ψ ~ gamma(ψ_0, w):  E[ψ] = ψ_0,  V[ψ] = ψ_0/w

[Figure: example beta prior densities p(θ) (θ from 0 to 0.20) and gamma prior densities p(ψ) (ψ from 0.8 to 1.2).]
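This (mean, precision) parameterization can be mapped back to standard shape parameters; a sketch assuming a + b + 1 = w for the beta (so the variance formula above holds exactly) and shape ψ_0·w, rate w for the gamma:

```python
from scipy.stats import beta, gamma

def beta_mp(theta0, w):
    # With a + b + 1 = w: E[theta] = theta0 and V[theta] = theta0*(1 - theta0)/w.
    a, b = theta0 * (w - 1), (1 - theta0) * (w - 1)
    return beta(a, b)

def gamma_mp(psi0, w):
    # Shape psi0*w and rate w give E[psi] = psi0 and V[psi] = psi0/w.
    return gamma(psi0 * w, scale=1 / w)

p = beta_mp(0.10, 22)  # close to the beta(2, 20) prior used earlier
print(p.mean(), p.var())
```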

Simulation study
n_1 = 1,000,000, n_10 = 499,516
θ_d = 0.025, θ_y = 0.90, ψ = 0.95
We center the priors for θ_d and ψ around their correct values, but with w ∈ {10, 20, 40, 80, 160, 320}.
Posterior distributions of θ_d, θ_y, ψ, {d_1, ..., d_10} and {n_1, ..., n_10} are approximated using an MCMC algorithm.
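The talk does not spell out the MCMC algorithm; one convenient feature of this model is that, given the latent deaths and population sizes, the rate parameters have conjugate complete conditionals. A Gibbs-step sketch (my illustration, with placeholder beta(a, b) priors):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_thetas(ns, ds, ys, a=2.0, b=2.0):
    # Complete-conditional draws given the latent d_t and n_t,
    # using beta(a, b) priors for both the reporting and death rates.
    theta_y = rng.beta(a + ys.sum(), b + (ds - ys).sum())       # from y_t ~ bin(d_t, theta_y)
    theta_d = rng.beta(a + ds.sum(), b + (ns[:-1] - ds).sum())  # from d_t ~ bin(n_t, theta_d)
    return theta_y, theta_d
```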

Simulation results, unknown ψ:
[Figure: posterior distribution of θ_d, and posterior intervals for the number of deaths (roughly 15,000-35,000) in years 1 through 10.]

Simulation results, known ψ:
[Figure: posterior distribution of θ_d, and posterior intervals for the number of deaths (roughly 15,000-35,000) in years 1 through 10.]

Simulation results:
[Figure: posterior distributions of θ_y, two panels, axis from 0.65 to 0.95.]

Multiple age cohorts
n_{i,t} = population of cohort i at time t
d_{i,t} = deaths in cohort i at time t
y_{i,t} = reported deaths in cohort i at time t

θ_y = 0.90, ψ = 0.95,

$$\theta_d = \begin{pmatrix} \theta_1 & \theta_2 & \theta_3 & \cdots \\ \theta_2 & \theta_3 & \theta_4 & \cdots \\ \theta_3 & \theta_4 & \theta_5 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

so the death rate for cohort i in year t depends only on age, i.e., on i + t − 1.

For estimation, we use priors centered around incorrect values:
E[θ_y] = 0.95, E[ψ] = 1.00, E[θ_{d,i,t}] = 0.11

Simulated data
Given n_{66,1}, ..., n_{75,1} and θ_d:
d_{i,t} ~ binomial(n_{i,t}, θ_{d[i+t-1]})
y_{i,t} ~ binomial(d_{i,t}, θ_y)
n_{i,t+1} ~ Poisson(ψ [n_{i,t} − d_{i,t}])
Priors were centered around incorrect parameter values.

[Figure: simulated cohort populations (in millions) for cohorts 66 through 75, and age-specific death rates over ages 70 to 80.]
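A sketch (mine) of this multi-cohort generator, with the age-specific rate selected by the cohort-plus-time index as in the θ_d matrix above; the θ values and population sizes are placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.linspace(0.02, 0.30, 15)  # placeholder age-specific rates theta_1, theta_2, ...
theta_y, psi, m, T = 0.90, 0.95, 10, 6

n = rng.integers(1_400_000, 2_000_000, size=m)    # n_{i,1} for cohorts i = 1, ..., m
for t in range(T):
    d = rng.binomial(n, theta[np.arange(m) + t])  # d_{i,t} ~ bin(n_{i,t}, theta_{i+t-1})
    y = rng.binomial(d, theta_y)                  # y_{i,t} ~ bin(d_{i,t}, theta_y)
    n = rng.poisson(psi * (n - d))                # n_{i,t+1} ~ Poisson(psi (n_{i,t} - d_{i,t}))
```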

Simulation results:
[Figure: posterior estimates of the age-specific death rates, ages 70 to 85, rates roughly 0.05 to 0.30.]

Simulation results:
[Figure: posterior distribution of θ_y, axis from 0.65 to 0.95.]

Summary
Bayesian inference uses probability to represent uncertainty.
Bayesian methods derived from Bayesian inference:
- give stable estimates when the information in the data is limited;
- allow for estimation in large stochastic systems;
- accommodate missing data and multiple data sources.
They are possibly a useful tool for combining sources of demographic information.