Bayesian statistics, simulation and software


Module 3: Bayesian principle, binomial model and conjugate priors
Department of Mathematical Sciences, Aalborg University

Motivating example: Spelling correction (adapted from Bayesian Data Analysis by Gelman et al.)

Problem: Someone types "radom".
Question: What did they mean to type? "Random"?

Ingredients:
- Data x: the observed word "radom".
- Parameter of interest θ: the intended word.

Comments: To solve this we need background information on which words are usually typed, and an idea of how words are typically mistyped.

Bayesian idea

Data/observation model: Conditional on θ (here the correct word), the data x (here the typed word) is distributed according to a density π(x | θ) = L(θ; x), the likelihood.

Prior: Prior knowledge about θ (i.e. before collecting data) is summarized by a density π(θ), the prior.

Posterior: The updated knowledge about θ after collecting data is the conditional distribution of θ given the data x:

π(θ | x) = π(x | θ) π(θ) / π(x) ∝ π(x | θ) π(θ)

(posterior ∝ likelihood × prior).

Example: Prior

Google provides the following prior probabilities for three candidate words:

θ         π(θ)
random    7.60 × 10⁻⁵
radon     6.05 × 10⁻⁶
radom     3.12 × 10⁻⁷

Comments:
- The relatively high probability for the word "radom" is perhaps surprising: Radom is the name of a Polish air show and the nickname of a Polish handgun.
- In the context of writing a scientific report these prior probabilities would probably look different.

Example: Likelihood

Google's model of spelling and typing errors provides the following conditional probabilities/likelihoods:

θ         π(x = "radom" | θ)
random    0.00193
radon     0.000143
radom     0.975

Comments:
- This is not a probability distribution! It is a likelihood over θ for fixed x, so the column need not sum to 1.
- If one in fact intends to write "radom", this actually happens in 97.5% of cases.
- If one intends to write either "random" or "radon", it is rarely misspelled as "radom".

Example: Posterior

Combining the prior and likelihood we obtain the posterior probabilities:

π(θ | x) = π(x | θ) π(θ) / π(x) ∝ π(x | θ) π(θ)

θ         π(x = "radom" | θ) π(θ)    π(θ | x = "radom")
random    1.47 × 10⁻⁷                0.325
radon     8.65 × 10⁻¹⁰               0.002
radom     3.04 × 10⁻⁷                0.673

Conclusion: With the given prior and likelihood the word "radom" is about twice as likely as "random".

Criticism:
- The posterior probability for "radom" may seem too high. Is the likelihood or the prior to blame, or both?
- The likelihood is perhaps OK in this case.
- The prior depends on context and hence might be wrong.
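As a sanity check, the posterior table can be reproduced with a few lines of Python (a minimal sketch; as on the slide, the normalization is over the three candidate words only):

```python
# Priors and likelihoods are the Google-derived numbers quoted above.
priors = {"random": 7.60e-5, "radon": 6.05e-6, "radom": 3.12e-7}
likelihoods = {"random": 0.00193, "radon": 0.000143, "radom": 0.975}  # π(x="radom" | θ)

# Unnormalized posterior: likelihood × prior, for each candidate word θ.
unnorm = {w: likelihoods[w] * priors[w] for w in priors}
evidence = sum(unnorm.values())  # π(x), summing over the three candidates only

posterior = {w: u / evidence for w, u in unnorm.items()}
for w, p in posterior.items():
    print(f"{w}: {p:.3f}")  # random: 0.325, radon: 0.002, radom: 0.673
```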

Binomial example

Data model: Binomial, X ~ B(n, p) with n known:

π(x | p) = (n choose x) p^x (1 − p)^(n−x), x = 0, 1, ..., n, 0 ≤ p ≤ 1.

Prior: Beta distribution, p ~ Be(α, β), where we have to specify the parameters α > 0 and β > 0. The Beta distribution has density/pdf

π(p) = Γ(α + β) / (Γ(α) Γ(β)) · p^(α−1) (1 − p)^(β−1) for 0 ≤ p ≤ 1,

and π(p) = 0 otherwise.

If α = β = 1, then π(p) = 1 for 0 ≤ p ≤ 1 (the uniform distribution).

Beta distribution: Examples

[Figure: Beta densities for four parameter choices: (α=1, β=1), (α=0.5, β=0.5), (α=2, β=4), (α=2, β=0.5).]

Mean: E[p] = α / (α + β). Variance: Var[p] = αβ / ((α + β)² (α + β + 1)).
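The four panels are easy to reproduce; a sketch using scipy and matplotlib, with the parameter choices taken from the slide:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

params = [(1, 1), (0.5, 0.5), (2, 4), (2, 0.5)]  # (α, β) pairs from the figure
p = np.linspace(0.001, 0.999, 500)  # avoid the endpoints, where some densities blow up

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, (a, b) in zip(axes.flat, params):
    ax.plot(p, beta.pdf(p, a, b))
    ax.set_title(f"α={a}, β={b}")
fig.tight_layout()
plt.show()

# The mean and variance formulas above can be checked directly, e.g. for α=2, β=4:
a, b = 2, 4
print(a / (a + b))                           # 0.3333...
print(a * b / ((a + b) ** 2 * (a + b + 1)))  # 0.0317...
```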

Binomial example cont.

Data model: X ~ B(n, p).
Prior: p ~ Be(α, β), that is

π(p) = Γ(α + β) / (Γ(α) Γ(β)) · p^(α−1) (1 − p)^(β−1) for 0 ≤ p ≤ 1, and 0 otherwise.

Posterior:

π(p | x) ∝ π(x | p) π(p)
         = (n choose x) p^x (1 − p)^(n−x) · Γ(α + β) / (Γ(α) Γ(β)) · p^(α−1) (1 − p)^(β−1)
         ∝ p^(x+α−1) (1 − p)^(n−x+β−1),

which is the kernel of the Be(x + α, n − x + β) density.
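The conjugate update is one line of code. A sketch that verifies the closed form against brute-force numerical normalization, using hypothetical numbers n=20, x=13, α=β=2 (not from the slides):

```python
import numpy as np
from scipy.stats import beta, binom

n, x = 20, 13        # hypothetical data: 13 successes in 20 trials
a, b = 2.0, 2.0      # Be(α, β) prior parameters

# Conjugate update: the posterior is Be(x + α, n − x + β).
posterior = beta(x + a, n - x + b)

# Brute-force check: normalize likelihood × prior on a fine grid.
p = np.linspace(0.001, 0.999, 2000)
unnorm = binom.pmf(x, n, p) * beta.pdf(p, a, b)
numeric = unnorm / (unnorm.sum() * (p[1] - p[0]))  # Riemann-sum normalization

print(np.max(np.abs(numeric - posterior.pdf(p))))  # ≈ 0: the two densities agree
```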

Conjugate priors

In the binomial example both the prior and the posterior were beta distributions. Very convenient! We say that the beta distribution is conjugate.

Definition (conjugate priors): Let π(x | θ) be the data model. A class Π of prior distributions for θ is said to be conjugate for π(x | θ) if

π(θ | x) ∝ π(x | θ) π(θ) ∈ Π whenever π(θ) ∈ Π.

That is, prior and posterior are in the same class of distributions.

Notice: Π should be a class of tractable distributions for this to be useful.

Posterior mean & variance

Posterior: π(p | x) = Be(x + α, n − x + β).

Posterior mean:

E[p | x] = (x + α) / ((x + α) + (n − x + β)) = (x + α) / (α + β + n).

If n ≫ max{α, β}, then E[p | x] ≈ x/n (the natural unbiased estimate).

Posterior variance:

Var[p | x] = (x + α)(n − x + β) / ((α + β + n)² (α + β + n + 1))
           = E[p | x] (1 − E[p | x]) / (α + β + n + 1) → 0 as n → ∞.
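A quick numerical check of these formulas against scipy's Beta distribution (a sketch, using n=980, x=543 and a uniform prior α=β=1, anticipating the example on the next slide):

```python
from scipy.stats import beta

n, x, a, b = 980, 543, 1.0, 1.0
posterior = beta(x + a, n - x + b)  # Be(x + α, n − x + β)

mean = (x + a) / (a + b + n)
var = (x + a) * (n - x + b) / ((a + b + n) ** 2 * (a + b + n + 1))

print(mean, posterior.mean())  # both ≈ 0.554
print(var, posterior.var())    # both ≈ 0.000251
```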

Example: Placenta Previa (PP)

Question: Is the sex ratio different for PP births compared to normal births?

Prior knowledge: 51.5% of newborns are boys.

Data: Of n = 980 cases of PP, x = 543 were boys (543/980 = 55.4%).

Data model: X ~ B(n, p).
Prior: p ~ Be(α, β).
Posterior: π(p | x) = Be(x + α, n − x + β) = Be(543 + α, 437 + β).

How should we choose α and β, and what difference does it make?

Placenta Previa: Beta priors and posteriors

[Figure: prior and posterior densities for six increasingly concentrated prior choices: (α=1, β=1), (α=2.425, β=2.575), (α=4.85, β=5.15), (α=9.7, β=10.3), (α=48.5, β=51.5), (α=97, β=103).]

Conclusion (see the notes or Gelman et al. (2014))

For the different choices of prior, the 95% posterior intervals (as defined later) for p do not contain 51.5%, which indicates that the probability of a male birth given placenta previa is higher than in the general population.
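This is easy to check numerically; a sketch computing equal-tailed 95% posterior intervals for the six prior choices from the figure, so each can be compared with 51.5%:

```python
from scipy.stats import beta

n, x = 980, 543  # Placenta Previa data: 543 boys out of 980 births
priors = [(1, 1), (2.425, 2.575), (4.85, 5.15), (9.7, 10.3), (48.5, 51.5), (97, 103)]

for a, b in priors:
    posterior = beta(x + a, n - x + b)   # Be(543 + α, 437 + β)
    lo, hi = posterior.interval(0.95)    # equal-tailed 95% posterior interval
    print(f"α={a}, β={b}: [{lo:.3f}, {hi:.3f}]")
```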