Bayesian statistics, simulation and software

Module 4: Normal model, improper and conjugate priors
Department of Mathematical Sciences, Aalborg University

Another example: normal sample with known precision

Heights of some Copenhageners in 1995. [Figure: histogram of the observed heights, roughly 140-210 cm.]

Assume: Heights are independent and normal: $X_i \sim N(\mu, \tau)$, $i = 1, \ldots, n$, where the second argument $\tau$ denotes the precision (inverse variance).

For now: Assume the precision $\tau$ is known.

Bayesian idea: Illustration

Data model: $X \sim N(\mu, 0.01)$ (i.e. population sd = 10).

Prior: We believe that the population mean is most likely between 160 cm and 200 cm: $\pi(\mu) = N(180, 0.01)$, i.e. $160 \le \mu \le 200$ with prior probability 95%.

Posterior: After observing a number of heights, our knowledge about $\mu$ is updated and summarised by the posterior. [Figure: prior and posterior densities for $\mu$ over 140-220 cm; the posterior sits below the prior mean. Is the sample mostly females?]

Normal example: One (!) observation

Data model: $X \sim N(\mu, \tau)$. Assume the precision $\tau$ is known; interest is in the unknown mean $\mu$.

Prior: The prior for $\mu$ is specified as $\mu \sim N(\mu_0, \tau_0)$. [Figure: prior density centred at $\mu_0$; $\mu$ lies within one sd $= 1/\sqrt{\tau_0}$ of $\mu_0$ with prior probability 68%.]

Normal example: Data density

Data: One observation, $X = x$, from $N(\mu, \tau)$:
$$\pi(x \mid \mu) = \sqrt{\tfrac{\tau}{2\pi}} \exp\!\left(-\tfrac{1}{2}\tau(x-\mu)^2\right) = \sqrt{\tfrac{\tau}{2\pi}} \exp\!\left(-\tfrac{1}{2}\tau x^2 - \tfrac{1}{2}\tau\mu^2 + \tau\mu x\right) \propto \exp\!\left(-\tfrac{1}{2}\tau x^2 + \tau\mu x\right),$$
where proportionality ($\propto$) refers to considering $\mu$ as fixed whilst $x$ is the argument of this conditional density. Notice the pattern inside the last exponential.

If we instead pay attention to the likelihood, i.e. consider $x$ as fixed, we get
$$L(\mu; x) \propto \exp\!\left(-\tfrac{1}{2}\tau\mu^2 + \tau\mu x\right).$$

Normal example: Posterior density

Posterior $\propto$ Likelihood $\times$ Prior:
$$\pi(\mu \mid x) \propto L(\mu; x)\,\pi(\mu) = L(\mu; x)\,\sqrt{\tfrac{\tau_0}{2\pi}} \exp\!\left(-\tfrac{1}{2}\tau_0(\mu-\mu_0)^2\right) \propto \exp\!\left(-\tfrac{1}{2}\tau\mu^2 + \tau x\mu - \tfrac{1}{2}\tau_0\mu^2 + \tau_0\mu_0\mu\right)$$
$$= \exp\!\left(-\tfrac{1}{2}(\tau+\tau_0)\mu^2 + (\tau x + \tau_0\mu_0)\mu\right) \propto N\!\left(\frac{\tau x + \tau_0\mu_0}{\tau+\tau_0},\; \tau+\tau_0\right).$$

Notice: Both the prior and the posterior for $\mu$ are normal (conjugacy), and posterior precision = prior precision + likelihood precision.
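As a sanity check on this algebra, here is a minimal Python sketch (the function name posterior_normal_mean and the observation x = 165 are my own illustration, not from the course software) computing the posterior mean and precision for one observation:

```python
import math

def posterior_normal_mean(x, tau, mu0, tau0):
    """Posterior of mu for one observation x ~ N(mu, 1/tau), with the
    precision tau known and prior mu ~ N(mu0, 1/tau0).
    Returns (posterior mean, posterior precision)."""
    tau1 = tau + tau0                      # precisions add
    mu1 = (tau * x + tau0 * mu0) / tau1    # precision-weighted average
    return mu1, tau1

# Height illustration: prior N(180, 0.01) (sd 10), one observation x = 165.
mu1, tau1 = posterior_normal_mean(165.0, tau=0.01, mu0=180.0, tau0=0.01)
print(mu1, tau1)              # 172.5 0.02
print(1 / math.sqrt(tau1))    # posterior sd approx 7.07 < prior sd 10
```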

Normal example: Posterior mean & variance

The posterior: $\pi(\mu \mid x) = N\!\left(\frac{\tau x + \tau_0\mu_0}{\tau+\tau_0},\; \tau+\tau_0\right)$.

Posterior expectation:
$$\mu_1 = E[\mu \mid x] = \frac{\tau x + \tau_0\mu_0}{\tau+\tau_0} = \frac{\tau}{\tau+\tau_0}\,x + \frac{\tau_0}{\tau+\tau_0}\,\mu_0,$$
a weighted average of the prior mean and the observation $x$. [Figure: $\mu_1$ lies between $\mu_0$ and $x$.]

The posterior variance is smaller than the prior variance:
$$\mathrm{Var}[\mu \mid x] = \frac{1}{\tau+\tau_0} < \frac{1}{\tau_0} = \mathrm{Var}(\mu).$$

Posterior as prior, or updating beliefs

General setup: We are interested in the parameter $\theta$. Data model: $\pi(x \mid \theta)$. Prior: $\pi(\theta)$.

Data: First observation $x_1 \sim \pi(x_1 \mid \theta)$. Posterior: $\pi(\theta \mid x_1) \propto \pi(x_1 \mid \theta)\,\pi(\theta)$.

Assume we have a second observation $x_2 \sim \pi(x_2 \mid \theta)$ which is conditionally independent of $x_1$ given $\theta$. Then the new posterior is
$$\pi(\theta \mid x_1, x_2) \propto \pi(x_1, x_2 \mid \theta)\,\pi(\theta) = \pi(x_1 \mid \theta)\,\pi(x_2 \mid \theta)\,\pi(\theta) \propto \underbrace{\pi(x_2 \mid \theta)}_{\text{likelihood}}\;\underbrace{\pi(\theta \mid x_1)}_{\text{new prior}}.$$

Notice: The posterior after observing $x_1$ is the prior before observing $x_2$.

Independent normal case

Posterior mean and precision after one observation $x_1$:
$$\mu_1 = \frac{x_1\tau + \mu_0\tau_0}{\tau+\tau_0} \quad\text{and}\quad \tau_1 = \tau+\tau_0.$$

Next, $\mu_1$ and $\tau_1$ are the prior mean and precision before observing $x_2$. Then the posterior mean and precision after observing (conditionally independent) $x_1$ and $x_2$ are
$$\mu_2 = E[\mu \mid x_1, x_2] = \frac{x_2\tau + \mu_1\tau_1}{\tau+\tau_1} = \frac{x_2\tau + x_1\tau + \mu_0\tau_0}{\tau+\tau+\tau_0} = \frac{(x_1+x_2)\tau + \mu_0\tau_0}{2\tau+\tau_0},$$
$$\tau_2 = \frac{1}{\mathrm{Var}[\mu \mid x_1, x_2]} = \tau+\tau_1 = 2\tau+\tau_0.$$
This is easily generalised, as checked in the sketch below.
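A small numerical check of the sequential update, reusing the hypothetical posterior_normal_mean sketch from above (so that block is assumed to have been run first): updating on $x_1$ and then on $x_2$ must match the closed-form two-observation expressions.

```python
mu0, tau0, tau = 180.0, 0.01, 0.01
x1, x2 = 165.0, 171.0   # made-up observations

# Sequential: the posterior after x1 is the prior before x2.
mu1, tau1 = posterior_normal_mean(x1, tau, mu0, tau0)
mu2, tau2 = posterior_normal_mean(x2, tau, mu1, tau1)

# Closed form from this slide.
mu2_direct = ((x1 + x2) * tau + mu0 * tau0) / (2 * tau + tau0)
assert abs(mu2 - mu2_direct) < 1e-9
assert abs(tau2 - (2 * tau + tau0)) < 1e-9
print(mu2, tau2)   # 172.0 0.03
```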

Many independent normal observations

Assume: $X_1, X_2, \ldots, X_n \overset{iid}{\sim} N(\mu, \tau)$, with $\tau > 0$ known, and prior $\pi(\mu) = N(\mu_0, \tau_0)$.

The posterior is then
$$\pi(\mu \mid x_1, x_2, \ldots, x_n) = N\!\left(\frac{\tau\sum_i x_i + \tau_0\mu_0}{n\tau+\tau_0},\; n\tau+\tau_0\right).$$

Posterior mean: Sanity check

Does the posterior mean make sense? Yes, because
$$\mu_n := E[\mu \mid x_1, \ldots, x_n] = \frac{\tau\sum_{i=1}^n x_i + \tau_0\mu_0}{n\tau+\tau_0} = \frac{n\tau}{n\tau+\tau_0}\,\bar{x} + \frac{\tau_0}{n\tau+\tau_0}\,\mu_0$$
is a weighted average of the sample average $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ and the prior mean $\mu_0$, so that for $n$ large, $\mu_n \approx \bar{x}$, and the choice of $\mu_0$ and $\tau_0$ is then of little importance.

NB: The posterior precision $\tau_n = n\tau + \tau_0$ grows with $n$, i.e. our knowledge becomes ever more precise as $n$ increases.
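A short simulation sketch (simulated heights, not the actual Copenhagen data) illustrating that the posterior mean $\mu_n$ approaches the sample average as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, tau0, tau = 180.0, 0.01, 0.01   # prior mean/precision, known data precision
x = rng.normal(170.0, 1 / np.sqrt(tau), size=2627)   # simulated heights, sd = 10

for n in (1, 10, 100, 2627):
    tau_n = n * tau + tau0
    mu_n = (tau * x[:n].sum() + tau0 * mu0) / tau_n
    print(f"n={n:5d}  posterior mean={mu_n:7.2f}  sample mean={x[:n].mean():7.2f}")
# The posterior mean starts near the prior mean 180 and converges to x-bar.
```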

Heights in Copenhagen: Prior

[Figure: prior density $N(\mu_0 = 180, \tau_0 = 0.01)$ over 150-200 cm.]

Heights in Copenhagen: Posterior

[Figures, one per slide: prior $N(180, 0.01)$ together with the posterior ($\tau = 0.01$) and the data average, for $n = 1$, $10$, $100$ and $2627$ observations; the posterior narrows and moves towards the data average as $n$ grows.]

How to summarise the posterior $\pi(\theta \mid x)$?

The posterior is usually summarised using one or more of the following:
- A plot of the posterior density $\pi(\theta \mid x)$ (see the previous slides).
- Posterior mean and variance/precision.
- Credible intervals (see the next slide).
- The maximum a posteriori (MAP) estimate, $\mathrm{MAP}(\theta) = \arg\max_\theta \pi(\theta \mid x)$.

Credible intervals

Methods for defining a suitable (e.g. 95%) credible interval/region for a one-dimensional (sub-)parameter $\theta$ include:
- Highest posterior density interval (HPDI): the narrowest region containing $\theta$ with 95% posterior probability.
- Central posterior interval (CPI): the interval between the 2.5% and 97.5% quantiles of the posterior distribution of $\theta$.

In the normal example, $\mu \mid x_1, \ldots, x_n \sim N(\mu_n, \tau_n)$, so the 95% HPDI (= CPI, by symmetry) for $\mu$ is
$$\mu_n \pm 1.96\,\frac{1}{\sqrt{\tau_n}}.$$
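For the normal posterior this interval is available in closed form; a minimal sketch (hpdi_normal is a hypothetical helper name):

```python
import math

def hpdi_normal(mu_n, tau_n):
    """95% HPDI (= CPI, since the normal posterior is symmetric and
    unimodal): mu_n +/- 1.96 / sqrt(tau_n)."""
    half = 1.96 / math.sqrt(tau_n)
    return mu_n - half, mu_n + half

# E.g. the one-observation posterior N(172.5, 0.02) from earlier:
print(hpdi_normal(172.5, 0.02))   # approx (158.6, 186.4)
```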

Compared to the classical confidence interval

The classical 95% confidence interval for $\mu$ is
$$\bar{x} \pm 1.96\,\frac{1}{\sqrt{n\tau}}.$$
For the CPI: assuming the prior precision is $\tau_0 = 0$, i.e. an infinite prior variance (an improper prior, but the posterior is well-defined as a limiting case), we get $\mu_n = \bar{x}$ and $\tau_n = n\tau$, i.e. the same interval as the classical confidence interval, but with a different interpretation!
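Numerically, the credible interval indeed approaches the classical interval as the prior precision shrinks to zero; a sketch with made-up summary numbers:

```python
import math

n, tau, xbar, mu0 = 100, 0.01, 171.3, 180.0   # made-up sample summary and prior mean

for tau0 in (0.01, 1e-4, 1e-8):               # prior precision -> 0
    tau_n = n * tau + tau0
    mu_n = (n * tau * xbar + tau0 * mu0) / tau_n
    print(tau0, mu_n - 1.96 / math.sqrt(tau_n), mu_n + 1.96 / math.sqrt(tau_n))

# Classical 95% CI: xbar +/- 1.96 / sqrt(n * tau); the CPIs above converge to it.
print(xbar - 1.96 / math.sqrt(n * tau), xbar + 1.96 / math.sqrt(n * tau))
```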

Conjugate priors

In this example, both the prior and the posterior were normal distributions. This is very convenient, and we say that the prior and posterior are conjugate distributions.

Definition (conjugate priors): Let $\pi(x \mid \theta)$ be the data model. A class $\Pi$ of prior distributions for $\theta$ is said to be conjugate for $\pi(x \mid \theta)$ if
$$\pi(\theta \mid x) \propto \pi(x \mid \theta)\,\pi(\theta) \in \Pi \quad\text{whenever}\quad \pi(\theta) \in \Pi,$$
that is, the prior and posterior are in the same class of distributions.

Notice: $\Pi$ should be a class of tractable distributions for this to be useful.

Improper prior

If we have no prior knowledge, we may (perhaps) be tempted to use a flat prior, i.e. $\pi(\theta) \equiv k$ for a positive constant $k$. If $\theta \in \mathbb{R}$, this is an example of an improper prior, because
$$\int \pi(\theta)\,d\theta = \int k\,d\theta = \infty.$$
This is perhaps problematic, but need not be considered an issue if the posterior is proper, i.e. if
$$\int \pi(\theta \mid x)\,d\theta \propto \int \pi(x \mid \theta)\,\pi(\theta)\,d\theta < \infty.$$

Notice: If $\pi(\theta) \propto 1$, then the MAP estimator equals the maximum likelihood estimator.
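The final remark follows in one line: with a flat prior the posterior is proportional to the likelihood, so the two have the same maximiser:
$$\pi(\theta \mid x) \propto \pi(x \mid \theta)\,\pi(\theta) \propto L(\theta; x) \quad\Longrightarrow\quad \arg\max_\theta \pi(\theta \mid x) = \arg\max_\theta L(\theta; x).$$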

Normal example: Unknown precision, known mean

Data model: $X_1, X_2, \ldots, X_n \overset{iid}{\sim} N(\mu, \tau)$:
$$\pi(x \mid \tau) = \left(\frac{\tau}{2\pi}\right)^{n/2} \exp\!\left(-\frac{1}{2}\tau\sum_{i=1}^n (x_i-\mu)^2\right).$$

Prior: Gamma distribution, $\pi(\tau) = \mathrm{Gamma}(\alpha, \beta)$, that is
$$\pi(\tau) = \frac{1}{\beta^\alpha\,\Gamma(\alpha)}\,\tau^{\alpha-1}\exp\!\left(-\frac{\tau}{\beta}\right), \quad \tau > 0,$$
with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$ (see the next slide). Properties: $E[\tau] = \alpha\beta$, $\mathrm{Var}[\tau] = \alpha\beta^2$.

Gamma distribution

[Figure: gamma densities on $(0, 15)$ for $(\alpha, \beta) = (1, 2)$, $(2, 2)$, $(5, 0.5)$ and $(9, 1)$.]
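A quick sketch verifying the stated moments against SciPy's gamma distribution, whose shape/scale parameterisation matches the density above:

```python
from scipy import stats

alpha, beta = 2.0, 2.0
# Density: tau^(alpha-1) exp(-tau/beta) / (beta^alpha * Gamma(alpha))
prior = stats.gamma(a=alpha, scale=beta)
print(prior.mean(), alpha * beta)       # 4.0 4.0
print(prior.var(), alpha * beta ** 2)   # 8.0 8.0
```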

Normal example: Posterior precision

Data model: $X_1, X_2, \ldots, X_n \overset{iid}{\sim} N(\mu, \tau)$. Prior: $\pi(\tau) = \mathrm{Gamma}(\alpha, \beta)$.

Posterior: An easy calculation gives
$$\pi(\tau \mid x) = \mathrm{Gamma}\!\left(\frac{n}{2}+\alpha,\; \left\{\frac{1}{2}\sum_{i=1}^n (x_i-\mu)^2 + \frac{1}{\beta}\right\}^{-1}\right).$$

Posterior mean and variance:
$$E[\tau \mid x] = \frac{\frac{n}{2}+\alpha}{\frac{1}{2}\sum_{i=1}^n (x_i-\mu)^2 + \frac{1}{\beta}}, \qquad \mathrm{Var}[\tau \mid x] = \frac{\frac{n}{2}+\alpha}{\left(\frac{1}{2}\sum_{i=1}^n (x_i-\mu)^2 + \frac{1}{\beta}\right)^2}.$$

For large $n$,
$$E[\tau \mid x] \approx \frac{1}{\hat\sigma^2}, \qquad \text{where } \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i-\mu)^2$$
is the maximum likelihood estimate of the variance.
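A minimal sketch of this conjugate update (posterior_precision is my own helper name; the data are simulated with true sd 10, i.e. true precision 0.01):

```python
import numpy as np

def posterior_precision(x, mu, alpha, beta):
    """Posterior of tau for x_i ~ N(mu, 1/tau) with mu known and prior
    tau ~ Gamma(alpha, beta) (shape/scale as on the slides).
    Returns the posterior shape and scale."""
    x = np.asarray(x)
    shape = len(x) / 2 + alpha
    scale = 1 / (0.5 * np.sum((x - mu) ** 2) + 1 / beta)
    return shape, scale

rng = np.random.default_rng(0)
x = rng.normal(170.0, 10.0, size=500)        # true precision 0.01
a_n, b_n = posterior_precision(x, 170.0, alpha=0.1, beta=0.1)
print(a_n * b_n)                             # posterior mean of tau
print(1 / np.mean((x - 170.0) ** 2))         # MLE 1/sigma-hat^2, close for large n
```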

Known mean: Priors and posteriors

[Figure: four panels ($n = 10$, $25$, $100$, $2627$), each showing the $\mathrm{Gamma}(\alpha = 0.1, \beta = 0.1)$ prior, the posterior for $\tau$, and the maximum likelihood estimate $\hat\sigma^2$ marked; the posterior concentrates as $n$ grows.]