
The Jeffreys Prior
Yingbo Li
Clemson University
MATH 9810

Sir Harold Jeffreys (1891-1989)
English mathematician, statistician, geophysicist, and astronomer. His book Theory of Probability, which first appeared in 1939, played an important role in the revival of the Bayesian view of probability.

Information Matrix
Suppose data $X$ has density $f(x \mid \theta)$ which is twice differentiable in the coordinates of the unknown parameter $\theta = (\theta_1, \ldots, \theta_p)$.
Expected Fisher Information: the $p \times p$ matrix $I$ with $(j, k)$ entry
$$I_{j,k}(\theta) = -E_{X \mid \theta}\left[ \frac{\partial^2}{\partial \theta_j \, \partial \theta_k} \log f(X \mid \theta) \right].$$
If the observations are independent,
$$I_{j,k}(\theta) = -\sum_{i=1}^{n} E_{X_i \mid \theta}\left[ \frac{\partial^2}{\partial \theta_j \, \partial \theta_k} \log f_i(X_i \mid \theta) \right],$$
and if they are i.i.d.,
$$I_{j,k}(\theta) = -n\, E_{X_1 \mid \theta}\left[ \frac{\partial^2}{\partial \theta_j \, \partial \theta_k} \log f(X_1 \mid \theta) \right].$$
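To make the definition concrete, here is a minimal symbolic sketch (not from the slides) that recovers the Bernoulli Fisher information used later in the talk:

```python
# A minimal sketch, not from the slides: symbolic computation of the
# expected Fisher information for one Bernoulli(theta) observation.
import sympy as sp

theta, x = sp.symbols("theta x")

# log f(x | theta) for a Bernoulli observation, x in {0, 1}
log_f = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)

# second derivative of the log density with respect to theta
d2 = sp.diff(log_f, theta, 2)

# I(theta) = -E[d2]; d2 is linear in x, so substituting E[X] = theta
# takes the expectation exactly
fisher = sp.simplify(-d2.subs(x, theta))
print(fisher)  # 1/(theta*(1 - theta)), possibly in an equivalent form
```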

Jeffreys-rule prior
Derived to have invariance under 1-1 transformations. Jeffreys proposed
$$p(\theta) \propto \sqrt{\det I(\theta)},$$
where $I(\theta)$ is the Fisher information. This is called the Jeffreys-rule prior (often incorrectly shortened to the Jeffreys prior).
If $\psi$ is a 1-1 transformation of $\theta$, then
$$p(\psi) = p(\theta)\,[\det J], \qquad I^{*}(\psi) = J\, I(\theta)\, J^{T} \;\Rightarrow\; \det I^{*} = [\det I]\,(\det J)^2,$$
where $J_{ij} = \partial \theta_i / \partial \psi_j$. Hence the Jeffreys prior $p(\psi) \propto \sqrt{\det I^{*}(\psi)}$ is invariant.
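The invariance can be checked numerically; the following sketch (my own example, not from the slides) compares the two routes for the Bernoulli model under the logit reparametrization:

```python
# A minimal numerical check, not from the slides: invariance of the
# Jeffreys-rule prior under the logit reparametrization
# psi = log(theta / (1 - theta)) of the Bernoulli model.
import numpy as np

def fisher_theta(theta):
    # Fisher information of one Bernoulli(theta) observation
    return 1.0 / (theta * (1.0 - theta))

psi = np.linspace(-3.0, 3.0, 7)        # grid of log-odds values
theta = 1.0 / (1.0 + np.exp(-psi))     # inverse logit
jac = theta * (1.0 - theta)            # Jacobian d(theta)/d(psi)

# route 1: change of variables applied to the Jeffreys prior on theta
transformed = np.sqrt(fisher_theta(theta)) * np.abs(jac)

# route 2: Jeffreys rule applied directly in psi, I*(psi) = I(theta) * jac^2
direct = np.sqrt(fisher_theta(theta) * jac**2)

print(np.allclose(transformed, direct))  # True: the two routes agree
```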

Example: Jeffreys Prior for Bernoulli Data
The Jeffreys prior for i.i.d. Bernoulli data $X_i \mid \theta \overset{\text{iid}}{\sim} \text{Ber}(\theta)$ is
$$p(\theta) \propto \frac{1}{\sqrt{\theta(1-\theta)}} = \text{Beta}(1/2, 1/2).$$
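Since Beta(1/2, 1/2) is conjugate for Bernoulli data, the posterior is available in closed form; a minimal sketch with made-up data:

```python
# A minimal sketch with hypothetical data: under the Jeffreys Beta(1/2, 1/2)
# prior, the Bernoulli posterior is Beta(1/2 + sum(x), 1/2 + n - sum(x)).
from scipy import stats

x = [1, 0, 1, 1, 0, 1]                  # made-up Bernoulli observations
a_post = 0.5 + sum(x)
b_post = 0.5 + len(x) - sum(x)

posterior = stats.beta(a_post, b_post)
print(posterior.mean())                 # posterior mean of theta
print(posterior.interval(0.95))         # equal-tailed 95% credible interval
```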

Note: Jeffreys would often deviate from using the Jeffreys-rule prior (so "Jeffreys prior" is the name best used for the priors he recommended). For a Poisson mean $\lambda$, he curiously recommended $p(\lambda) \propto 1/\lambda$ instead of the Jeffreys-rule prior $p(\lambda) \propto 1/\sqrt{\lambda}$, even though $p(\lambda) \propto 1/\lambda$ does not yield a proper posterior when $x = 0$ is observed. For normal problems with unknown variance, he recommended the independent Jeffreys prior.
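To see why $p(\lambda) \propto 1/\lambda$ fails at $x = 0$, consider a single Poisson observation (a one-line check, not from the slides): the posterior kernel is
$$p(\lambda \mid x = 0) \propto \frac{1}{\lambda}\, e^{-\lambda} = \lambda^{-1} e^{-\lambda}, \qquad \int_0^1 \lambda^{-1} e^{-\lambda}\, d\lambda = \infty,$$
so the posterior is improper.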

Example: Normal Data
Suppose observations $X_i \mid \mu, \sigma^2 \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, for $i = 1, 2, \ldots, n$. Denote the parameters $\theta = (\mu, \sigma^2)$; then the Fisher information is
$$I(\theta) = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}.$$
Jeffreys-rule prior:
$$p(\theta) \propto (1/\sigma^6)^{1/2} = 1/\sigma^3.$$
Independent Jeffreys prior (ultimately recommended by Jeffreys):
$$p(\theta) \propto 1/\sigma^2.$$
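These entries can be verified symbolically; a minimal sketch (not from the slides), working with a single observation so that the slide's matrix is $n$ times the result:

```python
# A minimal sketch, not from the slides: verifying the normal-model Fisher
# information and the Jeffreys-rule prior kernel with sympy, for one
# observation (the slide's matrix is n times this one).
import sympy as sp

mu = sp.Symbol("mu", real=True)
x = sp.Symbol("x", real=True)
s2 = sp.Symbol("sigma2", positive=True)

log_f = -sp.log(2 * sp.pi * s2) / 2 - (x - mu) ** 2 / (2 * s2)

# Hessian of log f with respect to theta = (mu, sigma2)
H = sp.hessian(log_f, (mu, s2))

# expectation under X ~ N(mu, sigma2): replace (x - mu)^2 by sigma2,
# then any leftover linear (x - mu) factor by 0
expect = lambda e: e.subs((x - mu) ** 2, s2).subs(x, mu)
I = -H.applyfunc(expect)
print(sp.simplify(I))                 # diag(1/sigma2, 1/(2*sigma2**2))

# Jeffreys-rule prior kernel: sqrt(det I) is proportional to sigma^{-3}
print(sp.sqrt(sp.simplify(I.det())))  # sqrt(2)/(2*sigma2**(3/2))
```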

Strengths of the Jeffreys-rule Prior
- Almost always defined.
- Invariant to transformations.
- Almost always yields a proper posterior. Mixture models are the main known examples in which improper posteriors result.
- Great for one-dimensional parameters.
- It arises from many other approaches, such as minimum description length and various entropy approaches.

Weaknesses of the Jeffreys-rule Prior
- It depends on the statistical model, and hence appears to violate the likelihood principle; but all reasonable objective Bayesian theories do this. Example: Suppose $X$ is Negative Binomial, i.e.
$$p(x \mid r, \theta) = \frac{\Gamma(x + r)}{\Gamma(x + 1)\,\Gamma(r)}\, \theta^r (1 - \theta)^x, \quad \text{for } x = 0, 1, \ldots$$
The Jeffreys-rule prior is $p(\theta) \propto \dfrac{1}{\theta \sqrt{1 - \theta}}$, not the Beta$(1/2, 1/2)$ obtained under binomial sampling, even though the two likelihoods are proportional in $\theta$.
- It requires the Fisher information to exist. Example of non-existence: the Uniform$[0, \theta]$ distribution.
- Often fails badly for higher-dimensional parameters.
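The same symbolic computation as before (a sketch, not from the slides) confirms that Negative Binomial sampling changes the Jeffreys-rule prior:

```python
# A minimal sketch, not from the slides: the Jeffreys-rule prior kernel
# under Negative Binomial sampling with known r differs from Beta(1/2, 1/2).
import sympy as sp

theta = sp.Symbol("theta", positive=True)
x, r = sp.symbols("x r", positive=True)

# theta-dependent part of the Negative Binomial log density
log_f = r * sp.log(theta) + x * sp.log(1 - theta)

d2 = sp.diff(log_f, theta, 2)

# E[X] = r(1 - theta)/theta for the Negative Binomial; d2 is linear in x
fisher = sp.simplify(-d2.subs(x, r * (1 - theta) / theta))
print(fisher)               # r/(theta**2*(1 - theta)), up to rearrangement

# prior kernel is theta^{-1} (1-theta)^{-1/2}, not theta^{-1/2} (1-theta)^{-1/2}
print(sp.simplify(sp.sqrt(fisher / r)))
```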

Example of Failure: the Neyman-Scott Problem
Suppose we observe $X_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, \ldots, n$; $j = 1, 2$. Defining $\bar{X}_i = (X_{i1} + X_{i2})/2$, $\bar{X} = (\bar{X}_1, \ldots, \bar{X}_n)$, $S^2 = \sum_{i=1}^{n} (X_{i1} - X_{i2})^2$, and $\mu = (\mu_1, \ldots, \mu_n)$, the likelihood function is
$$p(x \mid \mu, \sigma^2) \propto \frac{1}{\sigma^{2n}} \exp\left[ -\frac{1}{\sigma^2} \left( \|\bar{X} - \mu\|^2 + \frac{S^2}{4} \right) \right].$$

The Fisher information matrix is $I(\mu, \sigma^2) = \text{diag}\{2/\sigma^2, \ldots, 2/\sigma^2, n/\sigma^4\}$, the last entry corresponding to the information for $\sigma^2$.
Jeffreys-rule prior: $p(\mu, \sigma^2) \propto 1/\sigma^{n+2}$.
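The prior kernel follows directly from the determinant (a one-line check, not on the slide):
$$\sqrt{\det I(\mu, \sigma^2)} = \sqrt{(2/\sigma^2)^n \cdot n/\sigma^4} \propto \sigma^{-(n+2)}.$$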

The Jeffreys-rule prior $p(\mu, \sigma^2) \propto 1/\sigma^{n+2}$ is bad. For instance, the marginal posterior for $\sigma^2$ is
$$p(\sigma^2 \mid X) \propto \left( \frac{1}{\sigma^2} \right)^{n+1} \exp\left( -\frac{S^2}{4\sigma^2} \right).$$
Here $S^2$ depends on $X$. Posterior mean:
$$E(\sigma^2 \mid X) = \frac{S^2}{4(n-1)}.$$
Suppose data $X$ are generated from the sampling distribution under the true parameter $\sigma_0^2$. Because the frequentist density of $S^2/(2\sigma_0^2)$ is Chi-Squared with $n$ degrees of freedom, as $n \to \infty$, the posterior mean $S^2/[4(n-1)] \to \sigma_0^2/2$.
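The convergence to $\sigma_0^2/2$ is easy to see in simulation (a sketch, not from the slides; the true variance and sample sizes are arbitrary choices):

```python
# A minimal simulation sketch, not from the slides: the posterior mean
# S^2/[4(n-1)] converges to sigma0^2/2 rather than sigma0^2.
import numpy as np

rng = np.random.default_rng(0)
sigma0_sq = 4.0                                  # arbitrary true variance

for n in [10, 100, 10_000]:
    mu = rng.normal(size=n)                      # arbitrary group means mu_i
    x = rng.normal(loc=mu[:, None], scale=np.sqrt(sigma0_sq), size=(n, 2))
    S2 = np.sum((x[:, 0] - x[:, 1]) ** 2)
    print(n, S2 / (4 * (n - 1)))                 # approaches 2.0, not 4.0
```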

Thus under the Jeffreys-rule prior, the posterior distribution of $\sigma^2$ is inconsistent, i.e., it does not concentrate around the true value $\sigma_0^2$. The independent Jeffreys prior is fine here. There also exist cases where the independent Jeffreys prior doesn't work.