The Jeffreys Prior. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) The Jeffreys Prior MATH / 13


1 The Jeffreys Prior
Yingbo Li, Clemson University, MATH 9810


2 Sir Harold Jeffreys
English mathematician, statistician, geophysicist, and astronomer.
His book Theory of Probability, which first appeared in 1939, played an important role in the revival of the Bayesian view of probability.
Harold Jeffreys ( )

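3 Information Matrix
Suppose data $X$ has density $f(x \mid \theta)$, which is twice differentiable in the coordinates of the unknown parameter $\theta = (\theta_1, \ldots, \theta_p)$.
Expected Fisher information: the $p \times p$ matrix $I$ with $(j, k)$ entry
$$I_{j,k}(\theta) = -E_{X \mid \theta}\left[\frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f(x \mid \theta)\right]$$
$$= -\sum_{i=1}^{n} E_{X_i \mid \theta}\left[\frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f_i(x_i \mid \theta)\right] \quad \text{(if independent)}$$
$$= -n\,E_{X_1 \mid \theta}\left[\frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f(x_1 \mid \theta)\right] \quad \text{(if i.i.d.)}$$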
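As a rough numerical illustration of the definition above, the sketch below approximates the expected Fisher information for i.i.d. Bernoulli data. The function names, the step size `h`, and the choice of the Bernoulli model are assumptions made for this example only; the second derivative is approximated by a central finite difference rather than computed analytically.

```python
import math

def log_bernoulli(x, theta):
    """Log density of a single Bernoulli(theta) observation x in {0, 1}."""
    return x * math.log(theta) + (1 - x) * math.log(1 - theta)

def fisher_information_bernoulli(theta, n=1, h=1e-4):
    """Expected Fisher information for n i.i.d. Bernoulli(theta) draws.

    The second derivative in theta is approximated by a central finite
    difference; the expectation over X | theta is an exact sum over {0, 1}.
    """
    info_one = 0.0
    for x, prob in ((0, 1 - theta), (1, theta)):
        second = (log_bernoulli(x, theta + h)
                  - 2 * log_bernoulli(x, theta)
                  + log_bernoulli(x, theta - h)) / h**2
        info_one += prob * (-second)  # minus sign from the definition
    return n * info_one               # i.i.d. case: n times one-draw information

theta = 0.3
numeric = fisher_information_bernoulli(theta, n=10)
closed_form = 10 / (theta * (1 - theta))  # known closed form n / (theta (1 - theta))
print(numeric, closed_form)
```

The two printed values should agree up to finite-difference error, which is one way to sanity-check a hand-derived information matrix.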

4 Jeffreys-rule prior
Derived to have invariance under 1-1 transformations, Jeffreys proposed
$$p(\theta) \propto \sqrt{\det[I(\theta)]},$$
where $I(\theta)$ is the Fisher information. This is called the Jeffreys-rule prior (often incorrectly shortened to the Jeffreys prior).
If $\psi$ is a 1-1 transformation of $\theta$, then
$$I^*(\psi) = J^T I(\theta) J \implies \det I^* = [\det I](\det J)^2,$$
where $J_{ij} = \partial\theta_i / \partial\psi_j$.
Since a density transforms as $p(\psi) = p(\theta)\,|\det J|$, the Jeffreys-rule prior $p(\psi) \propto \sqrt{\det[I^*(\psi)]}$ is invariant.
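The invariance claim can be checked numerically in one dimension. The sketch below is an illustration under assumptions not in the slides: it uses a Bernoulli model, the log-odds reparameterization $\psi = \log[\theta/(1-\theta)]$, and hypothetical helper names. It applies the Jeffreys rule directly in $\psi$ and compares the result with the $\theta$-prior pushed through the change of variables.

```python
import math

def sigmoid(psi):
    """Inverse of the log-odds map: theta = 1 / (1 + exp(-psi))."""
    return 1.0 / (1.0 + math.exp(-psi))

def log_lik_psi(x, psi):
    """Bernoulli log density written in the log-odds parameter psi."""
    theta = sigmoid(psi)
    return x * math.log(theta) + (1 - x) * math.log(1 - theta)

def fisher_info_psi(psi, h=1e-4):
    """Fisher information in psi, computed directly from the psi-likelihood
    with a central finite difference for the second derivative."""
    theta = sigmoid(psi)
    info = 0.0
    for x, prob in ((0, 1 - theta), (1, theta)):
        second = (log_lik_psi(x, psi + h) - 2 * log_lik_psi(x, psi)
                  + log_lik_psi(x, psi - h)) / h**2
        info -= prob * second
    return info

def jeffreys_theta(theta):
    """Unnormalized Jeffreys-rule prior in theta: sqrt(I(theta))."""
    return 1.0 / math.sqrt(theta * (1 - theta))

psi = 0.7
theta = sigmoid(psi)
jacobian = theta * (1 - theta)                       # d theta / d psi for the logit map
direct = math.sqrt(fisher_info_psi(psi))             # Jeffreys rule applied in psi
transformed = jeffreys_theta(theta) * abs(jacobian)  # theta-prior after change of variables
print(direct, transformed)
```

Both routes should give the same unnormalized density at any `psi`, up to finite-difference error.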

5 Example: Jeffreys Prior for Bernoulli Data
The Jeffreys prior for i.i.d. Bernoulli data $X_i \mid \theta \overset{\text{iid}}{\sim} \text{Ber}(\theta)$ is
$$p(\theta) \propto \frac{1}{\sqrt{\theta(1-\theta)}}, \quad \text{i.e. } \theta \sim \text{Beta}(1/2, 1/2).$$
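For reference, the intermediate steps behind this result can be written out as follows. This is a standard derivation filled in here, not taken from the slides.

```latex
\begin{align*}
\log f(x \mid \theta) &= x \log\theta + (1 - x)\log(1 - \theta), \\
\frac{\partial^2}{\partial\theta^2} \log f(x \mid \theta)
  &= -\frac{x}{\theta^2} - \frac{1 - x}{(1 - \theta)^2}, \\
I(\theta) &= n \left[ \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2} \right]
  = \frac{n}{\theta(1 - \theta)}, \\
p(\theta) &\propto \sqrt{I(\theta)} \propto \theta^{-1/2}(1 - \theta)^{-1/2}.
\end{align*}
```

The last line is the kernel of a Beta(1/2, 1/2) density.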

6 Note: Jeffreys would often deviate from using the Jeffreys-rule prior (so "Jeffreys prior" is the name best used for the priors he recommended).
For a Poisson mean $\lambda$, he curiously recommended $p(\lambda) \propto 1/\lambda$ instead of the Jeffreys-rule prior $p(\lambda) \propto 1/\sqrt{\lambda}$, even though $p(\lambda) \propto 1/\lambda$ doesn't work when $x = 0$ is observed.
For normal problems with unknown variance, he recommended the independent Jeffreys prior.
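A short worked calculation may help with the Poisson remark. It is a standard derivation added for illustration, written for a single observation $x$.

```latex
\begin{align*}
\log f(x \mid \lambda) &= x \log\lambda - \lambda - \log x!, \\
\frac{\partial^2}{\partial\lambda^2} \log f(x \mid \lambda) &= -\frac{x}{\lambda^2},
  \qquad I(\lambda) = \frac{E(X \mid \lambda)}{\lambda^2} = \frac{1}{\lambda}, \\
\text{Jeffreys rule: } p(\lambda) &\propto \sqrt{I(\lambda)} = \lambda^{-1/2}. \\[4pt]
\text{With } p(\lambda) \propto 1/\lambda \text{ and } x = 0: \quad
p(\lambda \mid x = 0) &\propto \lambda^{-1} e^{-\lambda},
  \qquad \int_0^\infty \lambda^{-1} e^{-\lambda}\, d\lambda = \infty.
\end{align*}
```

The integral diverges near $\lambda = 0$, so the posterior under $p(\lambda) \propto 1/\lambda$ is improper when a zero count is observed, whereas $\lambda^{-1/2} e^{-\lambda}$ is integrable.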

7 Example: Normal Data
Suppose observations $X_i \mid \mu, \sigma^2 \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, for $i = 1, 2, \ldots, n$. Denote the parameters by $\theta = (\mu, \sigma^2)$; then the Fisher information is
$$I(\theta) = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}.$$
Jeffreys-rule prior: $p(\theta) \propto (1/\sigma^6)^{1/2} = 1/\sigma^3$.
Independent Jeffreys prior: $p(\theta) \propto 1/\sigma^2$ (ultimately recommended by Jeffreys).
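The entries of this matrix can be recovered with a short calculation. The sketch below is a standard derivation added for illustration; the shorthand $v = \sigma^2$ is notation introduced here. The last line assumes the usual reading of the independent Jeffreys prior, in which the rule is applied to $\mu$ and $\sigma^2$ separately, each with the other held fixed, and the results are multiplied.

```latex
\begin{align*}
\log f(x \mid \mu, v) &= -\tfrac{1}{2}\log(2\pi v) - \frac{(x - \mu)^2}{2v}, \\
\frac{\partial^2 \log f}{\partial\mu^2} &= -\frac{1}{v}, \qquad
\frac{\partial^2 \log f}{\partial\mu\,\partial v} = -\frac{x - \mu}{v^2}, \qquad
\frac{\partial^2 \log f}{\partial v^2} = \frac{1}{2v^2} - \frac{(x - \mu)^2}{v^3}, \\
I(\theta) &= n \begin{pmatrix} 1/v & 0 \\ 0 & 1/(2v^2) \end{pmatrix}, \qquad
\det I(\theta) = \frac{n^2}{2\sigma^6}, \\
\text{Jeffreys rule: } p(\theta) &\propto \sqrt{\det I(\theta)} \propto \sigma^{-3}, \\
\text{Independent: } p(\mu) &\propto 1, \quad p(\sigma^2) \propto \sqrt{n/(2\sigma^4)} \propto \sigma^{-2}
  \;\Longrightarrow\; p(\theta) \propto \sigma^{-2}.
\end{align*}
```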

8 Strengths of the Jeffreys-rule Prior
Almost always defined.
Invariant to 1-1 transformations.
Almost always yields a proper posterior. Mixture models are the main known examples in which improper posteriors result.
Great for one-dimensional parameters.
It arises from many other approaches, such as minimum description length and various entropy approaches.

9 Weaknesses of the Jeffreys-rule Prior
It depends on the statistical model, and hence appears to violate the likelihood principle; but all reasonable objective Bayesian theories do this.
Example: Suppose $X$ is Negative Binomial, i.e.
$$p(x \mid r, \theta) = \frac{\Gamma(x + r)}{\Gamma(x + 1)\Gamma(r)}\,\theta^r (1 - \theta)^x, \quad x = 0, 1, \ldots$$
The Jeffreys-rule prior is $p(\theta) \propto \dfrac{1}{\theta\sqrt{1 - \theta}}$, which differs from the Beta(1/2, 1/2) prior obtained for Bernoulli data although the two likelihoods have the same form in $\theta$.
It requires the Fisher information to exist. Example of non-existence: the Uniform$[0, \theta]$ distribution.
Often fails badly for higher-dimensional parameters.
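The Negative Binomial prior quoted above can be checked numerically. The sketch below is an illustration with hypothetical function names and an assumed truncation point `x_max` for the infinite sum; it uses the analytic second derivative of the log pmf, $-r/\theta^2 - x/(1-\theta)^2$, and compares the resulting information with the closed form $r/[\theta^2(1-\theta)]$, whose square root is proportional to $1/(\theta\sqrt{1-\theta})$.

```python
import math

def nb_log_pmf(x, r, theta):
    """Log pmf of the negative binomial as parameterized in the text."""
    return (math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
            + r * math.log(theta) + x * math.log(1 - theta))

def nb_fisher_information(r, theta, x_max=2000):
    """Expected Fisher information in theta.

    Uses the analytic second derivative of the log pmf and a sum over x
    truncated at x_max (an assumption; the tail is negligible for the
    values used below).
    """
    info = 0.0
    for x in range(x_max + 1):
        prob = math.exp(nb_log_pmf(x, r, theta))
        info += prob * (r / theta**2 + x / (1 - theta)**2)
    return info

r, theta = 3, 0.4
numeric = nb_fisher_information(r, theta)
closed_form = r / (theta**2 * (1 - theta))
print(numeric, closed_form)
```

Taking the square root of the closed form gives the prior kernel $\theta^{-1}(1-\theta)^{-1/2}$, in contrast to $\theta^{-1/2}(1-\theta)^{-1/2}$ for binomial-type sampling.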

10 Example of Failure: the Neyman-Scott Problem
Suppose we observe $X_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, \ldots, n$; $j = 1, 2$.
Defining $\bar{X}_i = (X_{i1} + X_{i2})/2$, $\bar{X} = (\bar{X}_1, \ldots, \bar{X}_n)$, $S^2 = \sum_{i=1}^{n} (X_{i1} - X_{i2})^2$, and $\mu = (\mu_1, \ldots, \mu_n)$, the likelihood function is
$$p(x \mid \mu, \sigma^2) \propto \frac{1}{\sigma^{2n}} \exp\left[-\frac{1}{\sigma^2}\left(|\bar{X} - \mu|^2 + \frac{S^2}{4}\right)\right].$$

11 The Fisher information matrix is
$$I(\mu, \sigma^2) = \text{diag}\{2/\sigma^2, \ldots, 2/\sigma^2, n/\sigma^4\},$$
the last entry corresponding to the information for $\sigma^2$.
Jeffreys-rule prior: $p(\mu, \sigma^2) \propto 1/\sigma^{n+2}$.

12 The Jeffreys-rule prior $p(\mu, \sigma^2) \propto 1/\sigma^{n+2}$ is bad. For instance, the marginal posterior for $\sigma^2$ is
$$p(\sigma^2 \mid X) \propto \frac{1}{(\sigma^2)^{n+1}} \exp\left(-\frac{S^2}{4\sigma^2}\right).$$
Here $S^2$ depends on $X$. The posterior mean is
$$E(\sigma^2 \mid X) = \frac{S^2}{4(n - 1)}.$$
Suppose data $X$ are generated from the sampling distribution under the true parameter $\sigma_0^2$. Because the frequentist distribution of $S^2/(2\sigma_0^2)$ is Chi-Squared with $n$ degrees of freedom, as $n \to \infty$ the posterior mean $S^2/[4(n - 1)] \to \sigma_0^2/2$.
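The marginal posterior and its mean follow from integrating out $\mu$. The steps below are a sketch filled in for illustration, using the Neyman-Scott likelihood $\sigma^{-2n}\exp[-(|\bar{X}-\mu|^2 + S^2/4)/\sigma^2]$ and the prior $\sigma^{-(n+2)}$; the inverse-gamma identification is a standard fact, not something stated on the slides.

```latex
\begin{align*}
p(\mu, \sigma^2 \mid X) &\propto \sigma^{-2n}\,\sigma^{-(n+2)}
  \exp\left[-\frac{1}{\sigma^2}\left(|\bar{X} - \mu|^2 + \frac{S^2}{4}\right)\right], \\
\int_{\mathbb{R}^n} \exp\left(-\frac{|\bar{X} - \mu|^2}{\sigma^2}\right) d\mu
  &= (\pi\sigma^2)^{n/2}, \\
p(\sigma^2 \mid X) &\propto \sigma^{-(3n+2)}\,\sigma^{n} \exp\left(-\frac{S^2}{4\sigma^2}\right)
  = (\sigma^2)^{-(n+1)} \exp\left(-\frac{S^2}{4\sigma^2}\right), \\
\sigma^2 \mid X &\sim \text{Inverse-Gamma}\left(n, \tfrac{S^2}{4}\right), \qquad
E(\sigma^2 \mid X) = \frac{S^2/4}{n - 1} \quad (n > 1), \\
E(S^2) &= 2n\sigma_0^2 \;\Longrightarrow\; \frac{S^2}{4(n - 1)} \to \frac{\sigma_0^2}{2}.
\end{align*}
```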

13 Thus under the Jeffreys-rule prior, the posterior distribution of $\sigma^2$ is inconsistent, i.e., it does not concentrate around the true value $\sigma_0^2$.
The independent Jeffreys prior is fine here.
There also exist cases where the independent Jeffreys prior doesn't work.
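The inconsistency can be seen in a small simulation. The sketch below is illustrative only: the function name, the seed, the true variance of 4, and the way the nuisance means $\mu_i$ are drawn are all arbitrary choices made here. It evaluates the posterior mean $S^2/[4(n-1)]$ under the Jeffreys-rule prior for increasing $n$.

```python
import random

def neyman_scott_posterior_mean(n, sigma0_sq, seed=0):
    """Posterior mean of sigma^2 under the Jeffreys-rule prior, S^2 / (4 (n - 1)),
    for one simulated Neyman-Scott data set with n pairs."""
    rng = random.Random(seed)
    sigma0 = sigma0_sq ** 0.5
    s_sq = 0.0
    for _ in range(n):
        mu_i = rng.uniform(-10.0, 10.0)  # arbitrary nuisance mean (illustrative choice)
        x1 = rng.gauss(mu_i, sigma0)
        x2 = rng.gauss(mu_i, sigma0)
        s_sq += (x1 - x2) ** 2           # accumulates S^2
    return s_sq / (4 * (n - 1))

sigma0_sq = 4.0  # assumed true variance for the simulation
for n in (10, 100, 10_000):
    print(n, neyman_scott_posterior_mean(n, sigma0_sq))

estimate = neyman_scott_posterior_mean(100_000, sigma0_sq)
print("large-n posterior mean:", estimate, "vs true variance:", sigma0_sq)
```

For large `n` the printed posterior mean should settle near half the true variance rather than near the true variance itself, which is the behavior described on the slide.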
