Statistical Theory MT 2006 Problems 4: Solution sketches

1. Suppose that X has a Poisson distribution with unknown mean θ. Determine the conjugate prior, and associated posterior distribution, for θ. Determine the Jeffreys prior $\pi_J$ for θ, and discuss whether the scale-invariant prior $\pi_0(\theta) = 1/\theta$ might be preferable as a noninformative prior. Using $\pi_J$ as the reference measure, find the maximum entropy prior under the constraints that the prior mean and variance of θ are both 1. (Just write it in terms of the constants $\lambda_1$ and $\lambda_2$ from lectures, do not solve for these.) Repeat, for the reference measure $\pi_0$. (Again, just write it in terms of $\lambda_1$ and $\lambda_2$, do not solve for these.) Think of X as the number of events during an interval of time of length T in a Poisson process of rate λ. In this case, $\theta = \lambda T$. Use this to justify the use of $\pi_0$ as a noninformative prior.

Solution: The Poisson probability mass function is
$$f(x \mid \theta) = e^{-\theta}\frac{\theta^x}{x!}, \qquad x = 0, 1, \ldots$$
The conjugate prior $\pi(\theta)$ is given by
$$\pi(\theta) \propto e^{-\tau_0\theta}\theta^{\tau_1}, \qquad \theta > 0.$$
For this to be a proper prior we require $\tau_0 > 0$ and $\tau_1 > -1$. The posterior density of θ given x is then
$$\pi(\theta \mid x) \propto e^{-(\tau_0+1)\theta}\theta^{\tau_1+x}, \qquad \theta > 0,$$
which is a Gamma distribution with parameters $\alpha = \tau_1 + x + 1$ and $\beta = \tau_0 + 1$.

To obtain the Jeffreys prior we need the Fisher information $I(\theta)$. Note that
$$\frac{\partial}{\partial\theta}\log f(x \mid \theta) = \frac{x}{\theta} - 1,$$
so
$$I(\theta) = E\left[\left(\frac{X}{\theta} - 1\right)^2\right] = \frac{\operatorname{Var}(X)}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.$$
It follows that the Jeffreys prior is
$$\pi_J(\theta) \propto \theta^{-1/2}, \qquad \theta > 0.$$

Comparison with the scale-invariant prior: If your relative beliefs about any two parameter values $\theta_1$ and $\theta_2$ depend only on their ratio $\theta_1/\theta_2$, then you will be led to the scale-invariant prior $\pi_0 = 1/\theta$. If on the other hand you insist that the prior distribution of θ is invariant under parametric transformations, then you are led to the Jeffreys prior. In this example, these two philosophies are incompatible.

Maximum entropy prior: The constraints are $E_\pi(\theta) = 1$ and $E_\pi((\theta-1)^2) = 1$, and the reference measure is $\pi_{\mathrm{ref}}(\theta) = \pi_J(\theta) \propto \theta^{-1/2}$. By definition, the prior density that maximises the entropy relative to the reference density $\pi_{\mathrm{ref}}$ and satisfies m constraints is
$$\pi(\theta) = \frac{\pi_{\mathrm{ref}}(\theta)\exp\left(\sum_{k=1}^m \lambda_k g_k(\theta)\right)}{\int \pi_{\mathrm{ref}}(\theta)\exp\left(\sum_{k=1}^m \lambda_k g_k(\theta)\right)d\theta},$$
and in our case $m = 2$, $g_1(\theta) = \theta$, $g_2(\theta) = (\theta-1)^2$, and so
$$\pi(\theta) \propto \theta^{-1/2}\exp\left(\lambda_1\theta + \lambda_2(\theta-1)^2\right).$$
Similarly, when the reference measure is $\pi_{\mathrm{ref}} = \pi_0$, the maximum entropy prior is
$$\pi(\theta) \propto \theta^{-1}\exp\left(\lambda_1\theta + \lambda_2(\theta-1)^2\right).$$

Poisson process: The probability density of the inter-arrival times in a Poisson process is $f(x \mid \lambda) = \lambda e^{-\lambda x}$. For this density the Fisher information is
$$-E\left[\frac{\partial^2}{\partial\lambda^2}\left(\log\lambda - \lambda x\right)\right] = \frac{1}{\lambda^2},$$
so that the Jeffreys prior for λ is proportional to $1/\lambda$. Since $\theta = \lambda T$ and T is a known constant, this implies that, from the perspective of inter-arrival times, the prior density for θ should also be proportional to $1/\theta$.
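As a quick numerical sanity check (not part of the original sketch), the conjugate update and the Fisher information above can be verified in Python with NumPy/SciPy. The hyperparameter values, the observation, and the grid are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative hyperparameters and a single Poisson observation.
tau0, tau1 = 2.0, 3.0          # prior pi(theta) proportional to exp(-tau0*theta) * theta**tau1
x = 4                          # observed count

# Conjugate result: posterior is Gamma with shape tau1 + x + 1 and rate tau0 + 1.
alpha_post, rate_post = tau1 + x + 1, tau0 + 1

# Brute-force check: normalise prior * likelihood on a fine grid of theta values.
theta = np.linspace(1e-6, 30.0, 200_001)
dtheta = theta[1] - theta[0]
unnorm = np.exp(-tau0 * theta) * theta**tau1 * stats.poisson.pmf(x, theta)
grid_post = unnorm / (unnorm.sum() * dtheta)
conj_post = stats.gamma.pdf(theta, a=alpha_post, scale=1.0 / rate_post)
print("max density discrepancy:", np.max(np.abs(grid_post - conj_post)))   # close to 0

# Monte Carlo check of the Fisher information I(theta) = E[(X/theta - 1)^2] = 1/theta,
# which is what gives the Jeffreys prior pi_J(theta) proportional to theta**(-1/2).
theta0 = 2.5
xs = stats.poisson.rvs(theta0, size=500_000, random_state=0)
print("simulated I(theta):", np.mean((xs / theta0 - 1.0) ** 2), "  1/theta:", 1.0 / theta0)
```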

2. Consider a random sample of size n from a normal distribution with unknown mean θ and unknown variance φ, with the improper prior
$$f(\theta, \phi) = \phi^{-1}, \qquad \phi > 0,$$
for $(\theta, \phi)$. Calculate the joint posterior distribution and both marginal posteriors for $(\theta, \phi)$. Calculate the $100(1-\alpha)\%$ HPD credible interval for θ in the posterior. Compare this with the classical confidence interval for θ.

Solution: The joint posterior p.d.f. of θ and φ is proportional to
$$e^{-\sum(x_i-\theta)^2/(2\phi)}\,(2\pi\phi)^{-n/2}\,\frac{1}{\phi}.$$

Let $\lambda = 1/\phi$; then, using the change of variable formula (Jacobian $\phi^2 = 1/\lambda^2$), the joint posterior p.d.f. of θ and λ is given by
$$\pi(\theta, \lambda) \propto \exp\left\{-\frac{\lambda}{2}\sum(x_i-\theta)^2\right\}\lambda^{n/2-1}.$$
Now use the identity
$$\sum(x_i-\theta)^2 = \sum(x_i-\bar{x})^2 + n(\bar{x}-\theta)^2$$
to give
$$\pi(\theta, \lambda) \propto \exp\left\{-\frac{\lambda}{2}\sum(x_i-\bar{x})^2\right\}\lambda^{n/2-1}\exp\left\{-\frac{\lambda}{2}n(\bar{x}-\theta)^2\right\}. \qquad (1)$$
Now integrate θ out, using that
$$\int \exp\left\{-\frac{\lambda}{2}n(\bar{x}-\theta)^2\right\}d\theta = \sqrt{\frac{2\pi}{\lambda n}},$$
to give the marginal density
$$\pi(\lambda) \propto \exp\left\{-\frac{\lambda}{2}\sum(x_i-\bar{x})^2\right\}\lambda^{(n-1)/2-1}.$$
It follows that λ has the Gamma density $\mathrm{Gamma}(\alpha, \beta)$ with $\alpha = \frac{n-1}{2}$ and $\beta = \frac{1}{2}\sum(x_i-\bar{x})^2$. Changing variables, with $y = \lambda\sum(x_i-\bar{x})^2$, we have
$$\pi(y) = \frac{y^{(n-1)/2-1}\,e^{-y/2}}{\Gamma((n-1)/2)\,2^{(n-1)/2}}, \qquad y > 0.$$
In other words, the posterior distribution of $\phi^{-1}\sum(x_i-\bar{x})^2$ is $\chi^2_{n-1}$.

From (1) the conditional posterior distribution of θ given λ is $N(\bar{x}, (\lambda n)^{-1})$, i.e. $(\theta-\bar{x})\sqrt{\lambda n} \sim N(0, 1)$. It follows that (a posteriori) $(\theta-\bar{x})\sqrt{\lambda n}$ is independent of λ, so that
$$\frac{(\theta-\bar{x})\sqrt{\lambda n}}{\sqrt{\lambda\sum(x_i-\bar{x})^2/(n-1)}} \sim t_{n-1},$$
i.e.
$$\frac{\theta-\bar{x}}{\sqrt{\sum(x_i-\bar{x})^2/(n(n-1))}} \sim t_{n-1},$$
the point being that this function of θ and x has the same distribution when viewed either as a random variable depending on x with θ fixed or as a random variable depending on θ with x fixed. This key result can also be obtained by integrating λ out of (1). Finally, let $t_{\alpha/2}$ be the upper $\alpha/2$-point of the $t_{n-1}$-distribution; then the $100(1-\alpha)\%$ HPD credible interval is
$$\bar{x} \pm t_{\alpha/2}\sqrt{\frac{\sum(x_i-\bar{x})^2}{n(n-1)}},$$
which is exactly the same as the classical $100(1-\alpha)\%$ confidence interval. However, the interpretation is different (see lecture notes).
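The agreement between the HPD credible interval and the classical t interval, and the Normal-Gamma structure of the joint posterior, can be checked numerically. This is a minimal sketch (not part of the original solution) assuming Python with NumPy/SciPy; the sample itself is an arbitrary simulated illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=12)     # arbitrary illustrative sample
n, xbar, S = len(x), x.mean(), np.sum((x - x.mean()) ** 2)
alpha = 0.05

# HPD credible interval from the marginal posterior: (theta - xbar)/sqrt(S/(n(n-1))) ~ t_{n-1}.
half = stats.t.ppf(1 - alpha / 2, n - 1) * np.sqrt(S / (n * (n - 1)))
print("HPD credible interval:", (xbar - half, xbar + half))

# Classical 95% confidence interval based on the same t statistic (note S/(n(n-1)) = s^2/n).
se = x.std(ddof=1) / np.sqrt(n)
print("classical t interval :", stats.t.interval(1 - alpha, n - 1, loc=xbar, scale=se))

# Monte Carlo check of the marginal posterior of theta: draw lambda ~ Gamma((n-1)/2, rate S/2),
# then theta | lambda ~ N(xbar, 1/(n*lambda)); the 2.5%/97.5% sample quantiles should match.
lam = rng.gamma(shape=(n - 1) / 2, scale=2.0 / S, size=200_000)
theta = rng.normal(xbar, 1.0 / np.sqrt(n * lam))
print("posterior sample quantiles:", np.quantile(theta, [alpha / 2, 1 - alpha / 2]))
```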

3. Suppose that $x_1, \ldots, x_n$ is a random sample from a Poisson distribution with unknown mean θ. Two models for the prior distribution of θ are contemplated:
$$\pi_1(\theta) = e^{-\theta}, \quad \theta > 0, \qquad \text{and} \qquad \pi_2(\theta) = e^{-\theta}\theta, \quad \theta > 0.$$

(a) Calculate the Bayes estimator of θ under both models, with quadratic loss function.

(b) The prior probabilities of model 1 and model 2 are assessed at probability 1/2 each. Calculate the Bayes factor for $H_0$: model 1 applies against $H_1$: model 2 applies.

Solution: (a) We calculate
$$\pi_1(\theta \mid x) \propto e^{-n\theta}\theta^{\sum x_i}e^{-\theta} = e^{-(n+1)\theta}\theta^{\sum x_i},$$
which we recognize as $\mathrm{Gamma}(\sum x_i + 1,\, n+1)$. The Bayes estimator under quadratic loss is the expected value of the posterior distribution,
$$\frac{\sum x_i + 1}{n+1}.$$
For model 2,
$$\pi_2(\theta \mid x) \propto e^{-n\theta}\theta^{\sum x_i}e^{-\theta}\theta = e^{-(n+1)\theta}\theta^{\sum x_i + 1},$$
which we recognize as $\mathrm{Gamma}(\sum x_i + 2,\, n+1)$. The Bayes estimator is the expected value of the posterior distribution,
$$\frac{\sum x_i + 2}{n+1}.$$
Note that the first prior puts greater weight on smaller values of θ, so its posterior distribution is shifted to the left.

(b) The prior probabilities of model 1 and model 2 are assessed at probability 1/2 each. Then the Bayes factor is
$$B(x) = \frac{\int_0^\infty e^{-n\theta}\theta^{\sum x_i}e^{-\theta}\,d\theta}{\int_0^\infty e^{-n\theta}\theta^{\sum x_i}e^{-\theta}\theta\,d\theta} = \frac{\Gamma(\sum x_i + 1)\big/(n+1)^{\sum x_i + 1}}{\Gamma(\sum x_i + 2)\big/(n+1)^{\sum x_i + 2}} = \frac{n+1}{\sum x_i + 1}.$$
Note that in this setting, since the prior odds are 1,
$$B(x) = \frac{P(\text{model 1} \mid x)}{P(\text{model 2} \mid x)} = \frac{P(\text{model 1} \mid x)}{1 - P(\text{model 1} \mid x)},$$
so that $P(\text{model 1} \mid x) = \left(1 + B(x)^{-1}\right)^{-1}$. Hence
$$P(\text{model 1} \mid x) = \left(1 + \frac{\sum x_i + 1}{n+1}\right)^{-1} = \left(1 + \frac{\bar{x} + 1/n}{1 + 1/n}\right)^{-1},$$
which is decreasing as $\bar{x}$ increases.
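The closed-form estimators and Bayes factor in (a) and (b) are easy to confirm numerically. A minimal sketch (not part of the original solution), assuming Python with NumPy; the simulated data are an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(lam=3.0, size=20)        # arbitrary illustrative data
n, sx = len(x), int(x.sum())

# Posterior means (Bayes estimators under quadratic loss) for the two priors.
print("model 1 estimator:", (sx + 1) / (n + 1))   # Gamma(sx + 1, n + 1) posterior
print("model 2 estimator:", (sx + 2) / (n + 1))   # Gamma(sx + 2, n + 1) posterior

# Bayes factor for model 1 against model 2: closed form B = (n + 1)/(sx + 1) ...
B = (n + 1) / (sx + 1)

# ... and by numerical integration of the two marginal likelihoods (the common factor
# prod(x_i!) cancels in the ratio, so it is omitted).
theta = np.linspace(1e-8, 30.0, 400_001)
dtheta = theta[1] - theta[0]
lik = np.exp(-n * theta) * theta**sx
m1 = np.sum(lik * np.exp(-theta)) * dtheta            # prior pi_1(theta) = exp(-theta)
m2 = np.sum(lik * np.exp(-theta) * theta) * dtheta    # prior pi_2(theta) = theta * exp(-theta)
print("Bayes factor:", B, "  numerical:", m1 / m2)
print("P(model 1 | x):", 1.0 / (1.0 + 1.0 / B))       # equal prior model probabilities
```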

4. Let θ be a real-valued parameter and let $f(x \mid \theta)$ be the probability density function of an observation x, given θ. The prior distribution of θ has a discrete component that gives probability β to the point null hypothesis $H_0: \theta = \theta_0$. The remainder of the distribution is continuous, and conditional on $\theta \neq \theta_0$, its density is $g(\theta)$.

(1) Derive an expression for $\pi(\theta_0 \mid x)$, the posterior probability of $H_0$.

(2) Derive the Bayes factor $B(x)$ for the null hypothesis against the alternative.

(3) Express $\pi(\theta_0 \mid x)$ in terms of $B(x)$.

(4) Explain how you would use $B(x)$ to construct a most powerful test of size α for $H_0$, against the alternative $\theta \neq \theta_0$.

Suppose that $x_1, \ldots, x_n$ is a sample from a normal distribution with mean θ and variance v. Let $H_0: \theta = 0$, let $\beta = 1/2$, and let $g(\theta) = (2\pi w^2)^{-1/2}\exp\{-\theta^2/(2w^2)\}$, for $-\infty < \theta < \infty$. Show that, if the sample mean is observed to be $10(v/n)^{1/2}$, then

(5) the most powerful test of size $\alpha = 0.05$ will reject $H_0$ for any value of n;

(6) the posterior probability of $H_0$ converges to 1, as $n \to \infty$.

Solution: (1) Following lectures, the posterior probability of $\theta_0$ is
$$\pi(\theta_0 \mid x) = \frac{\beta f(x \mid \theta_0)}{\beta f(x \mid \theta_0) + (1-\beta)m(x)},$$
where
$$m(x) = \int f(x \mid \theta)g(\theta)\,d\theta.$$

(2) The Bayes factor is the posterior odds divided by the prior odds:
$$B(x) = \frac{P(\theta = \theta_0 \mid x)}{P(\theta \neq \theta_0 \mid x)}\bigg/\frac{P(\theta = \theta_0)}{P(\theta \neq \theta_0)} = \frac{\beta f(x \mid \theta_0)}{(1-\beta)m(x)}\cdot\frac{1-\beta}{\beta} = \frac{f(x \mid \theta_0)}{m(x)}.$$
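The formulas in (1) and (2) can be checked for a concrete one-observation example by computing $m(x)$ with numerical integration. This is a minimal sketch (not part of the original solution) assuming Python with NumPy/SciPy; all numerical values are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats, integrate

# Arbitrary illustrative values: single observation x ~ N(theta, v), point null theta0 = 0
# with prior mass beta, and g = N(0, w^2).
theta0, v, w2, beta, x = 0.0, 1.0, 4.0, 0.5, 1.7

f0 = stats.norm.pdf(x, loc=theta0, scale=np.sqrt(v))               # f(x | theta0)
m, _ = integrate.quad(lambda t: stats.norm.pdf(x, loc=t, scale=np.sqrt(v))
                                * stats.norm.pdf(t, loc=0.0, scale=np.sqrt(w2)),
                      -np.inf, np.inf)                             # m(x) = int f(x|t) g(t) dt

post_H0 = beta * f0 / (beta * f0 + (1 - beta) * m)                 # part (1)
B = f0 / m                                                         # part (2)
posterior_odds = post_H0 / (1 - post_H0)
prior_odds = beta / (1 - beta)
print("pi(theta0 | x)      :", post_H0)
print("Bayes factor B(x)   :", B)
print("posterior/prior odds:", posterior_odds / prior_odds)        # equals B(x)
```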

(3) From lectures,
$$\pi(\theta_0 \mid x) = \left(1 + \frac{1-\beta}{\beta B(x)}\right)^{-1}.$$

(4) The simple hypothesis $\theta = \theta_0$ can be tested against the simple hypothesis that the density of x is $\int f(x \mid \theta)g(\theta)\,d\theta$; i.e. $H_0: x \sim f(x \mid \theta_0)$ and $H_1: x \sim m(x)$. The Neyman-Pearson Lemma says that the most powerful test of size α rejects $H_0$ when $f(x \mid \theta_0)/m(x) < C_\alpha$, where $C_\alpha$ is chosen such that
$$\int_{\{x:\, f(x \mid \theta_0)/m(x) < C_\alpha\}} f(x \mid \theta_0)\,dx = \alpha.$$
So we reject when $B(x) < C_\alpha$.

(5) We now have $x = (x_1, \ldots, x_n)$. The statistic $\bar{x}$ is sufficient. Under $H_0$, $\bar{x} \sim N(0, v/n)$ and under $H_1$, $\bar{x} \sim N(0, v/n + w^2)$. Since the likelihood ratio is a decreasing function of $|\bar{x}|$, the most powerful test rejects when $|\bar{x}|/\sqrt{v/n} > c_\alpha$. For a 5% test, $c_\alpha$ is certainly less than 2; therefore, with the observed value $\bar{x} = 10\sqrt{v/n}$, the test will reject $H_0$ at the 5% level for any value of n.

(6) However, the Bayes factor is
$$B(x) = \frac{(v/n)^{-1/2}\exp\left\{-\dfrac{(10\sqrt{v/n})^2}{2\,v/n}\right\}}{(v/n + w^2)^{-1/2}\exp\left\{-\dfrac{(10\sqrt{v/n})^2}{2(v/n + w^2)}\right\}} = \sqrt{1 + \frac{nw^2}{v}}\;e^{-50}\exp\left\{\frac{50\,v/n}{v/n + w^2}\right\} \to \infty$$
as $n \to \infty$, since the exponential factors tend to the constant $e^{-50}$ while $\sqrt{1 + nw^2/v}$ grows without bound. Therefore the posterior probability of $H_0$ converges to 1 (contrasting with the conclusion of the classical test).
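The contrast between (5) and (6) (often called Lindley's paradox) can also be seen numerically. A minimal sketch, not part of the original solution, assuming Python with NumPy; v, $w^2$, β, and the grid of n values are arbitrary illustrative choices.

```python
import numpy as np

# v, w^2, and the prior mass beta on H0 are arbitrary illustrative choices; n need not be
# an integer here since it only enters through the formulas. Note that with xbar fixed at
# 10 standard errors, the approach of P(H0 | x) to 1 is extremely slow: it sets in only
# once sqrt(1 + n*w^2/v) outweighs exp(-50).
v, w2, beta = 1.0, 1.0, 0.5

for n in [1e2, 1e6, 1e44, 1e60]:
    s2_0 = v / n                       # Var(xbar) under H0
    s2_1 = v / n + w2                  # Var(xbar) under H1 (marginally over g)
    xbar = 10.0 * np.sqrt(v / n)       # the observed sample mean from the question
    z = xbar / np.sqrt(s2_0)           # = 10 > 1.96, so the size-0.05 test rejects for every n
    # log Bayes factor, log B = log f(xbar | 0) - log m(xbar), for two zero-mean normals:
    logB = 0.5 * np.log(s2_1 / s2_0) - 0.5 * xbar**2 * (1.0 / s2_0 - 1.0 / s2_1)
    post_H0 = 1.0 / (1.0 + (1.0 - beta) / beta * np.exp(-logB))
    print(f"n = {n:.0e}   z = {z:.1f}   log B = {logB:8.2f}   P(H0 | x) = {post_H0:.6f}")
```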