Statistical Theory MT 2007 Problems 4: Solution sketches

1. Consider a 1-parameter exponential family model with density

    f(x | θ) = f(x) g(θ) exp{c φ(θ) h(x)},  x ∈ X.

Suppose that the prior distribution has the mixture form

    π(θ | α, τ_0, τ_1) = Σ_{l=1}^{d} α_l π(θ | τ_0^(l), τ_1^(l)),  with α_1 + ⋯ + α_d = 1,

where the densities π(θ | τ_0^(l), τ_1^(l)) are members of the conjugate family associated with f(x | θ). Show that after observing a sample x_1, ..., x_n the posterior for θ also has a mixture form, and identify the (posterior) mixing probabilities and the (posterior) hyperparameters.

Solution: Conjugate priors for the likelihood f(x | θ) are of the form

    π(θ | τ) = g(θ)^{τ_0} exp{c φ(θ) τ_1} / K(τ),

where K(τ) = K(τ_0, τ_1) is the normalising constant. After observing a sample x_1, ..., x_n the posterior density is

    π(θ | x, τ) = g(θ)^{τ_0 + H_0} exp{c φ(θ)(τ_1 + H_1)} / K(τ + H),

where H_0 = n, H_1 = Σ_i h(x_i), and H = (H_0, H_1). If the prior has the mixture form, then the posterior is proportional to

    Σ_l α_l π(θ | τ^(l)) g(θ)^n exp{c φ(θ) H_1}
    = Σ_l α_l g(θ)^{τ_0^(l) + H_0} exp{c φ(θ)(τ_1^(l) + H_1)} / K(τ^(l))
    = Σ_l α_l π(θ | x, τ^(l)) K(τ^(l) + H) / K(τ^(l)).

The normalising constant is the integral of this expression, and since each of the component densities π(θ | x, τ^(l)) has integral 1, the normalising constant is Σ_{l=1}^{d} α_l K(τ^(l) + H) / K(τ^(l)), so that the posterior density is

    Σ_l β_l π(θ | x, τ^(l)),  where  β_l = [α_l K(τ^(l) + H) / K(τ^(l))] / [Σ_{m=1}^{d} α_m K(τ^(m) + H) / K(τ^(m))].

That is, the posterior is again a mixture of members of the conjugate family, with posterior hyperparameters τ^(l) + H and posterior mixing probabilities β_l.
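As a concrete check of this mixture update, here is a minimal numerical sketch assuming a Poisson likelihood with a mixture of Gamma(a_l, b_l) priors (one instance of the conjugate family above; the component parameters and the data are invented for illustration). The weights it computes are the β_l = α_l K(τ^(l) + H)/K(τ^(l)) above, where for the Gamma family the prior normalising constant is K(a, b) = Γ(a)/b^a.

    # Sketch: posterior mixing weights for a mixture of Gamma priors with Poisson data.
    import numpy as np
    from scipy.special import gammaln

    def posterior_mixture(alphas, shapes, rates, x):
        """Return posterior mixing weights beta_l and updated (shape, rate) pairs."""
        x = np.asarray(x)
        S, n = x.sum(), x.size                       # sufficient statistics H_1, H_0
        shapes, rates = np.asarray(shapes, float), np.asarray(rates, float)
        post_shapes, post_rates = shapes + S, rates + n
        # log of K(tau^(l) + H) / K(tau^(l)), with K(a, b) = Gamma(a) / b^a
        log_ratio = (gammaln(post_shapes) - post_shapes * np.log(post_rates)
                     - gammaln(shapes) + shapes * np.log(rates))
        log_w = np.log(alphas) + log_ratio
        betas = np.exp(log_w - log_w.max())
        betas /= betas.sum()                          # normalise the posterior weights
        return betas, post_shapes, post_rates

    # Example: equal-weight mixture of a "small theta" and a "large theta" prior component
    betas, a_post, b_post = posterior_mixture(
        alphas=[0.5, 0.5], shapes=[1.0, 20.0], rates=[1.0, 2.0],
        x=[3, 5, 4, 6, 2])
    print(betas, a_post, b_post)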

2. Suppose that X has a Poisson distribution with unknown mean θ. Determine the conjugate prior family, and the associated posterior distribution, for θ. Determine the Jeffreys prior π_J for θ, and discuss whether the scale-invariant prior π_0(θ) = 1/θ might be preferable as a noninformative prior. Using π_J as the reference measure, find the maximum entropy prior under the constraints that the prior mean and variance of θ are both 1. (Just write it in terms of the constants λ_1 and λ_2 from lectures; do not solve for these.) Repeat for the reference measure π_0. (Again, just write it in terms of λ_1 and λ_2; do not solve for these.)

Solution: The Poisson probability mass function is

    f(x | θ) = e^{−θ} θ^x / x!,  x = 0, 1, 2, ...

An element π(θ) of the conjugate prior family is given by

    π(θ) ∝ e^{−τ_0 θ} θ^{τ_1},  θ > 0.

For this to be a proper prior we require τ_0 > 0 and τ_1 > −1. The posterior density of θ given x is then

    π(θ | x) ∝ e^{−(τ_0 + 1)θ} θ^{τ_1 + x},  θ > 0,

which is a Gamma distribution with parameters α = τ_1 + x + 1 and β = τ_0 + 1.

To obtain the Jeffreys prior we need the Fisher information I(θ). Note that

    ∂/∂θ log f(x | θ) = x/θ − 1,

so

    I(θ) = E[(∂/∂θ log f(X | θ))^2] = Var(X)/θ^2 = θ/θ^2 = 1/θ.

It follows that the Jeffreys prior is

    π_J(θ) ∝ θ^{−1/2},  θ > 0.

Comparison with the scale-invariant prior: If your relative beliefs about any two parameter values θ_1 and θ_2 depend only on their ratio θ_1/θ_2, then you will be led to the scale-invariant prior π_0(θ) = 1/θ. If on the other hand you insist that the prior distribution of θ is invariant under parametric transformations, then you are led to the Jeffreys prior. In this example, these two philosophies are incompatible.

Maximum entropy prior: The constraints are E_π(θ) = 1 and E_π((θ − 1)^2) = 1, and the reference measure is π_ref(θ) = π_J(θ) ∝ θ^{−1/2}. By definition, the prior density that maximises the entropy relative to the reference density π_ref and satisfies the m constraints is

    π(θ) = π_ref(θ) exp(Σ_{k=1}^{m} λ_k g_k(θ)) / ∫ π_ref(θ) exp(Σ_{k=1}^{m} λ_k g_k(θ)) dθ,

and in our case m = 2, g_1(θ) = θ, g_2(θ) = (θ − 1)^2, so that

    π(θ) ∝ θ^{−1/2} exp(λ_1 θ + λ_2 (θ − 1)^2).

Similarly, when the reference measure is π_ref = π_0, the maximum entropy prior is

    π(θ) ∝ θ^{−1} exp(λ_1 θ + λ_2 (θ − 1)^2).

Side remark (Poisson process): The probability density of the inter-arrival times in a Poisson process with rate λ is f(x | λ) = λ e^{−λx}. For this density the Fisher information is

    I(λ) = −E[∂^2/∂λ^2 (log λ − λx)] = 1/λ^2,

so that the Jeffreys prior for λ is proportional to 1/λ. Since θ = λT and T is a known constant, this implies that, from the perspective of inter-arrival times, the prior density for θ should also be proportional to 1/θ.
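The Fisher information computed above is easy to check by simulation. The sketch below (the value θ = 3 and the simulation size are arbitrary choices) estimates Var(∂/∂θ log f(X | θ)) by Monte Carlo and compares it with the theoretical value 1/θ that underlies the Jeffreys prior θ^{−1/2}.

    # Monte Carlo check that the Poisson Fisher information is 1/theta.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = 3.0
    x = rng.poisson(theta, size=200_000)

    # Score of one Poisson observation: d/dtheta log f(x | theta) = x/theta - 1
    score = x / theta - 1.0
    print(score.var())     # Monte Carlo estimate of I(theta)
    print(1.0 / theta)     # theoretical value 1/theta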

3. Consider a random sample of size n from a normal distribution with unknown mean θ and unknown variance φ, with the improper prior

    f(θ, φ) = φ^{−1},  φ > 0,

for (θ, φ). Calculate the joint posterior distribution and both marginal posteriors for (θ, φ). Show that the 100(1 − α)% HPD credible interval for θ in the posterior is

    x̄ ± t_{α/2} √( Σ(x_i − x̄)^2 / (n(n − 1)) ),

where t_{α/2} is the upper α/2-point of the t_{n−1}-distribution. Compare this with the classical confidence interval for θ.

Solution: The joint posterior p.d.f. of θ and φ is proportional to

    e^{−Σ(x_i − θ)^2/(2φ)} (2πφ)^{−n/2} φ^{−1}.

Let λ = 1/φ; then, using the change of variables formula (Jacobian φ^2 = 1/λ^2), the joint posterior p.d.f. of θ and λ is given by

    π(θ, λ) ∝ exp{ −(λ/2) Σ(x_i − θ)^2 } λ^{n/2 − 1}.

Now use the identity Σ(x_i − θ)^2 = Σ(x_i − x̄)^2 + n(x̄ − θ)^2 to give

    π(θ, λ) ∝ exp{ −(λ/2) Σ(x_i − x̄)^2 } λ^{n/2 − 1} exp{ −(λn/2)(x̄ − θ)^2 }.   (1)

Now integrate θ out, using

    ∫ exp{ −(λn/2)(x̄ − θ)^2 } dθ = √(2π/(λn)),

to give the marginal density

    π(λ) ∝ exp{ −(λ/2) Σ(x_i − x̄)^2 } λ^{(n−1)/2 − 1}.

It follows that λ has the Gamma(α, β) density with α = (n − 1)/2 and β = (1/2) Σ(x_i − x̄)^2. Changing variables, with y = λ Σ(x_i − x̄)^2, we have

    π(y) = y^{(n−1)/2 − 1} e^{−y/2} / ( Γ((n − 1)/2) 2^{(n−1)/2} ),  y > 0.

In other words, the posterior distribution of φ^{−1} Σ(x_i − x̄)^2 is χ^2_{n−1}.

From (1), the conditional posterior distribution of θ given λ is N(x̄, (λn)^{−1}), i.e.

    (θ − x̄)√(λn) | λ ∼ N(0, 1).

It follows that (a posteriori) (θ − x̄)√(λn) is independent of λ, so that

    (θ − x̄)√(λn) / √( λ Σ(x_i − x̄)^2 / (n − 1) ) ∼ t_{n−1},

i.e.

    (θ − x̄) / √( Σ(x_i − x̄)^2 / (n(n − 1)) ) ∼ t_{n−1},

the point being that this function of θ and x has the same distribution whether viewed as a random variable depending on x with θ fixed or as a random variable depending on θ with x fixed. This key result can also be obtained by integrating λ out of (1).

Finally, let t_{α/2} be the upper α/2-point of the t_{n−1}-distribution; then the 100(1 − α)% HPD credible interval is

    x̄ ± t_{α/2} √( Σ(x_i − x̄)^2 / (n(n − 1)) ),

which is exactly the same as the classical 100(1 − α)% confidence interval. However, the interpretation is different (see lecture notes).
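The agreement between the two intervals can also be seen numerically. The sketch below simulates a small data set (the parameters are invented), draws from the joint posterior by sampling λ from its Gamma marginal and then θ | λ from its normal conditional, and compares the equal-tailed 95% credible interval (which coincides with the HPD interval here, since the marginal posterior of θ is symmetric about x̄) with the classical t-interval.

    # Sketch: posterior simulation for (theta, lambda) versus the classical t-interval.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(loc=2.0, scale=1.5, size=12)
    n, xbar, ss = x.size, x.mean(), np.sum((x - x.mean()) ** 2)

    # lambda = 1/phi ~ Gamma((n-1)/2, rate ss/2), then theta | lambda ~ N(xbar, 1/(lambda*n))
    lam = rng.gamma(shape=(n - 1) / 2, scale=2.0 / ss, size=1_000_000)
    theta = rng.normal(loc=xbar, scale=1.0 / np.sqrt(lam * n))
    print(np.quantile(theta, [0.025, 0.975]))          # 95% credible interval

    # Classical 95% confidence interval: xbar +/- t * sqrt(ss / (n(n-1)))
    t = stats.t.ppf(0.975, df=n - 1)
    half = t * np.sqrt(ss / (n * (n - 1)))
    print(xbar - half, xbar + half)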

4. Suppose that x_1, ..., x_n is a random sample from a Poisson distribution with unknown mean θ. Two models for the prior distribution of θ are contemplated:

    π_1(θ) = e^{−θ},  θ > 0,  and  π_2(θ) = e^{−θ} θ,  θ > 0.

(a) Calculate the Bayes estimator of θ under both models, with quadratic loss function.

(b) The prior probabilities of model 1 and model 2 are assessed at 1/2 each. Calculate the Bayes factor for H_0: model 1 applies against H_1: model 2 applies.

Solution: (a) We calculate

    π_1(θ | x) ∝ e^{−nθ} θ^{Σ x_i} e^{−θ} = e^{−(n+1)θ} θ^{Σ x_i},

which we recognise as Gamma(Σ x_i + 1, n + 1). Under quadratic loss the Bayes estimator is the expected value of the posterior distribution,

    (Σ x_i + 1) / (n + 1).

For model 2,

    π_2(θ | x) ∝ e^{−nθ} θ^{Σ x_i} e^{−θ} θ = e^{−(n+1)θ} θ^{Σ x_i + 1},

which we recognise as Gamma(Σ x_i + 2, n + 1). The Bayes estimator is the expected value of the posterior distribution,

    (Σ x_i + 2) / (n + 1).

Note that the first model puts greater weight on smaller values of θ, so its posterior distribution is shifted to the left.

(b) The prior probabilities of model 1 and model 2 are assessed at probability 1/2 each. Then the Bayes factor is

    B(x) = ∫_0^∞ e^{−nθ} θ^{Σ x_i} e^{−θ} dθ / ∫_0^∞ e^{−nθ} θ^{Σ x_i} e^{−θ} θ dθ
         = [ Γ(Σ x_i + 1) / (n + 1)^{Σ x_i + 1} ] / [ Γ(Σ x_i + 2) / (n + 1)^{Σ x_i + 2} ]
         = (n + 1) / (Σ x_i + 1).

Note that in this setting, because the prior odds are 1,

    B(x) = P(model 1 | x) / P(model 2 | x) = P(model 1 | x) / (1 − P(model 1 | x)),

so that P(model 1 | x) = (1 + B(x)^{−1})^{−1}. Hence

    P(model 1 | x) = ( 1 + (Σ x_i + 1)/(n + 1) )^{−1} = ( 1 + (x̄ + 1/n)/(1 + 1/n) )^{−1},

which is decreasing as x̄ increases.
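A small numerical sketch of these formulas follows (the data vector is invented for illustration); it evaluates both Bayes estimators, the Bayes factor (n + 1)/(Σ x_i + 1), and the posterior probability of model 1.

    # Sketch: Bayes estimators and Bayes factor for the two Poisson priors.
    import numpy as np

    x = np.array([2, 0, 3, 1, 2])
    n, S = x.size, x.sum()

    # Posterior means of Gamma(S+1, n+1) and Gamma(S+2, n+1) (Bayes estimators, quadratic loss)
    est_model1 = (S + 1) / (n + 1)
    est_model2 = (S + 2) / (n + 1)

    # Bayes factor for model 1 against model 2, and posterior probability of model 1
    B = (n + 1) / (S + 1)
    p_model1 = 1.0 / (1.0 + 1.0 / B)
    print(est_model1, est_model2, B, p_model1)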

5. Let θ be a real-valued parameter and let f(x | θ) be the probability density function of an observation x, given θ. The prior distribution of θ has a discrete component that gives probability β to the point null hypothesis H_0: θ = θ_0. The remainder of the distribution is continuous, and conditional on θ ≠ θ_0 its density is g(θ).

(1) Derive an expression for π(θ_0 | x), the posterior probability of H_0.
(2) Derive the Bayes factor B(x) for the null hypothesis against the alternative.
(3) Express π(θ_0 | x) in terms of B(x).
(4) Explain how you would use B(x) to construct a most powerful test of size α for H_0 against the alternative θ ≠ θ_0.

Suppose that x_1, ..., x_n is a sample from a normal distribution with mean θ and variance v. Let H_0: θ = 0, let β = 1/2, and let g(θ) = (2πw^2)^{−1/2} exp{ −θ^2/(2w^2) }, for −∞ < θ < ∞. Show that, if the sample mean is observed to be 10(v/n)^{1/2}, then

(5) the most powerful test of size α = 0.05 will reject H_0 for any value of n;
(6) the posterior probability of H_0 converges to 1 as n → ∞.

Solution: (1) Following lectures, the posterior probability of θ_0 is

    π(θ_0 | x) = β f(x | θ_0) / ( β f(x | θ_0) + (1 − β) m(x) ),

where m(x) = ∫ f(x | θ) g(θ) dθ.

(2) The Bayes factor is the ratio of posterior odds to prior odds,

    B(x) = [ P(θ = θ_0 | x) / P(θ ≠ θ_0 | x) ] / [ P(θ = θ_0) / P(θ ≠ θ_0) ]
         = [ β f(x | θ_0) / ((1 − β) m(x)) ] · (1 − β)/β
         = f(x | θ_0) / m(x).

(3) From lectures,

    π(θ_0 | x) = ( 1 + (1 − β)/(β B(x)) )^{−1}.

(4) The simple hypothesis θ = θ_0 can be tested against the simple hypothesis that the density of x is ∫ f(x | θ) g(θ) dθ; i.e. H_0: x ∼ f(x | θ_0) and H_1: x ∼ m(x). The Neyman–Pearson Lemma says that the most powerful test of size α rejects H_0 when f(x | θ_0)/m(x) < C_α, where C_α is chosen such that

    ∫_{ {x : f(x | θ_0)/m(x) < C_α} } f(x | θ_0) dx = α.

So we reject H_0 when B(x) < C_α.

(5) We now have x = (x_1, ..., x_n). The statistic x̄ is sufficient. Under H_0, x̄ ∼ N(0, v/n), and under H_1, x̄ ∼ N(0, v/n + w^2). The likelihood ratio is decreasing in |x̄|, so the most powerful test rejects when |x̄|/√(v/n) > c_α. For a 5% test, c_α is certainly less than 2, so with the observed value x̄ = 10(v/n)^{1/2}, for which x̄/√(v/n) = 10, the test rejects H_0 at the 5% level for every n.

(6) However, with this observed value of x̄ the Bayes factor is

    B(x) = [ (v/n)^{−1/2} exp{ −10^2 (v/n)/(2(v/n)) } ] / [ (v/n + w^2)^{−1/2} exp{ −10^2 (v/n)/(2(v/n + w^2)) } ]
         = √( (v/n + w^2)/(v/n) ) · exp{ −50 + 50(v/n)/(v/n + w^2) } → ∞  as n → ∞.

Therefore the posterior probability of H_0 converges to 1 (contrasting with the conclusion of the classical test).
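The contrast between the classical test and π(θ_0 | x) (often called Lindley's paradox) can be illustrated numerically. The sketch below takes v = w^2 = 1 and β = 1/2 as in the problem; the grid of sample sizes is an arbitrary choice. Since the Bayes factor here grows only like √n · e^{−50}, the posterior probability of H_0 approaches 1 only for extremely large n, even though the classical z-statistic equals 10 and the test rejects at every n.

    # Sketch: posterior probability of H0 when the sample mean is 10 standard errors from 0.
    import numpy as np

    v, w2, beta = 1.0, 1.0, 0.5
    for n in [1e2, 1e6, 1e20, 1e45, 1e60, 1e80]:
        s2_0 = v / n                   # Var(x-bar) under H0
        s2_1 = v / n + w2              # marginal Var(x-bar) under H1, theta ~ N(0, w^2)
        xbar2 = 100.0 * v / n          # squared observed mean, 10 standard errors from 0
        # log Bayes factor: log f(xbar | H0) - log m(xbar)
        logB = 0.5 * np.log(s2_1 / s2_0) - 0.5 * xbar2 * (1 / s2_0 - 1 / s2_1)
        post_H0 = 1.0 / (1.0 + ((1 - beta) / beta) * np.exp(-logB))
        print(f"n = {n:.0e}   z = {np.sqrt(xbar2 / s2_0):.1f}   P(H0 | x) = {post_H0:.3g}")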