Statistical Theory MT 2006 Problems 4: Solution sketches

1. Suppose that $X$ has a Poisson distribution with unknown mean $\theta$. Determine the conjugate prior, and associated posterior distribution, for $\theta$. Determine the Jeffreys prior $\pi_J$ for $\theta$, and discuss whether the scale-invariant prior $\pi_0(\theta) = 1/\theta$ might be preferable as a noninformative prior. Using $\pi_J$ as the reference measure, find the maximum entropy prior under the constraints that the prior mean and variance of $\theta$ are both 1. (Just write it in terms of the constants $\lambda_1$ and $\lambda_2$ from lectures; do not solve for these.) Repeat, for the reference measure $\pi_0$. (Again, just write it in terms of $\lambda_1$ and $\lambda_2$; do not solve for these.) Think of $X$ as the number of events during an interval of time of length $T$ in a Poisson process of rate $\lambda$. In this case, $\theta = \lambda T$. Use this to justify the use of $\pi_0$ as a noninformative prior.

Solution: The Poisson probability mass function is
\[ f(x \mid \theta) = \frac{e^{-\theta}\theta^x}{x!}, \qquad x = 0, 1, \ldots \]
The conjugate prior $\pi(\theta)$ is given by
\[ \pi(\theta) \propto e^{-\tau_0\theta}\theta^{\tau_1}, \qquad \theta > 0. \]
For this to be a proper prior we require $\tau_0 > 0$ and $\tau_1 > -1$. The posterior density of $\theta$ given $x$ is then given by
\[ \pi(\theta \mid x) \propto e^{-(\tau_0+1)\theta}\theta^{\tau_1+x}, \qquad \theta > 0, \]
which is a Gamma distribution with parameters $\alpha = \tau_1 + x + 1$ and $\beta = \tau_0 + 1$.

To obtain the Jeffreys prior we need the Fisher information $I(\theta)$. Note that
\[ \frac{\partial}{\partial\theta}\log f(x \mid \theta) = \frac{x}{\theta} - 1, \qquad -\frac{\partial^2}{\partial\theta^2}\log f(x \mid \theta) = \frac{x}{\theta^2}, \]
so
\[ I(\theta) = E\left(\frac{X}{\theta^2}\right) = \frac{\theta}{\theta^2} = \frac{1}{\theta}. \]
It follows that the Jeffreys prior is
\[ \pi_J(\theta) \propto I(\theta)^{1/2} = \theta^{-1/2}, \qquad \theta > 0. \]
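As a quick numerical sanity check of the conjugacy claim (not part of the required derivation), one can normalise prior times likelihood on a grid and compare with the claimed Gamma density. The hyperparameters $\tau_0, \tau_1$ and the datum $x$ below are arbitrary illustrative choices.

```python
# Sketch: check that prior exp(-tau0*theta)*theta**tau1 times the Poisson
# likelihood yields a Gamma(tau1 + x + 1, tau0 + 1) posterior.
# tau0, tau1, x are hypothetical values chosen for illustration only.
import numpy as np
from scipy import stats

tau0, tau1, x = 2.0, 3.0, 4

theta = np.linspace(1e-6, 30.0, 300_000)                       # grid over theta > 0
unnorm = np.exp(-(tau0 + 1.0) * theta) * theta ** (tau1 + x)   # prior * likelihood
post = unnorm / (unnorm.sum() * (theta[1] - theta[0]))         # normalise on the grid

# scipy parametrises the Gamma by shape a and scale = 1/rate.
claimed = stats.gamma.pdf(theta, a=tau1 + x + 1, scale=1.0 / (tau0 + 1.0))
print(np.abs(post - claimed).max())                            # ~0 up to grid error
```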
Comparison with scale-invariant prior: If your relative beliefs about any two parameter values $\theta_1$ and $\theta_2$ depend only on their ratio $\theta_1/\theta_2$, then you will be led to the scale-invariant prior, $\pi_0(\theta) = 1/\theta$. If on the other hand you insist that the prior distribution of $\theta$ is invariant under parametric transformations, then you are led to the Jeffreys prior. In this example, these two philosophies are incompatible.

Maximum entropy prior: The constraints are $E_\pi(\theta) = 1$ and $E_\pi((\theta-1)^2) = 1$, and the reference measure is $\pi_{\mathrm{ref}}(\theta) = \pi_J(\theta) \propto \theta^{-1/2}$. By definition, the prior density that maximises the entropy relative to the reference density $\pi_{\mathrm{ref}}$ and satisfies $m$ constraints is
\[ \pi(\theta) = \frac{\pi_{\mathrm{ref}}(\theta)\exp\left(\sum_{k=1}^m \lambda_k g_k(\theta)\right)}{\int \pi_{\mathrm{ref}}(\theta)\exp\left(\sum_{k=1}^m \lambda_k g_k(\theta)\right)d\theta}, \]
and in our case $m = 2$, $g_1(\theta) = \theta$, $g_2(\theta) = (\theta-1)^2$, and so
\[ \pi(\theta) \propto \theta^{-1/2}\exp\left(\lambda_1\theta + \lambda_2(\theta-1)^2\right). \]
Similarly, when the reference measure is $\pi_{\mathrm{ref}} = \pi_0$, the maximum entropy prior is
\[ \pi(\theta) \propto \theta^{-1}\exp\left(\lambda_1\theta + \lambda_2(\theta-1)^2\right). \]

Poisson process: The probability density of the inter-arrival times in a Poisson process is $f(x \mid \lambda) = \lambda e^{-\lambda x}$. For this density the Fisher information is
\[ I(\lambda) = E\left(-\frac{\partial^2}{\partial\lambda^2}\left(\log\lambda - \lambda x\right)\right) = \frac{1}{\lambda^2}, \]
so that the Jeffreys prior for $\lambda$ is proportional to $1/\lambda$. Since $\theta = \lambda T$ and $T$ is a known constant, this implies that, from the perspective of inter-arrival times, the prior density for $\theta$ should also be proportional to $1/\theta$.
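The problem asks only for the form of the maximum entropy prior, but as an aside one could pin down $\lambda_1$ and $\lambda_2$ numerically from the two moment constraints. A sketch for the reference measure $\pi_J$ follows; the starting point is a guess, $\lambda_2$ must be negative for the integrals to converge, and the root-finder is not guaranteed to converge.

```python
# Sketch (beyond what the problem requires): solve numerically for lambda1,
# lambda2 so that the maximum entropy prior relative to pi_J ∝ theta**(-1/2)
# has E(theta) = 1 and E((theta - 1)**2) = 1.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

def constraint_gap(lams):
    l1, l2 = lams
    dens = lambda t: t ** -0.5 * np.exp(l1 * t + l2 * (t - 1.0) ** 2)
    z = quad(dens, 0.0, np.inf)[0]                                     # normalising constant
    m1 = quad(lambda t: t * dens(t), 0.0, np.inf)[0] / z               # E(theta)
    m2 = quad(lambda t: (t - 1.0) ** 2 * dens(t), 0.0, np.inf)[0] / z  # E((theta-1)^2)
    return [m1 - 1.0, m2 - 1.0]

# Starting guess is a judgment call; keep lambda2 < 0 for integrability.
lam1, lam2 = fsolve(constraint_gap, x0=[-0.5, -0.5])
print(lam1, lam2)
```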
2. Consider a random sample of size $n$ from a normal distribution with unknown mean $\theta$ and unknown variance $\phi$, with the improper prior for $(\theta, \phi)$
\[ f(\theta, \phi) = \phi^{-1}, \qquad \phi > 0. \]
Calculate the joint posterior distribution and both marginal posteriors for $(\theta, \phi)$. Calculate the $100(1-\alpha)\%$ HPD credible interval for $\theta$ in the posterior. Compare this with the classical confidence interval for $\theta$.

Solution: The joint posterior p.d.f. of $\theta$ and $\phi$ is proportional to
\[ e^{-\sum(x_i-\theta)^2/(2\phi)}\,(2\pi\phi)^{-n/2}\,\frac{1}{\phi}. \]
Let $\lambda = 1/\phi$; then, using the change of variable formula (Jacobian $\partial\phi/\partial\lambda = -1/\lambda^2$), the joint posterior p.d.f. of $\theta$ and $\lambda$ is given by
\[ \pi(\theta, \lambda) \propto \exp\left\{-\frac{\lambda}{2}\sum(x_i-\theta)^2\right\}\lambda^{n/2-1}. \]
Now use the identity
\[ \sum(x_i-\theta)^2 = \sum(x_i-\bar{x})^2 + n(\bar{x}-\theta)^2 \]
to give
\[ \pi(\theta, \lambda) \propto \exp\left\{-\frac{\lambda}{2}\sum(x_i-\bar{x})^2\right\}\lambda^{n/2-1}\exp\left\{-\frac{\lambda n}{2}(\bar{x}-\theta)^2\right\}. \tag{1} \]
Now integrate $\theta$ out, using
\[ \int \exp\left\{-\frac{\lambda n}{2}(\bar{x}-\theta)^2\right\}d\theta = \sqrt{\frac{2\pi}{\lambda n}}, \]
to give the marginal density
\[ \pi(\lambda) \propto \exp\left\{-\frac{\lambda}{2}\sum(x_i-\bar{x})^2\right\}\lambda^{(n-1)/2-1}. \]
It follows that $\lambda$ has the Gamma density $\mathrm{Gamma}(\alpha, \beta)$ with $\alpha = (n-1)/2$ and $\beta = \frac{1}{2}\sum(x_i-\bar{x})^2$. Changing variables, with $y = \lambda\sum(x_i-\bar{x})^2$, we have
\[ \pi(y) = \frac{y^{(n-1)/2-1}e^{-y/2}}{\Gamma((n-1)/2)\,2^{(n-1)/2}}, \qquad y > 0. \]
In other words, the distribution of $\phi^{-1}\sum(x_i-\bar{x})^2$ is $\chi^2_{n-1}$.

From (1) the conditional posterior distribution of $\theta$ given $\lambda$ is $N(\bar{x}, (\lambda n)^{-1})$, i.e. $(\theta - \bar{x})\sqrt{\lambda n} \sim N(0, 1)$. It follows that (a posteriori) $(\theta - \bar{x})\sqrt{\lambda n}$ is independent of $\lambda$, so that
\[ \frac{(\theta - \bar{x})\sqrt{\lambda n}}{\sqrt{\lambda\sum(x_i-\bar{x})^2/(n-1)}} \sim t_{n-1}, \qquad \text{i.e.} \qquad \frac{\theta - \bar{x}}{\sqrt{\sum(x_i-\bar{x})^2/(n(n-1))}} \sim t_{n-1}, \]
the point being that this function of $\theta$ and $x$ has the same distribution when viewed either as a random variable dependent on $x$ with $\theta$ fixed or as a random variable dependent on $\theta$ with $x$ fixed. This key result can also be obtained by integrating $\lambda$ out of (1).

Finally, let $t_{\alpha/2}$ be the upper $\alpha/2$ point of the $t_{n-1}$-distribution; then the $100(1-\alpha)\%$ HPD credible interval is
\[ \bar{x} \pm t_{\alpha/2}\sqrt{\frac{\sum(x_i-\bar{x})^2}{n(n-1)}}, \]
which is exactly the same as the classical $100(1-\alpha)\%$ confidence interval. However the interpretation is different (see lecture notes).
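As a numerical illustration with simulated data (purely hypothetical values), the HPD interval above can be compared directly with the classical interval as computed by scipy; the two agree, as the algebra shows.

```python
# Sketch: the 95% HPD interval xbar ± t_{alpha/2} * sqrt(S/(n(n-1))), where
# S = sum((x_i - xbar)**2), equals the classical t confidence interval.
# The data below are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=20)
n, xbar = len(x), x.mean()
S = np.sum((x - xbar) ** 2)

alpha = 0.05
t_quant = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)   # upper alpha/2 point of t_{n-1}
half = t_quant * np.sqrt(S / (n * (n - 1)))
print((xbar - half, xbar + half))                    # HPD credible interval

# Classical confidence interval, for comparison:
print(stats.t.interval(1.0 - alpha, df=n - 1, loc=xbar,
                       scale=np.sqrt(S / (n * (n - 1)))))
```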
3. Suppose that $x_1, \ldots, x_n$ is a random sample from a Poisson distribution with unknown mean $\theta$. Two models for the prior distribution of $\theta$ are contemplated:
\[ \pi_1(\theta) = e^{-\theta}, \quad \theta > 0, \qquad \text{and} \qquad \pi_2(\theta) = e^{-\theta}\theta, \quad \theta > 0. \]
(a) Calculate the Bayes estimator of $\theta$ under both models, with quadratic loss function.
(b) The prior probabilities of model 1 and model 2 are assessed at probability 1/2 each. Calculate the Bayes factor for $H_0$: model 1 applies against $H_1$: model 2 applies.

Solution: (a) We calculate
\[ \pi_1(\theta \mid x) \propto e^{-n\theta}\theta^{\sum x_i}e^{-\theta} = e^{-(n+1)\theta}\theta^{\sum x_i}, \]
which we recognize as $\mathrm{Gamma}(\sum x_i + 1, n + 1)$. Under quadratic loss the Bayes estimator is the expected value of the posterior distribution,
\[ \frac{\sum x_i + 1}{n + 1}. \]
For model 2,
\[ \pi_2(\theta \mid x) \propto e^{-n\theta}\theta^{\sum x_i}e^{-\theta}\theta = e^{-(n+1)\theta}\theta^{\sum x_i + 1}, \]
which we recognize as $\mathrm{Gamma}(\sum x_i + 2, n + 1)$. The Bayes estimator is the expected value of the posterior distribution,
\[ \frac{\sum x_i + 2}{n + 1}. \]
Note that the first prior puts greater weight on smaller values of $\theta$, so the corresponding posterior distribution is shifted to the left.

(b) The prior probabilities of model 1 and model 2 are assessed at probability 1/2 each. Then the Bayes factor is
\[ B(x) = \frac{\int_0^\infty e^{-n\theta}\theta^{\sum x_i}e^{-\theta}\,d\theta}{\int_0^\infty e^{-n\theta}\theta^{\sum x_i}e^{-\theta}\theta\,d\theta} = \frac{\Gamma(\sum x_i + 1)/(n+1)^{\sum x_i + 1}}{\Gamma(\sum x_i + 2)/(n+1)^{\sum x_i + 2}} = \frac{n+1}{\sum x_i + 1}. \]
Note that in this setting
\[ B(x) = \frac{P(\text{model 1} \mid x)}{P(\text{model 2} \mid x)} = \frac{P(\text{model 1} \mid x)}{1 - P(\text{model 1} \mid x)}, \]
so that
\[ P(\text{model 1} \mid x) = \left(1 + B(x)^{-1}\right)^{-1}. \]
Hence
\[ P(\text{model 1} \mid x) = \left(1 + \frac{\sum x_i + 1}{n+1}\right)^{-1} = \left(1 + \frac{\bar{x} + 1/n}{1 + 1/n}\right)^{-1}, \]
which is decreasing as $\bar{x}$ increases.

4. Let $\theta$ be a real-valued parameter and let $f(x \mid \theta)$ be the probability density function of an observation $x$, given $\theta$. The prior distribution of $\theta$ has a discrete component that gives probability $\beta$ to the point null hypothesis $H_0$: $\theta = \theta_0$. The remainder of the distribution is continuous and, conditional on $\theta \neq \theta_0$, its density is $g(\theta)$.

(1) Derive an expression for $\pi(\theta_0 \mid x)$, the posterior probability of $H_0$.
(2) Derive the Bayes factor $B(x)$ for the null hypothesis against the alternative.
(3) Express $\pi(\theta_0 \mid x)$ in terms of $B(x)$.
(4) Explain how you would use $B(x)$ to construct a most powerful test of size $\alpha$ for $H_0$, against the alternative $\theta \neq \theta_0$.

Suppose that $x_1, \ldots, x_n$ is a sample from a normal distribution with mean $\theta$ and variance $v$. Let $H_0$: $\theta = 0$, let $\beta = 1/2$, and let $g(\theta) = (2\pi w^2)^{-1/2}\exp\{-\theta^2/(2w^2)\}$, for $-\infty < \theta < \infty$. Show that, if the sample mean is observed to be $10(v/n)^{1/2}$, then

(5) the most powerful test of size $\alpha = 0.05$ will reject $H_0$ for any value of $n$;
(6) the posterior probability of $H_0$ converges to 1, as $n \to \infty$.

Solution: (1) Following lectures, the posterior probability of $\theta_0$ is
\[ \pi(\theta_0 \mid x) = \frac{\beta f(x \mid \theta_0)}{\beta f(x \mid \theta_0) + (1-\beta)m(x)}, \qquad \text{where} \quad m(x) = \int f(x \mid \theta)g(\theta)\,d\theta. \]
(2) The Bayes factor is
\[ B(x) = \frac{P(\theta = \theta_0 \mid x)\,/\,P(\theta \neq \theta_0 \mid x)}{P(\theta = \theta_0)\,/\,P(\theta \neq \theta_0)} = \frac{\beta f(x \mid \theta_0)}{(1-\beta)m(x)} \Big/ \frac{\beta}{1-\beta} = \frac{f(x \mid \theta_0)}{m(x)}. \]
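A short numerical sketch of the quantities just derived, on a made-up Poisson sample (the data and parameter values are purely illustrative):

```python
# Sketch: Bayes estimators under the two priors and the Bayes factor
# B(x) = (n+1)/(sum(x_i)+1), on a hypothetical Poisson sample.
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=2.5, size=15)
n, s = len(x), int(x.sum())

est1 = (s + 1) / (n + 1)          # posterior mean under pi_1(theta) = e^{-theta}
est2 = (s + 2) / (n + 1)          # posterior mean under pi_2(theta) = theta*e^{-theta}
B = (n + 1) / (s + 1)             # Bayes factor, model 1 against model 2
p1 = 1.0 / (1.0 + 1.0 / B)        # P(model 1 | x) under equal prior odds
print(est1, est2, B, p1)
```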
(3) From lectures,
\[ \pi(\theta_0 \mid x) = \left(1 + \frac{1-\beta}{\beta B(x)}\right)^{-1}. \]
(4) The simple hypothesis $\theta = \theta_0$ can be tested against the simple hypothesis that the density of $x$ is $\int f(x \mid \theta)g(\theta)\,d\theta$; i.e. $H_0$: $x \sim f(x \mid \theta_0)$ and $H_1$: $x \sim m(x)$. The Neyman-Pearson Lemma says that the most powerful test of size $\alpha$ rejects $H_0$ when $f(x \mid \theta_0)/m(x) < C_\alpha$, where $C_\alpha$ is chosen such that
\[ \int_{\{x : f(x \mid \theta_0)/m(x) < C_\alpha\}} f(x \mid \theta_0)\,dx = \alpha. \]
So we reject when $B(x) < C_\alpha$.

(5) We now have $x = (x_1, \ldots, x_n)$. The statistic $\bar{x}$ is sufficient. Under $H_0$, $\bar{x} \sim N(0, v/n)$ and under $H_1$, $\bar{x} \sim N(0, v/n + w^2)$. Since the likelihood ratio is decreasing in $|\bar{x}|$, the most powerful test rejects when $|\bar{x}|/\sqrt{v/n} > c_\alpha$. For a 5% test, $c_\alpha = 1.96$, which is certainly less than the observed value $\bar{x}/\sqrt{v/n} = 10$; therefore the test will reject $H_0$ at the 5% level for any value of $n$.

(6) However, the Bayes factor is
\[ B(x) = \frac{(2\pi v/n)^{-1/2}\exp\left\{-\dfrac{100(v/n)}{2(v/n)}\right\}}{(2\pi(v/n + w^2))^{-1/2}\exp\left\{-\dfrac{100(v/n)}{2(v/n + w^2)}\right\}} = \sqrt{1 + \frac{nw^2}{v}}\,\exp\left\{-50 + \frac{50(v/n)}{v/n + w^2}\right\} \longrightarrow \infty \]
as $n \to \infty$, since the exponential factor tends to the constant $e^{-50}$ while the square-root factor grows without bound. Therefore the posterior probability of $H_0$ converges to 1 (contrasting with the conclusion of the classical test).
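A numerical sketch of (5) and (6), with assumed values $v = w^2 = 1$: the standardised statistic is pinned at 10, so the 5% test rejects for every $n$, while $B(x) \approx \sqrt{nw^2/v}\,e^{-50}$ grows like $\sqrt{n}$. The limit in (6) is real, although for this extreme observation the Bayes factor only exceeds 1 at astronomically large $n$.

```python
# Sketch: with xbar = 10*sqrt(v/n) fixed, compute the Bayes factor
# B = f(xbar | H0)/m(xbar) and the posterior probability of H0 (beta = 1/2).
# v and w2 are assumed values for illustration.
import numpy as np
from scipy import stats

v, w2 = 1.0, 1.0
for n in [1e2, 1e10, 1e43, 1e45]:
    xbar = 10.0 * np.sqrt(v / n)
    f0 = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(v / n))      # under H0
    m = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(v / n + w2))  # under H1
    B = f0 / m                                                    # grows like sqrt(n)
    print(f"n = {n:.0e}:  B = {B:.3g},  P(H0 | x) = {1.0 / (1.0 + 1.0 / B):.3g}")
```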