INTRODUCTION TO BAYESIAN METHODS II

Abstract. We will revisit point estimation and hypothesis testing from the Bayesian perspective.

1. Bayes estimators

Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, given by f(x | θ), where Θ has prior pdf r(θ). Suppose that we observe X = x and have calculated the posterior distribution s(θ | x). We can now compute δ(x) := E(Θ | X = x). In the case that s(θ | x) is continuous, we have
δ(x) = ∫ θ s(θ | x) dθ.
One natural point estimate for θ is δ(x), and the associated point estimator is δ(X) = E(Θ | X). This is just one important example of a Bayes estimator. From a variation of the first exercise on the first homework, you can verify that δ(x) is the value a for which E[(Θ − a)^2 | X = x] is minimized. Thus in the case where L(θ, θ') = |θ − θ'|^2, we have that δ(x) minimizes E[L(Θ, δ(x)) | X = x].

In general, we can consider other choices of L. The function L is called a loss function, and we aim to find a δ(x) which minimizes the conditional expected loss. The function δ is called a decision function and is a Bayes estimate of θ if it is a minimizer. More generally, if we are interested in estimating a function of θ, given by g(θ), a Bayes estimator of g(θ) is a decision function δ(x) which minimizes E[L(g(Θ), δ(x)) | X = x]. In this course we will mostly be concerned with the squared loss function L(θ, θ') = |θ − θ'|^2. There are many other reasonable choices of loss function to consider, for example, the absolute loss given by L(θ, θ') = |θ − θ'|.
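The following is a minimal numerical sketch (not part of the notes) illustrating the two loss functions just mentioned: on a discretized posterior, brute-force minimization of the posterior expected loss recovers the posterior mean under squared loss and the posterior median under absolute loss (this anticipates Exercises 1 and 2 below). The grid, the unnormalized posterior, and all names in the code are hypothetical choices made for illustration only.

# Illustrative sketch (not from the notes): for a discretized posterior s(theta | x),
# the value a minimizing E[(Theta - a)^2 | X = x] is the posterior mean, while the
# minimizer of E[|Theta - a| | X = x] is (approximately) the posterior median.
import numpy as np

theta = np.linspace(0.01, 10.0, 2000)        # grid of theta values (arbitrary)
weights = theta**2 * np.exp(-theta)          # unnormalized posterior, chosen arbitrarily
post = weights / weights.sum()               # normalized posterior probabilities

def expected_loss(a, loss):
    """Posterior expected loss E[loss(Theta, a) | X = x] on the grid."""
    return np.sum(loss(theta, a) * post)

squared = lambda t, a: (t - a)**2
absolute = lambda t, a: np.abs(t - a)

# Brute-force minimization over candidate values of a.
best_sq = theta[np.argmin([expected_loss(a, squared) for a in theta])]
best_abs = theta[np.argmin([expected_loss(a, absolute) for a in theta])]

post_mean = np.sum(theta * post)
post_median = theta[np.searchsorted(np.cumsum(post), 0.5)]

print(best_sq, post_mean)      # these two should agree (squared loss -> posterior mean)
print(best_abs, post_median)   # these two should agree (absolute loss -> posterior median)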
Given a decision function δ and a loss function L, the risk function is defined to be R_δ(θ) = E L(θ, δ(X)); the expectation here is with respect to the conditional distribution of X given Θ = θ, so that in the continuous case
E L(θ, δ(X)) = ∫ L(θ, δ(x)) L(x | θ) dx.
An application of Fubini's theorem also shows that a Bayes estimator minimizes the expected risk. Since s(θ | x) f_X(x) = L(x | θ) r(θ), we have
E R_δ(Θ) = ∫ [ ∫ L(θ, δ(x)) L(x | θ) dx ] r(θ) dθ
         = ∫ [ ∫ L(θ, δ(x)) s(θ | x) f_X(x) dx ] dθ
         = ∫ [ ∫ L(θ, δ(x)) s(θ | x) dθ ] f_X(x) dx
         = ∫ E[L(Θ, δ(x)) | X = x] f_X(x) dx.
We see that if δ(x) is a Bayes estimate, then it does more than minimize the expected risk, since it minimizes E[L(Θ, δ(x)) | X = x] for every x!

Exercise 1. Let Y be a random variable. Set g(a) = E(Y − a)^2. Minimize g.

Exercise 2. Let Y be a continuous random variable. Set g(a) = E|Y − a|. Minimize g.

Exercise 3. Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, where X | θ ~ Unif(0, θ) and Θ has the Pareto distribution with scale parameter b > 0 and shape parameter α > 0. Find the Bayes estimator (with respect to the squared loss function) for θ.

Solution. We compute the posterior distribution, and if we recognize it, then we will know E(Θ | X). Let r be the prior pdf. Recall that
r(θ) = (α b^α / θ^(α+1)) [θ > b].
Let x ∈ (0, θ)^n, let t(x) = max{x_1, ..., x_n}, and let s(θ | x) denote the posterior distribution. Recall that t(X) is a sufficient statistic for θ, and conditional on Θ = θ, the pdf for t(X) is given by
g(t; θ) = (n t^(n−1) / θ^n) [t ∈ (0, θ)].
We have that
s(θ | t) ∝ g(t; θ) r(θ)
         ∝ (n t^(n−1) / θ^n) (α b^α / θ^(α+1)) [θ > b][θ > t]
         ∝ (1 / θ^(α+n+1)) [θ > max{t, b}],
so that s(θ | t) is the pdf of a Pareto distribution with posterior hyperparameters α' = α + n and b' = max{t, b}. Thus the Pareto family is a conjugate family for the uniform scale family. Recall that a Pareto random variable with parameters α and b has mean αb/(α − 1) for α > 1, from which it follows that
E(Θ | X) = ((α + n) / (α + n − 1)) max{t(X), b}.

So we see from Exercise 3 that the key to computing a Bayes estimator boils down to computing the posterior distribution. In the next exercises we will do some computations with the inverse gamma distribution.

2. Examples with the inverse gamma distribution

Exercise 4. We say that a positive real-valued random variable X has the inverse gamma distribution with parameters α > 0 and β > 0 if it has pdf given by
f(x; α, β) = (β^α / Γ(α)) x^(−α−1) e^(−β/x) [x > 0].
Prove that if W has the gamma distribution with parameters α' = α > 0 and β' = 1/β, then 1/W =_d X. Deduce that
E X^n = β^n / ((α − 1) ⋯ (α − n)), for α > n.

Solution. Let x > 0. We have
P(1/W ≤ x) = P(1/x ≤ W) = ∫_{1/x}^∞ g(w; α, 1/β) dw,
where g is the pdf for W. The chain rule and the fundamental theorem of calculus give that the pdf for 1/W is given by
(1/x^2) (β^α / Γ(α)) (1/x)^(α−1) e^(−β/x) = f(x; α, β),
as required. The moment result now follows from a previous exercise.
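As a quick sanity check on Exercise 4, here is a rough simulation sketch (not part of the notes; the parameter values are arbitrary): sample W from the gamma distribution with shape α and scale 1/β, take reciprocals, and compare the empirical moments of 1/W with β^n/((α − 1)⋯(α − n)).

# Rough Monte Carlo check of Exercise 4 (illustrative only; alpha, beta chosen arbitrarily).
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 5.0, 2.0
w = rng.gamma(shape=alpha, scale=1.0 / beta, size=10**6)  # W ~ Gamma(alpha, scale 1/beta)
x = 1.0 / w                                               # then X = 1/W ~ InvGamma(alpha, beta)

# Compare empirical moments E X^n with beta^n / ((alpha-1)(alpha-2)...(alpha-n)), n = 1, 2.
for n in (1, 2):
    theoretical = beta**n / np.prod([alpha - k for k in range(1, n + 1)])
    print(n, np.mean(x**n), theoretical)   # the two numbers should be close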
Exercise 5. Let α > 0 be known. Let X = (X_1, ..., X_n) be a random sample from the inverse gamma distribution with parameters α and β, where β > 0 is unknown. Show that
T = Σ_{i=1}^n 1/X_i
is a sufficient statistic for β.

Solution. Let x ∈ (0, ∞)^n. Let t = 1/x_1 + ⋯ + 1/x_n. We have that
L(x; β) = Π_{i=1}^n (β^α / Γ(α)) e^(−β/x_i) x_i^(−α−1) = β^(αn) e^(−tβ) Π_{i=1}^n x_i^(−α−1) / Γ(α);
so that we can apply the Neyman factorization with
g(t; β) = β^(αn) e^(−tβ)
and
H(x) = Π_{i=1}^n x_i^(−α−1) / Γ(α).

Exercise 6. Let α, α_0, β_0 be known. Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, where X | θ ~ InvGamma(α, θ) and Θ ~ Gamma(α_0, β_0). Find the posterior distribution.

Solution. From the previous exercise, we have that L(x | θ) = g(t; θ) H(x); notice that g may not be the pdf of the sufficient statistic T. Let r(θ) be the prior distribution and s be the posterior. We have
s(θ | t) ∝ g(t; θ) r(θ)
         ∝ θ^(αn) e^(−tθ) · (1 / (Γ(α_0) β_0^(α_0))) θ^(α_0−1) e^(−θ/β_0)
         ∝ θ^(αn+α_0−1) e^(−θ(t + 1/β_0)).
We recognize that s(θ | t) is the pdf of a gamma distribution with posterior hyperparameters α' = αn + α_0 and 1/β' = t + 1/β_0.
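A minimal sketch of the conjugate update in Exercise 6 (not from the notes; the data values and the helper name are hypothetical): with α known and a Gamma(α_0, β_0) prior on θ in the scale parameterization, the posterior is Gamma(αn + α_0, β') with 1/β' = t + 1/β_0, where t = Σ 1/x_i.

# Hypothetical helper implementing the conjugate update of Exercise 6.
# Prior: theta ~ Gamma(a0, scale=b0); data: X_i | theta ~ InvGamma(alpha, theta), alpha known.
import numpy as np

def posterior_hyperparameters(x, alpha, a0, b0):
    """Return (a_post, b_post) so that theta | x ~ Gamma(shape=a_post, scale=b_post)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.sum(1.0 / x)            # sufficient statistic from Exercise 5
    a_post = alpha * n + a0        # posterior shape: alpha*n + alpha_0
    b_post = 1.0 / (t + 1.0 / b0)  # posterior scale: 1/beta' = t + 1/beta_0
    return a_post, b_post

# Example usage with made-up numbers.
x = [0.8, 1.3, 0.5, 2.1]
print(posterior_hyperparameters(x, alpha=2.0, a0=3.0, b0=1.5))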
Exercise 7. Let α, β be known. Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, where X | θ is exponential with mean θ and Θ ~ InvGamma(α, β). Find the posterior distribution.

Solution. Let x ∈ (0, ∞)^n. Let t = x_1 + ⋯ + x_n. We have
s(θ | x) ∝ L(x; θ) r(θ)
         ∝ (1/θ^n) e^(−t/θ) · (β^α / Γ(α)) θ^(−α−1) e^(−β/θ)
         ∝ θ^(−α−n−1) e^(−(β+t)/θ).
Thus we recognize that s(θ | x) = s(θ | t) is the pdf of an inverse gamma distribution with posterior hyperparameters α' = α + n and β' = β + t.

Exercise 8. Let α, β be known. Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, where X | θ is normal with mean 0 and variance θ and Θ ~ InvGamma(α, β). Find the posterior distribution.

Solution. Let x ∈ R^n. Let t = (1/2) Σ_{i=1}^n x_i^2. We have that
s(θ | x) ∝ L(x; θ) r(θ)
         ∝ θ^(−n/2) e^(−t/θ) · (β^α / Γ(α)) θ^(−α−1) e^(−β/θ)
         ∝ θ^(−α−n/2−1) e^(−(t+β)/θ).
So we recognize that s(θ | x) = s(θ | t) has the inverse gamma distribution with posterior hyperparameters α' = α + n/2 and β' = β + t.

3. Credible intervals

First, let us recall confidence intervals in the classical setting. Let X = (X_1, ..., X_n) be a random sample from f_θ. Let u(X) < v(X). Suppose that
P_θ[θ ∈ (u(X), v(X))] = 1 − α;
that is, the random interval (u(X), v(X)) contains θ with probability 1 − α. If we observe X = x, then we call (u(x), v(x)) a 100(1 − α) percent confidence interval for θ. Note that θ is either in the confidence interval or not; there is no probability statement to make once we observe X = x.

Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, given by f(x | θ), where Θ has prior pdf r(θ). Suppose u and v are functions of x such that
P(Θ ∈ (u(x), v(x)) | X = x) = 1 − α.
Then we say that the interval (u(x), v(x)) is a 100(1 − α) percent credible interval for θ. Notice that in the Bayesian setting, the deterministic interval (u(x), v(x)) really does contain Θ with probability 1 − α.

4. Bayesian hypothesis testing

Let the parameter space Ω be given by the disjoint union Ω = Θ_N ⊔ Θ_A. Suppose we want to test H_0 : θ ∈ Θ_N. Let X = (X_1, ..., X_n) be a random sample from the conditional distribution of X given Θ = θ, given by f(x | θ), where Θ has prior pdf r(θ). Consider the critical function
φ(x) = [ P(Θ ∈ Θ_A | X = x) > P(Θ ∈ Θ_N | X = x) ].
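To tie the pieces together, here is a small illustrative sketch (not part of the notes; all data and parameter values are made up) using the posterior from Exercise 7: an equal-tailed 100(1 − α) percent credible interval for θ is read off from the quantiles of the InvGamma(α + n, β + t) posterior, and the Bayesian test above reduces to comparing the posterior probabilities of Θ_N and Θ_A.

# Illustrative sketch using Exercise 7's posterior (exponential data, inverse gamma prior).
# All data and hyperparameter values below are hypothetical.
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0                      # prior hyperparameters
x = np.array([0.7, 1.9, 1.1, 0.4, 2.3])    # observed sample (made up)
n, t = x.size, x.sum()

posterior = stats.invgamma(a=alpha + n, scale=beta + t)   # theta | x ~ InvGamma(alpha+n, beta+t)

# Equal-tailed 95 percent credible interval: P(u(x) < Theta < v(x) | X = x) = 0.95.
level = 0.95
u, v = posterior.ppf((1 - level) / 2), posterior.ppf(1 - (1 - level) / 2)
print("95% credible interval:", (u, v))

# Bayesian test of H_0 : theta <= 1, so Theta_N = (0, 1] and Theta_A = (1, infinity).
p_null = posterior.cdf(1.0)
p_alt = 1.0 - p_null
phi = int(p_alt > p_null)    # critical function: reject H_0 when the alternative is more probable
print("P(Theta in Theta_N | x) =", p_null, " reject H_0?", bool(phi))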