MIT 18.655: Mathematical Statistics
Bayes Procedures
Dr. Kempthorne, Spring 2016
Decision Problem: Basic Components
- $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$: parametric model.
- $\Theta = \{\theta\}$: parameter space.
- $\mathcal{A} = \{a\}$: action space.
- $L(\theta, a)$: loss function.
- $R(\theta, \delta) = E_{X|\theta}[L(\theta, \delta(X))]$: risk function.

Decision Problem: Bayes Components
- $\pi$: prior distribution on $\Theta$.
- $r(\pi, \delta) = E_\theta[R(\theta, \delta)] = E_\theta[E_{X|\theta}[L(\theta, \delta(X))]] = E_{X,\theta}[L(\theta, \delta(X))]$: Bayes risk of the procedure $\delta(X)$ with respect to the prior $\pi$.
- $r(\pi) = \inf_{\delta \in \mathcal{D}} r(\pi, \delta)$: minimum Bayes risk.
- $\delta_\pi$ with $r(\pi, \delta_\pi) = r(\pi)$: Bayes rule with respect to prior $\pi$ and loss $L(\cdot, \cdot)$.
Bayes Procedures: Interpretations of Bayes Risk
- Bayes risk: $r(\pi, \delta) = \int_\Theta R(\theta, \delta)\,\pi(d\theta)$.
- $\pi(\cdot)$ weights the $\theta \in \Theta$ where $R(\theta, \delta)$ matters.
- $\pi(\theta) = \text{constant}$ weights $\theta$ uniformly. Note: uniform weighting depends on the parametrization.
- Interdependence of specifying the loss function and the prior density:
$$r(\pi, \delta) = \int_\Theta \int_{\mathcal{X}} [L(\theta, \delta(x))\,\pi(\theta)]\, p(x \mid \theta)\, dx\, d\theta = \int_\Theta \int_{\mathcal{X}} [L^*(\theta, \delta(x))\,\pi^*(\theta)]\, p(x \mid \theta)\, dx\, d\theta$$
for any $L^*(\cdot,\cdot)$, $\pi^*(\cdot)$ such that $L^*(\theta, \delta(x))\,\pi^*(\theta) = L(\theta, \delta(x))\,\pi(\theta)$.
Bayes Procedures: Quadratic Loss
- Quadratic loss for estimating $q(\theta)$ with $a \in \mathcal{A} = \{q(\theta) : \theta \in \Theta\}$: $L(\theta, a) = [q(\theta) - a]^2$.
- Bayes risk: $r(\pi, \delta) = E([q(\theta) - \delta(X)]^2)$.
- Bayes risk as expected posterior risk: $r(\pi, \delta) = E_X(E_{\theta|X}[L(\theta, \delta(X))]) = E_X(E_{\theta|X}([q(\theta) - \delta(X)]^2))$.
- The Bayes decision rule is specified by minimizing $E_{\theta|X}([q(\theta) - \delta(x)]^2 \mid X = x)$ for each outcome $X = x$, which is solved by the posterior mean (computed numerically in the sketch below):
$$\delta_\pi(x) = E_{\theta|X=x}[q(\theta) \mid X = x] = \frac{\int_\Theta q(\theta)\, p(x \mid \theta)\, \pi(\theta)\, d\theta}{\int_\Theta p(x \mid \theta)\, \pi(\theta)\, d\theta}$$
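A minimal numerical sketch of this posterior-mean rule, assuming an illustrative model (iid $N(\theta, 1)$ data, a $N(0, 4)$ prior, and $q(\theta) = \theta^2$; all numbers are made up) and approximating the integrals on a grid:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
x = rng.normal(1.5, 1.0, size=20)             # simulated data (values assumed)

theta = np.linspace(-10, 10, 20001)           # grid over Theta
prior = stats.norm.pdf(theta, loc=0.0, scale=2.0)
# log-likelihood of the full sample at each grid value of theta
loglik = stats.norm.logpdf(x[:, None], loc=theta[None, :], scale=1.0).sum(axis=0)
post = prior * np.exp(loglik - loglik.max())  # unnormalized posterior density
post /= trapezoid(post, theta)                # normalize numerically

q = theta**2                                  # the q(theta) being estimated
delta_pi = trapezoid(q * post, theta)         # posterior mean of q(theta)
print(delta_pi)
```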
Bayes Procedure: Quadratic Loss Example

Example 3.2.1. $X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$, $\sigma^2 > 0$ known.
- Prior distribution: $\pi: \theta \sim N(\eta, \tau^2)$.
- Posterior distribution: $\theta \mid X = x \sim N(\hat\eta, \hat\tau^2)$, where
$$\hat\eta = \left[\frac{\bar x}{\sigma^2/n} + \frac{\eta}{\tau^2}\right] \Big/ \left[\frac{1}{\sigma^2/n} + \frac{1}{\tau^2}\right], \qquad \hat\tau^2 = \left[\frac{1}{\sigma^2/n} + \frac{1}{\tau^2}\right]^{-1}.$$
- Bayes procedure: $\delta_\pi(X) = E[\theta \mid x] = \hat\eta$.

Observations:
- Posterior risk: $E[L(\theta, \delta_\pi) \mid X = x] = \hat\tau^2$ (constant in $x$!), so the Bayes risk is $r(\pi, \delta_\pi) = \hat\tau^2$.
- The MLE $\delta_{MLE}(x) = \bar x$ has constant risk $R(\theta, \bar X) = \sigma^2/n$, so its Bayes risk is $r(\pi, \bar X) = \sigma^2/n$ ($> \hat\tau^2$).
- $\lim_{\tau \to \infty} \delta_\pi(x) = \bar x$ and $\lim_{\tau \to \infty} \hat\tau^2 = \sigma^2/n$.
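A short sketch checking these closed-form posterior quantities on simulated data; the specific values of $n$, $\sigma^2$, $\eta$, $\tau^2$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, eta, tau2 = 25, 4.0, 0.0, 9.0
theta_true = 1.0
x = rng.normal(theta_true, np.sqrt(sigma2), size=n)

xbar = x.mean()
w_data, w_prior = 1 / (sigma2 / n), 1 / tau2        # data and prior precisions
eta_hat = (w_data * xbar + w_prior * eta) / (w_data + w_prior)  # posterior mean
tau2_hat = 1 / (w_data + w_prior)                   # posterior variance

print(eta_hat, tau2_hat)      # Bayes estimate and its (constant) posterior risk
print(sigma2 / n)             # MLE's risk; tau2_hat < sigma2/n always
# As tau2 -> infinity, eta_hat -> xbar and tau2_hat -> sigma2/n.
```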
Bayes Procedure: General Case

Bayes risk and posterior risk:
$$r(\pi, \delta) = E_\theta[R(\theta, \delta)] = E_\theta[E_{X|\theta}[L(\theta, \delta(X))]] = E_X[E_{\theta|X}[L(\theta, \delta(X))]] = E_X[r(\delta(X) \mid X)]$$
where $r(a \mid x) = E[L(\theta, a) \mid X = x]$ (the posterior risk).

Proposition 3.2.1. Suppose $\delta^*: \mathcal{X} \to \mathcal{A}$ is such that $r(\delta^*(x) \mid x) = \inf_{a \in \mathcal{A}} \{r(a \mid x)\}$ for each $x$. Then $\delta^*$ is a Bayes rule.

Proof. For any procedure $\delta \in \mathcal{D}$,
$$r(\pi, \delta) = E_X[r(\delta(X) \mid X)] \geq E_X[r(\delta^*(X) \mid X)] = r(\pi, \delta^*). \qquad \square$$
Bayes Procedures for Problems with Finite Θ

Finite Θ problem:
- $\Theta = \{\theta_0, \theta_1, \ldots, \theta_K\}$.
- $\mathcal{A} = \{a_0, a_1, \ldots, a_q\}$ ($q$ may or may not equal $K$).
- $L(\theta_i, a_j) = w_{ij}$, for $i = 0, 1, \ldots, K$, $j = 0, 1, \ldots, q$.
- Prior distribution: $\pi(\theta_i) = \pi_i \geq 0$, $i = 0, \ldots, K$ ($\sum_{i=0}^K \pi_i = 1$).
- Data/random variable: $X \sim P_\theta$ with density/pmf $p(x \mid \theta)$.

Solution (implemented in the sketch below):
- Posterior probabilities: $\pi(\theta_i \mid X = x) = \dfrac{\pi_i\, p(x \mid \theta_i)}{\sum_{j=0}^K \pi_j\, p(x \mid \theta_j)}$.
- Posterior risks: $r(a_j \mid X = x) = \dfrac{\sum_{i=0}^K w_{ij}\, \pi_i\, p(x \mid \theta_i)}{\sum_{i=0}^K \pi_i\, p(x \mid \theta_i)}$.
- Bayes decision rule: $\delta^*(x)$ satisfies $r(\delta^*(x) \mid x) = \min_{0 \le j \le q} r(a_j \mid x)$.
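A minimal sketch of this recipe; the loss matrix, prior, and likelihood values at the observed $x$ are all invented for illustration:

```python
import numpy as np

# States theta_0..theta_K index rows; actions a_0..a_q index columns.
W = np.array([[0.0, 1.0, 4.0],      # w_ij = L(theta_i, a_j)
              [2.0, 0.0, 1.0],
              [5.0, 3.0, 0.0]])
prior = np.array([0.5, 0.3, 0.2])   # pi_i
lik = np.array([0.10, 0.40, 0.05])  # p(x | theta_i) at the observed x

posterior = prior * lik
posterior /= posterior.sum()        # pi(theta_i | X = x)

post_risk = posterior @ W           # r(a_j | x), one entry per action
j_star = np.argmin(post_risk)       # Bayes action delta*(x)
print(posterior, post_risk, j_star)
```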
Finite Θ Problem: Classification

Classification decision problem: $q = K$; identify $\mathcal{A}$ with $\Theta$.
- Loss function (0-1 loss):
$$L(\theta_i, a_j) = w_{ij} = \begin{cases} 1 & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
- The Bayes procedure minimizes the posterior risk $r(a_i \mid x) = P[\theta \neq \theta_i \mid x] = 1 - P[\theta = \theta_i \mid x]$, so $\delta^*(x) = $ the $\theta_i \in \mathcal{A}$ that maximizes $P[\theta = \theta_i \mid x]$.

Special case: testing a null hypothesis versus an alternative.
- $K = q = 1$; $\pi_0 = \pi$, $\pi_1 = 1 - \pi_0$.
- Testing $\Theta_0 = \{\theta_0\}$ versus $\Theta_1 = \{\theta_1\}$.
- The Bayes rule chooses $\theta = \theta_1$ if $P[\theta = \theta_1 \mid x] > P[\theta = \theta_0 \mid x]$.
Finite Θ Problem: Testing

Equivalent specifications of the Bayes procedure $\delta^*(x)$ (demonstrated in the sketch below):
- Minimizes $r(\pi, \delta) = \pi R(\theta_0, \delta) + (1 - \pi) R(\theta_1, \delta) = \pi P(\delta(X) = \theta_1 \mid \theta_0) + (1 - \pi) P(\delta(X) = \theta_0 \mid \theta_1)$.
- Chooses $\theta = \theta_1$ if
$$P[\theta = \theta_1 \mid x] > P[\theta = \theta_0 \mid x] \iff (1 - \pi)\, p(x \mid \theta_1) > \pi\, p(x \mid \theta_0)$$
$$\iff \frac{p(x \mid \theta_1)}{p(x \mid \theta_0)} > \frac{\pi}{1 - \pi} \ \text{(Likelihood Ratio)} \iff \frac{1 - \pi}{\pi} \cdot \frac{p(x \mid \theta_1)}{p(x \mid \theta_0)} > 1 \ \text{(Bayes Factor)}$$
- The procedure $\delta^*$ solves
$$\text{Minimize: } P(\delta(X) = \theta_0 \mid \theta_1) \quad \text{(P(Type II Error))}$$
$$\text{Subject to: } P(\delta(X) = \theta_1 \mid \theta_0) \leq \alpha \quad \text{(P(Type I Error))}$$
i.e., it minimizes the Lagrangian $P(\delta(X) = \theta_1 \mid \theta_0) + \lambda\, P(\delta(X) = \theta_0 \mid \theta_1)$ with Lagrange multiplier $\lambda = (1 - \pi)/\pi$.
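A small sketch confirming that the three formulations of the Bayes test agree at a given observation; the two normal hypotheses and the prior are assumptions for illustration:

```python
from scipy import stats

# Testing theta_0: N(0,1) versus theta_1: N(1,1) with prior pi = P(theta_0).
pi0 = 0.7
x = 1.3                                           # a single observation (made up)

p0 = stats.norm.pdf(x, 0, 1)
p1 = stats.norm.pdf(x, 1, 1)

# Three equivalent forms of the Bayes rule "choose theta_1":
by_posterior = (1 - pi0) * p1 > pi0 * p0
by_lr = p1 / p0 > pi0 / (1 - pi0)                 # likelihood ratio vs. prior odds
by_bayes_factor = ((1 - pi0) / pi0) * (p1 / p0) > 1
assert by_posterior == by_lr == by_bayes_factor
print("choose theta_1:", by_posterior)
```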
Estimating a Success Probability with Non-Quadratic Loss

Decision problem:
- $X_1, \ldots, X_n$ iid Bernoulli($\theta$).
- $\Theta = \{\theta : 0 < \theta < 1\} = (0, 1)$.
- $\mathcal{A} = \{a\} = \Theta$.
- Loss equal to relative squared error:
$$L(\theta, a) = \frac{(\theta - a)^2}{\theta(1 - \theta)}, \quad 0 < \theta < 1 \text{ and } a \text{ real}.$$

Solving the decision problem:
- By sufficiency, consider decision rules based on the sufficient statistic $S = \sum_{i=1}^n X_i \sim \text{Binomial}(n, \theta)$.
- For a prior distribution $\pi$ on $\Theta$ with density $\pi(\theta)$, denote the density of the posterior distribution by
$$\pi(\theta \mid s) = \frac{\pi(\theta)\, \theta^s (1 - \theta)^{n - s}}{\int_\Theta \pi(t)\, t^s (1 - t)^{n - s}\, dt}.$$
Solving the Decision Problem (continued)

The posterior risk $r(a \mid S = k)$ is
$$r(a \mid S = k) = E[L(\theta, a) \mid S = k] = E\left[\frac{(\theta - a)^2}{\theta(1 - \theta)} \,\Big|\, S = k\right] = E\left[\frac{\theta}{1 - \theta} \,\Big|\, k\right] - 2a\, E\left[\frac{1}{1 - \theta} \,\Big|\, k\right] + a^2\, E\left[\frac{1}{\theta(1 - \theta)} \,\Big|\, k\right],$$
which is a parabola in $a$, minimized at
$$a^* = \frac{E\left[\frac{1}{1 - \theta} \,\big|\, k\right]}{E\left[\frac{1}{\theta(1 - \theta)} \,\big|\, k\right]}.$$
This defines the Bayes rule $\delta^*(S)$ for $S = k$ (if the expectations exist).

The Bayes rule can be expressed in closed form when the prior distribution is $\theta \sim \text{Beta}(r, s)$:
$$\delta^*(k) = \frac{B(r + k,\, n - k + s - 1)}{B(r + k - 1,\, n - k + s - 1)} = \frac{r + k - 1}{n + r + s - 2}.$$
For $r = s = 1$: $\delta^*(k) = k/n = \bar x$ (for $k = 0$, $a = 0$ directly).
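A sketch checking the closed-form rule against direct numerical minimization of the posterior risk, under assumed values of $n$, $k$, $r$, $s$:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

n, k, r, s = 10, 4, 2.0, 3.0
post = stats.beta(r + k, s + n - k)               # posterior theta | S = k

def posterior_risk(a):
    integrand = lambda t: (t - a) ** 2 / (t * (1 - t)) * post.pdf(t)
    return quad(integrand, 0.0, 1.0)[0]

closed_form = (r + k - 1) / (n + r + s - 2)
grid = np.linspace(0.01, 0.99, 981)
numeric = grid[np.argmin([posterior_risk(a) for a in grid])]
print(closed_form, numeric)                       # should agree to grid accuracy
```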
With Hierarchical Prior Example 3.2.4 Random Effects Model X ij = µ + Δ i + E ij, i = 1,..., I and j = 1,..., J E ij are iid N(0, σe 2 ). Δ i iid N(0, σδ 2 ) independent of the E ij µ N(µ 0, σµ 2 ). Bayes Model: Specification I Prior distribution on θ = (µ, σe 2, σδ 2 ) µ N(µ 0, σ 2 ), independent of σ2 µ e and σδ 2. π(θ) = π 1 (µ)π 2 (σe 2 )π 3 (σδ 2 ). Data distribution: X ij θ are jointly normal random variables with E [X ij θ] = µ Var[X ij θ] = σδ 2 + σe 2 σ 2 Δ + σ2 if i = k, j = l Cov[X ij, X kl θ] = Δ if 0 if 13 MIT 18.655 σ 2 e i = k, j = l i = k
With Hierarchical Prior Example 3.2.4 Random Effects Model X ij = µ + Δ i + E ij, i = 1,..., I and j = 1,..., J E ij are iid N(0, σ 2 ). e Δ i iid N(0, σ 2 ) independent of the E ij Δ µ N(µ 0, σ 2 ). µ Bayes Model: Specification II Prior distribution on (µ, σe 2, σδ 2, Δ 1,..., Δ I ) r π(θ) = π I 1 (µ) π 2 (σe 2 ) π 3 (σδ 2 ) i=1 π 4 (Δ i σδ 2 ) r = π I 1 (µ) π 2 (σe 2 ) π 3 (σδ 2 ) i=1 φ σδ (Δ i ) Data distribution: X ij θ are independent normal random variables with E [X ij θ] = µ + Δ i, i = 1,... I, and j = 1,..., J Var[X ij θ] = σe 2 14 MIT 18.655
Bayes Procedures with Hierarchical Priors: Computation

Issues:
- Decision problems often focus on a single $\Delta_i$.
- Posterior analyses then require marginal posterior distributions, e.g.,
$$\pi(\Delta_1 = d_1 \mid x) = \int_{\{\theta : \Delta_1 = d_1\}} \pi(\theta \mid x) \prod_{i \neq 1} d\theta_i.$$

Approaches to computing marginal posterior distributions:
- Direct computation (conjugate priors).
- Markov chain Monte Carlo (MCMC): simulation from posterior distributions (a minimal sketch follows).
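A minimal Gibbs-sampler sketch targeting the marginal posterior of $\Delta_1$, under the simplifying assumption that $\sigma_e^2$ and $\sigma_\Delta^2$ are known (so only $\mu$ and the $\Delta_i$ are sampled, each with a conjugate normal conditional); all numeric settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J = 5, 4
s2_e, s2_D, mu0, s2_mu = 1.0, 2.0, 0.0, 10.0

# Simulate data from the random effects model.
true_mu = 1.0
true_D = rng.normal(0, np.sqrt(s2_D), I)
X = true_mu + true_D[:, None] + rng.normal(0, np.sqrt(s2_e), (I, J))

mu, D = 0.0, np.zeros(I)
draws = []
for t in range(5000):
    # mu | rest: normal, combining I*J offset observations with the N(mu0, s2_mu) prior
    prec = I * J / s2_e + 1 / s2_mu
    mean = ((X - D[:, None]).sum() / s2_e + mu0 / s2_mu) / prec
    mu = rng.normal(mean, np.sqrt(1 / prec))
    # Delta_i | rest: normal, combining J residuals with the N(0, s2_D) prior
    prec_d = J / s2_e + 1 / s2_D
    mean_d = (X - mu).sum(axis=1) / s2_e / prec_d
    D = rng.normal(mean_d, np.sqrt(1 / prec_d))
    if t >= 1000:                       # discard burn-in
        draws.append(D[0])
print(np.mean(draws), np.std(draws))    # marginal posterior summary for Delta_1
```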
Equivariance

Definition:
- $\hat\theta_M$: estimator of $\theta$ obtained by applying methodology $M$.
- $h(\theta)$: one-to-one function of $\theta$ (a reparametrization).
- $\hat\theta_M$ is equivariant if $\widehat{h(\theta)}_M = h(\hat\theta_M)$.

Equivariance of MLEs and Bayes procedures:
- MLEs are equivariant.
- Bayes procedures are not necessarily equivariant.
- For squared error loss, the Bayes procedure is the mean of the posterior distribution. With a non-linear reparametrization $h(\cdot)$, $E[h(\theta) \mid x] \neq h(E[\theta \mid x])$ in general (see the sketch below).
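A tiny illustration of this failure of equivariance for the posterior mean, assuming a Beta(3, 5) posterior and $h(t) = t^2$:

```python
from scipy import stats

post = stats.beta(3, 5)                 # an assumed posterior theta | x
h_of_mean = post.mean() ** 2            # h(E[theta | x]) = (3/8)^2
mean_of_h = post.moment(2)              # E[h(theta) | x] = second moment
print(h_of_mean, mean_of_h)             # 0.1406... vs 0.1666...: not equal
```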
Bayes Procedures and Reparametrization

Reparametrization of Bayes decision problems:
- Reparametrization in Bayes decision analysis is not just a transformation-of-variables exercise for the joint/posterior distributions; the loss function must be transformed as well.
- If $\phi = h(\theta)$ and $\Phi = \{\phi : \phi = h(\theta),\ \theta \in \Theta\}$, then $L^*[\phi, a] = L[h^{-1}(\phi), a]$.
- The decision analysis should be independent of the parametrization.
Equivariant Bayesian Decision Problems

Equivariant loss functions:
- Consider a loss function (with $\mathcal{A}$ identified with $\Theta$) for which $L(h(\theta), h(a)) = L(\theta, a)$ for all one-to-one functions $h(\cdot)$. Such a loss function is equivariant.
- A general class of equivariant loss functions: $L(\theta, a) = Q(P_\theta, P_a)$, a function of the distributions themselves.
- E.g., the Kullback-Leibler divergence loss:
$$L(\theta, a) = E\left[\log \frac{p(X \mid \theta)}{p(X \mid a)} \,\Big|\, \theta\right],$$
the probability-weighted log-likelihood ratio.
- For the canonical exponential family (verified numerically in the sketch below):
$$L(\eta, a) = \sum_{j=1}^k (\eta_j - a_j)\, E[T_j(X) \mid \eta] + A(a) - A(\eta).$$
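A sketch verifying this exponential-family formula for the Poisson family, where $T(x) = x$, $A(\eta) = e^\eta$, $E[T(X) \mid \eta] = e^\eta$, and $\eta = \log\lambda$; the particular $\eta$ and $a$ values are arbitrary:

```python
import numpy as np
from scipy import stats

eta, a = np.log(3.0), np.log(5.0)       # canonical parameters (made-up values)

# Closed form: (eta - a) E[T|eta] + A(a) - A(eta)
closed_form = (eta - a) * np.exp(eta) + np.exp(a) - np.exp(eta)

# Direct computation of the probability-weighted log-likelihood ratio.
x = np.arange(0, 200)
p_eta = stats.poisson.pmf(x, np.exp(eta))
direct = np.sum(p_eta * (stats.poisson.logpmf(x, np.exp(eta))
                         - stats.poisson.logpmf(x, np.exp(a))))
print(closed_form, direct)              # should agree closely
```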
MIT OpenCourseWare http://ocw.mit.edu 18.655 Mathematical Statistics Spring 2016 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.