Spring 2006: Examples: Laplace's Method; Hierarchical Models


Brian Junker
February 13, 2006

Outline:
- Second-order Laplace Approximation for $E[g(\theta) \mid y]$
- An Analytic Example
- Hierarchical Models
- Example of Hierarchical Data and Modeling

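Second-order Laplace Approximation for $E[g(\theta) \mid y]$

For non-negative $g(\theta)$,

$$E[g(\theta) \mid y] = \frac{\int g(\theta)\,p(y \mid \theta)\,p(\theta)\,d\theta}{\int p(y \mid \theta)\,p(\theta)\,d\theta} = \frac{\int e^{-n h^*(\theta)}\,d\theta}{\int e^{-n h(\theta)}\,d\theta} = \frac{I_{\text{num}}}{I_{\text{denom}}}$$

Letting

$$\theta^* = \arg\min_\theta h^*(\theta) = \arg\min_\theta \left[-\tfrac{1}{n}\log g(\theta)\,p(\theta)\,p(y \mid \theta)\right]$$

and

$$\hat\theta = \arg\min_\theta h(\theta) = \arg\min_\theta \left[-\tfrac{1}{n}\log p(\theta)\,p(y \mid \theta)\right],$$

we have

$$I_{\text{num}} = e^{-n h^*(\theta^*)}\,\sqrt{2\pi}\,\sqrt{\frac{1}{n\,h^{*\prime\prime}(\theta^*)}}\;[1+O(1/n)]$$

$$I_{\text{denom}} = e^{-n h(\hat\theta)}\,\sqrt{2\pi}\,\sqrt{\frac{1}{n\,h''(\hat\theta)}}\;[1+O(1/n)]$$

Due to cancellation in the $[1+O(1/n)]$ terms, we get

$$E[g(\theta) \mid y] = \frac{\sqrt{1/h^{*\prime\prime}(\theta^*)}\,\exp\{-n h^*(\theta^*)\}}{\sqrt{1/h''(\hat\theta)}\,\exp\{-n h(\hat\theta)\}}\;[1+O(1/n^2)]$$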
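The ratio formula above can be sketched numerically in R. This is a minimal illustration, not the lecture's own code: the function name `laplace_ratio`, its arguments, the search interval, and the finite-difference step are all assumptions of this sketch. It works with $n h$ and $n h^*$ directly, since the factors of $n$ cancel in the ratio. The check at the end uses a Gamma(shape = 50, rate = 11) kernel with $g(\theta) = 1/\theta$, for which the exact expectation is rate/(shape - 1).

```r
# Second-order (ratio) Laplace approximation to E[g(theta) | y] in one dimension.
# log_post: log of the unnormalized posterior, log p(y | theta) + log p(theta)
# g:        a strictly positive function of theta
# interval: search interval containing both modes (an assumption of this sketch)
laplace_ratio <- function(log_post, g, interval, eps = 1e-4) {
  nh      <- function(th) -log_post(th)               # n * h(theta)
  nh_star <- function(th) -log_post(th) - log(g(th))  # n * h*(theta)

  # Central-difference approximation to a second derivative
  d2 <- function(f, x) (f(x + eps) - 2 * f(x) + f(x - eps)) / eps^2

  th_hat  <- optimize(nh, interval, tol = 1e-8)$minimum
  th_star <- optimize(nh_star, interval, tol = 1e-8)$minimum

  # sqrt(h''(th_hat) / h*''(th_star)) * exp{ n h(th_hat) - n h*(th_star) }
  sqrt(d2(nh, th_hat) / d2(nh_star, th_star)) *
    exp(nh(th_hat) - nh_star(th_star))
}

# Hypothetical check: unnormalized Gamma(shape = 50, rate = 11) log density
shape <- 50
rate  <- 11
log_post <- function(th) (shape - 1) * log(th) - rate * th

approx <- laplace_ratio(log_post, function(th) 1 / th, interval = c(0.01, 50))
exact  <- rate / (shape - 1)   # E[1/theta] for a Gamma(shape, rate) variable
print(c(approx = approx, exact = exact))
```

Using numerical second derivatives keeps the sketch generic; when $h''$ is available in closed form it can be substituted for `d2`.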

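An Analytic Example

Suppose we have data $y_1, \ldots, y_n \overset{iid}{\sim} \text{Pois}(\theta)$, and prior $\theta \sim \text{Gamma}(\alpha, \beta)$. Then

$$p(y \mid \theta) = \theta^{\sum_i y_i}\, e^{-n\theta} \Big/ \prod_i y_i!$$

$$p(\theta \mid y) \propto \theta^{n\bar y}\, e^{-n\theta}\, \theta^{\alpha-1}\, e^{-\beta\theta} = \theta^{\alpha+n\bar y-1}\, e^{-(\beta+n)\theta} \propto \text{Gamma}(\theta \mid \alpha+n\bar y,\ \beta+n)$$

Thus we already know

$$E[\theta \mid y] = \frac{\alpha+n\bar y}{\beta+n}, \qquad \text{mode}(\theta \mid y) = \frac{\alpha+n\bar y-1}{\beta+n}, \qquad \text{Var}(\theta \mid y) = \frac{\alpha+n\bar y}{(\beta+n)^2}$$

We will examine:
- The basic normal approximation to the posterior;
- The expected rate $E[1/\theta \mid y]$ given the data.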
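The conjugate update can be checked with a few lines of R. This is a small sketch under assumed values: the prior settings, the sample size, and the simulated Poisson data are hypothetical and not taken from the lecture. It computes the closed-form posterior mean, mode, and variance, and compares the mean and mode against numerical integration and optimization of the Gamma density.

```r
# Hypothetical prior settings and simulated data (not from the lecture)
set.seed(1)
alpha <- 2
beta  <- 1
n     <- 20
y     <- rpois(n, lambda = 3)

# Conjugate update: theta | y ~ Gamma(shape = alpha + sum(y), rate = beta + n)
shape <- alpha + sum(y)
rate  <- beta + n

post_mean <- shape / rate
post_mode <- (shape - 1) / rate
post_var  <- shape / rate^2

# Numerical checks of the closed forms
num_mean <- integrate(function(th) th * dgamma(th, shape, rate = rate),
                      lower = 0, upper = Inf)$value
num_mode <- optimize(function(th) dgamma(th, shape, rate = rate, log = TRUE),
                     interval = c(1e-6, 20), maximum = TRUE, tol = 1e-8)$maximum

print(c(mean = post_mean, mode = post_mode, var = post_var))
```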

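Normal Approximation: $\theta \mid y \approx N(\hat\theta,\ I_{\text{obs}}(\hat\theta)^{-1})$

$$-\frac{\partial^2}{\partial\theta^2}\log p(\theta \mid y) = -\frac{\partial^2}{\partial\theta^2}\Big[(\alpha+n\bar y-1)\log\theta - (\beta+n)\theta\Big] = -\frac{\partial}{\partial\theta}\left[\frac{\alpha+n\bar y-1}{\theta} - (\beta+n)\right] = \frac{\alpha+n\bar y-1}{\theta^2}$$

Plugging in $\hat\theta = \text{mode}(\theta \mid y) = (\alpha+n\bar y-1)/(\beta+n)$ and inverting, we get

$$I_{\text{obs}}(\hat\theta)^{-1} = \frac{\alpha+n\bar y-1}{(\beta+n)^2}$$

so

$$\theta \mid y \approx N\!\left(\frac{\alpha+n\bar y-1}{\beta+n},\ \frac{\alpha+n\bar y-1}{(\beta+n)^2}\right)$$

See R code for this lecture.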
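As a rough numerical check of the normal approximation, the following R sketch compares the closed-form inverse observed information with a finite-difference estimate, and the normal density with the exact Gamma posterior density. The values of `shape` (standing for $\alpha+n\bar y$) and `rate` (standing for $\beta+n$) are hypothetical.

```r
# Hypothetical posterior parameters: shape = alpha + n*ybar, rate = beta + n
shape <- 50
rate  <- 11

# Unnormalized log posterior of the Gamma(shape, rate) distribution
log_post <- function(th) (shape - 1) * log(th) - rate * th

th_hat  <- (shape - 1) / rate        # posterior mode
var_hat <- (shape - 1) / rate^2      # closed-form I_obs(th_hat)^{-1}

# Observed information by central differences at the mode
eps   <- 1e-4
i_obs <- -(log_post(th_hat + eps) - 2 * log_post(th_hat) +
             log_post(th_hat - eps)) / eps^2

# Compare the two densities on a grid around the mode
theta      <- seq(th_hat - 4 * sqrt(var_hat), th_hat + 4 * sqrt(var_hat),
                  length.out = 400)
exact_dens <- dgamma(theta, shape, rate = rate)
norm_dens  <- dnorm(theta, mean = th_hat, sd = sqrt(var_hat))
max_abs_diff <- max(abs(exact_dens - norm_dens))

print(c(closed_form = var_hat, numerical = 1 / i_obs, max_abs_diff = max_abs_diff))
```

The remaining density discrepancy reflects the skewness of the Gamma posterior, which the symmetric normal approximation cannot capture.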

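Posterior expected rate $E[1/\theta \mid y]$

The first-order Laplace approximation simply plugs in $\hat\theta$:

$$E[1/\theta \mid y] \approx 1/\hat\theta = \frac{\beta+n}{\alpha+n\bar y-1}$$

The second-order Laplace approximation:

$$E[1/\theta \mid y] \approx \frac{h^{*\prime\prime}(\theta^*)^{-1/2}\,(1/\theta^*)\,p(y \mid \theta^*)\,p(\theta^*)}{h''(\hat\theta)^{-1/2}\,p(y \mid \hat\theta)\,p(\hat\theta)}$$

Ignoring constants that will cancel in the fraction,

$$I_{\text{denom}} \propto \sqrt{\frac{\alpha+n\bar y-1}{(\beta+n)^2}}\;\hat\theta^{\,n\bar y}\, e^{-n\hat\theta}\, \hat\theta^{\,\alpha-1}\, e^{-\beta\hat\theta}$$

where $\hat\theta = (\alpha+n\bar y-1)/(\beta+n)$, and

$$I_{\text{num}} \propto \sqrt{\frac{\alpha+n\bar y-2}{(\beta+n)^2}}\;\frac{1}{\theta^*}\,\theta^{*\,n\bar y}\, e^{-n\theta^*}\, \theta^{*\,\alpha-1}\, e^{-\beta\theta^*}$$

where $\theta^* = (\alpha+n\bar y-2)/(\beta+n)$, since

$$\log\big[(1/\theta)\,p(y \mid \theta)\,p(\theta)\big] = -\log\theta + (\alpha+n\bar y-1)\log\theta - (\beta+n)\theta + \text{const}$$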

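If we observe that

$$\frac{\theta^*}{\hat\theta} = \frac{(\alpha+n\bar y-2)/(\beta+n)}{(\alpha+n\bar y-1)/(\beta+n)} = \frac{\alpha+n\bar y-2}{\alpha+n\bar y-1}$$

then the second-order Laplace approximation yields

$$E[1/\theta \mid y] \approx \frac{\sqrt{(\alpha+n\bar y-2)/(\beta+n)^2}}{\sqrt{(\alpha+n\bar y-1)/(\beta+n)^2}} \cdot \frac{(1/\theta^*)\,\theta^{*\,n\bar y}\, e^{-n\theta^*}\, \theta^{*\,\alpha-1}\, e^{-\beta\theta^*}}{\hat\theta^{\,n\bar y}\, e^{-n\hat\theta}\, \hat\theta^{\,\alpha-1}\, e^{-\beta\hat\theta}}$$

$$= \frac{1}{\theta^*}\left(\frac{\theta^*}{\hat\theta}\right)^{\alpha+n\bar y-1+1/2} e^{-(\beta+n)(\theta^*-\hat\theta)}$$

$$= \frac{\beta+n}{\alpha+n\bar y-2}\left(\frac{\alpha+n\bar y-2}{\alpha+n\bar y-1}\right)^{\alpha+n\bar y-1/2} e^{+1}$$

The last step uses $(\beta+n)(\theta^*-\hat\theta) = -1$.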

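We can compare these:

First order:

$$E[1/\theta \mid y] \approx \frac{\beta+n}{\alpha+n\bar y-1}$$

Second order:

$$E[1/\theta \mid y] \approx \frac{\beta+n}{\alpha+n\bar y-2}\left(\frac{\alpha+n\bar y-2}{\alpha+n\bar y-1}\right)^{\alpha+n\bar y-1/2} e^{+1}$$

Exact calculation:

$$E[1/\theta \mid y] = \int_0^\infty \frac{1}{\theta}\,\text{Gamma}(\theta \mid \alpha+n\bar y,\ \beta+n)\,d\theta = \left(\frac{(\beta+n)^{\alpha+n\bar y}}{\Gamma(\alpha+n\bar y)}\right)\left(\frac{\Gamma(\alpha+n\bar y-1)}{(\beta+n)^{\alpha+n\bar y-1}}\right) = \frac{\beta+n}{\alpha+n\bar y-1}$$

For this Gamma posterior the plug-in value $1/\hat\theta$ happens to coincide with the exact answer, while the second-order approximation differs from it by a relative error of order $1/(\alpha+n\bar y)^2$.

See R code for today's lecture.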
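The three quantities can be compared numerically with a short R sketch. The values of `alpha`, `beta`, `n`, and `ybar` below are hypothetical and chosen only for illustration; the exact value is obtained by numerical integration against the Gamma(shape = $\alpha+n\bar y$, rate = $\beta+n$) posterior density rather than assumed.

```r
# Hypothetical inputs (not from the lecture)
alpha <- 2
beta  <- 1
n     <- 10
ybar  <- 4.8

a <- alpha + n * ybar   # posterior shape
b <- beta + n           # posterior rate

# First-order: plug in the posterior mode
first_order <- b / (a - 1)

# Second-order Laplace approximation in closed form
second_order <- b / (a - 2) * ((a - 2) / (a - 1))^(a - 0.5) * exp(1)

# Exact value by numerical integration of (1/theta) against the posterior
exact <- integrate(function(th) dgamma(th, shape = a, rate = b) / th,
                   lower = 0, upper = Inf)$value

print(c(first = first_order, second = second_order, exact = exact))
```

With these inputs the second-order value should agree with the integral to several decimal places.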

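Some comments

If $\theta = (\theta_1, \ldots, \theta_d)$, then the same approach yields

$$E[g(\theta) \mid y] = \frac{|\Sigma^*|^{1/2}\exp\{-n h^*(\theta^*)\}}{|\hat\Sigma|^{1/2}\exp\{-n h(\hat\theta)\}}\;[1+O(1/n^2)]$$

where $h(\theta)$, $h^*(\theta)$, $\hat\theta$ and $\theta^*$ are as before, and now the $\Sigma$'s are inverse-information (variance/covariance) matrices:

$$\hat\Sigma = \left.\left[\frac{\partial^2 h(\theta)}{\partial\theta_i\,\partial\theta_j}\right]^{-1}\right|_{\theta=\hat\theta}, \qquad \Sigma^* = \left.\left[\frac{\partial^2 h^*(\theta)}{\partial\theta_i\,\partial\theta_j}\right]^{-1}\right|_{\theta=\theta^*}$$

If $g(\theta) \ge 0$ fails, we can consider $E[g(\theta) + C \mid y]$, or $E[\exp\{s\,g(\theta)\} \mid y]$. These yield similar results (Tierney, Kass and Kadane, 1989, JASA).

There is a similar approximation for marginal posterior densities:

$$p(\theta_1 \mid y) \propto \int p(y \mid \theta_1, \theta_2)\,p(\theta_1, \theta_2)\,d\theta_2 = \int \exp\{\ell_p(\theta_1, \theta_2)\}\,d\theta_2 \propto |\Sigma(\theta_1)|^{1/2}\exp\{\ell_p(\theta_1, \hat\theta_2(\theta_1))\}\;[1+O(1/n^{3/2})]$$

where $\ell_p(\theta_1, \theta_2) = \log\big[p(y \mid \theta_1, \theta_2)\,p(\theta_1, \theta_2)\big]$, $\hat\theta_2(\theta_1) = \arg\max_{\theta_2} \ell_p(\theta_1, \theta_2)$, and

$$\Sigma(\theta_1) = \left.\left[-\frac{\partial^2 \ell_p(\theta_1, \theta_2)}{\partial\theta_{2i}\,\partial\theta_{2j}}\right]^{-1}\right|_{\theta_2=\hat\theta_2(\theta_1)}$$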
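The multi-parameter ratio formula can be tried out with R's `optim`, which returns a numerically differentiated Hessian when `hessian = TRUE`. The example below is a hypothetical test case, not one from the lecture: the "posterior" is a product of two independent Gamma densities with assumed shapes and rates, and $g(\theta) = \theta_1\theta_2$, so the exact expectation is the product of the two Gamma means. The sketch works with $n h$ and $n h^*$ directly; the factors of $n^{d/2}$ in the two determinants cancel in the ratio.

```r
# Hypothetical two-parameter example: independent Gamma(a[i], rate = b[i]) components
a <- c(20, 30)
b <- c(4, 5)

# n * h(theta): negative log of the unnormalized posterior
nh <- function(th) -sum((a - 1) * log(th) - b * th)

# n * h*(theta): same, minus log g(theta), with g(theta) = theta1 * theta2
nh_star <- function(th) nh(th) - sum(log(th))

# Minimize both, asking optim for the Hessian at each minimum
fit_hat  <- optim(c(1, 1), nh,      method = "L-BFGS-B", lower = 1e-6, hessian = TRUE)
fit_star <- optim(c(1, 1), nh_star, method = "L-BFGS-B", lower = 1e-6, hessian = TRUE)

# |Sigma*|^{1/2} / |Sigma-hat|^{1/2} = sqrt(det(H_hat) / det(H_star)),
# where H denotes the Hessian of n*h (or n*h*) at its minimizer
approx <- sqrt(det(fit_hat$hessian) / det(fit_star$hessian)) *
  exp(fit_hat$value - fit_star$value)

exact <- prod(a / b)   # E[theta1 * theta2] under independence
print(c(approx = approx, exact = exact))
```

A bounded optimizer is used here only to keep the search inside $\theta > 0$; reparameterizing to $\log\theta$ would be an alternative design, though it changes the function being approximated unless the Jacobian is included.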

Hierarchical Models

Our general setup

  Level 1: $y \sim p(y \mid \theta)$
  Level 2: $\theta \sim p(\theta)$

is just a way of factoring the joint density $p(y, \theta) = p(y \mid \theta)\,p(\theta)$.

In multiple-parameter problems it is advantageous to factor the joint density more fully. E.g., in the Normal model with a fully conjugate prior,

$$p(y, \mu, \sigma^2) = p(y \mid \mu, \sigma^2)\,p(\mu \mid \sigma^2)\,p(\sigma^2)$$

This suggests a 3-level hierarchy:

  Level 1: $y \sim p(y \mid \mu, \sigma^2)$
  Level 2: $\mu \sim p(\mu \mid \sigma^2)$
  Level 3: $\sigma^2 \sim p(\sigma^2)$

Or more generally:

  Level 1: $y \sim p(y \mid \theta, \tau)$, or just $p(y \mid \theta)$
  Level 2: $\theta \sim p(\theta \mid \tau)$
  Level 3: $\tau \sim p(\tau)$

It is often useful to organize the modeling into three or more hierarchical levels. This is especially true if the data themselves are naturally hierarchical.
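One way to read the three-level factorization is as a recipe for simulating from the joint density, drawing from Level 3 first and working down. The R sketch below does this for the Normal model, assuming the standard conjugate forms (scaled inverse chi-square for $\sigma^2$, and a normal for $\mu$ given $\sigma^2$). The function name `simulate_normal_hierarchy` and all hyperparameter values are hypothetical, not taken from the lecture.

```r
# Simulate one data set from the 3-level Normal hierarchy.
# Hyperparameters mu0, kappa0, nu0, sigma0_sq are hypothetical defaults.
simulate_normal_hierarchy <- function(n, mu0 = 0, kappa0 = 1,
                                      nu0 = 5, sigma0_sq = 4) {
  # Level 3: sigma^2 ~ scaled inverse-chi-square(nu0, sigma0_sq)
  sigma_sq <- nu0 * sigma0_sq / rchisq(1, df = nu0)

  # Level 2: mu | sigma^2 ~ N(mu0, sigma^2 / kappa0)
  mu <- rnorm(1, mean = mu0, sd = sqrt(sigma_sq / kappa0))

  # Level 1: y_i | mu, sigma^2 ~ N(mu, sigma^2), i = 1..n
  y <- rnorm(n, mean = mu, sd = sqrt(sigma_sq))

  list(sigma_sq = sigma_sq, mu = mu, y = y)
}

set.seed(42)
draw <- simulate_normal_hierarchy(n = 10)
str(draw)
```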

Example of Hierarchical Data and Modeling

Gelman pp. 138ff. discusses the following test coaching data (for SAT-V scores, which generally have mean = 500, SD = 100, and range from 200 to 800 points):

[Table: one column per school A, B, C, D, E, F, G, H, with rows $y_j$ and $\sigma_j$]

Each $y_j$ represents the mean effect on test scores of $n_j$ students in school $j$. We do not know $n_j$ but we do know $\sigma_j$, the estimated SE of the effect in each school.

How shall we treat these data?
- Estimating eight separate mean effects?
- Pooling, and estimating a single mean effect?

See additional R code for this lecture.

Estimating eight separate mean effects?

95% CI's (classical confidence intervals, or credible intervals under a flat prior):

[Figure: "Effect CI's", one interval per school]

So it seems plausible that the eight studies are estimating a common effect.
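The separate-effects intervals can be sketched in R as follows. The numeric values of `y` and `sigma` are an assumption: they are entered from memory of the eight-schools table in Gelman et al. and should be checked against the book before being relied on. Each interval is $y_j \pm z_{0.975}\,\sigma_j$, and the last line checks whether all eight intervals share a common region.

```r
# Eight-schools estimates and standard errors.
# ASSUMPTION: values recalled from Gelman et al.; verify against the source table.
schools <- LETTERS[1:8]
y     <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sigma <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)

# 95% intervals: classical CIs, or credible intervals under a flat prior
z  <- qnorm(0.975)
ci <- data.frame(school   = schools,
                 estimate = y,
                 lower    = y - z * sigma,
                 upper    = y + z * sigma)
print(ci)

# TRUE if some single effect value lies inside all eight intervals
all_overlap <- max(ci$lower) < min(ci$upper)
print(all_overlap)
```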

Pooled and estimating a single mean effect?

The pooled mean and $SE^2$ are

$$\bar y_{..} = \frac{\sum_j y_j/\sigma_j^2}{\sum_j 1/\sigma_j^2} \approx 7.9, \qquad \sigma^2_{..} = \frac{1}{\sum_j 1/\sigma_j^2} = 17.4$$

and, e.g., $\chi^2_7 = \sum_j (y_j - \bar y_{..})^2/\sigma_j^2 = 4.6 < 7$ (7 being the expected value of a $\chi^2_7$ variable), suggesting that there is a common mean effect.

However, a sample from the pooled normal distribution doesn't yield estimates as spread out as these $y$'s:

    > sort(y)
    [1] ...
    > M <- ... ; effects <- matrix(NA, ncol=8, nrow=M)
    > for (m in 1:M) { effects[m,] <- sort(rnorm(8, 7.9, sqrt(17.4))) }
    > round(apply(effects, 2, mean), 2)
    [1] ...
    > round(summary(effects[,8]), 2)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        ...
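A self-contained version of the pooled calculation and the simulation check might look like the R sketch below. Two things are assumptions here: the data values, which are entered from memory of the eight-schools table in Gelman et al. (they do reproduce the pooled summaries 7.9, 17.4 and 4.6 quoted above), and the number of replications `M`, whose value in the lecture's code is not known.

```r
# ASSUMPTION: eight-schools values recalled from Gelman et al.; verify before use.
y     <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sigma <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)

# Precision-weighted pooled mean, its variance, and the chi-square statistic
w           <- 1 / sigma^2
pooled_mean <- sum(w * y) / sum(w)
pooled_var  <- 1 / sum(w)
chisq       <- sum((y - pooled_mean)^2 / sigma^2)
print(round(c(mean = pooled_mean, var = pooled_var, chisq = chisq), 1))

# Simulate M sets of eight effects from the pooled normal and sort each set,
# so that column k holds the k-th smallest simulated effect.
set.seed(2006)
M <- 1000   # hypothetical number of replications
effects <- matrix(NA, ncol = 8, nrow = M)
for (m in 1:M) {
  effects[m, ] <- sort(rnorm(8, pooled_mean, sqrt(pooled_var)))
}

# Average order statistics, to set beside sort(y)
print(sort(y))
print(round(colMeans(effects), 2))
print(round(summary(effects[, 8]), 2))
```

Comparing `sort(y)` with the average order statistics gives a quick view of how much more spread out the observed effects are than draws from the pooled distribution.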
