HW3 Solutions: Applied Bayesian and Computational Statistics

March 2, 2006

Problem 1

(a) Fatal accidents ~ Poisson(θ). I will take the prior for θ to be a Gamma distribution, as it is the conjugate prior, with parameters α = 0.5, β = 0.01, since this is near the non-informative Jeffreys prior shown in the last problem. Thus, the posterior distribution can be calculated:

p(θ|y) ∝ p(y|θ) p(θ) = e^{-nθ} θ^{Σ y_i} · θ^{-1/2} e^{-0.01θ} = e^{-θ(n + 0.01)} θ^{Σ y_i - 1/2}

Note that this is a Gamma(α = Σ y_i + 1/2, β = n + 0.01) distribution.

I get the 95% predictive interval using simulation. I found the interval to be [14, 35].

> theta = rgamma(1000, sum(y) + 0.5, length(y) + 0.01)
> ypred = rpois(1000, theta)
> sort(ypred)[c(25, 975)]
[1] 14 35
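As a quick check on the simulation (a sketch, not in the original solution): with a Gamma(a, b) posterior, where b is the rate, the posterior predictive distribution of a new yearly count is Negative Binomial with size a and success probability b/(b + 1), so the simulated interval can be compared against exact quantiles. This assumes y is the same data vector used above.

> a = sum(y) + 0.5
> b = length(y) + 0.01
> qnbinom(c(0.025, 0.975), size=a, prob=b/(b + 1))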

(b) Let θ = α + βt.

(a) Noninformative priors: What do we know about α + βt? We know it must be greater than zero, since it is the mean of a Poisson. The prior p(α, β) ∝ 1 is one option. Given a fixed range of time values, we may simply truncate the prior to ensure that α + βt > 0 on the support of the prior (technically the interior of the support). Another option is Jeffreys prior; however, with multiple parameters, the use of Jeffreys prior is more controversial.

(b) Informative prior: One may postulate that the number of accidents decreases over time and choose a prior which favors negative values of β. Since we do not hear of plane crashes all the time in the news, we may believe that α is a fairly small positive number. To put these beliefs into practice, we can take, for example, independent normals with means 50 and -5 and standard deviations of 20 and 5. Another informative prior would be to do a regression analysis on the data to get estimates and distributions for α and β, and then assume they are independent (not necessarily realistic). However, this would be double counting the data, and not really a prior. I will draw the contours assuming the regression analysis. (Again, this is not an ideal method.) We know that the regression coefficients are normally distributed. Performing a regression analysis, we get the following.

[lm(y ~ t) summary output: intercept estimate 28.87 with standard error 2.75, slope estimate -0.92 with standard error 0.44, on 8 residual degrees of freedom; the remaining values were not recovered in the transcription]

So I will let α ~ N(28.87, sd = 2.75) and β ~ N(-0.92, sd = 0.44). The contours are in Figure 1.

> z = matrix(NA, 100, 100)
> alpha.grid = seq(0, 50, length=100)
> beta.grid = seq(-30, 20, length=100)
> funval = function(alpha, beta) {
+   return(dnorm(alpha, 28.87, 2.75) * dnorm(beta, -0.92, 0.44))
+ }
> for (i in 1:100) {
+   for (j in 1:100) {
+     z[i,j] = funval(alpha.grid[i], beta.grid[j])
+   }
+ }
> contour(alpha.grid, beta.grid, z, xlab="alpha", ylab="beta", xlim=c(20, 40))

[Figure 1: Prior Contours for 4b]

(c) Posterior density and sufficient statistics: Using the non-informative prior p(α, β) ∝ 1, we can compute the posterior:

p(α, β|y) ∝ 1 · ∏_{i=1}^{10} e^{-(α + βt_i)} (α + βt_i)^{y_i} = e^{-(nα + β Σ_i t_i)} ∏_i (α + βt_i)^{y_i}

In this case, there is no way to write the posterior as a reduced function of the data, so the data themselves are the sufficient statistics, i.e. (y_1, t_1), ..., (y_10, t_10).

(d) Proper posterior: To check that the posterior is proper, we need to show that ∫∫ p(α, β|y) dα dβ < ∞. The hand-wavy answer is that the exponential term will dominate any polynomial, so the posterior must be integrable. A more precise answer is the following. If we let Ω be the parameter space (the possible values of (α, β)), we get

sup_{(α,β) ∈ Ω} (α + βt_i) e^{-(α + βt_i)} ≤ sup_{u ∈ [0,∞)} u e^{-u} = M

for some constant M. Hence, we can create an upper bound for the integral as

M^9 ∫∫ (α + βt_1) e^{-(α + βt_1)} dα dβ,

which is integrable.

(e) Linear regression: See the section on prior distributions above. We found α ~ N(28.87, sd = 2.75) and β ~ N(-0.92, sd = 0.44).

(f) Posterior: Recall that the prior is p(α, β) ∝ 1, so the posterior is proportional to e^{-(nα + β Σ_i t_i)} ∏_i (α + βt_i)^{y_i}. A contour plot of the posterior is in Figure 2.

> alpha.grid = seq(15, 50, length=100)
> beta.grid = seq(-3, 1, length=100)
> for (i in 1:100) {
+   for (j in 1:100) {
+     z[i,j] = postfun(alpha.grid[i], beta.grid[j])
+   }
+ }
> contour(alpha.grid, beta.grid, z, xlim=c(20, 40), ylim=c(-2.5, 0.5), xlab="alpha", ylab="beta")

[Figure 2: Posterior Contours for 4f]
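The code in parts (f) and (g) calls a function postfun whose definition does not appear in the transcription; a minimal sketch consistent with the unnormalized posterior from part (c), assuming y and t hold the ten accident counts and time indices 1, ..., 10, is:

> postfun = function(alpha, beta) {
+   lambda = alpha + beta*t
+   if (any(lambda <= 0)) return(0)    # posterior is zero unless alpha + beta*t_i > 0
+   exp(sum(-lambda + y*log(lambda)))  # prod_i e^{-(alpha + beta*t_i)} (alpha + beta*t_i)^{y_i}
+ }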

(g) Posterior density for the expected number of accidents in the next year (t = 11): Below is the code showing how I sampled from the joint posterior. I then computed α + 11β for each of the sampled values and plotted the histogram in Figure 3.

> alpha.grid = seq(20, 40, length=100)
> beta.grid = seq(-2.5, 0.5, length=100)
> for (i in 1:100) {
+   for (j in 1:100) {
+     z[i,j] = postfun(alpha.grid[i], beta.grid[j])
+   }
+ }
> zvec = c(z)
> post.sample = sample(length(alpha.grid)*length(beta.grid), length(alpha.grid),
+                      replace=T, prob=zvec)
> alpha = rep(NA, length(alpha.grid))
> beta = rep(NA, length(beta.grid))
> for (m in 1:100) {
+   j = post.sample[m] %/% 100
+   i = post.sample[m] %% 100
+   j = j + 1
+   if (i == 0) { i = 100; j = j - 1 }
+   alpha[m] = alpha.grid[i]
+   beta[m] = beta.grid[j]
+ }
> hist(alpha + 11*beta)

[Figure 3: Posterior Expected Number of Accidents for 4g]
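The %/% and %% arithmetic above converts a column-major linear index back into (row, column) grid coordinates; base R's arrayInd does the same conversion directly, so an equivalent sketch (not in the original) is:

> idx = arrayInd(post.sample, dim(z))  # rows of idx are (row, column) pairs
> alpha = alpha.grid[idx[, 1]]
> beta = beta.grid[idx[, 2]]
> hist(alpha + 11*beta)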

(h) Number of fatal accidents: For this, I generate 100 new values of y based on the sampled values of α and β. Doing this, I find the predictive interval to be [9, 30].

> newy = rpois(100, alpha + beta*11)
> quantile(newy, c(0.025, 0.975))
 2.5% 97.5%
    9    30

(i) Informative prior vs. posterior: The informative prior does vary from the posterior, in that my informative prior assumes independence, and α and β are clearly not independent in the posterior. Though I did know, when I made the informative prior, that the independence assumption was not accurate.

Problem 2

(a) The multinomial cell probabilities are ((2 + θ)/4, (1 - θ)/4, (1 - θ)/4, θ/4), so the likelihood is

L(θ|y) = ∏_i p_i^{y_i} = (2 + θ)^{y_1} (1 - θ)^{y_2 + y_3} θ^{y_4} / 4^n,

where n = y_1 + y_2 + y_3 + y_4.

(b) We use simple acceptance/rejection sampling to do our Monte Carlo simulation. We get the following summary statistics:

mean: [value not recovered]
sd: [value not recovered]
accept. rate: [value not recovered]

Plots of the posterior density and histograms of the samples for 2b and 2d are given below.

[Plots: posterior density curves overlaid on histograms of the samples, for parts 2b and 2d]

# This is the code for 2b
y = c(125, 18, 20, 34)
loglik = function(theta) {
  y[1]*log(2 + theta) + (y[2] + y[3])*log(1 - theta) + y[4]*log(theta)
}
fudge = 1e-3
# M bounds the log-likelihood from above (maximum plus a small fudge)
M = -nlm(function(t) -loglik(t), .5)$minimum + fudge
n = 1e5  # number of uniform proposals; the original value was not recovered, 1e5 is a placeholder
samples = runif(n, 0, 1)
U = runif(n, 0, 1)
accept = (log(U) < loglik(samples) - M)
samples = samples[accept]
sum(accept)/n    # acceptance rate
mean(samples)
sd(samples)
hist(samples, br=100, freq=F)
s = seq(0, 1, length=1000)
gridpost = exp(loglik(s))
gridpost = gridpost/sum(gridpost)*length(s)  # normalize to a density on (0, 1)
lines(s, gridpost)

(c) The estimated mean and standard deviation from the Laplace approximation are [values not recovered]. The code is given below.
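For reference when reading the code (the identity is standard Tierney-Kadane and is not written out in the original): writing the unnormalized posterior as e^{-n h(θ)} and θ times the unnormalized posterior as e^{-n h*(θ)}, the Laplace approximation to the posterior mean is

E(θ|y) ≈ [σ* e^{-n h*(θ*)}] / [σ e^{-n h(θ̂)}],   σ = [h''(θ̂)]^{-1/2},   σ* = [h*''(θ*)]^{-1/2},

where θ̂ minimizes h and θ* minimizes h*. In the code, mu is exactly this ratio, mom2 is the same ratio with θ² in place of θ (hence the y[4] + 2 exponent in hstar2), and the posterior sd is sqrt(mom2 - mu^2).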

h = function(t) { -loglik(t)/n }
hstar = function(theta) {
  -(y[1]*log(2 + theta) + (y[2] + y[3])*log(1 - theta) + (y[4] + 1)*log(theta))/n
}
hstar2 = function(theta) {
  -(y[1]*log(2 + theta) + (y[2] + y[3])*log(1 - theta) + (y[4] + 2)*log(theta))/n
}
min = nlm(h, .5, hessian=T)
min.star = nlm(hstar, .5, hessian=T)
min.star2 = nlm(hstar2, .5, hessian=T)
mu = sqrt(1/min.star$hessian)*exp(-n*min.star$minimum) /
  (sqrt(1/min$hessian)*exp(-n*min$minimum))
mom2 = sqrt(1/min.star2$hessian)*exp(-n*min.star2$minimum) /
  (sqrt(1/min$hessian)*exp(-n*min$minimum))
sqrt(mom2 - mu^2)

(d) The likelihood function is the same. We get the following summary statistics from running the same code as in part (b) with the new data vector:

mean: [value not recovered]
sd: [value not recovered]
accept. rate: [value not recovered]

(e) We have that our old posterior is c L(θ|y), where c is the appropriate normalizing constant. Under the new Beta(5, 15) prior, the posterior mean is

E(θ|y) = ∫ θ L(θ|y) p(θ) dθ / ∫ L(θ|y) p(θ) dθ = ∫ θ p(θ) · c L(θ|y) dθ / ∫ p(θ) · c L(θ|y) dθ ≈ Σ_i θ_i dbeta(θ_i, 5, 15) / Σ_i dbeta(θ_i, 5, 15),   (1)

where θ_i is the i-th draw from our old posterior. Similarly, we may estimate the second moment. The estimated posterior mean and standard deviation under this Beta(5, 15) prior are [values not recovered].
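Equation (1) is a one-line importance reweighting in R. A minimal sketch (not in the original write-up), assuming samples still holds the accepted draws from the flat-prior posterior of part (b):

> w = dbeta(samples, 5, 15)             # new prior evaluated at the old draws
> mu.new = sum(samples*w)/sum(w)        # posterior mean under Beta(5, 15), eq. (1)
> mom2.new = sum(samples^2*w)/sum(w)    # second moment, reweighted the same way
> c(mu.new, sqrt(mom2.new - mu.new^2))  # mean and sd under the new prior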

Problem 3

We are given that y_ij = θ_j + ε_ij with ε_ij ~ N(0, σ²), and θ_j = μ + γ_j with γ_j ~ N(0, τ²). Combining these, we see that

y_ij = μ + γ_j + ε_ij,   ε_ij ~ N(0, σ²),   γ_j ~ N(0, τ²),

where the γ's are independent of each other and of the ε's, and the ε's are independent of each other.

(a) Corr(y_{i1,j}, y_{i2,j}), i1 ≠ i2:

Corr(y_{i1,j}, y_{i2,j}) = Cov(y_{i1,j}, y_{i2,j}) / sqrt(Var(y_{i1,j}) Var(y_{i2,j}))
= Cov(μ + γ_j + ε_{i1,j}, μ + γ_j + ε_{i2,j}) / sqrt(Var(μ + γ_j + ε_{i1,j}) Var(μ + γ_j + ε_{i2,j}))
= Cov(γ_j + ε_{i1,j}, γ_j + ε_{i2,j}) / sqrt(Var(γ_j + ε_{i1,j}) Var(γ_j + ε_{i2,j}))   (μ is constant)
= [Cov(γ_j, γ_j) + Cov(γ_j, ε_{i1,j}) + Cov(γ_j, ε_{i2,j}) + Cov(ε_{i1,j}, ε_{i2,j})] / (σ² + τ²)
= τ² / (σ² + τ²)   (by independence)

(b) Corr(y_{i1,j1}, y_{i2,j2}), j1 ≠ j2:

Corr(y_{i1,j1}, y_{i2,j2}) = Corr(μ + γ_{j1} + ε_{i1,j1}, μ + γ_{j2} + ε_{i2,j2})
= Cov(γ_{j1} + ε_{i1,j1}, γ_{j2} + ε_{i2,j2}) / sqrt(Var(γ_{j1} + ε_{i1,j1}) Var(γ_{j2} + ε_{i2,j2}))   (μ is constant)
= [Cov(γ_{j1}, γ_{j2}) + Cov(γ_{j1}, ε_{i2,j2}) + Cov(γ_{j2}, ε_{i1,j1}) + Cov(ε_{i1,j1}, ε_{i2,j2})] / (σ² + τ²)
= 0   (by independence)
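As a quick numerical sanity check on both results (a sketch, not part of the original solution), one can simulate many groups with, say, τ = 2 and σ = 1 and compare the empirical correlations to τ²/(σ² + τ²) = 0.8 and 0:

> tau = 2; sigma = 1; J = 100000
> gamma = rnorm(J, 0, tau)         # one gamma_j per group; mu omitted, since a constant shift does not affect correlations
> y1 = gamma + rnorm(J, 0, sigma)  # y_{i1,j}
> y2 = gamma + rnorm(J, 0, sigma)  # y_{i2,j}, same group
> cor(y1, y2)                      # approximately tau^2/(sigma^2 + tau^2) = 0.8
> cor(y1, sample(y2))              # pairs from different groups: approximately 0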

9 list(j=8, y=c(28, 8, -3, 7, -1, 1, 18, 12, sigma.y=c(15, 10, 16, 11, 9, 11, 10, 18 model{ for (j in 1:J{ y[j] dnorm(theta[j], tau.y[j] theta[j] dnorm(mu, tau.theta tau.y[j]<- pow(sigma.y[j], -2 mu dnorm(0.0, 1.0E-6 tau.theta<-pow(sigma.theta,-2 v dgamma(1,1 sigma.theta<-sqrt(1/v Note: I choose just to compare sigma.theta and theta[1] for a write up comparison to save space. Other variables could have been reviewed. Comparing parts a and b, we see that the values for sigma.theta are generally larger for part a (mean 6.4, sd 5.51 than for part b (mean 1.52, sd Both histories look as if the chain is stable and converged. However it is clear that the autocorrelation for the part a is larger than the autocorrelation for part b. The densities appear similarly shaped, with the changes in the summaries (i.e. mean and sd reflected. For theta[1] (chosen arbitrarily for comparison, we see similar items. For part a, the mean was 10.99, and the sd was For part b, the mean is smaller, at 7.5 and the sd is also smaller at Both histories look stable, indicating the chain is converged. Though here, the autocorrelation for part a is smaller than the autocorrelation for part b. This could explain the smaller sd for part b. 9
