CE 590 Applied Bayesian Statistics. Mid-term Take-home Exam


CE 590 Applied Bayesian Statistics. Mid-term Take-home Exam. Due: April 1, 2015

ST495/590 Mid-term take-home exam. Due 4/1.

This portion of the exam is take-home and must be dropped in my office by 5PM on Wednesday, April 1. THIS IS AN EXAM - YOU MAY NOT DISCUSS THE PROBLEMS WITH ANYONE (INCLUDING OTHER STUDENTS OR THE TA)! If you have questions, please visit office hours or e-mail me.

Data for this analysis were downloaded from … For each National Basketball Association team and each season from …, the data set includes several statistics describing the team's performance that season. Your objective is to build a predictive model for margin of victory in terms of the other variables. The variables are described on the back of the exam, and the data are available for download at … Use the data from … to fit the model, and 2014 to test predictions. You may assume independence across teams and years.

1. Fit at least 2-3 different models to the data and select a final model.
2. Verify that the MCMC algorithm is producing reliable output for your final model.
3. Determine which variables in your final model are statistically significant.
4. Are the results sensitive to the prior?
5. Make predictions from your final model for each team in 2014, and plot the posterior predictive distributions versus the actual 2014 data for the final model. Summarize prediction accuracy in terms of mean squared error and give the coverage of the prediction intervals.

Turn in a report summarizing this analysis. The report should be no more than 4 pages (11-point font, 1-inch margins) and should be in manuscript style, with paragraphs of text and numbered figures and tables. A substantial portion of the grade will be based on clarity of presentation. You should describe in the text the methods you are using in enough detail that the analysis could be replicated by another student in the class. Attach commented code in a separate report. Staple all material for the exam together. HAVE FUN!

VARIABLE DESCRIPTIONS FOR THE NBA DATASET

1. Team: Team
2. Year: Year
3. Conference: East/West conference
4. Division: Division within conference
5. MarginOfVictory: Average margin of victory on the season
6. AverageAge: Average age of players on the team
7. StrengthOfSchedule: Strength of schedule (positive means you played good teams)
8. Pace: Pace of play (possessions per game)
9. FreeThrowAttemptRate: Number of free throw attempts divided by number of field goal attempts
10. 3PointAttemptRate: Number of three-point attempts divided by number of field goal attempts
11. TrueShootingPCT: Percentage of shots made (accounting for two- versus three-pointers)
12. TurnoverPCT: Turnovers per 100 plays
13. OffensiveReboundPCT: Percent of missed shots reclaimed on the rebound
14. ThreePointPCT: Three-point percentage
15. FreeThrowPCT: Free-throw percentage
16. Opp3pointPCT: Opponent's three-point percentage
17. oppfreethrowpct: Opponent's free-throw percentage
18. OffAveFGDist: Average distance of shot attempts
19. Off2PAssd: Percentage of made two-point shots that resulted from an assist
20. Dunks: Number of dunks
21. Off3PAssd: Percentage of made three-point shots that resulted from an assist
22. DefAveFGDist: Average distance of opponent's shots
23. Def2PAssd: Percentage of opponent's made two-point shots that resulted from an assist
24. Def3PAssd: Percentage of opponent's made three-point shots that resulted from an assist

In this regression, use margin of victory as the response and variables 6-24 as predictors. For more information about these variables, see …

(ANSWER) 1. In this mid-term exam, a multiple linear regression analysis is performed to fit a model to the NBA data and to test its predictions on the 2014 NBA data. Independence across teams and years is assumed. The margin of victory is the response (Y_i), and the 19 variables from AverageAge (X_i1) to Def3PAssd (X_i19) are the predictors. The basic model is given in Equation (1) for observations i = 1, ..., n. Since there are 30 NBA teams and 4 training seasons, n = 120 observations are used to fit the model.

\[ Y_i \sim \mathrm{Normal}\left(\alpha + X_{i1}\beta_1 + \cdots + X_{i,19}\beta_{19},\; \sigma^2\right) \tag{1} \]

Here, all variables are centered and scaled. The prior σ² ~ InvGamma(0.01, 0.01) is assumed for the error variance, and a vague mean-zero normal prior is assumed for the intercept α. For the regression coefficients β_j, the four prior models in Table 1 are fit, evaluated, and compared in order to choose the best model. Specifically, the mean squared error (MSE), bias (BIAS), average standard deviation (AVESD), and coverage of the 95% prediction interval (COV) from cross-validation, together with the deviance information criterion (DIC), are computed for each model (Table 2). In the cross-validation, the Cauchy prior has the smallest prediction MSE and the best coverage of the four models; by DIC, the Gaussian 2 prior is smallest. Based on these results, either the Cauchy or the Gaussian 2 prior could serve as the final model; here, Gaussian 2 is chosen as the final model. The detailed calculations are described in Appendices 1 and 2.

(ANSWER) 4. To check whether the posterior is sensitive to the four priors, the posterior mean, standard deviation (SD), and 95% credible interval (CI) of each coefficient are reported in Table 3, and the posterior distributions of β_7 and β_13 (representative insensitive and sensitive cases) are shown in Figure 1. Overall, the posterior does not appear sensitive to the four priors, although eight coefficients change noticeably across priors: FreeThrowAttemptRate (β_4), 3PointAttemptRate (β_5), TurnoverPCT (β_6), ThreePointPCT (β_9), Opp3pointPCT (β_11), oppfreethrowpct (β_12), OffAveFGDist (β_13), and Dunks (β_15). The detailed calculation is described in Appendix 3.

(ANSWER) 3. Since a variable is deemed statistically significant if its 95% CI excludes zero, twelve coefficients (β_1, β_2, β_6, β_7, β_8, β_9, β_11, β_12, β_13, β_15, β_17, and β_18) are statistically significant, as shown in Table 4.

Table 1. Four different priors for the regression coefficients β_j:
  Gaussian 1:      β_j ~ Normal(0, 100²)
  Gaussian 2:      β_j ~ Normal(0, σ_b²),    σ_b² ~ InvGamma(0.01, 0.01)
  Cauchy:          β_j ~ t_1(0, σ_b²),       σ_b² ~ InvGamma(0.01, 0.01)
  Bayesian LASSO:  β_j ~ DoubleExp(0, σ_b²), σ_b² ~ InvGamma(0.01, 0.01)

Table 2. Cross-validation results (MSE, BIAS, AVESD, COV) and DIC (penalized deviance) for the Gaussian 1, Gaussian 2, Cauchy, and Bayesian LASSO priors.
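For reference, the cross-validation metrics reported in Table 2 correspond to the quantities computed in Appendix 1. With n_p held-out cases, posterior predictive mean \(\hat Y_i\) and SD \(s_i\), and 95% prediction interval \([\ell_i, u_i]\) for case i, they can be written as

\[
\mathrm{MSE}=\frac{1}{n_p}\sum_{i=1}^{n_p}(\hat Y_i-Y_i)^2,\qquad
\mathrm{BIAS}=\frac{1}{n_p}\sum_{i=1}^{n_p}(\hat Y_i-Y_i),
\]
\[
\mathrm{AVESD}=\frac{1}{n_p}\sum_{i=1}^{n_p}s_i,\qquad
\mathrm{COV}=\frac{1}{n_p}\sum_{i=1}^{n_p}\mathbf{1}\{\ell_i\le Y_i\le u_i\}.
\]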

Table 3. Posterior summary (mean, SD, and 95% CI) of the regression coefficients β_1 through β_19 under the four priors (Gaussian 1, Gaussian 2, Cauchy, and Bayesian LASSO).

Table 4. Posterior 95% credible interval (2.5% and 97.5% limits) of the regression coefficients β_1 through β_19 under the Gaussian 2 prior.

Figure 1. Posterior distributions of the regression coefficients (a) β_7 and (b) β_13 under the four priors.
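A minimal sketch of the significance check behind Table 4, assuming the Gaussian 2 coda samples samp2 and the scaled design matrix X produced by the code in Appendix 3:

s2  <- samp2[[1]]                                    # posterior draws of beta[1],...,beta[19]
ci  <- apply(s2, 2, quantile, probs = c(0.025, 0.975))
sig <- ci[1,] > 0 | ci[2,] < 0                       # TRUE if the 95% interval excludes zero
colnames(X)[sig]                                     # names of the significant predictors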

(ANSWER) 2. To verify that the MCMC algorithm produces reliable output for the final model (the Gaussian 2 prior mentioned above), a convergence test for σ, σ_b, and the β_j is performed. Samples are drawn with the MCMC algorithm in JAGS: 10,000 warm-up (burn-in) samples are drawn via the update function, and 20,000 further samples are produced to approximate the posterior (set by n.iter in coda.samples). For a more thorough convergence test, three chains are run (set by n.chains in jags.model). The detailed code is given in Appendix 4. Specifically, trace plots of the parameters, the autocorrelation function (ACF), and the Gelman-Rubin statistic (R̂) are produced; only the ACF and R̂ of σ, σ_b, β_6, and β_13 are shown in Figure 2. The effective sample size (ESS) of every parameter is reported in Table 5.

Table 5. Effective sample size (ESS) of σ, σ_b, and β_1 through β_19.

Examining the ACF of β_6 (the worst case by ESS), the within-chain samples become uncorrelated as the lag increases. The effective sample sizes of σ, σ_b, and all β_j are large. In the Gelman-Rubin plot, R̂ for β_6 is close to one after 14,000 samples. Therefore, the sampling for all parameters has converged, which supports reliable output.

Figure 2. Autocorrelation function and Gelman-Rubin statistic for parameters of the final model.

(ANSWER) 5. Predictions from the final model for each team in 2014 are summarized in Table 6. Since there are 30 NBA teams and 1 test season, n = 30 observations are used to test the model. The posterior predictive distributions (PPD) of Y_14 (Memphis) and Y_21 (Oklahoma City) are illustrated together with the actual 2014 data and the plug-in distributions in Figure 3. Prediction accuracy for the final model is summarized by the mean squared error (MSE), bias (BIAS), average standard deviation (AVESD), and coverage of the 95% prediction intervals (COV) in Table 7. Based on these results, the model predicts the response (margin of victory) of each team in the 2014 season quite well. The detailed code is given in Appendix 5.

Table 6. Predictive posterior summary (mean, SD, 2.5% and 97.5% quantiles, and 95% CI) of Y_i for each of the 30 teams: Atlanta, Boston, Charlotte, Chicago, Cleveland, Dallas, Denver, Detroit, Golden State, Houston, Indiana, LA Clippers, LA Lakers, Memphis, Miami, Milwaukee, Minnesota, New Jersey, New Orleans, New York, Oklahoma City, Orlando, Philadelphia, Phoenix, Portland, Sacramento, San Antonio, Toronto, Utah, and Washington.

Table 7. Prediction accuracy (MSE, BIAS, AVESD, COV) using the final model.

Figure 3. Predictive posterior distributions with the actual 2014 data for (a) Y_14 (Memphis) and (b) Y_21 (Oklahoma City).

Appendix 1. Model Selection via Cross-validation

#################################################
# Model Selection via Cross-validation
#################################################
rm(list=ls())

## Load and standardize the NBA data
dat <- read.csv("...")            # data URL truncated in the source
Y <- dat[,6]
Y <- (Y-mean(Y))/sd(Y)            # standardize the response
X <- dat[,7:25]
X <- scale(X)                     # center and scale the predictors

# Pre-2014 seasons: observed (training) data; 2014: test (prediction) data
obs <- dat[,3] != 2014
prd <- dat[,3] == 2014
Yo  <- Y[obs]
Xo  <- X[obs,]
Yp  <- Y[prd]
Xp  <- X[prd,]
no  <- length(Yo)
np  <- length(Yp)
p   <- ncol(Xo)

## Define the four linear regression models.
## (The loop over j, the closing braces, and the alpha/inv.var priors were
## lost in the source and are reconstructed; the intercept precision is assumed.)

# (1) Gaussian: beta_j ~ Normal(0,100^2)
model_string1 <- "model{
  for(i in 1:no){
    Yo[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(Xo[i,],beta[])
  }
  # Prediction
  for(i in 1:np){
    Yp[i] ~ dnorm(mup[i],inv.var)
    mup[i] <- alpha + inprod(Xp[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,0.0001)
  }
  alpha   ~ dnorm(0,0.01)        # vague intercept prior (precision assumed)
  inv.var ~ dgamma(0.01,0.01)    # sigma^2 ~ InvGamma(0.01,0.01)
}"

# (2) Gaussian: beta_j ~ Normal(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string2 <- "model{
  for(i in 1:no){
    Yo[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(Xo[i,],beta[])
  }
  # Prediction
  for(i in 1:np){
    Yp[i] ~ dnorm(mup[i],inv.var)
    mup[i] <- alpha + inprod(Xp[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

# (3) Cauchy: beta_j ~ t1(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string3 <- "model{
  for(i in 1:no){
    Yo[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(Xo[i,],beta[])
  }
  # Prediction
  for(i in 1:np){
    Yp[i] ~ dnorm(mup[i],inv.var)
    mup[i] <- alpha + inprod(Xp[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dt(0,inv.var.b,1)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

# (4) Bayesian LASSO: beta_j ~ DoubleExpo(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string4 <- "model{
  for(i in 1:no){
    Yo[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(Xo[i,],beta[])
  }
  # Prediction
  for(i in 1:np){
    Yp[i] ~ dnorm(mup[i],inv.var)
    mup[i] <- alpha + inprod(Xp[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ ddexp(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

## Fit the models
library(rjags)
model1 <- jags.model(textConnection(model_string1),
                     data = list(Yo=Yo,no=no,np=np,p=p,Xo=Xo,Xp=Xp))
update(model1, 10000, progress.bar="none")
samps1 <- coda.samples(model1, variable.names=c("Yp"), n.iter=20000)
Yp1 <- samps1[[1]]

model2 <- jags.model(textConnection(model_string2),
                     data = list(Yo=Yo,no=no,np=np,p=p,Xo=Xo,Xp=Xp))
update(model2, 10000, progress.bar="none")
samps2 <- coda.samples(model2, variable.names=c("Yp"), n.iter=20000)
Yp2 <- samps2[[1]]

model3 <- jags.model(textConnection(model_string3),
                     data = list(Yo=Yo,no=no,np=np,p=p,Xo=Xo,Xp=Xp))
update(model3, 10000, progress.bar="none")
samps3 <- coda.samples(model3, variable.names=c("Yp"), n.iter=20000)
Yp3 <- samps3[[1]]

model4 <- jags.model(textConnection(model_string4),
                     data = list(Yo=Yo,no=no,np=np,p=p,Xo=Xo,Xp=Xp))
update(model4, 10000, progress.bar="none")
samps4 <- coda.samples(model4, variable.names=c("Yp"), n.iter=20000)
Yp4 <- samps4[[1]]

## Compile the results
post_mn1   <- apply(Yp1,2,mean)
post_sd1   <- apply(Yp1,2,sd)
post_low1  <- apply(Yp1,2,quantile,0.025)
post_high1 <- apply(Yp1,2,quantile,0.975)

post_mn2   <- apply(Yp2,2,mean)
post_sd2   <- apply(Yp2,2,sd)
post_low2  <- apply(Yp2,2,quantile,0.025)
post_high2 <- apply(Yp2,2,quantile,0.975)

post_mn3   <- apply(Yp3,2,mean)
post_sd3   <- apply(Yp3,2,sd)
post_low3  <- apply(Yp3,2,quantile,0.025)
post_high3 <- apply(Yp3,2,quantile,0.975)

post_mn4   <- apply(Yp4,2,mean)
post_sd4   <- apply(Yp4,2,sd)
post_low4  <- apply(Yp4,2,quantile,0.025)
post_high4 <- apply(Yp4,2,quantile,0.975)

MSE1   <- mean((post_mn1-Yp)^2)
BIAS1  <- mean(post_mn1-Yp)
AVESD1 <- mean(post_sd1)
COV1   <- mean(Yp>post_low1 & Yp<post_high1)

MSE2   <- mean((post_mn2-Yp)^2)
BIAS2  <- mean(post_mn2-Yp)
AVESD2 <- mean(post_sd2)
COV2   <- mean(Yp>post_low2 & Yp<post_high2)

MSE3   <- mean((post_mn3-Yp)^2)
BIAS3  <- mean(post_mn3-Yp)
AVESD3 <- mean(post_sd3)
COV3   <- mean(Yp>post_low3 & Yp<post_high3)

MSE4   <- mean((post_mn4-Yp)^2)
BIAS4  <- mean(post_mn4-Yp)
AVESD4 <- mean(post_sd4)
COV4   <- mean(Yp>post_low4 & Yp<post_high4)

MSE   <- c(MSE1,MSE2,MSE3,MSE4)
BIAS  <- c(BIAS1,BIAS2,BIAS3,BIAS4)
AVESD <- c(AVESD1,AVESD2,AVESD3,AVESD4)
COV   <- c(COV1,COV2,COV3,COV4)

OUTPUT <- cbind(MSE,BIAS,AVESD,COV)
rownames(OUTPUT) <- c("Gaussian1","Gaussian2","Cauchy","BLASSO")
round(OUTPUT,2)

Appendix 2. Model Selection via DIC

#################################################
# Model Selection via DIC
#################################################
rm(list=ls())

## Load and standardize the NBA data
dat <- read.csv("...")            # data URL truncated in the source
Y <- dat[,6]
Y <- (Y-mean(Y))/sd(Y)
X <- dat[,7:25]
X <- scale(X)

# Fit on the pre-2014 NBA data only
obs <- dat[,3] != 2014
Y <- Y[obs]
X <- X[obs,]
n <- length(Y)
p <- ncol(X)

## Define the four models (closing braces and alpha/inv.var priors
## reconstructed, as in Appendix 1)

# (1) Gaussian: beta_j ~ Normal(0,100^2)
model_string1 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,0.0001)
  }
  alpha   ~ dnorm(0,0.01)
  inv.var ~ dgamma(0.01,0.01)
}"

# (2) Gaussian: beta_j ~ Normal(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string2 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

# (3) Cauchy: beta_j ~ t1(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string3 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dt(0,inv.var.b,1)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

# (4) Bayesian LASSO: beta_j ~ DoubleExpo(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string4 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ ddexp(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

## Fit the models and compute DIC
library(rjags)
model1 <- jags.model(textConnection(model_string1),
                     data = list(Y=Y,n=n,X=X,p=p), n.chains=3)
update(model1, 10000)
dic1 <- dic.samples(model1, n.iter=20000)

model2 <- jags.model(textConnection(model_string2),
                     data = list(Y=Y,n=n,X=X,p=p), n.chains=3)
update(model2, 10000)
dic2 <- dic.samples(model2, n.iter=20000)

model3 <- jags.model(textConnection(model_string3),
                     data = list(Y=Y,n=n,X=X,p=p), n.chains=3)
update(model3, 10000)
dic3 <- dic.samples(model3, n.iter=20000)

model4 <- jags.model(textConnection(model_string4),
                     data = list(Y=Y,n=n,X=X,p=p), n.chains=3)
update(model4, 10000)
dic4 <- dic.samples(model4, n.iter=20000)
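The dic objects above can be printed directly, and rjags provides diffdic() for a pairwise comparison with a standard error. A brief sketch, using the objects defined above:

dic1; dic2; dic3; dic4     # penalized deviance for each prior (Table 2)
diffdic(dic2, dic3)        # e.g., Gaussian 2 minus Cauchy, with its standard error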

Appendix 3. Multiple Linear Regression with Different Priors

#################################################
# Multiple linear regression using shrinkage priors
#################################################
rm(list=ls())

## Load and standardize the NBA data
dat <- read.csv("...")            # data URL truncated in the source
Y <- dat[,6]
Y <- (Y-mean(Y))/sd(Y)
X <- dat[,7:25]
X <- scale(X)

# Fit on the pre-2014 NBA data only
obs <- dat[,3] != 2014
Y <- Y[obs]
X <- X[obs,]
n <- length(Y)
p <- ncol(X)

## Exploratory plots of the covariates
boxplot(X, las=3, main="Standardized Covariates", cex.axis=0.75)
image(1:p, 1:p, abs(cor(X)),
      xlab="", ylab="", main="Correlation between predictors",
      axes=FALSE, col=gray(1-seq(0,1,.01)))
axis(1, 1:p, colnames(X), las=2)
axis(2, 1:p, colnames(X), las=2)

## Define the four models (closing braces and alpha/inv.var priors
## reconstructed, as in Appendix 1)

# (1) Gaussian: beta_j ~ Normal(0,100^2)
model_string1 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,0.0001)
  }
  alpha   ~ dnorm(0,0.01)
  inv.var ~ dgamma(0.01,0.01)
}"

# (2) Gaussian: beta_j ~ Normal(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string2 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

# (3) Cauchy: beta_j ~ t1(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string3 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dt(0,inv.var.b,1)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

# (4) Bayesian LASSO: beta_j ~ DoubleExpo(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string4 <- "model{
  for(i in 1:n){
    Y[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(X[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ ddexp(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
}"

## Fit the models
library(rjags)
model1 <- jags.model(textConnection(model_string1),
                     data = list(Y=Y,n=n,X=X,p=p))
update(model1, 10000, progress.bar="none")
samp1 <- coda.samples(model1, variable.names=c("beta"), n.iter=20000)

model2 <- jags.model(textConnection(model_string2),
                     data = list(Y=Y,n=n,X=X,p=p))
update(model2, 10000, progress.bar="none")
samp2 <- coda.samples(model2, variable.names=c("beta"), n.iter=20000)

model3 <- jags.model(textConnection(model_string3),
                     data = list(Y=Y,n=n,X=X,p=p))
update(model3, 10000, progress.bar="none")
samp3 <- coda.samples(model3, variable.names=c("beta"), n.iter=20000)

model4 <- jags.model(textConnection(model_string4),
                     data = list(Y=Y,n=n,X=X,p=p))
update(model4, 10000, progress.bar="none")
samp4 <- coda.samples(model4, variable.names=c("beta"), n.iter=20000)

## Compare the posteriors from the four fits
# Extract the MCMC samples from each fit:
s1 <- samp1[[1]]
s2 <- samp2[[1]]
s3 <- samp3[[1]]
s4 <- samp4[[1]]

# Plot the posterior of each covariate's coefficient for all four models:
for(index in 1:p){
  d1 <- density(s1[,index])
  d2 <- density(s2[,index])
  d3 <- density(s3[,index])
  d4 <- density(s4[,index])
  mx <- max(d1$y,d2$y,d3$y,d4$y)
  plot(d1, ylim=c(0,mx), xlab="beta", ylab="Posterior density",
       main=colnames(X)[index])
  lines(d2, col=2)
  lines(d3, col=3)
  lines(d4, col=4)
  legend("topright", c("Gaussian 1","Gaussian 2","Cauchy","LASSO"),
         lty=1, col=1:4, inset=0.05)
}

Appendix 4. Convergence Test

#################################################
# Convergence Test
#################################################
rm(list=ls())

## Load and standardize the NBA data
dat <- read.csv("...")            # data URL truncated in the source
Y <- dat[,6]
Y <- (Y-mean(Y))/sd(Y)
X <- dat[,7:25]
X <- scale(X)

# Pre-2014 seasons: observed (training) data; 2014: test (prediction) data
obs <- dat[,3] != 2014
prd <- dat[,3] == 2014
Yo  <- Y[obs]
Xo  <- X[obs,]
Yp  <- Y[prd]
Xp  <- X[prd,]
no  <- length(Yo)
np  <- length(Yp)
p   <- ncol(Xo)

## Final model (closing braces and alpha/inv.var priors reconstructed)
# (2) Gaussian: beta_j ~ Normal(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string1 <- "model{
  for(i in 1:no){
    Yo[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(Xo[i,],beta[])
  }
  # Prediction
  for(i in 1:np){
    Yp[i] ~ dnorm(mup[i],inv.var)
    mup[i] <- alpha + inprod(Xp[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
  sigma  <- 1/sqrt(inv.var)
  sigmab <- 1/sqrt(inv.var.b)
}"

# Fit the model with three chains
library(rjags)
model1 <- jags.model(textConnection(model_string1),
                     n.chains = 3,
                     data = list(Yo=Yo,no=no,np=np,p=p,Xo=Xo,Xp=Xp))
update(model1, 10000)
samp1 <- coda.samples(model1,
                      variable.names=c("beta[1]","sigma","sigmab"),
                      n.iter=20000)

summary(samp1)
plot(samp1)            # trace and density plots
effectiveSize(samp1)   # effective sample size (Table 5)
gelman.plot(samp1)     # Gelman-Rubin diagnostic
autocorr.plot(samp1)   # autocorrelation function
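The diagnostic plots above can be complemented by numeric summaries from the same coda object; a brief sketch using standard coda functions:

gelman.diag(samp1)                        # Gelman-Rubin R-hat; values near 1 indicate convergence
autocorr.diag(samp1, lags=c(1,5,10,50))   # autocorrelation at selected lags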

Appendix 5. Prediction with Real Data

#################################################
# Multiple linear regression prediction
#################################################
rm(list=ls())

## Load and standardize the NBA data
dat <- read.csv("...")            # data URL truncated in the source
Y <- dat[,6]
Y <- (Y-mean(Y))/sd(Y)
X <- dat[,7:25]
X <- scale(X)

# Pre-2014 seasons: observed (training) data; 2014: test (prediction) data
obs <- dat[,3] != 2014
prd <- dat[,3] == 2014
Yo  <- Y[obs]
Xo  <- X[obs,]
Yp  <- Y[prd]
Xp  <- X[prd,]
no  <- length(Yo)
np  <- length(Yp)
p   <- ncol(Xo)

## Final model (closing braces and alpha/inv.var priors reconstructed)
# (2) Gaussian: beta_j ~ Normal(0,sigmab^2) & sigmab^2 ~ InvGamma(0.01,0.01)
model_string1 <- "model{
  for(i in 1:no){
    Yo[i] ~ dnorm(muo[i],inv.var)
    muo[i] <- alpha + inprod(Xo[i,],beta[])
  }
  # Prediction
  for(i in 1:np){
    Yp[i] ~ dnorm(mup[i],inv.var)
    mup[i] <- alpha + inprod(Xp[i,],beta[])
  }
  for(j in 1:p){
    beta[j] ~ dnorm(0,inv.var.b)
  }
  alpha     ~ dnorm(0,0.01)
  inv.var   ~ dgamma(0.01,0.01)
  inv.var.b ~ dgamma(0.01,0.01)
  sigma <- 1/sqrt(inv.var)
}"

# Fit the model
library(rjags)
model1 <- jags.model(textConnection(model_string1),
                     data = list(Yo=Yo,no=no,np=np,p=p,Xo=Xo,Xp=Xp))
update(model1, 10000)
samp1 <- coda.samples(model1,
                      variable.names=c("alpha","beta","Yp","sigma"),
                      n.iter=20000)
summary(samp1)
#plot(samp1)

## Plot samples for each parameter
# Extract the samples for each parameter
# (columns ordered as Yp[1:30], alpha, beta[1:19], sigma)
samps1       <- samp1[[1]]
Yp.samps1    <- samps1[,1:30]
alpha.samps1 <- samps1[,31]
beta.samps1  <- samps1[,32:50]
sigma.samps1 <- samps1[,51]

# Compute the posterior means for the plug-in predictions
beta.mn  <- colMeans(beta.samps1)
sigma.mn <- mean(sigma.samps1)
alpha.mn <- mean(alpha.samps1)

# Plot the PPD and plug-in distribution for each team
for(j in 1:np){
  # PPD
  plot(density(Yp.samps1[,j]), xlab="Y", main="PPD")
  # Plug-in
  mu <- alpha.mn + sum(Xp[j,]*beta.mn)
  y  <- rnorm(20000, mu, sigma.mn)
  lines(density(y), col=2)
  # Truth
  abline(v=Yp[j], col=3, lwd=2)
  legend("topright", c("PPD","Plug-in","Truth"), col=1:3, lty=1, inset=0.05)
}

## Compile the results
post_mn1   <- apply(Yp.samps1,2,mean)
post_sd1   <- apply(Yp.samps1,2,sd)
post_low1  <- apply(Yp.samps1,2,quantile,0.025)
post_high1 <- apply(Yp.samps1,2,quantile,0.975)

MSE1   <- mean((post_mn1-Yp)^2)
BIAS1  <- mean(post_mn1-Yp)
AVESD1 <- mean(post_sd1)
COV1   <- mean(Yp>post_low1 & Yp<post_high1)

OUTPUT <- cbind(MSE1,BIAS1,AVESD1,COV1)
round(OUTPUT,2)
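Because the response was standardized with the full-data mean and SD, the predictions above are on the z-scale. A minimal sketch (assuming the objects from Appendix 5) of the back-transformation to report results in points of margin of victory:

mov.mn <- mean(dat[,6])
mov.sd <- sd(dat[,6])
pred.points  <- post_mn1*mov.sd + mov.mn     # posterior predictive means, in points
lower.points <- post_low1*mov.sd + mov.mn    # 95% interval lower limits, in points
upper.points <- post_high1*mov.sd + mov.mn   # 95% interval upper limits, in points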
