Multiple Regression: Mixed Predictor Types. Tim Frasier
- Maude Walker
- 5 years ago
Copyright Tim Frasier. This work is licensed under the Creative Commons Attribution 4.0 International license.
The Data
Fuel economy data from 1999 and 2008 for 38 popular models of car.*
I know, I know. It's neither biological nor that interesting, but it is hard to find good example data sets for this.
* As distributed with the ggplot2 package; original data from the EPA (
Predicted variable: hwy (highway miles per gallon)
Two categorical (nominal) predictors: manufacturer and class
Two metric predictors:* displ (engine displacement) and cyl (number of cylinders)
* I realize that the number of cylinders is not really a metric variable, but we will treat it like one here for demonstration purposes.
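Read the data into R and keep just the fields we are interested in:

    # stringsAsFactors = TRUE keeps manufacturer and class as factors
    # (the default changed to FALSE in R 4.0.0)
    cardata <- read.table("mpg.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
    carsub <- cardata[, c(2, 4, 6, 10, 12)]  # manufacturer, displ, cyl, hwy, class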
Use the summary function to get a feel for it:

    summary(carsub)
         manufacturer     displ            cyl             hwy              class
     dodge     :37   Min.   :1.600   Min.   :4.000   Min.   :        2seater   : 5
     toyota    :34   1st Qu.:        1st Qu.:        1st Qu.:18.00   compact   :47
     volkswagen:27   Median :3.300   Median :6.000   Median :24.00   midsize   :41
     ford      :25   Mean   :3.472   Mean   :5.889   Mean   :23.44   minivan   :11
     chevrolet :19   3rd Qu.:        3rd Qu.:        3rd Qu.:27.00   pickup    :33
     audi      :18   Max.   :7.000   Max.   :8.000   Max.   :44.00   subcompact:35
     (Other)   :74                                                   suv       :62
Plot the data to get a feel for it, but keep in mind that these plots can be misleading! Each panel shows only the relationship between two variables and ignores all of the others.

    pairs(carsub, pch = 16, col = rgb(0, 0, 1, 0.5))
[Figure: pairs plot of manufacturer, displ, cyl, hwy and class]

Patterns in the pairs plot:
- Positive relationship between engine displacement and the number of cylinders (makes sense)
- Negative relationship between engine displacement and highway mpg
- Negative relationship between number of cylinders and highway mpg
- Mostly positive relationship between engine displacement and vehicle class
- Some interesting patterns of relationships between class and highway mpg
- Some interesting patterns of relationships between manufacturer and highway mpg
Frequentist Approach
Mixed predictors can be analyzed with the lm function:

    cartest <- lm(hwy ~ manufacturer + displ + cyl + class, data = carsub)
    summary(cartest)

    Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                         < 2e-16 ***
    manufacturerchevrolet
    manufacturerdodge
    manufacturerford
    manufacturerhonda                                            **
    manufacturerhyundai
    manufacturerjeep
    manufacturerland rover
    manufacturerlincoln
    manufacturermercury
    manufacturernissan
    manufacturerpontiac
    manufacturersubaru
    manufacturertoyota
    manufacturervolkswagen                                       *
    displ
    cyl                                                          ***
    classcompact
    classmidsize
    classminivan                                                 **
    classpickup                                           e-08   ***
    classsubcompact
    classsuv                                              e-08   ***
    ---
    Residual standard error:  on 211 degrees of freedom
    Multiple R-squared:  ,  Adjusted R-squared:
    F-statistic:  on 22 and 211 DF,  p-value: < 2.2e-16
Reading this output:
- The intercept is the expected hwy for the reference levels (audi for manufacturer, 2seater for class) when displ and cyl are both zero. All other manufacturer and class coefficients are differences from this reference.
- Manufacturer does not have a big impact, but it has some: only honda (**) and volkswagen (*) differ significantly from audi.
- Engine displacement has a negative, but not significant, effect.
- Cylinder number has a significant negative effect.
- Class category seems important: minivan, pickup and suv differ significantly from 2seater.
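The intercept belongs to audi and 2seater because of how lm encodes a factor. The sketch below uses a small hypothetical data frame (toy, make and displ are illustrative names, not from the car data) with R's default treatment contrasts. The first factor level becomes the reference and gets no column of its own, so every other coefficient is a difference from it.

```r
# Hypothetical toy data: one nominal predictor (3 levels) and one metric predictor.
toy <- data.frame(
  make = factor(c("audi", "ford", "honda", "audi", "ford", "honda")),
  displ = c(1.8, 4.0, 1.6, 2.0, 5.4, 2.2)
)

# Treatment (dummy) coding: the first level alphabetically ("audi") is the
# reference level and gets no indicator column.
X <- model.matrix(~ make + displ, data = toy)
print(colnames(X))  # (Intercept), makeford, makehonda, displ

# Audi rows have zeros in every make column, so their prediction is
# intercept + slope * displ; ford and honda rows add their own offset.
print(X[toy$make == "audi", ])

# relevel() moves a different level into the reference position.
toy$make2 <- relevel(toy$make, ref = "honda")
print(levels(toy$make2))
```

If another comparison is more natural, relevel() on carsub$manufacturer or carsub$class before calling lm changes which category the intercept represents.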
Bayesian Approach
Load Libraries & Functions

    library(runjags)
    library(coda)
    source("plotpost.r")
Organize the Data

    #--- The y data ---#
    y = carsub$hwy
    N = length(y)
    ymean = mean(y)
    ysd = sd(y)
    zy = (y - ymean) / ysd
    #--- The metric x data ---#
    # displ
    displ <- carsub$displ
    displmean <- mean(displ)
    displsd <- sd(displ)
    zdispl <- (displ - displmean) / displsd
    # cyl
    cyl <- carsub$cyl
    cylmean <- mean(cyl)
    cylsd <- sd(cyl)
    zcyl <- (cyl - cylmean) / cylsd
    #--- The nominal x data ---#
    # as.numeric() on a factor returns each observation's level index
    man <- as.numeric(carsub$manufacturer)
    class <- as.numeric(carsub$class)
    manlevels <- levels(carsub$manufacturer)
    classlevels <- levels(carsub$class)
    nmans <- length(unique(man))
    nclass <- length(unique(class))
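The nominal predictors are passed to JAGS as integer codes, which is why as.numeric() is applied to the factors. A minimal sketch with hypothetical class labels shows that the codes follow the alphabetical order of levels(). It also shows that the same call on a plain character vector (what read.table returns by default from R 4.0.0 onward) only yields NA.

```r
# Hypothetical example values; in the slides these come from carsub$class.
cls <- c("suv", "compact", "2seater", "suv", "pickup")

# On a character vector, as.numeric() cannot recover category codes:
# every element becomes NA (R also warns, suppressed here).
bad <- suppressWarnings(as.numeric(cls))

# Converting to a factor first gives integer codes that follow the
# (alphabetical) order of levels(), so codes and level names line up.
f <- factor(cls)
codes <- as.numeric(f)
print(levels(f))         # "2seater" "compact" "pickup" "suv"
print(codes)             # 4 2 1 4 3
print(levels(f)[codes])  # recovers the original labels
```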
    datalist = list(
      y = zy,
      N = N,
      displ = zdispl,
      displmean = displmean,
      cyl = zcyl,
      cylmean = cylmean,
      man = man,
      class = class,
      nmans = nmans,
      nclass = nclass
    )
Note that we need the means of the metric predictor variables here (we haven't in the past), because the model uses them to centre the metric predictors.
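Centring a metric predictor changes only what the intercept means, not the slope. A quick check with lm on simulated data follows. Here x and y are hypothetical and loosely mimic displ and hwy.

```r
# Simulated data (hypothetical): a response that decreases with a displacement-like predictor.
set.seed(1)
x <- runif(100, min = 1.6, max = 7)
y <- 35 - 3 * x + rnorm(100, sd = 2)

raw_fit <- lm(y ~ x)               # intercept = predicted y at x = 0
cen_fit <- lm(y ~ I(x - mean(x)))  # intercept = predicted y at x = mean(x)

# Centring leaves the slope unchanged ...
print(c(coef(raw_fit)[2], coef(cen_fit)[2]))
# ... but moves the intercept to the fitted value at the mean of x,
# which for least squares equals the mean of y.
print(c(coef(cen_fit)[1], mean(y)))
```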
Define the Model

[Figure: model diagram; each y_i is normally distributed with mean μ_i and precision τ = 1/σ², and the predictors enter through μ_i]

$$y_i \sim \mathcal{N}(\mu_i,\ \tau = 1/\sigma^2)$$
$$\mu_i = \alpha_0 + \alpha_1[\mathrm{man}_i] + \alpha_2\,(\mathrm{displ}_i - \overline{\mathrm{displ}}) + \alpha_3[\mathrm{class}_i] + \alpha_4\,(\mathrm{cyl}_i - \overline{\mathrm{cyl}})$$

- α1: effect of being in each manufacturer category on mpg
- α2: effect of engine displacement on mpg
- α3: effect of being in each class category on mpg
- α4: effect of the number of cylinders on mpg

Note the multiple personalities of β0 now:
- Metric predictors: the y value when all predictors are zero
- Nominal predictors: the mean y value across all categories of all variables

What should it be now? It makes sense to set it as the mean predicted value when the metric predictors are re-centred at their mean.

All coefficients are written as α because they will need to be converted ("standardized") into sum-to-zero β coefficients after fitting. With the metric predictors re-centred, the metric effects are centred around the mean.

The priors on α0, α2 and α4 are normal with mean 0 and standard deviation 10, and σ gets a gamma prior. We'll also make each nominal variable hierarchical. The category effects are drawn from a normal distribution whose mean and standard deviation are themselves estimated, with normal and gamma hyperpriors:

$$\alpha_0,\ \alpha_2,\ \alpha_4 \sim \mathcal{N}(0,\ 10^2)$$
$$\alpha_1[j] \sim \mathcal{N}(\mu_{\mathrm{man}},\ \sigma_{\mathrm{man}}^2), \qquad \alpha_3[j] \sim \mathcal{N}(\mu_{\mathrm{class}},\ \sigma_{\mathrm{class}}^2)$$
$$\mu_{\mathrm{man}},\ \mu_{\mathrm{class}} \sim \mathcal{N}(0,\ 10^2)$$
$$\sigma,\ \sigma_{\mathrm{man}},\ \sigma_{\mathrm{class}} \sim \mathrm{Gamma}(\text{shape} = 1.1,\ \text{rate} = 0.11)$$
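These priors are written for JAGS, where dnorm is parameterized by precision rather than standard deviation. The following small R check shows what the settings imply on the standardized scale, where y has standard deviation 1. The variable names are illustrative only.

```r
# JAGS parameterizes dnorm by precision, so dnorm(0, 1/10^2) is a normal
# with standard deviation 10. R's rnorm() takes the sd directly.
prec <- 1 / 10^2
prior_sd <- 1 / sqrt(prec)

# JAGS dgamma(shape, rate) matches R's dgamma(shape = , rate = ).
shape <- 1.1
rate <- 0.11
gamma_mean <- shape / rate        # 10
gamma_mode <- (shape - 1) / rate  # about 0.91
# Central 95% interval of the gamma prior used for sigma, mansd and classsd:
gamma_95 <- qgamma(c(0.025, 0.975), shape = shape, rate = rate)

print(c(prior_sd = prior_sd, mean = gamma_mean, mode = gamma_mode))
print(gamma_95)
```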
    modelstring = "
    model {
      #--- Likelihood ---#
      for (i in 1:N) {
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- a0 + a1[man[i]] + (a2 * (displ[i] - displmean)) +
                 a3[class[i]] + (a4 * (cyl[i] - cylmean))
      }

      #--- Priors ---#
      sigma ~ dgamma(1.1, 0.11)
      tau <- 1 / sigma^2
      a0 ~ dnorm(0, 1/10^2)
      a2 ~ dnorm(0, 1/10^2)
      a4 ~ dnorm(0, 1/10^2)
      # a1
      for (j in 1:nmans) {
        a1[j] ~ dnorm(manmeans, 1/mansd^2)
      }
      # a3
      for (j in 1:nclass) {
        a3[j] ~ dnorm(classmeans, 1/classsd^2)
      }

      #--- Hyperpriors ---#
      manmeans ~ dnorm(0, 1/10^2)
      mansd ~ dgamma(1.1, 0.11)
      classmeans ~ dnorm(0, 1/10^2)
      classsd ~ dgamma(1.1, 0.11)

      #--- Convert a0, a[] to sum-to-zero b0, b[] ---#
      m1 <- mean(a1[1:nmans])   # mean across a1 categories
      m3 <- mean(a3[1:nclass])  # mean across a3 categories
      # b0 is a0 plus the mean of each nominal predictor, minus the mean effect
      # of the metric predictors. See Kruschke (2015) p. 570 for the algebra.
      b0 <- a0 + m1 + m3 - (a2 * displmean) - (a4 * cylmean)
      # b1 is the uncorrected a1 minus the mean across categories for that nominal variable
      for (j in 1:nmans) {
        b1[j] <- a1[j] - m1
      }
      # b3 is the uncorrected a3 minus the mean across categories for that nominal variable
      for (j in 1:nclass) {
        b3[j] <- a3[j] - m3
      }
      # Coefficients for metric variables stay the same
      b2 <- a2
      b4 <- a4
    }
    " # close quote for modelstring
    writeLines(modelstring, con = "model.txt")
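The conversion to sum-to-zero coefficients does not change any prediction; it only moves the overall level into b0. The sketch below plugs hypothetical values into the same formulas as the JAGS model. They stand in for one MCMC step, with 15 manufacturers and 7 classes, and the centring constants are also made up. It then compares the linear predictor under both parameterizations. Because b0 absorbs the centring constants, the b version multiplies the predictor values directly.

```r
# Hypothetical parameter values standing in for one MCMC step.
set.seed(2)
a0 <- 0.3
a1 <- rnorm(15)                      # one unconstrained effect per manufacturer
a3 <- rnorm(7)                       # one unconstrained effect per class
a2 <- -0.2; a4 <- -0.5               # metric slopes
displmean <- 0.1; cylmean <- -0.05   # hypothetical centring constants

# Same conversion as in the JAGS model.
m1 <- mean(a1); m3 <- mean(a3)
b0 <- a0 + m1 + m3 - a2 * displmean - a4 * cylmean
b1 <- a1 - m1
b3 <- a3 - m3

# Linear predictor for one hypothetical observation under both parameterizations.
man <- 4; cls <- 6; displ <- 0.8; cyl <- 1.2
mu_a <- a0 + a1[man] + a2 * (displ - displmean) + a3[cls] + a4 * (cyl - cylmean)
mu_b <- b0 + b1[man] + a2 * displ + b3[cls] + a4 * cyl

print(c(mu_a, mu_b))        # identical predictions
print(c(sum(b1), sum(b3)))  # both (numerically) zero
```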
Specify Initial Values

    # Initial values must be given for stochastic nodes (a2, a4), not for
    # deterministic nodes such as b2 and b4, which JAGS computes itself.
    initslist <- function() {
      list(
        sigma = rgamma(n = 1, shape = 1.1, rate = 0.11),
        a0 = rnorm(n = 1, mean = 0, sd = 10),
        a2 = rnorm(n = 1, mean = 0, sd = 10),
        a4 = rnorm(n = 1, mean = 0, sd = 10),
        manmeans = rnorm(n = 1, mean = 0, sd = 10),
        mansd = rgamma(n = 1, shape = 1.1, rate = 0.11),
        classmeans = rnorm(n = 1, mean = 0, sd = 10),
        classsd = rgamma(n = 1, shape = 1.1, rate = 0.11)
      )
    }
Specify MCMC Parameters and Run

    runjagsout <- run.jags(method = "simple",
                           model = "model.txt",
                           monitor = c("b0", "b1", "b2", "b3", "b4", "sigma"),
                           data = datalist,
                           inits = initslist,
                           n.chains = 3,
                           adapt = 500,
                           burnin = 1000,
                           sample = 20000,
                           thin = 1,
                           summarise = TRUE,
                           plots = FALSE)
52 Evaluate Performance of the Model
Retrieve the MCMC samples and take a peek at their structure:

    codasamples = as.mcmc.list(runjagsout)
    head(codasamples[[1]])

    Markov Chain Monte Carlo (MCMC) output:
    Start = 1501
    End = 1507
    Thinning interval = 1
    b0 b1[1] b1[2] b1[3] b1[4] b1[5] b1[6] b1[7] b1[8] b1[9] b1[10] b1[11] b1[12] b1[13]
Checking convergence and mixing of the chains (trace plots, autocorrelation, effective sample size, and so on) is something you can do on your own.
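A common convergence check is the Gelman-Rubin statistic (R-hat). It compares between-chain and within-chain variance, and values near 1 suggest the chains agree. With coda loaded, coda::gelman.diag(codasamples) and coda::effectiveSize(codasamples) report diagnostics of this kind directly. The function below is only a minimal base-R sketch of the basic R-hat and omits refinements that gelman.diag applies. It is tested on simulated chains.

```r
# Minimal sketch of the basic Gelman-Rubin potential scale reduction factor
# for one parameter; 'chains' is an iterations x chains matrix.
rhat <- function(chains) {
  n <- nrow(chains)                     # draws per chain
  chain_means <- colMeans(chains)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  var_hat <- ((n - 1) / n) * W + B / n  # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

set.seed(3)
# Three simulated chains that agree (R-hat close to 1) ...
good <- matrix(rnorm(3 * 2000), ncol = 3)
# ... and three whose locations disagree (R-hat well above 1).
bad <- good + matrix(rep(c(0, 1, 2), each = 2000), ncol = 3)

print(c(good = rhat(good), bad = rhat(bad)))
```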
Extract & Parse Results

    mcmcchain = as.matrix(codasamples)
    # b0
    zb0 = mcmcchain[, "b0"]
    # b1 (one row per manufacturer, one column per MCMC step)
    chainlength = length(zb0)
    zb1 = matrix(0, ncol = chainlength, nrow = nmans)
    for (i in 1:nmans) {
      zb1[i, ] = mcmcchain[, paste("b1[", i, "]", sep = "")]
    }
    # b2
    zb2 = mcmcchain[, "b2"]
    # b3 (one row per class, one column per MCMC step)
    zb3 = matrix(0, ncol = chainlength, nrow = nclass)
    for (i in 1:nclass) {
      zb3[i, ] = mcmcchain[, paste("b3[", i, "]", sep = "")]
    }
    # b4
    zb4 = mcmcchain[, "b4"]
    # sigma
    zsigma <- mcmcchain[, "sigma"]
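Convert to Original Scale

    b0 <- (zb0 * ysd) + ymean     # intercept at the means of the metric predictors
    b2 <- (zb2 * ysd) / displsd   # slopes: rescale both y units and x units
    b4 <- (zb4 * ysd) / cylsd
    b1 <- zb1 * ysd               # category deflections: rescale y units only
    b3 <- zb3 * ysd
    sigma <- zsigma * ysd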
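These conversions follow from rescaling both sides of the regression equation. As a sanity check, the sketch below fits the same two-predictor regression with lm on raw and on standardized simulated data (x1, x2 and y are hypothetical) and converts the standardized coefficients back. The slides keep b0 at the predictor means. This check instead shifts the intercept back to x1 = x2 = 0 so that it can be compared with the raw fit.

```r
# Simulated data (hypothetical): two metric predictors and one response.
set.seed(4)
x1 <- rnorm(200, mean = 3.5, sd = 1.3)
x2 <- rnorm(200, mean = 6, sd = 1.6)
y  <- 40 - 1.5 * x1 - 1.0 * x2 + rnorm(200, sd = 2)

z <- function(v) (v - mean(v)) / sd(v)

raw_fit <- lm(y ~ x1 + x2)
zfit    <- lm(z(y) ~ z(x1) + z(x2))
zb <- coef(zfit)

# Slopes: multiply by sd(y) and divide by the predictor's sd.
b1 <- zb[2] * sd(y) / sd(x1)
b2 <- zb[3] * sd(y) / sd(x2)
# Intercept: zb0 * sd(y) + mean(y) is the prediction at the predictor means;
# subtracting the slope terms moves it back to x1 = x2 = 0.
b0_at_means <- zb[1] * sd(y) + mean(y)
b0 <- b0_at_means - b1 * mean(x1) - b2 * mean(x2)

print(rbind(back_transformed = unname(c(b0, b1, b2)), raw = unname(coef(raw_fit))))
```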
57 View Posteriors
Plotting Posterior Distributions: β0

    par(mfrow = c(1, 1))
    histinfo = plotpost(b0, xlab = "b0", main = "b0")

[Figure: posterior distribution of b0, annotated with its mean and HDI]
Plotting Posterior Distributions: β1

    par(mfrow = c(3, 3))
    for (i in 1:nmans) {
      histinfo = plotpost(b1[i, ], xlab = bquote(b1[.(i)]),
                          main = paste("b1:", manlevels[i]))
    }
[Figure: posterior distributions of b1 for each of the 15 manufacturers (audi, chevrolet, dodge, ford, honda, hyundai, jeep, land rover, lincoln, mercury, nissan, pontiac, subaru, toyota, volkswagen), each annotated with its mean and HDI]
Plotting Posterior Distributions: β2

    par(mfrow = c(1, 1))
    histinfo = plotpost(b2, xlab = "b2", main = "Engine Displacement")

[Figure: posterior distribution of b2 (engine displacement), annotated with its mean and HDI]
Plotting Posterior Distributions: β3

    par(mfrow = c(2, 2))
    for (i in 1:nclass) {
      histinfo = plotpost(b3[i, ], xlab = bquote(b3[.(i)]),
                          main = paste("b3:", classlevels[i]))
    }
[Figure: posterior distributions of b3 for each of the 7 classes (2seater, compact, midsize, minivan, pickup, subcompact, suv), each annotated with its mean and HDI]
Plotting Posterior Distributions: β4

    par(mfrow = c(1, 1))
    histinfo = plotpost(b4, xlab = "b4", main = "# of Cylinders")

[Figure: posterior distribution of b4 (number of cylinders), annotated with its mean and HDI]
Posterior Predictive Check
Select a subset of the data on which to make predictions (let's pick 20):

    npred = 20
    newrows <- round(seq(from = 1, to = NROW(carsub), length.out = npred))
    newdata <- carsub[newrows, ]
Separate out just the x data on which we will make predictions:

    x1 <- as.numeric(newdata$manufacturer)
    x2 <- newdata$displ
    x3 <- as.numeric(newdata$class)
    x4 <- newdata$cyl
Next, define a matrix that will hold all of the predicted y values:
- the number of rows is the number of x values for prediction
- the number of columns is the number of y values generated from the MCMC process

We'll start with the matrix filled with zeros, but will fill it in later.

    postsampsize = length(b0)
    ynew = matrix(0, nrow = npred, ncol = postsampsize)
Define a matrix for holding the HDI limits of the predicted y values. It has the same number of rows as above and only two columns (one for each end of the HDI):

    yhdilim = matrix(0, nrow = npred, ncol = 2)
Now, populate the ynew matrix by generating one predicted y value for each step in the chain. Note that b0 is the predicted value with the metric predictors at their means, so the metric predictors have to be centred at their means here as well:

    for (i in 1:npred) {
      for (j in 1:postsampsize) {
        ynew[i, j] <- rnorm(1,
                            mean = b0[j] + b1[x1[i], j] + (b2[j] * (x2[i] - displmean)) +
                                   b3[x3[i], j] + (b4[j] * (x4[i] - cylmean)),
                            sd = sigma[j])
      }
    }
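The inner loop calls rnorm() once per posterior draw. Because rnorm() accepts vectors for mean and sd, each row can instead be filled with a single call. The sketch below shows this vectorized form using hypothetical posterior draws and three hypothetical new cars. All values are made up for illustration, except displmean and cylmean, which are the rounded means from summary(carsub).

```r
# Hypothetical posterior draws standing in for b0, b1, b2, b3, b4 and sigma.
set.seed(5)
S <- 1000                                          # number of posterior draws
b0 <- rnorm(S, 23, 0.3); b2 <- rnorm(S, -1, 0.2); b4 <- rnorm(S, -1, 0.2)
b1 <- matrix(rnorm(15 * S, 0, 0.5), nrow = 15)     # manufacturers x draws
b3 <- matrix(rnorm(7 * S, 0, 0.5), nrow = 7)       # classes x draws
sigma <- abs(rnorm(S, 2.5, 0.1))
displmean <- 3.472; cylmean <- 5.889               # from summary(carsub)

# Hypothetical new observations (category codes and metric values).
x1 <- c(1, 5, 15); x2 <- c(2.0, 1.8, 2.5); x3 <- c(2, 2, 6); x4 <- c(4, 4, 4)
npred <- length(x1)

ynew <- matrix(0, nrow = npred, ncol = S)
for (i in 1:npred) {
  # One vectorized rnorm() call per row replaces the inner loop over draws.
  mu <- b0 + b1[x1[i], ] + b2 * (x2[i] - displmean) +
        b3[x3[i], ] + b4 * (x4[i] - cylmean)
  ynew[i, ] <- rnorm(S, mean = mu, sd = sigma)
}
print(dim(ynew))
```

With the same random seed, the vectorized call produces the same values as the double loop, because rnorm() generates its draws in order.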
Calculate the mean of each prediction and the associated low and high 95% HDI limits:

    means <- rowMeans(ynew)
    source("hdiofmcmc.r")
    for (i in 1:npred) {
      yhdilim[i, ] <- HDIofMCMC(ynew[i, ])
    }
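HDIofMCMC() comes from the sourced file. Here is a minimal base-R sketch of the same idea for readers without that file: sort the draws and take the narrowest interval that contains 95% of them. The name hdi_sketch is hypothetical, and the code is not the contents of hdiofmcmc.r.

```r
# Minimal sketch of a highest-density interval from MCMC draws: the
# narrowest interval containing a fraction 'credmass' of the sorted samples.
hdi_sketch <- function(samples, credmass = 0.95) {
  sorted <- sort(samples)
  n_in <- ceiling(credmass * length(sorted))  # draws inside the interval
  n_int <- length(sorted) - n_in + 1          # number of candidate intervals
  widths <- sorted[n_in:length(sorted)] - sorted[1:n_int]
  best <- which.min(widths)
  c(sorted[best], sorted[best + n_in - 1])
}

# Usage: for a standard normal the HDI is close to c(-1.96, 1.96).
set.seed(7)
h <- hdi_sketch(rnorm(1e5))
print(h)
```

For a skewed posterior, such as an exponential, the HDI starts near zero instead of being symmetric around the median, which is why it can differ from an equal-tailed interval.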
Combine into one table (cbind returns a matrix):

    predtable <- cbind(means, yhdilim)
Plot the predicted values:

    dotchart(means, labels = 1:npred, xlim = c(min(yhdilim), max(yhdilim)),
             xlab = "hwy mpg", pch = 16)
    segments(yhdilim[, 1], 1:npred, yhdilim[, 2], 1:npred, lwd = 2)

Add the truth:

    points(x = newdata$hwy, y = 1:npred, pch = 16, col = rgb(1, 0, 0, 0.5))
[Figure: predicted hwy mpg (mean and HDI) for the 20 selected cars, with the observed values overlaid]
Homework (last one!)
- Get the DIC for the full model.
- Re-configure and run the model 4 more times, leaving a different predictor variable out each time, and get the DIC for each.
- Compare the DIC values to decide which predictors are most important for your model.
- Explain your results and interpretation. You can do so as commented lines in your code (i.e., lines starting with #), so that your code will still run but I can also read your written explanations.
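For the homework it helps to know what DIC measures. It is the posterior mean deviance (fit) plus a penalty pD, the effective number of parameters, and lower values are preferred. For a JAGS model, tools such as rjags::dic.samples() compute it for you. The sketch below only illustrates the arithmetic, using a simple normal model with simulated data and approximate posterior draws; everything in it is hypothetical.

```r
# Simulated data and approximate posterior draws for a simple normal model (hypothetical).
set.seed(8)
y <- rnorm(50, mean = 10, sd = 2)
n <- length(y)
S <- 4000
mu_draws <- rnorm(S, mean(y), sd(y) / sqrt(n))
sigma_draws <- sd(y) * sqrt((n - 1) / rchisq(S, df = n - 1))

# Deviance = -2 * log-likelihood of the data at given parameter values.
deviance_at <- function(mu, sigma) -2 * sum(dnorm(y, mu, sigma, log = TRUE))

D_draws <- mapply(deviance_at, mu_draws, sigma_draws)
D_bar <- mean(D_draws)                                    # posterior mean deviance
D_hat <- deviance_at(mean(mu_draws), mean(sigma_draws))   # deviance at posterior means
pD <- D_bar - D_hat                                       # effective number of parameters
DIC <- D_bar + pD
print(c(D_bar = D_bar, pD = pD, DIC = DIC))
```

For this two-parameter model pD comes out close to 2. Dropping a predictor from the car model removes parameters, so DIC only improves if the loss in fit is smaller than the reduction in pD.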
Creative Commons License
Anyone is allowed to distribute, remix, tweak, and build upon this work, even commercially, as long as they credit me for the original creation. See the Creative Commons website for more information.
HW3 Solutions 36-724: Applied Bayesian and Computational Statistics March 2, 2006 Problem 1 a Fatal Accidents Poisson(θ I will set a prior for θ to be Gamma, as it is the conjugate prior. I will allow
More informationSolution to Series 11
Prof. Dr. M. Maathuis Multivariate Statistics SS 2014 Solution to Series 11 1. a) > car
More informationChapter 4 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (December 11, 2006)
Chapter 4 Exercises 1 Data Analysis & Graphics Using R Solutions to Exercises (December 11, 2006) Preliminaries > library(daag) Exercise 2 Draw graphs that show, for degrees of freedom between 1 and 100,
More informationPIER HLM Course July 30, 2011 Howard Seltman. Discussion Guide for Bayes and BUGS
PIER HLM Course July 30, 2011 Howard Seltman Discussion Guide for Bayes and BUGS 1. Classical Statistics is based on parameters as fixed unknown values. a. The standard approach is to try to discover,
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More informationb. Write the rule for a function that has your line as its graph. a. What shadow location would you predict when the flag height is12 feet?
Regression and Correlation Shadows On sunny days, every vertical object casts a shadow that is related to its height. The following graph shows data from measurements of flag height and shadow location,
More informationInferences on Linear Combinations of Coefficients
Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you
More informationMotor Trend Car Road Analysis
Motor Trend Car Road Analysis Zakia Sultana February 28, 2016 Executive Summary You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are
More informationChapter 5: Exploring Data: Distributions Lesson Plan
Lesson Plan Exploring Data Displaying Distributions: Histograms For All Practical Purposes Mathematical Literacy in Today s World, 7th ed. Interpreting Histograms Displaying Distributions: Stemplots Describing
More informationA Handbook of Statistical Analyses Using R 3rd Edition. Torsten Hothorn and Brian S. Everitt
A Handbook of Statistical Analyses Using R 3rd Edition Torsten Hothorn and Brian S. Everitt CHAPTER 12 Quantile Regression: Head Circumference for Age 12.1 Introduction 12.2 Quantile Regression 12.3 Analysis
More informationIntroduction to the Analysis of Hierarchical and Longitudinal Data
Introduction to the Analysis of Hierarchical and Longitudinal Data Georges Monette, York University with Ye Sun SPIDA June 7, 2004 1 Graphical overview of selected concepts Nature of hierarchical models
More informationAdditional Notes: Investigating a Random Slope. When we have fixed level-1 predictors at level 2 we show them like this:
Ron Heck, Summer 01 Seminars 1 Multilevel Regression Models and Their Applications Seminar Additional Notes: Investigating a Random Slope We can begin with Model 3 and add a Random slope parameter. If
More informationLecture 19. Spatial GLM + Point Reference Spatial Data. Colin Rundel 04/03/2017
Lecture 19 Spatial GLM + Point Reference Spatial Data Colin Rundel 04/03/2017 1 Spatial GLM Models 2 Scottish Lip Cancer Data Observed Expected 60 N 59 N 58 N 57 N 56 N value 80 60 40 20 0 55 N 8 W 6 W
More information22s:152 Applied Linear Regression. Chapter 5: Ordinary Least Squares Regression. Part 2: Multiple Linear Regression Introduction
22s:152 Applied Linear Regression Chapter 5: Ordinary Least Squares Regression Part 2: Multiple Linear Regression Introduction Basic idea: we have more than one covariate or predictor for modeling a dependent
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationIntroduction and Background to Multilevel Analysis
Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationGeneral Linear Statistical Models - Part III
General Linear Statistical Models - Part III Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Interaction Models Lets examine two models involving Weight and Domestic in the cars93 dataset.
More informationChapter 24: Comparing means
Chapter 4: Comparing means Example: Consumer Reports annually conducts a survey of automobile reliability Approximately 4 million households are surveyed by mail, The 990 survey is summarized in the Figure
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationMath 2311 Written Homework 6 (Sections )
Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.
More informationcor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )
Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation
More informationElementary Statistics Lecture 3 Association: Contingency, Correlation and Regression
Elementary Statistics Lecture 3 Association: Contingency, Correlation and Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu Chong Ma (Statistics, USC) STAT 201
More informationST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks
(9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationStatistics 572 Semester Review
Statistics 572 Semester Review Final Exam Information: The final exam is Friday, May 16, 10:05-12:05, in Social Science 6104. The format will be 8 True/False and explains questions (3 pts. each/ 24 pts.
More informationBayesian Inference for Regression Parameters
Bayesian Inference for Regression Parameters 1 Bayesian inference for simple linear regression parameters follows the usual pattern for all Bayesian analyses: 1. Form a prior distribution over all unknown
More informationDescribing Center: Mean and Median Section 5.4
Describing Center: Mean and Median Section 5.4 Look at table 5.2 at the right. We are going to make the dotplot of the city gas mileages of midsize cars. How to describe the center of a distribution: x
More informationStatistics. Introduction to R for Public Health Researchers. Processing math: 100%
Statistics Introduction to R for Public Health Researchers Statistics Now we are going to cover how to perform a variety of basic statistical tests in R. Correlation T-tests/Rank-sum tests Linear Regression
More informationQUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 2013
QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 3 Introduction Objectives of course: Regression and Forecasting
More informationConsider fitting a model using ordinary least squares (OLS) regression:
Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful
More informationPackage lmm. R topics documented: March 19, Version 0.4. Date Title Linear mixed models. Author Joseph L. Schafer
Package lmm March 19, 2012 Version 0.4 Date 2012-3-19 Title Linear mixed models Author Joseph L. Schafer Maintainer Jing hua Zhao Depends R (>= 2.0.0) Description Some
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationDifferent formulas for the same model How to get different coefficients in R
Outline 1 Re-parametrizations Different formulas for the same model How to get different coefficients in R 2 Interactions Two-way interactions between a factor and another predictor Two-way interactions
More informationSTAT 420: Methods of Applied Statistics
STAT 420: Methods of Applied Statistics Model Diagnostics Transformation Shiwei Lan, Ph.D. Course website: http://shiwei.stat.illinois.edu/lectures/stat420.html August 15, 2018 Department
More informationIII. Inferential Tools
III. Inferential Tools A. Introduction to Bat Echolocation Data (10.1.1) 1. Q: Do echolocating bats expend more enery than non-echolocating bats and birds, after accounting for mass? 2. Strategy: (i) Explore
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More information