
Multiple Regression: Mixed Predictor Types. Tim Frasier. Copyright Tim Frasier. This work is licensed under the Creative Commons Attribution 4.0 International license.

The Data

Data Fuel economy data from 1999 and 2008 for 38 popular models of car.* I know, I know. It's neither biological nor that interesting, but it is hard to find good example data sets for this. (* As distributed with the ggplot2 package; original data from the EPA, http://fueleconomy.gov)


Data The predicted variable is hwy (highway miles per gallon).

Data There are two categorical predictors: manufacturer and class.

Data There are two metric predictors: displ and cyl.* (* I realize that cylinders is not really a metric variable, but we will treat it like one here for demonstration purposes.)

Data Read the data into R and parse out just the fields in which we are interested:

cardata <- read.table("mpg.csv", header = TRUE, sep = ",")
carsub <- cardata[, c(2, 4, 6, 10, 12)]

Data Use summary function to get a feel for it:

summary(carsub)

 manufacturer     displ           cyl            hwy              class
 dodge     :37   Min.   :1.600   Min.   :4.000   Min.   :12.00   2seater   : 5
 toyota    :34   1st Qu.:2.400   1st Qu.:4.000   1st Qu.:18.00   compact   :47
 volkswagen:27   Median :3.300   Median :6.000   Median :24.00   midsize   :41
 ford      :25   Mean   :3.472   Mean   :5.889   Mean   :23.44   minivan   :11
 chevrolet :19   3rd Qu.:4.600   3rd Qu.:8.000   3rd Qu.:27.00   pickup    :33
 audi      :18   Max.   :7.000   Max.   :8.000   Max.   :44.00   subcompact:35
 (Other)   :74                                                   suv       :62

Data Plot the data to get a feel for it. But keep in mind these can be misleading!!!

pairs(carsub, pch = 16, col = rgb(0, 0, 1, 0.5))

Data [pairs plot of manufacturer, displ, cyl, hwy, and class]

Data Positive relationship between engine displacement and the number of cylinders (makes sense)

Data Negative relationship between engine displacement & highway mpg

Data Negative relationship between number of cylinders & highway mpg

Data Mostly positive relationship between engine displacement & vehicle class

Data Some interesting patterns of relationships between class and highway mpg

Data Some interesting patterns of relationships between manufacturer and highway mpg

Frequentist Approach

Frequentist Approach Mixed predictors can be analyzed with the lm function:

cartest <- lm(hwy ~ manufacturer + displ + cyl + class, data = carsub)

Frequentist Approach
summary(cartest)

                       Estimate Std. Error t value Pr(>|t|)
(Intercept)            36.65662    2.11270  17.351  < 2e-16 ***
manufacturerchevrolet   1.65228    1.15766   1.427 0.154984
manufacturerdodge       0.68563    1.11661   0.614 0.539857
manufacturerford        0.64843    1.04851   0.618 0.536962
manufacturerhonda       4.11170    1.32135   3.112 0.002117 **
manufacturerhyundai    -0.13343    1.06491  -0.125 0.900410
manufacturerjeep        0.84860    1.30195   0.652 0.515242
manufacturerland rover  0.54583    1.59905   0.341 0.733181
manufacturerlincoln     1.61904    1.81376   0.893 0.373067
manufacturermercury     0.81057    1.58446   0.512 0.609484
manufacturernissan      0.78152    1.04274   0.749 0.454401
manufacturerpontiac     2.16964    1.47980   1.466 0.144092
manufacturersubaru      0.08387    1.08103   0.078 0.938236
manufacturertoyota      1.41222    0.83692   1.687 0.093003 .
manufacturervolkswagen  1.81232    0.82522   2.196 0.029169 *
displ                  -0.52109    0.53766  -0.969 0.333562
cyl                    -1.28737    0.35135  -3.664 0.000314 ***
classcompact           -2.17130    1.66099  -1.307 0.192557
classmidsize           -2.12355    1.59408  -1.332 0.184250
classminivan           -5.72148    1.90221  -3.008 0.002951 **
classpickup            -9.25680    1.58293  -5.848 1.88e-08 ***
classsubcompact        -2.17163    1.65154  -1.315 0.189966
classsuv               -8.16278    1.43190  -5.701 3.99e-08 ***
---
Residual standard error: 2.583 on 211 degrees of freedom
Multiple R-squared: 0.8296, Adjusted R-squared: 0.8118
F-statistic: 46.68 on 22 and 211 DF, p-value: < 2.2e-16

Frequentist Approach The (Intercept) is the baseline plus the effect of being an audi; all other manufacturer effects are differences from this reference.

Frequentist Approach Manufacturer does not have too big an impact, but a little.

Frequentist Approach Engine displacement has a negative, but not significant, effect.

Frequentist Approach Cylinder number has a significant negative effect.

Frequentist Approach The class category seems important.
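Frequentist Approach To see the reference-level point explicitly, one can refit with a different baseline. A minimal sketch (my addition, not from the original slides; the choice of honda as the new reference is arbitrary):

# relevel() changes which factor level is absorbed into the intercept;
# the fitted model is the same, only the parameterization changes.
carsub2 <- carsub
carsub2$manufacturer <- relevel(carsub2$manufacturer, ref = "honda")
summary(lm(hwy ~ manufacturer + displ + cyl + class, data = carsub2))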

Bayesian Approach

Load Libraries & Functions

library(runjags)
library(coda)
source("plotpost.r")

Organize the Data

#--- The y data ---#
y = carsub$hwy
N = length(y)
ymean = mean(y)
ysd = sd(y)
zy = (y - ymean) / ysd

Organize the Data

#--- The metric x data ---#
# displ
displ <- carsub$displ
displmean <- mean(displ)
displsd <- sd(displ)
zdispl <- (displ - displmean) / displsd

# cyl
cyl <- carsub$cyl
cylmean <- mean(cyl)
cylsd <- sd(cyl)
zcyl <- (cyl - cylmean) / cylsd

Organize the Data

#--- The nominal x data ---#
man <- as.numeric(carsub$manufacturer)
class <- as.numeric(carsub$class)
manlevels <- levels(carsub$manufacturer)
classlevels <- levels(carsub$class)
nmans <- length(unique(man))
nclass <- length(unique(class))

Organize the Data

datalist = list(
    y = zy,
    N = N,
    displ = zdispl,
    displmean = displmean,
    cyl = zcyl,
    cylmean = cylmean,
    man = man,
    class = class,
    nmans = nmans,
    nclass = nclass
)

Organize the Data Note that we need the means of the metric predictor variables here (we haven't in the past).

Define the Model [Kruschke-style model diagram: each yi ~ Normal(µ, τ), with τ = 1/σ2]

Define the Model The linear predictor includes the effect of being in each manufacturer category on mpg, the effect of engine displacement on mpg, the effect of being in each class category on mpg, and the effect of # of cylinders on mpg.

Define the Model Note the multiple personalities of β0 now. With metric predictors it is the y value when all predictors are zero; with nominal predictors it is the mean y value across all categories of all variables. What should it be now? It makes sense to set it as the mean predicted value if the metric predictors are re-centred at their mean.

Define the Model The coefficients are all written as α because they will later need to be converted to sum-to-zero β values. The metric effects are now centred around the mean.

Define the Model [Diagram continued: each α coefficient gets a Normal(0, 10) prior. We'll also make each nominal variable hierarchical: its category effects are drawn from a normal distribution whose mean has a Normal(0, 10) hyperprior and whose standard deviation has a gamma(1.1, 0.11) hyperprior.]

modelstring = "
model {

    #--- Likelihood ---#
    for (i in 1:N) {
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- a0 + a1[man[i]] + (a2 * (displ[i] - displmean)) + a3[class[i]] + (a4 * (cyl[i] - cylmean))
    }

    #--- Priors ---#
    sigma ~ dgamma(1.1, 0.11)
    tau <- 1 / sigma^2

    a0 ~ dnorm(0, 1/10^2)
    a2 ~ dnorm(0, 1/10^2)
    a4 ~ dnorm(0, 1/10^2)

    # a1
    for (j in 1:nmans) {
        a1[j] ~ dnorm(manmeans, 1/mansd^2)
    }

    # a3
    for (j in 1:nclass) {
        a3[j] ~ dnorm(classmeans, 1/classsd^2)
    }

    #--- Hyperpriors ---#
    manmeans ~ dnorm(0, 1/10^2)
    mansd ~ dgamma(1.1, 0.11)
    classmeans ~ dnorm(0, 1/10^2)
    classsd ~ dgamma(1.1, 0.11)

    #--------------------------------------------------------------#
    # Convert a0,a[] to sum-to-zero b0,b[] :                        #
    #--------------------------------------------------------------#
    m1 <- mean(a1[1:nmans])     # Mean across a1 categories
    m3 <- mean(a3[1:nclass])    # Mean across a3 categories

    #- b0 is a0 plus the mean of each nominal predictor, minus the mean -#
    #- effect of the metric predictors. See Kruschke (2015) p. 570 for the algebra -#
    b0 <- a0 + m1 + m3 - (a2 * displmean) - (a4 * cylmean)

    #- b1 is the uncorrected a1 minus the mean across categories for that nominal variable -#
    for (j in 1:nmans) {
        b1[j] <- a1[j] - m1
    }

    #- b3 is the uncorrected a3 minus the mean across categories for that nominal variable -#
    for (j in 1:nclass) {
        b3[j] <- a3[j] - m3
    }

    #- Coefficients for metric variables stay the same -#
    b2 <- a2
    b4 <- a4
}
" # close quote for modelstring

writeLines(modelstring, con = "model.txt")
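Define the Model The algebra behind b0 is worth spelling out (see Kruschke 2015, p. 570): expand the centred likelihood mean and regroup so that the nominal deflections sum to zero. In LaTeX notation (my rendering, matching the symbols in the code above):

\begin{aligned}
\mu_i &= a_0 + a_{1[j_i]} + a_2 (x_{2i} - \bar{x}_2) + a_{3[k_i]} + a_4 (x_{4i} - \bar{x}_4) \\
      &= \underbrace{\big( a_0 + \bar{a}_1 + \bar{a}_3 - a_2 \bar{x}_2 - a_4 \bar{x}_4 \big)}_{b_0}
       + \underbrace{(a_{1[j_i]} - \bar{a}_1)}_{b_{1[j_i]}}
       + \underbrace{a_2}_{b_2} x_{2i}
       + \underbrace{(a_{3[k_i]} - \bar{a}_3)}_{b_{3[k_i]}}
       + \underbrace{a_4}_{b_4} x_{4i}
\end{aligned}

where \bar{a}_1 and \bar{a}_3 are the means across categories (m1 and m3 in the code), so each set of b deflections sums to zero.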

Specify Initial Values

initslist <- function() {
    list(
        sigma = rgamma(n = 1, shape = 1.1, rate = 0.11),
        a0 = rnorm(n = 1, mean = 0, sd = 10),
        a2 = rnorm(n = 1, mean = 0, sd = 10),    # initialize a2/a4, not the derived b2/b4
        a4 = rnorm(n = 1, mean = 0, sd = 10),
        manmeans = rnorm(n = 1, mean = 0, sd = 10),
        mansd = rgamma(n = 1, shape = 1.1, rate = 0.11),
        classmeans = rnorm(n = 1, mean = 0, sd = 10),
        classsd = rgamma(n = 1, shape = 1.1, rate = 0.11)
    )
}

Specify MCMC Parameters and Run

runjagsout <- run.jags(
    method = "simple",
    model = "model.txt",
    monitor = c("b0", "b1", "b2", "b3", "b4", "sigma"),
    data = datalist,
    inits = initslist,
    n.chains = 3,
    adapt = 500,
    burnin = 1000,
    sample = 20000,
    thin = 1,
    summarise = TRUE,
    plots = FALSE)

Evaluate Performance of the Model

Testing Model Performance Retrieve the data and take a peek at the structure:

codasamples = as.mcmc.list(runjagsout)
head(codasamples[[1]])

Markov Chain Monte Carlo (MCMC) output:
Start = 1501
End = 1507
Thinning interval = 1
             b0      b1[1]      b1[2]      b1[3]      b1[4]      b1[5]       b1[6]
1501  0.1615540 -0.1428200  0.1109130  0.0269200 -0.1742880  0.1978230 -0.15071500
1502  0.1552670 -0.0826378  0.1884370 -0.0403636 -0.0872312  0.1875360 -0.06216070
1503  0.1001840 -0.0175505  0.1996980 -0.0569787 -0.0932590  0.0996678 -0.00883381
1504  0.0676603 -0.1242170  0.0388379 -0.0892582 -0.0544722  0.1295170 -0.05160610
1505  0.0574967  0.0596947  0.2072000  0.0547035 -0.0452218  0.1282700 -0.07768330
1506  0.1424630 -0.0746281  0.0516289 -0.0279351 -0.0471806 -0.0418495 -0.03025680
1507  0.1330610 -0.0468138 -0.0630458  0.1247140 -0.0365874  0.1287520 -0.09369140
          b1[7]       b1[8]       b1[9]     b1[10]     b1[11]      b1[12]      b1[13]
1501 -0.1281650  0.08830060  0.11898300  0.1815890 -0.0421282 -0.07847370 -0.10213300
1502 -0.0905903  0.08597800 -0.01362000  0.1636370 -0.1507060  0.04585780 -0.15745300
1503 -0.0052090 -0.00374356 -0.19175100  0.1362450 -0.0953489 -0.09058930 -0.04872230
1504  0.1106150  0.09045220  0.00229036 -0.0308884  0.0522702  0.00123141 -0.26743000
1505 -0.1661700 -0.14270000 -0.12711400  0.1074820  0.0087711 -0.00288126 -0.05062190
1506  0.1196960 -0.01552510 -0.03205860 -0.0868129  0.0830015  0.13673200 -0.10771300
1507 -0.0157871 -0.01230790 -0.11858900  0.0305756 -0.0964591 -0.01032590  0.00492213
...

Testing Model Performance You can do this on your own.
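Testing Model Performance For reference, a minimal sketch of what those checks might look like, using the coda package loaded earlier (these are standard coda functions; this sketch is not from the original slides):

# Trace and density plots for each monitored parameter
plot(codasamples)

# Gelman-Rubin shrink factors: values close to 1 suggest the chains converged
gelman.diag(codasamples, multivariate = FALSE)

# Effective sample sizes: how many effectively independent draws we have
effectiveSize(codasamples)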

Extract & Parse Results

mcmcchain = as.matrix(codasamples)

# b0
zb0 = mcmcchain[, "b0"]

# b1
chainlength = length(zb0)
zb1 = matrix(0, ncol = chainlength, nrow = nmans)
for (i in 1:nmans) {
    zb1[i, ] = mcmcchain[, paste("b1[", i, "]", sep = "")]
}

# b2
zb2 = mcmcchain[, "b2"]

# b3
zb3 = matrix(0, ncol = chainlength, nrow = nclass)
for (i in 1:nclass) {
    zb3[i, ] = mcmcchain[, paste("b3[", i, "]", sep = "")]
}

# b4
zb4 = mcmcchain[, "b4"]

# sigma
zsigma <- mcmcchain[, "sigma"]

Convert to Original Scale

b0 <- (zb0 * ysd) + ymean
b2 <- (zb2 * ysd) / displsd
b4 <- (zb4 * ysd) / cylsd
b1 <- zb1 * ysd
b3 <- zb3 * ysd
sigma <- zsigma * ysd
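Convert to Original Scale As a quick sanity check (my addition, not in the original slides), the metric slopes should land near the frequentist estimates from lm above, since b2 and b4 have essentially the same interpretation in both models:

# Posterior means of the metric slopes vs. the lm point estimates
cbind(bayesian = c(displ = mean(b2), cyl = mean(b4)),
      frequentist = coef(cartest)[c("displ", "cyl")])
# Both columns should be near -0.52 (displ) and -1.29 (cyl), as in the lm summary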

View Posteriors

Plotting Posterior Distributions β0

par(mfrow = c(1, 1))
histinfo = plotpost(b0, xlab = "b0", main = "b0")

[Posterior of b0: mean = 24.099, 95% HDI from 23.569 to 24.661]

Plotting Posterior Distributions β1

par(mfrow = c(3, 3))
for (i in 1:nmans) {
    histinfo = plotpost(b1[i, ], xlab = bquote(b1[.(i)]), main = paste("b1:", manlevels[i]))
}

Plotting Posterior Distributions β1 [Posteriors of b1 for each manufacturer, plotted in 3 × 3 grids with the mean and 95% HDI marked on each. Only honda's HDI (roughly 0.22 to 2.80, mean 1.16) excludes zero; the HDIs for all other manufacturers span zero.]

Plotting Posterior Distributions β2

par(mfrow = c(1, 1))
histinfo = plotpost(b2, xlab = "b2", main = "Engine Displacement")

[Posterior of b2, engine displacement: mean = -0.514, 95% HDI from -1.369 to 0.362; the HDI spans zero]

Plotting Posterior Distributions β3

par(mfrow = c(2, 2))
for (i in 1:nclass) {
    histinfo = plotpost(b3[i, ], xlab = bquote(b3[.(i)]), main = paste("b3:", classlevels[i]))
}

Plotting Posterior Distributions β3 [Posteriors of b3 for each class, with the mean and 95% HDI marked on each: 2seater (mean 4.18; HDI 1.82 to 6.58), compact (1.86; 0.84 to 2.93), midsize (2.08; 1.11 to 3.02), and subcompact (2.33; 1.23 to 3.38) are positive; minivan (-1.58; -3.17 to 0.01) is negative but its HDI just touches zero; pickup (-4.97; -6.00 to -3.94) and suv (-3.90; -4.72 to -3.07) are clearly negative.]

Plotting Posterior Distributions β4

par(mfrow = c(1, 1))
histinfo = plotpost(b4, xlab = "b4", main = "# of Cylinders")

[Posterior of b4, # of cylinders: mean = -1.365, 95% HDI from -1.956 to -0.782; the HDI excludes zero]

Posterior Predictive Check

Posterior Predictive Check Select a subset of the data on which to make predictions (let's pick 20):

npred = 20
newrows <- round(seq(from = 1, to = NROW(carsub), length = npred))
newdata <- carsub[newrows, ]

Posterior Predictive Check Separate out just the x data, on which we will make predictions:

x1 <- as.numeric(newdata$manufacturer)
x2 <- newdata$displ
x3 <- as.numeric(newdata$class)
x4 <- newdata$cyl

Posterior Predictive Check Next, define a matrix that will hold all of the predicted y values. The number of rows is the number of x values for prediction; the number of columns is the number of y values generated from the MCMC process. We'll start with the matrix filled with zeros, and fill it in later.

postsampsize = length(b0)
ynew = matrix(0, nrow = npred, ncol = postsampsize)

Posterior Predictive Check Define a matrix for holding the HDI limits of the predicted y values. It has the same number of rows as above, but only two columns (one for each end of the HDI).

yhdilim = matrix(0, nrow = npred, ncol = 2)

Posterior Predictive Check Now, populate the ynew matrix by generating one predicted y value for each step in the chain. Note that our coefficients for the metric predictors are centred around the mean, so we have to treat them this way here:

for (i in 1:npred) {
    for (j in 1:postsampsize) {
        ynew[i, j] <- rnorm(1, mean = b0[j] + b1[x1[i], j] + (b2[j] * (x2[i] - displmean)) + b3[x3[i], j] + (b4[j] * (x4[i] - cylmean)), sd = sigma[j])
    }
}

Posterior Predictive Check Calculate the mean for each prediction, along with the associated low and high 95% HDI estimates:

means <- rowMeans(ynew)

source("hdiofmcmc.r")
for (i in 1:npred) {
    yhdilim[i, ] <- HDIofMCMC(ynew[i, ])
}

Posterior Predictive Check Combine into one table:

predtable <- cbind(means, yhdilim)
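Posterior Predictive Check Optionally, name the columns so the table is self-describing (my addition; cbind returns a matrix here, and these names are arbitrary):

colnames(predtable) <- c("mean", "hdiLow", "hdiHigh")
head(predtable)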

Posterior Predictive Check Plot the predicted values:

dotchart(means, labels = 1:npred, xlim = c(min(yhdilim), max(yhdilim)), xlab = "hwy mpg", pch = 16)
segments(yhdilim[, 1], 1:npred, yhdilim[, 2], 1:npred, lwd = 2)

Then add the truth:

points(x = newdata$hwy, y = 1:npred, pch = 16, col = rgb(1, 0, 0, 0.5))

Posterior Predictive Check [Dot chart of the 20 predicted hwy mpg values with their 95% HDI bars, and the observed values overlaid as red points; predictions span roughly 10 to 35 mpg]

Homework (last one!)

Homework Get the DIC for the full model. Re-configure and run the model 4 more times, leaving a different predictor variable out each time, and get the DIC for each. Compare the DIC values to decide which predictors are most important for your model. You should explain your results and interpretation, but you can do so as commented lines in your code (i.e., enclosed in # so that your code will still run, but also so that you have written explanations in there for me to read).
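Homework As a starting point, a hedged sketch of one way to get the DIC from the runjags output (this uses runjags's extract method, which requires the rjags package; check ?extract.runjags for your version):

# DIC for the full model; lower values indicate better expected
# out-of-sample performance, penalized by the effective number of parameters
extract(runjagsout, what = "dic")

# For each reduced model: edit the model string and datalist to drop one
# predictor, rerun run.jags(), and extract the DIC again for comparison.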

Creative Commons License Anyone is allowed to distribute, remix, tweak, and build upon this work, even commercially, as long as they credit me for the original creation. See the Creative Commons website for more information.