Metric Predicted Variable With One Nominal Predictor Variable


Metric Predicted Variable With One Nominal Predictor Variable Tim Frasier Copyright Tim Frasier. This work is licensed under the Creative Commons Attribution 4.0 International license.

Goals & General Idea

Goals When would we use this type of analysis? When we want to know the effect of being in a group on some metric predicted variable. A very common type of data set: monetary income (metric) and political party (nominal); drug effect across groups; etc. Frequently analyzed with a single-factor (or one-way) ANOVA.

General Idea We are trying to quantify the relationship between two different sets of data. One (y) is the metric response (or predicted) variable. The other (x) is the nominal predictor variable that represents the categories to which measurements, samples, or individuals can belong.

Equation (Kruschke 2015, p. 555): µ_i = β0 + Σ_j β_j x_j[i]

β0 is the mean value of y, across all groupings.

β_j is the degree to which values are deflected above or below the mean value, based on being in group j.

Σ_j β_j = 0, by definition. We will add this constraint to our model.
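The sum-to-zero constraint can be illustrated with a quick R sketch (note: these group means are made up for illustration, not the horse data):

```r
# Toy illustration of sum-to-zero deflections (hypothetical group means):
m  <- c(adult = 0.28, juvenile = 0.48, foal = 0.74)  # made-up group means
b0 <- mean(m)                                        # grand mean across groups
b1 <- m - b0                                         # deflection for each group
sum(b1)                                              # ~0: the constraint holds
```

Each β_j is just that group's mean minus the grand mean, so the deflections necessarily sum to zero.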

The Data

Data Suppose you're studying a horse population that had a large die-off in one year (from which you've obtained samples). You're interested in the effect of inbreeding on survival in this population. We may expect the following patterns.

Data Prediction & Logic by Age Class
Adults: lowest degree of inbreeding (have survived previous selection events; inbred individuals have already been "weeded out")
Juveniles: moderate degree of inbreeding (have survived some, but not too many, previous selection events; some inbred individuals have been "weeded out")
Foals: highest degree of inbreeding (have not had to survive any previous selection events; no inbred individuals have been "weeded out" yet, except in this event)

Data Metric predicted variable with one nominal predictor:

Inbreeding Coefficient   Age Class
0.12                     adult
0.23                     juvenile
0.06                     adult
0.22                     foal
0.34                     foal
0.17                     juvenile

Getting A Feel for the Data Before we can create an appropriate model, we need to get a feel for the data I think two main plots would be useful here A box plot grouped by age class Points grouped by age class

Getting A Feel for the Data Put the simhorsedata.csv file in R's working directory, then load it into R:

horsedata <- read.table("simhorsedata.csv", header = TRUE, sep = ",")

Getting A Feel for the Data Make a box plot First, make the aclass field a factor:

aclass <- factor(horsedata$aclass, levels = c("adult", "juvenile", "foal"), ordered = TRUE)

Note that we're specifying the order here, so that adult will be 1, juvenile will be 2, and foal will be 3. Otherwise, these would be ordered alphabetically (i.e., adult, foal, juvenile).

Getting A Feel for the Data Make the box plot:

boxplot(horsedata$ic ~ aclass, ylab = "Inbreeding Coefficient", xlab = "Age Class", col = "skyblue")

[Box plot of inbreeding coefficient (0.0 to 1.0) by age class: adult, juvenile, foal]

Getting A Feel for the Data Can also plot the points to see the raw data (requires a few tricks):

plot(as.numeric(aclass), horsedata$ic, xaxt = "n", ylab = "Inbreeding Coefficient", xlab = "Age Class", pch = 16, col = rgb(0, 0, 0, 0.25))
axis(1, at = c(1, 2, 3), labels = c("adult", "juvenile", "foal"))

as.numeric(aclass): have to use the numeric form, otherwise R defaults to a box plot.
xaxt = "n": tell R not to plot the x-axis labels (we'll add our own later); these would be numbers.
pch = 16, col = rgb(0, 0, 0, 0.25): make the points filled black circles that are partially transparent, so that you can see where there is overlap.
axis(): function for adding a custom axis.
1: position of the axis. 1 = bottom, 2 = left, 3 = top, 4 = right.
at = c(1, 2, 3): at what positions to add our custom labels (remember that, as numeric, our categories are 1, 2, and 3).
labels: what the labels should be at our indicated positions.

Getting A Feel for the Data [Scatter plot of inbreeding coefficient (0.0 to 1.0) by age class: adult, juvenile, foal]

Getting A Feel for the Data This is a little too clumped to be useful. Can jitter the x-values of the points to make it clearer; this adds random noise to the data on the specified axis:

plot(jitter(as.numeric(aclass)), horsedata$ic, xaxt = "n", ylab = "Inbreeding Coefficient", xlab = "Age Class", pch = 16, col = rgb(0, 0, 1, 0.5))
axis(1, at = c(1, 2, 3), labels = c("adult", "juvenile", "foal"))

Getting A Feel for the Data [Jittered scatter plot of inbreeding coefficient (0.0 to 1.0) by age class: adult, juvenile, foal]

Frequentist Approach

Frequentist Approach Again, these types of data are typically analyzed with an ANOVA, which can be run with the aov function:

anovatest <- aov(horsedata$ic ~ horsedata$aclass)

Frequentist Approach Can get the coefficient estimates by looking at the model tables:

print(model.tables(anovatest))

Tables of effects

horsedata$aclass
        adult     foal  juvenile
      -0.2150   0.2382   -0.0167
rep   81        76        41

Adults have lower inbreeding coefficients than the population average, those of juveniles are slightly lower than average, and those of foals are substantially above average. The rep row gives the number of subjects in each category. Note the lack of confidence intervals on the coefficient estimates.

Frequentist Approach Can see if this effect is significant:

summary(anovatest)

                  Df Sum Sq Mean Sq F value Pr(>F)
horsedata$aclass   2  8.068   4.034   168.4 <2e-16 ***
Residuals        195  4.670   0.024
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yep.

Bayesian Approach

Load Libraries & Functions

library(runjags)
library(coda)
source("plotpost.r")

Prepare the Data Standardize the metric (y) data:

y <- horsedata$ic
ymean <- mean(y)
ysd <- sd(y)
zy <- (y - ymean) / ysd
N <- length(y)

Prepare the Data Organize the nominal (x) data:

x <- as.numeric(as.factor(horsedata$aclass))
xnames <- levels(as.factor(horsedata$aclass))
nAgeClass <- length(unique(horsedata$aclass))

x: save the nominal data as x in numeric form (1, 2, and 3 instead of adult, foal, and juvenile).
xnames: a vector with the names of each nominal category; xnames is now adult, foal, juvenile.
nAgeClass: the number of categories in the data set (unique values in our nominal data).

Define the Model

Define the Model [Model diagram, built up across several slides: y_i ~ norm(µ_i, τ = 1/σ²); the baseline and each group deflection get a norm prior with µ = 0 and σ = 10; σ gets a gamma(1.1, 0.11) prior. Same as before, the data are just coded differently!!! (Will have some differences in the actual model.)]

Define the Model

modelstring = "
model {

    #--- Likelihood ---#
    for (i in 1:N) {
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- a0 + a1[x[i]]
    }

    #--- Priors ---#
    a0 ~ dnorm(0, 1/10^2)

    for (j in 1:nAgeClass) {
        a1[j] ~ dnorm(0, 1/10^2)
    }

    sigma ~ dgamma(1.1, 0.11)
    tau <- 1 / sigma^2
... (there's more)

a1[x[i]]: how we indicate that the value of x[i] is categorical, rather than a number to be taken at face value.
We use a instead of the standard b (for beta) to indicate that these coefficients are not yet standardized to sum to zero.

Define the Model

...
    #-------------------------------------------------------------#
    # Convert a0 and a1[] to sum-to-zero b0 and b1[]               #
    #-------------------------------------------------------------#

    #--- Create matrix with values for each age class ---#
    for (j in 1:nAgeClass) {
        m[j] <- a0 + a1[j]
    }

    #--- Make b0 the mean across all age classes ---#
    b0 <- mean(m[1:nAgeClass])

    #--- Make b1[j] the difference between that category and b0 ---#
    for (j in 1:nAgeClass) {
        b1[j] <- m[j] - b0
    }
}
"
writeLines(modelstring, con = "model.txt")

Prepare Data for JAGS Specify as a list for JAGS:

datalist = list(
    y = zy,
    x = x,
    N = N,
    nAgeClass = nAgeClass
)

Specify Initial Values

initslist <- function() {
    list(
        sigma = rgamma(n = 1, shape = 1.1, rate = 0.11),
        a0 = rnorm(n = 1, mean = 0, sd = 10),
        a1 = rnorm(n = nAgeClass, mean = 0, sd = 10)
    )
}

Note that for a1 we need one initial value for each category!

Specify MCMC Parameters and Run

library(runjags)
runjagsout <- run.jags(
    method = "simple",
    model = "model.txt",
    monitor = c("b0", "b1", "sigma"),
    data = datalist,
    inits = initslist,
    n.chains = 3,
    adapt = 500,
    burnin = 1000,
    sample = 20000,
    thin = 1,
    summarise = TRUE,
    plots = FALSE)

monitor: will keep track of all b1 values (one for each category).

Evaluate Performance of the Model

Testing Model Performance Retrieve the data and take a peek at the structure:

codasamples = as.mcmc.list(runjagsout)
head(codasamples[[1]])

Markov Chain Monte Carlo (MCMC) output:
Start = 1501
End = 1507
Thinning interval = 1

              b0      b1[1]     b1[2]       b1[3]    sigma
1501  0.01023750  -0.814120  0.844818  -0.0306979  0.601008
1502  0.01937490  -0.851512  0.954419  -0.1029070  0.609693
1503 -0.00405330  -0.881957  0.892746  -0.0107887  0.608156
1504 -0.02903510  -0.814878  0.857112  -0.0422340  0.595978
1505 -0.00710644  -0.874748  0.919850  -0.0451014  0.608177
1506  0.00983978  -0.877028  0.937645  -0.0606174  0.603113
1507 -0.04808760  -0.904060  1.066600  -0.1625380  0.624435

Testing Model Performance Trace plots:

par(mfrow = c(2, 3))
traceplot(codasamples)

Testing Model Performance Autocorrelation plots:

autocorr.plot(codasamples[[1]])

[Autocorrelation vs. lag plots for b0, b1[1], b1[2], b1[3], and sigma]

Testing Model Performance Gelman & Rubin diagnostic:

gelman.diag(codasamples)

Potential scale reduction factors:

      Point est. Upper C.I.
b0             1          1
b1[1]          1          1
b1[2]          1          1
b1[3]          1          1
sigma          1          1

Multivariate psrf

1

Testing Model Performance Effective size:

effectiveSize(codasamples)

      b0    b1[1]    b1[2]    b1[3]    sigma
60184.04 59571.20 59350.44 60017.57 34886.89

Viewing Results

Parsing Data Convert codasamples to a matrix. This will concatenate the chains into one long chain:

mcmcchain = as.matrix(codasamples)

Parsing Data Separate out the data for each parameter:

#--- sigma ---#
zsigma = mcmcchain[, "sigma"]

#--- b0 ---#
zb0 = mcmcchain[, "b0"]

#--- b1 ---#
chainlength = length(zb0)
zb1 = matrix(0, ncol = chainlength, nrow = nAgeClass)
for (j in 1:nAgeClass) {
    zb1[j, ] = mcmcchain[, paste("b1[", j, "]", sep = "")]
}

The matrix holds the posteriors for each category of our Age Class variable: one column for each step in the chain, and one row per category. The loop fills each row (category) with the posteriors from the appropriately-named column of mcmcchain (these columns will be b1[1], b1[2], and b1[3]).

Convert Back to Original Scale

#--- sigma ---#
sigma <- zsigma * ysd

#--- b0 ---#
b0 <- zb0 * ysd + ymean

#--- b1 ---#
b1 = matrix(0, ncol = chainlength, nrow = nAgeClass)
for (j in 1:nAgeClass) {
    b1[j, ] = zb1[j, ] * ysd
}

(Because the b1 values are deflections from the mean, they are only rescaled by ysd; the mean is added back only to b0.)

Plot Posterior Distributions Sigma:

par(mfrow = c(1, 1))
histinfo = plotpost(sigma, xlab = bquote(sigma))

[Posterior of σ: mean = 0.15573; 95% HDI from 0.14033 to 0.1712]

Plot Posterior Distributions b0:

histinfo = plotpost(b0, xlab = bquote(beta[0]))

[Posterior of β0: mean = 0.49804; 95% HDI from 0.47505 to 0.52051]

Plot Posterior Distributions b1:

par(mfrow = c(1, 3))
for (j in 1:nAgeClass) {
    histinfo = plotpost(b1[j, ], xlab = bquote(b1[.(j)]), main = paste("b1:", xnames[j]))
}

Plot Posterior Distributions b1 Remember, these are the effects of being in each category (deflections). To get actual values, add b0 to each b1.

[Posteriors of the deflections:
b1: adult — mean = -0.21717; 95% HDI from -0.24693 to -0.18692
b1: foal — mean = 0.23591; 95% HDI from 0.20605 to 0.26673
b1: juvenile — mean = -0.018741; 95% HDI from -0.053996 to 0.017213]
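As a quick toy sketch of "add b0 to each b1" (made-up draws for illustration, not the actual horse posteriors):

```r
# Toy numbers only: pretend these are three posterior draws from each chain.
b0toy <- c(0.49, 0.50, 0.51)      # hypothetical draws of the grand mean b0
b1toy <- c(-0.22, -0.21, -0.23)   # hypothetical draws of the adult deflection
adulttoy <- b0toy + b1toy         # draws of the actual adult group mean
mean(adulttoy)                    # ~0.28
```

Adding the two chains element-wise turns a posterior of deflections into a posterior of actual group means, which can then be plotted with plotpost() in the usual way.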

Comparing Groups

Comparing Groups

eadults <- b1[1, ]
efoals <- b1[2, ]
ejuveniles <- b1[3, ]

The e prefix indicates that we are looking at effects rather than actual values.

Comparing Groups Adult vs. foal:

par(mfrow = c(1, 1))
AvF <- eadults - efoals
histinfo = plotpost(AvF, main = "Adult v Foal", xlab = "")

[Adult v Foal: mean = -0.45317; 95% HDI from -0.50106 to -0.40361]

Comparing Groups Adult vs. juvenile:

AvJ <- eadults - ejuveniles
histinfo = plotpost(AvJ, main = "Adult v Juvenile", xlab = "")

[Adult v Juvenile: mean = -0.19832; 95% HDI from -0.25676 to -0.13959]

Comparing Groups Foal vs. juvenile:

FvJ <- efoals - ejuveniles
histinfo = plotpost(FvJ, main = "Foal v Juvenile", xlab = "")

[Foal v Juvenile: mean = 0.25486; 95% HDI from 0.19591 to 0.31364]

Comparing Groups Adults vs. others:

JpF <- (ejuveniles + efoals) / 2
AvO <- eadults - JpF
histinfo = plotpost(AvO, main = "Adults v Others", xlab = "")

[Adults v Others: mean = -0.32574; 95% HDI from -0.37168 to -0.28173]

Comparing Groups Foals vs. others:

ApJ <- (eadults + ejuveniles) / 2
FvO <- efoals - ApJ
histinfo = plotpost(FvO, main = "Foals v Others", xlab = "")

[Foals v Others: mean = 0.35402; 95% HDI from 0.30766 to 0.39868]

Comparing Groups Dot charts Nice for comparing the same parameter across groups. We need the mean and error bars for the parameter in each group.

Comparing Groups Dot charts First, store the mean of each coefficient as a new vector:

b1means <- c(mean(eadults), mean(efoals), mean(ejuveniles))

Comparing Groups Dot charts Get the highest density interval for each beta, and combine:

source("HDIofMCMC.R")
b1.adultshdi <- HDIofMCMC(eadults)
b1.foalshdi <- HDIofMCMC(efoals)
b1.juvenileshdi <- HDIofMCMC(ejuveniles)
b1hdi <- rbind(b1.adultshdi, b1.foalshdi, b1.juvenileshdi)

HDIofMCMC() returns the lower and upper values of the interval. You can change what percentage to use with the credMass argument (e.g., to get the 89% HDI, credMass = 0.89).

Comparing Groups Dot charts Plot the means of each group:

dotchart(b1means, pch = 16, labels = c("adult", "foal", "juvenile"), xlim = c(min(b1hdi), max(b1hdi)), xlab = "Beta coefficient")

[Dot chart of the adult, foal, and juvenile means along the beta-coefficient axis, from about -0.2 to 0.2]

Comparing Groups Dot charts Add the HDI bars:

segments(b1hdi[, 1], 1:3, b1hdi[, 2], 1:3, lwd = 2)

b1hdi[, 1], 1:3: x- and y-coordinates of the starting positions for the line segments.
b1hdi[, 2], 1:3: x- and y-coordinates of the ending positions for the line segments.
lwd = 2: make the lines twice as thick as the default.

[Dot chart with 95% HDI bars added for adult, foal, and juvenile]

Check Validity of Model: Posterior Predictive Check

Posterior Predictive Check Generate new y values for a subset of x values in the data set, based on the estimates for the coefficients. Then compare these predicted y values to the real ones.

Posterior Predictive Check Select a subset of the data on which to make predictions (let's pick 30), and round the resulting row indices to whole numbers:

nPred <- 30
newrows <- seq(from = 1, to = NROW(horsedata), length = nPred)
newrows <- round(newrows)

Posterior Predictive Check Parse out these data from the original data frame:

newdata <- horsedata[newrows, ]

Posterior Predictive Check Order based on the categorical (predictor) variable, to make the plots clearer later:

newdata <- newdata[order(newdata$aclass), ]

Posterior Predictive Check Separate out just the x data too, on which we will make predictions:

xnew <- newdata$aclass
xnewnums <- as.numeric(as.factor(xnew))

Posterior Predictive Check Organize the categorical coefficients into one matrix (makes indexing easier later):

b <- rbind(eadults, efoals, ejuveniles)

Posterior Predictive Check Next, define a matrix that will hold all of the predicted y values. The number of rows is the number of x values for prediction; the number of columns is the number of steps in the MCMC chain. We'll start with the matrix filled with zeros, and fill it in later:

postSampSize = length(b0)
ynew = matrix(0, nrow = length(xnew), ncol = postSampSize)

Posterior Predictive Check Define a matrix for holding the HDI limits of the predicted y values: the same number of rows as above, and only two columns (one for each end of the HDI):

yhdilim = matrix(0, nrow = length(xnew), ncol = 2)

Posterior Predictive Check Now, populate the ynew matrix by generating one set of predicted y values for each step in the chain:

for (i in 1:postSampSize) {
    ynew[, i] <- rnorm(length(xnew), mean = b0[i] + b[xnewnums, i], sd = sigma[i])
}

For every step in the chain, we fill out a new column (all rows) of the new matrix, pulling the same number of values as in our xnew list from a normal distribution, with a mean based on that step's b0 plus the deflection for whichever category each x value is in, and a standard deviation based on that step's value from the posterior.

Posterior Predictive Check Calculate the means for each prediction, and the associated low and high 95% HDI estimates:

means <- rowMeans(ynew)

source("HDIofMCMC.R")
for (i in 1:length(xnew)) {
    yhdilim[i, ] <- HDIofMCMC(ynew[i, ])
}

Posterior Predictive Check Combine the data:

predtable <- cbind(xnew, means, yhdilim)

Posterior Predictive Check Plot the results:

#--- Plot the predicted values (dot plot) ---#
dotchart(means, labels = 1:nPred, xlim = c(min(yhdilim), max(yhdilim)), xlab = "Inbreeding Coefficient", pch = 16)
segments(yhdilim[, 1], 1:nPred, yhdilim[, 2], 1:nPred, lwd = 2)

#--- Plot the true values ---#
points(x = newdata$ic, y = 1:nPred, pch = 16, col = rgb(1, 0, 0, 0.5))

Posterior Predictive Check [Dot chart of the 30 predictions (means with 95% HDI bars) against inbreeding coefficient from 0.0 to 1.0, with the true values overlaid in red]

Questions?

Homework!

Homework The current model assumes an equal standard deviation (and therefore an equal precision) for each category. Modify the model to estimate a different standard deviation for each category, and evaluate it in the same ways as we did for the means.
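As a starting point, one possible sketch of the likelihood and priors for such a model is shown below, indexing a separate sigma by category. This is just one way to set it up (untested here), not the official solution:

```r
# A possible sketch only (not the official solution): give each age class its
# own sigma[j] by indexing tau with the category of each observation.
modelstring = "
model {
    #--- Likelihood ---#
    for (i in 1:N) {
        y[i] ~ dnorm(mu[i], tau[x[i]])
        mu[i] <- a0 + a1[x[i]]
    }

    #--- Priors ---#
    a0 ~ dnorm(0, 1/10^2)

    for (j in 1:nAgeClass) {
        a1[j] ~ dnorm(0, 1/10^2)
        sigma[j] ~ dgamma(1.1, 0.11)
        tau[j] <- 1 / sigma[j]^2
    }

    # (Convert a0 and a1[] to sum-to-zero b0 and b1[] as before)
}
"
```

If you go this route, remember that sigma is now a vector: monitor all of its elements, and supply one initial sigma value per category in initslist.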

Creative Commons License Anyone is allowed to distribute, remix, tweak, and build upon this work, even commercially, as long as they credit me for the original creation. See the Creative Commons website for more information.