Metric Predicted Variable on Two Groups


Metric Predicted Variable on Two Groups Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. See the Creative Commons website for more information.

Goals

Goals When would we use this type of analysis? Comparing data between two groups Are means different? Is variability different? etc. t-test and equivalents (but more flexible)

Data

Data Metric data from two groups twodata.csv

Data Read data into R twodata <- read.table("twodata.csv", header = TRUE, sep = ",")

Data Let's get a feel for the data summary(twodata) y1 y2 Min. :-2.4980 Min. :-0.7580 1st Qu.:-0.2612 1st Qu.: 0.8475 Median : 0.2340 Median : 1.5610 Mean : 0.2524 Mean : 2.1695 3rd Qu.: 0.7738 3rd Qu.: 2.7652 Max. : 2.5830 Max. :11.6370

Data Let's get a feel for the data summary(twodata) y1 y2 Min. :-2.4980 Min. :-0.7580 1st Qu.:-0.2612 1st Qu.: 0.8475 Median : 0.2340 Median : 1.5610 Mean : 0.2524 Mean : 2.1695 3rd Qu.: 0.7738 3rd Qu.: 2.7652 Max. : 2.5830 Max. :11.6370 y2 seems to have higher values than y1

Data sd(twodata$y1) [1] 1.045462 sd(twodata$y2) [1] 2.738632

Data sd(twodata$y1) [1] 1.045462 sd(twodata$y2) [1] 2.738632 y2 seems to have larger standard deviation than y1

Data Let's look at the data Many potential ways to plot this. We'll look at three.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2)

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Define values for the x- and y-axes. Here we want the same for both so that it is easy to compare.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Define labels for the x- and y-axes.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Use filled circles as the plotting symbol (see ?pch for more details).

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Use rgb colour specifications to set fill colour (allows for transparency of symbols). First number indicates degree of red, second indicates degree of green, and third indicates degree of blue (on a scale from 0 to 1). The 4th number indicates how opaque the colour is (1 = fully opaque, 0 = fully transparent). See ?rgb for more details.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Add a line to the plot with an intercept of 0, a slope of 1, and a thickness of 2.

Data y2 mostly larger than y1; y2 more spread out than y1

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n")

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Create variables with histogram data for each data set.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Plot them, specifying the rgb parameters and scale of the x-axis. Note the add = TRUE argument to indicate that the second histogram should be plotted in the same frame as the first one.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Add a legend to the plot, and place it in the upper-right corner.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") The text to be included in the legend.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") What shape to use for legend symbols (15 is square). See ?pch for more details.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") What colours to use for each symbol (in order!).

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Don't draw a box around the legend.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n")

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)))

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Combine the two data sets into one long vector.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Create a vector of labels for the values in the first group (y1). Will label first group as 1, so this vector will have 1 repeated for each value in the y1 group.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Create a vector of labels for the values in the second group (y2). Will label second group as 2, so this vector will have 2 repeated for each value in the y2 group.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Combine these into one long vector. This vector will be as long as our data vector, but contain a label indicating which group each value is from.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Draw a box plot of the data values, grouped by the groups values.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Specify how to label the groups in the plot.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Specify what colours to use for each group.

Data [Box plot of y1 and y2]

Data Median

Data 50% of values (1st and 3rd quartile)

Data Remaining values up to 1.5X the inter-quartile range (difference between 1st and 3rd quartile; for normal data this puts the whisker ends roughly 2.7 standard deviations from the mean)

Data Outliers - values falling outside 1.5X the inter-quartile range

Data Which plotting method (if any) is most informative for data like this? [The scatter plot, overlaid histograms, and box plots shown side by side]
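The box-plot quantities just described are simple to compute directly. A minimal sketch (in Python rather than the slides' R, with made-up values) of the quartile, IQR, whisker, and outlier arithmetic:

```python
import numpy as np

def box_stats(x):
    """Quartiles, IQR, whisker limits, and outliers, as drawn by a box plot."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1                                  # inter-quartile range
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr        # whisker limits
    outliers = [v for v in x if v < lo or v > hi]  # points past the whiskers
    return med, iqr, outliers

vals = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]             # 20 sits far past the whisker
med, iqr, out = box_stats(vals)
print(med, iqr, out)
```

Any point farther than 1.5 IQR beyond a quartile is plotted individually as an outlier, exactly as in R's boxplot().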

Frequentist Approach

Frequentist Approach t-test t.test(twodata$y1, twodata$y2) Welch Two Sample t-test data: twodata$y1 and twodata$y2 t = -2.9247, df = 24.423, p-value = 0.007335 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -3.2687078 -0.5654922 sample estimates: mean of x mean of y 0.2524 2.1695
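For readers outside R, the same Welch test is available in Python's scipy; a sketch with hypothetical two-group data (not the twodata.csv values):

```python
from scipy import stats

# Hypothetical data: group b clearly shifted above group a
a = [0.2, -0.5, 1.1, 0.7, -0.1, 0.4]
b = [2.0, 1.5, 3.2, 2.8, 1.1, 4.0]

# equal_var=False requests Welch's t-test, which does not
# assume equal variances in the two groups
t, p = stats.ttest_ind(a, b, equal_var=False)
print(t, p)
```

The t statistic is negative because the first group's mean is below the second's, matching the sign convention in the R output above.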

Frequentist Approach Wilcoxon rank sum test The t-test assumes: data normally distributed; variances equal. The Wilcoxon rank sum test is a non-parametric alternative.

Frequentist Approach Wilcoxon rank sum test wilcox.test(twodata$y1, twodata$y2) Wilcoxon rank sum test data: twodata$y1 and twodata$y2 W = 87, p-value = 0.001767 alternative hypothesis: true location shift is not equal to 0
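The rank sum test also has a direct Python counterpart; a sketch with hypothetical data (the Mann-Whitney U test is the same procedure as the two-sample Wilcoxon rank sum test):

```python
from scipy import stats

# Hypothetical data; every value in b exceeds every value in a
a = [0.2, -0.5, 1.1, 0.7, -0.1, 0.4]
b = [2.0, 1.5, 3.2, 2.8, 1.2, 4.0]

# U counts how many (a, b) pairs have a > b; complete separation gives U = 0
u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
print(u, p)
```

Because the test uses only ranks, it is insensitive to the skew and outliers visible in y2.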

Bayesian Approach

Standardize the Data y1 <- twodata$y1 y1mean <- mean(y1) y1sd <- sd(y1) zy1 <- (y1 - y1mean) / y1sd N1 <- length(zy1) y2 <- twodata$y2 y2mean <- mean(y2) y2sd <- sd(y2) zy2 <- (y2 - y2mean) / y2sd N2 <- length(zy2)
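The same z-scoring, sketched in Python with numpy and hypothetical values; ddof=1 gives the sample standard deviation, matching R's sd():

```python
import numpy as np

y1 = np.array([-2.5, -0.3, 0.2, 0.8, 2.6, 0.7])   # hypothetical group values
y1mean, y1sd = y1.mean(), y1.std(ddof=1)
zy1 = (y1 - y1mean) / y1sd                         # centre, then scale by sample sd
N1 = len(zy1)
```

After standardizing, each group has mean 0 and sd 1, which is what lets the model use generic priors like Normal(0, 10).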

Specify the model Can just do two of our original model (simultaneously) [Model diagram: y1i ~ Normal(mu1, tau1 = 1/sigma1^2) and y2i ~ Normal(mu2, tau2 = 1/sigma2^2), with each mu ~ Normal(0, 10) and each sigma ~ Gamma(1.1, 0.11)]

Specify the model modelstring = "model { # Likelihood for (i in 1:N1) { zy1[i] ~ dnorm(mu1, tau1) } for (j in 1:N2) { zy2[j] ~ dnorm(mu2, tau2) } # Priors mu1 ~ dnorm(0, (1 / 10^2)) mu2 ~ dnorm(0, (1 / 10^2)) sigma1 ~ dgamma(1.1, 0.11) sigma2 ~ dgamma(1.1, 0.11) tau1 <- 1 / sigma1^2 tau2 <- 1 / sigma2^2 }" writeLines(modelstring, con = "model.txt")

Prepare Data for JAGS Specify as a list for JAGS datalist = list ( zy1 = zy1, zy2 = zy2, N1 = N1, N2 = N2 )

Specify Initial Values initslist <- function() { list( mu1 = rnorm(n = 1, mean = 0, sd = 10), mu2 = rnorm(n = 1, mean = 0, sd = 10), sigma1 = rgamma(n = 1, shape = 1.1, rate = 0.11), sigma2 = rgamma(n = 1, shape = 1.1, rate = 0.11) ) }

Specify MCMC Parameters and Run library(runjags) runjagsout <- run.jags( method = "simple", model = "model.txt", monitor = c("mu1", "mu2", "sigma1", "sigma2"), data = datalist, inits = initslist, n.chains = 3, adapt = 500, burnin = 1000, sample = 20000, thin = 1, summarise = TRUE, plots = FALSE)
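run.jags hides the sampling machinery. To show the idea, here is a minimal random-walk Metropolis sketch in Python (numpy, simulated data) for one group's mu and sigma under the same priors as the JAGS model; JAGS uses more sophisticated samplers, so this is illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=30)            # simulated data for one group

def log_post(mu, sigma):
    """Unnormalized log posterior: normal likelihood + the model's priors."""
    if sigma <= 0:
        return -np.inf
    loglik = -len(y) * np.log(sigma) - np.sum((y - mu) ** 2) / (2 * sigma ** 2)
    # mu ~ Normal(0, sd = 10); sigma ~ Gamma(shape = 1.1, rate = 0.11)
    logprior = -mu ** 2 / (2 * 10 ** 2) + (1.1 - 1) * np.log(sigma) - 0.11 * sigma
    return loglik + logprior

mu, sigma = 0.0, 1.0
chain = []
for _ in range(20000):
    mu_p = mu + rng.normal(0, 0.3)           # random-walk proposals
    sig_p = sigma + rng.normal(0, 0.3)
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_post(mu_p, sig_p) - log_post(mu, sigma):
        mu, sigma = mu_p, sig_p
    chain.append((mu, sigma))

burned = np.array(chain[5000:])              # discard burn-in, keep the rest
```

With enough iterations the retained draws approximate the joint posterior of (mu, sigma), which is exactly what the coda samples retrieved below contain.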

Evaluate Performance of the Model

Testing Model Performance Retrieve the data and take a peek at the structure codasamples = as.mcmc.list(runjagsout) head(codasamples[[1]]) Markov Chain Monte Carlo (MCMC) output: Start = 1501 End = 1507 Thinning interval = 1 mu1 mu2 sigma1 sigma2 1501 -0.0255491 -0.27213500 0.856140 0.941199 1502 -0.2445680 0.16608800 1.096810 0.906884 1503 0.5478500 0.05227700 1.521910 1.132970 1504 0.1006070 -0.09025910 1.085520 1.093390 1505 0.2066000 -0.05893500 0.993417 1.194550 1506 0.1937230 0.00712917 0.805682 0.894510 1507 -0.2859440 -0.72554900 1.316050 0.988003

Testing Model Performance Trace plots par(mfrow = c(2,2)) traceplot(codasamples)

Testing Model Performance Autocorrelation plots autocorr.plot(codasamples[[1]]) [Autocorrelation vs. lag (0-35) for mu1, mu2, sigma1, and sigma2; all drop to near zero after the first few lags]

Testing Model Performance Gelman & Rubin diagnostic gelman.diag(codasamples) Potential scale reduction factors: Point est. Upper C.I. mu1 1 1 mu2 1 1 sigma1 1 1 sigma2 1 1 Multivariate psrf 1
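The Gelman & Rubin statistic compares within-chain and between-chain variance; values near 1 indicate the chains agree. A sketch of the basic (non-split) calculation in Python with numpy:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for m equal-length chains."""
    chains = np.asarray(chains)                    # shape (m, n)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * W + B / n              # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 5000))                 # three well-mixed chains
stuck = mixed + np.array([[0.0], [5.0], [10.0]])   # chains stuck at different levels
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

coda's gelman.diag refines this (degrees-of-freedom correction, upper confidence limit), but the psrf of 1 in the output above reflects this same within-vs-between comparison.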

Testing Model Performance Effective size effectiveSize(codasamples) mu1 mu2 sigma1 sigma2 58617.40 59942.37 25714.49 25945.56
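Effective size discounts the nominal number of draws by the chain's autocorrelation, roughly N / (1 + 2 × sum of autocorrelations). A rough Python sketch with a simple positive-lag cutoff (coda's estimator is more refined):

```python
import numpy as np

def eff_size(x, max_lag=200):
    """Rough effective sample size via a truncated autocorrelation sum."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n, denom = len(x), np.dot(x, x)
    s = 0.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / denom        # lag-k autocorrelation
        if rho < 0.05:                             # stop once correlation dies out
            break
        s += rho
    return n / (1 + 2 * s)

rng = np.random.default_rng(1)
iid = rng.normal(size=10000)                       # independent draws
ar = np.zeros(10000)                               # strongly autocorrelated chain
for t in range(1, 10000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
print(eff_size(iid), eff_size(ar))
```

This is why the sigma parameters above, which mix a little more slowly, show smaller effective sizes than the mu parameters despite the same 60,000 total draws.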

Viewing Results

Parsing Data Convert codasamples to a matrix Will concatenate chains into one long one mcmcchain = as.matrix(codasamples)

Parsing Data Separate out data for each parameter zmu1 <- mcmcchain[, "mu1"] zmu2 <- mcmcchain[, "mu2"] zsigma1 <- mcmcchain[, "sigma1"] zsigma2 <- mcmcchain[, "sigma2"]

Convert Back to Original Scale mu1 <- (zmu1 * y1sd) + y1mean mu2 <- (zmu2 * y2sd) + y2mean sigma1 <- zsigma1 * y1sd sigma2 <- zsigma2 * y2sd
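Unstandardizing is just the inverse of the z-score transform: multiply by the original sd and add back the original mean. A quick numpy check with made-up values:

```python
import numpy as np

y = np.array([-2.5, -0.3, 0.2, 0.8, 2.6, 0.7])    # hypothetical original data
m, s = y.mean(), y.std(ddof=1)
z = (y - m) / s                                    # standardize
back = z * s + m                                   # invert: recovers y exactly
print(np.allclose(back, y))
```

Because the transform is linear, applying it draw-by-draw to the posterior samples of zmu and zsigma gives valid posterior samples on the original measurement scale.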

Plot Posterior Distributions Means par(mfrow=c(1, 2)) histinfo = plotpost(mu1, xlab = bquote(mu[1])) histinfo = plotpost(mu2, xlab = bquote(mu[2])) [Posterior histograms: mu1 mean = 0.25306, 95% HDI (-0.25464, 0.74952); mu2 mean = 2.1706, 95% HDI (0.87422, 3.521)]

Plot Posterior Distributions Means Can work directly with posterior distributions!!! diffmu <- mu1 - mu2 par(mfrow = c(1,1)) histinfo = plotpost(diffmu, xlab = bquote(mu[1] - mu[2])) [Posterior histogram: mean = -1.9175, 95% HDI (-3.3758, -0.53741)]

Plot Posterior Distributions Standard deviation par(mfrow = c(1,2)) histinfo = plotpost(sigma1, xlab = bquote(sigma[1]), showmode = TRUE) histinfo = plotpost(sigma2, xlab = bquote(sigma[2]), showmode = TRUE) [Posterior histograms: sigma1 mode = 1.0465, 95% HDI (0.77833, 1.52); sigma2 mode = 2.7656, 95% HDI (2.028, 3.9841)]

Plot Posterior Distributions Standard deviation diffsigma <- sigma1 - sigma2 par(mfrow = c(1,1)) histinfo = plotpost(diffsigma, xlab = bquote(sigma[1] - sigma[2]), showmode = TRUE) [Posterior histogram: mode = -1.6664, 95% HDI (-2.968, -0.81001)]

Plot Posterior Distributions Effect size The difference in means, standardized by the variance Provides information on how big of an effect there is, considering the amount of variation. Should generally range from about -1 to 1

Plot Posterior Distributions Effect size esize <- (mu1 - mu2) / (sqrt((sigma1^2 + sigma2^2) / 2)) histinfo = plotpost(esize, xlab = bquote((mu[1] - mu[2]) / sqrt((sigma[1]^2 + sigma[2]^2)/2)), cex.lab = 0.9) [Posterior histogram: mean = -0.88047, 95% HDI (-1.5679, -0.22249)]
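The effect-size formula itself, sketched in Python with the approximate posterior summaries from the slides plugged in as single numbers; in the Bayesian analysis the formula is applied to every posterior draw, giving a whole distribution of effect sizes rather than one value:

```python
import numpy as np

mu1, mu2 = 0.25, 2.17          # approximate posterior means from the slides
sigma1, sigma2 = 1.05, 2.77    # approximate posterior modes for the sds

# Difference in means, scaled by the root-mean-square of the two sds
d = (mu1 - mu2) / np.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2)
print(d)
```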

Recap Think of the wealth of information we've obtained [All posteriors side by side, with the y1/y2 box plots: mu1 mean = 0.25169, 95% HDI (-0.24305, 0.73682); sigma1 mode = 1.022, 95% HDI (0.76329, 1.4671); mu2 mean = 2.1678, 95% HDI (0.89754, 3.4578); sigma2 mode = 2.6328, 95% HDI (2.0018, 3.8461); mu1 - mu2 mean = -1.9162, 95% HDI (-3.2919, -0.55531); sigma1 - sigma2 mode = -1.6522, 95% HDI (-2.8347, -0.7986); effect size mean = -0.90438, 95% HDI (-1.5819, -0.24047)]

Recap Think of the wealth of information we've obtained The goal of analyses should not be one value and a yes/no decision; it should be to obtain information about the data so that you can evaluate the credibility of different hypotheses

Revision of the Goals of Bayesian Analysis

Bayesian Analysis Taken almost verbatim from Gelman et al. (2014)* A practical method for making inferences from data using probability models for quantities we observe and for quantities about which we wish to learn Explicit use of probability for quantifying uncertainty in inferences based on statistical data analysis * Gelman et al. (2014) Bayesian Data Analysis. CRC Press.

Bayesian Analysis Three main steps 1. Setting up a full probability model - a joint probability distribution for all observable and unobservable quantities in a problem. The model should be consistent with knowledge about the underlying scientific problem and the data collection process

Bayesian Analysis Three main steps 2. Condition on observed data - calculating and interpreting the appropriate posterior distribution - the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data

Bayesian Analysis Three main steps 3. Evaluating the fit of the model and the implications - How well does the model fit the data? Are the conclusions reasonable? How sensitive are the results to the modelling assumptions in step 1?

Bayesian Analysis Emphasis on Do the inferences make sense? Are the model's predictions consistent with the data?

Bayesian Analysis Emphasis on Do the inferences make sense? Are the model's predictions consistent with the data? Not: Is the model true? What is Pr(model is true)? Can we reject the model?

Bayesian Analysis Emphasis on Describing the data, and the factors influencing the data, in an explicit and probabilistic manner Making interpretations of these factors based on the analyses

How Well Does Our Model Fit The Data? Posterior Predictive Check

Assessing Model Fit y1 Plot data Choose some values from the posterior and plot over data

Assessing Model Fit y1 histinfo = hist(y1, xlab = "y1", main = "", col = "skyblue", prob = TRUE) [Density histogram of y1]

Assessing Model Fit y1 Get range of values from observed distribution plot y1lims = range(histinfo$breaks) y1lims [1] -3 3

Assessing Model Fit y1 Get range of values from observed distribution plot y1lims = range(histinfo$breaks) y1lims [1] -3 3 Create a sequence of 500 values within this range y1sample = seq(from = y1lims[1], to = y1lims[2], length = 500)

Assessing Model Fit y1 Get length of posterior chainlength1 = length(mu1)

Assessing Model Fit y1 Get length of posterior chainlength1 = length(mu1) Get 20 values from this range (we'll draw 20 lines) y1new = floor(seq(from = 1, to = chainlength1, length = 20))

Assessing Model Fit y1 Loop through list and plot associated lines for (i in y1new) { lines(y1sample, dnorm(y1sample, mean = mu1[i], sd = sigma1[i]), col = "gray47") } [Histogram of y1 with 20 posterior-predictive density curves overlaid]

Assessing Model Fit y2 histinfo = hist(y2, xlab = "y2", main = "", col = "skyblue", prob = TRUE) [Density histogram of y2]

Assessing Model Fit y2 Get range of values from observed distribution plot y2lims = range(histinfo$breaks) y2lims [1] -2 12

Assessing Model Fit y2 Get range of values from observed distribution plot y2lims = range(histinfo$breaks) y2lims [1] -2 12 Create a sequence of 500 values within this range y2sample <- seq(from = y2lims[1], to = y2lims[2], length = 500)

Assessing Model Fit y2 Get length of posterior chainlength2 = length(mu2)

Assessing Model Fit y2 Get length of posterior chainlength2 = length(mu2) Get 20 values from this range (we'll draw 20 lines) y2new = floor(seq(from = 1, to = chainlength2, length = 20))

Assessing Model Fit y2 Loop through list and plot associated lines for (i in y2new) { lines(y2sample, dnorm(y2sample, mean = mu2[i], sd = sigma2[i]), col = "gray47") } [Histogram of y2 with 20 posterior-predictive density curves overlaid]
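Each overlaid line is just a normal density evaluated on a grid of points for one posterior draw of (mu, sigma). A Python sketch with hypothetical draws:

```python
import numpy as np
from scipy import stats

xs = np.linspace(-3, 3, 500)                       # grid spanning the histogram
draws = [(0.2, 1.0), (0.3, 0.9), (0.1, 1.1)]       # hypothetical posterior draws
# One predictive density curve per posterior draw of (mu, sigma)
curves = [stats.norm.pdf(xs, mu, sd) for mu, sd in draws]
# In the slides, each curve is drawn over the data histogram with lines()
```

If the spread of these curves brackets the data histogram, the model's predictions are consistent with the data; systematic mismatch (as for y2's long right tail) signals poor fit.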

Were Priors Appropriate?

Assessing Priors Mean (mu) Make a list containing the range of values over which to evaluate performance Mean should be 0, with sd = 1, so a range from -2 to 2 should work par(mfrow = c(1, 2)) # To plot data for both mu1 and mu2 together mupriorlist <- seq(from = -2, to = 2, length = 500)

Assessing Priors Mean (mu) Then, generate priors using model parameters mu1prior <- dnorm(mupriorlist, mean = 0, sd = 10) mu2prior <- dnorm(mupriorlist, mean = 0, sd = 10)

Assessing Priors Mean (mu) Get the distribution of the posterior using the density function mu1post <- density(zmu1) mu2post <- density(zmu2)

Assessing Priors Mean (mu) Get ranges for data mu1high <- ceiling(max(mu1post$y)) mu2high <- ceiling(max(mu2post$y))

Assessing Priors Mean (mu) Plot data for mu1 plot(mupriorlist, mu1prior, ylim = c(0, mu1high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "zmu1") lines(mu1post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), bty = "n")

Assessing Priors Mean (mu) Plot data for mu2 plot(mupriorlist, mu2prior, ylim = c(0, mu2high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "zmu2") lines(mu2post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), bty = "n")

Assessing Priors Mean (mu) [Prior (dashed) vs. posterior (solid) densities for zmu1 and zmu2 over possible values from -2 to 2; the flat prior is overwhelmed by the data]

Assessing Priors Standard deviation (sigma) Make a list containing the range of values over which to evaluate performance Mode should be 1, with sd = 10, so a range from 0 to 5 should work par(mfrow = c(1, 2)) # To plot data for both sigma1 and sigma2 together sigmapriorlist <- seq(from = 0, to = 5, length = 500)

Assessing Priors Standard deviation (sigma) Then, generate priors using model parameters sigma1prior <- dgamma(sigmapriorlist, shape = 1.1, rate = 0.11) sigma2prior <- dgamma(sigmapriorlist, shape = 1.1, rate = 0.11)

Assessing Priors Standard deviation (sigma) Get the distribution of the posterior using the density function sigma1post <- density(zsigma1) sigma2post <- density(zsigma2)

Assessing Priors Standard deviation (sigma) Get ranges for data sigma1high <- ceiling(max(sigma1post$y)) sigma2high <- ceiling(max(sigma2post$y))

Assessing Priors Standard deviation (sigma) Plot data for sigma1 plot(sigmapriorlist, sigma1prior, ylim = c(0, sigma1high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "sigma1") lines(sigma1post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), lwd = c(2, 2), bty = "n")

Assessing Priors Standard deviation (sigma) Plot data for sigma2 plot(sigmapriorlist, sigma2prior, ylim = c(0, sigma2high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "sigma2") lines(sigma2post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), lwd = c(2, 2), bty = "n")

Assessing Priors Standard deviation (sigma) [Prior (dashed) vs. posterior (solid) densities for sigma1 and sigma2 over possible values from 0 to 5]

Re-evaluating Some Old Examples

Re-evaluating Old Examples Remember these? [Two simulated data sets: left, N = 10,000 per group with means differing by 0.1 (effect size = -0.032, p = 0.023); right, N = 10 per group with means differing by 4 (effect size = -0.42, p = 0.36)]

Re-evaluating Old Examples Remember these? What would you expect from Bayesian analyses?

Re-evaluating Old Examples [N = 10,000 example: mu1 mean = 0.019723, 95% HDI (-0.078613, 0.11718); mu2 mean = 0.17007, 95% HDI (0.072394, 0.26839)]

Re-evaluating Old Examples [N = 10,000 example: mu1 - mu2 mean = -0.15035, 95% HDI (-0.28776, -0.011461)]

Re-evaluating Old Examples [N = 10,000 example: effect size mean = -0.030098, 95% HDI (-0.057387, -0.0020709)]

Re-evaluating Old Examples [N = 10 example: mu1 mean = 0.49828, 95% HDI (-5.0558, 4.1695); mu2 mean = 2.0819, 95% HDI (-1.0358, 5.2717)]

Re-evaluating Old Examples [N = 10 example: mu1 - mu2 mean = -2.5802, 95% HDI (-8.1708, 2.9535)]

Re-evaluating Old Examples [N = 10 example: effect size mean = -0.43905, 95% HDI (-1.3398, 0.44554)]

Re-evaluating Old Examples [Frequency histogram of the mu1 - mu2 posterior samples]

Re-evaluating Old Examples [Frequency histogram of the mu1 - mu2 posterior samples, zoomed in]

Questions?

Homework!!

You guessed it: modify the model to use a t distribution instead of a normal

Creative Commons License Anyone is allowed to distribute, remix, tweak, and build upon this work, even commercially, as long as they credit me for the original creation. See the Creative Commons website for more information.