Metric Predicted Variable on Two Groups


Metric Predicted Variable on Two Groups Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. See the Creative Commons website for more information.

Goals

Goals When would we use this type of analysis? Comparing data between two groups Are means different? Is variability different? etc. t-test and equivalents (but more flexible)

Data

Data Metric data from two groups twodata.csv

Data Read data into R twodata <- read.table("twodata.csv", header = TRUE, sep = ",")

Data Let's get a feel for the data summary(twodata) y1 y2 Min. :-2.4980 Min. :-0.7580 1st Qu.:-0.2612 1st Qu.: 0.8475 Median : 0.2340 Median : 1.5610 Mean : 0.2524 Mean : 2.1695 3rd Qu.: 0.7738 3rd Qu.: 2.7652 Max. : 2.5830 Max. :11.6370

Data Let's get a feel for the data summary(twodata) y1 y2 Min. :-2.4980 Min. :-0.7580 1st Qu.:-0.2612 1st Qu.: 0.8475 Median : 0.2340 Median : 1.5610 Mean : 0.2524 Mean : 2.1695 3rd Qu.: 0.7738 3rd Qu.: 2.7652 Max. : 2.5830 Max. :11.6370 y2 seems to have higher values than y1

Data sd(twodata$y1) [1] 1.045462 sd(twodata$y2) [1] 2.738632

Data sd(twodata$y1) [1] 1.045462 sd(twodata$y2) [1] 2.738632 y2 seems to have larger standard deviation than y1

Data Let's look at the data Many potential ways to plot this. We'll look at three.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2)

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Define values for the x- and y-axes. Here we want the same for both so that it is easy to compare.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Define labels for the x- and y-axes.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Use filled circles as the plotting symbol (see ?pch for more details).

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Use rgb colour specifications to set fill colour (allows for transparency of symbols). First number indicates degree of red, second indicates degree of green, and third indicates degree of blue (on a scale from 0 to 1). The 4th number indicates how opaque the colour is (1 = fully opaque, 0 = fully transparent). See ?rgb for more details.

Data plot(twodata$y1, twodata$y2, ylim = c(-6, 12), xlim = c(-6, 12), xlab = "y1", ylab = "y2", pch = 16, col = rgb(0, 0, 1, 0.5)) abline(0, 1, lwd = 2) Add a line to the plot with an intercept of 0, a slope of 1, and a thickness of 2.

Data y2 mostly larger than y1; y2 more spread out than y1

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n")

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Create variables with histogram data for each data set.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Plot them, specifying the rgb parameters and scale of the x-axis. Note the add = TRUE argument to indicate that the second histogram should be plotted in the same frame as the first one.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Add a legend to the plot, and place it in the upper-right corner.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") The text to be included in the legend.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") What shape to use for legend symbols (15 is square). See ?pch for more details.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") What colours to use for each symbol (in order!).

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n") Don't draw a box around the legend.

Data h1 <- hist(twodata$y1) h2 <- hist(twodata$y2) plot(h1, col = rgb(0,0,1,0.25), xlim = c(-4, 15), main = "", xlab = "") plot(h2, col = rgb(1,0,0,0.25), xlim = c(-4, 15), add = TRUE) legend("topright", legend = c("y1", "y2"), pch = 15, col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)), bty = "n")

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25)))

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Combine the two data sets into one long vector.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Create a vector of labels for the values in the first group (y1). Will label first group as 1, so this vector will have 1 repeated for each value in the y1 group.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Create a vector of labels for the values in the second group (y2). Will label second group as 2, so this vector will have 2 repeated for each value in the y2 group.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Combine these into one long vector. This vector will be as long as our data vector, but contain a label indicating which group each value is from.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Draw a box plot of the data values, grouped by the groups values.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Specify how to label the groups in the plot.

Data data <- c(twodata$y1, twodata$y2) ny1 <- rep(1, length(twodata$y1)) ny2 <- rep(2, length(twodata$y2)) groups <- c(ny1, ny2) boxplot(data ~ groups, names = c("y1", "y2"), col = c(rgb(0,0,1,0.25), rgb(1,0,0,0.25))) Specify what colours to use for each group.

Data [Box plot of y1 and y2]

Data Median

Data 50% of values (1st and 3rd quartile)

Data Remaining values up to 1.5X the inter-quartile range (difference between 1st and 3rd quartile; for normal data this puts the whisker ends roughly 2.7 standard deviations from the mean)

Data Outliers - values falling outside 1.5X the inter-quartile range

Data Which plotting method (if any) is most informative for data like this? [The scatter plot, overlaid histograms, and box plots shown side by side]
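The box-plot quantities just described are simple to compute directly. A minimal sketch (in Python rather than the slides' R, with made-up values) of the quartile, IQR, whisker, and outlier arithmetic:

```python
import numpy as np

def box_stats(x):
    """Quartiles, IQR, whisker limits, and outliers, as drawn by a box plot."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1                                  # inter-quartile range
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr        # whisker limits
    outliers = [v for v in x if v < lo or v > hi]  # points past the whiskers
    return med, iqr, outliers

vals = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]             # 20 sits far past the whisker
med, iqr, out = box_stats(vals)
print(med, iqr, out)
```

Any point farther than 1.5 IQR beyond a quartile is plotted individually as an outlier, exactly as in R's boxplot().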

Frequentist Approach

Frequentist Approach t-test t.test(twodata$y1, twodata$y2) Welch Two Sample t-test data: twodata$y1 and twodata$y2 t = -2.9247, df = 24.423, p-value = 0.007335 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -3.2687078 -0.5654922 sample estimates: mean of x mean of y 0.2524 2.1695
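For readers outside R, the same Welch test is available in Python's scipy; a sketch with hypothetical two-group data (not the twodata.csv values):

```python
from scipy import stats

# Hypothetical data: group b clearly shifted above group a
a = [0.2, -0.5, 1.1, 0.7, -0.1, 0.4]
b = [2.0, 1.5, 3.2, 2.8, 1.1, 4.0]

# equal_var=False requests Welch's t-test, which does not
# assume equal variances in the two groups
t, p = stats.ttest_ind(a, b, equal_var=False)
print(t, p)
```

The t statistic is negative because the first group's mean is below the second's, matching the sign convention in the R output above.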

Frequentist Approach Wilcoxon rank sum test The t-test assumes: data normally distributed; variances equal. The Wilcoxon rank sum test is a non-parametric alternative.

Frequentist Approach Wilcoxon rank sum test wilcox.test(twodata$y1, twodata$y2) Wilcoxon rank sum test data: twodata$y1 and twodata$y2 W = 87, p-value = 0.001767 alternative hypothesis: true location shift is not equal to 0
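The rank sum test also has a direct Python counterpart; a sketch with hypothetical data (the Mann-Whitney U test is the same procedure as the two-sample Wilcoxon rank sum test):

```python
from scipy import stats

# Hypothetical data; every value in b exceeds every value in a
a = [0.2, -0.5, 1.1, 0.7, -0.1, 0.4]
b = [2.0, 1.5, 3.2, 2.8, 1.2, 4.0]

# U counts how many (a, b) pairs have a > b; complete separation gives U = 0
u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
print(u, p)
```

Because the test uses only ranks, it is insensitive to the skew and outliers visible in y2.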

Bayesian Approach

Standardize the Data y1 <- twodata$y1 y1mean <- mean(y1) y1sd <- sd(y1) zy1 <- (y1 - y1mean) / y1sd N1 <- length(zy1) y2 <- twodata$y2 y2mean <- mean(y2) y2sd <- sd(y2) zy2 <- (y2 - y2mean) / y2sd N2 <- length(zy2)
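The same z-scoring, sketched in Python with numpy and hypothetical values; ddof=1 gives the sample standard deviation, matching R's sd():

```python
import numpy as np

y1 = np.array([-2.5, -0.3, 0.2, 0.8, 2.6, 0.7])   # hypothetical group values
y1mean, y1sd = y1.mean(), y1.std(ddof=1)
zy1 = (y1 - y1mean) / y1sd                         # centre, then scale by sample sd
N1 = len(zy1)
```

After standardizing, each group has mean 0 and sd 1, which is what lets the model use generic priors like Normal(0, 10).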

Specify the model Can just do two of our original model (simultaneously) [Model diagram: y1i ~ Normal(mu1, tau1 = 1/sigma1^2) and y2i ~ Normal(mu2, tau2 = 1/sigma2^2), with each mu ~ Normal(0, 10) and each sigma ~ Gamma(1.1, 0.11)]

Specify the model modelstring = "model { # Likelihood for (i in 1:N1) { zy1[i] ~ dnorm(mu1, tau1) } for (j in 1:N2) { zy2[j] ~ dnorm(mu2, tau2) } # Priors mu1 ~ dnorm(0, (1 / 10^2)) mu2 ~ dnorm(0, (1 / 10^2)) sigma1 ~ dgamma(1.1, 0.11) sigma2 ~ dgamma(1.1, 0.11) tau1 <- 1 / sigma1^2 tau2 <- 1 / sigma2^2 }" writeLines(modelstring, con = "model.txt")

Prepare Data for JAGS Specify as a list for JAGS datalist = list ( zy1 = zy1, zy2 = zy2, N1 = N1, N2 = N2 )

Specify Initial Values initslist <- function() { list( mu1 = rnorm(n = 1, mean = 0, sd = 10), mu2 = rnorm(n = 1, mean = 0, sd = 10), sigma1 = rgamma(n = 1, shape = 1.1, rate = 0.11), sigma2 = rgamma(n = 1, shape = 1.1, rate = 0.11) ) }

Specify MCMC Parameters and Run library(runjags) runjagsout <- run.jags( method = "simple", model = "model.txt", monitor = c("mu1", "mu2", "sigma1", "sigma2"), data = datalist, inits = initslist, n.chains = 3, adapt = 500, burnin = 1000, sample = 20000, thin = 1, summarise = TRUE, plots = FALSE)
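run.jags hides the sampling machinery. To show the idea, here is a minimal random-walk Metropolis sketch in Python (numpy, simulated data) for one group's mu and sigma under the same priors as the JAGS model; JAGS uses more sophisticated samplers, so this is illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=30)            # simulated data for one group

def log_post(mu, sigma):
    """Unnormalized log posterior: normal likelihood + the model's priors."""
    if sigma <= 0:
        return -np.inf
    loglik = -len(y) * np.log(sigma) - np.sum((y - mu) ** 2) / (2 * sigma ** 2)
    # mu ~ Normal(0, sd = 10); sigma ~ Gamma(shape = 1.1, rate = 0.11)
    logprior = -mu ** 2 / (2 * 10 ** 2) + (1.1 - 1) * np.log(sigma) - 0.11 * sigma
    return loglik + logprior

mu, sigma = 0.0, 1.0
chain = []
for _ in range(20000):
    mu_p = mu + rng.normal(0, 0.3)           # random-walk proposals
    sig_p = sigma + rng.normal(0, 0.3)
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_post(mu_p, sig_p) - log_post(mu, sigma):
        mu, sigma = mu_p, sig_p
    chain.append((mu, sigma))

burned = np.array(chain[5000:])              # discard burn-in, keep the rest
```

With enough iterations the retained draws approximate the joint posterior of (mu, sigma), which is exactly what the coda samples retrieved below contain.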

Evaluate Performance of the Model

Testing Model Performance Retrieve the data and take a peek at the structure codasamples = as.mcmc.list(runjagsout) head(codasamples[[1]]) Markov Chain Monte Carlo (MCMC) output: Start = 1501 End = 1507 Thinning interval = 1 mu1 mu2 sigma1 sigma2 1501 -0.0255491 -0.27213500 0.856140 0.941199 1502 -0.2445680 0.16608800 1.096810 0.906884 1503 0.5478500 0.05227700 1.521910 1.132970 1504 0.1006070 -0.09025910 1.085520 1.093390 1505 0.2066000 -0.05893500 0.993417 1.194550 1506 0.1937230 0.00712917 0.805682 0.894510 1507 -0.2859440 -0.72554900 1.316050 0.988003

Testing Model Performance Trace plots par(mfrow = c(2,2)) traceplot(codasamples)

Testing Model Performance Autocorrelation plots autocorr.plot(codasamples[[1]]) [Autocorrelation vs. lag (0-35) for mu1, mu2, sigma1, and sigma2; all drop to near zero after the first few lags]

Testing Model Performance Gelman & Rubin diagnostic gelman.diag(codasamples) Potential scale reduction factors: Point est. Upper C.I. mu1 1 1 mu2 1 1 sigma1 1 1 sigma2 1 1 Multivariate psrf 1
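The Gelman & Rubin statistic compares within-chain and between-chain variance; values near 1 indicate the chains agree. A sketch of the basic (non-split) calculation in Python with numpy:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for m equal-length chains."""
    chains = np.asarray(chains)                    # shape (m, n)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * W + B / n              # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 5000))                 # three well-mixed chains
stuck = mixed + np.array([[0.0], [5.0], [10.0]])   # chains stuck at different levels
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

coda's gelman.diag refines this (degrees-of-freedom correction, upper confidence limit), but the psrf of 1 in the output above reflects this same within-vs-between comparison.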

Testing Model Performance Effective size effectiveSize(codasamples) mu1 mu2 sigma1 sigma2 58617.40 59942.37 25714.49 25945.56
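Effective size discounts the nominal number of draws by the chain's autocorrelation, roughly N / (1 + 2 × sum of autocorrelations). A rough Python sketch with a simple positive-lag cutoff (coda's estimator is more refined):

```python
import numpy as np

def eff_size(x, max_lag=200):
    """Rough effective sample size via a truncated autocorrelation sum."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n, denom = len(x), np.dot(x, x)
    s = 0.0
    for k in range(1, max_lag):
        rho = np.dot(x[:-k], x[k:]) / denom        # lag-k autocorrelation
        if rho < 0.05:                             # stop once correlation dies out
            break
        s += rho
    return n / (1 + 2 * s)

rng = np.random.default_rng(1)
iid = rng.normal(size=10000)                       # independent draws
ar = np.zeros(10000)                               # strongly autocorrelated chain
for t in range(1, 10000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
print(eff_size(iid), eff_size(ar))
```

This is why the sigma parameters above, which mix a little more slowly, show smaller effective sizes than the mu parameters despite the same 60,000 total draws.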

Viewing Results

Parsing Data Convert codasamples to a matrix Will concatenate chains into one long one mcmcchain = as.matrix(codasamples)

Parsing Data Separate out data for each parameter zmu1 <- mcmcchain[, "mu1"] zmu2 <- mcmcchain[, "mu2"] zsigma1 <- mcmcchain[, "sigma1"] zsigma2 <- mcmcchain[, "sigma2"]

Convert Back to Original Scale mu1 <- (zmu1 * y1sd) + y1mean mu2 <- (zmu2 * y2sd) + y2mean sigma1 <- zsigma1 * y1sd sigma2 <- zsigma2 * y2sd
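Unstandardizing is just the inverse of the z-score transform: multiply by the original sd and add back the original mean. A quick numpy check with made-up values:

```python
import numpy as np

y = np.array([-2.5, -0.3, 0.2, 0.8, 2.6, 0.7])    # hypothetical original data
m, s = y.mean(), y.std(ddof=1)
z = (y - m) / s                                    # standardize
back = z * s + m                                   # invert: recovers y exactly
print(np.allclose(back, y))
```

Because the transform is linear, applying it draw-by-draw to the posterior samples of zmu and zsigma gives valid posterior samples on the original measurement scale.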

Plot Posterior Distributions Means par(mfrow=c(1, 2)) histinfo = plotpost(mu1, xlab = bquote(mu[1])) histinfo = plotpost(mu2, xlab = bquote(mu[2])) [Posterior histograms: mu1 mean = 0.25306, 95% HDI (-0.25464, 0.74952); mu2 mean = 2.1706, 95% HDI (0.87422, 3.521)]

Plot Posterior Distributions Means Can work directly with posterior distributions!!! diffmu <- mu1 - mu2 par(mfrow = c(1,1)) histinfo = plotpost(diffmu, xlab = bquote(mu[1] - mu[2])) [Posterior histogram: mean = -1.9175, 95% HDI (-3.3758, -0.53741)]

Plot Posterior Distributions Standard deviation par(mfrow = c(1,2)) histinfo = plotpost(sigma1, xlab = bquote(sigma[1]), showmode = TRUE) histinfo = plotpost(sigma2, xlab = bquote(sigma[2]), showmode = TRUE) [Posterior histograms: sigma1 mode = 1.0465, 95% HDI (0.77833, 1.52); sigma2 mode = 2.7656, 95% HDI (2.028, 3.9841)]

Plot Posterior Distributions Standard deviation diffsigma <- sigma1 - sigma2 par(mfrow = c(1,1)) histinfo = plotpost(diffsigma, xlab = bquote(sigma[1] - sigma[2]), showmode = TRUE) [Posterior histogram: mode = -1.6664, 95% HDI (-2.968, -0.81001)]

Plot Posterior Distributions Effect size The difference in means, standardized by the variance Provides information on how big of an effect there is, considering the amount of variation. Should generally range from about -1 to 1

Plot Posterior Distributions Effect size esize <- (mu1 - mu2) / (sqrt((sigma1^2 + sigma2^2) / 2)) histinfo = plotpost(esize, xlab = bquote((mu[1] - mu[2]) / sqrt((sigma[1]^2 + sigma[2]^2)/2)), cex.lab = 0.9) [Posterior histogram: mean = -0.88047, 95% HDI (-1.5679, -0.22249)]
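The effect-size formula itself, sketched in Python with the approximate posterior summaries from the slides plugged in as single numbers; in the Bayesian analysis the formula is applied to every posterior draw, giving a whole distribution of effect sizes rather than one value:

```python
import numpy as np

mu1, mu2 = 0.25, 2.17          # approximate posterior means from the slides
sigma1, sigma2 = 1.05, 2.77    # approximate posterior modes for the sds

# Difference in means, scaled by the root-mean-square of the two sds
d = (mu1 - mu2) / np.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2)
print(d)
```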

Recap Think of the wealth of information we've obtained [All posteriors side by side, with the y1/y2 box plots: mu1 mean = 0.25169, 95% HDI (-0.24305, 0.73682); sigma1 mode = 1.022, 95% HDI (0.76329, 1.4671); mu2 mean = 2.1678, 95% HDI (0.89754, 3.4578); sigma2 mode = 2.6328, 95% HDI (2.0018, 3.8461); mu1 - mu2 mean = -1.9162, 95% HDI (-3.2919, -0.55531); sigma1 - sigma2 mode = -1.6522, 95% HDI (-2.8347, -0.7986); effect size mean = -0.90438, 95% HDI (-1.5819, -0.24047)]

Recap Think of the wealth of information we've obtained The goal of analyses should not be one value and a yes/no decision; it should be to obtain information about the data so that you can evaluate the credibility of different hypotheses

Revision of the Goals of Bayesian Analysis

Bayesian Analysis Taken almost verbatim from Gelman et al. (2014)* A practical method for making inferences from data using probability models for quantities we observe and for quantities about which we wish to learn Explicit use of probability for quantifying uncertainty in inferences based on statistical data analysis * Gelman et al. (2014) Bayesian Data Analysis. CRC Press.

Bayesian Analysis Three main steps 1. Setting up a full probability model - a joint probability distribution for all observable and unobservable quantities in a problem. The model should be consistent with knowledge about the underlying scientific problem and the data collection process

Bayesian Analysis Three main steps 2. Condition on observed data - calculating and interpreting the appropriate posterior distribution - the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data

Bayesian Analysis Three main steps 3. Evaluating the fit of the model and the implications - How well does the model fit the data? Are the conclusions reasonable? How sensitive are the results to the modelling assumptions in step 1?

Bayesian Analysis Emphasis on Do the inferences make sense? Are the model's predictions consistent with the data?

Bayesian Analysis Emphasis on Do the inferences make sense? Are the model's predictions consistent with the data? Not: Is the model true? What is Pr(model is true)? Can we reject the model?

Bayesian Analysis Emphasis on Describing the data, and the factors influencing the data, in an explicit and probabilistic manner Making interpretations of these factors based on the analyses

How Well Does Our Model Fit The Data? Posterior Predictive Check

Assessing Model Fit y1 Plot data Choose some values from the posterior and plot over data

Assessing Model Fit y1 histinfo = hist(y1, xlab = "y1", main = "", col = "skyblue", prob = TRUE) [Density histogram of y1]

Assessing Model Fit y1 Get range of values from observed distribution plot y1lims = range(histinfo$breaks) y1lims [1] -3 3

Assessing Model Fit y1 Get range of values from observed distribution plot y1lims = range(histinfo$breaks) y1lims [1] -3 3 Create a sequence of 500 values within this range y1sample = seq(from = y1lims[1], to = y1lims[2], length = 500)

Assessing Model Fit y1 Get length of posterior chainlength1 = length(mu1)

Assessing Model Fit y1 Get length of posterior chainlength1 = length(mu1) Get 20 values from this range (we'll draw 20 lines) y1new = floor(seq(from = 1, to = chainlength1, length = 20))

Assessing Model Fit y1 Loop through list and plot associated lines for (i in y1new) { lines(y1sample, dnorm(y1sample, mean = mu1[i], sd = sigma1[i]), col = "gray47") } [Histogram of y1 with 20 posterior-predictive density curves overlaid]

Assessing Model Fit y2 histinfo = hist(y2, xlab = "y2", main = "", col = "skyblue", prob = TRUE) [Density histogram of y2]

Assessing Model Fit y2 Get range of values from observed distribution plot y2lims = range(histinfo$breaks) y2lims [1] -2 12

Assessing Model Fit y2 Get range of values from observed distribution plot y2lims = range(histinfo$breaks) y2lims [1] -2 12 Create a sequence of 500 values within this range y2sample <- seq(from = y2lims[1], to = y2lims[2], length = 500)

Assessing Model Fit y2 Get length of posterior chainlength2 = length(mu2)

Assessing Model Fit y2 Get length of posterior chainlength2 = length(mu2) Get 20 values from this range (we'll draw 20 lines) y2new = floor(seq(from = 1, to = chainlength2, length = 20))

Assessing Model Fit y2 Loop through list and plot associated lines for (i in y2new) { lines(y2sample, dnorm(y2sample, mean = mu2[i], sd = sigma2[i]), col = "gray47") } [Histogram of y2 with 20 posterior-predictive density curves overlaid]
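Each overlaid line is just a normal density evaluated on a grid of points for one posterior draw of (mu, sigma). A Python sketch with hypothetical draws:

```python
import numpy as np
from scipy import stats

xs = np.linspace(-3, 3, 500)                       # grid spanning the histogram
draws = [(0.2, 1.0), (0.3, 0.9), (0.1, 1.1)]       # hypothetical posterior draws
# One predictive density curve per posterior draw of (mu, sigma)
curves = [stats.norm.pdf(xs, mu, sd) for mu, sd in draws]
# In the slides, each curve is drawn over the data histogram with lines()
```

If the spread of these curves brackets the data histogram, the model's predictions are consistent with the data; systematic mismatch (as for y2's long right tail) signals poor fit.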

Were Priors Appropriate?

Assessing Priors Mean (mu) Make a list containing the range of values over which to evaluate performance Mean should be 0, with sd = 1, so a range from -2 to 2 should work par(mfrow = c(1, 2)) # To plot data for both mu1 and mu2 together mupriorlist <- seq(from = -2, to = 2, length = 500)

Assessing Priors Mean (mu) Then, generate priors using model parameters mu1prior <- dnorm(mupriorlist, mean = 0, sd = 10) mu2prior <- dnorm(mupriorlist, mean = 0, sd = 10)

Assessing Priors Mean (mu) Get the distribution of the posterior using the density function mu1post <- density(zmu1) mu2post <- density(zmu2)

Assessing Priors Mean (mu) Get ranges for data mu1high <- ceiling(max(mu1post$y)) mu2high <- ceiling(max(mu2post$y))

Assessing Priors Mean (mu) Plot data for mu1 plot(mupriorlist, mu1prior, ylim = c(0, mu1high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "zmu1") lines(mu1post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), bty = "n")

Assessing Priors Mean (mu) Plot data for mu2 plot(mupriorlist, mu2prior, ylim = c(0, mu2high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "zmu2") lines(mu2post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), bty = "n")

Assessing Priors Mean (mu) [Prior (dashed) vs. posterior (solid) densities for zmu1 and zmu2 over possible values from -2 to 2; the flat prior is overwhelmed by the data]

Assessing Priors Standard deviation (sigma) Make a list containing the range of values over which to evaluate performance Mode should be 1, with sd = 10, so a range from 0 to 5 should work par(mfrow = c(1, 2)) # To plot data for both sigma1 and sigma2 together sigmapriorlist <- seq(from = 0, to = 5, length = 500)

Assessing Priors Standard deviation (sigma) Then, generate priors using model parameters sigma1prior <- dgamma(sigmapriorlist, shape = 1.1, rate = 0.11) sigma2prior <- dgamma(sigmapriorlist, shape = 1.1, rate = 0.11)

Assessing Priors Standard deviation (sigma) Get the distribution of the posterior using the density function sigma1post <- density(zsigma1) sigma2post <- density(zsigma2)

Assessing Priors Standard deviation (sigma) Get ranges for data sigma1high <- ceiling(max(sigma1post$y)) sigma2high <- ceiling(max(sigma2post$y))

Assessing Priors Standard deviation (sigma) Plot data for sigma1 plot(sigmapriorlist, sigma1prior, ylim = c(0, sigma1high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "sigma1") lines(sigma1post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), lwd = c(2, 2), bty = "n")

Assessing Priors Standard deviation (sigma) Plot data for sigma2 plot(sigmapriorlist, sigma2prior, ylim = c(0, sigma2high), type = "l", lty = 2, lwd = 2, xlab = "Possible Values", ylab = "Probability", main = "sigma2") lines(sigma2post, lwd = 2) legend("topleft", legend = c("Prior", "Posterior"), lty = c(2, 1), lwd = c(2, 2), bty = "n")

Assessing Priors Standard deviation (sigma) [Prior (dashed) vs. posterior (solid) densities for sigma1 and sigma2 over possible values from 0 to 5]

Re-evaluating Some Old Examples

Re-evaluating Old Examples Remember these? [Two simulated data sets: left, N = 10,000 per group with means differing by 0.1 (effect size = -0.032, p = 0.023); right, N = 10 per group with means differing by 4 (effect size = -0.42, p = 0.36)]

Re-evaluating Old Examples Remember these? What would you expect from Bayesian analyses?

Re-evaluating Old Examples [N = 10,000 example: mu1 mean = 0.019723, 95% HDI (-0.078613, 0.11718); mu2 mean = 0.17007, 95% HDI (0.072394, 0.26839)]

Re-evaluating Old Examples [N = 10,000 example: mu1 - mu2 mean = -0.15035, 95% HDI (-0.28776, -0.011461)]

Re-evaluating Old Examples [N = 10,000 example: effect size mean = -0.030098, 95% HDI (-0.057387, -0.0020709)]

Re-evaluating Old Examples [N = 10 example: mu1 mean = 0.49828, 95% HDI (-5.0558, 4.1695); mu2 mean = 2.0819, 95% HDI (-1.0358, 5.2717)]

Re-evaluating Old Examples [N = 10 example: mu1 - mu2 mean = -2.5802, 95% HDI (-8.1708, 2.9535)]

Re-evaluating Old Examples [N = 10 example: effect size mean = -0.43905, 95% HDI (-1.3398, 0.44554)]

Re-evaluating Old Examples [Frequency histogram of the mu1 - mu2 posterior samples]

Re-evaluating Old Examples [Frequency histogram of the mu1 - mu2 posterior samples, zoomed in]

Questions?

Homework!!

You guessed it: modify the model to use a t distribution instead of a normal

Creative Commons License Anyone is allowed to distribute, remix, tweak, and build upon this work, even commercially, as long as they credit me for the original creation. See the Creative Commons website for more information.