REGRESSION MODELS ANOVA


RECAP
Continuous outcome?
- NO: Logistic regression and other methods
- YES: Linear regression
  - Examine main effects, considering predictors of interest and confounders
  - Test effect modification if scientifically relevant
  - Compute and plot residuals; assess influence
  - Do the assumptions appear reasonable?
    - NO: Modify approach
    - YES: REPORT

COMING UP NEXT: ANOVA, a special case of linear regression
What if the independent variables of interest are categorical? In this case, comparing the mean of the continuous outcome in the different categories may be of interest. This is what is called ANalysis Of VAriance. We will show that it is just a special case of linear regression.

ANOVA: a special case of linear regression
LINEAR REGRESSION
- One-way Analysis of Variance: one categorical POI (predictor of interest)
- Two-way Analysis of Variance: two categorical POIs
- Analysis of Covariance: one categorical POI + one continuous predictor
Uses dummy variables to represent categorical variables!

Outline
- Motivation: We will consider some examples of ANOVA and show that they are special cases of linear regression
- ANOVA as a regression model
  - Dummy variables
- One-way ANOVA models
  - Contrasts
  - Multiple comparisons
- Two-way ANOVA models
  - Interactions
- ANCOVA models

ANOVA/ANCOVA: Motivation
Let's investigate whether genetic factors are associated with cholesterol levels.
- Ideally, you would have a confirmatory analysis of scientific hypotheses formulated prior to data collection.
- Alternatively, you could consider an exploratory analysis: hypothesis generation for future studies.

ANOVA/ANCOVA: Motivation
Scientific hypotheses of interest:
- Assess the effect of rs174548 on cholesterol levels.
- Assess the effect of rs174548 and sex on cholesterol levels.
  - Does the effect of rs174548 on cholesterol differ between males and females?
- Assess the effect of rs174548 and age on cholesterol levels.
  - Does the effect of rs174548 on cholesterol differ depending on the subject's age?

ANOVA: One-Way Model
Motivation: Scientific question: Assess the effect of rs174548 on cholesterol levels.

Motivation: Example
Here are some descriptive summaries:
> tapply(chol, factor(rs174548), mean)
       0        1        2
181.0617 187.8639 186.5000
> tapply(chol, factor(rs174548), sd)
       0        1        2
21.13998 23.74541 17.38333

Motivation: Example
Another way of getting the same results:
> by(chol, factor(rs174548), mean)
factor(rs174548): 0
[1] 181.0617
-----------------------------------------------------------------
factor(rs174548): 1
[1] 187.8639
-----------------------------------------------------------------
factor(rs174548): 2
[1] 186.5
> by(chol, factor(rs174548), sd)
factor(rs174548): 0
[1] 21.13998
-----------------------------------------------------------------
factor(rs174548): 1
[1] 23.74541
-----------------------------------------------------------------
factor(rs174548): 2
[1] 17.38333

Motivation: Example
Is rs174548 associated with cholesterol?
[Figure: boxplots of cholesterol (vertical axis from 120 to 240) for rs174548 = 0, 1, 2.]
R command: boxplot(chol ~ factor(rs174548))

Motivation: Example
Another graphical display:
[Figure: design plot of the mean of chol (vertical axis from 181 to 188) for each level of as.factor(rs174548).]
R command: plot.design(chol ~ factor(rs174548))

Motivation: Example
Feature: How do the mean responses compare across different groups?
Categorical/qualitative predictor

REGRESSION MODELS: One-way ANOVA as a regression model

ANalysis Of VAriance Models (ANOVA)
Compares the means of several populations.
[Figure: curves for several populations; horizontal axis from -6 to 6, vertical axis from 0.0 to 0.8.]
Assumptions for the classical ANOVA framework:
- Independence
- Normality
- Equal variances


ANalysis Of VAriance Models (ANOVA)
Compares the means of several populations. Counter-intuitive name!

ANalysis Of VAriance Models (ANOVA)
In both data sets, the true population means are: 3 (A), 5 (B), 7 (C).
[Figure: two panels of observations by group (A, B, C). Situation 1: low variance within groups (vertical axis from 3 to 7). Situation 2: high variance within groups (vertical axis from -30 to 40).]
Where do you expect to detect a difference between population means?

ANalysis Of VAriance Models (ANOVA)
Compares the means of several populations. Counter-intuitive name!
Underlying concept: To assess whether the population means are equal, ANOVA compares:
- the variation between the sample means (MSR) to
- the natural variation of the observations within the samples (MSE).
The larger the MSR compared to the MSE, the more support there is for a difference in the population means!
The ratio MSR/MSE is the F-statistic.
We can make these comparisons with multiple linear regression: the different groups are represented with dummy variables.

ANOVA as a multiple regression model
Dummy Variables: Suppose you have a categorical variable C with k categories 0, 1, 2, ..., k-1. To represent that variable we can construct k-1 dummy variables of the form
$$x_j = \begin{cases} 1, & \text{if } C = j \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, \dots, k-1.$$
The omitted category (here category 0) is the reference group.

ANOVA as a multiple regression model
Dummy Variables: Back to our motivating example:
Predictor: rs174548 (coded 0=C/C, 1=C/G, 2=G/G)
Outcome (Y): cholesterol
Let's take C/C as the reference group.
$$x_1 = \begin{cases} 1, & \text{if code 1 (C/G)} \\ 0, & \text{otherwise} \end{cases} \qquad x_2 = \begin{cases} 1, & \text{if code 2 (G/G)} \\ 0, & \text{otherwise} \end{cases}$$

ANOVA as a multiple regression model
rs174548   Mean cholesterol   x_1   x_2
C/C        $\mu_0$            0     0
C/G        $\mu_1$            1     0
G/G        $\mu_2$            0     1

ANOVA as a multiple regression model
Regression with Dummy Variables: Example:
Model: $E[Y \mid x_1, x_2] = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Interpretation of model parameters?

ANOVA as a multiple regression model
Mean       Regression model
$\mu_0$    $\beta_0$
$\mu_1$    $\beta_0 + \beta_1$
$\mu_2$    $\beta_0 + \beta_2$

ANOVA as a multiple regression model
Regression with Dummy Variables: Example:
Model: $E[Y \mid x_1, x_2] = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Interpretation of model parameters?
- $\mu_0 = \beta_0$: mean cholesterol when rs174548 is C/C
- $\mu_1 = \beta_0 + \beta_1$: mean cholesterol when rs174548 is C/G
- $\mu_2 = \beta_0 + \beta_2$: mean cholesterol when rs174548 is G/G

ANOVA as a multiple regression model
Alternatively, in the same model $E[Y \mid x_1, x_2] = \beta_0 + \beta_1 x_1 + \beta_2 x_2$:
- $\beta_1$: difference in mean cholesterol levels between groups with rs174548 equal to C/G and C/C ($\mu_1 - \mu_0$).
- $\beta_2$: difference in mean cholesterol levels between groups with rs174548 equal to G/G and C/C ($\mu_2 - \mu_0$).
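The correspondence between regression coefficients and group means can be checked numerically. The following is a minimal sketch on simulated data (not the cholesterol data set from the slides); the names `genotype`, `y`, `fit` and `coef_from_means` are hypothetical and chosen only for illustration.

```r
# Simulated data; NOT the cholesterol data used in the slides
set.seed(1)
genotype = rep(c(0, 1, 2), times = c(50, 40, 10))   # hypothetical genotype codes
y = 180 + 5 * (genotype == 1) + 4 * (genotype == 2) +
  rnorm(length(genotype), sd = 20)

# Regression with R-generated dummy variables (reference group: code 0)
fit = lm(y ~ factor(genotype))

# Sample mean of the outcome in each group
group_means = tapply(y, genotype, mean)

# Intercept = mean of the reference group;
# slopes = differences between each other group and the reference group
coef_from_means = c(group_means[1],
                    group_means[2] - group_means[1],
                    group_means[3] - group_means[1])

print(cbind(lm = coef(fit), from_means = coef_from_means))
```

Under these assumptions the two columns should agree up to floating-point error, which is the statement $\beta_0 = \mu_0$, $\beta_1 = \mu_1 - \mu_0$, $\beta_2 = \mu_2 - \mu_0$ applied to the sample means.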

ANOVA: One-Way Model
Goal: Compare the means of K independent groups (defined by a categorical predictor).
Statistical hypotheses:
- (Global) null hypothesis: $H_0: \mu_0 = \mu_1 = \dots = \mu_{K-1}$ or, equivalently, $H_0: \beta_1 = \beta_2 = \dots = \beta_{K-1} = 0$
- Alternative hypothesis: $H_1$: not all means are equal
If the means of the groups are not all equal (i.e., you rejected the above $H_0$), determine which ones are different (multiple comparisons).

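Estimation and Inference
Global hypotheses: $H_0: \mu_1 = \mu_2 = \dots = \mu_K$ vs. $H_1$: not all means are equal; equivalently, $H_0: \beta_1 = \beta_2 = \dots = \beta_{K-1} = 0$.
Analysis of variance table ($y_{ij}$: observation j in group i; $\bar{y}_i$: mean of group i; $\bar{y}$: overall mean; n: total number of observations):
Source       df     SS                                               MS                  F
Regression   K-1    $SSR = \sum_{i,j} (\bar{y}_i - \bar{y})^2$       MSR = SSR/(K-1)     MSR/MSE
Residual     n-K    $SSE = \sum_{i,j} (y_{ij} - \bar{y}_i)^2$        MSE = SSE/(n-K)
Total        n-1    $SST = \sum_{i,j} (y_{ij} - \bar{y})^2$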
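The entries of the analysis of variance table can be computed directly from their definitions. Below is a minimal sketch on simulated data with true group means 3, 5 and 7; the variable names are hypothetical, and the comparison against `anova(lm(...))` is only a consistency check, not a description of how R computes the table internally.

```r
# Simulated data with three groups (true means 3, 5, 7)
set.seed(2)
g = factor(rep(c("A", "B", "C"), times = c(30, 25, 20)))
y = c(3, 5, 7)[as.integer(g)] + rnorm(length(g), sd = 2)

n = length(y)       # total number of observations
K = nlevels(g)      # number of groups
grand_mean = mean(y)
group_mean = ave(y, g)   # each observation replaced by its own group mean

# Sums of squares as defined in the table
SSR = sum((group_mean - grand_mean)^2)
SSE = sum((y - group_mean)^2)
SST = sum((y - grand_mean)^2)

# Mean squares, F-statistic, and upper-tail p-value
MSR = SSR / (K - 1)
MSE = SSE / (n - K)
F_stat = MSR / MSE
p_value = pf(F_stat, K - 1, n - K, lower.tail = FALSE)

# The same table from the regression fit
tab = anova(lm(y ~ g))
print(tab)
print(c(F_by_hand = F_stat, p_by_hand = p_value))
```

The decomposition SST = SSR + SSE holds exactly, and the hand-computed F-statistic should match the one reported by `anova`.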

ANOVA: One-Way Model
How to fit a one-way model as a regression problem?
- Need to use dummy variables
  - Create them on your own (can be tedious!)
  - Most software packages will do this for you
    - R creates dummy variables in the background as long as you state that you have a categorical variable (you may need to use factor())
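To see the dummy variables that R builds in the background, one can inspect the design matrix. This is a small sketch with a hypothetical vector `rs` of genotype codes; `model.matrix` is the base R function that returns the design matrix for a formula.

```r
# Hypothetical genotype codes, for illustration only
rs = c(0, 1, 2, 1, 0, 2)

# Design matrix R would use for lm(y ~ factor(rs)):
# an intercept column plus one dummy column per non-reference level
X = model.matrix(~ factor(rs))
print(X)

# The same dummy variables created by hand
dummy1 = 1 * (rs == 1)
dummy2 = 1 * (rs == 2)
```

The second and third columns of `X` coincide with `dummy1` and `dummy2`, so fitting with `factor(rs)` or with hand-made dummies gives the same model.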

ANOVA: One-Way Model
By hand, creating the dummy variables:
> dummy1 = 1*(rs174548==1)
> dummy2 = 1*(rs174548==2)
Fitting the ANOVA model:
> fit0 = lm(chol ~ dummy1 + dummy2)
> summary(fit0)
Call:
lm(formula = chol ~ dummy1 + dummy2)
Residuals:
      Min        1Q    Median        3Q       Max
-64.06167 -15.91338  -0.06167  14.93833  59.13605
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  181.062      1.455 124.411  < 2e-16 ***
dummy1         6.802      2.321   2.930  0.00358 **
dummy2         5.438      4.540   1.198  0.23167
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.93 on 397 degrees of freedom
Multiple R-squared: 0.0221, Adjusted R-squared: 0.01718
F-statistic: 4.487 on 2 and 397 DF, p-value: 0.01184
> anova(fit0)
Analysis of Variance Table
Response: chol
           Df Sum Sq Mean Sq F value   Pr(>F)
dummy1      1   3624    3624  7.5381 0.006315 **
dummy2      1    690     690  1.4350 0.231665
Residuals 397 190875     481
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA: One-Way Model
Better: Let R do it for you!
> fit1.1 = lm(chol ~ factor(rs174548))
> summary(fit1.1)
Call:
lm(formula = chol ~ factor(rs174548))
Residuals:
      Min        1Q    Median        3Q       Max
-64.06167 -15.91338  -0.06167  14.93833  59.13605
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        181.062      1.455 124.411  < 2e-16 ***
factor(rs174548)1    6.802      2.321   2.930  0.00358 **
factor(rs174548)2    5.438      4.540   1.198  0.23167
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.93 on 397 degrees of freedom
Multiple R-squared: 0.0221, Adjusted R-squared: 0.01718
F-statistic: 4.487 on 2 and 397 DF, p-value: 0.01184
> anova(fit1.1)
Analysis of Variance Table
Response: chol
                  Df Sum Sq Mean Sq F value  Pr(>F)
factor(rs174548)   2   4314    2157  4.4865 0.01184 *
Residuals        397 190875     481
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA: One-Way Model
Your turn! Compare the model fit results (fit0 & fit1.1). What do you conclude?


ANOVA: One-Way Model
The summaries of fit0 and fit1.1 (shown above) agree in every coefficient and in the overall F-test. The anova tables differ only in layout: anova(fit0) splits the 2 regression degrees of freedom into one row per dummy variable, whereas anova(fit1.1) reports them jointly. The overall p-value can be recovered either way:
> 1-pf(4.4865,2,397)
[1] 0.01183671
> 1-pf(((3624+690)/2)/481,2,397)
[1] 0.01186096
The small discrepancy between the two values comes from the rounding of the printed sums of squares.

ANOVA: One-Way Model
> fit1.1 = lm(chol ~ factor(rs174548))
> summary(fit1.1)
> anova(fit1.1)
(Output as shown above.)
Let's interpret the regression model results! What is the interpretation of the regression model coefficients?

ANOVA: One-Way Model
Interpretation of the coefficients of fit1.1:
- Estimated mean cholesterol for the C/C group: 181.062 mg/dl
- Estimated difference in mean cholesterol levels between the C/G and C/C groups: 6.802 mg/dl
- Estimated difference in mean cholesterol levels between the G/G and C/C groups: 5.438 mg/dl

ANOVA: One-Way Model
The overall F-test of fit1.1 (F = 4.487 on 2 and 397 DF) shows a significant p-value. We reject the null hypothesis that the mean cholesterol levels are the same across groups defined by rs174548 (p=0.01184).
This does not tell us which groups are different! (Need to perform multiple comparisons! More soon...)

ANOVA: One-Way Model
Alternative form (better if you will perform multiple comparisons):
> fit1.2 = lm(chol ~ -1 + factor(rs174548))
> summary(fit1.2)
Call:
lm(formula = chol ~ -1 + factor(rs174548))
Residuals:
      Min        1Q    Median        3Q       Max
-64.06167 -15.91338  -0.06167  14.93833  59.13605
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
factor(rs174548)0  181.062      1.455  124.41   <2e-16 ***
factor(rs174548)1  187.864      1.809  103.88   <2e-16 ***
factor(rs174548)2  186.500      4.300   43.37   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.93 on 397 degrees of freedom
Multiple R-squared: 0.9861, Adjusted R-squared: 0.986
F-statistic: 9383 on 3 and 397 DF, p-value: < 2.2e-16
> anova(fit1.2)
Analysis of Variance Table
Response: chol
                  Df   Sum Sq Mean Sq F value    Pr(>F)
factor(rs174548)   3 13534205 4511402  9383.2 < 2.2e-16 ***
Residuals        397   190875     481
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: without an intercept, each coefficient is a group mean, and R computes R-squared and the F-statistic relative to a model with mean zero. They test whether all group means are zero, so they are not comparable to the values reported for fit1.1.

ANOVA: One-Way Model
How about this one? How is rs174548 being treated now? Compare the model fit results from fit1.1 & fit2.
> fit2 = lm(chol ~ rs174548)
> summary(fit2)
Call:
lm(formula = chol ~ rs174548)
Residuals:
    Min      1Q  Median      3Q     Max
-64.575 -16.278  -0.575  15.120  60.722
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  181.575      1.411 128.723  < 2e-16 ***
rs174548       4.703      1.781   2.641  0.00858 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.95 on 398 degrees of freedom
Multiple R-squared: 0.01723, Adjusted R-squared: 0.01476
F-statistic: 6.977 on 1 and 398 DF, p-value: 0.008583
> anova(fit2)
Analysis of Variance Table
Response: chol
           Df Sum Sq Mean Sq F value   Pr(>F)
rs174548    1   3363    3363  6.9766 0.008583 **
Residuals 398 191827     482
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA: One-Way Model
> fit2 = lm(chol ~ rs174548)
> summary(fit2)
(Output as on the previous slide.)
Model: $E[Y \mid x] = \beta_0 + \beta_1 x$, where Y: cholesterol, x: rs174548
Interpretation of model parameters?
- $\beta_0$: mean cholesterol in the C/C group [estimate: 181.575 mg/dl]
- $\beta_1$: mean cholesterol difference between the C/G and C/C groups, or between the G/G and C/G groups [estimate: 4.703 mg/dl]
This model presumes that the differences between consecutive groups are the same (in this example, a linear dose effect of the allele). It is more restrictive than the ANOVA model!
Back to the ANOVA model...
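Because the linear dose model is a restricted version of the ANOVA model, the two can be compared with a nested-model F-test. The following is a sketch on simulated data, not the slides' data set; the names `rs`, `fit_lin`, `fit_fac` and `cmp` are hypothetical. With three groups the comparison uses 1 degree of freedom.

```r
# Simulated data; NOT the cholesterol data used in the slides
set.seed(3)
rs = rep(c(0, 1, 2), times = c(60, 45, 15))      # hypothetical genotype codes
y = 181 + c(0, 7, 5)[rs + 1] + rnorm(length(rs), sd = 22)

# Restricted model: rs enters as a number (equal steps between groups)
fit_lin = lm(y ~ rs)

# ANOVA model: rs enters as a factor (one free mean per group)
fit_fac = lm(y ~ factor(rs))

# F-test of the restriction "equal steps between consecutive groups"
cmp = anova(fit_lin, fit_fac)
print(cmp)
```

A small p-value in this table would suggest that the equal-step assumption is too restrictive; a large one means the data do not distinguish the two models.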

ANOVA: One-Way Model
> fit1.1 = lm(chol ~ factor(rs174548))
> summary(fit1.1)
> anova(fit1.1)
(Output as shown above.)
We rejected the null hypothesis that the mean cholesterol levels are the same across groups defined by rs174548 (p=0.01184).
What are the groups with differences in means? MULTIPLE COMPARISONS (coming up)

One-Way ANOVA allowing for unequal variances
We can also perform one-way ANOVA allowing for unequal variances:
> oneway.test(chol ~ factor(rs174548))
	One-way analysis of means (not assuming equal variances)
data:  chol and factor(rs174548)
F = 4.3258, num df = 2.000, denom df = 73.284, p-value = 0.01676
We reject the null hypothesis that the mean cholesterol levels are the same across groups defined by rs174548 (p=0.01676).
What are the groups with differences in means? MULTIPLE COMPARISONS (coming up)
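To relate `oneway.test` to the regression fit, it may help to run it with and without the equal-variance assumption. This is a sketch on simulated data with hypothetical names; `var.equal` is a documented argument of `oneway.test`, and its default (`FALSE`) gives the unequal-variance version shown above.

```r
# Simulated data with unequal group standard deviations
set.seed(6)
g = factor(rep(c("0", "1", "2"), times = c(40, 35, 15)))
y = 181 + c(0, 7, 5)[as.integer(g)] +
  rnorm(length(g), sd = c(21, 24, 17)[as.integer(g)])

# Classical one-way ANOVA (equal variances assumed)
classic = oneway.test(y ~ g, var.equal = TRUE)

# Unequal-variance version (the default)
welch = oneway.test(y ~ g)

# Overall F-statistic from the regression formulation
f_lm = anova(lm(y ~ g))[["F value"]][1]

print(c(classic = unname(classic$statistic),
        welch = unname(welch$statistic),
        lm = f_lm))
```

With `var.equal = TRUE` the statistic coincides with the overall F-test from `lm`; the default version adjusts both the statistic and the denominator degrees of freedom.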

One-Way ANOVA with robust standard errors
(gee() is provided by the gee package.)
> summary(gee(chol ~ factor(rs174548), id=seq(1,length(chol))))
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
      (Intercept) factor(rs174548)1 factor(rs174548)2
       181.061674          6.802272          5.438326
 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
 Link:                      Identity
 Variance to Mean Relation: Gaussian
 Correlation Structure:     Independent
Call:
gee(formula = chol ~ factor(rs174548), id = seq(1, length(chol)))
Summary of Residuals:
         Min           1Q       Median           3Q          Max
-64.06167401 -15.91337769  -0.06167401  14.93832599  59.13605442
Coefficients:
                    Estimate Naive S.E.    Naive z Robust S.E.   Robust z
(Intercept)       181.061674   1.455346 124.411431    1.400016 129.328297
factor(rs174548)1   6.802272   2.321365   2.930290    2.402005   2.831914
factor(rs174548)2   5.438326   4.539833   1.197913    3.624271   1.500530
Estimated Scale Parameter:  480.7932
Number of Iterations:  1

Kruskal-Wallis Test
Non-parametric analogue to the one-way ANOVA. Based on ranks.
In our example:
> kruskal.test(chol ~ factor(rs174548))
	Kruskal-Wallis rank sum test
data:  chol by factor(rs174548)
Kruskal-Wallis chi-squared = 7.4719, df = 2, p-value = 0.02385
Conclusion: Evidence that the cholesterol distribution is not the same across all groups.
With the global null rejected, you can also perform pairwise comparisons [Wilcoxon rank sum], but adjust for multiplicities!
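One way to carry out the adjusted pairwise Wilcoxon comparisons mentioned above is the base R function `pairwise.wilcox.test`. The sketch below uses simulated data and hypothetical variable names, and picks the Holm adjustment purely as an example; the slides do not prescribe a particular method here.

```r
# Simulated data; NOT the cholesterol data used in the slides
set.seed(5)
g = factor(rep(c("0", "1", "2"), times = c(40, 35, 15)))
y = 181 + c(0, 7, 5)[as.integer(g)] + rnorm(length(g), sd = 22)

# Global rank-based test across the three groups
kw = kruskal.test(y ~ g)
print(kw$p.value)

# All pairwise Wilcoxon rank sum tests, p-values adjusted by Holm's method
pw = pairwise.wilcox.test(y, g, p.adjust.method = "holm")
print(pw$p.value)
```

The result is a lower-triangular matrix of adjusted p-values, one entry per pair of groups.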

REGRESSION METHODS: MULTIPLE COMPARISONS

ANOVA: One-Way Model
What are the groups with differences in means?
MULTIPLE COMPARISONS:
- Pairwise comparisons: $\mu_0 = \mu_1$? $\mu_0 = \mu_2$? $\mu_1 = \mu_2$?
- Non-pairwise comparison: $(\mu_1 + \mu_2)/2 = \mu_0$?

Multiple Comparisons: Family-wise error rates
Illustrating the multiple comparison problem:
- Truth: all null hypotheses are true
- Tests: pairwise comparisons, each at the 5% level
What is the probability of rejecting at least one? (The formula below treats the C tests as independent.)
#groups K                                    2      3      4      5      6      7      8      9      10
#pairwise comparisons C = K(K-1)/2           1      3      6      10     15     21     28     36     45
P(at least one sig) = 1-(1-0.05)^C           0.05   0.143  0.265  0.401  0.537  0.659  0.762  0.842  0.901
That is, if you have three groups and make pairwise comparisons, each at the 5% level, your family-wise error rate (probability of making at least one false rejection) is over 14%!
Need to address this issue! Several methods!!!
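The table above can be reproduced in a few lines. This is a minimal sketch of the same formula, under the same simplifying assumption that the C tests are independent; the variable names are hypothetical.

```r
# Number of groups and number of pairwise comparisons
K = 2:10
C = choose(K, 2)            # same as K * (K - 1) / 2

# Probability of at least one false rejection when each test has level 0.05
# (assumes the C tests are independent)
alpha = 0.05
fwer = 1 - (1 - alpha)^C

print(data.frame(groups = K, comparisons = C, fwer = round(fwer, 3)))
```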

Multiple Comparisons
Several methods (all available in R):
- None (no adjustment)
- Bonferroni
- Holm
- Hochberg
- Hommel
- BH
- BY
- FDR

Multiple Comparisons
Bonferroni adjustment: for C tests performed, use level α/C (or multiply the p-values by C).
- Simple
- Conservative
- Must decide on the number of tests beforehand
- Widely applicable
- Can be done without software!

Multiple Comparisons
FDR (False Discovery Rate)
- Less conservative procedure for multiple comparisons
- Among rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses (that is, type I errors).

Multiple Comparisons
This option considers all pairwise comparisons:
> ## call library for multiple comparisons
> library(multcomp)
>
> ## fit model
> fit1 = lm(chol ~ -1 + factor(rs174548))
>
> ## all pairwise comparisons
> ## -- first, define matrix of contrasts
> M = contrMat(table(rs174548), type="Tukey")
> M
	 Multiple Comparisons of Means: Tukey Contrasts
       0  1 2
1 - 0 -1  1 0
2 - 0 -1  0 1
2 - 1  0 -1 1
>
> ## -- second, obtain estimates for multiple comparisons
> ## (glht stands for general linear hypothesis testing)
> mc = glht(fit1, linfct = M)

Multiple Comparisons
> ## -- third, adjust the p-values (or not) for multiple comparisons
> summary(mc, test=adjusted("none"))
	 Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = chol ~ -1 + factor(rs174548))
Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
1 - 0 == 0    6.802      2.321   2.930  0.00358 **
2 - 0 == 0    5.438      4.540   1.198  0.23167
2 - 1 == 0   -1.364      4.665  -0.292  0.77015
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- none method)

Multiple Comparisons
> summary(mc, test=adjusted("bonferroni"))
	 Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = chol ~ -1 + factor(rs174548))
Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
1 - 0 == 0    6.802      2.321   2.930   0.0107 *
2 - 0 == 0    5.438      4.540   1.198   0.6950
2 - 1 == 0   -1.364      4.665  -0.292   1.0000
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- bonferroni method)

Multiple Comparisons
> summary(mc, test=adjusted("fdr"))
	 Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lm(formula = chol ~ -1 + factor(rs174548))
Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
1 - 0 == 0    6.802      2.321   2.930   0.0107 *
2 - 0 == 0    5.438      4.540   1.198   0.3475
2 - 1 == 0   -1.364      4.665  -0.292   0.7702
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- fdr method)

Multiple Comparisons
What about using other adjustment methods? For example, we used:
> summary(mc, test=adjusted("bonferroni"))
(all pairwise comparisons, with Bonferroni adjustment)
> summary(mc, test=adjusted("fdr"))
(all pairwise comparisons, with FDR adjustment)
Other options are (method names are case-sensitive):
summary(mc, test=adjusted("holm"))
summary(mc, test=adjusted("hochberg"))
summary(mc, test=adjusted("hommel"))
summary(mc, test=adjusted("BH"))
summary(mc, test=adjusted("BY"))
Results, in this particular example, are basically the same, but they don't need to be! Different criteria could lead to different results!
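The adjustments can also be applied to a vector of raw p-values with the base R function `p.adjust`, without multcomp. The sketch below starts from the three unadjusted p-values as printed (rounded) on the slides, so results can differ from the multcomp output in the last digit; the names `p_raw`, `methods` and `adj` are hypothetical.

```r
# Unadjusted p-values of the three pairwise comparisons, as printed above
p_raw = c("1 - 0" = 0.00358, "2 - 0" = 0.23167, "2 - 1" = 0.77015)

# Adjustment methods accepted by p.adjust ("fdr" is an alias for "BH")
methods = c("none", "bonferroni", "holm", "hochberg", "hommel", "BH", "BY")

# One column of adjusted p-values per method
adj = sapply(methods, function(m) p.adjust(p_raw, method = m))
print(round(adj, 4))
```

For example, Bonferroni multiplies each p-value by 3 and caps the result at 1, which gives about 0.0107, 0.6950 and 1 for the three comparisons.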

Summary: Relationships
GOAL: Comparison of means across K groups
One-way ANOVA:
  $H_0: \mu_0 = \mu_1 = \dots = \mu_{K-1}$
  $H_1$: not all means are equal
Multiple Regression:
  Model: $E[Y \mid \text{groups}] = \beta_0 + \beta_1\,\text{group}_2 + \dots + \beta_{K-1}\,\text{group}_K$, where group 1 is the reference group
  $H_0: \beta_1 = \beta_2 = \dots = \beta_{K-1} = 0$
  $H_1$: not all $\beta_i$ are equal to zero
Rejected $H_0$?
  YES: Multiple comparisons (control $\alpha$ overall), e.g. Bonferroni: $\alpha$ / #comparisons

REGRESSION METHODS: Two-way ANOVA models

ANOVA: Two-Way Model
Motivation:
Scientific question: Assess the effect of rs174548 and sex on cholesterol levels.

ANOVA: Two-Way Model
Factors: A and B
Goals:
  Test for main effect of A
  Test for main effect of B
  Test for interaction effect of A and B

ANOVA: Two-Way Model
To simplify discussion, assume that factor A has three levels, while factor B has two levels.

Cell means (columns: levels of factor A; rows: levels of factor B):
        A1           A2           A3
B1      $\mu_{11}$   $\mu_{21}$   $\mu_{31}$
B2      $\mu_{12}$   $\mu_{22}$   $\mu_{32}$

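ANOVA: Two-Way Model
[Figure: two panels of cell means plotted against the levels of A (A1, A2, A3), with one line for B1 and one for B2. First panel: parallel lines = no interaction. Second panel: lines are not parallel = interaction.]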

ANOVA: Two-Way Model
Recall:
  Categorical variables can be represented with dummy variables.
  Interactions are represented with cross-products.
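To make the dummy-variable and cross-product coding concrete, here is a minimal sketch using base R's model.matrix() on a hypothetical 3 x 2 layout. The factor names A and B and their level labels are illustrative only, not variables from the cholesterol data.

```r
# Hypothetical 3 x 2 design: one row per combination of A (3 levels) and B (2 levels).
cells <- expand.grid(A = factor(c("A1", "A2", "A3")),
                     B = factor(c("B1", "B2")))

# Main effects only: dummy variables for A2, A3 and B2 (A1 and B1 are the references).
X_main <- model.matrix(~ A + B, data = cells)

# Main effects plus interaction: adds the cross-products A2*B2 and A3*B2.
X_int <- model.matrix(~ A * B, data = cells)

print(X_main)
print(X_int)
```

Each interaction column in X_int is the elementwise product of the corresponding dummy columns, which is what "cross-product" means here.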

ANOVA: Two-Way Model
Model 1: $E[Y \mid A_2, A_3, B_2] = \beta_0 + \beta_1 A_2 + \beta_2 A_3 + \beta_3 B_2$
What are the means in each combination-group?

        A1                              A2                                        A3
B1      $\mu_{11} = \beta_0$            $\mu_{21} = \beta_0 + \beta_1$            $\mu_{31} = \beta_0 + \beta_2$
B2      $\mu_{12} = \beta_0 + \beta_3$  $\mu_{22} = \beta_0 + \beta_1 + \beta_3$  $\mu_{32} = \beta_0 + \beta_2 + \beta_3$

ANOVA: Two-Way Model
Model 1: $E[Y \mid A_2, A_3, B_2] = \beta_0 + \beta_1 A_2 + \beta_2 A_3 + \beta_3 B_2$

        A1                              A2                                        A3
B1      $\mu_{11} = \beta_0$            $\mu_{21} = \beta_0 + \beta_1$            $\mu_{31} = \beta_0 + \beta_2$
B2      $\mu_{12} = \beta_0 + \beta_3$  $\mu_{22} = \beta_0 + \beta_1 + \beta_3$  $\mu_{32} = \beta_0 + \beta_2 + \beta_3$

Model with no interaction:
  The difference in means between groups defined by factor B does not depend on the level of factor A.
  The difference in means between groups defined by factor A does not depend on the level of factor B.

ANOVA: Two-Way Model
Model 2: $E[Y \mid A_2, A_3, B_2] = \beta_0 + \beta_1 A_2 + \beta_2 A_3 + \beta_3 B_2 + \beta_4 A_2 B_2 + \beta_5 A_3 B_2$
What are the means in each combination-group?

        A1                              A2                                                  A3
B1      $\mu_{11} = \beta_0$            $\mu_{21} = \beta_0 + \beta_1$                      $\mu_{31} = \beta_0 + \beta_2$
B2      $\mu_{12} = \beta_0 + \beta_3$  $\mu_{22} = \beta_0 + \beta_1 + \beta_3 + \beta_4$  $\mu_{32} = \beta_0 + \beta_2 + \beta_3 + \beta_5$
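One way to read the interaction coefficients off this table is to take the B2 minus B1 difference within each level of A. The short derivation below uses only the Model 2 cell means given above.

```latex
\begin{aligned}
\mu_{12} - \mu_{11} &= \beta_3 \\
\mu_{22} - \mu_{21} &= \beta_3 + \beta_4 \\
\mu_{32} - \mu_{31} &= \beta_3 + \beta_5 \\[4pt]
\beta_4 &= (\mu_{22} - \mu_{21}) - (\mu_{12} - \mu_{11}) \\
\beta_5 &= (\mu_{32} - \mu_{31}) - (\mu_{12} - \mu_{11})
\end{aligned}
```

So the effect of B is the same at every level of A exactly when $\beta_4 = \beta_5 = 0$, in which case Model 2 reduces to Model 1.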

ANOVA: Two-Way Model
Three (possible) tests:
  Interaction of A and B (may want to start here). Rejection would imply that the differences between means of A depend on the level of B (and vice versa), so stop.
  Main effect of A: test only if there is no interaction.
  Main effect of B: test only if there is no interaction.
[Note: If you have one observation per cell, you cannot test interaction!]

ANOVA: Two-Way Model
Model without interaction: $E[Y \mid A_2, A_3, B_2] = \beta_0 + \beta_1 A_2 + \beta_2 A_3 + \beta_3 B_2$
How do we test for main effect of factor A?
  $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1$: $\beta_1$ or $\beta_2$ not zero
How do we test for main effect of factor B?
  $H_0: \beta_3 = 0$ vs. $H_1$: $\beta_3$ not zero

ANOVA: Two-Way Model
Model with interaction: $E[Y \mid A_2, A_3, B_2] = \beta_0 + \beta_1 A_2 + \beta_2 A_3 + \beta_3 B_2 + \beta_4 A_2 B_2 + \beta_5 A_3 B_2$
How do we test for interactions?
  $H_0: \beta_4 = \beta_5 = 0$ vs. $H_1$: $\beta_4$ or $\beta_5$ not zero
IMPORTANT: If you reject the null, do not test main effects!
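The interaction test can be sketched as a comparison of two nested lm() fits. The example below runs on simulated data, not the cholesterol data: the design, the cell means, the noise level and all object names (dat, fit_add, fit_int) are assumptions made for illustration.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical balanced design: factor A (3 levels) x factor B (2 levels), 20 obs per cell.
dat <- expand.grid(rep = 1:20,
                   A = factor(c("A1", "A2", "A3")),
                   B = factor(c("B1", "B2")))

# Assumed true cell means include an interaction: the effect of B2 is larger when A = A2.
mu <- 100 + 5 * (dat$A == "A2") + 3 * (dat$B == "B2") +
      10 * (dat$A == "A2" & dat$B == "B2")
dat$y <- rnorm(nrow(dat), mean = mu, sd = 10)

fit_add <- lm(y ~ A + B, data = dat)   # Model 1: main effects only
fit_int <- lm(y ~ A * B, data = dat)   # Model 2: adds the two cross-product terms

# Partial F-test of H0: beta4 = beta5 = 0 (no interaction).
tab <- anova(fit_add, fit_int)
print(tab)

# The same F statistic computed by hand from the residual sums of squares.
rss_add <- deviance(fit_add)
rss_int <- deviance(fit_int)
F_manual <- ((rss_add - rss_int) / 2) / (rss_int / df.residual(fit_int))
```

The test has 2 numerator degrees of freedom because two cross-product terms are dropped when going from Model 2 to Model 1.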

ANOVA: Two-Way Model (without interaction)
> fit1 = lm(chol ~ factor(sex) + factor(rs174548))
> summary(fit1)

Call:
lm(formula = chol ~ factor(sex) + factor(rs174548))

Residuals:
     Min       1Q   Median       3Q      Max
-66.6534 -14.4633  -0.6008  15.4450  57.6350

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        175.365      1.786  98.208  < 2e-16 ***
factor(sex)1        11.053      2.126   5.199 3.22e-07 ***
factor(rs174548)1    7.236      2.250   3.215  0.00141 **
factor(rs174548)2    5.184      4.398   1.179  0.23928
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.24 on 396 degrees of freedom
Multiple R-squared: 0.08458, Adjusted R-squared: 0.07764
F-statistic: 12.2 on 3 and 396 DF, p-value: 1.196e-07

> anova(fit0,fit1)
Analysis of Variance Table

Model 1: chol ~ factor(sex)
Model 2: chol ~ factor(sex) + factor(rs174548)
  Res.Df    RSS Df Sum of Sq     F   Pr(>F)
1    398 183480
2    396 178681  2    4799.1 5.318 0.005259 **

ANOVA: Two-Way Model (without interaction)
Interpretation of results (output of summary(fit1) and anova(fit0,fit1) as on the previous slide):
  Estimated mean cholesterol for male C/C group: 175.37 mg/dl
  Estimated difference in mean cholesterol levels between females and males, adjusted for genotype: 11.053 mg/dl
  Estimated difference in mean cholesterol levels between C/G and C/C groups, adjusted for sex: 7.236 mg/dl
  Estimated difference in mean cholesterol levels between G/G and C/C groups, adjusted for sex: 5.184 mg/dl
There is evidence that cholesterol is associated with sex (p < 0.001).
There is evidence that cholesterol is associated with genotype (p = 0.005, from the comparison of Model 1: chol ~ factor(sex) with Model 2: chol ~ factor(sex) + factor(rs174548)).

ANOVA: Two-Way Model (without interaction)
In words:
Adjusting for sex, the difference in mean cholesterol comparing C/G to C/C is 7.236 mg/dl, and comparing G/G to C/C it is 5.184 mg/dl.
These differences do not depend on sex (this is because the model does not have an interaction between sex and genotype!).

ANOVA: Two-Way Model (with interaction)
> fit2 = lm(chol ~ factor(sex) * factor(rs174548))
> summary(fit2)

Call:
lm(formula = chol ~ factor(sex) * factor(rs174548))

Residuals:
     Min       1Q   Median       3Q      Max
-70.5286 -13.6037  -0.9736  14.1709  54.8818

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                    178.1182     2.0089  88.666  < 2e-16 ***
factor(sex)1                     5.7109     2.7982   2.041  0.04192 *
factor(rs174548)1                0.9597     3.1306   0.307  0.75933
factor(rs174548)2               -0.2015     6.4053  -0.031  0.97492
factor(sex)1:factor(rs174548)1  12.7398     4.4650   2.853  0.00456 **
factor(sex)1:factor(rs174548)2  10.2296     8.7482   1.169  0.24297
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.07 on 394 degrees of freedom
Multiple R-squared: 0.1039, Adjusted R-squared: 0.09257
F-statistic: 9.14 on 5 and 394 DF, p-value: 3.062e-08

ANOVA: Model comparison
> anova(fit1,fit2)
Analysis of Variance Table

Model 1: chol ~ factor(sex) + factor(rs174548)
Model 2: chol ~ factor(sex) * factor(rs174548)
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1    396 178681
2    394 174902  2      3779 4.2564 0.01483 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
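The F statistic in this table can be recomputed by hand from the two residual sums of squares. The sketch below uses only the rounded numbers printed in the table (the variable names are illustrative), so it matches the reported F and p-value up to rounding.

```r
# Residual sums of squares and residual degrees of freedom, as reported in the table.
rss_reduced <- 178681; df_reduced <- 396   # chol ~ factor(sex) + factor(rs174548)
rss_full    <- 174902; df_full    <- 394   # chol ~ factor(sex) * factor(rs174548)

# Numerator df = number of interaction terms being tested.
df_num <- df_reduced - df_full

# Partial F statistic: drop in RSS per tested term, relative to the full-model error variance.
F_stat <- ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)
p_val  <- pf(F_stat, df_num, df_full, lower.tail = FALSE)

cat(sprintf("F = %.4f on %d and %d df, p = %.5f\n", F_stat, df_num, df_full, p_val))
```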

ANOVA: Two-Way Model (with interaction)
Interpretation of results (output of summary(fit2) and anova(fit1,fit2) as on the previous slides):
  Estimated mean cholesterol for male C/C group: 178.12 mg/dl
  Estimated mean cholesterol for female C/C group: (178.12 + 5.7109) mg/dl
  Estimated mean cholesterol for male C/G group: (178.12 + 0.9597) mg/dl
  Estimated mean cholesterol for female C/G group: (178.12 + 5.7109 + 0.9597 + 12.7398) mg/dl
There is evidence for an interaction between sex and genotype (p = 0.015, from the comparison of Model 1: chol ~ factor(sex) + factor(rs174548) with Model 2: chol ~ factor(sex) * factor(rs174548)).
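The same bookkeeping can be done for all six sex-by-genotype cells at once. This sketch types in the coefficient estimates reported by summary(fit2) and multiplies them by a design matrix; the coding sex 0 = male, 1 = female and genotype 0 = C/C, 1 = C/G, 2 = G/G is the one implied by the interpretation above, and the object names are illustrative.

```r
# Coefficient estimates copied from summary(fit2), in the order R reports them:
# intercept, sex1, genotype1, genotype2, sex1:genotype1, sex1:genotype2.
beta <- c(178.1182, 5.7109, 0.9597, -0.2015, 12.7398, 10.2296)

# All six sex x genotype combinations (sex varies fastest).
cells <- expand.grid(sex = factor(0:1), rs174548 = factor(0:2))

# Design matrix with the same column order as the fitted model.
X <- model.matrix(~ sex * rs174548, data = cells)

# Estimated mean cholesterol (mg/dl) in each cell.
cells$mean_chol <- drop(X %*% beta)
print(cells)

# Genotype effect (C/G vs C/C) within each sex.
cg_vs_cc_male   <- beta[3]
cg_vs_cc_female <- beta[3] + beta[5]
```

Because the model includes an interaction, the C/G versus C/C difference is not a single number: it is estimated separately for males and for females.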

SUMMARY: Two-Way ANOVA
Significant interaction?
  NO: Interpret main effects of factor A and factor B.
  YES: Interpret the effect of factor A on mean response for each level of factor B (or the effect of factor B on mean response for each level of factor A).

REGRESSION METHODS: ANCOVA (aka ANACOVA)

ANalysis of COVAriance Models (ANCOVA)
Motivation:
Scientific question: Assess the effect of rs174548 on cholesterol levels, adjusting for age.

ANalysis of COVAriance Models (ANCOVA)
  ANOVA with one or more continuous variables.
  Equivalent to regression with dummy variables and continuous variables.
  The primary comparison of interest is across k groups defined by a categorical variable, but the k groups may differ on some other potential predictor or confounder variables [also called covariates].

ANalysis of COVAriance Models (ANCOVA)
To facilitate discussion, assume
  Y: continuous response (e.g. cholesterol)
  X: continuous variable (e.g. age)
  Z: dummy variable (e.g. indicator of C/G or G/G versus C/C)
Model: $Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + \varepsilon$, where $\beta_3 XZ$ is the interaction term
Note that:
  $Z = 0 \Rightarrow E[Y \mid X, Z = 0] = \beta_0 + \beta_1 X$
  $Z = 1 \Rightarrow E[Y \mid X, Z = 1] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X$
This model allows for different intercepts/slopes for each group.

ANCOVA
Testing coincident lines: $H_0: \beta_2 = 0,\ \beta_3 = 0$
  Compares the overall model with the reduced model $Y = \beta_0 + \beta_1 X + \varepsilon$
Testing parallelism: $H_0: \beta_3 = 0$
  Compares the overall model with the reduced model $Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon$
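Both tests are comparisons of nested models, which can be sketched with anova() on three lm() fits. The example below uses simulated data with a single 0/1 group indicator, as in the model above; the sample size, coefficients and object names are assumptions for illustration and are unrelated to the cholesterol data.

```r
set.seed(2)  # simulated data, for illustration only

n <- 200
x <- runif(n, 30, 80)            # hypothetical continuous covariate (e.g. age)
z <- rbinom(n, 1, 0.4)           # hypothetical group indicator (0/1)
# Assumed truth: different intercepts (beta2 = 8) but a common slope (beta3 = 0).
y <- 160 + 0.3 * x + 8 * z + rnorm(n, sd = 20)
dat <- data.frame(y, x, z)

fit_one  <- lm(y ~ x,     data = dat)   # one line for both groups
fit_par  <- lm(y ~ x + z, data = dat)   # parallel lines
fit_full <- lm(y ~ x * z, data = dat)   # separate intercepts and slopes

# Coincident lines, H0: beta2 = beta3 = 0 (2 numerator df).
test_coincident <- anova(fit_one, fit_full)
# Parallel lines, H0: beta3 = 0 (1 numerator df).
test_parallel <- anova(fit_par, fit_full)

print(test_coincident)
print(test_parallel)
```

With one dummy variable the parallelism test has a single numerator degree of freedom, so its F statistic equals the square of the t statistic for the x:z coefficient in fit_full. With three genotype groups, as in the cholesterol example, the same comparisons have 4 and 2 numerator degrees of freedom.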

ANCOVA
> fit0 = lm(chol ~ factor(rs174548))
> summary(fit0)

Call:
lm(formula = chol ~ factor(rs174548))

Residuals:
      Min        1Q    Median        3Q       Max
-64.06167 -15.91338  -0.06167  14.93833  59.13605

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        181.062      1.455 124.411  < 2e-16 ***
factor(rs174548)1    6.802      2.321   2.930  0.00358 **
factor(rs174548)2    5.438      4.540   1.198  0.23167
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.93 on 397 degrees of freedom
Multiple R-squared: 0.0221, Adjusted R-squared: 0.01718
F-statistic: 4.487 on 2 and 397 DF, p-value: 0.01184

> anova(fit0)
Analysis of Variance Table

Response: chol
                  Df Sum Sq Mean Sq F value  Pr(>F)
factor(rs174548)   2   4314    2157  4.4865 0.01184 *
Residuals        397 190875     481
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA
> fit1 = lm(chol ~ factor(rs174548) + age)
> summary(fit1)

Call:
lm(formula = chol ~ factor(rs174548) + age)

Residuals:
     Min       1Q   Median       3Q      Max
-57.2089 -14.4293   0.4443  14.2652  55.8985

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       163.28125    4.36422  37.414  < 2e-16 ***
factor(rs174548)1   7.30137    2.27457   3.210  0.00144 **
factor(rs174548)2   5.08431    4.44331   1.144  0.25321
age                 0.32140    0.07457   4.310 2.06e-05 ***

Residual standard error: 21.46 on 396 degrees of freedom
Multiple R-squared: 0.06592, Adjusted R-squared: 0.05884
F-statistic: 9.316 on 3 and 396 DF, p-value: 5.778e-06

> anova(fit0,fit1)
Analysis of Variance Table

Model 1: chol ~ factor(rs174548)
Model 2: chol ~ factor(rs174548) + age
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    397 190875
2    396 182322  1    8552.9 18.577 2.062e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA
[Figure: total cholesterol (mg/dl) versus age (years), by genotype (C/C, C/G, G/G).]

ANCOVA
> fit2 = lm(chol ~ factor(rs174548) * age)
> summary(fit2)

Call:
lm(formula = chol ~ factor(rs174548) * age)

Residuals:
     Min       1Q   Median       3Q      Max
-57.5425 -14.3002   0.7131  14.2138  55.7089

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)           164.14677    5.79545  28.323  < 2e-16 ***
factor(rs174548)1       3.42799    8.79946   0.390  0.69707
factor(rs174548)2      16.53004   18.28067   0.904  0.36642
age                     0.30576    0.10154   3.011  0.00277 **
factor(rs174548)1:age   0.07159    0.15617   0.458  0.64692
factor(rs174548)2:age  -0.20255    0.31488  -0.643  0.52043

Residual standard error: 21.49 on 394 degrees of freedom
Multiple R-squared: 0.06777, Adjusted R-squared: 0.05594
F-statistic: 5.729 on 5 and 394 DF, p-value: 4.065e-05

ANCOVA: Test of coincident lines
> fit0 = lm(chol ~ age)
> summary(fit0)

Call:
lm(formula = chol ~ age)

Residuals:
    Min      1Q  Median      3Q     Max
-60.453 -14.643  -0.022  14.659  58.995

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 166.90168    4.26488  39.134  < 2e-16 ***
age           0.31033    0.07524   4.125 4.52e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.69 on 398 degrees of freedom
Multiple R-squared: 0.04099, Adjusted R-squared: 0.03858
F-statistic: 17.01 on 1 and 398 DF, p-value: 4.522e-05

> anova(fit0,fit2)
Analysis of Variance Table

Model 1: chol ~ age
Model 2: chol ~ factor(rs174548) * age
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1    398 187187
2    394 181961  4    5226.6 2.8293 0.02455 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA: Test of parallel lines
> anova(fit1,fit2)
Analysis of Variance Table

Model 1: chol ~ factor(rs174548) + age
Model 2: chol ~ factor(rs174548) * age
  Res.Df    RSS Df Sum of Sq     F Pr(>F)
1    396 182322
2    394 181961  2    361.11 0.391 0.6767

ANCOVA
[Figure: total cholesterol (mg/dl) versus age (years), by genotype (C/C, C/G, G/G).]

ANCOVA
In summary:
If the slopes are not equal, then age is an effect modifier:
  $E[Y \mid x, z] = \beta_0 + \beta_1 x + \beta_2 (CG) + \beta_3 (GG) + \beta_4 (x \cdot CG) + \beta_5 (x \cdot GG)$
If the slopes are the same:
  $E[Y \mid x, z] = \beta_0 + \beta_1 x + \beta_2 (CG) + \beta_3 (GG)$

ANCOVA
If the slopes are the same,
  $E[Y \mid x, z] = \beta_0 + \beta_1 x + \beta_2 (CG) + \beta_3 (GG)$
then one can obtain adjusted means for the three genotypes using the mean age over all groups, $\bar{x}$.
For example, the adjusted means for the three groups would be
  $\bar{Y}_1(\text{adj}) = \hat\beta_0 + \hat\beta_1 \bar{x}$
  $\bar{Y}_2(\text{adj}) = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 \bar{x}$
  $\bar{Y}_3(\text{adj}) = (\hat\beta_0 + \hat\beta_3) + \hat\beta_1 \bar{x}$

ANCOVA
> ## mean cholesterol for different genotypes adjusted by age
> predict(fit1, newdata=data.frame(age=mean(age),rs174548=0))
       1
180.9013
> predict(fit1, newdata=data.frame(age=mean(age),rs174548=1))
       1
188.2026
> predict(fit1, newdata=data.frame(age=mean(age),rs174548=2))
       1
185.9856
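These predictions follow directly from the adjusted-mean formulas. The sketch below reproduces them from the coefficients of the parallel-lines fit (chol ~ factor(rs174548) + age) reported earlier. The mean age is not printed anywhere in the slides, so it is back-calculated here from the C/C prediction; treat that value as an inferred quantity, and the variable names as illustrative.

```r
# Coefficient estimates copied from summary(fit1) for chol ~ factor(rs174548) + age.
b0    <- 163.28125   # intercept (C/C at age 0)
b_cg  <- 7.30137     # C/G vs C/C
b_gg  <- 5.08431     # G/G vs C/C
b_age <- 0.32140     # common slope for age

# Adjusted mean for C/C as returned by predict() above.
adj_cc <- 180.9013

# Mean age implied by adj_cc = b0 + b_age * mean(age);
# back-calculated, not reported in the slides.
mean_age <- (adj_cc - b0) / b_age

# Adjusted means: each group's intercept plus the common slope times the overall mean age.
adj_means <- c("C/C" = b0 + b_age * mean_age,
               "C/G" = b0 + b_cg + b_age * mean_age,
               "G/G" = b0 + b_gg + b_age * mean_age)
print(round(adj_means, 3))
```

Because the slopes are assumed equal, the adjusted means differ from one another by exactly the genotype coefficients, whatever age they are evaluated at.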

SUMMARY: ANCOVA
Significant interaction? (slopes are different?)
  YES: Interpret the difference in means of the response for given values of the continuous variable.
  NO: Control for potential confounder?
    YES: Compute adjusted means at the common X mean.

Summary
We have considered:
  ANOVA and ANCOVA
  Interpretation
  Estimation
  Interaction
  Multiple comparisons