Linear models Analysis of Covariance


Esben Budtz-Jørgensen
April 22, 2008

Topics:
Confounding
Interactions
Parameterizations

Analysis of Covariance

Group comparisons can become biased if an important predictor of the response is distributed differently in the groups.

An unbiased analysis can be obtained from a multiple regression analysis with the group variable and the predictor as independent variables.

Examples:
Comparison of blood pressure levels in men and women when they are not equally obese.
Comparison of lung capacity in men and women when they are not of the same height.

Lung Capacity, TLC

32 patients are scheduled for heart/lung transplantation.
TLC (Total Lung Capacity) is determined by whole-body plethysmography.
Is there a difference in lung capacity between men and women?

OBS  SEX  AGE  HEIGHT   TLC
  1   F    35     149  3.40
  2   F    11     138  3.41
  3   M    12     148  3.80
...  ...  ...     ...   ...
 29   F    20     162  8.05
 30   M    25     180  8.10
 31   M    22     173  8.70
 32   M    25     171  9.45

[Figure: box plots of total lung capacity and of height, each by sex (female, male).]

TTEST PROCEDURE: marginal comparisons

Variable: TLC

SEX     N           Mean       Std Dev     Std Error
F      16     5.19812500    1.30082138    0.32520534
M      16     6.97687500    1.43801585    0.35950396

Variances          T      DF    Prob>|T|
Unequal      -3.6693    29.7      0.0009
Equal        -3.6693    30.0      0.0009

For H0: Variances are equal, F = 1.22   DF = (15,15)   Prob>F = 0.7028

Variable: HEIGHT

SEX     N           Mean       Std Dev     Std Error
F      16   160.81250000    9.36816417    2.34204104
M      16   174.06250000   10.66126165    2.66531541

Variances          T      DF    Prob>|T|
Unequal      -3.7344    29.5      0.0008
Equal        -3.7344    30.0      0.0008

For H0: Variances are equal, F = 1.30   DF = (15,15)   Prob>F = 0.6228

Clear difference between the sexes for both TLC and HEIGHT.
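The slides show only the output of the marginal comparisons, not the call that produced it. A minimal sketch of such a call is given here; the dataset name tlc_data is an assumption, while the variable names sex, tlc and height are taken from the data listing and output.

```sas
/* Hypothetical dataset name: tlc_data (not given in the slides). */
proc ttest data=tlc_data;
  class sex;        /* two-level grouping variable (F, M) */
  var tlc height;   /* one two-sample comparison per variable */
run;
```

Listing both variables in the VAR statement gives one block of output per variable, as in the listing above.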

Analysis of covariance

Comparison of parallel regression lines.

MODEL: $Y_{gi} = \alpha_g + \beta x_{gi} + \epsilon_{gi}$, $g = 1, 2$; $i = 1, \dots, n_g$

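What happens if we forget about x?

MODEL: $Y_{gi} = \alpha_g + \beta x_{gi} + \epsilon_{gi}$, $g = 1, 2$; $i = 1, \dots, n_g$

If $\bar{x}_1 \neq \bar{x}_2$, the difference in group means $(\bar{Y}_2 - \bar{Y}_1)$ is a biased estimate of the group effect $\alpha_2 - \alpha_1$.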
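The size of the bias follows directly from the model. Averaging within each group and taking expectations (assuming, as usual, that the error terms have mean zero) gives:

```latex
\begin{align*}
\bar{Y}_g &= \alpha_g + \beta \bar{x}_g + \bar{\epsilon}_g, \qquad g = 1, 2 \\
E(\bar{Y}_2 - \bar{Y}_1) &= (\alpha_2 - \alpha_1) + \beta(\bar{x}_2 - \bar{x}_1)
\end{align*}
```

The second term is the bias: it vanishes only if the covariate has no effect ($\beta = 0$) or the groups have the same covariate mean ($\bar{x}_1 = \bar{x}_2$).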

Interaction

The two lines can have different slopes. More general model:

$Y_{gi} = \alpha_g + \beta_g x_{gi} + \epsilon_{gi}$, $g = 1, 2$; $i = 1, \dots, n_g$

If $\beta_1 \neq \beta_2$, the two covariates interact:
The effect of height depends on sex.
The difference between males and females depends on height.
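To see why the group difference depends on the covariate under this model, compare the expected responses of the two groups at a common covariate value $x$:

```latex
\begin{align*}
E(Y \mid g = 2, x) - E(Y \mid g = 1, x)
  &= (\alpha_2 + \beta_2 x) - (\alpha_1 + \beta_1 x) \\
  &= (\alpha_2 - \alpha_1) + (\beta_2 - \beta_1)\, x
\end{align*}
```

With equal slopes the last term drops out and the difference is the constant $\alpha_2 - \alpha_1$, which is the parallel-lines model.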

[Figure: relationship between TLC and HEIGHT.]

[Figure: relationship between log-transformed TLC and HEIGHT.]
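The slides do not show how the log-transformed response was created. A minimal sketch follows, assuming a base-10 logarithm (the interpretation slide refers to a log 10 scale) and using the hypothetical dataset names tlc_data and tlc_log; the variable name ltlc matches the one used in the model statements.

```sas
/* Hypothetical dataset names: tlc_data (input), tlc_log (output). */
data tlc_log;
  set tlc_data;
  ltlc = log10(tlc);   /* base-10 logarithm of total lung capacity */
run;
```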

Model specification: model with interaction

    proc glm;
      class sex;                                       /* sex is a class (categorical) variable */
      model ltlc = sex height sex*height / solution;   /* SOLUTION prints the parameter estimates */
    run;

Or in SAS Analyst, under ANOVA / Linear models:
choose ltlc as the dependent variable;
choose height as a quantitative variable;
choose sex as a class variable;
under the Model button, insert the cross-term.

Output

Dependent Variable: LTLC

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               3    0.27230446    0.09076815     13.05   0.0001
Error              28    0.19478293    0.00695653
Corrected Total    31    0.46708739

R-Square        C.V.    Root MSE    LTLC Mean
0.582984    10.85524     0.08341      0.76835

Source             DF     Type I SS   Mean Square   F Value   Pr > F
SEX                 1    0.13626303    0.13626303     19.59   0.0001
HEIGHT              1    0.13451291    0.13451291     19.34   0.0001
HEIGHT*SEX          1    0.00152852    0.00152852      0.22   0.6429

Source             DF   Type III SS   Mean Square   F Value   Pr > F
SEX                 1    0.00210426    0.00210426      0.30   0.5867
HEIGHT              1    0.13597107    0.13597107     19.55   0.0001
HEIGHT*SEX          1    0.00152852    0.00152852      0.22   0.6429

                                  T for H0:               Std Error of
Parameter             Estimate    Parameter=0   Pr > |T|      Estimate
INTERCEPT        -.2190181620 B         -0.62     0.5391    0.35221658
SEX         F    -.2810587157 B         -0.55     0.5867    0.51102682
            M    0.0000000000 B             .          .             .
HEIGHT           0.0060473650 B          2.99     0.0057    0.00201996
HEIGHT*SEX  F    0.0014344422 B          0.47     0.6429    0.00306016
            M    0.0000000000 B             .          .             .

[Figure: relationship between log-transformed TLC and HEIGHT.]

Reduction of the model

The interaction term was excluded.

Dependent Variable: LTLC

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               2    0.27077594    0.13538797     20.00   0.0001
Error              29    0.19631145    0.00676936
Corrected Total    31    0.46708739

R-Square        C.V.    Root MSE    LTLC Mean
0.579712    10.70821     0.08228      0.76835

Source             DF     Type I SS   Mean Square   F Value   Pr > F
SEX                 1    0.13626303    0.13626303     20.13   0.0001
HEIGHT              1    0.13451291    0.13451291     19.87   0.0001

Source             DF   Type III SS   Mean Square   F Value   Pr > F
SEX                 1    0.00968023    0.00968023      1.43   0.2415
HEIGHT              1    0.13451291    0.13451291     19.87   0.0001

                                  T for H0:               Std Error of
Parameter             Estimate    Parameter=0   Pr > |T|      Estimate
INTERCEPT        -.3278068826 B         -1.25     0.2198    0.26135206
SEX         F    -.0421012632 B         -1.20     0.2415    0.03520676
            M    0.0000000000 B             .          .             .
HEIGHT           0.0066723630            4.46     0.0001    0.00149683

Note: Now the effect of sex has disappeared!

Interpretation

In this example we saw that the observed difference in ($\log_{10}$) lung function between females and males could be attributed to the difference in height.

An approximate 95% confidence interval for the $\log_{10}$-difference (male minus female) is 0.0421 ± 2 × 0.0352 = (-0.0283, 0.1125), corresponding to the interval (0.94, 1.30) for the ratio of lung capacities; i.e., at a given height, men may still have up to 30% higher lung capacity.

It is also possible that groups that appear equal in a marginal analysis (e.g. blood pressure in men and women) show a difference after adjustment for important covariates (such as obesity).

All variables with potential influence should be considered!
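The back-transformation from the log 10 scale to a ratio can be written out as a worked calculation, using the estimate 0.0421 and standard error 0.0352 from the reduced model (male minus female, i.e. the SEX F estimate with its sign reversed):

```latex
\begin{align*}
\text{lower limit: } & 10^{\,0.0421 - 2 \times 0.0352} = 10^{-0.0283} \approx 0.94 \\
\text{upper limit: } & 10^{\,0.0421 + 2 \times 0.0352} = 10^{\,0.1125} \approx 1.30 \\
\text{point estimate: } & 10^{\,0.0421} \approx 1.10
\end{align*}
```

Because the interval for the ratio contains 1, the height-adjusted data are compatible with no sex difference.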

Example: Blood pressure vs. obesity and sex

A marginal analysis indicates no difference in blood pressure level between males and females. However, once we adjust for the degree of obesity, a sex difference appears.

Model with interaction:

    proc glm;
      class sex;
      model lbp = lobese sex sex*lobese / solution;
    run;

Output

General Linear Models Procedure

Dependent Variable: LBP

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               3    0.05583810    0.01861270      6.30   0.0006
Error              98    0.28952497    0.00295434
Corrected Total   101    0.34536306

Source             DF     Type I SS   Mean Square   F Value   Pr > F
LOBESE              1    0.03809379    0.03809379     12.89   0.0005
SEX                 1    0.01597238    0.01597238      5.41   0.0221
LOBESE*SEX          1    0.00177193    0.00177193      0.60   0.4405

Source             DF   Type III SS   Mean Square   F Value   Pr > F
LOBESE              1    0.03920980    0.03920980     13.27   0.0004
SEX                 1    0.01252714    0.01252714      4.24   0.0421
LOBESE*SEX          1    0.00177193    0.00177193      0.60   0.4405

                                      T for H0:               Std Error of
Parameter                 Estimate    Parameter=0   Pr > |T|      Estimate
INTERCEPT              2.087171366 B       165.93     0.0001    0.01257865
SEX         female    -0.039290663 B        -2.06     0.0421    0.01908066
            male       0.000000000 B            .          .             .
LOBESE                 0.227981122 B         1.73     0.0863    0.13158758
LOBESE*SEX  female     0.123097524 B         0.77     0.4405    0.15894836
            male       0.000000000 B            .          .             .

Re-parametrization

    proc glm;
      class sex;
      model lbp = sex sex*lobese / noint solution;
    run;

General Linear Models Procedure

Dependent Variable: LBP

                              Sum of          Mean
Source              DF       Squares        Square    F Value   Pr > F
Model                4    449.803216    112.450804   38062.97   0.0001
Error               98      0.289525      0.002954
Uncorrected Total  102    450.092741
...

Source              DF   Type III SS   Mean Square    F Value   Pr > F
SEX                  2    141.530202     70.765101   23952.96   0.0001
LOBESE*SEX           2      0.054676      0.027338       9.25   0.0002

                                      T for H0:               Std Error of
Parameter                 Estimate    Parameter=0   Pr > |T|      Estimate
SEX         female     2.047880703         142.73     0.0001    0.01434744
            male       2.087171366         165.93     0.0001    0.01257865
LOBESE*SEX  female     0.351078645           3.94     0.0002    0.08915879
            male       0.227981122           1.73     0.0863    0.13158758

The model is the same; there are 2 different parameterizations:

1. model lbp = lobese sex sex*lobese
   An intercept for the reference group (sex=1)
   An intercept difference from sex=0 to sex=1
   An effect of lobese (slope) for the reference group
   A slope difference from sex=0 to sex=1

2. model lbp = sex sex*lobese / noint
   An intercept for each group (sex)
   A slope (lobese effect) for each group (sex)
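The correspondence between the two parameterizations can be checked against the two sets of estimates. In the output, male is the reference level (its SEX and LOBESE*SEX estimates are set to zero), so the male intercept and slope appear unchanged in both, and the female values are obtained by adding the differences:

```latex
\begin{align*}
\alpha_{\text{female}} &= 2.087171366 + (-0.039290663) = 2.047880703 \\
\beta_{\text{female}}  &= 0.227981122 + 0.123097524 = 0.351078646
\end{align*}
```

These agree with the SEX female and LOBESE*SEX female estimates from the NOINT fit (2.047880703 and 0.351078645) up to rounding in the last digit.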

Reduced model: no interaction (equal slopes)

    proc glm;
      class sex;
      model lbp = lobese sex / solution;
    run;

Reduced model, output

General Linear Models Procedure

Dependent Variable: LBP

                             Sum of          Mean
Source             DF       Squares        Square   F Value   Pr > F
Model               2    0.05406617    0.02703308      9.19   0.0002
Error              99    0.29129690    0.00294239
Corrected Total   101    0.34536306
...

Source             DF     Type I SS   Mean Square   F Value   Pr > F
SEX                 1    0.00116215    0.00116215      0.39   0.5311
LOBESE              1    0.05290402    0.05290402     17.98   0.0001

Source             DF   Type III SS   Mean Square   F Value   Pr > F
SEX                 1    0.01597238    0.01597238      5.43   0.0218
LOBESE              1    0.05290402    0.05290402     17.98   0.0001

                                      T for H0:               Std Error of
Parameter                 Estimate    Parameter=0   Pr > |T|      Estimate
INTERCEPT              2.081052655 B       213.05     0.0001    0.00976800
SEX         female    -0.027765105 B        -2.33     0.0218    0.01191694
            male       0.000000000 B            .          .             .
LOBESE                 0.312347032           4.24     0.0001    0.07366198

NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter B are biased, and are not unique estimators of the parameters.

Conclusion

The male level is 0.0278 higher than the female level (for a fixed level of obesity), but remember that this is on a $\log_{10}$ scale.

Confidence interval: 0.0278 ± 2 × 0.0119 = (0.0040, 0.0516)

Back-transformed: (1.009, 1.126), i.e. the male level is between about 1% and 12.6% above the female level.
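The interval above uses the rule of thumb "estimate ± 2 standard errors". As a sketch of an alternative, the CLPARM option on the MODEL statement asks PROC GLM for t-based confidence limits alongside the SOLUTION estimates, and a short DATA step can redo the back-transformation by hand. The dataset name bp is an assumption; the estimate and standard error are the rounded values quoted in the conclusion.

```sas
/* Hypothetical dataset name: bp (not given in the slides). */
proc glm data=bp;
  class sex;
  model lbp = lobese sex / solution clparm;   /* CLPARM adds confidence limits for the estimates */
run;

/* Back-transform the approximate interval from the log10 scale to a ratio. */
data _null_;
  est = 0.0278;                  /* male minus female, log10 scale */
  se  = 0.0119;                  /* standard error of the estimate */
  lower = 10**(est - 2*se);      /* lower limit of the ratio */
  upper = 10**(est + 2*se);      /* upper limit of the ratio */
  put lower=6.3 upper=6.3;       /* written to the SAS log */
run;
```

With 99 error degrees of freedom the t quantile is close to 2, so the CLPARM limits should differ only slightly from the hand calculation, which should reproduce the back-transformed interval quoted in the conclusion.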