6. Multiple regression - PROC GLM


1 Use of SAS - November Multiple regression - PROC GLM Karl Bang Christensen Department of Biostatistics, University of Copenhagen. kach@biostat.ku.dk, tel:

2 Contents
Analysis of covariance (ANCOVA): the general linear model
Interaction
Multiple regression
Automatic variable selection

3 Data example: lung capacity
Data from 32 patients who underwent a heart/lung transplantation. TLC (Total Lung Capacity) is determined from whole-body plethysmography. Are men and women different with respect to total lung capacity?
(Data listing with variables OBS, SEX, AGE, HEIGHT and TLC; the numeric values are not reproduced in this transcription.)

4 Box plots for comparison of sex groups

PROC GPLOT DATA=TLCdata;
  PLOT tlc*sex / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM;
  AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;

5 Box plots for comparison of sex groups (figure not reproduced in this transcription)

6 Group comparisons using t-tests

PROC TTEST DATA=tlc;
  CLASS sex;
  VAR tlc height;
RUN;

Note: we can specify more than one variable in the VAR statement.

7 Output

T-Tests
Variable  Method         Variances  DF  t Value  Pr > |t|
TLC       Pooled         Equal
TLC       Satterthwaite  Unequal
Height    Pooled         Equal
Height    Satterthwaite  Unequal

Equality of Variances
Variable  Method    Num DF  Den DF  F Value  Pr > F
TLC       Folded F
Height    Folded F

(Numeric values not reproduced in this transcription.)
Obvious sex difference for TLC as well as for Height.

8 Confounding when comparing groups
Confounding occurs if the distributions of some other relevant explanatory variables differ between the groups. Here "relevant" means things we would have liked to be the same (or at least very similar) for everybody, because we think of them as noise or distortion. Confounding can be reduced by performing a regression analysis with the relevant variables as covariates.
Confounding could be a problem in the current example if we intend to compare the lung function of men and women of similar height.

9 Relation between tlc and height:

PROC GPLOT DATA=TLCdata;
  PLOT tlc*height=sex / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=4) VALUE=(H=3) MINOR=NONE;
  AXIS2 LABEL=(A=90 H=4) VALUE=(H=3) ORDER=(3 TO 10) MINOR=NONE;
  SYMBOL1 C=RED V=DOT H=2 I=SM75S L=1 W=3 MODE=INCLUDE;
  SYMBOL2 C=BLUE V=CIRCLE H=2 I=SM75S L=41 W=3 MODE=INCLUDE;
  LEGEND1 LABEL=(H=2.5) VALUE=(H=2 JUSTIFY=LEFT);
RUN; QUIT;

10 Relation between tlc and height (figure not reproduced; plotted using I=RL)

11 Analysis of covariance: comparison of parallel regression lines

Model: y_gi = α_g + β x_gi + ε_gi,   g = 1, 2;  i = 1, ..., n_g

Here α_2 − α_1 is the expected difference in the response between the two groups for a fixed value of the covariate, that is, when comparing any two subjects who have the same value of (match on) the covariate x ("adjusted for x").

12 But what if the lines are not parallel?

More general model: y_gi = α_g + β_g x_gi + ε_gi

If β_1 ≠ β_2 there is an interaction between Height and Sex.

13 Interaction
An interaction between Height and Sex means that the effect of height depends on sex or, equivalently, that the difference between men and women depends on height.

14 Model with interaction

Group variables: PROC REG only works for quantitative covariates entering linearly. Group variables can be handled directly in PROC GLM by specifying the group variable as a CLASS variable.

SAS code:
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;

The option SOLUTION is needed if we want to see the regression parameter estimates.
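For comparison, a minimal sketch of how the same interaction model could be handled in PROC REG by constructing the dummy and product variables by hand; the variable names female and femht are introduced here for illustration only:

DATA TLCreg;
  SET TLCdata;
  female = (sex = 'F');       * 0/1 dummy variable for sex;
  femht  = female * height;   * product term carrying the interaction;
RUN;

PROC REG DATA=TLCreg;
  * Same model as the PROC GLM fit above, in the dummy-variable parameterization;
  MODEL tlc = female height femht / CLB;
RUN; QUIT;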

15 Output

The GLM Procedure

Class Level Information
Class  Levels  Values
Sex    2       F M

Number of Observations Read  32
Number of Observations Used  32

Dependent Variable: TLC
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model                                                      <.0001
Error
Corrected Total

R-Square  Coeff Var  Root MSE  TLC Mean

(Numeric values not reproduced in this transcription.)

16 Output II

Source      DF  Type I SS   Mean Square  F Value  Pr > F
Sex
Height
Height*Sex

Source      DF  Type III SS  Mean Square  F Value  Pr > F
Sex
Height
Height*Sex

(Numeric values not reproduced in this transcription.)

The interaction is not significant. The Type III p-values for the two main effects should never be used for anything in a model including the interaction!

17 Output III

Parameter       Estimate        Standard Error  t Value  Pr > |t|
Intercept       (estimate) B
Sex F           (estimate) B
Sex M           0 B             .               .        .
Height          (estimate) B
Height*Sex F    (estimate) B
Height*Sex M    0 B             .               .        .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter B are not uniquely estimable.

These are the regression parameters. (The numeric estimates are not reproduced in this transcription; the parameters for the reference level Sex='M' are set to zero in this parameterization.)

18 Where are the two lines in the output?

Line for males (the reference group): TLC = Intercept + Height slope · Height
Line for females: TLC = (Intercept + Sex F estimate) + (Height slope + Height*Sex F estimate) · Height

(The numeric values are largely lost in this transcription; only the term −1.727 in the female intercept survives.)
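If we do not want to add up the parameters by hand, the intercept and slope for the females can be requested directly with ESTIMATE statements. A minimal sketch, assuming the interaction model from slide 14 and that the CLASS levels are ordered F, M (so the coefficients after an effect name refer to F first, then M); the labels are arbitrary:

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc = sex height sex*height / SOLUTION;
  * Female intercept = overall intercept + Sex F effect;
  ESTIMATE 'Intercept, females' INTERCEPT 1 sex 1 0;
  * Female slope = Height effect + Height*Sex F effect;
  ESTIMATE 'Slope, females' height 1 sex*height 1 0;
RUN; QUIT;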

19 Same model, new parameterization

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;

Output:
Parameter      Estimate  Standard Error  t Value  Pr > |t|
Sex F
Sex M
Height*Sex F
Height*Sex M

(Numeric values not reproduced in this transcription.)

20 Same model, but with two different parameterizations

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;

Parameters:
The (extrapolated) level at Height=0 for the reference group (Sex = 'M')
The (extrapolated) difference between the two sexes at Height=0
The effect of Height (slope) for the reference group
The difference between the slopes for the two sexes

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;

Parameters:
The (extrapolated) level at Height=0 for each group (Sex)
The effect of Height (slope) for each group (Sex)

21 Model without interaction

No indication of interaction, so we omit the term:

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height / SOLUTION CLPARM;
RUN; QUIT;

Are there also other possible parameterizations in this model? (And which one should we use?)

22

Source  DF  Type I SS   Mean Square  F Value  Pr > F
Sex
Height

Source  DF  Type III SS  Mean Square  F Value  Pr > F
Sex
Height

(Numeric values not reproduced in this transcription.)

Note: The effect of sex seen in the group comparison has disappeared!!

23 Model without interaction - results

Parameter  Estimate        Standard Error  t Value  Pr > |t|
Intercept  (estimate) B
Sex F      (estimate) B
Sex M      0 B             .               .        .
Height     (estimate)

Parameter  95% Confidence Limits
Intercept
Sex F
Sex M      .  .
Height

(Numeric values not reproduced in this transcription.)

24 Confounding?

In this example it seems that
1. The observed difference in lung capacity between men and women can be explained by height differences.
2. However, there may still be a sex difference for persons of the same height (women vs. men), estimated as −0.77 (95% CI: −1.78 to 0.24).
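The adjusted sex difference (and its confidence interval) can also be read off an LSMEANS statement in the model without interaction; a minimal sketch, which should reproduce the estimate quoted above:

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc = sex height / SOLUTION CLPARM;
  * Sex means evaluated at the average height, with the pairwise difference and 95% confidence limits;
  LSMEANS sex / PDIFF CL;
RUN; QUIT;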

25 But...

What if we had not had the two very small men to pull the line for the men? Let us look at the subjects above 152 cm (using the statement WHERE height>152; when running PROC GLM):

Test of interaction:
Source      DF  Type I SS  Mean Square  F Value  Pr > F
Sex
Height
Height*Sex

Estimated additive effects:
Parameter  Estimate        95% Confidence Limits  Pr > |t|
Intercept  (estimate) B
Sex F      (estimate) B
Sex M      0 B             .  .                   .
Height     (estimate)

(Numeric values not reproduced in this transcription.)

A somewhat different conclusion...
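For completeness, a sketch of the subset analysis: the WHERE statement simply goes inside the PROC GLM step (shown here for the additive model; adding sex*height to the MODEL statement gives the interaction test):

PROC GLM DATA=TLCdata;
  WHERE height > 152;   * restrict the analysis to subjects taller than 152 cm;
  CLASS sex;
  MODEL tlc = sex height / SOLUTION CLPARM;
RUN; QUIT;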

26 Plots for model checking in the HTML output:

ODS GRAPHICS ON;
PROC GLM DATA=TLCdata PLOTS=(DIAGNOSTICS RESIDUALS(SMOOTH));
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
  OUTPUT OUT=WithResid RSTUDENT=NormResidWithoutCurrent;
RUN; QUIT;

PROC GPLOT DATA=WithResid;
  PLOT NormResidWithoutCurrent * sex;
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;

In addition to the ODS GRAPHICS plots for PROC GLM, residuals should be plotted against each of the CLASS variables (here sex) in order to check the variance homogeneity across the different values of each CLASS variable.

27 Exercise: Another look at the Juul data.
1. Get the data into SAS using a LIBNAME statement.
2. Create a new data set including only individuals above 25 years, and make a new variable with log-transformed SIGF-I.
3. Use PROC GPLOT to plot the relationship between age and log-transformed SIGF-I.
4. Make separate regression lines for men and women.
5. Do a regression analysis to explore whether the slopes are equal in men and women.
6. Give an estimate for the difference in slopes, with 95% confidence interval.
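A sketch of one possible solution. The library path and the variable names (age, sex, sigf1) are assumptions and must be adapted to the actual Juul data set:

LIBNAME juullib 'c:\mydata';               * path is an assumption;

DATA juul25;
  SET juullib.juul;                        * data set name is an assumption;
  WHERE age > 25;
  logsigf = LOG2(sigf1);                   * log2-transformed SIGF-I;
RUN;

PROC GPLOT DATA=juul25;
  PLOT logsigf*age=sex;
  SYMBOL1 V=DOT I=RL;                      * I=RL draws a regression line per sex;
  SYMBOL2 V=CIRCLE I=RL;
RUN; QUIT;

PROC GLM DATA=juul25;
  CLASS sex;
  * The sex*age row of the SOLUTION/CLPARM output gives the difference in slopes with a 95% CI;
  MODEL logsigf = sex age sex*age / SOLUTION CLPARM;
RUN; QUIT;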

28 Multiple regression. General linear model (GLM).

Data: n sets of observations, made on the same "unit":

unit   x_1  ...  x_p    y
1      x_11 ...  x_1p   y_1
2      x_21 ...  x_2p   y_2
3      x_31 ...  x_3p   y_3
...
n      x_n1 ...  x_np   y_n

The linear regression model with p explanatory variables (covariates) is written:

y = β_0 + β_1 x_1 + ... + β_p x_p + ε

29 Interpretation of regression coefficients

Model: Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ... + β_p X_ip + ε,  where ε ~ N(0, σ²).

Consider two subjects:
A has covariate values (X_1, X_2, ..., X_p)
B has covariate values (X_1 + 1, X_2, ..., X_p)

Expected difference in the response (B − A):

[β_0 + β_1(X_1 + 1) + β_2 X_2 + ...] − [β_0 + β_1 X_1 + β_2 X_2 + ...] = β_1

This means that β_1 is the effect of one unit's difference in X_1 for fixed levels of the other variables (X_2, ..., X_p).

30 Example: School-age obesity score versus height and weight measured at 1 year of age
(Data listing with variables Obs, Obesity, Height1 and Weight1; the numeric values are not reproduced in this transcription.)

31 Example: School-age obesity score versus height and weight measured at 1 year of age (figure not reproduced in this transcription)

32 SAS code

PROC REG DATA=SchoolObesity;
  MODEL Obesity = Height1 Weight1 / CLB;
RUN; QUIT;

(Part of the) output:

Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept
Height1
Weight1                                                     <.0001

Parameter Estimates
Variable   DF  95% Confidence Limits
Intercept
Height1
Weight1

(Numeric values not reproduced in this transcription.)

33 Interpretation of regression parameters

Remember that β_j is the effect of the j-th explanatory variable, corrected for the effect of the other explanatory variables, that is, when comparing any two subjects who match on all the other variables.

The effect of Height1 corrected for the effect of Weight1 is found to be β̂_1 = ... (95% CI: ... to 0.024), p = ...
In the univariate model without correction for Weight1, we got β̂_1 = ... (95% CI: ... to ...), p = ...
(The numeric values marked ... were lost in this transcription.)

34 Interpretation of regression parameters II

The parameter for height answers two different questions depending on whether or not we adjust for weight:
Unadjusted: The question is "Are big 1-year-old children generally fatter during school age?" The answer is yes!
Adjusted: The question is "Are slim 1-year-old children generally slimmer during school age?" The answer is yes!
Both questions are relevant and both answers are valid!

35 Relative effects; products or ratios of covariates

Both issues are solved by log-transforming the covariate(s)!

Example: BMI = Weight/Height^2 is a ratio measure. Logarithmic rules give

log(BMI) = log(Weight) − 2 log(Height),

so

β · log(BMI) = β · log(Weight) − 2β · log(Height)

Choice of log-transformation of covariates:
Use of log10 means that the regression parameter shows the effect of two subjects differing by a factor 10. Do not use log10 unless it is likely for two subjects to differ by a factor 10!
Use log2 [SAS code: LOG2( )] when doubling is likely.
Use a covariate calculated as XX=LOG( )/LOG(1.1) if 10% differences are more likely.

36 Is BMI at age 1 year an appropriate predictor for school-age obesity?

1. BMI is a ratio measure involving weight and height, so we should investigate log-transformed weight and height.
2. Doubling is not a realistic difference, so we look at effects per 10%.

DATA School1;
  SET SchoolObesity;
  HeightPer10pct = LOG(Height1)/LOG(1.1);
  WeightPer10pct = LOG(Weight1)/LOG(1.1);
RUN;

PROC REG DATA=School1;
  MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
  TestBMI: TEST HeightPer10pct = -2*WeightPer10pct;
RUN; QUIT;

37 Part of output:

Parameter Estimates
Variable        DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  95% Conf. Limits
Intercept
HeightPer10pct
WeightPer10pct                                                   <.0001

Test TestBMI Results for Dependent Variable Obesity
Source       DF  Mean Square  F Value  Pr > F
Numerator                              <.0001
Denominator

(Numeric values not reproduced in this transcription.)

38 Conclusion:
1. A 10% higher weight increases the expected school-age obesity score by ... (95% CI: ...), and a 10% lower height increases the expected school-age obesity score by ... (95% CI: ...).
2. BMI at age 1 year is not an appropriate choice (p < .0001). However, since the regression parameters for the log-transformed weight and height are of the same size, but with opposite signs, an appropriate predictor would be the ratio weight/height!
(The numeric estimates marked ... were lost in this transcription.)
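The suggestion that weight/height is a better summary than BMI can be checked in the same way as TestBMI, by testing the constraint that the two coefficients are equal with opposite signs. A sketch building on the PROC REG step from slide 36; the label TestRatio is introduced here:

PROC REG DATA=School1;
  MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
  TestBMI:   TEST HeightPer10pct = -2*WeightPer10pct;   * constraint implied by using BMI = weight/height^2;
  TestRatio: TEST HeightPer10pct + WeightPer10pct = 0;  * constraint implied by using weight/height;
RUN; QUIT;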

39 Model selection
Lung function: 25 patients with cystic fibrosis, O'Neill et al. (1983).

40 Which covariates have a univariate effect on the outcome PEmax? Are these the variables to be included in the model?
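One way to screen for univariate effects is to fit one simple regression per covariate; PROC REG accepts several MODEL statements in a single step. A minimal sketch (screening only, not a model-building strategy):

PROC REG DATA=pemax;
  * One simple linear regression per candidate covariate;
  MODEL pemax = age;
  MODEL pemax = sex;
  MODEL pemax = height;
  MODEL pemax = weight;
  MODEL pemax = bmp;
  MODEL pemax = fev1;
  MODEL pemax = rv;
  MODEL pemax = frc;
  MODEL pemax = tlc;
RUN; QUIT;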

41 Model with all covariates

PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc;
RUN; QUIT;

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept
age
sex
height
weight
bmp
fev1
rv
frc
tlc

(Numeric values not reproduced in this transcription.)

No significant effects...

42 Automatic variable selection: Forward selection

Start with no covariates. In every step, add the most significant variable.

PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc / SELECTION=FORWARD;
RUN; QUIT;

Final model: Weight, BMP, FEV1.

43 Automatic variable selection: Backward elimination

Start with all covariates. In each step, omit the least significant variable.

PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc / SELECTION=BACKWARD;
RUN; QUIT;

Final model: Weight, BMP, FEV1.
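The significance levels used for entering and staying can be set explicitly with the SLENTRY= and SLSTAY= options; the 0.05 thresholds below are illustrative choices, not the SAS defaults:

PROC REG DATA=pemax;
  * Forward selection with an explicit 5% entry criterion;
  MODEL pemax = age sex height weight bmp fev1 rv frc tlc / SELECTION=FORWARD SLENTRY=0.05;
  * Backward elimination with an explicit 5% stay criterion;
  MODEL pemax = age sex height weight bmp fev1 rv frc tlc / SELECTION=BACKWARD SLSTAY=0.05;
RUN; QUIT;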

44 But...

There is no guarantee that these automatic methods will give the same result: had observation no. 25 not been in the data set, backward elimination would have excluded Height as the first variable, while forward selection would have included Height as the first variable!

A best automatic method has not been identified, but backward elimination is often recommended over forward selection.

WARNING: Output from the selected model does not take model selection uncertainty into account: the output (regression coefficients and p-values) is identical to what would have been obtained had we fitted the final model without doing any model selection. The importance of the selected covariates is over-estimated!
