6. Multiple regression - PROC GLM
1 Use of SAS - November
Multiple regression - PROC GLM
Karl Bang Christensen, Department of Biostatistics, University of Copenhagen
kach@biostat.ku.dk, tel:
2 Contents
Analysis of covariance (ANCOVA): the general linear model
Interaction
Multiple regression
Automatic variable selection 2
3 Data example: lung capacity
Data from 32 patients subject to a heart/lung transplantation. TLC (Total Lung Capacity) is determined from whole-body plethysmography. Are men and women different with respect to total lung capacity?
(Data listing: OBS, SEX, AGE, HEIGHT, TLC) 3
4 Box plots for comparison of sex groups
PROC GPLOT DATA=TLCdata;
  PLOT tlc*sex / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM;
  AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT; 4
5 Box plots for comparison of sex groups 5
6 Group comparisons Using t-tests
PROC TTEST DATA=tlc;
  CLASS sex;
  VAR tlc height;
RUN;
Note: we can specify more than one variable in the VAR statement 6
7 Output
T-Tests
Variable Method Variances DF t Value Pr > |t|
TLC Pooled Equal
TLC Satterthwaite Unequal
Height Pooled Equal
Height Satterthwaite Unequal
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
TLC Folded F
Height Folded F
Obvious sex difference for TLC as well as for Height 7
8 Confounding when comparing groups Occurs if the distributions of some other relevant explanatory variables differ between the groups. Here relevant means things we would have liked to be the same (or at least very similar) for everybody, because we think of it as noise or distortion. Can be reduced by performing a regression analysis with the relevant variables as covariates. Confounding could be a problem in the current example, if we intended to compare the lung function between men and women of similar height 8
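The mechanism described above can be sketched in a small calculation (pure Python; the numbers are invented for illustration, not the TLC data): when the response is driven only by a covariate on which the groups differ, the crude group comparison shows a difference even though matched subjects do not differ at all.

```python
# Hypothetical toy numbers, NOT the course data: TLC depends on height only,
# so any apparent sex effect here is pure confounding by height.
heights_m = [180.0, 185.0, 190.0]   # men are taller on average
heights_f = [160.0, 165.0, 170.0]

def tlc(height):
    return 0.1 * height - 10.0      # the same rule for everybody: no sex effect

def mean(xs):
    return sum(xs) / len(xs)

# Crude comparison: looks like a sex difference of about 2 litres
crude_diff = mean([tlc(h) for h in heights_m]) - mean([tlc(h) for h in heights_f])

# Comparison adjusted for height: a man and a woman of the same height
adjusted_diff = tlc(175.0) - tlc(175.0)

print(crude_diff, adjusted_diff)
```

Adjusting for height as a covariate, as the slide suggests, removes exactly this kind of spurious group difference.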
9 Relation between tlc and height:
PROC GPLOT DATA=TLCdata;
  PLOT tlc*height=sex / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=4) VALUE=(H=3) MINOR=NONE;
  AXIS2 LABEL=(A=90 H=4) VALUE=(H=3) ORDER=(3 TO 10) MINOR=NONE;
  SYMBOL1 C=RED V=DOT H=2 I=SM75S L=1 W=3 MODE=INCLUDE;
  SYMBOL2 C=BLUE V=CIRCLE H=2 I=SM75S L=41 W=3 MODE=INCLUDE;
  LEGEND1 LABEL=(H=2.5) VALUE=(H=2 JUSTIFY=LEFT);
RUN; QUIT; 9
10 Relation between tlc and height: (Plotted using I=RL) 10
11 Analysis of covariance
Comparison of parallel regression lines
Model: y_gi = α_g + β x_gi + ε_gi, g = 1, 2; i = 1, ..., n_g
Here α_2 − α_1 is the expected difference in the response between the two groups for a fixed value of the covariate, that is, when comparing any two subjects who have the same value of (match on) the covariate x ("adjusted for x"). 11
12 But what if the lines are not parallel?
More general model: y_gi = α_g + β_g x_gi + ε_gi
If β_1 ≠ β_2 there is an interaction between Height and Sex 12
13 Interaction
Interaction between Height and Sex:
The effect of height depends on sex
The difference between men and women depends on height 13
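The two readings on the slide are the same statement, which a short numerical sketch makes concrete (hypothetical intercepts and slopes, not the fitted estimates): with sex-specific slopes, the male-female difference is no longer constant across height.

```python
# Hypothetical parameters for y_gi = alpha_g + beta_g * x_gi with beta_M != beta_F
alpha = {"M": -10.0, "F": -8.0}
beta = {"M": 0.10, "F": 0.06}   # sex-specific slopes: this IS the interaction

def expected_tlc(sex, height):
    return alpha[sex] + beta[sex] * height

# The male-female difference changes with height...
diff_at_160 = expected_tlc("M", 160.0) - expected_tlc("F", 160.0)
diff_at_190 = expected_tlc("M", 190.0) - expected_tlc("F", 190.0)

# ...equivalently, the effect of height differs between the sexes.
print(diff_at_160, diff_at_190)
```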
14 Model with interaction
Group variables: PROC REG only works for linear covariates. Group variables can be handled directly in PROC GLM by specifying the group variable as a CLASS variable.
SAS code
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;
The option SOLUTION is needed if we want to see the regression parameter estimates. 14
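A rough sketch of what the CLASS statement does to the design matrix may help here (hypothetical rows, not the TLC data): GLM-style coding creates one indicator column per level, which together with the intercept column makes X'X singular — the reason for the "generalized inverse" note that appears in the output later.

```python
# Sketch of PROC GLM's less-than-full-rank CLASS coding (hypothetical data)
rows = [("F", 165.0), ("M", 182.0), ("F", 170.0)]

def design_row(sex, height):
    is_f = 1.0 if sex == "F" else 0.0
    is_m = 1.0 if sex == "M" else 0.0
    # columns: intercept, sex F, sex M, height, height*sex F, height*sex M
    return [1.0, is_f, is_m, height, height * is_f, height * is_m]

X = [design_row(s, h) for s, h in rows]
for r in X:
    print(r)
# In every row the intercept column equals is_f + is_m: an exact linear
# dependence, so the parameters are not uniquely estimable ("B" estimates).
```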
15 Output
The GLM Procedure
Class Level Information
Class Levels Values
Sex 2 F M
Number of Observations Read 32
Number of Observations Used 32
Dependent Variable: TLC
Sum of
Source DF Squares Mean Square F Value Pr > F
Model <.0001
Error
Corrected Total
R-Square Coeff Var Root MSE TLC Mean
16 Output II
Source DF Type I SS Mean Square F Value Pr > F
Sex
Height
Height*Sex
Source DF Type III SS Mean Square F Value Pr > F
Sex
Height
Height*Sex
The interaction is not significant. The Type III p-values for the two main effects should never be used for anything in a model including the interaction! 16
17 Output III
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept B
Sex F B
Sex M B ...
Height B
Height*Sex F B
Height*Sex M B ...
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter B are not uniquely estimable.
These are the regression parameters 17
18 Where are the two lines in the output?
Line for males (the reference group): TLC = Height
Line for females: TLC = (−1.727) + ( ) Height = Height 18
19 Same model, new parameterization
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;
Output...
Standard
Parameter Estimate Error t Value Pr > |t|
Sex F
Sex M
Height*Sex F
Height*Sex M
20 Same model, but with two different parameterizations
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;
The (extrapolated) level at Height=0 for the reference group (Sex= M )
The (extrapolated) difference between the two sexes at Height=0
An effect of Height (slope) for the reference group
The difference between the slopes for the two sexes
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;
The (extrapolated) level at Height=0 for each group (Sex)
The effect of Height (slope) for each group (Sex) 20
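The translation between the two parameterizations can be sketched numerically (hypothetical numbers, not the fitted estimates; reference group Sex = 'M'): the NOINT version's per-group level and slope are just sums of the reference-group version's parameters.

```python
# Reference-group (SOLUTION) parameterization, hypothetical values
intercept_M = -10.0      # level at Height=0 for the reference group M
diff_F = 2.0             # extrapolated F-minus-M difference at Height=0
slope_M = 0.10           # slope for the reference group M
slope_diff_F = -0.04     # difference between the slopes (F minus M)

# NOINT parameterization: one level and one slope per sex
level = {"M": intercept_M, "F": intercept_M + diff_F}
slope = {"M": slope_M, "F": slope_M + slope_diff_F}

def predict(sex, height):
    return level[sex] + slope[sex] * height

# Both parameterizations describe the same pair of lines:
print(predict("M", 170.0), predict("F", 170.0))
```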
21 Model without interaction
No indication of interaction, so we omit the term
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height / SOLUTION CLPARM;
RUN; QUIT;
Are there also other possible parameterizations in this model? (and which one should we use?) 21
22 Source DF Type I SS Mean Square F Value Pr > F
Sex
Height
Source DF Type III SS Mean Square F Value Pr > F
Sex
Height
Note: The effect of sex seen in the group comparison has disappeared!! 22
23 Model without interaction - results
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept B
Sex F B
Sex M B ...
Height
Parameter 95% Confidence Limits
Intercept
Sex F
Sex M ..
Height
24 Confounding?
In this example it seems that
1. The observed difference in lung capacity between men and women can be explained by height differences
2. However, there may still be a sex difference for persons of the same height (women vs. men), estimated as −0.77 (95% CI: −1.78 to 0.24) 24
25 But... what if we had not had the two very small men to pull the line for the men? Let us look at the subjects above 152 cm (using the statement WHERE height>152; when running PROC GLM):
Test of interaction:
Source DF Type I SS Mean Square F Value Pr > F
Sex
Height
Height*Sex
Estimated additive effects:
Parameter Estimate 95% Confidence Limits Pr > |t|
Intercept B
Sex F B
Sex M B ...
Height
A somewhat different conclusion... 25
26 Plots for model checking in the HTML output:
ODS GRAPHICS ON;
PROC GLM DATA=TLCdata PLOTS=(DIAGNOSTICS RESIDUALS(SMOOTH));
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
  OUTPUT OUT=WithResid RSTUDENT=NormResidWithoutCurrent;
RUN; QUIT;
PROC GPLOT DATA=WithResid;
  PLOT NormResidWithoutCurrent * sex;
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;
In addition to the ODS GRAPHICS plots for PROC GLM, residuals should be plotted against each of the CLASS variables (here sex) in order to check the variance homogeneity across the different values of each CLASS variable. 26
27 Exercise: Another look at the Juul data.
1. Get the data into SAS using a libname statement.
2. Create a new data set including only individuals above 25 years, and make a new variable with log-transformed SIGF-I.
3. Use PROC GPLOT to plot the relationship between age and log-transformed SIGF-I.
4. Make separate regression lines for men and women.
5. Do a regression analysis to explore if the slopes are equal in men and women.
6. Give an estimate for the difference in slopes, with 95% confidence interval. 27
28 Multiple regression. General linear model (GLM).
Data: n sets of observations, made on the same unit:
unit x_1 ... x_p y
1 x_11 ... x_1p y_1
2 x_21 ... x_2p y_2
3 x_31 ... x_3p y_3
...
n x_n1 ... x_np y_n
The linear regression model with p explanatory variables (covariates) is written:
y = β_0 + β_1 x_1 + ... + β_p x_p + ε 28
29 Interpretation of regression coefficients
Model Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ... + β_p X_ip + ε, where ε ~ N(0, σ²).
Consider two subjects:
A has covariate values (X_1, X_2, ..., X_p)
B has covariate values (X_1 + 1, X_2, ..., X_p)
Expected difference in the response (B − A):
[β_0 + β_1(X_1 + 1) + β_2 X_2 + ...] − [β_0 + β_1 X_1 + β_2 X_2 + ...] = β_1
This means that β_1 is the effect of one unit's difference in X_1 for fixed levels of the other variables (X_2, ..., X_p) 29
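The slide's cancellation can be checked numerically with hypothetical coefficients (invented for illustration): raising X_1 by one unit while the other covariates are held fixed changes the expected response by exactly β_1.

```python
# Hypothetical coefficients beta_0, beta_1, beta_2, beta_3
beta = [2.0, 0.5, -1.25, 3.0]

def mean_response(x):
    # E[Y] = beta_0 + beta_1*x1 + beta_2*x2 + beta_3*x3
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

xA = [1.0, 4.0, 7.0]
xB = [2.0, 4.0, 7.0]   # X1 one unit higher, X2 and X3 matched

diff = mean_response(xB) - mean_response(xA)
print(diff)  # equals beta_1 = 0.5
```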
30 Example: School-age obesity score versus height and weight measured at 1 year of age
(Data listing: Obs, Obesity, Height1, Weight1) 30
31 Example: School-age obesity score versus height and weight measured at 1 year of age 31
32 SAS code
PROC REG DATA=SchoolObesity;
  MODEL Obesity = Height1 Weight1 / CLB;
RUN; QUIT;
(part of the) output
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept
Height1
Weight1 <.0001
Parameter Estimates
Variable DF 95% Confidence Limits
Intercept
Height1
Weight1 32
33 Interpretation of regression parameters
Remember that β_j is the effect of the j-th explanatory variable, corrected for the effect of the other explanatory variables, that is, when comparing any two subjects who match on all the other variables.
The effect of Height1 corrected for the effect of Weight1 is found to be β̂_1 = (95% CI: to 0.024), p =
In the univariate model without correction for Weight1, we got β̂_1 = (95% CI: to ), p = 33
34 Interpretation of regression parameters II
The parameter for height answers two different questions, depending on whether or not it is adjusted for weight:
Unadjusted: The question is "Are big 1-year-old children generally fatter during school age?" The answer is yes!
Adjusted: The question is "Are slim 1-year-old children generally slimmer during school age?" The answer is yes!
Both questions are relevant and both answers are valid! 34
35 Relative effects and Products or ratios of covariates
Both issues are solved by log-transforming the covariate(s)!
Example: BMI = Weight/Height² is a ratio measure. Logarithmic rules give log(BMI) = log(Weight) − 2 log(Height), so β log(BMI) = β log(Weight) − 2β log(Height)
Choice of log-transformation of covariates
Use of log10 means that the regression parameter shows the effect of two subjects differing by a factor 10. Do not use log10 unless it is likely for two subjects to differ by a factor 10!
Use log2 [SAS code: LOG2( )] when doubling is likely.
Use a covariate calculated as XX=LOG( )/LOG(1.1) if 10% differences are more likely. 35
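Both rules on the slide can be verified numerically (hypothetical weight and height values): the log rule holds in any base, and a covariate on the LOG(x)/LOG(1.1) scale moves by exactly one unit when x grows by 10%, so its regression coefficient is the effect of a 10% difference.

```python
import math

# Hypothetical values for a 1-year-old: weight in kg, height in m
w, h = 10.0, 0.75
bmi = w / h ** 2

# log(BMI) = log(Weight) - 2*log(Height)
gap = math.log(bmi) - (math.log(w) - 2 * math.log(h))

# LOG(x)/LOG(1.1) increases by exactly 1 per 10% increase in x
def per10pct(x):
    return math.log(x) / math.log(1.1)

step = per10pct(1.1 * w) - per10pct(w)
print(gap, step)   # gap ~ 0, step ~ 1
```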
36 Is BMI at age 1 year an appropriate predictor for school-age obesity?
1. BMI is a ratio measure involving weight and height, so we should investigate log-transformed weight and height
2. Doubling is not a realistic difference, so we look at effects per 10%
DATA School1;
  SET SchoolObesity;
  HeightPer10pct = LOG(Height1)/LOG(1.1);
  WeightPer10pct = LOG(Weight1)/LOG(1.1);
RUN;
PROC REG DATA=School1;
  MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
  TestBMI: TEST HeightPer10pct = -2*WeightPer10pct;
RUN; QUIT;
36
37 Part of output:
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| 95% Conf. Limits
Intercept
HeightPer10pct
WeightPer10pct <.0001
Test TestBMI Results for Dependent Variable Obesity
Mean
Source DF Square F Value Pr > F
Numerator <.0001
Denominator 37
38 Conclusion:
1. A 10% higher weight increases the expected school-age obesity score by (95% CI: ), and a 10% lower height increases the expected school-age obesity score by (95% CI: )
2. BMI at age 1 year is not an appropriate choice (p < ). However, since the regression parameters for the log-transformed weight and height are of the same size, but with opposite signs, an appropriate predictor would be the ratio weight/height! 38
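The last point rests on a small algebraic fact, which a quick check makes explicit (hypothetical coefficient value): when the log-weight and log-height coefficients are β and −β, the two terms collapse into a single covariate log(Weight/Height).

```python
import math

# Hypothetical common coefficient magnitude, not a fitted value
b = 1.3
w, h = 9.5, 0.74   # hypothetical weight and height

# beta*log(W) - beta*log(H) == beta*log(W/H)
two_terms = b * math.log(w) - b * math.log(h)
one_ratio = b * math.log(w / h)
print(two_terms, one_ratio)
```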
39 Model selection
Lung function - 25 patients with cystic fibrosis, O'Neill et al. (1983). 39
40 Which covariates have a univariate effect on the outcome PEmax? Are these the variables to be included in the model? 40
41 Model with all covariates
PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc;
RUN; QUIT;
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept
age
sex
height
weight
bmp
fev1
rv
frc
tlc
No significant effects... 41
42 Automatic variable selection: Forward selection
Start with no covariates. In every step, add the most significant variable.
PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc / SELECTION=FORWARD;
RUN; QUIT;
Final model: Weight BMP FEV1 42
43 Automatic variable selection: Backward elimination
Start with all covariates. At each step, omit the least significant variable.
PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc / SELECTION=BACKWARD;
RUN; QUIT;
Final model: Weight BMP FEV1 43
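The greedy mechanics of these procedures can be sketched in a simplified form (PROC REG's SELECTION= options use significance levels such as SLENTRY/SLSTAY; this sketch instead adds, at each step, the candidate that most reduces the residual sum of squares, and the data are invented, not the PEmax data).

```python
def fit_sse(x_cols, y):
    """Least squares with intercept via normal equations; returns residual SS."""
    n = len(y)
    X = [[1.0] + [col[i] for col in x_cols] for i in range(n)]
    p = len(X[0])
    # Normal equations A b = c with A = X'X, c = X'y
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        c[j], c[piv] = c[piv], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c[r] -= f * c[j]
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return sum((y[i] - sum(b[j] * X[i][j] for j in range(p))) ** 2 for i in range(n))

def forward_select(covariates, y, n_steps):
    """Greedy forward selection: each step adds the best remaining covariate."""
    chosen = []
    for _ in range(n_steps):
        best = min((name for name in covariates if name not in chosen),
                   key=lambda name: fit_sse([covariates[m] for m in chosen + [name]], y))
        chosen.append(best)
    return chosen

# Tiny hypothetical data: y is driven by x1 and only weakly related to x2
covs = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0], "x2": [2.0, 1.0, 4.0, 3.0, 5.0]}
y = [2.1, 3.9, 6.2, 8.0, 9.9]
print(forward_select(covs, y, 1))  # -> ['x1']
```

Backward elimination would run the same greedy loop in reverse, dropping the covariate whose removal hurts the fit least, and the next slide's warning applies to both.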
44 But... There is no guarantee that these automatic methods will give us the same result: Had observation no. 25 not been in the data set, backward elimination would have excluded Height as the first variable, while forward selection would have included Height as the first variable!
A best automatic method has not been identified, but backward elimination is often recommended over forward selection.
WARNING: Output from the selected model does not take model selection uncertainty into account: the output (regression coefficients and p-values) is identical to what would have been obtained had we fitted the final model without doing any model selection. The importance of the selected covariates is over-estimated! 44
More informationa. The least squares estimators of intercept and slope are (from JMP output): b 0 = 6.25 b 1 =
Stat 28 Fall 2004 Key to Homework Exercise.10 a. There is evidence of a linear trend: winning times appear to decrease with year. A straight-line model for predicting winning times based on year is: Winning
More informationTopic 18: Model Selection and Diagnostics
Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables
More informationOverview Scatter Plot Example
Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables
More informationMulti-factor analysis of variance
Faculty of Health Sciences Outline Multi-factor analysis of variance Basic statistics for experimental researchers 2015 Two-way ANOVA and interaction Mathed samples ANOVA Random vs systematic variation
More informationLab 11. Multilevel Models. Description of Data
Lab 11 Multilevel Models Henian Chen, M.D., Ph.D. Description of Data MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. ID: 386 women, id from 1 to 386, individual level (level
More informationVIII. ANCOVA. A. Introduction
VIII. ANCOVA A. Introduction In most experiments and observational studies, additional information on each experimental unit is available, information besides the factors under direct control or of interest.
More informationInference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58
Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister
More informationCOMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION
COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,
More informationIntroduction to SAS proc mixed
Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen Outline Data in wide and long format
More informationSTAT 350: Summer Semester Midterm 1: Solutions
Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.
More informationSection 9c. Propensity scores. Controlling for bias & confounding in observational studies
Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationChapter 9. Multivariate and Within-cases Analysis. 9.1 Multivariate Analysis of Variance
Chapter 9 Multivariate and Within-cases Analysis 9.1 Multivariate Analysis of Variance Multivariate means more than one response variable at once. Why do it? Primarily because if you do parallel analyses
More informationST Correlation and Regression
Chapter 5 ST 370 - Correlation and Regression Readings: Chapter 11.1-11.4, 11.7.2-11.8, Chapter 12.1-12.2 Recap: So far we ve learned: Why we want a random sample and how to achieve it (Sampling Scheme)
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More information171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th
Name 171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th Use the selected SAS output to help you answer the questions. The SAS output is all at the back of the exam on pages
More informationa = 4 levels of treatment A = Poison b = 3 levels of treatment B = Pretreatment n = 4 replicates for each treatment combination
In Box, Hunter, and Hunter Statistics for Experimenters is a two factor example of dying times for animals, let's say cockroaches, using 4 poisons and pretreatments with n=4 values for each combination
More informationLecture notes on Regression & SAS example demonstration
Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also
More informationTopic 23: Diagnostics and Remedies
Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and
More informationLecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1
Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor
More informationLecture 1 Linear Regression with One Predictor Variable.p2
Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of
More informationMultiple Regression: Chapter 13. July 24, 2015
Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)
More informationIn Class Review Exercises Vartanian: SW 540
In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationTopic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects
Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous
More informationLecture 4 Multiple linear regression
Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More information