36-309/749 Experimental Design for Behavioral and Social Sciences
Sep. 29, 2015
Lecture 5: Multiple Regression

Review of ANOVA & Simple Regression
Both:
- Quantitative outcome
- Independent, Gaussian errors with equal variance
- Group assignment assumed correct (fixed-x)
One-way (between-subjects) ANOVA:
- Categorical IV (k levels) with means m1 through mk
- Best prediction: Ŷi = Ȳj for subject i in group j
Simple (one-IV) regression:
- Quantitative IV
- Coefficient parameters are β0 and β1
- True mean outcome at each x is E(Y | x) = β0 + β1·x (linearity)
- Best prediction: Ŷi = b0 + b1·xi

Example: Team Problem Solving

Multiple Regression
New Idea #1: extend the means model
- IVs are x1, x2, ...
- Means model: E(Y | x1, x2, ...) = β0 + β1·x1 + β2·x2 + ...
- Prediction: Ŷ = b0 + b1·x1 + b2·x2 + ...
Consequences:
- β0 is the mean of the DV when all IVs equal 0.
- β1 is the change in the mean of the DV when x1 goes up by one and all other x's are held constant.
- E.g., with two x's and x2 fixed at c, Y vs. x1 is a line:
    E(Y | x1, x2=c) = β0 + β1·x1 + β2·c = (β0 + β2·c) + β1·x1
  So Y vs. x1 forms parallel lines at the various fixed values of x2.
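The parallel-lines consequence can be checked numerically. Here is a minimal sketch in Python, not taken from the lecture: the data are simulated and every name and coefficient value is invented for illustration; the fit is ordinary least squares via NumPy.

```python
# Minimal sketch (simulated data, not the lecture's): fit the additive means
# model E(Y | x1, x2) = b0 + b1*x1 + b2*x2 by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)                 # first quantitative IV
x2 = rng.uniform(0, 5, n)                  # second quantitative IV
y = 3 + 2.0 * x1 + 4.0 * x2 + rng.normal(0, 1, n)   # DV with Gaussian error

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"intercept = {b0:.2f}, slope for x1 = {b1:.2f}, slope for x2 = {b2:.2f}")

# Holding x2 fixed at c, the fitted mean of Y vs. x1 is a line with
# intercept (b0 + b2*c) and slope b1, i.e. parallel lines for different c.
for c in (0, 1, 2):
    print(f"x2 = {c}:  E(Y | x1, x2={c}) = {b0 + b2 * c:.2f} + {b1:.2f} * x1")
```

Holding x2 at a fixed value c shifts the fitted intercept by b2·c but leaves the slope on x1 unchanged, which is the parallel-lines pattern described above.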

Multiple Regression: dummies
New idea #2: Dummy variables
- Multiple regression can accommodate categorical IVs, but only if they are coded appropriately.
- Indicator variable: a categorical variable (factor) with 2 levels should be named for one level and coded 1 = named level, 0 = other level; e.g., a Female variable F is coded 0 = Male, 1 = Female.
- E.g., x1 = Age (A), x2 = Female (F): E(Y) = β0 + βA·A + βF·F is a means model of parallel lines:
    Males:   E(Y) = β0 + βA·A
    Females: E(Y) = β0 + βA·A + βF·(1) = (β0 + βF) + βA·A

Multiple Regression: dummies, cont.
Coding of a categorical IV with k > 2 levels:
- Choose an arbitrary baseline (e.g., "control").
- Create indicator variables for all non-baseline levels.
- Throw away the original variable.
Example (Green is the arbitrary baseline; Red and Blue are the IVs used in the regression):

  Color (code)   Color (value)   Red   Blue
  3              Red             1     0
  1              Blue            0     1
  1              Blue            0     1
  2              Green           0     0

Multiple Regression: ANCOVA
Generally, the term ANCOVA (analysis of covariance) refers to multiple regression with one quantitative IV (the "covariate") and one categorical IV of primary interest coded as dummy variables.
Example: covariate Age (A) and factor Color (baseline = Green), with dummies B (Blue) and R (Red):
  E(Y | Age, Color) = E(Y | A, B, R) = β0 + βA·A + βB·B + βR·R
  E(Y | Age, Green) = E(Y | A, B=0, R=0) = β0 + βA·A
  E(Y | Age, Red)   = β0 + βA·A + βB·0 + βR·1 = (β0 + βR) + βA·A
  E(Y | Age, Blue)  = β0 + βA·A + βB·1 + βR·0 = (β0 + βB) + βA·A
[Figure: Y vs. A showing three parallel lines with common slope βA; the Blue and Red lines are offset vertically from the Green baseline line by βB and βR.]

Multiple Regression: ANCOVA, cont.
In ANCOVA as regression, the dummy variables' "slopes" reflect the different intercept offsets from the intercept of the baseline category. As opposed to fitting individual regressions per group, inference for comparing the lines is provided.

  Coefficients
                B         Std. Error   Beta (Std.)   t          Sig.
  (Constant)    49.992    .318                       157.009    .000
  A             .996      .017         .572          58.144     .000
  B             9.875     .301         .345          32.806     .000
  R             -19.793   .293         -.721         -67.568    .000
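A minimal sketch of this dummy coding and ANCOVA fit follows, using simulated Age/Color data (the lecture's data are not available) and the pandas/statsmodels libraries; the column names and simulated effect sizes are assumptions made only for illustration.

```python
# Minimal sketch (simulated Age/Color data; the lecture's data are not
# available): dummy-code a 3-level factor with Green as the baseline and fit
# the ANCOVA means model E(Y | A, B, R) = b0 + bA*A + bB*B + bR*R.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 150
df = pd.DataFrame({
    "Age": rng.uniform(0, 30, n),
    "Color": rng.choice(["Green", "Red", "Blue"], n),
})
# Invented "true" effects, used only to simulate an outcome for the demo
offset = df["Color"].map({"Green": 0.0, "Blue": 10.0, "Red": -20.0})
df["Y"] = 50 + 1.0 * df["Age"] + offset + rng.normal(0, 2, n)

# Indicator (dummy) variables for the non-baseline levels only
dummies = pd.get_dummies(df["Color"])
df["Blue"] = dummies["Blue"].astype(int)
df["Red"] = dummies["Red"].astype(int)     # Green is the omitted baseline

fit = smf.ols("Y ~ Age + Blue + Red", data=df).fit()
print(fit.params)   # Intercept/Age describe the Green line; Blue and Red are offsets
```

The fitted Intercept and Age coefficient describe the Green (baseline) line; the Blue and Red coefficients are the vertical offsets of the other two parallel lines, mirroring the algebra above.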

Fear and Anger Example
This is loosely based on "Constraints for emotion specificity in fear and anger: The context counts" by Stemmler et al., Psychophysiology, 38, 275-291 (2001). One hundred sixty-nine adult female subjects were randomized to a control condition or to induction of fear or anger. The outcome of interest is the subjects' combined ratings on three 0-10 point scales of negativity. The covariate is a quantitative measure called heart period variability (HPV), which is measured before the emotion induction and is taken as a measure of individual physiological sensitivity to one's surroundings.
Questions to consider: Experiment or observational study? Experimental units? Interpretability? Generalizability? Power? Construct validity? EDA?

Regression output
Model? Null hypotheses? Alternative hypotheses?

  Model Summary
  Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
  1       .891   .794       .790                3.181
  Predictors: (Constant), Anger induction, Heart period variability, Fear induction
  Dependent Variable: Feelings of negativity

  ANOVA
  Model        Sum of Squares   df    Mean Square   F         Sig.
  Regression   6438.971         3     2146.324      212.119   .000
  Residual     1669.550         165   10.118
  Total        8108.521         168

  Coefficients
                             B        Std. Error   t        Sig.   95% CI for B
  (Constant)                 2.707    .691         3.920    .000   [1.343, 4.071]
  Heart period variability   1.442    .128         11.263   .000   [1.189, 1.694]
  Fear induction             13.233   .618         21.397   .000   [12.012, 14.454]
  Anger induction            12.118   .602         20.141   .000   [10.930, 13.306]

Prediction equations:
  Ŷi = 2.71 + 1.44·HPVi + 13.23·Feari + 12.12·Angeri
  Controls (Fear=0, Anger=0): Ŷi = 2.71 + 1.44·HPVi
  Fear subjects:  Ŷi = 2.71 + 1.44·HPVi + 13.23·(1) = 15.94 + 1.44·HPVi
  Anger subjects: Ŷi = 2.71 + 1.44·HPVi + 12.12·(1) = 14.83 + 1.44·HPVi
Hidden assumption of the (non-interaction) ANCOVA means model: the lines are parallel, i.e., the slope of negativity vs. HPV is the same in every emotion group.

Standardized coefficients: the coefficients obtained from re-running the regression on standardized x's and Y:
  x'ij = (xij - x̄j) / s_xj,   Y'i = (Yi - Ȳ) / s_Y

Residual plots for assumption checking
- Residual = observed - expected = Yi - Ŷi (the estimated error).
- Residual quantile-normal plot: random scatter around the reference line means Normality is OK.
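A minimal sketch of the "standardized coefficients" computation described above: z-score every IV and the DV, then re-run the same regression. The data below are simulated stand-ins for the fear/anger study (so the numbers will not reproduce the SPSS output); pandas and statsmodels are assumed.

```python
# Minimal sketch (simulated stand-in data): "standardized coefficients" come
# from z-scoring every IV and the DV and re-running the same regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 169
group = rng.choice(["Control", "Fear", "Anger"], n)
df = pd.DataFrame({"HPV": rng.normal(5, 2, n)})
df["Fear"] = (group == "Fear").astype(int)
df["Anger"] = (group == "Anger").astype(int)
df["Negativity"] = (2.7 + 1.4 * df["HPV"] + 13.2 * df["Fear"]
                    + 12.1 * df["Anger"] + rng.normal(0, 3, n))

raw_fit = smf.ols("Negativity ~ HPV + Fear + Anger", data=df).fit()

# z-score every column: (value - mean) / standard deviation
z = (df - df.mean()) / df.std()
std_fit = smf.ols("Negativity ~ HPV + Fear + Anger", data=z).fit()

print(raw_fit.params)   # unstandardized B's (the prediction equation)
print(std_fit.params)   # standardized coefficients (the SPSS "Beta" column)
```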

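The residual checks just described, together with the residual-vs-fit plot discussed on the next slide, can be drawn with a short helper like the sketch below; it assumes a fitted statsmodels OLS results object (for example, raw_fit from the previous sketch) plus matplotlib and scipy.

```python
# Minimal sketch of the two diagnostic plots: a quantile-normal (Q-Q) plot of
# the residuals and a residual-vs-fit plot. `fit` is assumed to be any fitted
# statsmodels OLS results object (e.g. raw_fit from the previous sketch).
import matplotlib.pyplot as plt
import scipy.stats as stats

def residual_plots(fit):
    residuals = fit.resid            # observed - fitted (estimated errors)
    fitted = fit.fittedvalues
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Quantile-normal plot: random scatter around the line -> Normality OK
    stats.probplot(residuals, dist="norm", plot=ax1)
    ax1.set_title("Residual quantile-normal plot")

    # Residual vs. fit: smile/frown -> non-linearity; funnel -> unequal variance
    ax2.scatter(fitted, residuals)
    ax2.axhline(0, linestyle="--")
    ax2.set_xlabel("Fitted values")
    ax2.set_ylabel("Residuals")
    ax2.set_title("Residual vs. fit plot")
    plt.show()

# e.g. residual_plots(raw_fit)
```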
Example: residual analysis, cont.
Residual plots for assumption checking:
- Residual vs. fit (predicted) plot: y-axis = residuals, x-axis = fitted values.
- A "smile" or "frown" pattern suggests non-linearity (a bad means model).
- Funneling suggests unequal variance.

Example: skipped EDA

Multiple Regression: interaction
An interaction between two IVs in their effect on the DV implies non-additivity: the effect of a one-unit increase in x1 depends on the level of x2 (and vice versa).
Interaction is coded by adding a new IV which is the product of the two original IVs.
- If x1 and x2 are both quantitative, there is one new IV, x1*x2.
- If one is a k-level factor, there are k-1 new IVs.
ANCOVA with interaction:
  Structural model: E(Y | x1, x2) = β0 + β1·x1 + β2·x2 + β12·x1·x2
  Prediction:       Ŷi = b0 + b1·x1 + b2·x2 + b12·x1·x2

Example with interaction
  E(Negativity | HPV, emotion) = E(N | H, A, F) = β0 + βH·H + βA·A + βF·F + βH*A·HA + βH*F·HF
Key step: simplification. Key concept: the β's are fixed; H, A, F are data values.
  Controls: E(N | H, A=0, F=0) = β0 + βH·H
  Anger:    E(N | H, A=1, F=0) = β0 + βH·H + βA + βH*A·H = (β0 + βA) + (βH + βH*A)·H
  Fear:     E(N | H, A=0, F=1) = (β0 + βF) + (βH + βH*F)·H
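A minimal sketch of this interaction coding with simulated data (names and effect sizes are invented): the product variables are added as ordinary columns, and each group's intercept and slope are recovered by the same simplification shown above.

```python
# Minimal sketch (simulated data; names and effect sizes invented): code the
# interaction as product IVs and recover each group's intercept and slope by
# the same simplification done algebraically above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 169
group = rng.choice(["Control", "Anger", "Fear"], n)
df = pd.DataFrame({"HPV": rng.normal(5, 2, n)})
df["Anger"] = (group == "Anger").astype(int)
df["Fear"] = (group == "Fear").astype(int)
df["HPV_Anger"] = df["HPV"] * df["Anger"]   # product (interaction) IVs
df["HPV_Fear"] = df["HPV"] * df["Fear"]
df["Negativity"] = (6 + 0.6 * df["HPV"] + 6.5 * df["Anger"] + 9.8 * df["Fear"]
                    + 1.4 * df["HPV_Anger"] + 0.8 * df["HPV_Fear"]
                    + rng.normal(0, 3, n))

fit = smf.ols("Negativity ~ HPV + Anger + Fear + HPV_Anger + HPV_Fear",
              data=df).fit()
b = fit.params
print(f"Control: intercept {b['Intercept']:.2f}, slope {b['HPV']:.2f}")
print(f"Anger:   intercept {b['Intercept'] + b['Anger']:.2f}, "
      f"slope {b['HPV'] + b['HPV_Anger']:.2f}")
print(f"Fear:    intercept {b['Intercept'] + b['Fear']:.2f}, "
      f"slope {b['HPV'] + b['HPV_Fear']:.2f}")
```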

Example with Interaction: Results

  Model Summary
  Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
  1       0.891   0.794      0.792               3.181
  2       0.906   0.821      0.816               2.982

  Change Statistics
  Model   R Square Change   F Change   df1   df2   Sig. F Change
  1       .794              212.119    3     165   .000
  2       .027              12.359     2     163   .000

Example with interaction: Diagnostics

  Coefficients (Dependent Variable: Feelings of negativity)
  Model                          B        Std. Error   t        Sig.
  1  (Constant)                  2.707    .691         3.920    .000
     Heart period variability    1.442    .128         11.263   .000
     anger                       12.118   .602         20.141   .000
     fear                        13.233   .618         21.397   .000
  2  (Constant)                  6.153    1.003        6.135    .000
     Heart period variability    .612     .220         2.780    .006
     anger                       6.454    1.272        5.073    .000
     fear                        9.807    1.350        7.264    .000
     HPV*Anger                   1.439    .289         4.972    .000
     HPV*Fear                    .825     .312         2.643    .009

  QN plot: OK for Normality. Residual vs. fit: OK for linearity and equal spread.

Example: Subject Matter Conclusions
There is a statistically significant interaction (F-change = 12.4, df = 2, 163, p < 0.0005) between HPV and emotion in their effects on negativity (N). Heart period variability is positively associated with negativity in controls (t = 2.78, df = 163, p = 0.006); the estimated mean change in N is a rise of 0.612 points for each 1-unit rise in HPV (95% CI = [0.18, 1.05]). The estimated mean N when HPV = 0 is 6.15 for controls, and is 6.45 higher for induced anger (t = 5.07, df = 163, p < 0.0005) and 9.81 higher for induced fear (t = 7.26, df = 163, p < 0.0005). The change in N associated with a 1-point rise in HPV is estimated to be 1.44 points greater for anger compared to control (t = 4.97, df = 163, p < 0.0005) and 0.82 points greater for fear compared to control (t = 2.63, df = 163, p = 0.009). Overall, compared to control, inducing fear or anger increases N, and the increase is greater when HPV is greater.

Class Summary
- In multiple regression, the means model adds a term of the form βv·V whenever a variable V is added.
- Any k-level categorical variable must be replaced with k-1 indicator variables (or a similar coding).
- Without interaction, a parallel means model is produced: at each level of one IV, the slope of the DV vs. the other IV is the same. With interaction (adding product variables), different slopes are accommodated.
- You can deduce the meanings of the parameters by simplifying the means model for each category to the form Y = a + bX, where a is the intercept and b is the slope. Continue the deduction by finding equations that differ in only a single parameter. The p-value for that parameter tests the null hypothesis that the parameter equals 0, which is equivalent to the two lines being the same.
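The "F Change" statistic in the Model Summary is a nested-model comparison: it tests whether the two interaction terms improve on the additive ANCOVA. A minimal sketch with simulated data (so the numbers will differ from the slide), using statsmodels' compare_f_test:

```python
# Minimal sketch (simulated data, so the numbers will not match the slide):
# the "F Change" line is a nested-model F test comparing the additive ANCOVA
# with the interaction ANCOVA.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 169
group = rng.choice(["Control", "Anger", "Fear"], n)
df = pd.DataFrame({"HPV": rng.normal(5, 2, n)})
df["Anger"] = (group == "Anger").astype(int)
df["Fear"] = (group == "Fear").astype(int)
df["Negativity"] = (6 + 0.6 * df["HPV"] + 6.5 * df["Anger"] + 9.8 * df["Fear"]
                    + 1.4 * df["HPV"] * df["Anger"]
                    + 0.8 * df["HPV"] * df["Fear"] + rng.normal(0, 3, n))

additive = smf.ols("Negativity ~ HPV + Anger + Fear", data=df).fit()
interact = smf.ols("Negativity ~ HPV + Anger + Fear + HPV:Anger + HPV:Fear",
                   data=df).fit()

f_change, p_value, df_diff = interact.compare_f_test(additive)
print(f"F change = {f_change:.2f} on {int(df_diff)} and "
      f"{int(interact.df_resid)} df, p = {p_value:.4g}")
```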