Variance Partitioning


Applied Regression Analysis, Lecture #12: March 8, 2005 (Slide 1 of 33)

Today's Lecture
Muddying the waters of regression: what not to do when considering the relative importance of variables in a regression model; what you can say about certain techniques we have discussed; and a discussion of the limitations of regression analysis.

For the last several classes we focused on the predictive aspects of multiple regression analysis. Explanation, however, is the ultimate goal of science, and part of explanation is determining the relative importance of variables. Two statistical techniques are important in explanation: variance partitioning and the analysis of effects.

From the lecture on partial correlation, recall that the R² for a multiple regression model can be decomposed into a set of squared semipartial correlations (R²_y.1234 denotes the R² for a regression with Y as the dependent variable and X1, X2, X3, and X4 as independent variables):

R²_y.1234 = r²_y1 + r²_y(2.1) + r²_y(3.12) + r²_y(4.123)

Rearranging the ordering of the variables will produce the same value of R²:

R²_y.1234 = r²_y4 + r²_y(1.4) + r²_y(2.41) + r²_y(3.412)

What does change, however, is the proportion of variance contributed by each variable as it is added.
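This order-invariance of the total R², combined with order-dependent increments, is easy to verify numerically. The sketch below uses simulated correlated predictors (illustrative values only, not the Erie data discussed later); the coefficients and seed are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: four intercorrelated predictors and an outcome.
# (Illustrative values only -- not the Erie house data used later.)
n = 500
X = rng.normal(size=(n, 4))
X[:, 1] += 0.7 * X[:, 0]              # induce predictor intercorrelation
X[:, 2] += 0.5 * X[:, 1]
y = X @ np.array([1.0, 0.5, 0.3, 0.2]) + rng.normal(size=n)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / tss

def increments(X, y, order):
    """Squared semipartials: R^2 gained as each variable enters, in order."""
    r2 = [r_squared(X[:, list(order[:k + 1])], y) for k in range(len(order))]
    return np.diff([0.0] + r2)

inc_a = increments(X, y, (0, 1, 2, 3))
inc_b = increments(X, y, (3, 2, 1, 0))
print(inc_a.sum(), inc_b.sum())  # identical total R^2
print(inc_a, inc_b)              # but different per-variable increments
```

Because the predictors are intercorrelated, the first variable entered absorbs the variance it shares with later variables, so its increment shrinks when it enters last.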

Order of Entered Variables
Recall the Erie house data, our most familiar data set. We have data for 27 houses sold in the mid-1970s in Erie, Pennsylvania:
X1: Current taxes (local, school, and county), in hundreds of dollars.
X2: Number of bathrooms.
X4: Living space, in thousands of square feet.
Y: Actual sale price, in thousands of dollars.

Order of Entered Variables
First, fit the following sequence of models:
1. Y = a + b1 X1
2. Y = a + b1 X1 + b2 X2
3. Y = a + b1 X1 + b2 X2 + b4 X4
The R² for each model:
1. 0.838: X1 by itself.
2. 0.902: X2 adds 0.064 to the model with X1.
3. 0.935: X4 adds 0.033 to the model with X1 and X2.

Order of Entered Variables
Now, fit the following sequence of models:
1. Y = a + b4 X4
2. Y = a + b4 X4 + b2 X2
3. Y = a + b4 X4 + b2 X2 + b1 X1
The R² for each model:
1. 0.863: X4 by itself.
2. 0.903: X2 adds 0.040 to the model with X4.
3. 0.935: X1 adds 0.032 to the model with X4 and X2.

Order of Entered Variables
Finally, fit the following sequence of models:
1. Y = a + b2 X2
2. Y = a + b2 X2 + b1 X1
3. Y = a + b2 X2 + b1 X1 + b4 X4
The R² for each model:
1. 0.854: X2 by itself.
2. 0.902: X1 adds 0.048 to the model with X2.
3. 0.935: X4 adds 0.033 to the model with X2 and X1.

Which Variable is Most Important?
From the three sets of regressions we found that the R² for the full model (with all three variables) did not change as a function of how the variables were entered, but the incremental variance accounted for did change depending on when each variable entered the model. Based on these three sequences, which variable would you say does the best job of explaining the sale price of a home? Number of bathrooms? Current taxes? Living space?

Limitations of R²
Realizing that the increase in R² is a function of the other variables already in the model, one can come to understand several things about using R² to determine the relative importance of variables. R² is sample specific: even if all regression effects are the same, R² can be reduced. R² changes from sample to sample because of the variability of a given sample on: the variables under study; the variables not under study; and errors in the measurement of the dependent variable (Y).

Limitations of R²
So, R² is not all that it seems to be. A quote against variance partitioning: "It would be better to simply concede that the notion of independent contribution to variance has no meaning when predictor variables are intercorrelated" (Darlington, 1968, p. 179).

Purposes of Variance Partitioning
If the order of entry of a variable is important in incremental partitioning, how does one determine the order? To answer this question, distinguish between two purposes of such an analysis:
1. To study the effect of an independent variable (or multiple independent variables) on the dependent variable after having controlled for other variables.
2. To study the relative effects of a set of independent variables on the dependent variable.

Purposes of Variance Partitioning
Purpose #1 equates to control: "Such an analysis [for control purposes] is not intended to provide information about the relative importance of variables, but rather about the effect of a variable(s) after having controlled for another variable(s)" (Pedhazur, 1997, p. 245). The purpose of today's class is to demonstrate why incremental partitioning of variance is not valid for determining the relative importance of variables. Incremental partitioning of variance can, however, be used for studying the effects of one set of variables while controlling for another set.

Deciding which variables need to be controlled for is not arbitrary. Unless dictated clearly by theory, there is often no way of knowing which variables to control for in an analysis. Imagine the example given by Pedhazur (p. 245) of a study with exactly three variables:
B: Student background characteristics.
S: School quality.
A: Academic achievement.

Definitions
In this example, causality is depicted by what is called a path diagram. Path diagrams are commonly used in analyses based on Structural Equation Modeling (SEM) techniques; time permitting, we will briefly discuss this topic in a future lecture. To understand this example, two terms need definition:
Exogenous variable: a variable whose variability is assumed to be determined by causes outside of the causal model under consideration.
Endogenous variable: a variable whose variability is to be explained by exogenous and other endogenous variables in the causal model.

Model #1
Exogenous variables: B and S. Endogenous variable: A. In the diagram, a curved line with arrowheads at both ends indicates correlated variables (the correlation is assumed, not modeled; this is always the case with correlated exogenous variables), and a straight line indicates a causal path.

Model #1
Often in non-experimental or observational research, exogenous variables are correlated. Typically, these variables are used as input into a regression analysis. The correlation between exogenous variables is taken as given and is not explained by such a model. Because such exogenous variables are typically correlated, inferences about the relative effects of either are impossible to make, and incremental variance partitioning cannot be used to describe the relative effects of each variable.

Model #2
Exogenous variable: B. Endogenous variables: S and A.

Model #2
In this situation, the hypothesized model has B (background) exerting a direct effect on S (school quality). B does not have a direct effect on A (achievement); rather, S mediates the effect of B on A. Because of this structure, the model allows for the study of the effect of S after controlling for B:

R²_A.BS - R²_A.B

You may recall this as the squared semipartial correlation between A and S, with the effects of B removed from S, or r²_A(S.B).

Model #3
Exogenous variable: B. Endogenous variables: S and A. S mediates the effect of B on A, and B additionally has a direct effect on A. Think path analysis.

Model #4
Exogenous variable: B. Endogenous variables: S and A. Here, controlling for B equates to calculating the partial correlation between S and A with the effect of B removed from both S and A, or r_SA.B.

Of the four models, which best represents our well-known Erie house analysis? Ask yourself the following questions: Which variables are exogenous? Which variables are endogenous? What are the direct effects? Are the exogenous variables correlated? Multiple regression analyses are fairly straightforward in their answers to these questions. Only experimental research can sometimes control the correlation of exogenous variables, through experimental design and control.

Erie House Diagram
Exogenous variables: Current Taxes, Number of Bathrooms, Living Space. Endogenous variable: Actual Sale Price. Because the exogenous variables are correlated, describing the relative importance of the variables by incremental variance partitioning is not possible.

Commonality analysis is a method of partitioning variance to find the proportion of variance in a dependent variable accounted for by unique portions of each independent variable. The method amounts to analyzing each of the possible regression models that could be built to predict the dependent variable. The unique portion of an independent variable is the proportion of variance it adds when it is the last variable entered into the analysis. This is, literally, the squared semipartial correlation between the dependent variable and the independent variable in question, partialing all other independent variables out of the variable in question.

For each variable in a regression analysis, the uniqueness can be found. For each variable combination (two-way, three-way, etc.), the amount of variance in the dependent variable accounted for jointly by the combination is called the commonality. These values are all obtainable by running a series of multiple regression analyses.

To find the commonality of a set of variables, or the uniqueness of a single variable, there is an easy schematic formula:
1. For each variable of interest, write (1 - Xa), where a symbolizes a variable of interest.
2. For every other variable, write Xb.
3. Multiply all of these together, and then multiply by -1.
4. Expand the product; replace each resulting term (e.g., X1X2X4) by the corresponding model R² (e.g., R²_Y.124) to produce the commonality or uniqueness.
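The expansion in step 4 can be checked symbolically. The sketch below (assuming the sympy library is available) expands the schematic products; each monomial then stands in for the model R² with the matching subscripts.

```python
import sympy as sp

X1, X2, X4 = sp.symbols("X1 X2 X4")

# Uniqueness of X1: write -(1 - X1), multiply by the remaining
# variables X2 and X4, and expand.
u1 = sp.expand(-(1 - X1) * X2 * X4)
# Terms: X1*X2*X4 - X2*X4, which translates to R^2_Y.124 - R^2_Y.24.

# Commonality of X1 and X2: -(1 - X1)(1 - X2)X4, expanded.
c12 = sp.expand(-(1 - X1) * (1 - X2) * X4)
# Terms: X1*X4 + X2*X4 - X4 - X1*X2*X4, which translates to
# R^2_Y.14 + R^2_Y.24 - R^2_Y.4 - R^2_Y.124.

print(u1)
print(c12)
```

The same mechanical expansion works for any number of predictors, which is why the number of components grows so quickly.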

For our Erie example, with X1, X2, and X4, the uniqueness of X1 is found by:

-(1 - X1)X2X4 = -(X2X4 - X1X2X4) = X1X2X4 - X2X4

So the uniqueness U(1) is given by (note that this is the squared semipartial correlation):

U(1) = R²_Y.124 - R²_Y.24

The commonality of X1 and X2 is found by:

-(1 - X1)(1 - X2)X4 = X2X4 + X1X4 - X4 - X1X2X4

So the commonality C(12) is found by:

C(12) = R²_Y.24 + R²_Y.14 - R²_Y.4 - R²_Y.124
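As a sketch of the arithmetic, the following plugs in the R² values reported on the next slide (taking R²_Y.4, R²_Y.1, and R²_Y.2 as the squared zero-order correlations with Y) to reproduce the uniqueness and commonality entries:

```python
# R^2 values as printed on the following slide; the single-variable
# R^2 values are squared zero-order correlations with Y.
r2 = {
    "124": 0.935, "12": 0.902, "14": 0.928, "24": 0.903,
    "1": 0.915 ** 2, "2": 0.924 ** 2, "4": 0.929 ** 2,
}

# Uniqueness: the increment when the variable enters the model last.
u1 = r2["124"] - r2["24"]   # about 0.032
u2 = r2["124"] - r2["14"]   # about 0.007
u4 = r2["124"] - r2["12"]   # about 0.033

# Commonality of X1 and X2, from expanding -(1 - X1)(1 - X2)X4.
c12 = r2["24"] + r2["14"] - r2["4"] - r2["124"]   # about 0.033

print(round(u1, 3), round(u2, 3), round(u4, 3), round(c12, 3))
```

These match the U(1), U(2), U(4), and C(12) entries in the table two slides ahead.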

Correlations among the Erie variables:

        X1       X2       X4       Y
X1    1.000
X2    0.876    1.000
X4    0.832    0.901    1.000
Y     0.915    0.924    0.929    1.000

R²_Y.12 = 0.902    R²_Y.14 = 0.928    R²_Y.24 = 0.903    R²_Y.124 = 0.935
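These R² values follow directly from the correlation matrix: for standardized variables, R² = r' Rxx⁻¹ r, where Rxx is the predictor intercorrelation matrix and r is the vector of predictor-criterion correlations. A minimal sketch (the values agree with the slide to within rounding of the three-decimal correlations):

```python
import numpy as np

# Correlation matrix from the slide, in the order X1, X2, X4, Y.
R = np.array([
    [1.000, 0.876, 0.832, 0.915],
    [0.876, 1.000, 0.901, 0.924],
    [0.832, 0.901, 1.000, 0.929],
    [0.915, 0.924, 0.929, 1.000],
])

def r2_from_corr(R, predictors, dep=3):
    """Squared multiple correlation from correlations alone: r' Rxx^{-1} r."""
    Rxx = R[np.ix_(predictors, predictors)]
    rxy = R[predictors, dep]
    return rxy @ np.linalg.solve(Rxx, rxy)

print(round(r2_from_corr(R, [0, 1, 2]), 3))  # R^2_Y.124, about 0.935
print(round(r2_from_corr(R, [0, 1]), 3))     # R^2_Y.12,  about 0.902
print(round(r2_from_corr(R, [0, 2]), 3))     # R^2_Y.14,  about 0.928
print(round(r2_from_corr(R, [1, 2]), 3))     # R^2_Y.24,  about 0.903
```

Small discrepancies in the third decimal are expected, since the printed correlations are themselves rounded.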

Uniqueness and commonality components for the Erie data:

          X1       X2       X4
U(1)    0.032
U(2)             0.007
U(4)                      0.033
C(12)   0.033    0.033
C(14)   0.016             0.016
C(24)            0.058    0.058
C(124)  0.756    0.756    0.756
ρ̂       0.915    0.924    0.929
ρ̂²      0.837    0.853    0.863

(Up to rounding, each column sums to that variable's squared correlation with Y.)

Comments
When the independent variables are uncorrelated, the uniqueness of each is equal to its squared zero-order correlation with the dependent variable, and all commonalities are zero.

Commonality analysis, again, cannot answer the question of the relative importance of variables. Such analyses are useful for predictive research, but may not be much help in explanation. Commonality analysis yields exactly the same results regardless of the theoretical model. Commonalities can have negative signs (suppressor variables). The number of commonality components increases exponentially with a linear increase in the number of independent variables.

Final Thought
Variance partitioning is good for describing model effects with other variables controlled statistically. It is not good for determining the relative importance of variables. Pedhazur provides many examples of published research reports in which incremental variance is used inappropriately; this is suggested reading.

Next Time
More on explanation. Chapter 10: Analysis of Effects. More questions, not many more answers.