Psychology Seminar, Psych 406, Dr. Jeffrey Leitzel

Structural Equation Modeling
Topic 1: Correlation / Linear Regression

Outline/Overview
- Correlations (r, pr, sr)
- Linear regression
- Multiple regression: interpreting b coefficients, the overall ANOVA model test, R^2
- Diagnostics: distance (discrepancy), leverage, influence, and multicollinearity

The correlation coefficient (Pearson r)
- Continuous IV & DV (interval/ratio-level data); dichotomous variables can also be used
- Measures the strength of the linear relationship between two variables
- r^2 is the coefficient of determination
- r = Σ(z_x z_y) / (N - 1)
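The z-score form of the formula is easy to verify numerically. Below is a minimal Python sketch; the SAT/GPA numbers are invented for illustration:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r as the average cross-product of z-scores: r = sum(z_x * z_y) / (N - 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    zx = (x - x.mean()) / x.std(ddof=1)  # z-scores using the sample SD
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

sat = np.array([500, 550, 600, 650, 700, 750])   # hypothetical SAT scores
gpa = np.array([2.4, 2.9, 2.8, 3.3, 3.5, 3.8])   # hypothetical college grades
r = pearson_r(sat, gpa)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")          # matches np.corrcoef(sat, gpa)[0, 1]
```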

Partial and semipartial correlations

Partial correlation
- The r between two variables when one or more other variables is partialled out of both X and Y.
- Example: an experimenter interested in the relationship between income and success in college measured both variables and ran the correlation. It was statistically significant, and he began repeating the result to all of his students: do well in college and you will earn a large salary. What issue might the bright student in the back of one of his classes have raised?

Partial correlation (cont.)
- IQ is a third variable affecting both college success and salary earned.
- The partial correlation between college success and salary has IQ partialled out of both variables.
- [Figure: Venn diagram of the overlapping variance of salary (y), gpa (x1), and IQ (x2)]

Partial correlation (cont.)
- The correlation between the residuals is the partial correlation between income and success, partialling out IQ.
- Generally represented by r_y1.23...p: the subscripts to the left of the dot are the variables being correlated; those to the right of the dot are the variables being partialled out.
- r_y1.2 = the correlation between salary and GPA with IQ partialled out.
- pr^2 = c / (c + d)
- [Figure: the same Venn diagram with regions a-d; c is the variance of y shared uniquely with gpa (x1), and d is the variance of y explained by neither predictor]
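The residualizing recipe on this slide is short to carry out. Here is a minimal Python sketch on simulated data (the salary/GPA/IQ effect sizes are invented): regress each of salary and GPA on IQ, keep the residuals, and correlate them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
iq = rng.normal(100, 15, n)                  # the third variable
gpa = 0.02 * iq + rng.normal(0, 0.4, n)      # college success, partly driven by IQ
salary = 400 * iq + rng.normal(0, 6000, n)   # salary, partly driven by IQ

def residuals(v, u):
    """Residuals from the simple regression of v on u (the part of v independent of u)."""
    b, a = np.polyfit(u, v, 1)
    return v - (a + b * u)

# r_y1.2: IQ is partialled out of BOTH salary (y) and gpa (x1)
r_partial = np.corrcoef(residuals(salary, iq), residuals(gpa, iq))[0, 1]
print(f"zero-order r   = {np.corrcoef(salary, gpa)[0, 1]:.3f}")
print(f"partial r_y1.2 = {r_partial:.3f}   # shrinks once IQ is controlled")
```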

Semipartial correlation
- Also called the part correlation; far more routinely used than the partial correlation.
- r_y(1.2) represents the correlation between the criterion (y) and a partialled predictor variable.
- The partial correlation has variable x2 partialled out of both y and x1; for the semipartial, x2 is partialled out of x1 only.
- It is the correlation between y and the residuals of x1 predicted from x2, i.e., the correlation between y and the part of x1 that is independent of x2.
- r_y(1.2) = the correlation between salary and GPA with IQ partialled out of GPA only.
- sr^2 = c / (a + b + c + d)
- [Figure: the same Venn diagram with regions a-d]

Semipartial correlation (cont.)
- The semipartial correlation formula can be rearranged: R^2_y.12 = r^2_y2 + r^2_y(1.2) (a numerical check appears in the sketch after this slide group).
- R (the multiple correlation) is based on information from variable 1 plus additional nonredundant information from variable 2 through variable p, so:
  R^2_y.123...p = r^2_y1 + r^2_y(2.1) + r^2_y(3.12) + ... + r^2_y(p.123...p-1)
- If the predictors were completely independent of one another, there would be no shared variance to partial out; we would simply sum the r^2 of each X variable with Y to get the multiple R^2.

Simple linear regression
- Finding the equation for the line of best fit.
- Least squares property: the line minimizes the errors of estimation.
- The line has the form y' = a + bx, where:
  y' = predicted value of y
  a = intercept of the line
  b = slope of the line
  x = the score on x we are using to predict y
- Multiple regression is an extension of this.
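Here is the promised numerical check of the decomposition R^2_y.12 = r^2_y2 + r^2_y(1.2), a minimal Python sketch on simulated data (variable roles and effect sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)                  # e.g., IQ
x1 = 0.6 * x2 + rng.normal(size=n)       # e.g., GPA, partly redundant with IQ
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

def residuals(v, u):
    b, a = np.polyfit(u, v, 1)
    return v - (a + b * u)

# semipartial r_y(1.2): x2 is partialled out of x1 only, not out of y
sr = np.corrcoef(y, residuals(x1, x2))[0, 1]
r_y2 = np.corrcoef(y, x2)[0, 1]

# R^2 from the full two-predictor least-squares fit
X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"R^2_y.12      = {R2:.4f}")
print(f"r^2_y2 + sr^2 = {r_y2 ** 2 + sr ** 2:.4f}   # identical, as the identity requires")
```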

Multiple regression models
- Simple linear regression models generally represent an oversimplification of reality: it is usually unreasonable to think in terms of a single cause for any variable of interest.
- Our equation for simple linear regression, y' = a + bx, becomes:
  y' = a + b1x1 + b2x2 + ... + bkxk

Multiple regression (cont.)
y' = a + b1x1 + b2x2 + ... + bkxk, where:
- y' = the dependent variable (what we are predicting)
- x1, x2, ..., xk = the predictors (independent variables or regressors)
- b1, b2, ..., bk = the regression coefficients associated with the k predictors
- a = the y intercept
- The predictors (x's) may represent interaction terms, cross products, powers, logs, or other functions that provide predictive power.

Interpreting b coefficients
- If x1 is held constant, every unit change in x2 results in a b2-unit change in y'.
- The constant a (the y intercept) does not necessarily have a meaningful interpretation: any time 0 is outside the reasonable range of the predictors, the intercept is essentially meaningless.
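A minimal Python sketch of fitting and reading such an equation (the data and true coefficient values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(50, 10, n)
x2 = rng.normal(20, 5, n)
y = 4.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(0, 3, n)   # true a = 4, b1 = 1.5, b2 = -2

# least-squares estimates for y' = a + b1*x1 + b2*x2
X = np.column_stack([np.ones(n), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")

# interpretation: holding x1 constant, a one-unit increase in x2 changes y' by b2
base   = a + b1 * 50 + b2 * 20
bumped = a + b1 * 50 + b2 * 21
print(f"predicted change for +1 on x2: {bumped - base:.2f}  (equals b2)")
```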

Overall ANOVA
- Tests overall model adequacy.
- Null hypothesis H0: b1 = b2 = ... = bk = 0
- Alternative hypothesis H1: at least one b ≠ 0
- F statistic (ANOVA) = MS_regression / MS_error, where:
  MS_regression = SS_regression / k
  MS_error = SS_error / (n - (k + 1))
  n = number of observations, k = number of parameters in the model (excluding a)

ANOVA (cont.)
- SS_regression = Σ(y' - M_y)^2
- SS_error = Σ(y - y')^2
- When the overall regression model provides no predictive power, the two mean-square quantities will be about equal.
- We are testing an F statistic with (k, n - (k + 1)) degrees of freedom.

ANOVA in multiple regression
- [Slide shows an example ANOVA summary table; not transcribed]
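The F test is a few lines of arithmetic once the two sums of squares are in hand. A minimal Python sketch on simulated data (scipy is used only for the p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 100, 2                              # n observations, k predictors
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.8, 0.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
yhat = Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]

ss_reg = np.sum((yhat - y.mean()) ** 2)    # SS_regression = sum of (y' - M_y)^2
ss_err = np.sum((y - yhat) ** 2)           # SS_error = sum of (y - y')^2
ms_reg = ss_reg / k                        # df = k
ms_err = ss_err / (n - (k + 1))            # df = n - (k + 1)
F = ms_reg / ms_err
p = stats.f.sf(F, k, n - (k + 1))
print(f"F({k}, {n - (k + 1)}) = {F:.2f}, p = {p:.4g}")
```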

R^2 and adjusted R^2
- Multiple R^2, the coefficient of determination, is the total proportion of variance accounted for by all predictors in the model.
- High R^2 values do not necessarily mean that a model will be useful for predicting Y: overfitting (adding essentially irrelevant predictors) can produce a high R^2 in the absence of any predictive power.
- If we fit a regression model with n - 1 predictors, R will always equal 1.0, unless we have a rare data set in which two cases are identical on all predictors but differ in their Y values.
- Adjusted R^2 takes into account n and the number of predictors used in the model.

Misuses/problems
- Multicollinearity exists when two or more independent variables (predictors) contribute redundant information to the model. Conceptually it is a problem of assigning unique variance components to each variable; it creates computational problems and coefficients that do not make sense. Resolve it by removing the redundant predictor(s).
- Predicting outside the range of data used to generate the regression model: a positive linear relationship may change outside that range, resulting in substantial errors of estimation.
- Failure to explore alternative models: be familiar with the data, examine the relationships between variables, be aware of potential interactions (and include them in the model if necessary), and try different models.

Standardized betas
- With more than one independent variable, the b coefficients cannot be directly compared because the measures are on different scales.
- Standardized beta coefficients allow us to compare the relative predictive power of the variables in the equation.
- We get standardized betas by converting all of the variables to z-scores and rerunning the regression.
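A minimal Python sketch of the standardized-beta recipe (the two predictor scales and effects are invented, so the raw b's are deliberately incomparable):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
income = rng.normal(60000, 15000, n)       # measured in dollars: large scale
years_ed = rng.normal(14, 2, n)            # measured in years: small scale
y = 0.0001 * income + 0.5 * years_ed + rng.normal(size=n)

def slopes(X, y):
    """Regression coefficients (intercept dropped) from a least-squares fit."""
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0][1:]

X = np.column_stack([income, years_ed])
print("raw b:          ", slopes(X, y))    # not comparable: different units

# convert every variable to z-scores and rerun: the coefficients are standardized betas
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
print("standardized B: ", slopes(Z, zy))   # directly comparable predictive weights
```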

Regression diagnostics
- Diagnostics help us assess the validity of our conclusions and their accuracy; careful screening of the data cannot be overemphasized.

Outliers
- Outliers are data points that lie outside the general linear pattern (the midline of that pattern is the regression line).
- Removal of outliers can dramatically affect the performance of a regression model.
- Outliers should be removed if there is reason to believe that other variables not in the model explain why the cases are unusual; these cases may need a separate model.

Outliers (cont.)
- Outliers may also suggest that additional explanatory variables need to be brought into the model.
- Multivariate outliers can be difficult, if not impossible, to identify by visual inspection. Example: 115 lbs. and 6 feet tall. Either observation alone is not unusual, but together they are.
- Unusual combinations might include odd pairings of scores on separate measures of ability or on different indices of production.
- Multivariate outliers represent results that, considered as a whole, do not make sense.

Distance, leverage, & influence
- The most common measure of distance (also called discrepancy) is the simple residual: the distance between a point and the regression surface. It identifies outliers on the dependent (y) variable.
- The leverage statistic h, also called the hat-value, identifies cases that may influence the regression model more than others. With reasonably large n it ranges from essentially 0 (no influence on the model; the actual minimum is 1/n) to 1.
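Leverage is the diagonal of the hat matrix H = X(X'X)^-1 X'. A minimal Python sketch with one deliberately extreme predictor value (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.normal(size=n)
x[0] = 6.0                                  # one case far out in predictor space
X = np.column_stack([np.ones(n), x])

# hat matrix H = X (X'X)^-1 X'; its diagonal h_i is the leverage of case i
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

print(f"smallest leverage = {h.min():.3f} (lower bound is 1/n = {1 / n:.3f})")
print(f"extreme case's h  = {h[0]:.3f}")    # far above the rest of the sample
```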

Distance, leverage, & influence (cont.)
- Leverage identifies outliers on the independent variables: values below .2 are fine; cases above .5 should be examined.
- Cases that are high on distance or leverage can strongly influence the regression but do not necessarily do so.
- Points that are relatively high on both distance and leverage are very likely to have strong influence.

Influence on regression line
- [Figures: three slides illustrating how a single case moves the fitted regression line; not transcribed]

Cook's distance (D)
- The most common measure of the influence of a case.
- It is a function of the squared change that would occur in y' if the observation were removed from the data.
- We are interested in finding cases with D values much larger than the rest.
- Cut-off for influential cases: D greater than 4/(n - k - 1), where n is the number of cases and k is the number of independent variables.
- Cook's distance is obtained by issuing a postestimation command in Stata.

Regression diagnostics
- [Slide shows example diagnostic output; not transcribed]
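For readers not using Stata, Cook's D is also easy to compute by hand from the residuals and leverages, via D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, where p is the number of parameters including the intercept. A minimal Python sketch with one planted influential case (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 60, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0, 1, n)
X[0], y[0] = [4.0, -4.0], 40.0              # plant one influential case

Xd = np.column_stack([np.ones(n), X])
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
h = np.diag(H)                              # leverages
e = y - H @ y                               # residuals
p = k + 1                                   # parameters, including the intercept
mse = np.sum(e ** 2) / (n - p)

D = e ** 2 / (p * mse) * h / (1 - h) ** 2   # Cook's distance for every case
cutoff = 4 / (n - k - 1)                    # the slide's rule of thumb
print("cases with D above 4/(n - k - 1):", np.where(D > cutoff)[0])
```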

Diagnostics put to use
- [Slide shows a worked example; not transcribed]

Multicollinearity
- Intercorrelation of the independent variables.
- R^2s near 1 violate the assumption of no perfect collinearity.
- High R^2s increase the standard errors of the beta coefficients and make assessment of the unique role of each independent variable difficult or impossible.
- Simple correlations tell us something about multicollinearity, but the preferred method of assessing it is to regress each independent variable on all the other independent variables in the equation.

Multicollinearity (cont.)
- Inspection of the correlation matrix reveals only bivariate multicollinearity (look for bivariate correlations > 0.90).
- To assess multivariate multicollinearity, one uses tolerance or VIF; there may not be any extremely high bivariate correlations even when multicollinearity is present.
- If any variable can be represented as a linear combination of the other variables in the model, perfect multicollinearity exists.

Tolerance / Variance Inflation Factor (VIF)

Tolerance
- Tolerance = 1 - R^2 for the regression of that independent variable on all the other independents, ignoring the dependent.
- There are as many tolerance coefficients as there are independents.
- The higher the intercorrelation of the independents, the closer the tolerance approaches zero.
- Tolerance is part of the denominator in calculating the confidence limits on the b (partial regression) coefficient.

Variance inflation factor (VIF)
- The reciprocal of tolerance: VIF = 1 / (1 - R_j^2).
- When VIF is high, there is high multicollinearity and instability of the b coefficients.

Variance inflation factor (VIF): worked values

R_j    Tolerance    VIF    Impact on SE_b
0.0    1.00         1.00   1.00
0.4    0.84         1.19   1.09
0.6    0.64         1.56   1.25
0.75   0.44         2.25   1.50
0.8    0.36         2.78   1.67
0.87   0.25         4.00   2.00

(The "impact on SE_b" column is the square root of VIF: the factor by which the standard error of b is inflated.)
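A minimal Python sketch implementing the previous slide's recipe: regress each independent variable on all the others (ignoring the dependent), then convert each R_j^2 into tolerance and VIF. The data are simulated, with x1 and x2 built to be highly correlated:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)   # correlates ~0.9 with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def r2(y, X):
    """R^2 from the least-squares regression of y on X (with intercept)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    yhat = Xd @ np.linalg.lstsq(Xd, y, rcond=None)[0]
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# one tolerance (and VIF) per independent variable
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    Rj2 = r2(X[:, j], others)
    tol = 1 - Rj2                 # tolerance = 1 - R_j^2
    vif = 1 / tol                 # VIF is the reciprocal of tolerance
    print(f"x{j + 1}: R_j^2 = {Rj2:.2f}, tolerance = {tol:.2f}, VIF = {vif:.2f}")
```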