Correlation and Regression, Bangkok, 14-18 Sept. 2015

1 Analysing and Understanding Learning Assessment for Evidence-based Policy Making. Correlation and Regression. Bangkok, 14-18 Sept. 2015. Australian Council for Educational Research

2 Correlation. The strength of a mutual relation between 2 (or more) things. You need to know 2 things about each unit of analysis: student (e.g. maths and reading performance), school (e.g. funding level and mean reading performance), country (e.g. mean performance in 2010 and in 2013). There is no assumption about the direction of the relationship. Correlation is simply standardised covariance, i.e., covariance divided by the product of the standard deviations of the variables.

3 Formulas. Variance: σ² = Σ(X − X̄)² / (N − 1). Standard deviation: σ = √[Σ(X − X̄)² / (N − 1)]. Covariance: cov(x, y) = Σ(X − X̄)(Y − Ȳ) / (N − 1). Correlation (Pearson's r): r = cov(x, y) / (σ_x σ_y).
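
Slide 3 gives the formulas only; as an illustration (not part of the original slides), the following Python sketch computes them for two small, made-up score vectors and checks the result against numpy's built-in correlation. All data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical maths and reading scores for 6 students (illustrative data only)
x = np.array([480.0, 510.0, 525.0, 560.0, 590.0, 610.0])  # maths
y = np.array([470.0, 500.0, 540.0, 555.0, 580.0, 620.0])  # reading

n = len(x)
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)               # variance
sd_x = np.sqrt(var_x)                                       # standard deviation
sd_y = np.sqrt(np.sum((y - y.mean()) ** 2) / (n - 1))
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # covariance

r = cov_xy / (sd_x * sd_y)  # Pearson's r: standardised covariance
print(round(r, 3), round(np.corrcoef(x, y)[0, 1], 3))  # both values agree
```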

4 A note on sample vs population estimators. Sample variance: σ² = Σ(X − X̄)² / N. Sample covariance: cov(x, y) = Σ(X − X̄)(Y − Ȳ) / N. An estimate of the variance based on a sample is biased: it underestimates the true variance. It needs a correction factor of N / (N − 1) to produce an unbiased estimate.
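
A quick numeric check of the bias point above, using numpy (this sketch and its data are illustrative, not part of the slides): dividing by N gives the biased estimate, and multiplying it by the correction factor N / (N − 1) reproduces the unbiased, N − 1 based estimate.

```python
import numpy as np

x = np.array([480.0, 510.0, 525.0, 560.0, 590.0, 610.0])  # toy sample
n = len(x)

biased = np.var(x, ddof=0)    # divides by N: underestimates the population variance
unbiased = np.var(x, ddof=1)  # divides by N - 1: bias-corrected estimate

# The correction factor N / (N - 1) converts the biased estimate into the unbiased one
print(np.isclose(biased * n / (n - 1), unbiased))  # True
```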

5 Type of correlation. The correlation coefficient to use depends on the level of measurement of the variables. Ordinal (ranks, Likert scales, ordered categories): Spearman correlation (ρ) or Kendall's tau (τ). Interval/ratio (metric scales, measures of magnitude): Pearson correlation (r).
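
As a hedged illustration of how that choice looks in practice (SciPy is an assumed tool here, not one named in the slides), the sketch below computes all three coefficients on made-up data pairing an interval-scale score with an ordinal Likert rating.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Hypothetical interval-scale test scores and ordinal Likert ratings for 8 students
score = np.array([45, 52, 60, 61, 70, 75, 82, 90])
likert = np.array([2, 2, 3, 3, 4, 4, 5, 5])

r, p_r = pearsonr(score, likert)        # for interval/ratio data
rho, p_rho = spearmanr(score, likert)   # rank-based: suits ordinal data
tau, p_tau = kendalltau(score, likert)  # alternative rank-based coefficient

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```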

6 Things to remember. Independence: are the two values independent of each other? Linearity: is the relationship between the two values linear? Normality: are the two values distributed normally? (If not, a non-parametric correlation should be used.)

7 Correlation values 0 = no relationship 1.0 = perfect positive relationship -1.0 = perfect negative relationship 0.1 = weak relationship (if significant) 0.3 = moderate relationship (if significant) 0.5 = strong relationship (if significant)

8 Strong correlation r =.80

9 Perfect correlations r = 1 r = -1

10 Moderate correlation r =.36

11 No correlation r =.06

12 Correlation vs Regression. Correlation is not directional: the degree of association goes both ways. Correlation is not appropriate if the substantive meaning of X being associated with Y is different from that of Y being associated with X (for example, height and weight). It is also not appropriate when one of the variables is being manipulated, or is being used to explain the other. Use regression instead.

13 Practical exercises Be careful about spurious correlations. Just because two variables correlate highly does not mean there is a valid relationship between them. Correlation is not causation. With large enough data, anything can be significantly correlated with something.

14 Regression Also describes a relationship between 2 things (or more), but assumes a direction Explain one variable with one (or more) other variable(s) How well does SES predict performance?

15 Regression cont. Two main statistics Size of the effect or slope Strength of the effect or explained variance

16 The General Idea. Simple regression considers the relation between a single explanatory variable and a response variable.

17 Line of best fit (OLS)

18 Line of best fit (OLS)
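
The two slides above show the fitted line graphically; the plot itself is not reproduced here. As a minimal sketch (with hypothetical SES and reading-score data), an OLS line of best fit can be obtained in Python as follows.

```python
import numpy as np

# Hypothetical SES index (x) and reading score (y) for a handful of students
x = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5])
y = np.array([420.0, 455.0, 500.0, 510.0, 540.0, 585.0])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line of best fit
y_hat = intercept + slope * x               # fitted values on the line
residuals = y - y_hat                       # vertical distances minimised by OLS

print(f"y-hat = {intercept:.1f} + {slope:.1f} * x")
```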

19 Size of the effect. In the plotted example, a 1 unit increase in the explanatory variable corresponds to a change of 50 in the predicted outcome: the slope is 50.

20 Size of the effect cont. Here a 1 unit increase in the explanatory variable corresponds to a change of 25 in the predicted outcome: the slope is 25.

21 The R². The proportion of the total sample variance that is not explained by the regression will be: (residual sum of squares) / (total sum of squares). Therefore, the proportion of the variance in the dependent variable that is explained by the independent variable (R²) will be: R² = 1 − (residual sum of squares) / (total sum of squares).
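
Continuing the toy data from the previous sketch (again, an illustration rather than slide content), R² computed as 1 minus the residual-to-total sum-of-squares ratio matches the squared Pearson correlation, as it should for simple OLS.

```python
import numpy as np

x = np.array([-1.2, -0.5, 0.0, 0.4, 0.9, 1.5])
y = np.array([420.0, 455.0, 500.0, 510.0, 540.0, 585.0])

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot       # proportion of variance explained

print(round(r_squared, 3), round(np.corrcoef(x, y)[0, 1] ** 2, 3))  # identical for simple OLS
```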

22 Strength of the effect. For example, if the residual variance is a small proportion of the total variance: R² = 1 − (162.5 / 1250) = 0.87, so 87% of the variation in reading is explained by ESCS.

23 Strength cont. For example, if the residual variance is a large proportion of the total variance: R² = 1 − (1075 / 1250) = 0.14, so only 14% of the variation in reading is explained by ESCS.

24 Multiple Regression. Multiple regression simultaneously considers the influence of multiple explanatory variables on a response variable Y. The intent is to look at the independent effect of each variable while adjusting out the influence of potential confounders. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

25 Regression Modeling. A simple regression model (one independent variable) fits a regression line in 2-dimensional space; the residuals are the vertical distances of the observations from that line. A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space. This concept can be extended indefinitely, but visualisation is no longer possible for more than 3 variables. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

26 Multiple Regression Model. Again, estimates for the multiple slope coefficients are derived by minimising the sum of squared residuals, giving a multiple regression model of the form ŷ = α + β₁X₁ + β₂X₂. Again, the standard error of the regression is based on the squared residuals across all n observations. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

27 Multiple Regression Model Intercept α predicts where the regression plane crosses the Y axis Slope for variable X 1 (β 1 ) predicts the change in Y per unit X 1 holding X 2 constant The slope for variable X 2 (β 2 ) predicts the change in Y per unit X 2 holding X 1 constant Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
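
A short sketch of the same idea with statsmodels (an assumed tool choice; the data and variable names are simulated for illustration): the fitted coefficient on X₁ estimates the change in Y per unit of X₁ holding X₂ constant, and vice versa.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)  # e.g. a standardised SES index
x2 = rng.normal(size=n)  # e.g. standardised hours of study
y = 500 + 30 * x1 + 15 * x2 + rng.normal(scale=40, size=n)  # simulated outcome

X = sm.add_constant(np.column_stack([x1, x2]))  # prepend the intercept column
model = sm.OLS(y, X).fit()                      # minimises the sum of squared residuals

# params order: [intercept, slope for x1 holding x2 constant, slope for x2 holding x1 constant]
print(model.params)
```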

28 Main purpose of regression analysis. Prediction: developing a prediction model based on a set of predictor/independent variables. This purpose also allows for the evaluation of the predictive power of different models, as well as of different sets of predictors within a model. Explanation: validating or confirming an existing prediction model using new data. This purpose also allows for the assessment of the relationship between predictor and outcome variables.

29 Regression works provided assumptions are met. Linearity: check using partial regression plots (PLOTS, Produce all partial plots). Uniform variance (homoscedasticity): check by plotting residuals against the predicted values (PLOTS, Y: ZRESID, X: ZPRED); for ANOVA, check using Levene's test for homogeneity of variance (EXPLORE, PLOTS, Spread vs Level). Independence of error terms: check by plotting residuals against a sequencing variable. Normality of the residuals: check using normal P-P plots of the residuals (PLOTS, Normal probability plot).
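
The menu paths above are SPSS-specific. As a rough Python analogue (assumed tooling, simulated data), the sketch below draws the two most common diagnostic plots: residuals against predicted values for homoscedasticity, and a normal probability plot of the residuals for normality.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = 2 + 0.8 * x + rng.normal(size=150)  # toy data that meets the assumptions

model = sm.OLS(y, sm.add_constant(x)).fit()
resid, fitted = model.resid, model.fittedvalues

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(fitted, resid, s=15)  # homoscedasticity: look for an even band around 0
axes[0].axhline(0, color="grey")
axes[0].set_xlabel("Predicted values")
axes[0].set_ylabel("Residuals")
stats.probplot(resid, dist="norm", plot=axes[1])  # normality of the residuals
plt.tight_layout()
plt.show()
```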

30 Sample size. Thorough method: a priori power analysis. Compute sample sizes for given effect sizes, alpha levels, and power values (e.g. with G*Power 3). Fast method (but less thorough): rules of thumb based on k, the number of predictors, giving one minimum N for significance testing of R² and another for significance testing of the b-values. For both, use the larger number.

31 Multicollinearity. y = b₀ + b₁x₁. y = b₀ + b₁x₁ + b₂x₂, but if x₂ = x₁ + 3, then y = b₀ + b₁x₁ + b₂(x₁ + 3) = b₀ + b₁x₁ + b₂x₁ + 3b₂, and the two predictors' effects can no longer be separated. Checking for multicollinearity: for overall multicollinearity, VIF > 10 or Tolerance < 0.10; for individual variables, identify a Condition Index > 15, then check for Variance Proportions of coefficients > .90.
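
A minimal sketch of the VIF/Tolerance check in Python (statsmodels is an assumption of this example): x2 is deliberately constructed as nearly x1 + 3, mirroring the slide, so its VIF far exceeds the 10 cut-off while the unrelated x3 stays near 1.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + 3 + rng.normal(scale=0.01, size=100)  # almost exactly x1 + 3: severe collinearity
x3 = rng.normal(size=100)                       # unrelated predictor

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
for i, col in enumerate(X.columns):
    if col == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{col}: VIF = {vif:.1f}, Tolerance = {1 / vif:.3f}")  # flag VIF > 10 or Tolerance < 0.10
```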

32 Influential values Influential values are outliers that have substantial effect on the regression line. Source: Field, A. (2005). Discovering statistics using SPSS. (2nd ed). London: Sage.

33 When does linear regression modelling become inappropriate? When the dependent variable is dichotomous or polytomous (use Logistic Regression). When data are sequential over time and variables are autocorrelated (use Time Series Analysis). When context effects need to be analysed and slopes are different across higher-level units (use Multi-level Analysis).

34 Application: Illustrative Example. Childhood respiratory health survey. The binary explanatory variable (SMOKE) is coded 0 for non-smoker and 1 for smoker. The response variable, Forced Expiratory Volume (FEV), is measured in liters/second (lung capacity). Regress FEV on SMOKE to obtain the least squares regression line ŷ = b₀ + b₁x. Because x is coded 0/1, the mean FEV in nonsmokers is the intercept and the mean FEV in smokers is the intercept plus the slope. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

35 Example, cont. ŷ = 2.566 + b₁x. Intercept (2.566) = the mean FEV of group 0 (nonsmokers). Slope (b₁) = the mean difference in FEV between the two groups (because x is 0,1). The t statistic for the slope, with 652 df, gives p < .01, so b₁ is significant, and the 95% CI for the slope does not include 0. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

36 Smoking increases lung capacity? Children who smoked had higher mean FEV How can this be true given what we know about the deleterious respiratory effects of smoking? ANS: Smokers were older than the nonsmokers AGE confounded the relationship between SMOKE and FEV A multiple regression model can be used to adjust for AGE in this situation Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

37 Extending the analysis: Multiple regression. The SPSS output for our example gives the intercept a and the slopes b₁ (SMOKE) and b₂ (AGE). The multiple regression model is: FEV = a − .209(SMOKE) + .231(AGE). Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

38 Multiple Regression Coefficients, cont. The slope coefficient for SMOKE is −.209, suggesting that smokers have .209 less FEV on average compared to nonsmokers (after adjusting for age). The slope coefficient for AGE is .231, suggesting that each year of age is associated with an increase of .231 FEV units on average (after adjusting for SMOKE). Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

39 Inference About the Coefficients. Inferential statistics are calculated for each regression coefficient. For example, in testing H₀: β₁ = 0 (the SMOKE coefficient controlling for AGE), the SPSS Coefficients table (rows: Constant, smoke, age; columns: unstandardized B and Std. Error, standardized Beta, t, Sig.; dependent variable: fev) gives the t statistic and its P value, with df = n − k − 1 = 654 − 2 − 1 = 651. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

40 Inference About the Coefficients. The 95% confidence interval for the slope of SMOKE controlling for AGE is read from the same Coefficients table (95% Confidence Interval for B: Lower Bound and Upper Bound; dependent variable: fev). Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.

41 Assessing the significance of the model. R Square (R²) represents the proportion of variance in the outcome variable that is accounted for by the predictors in the model. For example, if for our previous model R² = .23, then 23% of the variance in FEV is accounted for by smoking status and age. Adjusted R² compensates for the inflation of R² due to overfitting, and is useful for comparing the amount of variance explained across several models. The standard error of the estimate is a measure of the accuracy of the predictions. For example, if the SE of the estimate = 0.35 for our previous model FEV = a − .209(SMOKE) + .231(AGE), then the predicted FEV for a non-smoker aged 12 years is the model's predicted value +/- (t x 0.35).

42 Assessing the significance of the model: Hierarchical models. Suppose Model 1: FEV = a − .209(SMOKE) + .231(AGE), R² = .23, and Model 2: FEV = a − .209(SMOKE) + .231(AGE) + .04(GENDER), R² = .29. What is the amount of unique variance explained by gender above and beyond that explained by smoking status and age? It is the R² change: .29 − .23 = .06. (The slide illustrates this with Venn diagrams of the variance in FEV shared with SMOKE and AGE, and with SMOKE, AGE and GENDER.)

43 Hierarchical regression in SPSS
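
The SPSS screenshot for slide 43 is not reproduced in this transcription. As a rough Python analogue (with simulated data and hypothetical coefficient values), hierarchical regression amounts to fitting the nested models and comparing their R² values; the increment can be tested with an F test for the R² change.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({
    "smoke": rng.integers(0, 2, n),   # 0/1 smoking status
    "age": rng.integers(5, 18, n),    # age in years
    "gender": rng.integers(0, 2, n),  # 0/1 gender indicator
})
# Simulated FEV outcome; the coefficients here are made up for illustration
df["fev"] = (0.4 - 0.2 * df["smoke"] + 0.23 * df["age"] + 0.05 * df["gender"]
             + rng.normal(scale=0.5, size=n))

m1 = smf.ols("fev ~ smoke + age", data=df).fit()           # Model 1
m2 = smf.ols("fev ~ smoke + age + gender", data=df).fit()  # Model 2 adds the new block

print(f"R2 change = {m2.rsquared - m1.rsquared:.3f}")  # unique variance explained by gender
print(m2.compare_f_test(m1))                           # (F statistic, p value, df difference)
```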

44 Dummy Variables. More than two levels: for categorical variables with k categories, use k − 1 dummy variables. Ex. SMOKE2 has three levels, initially coded 0 = non-smoker, 1 = former smoker, 2 = current smoker. Use k − 1 = 3 − 1 = 2 dummy variables to code this information. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
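
A short sketch of the k − 1 dummy coding with pandas (an assumed tool; the data are made up): the three-level SMOKE2 variable becomes two indicator columns, and the omitted level serves as the reference category.

```python
import pandas as pd

# Hypothetical three-level categorical predictor
smoke2 = pd.Series(["non-smoker", "former smoker", "current smoker",
                    "non-smoker", "current smoker"], name="smoke2")

# k - 1 = 2 dummy variables; the first category alphabetically ("current smoker")
# is dropped and acts as the reference level
dummies = pd.get_dummies(smoke2, prefix="smoke2", drop_first=True)
print(dummies)
```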

45 Use of standardised coefficients. Often thought to be easier to interpret. Standardisation depends on the variances of the independent variables. Unstandardised coefficients can be translated directly into the original units, but they cannot always be compared if different units are used for the variables.

46 Finding the best regression model. The set of predictors must be chosen based on theory; avoid the "whatever sticks to the wall" approach. The grouping of predictors and the ordering of entry will matter. Selecting the best final model can sometimes be a judgment call.

47 How to judge whether a model is good? Explained variance proportion, as measured by R². Size of the regression coefficients. Significance tests (F-test for the model, t-tests for the parameters). Inclusion of all relevant variables (theory!). Is the method appropriate?

48 The six steps to interpreting results. 1. Look at the prediction equation to see an estimate of the relationship. 2. Refer to the standard error of the estimate (in the appropriate model) when making predictions for individuals. 3. Refer to the standard errors of the coefficients (in the most complete model) to see how much you can trust the estimates of the effects of the explanatory variables. 4. Look at the significance levels of the t-ratios to see how strong the evidence is in support of including each of the explanatory variables in the model. 5. Use the coefficient of determination (R²) to measure the potential explanatory power of the model. 6. Compare the beta-weights of the explanatory variables to rank them in order of explanatory importance.

49 Notes on interpreting the results Prediction is NOT causation. In inferring causation, there has to be at least temporal precedence, but temporal precedence alone is still not sufficient. Avoid extrapolating the prediction equation beyond the data range. Always consider the standard errors and the confidence intervals of the parameter estimates. The magnitude of the coefficient of determination (R 2 ), in terms of explanatory power, is a judgment call.

50 Practice exercises! Study: Mathematics Beliefs and Achievement of Elementary School Students in Japan and the United States: Results From the Third International Mathematics and Science Study (TIMSS). House, J. D., 2006 Interpret the parameter estimates Interpret the statistical significance of the predictors Make substantive interpretation about the findings

51 Extensions: Regression. Multiple regression considers the relation between a set of explanatory variables and a response or outcome variable: each independent predictor (x₁, x₂) points to the outcome (y).

52 Moderating effect. Moderated regression: when the independent variable does not affect the outcome directly but rather affects the relationship between the predictor and the outcome. In the path diagram, the independent variable (x₂) acts on the path from the independent predictor (x₁) to the outcome (y).

53 Moderating effect. Simple moderating effect: when a categorical independent variable affects the relationship between the predictor and the outcome. (The slide plots separate regression lines of Y on X for the categories C1, C2 and C3.)

54 Moderating effects. Examples with a categorical moderator and with a continuous moderator; y = actual scaled score on the Multidimensional Perfectionism Scale (Hewitt & Flett).

55 Types of moderators (Sharma et al., 1981). The typology crosses whether the variable is related to the predictor and/or outcome with whether it interacts with the predictor. No interaction with the predictor: a related variable is simply an independent predictor; an unrelated variable is a homologizer. Interaction with the predictor: a related variable is a quasi-moderator; an unrelated variable is a pure moderator. Homologizer variables affect the strength (rather than the form) of the relationship between predictor and outcome (Zedeck, 1971).

56 Testing Moderation. Moderation effects are also known as interaction effects. Interaction terms are product terms of the moderator and the relevant predictor (the variable that the moderator interacts with). Starting from Y = b₀ + b₁x₁ + b₂x₂ + b₃m, the interaction term is i₁ = x₁*m. Choosing the moderator and the relevant predictor must have theoretical support; for example, it is possible that the moderator interacts with x₂ instead (i.e., i₁ = x₂*m). Testing for the interaction effect necessitates the inclusion of the interaction term(s) in the regression equation: Y = b₀ + b₁x₁ + b₂x₂ + b₃m + b₄i₁, and then testing H₀: b₄ = 0.
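
A minimal sketch of this test with statsmodels formulas (assumed tooling; x1, x2 and m are hypothetical simulated variables): the x1:m term is the product term i₁, and its coefficient plays the role of b₄ in the equation above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 400
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "m": rng.normal(size=n)})
# Simulated outcome with a genuine x1-by-m interaction (moderation)
df["y"] = 1 + 0.5 * df.x1 + 0.3 * df.x2 + 0.2 * df.m + 0.4 * df.x1 * df.m + rng.normal(size=n)

fit = smf.ols("y ~ x1 + x2 + m + x1:m", data=df).fit()  # x1:m is the interaction (product) term
print(fit.params["x1:m"], fit.pvalues["x1:m"])          # test H0: b4 = 0
```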

57 Mediating effect. Mediated regression: when the independent predictor does not affect the outcome directly but affects it through an intermediary variable (the mediator). Path: independent predictor (x₁) → intermediary predictor (x₂) → outcome (y).
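
As an illustrative sketch of the mediation idea (not from the slides; data and names are simulated), the indirect effect can be estimated from two regressions, path a (x1 → x2) and paths b and c' (y on x1 and x2); the product a*b is the mediated part of the effect. In practice the indirect effect would usually be tested with a bootstrap or Sobel test.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(size=n)                       # independent predictor
x2 = 0.6 * x1 + rng.normal(size=n)            # mediator, partly caused by x1
y = 0.5 * x2 + 0.1 * x1 + rng.normal(size=n)  # outcome, reached mostly through x2
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

a = smf.ols("x2 ~ x1", data=df).fit().params["x1"]  # path a: predictor -> mediator
m2 = smf.ols("y ~ x1 + x2", data=df).fit()
b, c_prime = m2.params["x2"], m2.params["x1"]       # path b and direct effect c'

print(f"indirect effect (a*b) = {a * b:.2f}, direct effect (c') = {c_prime:.2f}")
```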

58 Mediation vs Moderation Mediators explain why or how an independent variable X causes the outcome Y while a moderator variable affects the magnitude and direction of the relationship between X and Y (Saunders, 1956). These two approaches can be combined for more complex analyses: Moderated mediation Mediated moderation

59 Checklists. Moderation: collinearity between predictor and moderator (especially true for quasi-moderators); unequal variances between groups based on the moderator; reliability of measures (measurement errors are magnified when creating the product terms). Mediation: theoretical assumptions about the mediator; rationale for selecting the mediator; significance and type (full/partial) of the mediation effect; implied causation (i.e., directional paths).
