Correlation and Regression Bangkok, 14-18, Sept. 2015
1 Analysing and Understanding Learning Assessment for Evidence-based Policy Making. Correlation and Regression. Bangkok, 14-18 Sept. 2015. Australian Council for Educational Research
2 Correlation. The strength of a mutual relation between 2 (or more) things. You need to know 2 things about each unit of analysis: student (e.g. maths and reading performance), school (e.g. funding level and mean reading performance), country (e.g. mean performance in 2010 and in 2013). No assumption is made about the direction of the relationship. Correlation is simply standardised covariance, i.e. covariance divided by the product of the standard deviations of the variables.
3 Formulas. Variance: $\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$. Standard deviation: $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$. Covariance: $\mathrm{cov}(x, y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$. Correlation (Pearson's r): $r = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$.
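These formulas can be checked numerically. A minimal sketch in Python with NumPy, using made-up maths and reading scores for five students (the data are illustrative, not from the workshop):

```python
import numpy as np

# Hypothetical example data: maths (x) and reading (y) scores for 5 students
x = np.array([40.0, 50.0, 60.0, 70.0, 80.0])
y = np.array([45.0, 55.0, 50.0, 75.0, 85.0])

n = len(x)
# Population formulas from the slide (divide by N)
var_x = np.sum((x - x.mean()) ** 2) / n
var_y = np.sum((y - y.mean()) ** 2) / n
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / n

# Pearson's r: covariance standardised by the two standard deviations
r = cov_xy / (np.sqrt(var_x) * np.sqrt(var_y))

# Matches NumPy's built-in correlation (the N's cancel in the ratio)
assert abs(r - np.corrcoef(x, y)[0, 1]) < 1e-12
print(round(r, 3))
```

Note that whether you divide by N or N - 1 does not matter for r itself, because the same factor appears in the numerator and the denominator.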
4 A note on sample vs population estimators. Sample variance: $\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$. Sample covariance: $\mathrm{cov}(x, y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$. An estimate of the variance based on a sample is biased: it underestimates the true variance. It needs a correction factor of $\frac{N}{N-1}$ to produce an unbiased estimate.
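The bias and its N/(N - 1) correction are easy to see by simulation. A sketch with a hypothetical population (the parameters are assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical population with known spread
population = rng.normal(loc=100, scale=15, size=100_000)
true_var = population.var()  # population variance (divide by N)

# Draw many small samples and average the two estimators
biased, unbiased = [], []
for _ in range(2000):
    s = rng.choice(population, size=5)
    biased.append(s.var(ddof=0))    # divide by N     -> biased (too small)
    unbiased.append(s.var(ddof=1))  # divide by N - 1 -> unbiased

# The N/(N-1) correction removes the systematic underestimate
print(true_var, np.mean(biased), np.mean(unbiased))
```

On average the ddof=0 estimate sits well below the true variance, while the corrected estimate centres on it.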
5 Type of correlation. The correlation coefficient to use depends on the level of measurement of the variables. Ordinal (ranks, Likert scales, ordered categories): Spearman correlation (ρ) or Kendall's tau (τ). Interval/ratio (metric scales, measures of magnitude): Pearson correlation (r).
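To illustrate the difference between the two coefficients, a small sketch that computes Spearman's rho as the Pearson correlation of the ranks (this rank-based shortcut is valid only when there are no ties; the data are invented):

```python
import numpy as np

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the ranks (no ties here)
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return pearson(ra, rb)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3  # perfectly monotonic, but non-linear

print(pearson(x, y), spearman(x, y))
```

Spearman gives exactly 1 for any strictly increasing relationship, while Pearson falls below 1 because the relationship is not linear.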
6 Things to remember. Independence: are the two values independent of each other? Linearity: is the relationship between the two values linear? Normality: are the two values distributed normally? (If not, a non-parametric correlation should be used.)
7 Correlation values 0 = no relationship 1.0 = perfect positive relationship -1.0 = perfect negative relationship 0.1 = weak relationship (if significant) 0.3 = moderate relationship (if significant) 0.5 = strong relationship (if significant)
8 Strong correlation r =.80
9 Perfect correlations r = 1 r = -1
10 Moderate correlation r =.36
11 No correlation r =.06
12 Correlation vs Regression Correlation is not directional. The degree of association goes both ways. Correlation is not appropriate if the substantive meaning of X being associated with Y is different from Y being associated with X. For example, Height and Weight. Not appropriate when one of the variables is being manipulated, or being used to explain the other. Use regression instead.
13 Practical exercises Be careful about spurious correlations. Just because two variables correlate highly does not mean there is a valid relationship between them. Correlation is not causation. With large enough data, anything can be significantly correlated with something.
14 Regression Also describes a relationship between 2 things (or more), but assumes a direction Explain one variable with one (or more) other variable(s) How well does SES predict performance?
15 Regression cont. Two main statistics Size of the effect or slope Strength of the effect or explained variance
16 The General Idea Simple regression considers the relation between a single explanatory variable and response variable
17 Line of best fit (OLS)
18 Line of best fit (OLS)
19 Size of the effect. [Plot: for each 1-unit increase in X, Y increases by 50, so the slope is 50.]
20 Size of the effect cont. [Plot: for each 1-unit increase in X, Y increases by 25, so the slope is 25.]
21 The R 2. The proportion of the total sample variance that is not explained by the regression is: $\frac{\text{Residual sum of squares}}{\text{Total sum of squares}}$. Therefore, the proportion of the variance in the dependent variable that is explained by the independent variable ($R^2$) is: $R^2 = 1 - \frac{\text{Residual sum of squares}}{\text{Total sum of squares}}$.
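The R-squared formula can be verified on toy data. A sketch assuming invented ESCS and reading values (not real PISA data):

```python
import numpy as np

# Hypothetical data: ESCS (SES index) and reading score for 6 students
escs = np.array([-1.2, -0.5, 0.0, 0.4, 1.1, 1.8])
reading = np.array([420.0, 460.0, 500.0, 490.0, 540.0, 570.0])

# Fit y = a + b*x by ordinary least squares
b, a = np.polyfit(escs, reading, 1)
pred = a + b * escs

rss = np.sum((reading - pred) ** 2)            # residual sum of squares
tss = np.sum((reading - reading.mean()) ** 2)  # total sum of squares
r2 = 1 - rss / tss

# For simple regression, R^2 equals the squared Pearson correlation
assert abs(r2 - np.corrcoef(escs, reading)[0, 1] ** 2) < 1e-10
print(round(r2, 3))
```

The closing assertion is a useful sanity check: with a single predictor, R² and r² are the same number.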
22 Strength of the effect. For example, if the residual variance is a small proportion of the total variance: R 2 = 1 - (162.5/1250) = 0.87. 87% of the variation in reading is explained by ESCS.
23 Strength cont. For example, if the residual variance is a large proportion of the total variance R 2 = 1 (1075/1250) R 2 = 0.14 Only 14% of the variation in reading is explained by ESCS
24 Multiple Regression. Multiple regression simultaneously considers the influence of multiple explanatory variables on a response variable Y. The intent is to look at the independent effect of each variable while adjusting out the influence of potential confounders. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
25 Regression Modeling. A simple regression model (one independent variable) fits a regression line in 2-dimensional space. A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space. This concept can be extended indefinitely, but visualisation is no longer possible for more than 3 variables. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
26 Multiple Regression Model. Again, estimates for the multiple slope coefficients are derived by minimising the sum of squared residuals. Again, the standard error of the regression is based on the squared residuals from the full set of predictors. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
27 Multiple Regression Model Intercept α predicts where the regression plane crosses the Y axis Slope for variable X 1 (β 1 ) predicts the change in Y per unit X 1 holding X 2 constant The slope for variable X 2 (β 2 ) predicts the change in Y per unit X 2 holding X 1 constant Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
28 Main purpose of regression analysis. Prediction: developing a prediction model based on a set of predictor/independent variables. This purpose also allows for the evaluation of predictive power between different models, as well as between different sets of predictors within a model. Explanation: validating or confirming an existing prediction model using new data. This purpose also allows for the assessment of the relationship between predictor and outcome variables.
29 Regression works provided assumptions are met Linearity Check using partial regression plots (PLOTS Produce all partial plots) Uniform variance (homoscedasticity) Check by plotting residuals against the predicted value (PLOTS Y:ZRESID, X:ZPRED) For ANOVA, check using Levene s test for homogeneity of variance (EXPLORE PLOTS Spread vs Level) Independence of error terms Check by plotting residuals against a sequencing variable (PLOTS Produce all partial plots) Normality of the residuals Check using Normal P-P plots of the residuals (PLOTS Normal probability plot)
30 Sample size. Thorough method: a priori power analysis. Compute sample sizes for given effect sizes, alpha levels, and power values (e.g. with G*Power 3). Fast method (but less thorough): rules of thumb based on k, the number of predictors. For R 2 significance testing: N >= 50 + 8k. For b-values significance testing: N >= 104 + k. For both, use the larger number.
31 Multicollinearity. y = b 0 + b 1 x 1 ; y = b 0 + b 1 x 1 + b 2 x 2. But if x 2 = x 1 + 3, then y = b 0 + b 1 x 1 + b 2 (x 1 + 3) = b 0 + (b 1 + b 2 )x 1 + 3b 2, so the separate contributions of x 1 and x 2 cannot be identified. Checking for multicollinearity: for overall multicollinearity, VIF > 10 or Tolerance < 0.10; for individual variables, identify Condition Index > 15, then check for coefficients with Variance Proportions > .90.
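The VIF is just 1/(1 - R²) from regressing one predictor on the others, so it can be computed directly. A sketch with simulated predictors, where x2 is deliberately made nearly collinear with x1 (all data invented):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j] on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])
```

With these data the VIFs for x1 and x2 blow far past the usual cut-off of 10, while x3 stays near 1.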
32 Influential values Influential values are outliers that have substantial effect on the regression line. Source: Field, A. (2005). Discovering statistics using SPSS. (2nd ed). London: Sage.
33 When does linear regression modelling become inappropriate? When the dependent variable is dichotomous or polytomous (use logistic regression). When data are sequential over time and variables are autocorrelated (use time series analysis). When context effects need to be analysed and slopes differ across higher-level units (use multi-level analysis).
34 Application: Illustrative Example. Childhood respiratory health survey. The binary explanatory variable (SMOKE) is coded 0 for non-smoker and 1 for smoker. The response variable, Forced Expiratory Volume (FEV), is measured in liters/second (lung capacity). Regressing FEV on SMOKE gives a least-squares regression line ŷ = a + b·x in which the intercept a equals the mean FEV of the nonsmokers and a + b equals the mean FEV of the smokers. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
35 Example, cont. ŷ = 2.566 + b 1 x. Intercept (2.566) = the mean FEV of group 0 (the nonsmokers). Slope (b 1 ) = the mean difference in FEV between the two groups (because x takes only the values 0 and 1). The t statistic for b 1, with 652 df, gives p < .01, so b 1 is significant, and the 95% CI for the slope excludes 0. Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
36 Smoking increases lung capacity? Children who smoked had higher mean FEV How can this be true given what we know about the deleterious respiratory effects of smoking? ANS: Smokers were older than the nonsmokers AGE confounded the relationship between SMOKE and FEV A multiple regression model can be used to adjust for AGE in this situation Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
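The sign reversal described here can be reproduced by simulation. A sketch with invented coefficients chosen only to mimic the pattern in the survey (older children are more likely to smoke, and smoking truly lowers FEV):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 654
age = rng.uniform(3, 19, size=n)
# Assumed: probability of smoking rises with age (creates the confounding)
smoke = (rng.uniform(size=n) < (age - 3) / 40).astype(float)
# Assumed true model: FEV grows with age; smoking *lowers* it by 0.2
fev = 0.4 + 0.22 * age - 0.2 * smoke + rng.normal(scale=0.3, size=n)

def ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    A = np.column_stack([np.ones(len(y))] + list(X))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

b_simple = ols([smoke], fev)         # FEV on SMOKE only
b_adjusted = ols([smoke, age], fev)  # FEV on SMOKE and AGE

print(round(b_simple[1], 2), round(b_adjusted[1], 2))
```

The unadjusted SMOKE coefficient comes out positive (smokers are older), while adjusting for AGE recovers the true negative effect — exactly the confounding story of the slide.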
37 Extending the analysis: Multiple regression. The SPSS output for our example gives the intercept a and the slopes b 1 (for SMOKE) and b 2 (for AGE). The multiple regression model is: FEV = a - .209(SMOKE) + .231(AGE). Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
38 Multiple Regression Coefficients, cont. The slope coefficient for SMOKE is -.209, suggesting that smokers have .209 less FEV on average compared to nonsmokers (after adjusting for age). The slope coefficient for AGE is .231, suggesting that each year of age is associated with an increase of .231 FEV units on average (after adjusting for SMOKE). Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
39 Inference About the Coefficients. Inferential statistics are calculated for each regression coefficient. For example, in testing H 0 : β 1 = 0 (the SMOKE coefficient controlling for AGE), the t statistic is evaluated on df = n - k - 1 = 654 - 2 - 1 = 651. [SPSS Coefficients table: unstandardized B and Std. Error, standardized Beta, t and Sig. for (Constant), smoke and age; dependent variable: fev.] Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
40 Inference About the Coefficients. The 95% confidence interval for the slope of SMOKE controlling for AGE is read from the SPSS output. [SPSS Coefficients table: 95% confidence interval for B (lower and upper bounds) for (Constant), smoke and age; dependent variable: fev.] Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
41 Assessing the significance of the model. R Square (R 2 ) represents the proportion of variance in the outcome variable that is accounted for by the predictors in the model. For example, if for our previous model R 2 = .23, then 23% of the variance in FEV is accounted for by smoking status and age. Adjusted R 2 compensates for the inflation of R 2 due to overfitting and is useful for comparing the amount of variance explained across several models. The standard error of the estimate is a measure of the accuracy of the predictions. For example, if the SE of the estimate is 0.35 for our previous model, then a prediction interval for the FEV of a non-smoker aged 12 years is the predicted value +/- (t x 0.35).
42 Assessing the significance of the model: hierarchical models. Suppose Model 1: FEV regressed on SMOKE and AGE, R 2 = .23, and Model 2: FEV regressed on SMOKE, AGE and GENDER (coefficient .04), R 2 = .29. The amount of unique variance explained by gender, above and beyond that explained by smoking status and age, is the change in R 2 : .29 - .23 = .06, i.e. 6%. [Venn diagrams showing the variance in FEV shared with SMOKE and AGE, and additionally with GENDER.]
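The unique contribution of a newly added predictor is the change in R² between the nested models. A sketch with simulated data (the variable names follow the example, but every number is invented):

```python
import numpy as np

def r2(X_cols, y):
    """R^2 of an OLS fit of y on the given predictor columns plus intercept."""
    A = np.column_stack([np.ones(len(y))] + list(X_cols))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(7)
n = 300
smoke = (rng.uniform(size=n) < 0.3).astype(float)
age = rng.uniform(5, 18, size=n)
gender = (rng.uniform(size=n) < 0.5).astype(float)
fev = 0.4 + 0.23 * age - 0.2 * smoke + 0.15 * gender + rng.normal(scale=0.4, size=n)

r2_m1 = r2([smoke, age], fev)          # Model 1
r2_m2 = r2([smoke, age, gender], fev)  # Model 2 adds GENDER
delta_r2 = r2_m2 - r2_m1               # unique variance explained by gender
print(round(delta_r2, 4))
```

Note that adding a predictor can never lower R², which is why hierarchical comparisons rely on the significance of the R² change (or on adjusted R²) rather than on the raw increase.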
43 Hierarchical regression in SPSS
44 Dummy Variables More than two levels For categorical variables with k categories, use k 1 dummy variables Ex. SMOKE2 has three levels, initially coded 0 = non-smoker 1 = former smoker 2 = current smoker Use k 1 = 3 1 = 2 dummy variables to code this information like this: Source: Gertsman, B. (2008). Basic biostatistics: Statistics for public health practice. Sudbury, MA: Jones and Bartlett Publishers.
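The k - 1 dummy coding can be written out directly. A sketch using the SMOKE2 coding from the slide, with non-smoker as the (assumed) reference category:

```python
import numpy as np

# SMOKE2: 0 = non-smoker, 1 = former smoker, 2 = current smoker
smoke2 = np.array([0, 1, 2, 2, 0, 1, 0])

# k - 1 = 3 - 1 = 2 dummy variables; non-smoker (0) is the reference,
# so it is the row with zeros on both dummies
d_former = (smoke2 == 1).astype(int)
d_current = (smoke2 == 2).astype(int)

print(np.column_stack([smoke2, d_former, d_current]))
```

In a regression, the coefficient on each dummy is then the mean difference between that category and the reference category.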
45 Use of standardised coefficients. Often thought to be easier to interpret, but standardisation depends on the variances of the independent variables. An unstandardised coefficient can be interpreted directly in the units of the variable. Unstandardised coefficients cannot always be compared, however, if different units are used for the variables.
46 Finding the best regression model. The set of predictors must be chosen based on theory: avoid the "whatever sticks to the wall" approach. The grouping of predictors and the ordering of entry will matter. Selecting the best final model can sometimes be a judgment call.
47 How to judge whether a model is good? Explained variance proportion, as measured by R 2. Size of the regression coefficients. Significance tests (F-test for the model, t-tests for the parameters). Inclusion of all relevant variables (theory!). Is the method appropriate?
48 The six steps to interpreting results. 1. Look at the prediction equation to see an estimate of the relationship. 2. Refer to the standard error of the estimate (in the appropriate model) when making predictions for individuals. 3. Refer to the standard errors of the coefficients (in the most complete model) to see how much you can trust the estimates of the effects of the explanatory variables. 4. Look at the significance levels of the t-ratios to see how strong the evidence is in support of including each of the explanatory variables in the model. 5. Use the coefficient of determination (R 2 ) to measure the potential explanatory power of the model. 6. Compare the beta-weights of the explanatory variables in order to rank them by explanatory importance.
49 Notes on interpreting the results Prediction is NOT causation. In inferring causation, there has to be at least temporal precedence, but temporal precedence alone is still not sufficient. Avoid extrapolating the prediction equation beyond the data range. Always consider the standard errors and the confidence intervals of the parameter estimates. The magnitude of the coefficient of determination (R 2 ), in terms of explanatory power, is a judgment call.
50 Practice exercises! Study: Mathematics Beliefs and Achievement of Elementary School Students in Japan and the United States: Results From the Third International Mathematics and Science Study (TIMSS). House, J. D., 2006 Interpret the parameter estimates Interpret the statistical significance of the predictors Make substantive interpretation about the findings
51 Extensions: Regression. Multiple regression considers the relation between a set of explanatory variables and a response or outcome variable. [Diagram: independent predictors x 1 and x 2 each pointing to outcome y.]
52 Moderating effect: moderated regression. When the independent variable does not affect the outcome directly but rather affects the relationship between the predictor and the outcome. [Diagram: independent variable x 2 pointing at the arrow from independent predictor x 1 to outcome y.]
53 Moderating effect: simple moderating effect. When a categorical independent variable affects the relationship between the predictor and the outcome. [Plot: separate regression lines of Y on X for the categories C1, C2 and C3.]
54 Moderating effects. [Plots of a categorical moderator and of a continuous moderator; y = actual scaled score on the Multidimensional Perfectionism Scale (Hewitt & Flett).]
55 Types of moderators (Sharma et al., 1981):
                                Related to predictor and/or outcome   Not related to predictor and/or outcome
No interaction with predictor   Independent predictor                 Homologizer
Interaction with predictor      Quasi-moderator                       Pure moderator
Homologizer variables affect the strength (rather than the form) of the relationship between predictor and outcome (Zedeck, 1971).
56 Testing Moderation. Moderation effects are also known as interaction effects. An interaction term is the product of the moderator and the relevant predictor (the variable the moderator interacts with). Starting from Y = b 0 + b 1 x 1 + b 2 x 2 + b 3 m, the interaction term is i 1 = x 1 *m. Choosing the moderator and the relevant predictor must have theoretical support; for example, it is possible that the moderator interacts with x 2 instead (i.e., i 1 = x 2 *m). Testing for the interaction effect requires including the interaction term(s) in the regression equation: Y = b 0 + b 1 x 1 + b 2 x 2 + b 3 m + b 4 i 1, and testing H 0 : b 4 = 0.
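The test of H0: b4 = 0 can be carried out with ordinary least squares. A sketch with simulated data in which the true interaction coefficient is 0.5 (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
m = (rng.uniform(size=n) < 0.5).astype(float)  # binary moderator
# Assumed true model: the effect of x1 on y depends on m (interaction b4 = 0.5)
y = 1.0 + 0.3 * x1 + 0.2 * x2 + 0.1 * m + 0.5 * (x1 * m) \
    + rng.normal(scale=0.5, size=n)

# Design matrix includes main effects AND the product term x1*m
A = np.column_stack([np.ones(n), x1, x2, m, x1 * m])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b4 = coef[4]

# t test for H0: b4 = 0
resid = y - A @ coef
s2 = resid @ resid / (n - A.shape[1])           # residual variance
se = np.sqrt(s2 * np.linalg.inv(A.T @ A)[4, 4])  # SE of the b4 estimate
t = b4 / se
print(round(b4, 2), round(t, 1))
```

A large |t| leads to rejecting H0, i.e. evidence that m moderates the x1-y relationship. In practice, centring x1 and m before forming the product term is often recommended to reduce collinearity with the main effects.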
57 Mediating effect Mediated regression When the independent predictor does not affect the outcome directly but affects it through an intermediary variable (the mediator). Intermediary predictor (x 2 ) Independent predictor (x 1 ) Outcome (y)
58 Mediation vs Moderation Mediators explain why or how an independent variable X causes the outcome Y while a moderator variable affects the magnitude and direction of the relationship between X and Y (Saunders, 1956). These two approaches can be combined for more complex analyses: Moderated mediation Mediated moderation
59 Checklists. Moderation: collinearity between predictor and moderator (especially for quasi-moderators); unequal variances between groups based on the moderator; reliability of measures (measurement errors are magnified when creating the product terms). Mediation: theoretical assumptions about the mediator; rationale for selecting the mediator; significance and type (full/partial) of the mediation effect; implied causation (i.e., directional paths).
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationChecking model assumptions with regression diagnostics
@graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor
More informationArea1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed)
Institutional Assessment Report Texas Southern University College of Pharmacy and Health Sciences "An Analysis of 2013 NAPLEX, P4-Comp. Exams and P3 courses The following analysis illustrates relationships
More informationObjectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships
Objectives 2.3 Least-squares regression Regression lines Prediction and Extrapolation Correlation and r 2 Transforming relationships Adapted from authors slides 2012 W.H. Freeman and Company Straight Line
More informationPBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression.
PBAF 528 Week 8 What are some problems with our model? Regression models are used to represent relationships between a dependent variable and one or more predictors. In order to make inference from the
More informationY (Nominal/Categorical) 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV
1 Neuendorf Discriminant Analysis The Model X1 X2 X3 X4 DF2 DF3 DF1 Y (Nominal/Categorical) Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and categorical (nominal) data for a single DV 2. Linearity--in
More informationOrdinary Least Squares Regression Explained: Vartanian
Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent
More informationRegression Model Building
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationChapter 19: Logistic regression
Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog
More informationReminder: Student Instructional Rating Surveys
Reminder: Student Instructional Rating Surveys You have until May 7 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs The survey should be available
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46
BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics
More informationMore Statistics tutorial at Logistic Regression and the new:
Logistic Regression and the new: Residual Logistic Regression 1 Outline 1. Logistic Regression 2. Confounding Variables 3. Controlling for Confounding Variables 4. Residual Linear Regression 5. Residual
More informationClassification & Regression. Multicollinearity Intro to Nominal Data
Multicollinearity Intro to Nominal Let s Start With A Question y = β 0 + β 1 x 1 +β 2 x 2 y = Anxiety Level x 1 = heart rate x 2 = recorded pulse Since we can all agree heart rate and pulse are related,
More informationData files for today. CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav
Correlation Data files for today CourseEvalua2on2.sav pontokprediktorok.sav Happiness.sav Ca;erplot.sav Defining Correlation Co-variation or co-relation between two variables These variables change together
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationWORKSHOP 3 Measuring Association
WORKSHOP 3 Measuring Association Concepts Analysing Categorical Data o Testing of Proportions o Contingency Tables & Tests o Odds Ratios Linear Association Measures o Correlation o Simple Linear Regression
More informationChapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models
Chapter 14 Multiple Regression Models 1 Multiple Regression Models A general additive multiple regression model, which relates a dependent variable y to k predictor variables,,, is given by the model equation
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationInference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3
Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency
More informationRegression Analysis: Exploring relationships between variables. Stat 251
Regression Analysis: Exploring relationships between variables Stat 251 Introduction Objective of regression analysis is to explore the relationship between two (or more) variables so that information
More informationUnit 6 - Simple linear regression
Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationJob Training Partnership Act (JTPA)
Causal inference Part I.b: randomized experiments, matching and regression (this lecture starts with other slides on randomized experiments) Frank Venmans Example of a randomized experiment: Job Training
More informationData Analysis 1 LINEAR REGRESSION. Chapter 03
Data Analysis 1 LINEAR REGRESSION Chapter 03 Data Analysis 2 Outline The Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression Other Considerations in Regression Model Qualitative
More informationExample. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors
More informationANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula
ANCOVA Workings of ANOVA & ANCOVA ANCOVA, Semi-Partial correlations, statistical control Using model plotting to think about ANCOVA & Statistical control You know how ANOVA works the total variation among
More informationRegression ( Kemampuan Individu, Lingkungan kerja dan Motivasi)
Regression (, Lingkungan kerja dan ) Descriptive Statistics Mean Std. Deviation N 3.87.333 32 3.47.672 32 3.78.585 32 s Pearson Sig. (-tailed) N Kemampuan Lingkungan Individu Kerja.000.432.49.432.000.3.49.3.000..000.000.000..000.000.000.
More information1 Correlation and Inference from Regression
1 Correlation and Inference from Regression Reading: Kennedy (1998) A Guide to Econometrics, Chapters 4 and 6 Maddala, G.S. (1992) Introduction to Econometrics p. 170-177 Moore and McCabe, chapter 12 is
More informationAcknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression
INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical
More informationTypes of Statistical Tests DR. MIKE MARRAPODI
Types of Statistical Tests DR. MIKE MARRAPODI Tests t tests ANOVA Correlation Regression Multivariate Techniques Non-parametric t tests One sample t test Independent t test Paired sample t test One sample
More informationChapter 16: Correlation
Chapter : Correlation So far We ve focused on hypothesis testing Is the relationship we observe between x and y in our sample true generally (i.e. for the population from which the sample came) Which answers
More informationMultivariate and Multivariable Regression. Stella Babalola Johns Hopkins University
Multivariate and Multivariable Regression Stella Babalola Johns Hopkins University Session Objectives At the end of the session, participants will be able to: Explain the difference between multivariable
More informationESP 178 Applied Research Methods. 2/23: Quantitative Analysis
ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and
More informationFinding Relationships Among Variables
Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis
More informationIn Class Review Exercises Vartanian: SW 540
In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE
More informationRegression in R. Seth Margolis GradQuant May 31,
Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence
More informationTOPIC 9 SIMPLE REGRESSION & CORRELATION
TOPIC 9 SIMPLE REGRESSION & CORRELATION Basic Linear Relationships Mathematical representation: Y = a + bx X is the independent variable [the variable whose value we can choose, or the input variable].
More information11 Correlation and Regression
Chapter 11 Correlation and Regression August 21, 2017 1 11 Correlation and Regression When comparing two variables, sometimes one variable (the explanatory variable) can be used to help predict the value
More informationWorkshop Research Methods and Statistical Analysis
Workshop Research Methods and Statistical Analysis Session 2 Data Analysis Sandra Poeschl 08.04.2013 Page 1 Research process Research Question State of Research / Theoretical Background Design Data Collection
More informationECON 497 Midterm Spring
ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain
More informationInter Item Correlation Matrix (R )
7 1. I have the ability to influence my child s well-being. 2. Whether my child avoids injury is just a matter of luck. 3. Luck plays a big part in determining how healthy my child is. 4. I can do a lot
More informationTrendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues
Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +
More information