Correlation and Regression Notes


Relationship Hypothesis Tests

Categorical / Categorical Relationship (Chi-Squared Independence Test)
Ho: The categorical variables are independent (the distributions of conditional probabilities are the same).
Ha: The categorical variables are dependent (the distributions of conditional probabilities are different).

Categorical / Quantitative Relationship (ANOVA)
Ho: µ1 = µ2 = µ3 = µ4 = µ5 = µ6 (the categorical variable and the quantitative variable are independent, i.e., not related).
Ha: At least one µ is different (the categorical variable and the quantitative variable are dependent, i.e., related).

Quantitative / Quantitative Relationship (Correlation Hypothesis Test)

Regression / Correlation: See whether there is a linear relationship between two different quantitative variables. The study of that relationship is often called correlation and regression.

Scatterplot: a graph for visually checking whether or not there is correlation.

I. Choosing your variables

Choose which variable will be x (the explanatory or independent variable) and which will be y (the response or dependent variable).

Is one of the variables a natural response variable?
Ex) Year (time) and the unemployment rate in the U.S. Let the explanatory variable x be time (years) and let the response variable y be the unemployment rate. Unemployment responds to time, but not the other way around.
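As an aside, the chi-squared statistic behind the independence test named above can be computed by hand from a contingency table. Here is a minimal Python sketch on a hypothetical 2x2 table (the counts are made up purely for illustration):

```python
# Hand-compute the chi-squared statistic for a made-up 2x2 table.
# Ho says the two categorical variables are independent, so each
# expected count is (row total * column total) / grand total.

# Hypothetical observed counts: rows = group A/B, columns = yes/no.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

print(round(chi_sq, 2))  # compare to a chi-squared critical value (df = 1)
```

A larger statistic means the observed counts sit farther from what independence predicts, which is evidence for Ha.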

If the variables respond to each other, pick the response variable to be the one you are most interested in or may want to make predictions about.
Ex) The unemployment rate in the U.S. and the national debt in the U.S. If you are studying the national debt and factors that may be related to it, then you should make the national debt your response variable y (which means the unemployment rate would be the explanatory variable x).

II. Graphing your data (Scatterplot and Correlation Coefficient r)

Make ordered pairs (x, y) from your x and y data and create a scatterplot.
Statcato: Graph > Scatterplot > pick columns for x and y > show regression curve (linear) > OK
StatCrunch: Graph > Scatterplot > pick columns for x and y > Compute

Correlation study: see how well the ordered-pair quantitative data fit a line (the regression line).

Correlation Coefficient (r): a number between -1 and +1 that measures the strength and direction of linear correlation. (Always look at the scatterplot along with the r value; do not judge by the r value alone.)

r close to +1 (e.g., r = +0.893): strong positive correlation. The line goes up from left to right (positive slope) and the points in the scatterplot are close to the line. (r around +0.6, +0.7, +0.8, or +0.9 usually indicates pretty strong positive correlation.)

r close to -1 (e.g., r = -0.916): strong negative correlation. The line goes down from left to right (negative slope) and the points in the scatterplot are close to the line. (r around -0.6, -0.7, -0.8, or -0.9 usually indicates pretty strong negative correlation.)

r close to 0 (e.g., +0.037 or -0.009): no linear correlation. The points in the scatterplot do not follow any linear pattern (though the relationship could still be nonlinear). (r around 0.0 or ±0.1 usually indicates no linear correlation.)

r around ±0.2 or ±0.3 usually indicates very weak linear correlation: there is some linear pattern, but the points are very far from the regression line.

r around ±0.4 or ±0.5 usually indicates moderate linear correlation: there is a linear pattern and the points are only moderately close to the regression line.
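The notes compute r with Statcato or StatCrunch. As a cross-check, here is a minimal Python sketch that evaluates the defining formula for r directly; the (x, y) data are made up solely to illustrate a strong positive correlation:

```python
# Compute the correlation coefficient r from its defining formula:
# r = sum((x - x̄)(y - ȳ)) / sqrt(sum((x - x̄)²) * sum((y - ȳ)²))
import math

# Hypothetical (x, y) pairs for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # close to +1: strong positive correlation
```

Because these points lie nearly on a line with positive slope, r comes out very close to +1, matching the "close to the line, going up from left to right" description above.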

III. R-Squared (Squaring the Correlation Coefficient r)

R-squared: the percentage of the variability in y (response) that can be explained by the linear relationship with x (explanatory).

Confounding Variables: other variables, besides the explanatory variable (x) we are studying, that might influence the response variable (y).

IV. Standard Deviation of the Residual Errors (Se)

Two meanings: average distance from the line, and prediction error.
1. The average distance that points are from the regression line.
2. If we use the regression line to make a prediction, the standard deviation of the residuals tells us how much average error we can expect in that prediction.

Residual: how far a point is above or below the regression line.

Regression Line (Line of Best Fit, Least-Squares Line)
ŷ = A + Bx (OLI book)
ŷ = b0 + b1x (most statistics books)
b0 is the y-intercept (where the line crosses the y-axis); a starting value.
b1 is the slope (the average rate of change).

Note: Remember that in a linear equation, the number in front of x is the slope.
Note: ŷ refers to the predicted y value, a y value predicted by the regression line equation, not an actual y value from one of the ordered pairs in the scatterplot.

Definition of Slope (b1): the amount of increase (+) or decrease (-) in the y-variable for every 1-unit increase in the x-variable (per unit of x).
Definition of Y-intercept (b0): the predicted y value when x is zero. Can also be thought of as an initial value of y.
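All of the quantities in sections III and IV (b0, b1, r-squared, and Se) come from the same handful of sums. A minimal Python sketch, again on made-up data, shows how they fit together:

```python
# Fit the least-squares line ŷ = b0 + b1*x, then compute r-squared
# and Se (standard deviation of the residuals) for made-up data.
import math

# Hypothetical (x, y) pairs for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

# Residual = actual y minus predicted ŷ for each point.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Se divides by n - 2 (two estimated parameters: b0 and b1).
se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# r-squared = 1 - SSE/SST: fraction of variability in y
# explained by the linear relationship with x.
sst = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - sum(e ** 2 for e in residuals) / sst

print(round(b0, 3), round(b1, 3), round(se, 3), round(r_squared, 3))
```

Note the two interpretations from above show up directly: Se summarizes the size of the residuals (average prediction error), while r-squared compares those residuals to the total variability in y.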

Statcato Directions: Statistics > Correlation and Regression > Linear > pick x and y columns > Show scatterplot and residual plots > OK
StatCrunch Directions: Stat > Regression > Simple Linear > pick x and y columns > Compute

Example 1: (Health Data) Is a woman's age related to her diastolic blood pressure?

Pick x and y (blood pressure responds to age, but age does not respond to blood pressure):
X (explanatory or independent variable): woman's age
Y (response or dependent variable): diastolic blood pressure

[Statcato scatterplot and correlation/regression printout]

The scatterplot and r value show a strong positive correlation (r = 0.6359).

r-squared = 0.404 = 40.4%
r-squared sentence: 40.4% of the variability in a woman's diastolic blood pressure (in mm of Hg) can be explained by the linear relationship with the woman's age (in years).

Confounding variables (other influences on blood pressure)? Race, ethnicity, stress, genetics, diet, etc.

Standard deviation of the residual errors: Se = 9.0898 mm of Hg
Two sentences for Se:
1. Points in the scatterplot are, on average, about 9.1 mm of Hg away from the regression line.

2. If we use the regression line to predict a woman's diastolic blood pressure from her age, we can expect an average error of about 9.1 mm of Hg.

Slope of the regression line? 0.5937 (rate of change between x and y)
Slope = (Change in Y) / (Change in X) = (+0.59 mm of Hg) / (+1 year)
Slope sentence: A woman's diastolic blood pressure increases by about 0.59 mm of Hg per year, on average.

Y-intercept? 47.7 (predicted y value when x is zero)
Y-intercept sentence: When a woman is zero years old (just born), we predict her diastolic blood pressure to be 47.7 mm of Hg.

Note: Predicted y values are only accurate within the scope of the x-values in the data. Many formulas are not designed to have zero plugged in for x, so y-intercepts don't always make sense in context. In this example the women in the data set had ages between 12 and 59; zero is not in that scope, and the formula was never designed to take x = 0. The y-intercept is therefore an extrapolation and may not be very accurate.

Extrapolation: plugging a number into a formula that is outside the scope of the data; plugging in a number the formula was never designed to handle.

Use the regression line to predict the diastolic blood pressure of a 50-year-old woman. (Replace x with 50 and work it out.)
Note: 50 is in the scope of the x-values (between 12 and 59), so this would not be an extrapolation.
ŷ = 47.6999 + 0.5937x
ŷ = 47.6999 + 0.5937(50)
ŷ = 47.6999 + 29.685
ŷ = 77.3849 ≈ 77.4 mm of Hg

How much error is there in that prediction? (Use Se = 9.0898!)
The prediction could have an average error of about 9.1 mm of Hg.
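The prediction above, together with the extrapolation warning, can be sketched in a few lines of Python. The function name predict_bp and the explicit scope check are illustrative additions, not part of the notes; the intercept, slope, and age range come from Example 1:

```python
# Predict diastolic blood pressure (mm of Hg) from age using the
# fitted line from Example 1: ŷ = 47.6999 + 0.5937x, and refuse to
# extrapolate outside the scope of the data (ages 12 to 59).

B0 = 47.6999          # y-intercept from the regression printout
B1 = 0.5937           # slope from the regression printout
X_MIN, X_MAX = 12, 59 # scope of the ages in the data set

def predict_bp(age):
    """Predicted diastolic blood pressure for a given age (years)."""
    if not (X_MIN <= age <= X_MAX):
        raise ValueError(f"age {age} is outside the data's scope; "
                         "the prediction would be an extrapolation")
    return B0 + B1 * age

print(round(predict_bp(50), 1))  # 77.4, give or take Se ≈ 9.1 mm of Hg
```

Calling predict_bp(0) raises an error instead of returning the y-intercept of 47.7, which mirrors the note above: the formula was never designed to take x = 0.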