AMS 7 Correlation and Regression Lecture 8


AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Summer 2014 1 / 18

Correlation: pairs of continuous observations. Correlation exists between two variables when one of them is related to the other in some way, e.g. height and weight of people, temperature and altitude, quiz scores and midterm scores.
Query 1: Do the two variables change together?
Query 2: Can changes in one variable predict changes in the other?
Query 3: How do we measure the strength of the relationship between two quantitative variables? 2 / 18

Scatterplot: a graph of the paired sample data. 3 / 18

The linear correlation coefficient, r, measures the linear association between two variables. Properties:
−1 ≤ r ≤ 1.
It does not change if we change the scale of measurement.
It is sensitive to outliers.
You need to understand the concept, but you don't need to know the formula or how to test on it.
If r = 1, there is a perfect positive linear relationship.
If r = −1, there is a perfect negative linear relationship.
If r = 0, there is no linear relationship. 4 / 18
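The lecture computes r in JMP; as an illustrative sketch only (the data below are made up), the two key properties — r stays in [−1, 1] and is unchanged by rescaling a variable — can be checked numerically with numpy:

```python
import numpy as np

# Hypothetical paired data (e.g., quiz scores x and midterm scores y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.0, 5.5])

# Linear correlation coefficient r
r = np.corrcoef(x, y)[0, 1]

# r does not change under a change of scale (e.g., converting units of x)
r_scaled = np.corrcoef(10 * x + 3, y)[0, 1]
```

Here r is close to 1, reflecting the strong positive linear trend in the made-up data.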


Correlation is not causation!!!
Coefficient of determination, r²: gives the proportion of the variation in one variable that is explained by the linear association between the two variables.
0 ≤ r² ≤ 1: 0 indicates no linear relationship, while 1 indicates a perfect linear relationship. 6 / 18

Review of lines: y = 1 + 2x.
Slope = 2 = Δy/Δx: for each one-unit change in x, y changes by 2 units.
Intercept = 1: the value of y when x = 0.
In general, y = b₀ + b₁x, where b₀ is the intercept and b₁ is the slope. 7 / 18


Linear Regression: fitting a line to data, to model the relationship between two quantitative variables. Many lines can be fit to the data; which do we choose?
Fitted line = regression line = least-squares line.
Fitted values = predicted values = the values predicted by the line for a particular value of x. 9 / 18

Example: the fitted line is ŷ = 1 + (1/2)x. The fitted values would be:
x = 1: ŷ = 1 + 1/2 = 3/2
x = 2: ŷ = 1 + (1/2)(2) = 2
x = 3: ŷ = 1 + (1/2)(3) = 5/2
x = 4: ŷ = 1 + (1/2)(4) = 3
10 / 18
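The fitted values above are just the line evaluated at each x; a minimal sketch of that computation (the function name is ours, not from the slides):

```python
def fitted_value(x, b0=1.0, b1=0.5):
    """Predicted value from the fitted line yhat = b0 + b1 * x."""
    return b0 + b1 * x

# Fitted values for x = 1, 2, 3, 4 from the slide's example
preds = [fitted_value(x) for x in [1, 2, 3, 4]]
# preds == [1.5, 2.0, 2.5, 3.0], i.e. 3/2, 2, 5/2, 3
```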

Regression is the prediction of Y from X, assuming a linear relationship. X and Y are not treated the same: we are predicting Y from X! The regression line (least-squares line) is the one that minimizes the sum of squared errors in predicting Y (the sum of squared residuals): b₀ and b₁ are chosen to minimize
Σ_{i=1}^{n} (ŷ_i − y_i)² = Σ_{i=1}^{n} (b₀ + b₁x_i − y_i)².
The line always goes through (x̄, ȳ). Note that this does not minimize the (perpendicular) distance to the line. 11 / 18
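You don't need these by hand (see the next slide), but for illustration the minimizers have the standard closed form b₁ = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and b₀ = ȳ − b₁x̄; a sketch with hypothetical data, which also confirms the line passes through (x̄, ȳ):

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.0, 5.5])

xbar, ybar = x.mean(), y.mean()

# Least-squares estimates of slope and intercept
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# The fitted line evaluated at xbar equals ybar:
# the regression line always goes through (xbar, ybar)
at_mean = b0 + b1 * xbar
```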

You don t need to know how to compute b 0 and b 1 by hand. You will need to know how to interpret JMP output, compute predicted values, and do hypothesis tests with JMP. Some data examples 12 / 18

How good is a regression model? Statistical significance - test if β 1 = 0 Practical significance - r 2 Check model assumptions - residual plots 13 / 18

Hypothesis Testing for Regression
The model is y = β₀ + β₁x, where β₀ and β₁ are population parameters. If there is a linear relationship between x and y, then β₁ ≠ 0. This is a t-test with n − 2 degrees of freedom.
1. H₀: β₁ = 0 vs. H₁: β₁ ≠ 0
2. Level of significance α = 0.05
3. Test statistic: t = (b₁ − 0)/s_{b₁} (sampling distribution under H₀ is t with n − 2 df)
4. Compute t and its p-value with JMP
5. Reject H₀ if p-value < 0.05
6. Draw conclusions about the linear relationship
14 / 18
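In the course the t statistic and p-value come from JMP; purely as an illustrative sketch of what JMP computes (hypothetical data; the standard error formula s_{b₁} = s/√Σ(x_i − x̄)² is the usual one, not stated on the slide):

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.0, 5.5])
n = len(x)

# Least-squares fit
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Residual standard deviation s, using n - 2 degrees of freedom
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Standard error of the slope, and the t statistic for H0: beta1 = 0
s_b1 = s / np.sqrt(np.sum((x - xbar) ** 2))
t = (b1 - 0) / s_b1
df = n - 2
```

The p-value would then be read off a t distribution with n − 2 df (here df = 3), which is what JMP reports.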

r² = square of the correlation between x and y = % of the variability in y explained by predicting from x:
r² = Σ_{i=1}^{n} (ŷ_i − ȳ)² / Σ_{i=1}^{n} (y_i − ȳ)² = explained variation / total variation.
Recall that s²_y = (1/(n − 1)) Σ_{i=1}^{n} (y_i − ȳ)².
0 ≤ r² ≤ 1.
Gives a measure of practical significance. 15 / 18
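A quick numerical check of the slide's two descriptions of r² — explained over total variation, and the square of the correlation — on hypothetical data:

```python
import numpy as np

# Hypothetical paired data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 3.5, 4.0, 5.5])

# Least-squares fit and fitted values yhat
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
yhat = b0 + b1 * x

# r^2 as explained variation over total variation
r2 = np.sum((yhat - ybar) ** 2) / np.sum((y - ybar) ** 2)

# ... equals the square of the correlation between x and y
r = np.corrcoef(x, y)[0, 1]
```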

Model assumptions:
1. y is normally distributed with mean β₀ + β₁x and standard deviation σ.
2. The relationship between x and y is linear.
3. σ is the same for all observations.
4. The observation (x_i, y_i) is independent of (x_j, y_j) (conditional on β₀, β₁).
How do we check these? Hypothesis test for (1) and (2); residual analysis for (2), (3) and (4). 16 / 18

Residuals: e_i = y_i − ŷ_i.
Plot x_i vs. e_i, or ŷ_i vs. e_i (but NOT y_i vs. e_i, since y_i and e_i are correlated).
Make sure there are no patterns in the plot: check for non-linearity; check for changes in variability (heteroscedasticity). Patterns indicate violations of the assumptions!
Prediction is valid only when the fit is statistically significant and there are no problems with the residuals.
Prediction interval: a confidence interval for a predicted value; get it from JMP. 17 / 18

Key Concepts!!!!!
Correlation
Slope and Intercept
Fitted vs. Predicted values
Test for linear relationship
r²
Residual analysis
18 / 18