Announcements. Lecture 10: Relationship between Measurement Variables. Poverty vs. HS graduate rate. Response vs. explanatory


Transcription:

Announcements

Lecture 10: Relationship between Measurement Variables
Statistics (Colin Rundel)

- In-class Quiz #2 at the end of class
- Midterm #1 on Friday; in-class review on Wednesday
- Today's material: we probably will not finish it today, and it will not be on Midterm #1

Intro to Regression: Poverty vs. HS graduate rate

The scatterplot below shows the relationship between the HS graduate rate in all 50 states and the District of Columbia and the % of residents who live below the poverty line (< $22,350 for a family of 4).

[scatterplot: % HS grad (x) vs. % in poverty (y)]

Response vs. explanatory

What are the response and explanatory variables for these data? Since the goal is to predict poverty from graduation rates, % in poverty is the response variable and % HS grad is the explanatory variable.
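As a rough sketch of how this scatterplot could be drawn, assuming the data live in a data frame called poverty with columns graduates and poverty (the column names used in the cor() call later in this lecture):

    # scatterplot of the response (% in poverty) against the explanatory variable (% HS grad)
    plot(poverty$graduates, poverty$poverty,
         xlab = "% HS grad", ylab = "% in poverty")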

Eyeballing the line

Which of the following appears to be the line that best fits the linear relationship between % in poverty and % HS grad?

[scatterplot with four candidate lines, (a) through (d)]

Residuals

Residuals are the leftovers from the model fit:

Data = Fit + Residual

Residuals (cont.)

A residual is the difference between the observed and the predicted y:

eᵢ = yᵢ − ŷᵢ

The % living in poverty in DC is 5.44% more than predicted. The % living in poverty in RI is 4.16% less than predicted.

[scatterplot with the regression line, highlighting the residuals for DC (above the line) and RI (below the line)]

Describing the relationship

What to include: shape, direction, and strength. How would you describe the relationship between % HS grad and % in poverty?
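A minimal sketch of the Data = Fit + Residual decomposition in R, again assuming the hypothetical poverty data frame (lm() is my choice here; the deck introduces the least squares line formally a few slides later):

    # fit the least squares line, then verify Data = Fit + Residual
    fit <- lm(poverty ~ graduates, data = poverty)
    head(fitted(fit))  # predicted values, y-hat_i
    head(resid(fit))   # residuals, e_i = y_i - y-hat_i
    all.equal(poverty$poverty, as.numeric(fitted(fit) + resid(fit)))  # TRUE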

Quantifying the relationship

Correlation describes the strength and direction of the linear relationship between two variables. It takes values between −1 (perfect negative relationship) and +1 (perfect positive relationship). A value of 0 indicates no linear relationship.

Guessing the correlation

Which of the following is the best guess for the correlation between % in poverty and % HS grad?

(a) 0.6  (b) −0.75  (c) −0.1  (d) 0.02  (e) −1.5

[scatterplot: % HS grad vs. % in poverty]

Guessing the correlation

Which of the following is the best guess for the correlation between % in poverty and % female householder, no husband present?

(a) 0.1  (b) −0.6  (c) −0.4  (d) 0.9  (e) 0.5

[scatterplot: % female householder, no husband present vs. % in poverty]

Calculating the correlation

Using computation: cor(poverty$poverty, poverty$graduates)

Using a formula:

R = (1 / (n − 1)) Σᵢ₌₁ⁿ ((xᵢ − x̄) / s_x) ((yᵢ − ȳ) / s_y)

Note: You won't be asked to calculate the correlation coefficient by hand, because nobody does it by hand. But you might be given a scatterplot and asked to guess the correlation.
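The formula above translates directly into R; a small check against the built-in cor(), using the same hypothetical data frame as before:

    # correlation by the formula, compared to the built-in function
    x <- poverty$graduates
    y <- poverty$poverty
    n <- length(x)
    R <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (n - 1)
    c(by_formula = R, built_in = cor(x, y))  # the two should agree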

Assessing the correlation

Which of the following has the strongest correlation, i.e. a correlation coefficient closest to +1 or −1?

[four scatterplots, (a) through (d)]

Play the game!

http://istics.net/stat/correlations/

A measure for the best line

We want a line that has small residuals. One option: minimize the sum of the magnitudes (absolute values) of the residuals,

|e₁| + |e₂| + ⋯ + |eₙ|

Another option: minimize the sum of squared residuals,

e₁² + e₂² + ⋯ + eₙ²

The line that minimizes the sum of squared residuals is the least squares line.

Why minimize squares?

1. It is the most commonly used approach.
2. It is easier to compute by hand and using software.
3. In many applications, a residual twice as large as another is more than twice as bad.
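To make "minimize the sum of squared residuals" concrete, here is a sketch that searches for the minimizing line numerically and compares it to lm(); the sse() helper is mine, not from the deck:

    # the least squares line minimizes the sum of squared residuals
    sse <- function(b, x, y) sum((y - (b[1] + b[2] * x))^2)
    best <- optim(c(0, 0), sse, x = poverty$graduates, y = poverty$poverty)
    best$par                                       # (intercept, slope) found numerically
    coef(lm(poverty ~ graduates, data = poverty))  # lm() gives the same line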

The least squares line

ŷ = β₀ + β₁ x

where ŷ is the predicted response, x is the explanatory variable, β₀ is the intercept, and β₁ is the slope.

Notation:
Intercept: parameter β₀, point estimate b₀
Slope: parameter β₁, point estimate b₁

Given...

               % HS grad (x)   % in poverty (y)
  mean         x̄ = 86.01       ȳ = 11.35
  sd           s_x = 3.73      s_y = 3.1
  correlation  R = −0.75

Slope

The slope of the regression can be calculated as

b₁ = (s_y / s_x) R

In context:

b₁ = (3.1 / 3.73) × (−0.75) = −0.62

Interpretation: for each additional percentage point in the HS graduate rate, we would expect the % living in poverty to decrease on average by 0.62 percentage points.

Intercept

The intercept is where the regression line intersects the y-axis. The calculation of the intercept uses the fact that a regression line always passes through (x̄, ȳ):

ȳ = b₀ + b₁ x̄, so b₀ = ȳ − b₁ x̄ = 11.35 − (−0.62 × 86.01) = 64.68

[scatterplot with the regression line extended left to x = 0, marking the intercept on the y-axis]
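The slope and intercept follow directly from the summary statistics on the slide; a quick sketch of the arithmetic in R:

    # slope and intercept from summary statistics
    x_bar <- 86.01; s_x <- 3.73   # % HS grad
    y_bar <- 11.35; s_y <- 3.1    # % in poverty
    R <- -0.75
    b1 <- round((s_y / s_x) * R, 2)  # -0.62
    b0 <- y_bar - b1 * x_bar         # 64.68 (using the rounded slope, as on the slide)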

Regression line

ŷ = 64.68 − 0.62 x, i.e. predicted % in poverty = 64.68 − 0.62 × (% HS grad)

[scatterplot of % HS grad vs. % in poverty with the fitted regression line]

Interpreting regression line parameter estimates

Intercept: when x = 0, y is expected to equal the intercept.
Slope: for each unit increase in x, y is expected to increase/decrease on average by the slope.

Examples of extrapolation

Applying a model estimate to values outside of the realm of the original data is called extrapolation. Sometimes the intercept might be an extrapolation: here, x = 0 (a 0% HS graduate rate) is far below any rate observed in the data, so the intercept of 64.68 should not be interpreted on its own.

[scatterplot with the regression line extended far to the left of the observed data, down to x = 0]
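A small sketch of prediction versus extrapolation with this line (predict_poverty() is a hypothetical helper, not from the deck):

    # prediction within the data vs. extrapolation outside it
    predict_poverty <- function(hs_grad) 64.68 - 0.62 * hs_grad
    predict_poverty(86)  # 11.36; 86% is a typical HS grad rate, so this is a prediction
    predict_poverty(20)  # 52.28; no state is anywhere near 20%, so this is an extrapolation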

Examples of extrapolation

[two example figures illustrating misleading extrapolations]

Conditions: (1) Linearity

1. Linearity: the relationship between the explanatory and the response variable should be linear. Methods for fitting a model to non-linear relationships exist, but are beyond the scope of this class. Check using a scatterplot of the data, or a residuals plot.
2. Nearly normal residuals (see below)
3. Constant variability (see below)

Residuals plot

RI: % HS grad = 81, % in poverty = 10.3
ŷ = 64.68 − 0.62 × 81 = 14.46
e = % in poverty − ŷ = 10.3 − 14.46 = −4.16

DC: % HS grad = 86, % in poverty = 16.8
ŷ = 64.68 − 0.62 × 86 = 11.36
e = % in poverty − ŷ = 16.8 − 11.36 = 5.44

[residuals plot: residuals vs. % HS grad, with RI below the 0 line and DC above it]

Conditions: (2) Nearly normal residuals

The residuals should be nearly normal. This condition may not be satisfied when there are unusual observations that don't follow the trend of the rest of the data. Check using a histogram or normal probability plot of the residuals.

[histogram of the residuals and a normal Q-Q plot of the residuals]

Conditions: (3) Constant variability

The variability of points around the least squares line should be roughly constant. This implies that the variability of the residuals around the 0 line should be roughly constant as well. This is also called homoscedasticity. Check using a residuals plot.

Checking conditions

What condition is this linear model obviously violating?

[scatterplot with a fitted line and the corresponding residuals plot]
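These checks are usually done graphically; a minimal sketch with base R, assuming the fit from earlier:

    # diagnostic plots for the three conditions
    fit <- lm(poverty ~ graduates, data = poverty)
    plot(poverty$graduates, resid(fit)); abline(h = 0)  # linearity and constant variability
    hist(resid(fit))                                    # nearly normal residuals
    qqnorm(resid(fit)); qqline(resid(fit))              # normal probability plot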

Checking conditions

What condition is this linear model obviously violating?

[scatterplot with a fitted line and the corresponding residuals plot]

R²

The strength of the fit of a linear model is most commonly evaluated using R², which is calculated as the square of the correlation coefficient. It tells us what percent of the variability in the response variable is explained by the model; the remainder is explained by variables not included in the model. For the model we've been working with, R² = (−0.75)² = 0.56.

Interpretation of R²

Which of the below is the correct interpretation of R = −0.75, R² = 0.56?
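A one-line check of the relationship between the correlation coefficient and R², using the hypothetical poverty data frame:

    # R^2 is the squared correlation, and matches the value reported by lm()
    cor(poverty$poverty, poverty$graduates)^2
    summary(lm(poverty ~ graduates, data = poverty))$r.squared  # same number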