Simple Linear Regression for the MPG Data

Simple Linear Regression for the MPG Data
[Scatterplot of MPG vs. Wgt]

What do we do with the data? $y_i$ = MPG of the $i$th car, $x_i$ = weight of the $i$th car, $i = 1, \dots, n$, $n$ = sample size.

Exploratory Techniques 1. Scatterplots
[Scatterplot of MPG vs. Wgt]
What to look for: (1) Form: linear? (2) Direction: positive or negative? (3) Strength (4) Outliers

Exploratory Techniques 1b. Scatterplots with a smooth curve
[Scatterplot of mpg vs. weight with a smooth curve overlaid]
What to look for: (1) Form: linear? (2) Direction: positive or negative? (3) Strength (4) Outliers
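
A minimal R sketch of this exploratory step. The course data set is not included in the transcript, so the code simulates a stand-in data frame cars.df with columns Wgt and MPG (the column names and simulated values are assumptions, chosen only to roughly match the plots on the slides); substitute the real MPG data when reproducing the slides.

    # Exploratory scatterplot with a smooth curve overlaid
    set.seed(1)
    n       <- 100
    Wgt     <- runif(n, 2000, 3500)                  # hypothetical weights (lbs)
    MPG     <- 51.6 - 0.01 * Wgt + rnorm(n, 0, 4.7)  # hypothetical MPG values
    cars.df <- data.frame(Wgt = Wgt, MPG = MPG)

    plot(MPG ~ Wgt, data = cars.df, xlab = "Weight (lbs)", ylab = "MPG")
    lines(lowess(cars.df$Wgt, cars.df$MPG), col = "red", lwd = 2)  # smooth curve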

Exploratory Techniques 2. Numerical Summaries of the Data
a) Covariance: $\mathrm{Cov}(Y,X) = \sum_{i=1}^n (y_i - \bar y)(x_i - \bar x) / (n-1)$
[Scatterplot of MPG vs. Wgt divided into quadrants I-IV about $(\bar x, \bar y)$]
In quadrants I and IV, $(y_i - \bar y)(x_i - \bar x) < 0$; in quadrants II and III, $(y_i - \bar y)(x_i - \bar x) > 0$.
$\mathrm{Cov}(Y,X) < 0 \Rightarrow$ negative relationship; $\mathrm{Cov}(Y,X) > 0 \Rightarrow$ positive relationship.
Covariance is affected by the scales of Y and X, so its magnitude is not interpretable.
For the MPG data, $\mathrm{Cov}(Y,X) = -2304.581$.

Exploratory Techniques 2. Numerical Summaries of the Data
b) Correlation: $\mathrm{Corr}(Y,X) = \frac{1}{n-1}\sum_{i=1}^n \left(\frac{y_i - \bar y}{s_y}\right)\left(\frac{x_i - \bar x}{s_x}\right) = \frac{\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x)}{\sqrt{\sum_{i=1}^n (y_i - \bar y)^2}\sqrt{\sum_{i=1}^n (x_i - \bar x)^2}}$
where $s_y$ and $s_x$ are the standard deviations of Y and X.
For the MPG data, $\mathrm{Corr}(Y,X) = -0.71$.
Properties: 1. $-1 \le \mathrm{Corr}(Y,X) \le 1$ 2. Scale invariant
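
A short R sketch, continuing with the simulated stand-in cars.df from the scatterplot sketch above, that computes the covariance and correlation from these definitions and checks them against R's built-in cov() and cor(). The numerical values will differ from the slide's -2304.581 and -0.71 because the data are simulated.

    # cars.df comes from the exploratory-plot sketch above
    y <- cars.df$MPG; x <- cars.df$Wgt; n <- nrow(cars.df)

    cov.xy  <- sum((y - mean(y)) * (x - mean(x))) / (n - 1)   # covariance by hand
    corr.xy <- cov.xy / (sd(y) * sd(x))                       # correlation by hand

    c(cov.xy, cov(y, x))    # negative: heavier cars tend to have lower MPG
    c(corr.xy, cor(y, x))   # scale-free and between -1 and 1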

Exploratory Techniques 2. Numerical Summaries of the Data
b) Correlation, Warning #1: Correlation is only for linear relationships!
[Scatterplot of a strong nonlinear (curved) relationship for which $\mathrm{Corr}(Y,X) = 0$]

Exploratory Techniques 2. Numerical Summaries of the Data
b) Correlation, Warning #2: Correlation is highly affected by outliers!
[Two scatterplots illustrating the effect of an outlier: $\mathrm{Corr}(Y,X) = 0.81$ in one panel and $\mathrm{Corr}(Y,X) = 0.99$ in the other]
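
The two warnings can be illustrated with a few lines of R on made-up data (everything below is invented purely for illustration): a perfect nonlinear relationship can have correlation near zero, and a single outlier can noticeably change the correlation.

    x <- seq(-4, 4, length.out = 50)
    cor(x, x^2)                          # essentially 0 despite an exact relationship

    set.seed(2)
    x2 <- runif(30, 4, 14)
    y2 <- 0.3 * x2 + rnorm(30, sd = 0.4)
    cor(x2, y2)                          # strong linear relationship
    cor(c(x2, 14), c(y2, 12))            # adding one outlier changes the correlation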

What is a good model for the MPG Data? $y_i \stackrel{iid}{\sim} p_Y(y_i)$ with $E(y_i) = f(x_{i1}, \dots, x_{ip})$. What should we use for $p_Y$ and $f$?

Simple Linear Regression Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
Or, equivalently (how?): $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)$.
A Few Notes:
$\epsilon_i$: residuals, the distance to the mean
$\beta_0$: intercept coefficient
$\beta_1$: slope coefficient
$\sigma$: standard deviation about the line
$\sigma^2$: variance about the line
iid: independent and identically distributed

Simple Linear Regression Model

Simple Linear Regression Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
Interpretations:
$\beta_0$: When $x_i$ (weight) is zero, the mean MPG ($y$) is $\beta_0$.
$\beta_1$: As $x_i$ (weight) increases by 1, the mean MPG goes up by $\beta_1$.
$\sigma$: For any $x_i$ (weight), 99.7% of the MPG values will be within $3\sigma$ of $\beta_0 + \beta_1 x_i$.

Simple Linear Regression Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
Assumptions: 1. Linear 2. Independent 3. Normal 4. Equal variance across the whole line (homoskedastic)

Fitting the SLR Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
What are the unknowns (parameters)? $\beta_0$, $\beta_1$, $\sigma$
How do we estimate them? 1. Least squares estimation (this class) 2. Maximum likelihood estimation (take Stat 340) 3. Bayesian estimation (take Stat 451)

Fitting the SLR Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
Least Squares Estimation: Find $\hat\beta_0$ and $\hat\beta_1$ such that
$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = \min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$
In other words, minimize the objective function $O(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$.
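
A brief R check, again using the simulated stand-in cars.df from the earlier sketch, that the lm() coefficients do minimize this objective: evaluating $O$ at the least squares estimates and at slightly perturbed values shows that any perturbation increases the sum of squares.

    # cars.df comes from the exploratory-plot sketch above
    y <- cars.df$MPG; x <- cars.df$Wgt
    O <- function(b0, b1) sum((y - b0 - b1 * x)^2)   # objective function

    b <- coef(lm(MPG ~ Wgt, data = cars.df))         # least squares estimates
    O(b[1], b[2])                                    # minimum value
    O(b[1] + 1, b[2])                                # larger
    O(b[1], b[2] * 1.1)                              # larger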

DERIVE LEAST SQUARES ESTIMATES

Fitting the SLR Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
Least Squares Estimators:
$\hat\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar y)(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\mathrm{Cov}(Y,X)}{\mathrm{Var}(X)} = \mathrm{Corr}(Y,X)\,\frac{s_y}{s_x}$
$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$
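
A short R sketch computing the estimators from these formulas and comparing them with lm(), using the simulated stand-in cars.df from the earlier sketch.

    # cars.df comes from the exploratory-plot sketch above
    y <- cars.df$MPG; x <- cars.df$Wgt

    b1.hat <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
    b0.hat <- mean(y) - b1.hat * mean(x)

    c(b0.hat, b1.hat)
    coef(lm(MPG ~ Wgt, data = cars.df))   # same values
    cor(y, x) * sd(y) / sd(x)             # equivalent form of the slope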

Fitting the SLR Model
The Simple Linear Regression (SLR) Model is written as: $y_i \stackrel{iid}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2)$
Note: We don't get an estimate of $\sigma$ from least squares, but (using maximum likelihood) we get:
$\hat\sigma = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n-2}}$
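
A quick R check of this formula against the residual standard error reported by lm(), using the simulated stand-in cars.df from the earlier sketch.

    # cars.df comes from the exploratory-plot sketch above
    fit   <- lm(MPG ~ Wgt, data = cars.df)
    n     <- nrow(cars.df)
    s.hat <- sqrt(sum(residuals(fit)^2) / (n - 2))

    c(s.hat, summary(fit)$sigma)   # identical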

Fitting the SLR Model
[Scatterplot of MPG vs. Wgt with the fitted least squares line]
Least Squares Estimates: $\hat\beta_0 = 51.587$, $\hat\beta_1 = -0.01$, $\hat\sigma = 4.723$
How do we interpret these numbers?

Fitting the SLR Model
Note: $\hat\beta_0$ and $\hat\beta_1$ are just estimators (guesses) of the true values $\beta_0$ and $\beta_1$. So we need to answer a few questions:
1. Does calculating $\hat\beta_0$ and $\hat\beta_1$ return $\beta_0$ and $\beta_1$ on average? Yes; this property is called unbiasedness: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.
2. How accurate are the estimators $\hat\beta_0$ and $\hat\beta_1$? Standard errors (how variable the estimates are):
$SE(\hat\beta_0) = \hat\sigma\sqrt{\frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i - \bar x)^2}}$
$SE(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_i (x_i - \bar x)^2}}$
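
A short R sketch computing these standard errors from the formulas and comparing them with the ones in summary(lm(...)), using the simulated stand-in cars.df from the earlier sketch.

    # cars.df comes from the exploratory-plot sketch above
    fit   <- lm(MPG ~ Wgt, data = cars.df)
    x     <- cars.df$Wgt
    n     <- nrow(cars.df)
    s.hat <- summary(fit)$sigma
    Sxx   <- sum((x - mean(x))^2)

    se.b0 <- s.hat * sqrt(1 / n + mean(x)^2 / Sxx)
    se.b1 <- s.hat / sqrt(Sxx)

    cbind(by.hand = c(se.b0, se.b1),
          from.lm = summary(fit)$coefficients[, "Std. Error"])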

Fitting the SLR Model
Note: $\hat\beta_0$ and $\hat\beta_1$ are just estimators (guesses) of the true values $\beta_0$ and $\beta_1$. So we need to answer a few questions:
1. Does calculating $\hat\beta_0$ and $\hat\beta_1$ return $\beta_0$ and $\beta_1$ on average? Yes; this property is called unbiasedness: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.
2. How accurate are the estimators $\hat\beta_0$ and $\hat\beta_1$?
Note: The Gauss-Markov theorem states that $\hat\beta_0$ and $\hat\beta_1$ have the smallest variance among all linear unbiased estimators (BLUE = best linear unbiased estimator).

Fitting the SLR Model
Note: $\hat\beta_0$ and $\hat\beta_1$ are just estimators (guesses) of the true values $\beta_0$ and $\beta_1$. So we need to answer a few questions:
1. Does calculating $\hat\beta_0$ and $\hat\beta_1$ return $\beta_0$ and $\beta_1$ on average? Yes; this property is called unbiasedness: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.
2. How accurate are the estimators $\hat\beta_0$ and $\hat\beta_1$?
$\mathrm{Corr}(\hat\beta_0, \hat\beta_1) = \frac{-\bar x}{\sqrt{\sum_i (x_i - \bar x)^2 / n + \bar x^2}} = -0.9512$
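
A small R check of this formula against the estimated covariance matrix of the fitted coefficients, using the simulated stand-in cars.df from the earlier sketch (so the value will not equal the slide's -0.9512, but the two lines below agree with each other).

    # cars.df comes from the exploratory-plot sketch above
    fit <- lm(MPG ~ Wgt, data = cars.df)
    x   <- cars.df$Wgt

    -mean(x) / sqrt(sum((x - mean(x))^2) / length(x) + mean(x)^2)  # formula
    cov2cor(vcov(fit))[1, 2]                                       # from the fit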

Fitting the SLR Model
Note: $\hat\sigma$ is just an estimator (guess) of $\sigma$. So we need to answer a question:
1. Does calculating $\hat\sigma$ return $\sigma$ on average? Yes!
$\hat\sigma = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{n-2}}$ is called the regression (or residual) standard error.

Using the SLR Model
How would we use the SLR model to predict the MPG for a weight of 3000 lbs?
$E(y_{new}) = \hat y = \hat\beta_0 + \hat\beta_1 \cdot 3000 = 51.587 - 0.01 \cdot 3000 \approx 22.08$ (using the unrounded estimates)
How do we interpret the number 22.08?
1. The mean MPG of all cars that weigh 3000 lbs is 22.08.
2. The predicted MPG of a single car that weighs 3000 lbs is 22.08.

Using the SLR Model
Wait: $\hat y = \hat\beta_0 + \hat\beta_1 x$ is just an estimate (a guess) at what the true value is going to be. We need to know:
1. Does calculating $\hat y = \hat\beta_0 + \hat\beta_1 x$ return $y$ on average? Yes: $E(\hat y) = E(y)$.
2. How accurate is $\hat y$? It depends on what you are trying to predict!
$SE_{\text{All Cars}}(\hat y) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(x - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$
$SE_{\text{1 Car}}(\hat y) = \hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{\sum_i (x_i - \bar x)^2}}$
Recall: the mean is $\beta_0 + \beta_1 x$, but one observation is $\beta_0 + \beta_1 x + \epsilon$.
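
In R, predict() gives both kinds of intervals, corresponding to the two standard errors above. A minimal sketch with the simulated stand-in cars.df from the earlier sketch:

    # cars.df comes from the exploratory-plot sketch above
    fit <- lm(MPG ~ Wgt, data = cars.df)
    new <- data.frame(Wgt = 3000)

    predict(fit, new, interval = "confidence")  # mean MPG of all 3000-lb cars (narrower)
    predict(fit, new, interval = "prediction")  # MPG of a single 3000-lb car (wider)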

Using the SLR Model
How would we use the SLR model to predict the MPG for an 18-wheeler with weight 20,000 lbs?
$\hat y = \hat\beta_0 + \hat\beta_1 \cdot 20000 = 51.587 - 0.01 \cdot 20000 \approx -145.08$
What went wrong? You can't extrapolate (predict outside the range of the data).

End of MPG Analysis (see webpage for R and SAS code)