Statistical View of Least Squares


May 23, 2006

Outline: Purpose of Regression, Some Examples, Least Squares

Purpose of Regression

Suppose we have two variables x and y. We may want to:
- explore the relationship between the two variables
- predict the value of one variable as the other changes
- identify unusual observations

Some Examples

example 1. Income and education
example 2. Ice cream consumption and temperature
example 3. Ozone levels and air pollution in New York City

Usually we denote the independent or explanatory variable by x and the dependent or response variable by y. Can you identify the x and the y in each example?

Figure: an exact relationship, y = 2 + x; every point lies exactly on the line.

Usually, for real data, there is no exact relationship; hence no single line will pass through all the points in the scatterplot.

Figures: two scatterplots where there is no exact relationship between x and y.

Least Squares

Different people might draw different lines by eye estimation; we need to draw a line that doesn't depend on our guess.

error = observed y − predicted y:

ê_i = y_i − ŷ_i  (1)

We want to minimize the aggregate error. One common way, Least Squares, is to minimize the sum of squared errors.

Mathematical Representation

Suppose we have n observations:

y_i = β_0 + β_1 x_i + ε_i  (2)

where
y_i: value of the response for the i-th observation
x_i: value of the predictor variable for the i-th observation
β_0: intercept
β_1: slope
ε_i: random error term corresponding to the i-th observation

Estimation Using Least Squares

We want to minimize the sum of squared errors:

E = Σ_{i=1}^{n} (y_i − β_0 − β_1 x_i)²  (3)

We want to find the pair (β_0, β_1) for which E is smallest; we'll call them β̂_0 and β̂_1. Let's find them analytically.

Normal Equations

To find β̂_0 and β̂_1 we take the partial derivatives of E with respect to β_0 and β_1, set them to zero, and get the following equations, called the normal equations:

∂E/∂β_0 = 0  (4)
∂E/∂β_1 = 0  (5)

problem 1. Solve these two equations simultaneously to obtain β̂_0 and β̂_1.
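Solving the normal equations gives the standard closed-form estimates β̂_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and β̂_0 = ȳ − β̂_1 x̄. A minimal sketch in Python/NumPy (the course itself uses MATLAB; the function name and data here are illustrative, not from the lecture):

```python
import numpy as np

def least_squares(x, y):
    """Return (beta0_hat, beta1_hat) minimizing sum of (y_i - b0 - b1*x_i)^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# Sanity check on the exact relationship y = 2 + x from the earlier figure:
b0, b1 = least_squares([0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
                       [2.0, 2.2, 2.4, 2.6, 2.8, 3.0])
print(b0, b1)  # approximately 2.0 and 1.0
```

On exact data the fitted line recovers the true intercept and slope; with noisy data it gives the line closest to the points in the squared-error sense.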

Assumptions Made for Doing Statistical Inference

Now that we have the estimates, we may want to see how good they are, that is, how close our estimates are to the true values. In order to do that we have to make some more assumptions. Remember our model was

y_i = β_0 + β_1 x_i + ε_i  (6)

We now assume ε_i ~ N(0, σ²); we also assume that ε_i and ε_j are independent for i ≠ j.
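To make these assumptions concrete, here is a small simulation sketch (Python/NumPy rather than the course's MATLAB) that generates data from model (6) with independent N(0, σ²) errors; all parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 100, 2.0, 1.0, 0.3   # illustrative true values

x = rng.uniform(0.0, 1.0, size=n)
eps = rng.normal(0.0, sigma, size=n)   # i.i.d. N(0, sigma^2) errors
y = beta0 + beta1 * x + eps            # model (6)
print(y.shape)
```

Fitting a line to data simulated this way, and repeating many times, is a useful way to see how far the estimates typically fall from the true β_0 and β_1.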

Checking Model Assumptions

Always do some exploratory data analysis before making inference. The results will not make sense if the assumptions are badly violated.

The ŷ_i's are called fitted values:

ŷ_i = β̂_0 + β̂_1 x_i  (7)

The ê_i's are called residuals:

ê_i = y_i − (β̂_0 + β̂_1 x_i)  (8)

The residuals are used to check whether any of the model assumptions have been violated.
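Equations (7) and (8) translate directly into code. The sketch below (Python/NumPy, invented data) computes fitted values and residuals, and checks one algebraic property of least squares: when the model includes an intercept, the residuals sum to zero.

```python
import numpy as np

x = np.array([0.00, 0.25, 0.50, 0.75, 1.00])   # illustrative data
y = np.array([1.90, 2.30, 2.40, 2.80, 3.10])

xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

y_hat = beta0_hat + beta1_hat * x   # fitted values, equation (7)
e_hat = y - y_hat                   # residuals, equation (8)

# With an intercept in the model, least-squares residuals sum to zero
print(abs(e_hat.sum()) < 1e-9)  # True
```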

We will look at the plot of the residuals vs. the fitted values. If the assumptions really hold, we would expect the residuals to fluctuate in a random pattern around 0. A curved pattern may indicate that the relationship between y and x is not linear.

Figure: residual plot where y is a quadratic function of x; the residuals vs. fitted values show a clear curved pattern.

Figure: residual plot after making the explanatory variable x²; the curved pattern disappears.

If there is some distinct pattern, like a sinusoidal curve, we'll start worrying that the errors are correlated. To check whether the assumption of constant variance holds, we also look at the residual plots.

Figure: residual plot (residuals vs. observation number) where the observations are correlated.

Figure: residual plot for non-constant variance; the spread of the residuals changes with the fitted values.

If an observation lies outside the general pattern of the other observations, it is called an outlier. An outlier has the potential to change the least squares estimates dramatically. Ideally we should make a separate analysis after excluding the outlier, see how much effect it has on the regression line, and report both.
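A tiny illustration of this sensitivity, with invented data: fit the line with and without a single outlying point and compare the slopes.

```python
import numpy as np

def fit(x, y):
    """Least-squares intercept and slope."""
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    return ybar - b1 * xbar, b1

x = np.array([0.1, 0.2, 0.30, 0.4, 0.50, 6.0])   # last point far from the rest
y = np.array([0.2, 0.3, 0.35, 0.5, 0.55, 0.5])   # ...and off the trend

b0_all, b1_all = fit(x, y)             # fit using every observation
b0_clean, b1_clean = fit(x[:-1], y[:-1])  # fit excluding the outlier
print(b1_all, b1_clean)  # the two slopes differ noticeably
```

Here one point is enough to drag the slope from roughly 0.9 down to nearly flat, which is why both analyses should be reported.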

Figures: a scatterplot with the fitted regression line, and the same scatterplot without the two outliers.

Matrix Representation

The regression model y_i = β_0 + β_1 x_i + ε_i can also be written using matrix notation as

y = Xβ + ε  (9)

What does X look like here?

The normal equations are

X′X β = X′y  (10)

estimate of β:

β̂ = (X′X)⁻¹ X′y  (11)

residuals:

ê = y − ŷ = y − X β̂  (12)
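Equations (9)-(12) translate directly into NumPy. In this sketch (illustrative data; Python instead of the course's MATLAB), X has a column of ones for the intercept and a column for x; we solve the normal equations and check the answer against NumPy's own least-squares routine.

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.1, 2.4, 3.1, 3.4, 3.9])

X = np.column_stack([np.ones_like(x), x])      # design matrix for equation (9)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations (10)-(11)
y_hat = X @ beta_hat                           # fitted values
e_hat = y - y_hat                              # residuals, equation (12)

# Agrees with np.linalg.lstsq, which solves the same least-squares problem
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ref))  # True
```

In practice np.linalg.lstsq (or a QR decomposition) is preferred over forming (X′X)⁻¹ explicitly, since it is numerically more stable; the explicit normal equations are shown here to mirror equation (11).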

Problem 2. Now we have all the tools to actually implement these using MATLAB.
- Download the dataset smoke.txt.
- Look at the scatterplot first to see if doing linear regression seems appropriate.
- Find the least squares estimates of the model parameters.
- Look at the residual plots. Do the assumptions seem valid here?
- Look at the graph of the fitted values superimposed on the observed values. How does the fit look?