
Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin

Regression

Correlation tells us how strong a linear relationship is, but it doesn't tell us what that relationship is. A regression line is one approach to describing that relationship. A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. The regression line is often used to predict the value of y for a given value of x. Regression requires an explanatory variable and a response variable.

Example: Car Weight and Highway MPG

$$MPG = a + b \cdot Weight$$

How do we pick a good line, given that there are many possible ones?

[Scatterplot: Highway MPG (20 to 50) vs Car Weight (2000 to 4000 lb).]

We want a rule to do it; we will use one known as least squares.

Least Squares: Given data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

Pick a line $y = a + bx$ (i.e. pick an intercept $a$ and a slope $b$)

Predicted values: $\hat{y}_i = a + b x_i$

Residuals: $e_i = y_i - \hat{y}_i = y_i - (a + b x_i)$

The residual measures how well a line describes each data point. We would like each of the $e_i$ to be as small as possible.
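As a minimal sketch of these definitions in Python with numpy (the notes themselves use Stata; the data and the candidate line here are made up, not the car data):

```python
import numpy as np

# Hypothetical small data set (not the notes' Car Weight / Highway MPG data)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Pick a candidate line y = a + b*x
a, b = 0.0, 2.0

y_hat = a + b * x   # predicted values  y_hat_i = a + b*x_i
e = y - y_hat       # residuals         e_i = y_i - y_hat_i
print(e)            # one residual per data point
```

Each entry of `e` is the vertical distance from a data point to the candidate line; a different choice of `a` and `b` gives a different set of residuals.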

[Scatterplot of Highway MPG vs Car Weight showing, for one point, the fitted value $\hat{y}_i$ and the residual $e_i = y_i - \hat{y}_i$.]

Trying to deal with n residuals separately can get difficult quickly. We need a single measure describing the fit of each possible line. One possible measure is the Error Sum of Squares (SSE), also called the Residual Sum of Squares:

$$SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - a - b x_i)^2$$

The least squares criterion picks $a$ and $b$ to make this as small as possible. The least squares solution is

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \qquad a = \bar{y} - b\bar{x}$$

These can be obtained very easily from Stata.
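The closed-form solution above can be computed directly. A minimal sketch in Python with numpy (again with made-up data, not the notes' car data):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

xbar, ybar = x.mean(), y.mean()

# Least squares slope and intercept from the formulas above
b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
a = ybar - b * xbar

# SSE evaluated at the least squares solution
sse = np.sum((y - (a + b * x)) ** 2)
print(b, a, sse)
```

Any other choice of intercept and slope gives a larger SSE for these data; that is exactly what the least squares criterion means.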

. regress HighMPG Weight

      Source |       SS       df       MS              Number of obs =      93
-------------+------------------------------           F(  1,    91) =  174.43
       Model |  1718.69528     1  1718.69528           Prob > F      =  0.0000
    Residual |  896.616546    91  9.85292907           R-squared     =  0.6572
-------------+------------------------------           Adj R-squared =  0.6534
       Total |  2615.31183    92  28.4273025           Root MSE      =  3.1389

------------------------------------------------------------------------------
     HighMPG |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      Weight |  -.0073271   .0005548   -13.21   0.000     -.008429   -.0062251
       _cons |   51.60137    1.73555    29.73   0.000     48.15391    55.04882
------------------------------------------------------------------------------

The row for Weight gives information about b and the row for _cons gives information about a. The column Coef. gives the estimates for the slope b and the intercept a. We will talk about the other output during the term.

Least Squares Fit for Weight vs Highway MPG

$$\widehat{HighMPG} = 51.6014 - 0.007327 \cdot Weight$$

[Scatterplot of Highway MPG vs Car Weight with the fitted least squares line.]

In this example,

$$b = -0.007327 \qquad a = 51.6014 \qquad \widehat{MPG} = 51.6014 - 0.007327 \cdot Weight$$

Interpreting a and b

b (slope): the expected change in the response when x increases by 1. So for this example, for each 1 lb increase in weight, we expect the Highway MPG to go down by 0.007327; for a 100 lb increase, we expect the MPG to go down by 0.7327.

a (intercept): the expected (mean) value of y when x = 0. This doesn't always make sense. For this example, it implies that an average car with 0 weight would get 51.6 MPG. Anybody have a zero-weight car?
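These interpretations are just arithmetic with the fitted coefficients. A quick check (a sketch in Python, which the notes do not use):

```python
# Fitted coefficients from the Stata output above
b = -0.007327   # slope: expected MPG change per 1 lb of weight
a = 51.6014     # intercept

# Expected MPG change for a 100 lb increase in weight
print(100 * b)          # down by about 0.73 MPG

# Predicted highway MPG for a (hypothetical) 3000 lb car
print(a + b * 3000)
```

The second prediction sits in the middle of the weight range, where the line is trustworthy; the intercept itself is an extrapolation to weight 0, which is why its literal interpretation fails.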

Relationship between regression and correlation

The formula for the slope b can also be written as

$$b = r \frac{s_y}{s_x}$$

So another way of thinking of b: a change of one standard deviation in x leads to an expected change of r standard deviations in y. Since $-1 \le r \le 1$, the expected change in y (in standard deviation units) is no larger in magnitude than the change in x.

The least squares line can also be written as

$$\hat{y} = \bar{y} + r \frac{s_y}{s_x} (x - \bar{x})$$

One thing this formula implies is that the regression line must go through the point of means $(\bar{x}, \bar{y})$.
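The identity $b = r \, s_y / s_x$ can be verified numerically. A sketch in Python with numpy (made-up data; `ddof=1` gives the sample standard deviation):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.5, 7.0, 9.5, 13.0])

# Least squares slope from the direct formula
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Correlation and sample standard deviations
r = np.corrcoef(x, y)[0, 1]
sx, sy = x.std(ddof=1), y.std(ddof=1)

print(b, r * sy / sx)   # the two expressions agree
```

Because $a = \bar{y} - b\bar{x}$, the fitted line evaluated at $\bar{x}$ returns exactly $\bar{y}$, which is the "point of means" property.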

Regressing x on y and y on x

While the ordering of the variables does not matter for correlation ($r_{xy} = r_{yx}$), it does matter in regression. The lines corresponding to the two regressions

$$y = a + bx \qquad \text{and} \qquad x = c + dy$$

are not the same. In fact it can be shown that $bd = r^2 \le 1$. If the two regressions described the same relationship, this product would have to be one.

[Scatterplot of Highway MPG vs Car Weight showing both regression lines: MPG regressed on Weight, and Weight regressed on MPG.]
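The product identity $bd = r^2$ is easy to confirm numerically. A sketch in Python with numpy (made-up data; the helper `slope` is just for illustration):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
y = np.array([2.0, 3.5, 3.0, 6.5, 9.0])

def slope(u, v):
    """Least squares slope from regressing v on u."""
    return np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)

b = slope(x, y)   # slope from regressing y on x
d = slope(y, x)   # slope from regressing x on y
r = np.corrcoef(x, y)[0, 1]

print(b * d, r ** 2)   # equal; both are at most 1
```

The two slopes coincide (so that $bd = 1$) only when $r^2 = 1$, i.e. when the points lie exactly on a line.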

r² in regression

There is another tie-in between regression and correlation. The square of the correlation coefficient, $r^2$, is the fraction of the variation in the values of y that can be explained by the least squares regression of y on x:

$$r^2 = \frac{\text{variance of the } \hat{y}_i\text{'s}}{\text{variance of the } y_i\text{'s}} = \frac{\frac{1}{n-1}\sum_i (\hat{y}_i - \bar{y})^2}{\frac{1}{n-1}\sum_i (y_i - \bar{y})^2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{SSReg}{SSTot}$$

[Scatterplot of Highway MPG vs Car Weight showing, for one point, the deviations $\hat{y}_i - \bar{y}$ and $y_i - \bar{y}$ about the horizontal line $y = \bar{y}$.]
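The decomposition above can be checked directly: fit the line, form SSReg and SSTot, and compare the ratio to the squared correlation. A sketch in Python with numpy (made-up data):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([1.5, 3.0, 4.8, 5.5, 9.2])

# Least squares fit
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

# Regression and total sums of squares
ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

r = np.corrcoef(x, y)[0, 1]
print(ss_reg / ss_tot, r ** 2)   # the same quantity
```

This is the same R-squared that Stata reports in its ANOVA table as Model SS divided by Total SS.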

. regress HighMPG Weight

      Source |       SS       df       MS              Number of obs =      93
-------------+------------------------------           F(  1,    91) =  174.43
       Model |  1718.69528     1  1718.69528           Prob > F      =  0.0000
    Residual |  896.616546    91  9.85292907           R-squared     =  0.6572
-------------+------------------------------           Adj R-squared =  0.6534
       Total |  2615.31183    92  28.4273025           Root MSE      =  3.1389

------------------------------------------------------------------------------
     HighMPG |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      Weight |  -.0073271   .0005548   -13.21   0.000     -.008429   -.0062251
       _cons |   51.60137    1.73555    29.73   0.000     48.15391    55.04882
------------------------------------------------------------------------------

So for this example (R-squared = 0.6572), the weight of the cars explains 65.7% of the variability in the highway MPG observations.

Why least squares?

- Mathematically tractable: optimizing a quadratic is easy.
- It has good statistical properties (e.g. small variances for predictions under certain modelling assumptions; the Gauss-Markov theorem).

An alternative is to minimize

$$\sum_i |y_i - \hat{y}_i| = \sum_i |y_i - a - b x_i|$$

This is a more difficult function to optimize, since $|x|$ is not differentiable at $x = 0$. While it is harder to do, thanks to increased computing power some people fit linear models this way (L1 regression). Tying this to univariate ideas: least squares is like finding means, and L1 regression is like finding medians.
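The mean/median analogy is easiest to see in the simplest case: fitting a constant c (a line with slope 0). A numeric sketch in Python with numpy (made-up data; a brute-force grid search stands in for a real optimizer):

```python
import numpy as np

# Hypothetical data with one large value pulling the mean upward
y = np.array([1.0, 2.0, 3.0, 10.0])

# Evaluate both criteria over a fine grid of candidate constants c
c = np.linspace(0.0, 12.0, 120001)
sse = ((y[:, None] - c) ** 2).sum(axis=0)   # least squares criterion
sae = np.abs(y[:, None] - c).sum(axis=0)    # L1 (absolute error) criterion

print(c[np.argmin(sse)], y.mean())  # SSE is minimized at the mean, 4.0
print(c[np.argmin(sae)])            # L1 is minimized between the middle values 2 and 3
```

The L1 minimizer is any value between the two middle observations (a median), and it ignores the outlier at 10, while the least squares minimizer is pulled toward it; that robustness is one reason people pay the extra computational cost of L1 regression.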