Lecture 2. Linear Regression: A Model for the Mean. Sharyn O'Halloran


Closer look at:
- Linear regression model
- Least squares procedure
- Inferential tools
- Confidence and prediction intervals
- Assumptions
- Robustness
- Model checking
- Log transformation (of Y, X, or both)

Linear Regression: Introduction. Data: (Y_i, X_i) for i = 1, ..., n. Interest is in the probability distribution of Y as a function of X. Linear regression model: the mean of Y is a straight-line function of X, plus an error term (residual). The goal is to find the best-fit line, the one that minimizes the sum of squared errors.

Estimated regression line. Steer example (see Display 7.3, p. 177). Equation for the estimated regression line: Ŷ = 6.98 − 0.73X, with intercept 6.98 and slope −0.73. [Figure: PH vs. ltime with the fitted line and an error term marked.]

Create a new variable ltime = log(time), then run the regression analysis (sketched below).
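A minimal Stata sketch of these two steps. The variable names TIME and PH are assumptions taken from the figure labels later in these slides, not from the original do-file:

    * generate the log of time after slaughter, then regress pH on it
    gen ltime = log(TIME)
    regress PH ltime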

Regression Terminology. Regression: the mean of a response variable as a function of one or more explanatory variables, µ{Y|X}. Regression model: an ideal formula to approximate the regression. Simple linear regression model: µ{Y|X} = β0 + β1X, read as "the mean of Y given X" or "the regression of Y on X". β0 (the intercept) and β1 (the slope) are unknown parameters.

Regression Terminology.
Y: dependent variable, explained variable, response variable.
X: independent variable, explanatory variable, control variable.
Y's probability distribution is to be explained by X. β0 and β1 are the regression coefficients (see Display 7.5, p. 180). Note: Y = β0 + β1X is NOT simple regression; the model describes the mean of Y, with Y varying around it.

Regression Terminology: Estimated coefficients. The unknown regression line is β0 + β1X; the estimated line is β̂0 + β̂1X, with estimated intercept β̂0 and estimated slope β̂1. Choose β̂0 and β̂1 to make the residuals small.

Regression Terminology. The fitted value for obs. i is its estimated mean: Ŷi = fit_i = µ̂{Y|Xi} = β̂0 + β̂1Xi. The residual for obs. i: res_i = e_i = Yi − Ŷi = Yi − fit_i. Least squares, the statistical estimation method, finds the estimates that minimize the sum of squared residuals: Σ_{i=1..n} (yi − (β̂0 + β̂1xi))² = Σ_{i=1..n} (yi − ŷi)². Solution (from calculus) on p. 182 of Sleuth.

Least Squares Procedure. The least-squares procedure obtains estimates of the coefficients β0 and β1 in the model ŷi = β̂0 + β̂1xi by minimizing the sum of squared residuals, or errors (e_i): SSE = Σ e_i² = Σ (yi − ŷi)² = Σ (yi − (β̂0 + β̂1xi))². Choose β̂0 and β̂1 so that this quantity is minimized.

Least Squares Procedure. The slope coefficient estimator is
β̂1 = Σ_i (xi − x̄)(yi − ȳ) / Σ_i (xi − x̄)² = r_xy (s_y / s_x),
that is, the correlation between X and Y times the standard deviation of Y over the standard deviation of X. And the intercept estimator is β̂0 = ȳ − β̂1 x̄.
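A small Stata sketch of this identity, computing the slope from the correlation and the two standard deviations. Y and X here are generic stand-ins for whatever response and predictor are loaded, an illustration rather than course code:

    quietly summarize X
    scalar sx   = r(sd)                 // standard deviation of X
    scalar xbar = r(mean)
    quietly summarize Y
    scalar sy   = r(sd)                 // standard deviation of Y
    scalar ybar = r(mean)
    quietly correlate Y X
    scalar b1 = r(rho)*sy/sx            // slope: correlation times sy/sx
    scalar b0 = ybar - b1*xbar          // intercept: ybar minus b1*xbar
    display "b1 = " b1 "   b0 = " b0

These scalars match what regress would report, which is a quick way to check the identity on any data set.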

Least Squares Procedure (cont.). Note that the regression line always goes through the point of means (x̄, ȳ). Think of the regression line as the expected value of Y for a given value of X: for any value of the independent variable there is a single most likely value for the dependent variable. [Figure: trend line for the relation between yield (bushels/acre) and fertilizer (lb/acre).]

Tests and Confidence Intervals for β0, β1. Degrees of freedom: (n − 2) = sample size − number of coefficients. Variance of {Y|X}: σ̂² = (sum of squared residuals)/(n − 2). Standard errors (p. 184). Ideal normal model: the sampling distributions of β̂0 and β̂1 have the shape of a t-distribution on (n − 2) d.f. Do t-tests and CIs as usual (df = n − 2).
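As a sketch of "as usual": a by-hand 95% CI for the slope in Stata after a regression, using the stored coefficient and standard error (PH and ltime are the assumed steer-example names):

    regress PH ltime
    scalar b  = _b[ltime]                    // estimated slope
    scalar se = _se[ltime]                   // its standard error
    scalar t  = invttail(e(df_r), 0.025)     // t critical value, df = n-2
    display "95% CI: " b - t*se ", " b + t*se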

[Regression output annotated with p-values for H0: β = 0 and confidence intervals.]

Inference Tools. Hypothesis test and confidence interval for the mean of Y at some X: estimate the mean of Y at X = X0 by µ̂{Y|X0} = β̂0 + β̂1X0. Standard error of µ̂{Y|X0}:
SE[µ̂{Y|X0}] = σ̂ sqrt( 1/n + (X0 − x̄)² / ((n − 1) s_x²) ).
Conduct the t-test and confidence interval in the usual way (df = n − 2).

Confidence bands for conditional means. Confidence bands in simple regression have an hourglass shape, narrowest at the mean of X. The lfitci command automatically calculates and graphs the confidence bands.
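For instance, overlaying the band on the scatter with the assumed steer variables:

    twoway (lfitci PH ltime) (scatter PH ltime)

By default lfitci shades the confidence band for the conditional mean µ{Y|X}.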

Prediction. Prediction of a future Y at X = X0: Pred(Y|X0) = µ̂{Y|X0}. Standard error of prediction:
SE[Pred(Y|X0)] = sqrt( σ̂² + (SE[µ̂{Y|X0}])² ),
combining the variability of Y about its mean with the uncertainty in the estimated mean. 95% prediction interval: Pred(Y|X0) ± t_df(.975) × SE[Pred(Y|X0)].
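A minimal Stata sketch of this interval, assuming a regression has just been run; predict's stdf option returns exactly this standard error of forecast:

    predict yhat, xb                             // predicted mean at each X
    predict sef, stdf                            // SE of prediction (forecast)
    gen lo = yhat - invttail(e(df_r), .025)*sef  // lower 95% prediction limit
    gen hi = yhat + invttail(e(df_r), .025)*sef  // upper 95% prediction limit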

Residuals vs. predicted values plot. After any regression analysis we can automatically draw a residual-versus-fitted plot just by typing rvfplot.

Predicted values (yhat). After any regression, the predict command can create a new variable yhat containing the predicted values of Y.

Residuals (e). The predict command with the resid option can create a new variable e containing the residuals.

The residual-versus-predicted-values plot could be drawn by hand using the commands sketched below.
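The original commands did not survive transcription; a plausible Stata reconstruction, consistent with the two slides above:

    predict yhat                    // fitted values (default: linear prediction)
    predict e, resid                // residuals
    twoway scatter e yhat, yline(0) // residuals vs. predicted values, by hand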

Second type of confidence interval for regression prediction: the prediction band. This expresses our uncertainty in estimating the unknown value of Y for an individual observation with a known X value. Command: lfitci with the stdf option.

Additional note: predict can generate two kinds of standard errors for the predicted y value, which have two different applications: confidence bands for conditional means (stdp) and confidence bands for individual-case predictions (stdf). [Figures: distance vs. velocity, one panel with each kind of band.]

Confidence bands for conditional means (stdp): a confidence band is a set of confidence intervals for µ{Y|X0}, e.g. the 95% confidence interval for µ{Y|1000}. Confidence bands for individual-case predictions (stdf): e.g. the 95% prediction interval for Y at X = 1000. A calibration interval gives the values of X for which Y0 is in a prediction interval. [Figures: distance vs. velocity with the two kinds of bands.]
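The two bands side by side in Stata, using the distance and velocity names from the figures (assumed lowercase spellings; by default lfitci draws the band for the mean, and the stdf option switches it to the prediction band):

    * band for the conditional mean (narrow, hourglass-shaped)
    twoway (lfitci distance velocity) (scatter distance velocity)
    * band for individual-case predictions (wider)
    twoway (lfitci distance velocity, stdf) (scatter distance velocity)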

Notes about confidence and prediction bands. Both are narrowest at the mean of X. Beware of extrapolation. The width of the confidence interval shrinks toward zero as n grows large; this is not true of the prediction interval.

Review of simple linear regression.
1. Model with constant variance: µ{Y|X} = β0 + β1X, var{Y|X} = σ².
2. Least squares: choose estimators β̂0 and β̂1 to minimize the sum of squared residuals:
   β̂1 = Σ_i (Xi − x̄)(Yi − ȳ) / Σ_i (Xi − x̄)²,  β̂0 = ȳ − β̂1 x̄,
   res_i = Yi − β̂0 − β̂1Xi (i = 1, ..., n),  σ̂² = Σ_i res_i² / (n − 2).
3. Properties of estimators:
   SE(β̂0) = σ̂ sqrt( 1/n + x̄² / ((n − 1) s_x²) ),
   SE(β̂1) = σ̂ / ( s_x sqrt(n − 1) ).

Assumptions of Linear Regression. A linear regression model assumes:
- Linearity: µ{Y|X} = β0 + β1X
- Constant variance: var{Y|X} = σ²
- Normality: the distribution of Y at any X is normal
- Independence: given the Xi's, the Yi's are independent

Examples of Violations: Non-Linearity. The true relation between the independent and dependent variables may not be linear. For example, consider campaign fundraising and the probability of winning an election: the probability of winning increases with each additional dollar spent and then levels off after $50,000. [Figure: probability of winning an election vs. spending, leveling off at $50,000.]

Consequences of violation of linearity: if linearity is violated, misleading conclusions may occur (however, the severity of the problem depends on the degree of non-linearity).

Examples of Violations: Constant Variance. The constant variance, or homoskedasticity, assumption implies that, on average, we do not expect to get larger errors in some cases than in others. Of course, due to the luck of the draw, some errors will turn out to be larger than others; but homoskedasticity is violated only when this happens in a predictable manner. Example: income and spending on certain goods. People with higher incomes have more choices about what to buy, so we would expect that their consumption of certain goods is more variable than for families with lower incomes.

Violation of constant variance. [Figure: the relation between income and spending violates homoskedasticity. Each error is the vertical distance from the fitted line, ε_i = Y_i − (a + bX_i); as income increases, so do the errors.]

Consequences of non-constant variance. If constant variance is violated, LS estimates are still unbiased, but SEs, tests, confidence intervals, and prediction intervals are incorrect. However, the severity depends on the degree of non-constant variance.

Violation of Normality. Non-normality: nicotine use is characterized by a large number of people not smoking at all and another large number of people who smoke every day. [Figure: frequency of nicotine use, an example of a bimodal distribution.]

Consequence of non-normality. If normality is violated: LS estimates are still unbiased; tests and CIs are quite robust; PIs are not. Of all the assumptions, this is the one that we need to be least worried about violating. Why?

Violation of Independence: Non-Independence. The independence assumption means that the error terms of two observations do not influence one another; technically, the residuals (error terms) are uncorrelated. The most common violation occurs with data collected over time (time-series analysis). Example: high tariff rates in one period are often associated with very high tariff rates in the next period. Example: nominal GNP and consumption. [Figure: residuals of GNP and consumption over time are highly correlated.]

Consequence of non-independence. If independence is violated: LS estimates are still unbiased, but everything else can be misleading. [Figure: log height vs. log weight; plotting code is litter (5 mice from each of 5 litters). Note that mice from litters 4 and 5 have higher weight and height.]

Robustness of least squares.
- The constant variance assumption is important.
- Normality is not too important for confidence intervals and p-values, but is important for prediction intervals.
- Long-tailed distributions and/or outliers can heavily influence the results.
- Non-independence problems: serial correlation (Ch. 15) and cluster effects (we deal with this in Chs. 9-14).
Strategy for dealing with these potential problems: plots; residual plots; consider outliers (more in Ch. 11); log transformations (Display 8.6).

Tools for model checking.
- Scatterplot of Y vs. X (see Display 8.6, p. 213)*
- Scatterplot of residuals vs. fitted values*
  *Look for curvature, non-constant variance, and outliers.
- Normal probability plot (p. 224): sometimes useful for checking whether the distribution is symmetric or normal (i.e. for PIs).
- Lack-of-fit F-test when there are replicates (Section 8.5).

Scatterplot of Y vs. X. Command: graph twoway scatter Y X. Case study: 7.01, page 175.

Scatterplot of residuals vs. fitted values. Command: rvfplot, yline(0). Case study: 7.01, page 175.

Normal probability plot (p. 224). Quantile-normal plots compare quantiles of a variable's distribution with quantiles of a normal distribution having the same mean and standard deviation. They allow visual inspection for departures from normality in every part of the distribution. Command: qnorm variable, grid. Case study: 7.01, page 175.
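In this workflow the plot is most informative on the residuals rather than on Y itself; a sketch with the assumed steer variables:

    quietly regress PH ltime
    predict r, resid           // residuals from the fit
    qnorm r, grid              // compare residual quantiles to normal quantiles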

Diagnostic plots of residuals.
- Plot residuals versus fitted values almost always (for simple regression this is about the same as residuals vs. X). Look for outliers, curvature, increasing spread (funnel or horn shape); then take appropriate action.
- If data were collected over time, plot residuals versus time; check for a time trend and serial correlation.
- If normality is important, use a normal probability plot; a straight line is expected if the distribution is normal.

Voltage Example (Case Study 8.1.2). Goal: to describe the distribution of breakdown time of an insulating fluid as a function of the voltage applied to it. Y = breakdown time; X = voltage. Statistical illustrations: recognizing the need for a log transformation of the response from the scatterplot and the residual plot; checking the simple linear regression fit with a lack-of-fit F-test. Stata output follows.

Simple regression. The residuals-vs-fitted-values plot shows increasing spread with increasing fitted values. Next step: we try a logged response, log(Y) = log(time).

Simple regression with Y logged. The residuals-vs-fitted-values plot does not show any obvious curvature or trend in spread.
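A sketch of the logged fit and its diagnostic plot, assuming the case-study variables are named TIME and VOLTAGE as in the figure labels below:

    gen ltime = log(TIME)        // logged response
    regress ltime VOLTAGE
    rvfplot, yline(0)            // spread should now look stable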

Interpretation after log transformations.

    Model        Dependent var.   Independent var.   Interpretation of β1
    Level-level  Y                X                  Δy = β1 Δx
    Level-log    Y                log(X)             Δy = (β1/100) %Δx
    Log-level    log(Y)           X                  %Δy = (100 β1) Δx
    Log-log      log(Y)           log(X)             %Δy = β1 %Δx

Dependent variable logged. µ{log(Y)|X} = β0 + β1X is the same as (if the distribution of log(Y), given X, is symmetric):
Median{Y|X} = e^(β0 + β1X).
As X increases by 1, what happens?
Median{Y|X = x+1} / Median{Y|X = x} = e^(β0 + β1(x+1)) / e^(β0 + β1x) = e^(β1),
so Median{Y|X = x+1} = e^(β1) × Median{Y|X = x}.

Interpretation of Y logged. As X increases by 1, the median of Y changes by the multiplicative factor e^(β1). Or, better: if β1 > 0, as X increases by 1 the median of Y increases by (e^(β1) − 1) × 100%; if β1 < 0, as X increases by 1 the median of Y decreases by (1 − e^(β1)) × 100%.

Example: µ{log(time)|voltage} = β0 + β1 voltage; 1 − e^(−0.5) ≈ 0.4.

µ{log(time)|voltage} = 18.96 − 0.507 voltage. Since 1 − e^(−0.507) ≈ 0.4, it is estimated that the median breakdown time decreases by 40% with each 1 kV increase in voltage. [Figures: log of time until breakdown vs. voltage with fitted values; breakdown time (minutes) vs. voltage.]

If the explanatory variable (X) is logged. If µ{Y|log(X)} = β0 + β1 log(X), then associated with each two-fold increase (i.e. doubling) of X is a β1 log(2) change in the mean of Y. An example follows.

Example with X logged (Display 7.3, Case 7.1). Y = pH; X = time after slaughter (hrs.). Estimated model: µ{Y|log(X)} = 6.98 − 0.73 log(X). Since −0.73 log(2) ≈ −0.5, it is estimated that for each doubling of time after slaughter (between 0 and 8 hours) the mean pH decreases by 0.5. [Figures: pH vs. ltime with fitted values; pH vs. time.]

Both Y and X logged. µ{log(Y)|log(X)} = β0 + β1 log(X) is the same as Median{Y|X} = e^(β0) X^(β1). As X doubles, what happens? If β1 > 0: with each doubling of X, the median of Y increases by (e^(β1 log(2)) − 1) × 100%. If β1 < 0: with each doubling of X, the median of Y decreases by (1 − e^(β1 log(2))) × 100%.

Example with Y and X logged (Display 8.1, page 207). Y: number of species on an island; X: island area. µ{log(Y)|log(X)} = β0 + β1 log(X).

Y and X logged: µ{log(Y)|log(X)} = 1.94 + 0.25 log(X). Since e^(0.25 log(2)) − 1 ≈ 0.19, associated with each doubling of island area is a 19% increase in the median number of bird species.

Example: Log-Log. In order to graph the log-log plot we need to generate two new variables (natural logarithms), as sketched below.
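A hedged Stata sketch, assuming the island data carry variables named species and area (illustrative names, not from the original do-file):

    gen lspecies = log(species)     // natural log of species count
    gen larea    = log(area)        // natural log of island area
    twoway (scatter lspecies larea) (lfit lspecies larea)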