The Simple Regression Model
Simple regression model: objectives

Given the model y = β0 + β1·x + u:
- where y is earnings and x is years of education,
- or y is sales and x is advertising spending,
- or y is the market value of an apartment and x is its area.

Our objectives for the moment will be:
- Estimate the unknown parameters β0 and β1
- Assess how much x explains y
The Simple Linear Regression (SLR) Model with Cross-Sectional Data

Assumption SLR.1 (Linearity in Parameters): y = β0 + β1·x + u

- y is the dependent variable (also called the explained variable, response variable, or regressand)
- x is the independent variable (also called the explanatory variable, control variable, or regressor)
- u is the error term or disturbance (y is allowed to vary for a given level of x)

This is a population equation. It is linear in the (unknown) parameters.
Further assumptions

Assumption SLR.2 (Random Sampling): we have a random sample {(x_i, y_i): i = 1, ..., n} from the population, so that y_i = β0 + β1·x_i + u_i for each i.

Assumption SLR.3 (Variation in the Regressor): the x_i, i = 1, ..., n, do not all take the same value.
Further assumptions

Assumption SLR.4 (Zero Conditional Mean): E(u|x) = 0. This is the crucial assumption!

Remember the previous examples:
- Earnings = β0 + β1·Education + u
- CrimeRate = β0 + β1·Policemen + u

In these cases SLR.4 most likely fails.

A zero unconditional mean, E(u) = 0, is not really a separate assumption, since SLR.4 implies it. It is not a restrictive assumption anyway: we can always redefine β0 so that E(u) = 0 holds.
Implication of the Zero Conditional Mean

SLR.4 implies E(y|x) = β0 + β1·x.

[Figure: conditional densities f(y|x) at x1 and x2; for any x, the distribution of y is centered at E(y|x) = β0 + β1·x.]
[Figure: population regression line E(y|x) = β0 + β1·x, sample data points (x1, y1), ..., (x4, y4), and the associated error terms u1, ..., u4.]
Estimation of unknown parameters by the method of moments

We want to estimate the population parameters from a sample. Let {(x_i, y_i): i = 1, ..., n} denote a random sample of size n from the population, with y_i = β0 + β1·x_i + u_i.

SLR.4 gives two population moment conditions:
i) E(u) = 0 implies E(y − β0 − β1·x) = 0
ii) E(xu) = 0 implies E[x·(y − β0 − β1·x)] = 0 (since E(u|x) = 0 implies Cov(x, u) = 0)
Estimation of unknown parameters by the method of moments (intuition)

The clever idea is to pick estimators β̂0 and β̂1 so that the sample moments match the population moments. The sample counterparts are:

(1/n) Σ (y_i − β̂0 − β̂1·x_i) = 0
(1/n) Σ x_i·(y_i − β̂0 − β̂1·x_i) = 0

These are two equations in the two unknowns β̂0 and β̂1.
Estimation of unknown parameters by the method of moments (solution)

The first equation implies β̂0 = ȳ − β̂1·x̄.

Plug this into the second equation to obtain Σ x_i·(y_i − ȳ + β̂1·x̄ − β̂1·x_i) = 0; this simplifies to:

β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

We need sample variation in the x variable; otherwise the slope parameter is not well defined!
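The two moment conditions can be solved in a few lines. A minimal Python sketch, using hypothetical toy data (the numbers are made up purely for illustration):

```python
# Method-of-moments (= OLS) estimates for the simple regression model
# y = beta0 + beta1*x + u, computed from hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Slope: sample covariance of (x, y) over sample variance of x.
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)   # zero if x has no variation (SLR.3 fails)
beta1_hat = sxy / sxx
# Intercept: chosen so that the residuals average to zero.
beta0_hat = ybar - beta1_hat * xbar

print(beta1_hat, beta0_hat)
```

Note that `sxx` would be zero if all the x values were identical, in which case the slope is undefined, exactly as SLR.3 warns.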
Slope estimator

β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

It is an estimator if we treat the (x_i, y_i) as random variables; it is an estimate once we substitute the actual observations for the (x_i, y_i).

The slope estimate is the sample covariance between x and y divided by the sample variance of x:
- If x and y are positively correlated, the slope will be positive.
- If x and y are negatively correlated, the slope will be negative.
- We need x to vary in our sample.
Example and interpretation

n = 11064, data from the Employment Survey (INE), 2003. wage is the net monthly wage and educ measures the number of years of schooling completed.

So we estimate a positive relation between these two variables: an additional year of schooling leads to an estimated expected increase of 52.513 euros in the wage.

Again, we must be cautious about the interpretation of these estimates.
Definitions

Fitted value: ŷ_i = β̂0 + β̂1·x_i
Residual: û_i = y_i − ŷ_i
Sum of Squared Residuals (SSR): Σ û_i²
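These definitions translate directly into code. A sketch using hypothetical toy data together with the OLS estimates for that particular sample:

```python
# Fitted values, residuals and the sum of squared residuals (SSR),
# using hypothetical toy data and its OLS estimates.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
beta0_hat, beta1_hat = 0.14, 1.96   # OLS estimates for this particular sample

y_hat = [beta0_hat + beta1_hat * xi for xi in x]   # fitted values
u_hat = [yi - yhi for yi, yhi in zip(y, y_hat)]    # residuals
ssr = sum(ui ** 2 for ui in u_hat)                 # sum of squared residuals

# The first OLS moment condition forces the residuals to sum to zero.
print(sum(u_hat), ssr)
```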
[Figure: sample regression line ŷ = β̂0 + β̂1·x, sample data points, and the associated estimated error terms (called residuals) û1, ..., û4; the fitted values lie on the line.]
Alternative approach to the derivation: OLS

Ordinary Least Squares: we can fit a line so as to minimise the sum of squared residuals,

min over b0, b1 of Σ (y_i − b0 − b1·x_i)²

The first-order conditions are exactly the two moment equations we had before: same solution!
Review: OLS estimators

Given a random sample {(x_i, y_i): i = 1, ..., n}, OLS is obtained by minimising Σ (y_i − b0 − b1·x_i)² with respect to b0 and b1.

Alternatively, we can exploit the fact that:

β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²  and  β̂0 = ȳ − β̂1·x̄

Fitted value: ŷ_i = β̂0 + β̂1·x_i. Residual: û_i = y_i − ŷ_i.
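One way to see the equivalence between the two derivations is to check numerically that the closed-form estimates minimise the least-squares criterion: perturbing either coefficient can only increase the SSR. A sketch with hypothetical toy data:

```python
# OLS as a minimisation problem: the closed-form estimates from the moment
# conditions minimise the sum of squared residuals. Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

def ssr(b0, b1):
    """Sum of squared residuals for candidate coefficients (b0, b1)."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Closed-form OLS solution from the moment conditions.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

# Any perturbation of the estimates increases the criterion.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(ssr(b0, b1) <= ssr(b0 + db0, b1 + db1))
```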
Today

- Assess how much x explains y
- Assess how close our estimators are to the true (population) values
Looking for a goodness-of-fit measure: decomposition of the Total Sum of Squares (SST)

We know y_i = ŷ_i + û_i. This implies ȳ = (1/n) Σ ŷ_i, since the residuals sum to zero. Also Σ ŷ_i·û_i = 0, to be used later.

Now define:
- Total Sum of Squares: SST = Σ (y_i − ȳ)²
- Explained Sum of Squares: SSE = Σ (ŷ_i − ȳ)²
- Sum of Squared Residuals: SSR = Σ û_i²

It can then be shown that SST = SSE + SSR.
Goodness-of-Fit

How well does our sample regression line fit the data? If SSR is very small relative to SST (equivalently, SSE is very large relative to SST), then a lot of the variation in the dependent variable y is explained by the model.

We can compute the fraction of the total sum of squares (SST) that is explained by the model; denote this as the R-squared of the regression:

R² = SSE/SST = 1 − SSR/SST

The lower SSR, the higher R². OLS maximises R² (it minimises SSR). We always have 0 ≤ R² ≤ 1.
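Both the decomposition SST = SSE + SSR and the two expressions for R² can be verified numerically. A minimal sketch with hypothetical toy data:

```python
# R-squared: fraction of the total variation in y explained by the model.
# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                   # total sum of squares
sse = sum((yhi - ybar) ** 2 for yhi in y_hat)             # explained sum of squares
ssr = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))   # residual sum of squares

r2 = 1 - ssr / sst   # equivalently sse / sst, since sst = sse + ssr
print(r2)
```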
Example (continued)

n = 11064, data from the Employment Survey (INE), 2003.

A lot of the variation in the dependent variable (wage) is left unexplained by the model. This is not related to the fact that educ is most probably correlated with u: we can have a low R² even if all the assumptions hold true.
Assumptions again...

- Assumption SLR.1 (Linearity in Parameters): y = β0 + β1·x + u
- Assumption SLR.2 (Random Sampling from the Population): we have a random sample {(x_i, y_i): i = 1, ..., n}
- Assumption SLR.3 (Variation in the Regressor): the x_i, i = 1, ..., n, do not all take the same value
- Assumption SLR.4 (Zero Conditional Mean): E(u|x) = 0, which implies E(y|x) = β0 + β1·x
Properties of OLS Estimators

Thought experiment: draw a sample of size n from the population many times (conceptually, infinitely many times) and compute the OLS estimates each time. Is the average of these estimates equal to the true parameter values?

Under the previous assumptions (SLR.1 to SLR.4) it turns out that the answer is yes. These are estimators if we treat the (x_i, y_i) as random variables. It will be the case that:

E(β̂0) = β0 and E(β̂1) = β1
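The thought experiment can be imitated on a computer: simulate many samples from a population with known parameters and average the OLS slope estimates across replications. A sketch (all population values here are made up for illustration):

```python
import random

# Monte Carlo illustration of unbiasedness: repeatedly draw samples from a
# population with known beta0 = 1 and beta1 = 2, estimate by OLS each time,
# and average the slope estimates across replications.
random.seed(0)
beta0, beta1, n, reps = 1.0, 2.0, 50, 2000

def ols_slope(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

estimates = []
for _ in range(reps):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]
    u = [random.gauss(0.0, 1.0) for _ in range(n)]   # drawn independently of x, so SLR.4 holds
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    estimates.append(ols_slope(x, y))

print(sum(estimates) / reps)   # close to the true slope beta1 = 2
```

If the errors were instead generated so that E(u|x) ≠ 0, the average of the estimates would drift away from 2, illustrating how unbiasedness hinges on SLR.4.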
Unbiasedness of OLS (proof)

Put SST_x = Σ (x_i − x̄)².

Now, note that Σ (x_i − x̄) = 0 and Σ (x_i − x̄)·x_i = Σ (x_i − x̄)².

Let us rewrite our estimator as:

β̂1 = Σ (x_i − x̄)·y_i / SST_x = β1 + Σ (x_i − x̄)·u_i / SST_x
Unbiasedness of OLS (cont.)

Now take expectations conditional on the values of the x's, that is, treat the x's as constants (to avoid heavy notation, this conditioning is not made explicit):

E(β̂1) = β1 + Σ (x_i − x̄)·E(u_i) / SST_x = β1

And if the conditional expectation is a constant, the unconditional expectation is also that constant...

Now it is really easy to prove unbiasedness of β̂0. Try it yourself!
Unbiasedness: summary

The OLS estimators of β1 and β0 are unbiased. However, in a given sample we may be close to or far from the true parameter; unbiasedness is a minimal requirement.

Unbiasedness depends on the four assumptions: if any assumption fails, then OLS is not necessarily unbiased.
Variance of the OLS Estimators

We already know that the sampling distribution of our estimators is centered around the true parameter. Is the distribution concentrated around the true parameter? To answer this question, we add an additional assumption.

Assumption SLR.5 (Homoskedasticity): Var(u|x) = σ²
Variance of OLS (cont.)

Assumption SLR.5 (Homoskedasticity): Var(u|x) = σ².

Var(u|x) = E(u²|x) − [E(u|x)]², and E(u|x) = 0, so σ² = E(u²|x) = E(u²) = Var(u).

So σ² is also the unconditional variance, called the variance of the error; σ, the square root of the error variance, is called the standard deviation of the error.

We can write: E(y|x) = β0 + β1·x and Var(y|x) = σ².
[Figure: homoskedastic case — the conditional densities f(y|x) at x1 and x2 have the same spread around E(y|x) = β0 + β1·x.]
[Figure: heteroskedastic case — the conditional densities f(y|x) at x1, x2 and x3 have different spreads around E(y|x) = β0 + β1·x.]
Variance of OLS (cont.)

Recall SST_x = Σ (x_i − x̄)². Note that for constants a and b, Var(a + bX) = b²·Var(X), and Var(X + Y) = Var(X) + Var(Y) if X and Y are independent.

Then, conditional on the x's:

Var(β̂1) = Var(β1 + (1/SST_x) Σ (x_i − x̄)·u_i)
         = (1/SST_x)² Σ (x_i − x̄)²·Var(u_i)
         = (1/SST_x)² Σ (x_i − x̄)²·σ²
         = σ²·SST_x / SST_x²
         = σ² / SST_x

This is a conditional variance!
Variance of OLS: summary

Var(β̂1) = σ² / SST_x, where SST_x = Σ (x_i − x̄)²

- The larger the error variance σ², the larger the variance of the slope estimator.
- The larger the variability in the x_i, the smaller the variance of the slope estimator.
- SST_x tends to increase with the sample size n, so the variance tends to decrease with the sample size.

We can also show: Var(β̂0) = σ²·(n⁻¹ Σ x_i²) / SST_x
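The formula Var(β̂1) = σ²/SST_x is a conditional variance, so it can be checked by simulation: hold the regressor values fixed, redraw the errors many times, and compare the empirical variance of the slope estimates with the formula. A sketch with made-up population values:

```python
import random

# Check Var(beta1_hat | x) = sigma^2 / SST_x by simulation,
# holding the regressor values fixed across replications.
random.seed(1)
beta0, beta1, sigma = 1.0, 2.0, 1.5
x = [float(i) for i in range(1, 41)]   # fixed regressor values, n = 40
n = len(x)
xbar = sum(x) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)
theoretical = sigma ** 2 / sst_x

slopes = []
for _ in range(4000):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    slopes.append(sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x)

mean = sum(slopes) / len(slopes)
empirical = sum((s - mean) ** 2 for s in slopes) / len(slopes)
print(theoretical, empirical)   # the two numbers should be close
```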
Estimating the Error Variance

In practice, the error variance σ² is unknown and must be estimated:

σ̂² = SSR / (n − 2) = (1/(n − 2)) Σ û_i²

It can be shown that this is an unbiased estimator of the error variance. The so-called standard error of the regression is given by σ̂ = √σ̂².
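Why divide by n − 2 rather than n? Two parameters were estimated, which uses up two degrees of freedom, and dividing by n gives a downward-biased estimator. A quick simulation sketch (all population values made up for illustration):

```python
import random

# Simulation: SSR/(n-2) is, on average, close to sigma^2,
# while the naive SSR/n systematically underestimates it.
random.seed(2)
sigma2, n, reps = 4.0, 10, 5000
x = [float(i) for i in range(n)]
xbar = sum(x) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

unbiased, naive = [], []
for _ in range(reps):
    y = [1.0 + 0.5 * xi + random.gauss(0.0, sigma2 ** 0.5) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    unbiased.append(ssr / (n - 2))
    naive.append(ssr / n)

# Average of SSR/(n-2) is near sigma^2 = 4; average of SSR/n falls short.
print(sum(unbiased) / reps, sum(naive) / reps)
```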
Estimated standard errors (se) of the parameters

se(β̂1) = σ̂ / √SST_x

and similarly for β̂0, replacing σ by σ̂ in the corresponding variance formula and taking the square root. The same computation applies in our leading wage-education example.
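Putting the pieces together, the standard errors follow from σ̂ and SST_x. A sketch with hypothetical toy data, where se(β̂0) uses the intercept variance formula Var(β̂0) = σ²·(n⁻¹ Σ x_i²)/SST_x:

```python
import math

# Standard errors of the OLS estimates, computed from hypothetical toy data:
# se(beta1_hat) = sigma_hat / sqrt(SST_x)
# se(beta0_hat) = sigma_hat * sqrt(sum(x_i^2) / (n * SST_x))
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar

ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(ssr / (n - 2))   # standard error of the regression

se_b1 = sigma_hat / math.sqrt(sst_x)
se_b0 = sigma_hat * math.sqrt(sum(xi ** 2 for xi in x) / (n * sst_x))
print(se_b1, se_b0)
```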