Inside ECONOMICS

Introduction to Econometrics: Simple Linear Regression Model & Introduction to OLS Estimation

We are interested in a model that explains a variable y in terms of other variables x, and in finding how much y changes as a result of a change in x. The simple linear regression model is used to study the relationship between a dependent variable and an explanatory variable. For instance, with one explanatory variable x and one dependent variable y:

y = \beta_0 + \beta_1 x + u

It is common to include a constant \beta_0, which gives the point of intersection on the y axis. The error term, denoted u, represents the factors other than x that have an effect on the dependent variable y. Please note that in this document we deal only with cross-sectional data.

More generally, the \beta_k are unknown coefficients and the x_{ik} are the regressors. For a regressor x_{ik}, the subscript i denotes the observation or individual, indexed from 1 to n, where n is called the sample size, and k indexes the regressor. So, for instance, x_{13} is the third regressor for observation or individual 1, and x_{11} is the first regressor for observation or individual 1.

y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \beta_3 x_{13} + \beta_4 x_{14} + u_1

The equation above is the linear regression for the first observation or individual.

y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \beta_3 x_{23} + \beta_4 x_{24} + u_2

The equation above is the corresponding equation for the second observation or individual. Please note that the coefficients \beta_k are the same for both individuals; what differs across individuals are the values of the regressors and the error term. For example, x_{i1} and x_{i2} could be variables such as education and age. Then x_{11} and x_{12} are education and age for the first individual, and x_{21} and x_{22} are education and age for the second individual.
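As a concrete (hypothetical) illustration of this setup, the short numpy sketch below generates data for a few individuals from a model with education and age as regressors. The coefficient values and variable names are made up purely for illustration; they are not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data-generating process for n individuals:
#   y_i = beta0 + beta1 * educ_i + beta2 * age_i + u_i
n = 5
beta0, beta1, beta2 = 5.0, 1.5, 0.2          # assumed "true" coefficients
educ = rng.integers(10, 18, size=n).astype(float)  # years of education
age = rng.integers(20, 60, size=n).astype(float)   # age in years
u = rng.normal(size=n)                       # factors other than educ and age

y = beta0 + beta1 * educ + beta2 * age + u

# The coefficients (beta0, beta1, beta2) are shared across individuals;
# only the regressor values educ_i, age_i and the error u_i vary with i.
for i in range(n):
    print(f"individual {i + 1}: educ={educ[i]}, age={age[i]}, y={y[i]:.2f}")
```
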
Suppose that for some k = 1, \dots, K, x_{ik} denotes the age of individual i. Then, holding the other regressors fixed, if individual i were one year older, the value of the dependent variable y_i would increase by \beta_k.

Matrix Notation

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} + u_i    (1)

In matrix notation we can write the model as

Y = X\beta + u    (2)

where Y is the vector of dependent variables, Y = (y_1, y_2, \dots, y_n)'. X is the matrix of regressors, with dimensions n \times (k + 1); the column of ones in X is there for the intercept term. The error term is also a vector, u = (u_1, u_2, \dots, u_n)'.
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}

Ordinary Least Squares Estimation

There are various methods to estimate the coefficients; Ordinary Least Squares (OLS) is just one of these. OLS is relatively simple and has some attractive properties that make it a popular estimation method. The OLS estimator minimises the sum of squared residuals.

[Figure: scatter plot of the data with the fitted OLS regression line]

In the diagram the fitted line is the line that minimises the sum of squared residuals: the sum of the squared vertical distances between the observations and the line is smallest for this line. If the line were changed, the sum of squared residuals would be larger. We prefer estimators with low variance, and under the assumptions below OLS has the lowest variance among linear unbiased estimators (the Gauss-Markov theorem, stated at the end of this document).

OLS Assumptions

Assumption 1: Independent and identically distributed (i.i.d.) observations

(x_i, y_i) is independent from, and has the same distribution as, (x_j, y_j) for all i \neq j. We do not observe the population but only a sample, so we assume that an i.i.d. sample can be drawn from the population. The i.i.d. assumption makes some of the other assumptions easier to interpret, and it allows us to use asymptotic results (as the sample size n \to \infty).

Assumption 2: Linearity

The regression model is linear in the parameters (this is evident in the structure of equation (1)); essentially, the response variable is a linear function of the regressors. Where a model is not linear in the parameters, a linear regression model is an approximation; however, this approximation often involves only a minimal loss of accuracy.
Assumption 3: Uncorrelatedness

E[x_i u_i] = 0

It is also assumed that E[u_i] = 0, i.e. the errors in the regression have mean zero. Assumption 3 then says that the errors are uncorrelated with the regressors. If this assumption holds, we call the regressors exogenous variables. If it does not hold, the regressors that are correlated with the error term are called endogenous variables. If the regression contains endogenous variables, the OLS estimates will be invalid and instrumental variables will be required.

Assumption 4: Full Rank

\operatorname{rank} E[x_i x_i'] = K

This assumption rules out perfect collinearity among the regressors. In practice collinearity is not a large problem, especially if the sample size is large.

Assumption 5: Homoskedasticity

This can be written as

E[u_i^2 \mid X] = \sigma^2

E[u_i^2 x_i x_i'] = E[u_i^2] \, E[x_i x_i'] = \sigma^2 A, where \sigma^2 \equiv E[u_i^2] and A \equiv E[x_i x_i'].

The u_i are known as error terms and include all the differences in y_i that are not captured by the x variables. Homoskedasticity means that the errors have the same variance \sigma^2 for each observation. The variance of the error is treated as a constant, so the first observation and the last observation in the sample have identical error variance. As a result, the probability distribution of the dependent variable has the same variance regardless of the values of the explanatory variables. If this assumption is violated we have heteroskedasticity, which means that the variance of the error term is not constant and differs across observations.

Aside: if heteroskedasticity is present, the weighted least squares estimator will be a more efficient estimator and can be used. If the errors have infinite variance, robust estimation techniques are preferred.

Assumption 6: Exogeneity

E[u_i \mid x_i] = 0

This implies that

E[\hat{\beta} - \beta \mid x_1, \dots, x_n] = 0
This means that \hat{\beta} is an unbiased estimator of \beta conditional on the regressors x_1, \dots, x_n: E[\hat{\beta}] = \beta, irrespective of the value of \beta. Assumption 6 is related to Assumption 3, but it is stronger: conditional mean zero (strict exogeneity) implies that the errors are uncorrelated with the regressors, while the converse does not hold.

Deriving the OLS Estimator (Summation Notation)

We will now minimise the sum of squared residuals to derive the OLS estimator. The OLS estimator is BLUE (Best Linear Unbiased Estimator). We first derive the OLS estimator in sigma notation, assuming we have an intercept and one regressor, so K = 2:

y_i = \beta_0 + \beta_1 x_i + u_i

The data collected on the x's and y's will be used to construct estimates of \beta_0 and \beta_1. OLS is one technique of estimation and requires the minimisation of the sum of squared residuals:

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

The first order conditions are as follows:

\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

Please note that this implies

\sum_{i=1}^{n} \hat{u}_i = 0 and \sum_{i=1}^{n} x_i \hat{u}_i = 0,

and, where \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, it also follows that \bar{x} \sum_{i=1}^{n} \hat{u}_i = 0. Here \bar{x} is the sample average of the independent variable: the sum of all the x_i divided by the number of observations n in the sample (remember there are n observations, i = 1, \dots, n). Similarly, the sample average of the dependent variable is \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.

Let us turn our attention to the first FOC and solve for \hat{\beta}_0, dividing through by n:

\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

\frac{1}{n} \sum y_i - \frac{1}{n} \sum \hat{\beta}_0 - \frac{1}{n} \sum \hat{\beta}_1 x_i = 0

\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

Now our task is to solve for \hat{\beta}_1:

\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

(We can drop the factor of -2 that the differentiation produces, as it is just a nonzero constant.) Multiplying out the equation gives the following expression:

\sum x_i y_i - \hat{\beta}_0 \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0

Substitute the expression for \hat{\beta}_0 into the above equation:

\sum x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0

The summation applies to everything in the equation, so it is best to write each term out (remember that a constant can always be taken out in front of a summation):

\sum x_i y_i - \bar{y} \sum x_i + \hat{\beta}_1 \bar{x} \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0

Using the properties \bar{x} = \frac{1}{n} \sum x_i and \bar{y} = \frac{1}{n} \sum y_i, so that \sum x_i = n\bar{x}:

\sum x_i y_i - n\bar{x}\bar{y} + \hat{\beta}_1 n\bar{x}^2 - \hat{\beta}_1 \sum x_i^2 = 0

\sum x_i y_i - n\bar{x}\bar{y} = \hat{\beta}_1 \left( \sum x_i^2 - n\bar{x}^2 \right)

Noting that \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y} and \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2, rearrange for \hat{\beta}_1:

\hat{\beta}_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
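The two equivalent expressions for \hat{\beta}_1, together with \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, can be checked numerically. The sketch below uses made-up data and plain numpy; the variable names are illustrative only:

```python
import numpy as np

# Hypothetical sample with one regressor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()

# Deviation form: sum (x_i - x̄)(y_i - ȳ) / sum (x_i - x̄)^2
b1_dev = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Equivalent "raw sums" form: (Σ x_i y_i - n x̄ ȳ) / (Σ x_i² - n x̄²)
b1_raw = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x ** 2) - n * x_bar ** 2)

assert np.isclose(b1_dev, b1_raw)  # the two forms agree

b0 = y_bar - b1_dev * x_bar

# The residuals satisfy the first order conditions:
# Σ û_i = 0 and Σ x_i û_i = 0 (up to floating-point error)
u_hat = y - b0 - b1_dev * x
print(b0, b1_dev, u_hat.sum(), (x * u_hat).sum())
```

Note that the fitted residuals satisfy both first order conditions exactly (up to floating-point rounding), as the derivation requires.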
Finally, we have solved for both \hat{\beta}_0 and \hat{\beta}_1:

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}

OLS for an Arbitrary k > 2

So far we have had only two parameters: the intercept \beta_0 and the slope \beta_1 on the single explanatory variable. When k > 2 the previous equations no longer apply. With an arbitrary number k of regressors, we minimise the sum of squared residuals with respect to all of the parameters:

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i'\beta)^2

The first order condition is

\sum_{i=1}^{n} x_i (y_i - x_i'\hat{\beta}) = 0, which is the same as \sum_{i=1}^{n} x_i \hat{u}_i = 0

\sum_{i=1}^{n} x_i x_i' \hat{\beta} = \sum_{i=1}^{n} x_i y_i

\hat{\beta} = \left( \sum_{i=1}^{n} x_i x_i' \right)^{-1} \sum_{i=1}^{n} x_i y_i

This equation is equivalent to the matrix-notation OLS estimator equation

\hat{\beta} = (X'X)^{-1} X'Y

Deriving the OLS Estimator (Matrix Notation)

We will now minimise the sum of squared residuals to derive the OLS estimator using matrix algebra.

Y = X\beta + u

\hat{u} = Y - X\hat{\beta}

\min_{\beta} \, \hat{u}'\hat{u} = (Y - X\beta)'(Y - X\beta)

Use the matrix calculus results

\frac{d(A'A)}{dA} = 2A and \frac{d(CB)}{dB} = C'
Inside ECOOMICS ( X) Y Xβ = 0 X Y Xβ = 0 X Y X Xβ = 0 X Xβ = X Y Assuming that (X X) 1 exists (Assumption 4) β = (X X) 1 X Y Where this β is the OLS estimator of the true population beta β Key Equations 1. Linear Model y i = β 0 + β 1 x i1 + β x i + β 3 x i3 + + β k x ik + u i. Linear Model Matrix otation Y = Xβ + u 3. OLS Estimator β 0 = y β 1x 4. OLS Estimator β 1 = (x i x )(y i y) (x i x ) 5. OLS Estimator for Arbitrary k > β = x i x i 1 x i y i 6. OLS Estimator Matrix otation β = (X X) 1 X Y Gauss-Markov Theorem In the classical linear regression model under Assumptions 1,, 4 and 6 the OLS estimator of equation is the minimum variance unbiased estimator of β. The OLS Estimator is BLUE (Best Linear Unbiased Estimator). For the proof of OLS properties please refer to the document labelled Properties of OLS (Proofs). Also for a brief OLS derivation reference refer to the document labelled Derivation of the OLS Estimator. 7