Econometrics I KS
Module 2: Multivariate Linear Regression

Alexander Ahammer
Department of Economics
Johannes Kepler University of Linz

This version: April 16, 2018

Alexander Ahammer (JKU) Module 2: Multivariate Linear Regression 1 / 33
The multiple linear regression model

The major drawback of the bivariate regression model is that the key assumption SLR.4 (zero conditional mean) is often unrealistic. The multiple linear regression model (MLR) allows us to control for many other factors which might otherwise be captured in the error term. Thus it is more amenable to ceteris paribus analysis.

The model with k independent variables given a sample (y_i, x_{i1}, ..., x_{ik}), i = 1, ..., n, reads

    y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ... + β_k x_{ik} + u_i    (1)
        = x_i β + u_i    (2)

with x_i = (1, x_{i1}, ..., x_{ik}) and β = (β_0, β_1, ..., β_k)'. The key assumption is

    E(u_i | x_i) = 0,   i = 1, ..., n    (3)
MLR: Matrix notation

Let y be an n × 1 vector of observations on y, let X be the data matrix with dimension n × (k + 1) with associated parameter vector β ∈ R^{(k+1)×1}, and let u be an n × 1 vector of disturbances. With these ingredients we can write the model in (1) as

    y = Xβ + u    (4)

or more explicitly,

    [ y_1 ]   [ 1  x_11  x_12  ...  x_1k ] [ β_0 ]   [ u_1 ]
    [ y_2 ] = [ 1  x_21  x_22  ...  x_2k ] [ β_1 ] + [ u_2 ]    (5)
    [  ⋮  ]   [ ⋮    ⋮     ⋮     ⋱    ⋮  ] [  ⋮  ]   [  ⋮  ]
    [ y_n ]   [ 1  x_n1  x_n2  ...  x_nk ] [ β_k ]   [ u_n ]

where Xβ is the systematic and u the stochastic component.
MLR: An example

We are interested in the effect of education on hourly wage:

    wage_i = β_0 + β_1 educ_i + β_2 exper_i + u_i,   i = 1, ..., n    (6)

We control for years of labor market experience (exper). We are still primarily interested in the effect of education.

The MLR takes experience out of the error term u. With the SLR we would have to assume exper ⊥ educ, where ⊥ denotes statistical independence.

β_1 measures the effect of educ on wage when exper is held constant. β_2 measures the effect of exper on wage when educ is held constant.
MLR: Estimation

As in Module 1, we are looking for a coefficient vector β̂ ∈ R^{(k+1)×1} that minimizes the sum of squared residuals. Formally the problem reads

    arg min_{β̂} Σ_{i=1}^n û_i² = û'û = (y − Xβ̂)'(y − Xβ̂)    (7)
                                      = y'y − β̂'X'y − y'Xβ̂ + β̂'X'Xβ̂    (8)
                                      = y'y − 2β̂'X'y + β̂'X'Xβ̂    (9)

Note that the last step is possible because β̂'X'y = (y'Xβ̂)' = y'Xβ̂ (a scalar equals its own transpose). The first-order condition for minimization is

    ∂(û'û)/∂β̂ = −2X'y + 2X'Xβ̂ = 0    (10)

or written differently

    X'Xβ̂ = X'y    (11)

which is called the system of least squares normal equations.
MLR: Estimation

If X'X is non-singular (i.e., there exists an inverse), pre-multiplying both sides of equation (11) with (X'X)⁻¹ yields the OLS estimator β̂:

    β̂ = (X'X)⁻¹X'y    (12)

Important: The matrix X'X is non-singular (= invertible) and β̂ a unique solution to the minimization problem if and only if

- we have at least as many observations as parameters, n ≥ k + 1, and
- the data matrix X has rank (k + 1).

The second point is violated if there are linear dependencies among the explanatory variables (i.e., perfect collinearity).
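Equation (12) can be verified numerically. The sketch below (numpy on simulated data; variable names and parameter values are illustrative, not from the slides) solves the normal equations (11) directly rather than forming the inverse explicitly, which is the numerically preferable route:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated data for the model y = 1 + 2*x1 - 0.5*x2 + u
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations X'X beta_hat = X'y, cf. eqs. (11)-(12)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true in a sample of this size
```

With n = 200 and unit-variance errors, the estimates typically land within a few hundredths of the true coefficients.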
MLR: Properties of the OLS estimator

The OLS estimator has various important properties that do not depend on any assumptions, but rather arise from how it is constructed. First, substitute y = Xβ̂ + û into the system of normal equations (11) to obtain

    X'Xβ̂ = X'y
    X'Xβ̂ = X'(Xβ̂ + û)
    X'Xβ̂ = X'Xβ̂ + X'û
        0 = X'û    (13)

A number of important properties can be derived from this condition.
MLR: Properties of the OLS estimator

1. The observed values of X are uncorrelated with the residuals û. This follows immediately from (13): X'û = 0 iff X ⊥ û. Note that this does not mean that X is uncorrelated with u! We have to assume this.

2. The sum of residuals is zero. If there is a constant, the first column of X will be a column of ones. For the first element of X'û to be zero it must hold that Σ_i û_i = 0.

3. The sample mean of the residuals is zero. This follows from the previous property: n⁻¹ Σ_{i=1}^n û_i = 0.

4. The regression hyperplane passes through the means of the observed values x̄ and ȳ. Recall that û = y − Xβ̂. Averaging over observations gives ȳ − x̄'β̂ as the mean residual. By the previous property this mean is zero, so ȳ = x̄'β̂.
MLR: Properties of the OLS estimator

5. The predicted values ŷ are uncorrelated with the residuals û. The predicted values are ŷ = Xβ̂. From this we have ŷ'û = (Xβ̂)'û = β̂'X'û = 0 because X'û = 0.

6. The mean of the predicted y's for the sample will equal the mean of the observed y's. The proof is left as an exercise (use the result in item 4).
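These are algebraic identities that hold in every sample, whatever the data. A quick numerical check (a numpy sketch on simulated data, not part of the original slides; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column: constant
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat   # residuals
y_hat = X @ beta_hat       # fitted values

print(np.allclose(X.T @ u_hat, 0))         # property 1: X'u_hat = 0
print(abs(u_hat.sum()) < 1e-8)             # properties 2-3: residuals sum (and average) to zero
print(abs(y_hat @ u_hat) < 1e-8)           # property 5: fitted values orthogonal to residuals
print(np.isclose(y_hat.mean(), y.mean()))  # property 6: means of fitted and observed y agree
```

All four checks print True up to floating-point precision, reflecting condition (13) and its corollaries.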
MLR: Properties of the OLS estimator

These properties always hold true, so be careful not to infer anything from the residuals about the actual disturbances!

So far we know nothing about β̂ except that it satisfies all of the properties discussed above. We need to make some assumptions about the true model in order to make any inferences regarding β (the true population parameters) from β̂ (our estimator of the true parameters).
MLR: Expected value and variance

The assumptions from the bivariate model translate to the multivariate case as follows:

Assumption MLR.1 (Linear in parameters). The population model is linear: y = Xβ + u.

Assumption MLR.2 (Random sampling). We have a random sample of n observations, {(y_i, x_i) : i = 1, ..., n}, that follows the population model in assumption MLR.1.

Assumption MLR.3 (No perfect collinearity). The data matrix X has rank (k + 1).

Assumption MLR.4 (Zero conditional mean). Conditional on the entire matrix X, each error u_i has mean zero: E(u_i | X) = 0, i = 1, ..., n.
Finite sample properties

Note that we only need assumption MLR.3 (no perfect collinearity) to obtain an OLS estimate β̂. Whether this estimate actually makes sense, i.e., is unbiased and representative for the full population, depends on the other assumptions.

Especially the zero conditional mean assumption (MLR.4) often poses problems in practice. It requires that, conditional on the observed covariates x_i, the unobservables in u_i have mean zero, i.e., the covariates carry no information about the error term. It fails in the case of

- Simultaneity
- Selection
- Omitted variables
- Functional form misspecification
- Measurement error

We will discuss sources and consequences of these cases in Module 6.
Finite sample properties: Expected values

Theorem (Unbiasedness of OLS). Under assumptions MLR.1 through MLR.4,

    E(β̂) = β.    (14)

In other words, β̂ is an unbiased estimator of β.

Proof. Rewrite the OLS estimator as

    β̂ = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + u) = (X'X)⁻¹(X'X)β + (X'X)⁻¹X'u    (15)

Because X'X is invertible, (X'X)⁻¹(X'X) = I, thus

    β̂ = β + (X'X)⁻¹X'u    (16)
Finite sample properties: Expected values

Taking conditional expectations on both sides of equation (16) gives

    E(β̂ | X) = β + (X'X)⁻¹X' E(u | X)    (17)

By MLR.4, E(u | X) = 0, so

    E(β̂ | X) = β    (18)

This completes the proof.
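Unbiasedness can be illustrated by Monte Carlo: hold X fixed (conditioning on X), redraw the errors many times, and average β̂ across replications. A sketch under assumed simulated data (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 2000
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design held fixed across replications

draws = np.empty((reps, 2))
for r in range(reps):
    u = rng.normal(size=n)   # fresh errors each replication, with E(u | X) = 0
    y = X @ beta + u
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))  # close to [1.0, 2.0]
```

Each individual β̂ varies around β, but the replication average settles near the truth, which is what E(β̂ | X) = β says.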
Finite sample properties: Expected variances

Assumption MLR.1 (Linear in parameters). The population model is linear: y = Xβ + u.

Assumption MLR.2 (Random sampling). We have a random sample {(y_i, x_i) : i = 1, ..., n} that follows the population model in assumption MLR.1.

Assumption MLR.3 (No perfect collinearity). The data matrix X has rank (k + 1).

Assumption MLR.4 (Zero conditional mean). Conditional on the entire matrix X, each error u_i has mean zero: E(u_i | X) = 0, i = 1, ..., n.

Assumption MLR.5 (Homoskedasticity and no serial correlation). The error u_i has the same variance given any values of the covariates, i.e., Var(u_i | X) = σ², i = 1, ..., n, and there is no serial correlation between the errors: Cov(u_i, u_j | X) = 0 for all j ≠ i. We can write these two assumptions compactly as Var(u | X) = σ²I.
Finite sample properties: Expected variances

Assumption MLR.5 requires that errors are homoskedastic and that there is no serial correlation (meaning that errors are not correlated across observations; this is especially important if you deal with panel data, but sometimes also in cross-sectional settings).

Combining these assumptions, we can write the variance-covariance matrix of the disturbances as

    Var(u | X) = E(uu' | X) = [ σ²  0  ...  0  ]
                              [ 0   σ² ...  0  ] = σ²I    (19)
                              [ ⋮   ⋮   ⋱   ⋮  ]
                              [ 0   0  ...  σ² ]
Finite sample properties: Expected variances

Theorem (Variance-covariance matrix of the OLS estimator). Under assumptions MLR.1 through MLR.5,

    Var(β̂) = σ²(X'X)⁻¹    (20)

Proof. See Wooldridge (2013), p. 805.

For one particular element β̂_j of β̂, the variance is obtained by multiplying σ² by the jth diagonal element of (X'X)⁻¹. It can also be written as

    Var(β̂_j) = σ² / (SST_j (1 − R²_j))    (21)

where SST_j = Σ_{i=1}^n (x_{ij} − x̄_j)² is the total sample variation in x_j and R²_j is the R-squared from regressing x_j on all other independent variables (including an intercept).
Finite sample properties: Expected variances

The unbiased estimator of the error variance in the multivariate case is given by

    σ̂² = û'û / (n − k − 1)    (22)

where û'û is again the sum of squared residuals.

Theorem (Unbiasedness of σ̂²). Under assumptions MLR.1 through MLR.5, σ̂² is an unbiased estimator of σ². That is,

    E(σ̂² | X) = σ²   for all σ² > 0    (23)

Proof. See Wooldridge (2013), p. 807.
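Expressions (20)-(22) can be checked against each other numerically: the jth diagonal element of σ̂²(X'X)⁻¹ coincides exactly with σ̂²/(SST_j(1 − R²_j)). A numpy sketch on simulated data (names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

k = X.shape[1] - 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k - 1)   # eq. (22)
V = sigma2_hat * np.linalg.inv(X.T @ X)    # eq. (20), with sigma^2 replaced by its estimate

# eq. (21) for j = 1: regress x1 on a constant and x2 to obtain R^2_1
Z = np.column_stack([np.ones(n), x2])
r = x1 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
SST_1 = ((x1 - x1.mean()) ** 2).sum()
R2_1 = 1 - (r @ r) / SST_1
var_beta1 = sigma2_hat / (SST_1 * (1 - R2_1))

print(np.isclose(var_beta1, V[1, 1]))  # the two formulas agree
```

The agreement is exact (up to rounding) because SST_1(1 − R²_1) equals the residual sum of squares from the auxiliary regression, which is the reciprocal of the corresponding diagonal element of (X'X)⁻¹.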
Finite sample properties: Gauss-Markov theorem

Gauss-Markov Theorem. Under assumptions MLR.1 through MLR.5, β̂ is the best linear unbiased estimator.

Proof. See Wooldridge (2013), p. 808.

The Gauss-Markov theorem translated: OLS is the estimator with the smallest variance amongst all linear unbiased estimators. OLS is BLUE.
Inference: Sampling distributions of the OLS estimators

Although it is not necessary for the Gauss-Markov theorem to hold, we assume normally distributed disturbances to derive sampling distributions. (We will show later that, once asymptotics kick in, we no longer need the normality assumption for our test statistics to be valid.)

Assumption MLR.6 (Normality of errors). Conditional on X, u is distributed as multivariate normal with mean zero and variance-covariance matrix σ²I. That is,

    u ~ Normal(0, σ²I)    (24)
Inference: Sampling distributions of the OLS estimators

Theorem (Normality of β̂). Under the classical linear model assumptions MLR.1 through MLR.6, β̂ conditional on X is distributed as multivariate normal with mean β and variance-covariance matrix σ²(X'X)⁻¹. That is,

    β̂ ~ Normal(β, σ²(X'X)⁻¹)    (25)

Therefore,

    (β̂_j − β_j) / sd(β̂_j) ~ Normal(0, 1)    (26)

Proof. Wooldridge (2013, p. 113) provides a sketch of the proof for (25). The result in (26) is straightforward: if we subtract the mean from a normally distributed random variable and divide by its standard deviation, we get a standard normal variable with mean zero and standard deviation one.
Inference: Sampling distributions of the OLS estimators

Theorem (Distribution of t-statistics). Under assumptions MLR.1 through MLR.6,

    (β̂_j − β_j) / se(β̂_j) ~ t_{n−k−1}    (27)

where the left-hand side is the t-statistic. Proof. Wooldridge (2013), p. 808.

This is an important result for inference. It says that, when we estimate σ in sd(β̂_j) by σ̂, which yields se(β̂_j), the ratio (β̂_j − β_j)/se(β̂_j) is t-distributed with n − k − 1 degrees of freedom. Note that β_j is some hypothesized value.
Inference: Testing a single population parameter

Pick a significance level and formulate a null hypothesis (H_0):

- One-sided alternatives: H_0: β_j ≤ 0; H_1: β_j > 0.
- Two-sided alternatives: H_0: β_j = 0; H_1: β_j ≠ 0.
- One-sided alternatives against another value: H_0: β_j ≤ a_j; H_1: β_j > a_j, with a_j ∈ R.

t-statistic: t ≡ (estimate − hypothesized value) / standard error, distributed according to theorem (27).

p-value for the t-test: what is the smallest significance level at which H_0 would be rejected?

Confidence interval (range of likely values for β_j): at the 95% confidence level,

    CI ≡ β̂_j ± c · se(β̂_j)    (28)

where c is the 97.5th percentile of the t_{n−k−1} distribution.

Economic vs. statistical significance.
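The steps above can be sketched in numpy (simulated data; all names and values illustrative). Since n is large here, the 97.5th percentile of the t distribution is taken to be approximately the standard normal value 1.96 rather than computed exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - 2)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

# t-statistic for H0: beta_1 = 0 against H1: beta_1 != 0
t_stat = (beta_hat[1] - 0.0) / se[1]

# 95% confidence interval, eq. (28), with c ~ 1.96 for ~1000 degrees of freedom
ci = (beta_hat[1] - 1.96 * se[1], beta_hat[1] + 1.96 * se[1])
print(t_stat > 1.96)  # H0 rejected at the 5% level in this simulated sample
```

Because the true slope is 0.3 and se(β̂_1) is roughly 0.03 at this sample size, the t-statistic is far beyond the critical value.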
Inference: Testing multiple linear restrictions

The F-test allows us to test multiple hypotheses jointly. Consider a model with k = 4 independent variables: y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3} + β_4 x_{i4} + u_i. Suppose you want to test whether x_1, x_2, and x_3 are jointly insignificant.

Formulate a null hypothesis:

    H_0: β_1 = β_2 = β_3 = 0
    H_1: H_0 is not true.

F-statistic:

    F ≡ [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)] ~ F_{q, n−k−1}    (29)

where q is the number of restrictions, SSR_r is the sum of squared residuals from the restricted model and SSR_ur is the SSR from the unrestricted model.
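A sketch of the F-statistic in (29), comparing restricted and unrestricted SSRs on simulated data (names and values illustrative; the excluded regressors truly matter here, so F comes out large):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, q = 200, 4, 3
x = rng.normal(size=(n, k))
X_ur = np.column_stack([np.ones(n), x])   # unrestricted model: all four regressors
y = X_ur @ np.array([1.0, 0.5, -0.5, 0.8, 0.0]) + rng.normal(size=n)

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

# H0: beta_1 = beta_2 = beta_3 = 0 -> drop x1, x2, x3 in the restricted model
X_r = np.column_stack([np.ones(n), x[:, 3]])
F = ((ssr(X_r, y) - ssr(X_ur, y)) / q) / (ssr(X_ur, y) / (n - k - 1))
print(F > 10)  # far beyond any conventional critical value of F(3, 195)
```

Note the restricted SSR can never fall below the unrestricted SSR, so F is nonnegative by construction.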
Inference: Example

We estimate the following model:

    final_i = β_0 + β_1 attend_i + β_2 hwrte_i + u_i    (30)

where final ∈ [10, 39] are final exam points, attend ∈ [2, 32] is the number of classes attended out of 32, and hwrte is the percentage of homeworks turned in times 100.

. reg final attend hwrte

      Source |       SS       df       MS              Number of obs =     674
-------------+------------------------------           F(  2,   671) =    9.20
       Model |  401.109761     2  200.554881           Prob > F      =  0.0001
    Residual |  14623.5445   671   21.793658           R-squared     =  0.0267
-------------+------------------------------           Adj R-squared =  0.0238
       Total |  15024.6543   673   22.324895           Root MSE      =  4.6684

       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      attend |   .0828712    .043704     1.90   0.058    -.0029418    .1686842
       hwrte |   .0217245   .0119752     1.81   0.070    -.0017889    .0452378
       _cons |    21.8012   .9725956    22.42   0.000     19.89151     23.7109
Asymptotics

So far, we have looked at finite sample properties of OLS, i.e., properties that hold regardless of how large n is. However, it is also important to know the large sample or asymptotic properties of OLS. These are defined as the sample size grows without bound.

An important result is that even without assuming normality (MLR.6), t and F statistics are approximately t and F distributed in large samples.
Asymptotics: Consistency

In general, an estimator θ̂ is consistent if it converges in probability to the population parameter θ, that is, θ̂ →p θ, or plim θ̂ = θ. Note that unbiasedness does not necessarily imply consistency, and consistency does not automatically imply unbiasedness.

Consistency of OLS: OLS is unbiased under assumptions MLR.1 through MLR.4, so β̂_j is always distributed around β_j. The distribution of β̂_j becomes more and more tightly concentrated around β_j as the sample size grows. As n → ∞, the distribution of β̂_j collapses to the single point β_j.

Footnote: A random variable X_n converges to X in probability if for all ε > 0,

    lim_{n→∞} P(|X_n − X| ≥ ε) = 0    (31)

Note that here the probability converges, not the random variable itself.
Asymptotics: Consistency

Figure: Sampling distributions of β̂_1 [Source: Wooldridge (2013), Figure 5.1].
Asymptotics: Consistency

Theorem (Consistency of OLS). Under assumptions MLR.1 through MLR.4, the OLS estimator β̂ is consistent for β.

Proof. We show consistency for the bivariate case with a single regressor and slope β_1; the general proof for k regressors is given in Wooldridge (2013), p. 810. First note that we can write the OLS estimator β̂_1 simply as

    β̂_1 = Σ_{i=1}^n x_i y_i / Σ_{i=1}^n x_i²    (32)

and after plugging in y_i = β_1 x_i + u_i and some algebra, we obtain

    β̂_1 = β_1 + (n⁻¹ Σ_{i=1}^n x_i u_i) / (n⁻¹ Σ_{i=1}^n x_i²)    (33)
Asymptotics Consistency As n, we have ˆβ 1 p β + plim n 1 n i=1 x iu i plim n 1 n i=1 x2 i By the law of large numbers, 4 we have plim n 1 n i=1 x i = E(x i ) and plim n 1 n i=1 u i = E(u i ). Since we assume zero conditional mean (MLR.4), which obviously implies E(u i ) = 0, we get ˆβ 1 p β + 0 plim n 1 n i=1 x2 i (34) = β (36) This proofs that ˆβ 1 converges to β in probability. 4 Let X 1,..., X n be some sequence of i.i.d. random variables with arbitrary distribution. The law of large number states that X n = n 1 (X 1,..., X n) E(X n) (35) That is, the sample average converges to the expected value. Alexander Ahammer (JKU) Module 2: Multivariate Linear Regression 30 / 33
Asymptotics: Consistency

Consistency is related to bias as follows: a sufficient condition for an estimator θ̂ to be consistent is that both its bias, Bias(θ̂) = E(θ̂) − θ, and its variance converge to zero as n → ∞. Individual estimators in the sequence may be biased, but the sequence can still be consistent if the bias vanishes asymptotically.

Estimators can be
- unbiased but not consistent,
- biased but consistent.
Asymptotics: Normality

Another important large sample property of OLS is that β̂_j is asymptotically normally distributed under assumptions MLR.1 through MLR.5. Formally,

    (β̂_j − β_j) / se(β̂_j) ~a Normal(0, 1)    (37)

where se(β̂_j) is the usual OLS standard error. This result stems from the central limit theorem.

We do not need the normality assumption in MLR.6 for our test statistics to be valid, as long as the sample size is reasonably large. All we have to assume is finite error variance, Var(u) < ∞.

Footnote: The central limit theorem states that standardized sums of i.i.d. random variables converge to a normal distribution, irrespective of their own distribution. Let X_1, ..., X_n be i.i.d. random variables with finite expected value μ and variance σ². Then

    (1/(√n σ)) (Σ_{i=1}^n X_i − nμ) = √n (X̄ − μ)/σ →d N(0, 1)    (38)
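The central limit theorem in the footnote can be illustrated numerically: standardized means of draws from a clearly non-normal distribution behave like standard normal variables. A sketch using exponential draws (parameters illustrative; the exponential with scale 1 has μ = σ = 1 but is strongly skewed):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 1000, 5000

# reps independent sample means, each over n exponential draws
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (xbar - 1.0) / (1.0 / np.sqrt(n))   # sqrt(n)(xbar - mu)/sigma, eq. (38)

# z should look standard normal: mean near 0, sd near 1, ~95% within +/- 1.96
print(round(z.mean(), 2), round(z.std(), 2))
```

Despite the skewness of the underlying distribution, the standardized means are already close to N(0, 1) at n = 1000, which is the sense in which (37) licenses the usual t and F inference without MLR.6.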
Literature

Main reference: Wooldridge, J. M. (2013). Introductory Econometrics: A Modern Approach, 5th ed., South Western College Publishing.

Additional reference: Greene, W. H. (2012). Econometric Analysis, 7th ed., Pearson.