ECON 3150/4150, Spring term 2014. Lecture 7: The multivariate regression model (I). Ragnar Nymoen, University of Oslo, 4 February 2014.
References to Lectures 7 and 8: SW Ch. 6; BN Kap. 7.1-7.8.
Omitted variable bias I

Assume that we estimate a simple regression model

    Y_i = β_0 + β_1 X_{1i} + ε_i    (1)

by OLS, but that another regression model is

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ε_i    (2)

where X_{2i} is a second regressor and ε_i has classical properties conditional on both X_{1i} and X_{2i}, in particular:

    E(ε_i | X_{1i}) = 0 ∀ i,    E(ε_i | X_{2i}) = 0 ∀ i.
Omitted variable bias II

and

    E(Y_i | X_{1i}, X_{2i}) = β_0 + β_1 X_{1i} + β_2 X_{2i},

i.e., linearity of the conditional expectation function.

We want to evaluate the OLS estimator β̂_1 of (1) in the light of (2). First, show that

    β̂_1 = β_1 + β_2 [Σ_{i=1}^n (X_{1i} − X̄_1) X_{2i}] / [Σ_{i=1}^n (X_{1i} − X̄_1)²] + [Σ_{i=1}^n (X_{1i} − X̄_1) ε_i] / [Σ_{i=1}^n (X_{1i} − X̄_1)²]
Omitted variable bias III

If X_{1i} and X_{2i} have a well-defined joint probability distribution (after Y_i has been marginalized out from f(Y_i, X_{1i}, X_{2i})), we can use plim algebra (Slutsky's theorem, Lecture 2) to show that

    plim(β̂_1) = β_1 + β_2 τ_12    (3)

    τ_12 = cov(X_1, X_2) / σ²_{X_1} = ρ_{X_1 X_2} σ_{X_2} / σ_{X_1}    (4)

i.e., the population regression coefficient of X_2 on X_1! We can also show (with no extra or stronger assumptions) that:

    E(β̂_1) = β_1 + β_2 τ_12    (5)
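The bias formula (3) can be illustrated with a small simulation. This is a sketch, not part of the lecture: the data-generating process, coefficient values and sample size below are hypothetical choices.

```python
import numpy as np

# Hypothetical DGP following eq. (2): Y depends on both X1 and X2,
# but we estimate the short regression (1) of Y on X1 alone.
rng = np.random.default_rng(0)
n = 200_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0

x1 = rng.normal(0.0, 1.0, n)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)   # so tau12 = cov(X1, X2)/var(X1) = 0.5
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(0.0, 1.0, n)

# Short-regression OLS slope: cov(Y, X1) / var(X1).
b1_short = np.cov(y, x1, ddof=0)[0, 1] / np.var(x1)

# Eq. (3) predicts plim(b1_short) = beta1 + beta2 * tau12 = 2 + 3 * 0.5 = 3.5.
print(b1_short)
```

With this large n the short-regression slope lands close to 3.5, not to β_1 = 2, exactly the bias β_2 τ_12 in (3).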
Omitted variable bias IV

Returning to (3) and (4): these expressions are more precise than eq. (6.1) in SW, because ρ_{Xu} = 0 in a regression model under quite mild assumptions. The point is that although E(ε_i | X_{1i}) = 0, there is nothing in the specification of the simple regression model that implies E(ε_i | X_{2i}) = 0. It is this possibility, that Y is affected by other variables than X_1, that underlies the omitted variable bias.
Omitted variable bias V

In summary (assume that causality is settled from theory, to keep that issue out of the way for the time being):

- With simple regression we are in general not estimating the partial effect of a change in X_1.
- For that to be true, we must either have orthogonal regressors (τ_12 = 0), or Y must be independent of X_2 (β_2 = 0).
- In general, therefore, simple regression gives us the gross (or total) effect on Y of a change in X_1.
Bi- and multivariate regression I

In general, the theory of multiple regression covers a large number of regressors, denoted k, some of them random, others deterministic. Still, a lot can be learned by studying the case of k = 2 in detail first. We follow that route, and mention the extension to the case of k > 2 at the end, without going into the matrix algebra needed for the general case.
Model specification

The model can be specified by the linear relationship and the set of assumptions:

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ε_i,    i = 1, 2, ..., n    (6)

a. X_{ji} (j = 1, 2; i = 1, 2, ..., n) can be deterministic or random. For a deterministic variable we assume that at least two values of the variable are distinct. For random Xs, we assume var(X_{ji}) = σ²_{X_j} > 0 (j = 1, 2), ρ²_{X_1 X_2} < 1, and no excess kurtosis in the distributions.
b. E(ε_i) = 0, ∀ i.
c. var(ε_i) = σ², ∀ i.
d. cov(ε_i, ε_j) = 0, i ≠ j.
e. β_0, β_1, β_2 and σ² are constant parameters.

For the purpose of statistical inference for finite n:

f. ε_i ~ IIN(0, σ²).
Comments to specification I

- a. is formulated to accommodate different data types (random/deterministic, continuous/binary).
- ρ²_{X_1 X_2} < 1 is a way of saying that the two random variables are truly separate variables.
- In many presentations, you will find an assumption about absence of exact linear relationships between the variables, often called absence of exact collinearity. But this can only occur in the case of deterministic variables, and would be an example of bad model specification, e.g., specifying X_{2i} as a variable with the number 100 as the value for all i (an example of the dummy-variable fallacy/pitfall).
Comments to specification II

- For random variables, we can of course be unlucky and draw a sample where r²_{X_1 X_2} is very high. But this near-exact collinearity is a property of the sample, not of the regression model.
- b.-d. and f.: These are the same as in the case with one regressor. Since we want a model formulation that allows random explanatory variables, they should be interpreted as conditional on X_{1i} = x_{1i} and X_{2i} = x_{2i}. The explicit conditioning has been omitted to save notation.
OLS estimation

Nothing new here: choose the estimates that minimize

    S(β_0, β_1, β_2) = Σ_{i=1}^n (Y_i − β_0 − β_1 X_{1i} − β_2 X_{2i})²    (7)

or, equivalently, using the same type of re-specification as with simple regression,

    S(α, β_1, β_2) = Σ_{i=1}^n (Y_i − α − β_1 (X_{1i} − X̄_1) − β_2 (X_{2i} − X̄_2))²

where α = β_0 + β_1 X̄_1 + β_2 X̄_2. Rest of derivation in class.
OLS estimates I

By solving the first-order conditions for a minimum of S(α, β_1, β_2) (or S(β_0, β_1, β_2)), we obtain, for β̂_1 and β̂_2:

    β̂_1 = (σ̂²_{X_2} σ̂_{Y,X_1} − σ̂_{Y,X_2} σ̂_{X_1,X_2}) / (σ̂²_{X_1} σ̂²_{X_2} − σ̂²_{X_1,X_2})    (8)

    β̂_2 = (σ̂²_{X_1} σ̂_{Y,X_2} − σ̂_{Y,X_1} σ̂_{X_1,X_2}) / (σ̂²_{X_1} σ̂²_{X_2} − σ̂²_{X_1,X_2})    (9)

where σ̂²_{X_j} (j = 1, 2), σ̂_{Y,X_j} (j = 1, 2) and σ̂_{X_1,X_2} are empirical variances and covariances. The estimates of the two versions of the intercept become:

    β̂_0 = Ȳ − β̂_1 X̄_1 − β̂_2 X̄_2
    α̂ = Ȳ
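As a numerical sketch (data and coefficient values hypothetical), (8) and (9) can be checked against a direct least-squares fit; with the empirical moments computed with a 1/n factor, the two must agree exactly.

```python
import numpy as np

# Hypothetical simulated data for illustrating (8)-(9).
rng = np.random.default_rng(1)
n = 1_000
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.6 * x1 + rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0.0, 1.0, n)

# Empirical variances and covariances (1/n normalization, ddof=0).
s_x1, s_x2 = np.var(x1), np.var(x2)
s_x1x2 = np.cov(x1, x2, ddof=0)[0, 1]
s_yx1 = np.cov(y, x1, ddof=0)[0, 1]
s_yx2 = np.cov(y, x2, ddof=0)[0, 1]

M = s_x1 * s_x2 - s_x1x2 ** 2
b1 = (s_x2 * s_yx1 - s_yx2 * s_x1x2) / M          # eq. (8)
b2 = (s_x1 * s_yx2 - s_yx1 * s_x1x2) / M          # eq. (9)
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()   # intercept

# Cross-check against least squares on [1, X1, X2]:
X = np.column_stack([np.ones(n), x1, x2])
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose([b0, b1, b2], b_ls))            # True
```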
Absence of perfect sample collinearity I

It is clear that (8) for β̂_1 and (9) for β̂_2 require

    M := σ̂²_{X_1} σ̂²_{X_2} − σ̂²_{X_1,X_2} = σ̂²_{X_1} σ̂²_{X_2} (1 − r²_{X_1 X_2}) > 0

- We cannot have perfect empirical correlation between the two regressors.
- We must have: σ̂²_{X_1} > 0, σ̂²_{X_2} > 0, and r²_{X_1 X_2} < 1 ⟺ −1 < r_{X_1 X_2} < 1.
- If any one of these conditions should fail, we have exact (or perfect) collinearity.
- Absence of perfect collinearity is a requirement about the nature of the sample.
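A minimal illustration of the requirement M > 0 (hypothetical numbers): when X_2 is an exact linear function of X_1, r²_{X_1 X_2} = 1 and M collapses to zero, so (8) and (9) are undefined.

```python
import numpy as np

# If X2 is an exact linear function of X1, the sample correlation is +/-1
# and M = s_x1 * s_x2 * (1 - r^2) vanishes (up to floating-point noise).
rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, 500)
x2 = 3.0 - 2.0 * x1                      # exact (perfect) collinearity
r = np.corrcoef(x1, x2)[0, 1]
M = np.var(x1) * np.var(x2) * (1 - r ** 2)
print(r, M)   # r = -1 (up to rounding), M = 0
```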
Absence of perfect sample collinearity II

The case of r_{X_1 X_2} = 0 also has a name: it is called perfect orthogonality. It does not create any problems in (8) or (9). In practice, the relevant case is −1 < r_{X_1 X_2} < 1, i.e., a degree of collinearity (not perfect).
Unbiasedness: Expectation I

Conditional on the values of X_1 and X_2, β̂_1 is still a random variable because ε_i and Y_i are random variables. In that interpretation β̂_1, β̂_2 and β̂_0 are estimators, and we want to know their expectation, variance, and whether they are consistent or not. Start by considering E(β̂_1 | X_1, X_2), i.e., conditional on all the values of the two regressors.
Unbiasedness: Expectation II

Write β̂_1 as

    β̂_1 = (σ̂²_{X_2} σ̂_{Y,X_1} − σ̂_{Y,X_2} σ̂_{X_1,X_2}) / M

Then E(β̂_1 | X_1, X_2) becomes

    E(β̂_1 | X_1, X_2) = (σ̂²_{X_2} / M) E(σ̂_{Y,X_1} | X_1, X_2) − (σ̂_{X_1,X_2} / M) E(σ̂_{Y,X_2} | X_1, X_2)    (10)

Evaluate this in class, in order to show that

    E(β̂_j) = E[E(β̂_j | X_1, X_2)] = β_j,    j = 1, 2    (11)

since E(ε_i | X_1, X_2) = 0 ∀ i is generic for the regression model.
Variance of β̂_j I

We find that (under the classical assumptions of the model):

    var(β̂_j | X_1, X_2) = σ² / (n σ̂²_{X_j} [1 − r²_{X_1,X_2}]),    j = 1, 2    (12)

and this also holds unconditionally. The BLUE property of the OLS estimators extends to the multivariate case (will not show). The variance (12) is low in samples that are informative about the separate contributions from X_1 and X_2:

- σ̂²_{X_j} high
- r²_{X_1,X_2} low
Variance of β̂_j II

var(β̂_j) is lowest when r²_{X_1,X_2} = 0, i.e., when the regressors are orthogonal.
Covariance between β̂_1 and β̂_2 I

In many applications we need to know cov(β̂_1, β̂_2). It is easiest to find by starting from the second normal equation:

    β̂_1 σ̂²_{X_1} + β̂_2 σ̂_{X_1,X_2} = σ̂_{Y,X_1}

When we take the (conditional) variance on both sides, we get

    σ̂⁴_{X_1} var(β̂_1) + σ̂²_{X_1,X_2} var(β̂_2) + 2 σ̂²_{X_1} σ̂_{X_1,X_2} cov(β̂_1, β̂_2) = var(σ̂_{Y,X_1})

The right-hand side we have from before:

    var(σ̂_{Y,X_1}) = (σ² / n) σ̂²_{X_1}
Covariance between β̂_1 and β̂_2 II

Inserting the expressions for var(β̂_1) and var(β̂_2) and solving for cov(β̂_1, β̂_2) gives

    cov(β̂_1, β̂_2) = −(σ² / n) (σ̂_{X_1,X_2} / M)

Algebra details in a note on the web-page.
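Formula (12) and the covariance expression can be checked by Monte Carlo, holding the regressors fixed across replications (i.e., conditional on X_1 and X_2). The DGP, σ² = 1 and the sample size are hypothetical choices for this sketch; note that (12) is used in the equivalent form var(β̂_1) = σ² σ̂²_{X_2} / (n M).

```python
import numpy as np

# Fixed regressors (conditioning on X1, X2); hypothetical values throughout.
rng = np.random.default_rng(3)
n, reps, sigma2 = 50, 20_000, 1.0
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.7 * x1 + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1, x2])

# Draw many samples of y and collect the OLS estimates for each.
coef_map = np.linalg.pinv(X)                     # maps y -> (b0, b1, b2)
Y = (1.0 + 2.0 * x1 + 3.0 * x2) + rng.normal(0.0, np.sqrt(sigma2), (reps, n))
B = Y @ coef_map.T                               # (reps, 3) OLS estimates

# Theoretical moments from the fixed-regressor sample (ddof=0 moments).
s_x1, s_x2 = np.var(x1), np.var(x2)
s_x1x2 = np.cov(x1, x2, ddof=0)[0, 1]
M = s_x1 * s_x2 - s_x1x2 ** 2

var_b1 = sigma2 * s_x2 / (n * M)                 # eq. (12), rewritten via M
var_b2 = sigma2 * s_x1 / (n * M)
cov_b1b2 = -sigma2 * s_x1x2 / (n * M)            # covariance formula

print(np.var(B[:, 1]), var_b1)                   # Monte Carlo vs theory
print(np.cov(B[:, 1], B[:, 2])[0, 1], cov_b1b2)
```

The simulated variance and covariance of (β̂_1, β̂_2) match the theoretical expressions up to Monte Carlo noise; the negative covariance reflects the positive sample covariance between the regressors.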
Consistency of estimators I

Show for β̂_1:

    plim(β̂_1) = plim[(σ̂²_{X_2} σ̂_{Y,X_1} − σ̂_{Y,X_2} σ̂_{X_1,X_2}) / M]
              = [plim(σ̂²_{X_2}) plim(σ̂_{Y,X_1}) − plim(σ̂_{Y,X_2}) plim(σ̂_{X_1,X_2})] / plim(M)

Based on the assumptions of the regression model:

    plim(σ̂²_{X_j}) = σ²_{X_j},    j = 1, 2
    plim(σ̂_{X_1,X_2}) = σ_{X_1,X_2}
    plim(M) = σ²_{X_1} σ²_{X_2} − σ²_{X_1,X_2}
Consistency of estimators II

    plim(σ̂_{Y,X_1}) = β_1 σ²_{X_1} + β_2 σ_{X_1,X_2} + plim[(1/n) Σ_{i=1}^n ε_i (X_{1i} − X̄_1)]
                    = β_1 σ²_{X_1} + β_2 σ_{X_1,X_2}

    plim(σ̂_{Y,X_2}) = β_1 σ_{X_1,X_2} + β_2 σ²_{X_2}

    plim(β̂_1) = {σ²_{X_2} [β_1 σ²_{X_1} + β_2 σ_{X_1,X_2}] − [β_1 σ_{X_1,X_2} + β_2 σ²_{X_2}] σ_{X_1,X_2}} / (σ²_{X_1} σ²_{X_2} − σ²_{X_1,X_2})
              = {β_1 (σ²_{X_1} σ²_{X_2} − σ²_{X_1,X_2}) + β_2 σ²_{X_2} σ_{X_1,X_2} − β_2 σ²_{X_2} σ_{X_1,X_2}} / (σ²_{X_1} σ²_{X_2} − σ²_{X_1,X_2}) = β_1

The OLS estimators β̂_0 and β̂_2 are also consistent.
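Consistency can also be illustrated numerically. This is a hypothetical sketch (DGP and sample sizes are illustrative choices): the estimation error |β̂_1 − β_1| typically shrinks as n grows.

```python
import numpy as np

# Hypothetical DGP; watch |b1_hat - beta1| as the sample size grows.
rng = np.random.default_rng(4)
beta0, beta1, beta2 = 1.0, 2.0, 3.0
errs = []
for n in (100, 10_000, 1_000_000):
    x1 = rng.normal(0.0, 1.0, n)
    x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    errs.append(abs(b[1] - beta1))
print(errs)   # typically shrinks toward 0 as n grows
```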