Classical Least Squares Theory


1 Classical Least Squares Theory
CHUNG-MING KUAN
Department of Finance & CRETA, National Taiwan University
October 14, 2012

2 Lecture Outline
1 The Method of Ordinary Least Squares (OLS)
   Simple Linear Regression
   Multiple Linear Regression
   Geometric Interpretations
   Measures of Goodness of Fit
   Example: Analysis of Suicide Rate
2 Statistical Properties of the OLS Estimator
   Classical Conditions
   Without the Normality Condition
   With the Normality Condition
3 Hypothesis Testing
   Tests for Linear Hypotheses
   Power of the Tests
   Alternative Interpretation of the F Test
   Confidence Regions
   Example: Analysis of Suicide Rate

3 Lecture Outline (cont'd)
4 Multicollinearity
   Near Multicollinearity
   Regression with Dummy Variables
   Example: Analysis of Suicide Rate
5 Limitation of the Classical Conditions
6 The Method of Generalized Least Squares (GLS)
   The GLS Estimator
   Stochastic Properties of the GLS Estimator
   The Feasible GLS Estimator
   Heteroskedasticity
   Serial Correlation
   Application: Linear Probability Model
   Application: Seemingly Unrelated Regressions

4 Simple Linear Regression
Given the variable of interest y, we are interested in finding a function of another variable x that can characterize the systematic behavior of y.
- y: dependent variable or regressand.
- x: explanatory variable or regressor.
- Specify a linear function of x: $\alpha + \beta x$, with unknown parameters $\alpha$ and $\beta$.
- The non-systematic part is the error: $y - (\alpha + \beta x)$.
Together we write:
$y = \underbrace{\alpha + \beta x}_{\text{linear model}} + \underbrace{e(\alpha, \beta)}_{\text{error}}.$

5 For the specification $\alpha + \beta x$, the objective is to find the best fit of the data $(y_t, x_t)$, $t = 1, \dots, T$.
1. Minimizing a least-squares (LS) criterion function wrt $\alpha$ and $\beta$:
$Q_T(\alpha, \beta) := \frac{1}{T}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t)^2.$
2. Minimizing a least-absolute-deviation (LAD) criterion wrt $\alpha$ and $\beta$:
$\frac{1}{T}\sum_{t=1}^{T}|y_t - \alpha - \beta x_t|.$
3. Minimizing asymmetrically weighted absolute deviations:
$\frac{1}{T}\Big[\theta \sum_{t:\, y_t > \alpha + \beta x_t}|y_t - \alpha - \beta x_t| + (1-\theta)\sum_{t:\, y_t < \alpha + \beta x_t}|y_t - \alpha - \beta x_t|\Big], \quad 0 < \theta < 1.$

8 The OLS Estimators
The first order conditions (FOCs) of LS minimization are:
$\frac{\partial Q_T(\alpha, \beta)}{\partial \alpha} = -\frac{2}{T}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t) = 0,$
$\frac{\partial Q_T(\alpha, \beta)}{\partial \beta} = -\frac{2}{T}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t)x_t = 0.$
The solutions are known as the ordinary least squares (OLS) estimators:
$\hat\beta_T = \frac{\sum_{t=1}^{T}(y_t - \bar y)(x_t - \bar x)}{\sum_{t=1}^{T}(x_t - \bar x)^2}, \qquad \hat\alpha_T = \bar y - \hat\beta_T \bar x.$
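As a quick numerical illustration (not part of the original slides), a minimal NumPy sketch with simulated data that computes the closed-form OLS solutions above; all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)                    # regressor (must not be constant)
y = 1.0 + 2.0 * x + rng.normal(size=T)    # simulated data with alpha=1, beta=2

# OLS estimators from the closed-form solutions of the FOCs
beta_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

residuals = y - alpha_hat - beta_hat * x
print(alpha_hat, beta_hat, residuals.sum())  # residuals sum to ~0 by the first FOC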

9 The estimated regression line is $\hat y = \hat\alpha_T + \hat\beta_T x$, the linear model evaluated at $\hat\alpha_T$ and $\hat\beta_T$; $\hat e = y - \hat y$ is the error evaluated at $\hat\alpha_T$ and $\hat\beta_T$, also known as the residual.
- The t-th fitted value of the regression line is $\hat y_t = \hat\alpha_T + \hat\beta_T x_t$; the t-th residual is $\hat e_t = y_t - \hat y_t = e_t(\hat\alpha_T, \hat\beta_T)$.
- $\hat\beta_T$ characterizes the predicted change of y given a change of one unit of x, whereas $\hat\alpha_T$ is the predicted y without x.
- No linear model of the form $a + bx$ can provide a better fit of the data in terms of the sum of squared errors.
- The OLS method here makes no assumption on the data, except that $x_t$ cannot be a constant.

10 Algebraic Properties
Substituting $\hat\alpha_T$ and $\hat\beta_T$ into the FOCs
$\frac{1}{T}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t) = 0, \qquad \frac{1}{T}\sum_{t=1}^{T}(y_t - \alpha - \beta x_t)x_t = 0,$
we have the following algebraic results:
- $\sum_{t=1}^{T}\hat e_t = 0$.
- $\sum_{t=1}^{T}\hat e_t x_t = 0$.
- $\sum_{t=1}^{T} y_t = \sum_{t=1}^{T}\hat y_t$, so that $\bar y = \bar{\hat y}$.
- $\bar y = \hat\alpha_T + \hat\beta_T \bar x$; that is, the estimated regression line must pass through the point $(\bar x, \bar y)$.

11 Example: Analysis of Suicide Rate
Suppose we want to know how the suicide rate (s) in Taiwan can be explained by the unemployment rate (u), the GDP growth rate (g), or time (t). The suicide rate is per 100,000 persons.
Data: $\bar s = 11.7$ with s.d. 3.93; $\bar g = 5.94$ with s.d. 3.15; $\bar u = 2.97$.
Estimation results (R²'s of simple regressions):
$\hat s_t$ on $g_t$: R² = 0.12; on $g_{t-1}$: R² = 0.25;
$\hat s_t$ on $u_t$: R² = 0.67; on $u_{t-1}$: R² = 0.66;
$\hat s_t$ on t: R² = 0.39.

12 Figure: (a) suicide and GDP growth rates; (b) suicide and unemployment rates.

13 Multiple Linear Regression
With k regressors $x_1, \dots, x_k$ ($x_1$ is usually the constant one):
$y = \beta_1 x_1 + \dots + \beta_k x_k + e(\beta_1, \dots, \beta_k).$
With data $(y_t, x_{t1}, \dots, x_{tk})$, $t = 1, \dots, T$, we can write
$y = X\beta + e(\beta), \qquad (1)$
where $\beta = (\beta_1\ \beta_2\ \cdots\ \beta_k)'$, y is the $T \times 1$ vector with t-th element $y_t$, X is the $T \times k$ matrix with (t, i)-th element $x_{ti}$, and $e(\beta)$ is the $T \times 1$ vector with t-th element $e_t(\beta)$.

14 Least-squares criterion function:
$Q_T(\beta) := \frac{1}{T}e(\beta)'e(\beta) = \frac{1}{T}(y - X\beta)'(y - X\beta). \qquad (2)$
The FOCs of minimizing $Q_T(\beta)$ are $-2X'(y - X\beta)/T = 0$, leading to the normal equations: $X'X\beta = X'y$.
Identification Requirement [ID-1]: X is of full column rank k.
- No column of X is a linear combination of the other columns. Intuition: X does not contain redundant information.
- When X is not of full column rank, we say there exists exact multicollinearity among the regressors.

15 Given [ID-1], $X'X$ is positive definite and hence invertible. The unique solution to the normal equations is known as the OLS estimator of $\beta$:
$\hat\beta_T = (X'X)^{-1}X'y. \qquad (3)$
Under [ID-1], we have the second order condition: $\nabla^2_\beta Q_T(\beta) = 2(X'X)/T$ is p.d.
The result below holds whenever the identification requirement is satisfied; it does not depend on the statistical properties of y and X.
Theorem 3.1. Given specification (1), suppose [ID-1] holds. Then the OLS estimator $\hat\beta_T = (X'X)^{-1}X'y$ uniquely minimizes the criterion function (2).
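A minimal sketch (assumptions: simulated data, illustrative names) that solves the normal equations directly; in practice a QR-based routine such as np.linalg.lstsq gives the same answer more stably:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # constant + 2 regressors
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(size=T)

# Solve the normal equations X'X b = X'y; np.linalg.solve avoids forming
# (X'X)^{-1} explicitly, which is better numerically.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent least-squares solution via a QR-based routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)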

16 The magnitude of $\hat\beta_T$ is affected by the measurement units of the dependent and explanatory variables; see homework. Thus, a larger coefficient does not imply that the associated regressor is more important.
Given $\hat\beta_T$, the vector of OLS fitted values is $\hat y = X\hat\beta_T$, and the vector of OLS residuals is $\hat e = y - \hat y = e(\hat\beta_T)$. Plugging $\hat\beta_T$ into the FOCs $X'(y - X\beta) = 0$, we have:
- $X'\hat e = 0$.
- When X contains a vector of ones, $\sum_{t=1}^{T}\hat e_t = 0$.
- $\hat y'\hat e = \hat\beta_T' X'\hat e = 0$.
These are all algebraic results under the OLS method.

17 Geometric Interpretations
Recall that $P = X(X'X)^{-1}X'$ is the orthogonal projection matrix that projects vectors onto span(X), and $I_T - P$ is the orthogonal projection matrix that projects vectors onto span(X)⊥, the orthogonal complement of span(X). Thus, $PX = X$ and $(I_T - P)X = 0$.
- The vector of fitted values, $\hat y = X\hat\beta_T = X(X'X)^{-1}X'y = Py$, is the orthogonal projection of y onto span(X).
- The residual vector, $\hat e = y - \hat y = (I_T - P)y$, is the orthogonal projection of y onto span(X)⊥.
- $\hat e$ is orthogonal to X, i.e., $X'\hat e = 0$; it is also orthogonal to $\hat y$ because $\hat y$ lies in span(X), i.e., $\hat y'\hat e = 0$.

18 Figure: The orthogonal projection of y onto span(x₁, x₂).

19 Theorem 3.3 (Frisch-Waugh-Lovell)
Given $y = X_1\beta_1 + X_2\beta_2 + e$, the OLS estimators of $\beta_1$ and $\beta_2$ are
$\hat\beta_{1,T} = [X_1'(I - P_2)X_1]^{-1}X_1'(I - P_2)y,$
$\hat\beta_{2,T} = [X_2'(I - P_1)X_2]^{-1}X_2'(I - P_1)y,$
where $P_1 = X_1(X_1'X_1)^{-1}X_1'$ and $P_2 = X_2(X_2'X_2)^{-1}X_2'$.
This result shows that $\hat\beta_{1,T}$ can be computed by regressing $(I - P_2)y$ on $(I - P_2)X_1$, where $(I - P_2)y$ and $(I - P_2)X_1$ are the residual vectors from regressing y on $X_2$ and $X_1$ on $X_2$, respectively. Similarly, regressing $(I - P_1)y$ on $(I - P_1)X_2$ yields $\hat\beta_{2,T}$.
The OLS estimator from regressing y on $X_1$ alone is not the same as $\hat\beta_{1,T}$, unless $X_1$ and $X_2$ are orthogonal to each other.
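The theorem is easy to verify numerically; a minimal sketch with simulated data (names illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 150
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, 2))
y = X1 @ [1.0, 2.0] + X2 @ [0.5, -1.0] + rng.normal(size=T)

def proj(X):
    """Orthogonal projection matrix onto span(X)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

X = np.hstack([X1, X2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# FWL: regress the residualized y on the residualized X1
M2 = np.eye(T) - proj(X2)
beta1_fwl, *_ = np.linalg.lstsq(M2 @ X1, M2 @ y, rcond=None)
assert np.allclose(beta_full[:2], beta1_fwl)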

20 Proof: Writing $y = X_1\hat\beta_{1,T} + X_2\hat\beta_{2,T} + (I - P)y$, where $P = X(X'X)^{-1}X'$ with $X = [X_1\ X_2]$, we have
$X_1'(I - P_2)y = X_1'(I - P_2)X_1\hat\beta_{1,T} + X_1'(I - P_2)X_2\hat\beta_{2,T} + X_1'(I - P_2)(I - P)y$
$= X_1'(I - P_2)X_1\hat\beta_{1,T} + X_1'(I - P_2)(I - P)y.$
We know span(X₂) ⊆ span(X), so that span(X)⊥ ⊆ span(X₂)⊥. Hence, $(I - P_2)(I - P) = I - P$, and
$X_1'(I - P_2)y = X_1'(I - P_2)X_1\hat\beta_{1,T} + X_1'(I - P)y = X_1'(I - P_2)X_1\hat\beta_{1,T},$
from which we obtain the expression for $\hat\beta_{1,T}$.

21 Frisch-Waugh-Lovell Theorem
Observe that $(I - P_1)y = (I - P_1)X_2\hat\beta_{2,T} + (I - P_1)(I - P)y$.
- $(I - P_1)(I - P) = I - P$, so that the residual vector from regressing $(I - P_1)y$ on $(I - P_1)X_2$ is identical to the residual vector from regressing y on $X = [X_1\ X_2]$:
$(I - P_1)y = (I - P_1)X_2\hat\beta_{2,T} + (I - P)y.$
- $P_1 = P_1P$, so that the orthogonal projection of y directly onto span(X₁) (i.e., $P_1y$) is equivalent to the iterated projections of y onto span(X) and then onto span(X₁) (i.e., $P_1Py$). Hence,
$(I - P_1)X_2\hat\beta_{2,T} = (I - P_1)Py = (P - P_1)y.$

22 Figure: An illustration of the Frisch-Waugh-Lovell Theorem.

23 Measures of Goodness of Fit
Given $\hat y'\hat e = 0$, we have $y'y = \hat y'\hat y + \hat e'\hat e$, where $y'y$ is known as the TSS (total sum of squares), $\hat y'\hat y$ is the RSS (regression sum of squares), and $\hat e'\hat e$ is the ESS (error sum of squares).
The non-centered coefficient of determination (or non-centered R²),
$R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{ESS}}{\mathrm{TSS}}, \qquad (4)$
measures the proportion of the total variation of $y_t$ that can be explained by the model.
- It is invariant wrt measurement units of the dependent variable but not invariant wrt constant addition.
- It is a relative measure such that $0 \le R^2 \le 1$.
- It is non-decreasing in the number of regressors. (Why?)

24 Centered R²
When the specification contains a constant term,
$\underbrace{\sum_{t=1}^{T}(y_t - \bar y)^2}_{\text{centered TSS}} = \underbrace{\sum_{t=1}^{T}(\hat y_t - \bar{\hat y})^2}_{\text{centered RSS}} + \underbrace{\sum_{t=1}^{T}\hat e_t^2}_{\text{ESS}}.$
The centered coefficient of determination (or centered R²),
$R^2 = \frac{\sum_{t=1}^{T}(\hat y_t - \bar y)^2}{\sum_{t=1}^{T}(y_t - \bar y)^2} = \frac{\text{centered RSS}}{\text{centered TSS}} = 1 - \frac{\text{ESS}}{\text{centered TSS}},$
measures the proportion of the total variation of $y_t$ that can be explained by the model, excluding the effect of the constant term.
- It is invariant wrt constant addition.
- $0 \le R^2 \le 1$, and it is non-decreasing in the number of regressors.
- It may be negative when the model does not contain a constant term.

25 Centered R²: Alternative Interpretation
When the specification contains a constant term,
$\sum_{t=1}^{T}(y_t - \bar y)(\hat y_t - \bar y) = \sum_{t=1}^{T}(\hat y_t - \bar y + \hat e_t)(\hat y_t - \bar y) = \sum_{t=1}^{T}(\hat y_t - \bar y)^2,$
because $\sum_{t=1}^{T}\hat y_t\hat e_t = \sum_{t=1}^{T}\hat e_t = 0$. The centered R² can thus also be expressed as
$R^2 = \frac{\sum_{t=1}^{T}(\hat y_t - \bar y)^2}{\sum_{t=1}^{T}(y_t - \bar y)^2} = \frac{\big[\sum_{t=1}^{T}(y_t - \bar y)(\hat y_t - \bar y)\big]^2}{\big[\sum_{t=1}^{T}(y_t - \bar y)^2\big]\big[\sum_{t=1}^{T}(\hat y_t - \bar y)^2\big]},$
which is the squared sample correlation coefficient of $y_t$ and $\hat y_t$, also known as the squared multiple correlation coefficient.
Models for different dependent variables are not comparable in terms of R².

26 Adjusted R²
The adjusted R² is the centered R² adjusted for the degrees of freedom:
$\bar R^2 = 1 - \frac{\hat e'\hat e/(T-k)}{(y'y - T\bar y^2)/(T-1)}.$
$\bar R^2$ adds a penalty term to R²:
$\bar R^2 = 1 - \frac{T-1}{T-k}(1 - R^2) = R^2 - \frac{k-1}{T-k}(1 - R^2),$
where the penalty term reflects the trade-off between model complexity and model explanatory ability.
$\bar R^2$ may be negative and need not be non-decreasing in k.
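A small sketch computing the centered and adjusted R² from the formulas above (helper name illustrative; X is assumed to include a constant column):

```python
import numpy as np

def r_squared(y, X):
    """Centered and adjusted R^2 for an OLS fit of y on X (X includes a constant)."""
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_hat = y - X @ beta
    ess = e_hat @ e_hat                        # error sum of squares
    tss_c = np.sum((y - y.mean()) ** 2)        # centered TSS
    r2 = 1.0 - ess / tss_c
    r2_adj = 1.0 - (T - 1) / (T - k) * (1.0 - r2)
    return r2, r2_adj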

27 Example: Analysis of Suicide Rate
Q: How can the suicide rate (s) be explained by the unemployment rate (u), the GDP growth rate (g), and time (t)?
Estimation results with $g_t$ and $u_t$ (R²'s):
$\hat s_t$ on $g_t$: R² = 0.12; on $u_t$: R² = 0.67; on $u_t$ and $g_t$: R² = …
Estimation results with $g_{t-1}$ and $u_{t-1}$:
$\hat s_t$ on $g_{t-1}$: R² = 0.25; on $u_{t-1}$: R² = 0.66; on $u_{t-1}$ and $g_{t-1}$: R² = …

28 Estimation results with t but without g:
$\hat s_t$ on $u_t$: R² = 0.67; on $u_t$ and t: R² = 0.66; on $u_{t-1}$: R² = 0.66; on $u_{t-1}$ and t: R² = …
Estimation results with t and g:
$\hat s_t$ on $u_t$ and $g_t$: R² = 0.66; on $u_t$, $g_t$, and t: R² = 0.65; on $u_{t-1}$ and $g_{t-1}$: R² = 0.65; on $u_{t-1}$, $g_{t-1}$, and t: R² = …

29 Estimation results with t and t²:
$\hat s_t$ on t: R² = 0.39; on t and t²: R² = 0.81;
on $u_t$, t, t²: R² = 0.84; on $g_t$, t, t²: R² = 0.81; on $u_t$, $g_t$, t, t²: R² = 0.84;
on $u_{t-1}$, t, t²: R² = 0.85; on $g_{t-1}$, t, t²: R² = 0.80; on $u_{t-1}$, $g_{t-1}$, t, t²: R² = …
As far as R² is concerned, a specification with t, t², and u seems to provide a good fit of the data and a reasonable interpretation.
Q: Is there any other way to determine whether a specification is good?

30 Classical Conditions
To derive the statistical properties of the OLS estimator, we assume:
[A1] X is non-stochastic.
[A2] y is a random vector such that (i) $\mathrm{IE}(y) = X\beta_o$ for some $\beta_o$; (ii) $\mathrm{var}(y) = \sigma_o^2 I_T$ for some $\sigma_o^2 > 0$.
[A3] y is a random vector such that $y \sim \mathcal{N}(X\beta_o, \sigma_o^2 I_T)$ for some $\beta_o$ and $\sigma_o^2 > 0$.
The specification (1) with [A1] and [A2] is known as the classical linear model, whereas (1) with [A1] and [A3] is the classical normal linear model.
When $\mathrm{var}(y) = \sigma_o^2 I_T$, the elements of y are homoskedastic and (serially) uncorrelated.

31 Without Normality
The OLS estimator of the parameter $\sigma_o^2$ is an average of squared residuals:
$\hat\sigma_T^2 = \frac{1}{T-k}\sum_{t=1}^{T}\hat e_t^2.$
Theorem 3.4. Consider the linear specification (1).
(a) Given [A1] and [A2](i), $\hat\beta_T$ is unbiased for $\beta_o$.
(b) Given [A1] and [A2], $\hat\sigma_T^2$ is unbiased for $\sigma_o^2$.
(c) Given [A1] and [A2], $\mathrm{var}(\hat\beta_T) = \sigma_o^2(X'X)^{-1}$.

32 Proof: By [A1], $\mathrm{IE}(\hat\beta_T) = \mathrm{IE}[(X'X)^{-1}X'y] = (X'X)^{-1}X'\,\mathrm{IE}(y)$. [A2](i) gives $\mathrm{IE}(y) = X\beta_o$, so that $\mathrm{IE}(\hat\beta_T) = (X'X)^{-1}X'X\beta_o = \beta_o$, proving unbiasedness.
Given $\hat e = (I_T - P)y = (I_T - P)(y - X\beta_o)$,
$\mathrm{IE}(\hat e'\hat e) = \mathrm{IE}\big[\mathrm{trace}\big((y - X\beta_o)'(I_T - P)(y - X\beta_o)\big)\big]$
$= \mathrm{IE}\big[\mathrm{trace}\big((y - X\beta_o)(y - X\beta_o)'(I_T - P)\big)\big]$
$= \mathrm{trace}\big(\mathrm{IE}[(y - X\beta_o)(y - X\beta_o)'](I_T - P)\big)$
$= \mathrm{trace}\big(\sigma_o^2 I_T(I_T - P)\big) = \sigma_o^2\,\mathrm{trace}(I_T - P),$
where the fourth equality follows from [A2](ii) that $\mathrm{var}(y) = \sigma_o^2 I_T$.

33 Proof (cont'd): As $\mathrm{trace}(I_T - P) = \mathrm{rank}(I_T - P) = T - k$, we have $\mathrm{IE}(\hat e'\hat e) = \sigma_o^2(T-k)$ and $\mathrm{IE}(\hat\sigma_T^2) = \mathrm{IE}(\hat e'\hat e)/(T-k) = \sigma_o^2$, proving (b).
By [A1] and [A2](ii),
$\mathrm{var}(\hat\beta_T) = \mathrm{var}\big((X'X)^{-1}X'y\big) = (X'X)^{-1}X'[\mathrm{var}(y)]X(X'X)^{-1} = \sigma_o^2(X'X)^{-1}X'I_TX(X'X)^{-1} = \sigma_o^2(X'X)^{-1}.$
This establishes the assertion of (c).

34 Theorem 3.4 establishes unbiasedness of the OLS estimators $\hat\beta_T$ and $\hat\sigma_T^2$ but does not address the issue of efficiency.
By Theorem 3.4(c), the elements of $\hat\beta_T$ can be estimated more precisely (i.e., with a smaller variance) when X has larger variation. To see this, consider the simple linear regression $y = \alpha + \beta x + e$; it can be verified that
$\mathrm{var}(\hat\beta_T) = \sigma_o^2 \Big/ \sum_{t=1}^{T}(x_t - \bar x)^2.$
Thus, the larger the (squared) variation of $x_t$ (i.e., $\sum_{t=1}^{T}(x_t - \bar x)^2$), the smaller the variance of $\hat\beta_T$.

35 The result below establishes efficiency of $\hat\beta_T$ among all unbiased estimators of $\beta_o$ that are linear in y.
Theorem 3.5 (Gauss-Markov). Given the linear specification (1), suppose that [A1] and [A2] hold. Then the OLS estimator $\hat\beta_T$ is the best linear unbiased estimator (BLUE) for $\beta_o$.
Proof: Consider an arbitrary linear estimator $\check\beta_T = Ay$, where A is a non-stochastic matrix, say $A = (X'X)^{-1}X' + C$. Then $\check\beta_T = \hat\beta_T + Cy$, such that
$\mathrm{var}(\check\beta_T) = \mathrm{var}(\hat\beta_T) + \mathrm{var}(Cy) + 2\,\mathrm{cov}(\hat\beta_T, Cy).$
By [A1] and [A2](i), $\mathrm{IE}(\check\beta_T) = \beta_o + CX\beta_o$, which is unbiased iff $CX = 0$.

36 Proof (cont'd): The condition $CX = 0$ implies $\mathrm{cov}(\hat\beta_T, Cy) = 0$. Thus,
$\mathrm{var}(\check\beta_T) = \mathrm{var}(\hat\beta_T) + \mathrm{var}(Cy) = \mathrm{var}(\hat\beta_T) + \sigma_o^2 CC'.$
This shows that $\mathrm{var}(\check\beta_T) - \mathrm{var}(\hat\beta_T)$ equals the p.s.d. matrix $\sigma_o^2 CC'$, so that $\hat\beta_T$ is more efficient than any other linear unbiased estimator $\check\beta_T$.

37 Example: $\mathrm{IE}(y) = X_1 b_1$ and $\mathrm{var}(y) = \sigma_o^2 I_T$. Two specifications:
- $y = X_1\beta_1 + e$, with the OLS estimator $\hat b_{1,T}$;
- $y = X\beta + e = X_1\beta_1 + X_2\beta_2 + e$, with the OLS estimator $\hat\beta_T = (\hat\beta_{1,T}'\ \hat\beta_{2,T}')'$.
Clearly, $\hat b_{1,T}$ is the BLUE of $b_1$ with $\mathrm{var}(\hat b_{1,T}) = \sigma_o^2(X_1'X_1)^{-1}$.
By the Frisch-Waugh-Lovell Theorem,
$\mathrm{IE}(\hat\beta_{1,T}) = \mathrm{IE}\big([X_1'(I_T - P_2)X_1]^{-1}X_1'(I_T - P_2)y\big) = b_1,$
$\mathrm{IE}(\hat\beta_{2,T}) = \mathrm{IE}\big([X_2'(I_T - P_1)X_2]^{-1}X_2'(I_T - P_1)y\big) = 0.$
That is, $\hat\beta_T$ is unbiased for $(b_1'\ 0')'$.

38 Example (cont'd):
$\mathrm{var}(\hat\beta_{1,T}) = \mathrm{var}\big([X_1'(I_T - P_2)X_1]^{-1}X_1'(I_T - P_2)y\big) = \sigma_o^2[X_1'(I_T - P_2)X_1]^{-1}.$
As $X_1'X_1 - X_1'(I_T - P_2)X_1 = X_1'P_2X_1$ is p.s.d., it follows that $[X_1'(I_T - P_2)X_1]^{-1} - (X_1'X_1)^{-1}$ is p.s.d. This shows that $\hat b_{1,T}$ is more efficient than $\hat\beta_{1,T}$, as it ought to be.

39 With Normality
Under [A3], $y \sim \mathcal{N}(X\beta_o, \sigma_o^2 I_T)$, and the log-likelihood function of y is
$\log L(\beta, \sigma^2) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$
The score vector is
$s(\beta, \sigma^2) = \begin{pmatrix} \frac{1}{\sigma^2}X'(y - X\beta) \\[2pt] -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta) \end{pmatrix}.$
Solutions to $s(\beta, \sigma^2) = 0$ are the (quasi) maximum likelihood estimators (MLEs). Clearly, the MLE of $\beta$ is the OLS estimator, and the MLE of $\sigma^2$ is
$\tilde\sigma_T^2 = \frac{(y - X\hat\beta_T)'(y - X\hat\beta_T)}{T} = \frac{\hat e'\hat e}{T} \ne \hat\sigma_T^2.$

40 With the normality condition on y, a lot more can be said about the OLS estimators.
Theorem 3.7. Given the linear specification (1), suppose that [A1] and [A3] hold.
(a) $\hat\beta_T \sim \mathcal{N}\big(\beta_o, \sigma_o^2(X'X)^{-1}\big)$.
(b) $(T-k)\hat\sigma_T^2/\sigma_o^2 \sim \chi^2(T-k)$.
(c) $\hat\sigma_T^2$ has mean $\sigma_o^2$ and variance $2\sigma_o^4/(T-k)$.
Proof: For (a), we note that $\hat\beta_T$ is a linear transformation of $y \sim \mathcal{N}(X\beta_o, \sigma_o^2 I_T)$ and hence also a normal random vector. As for (b), writing $\hat e = (I_T - P)(y - X\beta_o)$, we have
$(T-k)\hat\sigma_T^2/\sigma_o^2 = \hat e'\hat e/\sigma_o^2 = y^{*\prime}(I_T - P)y^*,$
where $y^* = (y - X\beta_o)/\sigma_o \sim \mathcal{N}(0, I_T)$ by [A3].

41 Proof (cont'd): Let C orthogonally diagonalize $I_T - P$ such that $C'(I_T - P)C = \Lambda$. Since $\mathrm{rank}(I_T - P) = T - k$, $\Lambda$ contains $T - k$ eigenvalues equal to one and k eigenvalues equal to zero. Then,
$y^{*\prime}(I_T - P)y^* = y^{*\prime}C[C'(I_T - P)C]C'y^* = \eta'\begin{pmatrix} I_{T-k} & 0 \\ 0 & 0 \end{pmatrix}\eta,$
where $\eta = C'y^*$. As $\eta \sim \mathcal{N}(0, I_T)$, the $\eta_i$ are independent, standard normal random variables. It follows that
$y^{*\prime}(I_T - P)y^* = \sum_{i=1}^{T-k}\eta_i^2 \sim \chi^2(T-k),$
proving (b). (c) is a direct consequence of (b) and the facts that $\chi^2(T-k)$ has mean $T - k$ and variance $2(T-k)$.

42 Theorem 3.8. Given the linear specification (1), suppose that [A1] and [A3] hold. Then the OLS estimators $\hat\beta_T$ and $\hat\sigma_T^2$ are the best unbiased estimators (BUE) for $\beta_o$ and $\sigma_o^2$, respectively.
Proof: The Hessian matrix of the log-likelihood function is
$H(\beta, \sigma^2) = \begin{pmatrix} -\frac{1}{\sigma^2}X'X & -\frac{1}{\sigma^4}X'(y - X\beta) \\[2pt] -\frac{1}{\sigma^4}(y - X\beta)'X & \frac{T}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta) \end{pmatrix}.$
Under [A3], $\mathrm{IE}[s(\beta_o, \sigma_o^2)] = 0$ and
$\mathrm{IE}[H(\beta_o, \sigma_o^2)] = \begin{pmatrix} -\frac{1}{\sigma_o^2}X'X & 0 \\ 0 & -\frac{T}{2\sigma_o^4} \end{pmatrix}.$

43 Proof (cont'd): By the information matrix equality, $-\mathrm{IE}[H(\beta_o, \sigma_o^2)]$ is the information matrix. Then its inverse,
$-\mathrm{IE}[H(\beta_o, \sigma_o^2)]^{-1} = \begin{pmatrix} \sigma_o^2(X'X)^{-1} & 0 \\ 0 & \frac{2\sigma_o^4}{T} \end{pmatrix},$
is the Cramér-Rao lower bound. $\mathrm{var}(\hat\beta_T)$ achieves this lower bound (the upper-left block), so that $\hat\beta_T$ is the best unbiased estimator for $\beta_o$. This conclusion is much stronger than the Gauss-Markov Theorem.
Although $\mathrm{var}(\hat\sigma_T^2) = 2\sigma_o^4/(T-k)$ is greater than the lower bound (the lower-right element), it can be shown that $\hat\sigma_T^2$ is still the best unbiased estimator for $\sigma_o^2$; see Rao (1973, p. 319) for a proof.

44 Tests for Linear Hypotheses
Linear hypothesis: $R\beta_o = r$, where R is $q \times k$ with full row rank q (q < k), and r is a vector of hypothetical values.
A natural way to construct a test statistic is to compare $R\hat\beta_T$ and r; we reject the null if their difference is too large.
Given [A1] and [A3], Theorem 3.7(a) states $\hat\beta_T \sim \mathcal{N}\big(\beta_o, \sigma_o^2(X'X)^{-1}\big)$, so that
$R\hat\beta_T \sim \mathcal{N}\big(R\beta_o, \sigma_o^2[R(X'X)^{-1}R']\big).$
The comparison between $R\hat\beta_T$ and r is based on this distribution.

45 Suppose first that q = 1. Then $R\hat\beta_T$ and $R(X'X)^{-1}R'$ are scalars. Under the null hypothesis,
$\frac{R\hat\beta_T - r}{\sigma_o[R(X'X)^{-1}R']^{1/2}} = \frac{R(\hat\beta_T - \beta_o)}{\sigma_o[R(X'X)^{-1}R']^{1/2}} \sim \mathcal{N}(0, 1).$
An operational statistic is obtained by replacing $\sigma_o$ with $\hat\sigma_T$:
$\tau = \frac{R\hat\beta_T - r}{\hat\sigma_T[R(X'X)^{-1}R']^{1/2}}.$
Theorem 3.9. Given the linear specification (1), suppose that [A1] and [A3] hold. When R is $1 \times k$, $\tau \sim t(T-k)$ under the null hypothesis.
Note: The normality condition [A3] is crucial for this t-distribution result.

46 Proof: We write the statistic $\tau$ as
$\tau = \frac{R\hat\beta_T - r}{\sigma_o[R(X'X)^{-1}R']^{1/2}} \Big/ \sqrt{\frac{(T-k)\hat\sigma_T^2/\sigma_o^2}{T-k}},$
where the numerator is $\mathcal{N}(0, 1)$ and $(T-k)\hat\sigma_T^2/\sigma_o^2$ is $\chi^2(T-k)$ by Theorem 3.7(b). The assertion follows when the numerator and denominator are independent. This is indeed the case, because $\hat\beta_T$ and $\hat e$ are jointly normally distributed with
$\mathrm{cov}(\hat e, \hat\beta_T) = \mathrm{IE}[(I_T - P)(y - X\beta_o)y'X(X'X)^{-1}] = (I_T - P)\,\mathrm{IE}[(y - X\beta_o)y']X(X'X)^{-1} = \sigma_o^2(I_T - P)X(X'X)^{-1} = 0.$

47 Examples
To test $\beta_i = c$, let $R = [0\ \cdots\ 1\ \cdots\ 0]$ (with 1 in the i-th position) and let $m^{ij}$ be the (i, j)-th element of $M^{-1} = (X'X)^{-1}$. Then,
$\tau = \frac{\hat\beta_{i,T} - c}{\hat\sigma_T\sqrt{m^{ii}}} \sim t(T-k),$
where $m^{ii} = R(X'X)^{-1}R'$. $\tau$ is a t statistic; for testing $\beta_i = 0$, $\tau$ is also referred to as the t ratio.
It is straightforward to verify that to test $a\beta_i + b\beta_j = c$, with a, b, c given constants, the corresponding test reads:
$\tau = \frac{a\hat\beta_{i,T} + b\hat\beta_{j,T} - c}{\hat\sigma_T[a^2 m^{ii} + b^2 m^{jj} + 2ab\,m^{ij}]^{1/2}} \sim t(T-k).$
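For concreteness, a minimal sketch of the t ratios described above (helper name illustrative; SciPy supplies the t distribution):

```python
import numpy as np
from scipy import stats

def t_ratios(y, X):
    """OLS t ratios for H0: beta_i = 0 under the classical normal linear model."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (T - k)                         # unbiased variance estimate
    se = np.sqrt(s2 * np.diag(XtX_inv))          # sigma_hat * sqrt(m^ii)
    tau = beta / se
    pval = 2 * stats.t.sf(np.abs(tau), df=T - k) # two-sided p-values from t(T-k)
    return tau, pval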

48 When R is a $q \times k$ matrix with full row rank, note that
$(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)/\sigma_o^2 \sim \chi^2(q).$
An operational statistic is
$\varphi = \frac{(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)/(\sigma_o^2 q)}{(T-k)\hat\sigma_T^2/[\sigma_o^2(T-k)]} = \frac{(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)}{\hat\sigma_T^2\,q}.$
It is clear that $\varphi = \tau^2$ when q = 1.
Theorem 3.10. Given the linear specification (1), suppose that [A1] and [A3] hold. When R is $q \times k$ with full row rank, $\varphi \sim F(q, T-k)$ under the null hypothesis.
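A matching sketch of the F statistic $\varphi$ for $H_0$: $R\beta = r$ (names illustrative, not from the lecture):

```python
import numpy as np
from scipy import stats

def f_test(y, X, R, r):
    """F statistic and p-value for H0: R beta = r (R is q x k, full row rank)."""
    T, k = X.shape
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (T - k)
    d = R @ beta - r
    phi = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (s2 * q)
    return phi, stats.f.sf(phi, q, T - k)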

49 Example: $H_0$: $\beta_1 = b_1$ and $\beta_2 = b_2$. The F statistic,
$\varphi = \frac{1}{2\hat\sigma_T^2}\begin{pmatrix}\hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2\end{pmatrix}'\begin{pmatrix} m^{11} & m^{12} \\ m^{21} & m^{22} \end{pmatrix}^{-1}\begin{pmatrix}\hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2\end{pmatrix},$
is distributed as $F(2, T-k)$.
Example: $H_0$: $\beta_2 = 0$, $\beta_3 = 0$, ..., $\beta_k = 0$:
$\varphi = \frac{1}{(k-1)\hat\sigma_T^2}\begin{pmatrix}\hat\beta_{2,T} \\ \vdots \\ \hat\beta_{k,T}\end{pmatrix}'\begin{pmatrix} m^{22} & \cdots & m^{2k} \\ \vdots & & \vdots \\ m^{k2} & \cdots & m^{kk} \end{pmatrix}^{-1}\begin{pmatrix}\hat\beta_{2,T} \\ \vdots \\ \hat\beta_{k,T}\end{pmatrix},$
is distributed as $F(k-1, T-k)$ and is known as the regression F test.

50 Test Power
To examine the power of the F test, we evaluate the distribution of $\varphi$ under the alternative hypothesis: $R\beta_o = r + \delta$, with R a $q \times k$ matrix with rank q < k and $\delta \ne 0$.
Theorem 3.11. Given the linear specification (1), suppose that [A1] and [A3] hold. When $R\beta_o = r + \delta$,
$\varphi \sim F(q, T-k;\ \delta'D^{-1}\delta,\ 0),$
where $D = \sigma_o^2[R(X'X)^{-1}R']$, and $\delta'D^{-1}\delta$ is the non-centrality parameter of the numerator of $\varphi$.

51 Proof: When $R\beta_o = r + \delta$,
$[R(X'X)^{-1}R']^{-1/2}(R\hat\beta_T - r)/\sigma_o = D^{-1/2}[R(\hat\beta_T - \beta_o) + \delta],$
which is distributed as $\mathcal{N}(0, I_q) + D^{-1/2}\delta$. Then,
$(R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)/\sigma_o^2 \sim \chi^2(q;\ \delta'D^{-1}\delta),$
a non-central $\chi^2$ distribution with non-centrality parameter $\delta'D^{-1}\delta$. It is also readily seen that $(T-k)\hat\sigma_T^2/\sigma_o^2$ is still distributed as $\chi^2(T-k)$. Similar to the argument before, these two terms are independent, so that $\varphi$ has a non-central F distribution.

52 Test power is determined by the non-centrality parameter $\delta'D^{-1}\delta$, where $\delta$ signifies the deviation from the null. When $R\beta_o$ deviates farther from the hypothetical value r (i.e., $\delta$ is "large"), the non-centrality parameter $\delta'D^{-1}\delta$ increases, and so does the power.
Example: The null distribution is F(2, 20), and its critical value at the 5% level is 3.49. Then for $F(2, 20;\ \nu_1,\ 0)$ with non-centrality parameter $\nu_1 = 1, 3, 5$, the probabilities that $\varphi$ exceeds 3.49 are approximately 12.1%, 28.2%, and 44.3%, respectively.
Example: The null distribution is F(5, 60), and its critical value at the 5% level is 2.37. Then for $F(5, 60;\ \nu_1,\ 0)$ with $\nu_1 = 1, 3, 5$, the probabilities that $\varphi$ exceeds 2.37 are approximately 9.4%, 20.5%, and 33.2%, respectively.

53 Alternative Interpretation
Constrained OLS: Finding the saddle point of the Lagrangian
$\min_{\beta, \lambda}\ \frac{1}{T}(y - X\beta)'(y - X\beta) + (R\beta - r)'\lambda,$
where $\lambda$ is the $q \times 1$ vector of Lagrange multipliers, we have
$\ddot\lambda_T = 2[R(X'X/T)^{-1}R']^{-1}(R\hat\beta_T - r),$
$\ddot\beta_T = \hat\beta_T - (X'X/T)^{-1}R'\ddot\lambda_T/2.$
The constrained OLS residuals are
$\ddot e = y - X\ddot\beta_T = \hat e + X(\hat\beta_T - \ddot\beta_T),$
with $\hat\beta_T - \ddot\beta_T = (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r)$.

54 The sum of squared constrained OLS residuals is:
$\ddot e'\ddot e = \hat e'\hat e + (\hat\beta_T - \ddot\beta_T)'X'X(\hat\beta_T - \ddot\beta_T) = \hat e'\hat e + (R\hat\beta_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat\beta_T - r),$
where the second term on the RHS is the numerator of the F statistic. Letting $\mathrm{ESS}_c = \ddot e'\ddot e$ and $\mathrm{ESS}_u = \hat e'\hat e$, we have
$\varphi = \frac{\ddot e'\ddot e - \hat e'\hat e}{q\hat\sigma_T^2} = \frac{(\mathrm{ESS}_c - \mathrm{ESS}_u)/q}{\mathrm{ESS}_u/(T-k)},$
suggesting that the F test in effect compares the constrained and unconstrained models based on their lack of fit. The regression F test is thus
$\varphi = \frac{(R_u^2 - R_c^2)/q}{(1 - R_u^2)/(T-k)},$
which compares the model fitness of the full model and the model with only a constant term.

56 Confidence Regions
A confidence interval for $\beta_{i,o}$ is the interval $(\underline g_\alpha, \bar g_\alpha)$ such that $\mathrm{IP}\{\underline g_\alpha \le \beta_{i,o} \le \bar g_\alpha\} = 1 - \alpha$, where $1 - \alpha$ is known as the confidence coefficient.
Letting $c_{\alpha/2}$ be the critical value of $t(T-k)$ with tail probability $\alpha/2$,
$\mathrm{IP}\Big\{\big|\hat\beta_{i,T} - \beta_{i,o}\big|\big/\big(\hat\sigma_T\sqrt{m^{ii}}\big) \le c_{\alpha/2}\Big\} = \mathrm{IP}\Big\{\hat\beta_{i,T} - c_{\alpha/2}\hat\sigma_T\sqrt{m^{ii}} \le \beta_{i,o} \le \hat\beta_{i,T} + c_{\alpha/2}\hat\sigma_T\sqrt{m^{ii}}\Big\} = 1 - \alpha.$

57 The confidence region for a vector of parameters can be constructed by resorting to the F statistic. For $(\beta_{1,o} = b_1, \beta_{2,o} = b_2)$, suppose $T - k = 30$ and $\alpha = 0.05$. Then $F_{0.05}(2, 30) = 3.32$, and
$\mathrm{IP}\left\{\frac{1}{2\hat\sigma_T^2}\begin{pmatrix}\hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2\end{pmatrix}'\begin{pmatrix} m^{11} & m^{12} \\ m^{21} & m^{22} \end{pmatrix}^{-1}\begin{pmatrix}\hat\beta_{1,T} - b_1 \\ \hat\beta_{2,T} - b_2\end{pmatrix} \le 3.32\right\} = 1 - \alpha,$
which results in an ellipse with center $(\hat\beta_{1,T}, \hat\beta_{2,T})$.
Note: It is possible that $(\beta_1, \beta_2)$ lies outside the confidence box formed by the individual confidence intervals but inside the joint confidence ellipse. That is, while a t ratio may indicate statistical significance of a coefficient, the F test may suggest the opposite based on the confidence region.

58 Example: Analysis of Suicide Rate
Part I: Estimation results with t. Regressors: const, $u_t$, $u_{t-1}$, t; the table also reports R̄² and the regression F. Only the t-ratios survive transcription: (4.39), (7.78), (4.32), (4.81), (−0.07) and (4.60), (7.62), (4.51), (4.65), (0.10).
Note: The numbers in parentheses are t-ratios; ** and * stand for significance of a two-sided test at the 1% and 5% levels.

59 Part II: Estimation results with t and g. Regressors: const, $u_t$, $u_{t-1}$, $g_t$, $g_{t-1}$, t; the table also reports R̄² and the regression F. Only the t-ratios survive transcription: (2.26), (6.80), (0.29); (2.21), (4.62), (0.28), (−0.04); (2.09), (5.73), (0.08); (2.00), (4.25), (0.10), (0.11).
F tests for the joint significance of the coefficients of g and t: 0.04 (Model 2) and 0.01 (Model 4).

60 Part III: Estimation results with t and t². Regressors: const, $u_t$, $u_{t-1}$, $g_t$, $g_{t-1}$, t, t²; the table also reports R̄²/F. Only the t-ratios survive transcription: (13.30), (−5.74), (7.90); (8.21), (2.61), (−5.33), (5.70); (6.27), (2.38), (−0.24), (−5.21), (5.59); (8.83), (2.85), (−5.57), (6.05); (5.83), (3.11), (1.28), (−5.74), (6.26).
F tests for the joint significance of the coefficients of g and t: … (Model 3) and … (Model 5).

61 Selected estimation results: the two preferred models regress $s_t$ on $u_{t-1}$, t, and t² (R² = 0.84 and 0.85).
- The marginal effect of u on s: the second model predicts an increase of this year's suicide rate by 1.15 (approx. 264 persons) when there is a one-percent increase of last year's unemployment rate.
- The time effect changes with t: at 2010, this effect is approx. 1.04 (approx. 239 persons). Since 1993 (about 12.6 years after 1980), there has been a natural increase of the suicide rate in Taiwan. Lowering the unemployment rate would help cancel out the time effect to some extent.
- The predicted suicide rate in 2010 (vs. the actual suicide rate of 16.8) differs from the actual by an amount that seems too large.

62 Near Multicollinearity
It is more common to have near multicollinearity: $Xa \approx 0$ for some $a \ne 0$. Writing $X = [x_i\ X_i]$, where $X_i$ is X excluding the column $x_i$, we have from the FWL Theorem that
$\mathrm{var}(\hat\beta_{i,T}) = \sigma_o^2[x_i'(I - P_i)x_i]^{-1} = \frac{\sigma_o^2}{\sum_{t=1}^{T}(x_{ti} - \bar x_i)^2\,(1 - R^2(i))},$
where $P_i = X_i(X_i'X_i)^{-1}X_i'$, and $R^2(i)$ is the centered R² from regressing $x_i$ on $X_i$.
Consequences of near multicollinearity: $R^2(i)$ is high, so that $\mathrm{var}(\hat\beta_{i,T})$ tends to be large and $\hat\beta_{i,T}$ is sensitive to data changes. Large $\mathrm{var}(\hat\beta_{i,T})$ leads to small (insignificant) t ratios. Yet the regression F test may suggest that the model (as a whole) is useful.

63 How do we circumvent the problems from near multicollinearity?
- Try to break the approximate linear relation: add more data if possible; drop some regressors.
- Statistical approaches:
   Ridge regression: for some $\lambda \ge 0$, $\hat b_{\text{ridge}} = (X'X + \lambda I_k)^{-1}X'y$ (see the sketch below).
   Principal component regression.
Note: Multicollinearity vs. micronumerosity (Goldberger).
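A minimal sketch of the ridge estimator named above (the function name and the choice of lam are illustrative; lam = 0 reduces to OLS):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator (X'X + lam*I)^{-1} X'y; shrinks coefficients and keeps
    X'X + lam*I invertible even under (near) multicollinearity."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)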

64 Digression: Regression with Dummy Variables
Example: Let $y_t$ denote the wage of the t-th individual and $x_t$ the working experience (in years). Consider the following specification:
$y_t = \alpha_0 + \alpha_1 D_t + \beta_0 x_t + e_t,$
where $D_t$ is a dummy variable such that $D_t = 1$ if t is a male and $D_t = 0$ otherwise. This specification puts together two regressions:
- Regression for females: $D_t = 0$, and the intercept is $\alpha_0$.
- Regression for males: $D_t = 1$, and the intercept is $\alpha_0 + \alpha_1$.
These two regressions coincide if $\alpha_1 = 0$. Testing no wage discrimination against females amounts to testing the hypothesis $\alpha_1 = 0$.

65 We may also consider the specification with a dummy variable and its interaction with a regressor:
$y_t = \alpha_0 + \alpha_1 D_t + \beta_0 x_t + \beta_1(x_t D_t) + e_t.$
Then the slopes of the regressions for females and males are, respectively, $\beta_0$ and $\beta_0 + \beta_1$. These two regressions coincide if $\alpha_1 = 0$ and $\beta_1 = 0$. In this case, testing no wage discrimination against females amounts to testing the joint hypothesis $\alpha_1 = 0$ and $\beta_1 = 0$.

66 Example: Consider two dummy variables: $D_{1,t} = 1$ if high school is t's highest degree and $D_{1,t} = 0$ otherwise; $D_{2,t} = 1$ if college or graduate school is t's highest degree and $D_{2,t} = 0$ otherwise. The specification below in effect puts together 3 regressions:
$y_t = \alpha_0 + \alpha_1 D_{1,t} + \alpha_2 D_{2,t} + \beta x_t + e_t,$
where the below-high-school regression has intercept $\alpha_0$, the high-school regression has intercept $\alpha_0 + \alpha_1$, and the college regression has intercept $\alpha_0 + \alpha_2$.
Similar to the previous example, we may also consider a more general specification in which x interacts with $D_1$ and $D_2$.
Dummy variable trap: To avoid exact multicollinearity, the number of dummy variables in a model (with the constant term) should be one less than the number of groups.

67 Example: Analysis of Suicide Rate
Let $D_t = 1$ for $t = T^* + 1, \dots, T$ and $D_t = 0$ otherwise, where $T^*$ is the year of structural change. Consider the specification:
$s_t = \alpha_0 + \delta D_t + \beta_0 u_{t-1} + \gamma u_{t-1}D_t + e_t.$
The before-change regression has intercept $\alpha_0$ and slope $\beta_0$, and the after-change regression has intercept $\alpha_0 + \delta$ and slope $\beta_0 + \gamma$. Testing a structural change at $T^*$ amounts to testing $\delta = 0$ and $\gamma = 0$ (Chow test).
Alternatively, we can estimate the specification:
$s_t = \alpha_0(1 - D_t) + \alpha_1 D_t + \beta_0 u_{t-1}(1 - D_t) + \beta_1 u_{t-1}D_t + e_t,$
and test a structural change at $T^*$ by testing $\alpha_0 = \alpha_1$ and $\beta_0 = \beta_1$.
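A sketch of the Chow test in the dummy-variable form above; the break-point convention (D = 1 from index t_star on) and all names are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def chow_test(y, x, t_star):
    """Chow test for a structural change, as the F test of (delta, gamma) = 0 in
    y_t = a0 + delta*D_t + b0*x_t + gamma*(x_t*D_t) + e_t."""
    T = len(y)
    D = (np.arange(T) >= t_star).astype(float)
    X = np.column_stack([np.ones(T), D, x, x * D])
    k = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (T - k)
    R = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])        # picks delta and gamma
    d = R @ beta
    phi = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (s2 * R.shape[0])
    return phi, stats.f.sf(phi, R.shape[0], T - k)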

68 Part I: Estimation results with a known change, without t. Columns: $T^*$, const, $D_t$, $u_{t-1}$, $u_{t-1}D_t$, R̄²/Reg. F, Chow. Only the t-ratios survive transcription: (2.77), (−1.07), (1.15), (1.00); (2.51), (−0.59), (1.45), (0.65); (2.41), (−0.25), (1.66), (0.41); (2.38), (0.01), (1.75), (0.24).
The Chow test is the F test of the coefficients of $D_t$ and $u_{t-1}D_t$ being zero.

69 Part II: Estimation results with a known change, with t. Columns: $T^*$, const, $D_t$, $u_{t-1}$, t, $tD_t$, R̄²/Reg. F. Only the t-ratios survive transcription: (12.02), (−8.44), (1.19), (−5.58), (8.78); (12.12), (−8.02), (1.18), (−6.18), (8.92); (11.82), (−7.49), (1.05), (−6.23), (8.65); (11.11), (−6.70), (0.91), (−5.85), (8.04).
F test of the coefficients of $D_t$ and $tD_t$ being zero: … ('92); … ('93); … ('94); … ('95).

70 Selected estimation results for $T^* = 1992, 1993, 1994$: $\hat s_t$ on const, $D_t$, $u_{t-1}$, t, and $tD_t$ (point estimates not shown).
- There appears to be a structural change over time. For $T^* = 1993$, the before-change slope is −0.54 (a decrease over time), and the after-change slope is 0.68 (an increase over time).
- The marginal effect of $u_{t-1}$ on $s_t$ is not significant even at the 10% level.
- These models predict the suicide rate in 2010 as 19.83, 19.8, and 19.75, whereas the prediction based on the quadratic trend model is … For the model with $T^* = 1993$, the difference between the predicted and actual suicide rates is 3.0 (approx. 690 persons).

71 Limitation of the Classical Conditions
[A1] X is non-stochastic: economic variables cannot be regarded as non-stochastic; also, lagged dependent variables may be used as regressors.
[A2](i) $\mathrm{IE}(y) = X\beta_o$: $\mathrm{IE}(y)$ may be a linear function with more regressors or a nonlinear function of the regressors.
[A2](ii) $\mathrm{var}(y) = \sigma_o^2 I_T$: the elements of y may be correlated (serial correlation, spatial correlation) and/or may have unequal variances.
[A3] Normality: y may have a non-normal distribution.
The OLS estimator loses the properties derived above when some of the classical conditions fail to hold.

72 When $\mathrm{var}(y) \ne \sigma_o^2 I_T$
Given the linear specification $y = X\beta + e$, suppose, in addition to [A1] and [A2](i), that $\mathrm{var}(y) = \Sigma_o \ne \sigma_o^2 I_T$, where $\Sigma_o$ is p.d. That is, the elements of y may be correlated and have unequal variances. The OLS estimator $\hat\beta_T$ remains unbiased with
$\mathrm{var}(\hat\beta_T) = \mathrm{var}\big((X'X)^{-1}X'y\big) = (X'X)^{-1}X'\Sigma_oX(X'X)^{-1}.$
- $\hat\beta_T$ is not the BLUE for $\beta_o$, and it is not the BUE for $\beta_o$ under normality.
- The estimator $\widehat{\mathrm{var}}(\hat\beta_T) = \hat\sigma_T^2(X'X)^{-1}$ is a biased estimator of $\mathrm{var}(\hat\beta_T)$. Consequently, the t and F statistics do not have t and F distributions, even when y is normally distributed.

73 The GLS Estimator
Consider the transformed specification $Gy = GX\beta + Ge$, where G is nonsingular and non-stochastic.
- $\mathrm{IE}(Gy) = GX\beta_o$ and $\mathrm{var}(Gy) = G\Sigma_oG'$.
- GX has full column rank, so the OLS estimator can be computed:
$b(G) = (X'G'GX)^{-1}X'G'Gy,$
which is still linear and unbiased. It would be the BLUE provided that G is chosen such that $G\Sigma_oG' = \sigma_o^2 I_T$.
Setting $G = \Sigma_o^{-1/2}$, where $\Sigma_o^{-1/2} = C\Lambda^{-1/2}C'$ and C orthogonally diagonalizes $\Sigma_o$ ($C'\Sigma_oC = \Lambda$), we have $\Sigma_o^{-1/2}\Sigma_o\Sigma_o^{-1/2} = I_T$.

74 With $y^* = \Sigma_o^{-1/2}y$ and $X^* = \Sigma_o^{-1/2}X$, we have the GLS estimator:
$\hat\beta_{\mathrm{GLS}} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'\Sigma_o^{-1}X)^{-1}(X'\Sigma_o^{-1}y). \qquad (5)$
- $\hat\beta_{\mathrm{GLS}}$ is a minimizer of the weighted sum of squared errors:
$Q(\beta; \Sigma_o) = \frac{1}{T}(y^* - X^*\beta)'(y^* - X^*\beta) = \frac{1}{T}(y - X\beta)'\Sigma_o^{-1}(y - X\beta).$
- The vector of GLS fitted values, $\hat y_{\mathrm{GLS}} = X(X'\Sigma_o^{-1}X)^{-1}(X'\Sigma_o^{-1}y)$, is an oblique projection of y onto span(X), because $X(X'\Sigma_o^{-1}X)^{-1}X'\Sigma_o^{-1}$ is idempotent but asymmetric. The GLS residual vector is $\hat e_{\mathrm{GLS}} = y - \hat y_{\mathrm{GLS}}$.
- The sum of squared OLS residuals is less than the sum of squared GLS residuals. (Why?)
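A sketch of the GLS computation when $\Sigma_o$ is known. It whitens with a Cholesky factor L ($\Sigma_o = LL'$) rather than the symmetric square root on the slide; any G with $G\Sigma_oG' = I_T$ yields the same estimator. Names are illustrative:

```python
import numpy as np

def gls(X, y, Sigma):
    """GLS estimator (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y,
    computed by whitening the data with a Cholesky factor of Sigma."""
    L = np.linalg.cholesky(Sigma)        # Sigma = L L'
    Xs = np.linalg.solve(L, X)           # X* = L^{-1} X
    ys = np.linalg.solve(L, y)           # y* = L^{-1} y
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)  # OLS on transformed data
    return beta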

75 Stochastic Properties of the GLS Estimator
Theorem 4.1 (Aitken). Given the linear specification (1), suppose that [A1] and [A2](i) hold and that $\mathrm{var}(y) = \Sigma_o$ is positive definite. Then $\hat\beta_{\mathrm{GLS}}$ is the BLUE for $\beta_o$.
Given [A3']: $y \sim \mathcal{N}(X\beta_o, \Sigma_o)$, we have $\hat\beta_{\mathrm{GLS}} \sim \mathcal{N}\big(\beta_o, (X'\Sigma_o^{-1}X)^{-1}\big)$.
Under [A3'], the log-likelihood function is
$\log L(\beta; \Sigma_o) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\log(\det(\Sigma_o)) - \frac{1}{2}(y - X\beta)'\Sigma_o^{-1}(y - X\beta),$
with the FOC $X'\Sigma_o^{-1}(y - X\beta) = 0$. Thus, the GLS estimator is also the MLE under normality.

76 Under normality, the information matrix is
$\mathrm{IE}\big[X'\Sigma_o^{-1}(y - X\beta)(y - X\beta)'\Sigma_o^{-1}X\big]\big|_{\beta=\beta_o} = X'\Sigma_o^{-1}X.$
Thus, the GLS estimator is the BUE for $\beta_o$, because its covariance matrix reaches the Cramér-Rao lower bound.
Under the null hypothesis $R\beta_o = r$, we have
$(R\hat\beta_{\mathrm{GLS}} - r)'[R(X'\Sigma_o^{-1}X)^{-1}R']^{-1}(R\hat\beta_{\mathrm{GLS}} - r) \sim \chi^2(q).$
A major difficulty: How should the GLS estimator be computed when $\Sigma_o$ is unknown?

78 The Feasible GLS Estimator
The feasible GLS (FGLS) estimator is
$\hat\beta_{\mathrm{FGLS}} = (X'\hat\Sigma_T^{-1}X)^{-1}X'\hat\Sigma_T^{-1}y,$
where $\hat\Sigma_T$ is an estimator of $\Sigma_o$.
Further difficulties in FGLS estimation:
- The number of parameters in $\Sigma_o$ is $T(T+1)/2$. Estimating $\Sigma_o$ without some prior restrictions on its form is practically infeasible.
- Even when an estimator $\hat\Sigma_T$ is available under certain assumptions, the finite-sample properties of the FGLS estimator are still difficult to derive.

80 Tests for Heteroskedasticity
A simple form of $\Sigma_o$ is
$\Sigma_o = \begin{pmatrix} \sigma_1^2 I_{T_1} & 0 \\ 0 & \sigma_2^2 I_{T_2} \end{pmatrix},$
with $T = T_1 + T_2$; this is known as groupwise heteroskedasticity.
The null hypothesis of homoskedasticity: $\sigma_1^2 = \sigma_2^2 = \sigma_o^2$. Perform separate OLS regressions using the data in each group and obtain the variance estimates $\hat\sigma_{T_1}^2$ and $\hat\sigma_{T_2}^2$. Under [A1] and [A3'], the F test is:
$\varphi := \frac{\hat\sigma_{T_1}^2}{\hat\sigma_{T_2}^2} = \frac{(T_1 - k)\hat\sigma_{T_1}^2}{\sigma_o^2(T_1 - k)} \Big/ \frac{(T_2 - k)\hat\sigma_{T_2}^2}{\sigma_o^2(T_2 - k)} \sim F(T_1 - k, T_2 - k).$

81 More generally, for some constants $c_0, c_1 > 0$, $\sigma_t^2 = c_0 + c_1 x_{tj}^2$.
The Goldfeld-Quandt test (see the sketch below):
(1) Rearrange the observations according to the values of $x_j$ in descending order.
(2) Divide the rearranged data set into three groups with $T_1$, $T_m$, and $T_2$ observations, respectively.
(3) Drop the $T_m$ observations in the middle group and perform separate OLS regressions using the data in the first and third groups.
(4) The statistic is the ratio of the variance estimates: $\hat\sigma_{T_1}^2/\hat\sigma_{T_2}^2 \sim F(T_1 - k, T_2 - k)$.
Some questions: Can we estimate the model with all observations and then compute $\hat\sigma_{T_1}^2$ and $\hat\sigma_{T_2}^2$ based on the $T_1$ and $T_2$ residuals? If $\Sigma_o$ is not diagonal, does the F test above still work?
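A sketch of the four steps above; the fraction of middle observations to drop (drop_frac) and all names are illustrative choices, not from the lecture:

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(y, X, j, drop_frac=0.2):
    """Goldfeld-Quandt test: sort by regressor j (descending), drop the middle
    block, run separate OLS on the two outer groups, compare variance estimates."""
    T, k = X.shape
    order = np.argsort(-X[:, j])          # descending in x_j
    y, X = y[order], X[order]
    T_m = int(drop_frac * T)
    T1 = (T - T_m) // 2
    T2 = T - T_m - T1

    def sigma2(ys, Xs):
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        e = ys - Xs @ b
        return e @ e / (len(ys) - k)

    phi = sigma2(y[:T1], X[:T1]) / sigma2(y[-T2:], X[-T2:])
    return phi, stats.f.sf(phi, T1 - k, T2 - k)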

84 GLS and FGLS Estimation
Under groupwise heteroskedasticity,
$\Sigma_o^{-1/2} = \begin{pmatrix} \sigma_1^{-1} I_{T_1} & 0 \\ 0 & \sigma_2^{-1} I_{T_2} \end{pmatrix},$
so that the transformed specification is
$\begin{pmatrix} y_1/\sigma_1 \\ y_2/\sigma_2 \end{pmatrix} = \begin{pmatrix} X_1/\sigma_1 \\ X_2/\sigma_2 \end{pmatrix}\beta + \begin{pmatrix} e_1/\sigma_1 \\ e_2/\sigma_2 \end{pmatrix}.$
Clearly, $\mathrm{var}(\Sigma_o^{-1/2}y) = I_T$. The GLS estimator is:
$\hat\beta_{\mathrm{GLS}} = \left(\frac{X_1'X_1}{\sigma_1^2} + \frac{X_2'X_2}{\sigma_2^2}\right)^{-1}\left(\frac{X_1'y_1}{\sigma_1^2} + \frac{X_2'y_2}{\sigma_2^2}\right).$

85 With $\hat\sigma_{T_1}^2$ and $\hat\sigma_{T_2}^2$ from the separate regressions, an estimator of $\Sigma_o$ is
$\hat\Sigma_T = \begin{pmatrix} \hat\sigma_{T_1}^2 I_{T_1} & 0 \\ 0 & \hat\sigma_{T_2}^2 I_{T_2} \end{pmatrix}.$
The FGLS estimator is:
$\hat\beta_{\mathrm{FGLS}} = \left(\frac{X_1'X_1}{\hat\sigma_{T_1}^2} + \frac{X_2'X_2}{\hat\sigma_{T_2}^2}\right)^{-1}\left(\frac{X_1'y_1}{\hat\sigma_{T_1}^2} + \frac{X_2'y_2}{\hat\sigma_{T_2}^2}\right).$
Note: If $\sigma_t^2 = c\,x_{tj}^2$, a transformed specification is
$\frac{y_t}{x_{tj}} = \beta_1\frac{x_{t1}}{x_{tj}} + \dots + \beta_{j-1}\frac{x_{t,j-1}}{x_{tj}} + \beta_j + \beta_{j+1}\frac{x_{t,j+1}}{x_{tj}} + \dots + \beta_k\frac{x_{tk}}{x_{tj}} + \frac{e_t}{x_{tj}},$
where $\mathrm{var}(y_t/x_{tj}) = c := \sigma_o^2$. Here, the GLS estimator is readily computed as the OLS estimator of the transformed specification.
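A minimal sketch of the groupwise FGLS estimator above (names illustrative; the two groups enter as separate arrays):

```python
import numpy as np

def fgls_groupwise(y1, X1, y2, X2):
    """FGLS under groupwise heteroskedasticity: estimate sigma_1^2 and sigma_2^2
    from separate OLS regressions, then weight each group's moments accordingly."""
    def sigma2(ys, Xs):
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        e = ys - Xs @ b
        return e @ e / (len(ys) - Xs.shape[1])

    s1, s2 = sigma2(y1, X1), sigma2(y2, X2)
    A = X1.T @ X1 / s1 + X2.T @ X2 / s2
    b = X1.T @ y1 / s1 + X2.T @ y2 / s2
    return np.linalg.solve(A, b)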

86 Discussion and Remarks
- How do we determine the groups for groupwise heteroskedasticity? What if the diagonal elements of $\Sigma_o$ take multiple values (so that there are more than 2 groups)?
- A general form of heteroskedasticity: $\sigma_t^2 = h(\alpha_0 + z_t'\alpha_1)$, with h unknown, $z_t$ a $p \times 1$ vector, and p a fixed number less than T.
- When the F test rejects the null of homoskedasticity, groupwise heteroskedasticity need not be a correct description of $\Sigma_o$.
- When the form of heteroskedasticity is incorrectly specified, the resulting FGLS estimator may be less efficient than the OLS estimator.
- The finite-sample properties of FGLS estimators, and hence the exact tests, are typically unknown.

87 Serial Correlation
When time series data $y_t$ are correlated over time, they are said to exhibit serial correlation. For cross-section data, the correlations of $y_t$ are known as spatial correlation.
A general form of $\Sigma_o$ is one whose diagonal elements (the variances of $y_t$) are a constant $\sigma_o^2$ and whose off-diagonal elements ($\mathrm{cov}(y_t, y_{t-i})$) are non-zero. In the time series context, $\mathrm{cov}(y_t, y_{t-i})$ are known as the autocovariances of $y_t$, and the autocorrelations of $y_t$ are
$\mathrm{corr}(y_t, y_{t-i}) = \frac{\mathrm{cov}(y_t, y_{t-i})}{\sqrt{\mathrm{var}(y_t)}\sqrt{\mathrm{var}(y_{t-i})}} = \frac{\mathrm{cov}(y_t, y_{t-i})}{\sigma_o^2}.$

88 Simple Model: AR(1) Disturbances
- A time series $y_t$ is said to be weakly (covariance) stationary if its mean, variance, and autocovariances are all independent of t.
- White noise: a time series with zero mean, a constant variance, and zero autocovariances; i.i.d. random variables with finite variance are a leading example.
- Disturbance: $\epsilon := y - X\beta_o$, so that $\mathrm{var}(y) = \mathrm{var}(\epsilon) = \mathrm{IE}(\epsilon\epsilon')$.
Suppose that $\epsilon_t$ follows a weakly stationary AR(1) (autoregressive of order 1) process:
$\epsilon_t = \psi_1\epsilon_{t-1} + u_t, \qquad |\psi_1| < 1,$
where $\{u_t\}$ is a white noise with $\mathrm{IE}(u_t) = 0$, $\mathrm{IE}(u_t^2) = \sigma_u^2$, and $\mathrm{IE}(u_tu_\tau) = 0$ for $t \ne \tau$.
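The standard autocovariances implied by this process are $\mathrm{var}(\epsilon_t) = \sigma_u^2/(1 - \psi_1^2)$ and $\mathrm{cov}(\epsilon_t, \epsilon_{t-i}) = \psi_1^{|i|}\,\mathrm{var}(\epsilon_t)$ (a textbook fact rather than a formula stated on this slide). A sketch that builds the corresponding $\Sigma_o$, usable as input to the GLS routine above; names illustrative:

```python
import numpy as np

def ar1_cov(T, psi, sigma_u2):
    """Covariance matrix of a weakly stationary AR(1) disturbance:
    var = sigma_u^2 / (1 - psi^2), cov(eps_t, eps_{t-i}) = psi^|i| * var."""
    var_eps = sigma_u2 / (1.0 - psi ** 2)
    idx = np.arange(T)
    return var_eps * psi ** np.abs(idx[:, None] - idx[None, :])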


More information

Multiple Regression Analysis

Multiple Regression Analysis Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

the error term could vary over the observations, in ways that are related

the error term could vary over the observations, in ways that are related Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance Var(u i x i ) = σ 2 is common to all observations i = 1,..., n In many applications, we may

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 17, 2012 Outline Heteroskedasticity

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Heteroskedasticity. Part VII. Heteroskedasticity

Heteroskedasticity. Part VII. Heteroskedasticity Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Simple Linear Regression: The Model

Simple Linear Regression: The Model Simple Linear Regression: The Model task: quantifying the effect of change X in X on Y, with some constant β 1 : Y = β 1 X, linear relationship between X and Y, however, relationship subject to a random

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Ch.10 Autocorrelated Disturbances (June 15, 2016)

Ch.10 Autocorrelated Disturbances (June 15, 2016) Ch10 Autocorrelated Disturbances (June 15, 2016) In a time-series linear regression model setting, Y t = x tβ + u t, t = 1, 2,, T, (10-1) a common problem is autocorrelation, or serial correlation of the

More information

MIT Spring 2015

MIT Spring 2015 Regression Analysis MIT 18.472 Dr. Kempthorne Spring 2015 1 Outline Regression Analysis 1 Regression Analysis 2 Multiple Linear Regression: Setup Data Set n cases i = 1, 2,..., n 1 Response (dependent)

More information

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y

More information

Econometrics - 30C00200

Econometrics - 30C00200 Econometrics - 30C00200 Lecture 11: Heteroskedasticity Antti Saastamoinen VATT Institute for Economic Research Fall 2015 30C00200 Lecture 11: Heteroskedasticity 12.10.2015 Aalto University School of Business

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

The Linear Regression Model

The Linear Regression Model The Linear Regression Model Carlo Favero Favero () The Linear Regression Model 1 / 67 OLS To illustrate how estimation can be performed to derive conditional expectations, consider the following general

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Advanced Quantitative Methods: ordinary least squares

Advanced Quantitative Methods: ordinary least squares Advanced Quantitative Methods: Ordinary Least Squares University College Dublin 31 January 2012 1 2 3 4 5 Terminology y is the dependent variable referred to also (by Greene) as a regressand X are the

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2)

In the bivariate regression model, the original parameterization is. Y i = β 1 + β 2 X2 + β 2 X2. + β 2 (X 2i X 2 ) + ε i (2) RNy, econ460 autumn 04 Lecture note Orthogonalization and re-parameterization 5..3 and 7.. in HN Orthogonalization of variables, for example X i and X means that variables that are correlated are made

More information

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018 Econometrics I KS Module 1: Bivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: March 12, 2018 Alexander Ahammer (JKU) Module 1: Bivariate

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

ECON 4230 Intermediate Econometric Theory Exam

ECON 4230 Intermediate Econometric Theory Exam ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College University at Albany PAD 705 Handout: Suggested Review Problems from Pindyck & Rubinfeld Original prepared by Professor Suzanne Cooper John F. Kennedy School of Government, Harvard

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =

Xβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X = The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Reference: Davidson and MacKinnon Ch 2. In particular page

Reference: Davidson and MacKinnon Ch 2. In particular page RNy, econ460 autumn 03 Lecture note Reference: Davidson and MacKinnon Ch. In particular page 57-8. Projection matrices The matrix M I X(X X) X () is often called the residual maker. That nickname is easy

More information

EC3062 ECONOMETRICS. THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation. (1) y = β 0 + β 1 x β k x k + ε,

EC3062 ECONOMETRICS. THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation. (1) y = β 0 + β 1 x β k x k + ε, THE MULTIPLE REGRESSION MODEL Consider T realisations of the regression equation (1) y = β 0 + β 1 x 1 + + β k x k + ε, which can be written in the following form: (2) y 1 y 2.. y T = 1 x 11... x 1k 1

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 3 Jakub Mućk Econometrics of Panel Data Meeting # 3 1 / 21 Outline 1 Fixed or Random Hausman Test 2 Between Estimator 3 Coefficient of determination (R 2

More information

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning

Økonomisk Kandidateksamen 2004 (I) Econometrics 2. Rettevejledning Økonomisk Kandidateksamen 2004 (I) Econometrics 2 Rettevejledning This is a closed-book exam (uden hjælpemidler). Answer all questions! The group of questions 1 to 4 have equal weight. Within each group,

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Lecture 11: Regression Methods I (Linear Regression)

Lecture 11: Regression Methods I (Linear Regression) Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear

More information

Introduction to Quantile Regression

Introduction to Quantile Regression Introduction to Quantile Regression CHUNG-MING KUAN Department of Finance National Taiwan University May 31, 2010 C.-M. Kuan (National Taiwan U.) Intro. to Quantile Regression May 31, 2010 1 / 36 Lecture

More information

Econometrics Multiple Regression Analysis: Heteroskedasticity

Econometrics Multiple Regression Analysis: Heteroskedasticity Econometrics Multiple Regression Analysis: João Valle e Azevedo Faculdade de Economia Universidade Nova de Lisboa Spring Semester João Valle e Azevedo (FEUNL) Econometrics Lisbon, April 2011 1 / 19 Properties

More information

Linear Regression with Time Series Data

Linear Regression with Time Series Data Econometrics 2 Linear Regression with Time Series Data Heino Bohn Nielsen 1of21 Outline (1) The linear regression model, identification and estimation. (2) Assumptions and results: (a) Consistency. (b)

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 11, 2012 Outline Heteroskedasticity

More information

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

The outline for Unit 3

The outline for Unit 3 The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

Advanced Econometrics

Advanced Econometrics Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

1 Introduction to Generalized Least Squares

1 Introduction to Generalized Least Squares ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

L2: Two-variable regression model

L2: Two-variable regression model L2: Two-variable regression model Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: September 4, 2014 What we have learned last time...

More information