Econometrics - Slides


1 1 Econometrics - Slides 2011/2012 João Nicolau

2 2 1 Introduction 1.1 What is Econometrics? Econometrics is a discipline that aims to give empirical content to economic relations. It has been defined generally as the application of mathematics and statistical methods to economic data. Applications of econometrics: forecasting (e.g. interest rates, inflation rates, and gross domestic product); studying economic relations; testing economic theories; evaluating and implementing government and business policy. For example, what are the effects of political campaign expenditures on voting outcomes? What is the effect of school spending on student performance in the field of education?

3 3 1.2 Steps in Empirical Economic Analysis Formulate the question of interest. The question might deal with testing a certain aspect of an economic theory, or it might pertain to testing the effects of a government policy. Build the economic model. An economic model consists of mathematical equations that describe various relationships. Formal economic modeling is sometimes the starting point for empirical analysis, but it is more common to use economic theory less formally, or even to rely entirely on intuition. Specify the econometric model. Collect the data. Estimate and test the econometric model. Answer the question in step 1.

4 4 1.3 The Structure of Economic Data Cross-Sectional Data A cross-sectional data set: a sample of individuals, households, firms, cities, states, countries, etc. taken at a given point in time. An important feature of cross-sectional data: they are obtained by random sampling from the underlying population. For example, suppose that y_i is the i-th observation of the dependent variable and x_i is the i-th observation of the explanatory variable. Random sampling means that {y_i, x_i} is an i.i.d. sequence. This implies that for i ≠ j: Cov(y_i, y_j) = 0, Cov(x_i, x_j) = 0, Cov(y_i, x_j) = 0. Obviously, if x_i explains y_i we will have Cov(y_i, x_i) ≠ 0. Cross-sectional data is closely aligned with the applied microeconomics fields, such as labor economics, state and local public finance, industrial organization, urban economics, demography, and health economics.

5 An example of Cross-Sectional Data: 5

6 6 Scatterplots may be adequate for analyzing cross-section data. Models based on cross-sectional data usually satisfy the assumptions covered in the chapter Finite-Sample Properties of OLS.

7 Time-Series Data A time series data set consists of observations on a variable or several variables over time. E.g.: stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales figures, etc. Time series data cannot be assumed to be independent across time. For example, knowing something about the gross domestic product from last quarter tells us quite a bit about the likely range of the GDP during this quarter. The analysis of time series data is more difficult than that of cross-sectional data. Reasons: we need to account for the dependent nature of economic time series; time-series data exhibit unique features such as trends over time and seasonality; models based on time-series data rarely satisfy the assumptions covered in the chapter Finite-Sample Properties of OLS. The most adequate assumptions are covered in the chapter Large-Sample Theory, which is theoretically more advanced.

8 An example of a time series [figure]. Scatterplots cannot in general be used here, but there are exceptions.

9 Pooled Cross Sections and Panel or Longitudinal Data These data sets have both cross-sectional and time series features. Causality And The Notion Of Ceteris Paribus In Econometric Analysis Ceteris paribus: other relevant factors being equal. It plays an important role in causal analysis. Example. Suppose that wages depend on education and labor force experience. Your goal is to measure the return to education. If your analysis involves only wages and education you may not uncover the ceteris paribus effect of education on wages. Consider the following data (monthly wages in Euros, years of experience, years of education): [table not reproduced in the transcription].

10 10 Example. In a totalitarian regime how can you measure the ceteris paribus effect of another year of education on wages? You may create 100 clones of a normal individual, give each person a different amount of education, and then measure their wages. Ceteris paribus is relatively easy to analyze in experimental data. Example (experimental data). Consider the effects of new fertilizers on crop yields. Suppose the crop under consideration is soybeans. Since fertilizer amount is only one factor affecting yields (others include rainfall, quality of land, and presence of parasites), this issue must be posed as a ceteris paribus question. One way to determine the causal effect of fertilizer amount on soybean yield is to conduct an experiment, which might include the following steps: choose several one-acre plots of land; apply different amounts of fertilizer to each plot and subsequently measure the yields. In economics you have nonexperimental data, so in principle it is difficult to estimate the ceteris paribus effects. However, we will see that econometric methods can simulate a ceteris paribus experiment. We will be able to do in nonexperimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.

11 11 2 Finite-Sample Properties of OLS This chapter covers the finite- or small-sample properties of the OLS estimator, that is, the statistical properties of the OLS estimator that are valid for any given sample size. 2.1 The Classical Linear Regression Model The dependent variable is related to several other variables called the regressors or the explanatory variables. Let y i be the i-th observation of the dependent variable. Let x i1, x i2,..., x ik be the i-th observation of the K regressors. The sample or data is a collection of those n observations. The data in economics cannot be generated by experiments except in experimental economics, so both the dependent and independent variables have to be treated as random variables, variables whose values are subject to chance.

12 The Linearity Assumption Assumption (linearity). We have y_i = β_1 x_i1 + β_2 x_i2 + ... + β_K x_iK + ε_i, i = 1, 2, ..., n, where the β's are unknown parameters to be estimated, and ε_i is the unobserved error term. The β's are the regression coefficients. They represent the marginal and separate effects of the regressors. Example 1.1 (consumption function). Consider con_i = β_1 + β_2 yd_i + ε_i. con_i: consumption; yd_i: disposable income. Note: x_i1 = 1, x_i2 = yd_i. The error ε_i represents other variables besides disposable income that influence consumption. They include: variables such as financial assets that might be observable but the researcher decided not to include as regressors, as well as variables such as the mood of the consumer that are hard to measure. The equation is called the simple regression model.

13 13 The linearity assumption is not as restrictive as it might first seem. Example 1.2 (wage equation). Consider wage_i = e^{β_1} e^{β_2 educ_i} e^{β_3 tenure_i} e^{β_4 expr_i} e^{ε_i} where wage = the wage rate for the individual, educ = education in years, tenure = years on the current job, and expr = experience in the labor market. This equation can be written as log wage_i = β_1 + β_2 educ_i + β_3 tenure_i + β_4 expr_i + ε_i. The equation is said to be in the semi-log form or log-level form. Example. Does this model violate Assumption 1.1? y_i = β_1 + β_2 x_i2 + β_3 log x_i2 + β_4 x_i3² + ε_i. There are, of course, cases of genuine nonlinearity. For example y_i = β_1 + e^{β_2 x_i2} + ε_i.

14 14 Partial Effects To simplify let's consider K = 2, and assume that E(ε_i | x_i1, x_i2) = 0. What is the impact on the conditional expected value of y, E(y_i | x_i1, x_i2), when x_i2 is increased by a small amount Δx_i2, holding the other variable fixed? Let x_i = (x_i1, x_i2) and x_i* = (x_i1, x_i2 + Δx_i2); the effect is ΔE(y_i | x_i) = E(y_i | x_i1, x_i2 + Δx_i2) − E(y_i | x_i1, x_i2).
Equation and interpretation of β_2:
level-level: y_i = β_1 + β_2 x_i2 + ε_i; ΔE(y_i | x_i) = β_2 Δx_i2.
level-log: y_i = β_1 + β_2 log x_i2 + ε_i; ΔE(y_i | x_i) ≈ (β_2/100) × (100 Δx_i2/x_i2).
log-level: log y_i = β_1 + β_2 x_i2 + ε_i; 100 ΔE(y_i | x_i)/E(y_i | x_i) ≈ 100β_2 Δx_i2; 100β_2: semi-elasticity.
log-log: log y_i = β_1 + β_2 log x_i2 + ε_i; 100 ΔE(y_i | x_i)/E(y_i | x_i) ≈ β_2 × (100 Δx_i2/x_i2); β_2: elasticity.

15 15 Exercise 2.1. Suppose, for example, the marginal effect of experience on wages declines with the level of experience. How can this be captured? Exercise 2.2. Provide an interpretation of β_2 in the following equations: (a) con_i = β_1 + β_2 inc_i + ε_i, where inc: income, con: consumption, both measured in dollars. Assume that β_2 = 0.8. (b) log wage_i = β_1 + β_2 educ_i + β_3 tenure_i + β_4 expr_i + ε_i. Assume that β_2 = [value not reproduced in the transcription]. (c) log price_i = β_1 + β_2 log dist_i + ε_i where price = housing price and dist = distance from a recently built garbage incinerator. Assume that β_2 = 0.6.

16 Matrix Notation We have y_i = β_1 x_i1 + β_2 x_i2 + ... + β_K x_iK + ε_i = [x_i1 x_i2 ... x_iK](β_1, β_2, ..., β_K)' + ε_i = x_i'β + ε_i, where x_i = (x_i1, x_i2, ..., x_iK)' and β = (β_1, β_2, ..., β_K)'. Thus y_i = x_i'β + ε_i.

17 17 More compactly, y = (y_1, y_2, ..., y_n)', ε = (ε_1, ε_2, ..., ε_n)', and X is the n × K matrix whose i-th row is x_i' = (x_i1, x_i2, ..., x_iK), so that y = Xβ + ε. Example. y_i = β_1 + β_2 educ_i + β_3 exp_i + ε_i, y_i = wages in Euros. An example of cross-sectional data: y = [...], X = [...] [numerical values not reproduced in the transcription]. Important: y and X (or y_i and x_ik) may be random variables or observed values. We use the same notation for both cases.

18 The Strict Exogeneity Assumption Assumption (strict exogeneity). E(ε_i | X) = 0 for all i. This assumption can be written as E(ε_i | x_1, ..., x_n) = 0 for all i. With random sampling, ε_i is automatically independent of the explanatory variables for observations other than i. This implies that E(ε_i | x_j) = 0 for all i, j with i ≠ j. It remains to be analyzed whether or not E(ε_i | x_i) = 0.

19 19 The strict exogeneity assumption can fail in situations such as: (cross-section or time series) omitted variables; (cross-section or time series) measurement error in some of the regressors; (time series, static models) there is feedback from y_i on future values of x_i; (time series, dynamic models) there is a lagged dependent variable as a regressor; (cross-section or time series) simultaneity. Example (omitted variables). Suppose that wage is determined by wage_i = β_1 + β_2 x_i2 + β_3 x_i3 + v_i, where x_2: years of education, x_3: ability. Assume that E(v_i | X) = 0. Since ability is not observed, we instead estimate the model wage_i = β_1 + β_2 x_i2 + ε_i, with ε_i = β_3 x_i3 + v_i. If Cov(x_i2, x_i3) ≠ 0 then Cov(ε_i, x_i2) = Cov(β_3 x_i3 + v_i, x_i2) = β_3 Cov(x_i3, x_i2) ≠ 0, and therefore E(ε_i | X) ≠ 0.

20 20 Example (measurement error in some of the regressors). Consider y = household savings and w = disposable income, and y_i = β_1 + β_2 w_i + v_i, E(v_i | w) = 0. Suppose that w cannot be measured absolutely accurately (for example, because of misreporting) and denote the measured value of w_i by x_i2. We have x_i2 = w_i + u_i. Assume: E(u_i) = 0, Cov(w_i, u_i) = 0, Cov(v_i, u_i) = 0. Now substituting x_i2 = w_i + u_i into y_i = β_1 + β_2 w_i + v_i we obtain y_i = β_1 + β_2 x_i2 + ε_i, with ε_i = v_i − β_2 u_i. Hence, Cov(ε_i, x_i2) = ... = −β_2 Var(u_i) ≠ 0. Cov(ε_i, x_i2) ≠ 0 implies E(ε_i | X) ≠ 0.

21 21 Example (feedback from y on future values of x). Consider a simple static time-series model to explain a city's murder rate y_t in terms of police officers per capita x_t: y_t = β_1 + β_2 x_t + ε_t. Suppose that the city adjusts the size of its police force based on past values of the murder rate. This means that, say, x_{t+1} might be correlated with ε_t, since a higher ε_t leads to a higher y_t. Example (there is a lagged dependent variable as a regressor). See the section on strict exogeneity in time-series models below. Exercise 2.3. Let kids denote the number of children ever born to a woman, and let educ denote years of education for the woman. A simple model relating fertility to years of education is kids_i = β_1 + β_2 educ_i + ε_i, where ε_i is the unobserved error. (i) What kinds of factors are contained in ε_i? Are these likely to be correlated with level of education? (ii) Will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain.

22 Implications of Strict Exogeneity The assumption E(ε_i | X) = 0 for all i implies: E(ε_i) = 0 for all i; E(ε_i | x_j) = 0 for all i, j; E(x_jk ε_i) = 0 for all i, j, k (or E(x_j ε_i) = 0 for all i, j): the regressors are orthogonal to the error term for all observations; Cov(x_jk, ε_i) = 0. Note: if E(ε_i | x_j) ≠ 0, or E(x_jk ε_i) ≠ 0, or Cov(x_jk, ε_i) ≠ 0, then E(ε_i | X) ≠ 0.

23 Strict Exogeneity in Time-Series Models For time-series models, strict exogeneity can be rephrased as: the regressors are orthogonal to the past, current, and future error terms. However, for most time-series models, strict exogeneity is not satisfied. Example. Consider y_i = βy_{i−1} + ε_i, E(ε_i | y_{i−1}) = 0, thus E(y_{i−1} ε_i) = 0. Let x_i = y_{i−1}. By construction we have E(x_{i+1} ε_i) = E(y_i ε_i) = E((βy_{i−1} + ε_i)ε_i) = β E(y_{i−1} ε_i) + E(ε_i²) = E(ε_i²) ≠ 0. The regressor is not orthogonal to the past error term, which is a violation of strict exogeneity. However, the estimator may possess good large-sample properties without strict exogeneity. Other Assumptions of the Model Assumption (no multicollinearity). The rank of the n × K data matrix X is K with probability 1.

24 24 None of the K columns of the data matrix X can be expressed as a linear combination of the other columns of X. Example (continuation of Example 1.2). If no individuals in the sample ever changed jobs, then tenure_i = expr_i for all i, in violation of the no multicollinearity assumption. There is no way to distinguish the tenure effect on the wage rate from the experience effect. Remedy: drop tenure_i or expr_i from the wage equation. Example (dummy variable trap). Consider wage_i = β_1 + β_2 educ_i + β_3 female_i + β_4 male_i + ε_i, where female_i = 1 if i corresponds to a female and 0 if i corresponds to a male, and male_i = 1 − female_i. In vector notation we have wage = β_1 1 + β_2 educ + β_3 female + β_4 male + ε. It is obvious that 1 = female + male. Therefore the above model violates Assumption 1.3. One may also justify this using scalar notation: x_i1 = female_i + male_i, because this relationship implies 1 = female + male. Can you overcome the dummy variable trap by removing x_i1 (≡ 1) from the equation?

25 25 Exercise 2.4. In a study relating college grade point average to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student the sum of hours in the four activities must be 168. (i) In the model GPA_i = β_1 + β_2 study_i + β_3 sleep_i + β_4 work_i + β_5 leisure_i + ε_i does it make sense to hold sleep, work, and leisure fixed, while changing study? (ii) Explain why this model violates Assumption 1.3. (iii) How could you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption 1.3? Assumption (spherical error variance). The error term satisfies: E(ε_i² | X) = σ² > 0 for all i (homoskedasticity); E(ε_i ε_j | X) = 0 for all i, j with i ≠ j (no correlation between observations). Exercise 2.5. Under Assumptions 1.2 and 1.4, show that Cov(y_i, y_j | X) = 0.

26 26 Assumption 1.4 and strict exogeneity imply: Var(ε_i | X) = E(ε_i² | X) = σ²; Cov(ε_i, ε_j | X) = 0; E(εε' | X) = σ²I; Var(ε | X) = σ²I. Note: E(εε' | X) is the n × n matrix whose (i, i) element is E(ε_i² | X) and whose (i, j) element, i ≠ j, is E(ε_i ε_j | X).

27 27 Exercise 2.6. Consider the savings function sav_i = β_1 + β_2 inc_i + ε_i, ε_i = inc_i z_i, where z_i is a random variable with E(z_i) = 0 and Var(z_i) = σ_z². Assume that z_i is independent of inc_j for all i, j. (i) Show that E(ε | inc) = 0; (ii) show that Assumption 1.4 is violated. The Classical Regression Model for Random Samples The sample (y, X) is a random sample if {y_i, x_i} is i.i.d. (independently and identically distributed) across observations. A random sample automatically implies: E(ε_i | X) = E(ε_i | x_i), E(ε_i² | X) = E(ε_i² | x_i). Therefore Assumptions 1.2 and 1.4 can be rephrased as: Assumption 1.2: E(ε_i | x_i) = E(ε_i) = 0; Assumption 1.4: E(ε_i² | x_i) = E(ε_i²) = σ².

28 Fixed Regressors This is a simplifying and generally an unrealistic assumption to make the statistical analysis tractable. It means that X is exactly the same in repeated samples. Sampling schemes that support this assumption: a Experimental situations. For example, suppose that y represents the yields of a crop grown on n experimental plots, and let the rows of X represent the seed varieties, irrigation and fertilizer for each plot. The experiment can be repeated as often as desired, with the same X. Only y varies across plots. b Stratified Sampling for more details see Wooldridge, chap. 9.

29 The Algebra of Least Squares OLS Minimizes the Sum of Squared Residuals Residual for observation i evaluated at β̃: y_i − x_i'β̃. Vector of residuals evaluated at β̃: y − Xβ̃. Sum of squared residuals (SSR): SSR(β̃) = Σ_{i=1}^n (y_i − x_i'β̃)² = (y − Xβ̃)'(y − Xβ̃). The OLS (Ordinary Least Squares) estimator b is such that SSR(b) is minimum: b = arg min_{β̃} SSR(β̃).

30 30 [Figure: the case K = 1, y_i = βx_i + ε_i.] Example. Consider y_i = β_1 + β_2 x_i2 + ε_i. The data: y and X [values not reproduced in the transcription]. Verify that SSR(β̃) = 42 when β̃ = (0, 1)'.

31 Normal Equations To solve the optimization problem min_{β̃} SSR(β̃) we use classical optimization. First Order Condition (FOC): ∂SSR(β̃)/∂β̃ = 0. Solve the previous equation with respect to β̃; let b be such a solution. Second Order Condition (SOC): ∂²SSR(β̃)/∂β̃∂β̃' is a positive definite matrix ⇒ b is a global minimum point.

32 32 To easily obtain the FOC we start by writing SSR(β̃) as SSR(β̃) = (y − Xβ̃)'(y − Xβ̃) = ... = y'y − 2y'Xβ̃ + β̃'X'Xβ̃. Recalling from matrix algebra that ∂(a'β̃)/∂β̃ = a and ∂(β̃'Aβ̃)/∂β̃ = 2Aβ̃ for A symmetric, we have ∂SSR(β̃)/∂β̃ = −2X'y + 2X'Xβ̃ = 0, i.e., replacing β̃ by the solution b: X'Xb = X'y, or X'(y − Xb) = 0.

33 33 This is a system with K equations and K unknowns. These equations are called the normal equations. If rank(X) = K, then X'X is nonsingular and (X'X)^{-1} exists. Therefore, if rank(X) = K we have a unique solution: b = (X'X)^{-1}X'y (the OLS estimator). The SOC is ∂²SSR(β̃)/∂β̃∂β̃' = 2X'X. If rank(X) = K then 2X'X is a positive definite matrix, thus SSR(β̃) is strictly convex in R^K. Hence b is a global minimum point. The vector of residuals evaluated at β̃ = b, e = y − Xb, is called the vector of OLS residuals or simply the residuals.
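To make the normal equations concrete, here is a minimal numpy sketch (not part of the original slides; the data are made up for illustration):

```python
import numpy as np

# Made-up data: n = 5 observations, K = 2 regressors (a constant and one variable)
X = np.array([[1.0, 2.0],
              [1.0, 4.0],
              [1.0, 5.0],
              [1.0, 7.0],
              [1.0, 9.0]])
y = np.array([3.0, 6.0, 7.0, 10.0, 12.0])

# Solve the normal equations X'X b = X'y, i.e. b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b      # OLS residuals
ssr = e @ e        # sum of squared residuals SSR(b)
print(b, ssr)
```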

34 34 The normal equations can be written as X'e = 0, i.e., (1/n) Σ_{i=1}^n x_i e_i = 0. This shows that the normal equations can be interpreted as the sample analogue of the orthogonality conditions E(x_i ε_i) = 0. Notice the reasoning: by assuming the orthogonality conditions E(x_i ε_i) = 0 in the population, we deduce by the method of moments the corresponding sample analogue (1/n) Σ_i x_i(y_i − x_i'β̃) = 0. We obtain the OLS estimator b by solving this equation with respect to β̃.

35 Two Expressions for the OLS Estimator b = (X'X)^{-1}X'y and b = (X'X/n)^{-1}(X'y/n) = S_xx^{-1} S_xy, where S_xx = X'X/n = (1/n) Σ_{i=1}^n x_i x_i' (sample average of x_i x_i') and S_xy = X'y/n = (1/n) Σ_{i=1}^n x_i y_i (sample average of x_i y_i). Example (continuation of the previous example). Consider the data [not reproduced in the transcription]. Obtain b, e and SSR(b).

36 More Concepts and Algebra The fitted value for observation i: ŷ_i = x_i'b. The vector of fitted values: ŷ = Xb. The vector of OLS residuals: e = y − Xb = y − ŷ. The projection matrix P and the annihilator M are defined as P = X(X'X)^{-1}X', M = I − P. Properties (Exercise 2.7): show that P and M are symmetric and idempotent, and that PX = X, MX = 0, ŷ = Py, e = My = Mε, SSR = e'e = y'My = ε'Mε.

37 37 The OLS estimate of σ² (the variance of the error term), denoted s², is s² = SSR/(n − K) = e'e/(n − K); s = √s² is called the standard error of the regression. The sampling error is b − β = ... = (X'X)^{-1}X'ε. Coefficient of Determination A measure of goodness of fit is the coefficient of determination R² = Σ_{i=1}^n (ŷ_i − ȳ)²/Σ_{i=1}^n (y_i − ȳ)² = 1 − Σ_{i=1}^n e_i²/Σ_{i=1}^n (y_i − ȳ)², 0 ≤ R² ≤ 1. It measures the proportion of the variation of y that is accounted for by variation in the regressors, the x_j's. Derivation of R²: [board]

38 38 [Figure: three scatterplots of y against x with fitted values ŷ, illustrating regressions with different values of R²; the numerical R² values are not recoverable from the transcription.]

39 39 "The most important thing about R² is that it is not important" (Goldberger). Why? We are concerned with parameters in a population, not with goodness of fit in the sample; we can always increase R² by adding more explanatory variables. At the limit, if K = n then R² = 1. Exercise 2.8. Prove that K = n ⇒ R² = 1 (assume that Assumption 1.3 holds). It can be proved that R² = ρ̂², where ρ̂ = [Σ_i (ŷ_i − ȳ_ŷ)(y_i − ȳ)/n]/(S_ŷ S_y). Adjusted coefficient of determination: R̄² = 1 − [(n − 1)/(n − K)](1 − R²) = 1 − [Σ_{i=1}^n e_i²/(n − K)]/[Σ_{i=1}^n (y_i − ȳ)²/(n − 1)]. Contrary to R², R̄² may decline when a variable is added to the set of independent variables.
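As a companion to these definitions, a small numpy sketch (same made-up data as in the earlier OLS sketch) computing s², the standard error of the regression, R² and the adjusted R̄²:

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 5.0], [1.0, 7.0], [1.0, 9.0]])
y = np.array([3.0, 6.0, 7.0, 10.0, 12.0])
n, K = X.shape

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - K)                         # unbiased estimate of sigma^2
ser = np.sqrt(s2)                              # standard error of the regression, s
tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
r2 = 1.0 - (e @ e) / tss                       # coefficient of determination R^2
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - K)  # adjusted R^2
print(s2, ser, r2, r2_adj)
```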

40 Finite-Sample Properties of OLS First of all we need to recognize that b is random! Assumptions: 1.1 Linearity: y_i = β_1 x_i1 + β_2 x_i2 + ... + β_K x_iK + ε_i. 1.2 Strict exogeneity: E(ε_i | X) = 0. 1.3 No multicollinearity. 1.4 Spherical error variance: E(ε_i² | X) = σ², E(ε_i ε_j | X) = 0 (i ≠ j). Proposition (finite-sample properties of b). We have: (a) (unbiasedness) Under Assumptions 1.1-1.3, E(b | X) = β. (b) (expression for the variance) Under Assumptions 1.1-1.4, Var(b | X) = σ²(X'X)^{-1}. (c) (Gauss-Markov Theorem) Under Assumptions 1.1-1.4, the OLS estimator is efficient in the class of linear unbiased estimators (it is the Best Linear Unbiased Estimator). That is, for any unbiased estimator β̂ that is linear in y, Var(b | X) ≤ Var(β̂ | X) in the matrix sense, i.e. Var(β̂ | X) − Var(b | X) is a positive semidefinite matrix. (d) Under Assumptions 1.1-1.4, Cov(b, e | X) = 0. Proof: [board]
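Since the proof is left to the board, here is a minimal sketch of part (a) (unbiasedness) along the standard lines; it is stated for reference only and is not the board proof itself:

```latex
% Using b - beta = (X'X)^{-1} X'\varepsilon (the sampling error) and strict exogeneity:
\begin{aligned}
\mathbb{E}(b \mid X)
  &= \beta + \mathbb{E}\!\left[(X'X)^{-1}X'\varepsilon \mid X\right]\\
  &= \beta + (X'X)^{-1}X'\,\mathbb{E}(\varepsilon \mid X)\\
  &= \beta \qquad \text{since } \mathbb{E}(\varepsilon \mid X)=0 .
\end{aligned}
```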

41 41 Proposition (unbiasedness of s²). Let s² = e'e/(n − K). We have E(s² | X) = E(s²) = σ². Proof: [board] An unbiased estimator of Var(b | X) is Var̂(b | X) = s²(X'X)^{-1}. Example. Consider colGPA_i = β_1 + β_2 HSGPA_i + β_3 ACT_i + β_4 SKIPPED_i + β_5 PC_i + ε_i where: colGPA: college grade point average (GPA); HSGPA: high school GPA; ACT: achievement examination for college admission; SKIPPED: average lectures missed per week; PC: a binary variable (0/1) identifying who owns a personal computer. Using a survey of 141 students (Michigan State University, Fall 1994), we obtained the following results:

42 42 These results tell us that n = 141, s = 0.325, R² = 0.259, SSR = [value not reproduced], b = [estimates not reproduced], Var̂(b | X) = ?????

43 More on Regression Algebra Regression Matrices Matrix P = X(X'X)^{-1}X'. Py: fitted values from the regression of y on X. Pz? Matrix M = I − P = I − X(X'X)^{-1}X'. My: residuals from the regression of y on X. Mz? Consider a partition of X as follows: X = [X_1 X_2]. Matrix P_1 = X_1(X_1'X_1)^{-1}X_1'. P_1 y? Matrix M_1 = I − P_1 = I − X_1(X_1'X_1)^{-1}X_1'. M_1 y?

44 Short and Long Regression Algebra Partition X as X = [X_1 X_2], X_1 of order n × K_1, X_2 of order n × K_2, K_1 + K_2 = K. Long regression: we have y = ŷ + e = Xb + e = [X_1 X_2](b_1', b_2')' + e = X_1 b_1 + X_2 b_2 + e. Short regression: suppose that we shorten the list of explanatory variables and regress y on X_1. We have y = ŷ* + e* = X_1 b_1* + e*, where b_1* = (X_1'X_1)^{-1}X_1'y and e* = M_1 y, M_1 = I − X_1(X_1'X_1)^{-1}X_1'.

45 45 How are b_1* and e* related to b_1 and e? b_1* vs. b_1. We have b_1* = (X_1'X_1)^{-1}X_1'y = (X_1'X_1)^{-1}X_1'(X_1 b_1 + X_2 b_2 + e) = b_1 + (X_1'X_1)^{-1}X_1'X_2 b_2 + (X_1'X_1)^{-1}X_1'e [the last term is 0, since X_1'e = 0] = b_1 + Fb_2, where F = (X_1'X_1)^{-1}X_1'X_2. Thus, in general, b_1* ≠ b_1. Exceptional cases: b_2 = 0 or X_1'X_2 = O ⇒ b_1* = b_1.
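A quick numerical check of the relation b_1* = b_1 + Fb_2, on made-up data (the variable names and numbers below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])       # X1: constant + one regressor
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)    # X2: correlated with X1
y = X1 @ np.array([1.0, 2.0]) + 3.0 * X2[:, 0] + rng.normal(size=n)

X = np.hstack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ y)            # long regression: b = (b1, b2)
b1, b2 = b[:2], b[2:]

b1_star = np.linalg.solve(X1.T @ X1, X1.T @ y)   # short regression coefficients
F = np.linalg.solve(X1.T @ X1, X1.T @ X2)        # F = (X1'X1)^{-1} X1'X2

print(np.allclose(b1_star, b1 + F @ b2))         # True: b1* = b1 + F b2
```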

46 46 e* vs. e. We have e* = M_1 y = M_1(X_1 b_1 + X_2 b_2 + e) = M_1 X_1 b_1 + M_1 X_2 b_2 + M_1 e = M_1 X_2 b_2 + e = v + e, where v = M_1 X_2 b_2. Thus e*'e* = e'e + v'v ≥ e'e. Thus the SSR of the short regression, e*'e*, exceeds the SSR of the long regression, e'e, and e*'e* = e'e iff v = 0, that is, iff b_2 = 0.

47 47 Example. Illustration of b_1* ≠ b_1 and e*'e* ≥ e'e. Find X, X_1, X_2, b, b_1, b_2, b_1*, e'e, e*'e*. [Data not reproduced in the transcription.]

48 Residual Regression Consider y = Xβ + ε = X_1 β_1 + X_2 β_2 + ε. Premultiplying both sides by M_1 and using M_1 X_1 = 0, we obtain M_1 y = M_1 X_1 β_1 + M_1 X_2 β_2 + M_1 ε, i.e., ỹ = X̃_2 β_2 + M_1 ε, where ỹ = M_1 y and X̃_2 = M_1 X_2. The OLS gives b_2 = (X̃_2'X̃_2)^{-1}X̃_2'ỹ = (X̃_2'X̃_2)^{-1}X̃_2'M_1 y = (X̃_2'X̃_2)^{-1}X̃_2'y. Thus b_2 = (X̃_2'X̃_2)^{-1}X̃_2'y.

49 49 Another way to prove b_2 = (X̃_2'X̃_2)^{-1}X̃_2'y (you may skip this proof). We have (X̃_2'X̃_2)^{-1}X̃_2'y = (X̃_2'X̃_2)^{-1}X̃_2'(X_1 b_1 + X_2 b_2 + e) = (X̃_2'X̃_2)^{-1}X̃_2'X_1 b_1 [= 0] + (X̃_2'X̃_2)^{-1}X̃_2'X_2 b_2 [= b_2] + (X̃_2'X̃_2)^{-1}X̃_2'e [= 0] = b_2, since: X̃_2'X_1 b_1 = X_2'M_1 X_1 b_1 = 0; (X̃_2'X̃_2)^{-1}X̃_2'X_2 b_2 = (X_2'M_1'M_1 X_2)^{-1}X_2'M_1 X_2 b_2 = (X_2'M_1 X_2)^{-1}X_2'M_1 X_2 b_2 = b_2; X̃_2'e = X_2'M_1'e = X_2'e = 0.

50 50 The conclusion is that we can obtain b_2 = (X̃_2'X̃_2)^{-1}X̃_2'y = (X̃_2'X̃_2)^{-1}X̃_2'ỹ as follows: (1) regress X_2 on X_1 to get the residuals X̃_2 = M_1 X_2. Interpretation of X̃_2: X̃_2 is X_2 after the effects of X_1 have been removed, or, X̃_2 is the part of X_2 that is uncorrelated with X_1. (2) Regress y on X̃_2 to get the coefficient b_2 of the long regression. OR: (1) same as (1); (2a) regress y on X_1 to get the residuals ỹ = M_1 y; (2b) regress ỹ on X̃_2 to get the coefficient b_2 of the long regression. The conclusion of (1) and (2) is extremely important: b_2 relates y to X_2 after controlling for the effects of X_1. This is why b_2 can be obtained from the regression of y on X̃_2, where X̃_2 is X_2 after the effects of X_1 have been removed (fixed or controlled for). This means that b_2 has in fact a ceteris paribus interpretation. To recover b_1 we consider the equation b_1* = b_1 + Fb_2: regress y on X_1, obtaining b_1* = (X_1'X_1)^{-1}X_1'y, and now b_1 = b_1* − (X_1'X_1)^{-1}X_1'X_2 b_2 = b_1* − Fb_2.
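A small numpy sketch of this two-step recipe (the Frisch-Waugh-Lovell idea), again on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = (0.5 * X1[:, 1] + rng.normal(size=n)).reshape(-1, 1)
y = X1 @ np.array([1.0, 2.0]) + 3.0 * X2[:, 0] + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

b = ols(np.hstack([X1, X2]), y)                     # long regression

# Residual regression: residualize X2 and y on X1, then regress
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
X2_tilde = M1 @ X2
y_tilde = M1 @ y
b2_residual = ols(X2_tilde, y_tilde)

print(np.allclose(b[2:], b2_residual))              # True: same coefficient on X2
```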

51 Example. Consider the example on page 9. 51

52 52 Example. Consider X = [ 1 exper tenure IQ educ ] and X 1 = [ 1 exper tenure IQ ], X 2 = educ

53 53

54 Application of Residual Regression (A) Trend Removal (time series) Suppose that y_t and x_t have a linear trend. Should the trend term be included in the regression, as in y_t = β_1 + β_2 x_t2 + β_3 x_t3 + ε_t, x_t3 = t, or should the variables first be detrended and then used without the trend term included, as in ỹ_t = β_2 x̃_t2 + ε_t? According to the previous results, the OLS coefficient b_2 is the same in both regressions. In the second regression b_2 is obtained from the regression of ỹ = M_1 y on x̃_2 = M_1 x_2, where X_1 = [1 x_3] and x_3 = (1, 2, ..., n)'.

55 55 Example. Consider TXDES: unemployment rate, INF: inflation, t: time, and TXDES_t = β_1 + β_2 INF_t + β_3 t + ε_t. We will show two ways to obtain b_2 (compare EQ01 to EQ04). [EViews output for EQ01-EQ04; coefficient values and some regressor names are not reproduced in the transcription. EQ01 has dependent variable TXDES; EQ02 has dependent variable TXDES; EQ03 has dependent variable INF; EQ04 regresses the detrended TXDES_ on the detrended INF_.]

56 56 (B) Seasonal Adjustment and Linear Regression with Seasonal Data Suppose that we have data on the variable y, quarter by quarter, for m years. A way to deal with deterministic seasonality is the following: y_t = β_1 Q_t1 + β_2 Q_t2 + β_3 Q_t3 + β_4 Q_t4 + β_5 x_t5 + ε_t, where Q_ti = 1 in quarter i and 0 otherwise. Let X = [Q_1 Q_2 Q_3 Q_4 x_5] and X_1 = [Q_1 Q_2 Q_3 Q_4]. The previous results show that b_5 can be obtained from the regression of ỹ = M_1 y on x̃_5 = M_1 x_5. It can be proved that ỹ_t = y_t − ȳ_Q1 in quarter 1, y_t − ȳ_Q2 in quarter 2, y_t − ȳ_Q3 in quarter 3, and y_t − ȳ_Q4 in quarter 4, where ȳ_Qi is the seasonal mean of quarter i.

57 57 (C) Deviations from Means Let x_1 be the summer vector (the vector of ones). Instead of regressing y on [x_1 x_2 ... x_K] to get b_1, b_2, ..., b_K, we can regress y on the demeaned regressors, i.e. on the matrix whose i-th row is (x_i2 − x̄_2, ..., x_iK − x̄_K), to get the same vector b_2, ..., b_K. We sketch the proof. Let X_2 = [x_2 ... x_K] so that ŷ = x_1 b_1 + X_2 b_2. (1) Regress X_2 on x_1 to get the residuals X̃_2 = M_1 X_2, where M_1 = I − x_1(x_1'x_1)^{-1}x_1' = I − x_1 x_1'/n.

58 58 As we know, X̃_2 = M_1 X_2 = M_1 [x_2 ... x_K] = [M_1 x_2 ... M_1 x_K], which is the matrix whose i-th row is (x_i2 − x̄_2, ..., x_iK − x̄_K). (2) Regress y (or ỹ = M_1 y) on X̃_2 to get the coefficient b_2 of the long regression: b_2 = (X̃_2'X̃_2)^{-1}X̃_2'y = (X̃_2'X̃_2)^{-1}X̃_2'ỹ. The intercept can be recovered as b_1 = b_1* − (x_1'x_1)^{-1}x_1'X_2 b_2.

59 Short and Residual Regression in the Classical Regression Model Consider: y = X_1 b_1 + X_2 b_2 + e (long regression) and y = X_1 b_1* + e* (short regression). The correct specification corresponds to the long regression: E(y | X) = X_1 β_1 + X_2 β_2 = Xβ, Var(y | X) = σ²I, etc.

60 60 (A) Short-Regression Coefficients: b_1* is a biased estimator of β_1. Given that b_1* = (X_1'X_1)^{-1}X_1'y = b_1 + Fb_2, F = (X_1'X_1)^{-1}X_1'X_2, we have E(b_1* | X) = E(b_1 + Fb_2 | X) = β_1 + Fβ_2, and Var(b_1* | X) = Var((X_1'X_1)^{-1}X_1'y | X) = (X_1'X_1)^{-1}X_1' Var(y | X) X_1(X_1'X_1)^{-1} = σ²(X_1'X_1)^{-1}. Thus, in general, b_1* is a biased estimator of β_1 (omitted-variable bias), unless: β_2 = 0, which corresponds to the case of irrelevant omitted variables; or F = O, which corresponds to the case of explanatory variables that are orthogonal in sample space.

61 61 Var(b_1* | X) ≤ Var(b_1 | X) (you may skip the proof). Consider b_1 = b_1* − Fb_2. Then Var(b_1 | X) = Var(b_1* − Fb_2 | X) = Var(b_1* | X) + Var(Fb_2 | X), since Cov(b_1*, b_2 | X) = O [board], = Var(b_1* | X) + F Var(b_2 | X) F'. Because F Var(b_2 | X) F' is positive semidefinite (nonnegative definite), Var(b_1* | X) ≤ Var(b_1 | X). This relation is still valid if β_2 = 0. In this case (β_2 = 0), regressing y on X_1 and on the irrelevant variables X_2 involves a cost: Var(b_1* | X) ≤ Var(b_1 | X), although E(b_1 | X) = β_1. In practice there may be a bias-variance trade-off between short and long regression when the target is β_1.

62 62 Exercise 2.9. Consider the standard simple regression model y_i = β_1 + β_2 x_i2 + ε_i under Assumptions 1.1 through 1.4. Thus, the usual OLS estimators b_1 and b_2 are unbiased for their respective population parameters. Let b_2* be the estimator of β_2 obtained by assuming the intercept is zero (i.e. β_1 = 0). (i) Find E(b_2* | X). Verify that b_2* is unbiased for β_2 when the population intercept β_1 is zero. Are there other cases where b_2* is unbiased? (ii) Find the variance of b_2*. (iii) Show that Var(b_2* | X) ≤ Var(b_2 | X). (iv) Comment on the trade-off between bias and variance when choosing between b_2 and b_2*. Exercise 2.10. Suppose that average worker productivity at manufacturing firms (avgprod) depends on two factors, average hours of training (avgtrain) and average worker ability (avgabil): avgprod_i = β_1 + β_2 avgtrain_i + β_3 avgabil_i + ε_i. Assume that this equation satisfies Assumptions 1.1 through 1.4. If grants have been given to firms whose workers have less than average ability, so that avgtrain and avgabil are negatively correlated, what is the likely bias in the estimator of β_2 obtained from the simple regression of avgprod on avgtrain?

63 63 (B) Short-Regression Residuals (skip this). Given that e* = M_1 y we have E(e* | X) = M_1 E(y | X) = M_1(X_1 β_1 + X_2 β_2) = X̃_2 β_2, and Var(e* | X) = Var(M_1 y | X) = M_1 Var(y | X) M_1' = σ²M_1. Thus E(e* | X) ≠ 0, unless β_2 = 0. Let's see now that the omission of explanatory variables leads to an increase in the expected SSR. We have, by R5, E(e*'e* | X) = E(y'M_1 y | X) = tr(M_1 Var(y | X)) + E(y | X)'M_1 E(y | X) = σ² tr(M_1) + β_2'X_2'M_1 X_2 β_2 = σ²(n − K_1) + β_2'X̃_2'X̃_2 β_2, and E(e'e | X) = σ²(n − K); thus E(e*'e* | X) − E(e'e | X) = σ²K_2 + β_2'X̃_2'X̃_2 β_2 > 0. Notice that: e*'e* − e'e = b_2'X̃_2'X̃_2 b_2 ≥ 0 (check: E(b_2'X̃_2'X̃_2 b_2 | X) = σ²K_2 + β_2'X̃_2'X̃_2 β_2).

64 64 (C) Residual Regression The objective is to characterize Var(b_2 | X). We know that b_2 = (X̃_2'X̃_2)^{-1}X̃_2'y. Thus Var(b_2 | X) = Var((X̃_2'X̃_2)^{-1}X̃_2'y | X) = (X̃_2'X̃_2)^{-1}X̃_2' Var(y | X) X̃_2(X̃_2'X̃_2)^{-1} = σ²(X̃_2'X̃_2)^{-1} = σ²(X_2'M_1 X_2)^{-1}. Now suppose that X = [X_1 x_K], i.e., x_K = X_2.

65 65 It follows that Var(b_K | X) = σ²/(x_K'M_1 x_K), and x_K'M_1 x_K is the sum of the squared residuals in the auxiliary regression x_K = α_1 x_1 + α_2 x_2 + ... + α_{K−1} x_{K−1} + error. One can conclude (assuming that x_1 is the summer vector): R_K² = 1 − x_K'M_1 x_K / Σ_i (x_iK − x̄_K)². Solving this equation for x_K'M_1 x_K we have x_K'M_1 x_K = (1 − R_K²) Σ_i (x_iK − x̄_K)². We get Var(b_K | X) = σ²/[(1 − R_K²) Σ_i (x_iK − x̄_K)²] = σ²/[(1 − R_K²) S²_{x_K} n].

66 66 Var(b_K | X) = σ²/[(1 − R_K²) Σ_i (x_iK − x̄_K)²] = σ²/[(1 − R_K²) S²_{x_K} n]. We can conclude that the precision of b_K is high (i.e. Var(b_K | X) is small) when: σ² is low; S²_{x_K} is high (imagine the regression wage = β_1 + β_2 educ + ε: if most people in the sample report the same education, S²_{x_K} will be low and β_2 will be estimated very imprecisely); n is high (a large sample is preferable to a small sample); R_K² is low (multicollinearity increases R_K²).
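A numerical check of this variance decomposition, with made-up data and an assumed value of σ²:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = np.ones(n)
x2 = rng.normal(size=n)
xK = 0.8 * x2 + rng.normal(scale=0.6, size=n)   # last regressor, correlated with x2
X = np.column_stack([x1, x2, xK])
sigma2 = 1.5                                    # assumed error variance

# Exact conditional variance of b_K: sigma^2 times the last diagonal element of (X'X)^{-1}
var_bK = sigma2 * np.linalg.inv(X.T @ X)[-1, -1]

# Decomposition: sigma^2 / [(1 - R_K^2) * sum_i (x_iK - xbar_K)^2]
X1 = X[:, :-1]
resid = xK - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ xK)   # auxiliary regression residuals
R2_K = 1.0 - resid @ resid / np.sum((xK - xK.mean()) ** 2)
var_bK_decomp = sigma2 / ((1.0 - R2_K) * np.sum((xK - xK.mean()) ** 2))

print(np.isclose(var_bK, var_bK_decomp))  # True
```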

67 67 Exercise. Consider: sleep: minutes of sleep at night per week; totwrk: hours worked per week; educ: years of schooling; female: binary variable equal to one if the individual is female. Do women sleep more than men? Explain the differences between the estimates of the coefficient on female in the two regressions below. [EViews output, coefficient values not reproduced in the transcription: SLEEP regressed on C and FEMALE; SLEEP regressed on C, TOTWRK, EDUC and FEMALE.]

68 68 Example. The goal is to analyze the impact of another year of education on wages. Consider: wage: monthly earnings; KWW: knowledge of world work score (KWW is a general test of work-related abilities); educ: years of education; exper: years of work experience; tenure: years with current employer. [EViews output with White heteroskedasticity-consistent standard errors, coefficient values not reproduced in the transcription: LOGWAGE regressed on C and EDUC; on C, EDUC, EXPER and TENURE; and on C, EDUC, EXPER, TENURE, IQ and KWW.]

69 69 Exercise. Consider y_i = β_1 + β_2 x_i2 + ε_i, i = 1, ..., n, where x_i2 is an impulse dummy, i.e. x_2 is a column vector with n − 1 zeros and only one 1. To simplify let us suppose that this 1 is the first element of x_2, i.e. x_2 = (1, 0, ..., 0)'. Find and interpret the coefficient from the regression of y on x̃_1 = M_2 x_1, where M_2 = I − x_2(x_2'x_2)^{-1}x_2' (x̃_1 is the residual vector from the regression of x_1 on x_2). Exercise. Consider the long regression model under Assumptions 1.1 through 1.4, y = X_1 b_1 + X_2 b_2 + e, and the following coefficients obtained from the short regressions: b_1* = (X_1'X_1)^{-1}X_1'y, b_2* = (X_2'X_2)^{-1}X_2'y. Decide if you agree or disagree with the following statement: if Cov(b_1, b_2 | X_1, X_2) = O (zero matrix) then b_1* = b_1 and b_2* = b_2.

70 Multicollinearity If rank(X) < K then b is not defined. This is called strict multicollinearity. When this happens, the statistical software will be unable to construct (X'X)^{-1}. Since the error is discovered quickly, this is rarely a problem for applied econometric practice. The more relevant situation is near multicollinearity, which is often called multicollinearity for brevity. This is the situation where X'X is near singular, i.e. the columns of X are close to linearly dependent. Consequence: the individual coefficient estimates will be imprecise. We have shown that Var(b_K | X) = σ²/[(1 − R_K²) S²_{x_K} n], where R_K² is the coefficient of determination in the auxiliary regression x_K = α_1 x_1 + α_2 x_2 + ... + α_{K−1} x_{K−1} + error.

71 71 Exercise Do you agree with the following quotations: a But more data is no remedy for multicollinearity if the additional data are simply "more of the same." So obtaining lots of small samples from the same population will not help Johnston, 1984; b Another important point is that a high degree of correlation between certain independent variables can be irrelevant as to how well we can estimate other parameters in the model. Exercise Suppose you postulate a model explaining final exam score in terms of class attendance. Thus, the dependent variable is final exam score, and the key explanatory variable is number of classes attended. To control for student abilities and efforts outside the classroom, you include among the explanatory variables cumulative GPA, SAT score, and measures of high school performance. Someone says, You cannot hope to learn anything from this exercise because cumulative GPA, SAT score, and high school performance are likely to be highly collinear. What should be your answer?

72 Statistical Inference under Normality Assumption (normality of the error term): ε | X ~ Normal. Assumption 1.5 together with Assumptions 1.2 and 1.4 implies that ε | X ~ N(0, σ²I) and y | X ~ N(Xβ, σ²I). Suppose that we want to test H_0: β_2 = 1. Although Proposition 1.1 guarantees that, on average, b_2 (the OLS estimate of β_2) equals 1 if the hypothesis H_0: β_2 = 1 is true, b_2 may not be exactly equal to 1 for a particular sample at hand. Obviously, we cannot conclude that the restriction is false just because the estimate b_2 differs from 1. In order for us to decide whether the sampling error b_2 − 1 is too large for the restriction to be true, we need to construct from the sampling error some test statistic whose probability distribution is known given the truth of the hypothesis. The relevant theory is built from the following results:

73 73 1. z ~ N(0, I) ⇒ z'z ~ χ²_n. 2. w_1 ~ χ²_m, w_2 ~ χ²_n, w_1 and w_2 independent ⇒ (w_1/m)/(w_2/n) ~ F(m, n). 3. w ~ χ²_n, z ~ N(0, 1), w and z independent ⇒ z/√(w/n) ~ t_n. 4. Asymptotic results: v ~ F(m, n) ⇒ mv →d χ²_m as n → ∞; u ~ t_n ⇒ u →d N(0, 1) as n → ∞. 5. Consider the n × 1 vector y | X ~ N(Xβ, Σ). Then w = (y − Xβ)'Σ^{-1}(y − Xβ) ~ χ²_n.

74 74 6. Consider the n × 1 vector ε | X ~ N(0, I). Let M be an n × n idempotent matrix with rank(M) = r ≤ n. Then ε'Mε | X ~ χ²_r. 7. Consider the n × 1 vector ε | X ~ N(0, I). Let M be an n × n idempotent matrix with rank(M) = r ≤ n, and let L be a matrix such that LM = O. Let t_1 = Mε and t_2 = Lε. Then t_1 and t_2 are independent random vectors. 8. b | X ~ N(β, σ²(X'X)^{-1}). 9. Let r = Rβ, R of order p × K with rank(R) = p (in Hayashi's notation p is equal to #r). Then Rb | X ~ N(r, σ²R(X'X)^{-1}R').

75 10. Let b_k be the kth element of b and q_kk the (k, k) element of (X'X)^{-1}. Then b_k | X ~ N(β_k, σ²q_kk), or z_k = (b_k − β_k)/(σ√q_kk) ~ N(0, 1). 11. w = (Rb − r)'[R(X'X)^{-1}R']^{-1}(Rb − r)/σ² ~ χ²_p. 12. w_k = (b_k − β_k)²/(σ²q_kk) ~ χ²_1. 13. w_0 = e'e/σ² ~ χ²_{n−K}. 14. The random vectors b and e are independent. 15. Each of the statistics e, e'e, w_0, s², Var̂(b), is independent of each of the statistics b, b_k, Rb, w, w_k.

76 16. t_k = (b_k − β_k)/σ̂_{b_k} ~ t_{n−K}, where σ̂²_{b_k} is the (k, k) element of s²(X'X)^{-1}. 17. (Rb − Rβ)/(s√(R(X'X)^{-1}R')) ~ t_{n−K}, where R is of type 1 × K. 18. F = (Rb − r)'[R(X'X)^{-1}R']^{-1}(Rb − r)/(p s²) ~ F(p, n − K). Exercise. Prove results #8, #9, #16 and #18 (take the other results as given). The two most important results are: t_k = (b_k − β_k)/σ̂_{b_k} = (b_k − β_k)/SE(b_k) ~ t_{n−K} and F = (Rb − r)'[R(X'X)^{-1}R']^{-1}(Rb − r)/(p s²) ~ F(p, n − K).

77 Confidence Intervals and Regions Let t_{α/2} ≡ t_{α/2}(n − K) be such that P(|t| < t_{α/2}) = 1 − α.

78 78 Let F_α ≡ F_α(p, n − K) be such that P(F > F_α) = α, i.e. P(F ≤ F_α) = 1 − α.

79 79 (1 − α)100% CI for an individual slope coefficient β_k: {β_k : |b_k − β_k|/σ̂_{b_k} ≤ t_{α/2}}, i.e. b_k ± t_{α/2} σ̂_{b_k}. (1 − α)100% CI for a single linear combination of the elements of β (p = 1), Rβ: {Rβ : |Rb − Rβ|/(s√(R(X'X)^{-1}R')) ≤ t_{α/2}}, i.e. Rb ± t_{α/2} s√(R(X'X)^{-1}R'). In this case R is a 1 × K vector. (1 − α)100% confidence region for the parameter vector θ = Rβ: {θ : (Rb − θ)'[R(X'X)^{-1}R']^{-1}(Rb − θ)/s² ≤ pF_α}. (1 − α)100% confidence region for the parameter vector β (consider R = I in the previous case): {β : (b − β)'X'X(b − β)/s² ≤ pF_α}, with p = K.
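A minimal Python/scipy sketch of the individual-coefficient confidence interval (made-up data; scipy's t quantile plays the role of t_{α/2}(n − K)):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
K = X.shape[1]

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))   # estimated standard errors of b_k

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)        # t_{alpha/2}(n - K)
ci_low, ci_high = b - t_crit * se, b + t_crit * se
print(list(zip(ci_low, ci_high)))                    # 95% CI for each coefficient
```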

80 80 Exercise. Consider y_i = β_1 x_i1 + β_2 x_i2 + ε_i where y_i = wages_i − mean(wages), x_i1 = educ_i − mean(educ), x_i2 = exper_i − mean(exper) (variables in deviations from means). The results are: [EViews output, coefficient values not reproduced in the transcription]; X'X = [matrix not reproduced]; (X'X)^{-1} = [matrix not reproduced]. (a) Build the 95% confidence interval for β_2. (b) Build the 95% confidence interval for β_1 + β_2. (c) Build the 95% confidence region for the parameter vector β.

81 81 Confidence regions in EViews: [figure showing the 90% and 95% confidence regions (ellipses) for the parameter vector β in the (β_1, β_2) plane].

82 Testing on a Single Parameter Suppose that we have a hypothesis about the kth regression coefficient: H_0: β_k = β_k^0 (β_k^0 is a specific value, e.g. zero), and that this hypothesis is tested against the alternative hypothesis H_1: β_k ≠ β_k^0. We do not reject H_0 at the α100% level if β_k^0 lies within the (1 − α)100% CI for β_k, i.e. b_k ± t_{α/2} σ̂_{b_k}; we reject H_0 otherwise. Equivalently, calculate the test statistic t_obs = (b_k − β_k^0)/σ̂_{b_k} and: if |t_obs| > t_{α/2} then reject H_0; if |t_obs| ≤ t_{α/2} then do not reject H_0.

83 83 The reasoning is as follows. Under the null hypothesis we have t_k^0 = (b_k − β_k^0)/σ̂_{b_k} ~ t_{n−K}. If we observe |t_obs| > t_{α/2} and H_0 is true, then a low-probability event has occurred. We take |t_obs| > t_{α/2} as evidence against the null, and the decision should be to reject H_0. Other cases: H_0: β_k = β_k^0 vs. H_1: β_k > β_k^0: if t_obs > t_α then reject H_0 at the α100% level; otherwise do not reject H_0. H_0: β_k = β_k^0 vs. H_1: β_k < β_k^0: if t_obs < −t_α then reject H_0 at the α100% level; otherwise do not reject H_0.

84 Issues in Hypothesis Testing p-value The p-value (or p) is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. p is an informal measure of the evidence about the null hypothesis. Example. Consider H_0: β_k = β_k^0 vs. H_1: β_k ≠ β_k^0; then p-value = 2P(|t_k^0| > |t_obs| | H_0 is true). A p-value of 0.02 shows little evidence supporting H_0: at the 5% level you should reject the H_0 hypothesis. Example. Consider H_0: β_k = β_k^0 vs. H_1: β_k > β_k^0; then p-value = P(t_k^0 > t_obs | H_0 is true) (EViews: divide the reported p-value by two).
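For completeness, a small self-contained sketch of the t test and its two- and one-sided p-values (made-up data; scipy's survival function gives the tail probability):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.3]) + rng.normal(size=n)
K = X.shape[1]

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# H0: beta_2 = 0 (index 1 is the slope)
t_obs = (b[1] - 0.0) / se[1]
p_two_sided = 2 * stats.t.sf(abs(t_obs), df=n - K)   # 2 P(|t| > |t_obs| | H0)
p_one_sided = stats.t.sf(t_obs, df=n - K)            # P(t > t_obs | H0), for H1: beta_2 > 0
print(t_obs, p_two_sided, p_one_sided)
```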

85 85 Reporting the outcome of a test Correct wording in reporting the outcome of a test involving H_0: β_k = β_k^0 vs. H_1: β_k ≠ β_k^0: when the null is rejected we say that b_k (not β_k) is significantly different from β_k^0 at the α100% level; when the null isn't rejected we say that b_k (not β_k) is not significantly different from β_k^0 at the α100% level. Correct wording in reporting the outcome of a test involving H_0: β_k = 0 vs. H_1: β_k ≠ 0: when the null is rejected we say that b_k (not β_k) is significantly different from zero at the α100% level, or that the variable associated with b_k is statistically significant at the α100% level; when the null isn't rejected we say that b_k (not β_k) is not significantly different from zero at the α100% level, or that the variable is not statistically significant at the α100% level.

86 86 More Remarks: Rejection of the null is not proof that the null is false. Why? Acceptance of the null is not proof that the null is true. Why? We prefer to use the language "we fail to reject H_0 at the x% level" rather than "H_0 is accepted at the x% level". In a test of type H_0: β_k = β_k^0, if σ̂_{b_k} is large (b_k is an imprecise estimator) it is more difficult to reject the null: the sample contains little information about the true value of the parameter β_k. Remember that σ̂_{b_k} depends on σ², S²_{x_k}, n and R_k².

87 87 Statistical Versus Economic Significance The statistical significance of a variable is determined by the size of t_obs = b_k/SE(b_k), whereas the economic significance of a variable is related to the size and sign of b_k. Example. Suppose that in a business activity we have log wage_i = ... + β_2 female_i + ..., n = 600 [coefficient estimates not reproduced in the transcription], and H_0: β_2 = 0 vs. H_1: β_2 ≠ 0. We have: t_k^0 = b_2/σ̂_{b_2} ~ t_{600−K} ≈ N(0, 1) under the null; t_obs = 10, p-value = 2P(|t_k^0| > 10 | H_0 is true) ≈ 0. Discuss statistical versus economic significance.

88 88 Exercise. Can we say that students at smaller schools perform better than those at larger schools? To discuss this hypothesis we consider data on 408 high schools in Michigan for the year 1993 (see Wooldridge, chapter 4). Performance is measured by the percentage of students receiving a passing score on a tenth grade math test (math10). School size is measured by student enrollment (enroll). We will control for two other factors, average annual teacher compensation (totcomp) and the number of staff per one thousand students (staff). Teacher compensation is a measure of teacher quality, and staff size is a rough measure of how much attention students receive. The output below reports the results. Answer the initial question. [EViews output, coefficient values not reproduced in the transcription: MATH10 regressed on C, TOTCOMP, STAFF and ENROLL.]

89 89 Exercise. We want to relate the median housing price (price) in the community to various community characteristics: nox is the amount of nitrous oxide in the air, in parts per million; dist is a weighted distance of the community from five employment centers, in miles; rooms is the average number of rooms in houses in the community; and stratio is the average student-teacher ratio of schools in the community. Can we conclude that the elasticity of price with respect to nox is −1? Sample: 506 communities in the Boston area (see Wooldridge, chapter 4). [EViews output, coefficient values not reproduced in the transcription: LOGPRICE regressed on C, LOGNOX, LOGDIST, ROOMS and STRATIO.]

90 Test on a Set of Parameters I Suppose that we have a joint null hypothesis about β: H_0: Rβ = r vs. H_1: Rβ ≠ r, where Rβ is p × 1 and R is p × K. The test statistic is F^0 = (Rb − r)'[R(X'X)^{-1}R']^{-1}(Rb − r)/(p s²). Let F_obs be the observed test statistic. We reject H_0 if F_obs > F_α (or if p-value < α); we do not reject H_0 if F_obs ≤ F_α. The reasoning is as follows. Under the null hypothesis we have F^0 ~ F(p, n − K). If we observe F^0 > F_α and H_0 is true, then a low-probability event has occurred.

91 91 In the case p = 1 (single linear combination of the elements of β) one may use the test statistic t^0 = (Rb − Rβ)/(s√(R(X'X)^{-1}R')) ~ t_{n−K}. Example. We consider a simple model to compare the returns to education at junior colleges and four-year colleges; for simplicity, we refer to the latter as universities (see Wooldridge, chap. 4). The model is log wages_i = β_1 + β_2 jc_i + β_3 univ_i + β_4 exper_i + ε_i. The population includes working people with a high school degree. jc is the number of years attending a two-year college and univ is the number of years at a four-year college. Note that any combination of junior college and college is allowed, including jc = 0 and univ = 0. The hypothesis of interest is whether a year at a junior college is worth a year at a university: this is stated as H_0: β_2 = β_3. Under H_0, another year at a junior college and another year at a university lead to the same ceteris paribus percentage increase in wage. The alternative of interest is one-sided: a year at a junior college is worth less than a year at a university. This is stated as H_1: β_2 < β_3.
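A numpy/scipy sketch of the F statistic for a restriction of the form Rβ = r, mirroring a hypothesis like β_2 = β_3 on made-up data (the numbers are illustrative, not the textbook data set):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # constant + 3 regressors
y = X @ np.array([1.0, 0.5, 0.5, -0.2]) + rng.normal(size=n)
K = X.shape[1]

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)
XtX_inv = np.linalg.inv(X.T @ X)

# H0: beta_2 = beta_3, i.e. R beta = r with R = [0, 1, -1, 0], r = 0 (p = 1)
R = np.array([[0.0, 1.0, -1.0, 0.0]])
r = np.array([0.0])
p = R.shape[0]

diff = R @ b - r
F_obs = diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff) / (p * s2)
p_value = stats.f.sf(F_obs, p, n - K)
print(F_obs, p_value)
```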

92 92 [EViews output, coefficient values not reproduced in the transcription: LWAGE regressed on C, JC, UNIV and EXPER.] (X'X)^{-1} = [matrix not reproduced]. Under the null, the test statistic is t^0 = (Rb − Rβ)/(s√(R(X'X)^{-1}R')) ~ t_{n−K}.

93 93 We have: s = [value not reproduced]; R = [0 1 −1 0]; R(X'X)^{-1}R' = [value not reproduced]; Rb = b_2 − b_3 = [value not reproduced]; Rβ = β_2 − β_3 = 0 under H_0; t_obs = [value not reproduced]; t_0.05 = [value not reproduced]. We do not reject H_0 at the 5% level. There is no evidence against β_2 = β_3 at the 5% level.

94 94 Remark: in this exercise t^0 can be written as t^0 = Rb/(s√(R(X'X)^{-1}R')) = (b_2 − b_3)/√(Var̂(b_2 − b_3)) = (b_2 − b_3)/SE(b_2 − b_3). Exercise 2.20 (continuation). Propose another way to test H_0: β_2 = β_3 against H_1: β_2 < β_3 along the following lines: define θ = β_2 − β_3; write β_2 = θ + β_3; plug this into the equation log wages_i = β_1 + β_2 jc_i + β_3 univ_i + β_4 exper_i + ε_i and test θ = 0. Use the database available on the webpage of the course.

95 Test on a Set of Parameters II We focus on another way to test H_0: Rβ = r vs. H_1: Rβ ≠ r, where Rβ is p × 1 and R is p × K. It can be proved that F^0 = (Rb − r)'[R(X'X)^{-1}R']^{-1}(Rb − r)/(p s²) = [(e*'e* − e'e)/p]/[e'e/(n − K)] = [(R² − R*²)/p]/[(1 − R²)/(n − K)] ~ F(p, n − K), where * refers to the short regression, i.e. the regression subjected to the constraint Rβ = r.

96 96 Example. Consider once again the equation log wages_i = β_1 + β_2 jc_i + β_3 univ_i + β_4 exper_i + ε_i and H_0: β_2 = β_3 against H_1: β_2 ≠ β_3. The results of the regression subjected to the constraint β_2 = β_3 are: [EViews output, coefficient values not reproduced in the transcription: LWAGE regressed on C, JC+UNIV and EXPER]. We have p = 1, e*'e* = [value not reproduced], e'e = [value not reproduced], and F_obs = [(e*'e* − e'e)/p]/[e'e/(n − K)] = 2.151, F_0.05 = 3.84. We do not reject the null at the 5% level, since F_obs = 2.151 < F_0.05 = 3.84.

97 97 In the case "all slopes zero" (test of significance of the complete regression), it can be proved that F^0 equals F^0 = [R²/(K − 1)]/[(1 − R²)/(n − K)]. Under the null H_0: β_k = 0, k = 2, 3, ..., K, we have F^0 ~ F(K − 1, n − K). Exercise. Consider the results: [EViews output, coefficient values not reproduced in the transcription: Y regressed on C, X2 and X3]. Test: (a) H_0: β_2 = 0 vs. H_1: β_2 ≠ 0; (b) H_0: β_3 = 0 vs. H_1: β_3 ≠ 0; (c) H_0: β_2 = 0, β_3 = 0 vs. H_1: β_i ≠ 0 for some i = 2, 3. (d) Are x_i2 and x_i3 truly relevant variables? How would you explain the results you obtained in parts (a), (b) and (c)?

98 Relation to Maximum Likelihood Having specified the distribution of the error vector, we can use the maximum likelihood (ML) principle to estimate the model parameters θ = (β', σ²)'. The Maximum Likelihood Principle ML principle: choose the parameter estimates to maximize the probability of obtaining the data. Maximizing the joint density associated with the data, f(y, X; θ), leads to the same solution. Therefore: the ML estimator of θ is arg max_θ f(y, X; θ).

99 99 Example (without X). We flipped a coin 10 times. If heads then y = 1. Obviously y ~ Bernoulli(θ). We don't know if the coin is fair, so we treat E(Y) = θ as an unknown parameter. Suppose that Σ_{i=1}^{10} y_i = 6. We have f(y; θ) = f(y_1, ..., y_n; θ) = Π_{i=1}^n f(y_i; θ) = θ^{y_1}(1 − θ)^{1−y_1} ... θ^{y_n}(1 − θ)^{1−y_n} = θ^{Σ_i y_i}(1 − θ)^{10 − Σ_i y_i} = θ^6(1 − θ)^4. [Figure: plot of the joint density as a function of θ.]

100 100 To obtain the ML estimate of θ we proceed with d[θ^6(1 − θ)^4]/dθ = 0 ⇒ θ̂ = 6/10, and since d²[θ^6(1 − θ)^4]/dθ² < 0 at θ̂, θ̂ = 0.6 maximizes f(y; θ). θ̂ is the most likely value of θ, that is, the value that maximizes the probability of observing y_1, ..., y_10. Notice that the ML estimator is ȳ. Since log x (x > 0) is a strictly increasing function we have: θ̂ maximizes f(y; θ) iff θ̂ maximizes log f(y; θ), that is, θ̂ = arg max_θ f(y, X; θ) ⟺ θ̂ = arg max_θ log f(y, X; θ). In most cases we prefer to solve max_θ log f(y, X; θ) rather than max_θ f(y, X; θ), since the log transformation greatly simplifies the likelihood (products become sums).
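A tiny numerical illustration of the same maximization, assuming a sample with six heads in ten flips (the ordering of the observations is made up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])   # six heads in ten flips

def neg_loglik(theta):
    # Bernoulli log-likelihood: sum_i [ y_i log(theta) + (1 - y_i) log(1 - theta) ]
    return -np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, y.mean())   # both are about 0.6: the ML estimate equals the sample mean
```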

101 Conditional versus Unconditional Likelihood The joint density f(y, X; ζ) is in general difficult to handle. Consider: f(y, X; ζ) = f(y | X; θ) f(X; ψ), ζ = (θ, ψ), log f(y, X; ζ) = log f(y | X; θ) + log f(X; ψ). In general we don't know f(X; ψ). Example. Consider y_i = β_1 x_i1 + β_2 x_i2 + ε_i where ε_i | X ~ N(0, σ²), so y_i | X ~ N(x_i'β, σ²), and the regressor x_i2 ~ N(μ_x, σ_x²). Thus θ = (β', σ²)', ψ = (μ_x, σ_x²)', ζ = (θ', ψ')'. If there is no functional relationship between θ and ψ (such as a subset of ψ being a function of θ), then maximizing log f(y, X; ζ) with respect to ζ is achieved by separately maximizing f(y | X; θ) with respect to θ and maximizing f(X; ψ) with respect to ψ. Thus the ML estimate of θ also maximizes the conditional likelihood f(y | X; θ).

102 The Log Likelihood for the Regression Model Assumption 1.5 (the normality assumption) together with Assumptions 1.2 and 1.4 implies that the distribution of ε conditional on X is N(0, σ²I). Thus, ε | X ~ N(0, σ²I), y | X ~ N(Xβ, σ²I), f(y | X; θ) = (2πσ²)^{−n/2} exp{−(1/(2σ²))(y − Xβ)'(y − Xβ)}, and log f(y | X; θ) = −(n/2) log(2πσ²) − (1/(2σ²))(y − Xβ)'(y − Xβ). It can be proved that log f(y | X; θ) = Σ_{i=1}^n log f(y_i | x_i) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (y_i − x_i'β)². Proposition (ML estimator of β and σ²). Suppose Assumptions 1.1-1.5 hold. Then: the ML estimator of β is (X'X)^{-1}X'y; the ML estimator of σ² is e'e/n (≠ s² = e'e/(n − K)).
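For reference, a short sketch (the standard argument, not the slide's own derivation) of why the ML estimator of β under normality coincides with OLS:

```latex
% For any fixed \sigma^2, maximizing the conditional log-likelihood over \beta
% is the same as minimizing the sum of squared residuals:
\begin{aligned}
\log f(y\mid X;\beta,\sigma^2)
  &= -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}(y-X\beta)'(y-X\beta),\\
\arg\max_{\beta}\,\log f(y\mid X;\beta,\sigma^2)
  &= \arg\min_{\beta}\,(y-X\beta)'(y-X\beta) = b = (X'X)^{-1}X'y .
\end{aligned}
% Substituting b and maximizing over \sigma^2 then gives \hat\sigma^2 = e'e/n.
```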

103 103 We know that E(s²) = σ². Therefore: E(e'e/n) ≠ σ², but lim_{n→∞} E(e'e/n) = σ². Proposition (b is the Best Unbiased Estimator, BUE). Under Assumptions 1.1-1.5, the OLS estimator b of β is BUE in that any other unbiased (but not necessarily linear) estimator has larger conditional variance in the matrix sense. This result should be distinguished from the Gauss-Markov Theorem, which says that b is minimum variance among those estimators that are unbiased and linear in y. Proposition 1.6 says that b is minimum variance in a larger class of estimators that includes nonlinear unbiased estimators. This stronger statement is obtained under the normality assumption (Assumption 1.5), which is not assumed in the Gauss-Markov Theorem. Put differently, the Gauss-Markov Theorem does not exclude the possibility of some nonlinear estimator beating OLS, but this possibility is ruled out by the normality assumption.

104 104 Exercise. Suppose y_i = x_i'β + ε_i where ε_i | X ~ t_v. Assume that Assumptions 1.1-1.4 hold. Use your intuition to answer true or false to the following statements: (a) b is the BLUE; (b) b is the BUE; (c) the BUE estimator can only be obtained numerically (i.e. there is no closed formula for the BUE estimator). Just out of curiosity, notice that the log-likelihood function is Σ_{i=1}^n log f(y_i | x_i) = −(n/2) log σ² − (n/2) log π − (n/2) log v + n log[Γ((v+1)/2)/Γ(v/2)] − ((v+1)/2) Σ_{i=1}^n log[1 + (y_i − x_i'β)²/(v σ²)].

105 Generalized Least Squares (GLS) We have assumed that E(ε_i² | X) = Var(ε_i | X) = σ² > 0 for all i (homoskedasticity), and E(ε_i ε_j | X) = 0 for all i, j with i ≠ j (no correlation between observations). Matrix notation: E(εε' | X) is the n × n matrix with diagonal elements E(ε_i² | X) = σ² and off-diagonal elements E(ε_i ε_j | X) = 0, i.e. E(εε' | X) = σ²I.

106 106 The assumption E(εε' | X) = σ²I is violated if either E(ε_i² | X) depends on X (heteroskedasticity), or E(ε_i ε_j | X) ≠ 0 (serial correlation; we will analyze this case later). Let's assume now that E(εε' | X) = σ²V, where V depends on X. The model y = Xβ + ε based on Assumptions 1.1-1.3 and E(εε' | X) = σ²V is called the generalized regression model. Notice that, by definition, we always have E(εε' | X) = Var(ε | X) = Var(y | X).

107 107 Example (case where E(ε_i² | X) depends on X). Consider the following model y_i = β_1 + β_2 x_i2 + ε_i to explain household expenditure on food (y) as a function of household income. Typical behavior: low-income households do not have the option of extravagant food tastes; they have few choices and are almost forced to spend a particular portion of their income on food. High-income households can have simple food tastes or extravagant food tastes: income by itself is likely to be relatively less important as an explanatory variable. [Scatterplot of expenditure (y) against income (x).]

108 108 If e accurately reflects the behavior of ε, the information in the previous figure suggests that the variability of y_i increases as income increases; thus it is reasonable to suppose that Var(y_i | x_i2) is a function of x_i2. This is the same as saying that E(ε_i² | x_i2) is a function of x_i2. For example, if E(ε_i² | x_i2) = σ²x_i2², then E(εε' | X) = σ² diag(x_12², x_22², ..., x_n2²) = σ²V ≠ σ²I.

109 Consequences of Relaxing Assumption 1.4 1. The Gauss-Markov Theorem no longer holds for the OLS estimator. The BLUE is some other estimator. 2. The t-ratio is not distributed as the t distribution. Thus, the t-test is no longer valid. The same comments apply to the F-test. Note that Var(b | X) is no longer σ²(X'X)^{-1}. In effect, Var(b | X) = Var((X'X)^{-1}X'y | X) = (X'X)^{-1}X' Var(y | X) X(X'X)^{-1} = σ²(X'X)^{-1}X'VX(X'X)^{-1}. On the other hand, E(s² | X) = E(e'e | X)/(n − K) = tr(Var(e | X))/(n − K) = σ² tr(MVM)/(n − K) = σ² tr(MV)/(n − K), which in general differs from σ². The conventional standard errors are incorrect when Var(y | X) ≠ σ²I. Confidence region and hypothesis test procedures based on the classical regression model are not valid.

110
However, the OLS estimator is still unbiased, because the unbiasedness result (Proposition 1.1 (a)) does not require Assumption 1.4. In effect,
E[b|X] = (X'X)⁻¹X'E[y|X] = (X'X)⁻¹X'Xβ = β, so E[b] = β.

Options in the presence of E[εε'|X] ≠ σ²I:
- Use b to estimate β and Var(b|X) = σ²(X'X)⁻¹X'VX(X'X)⁻¹ for inference purposes. Note that y|X ~ N(Xβ, σ²V) implies b|X ~ N(β, σ²(X'X)⁻¹X'VX(X'X)⁻¹). This is not a good solution: if you know V you may use a more efficient estimator, as we will see below. Later on, in the chapter on Large-Sample Theory, we will find that σ²V may be replaced by a consistent estimator.
- Search for a better estimator of β.

111
Efficient Estimation with Known V

If the matrix function V is known, a BLUE estimator for β, called generalized least squares (GLS), can be deduced. The basic idea of the derivation is to transform the generalized regression model into a model that satisfies all the assumptions of the classical regression model, including Assumption 1.4.

Consider y = Xβ + ε, E[εε'|X] = σ²V. We multiply both sides of the equation by a nonsingular matrix C (depending on X):
Cy = CXβ + Cε, i.e. ỹ = X̃β + ε̃,
such that the transformed error ε̃ verifies E[ε̃ε̃'|X] = σ²I. Since
E[ε̃ε̃'|X] = E[Cεε'C'|X] = C E[εε'|X] C' = σ² CVC',
this requires CVC' = I.

112
Given CVC' = I, how do we find C? Since V is by construction symmetric and positive definite, there exists a nonsingular n×n matrix C such that
V = C⁻¹(C')⁻¹, or V⁻¹ = C'C.
Note that CVC' = CC⁻¹(C')⁻¹C' = I.

It is easy to see that if y = Xβ + ε satisfies Assumptions 1.1-1.3 and Assumption 1.5 but not Assumption 1.4, then ỹ = X̃β + ε̃, where ỹ = Cy and X̃ = CX, satisfies Assumptions 1.1-1.5. Let
β̂_GLS = (X̃'X̃)⁻¹X̃'ỹ = (X'V⁻¹X)⁻¹X'V⁻¹y.
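As an illustrative sketch (not part of the original slides), the following Python snippet checks numerically that the closed form (X'V⁻¹X)⁻¹X'V⁻¹y coincides with OLS applied to the transformed data Cy, CX. The data-generating process, coefficients and variance function are invented purely for the example.

```python
# Minimal numerical check of the GLS formula (illustrative sketch; data are simulated).
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# Known V: heteroskedastic diagonal, V_ii = exp(x_i2)  (an assumed variance function)
v = np.exp(X[:, 1])
eps = rng.normal(size=n) * np.sqrt(v)               # Var(eps_i | X) proportional to v_i
y = X @ beta + eps

# GLS via the closed form (X'V^{-1}X)^{-1} X'V^{-1}y
Vinv = np.diag(1.0 / v)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Equivalently, OLS on the transformed data Cy, CX with C = diag(1/sqrt(v_i)),
# so that V^{-1} = C'C and the transformed errors are homoskedastic.
C = np.diag(1.0 / np.sqrt(v))
beta_tilde, *_ = np.linalg.lstsq(C @ X, C @ y, rcond=None)

print(beta_gls, beta_tilde)   # the two computations coincide
```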

113
Proposition 1.7 (finite-sample properties of GLS).
(a) (unbiasedness) Under Assumptions 1.1-1.3, E[β̂_GLS|X] = β.
(b) (expression for the variance) Under Assumptions 1.1-1.3 and the assumption E[εε'|X] = σ²V (the conditional second moment is proportional to V), Var(β̂_GLS|X) = σ²(X'V⁻¹X)⁻¹.
(c) (the GLS estimator is BLUE) Under the same set of assumptions as in (b), the GLS estimator is efficient in that the conditional variance of any unbiased estimator that is linear in y is greater than or equal to Var(β̂_GLS|X) in the matrix sense.

Remark: Var(b|X) - Var(β̂_GLS|X) is a positive semidefinite matrix. In particular, Var(b_j|X) ≥ Var(β̂_j,GLS|X).

114
A Special Case: Weighted Least Squares (WLS)

Suppose that E[ε_i²|X] = σ² v_i, where v_i is a function of X. Recall: C is such that V⁻¹ = C'C. We have
V = diag(v_1, ..., v_n),  V⁻¹ = diag(1/v_1, ..., 1/v_n),  C = diag(1/√v_1, ..., 1/√v_n).

115
Now
ỹ = Cy = (y_1/√v_1, y_2/√v_2, ..., y_n/√v_n)',
X̃ = CX, whose i-th row is (1/√v_i, x_i2/√v_i, ..., x_iK/√v_i).
Another way to express these relations:
ỹ_i = y_i/√v_i,  x̃_ik = x_ik/√v_i,  i = 1, 2, ..., n.

116
Example. Suppose that
y_i = β_1 + β_2 x_i2 + ε_i,  Var(y_i|x_i2) = Var(ε_i|x_i2) = σ² e^{x_i2},  Cov(y_i, y_j | x_i2, x_j2) = 0,
so that V = diag(e^{x_12}, ..., e^{x_n2}).
Transformed model (matrix notation): Cy = CXβ + Cε, whose i-th row is
y_i/√(e^{x_i2}) = (1/√(e^{x_i2})) β_1 + (x_i2/√(e^{x_i2})) β_2 + ε_i/√(e^{x_i2});
or, in scalar notation,
ỹ_i = x̃_i1 β_1 + x̃_i2 β_2 + ε̃_i,  i = 1, ..., n.

117
Notice:
Var(ε̃_i|X) = Var(ε_i/√(e^{x_i2}) | x_i2) = (1/e^{x_i2}) Var(ε_i|x_i2) = (1/e^{x_i2}) σ² e^{x_i2} = σ².

Efficient estimation under a known form of heteroskedasticity is called weighted regression or weighted least squares (WLS).

Example. Consider wage_i = β_1 + β_2 educ_i + β_3 exper_i + ε_i.
[Scatter plots of WAGE against EXPER and WAGE against EDUC.]

118
[EViews output: OLS regression of WAGE on C, EDUC and EXPER; numerical values lost in transcription.]

Assume Var(ε_i | educ_i, exper_i) = σ² educ_i². Transformed model:
wage_i/educ_i = β_1 (1/educ_i) + β_2 (educ_i/educ_i) + β_3 (exper_i/educ_i) + ε_i/educ_i,  i = 1, ..., n.

119
[EViews output: OLS regression of WAGE/EDUC on 1/EDUC, EDUC/EDUC and EXPER/EDUC, sample restricted to EDUC > 0; numerical values lost in transcription.]

Exercise. Let {y_i, i = 1, 2, ...} be a sequence of independent random variables with distribution N(β, σ_i²), where σ_i² is known (note: we assume the σ_i² are not all equal). When the variances are unequal, the sample mean ȳ is not the best linear unbiased estimator (BLUE). The BLUE has the form β̂ = Σ_{i=1}^n w_i y_i, where the w_i are nonrandom weights.
(a) Find a condition on the w_i such that E[β̂] = β;
(b) Find the optimal weights w_i* that make β̂ the BLUE.
Hint: you may translate this problem into an econometric framework: if {y_i} is a sequence of independent random variables with distribution N(β, σ_i²), then y_i can be represented by the equation y_i = β + ε_i, where ε_i ~ N(0, σ_i²). Then find the GLS estimator of β.

120
Exercise. Consider y_i = βx_i1 + ε_i, β > 0, and assume E[ε_i|X] = 0, Var(ε_i|X) = 1 + x_i1, Cov(ε_i, ε_j|X) = 0.
(a) Suppose we have a lot of observations and plot a graph of the observations of y_i against x_i1. What would the scatter plot look like?
(b) Propose an unbiased estimator with minimum variance;
(c) Suppose we have the following 3 observations of (x_i1, y_i): (0, 0), (3, 1) and (8, 5). Estimate the value of β from these 3 observations.

Exercise. Consider y_t = β_1 + β_2 t + ε_t, Var(ε_t) = σ² t², t = 1, ..., 20. Find σ²(X'X)⁻¹, Var(b|X) and Var(β̂_GLS|X) and comment on the results. (Solution: the numerical matrices are omitted in this transcription.)

121
Exercise. A researcher first ran an OLS regression. Then she was given the true V matrix. She transformed the data appropriately and obtained the GLS estimates. For several coefficients, the standard errors in the second regression were larger than those in the first regression. Does this contradict Proposition 1.7? (See the previous exercise.)

Limiting Nature of GLS
- The finite-sample properties of GLS rest on the assumption that the regressors are strictly exogenous. In time-series models the regressors are typically not strictly exogenous and the error is serially correlated.
- In practice, the matrix function V is unknown. V can be estimated from the sample; this approach is called Feasible Generalized Least Squares (FGLS). But if the function V is estimated from the sample, its value V̂ becomes a random variable, which affects the distribution of the GLS estimator. Very little is known about the finite-sample properties of the FGLS estimator. We need to use large-sample properties.

122
3 Large-Sample Theory

The finite-sample theory breaks down if one of the following three assumptions is violated:
1. the exogeneity of the regressors,
2. the normality of the error term,
3. the linearity of the regression equation.
This chapter develops an alternative approach based on large-sample theory (n is sufficiently large).

123
Review of Limit Theorems for Sequences of Random Variables

Convergence in Probability, in Mean Square and in Distribution

Convergence in Probability. A sequence of random scalars {z_n} converges in probability to a constant (non-random) α if, for any ε > 0,
lim_{n→∞} P(|z_n - α| > ε) = 0.
We write z_n →p α or plim z_n = α. As we will see, z_n is usually a sample mean, z_n = (1/n) Σ_{i=1}^n y_i or z_n = (1/n) Σ_{i=1}^n z_i.

124
Example. Consider a fair coin. Let z_i = 1 if the i-th toss results in heads and z_i = 0 otherwise. Let z̄_n = n⁻¹ Σ_{i=1}^n z_i. The following graph suggests that z̄_n →p 1/2.
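A quick simulation in the spirit of the coin-toss figure (an illustrative sketch; the sample sizes and the random seed are arbitrary):

```python
# The sample mean of i.i.d. Bernoulli(1/2) draws settles down to 1/2 as n grows.
import numpy as np

rng = np.random.default_rng(1)
for n in [10, 100, 1000, 100_000]:
    z = rng.integers(0, 2, size=n)      # 1 = heads, 0 = tails
    print(n, z.mean())                  # sample means approach 0.5
```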

125
A sequence of K-dimensional vectors {z_n} converges in probability to a K-dimensional vector of constants α if, for any ε > 0,
lim_{n→∞} P(|z_nk - α_k| > ε) = 0 for every k.
We write z_n →p α.

Convergence in Mean Square. A sequence of random scalars {z_n} converges in mean square (or in quadratic mean) to α if
lim_{n→∞} E[(z_n - α)²] = 0.
The extension to random vectors is analogous to that for convergence in probability.

126
Convergence in Distribution. Let {z_n} be a sequence of random scalars and F_n be the cumulative distribution function (c.d.f.) of z_n, i.e. z_n ~ F_n. We say that {z_n} converges in distribution to a random scalar z if the c.d.f. F_n of z_n converges to the c.d.f. F of z at every continuity point of F. We write z_n →d z, where z ~ F; F is the asymptotic (or limiting) distribution of z_n. If F is well known, for example if F is the N(0,1) distribution, we prefer to write z_n →d N(0,1) instead of z_n →d z with z ~ N(0,1).

Example. Consider z_n ~ t_n. We know that z_n →d N(0,1).

In most applications z_n is of the type z_n = √n (ȳ - E[y_i]).

Exercise 3.1. For z_n = √n (ȳ - E[y_i]) calculate E[z_n] and Var(z_n) (assume E[y_i] = µ, Var(y_i) = σ² and {y_i} is an i.i.d. sequence).

127
Useful Results

Lemma 2.3 (preservation of convergence for continuous transformations). Suppose f is a vector-valued continuous function that does not depend on n. Then:
(a) if z_n →p α, then f(z_n) →p f(α);
(b) if z_n →d z, then f(z_n) →d f(z).

An immediate implication of Lemma 2.3 (a) is that the usual arithmetic operations preserve convergence in probability:
x_n →p β, y_n →p γ  implies  x_n + y_n →p β + γ;
x_n →p β, y_n →p γ  implies  x_n y_n →p βγ;
x_n →p β, y_n →p γ  implies  x_n/y_n →p β/γ (γ ≠ 0);
Y_n →p Γ  implies  Y_n⁻¹ →p Γ⁻¹ (Γ invertible).

128
Lemma 2.4. We have:
(a) x_n →d x, y_n →p α  implies  x_n + y_n →d x + α;
(b) x_n →d x, y_n →p 0  implies  y_n'x_n →p 0;
(c) x_n →d x, A_n →p A  implies  A_n x_n →d Ax. In particular, if x ~ N(0, Σ), then A_n x_n →d N(0, AΣA');
(d) x_n →d x, A_n →p A  implies  x_n'A_n⁻¹x_n →d x'A⁻¹x (A nonsingular).

If x_n →p 0 we write x_n = o_p(1). If x_n - y_n →p 0 we write x_n = y_n + o_p(1). In part (c) we may write A_n x_n =d A x_n (A_n x_n and A x_n have the same asymptotic distribution).

129
Viewing Estimators as Sequences of Random Variables

Let θ̂_n be an estimator of a parameter vector θ based on a sample of size n. We say that the estimator θ̂_n is consistent for θ if θ̂_n →p θ. The asymptotic bias of θ̂_n is defined as plim_{n→∞}(θ̂_n - θ). So if the estimator is consistent, its asymptotic bias is zero.

Wooldridge's quotation: "While not all useful estimators are unbiased, virtually all economists agree that consistency is a minimal requirement for an estimator. The famous econometrician Clive W.J. Granger once remarked: 'If you can't get it right as n goes to infinity, you shouldn't be in this business.' The implication is that, if your estimator of a particular population parameter is not consistent, then you are wasting your time."

130
A consistent estimator θ̂_n is asymptotically normal if
√n (θ̂_n - θ) →d N(0, Σ).
Such an estimator is called √n-consistent. The variance matrix Σ is called the asymptotic variance and is denoted Avar(θ̂_n), i.e.
lim_{n→∞} Var(√n (θ̂_n - θ)) = Avar(θ̂_n) = Σ.
Some authors use the notation Avar(θ̂_n) to mean Σ/n (which is zero in the limit).

131
Laws of Large Numbers and Central Limit Theorems

Consider z̄_n = (1/n) Σ_{i=1}^n z_i. We say that z̄_n obeys the LLN if z̄_n →p µ, where µ = E[z_i] or µ = lim_{n→∞} E[z̄_n].

A version of Chebychev's Weak LLN: if lim E[z̄_n] = µ and lim Var(z̄_n) = 0, then z̄_n →p µ.

Kolmogorov's Second Strong LLN: if {z_i} is i.i.d. with E[z_i] = µ, then z̄_n →p µ.

These LLNs extend readily to random vectors by requiring element-by-element convergence.

132
Theorem 1 (Lindeberg-Levy CLT). Let {z_i} be i.i.d. with E[z_i] = µ and Var(z_i) = Σ. Then
√n (z̄_n - µ) = (1/√n) Σ_{i=1}^n (z_i - µ) →d N(0, Σ).
Notice that
E[√n (z̄_n - µ)] = 0  if and only if  E[z̄_n] = µ,
Var(√n (z̄_n - µ)) = Σ  if and only if  Var(z̄_n) = Σ/n.
Given the previous equations, some authors write z̄_n ~a N(µ, Σ/n).

133
Example. Let {z_i} be i.i.d. with distribution χ²_1. By the Lindeberg-Levy CLT (scalar case) we have
z̄_n = (1/n) Σ_{i=1}^n z_i ~a N(µ, σ²/n),
where
E[z̄_n] = (1/n) Σ_{i=1}^n E[z_i] = E[z_i] = µ = 1,
Var(z̄_n) = Var((1/n) Σ_{i=1}^n z_i) = (1/n) Var(z_i) = σ²/n = 2/n.

134
[Figures: probability density function of z̄_n obtained by Monte-Carlo simulation; probability density function of √n (z̄_n - µ), exact expressions for n = 5, 10 and 50.]

135
Example. In random sampling with sample size n = 30 from a variable z with E[z] = 10, Var(z) = 9 and unknown distribution, obtain an approximation to P(z̄_n < 9.5). We do not know the exact distribution of z̄_n. However, from the Lindeberg-Levy CLT we have
(z̄_n - µ)/(σ/√n) →d N(0,1), or z̄_n ~a N(µ, σ²/n).
Hence,
P(z̄_n < 9.5) = P((z̄_n - µ)/(σ/√n) < (9.5 - 10)/(3/√30)) ≈ Φ(-0.913) ≈ 0.18, [Φ is the cdf of the N(0,1)].
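The approximation can be reproduced directly; a minimal sketch using scipy with the numbers of the example above:

```python
# Normal approximation to P(z_bar < 9.5) with n = 30, E[z] = 10, Var(z) = 9.
import numpy as np
from scipy.stats import norm

n, mu, sigma = 30, 10.0, 3.0
zscore = (9.5 - mu) / (sigma / np.sqrt(n))
print(zscore, norm.cdf(zscore))   # approx -0.913 and approx 0.18
```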

136
Fundamental Concepts in Time-Series Analysis

A stochastic process (SP) is a sequence of random variables. For this reason it is more adequate to write a SP as {z_i} (meaning a sequence of random variables) rather than z_i (meaning the random variable at time i).

137
Various Classes of Stochastic Processes

Definition (Stationary Processes). A SP {z_i} is strictly stationary if the joint distribution of (z_1, z_2, ..., z_s) equals that of (z_{k+1}, z_{k+2}, ..., z_{k+s}) for any s in N and k in Z.

Exercise 3.2. Consider a SP {z_i} where E|g(z_i)| < ∞. Show that if {z_i} is a strictly stationary process then E[g(z_i)] is constant, i.e. does not depend on i.

The definition implies that any transformation (function) of a stationary process is itself stationary: if {z_i} is stationary, then {g(z_i)} is. For example, if {z_i} is stationary then {z_i z_i'} is also a stationary SP.

Definition (Covariance Stationary Processes). A stochastic process {z_i} is weakly (or covariance) stationary if: (i) E[z_i] does not depend on i, and (ii) Cov(z_i, z_{i-j}) exists, is finite, and depends only on j but not on i.

If {z_i} is a covariance stationary SP then Cov(z_1, z_5) = Cov(z_1001, z_1005). A transformation (function) of a covariance stationary process may or may not be a covariance stationary process.

138
Example. It can be proved that {z_i}, with z_i = √(α_0 + α_1 z_{i-1}²) ε_i, where {ε_i} is i.i.d. with mean zero and unit variance, α_0 > 0 and 1/√3 ≤ α_1 < 1, is a covariance stationary process. However, w_i = z_i² is not a covariance stationary process, as E[w_i²] does not exist.

Exercise 3.3. Consider the SP {u_t} where
u_t = ξ_t if t ≤ 2000 and u_t = √((k-2)/k) ζ_t if t > 2000,
where ξ_t and ζ_s are independent for all t and s, ξ_t i.i.d. N(0,1) and ζ_s i.i.d. t_k. Explain why {u_t} is weakly (covariance) stationary but not strictly stationary.

Definition (White Noise Processes). A white noise process {z_i} is a covariance stationary process with zero mean and no serial correlation: E[z_i] = 0, Cov(z_i, z_j) = 0 for i ≠ j.


140
In the literature there is not a unique definition of ergodicity. We prefer to call "weakly dependent process" what Hayashi calls an ergodic process.

Definition. A stationary process {z_i} is said to be a weakly dependent process (ergodic in Hayashi's definition) if, for any two bounded functions f: R^{k+1} → R and g: R^{s+1} → R,
lim_{n→∞} E[f(z_i, ..., z_{i+k}) g(z_{i+n}, ..., z_{i+n+s})] = lim_{n→∞} E[f(z_i, ..., z_{i+k})] E[g(z_{i+n}, ..., z_{i+n+s})].

Theorem 2 (S&WD). Let {z_i} be a stationary and weakly dependent (S&WD) process with E[z_i] = µ. Then z̄_n →p µ.

Serial dependence, which is ruled out by the i.i.d. assumption in Kolmogorov's LLN, is allowed in this theorem, provided that it disappears in the long run. Since, for any function f, {f(z_i)} is a stationary S&WD process whenever {z_i} is, this theorem implies that any moment of a S&WD process (if it exists and is finite) is consistently estimated by the corresponding sample moment. For example, suppose {z_i} is a S&WD process and E[z_i z_i'] exists and is finite. Then
z̄_n = (1/n) Σ_{i=1}^n z_i z_i' →p E[z_i z_i'].

141
Definition (Martingale). A vector process {z_i} is called a martingale if
E[z_i | z_{i-1}, ..., z_1] = z_{i-1} for i ≥ 2.
The process z_i = z_{i-1} + ε_i, where {ε_i} is a white noise process with E[ε_i | z_{i-1}, ..., z_1] = 0, is a martingale, since
E[z_i | z_{i-1}, ..., z_1] = z_{i-1} + E[ε_i | z_{i-1}, ..., z_1] = z_{i-1}.

Definition (Martingale Difference Sequence). A vector process {g_i} with E[g_i] = 0 is called a martingale difference sequence (MDS) or martingale differences if
E[g_i | g_{i-1}, ..., g_1] = 0.
If {z_i} is a martingale, the process defined by g_i = z_i - z_{i-1} is a MDS.

Proposition. If {g_i} is a MDS then Cov(g_i, g_{i-j}) = 0 for all j ≠ 0.

142
By definition,
Var(ḡ_n) = (1/n²) Var(Σ_{t=1}^n g_t) = (1/n²) [ Σ_{t=1}^n Var(g_t) + 2 Σ_{j=1}^{n-1} Σ_{i=j+1}^n Cov(g_i, g_{i-j}) ].
However, if {g_i} is a stationary MDS with finite second moments then
Σ_{t=1}^n Var(g_t) = n Var(g_t) and Cov(g_i, g_{i-j}) = 0,
so Var(ḡ_n) = (1/n) Var(g_t).

Definition (Random Walk). Let {g_i} be a vector independent white noise process. A random walk {z_i} is a sequence of cumulative sums: z_i = g_i + g_{i-1} + ... + g_1.

Exercise 3.4. Show that the random walk can be written as z_i = z_{i-1} + g_i, z_1 = g_1.

143
Different Formulations of Lack of Serial Dependence

We have three formulations of a lack of serial dependence for zero-mean covariance stationary processes:
(1) {g_i} is independent white noise;
(2) {g_i} is a stationary MDS with finite variance;
(3) {g_i} is white noise.
These are ordered from strongest to weakest: (1) implies (2), and (2) implies (3).

Exercise 3.5 (process that satisfies (2) but not (1): the ARCH process). Consider g_i = √(α_0 + α_1 g_{i-1}²) ε_i, where {ε_i} is i.i.d. with mean zero and unit variance, α_0 > 0 and α_1 < 1. Show that {g_i} is a MDS but not an independent white noise.

144
The CLT for S&WD Martingale Difference Sequences

Theorem 3 (Stationary Martingale Differences CLT, Billingsley). Let {g_i} be a vector martingale difference sequence that is a S&WD process with E[g_i g_i'] = Σ, and let ḡ_n = (1/n) Σ_{i=1}^n g_i. Then
√n ḡ_n = (1/√n) Σ_{i=1}^n g_i →d N(0, Σ).

Theorem 4 (Martingale Differences CLT, White). Let {g_i} be a vector martingale difference sequence. Suppose that (a) E[g_i g_i'] = Σ_i is a positive definite matrix with (1/n) Σ_{i=1}^n Σ_i → Σ, a positive definite matrix, (b) g_i has finite 4th moments, and (c) (1/n) Σ_{i=1}^n g_i g_i' →p Σ. Then
√n ḡ_n = (1/√n) Σ_{i=1}^n g_i →d N(0, Σ).

145
Large-Sample Distribution of the OLS Estimator

The model presented in this section has probably the widest range of economic applications:
- no specific distributional assumption (such as normality of the error term) is required;
- the requirement in finite-sample theory that the regressors be strictly exogenous (or fixed) is replaced by the much weaker requirement that they be "predetermined."

Assumption 2.1 (linearity). y_i = x_i'β + ε_i.
Assumption 2.2 (S&WD). {y_i, x_i} is jointly S&WD.
Assumption 2.3 (predetermined regressors). All the regressors are predetermined in the sense that they are orthogonal to the contemporaneous error term: E[x_ik ε_i] = 0 for all i and k. This can be written as E[x_i ε_i] = 0 or E[g_i] = 0, where g_i = x_i ε_i.
Assumption 2.4 (rank condition). E[x_i x_i'] = Σ_xx is nonsingular.

146
Assumption 2.5. {g_i}, where g_i = x_i ε_i, is a martingale difference sequence with finite second moments (so, a fortiori, E[g_i] = 0). The K×K matrix of cross moments, E[g_i g_i'], is nonsingular. We use S for Avar(ḡ), the variance of √n ḡ, where ḡ = (1/n) Σ_{i=1}^n g_i. By Assumption 2.2 and the S&WD Martingale Differences CLT, S = E[g_i g_i'].

Remarks:
1. S&WD: a special case of S&WD is that {y_i, x_i} is an i.i.d. random sample (cross-sectional data).
2. The model accommodates conditional heteroskedasticity: if {y_i, x_i} is stationary, then the error term ε_i = y_i - x_i'β is also stationary. The conditional moment E[ε_i²|x_i] can depend on x_i without violating any previous assumption, as long as E[ε_i²] is constant.

147
3. E[x_i ε_i] = 0 vs. E[ε_i|x_i] = 0: the condition E[ε_i|x_i] = 0 is stronger than E[x_i ε_i] = 0. In effect,
E[x_i ε_i] = E[E[x_i ε_i | x_i]] = E[x_i E[ε_i|x_i]] = E[x_i · 0] = 0.
4. Predetermined vs. strictly exogenous regressors: Assumption 2.3 restricts only the contemporaneous relationship between the error term and the regressors. The exogeneity assumption (Assumption 1.2) implies that, for any regressor k, E[x_jk ε_i] = 0 for all i and j, not just for i = j. Strict exogeneity is a strong assumption that does not hold in general for time-series models.

148
5. Rank condition as no multicollinearity in the limit: we have
b = (X'X/n)⁻¹ (X'y/n) = [(1/n) Σ x_i x_i']⁻¹ (1/n) Σ x_i y_i = S_xx⁻¹ S_xy,
where
S_xx = X'X/n = (1/n) Σ x_i x_i' (sample average of x_i x_i'),
S_xy = X'y/n = (1/n) Σ x_i y_i (sample average of x_i y_i).
By Assumptions 2.2, 2.4 and the S&WD theorem we have
X'X/n = (1/n) Σ_{i=1}^n x_i x_i' →p E[x_i x_i'].
Assumption 2.4 guarantees that the limit in probability of X'X/n has rank K.
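A small numerical sketch of the moment form b = S_xx⁻¹ S_xy (simulated data; the coefficient values are invented for illustration):

```python
# OLS written in moment form: b = S_xx^{-1} S_xy.
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(size=n)

S_xx = X.T @ X / n          # (1/n) sum of x_i x_i'
S_xy = X.T @ y / n          # (1/n) sum of x_i y_i
b = np.linalg.solve(S_xx, S_xy)
print(b)                    # close to beta when n is large (consistency)
```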

149
6. A sufficient condition for {g_i} to be a MDS: since a MDS is zero-mean by definition, Assumption 2.5 is stronger than Assumption 2.3 (the latter is redundant in the face of Assumption 2.5). We will need Assumption 2.5 to prove the asymptotic normality of the OLS estimator. A sufficient condition for {g_i} to be an MDS is E[ε_i | F_i] = 0, where
F_i = I_{i-1} ∪ {x_i} = {ε_{i-1}, ε_{i-2}, ..., ε_1, x_i, x_{i-1}, ..., x_1}, I_{i-1} = {ε_{i-1}, ε_{i-2}, ..., ε_1, x_{i-1}, ..., x_1}.
This condition implies that the error term is serially uncorrelated and also uncorrelated with the current and past regressors.

Proof. Notice that {g_i} is a MDS if E[g_i | g_{i-1}, ..., g_1] = 0, with g_i = x_i ε_i. Now, using the condition E[ε_i | F_i] = 0,
E[x_i ε_i | g_{i-1}, ..., g_1] = E[ E[x_i ε_i | F_i] | g_{i-1}, ..., g_1 ] = E[0 | g_{i-1}, ..., g_1] = 0;
thus E[ε_i | F_i] = 0 implies that {g_i} is a MDS.

150
7. When the regressors include a constant, Assumption 2.5 implies
E[x_i ε_i | g_{i-1}, ..., g_1] = E[(1, ..., x_iK)' ε_i | g_{i-1}, ..., g_1] = 0, hence E[ε_i | g_{i-1}, ..., g_1] = 0,
and therefore
E[ε_i | ε_{i-1}, ..., ε_1] = E[ E[ε_i | g_{i-1}, ..., g_1] | ε_{i-1}, ..., ε_1 ] = 0.
So Assumption 2.5 implies that the error term itself is a MDS and hence is serially uncorrelated.

8. S is a matrix of fourth moments:
S = E[g_i g_i'] = E[(x_i ε_i)(x_i ε_i)'] = E[ε_i² x_i x_i'].
Consistent estimation of S will require an additional assumption.

151
S takes a different expression without Assumption 2.5. In general,
Avar(ḡ) = lim Var(√n ḡ) = lim Var((1/√n) Σ_{i=1}^n g_i) = lim (1/n) Var(Σ_{i=1}^n g_i)
= lim (1/n) [ Σ_{i=1}^n Var(g_i) + Σ_{j=1}^{n-1} Σ_{i=j+1}^n ( Cov(g_i, g_{i-j}) + Cov(g_{i-j}, g_i) ) ].
Given stationarity, we have (1/n) Σ_{i=1}^n Var(g_i) = Var(g_i), so
Avar(ḡ) = Var(g_i) + lim (1/n) Σ_{j=1}^{n-1} Σ_{i=j+1}^n ( E[g_i g_{i-j}'] + E[g_{i-j} g_i'] ).
Thanks to Assumption 2.5 we have E[g_i g_{i-j}'] = E[g_{i-j} g_i'] = 0, so
S = Avar(ḡ) = Var(g_i) = E[g_i g_i'].

152
Proposition 2.1 (asymptotic distribution of the OLS estimator).
(a) (Consistency of b for β) Under Assumptions 2.1-2.4, b →p β.
(b) (Asymptotic normality of b) If Assumption 2.3 is strengthened to Assumption 2.5, then
√n (b - β) →d N(0, Avar(b)), where Avar(b) = Σ_xx⁻¹ S Σ_xx⁻¹.
(c) (Consistent estimate of Avar(b)) Suppose there is available a consistent estimator Ŝ of S. Then, under Assumption 2.2, Avar(b) is consistently estimated by
Âvar(b) = S_xx⁻¹ Ŝ S_xx⁻¹, where S_xx = X'X/n = (1/n) Σ_{i=1}^n x_i x_i'.

153
Proposition (consistent estimation of the error variance). Under Assumptions 2.1-2.4, and provided E[ε_i²] exists and is finite,
s² = (1/(n-K)) Σ_{i=1}^n e_i² →p E[ε_i²].

Under conditional homoskedasticity, E[ε_i²|x_i] = σ² (we will see this in detail later), we have
S = E[g_i g_i'] = E[ε_i² x_i x_i'] = ... = σ² E[x_i x_i'] = σ² Σ_xx.
Thus
Avar(b) = Σ_xx⁻¹ S Σ_xx⁻¹ = Σ_xx⁻¹ σ² Σ_xx Σ_xx⁻¹ = σ² Σ_xx⁻¹,
Âvar(b) = s² S_xx⁻¹ = n s² (X'X)⁻¹,
and b ~a N(β, Âvar(b)/n) = N(β, s² (X'X)⁻¹).

154
Statistical Inference

Derivation of the distribution of test statistics is easier than in finite-sample theory because we are only concerned with the large-sample approximation to the exact distribution.

Proposition (robust t-ratio and Wald statistic). Suppose Assumptions 2.1-2.5 hold, and suppose there is available a consistent estimator Ŝ of S. As before, let Âvar(b) = S_xx⁻¹ Ŝ S_xx⁻¹. Then:
(a) under the null hypothesis H0: β_k = β_k⁰,
t_k⁰ = (b_k - β_k⁰)/σ̂_{b_k} →d N(0,1), where σ̂²_{b_k} = [Âvar(b)]_kk / n = [S_xx⁻¹ Ŝ S_xx⁻¹]_kk / n;
(b) under the null hypothesis H0: Rβ = r, with rank(R) = p,
W = n (Rb - r)' [R Âvar(b) R']⁻¹ (Rb - r) →d χ²_p.

155
Remarks:
- σ̂_{b_k} is called the heteroskedasticity-consistent standard error, heteroskedasticity-robust standard error, or White's standard error. The reason for this terminology is that the error term can be conditionally heteroskedastic.
- The t-ratio is called the robust t-ratio. The differences from the finite-sample t-test are: (1) the way the standard error is calculated is different, (2) we use the table of the N(0,1) rather than that of the t_{n-K}, and (3) the actual size (or exact size) of the test (the probability of a Type I error given the sample size) equals the nominal size (i.e., the desired significance level α) only approximately, although the approximation becomes arbitrarily good as the sample size increases. The difference between the exact size and the nominal size of a test is called the size distortion.
- Both tests are consistent in the sense that power = P(rejecting H0 | H1 is true) → 1 as n → ∞.

156
Estimating S = E[ε_i² x_i x_i'] Consistently

How do we select an estimator for a population parameter? One of the most important methods is the analog estimation method, or method of moments. The method-of-moments principle: to estimate a feature of the population, use the corresponding feature of the sample.

Examples of analog estimators (parameter of the population → estimator):
E[y_i] → Ȳ
Var(y_i) → S_y²
σ_xy → S_xy
σ_x² → S_x²
P(y_i ≤ c) → (1/n) Σ_{i=1}^n I{y_i ≤ c}
median(y_i) → sample median
max y_i → max_{i=1,...,n} y_i

157
The analogy principle suggests that E[ε_i² x_i x_i'] can be estimated using the estimator (1/n) Σ_{i=1}^n ε_i² x_i x_i'. Since ε_i is not observable, we need another one:
Ŝ = (1/n) Σ_{i=1}^n e_i² x_i x_i'.

Assumption 2.6 (finite fourth moments for the regressors). E[(x_ik x_ij)²] exists and is finite for all k and j (k, j = 1, ..., K).

Proposition (consistent estimation of S). Suppose S = E[ε_i² x_i x_i'] exists and is finite. Then, under the maintained assumptions together with Assumption 2.6, Ŝ is consistent for S.

158
The estimator Ŝ can be represented as
Ŝ = (1/n) Σ_{i=1}^n e_i² x_i x_i' = X'BX/n, where B = diag(e_1², e_2², ..., e_n²).
Thus
Âvar(b) = S_xx⁻¹ Ŝ S_xx⁻¹ = n (X'X)⁻¹ X'BX (X'X)⁻¹.
We have
b ~a N(β, Âvar(b)/n) = N(β, S_xx⁻¹ Ŝ S_xx⁻¹ / n) = N(β, (X'X)⁻¹ X'BX (X'X)⁻¹),
W = n (Rb - r)' [R Âvar(b) R']⁻¹ (Rb - r) = n (Rb - r)' [R S_xx⁻¹ Ŝ S_xx⁻¹ R']⁻¹ (Rb - r)
  = (Rb - r)' [R (X'X)⁻¹ X'BX (X'X)⁻¹ R']⁻¹ (Rb - r) →d χ²_p.
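An illustrative sketch of the robust covariance computation just described, Âvar(b) = S_xx⁻¹ Ŝ S_xx⁻¹ with Ŝ = (1/n) Σ e_i² x_i x_i'; the simulated data and the heteroskedasticity pattern are assumptions made only for the example:

```python
# White heteroskedasticity-robust standard errors vs. conventional OLS standard errors.
import numpy as np

rng = np.random.default_rng(3)
n = 400
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2])
eps = rng.normal(size=n) * np.sqrt(1.0 + x2**2)      # conditionally heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

S_hat = (X * e[:, None]**2).T @ X / n                # (1/n) sum of e_i^2 x_i x_i'
S_xx_inv = np.linalg.inv(X.T @ X / n)
avar_b = S_xx_inv @ S_hat @ S_xx_inv                 # estimate of Avar(b)
robust_se = np.sqrt(np.diag(avar_b) / n)
conventional_se = np.sqrt(np.diag(XtX_inv) * (e @ e / (n - X.shape[1])))
print(robust_se, conventional_se)                    # the two can differ noticeably
```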

159
[EViews output (numerical values lost in transcription): two OLS regressions of WAGE on C, FEMALE, EDUC, EXPER and TENURE; the first with conventional standard errors, the second with White heteroskedasticity-consistent standard errors and covariance.]

160
Implications of Conditional Homoskedasticity

Assumption 2.7 (conditional homoskedasticity). E[ε_i²|x_i] = σ² > 0.

Under Assumption 2.7 we have
S = E[ε_i² x_i x_i'] = ... = σ² E[x_i x_i'] = σ² Σ_xx,
and Avar(b) = Σ_xx⁻¹ S Σ_xx⁻¹ = σ² Σ_xx⁻¹ Σ_xx Σ_xx⁻¹ = σ² Σ_xx⁻¹.

Proposition 2.5 (large-sample properties of b, t, and F under conditional homoskedasticity). Suppose Assumptions 2.1-2.5 and 2.7 are satisfied. Then:
(a) (Asymptotic distribution of b) The OLS estimator b is consistent and asymptotically normal with Avar(b) = σ² Σ_xx⁻¹.
(b) (Consistent estimation of the asymptotic variance) Under the same set of assumptions, Avar(b) is consistently estimated by Âvar(b) = s² S_xx⁻¹ = n s² (X'X)⁻¹.

161
(c) (Asymptotic distribution of the t and F statistics of finite-sample theory) Under H0: β_k = β_k⁰ we have
t_k⁰ = (b_k - β_k⁰)/σ̂_{b_k} →d N(0,1), where σ̂²_{b_k} = [Âvar(b)]_kk / n = s² [(X'X)⁻¹]_kk.
Under H0: Rβ = r with rank(R) = p, we have pF⁰ →d χ²_p, where
F⁰ = (Rb - r)' [R (X'X)⁻¹ R']⁻¹ (Rb - r) / (p s²).
Notice that
pF⁰ = (ẽ'ẽ - e'e) / (e'e/(n-K)) →d χ²_p,
where ẽ refers to the short regression (the regression subject to the constraint Rβ = r).

Remark (no need for the fourth-moment assumption). By the S&WD theorem and the maintained assumptions, s² S_xx →p σ² Σ_xx = S. We do not need the fourth-moment assumption (Assumption 2.6) for consistency.

162
Testing Conditional Homoskedasticity

With the advent of robust standard errors allowing us to do inference without specifying the conditional second moment, testing conditional homoskedasticity is not as important as it used to be. This section presents only the most popular test, due to White (1980), for the case of random samples.

Let ψ_i be a vector collecting the unique and nonconstant elements of the K×K symmetric matrix x_i x_i'.

Proposition (White's test for conditional heteroskedasticity). In addition to Assumptions 2.1 and 2.4, suppose that (a) {y_i, x_i} is i.i.d. with finite E[ε_i² x_i x_i'] (thus strengthening Assumptions 2.2 and 2.5), (b) ε_i is independent of x_i (thus strengthening Assumption 2.3 and conditional homoskedasticity), and (c) a certain condition holds on the moments of ε_i and x_i. Then, under H0: E[ε_i²|x_i] = σ² (constant), we have
nR² →d χ²_m,
where R² is the R² from the auxiliary regression of e_i² on a constant and ψ_i, and m is the dimension of ψ_i.
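A minimal sketch of White's test along the lines above: regress e_i² on a constant and ψ_i and refer nR² to the χ²_m distribution. The single-regressor design and ψ_i = (x_i2, x_i2²) are assumptions chosen only to keep the example short:

```python
# White's test for conditional heteroskedasticity via the auxiliary regression nR^2 ~ chi2(m).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n = 500
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(1.0 + x2**2)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

psi = np.column_stack([np.ones(n), x2, x2**2])       # constant plus psi_i = (x2, x2^2)
g, *_ = np.linalg.lstsq(psi, e**2, rcond=None)
resid = e**2 - psi @ g
R2 = 1.0 - resid.var() / (e**2).var()
m = psi.shape[1] - 1                                  # number of nonconstant regressors
stat = n * R2
print(stat, chi2.sf(stat, df=m))                      # small p-value: reject homoskedasticity
```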

163
[EViews output (numerical values lost in transcription): OLS regression of WAGE on C, FEMALE, EDUC, EXPER and TENURE; 526 observations.]

164
[EViews output (numerical values lost in transcription): White heteroskedasticity test. Test equation: RESID^2 regressed on a constant, FEMALE, EDUC, EXPER, TENURE, their squares and cross products; the F statistic, Obs*R-squared and the corresponding Chi-Square p-values are reported.]

165
[EViews output (numerical values lost in transcription): OLS regression of WAGE on C, FEMALE, EDUC, EXPER and TENURE with White heteroskedasticity-consistent standard errors and covariance; 526 observations.]

Estimation with Parameterized Conditional Heteroskedasticity

Even when the error is found to be conditionally heteroskedastic, the OLS estimator is still consistent and asymptotically normal, and valid statistical inference can be conducted with robust standard errors and robust Wald statistics. However, in the somewhat unlikely case of a priori knowledge of the functional form of the conditional second moment, it should be possible to obtain sharper estimates with smaller asymptotic variance.

166
To simplify the discussion, throughout this section we strengthen Assumptions 2.2 and 2.5 by assuming that {y_i, x_i} is i.i.d.

The Functional Form

The parametric functional form for the conditional second moment we consider is
E[ε_i²|x_i] = z_i'α,
where z_i is a function of x_i. For example, E[ε_i²|x_i] = α_1 + α_2 x_i2², i.e. z_i = (1, x_i2²)'.

167
WLS with Known α

The WLS (also GLS) estimator can be obtained by applying OLS to the regression
ỹ_i = x̃_i'β + ε̃_i,
where
ỹ_i = y_i/√(z_i'α),  x̃_ik = x_ik/√(z_i'α),  ε̃_i = ε_i/√(z_i'α),  i = 1, 2, ..., n.
We have
β̂_GLS = β̂_V = (X̃'X̃)⁻¹X̃'ỹ = (X'V⁻¹X)⁻¹X'V⁻¹y.

168
Note that E[ε̃_i|x̃_i] = 0. Therefore, provided that E[x̃_i x̃_i'] is nonsingular, the assumptions of this chapter are satisfied for the equation ỹ_i = x̃_i'β + ε̃_i. Furthermore, by construction, the error ε̃_i is conditionally homoskedastic: E[ε̃_i²|x̃_i] = 1. So Proposition 2.5 applies: the WLS estimator is consistent and asymptotically normal, and its asymptotic variance is
Avar(β̂_V) = (E[x̃_i x̃_i'])⁻¹ = (plim (1/n) Σ_{i=1}^n x̃_i x̃_i')⁻¹ (by the S&WD theorem) = (plim (1/n) X'V⁻¹X)⁻¹.
Thus ((1/n) X'V⁻¹X)⁻¹ is a consistent estimator of Avar(β̂_V).

169
Regression of e_i² on z_i Provides a Consistent Estimate of α

If α is unknown we need to obtain α̂. Assuming E[ε_i²|x_i] = z_i'α, we have ε_i² = E[ε_i²|x_i] + η_i, where by construction E[η_i|x_i] = 0. This suggests considering the regression
ε_i² = z_i'α + η_i.
Provided that E[z_i z_i'] is nonsingular, Proposition 2.1 is applicable to this auxiliary regression: the OLS estimator of α is consistent and asymptotically normal. However, we cannot run this regression, as ε_i is not observable. In the previous regression we should replace ε_i by the residual e_i, despite the presence of conditional heteroskedasticity. In conclusion, we may obtain a consistent estimate of α by regressing e_i² on z_i:
α̂ = (Σ_{i=1}^n z_i z_i')⁻¹ Σ_{i=1}^n z_i e_i².

170
WLS with Estimated α

Step 1: Estimate the equation y_i = x_i'β + ε_i by OLS and compute the OLS residuals e_i.
Step 2: Regress e_i² on z_i to obtain the OLS coefficient estimate α̂.
Step 3: Transform the original variables according to ỹ_i = y_i/√(z_i'α̂), x̃_ik = x_ik/√(z_i'α̂), i = 1, 2, ..., n, and run OLS on the model ỹ_i = x̃_i'β + ε̃_i to obtain the feasible GLS (FGLS) estimator:
β̂_V̂ = (X'V̂⁻¹X)⁻¹X'V̂⁻¹y.

171
It can be proved that:
- β̂_V̂ →p β;
- √n (β̂_V̂ - β) →d N(0, Avar(β̂_V));
- ((1/n) X'V̂⁻¹X)⁻¹ is a consistent estimator of Avar(β̂_V).
No finite-sample properties are known for the estimator β̂_V̂.

172
A popular specification for E[ε_i²|x_i]

The specification ε_i² = z_i'α + η_i may lead to z_i'α̂ < 0. To overcome this problem, a popular specification for E[ε_i²|x_i] is
E[ε_i²|x_i] = exp(x_i'α);
it guarantees that Var(y_i|x_i) > 0 for any α. It implies log E[ε_i²|x_i] = x_i'α. This suggests the following procedure (a code sketch of it is given below):
(a) Regress y on X to get the residual vector e.
(b) Run the LS regression of log e_i² on x_i to estimate α and calculate σ̂_i² = exp(x_i'α̂).
(c) Transform the data: ỹ_i = y_i/σ̂_i, x̃_ij = x_ij/σ̂_i.
(d) Regress ỹ on X̃ and obtain β̂_V̂.
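A minimal sketch of steps (a)-(d) with the exponential skedastic function; the data-generating process and coefficients are invented for the illustration:

```python
# Feasible GLS (FGLS) with the skedastic function E[eps_i^2 | x_i] = exp(x_i'alpha).
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2])
sigma2 = np.exp(0.5 + 1.0 * x2)                       # assumed true skedastic function
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2)

# (a) OLS residuals
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b_ols
# (b) regress log e_i^2 on x_i and compute sigma_hat_i = sqrt(exp(fitted value))
alpha_hat, *_ = np.linalg.lstsq(X, np.log(e**2), rcond=None)
sigma_hat = np.sqrt(np.exp(X @ alpha_hat))
# (c) transform the data, (d) run OLS on the transformed data -> FGLS
y_t = y / sigma_hat
X_t = X / sigma_hat[:, None]
b_fgls, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
print(b_ols, b_fgls)
```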

173
Notice also that E[ε_i²|x_i] = exp(x_i'α) implies
ε_i² = exp(x_i'α) + v_i, with v_i = ε_i² - E[ε_i²|x_i],
so, approximately, log ε_i² ≈ x_i'α + error and log e_i² ≈ x_i'α + error.

Example (Part 1). We want to estimate a demand function for daily cigarette consumption (cigs). The explanatory variables are: logincome (log of annual income), logcigprice (log of the per-pack price of cigarettes in cents), educ (years of education), age, and restaurn (binary indicator equal to unity if the person resides in a state with restaurant smoking restrictions). Source: J. Mullahy (1997), "Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior," Review of Economics and Statistics 79. Based on the information below, are the standard errors reported in the first table reliable?

174
[EViews output (numerical values lost in transcription): (i) OLS regression of CIGS on C, LOGINCOME, LOGCIGPRIC, EDUC, AGE, AGE^2 and RESTAURN; (ii) White heteroskedasticity test, with test equation RESID^2 regressed on a constant and the levels, squares and cross products of the regressors.]

cigs: number of cigarettes smoked per day; logincome: log of annual income; logcigprice: log of the per-pack price of cigarettes in cents; educ: years of education; age; restaurn: binary indicator equal to unity if the person resides in a state with restaurant smoking restrictions.

175
Example (Part 2). Discuss the results of the following figures.

[EViews output (numerical values lost in transcription): two OLS regressions of CIGS on C, LOGINCOME, LOGCIGPRIC, EDUC, AGE, AGE^2 and RESTAURN; the first with conventional standard errors, the second with White heteroskedasticity-consistent standard errors and covariance.]

176
Example (Part 3). (a) Regress y on X to get the residual vector e.

[EViews output (numerical values lost in transcription): OLS regression of CIGS on C, LOGINCOME, LOGCIGPRIC, EDUC, AGE, AGE^2 and RESTAURN.]

177
(b) Run the LS regression of log e_i² on x_i.

[EViews output (numerical values lost in transcription): OLS regression of LOGRES^2 on C, LOGINCOME, LOGCIGPRIC, EDUC, AGE, AGE^2 and RESTAURN.]

Calculate σ̂_i² = exp(x_i'α̂), i.e. the exponential of the fitted value of log e_i². Notice: the fitted values of the above regression are the estimates of log e_1², ..., log e_n².

178
(c) Transform the data, ỹ_i = y_i/σ̂_i and x̃_ij = x_ij/σ̂_i, and (d) regress ỹ on X̃ to obtain β̂_V̂.

[EViews output (numerical values lost in transcription): OLS regression of CIGS/SIGMA on 1/SIGMA, LOGINCOME/SIGMA, LOGCIGPRIC/SIGMA, EDUC/SIGMA, AGE/SIGMA, AGE^2/SIGMA and RESTAURN/SIGMA.]

179
OLS versus WLS

Under certain conditions we have:
- b and β̂_V̂ are both consistent.
- Assuming that the functional form of the conditional second moment is correctly specified, β̂_V̂ is asymptotically more efficient than b.
- It is not clear which estimator is better in terms of efficiency in the following situations: the functional form of the conditional second moment is misspecified; or, in finite samples, even if the functional form is correctly specified, the large-sample approximation will probably work less well for the WLS estimator than for OLS because of the estimation of the extra parameters (α) involved in the WLS procedure.

180
Serial Correlation

Because the issue of serial correlation arises almost always in time-series models, we use the subscript "t" instead of "i" in this section. Throughout this section we assume that the regressors include a constant. The issue is how to deal with
E[ε_t ε_{t-j} | x_t, x_{t-j}] ≠ 0.

181
Usual Inference is not Valid

When the regressors include a constant (true in virtually all known applications), Assumption 2.5 implies that the error term is a scalar martingale difference sequence, so if the error is found to be serially correlated (or autocorrelated), that is an indication of a failure of Assumption 2.5. We then have Cov(g_t, g_{t-j}) ≠ 0. In fact,
Cov(g_t, g_{t-j}) = E[x_t ε_t ε_{t-j} x_{t-j}'] = E[ E[x_t ε_t ε_{t-j} x_{t-j}' | x_{t-j}, x_t] ] = E[ x_t x_{t-j}' E[ε_t ε_{t-j} | x_{t-j}, x_t] ] ≠ 0.
Assumptions 2.1-2.4 may hold under serial correlation, so the OLS estimator may be consistent even if the error is autocorrelated. However, the large-sample properties of b, t, and F of Proposition 2.5 are not valid. To see why, consider
√n (b - β) = S_xx⁻¹ √n ḡ.

182
We have Avar(b) = Σ_xx⁻¹ S Σ_xx⁻¹ and Âvar(b) = S_xx⁻¹ Ŝ S_xx⁻¹.
If the errors are not autocorrelated: S = Var(√n ḡ) = Var(g_t).
If the errors are autocorrelated:
S = Var(√n ḡ) = Var(g_t) + (1/n) Σ_{j=1}^{n-1} Σ_{t=j+1}^n ( E[g_t g_{t-j}'] + E[g_{t-j} g_t'] ).
Since Cov(g_t, g_{t-j}) ≠ 0 and E[g_{t-j} g_t'] ≠ 0, we have S ≠ Var(g_t), i.e. S ≠ E[g_t g_t'].
If the errors are serially correlated, we cannot use s² (1/n) Σ_{t=1}^n x_t x_t' nor (1/n) Σ_{t=1}^n e_t² x_t x_t' (the estimator robust to conditional heteroskedasticity) as consistent estimators of S.

183
Testing Serial Correlation

Consider the regression y_t = x_t'β + ε_t. We want to test whether or not ε_t is serially correlated. Consider
ρ_j = Cov(ε_t, ε_{t-j}) / √(Var(ε_t) Var(ε_{t-j})) = Cov(ε_t, ε_{t-j}) / Var(ε_t) = γ_j/γ_0,
where γ_j = E[ε_t ε_{t-j}] and γ_0 = E[ε_t²].
Since γ_j is not observable, we need to consider
γ̃_j = (1/n) Σ_{t=j+1}^n ε_t ε_{t-j},  γ̃_0 = (1/n) Σ_{t=1}^n ε_t²,  ρ̃_j = γ̃_j/γ̃_0.

184
Proposition. If {ε_t} is a stationary MDS with E[ε_t² | ε_{t-1}, ε_{t-2}, ...] = σ², then √n γ̃_j →d N(0, σ⁴) and √n ρ̃_j →d N(0, 1).

Proposition. Under the assumptions of the previous proposition, the Box-Pierce Q statistic satisfies
Q_BP = Σ_{j=1}^p (√n ρ̃_j)² = n Σ_{j=1}^p ρ̃_j² →d χ²_p.

However, ρ̃_j is still unfeasible, as we do not observe the errors. Thus we use
γ̂_j = (1/n) Σ_{t=j+1}^n e_t e_{t-j},  γ̂_0 = (1/n) Σ_{t=1}^n e_t² = SSR/n,  ρ̂_j = γ̂_j/γ̂_0.

Exercise 3.6. Prove that ρ̂_j can be obtained from the regression of e_t on e_{t-j} (without intercept).

185
Testing with Strictly Exogenous Regressors

To test H0: ρ_j = 0 we consider the following proposition.

Proposition (testing for serial correlation with strictly exogenous regressors). Suppose that Assumptions 1.2, 2.1, 2.2 and 2.4 are satisfied. Then ρ̂_j →p 0 and √n ρ̂_j →d N(0, 1).

186
To test H0: ρ_1 = ρ_2 = ... = ρ_p = 0 we consider the following proposition.

Proposition (Box-Pierce Q and Ljung-Box Q). Suppose that Assumptions 1.2, 2.1, 2.2 and 2.4 are satisfied. Then
Q_BP = n Σ_{j=1}^p ρ̂_j² →d χ²_p,
Q_LB = n(n+2) Σ_{j=1}^p ρ̂_j²/(n-j) →d χ²_p.

It can be shown that the hypothesis H0: ρ_1 = ρ_2 = ... = ρ_p = 0 can also be tested through the auxiliary regression of e_t on e_{t-1}, ..., e_{t-p}: we calculate the F statistic for the hypothesis that the p coefficients of e_{t-1}, ..., e_{t-p} are all zero. (A code sketch of the Q statistics is given below.)
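A minimal sketch of the residual autocorrelations ρ̂_j and the Q statistics defined above; here e is a simulated AR(1) series standing in for the OLS residuals:

```python
# Residual autocorrelations, Box-Pierce Q and Ljung-Box Q.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
n, p = 300, 5
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.5 * e[t - 1] + rng.normal()              # autocorrelated "residuals"

gamma0 = np.mean(e**2)
rho = np.array([np.sum(e[j:] * e[:-j]) / n / gamma0 for j in range(1, p + 1)])
Q_BP = n * np.sum(rho**2)
Q_LB = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, p + 1)))
print(rho)
print(Q_BP, Q_LB, chi2.sf(Q_LB, df=p))                # small p-value: reject no serial correlation
```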

187
Testing with Predetermined, but Not Strictly Exogenous, Regressors

If the regressors are not strictly exogenous, then √n ρ̂_j no longer has the N(0,1) distribution and the residual-based Q statistic may not be asymptotically chi-squared. The trick consists in removing the effect of x_t in the regression of e_t on e_{t-1}, ..., e_{t-p}: consider instead the regression of e_t on x_t, e_{t-1}, ..., e_{t-p}, and then calculate the F statistic for the hypothesis that the p coefficients of e_{t-1}, ..., e_{t-p} are all zero. This regression is still valid when the regressors are strictly exogenous, so you may always use it. Given
e_t = θ_1 + θ_2 x_t2 + ... + θ_K x_tK + γ_1 e_{t-1} + ... + γ_p e_{t-p} + error_t,
the null hypothesis can be formulated as H0: γ_1 = ... = γ_p = 0. Use the F test.

188
[EViews screenshot omitted.]

189
Example. Consider: chnimp, the volume of imports of barium chloride from China; chempi, an index of chemical production (to control for overall demand for barium chloride); gas, the volume of gasoline production (another demand variable); rtwex, an exchange rate index (measures the strength of the dollar against several other currencies).

Equation 1: [EViews output (numerical values lost in transcription): OLS regression of LOGCHNIMP on C, LOGCHEMPI, LOGGAS and LOGRTWEX; monthly data, 131 observations.]

190
Equation 2: [EViews output (numerical values lost in transcription): Breusch-Godfrey serial correlation LM test for Equation 1 with 12 lags; the test equation regresses RESID on C, LOGCHEMPI, LOGGAS, LOGRTWEX and RESID(-1), ..., RESID(-12); the F statistic and Obs*R-squared with their p-values are reported.]

191
If you conclude that the errors are serially correlated you have a few options:
(a) You know (at least approximately) the form of the autocorrelation, and so you use a feasible GLS estimator.
(b) The second approach parallels the use of the White estimator for heteroskedasticity: you do not know the form of the autocorrelation, so you rely on OLS, but you use a consistent estimator of Avar(b).
(c) You are concerned only with the dynamic specification of the model and with forecasting. You may try to convert your model into a dynamically complete model.
(d) Your model may be misspecified: you respecify the model and the autocorrelation disappears.

192
Question (a): feasible GLS estimator

There are many forms of autocorrelation and each one leads to a different structure for the error covariance matrix V. The most popular form is known as the first-order autoregressive process. In this case the error term in
y_t = x_t'β + ε_t
is assumed to follow the AR(1) model
ε_t = ρε_{t-1} + v_t, |ρ| < 1,
where v_t is an error term with mean zero and constant conditional variance that exhibits no serial correlation. We assume that all the other assumptions would be satisfied were ρ = 0.

193
Initial model: y_t = x_t'β + ε_t, ε_t = ρε_{t-1} + v_t, |ρ| < 1.

The GLS estimator is the OLS estimator applied to the transformed model ỹ_t = x̃_t'β + v_t, where
ỹ_t = √(1-ρ²) y_1 for t = 1 and ỹ_t = y_t - ρy_{t-1} for t > 1,
x̃_t = √(1-ρ²) x_1 for t = 1 and x̃_t = x_t - ρx_{t-1} for t > 1.
Without the first observation, the transformed model is
y_t - ρy_{t-1} = (x_t - ρx_{t-1})'β + v_t.
If ρ is unknown we may replace it by a consistent estimator, or we may use the nonlinear least squares estimator (EViews).
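A minimal sketch of this feasible GLS transformation: estimate ρ from the regression of e_t on e_{t-1}, quasi-difference the data (keeping the scaled first observation), and re-run OLS. The data are simulated; in practice the procedure is often iterated:

```python
# Feasible GLS for AR(1) errors via the quasi-difference (Prais-Winsten) transformation.
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()          # AR(1) errors
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + eps

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b_ols
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1]**2)  # regression of e_t on e_{t-1}

# Quasi-difference the data (keeping the first observation scaled by sqrt(1 - rho^2))
y_t = np.concatenate(([np.sqrt(1 - rho_hat**2) * y[0]], y[1:] - rho_hat * y[:-1]))
X_t = np.vstack([np.sqrt(1 - rho_hat**2) * X[0], X[1:] - rho_hat * X[:-1]])
b_fgls, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
print(rho_hat, b_ols, b_fgls)
```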

194
Example (continuation of the previous example). Let us consider the residuals of Equation 1.

Equation 3: [EViews output (numerical values lost in transcription): least squares estimation of LOGCHNIMP on C, LOGCHEMPI, LOGGAS and LOGRTWEX with an AR(1) error term; adjusted sample, 130 observations; the inverted AR root is reported.]

Exercise 3.7. Consider y_t = β_1 + β_2 x_t2 + ε_t, where ε_t = ρε_{t-1} + v_t and {v_t} is a white noise process. Using the first differences of the variables one gets Δy_t = β_2 Δx_t2 + Δε_t. Show that Corr(Δε_t, Δε_{t-1}) = -(1-ρ)/2. Discuss the advantages and disadvantages of differencing the variables as a procedure to remove autocorrelation.

195
Question (b): Heteroskedasticity- and Autocorrelation-Consistent (HAC) Covariance Matrix Estimator

For the sake of generality, assume that you also have a problem of heteroskedasticity. Given
S = Var(√n ḡ) = Var(g_t) + (1/n) Σ_{j=1}^{n-1} Σ_{t=j+1}^n ( E[g_t g_{t-j}'] + E[g_{t-j} g_t'] )
  = E[ε_t² x_t x_t'] + (1/n) Σ_{j=1}^{n-1} Σ_{t=j+1}^n ( E[ε_t ε_{t-j} x_t x_{t-j}'] + E[ε_{t-j} ε_t x_{t-j} x_t'] ),
a possible estimator of S based on the analogy principle would be
(1/n) Σ_{t=1}^n e_t² x_t x_t' + (1/n) Σ_{j=1}^{ñ-1} Σ_{t=j+1}^n ( e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ), ñ < n.
A major problem with this estimator is that it is not necessarily positive semi-definite and hence cannot be a well-defined variance-covariance matrix.

196
Newey and West show that, with a suitable weighting function ω(j), the estimator below is consistent and positive semi-definite:
Ŝ_HAC = (1/n) Σ_{t=1}^n e_t² x_t x_t' + (1/n) Σ_{j=1}^L Σ_{t=j+1}^n ω(j) ( e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ),
where the weighting function is ω(j) = 1 - j/(L+1).
The maximum lag L must be determined in advance; autocorrelations at lags longer than L are ignored. For a moving-average process, this value is in general a small number. This estimator is known as the HAC covariance matrix estimator and is valid when both conditional heteroskedasticity and serial correlation are present but of an unknown form.

197
Example. For x_t = 1, n = 9 and L = 3 we have
Σ_{j=1}^L Σ_{t=j+1}^n ω(j) ( e_t e_{t-j} x_t x_{t-j}' + e_{t-j} e_t x_{t-j} x_t' ) = Σ_{j=1}^3 Σ_{t=j+1}^9 ω(j) 2 e_t e_{t-j}
= ω(1) (2e_1e_2 + 2e_2e_3 + 2e_3e_4 + 2e_4e_5 + 2e_5e_6 + 2e_6e_7 + 2e_7e_8 + 2e_8e_9)
+ ω(2) (2e_1e_3 + 2e_2e_4 + 2e_3e_5 + 2e_4e_6 + 2e_5e_7 + 2e_6e_8 + 2e_7e_9)
+ ω(3) (2e_1e_4 + 2e_2e_5 + 2e_3e_6 + 2e_4e_7 + 2e_5e_8 + 2e_6e_9),
with ω(1) = 1 - 1/4 = 0.75, ω(2) = 1 - 2/4 = 0.50, ω(3) = 1 - 3/4 = 0.25.

198
The Newey-West covariance matrix estimator (EViews):
Âvar(b) = S_xx⁻¹ Ŝ_HAC S_xx⁻¹.
EViews selects the lag truncation as L = floor(4 (n/100)^{2/9}).
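A minimal sketch of Ŝ_HAC with Bartlett weights ω(j) = 1 - j/(L+1) and the resulting HAC standard errors; the data are simulated, and L is chosen with the floor(4 (n/100)^{2/9}) rule quoted above:

```python
# Newey-West HAC covariance matrix estimator and HAC standard errors.
import numpy as np

rng = np.random.default_rng(8)
n = 200
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2])
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + rng.normal()          # autocorrelated errors
y = X @ np.array([1.0, 2.0]) + eps

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
L = int(np.floor(4 * (n / 100) ** (2 / 9)))

g = X * e[:, None]                      # g_t = x_t * e_t
S_hat = g.T @ g / n                     # lag-0 term: (1/n) sum of e_t^2 x_t x_t'
for j in range(1, L + 1):
    w = 1 - j / (L + 1)                 # Bartlett weight
    Gamma_j = g[j:].T @ g[:-j] / n      # (1/n) sum of g_t g_{t-j}'
    S_hat += w * (Gamma_j + Gamma_j.T)

S_xx_inv = np.linalg.inv(X.T @ X / n)
avar_b = S_xx_inv @ S_hat @ S_xx_inv
print(np.sqrt(np.diag(avar_b) / n))     # HAC (Newey-West) standard errors
```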

199
Example (continuation). Newey-West covariance matrix estimator, Âvar(b) = S_xx⁻¹ Ŝ_HAC S_xx⁻¹.

Equation 4: [EViews output (numerical values lost in transcription): OLS regression of LOGCHNIMP on C, LOGCHEMPI, LOGGAS and LOGRTWEX with Newey-West HAC standard errors and covariance (lag truncation = 4); 131 observations.]

200
Question (c): Dynamically Complete Models

Consider y_t = x_t'β + u_t such that E[u_t|x_t] = 0. This condition, although necessary for consistency, does not preclude autocorrelation. You may try to increase the number of regressors to x_t* and get a new regression model
y_t = x_t*'β + ε_t such that E[ε_t | x_t*, y_{t-1}, x_{t-1}*, y_{t-2}, ...] = 0.
Written in terms of y_t:
E[y_t | x_t*, y_{t-1}, x_{t-1}*, y_{t-2}, ...] = E[y_t | x_t*].

Definition. The model y_t = x_t'β + ε_t is dynamically complete (DC) if
E[ε_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, ...] = 0, or equivalently E[y_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, ...] = E[y_t | x_t],
holds (see Wooldridge).

201
Proposition. If a model is DC then the errors are not serially correlated; moreover, {g_i} is a MDS.

Notice that E[ε_t | x_t, y_{t-1}, x_{t-1}, y_{t-2}, ...] = 0 can be rewritten as E[ε_i | F_i] = 0, where
F_i = I_{i-1} ∪ {x_i} = {ε_{i-1}, ε_{i-2}, ..., ε_1, x_i, x_{i-1}, ..., x_1}, I_{i-1} = {ε_{i-1}, ε_{i-2}, ..., ε_1, x_{i-1}, ..., x_1}.

Example. Consider y_t = β_1 + β_2 x_t2 + u_t, u_t = φu_{t-1} + ε_t, where {ε_t} is a white noise process and E[ε_t | x_t2, y_{t-1}, x_{t-1,2}, y_{t-2}, ...] = 0. Set x_t = (1, x_t2)'. The above model is not DC, since the errors are autocorrelated. Notice that
E[y_t | x_t2, y_{t-1}, x_{t-1,2}, y_{t-2}, ...] = β_1 + β_2 x_t2 + φu_{t-1}
does not coincide with E[y_t | x_t] = E[y_t | x_t2] = β_1 + β_2 x_t2.

202
However, it is easy to obtain a DC model. Since
u_t = y_t - (β_1 + β_2 x_t2), u_{t-1} = y_{t-1} - (β_1 + β_2 x_{t-1,2}),
we have
y_t = β_1 + β_2 x_t2 + u_t = β_1 + β_2 x_t2 + φu_{t-1} + ε_t = β_1 + β_2 x_t2 + φ(y_{t-1} - β_1 - β_2 x_{t-1,2}) + ε_t.
This equation can be written in the form
y_t = γ_1 + γ_2 x_t2 + γ_3 y_{t-1} + γ_4 x_{t-1,2} + ε_t.
Let x_t* = (x_t2, y_{t-1}, x_{t-1,2})'. The previous model is DC, as
E[y_t | x_t*, y_{t-1}, x_{t-1}, ...] = E[y_t | x_t*] = γ_1 + γ_2 x_t2 + γ_3 y_{t-1} + γ_4 x_{t-1,2}.

203
Example (continuation): dynamically complete model.

Equation 5: [EViews output (numerical values lost in transcription): OLS regression of LOGCHNIMP on C, LOGCHEMPI, LOGGAS, LOGRTWEX, lagged values of these regressors and lagged LOGCHNIMP; adjusted sample, 130 observations.]

Equation 6: [EViews output (numerical values lost in transcription): Breusch-Godfrey serial correlation LM test for Equation 5 with 12 lags; the F statistic and Obs*R-squared with their p-values are reported.]

204
Question (d): Misspecification

In many cases the finding of autocorrelation is an indication that the model is misspecified. If this is the case, the most natural route is not to change your estimator from OLS to GLS but to change your model. Types of misspecification that may lead to a finding of autocorrelation in your OLS residuals:
- dynamic misspecification (related to question (c));
- omitted variables that are autocorrelated;
- y_t and/or x_tk are integrated processes, e.g. y_t ~ I(1);
- functional form misspecification.

205
Functional form misspecification. Suppose that the true relationship is y_t = β_1 + β_2 log t + ε_t. In the following figure we estimate a misspecified functional form, y_t = β_1 + β_2 t + ε_t. The residuals are clearly autocorrelated.
[Figure: data generated with a logarithmic trend, fitted linear trend, and the resulting autocorrelated residuals.]

206
Time Regressions

Consider y_t = α + δf(t) + ε_t, where f(t) is a function of time (e.g. f(t) = t or f(t) = t², etc.). This kind of model does not satisfy Assumption 2.2 ({y_i, x_i} jointly S&WD). This type of nonstationarity is not serious and OLS is applicable. Let us focus on the case
y_t = α + δt + ε_t = x_t'β + ε_t, x_t = (1, t)', β = (α, δ)'.
α + δt is called the time trend of y_t.

Definition. We say that a process is trend stationary if it can be written as the sum of a time trend and a stationary process. The process {y_t} here is a special trend-stationary process in which the stationary component is independent white noise.

207
The Asymptotic Distribution of the OLS Estimator

Let b = (α̂, δ̂)' = (X'X)⁻¹X'y be the OLS estimate of β based on a sample of size n.

Proposition 2.11 (OLS estimation of the time regression). Consider the time regression y_t = α + δt + ε_t, where ε_t is independent white noise with E[ε_t²] = σ² and E[ε_t⁴] < ∞. Then
( √n (α̂ - α), n^{3/2} (δ̂ - δ) )' →d N( 0, σ² [ 1, 1/2 ; 1/2, 1/3 ]⁻¹ ) = N( 0, σ² [ 4, -6 ; -6, 12 ] ).
As in the stationary case, α̂ is √n-consistent because √n (α̂ - α) converges to a normal random variable. The OLS estimate of the time coefficient, δ̂, is also consistent, but the speed of convergence is faster: it is n^{3/2}-consistent in that n^{3/2} (δ̂ - δ) converges to a random variable. In this sense, δ̂ is superconsistent.

208
We provide a simpler proof of Proposition 2.11 in the case y_t = δt + ε_t. We have
δ̂ - δ = (X'X)⁻¹X'ε = (Σ_{t=1}^n t²)⁻¹ Σ_{t=1}^n t ε_t
= [ Σ_{t=1}^n t ε_t / √(Var(Σ_{t=1}^n t ε_t)) ] · √(Var(Σ_{t=1}^n t ε_t)) / Σ_{t=1}^n t²
= [ σ √(Σ_{t=1}^n t²) / Σ_{t=1}^n t² ] Z_n,
where Z_n = Σ_{t=1}^n t ε_t / (σ √(Σ_{t=1}^n t²)) →d Z ~ N(0, 1).

209
Hence
n^{3/2} (δ̂ - δ) = n^{3/2} [ σ √(Σ_{t=1}^n t²) / Σ_{t=1}^n t² ] Z_n.
Since lim_{n→∞} n^{3/2} σ √(Σ_{t=1}^n t²) / Σ_{t=1}^n t² = σ√3, we have n^{3/2} (δ̂ - δ) →d σ√3 Z ~ N(0, 3σ²).

Hypothesis Testing for Time Regressions

The OLS coefficient estimates of the time regression are asymptotically normal, provided the sampling error is properly scaled. Inference about δ̂ can be based on
n^{3/2} (δ̂ - δ) / √(12 s²) →d N(0, 1) in the case y_t = α + δt + ε_t,
n^{3/2} (δ̂ - δ) / √(3 s²) →d N(0, 1) in the case y_t = δt + ε_t.
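A small simulation (an illustrative sketch with made-up parameters) showing the faster n^{3/2} rate of the trend coefficient: the dispersion of δ̂ - δ shrinks much faster than 1/√n as n grows.

```python
# Superconsistency of the OLS trend coefficient in y_t = alpha + delta*t + eps_t.
import numpy as np

rng = np.random.default_rng(9)
alpha, delta, sigma = 1.0, 0.2, 1.0
for n in [50, 500, 5000]:
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), t])
    errs = []
    for _ in range(500):
        y = alpha + delta * t + sigma * rng.normal(size=n)
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        errs.append(b[1] - delta)
    print(n, np.std(errs))   # shrinks roughly like n^{-3/2}
```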

210
4 Endogeneity and the GMM

Consider y_i = β_1 z_i1 + β_2 z_i2 + ... + β_K z_iK + ε_i. If Cov(z_ij, ε_i) ≠ 0, or E[z_ij ε_i] ≠ 0, then we say that z_ij (the j-th regressor) is endogenous. It follows that E[z_i ε_i] ≠ 0.

Definition (endogenous regressor). We say that a regressor is endogenous if it is not predetermined (i.e., not orthogonal to the error term), that is, if it does not satisfy the orthogonality condition (Assumption 2.3 does not hold).

If the regressors are endogenous we have, under Assumptions 2.1, 2.2 and 2.4,
b = β + [ (1/n) Σ_{i=1}^n z_i z_i' ]⁻¹ (1/n) Σ_{i=1}^n z_i ε_i →p β + Σ_zz⁻¹ E[z_i ε_i] ≠ β,
since E[z_i ε_i] ≠ 0. The term Σ_zz⁻¹ E[z_i ε_i] is the asymptotic bias.

211
Example (simple regression model). Consider y_i = β_1 + β_2 z_i2 + ε_i. The OLS estimator is
b = (b_1, b_2)' = (Z'Z)⁻¹Z'y = ( ȳ - (Ĉov(z_i2, y_i)/S²_{z_2}) z̄_2 , Ĉov(z_i2, y_i)/S²_{z_2} )',
where Ĉov(z_i2, y_i) = (1/n) Σ (z_i2 - z̄_2)(y_i - ȳ) and S²_{z_2} = (1/n) Σ (z_i2 - z̄_2)².
Under Assumption 2.2 we have
b_2 = Ĉov(z_i2, y_i)/S²_{z_2} →p Cov(z_i2, y_i)/Var(z_i2) = Cov(z_i2, β_1 + β_2 z_i2 + ε_i)/Var(z_i2) = β_2 + Cov(z_i2, ε_i)/Var(z_i2).

212
b_1 = ȳ - (Ĉov(z_i2, y_i)/S²_{z_2}) z̄_2 →p E[y_i] - (Cov(z_i2, y_i)/Var(z_i2)) E[z_i2]
= β_1 + β_2 E[z_i2] - ( β_2 + Cov(z_i2, ε_i)/Var(z_i2) ) E[z_i2]
= β_1 - (Cov(z_i2, ε_i)/Var(z_i2)) E[z_i2].
If Cov(z_i2, ε_i) = 0, then b_i →p β_i. If z_i2 is endogenous, b_1 and b_2 are inconsistent.
Exercise: show that Σ_zz⁻¹ E[z_i ε_i] = ( -(Cov(z_i2, ε_i)/Var(z_i2)) E[z_i2] , Cov(z_i2, ε_i)/Var(z_i2) )'.

213
Examples of Endogeneity

Simultaneous Equations Bias

Example. Consider
y_i1 = α_0 + α_1 y_i2 + ε_i1,
y_i2 = β_0 + β_1 y_i1 + ε_i2,
where ε_i1 and ε_i2 are independent. By construction, y_i1 and y_i2 are endogenous regressors. In fact, it can be proved that
Cov(y_i2, ε_i1) = β_1 Var(ε_i1)/(1 - β_1α_1) ≠ 0, Cov(y_i1, ε_i2) = α_1 Var(ε_i2)/(1 - β_1α_1) ≠ 0.
Now
α̂_1,OLS →p Cov(y_i2, y_i1)/Var(y_i2) = Cov(y_i2, α_0 + α_1 y_i2 + ε_i1)/Var(y_i2) = α_1 + Cov(y_i2, ε_i1)/Var(y_i2) ≠ α_1,
β̂_1,OLS →p Cov(y_i1, y_i2)/Var(y_i1) = Cov(y_i1, β_0 + β_1 y_i1 + ε_i2)/Var(y_i1) = β_1 + Cov(y_i1, ε_i2)/Var(y_i1) ≠ β_1.

214
The OLS estimator is inconsistent for both α_1 and β_1 (and for α_0 and β_0 as well). This phenomenon is known as the simultaneous equations bias, or simultaneity bias, because the regressor and the error term are often related to each other through a system of simultaneous equations.

Example. Consider
C_i = α_0 + α_1 Y_i + u_i (consumption function), Y_i = C_i + I_i (GNP identity),
where Cov(u_i, I_i) = 0. It can be proved that
α̂_1,OLS →p α_1 + Var(u_i) / ((1 - α_1) Var(Y_i)).

Example (see Hayashi):
q_i^d = α_0 + α_1 p_i + u_i (demand equation), q_i^s = β_0 + β_1 p_i + v_i (supply equation), q_i^d = q_i^s (market equilibrium).

215 Errors-in-Variables Bias

We will see that a predetermined regressor necessarily becomes endogenous when measured with error. This problem is ubiquitous, particularly in micro data on households. Consider
y_i* = β z_i* + u_i,
where z_i* is a predetermined regressor. The variables y_i* and z_i* are measured with error: we observe y_i = y_i* + ε_i and z_i = z_i* + v_i. Assume that E(z_i* u_i) = E(z_i* ε_i) = E(z_i* v_i) = E(v_i u_i) = E(v_i ε_i) = 0. The regression equation in terms of the observed variables is
y_i = β z_i + η_i,  η_i = u_i + ε_i − β v_i.
Assuming S&WD we have, after some calculations,
β̂_OLS = Σ_i z_i y_i / Σ_i z_i² = ( Σ_i z_i y_i / n ) / ( Σ_i z_i² / n ) →_p β − β E(v_i²) / E(z_i²) ≠ β.
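A short simulation sketch (assumed data-generating process, for illustration only) of the attenuation result: the OLS slope converges to β(1 − E(v_i²)/E(z_i²)), not to β.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 500_000, 1.5
z_star = rng.normal(0, 2, n)          # latent predetermined regressor
u = rng.normal(size=n)
v = rng.normal(0, 1, n)               # measurement error in z
z = z_star + v                        # observed regressor
y = beta * z_star + u                 # measurement error in y would only add noise to eta

beta_ols = (z @ y) / (z @ z)                        # OLS through the origin
plim = beta * (1 - np.mean(v**2) / np.mean(z**2))   # beta - beta*E[v^2]/E[z^2]
print(beta_ols, plim)                               # both clearly below beta = 1.5
```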

216 Omitted Variable Bias

Consider the long regression
y = X_1 β_1 + X_2 β_2 + u
and suppose that this model satisfies the assumptions (hence OLS based on this equation is consistent). However, for some reason X_2 is not included in the regression model (short regression):
y = X_1 β_1 + ε,  ε = X_2 β_2 + u.
We are interested only in β_1. We have
b_1 = (X_1'X_1)^{-1} X_1'y
    = (X_1'X_1)^{-1} X_1'(X_1 β_1 + X_2 β_2 + u)
    = β_1 + (X_1'X_1)^{-1} X_1'X_2 β_2 + (X_1'X_1)^{-1} X_1'u
    = β_1 + (X_1'X_1/n)^{-1} (X_1'X_2/n) β_2 + (X_1'X_1/n)^{-1} (X_1'u/n).

217 This expression converges in probability to β_1 + Σ_{x_1 x_1}^{-1} Σ_{x_1 x_2} β_2. The conclusion is that b_1 is inconsistent if there are omitted variables that are correlated with X_1. The variables in X_1 are endogenous as long as Cov(X_1, X_2) ≠ 0, since
Cov(X_1, ε) = Cov(X_1, X_2 β_2 + u) = Cov(X_1, X_2) β_2.

Example. Consider the problem of unobserved ability in a wage equation for working adults. A simple model is
log WAGE_i = β_1 + β_2 educ_i + β_3 abil_i + u_i,
where u_i is the error term. We put abil_i into the error term, and we are left with the simple regression model
log WAGE_i = β_1 + β_2 educ_i + ε_i,  where ε_i = β_3 abil_i + u_i.

218 The OLS estimator of β_2 will be inconsistent if educ_i and abil_i are correlated. In effect,
b_2 →_p β_2 + Cov(educ_i, ε_i) / Var(educ_i) = β_2 + Cov(educ_i, β_3 abil_i + u_i) / Var(educ_i) = β_2 + β_3 Cov(educ_i, abil_i) / Var(educ_i).
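The omitted-ability bias can be checked numerically. The sketch below uses a made-up data-generating process (all coefficients and distributions are illustrative assumptions) and compares the short-regression slope with β_2 + β_3 Cov(educ, abil)/Var(educ).

```python
import numpy as np

rng = np.random.default_rng(3)
n, b1, b2, b3 = 300_000, 0.5, 0.08, 0.10
abil = rng.normal(size=n)
educ = 12 + 2 * abil + rng.normal(0, 2, n)              # schooling correlated with ability
logwage = b1 + b2 * educ + b3 * abil + rng.normal(0, 0.3, n)

slope = np.cov(educ, logwage, ddof=1)[0, 1] / np.var(educ, ddof=1)        # short regression
plim = b2 + b3 * np.cov(educ, abil, ddof=1)[0, 1] / np.var(educ, ddof=1)  # predicted limit
print(slope, plim)   # the short-regression slope overstates the return to schooling
```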

219 The General Formulation

Regressors and Instruments
Definition. x_i is an instrumental variable (IV) for z_i if (1) x_i is uncorrelated with ε_i, that is, Cov(x_i, ε_i) = 0 (thus, x_i is a predetermined variable), and (2) x_i is correlated with z_i, that is, Cov(x_i, z_i) ≠ 0.

Exercise 4.1. Consider log wage_i = β_1 + β_2 educ_i + ε_i. Omitted variable: ability. (a) Is educ an endogenous variable? (b) Can IQ be considered an IV for educ? And mother's education?

Exercise 4.2. Consider children_i = β_1 + β_2 mothereduc_i + β_3 motherage_i + ε_i. Omitted variable: bcm_i, a dummy equal to one if the mother is informed about birth control methods. (a) Is mothereduc endogenous? (b) Suggest an IV for mothereduc.

Exercise 4.3. Consider score_i = β_1 + β_2 skipped_i + ε_i. Omitted variable: motivation. (a) Is skipped_i endogenous? (b) Can the distance between home (or living quarters) and the university be considered an IV?

220 Exercise 4.4 (Wooldridge, Chap. 15). Consider a simple model to estimate the effect of personal computer (PC) ownership on college grade point average for graduating seniors at a large public university:
GPA_i = β_1 + β_2 PC_i + ε_i,
where PC is a binary variable indicating PC ownership. (a) Why might PC ownership be correlated with ε_i? (b) Explain why PC is likely to be related to parents' annual income. Does this mean parental income is a good IV for PC? Why or why not? (c) Suppose that, four years ago, the university gave grants to buy computers to roughly one-half of the incoming students, and the students who received grants were randomly chosen. Carefully explain how you would use this information to construct an instrumental variable for PC. (d) Same question as (c), but suppose that the university gave grant priority to low-income students.

See the use of IV in errors-in-variables problems in Wooldridge's textbook.

221 Assumption (linearity). The equation to be estimated is linear:
y_i = z_i'δ + ε_i,  i = 1, 2, ..., n,
where z_i is an L-dimensional vector of regressors, δ is an L-dimensional coefficient vector and ε_i is an unobservable error term.

Assumption (S&WD). Let x_i be a K-dimensional vector to be referred to as the vector of instruments, and let w_i be the unique and nonconstant elements of (y_i, z_i, x_i). {w_i} is jointly stationary and weakly dependent.

Assumption (orthogonality conditions). All the K variables in x_i are predetermined in the sense that they are all orthogonal to the current error term: E(x_{ik} ε_i) = 0 for all i and k. This can be written as
E( x_i (y_i − z_i'δ) ) = 0  or  E(g_i) = 0,  where g_i = x_i ε_i.
Notice: x_i should include the constant 1. Not only can x_{i1} = 1 be considered as an IV, it also guarantees that E( 1 × (y_i − z_i'δ) ) = 0, i.e. E(ε_i) = 0.

222 Example 3.1. Consider
q_i = α_0 + α_1 p_i + u_i (demand equation),
where Cov(p_i, u_i) ≠ 0, and x_i is such that Cov(x_i, p_i) ≠ 0 but Cov(x_i, u_i) = 0. Using the previous notation we have:
y_i = q_i,  z_i = [1, p_i]',  δ = [α_0, α_1]',  L = 2,
x_i = [1, x_i]',  K = 2,  w_i = [q_i, p_i, x_i]'.
In the above example, x_i and z_i share the same variable (the constant). The instruments that are also regressors are called predetermined regressors, and the rest of the regressors, those that are not included in x_i, are called endogenous regressors.

223 Example (wage equation). Consider
LW_i = δ_1 + δ_2 S_i + δ_3 EXPR_i + δ_4 IQ_i + ε_i,
where:
LW_i is the log wage of individual i,
S_i is completed years of schooling (we assume it is predetermined),
EXPR_i is experience in years (we assume it is predetermined),
IQ_i is IQ; being an error-ridden measure of the individual's ability, it is endogenous due to the errors-in-variables problem.
We also have information on:
AGE_i, the age of the individual (predetermined),
MED_i, mother's education in years (predetermined).
Note: AGE_i is excluded from the wage equation, reflecting the underlying assumption that, once experience is controlled for, age has no effect on the wage rate.

224 In terms of the general model,
y_i = LW_i,  z_i = [1, S_i, EXPR_i, IQ_i]',  δ = [δ_1, δ_2, δ_3, δ_4]',  L = 4,
x_i = [1, S_i, EXPR_i, AGE_i, MED_i]',  K = 5,
w_i = [LW_i, S_i, EXPR_i, IQ_i, AGE_i, MED_i]'.

225 Identification

The GMM estimation of the parameter vector δ is about how to exploit the information afforded by the orthogonality conditions
E( x_i (y_i − z_i'δ) ) = 0  ⟺  E(x_i z_i') δ = E(x_i y_i).
E(x_i z_i') δ = E(x_i y_i) can be interpreted as a linear system with K equations where δ is the unknown vector. Notice: E(x_i z_i') is a K × L matrix and E(x_i y_i) is a K × 1 vector. Can we solve the system with respect to δ? We need to study the identification of the system.

Assumption (rank condition for identification). The K × L matrix E(x_i z_i') is of full column rank (i.e., its rank equals L, the number of its columns). We denote this matrix by Σ_xz.

226 Example. Consider Example 3.2 (the wage equation), where
x_i = [1, S_i, EXPR_i, AGE_i, MED_i]',  z_i = [1, S_i, EXPR_i, IQ_i]'.
We have
x_i z_i' =
[ 1        S_i          EXPR_i          IQ_i
  S_i      S_i²         S_i EXPR_i      S_i IQ_i
  EXPR_i   EXPR_i S_i   EXPR_i²         EXPR_i IQ_i
  AGE_i    AGE_i S_i    AGE_i EXPR_i    AGE_i IQ_i
  MED_i    MED_i S_i    MED_i EXPR_i    MED_i IQ_i ].

227 Taking expectations,
E(x_i z_i') = Σ_xz =
[ 1           E(S_i)         E(EXPR_i)         E(IQ_i)
  E(S_i)      E(S_i²)        E(S_i EXPR_i)     E(S_i IQ_i)
  E(EXPR_i)   E(EXPR_i S_i)  E(EXPR_i²)        E(EXPR_i IQ_i)
  E(AGE_i)    E(AGE_i S_i)   E(AGE_i EXPR_i)   E(AGE_i IQ_i)
  E(MED_i)    E(MED_i S_i)   E(MED_i EXPR_i)   E(MED_i IQ_i) ].
Assumption 3.4 requires that rank(Σ_xz) = 4.

228 Order Condition for Identification

Since rank(Σ_xz) ≤ min{K, L}, we have: if K < L then rank(Σ_xz) < L. Thus a necessary condition for identification is that K ≥ L.

Definition (order condition for identification). K ≥ L, or
#orthogonality conditions (K) ≥ #parameters (L).

Definition. We say that the equation is overidentified if the rank condition is satisfied and K > L; exactly identified (or just identified) if the rank condition is satisfied and K = L; and underidentified (or not identified) if the order condition is not satisfied (i.e., if K < L).

229 Example. Consider the system Ax = b, with A = E(x_i z_i') and b = E(x_i y_i). It can be proved that the system is always possible (it has at least one solution). Consider the following scenarios:

1. If rank(A) = L and K = L, the SLE is exactly identified. Example: a 2 × 2 system of full rank, with first equation x_1 + x_2 = 3 and unique solution x_1 = 2, x_2 = 1. Note: rank(A) = 2 = L = K.

2. If rank(A) = L and K > L, the SLE is overidentified. Example: a 3 × 2 system whose three equations are mutually consistent, with unique solution x_1 = 2, x_2 = 1. Note: rank(A) = 2 = L and K = 3.

230 3. If rank(A) < L, the SLE is underidentified. Example: a 2 × 2 system whose two equations both reduce to x_1 + x_2 = 2, so that x_1 = 2 − x_2 with x_2 ∈ R free. Note: rank(A) = 1 < L.

4. If K < L then rank(A) < L and the SLE is underidentified. Example:
[1 1] [x_1 ; x_2] = 1  ⟹  x_1 = 1 − x_2,  x_2 ∈ R.
Note: rank(A) = 1 and K = 1 < L = 2.
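These identification scenarios can be checked mechanically. The sketch below (the matrices are illustrative, not taken from the slides) classifies a system Ax = b from the dimensions and the rank of A, mirroring the order and rank conditions.

```python
import numpy as np

def identification_status(A):
    """Classify the system A x = b, with A playing the role of E[x_i z_i'] (K x L)."""
    K, L = A.shape
    if np.linalg.matrix_rank(A) < L:
        return "underidentified"                   # rank condition fails (covers K < L)
    return "exactly identified" if K == L else "overidentified"

A_exact = np.array([[1.0, 1.0], [0.0, 1.0]])               # K = L = 2, rank 2
A_over  = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])   # K = 3 > L = 2, rank 2
A_under = np.array([[1.0, 1.0], [1.0, 1.0]])               # rank 1 < L = 2
for A in (A_exact, A_over, A_under):
    print(A.shape, identification_status(A))
```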

231 The Assumption for Asymptotic Normality

Assumption ({g_i} is a martingale difference sequence with finite second moments). Let g_i = x_i ε_i. {g_i} is a martingale difference sequence (so E(g_i) = 0). The K × K matrix of cross moments, E(g_i g_i'), is nonsingular. Let S = Avar(ḡ).

Remarks:
Assumption 3.5 implies Avar(ḡ) = lim Var(√n ḡ) = E(g_i g_i').
Assumption 3.5 implies √n ḡ →_d N(0, Avar(ḡ)).
If the instruments include a constant, then this assumption implies that the error is a martingale difference sequence (and a fortiori serially uncorrelated).

232 A sufficient, and perhaps easier to understand, condition for Assumption 3.5 is that E(ε_i | F_i) = 0, where
I_{i−1} = {ε_{i−1}, ε_{i−2}, ..., ε_1, x_{i−1}, ..., x_1},  F_i = I_{i−1} ∪ {x_i} = {ε_{i−1}, ε_{i−2}, ..., ε_1, x_i, x_{i−1}, ..., x_1}.
It implies that the error term is orthogonal not only to the current but also to the past instruments.

Since g_i g_i' = ε_i² x_i x_i', S is a matrix of fourth moments. Consistent estimation of S will require a fourth-moment assumption, to be specified in Assumption 3.6 below.

If {g_i} is serially correlated, then S does not equal E(g_i g_i') and will take a more complicated form.

233 Generalized Method of Moments (GMM) Defined

The method of moments principle: to estimate a feature of the population, use the corresponding feature of the sample. Examples:

Parameter of the population → Estimator
E(y_i) → Ȳ
Var(y_i) → S_y²
E( x_i (y_i − z_i'δ) ) → (1/n) Σ_i x_i (y_i − z_i'δ)

Method of moments: choose the parameter estimate so that the corresponding sample moments are also equal to zero. Since we know that E( x_i (y_i − z_i'δ) ) = 0, we choose the parameter estimate δ̂ so that
(1/n) Σ_{i=1}^n x_i (y_i − z_i'δ̂) = 0.

234 Another way of writing (1/n) Σ_{i=1}^n x_i (y_i − z_i'δ) = 0:
(1/n) Σ_{i=1}^n g_i = 0  ⟺  (1/n) Σ_{i=1}^n g(w_i; δ) = 0  ⟺  g_n(δ) = 0.
Let us expand g_n(δ) = 0:
(1/n) Σ_{i=1}^n x_i (y_i − z_i'δ) = 0
⟺ (1/n) Σ_{i=1}^n x_i y_i − (1/n) Σ_{i=1}^n x_i z_i' δ = 0
⟺ (1/n) Σ_{i=1}^n x_i z_i' δ = (1/n) Σ_{i=1}^n x_i y_i
⟺ S_xz δ = s_xy.

235 Thus
S_xz δ = s_xy,  with S_xz of dimension K × L, δ of dimension L × 1 and s_xy of dimension K × 1,
is a system with K linear equations in L unknowns. S_xz δ = s_xy is the sample analogue of E( x_i (y_i − z_i'δ) ) = 0, that is, of E(x_i z_i') δ = E(x_i y_i).

236 Method of Moments

Consider S_xz δ = s_xy. If K = L and rank(Σ_xz) = L (so that Σ_xz := E(x_i z_i') is invertible), then S_xz is invertible in probability for n large enough. Solving S_xz δ = s_xy with respect to δ gives
δ̂_IV = S_xz^{-1} s_xy = ( (1/n) Σ_{i=1}^n x_i z_i' )^{-1} ( (1/n) Σ_{i=1}^n x_i y_i ) = (X'Z)^{-1} X'y.

237 Example. Consider y_i = δ_1 + δ_2 z_{i2} + ε_i and suppose that Cov(z_{i2}, ε_i) ≠ 0, that is, z_{i2} is an endogenous variable. We have L = 2, so we need at least K = 2 instrumental variables. Let x_i = [1, x_{i2}]' and suppose that Cov(x_{i2}, ε_i) = 0 and Cov(x_{i2}, z_{i2}) ≠ 0. Thus an IV estimator is δ̂_IV = (X'Z)^{-1} X'y.

Exercise 4.5. Consider the previous example. (a) Show that the IV estimator δ̂_{2,IV} can be written as
δ̂_{2,IV} = Σ_{i=1}^n (x_{i2} − x̄_2)(y_i − ȳ) / Σ_{i=1}^n (x_{i2} − x̄_2)(z_{i2} − z̄_2).
(b) Show that Cov(x_{i2}, y_i) = δ_2 Cov(x_{i2}, z_{i2}) + Cov(x_{i2}, ε_i). (c) Based on part (b), show that δ̂_{2,IV} →_p δ_2 (write down the assumptions you need to prove these results).
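A hedged numerical sketch of the just-identified IV estimator δ̂_IV = (X'Z)^{-1} X'y for this example (the data-generating process and its parameters are invented for illustration); the OLS estimator is shown for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d1, d2 = 100_000, 1.0, 2.0
x2 = rng.normal(size=n)                        # instrument: uncorrelated with the error
u = rng.normal(size=n)
z2 = 0.7 * x2 + 0.8 * u + rng.normal(size=n)   # endogenous regressor (correlated with u)
y = d1 + d2 * z2 + u

X = np.column_stack([np.ones(n), x2])          # n x K matrix of instruments
Z = np.column_stack([np.ones(n), z2])          # n x L matrix of regressors, K = L = 2
delta_iv = np.linalg.solve(X.T @ Z, X.T @ y)   # (X'Z)^{-1} X'y
delta_ols = np.linalg.solve(Z.T @ Z, Z.T @ y)  # inconsistent: slope limit is d2 + Cov(z2,u)/Var(z2)
print(delta_iv, delta_ols)                     # IV close to (1, 2); OLS slope too large
```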

238 GMM

It may happen that K > L (there are more orthogonality conditions than parameters). In principle, it is better to have as many IVs as possible, so the case K > L is desirable, but then the system S_xz δ = s_xy may not have a solution.

Example. Suppose S_xz is a 4 × 3 matrix and s_xy a 4 × 1 vector (K = 4, L = 3), and try, if you can, to solve S_xz δ = s_xy. This system is of the same type as
δ_1 + δ_2 = 1
δ_3 = 1
...
δ_1 + δ_2 = 2:
the first and fourth equations are incompatible, so the system is impossible; there is no solution.

239 This means we cannot set g_n(δ) exactly equal to 0. However, we can at least choose δ so that g_n(δ) is as close to 0 as possible. In linear algebra, two vectors are close if the distance between them is relatively small. We define the distance in R^K as follows: the distance between ξ and η is equal to
(ξ − η)' Ŵ (ξ − η),
where Ŵ, called the weighting matrix, is a symmetric positive definite matrix defining the distance.

Example. If ξ = [1, 2]', η = [3, 5]' and Ŵ is the 2 × 2 identity matrix, the distance between these two vectors is
(ξ − η)' Ŵ (ξ − η) = (−2)² + (−3)² = 13.

240 Definition (GMM estimator). Let Ŵ be a K × K symmetric positive definite matrix, possibly dependent on the sample, such that Ŵ →_p W as n → ∞, with W symmetric and positive definite. The GMM estimator of δ, denoted δ̂(Ŵ), is
δ̂(Ŵ) = argmin_δ J(δ, Ŵ),
where
J(δ, Ŵ) = n g_n(δ)' Ŵ g_n(δ) = n (s_xy − S_xz δ)' Ŵ (s_xy − S_xz δ).

Proposition. Under Assumptions 3.2 and 3.4, the GMM estimator is
δ̂(Ŵ) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ s_xy.
To prove this proposition you need the following rule:
∂(q'Wq)/∂δ = 2 (∂q'/∂δ) W q,
where q is a K × 1 vector depending on δ and W is a K × K matrix not depending on δ.
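The closed-form expression in the proposition translates directly into code. This is a minimal sketch (the function name and arguments are mine, not the slides'): it computes δ̂(Ŵ) = (S_xz'ŴS_xz)^{-1} S_xz'Ŵ s_xy for given data matrices and weighting matrix.

```python
import numpy as np

def linear_gmm(X, Z, y, W):
    """Linear GMM estimator. X: n x K instruments, Z: n x L regressors,
    y: n-vector, W: K x K symmetric positive definite weighting matrix."""
    n = len(y)
    S_xz = X.T @ Z / n          # sample analogue of E[x_i z_i']
    s_xy = X.T @ y / n          # sample analogue of E[x_i y_i]
    A = S_xz.T @ W @ S_xz
    return np.linalg.solve(A, S_xz.T @ W @ s_xy)
```

With W = (X'X/n)^{-1} this gives the 2SLS estimator used in the two-step procedure below; when K = L the choice of W is irrelevant and the formula collapses to the IV estimator (X'Z)^{-1} X'y.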

241 If K = L then S_xz is invertible and δ̂(Ŵ) reduces to the IV estimator:
δ̂(Ŵ) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ s_xy = S_xz^{-1} Ŵ^{-1} (S_xz')^{-1} S_xz' Ŵ s_xy = S_xz^{-1} s_xy = δ̂_IV.

Sampling Error
The GMM estimator can be written as
δ̂(Ŵ) = δ + ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ ḡ.

242 Proof: First consider
s_xy = (1/n) Σ_i x_i y_i = (1/n) Σ_i x_i (z_i'δ + ε_i) = (1/n) Σ_i x_i z_i' δ + (1/n) Σ_i x_i ε_i = S_xz δ + ḡ.
Replacing s_xy = S_xz δ + ḡ into δ̂(Ŵ) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ s_xy produces:
δ̂(Ŵ) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ ( S_xz δ + ḡ )
      = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ S_xz δ + ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ ḡ
      = δ + ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ ḡ.

243 Large-Sample Properties of GMM

Asymptotic Distribution of the GMM Estimator
Proposition (asymptotic distribution of the GMM estimator).
(a) (Consistency) Under Assumptions 3.1–3.4, δ̂(Ŵ) →_p δ.
(b) (Asymptotic normality) If Assumption 3.3 is strengthened to Assumption 3.5, then
√n ( δ̂(Ŵ) − δ ) →_d N( 0, Avar(δ̂(Ŵ)) ),
where
Avar(δ̂(Ŵ)) = ( Σ_xz' W Σ_xz )^{-1} Σ_xz' W S W Σ_xz ( Σ_xz' W Σ_xz )^{-1}.
Recall: S ≡ E(g_i g_i').
(c) (Consistent estimate of Avar(δ̂(Ŵ))) Suppose there is available a consistent estimator, Ŝ, of S. Then, under Assumption 3.2, Avar(δ̂(Ŵ)) is consistently estimated by
Âvar(δ̂(Ŵ)) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ Ŝ Ŵ S_xz ( S_xz' Ŵ S_xz )^{-1}.

244 Estimation of Error Variance

Proposition (consistent estimation of error variance). For any consistent estimator δ̂ of δ, under Assumptions 3.1 and 3.2 and the assumptions that E(z_i z_i') and E(ε_i²) exist and are finite, we have
(1/n) Σ_{i=1}^n ε̂_i² →_p E(ε_i²),  where ε̂_i ≡ y_i − z_i'δ̂.

Hypothesis Testing
Proposition (robust t-ratio and Wald statistics). Suppose Assumptions 3.1–3.5 hold, and suppose there is available a consistent estimate Ŝ of S ( ≡ Avar(ḡ) = E(g_i g_i') ). Let
Âvar(δ̂(Ŵ)) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ Ŝ Ŵ S_xz ( S_xz' Ŵ S_xz )^{-1}.

245 Then:
(a) Under the null H_0: δ_j = δ_j^0,
t_j = √n ( δ̂_j(Ŵ) − δ_j^0 ) / √( Âvar(δ̂(Ŵ))_jj ) = ( δ̂_j(Ŵ) − δ_j^0 ) / SE_j →_d N(0, 1),
where Âvar(δ̂(Ŵ))_jj is the (j, j) element of Âvar(δ̂(Ŵ)) and SE_j = √( (1/n) Âvar(δ̂(Ŵ))_jj ).
(b) Under the null hypothesis H_0: Rδ = r, where p is the number of restrictions and R (p × L) is of full row rank,
W = n ( Rδ̂(Ŵ) − r )' ( R Âvar(δ̂(Ŵ)) R' )^{-1} ( Rδ̂(Ŵ) − r ) →_d χ²_p.

246 Estimation of S

Let Ŝ ≡ (1/n) Σ_{i=1}^n ε̂_i² x_i x_i', where ε̂_i ≡ y_i − z_i'δ̂.

Assumption (finite fourth moments). E( (x_{ik} z_{il})² ) exists and is finite for all k = 1, ..., K and l = 1, ..., L.

Proposition (consistent estimation of S). Suppose δ̂ is consistent and S = E(g_i g_i') exists and is finite. Then, under Assumptions 3.1, 3.2 and 3.6, the following estimator is consistent:
Ŝ ≡ (1/n) Σ_{i=1}^n ε̂_i² x_i x_i',  where ε̂_i ≡ y_i − z_i'δ̂.

247 Efficient GMM Estimator

The next proposition provides a choice of Ŵ that minimizes the asymptotic variance.

Proposition (optimal choice of the weighting matrix). If Ŵ is chosen such that Ŵ →_p S^{-1}, then the lower bound for the asymptotic variance of the GMM estimators is reached; it is equal to ( Σ_xz' S^{-1} Σ_xz )^{-1}.

Definition. The estimator
δ̂(Ŝ^{-1}) = argmin_δ n g_n(δ)' Ŵ g_n(δ)  with Ŵ = Ŝ^{-1}
is called the efficient GMM estimator.

248 The efficient GMM estimator can be written as
δ̂(Ŝ^{-1}) = ( S_xz' Ŝ^{-1} S_xz )^{-1} S_xz' Ŝ^{-1} s_xy
(that is, δ̂(Ŵ) = ( S_xz' Ŵ S_xz )^{-1} S_xz' Ŵ s_xy with Ŵ = Ŝ^{-1}), and
Avar(δ̂(Ŝ^{-1})) = ( Σ_xz' S^{-1} Σ_xz )^{-1},  Âvar(δ̂(Ŝ^{-1})) = ( S_xz' Ŝ^{-1} S_xz )^{-1}.

249 To calculate the efficient GMM estimator, we need the consistent estimator Ŝ, which depends on the residuals ε̂_i. This leads us to the following two-step efficient GMM procedure.

Step 1: Compute Ŝ ≡ (1/n) Σ_{i=1}^n ε̃_i² x_i x_i', where ε̃_i = y_i − z_i'δ̃. To obtain the preliminary estimator δ̃, solve
δ̃(Ŵ) = argmin_δ n ( s_xy − S_xz δ )' Ŵ ( s_xy − S_xz δ ),
where Ŵ is a matrix that converges in probability to a symmetric and positive definite matrix, for example Ŵ = S_xx^{-1}. With this choice, one uses the so-called 2SLS estimator δ̃(S_xx^{-1}) to obtain the residuals ε̃_i = y_i − z_i'δ̃ and then Ŝ ≡ (1/n) Σ_{i=1}^n ε̃_i² x_i x_i'.

Step 2: Minimize J(δ, Ŝ^{-1}) with respect to δ. The minimizer is the efficient GMM estimator,
δ̂(Ŝ^{-1}) = argmin_δ n ( s_xy − S_xz δ )' Ŝ^{-1} ( s_xy − S_xz δ ).
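A compact sketch of the two-step procedure just described (heteroskedasticity-robust Ŝ, no HAC correction; the function and variable names are illustrative, not the slides').

```python
import numpy as np

def two_step_efficient_gmm(X, Z, y):
    """X: n x K instruments, Z: n x L regressors, y: n-vector.
    Returns the efficient GMM estimate, its estimated variance matrix, and S_hat."""
    n = len(y)
    S_xz, s_xy = X.T @ Z / n, X.T @ y / n
    # Step 1: 2SLS, i.e. GMM with W = (X'X/n)^{-1}, to get first-step residuals.
    W1 = np.linalg.inv(X.T @ X / n)
    d1 = np.linalg.solve(S_xz.T @ W1 @ S_xz, S_xz.T @ W1 @ s_xy)
    e1 = y - Z @ d1
    S_hat = (X * (e1**2)[:, None]).T @ X / n        # (1/n) sum_i e_i^2 x_i x_i'
    # Step 2: efficient GMM with W = S_hat^{-1}.
    W2 = np.linalg.inv(S_hat)
    d2 = np.linalg.solve(S_xz.T @ W2 @ S_xz, S_xz.T @ W2 @ s_xy)
    avar_hat = np.linalg.inv(S_xz.T @ W2 @ S_xz)    # estimate of Avar(delta_hat)
    return d2, avar_hat / n, S_hat                  # avar_hat/n is the variance of d2 itself

# Standard errors are the square roots of the diagonal of the returned variance matrix.
```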

250 Example (Wooldridge, Chap. 15; database: card). Wage and education data for a sample of men in 1976.

[EViews output: Dependent Variable LOGWAGE; Method: Least Squares; included observations: 3010; regressors C, EDUC, EXPER, EXPER^2, BLACK, SMSA, SOUTH; coefficient estimates, standard errors and summary statistics are reported in the original slide.]

SMSA = 1 if the individual lives in a Standard Metropolitan Statistical Area in 1976. NEARC4 = 1 if he grew up near a four-year college.


252 For the GMM estimation we take
z_i = [1, EDUC_i, EXPER_i, EXPER_i², BLACK_i, SMSA_i, SOUTH_i]',
x_i = [1, EXPER_i, EXPER_i², BLACK_i, SMSA_i, SOUTH_i, NEARC4_i, NEARC2_i]'.

[EViews output: Dependent Variable LOGWAGE; Method: Generalized Method of Moments; included observations: 3010; linear estimation with 1 weight update; estimation weighting matrix: HAC (Bartlett kernel, Newey-West fixed bandwidth); standard errors and covariance computed using the estimation weighting matrix; instrument specification: C EXPER EXPER^2 BLACK SMSA SOUTH NEARC4 NEARC2; instrument rank 8; coefficient estimates and the J-statistic are reported in the original slide.]

253 Testing Overidentifying Restrictions

Testing all Orthogonality Conditions
If the equation is exactly identified, then J(δ̂, Ŵ) = 0. If the equation is overidentified, then J(δ̂, Ŵ) > 0. When Ŵ is chosen optimally, so that Ŵ = Ŝ^{-1} →_p S^{-1}, then J( δ̂(Ŝ^{-1}), Ŝ^{-1} ) is asymptotically chi-squared.

Proposition (Hansen's test of overidentifying restrictions). Under Assumptions 3.1–3.5,
J( δ̂(Ŝ^{-1}), Ŝ^{-1} ) →_d χ²_{K−L}.
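Given an efficient GMM estimate, Hansen's J statistic is straightforward to compute. The sketch below (names are illustrative; it presumes an efficient-GMM estimate δ̂ and the matching Ŝ, e.g. from the two-step sketch above) also returns the χ²_{K−L} p-value.

```python
import numpy as np
from scipy import stats

def hansen_j(X, Z, y, delta_hat, S_hat):
    """Hansen's J test of the K - L overidentifying restrictions."""
    n = len(y)
    g_bar = X.T @ (y - Z @ delta_hat) / n            # g_n(delta_hat)
    J = n * g_bar @ np.linalg.solve(S_hat, g_bar)    # n * g_n' S_hat^{-1} g_n
    df = X.shape[1] - Z.shape[1]                     # K - L
    return J, stats.chi2.sf(J, df)                   # statistic and p-value
```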

254 Two comments:

1. This is a specification test, testing whether all the restrictions of the model (which are the assumptions maintained in Proposition 3.6) are satisfied. If J( δ̂(Ŝ^{-1}), Ŝ^{-1} ) is surprisingly large, it means that either the orthogonality conditions (Assumption 3.3) or the other assumptions (or both) are likely to be false. Only when we are confident about those other assumptions can we interpret a large J statistic as evidence for the endogeneity of some of the K instruments included in x_i.

2. Small-sample properties of the test may be a matter of concern.

Example (continuation). EViews provides the J statistic of Proposition 3.6:

255 [EViews output: Dependent Variable LOGWAGE; Method: Generalized Method of Moments; included observations: 3010; linear estimation with iterated weights; estimation weighting matrix: White; standard errors and covariance computed using the estimation weighting matrix; convergence achieved after 2 weight iterations; instrument specification: C EXPER EXPER^2 BLACK SMSA SOUTH NEARC4 NEARC2; instrument rank 8; the J-statistic and its p-value are reported in the original slide.]

256 Testing Subsets of Orthogonality Conditions

Consider
x_i = [ x_{i1} ; x_{i2} ],
where x_{i1} has K_1 rows and x_{i2} has K − K_1 rows. We want to test H_0: E(x_{i2} ε_i) = 0.

The basic idea is to compare two J statistics from two separate GMM estimators: one using only the instruments included in x_{i1}, and the other using also the suspect instruments x_{i2} in addition to x_{i1}. If the inclusion of the suspect instruments significantly increases the J statistic, that is a good reason for doubting the predeterminedness of x_{i2}. This restriction is testable if K_1 ≥ L (why?).

257 Proposition (testing a subset of orthogonality conditions). Suppose that the rank condition is satisfied for x_{i1}, so E(x_{i1} z_i') is of full column rank. Under Assumptions 3.1–3.5, let
J = n g_n(δ̂)' Ŝ^{-1} g_n(δ̂),  δ̂ = ( S_xz' Ŝ^{-1} S_xz )^{-1} S_xz' Ŝ^{-1} s_xy,
J_1 = n g_{1n}(δ̃)' Ŝ_{11}^{-1} g_{1n}(δ̃),  δ̃ = ( S_{x_1 z}' Ŝ_{11}^{-1} S_{x_1 z} )^{-1} S_{x_1 z}' Ŝ_{11}^{-1} s_{x_1 y}.
Then, under the null H_0: E(x_{i2} ε_i) = 0,
C ≡ J − J_1 →_d χ²_{K−K_1}.

258 Example. EViews 7 performs this test. Following the previous example, suppose you want to test E(nearc4_i ε_i) = 0. In our case, x_{i1} is a 7 × 1 vector and x_{i2} = nearc4_i is a scalar (L = 7, K_1 = 7, K − K_1 = 1).
