ECON 5350 Class Notes Functional Form and Structural Change

1 Introduction

Although OLS is a linear estimator, this does not mean that the relationship between Y and X must be linear. In this section, we introduce several types of nonlinear models and methods of testing for nonlinearity associated with structural change.

2 Dummy Variables

2.1 Comparing Two Means

Consider the regression model

    y_i = α + x_i'β + δ d_i + ε_i,    i = 1, ..., n,

where d_i is a dummy (binary) variable equal to one if a given condition is satisfied and zero otherwise. The dummy variable can be used to contrast two conditional means:

    E[y_i | x_i, d_i = 0] = α + x_i'β
    E[y_i | x_i, d_i = 1] = α + x_i'β + δ.

A simple t test of the null hypothesis H_0: δ = 0 can then be used to test whether the conditional mean of y differs when d = 0 as opposed to d = 1. An example is presented below.

2.2 Several Categories

Some qualitative variables are naturally grouped into discrete categories (e.g., seasons of the year, religious affiliation, race, etc.). It is also often useful to artificially categorize quantitative variables (e.g., income levels, education, age, etc.). Consider the regression model

    y_i = α + x_i'β + δ_1 D_1i + δ_2 D_2i + δ_3 D_3i + ε_i

with a set of four binary variables (D_1i, D_2i, D_3i, and D_4i). Notice that one of the variables (D_4i) is excluded from the model to avoid the dummy-variable trap. Including D_4i would violate the Classical assumption that the X matrix has full rank: the four dummy variables sum to a column of ones, which is perfectly collinear with the constant. By excluding D_4i, each δ_j coefficient is interpreted as the change in y in the jth category (D_ji = 1) relative to the 4th category (D_4i = 1).

2.2.1 Example: Testing for Seasonality

Consider two different models that test for seasonality in Y:

    y_i = α + x_i'β + δ_1 D_1i + δ_2 D_2i + δ_3 D_3i + ε_i            (1)
    y_i = x_i'β + θ_1 D_1i + θ_2 D_2i + θ_3 D_3i + θ_4 D_4i + ε_i     (2)

Each equation avoids the dummy-variable trap: the first by excluding D_4i and the second by excluding the constant term α. Both equations produce the same goodness of fit, but the coefficients are interpreted differently. For equation (1) the test for seasonality is H_0: δ_1 = δ_2 = δ_3 = 0, and for equation (2) the test is H_0: θ_1 = θ_2 = θ_3 = θ_4. Both tests can be performed using the general F test developed in Chapter 6 (provided the errors are normally distributed). It is instructive to relate the coefficients to one another:

    Category    Relationship between coefficients
    D_1 = 1     θ_1 = α + δ_1    δ_1 = θ_1 − θ_4
    D_2 = 1     θ_2 = α + δ_2    δ_2 = θ_2 − θ_4
    D_3 = 1     θ_3 = α + δ_3    δ_3 = θ_3 − θ_4
    D_4 = 1     θ_4 = α

This highlights the fact that the δ coefficients measure effects on Y relative to the excluded category.

2.3 Interactive Effects

The dummy variables introduced in the previous two sections are intercept dummies because they cause parallel shifts in the regression line. Often, we want the slope of the regression line to change with some qualitative variable. For example, one might think that each extra inch of height is associated with a different change in weight for women than for men (i.e., the slope of the regression of weight on height differs between men and women). This can be incorporated using slope dummies or interaction terms.
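Both the two-means comparison of Section 2.1 and a slope dummy can be illustrated with a short simulation. The notes' computational examples are in MATLAB; the sketch below uses Python with NumPy purely for illustration, and all data and parameter values are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
d = (rng.random(n) > 0.5).astype(float)      # dummy variable

# invented DGP: intercept shift delta = 2.0, slope shift beta2 = 1.0
y = 1.0 + 0.5 * x + 2.0 * d + 1.0 * (x * d) + rng.normal(size=n)

# regression with an intercept dummy and an interaction (slope) dummy
X = np.column_stack([np.ones(n), x, d, x * d])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

t_delta = b[2] / se[2]   # t statistic for H0: delta = 0 (equal intercepts)
t_beta2 = b[3] / se[3]   # t statistic for H0: beta2 = 0 (equal slopes)
```

With the interaction included, b[1] estimates the slope for the d = 0 group and b[1] + b[3] the slope for the d = 1 group; rejecting H_0: β_2 = 0 says the two slopes differ.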

Consider the following regression model:

    y_i = α + β_1 x_i + δ_1 D_i + β_2 (x_i D_i) + ε_i.    (3)

The slope of the regression line when D = 0 equals β_1; the slope when D = 1 equals β_1 + β_2. Therefore, a test of whether the slope differs between D = 0 and D = 1 can be carried out with a simple t test of the null hypothesis H_0: β_2 = 0. The intercept dummy term (δ_1 D_i) is included so the two regression lines are free to have different intercepts.

2.4 Spline Regressions

Sometimes it is useful to let the slope of a regression line change at some threshold value of a continuous explanatory variable. This so-called spline regression is basically a regression line with one or more kinks. Consider the regression y_i = α + β x_i + ε_i with the additional restriction that the regression line is continuous everywhere but has a kink at x_0. The point x_0 is referred to as a knot. Note that the regression line is not differentiable at x_0. We introduce a dummy variable

    D_i = 1 if x_i > x_0, and 0 otherwise.

The new spline regression model is

    y_i = α + β_1 x_i + β_2 (x_i D_i) + β_3 D_i + ε_i    (4)

with the additional restriction that the line is continuous at x_i = x_0, i.e.,

    α + β_1 x_0 = α + β_1 x_0 + β_2 x_0 + β_3.

Substituting the restriction β_3 = −β_2 x_0 back into equation (4) and rearranging gives

    y_i = α + β_1 x_i + β_2 D_i (x_i − x_0) + ε_i.

The slope of the regression line when x_i < x_0 equals β_1; the slope when x_i > x_0 equals β_1 + β_2.

2.5 MATLAB Example: Earnings Equation

Consider the following (log) earnings equation

    ln(wage_i) = β_1 + β_2 Age_i + β_3 Age_i^2 + β_4 Grade_i + β_5 Married_i + β_6 (Married_i × Grade_i) + ε_i

and a sample of 1,000 men taken from the 1988 Current Population Survey. The code shows how to estimate (and graph) a regression model with both an intercept and a slope dummy variable. The code also estimates (and graphs) a spline regression with knots at Grade_i = 12 and Grade_i = 16. Finally, it tests whether the slopes are equal across the different education levels (see MATLAB example 12).

3 Nonlinear Functional Forms

Below are some common nonlinear functional forms for regression models. Note that although some regression models may appear to violate Classical assumption #1 (i.e., that the model is linear in the coefficients and the error term), it is sometimes possible to apply a linearizing transformation (e.g., see the double-log form below).

3.1 Double-Log

Consider the regression model

    y_i = β_1 ∏_{j=2}^{k} x_{j,i}^{β_j} exp(ε_i),

which initially appears to be nonlinear in the parameters and the error term. However, taking natural logs of both sides gives

    ln(y_i) = ln(β_1) + β_2 ln(x_2,i) + β_3 ln(x_3,i) + ... + β_k ln(x_k,i) + ε_i,

which is linear in the coefficients (with intercept ln(β_1)) and in ε. This is called the double-log functional form. The β coefficients can be interpreted as elasticities:

    ∂ ln(y_i) / ∂ ln(x_k,i) = β_k,

the percentage change in y_i for a one percent change in x_k,i, all else equal.

3.2 Semi-Log

The semi-log functional form refers to either the dependent variable or some of the independent variables being logged. One example is

    ln(y_i) = β_1 + β_2 x_2,i + β_3 x_3,i + ... + β_k x_k,i + ε_i.
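The linearizing transformation of Section 3.1 can be checked in a small simulation: generate data from the multiplicative model, regress ln(y) on ln(x), and the slope recovers the elasticity. A Python/NumPy sketch with invented parameter values (the notes' own examples are in MATLAB):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = np.exp(rng.normal(size=n))        # strictly positive regressor
eps = 0.1 * rng.normal(size=n)

beta1, beta2 = 1.5, 0.7               # invented true parameters
y = beta1 * x**beta2 * np.exp(eps)    # multiplicative (double-log) model

# linearizing transformation: ln(y) = ln(beta1) + beta2*ln(x) + eps
X = np.column_stack([np.ones(n), np.log(x)])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
elasticity = b[1]                     # slope estimates the elasticity beta2
```

The intercept estimates ln(β_1), so exp(b[0]) recovers the multiplicative constant.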

3.2.1 Notes

1. If, for instance, we introduce a time trend t as an explanatory variable, then ∂ ln(y_i)/∂t gives the conditional average growth rate of y_i.

2. β_2, for example, can be interpreted as the proportional change in y_i for a one-unit change in x_2,i (multiply by 100 for a percentage).

3.3 Polynomial

An example of a polynomial functional form is

    y_i = β_1 + β_2 x_2,i + β_3 x_2,i^2 + β_4 (x_2,i x_3,i) + ε_i.

The individual coefficients do not measure the relevant partial derivatives. For example,

    ∂y_i/∂x_2,i = β_2 + 2 β_3 x_2,i + β_4 x_3,i,

which must be evaluated at particular values of x_2 and x_3. High-order polynomial functional forms can provide excellent goodness of fit within the sample; the out-of-sample fit, however, is often poor. Rarely does theory call for anything beyond a second-order polynomial.

3.4 Box-Cox Transformations

The Box-Cox transformation provides a very flexible and general functional form. For example, the Box-Cox transformation of y_i = α + β x_i + ε_i is

    (y_i^{λ_1} − 1)/λ_1 = α + β (x_i^{λ_2} − 1)/λ_2 + ε_i.

Unfortunately, the parameters λ_1 and λ_2 must be estimated along with α and β. This produces a nonlinear estimation problem (discussed further beginning in Chapter 9). Here are some possible outcomes:

    Parameter combination    Model type
    λ_1 = λ_2 = 1            linear model
    λ_1 = 0, λ_2 = 1         semi-log model
    λ_1 = λ_2 = 0            double-log model
    λ_1 = 1, λ_2 = −1        reciprocal model
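The nesting of these special cases follows from the limit (z^λ − 1)/λ → ln z as λ → 0. A small Python helper (an illustrative sketch, not from the notes) makes the limiting cases concrete:

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transform (z**lam - 1)/lam, with the log limit at lam = 0."""
    z = np.asarray(z, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(z)                 # limiting case as lam -> 0
    return (z**lam - 1.0) / lam

z = np.array([0.5, 1.0, 2.0, 4.0])
lin   = box_cox(z, 1.0)    # lam = 1:  z - 1 (linear specification)
logs  = box_cox(z, 0.0)    # lam = 0:  ln(z) (log specification)
recip = box_cox(z, -1.0)   # lam = -1: 1 - 1/z (reciprocal specification)
```

Since the transform is monotone in z for any λ, the fitted model remains invertible back to the original units.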

4 Structural Change

In this section, we are concerned with testing for structural change in the regression model. We start by assuming that the location of the possible break point is known. We then present a more flexible procedure for detecting unknown break points.

4.1 Tests with Known Break Points

We begin by partitioning the data into two parts, with sample sizes n_1 and n_2 where n = n_1 + n_2. The regression model Y = Xβ + ε becomes

    [ Y_1 ]   [ X_1   0  ] [ β_1 ]   [ ε_1 ]
    [ Y_2 ] = [  0   X_2 ] [ β_2 ] + [ ε_2 ]

where β_1 and β_2 are (k × 1) vectors. We can estimate this model by least squares:

    b = [ X_1'X_1     0     ]^(-1) [ X_1'Y_1 ]
        [    0     X_2'X_2 ]       [ X_2'Y_2 ]

Next, we test for structural change using our standard F test, where the hypotheses are

    H_0: Rβ = q  (i.e., β_1 = β_2)
    H_A: Rβ ≠ q  (i.e., β_1 ≠ β_2)

with R = (I_k  −I_k) and q = 0. The standard F statistic is

    F = {(Rb − q)'[R(X'X)^(-1)R']^(-1)(Rb − q)/k} / s^2  ~  F(k, n − 2k).

4.1.1 Notes

1. This test is often referred to as a Chow test, after Gregory Chow (1960).

2. The test can also be performed using the unrestricted and restricted residuals.

3. It is also possible to test for a break in only the slopes or only the intercepts.

4. Sometimes the error variances differ on the two sides of the break. In this case, the appropriate test is of the Wald type,

    W = (β̂_1 − β̂_2)'(V_1 + V_2)^(-1)(β̂_1 − β̂_2),

which is asymptotically distributed chi-square with k degrees of freedom. V_1 and V_2 are the asymptotic variance-covariance matrices of β̂_1 and β̂_2, respectively.

4.2 A Test with Unknown Break Points

The Chow test above requires that the break point be known with certainty. Often structural change is more subtle and gradual, with an unknown break point. The CUSUM test below is designed for this type of situation. Begin by defining the recursive residuals (t = k+1, ..., T) as

    e_t = y_t − x_t'b_{t−1},

where b_{t−1} = (X_{t−1}'X_{t−1})^(-1) X_{t−1}'Y_{t−1} and X_{t−1} is the ((t−1) × k) matrix of regressors for observations 1, ..., t−1. The variance of the recursive residual is

    var(e_t) = E(e_t^2) = var(ε_t − x_t'(b_{t−1} − β))
             = σ^2 + x_t' σ^2 (X_{t−1}'X_{t−1})^(-1) x_t
             = σ^2 (1 + x_t'(X_{t−1}'X_{t−1})^(-1) x_t).

Now define the standardized recursive residuals

    w_t = e_t / √(1 + x_t'(X_{t−1}'X_{t−1})^(-1) x_t),

which, under constant parameters, are distributed w_t ~ iid N(0, σ^2). The CUSUM test is based on the cumulative sum of the w_t,

    W_t = (1/σ̂) Σ_{r=k+1}^{t} w_r,

where

    σ̂^2 = Σ_{r=k+1}^{T} (w_r − w̄)^2 / (T − k − 1)   and   w̄ = Σ_{r=k+1}^{T} w_r / (T − k).

The CUSUM statistic is then graphed against t, together with confidence bands connecting the points (k, ±a√(T − k)) and (T, ±3a√(T − k)). For a 95% (99%) confidence band, a = 0.948 (a = 1.143). See MATLAB example 13 for a Monte Carlo experiment to find unknown break points using the recursive residuals.
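The recursive-residual construction above can be sketched in a few lines. The notes' implementation is MATLAB example 13; the Python/NumPy version below uses invented stable-parameter data and computes the standardized recursive residuals w_t and the CUSUM path W_t:

```python
import numpy as np

rng = np.random.default_rng(5)
T, k = 80, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)  # invented stable DGP

# standardized recursive residuals w_t for t = k+1, ..., T
w = []
for t in range(k, T):                   # 0-based: obs 0..t-1 predict obs t
    Xp, yp = X[:t], y[:t]
    b = np.linalg.lstsq(Xp, yp, rcond=None)[0]
    xt = X[t]
    e = y[t] - xt @ b                   # recursive residual e_t
    f = 1.0 + xt @ np.linalg.solve(Xp.T @ Xp, xt)
    w.append(e / np.sqrt(f))            # standardize by sqrt(1 + x'(X'X)^-1 x)
w = np.array(w)

sigma_hat = np.sqrt(np.sum((w - w.mean()) ** 2) / (T - k - 1))
W = np.cumsum(w) / sigma_hat            # CUSUM path W_t
band_T = 3 * 0.948 * np.sqrt(T - k)     # 95% band height at t = T
```

Under parameter stability, the path W drifts like a random walk with unit-variance steps and should stay inside the widening band; a break pushes it across.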

4.2.1 Notes

1. It is also possible to graph the recursive residuals themselves, or the cumulative sum of the squared recursive residuals (the CUSUMSQ test).

2. Other tests are available for finding unknown structural break points, such as the Hansen (1992) test and the Andrews (1993) test.
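Finally, the known-break Chow test of Section 4.1 can be sketched numerically using the restricted/unrestricted-residual form mentioned in note 2 of Section 4.1.1. All data below are simulated for illustration (the notes' own code is MATLAB):

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, k = 100, 100, 2
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
y1 = X1 @ np.array([1.0, 1.0]) + rng.normal(size=n1)
y2 = X2 @ np.array([2.0, 0.0]) + rng.normal(size=n2)  # invented break in both coefficients

# restricted model: a common beta across both subsamples
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
bR, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr_R = np.sum((y - X @ bR) ** 2)

# unrestricted model: separate regressions on each subsample
b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
ssr_U = np.sum((y1 - X1 @ b1) ** 2) + np.sum((y2 - X2 @ b2) ** 2)

n = n1 + n2
F = ((ssr_R - ssr_U) / k) / (ssr_U / (n - 2 * k))  # ~ F(k, n - 2k) under H0
```

Because the simulated coefficients differ across the subsamples, F should land far above the F(2, 196) critical value, leading us to reject H_0: β_1 = β_2.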