Chapter 7. Testing Linear Restrictions on Regression Coefficients


1. F-tests versus t-tests

In the previous chapter we discussed several applications of the t-distribution to testing hypotheses in the linear regression model. In this chapter the range of possible hypotheses is extended beyond the scope of the t-distribution, and accordingly we introduce the F-distribution.

Hypotheses can be thought of as restrictions on the population parameters. In Chapter 6, all the hypotheses that we considered implied a single restriction, e.g. H0: β1 = 0. We will refer to these as Single Restriction Hypotheses (SRH). In this chapter we introduce the idea of Multiple Restriction Hypotheses (MRH). To put the discussion in context, consider the general linear regression model:

Y = α + β1X1 + β2X2 + ... + βKXK + u   [7.1]

In this model there are H = K + 1 coefficients on the right hand side. If the model is estimated with n observations, then the number of degrees of freedom is (n - H). In the previous chapter it was shown that an SRH of the form H0: βk = β0 can be tested by comparing the sample t-statistic (bk - β0)/SE(bk) with the appropriate critical value of the t-distribution with (n - H) degrees of freedom.

In many cases researchers need to test more complex hypotheses such as

H0: β1 = β2 = 0   [7.2a]

The hypothesis in [7.2a] not only involves two coefficients, it involves two separate restrictions, namely that β1 = 0 and β2 = 0. Since both restrictions are to be tested together, the hypothesis in [7.2a] is referred to as a Multiple Restriction Hypothesis (MRH). The key feature that distinguishes an MRH from an SRH is not the number of parameters in the hypothesis, but rather the number of separate restrictions. Each equals sign in an MRH represents a single restriction. The MRH in [7.2a] has p = 2 restrictions, where p represents the number of restrictions. The simplest way to count the restrictions in an MRH is to count the number of equals signs. Other examples of MRHs are:

H0: β1 = β2 = β3 = 0   [7.2b]

and

H0: β1 = β2 and β3 = β4   [7.2c]

In [7.2b] there are three equals signs, so p = 3 in this case. In example [7.2c], the MRH involves four parameters, but there are only two equals signs, so p = 2. The main reason for distinguishing between single and multiple restriction hypotheses involving population regression coefficients is that an MRH can be tested only by using the F-distribution; the t-distribution cannot be used to test an MRH.

2. Hypotheses as Restrictions

The most useful way to approach the testing of compound hypotheses is to focus on the fact that the hypothesis consists of p linear restrictions on the population regression coefficients. To illustrate these ideas consider the five-parameter model (H = 5):

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + u   [7.3]

Equation [7.3] is referred to as the unrestricted model. Hypothesis [7.2a] restricts two of the slope coefficients to be zero. To impose these restrictions, substitute the equations from the null hypothesis into the unrestricted model. The result is the so-called restricted model, which in this case is:

Y = α + β3X3 + β4X4 + u   [7.3a]

Notice that in the restricted model [7.3a] there are three parameters to estimate. The five parameters in the unrestricted model have been reduced to three: H - p = 5 - 2 = 3. The reduction in the number of parameters is equal to the number of restrictions.

Now consider the restrictions implied by the hypothesis in [7.2b]. When these three restrictions are imposed on the unrestricted model, the result is a restricted model with H - p = 5 - 3 = 2 parameters:

Y = α + β4X4 + u   [7.3b]

Finally, consider the hypothesis [7.2c], which involves four coefficients but implies just two restrictions. To impose these restrictions we eliminate the parameters β2 and β4:

Y = α + β1X1 + β1X2 + β3X3 + β3X4 + u

Notice that the restricted model has H - p = 5 - 2 = 3 parameters. However, it is not in the usual form, because there are four right hand side variables and a constant term, but only three free parameters. To estimate the restricted model for hypothesis [7.2c], variables with identical coefficients must be collected together:

Y = α + β1Z + β3W + u   [7.3c]

where Z = X1 + X2 and W = X3 + X4. The three parameters that appear in the restricted model can be estimated by regressing Y on Z, W and a constant. The variables Z and W are calculated by adding together the appropriate pairs of X-variables.
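In TSP this amounts to two variable definitions followed by an ordinary regression. A minimal sketch, assuming Y and X1, ..., X4 are already in the current sample (lower-case variable names are hypothetical):

z = x1 + x2;        ? Z collects the regressors that share the coefficient beta1
w = x3 + x4;        ? W collects the regressors that share the coefficient beta3
olsq y c z w;       ? restricted model [7.3c]: regress Y on a constant, Z and W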

These examples illustrate the general principle that when p linear restrictions are substituted into the unrestricted model with H parameters, the result is a restricted model with (H - p) parameters. Whereas estimating the unrestricted model involves regressing Y on H variables (which include the constant term), the restricted model is estimated by regressing Y on (H - p) variables. As shown in the next section, the test statistic used to test H0 is based on a comparison of how well the unrestricted and restricted models fit the sample data.

3. The F-statistic

When the unrestricted model [7.1] is estimated using the available sample of n observations, there will be a certain Sum of Squared Residuals (SSRu) associated with this unrestricted model:

SSRu = Σ ei²   where ei = Yi - a - b1X1i - ... - bKXKi

To test an MRH such as the ones described by [7.2a], [7.2b] and [7.2c], the appropriate restricted model is fitted to exactly the same sample. The result will be a restricted Sum of Squared Residuals, SSRr. One point that we can be absolutely sure of is that the unrestricted model must fit the data at least as well as the restricted model. In other words, the unrestricted Sum of Squared Residuals must be less than or equal to the restricted Sum of Squared Residuals:

SSRu ≤ SSRr   i.e.   SSRr - SSRu ≥ 0

When the unrestricted model is estimated, the H parameters are chosen by the least squares procedure to ensure that the Sum of Squared Residuals is as small as possible. When restrictions are imposed on the way in which the parameters are chosen, it is not possible that the Sum of Squared Residuals will be reduced still further. To illustrate, consider the unrestricted model [7.3] and the restricted model [7.3a]. Suppose [7.3a] is estimated first and the estimated coefficients are a*, b3* and b4*. The restricted Sum of Squared Residuals is SSRr. When the unrestricted model [7.3] is estimated, least squares could again achieve SSRr by simply choosing a = a*, b3 = b3* and b4 = b4* while b1 = b2 = 0. In fact, unrestricted least squares will generally reduce the Sum of Squared Residuals further by selecting non-zero values of b1 and b2. Notice that unrestricted least squares can always reproduce the fit that is achieved by restricted least squares. In almost every sample, unrestricted least squares will fit the data better than the restricted model.

What would it mean if in a particular sample SSRu = SSRr? This can happen only if the unrestricted parameter estimates (chosen by unrestricted least squares) satisfy exactly the restrictions in the hypothesis. In the previous example, this would mean that unrestricted least squares is free to choose any values for b1 and b2, but the best values to choose just happen to be zero, as specified in the null hypothesis. In this unlikely event, we would have to conclude that the sample evidence is perfectly consistent with the hypothesis that the population parameters β1 and β2 are both zero. After all, when least squares is free to estimate these parameters, the estimates are zero. Clearly, imposing the restrictions in the hypothesis has no effect on the Sum of Squared Residuals.

Normally we think of the unrestricted model as the base case. If the restrictions in H0 result in a big increase in the Sum of Squared Residuals (compared to the unrestricted model), there is a conflict between the sample evidence and the hypothesis. In short, if the restrictions raise the Sum of Squared Residuals by a sufficient amount, then the hypothesis is rejected. The question of whether the restrictions raise the residual sum of squares enough to cause H0 to be rejected is answered by the F-statistic:

F(p, n-H) = {(SSRr - SSRu)/SSRu} × (n-H)/p   [7.4]

The first part of the F-statistic is {(SSRr - SSRu)/SSRu}, the change in the Sum of Squared Residuals divided by the unrestricted Sum of Squared Residuals. In other words, the first part is the percentage increase in the Sum of Squared Residuals that results from the restrictions in H0. If this is a large percentage increase, it indicates that the sample information contradicts the hypothesis.
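In TSP the test is mechanical: estimate both models, store the two sums of squared residuals, and build [7.4] from them. A minimal sketch for hypothesis [7.2a] in model [7.3], assuming y and x1, ..., x4 are in the current sample (names hypothetical):

olsq y c x1 x2 x3 x4;                          ? unrestricted model [7.3]
set ssru = @ssr;                               ? unrestricted sum of squared residuals
set h = @ncoef;                                ? H = number of estimated parameters
olsq y c x3 x4;                                ? restricted model [7.3a] under H0: beta1 = beta2 = 0
set ssrr = @ssr;                               ? restricted sum of squared residuals
set p = 2;                                     ? two equals signs, so two restrictions
set f = ((ssrr - ssru)/ssru)*((@nob - h)/p);   ? equation [7.4]
print f;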

The second part of the F-statistic is the ratio (n-H)/p, i.e., the degrees of freedom of the unrestricted model divided by the number of restrictions in H0.

The F-statistic is characterized by two degrees of freedom, p and (n-H). The first is p, often referred to as the degrees of freedom of the numerator of the F-statistic, since it is the p restrictions that are responsible for the quantity (SSRr - SSRu). Notice that while p is the degrees of freedom of the numerator, the value of p actually appears in the denominator of [7.4]. The second quantity is (n-H), the degrees of freedom of the denominator term, namely the unrestricted residuals, SSRu.

The critical value of the F-statistic depends on three factors. The first is the level of significance, or the size of the test; typically, tests are conducted at the 5% or 1% level of significance. The second and third are the degrees of freedom p and (n-H). The critical F-value for a test at the 5% level of significance is written F.05(p, n-H), and the hypothesis H0 is rejected if:

F(p, n-H) > F.05(p, n-H)

4. An Equivalent Form of the F-statistic

The F-statistic can be expressed in an alternative form by noting that the estimate of the variance σu² of the population deviations u from the unrestricted model is

s² = {1/(n-H)} Σ ei² = {1/(n-H)} SSRu

The F-statistic in [7.4] can therefore be written as

F(p, n-H) = (SSRr - SSRu)/(p·s²)   [7.5]

It is important to note that [7.5] uses the estimate of σu² based on the unrestricted model. The intuitive explanation for this is that the restrictions in H0 may be false, and if they are, the restricted model will provide a biased estimate of σu². The unrestricted model gives an unbiased estimate of σu² whether the restrictions are true or false (assuming, as we are, that the unrestricted model is a valid starting point).
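Continuing the TSP sketch above, the equivalent form [7.5] can be built from the same stored quantities; it necessarily reproduces the value of f computed from [7.4] (names as in the earlier sketch):

set s2 = ssru/(@nob - h);          ? unrestricted estimate of the variance of u
set f2 = (ssrr - ssru)/(p*s2);     ? equation [7.5]; identical to f above
print f2;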

5. Two Examples: A Time Series Saving Equation

The dependent variable in equation [7.6] is RSAV, which represents per capita real saving in Canada. RSAV has been computed from quarterly National Accounts data over a period that starts in the first quarter of 1951 (1951:1) and runs to the third quarter of 1993 (1993:3), which amounts to 171 quarterly observations. In [7.6] quarterly per capita household saving is explained by real personal disposable income per capita (RPDI) and an annual interest rate (R4Y). It is expected that the coefficients on both RPDI and R4Y will be positive: higher incomes and higher interest rates encourage households to save more of their income. In addition, the four quarterly dummy variables Q1 to Q4 allow RSAV to exhibit quarterly seasonal variations. See Chapter 5 for a more detailed discussion of this model and the data.

RSAV = α1Q1 + α2Q2 + α3Q3 + α4Q4 + β1*RPDI + β2*R4Y + u   [7.6]

The first hypothesis that we will test is H0: β1 = 0 against the alternative that β1 ≠ 0. The restricted model is obtained by substituting the restriction from H0 into [7.6]. The restricted model is therefore:

RSAV = α1Q1 + α2Q2 + α3Q3 + α4Q4 + β2*R4Y + u   [7.7]

Notice that a single restriction reduces the number of right hand side parameters from 6 to 5, i.e., a reduction of one. Table 7.1 reports the results of estimating the two regressions that correspond to the unrestricted [7.6] and restricted [7.7] models. The F-statistic for this test is given by equation [7.4] and is calculated using the restricted and unrestricted residual sums of squares reported in Table 7.1. Notice that there are 171 observations, Hu = 6 and p = 1:

F(1, 165) = {(12.0117×10⁵ - 8.62242×10⁵)/8.62242×10⁵}×(165/1)
          = {(12.0117 - 8.62242)/8.62242}×165 = 64.858

The critical F-value for a test at the 5% level of significance can be located in the table of critical values. The appropriate number is found by consulting the relevant row and column, where row = (n - Hu) and column = p. Since there is no row for (n - Hu) = 165, we consult the next row up, which applies to 120 degrees of freedom. The critical value is F*(1, 120) = 3.92, and since the computed F-statistic is much greater than F*, the null hypothesis is rejected. The sample evidence contradicts the null hypothesis that income has no effect on saving. To put this in a positive way, the sample evidence supports the view that saving is positively related to income.

The hypothesis H0: β1 = 0 is an SRH, not an MRH, and could just as easily have been tested using the t-test. Notice that TSP automatically reports the t-statistics for the separate hypotheses that each population regression coefficient is zero. For the null hypothesis that the true income coefficient is zero, the t-statistic is 8.053, and the critical value for a two-tailed test at the 5% level of significance and 120 degrees of freedom is 1.98. Thus according to the t-test, the hypothesis is rejected. In fact, the t-test and the F-test are equivalent tests because F(1, n-Hu) = t²(n-Hu). Notice that the square of the sample t-statistic is 8.053² = 64.854, which is the sample F-statistic. Also the square of the critical t-value is 1.98² = 3.92, which is the critical F-value. This functional relationship ensures that the t-test and F-test approaches to testing simple hypotheses are bound to result in identical conclusions. In no case will they produce conflicting evidence.
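The identity F(1, m) = t²(m) is not an accident of this sample. A t-variate with m degrees of freedom is a standard normal Z divided by the square root of an independent χ²(m)/m, so squaring it gives

t(m)² = Z²/(χ²(m)/m) = (χ²(1)/1)/(χ²(m)/m) = F(1, m)

since the square of a standard normal is a χ² variate with one degree of freedom, and a ratio of independent χ² variates, each divided by its degrees of freedom, is by definition an F variate.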

Table 7.1

Dependent variable: RSAV
Current sample: 1951:1 to 1993:3
Number of observations: 171
Mean of dependent variable = 232.794
Std. dev. of dependent var. = 226.434

The Unrestricted Model:
Sum of squared residuals = 862242.
Variance of residuals = 5225.71
Std. error of regression = 72.2891
R-squared = .901077

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q1            -183.794       20.4720       -8.97782
Q2            -250.414       21.3650       -11.7207
Q3            61.3401        23.9679       2.55926
Q4            -337.821       21.6205       -15.6250
RPDI          .119329        .014817       8.05348
R4Y           17.2214        3.41371       5.04477

The Restricted Model:
Sum of squared residuals = .120117E+07
Variance of residuals = 7235.99
Std. error of regression = 85.0646
R-squared = .862193

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q1            -85.4744       19.3377       -4.42008
Q2            -142.128       19.5377       -7.27455
Q3            199.081        19.7585       10.0757
Q4            -226.472       19.5592       -11.5788
R4Y           40.9752        2.02240       20.2606

Example 2: The second example tests the hypothesis that the seasonal dummy variables all have the same coefficient. If this hypothesis is true, then there is no seasonal variation in saving beyond the seasonality that may be transmitted to saving from any seasonal variations in income and the interest rate. Specifically, in the context of equation [7.6], the hypothesis is

H0: α1 = α2 = α3 = α4

There are three "=" signs in the null hypothesis (p = 3), and so this MRH must be tested with an F-statistic; there is no valid t-test for this case. The unrestricted or alternative model is shown in equation [7.6]. The null hypothesis is that the seasonal intercepts are all identical. Let α be the common value of the coefficients on Q1, Q2, Q3 and Q4. When the restrictions are substituted into [7.6], the common α can be factored out and the four seasonal factors collapse to a constant term with coefficient α:

α1Q1 + α2Q2 + α3Q3 + α4Q4 = α(Q1 + Q2 + Q3 + Q4) = αC

where C is a vector with 1 in every position, i.e. the constant term: the vector sum of the seasonal dummy variables is the constant vector C. (See Chapter 5 for more details on the relationship between the Q's and C.) So the restricted model is

RSAV = αC + β1*RPDI + β2*R4Y + u   [7.8]

Table 7.2 shows the results of estimating the restricted model [7.8]. To test the null hypothesis, the key information in Table 7.2 is the sum of squared residuals. From equation [7.4] the F-statistic is

F(3, 165) = {(44.0865×10⁵ - 8.62242×10⁵)/8.62242×10⁵}×(165/3) = 226.22

The critical value F*(3, 120) at the 5% level of significance is 2.68, so the null hypothesis is overwhelmingly rejected. That is, the hypothesis that there is no seasonal variation in real per capita saving beyond that which can be explained by seasonal variations in RPDI and R4Y is rejected. Again, it is useful to express the evidence in a positive way: the sample evidence supports the contention that RSAV has a seasonal component that cannot be explained by any seasonal variations in RPDI and R4Y.

Suppose the hypothesis had not been rejected. In this case, the sample evidence would not have been able to reject model [7.8]. Notice that model [7.8] could be consistent with some seasonal pattern in RSAV, but in [7.8] all of the seasonal pattern in RSAV has to come from seasonal variations in RPDI or R4Y; there can be no independent seasonality in RSAV. However, the sample evidence has strongly rejected [7.8].
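In TSP the two regressions differ only in whether the four seasonal dummies or the single constant appear on the right hand side. A minimal sketch of this test, assuming RSAV, RPDI, R4Y and the dummies Q1 to Q4 are in the current sample:

olsq rsav q1 q2 q3 q4 rpdi r4y;         ? unrestricted model [7.6]; no separate constant
set ssru = @ssr;
olsq rsav c rpdi r4y;                   ? restricted model [7.8]: common intercept alpha
set ssrr = @ssr;
set f = ((ssrr - ssru)/ssru)*(165/3);   ? equation [7.4] with p = 3 and n - H = 165
print f;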

Finally, it is instructive to look at the contents of Table 7.2 in more detail. First, note that the R-squared statistic is 0.49, which is much lower than the R-squared for the unrestricted model. Clearly, the restrictions result in a much poorer fit (the sum of squared residuals rises considerably). Also, the coefficient on R4Y in Table 7.2 has a t-statistic of 0.03. This low value means that in the context of [7.8], we would not reject the hypothesis that the interest rate has no effect on real saving. (If someone claims interest rates have no effect on saving, the evidence in [7.8] cannot reject this claim.) But is this really what the data imply? The answer is no, because [7.8] is itself rejected by the data. This illustrates how a badly specified model such as [7.8] can give misleading inferences. Model [7.8] has been rejected in favour of [7.6], and when [7.6] is estimated the claim that the interest rate has no effect on real saving is rejected (interest rates do appear to influence saving).

Table 7.2

Dependent variable: RSAV
Current sample: 1951:1 to 1993:3
Number of observations: 171
Mean of dependent variable = 232.794
Std. dev. of dependent var. = 226.434
Sum of squared residuals = .440865E+07
Variance of residuals = 26242.0
Std. error of regression = 161.994
R-squared = .494208

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
C             -269.761       42.6186       -6.32965
RPDI          .209469        .031550       6.63932
R4Y           .242620        7.39724       .032799

6. Example 3: Restrictions that Modify the Dependent Variable

The hypotheses that we have considered so far have all led to restricted models that alter the right hand side of the equation. In some cases the restrictions alter the dependent variable as well, and we consider such an example in this section. For illustrative purposes consider the model in [7.9]:

y = α + β1X1 + β2X2 + β3X3 + u   [7.9]

The null hypothesis is

H0: β1 = 1.5 and β2 = 1.0

When the restrictions in the null are substituted into [7.9] to obtain the restricted model, we get an equation in which two coefficients on the right hand side are predetermined. Since it is not possible to estimate a model in that form, the predetermined part is brought to the left hand side and so redefines the dependent variable:

y = α + 1.5X1 + 1.0X2 + β3X3 + u
(y - 1.5X1 - 1.0X2) = α + β3X3 + u
W = α + β3X3 + u   [7.10]

where W ≡ (y - 1.5X1 - 1.0X2). Equation [7.10] is the restricted model. It is derived by substituting the restrictions implied by the null hypothesis into the unrestricted model. In order to estimate the restricted model in [7.10], it is necessary to calculate the modified dependent variable, W.
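In TSP, calculating W is a one-line transformation before the restricted regression; a minimal sketch with hypothetical variable names:

w = y - 1.5*x1 - 1.0*x2;    ? modified dependent variable implied by H0
olsq w c x3;                ? restricted model [7.10]
set ssrr = @ssr;            ? compare with the unrestricted SSR via [7.4]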

Now let's turn to the empirical example. The price of a single-detached house depends on the characteristics of the house and its location. Table 7.3 reports the results of a study of 2446 house sales. Each transaction represents the sale of a single house for a certain price on a certain date. Some characteristics of these houses have been recorded in the sample. SAGAR is a dummy variable that takes the value 1 if the house has a single attached garage, SAGAR = 0 otherwise. DAGAR signals the presence of a double attached garage. SIZE and LSIZE are the size of the house and lot in square feet. AGE is the age of the house in years. BATHP is the number of bathroom pieces (for example, two full bathrooms each with basin, shower, tub and toilet means BATHP = 8). POOLIG and POOLAG are dummy variables that record the presence of in-ground and above-ground pools respectively. TIME is a time trend variable that records the date of the market transaction.

The full set of results in Table 7.3 will not be discussed here. We will focus on the value of single and double garages, which are the subject of our hypotheses. First, note from Table 7.3 that the coefficient on SAGAR implies a single attached garage is estimated to raise the house price by $6,292, while a double garage is valued at $16,935. (Prices and values of characteristics apply to the early 1980s in the City of Guelph.)

We will consider two hypotheses concerning the value of garages. Hypotheses should be framed before looking at the data, so for the purposes of our example, assume that these hypotheses were suggested by two real estate agents who derived them from their general knowledge of the housing market, not from the statistical results in Table 7.3. The first hypothesis states that single and double car garages raise house prices by 6000 and 17000 dollars respectively. These values are very close to the unrestricted least squares estimates in Table 7.3, so it will not be surprising if this hypothesis is not rejected by the data, i.e., it is consistent with the data. The second hypothesis claims that single and double car garages raise house prices by 5000 and 15000 dollars respectively. These values are further from the unrestricted estimates and, as we will see, far enough away that the hypothesis is rejected at the 5% level of significance. The procedure is to substitute the restrictions into the model and to estimate the resulting restricted model. Since the restrictions impose particular non-zero coefficient values on SAGAR and DAGAR, this is a case in which the dependent variable is modified.

Table 7.3

Dependent variable: PRICE
Current sample: 1 to 2446
Number of observations: 2446
Mean of dependent variable = 68055.8
Std. dev. of dependent var. = 23079.5
Sum of squared residuals = .266682E+12
Variance of residuals = .109475E+09
Std. error of regression = 10463.0
R-squared = .795233
Adjusted R-squared = .794476

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
C             7192.60        1076.40       6.68208
SAGAR         6291.55        554.959       11.3370
DAGAR         16935.3        885.395       19.1274
SIZE          29.4499        .768985       38.2971
LSIZE         .813466        .073681       11.0403
AGE           -197.372       8.83127       -22.3493
BATHP         1473.48        156.482       9.41630
POOLAG        3879.60        1422.17       2.72794
POOLIG        7702.80        1235.13       6.23644
TIME          586.752        20.6769       28.3771

In the TSP program (see Appendix A), the new dependent variables are W1 and W2. The program calculates the F-statistics for the two tests, and the values are printed in Table 7.4 along with the two restricted models. The critical value of the F-statistic is obtained from the appropriate statistical table.

Table 7.4

Dependent variable: W1
Number of observations: 2446
Mean of dependent variable = 64417.6
Std. dev. of dependent var. = 19848.2
Sum of squared residuals = .266724E+12
Std. error of regression = 10459.6
R-squared = .723087

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
C             7314.13        1021.17       7.16254
SIZE          29.4590        .732548       40.2144
LSIZE         .813090        .072899       11.1537
AGE           -198.851       7.99702       -24.8657
BATHP         1471.51        153.500       9.58638
POOLAG        3889.83        1420.57       2.73822
POOLIG        7718.09        1234.36       6.25269
TIME          586.954        20.6453       28.4304

F = 0.19532
At the 5% level of significance, the critical F(2, 2446-10) is 3.00, so the null hypothesis is not rejected.

Dependent variable: W2
Number of observations: 2446
Mean of dependent variable = 64932.3
Std. dev. of dependent var. = 20208.8
Sum of squared residuals = .267458E+12
Std. error of regression = 10474.0
R-squared = .732148

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
C             6783.39        1022.57       6.63368
SIZE          30.0146        .733555       40.9167
LSIZE         .837435        .072999       11.4719
AGE           -207.148       8.00800       -25.8677
BATHP         1539.94        153.711       10.0184
POOLAG        3772.40        1422.52       2.65191
POOLIG        7773.25        1236.06       6.28874
TIME          589.411        20.6737       28.5102

F = 3.54561
At the 5% level of significance, the critical F(2, 2446-10) is 3.00, so the null hypothesis is rejected.

7. Structural Change

It is often important to investigate whether regression parameters are stable over time. Did the North American Free Trade Agreement cause a shift in Canada's import function (the relationship between imports and national income)? Do advertising expenditures shift demand curves (and can advertising brand names reduce own-price elasticities to the advantage of firms)? Is the demand for money function stable? If it is not, then a demand function estimated over the recent past may not produce good forecasts in the future.

Economists often talk about "testing for structural change." What they really mean is that they want to test a hypothesis that parameters are stable (the restricted case) against an alternative in which parameters change from one period to another. The alternative or unrestricted model always has more parameters than the restricted model, and it is the restricted model that the data may reject.

In the previous examples equation [7.6] represented the unrestricted model. We simply accepted that over the period 1951:1 to 1993:3 the parameters of the saving function were stable or constant. That assumption will now be put to the test. To do so we specify an unrestricted model in which the parameters are allowed to change at the end of 1972:1. In the new unrestricted model, all six parameters are allowed to take different values in the two subperiods (1951:1 to 1972:1) and (1972:2 to 1993:3). One way to estimate the new unrestricted model is to break the sample into two parts and estimate [7.6] separately on the two sub-samples.
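In TSP the two sub-sample regressions are just a matter of resetting the sample range; a minimal sketch, assuming the variables of [7.6] are defined over the full range:

smpl 1951:1 1972:1;
olsq rsav q1-q4 rpdi r4y;    ? unrestricted model, first sub-period
set ssr1 = @ssr;
smpl 1972:2 1993:3;
olsq rsav q1-q4 rpdi r4y;    ? unrestricted model, second sub-period
set ssr2 = @ssr;             ? the unrestricted SSR for the pooled test is ssr1 + ssr2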

The results are shown in Table 7.5.

Table 7.5

Dependent variable: RSAV
Current sample: 1951:1 to 1972:1
Number of observations: 85
Mean of dependent variable = 96.8825
Sum of squared residuals = 156687.
R-squared = .946904

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q1            -91.3616       34.5044       -2.64783
Q2            -194.670       35.3958       -5.49982
Q3            160.623        45.2306       3.55121
Q4            -260.246       36.4484       -7.14012
RPDI          .172385        .035733       4.82427
R4Y           -21.4156       6.90464       -3.10162

Current sample: 1972:2 to 1993:3
Number of observations: 86
Mean of dependent variable = 367.125
Sum of squared residuals = 446575.
R-squared = .831057

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q1            -72.5489       84.4227       -.859353
Q2            -98.9469       88.0526       -1.12373
Q3            170.721        94.6098       1.80448
Q4            -203.410       88.6013       -2.29579
RPDI          .061144        .031342       1.95087
R4Y           23.3917        3.73121       6.26920

On the one hand, the seasonal pattern in real saving does not seem to have changed substantially between the two subperiods. Both before and after 1972:1, saving is highest in the third quarter and lowest in the fourth quarter (given a certain level of disposable income and a certain interest rate). On the other hand, the estimated interest rate coefficient is negative in the first sub-period and positive in the second. The marginal propensity to save also seems to change dramatically between the two subperiods. The question is: are these differences statistically significant? Do the data reject parameter constancy?

The question boils down to whether a model with fixed coefficients over the entire sample can explain the data as well as a model that allows the coefficients to be different in the two sub-samples. To be specific, the unrestricted model is

RSAV = α11Q1 + α21Q2 + α31Q3 + α41Q4 + β11*RPDI + β21*R4Y + u   (1951:1 to 1972:1)
RSAV = α12Q1 + α22Q2 + α32Q3 + α42Q4 + β12*RPDI + β22*R4Y + u   (1972:2 to 1993:3)

Notice that in the unrestricted model there are Hu = 12 parameters. In the context of the above model, the null hypothesis is:

H0: α11 = α12, α21 = α22, α31 = α32, α41 = α42, β11 = β12, β21 = β22

Notice that there are 6 "=" signs in the null hypothesis, that is, p = 6 restrictions. When these restrictions are imposed, the model reduces to [7.6], which applies to the entire sample period. The sum of squared residuals for the unrestricted model over the entire sample range is the total of the sums of squared residuals for the two subperiods. From Table 7.5, that total is (1.56687 + 4.46575)×10⁵ = 6.03262×10⁵. In the present case, the estimates of the restricted model are reported in the top half of Table 7.1, where the sum of squared residuals is 8.62242×10⁵. After cancelling the common factor of 10⁵, the F-statistic for this test is therefore:

F(6, 171-12) = {(8.62242 - 6.03262)/6.03262}×(159/6) = 11.38

The critical F-value, F*(6, 120), is 2.17. Since the calculated F-statistic is well above the critical value, the hypothesis of parameter stability is rejected in favour of the alternative hypothesis that the parameters have changed over time.

Having rejected parameter stability, the next question is one of detail. Did all the parameters shift, or did just a few shift? The evidence in Table 7.5 suggests that the interest rate and income coefficients are very different between the two subperiods, but that the basic seasonal pattern in saving did not change, at least in qualitative terms. The unrestricted model continues to be the pair of linear equations fitted to the subsamples 1951:1-1972:1 and 1972:2-1993:3 (which has 12 coefficients). Consider first the hypothesis that the coefficients on income and the interest rate remained constant over the entire period while the quarterly intercepts shifted (given the evidence in Table 7.5, it would not be surprising if this is rejected). The hypothesis is:

H0: β11 = β12, β21 = β22   [7.13]

This hypothesis contains two restrictions (p = 2). The alternative model is that all 6 parameters shifted. The unrestricted model has already been estimated (see Table 7.5). But how is the restricted model to be estimated? The restricted model allows the seasonal dummy coefficients to shift between the two subperiods, but constrains the other two coefficients to be constant. Appendix B lists a TSP program that estimates the restricted and unrestricted models for this example.

The first olsq statement estimates the unrestricted model in one step (instead of two separate regressions). Notice that RSAV is regressed on 12 variables. Q11, Q21, Q31 and Q41 are the seasonal dummies in the first half of the sample; Q12, Q22, Q32 and Q42 are the seasonal dummies in the second half. Similarly, the coefficients on RPDI1 and RPDI2 are the marginal propensities to save in the first and second sub-samples respectively. The results from the first olsq statement are presented in Table 7.6. Notice that the results in Table 7.6 are identical to those from the separate subsample regressions reported in Table 7.5 (compare the sums of squared residuals).

Table 7.6

Dependent variable: RSAV
Current sample: 1951:1 to 1993:3
Number of observations: 171
Mean of dependent variable = 232.794
Sum of squared residuals = 603262.
R-squared = .930789

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q11           -91.3616       47.7228       -1.91442
Q21           -194.670       48.9557       -3.97646
Q31           160.623        62.5581       2.56758
Q41           -260.246       50.4115       -5.16243
Q12           -72.5489       69.6003       -1.04236
Q22           -98.9469       72.5929       -1.36304
Q32           170.721        77.9988       2.18876
Q42           -203.410       73.0453       -2.78471
RPDI1         .172385        .049422       3.48803
RPDI2         .061144        .025839       2.36633
R4Y1          -21.4156       9.54976       -2.24253
R4Y2          23.3917        3.07611       7.60431

The second olsq statement in Appendix B estimates the restricted model for this hypothesis. Notice that in the restricted regression there are two sets of seasonal dummy variables, but single coefficients on RPDI and R4Y. The results of the restricted regression are presented in Table 7.7. The restricted sum of squared residuals is 7.03013×10⁵. The F-statistic for the hypothesis [7.13] is

F(2, 159) = {(7.03013 - 6.03262)/6.03262}×(159/2) = 13.15

The critical F-value for a test at the 5% level of significance is 3.07, so the null hypothesis is rejected.

Table 7.7

Dependent variable: RSAV
Current sample: 1951:1 to 1993:3
Number of observations: 171
Mean of dependent variable = 232.794
Sum of squared residuals = 703013.
R-squared = .919345

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q11           -61.7681       32.6101       -1.89414
Q21           -160.923       33.2228       -4.84376
Q31           234.023        39.2643       5.96020
Q41           -224.417       34.0253       -6.59560
Q12           42.5575        55.3031       .769531
Q22           19.4547        57.6815       .337278
Q32           295.443        62.0378       4.76231
Q42           -85.0790       58.1141       -1.46400
RPDI          .037637        .022082       1.70443
R4Y           18.5818        3.13075       5.93527

The final hypothesis test is that the seasonal coefficients are stable over the entire period (but the coefficients on R4Y and RPDI change in 1972). The unrestricted model again has 12 coefficients (6 in each of the two subsamples). The null hypothesis, which has p = 4 restrictions, is

H0: α11 = α12, α21 = α22, α31 = α32, α41 = α42   [7.12]

The alternative hypothesis is that these seasonal coefficients change in 1972:1. The unrestricted model also allows the coefficients on income and the interest rate to change, so the hypothesis focuses exclusively on the seasonal coefficients. In the TSP program in Appendix B, the third and last olsq statement estimates the restricted model for hypothesis [7.12]. The results are in Table 7.8. The F-statistic is

F(4, 159) = {(6.53670 - 6.03262)/6.03262}×(159/4) = 3.32

The critical F-value is F*(4, 120) = 2.45, so again the null hypothesis is rejected. It seems that there are no qualitative differences in the seasonal variations across the two sub-samples, but the quantitative differences are statistically significant.

At the beginning of this section it was shown that the hypothesis that all the coefficients are constant is rejected. We then probed deeper to see if the instability could be traced to the slope coefficients on RPDI and R4Y or to changes in the seasonal coefficients.

The result is that we rejected constancy in both the slope coefficients and the seasonal effects, so "overall instability" cannot be blamed on the slopes alone or on the seasonal intercepts alone.

Table 7.8

Dependent variable: RSAV
Current sample: 1951:1 to 1993:3
Number of observations: 171
Mean of dependent variable = 232.794
Sum of squared residuals = 653670.
R-squared = .925006

              Estimated      Standard
Variable      Coefficient    Error         t-statistic
Q1            -119.037       38.7547       -3.07154
Q2            -185.312       40.1023       -4.62098
Q3            119.746        46.6901       2.56469
Q4            -271.400       40.7882       -6.65390
RPDI1         .199891        .039716       5.03299
RPDI2         .081271        .017068       4.76149
R4Y1          -27.5069       8.56405       -3.21190
R4Y2          23.4423        3.14654       7.45017

Appendix A

The following TSP program calculates the regressions reported in Tables 7.3 and 7.4. The F-statistics for the two null hypotheses discussed in section 6 of this chapter are also computed by this program.

options memory=4;
options crt;
smpl 1 2446;
read(file= '~/614/614.dat') price size lsize age poolag poolig bathp fire1 sagar dagar time;
olsq price c sagar dagar size lsize age bathp poolag poolig time;   ? unrestricted model (Table 7.3)
set ssru = @ssr;
set h = @ncoef;
w1 = price - 6000*sagar - 17000*dagar;      ? modified dependent variable, first hypothesis
olsq w1 c size lsize age bathp poolag poolig time;
set ssrr = @ssr;
set f = (ssrr-ssru)*(@nob-h)/(ssru*2);      ? equation [7.4] with p = 2
print f;
w2 = price - 5000*sagar - 15000*dagar;      ? modified dependent variable, second hypothesis
olsq w2 c size lsize age bathp poolag poolig time;
set ssrr = @ssr;
set f = (ssrr-ssru)*(@nob-h)/(ssru*2);
print f;

Appendix B

This appendix contains the TSP program used to calculate the regressions reported in section 7 of this chapter (structural change).

options crt;
in DAT374;
smpl 1951:1 1993:3;
trend t;
rsav = sav*10**5/(cpi*pop);
rpdi = pdi*10**5/(cpi*pop);
msd rsav rpdi;
dummy;
d2 = t>85;
d1 = t<=85;
q11=q1*d1; q21=q2*d1; q31=q3*d1; q41=q4*d1;
q12=q1*d2; q22=q2*d2; q32=q3*d2; q42=q4*d2;
rpdi1=rpdi*d1; rpdi2=rpdi*d2;
r4y1=r4y*d1; r4y2=r4y*d2;
olsq rsav q11 q21 q31 q41 q12 q22 q32 q42 rpdi1 rpdi2 r4y1 r4y2;   ? unrestricted model (Table 7.6)
olsq rsav q11 q21 q31 q41 q12 q22 q32 q42 rpdi r4y;                ? restricted model for [7.13] (Table 7.7)
olsq rsav q1-q4 rpdi1 rpdi2 r4y1 r4y2;                             ? restricted model for [7.12] (Table 7.8)

Explanatory notes:

What are the roles of d1 and d2 in this program?

d1 and d2 are dummy variables: d1 is unity between 1951:1 and 1972:1 and zero after that, while d2 is zero up to 1972:1 and unity after that. The products d1*rpdi and d2*rpdi are two columns that stretch from 1951:1 to 1993:3. d1*rpdi equals rpdi from 1951:1 to 1972:1 but is zero after that; d2*rpdi is zero up to 1972:1 but equals rpdi after that. By including d1*rpdi and d2*rpdi in a regression there will be two coefficients on income (one for the first sub-period and another for the second sub-period). If rpdi alone is included, then the income coefficient is constant throughout the entire period.

These ideas apply to the dummy variables and to the interest rate in the same way.

How are d1 and d2 created?

The TSP statement "trend t;" creates a variable t that is 1 in 1951:1 and increments by one every quarter. If you print t you get the values 1 2 3 4 5 6 ... This is called a time trend because it just counts time. The first subsample is 1951:1 to 1972:1 (notice that in the assignment you use a different date to split the sample), over which t goes from 1 to 85. The second subsample is 1972:2 to 1993:3, over which t goes from 86 to 171. The dummy variable d1 is set to unity if t <= 85 and d1 = 0 otherwise. This is done in TSP by the command:

d1 = t<=85;

This odd-looking statement is understood by TSP to mean: set d1 to unity if t<=85 is true, and set d1 = 0 if t<=85 is false. Similarly, d2 = t>85; sets d2 = 1 if t>85 and d2 = 0 if t<=85.