Use of Dummy (Indicator) Variables in Applied Econometrics

Clarence Gordon

Chapter 5 Use of Dummy (Indicator) Variables in Applied Econometrics

Section 5.1 Introduction

Use of Dummy (Indicator) Variables
Model specifications in applied econometrics often require qualitative variables as explanatory factors. Qualitative variables arise in both time-series data and cross-sectional data. This chapter emphasizes two things: the mechanics of transforming qualitative variables into dummy (indicator) variables, and the interpretation of the estimated coefficients associated with dummy (indicator) variables.

Dummy (Indicator) Variables
Dummy (indicator) variables represent qualitative variables.
Key Features:
- Intercept shifters
- Slope shifters
- Singularity problem (Dummy Variable Trap)
- Qualitative choice models

Zero-One Variables or Dummy Variables
(Possible for both explanatory variables and dependent variables)
Qualitative variables can represent the following:
- temporal effects (seasons)
- wartime and peacetime years
- political regimes
- government programs
- geographical regions
- characteristics of households or individuals such as gender, marital status, race, occupation, or employment status
- structural shifts

Dummy (Indicator) Variables and Attributes
Dummy (indicator) variables represent the occurrence or nonoccurrence of a particular attribute of a qualitative variable. If the attribute occurs, the dummy (indicator) variable takes the value 1. If the attribute does not occur, it takes the value 0. Like a light switch, the attribute is either on (value of 1) or off (value of 0).

Qualitative Variables
A qualitative variable can consist of two or more categories, but these categories must be mutually exclusive and exhaustive. Ease and convenience of analysis should guide the construction of the 0-1 variable. However, the interpretation of the results depends on how the indicator variable is constructed.

Example
The Investment Tax Credit (ITC) (data columns: YEAR, ITC)
Two categories: either in force or not.
ITC = 1 if the ITC is in force; 0 otherwise.

Example: Qualitative Variable Region with Multiple Categories
Data columns: STATE, PCEXP, PCAID, PCINC, REGION, INDICATOR.
States shown: ME, NH, VT, MA, RI, CT, NY, NJ, PA, OH, IND, IL, MICH, WISC, MINN, IOWA.

Investigate the Relationship of PCEXP to PCAID and PCINC
One possibility: four regions are evident, Northeast (NE), Midwest (MW), South (S), and West (W). For each region, run the regression
$$PCEXP_i = a_{0r} + a_{1r}\,PCAID_i + a_{2r}\,PCINC_i + \varepsilon_i, \quad r = 1, 2, 3, 4.$$
Four regions (r) mean four regression runs and four sets of coefficients, one for each region:
Set 1: $a_{01}, a_{11}, a_{21}$
Set 2: $a_{02}, a_{12}, a_{22}$
Set 3: $a_{03}, a_{13}, a_{23}$
Set 4: $a_{04}, a_{14}, a_{24}$
Do not use this specification:
$$PCEXP_i = c_0 + c_1\,PCAID_i + c_2\,PCINC_i + c_3\,REGION_i + \varepsilon_i, \quad i = 1, 2, \ldots, 50.$$
Entering REGION as a single numeric code treats arbitrary category labels as quantities and forces the regional differences to be equally spaced.

Section 5.2 Intercept Shifters

0-1 Variables in Regression Analysis
0-1 variables serve as intercept and/or slope shifters.
Key purpose: achieve a greater degree of generalization of the model by running one model with qualitative variables.
Intercept Shifters
For example,
$$PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + \beta_3 NE_i + \beta_4 MW_i + \beta_5 WE_i + \varepsilon_i$$
The number of observations in each category does not have to be equal. However, there must be at least one observation in each category.

0-1 Variables in Regression Analysis
The number of ones in each dummy variable equals the number of replications (observations) in that category.
$NE_i$ = 1 if the i-th observation corresponds to the Northeast region; 0 otherwise
$MW_i$ = 1 if the i-th observation corresponds to the Midwest region; 0 otherwise
$WE_i$ = 1 if the i-th observation corresponds to the West region; 0 otherwise
The omitted category is the South.
$South_i$ = 1 if the i-th observation corresponds to the South region; 0 otherwise
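The construction of the regional 0-1 variables can be sketched in a few lines. This is a minimal illustration in Python rather than the SAS used in this chapter; the region codes (1 = NE, 2 = MW, 3 = South, 4 = West) follow the chapter's SAS programs, while the function name and the five-state sample are hypothetical.

```python
# Region codes follow the SAS programs in this chapter:
# 1 = Northeast, 2 = Midwest, 3 = South, 4 = West.
REGION_NAMES = {1: "ne", 2: "mw", 3: "so", 4: "we"}


def make_region_dummies(region_code):
    """Return a dict of 0-1 indicators for one observation (hypothetical helper)."""
    if region_code not in REGION_NAMES:
        raise ValueError(f"unknown region code: {region_code}")
    return {name: int(code == region_code) for code, name in REGION_NAMES.items()}


# Hypothetical sample: region codes for five states (illustrative only).
sample_regions = [1, 1, 2, 3, 4]
dummies = [make_region_dummies(r) for r in sample_regions]

# The number of ones in each column equals the number of observations
# (replications) in that category.
counts = {name: sum(row[name] for row in dummies) for name in REGION_NAMES.values()}
print(counts)  # {'ne': 2, 'mw': 1, 'so': 1, 'we': 1}
```

Because the categories are mutually exclusive and exhaustive, exactly one indicator equals 1 for every observation.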

0-1 Variables in Regression Analysis
Another important difference between using one equation with dummies and running four regressions is the treatment of the residuals. In four separate regressions, each region has its own error term; a shock in regression 1 need not affect the other regressions. In one regression with regional dummies, all regions share a single error term, so a shock in the residuals can affect all regions.

Singularity Problem
In the previous model specification, four regions were evident, yet only three dummy variables appear. Why?
Dummy variable trap (singularity problem): the sum of all the dummy variables equals the intercept column, a case of perfect collinearity.
Two alternatives handle this problem:
1. Eliminate the intercept from the regression model and use all dummy variables.
2. Arbitrarily eliminate one of the categories of the qualitative variable and keep the intercept (very common).
Under alternative (2), the intercept of the regression equation is the intercept pertaining to the omitted zero-one variable (the base intercept).

General Rule
If a qualitative variable has r categories and the model includes an intercept, use r-1 indicator variables to avoid the dummy variable trap.
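The dependence behind the dummy variable trap can be checked directly on the data. The sketch below, in Python with hypothetical observations and a hypothetical helper name, tests only the specific linear relation named above (the dummies summing to the intercept column); it is not a general rank test.

```python
def columns_sum_to_intercept(rows, dummy_names):
    """True when the listed dummy columns add up to 1 in every row,
    i.e. they reproduce the intercept column exactly."""
    return all(sum(row[name] for name in dummy_names) == 1 for row in rows)


# Hypothetical observations, one row per state, four regional categories.
rows = [
    {"ne": 1, "mw": 0, "so": 0, "we": 0},
    {"ne": 0, "mw": 1, "so": 0, "we": 0},
    {"ne": 0, "mw": 0, "so": 1, "we": 0},
    {"ne": 0, "mw": 0, "so": 0, "we": 1},
    {"ne": 0, "mw": 0, "so": 1, "we": 0},
]

# All r = 4 dummies plus an intercept: perfectly collinear (the trap).
all_four = columns_sum_to_intercept(rows, ["ne", "mw", "so", "we"])

# r - 1 = 3 dummies (South omitted): the sum no longer equals the intercept.
drop_south = columns_sum_to_intercept(rows, ["ne", "mw", "we"])

print(all_four, drop_south)  # True False
```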

Interpretation
The coefficient of any 0-1 variable indicates the difference between the intercept pertaining to that category of the attribute and the base intercept. For example,
$$\widehat{PCEXP}_i = \hat\beta_0 + \hat\beta_1 PCAID_i + \hat\beta_2 PCINC_i + \hat\beta_3 NE_i + \hat\beta_4 MW_i + \hat\beta_5 WE_i$$
If there is more than one set of discrete variables, one 0-1 variable must be deleted from each set.

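Graphical Illustration of Intercept Shifters
[Figure: PCEXP (vertical axis) plotted against PCINC (horizontal axis) as parallel lines for NE, SOUTH, and WE, with intercepts B0 + B3, B0, and B0 + B5, respectively.]
The coefficients B3 and B5 represent how far above (or below) the base region (SOUTH) the level of PCEXP lies.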

Dependent Variable: pcexp
Number of Observations Read: 50
Number of Observations Used: 50
Analysis of Variance (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total). The Model row reports Pr > F < .0001.
Fit statistics reported: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.

Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows: Intercept, pcaid, pcinc, ne, mw, we). Intercept, pcaid, and pcinc each report Pr > |t| < .0001.
Interpretation of these estimated coefficients?

Tests of Hypotheses Associated with Dummy Variables
1. Test the statistical significance of each estimated coefficient associated with the included dummy variables.
$H_0: \beta_i = 0$ (use of t test)
The estimated coefficient describes the difference between the included qualitative category and the base category.

Tests of Hypotheses Associated with Dummy Variables
The estimated intercepts:
NE: $\hat\beta_0 + \hat\beta_3$
MW: $\hat\beta_0 + \hat\beta_4$
S: $\hat\beta_0$
WE: $\hat\beta_0 + \hat\beta_5$
Note that
$\hat\beta_3$ = difference in intercept between the Northeast and the South
$\hat\beta_4$ = difference in intercept between the Midwest and the South
$\hat\beta_5$ = difference in intercept between the West and the South
Reason: to measure whether the level of PCEXP is statistically different between each included region and the base region.

Tests of Hypotheses Associated with Dummy Variables
2. Test whether the estimated coefficient of one included dummy variable differs significantly from the estimated coefficients of the other included dummy variables.
$H_0: \beta_i = \beta_j$ (use F test)
In this example, this process entails three separate tests:
$H_0: \beta_3 = \beta_4$. Test whether the level of PCEXP in the Northeast and the level of PCEXP in the Midwest are the same.
$H_0: \beta_3 = \beta_5$. Test whether the level of PCEXP in the Northeast and the level of PCEXP in the West are the same.
$H_0: \beta_4 = \beta_5$. Test whether the level of PCEXP in the Midwest and the level of PCEXP in the West are the same.

Tests of Hypotheses Associated with Dummy Variables
3. Test whether or not the qualitative variable plays a statistically significant role in affecting the dependent variable.
In the example, test $H_0: \beta_3 = \beta_4 = \beta_5 = 0$, a joint test of hypothesis (use of F-test).
If you fail to reject $H_0$, then region does not play a statistically significant role in affecting the level of per capita state expenditures.
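For reference, the joint F-test can be written in its usual restricted-versus-unrestricted form. The notation $SSE_R$, $SSE_U$, $q$, $n$, and $k$ is introduced here for illustration and does not appear on the slides; $SSE_U$ is the error sum of squares of the model with the regional dummies, $SSE_R$ that of the model without them, $q$ the number of restrictions, and $k$ the number of estimated coefficients in the unrestricted model.

```latex
F = \frac{(SSE_R - SSE_U)/q}{SSE_U/(n - k)} \sim F_{q,\,n-k} \quad \text{under } H_0
% Intercept-shifter example: q = 3 restrictions (\beta_3 = \beta_4 = \beta_5 = 0),
% n = 50 observations, k = 6 coefficients (intercept, pcaid, pcinc, ne, mw, we),
% so the statistic is compared with an F distribution with (3, 44) degrees of freedom.
```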

Tests of Hypotheses
In conducting tests of hypotheses of indicator variables as intercept shifters, carry out all three tests of hypotheses. The test of $H_0: \beta_i = 0$ is automatic, because the t test for each coefficient is reported in the regression output. The other null hypotheses, which involve joint tests of coefficients associated with dummy variables, are not automatic and must be requested explicitly. Each of the three hypothesis tests conveys important information.

Using Intercept and Slope Shifter Variables
    * run separate regressions for each region;
    data ne; set statedata1970; if region=1;
    proc reg data=ne; model pcexp=pcaid pcinc / dw; run;
    data mw; set statedata1970; if region=2;
    proc reg data=mw; model pcexp=pcaid pcinc / dw; run;
    data so; set statedata1970; if region=3;
    proc reg data=so; model pcexp=pcaid pcinc / dw; run;
    data we; set statedata1970; if region=4;
    proc reg data=we; model pcexp=pcaid pcinc / dw; run;

    * use of both intercept shifters and slope shifters;
    proc reg data=statedata1970;
    * South is arbitrarily selected as the base or reference category;
    model pcexp=pcaid pcinc pcaidne pcaidmw pcaidwe pcincne pcincmw pcincwe ne mw we / dw;
    test ne=0, mw=0, we=0;
    test pcaidne=0, pcaidmw=0, pcaidwe=0;
    test pcincne=0, pcincmw=0, pcincwe=0;
    test pcaidne=0, pcaidmw=0, pcaidwe=0, pcincne=0, pcincmw=0, pcincwe=0;
    test ne=0, mw=0, we=0, pcaidne=0, pcaidmw=0, pcaidwe=0, pcincne=0, pcincmw=0, pcincwe=0;
    run;

The MEANS Procedure Output
Summary statistics (columns: Variable, N, Mean, Std Dev, Minimum, Maximum) are reported for pcexp, pcaid, pcinc, ne, mw, so, and we over the full sample.

The same statistics are then reported for pcexp, pcaid, and pcinc separately for each of the four values of region.

Illustration of the Dummy Variable Trap
Model: MODEL1
Dependent Variable: pcexp
Number of Observations Read: 50
Number of Observations Used: 50
Analysis of Variance (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total). The Model row reports Pr > F < .0001.
Fit statistics reported: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.

NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.
    we = Intercept - ne - mw - so
Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|). Rows shown: Intercept (flagged B, Pr > |t| < .0001), pcaid (< .0001), pcinc (< .0001), ne (flagged B), mw (flagged B).

Illustrating the Dummy Variable Trap
    * Illustration of dummy variable trap;
    model pcexp=pcaid pcinc ne mw so we / dwprob;
    * South is arbitrarily selected as the base or reference intercept;
    model pcexp=pcaid pcinc ne mw we / dwprob;
    test ne=mw;
    test ne=we;
    test mw=we;
    test ne=0, mw=0, we=0;
    * West is arbitrarily chosen as the base or reference intercept;
    model pcexp=pcaid pcinc ne mw so / dwprob;
    test ne=mw;
    test ne=so;
    test mw=so;
    test ne=0, mw=0, so=0;

The REG Procedure Output
Model: MODEL2, Test 1 Results for Dependent Variable pcexp (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator).
Test H0: coefficient of NE = coefficient of MW. Interpretation?

The REG Procedure Output
Model: MODEL2, Test 2 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of NE = coefficient of WE. Interpretation?

The REG Procedure Output
Model: MODEL2, Test 3 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of MW = coefficient of WE. Interpretation?

The REG Procedure Output
Model: MODEL2, Test 4 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of NE, MW, and WE are jointly equal to zero. Interpretation?

Run the same model, but choose the West as the reference region. What do you observe?
The REG Procedure
Model: MODEL3
Dependent Variable: pcexp
Number of Observations Read: 50
Number of Observations Used: 50
Analysis of Variance (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total). The Model row reports Pr > F < .0001.
Fit statistics reported: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.

Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows: Intercept, pcaid, pcinc, ne, mw, so). Intercept, pcaid, and pcinc each report Pr > |t| < .0001.

The REG Procedure Output
Model: MODEL3, Test 5 Results for Dependent Variable pcexp (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator).
Test H0: coefficient of NE = coefficient of MW.

The REG Procedure Output
Model: MODEL3, Test 6 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of NE = coefficient of the South.

The REG Procedure Output
Model: MODEL3, Test 7 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of MW = coefficient of the South.

The REG Procedure Output
Model: MODEL3, Test 8 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of NE, MW, and South are jointly equal to zero.

Section 5.3 Slope Shifters

Slope Shifters
Slope shifters allow for differences in the slopes of one or more of the continuous predetermined variables. To use them, generate interaction variables, each the product of a dummy variable and a continuous predetermined variable.
For example, does the marginal propensity to consume (MPC) vary by race? By geographic region? By season? Before 1980 versus after 1980?

Slope Shifters
It is possible to use different regression models in lieu of a single regression model with slope shifters.
Key Point: Generate new variables by forming cross products between each of the 0-1 variables representing categories of the attribute and the selected continuous predetermined variables whose coefficients are being allowed to vary among categories.

Model Specification
Model specification is now given by the following. For example:
$$PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + \beta_3 (NE_i \cdot PCINC_i) + \beta_4 (MW_i \cdot PCINC_i) + \beta_5 (WE_i \cdot PCINC_i) + \varepsilon_i$$
$NE_i \cdot PCINC_i$, $MW_i \cdot PCINC_i$, and $WE_i \cdot PCINC_i$ are interaction terms.
Again, it is necessary to arbitrarily omit one category of the attribute to avoid the dummy variable trap.

Illustration of the Use of Slope Shifters
Suppose that you want to ascertain whether or not the effect of PCINC on PCEXP is the same across regions. That is, is the marginal effect of PCINC on PCEXP the same for the Northeast, the Midwest, the South, and the West?

Marginal Effects by Region
With this specification, the estimated marginal effects of PCINC on PCEXP by region are given as follows:
Northeast: $\hat\beta_2 + \hat\beta_3$
Midwest: $\hat\beta_2 + \hat\beta_4$
South: $\hat\beta_2$
West: $\hat\beta_2 + \hat\beta_5$
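The two steps involved, forming the cross products and then adding the shifter to the base slope, can be sketched as follows. This is a Python illustration with made-up numbers, not output from the chapter's data; the variable names pcincne, pcincmw, and pcincwe follow the chapter's SAS programs, and the helper functions are hypothetical.

```python
DUMMIES = ("ne", "mw", "we")  # South is the omitted (base) category


def add_income_interactions(row):
    """Return a copy of row with pcincne, pcincmw, and pcincwe added."""
    out = dict(row)
    for d in DUMMIES:
        out["pcinc" + d] = row[d] * row["pcinc"]
    return out


def marginal_effect_of_pcinc(region, b2, slope_shift):
    """b2 is the base (South) slope; slope_shift maps a region to the
    coefficient on its interaction term (0 for the base region)."""
    return b2 + slope_shift.get(region, 0.0)


# Hypothetical observation and coefficients (illustrative values only).
obs = {"pcinc": 4000.0, "ne": 1, "mw": 0, "we": 0}
expanded = add_income_interactions(obs)

b2 = 0.5
slope_shift = {"ne": 0.25, "mw": -0.125, "we": 0.0625}
effects = {r: marginal_effect_of_pcinc(r, b2, slope_shift)
           for r in ("ne", "mw", "so", "we")}

print(expanded["pcincne"], expanded["pcincmw"])  # 4000.0 0.0
print(effects)  # {'ne': 0.75, 'mw': 0.375, 'so': 0.5, 'we': 0.5625}
```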

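Graphical Illustration of Slope Shifters
[Figure: PCEXP (vertical axis) plotted against PCINC (horizontal axis) as lines with common intercept B0 and slopes B2 + B3 (NE), B2 (SOUTH), and B2 + B5 (WE).]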

Key Hypotheses
The key hypotheses to consider are shown below:
1. t-tests: $H_0: \beta_3 = 0$; $H_0: \beta_4 = 0$; $H_0: \beta_5 = 0$
2. F-test: $H_0: \beta_3 = \beta_4 = \beta_5 = 0$
3. F-tests: $H_0: \beta_3 = \beta_4$; $H_0: \beta_3 = \beta_5$; $H_0: \beta_4 = \beta_5$

Illustration of Slope Shifter Variables
    * Illustration of slope shifter variables;
    * South is arbitrarily selected as the base or reference category;
    model pcexp=pcaid pcinc pcincne pcincmw pcincwe / dwprob;
    test pcincne=pcincmw;
    test pcincne=pcincwe;
    test pcincmw=pcincwe;
    test pcincne=0, pcincmw=0, pcincwe=0;
    * West is arbitrarily chosen as the base or reference category;
    model pcexp=pcaid pcinc pcincne pcincmw pcincso / dwprob;
    test pcincne=pcincmw;
    test pcincne=pcincso;
    test pcincmw=pcincso;
    test pcincne=0, pcincmw=0, pcincso=0;

The REG Procedure
Model: MODEL1
Dependent Variable: pcexp
Number of Observations Read: 50
Number of Observations Used: 50
Analysis of Variance (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total). The Model row reports Pr > F < .0001.
Fit statistics reported: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.

Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows: Intercept, pcaid, pcinc, pcincne, pcincmw, pcincwe). Intercept, pcaid, and pcinc each report Pr > |t| < .0001.
Interpretation of estimated coefficients?

The REG Procedure Output
Model: MODEL1, Test 1 Results for Dependent Variable pcexp (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator).
Test H0: coefficient of PCINCNE = coefficient of PCINCMW.

The REG Procedure Output
Model: MODEL1, Test 2 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of PCINCNE = coefficient of PCINCWE.

The REG Procedure Output
Model: MODEL1, Test 3 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of PCINCMW = coefficient of PCINCWE.

The REG Procedure Output
Model: MODEL1, Test 4 Results for Dependent Variable pcexp (same layout).
Test the joint hypothesis that the coefficients of PCINCNE, PCINCMW, and PCINCWE equal zero.

The REG Procedure
Model: MODEL2
Dependent Variable: pcexp
Number of Observations Read: 50
Number of Observations Used: 50
Analysis of Variance (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total). The Model row reports Pr > F < .0001.
Fit statistics reported: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.

Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows: Intercept, pcaid, pcinc, pcincne, pcincmw, pcincso). Intercept, pcaid, and pcinc each report Pr > |t| < .0001.

The REG Procedure Output
Model: MODEL2, Test 5 Results for Dependent Variable pcexp (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator).
Test H0: coefficient of PCINCNE = coefficient of PCINCMW.

The REG Procedure Output
Model: MODEL2, Test 6 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of PCINCNE = coefficient of PCINCSO.

The REG Procedure Output
Model: MODEL2, Test 7 Results for Dependent Variable pcexp (same layout).
Test H0: coefficient of PCINCMW = coefficient of PCINCSO.

The REG Procedure Output
Model: MODEL2, Test 8 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of PCINCNE, PCINCMW, and PCINCSO are jointly equal to zero.

Marginal Effects/Elasticities
Marginal effects by region (columns: Region, Marginal Effect of PCAID on PCEXP, Marginal Effect of PCINC on PCEXP; rows: Northeast, Midwest, South, West).

Elasticities by region (columns: Region, % Change in PCEXP from a 1% Change in PCAID, % Change in PCEXP from a 1% Change in PCINC; rows: Northeast, Midwest, South, West). Each entry has the form (marginal effect)(regional mean of the regressor)/(regional mean of PCEXP). The marginal effect of PCAID is 2.5177 in all four regions.
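The elasticity calculation on this slide can be sketched as a one-line function. The Python below is illustrative only: 2.5177 is the PCAID marginal effect shown on the slide, but the two means are hypothetical placeholders, not values from the chapter's data.

```python
def elasticity_at_means(marginal_effect, mean_x, mean_y):
    """Elasticity evaluated at the means: (dY/dX) * (mean of X) / (mean of Y)."""
    if mean_y == 0:
        raise ValueError("mean_y must be nonzero")
    return marginal_effect * mean_x / mean_y


# Marginal effect of PCAID on PCEXP from the slide.
pcaid_effect = 2.5177

# Hypothetical regional means (placeholders for illustration only).
mean_pcaid = 100.0
mean_pcexp = 500.0

elasticity = elasticity_at_means(pcaid_effect, mean_pcaid, mean_pcexp)
print(round(elasticity, 4))  # 0.5035
```

Under these placeholder means, a 1% change in PCAID would be associated with roughly a 0.50% change in PCEXP.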

Section 5.4 Intercept Shifters and Slope Shifters

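Use of Both Intercept Shifters and Slope Shifter Variables
$$PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + \beta_3 NE_i + \beta_4 MW_i + \beta_5 WE_i + \beta_6 NE_i PCAID_i + \beta_7 MW_i PCAID_i + \beta_8 WE_i PCAID_i + \beta_9 NE_i PCINC_i + \beta_{10} MW_i PCINC_i + \beta_{11} WE_i PCINC_i + \varepsilon_i$$
The implied equation for each region is:
NE: $PCEXP_i = (\beta_0 + \beta_3) + (\beta_1 + \beta_6) PCAID_i + (\beta_2 + \beta_9) PCINC_i + \varepsilon_i$
MW: $PCEXP_i = (\beta_0 + \beta_4) + (\beta_1 + \beta_7) PCAID_i + (\beta_2 + \beta_{10}) PCINC_i + \varepsilon_i$
South: $PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + \varepsilon_i$
West: $PCEXP_i = (\beta_0 + \beta_5) + (\beta_1 + \beta_8) PCAID_i + (\beta_2 + \beta_{11}) PCINC_i + \varepsilon_i$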
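Recovering each region's intercept and slopes from the twelve coefficients is simple bookkeeping, sketched below in Python. The coefficient ordering (index 0 to 11, matching the subscripts in the equation above) and all numeric values are assumptions made for illustration.

```python
# Position of each non-base region within the blocks of shifters:
# intercept shifters at indices 3-5, PCAID shifters at 6-8, PCINC shifters at 9-11.
OFFSET = {"ne": 0, "mw": 1, "we": 2}


def regional_equation(beta, region):
    """Return (intercept, pcaid slope, pcinc slope) for one region.

    beta is a sequence of 12 coefficients indexed like the subscripts above;
    'so' (South) is the base region and uses beta[0], beta[1], beta[2] directly.
    """
    if region == "so":
        return (beta[0], beta[1], beta[2])
    k = OFFSET[region]
    return (beta[0] + beta[3 + k], beta[1] + beta[6 + k], beta[2] + beta[9 + k])


# Hypothetical coefficient values (illustrative only).
beta = [10.0, 2.0, 0.5, 1.0, -2.0, 3.0, 0.25, 0.5, -0.25, 0.125, 0.0, -0.125]

for region in ("ne", "mw", "so", "we"):
    print(region, regional_equation(beta, region))
```

With a full set of intercept and slope shifters, these implied regional coefficients coincide with the point estimates from running a separate regression for each region.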

Separate Regression for Region 1 (NE)
Model: MODEL1. Dependent Variable: pcexp. Number of Observations Read: 11. Number of Observations Used: 11.
Output reported: Analysis of Variance (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F), fit statistics (Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var), and Parameter Estimates for Intercept, pcaid, and pcinc.

Separate Regression for Region 2 (MW)
The REG Procedure. Model: MODEL1. Dependent Variable: pcexp. Number of Observations Read: 12. Number of Observations Used: 12.
Output reported: Analysis of Variance, fit statistics, and Parameter Estimates for Intercept, pcaid, and pcinc (same layout).

Separate Regression for Region 3 (South)
The REG Procedure. Model: MODEL1. Dependent Variable: pcexp. Number of Observations Read: 14. Number of Observations Used: 14.
Output reported: Analysis of Variance, fit statistics, and Parameter Estimates for Intercept, pcaid, and pcinc (same layout).

Separate Regression for Region 4 (West)
Dependent Variable: pcexp. Number of Observations Read: 13. Number of Observations Used: 13.
Output reported: Analysis of Variance, fit statistics, and Parameter Estimates for Intercept, pcaid, and pcinc (same layout). The Model row reports Pr > F < .0001; pcaid and pcinc each report Pr > |t| < .0001.

Regression with Intercept and Slope Shifters with Region (50 observations), Reference Region South
The REG Procedure
Model: MODEL1
Dependent Variable: pcexp
Number of Observations Read: 50
Number of Observations Used: 50
Analysis of Variance (columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F; rows: Model, Error, Corrected Total). The Model row reports Pr > F < .0001.
Fit statistics reported: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.

Parameter Estimates (columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|; rows: Intercept, pcaid, pcinc, pcaidne, pcaidmw, pcaidwe, pcincne, pcincmw, pcincwe, ne, mw, we).

Run Separate Regressions for Each Region
A fitted equation for PCEXP_i in terms of PCAID and PCINC is reported for each of NE, MW, South, and West.

The REG Procedure Output
Model: MODEL1, Test 1 Results for Dependent Variable pcexp (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator).
Test H0: coefficients of NE, MW, and WE are jointly equal to zero.

The REG Procedure Output
Model: MODEL1, Test 2 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of PCAIDNE, PCAIDMW, and PCAIDWE are jointly equal to zero.

The REG Procedure Output
Model: MODEL1, Test 3 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of PCINCNE, PCINCMW, and PCINCWE are jointly equal to zero.

The REG Procedure Output
Model: MODEL1, Test 4 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of PCAIDNE, PCAIDMW, PCAIDWE, PCINCNE, PCINCMW, and PCINCWE are jointly equal to zero.

The REG Procedure Output
Model: MODEL1, Test 5 Results for Dependent Variable pcexp (same layout).
Test H0: coefficients of NE, MW, WE, PCAIDNE, PCAIDMW, PCAIDWE, PCINCNE, PCINCMW, and PCINCWE are jointly equal to zero.

Testing Hypotheses with Dummy Variables
    * test whether the ne regression is the same as the so regression;
    test ne=0, pcaidne=0, pcincne=0;
    * test whether the mw regression is the same as the so regression;
    test mw=0, pcaidmw=0, pcincmw=0;
    * test whether the we regression is the same as the so regression;
    test we=0, pcaidwe=0, pcincwe=0;
    * test whether the ne regression is the same as the mw regression;
    test ne=mw, pcaidne=pcaidmw, pcincne=pcincmw;
    * test whether the ne regression is the same as the we regression;
    test ne=we, pcaidne=pcaidwe, pcincne=pcincwe;
    * test whether the mw regression is the same as the we regression;
    test mw=we, pcaidmw=pcaidwe, pcincmw=pcincwe;

The REG Procedure Output
Model: MODEL1, Test 6 Results for Dependent Variable pcexp (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator).
Test H0: NE regression the same as the SOUTH regression.

The REG Procedure Output
Model: MODEL1, Test 7 Results for Dependent Variable pcexp (same layout).
Test H0: MW regression the same as the SOUTH regression.

The REG Procedure Output
Model: MODEL1, Test 8 Results for Dependent Variable pcexp (same layout).
Test H0: WEST regression the same as the SOUTH regression.

The REG Procedure Output
Model: MODEL1, Test 9 Results for Dependent Variable pcexp (same layout).
Test H0: NE regression the same as the MW regression.

The REG Procedure Output
Model: MODEL1, Test 10 Results for Dependent Variable pcexp (same layout).
Test H0: NE regression the same as the WEST regression.

The REG Procedure Output
Model: MODEL1, Test 11 Results for Dependent Variable pcexp (same layout).
Test H0: MW regression the same as the WEST regression.

Test NE Regression the Same as South Regression: $H_0: \beta_3 = \beta_6 = \beta_9 = 0$ (F test)
Test MW Regression the Same as South Regression: $H_0: \beta_4 = \beta_7 = \beta_{10} = 0$ (F test)
Test WE Regression the Same as South Regression: $H_0: \beta_5 = \beta_8 = \beta_{11} = 0$ (F test)
Test NE Regression the Same as MW Regression: $H_0: \beta_3 = \beta_4,\ \beta_6 = \beta_7,\ \beta_9 = \beta_{10}$ (F test)
Test NE Regression the Same as WE Regression: $H_0: \beta_3 = \beta_5,\ \beta_6 = \beta_8,\ \beta_9 = \beta_{11}$ (F test)
Test MW Regression the Same as WE Regression: $H_0: \beta_4 = \beta_5,\ \beta_7 = \beta_8,\ \beta_{10} = \beta_{11}$ (F test)

Section 5.5 Final Thoughts about the Use of Dummy (Indicator) Variables

Caveats
1. The use of 0-1 variables, particularly when slopes are allowed to vary, requires a large number of degrees of freedom.
2. Difficulties might arise in analysis and interpretation, particularly with the use of slope shifter variables.
3. Ensure an adequate number of replications (observations in each category).
4. It might be advantageous not to omit an extreme category for comparison purposes.
5. The category to omit might be the one in which the analyst is most interested.
6. Generate dummy variables using IF/THEN statements to save time and cut down on data entry errors.

Model Formulation
Suppose the model formulation is written as
$$\ln Y_t = \beta_0 + \beta_1 DUM_t + \beta_2 \ln X_t + \varepsilon_t \quad (1)$$
where $Y_t$ refers to the value of the dependent variable in time t; $X_t$ refers to the value of the explanatory variable in time t; and $DUM_t$ refers to a dummy variable in time t. This dummy variable is an intercept shifter of the econometric relationship between Y and X. It takes on the value of 1 or 0. Suppose $DUM_t = 1$ if YR ≥ 1988 and 0 otherwise. Relative to the years prior to 1988, the percentage change in $Y_t$ can be expressed as $(e^{\beta_1} - 1) \times 100\%$.

Model Formulation
To understand this result, consider that for years prior to 1988,
$$\ln Y_t = \beta_0 + \beta_2 \ln X_t \quad (2)$$
or
$$Y_t = e^{(\beta_0 + \beta_2 \ln X_t)} = e^{\beta_0} (X_t)^{\beta_2} \quad (3)$$
For years 1988 and on,
$$\ln Y_t = (\beta_0 + \beta_1) + \beta_2 \ln X_t \quad (4)$$
Therefore, $\beta_1$ represents how much higher (if $\beta_1 > 0$) or lower (if $\beta_1 < 0$) the natural logarithm of $Y_t$ is relative to the years prior to 1988. By the same token, for years 1988 and on,
$$Y_t = e^{(\beta_0 + \beta_1)} (X_t)^{\beta_2} \quad (5)$$

Model Formulation
Now the percentage change in Y for years 1988 on, relative to the years prior to 1988, is given by
$$\frac{Y_t(\text{1988 on}) - Y_t(\text{prior to 1988})}{Y_t(\text{prior to 1988})} \times 100\% \quad (6)$$
By substitution, equation (6) can be written using equations (3) and (5):
$$\frac{e^{(\beta_0 + \beta_1)} (X_t)^{\beta_2} - e^{\beta_0} (X_t)^{\beta_2}}{e^{\beta_0} (X_t)^{\beta_2}} \times 100\% \quad (7)$$
Equation (7) can be simplified algebraically as
$$(e^{\beta_1} - 1) \times 100\% \quad (8)$$
Hence, equation (8) represents the percentage change in $Y_t$ relative to the base period (the years prior to 1988).
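The simplification from equation (7) to equation (8) follows by factoring the common term out of the numerator. Written out (with the error term set aside, as in the equations above):

```latex
\frac{e^{\beta_0 + \beta_1} X_t^{\beta_2} - e^{\beta_0} X_t^{\beta_2}}{e^{\beta_0} X_t^{\beta_2}}
  = \frac{e^{\beta_0} X_t^{\beta_2}\left(e^{\beta_1} - 1\right)}{e^{\beta_0} X_t^{\beta_2}}
  = e^{\beta_1} - 1
```

The result does not depend on $X_t$ or $\beta_0$, which is why a single percentage applies to the whole period in which the dummy variable equals 1.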

Model Formulation
The moral of this story is that if you have a double-log (or linear in logarithms) specification where some of the exogenous variables are dummy variables, be careful of the interpretation of the coefficients associated with the dummy variables. The correct interpretation is the percentage change in the dependent variable relative to a base period. That is, you use this expression:
$$(e^{\beta_1} - 1) \times 100\%$$
where $\beta_1$ represents the coefficient associated with the relevant dummy variable.

Test of Seasonality in Per Capita Orange Juice Consumption
The AUTOREG Procedure
Dependent Variable: lallojgalpc
Ordinary Least Squares Estimates. Statistics reported: SSE, DFE (150), MSE, Root MSE, SBC, AIC, Regress R-Square, Total R-Square, Durbin-Watson.

Parameter estimates (columns: Variable, DF, Estimate, Standard Error, t Value, Approx Pr > |t|; rows: Intercept, lrallojprice, lrallgfjprice, lrpcdpi, and eleven monthly dummy variables whose names begin with m).
Test 1 (columns: Source, DF, Mean Square, F Value, Pr > F; rows: Numerator, Denominator) reports Pr > F < .0001.

U.S. Per Capita Orange Juice Consumption
Reference (Base) Month: December
Month: Percentage Change in Per Capita Orange Juice Consumption Relative to December
January: (exp(0.0432) - 1) x 100% = 4.41%
February: (exp( ) - 1) x 100% = -9.48%
March: (exp( ) - 1) x 100% = -1.42%
April: (exp( ) - 1) x 100% = -8.70%
May: (exp( ) - 1) x 100% = -8.63%
June: (exp( ) - 1) x 100% = %
July: (exp( ) - 1) x 100% = %
August: (exp( ) - 1) x 100% = -9.49%
September: (exp( ) - 1) x 100% = -9.36%
October: (exp( ) - 1) x 100% = -5.06%
November: (exp( ) - 1) x 100% = -6.46%
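The conversion from a dummy-variable coefficient to a percentage change can be sketched in a few lines of Python. The function name is hypothetical; the coefficient 0.0432 is the January value shown in the table above.

```python
import math


def dummy_pct_change(coef):
    """Percentage change in the dependent variable relative to the base
    category, for a dummy variable in a log-dependent-variable model:
    (exp(coef) - 1) * 100."""
    return (math.exp(coef) - 1.0) * 100.0


# January coefficient from the table (base month: December).
january = dummy_pct_change(0.0432)
print(round(january, 2))  # 4.41
```

For small coefficients the result is close to 100 times the coefficient itself (here 4.32 versus 4.41), and the gap widens as the coefficient grows in absolute value.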

Section 5.6 Additional Readings

Additional Readings
See references:
- Kennedy (1981)
- Kennedy (1986)
- Suits (1984)
- Van Garderen and Shah (2002)
