Dummy Variables. Susan Thomas IGIDR, Bombay. 24 November, 2008

Similar documents
Suan Sunandha Rajabhat University

Technical note on seasonal adjustment for M0

GAMINGRE 8/1/ of 7

WHEN IS IT EVER GOING TO RAIN? Table of Average Annual Rainfall and Rainfall For Selected Arizona Cities

Multiple Regression Analysis

Chapter 3. Regression-Based Models for Developing Commercial Demand Characteristics Investigation

Lecture Prepared By: Mohammad Kamrul Arefin Lecturer, School of Business, North South University

. regress lchnimp lchempi lgas lrtwex befile6 affile6 afdec6 t

Technical note on seasonal adjustment for Capital goods imports

Multivariate Regression Model Results

Lecture Prepared By: Mohammad Kamrul Arefin Lecturer, School of Business, North South University

Time Series Analysis

Jayalath Ekanayake Jonas Tappolet Harald Gall Abraham Bernstein. Time variance and defect prediction in software projects: additional figures

ENGINE SERIAL NUMBERS

TIME SERIES ANALYSIS AND FORECASTING USING THE STATISTICAL MODEL ARIMA

Multiple Regression Analysis

NATCOR Regression Modelling for Time Series

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

P7.7 A CLIMATOLOGICAL STUDY OF CLOUD TO GROUND LIGHTNING STRIKES IN THE VICINITY OF KENNEDY SPACE CENTER, FLORIDA

Lecture 5: Omitted Variables, Dummy Variables and Multicollinearity

BAYESIAN PROCESSOR OF ENSEMBLE (BPE): PRIOR DISTRIBUTION FUNCTION

Scarborough Tide Gauge

Changing Hydrology under a Changing Climate for a Coastal Plain Watershed

Jackson County 2013 Weather Data

Public Library Use and Economic Hard Times: Analysis of Recent Data

Time Series Analysis of United States of America Crude Oil and Petroleum Products Importations from Saudi Arabia

Modeling and Forecasting Currency in Circulation in Sri Lanka

INTRODUCTION TO TIME SERIES ANALYSIS. The Simple Moving Average Model

DAILY QUESTIONS 28 TH JUNE 18 REASONING - CALENDAR

Determine the trend for time series data

2019 Settlement Calendar for ASX Cash Market Products. ASX Settlement

Location. Datum. Survey. information. Etrometa. Step Gauge. Description. relative to Herne Bay is -2.72m. The site new level.

Climatography of the United States No

FinQuiz Notes

Unit 7: Multiple linear regression 1. Introduction to multiple linear regression

Location. Datum. Survey. information. Etrometa. Step Gauge. Description. relative to Herne Bay is -2.72m. The site new level.

Practice Test Chapter 8 Sinusoidal Functions

Annual Average NYMEX Strip Comparison 7/03/2017

Wednesday, September 26 Handout: Estimating the Variance of an Estimate s Probability Distribution

Jackson County 2018 Weather Data 67 Years of Weather Data Recorded at the UF/IFAS Marianna North Florida Research and Education Center

Econometrics Review questions for exam

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

EVALUATION OF ALGORITHM PERFORMANCE 2012/13 GAS YEAR SCALING FACTOR AND WEATHER CORRECTION FACTOR

Time Series Analysis of Currency in Circulation in Nigeria

Introduction to Forecasting

Factor Mimicking Portfolios

Time series and Forecasting

DESC Technical Workgroup. CWV Optimisation Production Phase Results. 17 th November 2014

Chiang Rai Province CC Threat overview AAS1109 Mekong ARCC

GTR # VLTs GTR/VLT/Day %Δ:

Testing for non-stationarity

Outline. The Path of the Sun. Emissivity and Absorptivity. Average Radiation Properties II. Average Radiation Properties

Life Cycle of Convective Systems over Western Colombia

UPPLEMENT A COMPARISON OF THE EARLY TWENTY-FIRST CENTURY DROUGHT IN THE UNITED STATES TO THE 1930S AND 1950S DROUGHT EPISODES

Statistical Models for Rainfall with Applications to Index Insura

SYSTEM BRIEF DAILY SUMMARY

2) For a normal distribution, the skewness and kurtosis measures are as follows: A) 1.96 and 4 B) 1 and 2 C) 0 and 3 D) 0 and 0

Climatography of the United States No

Climatography of the United States No

Computing & Telecommunications Services Monthly Report January CaTS Help Desk. Wright State University (937)

Seasonality in recorded crime: preliminary findings

Figure 1. Time Series Plot of arrivals from Western Europe

Please see the attached document in relation to your Freedom of Information request.

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

LECTURE 10: MORE ON RANDOM PROCESSES

FORECASTING COARSE RICE PRICES IN BANGLADESH

SYSTEM BRIEF DAILY SUMMARY

2.1 Inductive Reasoning Ojectives: I CAN use patterns to make conjectures. I CAN disprove geometric conjectures using counterexamples.

Statistics for IT Managers

Geometry Voic Mr. R. and Mr. K

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Making sense of Econometrics: Basics

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

Climatography of the United States No

ECON 4230 Intermediate Econometric Theory Exam

Monthly Trading Report Trading Date: Dec Monthly Trading Report December 2017

2017 Settlement Calendar for ASX Cash Market Products ASX SETTLEMENT

Variability and trends in daily minimum and maximum temperatures and in diurnal temperature range in Lithuania, Latvia and Estonia

STATISTICAL FORECASTING and SEASONALITY (M. E. Ippolito; )

CWV Review London Weather Station Move

Climatography of the United States No

Climatography of the United States No

peak half-hourly New South Wales

Long-term Water Quality Monitoring in Estero Bay

Alterations to the Flat Weight For Age Scale BHA Data Published 22 September 2016

Transcription:

IGIDR, Bombay 24 November, 2008

The problem of structural change Model: Y i = β 0 + β 1 X 1i + ɛ i Structural change, type 1: change in parameters in time. Y i = α 1 + β 1 X i + e 1i for period 1 Y i = α 2 + β 2 X i + e 2i for period 2 Solution: for a given break point τ, σ 2 unrestricted = n1 e 2 1i + n2 e 2 2i vs.σ 2 restricted = N=n1+n2 ɛ 2 i Critical value: F(k, N - 2k) Other types of structural change Type 2: change in constant terms (dummy variables) Type 3: change in distribution of errors Type 4: change in sets of coefficients

Testing for change in parameters in the sample (r_i r_f) (r_i r_f) =alpha_1 + beta_1 (r_m r_f) (r_i r_f) =alpha_2 + beta_2 (r_m r_f) (r_m r_f) T_1 (r_m r_f) Time, t

Type 2: change in constant terms A dummy variable, D k is a binary variable which takes the value of 0 or 1 when the condition is false or true. Example: D k = 0 if a boy child is born, D k = 1 if a girl child is born. Dummies are useful in changing the structure of the model depending upon the value of some conditioning variable. The simplest is to change the intercept term of the regression model. Example: Y i = weight, X i = height Y i = α 1 + βx i + e 1i, i = female Y i = α 2 + βx i + e 2i, i = male

Type 2: change in constant terms Y = alpha_1 + beta X + e Y Y = alpha_0 + beta X + e alpha_1 alpha_0 D=0 D=1 X_i = D=1 D=0

Change in intercept: test of mean Data: Group 1 Y i = µ + ɛ i Group 2 Y i = (µ + δ) + ɛ i µ = µ G1, µ + δ = µ G2 or δ = µ G2 µ G1 The regression can be estimated as: Y i = µ + δd i + ɛ i where D i = 0 for Group 1, D i = 1 for Group 2. Alternative model: Y i = µ 1 G 1 + µ 2 G 2 + e i G 1 = 1, ifi = Group 1, otherwise G 1 = 0 G 2 = 1, ifi = Group 2, otherwise G 2 = 0 However, H 0 in model 2 cannot ask whether α 1, α 2 = 0. Advantage of the dummy variable model: H 0 : δ = µ G1 µ G2 is a well posed test of whether the mean of G 1, G 2 are different.

Change in intercept: test of mean Data frame for model 1 [y x] = Y = 2 6 4 Y 1 1 0 Y 2 1 0......... Y n1 1 0 Y n1 +1 1 1 Y n1 +2 1 1......... Y N 1 1» In1 0 I n2 I n2 + ɛ 3 7 5 OLS solution:» = ˆµˆδ» N n2 n 2 n 2 1» n1 ȳ 1 + n 2 ȳ 2 n 2 ȳ 2» = ȳ 1 ȳ 2 ȳ 1

HW Use the normal equations of the OLS optimisation to show that»» ȳ = 1 ˆµˆδ ȳ 2 ȳ 1 What is the standard error of ˆµ, ˆδ?

Change in intercept: test of mean Data frame for model 2 Y =» In1 0 0 I n2 + ɛ OLS solution:» ˆµG1 ˆµ G2 " σ ˆµG1 σ ˆµG2 # = =» n1 0 0 n 2» σɛ/ n1 σ ɛ/ n2 1» n1 ȳ 1 n 2 ȳ 2 =» ȳ1 ȳ 2

Model choices when dealing with dummy variables Model 1: Y i = µ + δd i + ɛ i Model 2: Y i = µ G1 D G1 + µ G2 D G2 + e i Incorrect model: Y i = α + µ G1 D G1 + µ G2 D G2 + e i Data matrix for the models: Model 1 Model 2 Incorrect model»»» In1 0 In1 0 In1 I n1 0 0 I n2 I n2 0 I n2 I n2 I n2 In the third data matrix, the sum of the second and third columns add up to the first. This means the inverse of (X X) 1 cannot be calculated. Which in turn means that three coefficents cannot be estimated. Problem of multicollinearity: with dummy variables, coefficients for a comprehensive set of dummies cannot be estimated simultaneously with an intercept. Model can either contain a comprehesive set of dummy variables or an intercept.

Modelling the index of industrial production, IIP

Seasonality in the IIP data tmp$iip 100 150 200 250 300 0 50 100 150 200

Features of IIP Data has monthly frequency from April 1990 to Sep 2008 Appears to have an annual trend linear? non-linear? Appears to have seasonality. Expected patterns at regular intervals. Model suggestions: A different level for different years: year trend term Captures a level of IIP for a given year. For example,trend is denoted as 1 for 1990, 2 for 1991, 3 for 1992, etc. A different level for different months: month dummies Captures a level of IIP for a given month in a year. Each month has a dummy. For example, Jan t is 1 for January in any month, and 0 otherwise. Model 1: IIP t = α 0 + α 1 Y t + β 1 Jan t + β 2 Feb t +... + β 11 Nov t + ɛ t

Model 1 Regression results Estimate Std. Error t value Pr(> t ) (Intercept) 59.5827 1.9685 30.27 0.0000 year 10.0038 0.1736 57.63 0.0000 Residual SE = 0.0530 F-stat(1, 220) = 3322 prob value = 2.2e-16 R-squared = 0.9379 Adjusted R-squared: 0.9376

Explained vs. Actual data iip 100 150 200 250 300 0 50 100 150 200

Behaviour of serial dependence in residuals ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Full model 1 Regression results Estimate Std. Error t value Pr(> t ) (Intercept) 59.4567 2.7854 21.35 0.0000 year 10.0058 0.1677 59.68 0.0000 jan -3.6010 3.8334-0.94 0.3486 feb 10.1878 3.8334 2.66 0.0085 mar -4.3148 3.7649-1.15 0.2531 may -3.5201 3.7649-0.93 0.3509 jun -1.9253 3.7649-0.51 0.6096 jul -2.3411 3.7649-0.62 0.5347 aug -0.5201 3.7649-0.14 0.8903 sep -2.2230 3.8352-0.58 0.5628 oct 0.2770 3.8352 0.07 0.9425 nov 9.9826 3.8352 2.60 0.0099 Residual SE = 0.04730 F-stat(11, 210) = 3327.3 prob value = 2.2e-16 R-squared = 0.9449 Adjusted R-squared: 0.9420

Interpreting the model The omitted dummy is Dec. Therefore, the jan coefficient value of -3.601 is the additional shift for January in addition to the value for December. In this model, the January effect is: 59.4567 3.601 = 55.8557 From the model, the IIP level for January 1991 is IIP jan 1991 = 59.4567 + 10.0058 2 3.601 = 75.8673

Explained vs. Actual data iip 100 150 200 250 300 0 50 100 150 200

Behaviour of serial dependence in residuals ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Model 2 The trend and seasonality is non-linear Model suggestions: Fit the model on log(iip) Model 2: log IIP t = α 0 + α 1 Y t + β 1 Jan t + β 2 Feb t +... + β 11 Nov t + ɛ t

Simple model 2 Regression results: Estimate Std. Error t value Pr(> t ) (Intercept) 4.3804 0.0075 580.80 0.0000 year 0.0634 0.0007 95.27 0.0000 Residual SE = 0.05301 F-stat(1, 220) = 9077 prob value = 2.2e-16 R-squared = 0.9763 Adjusted R-squared: 0.9762

Explained vs. Actual data liip 4.6 4.8 5.0 5.2 5.4 5.6 0 50 100 150 200

Full model 2 Regression results: Estimate Std. Error t value Pr(> t ) (Intercept) 4.3788 0.0099 443.15 0.0000 year 0.0634 0.0006 106.56 0.0000 jan -0.0191 0.0136-1.40 0.1621 feb 0.0554 0.0136 4.07 0.0001 mar -0.0205 0.0134-1.53 0.1267 may -0.0193 0.0134-1.44 0.1502 jun -0.0103 0.0134-0.77 0.4429 jul -0.0149 0.0134-1.11 0.2664 aug -0.0057 0.0134-0.43 0.6681 sep -0.0106 0.0136-0.78 0.4376 oct 0.0063 0.0136 0.46 0.6436 nov 0.0587 0.0136 4.31 0.0000 Residual SE = 0.04732 F-stat(11, 210) = 1042 prob value = 2.2e-16 R-squared = 0.9820 Adjusted R-squared: 0.9811

Explained vs. Actual data liip 4.6 4.8 5.0 5.2 5.4 5.6 0 50 100 150 200

Residual data ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Results, inference There is still a lot of serial dependence in the residuals of the model. This means that there is yet a lot of variance about the IIP which is to be captured.

Model 3 The dummy variables capture a The linearity is better captured by log changes in IIP from the previous year. y t = log(iip t, y 1 ) log(iip t, y 0 ) This is a standard data transformation used in the econometric literature for seasonally adjusting macro-economic data. Model suggestions: There is a trend. There is seasonality. Model 2: log IIP t,y1 /IIP t,y0 = α 0 + α 1 Y t + β 1 Jan t + β 2 Feb t +... + β 11 Nov t + ɛ t

IIP YoY growth series giip 0 5 10 15 20 25 0 50 100 150 200

IIP YoY growth series giip 0 5 10 15 20 25 0 50 100 150 200

Simple model 3 Regression results: Estimate Std. Error t value Pr(> t ) (Intercept) 4.2634 0.6553 6.51 0.0000 gyear 0.2176 0.0562 3.87 0.0001 Residual SE = 4.124 F-stat(1, 208) = 14.98 prob value = 1.45e-4 R-squared = 0.06718 Adjusted R-squared: 0.06269

Explained vs. Actual data giip 0 5 10 15 20 25 0 50 100 150 200

Residual data ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Full model 3 Regression results: Estimate Std. Error t value Pr(> t ) (Intercept) 4.2099 0.9417 4.47 0.0000 year 0.2188 0.0575 3.81 0.0002 jan 0.2393 1.2437 0.19 0.8476 feb 0.4629 1.2437 0.37 0.7102 mar -0.5066 1.2202-0.42 0.6785 may -0.5438 1.2202-0.45 0.6563 jun -0.2688 1.2202-0.22 0.8259 jul -0.3038 1.2202-0.25 0.8036 aug 0.0545 1.2202 0.04 0.9644 sep 0.3346 1.2443 0.27 0.7883 oct 0.2540 1.2443 0.20 0.8385 nov 0.8752 1.2443 0.70 0.4827 Residual SE = 4.207 F-stat(11, 198) = 1.479 prob value = 0.1415 R-squared = 0.0759 Adjusted R-squared: 0.0246

Explained vs. Actual data giip 0 5 10 15 20 25 0 50 100 150 200

Residual data ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Model 4: an autoregressive model for IIP Time series models use information from previous periods of own data to explain the next. Example, an autoregressive model for IIP would take the form: IIP t = α + β 1 IIP t 1 + ɛ t This is called the Autoregressive model (AR) of order 1 because it has only one previous period variable as the explanatory variable for IIP t. More generic forms of AR models are: IIP t = α + β 1 IIP t 1 +... + β k IIP t k + ɛ t This is an AR(k) model with IIP from k previous periods to explain IIP t.

Model 4: an autoregressive model for IIP Model for yoy-growth in IIP: giip t = 6.2819 + 0.5174giip t 1 + 0.3524giip t 2 + 0.0217giip t 3 0.0052giip t 4 0.0125giip t 5 0.0351giip t 6 + 0.0452giip t 7 0.0499giip t 8 0.0318giip t 9 0.0291giip t 10 + 0.1101giip t 11 0.2506giip t 12 + 0.2932giip t 13 + 0.1812giip t 14 0.0402giip t 15 0.2257giip t 16 + ɛ t σ ɛ = 2.4385 σ giip = 4.2594

Explained vs. Actual data giip 0 5 10 15 20 25 0 50 100 150 200

Dependence in residual data ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Dependence in variance of residual data ACF 0.0 0.2 0.4 0.6 0.8 1.0 0 5 10 15 20

Cross-plot of giip vs. residuals 0 5 10 15 20 25 5 0 5 10 t$resid

Cross-plot of giip vs. residuals-squared (t$resid * t$resid) 0 20 40 60 80 100 120 140 0 5 10 15 20 25