ITSx: Policy Analysis Using Interrupted Time Series


Week 3 Slides
Michael Law, Ph.D.
The University of British Columbia

Layout of the weeks
1. Introduction, setup, data sources
2. Single series interrupted time series analysis
3. ITS with a control group
4. Extensions and regression discontinuities
5. Course wrap-up

Overview of steps
1. Determine time periods
2. Select analytic cohorts
3. Determine outcomes of interest
4. Setup data
5. Visually inspect the data
6. Perform preliminary analysis
7. Check for and address autocorrelation
8. Run the final model
9. Plot the results
10. Predict relative and absolute effects

INTRODUCTION TO THE EXAMPLES

EXAMPLE 1: WATER FLOW ON NILE

Source: http://commons.wikimedia.org/wiki/file:river_nile_map.svg

[Map; labels: USA, Canada]

Research Question
Previous work has suggested that weather patterns shifted in 1898 (Cobb 1978).
What was the impact of weather changes on annual water flow levels in the Nile?

         Pre-period   Post-period
Nile     1871-1897    1898-1930
Huron    1871-1897    1898-1930

EXAMPLE 2: WEST VIRGINIA MEDICAID DRUG POLICY

[Figure: Medicaid Antipsychotic Reimbursement, 1991-2005. Y-axis: Antipsychotic Reimbursement (0 to 6); x-axis: year (1991 to 2005); series: Atypical, Typical.]
Source: CMS Medicaid Quarterly Drug Utilization Data; excludes Arizona, which does not provide data. Converted to 2005$ using the Medical Care component of the CPI.

STEP 4: SETUP DATA

Data Setup
One data row for each group, in each time period.
Variables:
- Existing Trend
- Post-intervention Level Change
- Post-intervention Trend Change
- Outcome of interest
- Existing Level Difference
- Existing Trend Difference
- Level Change Difference
- Trend Change Difference

year  nile  flow
1871  1     3533.9
1872  1     3664.3
...
1896  1     3856.0
1897  1     3248.2
1898  1     3479.8
1899  1     2453.0
...
1929  1     3241.1
1930  1     2334.4
1871  0     2120.0
1872  0     2273.3

year  nile  flow    time  level  trend  niletime  nilelevel  niletrend
1871  1     3533.9  1     0      0      1         0          0
1872  1     3664.3  2     0      0      2         0          0
...
1929  1     3241.1  59    1      32     59        1          32
1930  1     2334.4  60    1      33     60        1          33
1871  0     2120.0  1     0      0      0         0          0
1872  0     2273.3  2     0      0      0         0          0
...
1929  0     2060.8  59    1      32     0         0          0
1930  0     1796.7  60    1      33     0         0          0
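The time, level, trend and interaction columns in the table above can be derived from `year` and the `nile` indicator alone. The following is a minimal sketch in base R, assuming the change year of 1898 stated in the research question; the data frame here is a skeleton without the `flow` values, and the name `change_year` is introduced only for illustration.

```r
# Skeleton data: one row per group (nile = 1 or 0) per year, 1871-1930
years <- 1871:1930
data <- data.frame(
  year = rep(years, times = 2),
  nile = rep(c(1, 0), each = length(years))
)

change_year <- 1898  # assumed start of the post-period

# Time counter, post-period indicator, and years since the change
data$time  <- data$year - min(data$year) + 1
data$level <- as.numeric(data$year >= change_year)
data$trend <- pmax(data$year - change_year + 1, 0)

# Interactions with the intervention-group indicator
data$niletime  <- data$nile * data$time
data$nilelevel <- data$nile * data$level
data$niletrend <- data$nile * data$trend

# Same rows as shown in the table (first and last two per group)
data[c(1, 2, 59, 60, 61, 62, 119, 120), ]
```

Building the interactions by multiplication keeps them at zero for every control-group row, which matches the layout in the table.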

STEP 5: VISUALLY INSPECT DATA

Visually Inspect Data
What you were looking for last week:
- Wild points
- Linear trends
- Co-interventions
- Data quality issues
Add to this: suitability of the control group

    # Plot the time series for the Nile river at Aswan
    plot(data$time[1:60], data$flow[1:60],
         ylab="Water Flow", ylim=c(0,4500),
         xlab="Year", type="l", col="red", xaxt="n")
    # Add in control group flow into Lake Huron
    points(data$time[61:120], data$flow[61:120], type='l', col="blue")
    # Add x-axis year labels
    axis(1, at=1:60, labels=data$year[1:60])

    # Add in the points for the figure
    points(data$time[1:60], data$flow[1:60], col="red", pch=20)
    points(data$time[61:120], data$flow[61:120], col="blue", pch=20)
    # Label the weather change
    abline(v=27.5, lty=2)
    # Add in a legend
    legend(x=3, y=1000, legend=c("Nile","Huron"),
           col=c("red","blue"), pch=20)

STEP 6: PRELIMINARY ANALYSIS

Preliminary analysis
- As with last week, start with a standard OLS regression with a time series specification
- This will form the basis for checks about autocorrelation

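Basic time series model
For intervention status j, for group k, at time t:
$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$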

OLS Regression in R
    # A preliminary OLS regression
    model_ols <- lm(flow ~ time + nile + niletime + level + trend +
                    nilelevel + niletrend, data=data)
    # See summary of model output
    summary(model_ols)
    # Get confidence intervals for coefficients
    confint(model_ols)

OLS Model Results
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  2442.191     148.540   16.441   < 2e-16  ***
time          -17.731       9.272   -1.912   0.05838  .
nile          965.745     210.068    4.597  1.13e-05  ***
niletime       23.149      13.112    1.765   0.08021  .
level         256.428     193.936    1.322   0.18878
trend           4.022      11.534    0.349   0.72796
nilelevel   -1062.702     274.267   -3.875   0.00018  ***
niletrend     -14.983      16.311   -0.919   0.36029

STEP 7: AUTOCORRELATION

Methods to Check
Several methods, including:
- Durbin-Watson test
- Residual plots
- ACF and partial-ACF plots

Durbin-Watson test
A formal test for correlated residuals.
Interpretation:
- Values of 2 indicate no autocorrelation
- Lower values indicate positive correlation; higher values indicate negative correlation
    # dwt() is provided by the car package
    library(car)
    # Durbin-Watson test, 12 time periods
    dwt(model_ols, max.lag=12, alternative="two.sided")

> dwt(model_ols, max.lag=12, alternative="two.sided")
 lag  Autocorrelation  D-W Statistic  p-value
   1       0.18937367       1.620277    0.010
   2       0.01902871       1.952186    0.442
   3      -0.12878820       2.211857    0.402
   4      -0.21783205       2.373411    0.062
   5      -0.14356400       2.215094    0.292
   6      -0.10262651       2.129624    0.414
   7      -0.05243658       1.983483    0.864
   8       0.07835392       1.701832    0.188
   9      -0.13712761       2.065004    0.438
  10      -0.22540820       2.233116    0.092
  11       0.07906429       1.616586    0.178
  12       0.08853936       1.574862    0.130
 Alternative hypothesis: rho[lag] != 0
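To make the "values near 2" rule concrete, the lag-1 Durbin-Watson statistic can be computed by hand and compared with the approximation 2(1 - r), where r is the lag-1 autocorrelation of the residuals. This is a sketch on a synthetic AR(1) series, not the Nile residuals; `dw_stat` is a hypothetical helper name and the AR coefficient of 0.5 is an arbitrary assumption.

```r
# Hypothetical helper: lag-1 Durbin-Watson statistic from a residual vector
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Synthetic, positively autocorrelated "residuals" (for illustration only)
set.seed(1)
e <- as.numeric(arima.sim(model = list(ar = 0.5), n = 200))
e <- e - mean(e)
n <- length(e)

# Lag-1 autocorrelation and the resulting approximation to the statistic
r1 <- sum(e[-1] * e[-n]) / sum(e^2)
c(dw = dw_stat(e), approx = 2 * (1 - r1))

# Same approximation applied to the lag-1 autocorrelation reported above;
# it lands close to the reported D-W statistic of 1.620277
2 * (1 - 0.18937367)
```

The two quantities differ only by an end-point term, (e[1]^2 + e[n]^2) / sum(e^2), which shrinks as the series gets longer.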

Residual plots
- Residuals from an OLS should not be related over time (independence assumption)
- Use a residual plot to visually inspect for patterns
    # Graph the residuals
    plot(data$time[1:60], residuals(model_ols)[1:60],
         type='o', pch=16, xlab='Time', ylab='OLS Residuals', col="red")
    abline(h=0, lty=2)

Autocorrelation plots
A plotting method with which you can assess autocorrelation and moving averages.
Two plots:
- Autocorrelation
- Partial autocorrelation

Model               ACF                                         Partial ACF
No autocorrelation  All zeros                                   All zeros
Autoregressive (p)  Exponential decay                           p significant lags before dropping to zero
Moving average (q)  q significant lags before dropping to zero  Exponential decay
Both (p,q)          Decay after qth lag                         Decay after pth lag
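The patterns in the table can be checked against theoretical ACF and partial ACF values, which base R computes with `ARMAacf()` from the stats package. The sketch below uses an AR(1) and an MA(1) process; the coefficient 0.7 is an arbitrary choice for illustration and is not taken from the Nile model.

```r
# Theoretical ACF / PACF for an AR(1) process with coefficient 0.7
ar_acf  <- ARMAacf(ar = 0.7, lag.max = 5)               # lags 0..5
ar_pacf <- ARMAacf(ar = 0.7, lag.max = 5, pacf = TRUE)  # lags 1..5

# Theoretical ACF / PACF for an MA(1) process with coefficient 0.7
ma_acf  <- ARMAacf(ma = 0.7, lag.max = 5)
ma_pacf <- ARMAacf(ma = 0.7, lag.max = 5, pacf = TRUE)

# AR(1): ACF decays geometrically, PACF cuts off after lag 1
round(ar_acf, 3)
round(ar_pacf, 3)

# MA(1): ACF cuts off after lag 1, PACF decays
round(ma_acf, 3)
round(ma_pacf, 3)
```

Sample ACF plots from real residuals are noisier than these theoretical values, so the cut-off and decay patterns are usually judged against the confidence bands that `acf()` draws.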

Examples of Exponential Decay

Examples of Significant Lags

Producing ACF plots
Use the following code on our Nile data:
    # Set plotting to two records on one page
    par(mfrow=c(2,1))
    # Produce plots
    acf(residuals(model_ols))
    acf(residuals(model_ols), type='partial')

STEP 8: RUN THE FINAL MODEL

Running the final model
As with last week, use gls():
    # gls() and corARMA() come from the nlme package
    library(nlme)
    # Fit the GLS regression model
    model_p10 <- gls(flow ~ time + nile + niletime + level + trend +
                     nilelevel + niletrend, data=data,
                     correlation=corARMA(p=10, form=~time|nile),
                     method="ML")
    summary(model_p10)
    confint(model_p10)

Coefficients:
                  Value  Std.Error   t-value  p-value
(Intercept)   2464.7702   76.14223  32.37061   0.0000
time           -20.2181    4.85602  -4.16352   0.0001
nile           885.4280  107.68137   8.22267   0.0000
niletime        26.8242    6.86745   3.90600   0.0002
level          368.1143  101.97194   3.60996   0.0005
trend            2.1405    5.22455   0.40970   0.6828
nilelevel    -1101.1601  144.21010  -7.63580   0.0000
niletrend      -16.4864    7.38863  -2.23132   0.0277

Interpretation: after the weather change, there was a sustained drop in average monthly water flow of 1101 million cubic meters relative to the change in the St. Marys River. There was also a small relative drop in the trend of 16.5 million cubic meters per month afterward.

Model checking
To check the specification of the correlation structure, you can formally test the inclusion of further autoregressive or moving average parameters.
    # Likelihood-ratio tests
    model_p10q1 <- update(model_p10,
                          correlation=corARMA(q=1, p=10, form=~time|nile))
    anova(model_p10, model_p10q1)
    model_p11 <- update(model_p10,
                        correlation=corARMA(p=11, form=~time|nile))
    anova(model_p10, model_p11)

> anova(model_p10, model_p10q1)
            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
model_p10       1 19 1755.983 1808.945 -858.9915
model_p10q1     2 20 1756.284 1812.034 -858.1421 1 vs 2 1.698731  0.1925

> anova(model_p10, model_p11)
            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
model_p10       1 19 1755.983 1808.945 -858.9915
model_p11       2 20 1755.303 1811.053 -857.6516 1 vs 2 2.679826  0.1016
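The L.Ratio and p-value columns can be reproduced by hand from the reported log-likelihoods: the statistic is twice the difference in log-likelihood, compared against a chi-squared distribution with degrees of freedom equal to the difference in parameter counts (here 20 - 19 = 1). A minimal sketch, using the rounded values printed above; `lr_test` is a hypothetical helper name.

```r
# Log-likelihoods copied from the anova() output above
ll_p10   <- -858.9915
ll_p10q1 <- -858.1421
ll_p11   <- -857.6516
df_diff  <- 20 - 19  # one extra correlation parameter in each larger model

# Hypothetical helper: likelihood-ratio statistic and chi-squared p-value
lr_test <- function(ll_small, ll_big, df) {
  stat <- 2 * (ll_big - ll_small)
  c(L.Ratio = stat, p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

lr_test(ll_p10, ll_p10q1, df_diff)  # adding an MA(1) term
lr_test(ll_p10, ll_p11,   df_diff)  # adding an 11th AR term
```

Neither p-value falls below 0.05, which is consistent with retaining the simpler p=10 structure.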

STEP 9: PLOT THE RESULTS

Time series with control model
For intervention status j, for group k, at time t:
$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$
- $\beta_0$: Baseline outcome for the control group
- $\beta_1$: Pre-existing trend in the outcome of interest for the control group
- $\beta_2$: Baseline difference between intervention and control groups
- $\beta_3$: Pre-existing difference in trend between intervention and control groups
- $\beta_4$: Level change in the control group
- $\beta_5$: Trend change in the control group
- $\beta_6$: Difference in level change between the intervention and control groups (first variable of interest)
- $\beta_7$: Difference in trend change between the intervention and control groups (second variable of interest)

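$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$
For the control group:
- $\beta_0$ (existing level)
- $\beta_1$ (existing trend)
- $\beta_4$ (level change)
- $\beta_5$ (slope change)
[Figure: outcome of interest against time, with pre-intervention and post-intervention periods marked.]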

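$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$
Predicted values (control group):
- First time point (time = 1): $\beta_0 + \beta_1$
- Pre-intervention: $\beta_0 + \beta_1 \cdot \text{time}$
- First post-intervention point (trend = 1): $\beta_0 + \beta_1 \cdot \text{time} + \beta_4 + \beta_5$
- Post-intervention: $\beta_0 + \beta_1 \cdot \text{time} + \beta_4 + \beta_5 \cdot \text{trend}$
[Figure: outcome of interest against time, with pre-intervention and post-intervention periods marked.]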

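$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$
For the intervention group:
- $\beta_2$ (existing level diff)
- $\beta_3$ (existing trend diff)
- $\beta_6$ (level change diff)
- $\beta_7$ (slope change diff)
[Figure: outcome of interest against time, with pre-intervention and post-intervention periods marked.]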

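$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$
Predicted values (intervention group):
- First time point (time = 1): $\beta_0 + \beta_2 + \beta_1 + \beta_3$
- Pre-intervention: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time}$
- First post-intervention point (trend = 1), counterfactual: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time} + \beta_4 + \beta_5$
- First post-intervention point (trend = 1), fitted: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time} + \beta_4 + \beta_5 + \beta_6 + \beta_7$
- Post-intervention, counterfactual: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time} + \beta_4 + \beta_5 \cdot \text{trend}$
- Post-intervention, fitted: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time} + \beta_4 + \beta_6 + (\beta_5 + \beta_7) \cdot \text{trend}$
[Figure: outcome of interest against time, with pre-intervention and post-intervention periods marked.]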
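The gap between the fitted and counterfactual lines for the intervention group follows directly from the two post-intervention expressions. A short derivation using the model's symbols; the labels $\hat{y}^{\text{fit}}$ and $\hat{y}^{\text{cf}}$ are introduced here only for illustration.

```latex
\begin{align*}
\hat{y}^{\text{fit}}_t &= \beta_0 + \beta_2 + (\beta_1 + \beta_3)\,\text{time}_t
                          + \beta_4 + \beta_6 + (\beta_5 + \beta_7)\,\text{trend}_{jt} \\
\hat{y}^{\text{cf}}_t  &= \beta_0 + \beta_2 + (\beta_1 + \beta_3)\,\text{time}_t
                          + \beta_4 + \beta_5\,\text{trend}_{jt} \\
\hat{y}^{\text{fit}}_t - \hat{y}^{\text{cf}}_t &= \beta_6 + \beta_7\,\text{trend}_{jt}
\end{align*}
```

The estimated effect at any post-intervention point therefore depends only on $\beta_6$, $\beta_7$ and the number of periods since the change.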

Coefficients:
(Intercept)   2464.8
time           -20.2
nile           885.4
niletime        26.8
level          368.1
trend            2.1
nilelevel    -1101.2
niletrend      -16.5
[Figure: outcome of interest against time, with pre-intervention and post-intervention periods marked.]

STEP 9: PLOT THE RESULTS (PART 2)

Plot the results
    # First plot the raw data points for the Nile
    plot(data$time[1:60], data$flow[1:60],
         ylim=c(0,4500), ylab="Water Flow", xlab="Year",
         pch=20, col="lightblue", xaxt="n")
    # Add x-axis year labels
    axis(1, at=1:60, labels=data$year[1:60])
    # Label the policy change
    abline(v=27.5, lty=2)
    # Add in the points for the control
    points(data$time[61:120], data$flow[61:120], col="pink", pch=20)

Plotting the intervention group
For observed, use fitted()
    # Plot the first line segment for the intervention group
    lines(data$time[1:27], fitted(model_p10)[1:27], col="blue", lwd=2)
    # Add the second line segment for the intervention group
    lines(data$time[28:60], fitted(model_p10)[28:60], col="blue", lwd=2)

Plotting the counterfactual
For counterfactual, use segments(x1, y1, x2, y2)
    # Add the counterfactual for the intervention group
    segments(28,
             model_p10$coef[1] + model_p10$coef[2]*28 +
             model_p10$coef[3] + model_p10$coef[4]*28 +
             model_p10$coef[5] + model_p10$coef[6],
             60,
             model_p10$coef[1] + model_p10$coef[2]*60 +
             model_p10$coef[3] + model_p10$coef[4]*60 +
             model_p10$coef[5] + model_p10$coef[6]*33,
             lty=2, col='blue', lwd=2)

Plotting the control group
Again, for observed, use fitted()
    # Plot the first line segment for the control group
    lines(data$time[61:87], fitted(model_p10)[61:87], col="red", lwd=2)
    # Add the second line segment for the control
    lines(data$time[88:120], fitted(model_p10)[88:120], col="red", lwd=2)

Plotting the counterfactual
For counterfactual (if desired), use segments(x1, y1, x2, y2)
    # Add the counterfactual for the control group
    segments(1, model_p10$coef[1] + model_p10$coef[2],
             60, model_p10$coef[1] + model_p10$coef[2]*60,
             lty=2, col='red', lwd=2)

Adding a legend
Since we have two data series:
    # Add in a legend
    legend(x=3, y=1000, legend=c("Nile","Huron"),
           col=c("blue","red"), pch=20)

STEP 10: PREDICTED CHANGES

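$$\text{outcome}_{jkt} = \beta_0 + \beta_1\,\text{time}_t + \beta_2\,\text{group}_k + \beta_3\,\text{group}_k\,\text{time}_t + \beta_4\,\text{level}_{jt} + \beta_5\,\text{trend}_{jt} + \beta_6\,\text{level}_{jt}\,\text{group}_k + \beta_7\,\text{trend}_{jt}\,\text{group}_k + \varepsilon_{jkt}$$
Predicted values (intervention group, post-intervention):
- Counterfactual: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time} + \beta_4 + \beta_5 \cdot \text{trend}$
- Fitted: $\beta_0 + \beta_2 + (\beta_1 + \beta_3) \cdot \text{time} + \beta_4 + \beta_6 + (\beta_5 + \beta_7) \cdot \text{trend}$
[Figure: outcome of interest against time, with pre-intervention and post-intervention periods marked.]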

Step 10: Predicted changes
Using the model coefficients, you can predict absolute and relative changes.
    # Predicted value at 25 years after the weather change
    pred <- fitted(model_p10)[52]
    # Then estimate the counterfactual at the same time point
    cfac <- model_p10$coef[1] + model_p10$coef[2]*52 +
            model_p10$coef[3] + model_p10$coef[4]*52 +
            model_p10$coef[5] + model_p10$coef[6]*25
    # Absolute change at 25 years
    pred - cfac
    # Relative change at 25 years
    (pred - cfac) / cfac

> # Absolute change at 25 years
> pred - cfac
       52
-1513.321
> # Relative change at 25 years
> (pred - cfac) / cfac
        52
-0.3677265

Interpretation: In the 25th year after the weather change, the average monthly water flow was 1513 million cubic meters less than would have been expected if the weather had not changed. This represented a 36.8% reduction.
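These two numbers can be cross-checked from the printed coefficient table alone, without the fitted model object. The sketch below uses the rounded GLS coefficients reported in Step 8, so the last digits may differ slightly from the output above; the variable names are introduced only for illustration.

```r
# Coefficients copied (rounded) from the GLS summary output
b <- c(intercept = 2464.7702, time = -20.2181, nile = 885.4280,
       niletime = 26.8242, level = 368.1143, trend = 2.1405,
       nilelevel = -1101.1601, niletrend = -16.4864)

t_idx      <- 52  # time index of the 25th post-change year
years_post <- 25  # value of the trend variable at that time

# Counterfactual for the Nile: control-group changes only, no nilelevel/niletrend
cfac <- b[["intercept"]] + b[["time"]] * t_idx +
        b[["nile"]] + b[["niletime"]] * t_idx +
        b[["level"]] + b[["trend"]] * years_post

# Absolute effect is the level difference plus the accumulated trend difference
abs_change <- b[["nilelevel"]] + b[["niletrend"]] * years_post
rel_change <- abs_change / cfac

round(c(counterfactual = cfac, absolute = abs_change, relative = rel_change), 4)
```

The absolute change reduces to nilelevel + 25 * niletrend because every other term appears in both the fitted value and the counterfactual.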

                Absolute  Relative
Single Series      -1093    -29.6%
With Control       -1513    -36.8%

Overview of steps
1. Determine time periods
2. Select analytic cohorts
3. Determine outcomes of interest
4. Setup data
5. Visually inspect the data
6. Perform preliminary analysis
7. Check for and address autocorrelation
8. Run the final model
9. Plot the results
10. Predict relative and absolute effects