Homework #5 - Answer Key Mark Scheuerell


Background

Here are the answers for the homework problems from the sixth week of class on the Dynamic Linear Models (DLMs) material. Begin by getting the data.

    # get S-R data; cols are:
    # 1: brood yr (brood.yr)
    # 2: number of spawners (Sp)
    # 3: number of recruits (Rec)
    # 4: PDO during first summer at sea (PDO.t2)
    # 5: PDO during first winter at sea (PDO.t3)
    load("KvichakSockeye.RData")
    # head of data file
    head(SRdata)

    ##   brood.yr   Sp   Rec PDO.t2 PDO.t3
    ## 1     1952 5970 17310  -0.61  -0.61
    ## 2     1953  320   520  -1.48  -2.66
    ## 3     1954  240   750  -2.05  -1.26
    ## 4     1955  250  1280   0.01   0.11
    ## 5     1956 9443 39036   0.86   0.37
    ## 6     1957 2843  4091  -0.25   0.29

Question 1

Begin by fitting a reduced form of Equation 15 that includes only a time-varying level ($\alpha_t$) and observation error ($v_t$). That is,

$$\log(R_t) = \alpha_t + \log(S_t) + v_t$$
$$\log(R_t/S_t) = \alpha_t + v_t \tag{1}$$

This model assumes no density-dependent survival in that the number of recruits is an ascending function of spawners. Plot the time series of $\alpha_t$ and note the AICc for this model. Also plot appropriate model diagnostics.

Answer

The stock-recruit model here is a random walk observed with error, which we have seen a lot in class. To see the equivalency, write the observation model as

$$y_t = x_t + v_t \tag{2}$$

where $y_t = \log(R_t/S_t)$ and $x_t = \alpha_t$. The process model is then

$$x_t = x_{t-1} + w_t \tag{3}$$
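To make the "random walk observed with error" structure of Equations 2 and 3 concrete, here is a minimal base-R simulation sketch. The variances `q_sim` and `r_sim`, the series length `n_sim`, and the starting state of 0 are hypothetical values chosen only for illustration; they are not estimates from the Kvichak data.

```r
# Simulate a random walk observed with error (illustrative values only)
set.seed(1)
n_sim <- 40      # hypothetical series length
q_sim <- 0.3     # hypothetical process variance
r_sim <- 0.3     # hypothetical observation variance

w <- rnorm(n_sim, mean = 0, sd = sqrt(q_sim))  # process errors w_t
v <- rnorm(n_sim, mean = 0, sd = sqrt(r_sim))  # observation errors v_t

x <- cumsum(w)   # state: x_t = x_{t-1} + w_t, starting from x_0 = 0
y <- x + v       # observation: y_t = x_t + v_t

# plot the hidden state (line) and the noisy observations (points)
plot.ts(x, ylab = "x_t (line) and y_t (points)", ylim = range(c(x, y)))
points(y, pch = 16, col = "blue")
```

The observed series `y` wanders with the state `x` but is scattered around it, which is the behavior the level-only DLM tries to separate into process and observation variance.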

The first thing we need is to compute the response variable $y_t = \log(R_t/S_t)$.

    # time series of ln(R/S)
    lnrs <- log(SRdata$Rec/SRdata$Sp)
    dat <- matrix(lnrs, nrow=1)
    # number of years of data--we'll need this later
    TT <- length(lnrs)

Now we can set up the DLM as a level-only model (i.e., a random walk with observation error) and fit it with MARSS.

    library(MARSS)
    # MARSS model defn
    # for process eqn
    BB <- matrix(1)
    UU <- matrix(0)
    QQ <- matrix("q")
    # for observation eqn
    ZZ <- matrix(1)
    AA <- matrix(0)
    RR <- matrix("r")
    # only need starting values for regr parameters
    inits_list <- list(x0=matrix(1))
    # list of model matrices & vectors
    mod_list <- list(B=BB, U=UU, Q=QQ, Z=ZZ, A=AA, R=RR, tinitx=0)
    # fit DLM
    Q1 <- MARSS(dat, inits=inits_list, model=mod_list)

    ## Success! abstol and log-log tests passed at 22 iterations.
    ## Alert: conv.test.slope.tol is 0.5.
    ## Test with smaller values (<0.1) to ensure convergence.
    ##
    ## MARSS fit is
    ## Estimation method: kem
    ## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
    ## Estimation converged in 22 iterations.
    ## Log-likelihood: -48.67502
    ## AIC: 103.35   AICc: 104.0559
    ##
    ##       Estimate
    ## R.r      0.271
    ## Q.q      0.318
    ## x0.x0    0.953
    ##
    ## Standard errors have not been calculated.
    ## Use MARSSparamCIs to compute CIs and bias estimates.

    # plot the time-varying level
    plot.ts(t(Q1$states), ylab=expression(alpha_t))

[Figure: time series of the estimated level for Q1; y-axis: alpha_t, x-axis: Time]

    # get AICc
    Q1$AICc

    ## [1] 104.0559

And finally examine some diagnostic plots:

    # get list of Kalman filter output
    kf_out <- MARSS::MARSSkfss(Q1)
    # forecast errors
    innov <- kf_out$Innov
    # Q-Q plot of forecast errors
    qqnorm(t(innov), main="", pch=16, col="blue")
    # add y=x line for easier interpretation
    qqline(t(innov))

[Figure: normal Q-Q plot of the Q1 forecast errors; y-axis: Sample Quantiles, x-axis: Theoretical Quantiles]

    # plot ACF of innovations
    acf(t(innov), lag.max=10, main="ACF for Q1 residuals")

[Figure: "ACF for Q1 residuals"; y-axis: ACF, x-axis: Lag]

The residuals seem to be reasonably well behaved in that they appear normal with no significant autocorrelation.
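As a complement to the visual checks above, the sketch below shows in base R how one-step-ahead forecast errors (innovations) arise from a Kalman filter for the level-only model, and how a Ljung-Box test could summarize their autocorrelation. This is an illustration on simulated data, not the MARSS implementation: the function `local_level_innov()` is a hypothetical helper, and the values of `q`, `r`, and the series length are assumptions chosen for the example.

```r
# Hypothetical helper: Kalman filter for y_t = x_t + v_t, x_t = x_{t-1} + w_t.
# Returns the innovations and their variances; x0 and V0 describe the state at t = 0.
local_level_innov <- function(y, q, r, x0 = 0, V0 = 0) {
  n <- length(y)
  innov <- numeric(n)
  sigma <- numeric(n)
  x_filt <- x0
  V_filt <- V0
  for (i in seq_len(n)) {
    x_pred <- x_filt             # one-step-ahead prediction of the state
    V_pred <- V_filt + q         # variance of that prediction
    innov[i] <- y[i] - x_pred    # forecast error (innovation)
    sigma[i] <- V_pred + r       # variance of the forecast error
    K <- V_pred / sigma[i]       # Kalman gain
    x_filt <- x_pred + K * innov[i]
    V_filt <- (1 - K) * V_pred
  }
  list(innov = innov, sigma = sigma)
}

# simulated example data (assumed values, not the Kvichak estimates)
set.seed(42)
n <- 38
x_sim <- cumsum(rnorm(n, 0, sqrt(0.3)))
y_sim <- x_sim + rnorm(n, 0, sqrt(0.3))

kf <- local_level_innov(y_sim, q = 0.3, r = 0.3)
# standardized innovations should look like white noise if the model is adequate
z <- kf$innov / sqrt(kf$sigma)
lb <- Box.test(z, lag = 10, type = "Ljung-Box")
print(lb$p.value)
```

A small p-value from `Box.test()` would point to leftover autocorrelation, which is the same question the ACF plot answers by eye.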

Question 2

Fit the full model specified by Equation 15. For this model, obtain the time series of $\alpha_t$, which is an estimate of the stock productivity in the absence of density-dependent effects. How do these estimates of productivity compare to those from the previous question? Plot the time series of $\alpha_t$ and note the AICc for this model. Also plot appropriate model diagnostics. (Hint: If you don't want a parameter to vary with time, what does that say about its process variance?)

Answer

Now we need to fit a DLM with a time-varying level (intercept), but time-invariant slope. Begin by obtaining the time series of spawners to use as the covariate.

    Sp <- matrix(SRdata$Sp, nrow=1)

Set up the MARSS model structure so $\alpha$ varies with time, but not $\beta$, which means $q = 0$ for $\beta$. This means that $\mathbf{Q}$ should be

$$\mathbf{Q} = \begin{bmatrix} q_\alpha & 0 \\ 0 & 0 \end{bmatrix} \tag{4}$$

    # number of regr coefs
    m <- 2
    # MARSS model defn
    # for process eqn
    B <- diag(m)                    # 2x2; Identity
    U <- matrix(0,nrow=m,ncol=1)    # 2x1; both elements = 0
    Q <- matrix(list(0),m,m)        # 2x2; all 0 for now
    Q[1,1] <- "q_alpha"             # 2x2; diag = (q_alpha, 0)
    # for observation eqn
    Z <- array(NA, c(1,m,TT))       # NxMxT; empty for now
    Z[1,1,] <- rep(1,TT)            # Nx1; 1's for intercept
    Z[1,2,] <- Sp                   # Nx1; regr variable
    A <- matrix(0)                  # 1x1; scalar = 0
    R <- matrix("r")                # 1x1; scalar = r
    # only need starting values for regr parameters
    inits_list <- list(x0=matrix(c(0, 0), nrow=m))
    # list of model matrices & vectors
    mod_list <- list(B=B, U=U, Q=Q, Z=Z, A=A, R=R)
    # list of control params
    con_list <- list(maxit=2000)
    # fit DLM
    Q2 <- MARSS(dat, inits=inits_list, model=mod_list, control=con_list)

    ## Success! abstol and log-log tests passed at 1190 iterations.
    ## Alert: conv.test.slope.tol is 0.5.
    ## Test with smaller values (<0.1) to ensure convergence.
    ##
    ## Alert: Numerical warnings were generated. Print the $errors element of output to see the warnings.
    ##
    ## MARSS fit is
    ## Estimation method: kem
    ## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
    ## Estimation converged in 1190 iterations.
    ## Log-likelihood: -34.03694
    ## AIC: 76.07388   AICc: 77.286
    ##
    ##            Estimate
    ## R.r        0.722397
    ## Q.q_alpha  0.000000
    ## x0.x1      0.721472
    ## x0.x2     -0.000018
    ##
    ## Standard errors have not been calculated.
    ## Use MARSSparamCIs to compute CIs and bias estimates.
    ##
    ## MARSSkem warnings. Type MARSSinfo() for help.
    ## MARSSkem: The soln became unstable and loglik DROPPED.
    ## Use control$trace=1 to generate a more detailed error report.

    # plot the time-varying level
    plot.ts(Q2$states[1,], ylab=expression(alpha_t))

[Figure: time series of the estimated level for Q2; y-axis: alpha_t, x-axis: Time]

There does not appear to be any model support for a time-varying $\alpha$: the estimated process variance `Q.q_alpha` is 0. Now let's check the AICc value.

    # get AICc
    Q2$AICc

    ## [1] 77.286

The AICc value for this model is much lower than that in Q1. Let's check out some diagnostic plots for the model in Q2. First we get the model innovations from our fitted MARSS object.

    # get list of Kalman filter output
    kf_out <- MARSS::MARSSkfss(Q2)
    # forecast errors
    innov <- kf_out$Innov
    # Q-Q plot of forecast errors
    qqnorm(t(innov), main="", pch=16, col="blue")
    # add y=x line for easier interpretation
    qqline(t(innov))

[Figure: normal Q-Q plot of the Q2 forecast errors; y-axis: Sample Quantiles, x-axis: Theoretical Quantiles]

    # plot ACF of innovations
    acf(t(innov), lag.max=10, main="ACF for Q2 residuals")

[Figure: "ACF for Q2 residuals"; y-axis: ACF, x-axis: Lag]

The diagnostics indicate that there is some significant autocorrelation in the residuals at lags 1, 5 and 6. The autocorrelation at lag 1 is likely due to environmental factors, whereas the autocorrelation at lags 5 and 6 is perhaps a reflection of the dominant age classes of these fish.

Question 3

Fit the model specified by Equation 16 with the summer PDO index as the covariate (PDO.t2). What is the mean level of productivity? Plot the time series of $\delta_t$ and note the AICc for this model. Also plot appropriate model diagnostics.

Answer

Now we need to fit a DLM so $\alpha$ and $\beta$ are time-invariant, but $\delta$ varies by year. This means that $\mathbf{Q}$ should be

$$\mathbf{Q} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & q_\delta \end{bmatrix} \tag{5}$$

    # number of regr coefs
    m <- 3
    # MARSS model defn
    # for process eqn
    B <- diag(m)                    # 3x3; Identity
    U <- matrix(0,nrow=m,ncol=1)
    Q <- matrix(list(0),m,m)
    # place delta last--it's the only one to time-vary
    Q[3,3] <- "q_delta"
    # for observation eqn
    Z <- array(NA, c(1,m,TT))       # NxMxT; empty for now
    Z[1,1,] <- rep(1,TT)            # 1's for intercept
    Z[1,2,] <- SRdata[,2]           # Sp regr variable
    Z[1,3,] <- SRdata[,4]           # summer PDO regr variable
    A <- matrix(0)                  # 1x1; scalar = 0
    R <- matrix("r")                # 1x1; scalar = r
    # only need starting values for regr parameters
    inits_list <- list(x0=matrix(c(0,0,0), nrow=m))
    # list of model matrices & vectors
    mod_list <- list(B=B, U=U, Q=Q, Z=Z, A=A, R=R)
    # list of control params
    con_list <- list(maxit=2000, allow.degen=TRUE)
    # fit DLM
    Q3 <- MARSS(dat, inits=inits_list, model=mod_list, control=con_list)

    ## Success! abstol and log-log tests passed at 807 iterations.
    ## Alert: conv.test.slope.tol is 0.5.
    ## Test with smaller values (<0.1) to ensure convergence.
    ##
    ## Alert: Numerical warnings were generated. Print the $errors element of output to see the warnings.
    ##
    ## MARSS fit is
    ## Estimation method: kem
    ## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
    ## Estimation converged in 807 iterations.
    ## Log-likelihood: -34.30589
    ## AIC: 78.61177   AICc: 80.48677
    ##
    ##            Estimate
    ## R.r        7.20e-01
    ## Q.q_delta  0.00e+00
    ## x0.x1      7.20e-01
    ## x0.x2     -1.72e-05
    ## x0.x3      4.52e-02
    ##
    ## Standard errors have not been calculated.
    ## Use MARSSparamCIs to compute CIs and bias estimates.
    ##
    ## MARSSkem warnings. Type MARSSinfo() for help.
    ## MARSSkem: The soln became unstable and loglik DROPPED.
    ## Use control$trace=1 to generate a more detailed error report.

    # mean productivity
    mean(Q3$states[1,])

    ## [1] 0.7197999

    # plot the time-varying effect of PDO
    plot.ts(Q3$states[3,], ylab=expression(delta_t))

[Figure: time series of the estimated summer PDO effect for Q3; y-axis: delta_t, x-axis: Time]

Here, as in Q2, there does not appear to be any model support for a time-varying $\delta$.

    # get AICc
    Q3$AICc

    ## [1] 80.48677

Again, this model is better than that in Q1, but not better than that for Q2. Here are some diagnostic plots:

    # get list of Kalman filter output
    kf_out <- MARSS::MARSSkfss(Q3)
    # forecast errors
    innov <- kf_out$Innov
    # Q-Q plot of forecast errors
    qqnorm(t(innov), main="", pch=16, col="blue")
    # add y=x line for easier interpretation
    qqline(t(innov))

[Figure: normal Q-Q plot of the Q3 forecast errors; y-axis: Sample Quantiles, x-axis: Theoretical Quantiles]

    # plot ACF of innovations
    acf(t(innov), lag.max=10, main="ACF for Q3 residuals")

[Figure: "ACF for Q3 residuals"; y-axis: ACF, x-axis: Lag]

There is some indication that our model is not adequately accounting for autocorrelation in the residuals (i.e., significant correlation at lag 1).

Question 4

Fit the model specified by Equation 16 with the winter PDO index as the covariate (PDO.t3). What is the mean level of productivity? Plot the time series of $\delta_t$ and note the AICc for this model. Also plot appropriate model diagnostics.

Answer

Again we need to fit a DLM so that $\alpha$ and $\beta$ are time-invariant, but $\delta$ varies by year. As for Q3, $\mathbf{Q}$ should be

$$\mathbf{Q} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & q_\delta \end{bmatrix} \tag{6}$$

    # number of regr coefs
    m <- 3
    # MARSS model defn
    # for process eqn
    B <- diag(m)                    # 3x3; Identity
    U <- matrix(0,nrow=m,ncol=1)
    Q <- matrix(list(0),m,m)
    # place delta last--it's the only one to time-vary
    Q[3,3] <- "q_delta"
    # for observation eqn
    Z <- array(NA, c(1,m,TT))       # NxMxT; empty for now
    Z[1,1,] <- rep(1,TT)            # 1's for intercept
    Z[1,2,] <- SRdata[,2]           # Sp regr variable
    Z[1,3,] <- SRdata[,5]           # winter PDO regr variable
    A <- matrix(0)                  # 1x1; scalar = 0
    R <- matrix("r")                # 1x1; scalar = r
    # only need starting values for regr parameters
    inits_list <- list(x0=matrix(c(0,0,0), nrow=m))
    # list of model matrices & vectors
    mod_list <- list(B=B, U=U, Q=Q, Z=Z, A=A, R=R)
    # list of control params
    con_list <- list(maxit=2000, allow.degen=TRUE)
    # fit DLM
    Q4 <- MARSS(dat, inits=inits_list, model=mod_list, control=con_list)

    ## Success! abstol and log-log tests passed at 777 iterations.
    ## Alert: conv.test.slope.tol is 0.5.
    ## Test with smaller values (<0.1) to ensure convergence.
    ##
    ## Alert: Numerical warnings were generated. Print the $errors element of output to see the warnings.
    ##
    ## MARSS fit is
    ## Estimation method: kem
    ## Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
    ## Estimation converged in 777 iterations.
    ## Log-likelihood: -34.09679
    ## AIC: 78.19358   AICc: 80.06858
    ##
    ##            Estimate
    ## R.r        7.18e-01
    ## Q.q_delta  0.00e+00
    ## x0.x1      7.50e-01
    ## x0.x2     -2.03e-05
    ## x0.x3      6.45e-02
    ##
    ## Standard errors have not been calculated.
    ## Use MARSSparamCIs to compute CIs and bias estimates.
    ##
    ## MARSSkem warnings. Type MARSSinfo() for help.
    ## MARSSkem: The soln became unstable and loglik DROPPED.
    ## Use control$trace=1 to generate a more detailed error report.

    # mean productivity
    mean(Q4$states[1,])

    ## [1] 0.74952

    # plot the time-varying effect of PDO
    plot.ts(Q4$states[3,], ylab=expression(delta_t))

[Figure: time series of the estimated winter PDO effect for Q4; y-axis: delta_t, x-axis: Time]

Again, it appears as though there is no data support for a time-varying effect of PDO.

    # get AICc
    Q4$AICc

    ## [1] 80.06858

This model is not any better than that in Q2. Here are some diagnostic plots:

    # get list of Kalman filter output
    kf_out <- MARSS::MARSSkfss(Q4)
    # forecast errors
    innov <- kf_out$Innov
    # Q-Q plot of forecast errors
    qqnorm(t(innov), main="", pch=16, col="blue")
    # add y=x line for easier interpretation
    qqline(t(innov))

[Figure: normal Q-Q plot of the Q4 forecast errors; y-axis: Sample Quantiles, x-axis: Theoretical Quantiles]

    # plot ACF of innovations
    acf(t(innov), lag.max=10, main="ACF for Q4 residuals")

[Figure: "ACF for Q4 residuals"; y-axis: ACF, x-axis: Lag]

As in Q3, there is some indication that our model is not adequately accounting for autocorrelation in the residuals (i.e., significant correlation at lags 1 and 4-5).
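The AIC and AICc values reported in the MARSS output can be checked by hand from the log-likelihood and the number of estimated parameters. The sketch below uses the standard small-sample correction; the helper `aicc_by_hand()` is a hypothetical name, and the sample size `n_obs = 38` is an assumption inferred from the reported AIC/AICc pairs rather than a value stated in the text.

```r
# Hypothetical helper: AICc from a log-likelihood, parameter count k, and sample size n
aicc_by_hand <- function(loglik, k, n) {
  aic <- -2 * loglik + 2 * k
  aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
}

n_obs <- 38   # ASSUMPTION: inferred from the reported AIC and AICc values

# log-likelihoods and parameter counts as reported in the fits above
# (Q1: R, Q, x0; Q2: R, Q, 2 x0's; Q3 and Q4: R, Q, 3 x0's)
fits <- data.frame(
  model  = c("Q1", "Q2", "Q3", "Q4"),
  loglik = c(-48.67502, -34.03694, -34.30589, -34.09679),
  k      = c(3, 4, 5, 5)
)
fits$AICc <- aicc_by_hand(fits$loglik, fits$k, n_obs)
print(fits)
```

Under that assumed sample size, the hand-computed values agree with the reported AICc for all four models to about four decimal places, which also makes clear that the extra parameter in Q3 and Q4 is what costs them relative to Q2.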

Question 5

Based on AICc, which of the models above is the most parsimonious? Is it well behaved (i.e., are the model assumptions met)? Plot the model forecasts for the best model. Is this a good forecast model?

Answer

Here is a table of AICc values for all 4 models.

    tbl_aicc <- data.frame(model=paste0("Q",seq(4)),
                           AICc=round(c(Q1$AICc,Q2$AICc,Q3$AICc,Q4$AICc),1))
    tbl_aicc

    ##   model  AICc
    ## 1    Q1 104.1
    ## 2    Q2  77.3
    ## 3    Q3  80.5
    ## 4    Q4  80.1

The model we fit in Q2 appears to have the lowest AICc, so let's use that for forecasting. Here's how to obtain the time series of forecasts (and their SE) for the best model.

    # get list of Kalman filter output
    # NOTE: as written, this pulls the filter output from the Q1 (level-only) fit,
    # so there is a single regr parameter and Z is a 1x1xT array of 1's
    kf_out <- MARSS::MARSSkfss(Q1)
    # forecasts of regr parameters; 1xT matrix
    eta <- kf_out$xtt1
    # predictor variable (1's only for the intercept)
    Z <- array(1, c(1,1,TT))        # NxMxT; all 1's
    # ts of E(forecasts)
    fore_mean <- vector()
    for(t in 1:TT) {
      fore_mean[t] <- Z[,,t] %*% eta[,t,drop=FALSE]
    }
    # variance of regr parameters; 1x1xT array
    Phi <- kf_out$Vtt1
    # obs variance; 1x1 matrix
    R_est <- coef(Q1, type="matrix")$R
    # ts of Var(forecasts)
    fore_var <- vector()
    for(t in 1:TT) {
      tZ <- matrix(Z[,,t],1,1)      # transpose of Z
      fore_var[t] <- Z[,,t] %*% Phi[,,t] %*% tZ + R_est
    }

And now we can plot them.

    fup <- fore_mean+2*sqrt(fore_var)
    flo <- fore_mean-2*sqrt(fore_var)
    par(mar=c(4,4,0.1,0), oma=c(0,0,2,0.5))
    ylims <- c(min(flo),max(fup))
    plot(SRdata$brood.yr, t(dat), type="p", pch=16, ylim=ylims,
         col="blue", xlab="Year", ylab="ln(R/S)")
    lines(SRdata$brood.yr, fore_mean, lwd=3)
    lines(SRdata$brood.yr, fup)
    lines(SRdata$brood.yr, flo)

[Figure: observed ln(R/S) (points) with forecast mean and upper and lower forecast limits (lines); y-axis: ln(R/S), x-axis: Year]

Overall, the accuracy of the forecasts is a bit suspect, as many observations lie 0.5 or more log-units from the forecast. Although most of the observed ln(R/S) values fell within the approximate 95% forecast intervals (the forecast mean plus or minus 2 SE), the intervals themselves are relatively wide and span a large range of R/S.
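For reference, the forecast mean and variance computed in the loops above correspond to the usual one-step-ahead forecast equations for a DLM. The notation below is a sketch that maps the code objects onto symbols: $\boldsymbol{\eta}_{t|t-1}$ stands for `xtt1`, $\boldsymbol{\Phi}_{t|t-1}$ for `Vtt1`, and $r$ for `R_est`; these symbol choices are assumptions made here for illustration.

```latex
\begin{align}
  \hat{y}_{t|t-1} &= \mathbf{Z}_t \, \boldsymbol{\eta}_{t|t-1} \\
  \operatorname{Var}\!\left(y_t \mid y_{1:t-1}\right)
      &= \mathbf{Z}_t \, \boldsymbol{\Phi}_{t|t-1} \, \mathbf{Z}_t^{\top} + r \\
  \text{approximate 95\% interval:}\quad
      & \hat{y}_{t|t-1} \pm 2 \sqrt{\operatorname{Var}\!\left(y_t \mid y_{1:t-1}\right)}
\end{align}
```

When $\mathbf{Z}_t = 1$ for all $t$, as in the level-only case, the forecast variance reduces to $\Phi_{t|t-1} + r$, so the interval width reflects both the uncertainty in the level and the observation variance.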