Estimation of Markov Switching model using a statistical computing software R



Atsushi Matsumoto

Abstract

The objective of this note is to provide readers with a procedure for estimating a Markov switching model with time-varying transition probabilities using the statistical computing software R. Since Hamilton (1989), there has been a large body of research on Markov switching models, and programs for estimating such models are available on the Internet. Almost all of these programs, however, are written in GAUSS or Matlab, both of which are expensive software packages. Using R, we can estimate Markov switching models free of charge. Furthermore, because the program can be rewritten easily, readers can extend it to estimate more advanced models.

1 The Basic Model

The basic model which this note estimates is the Markov switching model (introduced by Hamilton (1989)) with time-varying transition probabilities (MS-TVTP). Since the focus of this note is business cycle transitions, we henceforth consider fitting business cycle data to the MS-TVTP model. Let $y_t$ be the log-differenced series of the coincident composite index of business conditions in Japan, multiplied by 100 ($y_t = 100\,\Delta \ln cci_t$), and let $x_t$ be the level of the overnight (O/N) call rate in Japan. The basic model to be estimated here is the MS-TVTP-AR(1) model, in which $y_t$ follows an AR(1) process with a switching intercept:

$$y_t = m_{s_t} + \phi y_{t-1} + e_t, \qquad e_t \sim \text{i.i.d. } N(0, \sigma^2), \tag{1}$$

where $s_t$ is a latent variable taking the value 0 or 1; $s_t = 0$ and $s_t = 1$ indicate recession and expansion, respectively. Thus $m_{s_t}$ indicates that the intercept is determined by the value of $s_t$. Following Filardo (1994), $s_t$ is assumed to follow a two-state discrete Markov chain whose transition probabilities depend on an exogenous variable $x_{t-1}$:

$$\begin{aligned}
\Pr(s_t = 1 \mid s_{t-1} = 1 : x_{t-1}) &= \frac{\exp(\beta_0^{(1)} + \beta_1^{(1)} x_{t-1})}{1 + \exp(\beta_0^{(1)} + \beta_1^{(1)} x_{t-1})}, \\
\Pr(s_t = 0 \mid s_{t-1} = 0 : x_{t-1}) &= \frac{\exp(\beta_0^{(0)} + \beta_1^{(0)} x_{t-1})}{1 + \exp(\beta_0^{(0)} + \beta_1^{(0)} x_{t-1})}.
\end{aligned} \tag{2}$$
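As a minimal sketch of the logistic transition probabilities above, the two probabilities can be computed for a vector of lagged call rates. The function name `tvtp_probs`, its argument names, and the coefficient values are illustrative assumptions, not estimates from this note; `plogis` is R's built-in logistic function.

```r
# Sketch of the logistic transition probabilities (illustrative values only).
tvtp_probs <- function(x_lag, b0_exp, b1_exp, b0_rec, b1_rec) {
  # plogis(z) = exp(z) / (1 + exp(z))
  p11 <- plogis(b0_exp + b1_exp * x_lag)  # Pr(s_t = 1 | s_{t-1} = 1 : x_{t-1})
  p00 <- plogis(b0_rec + b1_rec * x_lag)  # Pr(s_t = 0 | s_{t-1} = 0 : x_{t-1})
  # p01 = Pr(s_t = 1 | s_{t-1} = 0), p10 = Pr(s_t = 0 | s_{t-1} = 1)
  cbind(p00 = p00, p01 = 1 - p00, p10 = 1 - p11, p11 = p11)
}

x_lag <- c(0.1, 0.5, 2.0)   # hypothetical lagged call rates
probs <- tvtp_probs(x_lag, b0_exp = 2, b1_exp = -0.5, b0_rec = 1, b1_rec = 0.3)
print(probs)
```

Each row gives the four transition probabilities for one date; the probabilities of leaving a state are simply one minus the probabilities of staying.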
We define the parameter set $\theta$ and the information set available at time $t$, $\Omega_t$, by

$$\theta := \{\beta_0^{(0)}, \beta_0^{(1)}, \beta_1^{(0)}, \beta_1^{(1)}, \phi, \sigma^2, m_0, m_1\}, \qquad \Omega_t := \{y_t, y_{t-1}, \ldots, x_t, x_{t-1}, \ldots\}.$$

Note that no $s_t$ is included in the information set, since we cannot observe the value of $s_t$. Our aims are to (1) estimate the parameter set $\theta$ and (2) calculate the filtered probability of recession $\Pr(s_t = 0 \mid \Omega_t)$ 1) based on the estimated parameters.

All estimation in this note was conducted using R version [...]. The likelihood function is maximized with the function nlm; it has been confirmed that the function optim gives the same result.

I referred to the web site of Kim and Nelson (1999) in writing this program. All remaining errors are mine.

Osaka School of International Public Policy, Osaka University. atsushi-mail@hcc6.bai.ne.jp

1) The filtered probability of expansion, $\Pr(s_t = 1 \mid \Omega_t)$, is easily obtained from $\Pr(s_t = 1 \mid \Omega_t) = 1 - \Pr(s_t = 0 \mid \Omega_t)$.
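To test an estimation program it helps to have data generated with known parameters. The following sketch simulates from the AR(1) model with switching intercept and logistic transition probabilities described in Section 1. The function name `sim_mstvtp`, the parameter values, the starting values, and the use of a random exogenous series in place of the call rate are all assumptions made for illustration.

```r
# Simulate y_t = m_{s_t} + phi * y_{t-1} + e_t with state-dependent intercept.
sim_mstvtp <- function(n, x, phi, sigma, m0, m1, q0, q1, p0, p1) {
  s <- integer(n)
  y <- numeric(n)
  s[1] <- 1L               # start in expansion (arbitrary choice)
  y[1] <- m1 / (1 - phi)   # arbitrary starting level
  for (t in 2:n) {
    # probability of staying in the previous state, driven by x_{t-1}
    stay <- if (s[t - 1] == 1L) plogis(p0 + p1 * x[t - 1]) else plogis(q0 + q1 * x[t - 1])
    s[t] <- if (runif(1) < stay) s[t - 1] else 1L - s[t - 1]
    m <- if (s[t] == 1L) m1 else m0
    y[t] <- m + phi * y[t - 1] + rnorm(1, sd = sigma)
  }
  cbind(y = y, x = x, s = s)
}

set.seed(1)
n <- 200
x <- runif(n)              # stand-in for the exogenous variable
msim <- sim_mstvtp(n, x, phi = 0.3, sigma = 0.5, m0 = -1, m1 = 1,
                   q0 = 1, q1 = 0, p0 = 2, p1 = 0)
head(msim)
```

The simulated state column `s` would not be available in real data; it is kept here only so that estimated filtered probabilities can be compared with the true states.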

2 The Construction of Program

This section explains the construction of the program to estimate the MS-TVTP model using R. The program consists of the following three steps:

(1) construct the likelihood function recursively;
(2) maximize the likelihood function to obtain the maximum likelihood estimator;
(3) using the maximum likelihood estimator obtained, calculate the filtered probability.

In addition to these steps, the program needs parts for loading the data, arranging the results and so on. All of these parts are explained below. This note uses Hungarian notation, in which the name of a variable indicates its type or intended use: names beginning with v represent vectors, names beginning with m represent matrices, and other names are basically scalars. The estimation program is named MSTVTPest and has the form:

    MSTVTPest <- function(vinitial, mdata){
      likfcn <- function(vparam){
        # (1) calculate the likelihood
      }
      est <- nlm(likfcn, vinitial, ...)   # (2) get the maximum likelihood estimate
      filter <- function(vparam){
        # (3) calculate the filtered probability
      }
      return(mresult)
    }

As the above indicates, the function MSTVTPest takes the arguments vinitial and mdata and returns mresult. Here vinitial is the vector of initial values (8-dimensional in this note) needed for the maximization step, mdata is the 73 × 3 matrix of the data series, and mresult is the matrix containing the estimation results. MSTVTPest contains the function likfcn (which constructs the likelihood), a call to the function nlm (which maximizes the likelihood) and the function filter (which calculates the filtered probability).

2.1 Constructing likelihood: likfcn

Let us begin by constructing the likelihood function. In this note the likelihood function is named likfcn, and its argument is the parameter vector vparam. Note the difference between vinitial and vparam: the former is the vector of initial values, while the latter is the parameter vector to be estimated.
First write:

    likfcn <- function(vparam){
      yy <- mdata[,1]
      xx <- mdata[,2]
      p0 <- vparam[1]
      q0 <- vparam[2]
      phi <- vparam[3]
      var <- vparam[4]^2
      mu0 <- vparam[5]
      mu1 <- vparam[6]
      p1 <- vparam[7]
      q1 <- vparam[8]

Since mdata has already been passed in as an argument of MSTVTPest, we do not need to load the data again when constructing the likelihood function. Here we name each data series and each parameter. Because the 1st column of mdata is $y_t$ and the 2nd column is $x_t$, we name them yy = $y$ and xx = $x$. The parameters are named similarly:

    β_0^(1) = p0,   β_1^(1) = p1,   φ = phi,     m_0 = mu0,
    β_0^(0) = q0,   β_1^(0) = q1,   σ^2 = var,   m_1 = mu1.

Note that the 4th element of vparam is $\sigma$, which is squared to give var.
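The reason likfcn can use mdata without receiving it as an argument is R's lexical scoping: a function defined inside another function can see the enclosing function's arguments. A toy sketch, with the hypothetical names `outer_fn` and `inner_fn` and a made-up data matrix:

```r
outer_fn <- function(mdata) {
  inner_fn <- function(vparam) {
    # mdata is not an argument of inner_fn; R finds it in the
    # environment of the enclosing function outer_fn
    sum((mdata[, 1] - vparam[1])^2)
  }
  inner_fn(c(2))
}

res <- outer_fn(cbind(c(1, 2, 3), c(0, 0, 0)))
print(res)
```

The same mechanism lets nlm call likfcn with the parameter vector alone while the data stay fixed in the background.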

Hereafter $f(\cdot)$ denotes a conditional PDF. The maximum likelihood estimator $\hat{\theta}_{ML}$ is defined by

$$\hat{\theta}_{ML} = \arg\max_{\theta} \ln L(\theta), \qquad \ln L(\theta) = \sum_{t=2}^{T} \ln f(y_t \mid \Omega_{t-1}),$$

where $\ln L(\theta)$ is the log likelihood. The conditional PDF of $y_t$ given $\Omega_{t-1}$ depends, however, not only on the current state $s_t$ but also on the past state $s_{t-1}$. Hence, summing the joint PDF of $y_t$, $s_t$ and $s_{t-1}$ over $s_t$ and $s_{t-1}$, we consider the marginal PDF of $y_t$:

$$f(y_t \mid \Omega_{t-1}) = \sum_{j=0,1} \sum_{i=0,1} f(y_t, s_t = j, s_{t-1} = i \mid \Omega_{t-1}).$$

Since the values of $s_t$ and $s_{t-1}$ cannot be observed at any time, we decompose the above function, given $\Pr(s_{t-1} = i \mid \Omega_{t-1})$. By this decomposition we obtain the conditional PDF of $y_t$ given $s_t$ and $s_{t-1}$, and can treat the unobserved $s_t$ and $s_{t-1}$ as if they were observed:

$$\begin{aligned}
f(y_t \mid \Omega_{t-1}) &= \sum_{j=0,1} \sum_{i=0,1} f(y_t, s_t = j, s_{t-1} = i \mid \Omega_{t-1}) \\
&= \sum_{j=0,1} \sum_{i=0,1} f(y_t \mid s_t = j, s_{t-1} = i, \Omega_{t-1}) \Pr(s_t = j, s_{t-1} = i \mid \Omega_{t-1}) \\
&= \sum_{j=0,1} \sum_{i=0,1} f(y_t \mid s_t = j, s_{t-1} = i, \Omega_{t-1}) \Pr(s_t = j \mid s_{t-1} = i : x_{t-1}) \Pr(s_{t-1} = i \mid \Omega_{t-1}) \\
&= \sum_{j=0,1} \sum_{i=0,1} f(y_t \mid s_t = j, \Omega_{t-1}) \Pr(s_t = j \mid s_{t-1} = i : x_{t-1}) \Pr(s_{t-1} = i \mid \Omega_{t-1}),
\end{aligned}$$

where

$$f(y_t \mid s_t = j, s_{t-1} = i, \Omega_{t-1}) = f(y_t \mid s_t = j, \Omega_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_t - m_j - \phi y_{t-1})^2}{2\sigma^2} \right).$$

Now we express this decomposition in vector form, because it is easier to write the program in vector form than in scalar notation. Define the vectors $f_t$, $\xi_t$ and $\zeta_t$ as

$$f_t = \begin{pmatrix} f(y_t \mid s_t = 0, \Omega_{t-1}) \\ f(y_t \mid s_t = 1, \Omega_{t-1}) \end{pmatrix}, \quad
\xi_t = \begin{pmatrix} \Pr(s_t = 0 \mid s_{t-1} = 0 : x_{t-1}) \\ \Pr(s_t = 1 \mid s_{t-1} = 0 : x_{t-1}) \\ \Pr(s_t = 0 \mid s_{t-1} = 1 : x_{t-1}) \\ \Pr(s_t = 1 \mid s_{t-1} = 1 : x_{t-1}) \end{pmatrix}, \quad
\zeta_t = \begin{pmatrix} \Pr(s_{t-1} = 0 \mid \Omega_{t-1}) \\ \Pr(s_{t-1} = 0 \mid \Omega_{t-1}) \\ \Pr(s_{t-1} = 1 \mid \Omega_{t-1}) \\ \Pr(s_{t-1} = 1 \mid \Omega_{t-1}) \end{pmatrix}.$$

Then $f(y_t \mid \Omega_{t-1})$ can be obtained as follows, where $1_2$ is a 2-dimensional vector of ones, $\odot$ is the Hadamard product (element-by-element multiplication), and $[a : b]$ denotes the $a$-th through $b$-th rows.
$$f(y_t \mid \Omega_{t-1}) = 1_2' \Big( f_t \odot \big( \xi_t[1:2] \odot \zeta_t[1:2] + \xi_t[3:4] \odot \zeta_t[3:4] \big) \Big). \tag{3}$$

Next we carry out this decomposition and construct the likelihood in the program. First we define the vector of $s_t$ and the vector of intercepts $m_{s_t}$, denoted by vst $= (0, 1)'$ and vmu $= (m_0, m_1)'$, respectively.

    vst <- rbind(0,1)
    vmu <- vst*mu1+(matrix(1,nrow=2,ncol=1)-vst)*mu0

Second, we formulate the transition probabilities at the initial point. Since the transition probabilities are assumed to take the logit form, letting qpr $= \Pr(s_2 = 0 \mid s_1 = 0 : x_1)$ and ppr $= \Pr(s_2 = 1 \mid s_1 = 1 : x_1)$, they can be written and collected into a matrix as:

    qpr <- exp(q0+q1*xx[1])/(1+exp(q0+q1*xx[1]))
    ppr <- exp(p0+p1*xx[1])/(1+exp(p0+p1*xx[1]))
    mtrpr <- rbind(cbind(qpr,1-ppr), cbind(1-qpr,ppr))
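As a numerical check on Eq. (3), the vector form can be compared with the double sum over $i$ and $j$ for a single period. All numbers below are made up for illustration, and `dnorm` is used for the normal density instead of writing the formula out by hand.

```r
# One-period check of Eq. (3) with made-up numbers.
yt <- 0.4; ylag <- 0.1; phi <- 0.3; sigma <- 0.5
vmu  <- c(-1, 1)        # (m_0, m_1)
q    <- 0.7             # Pr(s_t = 0 | s_{t-1} = 0) at this date
p    <- 0.9             # Pr(s_t = 1 | s_{t-1} = 1) at this date
vlag <- c(0.25, 0.75)   # Pr(s_{t-1} = 0 | .), Pr(s_{t-1} = 1 | .)

f_t  <- dnorm(yt, mean = vmu + phi * ylag, sd = sigma)  # f(y_t | s_t = j, .)
xi   <- c(q, 1 - q, 1 - p, p)   # Pr(0|0), Pr(1|0), Pr(0|1), Pr(1|1)
zeta <- c(vlag[1], vlag[1], vlag[2], vlag[2])

# Vector form, Eq. (3)
f_vec <- sum(f_t * (xi[1:2] * zeta[1:2] + xi[3:4] * zeta[3:4]))

# Scalar form: sum over current state j and previous state i
P <- matrix(xi, nrow = 2)   # P[j + 1, i + 1] = Pr(s_t = j | s_{t-1} = i)
f_sum <- 0
for (j in 1:2) for (i in 1:2) f_sum <- f_sum + f_t[j] * P[j, i] * vlag[i]

print(c(f_vec, f_sum))
```

The two numbers agree, which confirms that stacking the transition probabilities in the order used for $\xi_t$ and duplicating the lagged probabilities in $\zeta_t$ reproduces the double sum.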

$\Pr(s_1 = 0 \mid \Omega_1)$ and $\Pr(s_1 = 1 \mid \Omega_1)$ are needed at the initial point. We use the steady-state probabilities for them. For simplicity of expression, we write $\pi_t = \big( \Pr(s_1 = 0 \mid \Omega_1), \Pr(s_1 = 1 \mid \Omega_1) \big)'$. The steady-state probability is the $\pi$ such that $\pi_t = \pi_{t-1} = \pi$ for all $t$:

$$\pi_t = P \pi_{t-1} \;\Longrightarrow\; (I_2 - P)\pi = 0_2,$$

where $I_2$ is the 2-dimensional identity matrix, $P$ is the transition probability matrix, and $1_2' \pi = 1$ follows from the definition of probability. Therefore the steady-state probability vector $\pi$ can be obtained as

$$\begin{pmatrix} I_2 - P \\ 1_2' \end{pmatrix} \pi = \begin{pmatrix} 0_2 \\ 1 \end{pmatrix} \;\Longrightarrow\; \pi = (A'A)^{-1} A' \begin{pmatrix} 0_2 \\ 1 \end{pmatrix}, \qquad \text{where } A := \begin{pmatrix} I_2 - P \\ 1_2' \end{pmatrix}. \tag{4}$$

From Eq. (4), to obtain the steady-state probability vector we write the program as follows, where va $= A$, ven $= (0, 0, 1)'$ and vprob $= \pi$:

    va <- rbind(cbind(1-qpr,-1+ppr), cbind(-1+qpr,1-ppr), cbind(1,1))
    ven <- rbind(0,0,1)
    vprob <- solve(t(va)%*%va)%*%t(va)%*%ven
    vprob <- rbind(vprob[1], vprob[1], vprob[2], vprob[2])

This completes the preparation for constructing the likelihood at the initial point. Next we construct the likelihood to be maximized recursively. This note uses the function while for the recursive calculation. Letting n be the final time index, we repeat the following calculation from the initial point to n, where vprob corresponds to $\zeta_t$, vtrpr corresponds to $\xi_t$, vprobd is their element-by-element product, vprobdd is the vector of $\Pr(s_t = 0 \mid \Omega_{t-1})$ and $\Pr(s_t = 1 \mid \Omega_{t-1})$ obtained by summing the elements of vprobd over $s_{t-1}$, and ff holds the residuals $y_t - \phi y_{t-1} - m_j$ that enter $f_t$. In addition, cf is $f(y_t, s_t = j \mid \Omega_{t-1})$ for each $j$ and f is $f(y_t \mid \Omega_{t-1})$. Before going on to the next step, note that the log-likelihood value is accumulated not by adding $\ln f(y_t \mid \Omega_{t-1})$ but by subtracting it. This is because the function nlm minimizes a function.
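For a two-state chain the steady-state probabilities also have a closed form, which gives a quick check on the least-squares formula in Eq. (4). The sketch below uses made-up values of `qpr` and `ppr`; the closed form $\pi_0 = (1-p)/(2-p-q)$, $\pi_1 = (1-q)/(2-p-q)$ is the standard result for a two-state chain rather than something taken from this note, and the names `vpi` and `vpi_closed` are hypothetical.

```r
qpr <- 0.7   # Pr(s_t = 0 | s_{t-1} = 0), illustrative value
ppr <- 0.9   # Pr(s_t = 1 | s_{t-1} = 1), illustrative value

# Least-squares solution of A %*% pi = (0, 0, 1)', as in Eq. (4)
va  <- rbind(cbind(1 - qpr, -1 + ppr),
             cbind(-1 + qpr, 1 - ppr),
             cbind(1, 1))
ven <- rbind(0, 0, 1)
vpi <- solve(t(va) %*% va) %*% t(va) %*% ven

# Closed form for two states
vpi_closed <- c(1 - ppr, 1 - qpr) / (2 - ppr - qpr)

print(cbind(vpi, vpi_closed))
```

Both columns should agree up to rounding error and sum to one.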
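      n <- length(yy)
      likv <- 0
      j_iter <- 2
      while(j_iter <= n){
        # transition probabilities into period t depend on x_{t-1}, as in Eq. (2)
        qpr <- exp(q0+q1*xx[j_iter-1])/(1+exp(q0+q1*xx[j_iter-1]))
        ppr <- exp(p0+p1*xx[j_iter-1])/(1+exp(p0+p1*xx[j_iter-1]))
        mtrpr <- rbind(cbind(qpr,1-ppr), cbind(1-qpr,ppr))
        vtrpr <- rbind(mtrpr[1,1], mtrpr[2,1], mtrpr[1,2], mtrpr[2,2])
        # residuals y_t - phi*y_{t-1} - m_j for j = 0, 1
        ff <- (yy[j_iter]-yy[j_iter-1]*phi)*matrix(1,nrow=2,ncol=1)-vmu
        vvar <- var*matrix(1,nrow=2,ncol=1)
        # Pr(s_t = j | Omega_{t-1}): sum over the previous state
        vprobd <- vtrpr*vprob
        vprobdd <- vprobd[1:2]+vprobd[3:4]
        # joint density f(y_t, s_t = j | Omega_{t-1}) and its sum f(y_t | Omega_{t-1})
        cf <- (1/sqrt(2*pi*vvar))*exp(-0.5*ff*ff/vvar)*vprobdd
        f <- sum(cf)
        lik <- log(f)
        # filtered probabilities, stacked for the next iteration
        vpro <- cf/f
        vprob <- rbind(vpro[1], vpro[1], vpro[2], vpro[2])
        likv <- likv-lik
        j_iter <- j_iter+1
      }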

      return(likv)
    }

In the above, vpro and vprob are part of the Hamilton filter; see subsection 2.3 for the details of the filter. The function likfcn returns likv, which is the value of the negative log likelihood obtained by the recursive construction. In the next subsection we consider the optimization of likv to obtain the maximum likelihood estimator. At the beginning of the recursive calculation, likv is set to 0 and j_iter, the time index, is set to 2 because we consider an AR(1) model here. If you use, for instance, an AR(4) model, then j_iter should be set to 5.

2.2 Optimization of the likelihood function: nlm

In this subsection the likelihood function is maximized (precisely, the likelihood function multiplied by −1 is minimized). To do so we use the function nlm, which carries out nonlinear minimization. The arguments of nlm include the function to be minimized (likfcn), the initial values (vinitial), and so on; see the help of nlm for details. In practice, specifying the function to be minimized and the initial values is sufficient.

    est<-nlm(likfcn, vinitial, hessian = TRUE, print.level = 0, gradtol = 1e-6, iterlim = )

Sometimes, however, the inverse of the Hessian matrix cannot be obtained, or the result contains errors. To judge whether or not the optimization has been carried out properly, we use est$code and the function switch. For details, see the help of nlm.

    c<-est$code
    switch(c,
      "1" = print("The MLE has been obtained."),
      "2" = print("The MLE has been obtained."),
      "3" = print("ERROR3!! The MLE cannot be obtained."),
      "4" = print("ERROR4!! The number of iteration is exceeded."),
      "5" = print("ERROR5!! The function is not upper bounded and has no MLE."))

If there is no error and the maximum likelihood estimator has been obtained, you can calculate its standard errors, z values and p values.
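The pattern of minimizing a negative log-likelihood with `nlm`, checking `code`, and inverting the Hessian can be tried on a problem with a known answer. The sketch below fits the mean and standard deviation of a simulated normal sample. The data, the function name `negll`, the starting values and the log-parameterization of the standard deviation are assumptions made for illustration and are not part of the MS-TVTP program.

```r
set.seed(123)
z <- rnorm(500, mean = 2, sd = 1.5)   # simulated data with known parameters

# Negative log-likelihood; par = (mean, log(sd)) so that sd stays positive
negll <- function(par) -sum(dnorm(z, mean = par[1], sd = exp(par[2]), log = TRUE))

fit <- nlm(negll, c(1.5, 0.2), hessian = TRUE)
print(fit$code)                        # 1 or 2 indicates probable convergence

# The Hessian of the NEGATIVE log-likelihood is inverted directly,
# without a further sign change
vse <- sqrt(diag(solve(fit$hessian)))
vz  <- fit$estimate / vse
vpv <- 2 * (1 - pnorm(abs(vz)))
print(cbind(est = fit$estimate, se = vse, z = vz, p = vpv))
```

The estimated mean should be close to the sample mean of `z`, and its standard error close to the estimated standard deviation divided by the square root of the sample size, which is a useful sanity check before moving to the full model.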
When using the function nlm, you can obtain the vector of estimates by appending $estimate to the result. Similarly, appending $hessian to the result gives the Hessian matrix evaluated at the maximum likelihood estimator. This note obtains the standard errors from the inverse of the Hessian matrix. Although you generally need to multiply the Hessian by −1 before inverting it, this is not necessary here because the likelihood has already been multiplied by −1. The p values are for two-sided tests. Now write:

    vest<-est$estimate
    if(det(est$hessian) == 0) print("ERROR!! The Hessian cannot be inverted.") else print("The Hessian could be inverted.")
    minfo<-solve(est$hessian)

We call the estimation result vest, which is the 8-dimensional vector of estimated parameters. The if in the second line confirms whether or not the Hessian matrix can be inverted. If the determinant of the Hessian is not zero, that is, if the Hessian matrix is nonsingular, the sentence "The Hessian could be inverted." is printed and the inverse of the Hessian matrix (named minfo) is obtained; otherwise the sentence "ERROR!! The Hessian cannot be inverted." is printed. Next write:

    k<-length(vest)
    vstd.err<-matrix(0,k,1)
    vz.value<-matrix(0,k,1)
    vp.value<-matrix(0,k,1)

We let k be the number of parameters. vstd.err, vz.value and vp.value are defined as vectors of zeros for storing the standard errors, z values and p values, respectively. Then, using the function while, we calculate the standard errors, z values and p values and store them in the vectors defined above. In the code below, if is used to judge whether or not each diagonal element of the inverse Hessian is positive. If it is not positive, the standard error cannot be calculated and the sentence "ERROR!! Std.Err is negative!!" is printed.

    i<-1
    while(i<=k){
      if(minfo[i,i]>0) vstd.err[i]<-sqrt(minfo[i,i]) else print("ERROR!! Std.Err is negative!!")
      vz.value[i]<-vest[i]/vstd.err[i]
      vp.value[i]<-2*(1-pnorm(abs(vz.value[i])))
      i<-i+1
    }

What follows rearranges the estimation results and returns them as the return value of the function MSTVTPest. For the details of the function signif, see its help.

    vest<-signif(vest,digits=5)
    vstd.err<-signif(vstd.err,digits=5)
    vz.value<-signif(vz.value,digits=5)
    vp.value<-signif(vp.value,digits=5)
    mresult<-cbind(rbind("name","p0","q0","phi","sigma","m0","m1","p1","q1"),
                   rbind(cbind("est","std.err","z-value","p-value"),
                         cbind(vest,vstd.err,vz.value,vp.value)))

2.3 The filtered probability: filter

This subsection calculates the filtered probability using the estimation results. The filtered probability is the probability that $s_t = j$ given the current information $\Omega_t$, denoted by $\Pr(s_t = j \mid \Omega_t)$. It is needed when decomposing $f(y_{t+1} \mid \Omega_t)$ at $t + 1$. For $j = 0, 1$, it can be obtained by

$$\begin{aligned}
\Pr(s_t = j \mid \Omega_t) &= \sum_{i=0,1} \Pr(s_t = j, s_{t-1} = i \mid \Omega_t) \\
&= \sum_{i=0,1} \Pr(s_t = j, s_{t-1} = i \mid y_t, \Omega_{t-1}) \\
&= \sum_{i=0,1} \frac{f(y_t, s_t = j, s_{t-1} = i \mid \Omega_{t-1})}{f(y_t \mid \Omega_{t-1})} \\
&= \frac{\sum_{i=0,1} f(y_t \mid s_t = j, \Omega_{t-1})\, p_t^{(i,j)} \Pr(s_{t-1} = i \mid \Omega_{t-1})}{\sum_{j=0,1} \sum_{i=0,1} f(y_t \mid s_t = j, \Omega_{t-1})\, p_t^{(i,j)} \Pr(s_{t-1} = i \mid \Omega_{t-1})},
\end{aligned}$$

where $p_t^{(i,j)} = \Pr(s_t = j \mid s_{t-1} = i : x_{t-1})$, and $\Pr(s_t = j \mid \Omega_t)$ is the $(j+1)$-th element of the following vector:

$$\Pr(s_t = j \mid \Omega_t) = \left[ \frac{f_t \odot \big( \xi_t[1:2] \odot \zeta_t[1:2] + \xi_t[3:4] \odot \zeta_t[3:4] \big)}{1_2' \Big( f_t \odot \big( \xi_t[1:2] \odot \zeta_t[1:2] + \xi_t[3:4] \odot \zeta_t[3:4] \big) \Big)} \right]_{j+1}, \qquad j = 0, 1.$$

We then calculate this probability in the program. The function for calculating the filtered probability is named filter, and its argument is vparam. While vparam represents a general parameter vector, it is set equal to the vector of maximum likelihood estimates vest when calculating the filtered probability. The steps of the calculation are almost the same as those for constructing the likelihood.
In addition to loading yy and xx, however, the announced recession periods are also loaded for comparison. This series is named vrec. The other parts are not explained here because they have already been explained.

    filter<-function(vparam){
      yy <- mdata[,1]
      xx <- mdata[,2]
      vrec <- mdata[,3]
      p0 <- vparam[1]
      # ...
      n <- length(yy)

Here we define the vector vprst0 to store the calculated filtered probabilities. The procedure for the likelihood function is, of course, the same as in the subsection above. Some parts are again omitted here because they have already been explained.

      vprst0 <- matrix(0,nrow=n,ncol=1)
      likv <- 0
      j_iter <- 2
      while( j_iter <= n){
        # ...
        vpro <- cf/f

Now write:

        vprob <- rbind(vpro[1], vpro[1], vpro[2], vpro[2])
        vprst0[j_iter] <- vprob[1]
        likv <- likv-lik
        j_iter <- j_iter+1
      }
      return(list(vprst0,vrec))
    }

The return values of the function filter are the calculated filtered probabilities (vprst0) and the announced recession periods (vrec). Since vpro has the form vpro $= \big( \Pr(s_t = 0 \mid \Omega_t), \Pr(s_t = 1 \mid \Omega_t) \big)'$ and we are interested in the recession probability, we pick out the filtered probability of recession by writing vprst0[j_iter] <- vprob[1].

Lastly, we write the program to plot the calculated filtered probabilities and the announced recession periods. Calling the function filter with the argument vest, we first calculate the filtered probabilities and save the result as the list fil.prob, whose 1st element is the calculated filtered probabilities and whose 2nd element is the announced recession periods. Loading the data from this list, we plot the calculated filtered probabilities and the announced recession periods, where par(mfrow=c(2,1)) divides the display into two panels and ts converts the data into time series data.

      fil.prob<-filter(vest)
      par(mfrow=c(2,1))
      plot(ts(data.frame(fil.prob[1])),main="Filtered Probability", ylab=" ")
      plot(ts(data.frame(fil.prob[2])),main="Announced Recession", ylab=" ")
      return(mresult)
    }

Lastly, return(mresult) gives the return value of the function MSTVTPest. We have completed writing the program.
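The filtering update at the heart of Section 2.3 is a Bayes-rule normalization. A minimal sketch for one period, with made-up inputs (the name `vpred` for the predicted state probabilities is hypothetical): the joint densities `cf` are divided by their sum to give the filtered probabilities.

```r
# One filtering step with made-up numbers.
yt <- -0.8; ylag <- 0.1; phi <- 0.3; sigma <- 0.5
vmu   <- c(-1, 1)       # (m_0, m_1)
vpred <- c(0.3, 0.7)    # Pr(s_t = 0 | Omega_{t-1}), Pr(s_t = 1 | Omega_{t-1})

# joint density f(y_t, s_t = j | Omega_{t-1}) for j = 0, 1
cf   <- dnorm(yt, mean = vmu + phi * ylag, sd = sigma) * vpred
f    <- sum(cf)         # f(y_t | Omega_{t-1})
vpro <- cf / f          # Pr(s_t = j | Omega_t)
print(vpro)
```

Because the observation here lies much closer to the recession mean than to the expansion mean, the filtered recession probability rises above its predicted value of 0.3.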
References

[1] Filardo, Andrew J. (1994) Business cycle phases and their transition dynamics, Journal of Business & Economic Statistics, Vol. 12, No. 3, pp. [...]

[2] Hamilton, James D. (1989) A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica, 57, pp. [...]

[3] Kim, Chang J. and Nelson, Charles R. (1999) State space models with regime switching, The MIT Press.
