Derivation of an EM algorithm for constrained and unconstrained multivariate autoregressive state-space (MARSS) models


Derivation of an EM algorithm for constrained and unconstrained multivariate autoregressive state-space (MARSS) models

Elizabeth Eli Holmes*

March 30, 2018

Abstract

This report presents an Expectation-Maximization (EM) algorithm for estimation of the maximum-likelihood parameter values of constrained multivariate autoregressive Gaussian state-space (MARSS) models. The MARSS model can be written: $x_t = B x_{t-1} + u + w_t$, $y_t = Z x_t + a + v_t$, where $w_t$ and $v_t$ are multivariate normal error terms with variance-covariance matrices $Q$ and $R$ respectively. MARSS models are a class of dynamic linear model and vector autoregressive state-space model. Shumway and Stoffer presented an unconstrained EM algorithm for this class of models in 1982, and a number of researchers have presented EM algorithms for specific types of constrained MARSS models since then. In this report, I present a general EM algorithm for constrained MARSS models, where the constraints are on the elements within the parameter matrices $B$, $u$, $Q$, $Z$, $a$, $R$. The constraints take the form $\operatorname{vec}(M) = f + Dm$, where $M$ is the parameter matrix, $f$ is a column vector of fixed values, $D$ is a matrix of multipliers, and $m$ is the column vector of estimated values. This allows a wide variety of constrained parameter matrix forms. The presentation is for a time-varying MARSS model, where time-variation enters through the fixed (meaning not estimated) $f$ and $D$ matrices for each parameter. The algorithm allows missing values in $y$ and partially deterministic systems where 0s appear on the diagonals of $Q$ or $R$.

Keywords: Time-series analysis, Kalman filter, EM algorithm, maximum-likelihood, vector autoregressive model, dynamic linear model, parameter estimation, state-space

Citation: Holmes, E. E. 2012. Derivation of an EM algorithm for constrained and unconstrained multivariate autoregressive state-space (MARSS) models.

* Northwest Fisheries Science Center, NOAA Fisheries, Seattle, WA 98112, eli.holmes@noaa.gov, http://faculty.washington.edu/eeholmes

1 Overview

EM algorithms extend maximum-likelihood estimation to models with hidden states and are widely used in engineering and computer science applications. This report presents an EM algorithm for a general class of Gaussian constrained multivariate autoregressive state-space (MARSS) models, with a hidden multivariate autoregressive process (state) model and a multivariate observation model. This is an important class of time-series model used in many different scientific fields. The reader is referred to McLachlan and Krishnan (2008) for general background on EM algorithms and to Harvey (1989) for a discussion of EM algorithms for time-series data. Borman (2009) has a nice tutorial on the EM algorithm.

Before showing the derivation for the constrained case, I first show a derivation of the EM algorithm for the unconstrained[1] MARSS model. This EM algorithm was published by Shumway and Stoffer (1982), but my derivation is more similar to Ghahramani et al.'s slightly different presentation (Ghahramani and Hinton, 1996; Roweis and Ghahramani, 1999). One difference between my presentation and all these previous presentations, however, is that I treat the data as a random variable throughout; this means that there are no special update equations for the missing-values case. Another difference is that I present the update equations for both stochastic initial states and fixed initial states. I then extend the derivation to constrained MARSS models, where there are fixed and shared elements in the parameter matrices, and to the case of degenerate MARSS models, where some processes in the model are deterministic rather than stochastic. See also Wu et al. (1996) and Zuur et al. (2003) for other examples of the EM algorithm for different classes of constrained MARSS models.

When working with MARSS models, one should be cognizant that misspecification of the prior on the initial hidden states can have catastrophic and difficult-to-detect effects on the parameter estimates. There is often no sign that something is amiss with the MLE estimates output by an EM algorithm. There has been much work on how to avoid these initial-conditions effects; see especially the literature on vector autoregressive state-space models in economics. The trouble often occurs when the prior on the initial states is inconsistent with the distribution of the initial states that is implied by the maximum-likelihood model. This often happens when the model implies a specific covariance structure on the initial states, but since the maximum-likelihood parameters are unknown, this covariance structure is unknown. Using a diffuse prior does not help, since your diffuse prior still has some covariance structure (often independence is being imposed). In some ways the EM algorithm is less sensitive to a misspecified prior because it uses the smoothed states conditioned on all the data. However, if the prior is inconsistent with the model, the EM algorithm will not (cannot) find the MLEs. It is very possible, however, that it will find parameter estimates that are closer to what you intend (estimates uninfluenced by the prior), but they will not be MLEs. The derivation presented here allows one to circumvent these problems by treating the initial states as fixed and estimated parameters. The problematic initial-state variance-covariance matrix is removed from the model, albeit at the cost of additional estimated parameters.

Finally, when working with MARSS models, one needs to ensure that the model is identifiable, i.e. that a unique solution exists. For a given MARSS model, some of the parameter elements will need to be fixed (not estimated) in order to produce a model with one solution. How to do that depends on the MARSS model being fitted and is up to the user.
1.1 The MARSS model

The linear MARSS model with a stochastic[2] initial state is

$$x_t = B x_{t-1} + u + w_t, \quad \text{where } W_t \sim \operatorname{MVN}(0, Q) \qquad (1a)$$
$$y_t = Z x_t + a + v_t, \quad \text{where } V_t \sim \operatorname{MVN}(0, R) \qquad (1b)$$
$$X_0 \sim \operatorname{MVN}(\xi, \Lambda) \qquad (1c)$$

The $y$ equation is called the observation process, and $y_t$ is an $n \times 1$ vector. The $x$ equation is called the state or process equation, and $x_t$ is an $m \times 1$ vector. The equation for $x$ describes a multivariate autoregressive process (also called a random walk or Markov process). $w_t$ are the process errors and are specific realizations of the random variable $W_t$; $v_t$ is defined similarly. The initial state can either be defined at $t = 0$, as is done in

[1] "Unconstrained" means that each element in the parameter matrix is estimated and no elements are fixed or shared.
[2] "Stochastic" means the initial state has a distribution rather than a fixed value. Because the process must start somewhere, one needs to specify the initial state. In equation (1), I show the initial state specified as a distribution. However, the derivation will also discuss the case where the initial state is specified as an unknown fixed parameter.
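To make the notation concrete, here is a minimal simulation sketch of the model in equation (1) in Python/NumPy. All dimensions and parameter values are illustrative choices, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(42)

T, m, n = 100, 2, 3                      # time steps, state dim, observation dim
B = np.array([[0.8, 0.0], [0.0, 0.9]])   # state transition matrix
u = np.array([0.1, -0.05])               # state drift
Q = 0.1 * np.eye(m)                      # process error covariance
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # observation matrix
a = np.zeros(n)                          # observation offsets
R = 0.2 * np.eye(n)                      # observation error covariance
xi, Lam = np.zeros(m), np.eye(m)         # initial-state mean and covariance

x = np.empty((T + 1, m))
y = np.empty((T, n))
x[0] = rng.multivariate_normal(xi, Lam)          # x_0 ~ MVN(xi, Lambda), eq (1c)
for t in range(1, T + 1):
    w = rng.multivariate_normal(np.zeros(m), Q)  # w_t ~ MVN(0, Q)
    v = rng.multivariate_normal(np.zeros(n), R)  # v_t ~ MVN(0, R)
    x[t] = B @ x[t - 1] + u + w                  # state equation (1a)
    y[t - 1] = Z @ x[t] + a + v                  # observation equation (1b)
```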

equation (1), or at $t = 1$. When presenting the MARSS model, I use $t = 0$, but the derivations will show the EM algorithm for both cases. $Q$ and $R$ are variance-covariance matrices that specify the stochasticity in the observation and state equations.

In the MARSS model, the $x$ and $y$ equations describe two stochastic processes. By tradition, one conditions on observations of $y$, and $x$ is treated as completely hidden, hence the name hidden Markov process (of which a MARSS model is a special type). However, you could condition on partial observations of $x$ and treat $y$ as a partially hidden process, with (as usual) proper constraints to ensure identifiability. Nonetheless, in this report I follow tradition and treat $x$ as hidden and $y$ as partially observed. If $x$ is partially observed, then the update equations stay the same, but the expectations shown in section 6 would be computed conditioned on the partially observed $x$.

The first part of this report will review the derivation of an EM algorithm for the time-constant MARSS model (equation 1). However, the main objective of this report is to show the derivation of an EM algorithm to solve a much more general MARSS model (section 4), which is a MARSS model with linear constraints on time-varying parameters:

$$x_t = B_t x_{t-1} + u_t + G_t w_t, \quad \text{where } W_t \sim \operatorname{MVN}(0, Q_t)$$
$$y_t = Z_t x_t + a_t + H_t v_t, \quad \text{where } V_t \sim \operatorname{MVN}(0, R_t) \qquad (2)$$
$$x_0 = \xi + F l, \quad \text{where } l \sim \operatorname{MVN}(0, \Lambda)$$

The linear constraints appear as the vectorization of each parameter ($B_t$, $u_t$, $Q_t$, $Z_t$, $a_t$, $R_t$, $\xi$, $\Lambda$), described by the relation $f_t + D_t m$. This relation specifies linear constraints of the form $\beta_i + \beta_{a,i} a + \beta_{b,i} b + \dots$ on the elements in each MARSS parameter matrix. Equation (2) is a much broader class of MARSS models that includes MARSS models with exogenous variables (covariates), AR-p models, moving average models, constrained MARSS models, and models that are combinations of these. The derivation also includes partially deterministic systems, where $G_t$, $H_t$ and $F$ may have all-zero rows.

1.2 The joint log-likelihood function

Equation (1) describes a multivariate stochastic process, and $Y$ and $X$ are random variables whose distributions are given by equation (1). Denote a specific realization of these random variables as $y$ and $x$, which denote the set of all $y_t$'s and $x_t$'s from $t = 1$ to $T$. The joint log-likelihood[3] of $y$ and $x$ can then be written as follows[4], where $X$ denotes the random variable and $x$ is a realization from that random variable (and similarly for $Y$)[5]:

$$f(y, x) = f(y \mid X = x)\, f(x), \qquad (3)$$

where

$$f(x) = f(x_0) \prod_{t=1}^{T} f(x_t \mid X_1^{t-1} = x_1^{t-1}), \qquad f(y \mid X = x) = \prod_{t=1}^{T} f(y_t \mid X = x)$$

Thus,

$$f(y, x) = \prod_{t=1}^{T} f(y_t \mid X = x) \times f(x_0) \prod_{t=1}^{T} f(x_t \mid X_1^{t-1} = x_1^{t-1}) = \prod_{t=1}^{T} f(y_t \mid X_t = x_t) \times f(x_0) \prod_{t=1}^{T} f(x_t \mid X_{t-1} = x_{t-1})$$

Here $x_1^{t-1}$ denotes the set of $x$ from $t = 1$ to $t - 1$, and thus $x$ is shorthand for $x_1^T$. The third line follows because, conditioned on $x$, the $y_t$'s are independent of each other (because the $v_t$ are independent of each other).

[3] This is not the log-likelihood output by the Kalman filter. The log-likelihood output by the Kalman filter is $\log L(y; \Theta)$ (notice $x$ does not appear), which is known as the marginal log-likelihood.
[4] The log-likelihood function is shown here for the MARSS model with non-time-varying parameters (equation 1).
[5] To alleviate clutter, I have left off subscripts on the $f$'s. To emphasize that the $f$'s represent different density functions, one would often use a subscript showing what parameters are in the functions, i.e. $f(x_t \mid X_{t-1} = x_{t-1})$ becomes $f_{B,u,Q}(x_t \mid X_{t-1} = x_{t-1})$.

In the last line, $x$ becomes $x_{t-1}$ from the Markov property of the equation for $x$ (equation 1a), and $x$ becomes $x_t$ because $y_t$ depends only on $x_t$ (equation 1b).

Since $X_t \mid X_{t-1} = x_{t-1}$ is multivariate normal and $Y_t \mid X_t = x_t$ is multivariate normal (equation 1), we can write down the joint log-likelihood function using the likelihood function for a multivariate normal distribution (Johnson and Wichern, 2007):

$$\log L(y, x; \Theta) = -\frac{1}{2} \sum_{t=1}^{T} (y_t - Z x_t - a)^\top R^{-1} (y_t - Z x_t - a) - \frac{T}{2} \log |R| - \frac{1}{2} \sum_{t=1}^{T} (x_t - B x_{t-1} - u)^\top Q^{-1} (x_t - B x_{t-1} - u) - \frac{T}{2} \log |Q| - \frac{1}{2} (x_0 - \xi)^\top \Lambda^{-1} (x_0 - \xi) - \frac{1}{2} \log |\Lambda| - \frac{n}{2} \log 2\pi \qquad (6)$$

$n$ is the number of data points. This is the same as equation 6.64 in Shumway and Stoffer (2006). The above equation is for the case where $x_0$ is stochastic (has a known distribution). However, if we instead treat $x_0$ as fixed but unknown (Harvey, 1989), it is then a parameter and there is no $\Lambda$. The likelihood is then slightly different: $x_0$ is defined as a parameter $\xi$ and

$$\log L(y, x; \Theta) = -\frac{1}{2} \sum_{t=1}^{T} (y_t - Z x_t - a)^\top R^{-1} (y_t - Z x_t - a) - \frac{T}{2} \log |R| - \frac{1}{2} \sum_{t=1}^{T} (x_t - B x_{t-1} - u)^\top Q^{-1} (x_t - B x_{t-1} - u) - \frac{T}{2} \log |Q| \qquad (7)$$

Note that in this case, $x_0$ is no longer a realization of a random variable $X_0$; it is a fixed but unknown parameter. Equation (7) is written as if all the $x_0$ are fixed; however, when the general derivation is presented, it is allowed that some $x_0$ are fixed ($\Lambda = 0$) and others are stochastic.

If $R$ is constant through time, then $\sum_t \frac{1}{2} \log |R_t|$ in the likelihood equation reduces to $\frac{T}{2} \log |R|$; however, sometimes one needs to include time-dependent weighting on $R$[6]. The same applies to $\sum_t \log |Q_t|$.

All bolded elements are column vectors (lower case) and matrices (upper case). $A^\top$ is the transpose of matrix $A$, $A^{-1}$ is the inverse of $A$, and $|A|$ is the determinant of $A$. Parameters are non-italic, while elements that are slanted are realizations of a random variable ($x$ and $y$ are slanted)[7].

1.3 Missing values

In Shumway and Stoffer and other presentations of the EM algorithm for MARSS models (Shumway and Stoffer, 2006; Zuur et al., 2003), the missing-values case is treated separately from the non-missing-values case. In these derivations, a series of modifications are given for the EM update equations when there are missing values. In my derivation, I present the missing-values treatment differently, and there is only one set of update equations; these equations apply in both the missing-values and non-missing-values cases. My derivation does this by keeping $E[Y \mid \text{data}]$ and $E[YX \mid \text{data}]$ in the update equations (much like $E[X \mid \text{data}]$ is kept in the equations), while Shumway and Stoffer replace these expectations involving $Y$ by their values, which depend on whether or not the data are a complete observation of $Y$ with no missing values. Section 6 shows how to compute the expectations involving $Y$ when the data are an incomplete observation of $Y$.

[6] If, for example, one wanted to include a temporally dependent weighting on $R$, replace $R$ with $\alpha_t R$, where $\alpha_t$ is the weighting at time $t$ and is fixed (not estimated); then $|\alpha_t R| = \alpha_t^n |R|$.
[7] In matrix algebra, a capital bolded letter indicates a matrix. Unfortunately in statistics, the capital-letter convention is used for random variables. Fortunately, this derivation does not need to reference random variables except indirectly when using expectations. Thus, I use capitals to refer to matrices, not random variables. The one exception is the reference to $X$ and $Y$; in this case a bolded slanted capital is used.
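Equation (6) translates almost line-for-line into code. The sketch below assumes the arrays from the simulation snippet above (states x indexed 0..T, data y for t = 1..T stored from index 0) and time-constant Q and R; reading n as the total number of scalar data points (n times T here) is my assumption.

```python
import numpy as np

def joint_log_lik(x, y, B, u, Q, Z, a, R, xi, Lam):
    """Joint log L(y, x; Theta) of equation (6), stochastic initial state."""
    T, n = y.shape
    Qi, Ri, Li = np.linalg.inv(Q), np.linalg.inv(R), np.linalg.inv(Lam)
    ll = -0.5 * (x[0] - xi) @ Li @ (x[0] - xi) - 0.5 * np.linalg.slogdet(Lam)[1]
    for t in range(1, T + 1):
        e = x[t] - B @ x[t - 1] - u          # process-equation residual
        d = y[t - 1] - Z @ x[t] - a          # observation-equation residual
        ll += -0.5 * e @ Qi @ e - 0.5 * d @ Ri @ d
    ll -= 0.5 * T * (np.linalg.slogdet(Q)[1] + np.linalg.slogdet(R)[1])
    ll -= 0.5 * n * T * np.log(2 * np.pi)    # constant; n*T scalar observations
    return ll
```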

2 The EM algorithm

The EM algorithm cycles iteratively between an expectation step (the integration in the equation) followed by a maximization step (the arg max in the equation):

$$\Theta_{j+1} = \arg\max_{\Theta} \int_{x} \int_{y} \log L(x, y; \Theta)\, f(x, y \mid Y^{(1)} = y^{(1)}, \Theta_j)\, dx\, dy \qquad (8)$$

$Y^{(1)}$ indicates those $Y$ that have an observation, and $y^{(1)}$ are the actual observations. Note that $\Theta$ and $\Theta_j$ are different. If $\Theta$ consists of multiple parameters, we can also break this down into smaller steps. Let $\Theta = \{\alpha, \beta\}$; then

$$\alpha_{j+1} = \arg\max_{\alpha} \int_x \int_y \log L(x, y, \beta_j; \alpha)\, f(x, y \mid Y^{(1)} = y^{(1)}, \alpha_j, \beta_j)\, dx\, dy \qquad (9)$$

Now the maximization is only over $\alpha$, the part that appears after the ";" in the log-likelihood.

Expectation step: The integral that appears in equation (8) is an expectation. The first step in the EM algorithm is to compute this expectation. This will involve computing expectations like $E[X_t X_t^\top \mid Y^{(1)} = y^{(1)}, \Theta_j]$ and $E[Y_t X_t^\top \mid Y^{(1)} = y^{(1)}, \Theta_j]$. The $j$ subscript on $\Theta$ denotes that these are the parameters at iteration $j$ of the algorithm.

Maximization step: A new parameter set $\Theta_{j+1}$ is computed by finding the parameters that maximize the expected log-likelihood function (the part in the integral) with respect to $\Theta$. The equations that give the parameters for the next iteration ($j + 1$) are called the update equations, and this report is devoted to the derivation of these update equations.

After one iteration of the expectation and maximization steps, the cycle is then repeated. New expectations are computed using $\Theta_{j+1}$, and then a new set of parameters $\Theta_{j+2}$ is generated. This cycle is continued until the likelihood no longer increases more than a specified tolerance level. This algorithm is guaranteed to increase in likelihood at each iteration (if it does not, it means there is an error in one's update equations). The algorithm must be started from an initial set of parameter values $\Theta_1$. The algorithm is not particularly sensitive to the initial conditions, but the surface could definitely be multi-modal and have local maxima. See the section on using Monte Carlo initialization to ensure that the global maximum is found.

2.1 The expected log-likelihood function

The function that is maximized in the "M" step is the expected value of the log-likelihood function. This expectation is conditioned on two things: the observed $Y$'s, which are denoted $Y^{(1)}$ and which are equal to the fixed values $y^{(1)}$, and the parameter set $\Theta_j$. Note that since there may be missing values in the data, $Y^{(1)}$ can be a subset of $Y$; that is, only some $Y_t$ have a corresponding $y_t$ value at time $t$. Mathematically, what we are doing is $E_{XY}[g(X, Y) \mid Y^{(1)} = y^{(1)}, \Theta_j]$. This is a multivariate conditional expectation because $X, Y$ is multivariate (an $(m \times T + n \times T)$ vector). The function $g(\Theta)$ that we are taking the expectation of is $\log L(Y, X; \Theta)$. Note that $g(\Theta)$ is a random variable involving the random variables $X$ and $Y$, while $\log L(y, x; \Theta)$ is not a random variable but rather a specific value, since $y$ and $x$ are a set of specific values.

We denote this expected log-likelihood by $\Psi$. The goal is to find the $\Theta$ that maximizes $\Psi$, and this becomes the new $\Theta$ for the $j + 1$ iteration of the EM algorithm. The equations to compute the new $\Theta$ are termed the update equations. Using the log-likelihood equation (6) and expanding out all the terms, we can write out
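The cycle just described is a short loop in code. The sketch below is a generic EM skeleton under the report's stopping rule; e_step, m_step and marginal_log_lik are placeholder callables (a Kalman smoother, the update equations derived below, and the filter log-likelihood), not an existing API.

```python
import numpy as np

def em(y, theta0, e_step, m_step, marginal_log_lik, tol=1e-6, max_iter=1000):
    """Generic EM loop of equation (8); the three callables are placeholders."""
    theta, ll_old = theta0, -np.inf
    for _ in range(max_iter):
        moments = e_step(y, theta)          # E-step: conditional expectations at theta_j
        theta = m_step(moments, y)          # M-step: apply the update equations
        ll = marginal_log_lik(y, theta)     # log L(y; theta) from the Kalman filter
        if ll < ll_old:                     # EM guarantees a non-decreasing likelihood
            raise RuntimeError("log-likelihood decreased; check the update equations")
        if ll - ll_old < tol:
            break
        ll_old = ll
    return theta
```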

$\Psi$ in verbose form as:

$$E_{XY}[\log L(Y, X; \Theta) \mid Y^{(1)} = y^{(1)}, \Theta_j] = \Psi = -\frac{1}{2} \sum_{t=1}^{T} \Big( E[Y_t^\top R^{-1} Y_t] - E[Y_t^\top R^{-1} Z X_t] - E[(Z X_t)^\top R^{-1} Y_t] - E[a^\top R^{-1} Y_t] - E[Y_t^\top R^{-1} a] + E[(Z X_t)^\top R^{-1} Z X_t] + E[a^\top R^{-1} Z X_t] + E[(Z X_t)^\top R^{-1} a] + a^\top R^{-1} a \Big) - \frac{T}{2} \log |R| - \frac{1}{2} \sum_{t=1}^{T} \Big( E[X_t^\top Q^{-1} X_t] - E[X_t^\top Q^{-1} B X_{t-1}] - E[(B X_{t-1})^\top Q^{-1} X_t] - E[u^\top Q^{-1} X_t] - E[X_t^\top Q^{-1} u] + E[(B X_{t-1})^\top Q^{-1} B X_{t-1}] + E[u^\top Q^{-1} B X_{t-1}] + E[(B X_{t-1})^\top Q^{-1} u] + u^\top Q^{-1} u \Big) - \frac{T}{2} \log |Q| - \frac{1}{2} \Big( E[X_0^\top \Lambda^{-1} X_0] - E[\xi^\top \Lambda^{-1} X_0] - E[X_0^\top \Lambda^{-1} \xi] + \xi^\top \Lambda^{-1} \xi \Big) - \frac{1}{2} \log |\Lambda| - \frac{n}{2} \log 2\pi \qquad (10)$$

All the $E[\,]$ appearing here denote $E_{XY}[g() \mid Y^{(1)} = y^{(1)}, \Theta_j]$. In the rest of the derivation, I drop the conditional and the $XY$ subscript on $E$ to remove clutter, but it is important to remember that whenever $E$ appears, it refers to a specific conditional multivariate expectation. If $x_0$ is treated as fixed, then $X_0 = \xi$ and the last two lines involving $\Lambda$ are dropped.

Keep in mind that $\Theta$ and $\Theta_j$ are different. $\Theta$ is a parameter appearing in function $g(X, Y, \Theta)$ (i.e. the parameters in equation 6). $X$ and $Y$ are random variables, which means that $g(X, Y, \Theta)$ is a random variable. We take the expectation of $g(X, Y, \Theta)$, meaning we take the integral over the joint distribution of $X$ and $Y$. We need to specify what that distribution is, and the conditioning on $\Theta_j$ (meaning the $\Theta_j$ appearing to the right of the $\mid$ in $E[g \mid \Theta_j]$) is specifying this distribution. This conditioning affects the value of the expectation of $g(X, Y, \Theta)$, but it does not affect the value of $\Theta$, which are the $R$, $Q$, $u$, etc. values on the right side of equation (10). We will first take the expectation of $g(X, Y, \Theta)$ conditioned on $\Theta_j$ (using integration) and then take the differential of that expectation with respect to $\Theta$.

2.2 The expectations used in the derivation

The following expectations appear frequently in the update equations and are given special names[8]:

$$\tilde{x}_t = E_{XY}[X_t \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11a)$$
$$\tilde{y}_t = E_{XY}[Y_t \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11b)$$
$$\tilde{P}_t = E_{XY}[X_t X_t^\top \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11c)$$
$$\tilde{P}_{t,t-1} = E_{XY}[X_t X_{t-1}^\top \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11d)$$
$$\tilde{V}_t = \operatorname{var}_{XY}[X_t \mid Y^{(1)} = y^{(1)}, \Theta_j] = \tilde{P}_t - \tilde{x}_t \tilde{x}_t^\top \qquad (11e)$$
$$\tilde{O}_t = E_{XY}[Y_t Y_t^\top \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11f)$$
$$\tilde{W}_t = \operatorname{var}_{XY}[Y_t \mid Y^{(1)} = y^{(1)}, \Theta_j] = \tilde{O}_t - \tilde{y}_t \tilde{y}_t^\top \qquad (11g)$$
$$\widetilde{yx}_t = E_{XY}[Y_t X_t^\top \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11h)$$
$$\widetilde{yx}_{t,t-1} = E_{XY}[Y_t X_{t-1}^\top \mid Y^{(1)} = y^{(1)}, \Theta_j] \qquad (11i)$$

The subscript on the expectation, $E_{XY}$, denotes that this is a multivariate expectation taken over $X$ and $Y$. The right sides of equations (11e) and (11g) arise from the computational formula for variance and covariance:

$$\operatorname{var}[X] = E[X X^\top] - E[X]\, E[X]^\top \qquad (12)$$
$$\operatorname{cov}[X, Y] = E[X Y^\top] - E[X]\, E[Y]^\top \qquad (13)$$

Section 6 shows how to compute the expectations in equation (11).

[8] This notation is different than what you see in Shumway and Stoffer (2006), section 6.2. What I call $\tilde{V}_t$, they refer to as $P_t^n$, and my $\tilde{P}_t$ would be $P_t^n + \tilde{x}_t \tilde{x}_t^\top$ in their notation.
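As a small illustration of (11c) and (11e), the second moment $\tilde{P}_t$ is the smoother variance plus the outer product of the smoother mean. Below, xtT and VtT stand for the smoother's conditional means and variances; the names are illustrative, not a fixed API.

```python
import numpy as np

def second_moments(xtT, VtT):
    # P_t = E[X_t X_t' | y] = var[X_t | y] + E[X_t | y] E[X_t | y]'  (11c, 11e)
    return np.array([V + np.outer(x, x) for x, V in zip(xtT, VtT)])
```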

Table 1: Notes on multivariate expectations. For the following examples, let $X$ be a vector of length three, $(X_1, X_2, X_3)$. $f$ is the probability distribution function (pdf). $C$ is a constant (not a random variable).

$E_X[g(X)] = \int\!\!\int\!\!\int g(x) f(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3$
$E_X[X_1] = \int\!\!\int\!\!\int x_1 f(x_1, x_2, x_3)\, dx_1\, dx_2\, dx_3 = \int x_1 f(x_1)\, dx_1 = E[X_1]$
$E_X[X_1 + X_2] = E_X[X_1] + E_X[X_2]$
$E_X[X_1 + C] = E_X[X_1] + C$
$E_X[C X_1] = C\, E_X[X_1]$
$E_X[X_1 \mid X_1 = x_1] = x_1$

3 The unconstrained update equations

In this section, I show the derivation of the update equations when all elements of a parameter matrix are estimated and are all allowed to be different, i.e. the unconstrained case. These are similar to the update equations one will see in Shumway and Stoffer (2006). Section 5 shows the update equations when there are unestimated (fixed) or estimated-but-shared values in the parameter matrices, i.e. the constrained update equations.

To derive the update equations, one must find the $\Theta$, where $\Theta$ is comprised of the MARSS parameters $B$, $u$, $Q$, $Z$, $a$, $R$, $\xi$, and $\Lambda$, that maximizes $\Psi$ (equation 10) by partial differentiation of $\Psi$ with respect to $\Theta$. However, I will be using the EM equation where one maximizes each parameter matrix in $\Theta$ one-by-one (equation 9). In this case, the parameters that are not being maximized are set at their iteration-$j$ values, and then one takes the derivative of $\Psi$ with respect to the parameter of interest. Then solve for the parameter value that sets the partial derivative to zero. The partial differentiation is with respect to each individual parameter element, for example each $u_{i,j}$ in matrix $u$. The idea is to single out those terms in equation (10) that involve $u_{i,j}$ (say), differentiate by $u_{i,j}$, set this to zero, and solve for $u_{i,j}$. This gives the new $u_{i,j}$ that maximizes the partial derivative with respect to $u_{i,j}$ of the expected log-likelihood. Matrix calculus gives us a way to jointly maximize $\Psi$ with respect to all elements (not just element $i,j$) in a parameter matrix.

3.1 Matrix calculus needed for the derivation

Before commencing, some definitions from matrix calculus will be needed. The partial derivative of a scalar $\Psi$ ($\Psi$ is a scalar) with respect to some column vector $b$ (which has elements $b_1, b_2, \dots, b_n$) is

$$\frac{\partial \Psi}{\partial b} = \begin{bmatrix} \dfrac{\partial \Psi}{\partial b_1} & \dfrac{\partial \Psi}{\partial b_2} & \cdots & \dfrac{\partial \Psi}{\partial b_n} \end{bmatrix}$$

Note that the derivative with respect to a column vector $b$ is a row vector. The partial derivative of a scalar with respect to some $n \times n$ matrix $B$ is

$$\frac{\partial \Psi}{\partial B} = \begin{bmatrix} \dfrac{\partial \Psi}{\partial b_{1,1}} & \dfrac{\partial \Psi}{\partial b_{2,1}} & \cdots & \dfrac{\partial \Psi}{\partial b_{n,1}} \\ \dfrac{\partial \Psi}{\partial b_{1,2}} & \dfrac{\partial \Psi}{\partial b_{2,2}} & \cdots & \dfrac{\partial \Psi}{\partial b_{n,2}} \\ \vdots & & & \vdots \\ \dfrac{\partial \Psi}{\partial b_{1,n}} & \dfrac{\partial \Psi}{\partial b_{2,n}} & \cdots & \dfrac{\partial \Psi}{\partial b_{n,n}} \end{bmatrix}$$

Note that the indexing is interchanged: $\partial \Psi / \partial b_{i,j} = \big[\partial \Psi / \partial B\big]_{j,i}$. For $Q$ and $R$, this is unimportant because they are variance-covariance matrices and are symmetric. For $B$ and $Z$, one must be careful because these may not be symmetric.

Table 2: Derivatives of a scalar with respect to vectors and matrices. In the following, $a$ is a scalar (unrelated to the vector $a$), $a$ and $c$ are $n \times 1$ column vectors, $b$ and $d$ are $m \times 1$ column vectors, $D$ is an $n \times m$ matrix, $C$ is an $n \times n$ matrix, and $A$ is a diagonal $n \times n$ matrix (0s on the off-diagonals). $C^{-1}$ is the inverse of $C$, $C^\top$ is the transpose of $C$, $C^{-\top} = (C^{-1})^\top = (C^\top)^{-1}$, and $|C|$ is the determinant of $C$. Note, all the numerators in the differentials on the far left reduce to scalars. Although the matrix names may be the same as those of matrices referred to in the text, the matrices in this table are dummy matrices used to show the matrix derivative relations.

$\partial(f^\top g)/\partial a = f^\top\, \partial g/\partial a + g^\top\, \partial f/\partial a$, where $f = f(a)$ and $g = g(a)$ are some functions of $a$ and are column vectors; chain rule: $\partial f/\partial a = (\partial f/\partial g)(\partial g/\partial a)$ (14)

$\partial(a^\top c)/\partial a = \partial(c^\top a)/\partial a = c^\top$ (15)

$\partial a/\partial a = \partial a^\top/\partial a^\top = I_n$

$\partial(a^\top D b)/\partial D = \partial(b^\top D^\top a)/\partial D = b a^\top$; $\partial(a^\top D b)/\partial \operatorname{vec}(D) = \partial(b^\top D^\top a)/\partial \operatorname{vec}(D) = (\operatorname{vec}(b a^\top))^\top$ (16)

$C$ invertible: $\partial \log|C|/\partial C = -\partial \log|C^{-1}|/\partial C = C^{-1} = C^{-\top}$ (the last equality if $C$ is symmetric); $\partial \log|C|/\partial \operatorname{vec}(C) = (\operatorname{vec}(C^{-1}))^\top$. If $C$ is also symmetric and $B$ is not a function of $C$: $\partial \log|C^\top B C|/\partial C = 2 C^{-1}$; $\partial \log|C^\top B C|/\partial \operatorname{vec}(C) = 2 (\operatorname{vec}(C^{-1}))^\top$ (17)

$\partial(b^\top D^\top C D d)/\partial D = d b^\top D^\top C + b d^\top D^\top C^\top$ (18); $\partial(b^\top D^\top C D d)/\partial \operatorname{vec}(D) = (\operatorname{vec}(d b^\top D^\top C + b d^\top D^\top C^\top))^\top$. If $b = d$ and $C$ is symmetric, then the sum reduces to $2\, b b^\top D^\top C$.

$\partial(a^\top C a)/\partial a = a^\top (C + C^\top)$; if $C$ is symmetric, this is $2 a^\top C$ (19)

$\partial(a^\top C^{-1} c)/\partial C = -C^{-1} a c^\top C^{-1}$ (20); $\partial(a^\top C^{-1} c)/\partial \operatorname{vec}(C) = -(\operatorname{vec}(C^{-1} a c^\top C^{-1}))^\top$

A number of derivatives of a scalar with respect to vectors and matrices will be needed in the derivation and are shown in Table 2. In the table, both the vectorized and non-vectorized versions are shown. The vectorized version of a matrix $D$ with dimension $n \times m$ is

$$\operatorname{vec}(D) = \begin{bmatrix} d_{1,1} \\ \vdots \\ d_{n,1} \\ d_{1,2} \\ \vdots \\ d_{n,m} \end{bmatrix}$$
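Relations like (18) are easy to check numerically. The sketch below verifies (18) with a forward finite difference, using the table's transposed indexing convention ($[\partial f/\partial D]_{i,j} = \partial f/\partial D_{j,i}$); all values are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
b, d = rng.normal(size=m), rng.normal(size=m)
C, D = rng.normal(size=(n, n)), rng.normal(size=(n, m))

f = lambda D: b @ D.T @ C @ D @ d        # scalar b' D' C D d

# Analytic derivative, relation (18), in the report's transposed layout:
# d(b'D'CDd)/dD = d b' D' C + b d' D' C'
analytic = np.outer(d, b) @ D.T @ C + np.outer(b, d) @ D.T @ C.T

# Finite-difference check: [dF/dD]_{i,j} = dF/dD_{j,i} in this layout
eps, numeric = 1e-6, np.zeros((m, n))
for i in range(m):
    for j in range(n):
        Dp = D.copy()
        Dp[j, i] += eps
        numeric[i, j] = (f(Dp) - f(D)) / eps
print(np.allclose(analytic, numeric, atol=1e-4))   # True
```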

3.2 The update equation for u (unconstrained)

Take the partial derivative of $\Psi$ with respect to $u$, which is an $m \times 1$ matrix. All parameters other than $u$ are fixed to constant values (because partial differentiation is being done). Since the derivative of a constant is 0, terms not involving $u$ will equal 0 and drop out. Taking the derivative of equation (10) with respect to $u$:

$$\partial \Psi / \partial u = -\frac{1}{2} \sum_{t=1}^{T} \Big( -\partial E[X_t^\top Q^{-1} u]/\partial u - \partial E[u^\top Q^{-1} X_t]/\partial u + \partial E[(B X_{t-1})^\top Q^{-1} u]/\partial u + \partial E[u^\top Q^{-1} B X_{t-1}]/\partial u + \partial (u^\top Q^{-1} u)/\partial u \Big)$$

The parameters can be moved out of the expectations, and then the matrix derivative relations (Table 2) are used to take the derivative:

$$\partial \Psi / \partial u = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[X_t]^\top Q^{-1} - E[X_t]^\top Q^{-1} + E[X_{t-1}]^\top B^\top Q^{-1} + E[X_{t-1}]^\top B^\top Q^{-1} + 2 u^\top Q^{-1} \Big)$$

This also uses $Q^{-1} = (Q^{-1})^\top$. This can then be reduced to

$$\partial \Psi / \partial u = \sum_{t=1}^{T} \Big( E[X_t]^\top Q^{-1} - E[X_{t-1}]^\top B^\top Q^{-1} - u^\top Q^{-1} \Big) \qquad (23)$$

Set the left side to zero (a $1 \times m$ matrix of zeros) and transpose the whole equation. $Q^{-1}$ cancels out[9] by multiplying on the left by $Q$ (left, since the whole equation was just transposed), giving

$$0 = \sum_{t=1}^{T} \big( E[X_t] - B\, E[X_{t-1}] - u \big) = \sum_{t=1}^{T} \big( E[X_t] - B\, E[X_{t-1}] \big) - T u \qquad (24)$$

Solving for $u$ and replacing the expectations with their names from equation (11) gives us the new $u$ that maximizes $\Psi$:

$$u_{j+1} = \frac{1}{T} \sum_{t=1}^{T} \big( \tilde{x}_t - B \tilde{x}_{t-1} \big) \qquad (25)$$

[9] $Q$ is a variance-covariance matrix and is invertible: $Q Q^{-1} = I$, the identity matrix.

3.3 The update equation for B (unconstrained)

Take the derivative of $\Psi$ with respect to $B$. Terms not involving $B$ equal 0 and drop out. I have put the $E$ outside the partials by noting that $\partial E[h(X, B)]/\partial B = E[\partial h(X, B)/\partial B]$, since the expectation is conditioned on $B_j$, not $B$.

$$\partial \Psi / \partial B = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[\partial (X_t^\top Q^{-1} B X_{t-1})/\partial B] - E[\partial ((B X_{t-1})^\top Q^{-1} X_t)/\partial B] + E[\partial ((B X_{t-1})^\top Q^{-1} B X_{t-1})/\partial B] + E[\partial ((B X_{t-1})^\top Q^{-1} u)/\partial B] + E[\partial (u^\top Q^{-1} B X_{t-1})/\partial B] \Big) = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[\partial (X_t^\top Q^{-1} B X_{t-1})/\partial B] - E[\partial (X_{t-1}^\top B^\top Q^{-1} X_t)/\partial B] + E[\partial (X_{t-1}^\top B^\top Q^{-1} B X_{t-1})/\partial B] + E[\partial (X_{t-1}^\top B^\top Q^{-1} u)/\partial B] + E[\partial (u^\top Q^{-1} B X_{t-1})/\partial B] \Big) \qquad (26)$$

After pulling the constants out of the expectations, we use relations (16) and (18) to take the derivative and note that $Q^{-1} = (Q^{-1})^\top$:

$$\partial \Psi / \partial B = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[X_{t-1} X_t^\top] Q^{-1} - E[X_{t-1} X_t^\top] Q^{-1} + 2 E[X_{t-1} X_{t-1}^\top] B^\top Q^{-1} + E[X_{t-1}] u^\top Q^{-1} + E[X_{t-1}] u^\top Q^{-1} \Big) \qquad (27)$$

This can be reduced to

$$\partial \Psi / \partial B = \sum_{t=1}^{T} \Big( E[X_{t-1} X_t^\top] Q^{-1} - E[X_{t-1} X_{t-1}^\top] B^\top Q^{-1} - E[X_{t-1}] u^\top Q^{-1} \Big) \qquad (28)$$

Set the left side to zero (an $m \times m$ matrix of zeros), cancel out $Q^{-1}$ by multiplying by $Q$ on the right, get rid of the $-1/2$, and transpose the whole equation to give

$$0 = \sum_{t=1}^{T} \Big( E[X_t X_{t-1}^\top] - B\, E[X_{t-1} X_{t-1}^\top] - u\, E[X_{t-1}]^\top \Big) = \sum_{t=1}^{T} \Big( \tilde{P}_{t,t-1} - B \tilde{P}_{t-1} - u \tilde{x}_{t-1}^\top \Big) \qquad (29)$$

The last line replaced the expectations with their names shown in equation (11). Solving for $B$, and noting that $\sum \tilde{P}_{t-1}$ is like a variance-covariance matrix and is invertible, gives us the new $B$ that maximizes $\Psi$:

$$B_{j+1} = \Big( \sum_{t=1}^{T} \big( \tilde{P}_{t,t-1} - u \tilde{x}_{t-1}^\top \big) \Big) \Big( \sum_{t=1}^{T} \tilde{P}_{t-1} \Big)^{-1} \qquad (30)$$

Because all the equations above also apply to block-diagonal matrices, the derivation immediately generalizes to the case where $B$ is an unconstrained block-diagonal matrix:

$$B = \begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3} & 0 & 0 & 0 & 0 & 0 \\ b_{2,1} & b_{2,2} & b_{2,3} & 0 & 0 & 0 & 0 & 0 \\ b_{3,1} & b_{3,2} & b_{3,3} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & b_{4,4} & b_{4,5} & 0 & 0 & 0 \\ 0 & 0 & 0 & b_{5,4} & b_{5,5} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & b_{6,6} & b_{6,7} & b_{6,8} \\ 0 & 0 & 0 & 0 & 0 & b_{7,6} & b_{7,7} & b_{7,8} \\ 0 & 0 & 0 & 0 & 0 & b_{8,6} & b_{8,7} & b_{8,8} \end{bmatrix} = \begin{bmatrix} B_1 & 0 & 0 \\ 0 & B_2 & 0 \\ 0 & 0 & B_3 \end{bmatrix} \qquad (31)$$

For the block-diagonal $B$,

$$B_{i,j+1} = \Big[ \Big( \sum_{t=1}^{T} \big( \tilde{P}_{t,t-1} - u \tilde{x}_{t-1}^\top \big) \Big) \Big( \sum_{t=1}^{T} \tilde{P}_{t-1} \Big)^{-1} \Big]_i \qquad (32)$$

where the subscript $i$ means to take the parts of the matrices that are analogous to $B_i$; take the whole part within the parentheses (not the individual matrices inside the parentheses). If $B_i$ is comprised of rows $a$ to $b$ and columns $c$ to $d$ of matrix $B$, then take rows $a$ to $b$ and columns $c$ to $d$ of the matrices subscripted by $i$ in equation (32).

3.4 The update equation for Q (unconstrained)

The usual way to do this derivation is to use what is known as the "trace trick", which will pull the $Q^{-1}$ out to the left of the $c^\top Q^{-1} b$ terms that appear in the likelihood (10). Here I'm showing a less elegant derivation that plods step by step through each of the likelihood terms. Take the derivative of $\Psi$ with respect
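In code, the u and B updates are direct transcriptions of (25) and (30). The sketch below assumes smoother-output arrays as in the earlier snippets: xtT[t] = E[X_t|y] for t = 0..T, P[t] = E[X_t X_t'|y], and Ptt1[t] = E[X_t X_{t-1}'|y]; these names are illustrative.

```python
import numpy as np

def update_u(xtT, B):
    # u_{j+1} = (1/T) * sum_t (x~_t - B x~_{t-1}), equation (25)
    T = xtT.shape[0] - 1
    return sum(xtT[t] - B @ xtT[t - 1] for t in range(1, T + 1)) / T

def update_B(xtT, P, Ptt1, u):
    # B_{j+1} = [sum_t (P~_{t,t-1} - u x~_{t-1}')] [sum_t P~_{t-1}]^{-1}, equation (30)
    T = xtT.shape[0] - 1
    num = sum(Ptt1[t] - np.outer(u, xtT[t - 1]) for t in range(1, T + 1))
    den = sum(P[t - 1] for t in range(1, T + 1))
    return num @ np.linalg.inv(den)
```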

to $Q$. Terms not involving $Q$ equal 0 and drop out. Again, the expectations are placed outside the partials by noting that $\partial E[h(X, Q)]/\partial Q = E[\partial h(X, Q)/\partial Q]$.

$$\partial \Psi / \partial Q = -\frac{1}{2} \sum_{t=1}^{T} \Big( E[\partial (X_t^\top Q^{-1} X_t)/\partial Q] - E[\partial (X_t^\top Q^{-1} B X_{t-1})/\partial Q] - E[\partial ((B X_{t-1})^\top Q^{-1} X_t)/\partial Q] - E[\partial (X_t^\top Q^{-1} u)/\partial Q] - E[\partial (u^\top Q^{-1} X_t)/\partial Q] + E[\partial ((B X_{t-1})^\top Q^{-1} B X_{t-1})/\partial Q] + E[\partial ((B X_{t-1})^\top Q^{-1} u)/\partial Q] + E[\partial (u^\top Q^{-1} B X_{t-1})/\partial Q] + \partial (u^\top Q^{-1} u)/\partial Q \Big) - \frac{T}{2}\, \partial \log |Q| / \partial Q$$

The relations (20) and (17) are used to do the differentiation. Notice that all the terms in the summation are of the form $c^\top Q^{-1} b$; thus, after differentiation, all the $c b^\top$ terms can be grouped inside one set of parentheses. Also, there is a minus sign that comes from equation (20), and it cancels out the minus in front of the initial $-1/2$.

$$\partial \Psi / \partial Q = \frac{1}{2} \sum_{t=1}^{T} Q^{-1} \Big( E[X_t X_t^\top] - E[X_t (B X_{t-1})^\top] - E[B X_{t-1} X_t^\top] - E[X_t u^\top] - E[u X_t^\top] + E[B X_{t-1} (B X_{t-1})^\top] + E[B X_{t-1} u^\top] + E[u (B X_{t-1})^\top] + u u^\top \Big) Q^{-1} - \frac{T}{2} Q^{-1} \qquad (33)$$

Pulling the parameters out of the expectations and using $(B X_{t-1})^\top = X_{t-1}^\top B^\top$, we have

$$\partial \Psi / \partial Q = \frac{1}{2} \sum_{t=1}^{T} Q^{-1} \Big( E[X_t X_t^\top] - E[X_t X_{t-1}^\top] B^\top - B E[X_{t-1} X_t^\top] - E[X_t] u^\top - u E[X_t]^\top + B E[X_{t-1} X_{t-1}^\top] B^\top + B E[X_{t-1}] u^\top + u E[X_{t-1}]^\top B^\top + u u^\top \Big) Q^{-1} - \frac{T}{2} Q^{-1} \qquad (34)$$

The partial derivative is then rewritten in terms of the Kalman smoother output:

$$\partial \Psi / \partial Q = \frac{1}{2} \sum_{t=1}^{T} Q^{-1} \Big( \tilde{P}_t - \tilde{P}_{t,t-1} B^\top - B \tilde{P}_{t,t-1}^\top - \tilde{x}_t u^\top - u \tilde{x}_t^\top + B \tilde{P}_{t-1} B^\top + B \tilde{x}_{t-1} u^\top + u \tilde{x}_{t-1}^\top B^\top + u u^\top \Big) Q^{-1} - \frac{T}{2} Q^{-1} \qquad (35)$$

Setting this to zero (an $m \times m$ matrix of zeros), $Q^{-1}$ is canceled out by multiplying by $Q$ twice, once on the left and once on the right, and the $1/2$ is removed:

$$T Q = \sum_{t=1}^{T} \Big( \tilde{P}_t - \tilde{P}_{t,t-1} B^\top - B \tilde{P}_{t,t-1}^\top - \tilde{x}_t u^\top - u \tilde{x}_t^\top + B \tilde{P}_{t-1} B^\top + B \tilde{x}_{t-1} u^\top + u \tilde{x}_{t-1}^\top B^\top + u u^\top \Big) \qquad (36)$$

This gives us the new $Q$ that maximizes $\Psi$:

$$Q_{j+1} = \frac{1}{T} \sum_{t=1}^{T} \Big( \tilde{P}_t - \tilde{P}_{t,t-1} B^\top - B \tilde{P}_{t,t-1}^\top - \tilde{x}_t u^\top - u \tilde{x}_t^\top + B \tilde{P}_{t-1} B^\top + B \tilde{x}_{t-1} u^\top + u \tilde{x}_{t-1}^\top B^\top + u u^\top \Big) \qquad (37)$$
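A sketch of the Q update (37), with the same illustrative moment arrays as in the u and B sketches:

```python
import numpy as np

def update_Q(xtT, P, Ptt1, B, u):
    # Q_{j+1}: term-by-term transcription of equation (37)
    T = xtT.shape[0] - 1
    S = np.zeros_like(P[0])
    for t in range(1, T + 1):
        S += (P[t] - Ptt1[t] @ B.T - B @ Ptt1[t].T
              - np.outer(xtT[t], u) - np.outer(u, xtT[t])
              + B @ P[t - 1] @ B.T + B @ np.outer(xtT[t - 1], u)
              + np.outer(u, xtT[t - 1]) @ B.T + np.outer(u, u))
    return S / T
```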

This derivation immediately generalizes to the case where $Q$ is a block-diagonal matrix:

$$Q = \begin{bmatrix} q_{1,1} & q_{1,2} & q_{1,3} & 0 & 0 & 0 & 0 & 0 \\ q_{1,2} & q_{2,2} & q_{2,3} & 0 & 0 & 0 & 0 & 0 \\ q_{1,3} & q_{2,3} & q_{3,3} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & q_{4,4} & q_{4,5} & 0 & 0 & 0 \\ 0 & 0 & 0 & q_{4,5} & q_{5,5} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & q_{6,6} & q_{6,7} & q_{6,8} \\ 0 & 0 & 0 & 0 & 0 & q_{6,7} & q_{7,7} & q_{7,8} \\ 0 & 0 & 0 & 0 & 0 & q_{6,8} & q_{7,8} & q_{8,8} \end{bmatrix} = \begin{bmatrix} Q_1 & 0 & 0 \\ 0 & Q_2 & 0 \\ 0 & 0 & Q_3 \end{bmatrix}$$

In this case,

$$Q_{i,j+1} = \Big[ \frac{1}{T} \sum_{t=1}^{T} \Big( \tilde{P}_t - \tilde{P}_{t,t-1} B^\top - B \tilde{P}_{t,t-1}^\top - \tilde{x}_t u^\top - u \tilde{x}_t^\top + B \tilde{P}_{t-1} B^\top + B \tilde{x}_{t-1} u^\top + u \tilde{x}_{t-1}^\top B^\top + u u^\top \Big) \Big]_i \qquad (38)$$

where the subscript $i$ means take the elements of the matrix in the big parentheses that are analogous to $Q_i$; take the whole part within the parentheses (not the individual matrices inside the parentheses). If $Q_i$ is comprised of rows $a$ to $b$ and columns $c$ to $d$ of matrix $Q$, then take rows $a$ to $b$ and columns $c$ to $d$ of the matrices subscripted by $i$ in equation (38).

By the way, $Q$ is never really unconstrained, since it is a variance-covariance matrix and the upper and lower triangles are shared. However, because the shared values are only the symmetric values in the matrix, the derivation still works even though it's technically incorrect (Henderson and Searle, 1979). The constrained update equation for $Q$ shown in section 5.8 explicitly deals with the shared lower and upper triangles.

3.5 Update equation for a (unconstrained)

Take the derivative of $\Psi$ with respect to $a$, where $a$ is an $n \times 1$ matrix. Terms not involving $a$ equal 0 and drop out.

$$\partial \Psi / \partial a = -\frac{1}{2} \sum_{t=1}^{T} \Big( -\partial E[Y_t^\top R^{-1} a]/\partial a - \partial E[a^\top R^{-1} Y_t]/\partial a + \partial E[(Z X_t)^\top R^{-1} a]/\partial a + \partial E[a^\top R^{-1} Z X_t]/\partial a + \partial E[a^\top R^{-1} a]/\partial a \Big) \qquad (39)$$

The expectations around constants can be dropped[10]. Using relations (15) and (19) and using $R^{-1} = (R^{-1})^\top$, we then have

$$\partial \Psi / \partial a = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[Y_t]^\top R^{-1} - E[Y_t]^\top R^{-1} + E[Z X_t]^\top R^{-1} + E[Z X_t]^\top R^{-1} + 2 a^\top R^{-1} \Big) \qquad (40)$$

Pull the parameters out of the expectations, use $(ab)^\top = b^\top a^\top$ and $R^{-1} = (R^{-1})^\top$ where needed, and remove the $-1/2$ to get

$$\partial \Psi / \partial a = \sum_{t=1}^{T} \Big( E[Y_t]^\top R^{-1} - E[X_t]^\top Z^\top R^{-1} - a^\top R^{-1} \Big) \qquad (41)$$

Set the left side to zero (a $1 \times n$ matrix of zeros), take the transpose, and cancel out $R^{-1}$ by multiplying by $R$, giving

$$0 = \sum_{t=1}^{T} \Big( E[Y_t] - Z\, E[X_t] - a \Big) = \sum_{t=1}^{T} \big( \tilde{y}_t - Z \tilde{x}_t \big) - T a \qquad (42)$$

Solving for $a$ gives us the update equation for $a$:

$$a_{j+1} = \frac{1}{T} \sum_{t=1}^{T} \big( \tilde{y}_t - Z \tilde{x}_t \big) \qquad (43)$$

[10] Because $E_{XY}[C] = C$, where $C$ is a constant.
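The a update (43) is equally short in code. Here ytT[t] stands for E[Y_t|y] (with no missing data this is just y_t), stored for t = 1..T from index 0, while states remain indexed 0..T; the names are illustrative.

```python
import numpy as np

def update_a(ytT, xtT, Z):
    # a_{j+1} = (1/T) * sum_t (y~_t - Z x~_t), equation (43)
    T = ytT.shape[0]
    return sum(ytT[t] - Z @ xtT[t + 1] for t in range(T)) / T
```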

3.6 The update equation for Z (unconstrained)

Take the derivative of $\Psi$ with respect to $Z$. Terms not involving $Z$ equal 0 and drop out. The expectations around terms involving only constants have been dropped. (Note: $\partial \Psi / \partial Z$ is $m \times n$, while $Z$ is $n \times m$.)

$$\partial \Psi / \partial Z = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[\partial (Y_t^\top R^{-1} Z X_t)/\partial Z] - E[\partial ((Z X_t)^\top R^{-1} Y_t)/\partial Z] + E[\partial ((Z X_t)^\top R^{-1} Z X_t)/\partial Z] + E[\partial ((Z X_t)^\top R^{-1} a)/\partial Z] + E[\partial (a^\top R^{-1} Z X_t)/\partial Z] \Big) = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[\partial (Y_t^\top R^{-1} Z X_t)/\partial Z] - E[\partial (X_t^\top Z^\top R^{-1} Y_t)/\partial Z] + E[\partial (X_t^\top Z^\top R^{-1} Z X_t)/\partial Z] + E[\partial (X_t^\top Z^\top R^{-1} a)/\partial Z] + E[\partial (a^\top R^{-1} Z X_t)/\partial Z] \Big) \qquad (44)$$

Using the matrix derivative relations (Table 2) and using $R^{-1} = (R^{-1})^\top$, we get

$$\partial \Psi / \partial Z = -\frac{1}{2} \sum_{t=1}^{T} \Big( -E[X_t Y_t^\top R^{-1}] - E[X_t Y_t^\top R^{-1}] + 2 E[X_t X_t^\top Z^\top R^{-1}] + E[X_t a^\top R^{-1}] + E[X_t a^\top R^{-1}] \Big) \qquad (45)$$

Pulling the parameters out of the expectations and getting rid of the $-1/2$, we have

$$\partial \Psi / \partial Z = \sum_{t=1}^{T} \Big( E[X_t Y_t^\top] R^{-1} - E[X_t X_t^\top] Z^\top R^{-1} - E[X_t] a^\top R^{-1} \Big) \qquad (46)$$

Set the left side to zero (an $m \times n$ matrix of zeros), transpose it all, and cancel out $R^{-1}$ by multiplying by $R$ on the left, to give

$$0 = \sum_{t=1}^{T} \Big( E[Y_t X_t^\top] - Z\, E[X_t X_t^\top] - a\, E[X_t]^\top \Big) = \sum_{t=1}^{T} \Big( \widetilde{yx}_t - Z \tilde{P}_t - a \tilde{x}_t^\top \Big) \qquad (47)$$

Solving for $Z$, and noting that $\sum \tilde{P}_t$ is invertible, gives us the new $Z$:

$$Z_{j+1} = \Big( \sum_{t=1}^{T} \big( \widetilde{yx}_t - a \tilde{x}_t^\top \big) \Big) \Big( \sum_{t=1}^{T} \tilde{P}_t \Big)^{-1} \qquad (48)$$

3.7 The update equation for R (unconstrained)

Take the derivative of $\Psi$ with respect to $R$. Terms not involving $R$ equal 0 and drop out. The expectations around terms involving constants have been removed.

$$\partial \Psi / \partial R = -\frac{1}{2} \sum_{t=1}^{T} \Big( E[\partial (Y_t^\top R^{-1} Y_t)/\partial R] - E[\partial (Y_t^\top R^{-1} Z X_t)/\partial R] - E[\partial ((Z X_t)^\top R^{-1} Y_t)/\partial R] - E[\partial (Y_t^\top R^{-1} a)/\partial R] - E[\partial (a^\top R^{-1} Y_t)/\partial R] + E[\partial ((Z X_t)^\top R^{-1} Z X_t)/\partial R] + E[\partial ((Z X_t)^\top R^{-1} a)/\partial R] + E[\partial (a^\top R^{-1} Z X_t)/\partial R] + \partial (a^\top R^{-1} a)/\partial R \Big) - \frac{T}{2}\, \partial \log |R| / \partial R \qquad (49)$$

We use relations (20) and (17) to do the differentiation. Notice that all the terms in the summation are of the form $c^\top R^{-1} b$; thus, after differentiation, we group all the $c b^\top$ inside one set of parentheses. Also,
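A sketch of the Z update (48); yxtT[t] stands for E[Y_t X_t'|y], which with no missing data is the outer product of y_t and the smoothed state. The indexing convention follows the earlier sketches.

```python
import numpy as np

def update_Z(yxtT, xtT, P, a):
    # Z_{j+1} = [sum_t (yx~_t - a x~_t')] [sum_t P~_t]^{-1}, equation (48)
    T = yxtT.shape[0]
    num = sum(yxtT[t] - np.outer(a, xtT[t + 1]) for t in range(T))
    den = sum(P[t + 1] for t in range(T))
    return num @ np.linalg.inv(den)
```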

there is a minus sign that comes from equation (20), and it cancels out the minus in front of $-1/2$.

$$\partial \Psi / \partial R = \frac{1}{2} \sum_{t=1}^{T} R^{-1} \Big( E[Y_t Y_t^\top] - E[Y_t (Z X_t)^\top] - E[Z X_t Y_t^\top] - E[Y_t a^\top] - E[a Y_t^\top] + E[Z X_t (Z X_t)^\top] + E[Z X_t a^\top] + E[a (Z X_t)^\top] + a a^\top \Big) R^{-1} - \frac{T}{2} R^{-1} \qquad (50)$$

Pulling the parameters out of the expectations and using $(Z X_t)^\top = X_t^\top Z^\top$, we have

$$\partial \Psi / \partial R = \frac{1}{2} \sum_{t=1}^{T} R^{-1} \Big( E[Y_t Y_t^\top] - E[Y_t X_t^\top] Z^\top - Z E[X_t Y_t^\top] - E[Y_t] a^\top - a E[Y_t]^\top + Z E[X_t X_t^\top] Z^\top + Z E[X_t] a^\top + a E[X_t]^\top Z^\top + a a^\top \Big) R^{-1} - \frac{T}{2} R^{-1} \qquad (51)$$

We rewrite the partial derivative in terms of the expectations:

$$\partial \Psi / \partial R = \frac{1}{2} \sum_{t=1}^{T} R^{-1} \Big( \tilde{O}_t - \widetilde{yx}_t Z^\top - Z \widetilde{yx}_t^\top - \tilde{y}_t a^\top - a \tilde{y}_t^\top + Z \tilde{P}_t Z^\top + Z \tilde{x}_t a^\top + a \tilde{x}_t^\top Z^\top + a a^\top \Big) R^{-1} - \frac{T}{2} R^{-1} \qquad (52)$$

Setting this to zero (an $n \times n$ matrix of zeros), we cancel out $R^{-1}$ by multiplying by $R$ twice, once on the left and once on the right, and get rid of the $1/2$:

$$T R = \sum_{t=1}^{T} \Big( \tilde{O}_t - \widetilde{yx}_t Z^\top - Z \widetilde{yx}_t^\top - \tilde{y}_t a^\top - a \tilde{y}_t^\top + Z \tilde{P}_t Z^\top + Z \tilde{x}_t a^\top + a \tilde{x}_t^\top Z^\top + a a^\top \Big) \qquad (53)$$

We can then solve for $R$, giving us the new $R$ that maximizes $\Psi$:

$$R_{j+1} = \frac{1}{T} \sum_{t=1}^{T} \Big( \tilde{O}_t - \widetilde{yx}_t Z^\top - Z \widetilde{yx}_t^\top - \tilde{y}_t a^\top - a \tilde{y}_t^\top + Z \tilde{P}_t Z^\top + Z \tilde{x}_t a^\top + a \tilde{x}_t^\top Z^\top + a a^\top \Big) \qquad (54)$$

As with $Q$, this derivation immediately generalizes to a block-diagonal matrix:

$$R = \begin{bmatrix} R_1 & 0 & 0 \\ 0 & R_2 & 0 \\ 0 & 0 & R_3 \end{bmatrix}$$

In this case,

$$R_{i,j+1} = \Big[ \frac{1}{T} \sum_{t=1}^{T} \Big( \tilde{O}_t - \widetilde{yx}_t Z^\top - Z \widetilde{yx}_t^\top - \tilde{y}_t a^\top - a \tilde{y}_t^\top + Z \tilde{P}_t Z^\top + Z \tilde{x}_t a^\top + a \tilde{x}_t^\top Z^\top + a a^\top \Big) \Big]_i \qquad (55)$$

where the subscript $i$ means we take the elements in the matrix in the big parentheses that are analogous to $R_i$. If $R_i$ is comprised of rows $a$ to $b$ and columns $c$ to $d$ of matrix $R$, then we take rows $a$ to $b$ and columns $c$ to $d$ of the matrix subscripted by $i$ in equation (55).

3.8 Update equation for ξ and Λ (unconstrained), stochastic initial state

Shumway and Stoffer (2006) and Ghahramani and Hinton (1996) imply in their discussion of the EM algorithm that both $\xi$ and $\Lambda$ can be estimated (though not simultaneously). Harvey (1989), however, discusses that there are only two allowable cases: $x_0$ is treated as fixed ($\Lambda = 0$) and equal to the unknown parameter $\xi$, or $x_0$ is treated as stochastic with a known mean $\xi$ and variance $\Lambda$. For completeness, we show here the update equation in the case of $x_0$ stochastic with unknown mean $\xi$ and variance $\Lambda$ (a case that Harvey (1989) says is not consistent).
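And a sketch of the R update (54), with O[t] standing for E[Y_t Y_t'|y]; as before, these expectations come from section 6, and with no missing data they reduce to functions of y_t and the smoother output.

```python
import numpy as np

def update_R(ytT, O, yxtT, xtT, P, Z, a):
    # R_{j+1}: term-by-term transcription of equation (54)
    T = ytT.shape[0]
    S = np.zeros_like(O[0])
    for t in range(T):                      # data indexed t = 1..T, stored from 0
        xt = xtT[t + 1]                     # states indexed 0..T
        S += (O[t] - yxtT[t] @ Z.T - Z @ yxtT[t].T
              - np.outer(ytT[t], a) - np.outer(a, ytT[t])
              + Z @ P[t + 1] @ Z.T + Z @ np.outer(xt, a)
              + np.outer(a, xt) @ Z.T + np.outer(a, a))
    return S / T
```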

We proceed as before and solve for the new $\xi$ by maximizing $\Psi$. Take the derivative of $\Psi$ with respect to $\xi$. Terms not involving $\xi$ equal 0 and drop out.

$$\partial \Psi / \partial \xi = -\frac{1}{2} \Big( -\partial E[\xi^\top \Lambda^{-1} X_0]/\partial \xi - \partial E[X_0^\top \Lambda^{-1} \xi]/\partial \xi + \partial (\xi^\top \Lambda^{-1} \xi)/\partial \xi \Big) \qquad (56)$$

Using relations (15) and (19) and using $\Lambda^{-1} = (\Lambda^{-1})^\top$, we have

$$\partial \Psi / \partial \xi = -\frac{1}{2} \Big( -E[X_0^\top \Lambda^{-1}] - E[X_0^\top \Lambda^{-1}] + 2 \xi^\top \Lambda^{-1} \Big) \qquad (57)$$

Pulling the parameters out of the expectations, we get

$$\partial \Psi / \partial \xi = E[X_0]^\top \Lambda^{-1} - \xi^\top \Lambda^{-1} \qquad (58)$$

We then set the left side to zero, take the transpose, and cancel out $\Lambda^{-1}$ by noting that it is a variance-covariance matrix and is invertible. Thus,

$$0 = \Lambda^{-1} E[X_0] - \Lambda^{-1} \xi = \tilde{x}_0 - \xi \qquad (59)$$

$$\xi_{j+1} = \tilde{x}_0 \qquad (60)$$

$\tilde{x}_0$ is the expected value of $X_0$ conditioned on the data from $t = 1$ to $T$, which comes from the Kalman smoother recursions with initial conditions defined as $E[X_0 \mid Y_0 = y_0] \equiv \xi$ and $\operatorname{var}[X_0 \mid Y_0 = y_0] \equiv \Lambda$. A similar set of steps gets us to the update equation for $\Lambda$:

$$\Lambda_{j+1} = \tilde{V}_0 \qquad (61)$$

$\tilde{V}_0$ is the variance of $X_0$ conditioned on the data from $t = 1$ to $T$ and is an output from the Kalman smoother recursions. If the initial state is defined at $t = 1$ instead of $t = 0$, the update equation is derived in an identical fashion and the update equation is similar:

$$\xi_{j+1} = \tilde{x}_1 \qquad (62)$$
$$\Lambda_{j+1} = \tilde{V}_1 \qquad (63)$$

These are output from the Kalman smoother recursions with initial conditions defined as $E[X_1 \mid Y_0 = y_0] \equiv \xi$ and $\operatorname{var}[X_1 \mid Y_0 = y_0] \equiv \Lambda$. Notice that the recursions are initialized slightly differently; you will see the Kalman filter and smoother equations presented with both types of initializations, depending on whether the author defines the initial state at $t = 0$ or $t = 1$.

3.9 Update equation for ξ (unconstrained), fixed x₀

For the case where $x_0$ is treated as fixed, i.e. as another parameter, there is no $\Lambda$, and we need to maximize $\Psi$ via $\partial \Psi / \partial \xi$ using the slightly different $\Psi$ shown in equation (7). Now $\xi$ appears in the state-equation part of the likelihood.

$$\partial \Psi / \partial \xi = -\frac{1}{2} \Big( -E[\partial (X_1^\top Q^{-1} B \xi)/\partial \xi] - E[\partial ((B \xi)^\top Q^{-1} X_1)/\partial \xi] + E[\partial ((B \xi)^\top Q^{-1} B \xi)/\partial \xi] + E[\partial ((B \xi)^\top Q^{-1} u)/\partial \xi] + E[\partial (u^\top Q^{-1} B \xi)/\partial \xi] \Big) = -\frac{1}{2} \Big( -E[\partial (X_1^\top Q^{-1} B \xi)/\partial \xi] - E[\partial (\xi^\top B^\top Q^{-1} X_1)/\partial \xi] + E[\partial (\xi^\top B^\top Q^{-1} B \xi)/\partial \xi] + E[\partial (\xi^\top B^\top Q^{-1} u)/\partial \xi] + E[\partial (u^\top Q^{-1} B \xi)/\partial \xi] \Big) \qquad (64)$$

After pulling the constants out of the expectations, we use relations (16) and (18) to take the derivative:

$$\partial \Psi / \partial \xi = -\frac{1}{2} \Big( -E[X_1]^\top Q^{-1} B - E[X_1]^\top Q^{-1} B + 2 \xi^\top B^\top Q^{-1} B + u^\top Q^{-1} B + u^\top Q^{-1} B \Big) \qquad (65)$$

This can be reduced to

$$\partial \Psi / \partial \xi = E[X_1]^\top Q^{-1} B - \xi^\top B^\top Q^{-1} B - u^\top Q^{-1} B \qquad (66)$$

To solve for $\xi$, set the left side to zero (a $1 \times m$ matrix of zeros), transpose the whole equation, and then cancel out $B^\top Q^{-1} B$ by multiplying by its inverse on the left, and solve for $\xi$. This step requires that this inverse exists.

$$\xi = (B^\top Q^{-1} B)^{-1} B^\top Q^{-1} \big( E[X_1] - u \big) \qquad (67)$$

Thus, in terms of the Kalman filter/smoother output, the new $\xi$ for EM iteration $j + 1$ is

$$\xi_{j+1} = (B^\top Q^{-1} B)^{-1} B^\top Q^{-1} \big( \tilde{x}_1 - u \big) \qquad (68)$$

Note that using $\tilde{x}_0$ output from the Kalman smoother would not work, since $\Lambda = 0$. As a result, $\xi_{j+1} = \xi_j$ in the EM algorithm, and it would be impossible to move away from your starting condition for $\xi$. This is conceptually similar to using a generalized least squares estimate of $\xi$ to concentrate it out of the likelihood, as discussed in Harvey (1989). However, in the context of the EM algorithm, dealing with the fixed-$x_0$ case requires nothing special; one simply takes care to use the likelihood for the case where $x_0$ is treated as an unknown parameter (equation 7). For the other parameters, the update equations are the same whether one uses the log-likelihood equation with $x_0$ treated as stochastic (equation 6) or fixed (equation 7).

If your MARSS model is stationary[11] and your data appear stationary, however, equation (67) probably is not what you want to use. The estimate of $\xi$ will be the maximum-likelihood value, but it will not be drawn from the stationary distribution; instead it could be some wildly different value that happens to give the maximum likelihood. If you are modeling the data as stationary, then you should probably assume that $\xi$ is drawn from the stationary distribution of the $X$'s, which is some function of your model parameters. This would mean that the model parameters would enter the part of the likelihood that involves $\xi$ and $\Lambda$. Since you probably don't want to do that (it might start to get circular), you might try an iterative process to get decent $\xi$ and $\Lambda$, or try fixing $\xi$ and estimating $\Lambda$ (see above). You can fix $\xi$ at, say, zero by making sure the model you fit has a stationary distribution with mean zero. You might also need to demean your data or estimate the $a$ term to account for non-zero-mean data. A second approach is to estimate $x_1$ as the initial state instead of $x_0$.

3.10 Update equation for ξ (unconstrained), fixed x₁

In some cases, the estimate of $x_0$ from $x_1$ using equation (68) will be highly sensitive to small changes in the parameters. This is particularly the case for certain $B$ matrices, even if they are stationary. The result is that your $\xi$ estimate is wildly different from the data at $t = 1$. The estimates are correct given how you defined the model, just not realistic given the data. In this case, you can specify $\xi$ as being the value of $x$ at $t = 1$ instead of $t = 0$. That way, the data at $t = 1$ will constrain the estimated $\xi$. In this case, we treat $x_1$ as a fixed but unknown parameter $\xi$. The likelihood is then:

$$\log L(y, x; \Theta) = -\frac{1}{2} \sum_{t=1}^{T} (y_t - Z x_t - a)^\top R^{-1} (y_t - Z x_t - a) - \frac{T}{2} \log |R| - \frac{1}{2} \sum_{t=2}^{T} (x_t - B x_{t-1} - u)^\top Q^{-1} (x_t - B x_{t-1} - u) - \frac{T-1}{2} \log |Q| \qquad (69)$$

$$\partial \Psi / \partial \xi = -\frac{1}{2} \Big( -E[\partial (Y_1^\top R^{-1} Z \xi)/\partial \xi] - E[\partial ((Z \xi)^\top R^{-1} Y_1)/\partial \xi] + E[\partial ((Z \xi)^\top R^{-1} Z \xi)/\partial \xi] + E[\partial ((Z \xi)^\top R^{-1} a)/\partial \xi] + E[\partial (a^\top R^{-1} Z \xi)/\partial \xi] \Big) - \frac{1}{2} \Big( -E[\partial (X_2^\top Q^{-1} B \xi)/\partial \xi] - E[\partial ((B \xi)^\top Q^{-1} X_2)/\partial \xi] + E[\partial ((B \xi)^\top Q^{-1} B \xi)/\partial \xi] + E[\partial ((B \xi)^\top Q^{-1} u)/\partial \xi] + E[\partial (u^\top Q^{-1} B \xi)/\partial \xi] \Big) \qquad (70)$$

[11] Meaning the $X$'s have a stationary distribution.

Note that the second summation starts at $t = 2$, and $\xi$ is $x_1$ instead of $x_0$. After pulling the constants out of the expectations, we use relations (16) and (18) to take the derivative:

$$\partial \Psi / \partial \xi = -\frac{1}{2} \Big( -E[Y_1]^\top R^{-1} Z - E[Y_1]^\top R^{-1} Z + 2 \xi^\top Z^\top R^{-1} Z + a^\top R^{-1} Z + a^\top R^{-1} Z \Big) - \frac{1}{2} \Big( -E[X_2]^\top Q^{-1} B - E[X_2]^\top Q^{-1} B + 2 \xi^\top B^\top Q^{-1} B + u^\top Q^{-1} B + u^\top Q^{-1} B \Big) \qquad (71)$$

This can be reduced to

$$\partial \Psi / \partial \xi = E[Y_1]^\top R^{-1} Z - \xi^\top Z^\top R^{-1} Z - a^\top R^{-1} Z + E[X_2]^\top Q^{-1} B - \xi^\top B^\top Q^{-1} B - u^\top Q^{-1} B = -\xi^\top \big( Z^\top R^{-1} Z + B^\top Q^{-1} B \big) + E[Y_1]^\top R^{-1} Z - a^\top R^{-1} Z + E[X_2]^\top Q^{-1} B - u^\top Q^{-1} B \qquad (72)$$

To solve for $\xi$, set the left side to zero (a $1 \times m$ matrix of zeros), transpose the whole equation, and solve for $\xi$:

$$\xi = \big( Z^\top R^{-1} Z + B^\top Q^{-1} B \big)^{-1} \Big( Z^\top R^{-1} \big( E[Y_1] - a \big) + B^\top Q^{-1} \big( E[X_2] - u \big) \Big) \qquad (73)$$

Thus, when $\xi \equiv x_1$, the new $\xi$ for EM iteration $j + 1$ is

$$\xi_{j+1} = \big( Z^\top R^{-1} Z + B^\top Q^{-1} B \big)^{-1} \Big( Z^\top R^{-1} \big( \tilde{y}_1 - a \big) + B^\top Q^{-1} \big( \tilde{x}_2 - u \big) \Big) \qquad (74)$$

4 The time-varying MARSS model with linear constraints

The first part of this report dealt with the case of a MARSS model (equation 1) where the parameters are time-constant and where all the elements in a parameter matrix are estimated with no constraints. I will now describe the derivation of an EM algorithm to solve a much more general MARSS model (equation 75), which is a time-varying MARSS model where the MARSS parameter matrices are written as a linear equation $f + Dm$. This is a very general form of a MARSS model, of which many (most) multivariate autoregressive Gaussian models are a special case. This general MARSS model includes as special cases MARSS models with covariates (many VARSS models with exogenous variables), multivariate AR lag-p models and multivariate moving average models, and MARSS models with linear constraints placed on the elements within the model parameters. The objective is to derive one EM algorithm for the whole class, and thus a uniform approach to fitting these models.

The time-varying MARSS model is written:

$$x_t = B_t x_{t-1} + u_t + G_t w_t, \quad \text{where } W_t \sim \operatorname{MVN}(0, Q_t) \qquad (75a)$$
$$y_t = Z_t x_t + a_t + H_t v_t, \quad \text{where } V_t \sim \operatorname{MVN}(0, R_t) \qquad (75b)$$
$$x_{t_0} = \xi + F l, \quad \text{where } t_0 = 0 \text{ or } t_0 = 1 \qquad (75c)$$
$$l \sim \operatorname{MVN}(0, \Lambda) \qquad (75d)$$
$$\begin{bmatrix} w_t \\ v_t \end{bmatrix} \sim \operatorname{MVN}(0, \Sigma), \quad \Sigma = \begin{bmatrix} Q_t & 0 \\ 0 & R_t \end{bmatrix} \qquad (75e)$$

This looks quite similar to the previous non-time-varying MARSS model, but now the model parameters, $B$, $u$, $Q$, $Z$, $a$ and $R$, have a $t$ subscript, and we have multiplier matrices on the error terms $w_t$, $v_t$, $l$. The $G_t$ multiplier is $m \times s$, so we now have $s$ state errors instead of $m$. The $H_t$ multiplier is $n \times k$, so we now have $k$ observation errors instead of $n$. The $F$ multiplier is $m \times j$, so now we can have some initial states ($j$ of them) be stochastic and others be fixed. I assume that appropriate constraints are put on $G$ and $H$ so that the resulting MARSS model is not under- or over-constrained[12]. The notation/presentation here was influenced by SJ Koopman's work, esp. Koopman and Ooms (2011) and Koopman (1993), but in these works, $Q_t$ and $R_t$ equal $I$ and the variance-covariance structures are instead specified only by $H_t$ and $G_t$. I keep $Q_t$ and $R_t$ in my formulation, as it seems more intuitive to me in the context of the EM algorithm and the required joint likelihood function.

[12] For example, if both $G_t$ and $H_t$ are column vectors, then the system is over-constrained and has no solution.

We can rewrite this MARSS model using vec relationships (Table 3):

$$x_t = (x_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t) + \operatorname{vec}(u_t) + G_t w_t, \quad W_t \sim \operatorname{MVN}(0, Q_t)$$
$$y_t = (x_t^\top \otimes I_n)\operatorname{vec}(Z_t) + \operatorname{vec}(a_t) + H_t v_t, \quad V_t \sim \operatorname{MVN}(0, R_t) \qquad (76)$$
$$x_0 = \xi + F l, \quad L \sim \operatorname{MVN}(0, \Lambda)$$

Each model parameter, $B_t$, $u_t$, $Q_t$, $Z_t$, $a_t$, and $R_t$, is written as a time-varying linear model, $f_t + D_t m$, where $f_t$ and $D_t$ are fully known (not estimated and no missing values) and $m$ is a column vector of the estimated elements of the parameter matrix:

$$\operatorname{vec}(B_t) = f_{t,b} + D_{t,b}\beta \qquad \operatorname{vec}(u_t) = f_{t,u} + D_{t,u}\upsilon \qquad \operatorname{vec}(Q_t) = f_{t,q} + D_{t,q}q$$
$$\operatorname{vec}(Z_t) = f_{t,z} + D_{t,z}\zeta \qquad \operatorname{vec}(a_t) = f_{t,a} + D_{t,a}\alpha \qquad \operatorname{vec}(R_t) = f_{t,r} + D_{t,r}r \qquad (77)$$
$$\operatorname{vec}(\Lambda) = f_\lambda + D_\lambda \lambda \qquad \operatorname{vec}(\xi) = f_\xi + D_\xi p$$

The estimated parameters are now the column vectors $\beta$, $\upsilon$, $q$, $\zeta$, $\alpha$, $r$, $p$ and $\lambda$. The time-varying aspect comes from the time-varying $f_t$ and $D_t$. Note that variance-covariance matrices must be positive-definite, and we cannot specify a form that cannot be estimated. Fixing the diagonal terms and estimating the off-diagonals would not be allowed. Thus the $f$ and $D$ terms for $Q$, $R$ and $\Lambda$ are limited. For the other parameters, the forms are fairly unrestricted, except that the $D$s need to be full rank so that we are not specifying an under-constrained model. Full rank will imply that we are not trying to estimate confounded matrix elements; for example, trying to estimate $a_1$ and $a_2$ when only $a_1 + a_2$ appears in the model.

The temporally variable MARSS model, equation (76) together with (77), looks rather different than other temporally variable MARSS models, such as a VARSSX or MARSS-with-covariates model, in the literature. But those models are special cases of this equation. By deriving an EM algorithm for this more general (if unfamiliar) form, I then have an algorithm for many different types of time-varying MARSS models with linear constraints on the parameter elements. Below I show some examples.

4.1 MARSS model with linear constraints

We can use equation (76) to put linear constraints on the elements of the parameters $B$, $u$, $Q$, $Z$, $a$, $R$, $\xi$ and $\Lambda$. Here is an example of a simple MARSS model with linear constraints:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{t-1} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t, \qquad \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}_t \sim \operatorname{MVN}\!\left( \begin{bmatrix} 0.1 \\ u \end{bmatrix}, \begin{bmatrix} q_{1,1} & q_{1,2} \\ q_{1,2} & q_{2,2} \end{bmatrix} \right)$$

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}_t = \begin{bmatrix} c & 3c + 2 + d \\ c & d \\ c + e + 2 & e \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_t + \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t, \qquad \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}_t \sim \operatorname{MVN}\!\left( \begin{bmatrix} a_1 \\ a_2 \\ 0 \end{bmatrix}, \begin{bmatrix} r & 0 & 0 \\ 0 & r & 0 \\ 0 & 0 & r \end{bmatrix} \right)$$

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_0 \sim \operatorname{MVN}\!\left( \begin{bmatrix} \pi \\ \pi \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right)$$

Linear constraints mean that elements of a matrix may be fixed to a specific numerical value or specified as a linear combination of values, which can be shared within a matrix but not shared between matrices.

Let's say we have some parameter matrix $M$ (here $M$ could be any of the parameters in the MARSS model), where each matrix element is written as a linear model of some potentially shared values:

$$M = \begin{bmatrix} a + 2c & c \\ a & 0 \\ 3c + 2 & b \end{bmatrix}$$

Thus each $i$-th element in $M$ can be written as $\beta_i + \beta_{a,i} a + \beta_{b,i} b + \beta_{c,i} c$, which is a linear combination of three estimated values $a$, $b$ and $c$. The matrix $M$ can be rewritten in terms of a $\beta_i$ part and the part involving the $\beta_{\cdot,j}$'s:

$$M = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 2 & 0 \end{bmatrix} + \begin{bmatrix} a + 2c & c \\ a & 0 \\ 3c & b \end{bmatrix} = M_{\text{fixed}} + M_{\text{free}}$$

The vec function turns any matrix into a column vector by stacking the columns on top of each other. Thus,

$$\operatorname{vec}(M) = \begin{bmatrix} a + 2c \\ a \\ 3c + 2 \\ c \\ 0 \\ b \end{bmatrix}$$

We can now write $\operatorname{vec}(M)$ as a linear combination of $f = \operatorname{vec}(M_{\text{fixed}})$ and $Dm = \operatorname{vec}(M_{\text{free}})$. $m$ is the $p \times 1$ column vector of the $p$ free values; in this case $p = 3$, and the free values are $a$, $b$, $c$. $D$ is a design matrix that translates $m$ into $\operatorname{vec}(M_{\text{free}})$. For example,

$$\operatorname{vec}(M) = \begin{bmatrix} a + 2c \\ a \\ 3c + 2 \\ c \\ 0 \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 2 \\ 1 & 0 & 0 \\ 0 & 0 & 3 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = f + Dm$$

There are constraints on $D$. Your $D$ matrix needs to describe a solvable linear set of equations. Basically, it needs to be full rank (rank $p$, where $p$ is the number of columns in $D$, i.e. the number of free values you are trying to estimate), so that you can estimate each of the $p$ free values. For example, if $a + b$ always appeared together, then $a + b$ can be estimated but not $a$ and $b$ separately. Note, if $M$ is fixed, then $D$ is undefined, but that is fine because in this case there will be no update equation needed; you just use the fixed value of $M$ in the algorithm.

4.2 A MARSS model with exogenous variables

The following is a commonly seen MARSS model with covariates $c_t$ and $d_t$ appearing as additive elements:

$$x_t = B x_{t-1} + C c_t + w_t$$
$$y_t = Z x_t + D d_t + v_t$$

Here, $D$ is the effect of $d_t$ on $y_t$ (not a design matrix, which would have a subscript). We would typically want to estimate $C$ or $D$, which are the influence of our covariates on our responses, $x$ or $y$. Let's say there are $p$ covariates in $c_t$ and $q$ covariates in $d_t$. Then we can write the above in vec form:

$$x_t = (x_{t-1}^\top \otimes I_m)\operatorname{vec}(B) + (c_t^\top \otimes I_m)\operatorname{vec}(C) + w_t$$
$$y_t = (x_t^\top \otimes I_n)\operatorname{vec}(Z) + (d_t^\top \otimes I_n)\operatorname{vec}(D) + v_t \qquad (88)$$
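To see the f + Dm decomposition in code, the sketch below builds f and D for the example M above and checks the identity numerically; the numeric values chosen for a, b, c are arbitrary.

```python
import numpy as np

a, b, c = 0.5, -1.0, 2.0
M = np.array([[a + 2*c, c],
              [a,       0.0],
              [3*c + 2, b]])
f = np.array([0.0, 0.0, 2.0, 0.0, 0.0, 0.0])   # vec(M_fixed)
D = np.array([[1.0, 0.0, 2.0],                 # row for M[0,0] = a + 2c
              [1.0, 0.0, 0.0],                 # M[1,0] = a
              [0.0, 0.0, 3.0],                 # M[2,0] = 3c + 2 (the +2 lives in f)
              [0.0, 0.0, 1.0],                 # M[0,1] = c
              [0.0, 0.0, 0.0],                 # M[1,1] = 0
              [0.0, 1.0, 0.0]])                # M[2,1] = b
m = np.array([a, b, c])
vecM = M.flatten(order="F")                    # vec stacks columns
print(np.allclose(vecM, f + D @ m))            # True
```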

Table 3: Kronecker and vec relations. Here $A$ is $n \times m$, $B$ is $m \times p$, $C$ is $p \times q$, and $E$ and $D$ are $p \times p$. $a$ is an $m \times 1$ column vector and $b$ is a $p \times 1$ column vector. The symbol $\otimes$ stands for the Kronecker product: $A \otimes C$ is an $np \times mq$ matrix. The identity matrix, $I_n$, is an $n \times n$ diagonal matrix with ones on the diagonal.

$\operatorname{vec}(a) = \operatorname{vec}(a^\top) = a$. The vec of a column vector (or its transpose) is itself. (78)
$\operatorname{vec}(Aa) = (a^\top \otimes I_n)\operatorname{vec}(A) = Aa$, since $Aa$ is itself a column vector. (79)
$\operatorname{vec}(AB) = (I_p \otimes A)\operatorname{vec}(B) = (B^\top \otimes I_n)\operatorname{vec}(A)$ (80)
$\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$ (81)
$\operatorname{vec}(a^\top B a) = a^\top B a = (a^\top \otimes a^\top)\operatorname{vec}(B)$ (82)
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, where dimensions allow
$(A \otimes B) + (A \otimes C) = A \otimes (B + C)$
$(a \otimes I_p) C = a \otimes C$; $\quad C (a^\top \otimes I_q) = a^\top \otimes C$; $\quad E (a^\top \otimes D) = ED (a^\top \otimes I_p) = a^\top \otimes ED$
$(a \otimes I_p)\, C\, (b^\top \otimes I_q) = (a b^\top) \otimes C$ (85)
$a \otimes b = \operatorname{vec}(b a^\top)$; $\quad a^\top \otimes b^\top = (a \otimes b)^\top = (\operatorname{vec}(b a^\top))^\top$ (86)
$(A \otimes B)^\top = A^\top \otimes B^\top$ (87)
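These relations are also easy to check numerically; for example, relation (81), with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 5))
C = rng.normal(size=(5, 2))

vec = lambda M: M.flatten(order="F")   # stack columns
lhs = vec(A @ B @ C)                   # vec(ABC)
rhs = np.kron(C.T, A) @ vec(B)         # (C' kron A) vec(B), relation (81)
print(np.allclose(lhs, rhs))           # True
```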

Let's say we put no constraints on $B$, $Z$, $Q$, $R$, $\xi$, or $\Lambda$. Then the model is in the form of equation (76),

$$x_t = (x_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t) + \operatorname{vec}(u_t) + w_t$$
$$y_t = (x_t^\top \otimes I_n)\operatorname{vec}(Z_t) + \operatorname{vec}(a_t) + v_t,$$

with the parameters defined as follows:

$$\operatorname{vec}(B_t) = f_{t,b} + D_{t,b}\beta; \quad f_{t,b} = 0; \quad D_{t,b} = 1; \quad \beta = \operatorname{vec}(B)$$
$$\operatorname{vec}(u_t) = f_{t,u} + D_{t,u}\upsilon; \quad f_{t,u} = 0; \quad D_{t,u} = (c_t^\top \otimes I_m); \quad \upsilon = \operatorname{vec}(C)$$
$$\operatorname{vec}(Q_t) = f_{t,q} + D_{t,q}q; \quad f_{t,q} = 0; \quad D_{t,q} = D_q$$
$$\operatorname{vec}(Z_t) = f_{t,z} + D_{t,z}\zeta; \quad f_{t,z} = 0; \quad D_{t,z} = 1; \quad \zeta = \operatorname{vec}(Z)$$
$$\operatorname{vec}(a_t) = f_{t,a} + D_{t,a}\alpha; \quad f_{t,a} = 0; \quad D_{t,a} = (d_t^\top \otimes I_n); \quad \alpha = \operatorname{vec}(D)$$
$$\operatorname{vec}(R_t) = f_{t,r} + D_{t,r}r; \quad f_{t,r} = 0; \quad D_{t,r} = D_r$$
$$\operatorname{vec}(\Lambda) = f_\lambda + D_\lambda \lambda; \quad f_\lambda = 0$$
$$\operatorname{vec}(\xi) = \xi = f_\xi + D_\xi p; \quad f_\xi = 0; \quad D_\xi = 1$$

Note that variance-covariance matrices are never truly unconstrained, so we use $D_q$, $D_r$ and $D_\lambda$ to specify the symmetry within the matrix.

The transformation of the simple MARSS with covariates (equation 88) into the form of equation (76) may seem a little painful, but the advantage is that a single EM algorithm can be used for a large class of models. Presumably, the transformation of the equation will be hidden from users by a wrapper function that does the reformulation before passing the model to the general EM algorithm. In the MARSS R package, this reformulation is done in the MARSS.marxss function.

4.3 A general MARSS model with exogenous variables

Let's imagine now a very general MARSS model with various inputs. ("Input" here just means that it is some fully known matrix rather than something we are estimating. It could be a sequence of 0s and 1s if, for example, we were fitting a before/after sort of model.) Below, the letters with a $t$ subscript are the inputs ($D_t$ is an input, not a design matrix), except $x_t$, $y_t$, $w_t$ and $v_t$:

$$x_t = J_t B L_t x_{t-1} + C_t U c_t + G_t w_t$$
$$y_t = M_t Z N_t x_t + D_t A d_t + H_t v_t \qquad (89)$$

In vec form, this is:

$$x_t = (x_{t-1}^\top \otimes I_m)(L_t^\top \otimes J_t)\operatorname{vec}(B) + (c_t^\top \otimes C_t)\operatorname{vec}(U) + G_t w_t = (x_{t-1}^\top \otimes I_m)(L_t^\top \otimes J_t)(f_b + D_b \beta) + (c_t^\top \otimes C_t)(f_u + D_u \upsilon) + G_t w_t, \quad W_t \sim \operatorname{MVN}(0, G_t Q G_t^\top)$$
$$y_t = (x_t^\top \otimes I_n)(N_t^\top \otimes M_t)\operatorname{vec}(Z) + (d_t^\top \otimes D_t)\operatorname{vec}(A) + H_t v_t \qquad (90)$$
$$= (x_t^\top \otimes I_n)(N_t^\top \otimes M_t)(f_z + D_z \zeta) + (d_t^\top \otimes D_t)(f_a + D_a \alpha) + H_t v_t, \quad V_t \sim \operatorname{MVN}(0, H_t R H_t^\top)$$
$$X_0 \sim \operatorname{MVN}(f_\xi + D_\xi p,\ F \Lambda F^\top), \quad \text{where } \operatorname{vec}(\Lambda) = f_\lambda + D_\lambda \lambda$$

We could write down a likelihood function for this model, but written this way, the model presumes that $H_t R H_t^\top$, $G_t Q G_t^\top$, and $F \Lambda F^\top$ are valid variance-covariance matrices. I will actually write this model differently below, because I don't want to make that assumption.

We define the $f$ and $D$ parameters as follows:

$$\operatorname{vec}(B_t) = f_{t,b} + D_{t,b}\beta = (L_t^\top \otimes J_t) f_b + (L_t^\top \otimes J_t) D_b \beta$$
$$\operatorname{vec}(u_t) = f_{t,u} + D_{t,u}\upsilon = (c_t^\top \otimes C_t) f_u + (c_t^\top \otimes C_t) D_u \upsilon$$
$$\operatorname{vec}(Q_t) = f_{t,q} + D_{t,q}q = (G_t \otimes G_t) f_q + (G_t \otimes G_t) D_q q$$
$$\operatorname{vec}(Z_t) = f_{t,z} + D_{t,z}\zeta = (N_t^\top \otimes M_t) f_z + (N_t^\top \otimes M_t) D_z \zeta$$
$$\operatorname{vec}(a_t) = f_{t,a} + D_{t,a}\alpha = (d_t^\top \otimes D_t) f_a + (d_t^\top \otimes D_t) D_a \alpha$$
$$\operatorname{vec}(R_t) = f_{t,r} + D_{t,r}r = (H_t \otimes H_t) f_r + (H_t \otimes H_t) D_r r$$
$$\operatorname{vec}(\Lambda) = f_\lambda + D_\lambda \lambda = 0 + D_\lambda \lambda$$
$$\operatorname{vec}(\xi) = \xi = f_\xi + D_\xi p = 0 + p$$

Here, for example, $f_b$ and $D_b$ indicate the linear constraints on $B$, and $f_{t,b}$ is $(L_t^\top \otimes J_t) f_b$ and $D_{t,b}$ is $(L_t^\top \otimes J_t) D_b$. The elements of $B$ that are being estimated are $\beta$, arranged as a column vector. As usual, this reformulation looks cumbersome, but it would presumably be hidden from the user.

4.4 The expected log-likelihood function

As mentioned above, we do not necessarily want to assume that $H_t R_t H_t^\top$, $G_t Q_t G_t^\top$, and $F \Lambda F^\top$ are valid variance-covariance matrices. This would rule out many MARSS models that we would like to fit. For example, if $Q = \sigma^2$ and $G = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $G Q G^\top$ would be an invalid variance-covariance matrix (it is not full rank). However, this is a valid MARSS model. We do need to be careful that $H_t$ and $G_t$ are specified such that the model has a solution. For example, a model where both $G_t$ and $H_t$ are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ would not be solvable for all $y$.

Instead, I will define $\Phi_t = (G_t^\top G_t)^{-1} G_t^\top$, $\Xi_t = (H_t^\top H_t)^{-1} H_t^\top$, and $\Pi = (F^\top F)^{-1} F^\top$. I then require that the inverses of $G_t^\top G_t$, $H_t^\top H_t$, and $F^\top F$ exist and that $f_{t,q} + D_{t,q} q$, $f_{t,r} + D_{t,r} r$, and $f_\lambda + D_\lambda \lambda$ specify valid variance-covariance matrices. These are much less stringent restrictions.

For the purpose of writing down the expected log-likelihood, our MARSS model is now written:

$$\Phi_t x_t = \Phi_t (x_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t) + \Phi_t \operatorname{vec}(u_t) + w_t, \quad \text{where } W_t \sim \operatorname{MVN}(0, Q_t)$$
$$\Xi_t y_t = \Xi_t (x_t^\top \otimes I_n)\operatorname{vec}(Z_t) + \Xi_t \operatorname{vec}(a_t) + v_t, \quad \text{where } V_t \sim \operatorname{MVN}(0, R_t) \qquad (91)$$
$$\Pi x_0 = \Pi \xi + l, \quad \text{where } L \sim \operatorname{MVN}(0, \Lambda)$$

As mentioned before, this relies on $G_t$ and $H_t$ having forms that do not lead to over- or under-constrained linear systems.

To derive the EM update equations, we need the expected log-likelihood function for the time-varying MARSS model. Using equation (91), we get

$$E_{XY}[\log L(Y, X; \Theta)] = -\frac{1}{2} E_{XY}\Big[ \sum_{t=1}^{T} \big( Y_t - (X_t^\top \otimes I_n)\operatorname{vec}(Z_t) - \operatorname{vec}(a_t) \big)^\top \Xi_t^\top R_t^{-1} \Xi_t \big( Y_t - (X_t^\top \otimes I_n)\operatorname{vec}(Z_t) - \operatorname{vec}(a_t) \big) + \sum_{t=1}^{T} \log |R_t| + \sum_{t=t_0+1}^{T} \big( X_t - (X_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t) - \operatorname{vec}(u_t) \big)^\top \Phi_t^\top Q_t^{-1} \Phi_t \big( X_t - (X_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t) - \operatorname{vec}(u_t) \big) + \sum_{t=t_0+1}^{T} \log |Q_t| + \big( X_{t_0} - \operatorname{vec}(\xi) \big)^\top \Pi^\top \Lambda^{-1} \Pi \big( X_{t_0} - \operatorname{vec}(\xi) \big) + \log |\Lambda| + \log 2\pi \Big] \qquad (92)$$

If any $G_t$, $H_t$ or $F$ is all zero, then the line in the likelihood with $R_t$, $Q_t$ or $\Lambda$, respectively, does not appear. If any $x_0$ are fixed (meaning an all-zero row in $F$), then $X_0 \equiv \xi$ anywhere it appears in the likelihood. The way I have written the general equation, some $x_0$ might be fixed and others stochastic.

The vec of the model parameters are defined as follows:

$$\operatorname{vec}(B_t) = f_{t,b} + D_{t,b}\beta \qquad \operatorname{vec}(u_t) = f_{t,u} + D_{t,u}\upsilon \qquad \operatorname{vec}(Z_t) = f_{t,z} + D_{t,z}\zeta$$
$$\operatorname{vec}(a_t) = f_{t,a} + D_{t,a}\alpha \qquad \operatorname{vec}(Q_t) = f_{t,q} + D_{t,q}q \qquad \operatorname{vec}(R_t) = f_{t,r} + D_{t,r}r$$
$$\operatorname{vec}(\xi) = f_\xi + D_\xi p \qquad \operatorname{vec}(\Lambda) = f_\lambda + D_\lambda \lambda$$
$$\Phi_t = (G_t^\top G_t)^{-1} G_t^\top \qquad \Xi_t = (H_t^\top H_t)^{-1} H_t^\top \qquad \Pi = (F^\top F)^{-1} F^\top$$

5 The constrained update equations

The derivation proceeds by taking the partial derivative of equation (92) with respect to the estimated terms (the $\zeta$, $\alpha$, etc.), setting the derivative to zero, and solving for those estimated terms. Conceptually, the algebraic steps in the derivation are similar to those in the unconstrained derivation.

5.1 The general u update equations

We take the derivative of $\Psi$ (equation 92) with respect to $\upsilon$:

$$\partial \Psi / \partial \upsilon = -\frac{1}{2} \sum_{t=1}^{T} \Big( -\partial E[X_t^\top Q_t^* D_{t,u} \upsilon]/\partial \upsilon - \partial E[\upsilon^\top D_{t,u}^\top Q_t^* X_t]/\partial \upsilon + \partial E[\big((X_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t)\big)^\top Q_t^* D_{t,u} \upsilon]/\partial \upsilon + \partial E[\upsilon^\top D_{t,u}^\top Q_t^* (X_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t)]/\partial \upsilon + \partial (\upsilon^\top D_{t,u}^\top Q_t^* D_{t,u} \upsilon)/\partial \upsilon + \partial E[f_{t,u}^\top Q_t^* D_{t,u} \upsilon]/\partial \upsilon + \partial E[\upsilon^\top D_{t,u}^\top Q_t^* f_{t,u}]/\partial \upsilon \Big) \qquad (93)$$

where $Q_t^* = \Phi_t^\top Q_t^{-1} \Phi_t$. Since $\upsilon$ is to the far left or right in each term, the derivative is simple using the derivative relations in Table 2. $\partial \Psi / \partial \upsilon$ becomes:

$$\partial \Psi / \partial \upsilon = \sum_{t=1}^{T} \Big( E[X_t]^\top Q_t^* D_{t,u} - E[(X_{t-1}^\top \otimes I_m)\operatorname{vec}(B_t)]^\top Q_t^* D_{t,u} - \upsilon^\top D_{t,u}^\top Q_t^* D_{t,u} - f_{t,u}^\top Q_t^* D_{t,u} \Big) \qquad (94)$$

Set the left side to zero and transpose the whole equation:

$$0 = \sum_{t=1}^{T} \Big( D_{t,u}^\top Q_t^* E[X_t] - D_{t,u}^\top Q_t^* \big( E[X_{t-1}]^\top \otimes I_m \big)\operatorname{vec}(B_t) - D_{t,u}^\top Q_t^* D_{t,u} \upsilon - D_{t,u}^\top Q_t^* f_{t,u} \Big) \qquad (95)$$

Thus,

$$\Big( \sum_{t=1}^{T} D_{t,u}^\top Q_t^* D_{t,u} \Big) \upsilon = \sum_{t=1}^{T} D_{t,u}^\top Q_t^* \Big( E[X_t] - \big( E[X_{t-1}]^\top \otimes I_m \big)\operatorname{vec}(B_t) - f_{t,u} \Big) \qquad (96)$$

We solve for $\upsilon$, and the new $\upsilon$ for the $j + 1$ iteration of the EM algorithm is

$$\upsilon_{j+1} = \Big( \sum_{t=1}^{T} D_{t,u}^\top Q_t^* D_{t,u} \Big)^{-1} \sum_{t=1}^{T} D_{t,u}^\top Q_t^* \Big( \tilde{x}_t - \big( \tilde{x}_{t-1}^\top \otimes I_m \big)\operatorname{vec}(B_t) - f_{t,u} \Big) \qquad (97)$$

where $Q_t^* = \Phi_t^\top Q_t^{-1} \Phi_t = G_t (G_t^\top G_t)^{-1} Q_t^{-1} (G_t^\top G_t)^{-1} G_t^\top$. The update equation requires that $\sum_t D_{t,u}^\top Q_t^* D_{t,u}$ is invertible. It generally will be if $\Phi_t^\top Q_t^{-1} \Phi_t$ is a proper variance-covariance matrix (positive semi-definite) and $D_{t,u}$ is full rank. If $G_t$ has all-zero rows, then $\Phi_t^\top Q_t^{-1} \Phi_t$ has zeros on the diagonal and we have a partially deterministic model. In this case, $Q_t^*$ will have all-zero rows/columns, and $D_{t,u}^\top Q_t^* D_{t,u}$ will not be invertible unless the corresponding row of $D_{t,u}$ is zero. This means that if one of the $x$ rows is fully deterministic, then the corresponding row of $u$ would need to be fixed. We can get around this, however; see section 7 on the modifications to the update equation when some of the $x$'s are fully deterministic.

5.2 The general a update equation

The derivation of the update equation for $\alpha$ with fixed and shared values is completely analogous to the derivation for $\upsilon$. We take the derivative of $\Psi$ with respect to $\alpha$ and arrive at the analogous:

$$\alpha_{j+1} = \Big( \sum_{t=1}^{T} D_{t,a}^\top R_t^* D_{t,a} \Big)^{-1} \sum_{t=1}^{T} D_{t,a}^\top R_t^* \Big( \tilde{y}_t - \big( \tilde{x}_t^\top \otimes I_n \big)\operatorname{vec}(Z_t) - f_{t,a} \Big) = \Big( \sum_{t=1}^{T} D_{t,a}^\top R_t^* D_{t,a} \Big)^{-1} \sum_{t=1}^{T} D_{t,a}^\top R_t^* \big( \tilde{y}_t - Z_t \tilde{x}_t - f_{t,a} \big) \qquad (98)$$

where $R_t^* = \Xi_t^\top R_t^{-1} \Xi_t$. $\sum_t D_{t,a}^\top R_t^* D_{t,a}$ must be invertible.

5.3 The general ξ update equation, stochastic initial state

When $x_0$ is treated as stochastic with an unknown mean and known variance, the derivation of the update equation for $\xi$ with fixed and shared values is as follows. Take the derivative of $\Psi$ (using equation 92) with respect to $p$, where $L = \Pi^\top \Lambda^{-1} \Pi$:

$$\partial \Psi / \partial p = \tilde{x}_0^\top L\, D_\xi - \xi^\top L\, D_\xi \qquad (99)$$

Replace $\xi$ with $f_\xi + D_\xi p$, set the left side to zero, and transpose:

$$0 = D_\xi^\top L \tilde{x}_0 - D_\xi^\top L f_\xi - D_\xi^\top L D_\xi\, p \qquad (100)$$

Thus,

$$p_{j+1} = \big( D_\xi^\top L D_\xi \big)^{-1} D_\xi^\top L \big( \tilde{x}_0 - f_\xi \big) \qquad (101)$$

and the new $\xi$ is then

$$\xi_{j+1} = f_\xi + D_\xi\, p_{j+1} \qquad (102)$$

When the initial state is defined at $t = 1$, replace $\tilde{x}_0$ with $\tilde{x}_1$ in equation (101).

5.4 The general ξ update equation, fixed x₀

For the case where $x_0$ is treated as fixed, i.e. as another parameter, $\Lambda$ does not appear in the equation. It will be easier to work with $\Psi$ written as follows:

$$E_{XY}[\log L(Y, X; \Theta)] = -\frac{1}{2} E_{XY}\Big[ \sum_{t=1}^{T} \big( Y_t - Z_t X_t - a_t \big)^\top R_t^* \big( Y_t - Z_t X_t - a_t \big) + \sum_{t=1}^{T} \log |R_t| + \sum_{t=1}^{T} \big( X_t - B_t X_{t-1} - u_t \big)^\top Q_t^* \big( X_t - B_t X_{t-1} - u_t \big) + \sum_{t=1}^{T} \log |Q_t| + \log 2\pi \Big], \quad \text{with } x_0 \equiv f_\xi + D_\xi p \qquad (103)$$
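As a final sketch, the constrained u update (97) in code. Dtu, ftu, Qstar and vecB are illustrative names for D_{t,u}, f_{t,u}, the weight matrix Q_t^* = Φ_t' Q_t^{-1} Φ_t, and vec(B_t); the one-based time indexing mirrors the earlier snippets.

```python
import numpy as np

def update_upsilon(xtT, vecB, Qstar, Dtu, ftu):
    # upsilon_{j+1} = [sum_t D'QD]^{-1} sum_t D'Q*(x~_t - (x~_{t-1}' kron I_m) vec(B_t) - f_t),
    # equation (97); note (x' kron I_m) vec(B) = B x
    m = xtT.shape[1]
    T = xtT.shape[0] - 1
    lhs = sum(Dtu[t].T @ Qstar[t] @ Dtu[t] for t in range(1, T + 1))
    rhs = sum(Dtu[t].T @ Qstar[t]
              @ (xtT[t] - np.kron(xtT[t - 1], np.eye(m)) @ vecB[t] - ftu[t])
              for t in range(1, T + 1))
    return np.linalg.solve(lhs, rhs)
```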


More information

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities:

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities: Mah 4 Eam Review Problems Problem. Calculae he 3rd Taylor polynomial for arcsin a =. Soluion. Le f() = arcsin. For his problem, we use he formula f() + f () + f ()! + f () 3! for he 3rd Taylor polynomial

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution

Physics 127b: Statistical Mechanics. Fokker-Planck Equation. Time Evolution Physics 7b: Saisical Mechanics Fokker-Planck Equaion The Langevin equaion approach o he evoluion of he velociy disribuion for he Brownian paricle migh leave you uncomforable. A more formal reamen of his

More information

An EM algorithm for maximum likelihood estimation given corrupted observations. E. E. Holmes, National Marine Fisheries Service

An EM algorithm for maximum likelihood estimation given corrupted observations. E. E. Holmes, National Marine Fisheries Service An M algorihm maimum likelihood esimaion given corruped observaions... Holmes Naional Marine Fisheries Service Inroducion M algorihms e likelihood esimaion o cases wih hidden saes such as when observaions

More information

Recursive Least-Squares Fixed-Interval Smoother Using Covariance Information based on Innovation Approach in Linear Continuous Stochastic Systems

Recursive Least-Squares Fixed-Interval Smoother Using Covariance Information based on Innovation Approach in Linear Continuous Stochastic Systems 8 Froniers in Signal Processing, Vol. 1, No. 1, July 217 hps://dx.doi.org/1.2266/fsp.217.112 Recursive Leas-Squares Fixed-Inerval Smooher Using Covariance Informaion based on Innovaion Approach in Linear

More information

Stochastic Model for Cancer Cell Growth through Single Forward Mutation

Stochastic Model for Cancer Cell Growth through Single Forward Mutation Journal of Modern Applied Saisical Mehods Volume 16 Issue 1 Aricle 31 5-1-2017 Sochasic Model for Cancer Cell Growh hrough Single Forward Muaion Jayabharahiraj Jayabalan Pondicherry Universiy, jayabharahi8@gmail.com

More information

SOLUTIONS TO ECE 3084

SOLUTIONS TO ECE 3084 SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

Vectorautoregressive Model and Cointegration Analysis. Time Series Analysis Dr. Sevtap Kestel 1

Vectorautoregressive Model and Cointegration Analysis. Time Series Analysis Dr. Sevtap Kestel 1 Vecorauoregressive Model and Coinegraion Analysis Par V Time Series Analysis Dr. Sevap Kesel 1 Vecorauoregression Vecor auoregression (VAR) is an economeric model used o capure he evoluion and he inerdependencies

More information

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013 IMPLICI AND INVERSE FUNCION HEOREMS PAUL SCHRIMPF 1 OCOBER 25, 213 UNIVERSIY OF BRIISH COLUMBIA ECONOMICS 526 We have exensively sudied how o solve sysems of linear equaions. We know how o check wheher

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Mathcad Lecture #8 In-class Worksheet Curve Fitting and Interpolation

Mathcad Lecture #8 In-class Worksheet Curve Fitting and Interpolation Mahcad Lecure #8 In-class Workshee Curve Fiing and Inerpolaion A he end of his lecure, you will be able o: explain he difference beween curve fiing and inerpolaion decide wheher curve fiing or inerpolaion

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

THE 2-BODY PROBLEM. FIGURE 1. A pair of ellipses sharing a common focus. (c,b) c+a ROBERT J. VANDERBEI

THE 2-BODY PROBLEM. FIGURE 1. A pair of ellipses sharing a common focus. (c,b) c+a ROBERT J. VANDERBEI THE 2-BODY PROBLEM ROBERT J. VANDERBEI ABSTRACT. In his shor noe, we show ha a pair of ellipses wih a common focus is a soluion o he 2-body problem. INTRODUCTION. Solving he 2-body problem from scrach

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

Math 334 Fall 2011 Homework 11 Solutions

Math 334 Fall 2011 Homework 11 Solutions Dec. 2, 2 Mah 334 Fall 2 Homework Soluions Basic Problem. Transform he following iniial value problem ino an iniial value problem for a sysem: u + p()u + q() u g(), u() u, u () v. () Soluion. Le v u. Then

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

System of Linear Differential Equations

System of Linear Differential Equations Sysem of Linear Differenial Equaions In "Ordinary Differenial Equaions" we've learned how o solve a differenial equaion for a variable, such as: y'k5$e K2$x =0 solve DE yx = K 5 2 ek2 x C_C1 2$y''C7$y

More information

dt = C exp (3 ln t 4 ). t 4 W = C exp ( ln(4 t) 3) = C(4 t) 3.

dt = C exp (3 ln t 4 ). t 4 W = C exp ( ln(4 t) 3) = C(4 t) 3. Mah Rahman Exam Review Soluions () Consider he IVP: ( 4)y 3y + 4y = ; y(3) = 0, y (3) =. (a) Please deermine he longes inerval for which he IVP is guaraneed o have a unique soluion. Soluion: The disconinuiies

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

OBJECTIVES OF TIME SERIES ANALYSIS

OBJECTIVES OF TIME SERIES ANALYSIS OBJECTIVES OF TIME SERIES ANALYSIS Undersanding he dynamic or imedependen srucure of he observaions of a single series (univariae analysis) Forecasing of fuure observaions Asceraining he leading, lagging

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

Math From Scratch Lesson 34: Isolating Variables

Math From Scratch Lesson 34: Isolating Variables Mah From Scrach Lesson 34: Isolaing Variables W. Blaine Dowler July 25, 2013 Conens 1 Order of Operaions 1 1.1 Muliplicaion and Addiion..................... 1 1.2 Division and Subracion.......................

More information

Morning Time: 1 hour 30 minutes Additional materials (enclosed):

Morning Time: 1 hour 30 minutes Additional materials (enclosed): ADVANCED GCE 78/0 MATHEMATICS (MEI) Differenial Equaions THURSDAY JANUARY 008 Morning Time: hour 30 minues Addiional maerials (enclosed): None Addiional maerials (required): Answer Bookle (8 pages) Graph

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims Problem Se 5 Graduae Macro II, Spring 2017 The Universiy of Nore Dame Professor Sims Insrucions: You may consul wih oher members of he class, bu please make sure o urn in your own work. Where applicable,

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

How to Deal with Structural Breaks in Practical Cointegration Analysis

How to Deal with Structural Breaks in Practical Cointegration Analysis How o Deal wih Srucural Breaks in Pracical Coinegraion Analysis Roselyne Joyeux * School of Economic and Financial Sudies Macquarie Universiy December 00 ABSTRACT In his noe we consider he reamen of srucural

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Math 10B: Mock Mid II. April 13, 2016

Math 10B: Mock Mid II. April 13, 2016 Name: Soluions Mah 10B: Mock Mid II April 13, 016 1. ( poins) Sae, wih jusificaion, wheher he following saemens are rue or false. (a) If a 3 3 marix A saisfies A 3 A = 0, hen i canno be inverible. True.

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Derivation of the EM algorithm for constrained and unconstrained multivariate autoregressive state-space (MARSS) models

Derivation of the EM algorithm for constrained and unconstrained multivariate autoregressive state-space (MARSS) models Derivation of the EM algorithm for constrained and unconstrained multivariate autoregressive state-space MARSS models Elizabeth Eli Holmes Northwest Fisheries Science Center Mathematical Biology Program

More information

The fundamental mass balance equation is ( 1 ) where: I = inputs P = production O = outputs L = losses A = accumulation

The fundamental mass balance equation is ( 1 ) where: I = inputs P = production O = outputs L = losses A = accumulation Hea (iffusion) Equaion erivaion of iffusion Equaion The fundamenal mass balance equaion is I P O L A ( 1 ) where: I inpus P producion O oupus L losses A accumulaion Assume ha no chemical is produced or

More information

SMT 2014 Calculus Test Solutions February 15, 2014 = 3 5 = 15.

SMT 2014 Calculus Test Solutions February 15, 2014 = 3 5 = 15. SMT Calculus Tes Soluions February 5,. Le f() = and le g() =. Compue f ()g (). Answer: 5 Soluion: We noe ha f () = and g () = 6. Then f ()g () =. Plugging in = we ge f ()g () = 6 = 3 5 = 5.. There is a

More information

Math Week 14 April 16-20: sections first order systems of linear differential equations; 7.4 mass-spring systems.

Math Week 14 April 16-20: sections first order systems of linear differential equations; 7.4 mass-spring systems. Mah 2250-004 Week 4 April 6-20 secions 7.-7.3 firs order sysems of linear differenial equaions; 7.4 mass-spring sysems. Mon Apr 6 7.-7.2 Sysems of differenial equaions (7.), and he vecor Calculus we need

More information

( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively:

( ) a system of differential equations with continuous parametrization ( T = R + These look like, respectively: XIII. DIFFERENCE AND DIFFERENTIAL EQUATIONS Ofen funcions, or a sysem of funcion, are paramerized in erms of some variable, usually denoed as and inerpreed as ime. The variable is wrien as a funcion of

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

= ( ) ) or a system of differential equations with continuous parametrization (T = R

= ( ) ) or a system of differential equations with continuous parametrization (T = R XIII. DIFFERENCE AND DIFFERENTIAL EQUATIONS Ofen funcions, or a sysem of funcion, are paramerized in erms of some variable, usually denoed as and inerpreed as ime. The variable is wrien as a funcion of

More information

dy dx = xey (a) y(0) = 2 (b) y(1) = 2.5 SOLUTION: See next page

dy dx = xey (a) y(0) = 2 (b) y(1) = 2.5 SOLUTION: See next page Assignmen 1 MATH 2270 SOLUTION Please wrie ou complee soluions for each of he following 6 problems (one more will sill be added). You may, of course, consul wih your classmaes, he exbook or oher resources,

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

13.3 Term structure models

13.3 Term structure models 13.3 Term srucure models 13.3.1 Expecaions hypohesis model - Simples "model" a) shor rae b) expecaions o ge oher prices Resul: y () = 1 h +1 δ = φ( δ)+ε +1 f () = E (y +1) (1) =δ + φ( δ) f (3) = E (y +)

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19 Sequenial Imporance Sampling (SIS) AKA Paricle Filering, Sequenial Impuaion (Kong, Liu, Wong, 994) For many problems, sampling direcly from he arge disribuion is difficul or impossible. One reason possible

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates

Biol. 356 Lab 8. Mortality, Recruitment, and Migration Rates Biol. 356 Lab 8. Moraliy, Recruimen, and Migraion Raes (modified from Cox, 00, General Ecology Lab Manual, McGraw Hill) Las week we esimaed populaion size hrough several mehods. One assumpion of all hese

More information

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes Half-Range Series 2.5 Inroducion In his Secion we address he following problem: Can we find a Fourier series expansion of a funcion defined over a finie inerval? Of course we recognise ha such a funcion

More information

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS LECTURE : GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS We will work wih a coninuous ime reversible Markov chain X on a finie conneced sae space, wih generaor Lf(x = y q x,yf(y. (Recall ha q

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

Predator - Prey Model Trajectories and the nonlinear conservation law

Predator - Prey Model Trajectories and the nonlinear conservation law Predaor - Prey Model Trajecories and he nonlinear conservaion law James K. Peerson Deparmen of Biological Sciences and Deparmen of Mahemaical Sciences Clemson Universiy Ocober 28, 213 Ouline Drawing Trajecories

More information

THE WAVE EQUATION. part hand-in for week 9 b. Any dilation v(x, t) = u(λx, λt) of u(x, t) is also a solution (where λ is constant).

THE WAVE EQUATION. part hand-in for week 9 b. Any dilation v(x, t) = u(λx, λt) of u(x, t) is also a solution (where λ is constant). THE WAVE EQUATION 43. (S) Le u(x, ) be a soluion of he wave equaion u u xx = 0. Show ha Q43(a) (c) is a. Any ranslaion v(x, ) = u(x + x 0, + 0 ) of u(x, ) is also a soluion (where x 0, 0 are consans).

More information

t 2 B F x,t n dsdt t u x,t dxdt

t 2 B F x,t n dsdt t u x,t dxdt Evoluion Equaions For 0, fixed, le U U0, where U denoes a bounded open se in R n.suppose ha U is filled wih a maerial in which a conaminan is being ranspored by various means including diffusion and convecion.

More information

KEY. Math 334 Midterm I Fall 2008 sections 001 and 003 Instructor: Scott Glasgow

KEY. Math 334 Midterm I Fall 2008 sections 001 and 003 Instructor: Scott Glasgow 1 KEY Mah 4 Miderm I Fall 8 secions 1 and Insrucor: Sco Glasgow Please do NOT wrie on his eam. No credi will be given for such work. Raher wrie in a blue book, or on our own paper, preferabl engineering

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

Week 1 Lecture 2 Problems 2, 5. What if something oscillates with no obvious spring? What is ω? (problem set problem)

Week 1 Lecture 2 Problems 2, 5. What if something oscillates with no obvious spring? What is ω? (problem set problem) Week 1 Lecure Problems, 5 Wha if somehing oscillaes wih no obvious spring? Wha is ω? (problem se problem) Sar wih Try and ge o SHM form E. Full beer can in lake, oscillaing F = m & = ge rearrange: F =

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Testing the Random Walk Model. i.i.d. ( ) r

Testing the Random Walk Model. i.i.d. ( ) r he random walk heory saes: esing he Random Walk Model µ ε () np = + np + Momen Condiions where where ε ~ i.i.d he idea here is o es direcly he resricions imposed by momen condiions. lnp lnp µ ( lnp lnp

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

Laplace transfom: t-translation rule , Haynes Miller and Jeremy Orloff

Laplace transfom: t-translation rule , Haynes Miller and Jeremy Orloff Laplace ransfom: -ranslaion rule 8.03, Haynes Miller and Jeremy Orloff Inroducory example Consider he sysem ẋ + 3x = f(, where f is he inpu and x he response. We know is uni impulse response is 0 for

More information