Nonlinear and non-Gaussian state-space modelling by means of hidden Markov models
University of Göttingen
St Andrews, 13 December 2010
(General) state-space model (SSM):

observable:      ... y_{t−1}, y_t, y_{t+1}, ...
non-observable:  ... g_{t−1}, g_t, g_{t+1}, ...

y_t = a(g_t, ε_t)
g_t = b(g_{t−1}, η_t)

a, b: known functions (not necessarily linear)
ε_t, η_t iid (not necessarily Gaussian)
Example 1. Stochastic volatility model:

y_t = ε_t β exp(g_t / 2)
g_t = φ g_{t−1} + σ η_t

ε_t iid t_ν or N(0, 1), η_t iid N(0, 1)
g_t determines the variance (volatility) of y_t
Example 2. Poisson autoregression:

y_t ~ Poisson( β exp(g_t) )
g_t = φ g_{t−1} + σ η_t

η_t iid N(0, 1)
g_t determines the mean (and variance) of y_t
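Both examples share the same AR(1) state process and differ only in the observation equation. A minimal simulation sketch in Python (parameter values are illustrative, not from the talk; Gaussian ε_t is assumed for the SV model):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1_state(n, phi, sigma):
    """AR(1) state process g_t = phi*g_{t-1} + sigma*eta_t, started stationary."""
    g = np.zeros(n)
    g[0] = rng.normal(0, sigma / np.sqrt(1 - phi**2))
    for t in range(1, n):
        g[t] = phi * g[t - 1] + sigma * rng.normal()
    return g

def simulate_sv(n, phi=0.95, sigma=0.2, beta=1.0):
    """Stochastic volatility: y_t = eps_t * beta * exp(g_t/2), eps_t ~ N(0,1)."""
    g = simulate_ar1_state(n, phi, sigma)
    return rng.normal(size=n) * beta * np.exp(g / 2)

def simulate_pois_ar(n, phi=0.7, sigma=0.3, beta=2.0):
    """Poisson autoregression: y_t ~ Poisson(beta * exp(g_t))."""
    g = simulate_ar1_state(n, phi, sigma)
    return rng.poisson(beta * np.exp(g))

y_sv = simulate_sv(500)
y_pois = simulate_pois_ar(500)
```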
Desired:
- parameter estimation
- state decoding
- forecasts
- model checking

SSM likelihood:
L(y) = ∫ ⋯ ∫ f(y, g) dg   (n-fold integral)
cannot be evaluated directly
(if the SSM is linear and Gaussian, the Kalman filter is optimal)
Parameter estimation in case of nonlinearity/non-Gaussianity:

Extended Kalman filter:
+ simple implementation
− in general a poor approximation

(Generalized) method of moments:
+ simple implementation
− low efficiency, no state decoding

Monte Carlo methods:
+ high efficiency
− computer-intensive
− nonstandard models require nontrivial modifications
Hidden Markov model:

observable:      ... y_{t−1}, y_t, y_{t+1}, ...
non-observable:  ... g_{t−1}, g_t, g_{t+1}, ...

Non-observable process: N-state Markov chain g_t
- initial distribution: δ_i = P(g_1 = i)
- transition probabilities: γ_ij = P(g_t = j | g_{t−1} = i)

Observable process: y_t with state-dependent density f(y_t | g_t)
Key idea: HMMs have the same two-process structure as SSMs. In an SSM, g_t is continuous-valued; discretizing g_t yields an approximating HMM. Benefit: the whole HMM methodology becomes applicable.
Split the essential range of g_t into m equidistant intervals B_i := [b_{i−1}, b_i], with midpoint b*_i of B_i. Then

L(y) = ∫ ⋯ ∫ f(g_1) f(y_1 | g_1) ∏_{t=2}^{n} f(g_t | g_{t−1}) f(y_t | g_t) dg_n ⋯ dg_1
     ≈ ∑_{i_1=1}^{m} ⋯ ∑_{i_n=1}^{m} P(g_1 ∈ B_{i_1}) f(y_1 | g_1 = b*_{i_1}) ∏_{t=2}^{n} P(g_t ∈ B_{i_t} | g_{t−1} = b*_{i_{t−1}}) f(y_t | g_t = b*_{i_t})
     =: L_approx(y)
Consider an HMM with an m-state Markov chain (possible outcomes: the midpoints b*_i):

- transition probabilities: γ_ij := P(g_t ∈ B_j | g_{t−1} = b*_i); transition probability matrix Γ = (γ_ij)
- initial distribution: δ_i := P(g_1 ∈ B_i)
- observable process: state-dependent density f(y_t | g_t = b*_i)
- P(y_t): diagonal matrix with ith entry f(y_t | g_t = b*_i)

L_approx(y) = δ P(y_1) Γ P(y_2) Γ ⋯ Γ P(y_{n−1}) Γ P(y_n) 1ᵗ

⇒ the HMM (δ, Γ, f(y_t | ·)) approximates the SSM
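A sketch of the approximate likelihood evaluation for the stochastic volatility example, assuming Gaussian ε_t. The function name, the essential range [−K, K], and the scaled forward recursion (used in place of the raw matrix product to avoid numerical underflow, but mathematically equivalent to δ P(y_1) Γ P(y_2) ⋯ P(y_n) 1ᵗ) are implementation choices, not prescriptions from the talk:

```python
import numpy as np
from scipy.stats import norm

def hmm_approx_loglik(y, phi, sigma, beta, m=100, K=3.0):
    """Approximate SSM log-likelihood via an m-state HMM.

    State process: g_t = phi*g_{t-1} + sigma*eta_t, eta_t ~ N(0,1).
    Observations:  y_t = eps_t * beta * exp(g_t/2), eps_t ~ N(0,1).
    """
    b = np.linspace(-K, K, m + 1)        # interval boundaries b_0, ..., b_m
    bstar = (b[:-1] + b[1:]) / 2         # interval midpoints b*_i
    # delta_i = P(g_1 in B_i) under the stationary N(0, sigma^2/(1 - phi^2))
    sd_stat = sigma / np.sqrt(1 - phi**2)
    delta = norm.cdf(b[1:], 0, sd_stat) - norm.cdf(b[:-1], 0, sd_stat)
    # gamma_ij = P(g_t in B_j | g_{t-1} = b*_i), since g_t | b*_i ~ N(phi*b*_i, sigma^2)
    Gamma = (norm.cdf(b[None, 1:], phi * bstar[:, None], sigma)
             - norm.cdf(b[None, :-1], phi * bstar[:, None], sigma))
    # scaled forward recursion
    loglik, alpha = 0.0, delta
    for t, yt in enumerate(y):
        p = norm.pdf(yt, 0, beta * np.exp(bstar / 2))   # ith entry of P(y_t)
        alpha = (alpha if t == 0 else alpha @ Gamma) * p
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik

y = np.random.default_rng(0).standard_normal(300)   # stand-in data for illustration
ll = hmm_approx_loglik(y, phi=0.9, sigma=0.3, beta=1.0)
```

Increasing m refines the approximation; in practice L_approx stabilises quickly in m, and the function can be handed directly to a numerical optimiser for parameter estimation.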
Pros and cons (HMM method):
+ likelihood directly available
+ extensions straightforward
+ simple formulae for residuals, forecasts, decoding
− m and the range of g_t have to be chosen
− only feasible for one-dimensional state spaces
Glacial varve thickness
Applications considered in Langrock (2010):
- stochastic volatility
- earthquake counts
- polio counts (seasonal)
- daily rainfall occurrence (seasonal)
- glacial varve thickness
Varves: layers of sediment deposited by melting glaciers; can be useful for long-term climate research.
Source: Shumway and Stoffer (Time Series Analysis and Its Applications, 2006)

Figure: Series of glacial varve thicknesses (in mm, over ca. 600 years) for a location in Massachusetts.
Model:

y_t = ε_t β exp(g_t)
g_t = φ g_{t−1} + σ η_t

ε_t ~ Gamma( shape = c_v^{−2}, scale = c_v^2 )

Properties:
E(y_t | g_t) = β exp(g_t)
(Conditional) coefficient of variation: sd(y_t | g_t) / E(y_t | g_t) = c_v
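Within the HMM approximation, moving from the stochastic volatility model to this gamma SSM only changes the state-dependent density: since y_t = ε_t β exp(g_t) with ε_t ~ Gamma(shape = c_v^{−2}, scale = c_v^2), scaling by β exp(g_t) gives y_t | g_t ~ Gamma(shape = c_v^{−2}, scale = β exp(g_t) c_v^2). A sketch (the function name and inputs are illustrative):

```python
import numpy as np
from scipy.stats import gamma

def varve_state_dens(y_t, bstar, beta, cv):
    """f(y_t | g_t = b*_i) for the gamma SSM: multiplying a
    Gamma(cv^-2, cv^2) variable by beta*exp(g_t) scales its scale parameter."""
    shape = cv**-2
    scale = beta * np.exp(bstar) * cv**2
    return gamma.pdf(y_t, a=shape, scale=scale)

bstar = np.linspace(-3, 3, 200)                 # midpoints, as in the fitted model
dens = varve_state_dens(25.0, bstar, beta=24.42, cv=0.40)
```

As a check on the parameterisation, shape × scale = β exp(g_t) reproduces the stated conditional mean, and shape^{−1/2} = c_v the stated coefficient of variation.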
Table: Estimated model parameters and bootstrap 95% confidence intervals (400 replications).

para.   estimate   c.i.
φ        0.95      [0.90, 0.97]
σ        0.15      [0.11, 0.19]
β       24.42      [19.1, 31.1]
c_v      0.40      [0.37, 0.42]

resolution: m = 200; range of g_t: b_0 = −3, b_m = 3
Figure: Series of glacial varve thicknesses in mm (solid grey line) and decoded mean sequence of the fitted gamma SSM (crosses).
Summary:
- HMM approximation convenient in the SSM context
- whole HMM methodology applicable
- simple implementation of standard and nonstandard models

References:
Langrock, R., MacDonald, I. L., Zucchini, W. (2010). Estimating standard and nonstandard stochastic volatility models using structured hidden Markov models. (submitted)
Langrock, R. (2010). Some applications of nonlinear and non-Gaussian state-space modelling by means of hidden Markov models. (submitted)