Nonlinear and non-Gaussian state-space modelling by means of hidden Markov models


Nonlinear and non-Gaussian state-space modelling by means of hidden Markov models
University of Göttingen
St Andrews, 13 December 2010

Outline:
1. …
2. Glacial varve thickness

(General) state-space model (SSM):

... → g_{t-1} → g_t → g_{t+1} → ...   (non-observable)
          ↓       ↓       ↓
...   y_{t-1}    y_t    y_{t+1}  ...  (observable)

y_t = a(g_t, ε_t)
g_t = b(g_{t-1}, η_t)

a, b: known functions (not necessarily linear); ε_t, η_t: iid (not necessarily normal).
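To make the two-process structure concrete, here is a minimal simulation sketch (not from the talk; the names simulate_ssm, a, b are illustrative, and the innovations are taken standard normal purely for brevity):

```python
import numpy as np

def simulate_ssm(a, b, n, g0=0.0, seed=1):
    """Simulate a general SSM: g_t = b(g_{t-1}, eta_t), y_t = a(g_t, eps_t)."""
    rng = np.random.default_rng(seed)
    g = np.empty(n)
    y = np.empty(n)
    g_prev = g0
    for t in range(n):
        g[t] = b(g_prev, rng.standard_normal())  # latent (non-observable) transition
        y[t] = a(g[t], rng.standard_normal())    # observation given the current state
        g_prev = g[t]
    return y, g
```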

Example 1. Stochastic volatility model:

y_t = ε_t β exp(g_t / 2)
g_t = φ g_{t-1} + σ η_t

ε_t iid t_ν or N(0, 1); η_t iid N(0, 1). Here g_t determines the variance (volatility) of y_t.

Example 2. Poisson autoregression:

y_t ~ Poisson(β exp(g_t))
g_t = φ g_{t-1} + σ η_t

η_t iid N(0, 1). Here g_t determines the mean (and variance) of y_t.
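A minimal sketch simulating the two examples (the parameter values are illustrative, not estimates from the talk):

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi, sigma, beta = 1000, 0.95, 0.15, 1.0

# shared latent AR(1) process: g_t = phi * g_{t-1} + sigma * eta_t
g = np.zeros(n)
for t in range(1, n):
    g[t] = phi * g[t - 1] + sigma * rng.standard_normal()

# Example 1: stochastic volatility, y_t = eps_t * beta * exp(g_t / 2)
y_sv = rng.standard_normal(n) * beta * np.exp(g / 2)

# Example 2: Poisson autoregression, y_t ~ Poisson(beta * exp(g_t))
y_pois = rng.poisson(beta * np.exp(g))
```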

Desired: parameter estimation, state decoding, forecasts, model checking.

SSM likelihood:

L(y) = ∫ ⋯ ∫ f(y, g) dg   (n-fold integral)

cannot be evaluated directly... (if the SSM is linear and Gaussian, the Kalman filter is optimal)

Parameter estimation in the case of nonlinearity/non-Gaussianity:

Extended Kalman filter
+ simple implementation
− in general a poor approximation

(Generalized) method of moments
+ simple implementation
− low efficiency, no state decoding

Monte Carlo methods
+ high efficiency
− computer-intensive; nonstandard models require nontrivial modifications


Hidden Markov model (HMM):

... → g_{t-1} → g_t → g_{t+1} → ...   (non-observable)
          ↓       ↓       ↓
...   y_{t-1}    y_t    y_{t+1}  ...  (observable)

Non-observable process: N-state Markov chain g_t with
initial distribution δ_i = P(g_1 = i)
transition probabilities γ_ij = P(g_t = j | g_{t-1} = i)

Observable process: y_t with state-dependent density f(y_t | g_t)
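For concreteness, a minimal sketch of these ingredients for a 2-state HMM with normal state-dependent densities (all numbers illustrative):

```python
import numpy as np
from scipy.stats import norm

delta = np.array([0.5, 0.5])    # initial distribution: delta_i = P(g_1 = i)
Gamma = np.array([[0.9, 0.1],   # transition probabilities: gamma_ij = P(g_t = j | g_{t-1} = i)
                  [0.2, 0.8]])
mu = np.array([-1.0, 2.0])      # one mean per state

def f(y_t):
    """Vector of state-dependent densities f(y_t | g_t = i), here N(mu_i, 1)."""
    return norm.pdf(y_t, loc=mu, scale=1.0)
```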

Key idea: HMMs have the same two-process structure as SSMs. In SSMs, g_t is continuous-valued; discretizing g_t yields an approximation of the SSM by an HMM. Benefit: the HMM methodology becomes applicable.

Split the essential range of g_t into m equidistant intervals B_i := [b_{i-1}, b_i]; let b_i* denote the midpoint of B_i. Then

L(y) = ∫ ⋯ ∫ f(y, g) dg
     = ∫ ⋯ ∫ f(g_1) f(y_1 | g_1) ∏_{t=2}^{n} f(g_t | g_{t-1}) f(y_t | g_t) dg_n ⋯ dg_1
     ≈ ∑_{i_1=1}^{m} ⋯ ∑_{i_n=1}^{m} P(g_1 ∈ B_{i_1}) f(y_1 | g_1 = b_{i_1}*) ∏_{t=2}^{n} P(g_t ∈ B_{i_t} | g_{t-1} = b_{i_{t-1}}*) f(y_t | g_t = b_{i_t}*)
     =: L_approx(y)
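A sketch of this discretization for an AR(1) state process as in the examples above, so that g_t | g_{t-1} ~ N(φ g_{t-1}, σ²) and the interval probabilities are normal CDF differences; the function name and argument layout are illustrative:

```python
import numpy as np
from scipy.stats import norm

def discretize_ar1(phi, sigma, m, b0, bm):
    """Approximate an AR(1) state process by an m-state Markov chain on [b0, bm]."""
    b = np.linspace(b0, bm, m + 1)   # interval boundaries b_0, ..., b_m
    bstar = 0.5 * (b[:-1] + b[1:])   # midpoints b_i*
    # gamma_ij = P(g_t in B_j | g_{t-1} = b_i*), with g_t | g_{t-1} ~ N(phi * g_{t-1}, sigma^2)
    loc = phi * bstar[:, None]
    Gamma = norm.cdf(b[None, 1:], loc=loc, scale=sigma) - norm.cdf(b[None, :-1], loc=loc, scale=sigma)
    # delta_i = P(g_1 in B_i), here taken from the stationary N(0, sigma^2 / (1 - phi^2)) distribution
    sd1 = sigma / np.sqrt(1 - phi**2)
    delta = norm.cdf(b[1:], scale=sd1) - norm.cdf(b[:-1], scale=sd1)
    return bstar, Gamma, delta
```

Probability mass falling outside [b0, bm] is simply discarded here, which is why the chosen range has to cover the essential support of g_t.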

Consider an HMM with an m-state Markov chain (possible outcomes: the midpoints b_i*):

transition probabilities: γ_ij := P(g_t ∈ B_j | g_{t-1} = b_i*)
transition probability matrix: Γ = (γ_ij)
initial distribution: δ_i := P(g_1 ∈ B_i)
observable process: state-dependent density f(y_t | g_t = b_i*)
P(y_t): diagonal matrix with ith diagonal entry f(y_t | g_t = b_i*)

L_approx(y) = δ P(y_1) Γ P(y_2) Γ ⋯ Γ P(y_{n-1}) Γ P(y_n) 1^t

⇒ the HMM (δ, Γ, f(y_t | ·)) approximates the SSM.
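The matrix product can be evaluated recursively in O(n m²) operations; below is a minimal sketch with scaling to avoid numerical underflow (P(y_t) is applied as an elementwise product rather than forming an explicit diagonal matrix):

```python
import numpy as np

def log_lik_approx(y, delta, Gamma, f):
    """log L_approx(y) = log[ delta P(y_1) Gamma P(y_2) ... Gamma P(y_n) 1^t ],
    where f(y_t) returns the vector (f(y_t | g_t = b_1*), ..., f(y_t | g_t = b_m*))."""
    alpha = delta * f(y[0])                 # forward vector: delta P(y_1)
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ Gamma) * f(y[t])   # one recursion step: alpha Gamma P(y_t)
        log_lik += np.log(alpha.sum())      # accumulate the scale factors on log scale
        alpha = alpha / alpha.sum()
    return log_lik
```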

Pros and cons of the HMM method:
+ likelihood directly available
+ extensions straightforward
+ simple formulae for residuals, forecasts, decoding
− m and the range of g_t have to be chosen
− only feasible for one-dimensional state spaces

Glacial varve thickness

Applications considered in Langrock (2010):
stochastic volatility
earthquake counts
polio counts (seasonal)
daily rainfall occurrence (seasonal)
glacial varve thickness

Varves: layers of sediment deposited by melting glaciers; they can be useful for long-term climate research. Source: Shumway and Stoffer (Time Series Analysis and Its Applications, 2006).

Figure: Series of glacial varve thicknesses (in mm, over roughly 600 years) for a location in Massachusetts.

Gamma SSM for the varve series:

y_t = ε_t β exp(g_t)
g_t = φ g_{t-1} + σ η_t
ε_t iid Gamma(shape = c_v^{-2}, scale = c_v^2)

Properties:
E(y_t | g_t) = β exp(g_t)
(conditional) coefficient of variation: sd(y_t | g_t) / E(y_t | g_t) = c_v
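Under this parameterization, y_t | g_t follows a gamma distribution with shape c_v^{-2} and scale β exp(g_t) c_v², which reproduces the stated mean and coefficient of variation. A sketch of the state-dependent densities on the midpoint grid (function name illustrative):

```python
import numpy as np
from scipy.stats import gamma

def f_varve(y_t, bstar, beta, cv):
    """State-dependent densities f(y_t | g_t = b_i*) for the gamma SSM."""
    shape = cv**-2
    scale = beta * np.exp(bstar) * cv**2   # so that E(y_t | g_t) = shape * scale = beta * exp(g_t)
    return gamma.pdf(y_t, a=shape, scale=scale)
```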

Table: Estimated model parameters and bootstrap 95% confidence intervals (400 replications).

parameter   estimate   95% c.i.
φ            0.95      [0.90, 0.97]
σ            0.15      [0.11, 0.19]
β           24.42      [19.1, 31.1]
c_v          0.40      [0.37, 0.42]

Discretization settings: resolution m = 200; range of g_t: b_0 = −3, b_m = 3.
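Putting the sketches above together, the parameters could be estimated by numerically maximizing log L_approx under the settings from the table (m = 200, range [−3, 3]). The following sketch reuses discretize_ar1, log_lik_approx and f_varve from the earlier blocks; the working-parameter transformations and optimizer choice are illustrative, not necessarily the procedure used in the talk:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, y):
    phi = np.tanh(theta[0])                # keeps |phi| < 1
    sigma, beta, cv = np.exp(theta[1:])    # keeps sigma, beta, cv > 0
    bstar, Gamma, delta = discretize_ar1(phi, sigma, m=200, b0=-3.0, bm=3.0)
    return -log_lik_approx(y, delta, Gamma, lambda y_t: f_varve(y_t, bstar, beta, cv))

# y: the observed varve series; starting values are arbitrary guesses
theta0 = np.array([np.arctanh(0.9), np.log(0.2), np.log(20.0), np.log(0.5)])
fit = minimize(neg_log_lik, theta0, args=(y,), method="Nelder-Mead")
```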

Figure: Series of glacial varve thicknesses (solid grey line) and decoded mean sequence of the fitted gamma SSM (crosses).

Conclusion:
the HMM approximation is convenient in the SSM context
the whole HMM methodology becomes applicable
standard and nonstandard models are simple to implement

References:
Langrock, R., MacDonald, I.L., Zucchini, W. (2010). Estimating standard and nonstandard stochastic volatility models using structured hidden Markov models. Submitted.
Langrock, R. (2010). Some applications of nonlinear and non-Gaussian state-space modelling by means of hidden Markov models. Submitted.