Nonparametric inference in hidden Markov and related models


Nonparametric inference in hidden Markov and related models
Roland Langrock, Bielefeld University

Introduction and motivation

Figure: Haggis (the dish).

Figure: Wild Haggis (Dux magnus gentis venteris saginati).

Introducing hidden Markov models using Wild Haggis movement
[Figure: simulated movement track]
Pr(S_t = j | S_{t-1} = j) = 0.95 for j = 1, 2, where S_t denotes the state at time t.

Introducing hidden Markov models using Wild Haggis movement
[Figures: simulated movement track; state-dependent step-length densities; state-dependent turning-angle densities]
Pr(S_t = j | S_{t-1} = j) = 0.95 for j = 1, 2, where S_t denotes the state at time t.

Examples of HMM-type models/doubly stochastic processes
- hidden Markov models (HMMs)
- (general) state-space models
- Markov-switching regression
- Cox point processes
[Diagram: observed process X_{t-1} -> X_t -> X_{t+1}, driven by hidden process ... -> S_{t-1} -> S_t -> S_{t+1} -> ...]
In each case there are two components:
- an observable state-dependent process
  1.) in animal movement: e.g. step lengths & turning angles
  2.) in financial time series: some economic indicator, e.g. GDP values
  3.) in disease progression: e.g. blood samples
- a latent (non-observable) state process/system process
  in 1.): the behavioural state
  in 2.): the nervousness of the market
  in 3.): the disease stage

Inference in HMM-type models
Why nonparametric?
1. specifying a suitable model can be hard; there are lots of ways to get it wrong!
2. more flexibility, perhaps leading to models that are more parsimonious, e.g. in terms of the number of states
3. as an exploratory tool
A strategy applicable in many scenarios combines the simple yet powerful HMM machinery ... and the conceptual simplicity and general advantages of P-splines.

1. Some basics on hidden Markov models
2. Nonparametric inference in hidden Markov models
3. Markov-switching generalized additive models
4. Concluding remarks

Some basics on hidden Markov models

HMMs: summary/definition
[Diagram: observed process X_{t-1} -> X_t -> X_{t+1}; hidden process ... -> S_{t-1} -> S_t -> S_{t+1} -> ...]
- two (discrete-time) stochastic processes, one of them hidden
- distribution of observations determined by underlying state
- hidden state process is an N-state Markov chain

Building blocks of HMMs
{S_t}_{t=1,2,...,T} is (usually) assumed to be an N-state Markov chain:
- state transition probabilities: γ_{ij} = Pr(S_t = j | S_{t-1} = i)
- transition probability matrix (t.p.m.):
  Γ = ( γ_{11} ... γ_{1N} )
      (  ...   ...   ...  )
      ( γ_{N1} ... γ_{NN} )
- initial state distribution: δ = ( Pr(S_1 = 1), ..., Pr(S_1 = N) )
State-dependent distributions f(x_t | s_t = j):
- specify a suitable class of parametric distributions, e.g. normal, Poisson, Bernoulli, multivariate normal, gamma, Dirichlet, ...
- one set of parameters for each state
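In code, the building blocks are just a matrix and a vector. One common (though not the only) choice for δ is the stationary distribution of the chain, which solves δΓ = δ subject to summing to one. A minimal numpy sketch, with an illustrative 2-state t.p.m. (the solve uses the standard trick of replacing one redundant equation by the sum-to-one constraint):

```python
import numpy as np

# Illustrative 2-state transition probability matrix (rows sum to 1)
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])

# Stationary distribution: solve delta (I_N - Gamma + U) = 1',
# where U is a matrix of ones -- this folds the sum-to-one constraint
# into the otherwise singular system delta (I_N - Gamma) = 0.
N = Gamma.shape[0]
delta = np.linalg.solve((np.eye(N) - Gamma + 1.0).T, np.ones(N))
```

By the symmetry of this particular Γ, `delta` comes out as (0.5, 0.5).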

HMMs: likelihood calculation using brute force
L(θ) = f(x_1, ..., x_T)
     = Σ_{s_1=1}^{N} ... Σ_{s_T=1}^{N} f(x_1, ..., x_T, s_1, ..., s_T)
     = Σ_{s_1=1}^{N} ... Σ_{s_T=1}^{N} f(x_1, ..., x_T | s_1, ..., s_T) f(s_1, ..., s_T)
     = Σ_{s_1=1}^{N} ... Σ_{s_T=1}^{N} δ_{s_1} Π_{t=1}^{T} f(x_t | s_t) Π_{t=2}^{T} γ_{s_{t-1}, s_t}
Simple form, but O(T N^T): numerical maximization of this expression is thus infeasible.

HMMs: likelihood calculation via forward algorithm
Consider instead the so-called forward probabilities,
α_t(j) = f(x_1, ..., x_t, s_t = j).
These can be calculated using an efficient recursive scheme:
α_1 = δ Q(x_1)
α_t = α_{t-1} Γ Q(x_t),  with Q(x_t) = diag( f(x_t | s_t = 1), ..., f(x_t | s_t = N) )
L(θ) = Σ_{j=1}^{N} α_T(j) = δ Q(x_1) Γ Q(x_2) ... Γ Q(x_T) 1'
Computational effort: O(T N^2), i.e. linear in T!
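The recursion above is a few lines of numpy once a scaling step is added to guard against numerical underflow. A minimal sketch for a 2-state Gaussian HMM; all parameter values here are illustrative, not from the talk:

```python
import numpy as np
from scipy.stats import norm

# Illustrative 2-state Gaussian HMM
Gamma = np.array([[0.9, 0.1],
                  [0.1, 0.9]])   # transition probability matrix
delta = np.array([0.5, 0.5])     # initial state distribution
mu    = np.array([-2.0, 2.0])    # state-dependent means
sigma = np.array([1.0, 1.0])     # state-dependent standard deviations

def hmm_loglik(x):
    """log L(theta) via the scaled forward recursion -- O(T N^2)."""
    alpha = delta * norm.pdf(x[0], mu, sigma)   # alpha_1 = delta Q(x_1)
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for xt in x[1:]:
        # alpha_t = alpha_{t-1} Gamma Q(x_t), elementwise by the diagonal of Q
        alpha = (alpha @ Gamma) * norm.pdf(xt, mu, sigma)
        c = alpha.sum()          # scale factor, accumulated on the log scale
        ll += np.log(c)
        alpha /= c
    return ll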

Further inference: a brief overview
- uncertainty quantification: (parametric) bootstrap or Hessian-based
- model selection: criteria such as the AIC
- model checking: quantile residuals, simulation-based, ...
- state decoding: Viterbi algorithm
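For state decoding, the Viterbi algorithm finds the globally most likely state sequence by dynamic programming. A sketch in the same toy Gaussian-HMM setting as above (parameters again illustrative), working in log space for numerical stability:

```python
import numpy as np
from scipy.stats import norm

# Illustrative 2-state Gaussian HMM
Gamma = np.array([[0.9, 0.1], [0.1, 0.9]])
delta = np.array([0.5, 0.5])
mu, sigma = np.array([-2.0, 2.0]), np.array([1.0, 1.0])

def viterbi(x):
    """Most likely state sequence argmax_{s_1..s_T} f(s_1..s_T | x_1..x_T)."""
    T, N = len(x), len(delta)
    logp = norm.logpdf(np.asarray(x)[:, None], mu, sigma)   # T x N emission log-densities
    xi = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)                      # backpointers
    xi[0] = np.log(delta) + logp[0]
    for t in range(1, T):
        trans = xi[t-1][:, None] + np.log(Gamma)            # trans[i, j]: best path ending i -> j
        back[t] = trans.argmax(axis=0)
        xi[t] = trans.max(axis=0) + logp[t]
    states = np.zeros(T, dtype=int)
    states[-1] = xi[-1].argmax()
    for t in range(T - 2, -1, -1):                          # backtrack
        states[t] = back[t + 1, states[t + 1]]
    return states
```

With well-separated state-dependent means, e.g. observations near -2 followed by observations near +2, the decoder recovers the obvious segmentation.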

Related model classes
- state-space models: can be approximated arbitrarily accurately by HMMs by finely discretizing the state space
- Markov-switching regression models: HMMs with covariates
- Markov-modulated Poisson processes: can be regarded as HMMs (with slightly modified dependence structure)
The corresponding likelihoods can be written as easy-to-evaluate matrix products!
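The first point can be made concrete: for a Gaussian AR(1) state process g_t = φ g_{t-1} + η_t, η_t ~ N(0, σ²), partitioning the essential range of the state space into m intervals yields an m-state Markov chain whose t.p.m. approximates the transition density. A sketch under illustrative AR(1) parameters:

```python
import numpy as np
from scipy.stats import norm

phi, sigma = 0.8, 1.0   # illustrative AR(1) parameters
m = 100                  # number of intervals (states of the approximating chain)

# Cover the essential range of the stationary distribution N(0, sigma^2/(1-phi^2))
sd_stat = sigma / np.sqrt(1 - phi**2)
b = np.linspace(-4 * sd_stat, 4 * sd_stat, m + 1)   # interval boundaries
mid = (b[:-1] + b[1:]) / 2                           # interval midpoints

# Gamma[i, j] = Pr(g_t in interval j | g_{t-1} = midpoint i)
Gamma = norm.cdf(b[None, 1:],  phi * mid[:, None], sigma) \
      - norm.cdf(b[None, :-1], phi * mid[:, None], sigma)
Gamma /= Gamma.sum(axis=1, keepdims=True)            # renormalize the truncated tails
```

The resulting m x m matrix can then be plugged into the standard HMM forward recursion, so the state-space likelihood becomes the same easy-to-evaluate matrix product.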

Nonparametric inference in hidden Markov models

HMMs: motivation for a nonparametric approach
- distribution of observations selected by underlying state
- state-dependent distributions usually from a class of parametric distributions
- finding the right distribution, or even a suitable one, can be difficult
- an unfortunate choice can lead to ...
  ... a poor fit and hence poor predictive power
  ... a bad performance of the state decoding
  ... invalid inference, e.g. on the number of states
[Figures: observed time series; histogram of the observations]
What family of distributions to use for the state-dependent process?

Nonparametric estimation based on P-splines
- represent the densities of the state-dependent distributions using standardized B-spline basis densities:
  f(x_t | s_t = i) = Σ_{k=-K}^{K} a_{i,k} φ_k(x_t)
- transform the constrained parameters a_{i,-K}, ..., a_{i,K}:
  a_{i,k} = exp(β_{i,k}) / Σ_{j=-K}^{K} exp(β_{i,j}),  with β_{i,0} = 0
- numerically maximize the penalized log-likelihood:
  l_p(θ, λ) = log( L(θ) ) - Σ_{i=1}^{N} (λ_i / 2) Σ_{k=-K+2}^{K} ( Δ² a_{i,k} )²
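Both ingredients are simple to compute. A sketch of the multinomial-logit reparameterization and the second-order difference penalty for one state (coefficient values here are made up for illustration):

```python
import numpy as np

# Unconstrained coefficients beta_{i,-K}, ..., beta_{i,K} for one state i,
# with beta_{i,0} fixed at 0 for identifiability (illustrative values, K = 3).
K = 3
beta = np.array([-1.0, 0.5, 0.2, 0.0, 0.3, -0.4, 0.8])   # length 2K + 1

# Multinomial-logit transform: the weights a_{i,k} are positive and sum to 1,
# so the combination of standardized B-spline densities is itself a density.
a = np.exp(beta) / np.exp(beta).sum()

# Second-order difference penalty from the penalized log-likelihood:
# sum over k of (Delta^2 a_{i,k})^2, i.e. squared second differences.
penalty = np.sum(np.diff(a, n=2) ** 2)
```

The optimizer works on the unconstrained β's; the constraints on the weights are enforced automatically by the transform.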

Inference
- identifiability holds under fairly weak conditions (essentially there needs to be serial correlation)
- generalized cross-validation or an AIC-type statistic for (i) choosing λ from an N-dimensional grid, (ii) model selection on the number of states
- parameter estimation by numerical maximization of l_p(θ, λ); local maxima can be an issue, so use many different initial values in the maximization
- uncertainty quantification via parametric bootstrap
- model checking via pseudo-residuals (standard)
- state decoding using Viterbi (standard)

A simple simulation experiment
simulate T = 800 observations from a 2-state HMM with
Γ = ( 0.9 0.1 )
    ( 0.1 0.9 )
[Figure: true densities of the state-dependent distributions]

A simple simulation experiment
simulate T = 800 observations from a 2-state HMM with
Γ = ( 0.9 0.1 )
    ( 0.1 0.9 )
[Figure: marginal distribution of the observations]

A simple simulation experiment
simulate T = 800 observations from a 2-state HMM with
Γ = ( 0.9 0.1 )
    ( 0.1 0.9 )
K = 15, thus 2K + 1 = 31 B-spline basis functions
[Figure: true (black) and estimated densities of the state-dependent distributions; smoothing parameters about right]

A simple simulation experiment
simulate T = 800 observations from a 2-state HMM with
Γ = ( 0.9 0.1 )
    ( 0.1 0.9 )
K = 15, thus 2K + 1 = 31 B-spline basis functions
[Figure: true (black) and estimated densities of the state-dependent distributions; smoothing parameters too big]

A simple simulation experiment
simulate T = 800 observations from a 2-state HMM with
Γ = ( 0.9 0.1 )
    ( 0.1 0.9 )
K = 15, thus 2K + 1 = 31 B-spline basis functions
[Figure: true (black) and estimated densities of the state-dependent distributions; smoothing parameters too small]

Blainville's beaked whale: dive data
[Figures: observed time series of log(depth displacement in metres); histogram of the observations; sample ACF]

Blainville's beaked whale: parametric HMMs
Table: Results of fitting HMMs with normal state-dependent distributions.

#states    p       AIC       BIC
      3   12   9784.00   9855.59
      4   20   9498.16   9617.47
      5   30   9400.30   9579.27
      6   42   9294.88   9545.43
      7   56   9208.04   9542.11
      8   72   9129.15   9558.67
      9   90   9090.98   9627.87
     10  110   9064.53   9720.74

Blainville's beaked whale: parametric HMM, N = 7
[Figures: fitted state-dependent distributions (states 1-7 and marginal) on the log(absolute depth displacement) scale; qq-plot of residuals against standard normal quantiles; sample ACF of the residual series]

Blainville's beaked whale: parametric HMM, N = 3
[Figures: fitted state-dependent distributions (3-state parametric HMM: states 1-3 and marginal); qq-plot of residuals against standard normal quantiles; sample ACF of the residual series]

Blainville's beaked whale: nonparametric HMM with N = 3
[Figures: fitted state-dependent distributions (3-state nonparametric HMM: states 1-3 and marginal); qq-plot of residuals against standard normal quantiles; sample ACF of the residual series]

Blainville's beaked whale: Viterbi for the nonparametric HMM with N = 3
[Figure: depths in metres and log(depth displacement), with decoded states 1-3 over time in hours]

Markov-switching generalized additive models

Markov-switching regression: a basic model
A simple Markov-switching (linear) regression model:
Y_t = β_0^{(s_t)} + β_1^{(s_t)} x_t + σ_{s_t} ε_t,
with
- a time series {Y_t}_{t=1,...,T}
- associated covariates x_1, ..., x_T (including the possibility of x_t = y_{t-1})
- ε_t iid ~ N(0, 1)
- s_t: state at time t of an unobservable N-state Markov chain

Markov-switching regression: remarks on the basic model
- commonly used in economics to deal with parameter instability over time (key references: Goldfeld and Quandt, 1973; Hamilton, 1989)
- the linear form of the predictor is usually assumed with little investigation (if any!) into the absolute or relative goodness of fit
- we consider nonparametric methods for estimating the form of the predictor (in analogy to the extension of linear models to GAMs)

Markov-switching regression: more general model formulation
g( E(Y_t | s_t, x_t) ) = η^{(s_t)}(x_t),  writing μ_t^{(s_t)} = E(Y_t | s_t, x_t),
where
- Y_t follows some distribution from the exponential family
- x_t = (x_{1t}, ..., x_{Pt}) is the covariate vector at time t
- g is a suitable link function
- η^{(s_t)} is the predictor function given state s_t (the form of which we do not yet specify)
- (φ^{(s_t)}: any additional state-dependent dispersion parameters)

Likelihood evaluation using the forward recursion
Define, analogously to the HMM case, the forward variables
α_t(j) = f(y_1, ..., y_t, S_t = j | x_1, ..., x_t).
Then the following recursive scheme can be applied:
α_1 = δ Q(y_1),
α_t = α_{t-1} Γ Q(y_t)  (t = 2, ..., T),
where Q(y_t) = diag( p_Y(y_t; μ_t^{(1)}, φ^{(1)}), ..., p_Y(y_t; μ_t^{(N)}, φ^{(N)}) ), so that
L(θ) = Σ_{j=1}^{N} α_T(j) = δ Q(y_1) Γ Q(y_2) ... Γ Q(y_T) 1'.
This form applies for any form of the conditional density p_Y(y_t; μ_t^{(s_t)}, φ^{(s_t)}).
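The scaled forward recursion from the HMM case carries over verbatim once Q(y_t) is built from covariate-dependent means. A sketch for a 2-state Markov-switching Poisson regression with log link; all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import poisson

# Illustrative 2-state Markov-switching Poisson regression:
# log mu_t^(j) = b0[j] + b1[j] * x_t
Gamma = np.array([[0.9, 0.1], [0.1, 0.9]])
delta = np.array([0.5, 0.5])
b0 = np.array([0.5, 1.5])
b1 = np.array([0.3, -0.3])

def msreg_loglik(y, x):
    """log L(theta) via the forward recursion; Q(y_t) depends on x_t."""
    mu = np.exp(b0 + b1 * np.asarray(x)[:, None])   # T x N state-dependent means
    p = poisson.pmf(np.asarray(y)[:, None], mu)     # T x N conditional densities
    alpha = delta * p[0]                            # alpha_1 = delta Q(y_1)
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(y)):
        alpha = (alpha @ Gamma) * p[t]              # alpha_t = alpha_{t-1} Gamma Q(y_t)
        c = alpha.sum()
        ll += np.log(c)
        alpha /= c
    return ll
```

Swapping the Poisson pmf for any other exponential-family density changes nothing else in the recursion.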

Nonparametric modelling of the predictor
- here we consider a GAM-type framework:
  η^{(s_t)}(x_t) = β_0^{(s_t)} + f_1^{(s_t)}(x_{1t}) + f_2^{(s_t)}(x_{2t}) + ... + f_P^{(s_t)}(x_{Pt})
- we represent each f_p^{(i)} as a linear combination of B-spline basis functions:
  f_p^{(i)}(x) = Σ_{k=1}^{K} γ_{ipk} B_k(x)
- ... and numerically maximize the penalized log-likelihood:
  l_p(θ, λ) = log( L(θ) ) - Σ_{i=1}^{N} Σ_{p=1}^{P} (λ_{ip} / 2) Σ_{k=3}^{K} ( Δ² γ_{ipk} )²
- inference analogous to the nonparametric-HMM case
- notably, parametric models are nested special cases (obtained as λ -> ∞)

A simple simulation experiment
simulate T = 300 observations from the 2-state Markov-switching regression model
Y_t ~ Poisson( exp( β_0 + f^{(s_t)}(x_t) ) ),  Γ = ( 0.9 0.1 )
                                                   ( 0.1 0.9 )
[Figure: true functions f^{(1)} (state 1) and f^{(2)} (state 2)]

A simple simulation experiment
simulate T = 300 observations from the 2-state Markov-switching regression model
Y_t ~ Poisson( exp( β_0 + f^{(s_t)}(x_t) ) ),  Γ = ( 0.9 0.1 )
                                                   ( 0.1 0.9 )
K = 15, thus 2K + 1 = 31 B-spline basis functions
smoothing parameter selection from a grid using an AIC-type statistic
[Figure: true and estimated functions f^{(1)} (state 1) and f^{(2)} (state 2)]

A simple simulation experiment
simulate T = 300 observations from the 2-state Markov-switching regression model
Y_t ~ Poisson( exp( β_0 + f^{(s_t)}(x_t) ) ),  Γ = ( 0.6 0.4 )
                                                   ( 0.4 0.6 )
K = 15, thus 2K + 1 = 31 B-spline basis functions
smoothing parameter selection from a grid using an AIC-type statistic
[Figure: true and estimated functions f^{(1)} (state 1) and f^{(2)} (state 2)]

Example: Lydia Pinkham sales
[Figure: annual sales (in million USD), 1910-1960]

Example: Lydia Pinkham sales
Model MS-LIN: sales_t = β_0^{(s_t)} + β_1^{(s_t)} advertising_t + β_2^{(s_t)} sales_{t-1} + σ_{s_t} ε_t
Model MS-GAM: sales_t = β_0^{(s_t)} + f^{(s_t)}(advertising_t) + β_1^{(s_t)} sales_{t-1} + σ_{s_t} ε_t
Figure: Estimated state-dependent mean sales as functions of advertising expenditure (state 1 in green, state 2 in red). Displayed are the predictor values when fixing the regressor sales_{t-1} at its overall mean, 1.84.

Example: Lydia Pinkham sales
Figure: Sales figures and decoded states underlying the MS-GAM model.

Example: Lydia Pinkham sales
[Figures: sample ACF and qq-plots of the residuals under MS-LIN and MS-GAM]

Concluding remarks

Concluding remarks
- bringing together HMMs & P-splines gives lots of modelling options
- while inference is slightly more involved, the resulting models often substantially increase the goodness of fit, and may in fact be more parsimonious than parametric alternatives
- various other such models can be formulated (and fitted), e.g. MS-GAMLSS models; but does anyone need this kind of thing??
- we're currently working on alternative, less computer-intensive methods for selecting the smoothing parameters

References
Langrock, R., Kneib, T., Sohn, A., DeRuiter, S. (2015). Nonparametric inference in hidden Markov models using P-splines. Biometrics.
Langrock, R., Glennie, R., Kneib, T., Michelot, T. (2016). Markov-switching generalized additive models. Statistics and Computing.

Thank you!