Consistency of the maximum likelihood estimator for general hidden Markov models


1 Consistency of the maximum likelihood estimator for general hidden Markov models
Jimmy Olsson, Centre for Mathematical Sciences, Lund University
Nordstat 2012, Umeå, Sweden

2 Collaborators. This is joint work with Randal Douc (Télécom SudParis), Eric Moulines (Télécom ParisTech), and Ramon van Handel (Princeton University). Ann. Statist., Volume 39, Number 1 (2011).

3 Plan of the talk: 1. Hidden Markov models; 2.-4. [section titles not recovered]

4 We are here: outline slide (cf. slide 3).

5 Hidden Markov models (HMMs). An HMM comprises
- an unobservable Markov chain $X = (X_n)_{n \ge 0}$ on $(\mathsf{X}, \mathcal{X})$ with transition kernel $Q_\theta$ and initial distribution $\nu$, i.e. $X_0 \sim \nu$ and $X_{n+1} \mid X_n \sim Q_\theta(X_n, \cdot)$;
- observations $(Y_n)_{n \ge 0}$ in $(\mathsf{Y}, \mathcal{Y})$ that are conditionally independent given $X$ and such that $Y_n \mid X \overset{d}{=} Y_n \mid X_n \sim G_\theta(X_n, \cdot)$.
We assume that $G_\theta$ has a transition density $g_\theta$, i.e. $G_\theta(x, A) = \int_A g_\theta(x, y)\, \mu(dy)$, $A \in \mathcal{Y}$. Here $\theta$ is a parameter vector belonging to some compact metric space $\Theta$.
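
To make the definition concrete in the simplest setting, here is a minimal simulation sketch for a finite state space; the transition matrix, the Gaussian choice of emission density $g_\theta$, and all parameter values below are illustrative assumptions, not part of the talk.

```python
import numpy as np

def simulate_finite_hmm(n, Q, means, sd, nu, rng=None):
    """Simulate a toy finite-state HMM: the hidden chain moves according to
    the transition matrix Q (playing the role of Q_theta) and, given the
    states, the observations are conditionally independent with
    Y_k | X_k ~ N(means[X_k], sd^2) (a Gaussian stand-in for g_theta)."""
    rng = np.random.default_rng(rng)
    m = Q.shape[0]
    x = np.empty(n + 1, dtype=int)
    y = np.empty(n + 1)
    x[0] = rng.choice(m, p=nu)                   # X_0 ~ nu
    for k in range(n + 1):
        y[k] = rng.normal(means[x[k]], sd)       # Y_k | X_k ~ G_theta(X_k, .)
        if k < n:
            x[k + 1] = rng.choice(m, p=Q[x[k]])  # X_{k+1} | X_k ~ Q_theta(X_k, .)
    return x, y
```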

6 HMMs (cont.). Graphical representation: [figure: the chain $\cdots \to X_{n-1} \to X_n \to X_{n+1} \to \cdots$ (Markov chain), with each observation $Y_{n-1}, Y_n, Y_{n+1}$ attached to its own state]. In symbols: $Y_n \mid X_n \sim G_\theta(X_n, \cdot)$, $X_{n+1} \mid X_n \sim Q_\theta(X_n, \cdot)$, and $X_0 \sim \nu$.

7 Likelihood estimation in HMMs. Throughout this talk we fix a distinguished element $\theta^\star \in \Theta$, interpreted as the true parameter value. Having done this, we presume that $Q_{\theta^\star}$ possesses a unique invariant distribution $\pi_{\theta^\star}$, denote by $\mathbb{P}_{\theta^\star}$ the law of the stationary HMM under $\theta^\star$, and assume that we are given observations $Y_0^n = (Y_0, \ldots, Y_n)$ sampled from the distribution $\mathbb{P}_{\theta^\star}$. In this setting, we estimate $\theta^\star$ using the maximum likelihood estimator (MLE), i.e. the argument $\hat{\theta}_n$ maximizing $\Theta \ni \theta \mapsto p^\nu_\theta(Y_0^n)$, where $p^\nu_\theta(y_0^n)$ denotes the density of the measure $\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)$.
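
When the state space is finite, $p^\nu_\theta(Y_0^n)$ can be evaluated exactly with the classical forward recursion, which accumulates the one-step predictive densities $p_\theta(y_k \mid y_0^{k-1})$. The sketch below assumes the Gaussian emissions of the previous snippet, and a simple grid search stands in for a proper optimizer.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(y, Q, means, sd, nu):
    """Exact log p_theta^nu(y_0^n) for a finite-state HMM via the normalized
    forward recursion: Q is the transition matrix and (means, sd) parametrize
    Gaussian emission densities g_theta."""
    alpha = nu * norm.pdf(y[0], means, sd)         # unnormalized filter at time 0
    ll = np.log(alpha.sum())                       # log p(y_0)
    alpha /= alpha.sum()
    for obs in y[1:]:
        alpha = (alpha @ Q) * norm.pdf(obs, means, sd)
        ll += np.log(alpha.sum())                  # log p(y_k | y_0^{k-1})
        alpha /= alpha.sum()                       # renormalize for stability
    return ll

# Toy MLE for a one-dimensional parametrization means = (-theta, theta):
# Q = np.array([[0.9, 0.1], [0.2, 0.8]]); nu = np.array([2/3, 1/3])
# _, y = simulate_finite_hmm(1_000, Q, np.array([-1.0, 1.0]), 1.0, nu, rng=0)
# grid = np.linspace(0.1, 3.0, 60)
# theta_hat = grid[np.argmax([log_likelihood(y, Q, np.array([-t, t]), 1.0, nu)
#                             for t in grid])]
```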

8 Consistency of the MLE. Main problem: we want to show that the MLE is strongly consistent in the sense that $\hat{\theta}_n \to \theta^\star$, $\mathbb{P}_{\theta^\star}$-a.s., as $n \to \infty$. For HMMs, this problem has been addressed in a long series of papers, where the most significant quantum leaps are:
X       | Y       | Contributors
finite  | finite  | Baum & Petrie (1966)
finite  | general | Leroux (1992)
compact | general | Douc & Matias (2001)

9 We are here: outline slide (cf. slide 3).

10 Common thread of previous contributions:
(1) Show that for all $\theta \in \Theta$ there is a constant $H(\theta, \theta^\star)$, the asymptotic contrast, such that, $\mathbb{P}_{\theta^\star}$-a.s.,
$\lim_{n} n^{-1} \log p^\nu_\theta(Y_0^n) = \lim_{n} n^{-1} \bar{\mathbb{E}}_{\theta^\star}\bigl[\log p^\nu_\theta(Y_0^n)\bigr] = H(\theta, \theta^\star).$
(2) Establish identifiability, i.e. that the relative entropy $E(\theta, \theta^\star) \triangleq H(\theta^\star, \theta^\star) - H(\theta, \theta^\star)$ is minimized only at $\theta = \theta^\star$.
(3) Prove that $\hat{\theta}_n$ converges (a.s.) to the maximizer of $\theta \mapsto H(\theta, \theta^\star)$.
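
To see why $E(\theta, \theta^\star)$ deserves the name relative entropy, note that (granting the limits in step (1); this is a sketch in the talk's notation) it is the limiting normalized KL divergence between the observation laws:

```latex
% E(theta, theta*) as a limiting normalized KL divergence (sketch)
\frac{1}{n}\operatorname{KL}\!\left(\mathbb{P}_{\theta^\star}(Y_0^n \in \cdot)\,\middle\|\,\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)\right)
  = \frac{1}{n}\,\bar{\mathbb{E}}_{\theta^\star}\!\left[\log p_{\theta^\star}(Y_0^n) - \log p^\nu_\theta(Y_0^n)\right]
  \xrightarrow[n\to\infty]{} H(\theta^\star, \theta^\star) - H(\theta, \theta^\star)
  = E(\theta, \theta^\star).
```

In particular $E(\theta, \theta^\star) \ge 0$ always, so identifiability amounts to strict positivity for $\theta \neq \theta^\star$.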

11 In the literature, establishing the existence of $H(\theta, \theta^\star)$ has (for $\theta \neq \theta^\star$) required very restrictive assumptions that are rarely satisfied in practice (particularly in the non-compact case), such as the following.
Assumption (global Doeblin condition): there are constants $0 < \epsilon^- \le \epsilon^+$ and a probability measure $\eta$ on $(\mathsf{X}, \mathcal{X})$ such that for all $x \in \mathsf{X}$ and $A \in \mathcal{X}$,
$\epsilon^- \eta(A) \le Q(x, A) \le \epsilon^+ \eta(A).$
Thus, a general consistency result has hitherto been lacking.

12 We are here: outline slide (cf. slide 3).

13 Exponential separability.
Definition (exponential separability): for each $n$, let $Q_n$ and $P_n$ be probability measures on some measurable space $(\mathsf{Z}_n, \mathcal{Z}_n)$. Then $(Q_n)$ is exponentially separated from $(P_n)$, denoted $(Q_n) \lhd (P_n)$, if there exists a sequence $(A_n)$ of sets $A_n \in \mathcal{Z}_n$ such that
$\liminf_{n} P_n(A_n) > 0 \quad \text{and} \quad \limsup_{n} n^{-1} \log Q_n(A_n) < 0.$
Using standard properties of the Kullback-Leibler (KL) divergence one may prove the following.
Theorem: if $(Q_n) \lhd (P_n)$, then $\liminf_{n} n^{-1} \operatorname{KL}(P_n \,\|\, Q_n) > 0$.
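
The proof of the theorem is short; here is a sketch using two standard facts: the KL divergence can only decrease when both measures are restricted to the partition $\{A_n, A_n^c\}$, and the binary entropy term is at most $\log 2$.

```latex
% Sketch: data processing over the partition {A_n, A_n^c}, then bound the
% binary entropy -p log p - (1-p) log(1-p) by log 2 and drop a positive term.
\operatorname{KL}(P_n \,\|\, Q_n)
  \;\ge\; P_n(A_n) \log\frac{P_n(A_n)}{Q_n(A_n)}
        + \bigl(1 - P_n(A_n)\bigr) \log\frac{1 - P_n(A_n)}{1 - Q_n(A_n)}
  \;\ge\; P_n(A_n) \log\frac{1}{Q_n(A_n)} - \log 2 .
```

If eventually $P_n(A_n) \ge \delta > 0$ and $Q_n(A_n) \le e^{-cn}$, the right-hand side is at least $\delta c n - \log 2$, whence $\liminf_n n^{-1} \operatorname{KL}(P_n \,\|\, Q_n) \ge \delta c > 0$.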

14 Main result. Assumptions:
(1) The Markov kernel $Q_{\theta^\star}$ is positive Harris recurrent.
(2) For some integer $l \ge 1$, each $Q^l_\theta$ has a bounded density $q_\theta$ with respect to some $\sigma$-finite measure $\lambda$.
(3) For every $\theta \neq \theta^\star$, $(\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)) \lhd (\mathbb{P}_{\theta^\star}(Y_0^n \in \cdot))$.
(4) ...
Theorem: under the assumptions above, the MLE is strongly consistent.

15 Verifying separability. The assumption (3) of exponential separability is nontrivial. However, assume that for all $\theta \neq \theta^\star$, (i) $Q_\theta$ is ergodic and (ii) $\mathbb{P}^Y_\theta \neq \mathbb{P}^Y_{\theta^\star}$. By (ii) there are $s < \infty$ and $h : \mathsf{Y}^{s+1} \to \mathbb{R}$ such that
$\bar{\mathbb{E}}_\theta\bigl(h(Y_0^s)\bigr) = 0 \quad \text{and} \quad \bar{\mathbb{E}}_{\theta^\star}\bigl(h(Y_0^s)\bigr) = 1.$
Then for the sets
$A_n \triangleq \Bigl\{ y_0^n \in \mathsf{Y}^{n+1} : \frac{1}{n-s} \sum_{i=0}^{n-s} h\bigl(y_i^{i+s}\bigr) > \frac{1}{2} \Bigr\}$
it holds, by (i), that $\mathbb{P}_{\theta^\star}(A_n) \to 1$ and $\mathbb{P}^\nu_\theta(A_n) \to 0$ as $n \to \infty$.
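
A toy numerical check of this construction (an illustration under simplifying assumptions, not the talk's setting): take window length $s = 0$ and $h(y) = y$, with i.i.d. observations distributed as $N(0, 1)$ under $\theta$ and as $N(1, 1)$ under $\theta^\star$, so that the two expectations above are $0$ and $1$. The sample mean of $h$ then separates the two laws at an exponential rate, computable here in closed form from the Gaussian tail.

```python
import numpy as np
from scipy.stats import norm

# Separating sets A_n = { n^{-1} sum_i h(y_i) > 1/2 } in a toy i.i.d. case:
# h(y) = y, with Y_i ~ N(0, 1) under theta and Y_i ~ N(1, 1) under theta*.
# The sample mean is N(mu, 1/n), so both probabilities are exact Gaussian tails.
for n in [10, 50, 100, 500]:
    p_star = norm.sf(0.5, loc=1.0, scale=1.0 / np.sqrt(n))    # P_theta*(A_n) -> 1
    log_q = norm.logsf(0.5, loc=0.0, scale=1.0 / np.sqrt(n))  # log P_theta(A_n)
    print(n, f"{p_star:.4f}", f"{log_q / n:.4f}")  # last column -> -1/8 < 0
```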

16 A useful deviation inequality. Exponential separability follows if we can show that $\mathbb{P}^\nu_\theta(A_n) \to 0$ at an exponential rate. We thus provide the following result.
Proposition (Azuma-Hoeffding inequality for Markov chains): assume that $Q$ is $V$-uniformly ergodic with invariant distribution $\pi$. Then there is a constant $c > 0$ such that for all $t > 0$, all initial distributions $\eta$, and all bounded functions $h : \mathsf{X} \to \mathbb{R}$,
$\mathbb{P}_\eta\Bigl( \Bigl| \frac{1}{n} \sum_{i=1}^{n} h(X_i) - \pi(h) \Bigr| \ge t \Bigr) \le c\, \eta(V) \exp\Bigl( -\frac{n}{c} \Bigl( \frac{t^2}{\|h\|^2} \wedge \frac{t}{\|h\|} \Bigr) \Bigr).$
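
A quick Monte Carlo sanity check of the exponential rate (the AR(1) kernel, the bounded test function $\tanh$, and all constants below are illustrative assumptions): for a $V$-uniformly ergodic chain, $n^{-1} \log$ of the deviation probability should settle at a strictly negative value as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
a, t = 0.5, 0.25
h = np.tanh  # bounded test function; pi(h) = 0 since the stationary law is symmetric

def deviation_prob(n, reps=100_000):
    """Estimate P_eta(|n^{-1} sum_{i=1}^n h(X_i) - pi(h)| >= t) for the AR(1)
    chain X_{k+1} = a X_k + xi_{k+1} (V-uniformly ergodic when |a| < 1),
    started from its stationary distribution N(0, 1 / (1 - a^2))."""
    x = rng.normal(0.0, 1.0 / np.sqrt(1 - a**2), size=reps)
    s = np.zeros(reps)
    for _ in range(n):
        x = a * x + rng.normal(size=reps)
        s += h(x)
    return np.mean(np.abs(s / n) >= t)

for n in [25, 50, 100, 200]:
    p = deviation_prob(n)
    print(n, p, np.log(max(p, 1e-300)) / n)  # last column stabilizes below 0
```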

17 Verifying separability (cont.). Write the quantity of interest as a telescoping sum of the form
$\sum_{i=1}^{n-s} h\bigl(Y_i^{i+s}\bigr) = \sum_{j=0}^{s} \sum_{i} \xi_{i,j} + \sum_{i=1}^{n-s} \underbrace{\mathbb{E}^\nu_\theta\bigl( h(Y_i^{i+s}) \mid X_0^{i-1}, Y_0^{i-1} \bigr)}_{H(X_i)},$
where $(\xi_{i,j})_i$ is a martingale increment sequence for each $j$. Separability is then verified by combining the inequality above (applied to the sum of the $H(X_i)$) with the standard Azuma-Hoeffding inequality for martingale increment sequences (applied to each $\sum_i \xi_{i,j}$). Thus, $V$-uniform ergodicity $\Rightarrow$ exponential separability!

18 Some examples. Our assumptions can be straightforwardly verified for, e.g., very general classes of HMMs with finite state space and nonlinear HMMs with possibly non-compact state space, e.g. the popular stochastic volatility model
$X_{k+1} = \alpha X_k + \sigma \epsilon_{k+1}, \qquad Y_k = \beta \exp(X_k / 2)\, \varepsilon_k,$
where $(\epsilon_k)$ and $(\varepsilon_k)$ are sequences of i.i.d. Gaussian variables and $\theta = (\alpha, \beta, \sigma)$ are unknown parameters with $|\alpha| < 1$ (Taylor, 1982).
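
For concreteness, a minimal simulation sketch of this model; the parameter values are illustrative assumptions. Unlike the finite-state example above, $p^\nu_\theta(Y_0^n)$ has no closed form here, which is exactly the regime the general consistency result is aimed at.

```python
import numpy as np

def simulate_sv(n, alpha=0.95, sigma=0.25, beta=0.6, rng=None):
    """Simulate the stochastic volatility model of Taylor (1982):
    X_{k+1} = alpha X_k + sigma eps_{k+1}, Y_k = beta exp(X_k / 2) eps'_k,
    with independent standard Gaussian noise sequences and |alpha| < 1."""
    rng = np.random.default_rng(rng)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2))  # stationary start
    for k in range(n - 1):
        x[k + 1] = alpha * x[k] + sigma * rng.normal()
    y = beta * np.exp(x / 2) * rng.normal(size=n)          # observed series
    return x, y

# x, y = simulate_sv(2_000, rng=0)
```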

19 We are here: outline slide (cf. slide 3).

20 Our approach: main ideas. Instead of proving (1), i.e. that for all $\theta \in \Theta$,
$\lim_{n} n^{-1} \log p^\nu_\theta(Y_0^n) = H(\theta, \theta^\star) \quad \text{(a.s.)},$
we note that it is enough to establish
1. the limit $\lim_{n} n^{-1} \log p^\nu_{\theta^\star}(Y_0^n) = H(\theta^\star, \theta^\star)$ (a.s.), which is easily obtained using the Shannon-Breiman-McMillan (SBM) theorem, and
2. the following bound: for all closed sets $C \subset \Theta$ with $\theta^\star \notin C$,
$\limsup_{n} n^{-1} \sup_{\theta \in C} \log p^\nu_\theta(Y_0^n) < H(\theta^\star, \theta^\star) \quad \text{(a.s.)}.$

21 Our approach: a blocking technique. To establish the bound, consider blocks of observations of length $l$ and bound (roughly) the likelihood according to
$\log p^\nu_\theta(Y_0^n) \le \sum_{k=0}^{\lfloor n/l \rfloor - 1} \log p^\lambda_\theta\bigl(Y_{kl}^{kl+l}\bigr) + o_{\mathrm{a.s.}}(n).$
This implies directly, via Birkhoff's ergodic theorem,
$\limsup_{n} n^{-1} \log p^\nu_\theta(Y_0^n) \le \bar{\mathbb{E}}_{\theta^\star}\bigl( l^{-1} \log p^\lambda_\theta(Y_0^l) \bigr) \quad \text{(a.s.)}.$

22 Our approach: identifiability. To complete the proof we use the assumed exponential separability and its connection to the KL divergence: for all $\theta \neq \theta^\star$,
$\bigl(\mathbb{P}^\nu_\theta(Y_0^n \in \cdot)\bigr) \lhd \bigl(\mathbb{P}_{\theta^\star}(Y_0^n \in \cdot)\bigr)$
$\Rightarrow \liminf_{n} \bar{\mathbb{E}}_{\theta^\star}\Bigl( n^{-1} \log \frac{p_{\theta^\star}(Y_0^n)}{p^\lambda_\theta(Y_0^n)} \Bigr) > 0$
$\Rightarrow \limsup_{n} \bar{\mathbb{E}}_{\theta^\star}\bigl( n^{-1} \log p^\lambda_\theta(Y_0^n) \bigr) < H(\theta^\star, \theta^\star)$
$\Rightarrow$ Lemma: for all $\theta \neq \theta^\star$ there exists $l_\theta$ such that $\bar{\mathbb{E}}_{\theta^\star}\bigl( l_\theta^{-1} \log p^\lambda_\theta(Y_0^{l_\theta}) \bigr) < H(\theta^\star, \theta^\star)$.

23 Our approach: identifiability (cont.). Combining this with the blocking bound of the previous slide (applied with block length $l_\theta$) gives
$\limsup_{n} n^{-1} \log p^\nu_\theta(Y_0^n) < H(\theta^\star, \theta^\star) \quad \text{(a.s.)}$
and, after some more work using that the parameter space $\Theta$ is compact, for all closed sets $C \subset \Theta$ with $\theta^\star \notin C$,
$\limsup_{n} n^{-1} \sup_{\theta \in C} \log p^\nu_\theta(Y_0^n) < H(\theta^\star, \theta^\star) \quad \text{(a.s.)}.$
This completes the proof of consistency.

24 Summary. We have
- introduced HMMs and the problem of maximum likelihood-based inference in such models;
- proved that the MLE is strongly consistent under (what we believe to be) minimal assumptions;
- discussed an information-theoretic device, exponential separability, which is used efficiently in our proof to establish identifiability;
- shown how exponential separability can be verified using a novel Azuma-Hoeffding-type inequality for $V$-uniformly ergodic Markov chains, a result of independent interest.
