HMMs with variable dimension structures and extensions


Slide 1: HMMs with variable dimension structures and extensions
Christian P. Robert, Université Paris Dauphine (xian)
[HMM days, ENST, January 21]

Slide 2: Estimating [not testing] the number of components/states

Slide 3: In the mixture model
$$y_i \sim f(y) = \sum_{j=1}^{k} p_j\, f(y \mid \theta_j), \qquad i = 1, \dots, n,$$
what if $k$ is unknown?!
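
To make the notation concrete, here is a minimal NumPy/SciPy sketch of evaluating this mixture log-likelihood; the data and parameter values are illustrative, not taken from the talk:

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(y, p, mu, sigma):
    """Log-likelihood of y under the mixture sum_j p_j N(mu_j, sigma_j^2)."""
    # (n, k) matrix of component densities at each observation
    dens = norm.pdf(y[:, None], loc=mu[None, :], scale=sigma[None, :])
    return np.sum(np.log(dens @ p))

# illustrative example with k = 2 components
y = np.random.default_rng(0).normal(size=100)
print(mixture_loglik(y, p=np.array([0.3, 0.7]),
                     mu=np.array([-1.0, 1.0]),
                     sigma=np.array([1.0, 0.5])))
```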

Slide 4: Meaning of the question
- weak identifiability of mixtures
- unsolvable philosophical problem unless $k$ has a proper intrinsic meaning [and even so...]
- hence testing per se is impossible: the data cannot distinguish between $k$ and $k + h$ [unless guided by a firm hand!]
- the choice of a prior $\pi(k)$ on $k$ is thus necessary to translate the degree of detail required [equivalent to penalizing factors in likelihood analysis]

Slide 5: Multiplicity of technical solutions
- Saturated models with $n$ mostly empty components
- Reversible jump MCMC techniques for exploration of most models [Green (1995); Richardson & Green (1997)]
- Birth-and-death and other jump processes [Preston (1976); Ripley (1977); Stephens (1999, 2000)]

Slide 6: Principles of RJMCMC
Births and deaths are proposed with probabilities $\beta_k$ and $\delta_k$, respectively, when the chain has $k$ components. The acceptance probability for a birth move is $\min(A, 1)$, with
$$A = \text{likelihood ratio} \times \frac{\delta_{k+1}}{\beta_k} \times \frac{(1-w)^{k-1}}{b(w, \varphi)}.$$
The acceptance probability for a death move is $\min(A^{-1}, 1)$.
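
A hedged sketch of how this birth move could be coded; loglik, b_sample and b_logpdf are model-specific placeholders, and the prior ratio $\pi(k+1)/\pi(k)$ is omitted (it would enter logA additively in the same way):

```python
import numpy as np

rng = np.random.default_rng(1)

def birth_move(weights, phis, loglik, b_sample, b_logpdf, beta_k, delta_k1):
    """Propose the birth of one component; return the (possibly unchanged) state."""
    k = len(weights)
    w_new, phi_new = b_sample()                     # draw (w, phi) from b
    # renormalize old weights by (1 - w_new) and append the new component
    weights_prop = np.append(weights * (1 - w_new), w_new)
    phis_prop = np.append(phis, phi_new)
    # log A = log-likelihood ratio + log(delta_{k+1}/beta_k)
    #         + (k - 1) log(1 - w_new) - log b(w_new, phi_new)
    logA = (loglik(weights_prop, phis_prop) - loglik(weights, phis)
            + np.log(delta_k1 / beta_k)
            + (k - 1) * np.log(1 - w_new) - b_logpdf(w_new, phi_new))
    if np.log(rng.uniform()) < logA:                # accept with prob min(A, 1)
        return weights_prop, phis_prop
    return weights, phis
```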

Slide 7: Principles of BDMCMC
New components are born according to a Poisson process with rate $\lambda_k$ when the chain has $k$ components. Each component $(w, \varphi)$ dies with rate
$$d(w, \varphi) = \text{likelihood ratio}^{-1} \times \frac{\lambda_k}{k+1} \times \frac{b(w, \varphi)}{(1-w)^{k-1}}.$$
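
Since this is a continuous-time jump process, it can be simulated with exponential holding times: the next event occurs at total rate $\lambda_k + \sum_i d(w_i, \varphi_i)$ and is a birth or a particular death in proportion to the rates. A sketch under that reading; death_rate and sample_new are model-specific placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def bd_step(components, lam_k, death_rate, sample_new, t):
    """One jump of the birth-death process; components is a list of (w, phi)."""
    d = np.array([death_rate(w, phi, components) for (w, phi) in components])
    total = lam_k + d.sum()
    t += rng.exponential(1.0 / total)          # holding time before the jump
    if rng.uniform() < lam_k / total:          # birth, with prob lam_k / total
        components = components + [sample_new()]
    else:                                      # otherwise a death, chosen prop. to d_i
        i = rng.choice(len(components), p=d / d.sum())
        components = components[:i] + components[i + 1:]
    return components, t
```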

Slide 8: More general moves
Local balance for Markov jump processes in general:
$$\pi(\theta)\, q(\theta, \theta') = \pi(\theta')\, q(\theta', \theta)$$
For birth-death, split-merge moves, etc.:
$$\pi(k)\,\pi(\theta_k \mid k)\, L(\theta_k)\, \lambda_k\, b(u_k)\, J^{-1} = \pi(k+1)\,\pi(\theta_{k+1} \mid k+1)\, L(\theta_{k+1})\, d(\theta_{k+1}, \theta_k)$$

Slide 9: Comparison of mixing properties
RJMCMC works poorly if
$$A = \text{likelihood ratio} \times \frac{\delta_{k+1}}{\beta_k} \times \frac{(1-w)^{k-1}}{b(w, \varphi)}$$
is small. But if $A$ is small, then
$$d(w, \varphi) = \text{likelihood ratio}^{-1}\, \frac{\lambda_k}{k+1}\, \frac{b(w, \varphi)}{(1-w)^{k-1}} = \frac{N}{k+1}\, A^{-1}$$
is large (under the rescaling $\beta_k = \lambda_k/N$, $\delta_{k+1} \approx 1$ of the next slide), and BDMCMC works poorly.

Slide 10: RJMCMC to BDMCMC: rescaling time
In discrete-time RJMCMC, let the time unit be $1/N$ and put $\beta_k = \lambda_k/N$ and $\delta_k = 1 - \lambda_k/N$. As $N \to \infty$, each birth proposal will be accepted and, when the chain has $k$ components, births occur according to a Poisson process with rate $\lambda_k$.

Slide 11: As $N \to \infty$, a component $(w, \varphi)$ dies with rate
$$\lim_{N\to\infty} N\,\delta_{k+1}\,\frac{1}{k+1}\,\min(A^{-1}, 1)
= \lim_{N\to\infty} \frac{N \beta_k}{k+1}\, \text{likelihood ratio}^{-1}\, \frac{b(w,\varphi)}{(1-w)^{k-1}}
= \text{likelihood ratio}^{-1}\, \frac{\lambda_k}{k+1}\, \frac{b(w,\varphi)}{(1-w)^{k-1}}.$$
Hence RJMCMC converges to BDMCMC. This holds more generally.

Slide 12: Inference with varying $k$
- Little difference from the fixed-$k$ setting
- Inference must be conditional on $k$
- General principle in Bayesian model choice: parameters appearing in different models must be considered as separate entities

Slide 13: Inference on $k$ through posterior probabilities and predictive plots of the regression lines
[Figure: predictive plots for successive values of $k$]

Slide 14: Results on a large uniform sample for the beta mixture
$$p_0 + (1 - p_0) \sum_{i=1}^{k} \frac{\omega_i}{\sum_l \omega_l}\, \mathcal{B}e\bigl(\alpha_i \epsilon_i,\ \alpha_i (1 - \epsilon_i)\bigr)$$
- $k$ never estimated as $0$
- $p_0$ very small
- likelihood widely different from $1$
- curve almost flat

Slide 15: [Figure: histograms of the dataset]

Slide 16: [Figure: histograms of the MCMC output for $p_0$, $k$, the $\alpha$'s, the $\epsilon$'s, and the weights]

Slide 17: Extensions to more challenging structures
Introduce more advanced models by way of additional latent variables, with possible dependence.

Slide 18: Hidden Markov models
$$z_1 \sim \pi, \qquad z_i \mid z_{i-1} \sim p_{z_{i-1}\, z_i}, \qquad x_i \mid z_i \sim \mathcal{N}(\mu_{z_i}, \sigma^2_{z_i})$$
[Figure: MCMC histograms and raw plots of the $\omega$'s, $\mu$'s and $\sigma$'s]
Very similar to the normal mixture, but with additional structure that improves estimation.

Slide 19: Still allows for flat priors
$$\pi(\mu, \sigma, P) \propto \frac{1}{\sigma_1} \prod_{i=1}^{k-1} \exp\left\{ -\frac{(\mu_{i+1} - \mu_i)^2}{2\sigma_i^2} \right\} \mathbb{I}_{\sigma_1 > \cdots > \sigma_k}$$
[Robert & Titterington (1997)]
Gibbs implementation straightforward:
1. Generate the missing data from $p(z_i = j \mid z_{i-1}, z_{i+1}, \theta, P)$

Slide 20: 2. Generate the parameters
$$p_{i\cdot} \sim \mathcal{D}(n_{i1} + 1, \dots, n_{ik} + 1)$$
$$\mu_i \sim \mathcal{N}\left( \frac{n_i \sigma_i^{-2} \bar{x}_i + \alpha_{i-1}\mu_{i-1} + \alpha_{i+1}\mu_{i+1}}{n_i \sigma_i^{-2} + \alpha_{i-1} + \alpha_{i+1}},\ \bigl(n_i \sigma_i^{-2} + \alpha_{i-1} + \alpha_{i+1}\bigr)^{-1} \right)$$
$$\sigma_i^2 \sim \mathcal{IG}\left( \frac{n_i - 1}{2},\ \frac{n_i(\bar{x}_i - \mu_i)^2 + s_i^2}{2} \right) \mathbb{I}_{\sigma_{i-1} < \sigma_i < \sigma_{i+1}}$$
[Celeux, Diebolt & Robert (1993)]
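
A sketch of one Gibbs sweep combining the two steps above, for Gaussian emissions: the state conditional is $P(z_i = j \mid z_{i-1}, z_{i+1}, \theta, P) \propto p_{z_{i-1} j}\, p_{j z_{i+1}}\, \mathcal{N}(x_i; \mu_j, \sigma_j^2)$, followed by the Dirichlet row update. The $\mu$ and $\sigma^2$ updates and the ordering constraints are left out for brevity, and a flat initial distribution is assumed at $i = 0$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def gibbs_sweep(x, z, P, mu, sigma):
    n, k = len(x), len(mu)
    # 1. update the hidden states one at a time from their full conditionals
    for i in range(n):
        probs = norm.pdf(x[i], mu, sigma)          # emission term f(x_i | theta_j)
        if i > 0:
            probs = probs * P[z[i - 1], :]         # p_{z_{i-1}, j}
        if i < n - 1:
            probs = probs * P[:, z[i + 1]]         # p_{j, z_{i+1}}
        z[i] = rng.choice(k, p=probs / probs.sum())
    # 2. update each transition row: p_i. ~ Dirichlet(n_i1 + 1, ..., n_ik + 1)
    counts = np.zeros((k, k))
    for a, b in zip(z[:-1], z[1:]):
        counts[a, b] += 1
    for i in range(k):
        P[i] = rng.dirichlet(counts[i] + 1)
    return z, P
```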

Slide 21: A non-Gibbs implementation is also possible, without the missing states, thanks to the forward-backward formulae. Estimation of $k$ is possible via reversible jump [Robert, Rydén & Titterington (1999)] and other jump process methods [Cappé, Robert & Rydén (2001)].
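
For reference, a sketch of the forward recursion that delivers the likelihood with the hidden states integrated out (log scale for numerical stability; the initial distribution init is left to the caller):

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def hmm_loglik(x, P, mu, sigma, init):
    """Forward recursion: log p(x_1:n) for a Gaussian-emission HMM."""
    logB = norm.logpdf(x[:, None], mu[None, :], sigma[None, :])  # (n, k) emissions
    logalpha = np.log(init) + logB[0]
    for t in range(1, len(x)):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) P_ij] * f(x_t | theta_j)
        logalpha = logsumexp(logalpha[:, None] + np.log(P), axis=0) + logB[t]
    return logsumexp(logalpha)
```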

Slide 22: Split-merge moves for HMMs
Parametrisation: $p_{ij} = \omega_{ij} / \sum_l \omega_{il}$, $Y_t \mid X_t = i \sim \mathcal{N}(\mu_i, \sigma_i^2)$.
Move to split component $j$ into $j_1$ and $j_2$:
- $\omega_{i j_1} = \omega_{ij}\, \epsilon_i$, $\omega_{i j_2} = \omega_{ij}(1 - \epsilon_i)$, $\epsilon_i \sim \mathcal{U}(0, 1)$;
- $\omega_{j_1 j} = \omega_{jj}\, \xi_j$, $\omega_{j_2 j} = \omega_{jj} / \xi_j$, $\xi_j \sim \log\mathcal{N}(0, 1)$; similar ideas give $\omega_{j_1 j_2}$ etc.;
- $\mu_{j_1} = \mu_j - 3\sigma_j\, \epsilon_\mu$, $\mu_{j_2} = \mu_j + 3\sigma_j\, \epsilon_\mu$, $\epsilon_\mu \sim \mathcal{N}(0, 1)$;
- $\sigma^2_{j_1} = \sigma^2_j\, \xi_\sigma$, $\sigma^2_{j_2} = \sigma^2_j / \xi_\sigma$, $\xi_\sigma \sim \log\mathcal{N}(0, 1)$.
Split intensity $\lambda_{S,k} = k\, \lambda_B$, with $\lambda_B$ the birth intensity.
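
A sketch of the random part of this split proposal; the $\omega_{j_1 j_2}$ cross entries, the Jacobian, and the acceptance step that the move also requires are omitted:

```python
import numpy as np

rng = np.random.default_rng(4)

def split_parameters(w_col_j, w_row_j, mu_j, sigma2_j):
    """Schematic split of state j into j1 and j2 on the omega-parametrisation."""
    k = len(w_col_j)
    eps = rng.uniform(size=k)                        # eps_i ~ U(0, 1)
    col_j1, col_j2 = w_col_j * eps, w_col_j * (1 - eps)   # omega_{i j1}, omega_{i j2}
    xi = rng.lognormal(size=k)                       # xi ~ logN(0, 1)
    row_j1, row_j2 = w_row_j * xi, w_row_j / xi      # outgoing rows of j1 and j2
    eps_mu = rng.normal()
    mu1 = mu_j - 3 * np.sqrt(sigma2_j) * eps_mu      # mu_{j1}
    mu2 = mu_j + 3 * np.sqrt(sigma2_j) * eps_mu      # mu_{j2}
    xi_s = rng.lognormal()
    s1, s2 = sigma2_j * xi_s, sigma2_j / xi_s        # sigma^2_{j1}, sigma^2_{j2}
    return (col_j1, col_j2), (row_j1, row_j2), (mu1, mu2), (s1, s2)
```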

Slide 23: Fixed-$k$ moves are also used.
Example: wind intensity in Athens.
[Figure: histogram and raw plot of the dataset]

Slide 24: [Figure: MCMC output on $k$ (histogram and raw plot), number of states, number of moves, and corresponding log-likelihood values]

Slide 25: [Figure: MCMC sequence of the parameters of the three components when conditioning on $k = 3$]

Slide 26: [Figure: MCMC evaluation of the marginal density, compared with R's nonparametric density estimate]

Slide 27: Other latent variable models and hidden structures
- Hidden semi-Markov models
- Switching ARMA models
- Stochastic volatility and ARCH models
- Discretised diffusions

Slide 28: Hidden semi-Markov models
Example: ion channel model [Hobson, 1999; Carpenter et al., 2001]
Observables $y = (y_t)_{1 \le t \le T}$ directed by a hidden Gamma process $x = (x_t)_{1 \le t \le T}$:
$$y_t \mid x_t \sim \mathcal{N}(\mu_{x_t}, \sigma^2), \qquad x_t \in \{0, 1\},$$
with durations ($i = 0, 1$)
$$d_j = t_{j+1} - t_j \sim \mathcal{G}a(s_i, \lambda_i) \quad \text{if } x_t = i \text{ for } t_j \le t < t_{j+1}.$$
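
A sketch of simulating from this generative model, which makes the semi-Markov (Gamma-duration) structure explicit; the Gamma durations are rounded to the discrete time grid, and the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_hsmm(T, mu, sigma, s, lam):
    """Alternate 0/1 regimes with Ga(s_i, lambda_i) durations; Gaussian observations."""
    x = np.empty(T, dtype=int)
    t, state = 0, 0
    while t < T:
        # duration ~ Ga(s_i, lambda_i), i.e. shape s_i and rate lambda_i
        d = max(1, int(round(rng.gamma(shape=s[state], scale=1.0 / lam[state]))))
        x[t:t + d] = state
        t += d
        state = 1 - state                      # the hidden process alternates 0 <-> 1
    y = rng.normal(mu[x], sigma)               # y_t | x_t ~ N(mu_{x_t}, sigma^2)
    return x, y

x, y = simulate_hsmm(1000, mu=np.array([0.0, 1.0]), sigma=0.3,
                     s=np.array([3.0, 2.0]), lam=np.array([0.1, 0.05]))
```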

Slide 29: [Figure: observed series $y_t$]
Complex likelihood structure with no closed-form expression.

Slide 30: Prior assumptions
- conjugate normal-gamma prior on the $\mu$'s and $\sigma$: $\mu_i \sim \mathcal{N}(\theta_0, \tau\sigma^2)$, $\sigma^2 \sim \mathcal{G}(\zeta, \eta)^{-1}$
- conjugate gamma prior $\mathcal{G}(\alpha, \beta)$ on the $\lambda$'s
- flat prior on the $s$'s on $\{1, \dots, S\}$

Slide 31: Particle system
Generation of a system of particles $(\omega^{(j)}, x^{(j)})$, $j = 1, \dots, J$, where $\omega = (\mu_0, \mu_1, \sigma, \lambda_0, \lambda_1, s_0, s_1)$, based on a proposal/instrumental/importance distribution
$$\pi(\omega \mid y, x)\, \pi_H(x \mid y, \omega)$$

Slide 32: where $\pi_H$ is the full conditional of a fitted hidden Markov model with transition matrix
$$\mathbb{P} = \begin{pmatrix} 1 - \lambda_0/s_0 & \lambda_0/s_0 \\ \lambda_1/s_1 & 1 - \lambda_1/s_1 \end{pmatrix},$$
by analogy with the average sojourn times ($s_i/\lambda_i$) of both models.

Slide 33: Simulation
Use of the forward-backward formulae, of the conjugate structure for the $\mu$'s, $\lambda$'s and $\sigma$, and of the finite support of the $s_i$'s, distributed as
$$\pi(s_i \mid x) \propto \frac{\Gamma(n_i s_i + \alpha)}{(\beta + v_i)^{n_i s_i + \alpha}}\, \frac{\left[\prod_j d_{ij}\right]^{s_i}}{\Gamma(s_i)^{n_i}}\, \mathbb{I}_{\{1, 2, \dots, S\}}(s_i),$$
where $n_i$ is the number of sojourns in state $i$, the $d_{ij}$ their durations, and $v_i$ their sum.
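
Because the support is finite, $\pi(s_i \mid x)$ can be sampled exactly by normalizing this expression over $\{1, \dots, S\}$. A sketch in log scale, with d the vector of durations of the $n_i$ sojourns in state $i$ (and the formula above taken as reconstructed):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(6)

def sample_s(d, alpha, beta, S):
    """Draw s_i from pi(s_i | x) on {1, ..., S}, with lambda_i integrated out."""
    n, v = len(d), d.sum()
    s = np.arange(1, S + 1)
    logp = (gammaln(n * s + alpha) - (n * s + alpha) * np.log(beta + v)
            + s * np.log(d).sum() - n * gammaln(s))
    logp -= logp.max()                      # stabilize before exponentiating
    p = np.exp(logp)
    return rng.choice(s, p=p / p.sum())
```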

Slide 34: [Figure: fitted series with residuals (top) and allocation probabilities (bottom)]

Slide 35: [Figure: posterior histograms of $\mu_1$, $\mu_2$, $\sigma$, $(\mu_1 - \mu_2)/\sigma$, the $\lambda$'s, the $s$'s, and the mean sojourn times $s_1/\lambda_1$, $s_2/\lambda_2$]

Slide 36: Iterated particle system
Repeated calls to importance sampling, with systematic resampling steps, to improve the fit.
How many steps? Which improvement? Why bother?!

Slide 37: Algorithm
Step 0. Generate ($j = 1, \dots, J$)
1. $\omega^{(j)} \sim \pi(\omega)$
2. $x^{(j)} = (x_t^{(j)})_{1 \le t \le T} \sim \pi_H(x \mid y, \omega^{(j)})$
and compute the weights ($j = 1, \dots, J$)
$$\varrho_j \propto \frac{\pi(\omega^{(j)}, x^{(j)} \mid y)}{\pi(\omega^{(j)})\, \pi_H(x^{(j)} \mid y, \omega^{(j)})}$$

Slide 38: Step i. ($i = 1, \dots$) Generate ($j = 1, \dots, J$)
1. $\omega^{(j)} \sim \pi(\omega \mid y, x^{(j)})$
2. $x_+^{(j)} = (x_t^{(j)})_{1 \le t \le T} \sim \pi_H(x \mid y, \omega^{(j)})$
compute the weights ($j = 1, \dots, J$)
$$\varrho_j \propto \frac{\pi(\omega^{(j)}, x_+^{(j)} \mid y)}{\pi(\omega^{(j)} \mid y, x^{(j)})\, \pi_H(x_+^{(j)} \mid y, \omega^{(j)})},$$
resample the couples $(\omega^{(j)}, x_+^{(j)})$ from the weights $\varrho_j$, and take $x^{(j)} = x_+^{(j)}$ ($j = 1, \dots, J$).
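
A skeleton of the iteration (Step i) with systematic resampling; sample_omega_post, sample_hidden, log_target, log_omega_post and log_hidden_prop are model-specific placeholders standing for $\pi(\omega \mid y, x)$, $\pi_H$, and the weight terms above:

```python
import numpy as np

rng = np.random.default_rng(7)

def systematic_resample(weights):
    """Return indices resampled systematically according to the (normalized) weights."""
    J = len(weights)
    positions = (rng.uniform() + np.arange(J)) / J
    return np.minimum(np.searchsorted(np.cumsum(weights), positions), J - 1)

def iterate_particles(omegas, xs, n_steps, sample_omega_post, sample_hidden,
                      log_target, log_omega_post, log_hidden_prop):
    for _ in range(n_steps):
        omegas = [sample_omega_post(x) for x in xs]       # omega ~ pi(omega | y, x)
        xs_new = [sample_hidden(w) for w in omegas]       # x+ ~ pi_H(x | y, omega)
        logw = np.array([log_target(w, x) - log_omega_post(w, xo)
                         - log_hidden_prop(x, w)
                         for w, x, xo in zip(omegas, xs_new, xs)])
        w = np.exp(logw - logw.max())
        idx = systematic_resample(w / w.sum())            # resample the couples
        omegas = [omegas[i] for i in idx]
        xs = [xs_new[i] for i in idx]
    return omegas, xs
```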

Slide 39: [Figure: history and ancestry of the particle system]

Slide 40: Solving optimization problems

Slide 41: Role of maximum a posteriori estimation in Bayesian inference
$$\theta = (\theta_1, \theta_2) \in \Theta_1 \times \Theta_2, \qquad \theta \sim p(\theta),$$
especially when posterior means are useless. But there is a difficulty with marginal MAP (MMAP) estimates, because the nuisance parameters must be integrated out:
$$\theta_1^{\text{MMAP}} = \arg\max_{\Theta_1} p(\theta_1 \mid y), \quad \text{where} \quad p(\theta_1 \mid y) = \int_{\Theta_2} p(\theta_1, \theta_2 \mid y)\, d\theta_2$$

Slide 42: If the integration is possible in closed form, use the Expectation-Maximization (EM) algorithm [Dempster et al. (1977)], a deterministic algorithm which depends on initialization and is limited to certain classes of models. Stochastic variants: stochastic EM (SEM) and Monte Carlo EM (MCEM) [Celeux & Diebolt (1985); Wei & Tanner (1991)]. The parameter of interest is always updated deterministically in the M step.

Slide 43: Standard (and Markov chain) Monte Carlo: draw random samples from the joint posterior distribution $p(\theta_1, \theta_2 \mid y)$, or an MCMC (approximate, dependent) sample
$$\left\{ \left( \theta_1^{(i)}, \theta_2^{(i)} \right);\ i = 1, \dots, N \right\},$$
and discard the nuisance parameters. More suited to integration than to optimization.

Slide 44: Simulated annealing (SA) for maximizing $p(\theta_1 \mid y)$
Non-homogeneous variant of MCMC for global optimization: the invariant distribution at iteration $i$ is proportional to $p^{\gamma(i)}(\theta_1 \mid y)$, with $\gamma(i)$ an increasing function diverging to infinity. Idea: as $\gamma(i)$ goes to infinity, $p^{\gamma(i)}(\theta_1 \mid y)$ concentrates on the set of global modes.
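
A minimal sketch of such an annealed Metropolis run on a generic log-posterior logp; the Gaussian random-walk proposal and the linear schedule $\gamma(i) = 1 + i/100$ are illustrative choices, not the talk's:

```python
import numpy as np

rng = np.random.default_rng(8)

def simulated_annealing(logp, theta0, n_iter=10_000, step=0.1):
    """Metropolis with target proportional to p(theta | y)^gamma(i), gamma increasing."""
    theta = theta0
    for i in range(n_iter):
        gamma = 1.0 + i / 100.0                 # gamma(i) -> infinity
        prop = theta + step * rng.normal(size=np.shape(theta))
        # accept with prob min(1, [p(prop | y) / p(theta | y)]^gamma)
        if np.log(rng.uniform()) < gamma * (logp(prop) - logp(theta)):
            theta = prop
    return theta
```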

Slide 45: State Augmentation for Marginal Estimation [Doucet, Godsill & Robert (2001)]
Artificially augmented probability model whose marginal distribution is $p^{\gamma}(\theta_1 \mid y)$, via replications of the nuisance parameters:
- Replace $\theta_2$ with $\gamma$ artificial replications $\theta_2(1), \dots, \theta_2(\gamma)$
- Treat the $\theta_2(j)$'s as distinct random variables:
$$q_\gamma(\theta_1, \theta_2(1), \dots, \theta_2(\gamma) \mid y) \propto \prod_{k=1}^{\gamma} p(\theta_1, \theta_2(k) \mid y)$$

Slide 46: Use the corresponding marginal for $\theta_1$:
$$q_\gamma(\theta_1 \mid y) = \int q_\gamma(\theta_1, \theta_2(1), \dots, \theta_2(\gamma) \mid y)\, d\theta_2(1) \cdots d\theta_2(\gamma) \propto \prod_{k=1}^{\gamma} \int p(\theta_1, \theta_2(k) \mid y)\, d\theta_2(k) = p^{\gamma}(\theta_1 \mid y)$$
- Build an MCMC algorithm in the augmented space, with invariant distribution $q_\gamma(\theta_1, \theta_2(1), \dots, \theta_2(\gamma) \mid y)$
- Use the simulated subsequence $\{\theta_1^{(i)};\ i \in \mathbb{N}\}$ as drawn from the marginal posterior $p^{\gamma}(\theta_1 \mid y)$
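
A sketch of the resulting SAME-type Gibbs sampler: each of the $\gamma$ replications is refreshed from its full conditional given $\theta_1$, and $\theta_1$ from its conditional given all replications. sample_theta1 and sample_theta2 are model-specific placeholders, and the replication schedule is illustrative:

```python
def same_algorithm(theta1, theta2_init, sample_theta1, sample_theta2,
                   n_iter=1000, gamma_max=50):
    """State Augmentation for Marginal Estimation via replicated nuisance parameters."""
    replications = [theta2_init]
    for i in range(n_iter):
        gamma = 1 + (i * gamma_max) // n_iter       # slowly increasing gamma(i)
        while len(replications) < gamma:            # grow the augmented space
            replications.append(replications[-1])
        # refresh each replication from its full conditional p(theta2 | theta1, y)
        replications = [sample_theta2(theta1) for _ in replications]
        # refresh theta1 from its conditional given all gamma replications
        theta1 = sample_theta1(replications)
    return theta1
```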

Slide 47: Application to the benchmark galaxy dataset [Roeder (1992)]
82 observations of galaxy velocities from 3 (?) groups.

Algorithm                  EM    MCEM    SAME
Mean log-posterior         -     -       -
Std dev of log-posterior   -     -       -
