Hilbert Space Embeddings of Hidden Markov Models

Size: px

Start display at page:

Download "Hilbert Space Embeddings of Hidden Markov Models"

Georgia Tucker
6 years ago
Views:

1 Hilbert Space Embeddings of Hidden Markov Models Le Song Carnegie Mellon University Joint work with Byron Boots, Sajid Siddiqi, Geoff Gordon and Alex Smola 1

2 Big Picture QuesJon Graphical Models! Dependent variables! Hidden variables! High dimensional! Nonlinear! MulJmodal Kernel Methods! High dimensional! Nonlinear! MulJmodal! Dependent variables! Hidden variables Graphical models and kernel methods complement each other 2

3 Big Picture QuesJon! High dimensional! Nonlinear! MulJmodal! Dependent variables! Hidden variables Kernel Graphical Models Combine the best of graphical models and kernel methods? Kenji Fukumizu, Carlos Guestrin, Arthur GreQon, Jonathan Huang, Eric Xing 3

4 Hidden Markov Models (HMMs) Video sequence Music High- dimensional features Hidden variables Unsupervised learning 4

5 NotaJon Prior TransiJon ObservaJon 5

6 Previous Work on HMMs ExpectaJon maximizajon [Dempster et al. 77]: Maximum likelihood solujon Interested in predicjon, not hidden states! Local maxima! Curse of dimensionality Singular value decomposijon (SVD) for surrogate hidden states! No local opjma! Consistent Spectral HMMs [Hsu et al. 09, Siddiqi et al. 10], Subspace IdenJficaJon [Van Overschee and De Moor 96] 6

7 PredicJve DistribuJons of HMMs Input output Variable eliminajon: Observable Operator [Jaeger 00] = 7

8 PredicJve DistribuJons of HMMs Input output Variable eliminajon (matrix representajon): 8

9 Observable representajon of HMM Key observajon: need not recover : Only need to esjmate O, Ax and π up to inverjble transformajon S where are singular vectors of joint probability of sequence pairs [Hsu et al. 09] 9

10 Observable representajon for HMM sequence pairs triplets singletons Thin SVD of C2,1, get principal lef singular vectors U 10

11 Observable representajon for HMMs EsJmable from observajon sequence! pairs A thin SVD of triplets C2,1 to get U singletons Works only for discrete case 11

12 Key Objects in Graphical Models Marginal distribujons Joint distribujons CondiJonal distribujons Sum rule Use kernel representajon for distribujons, do probabilisjc inference in feature space Product rule 12

13 Embedding distribujons Summary stajsjcs for distribujons : Mean Covariance Probability P(y0) expected features Pick a kernel, and generate a different summary stajsjc 13

14 Embedding distribujons One- to- one mapping from to for certain kernels (RBF kernel) Sample average converges to true mean at 14

15 Embedding joint distribujons Embedding joint distribujons outer- product feature map using is also the covariance operator Recover discrete probability with delta kernel Empirical esjmate converges at 15

16 Embedding CondiJonals For each value X=x condijoned on, return the summary stajsjc for Some X=x are never observed 16

17 Embedding condijonals avoid data par33on Condi3onal Embedding Operator 17

18 CondiJonal Embedding Operator EsJmaJon via covariance operators [Song et al. 09] Gaussian case: covariance matrix instead Discrete case: joint over marginal Empirical esjmate converges at 18

19 Sum and Product Rules ProbabilisJc RelaJon Hilbert Space RelaJon Sum Rule Product Rule Total ExpectaJon CondiJonal Embedding Linearity 19

20 Hilbert Space HMMs 20

21 Observable RepresentaJon for HMMs pairs triplets singletons 21

22 Hilbert space HMMs Use Hilbert space representajon for distribujons, do probabilisjc inference in feature space pairs triplets singletons 22

23 PredicJng Video Sequences 23

24 PredicJng Video Sequences Sequence of grey scale images as inputs Latent space dimension 50 24

25 PredicJng Sensor Time- series InerJal unit: 3D accelerajon and orientajon Latent space dimension 20 25

26 Audio Event ClassificaJon Mel- Frequency Cepstral Coefficients features Varying latent space dimension 26

27 Summary Represent distribujons in feature spaces, reason using Hilbert space sum and product rules Extends HMMs nonparametrically to domains with kernels Kernelize belief propagajon and general graphical models with hidden variables? 27

Reduced-Rank Hidden Markov Models

Reduced-Rank Hidden Markov Models Sajid M. Siddiqi Byron Boots Geoffrey J. Gordon Carnegie Mellon University ... x 1 x 2 x 3 x τ y 1 y 2 y 3 y τ Sequence of observations: Y =[y 1 y 2 y 3... y τ ] Assume