Estimating Latent Variable Graphical Models with Moments and Likelihoods


1 Estimating Latent Variable Graphical Models with Moments and Likelihoods
Arun Tejasvi Chaganty, Percy Liang
Stanford University
June 18, 2014

2 Introduction: Latent Variable Graphical Models
Gaussian Mixture Models, Latent Dirichlet Allocation, Hidden Markov Models, PCFGs, ...
[Figures: a mixture model with hidden variable h generating views x1, x2, and a hidden Markov model h1 → h2 → h3 → ... emitting x1, x2, x3]

3-6 Introduction: Parameter Estimation is Hard
[Figure: a non-convex log-likelihood surface over the parameters, marking the MLE, the local optima found by EM, and the MoM estimate]
- The log-likelihood function is non-convex.
- The MLE is consistent but intractable.
- Local methods (EM, gradient descent, ...) are tractable but inconsistent.
- Method-of-moments estimators can be consistent and computationally efficient, but require more data.

7-11 Introduction: Consistent estimation for general models
Several estimators are based on the method of moments:
- Phylogenetic trees: Mossel and Roch
- Hidden Markov models: Hsu, Kakade, and Zhang 2009
- Latent Dirichlet Allocation: Anandkumar et al.
- Latent trees: Anandkumar et al.
- PCFGs: Hsu, Kakade, and Liang 2012
- Mixtures of linear regressors: Chaganty and Liang 2013
- ...
These estimators are applicable only to a specific type of model. In contrast, EM and gradient descent apply to almost any model.
Note: some work in the observable operator framework does apply to a more general model class.
- Weighted automata: Balle and Mohri
- Junction trees: Song, Xing, and Parikh
TODO: Check that this list is representative.
How can we apply the method of moments to estimate parameters efficiently for a general model?

12 Introduction: Setup
- Discrete models: observed variables take d values, hidden variables take k values; assume d > k.
- Parameters and marginals can be arranged into matrices and tensors (introduce notation).
- Assume infinite data (exact population moments).
- Highlight directed vs. undirected models; we focus on directed.
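As a concrete illustration of the matrix/tensor notation the slide alludes to, the sketch below builds the two objects the talk keeps reusing: a d x k conditional moment matrix and a k x k hidden marginal. All names and sizes here are assumptions for illustration, not the paper's code.

```python
# Minimal sketch (assumed toy sizes): conditional distributions become
# d x k column-stochastic matrices; pairwise hidden marginals become
# k x k matrices summing to one.
import numpy as np

d, k = 5, 3                                       # observed values, hidden states (d > k)
rng = np.random.default_rng(0)
O = rng.dirichlet(np.ones(d), size=k).T           # O[x, h] = P(x | h), a d x k matrix
Z = rng.dirichlet(np.ones(k * k)).reshape(k, k)   # Z[h1, h2] = P(h1, h2)
assert np.allclose(O.sum(axis=0), 1) and np.isclose(Z.sum(), 1)
```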

13-14 Introduction: Background: Three-view Mixture Models
Definition (Bottleneck). A hidden variable h is a bottleneck if there exist three observed variables (views) x1, x2, x3 that are conditionally independent given h.
[Figure: h with children x1, x2, x3]
Anandkumar, Hsu, and Kakade 2012 provide an algorithm to estimate the conditional moments P(x_i | h) based on tensor eigendecomposition. In general, three views are necessary for identifiability (Kruskal 1977).
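To make the bottleneck structure concrete, the sketch below constructs the third-order moment tensor whose low-rank decomposition (e.g. the tensor eigendecomposition of Anandkumar et al. 2012) recovers the conditional moments. It assumes, for simplicity, that all three views share one conditional moment matrix O; this is an illustrative assumption, not the paper's setup.

```python
# Minimal sketch (assumed shared views): the three-view moment tensor of a
# mixture model with bottleneck h.
import numpy as np

k, d = 3, 5
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(k))              # P(h)
O = rng.dirichlet(np.ones(d), size=k).T     # O[x, h] = P(x | h), shared by all views

# Conditional independence of the views given h implies
# M3[x1, x2, x3] = sum_h pi[h] * O[x1, h] * O[x2, h] * O[x3, h].
M3 = np.einsum('h,ih,jh,kh->ijk', pi, O, O, O)

# Decomposing M3 recovers pi and the columns of O up to permutation.
```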

15 Introduction: Outline
TODO: Make outline a diagram.
- Introduction
- Estimating Hidden Marginals
- Combining moments with likelihood estimators
- Recovering parameters
- Conclusions

16-20 Introduction: Example: a bridge, take I
[Figure: bridge graph with hidden variables h1 — h2 and views x1^a, x1^b, x2^a, x2^b]
- Each edge has a set of parameters.
- h1 and h2 are bottlenecks.
- We can learn P(x1^a | h1), P(x1^b | h1), ....
- However, we can't learn P(h2 | h1) this way.

21-24 Introduction: Example: a bridge, take II
TODO: Use cartoon matrices.
Observe the joint distribution:
    P(x1^b, x2^a) = Σ_{h1,h2} P(x1^b | h1) P(x2^a | h2) P(h1, h2),
i.e. M12 = O^(1|1) Z12 (O^(2|2))^T, where M12 = P(x1^b, x2^a), O^(v|i) = P(x_v | h_i), and Z12 = P(h1, h2).
- The observed moments P(x1^b, x2^a) are linear in the hidden marginals P(h1, h2).
- Solve for P(h1, h2) by pseudoinversion: Z12 = (O^(1|1))† M12 ((O^(2|2))†)^T.
- P(h2 | h1) can then be recovered by normalization.
[Figure: bridge graph h1 — h2 with views x1^a, x1^b, x2^a, x2^b]
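The pseudoinversion step is short enough to write out. The sketch below generates a bridge model, forms the observed moments, and recovers the hidden marginal and P(h2 | h1) exactly, since it uses population moments and full-column-rank conditional moment matrices. Variable names are illustrative assumptions.

```python
# Minimal sketch (assumed bridge model): Z12 = O1† M12 (O2†)^T.
import numpy as np

k, d = 2, 4
rng = np.random.default_rng(1)
O1 = rng.dirichlet(np.ones(d), size=k).T            # P(x1^b | h1)
O2 = rng.dirichlet(np.ones(d), size=k).T            # P(x2^a | h2)
Z12 = rng.dirichlet(np.ones(k * k)).reshape(k, k)   # P(h1, h2)

M12 = O1 @ Z12 @ O2.T                               # observed moments P(x1^b, x2^a)

# Pseudoinversion is exact here: O1, O2 have full column rank and M12 is
# an infinite-data (population) moment.
Z12_hat = np.linalg.pinv(O1) @ M12 @ np.linalg.pinv(O2).T
assert np.allclose(Z12, Z12_hat)

# Normalization recovers the conditional probability table.
P_h2_given_h1 = Z12_hat / Z12_hat.sum(axis=1, keepdims=True)
```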

25 Introduction: Outline (repeated; see slide 15)

26 Estimating Hidden Marginals: Exclusive Views
Definition (Exclusive views). We say h_i ∈ S has an exclusive view x_v if:
1. There exists an observed variable x_v that is conditionally independent of the remaining hidden variables S \ {h_i} given h_i.
2. The conditional moment matrix O^(v|i) = P(x_v | h_i) has full column rank k and can be recovered.
3. TODO: Exclusive views for a clique.
[Figure: hidden variables h1, h2, h3, h4 forming a set S, with views x1, x2, x3, x4]

27-29 Estimating Hidden Marginals: Exclusive views give parameters
Given exclusive views P(x_v | h_v), learning a clique is solving a linear equation! TODO: Use cartoon tensors.
    P(x1, ..., xm) = Σ_{h1,...,hm} P(x1 | h1) ··· P(xm | hm) P(h1, ..., hm),
i.e. M = Z(O^(1|1), ..., O^(m|m)), where M = P(x1, ..., xm) and Z = P(h1, ..., hm).
Invert with pseudoinverses:
    Z = M((O^(1|1))†, ..., (O^(m|m))†).
[Figure: hidden set S = {h1, h2, h3, h4} with views x1, x2, x3, x4]
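The multilinear equation above can be checked numerically. The sketch below does so for an assumed three-variable clique: the observed moment tensor is a multilinear function of the hidden marginal, and applying a pseudoinverse along each mode inverts it. Shapes and names are illustrative assumptions.

```python
# Minimal sketch (assumed m = 3): M = Z(O1, O2, O3) and Z = M(O1†, O2†, O3†).
import numpy as np

k, d = 2, 4
rng = np.random.default_rng(2)
Os = [rng.dirichlet(np.ones(d), size=k).T for _ in range(3)]  # O_v[x, h] = P(x_v | h_v)
Z = rng.dirichlet(np.ones(k ** 3)).reshape(k, k, k)           # P(h1, h2, h3)

# M[x1, x2, x3] = sum_{h1,h2,h3} Z[h1,h2,h3] * prod_v O_v[x_v, h_v]
M = np.einsum('abc,ia,jb,kc->ijk', Z, *Os)

# Apply a pseudoinverse along each mode to recover the hidden marginal.
P = [np.linalg.pinv(O) for O in Os]
Z_hat = np.einsum('ijk,ai,bj,ck->abc', M, *P)
assert np.allclose(Z, Z_hat)
```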

30-32 Estimating Hidden Marginals: Bottlenecked graphs
When are we assured exclusive views?
Definition (Bottlenecked set). A set of hidden variables S is said to be bottlenecked if each h ∈ S is a bottleneck.
Theorem: a bottlenecked clique has exclusive views. TODO: Say this is shown in the paper.
[Figure: hidden set S = {h1, h2, h3, h4} with views x1, x2, x3, x4]

33 Estimating Hidden Marginals: Outline (repeated; see slide 15)

34-44 Estimating Hidden Marginals: Example
[Figure, built up incrementally across slides 34-44: a bottlenecked graph over hidden variables h1, h2, h3, h4, each with observed views x_i^a, x_i^b]

45-47 Estimating Hidden Marginals: More Bottlenecked Examples
- Hidden Markov models [Figure: chain h1 → h2 → h3 → ... emitting x1, x2, x3]
- Latent tree models [Figure: root h1 with children h2, h3, h4, each with views x_i^a, x_i^b]
- Noisy-or networks, a non-example (Halpern and Sontag 2013) [Figure: bipartite graph h1, h2 over x1, ..., x5]

48 Combining moments with likelihood estimators: Outline (repeated; see slide 15)

49-51 Combining moments with likelihood estimators: Convex marginal likelihoods
- The MLE is statistically most efficient, but the likelihood is usually non-convex.
- If we fix the conditional moments, maximizing log P(x) over θ = P(h1, h2) is a convex problem:
    log P(x) = log Σ_{h1,h2} P(x1 | h1) P(x2 | h2) P(h1, h2), with P(x1 | h1) and P(x2 | h2) known.
- There is no closed-form solution, but a local method like EM is guaranteed to converge to the global optimum.
[Figure: bridge graph; plot of the composite likelihood and pseudoinverse objectives as functions of a marginal parameter π]
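The global-optimum claim is easy to see in code. The sketch below runs EM over the hidden marginal Z = P(h1, h2) with the conditional moment matrices held fixed, on population moments; because the objective is convex in Z, EM recovers the true marginal. All names and the use of population (rather than sampled) data are illustrative assumptions.

```python
# Minimal sketch (assumed bridge model): EM over Z with O1, O2 fixed.
import numpy as np

k, d = 2, 4
rng = np.random.default_rng(3)
O1 = rng.dirichlet(np.ones(d), size=k).T        # P(x1 | h1), known
O2 = rng.dirichlet(np.ones(d), size=k).T        # P(x2 | h2), known
Z_true = rng.dirichlet(np.ones(k * k)).reshape(k, k)
M = O1 @ Z_true @ O2.T                          # population P(x1, x2)

Z = np.full((k, k), 1.0 / k ** 2)               # uniform initialization
for _ in range(500):
    # E-step: posterior P(h1, h2 | x1, x2) for every observed pair
    joint = np.einsum('ia,jb,ab->ijab', O1, O2, Z)       # P(x1, x2, h1, h2)
    post = joint / joint.sum(axis=(2, 3), keepdims=True)
    # M-step: expected (h1, h2) counts under the data distribution M
    Z = np.einsum('ij,ijab->ab', M, post)

print(np.abs(Z - Z_true).max())   # should be near zero: global optimum
```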

52-55 Combining moments with likelihood estimators: Composite likelihoods
- In general, the full likelihood is still non-convex. TODO: Specify which x.
    log P(x) = log Σ_{h1,h2,h3} P(x1 | h1) P(x2 | h2) P(x3 | h3) P(h3 | h2) P(h1, h2), with the conditional moments known.
- Instead, consider the composite likelihood of a subset of the observed variables:
    log P(x1, x2) = log Σ_{h1,h2} P(x1 | h1) P(x2 | h2) P(h1, h2), with P(x1 | h1) and P(x2 | h2) known.
- It can be shown that estimation with composite likelihoods is consistent (Lindsay 1988).
- Asymptotically, the composite likelihood estimator is more efficient than the pseudoinverse estimator.
[Figure: HMM h1 → h2 → h3 emitting x1, x2, x3; plot of estimation error ε for the pseudoinverse vs. composite likelihood estimators]
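For concreteness, the composite objective for the bridge-style subset above is just a few lines. The sketch below writes it as a function of the hidden marginal Z with the conditional moment matrices fixed; names and shapes are assumptions carried over from the earlier sketches.

```python
# Minimal sketch (assumed subset (x1, x2)): composite log-likelihood in Z.
import numpy as np

def composite_loglik(Z, counts, O1, O2):
    """counts[x1, x2]: empirical counts of observed pairs;
    P(x1, x2) = (O1 @ Z @ O2.T)[x1, x2]. Concave in Z, so a local
    optimizer reaches the global optimum."""
    P = O1 @ Z @ O2.T
    return float((counts * np.log(P)).sum())
```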

56 Recovering parameters: Outline (repeated; see slide 15)

57 Recovering parameters: Recovering parameters in directed models
Conditional probability tables are the default parameterization for a directed model. They can be recovered from the hidden marginals by normalization:
    P(h2 | h1) = P(h1, h2) / Σ_{h2'} P(h1, h2').
[Figure: bridge graph h1 — h2 with views x1^a, x1^b, x2^a, x2^b]

58-60 Recovering parameters: Recovering parameters in undirected log-linear models
Assume a log-linear parameterization (TODO: use a sum over cliques - talk through):
    p_θ(x, h) = exp(θᵀ φ(x, h) − A(θ)).
The unsupervised negative log-likelihood is non-convex:
    L_unsup(θ) = E_{x∼D}[−log Σ_{h∈H} p_θ(x, h)].
However, the supervised negative log-likelihood is convex:
    L_sup(θ) = E_{(x,h)∼D_sup}[−log p_θ(x, h)] = −θᵀ (Σ_{C∈G} E_{(x,h)∼D_sup}[φ(x_C, h_C)]) + A(θ).
[Figure: undirected bridge graph with parameters θ on the h1 — h2 edge]

61-62 Recovering parameters: Recovering parameters in undirected log-linear models (contd.)
Recall that the clique marginals μ_C can typically be estimated from supervised data:
    L_sup(θ) = −θᵀ (Σ_{C∈G} E_{(x,h)∼D_sup}[φ(x_C, h_C)]) + A(θ), with μ_C = E_{(x,h)∼D_sup}[φ(x_C, h_C)].
However, the marginals can also be consistently estimated from moments:
    μ_C = Σ_{x_C, h_C} P(x_C | h_C) P(h_C) φ(x_C, h_C),
where P(x_C | h_C) are the conditional moments and P(h_C) are the hidden marginals.
[Figure: undirected bridge graph with parameters θ]
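Once μ is fixed, fitting θ is convex. As a toy illustration of this moment-matching step (not the paper's implementation), the sketch below assumes a hypothetical one-hot feature map over full assignments, so that A(θ) and its gradient can be computed by brute-force enumeration, and runs gradient descent on L(θ) = −θᵀμ + A(θ).

```python
# Minimal sketch (assumed one-hot features over n_assignments joint states):
# gradient descent on the convex objective -theta . mu + A(theta).
import numpy as np

def fit_loglinear(mu, n_assignments, steps=2000, lr=0.5):
    theta = np.zeros(n_assignments)
    for _ in range(steps):
        # A(theta) = log sum_z exp(theta[z]); grad A(theta) = E_theta[phi]
        p = np.exp(theta - theta.max())
        p /= p.sum()
        theta -= lr * (p - mu)          # grad L = E_theta[phi] - mu
    return theta

# Usage: mu would come from the moment-estimated clique marginals.
mu = np.array([0.1, 0.2, 0.3, 0.4])
theta = fit_loglinear(mu, 4)
```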

63-65 Recovering parameters: Optimizing pseudolikelihood
- Estimating the marginals μ_C is independent of treewidth, but computing the normalization constant is not (TODO: convex but not easy):
    A(θ) = log Σ_{x,h} exp(θᵀ φ(x, h)).
- We can use pseudolikelihood (Besag 1975) to consistently estimate distributions over local neighborhoods:
    A_pseudo(θ; N(a)) = log Σ_{x_N, h_N} exp(θᵀ φ(x_N, h_N)).
- The clique marginals are not sufficient statistics here, but we can still estimate them.
[Figure: a 3×3 grid model with hidden variables h_{i,j}, each with views x^a_{i,j} and x^b_{i,j}; pseudolikelihood conditions on the neighborhood N(a) of a node a]
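To show why pseudolikelihood sidesteps the treewidth problem, the sketch below uses an Ising-style pairwise model as a stand-in (not the model from the paper): each term normalizes over a single variable given its neighborhood, so no global partition function is ever computed. All names are illustrative assumptions.

```python
# Minimal sketch (assumed Ising-style model with +/-1 spins): the
# pseudo-log-likelihood sum_a log p(x_a | x_{N(a)}), with local normalizers.
import numpy as np

def pseudo_loglik(theta, X, edges):
    """X: (n, V) array of +/-1 spins; edges: list of (u, v) pairs;
    theta[(u, v)]: pairwise weight, keyed by sorted vertex pair."""
    n, V = X.shape
    nbrs = {a: [] for a in range(V)}
    for (u, v) in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    total = 0.0
    for a in range(V):
        # local field on node a from its neighborhood N(a)
        field = sum(theta[tuple(sorted((a, b)))] * X[:, b] for b in nbrs[a])
        # log p(x_a | x_N(a)) = x_a * field - log(exp(field) + exp(-field))
        total += float(np.sum(X[:, a] * field - np.logaddexp(field, -field)))
    return total

# Usage on toy data:
rng = np.random.default_rng(4)
X = rng.choice([-1, 1], size=(100, 4))
edges = [(0, 1), (1, 2), (2, 3)]
print(pseudo_loglik({e: 0.3 for e in edges}, X, edges))
```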

66-69 Conclusions
TODO: Use outline slide. TODO: Show the Venn diagram on progress on generality.
- An algorithm for any (non-degenerate) bottlenecked discrete graphical model.
- Efficiently learns models with high treewidth.
- Combines moment estimators with composite likelihood estimators.
- Extends to log-linear models, which allow for easy regularization, missing data, ...
[Figure: bottlenecked example graph over h1, h2, h3, h4 with views x_i^a, x_i^b]

70 Thank you!
