Estimating Latent Variable Graphical Models with Moments and Likelihoods


1 Estimating Latent Variable Graphical Models with Moments and Likelihoods
Arun Tejasvi Chaganty, Percy Liang
Stanford University
June 18, 2014

2 Introduction: Latent Variable Graphical Models
Gaussian Mixture Models, Latent Dirichlet Allocation, Hidden Markov Models, PCFGs, ...
[Figures: a mixture model with hidden variable h generating views x1, x2, and a hidden Markov model h1 → h2 → h3 → ... emitting x1, x2, x3]

3-6 Introduction: Parameter Estimation is Hard
[Figure: a non-convex log-likelihood surface over the parameters, marking the MLE, the local optima found by EM, and the MoM estimate]
- The log-likelihood function is non-convex.
- The MLE is consistent but intractable.
- Local methods (EM, gradient descent, ...) are tractable but inconsistent.
- Method-of-moments estimators can be consistent and computationally efficient, but require more data.

7-11 Introduction: Consistent estimation for general models
Several estimators are based on the method of moments:
- Phylogenetic trees: Mossel and Roch
- Hidden Markov models: Hsu, Kakade, and Zhang 2009
- Latent Dirichlet Allocation: Anandkumar et al.
- Latent trees: Anandkumar et al.
- PCFGs: Hsu, Kakade, and Liang 2012
- Mixtures of linear regressors: Chaganty and Liang 2013
- ...
These estimators are applicable only to a specific type of model. In contrast, EM and gradient descent apply to almost any model.
Note: some work in the observable operator framework does apply to a more general model class.
- Weighted automata: Balle and Mohri
- Junction trees: Song, Xing, and Parikh
TODO: Check that this list is representative.
How can we apply the method of moments to estimate parameters efficiently for a general model?

12 Introduction: Setup
- Discrete models: observed variables take d values, hidden variables take k values; assume d > k.
- Parameters and marginals can be arranged into matrices and tensors (introduce notation).
- Assume infinite data (exact population moments).
- Highlight directed vs. undirected models; we focus on directed.
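As a concrete illustration of the matrix/tensor notation the slide alludes to, the sketch below builds the two objects the talk keeps reusing: a d x k conditional moment matrix and a k x k hidden marginal. All names and sizes here are assumptions for illustration, not the paper's code.

```python
# Minimal sketch (assumed toy sizes): conditional distributions become
# d x k column-stochastic matrices; pairwise hidden marginals become
# k x k matrices summing to one.
import numpy as np

d, k = 5, 3                                       # observed values, hidden states (d > k)
rng = np.random.default_rng(0)
O = rng.dirichlet(np.ones(d), size=k).T           # O[x, h] = P(x | h), a d x k matrix
Z = rng.dirichlet(np.ones(k * k)).reshape(k, k)   # Z[h1, h2] = P(h1, h2)
assert np.allclose(O.sum(axis=0), 1) and np.isclose(Z.sum(), 1)
```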

13-14 Introduction: Background: Three-view Mixture Models
Definition (Bottleneck). A hidden variable h is a bottleneck if there exist three observed variables (views) x1, x2, x3 that are conditionally independent given h.
[Figure: h with children x1, x2, x3]
Anandkumar, Hsu, and Kakade 2012 provide an algorithm to estimate the conditional moments P(x_i | h) based on tensor eigendecomposition. In general, three views are necessary for identifiability (Kruskal 1977).
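To make the bottleneck structure concrete, the sketch below constructs the third-order moment tensor whose low-rank decomposition (e.g. the tensor eigendecomposition of Anandkumar et al. 2012) recovers the conditional moments. It assumes, for simplicity, that all three views share one conditional moment matrix O; this is an illustrative assumption, not the paper's setup.

```python
# Minimal sketch (assumed shared views): the three-view moment tensor of a
# mixture model with bottleneck h.
import numpy as np

k, d = 3, 5
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(k))              # P(h)
O = rng.dirichlet(np.ones(d), size=k).T     # O[x, h] = P(x | h), shared by all views

# Conditional independence of the views given h implies
# M3[x1, x2, x3] = sum_h pi[h] * O[x1, h] * O[x2, h] * O[x3, h].
M3 = np.einsum('h,ih,jh,kh->ijk', pi, O, O, O)

# Decomposing M3 recovers pi and the columns of O up to permutation.
```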

15 Introduction: Outline
TODO: Make outline a diagram.
- Introduction
- Estimating Hidden Marginals
- Combining moments with likelihood estimators
- Recovering parameters
- Conclusions

16-20 Introduction: Example: a bridge, take I
[Figure: bridge graph with hidden variables h1 — h2 and views x1^a, x1^b, x2^a, x2^b]
- Each edge has a set of parameters.
- h1 and h2 are bottlenecks.
- We can learn P(x1^a | h1), P(x1^b | h1), ....
- However, we can't learn P(h2 | h1) this way.

21-24 Introduction: Example: a bridge, take II
TODO: Use cartoon matrices.
Observe the joint distribution:
    P(x1^b, x2^a) = Σ_{h1,h2} P(x1^b | h1) P(x2^a | h2) P(h1, h2),
i.e. M12 = O^(1|1) Z12 (O^(2|2))^T, where M12 = P(x1^b, x2^a), O^(v|i) = P(x_v | h_i), and Z12 = P(h1, h2).
- The observed moments P(x1^b, x2^a) are linear in the hidden marginals P(h1, h2).
- Solve for P(h1, h2) by pseudoinversion: Z12 = (O^(1|1))† M12 ((O^(2|2))†)^T.
- P(h2 | h1) can then be recovered by normalization.
[Figure: bridge graph h1 — h2 with views x1^a, x1^b, x2^a, x2^b]
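The pseudoinversion step is short enough to write out. The sketch below generates a bridge model, forms the observed moments, and recovers the hidden marginal and P(h2 | h1) exactly, since it uses population moments and full-column-rank conditional moment matrices. Variable names are illustrative assumptions.

```python
# Minimal sketch (assumed bridge model): Z12 = O1† M12 (O2†)^T.
import numpy as np

k, d = 2, 4
rng = np.random.default_rng(1)
O1 = rng.dirichlet(np.ones(d), size=k).T            # P(x1^b | h1)
O2 = rng.dirichlet(np.ones(d), size=k).T            # P(x2^a | h2)
Z12 = rng.dirichlet(np.ones(k * k)).reshape(k, k)   # P(h1, h2)

M12 = O1 @ Z12 @ O2.T                               # observed moments P(x1^b, x2^a)

# Pseudoinversion is exact here: O1, O2 have full column rank and M12 is
# an infinite-data (population) moment.
Z12_hat = np.linalg.pinv(O1) @ M12 @ np.linalg.pinv(O2).T
assert np.allclose(Z12, Z12_hat)

# Normalization recovers the conditional probability table.
P_h2_given_h1 = Z12_hat / Z12_hat.sum(axis=1, keepdims=True)
```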

25 Introduction: Outline (repeated; see slide 15)

26 Estimating Hidden Marginals: Exclusive Views
Definition (Exclusive views). We say h_i ∈ S has an exclusive view x_v if:
1. There exists an observed variable x_v that is conditionally independent of the remaining hidden variables S \ {h_i} given h_i.
2. The conditional moment matrix O^(v|i) = P(x_v | h_i) has full column rank k and can be recovered.
3. TODO: Exclusive views for a clique.
[Figure: hidden variables h1, h2, h3, h4 forming a set S, with views x1, x2, x3, x4]

27-29 Estimating Hidden Marginals: Exclusive views give parameters
Given exclusive views P(x_v | h_v), learning a clique is solving a linear equation! TODO: Use cartoon tensors.
    P(x1, ..., xm) = Σ_{h1,...,hm} P(x1 | h1) ··· P(xm | hm) P(h1, ..., hm),
i.e. M = Z(O^(1|1), ..., O^(m|m)), where M = P(x1, ..., xm) and Z = P(h1, ..., hm).
Invert with pseudoinverses:
    Z = M((O^(1|1))†, ..., (O^(m|m))†).
[Figure: hidden set S = {h1, h2, h3, h4} with views x1, x2, x3, x4]
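The multilinear equation above can be checked numerically. The sketch below does so for an assumed three-variable clique: the observed moment tensor is a multilinear function of the hidden marginal, and applying a pseudoinverse along each mode inverts it. Shapes and names are illustrative assumptions.

```python
# Minimal sketch (assumed m = 3): M = Z(O1, O2, O3) and Z = M(O1†, O2†, O3†).
import numpy as np

k, d = 2, 4
rng = np.random.default_rng(2)
Os = [rng.dirichlet(np.ones(d), size=k).T for _ in range(3)]  # O_v[x, h] = P(x_v | h_v)
Z = rng.dirichlet(np.ones(k ** 3)).reshape(k, k, k)           # P(h1, h2, h3)

# M[x1, x2, x3] = sum_{h1,h2,h3} Z[h1,h2,h3] * prod_v O_v[x_v, h_v]
M = np.einsum('abc,ia,jb,kc->ijk', Z, *Os)

# Apply a pseudoinverse along each mode to recover the hidden marginal.
P = [np.linalg.pinv(O) for O in Os]
Z_hat = np.einsum('ijk,ai,bj,ck->abc', M, *P)
assert np.allclose(Z, Z_hat)
```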

30-32 Estimating Hidden Marginals: Bottlenecked graphs
When are we assured exclusive views?
Definition (Bottlenecked set). A set of hidden variables S is said to be bottlenecked if each h ∈ S is a bottleneck.
Theorem: a bottlenecked clique has exclusive views. TODO: Say this is shown in the paper.
[Figure: hidden set S = {h1, h2, h3, h4} with views x1, x2, x3, x4]

33 Estimating Hidden Marginals: Outline (repeated; see slide 15)

34-44 Estimating Hidden Marginals: Example
[Figure, built up incrementally across slides 34-44: a bottlenecked graph over hidden variables h1, h2, h3, h4, each with observed views x_i^a, x_i^b]

45-47 Estimating Hidden Marginals: More Bottlenecked Examples
- Hidden Markov models [Figure: chain h1 → h2 → h3 → ... emitting x1, x2, x3]
- Latent tree models [Figure: root h1 with children h2, h3, h4, each with views x_i^a, x_i^b]
- Noisy-or networks, a non-example (Halpern and Sontag 2013) [Figure: bipartite graph h1, h2 over x1, ..., x5]

48 Combining moments with likelihood estimators: Outline (repeated; see slide 15)

49-51 Combining moments with likelihood estimators: Convex marginal likelihoods
- The MLE is statistically most efficient, but the likelihood is usually non-convex.
- If we fix the conditional moments, maximizing log P(x) over θ = P(h1, h2) is a convex problem:
    log P(x) = log Σ_{h1,h2} P(x1 | h1) P(x2 | h2) P(h1, h2), with P(x1 | h1) and P(x2 | h2) known.
- There is no closed-form solution, but a local method like EM is guaranteed to converge to the global optimum.
[Figure: bridge graph; plot of the composite likelihood and pseudoinverse objectives as functions of a marginal parameter π]
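The global-optimum claim is easy to see in code. The sketch below runs EM over the hidden marginal Z = P(h1, h2) with the conditional moment matrices held fixed, on population moments; because the objective is convex in Z, EM recovers the true marginal. All names and the use of population (rather than sampled) data are illustrative assumptions.

```python
# Minimal sketch (assumed bridge model): EM over Z with O1, O2 fixed.
import numpy as np

k, d = 2, 4
rng = np.random.default_rng(3)
O1 = rng.dirichlet(np.ones(d), size=k).T        # P(x1 | h1), known
O2 = rng.dirichlet(np.ones(d), size=k).T        # P(x2 | h2), known
Z_true = rng.dirichlet(np.ones(k * k)).reshape(k, k)
M = O1 @ Z_true @ O2.T                          # population P(x1, x2)

Z = np.full((k, k), 1.0 / k ** 2)               # uniform initialization
for _ in range(500):
    # E-step: posterior P(h1, h2 | x1, x2) for every observed pair
    joint = np.einsum('ia,jb,ab->ijab', O1, O2, Z)       # P(x1, x2, h1, h2)
    post = joint / joint.sum(axis=(2, 3), keepdims=True)
    # M-step: expected (h1, h2) counts under the data distribution M
    Z = np.einsum('ij,ijab->ab', M, post)

print(np.abs(Z - Z_true).max())   # should be near zero: global optimum
```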

52-55 Combining moments with likelihood estimators: Composite likelihoods
- In general, the full likelihood is still non-convex. TODO: Specify which x.
    log P(x) = log Σ_{h1,h2,h3} P(x1 | h1) P(x2 | h2) P(x3 | h3) P(h3 | h2) P(h1, h2), with the conditional moments known.
- Instead, consider the composite likelihood of a subset of the observed variables:
    log P(x1, x2) = log Σ_{h1,h2} P(x1 | h1) P(x2 | h2) P(h1, h2), with P(x1 | h1) and P(x2 | h2) known.
- It can be shown that estimation with composite likelihoods is consistent (Lindsay 1988).
- Asymptotically, the composite likelihood estimator is more efficient than the pseudoinverse estimator.
[Figure: HMM h1 → h2 → h3 emitting x1, x2, x3; plot of estimation error ε for the pseudoinverse vs. composite likelihood estimators]
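For concreteness, the composite objective for the bridge-style subset above is just a few lines. The sketch below writes it as a function of the hidden marginal Z with the conditional moment matrices fixed; names and shapes are assumptions carried over from the earlier sketches.

```python
# Minimal sketch (assumed subset (x1, x2)): composite log-likelihood in Z.
import numpy as np

def composite_loglik(Z, counts, O1, O2):
    """counts[x1, x2]: empirical counts of observed pairs;
    P(x1, x2) = (O1 @ Z @ O2.T)[x1, x2]. Concave in Z, so a local
    optimizer reaches the global optimum."""
    P = O1 @ Z @ O2.T
    return float((counts * np.log(P)).sum())
```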

56 Recovering parameters: Outline (repeated; see slide 15)

57 Recovering parameters: Recovering parameters in directed models
Conditional probability tables are the default parameterization for a directed model. They can be recovered from the hidden marginals by normalization:
    P(h2 | h1) = P(h1, h2) / Σ_{h2'} P(h1, h2').
[Figure: bridge graph h1 — h2 with views x1^a, x1^b, x2^a, x2^b]

58-60 Recovering parameters: Recovering parameters in undirected log-linear models
Assume a log-linear parameterization (TODO: use a sum over cliques - talk through):
    p_θ(x, h) = exp(θᵀ φ(x, h) − A(θ)).
The unsupervised negative log-likelihood is non-convex:
    L_unsup(θ) = E_{x∼D}[−log Σ_{h∈H} p_θ(x, h)].
However, the supervised negative log-likelihood is convex:
    L_sup(θ) = E_{(x,h)∼D_sup}[−log p_θ(x, h)] = −θᵀ (Σ_{C∈G} E_{(x,h)∼D_sup}[φ(x_C, h_C)]) + A(θ).
[Figure: undirected bridge graph with parameters θ on the h1 — h2 edge]

61-62 Recovering parameters: Recovering parameters in undirected log-linear models (contd.)
Recall that the clique marginals μ_C can typically be estimated from supervised data:
    L_sup(θ) = −θᵀ (Σ_{C∈G} E_{(x,h)∼D_sup}[φ(x_C, h_C)]) + A(θ), with μ_C = E_{(x,h)∼D_sup}[φ(x_C, h_C)].
However, the marginals can also be consistently estimated from moments:
    μ_C = Σ_{x_C, h_C} P(x_C | h_C) P(h_C) φ(x_C, h_C),
where P(x_C | h_C) are the conditional moments and P(h_C) are the hidden marginals.
[Figure: undirected bridge graph with parameters θ]
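Once μ is fixed, fitting θ is convex. As a toy illustration of this moment-matching step (not the paper's implementation), the sketch below assumes a hypothetical one-hot feature map over full assignments, so that A(θ) and its gradient can be computed by brute-force enumeration, and runs gradient descent on L(θ) = −θᵀμ + A(θ).

```python
# Minimal sketch (assumed one-hot features over n_assignments joint states):
# gradient descent on the convex objective -theta . mu + A(theta).
import numpy as np

def fit_loglinear(mu, n_assignments, steps=2000, lr=0.5):
    theta = np.zeros(n_assignments)
    for _ in range(steps):
        # A(theta) = log sum_z exp(theta[z]); grad A(theta) = E_theta[phi]
        p = np.exp(theta - theta.max())
        p /= p.sum()
        theta -= lr * (p - mu)          # grad L = E_theta[phi] - mu
    return theta

# Usage: mu would come from the moment-estimated clique marginals.
mu = np.array([0.1, 0.2, 0.3, 0.4])
theta = fit_loglinear(mu, 4)
```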

63-65 Recovering parameters: Optimizing pseudolikelihood
- Estimating the marginals μ_C is independent of treewidth, but computing the normalization constant is not (TODO: convex but not easy):
    A(θ) = log Σ_{x,h} exp(θᵀ φ(x, h)).
- We can use pseudolikelihood (Besag 1975) to consistently estimate distributions over local neighborhoods:
    A_pseudo(θ; N(a)) = log Σ_{x_N, h_N} exp(θᵀ φ(x_N, h_N)).
- The clique marginals are not sufficient statistics here, but we can still estimate them.
[Figure: a 3×3 grid model with hidden variables h_{i,j}, each with views x^a_{i,j} and x^b_{i,j}; pseudolikelihood conditions on the neighborhood N(a) of a node a]
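To show why pseudolikelihood sidesteps the treewidth problem, the sketch below uses an Ising-style pairwise model as a stand-in (not the model from the paper): each term normalizes over a single variable given its neighborhood, so no global partition function is ever computed. All names are illustrative assumptions.

```python
# Minimal sketch (assumed Ising-style model with +/-1 spins): the
# pseudo-log-likelihood sum_a log p(x_a | x_{N(a)}), with local normalizers.
import numpy as np

def pseudo_loglik(theta, X, edges):
    """X: (n, V) array of +/-1 spins; edges: list of (u, v) pairs;
    theta[(u, v)]: pairwise weight, keyed by sorted vertex pair."""
    n, V = X.shape
    nbrs = {a: [] for a in range(V)}
    for (u, v) in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    total = 0.0
    for a in range(V):
        # local field on node a from its neighborhood N(a)
        field = sum(theta[tuple(sorted((a, b)))] * X[:, b] for b in nbrs[a])
        # log p(x_a | x_N(a)) = x_a * field - log(exp(field) + exp(-field))
        total += float(np.sum(X[:, a] * field - np.logaddexp(field, -field)))
    return total

# Usage on toy data:
rng = np.random.default_rng(4)
X = rng.choice([-1, 1], size=(100, 4))
edges = [(0, 1), (1, 2), (2, 3)]
print(pseudo_loglik({e: 0.3 for e in edges}, X, edges))
```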

66-69 Conclusions
TODO: Use outline slide. TODO: Show the Venn diagram on progress on generality.
- An algorithm for any (non-degenerate) bottlenecked discrete graphical model.
- Efficiently learns models with high treewidth.
- Combines moment estimators with composite likelihood estimators.
- Extends to log-linear models, which allow for easy regularization, missing data, ...
[Figure: bottlenecked example graph over h1, h2, h3, h4 with views x_i^a, x_i^b]

70 Thank you!
