Estimating Latent Variable Graphical Models with Moments and Likelihoods
1 Estimating Latent Variable Graphical Models with Moments and Likelihoods. Arun Tejasvi Chaganty, Percy Liang. Stanford University. June 18, 2014. (26 slides)
2 Introduction. Latent Variable Graphical Models: Gaussian mixture models, Latent Dirichlet Allocation, hidden Markov models, PCFGs, ... [Figures: a Gaussian mixture over x_1, x_2; an HMM with hidden chain h_1, h_2, h_3 and observations x_1, x_2, x_3.]
3-6 Introduction. Parameter Estimation is Hard. The log-likelihood function is non-convex. The MLE is consistent but intractable. Local methods (EM, gradient descent, ...) are tractable but inconsistent. Method-of-moments estimators can be consistent and computationally efficient, but require more data. [Figure: a non-convex log-likelihood surface over the parameters, marking the MLE, local optima found by EM, and the method-of-moments (MoM) estimate.]
7-11 Introduction. Consistent estimation for general models. Several estimators are based on the method of moments: phylogenetic trees (Mossel and Roch), hidden Markov models (Hsu, Kakade, and Zhang 2009), Latent Dirichlet Allocation (Anandkumar et al.), latent trees (Anandkumar et al.), PCFGs (Hsu, Kakade, and Liang 2012), mixtures of linear regressors (Chaganty and Liang 2013), ... However, each of these estimators applies only to a specific type of model; in contrast, EM and gradient descent apply to almost any model. (Some work in the observable operator framework does apply to a more general model class: weighted automata (Balle and Mohri), junction trees (Song, Xing, and Parikh).) How can we apply the method of moments to estimate parameters efficiently for a general model?
12 Introduction. Setup. We consider discrete models in which each observed variable takes d values and each hidden variable takes k values; assume d > k. Parameters and marginal distributions can be arranged as matrices or tensors. For the analysis, assume infinite data (exact moments). Both directed and undirected models are considered; we focus on the directed case.
13-14 Introduction. Background: Three-view Mixture Models. Definition (Bottleneck): a hidden variable h is a bottleneck if there exist three observed variables (views) x_1, x_2, x_3 that are conditionally independent given h. Anandkumar, Hsu, and Kakade 2012 provide an algorithm to estimate the conditional moments P(x_i | h) based on tensor eigendecomposition. In general, three views are necessary for identifiability (Kruskal 1977). [Figure: h with its three views x_1, x_2, x_3.]
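To make the three-view structure concrete, here is a minimal numpy sketch (not from the talk; the sizes and variable names are mine): it builds the third-order moment tensor of a three-view mixture and checks that it is a valid joint distribution. Tensor eigendecomposition of this object, as in Anandkumar, Hsu, and Kakade 2012, would recover the conditional moments; that step is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 2, 3  # k hidden states, d observed values per view (d > k)

# Mixture: hidden h ~ pi, then the three views are drawn independently given h.
pi = rng.dirichlet(np.ones(k))                                # P(h)
O = [rng.dirichlet(np.ones(d), size=k).T for _ in range(3)]   # O[v][x, h] = P(x_v | h)

# Third-order moments: M3[x1, x2, x3] = P(x1, x2, x3)
#                    = sum_h pi[h] * O1[x1, h] * O2[x2, h] * O3[x3, h]
M3 = np.einsum('h,ah,bh,ch->abc', pi, O[0], O[1], O[2])

# Sanity check: M3 is a valid joint distribution over (x1, x2, x3).
assert np.isclose(M3.sum(), 1.0) and M3.min() >= 0
```

The low-rank structure of M3 (rank k across each mode) is what the tensor eigendecomposition exploits to identify P(x_v | h) up to a relabeling of h.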
15 Introduction. Outline: Introduction; Estimating Hidden Marginals; Combining moments with likelihood estimators; Recovering parameters; Conclusions.
16-20 Introduction. Example: a bridge, take I. Each edge has a set of parameters. h_1 and h_2 are bottlenecks, so we can learn the conditional moments P(x_1^a | h_1), P(x_1^b | h_1), etc. However, we cannot learn P(h_2 | h_1) this way. [Figure: a bridge model with hidden h_1, h_2 and observed views x_1^a, x_1^b, x_2^a, x_2^b.]
21-24 Introduction. Example: a bridge, take II. Observe the joint distribution:

\underbrace{P(x_1^b, x_2^a)}_{M_{12}} = \sum_{h_1, h_2} \underbrace{P(x_1^b \mid h_1)}_{O^{(1|1)}} \underbrace{P(x_2^a \mid h_2)}_{O^{(2|2)}} \underbrace{P(h_1, h_2)}_{Z_{12}}.

The observed moments P(x_1^b, x_2^a) are linear in the hidden marginals P(h_1, h_2):

M_{12} = O^{(1|1)} Z_{12} O^{(2|2)\top}.

Solve for P(h_1, h_2) by pseudoinversion:

Z_{12} = O^{(1|1)\dagger} M_{12} (O^{(2|2)\top})^{\dagger}.

P(h_2 \mid h_1) can then be recovered by normalization. [Figure: the bridge model with the views x_1^b and x_2^a highlighted.]
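The pseudoinversion step above is short enough to run end to end. The following sketch (dimensions and names are mine, not the talk's) draws a random bridge model, forms the observed moment matrix, and recovers the hidden marginal exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 2, 4  # hidden states, observed values (d > k)

# Hypothetical ground truth for the bridge example.
Z12 = rng.dirichlet(np.ones(k * k)).reshape(k, k)   # P(h1, h2)
O1 = rng.dirichlet(np.ones(d), size=k).T            # O^{(1|1)}[x, h] = P(x1^b | h1)
O2 = rng.dirichlet(np.ones(d), size=k).T            # O^{(2|2)}[x, h] = P(x2^a | h2)

# Observed moments are linear in the hidden marginal: M12 = O1 Z12 O2^T.
M12 = O1 @ Z12 @ O2.T

# Recover the hidden marginal by pseudoinversion; random O1, O2 have
# full column rank with probability 1, so recovery is exact here.
Z_hat = np.linalg.pinv(O1) @ M12 @ np.linalg.pinv(O2).T

assert np.allclose(Z_hat, Z12)
```

With sampled rather than exact moments, Z_hat would only be a consistent estimate, which is what motivates the composite likelihood refinement later in the talk.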
25 Introduction. Outline: Introduction; Estimating Hidden Marginals; Combining moments with likelihood estimators; Recovering parameters; Conclusions.
26 Estimating Hidden Marginals. Exclusive Views. Definition (Exclusive views): we say that h_i in a set S of hidden variables has an exclusive view x_v if (1) there exists an observed variable x_v that is conditionally independent of the other variables in S \ {h_i} given h_i, and (2) the conditional moment matrix O^{(v|i)} = P(x_v | h_i) has full column rank k and can be recovered. [Figure: a hidden set S = {h_1, h_2, h_3, h_4} with views x_1, x_2, x_3, x_4.]
27-29 Estimating Hidden Marginals. Exclusive views give parameters. Given exclusive views P(x | h), learning a clique reduces to solving a linear equation:

\underbrace{P(x_1, \ldots, x_m)}_{M} = \sum_{h_1, \ldots, h_m} \underbrace{P(x_1 \mid h_1)}_{O^{(1|1)}} \cdots \underbrace{P(x_m \mid h_m)}_{O^{(m|m)}} \underbrace{P(h_1, \ldots, h_m)}_{Z},

that is, M = Z(O^{(1|1)}, \ldots, O^{(m|m)}) as a multilinear map, and hence

Z = M(O^{(1|1)\dagger}, \ldots, O^{(m|m)\dagger}).

[Figure: the hidden set S with its exclusive views.]
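The multilinear version of the pseudoinverse trick is the same computation, one mode at a time. A sketch for an m = 3 clique (all names and sizes are mine, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
k, d, m = 2, 3, 3  # hidden values, observed values, clique size

# Hidden clique marginal Z = P(h1, h2, h3) and exclusive-view moments O^{(i|i)}.
Z = rng.dirichlet(np.ones(k ** m)).reshape((k,) * m)
Os = [rng.dirichlet(np.ones(d), size=k).T for _ in range(m)]  # each (d, k)

# Observed clique moments M = Z(O1, O2, O3): apply O^{(i|i)} along mode i.
M = np.einsum('abc,xa,yb,zc->xyz', Z, *Os)

# Invert each mode with a pseudoinverse: Z = M(O1^+, O2^+, O3^+).
pinvs = [np.linalg.pinv(O) for O in Os]
Z_hat = np.einsum('xyz,ax,by,cz->abc', M, *pinvs)

assert np.allclose(Z_hat, Z)
```

Because each O^{(i|i)} has full column rank, pinv(O) @ O is the identity, so the two einsum contractions compose to the identity map on Z.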
30-32 Estimating Hidden Marginals. Bottlenecked graphs. When are we assured exclusive views? Definition (Bottlenecked set): a set of hidden variables S is bottlenecked if each h in S is a bottleneck. Theorem (shown in the paper): a bottlenecked clique has exclusive views. [Figure: a hidden set S in which every hidden variable has three views.]
33 Estimating Hidden Marginals. Outline: Introduction; Estimating Hidden Marginals; Combining moments with likelihood estimators; Recovering parameters; Conclusions.
34-44 Estimating Hidden Marginals. Example. [Figure, repeated over several animation slides: a graph with hidden variables h_1, h_2, h_3, h_4 and observed views x_i^a, x_i^b.]
45-47 Estimating Hidden Marginals. More Bottlenecked Examples. Hidden Markov models (hidden chain h_1, h_2, h_3, ... with observations x_1, x_2, x_3, ...). Latent tree models. Non-example: noisy-or networks (cf. Halpern and Sontag 2013). [Figures: an HMM; a latent tree over h_1, ..., h_4 with views x_i^a, x_i^b; a noisy-or network with h_1, h_2 over x_1, ..., x_5.]
48 Combining moments with likelihood estimators. Outline: Introduction; Estimating Hidden Marginals; Combining moments with likelihood estimators; Recovering parameters; Conclusions.
49-51 Combining moments with likelihood estimators. Convex marginal likelihoods. The MLE is statistically the most efficient estimator, but its objective is usually non-convex:

\log P(x) = \log \sum_{h_1, h_2} P(x_1 \mid h_1) \, P(x_2 \mid h_2) \, P(h_1, h_2).

If we fix the conditional moments (treat the factors P(x_i \mid h_i) as known), then \log P(x) is concave in the remaining parameters, so maximizing it is a convex problem. There is no closed-form solution, but a local method like EM is guaranteed to converge to the global optimum. [Figure: the composite likelihood and pseudoinverse objectives as functions of \pi_1.]
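A runnable sketch of this idea (mine, not the talk's code): with the emission matrices held fixed at their true values and exact moments as "data", EM over the hidden marginal Z is coordinate ascent on a concave objective, so its expected log-likelihood increases monotonically toward the global optimum.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d = 2, 4

# Known conditional moments (e.g. from tensor decomposition) and a true marginal.
O1 = rng.dirichlet(np.ones(d), size=k).T
O2 = rng.dirichlet(np.ones(d), size=k).T
Z_true = rng.dirichlet(np.ones(k * k)).reshape(k, k)
M = O1 @ Z_true @ O2.T                      # exact data distribution P(x1, x2)

# EM over Z with O1, O2 fixed.
Z = np.full((k, k), 1.0 / k ** 2)
lls = []
for _ in range(2000):
    P = O1 @ Z @ O2.T                       # current model P_Z(x1, x2)
    lls.append(np.sum(M * np.log(P)))       # expected log-likelihood under M
    # E-step: posterior over (h1, h2) per (x1, x2); M-step: average under the data.
    post = np.einsum('xy,xa,yb,ab->xyab', M / P, O1, O2, Z)
    Z = post.sum(axis=(0, 1))

assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))  # monotone ascent
```

Since the map Z -> O1 @ Z @ O2.T is injective when the O's have full column rank, the optimum is unique and equals Z_true in the infinite-data limit.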
52-55 Combining moments with likelihood estimators. Composite likelihoods. In general, the full likelihood is still non-convex:

\log P(x) = \log \sum_{h_1, h_2, h_3} \underbrace{P(x_1 \mid h_1) P(x_2 \mid h_2) P(x_3 \mid h_3)}_{\text{known}} P(h_3 \mid h_2) P(h_1, h_2).

Instead, consider a composite likelihood over a subset of the observed variables, e.g. (x_1, x_2):

\log P(x_1, x_2) = \log \sum_{h_1, h_2} \underbrace{P(x_1 \mid h_1) P(x_2 \mid h_2)}_{\text{known}} P(h_1, h_2).

Estimation with composite likelihoods can be shown to be consistent (Lindsay 1988), and asymptotically the composite likelihood estimator is more efficient than the pseudoinverse estimator. [Figure: parameter error of the pseudoinverse vs. composite likelihood estimators as a function of \epsilon.]
56 Recovering parameters. Outline: Introduction; Estimating Hidden Marginals; Combining moments with likelihood estimators; Recovering parameters; Conclusions.
57 Recovering parameters. Recovering parameters in directed models. Conditional probability tables are the default parameterization of a directed model and can be recovered by normalization:

P(h_2 \mid h_1) = \frac{P(h_1, h_2)}{\sum_{h_2'} P(h_1, h_2')}.

[Figure: the bridge model.]
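The normalization step is one line of numpy. A small worked example (the numbers are mine, chosen for illustration):

```python
import numpy as np

Z12 = np.array([[0.30, 0.10],     # joint P(h1, h2), e.g. recovered by pseudoinversion
                [0.15, 0.45]])

# Conditional probability table: divide each row of the joint by its row sum.
P_h2_given_h1 = Z12 / Z12.sum(axis=1, keepdims=True)

print(P_h2_given_h1)  # rows: [0.75, 0.25] and [0.25, 0.75]
```

Each row of the result sums to one, as a conditional distribution over h_2 must.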
58-60 Recovering parameters. Recovering parameters in undirected log-linear models. Assume a log-linear parameterization with features that sum over cliques, \phi(x, h) = \sum_{C \in G} \phi(x_C, h_C):

p_\theta(x, h) = \exp(\theta^\top \phi(x, h) - A(\theta)).

The unsupervised negative log-likelihood is non-convex:

L_{\text{unsup}}(\theta) \triangleq \mathbb{E}_{x \sim \mathcal{D}}[-\log \sum_{h \in \mathcal{H}} p_\theta(x, h)].

However, the supervised negative log-likelihood is convex:

L_{\text{sup}}(\theta) \triangleq \mathbb{E}_{(x,h) \sim \mathcal{D}_{\text{sup}}}[-\log p_\theta(x, h)] = -\theta^\top \Big( \sum_{C \in G} \mathbb{E}_{(x,h) \sim \mathcal{D}_{\text{sup}}}[\phi(x_C, h_C)] \Big) + A(\theta).
61-62 Recovering parameters. Recall that the clique marginals can typically be estimated from supervised data:

L_{\text{sup}}(\theta) = -\theta^\top \sum_{C \in G} \underbrace{\mathbb{E}_{(x,h) \sim \mathcal{D}_{\text{sup}}}[\phi(x_C, h_C)]}_{\mu_C} + A(\theta).

However, the marginals can also be consistently estimated from moments:

\mu_C = \sum_{x_C, h_C} \underbrace{P(x_C \mid h_C)}_{\text{cond. moments}} \, \underbrace{P(h_C)}_{\text{hidden marginals}} \, \phi(x_C, h_C).
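Computing \mu_C from moments is just an expectation over a small table. A sketch with hypothetical sizes and a random feature map (none of these names come from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
k, d, f = 2, 3, 5   # hidden values, observed values, feature dimension

# A clique over (x_C, h_C) with x_C in [d], h_C in [k], and features phi(x_C, h_C).
O = rng.dirichlet(np.ones(d), size=k).T          # P(x_C | h_C), shape (d, k)
p_h = rng.dirichlet(np.ones(k))                  # hidden marginal P(h_C)
phi = rng.normal(size=(d, k, f))                 # feature vectors phi(x_C, h_C)

# mu_C = sum_{x,h} P(x|h) P(h) phi(x, h): expected features from moments alone.
mu_C = np.einsum('xh,h,xhf->f', O, p_h, phi)

# Brute-force check of the same expectation.
mu_brute = sum(O[x, h] * p_h[h] * phi[x, h] for x in range(d) for h in range(k))
assert np.allclose(mu_C, mu_brute)
```

The point is that no labeled (x, h) pairs appear anywhere: the conditional moments and hidden marginals from the earlier sections are enough.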
63-65 Recovering parameters. Optimizing pseudolikelihood. Estimating the marginals \mu_C is independent of treewidth, but computing the normalization constant is not (the objective remains convex, but is not easy to evaluate):

A(\theta) \triangleq \log \sum_{x, h} \exp(\theta^\top \phi(x, h)).

We can instead use pseudolikelihood (Besag 1975) to consistently estimate distributions over local neighborhoods:

A_{\text{pseudo}}(\theta; \mathcal{N}(a)) \triangleq \log \sum_{x_{\mathcal{N}}, h_{\mathcal{N}}} \exp(\theta^\top \phi(x_{\mathcal{N}}, h_{\mathcal{N}})).

The clique marginals are no longer sufficient statistics, but we can still estimate them. [Figure: a 3x3 grid model with hidden h_{i,j} and observed x_{i,j}^a, x_{i,j}^b; the local neighborhood of a node a is highlighted.]
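To see why the local normalizer is cheap, here is a toy sketch (my own construction, not the paper's model): a three-node binary chain where the full normalizer A(\theta) sums over all 2^3 configurations, while the pseudolikelihood normalizer for one node sums over only that node's two values with its neighbors held fixed.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
n = 3                                   # tiny binary chain: z1 - z2 - z3
W = rng.normal(size=(n - 1,))           # one pairwise weight per edge

def score(z):
    # theta^T phi(z): sum of edge features z_i * z_{i+1}
    return sum(W[i] * z[i] * z[i + 1] for i in range(n - 1))

# Full log-normalizer A(theta): sums over all 2^n configurations (exponential cost).
A = np.log(sum(np.exp(score(z)) for z in product([0, 1], repeat=n)))

# Pseudolikelihood normalizer for node a: sums only over that node's values,
# with its neighborhood held fixed -- cost independent of treewidth.
def A_pseudo(z, a):
    zs = [list(z[:a]) + [v] + list(z[a + 1:]) for v in (0, 1)]
    return np.log(sum(np.exp(score(zc)) for zc in zs))

z = (1, 0, 1)
cond = np.exp(score(z) - A_pseudo(z, 1))   # P(z2 = 0 | z1 = 1, z3 = 1)
assert 0.0 < cond < 1.0
```

Exponentiating score minus the local normalizer gives a proper conditional distribution for the chosen node, which is exactly what the pseudolikelihood objective multiplies together.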
66-69 Conclusions. An algorithm for any (non-degenerate) bottlenecked discrete graphical model. Efficiently learns models with high treewidth. Combines moment estimators with composite likelihood estimators. Extends to log-linear models, which allows for easy regularization, handling of missing data, ... [Figure: an example bottlenecked model with hidden h_1, ..., h_4 and observed views x_i^a, x_i^b.]
70 Conclusions. Thank you!
Estimating Latent-Variable Graphical Models using Moments and Likelihoods. Arun Tejasvi Chaganty (chaganty@cs.stanford.edu), Percy Liang (pliang@cs.stanford.edu), Stanford University, Stanford, CA, USA. Abstract: Recent work on the method of moments enables consistent parameter estimation, ...
Supplemental for Spectral Algorithm For Latent Tree Graphical Models Ankur P. Parikh, Le Song, Eric P. Xing The supplemental contains 3 main things. 1. The first is network plots of the latent variable
More informationLearning discrete graphical models via generalized inverse covariance matrices
Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationConditional Independence and Factorization
Conditional Independence and Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationCOMP538: Introduction to Bayesian Networks
COMP538: Introduction to Bayesian Networks Lecture 9: Optimal Structure Learning Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science and Technology
More informationMETHOD OF MOMENTS LEARNING FOR LEFT TO RIGHT HIDDEN MARKOV MODELS
METHOD OF MOMENTS LEARNING FOR LEFT TO RIGHT HIDDEN MARKOV MODELS Y. Cem Subakan, Johannes Traa, Paris Smaragdis,,, Daniel Hsu UIUC Computer Science Department, Columbia University Computer Science Department
More informationSpectral Learning of Mixture of Hidden Markov Models
Spectral Learning of Mixture of Hidden Markov Models Y Cem Sübakan, Johannes raa, Paris Smaragdis,, Department of Computer Science, University of Illinois at Urbana-Champaign Department of Electrical and
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationECE 598: Representation Learning: Algorithms and Models Fall 2017
ECE 598: Representation Learning: Algorithms and Models Fall 2017 Lecture 1: Tensor Methods in Machine Learning Lecturer: Pramod Viswanathan Scribe: Bharath V Raghavan, Oct 3, 2017 11 Introduction Tensors
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationTensor Factorization via Matrix Factorization
Volodymyr Kuleshov Arun Tejasvi Chaganty Percy Liang Department of Computer Science Stanford University Stanford, CA 94305 Abstract Tensor factorization arises in many machine learning applications, such
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 9: Variational Inference Relaxations Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 24/10/2011 (EPFL) Graphical Models 24/10/2011 1 / 15
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationTensor Decompositions for Machine Learning. G. Roeder 1. UBC Machine Learning Reading Group, June University of British Columbia
Network Feature s Decompositions for Machine Learning 1 1 Department of Computer Science University of British Columbia UBC Machine Learning Group, June 15 2016 1/30 Contact information Network Feature
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationExpectation Maximization
Expectation Maximization Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Christopher M. Bishop Learning in DAGs Two things could
More informationProbabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013
School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two
More information3 : Representation of Undirected GM
10-708: Probabilistic Graphical Models 10-708, Spring 2016 3 : Representation of Undirected GM Lecturer: Eric P. Xing Scribes: Longqi Cai, Man-Chia Chang 1 MRF vs BN There are two types of graphical models:
More informationBayesian Learning in Undirected Graphical Models
Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ and Center for Automated Learning and
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationSum-Product Networks: A New Deep Architecture
Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationCollapsed Variational Inference for Sum-Product Networks
for Sum-Product Networks Han Zhao 1, Tameem Adel 2, Geoff Gordon 1, Brandon Amos 1 Presented by: Han Zhao Carnegie Mellon University 1, University of Amsterdam 2 June. 20th, 2016 1 / 26 Outline Background
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationSpectral Unsupervised Parsing with Additive Tree Metrics
Spectral Unsupervised Parsing with Additive Tree Metrics Ankur Parikh, Shay Cohen, Eric P. Xing Carnegie Mellon, University of Edinburgh Ankur Parikh 2014 1 Overview Model: We present a novel approach
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of discussion We first describe inference with PGMs and the intractability of exact inference Then give a taxonomy of inference algorithms
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationTotal positivity in Markov structures
1 based on joint work with Shaun Fallat, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik (arxiv:1510.01290) Faculty of Science Total positivity in Markov structures Steffen Lauritzen
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 42 K-Means Clustering K-Means Clustering David
More informationGuaranteed Learning of Latent Variable Models through Spectral and Tensor Methods
Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods Anima Anandkumar U.C. Irvine Application 1: Clustering Basic operation of grouping data points. Hypothesis: each data point
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationSpectral Learning of Predictive State Representations with Insufficient Statistics
Spectral Learning of Predictive State Representations with Insufficient Statistics Alex Kulesza and Nan Jiang and Satinder Singh Computer Science & Engineering University of Michigan Ann Arbor, MI, USA
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationJoint Optimization of Segmentation and Appearance Models
Joint Optimization of Segmentation and Appearance Models David Mandle, Sameep Tandon April 29, 2013 David Mandle, Sameep Tandon (Stanford) April 29, 2013 1 / 19 Overview 1 Recap: Image Segmentation 2 Optimization
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationCS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016
CS 2750: Machine Learning Bayesian Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2016 Plan for today and next week Today and next time: Bayesian networks (Bishop Sec. 8.1) Conditional
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationA brief introduction to Conditional Random Fields
A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationProbabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference
More informationUndirected Graphical Models
Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates
More informationEfficient Spectral Methods for Learning Mixture Models!
Efficient Spectral Methods for Learning Mixture Models Qingqing Huang 2016 February Laboratory for Information & Decision Systems Based on joint works with Munther Dahleh, Rong Ge, Sham Kakade, Greg Valiant.
More informationParameter learning in CRF s
Parameter learning in CRF s June 01, 2009 Structured output learning We ish to learn a discriminant (or compatability) function: F : X Y R (1) here X is the space of inputs and Y is the space of outputs.
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04
More informationDimension Reduction (PCA, ICA, CCA, FLD,
Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous 10-701 lectures 1 Outline Dimension reduction
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More information