Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints


1 Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints Anima Anandkumar U.C. Irvine Joint work with Daniel Hsu, Majid Janzamin, Adel Javanmard, and Sham Kakade.

2 Latent Variable Modeling Goal: Discover hidden effects from observed measurements Topic Models Observations: words. Hidden: topics. [Figure: excerpts of New York Times articles (Syria refugee crisis, Hurricane Sandy aftermath, tax negotiations) illustrating documents as mixtures of hidden topics] Modeling communities in social networks, modeling gene regulation...

3 Challenges in Learning Topic Models Learning Topic Models Using Word Observations Challenges in Identifiability When can topics be identified? Conditions on the model parameters, e.g. on the topic-word matrix Φ and on the distribution of topic proportions h? Does identifiability also lead to tractable algorithms?

4 Challenges in Learning Topic Models Learning Topic Models Using Word Observations Challenges in Identifiability When can topics be identified? Conditions on the model parameters, e.g. on the topic-word matrix Φ and on the distribution of topic proportions h? Does identifiability also lead to tractable algorithms? Challenges in Design of Learning Algorithms Maximum likelihood learning of topic models is NP-hard (Arora et al.). In practice, heuristics such as Gibbs sampling, variational Bayes, etc. Guaranteed learning with minimal assumptions? Efficient methods? Low sample and computational complexities?

5 Challenges in Learning Topic Models Learning Topic Models Using Word Observations Challenges in Identifiability When can topics be identified? Conditions on the model parameters, e.g. on the topic-word matrix Φ and on the distribution of topic proportions h? Does identifiability also lead to tractable algorithms? Challenges in Design of Learning Algorithms Maximum likelihood learning of topic models is NP-hard (Arora et al.). In practice, heuristics such as Gibbs sampling, variational Bayes, etc. Guaranteed learning with minimal assumptions? Efficient methods? Low sample and computational complexities? Moment-based approach: learning using low-order observed moments

6 Probabilistic Topic Models Useful abstraction for automatic categorization of documents Observed: words. Hidden: topics. Bag of words: order of words does not matter Graphical model representation l words in a document x_1,...,x_l. h: proportions of topics in a document. Word x_i generated from topic y_i. Exchangeability: x_1, x_2, ... conditionally independent given h. Φ(i,j) := P[x_m = i | y_m = j]: topic-word matrix. [Graphical model: topic mixture h → topics y_1,...,y_5 → words x_1,...,x_5, each word drawn via Φ]

7 Formulation as Linear Models Distribution of the topic proportions vector h If there are k topics, distribution over the simplex ∆^{k-1} := {h ∈ R^k : h_i ∈ [0,1], Σ_i h_i = 1}. Distribution of the words x_1, x_2,... Order the n words in the vocabulary. If x_1 is the j-th word, assign x_1 = e_j ∈ R^n. Distribution of each x_i: supported on the vertices of ∆^{n-1}. Properties Linear Model: E[x_i | h] = Φh. Multiview model: h is fixed and multiple words (x_i) are generated.
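
As a quick illustration (not from the slides), the Python sketch below samples words from this exchangeable model with made-up sizes n and k, a random column-stochastic Φ, and a Dirichlet draw for h, and checks the linear-model property E[x_i | h] = Φh empirically.

```python
# Illustrative sketch of the exchangeable topic model as a linear multi-view model.
# Sizes n, k, the Dirichlet parameters, and the sample count are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5                                      # vocabulary size, number of topics

# Column-stochastic topic-word matrix: Phi[i, j] = P[word = i | topic = j].
Phi = rng.dirichlet(np.ones(n), size=k).T         # shape (n, k)

# Topic proportions h on the simplex (a Dirichlet draw is one possible choice).
h = rng.dirichlet(0.5 * np.ones(k))

def sample_word():
    """Draw one word: topic y ~ h, then word index j ~ Phi[:, y]; return one-hot e_j."""
    y = rng.choice(k, p=h)
    j = rng.choice(n, p=Phi[:, y])
    e = np.zeros(n)
    e[j] = 1.0
    return e

# Multi-view property: with h fixed, words are conditionally i.i.d. and E[x_i | h] = Phi h.
X = np.stack([sample_word() for _ in range(20000)])
print(np.abs(X.mean(axis=0) - Phi @ h).max())     # close to 0 for a large sample
```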

8 Geometric Picture for Topic Models Topic proportions vector (h) Document

9 Geometric Picture for Topic Models Single topic (h)

10 Geometric Picture for Topic Models Topic proportions vector (h)

11 Geometric Picture for Topic Models Topic proportions vector (h) [Figure: Φ maps the topic simplex to the word simplex; sampled words x_1, x_2, x_3] Word generation (x_1, x_2, ...)

12 Geometric Picture for Topic Models Topic proportions vector (h) [Figure: Φ maps the topic simplex to the word simplex; sampled words x_1, x_2, x_3] Word generation (x_1, x_2, ...) Moment-based estimation: co-occurrences of words in documents

13 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

14 Recap of CP Decomposition Recall form of moments for single topic/Dirichlet model. E[x_i | h] = Φh. λ := E[h]. Learn topic-word matrix Φ, vector λ

15 Recap of CP Decomposition Recall form of moments for single topic/Dirichlet model. E[x_i | h] = Φh. λ := E[h]. Learn topic-word matrix Φ, vector λ Pairs Matrix M_2 M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T = Σ_{r=1}^k λ_r φ_r ⊗ φ_r

16 Recap of CP Decomposition Recall form of moments for single topic/Dirichlet model. E[x_i | h] = Φh. λ := E[h]. Learn topic-word matrix Φ, vector λ Pairs Matrix M_2 M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T = Σ_{r=1}^k λ_r φ_r ⊗ φ_r Similarly, Triples Tensor M_3 M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_r λ_r φ_r ⊗ φ_r ⊗ φ_r

17 Recap of CP Decomposition Recall form of moments for single topic/Dirichlet model. E[x_i | h] = Φh. λ := E[h]. Learn topic-word matrix Φ, vector λ Pairs Matrix M_2 M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T = Σ_{r=1}^k λ_r φ_r ⊗ φ_r Similarly, Triples Tensor M_3 M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_r λ_r φ_r ⊗ φ_r ⊗ φ_r Matrix and Tensor Forms: φ_r := r-th column of Φ. M_2 = Σ_{r=1}^k λ_r φ_r ⊗ φ_r. M_3 = Σ_{r=1}^k λ_r φ_r ⊗ φ_r ⊗ φ_r
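
These CP-form moments are easy to reproduce numerically; the sketch below (illustrative sizes, random Φ and λ) builds M_2 and M_3 directly from the factors.

```python
# Illustrative sketch: build M2 = sum_r lambda_r phi_r phi_r^T and
# M3 = sum_r lambda_r phi_r (x) phi_r (x) phi_r from random factors.
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4
Phi = rng.dirichlet(np.ones(n), size=k).T        # columns phi_r (topic-word matrix)
lam = rng.dirichlet(np.ones(k))                  # lambda = E[h] for the single-topic model

M2 = np.einsum('r,ir,jr->ij', lam, Phi, Phi)            # pairs matrix, shape (n, n)
M3 = np.einsum('r,ir,jr,kr->ijk', lam, Phi, Phi, Phi)   # triples tensor, shape (n, n, n)

# Sanity check: M2 equals Phi diag(lambda) Phi^T.
print(np.allclose(M2, Phi @ np.diag(lam) @ Phi.T))
```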

18 Multi-linear Transformation For a tensor T, define (for matrices V_i of appropriate dimensions) [T(V_1, V_2, V_3)]_{i_1,i_2,i_3} := Σ_{j_1,j_2,j_3} T_{j_1,j_2,j_3} ∏_{m ∈ [3]} V_m(j_m, i_m). For a matrix M_2, M_2(V_1, V_2) := V_1^T M_2 V_2. For T = Σ_{r=1}^k λ_r φ_r ⊗ φ_r ⊗ φ_r: T(W, W, W) = Σ_{r ∈ [k]} λ_r (W^T φ_r)^{⊗3}, T(I, v, v) = Σ_{r ∈ [k]} λ_r ⟨v, φ_r⟩^2 φ_r, T(I, I, v) = Σ_{r ∈ [k]} λ_r ⟨v, φ_r⟩ φ_r ⊗ φ_r.
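
A short sketch of this multilinear map via einsum, with illustrative sizes, verifying the T(I, v, v) identity above:

```python
# Sketch of the multilinear map T(V1, V2, V3) with the convention
# [T(V1,V2,V3)]_{i1,i2,i3} = sum_{j1,j2,j3} T_{j1,j2,j3} V1(j1,i1) V2(j2,i2) V3(j3,i3).
import numpy as np

def multilinear(T, V1, V2, V3):
    return np.einsum('abc,ai,bj,ck->ijk', T, V1, V2, V3)

rng = np.random.default_rng(2)
n, k = 10, 3
Phi = rng.standard_normal((n, k))
lam = rng.random(k)
T = np.einsum('r,ir,jr,kr->ijk', lam, Phi, Phi, Phi)   # T = sum_r lam_r phi_r^{(x)3}

v = rng.standard_normal(n)
I = np.eye(n)

# Check T(I, v, v) = sum_r lam_r <v, phi_r>^2 phi_r (contracted down to a vector here).
lhs = multilinear(T, I, v[:, None], v[:, None])[:, 0, 0]
rhs = sum(lam[r] * (v @ Phi[:, r]) ** 2 * Phi[:, r] for r in range(k))
print(np.allclose(lhs, rhs))
```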

19 Form of Moments for a General Topic Model E[x_i | h] = Φh. Learn Φ and the distribution of h. Form of moments? Pairs Matrix M_2 M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T

20 Form of Moments for a General Topic Model E[x_i | h] = Φh. Learn Φ and the distribution of h. Form of moments? Pairs Matrix M_2 M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T Similarly, Triples Tensor M_3 M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_{i,j,k} E[h ⊗ h ⊗ h]_{i,j,k} φ_i ⊗ φ_j ⊗ φ_k = E[h^{⊗3}](Φ, Φ, Φ)

21 Form of Moments for a General Topic Model E[x_i | h] = Φh. Learn Φ and the distribution of h. Form of moments? Pairs Matrix M_2 M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T Similarly, Triples Tensor M_3 M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_{i,j,k} E[h ⊗ h ⊗ h]_{i,j,k} φ_i ⊗ φ_j ⊗ φ_k = E[h^{⊗3}](Φ, Φ, Φ) Tucker Tensor Decomposition Find the decomposition M_3 = E[h^{⊗3}](Φ, Φ, Φ). Key difference from CP: E[h^{⊗3}] is NOT a diagonal tensor. Many more parameters to estimate.
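
The sketch below illustrates the Tucker form with an arbitrary correlated distribution for h (a logistic-normal-style draw chosen only for illustration, not a model from the slides): the estimated core E[h ⊗ h ⊗ h] has substantial off-diagonal mass, which is exactly what separates this case from CP.

```python
# Illustrative sketch of the Tucker-form triples moment for a general (correlated) h.
import numpy as np

rng = np.random.default_rng(3)
n, k, num_docs = 15, 4, 100000
Phi = rng.dirichlet(np.ones(n), size=k).T

# Correlated topic proportions via a logistic-normal-style draw (arbitrary choice).
A = rng.standard_normal((k, k))
Z = rng.standard_normal((num_docs, k)) @ A.T
H = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)       # rows lie on the simplex

core = np.einsum('di,dj,dk->ijk', H, H, H) / num_docs       # estimate of E[h (x) h (x) h]
M3 = np.einsum('ijk,ai,bj,ck->abc', core, Phi, Phi, Phi)    # = E[h^{(x)3}](Phi, Phi, Phi)

# Unlike CP, the core is not diagonal: compare off-diagonal and diagonal magnitudes.
off_diag = core.copy()
for r in range(k):
    off_diag[r, r, r] = 0.0
print(np.abs(off_diag).max(), max(core[r, r, r] for r in range(k)))
```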

22 Guaranteed Learning of Topic Models Two Learning approaches CP Tensor decomposition: Parametric topic distributions (constraints on h) but general topic-word matrix Φ Tucker Tensor decomposition: Constrain topic-word matrix Φ but general (non-degenerate) distributions on h Topic h Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ x 1 x 2 x 3 x 4 x 5 Words

23 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

24 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

25 Pairwise moments for learning Learning using pairwise moments: minimal information. Exchangeable Topic Model Topic Mixture Topics h y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ x 1 x 2 x 3 x 4 x 5 Words

26 Pairwise moments for learning Learning using pairwise moments: minimal information. Exchangeable Topic Model Topic Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ h x 1 x 2 x 3 x 4 x 5 Words So far.. Parametric h: Dirichlet, single topic, independent components,... No restrictions on Φ (other than non-degeneracy). Learning using third order moment through tensor decompositions

27 Pairwise moments for learning Learning using pairwise moments: minimal information. Exchangeable Topic Model Topic Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ h x 1 x 2 x 3 x 4 x 5 Words So far.. Parametric h: Dirichlet, single topic, independent components,... No restrictions on Φ (other than non-degeneracy). Learning using third order moment through tensor decompositions What if.. Allow for general h: model arbitrary topic correlations Constrain topic-word matrix Φ:

28 Pairwise moments for learning Learning using pairwise moments: minimal information. Exchangeable Topic Model Topic Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ h x 1 x 2 x 3 x 4 x 5 Words So far.. Parametric h: Dirichlet, single topic, independent components,... No restrictions on Φ (other than non-degeneracy). Learning using third order moment through tensor decompositions What if.. Allow for general h: model arbitrary topic correlations Constrain topic-word matrix Φ: Sparsity constraints

29 Pairwise moments for learning Learning using pairwise moments: minimal information. Exchangeable Topic Model Topic-word matrix Topic Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ h x 1 x 2 x 3 x 4 x 5 Words Φ h 1 h 2 x(1) x(2) h k x(n) So far.. Parametric h: Dirichlet, single topic, independent components,... No restrictions on Φ (other than non-degeneracy). Learning using third order moment through tensor decompositions What if.. Allow for general h: model arbitrary topic correlations Constrain topic-word matrix Φ: Sparsity constraints

30 Some Intuitions.. Learning using second-order moments Linear model: E[x_i | h] = Φh and E[x_1 ⊗ x_2] = Φ E[hh^T] Φ^T Learning: recover Φ from Φ E[hh^T] Φ^T. Ill-posed without further restrictions

31 Some Intuitions.. Learning using second-order moments Linear model: E[x_i | h] = Φh and E[x_1 ⊗ x_2] = Φ E[hh^T] Φ^T Learning: recover Φ from Φ E[hh^T] Φ^T. Ill-posed without further restrictions When h is not degenerate: recover Φ from Col(Φ) No other restrictions on h: arbitrary dependencies

32 Some Intuitions.. Learning using second-order moments Linear model: E[x_i | h] = Φh and E[x_1 ⊗ x_2] = Φ E[hh^T] Φ^T Learning: recover Φ from Φ E[hh^T] Φ^T. Ill-posed without further restrictions When h is not degenerate: recover Φ from Col(Φ) No other restrictions on h: arbitrary dependencies Sparsity constraints on topic-word matrix Φ Main constraint: columns of Φ are the sparsest vectors in Col(Φ)

33 Sufficient Conditions for Identifiability Columns of Φ are the sparsest vectors in Col(Φ). Sufficient conditions? [Figure: bipartite graph between topics h_1,...,h_k and words x(1),...,x(n), showing a topic set S and its word neighborhood N(S)]

34 Sufficient Conditions for Identifiability Columns of Φ are the sparsest vectors in Col(Φ). Sufficient conditions? Structural Condition: (Additive) Graph Expansion |N(S)| > |S| + d_max, for all S ⊆ [k] with |S| ≥ 2 Parametric Conditions: Generic Parameters ‖Φv‖_0 ≥ |N_Φ(supp(v))| − |supp(v)| A. Anandkumar, D. Hsu, A. Javanmard, and S. M. Kakade. Learning Bayesian Networks with Latent Variables. In Proc. of Intl. Conf. on Machine Learning, June 2013.
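
As a rough illustration (not part of the slides), the brute-force check below tests the additive expansion condition on the support of a small random Φ; it starts at |S| = 2, since a single column has at most d_max nonzeros and cannot satisfy the additive bound.

```python
# Brute-force illustration (exponential in k) of the additive expansion condition
# |N(S)| > |S| + d_max on the support graph of Phi; only practical for tiny k.
import numpy as np
from itertools import combinations

def satisfies_expansion(Phi, tol=0.0):
    """Check |N_Phi(S)| > |S| + d_max for every S with |S| >= 2, where the bipartite
    graph is the support of Phi and d_max is the maximum number of nonzeros per column."""
    support = np.abs(Phi) > tol
    n, k = support.shape
    d_max = support.sum(axis=0).max()
    for size in range(2, k + 1):
        for S in combinations(range(k), size):
            if support[:, list(S)].any(axis=1).sum() <= size + d_max:
                return False
    return True

# Example with a hypothetical sparse random support.
rng = np.random.default_rng(4)
Phi = (rng.random((60, 5)) < 0.25) * rng.random((60, 5))
print(satisfies_expansion(Phi))
```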

35 Brief Proof Sketch Structural Condition: (Additive) Graph Expansion |N(S)| > |S| + d_max, for all S ⊆ [k] with |S| ≥ 2 Parametric Conditions: Generic Parameters ‖Φv‖_0 ≥ |N_Φ(supp(v))| − |supp(v)| Structural and Parametric Conditions Imply: When |supp(v)| > 1, ‖Φv‖_0 ≥ |N_Φ(supp(v))| − |supp(v)| > d_max, while each column of Φ has at most d_max nonzeros. Thus |supp(v)| = 1 for Φv to be one of the k sparsest vectors in Col(Φ)

36 Brief Proof Sketch Structural Condition: (Additive) Graph Expansion |N(S)| > |S| + d_max, for all S ⊆ [k] with |S| ≥ 2 Parametric Conditions: Generic Parameters ‖Φv‖_0 ≥ |N_Φ(supp(v))| − |supp(v)| Structural and Parametric Conditions Imply: When |supp(v)| > 1, ‖Φv‖_0 ≥ |N_Φ(supp(v))| − |supp(v)| > d_max, while each column of Φ has at most d_max nonzeros. Thus |supp(v)| = 1 for Φv to be one of the k sparsest vectors in Col(Φ) Claim: Parametric conditions are satisfied for generic parameters

37 Tractable Learning Algorithm Learning Task Recover topic-word matrix Φ from M_2 = Φ E[hh^T] Φ^T. Exhaustive search: min_{z ≠ 0} ‖Φz‖_0 Convex relaxation: min_z ‖Φz‖_1 s.t. b^T z = 1, where b is a row of Φ. Change of variables: min_w ‖M_2^{1/2} w‖_1 s.t. e_i^T M_2^{1/2} w = 1. Under reasonable conditions, the above program exactly recovers Φ
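
A sketch of the change-of-variables ℓ_1 program, assuming cvxpy as a generic convex solver (the solver choice and the anchor-row scan are not specified in the slides); under the stated conditions its minimizers are proportional to columns of Φ.

```python
# Sketch of the l1 relaxation   min_w ||M2^{1/2} w||_1   s.t.  e_i^T M2^{1/2} w = 1.
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

def recover_direction(M2, anchor_row):
    """Solve the l1 program for one anchor row; returns the candidate vector M2^{1/2} w,
    which under the expansion/genericity conditions aligns with a column of Phi."""
    B = np.real(sqrtm(M2))                     # any symmetric square root of M2
    n = B.shape[0]
    w = cp.Variable(n)
    problem = cp.Problem(cp.Minimize(cp.norm1(B @ w)),
                         [B[anchor_row, :] @ w == 1])
    problem.solve()
    return B @ w.value

# Usage sketch (M2_hat would be an empirical pairs matrix, not computed here):
# candidates = [recover_direction(M2_hat, i) for i in range(M2_hat.shape[0])]
```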

38 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

39 Latent General Topic Models So far: recover topic-word matrix Φ from Φ E[hh^T] Φ^T. Learning the topic proportion distribution E[hh^T] is not enough to recover general distributions Need higher-order moments to learn the distribution of h Any models where low-order moments suffice? e.g. Dirichlet/single topic require only third-order moments. What about other distributions? Are there other topic distributions which can be learned efficiently?

40 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = ∏_i P(h_i | h_{Pa_i}) Linear Bayesian Network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, and the η_i are uncorrelated

41 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = ∏_i P(h_i | h_{Pa_i}) Linear Bayesian Network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, and the η_i are uncorrelated

42 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = ∏_i P(h_i | h_{Pa_i}) Linear Bayesian Network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, and the η_i are uncorrelated [Figure: applying (I − Λ)^{-1} turns the correlated topics h_1,...,h_k into uncorrelated variables η_1,...,η_k]

43 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = ∏_i P(h_i | h_{Pa_i}) Linear Bayesian Network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, and the η_i are uncorrelated [Figure: equivalent network in which the uncorrelated η_1,...,η_k generate the words through Φ' = Φ(I − Λ)^{-1}]

44 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = ∏_i P(h_i | h_{Pa_i}) Linear Bayesian Network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, and the η_i are uncorrelated [Figure: sparse, structured Φ over the topics h vs. dense Φ' over the noise variables η]

45 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = ∏_i P(h_i | h_{Pa_i}) Linear Bayesian Network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, and the η_i are uncorrelated Φ: structured and sparse, while Φ' is dense h: correlated topics, while the η are uncorrelated
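
A small numerical sketch (sizes, sparsity levels, and the strictly lower-triangular DAG ordering are arbitrary choices) of the reparameterization E[x | η] = Φ' η with Φ' = Φ(I − Λ)^{-1}, showing that the sparse, structured Φ turns into a dense Φ':

```python
# Illustrative sketch: Phi sparse and structured, Phi' = Phi (I - Lambda)^{-1} dense.
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 6

Phi = rng.standard_normal((n, k)) * (rng.random((n, k)) < 0.2)        # sparse topic-word matrix
Lambda = np.tril(rng.standard_normal((k, k)) * (rng.random((k, k)) < 0.5), k=-1)  # DAG weights

Phi_prime = Phi @ np.linalg.inv(np.eye(k) - Lambda)                    # E[x | eta] = Phi' eta

# Fraction of nonzero entries: Phi is sparse, Phi' is (generally) dense.
print((Phi != 0).mean(), (np.abs(Phi_prime) > 1e-12).mean())
```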

46 Learning Latent Bayesian Networks E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, E[η] = λ E[x_1 ⊗ x_2 ⊗ x_3] = E[η^{⊗3}](Φ', Φ', Φ') = Σ_i λ_i (φ'_i)^{⊗3}

47 Learning Latent Bayesian Networks E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, E[η] = λ E[x_1 ⊗ x_2 ⊗ x_3] = E[η^{⊗3}](Φ', Φ', Φ') = Σ_i λ_i (φ'_i)^{⊗3} Solving CP Decomposition through the Tensor Power Method Recall the η_i are uncorrelated: E[ηη^T] is diagonal. Reduction to CP decomposition: can be efficiently solved via the tensor power method

48 Learning Latent Bayesian Networks E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, E[η] = λ E[x_1 ⊗ x_2 ⊗ x_3] = E[η^{⊗3}](Φ', Φ', Φ') = Σ_i λ_i (φ'_i)^{⊗3} Solving CP Decomposition through the Tensor Power Method Recall the η_i are uncorrelated: E[ηη^T] is diagonal. Reduction to CP decomposition: can be efficiently solved via the tensor power method Sparse Tucker Decomposition: Unmixing via Convex Optimization Unmix Φ from Φ' = Φ(I − Λ)^{-1} through ℓ_1 optimization.

49 Learning Latent Bayesian Networks E[x_i | η] = Φ(I − Λ)^{-1} η = Φ' η, E[η] = λ E[x_1 ⊗ x_2 ⊗ x_3] = E[η^{⊗3}](Φ', Φ', Φ') = Σ_i λ_i (φ'_i)^{⊗3} Solving CP Decomposition through the Tensor Power Method Recall the η_i are uncorrelated: E[ηη^T] is diagonal. Reduction to CP decomposition: can be efficiently solved via the tensor power method Sparse Tucker Decomposition: Unmixing via Convex Optimization Unmix Φ from Φ' = Φ(I − Λ)^{-1} through ℓ_1 optimization. Learning both structure and parameters of Φ and the distribution of h Combine non-convex and convex methods for learning!
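
A minimal sketch of the tensor power method for the CP step, assuming the tensor has already been reduced to an orthogonal decomposition (in practice this is reached by whitening with the pairs matrix, a step omitted here):

```python
# Tensor power method with deflation for T = sum_r lam_r a_r^{(x)3},
# assuming orthonormal components a_r (the whitened setting).
import numpy as np

def tensor_power_method(T, num_components, num_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    T = T.copy()
    weights, vectors = [], []
    for _ in range(num_components):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        for _ in range(num_iters):
            v = np.einsum('ijk,j,k->i', T, v, v)        # power update v <- T(I, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)      # eigenvalue T(v, v, v)
        weights.append(lam)
        vectors.append(v)
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)  # deflate the recovered component
    return np.array(weights), np.stack(vectors, axis=1)

# Quick check on a synthetic orthogonally decomposable tensor.
rng = np.random.default_rng(6)
A, _ = np.linalg.qr(rng.standard_normal((8, 3)))         # orthonormal columns a_r
lam_true = np.array([3.0, 2.0, 1.0])
T = np.einsum('r,ir,jr,kr->ijk', lam_true, A, A, A)
lam_hat, A_hat = tensor_power_method(T, 3)
print(np.sort(lam_hat)[::-1])                            # approximately [3, 2, 1]
```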

50 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

51 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

52 Extension to learning overcomplete representations So far.. Pairwise moments for learning structured topic-word matrices Third order moments for learning latent Bayesian network models Number of topics k, n is vocabulary size and k < n.

53 Extension to learning overcomplete representations So far.. Pairwise moments for learning structured topic-word matrices Third-order moments for learning latent Bayesian network models Number of topics k, vocabulary size n, and k < n. [Figure: undercomplete (k < n) vs. overcomplete (k > n) bipartite topic-word graphs] What about overcomplete models: k > n? Do higher-order moments help?

54 Learning Overcomplete Representations Why Overcomplete Representations? Flexible modeling, robust to noise Huge gains in many applications, e.g. speech and computer vision.

55 Learning Overcomplete Representations Why Overcomplete Representations? Flexible modeling, robust to noise Huge gains in many applications, e.g. speech and computer vision. Recall Tucker Form of Moments for Topic Models M_2 := E[x_1 ⊗ x_2] = E[h^{⊗2}](Φ, Φ) = Φ E[hh^T] Φ^T M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = E[h^{⊗3}](Φ, Φ, Φ) k > n: Tucker decomposition not unique: model non-identifiable.

56 Learning Overcomplete Representations Why Overcomplete Representations? Flexible modeling, robust to noise Huge gains in many applications, e.g. speech and computer vision. Recall Tucker Form of Moments for Topic Models M_2 := E[x_1 ⊗ x_2] = E[h^{⊗2}](Φ, Φ) = Φ E[hh^T] Φ^T M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = E[h^{⊗3}](Φ, Φ, Φ) k > n: Tucker decomposition not unique: model non-identifiable. Identifiability of Overcomplete Models Possible under the notion of topic persistence Includes the single topic model as a special case.

57 Persistent Topic Models Bag of Words Model Topic h Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ x 1 x 2 x 3 x 4 x 5 Words A. Anandkumar, D. Hsu, M. Janzamin, and S. M. Kakade. When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints, Preprint, June 2013.

58 Persistent Topic Models Bag of Words Model Persistent Topic Model Topic h Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ x 1 x 2 x 3 x 4 x 5 Words Topic h Mixture Topics Φ Φ Φ Φ y 1 y 2 y 3 Φ x 1 x 2 x 3 x 4 x 5 Words A. Anandkumar, D. Hsu, M. Janzamin, and S. M. Kakade. When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints, Preprint, June 2013.

59 Persistent Topic Models Bag of Words Model Persistent Topic Model Topic h Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ x 1 x 2 x 3 x 4 x 5 Words Topic h Mixture Topics Φ Φ Φ Φ y 1 y 2 y 3 Φ x 1 x 2 x 3 x 4 x 5 Words Single-topic model is a special case. Persistence: incorporates locality or order of words. A. Anandkumar, D. Hsu, M. Janzamin, and S. M. Kakade. When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints, Preprint, June 2013.

60 Persistent Topic Models Bag of Words Model Persistent Topic Model Topic h Mixture Topics y 1 y 2 y 3 y 4 y 5 Φ Φ Φ Φ Φ x 1 x 2 x 3 x 4 x 5 Words Topic h Mixture Topics Φ Φ Φ Φ y 1 y 2 y 3 Φ x 1 x 2 x 3 x 4 x 5 Words Single-topic model is a special case. Persistence: incorporates locality or order of words. Identifiability conditions for overcomplete models? A. Anandkumar, D. Hsu, M. Janzamin, and S. M. Kakade. When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints, Preprint, June 2013.

61 Identifiability of Overcomplete Models Recall Tucker Form of Moments for Bag-of-Words Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[h^{⊗4}](Φ, Φ, Φ, Φ) Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊗ Φ) E[(h ⊗ h)(h ⊗ h)^T] (Φ ⊗ Φ)^T

62 Identifiability of Overcomplete Models Recall Tucker Form of Moments for Bag-of-Words Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[h^{⊗4}](Φ, Φ, Φ, Φ) Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊗ Φ) E[(h ⊗ h)(h ⊗ h)^T] (Φ ⊗ Φ)^T For Persistent Topic Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[hh^T](Φ ⊙ Φ, Φ ⊙ Φ) Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊙ Φ) E[hh^T] (Φ ⊙ Φ)^T (⊙ denotes the Khatri-Rao product, defined below)

63 Identifiability of Overcomplete Models Recall Tucker Form of Moments for Bag-of-Words Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[h^{⊗4}](Φ, Φ, Φ, Φ) Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊗ Φ) E[(h ⊗ h)(h ⊗ h)^T] (Φ ⊗ Φ)^T For Persistent Topic Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[hh^T](Φ ⊙ Φ, Φ ⊙ Φ) Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊙ Φ) E[hh^T] (Φ ⊙ Φ)^T Kronecker vs. Khatri-Rao Products Φ: topic-word matrix, is n × k. (Φ ⊗ Φ): Kronecker product, is an n² × k² matrix. (Φ ⊙ Φ): Khatri-Rao product, is an n² × k matrix.
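
A quick numerical sketch of the dimension count behind this distinction (sizes are illustrative): in the overcomplete regime the Kronecker product Φ ⊗ Φ cannot have full column rank, while the Khatri-Rao product Φ ⊙ Φ generically does.

```python
# Kronecker vs. Khatri-Rao in the overcomplete regime k > n.
import numpy as np

rng = np.random.default_rng(7)
n, k = 4, 6                                              # overcomplete: k > n
Phi = rng.standard_normal((n, k))

kron = np.kron(Phi, Phi)                                                            # (n^2, k^2)
khatri_rao = np.stack([np.kron(Phi[:, r], Phi[:, r]) for r in range(k)], axis=1)    # (n^2, k)

print(kron.shape, np.linalg.matrix_rank(kron))              # rank at most n^2, far below k^2
print(khatri_rao.shape, np.linalg.matrix_rank(khatri_rao))  # generically full column rank k
```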

64 Some Intuitions Bag-of-words Model: (Φ ⊗ Φ) E[(h ⊗ h)(h ⊗ h)^T] (Φ ⊗ Φ)^T Persistent Model: (Φ ⊙ Φ) E[hh^T] (Φ ⊙ Φ)^T Effective Topic-Word Matrix Given Fourth-Order Moments: Bag of Words Model: Kronecker product Φ ⊗ Φ, of size n² × k²: not identifiable. Persistent Model: Khatri-Rao product Φ ⊙ Φ, of size n² × k: identifiable.

65 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion

66 Conclusion Moment-based Estimation of Latent Variable Models Moments are easy to estimate. Low-order moments have good concentration properties

67 Conclusion Moment-based Estimation of Latent Variable Models Moments are easy to estimate. Low-order moments have good concentration properties Tensor Decomposition Methods Moment tensors have tractable forms for many models, e.g. topic models, HMMs, Gaussian mixtures, ICA. Efficient CP tensor decomposition through power iterations. Efficient structured Tucker decomposition through ℓ_1 optimization. Structured topic-word matrices: expansion conditions Can be extended to overcomplete representations

68 Conclusion Moment-based Estimation of Latent Variable Models Moments are easy to estimate. Low-order moments have good concentration properties Tensor Decomposition Methods Moment tensors have tractable forms for many models, e.g. topic models, HMMs, Gaussian mixtures, ICA. Efficient CP tensor decomposition through power iterations. Efficient structured Tucker decomposition through ℓ_1 optimization. Structured topic-word matrices: expansion conditions Can be extended to overcomplete representations Practical Considerations for Tensor Methods Not covered in detail in this tutorial. Matrix algebra and iterative methods. Scalable: parallel implementation on GPUs
