Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints
1 Identifiability and Learning of Topic Models: Tensor Decompositions under Structural Constraints Anima Anandkumar, U.C. Irvine. Joint work with Daniel Hsu, Majid Janzamin, Adel Javanmard, and Sham Kakade.
2 Latent Variable Modeling Goal: Discover hidden effects from observed measurements. Topic Models Observations: words. Hidden: topics. [Slide shows a collage of New York Times article excerpts (Syrian refugee crisis, Hurricane Sandy tunnel damage, a nursing home investigation, tax negotiations, gas lines) as example documents to be organized by topic.] Modeling communities in social networks, modeling gene regulation...
3 Challenges in Learning Topic Models Learning Topic Models Using Word Observations Challenges in Identifiability When can topics be identified? Conditions on the model parameters, e.g. on the topic-word matrix Φ and on the distribution of topic proportions h? Does identifiability also lead to tractable algorithms?
4 Challenges in Learning Topic Models [As above.] Challenges in Design of Learning Algorithms Maximum likelihood learning of topic models is NP-hard (Arora et al.). In practice, heuristics such as Gibbs sampling, variational Bayes, etc. Guaranteed learning with minimal assumptions? Efficient methods? Low sample and computational complexities?
5 Challenges in Learning Topic Models [As above.] Moment-based approach: learning using low-order observed moments.
6 Probabilistic Topic Models Useful abstraction for automatic categorization of documents. Observed: words. Hidden: topics. Bag of words: order of words does not matter. Graphical model representation: ℓ words in a document, x_1, ..., x_ℓ. h: proportions of topics in a document. Word x_i generated from topic y_i. Exchangeability: x_1 ⊥ x_2 ⊥ ... | h (words are conditionally independent given h). Φ(i,j) := P[x_m = i | y_m = j]: topic-word matrix. [Diagram: topic mixture h → topics y_1, ..., y_5 → words x_1, ..., x_5, each via Φ.]
7 Formulation as Linear Models Distribution of the topic proportions vector h: if there are k topics, a distribution over the simplex Δ^{k-1} := {h ∈ R^k : h_i ∈ [0,1], Σ_i h_i = 1}. Distribution of the words x_1, x_2, ...: order the n words in the vocabulary; if x_i is the j-th word, assign x_i = e_j ∈ R^n. Distribution of each x_i: supported on the vertices of Δ^{n-1}. Properties Linear model: E[x_i | h] = Φh. Multiview model: h is fixed and multiple words (x_i) are generated.
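A minimal simulation of this linear / multiview model (all names are illustrative, not from the talk): sample one document's words given h and check E[x_i | h] = Φh empirically.

```python
# Sketch: sample words as one-hot vectors and verify E[x_i | h] = Phi h.
import numpy as np

rng = np.random.default_rng(0)
n, k, num_words = 50, 5, 200_000           # vocabulary size, topics, words drawn

Phi = rng.dirichlet(np.ones(n), size=k).T  # n x k; each column is a topic's word distribution
h = rng.dirichlet(np.ones(k))              # topic proportions for one document (point in the simplex)

word_probs = Phi @ h                       # distribution of each word given h
words = rng.choice(n, size=num_words, p=word_probs)
X = np.eye(n)[words]                       # one-hot encodings e_j of the sampled words

print(np.abs(X.mean(axis=0) - word_probs).max())  # small: empirical mean matches Phi h
```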
8 Geometric Picture for Topic Models Topic proportions vector (h) Document
9 Geometric Picture for Topic Models Single topic (h)
10 Geometric Picture for Topic Models Topic proportions vector (h)
11 Geometric Picture for Topic Models Topic proportions vector (h) [Diagram: words x_1, x_2, x_3 drawn from the word simplex, each generated via Φ.] Word generation (x_1, x_2, ...)
12 Geometric Picture for Topic Models Topic proportions vector (h) [Diagram as above.] Word generation (x_1, x_2, ...) Moment-based estimation: co-occurrences of words in documents
13 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
14 Recap of CP Decomposition Recall the form of moments for the single topic/Dirichlet model. E[x_i | h] = Φh. λ := E[h], i.e., λ_r := [E[h]]_r. Learn the topic-word matrix Φ and the vector λ. [Diagram: h → x_1, ..., x_5 via Φ.]
15 Recap of CP Decomposition [As above.] Pairs Matrix M_2: M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T = Σ_{r=1}^k λ_r φ_r φ_r^T
16 Recap of CP Decomposition [As above.] Similarly, Triples Tensor M_3: M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_r λ_r φ_r ⊗ φ_r ⊗ φ_r
17 Recap of CP Decomposition [As above.] Matrix and Tensor Forms (φ_r := r-th column of Φ): M_2 = Σ_{r=1}^k λ_r φ_r φ_r^T. M_3 = Σ_{r=1}^k λ_r φ_r ⊗ φ_r ⊗ φ_r.
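As a sanity check on these formulas, the population moments of the single-topic model can be built directly; a sketch with illustrative names, where λ_r is the probability of topic r:

```python
# Under the single-topic model, E[hh^T] = diag(lam), so M2 and M3 take the CP forms above.
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4
Phi = rng.dirichlet(np.ones(n), size=k).T      # n x k topic-word matrix
lam = rng.dirichlet(np.ones(k))                # lam_r = P[topic = r] = E[h]_r

M2 = sum(lam[r] * np.outer(Phi[:, r], Phi[:, r]) for r in range(k))
M3 = sum(lam[r] * np.einsum('i,j,l->ijl', Phi[:, r], Phi[:, r], Phi[:, r])
         for r in range(k))

# Equivalent matrix form: M2 = Phi diag(lam) Phi^T
assert np.allclose(M2, Phi @ np.diag(lam) @ Phi.T)
print(M2.shape, M3.shape)                      # (20, 20) (20, 20, 20)
```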
18 Multi-linear Transformation For a tensor T, define (for matrices V_m of appropriate dimensions) [T(V_1, V_2, V_3)]_{i_1, i_2, i_3} := Σ_{j_1, j_2, j_3} T_{j_1, j_2, j_3} Π_{m ∈ [3]} V_m(j_m, i_m). For a matrix M_2: M_2(V_1, V_2) := V_1^T M_2 V_2. For T = Σ_{r=1}^k λ_r φ_r ⊗ φ_r ⊗ φ_r: T(W, W, W) = Σ_{r ∈ [k]} λ_r (W^T φ_r)^{⊗3}. T(I, v, v) = Σ_{r ∈ [k]} λ_r ⟨v, φ_r⟩^2 φ_r. T(I, I, v) = Σ_{r ∈ [k]} λ_r ⟨v, φ_r⟩ φ_r ⊗ φ_r.
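The multilinear map has a direct einsum translation; a minimal sketch (illustrative names) that also verifies the T(I, v, v) identity above:

```python
# Multilinear transformation T(V1, V2, V3) via einsum.
import numpy as np

def multilinear(T, V1, V2, V3):
    # [T(V1,V2,V3)]_{i1,i2,i3} = sum_{j1,j2,j3} T[j1,j2,j3] V1[j1,i1] V2[j2,i2] V3[j3,i3]
    return np.einsum('abc,ai,bj,ck->ijk', T, V1, V2, V3)

rng = np.random.default_rng(2)
n, k = 5, 3
lam, Phi = rng.random(k), rng.random((n, k))
T = sum(lam[r] * np.einsum('i,j,l->ijl', Phi[:, r], Phi[:, r], Phi[:, r])
        for r in range(k))

v = rng.random(n)
lhs = multilinear(T, np.eye(n), v[:, None], v[:, None]).ravel()
rhs = sum(lam[r] * (v @ Phi[:, r])**2 * Phi[:, r] for r in range(k))
assert np.allclose(lhs, rhs)   # T(I, v, v) = sum_r lam_r <v, phi_r>^2 phi_r
```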
19 Form of Moments for a General Topic Model E[x_i | h] = Φh. Learn Φ and the distribution of h. Form of moments? [Diagram: h → topics y_1, ..., y_5 → words x_1, ..., x_5 via Φ.] Pairs Matrix M_2: M_2 := E[x_1 ⊗ x_2] = E[E[x_1 ⊗ x_2 | h]] = Φ E[hh^T] Φ^T
20 Form of Moments for a General Topic Model [As above.] Similarly, Triples Tensor M_3: M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_{i,j,k} E[h^{⊗3}]_{i,j,k} φ_i ⊗ φ_j ⊗ φ_k = E[h^{⊗3}](Φ, Φ, Φ)
21 Form of Moments for a General Topic Model [As above.] Tucker Tensor Decomposition Find a decomposition M_3 = E[h^{⊗3}](Φ, Φ, Φ). Key difference from CP: E[h^{⊗3}] is NOT a diagonal tensor. Many more parameters to estimate.
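To see the contrast with the CP case, the sketch below (illustrative setup) builds M_3 in Tucker form with an empirical core E[h ⊗ h ⊗ h] and checks that the core is far from diagonal:

```python
# For a general prior on h, the Tucker core E[h (x) h (x) h] is not diagonal.
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 20, 4, 100_000
Phi = rng.dirichlet(np.ones(n), size=k).T
H = rng.dirichlet(np.ones(k) * 0.5, size=m)      # samples of h on the simplex

core = np.einsum('ma,mb,mc->abc', H, H, H) / m   # empirical E[h^{(x)3}], k x k x k
M3 = np.einsum('abc,ia,jb,kc->ijk', core, Phi, Phi, Phi)  # Tucker form of the triples tensor

off = core.copy()
for r in range(k):
    off[r, r, r] = 0.0                           # zero out the diagonal entries
print(np.abs(off).max())                         # clearly nonzero: the core is not diagonal
```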
22 Guaranteed Learning of Topic Models Two Learning Approaches CP tensor decomposition: parametric topic distributions (constraints on h) but a general topic-word matrix Φ. Tucker tensor decomposition: constrain the topic-word matrix Φ but allow general (non-degenerate) distributions on h. [Diagram: topic mixture h → topics y_1, ..., y_5 → words x_1, ..., x_5 via Φ.]
23 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
24 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
25 Pairwise moments for learning Learning using pairwise moments: minimal information. Exchangeable Topic Model [Diagram: topic mixture h → topics y_1, ..., y_5 → words x_1, ..., x_5 via Φ.]
26 Pairwise moments for learning [As above.] So far... Parametric h: Dirichlet, single topic, independent components, ... No restrictions on Φ (other than non-degeneracy). Learning using third-order moments through tensor decompositions.
27 Pairwise moments for learning [As above.] What if... Allow a general h: model arbitrary topic correlations. Constrain the topic-word matrix Φ:
28 Pairwise moments for learning [As above.] What if... Allow a general h: model arbitrary topic correlations. Constrain the topic-word matrix Φ: sparsity constraints.
29 Pairwise moments for learning [As above, plus the topic-word matrix diagram: h_1, ..., h_k → x(1), ..., x(n) via Φ.] What if... Allow a general h: model arbitrary topic correlations. Constrain the topic-word matrix Φ: sparsity constraints.
30 Some Intuitions.. [Diagram: h_1, ..., h_k → x(1), ..., x(n) via Φ.] Learning using second-order moments Linear model: E[x_i | h] = Φh, and E[x_1 ⊗ x_2] = Φ E[hh^T] Φ^T. Learning: recover Φ from Φ E[hh^T] Φ^T. Ill-posed without further restrictions.
31 Some Intuitions.. [As above.] When h is not degenerate: M_2 reveals Col(Φ), and it remains to recover Φ from Col(Φ). No other restrictions on h: arbitrary dependencies.
32 Some Intuitions.. [As above.] Sparsity constraints on the topic-word matrix Φ Main constraint: columns of Φ are the sparsest vectors in Col(Φ).
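A quick numerical check of this intuition (illustrative names): for any full-rank E[hh^T], the column space of M_2 coincides exactly with Col(Φ), so the pairwise moment pins down the subspace but not Φ itself.

```python
# Col(M2) = Col(Phi) whenever E[hh^T] is full rank.
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 5
Phi = rng.dirichlet(np.ones(n), size=k).T
A = rng.random((k, k))
Ehh = A @ A.T + np.eye(k)                 # an arbitrary full-rank E[hh^T]
M2 = Phi @ Ehh @ Phi.T

U = np.linalg.svd(M2)[0][:, :k]           # orthonormal basis for Col(M2)
# Projecting Phi onto Col(M2) leaves it unchanged, so Col(Phi) = Col(M2)
print(np.allclose(U @ (U.T @ Phi), Phi))  # True
```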
33 Sufficient Conditions for Identifiability Columns of Φ are the sparsest vectors in Col(Φ). Sufficient conditions? [Diagram: bipartite graph from topics h_1, ..., h_k to words x(1), ..., x(n) via Φ; a topic subset S and its word neighborhood N(S).]
34 Sufficient Conditions for Identifiability [As above.] Structural Condition: (Additive) Graph Expansion: |N(S)| > |S| + d_max, for all S ⊆ [k]. Parametric Conditions: Generic Parameters: ‖Φv‖_0 > |N_Φ(supp(v))| - |supp(v)|. A. Anandkumar, D. Hsu, A. Javanmard, and S. M. Kakade. Learning Linear Bayesian Networks with Latent Variables. In Proc. of Intl. Conf. on Machine Learning, June 2013.
35 Brief Proof Sketch Structural Condition: (Additive) Graph Expansion: |N(S)| > |S| + d_max, for all S ⊆ [k]. Parametric Conditions: Generic Parameters: ‖Φv‖_0 > |N_Φ(supp(v))| - |supp(v)|. The structural and parametric conditions imply: when |supp(v)| > 1, ‖Φv‖_0 > |N_Φ(supp(v))| - |supp(v)| > d_max. Thus |supp(v)| = 1 for Φv to be one of the k sparsest vectors in Col(Φ).
36 Brief Proof Sketch [As above.] Claim: the parametric conditions are satisfied for generic parameters.
37 Tractable Learning Algorithm Learning Task Recover the topic-word matrix Φ from M_2 = Φ E[hh^T] Φ^T. Exhaustive search: min_{z ≠ 0} ‖Φz‖_0. Convex relaxation: min_z ‖Φz‖_1 s.t. b^T z = 1, where b is a row of Φ. Change of variables: min_w ‖M_2^{1/2} w‖_1 s.t. e_i^T M_2^{1/2} w = 1. Under reasonable conditions, the above program exactly recovers Φ.
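Since the relaxation above is an ℓ1 objective with a single linear constraint, it can be written as a small linear program. Below is a hedged SciPy sketch (function name and setup are illustrative, not from the talk); it uses a full n × n symmetric square root of M_2 and glosses over rank-deficiency details:

```python
# min_w ||B w||_1  s.t. (B w)_i = 1, with B = M2^{1/2}, via the standard
# LP trick of introducing t >= |B w| elementwise.
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def sparsest_direction(M2, i):
    B = np.real(sqrtm(M2))                                 # symmetric square root, n x n
    n = B.shape[0]
    c = np.concatenate([np.zeros(n), np.ones(n)])          # variables [w, t]; minimize sum(t)
    A_ub = np.block([[B, -np.eye(n)], [-B, -np.eye(n)]])   # encodes |B w| <= t
    b_ub = np.zeros(2 * n)
    A_eq = np.concatenate([B[i], np.zeros(n)])[None, :]    # e_i^T B w = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return B @ res.x[:n]                                   # candidate column of Phi (up to scale)
```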
38 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
39 Latent General Topic Models [Diagram: h_1, ..., h_k → x(1), ..., x(n) via Φ.] So far: recover the topic-word matrix Φ from Φ E[hh^T] Φ^T. Learning the topic proportions distribution: E[hh^T] is not enough to recover general distributions; higher-order moments are needed to learn the distribution of h. Any models where low-order moments suffice? E.g., Dirichlet/single topic require only third-order moments. Are there other topic distributions which can be learned efficiently?
40 Learning Latent Bayesian Networks BN: Markov relationships on a DAG. Pa_i: parents of node i. P(h) = Π_i P(h_i | h_{Pa_i}). Linear Bayesian network: h_j = Σ_{i ∈ Pa_j} λ_{ji} h_i + η_j, i.e., h = Λh + η. E[x_i | η] = Φ(I - Λ)^{-1} η = Φ̃η (with Φ̃ := Φ(I - Λ)^{-1}), and the η_i are uncorrelated. [Diagram: h_1, ..., h_k → x(1), ..., x(n) via Φ.]
41 Learning Latent Bayesian Networks [Repeat of slide 40.]
42 Learning Latent Bayesian Networks [As above; the diagram now shows the noise variables: η_1, ..., η_k feed into h_1, ..., h_k through (I - Λ)^{-1}, then into x(1), ..., x(n) via Φ.]
43 Learning Latent Bayesian Networks [As above; an equivalent collapsed diagram: η_1, ..., η_k → x(1), ..., x(n) via Φ̃.]
44 Learning Latent Bayesian Networks [As above; the two diagrams side by side: the h-model with Φ and the η-model with Φ̃.]
45 Learning Latent Bayesian Networks [As above.] Φ: structured and sparse, while Φ̃ is dense. h: correlated topics, while the η are uncorrelated.
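A minimal simulation of the linear latent Bayesian network above (the DAG and parameter choices are illustrative): independent noise η produces correlated topic variables h through (I - Λ)^{-1}.

```python
# h = Lam h + eta  =>  h = (I - Lam)^{-1} eta: uncorrelated eta, correlated h.
import numpy as np

rng = np.random.default_rng(5)
k = 4
Lam = np.triu(rng.random((k, k)) * 0.5, 1)     # strictly triangular Lambda: a DAG
eta = rng.random((100_000, k)) - 0.5           # independent (hence uncorrelated) noise
H = eta @ np.linalg.inv(np.eye(k) - Lam).T     # rows are samples of h = (I - Lam)^{-1} eta

print(np.round(np.corrcoef(eta.T), 2))         # ~ identity: eta uncorrelated
print(np.round(np.corrcoef(H.T), 2))           # nonzero off-diagonals: h correlated
```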
46 Learning Latent Bayesian Networks E[x_i | η] = Φ(I - Λ)^{-1} η = Φ̃η. E[x_1 ⊗ x_2 ⊗ x_3] = E[η^{⊗3}](Φ̃, Φ̃, Φ̃) = Σ_i λ_i (φ̃_i)^{⊗3}, where λ_i denotes the i-th diagonal entry of E[η^{⊗3}].
47 Learning Latent Bayesian Networks [As above.] Solving CP Decomposition through the Tensor Power Method Recall the η_i are uncorrelated: E[η^{⊗3}] is diagonal. Reduction to CP decomposition: can be efficiently solved via the tensor power method.
48 Learning Latent Bayesian Networks [As above.] Sparse Tucker Decomposition: Unmixing via Convex Optimization Unmix Φ from Φ̃ = Φ(I - Λ)^{-1} through ℓ1 optimization.
49 Learning Latent Bayesian Networks [As above.] Learning both the structure and parameters of Φ and the distribution of h: combine non-convex and convex methods for learning!
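For intuition, here is a minimal (non-robust) tensor power iteration with deflation. It assumes the input tensor is already orthogonally decomposable, whereas the actual pipeline first whitens with M_2 to reach that form; names are illustrative.

```python
# Tensor power method: iterate v <- T(I, v, v) / ||T(I, v, v)||, then deflate.
import numpy as np

def tensor_power(T, n_iter=100, seed=6):
    v = np.random.default_rng(seed).standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)   # v <- T(I, v, v)
        v /= np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v) # lam = T(v, v, v)
    return lam, v

def cp_decompose(T, k):
    components = []
    for _ in range(k):
        lam, v = tensor_power(T)
        components.append((lam, v))
        T = T - lam * np.einsum('i,j,l->ijl', v, v, v)   # deflate the found component
    return components
```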
50 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
51 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
52 Extension to learning overcomplete representations So far... Pairwise moments for learning structured topic-word matrices. Third-order moments for learning latent Bayesian network models. Number of topics k, vocabulary size n, with k < n.
53 Extension to learning overcomplete representations [As above.] [Diagram: undercomplete representation (k < n) vs. overcomplete representation (k > n), each mapping h_1, ..., h_k to x(1), ..., x(n).] What about overcomplete models: k > n? Do higher-order moments help?
54 Learning Overcomplete Representations Why Overcomplete Representations? Flexible modeling, robust to noise. Huge gains in many applications, e.g. speech and computer vision.
55 Learning Overcomplete Representations [As above.] Recall Tucker Form of Moments for Topic Models M_2 := E[x_1 ⊗ x_2] = E[h^{⊗2}](Φ, Φ) = Φ E[hh^T] Φ^T. M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = E[h^{⊗3}](Φ, Φ, Φ). For k > n: the Tucker decomposition is not unique; the model is non-identifiable.
56 Learning Overcomplete Representations [As above.] Identifiability of Overcomplete Models Possible under the notion of topic persistence. Includes the single topic model as a special case.
57 Persistent Topic Models Bag of Words Model [Diagram: topic mixture h → topics y_1, ..., y_5 → words x_1, ..., x_5 via Φ.] A. Anandkumar, D. Hsu, M. Janzamin, and S. M. Kakade. When are Overcomplete Representations Identifiable? Uniqueness of Tensor Decompositions Under Expansion Constraints. Preprint, June 2013.
58 Persistent Topic Models [As above.] Persistent Topic Model [Diagram: topic mixture h → topics y_1, y_2, y_3, each topic persisting over consecutive words x_1, ..., x_5, via Φ.]
59 Persistent Topic Models [As above.] Single-topic model is a special case. Persistence: incorporates locality or order of words.
60 Persistent Topic Models [As above.] Identifiability conditions for overcomplete models?
61 Identifiability of Overcomplete Models Recall Tucker Form of Moments for the Bag-of-Words Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[h^{⊗4}](Φ, Φ, Φ, Φ). Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊗ Φ) E[(h ⊗ h)(h ⊗ h)^T] (Φ ⊗ Φ)^T.
62 Identifiability of Overcomplete Models [As above.] For the Persistent Topic Model Tensor form: E[x_1 ⊗ x_2 ⊗ x_3 ⊗ x_4] = E[hh^T](Φ ⊙ Φ, Φ ⊙ Φ). Matricized form: E[(x_1 ⊗ x_2)(x_3 ⊗ x_4)^T] = (Φ ⊙ Φ) E[hh^T] (Φ ⊙ Φ)^T.
63 Identifiability of Overcomplete Models [As above.] Kronecker vs. Khatri-Rao Products Φ: topic-word matrix, of size n × k. Φ ⊗ Φ: Kronecker product, an n^2 × k^2 matrix. Φ ⊙ Φ: Khatri-Rao product, an n^2 × k matrix.
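A small sketch (illustrative names) of the dimension count behind this gap: for k > n the Kronecker product piles k^2 columns into only n^2 rows, while the Khatri-Rao product keeps just k columns and is generically full column rank.

```python
# Kronecker vs. Khatri-Rao in the overcomplete regime k > n.
import numpy as np

rng = np.random.default_rng(7)
n, k = 5, 8                                    # overcomplete: k > n
Phi = rng.random((n, k))

kron = np.kron(Phi, Phi)                                          # n^2 x k^2
khatri_rao = np.einsum('ir,jr->ijr', Phi, Phi).reshape(n * n, k)  # n^2 x k

print(kron.shape, np.linalg.matrix_rank(kron))              # (25, 64), rank <= 25 < 64
print(khatri_rao.shape, np.linalg.matrix_rank(khatri_rao))  # (25, 8), rank 8
```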
64 Some Intuitions Bag-of-words model: (Φ ⊗ Φ) E[(h ⊗ h)(h ⊗ h)^T] (Φ ⊗ Φ)^T. Persistent model: (Φ ⊙ Φ) E[hh^T] (Φ ⊙ Φ)^T. Effective topic-word matrix given fourth-order moments: Bag-of-words model: Kronecker product Φ ⊗ Φ, of size n^2 × k^2: not identifiable. Persistent model: Khatri-Rao product Φ ⊙ Φ, of size n^2 × k: identifiable. [Diagram compares the n × k topic-word matrix Φ with the two effective matrices.]
65 Outline 1 Introduction 2 Form of Moments 3 Matrix Case: Learning using Pairwise Moments Identifiability and Learning of Topic-Word Matrix Learning Latent Space Parameters of the Topic Model 4 Tensor Case: Learning From Higher Order Moments Overcomplete Representations 5 Conclusion
66 Conclusion Moment-based Estimation of Latent Variable Models Moments are easy to estimate. Low-order moments have good concentration properties
67 Conclusion Moment-based Estimation of Latent Variable Models Moments are easy to estimate. Low-order moments have good concentration properties. Tensor Decomposition Methods Moment tensors have tractable forms for many models, e.g. topic models, HMMs, Gaussian mixtures, ICA. Efficient CP tensor decomposition through power iterations. Efficient structured Tucker decomposition through ℓ1 optimization. Structured topic-word matrices: expansion conditions. Can be extended to overcomplete representations.
68 Conclusion [As above.] Practical Considerations for Tensor Methods Not covered in detail in this tutorial. Matrix algebra and iterative methods. Scalable: parallel implementation on GPUs.