Learning Tractable Graphical Models: Latent Trees and Tree Mixtures
1 Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Anima Anandkumar U.C. Irvine Joint work with Furong Huang, U.N. Niranjan, Daniel Hsu and Sham Kakade.
2 High-Dimensional Graphical Modeling
Modeling conditional independencies through graphs: X_u ⊥ X_v | X_S. Learning and inference are NP-hard in general.
[Figure: example graph with node group u (Red Sox, Yankees, Mets, Dodgers, Phillies) separated from group v (Manchester United, Chelsea, Arsenal) by S (Baseball).]
Tractable Models: Tree Models
Efficient inference using belief propagation.
5 Warm-up: Learning Tree Models
Data processing inequality for Markov chains: I(X_1; X_3) ≤ min{I(X_1; X_2), I(X_2; X_3)}.
Tree Structure Estimation (Chow and Liu, 1968)
MLE: the max-weight spanning tree with estimated mutual information edge weights. Pairwise statistics suffice.
With n samples and p nodes, the sample complexity satisfies (log p)/n = O(1).
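To make the Chow-Liu step concrete, here is a minimal sketch (not the speaker's code; the plug-in mutual information estimator, the binary alphabet, and all names are assumptions): estimate pairwise mutual information from samples, then take the maximum-weight spanning tree over those weights.

```python
# A minimal Chow-Liu sketch for binary data (illustrative names throughout).
import numpy as np

def empirical_mi(x, y, eps=1e-12):
    """Plug-in mutual information estimate for two binary variables in {0,1}."""
    joint = np.zeros((2, 2))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px, py = joint.sum(1), joint.sum(0)
    return sum(joint[i, j] * np.log(joint[i, j] / (px[i] * py[j] + eps))
               for i in range(2) for j in range(2) if joint[i, j] > 0)

def chow_liu_tree(samples):
    """samples: (n, p) array of {0,1}; returns the edge list of the MLE tree."""
    n, p = samples.shape
    w = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            w[i, j] = w[j, i] = empirical_mi(samples[:, i], samples[:, j])
    # Prim's algorithm for the maximum-weight spanning tree.
    in_tree, edges = {0}, []
    while len(in_tree) < p:
        i, j = max(((a, b) for a in in_tree for b in range(p) if b not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```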
7 Learning Tractable Graphical Models
Tractable Models: Tree Models
Efficient inference using belief propagation, and the MLE is easy to compute. But tree models are highly restrictive.
Latent tree graphical models: tree models with hidden variables, where the number and location of the hidden variables are unknown.
8 Application: Hierarchical Topic Modeling
Data: word co-occurrences. Graph: topic-word structure.
[Figure: learned latent tree over newsgroup vocabulary (baseball, hockey, space, medicine, religion, and computing terms) with hidden topic nodes h1-h18; one subtree groups sports words such as games, baseball, fans, nhl, hockey, league, season, players, team.]
10 Application of Latent Trees: Object Recognition
Challenge: succinct representation of large-scale data.
Input: 100 object categories, 4,000 training images. Goal: learn co-occurrence probabilities. Solution: latent tree graphical models.
[Figure: learned context tree over object labels such as Sky, Floor, Wall, Road, Car, People, Light, Crosswalk.]
"Context Models and Out-of-context Objects," M. J. Choi, A. Torralba, and A. S. Willsky, Pattern Recognition Letters, 2012.
14 Tree Mixture Models
Multiple graphs capture context-specific dependencies; each mixture component is a tree model. Unsupervised learning: the class variable H is latent (hidden).
Why use tree mixtures?
Efficient inference: run BP on the component trees and combine the results; marginalization and sampling are similarly efficient.
Learning: alternatives to EM (Meila & Jordan)?
In this talk: learning latent tree models and tree mixtures.
16 Summary of Results
Latent Tree Models
Number and location of hidden variables unknown. Integrated structure and parameter estimation. Local learning with global consistency guarantees.
Mixtures of Trees
Structure and parameters under different contexts unknown. Unsupervised setting: the choice variable is hidden. Efficient methods for consistent structure and parameter learning.
17 Previous Approaches
Algorithms for Structure Estimation
Chow and Liu (1968): tree estimation. Meinshausen and Bühlmann (2006): convex relaxation. Ravikumar, Wainwright, Lafferty (2010): convex relaxation. Bresler, Mossel and Sly (2009): bounded-degree graphs...
Learning with Hidden Variables
Erdős et al. (1999): latent trees. Daskalakis, Mossel and Roch (2006): latent trees. Choi, Tan, Anandkumar and Willsky (2010): latent trees. Chandrasekaran, Parrilo and Willsky (2011): latent Gaussian models. Anandkumar et al. (2012): tensor decompositions...
18 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion
19 Learning Latent Tree Graphical Models
Linear Multivariate Models
Conditional independence with respect to a tree. Categorical k-state hidden variables; multivariate d-dimensional observed variables, with k ≤ d. When y is a neighbor of h, E[y | h] = Ah.
Includes discrete, Poisson and Gaussian models, Gaussian mixtures, etc.
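A toy generative sketch of one observed node in this model class (the dimensions, noise level, and names are illustrative, not from the talk): the hidden state h is a one-hot k-state variable, and a neighboring observed variable y has conditional mean Ah.

```python
# Toy sampler for the linear multivariate model E[y | h] = A h.
import numpy as np

rng = np.random.default_rng(1)
k, d, n = 3, 10, 5000              # k-state hidden, d-dim observed, k <= d
A = rng.standard_normal((d, k))    # transition matrix, full column rank w.h.p.
lam = np.array([0.5, 0.3, 0.2])    # mixing weights P[h = e_r]

h_idx = rng.choice(k, size=n, p=lam)
H = np.eye(k)[h_idx]               # one-hot hidden states, shape (n, k)
Y = H @ A.T + 0.1 * rng.standard_normal((n, d))   # y = A h + noise

# Sanity check: conditional means recover the columns of A.
print(np.allclose(Y[h_idx == 0].mean(0), A[:, 0], atol=0.05))
```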
26 Additive Tree Distances
Information distances [d_{i,j}] for tree models:
Gaussian scalar: d_{ij} := -log |ρ_{ij}|.
Linear multivariate models: d_{ij} := -log [ ∏_{σ_l > 0} σ_l(E[y_i y_j^T]) / sqrt(det E[y_i y_i^T] · det E[y_j y_j^T]) ].
[d_{i,j}] is an additive tree metric: d_{k,l} = Σ_{(i,j) ∈ Path(k,l;E)} d_{i,j}.
Learn the latent tree using the estimated distances [d̂_{i,j}].
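A sketch of estimating the multivariate information distance from samples (the estimator details and tolerance are assumptions; the formula follows the slide's definition):

```python
# Empirical multivariate information distance between two observed nodes.
import numpy as np

def info_distance(Yi, Yj, tol=1e-8):
    """Yi, Yj: (n, d) samples; returns the estimate of
    d_ij = -log [ prod of positive singular values of E[y_i y_j^T]
                  / sqrt(det E[y_i y_i^T] * det E[y_j y_j^T]) ]."""
    n = Yi.shape[0]
    Sij = Yi.T @ Yj / n                    # cross second moment E[y_i y_j^T]
    Sii = Yi.T @ Yi / n
    Sjj = Yj.T @ Yj / n
    sv = np.linalg.svd(Sij, compute_uv=False)
    log_num = np.sum(np.log(sv[sv > tol]))             # sum of log singular values
    log_den = 0.5 * (np.linalg.slogdet(Sii)[1] + np.linalg.slogdet(Sjj)[1])
    return -(log_num - log_den)
```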
32 Siblings Test Based on Information Distances
Exact statistics (distances [d_{i,j}]): let Φ_{ijk} := d_{i,k} - d_{j,k}.
If i and j are leaves with a common parent: -d_{i,j} < Φ_{ijk} = Φ_{ijk'} < d_{i,j} for all k, k' ≠ i, j.
If i is a leaf and j is its parent: Φ_{ijk} = d_{i,j} for all k ≠ i, j.
Sample statistics (ML estimates [d̂_{i,j}]): use only short distances (d_{i,k}, d_{j,k} < τ) and relax the equality relationships.
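A sketch of the relaxed sibling test on estimated distances (the thresholds τ and ε, and all function names, are illustrative):

```python
# Relaxed sibling test: with exact distances, leaves i and j are siblings
# iff Phi_ijk = d_ik - d_jk is constant in k and lies inside (-d_ij, d_ij).
# With estimates we only use "short" witnesses and allow slack eps.
import numpy as np

def are_siblings(D, i, j, tau, eps):
    """D: (p, p) matrix of estimated information distances."""
    p = D.shape[0]
    witnesses = [k for k in range(p)
                 if k not in (i, j) and D[i, k] < tau and D[j, k] < tau]
    if not witnesses:
        return False                               # no reliable witness in range
    phi = np.array([D[i, k] - D[j, k] for k in witnesses])
    constant = phi.max() - phi.min() < eps         # relaxed equality across k
    interior = np.all(np.abs(phi) < D[i, j] - eps)  # rules out parent/child
    return constant and interior
```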
38 Recursive Grouping
Recursive Grouping Algorithm (Choi, Tan, A., Willsky)
Apply the sibling test and remove leaves; build the tree from the bottom up.
Consistent structure estimation, but a serial method with high computational complexity.
39 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion
40 Overview of Proposed Parameter Learning Method
Toy model: 3-star, linear multivariate model. Let A^r_{x_i|h} := E(x_i | h = e_r) and λ_r := P[h = e_r]. Then
E(x_1 ⊗ x_2 ⊗ x_3) = Σ_{r=1}^k λ_r A^r_{x_1|h} ⊗ A^r_{x_2|h} ⊗ A^r_{x_3|h}.
Guaranteed Recovery through Tensor Decomposition
The transition matrices A_{x_i|h} have full column rank. Only linear-algebraic operations are needed: SVD and tensor power iterations.
"Tensor Decompositions for Learning Latent Variable Models" by A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade and M. Telgarsky. Preprint, October 2012.
41 Overview of Tensor Decomposition Technique
Let a_r := E(x_i | h = e_r) for all i, and λ_r := P[h = e_r]. Then
M_3 := E[x_1 ⊗ x_2 ⊗ x_3] = Σ_{i=1}^k λ_i a_i^{⊗3}.
Intuition: if the a_i are orthogonal, then M_3(I, a_1, a_1) = Σ_i λ_i ⟨a_i, a_1⟩² a_i = λ_1 a_1, so the a_i are eigenvectors of the tensor M_3.
Convert to an orthogonal tensor using the pairwise moments M_2 := E[x_1 ⊗ x_2] = Σ_i λ_i a_i^{⊗2}: take a whitening matrix W with W^T M_2 W = I; then M_3(W, W, W) = Σ_i λ_i (W^T a_i)^{⊗3} is an orthogonal tensor.
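A self-contained numerical sketch of this whitening-plus-power-iteration recipe on synthetic exact moments (all names are illustrative; a real pipeline would use empirical moments and deflation to recover all components):

```python
# Whitening + tensor power iteration on exact synthetic moments.
import numpy as np

k, d = 3, 8
rng = np.random.default_rng(0)
A = rng.standard_normal((d, k))              # conditional-mean columns a_r
lam = np.ones(k) / k                         # mixing weights P[h = e_r]

M2 = (A * lam) @ A.T                         # sum_r lam_r a_r a_r^T
M3 = np.einsum('r,ir,jr,kr->ijk', lam, A, A, A)  # sum_r lam_r a_r^{(x)3}

# Whitening: W with W^T M2 W = I, from the rank-k eigendecomposition of M2.
vals, vecs = np.linalg.eigh(M2)
vals, vecs = vals[-k:], vecs[:, -k:]         # top-k eigenpairs
W = vecs / np.sqrt(vals)                     # shape (d, k)

T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)  # orthogonal k x k x k tensor

# Tensor power iteration: theta <- T(I, theta, theta), normalized.
theta = rng.standard_normal(k)
theta /= np.linalg.norm(theta)
for _ in range(100):
    theta = np.einsum('ijk,j,k->i', T, theta, theta)
    theta /= np.linalg.norm(theta)

# theta approximates a whitened component W^T a_r (up to scale);
# un-whiten with the pseudo-inverse of W^T to recover a_r's direction.
a_hat = np.linalg.pinv(W.T) @ theta
```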
45 Parameter Learning in Latent Trees
Learning through Hierarchical Tensor Decomposition
Assume a known tree structure; decompose different triplets, where the hidden variable is the join point of the triplet on the tree.
Alignment issue: tensor decomposition is an unsupervised method, so hidden-state labels are permuted across different triplets. Solution: align using a common node shared by the triplets.
46 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion
48 Integrated Learning
So far: consistent structure learning through sibling tests on distances; parameter learning through tensor decomposition on triplets.
Challenges: How to integrate structure and parameter learning? Can we save on computation through integration? Can we learn parameters as we learn the structure? Can we parallelize learning for scalability?
Key ideas: Divide and conquer: find (overlapping) groups of observed variables. Learn local subtrees (and parameters) over the groups independently. Merge subtrees and tweak parameters to obtain the global latent tree model.
49 Parallel Chow-Liu Based Grouping Algorithm
Minimum spanning tree over the observed nodes using the information distances [d̂_{i,j}].
[Figure: a latent tree over observed nodes v1-v6 and hidden nodes h1-h3, its Chow-Liu (minimum spanning) tree over the observed nodes, and the local subtrees recovered within each group.]
50 Alignment of Parameters
Alignment correction: in-group, across-group, and across-neighborhood.
51 Consistency Guarantees
Theorem: The proposed method consistently recovers the structure with O(log p) samples and the parameters with poly(p) samples.
Extent of parallelism: the group size satisfies Γ ≤ Δ^{1 + (u/l)δ}, where the effective depth is δ := max_i min_j path(v_i, v_j; T), Δ is the maximum degree in the latent tree, and u and l are upper and lower bounds on the distances between neighbors in the latent tree.
Implications: for a homogeneous HMM, constant-sized groups; the worst case is star graphs.
52 Computational Complexity
N samples; d-dimensional observed variables; k-state hidden variables; p observed variables; z non-zero entries per sample; groups of size Γ.

Algorithm step                        | Time per worker   | Degree of parallelism
Information distance estimation       | O(Nz + d + k^3)   | O(p^2)
Structure: minimum spanning tree      | O(log p)          | O(p^2)
Structure: local recursive grouping   | O(Γ^3)            | O(p/Γ)
Parameter: tensor decomposition       | O(Γk^3 + Γdk^2)   | O(p/Γ)
Merging and alignment correction      | O(dk^2)           | O(p/Γ)

"Integrated Structure and Parameter Learning in Latent Tree Graphical Models" by F. Huang, U. N. Niranjan, A. Anandkumar. Preprint, June 2014.
53 Proof Ideas
Relating the Chow-Liu tree to the latent tree: the surrogate of node i is the observed node with the strongest correlation, Sg(i) := argmin_{j ∈ V} d_{i,j}.
Neighborhood preservation: (i,j) ∈ T implies (Sg(i), Sg(j)) ∈ T_ML.
Chow-Liu grouping reverses edge contractions; proof by induction.
54 Experiments
Setup: d = k = 2 dimensions, p = 9 variables.
[Figure: structure error vs. number of samples for CLRG-EM and the proposed method; the accompanying table reports structure error, parameter error, and running time for varying d, p, and N.]
55 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion
62 Mixtures of Graphical Models: Our Approach
Consider data from a graphical model mixture; output a tree mixture, the best tree approximation of each component.
Steps: estimation of the union graph G∪; estimation of pairwise moments in each component; tree approximation of each component via the Chow-Liu algorithm.
Efficient learning of tree mixture approximations.
70 Learning Graphical Model Mixtures
Can the tensor decomposition method be adapted to graphical model mixtures?
Challenges: cannot reduce to triplet tensor decomposition; need to learn higher-order moments of the mixture components.
Solutions: learn the union graph G∪ from a rank test; use separators S on the union graph, exploiting X_u ⊥ X_v | X_S, H; learn a tree mixture approximation by estimating the pairwise mixture moments.
Efficient estimation of tree mixture approximations.
76 Sparse Graphical Model Selection: Intuitions
First consider a graphical model with no latent variables.
Markov property of graphical models: X_u ⊥ X_v | X_S ⟺ I(X_u; X_v | X_S) = 0.
Alternative test for conditional independence: X_u ⊥ X_v | X_S means P(X_u = i, X_v = j | X_S = k) = P(X_u = i | X_S = k) · P(X_v = j | X_S = k).
Define M_{u,v,{S;k}} := [P(X_u = i, X_v = j, X_S = k)]_{i,j}. Then X_u ⊥ X_v | X_S ⟺ Rank(M_{u,v,{S;k}}) = 1 for every k.
A rank test on pairwise probability matrices.
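In the exact-statistics regime, the rank test is a few lines (a sketch; the joint probability table is assumed given):

```python
# Rank test for conditional independence on an exact joint probability table.
import numpy as np

def is_cond_independent(P_uvS, tol=1e-10):
    """P_uvS[i, j, k] = P(X_u = i, X_v = j, X_S = k).
    X_u _|_ X_v | X_S iff every slice M_{u,v,{S;k}} has rank one."""
    for k in range(P_uvS.shape[2]):
        if np.linalg.matrix_rank(P_uvS[:, :, k], tol=tol) > 1:
            return False
    return True
```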
85 Extending Rank Tests to Mixtures
The latent variable H has dimension r, and each observed variable has dimension d > r. First assume the Markov graph is the same for all components, so that X_u ⊥ X_v | X_S, H. Then
P(X_u, X_v | X_S) = Σ_{h=1}^r P(X_u | X_S, H = h) · P(H = h | X_S) · P(X_v | X_S, H = h),
so the d × d matrix M_{u,v,{S;k}} := [P(X_u = i, X_v = j, X_S = k)]_{i,j} factors as a product of d × r, r × r, and r × d matrices: Rank(M_{u,v,{S;k}}) ≤ r (and generically = r).
89 Rank Test for Mixtures
Dim(H) = r and each observed variable has dimension d > r. Let G∪ be the union of the Markov graphs of the components, and let η bound the size of separators between node pairs in G∪. With M_{u,v,{S;k}} := [P(X_u = i, X_v = j, X_S = k)]_{i,j}, declare (u,v) an edge if
min_{S ⊆ V\{u,v}, |S| ≤ η} max_k Rank(M_{u,v,{S;k}}; ξ_{n,p}) > r.
Small η makes the test computationally efficient and uses only low-order statistics.
Examples of graphs G∪ with small η: a mixture of product distributions (G∪ trivial, η = 0); a mixture on the same tree (G∪ a tree, η = 1); a mixture on arbitrary trees (G∪ a union of r trees, η = r).
A simple test for estimating the union graph of mixtures.
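A hedged sketch of this edge rule (the `moments` helper and all names are assumptions, not the paper's code): the effective rank is read off the singular values against the threshold ξ, and (u, v) is declared an edge only if no separator of size at most η drives the rank down to r or below.

```python
# Edge rule for the mixture rank test.
import numpy as np
from itertools import combinations

def effective_rank(M, xi):
    """Number of singular values above the threshold xi (plays xi_{n,p})."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > xi))

def declare_edge(u, v, nodes, eta, r, xi, moments):
    """moments(u, v, S) -> list of matrices [P(X_u=i, X_v=j, X_S=k)]_ij, one
    per conditioning value k of S (assumed supplied from empirical counts)."""
    others = [w for w in nodes if w not in (u, v)]
    for size in range(eta + 1):              # all candidate separators, |S| <= eta
        for S in combinations(others, size):
            if max(effective_rank(M, xi) for M in moments(u, v, S)) <= r:
                return False                 # S separates u, v in the union graph
    return True                              # no small separator: declare an edge
```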
92 Guarantees on Rank Test
Theorem (A., Hsu, Huang, Kakade '12): The rank test recovers the graph structure G∪ correctly w.h.p. on p nodes from n samples when (log p)/(ρ_min² n) = O(1), where ρ_min is the minimum (r+1)-th singular value between neighbors.
An efficient test with low sample and computational requirements.
95 Guarantees for Learning Graphical Model Mixtures
Steps in the tree mixture approximation: rank tests for structure estimation of the union graph G∪; tensor decomposition for estimation of the pairwise moments of the mixture components; the Chow-Liu algorithm to estimate the mixture component trees. A computationally efficient algorithm for learning graphical model mixtures.
Theorem (A., Hsu, Huang, Kakade '12): The above method recovers the correct tree mixture approximation w.h.p. on p nodes of an r-component mixture from n samples when n = poly(p, r).
Efficient learning of multiple graphs and models in high dimensions.
"Learning High-Dimensional Mixtures of Graphical Models" by A. Anandkumar, D. Hsu, F. Huang, and S.M. Kakade. NIPS 2012.
96 Splice Data
DNA splice junctions: 60 variables (a sequence of DNA bases) plus a class variable for splice junction type (EI, IE, none).
[Figure: comparison of the EM algorithm, the proposed algorithm, and the proposed algorithm followed by EM as the number of components varies: mixing-weight estimation error (r = 3), conditional log-likelihood, and classification error.]
97 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion
98 Summary and Outlook
Learning latent tree models: integrated structure and parameter learning; a high level of parallelism without losing consistency.
Learning graphical model mixtures: tree mixture approximations; combinatorial search combined with spectral decomposition; computational and sample guarantees.