Learning Tractable Graphical Models: Latent Trees and Tree Mixtures


1 Learning Tractable Graphical Models: Latent Trees and Tree Mixtures Anima Anandkumar U.C. Irvine Joint work with Furong Huang, U.N. Niranjan, Daniel Hsu and Sham Kakade.

2 High-Dimensional Graphical Modeling Modeling conditional independencies through graphs: X_u ⊥ X_v | X_S. Learning and inference are NP-hard. [Figure: example graph in which baseball-team variables (Red Sox, Yankees, Mets, Dodgers, Phillies) and soccer-team variables (Manchester United, Chelsea, Arsenal) are separated by the node S = Baseball.] Tractable Models: Tree Models. Efficient inference using belief propagation.

5 Walk-up: Learning Tree Models Data processing inequality for Markov chains: I(X_1; X_3) ≤ min{ I(X_1; X_2), I(X_2; X_3) }. Tree Structure Estimation (Chow and Liu, 1968) MLE: max-weight spanning tree with estimated mutual information weights. Pairwise statistics suffice. With n samples and p nodes, sample complexity: (log p)/n = O(1).
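Since pairwise statistics suffice, the Chow-Liu estimator can be sketched directly: estimate all pairwise mutual informations and return a maximum-weight spanning tree. A minimal Python/NumPy sketch for binary data follows; the function names and the plug-in estimator are illustrative choices, not from the talk.

import numpy as np
from itertools import combinations

def empirical_mutual_info(x, y):
    """Plug-in estimate of I(X;Y) for two binary sample vectors."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """Chow-Liu (1968): max-weight spanning tree with estimated MI edge weights.
    samples: (n, p) array of 0/1 observations; returns a list of tree edges."""
    p = samples.shape[1]
    weights = {(i, j): empirical_mutual_info(samples[:, i], samples[:, j])
               for i, j in combinations(range(p), 2)}
    parent = list(range(p))          # union-find forest for Kruskal's algorithm
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    edges = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                 # adding (i, j) keeps the graph acyclic
            parent[ri] = rj
            edges.append((i, j))
    return edges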

7 Learning Tractable Graphical Models Tractable Models: Tree Models. Efficient inference using belief propagation; the MLE is easy to compute. But tree models are highly restrictive. Latent tree graphical models: tree models with hidden variables, where the number and location of the hidden variables are unknown.

8 Application: Hierarchical Topic Modeling Data: word co-occurrences. Graph: topic-word structure. [Figure: latent tree learned over the word vocabulary, with hidden topic nodes h1 through h18 grouping related words such as baseball, hockey, games, players, team.]

10 Application of Latent Trees: Object Recognition Challenge: succinct representation of large-scale data. Input: 100 object categories, 4000 training images. Goal: learn co-occurrence probabilities. Solution: latent tree graphical models. [Figure: learned context model over object categories such as Sky, Floor, Wall, Road, Car, People, Light, Crosswalk.] Context Models and Out-of-context Objects, M. J. Choi, A. Torralba, and A. S. Willsky, Pattern Recognition Letters, 2012.

14 Tree Mixture Models Multiple graphs: context-specific dependencies. Each component is a tree model. Unsupervised learning: the class variable is latent or hidden. Why use tree mixtures? Efficient inference: run BP on the component trees and combine the results; marginalization and sampling are similarly efficient. Learning: alternatives to EM (Meila & Jordan)? In this talk: learning latent tree models and tree mixtures.

16 Summary of Results Latent Tree Models: the number and location of hidden variables are unknown; integrated structure and parameter estimation; local learning with global consistency guarantees. Mixtures of Trees: structure and parameters under different contexts are unknown; unsupervised setting with a hidden choice variable; efficient methods for consistent structure and parameter learning.

17 Previous Approaches Algorithms for Structure Estimation Chow and Liu (68): Tree estimation. Meinshausen and Bühlmann (06): Convex relaxation. Ravikumar, Wainwright, Lafferty (10): Convex relaxation. Bresler, Mossel and Sly (09): Bounded-degree graphs... Learning with Hidden Variables Erdős et al. (99): Latent trees. Daskalakis, Mossel and Roch (06): Latent trees. Choi, Tan, Anandkumar and Willsky (10): Latent trees. Chandrasekaran, Parrilo and Willsky (11): Latent Gaussian models. Anandkumar et al. (12): Tensor decompositions...

18 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion

19 Learning Latent Tree Graphical Models Linear Multivariate Models Conditional independence w.r.t. the tree. Categorical k-state hidden variables. Multivariate d-dimensional observed variables, k ≤ d. When y is a neighbor of h, E[y | h] = A h. Includes discrete, Poisson and Gaussian models, Gaussian mixtures, etc.
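As a concrete instance of this model class, here is a hedged sketch of sampling from a tiny latent tree: one k-state hidden node with three observed children, each one-hot encoded so that E[y | h = e_r] is exactly the r-th column of that child's transition matrix A. The dimensions and distributions are illustrative assumptions, not values from the talk.

import numpy as np

rng = np.random.default_rng(0)
k, d, n = 3, 5, 10_000

# One hidden k-state node h with three observed d-dimensional children (a 3-star tree).
pi = np.array([0.5, 0.3, 0.2])                                  # P[h = e_r]
A = [rng.dirichlet(np.ones(d), size=k).T for _ in range(3)]     # d x k transition matrices

h = rng.choice(k, size=n, p=pi)
# One-hot encode each categorical child, so that E[y | h] = A h holds exactly.
Y = [np.eye(d)[[rng.choice(d, p=A[c][:, hv]) for hv in h]] for c in range(3)]

# Sanity check: the empirical E[y_1 | h = e_1] should be close to the first column of A[0].
print(Y[0][h == 0].mean(axis=0))
print(A[0][:, 0])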

26 Additive Tree Distances Information distances [d_{i,j}] for tree models. Gaussian scalar: d_{ij} := -log |ρ_{ij}|. Linear multivariate models: d_{ij} := -log ( ∏_{σ_ℓ > 0} σ_ℓ(E[y_i y_j^⊤]) / sqrt( det E[y_i y_i^⊤] det E[y_j y_j^⊤] ) ). [d_{i,j}] is an additive tree metric: d_{k,l} = Σ_{(i,j) ∈ Path(k,l;E)} d_{i,j}. Goal: learn the latent tree using the estimates [d̂_{i,j}].
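For the Gaussian scalar case the information distance is simply d_ij = -log|rho_ij|, and additivity along tree paths can be checked directly on samples. A minimal NumPy sketch on a 3-node Gaussian Markov chain X1 - X2 - X3; the correlation coefficients 0.8 and 0.6 are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy Gaussian Markov chain X1 - X2 - X3 (a path graph, i.e. a tree).
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + np.sqrt(1 - 0.8 ** 2) * rng.standard_normal(n)
x3 = 0.6 * x2 + np.sqrt(1 - 0.6 ** 2) * rng.standard_normal(n)

def info_distance(a, b):
    """Gaussian-scalar information distance d_ab = -log |corr(a, b)|."""
    rho = np.corrcoef(a, b)[0, 1]
    return -np.log(abs(rho))

d12, d23, d13 = info_distance(x1, x2), info_distance(x2, x3), info_distance(x1, x3)

# Additive tree metric: d13 should match d12 + d23 along the path 1 - 2 - 3.
print(d13, d12 + d23)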

32 Siblings Test Based on Information Distances Exact statistics: distances [d_{i,j}]. Let Φ_{ijk} := d_{i,k} − d_{j,k}. Then −d_{i,j} < Φ_{ijk} = Φ_{ijk'} < d_{i,j} for all k, k' ≠ i, j when i, j are leaves with a common parent, and Φ_{ijk} = d_{i,j} for all k ≠ i, j when i is a leaf and j is its parent. Sample statistics: ML estimates [d̂_{i,j}]. Use only short distances, d_{i,k}, d_{j,k} < τ, and relax the equality relationships.
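The sibling test only needs the statistics Phi_ijk = d_ik - d_jk over witnesses k that lie within the distance threshold tau. A hedged NumPy sketch, assuming a precomputed symmetric matrix d_hat of estimated information distances; the tolerance eps used to relax the equality relationships is an illustrative knob, not a value from the talk.

import numpy as np

def sibling_relation(d_hat, i, j, tau, eps):
    """Classify the pair (i, j) from estimated information distances.
    Returns 'siblings' if i and j look like leaves with a common parent,
    'parent' if one node looks like the parent of the other (leaf),
    'neither' otherwise.  Only witnesses k with short distances (< tau)
    to both i and j are used, and equalities are relaxed to within eps."""
    p = d_hat.shape[0]
    witnesses = [k for k in range(p)
                 if k not in (i, j) and d_hat[i, k] < tau and d_hat[j, k] < tau]
    if not witnesses:
        return 'neither'
    phi = np.array([d_hat[i, k] - d_hat[j, k] for k in witnesses])
    if phi.max() - phi.min() > eps:            # Phi_ijk not (approximately) constant in k
        return 'neither'
    phi_bar = phi.mean()
    if abs(abs(phi_bar) - d_hat[i, j]) < eps:  # |Phi| = d_ij: one node is the other's parent
        return 'parent'
    if abs(phi_bar) < d_hat[i, j] - eps:       # -d_ij < Phi < d_ij: leaves with a common parent
        return 'siblings'
    return 'neither'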

38 Recursive Grouping Recursive Grouping Algorithm (Choi, Tan, A., Willsky) Run the sibling test and remove leaves; build the tree from the bottom up. Consistent structure estimation, but a serial method with high computational complexity.

39 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion

40 Overview of Proposed Parameter Learning Method Toy model: 3-star. Linear multivariate model. A^r_{x_i|h} := E(x_i | h = e_r), the r-th column of the transition matrix A_{x_i|h}, and λ_r := P[h = e_r]. E(x_1 ⊗ x_2 ⊗ x_3) = Σ_{r=1}^{k} λ_r A^r_{x_1|h} ⊗ A^r_{x_2|h} ⊗ A^r_{x_3|h}. Guaranteed Recovery through Tensor Decomposition The transition matrices A_{x_i|h} have full column rank. Linear algebraic operations: SVD and tensor power iterations. Tensor Decompositions for Learning Latent Variable Models by A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade and M. Telgarsky. Preprint, October 2012.

41 Overview of Tensor Decomposition Technique Let a_r = E(x_i | h = e_r) for all i and λ_r := P[h = e_r]. M_3 = E[x_1 ⊗ x_2 ⊗ x_3] = Σ_{i=1}^{k} λ_i a_i^{⊗3}. Intuition: if the a_i are orthogonal, M_3(I, a_1, a_1) := Σ_i λ_i ⟨a_i, a_1⟩² a_i = λ_1 a_1, so the a_i are eigenvectors of the tensor M_3. Convert to an orthogonal tensor using the pairwise moments M_2 := E[x_1 ⊗ x_2] = Σ_i λ_i a_i^{⊗2}. Whitening matrix W: W^⊤ M_2 W = I. The tensor M_3(W, W, W) := Σ_i λ_i (W^⊤ a_i)^{⊗3} is then an orthogonal tensor.
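The whitening plus tensor power iteration recipe can be sketched in a few lines of NumPy. The sketch below works on exact (noise-free) 3-star moments and assumes the columns a_r are linearly independent with k <= d; the fixed number of power iterations and the single random restart per component are simplifications.

import numpy as np

rng = np.random.default_rng(1)
d, k = 5, 3

# Ground-truth 3-star model: columns a_r = E(x_i | h = e_r), weights lam_r = P[h = e_r].
A = rng.standard_normal((d, k))
lam = np.array([0.5, 0.3, 0.2])

# Exact second- and third-order moments (no sampling noise in this sketch).
M2 = (A * lam) @ A.T
M3 = np.einsum('r,ir,jr,kr->ijk', lam, A, A, A)

# Whitening: W with W.T @ M2 @ W = I_k, built from the top-k eigenpairs of M2.
eigval, eigvec = np.linalg.eigh(M2)
top = np.argsort(eigval)[::-1][:k]
W = eigvec[:, top] / np.sqrt(eigval[top])

# Whitened (orthogonally decomposable) tensor T = M3(W, W, W).
T = np.einsum('abc,ai,bj,ck->ijk', M3, W, W, W)

# Tensor power iterations recover one robust eigenvector at a time; deflate and repeat.
a_hat, lam_hat = [], []
for _ in range(k):
    theta = rng.standard_normal(k)
    theta /= np.linalg.norm(theta)
    for _ in range(100):
        theta = np.einsum('ijk,j,k->i', T, theta, theta)
        theta /= np.linalg.norm(theta)
    mu = np.einsum('ijk,i,j,k->', T, theta, theta, theta)      # eigenvalue lam_r ** (-1/2)
    lam_hat.append(1.0 / mu ** 2)
    a_hat.append(mu * np.linalg.pinv(W.T) @ theta)             # un-whiten to recover a_r
    T = T - mu * np.einsum('i,j,k->ijk', theta, theta, theta)  # deflation

print(np.sort(lam_hat), np.sort(lam))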

45 Parameter Learning in Latent Trees Learning through Hierarchical Tensor Decomposition Assume a known tree structure. Decompose different triplets: the hidden variable is the join point of the triplet on the tree. Alignment issue: tensor decomposition is an unsupervised method, so the hidden labels are permuted across different triplets. Solution: align using a common node shared by the triplets.

46 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion

48 Integrated Learning So far: consistent structure learning through sibling tests on distances; parameter learning through tensor decomposition on triplets. Challenges How to integrate structure and parameter learning? Can we save on computation through integration? Can we learn parameters as we learn the structure? Can we parallelize learning for scalability? Key Ideas Divide and conquer: find (overlapping) groups of observed variables. Learn local subtrees (and parameters) over the groups independently. Merge the subtrees and tweak the parameters to obtain the global latent tree model.

49 Parallel Chow-Liu Based Grouping Algorithm Minimum spanning tree over the observed nodes using the information distances [d̂_{i,j}]. [Figure: MST over observed nodes v1 through v6, and the latent subtrees with hidden nodes h1 through h3 obtained by local grouping and merging.]
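The divide step can be sketched as: build a minimum spanning tree over the observed nodes from the estimated information distances, then hand each worker an overlapping group of nodes. Taking each group to be the closed neighborhood of an internal MST node is one natural choice consistent with the figure, but that particular grouping rule is an assumption of this sketch rather than a statement from the slide.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_groups(d_hat):
    """Divide step: MST over estimated information distances, then overlapping groups.
    d_hat: (p, p) symmetric matrix of estimated distances.  Returns (mst_edges, groups),
    where each group is the closed neighborhood of an internal MST node."""
    p = d_hat.shape[0]
    mst = minimum_spanning_tree(d_hat).toarray()
    edges = [(i, j) for i in range(p) for j in range(p) if mst[i, j] > 0]

    neighbors = {i: set() for i in range(p)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Internal nodes (degree > 1) define overlapping groups: the node plus its neighbors.
    groups = [sorted(neighbors[i] | {i}) for i in range(p) if len(neighbors[i]) > 1]
    return edges, groups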

50 Alignment of Parameters Alignment correction is performed at three levels: in-group, across-group, and across-neighborhood.

51 Theorem (Consistency Guarantees) The proposed method consistently recovers the structure with O(log p) samples and the parameters with poly(p) samples. Extent of parallelism: size of groups Γ ≤ Δ^{1 + (u/l)δ}. Effective depth: δ := max_i min_j |Path(v_i, v_j; T)|. Maximum degree in the latent tree: Δ. Upper and lower bounds on distances between neighbors in the latent tree: u and l. Implications: for a homogeneous HMM, constant-sized groups; worst case: star graphs.

52 Computational Complexity N samples, d-dimensional observed variables, k-state hidden variables, p observed variables, z non-zero entries per sample, groups of size Γ.
Algorithm step | Time per worker | Degree of parallelism
Information distance estimation | O(Nz + d + k^3) | O(p^2)
Structure: minimum spanning tree | O(log p) | O(p^2)
Structure: local recursive grouping | O(Γ^3) | O(p/Γ)
Parameter: tensor decomposition | O(Γ k^3 + Γ d k^2) | O(p/Γ)
Merging and alignment correction | O(d k^2) | O(p/Γ)
Integrated Structure and Parameter Learning in Latent Tree Graphical Models by F. Huang, U. N. Niranjan, A. Anandkumar. Preprint, June 2014.

53 Proof Ideas Relating the Chow-Liu tree with the latent tree. Surrogate Sg(i) for node i: the observed node with the strongest correlation, Sg(i) := argmin_{j ∈ V} d_{i,j}. Neighborhood preservation: (i,j) ∈ T implies (Sg(i), Sg(j)) ∈ T_{ML}. Chow-Liu grouping reverses edge contractions. Proof by induction.

54 Experiments d = k = 2 dimensions, p = 9 variables. [Figure: structure error versus number of samples for CLRG-EM and the proposed method.] [Table: structure error, parameter error, and running time for varying d, p, and N.]

55 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion

62 Mixtures of Graphical Models: Our Approach Consider data from a graphical model mixture with a hidden choice variable H and component graphs G_1, G_2. Output a tree mixture: the best tree approximation of each component. Steps: estimation of the union graph G∪; estimation of pairwise moments in each component; tree approximation of each component via the Chow-Liu algorithm. Efficient Learning of Tree Mixture Approximations.

70 Learning Graphical Model Mixtures Adapt the tensor decomposition method to graphical model mixtures? Challenges: cannot reduce to triplet tensor decomposition; need to learn higher-order moments of the mixture components. Solutions: G∪, the union graph, is learnt from a rank test. Use separators on the union graph G∪: X_u ⊥ X_v | X_S, H, where S is a separator in G∪ (e.g., S = {w, w'} in the figure). Learn a tree mixture approximation: estimate pairwise mixture moments. Efficient Estimation of Tree Mixture Approximations.

76 Sparse Graphical Model Selection: Intuitions First consider a graphical model with no latent variables. Markov property of graphical models: X_u ⊥ X_v | X_S ⟺ I(X_u; X_v | X_S) = 0. Alternative test for conditional independence: P(X_u = i, X_v = j | X_S = k) = P(X_u = i | X_S = k) P(X_v = j | X_S = k). Define M_{u,v,{s;k}} := [P(X_u = i, X_v = j, X_S = k)]_{i,j}. Then X_u ⊥ X_v | X_S ⟺ Rank(M_{u,v,{s;k}}) = 1. Rank Test on Pairwise Probability Matrices.
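The conditional independence check therefore reduces to a rank test on the matrix M_{u,v,{s;k}} of joint probabilities. A hedged NumPy sketch on empirical counts; the singular value threshold xi is an illustrative tuning parameter standing in for the slide's ξ_{n,p}.

import numpy as np

def effective_rank(M, xi):
    """Number of singular values of M above the threshold xi."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > xi))

def cond_indep_rank_test(samples, u, v, s, xi=1e-2):
    """Test X_u _||_ X_v | X_s by checking Rank(M_{u,v,{s;k}}) = 1 for every value k of X_s.
    samples: (n, p) array of non-negative integer labels; u, v, s: column indices."""
    n = samples.shape[0]
    du, dv = samples[:, u].max() + 1, samples[:, v].max() + 1
    for k in np.unique(samples[:, s]):
        rows = samples[samples[:, s] == k]
        M = np.zeros((du, dv))
        for xu, xv in zip(rows[:, u], rows[:, v]):
            M[xu, xv] += 1.0 / n              # empirical P(X_u = i, X_v = j, X_s = k)
        if effective_rank(M, xi) > 1:
            return False                      # some conditioning value rejects rank one
    return True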

85 Extending Rank Tests to Mixtures The dimension of the latent H is r and each observed variable is d > r. First assume the Markov graph is the same for all components. Then X_u ⊥ X_v | X_S, H, and P(X_u, X_v | X_S) = Σ_{h=1}^{r} P(X_u | X_S, H = h) P(H = h | X_S) P(X_v | X_S, H = h). M_{u,v,{s;k}} := [P(X_u = i, X_v = j, X_S = k)]_{i,j} is a d × d matrix that factors as (d × r)(r × r)(r × d), so Rank(M_{u,v,{s;k}}) is at most r.

89 Rank Test for Mixtures Dim(H) is r and each observed variable is d > r. G∪: union of the Markov graphs of the components. η: bound on the size of separators between node pairs in G∪. M_{u,v,{s;k}} := [P(X_u = i, X_v = j, X_S = k)]_{i,j}. Declare (u, v) as an edge if min_{S ⊆ V \ {u,v}, |S| ≤ η} max_k Rank(M_{u,v,{s;k}}; ξ_{n,p}) > r. Small η: computationally efficient, uses only low-order statistics. Examples of graphs G∪ with small η: mixture of product distributions: G∪ is trivial and η = 0; mixture on the same tree: G∪ is a tree and η = 1; mixture on arbitrary trees: G∪ is a union of r trees and η = r. Simple Test for Estimation of the Union Graph of Mixtures.
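For mixtures the same test is run with threshold r instead of 1, minimizing over candidate separators of size at most eta. A self-contained sketch of the displayed decision rule; the cutoff xi again stands in for ξ_{n,p}, and enumerating all separators is only practical for small eta.

import numpy as np
from itertools import combinations

def declare_edge(samples, u, v, eta, r, xi=1e-2):
    """Declare (u, v) an edge of the union graph if no separator of size <= eta
    drops max_k Rank(M_{u,v,{S;k}}; xi) to at most r."""
    n, p = samples.shape
    others = [w for w in range(p) if w not in (u, v)]
    candidate_seps = [()] + [S for size in range(1, eta + 1)
                             for S in combinations(others, size)]
    du, dv = samples[:, u].max() + 1, samples[:, v].max() + 1

    for S in candidate_seps:
        cols = list(S)
        sub = samples[:, cols]                                # conditioning columns X_S
        configs = np.unique(sub, axis=0) if cols else np.zeros((1, 0), dtype=int)
        max_rank = 0
        for kconf in configs:
            mask = np.all(sub == kconf, axis=1) if cols else np.ones(n, dtype=bool)
            M = np.zeros((du, dv))
            for xu, xv in zip(samples[mask, u], samples[mask, v]):
                M[xu, xv] += 1.0 / n                          # empirical P(X_u=i, X_v=j, X_S=k)
            svals = np.linalg.svd(M, compute_uv=False)
            max_rank = max(max_rank, int(np.sum(svals > xi)))
        if max_rank <= r:
            return False   # this separator makes every conditional matrix rank at most r
    return True            # min over separators of the max rank exceeds r: keep the edge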

92 Guarantees on Rank Test Theorem (A., Hsu, Huang, Kakade 12) The rank test recovers the graph structure G∪ correctly w.h.p. on p nodes with n samples when (log p) / (ρ²_min n) = O(1). ρ_min: the minimum (r+1)-th singular value between neighbors. Efficient test with low sample and computational requirements. Recall the examples of graphs G∪ with small η: mixture of product distributions: G∪ is trivial and η = 0; mixture on the same tree: G∪ is a tree and η = 1; mixture on arbitrary trees: G∪ is a union of r trees and η = r.

95 Guarantees for Learning Graphical Model Mixtures Steps involved in the tree mixture approximation: rank tests for structure estimation of the union graph G∪; tensor decomposition for estimation of pairwise moments of the mixture components; Chow-Liu algorithm to estimate the mixture component trees. A computationally efficient algorithm for learning graphical model mixtures. Theorem (A., Hsu, Huang, Kakade 12) The above method recovers the correct tree mixture approximation w.h.p. on p nodes of an r-component mixture with n samples when n = poly(p, r). Efficient Learning of Multiple Graphs and Models in High Dimensions. Learning High-Dimensional Mixtures of Graphical Models by A. Anandkumar, D. Hsu, F. Huang, and S.M. Kakade. NIPS 2012.

96 Splice Data DNA SPLICE-junctions dataset: 60 variables (a sequence of DNA bases) plus a class variable for the splice junction type (EI, IE, none). [Figures: mixing weight estimation error (r = 3), conditional log-likelihood, and classification error versus the number of components, comparing EM Algorithm, Proposed Algorithm, and Proposed EM.]

97 Outline 1 Introduction 2 Tests for Structure Learning 3 Parameter Learning through Tensor Methods 4 Integrating Structure and Parameter Learning 5 Mixtures of Trees 6 Conclusion

98 Summary and Outlook Learning Latent Tree Models: integrated structure and parameter learning; a high level of parallelism without losing consistency. Learning Graphical Model Mixtures: tree mixture approximations; combinatorial search plus spectral decomposition; computational and sample guarantees.


Efficient Spectral Methods for Learning Mixture Models! Efficient Spectral Methods for Learning Mixture Models Qingqing Huang 2016 February Laboratory for Information & Decision Systems Based on joint works with Munther Dahleh, Rong Ge, Sham Kakade, Greg Valiant.

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics. EECS 281A / STAT 241A Statistical Learning Theory UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics EECS 281A / STAT 241A Statistical Learning Theory Solutions to Problem Set 2 Fall 2011 Issued: Wednesday,

More information

Mathematical Formulation of Our Example

Mathematical Formulation of Our Example Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Semi-Supervised Learning

Semi-Supervised Learning Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

Mixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes

Mixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes Mixtures of Gaussians with Sparse Regression Matrices Constantinos Boulis, Jeffrey Bilmes {boulis,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 98195-2500 UW Electrical Engineering

More information

Hidden Markov models

Hidden Markov models Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

More information

A brief introduction to Conditional Random Fields

A brief introduction to Conditional Random Fields A brief introduction to Conditional Random Fields Mark Johnson Macquarie University April, 2005, updated October 2010 1 Talk outline Graphical models Maximum likelihood and maximum conditional likelihood

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Recent Advances in Ranking: Adversarial Respondents and Lower Bounds on the Bayes Risk

Recent Advances in Ranking: Adversarial Respondents and Lower Bounds on the Bayes Risk Recent Advances in Ranking: Adversarial Respondents and Lower Bounds on the Bayes Risk Vincent Y. F. Tan Department of Electrical and Computer Engineering, Department of Mathematics, National University

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

17 Variational Inference

17 Variational Inference Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 17 Variational Inference Prompted by loopy graphs for which exact

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Estimating Covariance Using Factorial Hidden Markov Models

Estimating Covariance Using Factorial Hidden Markov Models Estimating Covariance Using Factorial Hidden Markov Models João Sedoc 1,2 with: Jordan Rodu 3, Lyle Ungar 1, Dean Foster 1 and Jean Gallier 1 1 University of Pennsylvania Philadelphia, PA joao@cis.upenn.edu

More information