Supplemental for Spectral Algorithm For Latent Tree Graphical Models

Ankur P. Parikh, Le Song, Eric P. Xing

The supplemental contains three main things.

1. The first is network plots of the latent variable tree learned by [1] for the stock market data, and of the Chow-Liu tree, to give a more intuitive explanation of why latent variable trees can lead to better performance.
2. The second is a more detailed presentation of the tensor representation in which internal nodes are allowed to be evidence variables.
3. The third is the proof of Theorem 1.

1 Latent Tree Structure for Stock Data

The latent tree structure learned by the algorithm of [1] is shown in Figure 1. The blue nodes are hidden and the red nodes are observed. Note how integrating out some of these hidden nodes could lead to very large cliques; thus it is not surprising that both our spectral method and EM perform better than Chow-Liu. The Chow-Liu tree is shown in Figure 2. Note how it is forced to pick some of the observed variables as hubs even when latent variables may be more natural.

Figure 1: Latent variable tree learned by [1]. The hidden variables are in blue while the observed variables are in red. As one can see, the hidden variables can model significantly more complex relationships among the observed variables.

Figure 2: Tree learned by the Chow-Liu algorithm over only the observed variables. Note how it is forced to pick some of the observed variables as hubs even when latent variables may be more natural.

2 More Detailed Information about the Tensor Representation for LTMs

The computation of the marginal distribution of the observed variables can be expressed in terms of tensor multiplications. Basically, the information contained in each tensor corresponds to the information in a conditional probability table (CPT) of the model, and the tensor multiplications implement the summations. However, there are multiple ways of rewriting the marginal distribution of the observed variables in tensor notation, and not all of them provide intuition or an easy derivation of a spectral algorithm. In this section, we derive a specific representation of latent tree models which requires only tensors up to 3rd order and provides a basis for deriving a spectral algorithm.

More specifically, we first select a latent or observed variable as the root node and sort the nodes in the tree in topological order. Then we associate the root node $X_r$ with a vector $r$ related to the marginal probability of $X_r$. Depending on whether the root node is latent or observed, its entries are defined as

  $r(k) = P[X_r = k]$ if $X_r$ is latent, and $r(k) = \delta_{k x_r}\, P[x_r]$ if $X_r$ is observed,

where $\delta_{k x_r}$ is an indicator variable defined as $\delta_{k x_r} = 1$ if $k = x_r$ and 0 otherwise. Effectively, $\delta_{k x_r}$ sets all entries of $r$ to zero except the one corresponding to $P[x_r]$.

Next, we associate each internal node $X_i$ with a 3rd order tensor $T_i$ related to the conditional probability table between $X_i$ and its parent $X_{\pi_i}$. This tensor is diagonal in its 2nd and 3rd modes, and hence its nonzero entries can be accessed by two indices $k$ and $l$. Depending on whether the internal node and its parent are latent or observed variables, the nonzero entries of $T_i$ are defined as follows.

  $T_i(k, l, l) = P[X_i = k \mid X_{\pi_i} = l]$ if $X_i$ latent, $X_{\pi_i}$ latent;
  $T_i(k, l, l) = \delta_{l x_{\pi_i}}\, P[X_i = k \mid x_{\pi_i}]$ if $X_i$ latent, $X_{\pi_i}$ observed;
  $T_i(k, l, l) = \delta_{k x_i}\, P[x_i \mid X_{\pi_i} = l]$ if $X_i$ observed, $X_{\pi_i}$ latent;
  $T_i(k, l, l) = \delta_{k x_i}\, \delta_{l x_{\pi_i}}\, P[x_i \mid x_{\pi_i}]$ if $X_i$ observed, $X_{\pi_i}$ observed;

where $\delta_{k x_i}$ and $\delta_{l x_{\pi_i}}$ are also indicator variables. Effectively, the indicator variables zero out further entries in $T_i$ for those values that are not equal to the actual observation.

Last, we associate each leaf node $x_i$, which is always observed, with a diagonal matrix $M_i$ related to the likelihood of $x_i$. Depending on whether the parent of $x_i$ is latent or observed, the diagonal entries of $M_i$ are defined as

  $M_i(k, k) = P[x_i \mid X_{\pi_i} = k]$ if $X_{\pi_i}$ latent, and $M_i(k, k) = \delta_{k x_{\pi_i}}\, P[x_i \mid x_{\pi_i}]$ if $X_{\pi_i}$ observed.

Let the $M_i$ defined above be the messages passed from the leaf nodes to their parents. We can show that the marginal probability of the observed variables can be computed recursively using a message passing algorithm: each node in the tree sends a message to its parent according to the reverse topological order of the nodes, and the final messages are aggregated at the root to give the desired quantity. More formally, the outgoing message from an internal node $X_i$ to its parent can be computed as

  $M_i = T_i \times_1 \big( M_{j_1} M_{j_2} \cdots M_{j_J} \mathbf{1}_i \big)$,   (1)

where $j_1, j_2, \ldots, j_J \in \chi_i$ range over all children of $X_i$ ($J = |\chi_i|$). Here $\mathbf{1}_i$ is a vector of all ones of suitable size, and it is used to reduce the incoming messages (all of which are diagonal matrices) to a single vector. The computation in (1) essentially implements the message update we often see in an ordinary message passing algorithm [5], i.e.,

  $m_i[x_{\pi_i}] = \sum_{x_i} P[x_i \mid x_{\pi_i}]\, m_{j_1}[x_i] \cdots m_{j_J}[x_i]$,   (2)

where $m_j[x_i]$ represents an incoming message to $X_i$, i.e., the intermediate result of the marginalization operation obtained by summing out all latent variables in the subtree $T_j$. The product $M_{j_1} M_{j_2} \cdots M_{j_J} \mathbf{1}_i$ corresponds to aggregating all incoming messages $m_{j_1}[x_i] \cdots m_{j_J}[x_i]$, and the $T_i \times_1$ corresponds to $\sum_{x_i} P[x_i \mid x_{\pi_i}]$.

At the root node, all incoming messages are combined to produce the final joint probability, i.e.,

  $P[x_1, \ldots, x_O] = r^\top M_{j_1} M_{j_2} \cdots M_{j_J} \mathbf{1}_r$.   (3)

Here $r^\top$ basically implements the operation $\sum_{x_r} P[x_r](\cdot)$, which sums out the root variable.
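To make the recursion in (1)-(3) concrete, here is a minimal NumPy sketch (not the authors' code; the tree, the sizes SH and SO, and all CPTs are made-up assumptions) for a toy tree with a hidden root H0, one hidden internal node H1, and three observed leaves X1, X2, X3. It checks the message-passing value against brute-force summation over the hidden variables.

    # Toy latent tree:  H0 (hidden root) -> H1 (hidden), X3 (observed leaf)
    #                    H1 -> X1, X2 (observed leaves)
    # All names and sizes below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    SH, SO = 3, 4                            # hidden / observed state space sizes

    def random_cpt(rows, cols):
        """Column-stochastic CPT: entry (k, l) = P[child = k | parent = l]."""
        M = rng.random((rows, cols))
        return M / M.sum(axis=0, keepdims=True)

    pi = rng.random(SH); pi /= pi.sum()      # root vector r(k) = P[H0 = k]
    A = random_cpt(SH, SH)                   # tensor T_1(k, l, l) = P[H1 = k | H0 = l]
    O1, O2, O3 = (random_cpt(SO, SH) for _ in range(3))   # leaf CPTs

    def leaf_message(O, x):
        """Diagonal leaf message M_i(k, k) = P[x_i | parent = k]."""
        return np.diag(O[x, :])

    def marginal_message_passing(x1, x2, x3):
        ones = np.ones(SH)
        # Eq. (1): M_1 = T_1 x_1 (M_{X1} M_{X2} 1); for a CPT tensor this is A^T v.
        v = leaf_message(O1, x1) @ leaf_message(O2, x2) @ ones
        M1 = np.diag(A.T @ v)
        # Eq. (3): P[x1, x2, x3] = r^T M_1 M_{X3} 1
        return pi @ (M1 @ leaf_message(O3, x3) @ ones)

    def marginal_brute_force(x1, x2, x3):
        # Sum out H0 and H1 explicitly.
        return sum(pi[h0] * A[h1, h0] * O1[x1, h1] * O2[x2, h1] * O3[x3, h0]
                   for h0 in range(SH) for h1 in range(SH))

    x = (1, 0, 2)
    print(marginal_message_passing(*x), marginal_brute_force(*x))   # should agree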

3 Notation for Proof of Theorem 1

We now proceed to prove Theorem 1. $\|\cdot\|_2$ refers to the spectral norm for matrices and tensors, but to the ordinary Euclidean norm for vectors. $\|\cdot\|_1$ refers to the induced 1-norm for matrices and tensors (max column sum), but to the ordinary $\ell_1$ norm for vectors. $\|\cdot\|_F$ refers to the Frobenius norm. The tensor spectral norm for 3rd order tensors is defined as in [4]:

  $\|T\|_2 = \sup_{\|v_1\|_2 \le 1,\, \|v_2\|_2 \le 1,\, \|v_3\|_2 \le 1} \big| T \times_1 v_1 \times_2 v_2 \times_3 v_3 \big|$.   (4)

We will define the induced 1-norm of a tensor as

  $\|T\|_{1,1} = \sup_{\|v\|_1 \le 1} \|T \times_1 v\|_1$,   (5)

using the induced $\ell_1$ norm of a matrix, i.e., $\|A\|_1 = \sup_{\|v\|_1 \le 1} \|Av\|_1$. For more information about matrix norms see [2].

In general, we suppress the actual subscripts/superscripts on $U$ and $O$. It is implied that the $U$ and $O$ can be different depending on the transform being considered; however, making this explicit makes the notation very messy, and it will generally be clear from context which $U$ and $O$ are being referred to. When it is not, we will arbitrarily index them $1, 2, \ldots$ so that it is clear which corresponds to which. In general, for simplicity of exposition, we assume that all internal nodes in the tree are unobserved and all leaves are observed, since this is the hardest case.

The proof generally follows the technique of HKZ [3], but has key differences due to the tree topology instead of the HMM. We define $\tilde M_i = (\hat U^\top O)^{-1} M_i (\hat U^\top O)$. Then, as long as $\hat U^\top O$ is invertible, $(\hat U^\top O)\, \tilde M_i\, (\hat U^\top O)^{-1} = M_i$. We admit this is a slight abuse of notation, since $\tilde M_i$ was previously defined to be $(U^\top O)^{-1} M_i (U^\top O)$, but as long as $\hat U^\top O$ is invertible it does not really matter whether it equals $U^\top O$ or not for the purposes of this proof. The other quantities are defined similarly.

We seek to prove the following theorem.

Theorem 1. Pick any $\epsilon > 0$ and $\delta < 1$. Let the number of samples satisfy

  $N \ge O\!\left( \dfrac{(d_{\max} S_H)^{l+1}\, S_O}{\epsilon^2\, \min_i \sigma_{S_H}(O_i)^2\, \min_{i \ne j} \sigma_{S_H}(P_{i,j})^4}\, \log\dfrac{O}{\delta} \right)$.   (6)

Then with probability at least $1 - \delta$,

  $\big| \hat P[x_1, \ldots, x_O] - P[x_1, \ldots, x_O] \big| \le \epsilon$.   (7)

In many cases, if the frequencies of the observation symbols follow certain distributions, then the dependence on $S_O$ can be removed, as shown in HKZ [3]. That observation can easily be incorporated into our theorem if desired.

4 Concentration Bounds

Here $x$ denotes a fixed element, while $i, j, k$ range over indices. As the number of samples $N$ gets large, we expect the following quantities to be small:

  $\epsilon_i = \|\hat P_i - P_i\|_F$,   (8)
  $\epsilon_{i,j} = \|\hat P_{i,j} - P_{i,j}\|_F$,   (9)
  $\epsilon_{x,i,j} = \|\hat P_{x,i,j} - P_{x,i,j}\|_F$,   (10)
  $\epsilon_{i,j,k} = \|\hat P_{i,j,k} - P_{i,j,k}\|_F$.   (11)

Lemma 1 (variant of HKZ [3]). If the algorithm independently samples $N$ observation triples from the tree, then with probability at least $1 - \delta$,

  $\epsilon_i \le \sqrt{\tfrac{1}{N}\ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}}$,   (12)
  $\epsilon_{i,j} \le \sqrt{\tfrac{1}{N}\ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}}$,   (13)
  $\epsilon_{i,j,k} \le \sqrt{\tfrac{1}{N}\ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}}$,   (14)
  $\max_x \epsilon_{i,x,j} \le \sqrt{\tfrac{1}{N}\ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}}$,   (15)
  $\sum_x \epsilon_{x,i,j} \le \sqrt{\tfrac{S_O}{N}\ln\tfrac{C}{\delta}} + \sqrt{\tfrac{S_O}{N}}$,   (16)

where $C$ is a constant arising from the union bound over the $O(V^3)$ estimated quantities, and $V$ is the total number of observed variables in the tree. The proof is the same as that of HKZ [3], except that the union bound is larger. The last bound can be made tighter, identical to HKZ, but for simplicity we do not pursue that approach here.
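As a quick illustration of the quantities in Lemma 1, the following sketch (an assumed toy setup, not the authors' experiment) estimates a single pairwise marginal $P_{i,j}$ from $N$ samples and prints $\epsilon_{i,j} = \|\hat P_{i,j} - P_{i,j}\|_F$, which decays at roughly the $1/\sqrt{N}$ rate that the lemma formalizes.

    # Empirical check (illustrative only) that eps_{i,j} = ||P_hat - P||_F ~ 1/sqrt(N).
    import numpy as np

    rng = np.random.default_rng(1)
    SO, SH = 5, 2
    # Pairwise marginal P_{i,j} generated from a single hidden parent (rank <= SH).
    w = rng.dirichlet(np.ones(SH))                 # P[H = h]
    A = rng.dirichlet(np.ones(SO), size=SH).T      # P[X_i = a | H = h], shape (SO, SH)
    B = rng.dirichlet(np.ones(SO), size=SH).T      # P[X_j = b | H = h]
    P_ij = (A * w) @ B.T                           # P_{i,j}(a, b) = sum_h w_h A[a,h] B[b,h]

    for N in [1_000, 10_000, 100_000]:
        # Draw N i.i.d. pairs (x_i, x_j) and form the empirical joint P_hat_{i,j}.
        flat = rng.choice(SO * SO, size=N, p=P_ij.ravel())
        P_hat = np.bincount(flat, minlength=SO * SO).reshape(SO, SO) / N
        eps_ij = np.linalg.norm(P_hat - P_ij, 'fro')
        print(N, eps_ij, eps_ij * np.sqrt(N))      # last column stays roughly constant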

5 Eigenvalue Bounds

Basically this is Lemma 9 in HKZ [3], which is restated below for completeness.

Lemma 2. Suppose $\epsilon_{i,j} \le \varepsilon \cdot \sigma_{S_H}(P_{i,j})$ for some $\varepsilon < 1/2$, and let $\varepsilon_0 = \epsilon_{i,j}^2 / ((1 - \varepsilon)\,\sigma_{S_H}(P_{i,j}))^2$. Then:

1. $\varepsilon_0 < 1$;
2. $\sigma_{S_H}(\hat U^\top \hat P_{i,j}) \ge (1 - \varepsilon)\,\sigma_{S_H}(P_{i,j})$;
3. $\sigma_{S_H}(\hat U^\top P_{i,j}) \ge \sqrt{1 - \varepsilon_0}\,\sigma_{S_H}(P_{i,j})$;
4. $\sigma_{S_H}(\hat U^\top O) \ge \sqrt{1 - \varepsilon_0}\,\sigma_{S_H}(O)$.

The proof is in HKZ [3].

6 Bounding the Transformed Quantities

If Lemma 2 holds then $\hat U^\top O$ is invertible. Thus, if we define $\tilde M_i = (\hat U^\top O)^{-1} M_i (\hat U^\top O)$, then clearly $(\hat U^\top O)\,\tilde M_i\,(\hat U^\top O)^{-1} = M_i$. We admit this is a slight abuse of notation, since $\tilde M_i$ was previously defined to be $(U^\top O)^{-1} M_i (U^\top O)$, but as long as $\hat U^\top O$ is invertible it does not really matter whether it equals $U^\top O$ or not for the purposes of this proof. The other quantities are defined similarly. We seek to bound the following four quantities:

  $\delta^i_{one} = \|(O^\top \hat U)(\hat{\tilde{\mathbf 1}}_i - \tilde{\mathbf 1}_i)\|_1$,   (17)
  $\gamma_i = \|(\hat{\tilde T}_i - \tilde T_i) \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}$,   (18)
  $\delta_{root} = \|(\hat{\tilde r} - \tilde r)^\top (O_{j_1,r}^\top \hat U_{j_1})^{-1}\|_1$,   (19)
  $\Delta_i = \|(O_1^\top \hat U_1)(\hat{\tilde M}_i - \tilde M_i)(O_2^\top \hat U_2)^{-1}\|_1$.   (20)

Here $x_i$ denotes all the observations that are in the subtree of node $i$, since $i$ may be hidden or observed. Sometimes we would like to distinguish between the case when $i$ is observed and when $i$ is hidden; thus, we sometimes write $\Delta^{obs}_i$ or $\Delta^{hidden}_i$ for when $i$ is observed or hidden respectively. Again, note that the numbering in $O_1^\top \hat U_1$ and $O_2^\top \hat U_2$ is just there to avoid confusion within the same equation; in reality there are many $U$'s and $O$'s.

Lemma 3. Assume $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ for all $i \ne j$. Then

  $\delta_{root} \le \frac{2\,\epsilon_r}{\sqrt{3}\,\sigma_{S_H}(O_{j_1,r})}$,

  $\delta^i_{one} \le 4\sqrt{S_H}\left(\frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})} + \frac{\epsilon_i}{3\,\sigma_{S_H}(P_{i,j})^2}\right)$,

  $\gamma_i \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)}\left(\frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})} + \frac{\epsilon_{i,j,k}}{3\,\sigma_{S_H}(P_{i,j})^2}\right)$,

  $\Delta^{hidden}_i \le S_H\,\gamma_i + (1+\gamma_i)\left[\prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}\right]$,

  $\Delta^{obs}_i \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)}\left(\frac{\epsilon_{j,i}}{\sigma_{S_H}(P_{j,i})} + \frac{\sum_x \epsilon_{m,x,j}}{3\,\sigma_{S_H}(P_{j,i})^2}\right)$,

where $j_1, \ldots, j_J$ are the children of node $i$. The main challenge in this part is $\Delta^{hidden}_v$ and $\gamma_v$; the rest are similar to HKZ. However, we go through the other bounds as well, to be more explicit about some of the properties used, since we sometimes use different norms, etc.

6.1 $\delta_{root}$

We note that $\tilde r^\top = P_j^\top \hat U_{j_1}$ and similarly $\hat{\tilde r}^\top = \hat P_j^\top \hat U_{j_1}$. Then

  $\delta_{root} = \|(\hat{\tilde r} - \tilde r)^\top (O_{j_1,r}^\top \hat U_{j_1})^{-1}\|_1 \le \|(\hat P_j - P_j)^\top \hat U_{j_1}\|\,\|(O_{j_1,r}^\top \hat U_{j_1})^{-1}\| \le \frac{\epsilon_r}{\sigma_{S_H}(O_{j_1,r}^\top \hat U_{j_1})}$.   (27)

The first inequality follows from the relationship between the $\ell_1$ and $\ell_2$ norms and submultiplicativity. The second follows from a matrix perturbation bound given in Section 9.1 (Eq. 91). We also use the fact that, since $\hat U$ is orthonormal, it has spectral norm 1. Assuming that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$, Lemma 2 gives $\sigma_{S_H}(O_{j_1,r}^\top \hat U_{j_1}) \ge \frac{\sqrt 3}{2}\,\sigma_{S_H}(O_{j_1,r})$, and hence

  $\delta_{root} \le \frac{2\,\epsilon_r}{\sqrt{3}\,\sigma_{S_H}(O_{j_1,r})}$.

6.2 $\delta^i_{one}$

  $\delta^i_{one} = \|(O^\top \hat U)(\hat{\tilde{\mathbf 1}}_i - \tilde{\mathbf 1}_i)\|_1 \le \sqrt{S_H}\,\|(O^\top \hat U)(\hat{\tilde{\mathbf 1}}_i - \tilde{\mathbf 1}_i)\|_2 \le \sqrt{S_H}\,\|\hat{\tilde{\mathbf 1}}_i - \tilde{\mathbf 1}_i\|_2$.   (28, 29)

Here we have converted the $\ell_1$ norm to the $\ell_2$ norm, used submultiplicativity, the fact that $\hat U$ is orthonormal and so has spectral norm 1, and the fact that $O$ is a conditional probability matrix and therefore also has spectral norm 1.

We note that $\tilde{\mathbf 1}_i = (P_{m,j}^\top \hat U)^{+} P_j$ and similarly $\hat{\tilde{\mathbf 1}}_i = (\hat P_{m,j}^\top \hat U)^{+} \hat P_j$, where $i$ and $j$ are a particular pair of observations described in the main paper. Then

  $\|\hat{\tilde{\mathbf 1}}_i - \tilde{\mathbf 1}_i\| = \|(\hat P_{m,j}^\top \hat U)^{+} \hat P_j - (P_{m,j}^\top \hat U)^{+} P_j\|$   (30)
  $\le \|(\hat P_{m,j}^\top \hat U)^{+} \hat P_j - (P_{m,j}^\top \hat U)^{+} \hat P_j\| + \|(P_{m,j}^\top \hat U)^{+} \hat P_j - (P_{m,j}^\top \hat U)^{+} P_j\|$   (31, 32)
  $\le \|(\hat P_{m,j}^\top \hat U)^{+} - (P_{m,j}^\top \hat U)^{+}\|\,\|\hat P_j\| + \|(P_{m,j}^\top \hat U)^{+}\|\,\|\hat P_j - P_j\|$   (33)
  $\le \frac{\epsilon_{m,j}}{\min\big(\sigma_{S_H}(P_{m,j}^\top \hat U),\, \sigma_{S_H}(\hat P_{m,j}^\top \hat U)\big)^2} + \frac{\epsilon_j}{\sigma_{S_H}(P_{m,j}^\top \hat U)}$,   (34)

where we have used the triangle inequality in the first inequality and the submultiplicative property of matrix norms in the second. The last inequality follows from the matrix perturbation bounds of Section 9.1. Thus, using the assumption that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$, we get

  $\delta^i_{one} \le 4\sqrt{S_H}\left(\frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})} + \frac{\epsilon_i}{3\,\sigma_{S_H}(P_{i,j})^2}\right)$.   (35)

6.3 Tensor

Recall that $\tilde T_i = T_i \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1} = P_{m,j,k} \times_1 \hat U_1^\top \times_2 (P_{l,j}^\top \hat U_2)^{+} \times_3 \hat U_3^\top$, and similarly $\hat{\tilde T}_i = \hat P_{m,j,k} \times_1 \hat U_1^\top \times_2 (\hat P_{l,j}^\top \hat U_2)^{+} \times_3 \hat U_3^\top$. Then

  $\gamma_i = \|(\hat{\tilde T}_i - \tilde T_i) \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1} \le \frac{S_H}{\sigma_{S_H}(O)}\,\|\hat{\tilde T}_i - \tilde T_i\|_2$.   (36)

This is because both $\hat U$ and $O$ have spectral norm one, and the $S_H$ factor is the cost of converting from the induced 1-norm to the spectral norm. Next,

  $\|\hat{\tilde T}_i - \tilde T_i\|_2 \le \|(\hat P_{m,j,k} - P_{m,j,k}) \times_1 \hat U_1^\top \times_2 (\hat P_{l,j}^\top \hat U_2)^{+} \times_3 \hat U_3^\top\|_2 + \|P_{m,j,k} \times_1 \hat U_1^\top \times_2 \big((\hat P_{l,j}^\top \hat U_2)^{+} - (P_{l,j}^\top \hat U_2)^{+}\big) \times_3 \hat U_3^\top\|_2$   (37-41)
  $\le \frac{\epsilon_{m,j,k}}{\sigma_{S_H}(\hat P_{l,j}^\top \hat U_2)} + \frac{\epsilon_{l,j}\,\|P_{m,j,k}\|_2}{\min\big(\sigma_{S_H}(P_{l,j}^\top \hat U_2),\, \sigma_{S_H}(\hat P_{l,j}^\top \hat U_2)\big)^2}$.   (42)

It is clear that $\|P_{m,j,k}\|_2 \le \|P_{m,j,k}\|_F \le 1$. Using the fact that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ gives us the following bound:

  $\gamma_i \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)}\left(\frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})} + \frac{\epsilon_{i,j,k}}{3\,\sigma_{S_H}(P_{i,j})^2}\right)$.   (43)
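The quantities above all measure errors in transformed objects. The reason the transformation itself is harmless is the usual telescoping argument: conjugating every message by the same invertible matrix leaves the product in Eq. (3) unchanged. Below is a minimal sketch of this identity (illustrative only; S stands in for $\hat U^\top O$, and the chain is a single path with two messages).

    # Conjugating r, the messages, and the ones vector by an invertible S leaves
    # the final probability r^T M1 M2 1 unchanged (the S S^{-1} factors telescope).
    import numpy as np

    rng = np.random.default_rng(2)
    SH = 3
    r = rng.random(SH)                                          # root vector
    M1, M2 = np.diag(rng.random(SH)), np.diag(rng.random(SH))   # two diagonal messages
    ones = np.ones(SH)
    S = rng.random((SH, SH))                     # stand-in for U^T O; invertible w.h.p.
    S_inv = np.linalg.inv(S)

    p_original = r @ M1 @ M2 @ ones
    r_t = S.T @ r                                # transformed root:    r~ = S^T r
    M1_t, M2_t = S_inv @ M1 @ S, S_inv @ M2 @ S  # transformed message: M~ = S^{-1} M S
    ones_t = S_inv @ ones                        # transformed ones:    1~ = S^{-1} 1
    p_transformed = r_t @ M1_t @ M2_t @ ones_t

    print(p_original, p_transformed)             # equal up to floating point error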

6.4 Bounding $\Delta_i$

We now seek to bound $\Delta_i = \|(O_1^\top \hat U_1)(\hat{\tilde M}_i - \tilde M_i)(O_2^\top \hat U_2)^{-1}\|_1$. There are two cases: either $i$ is a leaf or it is not.

6.4.1 $i$ is a leaf node

In this case the proof simply follows HKZ [3] and is repeated here for convenience. First,

  $\|(O_1^\top \hat U_1)(\hat{\tilde M}_i - \tilde M_i)(O_2^\top \hat U_2)^{-1}\|_1 \le \frac{\sqrt{S_H}}{\sigma_{S_H}(O_2^\top \hat U_2)}\,\|\hat{\tilde M}_i - \tilde M_i\|_2$.   (44, 45)

Here the first step uses the relation between the induced 1-norm and the spectral norm, and the fact that, because $O$ is a conditional probability matrix, $\|O\|_1 = 1$, i.e., its max column sum is 1. Note that $\tilde M_i = (P_{j,i}^\top \hat U_1)^{+} P_{m,x_i,j}\, \hat U_2$ and $\hat{\tilde M}_i = (\hat P_{j,i}^\top \hat U_1)^{+} \hat P_{m,x_i,j}\, \hat U_2$. Then

  $\|\hat{\tilde M}_i - \tilde M_i\| \le \|(\hat P_{j,i}^\top \hat U_1)^{+} - (P_{j,i}^\top \hat U_1)^{+}\|\,\|\hat P_{m,x_i,j}\,\hat U_2\| + \|(P_{j,i}^\top \hat U_1)^{+}\|\,\|(\hat P_{m,x_i,j} - P_{m,x_i,j})\,\hat U_2\|$   (46-48)
  $\le \frac{\epsilon_{j,i}\,\|P_{m,x_i,j}\|}{\min\big(\sigma_{S_H}(P_{j,i}^\top \hat U_1),\, \sigma_{S_H}(\hat P_{j,i}^\top \hat U_1)\big)^2} + \frac{\epsilon_{m,x_i,j}}{\sigma_{S_H}(P_{j,i}^\top \hat U_1)}$,   (49, 50)

where the first inequality follows from the triangle inequality, and the second uses the matrix perturbation bounds and the fact that the spectral norm of $\hat U$ is 1. The remaining factor is controlled using the fact that the spectral norm is less than the Frobenius norm, which is less than the entrywise $\ell_1$ norm:

  $\|P_{m,x_i,j}\|_2 \le \sqrt{\textstyle\sum_{m,j} [P_{m,x_i,j}]_{m,j}^2} \le \textstyle\sum_{m,j} [P_{m,x_i,j}]_{m,j} = P[x_i = x]$.   (51)

Using the fact that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ gives us the following bound:

  $\Delta_{i,x} \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)}\left(\frac{\epsilon_{j,i}\,P[x_i = x]}{\sigma_{S_H}(P_{j,i})} + \frac{\epsilon_{m,x_i,j}}{3\,\sigma_{S_H}(P_{j,i})^2}\right)$.   (52)

Summing over $x$ gives

  $\Delta_i \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)}\left(\frac{\epsilon_{j,i}}{\sigma_{S_H}(P_{j,i})} + \frac{\sum_x \epsilon_{m,x,j}}{3\,\sigma_{S_H}(P_{j,i})^2}\right)$.   (53)
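The leaf-node bound above, like the bounds in Sections 6.2 and 6.3, is driven by the pseudo-inverse perturbation inequality restated in Section 9.1. The following sketch is a quick numerical sanity check of that inequality as quoted in Eq. (91) (an illustration only, on an arbitrary random matrix with a small, rank-preserving perturbation).

    # Numerical check of the pseudo-inverse perturbation bound quoted in Eq. (91):
    #   ||(A+E)^+ - A^+||_2 <= ((1+sqrt(5))/2) * max(||A^+||_2, ||(A+E)^+||_2)^2 * ||E||_2
    # for a full-column-rank A and a perturbation E small enough to preserve the rank.
    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 8, 3
    A = rng.random((m, n))                     # full column rank with probability 1
    E = 1e-3 * rng.random((m, n))              # small perturbation
    A_pinv, At_pinv = np.linalg.pinv(A), np.linalg.pinv(A + E)

    lhs = np.linalg.norm(At_pinv - A_pinv, 2)
    rhs = (1 + np.sqrt(5)) / 2 * max(np.linalg.norm(A_pinv, 2),
                                     np.linalg.norm(At_pinv, 2)) ** 2 * np.linalg.norm(E, 2)
    print(lhs, rhs)                            # lhs should come out well below rhs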

6.4.2 $i$ is not a leaf node

Let $\hat m_{J:1} = \hat{\tilde M}_{j_J} \cdots \hat{\tilde M}_{j_1} \hat{\tilde{\mathbf 1}}_i$ and $m_{J:1} = \tilde M_{j_J} \cdots \tilde M_{j_1} \tilde{\mathbf 1}_i$. Then

  $(O^\top \hat U)(\hat{\tilde M}_i - \tilde M_i)(O_3^\top \hat U_3)^{-1} = (O^\top \hat U)\big(\hat{\tilde T}_i \times_1 \hat m_{J:1} - \tilde T_i \times_1 m_{J:1}\big)(O_3^\top \hat U_3)^{-1}$   (54, 55)
  $= (O^\top \hat U)\big[(\hat{\tilde T}_i - \tilde T_i) \times_1 m_{J:1} + (\hat{\tilde T}_i - \tilde T_i) \times_1 (\hat m_{J:1} - m_{J:1}) + \tilde T_i \times_1 (\hat m_{J:1} - m_{J:1})\big](O_3^\top \hat U_3)^{-1}$,   (56, 57)

so that

  $\|(O^\top \hat U)(\hat{\tilde M}_i - \tilde M_i)(O_3^\top \hat U_3)^{-1}\|_1 \le \|(\hat{\tilde T}_i - \tilde T_i) \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}\, \|(O_1^\top \hat U_1)^{-1} m_{J:1}\|_1$   (58)
  $+ \|(\hat{\tilde T}_i - \tilde T_i) \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}\, \|(O_1^\top \hat U_1)(\hat m_{J:1} - m_{J:1})\|_1$   (59)
  $+ \|\tilde T_i \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}\, \|(O_1^\top \hat U_1)(\hat m_{J:1} - m_{J:1})\|_1$.   (60)

The first term is bounded by

  $\|(\hat{\tilde T}_i - \tilde T_i) \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}\, \|(O_1^\top \hat U_1)^{-1} m_{J:1}\|_1 \le \gamma_i\,\sqrt{S_H}\,\sqrt{S_H} = S_H\,\gamma_i$.   (61)

The second term is bounded by

  $\|(\hat{\tilde T}_i - \tilde T_i) \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}\, \|(O_1^\top \hat U_1)(\hat m_{J:1} - m_{J:1})\|_1 \le \gamma_i\, \|(O_1^\top \hat U_1)(\hat m_{J:1} - m_{J:1})\|_1$.   (62, 63)

The third term is bounded by

  $\|\tilde T_i \times_1 (O_1^\top \hat U_1) \times_2 (O_2^\top \hat U_2)^{-1} \times_3 (O_3^\top \hat U_3)^{-1}\|_{1,1}\, \|(O_1^\top \hat U_1)(\hat m_{J:1} - m_{J:1})\|_1 \le \|(O_1^\top \hat U_1)(\hat m_{J:1} - m_{J:1})\|_1$.   (64)

In the next section we will see that $\|(O^\top \hat U)(\hat m_{J:1} - m_{J:1})\|_1 \le \prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}$, so the overall bound is

  $\Delta_i \le S_H\,\gamma_i + (1+\gamma_i)\left[\prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}\right]$,   (65)

where $j_1, \ldots, j_J$ are the children of node $i$.

6.5 Bounding $\|(O^\top \hat U)(\hat m_{J:1} - m_{J:1})\|_1$

Lemma 4.

  $\|(O^\top \hat U)(\hat m_{J:1} - m_{J:1})\|_1 \le \prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}$,   (66)

where $j_1, \ldots, j_J$ are the children of node $i$.

The proof is by induction. Base case: $\|(O^\top \hat U)(\hat{\tilde{\mathbf 1}}_i - \tilde{\mathbf 1}_i)\|_1 \le \delta^i_{one}$, by the definition of $\delta^i_{one}$. Inductive step: suppose the claim holds up to $u-1$; we show that it holds for $u$. Thus

  $\|(O^\top \hat U)(\hat m_{u-1:1} - m_{u-1:1})\|_1 \le \prod_{k=1}^{u-1}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{u-1}(1+\Delta_{j_k}) - \sqrt{S_H}$.   (67)

We now decompose the difference as

  $(O^\top \hat U)(\hat m_{u:1} - m_{u:1}) = (O^\top \hat U)\big[(\hat{\tilde M}_u - \tilde M_u)\,m_{u-1:1} + (\hat{\tilde M}_u - \tilde M_u)(\hat m_{u-1:1} - m_{u-1:1}) + \tilde M_u(\hat m_{u-1:1} - m_{u-1:1})\big]$.   (68)

Using the triangle inequality, we get

  $\|(O^\top \hat U)(\hat m_{u:1} - m_{u:1})\|_1 \le \|(O^\top \hat U)(\hat{\tilde M}_u - \tilde M_u)(O_1^\top \hat U_1)^{-1}\|_1\, \|(O_1^\top \hat U_1)\,m_{u-1:1}\|_1$   (69)
  $+ \|(O^\top \hat U)(\hat{\tilde M}_u - \tilde M_u)(O_1^\top \hat U_1)^{-1}\|_1\, \|(O_1^\top \hat U_1)(\hat m_{u-1:1} - m_{u-1:1})\|_1$   (70)
  $+ \|(O^\top \hat U)\,\tilde M_u\,(O^\top \hat U)^{-1}\|_1\, \|(O^\top \hat U)(\hat m_{u-1:1} - m_{u-1:1})\|_1$.   (71)

Again, we are just numbering the $U$'s and $O$'s for clarity, to see which corresponds to which; they are omitted in the actual theorem statements, since we take minimums, etc., at the end. We now bound these three terms.

The first term satisfies

  $\|(O^\top \hat U)(\hat{\tilde M}_u - \tilde M_u)(O_1^\top \hat U_1)^{-1}\|_1\, \|(O_1^\top \hat U_1)\,m_{u-1:1}\|_1 \le \Delta_u\,\sqrt{S_H}$,   (72)

since $\Delta_u = \|(O^\top \hat U)(\hat{\tilde M}_u - \tilde M_u)(O_1^\top \hat U_1)^{-1}\|_1$ and the entries of $(O_1^\top \hat U_1)\,m_{u-1:1}$ are probabilities of the observations $x_{u-1:1}$ in the corresponding subtrees.

The second term can be bounded by the inductive hypothesis:

  $\|(O^\top \hat U)(\hat{\tilde M}_u - \tilde M_u)(O_1^\top \hat U_1)^{-1}\|_1\, \|(O_1^\top \hat U_1)(\hat m_{u-1:1} - m_{u-1:1})\|_1 \le \Delta_u \left[\prod_{k=1}^{u-1}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{u-1}(1+\Delta_{j_k}) - \sqrt{S_H}\right]$.   (73)

The third term is bounded by observing that $(O^\top \hat U)\,\tilde M_u\,(O^\top \hat U)^{-1} = \mathrm{diag}\big(\Pr[x_u \mid \text{parent}]\big)$. Thus it is diagonal, and since $\Pr[x_u \mid \text{parent}]$ has max row/column sum at most 1, the third term is bounded by the inductive hypothesis as well:

  $\|(O^\top \hat U)\,\tilde M_u\,(O^\top \hat U)^{-1}\|_1\, \|(O^\top \hat U)(\hat m_{u-1:1} - m_{u-1:1})\|_1 \le \prod_{k=1}^{u-1}(1+\Delta_{j_k})\,\delta^i_{one} + \sqrt{S_H}\prod_{k=1}^{u-1}(1+\Delta_{j_k}) - \sqrt{S_H}$.   (74)

Summing the three bounds gives exactly the claim for $u$, completing the induction.

7 Bounding the propagation of error in the tree

We now wrap up the proof, based on the approach of HKZ [3].

Lemma 5.

  $\big|\hat P[x_1,\ldots,x_O] - P[x_1,\ldots,x_O]\big| \le \sqrt{S_H}\,\delta_{root} + (1+\delta_{root})\left[\prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^r_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}\right]$.   (75)

To see this, write

  $\hat P[x_1,\ldots,x_O] - P[x_1,\ldots,x_O] = \hat{\tilde r}^\top \hat{\tilde M}_{j_J} \cdots \hat{\tilde M}_{j_1} \hat{\tilde{\mathbf 1}}_r - \tilde r^\top \tilde M_{j_J} \cdots \tilde M_{j_1} \tilde{\mathbf 1}_r$   (76)

and decompose the difference into three sums:

  $\big|\hat P - P\big| \le \big|(\hat{\tilde r} - \tilde r)^\top (O^\top \hat U)^{-1} (O^\top \hat U)\,m_{J:1}\big|$   (77)
  $+ \big|(\hat{\tilde r} - \tilde r)^\top (O^\top \hat U)^{-1} (O^\top \hat U)(\hat m_{J:1} - m_{J:1})\big|$   (78)
  $+ \big|\tilde r^\top (O^\top \hat U)^{-1} (O^\top \hat U)(\hat m_{J:1} - m_{J:1})\big|$.   (79)

The first sum is bounded using Hölder's inequality and noting that $(O^\top \hat U)\,m_{J:1}$ is a vector of conditional probabilities of all the observed variables conditioned on the root:

  $\big|(\hat{\tilde r} - \tilde r)^\top (O^\top \hat U)^{-1} (O^\top \hat U)\,m_{J:1}\big| \le \|(\hat{\tilde r} - \tilde r)^\top (O^\top \hat U)^{-1}\|_1\, \|(O^\top \hat U)\,m_{J:1}\|_\infty \le \sqrt{S_H}\,\delta_{root}$.   (80, 81)

The second sum is bounded by another application of Hölder's inequality together with Lemma 4:

  $\big|(\hat{\tilde r} - \tilde r)^\top (O^\top \hat U)^{-1} (O^\top \hat U)(\hat m_{J:1} - m_{J:1})\big| \le \|(\hat{\tilde r} - \tilde r)^\top (O^\top \hat U)^{-1}\|_1\, \|(O^\top \hat U)(\hat m_{J:1} - m_{J:1})\|_1$   (82, 83)
  $\le \delta_{root}\left[\prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^r_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}\right]$.   (84)

The third sum is also bounded by Hölder's inequality, the previous lemmas, and the observation that $\tilde r^\top (O^\top \hat U)^{-1} = P[X_r = \cdot]$, the marginal distribution of the root:

  $\big|\tilde r^\top (O^\top \hat U)^{-1} (O^\top \hat U)(\hat m_{J:1} - m_{J:1})\big| \le \|\tilde r^\top (O^\top \hat U)^{-1}\|_\infty\, \|(O^\top \hat U)(\hat m_{J:1} - m_{J:1})\|_1$   (85, 86)
  $\le \prod_{k=1}^{J}(1+\Delta_{j_k})\,\delta^r_{one} + \sqrt{S_H}\prod_{k=1}^{J}(1+\Delta_{j_k}) - \sqrt{S_H}$.   (87)

Combining these three bounds gives the desired result.

8 Putting it all together

We seek to ensure that

  $\big|\hat P[x_1,\ldots,x_O] - P[x_1,\ldots,x_O]\big| \le \epsilon$.   (88)

Using the fact that, for $a \le 0.5$, $(1 + a/t)^t \le 1 + 2a$, it suffices that $\Delta_{j_k} \le O(\epsilon / (S_H J))$. However, $\Delta_j$ is defined recursively, and thus the error accumulates exponentially in the longest path of hidden nodes. For example, we need $\Delta^{obs}_i \le O\big(\epsilon / (d_{\max} S_H)^{l}\big)$, where $l$ is the length of the longest path of hidden nodes. Tracing this requirement back through the concentration bounds of Lemma 1 gives the result: pick any $\epsilon > 0$ and $\delta < 1$, and let

  $N \ge O\!\left( \dfrac{(d_{\max} S_H)^{l+1}\, S_O}{\epsilon^2\, \min_i \sigma_{S_H}(O_i)^2\, \min_{i \ne j} \sigma_{S_H}(P_{i,j})^4}\, \log\dfrac{O}{\delta} \right)$.   (89)

Then with probability at least $1 - \delta$,

  $\big|\hat P[x_1,\ldots,x_O] - P[x_1,\ldots,x_O]\big| \le \epsilon$.   (90)

In many cases, if the frequencies of the observation symbols follow certain distributions, then the dependence on $S_O$ can be removed, as shown in HKZ [3].

9 Appendix

9.1 Matrix Perturbation Bounds

This is Theorem 3.8 on p. 143 of Stewart and Sun, 1990 [6]. Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, and let $\tilde A = A + E$. Then

  $\|\tilde A^{+} - A^{+}\| \le \frac{1 + \sqrt{5}}{2}\, \max\big(\|A^{+}\|^2,\, \|\tilde A^{+}\|^2\big)\, \|E\|$.   (91)

References

[1] Myung J. Choi, Vincent Y. Tan, Animashree Anandkumar, and Alan S. Willsky. Learning latent tree graphical models. arXiv:1009.2722v1, 2010.

[2] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press.

[3] D. Hsu, S. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Proc. Annual Conf. Computational Learning Theory, 2009.

[4] N. H. Nguyen, P. Drineas, and T. D. Tran. Tensor sparsification via a bound on the spectral norm of random tensors. arXiv preprint, 2010.

[5] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[6] G. W. Stewart and J. Sun. Matrix Perturbation Theory, volume 175. Academic Press, New York, 1990.
