Supplemental for Spectral Algorithm For Latent Tree Graphical Models
Ankur P. Parikh, Le Song, Eric P. Xing

The supplemental contains three main things. 1. The first is network plots of the latent variable tree learned by [1] for the stock market data, and of the Chow-Liu tree, to give a more intuitive explanation of why latent variable trees can lead to better performance. 2. The second is a more detailed presentation of the tensor representation in which internal nodes are allowed to be evidence variables. 3. The third is the proof of Theorem 1.

1 Latent Tree Structure for Stock Data

The latent tree structure learned by the algorithm of [1] is shown in Figure 1. The blue nodes are hidden and the red nodes are observed. Note how integrating out some of these hidden nodes could lead to very large cliques; thus it is not surprising that both our spectral method and EM perform better than Chow-Liu. The Chow-Liu tree is shown in Figure 2. Note how it is forced to pick some of the observed variables as hubs even if latent variables may be more natural.

Figure 1: Latent variable tree learned by [1]. The hidden variables are in blue while the observed variables are in red. As one can see, the hidden variables can model significantly more complex relationships among the observed variables.

2 More Detailed Information about the Tensor Representation for LTMs

The computation of the marginal distribution of the observed variables can be expressed in terms of tensor multiplications. Basically, the information contained in each tensor corresponds to the information in a conditional probability table (CPT) of the model, and the tensor multiplications implement the summations. However, there are multiple ways of rewriting the marginal distribution of the observed variables in tensor notation, and not all of them provide intuition or an easy derivation of a spectral algorithm.
In this section, we will derive a specific representation of latent tree models which requires only tensors up to 3rd order and provides a basis for deriving a spectral algorithm. More specifically, we first select a latent or observed variable as the root node and sort the nodes in the tree in topological order. Then we associate the root node $X_r$ with a vector $r$ related to the marginal probability of $X_r$. Depending on whether the root node is latent or observed, its entries are defined as

$$[r]_k = \begin{cases} P[X_r = k] & X_r \text{ latent} \\ \delta_{k x_r}\, P[x_r] & X_r \text{ observed} \end{cases}$$

where $\delta_{k x_r}$ is an indicator variable defined as $\delta_{k x_r} = 1$ if $k = x_r$ and 0 otherwise. Effectively, $\delta_{k x_r}$ sets all entries of $r$ to zero except the one corresponding to $P[x_r]$.

Next, we associate each internal node $X_i$ with a 3rd order tensor $T_i$ related to the conditional probability table between $X_i$ and its parent $X_{\pi_i}$. This tensor is diagonal in its 2nd and 3rd modes, and hence its nonzero entries can be accessed by two indices $k$ and $l$. Depending on whether the internal node and its parent are latent or observed variables, the nonzero entries of $T_i$ are defined as
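To make the root-vector definition above concrete, here is a minimal sketch of our own (not code from the paper; the helper name `root_vector` is ours) that builds $r$ for both the latent and the observed case:

```python
import numpy as np

def root_vector(p_root, x_r=None):
    """Root vector r: entries P[X_r = k] if the root is latent;
    delta_{k,x_r} * P[x_r] (all other entries zeroed) if the root
    is observed at value x_r."""
    p_root = np.asarray(p_root, dtype=float)
    if x_r is None:            # X_r latent: keep the full marginal
        return p_root.copy()
    r = np.zeros_like(p_root)  # X_r observed: indicator zeroes all but entry x_r
    r[x_r] = p_root[x_r]
    return r

print(root_vector([0.2, 0.5, 0.3]))         # latent root: marginal unchanged
print(root_vector([0.2, 0.5, 0.3], x_r=1))  # observed root: only entry 1 survives
```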
$$[T_i]_{k,l,l} = \begin{cases} P[X_i = k \mid X_{\pi_i} = l] & X_i \text{ latent},\ X_{\pi_i} \text{ latent} \\ \delta_{l x_{\pi_i}}\, P[X_i = k \mid x_{\pi_i}] & X_i \text{ latent},\ X_{\pi_i} \text{ observed} \\ \delta_{k x_i}\, P[x_i \mid X_{\pi_i} = l] & X_i \text{ observed},\ X_{\pi_i} \text{ latent} \\ \delta_{k x_i}\, \delta_{l x_{\pi_i}}\, P[x_i \mid x_{\pi_i}] & X_i \text{ observed},\ X_{\pi_i} \text{ observed} \end{cases}$$

where $\delta_{k x_i}$ and $\delta_{l x_{\pi_i}}$ are also indicator variables. Effectively, the indicator variables zero out further entries of $T_i$ for those values that are not equal to the actual observation.

Figure 2: Tree learned by the Chow-Liu algorithm over only the observed variables. Note how it is forced to pick some of the observed variables as hubs even if latent variables may be more natural.

Last, we associate each leaf node $X_i$, which is always observed, with a diagonal matrix $M_i$ related to the likelihood of $x_i$. Depending on whether the parent of $X_i$ is latent or observed, the diagonal entries of $M_i$ are defined as

$$[M_i]_{k,k} = \begin{cases} P[x_i \mid X_{\pi_i} = k] & X_{\pi_i} \text{ latent} \\ \delta_{k x_{\pi_i}}\, P[x_i \mid x_{\pi_i}] & X_{\pi_i} \text{ observed} \end{cases}$$

Let the $M_i$ defined above be the messages passed from the leaf nodes to their parents. We can show that the marginal probability of the observed variables can be computed recursively using a message passing algorithm: each node in the tree sends a message to its parent according to the reverse topological order of the nodes, and the final messages are aggregated at the root to give the desired quantity. More formally, the outgoing message from an internal node $X_i$ to its parent can be computed as

$$M_i = T_i \times_1 \big(M_{j_1} M_{j_2} \cdots M_{j_J} \mathbf{1}_i\big), \qquad (1)$$

where $j_1, j_2, \ldots, j_J \in \chi_i$ range over all children of $X_i$ ($J = |\chi_i|$). Here $\mathbf{1}_i$ is a vector of all ones of suitable size, and it is used to reduce the incoming messages (all of which are diagonal matrices) to a single vector. The computation in (1) essentially implements the message update we often see in an ordinary message passing algorithm [5], i.e.,

$$m_i[x_{\pi_i}] = \sum_{x_i} P[x_i \mid x_{\pi_i}]\, m_{j_1}[x_i] \cdots m_{j_J}[x_i], \qquad (2)$$

where $m_j[x_i]$ represents an incoming message to $X_i$, i.e., the intermediate result of the marginalization operation that sums out all latent variables in subtree $T_j$. The product $M_{j_1} M_{j_2} \cdots M_{j_J} \mathbf{1}_i$ corresponds to aggregating all incoming messages $m_{j_1}[x_i] \cdots m_{j_J}[x_i]$, and the multiplication $T_i \times_1 (\cdot)$ corresponds to $\sum_{x_i} P[x_i \mid x_{\pi_i}] (\cdot)$. At the root node, all incoming messages are combined to produce the final joint probability, i.e.,

$$P[x_1, \ldots, x_O] = r^\top \big(M_{j_1} M_{j_2} \cdots M_{j_J} \mathbf{1}_r\big). \qquad (3)$$

Here $r^\top$ basically implements the operation $\sum_{x_r} P[x_r] (\cdot)$, which sums out the root variable.
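The message-passing computation above can be checked on a toy model. The following NumPy sketch (our own illustration; the tiny tree with one hidden root and two observed leaves is an assumed example, not from the paper) evaluates the joint probability via leaf messages and the root vector, and verifies it against brute-force summation over the hidden state:

```python
import numpy as np

rng = np.random.default_rng(0)
k_hidden, k_obs = 3, 4

# Toy latent tree: hidden root X_r with two observed children X_1, X_2.
pi = rng.dirichlet(np.ones(k_hidden))                 # P[X_r = k]
O1 = rng.dirichlet(np.ones(k_obs), size=k_hidden).T   # O1[x, k] = P[X_1 = x | X_r = k]
O2 = rng.dirichlet(np.ones(k_obs), size=k_hidden).T   # O2[x, k] = P[X_2 = x | X_r = k]

def joint_via_messages(x1, x2):
    # Leaf messages: diagonal matrices with [M_i]_{k,k} = P[x_i | X_r = k].
    M1 = np.diag(O1[x1])
    M2 = np.diag(O2[x2])
    r = pi                                            # root vector (X_r latent)
    ones = np.ones(k_hidden)
    return r @ (M1 @ M2 @ ones)                       # the root aggregation step

def joint_brute_force(x1, x2):
    return sum(pi[k] * O1[x1, k] * O2[x2, k] for k in range(k_hidden))

assert np.allclose(joint_via_messages(1, 2), joint_brute_force(1, 2))
total = sum(joint_via_messages(a, b) for a in range(k_obs) for b in range(k_obs))
assert np.isclose(total, 1.0)                         # the joint sums to one
```

The diagonal leaf messages make the product $M_{j_1} M_{j_2} \mathbf{1}$ exactly the elementwise product of incoming messages, matching the ordinary sum-product update.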
3 Notation for the Proof of Theorem 1

We now proceed to prove Theorem 1. $\|\cdot\|_2$ refers to the spectral norm for matrices and tensors but the ordinary Euclidean norm for vectors. $\|\cdot\|_1$ refers to the induced 1-norm (max absolute column sum) for matrices and tensors, but the ordinary $\ell_1$ norm for vectors. $\|\cdot\|_F$ refers to the Frobenius norm. The tensor spectral norm for 3rd order tensors is defined in [4]:

$$\|T\|_2 = \sup_{\|v_1\|_2 \le 1,\ \|v_2\|_2 \le 1,\ \|v_3\|_2 \le 1} T \times_1 v_1 \times_2 v_2 \times_3 v_3 \qquad (4)$$

We will define the induced 1-norm of a tensor as

$$\|T\|_{1,1} = \sup_{\|v\|_1 \le 1} \|T \times_1 v\|_1 \qquad (5)$$

using the induced $\ell_1$ norm of a matrix, i.e., $\|A\|_1 = \sup_{\|v\|_1 \le 1} \|Av\|_1$. For more information about matrix norms see [2].

In general, we suppress the actual subscripts/superscripts on $U$ and $O$: the $U$ and $O$ being referred to can differ depending on the transform being considered, but carrying all the indices makes the notation very messy. It will generally be clear from context which $U$ and $O$ are meant; when it is not, we arbitrarily index them $1, 2, \ldots$ so that it is clear which corresponds to which. In general, for simplicity of exposition, we assume that all internal nodes in the tree are unobserved and all leaves are observed, since this is the hardest case. The proof generally follows the technique of HKZ [3], but has key differences due to the tree topology instead of the HMM.

We define $\widetilde{M}_i = (\widehat{U}^\top O)^{-1} M_i (\widehat{U}^\top O)$. Then, as long as $\widehat{U}^\top O$ is invertible, $(\widehat{U}^\top O)\, \widetilde{M}_i\, (\widehat{U}^\top O)^{-1} = M_i$. We admit this is a slight abuse of notation, since $\widetilde{M}_i$ was previously defined to be $(U^\top O)^{-1} M_i (U^\top O)$, but as long as $\widehat{U}^\top O$ is invertible it does not really matter whether $\widehat{U}$ equals $U$ for the purposes of this proof. The other quantities are defined similarly.

We seek to prove the following theorem:

Theorem 1 Pick any $\epsilon > 0$, $\delta < 1$. Let

$$N \ge O\!\left( \frac{d_{\max}^2\, S_H^{2(l+1)}\, S_O}{\epsilon^2\, \min_i \sigma_{S_H}(O_i)^2\, \min_{i,j} \sigma_{S_H}(P_{i,j})^4}\, \log\frac{O}{\delta} \right) \qquad (6)$$

Then with probability $1 - \delta$,

$$\big|\widehat{P}[x_1, \ldots, x_O] - P[x_1, \ldots, x_O]\big| \le \epsilon \qquad (7)$$

In many cases, if the frequencies of the observation symbols follow certain distributions, then the dependence on $S_O$ can be removed, as shown in HKZ [3].
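The norms above are all directly computable. The sketch below (our own illustration; the mode convention that eq. (5) contracts the first axis is an assumption) computes the matrix norms with NumPy and exploits the fact that a convex function attains its supremum over the $\ell_1$ ball at a signed basis vector, so the tensor $(1,1)$-norm reduces to a maximum over mode-1 slices:

```python
import numpy as np

A = np.array([[1., -2.], [3., 4.]])

spec = np.linalg.norm(A, 2)      # spectral norm: largest singular value
one  = np.linalg.norm(A, 1)      # induced 1-norm: max absolute column sum
frob = np.linalg.norm(A, 'fro')  # Frobenius norm
assert np.isclose(one, 6.0)      # columns sum to 4 and 6 in absolute value

# Induced (1,1)-norm of a 3rd order tensor: the sup over the l1 ball is
# attained at +/- a basis vector, so it equals the max induced 1-norm of
# the mode-1 slices T[i, :, :].
def tensor_one_one(T):
    return max(np.linalg.norm(T[i], 1) for i in range(T.shape[0]))

T = np.arange(8.0).reshape(2, 2, 2)
# Cross-check against an explicit contraction T x_1 e_i for the best slice.
best = max(range(2), key=lambda i: np.linalg.norm(T[i], 1))
v = np.zeros(2); v[best] = 1.0
contracted = np.tensordot(v, T, axes=([0], [0]))
assert np.isclose(tensor_one_one(T), np.linalg.norm(contracted, 1))
```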
That observation can easily be incorporated into our theorem if desired.

4 Concentration Bounds

Here $x$ denotes a fixed element while $i, j, k$ range over indices. As the number of samples gets large, we expect the following quantities to be small:

$$\epsilon_i = \|\widehat{P}_i - P_i\|_F \qquad (8)$$
$$\epsilon_{i,j} = \|\widehat{P}_{i,j} - P_{i,j}\|_F \qquad (9)$$
$$\epsilon_{x,i,j} = \|\widehat{P}_{x,i,j} - P_{x,i,j}\|_F \qquad (10)$$
$$\epsilon_{i,j,k} = \|\widehat{P}_{i,j,k} - P_{i,j,k}\|_F \qquad (11)$$

Lemma 1 (variant of HKZ [3]) If the algorithm independently samples $N$ observation triples from the tree, then with probability at least $1 - \delta$:

$$\epsilon_i \le \sqrt{\tfrac{1}{N} \ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}} \qquad (12)$$
$$\epsilon_{i,j} \le \sqrt{\tfrac{1}{N} \ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}} \qquad (13)$$
$$\epsilon_{i,j,k} \le \sqrt{\tfrac{1}{N} \ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}} \qquad (14)$$
$$\max_x\, \epsilon_{i,x,j} \le \sqrt{\tfrac{1}{N} \ln\tfrac{C}{\delta}} + \sqrt{\tfrac{1}{N}} \qquad (15)$$
$$\sum_x \epsilon_{x,i,j} \le \sqrt{\tfrac{S_O}{N} \ln\tfrac{C}{\delta}} + \sqrt{\tfrac{S_O}{N}} \qquad (16)$$
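The $1/\sqrt{N}$ behavior in Lemma 1 is easy to observe empirically. The sketch below (our own illustration; the specific toy joint distribution is an assumption) estimates a pairwise table $\widehat{P}_{i,j}$ from samples and checks that the Frobenius error shrinks as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4
P = rng.dirichlet(np.ones(k * k)).reshape(k, k)   # a random true joint P_{i,j}

def eps_ij(n):
    """Frobenius error of the empirical estimate of P_{i,j} from n samples."""
    idx = rng.choice(k * k, size=n, p=P.ravel())
    P_hat = np.bincount(idx, minlength=k * k).reshape(k, k) / n
    return np.linalg.norm(P_hat - P, 'fro')

errs = {n: eps_ij(n) for n in (100, 10_000, 1_000_000)}
# The error should shrink roughly like 1/sqrt(N), matching the bounds above.
assert errs[1_000_000] < errs[10_000] < errs[100]
```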
where $C$ is some constant arising from the union bound over $O(V^3)$ events, and $V$ is the total number of observed variables in the tree. The proof is the same as that of HKZ [3] except the union bound is larger. The last bound can be made tighter, identical to HKZ, but for simplicity we do not pursue that approach here.

5 Eigenvalue Bounds

Basically this is Lemma 9 in HKZ [3], which is stated below for completeness:

Lemma 2 Suppose $\epsilon_{i,j} \le \varepsilon \cdot \sigma_{S_H}(P_{i,j})$ for some $\varepsilon < 1/2$. Let $\varepsilon_0 = \epsilon_{i,j}^2 / \big((1 - \varepsilon)\, \sigma_{S_H}(P_{i,j})\big)^2$. Then:
1. $\varepsilon_0 < 1$
2. $\sigma_{S_H}(\widehat{U}^\top \widehat{P}_{i,j}) \ge (1 - \varepsilon)\, \sigma_{S_H}(P_{i,j})$
3. $\sigma_{S_H}(\widehat{U}^\top P_{i,j}) \ge \sqrt{1 - \varepsilon_0}\, \sigma_{S_H}(P_{i,j})$
4. $\sigma_{S_H}(\widehat{U}^\top O) \ge \sqrt{1 - \varepsilon_0}\, \sigma_{S_H}(O)$

The proof is in HKZ [3].

6 Bounding the Transformed Quantities

If Lemma 2 holds then $\widehat{U}^\top O$ is invertible. Thus, if we define $\widetilde{M}_i = (\widehat{U}^\top O)^{-1} M_i (\widehat{U}^\top O)$, then clearly $(\widehat{U}^\top O)\, \widetilde{M}_i\, (\widehat{U}^\top O)^{-1} = M_i$. We admit this is a slight abuse of notation, since $\widetilde{M}_i$ was previously defined to be $(U^\top O)^{-1} M_i (U^\top O)$, but as long as $\widehat{U}^\top O$ is invertible it does not really matter whether $\widehat{U}$ equals $U$ for the purposes of this proof. The other quantities are defined similarly.

We seek to bound the following four quantities:

$$\delta^i_{\text{one}} = \big\|(\widehat{U}^\top O)(\widetilde{\mathbf{1}}_i - \mathbf{1}_i)\big\|_1 \qquad (17)$$
$$\gamma_i = \big\|(\widetilde{T}_i - T_i) \times_1 (\widehat{U}_1^\top O_1) \times_2 (\widehat{U}_2^\top O_2) \times_3 (\widehat{U}_3^\top O_3)\big\|_{1,1} \qquad (18)$$
$$\delta_{\text{root}} = \big\|(\widetilde{r} - r)^\top (\widehat{U}_{j_r}^\top O_{j_r})^{-1}\big\|_1 \qquad (19)$$
$$\Delta_i = \big\|(\widehat{U}_1^\top O_1)(\widetilde{M}_i - M_i)(\widehat{U}_2^\top O_2)^{-1}\big\|_1 \qquad (20)$$

Here the observation argument of $\widetilde{M}_i$ ranges over all observations in the subtree of node $i$, since $i$ may be hidden or observed. Sometimes we like to distinguish between when $i$ is observed and when $i$ is hidden; thus, we sometimes refer to the quantities $\Delta_i^{\text{obs}}$ and $\Delta_i^{\text{hidden}}$ for when $i$ is observed or hidden, respectively. Again, note that the numbering in $\widehat{U}_1^\top O_1$ and $\widehat{U}_2^\top O_2$ is just there to avoid confusion within the same equation; in reality there are many $U$'s and $O$'s.

Lemma 3 Assume $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ for all $i, j$. Then:

$$\delta_{\text{root}} \le \frac{2\sqrt{S_H}\, \epsilon_r}{\sqrt{3}\, \sigma_{S_H}(O_{j_r})}$$
$$\delta^i_{\text{one}} \le 4\sqrt{S_H} \left( \frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})^2} + \frac{\epsilon_i}{3\, \sigma_{S_H}(P_{i,j})} \right)$$
$$\gamma_i \le \frac{4 S_H}{\sigma_{S_H}(O)} \left( \frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})^2} + \frac{\epsilon_{i,j,k}}{3\, \sigma_{S_H}(P_{i,j})} \right)$$
$$\Delta_i^{\text{obs}} \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)} \left( \frac{\epsilon_{j,i}}{\sigma_{S_H}(P_{j,i})^2} + \frac{\sum_x \epsilon_{m,x,j}}{3\, \sigma_{S_H}(P_{j,i})} \right)$$
$$\Delta_i^{\text{hidden}} \le (1 + \gamma_i) \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big[ (1 + \gamma_i) \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big] \sqrt{S_H}$$

The main challenge in this part is $\Delta_i^{\text{hidden}}$ and $\gamma_i$; the rest are similar to HKZ.
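The mechanism behind Lemma 2 is that small Frobenius-norm perturbations can only move singular values by a small amount. The sketch below (our own numerical check, not from the paper) verifies Weyl's inequality for singular values, which is what lets the $\epsilon_{i,j}$ quantities control $\sigma_{S_H}(\cdot)$:

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((5, 5))
E = 0.01 * rng.standard_normal((5, 5))   # a small perturbation

s_true = np.linalg.svd(P, compute_uv=False)
s_pert = np.linalg.svd(P + E, compute_uv=False)

# Weyl's inequality for singular values: |sigma_k(P+E) - sigma_k(P)| <= ||E||_2,
# and ||E||_2 <= ||E||_F, so Frobenius-norm estimation error controls every
# singular value uniformly.
assert np.all(np.abs(s_pert - s_true) <= np.linalg.norm(E, 2) + 1e-12)
assert np.linalg.norm(E, 2) <= np.linalg.norm(E, 'fro') + 1e-12
```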
However, we go through the other bounds to be more explicit about some of the properties used, since we sometimes use different norms, etc.

6.1 Bounding $\delta_{\text{root}}$

We note that $\widetilde{r}^\top = \widehat{P}_{j_r}^\top \widehat{U}$ and similarly $r^\top = P_{j_r}^\top \widehat{U}$. Then

$$\delta_{\text{root}} = \big\|(\widetilde{r} - r)^\top (\widehat{U}_{j_r}^\top O_{j_r})^{-1}\big\|_1 \le \sqrt{S_H}\, \big\|\widehat{P}_{j_r} - P_{j_r}\big\|_2\, \|\widehat{U}_{j_r}\|_2\, \big\|(\widehat{U}_{j_r}^\top O_{j_r})^{-1}\big\|_2 \le \frac{\sqrt{S_H}\, \epsilon_r}{\sigma_{S_H}(\widehat{U}_{j_r}^\top O_{j_r})} \qquad (26\text{--}27)$$
The first inequality follows from the relationship between the $\ell_1$ and $\ell_2$ norms and submultiplicativity. The second follows from the singular value bounds, and we also use the fact that since $\widehat{U}$ is orthonormal it has spectral norm 1. Assuming that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ gives, by Lemma 2,

$$\delta_{\text{root}} \le \frac{2\sqrt{S_H}\, \epsilon_r}{\sqrt{3}\, \sigma_{S_H}(O_{j_r})}$$

6.2 Bounding $\delta^i_{\text{one}}$

$$\delta^i_{\text{one}} = \big\|(\widehat{U}^\top O)(\widetilde{\mathbf{1}}_i - \mathbf{1}_i)\big\|_1 \le \sqrt{S_H}\, \big\|(\widehat{U}^\top O)(\widetilde{\mathbf{1}}_i - \mathbf{1}_i)\big\|_2 \le \sqrt{S_H}\, \big\|\widetilde{\mathbf{1}}_i - \mathbf{1}_i\big\|_2 \qquad (28\text{--}29)$$

Here we have converted the $\ell_1$ norm to the $\ell_2$ norm, used submultiplicativity, the fact that $\widehat{U}$ is orthonormal (so it has spectral norm 1), and the fact that $O$ is a conditional probability matrix (so it also has spectral norm at most 1). We note that $\widetilde{\mathbf{1}}_i = (\widehat{P}_{m,j}^\top \widehat{U})^+ \widehat{P}_j$ and similarly $\mathbf{1}_i = (P_{m,j}^\top \widehat{U})^+ P_j$, where $i$ and $j$ are a particular pair of observations described in the main paper. Then

$$\widetilde{\mathbf{1}}_i - \mathbf{1}_i = \big[(\widehat{P}_{m,j}^\top \widehat{U})^+ - (P_{m,j}^\top \widehat{U})^+\big] \widehat{P}_j + (P_{m,j}^\top \widehat{U})^+ (\widehat{P}_j - P_j) \qquad (30\text{--}31)$$

$$\big\|\widetilde{\mathbf{1}}_i - \mathbf{1}_i\big\|_2 \le \big\|(\widehat{P}_{m,j}^\top \widehat{U})^+ - (P_{m,j}^\top \widehat{U})^+\big\|_2\, \|\widehat{P}_j\|_2 + \big\|(P_{m,j}^\top \widehat{U})^+\big\|_2\, \|\widehat{P}_j - P_j\|_2 \qquad (32\text{--}33)$$

$$\le \frac{2\, \epsilon_{m,j}}{\min\big\{\sigma_{S_H}(P_{m,j}^\top \widehat{U}),\ \sigma_{S_H}(\widehat{P}_{m,j}^\top \widehat{U})\big\}^2} + \frac{\epsilon_j}{\sigma_{S_H}(P_{m,j}^\top \widehat{U})} \qquad (34)$$

where we have used the triangle inequality in the first inequality and the submultiplicative property of matrix norms in the second. The last inequality follows from the matrix perturbation bound given in Equation (91) in the Appendix. Thus, using the assumption that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$, we get

$$\delta^i_{\text{one}} \le 4\sqrt{S_H} \left( \frac{\epsilon_{m,j}}{\sigma_{S_H}(P_{m,j})^2} + \frac{\epsilon_j}{3\, \sigma_{S_H}(P_{m,j})} \right) \qquad (35)$$

6.3 Bounding the Tensor Term $\gamma_i$

Recall that $\widetilde{T}_i = T_i \times_1 (\widehat{U}_1^\top O_1) \times_2 (\widehat{U}_2^\top O_2) \times_3 (\widehat{U}_3^\top O_3) = P_{m,j,k} \times_1 \widehat{U}_1^\top \times_2 (P_{l,j}^\top \widehat{U}_2)^+ \times_3 \widehat{U}_3^\top$. Similarly, $\widehat{T}_i = \widehat{P}_{m,j,k} \times_1 \widehat{U}_1^\top \times_2 (\widehat{P}_{l,j}^\top \widehat{U}_2)^+ \times_3 \widehat{U}_3^\top$. Then

$$\gamma_i \le \frac{S_H}{\sigma_{S_H}(\widehat{U}^\top O)}\, \big\|\widehat{T}_i - \widetilde{T}_i\big\|_2 \qquad (36)$$

This is because both $\widehat{U}$ and $O$ have spectral norm at most one, and the $S_H$ factor is the cost of converting from the 1-norm to the spectral norm.
$$\widehat{T}_i - \widetilde{T}_i = \widehat{P}_{m,j,k} \times_1 \widehat{U}_1^\top \times_2 (\widehat{P}_{l,j}^\top \widehat{U}_2)^+ \times_3 \widehat{U}_3^\top - P_{m,j,k} \times_1 \widehat{U}_1^\top \times_2 (P_{l,j}^\top \widehat{U}_2)^+ \times_3 \widehat{U}_3^\top \qquad (37)$$

$$= (\widehat{P}_{m,j,k} - P_{m,j,k}) \times_1 \widehat{U}_1^\top \times_2 (\widehat{P}_{l,j}^\top \widehat{U}_2)^+ \times_3 \widehat{U}_3^\top + P_{m,j,k} \times_1 \widehat{U}_1^\top \times_2 \big[(\widehat{P}_{l,j}^\top \widehat{U}_2)^+ - (P_{l,j}^\top \widehat{U}_2)^+\big] \times_3 \widehat{U}_3^\top \qquad (38\text{--}40)$$

so that, by the triangle inequality, submultiplicativity, and the matrix perturbation bound,

$$\big\|\widehat{T}_i - \widetilde{T}_i\big\|_2 \le \frac{\epsilon_{m,j,k}}{\sigma_{S_H}(\widehat{P}_{l,j}^\top \widehat{U}_2)} + \|P_{m,j,k}\|_2\, \frac{2\, \epsilon_{l,j}}{\min\big\{\sigma_{S_H}(P_{l,j}^\top \widehat{U}_2),\ \sigma_{S_H}(\widehat{P}_{l,j}^\top \widehat{U}_2)\big\}^2} \qquad (41\text{--}42)$$

It is clear that $\|P_{m,j,k}\|_2 \le \|P_{m,j,k}\|_F \le 1$. Using the fact that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ gives us the following bound:

$$\gamma_i \le \frac{4 S_H}{\sigma_{S_H}(O)} \left( \frac{\epsilon_{i,j}}{\sigma_{S_H}(P_{i,j})^2} + \frac{\epsilon_{i,j,k}}{3\, \sigma_{S_H}(P_{i,j})} \right) \qquad (43)$$
6.4 Bounding $\Delta_i$

We now seek to bound $\Delta_i = \big\|(\widehat{U}_1^\top O_1)(\widetilde{M}_i - M_i)(\widehat{U}_2^\top O_2)^{-1}\big\|_1$. There are two cases: either $i$ is a leaf or it is not.

6.4.1 $i$ is a leaf node

In this case our proof simply follows HKZ [3] and is repeated here for convenience.

$$\big\|(\widehat{U}_1^\top O_1)(\widetilde{M}_i - M_i)(\widehat{U}_2^\top O_2)^{-1}\big\|_1 \le \frac{\sqrt{S_H}}{\sigma_{S_H}(\widehat{U}_2^\top O_2)}\, \big\|\widetilde{M}_i - M_i\big\|_2 \qquad (44\text{--}45)$$

Note that $\widetilde{M}_i = (\widehat{P}_{j,i}^\top \widehat{U}_1)^+ (\widehat{P}_{m,x_i,j}^\top \widehat{U}_2)$ and $M_i = (P_{j,i}^\top \widehat{U}_1)^+ (P_{m,x_i,j}^\top \widehat{U}_2)$. Then

$$\widetilde{M}_i - M_i = \big[(\widehat{P}_{j,i}^\top \widehat{U}_1)^+ - (P_{j,i}^\top \widehat{U}_1)^+\big] \widehat{P}_{m,x_i,j}^\top \widehat{U}_2 + (P_{j,i}^\top \widehat{U}_1)^+ (\widehat{P}_{m,x_i,j} - P_{m,x_i,j})^\top \widehat{U}_2 \qquad (46\text{--}48)$$

$$\big\|\widetilde{M}_i - M_i\big\|_2 \le \|P_{m,x_i,j}\|_2\, \frac{2\, \epsilon_{j,i}}{\min\big\{\sigma_{S_H}(P_{j,i}^\top \widehat{U}_1),\ \sigma_{S_H}(\widehat{P}_{j,i}^\top \widehat{U}_1)\big\}^2} + \frac{\epsilon_{m,x_i,j}}{\sigma_{S_H}(P_{j,i}^\top \widehat{U}_1)} \qquad (49\text{--}50)$$

where the first inequality follows from the triangle inequality, and the second uses matrix perturbation bounds and the fact that the spectral norm of $\widehat{U}$ is 1. We also use the fact that the spectral norm is at most the Frobenius norm, which is at most the sum of the absolute entries:

$$\|P_{m,x_i,j}\|_2 \le \|P_{m,x_i,j}\|_F \le \sum_{m,j} [P_{m,x_i,j}]_{m,j} = P[X_i = x_i] \qquad (51)$$

Because $O$ is a conditional probability matrix, $\|O\|_1 = 1$, i.e., its max column sum is 1. Using the fact that $\epsilon_{i,j} \le \sigma_{S_H}(P_{i,j})/3$ gives us the following bound:

$$\Delta_{i,x} \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)} \left( \frac{\epsilon_{j,i}\, P[X_i = x]}{\sigma_{S_H}(P_{j,i})^2} + \frac{\epsilon_{m,x,j}}{3\, \sigma_{S_H}(P_{j,i})} \right) \qquad (52)$$

Summing over $x$ gives

$$\Delta_i^{\text{obs}} \le \frac{4\sqrt{S_H}}{\sigma_{S_H}(O)} \left( \frac{\epsilon_{j,i}}{\sigma_{S_H}(P_{j,i})^2} + \frac{\sum_x \epsilon_{m,x,j}}{3\, \sigma_{S_H}(P_{j,i})} \right) \qquad (53)$$
6.4.2 $i$ is not a leaf node

Let $m_{J:1} = M_{j_J} \cdots M_{j_1} \mathbf{1}_i$ and $\widetilde{m}_{J:1} = \widetilde{M}_{j_J} \cdots \widetilde{M}_{j_1} \widetilde{\mathbf{1}}_i$. Then, since $\widetilde{M}_i = \widetilde{T}_i \times_1 \widetilde{m}_{J:1}$ and $M_i = T_i \times_1 m_{J:1}$,

$$(\widehat{U}^\top O)(\widetilde{M}_i - M_i)(\widehat{U}_3^\top O_3)^{-1} = (\widehat{U}^\top O)\Big[(\widetilde{T}_i - T_i) \times_1 m_{J:1} + (\widetilde{T}_i - T_i) \times_1 (\widetilde{m}_{J:1} - m_{J:1}) + T_i \times_1 (\widetilde{m}_{J:1} - m_{J:1})\Big](\widehat{U}_3^\top O_3)^{-1} \qquad (54\text{--}57)$$

Taking $\|\cdot\|_1$ and using submultiplicativity, the first term is bounded by $\gamma_i \sqrt{S_H}$, since $\big\|(\widehat{U}_1^\top O_1)\, m_{J:1}\big\|_1 \le \sqrt{S_H}$. The second term is bounded by $\gamma_i\, \big\|(\widehat{U}_1^\top O_1)(\widetilde{m}_{J:1} - m_{J:1})\big\|_1$. The third term is bounded by $\big\|(\widehat{U}_1^\top O_1)(\widetilde{m}_{J:1} - m_{J:1})\big\|_1$. In the next section, we will see that

$$\big\|(\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\big\|_1 \le \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big( \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H}$$

So the overall bound is

$$\Delta_i \le (1 + \gamma_i) \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big[ (1 + \gamma_i) \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big] \sqrt{S_H} \qquad (65)$$

where $j_1, \ldots, j_J$ are the children of node $i$.

6.5 Bounding $\|(\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\|_1$

Lemma 4

$$\big\|(\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\big\|_1 \le \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big( \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \qquad (66)$$

where $j_1, \ldots, j_J$ are the children of node $i$.

The proof is by induction. Base case: $\big\|(\widehat{U}^\top O)(\widetilde{\mathbf{1}}_i - \mathbf{1}_i)\big\|_1 \le \delta^i_{\text{one}}$, by definition of $\delta^i_{\text{one}}$. Inductive step: suppose the claim holds up to $u - 1$; we show it holds for $u$. Thus

$$\big\|(\widehat{U}^\top O)(\widetilde{m}_{u-1:1} - m_{u-1:1})\big\|_1 \le \prod_{k=1}^{u-1} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big( \prod_{k=1}^{u-1} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \qquad (67)$$
We now decompose the difference as

$$(\widehat{U}^\top O)(\widetilde{m}_{u:1} - m_{u:1}) = (\widehat{U}^\top O)\Big[(\widetilde{M}_u - M_u)\, m_{u-1:1} + (\widetilde{M}_u - M_u)(\widetilde{m}_{u-1:1} - m_{u-1:1}) + M_u (\widetilde{m}_{u-1:1} - m_{u-1:1})\Big] \qquad (68)$$

Using the triangle inequality and submultiplicativity, we obtain the three terms

$$\big\|(\widehat{U}^\top O)(\widetilde{M}_u - M_u)(\widehat{U}_1^\top O_1)^{-1}\big\|_1\, \big\|(\widehat{U}_1^\top O_1)\, m_{u-1:1}\big\|_1 \qquad (69)$$
$$\big\|(\widehat{U}^\top O)(\widetilde{M}_u - M_u)(\widehat{U}_1^\top O_1)^{-1}\big\|_1\, \big\|(\widehat{U}_1^\top O_1)(\widetilde{m}_{u-1:1} - m_{u-1:1})\big\|_1 \qquad (70)$$
$$\big\|(\widehat{U}^\top O)\, M_u\, (\widehat{U}^\top O)^{-1}\big\|_1\, \big\|(\widehat{U}^\top O)(\widetilde{m}_{u-1:1} - m_{u-1:1})\big\|_1 \qquad (71)$$

Again we are just numbering the $U$'s and $O$'s for clarity, to see which corresponds with which; the indices are omitted in the actual theorem statements since we take minimums etc. at the end. We must now bound these terms. The first term is at most $\Delta_u \sqrt{S_H}$, since $\Delta_u = \big\|(\widehat{U}^\top O)(\widetilde{M}_u - M_u)(\widehat{U}_1^\top O_1)^{-1}\big\|_1$ and $\big\|(\widehat{U}_1^\top O_1)\, m_{u-1:1}\big\|_1 \le \sqrt{S_H}$ (72). The second term is bounded using the inductive hypothesis:

$$\Delta_u \left[ \prod_{k=1}^{u-1} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big( \prod_{k=1}^{u-1} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \right] \qquad (73)$$

The third term is bounded by observing that $(\widehat{U}^\top O)\, M_u\, (\widehat{U}^\top O)^{-1} = \mathrm{diag}\big(\Pr[x_u \mid \text{Parent}]\big)$. Thus it is diagonal, and $\Pr[x_u \mid \text{Parent}]$ has max row or column sum at most 1. This means the third term is bounded by the inductive hypothesis as well:

$$\prod_{k=1}^{u-1} (1 + \Delta_{j_k})\, \delta^i_{\text{one}} + \Big( \prod_{k=1}^{u-1} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \qquad (74)$$

Combining the three terms gives the claim for $u$.

7 Bounding the Propagation of Error in the Tree

We now wrap up the proof based on the approach of HKZ [3].

Lemma 5

$$\big|\widehat{P}[x_1, \ldots, x_O] - P[x_1, \ldots, x_O]\big| \le \sqrt{S_H}\, \delta_{\text{root}} + (1 + \delta_{\text{root}}) \left[ \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^r_{\text{one}} + \Big( \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \right] \qquad (75)$$

To see this, write

$$\big|\widehat{P}[x_1, \ldots, x_O] - P[x_1, \ldots, x_O]\big| = \big|\widetilde{r}^\top \widetilde{M}_{j_1} \cdots \widetilde{M}_{j_J} \widetilde{\mathbf{1}}_r - r^\top M_{j_1} \cdots M_{j_J} \mathbf{1}_r\big| \qquad (76)$$

and decompose it by the triangle inequality into three sums:

$$\le \big|(\widetilde{r} - r)^\top (\widehat{U}^\top O)^{-1} (\widehat{U}^\top O)\, m_{J:1}\big| + \big|(\widetilde{r} - r)^\top (\widehat{U}^\top O)^{-1} (\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\big| + \big|r^\top (\widehat{U}^\top O)^{-1} (\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\big| \qquad (77\text{--}79)$$

The first sum is bounded using Hölder's inequality, noting that $O\, m_{J:1}$ is a vector of conditional probabilities of all the observed variables given the root, so $\big\|(\widehat{U}^\top O)\, m_{J:1}\big\|_\infty \le \sqrt{S_H}$:

$$\big|(\widetilde{r} - r)^\top (\widehat{U}^\top O)^{-1} (\widehat{U}^\top O)\, m_{J:1}\big| \le \big\|(\widetilde{r} - r)^\top (\widehat{U}^\top O)^{-1}\big\|_1\, \big\|(\widehat{U}^\top O)\, m_{J:1}\big\|_\infty \le \sqrt{S_H}\, \delta_{\text{root}} \qquad (80\text{--}81)$$
The second sum is bounded by another application of Hölder's inequality and the previous lemma:

$$\big|(\widetilde{r} - r)^\top (\widehat{U}^\top O)^{-1} (\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\big| \le \delta_{\text{root}} \left[ \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^r_{\text{one}} + \Big( \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \right] \qquad (82\text{--}84)$$

The third sum is also bounded by Hölder's inequality and the previous lemmas, noting that the entries of $r^\top (U^\top O)^{-1}$ are root probabilities $P[X_r = k]$, so $\|r^\top (U^\top O)^{-1}\|_\infty \le 1$:

$$\big|r^\top (\widehat{U}^\top O)^{-1} (\widehat{U}^\top O)(\widetilde{m}_{J:1} - m_{J:1})\big| \le \prod_{k=1}^{J} (1 + \Delta_{j_k})\, \delta^r_{\text{one}} + \Big( \prod_{k=1}^{J} (1 + \Delta_{j_k}) - 1 \Big) \sqrt{S_H} \qquad (85\text{--}87)$$

Combining these bounds gives the desired result.

8 Putting It All Together

We seek

$$\big|\widehat{P}[x_1, \ldots, x_O] - P[x_1, \ldots, x_O]\big| \le \epsilon \qquad (88)$$

Using the fact that for $a < 0.5$, $(1 + a/t)^t \le 1 + 2a$, it suffices that $\Delta_{j_k} \le O\big(\epsilon / (\sqrt{S_H}\, J)\big)$. However, $\Delta_j$ is defined recursively, and thus the error accumulates exponentially in the longest path of hidden nodes: for example, in the observed case it suffices that $\epsilon_i \le O\big(\epsilon / (d_{\max} S_H^{\,l})\big)$, where $l$ is the longest path of hidden nodes. Tracing this back through the concentration bounds gives the result: pick any $\epsilon > 0$, $\delta < 1$, and let

$$N \ge O\!\left( \frac{d_{\max}^2\, S_H^{2(l+1)}\, S_O}{\epsilon^2\, \min_i \sigma_{S_H}(O_i)^2\, \min_{i,j} \sigma_{S_H}(P_{i,j})^4}\, \log\frac{O}{\delta} \right) \qquad (89)$$

Then with probability $1 - \delta$,

$$\big|\widehat{P}[x_1, \ldots, x_O] - P[x_1, \ldots, x_O]\big| \le \epsilon \qquad (90)$$

In many cases, if the frequencies of the observation symbols follow certain distributions, then the dependence on $S_O$ can be removed, as shown in HKZ [3].

9 Appendix

9.1 Matrix Perturbation Bounds

This is Theorem 3.8 from p. 143 of Stewart and Sun (1990) [6]. Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, and let $\widetilde{A} = A + E$. Then

$$\big\|\widetilde{A}^+ - A^+\big\|_2 \le \frac{1 + \sqrt{5}}{2}\, \max\big\{\|A^+\|_2^2,\ \|\widetilde{A}^+\|_2^2\big\}\, \|E\|_2 \qquad (91)$$

References

[1] Myung J. Choi, Vincent Y. Tan, Animashree Anandkumar, and Alan S. Willsky. Learning latent tree graphical models. arXiv:1009.2722v1, 2010.

[2] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press.

[3] D. Hsu, S. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Proc. Annual Conf. Computational Learning Theory, 2009.

[4] N.H. Nguyen, P. Drineas, and T.D. Tran. Tensor sparsification via a bound on the spectral norm of random tensors. arXiv preprint, 2010.

[5] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

[6] G.W. Stewart and J. Sun. Matrix Perturbation Theory, volume 175. Academic Press, New York, 1990.
A Single-letter Upper Bound for the Sum Rate of Multiple Access Channels with Correlated Sources Wei Kang Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland, College
More informationA = , A 32 = n ( 1) i +j a i j det(a i j). (1) j=1
Lecture Notes: Determinant of a Square Matrix Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk 1 Determinant Definition Let A [a ij ] be an
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationMaximum Likelihood Estimates for Binary Random Variables on Trees via Phylogenetic Ideals
Maximum Likelihood Estimates for Binary Random Variables on Trees via Phylogenetic Ideals Robin Evans Abstract In their 2007 paper, E.S. Allman and J.A. Rhodes characterise the phylogenetic ideal of general
More informationComputational Graphs, and Backpropagation
Chapter 1 Computational Graphs, and Backpropagation (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction We now describe the backpropagation algorithm for calculation of derivatives
More informationSpectral Probabilistic Modeling and Applications to Natural Language Processing
School of Computer Science Spectral Probabilistic Modeling and Applications to Natural Language Processing Doctoral Thesis Proposal Ankur Parikh Thesis Committee: Eric Xing, Geoff Gordon, Noah Smith, Le
More informationLecture 8 : Eigenvalues and Eigenvectors
CPS290: Algorithmic Foundations of Data Science February 24, 2017 Lecture 8 : Eigenvalues and Eigenvectors Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Hermitian Matrices It is simpler to begin with
More informationMessage Passing and Junction Tree Algorithms. Kayhan Batmanghelich
Message Passing and Junction Tree Algorithms Kayhan Batmanghelich 1 Review 2 Review 3 Great Ideas in ML: Message Passing Each soldier receives reports from all branches of tree 3 here 7 here 1 of me 11
More information2 : Directed GMs: Bayesian Networks
10-708: Probabilistic Graphical Models 10-708, Spring 2017 2 : Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Jayanth Koushik, Hiroaki Hayashi, Christian Perez Topic: Directed GMs 1 Types
More informationA concise proof of Kruskal s theorem on tensor decomposition
A concise proof of Kruskal s theorem on tensor decomposition John A. Rhodes 1 Department of Mathematics and Statistics University of Alaska Fairbanks PO Box 756660 Fairbanks, AK 99775 Abstract A theorem
More informationVariable Elimination: Algorithm
Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product
More informationV C V L T I 0 C V B 1 V T 0 I. l nk
Multifrontal Method Kailai Xu September 16, 2017 Main observation. Consider the LDL T decomposition of a SPD matrix [ ] [ ] [ ] [ ] B V T L 0 I 0 L T L A = = 1 V T V C V L T I 0 C V B 1 V T, 0 I where
More informationNORMS ON SPACE OF MATRICES
NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system
More informationSequence modelling. Marco Saerens (UCL) Slides references
Sequence modelling Marco Saerens (UCL) Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004), Introduction to machine learning.
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationVariable Elimination: Algorithm
Variable Elimination: Algorithm Sargur srihari@cedar.buffalo.edu 1 Topics 1. Types of Inference Algorithms 2. Variable Elimination: the Basic ideas 3. Variable Elimination Sum-Product VE Algorithm Sum-Product
More informationBayesian networks. Chapter Chapter
Bayesian networks Chapter 14.1 3 Chapter 14.1 3 1 Outline Syntax Semantics Parameterized distributions Chapter 14.1 3 2 Bayesian networks A simple, graphical notation for conditional independence assertions
More informationLarge-Deviations and Applications for Learning Tree-Structured Graphical Models
Large-Deviations and Applications for Learning Tree-Structured Graphical Models Vincent Tan Stochastic Systems Group, Lab of Information and Decision Systems, Massachusetts Institute of Technology Thesis
More informationCS281A/Stat241A Lecture 19
CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723
More informationWe Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named
We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,
More information12. Perturbed Matrices
MAT334 : Applied Linear Algebra Mike Newman, winter 208 2. Perturbed Matrices motivation We want to solve a system Ax = b in a context where A and b are not known exactly. There might be experimental errors,
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationA PRIMER ON SESQUILINEAR FORMS
A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form
More informationOn the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar
Proceedings of Machine Learning Research vol 73:153-164, 2017 AMBN 2017 On the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar Kei Amii Kyoto University Kyoto
More informationOrthogonal tensor decomposition
Orthogonal tensor decomposition Daniel Hsu Columbia University Largely based on 2012 arxiv report Tensor decompositions for learning latent variable models, with Anandkumar, Ge, Kakade, and Telgarsky.
More informationMETHOD OF MOMENTS LEARNING FOR LEFT TO RIGHT HIDDEN MARKOV MODELS
METHOD OF MOMENTS LEARNING FOR LEFT TO RIGHT HIDDEN MARKOV MODELS Y. Cem Subakan, Johannes Traa, Paris Smaragdis,,, Daniel Hsu UIUC Computer Science Department, Columbia University Computer Science Department
More informationRamanujan Graphs of Every Size
Spectral Graph Theory Lecture 24 Ramanujan Graphs of Every Size Daniel A. Spielman December 2, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The
More informationTree-Structured Statistical Modeling via Convex Optimization
Tree-Structured Statistical Modeling via Convex Optimization James Saunderson, Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky Abstract We develop a semidefinite-programming-based approach
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationVector, Matrix, and Tensor Derivatives
Vector, Matrix, and Tensor Derivatives Erik Learned-Miller The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher order tensors (arrays with three dimensions
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationProbabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference
More informationOnline Supplement: Managing Hospital Inpatient Bed Capacity through Partitioning Care into Focused Wings
Online Supplement: Managing Hospital Inpatient Bed Capacity through Partitioning Care into Focused Wings Thomas J. Best, Burhaneddin Sandıkçı, Donald D. Eisenstein University of Chicago Booth School of
More informationHidden Markov Models (recap BNs)
Probabilistic reasoning over time - Hidden Markov Models (recap BNs) Applied artificial intelligence (EDA132) Lecture 10 2016-02-17 Elin A. Topp Material based on course book, chapter 15 1 A robot s view
More informationarxiv: v1 [math.ra] 13 Jan 2009
A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationLinear Algebra March 16, 2019
Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented
More informationComparing Bayesian Networks and Structure Learning Algorithms
Comparing Bayesian Networks and Structure Learning Algorithms (and other graphical models) marco.scutari@stat.unipd.it Department of Statistical Sciences October 20, 2009 Introduction Introduction Graphical
More informationSpeeding up Inference in Markovian Models
Speeding up Inference in Markovian Models Theodore Charitos, Peter de Waal and Linda C. van der Gaag Institute of Information and Computing Sciences, Utrecht University P.O Box 80.089, 3508 TB Utrecht,
More informationRecall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem
Recall from last time: Conditional probabilities Our probabilistic models will compute and manipulate conditional probabilities. Given two random variables X, Y, we denote by Lecture 2: Belief (Bayesian)
More information11 The Max-Product Algorithm
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 11 The Max-Product Algorithm In the previous lecture, we introduced
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More informationInference in Graphical Models Variable Elimination and Message Passing Algorithm
Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption
More informationBasic Concepts in Linear Algebra
Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University February 2, 2015 Grady B Wright Linear Algebra Basics February 2, 2015 1 / 39 Numerical Linear Algebra Linear
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More information