Sum Product Algorithm and Hidden Markov Model

2014/2015 -- Cours 7 -- 12th November 2014
Enseignant: Francis Bach
Scribes: Pauline Luc, Mathieu Andreux

7.1 Sum Product Algorithm

Motivations

Inference, along with estimation and decoding, is one of the three key operations one must be able to perform efficiently in graphical models. Given a discrete Gibbs model of the form

    p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C),

where \mathcal{C} is the set of cliques of the graph, inference enables:

- computation of the marginal p(x_i), or more generally p(x_C);
- computation of the partition function Z;
- computation of the conditional marginal p(x_i | X_j = x_j, X_k = x_k);

and, as a consequence:

- computation of the gradient in an exponential family;
- computation of the expected value of the log-likelihood of an exponential family at the E step of the EM algorithm (for example for HMMs).

Example 1: Ising model. Let X = (X_i)_{i \in V} be a vector of random variables taking values in \{0,1\}^V, whose distribution has the exponential form

    p(x) = e^{-A(\eta)} \prod_{i \in V} e^{\eta_i x_i} \prod_{(i,j) \in E} e^{\eta_{i,j} x_i x_j}.    (7.1)

We then have the log-likelihood

    \ell(\eta) = \sum_{i \in V} \eta_i x_i + \sum_{(i,j) \in E} \eta_{i,j} x_i x_j - A(\eta).    (7.2)

We can therefore write the sufficient statistic

    \phi(x) = \big( (x_i)_{i \in V}, \; (x_i x_j)_{(i,j) \in E} \big).    (7.3)
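To make (7.1)-(7.3) concrete, here is a minimal numpy sketch for a toy Ising model; the graph, parameter values, and function names are illustrative assumptions, not part of the notes.

```python
import numpy as np

def ising_suff_stat(x, edges):
    """Sufficient statistic phi(x) of (7.3): node values x_i,
    followed by the products x_i * x_j for each edge (i, j)."""
    node_part = np.asarray(x, dtype=float)
    edge_part = np.array([x[i] * x[j] for (i, j) in edges], dtype=float)
    return np.concatenate([node_part, edge_part])

def log_unnormalized(x, eta, edges):
    """log p(x) + A(eta) = <phi(x), eta>, as in (7.2)."""
    return ising_suff_stat(x, edges) @ eta

# Toy 3-node chain with binary variables
edges = [(0, 1), (1, 2)]
eta = np.array([0.5, -0.2, 0.1, 1.0, -0.5])  # 3 node + 2 edge parameters
x = np.array([1, 0, 1])
print(log_unnormalized(x, eta, edges))       # 0.5 + 0.1 = 0.6
```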
But we have seen that for exponential families

    \ell(\eta) = \phi(x)^T \eta - A(\eta),    (7.4)

    \nabla_\eta \ell(\eta) = \phi(x) - \underbrace{\nabla_\eta A(\eta)}_{E_\eta[\phi(X)]}.    (7.5)

We therefore need to compute E_\eta[\phi(X)]. In the case of the Ising model, we get

    E_\eta[X_i] = P_\eta[X_i = 1],    (7.6)
    E_\eta[X_i X_j] = P_\eta[X_i = 1, X_j = 1].    (7.7)

Equation (7.6) is one of the motivations for solving the inference problem: in order to compute the gradient of the log-likelihood, we need to know the marginal laws.

Example 2: Potts model. The X_i are random variables taking values in \{1, ..., K_i\}. We denote by \delta_{ik} the random variable such that \delta_{ik} = 1 if and only if X_i = k. Then

    p(\delta) = \exp\Big( \sum_{i \in V} \sum_{k=1}^{K_i} \eta_{i,k} \delta_{ik} + \sum_{(i,j) \in E} \sum_{k=1}^{K_i} \sum_{k'=1}^{K_j} \eta_{i,j,k,k'} \delta_{ik} \delta_{jk'} - A(\eta) \Big)    (7.8)

and

    \phi(\delta) = \big( (\delta_{ik})_{i,k}, \; (\delta_{ik} \delta_{jk'})_{i,j,k,k'} \big),    (7.9)

    E_\eta[\delta_{ik}] = P_\eta[X_i = k],    (7.10)
    E_\eta[\delta_{ik} \delta_{jk'}] = P_\eta[X_i = k, X_j = k'].    (7.11)

These examples illustrate the need to perform inference.

Problem: in general, the inference problem is NP-hard.

- For trees, inference is efficient: it is linear in n.
- For tree-like graphs we use the junction tree algorithm, which brings the situation back to that of a tree.
- In the general case we are forced to carry out approximate inference.
7.1.2 Inference on a chain

Let X_i be random variables taking values in \{1, ..., K\}, for i \in V = \{1, ..., n\}, with joint distribution

    p(x) = \frac{1}{Z} \prod_{i=1}^n \psi_i(x_i) \prod_{i=2}^n \psi_{i-1,i}(x_{i-1}, x_i).    (7.12)

We wish to compute p(x_j) for a certain j. The naive solution would be to compute the marginal as

    p(x_j) = \sum_{x_{V \setminus \{j\}}} p(x_1, ..., x_n).    (7.13)

Unfortunately, this calculation has complexity O(K^n). We therefore develop the expression:

    p(x_j) = \frac{1}{Z} \sum_{x_{V \setminus \{j\}}} \prod_{i=1}^n \psi_i(x_i) \prod_{i=2}^n \psi_{i-1,i}(x_{i-1}, x_i)    (7.14)

    = \frac{1}{Z} \sum_{x_{V \setminus \{j\}}} \prod_{i=1}^{n-1} \psi_i(x_i) \prod_{i=2}^{n-1} \psi_{i-1,i}(x_{i-1}, x_i) \, \psi_n(x_n) \psi_{n-1,n}(x_{n-1}, x_n)    (7.15)

    = \frac{1}{Z} \sum_{x_{V \setminus \{j,n\}}} \prod_{i=1}^{n-1} \psi_i(x_i) \prod_{i=2}^{n-1} \psi_{i-1,i}(x_{i-1}, x_i) \sum_{x_n} \psi_n(x_n) \psi_{n-1,n}(x_{n-1}, x_n),    (7.16)

which brings out the message passed by node n to node n-1:

    \mu_{n \to n-1}(x_{n-1}) = \sum_{x_n} \psi_n(x_n) \psi_{n-1,n}(x_{n-1}, x_n).

Continuing, we obtain:

    p(x_j) = \frac{1}{Z} \sum_{x_{V \setminus \{j,n\}}} \prod_{i=1}^{n-1} \psi_i(x_i) \prod_{i=2}^{n-1} \psi_{i-1,i}(x_{i-1}, x_i) \, \mu_{n \to n-1}(x_{n-1})    (7.17)

    = \frac{1}{Z} \sum_{x_{V \setminus \{j,n,n-1\}}} \prod_{i=1}^{n-2} \psi_i(x_i) \prod_{i=2}^{n-2} \psi_{i-1,i}(x_{i-1}, x_i) \underbrace{\sum_{x_{n-1}} \psi_{n-1}(x_{n-1}) \psi_{n-2,n-1}(x_{n-2}, x_{n-1}) \, \mu_{n \to n-1}(x_{n-1})}_{\mu_{n-1 \to n-2}(x_{n-2})}    (7.18)

    = \frac{1}{Z} \sum_{x_{V \setminus \{1,j,n,n-1\}}} \cdots \; \mu_{1 \to 2}(x_2) \cdots \mu_{n-1 \to n-2}(x_{n-2}).    (7.19)
In the above equations, we have implicitly used the following definitions for descending and ascending messages:

    \mu_{j \to j-1}(x_{j-1}) = \sum_{x_j} \psi_j(x_j) \psi_{j-1,j}(x_{j-1}, x_j) \, \mu_{j+1 \to j}(x_j),    (7.20)

    \mu_{j \to j+1}(x_{j+1}) = \sum_{x_j} \psi_j(x_j) \psi_{j,j+1}(x_j, x_{j+1}) \, \mu_{j-1 \to j}(x_j).    (7.21)

Each message costs O(K^2) to compute, so a full sweep of n-1 messages along the chain costs O((n-1) K^2). And finally, we get

    p(x_j) = \frac{1}{Z} \, \mu_{j-1 \to j}(x_j) \, \psi_j(x_j) \, \mu_{j+1 \to j}(x_j).    (7.22)

With only 2(n-1) messages, we have calculated p(x_j) for all j \in V. Z is obtained by summing at any node i:

    Z = \sum_{x_i} \mu_{i-1 \to i}(x_i) \, \psi_i(x_i) \, \mu_{i+1 \to i}(x_i).    (7.23)
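The recursions (7.20)-(7.23) translate directly into code. Below is a minimal numpy sketch that computes all node marginals and Z in O(nK^2); the array shapes and names (psi, psi_pair) are illustrative choices, not fixed by the notes.

```python
import numpy as np

def chain_marginals(psi, psi_pair):
    """Sum-product on a chain.
    psi:      (n, K) node potentials psi_i(x_i)
    psi_pair: (n-1, K, K) edge potentials psi_{i-1,i}(x_{i-1}, x_i)
    Returns (marginals, Z), following (7.20)-(7.23)."""
    n, K = psi.shape
    fwd = np.ones((n, K))   # fwd[i] = mu_{i-1 -> i}, ascending messages
    bwd = np.ones((n, K))   # bwd[i] = mu_{i+1 -> i}, descending messages
    for i in range(1, n):           # (7.21): left-to-right pass
        fwd[i] = psi_pair[i - 1].T @ (psi[i - 1] * fwd[i - 1])
    for i in range(n - 2, -1, -1):  # (7.20): right-to-left pass
        bwd[i] = psi_pair[i] @ (psi[i + 1] * bwd[i + 1])
    unnorm = fwd * psi * bwd        # (7.22), up to the factor 1/Z
    Z = unnorm[0].sum()             # (7.23), the same at every node
    return unnorm / Z, Z

# Toy example: n = 4 variables with K = 3 states, random potentials
rng = np.random.default_rng(0)
psi = rng.uniform(0.5, 1.5, size=(4, 3))
psi_pair = rng.uniform(0.5, 1.5, size=(3, 3, 3))
marg, Z = chain_marginals(psi, psi_pair)
print(marg.sum(axis=1))  # each row sums to 1
```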
Inference in undirected trees

We denote by i the vertex whose marginal law p(x_i) we want to compute, and we set i to be the root of the tree. For all j \in V, we denote by C(j) the set of children of j and by D(j) the set of descendants of j. The joint probability is

    p(x) = \frac{1}{Z} \prod_{i \in V} \psi_i(x_i) \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j).    (7.24)

For a tree with at least two vertices, we define by recurrence

    F(x_i, x_j, x_{D(j)}) = \psi_{i,j}(x_i, x_j) \, \psi_j(x_j) \prod_{k \in C(j)} F(x_j, x_k, x_{D(k)}).    (7.25)

Then, by reformulating the marginal:

    p(x_i) = \frac{1}{Z} \sum_{x_{V \setminus \{i\}}} \psi_i(x_i) \prod_{j \in C(i)} F(x_i, x_j, x_{D(j)})    (7.26)

    = \frac{1}{Z} \psi_i(x_i) \prod_{j \in C(i)} \sum_{x_j, x_{D(j)}} F(x_i, x_j, x_{D(j)})    (7.27)

    = \frac{1}{Z} \psi_i(x_i) \prod_{j \in C(i)} \sum_{x_j, x_{D(j)}} \psi_{i,j}(x_i, x_j) \, \psi_j(x_j) \prod_{k \in C(j)} F(x_j, x_k, x_{D(k)})    (7.28)

    = \frac{1}{Z} \psi_i(x_i) \prod_{j \in C(i)} \sum_{x_j} \psi_{i,j}(x_i, x_j) \, \psi_j(x_j) \prod_{k \in C(j)} \underbrace{\sum_{x_k, x_{D(k)}} F(x_j, x_k, x_{D(k)})}_{\mu_{k \to j}(x_j)}    (7.29)

    = \frac{1}{Z} \psi_i(x_i) \prod_{j \in C(i)} \underbrace{\sum_{x_j} \psi_{i,j}(x_i, x_j) \, \psi_j(x_j) \prod_{k \in C(j)} \mu_{k \to j}(x_j)}_{\mu_{j \to i}(x_i)},    (7.30)

which leads us to the recurrence relation of the Sum Product Algorithm (SPA):

    \mu_{j \to i}(x_i) = \sum_{x_j} \psi_{i,j}(x_i, x_j) \, \psi_j(x_j) \prod_{k \in C(j)} \mu_{k \to j}(x_j).    (7.31)

Sum Product Algorithm (SPA)

Sequential SPA for a rooted tree

For a rooted tree with root i, the Sum Product Algorithm is written as follows:

1. All the leaves n send to their parent \pi_n the message

    \mu_{n \to \pi_n}(x_{\pi_n}) = \sum_{x_n} \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}).    (7.32)

2. Iteratively, at each step, all the nodes k which have received messages from all their children send \mu_{k \to \pi_k}(x_{\pi_k}) to their parent.

3. At the root we have

    p(x_i) = \frac{1}{Z} \psi_i(x_i) \prod_{j \in C(i)} \mu_{j \to i}(x_i).    (7.33)
This algorithm only enables us to compute p(x_i) at the root. To compute all the marginals (as well as the conditional marginals), one must not only collect all the messages from the leaves to the root, but then also distribute them back to the leaves. In fact, the algorithm can then be written independently of the choice of a root.

SPA for an undirected tree

The case of undirected trees is slightly different:

1. All the leaves n send \mu_{n \to \pi_n}(x_{\pi_n}) as in (7.32).

2. At each step, if a node j has not yet sent a message to one of its neighbours, say i, and if it has received messages from all its other neighbours N(j) \setminus \{i\}, it sends to i the message

    \mu_{j \to i}(x_i) = \sum_{x_j} \psi_j(x_j) \, \psi_{j,i}(x_j, x_i) \prod_{k \in N(j) \setminus \{i\}} \mu_{k \to j}(x_j).    (7.34)

Parallel SPA (flooding)

1. Initialise the messages randomly.

2. At each step, each node sends a new message to each of its neighbours, using the messages received at the previous step.

Marginal laws

Once all messages have been passed, we can easily calculate all the marginal laws:

    \forall i \in V, \quad p(x_i) = \frac{1}{Z} \psi_i(x_i) \prod_{k \in N(i)} \mu_{k \to i}(x_i),    (7.35)

    \forall (i,j) \in E, \quad p(x_i, x_j) = \frac{1}{Z} \psi_i(x_i) \, \psi_j(x_j) \, \psi_{i,j}(x_i, x_j) \prod_{k \in N(i) \setminus \{j\}} \mu_{k \to i}(x_i) \prod_{k \in N(j) \setminus \{i\}} \mu_{k \to j}(x_j).    (7.36)
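As an illustration of (7.34)-(7.35), here is a minimal Python sketch of the sequential SPA on an undirected tree, using the "send when ready" schedule of step 2; the dictionary-based representation of potentials is an assumption made for readability, not part of the notes.

```python
import numpy as np

def tree_spa(nodes, edges, psi, psi_edge):
    """Sum-product (7.34)-(7.35) on an undirected tree.
    nodes:    list of node ids
    edges:    list of pairs (i, j)
    psi:      dict node -> (K,) node potential
    psi_edge: dict (i, j) -> (K, K) edge potential, indexed [x_i, x_j]
    Returns dict node -> marginal p(x_i)."""
    nbrs = {i: set() for i in nodes}
    for i, j in edges:
        nbrs[i].add(j); nbrs[j].add(i)

    def pot(a, b):  # psi_{a,b} as a matrix indexed [x_a, x_b]
        return psi_edge[(a, b)] if (a, b) in psi_edge else psi_edge[(b, a)].T

    msg = {}  # (j, i) -> message mu_{j -> i}(x_i)
    pending = [(j, i) for j in nodes for i in nbrs[j]]
    while pending:  # send mu_{j->i} once all mu_{k->j}, k != i, are known
        for (j, i) in list(pending):
            if all((k, j) in msg for k in nbrs[j] - {i}):
                prod = psi[j].copy()
                for k in nbrs[j] - {i}:
                    prod *= msg[(k, j)]
                msg[(j, i)] = pot(j, i).T @ prod  # sum over x_j, as in (7.34)
                pending.remove((j, i))

    marginals = {}
    for i in nodes:
        m = psi[i].copy()
        for k in nbrs[i]:
            m *= msg[(k, i)]
        marginals[i] = m / m.sum()  # (7.35); normalizing recovers 1/Z
    return marginals
```

On a tree this schedule always makes progress (at least the leaves can send), so the while loop terminates after passing each of the 2|E| messages exactly once.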
Conditional probabilities

We can use a clever notation to calculate the conditional probabilities. Suppose that we want to compute p(x_i | x_5 = 3, x_{10} = 2) \propto p(x_i, x_5 = 3, x_{10} = 2). We can set

    \tilde{\psi}_5(x_5) = \psi_5(x_5) \, \delta(x_5, 3),

and similarly for node 10. Generally speaking, if we observe X_j = x_{j,0} for j \in J_{obs}, we can define the modified potentials

    \tilde{\psi}_j(x_j) = \psi_j(x_j) \, \delta(x_j, x_{j,0})

such that

    p(x | X_{J_{obs}} = x_{J_{obs},0}) = \frac{1}{\tilde{Z}} \prod_{i \in V} \tilde{\psi}_i(x_i) \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j).    (7.37)

Indeed, we have

    p(x | X_{J_{obs}} = x_{J_{obs},0}) \, p(X_{J_{obs}} = x_{J_{obs},0}) = p(x) \prod_{j \in J_{obs}} \delta(x_j, x_{j,0}),    (7.38)

so that by dividing this equality by p(X_{J_{obs}} = x_{J_{obs},0}) we obtain the previous equation with \tilde{Z} = Z \, p(X_{J_{obs}} = x_{J_{obs},0}). We then simply apply the SPA to these new potentials to compute the marginal laws p(x_i | X_{J_{obs}} = x_{J_{obs},0}).

Remarks

- The SPA is also called belief propagation or message passing.
- On trees, it is an exact inference algorithm.
- If G is not a tree, the algorithm does not converge in general to the right marginal laws, but sometimes gives reasonable approximations. We then speak of loopy belief propagation, which is still often used in practice.
- The only property that we have used to construct the algorithm is the fact that (\mathbb{R}, +, \times) is a semi-ring. It is interesting to notice that the same can therefore also be done with (\mathbb{R}_+, \max, \times) and (\mathbb{R}, \max, +).

Example. For (\mathbb{R}_+, \max, \times) we obtain the max-product algorithm, also called the Viterbi algorithm, which enables us to solve the decoding problem, namely to compute the most probable configuration of the variables given fixed parameters, thanks to the messages

    \mu_{j \to i}(x_i) = \max_{x_j} \Big[ \psi_{i,j}(x_i, x_j) \, \psi_j(x_j) \prod_{k \in C(j)} \mu_{k \to j}(x_j) \Big].    (7.39)

If we run the max-product algorithm with respect to a chosen root, the collection phase of the messages towards the root enables us to compute the maximal probability over all configurations, and if at each computation of a message we have also kept the argmax, we can perform a distribution phase which, instead of propagating the messages, recursively computes one of the configurations achieving the maximum, as in the sketch below.
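Here is a minimal sketch of this procedure on a chain, working in log-space (the (\mathbb{R}, \max, +) semi-ring) and keeping backpointers for the distribution phase; the names and array shapes are illustrative assumptions.

```python
import numpy as np

def viterbi_chain(log_psi, log_psi_pair):
    """Max-product in log-space on a chain.
    log_psi:      (n, K) log node potentials
    log_psi_pair: (n-1, K, K) log edge potentials
    Returns one maximizing configuration and its unnormalized log-score."""
    n, K = log_psi.shape
    score = log_psi[0].copy()             # best log-score ending at x_0
    back = np.zeros((n - 1, K), dtype=int)
    for i in range(1, n):                 # collection phase, keep argmax
        cand = score[:, None] + log_psi_pair[i - 1] + log_psi[i][None, :]
        back[i - 1] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    x = np.zeros(n, dtype=int)            # distribution phase: backtrack
    x[-1] = score.argmax()
    for i in range(n - 2, -1, -1):
        x[i] = back[i, x[i + 1]]
    return x, score.max()
```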
In practice, we may be working with such small values that the computer will return errors (underflow). For instance, for k binary variables, the 2^k values of the joint law p(x_1, ..., x_k) sum to 1, so individual values become infinitesimal for large k. The solution is to work with logarithms: if p = \sum_i p_i, by setting a_i = \log(p_i) we have

    \log(p) = \log\Big[ \sum_i e^{a_i} \Big] = a^* + \log\Big[ \sum_i e^{a_i - a^*} \Big],    (7.40)

with a^* = \max_i a_i. Using logarithms in this way ensures numerical stability.
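A direct implementation of (7.40), the so-called log-sum-exp trick (a version of which also exists as scipy.special.logsumexp); the toy values below are illustrative.

```python
import numpy as np

def log_sum_exp(a):
    """log(sum_i exp(a_i)) computed as in (7.40): shifting by
    a* = max_i a_i so that the largest exponential is exp(0) = 1."""
    a = np.asarray(a, dtype=float)
    a_star = a.max()
    return a_star + np.log(np.exp(a - a_star).sum())

# Without the shift, exp(-2000) underflows to 0 and the log would be -inf
a = np.array([-2000.0, -2000.7, -2001.3])
print(log_sum_exp(a))  # approx -1999.43, computed stably
```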
Proof of the algorithm

We are going to prove that the SPA is correct by recurrence. In the case of two nodes, we have

    p(x_1, x_2) = \frac{1}{Z} \psi_1(x_1) \, \psi_2(x_2) \, \psi_{1,2}(x_1, x_2).

We marginalize, and we obtain

    p(x_1) = \frac{1}{Z} \psi_1(x_1) \underbrace{\sum_{x_2} \psi_{1,2}(x_1, x_2) \, \psi_2(x_2)}_{\mu_{2 \to 1}(x_1)}.

We can hence deduce

    p(x_1) = \frac{1}{Z} \psi_1(x_1) \, \mu_{2 \to 1}(x_1)  and  p(x_2) = \frac{1}{Z} \psi_2(x_2) \, \mu_{1 \to 2}(x_2).

We assume that the result is true for trees of size n-1, and we consider a tree T of size n. Without loss of generality, we can assume that the nodes are numbered so that the n-th is a leaf, and we call \pi_n its parent (which is unique, the graph being a tree). The first message to be passed is

    \mu_{n \to \pi_n}(x_{\pi_n}) = \sum_{x_n} \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}),    (7.41)

and the last message to be passed is

    \mu_{\pi_n \to n}(x_n) = \sum_{x_{\pi_n}} \psi_{\pi_n}(x_{\pi_n}) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}) \prod_{k \in N(\pi_n) \setminus \{n\}} \mu_{k \to \pi_n}(x_{\pi_n}).    (7.42)

We are going to construct a tree \tilde{T} of size n-1, as well as a family of potentials, such that the 2(n-2) messages passed in \tilde{T} (i.e., all the messages except for the first and the last) are equal to the corresponding 2(n-2) messages passed in T. We define the tree and the potentials as follows: \tilde{T} = (\tilde{V}, \tilde{E}) with \tilde{V} = \{1, ..., n-1\} and \tilde{E} = E \setminus \{(n, \pi_n)\} (i.e., it is the subtree corresponding to the first n-1 vertices). The potentials are all the same as those of T, except for the potential

    \tilde{\psi}_{\pi_n}(x_{\pi_n}) = \psi_{\pi_n}(x_{\pi_n}) \, \mu_{n \to \pi_n}(x_{\pi_n}).    (7.43)

The root is unchanged, and the topological order is also kept. We then obtain two important properties.

1) The distribution defined by the potentials of the tree of size n-1 is

    \tilde{p}(x_1, ..., x_{n-1}) = \frac{1}{Z} \tilde{\psi}_{\pi_n}(x_{\pi_n}) \prod_{i \neq n, \pi_n} \psi_i(x_i) \prod_{(i,j) \in \tilde{E}} \psi_{i,j}(x_i, x_j)

    = \frac{1}{Z} \prod_{i \neq n, \pi_n} \psi_i(x_i) \prod_{(i,j) \in \tilde{E}} \psi_{i,j}(x_i, x_j) \sum_{x_n} \psi_n(x_n) \, \psi_{\pi_n}(x_{\pi_n}) \, \psi_{n,\pi_n}(x_n, x_{\pi_n})

    = \sum_{x_n} \frac{1}{Z} \prod_{i=1}^n \psi_i(x_i) \prod_{(i,j) \in E} \psi_{i,j}(x_i, x_j)

    = \sum_{x_n} p(x_1, ..., x_{n-1}, x_n),

which shows that these new potentials define on (X_1, ..., X_{n-1}) exactly the distribution induced by p when marginalizing X_n.

2) All of the messages passed in \tilde{T} correspond to the messages passed in T (except for the first and the last).

Now, with the recurrence hypothesis that the SPA is correct for trees of size n-1, we are going to show that it is correct for trees of size n. For nodes i \neq n, \pi_n, the result is immediate, as all the messages passed are the same:

    \forall i \in V \setminus \{n, \pi_n\}, \quad p(x_i) = \frac{1}{Z} \psi_i(x_i) \prod_{k \in N(i)} \mu_{k \to i}(x_i).    (7.44)
For the case i = \pi_n, we deduce:

    p(x_{\pi_n}) = \frac{1}{Z} \tilde{\psi}_{\pi_n}(x_{\pi_n}) \prod_{k \in \tilde{N}(\pi_n)} \mu_{k \to \pi_n}(x_{\pi_n})   (product over the neighbours of \pi_n in \tilde{T})

    = \frac{1}{Z} \tilde{\psi}_{\pi_n}(x_{\pi_n}) \prod_{k \in N(\pi_n) \setminus \{n\}} \mu_{k \to \pi_n}(x_{\pi_n})

    = \frac{1}{Z} \psi_{\pi_n}(x_{\pi_n}) \, \mu_{n \to \pi_n}(x_{\pi_n}) \prod_{k \in N(\pi_n) \setminus \{n\}} \mu_{k \to \pi_n}(x_{\pi_n})

    = \frac{1}{Z} \psi_{\pi_n}(x_{\pi_n}) \prod_{k \in N(\pi_n)} \mu_{k \to \pi_n}(x_{\pi_n}).

For the case i = n, we have:

    p(x_n, x_{\pi_n}) = \sum_{x_{V \setminus \{n, \pi_n\}}} p(x) = \psi_n(x_n) \, \psi_{\pi_n}(x_{\pi_n}) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}) \underbrace{\sum_{x_{V \setminus \{n, \pi_n\}}} \frac{p(x)}{\psi_n(x_n) \, \psi_{\pi_n}(x_{\pi_n}) \, \psi_{n,\pi_n}(x_n, x_{\pi_n})}}_{\alpha(x_{\pi_n})},

where the remaining sum depends only on x_{\pi_n}, since x_n appears only in the factors that have been divided out. Consequently:

    p(x_n, x_{\pi_n}) = \psi_{\pi_n}(x_{\pi_n}) \, \alpha(x_{\pi_n}) \, \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}),    (7.45)

    p(x_{\pi_n}) = \psi_{\pi_n}(x_{\pi_n}) \, \alpha(x_{\pi_n}) \underbrace{\sum_{x_n} \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n})}_{\mu_{n \to \pi_n}(x_{\pi_n})}.

Hence:

    \alpha(x_{\pi_n}) = \frac{p(x_{\pi_n})}{\psi_{\pi_n}(x_{\pi_n}) \, \mu_{n \to \pi_n}(x_{\pi_n})}.    (7.46)

By using (7.31), (7.32) and the previous result, we deduce that:

    p(x_n, x_{\pi_n}) = \psi_{\pi_n}(x_{\pi_n}) \, \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}) \, \frac{p(x_{\pi_n})}{\psi_{\pi_n}(x_{\pi_n}) \, \mu_{n \to \pi_n}(x_{\pi_n})}

    = \psi_{\pi_n}(x_{\pi_n}) \, \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}) \, \frac{\frac{1}{Z} \psi_{\pi_n}(x_{\pi_n}) \prod_{k \in N(\pi_n)} \mu_{k \to \pi_n}(x_{\pi_n})}{\psi_{\pi_n}(x_{\pi_n}) \, \mu_{n \to \pi_n}(x_{\pi_n})}

    = \frac{1}{Z} \psi_{\pi_n}(x_{\pi_n}) \, \psi_n(x_n) \, \psi_{n,\pi_n}(x_n, x_{\pi_n}) \prod_{k \in N(\pi_n) \setminus \{n\}} \mu_{k \to \pi_n}(x_{\pi_n}).
By summing with respect to x_{\pi_n}, we get the result for p(x_n):

    p(x_n) = \sum_{x_{\pi_n}} p(x_n, x_{\pi_n}) = \frac{1}{Z} \psi_n(x_n) \, \mu_{\pi_n \to n}(x_n).

Proposition. Let p \in \mathcal{L}(G), for G = (V, E) a tree. Then we have:

    p(x_1, ..., x_n) = \prod_{i \in V} p(x_i) \prod_{(i,j) \in E} \frac{p(x_i, x_j)}{p(x_i) \, p(x_j)}.    (7.47)

Proof: we prove it by recurrence. The case n = 1 is trivial. Then, assuming that n is a leaf, we can write p(x_1, ..., x_n) = p(x_1, ..., x_{n-1}) \, p(x_n | x_{\pi_n}). But multiplying by

    p(x_n | x_{\pi_n}) = \frac{p(x_n, x_{\pi_n})}{p(x_n) \, p(x_{\pi_n})} \, p(x_n)

boils down to adding the edge factor for (n, \pi_n) and the node factor for the leaf n. The formula is hence verified by recurrence.

Junction tree

The junction tree algorithm is designed to tackle the problem of inference on general graphs. The idea is to look at a general graph from far away, where it can be seen as a tree. By merging nodes, one will hopefully be able to build a tree. When this is not the case, one can also think of adding some edges to the graph (i.e., casting the present distribution into a larger set) so as to be able to build such a tree. The trap is that if one collapses too many nodes, the number of possible values explodes, and with it the complexity of the whole algorithm. The tree-width is the smallest maximal clique size (minus one) achievable by such constructions. For instance, for a 2D regular grid with n points, the tree-width is of order \sqrt{n}.

7.2 Hidden Markov Model (HMM)

The Hidden Markov Model ("modèle de Markov caché" in French) is one of the most used graphical models. We denote by z_0, z_1, ..., z_T the states corresponding to the latent variables, and by y_0, y_1, ..., y_T the observed variables. We further assume that:

1. \{z_0, ..., z_T\} is a Markov chain (hence the name of the model);

2. z_0, ..., z_T take K values;

3. z_0 follows a multinomial distribution: p(z_0 = i) = (\pi_0)_i with \sum_i (\pi_0)_i = 1;
4. the transition probabilities are homogeneous: p(z_t = i | z_{t-1} = j) = A_{ij}, where A satisfies \sum_i A_{ij} = 1;

5. the emission probabilities p(y_t | z_t) are homogeneous, i.e. p(y_t | z_t) = f(y_t, z_t);

6. the joint probability distribution function can be written as

    p(z_0, ..., z_T, y_0, ..., y_T) = p(z_0) \prod_{t=0}^{T-1} p(z_{t+1} | z_t) \prod_{t=0}^{T} p(y_t | z_t).

There are different tasks that we want to perform on this model:

- Filtering: p(z_t | y_0, ..., y_t)
- Smoothing: p(z_t | y_0, ..., y_T)
- Decoding: \max_{z_0, ..., z_T} p(z_0, ..., z_T | y_0, ..., y_T)

All these tasks can be performed with a sum-product or max-product algorithm.

Sum-product

From now on, we denote the observations by \bar{y} = \{\bar{y}_0, ..., \bar{y}_T\}. The distribution on y_t simply becomes the delta function \delta(y_t = \bar{y}_t). To use the sum-product algorithm, we define z_T as the root, i.e. we send all forward messages towards z_T and go back afterwards.

Forward:

    \mu_{y_0 \to z_0}(z_0) = \sum_{y_0} \delta(y_0 = \bar{y}_0) \, p(y_0 | z_0) = p(\bar{y}_0 | z_0)

    \mu_{z_0 \to z_1}(z_1) = \sum_{z_0} p(z_1 | z_0) \, \mu_{y_0 \to z_0}(z_0)

    \vdots

    \mu_{y_{t-1} \to z_{t-1}}(z_{t-1}) = \sum_{y_{t-1}} \delta(y_{t-1} = \bar{y}_{t-1}) \, p(y_{t-1} | z_{t-1}) = p(\bar{y}_{t-1} | z_{t-1})

    \mu_{z_{t-1} \to z_t}(z_t) = \sum_{z_{t-1}} p(z_t | z_{t-1}) \, \mu_{z_{t-2} \to z_{t-1}}(z_{t-1}) \, \mu_{y_{t-1} \to z_{t-1}}(z_{t-1})

Let us define the alpha-message \alpha_t(z_t) as

    \alpha_t(z_t) = \mu_{y_t \to z_t}(z_t) \, \mu_{z_{t-1} \to z_t}(z_t),

where \alpha_0(z_0) is initialized with the virtual message \mu_{z_{-1} \to z_0}(z_0) = p(z_0).
Property 7.1

    \alpha_t(z_t) = \mu_{y_t \to z_t}(z_t) \, \mu_{z_{t-1} \to z_t}(z_t) = p(z_t, \bar{y}_0, ..., \bar{y}_t)

This is due to the definition of the messages: the product \mu_{y_t \to z_t}(z_t) \, \mu_{z_{t-1} \to z_t}(z_t) represents a marginal of the distribution corresponding to the sub-HMM \{z_0, ..., z_t\}. Moreover, the following recursion formula for \alpha_t(z_t) (called "alpha-recursion") holds:

    \alpha_{t+1}(z_{t+1}) = p(\bar{y}_{t+1} | z_{t+1}) \sum_{z_t} p(z_{t+1} | z_t) \, \alpha_t(z_t)

Backward:

    \mu_{z_{t+1} \to z_t}(z_t) = \sum_{z_{t+1}} p(z_{t+1} | z_t) \, \mu_{z_{t+2} \to z_{t+1}}(z_{t+1}) \, p(\bar{y}_{t+1} | z_{t+1})

We define the beta-message \beta_t(z_t) as \beta_t(z_t) = \mu_{z_{t+1} \to z_t}(z_t). As an initialization, we take \beta_T(z_T) = 1. The following recursion formula for \beta_t(z_t) (called "beta-recursion") holds:

    \beta_t(z_t) = \sum_{z_{t+1}} p(z_{t+1} | z_t) \, p(\bar{y}_{t+1} | z_{t+1}) \, \beta_{t+1}(z_{t+1})

Property 7.2

1. p(z_t, \bar{y}_0, ..., \bar{y}_T) = \alpha_t(z_t) \, \beta_t(z_t)

2. p(\bar{y}_0, ..., \bar{y}_T) = \sum_{z_t} \alpha_t(z_t) \, \beta_t(z_t)

3. p(z_t | \bar{y}_0, ..., \bar{y}_T) = \frac{p(z_t, \bar{y}_0, ..., \bar{y}_T)}{p(\bar{y}_0, ..., \bar{y}_T)} = \frac{\alpha_t(z_t) \, \beta_t(z_t)}{\sum_{z_t'} \alpha_t(z_t') \, \beta_t(z_t')}

4. For all t < T, p(z_t, z_{t+1} | \bar{y}_0, ..., \bar{y}_T) = \frac{1}{p(\bar{y}_0, ..., \bar{y}_T)} \, \alpha_t(z_t) \, \beta_{t+1}(z_{t+1}) \, p(z_{t+1} | z_t) \, p(\bar{y}_{t+1} | z_{t+1})

Remark 7.3

1. The alpha-recursion and beta-recursion are easy to implement, but one needs to avoid errors in the indices!

2. The sums need to be coded using logs (or rescaling, as in the sketch below) in order to prevent numerical errors.
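As an illustration, here is a minimal numpy sketch of the alpha- and beta-recursions using per-step rescaling (a standard alternative to logs for avoiding underflow, not the method stated in the notes). The convention A[i, j] = p(z_t = i | z_{t-1} = j) matches assumption 4, and the emission likelihoods are assumed precomputed; names are illustrative.

```python
import numpy as np

def forward_backward(pi0, A, lik):
    """Alpha- and beta-recursions with per-step rescaling.
    pi0: (K,) initial distribution p(z_0)
    A:   (K, K) transitions, A[i, j] = p(z_t = i | z_{t-1} = j)
    lik: (T+1, K) emission likelihoods, lik[t, i] = p(y_t | z_t = i)
    Returns smoothed posteriors p(z_t | y_0..y_T) and log p(y_0..y_T)."""
    T1, K = lik.shape
    alpha = np.zeros((T1, K)); beta = np.ones((T1, K))
    c = np.zeros(T1)                     # rescaling constants
    alpha[0] = pi0 * lik[0]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T1):               # alpha-recursion
        alpha[t] = lik[t] * (A @ alpha[t - 1])
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T1 - 2, -1, -1):      # beta-recursion, rescaled
        beta[t] = A.T @ (lik[t + 1] * beta[t + 1]) / c[t + 1]
    post = alpha * beta                  # p(z_t | y), already normalized
    return post, np.log(c).sum()         # sum of log c_t = log p(y)
```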
EM algorithm

With the previous notations and assumptions, we write the complete log-likelihood \ell_c(\theta), with \theta the parameters of the model (containing (\pi_0, A), but also the parameters of f):

    \ell_c(\theta) = \log \Big( p(z_0) \prod_{t=0}^{T-1} p(z_{t+1} | z_t) \prod_{t=0}^{T} p(y_t | z_t) \Big)

    = \log p(z_0) + \sum_{t=0}^{T-1} \log p(z_{t+1} | z_t) + \sum_{t=0}^{T} \log p(y_t | z_t)

    = \sum_{i=1}^{K} \delta(z_0 = i) \log (\pi_0)_i + \sum_{t=0}^{T-1} \sum_{i,j=1}^{K} \delta(z_{t+1} = i, z_t = j) \log A_{i,j} + \sum_{t=0}^{T} \sum_{i=1}^{K} \delta(z_t = i) \log f(y_t, i)

When applying EM to estimate the parameters of this HMM, we use Jensen's inequality to obtain a lower bound on the log-likelihood:

    \log p(\bar{y}_0, ..., \bar{y}_T) \geq E_q[\log p(z_0, ..., z_T, \bar{y}_0, ..., \bar{y}_T)] = E_q[\ell_c(\theta)]

At the k-th expectation step, we use q(z_0, ..., z_T) = p(z_0, ..., z_T | \bar{y}_0, ..., \bar{y}_T; \theta_{k-1}), and this boils down to applying the following rules:

    E[\delta(z_0 = i) | \bar{y}] = p(z_0 = i | \bar{y}; \theta_{k-1})
    E[\delta(z_t = i) | \bar{y}] = p(z_t = i | \bar{y}; \theta_{k-1})
    E[\delta(z_{t+1} = i, z_t = j) | \bar{y}] = p(z_{t+1} = i, z_t = j | \bar{y}; \theta_{k-1})

Thus, in the former expression of the complete log-likelihood, we just have to replace \delta(z_0 = i) by p(z_0 = i | \bar{y}; \theta_{k-1}), and similarly for the other terms. At the k-th maximization step, we maximize the resulting expression with respect to the parameters \theta in the usual manner to obtain a new estimator \theta_k. The key point is that everything decouples, so the maximization is simple and can be done in closed form.
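For instance, with the posteriors of Property 7.2 (items 3 and 4) stacked into arrays, the decoupled M-step updates for \pi_0 and A take the following form; this is a sketch under the conventions above, and the emission update (which depends on the form of f) is omitted.

```python
import numpy as np

def m_step(post, pair_post):
    """Closed-form M-step for pi0 and A.
    post:      (T+1, K) posteriors p(z_t = i | y; theta_{k-1})
    pair_post: (T, K, K) posteriors p(z_{t+1} = i, z_t = j | y; theta_{k-1}),
               indexed [t, i, j]
    Each block of l_c(theta) is maximized independently."""
    pi0 = post[0] / post[0].sum()
    A = pair_post.sum(axis=0)           # expected transition counts
    A /= A.sum(axis=0, keepdims=True)   # enforce sum_i A_ij = 1
    return pi0, A
```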
7.3 Principal Component Analysis (PCA)

Framework: x_1, ..., x_N \in \mathbb{R}^d.
Goal: find the affine subspace closest to the points.

Analysis view

Find w \in \mathbb{R}^d such that Var(x^T w) is maximal, subject to \|w\| = 1. With centered data, i.e. \frac{1}{N} \sum_{n=1}^N x_n = 0, the empirical variance is

    \widehat{Var}(x^T w) = \frac{1}{N} \sum_{n=1}^N (x_n^T w)^2 = \frac{1}{N} w^T (X^T X) w,

where X \in \mathbb{R}^{N \times d} is the design matrix. In this case, w is the eigenvector of X^T X with largest eigenvalue. It is not obvious a priori that this is the direction we care about.

If more than one direction is required, one can use deflation:

1. Find w.
2. Project the x_n onto the orthogonal complement of Vect(w).
3. Start again.

Synthesis view

    \min_{w} \sum_{n=1}^N d(x_n, \{x : w^T x = 0\})^2, with w \in \mathbb{R}^d, \|w\| = 1.

Advantage: if one wants more than one dimension, it suffices to replace \{x : w^T x = 0\} by any subspace.
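Equivalently to deflation, the top k directions can be obtained at once from the eigendecomposition of X^T X; here is a minimal numpy sketch (the function name and toy data are illustrative).

```python
import numpy as np

def pca_directions(X, k):
    """Top-k principal directions of a design matrix X of shape (N, d).
    Center, then take eigenvectors of X^T X with largest eigenvalues;
    eigh returns eigenvalues in ascending order, hence the [::-1]."""
    Xc = X - X.mean(axis=0)              # center the data
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc)
    W = eigvec[:, ::-1][:, :k]           # (d, k), orthonormal columns
    return W, Xc @ W                     # directions and projected data

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
W, scores = pca_directions(X, 2)
print(W.T @ W)  # close to the 2 x 2 identity
```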
Probabilistic approach: Factor Analysis

Model:

    \Lambda = (\lambda_1, ..., \lambda_k) \in \mathbb{R}^{d \times k}
    X \in \mathbb{R}^k, \quad X \sim N(0, I)
    \epsilon \sim N(0, \Psi), \quad \epsilon \in \mathbb{R}^d independent from X, with \Psi diagonal
    Y \in \mathbb{R}^d : \quad Y = \Lambda X + \mu + \epsilon

We have Y | X \sim N(\Lambda X + \mu, \Psi). Problem: find X | Y. The pair (X, Y) is a Gaussian vector on \mathbb{R}^{d+k} which satisfies:

    E[X] = 0 = \mu_X
    E[Y] = E[\Lambda X + \mu + \epsilon] = \mu = \mu_Y
    \Sigma_{XX} = I
    \Sigma_{XY} = Cov(X, \Lambda X + \mu + \epsilon) = Cov(X, \Lambda X) = \Lambda^T
    \Sigma_{YY} = Var(\Lambda X + \epsilon) = Var(\Lambda X) + Var(\epsilon) = \Lambda \Lambda^T + \Psi

Thanks to the results we know on Gaussian vectors, we know how to compute X | Y:

    E[X | Y = y] = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (y - \mu_Y)
    Cov[X | Y = y] = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}

In our case, we therefore have:

    E[X | Y = y] = \Lambda^T (\Lambda \Lambda^T + \Psi)^{-1} (y - \mu)
    Cov[X | Y = y] = I - \Lambda^T (\Lambda \Lambda^T + \Psi)^{-1} \Lambda

To apply EM, one needs to write down the complete log-likelihood:

    \log p(X, Y) \propto -\frac{1}{2} X^T X - \frac{1}{2} (Y - \Lambda X - \mu)^T \Psi^{-1} (Y - \Lambda X - \mu) - \frac{1}{2} \log \det \Psi

Trap: E[X X^T | Y] \neq Cov(X | Y). Rather,

    E[X X^T | Y] = Cov(X | Y) + E[X | Y] \, E[X | Y]^T.

Remark 7.4. Cov(Y) = \Lambda \Lambda^T + \Psi: our parameters are not identifiable, as \Lambda \to \Lambda R with R a rotation gives the same results (in other words, a subspace has different orthonormal bases).

Why do we care?

1. A probabilistic interpretation allows us to model the problem in a finer way.
2. It is very flexible and therefore allows us to combine multiple models.
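Gathering the Gaussian conditioning formulas and the "trap" above, here is a minimal numpy sketch of the factor-analysis E-step quantities; the function name is illustrative and \Psi is assumed given as its diagonal.

```python
import numpy as np

def fa_posterior(y, Lam, mu, Psi_diag):
    """Posterior of X given Y = y in the factor analysis model
    Y = Lam X + mu + eps, with X ~ N(0, I), eps ~ N(0, diag(Psi_diag)).
    Returns E[X|y], Cov[X|y] and the E-step moment E[X X^T | y]."""
    d, k = Lam.shape
    Syy = Lam @ Lam.T + np.diag(Psi_diag)       # Sigma_YY
    G = Lam.T @ np.linalg.inv(Syy)              # Sigma_XY Sigma_YY^{-1}
    mean = G @ (y - mu)                         # E[X | Y = y]
    cov = np.eye(k) - G @ Lam                   # Cov[X | Y = y]
    second_moment = cov + np.outer(mean, mean)  # the "trap": not just cov
    return mean, cov, second_moment
```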