Lecture 6: April 19, 2002


EE596 Pat. Recog. II: Introduction to Graphical Models                          Spring 2002
University of Washington, Dept. of Electrical Engineering

Lecture 6: April 19, 2002
Lecturer: Jeff Bilmes                                  Scribe: Huaning Niu, Özgür Çetin

6.1 Factor Analysis

In the last lecture we described a projection model which tries to model the variance at each individual component of the data vector, including the noise. In this section we describe the Factor Analysis (FA) model, which instead tries to model the correlations among the individual components, to the extent that they are generated by underlying low-dimensional hidden variables, and asks whether or not such an underlying distribution exists.

FA was originally developed by psychologists to find the contributions of different factors to mental abilities. For example, given the performance of a test subject in English, French and Spanish, we would like to find the underlying mental abilities: find the factor loadings λ = [λ_1, λ_2, λ_3]^T, where λ_i = [λ_{i1}, λ_{i2}]^T, the common factors f = [f_1, f_2]^T and the disturbance terms v = [v_1, v_2, v_3]^T such that

    y_i = λ_i^T f + v_i                                                          (6.1)

where

    Y = [y_1 (French), y_2 (English), y_3 (Spanish)]^T   and   f = [f_1 (illiteracy), f_2 (intelligence)]^T.

Since we are trying to model covariances instead of variances, FA is invariant to the level of noise at the individual components, but it is sensitive to the direction of the data.

The Factor Analysis Model

Let Y be the p-component observation vector of variables y_1, ..., y_p. The FA model states that the y_i's are linear combinations of a small set of common factors f_1, ..., f_k, plus random errors and a constant offset to account for the nonzero means:

    y_1 = λ_{11} f_1 + λ_{12} f_2 + ... + λ_{1k} f_k + v_1 + µ_1                 (6.2)
    y_2 = λ_{21} f_1 + λ_{22} f_2 + ... + λ_{2k} f_k + v_2 + µ_2
     ...
    y_p = λ_{p1} f_1 + λ_{p2} f_2 + ... + λ_{pk} f_k + v_p + µ_p

where the f_j's are the k underlying common factors, the v_i's represent the combined effects of specific factors and error, the µ_i's are the means of each variable, and the λ_{ik}'s are the factor loadings giving the effect of the kth common factor on the ith variable. Usually k ≪ p. Written in matrix form, the FA model becomes

    Y = ΛX + v + µ,                                                              (6.3)

where Λ is called the factor loading matrix and we have replaced f by X to be consistent with the notation of the other sections (X plays the role of the common factors and v of the disturbances).
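
The generative process behind (6.1)-(6.3) is easy to simulate. Below is a minimal numpy sketch for the three-test, two-factor example; the loadings, noise variances and mean scores are invented for illustration, and the empirical covariance is compared against ΛΛ^T + Ψ, the form derived in the next paragraph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings of y = (French, English, Spanish) on f = (illiteracy, intelligence).
Lambda = np.array([[-0.8, 0.5],
                   [-0.7, 0.6],
                   [-0.9, 0.4]])
mu  = np.array([70.0, 75.0, 72.0])   # per-test mean scores (made up)
Psi = np.diag([4.0, 3.0, 5.0])       # variances of the specific factors v

def sample_fa(n):
    """Draw n observations Y = Lambda f + v + mu, as in (6.1)-(6.3)."""
    f = rng.standard_normal((n, 2))                        # f ~ N(0, I)
    v = rng.multivariate_normal(np.zeros(3), Psi, size=n)  # v ~ N(0, Psi)
    return f @ Lambda.T + v + mu

Y = sample_fa(100_000)
# The implied covariance of Y is Lambda Lambda^T + Psi.
print(np.cov(Y, rowvar=False).round(2))
print((Lambda @ Lambda.T + Psi).round(2))
```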

If we assume that the specific factors v_i are uncorrelated with each other, the y_i's become conditionally independent given the common factors X, i.e. the off-diagonal terms of the covariance matrix of Y are due only to Λ. We assume that the factors have already been rotated and standardized so that the common factors are uncorrelated with each other, E{X} = 0 and Cov{X} = I, and the specific factors are zero mean and uncorrelated with each other, E{v} = 0 and Cov{v} = Ψ. Adding normality we get

    X ~ N(0, I),   v ~ N(0, Ψ),   Cov(X, v) = 0   ==>   Y ~ N(µ, ΛΛ^T + Ψ),

where the factor loading matrix Λ is not necessarily diagonal.

There is an identifiability problem: Λ is not unique. Rotating both the common factors and the factor loading matrix in a suitable manner, we would get the same set of observations:

    Y = (ΛG)G^T X + v + µ                                                        (6.4)

where G is any orthonormal matrix of appropriate size, i.e. GG^T = G^T G = I. To make the specification unique, we either pick one particular G or add an arbitrary constraint that Λ should satisfy, such as

    Λ^T Ψ^{-1} Λ = a diagonal matrix.                                            (6.5)

Having solved the uniqueness problem, there are two basic problems with FA:

learning: Given data Y, find the model parameters Λ, Ψ and µ.

probabilistic inference: Find the best X given Y, i.e. p(X|Y) or x* = argmax_x p(x|Y), using different assumptions about where Y came from.

6.2 Unifying View of These Models - Static Kalman Filter

Consider the following discrete-time linear dynamical system with continuous state,

    x_{t+1} = A x_t + w_t,   w_t ~ N(0, Q)                                       (6.6)
    y_t     = C x_t + v_t,   v_t ~ N(0, R),

where x_t is the hidden state of the system and y_t is the output of the system at time t. The state of the system evolves as a first-order dynamical system, while the output is generated from the current state by a linear observation process plus additive Gaussian noise. The two noise processes are white and independent of each other. This is the classical Kalman filter model. However, if each state x_t is produced independently of the others and identically distributed, then there is no temporal ordering to the data points, so we can drop the time index t; the state of the system is completely random, i.e. A = 0:

    X = w,        w ~ N(0, Q)                                                    (6.7)
    Y = CX + v,   v ~ N(µ, R).

The observation at any time depends only on the state of the system at that time; this is why the term "static" is used. Since linear combinations of Gaussians are Gaussian and sums of independent Gaussians are also Gaussian,

    E{Y} = µ,   Cov{Y} = CQC^T + R   ==>   Y ~ N(µ, CQC^T + R).                  (6.8)
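
As a quick numerical check of (6.7)-(6.8), the following sketch samples from the static model with arbitrary (made-up) parameters and compares the empirical mean and covariance of Y against µ and CQC^T + R.

```python
import numpy as np

rng = np.random.default_rng(1)

p, k, n = 4, 2, 200_000             # observed dim, state dim, number of samples
C  = rng.standard_normal((p, k))    # arbitrary observation matrix
Q  = np.diag([2.0, 0.5])            # state noise covariance
R  = np.diag([0.3, 0.4, 0.2, 0.5])  # observation noise covariance (diagonal)
mu = np.array([1.0, -1.0, 0.0, 2.0])

# Static model (6.7): A = 0, so X = w and Y = C X + v with v ~ N(mu, R).
X = rng.multivariate_normal(np.zeros(k), Q, size=n)
V = rng.multivariate_normal(mu, R, size=n)
Y = X @ C.T + V

# (6.8): E{Y} = mu and Cov{Y} = C Q C^T + R.
print(np.allclose(Y.mean(axis=0), mu, atol=0.1))
print(np.allclose(np.cov(Y, rowvar=False), C @ Q @ C.T + R, atol=0.1))
```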

Figure 6.1: Static Kalman filter (hidden state X, observation Y, observation matrix C, noise sources w and v)

We add some restrictions on the above model:

Identification: It is not possible to uniquely determine C and Q, since we can always interchange factors between C and Q. Writing the eigendecomposition Q = ΓΛΓ^T with Λ diagonal,

    CQC^T = C(ΓΛΓ^T)C^T                                                          (6.9)
          = (CΓ)Λ(CΓ)^T                          (loading CΓ with Q = Λ, diagonal)
          = (CΓΛ^{1/2})(CΓΛ^{1/2})^T = C'C'^T    (loading C' = CΓΛ^{1/2} with Q = I).

To ensure uniqueness we constrain Q to be I, without loss of generality.

Restriction on R: We must restrict R so that maximum likelihood parameter estimation does not set C = 0 and let R explain all of the variability, without capturing any interesting and informative projections in X. Since Y is now a single Gaussian without any time dependence, the ML estimate of Cov{Y} is the sample covariance, and nothing prevents R from simply becoming the sample covariance matrix. We therefore constrain R to be diagonal, i.e. the components of v are uncorrelated and the correlations in Y are due only to C.

The key idea is that FA is scale invariant but not rotation invariant, while PCA is rotation invariant but not scale invariant.

Likelihood of the data

The inference problem is to determine the posterior probability of a particular state X given a set of observations Y ≡ {Y_1, ..., Y_N}, i.e. P(X|Y). For the static Kalman filter model this inference problem reduces to p(X|Y), where Y is the observation corresponding to state X, since all of the data points are generated independently. Using

    Y ~ N(µ, CC^T + R),                                                          (6.10)

and by

    X ~ N(0, I),   y_t = Cx_t + v_t,   v_t ~ N(0, R),                            (6.11)

we can easily get

    p(Y|X) = N(CX + µ, R).                                                       (6.12)
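
The quantity that maximum likelihood estimation (next subsection) will work with is the likelihood of the observed data under (6.10). A minimal sketch of evaluating it, with placeholder parameter values:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(8)

p, k, n = 4, 2, 1000
C  = rng.standard_normal((p, k))              # placeholder parameters
R  = np.diag(rng.uniform(0.2, 0.6, size=p))
mu = np.zeros(p)

# Observations; here simply drawn from the model itself for illustration.
Y = rng.multivariate_normal(mu, C @ C.T + R, size=n)

def data_log_likelihood(Y, C, R, mu):
    """log p(Y) = sum_i log N(Y_i; mu, C C^T + R), using (6.10) and independence."""
    return multivariate_normal(mean=mu, cov=C @ C.T + R).logpdf(Y).sum()

print(round(data_log_likelihood(Y, C, R, mu), 1))
```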

To compute P(X|Y), we need to know

    Cov{X, Y} = E{(X - µ_X)(Y - µ_Y)^T}                                          (6.13)
              = E{X(Y - µ_Y)^T}
              = E{X(CX + v - µ)^T}
              = E{XX^T}C^T + E{Xv^T}
              = C^T

since X = w is independent of v. X and Y are jointly Gaussian,

    [X]      ( [0]   [ I     C^T      ] )
    [Y]  ~ N ( [µ] , [ C     CC^T + R ] ).                                       (6.14)

Using the following conditioning formula for a jointly Gaussian distribution,

    [X]      ( [µ_X]   [ Σ_XX  Σ_XY ] )
    [Y]  ~ N ( [µ_Y] , [ Σ_YX  Σ_YY ] )

    ==>  p(X|Y) = N( µ_X + Σ_XY Σ_YY^{-1} (Y - µ_Y),  Σ_XX - Σ_XY Σ_YY^{-1} Σ_YX ),

we get

    p(X|Y) = N( C^T (CC^T + R)^{-1} (Y - µ),  I - C^T (CC^T + R)^{-1} C ).       (6.15)

ML Parameter Estimation

Given a set of observations Y, we choose the parameters of our model, {C, R, µ}, such that the likelihood of Y is maximized:

    {C*, µ*, R*} = argmax_{C,µ,R} p(Y, X) = argmax_{C,µ,R} p(Y|X),               (6.16)

where X ≡ {X_1, ..., X_N} is the set of corresponding states producing Y (the two maximizations are equivalent since p(X) does not depend on the parameters). We prefer to maximize p(Y|X) instead of p(Y) since the latter is difficult to differentiate with respect to C; p(Y|X) is easy to differentiate. However, only Y is observed and X is hidden. To maximize with hidden variables we use the Expectation-Maximization (EM) algorithm, which we will describe in the next section. Before going to learning with hidden variables, we will show how PCA, FA and an econometric model can all be formulated as static Kalman filter models.

PCA

    X = w,        w ~ N(0, I)                                                    (6.17)
    Y = CX + v,   v ~ N(µ, 0),

i.e. v is deterministic and equal to µ. Then

    X = C^{-1}(Y - µ) = C^T (Y - µ)                                              (6.18)

assuming C is orthonormal. The learning problem is to find C.
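
A small sketch of the PCA case (6.17)-(6.18): data are generated near a two-dimensional subspace, C is learned as the leading eigenvectors of the sample covariance (the standard PCA solution; the lecture only states that C must be learned), and X is recovered by the orthonormal projection C^T(Y - µ). All numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data with most variance in a 2-D subspace of R^5 (values arbitrary).
n, p, k = 5000, 5, 2
basis, _ = np.linalg.qr(rng.standard_normal((p, k)))        # an orthonormal C
latent = rng.standard_normal((n, k)) * np.array([3.0, 2.0])
Y = latent @ basis.T + 0.05 * rng.standard_normal((n, p)) + 1.0  # tiny noise + offset

# Learn C as the k leading eigenvectors of the sample covariance.
mu = Y.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Y, rowvar=False))
C = evecs[:, np.argsort(evals)[::-1][:k]]                   # columns = top-k eigenvectors

# Inference (6.18): X = C^T (Y - mu), using orthonormality of C.
X = (Y - mu) @ C
print(X.shape, np.allclose(C.T @ C, np.eye(k)))
```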

FA

    X = w,        w ~ N(0, I)                                                    (6.19)
    Y = CX + v,   v ~ N(µ, R),

which can be thought of as data plus sensor noise. Here C carries the correlation structure (it is the same as Λ in section 4.3.3) and R is a diagonal matrix: the covariance structure of Y is in C and the variance structure of Y is in the diagonal matrix R.

Econometric

    X_t = w_t,          w_t ~ N(0, I)                                            (6.20)
    Y_t = CX_t + v_t,   v_t ~ N(µ, R_t),

and w_t is not necessarily white over time. The model is quite complicated because the volatility matrix R_t is time dependent.

6.3 Learning with Hidden Variables

Hidden variables (unobserved or latent variables) are often introduced into a model to simplify it. For example, given a set of dependent variables, instead of adding links between each pair of them, a top-down structure through hidden variables can simplify the model: given the hidden variables, the observed variables are independent (figure 6.2).

Figure 6.2: Introducing hidden variables to simplify a densely connected graph

Hidden variables can be discrete or continuous. If they are discrete, they represent the underlying classes of the observations or the discrete states associated with a dynamical system, as in a Hidden Markov Model (HMM). For example, in a Gaussian mixture this would be the identity of the distribution from which an observation has been produced (figure 6.3; a sampling sketch appears below).

Figure 6.3: Mixture models (discrete hidden X with observed Y)

If they are continuous, they represent the state of a static system, like X in FA, or of a dynamic system, like X_t in a Kalman filter. They parameterize low-dimensional spaces: in FA the underlying common factors have lower dimension than the observations, k ≪ p, and in PCA a small number of principal components might account for most of the variance. They can also explain independent components of the data, as in ICA (figure 6.4).
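
For the discrete case of figure 6.3, the hidden variable is just the component label. A small sketch of sampling from a one-dimensional Gaussian mixture, making the hidden Z explicit; the mixture parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

pi     = np.array([0.5, 0.3, 0.2])     # mixing proportions (hypothetical)
mus    = np.array([-2.0, 0.0, 3.0])    # component means
sigmas = np.array([0.5, 1.0, 0.8])     # component standard deviations

def sample_mixture(n):
    """Return (Z, Y): the hidden component identity and the observation."""
    Z = rng.choice(len(pi), size=n, p=pi)     # hidden: which Gaussian produced the sample
    Y = rng.normal(mus[Z], sigmas[Z])         # observed value
    return Z, Y

Z, Y = sample_mixture(10)
print(list(zip(Z, Y.round(2))))   # in practice only Y would be observed
```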

Figure 6.4: Independent Component Analysis, one possible model

Hidden variables can also be associated with an underlying physical model, as when diseases cause symptoms. Even if only symptoms are observed, hidden variables can be used to make inferences about the diseases: having observed some symptoms, find the most probable disease (figure 6.5).

Figure 6.5: QMR (diseases and symptoms)

Although hidden variables simplify the model, training models with hidden variables is difficult because they are unobserved. We would still like to use ML estimation since it is statistically well founded. However, only a subset of the model variables are observed, and marginalization of the joint distribution over the hidden variables couples the observed variables, so the resulting likelihood is usually mathematically intractable. For example, for a Gaussian mixture model

    log p(Y) = Σ_{i=1}^N log p(Y_i) = Σ_{i=1}^N log ( Σ_{k=1}^M π_k N(Y_i; µ_k, Σ_k) ),      (6.21)

and the sums inside the log are not amenable to closed-form maximization. The problem would be trivial if we knew which mixture component is responsible for each data point: taking relative class frequencies would give the π_k's, and sample means and covariances would give the mixture means and variances (see the sketch below). EM solves the problem with hidden variables.

EM Algorithm

Let X be the observed and Z the hidden data samples, {X, Z} = {(X_i, Z_i), i = 1, ..., N}. We assume that there exists a complete-data probability model p(X, Z|θ) with some assumed parameterization and structure. Usually this joint distribution factors nicely, as in our Gaussian mixture example, so that maximization of the complete-data log likelihood

    ℓ_c(θ) = log p(X, Z|θ)                                                       (6.22)

is usually easier.
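
To see concretely why observing the hidden labels makes the problem trivial, the sketch below maximizes the complete-data log likelihood (6.22) for a one-dimensional Gaussian mixture with known Z: the ML estimates are just relative class frequencies and per-class sample statistics. Data and parameter values are synthetic.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

# Synthetic labelled data from a 2-component mixture (ground truth is arbitrary).
true_pi, true_mu, true_sigma = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.2]
Z = rng.choice(2, size=5000, p=true_pi)
Y = rng.normal(np.take(true_mu, Z), np.take(true_sigma, Z))

def complete_data_mle(Y, Z, M):
    """Maximizers of l_c(theta) = sum_i log[pi_{z_i} N(Y_i; mu_{z_i}, sigma_{z_i}^2)]."""
    pi    = np.array([(Z == k).mean() for k in range(M)])    # relative class frequencies
    mu    = np.array([Y[Z == k].mean() for k in range(M)])   # per-class sample means
    sigma = np.array([Y[Z == k].std() for k in range(M)])    # per-class sample std devs
    return pi, mu, sigma

def complete_data_loglik(Y, Z, pi, mu, sigma):
    return np.sum(np.log(pi[Z]) + norm.logpdf(Y, loc=mu[Z], scale=sigma[Z]))

pi, mu, sigma = complete_data_mle(Y, Z, M=2)
print(pi.round(2), mu.round(2), sigma.round(2))
print(round(complete_data_loglik(Y, Z, pi, mu, sigma), 1))
```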

That is, if Z were observed we would solve

    θ* = argmax_θ ℓ_c(θ).                                                        (6.23)

However, when Z is not observed the observations do not decouple; in this case we only have p(X|θ), and our goal is to find

    θ* = argmax_θ p(X|θ).                                                        (6.24)

The log likelihood is

    ℓ(θ) ≡ log p(X|θ) = log Σ_Z p(X, Z|θ),                                       (6.25)

as we showed for the Gaussian mixture, and the problem is that the sum inside the log couples the variables together. Suppose we have Q(Z|X, θ), an appropriate distribution over Z depending on the observed values. Observing that the complete-data distribution factorizes over the graph,

    p(X, Z|θ) = Π_i p(x_i | x_{π_i}, θ_{ix}) · Π_i p(z_i | z_{π_i}, θ_{iz}),      (6.26)

so that

    log p(X, Z|θ) = Σ_i log p(x_i | x_{π_i}, θ_{ix}) + Σ_i log p(z_i | z_{π_i}, θ_{iz}),     (6.27)

the variables X and Z are decoupled, which can lead to tractable inference. Define

    <ℓ_c(θ)>_Q ≡ Σ_Z Q(Z|X, θ) log p(X, Z|θ) = E_{Q(Z|X,θ)}[ log p(X, Z|θ) ],      (6.28)

which is called the expected complete log likelihood. Note that if we had observed the Z_i's, and Q(Z|X, θ) assigned probability 1 to the sequence consisting of the observed Z_i's and 0 to everything else, then the expected complete log likelihood would reduce to the complete log likelihood. Since we do not know these true values, we hope that maximizing the average of complete log likelihoods over different assignments to Z under Q(·) will give an improvement towards the value maximizing (6.25). It is crucial that this averaging weight probable Z sequences more heavily; X is a source of information for this, since Z and X are related. An intuitive choice is P(Z|X), since different sequences are then weighted according to how likely they are to have produced X under the assumed model.

We will first show that Q(Z|X) provides a lower bound on ℓ(θ), and then describe an iterative procedure which raises this lower bound at every iteration. Manipulating ℓ(θ),

    ℓ(θ) = log Σ_Z p(X, Z|θ)                                                     (6.29)
         = log Σ_Z Q(Z|X, θ) [ p(X, Z|θ) / Q(Z|X, θ) ]
         ≥ Σ_Z Q(Z|X, θ) log [ p(X, Z|θ) / Q(Z|X, θ) ]
         ≡ L(Q, θ),

where we have used E{f(X)} ≤ f(E{X}) for any concave f, i.e. Jensen's inequality. Hence for a given θ, L(Q, θ) is a lower bound on ℓ(θ), and maximizing L(Q, θ) raises this lower bound.
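
The bound (6.29) can be checked numerically on a tiny model with one observation and a binary hidden variable: any distribution Q gives L(Q, θ) ≤ ℓ(θ), and the gap closes exactly when Q equals the posterior, which is the KL-divergence identity derived next. All numbers are made up.

```python
import numpy as np
from scipy.stats import norm

# A single observation x from a 2-component mixture; Z in {0, 1} is hidden.
pi, mus, sigma = np.array([0.6, 0.4]), np.array([-1.0, 2.0]), 1.0
x = 0.3

# Joint p(x, z) and marginal log likelihood l(theta) = log sum_z p(x, z), as in (6.25).
p_xz = pi * norm.pdf(x, loc=mus, scale=sigma)
loglik = np.log(p_xz.sum())

def lower_bound(q):
    """L(Q, theta) = sum_z q(z) log[ p(x, z) / q(z) ], as in (6.29)."""
    return np.sum(q * (np.log(p_xz) - np.log(q)))

posterior = p_xz / p_xz.sum()                  # p(Z | x, theta)
for q in [np.array([0.9, 0.1]), np.array([0.5, 0.5]), posterior]:
    # The bound holds for every q; the gap vanishes only at the posterior.
    print(lower_bound(q) <= loglik + 1e-12, round(loglik - lower_bound(q), 6))
```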

Denote

    L(Q, θ) = Σ_Z Q(Z|X) log [ p(X, Z|θ) / Q(Z|X) ].                             (6.30)

EM is a coordinate ascent algorithm on L(Q, θ): we iteratively maximize with respect to Q and with respect to θ. At the (t+1)st iteration we first maximize L(Q, θ^(t)) with respect to Q, and we then maximize L(Q^(t+1), θ) with respect to θ:

    Q^(t+1) = argmax_Q L(Q, θ^(t))          (E-step)                             (6.31)
    θ^(t+1) = argmax_θ L(Q^(t+1), θ)        (M-step)

The E-step is easy to solve by setting Q^(t+1)(Z|X) = p(Z|X, θ^(t)), since

    L(p(Z|X, θ^(t)), θ^(t)) = Σ_Z p(Z|X, θ^(t)) log [ p(X, Z|θ^(t)) / p(Z|X, θ^(t)) ]        (6.32)
                            = Σ_Z p(Z|X, θ^(t)) log p(X|θ^(t))
                            = log p(X|θ^(t)) · Σ_Z p(Z|X, θ^(t))
                            = log p(X|θ^(t)) = ℓ(θ^(t)),

and L(Q, θ) ≤ ℓ(θ) in general. Hence at the end of the E-step of the (t+1)st iteration we have ℓ(θ^(t)) = L(Q^(t+1), θ^(t)), so increasing L(Q^(t+1), θ) in the M-step necessarily increases ℓ(θ). The E-step can also be seen from the following:

    ℓ(θ) - L(Q, θ) = log p(X|θ) - Σ_Z Q(Z|X) log [ p(X, Z|θ) / Q(Z|X) ]          (6.33)
                   = Σ_Z Q(Z|X) log p(X|θ) - Σ_Z Q(Z|X) log [ p(X, Z|θ) / Q(Z|X) ]
                   = Σ_Z Q(Z|X) log [ Q(Z|X) p(X|θ) / p(X, Z|θ) ]
                   = Σ_Z Q(Z|X) log [ Q(Z|X) / p(Z|X, θ) ]
                   = D( Q(Z|X) ‖ p(Z|X, θ) ) ≥ 0,

with equality to 0 only when Q(Z|X) = p(Z|X, θ).

The M-step maximizes the expected complete log likelihood, because

    L(Q, θ) = Σ_Z Q(Z|X, θ) log [ p(X, Z|θ) / Q(Z|X, θ) ]                        (6.34)
            = Σ_Z Q(Z|X, θ) log p(X, Z|θ) - Σ_Z Q(Z|X, θ) log Q(Z|X, θ)
            = E_{Q(Z|X)}[ log p(X, Z|θ) ] + H(Q)
            = E_{p(Z|X, θ^(t))}[ log p(X, Z|θ) ] + H(Q),

and H(Q) does not depend on θ for a given Q.
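
Putting the E-step and M-step together for the Gaussian mixture case gives the familiar algorithm: posterior responsibilities in the E-step, weighted sample statistics in the M-step. The self-contained sketch below (one-dimensional, synthetic data; not code from the lecture) also asserts the monotone increase of ℓ(θ) discussed next.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(5)

# Synthetic data from a 2-component mixture (ground truth is arbitrary).
Y = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(1.5, 1.0, 700)])

# Initial guess for theta = (pi, mu, sigma).
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def log_joint(Y, pi, mu, sigma):
    """log[ pi_k N(Y_i; mu_k, sigma_k^2) ] for all i, k."""
    return np.log(pi) + norm.logpdf(Y[:, None], loc=mu, scale=sigma)

prev_ll = -np.inf
for t in range(100):
    # E-step: Q(Z_i = k | Y_i, theta^(t)) = posterior responsibilities.
    lj = log_joint(Y, pi, mu, sigma)
    ll = logsumexp(lj, axis=1).sum()          # l(theta^(t)), as in (6.21)
    assert ll >= prev_ll - 1e-9               # EM never decreases the log likelihood
    prev_ll = ll
    resp = np.exp(lj - logsumexp(lj, axis=1, keepdims=True))

    # M-step: maximize the expected complete log likelihood (6.28)/(6.34).
    Nk = resp.sum(axis=0)
    pi = Nk / len(Y)
    mu = (resp * Y[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((resp * (Y[:, None] - mu) ** 2).sum(axis=0) / Nk)

print(round(prev_ll, 2), pi.round(2), mu.round(2), sigma.round(2))
```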

By repeating these two steps, EM is guaranteed to converge to a maximum of ℓ(θ), which is not necessarily the global maximum.

Figure 6.6: EM climbing the hill: the lower bounds L(Q^(t), θ) and L(Q^(t+1), θ) and the log likelihood ℓ(θ), with the iterates θ^(t) and θ^(t+1)

In figure 6.6, at each step the curve L(Q^(t+1), θ) is first constructed from the current value θ^(t), where it touches ℓ(θ); this curve, for fixed Q^(t+1), is then maximized with respect to θ to give θ^(t+1), where the next bound again touches ℓ(θ).

References

[Bilmes97]    J.A. Bilmes, A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models, ICSI Technical Report ICSI-TR, 1997.

[Bishop96]    C.M. Bishop, Learning with Latent Variables, in Learning in Graphical Models, M.I. Jordan (ed.), 1996.

[Dempster77]  A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum Likelihood Estimation from Incomplete Data, J. Royal Stat. Soc. (B), vol. 39, no. 1, pp. 1-38, 1977.

[Everitt84]   B.S. Everitt, An Introduction to Latent Variable Models, Chapman and Hall, 1984.

[JB00]        M.I. Jordan and C. Bishop, An Introduction to Graphical Models, to be published, 2000.

[Neal96]      R.M. Neal, G.E. Hinton, A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants, in Learning in Graphical Models, M.I. Jordan (ed.), 1996.

[Roweis97]    S. Roweis, Z. Ghahramani, A Unifying View of Linear Gaussian Models, unpublished, 1997.

[Svensen98]   J.H.M. Svensen, GTM: The Generative Topographic Mapping, Aston University PhD Thesis, 1998.
