Outline Lecture 3


Lecture 3: Expectation Maximization (EM) and Clustering
Thomas Schön, Division of Automatic Control, Linköping University, Linköping, Sweden.
Email: schon@isy.liu.se

Outline of Lecture 3
1. Summary of Lecture 2
2. Expectation Maximization (EM)
   - General derivation
   - Example: identification of a linear state-space model
   - Example: identification of a Wiener system
3. Gaussian mixtures
   - Standard construction
   - Equivalent construction using latent variables
   - ML estimation using EM
4. Connections to the K-means algorithm for clustering
(Chapter 9)

Summary of Lecture 2 (I/IV)
Using the posterior mean m_N = β S_N Φ^T t, the linear regression model is given by
  y(x, m_N) = m_N^T φ(x) = β φ(x)^T S_N Φ^T t = Σ_{n=1}^{N} β φ(x)^T S_N φ(x_n) t_n.
Hence, the mean of the predictive distribution at a point x is a linear combination of the training target variables t_n,
  y(x, m_N) = Σ_{n=1}^{N} k(x, x_n) t_n,
where k(x, x') = β φ(x)^T S_N φ(x') is called the equivalent kernel (a short code sketch follows below). An alternative approach to regression is to directly make use of a localized kernel, rather than starting from basis functions. This leads to the so-called kernel methods.

Summary of Lecture 2 (II/IV)
Investigated one linear discriminant (a function that takes an input and assigns it to one of K classes) method in detail: least squares. Modelled each class as y_k(x) = w_k^T x + w_{k0} and solved the LS problem, resulting in Ŵ = (X^T X)^{-1} X^T T. Showed how probabilistic generative models could be built for classification using the strategy:
1. Model p(x | C_k), a.k.a. the class-conditional density.
2. Model p(C_k).
3. Use ML to find the parameters in p(x | C_k) and p(C_k).
4. Use Bayes' rule to find p(C_k | x).
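To make the equivalent-kernel view above concrete, here is a minimal sketch (not part of the original slides) that computes the predictive mean both directly from m_N and as a kernel-weighted sum of the targets; the Gaussian basis functions, the data and the values of α and β are illustrative assumptions.

```python
# Minimal sketch: the predictive mean of Bayesian linear regression written as a
# kernel-weighted sum of the targets. Data, alpha, beta and the Gaussian basis
# functions are illustrative assumptions, not values from the slides.
import numpy as np

rng = np.random.default_rng(0)
N, alpha, beta = 25, 2.0, 25.0
x_train = rng.uniform(0, 1, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=N)

centres = np.linspace(0, 1, 9)
def phi(x):                                         # Gaussian basis functions + bias term
    x = np.atleast_1d(x)
    return np.column_stack([np.ones_like(x)] +
                           [np.exp(-0.5 * ((x - c) / 0.1) ** 2) for c in centres])

Phi = phi(x_train)                                  # N x M design matrix
S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t_train                  # posterior mean

x_star = 0.37
direct = phi(x_star) @ m_N                          # y(x*, m_N) = m_N^T phi(x*)
k = beta * phi(x_star) @ S_N @ Phi.T                # equivalent kernel k(x*, x_n)
print(direct, k @ t_train)                          # the two expressions agree
```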

Summary of Lecture 2 (III/IV)
The direct method called logistic regression was introduced. Start by stating the model
  p(C_1 | φ) = σ(w^T φ) = 1 / (1 + e^{−w^T φ}),
which results in a log-likelihood function according to
  L(w) = ln p(t | w) = Σ_{n=1}^{N} ( t_n ln y_n + (1 − t_n) ln(1 − y_n) ),
where y_n = p(C_1 | φ_n) = σ(w^T φ_n). Note that this is a nonlinear, but concave, function of w. Hence, we can easily find the global maximum using Newton's method, resulting in an algorithm known as IRLS (iteratively reweighted least squares).

Summary of Lecture 2 (IV/IV)
The likelihood function for logistic regression is
  p(t | w) = Π_{n=1}^{N} σ(w^T φ_n)^{t_n} (1 − σ(w^T φ_n))^{1 − t_n}.
Hence, computing the posterior density p(w | t) = p(t | w) p(w) / p(t) is intractable, and we considered the Laplace approximation for solving this. The Laplace approximation is a simple local approximation that is obtained by fitting a Gaussian centered around the mode (the MAP estimate) of the distribution.

Very Brief on Evaluating Classifiers
Let us define the following concepts:
- True positives (tp): items correctly classified as belonging to the class.
- True negatives (tn): items correctly classified as not belonging to the class.
- False positives (fp): items incorrectly classified as belonging to the class; in other words, a false alarm.
- False negatives (fn): items that are not classified as belonging to the class even though they should have been; in other words, a missed detection.
  Precision = tp / (tp + fp),   Recall = tp / (tp + fn)
(a short code sketch of these quantities follows below).

Latent Variables - Example
A latent variable is a variable that is not directly observed. Other common names are hidden variables, unobserved variables, or missing data. An example of a latent variable is the state x_t in a state-space model. Consider the following linear scalar state-space model
  x_{t+1} = θ x_t + v_t,
  y_t = x_t + e_t,
where v_t and e_t denote process and measurement noise, respectively.
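As a small illustration of the precision and recall definitions above, here is a hedged sketch (not from the slides) that counts true positives, false alarms and missed detections for binary label vectors; the example vectors are made up.

```python
# Minimal sketch of precision and recall for binary labels (1 = belongs to the
# class); the example vectors below are made up for illustration.
import numpy as np

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))   # correctly flagged
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))   # missed detections
    return tp / (tp + fp), tp / (tp + fn)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
print(precision_recall(y_true, y_pred))          # (0.75, 0.75)
```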

Expectation Maximization (EM) - Strategy and Idea
The strategy underlying the EM algorithm is to separate the original ML problem into two linked problems, each of which is hopefully easier to solve than the original problem. This separation is accomplished by exploiting the structure inherent in the probabilistic model. The key idea is to consider the joint log-likelihood function of both the observed variables X and the latent variables Z,
  L_θ(Z, X) = ln p_θ(Z, X).

Algorithm (Expectation Maximization, EM)
1. Set i = 0 and choose an initial guess θ_0.
2. Expectation (E) step: compute
   Q(θ, θ_i) = E_{θ_i}{ln p_θ(Z, X) | X} = ∫ ln p_θ(Z, X) p_{θ_i}(Z | X) dZ.
3. Maximization (M) step: compute
   θ_{i+1} = arg max_θ Q(θ, θ_i).
4. If not converged, update i := i + 1 and return to step 2.

EM Example - Linear System Identification
Consider the following linear scalar state-space model
  x_{t+1} = θ x_t + v_t,
  y_t = x_t + e_t,
where v_t and e_t denote white process and measurement noise. The initial state x_1 is fully known. Finally, the true θ-parameter is given by θ = 0.9. The identification problem is now to determine the parameter θ on the basis of the observations Y = {y_1, ..., y_N} using the EM algorithm. The latent variables Z are given by the states, Z = X = {x_1, ..., x_{N+1}}. Note the difference in notation compared to Bishop: here the observations are denoted Y and the latent variables are given by X.

The resulting Q-function is, up to terms and positive factors that do not depend on θ,
  Q(θ, θ_i) = −φ θ² + 2 ψ θ,
where we have defined
  φ = Σ_{t=1}^{N} E_{θ_i}{x_t² | Y},   ψ = Σ_{t=1}^{N} E_{θ_i}{x_t x_{t+1} | Y}.
There exist explicit expressions for these expected values.
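The EM algorithm boxed above translates almost line by line into code. Below is a minimal, generic sketch of the iteration (an illustration of how one might structure it, not code from the course); all problem-specific work lives in the callbacks.

```python
# Generic EM skeleton mirroring the algorithm above; e_step, m_step and log_lik
# are problem-specific callbacks supplied by the user (illustrative structure only).
def em(theta0, e_step, m_step, log_lik, max_iter=100, tol=1e-6):
    theta = theta0
    ll_old = log_lik(theta)
    for i in range(max_iter):
        stats = e_step(theta)          # E step: sufficient statistics of Q(., theta_i)
        theta = m_step(stats)          # M step: theta_{i+1} = argmax_theta Q(theta, theta_i)
        ll_new = log_lik(theta)
        if ll_new - ll_old < tol:      # stop when the likelihood stops increasing
            break
        ll_old = ll_new
    return theta
```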

EM Example - Linear System Identification (cont.)
The maximization step is θ_{i+1} = arg max_θ Q(θ, θ_i). Hence, the step simply amounts to solving the following quadratic problem,
  θ_{i+1} = arg max_θ ( −φ θ² + 2 ψ θ ),
which results in
  θ_{i+1} = ψ / φ.

Algorithm (EM for the scalar linear state-space model)
1. Set i = 0 and choose an initial guess θ_0.
2. Expectation (E) step: compute
   φ = Σ_{t=1}^{N} E_{θ_i}{x_t² | Y},   ψ = Σ_{t=1}^{N} E_{θ_i}{x_t x_{t+1} | Y}.
3. Maximization (M) step: find the next iterate according to θ_{i+1} = ψ / φ.
4. If the log-likelihood increase L_{θ_{i+1}}(Y) − L_{θ_i}(Y) exceeds a small tolerance (10^{−6}), update i := i + 1 and return to step 2; otherwise terminate.

Different numbers of samples N were used, in Monte Carlo studies each based on several realisations of data, all started from the same initial guess θ_0. The results come as no surprise, since the ML estimator is asymptotically efficient.
Figures (panels a and b): the log-likelihood and the Q-function plotted versus the parameter a and versus the iteration number.
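To make the boxed algorithm concrete, here is a hedged Python sketch (the lecture material points to MATLAB code in the technical report cited on the next page; this is an independent reimplementation). The noise variances q = r = 0.1, data length N = 1000, x_1 = 0 and the initial guess θ_0 = 0.1 are assumptions, and the smoothed moments for φ and ψ are summed over t = 1, ..., N−1, a slight simplification compared to the slides, where the latent variables also include x_{N+1}.

```python
# EM for x_{t+1} = theta*x_t + v_t, y_t = x_t + e_t; an independent sketch, not
# the course MATLAB code. Noise variances q, r, data length N, x_1 = 0 and the
# initial guess are illustrative assumptions.
import numpy as np

def simulate(theta, q, r, N, rng):
    x, y = 0.0, np.zeros(N)
    for t in range(N):
        y[t] = x + rng.normal(scale=np.sqrt(r))
        x = theta * x + rng.normal(scale=np.sqrt(q))
    return y

def e_step(theta, y, q, r):
    """Kalman filter + RTS smoother; returns phi = sum_t E[x_t^2 | Y] and
    psi = sum_t E[x_t x_{t+1} | Y], both sums taken over t = 1..N-1."""
    N = len(y)
    xp, Pp = np.zeros(N), np.zeros(N)          # predicted  x_{t|t-1}, P_{t|t-1}
    xf, Pf = np.zeros(N), np.zeros(N)          # filtered   x_{t|t},   P_{t|t}
    xp[0], Pp[0] = 0.0, 0.0                    # x_1 = 0 is fully known (assumption)
    for t in range(N):
        K = Pp[t] / (Pp[t] + r)                # Kalman gain (C = 1)
        xf[t] = xp[t] + K * (y[t] - xp[t])
        Pf[t] = (1.0 - K) * Pp[t]
        if t < N - 1:
            xp[t + 1] = theta * xf[t]
            Pp[t + 1] = theta**2 * Pf[t] + q
    xs, Ps = xf.copy(), Pf.copy()              # smoothed x_{t|N}, P_{t|N}
    phi = psi = 0.0
    for t in range(N - 2, -1, -1):             # RTS backward recursion
        G = Pf[t] * theta / Pp[t + 1]
        xs[t] = xf[t] + G * (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + G**2 * (Ps[t + 1] - Pp[t + 1])
        phi += xs[t]**2 + Ps[t]                          # E[x_t^2 | Y]
        psi += xs[t] * xs[t + 1] + G * Ps[t + 1]         # E[x_t x_{t+1} | Y]
    return phi, psi

rng = np.random.default_rng(1)
q = r = 0.1
y = simulate(theta=0.9, q=q, r=r, N=1000, rng=rng)
theta = 0.1                                    # initial guess theta_0 (assumption)
for i in range(100):
    phi, psi = e_step(theta, y, q, r)
    theta_new = psi / phi                      # M step
    if abs(theta_new - theta) < 1e-8:
        break
    theta = theta_new
print(theta)                                   # should end up close to 0.9
```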

EM Example - Linear System Identification (results)
Figures (panels c and d): the log-likelihood and the Q-function plotted versus the parameter a and versus the iteration number.

Nonlinear System Identification Using EM (I/VI)
Now consider the problem of estimating the parameters θ in
  x_{t+1} = f_t(x_t, θ) + v_t(θ),
  y_t = h_t(x_t, θ) + e_t(θ),
commonly also written as
  x_{t+1} ~ p_θ(x_{t+1} | x_t),
  y_t ~ p_θ(y_t | x_t).
All details, including MATLAB code, are provided in: Thomas B. Schön. An Explanation of the Expectation Maximization Algorithm. Division of Automatic Control, Linköping University, Sweden, Technical Report LiTH-ISY-R-2915, August 2009. According to the above, the first step is to compute the Q-function
  Q(θ, θ_i) = E_{θ_i}{log p_θ(Z, Y) | Y}.

Nonlinear System Identification Using EM (II/VI)
Applying E_{θ_i}{· | Y} to
  log p_θ(X, Y) = log p_θ(Y | X) + log p_θ(X) = log p_θ(x_1) + Σ_{t=1}^{N−1} log p_θ(x_{t+1} | x_t) + Σ_{t=1}^{N} log p_θ(y_t | x_t)
results in Q(θ, θ_i) = I_1 + I_2 + I_3, where
  I_1 = ∫ log p_θ(x_1) p_{θ_i}(x_1 | Y) dx_1,
  I_2 = Σ_{t=1}^{N−1} ∫∫ log p_θ(x_{t+1} | x_t) p_{θ_i}(x_{t+1}, x_t | Y) dx_t dx_{t+1},
  I_3 = Σ_{t=1}^{N} ∫ log p_θ(y_t | x_t) p_{θ_i}(x_t | Y) dx_t.

Nonlinear System Identification Using EM (III/VI)
This leads us to a nonlinear state smoothing problem, which we can solve using particle smoothers. Current particle smoothers provide us with the following approximation of the joint smoothing density,
  p(X | Y) ≈ (1/M) Σ_{i=1}^{M} δ(X − X^i),
which allows for the following approximations of the marginal smoothing densities that we need,
  p_θ(x_t | Y) ≈ p̂_θ(x_t | Y) = (1/M) Σ_{i=1}^{M} δ(x_t − x_t^i),
  p_θ(x_{t:t+1} | Y) ≈ p̂_θ(x_{t:t+1} | Y) = (1/M) Σ_{i=1}^{M} δ(x_{t:t+1} − x_{t:t+1}^i).
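As a hedged illustration of where such particle approximations come from, here is a minimal bootstrap particle filter that stores full ancestral trajectories; the resampled paths give a crude (filter-based, and degenerate for early time steps) approximation of the joint smoothing density, whereas the slides use dedicated particle smoothers. The callbacks f, loglik and x0_sampler are placeholders, not the systems from the lecture.

```python
# Minimal bootstrap particle filter storing ancestral paths (scalar-state case for
# simplicity); a crude stand-in for the particle smoother used on the slides.
# f, loglik and x0_sampler are user-supplied placeholders (assumptions).
import numpy as np

def bootstrap_pf(y, f, loglik, x0_sampler, M, rng):
    """y: (T,) measurements; f(x, rng): propagate particles one step;
    loglik(y_t, x): log p(y_t | x) elementwise; returns (M, T) trajectories."""
    T = len(y)
    paths = np.zeros((M, T))
    x = x0_sampler(M, rng)
    for t in range(T):
        if t > 0:
            x = f(x, rng)                              # propagate through the dynamics
        logw = loglik(y[t], x)                         # weight by the likelihood
        w = np.exp(logw - np.max(logw)); w /= w.sum()
        idx = rng.choice(M, size=M, p=w)               # multinomial resampling
        paths, x = paths[idx], x[idx]                  # keep the ancestral lineages
        paths[:, t] = x
    return paths                                       # rows approximate draws from p(X | Y)
```

With trajectories X^i in hand, the empirical approximations p̂_θ(x_t | Y) and p̂_θ(x_{t:t+1} | Y) above are built directly from the particle values x_t^i and the pairs (x_t^i, x_{t+1}^i).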

Nonlinear System Identification Using EM (IV/VI)
Inserting the above approximations into the integrals straightforwardly yields the approximation we are looking for,
  Î_1 = ∫ log p_θ(x_1) (1/M) Σ_{i=1}^{M} δ(x_1 − x_1^i) dx_1 = (1/M) Σ_{i=1}^{M} log p_θ(x_1^i),
  Î_2 = (1/M) Σ_{t=1}^{N−1} Σ_{i=1}^{M} log p_θ(x_{t+1}^i | x_t^i),
  Î_3 = (1/M) Σ_{t=1}^{N} Σ_{i=1}^{M} log p_θ(y_t | x_t^i).

Nonlinear System Identification Using EM (V/VI)
It is straightforward to make use of the approximation of the Q-function just derived in order to compute gradients of the Q-function,
  ∂Q(θ, θ_i)/∂θ = ∂Î_1/∂θ + ∂Î_2/∂θ + ∂Î_3/∂θ.
For example (the other two terms follow analogously),
  Î_3 = (1/M) Σ_{t=1}^{N} Σ_{i=1}^{M} log p_θ(y_t | x_t^i)   implies   ∂Î_3/∂θ = (1/M) Σ_{t=1}^{N} Σ_{i=1}^{M} ∂ log p_θ(y_t | x_t^i)/∂θ.
With these gradients in place, there are many algorithms that can be used in order to solve the maximization problem; we employ BFGS.

Nonlinear System Identification Using EM (VI/VI)
Algorithm (Nonlinear System Identification Using EM)
1. Set i = 0 and choose an initial guess θ_0.
2. Expectation (E) step:
   (a) Run the particle filter and the particle smoother.
   (b) Calculate an approximation of the Q-function, Q̂(θ, θ_i) = Î_1 + Î_2 + Î_3.
3. Maximization (M) step: compute θ_{i+1} = arg max_θ Q̂(θ, θ_i).
4. If not converged, update i := i + 1 and return to step 2.
Thomas B. Schön, Adrian Wills and Brett Ninness. System Identification of Nonlinear State-Space Models. Automatica, 47(1):39-49, January 2011.

EM Example - Blind Wiener Identification (I/IV)
Figure: The blind (only the outputs y_t are available) Wiener (linear dynamic system followed by a static nonlinearity) problem.
  ξ_{t+1} = A ξ_t + w_t,   w_t ~ N(0, Q),
  x_t = C ξ_t,
  y_t = f(x_t, η) + e_t,   e_t ~ N(0, R),
  θ = [ϑ^T  η^T  vec{R}^T]^T,   ϑ = [vec{A}^T  vec{C}^T  vec{Q}^T]^T.
Most existing work deals with special cases, with assumptions like:
1. invertible nonlinearities,
2. no measurement noise,
3. no process noise.
We are not forced to make any of these assumptions.
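As a hedged sketch of how the approximated Q-function feeds into the M step, the following builds Q̂(θ) from smoothed trajectories and hands it to BFGS. The log-density callbacks log_px1, log_pxx and log_pyx are placeholders (assumptions), and BFGS is used here with numerical gradients, whereas the slides form the gradients analytically.

```python
# Sketch of the approximated Q-function and its BFGS maximization. The callbacks
# log_px1, log_pxx, log_pyx are user-supplied log-densities (assumptions); the
# slides use analytic gradients, here BFGS falls back to numerical ones.
import numpy as np
from scipy.optimize import minimize

def Q_hat(theta, y, paths, log_px1, log_pxx, log_pyx):
    """paths: (M, T) smoothed trajectories x_t^i; returns I1_hat + I2_hat + I3_hat."""
    M, T = paths.shape
    I1 = np.mean(log_px1(theta, paths[:, 0]))
    I2 = sum(np.mean(log_pxx(theta, paths[:, t + 1], paths[:, t])) for t in range(T - 1))
    I3 = sum(np.mean(log_pyx(theta, y[t], paths[:, t])) for t in range(T))
    return I1 + I2 + I3

def m_step(theta_i, y, paths, log_px1, log_pxx, log_pyx):
    obj = lambda th: -Q_hat(th, y, paths, log_px1, log_pxx, log_pyx)
    return minimize(obj, theta_i, method="BFGS").x     # theta_{i+1} = argmax_theta Q_hat
```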

EM Example - Blind Wiener Identification (II/IV)
Recall that Z = Ξ, so
  Q(θ, θ_i) = ∫ ln p_θ(Ξ, Y) p_{θ_i}(Ξ | Y) dΞ.
According to the model we have
  ln p_θ(Ξ, Y) = ln p_θ(Y | Ξ) + ln p_θ(Ξ) = Σ_t ln p_θ(y_t | ξ_t) + Σ_t ln p_θ(ξ_{t+1} | ξ_t),
resulting in Q(θ, θ_i) = I_1 + I_2, where
  I_1 = Σ_t ∫ ln p_θ(y_t | ξ_t) p_{θ_i}(ξ_t | Y) dξ_t,
  I_2 = Σ_t ∫∫ ln p_θ(ξ_{t+1} | ξ_t) p_{θ_i}(ξ_{t+1}, ξ_t | Y) dξ_t dξ_{t+1},
and where p_{θ_i}(ξ_{t+1}, ξ_t | Y) and p_{θ_i}(ξ_t | Y) are provided by the particle smoother.

EM Example - Blind Wiener Identification (III/IV)
In the numerical example, the true Wiener system consists of a given fourth-order linear transfer function H(q) followed by a saturation nonlinearity f(x), equal to x for small |x| and saturating at ±0.3. N data samples and M particles are used, over a number of Monte Carlo runs, and the linear system is initialized using a subspace method.

EM Example - Blind Wiener Identification (IV/IV)
Figure: Block diagram of the blind Wiener model with two outputs (process noise w_t driving the linear system H(q, ϑ), whose outputs pass through static nonlinearities f(·, η) and are measured in noise e_t as y_t).
Figure: Bode plot of the estimated (grey) and true (blue) systems (gain in dB versus normalised frequency).
Figure: Estimated (grey) and true (blue) memoryless nonlinearity (saturation).
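For intuition about the kind of data this example works with, here is a hedged sketch that simulates a (single-output) Wiener system: a linear state-space model followed by a static saturation, observed in noise. The particular A, C, Q, R, data length and saturation level are illustrative assumptions, not the fourth-order system used on the slides.

```python
# Simulate a simple single-output Wiener system: linear dynamics followed by a
# static saturation, observed in noise. All numerical values are illustrative
# assumptions, not the system from the slides.
import numpy as np

def simulate_wiener(A, C, Q, R, sat, T, rng):
    n = A.shape[0]
    xi = np.zeros(n)                                     # linear state xi_t
    y = np.zeros(T)
    Lq = np.linalg.cholesky(Q)
    for t in range(T):
        x_t = C @ xi                                     # intermediate signal x_t = C xi_t
        y[t] = np.clip(x_t, -sat, sat) + rng.normal(scale=np.sqrt(R))
        xi = A @ xi + Lq @ rng.normal(size=n)            # xi_{t+1} = A xi_t + w_t
    return y

rng = np.random.default_rng(0)
A = np.array([[0.8, 0.2], [0.0, 0.9]])
C = np.array([1.0, 0.5])
Q = 0.05 * np.eye(2)
y = simulate_wiener(A, C, Q, R=0.01, sat=0.3, T=1000, rng=rng)
```

In the blind setting considered on the slides, only y_t would be available to the identification algorithm; A, C, Q, R and the nonlinearity parameters are all estimated.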

EM Example - Blind Wiener Identification (results)
Figure: Estimated (grey) and true (blue) nonlinearity (saturation), input versus output of the nonlinearity.
Figure: Estimated (grey) and true (blue) nonlinearity (dead band), input versus output of the nonlinearity.
Adrian Wills, Thomas B. Schön, Lennart Ljung and Brett Ninness. Blind Identification of Wiener Models. Proceedings of the 18th World Congress of the International Federation of Automatic Control (IFAC), Milan, Italy, September 2011 (accepted for publication).

Gaussian Mixture - Standard Construction
A linear superposition of Gaussians,
  p(x) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k),
is called a Gaussian mixture. The mixture coefficients π_k satisfy
  Σ_{k=1}^{K} π_k = 1,   0 ≤ π_k ≤ 1.
Interpretation: the density N(x | µ_k, Σ_k) is the probability of x given that component k was chosen. The probability of choosing component k is given by the prior probability p(k) = π_k.

Gaussian Mixture - Example
Consider the following Gaussian mixture,
  p(x) = 0.3 N(x | µ_1, Σ_1) + 0.5 N(x | µ_2, Σ_2) + 0.2 N(x | µ_3, Σ_3).
Figure: Probability density function. Figure: Contour plot.

Gaussian Mixture - Problem with the Standard Construction
Given independent observations {x_n}_{n=1}^{N}, the log-likelihood function is given by
  ln p(X; π_{1:K}, µ_{1:K}, Σ_{1:K}) = Σ_{n=1}^{N} ln ( Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k) ).
There is no closed-form solution available, due to the sum inside the logarithm. We will now see that this problem can be separated into two simple problems using the EM algorithm. First, we introduce an equivalent construction of the Gaussian mixture by introducing a latent variable.
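A short sketch of the two equivalent views of a Gaussian mixture follows: evaluating the superposition density directly, and sampling by first drawing a component k with probability π_k and then drawing x from that component. The two-dimensional means and covariances are made up, since the exact values used on the slides are not given here.

```python
# Gaussian mixture: evaluate p(x) = sum_k pi_k N(x | mu_k, Sigma_k) and draw
# samples by first choosing a component. The numerical values are illustrative
# assumptions, not the mixture parameters from the slides.
import numpy as np
from scipy.stats import multivariate_normal

pis = np.array([0.3, 0.5, 0.2])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
Sigmas = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]]), 0.5 * np.eye(2)]

def mixture_pdf(x):
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))

def sample_mixture(N, rng):
    ks = rng.choice(len(pis), size=N, p=pis)             # choose component k with prob pi_k
    return np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])

rng = np.random.default_rng(0)
X = sample_mixture(1000, rng)                            # samples from p(x)
print(mixture_pdf(np.array([0.0, 0.0])))                 # density at the origin
```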

EM for Gaussian Mixtures - Intuitive Preview
Based on
  p(z_n) = Π_{k=1}^{K} π_k^{z_{nk}}   and   p(x_n | z_n) = Π_{k=1}^{K} N(x_n | µ_k, Σ_k)^{z_{nk}},
we have, for independent observations {x_n}_{n=1}^{N},
  p(X, Z) = Π_{n=1}^{N} Π_{k=1}^{K} π_k^{z_{nk}} N(x_n | µ_k, Σ_k)^{z_{nk}},
resulting in the following log-likelihood,
  ln p(X, Z) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_{nk} ( ln π_k + ln N(x_n | µ_k, Σ_k) ).
Let us now use wishful thinking and assume that Z is known. Then maximization of the expression above is straightforward.

EM for Gaussian Mixtures - Explicit Algorithm
Algorithm (EM for Gaussian Mixtures)
1. Initialize µ_k^0, Σ_k^0, π_k^0 and set i = 0.
2. Expectation (E) step: compute the responsibilities
   γ(z_{nk}) = π_k^i N(x_n | µ_k^i, Σ_k^i) / Σ_{j=1}^{K} π_j^i N(x_n | µ_j^i, Σ_j^i),   n = 1, ..., N,   k = 1, ..., K.
3. Maximization (M) step: with N_k = Σ_{n=1}^{N} γ(z_{nk}), compute
   µ_k^{i+1} = (1/N_k) Σ_{n=1}^{N} γ(z_{nk}) x_n,
   Σ_k^{i+1} = (1/N_k) Σ_{n=1}^{N} γ(z_{nk}) (x_n − µ_k^{i+1})(x_n − µ_k^{i+1})^T,
   π_k^{i+1} = N_k / N.
4. If not converged, update i := i + 1 and return to step 2.

Example - EM for Gaussian Mixtures (I/III)
Consider the same Gaussian mixture as before,
  p(x) = 0.3 N(x | µ_1, Σ_1) + 0.5 N(x | µ_2, Σ_2) + 0.2 N(x | µ_3, Σ_3).
Figure: Probability density function. Figure: Samples from the Gaussian mixture p(x).

Example - EM for Gaussian Mixtures (II/III)
Apply the EM algorithm to estimate a Gaussian mixture with K = 3 Gaussians, i.e., use the samples to compute estimates of π_1, π_2, π_3, µ_1, µ_2, µ_3, Σ_1, Σ_2, Σ_3.
Figure: Initial guess.
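The boxed algorithm maps directly to code. Here is a hedged, independent sketch of the E and M steps (not the course's implementation); the data X can be, for example, the samples drawn in the previous sketch, and K, the initialization, the small regularization term and the iteration count are assumptions.

```python
# EM for a Gaussian mixture, following the algorithm above (independent sketch).
# X is an (N, d) array of observations, e.g. the samples drawn in the previous
# sketch; K, the initialization and the iteration count are assumptions.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter, rng):
    N, d = X.shape
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]          # initialize at random data points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk)
        dens = np.column_stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                for k in range(K)])         # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: update pi_k, mu_k, Sigma_k
        Nk = gamma.sum(axis=0)                              # effective number of points per component
        pis = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pis, mus, Sigmas

pis_hat, mus_hat, Sigmas_hat = em_gmm(X, K=3, n_iter=100, rng=np.random.default_rng(1))
print(pis_hat)                  # roughly (0.3, 0.5, 0.2), up to component ordering
```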

Example - EM for Gaussian Mixtures (III/III)
Figure: True PDF. Figure: Estimate after a number of iterations of the EM algorithm.

The K-means Algorithm (I/II)
Algorithm (K-means algorithm, a.k.a. Lloyd's algorithm)
1. Initialize µ_k^0 and set i = 0.
2. Minimize J w.r.t. r_{nk}, keeping µ_k = µ_k^i fixed:
   r_{nk}^{i+1} = 1 if k = arg min_j ||x_n − µ_j^i||², and 0 otherwise.
3. Minimize J w.r.t. µ_k, keeping r_{nk} = r_{nk}^{i+1} fixed:
   µ_k^{i+1} = Σ_n r_{nk}^{i+1} x_n / Σ_n r_{nk}^{i+1}.
4. If not converged, update i := i + 1 and return to step 2.
Here J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_{nk} ||x_n − µ_k||² denotes the distortion measure being minimized.

The K-means Algorithm (II/II)
The name K-means stems from the fact that in step 3 of the algorithm, µ_k is given by the mean of all the data points assigned to cluster k. Note the similarities between the K-means algorithm and the EM algorithm for Gaussian mixtures! K-means is deterministic, with hard assignment of data points to clusters (no uncertainty), whereas EM is a probabilistic method that provides a soft assignment. If the Gaussian mixture is modeled using covariance matrices Σ_k = εI, k = 1, ..., K, it is straightforward to show that the EM algorithm for a mixture of K Gaussians is equivalent to the K-means algorithm when ε → 0.

A Few Concepts to Summarize Lecture 3
Latent variable: a variable that is not directly observed. Sometimes also referred to as a hidden variable or missing data.
Expectation Maximization (EM): the EM algorithm computes maximum likelihood estimates of unknown parameters in probabilistic models involving latent variables.
Jensen's inequality: states that if f is a convex function, then E[f(x)] ≥ f(E[x]).
Clustering: unsupervised learning where a set of observations is divided into clusters. The observations belonging to a certain cluster are similar in some sense.
K-means algorithm (a.k.a. Lloyd's algorithm): a clustering algorithm that assigns observations to K clusters such that each observation belongs to the cluster with the nearest (in the Euclidean sense) mean.
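Finally, a minimal K-means (Lloyd's algorithm) sketch matching the two alternating steps above; the data, K, the initialization and the iteration count are illustrative assumptions.

```python
# K-means (Lloyd's algorithm): alternate hard assignments and mean updates.
# The data X, K, the initialization and the iteration count are assumptions.
import numpy as np

def kmeans(X, K, n_iter, rng):
    mus = X[rng.choice(len(X), size=K, replace=False)]    # initialize mu_k at data points
    for _ in range(n_iter):
        # step 2: assign each x_n to the nearest mean (minimize J over r_nk)
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
        labels = d2.argmin(axis=1)
        # step 3: recompute each mean as the average of its assigned points
        for k in range(K):
            if np.any(labels == k):
                mus[k] = X[labels == k].mean(axis=0)
    return mus, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2))
               for m in ([0.0, 0.0], [3.0, 3.0], [-3.0, 2.0])])
mus, labels = kmeans(X, K=3, n_iter=20, rng=rng)
print(mus)                                                # one mean per cluster
```

Replacing the hard argmin assignment with the soft responsibilities γ(z_{nk}), and the plain means with weighted means, recovers the EM updates for a Gaussian mixture with Σ_k = εI as ε → 0, as noted above.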
