Data Mining and Analysis: Fundamental Concepts and Algorithms


Data Mining and Analysis: Fundamental Concepts and Algorithms (dataminingbook.info)
Mohammed J. Zaki, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
Wagner Meira Jr., Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Chapter 13: Representative-based Clustering

Representative-based Clustering

Given a dataset with $n$ points in a $d$-dimensional space, $D = \{x_i\}_{i=1}^{n}$, and given the number of desired clusters $k$, the goal of representative-based clustering is to partition the dataset into $k$ groups or clusters, called a clustering and denoted $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$. For each cluster $C_i$ there exists a representative point that summarizes the cluster, a common choice being the mean (also called the centroid) $\mu_i$ of all points in the cluster, that is,

$\mu_i = \frac{1}{n_i} \sum_{x_j \in C_i} x_j$

where $n_i = |C_i|$ is the number of points in cluster $C_i$.

A brute-force or exhaustive algorithm for finding a good clustering is simply to generate all possible partitions of $n$ points into $k$ clusters, evaluate some optimization score for each of them, and retain the clustering that yields the best score. However, this is clearly infeasible, since there are $O(k^n / k!)$ clusterings of $n$ points into $k$ groups.
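As a small illustration of the centroid formula, the following Python sketch (our own, not from the book; it assumes numpy, a data matrix D of shape (n, d), and a hypothetical list clusters of index arrays, one per cluster) computes the mean of each cluster:

import numpy as np

def centroids(D, clusters):
    # mu_i = (1/n_i) * sum of the points x_j in cluster C_i
    # D: (n, d) data matrix; clusters: list of index arrays, one per C_i
    return np.array([D[idx].mean(axis=0) for idx in clusters])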

K-means Algorithm: Objective

The sum of squared errors (SSE) scoring function is defined as

$SSE(\mathcal{C}) = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2$

The goal is to find the clustering that minimizes the SSE score:

$\mathcal{C}^* = \arg\min_{\mathcal{C}} \{ SSE(\mathcal{C}) \}$

K-means employs a greedy iterative approach to find a clustering that minimizes the SSE objective. As such, it can converge to a local optimum instead of a globally optimal clustering.
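The SSE score can be computed directly from its definition. A minimal sketch (assuming the same numpy representation as above; the helper name sse is ours):

import numpy as np

def sse(D, clusters, mus):
    # SSE(C) = sum_i sum_{x_j in C_i} ||x_j - mu_i||^2
    return sum(np.sum((D[idx] - mu) ** 2) for idx, mu in zip(clusters, mus))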

K-means Algorithm: Objective

K-means initializes the cluster means by randomly generating $k$ points in the data space. Each iteration of K-means consists of two steps: (1) cluster assignment, and (2) centroid update.

Given the $k$ cluster means, in the cluster assignment step each point $x_j \in D$ is assigned to the closest mean, which induces a clustering, with each cluster $C_i$ comprising points that are closer to $\mu_i$ than to any other cluster mean. That is, each point $x_j$ is assigned to cluster $C_{j^*}$, where

$j^* = \arg\min_{i=1}^{k} \{ \lVert x_j - \mu_i \rVert^2 \}$

Given a set of clusters $C_i$, $i = 1, \ldots, k$, in the centroid update step new mean values are computed for each cluster from the points in $C_i$.

The cluster assignment and centroid update steps are carried out iteratively until we reach a fixed point or local minimum.

K-Means Algorithm

K-MEANS (D, k, ε):
  t = 0
  Randomly initialize k centroids: µ_1^t, µ_2^t, ..., µ_k^t ∈ R^d
  repeat
    t ← t + 1
    C_j ← ∅ for all j = 1, ..., k
    // Cluster assignment step
    foreach x_j ∈ D do
      j* ← arg min_i { ||x_j − µ_i^{t−1}||^2 }   // assign x_j to the closest centroid
      C_{j*} ← C_{j*} ∪ {x_j}
    // Centroid update step
    foreach i = 1 to k do
      µ_i^t ← (1/|C_i|) Σ_{x_j ∈ C_i} x_j
  until Σ_{i=1}^{k} ||µ_i^t − µ_i^{t−1}||^2 ≤ ε
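A straightforward Python rendering of this pseudocode might look as follows; this is a sketch under our own conventions (numpy arrays, uniform random initialization within the data range, empty clusters keep their old centroid), not the authors' reference implementation:

import numpy as np

def kmeans(D, k, eps=1e-6, max_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    n, d = D.shape
    # Randomly initialize k centroids within the bounding box of the data
    lo, hi = D.min(axis=0), D.max(axis=0)
    mu = rng.uniform(lo, hi, size=(k, d))
    for t in range(max_iter):
        # Cluster assignment step: assign each point to its closest centroid
        dist = ((D[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, k)
        labels = dist.argmin(axis=1)
        # Centroid update step: recompute each mean (keep old mean if cluster is empty)
        new_mu = np.array([D[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
                           for i in range(k)])
        # Stop when the total squared centroid displacement falls below eps
        done = np.sum((new_mu - mu) ** 2) <= eps
        mu = new_mu
        if done:
            break
    return labels, mu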

K-means in One Dimension

[Figure: K-means on a one-dimensional dataset, showing the cluster means µ_1 and µ_2. Panels: (a) initial dataset, (b) iteration t = 1, (c) iteration t = 2.]

K-means in One Dimension (contd.)

[Figure: panels (d) iteration t = 3, (e) iteration t = 4, (f) iteration t = 5 (converged), showing the cluster means µ_1 and µ_2.]

K-means in 2D: Iris Principal Components

[Figure: (a) random initialization, t = 0; axes are the first two principal components u_1 and u_2.]

K-means in 2D: Iris Principal Components

[Figure: (b) iteration t = 1.]

K-means in 2D: Iris Principal Components

[Figure: (c) iteration t = 8 (converged).]

Kernel K-means

In K-means, the separating boundary between clusters is linear. Kernel K-means allows one to extract nonlinear boundaries between clusters via the kernel trick, i.e., all the required operations can be expressed using only kernel values between pairs of points.

Let $x_i \in D$ be mapped to $\phi(x_i)$ in feature space, and let $K = \{K(x_i, x_j)\}_{i,j=1,\ldots,n}$ denote the $n \times n$ symmetric kernel matrix, where $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.

The cluster means in feature space are $\{\mu_1^\phi, \ldots, \mu_k^\phi\}$, where $\mu_i^\phi = \frac{1}{n_i} \sum_{x_j \in C_i} \phi(x_j)$.

The sum of squared errors in feature space is

$\min_{\mathcal{C}} SSE(\mathcal{C}) = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert \phi(x_j) - \mu_i^\phi \rVert^2 = \sum_{j=1}^{n} K(x_j, x_j) - \sum_{i=1}^{k} \frac{1}{n_i} \sum_{x_a \in C_i} \sum_{x_b \in C_i} K(x_a, x_b)$

Thus, the kernel K-means SSE objective function can be expressed purely in terms of the kernel function.
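To make the identity concrete, the following sketch (ours; clusters is again a hypothetical list of index arrays) evaluates the feature-space SSE using only entries of the kernel matrix K:

import numpy as np

def kernel_sse(K, clusters):
    # SSE(C) = sum_j K(x_j, x_j) - sum_i (1/n_i) * sum_{a,b in C_i} K(x_a, x_b)
    total = np.trace(K)
    for idx in clusters:
        total -= K[np.ix_(idx, idx)].sum() / len(idx)
    return total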

Kernel K-means: Cluster Reassignment

Consider the distance of a point $\phi(x_j)$ to the mean $\mu_i^\phi$ in feature space:

$\lVert \phi(x_j) - \mu_i^\phi \rVert^2 = \lVert \phi(x_j) \rVert^2 - 2\,\phi(x_j)^T \mu_i^\phi + \lVert \mu_i^\phi \rVert^2 = K(x_j, x_j) - \frac{2}{n_i} \sum_{x_a \in C_i} K(x_a, x_j) + \frac{1}{n_i^2} \sum_{x_a \in C_i} \sum_{x_b \in C_i} K(x_a, x_b)$

Kernel K-means assigns a point to the closest cluster mean as follows:

$\mathcal{C}^*(x_j) = \arg\min_i \left\{ \lVert \phi(x_j) - \mu_i^\phi \rVert^2 \right\} = \arg\min_i \left\{ \frac{1}{n_i^2} \sum_{x_a \in C_i} \sum_{x_b \in C_i} K(x_a, x_b) - \frac{2}{n_i} \sum_{x_a \in C_i} K(x_a, x_j) \right\}$

since the term $K(x_j, x_j)$ is the same for all clusters and can be dropped.

Kernel K-means Algorithm

KERNEL-KMEANS (K, k, ε):
  t ← 0
  C^t ← {C_1^t, ..., C_k^t}   // randomly partition points into k clusters
  repeat
    t ← t + 1
    // Compute squared norm of cluster means
    foreach C_i ∈ C^{t−1} do
      sqnorm_i ← (1/n_i^2) Σ_{x_a ∈ C_i} Σ_{x_b ∈ C_i} K(x_a, x_b)
    // Average kernel value for x_j and C_i
    foreach x_j ∈ D do
      foreach C_i ∈ C^{t−1} do
        avg_ji ← (1/n_i) Σ_{x_a ∈ C_i} K(x_a, x_j)
    // Find closest cluster for each point
    foreach x_j ∈ D do
      foreach C_i ∈ C^{t−1} do
        d(x_j, C_i) ← sqnorm_i − 2·avg_ji
      j* ← arg min_i { d(x_j, C_i) }
      C_{j*}^t ← C_{j*}^t ∪ {x_j}   // cluster reassignment
    C^t ← {C_1^t, ..., C_k^t}
  until 1 − (1/n) Σ_{i=1}^{k} |C_i^t ∩ C_i^{t−1}| ≤ ε
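A compact Python sketch of the same iteration (our own approximation; it stops when no point changes cluster instead of thresholding the fraction of reassigned points, and it leaves the distance to an empty cluster at zero):

import numpy as np

def kernel_kmeans(K, k, max_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)          # random initial partition
    for t in range(max_iter):
        d = np.zeros((n, k))
        for i in range(k):
            idx = np.where(labels == i)[0]
            if len(idx) == 0:
                continue                          # empty cluster: leave d[:, i] at 0
            n_i = len(idx)
            # sqnorm_i = (1/n_i^2) * sum_{a,b in C_i} K(x_a, x_b)
            sqnorm = K[np.ix_(idx, idx)].sum() / n_i**2
            # avg_ji = (1/n_i) * sum_{a in C_i} K(x_a, x_j), for every point x_j
            avg = K[idx, :].sum(axis=0) / n_i
            d[:, i] = sqnorm - 2.0 * avg
        new_labels = d.argmin(axis=1)             # cluster reassignment
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels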

K-means or Kernel K-means with Linear Kernel

[Figure: (a) linear kernel, converged after t = 5 iterations; axes X_1 and X_2.]

Kernel K-means: Gaussian Kernel

[Figure: (b) Gaussian kernel, converged after t = 4 iterations.]

Expectation-Maximization Clustering: Gaussian Mixture Model

Let $X_a$ denote the random variable corresponding to the $a$-th attribute, and let $X = (X_1, X_2, \ldots, X_d)$ denote the vector random variable across the $d$ attributes, with $x_j$ being a data sample from $X$.

We assume that each cluster $C_i$ is characterized by a multivariate normal distribution

$f_i(x) = f(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2} \lvert \Sigma_i \rvert^{1/2}} \exp\left\{ -\frac{(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)}{2} \right\}$

where the cluster mean $\mu_i \in \mathbb{R}^d$ and covariance matrix $\Sigma_i \in \mathbb{R}^{d \times d}$ are both unknown parameters.

The probability density function of $X$ is given as a Gaussian mixture model over all the $k$ clusters:

$f(x) = \sum_{i=1}^{k} f_i(x)\, P(C_i) = \sum_{i=1}^{k} f(x \mid \mu_i, \Sigma_i)\, P(C_i)$

where the prior probabilities $P(C_i)$ are called the mixture parameters, which must satisfy the condition $\sum_{i=1}^{k} P(C_i) = 1$.

Expectation-Maximization Clustering: Maximum Likelihood Estimation

We write the set of all the model parameters compactly as

$\theta = \{\mu_1, \Sigma_1, P(C_1), \ldots, \mu_k, \Sigma_k, P(C_k)\}$

Given the dataset $D$, we define the likelihood of $\theta$ as the conditional probability of the data $D$ given the model parameters $\theta$:

$P(D \mid \theta) = \prod_{j=1}^{n} f(x_j)$

The goal of maximum likelihood estimation (MLE) is to choose the parameters $\theta$ that maximize the likelihood. We do this by maximizing the log of the likelihood function:

$\theta^* = \arg\max_{\theta} \{\ln P(D \mid \theta)\}$

where the log-likelihood function is given as

$\ln P(D \mid \theta) = \sum_{j=1}^{n} \ln f(x_j) = \sum_{j=1}^{n} \ln\left( \sum_{i=1}^{k} f(x_j \mid \mu_i, \Sigma_i)\, P(C_i) \right)$
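For concreteness, the log-likelihood can be evaluated with an off-the-shelf multivariate normal density; a sketch assuming the parameters are given as lists mus and Sigmas and an array priors (these names are ours):

import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(D, mus, Sigmas, priors):
    # ln P(D | theta) = sum_j ln( sum_i f(x_j | mu_i, Sigma_i) * P(C_i) )
    mix = np.zeros(D.shape[0])
    for mu, Sigma, p in zip(mus, Sigmas, priors):
        mix += p * multivariate_normal.pdf(D, mean=mu, cov=Sigma)
    return np.sum(np.log(mix))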

Expectation-Maximization Clustering

Directly maximizing the log-likelihood over $\theta$ is hard. Instead, we can use the expectation-maximization (EM) approach to find the maximum likelihood estimates for the parameters $\theta$. EM is a two-step iterative approach that starts from an initial guess for the parameters $\theta$.

Given the current estimates for $\theta$, in the expectation step EM computes the cluster posterior probabilities $P(C_i \mid x_j)$ via Bayes' theorem:

$P(C_i \mid x_j) = \frac{P(C_i \text{ and } x_j)}{P(x_j)} = \frac{P(x_j \mid C_i)\, P(C_i)}{\sum_{a=1}^{k} P(x_j \mid C_a)\, P(C_a)} = \frac{f_i(x_j)\, P(C_i)}{\sum_{a=1}^{k} f_a(x_j)\, P(C_a)}$

In the maximization step, using the weights $P(C_i \mid x_j)$, EM re-estimates $\theta$, that is, it re-estimates the parameters $\mu_i$, $\Sigma_i$, and $P(C_i)$ for each cluster $C_i$. The re-estimated mean is the weighted average of all the points, the re-estimated covariance matrix is the weighted covariance over all pairs of dimensions, and the re-estimated prior probability of each cluster is the fraction of the total weight that contributes to that cluster.
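The expectation step is then a direct application of Bayes' theorem. A minimal sketch, reusing the parameter representation assumed above:

import numpy as np
from scipy.stats import multivariate_normal

def posteriors(D, mus, Sigmas, priors):
    # w[j, i] = P(C_i | x_j) = f_i(x_j) P(C_i) / sum_a f_a(x_j) P(C_a)
    dens = np.column_stack([p * multivariate_normal.pdf(D, mean=mu, cov=Sigma)
                            for mu, Sigma, p in zip(mus, Sigmas, priors)])
    return dens / dens.sum(axis=1, keepdims=True)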

EM in One Dimension: Expectation Step

Let $D$ comprise a single attribute $X$, with each point $x_j \in \mathbb{R}$ a random sample from $X$. For the mixture model, we use univariate normals for each cluster:

$f_i(x) = f(x \mid \mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left\{ -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right\}$

with the cluster parameters $\mu_i$, $\sigma_i^2$, and $P(C_i)$.

Initialization: for each cluster $C_i$, with $i = 1, 2, \ldots, k$, we can randomly initialize the cluster parameters $\mu_i$, $\sigma_i^2$, and $P(C_i)$.

Expectation step: given the mean $\mu_i$, variance $\sigma_i^2$, and prior probability $P(C_i)$ for each cluster, the cluster posterior probability is computed as

$w_{ij} = P(C_i \mid x_j) = \frac{f(x_j \mid \mu_i, \sigma_i^2)\, P(C_i)}{\sum_{a=1}^{k} f(x_j \mid \mu_a, \sigma_a^2)\, P(C_a)}$

EM in One Dimension: Maximization Step

Given the $w_{ij}$ values, the re-estimated cluster mean is

$\mu_i = \frac{\sum_{j=1}^{n} w_{ij}\, x_j}{\sum_{j=1}^{n} w_{ij}}$

The re-estimated value of the cluster variance is computed as the weighted variance across all the points:

$\sigma_i^2 = \frac{\sum_{j=1}^{n} w_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n} w_{ij}}$

The prior probability of cluster $C_i$ is re-estimated as

$P(C_i) = \frac{\sum_{j=1}^{n} w_{ij}}{n}$
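Putting the one-dimensional expectation and maximization steps together, one possible sketch of the full loop (ours; the means are initialized by sampling k data points, which is an assumption, and a fixed number of iterations replaces a convergence test):

import numpy as np
from scipy.stats import norm

def em_1d(x, k, n_iter=100, seed=None):
    rng = np.random.default_rng(seed)
    # Initialize mu_i from the data, sigma_i^2 from the overall variance, P(C_i) = 1/k
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    prior = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Expectation step: w_ij = P(C_i | x_j)
        dens = np.array([p * norm.pdf(x, m, np.sqrt(v))
                         for m, v, p in zip(mu, var, prior)])     # (k, n)
        w = dens / dens.sum(axis=0, keepdims=True)
        # Maximization step: weighted mean, weighted variance, cluster prior
        wsum = w.sum(axis=1)
        mu = (w @ x) / wsum
        var = np.array([wi @ (x - m) ** 2 for wi, m in zip(w, mu)]) / wsum
        prior = wsum / len(x)
    return mu, var, prior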

EM in One Dimension

[Figure: the estimated cluster densities at (a) initialization (t = 0), with µ_1 = 6.63, and (b) iteration t = 1.]

EM in One Dimension: Final Clusters

[Figure: (c) iteration t = 5 (converged), showing the final cluster means µ_1 and µ_2.]

EM in d Dimensions

For each cluster we have to re-estimate the $d \times d$ covariance matrix:

$\Sigma_i = \begin{pmatrix} (\sigma_1^i)^2 & \sigma_{12}^i & \cdots & \sigma_{1d}^i \\ \sigma_{21}^i & (\sigma_2^i)^2 & \cdots & \sigma_{2d}^i \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1}^i & \sigma_{d2}^i & \cdots & (\sigma_d^i)^2 \end{pmatrix}$

This requires $O(d^2)$ parameters, which may be too many for reliable estimation. A simplification is to assume that all dimensions are independent, which leads to a diagonal covariance matrix:

$\Sigma_i = \begin{pmatrix} (\sigma_1^i)^2 & 0 & \cdots & 0 \\ 0 & (\sigma_2^i)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (\sigma_d^i)^2 \end{pmatrix}$

EM in d Dimensions

Expectation step: given $\mu_i$, $\Sigma_i$, and $P(C_i)$, the posterior probability is given as

$w_{ij} = P(C_i \mid x_j) = \frac{f_i(x_j)\, P(C_i)}{\sum_{a=1}^{k} f_a(x_j)\, P(C_a)}$

Maximization step: given the weights $w_{ij}$, we re-estimate $\mu_i$, $\Sigma_i$, and $P(C_i)$. The mean $\mu_i$ for cluster $C_i$ can be estimated as

$\mu_i = \frac{\sum_{j=1}^{n} w_{ij}\, x_j}{\sum_{j=1}^{n} w_{ij}}$

The covariance matrix $\Sigma_i$ is re-estimated via the outer-product form

$\Sigma_i = \frac{\sum_{j=1}^{n} w_{ij} (x_j - \mu_i)(x_j - \mu_i)^T}{\sum_{j=1}^{n} w_{ij}}$

The prior probability $P(C_i)$ for each cluster is

$P(C_i) = \frac{\sum_{j=1}^{n} w_{ij}}{n}$
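The three re-estimation formulas translate directly into numpy. A sketch of the maximization step alone (W is the n × k matrix of weights w_ij produced by the expectation step; the names are ours):

import numpy as np

def m_step(D, W):
    n, d = D.shape
    wsum = W.sum(axis=0)                       # total weight per cluster
    mus = (W.T @ D) / wsum[:, None]            # weighted means
    Sigmas = []
    for i in range(W.shape[1]):
        diff = D - mus[i]                      # (n, d) deviations from the new mean
        Sigmas.append((W[:, i][:, None] * diff).T @ diff / wsum[i])   # outer-product form
    priors = wsum / n                          # re-estimated priors
    return mus, Sigmas, priors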

Expectation-Maximization Clustering Algorithm

EXPECTATION-MAXIMIZATION (D, k, ε):
  t ← 0
  Randomly initialize µ_1^t, ..., µ_k^t
  Σ_i^t ← I, for i = 1, ..., k
  P^t(C_i) ← 1/k, for i = 1, ..., k
  repeat
    t ← t + 1
    // Expectation step
    for i = 1, ..., k and j = 1, ..., n do
      w_ij ← f(x_j | µ_i, Σ_i) P(C_i) / Σ_{a=1}^{k} f(x_j | µ_a, Σ_a) P(C_a)   // posterior probability P^t(C_i | x_j)
    // Maximization step
    for i = 1, ..., k do
      µ_i^t ← Σ_{j=1}^{n} w_ij x_j / Σ_{j=1}^{n} w_ij                        // re-estimate mean
      Σ_i^t ← Σ_{j=1}^{n} w_ij (x_j − µ_i)(x_j − µ_i)^T / Σ_{j=1}^{n} w_ij   // re-estimate covariance matrix
      P^t(C_i) ← Σ_{j=1}^{n} w_ij / n                                        // re-estimate priors
  until Σ_{i=1}^{k} ||µ_i^t − µ_i^{t−1}||^2 ≤ ε
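Combining the expectation and maximization steps gives a complete, if simplified, EM sketch (ours; the means are initialized from k random data points rather than arbitrary points, and scipy supplies the Gaussian density):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(D, k, eps=1e-6, max_iter=200, seed=None):
    rng = np.random.default_rng(seed)
    n, d = D.shape
    mus = D[rng.choice(n, size=k, replace=False)]    # initial means
    Sigmas = [np.eye(d) for _ in range(k)]           # identity covariance matrices
    priors = np.full(k, 1.0 / k)
    for t in range(max_iter):
        # Expectation step: posterior weights w_ij
        dens = np.column_stack([p * multivariate_normal.pdf(D, mean=mu, cov=S)
                                for mu, S, p in zip(mus, Sigmas, priors)])
        W = dens / dens.sum(axis=1, keepdims=True)
        # Maximization step: re-estimate means, covariance matrices, and priors
        wsum = W.sum(axis=0)
        new_mus = (W.T @ D) / wsum[:, None]
        Sigmas = [((W[:, i][:, None] * (D - new_mus[i])).T @ (D - new_mus[i])) / wsum[i]
                  for i in range(k)]
        priors = wsum / n
        # Stop when the total squared change in the means is at most eps
        done = np.sum((new_mus - mus) ** 2) <= eps
        mus = new_mus
        if done:
            break
    return mus, Sigmas, priors, W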

EM Clustering in 2D: Mixture of k = 3 Gaussians

[Figure: mixture density f(x) over X_1 and X_2 at (a) iteration t = 0.]

EM Clustering in 2D: Mixture of k = 3 Gaussians

[Figure: mixture density f(x) over X_1 and X_2 at (b) iteration t = 1.]

EM Clustering in 2D: Mixture of k = 3 Gaussians

[Figure: mixture density f(x) over X_1 and X_2 at (c) iteration t = 36 (converged).]

Iris Principal Components Data: Full Covariance Matrix

[Figure: (a) clustering with full covariance matrices, converged at t = 36; axes X_1 and X_2.]

Iris Principal Components Data: Diagonal Covariance Matrix

[Figure: (b) clustering with diagonal covariance matrices, converged at t = 29; axes X_1 and X_2.]
