Solution
Daozheng Chen

For all the scatter plots and 2D histogram plots in this solution, the x axis is the saturation component and the y axis is the value component. Throughout the solution, we denote the k-means clustering method as K-means and the Expectation-Maximization method as EM. The four leaf images in this project come from the real leaf images in the EFG project [1].

Challenge 1

Please refer to Chall 1.m for the MATLAB program. Figure 1 shows the segmentation results for the four images. Figure 2 displays scatter plots and 2D histograms for leaf1.jpg.

(a) From the 2D histogram plot of all data points in Figure 2, we can see that it has two clusters. One is at the upper left corner, whose data points are concentrated. The other is at the middle right part, whose data points are less densely packed.

(b) The shape of the boundary that separates the two clusters is approximately a straight line segment, for the following reason. Let $p = (x, y)$ be a point on the boundary, $c_1 = (c_{11}, c_{12})$ be the center of the first cluster, and $c_2 = (c_{21}, c_{22})$ be the center of the second cluster. Because the distance between $p$ and $c_1$ and the distance between $p$ and $c_2$ are the same for points on the boundary, and we are using Euclidean distance, we have

$$(x - c_{11})^2 + (y - c_{12})^2 = (x - c_{21})^2 + (y - c_{22})^2.$$

Thus

$$2(c_{21} - c_{11})x + 2(c_{22} - c_{12})y + c_{11}^2 + c_{12}^2 - c_{21}^2 - c_{22}^2 = 0.$$

Since in general we do not expect both $(c_{21} - c_{11})$ and $(c_{22} - c_{12})$ to be zero, this equation defines a line in our 2D space (a numerical sketch of this boundary follows at the end of this challenge).

(c) On one hand, using scatter plots we can see the range of the data points clearly. For example, according to the middle left plot in Figure 2, we clearly see that the range of the saturation value for cluster 1 data is approximately [0, 0.4]. However, in the corresponding 2D histogram it seems that no data point has a saturation value greater than 0.2. The reason is that the amount of data with saturation greater than 0.2 is very small compared with the amount that is smaller than 0.2. On the other hand, we can clearly tell the density of the data points using 2D histograms. For example, the top right plot in Figure 2 indicates that we have two clusters: the one in the upper left corner is more concentrated, and there is much less data between the two clusters. However, from the corresponding scatter plot (the top left plot) in Figure 2, it seems that only one cluster exists.
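As a small illustration of part (b), the following MATLAB sketch (separate from Chall 1.m; the two centers and the random points are made up for this example) checks numerically that nearest-center assignment under Euclidean distance agrees with the sign of the linear function derived above.

```matlab
% A minimal sketch (separate from Chall 1.m) of the boundary derived in
% part (b). The two cluster centers and the random points below are made
% up for illustration only.
c1 = [0.10, 0.80];   % hypothetical center of cluster 1 (saturation, value)
c2 = [0.55, 0.45];   % hypothetical center of cluster 2

% Coefficients of the boundary line a*x + b*y + d = 0 from the derivation
a = 2*(c2(1) - c1(1));
b = 2*(c2(2) - c1(2));
d = c1(1)^2 + c1(2)^2 - c2(1)^2 - c2(2)^2;

% Random points in the unit square (the saturation-value plane)
P = rand(1000, 2);

% Nearest-center assignment versus the sign of the linear function
d1 = sum((P - c1).^2, 2);                        % squared distance to c1
d2 = sum((P - c2).^2, 2);                        % squared distance to c2
nearestIsC1  = d1 < d2;
negativeSide = (a*P(:,1) + b*P(:,2) + d) < 0;    % d1 < d2 is equivalent to this

fprintf('agree for %d of %d points\n', sum(nearestIsC1 == negativeSide), size(P, 1));
```

Since the two conditions are algebraically equivalent, every point should agree, which is just a numerical restatement of the derivation above.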

Figure 1. Original and segmentation images for leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg

Challenge 2

The solution to this problem closely follows the derivation in [2].

Figure 2. Scatter and 2D histogram plots for leaf1.jpg using K-means

(a)
$$\log p(X, Y \mid \Theta) = \log\Big(\prod_{i=1}^{N} p(x_i, y_i \mid \Theta)\Big) = \log\Big(\prod_{i=1}^{N} p(x_i \mid y_i, \Theta)\, p(y_i \mid \Theta)\Big) = \sum_{i=1}^{N} \log\big(w_{y_i}\, p(x_i \mid \theta_{y_i})\big)$$

(b)
$$p(y_i \mid x_i, \Theta^{old}) = \frac{p(y_i, x_i \mid \Theta^{old})}{p(x_i \mid \Theta^{old})} = \frac{p(x_i \mid y_i, \Theta^{old})\, p(y_i \mid \Theta^{old})}{\sum_{j=1}^{K} p(j, x_i \mid \Theta^{old})}$$

$$= \frac{p(x_i \mid y_i, \Theta^{old})\, p(y_i \mid \Theta^{old})}{\sum_{j=1}^{K} p(x_i \mid j, \Theta^{old})\, p(j \mid \Theta^{old})} = \frac{w^{old}_{y_i}\, p(x_i \mid \theta^{old}_{y_i})}{\sum_{j=1}^{K} w^{old}_{j}\, p(x_i \mid \theta^{old}_{j})}$$

(c) For each $y_i$ in a vector $y$, its value can be $1, 2, \ldots, K$. Since $y$ is a vector of length $N$, there are in total $K^N$ different $y$'s in the sum defining $Q$. This number grows exponentially as $N$ grows (for example, $K = 2$ and $N = 100$ already give $2^{100} \approx 10^{30}$ terms), so it is not practical to use this formula directly for the evaluation unless $N$ is small.

Challenge 3

To find $w_1^{new}$ and $w_2^{new}$ that maximize

$$E(\Theta, \Theta^{old}) = \sum_{j=1}^{2}\sum_{i=1}^{N} \log(w_j)\, p(j \mid x_i, \Theta^{old}) + \sum_{j=1}^{2}\sum_{i=1}^{N} \log\big(p(x_i \mid \theta_j)\big)\, p(j \mid x_i, \Theta^{old}), \qquad (1)$$

we only need to optimize the terms containing $w_1$ and $w_2$. Using the fact that $w_1 + w_2 = 1$, we know

$$\sum_{j=1}^{2}\sum_{i=1}^{N} \log(w_j)\, p(j \mid x_i, \Theta^{old}) = \log(w_1)\, G_1 + \log(1 - w_1)\, G_2, \qquad (2)$$

where $G_1 = \sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old})$ and $G_2 = \sum_{i=1}^{N} p(2 \mid x_i, \Theta^{old})$. Assuming $0 < w_1 < 1$, let us take the derivative with respect to $w_1$ in (2) and set it to 0. We get

$$\frac{G_1}{w_1} - \frac{G_2}{1 - w_1} = 0.$$

Thus $w_1 = \frac{G_1}{G_1 + G_2}$, and then $w_2 = 1 - w_1 = \frac{G_2}{G_1 + G_2}$. Since $p(1 \mid x_i, \Theta^{old}) + p(2 \mid x_i, \Theta^{old}) = 1$ for each $i$ when $K = 2$, we have $G_1 + G_2 = N$. This verifies the formula for the weights in the assignment.

For $\mu_1^{new}$, we only need to optimize the terms containing $\mu_1$ in (1), which are

$$\sum_{i=1}^{N} \log\big(p(x_i \mid \theta_1)\big)\, p(1 \mid x_i, \Theta^{old}). \qquad (3)$$

Since $p(x_i \mid \theta_1)$ is a 1-dimensional Gaussian distribution, we have

$$\sum_{i=1}^{N} \log\big(p(x_i \mid \theta_1)\big)\, p(1 \mid x_i, \Theta^{old}) = \sum_{i=1}^{N} \log\Big(\frac{1}{\sqrt{2\pi}\,\sigma_1}\Big)\, p(1 \mid x_i, \Theta^{old}) - \sum_{i=1}^{N} \frac{(x_i - \mu_1)^2}{2\sigma_1^2}\, p(1 \mid x_i, \Theta^{old}).$$

Taking the derivative with respect to $\mu_1$ and setting it to 0, we get

$$\sum_{i=1}^{N} \frac{x_i - \mu_1}{\sigma_1^2}\, p(1 \mid x_i, \Theta^{old}) = 0.$$

Then we have

$$\sum_{i=1}^{N} x_i\, p(1 \mid x_i, \Theta^{old}) - \mu_1 \sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old}) = 0.$$

Thus

$$\mu_1 = \frac{\sum_{i=1}^{N} x_i\, p(1 \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old})}.$$

This verifies the formula for $\mu_1$.

For $\Sigma_1^{new} = (\sigma_1^{new})^2$, since $\sigma_1 > 0$, we only need to optimize the terms containing $\sigma_1$ in (1), which are again those in (3). Taking the derivative with respect to $\sigma_1$ and setting it to 0, we get

$$-\sum_{i=1}^{N} \frac{1}{\sigma_1}\, p(1 \mid x_i, \Theta^{old}) + \sum_{i=1}^{N} \frac{(x_i - \mu_1)^2}{\sigma_1^3}\, p(1 \mid x_i, \Theta^{old}) = 0.$$

Multiplying both sides by $\sigma_1$, we have

$$-\sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old}) + \frac{1}{\sigma_1^2} \sum_{i=1}^{N} (x_i - \mu_1)^2\, p(1 \mid x_i, \Theta^{old}) = 0.$$

Thus

$$\sigma_1^2 = \frac{\sum_{i=1}^{N} (x_i - \mu_1)^2\, p(1 \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old})}.$$

This verifies the formula for $\Sigma_1^{new}$. Similarly, we can verify the formulas for $\mu_2^{new}$ and $\Sigma_2^{new}$.

Challenge 4

Please refer to Chall 4.m for the MATLAB program. Table 1 shows the log-likelihood for each iteration, with the decimal part of the numbers dropped. Figure 3 shows the segmentation results for the four images by EM. Figure 4 compares the segmentations by K-means and EM. Figures 5, 6, and 7 display scatter plots and 2D histograms for leaf1.jpg, leaf2.jpg, and leaf3.jpg respectively, using both K-means and EM. The graphs in the left columns are results from K-means, and those in the right columns are from EM.

(a) According to Table 1, the log-likelihood increases in each iteration. This is consistent with the assignment description.
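To make the Challenge 2 and Challenge 3 formulas concrete, here is a minimal MATLAB sketch (separate from Chall 4.m; the toy 1-D data and initial parameters are made up for illustration) of a few EM iterations for a two-component mixture. The E-step uses the posterior from Challenge 2(b), the M-step uses the updates verified in Challenge 3, and the printed log-likelihood should increase at every iteration, as observed in part (a).

```matlab
% A minimal sketch (not Chall 4.m): EM for a 1-D, two-component Gaussian
% mixture with made-up data and made-up initial parameters.
x  = [randn(200, 1) - 2; 0.5*randn(300, 1) + 3];   % toy 1-D data, N = 500
N  = numel(x);
w  = [0.5, 0.5];   % initial weights
mu = [-1, 1];      % initial means
s2 = [1, 1];       % initial variances

gauss = @(x, m, v) exp(-(x - m).^2 ./ (2*v)) ./ sqrt(2*pi*v);

for iter = 1:10
    % E-step: p(j | x_i, Theta_old) = w_j p(x_i | theta_j) / sum_k w_k p(x_i | theta_k)
    num  = [w(1)*gauss(x, mu(1), s2(1)), w(2)*gauss(x, mu(2), s2(2))];
    post = num ./ sum(num, 2);

    % M-step: the updates derived in Challenge 3
    G  = sum(post, 1);          % G_j = sum_i p(j | x_i, Theta_old), so G(1)+G(2) = N
    w  = G / N;                 % new weights
    mu = (x' * post) ./ G;      % new means
    s2(1) = sum(post(:, 1) .* (x - mu(1)).^2) / G(1);   % new variances
    s2(2) = sum(post(:, 2) .* (x - mu(2)).^2) / G(2);

    % The log-likelihood should increase at every iteration
    ll = sum(log(w(1)*gauss(x, mu(1), s2(1)) + w(2)*gauss(x, mu(2), s2(2))));
    fprintf('iteration %2d: log-likelihood = %.2f\n', iter, ll);
end
```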

Table 1. Log-likelihood per iteration

Iteration   leaf1.jpg   leaf2.jpg   leaf3.jpg   leaf4.jpg
    1        488201      833469      510219      366208
    2        489114      869357      518228      366815
    3        492438      957410      552401      369101
    4        504566     1059221      633994      378038
    5        550631     1183383      821064      400478
    6        723990     1343250     1258385      454810
    7        947808     1380842     1296188      581763
    8       1010842     1381540     1298169      733740
    9       1016346           -     1298622      962717
   10       1017713           -           -     1035286
   11       1018287           -           -     1037507
   12             -           -           -     1038317

Figure 3. Original and segmentation images for leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg using EM

(b) According to the scatter plots in Figure 5, the right part of cluster 1 by K-means is given to cluster 2 by EM. The 2D histograms of the two clusters do not differ much between the two methods, and the segmentation results look quite similar too. This implies that the amount of data exchanged between these two clusters is relatively small compared with the main parts of the clusters, so the clusterings by K-means and EM are very similar. However, K-means fails to assign some very small regions inside the leaf body to white pixels, while EM does (Figure 4). Note that in this case the two clusters are well separated.

(c) First, for leaf2.jpg, similar to part (b), the right part of cluster 1 by K-means is given to cluster 2 by EM (Figure 6). However, different from part (b), the 2D histogram of EM also clearly shows that cluster 2 gains another top left region compared with cluster 2 by K-means. This means that the number of new data points added to cluster 2 is significant compared with cluster 2's original size. Note that the two clusters are not well separated in this case: one cluster is concentrated in one region, while the other spreads widely and looks like a tail attached to the dense cluster. K-means gives the dense cluster a small tail, while EM removes the tail completely. The segmentation images (Figure 4) also look very different. The K-means segmentation has a big leaf region missing, while the EM algorithm segments out the whole leaf region and also marks part of the lower right corner of the image as leaf region.

Second, for leaf3.jpg, we have a very similar situation to that of leaf2.jpg. The right portion of cluster 1 by K-means is put into cluster 2 by EM (Figure 7). EM's 2D histogram for cluster 2 also shows this addition, which appears as a light blue area (like a tail) to the left of the main region of cluster 2. Similarly, the two clusters are not well separated, and there is a set of sparsely distributed data connecting the two clusters. K-means gives some of this data to the upper left dense cluster, while EM removes all of it. In the resulting segmentation (Figure 4), the stem of the leaf, which is missed by K-means, shows up with EM.

(d) Comparing the two segmentation images (Figure 4) for leaf4.jpg, K-means produces fewer white dots outside the leaf region. This happens for leaf2.jpg too. Furthermore, the stem produced by EM is too big compared with the original image; this also happens in leaf3.jpg. Although the EM algorithm successfully makes the stem show up, the segmented leaf is bigger than the leaf in the original image.

Figure 4. Segmentation images for leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg using K-means (left) and EM (right)

(e) Based on the discussion in parts (b), (c), and (d), K-means in general produces fewer white dots outside the leaf region, and the shape of the segmented leaf matches the shape of the actual leaf more closely. However, it tends to miss some parts of the leaf, which makes the resulting segmentation incorrect. EM is more capable of producing the whole leaf and showing more details of the leaf, such as the stem in leaf4.jpg. However, the segmented leaf region tends to be bigger than the actual leaf. For the four images, EM is in general better than K-means in terms of segmentation quality.

Challenge 5

(a) The idea of this problem is based on a paper by Roberts [3]. Using $\hat{p}$, the three formulas for $w_j^{new}$, $\mu_j^{new}$, and $\Sigma_j^{new}$ in the assignment description become

$$w_j^{new} = \frac{1}{N} \sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})$$

$$\mu_j^{new} = \frac{\sum_{i=1}^{N} x_i\, \hat{p}(j \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})}$$

$$\Sigma_j^{new} = \frac{\sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})\, (x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{\sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})}$$

For all the summations in these formulas, we sum only over those elements whose $\hat{p}$ value is 1. If we say that $p(s \mid x_i, \Theta^{old})$ being highest at $s = j$ means that we assign the data point $x_i$ to cluster $C_j$, then $C_j$ contains exactly those elements whose $\hat{p}(j \mid x_i, \Theta^{old})$ is 1. Therefore $|C_j| = \sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})$, where $|C_j|$ is the size of $C_j$, and we can write the summation over $i = 1, \ldots, N$ as a summation over all elements in $C_j$. Then these three formulas become

$$w_j^{new} = \frac{1}{N}\,|C_j| \qquad (4)$$

$$\mu_j^{new} = \frac{\sum_{x_i \in C_j} x_i}{|C_j|} \qquad (5)$$

$$\Sigma_j^{new} = \frac{\sum_{x_i \in C_j} (x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{|C_j|} \qquad (6)$$

This means that formula (4) is the fraction of data points that are in $C_j$, formula (5) is the mean of the data points in $C_j$, and formula (6) computes the covariance matrix of the data points in $C_j$.

(b) We can say that within one iteration of this EM algorithm, we first assign each data point to the cluster whose posterior probability is the highest; then, for each cluster, we update its distribution parameters according to the data points assigned to it in the first step.
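The following MATLAB sketch illustrates this hard-assignment step; the 2-D toy data and the current parameter values are made up for illustration and are not part of the assignment code. It computes the posteriors under the current parameters, assigns each point to the argmax component, and then applies the reduced updates (4)-(6).

```matlab
% A minimal sketch of the hard-assignment update in Challenge 5: assign each
% point to the component with the highest posterior, then apply (4)-(6).
% The 2-D data and the current parameters below are made up for illustration.
X = [randn(200, 2) + 1; 0.5*randn(300, 2) - 1];   % toy 2-D data, N = 500
N = size(X, 1);
K = 2;
w     = [0.5, 0.5];                 % current weights
mu    = [2, 2; -2, -2];             % current means (one row per component)
Sigma = cat(3, eye(2), eye(2));     % current covariance matrices

% Posteriors p(j | x_i, Theta_old) under the current parameters
post = zeros(N, K);
for j = 1:K
    diffs = X - mu(j, :);
    quad  = sum((diffs / Sigma(:, :, j)) .* diffs, 2);     % Mahalanobis term
    post(:, j) = w(j) * exp(-0.5*quad) / (2*pi*sqrt(det(Sigma(:, :, j))));
end
post = post ./ sum(post, 2);

% Hard assignment (hat-p is 1 for the argmax component, 0 otherwise)
[~, label] = max(post, [], 2);

% The reduced updates (4)-(6): fraction, mean, and covariance of each C_j
w_new = zeros(1, K);  mu_new = zeros(K, 2);  Sigma_new = zeros(2, 2, K);
for j = 1:K
    Cj = X(label == j, :);                  % data points assigned to cluster C_j
    w_new(j)           = size(Cj, 1) / N;   % (4)
    mu_new(j, :)       = mean(Cj, 1);       % (5)
    Sigma_new(:, :, j) = cov(Cj, 1);        % (6), normalized by |C_j|
end
disp(w_new); disp(mu_new);
```

This is exactly the interpretation in part (b): a hard assignment step followed by per-cluster parameter estimates.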

Figure 5. Scatter plots and 2D histograms for leaf1.jpg using K-means (left) and EM (right)

Figure 6. Scatter plots and 2D histograms for leaf2.jpg using K-means (left) and EM (right)

Figure 7. Scatter plots and 2D histograms for leaf3.jpg using K-means (left) and EM (right)

Bibliography

[1] Peter N. Belhumeur, Daozheng Chen, Steven Feiner, David W. Jacobs, W. John Kress, Haibin Ling, Ida Lopez, Ravi Ramamoorthi, Sameer Sheorey, Sean White, and Ling Zhang. Searching the world's herbaria: A system for visual identification of plant species. In David A. Forsyth, Philip H. S. Torr, and Andrew Zisserman, editors, ECCV (4), volume 5305 of Lecture Notes in Computer Science, pages 116-129. Springer, 2008.

[2] Jeff A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, 1997.

[3] Stephen J. Roberts. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30:261-272, 1997.