Solution
Daozheng Chen

For all the scatter plots and 2D histogram plots in this solution, the x axis is the saturation component and the y axis is the value component. Throughout the solution, we denote the k-means clustering method as K-means and the Expectation-Maximization method as EM. The four leaf images in this project come from the real leaf images in the EFG project [1].

Challenge 1

Please refer to Chall 1.m for the MATLAB program. Figure 1 shows the segmentation results for the four images. Figure 2 displays scatter plots and 2D histograms for leaf1.jpg.

(a) From the 2D histogram plot of all data points in Figure 2, we can see that it has two clusters. One is at the upper left corner, whose data points are concentrated. The other is at the middle right part, whose data points are less densely packed.

(b) The shape of the boundary that separates the two clusters is approximately a straight line segment, for the following reason. Let $p = (x, y)$ be a point on the boundary, $c_1 = (c_{11}, c_{12})$ be the center of the first cluster, and $c_2 = (c_{21}, c_{22})$ be the center of the second cluster. Because the distance between $p$ and $c_1$ and the distance between $p$ and $c_2$ are the same for points on the boundary, and we are using Euclidean distance, we have

$$(x - c_{11})^2 + (y - c_{12})^2 = (x - c_{21})^2 + (y - c_{22})^2.$$

Thus

$$2(c_{21} - c_{11})x + 2(c_{22} - c_{12})y + c_{11}^2 + c_{12}^2 - c_{21}^2 - c_{22}^2 = 0.$$

Since in general we do not expect both $(c_{21} - c_{11})$ and $(c_{22} - c_{12})$ to be zero, this equation defines a line in our 2D space (a numerical sketch of this boundary follows at the end of this challenge).

(c) On one hand, using scatter plots we can see the range of the data points clearly. For example, according to the middle left plot in Figure 2, we clearly see that the range of the saturation value for cluster 1 data is approximately [0, 0.4]. However, in the corresponding 2D histogram it seems that no data point has a saturation value greater than 0.2. The reason is that the amount of data with saturation greater than 0.2 is very small compared with the amount that is smaller than 0.2. On the other hand, we can clearly tell the density of the data points using 2D histograms. For example, the top right plot in Figure 2 indicates that we have two clusters: the one in the upper left corner is more concentrated, and there is much less data between the two clusters. However, from the corresponding scatter plot (the top left plot) in Figure 2, it seems that only one cluster exists.
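As a small illustration of part (b), the following MATLAB sketch (separate from Chall 1.m; the two centers and the random points are made up for this example) checks numerically that nearest-center assignment under Euclidean distance agrees with the sign of the linear function derived above.

```matlab
% A minimal sketch (separate from Chall 1.m) of the boundary derived in
% part (b). The two cluster centers and the random points below are made
% up for illustration only.
c1 = [0.10, 0.80];   % hypothetical center of cluster 1 (saturation, value)
c2 = [0.55, 0.45];   % hypothetical center of cluster 2

% Coefficients of the boundary line a*x + b*y + d = 0 from the derivation
a = 2*(c2(1) - c1(1));
b = 2*(c2(2) - c1(2));
d = c1(1)^2 + c1(2)^2 - c2(1)^2 - c2(2)^2;

% Random points in the unit square (the saturation-value plane)
P = rand(1000, 2);

% Nearest-center assignment versus the sign of the linear function
d1 = sum((P - c1).^2, 2);                        % squared distance to c1
d2 = sum((P - c2).^2, 2);                        % squared distance to c2
nearestIsC1  = d1 < d2;
negativeSide = (a*P(:,1) + b*P(:,2) + d) < 0;    % d1 < d2 is equivalent to this

fprintf('agree for %d of %d points\n', sum(nearestIsC1 == negativeSide), size(P, 1));
```

Since the two conditions are algebraically equivalent, every point should agree, which is just a numerical restatement of the derivation above.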

Figure 1. Original and segmentation images for leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg

Challenge 2

The solution to this problem closely follows the derivation in [2].

Figure 2. Scatter and 2D histogram plots for leaf1.jpg using K-means

(a)
$$\log p(X, Y \mid \Theta) = \log\Big(\prod_{i=1}^{N} p(x_i, y_i \mid \Theta)\Big) = \log\Big(\prod_{i=1}^{N} p(x_i \mid y_i, \Theta)\, p(y_i \mid \Theta)\Big) = \sum_{i=1}^{N} \log\big(w_{y_i}\, p(x_i \mid \theta_{y_i})\big)$$

(b)
$$p(y_i \mid x_i, \Theta^{old}) = \frac{p(y_i, x_i \mid \Theta^{old})}{p(x_i \mid \Theta^{old})} = \frac{p(x_i \mid y_i, \Theta^{old})\, p(y_i \mid \Theta^{old})}{\sum_{j=1}^{K} p(j, x_i \mid \Theta^{old})}$$

$$= \frac{p(x_i \mid y_i, \Theta^{old})\, p(y_i \mid \Theta^{old})}{\sum_{j=1}^{K} p(x_i \mid j, \Theta^{old})\, p(j \mid \Theta^{old})} = \frac{w^{old}_{y_i}\, p(x_i \mid \theta^{old}_{y_i})}{\sum_{j=1}^{K} w^{old}_{j}\, p(x_i \mid \theta^{old}_{j})}$$

(c) For each $y_i$ in a vector $y$, its value can be $1, 2, \ldots, K$. Since $y$ is a vector of length $N$, there are in total $K^N$ different $y$'s in the sum defining $Q$. This number grows exponentially as $N$ grows (for example, $K = 2$ and $N = 100$ already give $2^{100} \approx 10^{30}$ terms), so it is not practical to use this formula directly for the evaluation unless $N$ is small.

Challenge 3

To find $w_1^{new}$ and $w_2^{new}$ that maximize

$$E(\Theta, \Theta^{old}) = \sum_{j=1}^{2}\sum_{i=1}^{N} \log(w_j)\, p(j \mid x_i, \Theta^{old}) + \sum_{j=1}^{2}\sum_{i=1}^{N} \log\big(p(x_i \mid \theta_j)\big)\, p(j \mid x_i, \Theta^{old}), \qquad (1)$$

we only need to optimize the terms containing $w_1$ and $w_2$. Using the fact that $w_1 + w_2 = 1$, we know

$$\sum_{j=1}^{2}\sum_{i=1}^{N} \log(w_j)\, p(j \mid x_i, \Theta^{old}) = \log(w_1)\, G_1 + \log(1 - w_1)\, G_2, \qquad (2)$$

where $G_1 = \sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old})$ and $G_2 = \sum_{i=1}^{N} p(2 \mid x_i, \Theta^{old})$. Assuming $0 < w_1 < 1$, let us take the derivative with respect to $w_1$ in (2) and set it to 0. We get

$$\frac{G_1}{w_1} - \frac{G_2}{1 - w_1} = 0.$$

Thus $w_1 = \frac{G_1}{G_1 + G_2}$, and then $w_2 = 1 - w_1 = \frac{G_2}{G_1 + G_2}$. Since $p(1 \mid x_i, \Theta^{old}) + p(2 \mid x_i, \Theta^{old}) = 1$ for each $i$ when $K = 2$, we have $G_1 + G_2 = N$. This verifies the formula for the weights in the assignment.

For $\mu_1^{new}$, we only need to optimize the terms containing $\mu_1$ in (1), which are

$$\sum_{i=1}^{N} \log\big(p(x_i \mid \theta_1)\big)\, p(1 \mid x_i, \Theta^{old}). \qquad (3)$$

Since $p(x_i \mid \theta_1)$ is a 1-dimensional Gaussian distribution, we have

$$\sum_{i=1}^{N} \log\big(p(x_i \mid \theta_1)\big)\, p(1 \mid x_i, \Theta^{old}) = \sum_{i=1}^{N} \log\Big(\frac{1}{\sqrt{2\pi}\,\sigma_1}\Big)\, p(1 \mid x_i, \Theta^{old}) - \sum_{i=1}^{N} \frac{(x_i - \mu_1)^2}{2\sigma_1^2}\, p(1 \mid x_i, \Theta^{old}).$$

Taking the derivative with respect to $\mu_1$ and setting it to 0, we get

$$\sum_{i=1}^{N} \frac{x_i - \mu_1}{\sigma_1^2}\, p(1 \mid x_i, \Theta^{old}) = 0.$$

Then we have

$$\sum_{i=1}^{N} x_i\, p(1 \mid x_i, \Theta^{old}) - \mu_1 \sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old}) = 0.$$

Thus

$$\mu_1 = \frac{\sum_{i=1}^{N} x_i\, p(1 \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old})}.$$

This verifies the formula for $\mu_1$.

For $\Sigma_1^{new} = (\sigma_1^{new})^2$, since $\sigma_1 > 0$, we only need to optimize the terms containing $\sigma_1$ in (1), which are again those in (3). Taking the derivative with respect to $\sigma_1$ and setting it to 0, we get

$$-\sum_{i=1}^{N} \frac{1}{\sigma_1}\, p(1 \mid x_i, \Theta^{old}) + \sum_{i=1}^{N} \frac{(x_i - \mu_1)^2}{\sigma_1^3}\, p(1 \mid x_i, \Theta^{old}) = 0.$$

Multiplying both sides by $\sigma_1$, we have

$$-\sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old}) + \frac{1}{\sigma_1^2} \sum_{i=1}^{N} (x_i - \mu_1)^2\, p(1 \mid x_i, \Theta^{old}) = 0.$$

Thus

$$\sigma_1^2 = \frac{\sum_{i=1}^{N} (x_i - \mu_1)^2\, p(1 \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(1 \mid x_i, \Theta^{old})}.$$

This verifies the formula for $\Sigma_1^{new}$. Similarly, we can verify the formulas for $\mu_2^{new}$ and $\Sigma_2^{new}$.

Challenge 4

Please refer to Chall 4.m for the MATLAB program. Table 1 shows the log-likelihood for each iteration, with the decimal part of the numbers dropped. Figure 3 shows the segmentation results for the four images by EM. Figure 4 compares the segmentations by K-means and EM. Figures 5, 6, and 7 display scatter plots and 2D histograms for leaf1.jpg, leaf2.jpg, and leaf3.jpg respectively, using both K-means and EM. The graphs in the left columns are results from K-means, and those in the right columns are from EM.

(a) According to Table 1, the log-likelihood increases in each iteration. This is consistent with the assignment description.
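To make the Challenge 2 and Challenge 3 formulas concrete, here is a minimal MATLAB sketch (separate from Chall 4.m; the toy 1-D data and initial parameters are made up for illustration) of a few EM iterations for a two-component mixture. The E-step uses the posterior from Challenge 2(b), the M-step uses the updates verified in Challenge 3, and the printed log-likelihood should increase at every iteration, as observed in part (a).

```matlab
% A minimal sketch (not Chall 4.m): EM for a 1-D, two-component Gaussian
% mixture with made-up data and made-up initial parameters.
x  = [randn(200, 1) - 2; 0.5*randn(300, 1) + 3];   % toy 1-D data, N = 500
N  = numel(x);
w  = [0.5, 0.5];   % initial weights
mu = [-1, 1];      % initial means
s2 = [1, 1];       % initial variances

gauss = @(x, m, v) exp(-(x - m).^2 ./ (2*v)) ./ sqrt(2*pi*v);

for iter = 1:10
    % E-step: p(j | x_i, Theta_old) = w_j p(x_i | theta_j) / sum_k w_k p(x_i | theta_k)
    num  = [w(1)*gauss(x, mu(1), s2(1)), w(2)*gauss(x, mu(2), s2(2))];
    post = num ./ sum(num, 2);

    % M-step: the updates derived in Challenge 3
    G  = sum(post, 1);          % G_j = sum_i p(j | x_i, Theta_old), so G(1)+G(2) = N
    w  = G / N;                 % new weights
    mu = (x' * post) ./ G;      % new means
    s2(1) = sum(post(:, 1) .* (x - mu(1)).^2) / G(1);   % new variances
    s2(2) = sum(post(:, 2) .* (x - mu(2)).^2) / G(2);

    % The log-likelihood should increase at every iteration
    ll = sum(log(w(1)*gauss(x, mu(1), s2(1)) + w(2)*gauss(x, mu(2), s2(2))));
    fprintf('iteration %2d: log-likelihood = %.2f\n', iter, ll);
end
```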

Table 1. Log-likelihood per iteration

Iteration   leaf1.jpg   leaf2.jpg   leaf3.jpg   leaf4.jpg
    1        488201      833469      510219      366208
    2        489114      869357      518228      366815
    3        492438      957410      552401      369101
    4        504566     1059221      633994      378038
    5        550631     1183383      821064      400478
    6        723990     1343250     1258385      454810
    7        947808     1380842     1296188      581763
    8       1010842     1381540     1298169      733740
    9       1016346           -     1298622      962717
   10       1017713           -           -     1035286
   11       1018287           -           -     1037507
   12             -           -           -     1038317

Figure 3. Original and segmentation images for leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg using EM

(b) According to the scatter plots in Figure 5, the right part of cluster 1 by K-means is given to cluster 2 by EM. The 2D histograms of the two clusters do not differ much between the two methods, and the segmentation results look quite similar too. This implies that the amount of data exchanged between these two clusters is relatively small compared with the main parts of the clusters, so the clusterings by K-means and EM are very similar. However, K-means fails to assign some very small regions inside the leaf body to white pixels, while EM does (Figure 4). Note that in this case the two clusters are well separated.

(c) First, for leaf2.jpg, similar to part (b), the right part of cluster 1 by K-means is given to cluster 2 by EM (Figure 6). However, different from part (b), the 2D histogram of EM also clearly shows that cluster 2 gains another top left region compared with cluster 2 by K-means. This means that the number of new data points added to cluster 2 is significant compared with cluster 2's original size. Note that the two clusters are not well separated in this case: one cluster is concentrated in one region, while the other spreads widely and looks like a tail attached to the dense cluster. K-means gives the dense cluster a small tail, while EM removes the tail completely. The segmentation images (Figure 4) also look very different. The K-means segmentation has a big leaf region missing, while the EM algorithm segments out the whole leaf region and also marks part of the lower right corner of the image as leaf region.

Second, for leaf3.jpg, we have a very similar situation to that of leaf2.jpg. The right portion of cluster 1 by K-means is put into cluster 2 by EM (Figure 7). EM's 2D histogram for cluster 2 also shows this addition, which appears as a light blue area (like a tail) to the left of the main region of cluster 2. Similarly, the two clusters are not well separated, and there is a set of sparsely distributed data connecting the two clusters. K-means gives some of this data to the upper left dense cluster, while EM removes all of it. In the resulting segmentation (Figure 4), the stem of the leaf, which is missed by K-means, shows up with EM.

(d) Comparing the two segmentation images (Figure 4) for leaf4.jpg, K-means produces fewer white dots outside the leaf region. This happens for leaf2.jpg too. Furthermore, the stem produced by EM is too big compared with the original image; this also happens in leaf3.jpg. Although the EM algorithm successfully makes the stem show up, the segmented leaf is bigger than the leaf in the original image.

Figure 4. Segmentation images for leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg using K-means (left) and EM (right)

(e) Based on the discussion in parts (b), (c), and (d), K-means in general produces fewer white dots outside the leaf region, and the shape of the segmented leaf matches the shape of the actual leaf more closely. However, it tends to miss some parts of the leaf, which makes the resulting segmentation incorrect. EM is more capable of producing the whole leaf and showing more details of the leaf, such as the stem in leaf4.jpg. However, the segmented leaf region tends to be bigger than the actual leaf. For the four images, EM is in general better than K-means in terms of segmentation quality.

Challenge 5

(a) The idea of this problem is based on a paper by Roberts [3]. Using $\hat{p}$, the three formulas for $w_j^{new}$, $\mu_j^{new}$, and $\Sigma_j^{new}$ in the assignment description become

$$w_j^{new} = \frac{1}{N} \sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})$$

$$\mu_j^{new} = \frac{\sum_{i=1}^{N} x_i\, \hat{p}(j \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})}$$

$$\Sigma_j^{new} = \frac{\sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})\, (x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{\sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})}$$

For all the summations in these formulas, we sum only over those elements whose $\hat{p}$ value is 1. If we say that $p(s \mid x_i, \Theta^{old})$ being highest at $s = j$ means that we assign the data point $x_i$ to cluster $C_j$, then $C_j$ contains exactly those elements whose $\hat{p}(j \mid x_i, \Theta^{old})$ is 1. Therefore $|C_j| = \sum_{i=1}^{N} \hat{p}(j \mid x_i, \Theta^{old})$, where $|C_j|$ is the size of $C_j$, and we can write the summation over $i = 1, \ldots, N$ as a summation over all elements in $C_j$. Then these three formulas become

$$w_j^{new} = \frac{1}{N}\,|C_j| \qquad (4)$$

$$\mu_j^{new} = \frac{\sum_{x_i \in C_j} x_i}{|C_j|} \qquad (5)$$

$$\Sigma_j^{new} = \frac{\sum_{x_i \in C_j} (x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{|C_j|} \qquad (6)$$

This means that formula (4) is the fraction of data points that are in $C_j$, formula (5) is the mean of the data points in $C_j$, and formula (6) computes the covariance matrix of the data points in $C_j$.

(b) We can say that within one iteration of this EM algorithm, we first assign each data point to the cluster whose posterior probability is the highest; then, for each cluster, we update its distribution parameters according to the data points assigned to it in the first step.
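The following MATLAB sketch illustrates this hard-assignment step; the 2-D toy data and the current parameter values are made up for illustration and are not part of the assignment code. It computes the posteriors under the current parameters, assigns each point to the argmax component, and then applies the reduced updates (4)-(6).

```matlab
% A minimal sketch of the hard-assignment update in Challenge 5: assign each
% point to the component with the highest posterior, then apply (4)-(6).
% The 2-D data and the current parameters below are made up for illustration.
X = [randn(200, 2) + 1; 0.5*randn(300, 2) - 1];   % toy 2-D data, N = 500
N = size(X, 1);
K = 2;
w     = [0.5, 0.5];                 % current weights
mu    = [2, 2; -2, -2];             % current means (one row per component)
Sigma = cat(3, eye(2), eye(2));     % current covariance matrices

% Posteriors p(j | x_i, Theta_old) under the current parameters
post = zeros(N, K);
for j = 1:K
    diffs = X - mu(j, :);
    quad  = sum((diffs / Sigma(:, :, j)) .* diffs, 2);     % Mahalanobis term
    post(:, j) = w(j) * exp(-0.5*quad) / (2*pi*sqrt(det(Sigma(:, :, j))));
end
post = post ./ sum(post, 2);

% Hard assignment (hat-p is 1 for the argmax component, 0 otherwise)
[~, label] = max(post, [], 2);

% The reduced updates (4)-(6): fraction, mean, and covariance of each C_j
w_new = zeros(1, K);  mu_new = zeros(K, 2);  Sigma_new = zeros(2, 2, K);
for j = 1:K
    Cj = X(label == j, :);                  % data points assigned to cluster C_j
    w_new(j)           = size(Cj, 1) / N;   % (4)
    mu_new(j, :)       = mean(Cj, 1);       % (5)
    Sigma_new(:, :, j) = cov(Cj, 1);        % (6), normalized by |C_j|
end
disp(w_new); disp(mu_new);
```

This is exactly the interpretation in part (b): a hard assignment step followed by per-cluster parameter estimates.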

Figure 5. Scatter plots and 2D histograms for leaf1.jpg using K-means (left) and EM (right)

Figure 6. Scatter plots and 2D histograms for leaf2.jpg using K-means (left) and EM (right)

Figure 7. Scatter plots and 2D histograms for leaf3.jpg using K-means (left) and EM (right)

Bibliography

[1] Peter N. Belhumeur, Daozheng Chen, Steven Feiner, David W. Jacobs, W. John Kress, Haibin Ling, Ida Lopez, Ravi Ramamoorthi, Sameer Sheorey, Sean White, and Ling Zhang. Searching the world's herbaria: A system for visual identification of plant species. In David A. Forsyth, Philip H. S. Torr, and Andrew Zisserman, editors, ECCV (4), volume 5305 of Lecture Notes in Computer Science, pages 116-129. Springer, 2008.

[2] Jeff A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, 1997.

[3] Stephen J. Roberts. Parametric and non-parametric unsupervised cluster analysis. Pattern Recognition, 30:261-272, 1997.