Lab 1: Gaussian Mixture Models

Lab Objective: Understand the formulation of Gaussian Mixture Models (GMMs) and how to estimate GMM parameters.

You have already seen GMMs as the observation distribution in certain continuous density HMMs. Here, we will discuss them further and learn how to estimate their parameters, given data.

The main idea behind a mixture model is contained in the name, i.e. it is a mixture of different models. What do we mean by a mixture? A mixture model is composed of $K$ components, each component being responsible for a portion of the data. The responsibilities of these components are represented by mixture weights $w_i$, for $i = 1, \ldots, K$. As you may have guessed, these weights are nonnegative and sum to 1. Thus component $j$ is responsible for $100\,w_j$ percent of the data generated by the model.

Each component is itself a probability distribution. In a GMM, each component is specifically a Gaussian (multivariate normal) distribution. Thus we additionally have parameters $\mu_i, \Sigma_i$ for $i = 1, \ldots, K$, i.e. a mean and covariance for each component in the GMM.

It is important here to keep in mind that a GMM does not arise from adding weighted multivariate normal random variables, but rather from weighting the responsibility of each multivariate normal random variable. In the first case, we would simply have a different multivariate normal distribution, whereas in the second case we have a mixture. Refer to Figure 1.1 for a visualization of this.

Thus, a fully defined GMM has parameters $\lambda = (w, \mu, \Sigma)$. The density of a GMM is given by
$$P(x \mid \lambda) = \sum_{i=1}^{K} w_i\, \mathcal{N}(x; \mu_i, \Sigma_i)$$
where
$$\mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
and $d$ is the dimension of $x$.

Problem 1. Write a function to evaluate the density of a normal distribution at a point $x$, given parameters $\mu$ and $\Sigma$. Include the option to return the log of this probability, but be sure to do it intelligently! Also write a function that computes the density of a GMM at a point $x$, given the parameters $\lambda$, along with the log option.
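For concreteness, here is a minimal sketch of both functions. The names dmvnorm and dgmm and the use of scipy.linalg are our own choices (though dgmm does match the method name that appears in the plotting code later in this lab). The "intelligent" part is that the log-density is computed directly from a Cholesky factorization of $\Sigma$, so nothing is exponentiated and then logged.

import numpy as np
from scipy import linalg as la

def dmvnorm(x, mu, sigma, log=False):
    """Density of a multivariate normal N(mu, sigma) at the point x."""
    d = len(x)
    # Cholesky factorization sigma = L L^T gives the log-determinant and
    # the quadratic form without ever forming sigma^{-1} explicitly.
    L = la.cholesky(sigma, lower=True)
    logdet = 2 * np.sum(np.log(np.diag(L)))
    z = la.solve_triangular(L, x - mu, lower=True)   # z'z = (x-mu)' sigma^{-1} (x-mu)
    logdensity = -0.5 * (d * np.log(2 * np.pi) + logdet + z.dot(z))
    return logdensity if log else np.exp(logdensity)

def dgmm(x, weights, means, covars, log=False):
    """Density of a GMM with parameters lambda = (weights, means, covars) at x."""
    logs = np.array([np.log(w) + dmvnorm(x, mu, sigma, log=True)
                     for w, mu, sigma in zip(weights, means, covars)])
    # log-sum-exp: shift by the max so the exponentials cannot all underflow
    m = logs.max()
    logdensity = m + np.log(np.sum(np.exp(logs - m)))
    return logdensity if log else np.exp(logdensity)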
Figure 1.1: (a) Sum of weighted multivariate normal random variables. (b) Weighted mixture of multivariate normal random variables.

Throughout this lab, we will build a GMM class with various methods. We will outline this now.

Problem 2. Write the skeleton of a GMM class. Its __init__ method should accept the non-null parameter n_components, as well as parameters for the weights, means, and covariance matrices which define the GMM. Include a function to generate data from a fully defined GMM (you may use your code from the CDHMM lab for this), as well as the density function you recently defined.
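One possible skeleton is sketched below. Only n_components is prescribed by the problem; the attribute names and the sample method are our own assumptions (sample plays the role of the data-generation code from the CDHMM lab, and dgmm delegates to the functions from Problem 1).

class GMM(object):
    """A Gaussian mixture model with K components."""

    def __init__(self, n_components, weights=None, means=None, covars=None):
        self.n_components = n_components
        self.weights = weights    # shape (K,); nonnegative, sums to 1
        self.means = means        # shape (K, d)
        self.covars = covars      # shape (K, d, d)

    def sample(self, n):
        """Draw n points: choose a component by its weight, then draw from it."""
        comps = np.random.multinomial(1, self.weights, size=n).argmax(axis=1)
        return np.array([np.random.multivariate_normal(self.means[k],
                                                       self.covars[k])
                         for k in comps])

    def dgmm(self, x, log=False):
        """GMM density at x, using the functions from Problem 1."""
        return dgmm(x, self.weights, self.means, self.covars, log=log)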
The main focus of this lab will be to estimate the parameters of a GMM, given observed multivariate data $Y = y_1, y_2, \ldots, y_T$. This can be done via Gibbs sampling, as well as with EM (Expectation Maximization). We choose the latter approach for this lab. To do this, we must compute the probability of an observation being from each component of a GMM with parameters $\lambda^{(n)} = (w^{(n)}, \mu^{(n)}, \Sigma^{(n)})$. This is simply
$$P(x_t = i \mid y_t, \lambda) \propto w_i^{(n)}\, \mathcal{N}(y_t; \mu_i^{(n)}, \Sigma_i^{(n)})$$
Just as with HMMs, we refer to these probabilities as $\gamma_t(i)$, and this is the E-step in the algorithm. This might seem straightforward, except that this direct computation will likely lead to numerical issues. Instead, we work in the log space, which means we have to be a bit more careful. It is feasible (and occurs quite often) that each term $w_i^{(n)} \mathcal{N}(y_t; \mu_i^{(n)}, \Sigma_i^{(n)})$ is 0, because of underflow in the computation of the multivariate normal density. Letting $l_i^{(n)} = \ln w_i^{(n)} + \ln \mathcal{N}(y_t; \mu_i^{(n)}, \Sigma_i^{(n)})$, we can compute these probabilities more carefully, as follows:
$$P(x_t = i \mid y_t, \lambda) = \frac{e^{l_i^{(n)}}}{\sum_{j=1}^{K} e^{l_j^{(n)}}} = \frac{e^{l_i^{(n)}}\, e^{-\max_k l_k^{(n)}}}{\sum_{j=1}^{K} e^{l_j^{(n)}}\, e^{-\max_k l_k^{(n)}}} = \frac{e^{l_i^{(n)} - \max_k l_k^{(n)}}}{\sum_{j=1}^{K} e^{l_j^{(n)} - \max_k l_k^{(n)}}}$$
which will effectively avoid underflow problems.

Problem 3. Add a method to your class to compute $\gamma_t(i)$ for $t = 1, \ldots, T$ and $i = 1, \ldots, K$. Don't forget to do this intelligently to avoid underflow!

Given our matrix $\gamma$, we can reestimate our weights, means, and covariance matrices as follows:
$$w_i^{(n+1)} = \frac{1}{T} \sum_{t=1}^{T} \gamma_t(i)$$
$$\mu_i^{(n+1)} = \frac{\sum_{t=1}^{T} \gamma_t(i)\, y_t}{\sum_{t=1}^{T} \gamma_t(i)}$$
$$\Sigma_i^{(n+1)} = \frac{\sum_{t=1}^{T} \gamma_t(i)\,(y_t - \mu_i^{(n+1)})(y_t - \mu_i^{(n+1)})^T}{\sum_{t=1}^{T} \gamma_t(i)}$$
for $i = 1, \ldots, K$. These updates are the M-step in the algorithm.

Problem 4. Add methods to your class to update $w$, $\mu$ and $\Sigma$ as described above.

With the above work, we are almost ready to complete our class. To train, we will randomly initialize our parameters $\lambda$, and then iteratively update them as above.

Problem 5. Add a method to initialize $\lambda$. Do this intelligently, i.e. your means should not be far from the actual data used for training, and your covariances should neither be too big nor too small. Your weights should be roughly equal, and still sum to 1. Also add a method to train your model, as described previously, iterating until convergence within some tolerance.
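Sketches of these methods, continuing the class skeleton above, follow. The method names, the choice to initialize means at random data points and covariances at the sample covariance, and the convergence check on the log-likelihood are all our own choices, not prescribed by the lab.

class GMM(object):
    # ... __init__, sample, and dgmm as sketched above ...

    def _gammas(self, Y):
        """E-step: gamma[t, i] = P(x_t = i | y_t, lambda), computed in log space."""
        T, K = len(Y), self.n_components
        logl = np.array([[np.log(self.weights[i])
                          + dmvnorm(Y[t], self.means[i], self.covars[i], log=True)
                          for i in range(K)] for t in range(T)])
        logl -= logl.max(axis=1, keepdims=True)    # subtract max_k l_k in each row
        gam = np.exp(logl)
        return gam / gam.sum(axis=1, keepdims=True)

    def _update(self, Y, gam):
        """M-step: reestimate weights, means, and covariances from gamma."""
        T, d = Y.shape
        totals = gam.sum(axis=0)                   # sum_t gamma_t(i), shape (K,)
        self.weights = totals / T
        self.means = gam.T.dot(Y) / totals[:, None]
        for i in range(self.n_components):
            diff = Y - self.means[i]
            self.covars[i] = (gam[:, i, None] * diff).T.dot(diff) / totals[i]

    def _init_params(self, Y):
        """Initialize lambda near the data: means at random data points,
        covariances at the sample covariance, weights roughly equal."""
        K = self.n_components
        idx = np.random.choice(len(Y), K, replace=False)
        self.means = Y[idx].astype(float)
        self.covars = np.array([np.cov(Y.T) for _ in range(K)])
        w = np.random.uniform(0.9, 1.1, K)
        self.weights = w / w.sum()

    def fit(self, Y, tol=1e-3, maxiter=200):
        """Alternate E- and M-steps until the log-likelihood stops improving."""
        self._init_params(Y)
        old = -np.inf
        for _ in range(maxiter):
            self._update(Y, self._gammas(Y))
            ll = sum(self.dgmm(y, log=True) for y in Y)
            if abs(ll - old) < tol:
                break
            old = ll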
We will use our work to train the Mickey Mouse GMM, which has parameters
$$w = [\,0.7 \;\; 0.15 \;\; 0.15\,]$$
$$\mu_1 = [\,0.0 \;\; 0.0\,] \qquad \mu_2 = [\,1.5 \;\; 2.0\,] \qquad \mu_3 = [\,{-1.5} \;\; 2.0\,]$$
$$\Sigma_1 = I \qquad \Sigma_2 = 0.25\,I \qquad \Sigma_3 = 0.25\,I$$
where $I$ is the $2 \times 2$ identity matrix. To look at this GMM, we will evaluate the density at each point on a grid, as follows:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> x = np.arange(-3, 3, 0.1)
>>> y = np.arange(-2, 3, 0.1)
>>> X, Y = np.meshgrid(x, y)
>>> N, M = X.shape
>>> mmat = np.array([[model.dgmm(np.array([X[i,j], Y[i,j]]))
...                   for j in range(M)] for i in range(N)])
>>> plt.imshow(mmat, origin='lower')
>>> plt.show()

See Figure 1.2 for this plot.

Problem 6. Generate 750 samples from the above mixture model. Using just the drawn samples, retrain your model. Evaluate and plot your density on the grid used above. How similar is your density to the original?

How close is our trained model to the original one? We can use the symmetric Kullback-Leibler divergence to measure the distance between two probability distributions with densities $p(x)$ and $p'(x)$:
$$SKL(p, p') = \frac{1}{2} \int p(x) \ln \frac{p(x)}{p'(x)}\, dx + \frac{1}{2} \int p'(x) \ln \frac{p'(x)}{p(x)}\, dx$$
We cannot analytically compute this, so we use a Monte Carlo approximation, which uses the fact that
$$\frac{1}{N} \sum_{i=1}^{N} f(x_i) \to \int f(x)\, p(x)\, dx$$
as $N \to \infty$, assuming that each $x_i \sim p$. Then we have the following approximation of the symmetric KL divergence:
$$SKL(p, p') \approx \frac{1}{2N} \left[\, \sum_{i=1}^{N} \ln \frac{p(x_i)}{p'(x_i)} + \sum_{i=1}^{N} \ln \frac{p'(x_i')}{p(x_i')} \,\right]$$
where $x_i \sim p$ and $x_i' \sim p'$, for large $N$.
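This approximation translates almost directly into code. A minimal sketch, assuming the sample and dgmm methods from the class sketched earlier:

def skl(gmm1, gmm2, n=10**5):
    """Monte Carlo approximation of the symmetric KL divergence between
    two GMMs, using n draws from each."""
    x1 = gmm1.sample(n)    # x_i ~ p
    x2 = gmm2.sample(n)    # x_i' ~ p'
    term1 = sum(gmm1.dgmm(x, log=True) - gmm2.dgmm(x, log=True) for x in x1)
    term2 = sum(gmm2.dgmm(x, log=True) - gmm1.dgmm(x, log=True) for x in x2)
    return (term1 + term2) / (2.0 * n)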
5 Fgure.2: Densty of true Mckey Mouse GMM. Problem 7. Wrte a functon to compute the approxmate the SKL of two GMMs. Compute the SKL between a randomly ntalzed GMM and the known GMM. Compute the SKL between the traned GMM and the known GMM. Is our traned model a good ft?