
2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2014, REIMS, FRANCE

A CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS

Gene T. Whipps, Emre Ertin, Randolph L. Moses
The Ohio State University, ECE Department, Columbus, OH 43210

ABSTRACT

We consider the problem of decentralized learning of a target appearance manifold using a network of sensors. Sensor nodes observe an object from different aspects and then, in an unsupervised and distributed manner, learn a joint statistical model for the data manifold. We employ a mixture of factor analyzers (MFA) model, approximating a potentially nonlinear manifold. We derive a consensus-based decentralized expectation-maximization (EM) algorithm for learning the parameters of the mixture densities and the mixing probabilities. A simulation example demonstrates the efficacy of the algorithm.

Index Terms: decentralized learning, Gaussian mixture, mixture of factor analyzers, consensus

1. INTRODUCTION

A spatially distributed sensor network can be used to construct a rich appearance model for targets in its common field of view. These models can then be used to identify previously seen objects if they reappear in the network at a later time. As an example, consider a network of cameras capturing images of an object from different but possibly overlapping aspects as the object traverses the network's field of view. The ensemble of images captured by the network forms a low-dimensional nonlinear manifold in the high-dimensional ambient space of images.

One approach to appearance modeling would be to construct an independent model of the local data manifold at each sensor and share it across the network. However, such an ensemble of models suffers from discretization of the aspect space and from poor parameter estimates, since the number of unknown parameters necessarily scales linearly with the number of sensor nodes. Alternatively, the sensor nodes can collaborate to construct a joint model for the image ensemble. The parameter estimates of the joint model improve with the number of sensor nodes, since the number of unknown parameters in the model is intrinsic to the object and fixed, whereas the number of measurements scales linearly with the number of sensor nodes. The straightforward method of pooling images at a central location for joint model construction would require large and likely impractical network bandwidth. In this paper, we develop a decentralized learning method with a potentially reduced data-bandwidth requirement, which results in a global appearance manifold model shared by all sensor nodes. (The research reported here was partially supported by the U.S. Army Research Laboratory and by a grant from ARO, W911NF.)

Previous work [1] developed methods for in-network learning of aspect-independent target signatures. In this work we focus on learning appearance models with aspect dependence. We model the overall statistics as a mixture of factor analyzers (MFA) and derive a decentralized EM algorithm for learning the model parameters. The MFA model is probabilistic and generative, and can be used for dimensionality reduction, manifold learning, and signal recovery from compressed sensing [2]. In the case of learning a data manifold, the MFA model is a linearization of a (potentially) nonlinear structure. The EM algorithm for a mixture of factor analyzers was derived in [3] in a centralized setting.
There, the goal is to provide an algorithm for clustering and dimensionality reduction of high-dimensional data with a low-rank covariance model, consistent with the structure of low-dimensional data manifolds. In this paper we extend the EM algorithm for the MFA model to the case of a spatially distributed sensor network, with the goals of distributing computations across the network and being robust to individual node failures (e.g., losing connectivity to a central node in centralized or distributed systems).

Previous work [4, 5, 6, 7] employed MFA models for high-dimensional data with low-dimensional structure. The authors in [4] modify the standard EM for general mixture models by splitting and merging mixture components to avoid local maxima that arise when the data are dominated by some components in parts of the data space. The split-and-merge technique is essentially applied as post-processing after standard EM converges to a stationary point. In [5], the authors propose an algorithm that jointly learns the model parameters and the model order (i.e., the number of mixture components) for more general mixture models. The EM-like algorithms proposed in [6] and [7] for the MFA model address the issue of globally aligning the linear mappings to the underlying manifold using a modified EM cost function. The algorithm in [6] requires a fixed-point iteration in the M-step, which is addressed in closed form in [7]. It was shown in [8] that maximizing the cost function in [6] and [7] is equivalent to maximizing the EM cost function.

This paper differs from the results in [4, 5, 6, 7] in two important aspects. First, we provide a decentralized

algorithm for fitting MFA models to manifold-structured, high-dimensional data; the algorithm is ideally suited for distributed sensing, where the sensor data ensemble is not present at any central node. Second, we consider a more general MFA model suitable for modeling data observed by heterogeneous sensor nodes that differ in their aspect angle with respect to the object. Specifically, we assume observations are drawn from the mixture density with mixture probabilities that can vary across sensor nodes; in other words, one sensor node may observe a mixture component with greater (or lesser) probability than other nodes. In this context, the work in [4, 5, 6, 7] considers the homogeneous sensing case, where all sensor nodes observe different realizations of a single mixture model with identical mixture probabilities.

Our decentralized algorithm for MFA modeling is a consensus-based decentralized implementation of EM formulated for the general MFA model we consider. One of the earliest decentralized EM algorithms for a high-dimensional Gaussian mixture model was developed in [9], assuming the existence of a reliable network with a ring topology. The authors in [10, 11] later developed gossip-based decentralized EM algorithms for a Gaussian mixture model suitable for mesh networks. The algorithm proposed in [11] applies consensus to average local sufficient statistics across the network. The authors in [12] applied the alternating direction method of multipliers (ADMM) with a consensus constraint to the maximization step of EM for more general mixture densities with log-concave conditionals. The goal in each of these references is to learn, in a distributed way, high-dimensional parametric models for data that typically form well-separated clusters; in contrast, our work incorporates a low-dimensional structure, which is key to accurately modeling high-dimensional data observed by a network of sensors whose relevant characteristics lie on a common low-dimensional manifold.

The remainder of this paper is outlined as follows. In Section 2, we outline the (approximate) observation model, a mixture of factor analyzers. In Section 3, we derive the update equations of the consensus-based decentralized EM algorithm for a mixture of factor analyzers. In Section 5, a numerical example demonstrating the efficacy of the algorithm is provided. Finally, conclusions are given in Section 6.

2. MIXTURE OF FACTOR ANALYZERS

We consider a sensor network of $M$ sensors. Sensor $m \in \{1, 2, \ldots, M\}$ collects $N_m$ observations. The $i$-th measurement at the $m$-th node is denoted by $x_{m,i} \in \mathbb{R}^p$. The measurements are assumed to be i.i.d. observations from a nonlinear $r$-dimensional manifold, with $r \ll p$. We approximate the system by linear mappings embedded in the higher-dimensional space according to

$$x_k = \Lambda_k y_k + \mu_k + w_k,$$

where $y_k \in \mathbb{R}^r$ and $w_k \in \mathbb{R}^p$ for $k \in \{1, 2, \ldots, J\}$, with $y_k \sim \mathcal{N}(0, I_r)$ and $w_k \sim \mathcal{N}(0, \Psi_k)$, where $I_r$ is the $(r \times r)$ identity matrix. The number of mixture components $J$ is assumed known. The vector $x_k$ is a random vector conditioned on knowing $k$, while $x_{m,i}$ is the $i$-th observation at the $m$-th sensor node. Given the mixture component $k$ to which the sensor observation belongs, $x_{m,i}$ is a realization of the random vector $x_k$. It is straightforward to show that $x_k \sim \mathcal{N}(\mu_k, \Sigma_k)$ with $\Sigma_k = \Psi_k + \Lambda_k \Lambda_k^T$. The covariance $\Psi_k = \mathrm{diag}([\psi_{1,k}, \psi_{2,k}, \ldots, \psi_{p,k}])$ is diagonal and positive definite. The local coordinate $y_k$ is not directly observed and can be seen as spherical perturbations along piecewise linear segments of the manifold that are mapped to the observation space. The term $w_k$ accounts for observation noise. The columns of $\Lambda_k$ span the $k$-th lower-dimensional subspace.

Sensor nodes observe an environment composed of the same component Gaussian densities, but with potentially different proportions of each component. In the observation space, the measurements follow a mixture model according to

$$x_{m,i} \sim \sum_{k=1}^{J} \alpha_{m,k}\, \mathcal{N}(\mu_k, \Sigma_k),$$

for $m \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, N_m\}$. The unknown parameters are $\mu_k$, $\Psi_k$, $\Lambda_k$, and the component probabilities $\alpha_{m,k}$ for each mixture component $k$ and sensor node $m$. This model extends the mixture of factor analyzers model [3] to a network of sensors in which the sensor nodes observe the factors (mixture components) with potentially varying proportions across the network (i.e., $\alpha_{m,k} \neq \alpha_{n,k}$ for $m \neq n$).
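To make the generative model concrete, the following Python sketch samples synthetic data from this heterogeneous MFA model. It is our own illustration rather than code from the paper; all variable names and the toy parameter values are ours.

```python
import numpy as np

def sample_mfa_node(rng, N_m, alpha_m, mu, Lam, psi):
    """Draw N_m observations at one node from x = Lam[k] y + mu[k] + w,
    with y ~ N(0, I_r), w ~ N(0, diag(psi[k])), and node-specific
    mixing probabilities alpha_m (length J)."""
    J, p, r = Lam.shape                      # Lam: (J, p, r), mu: (J, p), psi: (J, p)
    z = rng.choice(J, size=N_m, p=alpha_m)   # latent component labels
    y = rng.standard_normal((N_m, r))        # latent low-dimensional coordinates
    w = rng.standard_normal((N_m, p)) * np.sqrt(psi[z])
    x = np.einsum('npr,nr->np', Lam[z], y) + mu[z] + w
    return x, y, z

# Toy usage with arbitrary parameters (not the paper's): two 1-D factors in R^3.
rng = np.random.default_rng(0)
J, p, r = 2, 3, 1
Lam = rng.standard_normal((J, p, r))
mu = rng.standard_normal((J, p))
psi = np.full((J, p), 0.1)
x, y, z = sample_mfa_node(rng, 200, np.array([0.7, 0.3]), mu, Lam, psi)
```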
3. EXPECTATION-MAXIMIZATION ALGORITHM FOR A MIXTURE OF FACTOR ANALYZERS

The complete data is $\{x_{m,i}, y_{m,i}, z_{m,i}\}_{m,i}$, where $y_{m,i}$ and $z_{m,i}$ are missing data that indicate, respectively, the local-space coordinate and the mixture component from which the observation $x_{m,i}$ was drawn. Define $\psi_k = [\psi_{1,k}, \psi_{2,k}, \ldots, \psi_{p,k}]^T$, $\lambda_k = \mathrm{vect}(\Lambda_k)$, and $\alpha_m = [\alpha_{m,1}, \alpha_{m,2}, \ldots, \alpha_{m,J}]^T$. The notation $\mathrm{vect}(A)$ represents a column vector of stacked columns from matrix $A$. Similar to the relationship between $x_{m,i}$ and $x_k$, the missing data from the local space $y_{m,i}$ is a realization of the random vector $y_k$ given that the data belongs to the $k$-th mixture component. We define the unknown parameter vector as

$$\theta = [\mu_1, \ldots, \mu_J, \psi_1, \ldots, \psi_J, \lambda_1, \ldots, \lambda_J, \alpha_1, \ldots, \alpha_M].$$

Let $\theta^t$ denote an iterate of the parameter vector $\theta$. Let $x = \{x_{m,i}\}_{m,i}$, $y = \{y_{m,i}\}_{m,i}$, and $z = \{z_{m,i}\}_{m,i}$ represent the collection of all samples from all nodes for the measured and missing data. The EM cost function is then given by

$$Q(\theta, \theta^t) = E\!\left[\log p(x, y, z \mid \theta) \mid x, \theta^t\right] = \sum_{m=1}^{M} \sum_{i=1}^{N_m} \sum_{k=1}^{J} \Pr(z_{m,i} = k \mid x_{m,i}, \theta^t) \left( \log \alpha_{m,k}\, \mathcal{N}(x_{m,i} \mid \mu_k, \Sigma_k) + E\!\left[\log p(y_{m,i} \mid x_{m,i}, z_{m,i} = k, \theta) \mid x_{m,i}, \theta^t\right] \right).$$

In the E-step, updates of the posterior distributions are found to be

$$\Pr(z_{m,i} = k \mid x_{m,i}, \theta^t) = \frac{\alpha^t_{m,k}\, \mathcal{N}(x_{m,i} \mid \mu^t_k, \Sigma^t_k)}{\sum_{k'=1}^{J} \alpha^t_{m,k'}\, \mathcal{N}(x_{m,i} \mid \mu^t_{k'}, \Sigma^t_{k'})} \triangleq w^{t+1}_{m,i,k},$$

$$p(y_{m,i} \mid x_{m,i}, \theta^t) = p(y_{m,i} \mid z_{m,i} = k, x_{m,i}, \theta^t) = \mathcal{N}(y_{m,i} \mid \kappa^{t+1}_{m,i,k}, C^{t+1}_k).$$

We define the matrix

$$U_k = C_k \Lambda_k^T \Psi_k^{-1}. \qquad (1)$$

The mean and covariance of the conditional posterior of $y_{m,i}$ for $z_{m,i} = k$ are given by

$$\kappa_{m,i,k} = U_k (x_{m,i} - \mu_k), \qquad (2)$$

$$C_k = (I_r + \Lambda_k^T \Psi_k^{-1} \Lambda_k)^{-1}. \qquad (3)$$

The iterates $C^{t+1}_k$, $U^{t+1}_k$, and $\kappa^{t+1}_{m,i,k}$ are found according to equations (1)-(3) given $\theta^t$. The probabilities $w^{t+1}_{m,i,k}$ require the inverse and determinant of a $(p \times p)$ covariance estimate $\Sigma^t_k$. Instead of $O(p^3)$ computations, the matrix inversion lemma permits computing the inverse with $O(r^3)$ computations, due to the structure imposed by the MFA model. As noted in [3], the inverse of $\Sigma_k$ can be found from

$$\Sigma_k^{-1} = \Psi_k^{-1} - \Psi_k^{-1} \Lambda_k C_k \Lambda_k^T \Psi_k^{-1}.$$

A similar reduction in computations can be achieved in computing the determinant. The determinant of $\Sigma_k$ reduces to [7]

$$|\Sigma_k| = |C_k|^{-1} \prod_{l=1}^{p} \psi_{l,k}.$$

Given the E-step updates of the posteriors, the unknown global parameters are then found in the M-step according to

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t).$$

The maximizer of $Q(\theta, \theta^t)$ for the MFA model has a closed-form solution. We denote the centered observations by $\tilde{x}^t_{m,i,k} = x_{m,i} - \mu^t_k$. Similar to [9, 11], but for an MFA, we define the following local statistics for sensor node $m$ and mixture component $k$:

$$a^{t+1}_{m,k} = \sum_{i=1}^{N_m} w^{t+1}_{m,i,k}, \qquad (4)$$

$$b^{t+1}_{m,k} = \sum_{i=1}^{N_m} w^{t+1}_{m,i,k}\, x_{m,i}, \qquad (5)$$

$$V^{t+1}_{m,k} = \sum_{i=1}^{N_m} w^{t+1}_{m,i,k}\, \tilde{x}^t_{m,i,k} \left(U^{t+1}_k \tilde{x}^t_{m,i,k}\right)^T, \qquad (6)$$

$$s^{t+1}_{m,k} = \mathrm{diag}\!\left(\sum_{i=1}^{N_m} w^{t+1}_{m,i,k}\, \tilde{x}^t_{m,i,k} (\tilde{x}^t_{m,i,k})^T\right), \qquad (7)$$

where $\mathrm{diag}(A)$ is a column vector of the diagonal elements of the square matrix $A$. Note that in the standard EM implementation, the centered observations are relative to $\mu^{t+1}_k$. Instead, we reference the observations to $\mu^t_k$ to simplify the decentralization of EM; otherwise, the global parameter $\mu^{t+1}_k$ would need to be known at all nodes before computing the local statistics. It was shown in [13] that the modified EM (i.e., referencing to $\mu^t_k$ instead of $\mu^{t+1}_k$) is equivalent to standard EM in its convergence properties. The mixture probabilities are updated locally according to

$$\alpha^{t+1}_{m,k} = \frac{1}{N_m}\, a^{t+1}_{m,k}. \qquad (8)$$

The global parameters are updated according to

$$\mu^{t+1}_k = \frac{\sum_{m=1}^{M} b^{t+1}_{m,k}}{\sum_{m=1}^{M} a^{t+1}_{m,k}}, \qquad (9)$$

$$\Lambda^{t+1}_k = \left(\frac{\sum_{m=1}^{M} V^{t+1}_{m,k}}{\sum_{m=1}^{M} a^{t+1}_{m,k}}\right) \left(C^{t+1}_k + U^{t+1}_k \frac{\sum_{m=1}^{M} V^{t+1}_{m,k}}{\sum_{m=1}^{M} a^{t+1}_{m,k}}\right)^{-1}, \qquad (10)$$

$$\psi^{t+1}_k = \frac{\sum_{m=1}^{M} s^{t+1}_{m,k} - \mathrm{diag}\!\left(\Lambda^{t+1}_k \left(\sum_{m=1}^{M} V^{t+1}_{m,k}\right)^T\right)}{\sum_{m=1}^{M} a^{t+1}_{m,k}}. \qquad (11)$$

The update equations (9)-(11) represent the centralized (standard) M-step updates for the MFA model. We outline a decentralized approach to the global updates in the sequel.
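For concreteness, the following Python sketch implements one centralized EM iteration: the E-step quantities (1)-(3) with the inverse and determinant identities above, the local statistics (4)-(7), and the updates (8)-(11). It is our own illustration under the array-shape assumptions noted in the comments, not the authors' implementation.

```python
import numpy as np

def em_step(X, mu, Lam, psi, alpha):
    """One centralized EM iteration for the MFA model of Section 3.
    X: list of (N_m, p) arrays, one per node; mu: (J, p); Lam: (J, p, r);
    psi: (J, p) diagonal noise variances; alpha: (M, J) node mixing weights."""
    M, (J, p, r) = len(X), Lam.shape
    # E-step matrices (1), (3) and the inverse/determinant identities of [3], [7]
    C = np.empty((J, r, r)); U = np.empty((J, r, p))
    Sig_inv = np.empty((J, p, p)); logdet = np.empty(J)
    for k in range(J):
        Pinv = np.diag(1.0 / psi[k])                                  # Psi_k^{-1}
        C[k] = np.linalg.inv(np.eye(r) + Lam[k].T @ Pinv @ Lam[k])    # eq. (3)
        U[k] = C[k] @ Lam[k].T @ Pinv                                 # eq. (1)
        Sig_inv[k] = Pinv - Pinv @ Lam[k] @ C[k] @ Lam[k].T @ Pinv    # Woodbury inverse
        logdet[k] = -np.linalg.slogdet(C[k])[1] + np.log(psi[k]).sum()  # log|Sigma_k|
    # Responsibilities and local statistics (4)-(7), plus local update (8)
    a = np.zeros((M, J)); b = np.zeros((M, J, p))
    V = np.zeros((M, J, p, r)); s = np.zeros((M, J, p))
    alpha_new = np.zeros_like(alpha)
    for m, Xm in enumerate(X):
        d = Xm[:, None, :] - mu[None, :, :]                # centered at mu^t, (N_m, J, p)
        quad = np.einsum('njp,jpq,njq->nj', d, Sig_inv, d)
        logw = np.log(alpha[m]) - 0.5 * (quad + logdet + p * np.log(2 * np.pi))
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                  # posteriors w_{m,i,k}
        a[m] = w.sum(axis=0)                               # eq. (4)
        b[m] = w.T @ Xm                                    # eq. (5)
        Uy = np.einsum('jrp,njp->njr', U, d)               # U_k (x - mu_k^t)
        V[m] = np.einsum('nj,njp,njr->jpr', w, d, Uy)      # eq. (6)
        s[m] = np.einsum('nj,njp->jp', w, d * d)           # eq. (7), diagonal only
        alpha_new[m] = a[m] / Xm.shape[0]                  # eq. (8)
    # Global updates (9)-(11), here with plain sums over nodes
    A, B, Vs, Ss = a.sum(0), b.sum(0), V.sum(0), s.sum(0)
    mu_new = B / A[:, None]                                # eq. (9)
    Lam_new = np.empty_like(Lam); psi_new = np.empty_like(psi)
    for k in range(J):
        Vbar = Vs[k] / A[k]
        Lam_new[k] = Vbar @ np.linalg.inv(C[k] + U[k] @ Vbar)                 # eq. (10)
        psi_new[k] = Ss[k] / A[k] - np.einsum('pr,pr->p', Lam_new[k], Vbar)   # eq. (11)
    return mu_new, Lam_new, psi_new, alpha_new
```

In the decentralized variant of Section 4, the sums over nodes in the last block are replaced by per-node consensus averages of the same statistics.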
4. CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS

In a decentralized algorithm, each sensor node computes and communicates local statistics to nearby sensor nodes instead of to a central node. Each node then estimates the global parameters from updates of the local statistics received from its neighbors. The local statistics can be updated by various gossip-type strategies; see [14, 15, 16, 17], among many others, for work on general distributed averaging. Our focus is on a decentralized algorithm for the MFA model.

In the standard EM approach for an MFA model, the local statistics from all sensor nodes are summed and combined to estimate the global parameters. The summation across nodes ($m = 1, 2, \ldots, M$) in equations (9)-(11) can be replaced by sample averages, since the constant $1/M$ is then a common factor in each numerator and denominator of the update equations (9)-(11). Thus, similar to [11], we employ consensus-based averaging [14] in the decentralized algorithm to estimate the sample average of the local statistics across the sensor network. In consensus-based averaging, nodes compute a weighted average of their own information with information from neighboring nodes. Here, prior to the M-step, each sensor node first calculates its local statistics using equations (4)-(7), and then the sensor network establishes consensus on the average of the local statistics. The total number of real values shared under consensus with neighboring nodes for the MFA model is $n_{\mathrm{MFA}} = Jp(r+2) + J$, whereas the total for a standard GMM is $n_{\mathrm{GMM}} = Jp(p/2 + 3/2) + J$. Thus, a reduction in the communications load (and a dimensionality reduction) is achieved when $r < (p-1)/2$.
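As a quick check of this communication-count comparison, one can plug in the experiment's dimensions from Section 5 ($p = 3$, $r = 1$, $J = 12$); the short snippet below is ours and purely illustrative.

```python
# Real values exchanged per consensus round (Section 4 formulas).
p, r, J = 3, 1, 12
n_mfa = J * p * (r + 2) + J            # MFA statistics b, v, s, a per component
n_gmm = J * p * (p / 2 + 3 / 2) + J    # full-covariance GMM statistics
print(n_mfa, n_gmm)  # 120 and 120.0: equal here because r = (p - 1)/2 exactly,
                     # so the MFA saves communications only when r < (p - 1)/2.
```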

Finally, each node updates its (local) iterates of the global parameters using the consensus-based averages of the local statistics. Each node computes its own local estimates of the global parameters. We relabel the global parameters to denote their relation to a given node $j$: $\mu^t_{j,k}$, $\Lambda^t_{j,k}$, and $\Psi^t_{j,k}$. In the E-step at each node, the parameters of the posterior distributions are computed from the locally estimated global parameters of the previous M-step and are similarly relabeled: $w^t_{j,i,k}$, $C^t_{j,k}$, and $U^t_{j,k}$. The mixture probabilities $\alpha^t_{j,k}$ are still updated according to (8), since they are relevant to node $j$ only. Abstractly, sample averages of the form $\frac{1}{M}\sum_{m=1}^{M} y_m$ in the updates of the global parameters (9)-(11) are replaced by consensus averages, denoted by $\bar{y}_j$ at node $j$. For example, the iterate $\mu^t_k$ at node $j$ becomes $\mu^t_{j,k} = \bar{b}^t_{j,k} / \bar{a}^t_{j,k}$ in the decentralized approach.

We assume a connected, undirected communications network. We assume a consensus matrix based on the Metropolis-Hastings algorithm [16], given by

$$[W]_{i,j} = \begin{cases} \dfrac{1}{\max\{d_i, d_j\}}, & \text{if } j \in \mathcal{N}_i, \\[1ex] 1 - \displaystyle\sum_{j' \in \mathcal{N}_i} \dfrac{1}{\max\{d_i, d_{j'}\}}, & \text{if } j = i, \\[1ex] 0, & \text{else}, \end{cases} \qquad (12)$$

where $\mathcal{N}_m$ is the set of nodes in the (network) neighborhood of sensor node $m$, excluding itself, and $d_m = |\mathcal{N}_m|$ is the degree of node $m$. The Metropolis-Hastings construction is a good fit for decentralized computations: the resulting consensus matrix requires only local neighborhood information (i.e., the degrees of neighboring nodes). Additionally, the Metropolis-Hastings algorithm and its convergence properties are well studied in the context of sampling from distributions [18], and its performance is comparable to other heuristic-based constructions [16].

We can write an equivalent, compact expression of the consensus method across the network. Define the vector of local statistics at node $j$ from equations (4)-(7) as

$$\beta^t_j = [b^t_{j,1}, \ldots, b^t_{j,J}, v^t_{j,1}, \ldots, v^t_{j,J}, s^t_{j,1}, \ldots, s^t_{j,J}, a^t_{j,1}, \ldots, a^t_{j,J}],$$

where $v^t_{j,k} = \mathrm{vect}(V^t_{j,k})$. The consensus updates of the sample average of the local statistics across all nodes can be written as

$$\left[\bar{\beta}^{t'}_1, \bar{\beta}^{t'}_2, \ldots, \bar{\beta}^{t'}_M\right] = \left[\bar{\beta}^{t'-1}_1, \bar{\beta}^{t'-1}_2, \ldots, \bar{\beta}^{t'-1}_M\right] W,$$

where $t'$ indicates the consensus iteration and the recursion is initialized as $\bar{\beta}^0_j = \beta^t_j$ from iterate $t$ of EM. The consensus iterates using the Metropolis-Hastings weights are guaranteed to converge to the sample average provided the network is connected and not bipartite [17]. Thus, $\bar{\beta}^{t'}_j \to \frac{1}{M}\sum_{m=1}^{M} \beta^t_m$ as $t' \to \infty$, for each $j = 1, 2, \ldots, M$.

Each node requires a stopping criterion to exit the consensus iterations. The stopping criterion is usually based on the global optimization measure, which in this case is the joint data log-likelihood. The log-likelihood can be written as a sum of local contributions, $l(\theta) = \sum_{m=1}^{M} l_m(\theta)$, where

$$l_m(\theta) = \sum_{i=1}^{N_m} \log \sum_{k=1}^{J} \alpha_{m,k}\, \mathcal{N}(x_{m,i} \mid \mu_k, \Sigma_k). \qquad (13)$$

As a stopping criterion for the consensus iterations, we define, for each node $j = 1, 2, \ldots, M$, the consensus update error

$$e_j(t') = \left| \bar{l}^{t'}_j - \bar{l}^{t'-1}_j \right|, \qquad (14)$$

where $\bar{l}^{t'}_j$ is an estimate, after $t'$ consensus iterations, of the average data log-likelihood $\frac{1}{M}\sum_{m=1}^{M} l_m(\theta)$, with $l_j(\theta)$ evaluated at $\theta = \theta^t_j$ at node $j$. For sufficiently large $t'$, the per-step error behaves according to

$$\log e_j(t') \approx t' \log \rho\!\left(W - \tfrac{1}{M}\mathbf{1}\mathbf{1}^T\right) + \log c, \qquad (15)$$

for some finite constant $c$, where $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T)$ is the spectral radius of $W - \frac{1}{M}\mathbf{1}\mathbf{1}^T$ [17]. Since $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) < 1$ for a connected and non-bipartite network, the local consensus error measure $e_j(t')$ converges to $0$ as $t' \to \infty$ for all nodes $j = 1, 2, \ldots, M$.
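The sketch below (ours, not the authors' code) builds the Metropolis-Hastings weight matrix (12) from a 0/1 adjacency matrix, runs the consensus recursion on stacked local statistic vectors, and uses the log-likelihood change (14) as the stopping rule; the small ring network in the usage example is hypothetical and is not the topology of Fig. 1.

```python
import numpy as np

def metropolis_weights(Adj):
    """Consensus matrix (12) from a symmetric 0/1 adjacency matrix (no self-loops)."""
    M = Adj.shape[0]
    deg = Adj.sum(axis=1)
    W = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if Adj[i, j]:
                W[i, j] = 1.0 / max(deg[i], deg[j])
        W[i, i] = 1.0 - W[i].sum()
    return W

def consensus_average(W, beta, loglik_local, tol=1e-6, max_iter=500):
    """Average rows of beta (one local statistic vector per node) with the
    symmetric weight matrix W until the per-node change in the averaged
    log-likelihood, the error e_j(t') of (14), drops below tol at all nodes."""
    bbar, lbar = beta.copy(), loglik_local.copy()
    for _ in range(max_iter):
        bbar, lbar_new = W @ bbar, W @ lbar
        done = np.all(np.abs(lbar_new - lbar) < tol)
        lbar = lbar_new
        if done:
            break
    return bbar, lbar   # per-node estimates of the network-wide sample averages

# Usage on a hypothetical 9-node ring (odd cycle: connected and not bipartite).
M = 9
Adj = np.zeros((M, M), dtype=int)
for i in range(M):
    Adj[i, (i + 1) % M] = Adj[(i + 1) % M, i] = 1
W = metropolis_weights(Adj)
rate = max(abs(np.linalg.eigvals(W - np.ones((M, M)) / M)))  # the rho in (15)
```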
5. NUMERICAL EXAMPLES

In this section, we provide an example using synthetic data to demonstrate the effectiveness of the decentralized algorithm in approximating a nonlinear, low-dimensional data manifold. As numerical examples, both [4] and [5] apply their techniques to an MFA model approximating a shrinking spiral. We use the shrinking spiral example for comparison with these centralized approaches with published results.

For this numerical example, the number of sensor nodes is $M = 9$ and the number of mixture components is $J = 12$. The number of data points per node is $N_m = N$ for $m = 1, 2, \ldots, M$, and the data are generated according to

$$x = \left[(13 - 0.5t)\cos t,\; (0.5t - 13)\sin t,\; t\right]^T + w,$$

where $t \in [0, 4\pi]$ and $w \sim \mathcal{N}(0, I_3)$. The dimension of the observations is $p = 3$, while the lower intrinsic dimension is $r = 1$. Each sensor node observes an equal-length segment of the data manifold with 5% overlap of the adjacent segments of other nodes, not necessarily those of network neighbors. Sensor nodes should observe at least one mixture component seen by at least one other sensor node. Figure 1 shows the communications network structure. Each sensor node is a vertex on the graph, and each edge between vertices represents an undirected communications link. For the Metropolis-Hastings-based consensus matrix $W$ and the network graph in Fig. 1, the resulting convergence rate is $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$.

The decentralized EM algorithm is initialized with estimates from a consensus-based k-means clustering (see, for example, [12]). While the objective functions of k-means and EM are different, the resulting update equations for the mixture means (i.e., equations (4), (5), (9), and the corresponding consensus updates of (4) and (5)) are identical, with the exception that the posterior probabilities $w^{t+1}_{m,i,k}$ are hard assignments in k-means instead of soft ones. In our experience, the consensus-based k-means converges much faster in both consensus and clustering due to the hard assignments. Since k-means initialization can be seen as a simplification and a special case of the EM algorithm [19, Chapter 9.3], we will not discuss it further here.
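For reference, a short data-generation sketch (ours) of the setup just described. Where the text leaves details unspecified, the choices below are assumptions: the per-node sample count N, the exact handling of the 5% segment overlap, and the random seed.

```python
import numpy as np

def spiral_data(M=9, N=100, overlap=0.05, seed=0):
    """Noisy shrinking-spiral observations split across M nodes.  Each node sees
    an equal-length segment of t in [0, 4*pi], extended by overlap/2 on each
    side to overlap adjacent segments (N per node is our own choice)."""
    rng = np.random.default_rng(seed)
    seg = 4 * np.pi / M
    X = []
    for m in range(M):
        lo = max(0.0, (m - 0.5 * overlap) * seg)
        hi = min(4 * np.pi, (m + 1 + 0.5 * overlap) * seg)
        t = rng.uniform(lo, hi, size=N)
        clean = np.stack([(13 - 0.5 * t) * np.cos(t),
                          (0.5 * t - 13) * np.sin(t),
                          t], axis=1)
        X.append(clean + rng.standard_normal((N, 3)))   # w ~ N(0, I_3)
    return X   # list of (N, 3) arrays, one per node (p = 3, intrinsic r = 1)
```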

Figures 2(a) and 2(b) plot the noiseless decaying spiral (solid blue curve), sample observation points from all $M$ nodes (black scatter points), and 2-$\sigma$ line segments (red segments) from estimates of the global parameters. Note that a 2-$\sigma$ segment from each node is plotted for all $J$ components (i.e., 108 total line segments). In Fig. 2(a) and Fig. 2(b), the number of consensus steps per EM step in estimating the global parameters is 5 and 20, respectively. The sample root-mean-squared error is defined by

$$\text{s-rmse}(t) = \sqrt{\frac{\sum_{m=1}^{M} \sum_{i=1}^{N_m} \sum_{k=1}^{J} w^t_{m,i,k} \left\| \left(I_p - P_k(\theta^t_m)\right)\left(x_{m,i} - \mu^t_{m,k}\right) \right\|_2^2}{p M N_m}},$$

where $P_k(\theta) = \Lambda_k (\Lambda_k^T \Lambda_k)^{-1} \Lambda_k^T$ and, given $\theta = \theta^t_m$, projects the point $(x_{m,i} - \mu^t_{m,k})$ onto the $k$-th subspace estimated by node $m$. After 40 EM iterations, the s-RMSE is 0.8328 and 0.8326 for 5 and 20 consensus steps per EM step, respectively. Surprisingly, the average data log-likelihood with just 5 consensus steps is slightly greater than that with 20 consensus steps. However, Fig. 2(a) and Fig. 2(b) show that the local estimates of the global parameters are in better agreement across the network with more consensus steps. The agreement is made clearer in Fig. 3.

Figure 3 plots the local update error of the data log-likelihood, given in equation (14), versus consensus iterations at the $t = 5$ iterate of the EM steps. The solid lines are the update errors measured at each of the $M = 9$ nodes, and the dashed line represents the predicted convergence rate of $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$ for the simulated network. This demonstrates that after a small number of steps, the consensus updates begin to exhibit the expected linear (in the exponent) convergence. Additionally, the sensor nodes have about an order of magnitude better agreement after 20 consensus steps than after 5 steps. More consensus iterations improve agreement, but at the expense of additional communications.

Figures 4 and 5 show the local update error of the global parameters, given by $\|\theta^{t+1}_j - \theta^t_j\|$ for nodes $j = 1, 2, \ldots, M$, and the average data log-likelihood, $\bar{l}^{t+1}_j$, versus the EM iteration number. The error and average log-likelihood curves are plotted for each of the $M = 9$ sensor nodes, and each is generated with 20 consensus steps per EM iteration. Both figures show tight consensus between the nodes. As seen in Fig. 5, despite the intermediate consensus steps, the data log-likelihood appears non-decreasing, as expected for standard EM algorithms. For comparison, error and log-likelihood curves are also plotted in Figs. 4 and 5 for a centralized EM (i.e., assuming the E-step and M-step updates of Section 3 are available at a central node). After 40 iterations, the centralized EM has an s-RMSE of 0.8374.

Fig. 1. Simulated sensor network communication links (nodes $v_1, \ldots, v_9$). Consensus with the Metropolis-Hastings weights has a convergence rate of $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$.

Fig. 2. 2-$\sigma$ line segments (red) after (a) 5 and (b) 20 consensus iterations per EM iteration and 40 EM iterations. Each line segment corresponds to a single mixture component; segments from the local estimates at each of the $M = 9$ nodes of the $J = 12$ mixture components are plotted.
Fig. 3. Local update error (14) of the average log-likelihood versus consensus iterations at all $M = 9$ nodes at iteration 5 of EM. The dashed line represents the predicted convergence rate of $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$.
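A small sketch (ours) of the s-RMSE metric defined above, assuming per-node arrays of responsibilities and locally estimated parameters with the shapes noted in the docstring:

```python
import numpy as np

def s_rmse(X, W_list, mu_list, Lam_list):
    """s-RMSE of Section 5: RMS residual orthogonal to each node's estimated
    subspaces.  X[m]: (N_m, p) data; W_list[m]: (N_m, J) responsibilities;
    mu_list[m]: (J, p) local means; Lam_list[m]: (J, p, r) local loadings."""
    num, den = 0.0, 0.0
    for Xm, Wm, mu_m, Lam_m in zip(X, W_list, mu_list, Lam_list):
        N_m, p = Xm.shape
        for k in range(Lam_m.shape[0]):
            L = Lam_m[k]
            P = L @ np.linalg.solve(L.T @ L, L.T)     # P_k = L (L^T L)^{-1} L^T
            resid = (Xm - mu_m[k]) @ (np.eye(p) - P)  # (I_p - P_k)(x - mu_{m,k})
            num += (Wm[:, k] * (resid ** 2).sum(axis=1)).sum()
        den += p * N_m   # accumulates to p * M * N_m for equal N_m across nodes
    return np.sqrt(num / den)
```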

Fig. 4. Local update error of the global parameters, $\|\theta^{t+1}_j - \theta^t_j\|$ for $j = 1, 2, \ldots, M$, versus EM iterations. All $M = 9$ error curves from the decentralized EM algorithm are plotted. The dashed curve corresponds to the update error for the centralized EM algorithm.

Fig. 5. Average data log-likelihood versus EM iterations. All $M = 9$ log-likelihood curves of the decentralized EM algorithm are plotted. The dashed curve corresponds to the log-likelihood for the centralized EM algorithm.

6. CONCLUSION

We have presented a decentralized EM-based algorithm for estimating the parameters of a mixture of factor analyzers. This model approximates the joint distribution of noisy observations of a nonlinear manifold from a network of sensor nodes. We have provided results based on simulated data to demonstrate the efficacy of the algorithm.

7. REFERENCES

[1] N. Ramakrishnan, E. Ertin, and R. L. Moses, "Gossip-based algorithm for joint signature estimation and node calibration in sensor networks," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, 2011.

[2] L. Carin, R. G. Baraniuk, V. Cevher, D. Dunson, M. I. Jordan, G. Sapiro, and M. B. Wakin, "Learning low-dimensional signal models," IEEE Signal Processing Magazine, vol. 28, no. 2, 2011.

[3] Z. Ghahramani and G. E. Hinton, "The EM algorithm for mixtures of factor analyzers," Tech. Rep. CRG-TR-96-1, University of Toronto, 1996.

[4] N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton, "SMEM algorithm for mixture models," Neural Computation, vol. 12, no. 9, September 2000.

[5] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, March 2002.

[6] S. Roweis, L. Saul, and G. E. Hinton, "Global coordination of local linear models," in Advances in Neural Information Processing Systems, Dec. 2001.

[7] J. Verbeek, "Learning nonlinear image manifolds by global alignment of local linear models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, Aug. 2006.

[8] R. M. Neal and G. E. Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants," in Learning in Graphical Models, Springer, 1998.

[9] R. D. Nowak, "Distributed EM algorithms for density estimation and clustering in sensor networks," IEEE Trans. on Signal Processing, vol. 51, no. 8, Aug. 2003.

[10] W. Kowalczyk and N. Vlassis, "Newscast EM," in Advances in Neural Information Processing Systems, MIT Press.

[11] D. Gu, "Distributed EM algorithm for Gaussian mixtures in sensor networks," IEEE Trans. on Neural Networks, vol. 19, no. 7, Jul. 2008.

[12] P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed expectation-maximization algorithm for density estimation and classification using wireless sensor networks," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), March 2008.

[13] L. Xu, "Comparative analysis on convergence rates of the EM algorithm and its two modifications for Gaussian mixtures," Neural Processing Letters, vol. 6, no. 3, pp. 69-76, 1997.

[14] M. H. DeGroot, "Reaching a consensus," Journal of the American Statistical Association, vol. 69, no. 345, 1974.

[15] U. A. Khan, S. Kar, and J. M. F. Moura, "Distributed algorithms in sensor networks," in Handbook on Array Processing and Sensor Networks, John Wiley & Sons, Inc.

[16] S. Boyd, P. Diaconis, and L. Xiao, "Fastest mixing Markov chain on a graph," SIAM Review, vol. 46, 2004.
[17] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, vol. 53, pp. 65-78, 2004.

[18] P. Diaconis and L. Saloff-Coste, "What do we know about the Metropolis algorithm?," Journal of Computer and System Sciences, pp. 20-36, 1998.

[19] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006.
