A CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS. Gene T. Whipps, Emre Ertin, Randolph L. Moses
2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEP. 2014, REIMS, FRANCE

A CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS

Gene T. Whipps, Emre Ertin, Randolph L. Moses

The Ohio State University, ECE Department, Columbus, OH 43210

ABSTRACT

We consider the problem of decentralized learning of a target appearance manifold using a network of sensors. Sensor nodes observe an object from different aspects and then, in an unsupervised and distributed manner, learn a joint statistical model for the data manifold. We employ a mixture of factor analyzers (MFA) model, approximating a potentially nonlinear manifold. We derive a consensus-based decentralized expectation-maximization (EM) algorithm for learning the parameters of the mixture densities and mixing probabilities. A simulation example demonstrates the efficacy of the algorithm.

Index Terms — decentralized learning, Gaussian mixture, mixture of factor analyzers, consensus

1. INTRODUCTION

A spatially distributed sensor network can be used to construct a rich appearance model for targets in its common field-of-view. These models can then be used to identify previously seen objects if they reappear in the network at a later time. As an example, consider a network of cameras capturing images of an object from different but possibly overlapping aspects as the object traverses the network's field of view. The ensemble of images captured by the network forms a low-dimensional nonlinear manifold in the high-dimensional ambient space of images. One approach to appearance modeling would be to construct an independent model of the local data manifold at each sensor and share it across the network. However, such an ensemble of models suffers from discretization of the aspect space and poor parameter estimates, as the number of unknown parameters necessarily scales linearly with the number of sensor nodes. Alternatively, the sensor nodes can collaborate to construct a joint model for the image ensemble.
The parameter estimates of the joint model will improve with the number of sensor nodes, since the number of unknown parameters in the model is intrinsic to the object and fixed, whereas the number of measurements scales linearly with the number of sensor nodes. The straightforward method of pooling images at a central location for joint model construction would require large and likely impractical network bandwidth. In this paper, we develop a decentralized learning method with a potentially reduced data bandwidth need, and which results in a global appearance manifold model shared by all sensor nodes.

(The research reported here was partially supported by the U.S. Army Research Laboratory and by a grant from ARO, W911NF.)

Previously, [1] developed methods for in-network learning of aspect-independent target signatures. In this work we focus on learning appearance models with aspect dependence. We model the overall statistics as a mixture of factor analyzers (MFA) and derive a decentralized EM algorithm for learning the model parameters. The MFA model is probabilistic and generative, and can be used for dimensionality reduction, manifold learning, and signal recovery from compressed sensing [2]. In the case of learning a data manifold, the MFA model is a linearization of a (potentially) nonlinear structure. The EM algorithm for a mixture of factor analyzers was derived in [3] in a centralized setting. There, the goal is to provide an algorithm for clustering and dimensionality reduction of high-dimensional data with a low-rank covariance model, also consistent with the structure of low-dimensional data manifolds. In this paper we extend the EM algorithm for the MFA model to the case of a spatially distributed sensor network, with the goals of distributing computations across the network and being robust to individual node failures (e.g., losing connectivity to a central node in centralized or distributed systems).
Previous work [4, 5, 6, 7] employed MFA models for high-dimensional data with low-dimensional structure. The authors in [4] modify the standard EM for general mixture models by splitting and merging mixture components to avoid local maxima when the data are dominated by some components in parts of the data space. The split-and-merge technique is essentially applied as post-processing after standard EM converges to a stationary point. In [5], the authors propose an algorithm that jointly learns model parameters and model order (i.e., the number of mixture components) for more general mixture models. The EM-like algorithms proposed in [6] and [7] for the MFA model address the issue of global alignment of the linear mappings to the underlying manifold using a modified EM cost function. The algorithm in [6] requires a fixed-point algorithm in the M-step, which is addressed in closed form in [7]. It was shown in [8] that maximizing the cost function in [6] and [7] is equivalent to that in EM.

This paper differs from the results in [4, 5, 6, 7] in two important aspects. First, we provide a decentralized algorithm for fitting MFA models to manifold-structured high-dimensional data; the algorithm is ideally suited for distributed sensing, where the sensor data ensemble is not present at any central node. Second, we consider a more general MFA model suitable for modeling data observed by heterogeneous sensor nodes differing in their aspect angle with respect to the object. Specifically, we assume observations are drawn from the mixture density with mixture probabilities that can vary across the different sensor nodes. In other words, one sensor node may observe a mixture component with more (or less) probability than other nodes. In this context, the work in [4, 5, 6, 7] considers the homogeneous sensing case, where all the sensor nodes observe different realizations of a single mixture model with identical mixture probabilities.

Our decentralized algorithm for MFA modeling is a consensus-based decentralized implementation of EM formulated for the general MFA model we consider. One of the earliest decentralized EM algorithms for a high-dimensional Gaussian mixture model was developed in [9], assuming the existence of a reliable network with a ring topology. The authors in [10, 11] later developed gossip-based decentralized EM algorithms for a Gaussian mixture model suitable for mesh networks. The algorithm proposed in [11] applies consensus to average local sufficient statistics across the network. The authors in [12] applied the alternating direction method of multipliers (ADMM) with a consensus constraint to the maximization step of EM for more general mixture densities with log-concave conditionals.
The goal in each of these references is to learn, in a distributed way, high-dimensional parametric models for data that typically form well-separated clusters; in contrast, our work incorporates a low-dimensional structure, which is key to accurately modeling high-dimensional data observed by a network of sensors when the relevant characteristics lie on a common low-dimensional manifold.

The remainder of this paper is outlined as follows. In Section 2, we outline the (approximated) observation model (i.e., a mixture of factor analyzers). In Section 3, we derive the update equations of the consensus-based decentralized EM algorithm for a mixture of factor analyzers. In Section 5, a numerical example demonstrating the efficacy of the algorithms is provided. Finally, conclusions are given in Section 6.

2. MIXTURE OF FACTOR ANALYZERS

We consider a sensor network of M sensors. Sensor m ∈ {1, 2, ..., M} collects N_m observations. The i-th measurement at the m-th node is denoted by x_{m,i} ∈ R^p. The measurements are assumed to be i.i.d. observations from a nonlinear r-dimensional manifold, with r ≪ p. We approximate the system by linear mappings embedded in the higher-dimensional space according to

x_k = Λ_k y_k + µ_k + w_k,

where y_k ∈ R^r and w_k ∈ R^p for k ∈ {1, 2, ..., J}, and y_k ~ N(0, I_r) and w_k ~ N(0, Ψ_k), where I_r is the (r × r) identity matrix. The number of mixture components J is assumed known. The vector x_k is a random vector conditioned on knowing k, while x_{m,i} is the i-th observation at the m-th sensor node. Given the mixture component k to which the sensor observation belongs, x_{m,i} is a realization of the random vector x_k. It is straightforward to show that x_k ~ N(µ_k, Σ_k) with Σ_k = Ψ_k + Λ_k Λ_kᵀ. The covariance Ψ_k = diag([ψ_{1,k}, ψ_{2,k}, ..., ψ_{p,k}]) is diagonal and positive definite. The local coordinate y_k is not directly observed; it can be seen as spherical perturbations along piecewise-linear segments of the manifold that are mapped to the observation space. The term w_k accounts for observation noise.
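As a concrete illustration of this generative model, the following sketch draws samples from a single factor-analyzer component and compares the empirical covariance with the implied Σ_k = Ψ_k + Λ_k Λ_kᵀ. The dimensions and parameter values here are arbitrary toy assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 3, 1                          # assumed toy dimensions, r << p
Lambda_k = rng.normal(size=(p, r))   # factor loading Λ_k
mu_k = rng.normal(size=p)            # component mean µ_k
psi_k = np.full(p, 0.1)              # diagonal of Ψ_k (positive)

def sample_component(n):
    """Draw n i.i.d. samples of x_k = Λ_k y_k + µ_k + w_k,
    with y_k ~ N(0, I_r) and w_k ~ N(0, Ψ_k)."""
    y = rng.normal(size=(n, r))
    w = rng.normal(size=(n, p)) * np.sqrt(psi_k)
    return y @ Lambda_k.T + mu_k + w

X = sample_component(50000)
# The marginal is x_k ~ N(µ_k, Σ_k) with Σ_k = Ψ_k + Λ_k Λ_kᵀ
Sigma_k = np.diag(psi_k) + Lambda_k @ Lambda_k.T
```

The sample mean and covariance of `X` should approach µ_k and Σ_k as n grows.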
The columns of Λ_k span the k-th lower-dimensional subspace. Sensor nodes observe an environment composed of the same component Gaussian densities, but with potentially different proportions of each component. In the observation space, the measurements follow a mixture model according to

x_{m,i} ~ Σ_{k=1}^J α_{m,k} N(µ_k, Σ_k),

for m ∈ {1, 2, ..., M}, i ∈ {1, 2, ..., N_m}. The unknown parameters are µ_k, Ψ_k, Λ_k, and the component probabilities α_{m,k} for each mixture component k and sensor node m. This model extends the mixture of factor analyzers model [3] to a network of sensors where sensor nodes observe the factors (mixture components) with potentially varying proportions across the network (i.e., α_{m,k} ≠ α_{m',k} for m ≠ m').

3. EXPECTATION-MAXIMIZATION ALGORITHM FOR A MIXTURE OF FACTOR ANALYZERS

The complete data is {x_{m,i}, y_{m,i}, z_{m,i}}_{m,i}, where y_{m,i} and z_{m,i} are missing data that indicate the local coordinate and the mixture component, respectively, from which the observation x_{m,i} was drawn. Define ψ_k = [ψ_{1,k}, ψ_{2,k}, ..., ψ_{p,k}], λ_k = vect(Λ_k), and α_m = [α_{m,1}, α_{m,2}, ..., α_{m,J}]. The notation vect(A) represents a column vector of stacked columns from matrix A. Similar to the relationship between x_{m,i} and x_k, the missing data from the local space, y_{m,i}, is a realization of the random vector y_k given that the observation belongs to the k-th mixture component. We define the unknown parameter vector as θ = [µ_1, ..., µ_J, ψ_1, ..., ψ_J, λ_1, ..., λ_J, α_1, ..., α_M]. Let θ^t denote an iterate of the parameter vector θ. Let x = {x_{m,i}}_{m,i}, y = {y_{m,i}}_{m,i}, and z = {z_{m,i}}_{m,i} represent the collection of all samples from all nodes for the measured and missing data. The EM cost function is then given by

Q(θ, θ^t) = E[ log p(x, y, z | θ) | x, θ^t ]
= Σ_{m=1}^M Σ_{i=1}^{N_m} Σ_{k=1}^J Pr(z_{m,i} = k | x_{m,i}, θ^t) ( log[ α_{m,k} N(x_{m,i}; µ_k, Σ_k) ] + E[ log p(y_{m,i} | x_{m,i}, θ) | x_{m,i}, θ^t ] ).
In the E-step, updates of the posterior distributions are found to be

Pr(z_{m,i} = k | x_{m,i}, θ^t) = α^t_{m,k} N(x_{m,i}; µ^t_k, Σ^t_k) / Σ_{k'=1}^J α^t_{m,k'} N(x_{m,i}; µ^t_{k'}, Σ^t_{k'}) ≜ w^{t+1}_{m,i,k},

p(y_{m,i} | z_{m,i} = k, x_{m,i}, θ^t) = N(y_{m,i}; κ^{t+1}_{m,i,k}, C^{t+1}_k).

We define the matrix

U_k = C_k Λ_kᵀ Ψ_k⁻¹. (1)

The mean and covariance of the conditional posterior of y_{m,i} for z_{m,i} = k are given by

κ_{m,i,k} = U_k (x_{m,i} − µ_k), (2)
C_k = (I_r + Λ_kᵀ Ψ_k⁻¹ Λ_k)⁻¹. (3)

The iterates C^{t+1}_k, U^{t+1}_k, and κ^{t+1}_{m,i,k} are found according to equations (1)-(3) given θ^t. The probabilities w^{t+1}_{m,i,k} require the inverse and determinant of a (p × p) covariance estimate Σ^t_k. Instead of O(p³) computations, the matrix inversion lemma permits the inverse with O(r³) computations due to the structure imposed by the MFA model. As noted in [3], the inverse of Σ_k can be found by

Σ_k⁻¹ = Ψ_k⁻¹ − Ψ_k⁻¹ Λ_k C_k Λ_kᵀ Ψ_k⁻¹.

A similar reduction in computations can be achieved in computing the determinant. The determinant of Σ_k reduces to [7]

|Σ_k| = |C_k|⁻¹ Π_{j=1}^p ψ_{j,k}.

Given the E-step updates of the posteriors, the unknown global parameters are found in the M-step according to

θ^{t+1} = argmax_θ Q(θ, θ^t).

The maximizer of Q(θ, θ^t) for the MFA model has a closed-form solution. We denote the centered observations by x̃^t_{m,i,k} = x_{m,i} − µ^t_k. Similar to [9, 11], but for an MFA, we define the following local statistics for sensor node m and mixture component k:

a^{t+1}_{m,k} = Σ_{i=1}^{N_m} w^{t+1}_{m,i,k}, (4)
b^{t+1}_{m,k} = Σ_{i=1}^{N_m} w^{t+1}_{m,i,k} x_{m,i}, (5)
V^{t+1}_{m,k} = Σ_{i=1}^{N_m} w^{t+1}_{m,i,k} x̃^t_{m,i,k} (U^t_k x̃^t_{m,i,k})ᵀ, (6)
s^{t+1}_{m,k} = diag( Σ_{i=1}^{N_m} w^{t+1}_{m,i,k} x̃^t_{m,i,k} (x̃^t_{m,i,k})ᵀ ), (7)

where diag(A) is a column vector of the diagonal elements of the square matrix A. Note that in the standard EM implementation, the centered observations are relative to µ^{t+1}_k. Instead, we reference the observations to µ^t_k to simplify the decentralization of EM. Otherwise, the global parameter µ^{t+1}_k would need to be known at all nodes before computing the local statistics.
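The two identities above for Σ_k⁻¹ and |Σ_k| can be checked numerically. The sketch below (with arbitrary assumed values for p, r, Λ_k, and Ψ_k) forms the inverse and log-determinant of the p × p covariance using only an r × r inverse and determinant:

```python
import numpy as np

rng = np.random.default_rng(1)
p, r = 50, 3                            # assumed sizes with r << p
Lambda_k = rng.normal(size=(p, r))
psi_k = rng.uniform(0.5, 1.5, size=p)   # diagonal of Ψ_k
psi_inv = 1.0 / psi_k

# C_k = (I_r + Λ_kᵀ Ψ_k⁻¹ Λ_k)⁻¹ : only an r × r inverse is needed
C_k = np.linalg.inv(np.eye(r) + Lambda_k.T @ (psi_inv[:, None] * Lambda_k))

# Matrix inversion lemma: Σ_k⁻¹ = Ψ_k⁻¹ − Ψ_k⁻¹ Λ_k C_k Λ_kᵀ Ψ_k⁻¹
Sigma_inv = np.diag(psi_inv) \
    - (psi_inv[:, None] * Lambda_k) @ C_k @ (Lambda_k.T * psi_inv[None, :])

# Determinant identity: |Σ_k| = |C_k|⁻¹ ∏_j ψ_{j,k}, in log form
log_det = -np.linalg.slogdet(C_k)[1] + np.sum(np.log(psi_k))
```

Both quantities agree with the direct computations on Σ_k = Ψ_k + Λ_k Λ_kᵀ, while avoiding any O(p³) operation.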
It was shown in [13] that the modified EM (i.e., referencing to µ^t_k instead of µ^{t+1}_k) is equivalent to standard EM in its convergence properties. The mixture probabilities are updated locally according to

α^t_{m,k} = a^t_{m,k} / N_m. (8)

The global parameters are updated according to

µ^t_k = Σ_{m=1}^M b^t_{m,k} / Σ_{m=1}^M a^t_{m,k}, (9)

Λ^t_k = ( Σ_{m=1}^M V^t_{m,k} / Σ_{m=1}^M a^t_{m,k} ) ( C^t_k + U^t_k ( Σ_{m=1}^M V^t_{m,k} / Σ_{m=1}^M a^t_{m,k} ) )⁻¹, (10)

ψ^t_k = ( Σ_{m=1}^M s^t_{m,k} − diag( Λ^t_k Σ_{m=1}^M (V^t_{m,k})ᵀ ) ) / Σ_{m=1}^M a^t_{m,k}. (11)

The update equations (9)-(11) represent the centralized or standard M-step updates for the MFA model. We outline a decentralized approach to the global updates in the sequel.

4. CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS

In a decentralized algorithm, each sensor node computes and communicates local statistics to nearby sensor nodes, instead of to a central node. Then, each node estimates the global parameters from updates of local statistics from its neighbors. The local statistics can be updated by various gossip-type strategies; see [14, 15, 16, 17], among many others, for works on general distributed averaging. Our focus is on a decentralized algorithm for the MFA model.

In the standard EM approach for an MFA model, the local statistics from all sensor nodes are summed and combined to estimate the global parameters. The summation across nodes (m = 1, 2, ..., M) in equations (9)-(11) can be replaced by sample averages, since the constant 1/M is then a common factor in each numerator and denominator of the update equations (9)-(11). Thus, similar to [11], we employ consensus-based averaging [14] for the decentralized algorithm to estimate the sample average of the local statistics across the sensor network. In consensus-based averaging, nodes compute a weighted average of their information with information from neighboring nodes. Here, prior to the M-step, each sensor node first calculates its local statistics using equations (4)-(7), and then the sensor network establishes consensus on the average of the local statistics.
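A minimal sketch of the local statistics (4)-(7) and the pooled mean update (9), assuming the responsibilities w_{m,i,k} and the previous-iterate quantities µ_k^t and U_k^t are already available (all array and argument names here are illustrative):

```python
import numpy as np

def local_stats(X_m, w_mk, mu_k, U_k):
    """Local statistics (4)-(7) at node m for one component k.
    X_m: (N_m, p) observations, w_mk: (N_m,) responsibilities,
    mu_k: (p,) previous mean µ_k^t, U_k: (r, p) matrix from (1)."""
    Xc = X_m - mu_k                             # centered w.r.t. µ_k^t
    a = w_mk.sum()                              # (4)
    b = w_mk @ X_m                              # (5)
    V = (w_mk[:, None] * Xc).T @ (Xc @ U_k.T)   # (6): Σ_i w x̃ (U_k x̃)ᵀ, p × r
    s = np.einsum('i,ij,ij->j', w_mk, Xc, Xc)   # (7): diag(Σ_i w x̃ x̃ᵀ)
    return a, b, V, s

# With statistics pooled over nodes, the mean update (9) is
# µ_k = Σ_m b_{m,k} / Σ_m a_{m,k}.
```

In the decentralized algorithm of Section 4, the sums over m in (9)-(11) are replaced by consensus averages of these same quantities.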
The total number of real values shared under consensus with neighbor nodes for the MFA model is n_MFA = Jp(r + 2) + J, whereas the total for a standard GMM is n_GMM = Jp(p/2 + 3/2) + J. Thus, a reduction in the communications load (and dimensionality reduction) is achieved when r < (p − 1)/2. Finally, each node updates its (local) iterates of the global parameters using the consensus-based averages of the local statistics.

Each node computes its own local estimates of the global parameters. We relabel the global parameters to denote their relation to a given node j: µ^t_{j,k}, Λ^t_{j,k}, and Ψ^t_{j,k}. In the E-step at each node, the parameters of the posterior distributions are computed from the locally estimated global parameters of the previous M-step and are similarly relabeled: w^t_{j,i,k}, C^t_{j,k}, and U^t_{j,k}. The mixture probabilities α^t_{j,k} are still updated according to (8), since they are relevant to node j only. Abstractly, sample averages of the form (1/M) Σ_{m=1}^M y_m in the updates of the global parameters (9)-(11) are replaced by consensus averages, denoted by ȳ_j at node j. For example, the iterate µ^t_k at node j becomes µ^t_{j,k} = b̄^t_{j,k} / ā^t_{j,k} in the decentralized approach.

We assume a connected, undirected communications network, and a consensus matrix based on the Metropolis-Hastings algorithm [16], given by

[W]_{i,j} = 1/max{d_i, d_j},                       if j ∈ N_i,
[W]_{i,j} = 1 − Σ_{j'∈N_i} 1/max{d_i, d_{j'}},     if j = i,        (12)
[W]_{i,j} = 0,                                     else,

where N_m is the set of nodes in the (network) neighborhood of sensor node m, excluding itself, and d_m = |N_m| is the degree of node m. The Metropolis-Hastings construction is a good fit for decentralized computations: the resulting consensus matrix requires local neighborhood information only (i.e., the degrees of neighbor nodes). Additionally, the Metropolis-Hastings algorithm and its convergence properties are well studied for sampling from distributions [18], and its performance is comparable to other heuristic-based algorithms [16]. We can write an equivalent, compact expression for the consensus method across the network.
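A sketch of the consensus matrix (12) built from an adjacency matrix, followed by repeated averaging that drives every node's value to the network average. The graph here is an arbitrary assumed example (connected and non-bipartite), not the network of Fig. 1:

```python
import numpy as np

def metropolis_weights(adj):
    """Consensus matrix (12): [W]_ij = 1/max{d_i, d_j} on edges,
    self-weight 1 − Σ_{j∈N_i} [W]_ij, and zero elsewhere."""
    M = adj.shape[0]
    d = adj.sum(axis=1)                  # node degrees
    W = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if adj[i, j]:
                W[i, j] = 1.0 / max(d[i], d[j])
        W[i, i] = 1.0 - W[i].sum()
    return W

# Small connected, non-bipartite example graph (contains a triangle)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = metropolis_weights(adj)

# Repeated multiplication by W drives each node's value to the average,
# at a per-step rate governed by ρ(W − (1/M) 1 1ᵀ)
x = np.array([4.0, 0.0, -2.0, 6.0])      # one local statistic per node
for _ in range(200):
    x = W @ x
```

The matrix is symmetric and row-stochastic by construction, so the average is preserved at every step and, for a connected non-bipartite graph, all nodes converge to it.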
Define the vector of local statistics at node j from equations (4)-(7) as

β^t_j = [b^t_{j,1}, ..., b^t_{j,J}, v^t_{j,1}, ..., v^t_{j,J}, s^t_{j,1}, ..., s^t_{j,J}, a^t_{j,1}, ..., a^t_{j,J}],

where v^t_{j,k} = vect(V^t_{j,k}). The consensus updates of the sample average of the local statistics across all nodes can be written as

[β̃^{t'}_1, β̃^{t'}_2, ..., β̃^{t'}_M] = [β̃^{t'−1}_1, β̃^{t'−1}_2, ..., β̃^{t'−1}_M] W,

where t' indicates the consensus iteration and the recursion is initialized as β̃^0_j = β^t_j from iterate t of EM. The iterates of consensus using the Metropolis-Hastings method are guaranteed to converge to the sample average provided the network is connected and not bipartite [17]. Thus, β̃^{t'}_j → (1/M) Σ_{m=1}^M β^t_m as t' → ∞, for each j = 1, 2, ..., M.

Each node requires a stopping criterion to exit the consensus iterations. The stopping criterion is usually based on the global optimization measure, which in this case is the joint data log-likelihood. The log-likelihood can be written as a sum of local contributions, l(θ) = Σ_{m=1}^M l_m(θ), where

l_m(θ) = Σ_{i=1}^{N_m} log Σ_{k=1}^J α_{m,k} N(x_{m,i}; µ_k, Σ_k). (13)

As a stopping criterion for the consensus iterations, we define for each node j = 1, 2, ..., M the consensus update error

e_j(t') = | l̃^{t'}_j − l̃^{t'−1}_j |, (14)

where l̃^{t'}_j is an estimate, after t' consensus iterations, of the average data log-likelihood (1/M) Σ_{m=1}^M l_m(θ), with l_j(θ) evaluated at θ = θ^t_j at node j. For sufficiently large t', the per-step error behaves according to

log e_j(t') ≈ t' log ρ(W̄) + log c, (15)

for some finite constant c, where ρ(W̄) is the spectral radius of W̄ = W − (1/M) 1 1ᵀ [17]. Since ρ(W̄) < 1 for a connected and non-bipartite network, the local consensus error measure e_j(t') converges to 0 as t' → ∞ for all nodes j = 1, 2, ..., M.

5. NUMERICAL EXAMPLES

In this section, we provide an example using synthetic data to demonstrate the effectiveness of the decentralized algorithm in approximating a nonlinear, low-dimensional data manifold. As numerical examples, both [4] and [5] apply their techniques to an MFA model approximating a shrinking spiral.
We use the shrinking spiral example for comparison with the published results of these centralized approaches. For this numerical example, the number of sensor nodes is M = 9 and the number of mixture components is J = 12. The number of data points per node is a common value, N_m = N for m = 1, 2, ..., M, and the data are generated according to

x = [(13 − 0.5t) cos t, (0.5t − 13) sin t, t]ᵀ + w,

where t ∈ [0, 4π] and w ~ N(0, I_3). The dimension of the observations is p = 3, while the lower intrinsic dimension is r = 1. Each sensor node observes an equal-length segment of the data manifold, with 50% overlap with the adjacent segments of other nodes, which are not necessarily network neighbors. Sensor nodes should observe at least one mixture component seen by at least one other sensor node.

Figure 1 shows the communications network structure. Each sensor node is a vertex of the graph, and each edge between vertices represents an undirected communications link. For the Metropolis-Hastings-based consensus matrix W and the network graph in Fig. 1, the resulting convergence rate is ρ(W − (1/M) 1 1ᵀ) = 0.8345.

The decentralized EM algorithm is initialized with estimates from a consensus-based k-means clustering (see, for example, [12]). While the objective functions of k-means and EM are different, the resulting update equations for the mixture means (i.e., equations (4), (5), (9), and the corresponding consensus updates of (4) and (5)) are identical, with the exception that the posterior probabilities w^{t+1}_{m,i,k} are hard assignments in k-means instead of soft ones. In our experience, the consensus-based k-means converges much faster in both consensus and clustering due to the hard assignments. Since k-means initialization can be seen as a simplification and special case of the EM algorithm [19, Chapter 9.3], we will not discuss it further here.

Figures 2(a) and 2(b) plot the noiseless decaying spiral (solid blue curve), sample observation points from all M nodes (black scatter points), and 2-σ line segments (red) from the estimates of the global parameters. Note that a 2-σ segment from each node is plotted for all J components (i.e., 108 total line segments). In Fig. 2(a) and Fig. 2(b), the number of consensus steps per EM step in estimating the global parameters is 5 and 20, respectively.

The sample root-mean-squared error is defined by

s-rmse(t) = √( Σ_{m=1}^M Σ_{i=1}^{N_m} Σ_{k=1}^J (w^t_{m,i,k} / (p M N_m)) ||(I_p − P_k(θ^t_m)) (x_{m,i} − µ^t_{m,k})||₂² ),

where P_k(θ) = Λ_k (Λ_kᵀ Λ_k)⁻¹ Λ_kᵀ and, given θ = θ^t_m, projects the point (x_{m,i} − µ^t_{m,k}) onto the k-th subspace estimated by node m. After 40 EM iterations, the s-RMSE is 0.8328 and 0.8326 for 5 and 20 consensus steps per EM step, respectively. Surprisingly, the average data log-likelihood with just 5 consensus steps is slightly greater than that with 20 consensus steps. However, Fig. 2(a) and Fig. 2(b) show that the local estimates of the global parameters are in better agreement across the network with more consensus steps. The agreement is made clearer in Fig. 3.

Figure 3 plots the local update error of the data log-likelihood, given in equation (14), versus consensus iterations at the t = 5 iterate of the EM steps. The solid lines are the update errors measured at each of the M = 9 nodes, and the dashed line represents the predicted convergence rate of ρ(W̄) = 0.8345 for the simulated network. This demonstrates that after a small number of steps, the consensus updates begin to exhibit the expected linear (in the exponent) convergence. Additionally, the sensor nodes have about an order of magnitude better agreement after 20 consensus steps compared with 5 steps.
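For reference, the shrinking-spiral data generator used in this example can be sketched as follows; the sample count and the per-node segmenting with overlap are simplifications of the sketch, not taken from the paper.

```python
import numpy as np

def shrinking_spiral(n, rng):
    """n noisy samples of the shrinking spiral:
    x = [(13 − 0.5t) cos t, (0.5t − 13) sin t, t]ᵀ + w,
    with t ~ Uniform[0, 4π] and w ~ N(0, I_3)."""
    t = rng.uniform(0.0, 4.0 * np.pi, size=n)
    clean = np.stack([(13.0 - 0.5 * t) * np.cos(t),
                      (0.5 * t - 13.0) * np.sin(t),
                      t], axis=1)
    return clean + rng.normal(size=(n, 3))

X = shrinking_spiral(900, np.random.default_rng(4))  # 900 is arbitrary
```

The observations live near a one-dimensional curve (r = 1) embedded in R³ (p = 3), which is the structure the J local factor analyzers linearize piecewise.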
More consensus iterations improve agreement, but at the expense of additional communications. Figures 4 and 5 show the local update error of the global parameters, given by ||θ^{t+1}_j − θ^t_j|| for nodes j = 1, 2, ..., M, and the average data log-likelihood, l̃^{t+1}_j, versus EM iteration number. The error and average log-likelihood curves are plotted for each of the M = 9 sensor nodes, and each is generated with 20 consensus steps per EM iteration. Both figures show tight consensus between the nodes. As seen in Fig. 5, despite the intermediate consensus steps, the data log-likelihood appears non-decreasing, as expected for standard EM algorithms. For comparison, error and log-likelihood curves are also plotted in Figs. 4 and 5 for a centralized EM (i.e., assuming the E-step and M-step updates of Section 3 are available at a central node). After 40 iterations, the centralized EM has an s-RMSE of 0.8374.

Fig. 1. Simulated sensor network communication links. Consensus with Metropolis-Hastings has a convergence rate of ρ(W̄) = 0.8345.

Fig. 2. 2-σ line segments (red) after several consensus iterations per EM iteration and 40 EM iterations: (a) 5 consensus iterations per EM iteration; (b) 20 consensus iterations per EM iteration. Each line segment corresponds to a single mixture component. Line segments from the local estimates at each of the M = 9 nodes of the J = 12 mixture components are plotted.

Fig. 3. Local update error (14) of the average log-likelihood versus consensus iterations at all M = 9 nodes at iteration 5 of EM. The dashed line represents the predicted convergence rate of ρ(W̄) = 0.8345.
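The s-RMSE figure of merit reported in this section can be sketched as follows, assuming per-node lists of observations, responsibilities, and locally estimated parameters are available (all argument names are illustrative):

```python
import numpy as np

def s_rmse(X_nodes, w, mus, Lambdas):
    """Sample RMSE of residuals orthogonal to each component subspace.
    X_nodes[m]: (N_m, p) observations at node m; w[m]: (N_m, J)
    responsibilities; mus[m]: (J, p) local mean estimates;
    Lambdas[m]: (J, p, r) local loading estimates."""
    p = X_nodes[0].shape[1]
    M = len(X_nodes)
    total = 0.0
    for m in range(M):
        N_m = X_nodes[m].shape[0]
        for k in range(w[m].shape[1]):
            L = Lambdas[m][k]
            # Projector P_k = Λ_k (Λ_kᵀ Λ_k)⁻¹ Λ_kᵀ onto span(Λ_k)
            P = L @ np.linalg.solve(L.T @ L, L.T)
            # Residuals (I_p − P_k)(x − µ_k), row-wise (P is symmetric)
            R = (X_nodes[m] - mus[m][k]) @ (np.eye(p) - P)
            total += np.sum(w[m][:, k] * np.sum(R**2, axis=1)) / (p * M * N_m)
    return np.sqrt(total)
```

If the points at a node lie exactly on the line µ_k + span(Λ_k) and the responsibilities concentrate on that component, the residuals vanish and the s-RMSE is (numerically) zero.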
Fig. 4. Local update error of the global parameters, ||θ^{t+1}_j − θ^t_j|| for j = 1, 2, ..., M, versus EM iterations. All M = 9 error curves from the decentralized EM algorithm are plotted. The dashed curve corresponds to the update error for the centralized EM algorithm.

Fig. 5. Average data log-likelihood versus EM iterations. All M = 9 log-likelihood curves of the decentralized EM algorithm are plotted. The dashed curve corresponds to the log-likelihood for the centralized EM algorithm.

6. CONCLUSION

We have presented a decentralized EM-based algorithm for estimating the parameters of a mixture of factor analyzers. This model approximates the joint distribution of noisy observations of a nonlinear manifold from a network of sensor nodes. We have provided results based on simulated data to demonstrate the efficacy of the algorithm.

7. REFERENCES

[1] N. Ramakrishnan, E. Ertin, and R. L. Moses, "Gossip-based algorithm for joint signature estimation and node calibration in sensor networks," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, 2011.
[2] L. Carin, R. G. Baraniuk, V. Cevher, D. Dunson, M. I. Jordan, G. Sapiro, and M. B. Wakin, "Learning low-dimensional signal models," IEEE Signal Processing Magazine, vol. 28, no. 2, 2011.
[3] Z. Ghahramani and G. E. Hinton, "The EM algorithm for mixtures of factor analyzers," Tech. Rep. CRG-TR-96-1, University of Toronto, 1996.
[4] N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton, "SMEM algorithm for mixture models," Neural Computation, vol. 12, no. 9, Sept. 2000.
[5] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, March 2002.
[6] S. Roweis, L. Saul, and G. E. Hinton, "Global coordination of local linear models," in Advances in Neural Information Processing Systems, Dec. 2001.
[7] J. Verbeek, "Learning nonlinear image manifolds by global alignment of local linear models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, Aug. 2006.
[8] R. M. Neal and G. E. Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants," in Learning in Graphical Models, Springer, 1998.
[9] R. D. Nowak, "Distributed EM algorithms for density estimation and clustering in sensor networks," IEEE Trans. on Signal Processing, vol. 51, no. 8, Aug. 2003.
[10] W. Kowalczyk and N. Vlassis, "Newscast EM," in Advances in Neural Information Processing Systems, MIT Press.
[11] D. Gu, "Distributed EM algorithm for Gaussian mixtures in sensor networks," IEEE Trans. on Neural Networks, vol. 19, no. 7, Jul. 2008.
[12] P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed expectation-maximization algorithm for density estimation and classification using wireless sensor networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2008.
[13] L. Xu, "Comparative analysis on convergence rates of the EM algorithm and its two modifications for Gaussian mixtures," Neural Processing Letters, vol. 6, no. 3, pp. 69-76, 1997.
[14] M. H. DeGroot, "Reaching a consensus," Journal of the American Statistical Association, vol. 69, no. 345, 1974.
[15] U. A. Khan, S. Kar, and J. M. F. Moura, "Distributed algorithms in sensor networks," in Handbook on Array Processing and Sensor Networks, John Wiley & Sons, Inc., 2010.
[16] S. Boyd, P. Diaconis, and L. Xiao, "Fastest mixing Markov chain on a graph," SIAM Review, vol. 46, 2004.
[17] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, vol. 53, pp. 65-78, 2004.
[18] P. Diaconis and L. Saloff-Coste, "What do we know about the Metropolis algorithm?," J. Comput. System Sci., 1998.
[19] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006.
More informationDimension Reduction Techniques. Presented by Jie (Jerry) Yu
Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage
More informationLatent Variable Models and EM Algorithm
SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationEUSIPCO
EUSIPCO 3 569736677 FULLY ISTRIBUTE SIGNAL ETECTION: APPLICATION TO COGNITIVE RAIO Franc Iutzeler Philippe Ciblat Telecom ParisTech, 46 rue Barrault 753 Paris, France email: firstnamelastname@telecom-paristechfr
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationData Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis
Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationNonlinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Kernel PCA 2 Isomap 3 Locally Linear Embedding 4 Laplacian Eigenmap
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization
More informationMaximum Likelihood Diffusive Source Localization Based on Binary Observations
Maximum Lielihood Diffusive Source Localization Based on Binary Observations Yoav Levinboo and an F. Wong Wireless Information Networing Group, University of Florida Gainesville, Florida 32611-6130, USA
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationFast Linear Iterations for Distributed Averaging 1
Fast Linear Iterations for Distributed Averaging 1 Lin Xiao Stephen Boyd Information Systems Laboratory, Stanford University Stanford, CA 943-91 lxiao@stanford.edu, boyd@stanford.edu Abstract We consider
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture Notes Fall 2009 November, 2009 Byoung-Ta Zhang School of Computer Science and Engineering & Cognitive Science, Brain Science, and Bioinformatics Seoul National University
More informationTechnical Details about the Expectation Maximization (EM) Algorithm
Technical Details about the Expectation Maximization (EM Algorithm Dawen Liang Columbia University dliang@ee.columbia.edu February 25, 2015 1 Introduction Maximum Lielihood Estimation (MLE is widely used
More informationSingle-tree GMM training
Single-tree GMM training Ryan R. Curtin May 27, 2015 1 Introduction In this short document, we derive a tree-independent single-tree algorithm for Gaussian mixture model training, based on a technique
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More informationAn indicator for the number of clusters using a linear map to simplex structure
An indicator for the number of clusters using a linear map to simplex structure Marcus Weber, Wasinee Rungsarityotin, and Alexander Schliep Zuse Institute Berlin ZIB Takustraße 7, D-495 Berlin, Germany
More informationLecture 6: April 19, 2002
EE596 Pat. Recog. II: Introduction to Graphical Models Spring 2002 Lecturer: Jeff Bilmes Lecture 6: April 19, 2002 University of Washington Dept. of Electrical Engineering Scribe: Huaning Niu,Özgür Çetin
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework 3 Due Nov 12, 10.30 am Rules 1. Homework is due on the due date at 10.30 am. Please hand over your homework at the beginning of class. Please see
More informationNon-linear Dimensionality Reduction
Non-linear Dimensionality Reduction CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Laplacian Eigenmaps Locally Linear Embedding (LLE)
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationManifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA
Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/
More informationVariational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures
17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter
More informationDistributed Estimation and Detection for Smart Grid
Distributed Estimation and Detection for Smart Grid Texas A&M University Joint Wor with: S. Kar (CMU), R. Tandon (Princeton), H. V. Poor (Princeton), and J. M. F. Moura (CMU) 1 Distributed Estimation/Detection
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Clustering Finding a group structure in the data Data in one cluster similar to
More informationSpeaker Representation and Verification Part II. by Vasileios Vasilakakis
Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation
More informationOn Agreement Problems with Gossip Algorithms in absence of common reference frames
On Agreement Problems with Gossip Algorithms in absence of common reference frames M. Franceschelli A. Gasparri mauro.franceschelli@diee.unica.it gasparri@dia.uniroma3.it Dipartimento di Ingegneria Elettrica
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationVariables which are always unobserved are called latent variables or sometimes hidden variables. e.g. given y,x fit the model p(y x) = z p(y x,z)p(z)
CSC2515 Machine Learning Sam Roweis Lecture 8: Unsupervised Learning & EM Algorithm October 31, 2006 Partially Unobserved Variables 2 Certain variables q in our models may be unobserved, either at training
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationData Analysis and Manifold Learning Lecture 7: Spectral Clustering
Data Analysis and Manifold Learning Lecture 7: Spectral Clustering Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture 7 What is spectral
More informationPlanning by Probabilistic Inference
Planning by Probabilistic Inference Hagai Attias Microsoft Research 1 Microsoft Way Redmond, WA 98052 Abstract This paper presents and demonstrates a new approach to the problem of planning under uncertainty.
More informationApproximating the Covariance Matrix with Low-rank Perturbations
Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu
More informationSpectral Clustering. Zitao Liu
Spectral Clustering Zitao Liu Agenda Brief Clustering Review Similarity Graph Graph Laplacian Spectral Clustering Algorithm Graph Cut Point of View Random Walk Point of View Perturbation Theory Point of
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationRobust Motion Segmentation by Spectral Clustering
Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk
More informationFace Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi
Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold
More informationEM Algorithm LECTURE OUTLINE
EM Algorithm Lukáš Cerman, Václav Hlaváč Czech Technical University, Faculty of Electrical Engineering Department of Cybernetics, Center for Machine Perception 121 35 Praha 2, Karlovo nám. 13, Czech Republic
More informationWhen Dictionary Learning Meets Classification
When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor: Professor Yohann Tendero 1 UCLA 2 Dalhousie University 3 Harvey Mudd College August
More informationCSE 291. Assignment Spectral clustering versus k-means. Out: Wed May 23 Due: Wed Jun 13
CSE 291. Assignment 3 Out: Wed May 23 Due: Wed Jun 13 3.1 Spectral clustering versus k-means Download the rings data set for this problem from the course web site. The data is stored in MATLAB format as
More informationCross entropy-based importance sampling using Gaussian densities revisited
Cross entropy-based importance sampling using Gaussian densities revisited Sebastian Geyer a,, Iason Papaioannou a, Daniel Straub a a Engineering Ris Analysis Group, Technische Universität München, Arcisstraße
More informationL26: Advanced dimensionality reduction
L26: Advanced dimensionality reduction The snapshot CA approach Oriented rincipal Components Analysis Non-linear dimensionality reduction (manifold learning) ISOMA Locally Linear Embedding CSCE 666 attern
More informationA Distributed Newton Method for Network Utility Maximization, I: Algorithm
A Distributed Newton Method for Networ Utility Maximization, I: Algorithm Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract Most existing wors use dual decomposition and first-order
More informationA Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationStatistical Learning. Dong Liu. Dept. EEIS, USTC
Statistical Learning Dong Liu Dept. EEIS, USTC Chapter 6. Unsupervised and Semi-Supervised Learning 1. Unsupervised learning 2. k-means 3. Gaussian mixture model 4. Other approaches to clustering 5. Principle
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationMatrix and Tensor Factorization from a Machine Learning Perspective
Matrix and Tensor Factorization from a Machine Learning Perspective Christoph Freudenthaler Information Systems and Machine Learning Lab, University of Hildesheim Research Seminar, Vienna University of
More informationBayesian Mixtures of Bernoulli Distributions
Bayesian Mixtures of Bernoulli Distributions Laurens van der Maaten Department of Computer Science and Engineering University of California, San Diego Introduction The mixture of Bernoulli distributions
More informationSpectral Generative Models for Graphs
Spectral Generative Models for Graphs David White and Richard C. Wilson Department of Computer Science University of York Heslington, York, UK wilson@cs.york.ac.uk Abstract Generative models are well known
More informationGaussian Mixture Distance for Information Retrieval
Gaussian Mixture Distance for Information Retrieval X.Q. Li and I. King fxqli, ingg@cse.cuh.edu.h Department of omputer Science & Engineering The hinese University of Hong Kong Shatin, New Territories,
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationDiscrete-time Consensus Filters on Directed Switching Graphs
214 11th IEEE International Conference on Control & Automation (ICCA) June 18-2, 214. Taichung, Taiwan Discrete-time Consensus Filters on Directed Switching Graphs Shuai Li and Yi Guo Abstract We consider
More informationApproximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)
Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationMachine Learning Techniques for Computer Vision
Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM
More informationEBEM: An Entropy-based EM Algorithm for Gaussian Mixture Models
EBEM: An Entropy-based EM Algorithm for Gaussian Mixture Models Antonio Peñalver Benavent, Francisco Escolano Ruiz and Juan M. Sáez Martínez Robot Vision Group Alicante University 03690 Alicante, Spain
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationA Coupled Helmholtz Machine for PCA
A Coupled Helmholtz Machine for PCA Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 3 Hyoja-dong, Nam-gu Pohang 79-784, Korea seungjin@postech.ac.kr August
More informationFastest Distributed Consensus Averaging Problem on. Chain of Rhombus Networks
Fastest Distributed Consensus Averaging Problem on Chain of Rhombus Networks Saber Jafarizadeh Department of Electrical Engineering Sharif University of Technology, Azadi Ave, Tehran, Iran Email: jafarizadeh@ee.sharif.edu
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationImplicitely and Densely Discrete Black-Box Optimization Problems
Implicitely and Densely Discrete Black-Box Optimization Problems L. N. Vicente September 26, 2008 Abstract This paper addresses derivative-free optimization problems where the variables lie implicitly
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationFocus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.
Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,
More informationp(d θ ) l(θ ) 1.2 x x x
p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to
More informationTwo-View Segmentation of Dynamic Scenes from the Multibody Fundamental Matrix
Two-View Segmentation of Dynamic Scenes from the Multibody Fundamental Matrix René Vidal Stefano Soatto Shankar Sastry Department of EECS, UC Berkeley Department of Computer Sciences, UCLA 30 Cory Hall,
More informationProbabilistic Graphical Models for Image Analysis - Lecture 1
Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.
More information