
2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2014, REIMS, FRANCE

A CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS

Gene T. Whipps, Emre Ertin, Randolph L. Moses
The Ohio State University, ECE Department, Columbus, OH 43210

ABSTRACT

We consider the problem of decentralized learning of a target appearance manifold using a network of sensors. Sensor nodes observe an object from different aspects and then, in an unsupervised and distributed manner, learn a joint statistical model for the data manifold. We employ a mixture of factor analyzers (MFA) model, approximating a potentially nonlinear manifold. We derive a consensus-based decentralized expectation-maximization (EM) algorithm for learning the parameters of the mixture densities and the mixing probabilities. A simulation example demonstrates the efficacy of the algorithm.

Index Terms: decentralized learning, Gaussian mixture, mixture of factor analyzers, consensus

1. INTRODUCTION

A spatially distributed sensor network can be used to construct a rich appearance model for targets in its common field of view. These models can then be used to identify previously seen objects if they reappear in the network at a later time. As an example, consider a network of cameras capturing images of an object from different but possibly overlapping aspects as the object traverses the network's field of view. The ensemble of images captured by the network forms a low-dimensional nonlinear manifold in the high-dimensional ambient space of images.

One approach to appearance modeling would be to construct an independent model of the local data manifold at each sensor and share it across the network. However, such an ensemble of models suffers from discretization of the aspect space and from poor parameter estimates, since the number of unknown parameters necessarily scales linearly with the number of sensor nodes. Alternatively, the sensor nodes can collaborate to construct a joint model for the image ensemble. The parameter estimates of the joint model improve with the number of sensor nodes, since the number of unknown parameters in the model is intrinsic to the object and fixed, whereas the number of measurements scales linearly with the number of sensor nodes. The straightforward method of pooling images at a central location for joint model construction would require large and likely impractical network bandwidth. In this paper, we develop a decentralized learning method with a potentially reduced data-bandwidth requirement, which results in a global appearance manifold model shared by all sensor nodes. (The research reported here was partially supported by the U.S. Army Research Laboratory and by a grant from ARO, W911NF.)

Previous work [1] developed methods for in-network learning of aspect-independent target signatures. In this work we focus on learning appearance models with aspect dependence. We model the overall statistics as a mixture of factor analyzers (MFA) and derive a decentralized EM algorithm for learning the model parameters. The MFA model is probabilistic and generative, and can be used for dimensionality reduction, manifold learning, and signal recovery from compressed sensing [2]. In the case of learning a data manifold, the MFA model is a linearization of a (potentially) nonlinear structure. The EM algorithm for a mixture of factor analyzers was derived in [3] in a centralized setting.
There, the goal is to provide an algorithm for clustering and dimensionality reduction of high-dimensional data with a low-rank covariance model, consistent with the structure of low-dimensional data manifolds. In this paper we extend the EM algorithm for the MFA model to the case of a spatially distributed sensor network, with the goals of distributing computations across the network and being robust to individual node failures (e.g., losing connectivity to a central node in centralized or distributed systems).

Previous work [4, 5, 6, 7] employed MFA models for high-dimensional data with low-dimensional structure. The authors in [4] modify the standard EM for general mixture models by splitting and merging mixture components to avoid local maxima that arise when the data are dominated by some components in parts of the data space. The split-and-merge technique is essentially applied as post-processing after standard EM converges to a stationary point. In [5], the authors propose an algorithm that jointly learns the model parameters and the model order (i.e., the number of mixture components) for more general mixture models. The EM-like algorithms proposed in [6] and [7] for the MFA model address the issue of globally aligning the linear mappings to the underlying manifold using a modified EM cost function. The algorithm in [6] requires a fixed-point iteration in the M-step, which is addressed in closed form in [7]. It was shown in [8] that maximizing the cost function in [6] and [7] is equivalent to maximizing the EM cost function.

This paper differs from the results in [4, 5, 6, 7] in two important aspects. First, we provide a decentralized

algorithm for fitting MFA models to manifold-structured, high-dimensional data; the algorithm is ideally suited for distributed sensing, where the sensor data ensemble is not present at any central node. Second, we consider a more general MFA model suitable for modeling data observed by heterogeneous sensor nodes that differ in their aspect angle with respect to the object. Specifically, we assume observations are drawn from the mixture density with mixture probabilities that can vary across sensor nodes; in other words, one sensor node may observe a mixture component with greater (or lesser) probability than other nodes. In this context, the work in [4, 5, 6, 7] considers the homogeneous sensing case, where all sensor nodes observe different realizations of a single mixture model with identical mixture probabilities.

Our decentralized algorithm for MFA modeling is a consensus-based decentralized implementation of EM formulated for the general MFA model we consider. One of the earliest decentralized EM algorithms for a high-dimensional Gaussian mixture model was developed in [9], assuming the existence of a reliable network with a ring topology. The authors in [10, 11] later developed gossip-based decentralized EM algorithms for a Gaussian mixture model suitable for mesh networks. The algorithm proposed in [11] applies consensus to average local sufficient statistics across the network. The authors in [12] applied the alternating direction method of multipliers (ADMM) with a consensus constraint to the maximization step of EM for more general mixture densities with log-concave conditionals. The goal in each of these references is to learn, in a distributed way, high-dimensional parametric models for data that typically form well-separated clusters; in contrast, our work incorporates a low-dimensional structure, which is key to accurately modeling high-dimensional data observed by a network of sensors whose relevant characteristics lie on a common low-dimensional manifold.

The remainder of this paper is outlined as follows. In Section 2, we outline the (approximate) observation model, a mixture of factor analyzers. In Section 3, we derive the update equations of the consensus-based decentralized EM algorithm for a mixture of factor analyzers. In Section 5, a numerical example demonstrating the efficacy of the algorithm is provided. Finally, conclusions are given in Section 6.

2. MIXTURE OF FACTOR ANALYZERS

We consider a sensor network of $M$ sensors. Sensor $m \in \{1, 2, \ldots, M\}$ collects $N_m$ observations. The $i$-th measurement at the $m$-th node is denoted by $x_{m,i} \in \mathbb{R}^p$. The measurements are assumed to be i.i.d. observations from a nonlinear $r$-dimensional manifold, with $r \ll p$. We approximate the system by linear mappings embedded in the higher-dimensional space according to

$$x_k = \Lambda_k y_k + \mu_k + w_k,$$

where $y_k \in \mathbb{R}^r$ and $w_k \in \mathbb{R}^p$ for $k \in \{1, 2, \ldots, J\}$, with $y_k \sim \mathcal{N}(0, I_r)$ and $w_k \sim \mathcal{N}(0, \Psi_k)$, where $I_r$ is the $(r \times r)$ identity matrix. The number of mixture components $J$ is assumed known. The vector $x_k$ is a random vector conditioned on knowing $k$, while $x_{m,i}$ is the $i$-th observation at the $m$-th sensor node. Given the mixture component $k$ to which the sensor observation belongs, $x_{m,i}$ is a realization of the random vector $x_k$. It is straightforward to show that $x_k \sim \mathcal{N}(\mu_k, \Sigma_k)$ with $\Sigma_k = \Psi_k + \Lambda_k \Lambda_k^T$. The covariance $\Psi_k = \mathrm{diag}([\psi_{1,k}, \psi_{2,k}, \ldots, \psi_{p,k}])$ is diagonal and positive definite. The local coordinate $y_k$ is not directly observed and can be seen as spherical perturbations along piecewise linear segments of the manifold that are mapped to the observation space. The term $w_k$ accounts for observation noise. The columns of $\Lambda_k$ span the $k$-th lower-dimensional subspace.

Sensor nodes observe an environment composed of the same component Gaussian densities, but with potentially different proportions of each component. In the observation space, the measurements follow a mixture model according to

$$x_{m,i} \sim \sum_{k=1}^{J} \alpha_{m,k}\, \mathcal{N}(\mu_k, \Sigma_k),$$

for $m \in \{1, 2, \ldots, M\}$ and $i \in \{1, 2, \ldots, N_m\}$. The unknown parameters are $\mu_k$, $\Psi_k$, $\Lambda_k$, and the component probabilities $\alpha_{m,k}$ for each mixture component $k$ and sensor node $m$. This model extends the mixture of factor analyzers model [3] to a network of sensors in which the sensor nodes observe the factors (mixture components) with potentially varying proportions across the network (i.e., $\alpha_{m,k} \neq \alpha_{n,k}$ for $m \neq n$).
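To make the generative model concrete, the following Python sketch samples synthetic data from this heterogeneous MFA model. It is our own illustration rather than code from the paper; all variable names and the toy parameter values are ours.

```python
import numpy as np

def sample_mfa_node(rng, N_m, alpha_m, mu, Lam, psi):
    """Draw N_m observations at one node from x = Lam[k] y + mu[k] + w,
    with y ~ N(0, I_r), w ~ N(0, diag(psi[k])), and node-specific
    mixing probabilities alpha_m (length J)."""
    J, p, r = Lam.shape                      # Lam: (J, p, r), mu: (J, p), psi: (J, p)
    z = rng.choice(J, size=N_m, p=alpha_m)   # latent component labels
    y = rng.standard_normal((N_m, r))        # latent low-dimensional coordinates
    w = rng.standard_normal((N_m, p)) * np.sqrt(psi[z])
    x = np.einsum('npr,nr->np', Lam[z], y) + mu[z] + w
    return x, y, z

# Toy usage with arbitrary parameters (not the paper's): two 1-D factors in R^3.
rng = np.random.default_rng(0)
J, p, r = 2, 3, 1
Lam = rng.standard_normal((J, p, r))
mu = rng.standard_normal((J, p))
psi = np.full((J, p), 0.1)
x, y, z = sample_mfa_node(rng, 200, np.array([0.7, 0.3]), mu, Lam, psi)
```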
3. EXPECTATION-MAXIMIZATION ALGORITHM FOR A MIXTURE OF FACTOR ANALYZERS

The complete data is $\{x_{m,i}, y_{m,i}, z_{m,i}\}_{m,i}$, where $y_{m,i}$ and $z_{m,i}$ are missing data that indicate, respectively, the local-space coordinate and the mixture component from which the observation $x_{m,i}$ was drawn. Define $\psi_k = [\psi_{1,k}, \psi_{2,k}, \ldots, \psi_{p,k}]^T$, $\lambda_k = \mathrm{vect}(\Lambda_k)$, and $\alpha_m = [\alpha_{m,1}, \alpha_{m,2}, \ldots, \alpha_{m,J}]^T$. The notation $\mathrm{vect}(A)$ represents a column vector of stacked columns from matrix $A$. Similar to the relationship between $x_{m,i}$ and $x_k$, the missing data from the local space $y_{m,i}$ is a realization of the random vector $y_k$ given that the data belongs to the $k$-th mixture component. We define the unknown parameter vector as

$$\theta = [\mu_1, \ldots, \mu_J, \psi_1, \ldots, \psi_J, \lambda_1, \ldots, \lambda_J, \alpha_1, \ldots, \alpha_M].$$

Let $\theta^t$ denote an iterate of the parameter vector $\theta$. Let $x = \{x_{m,i}\}_{m,i}$, $y = \{y_{m,i}\}_{m,i}$, and $z = \{z_{m,i}\}_{m,i}$ represent the collection of all samples from all nodes for the measured and missing data. The EM cost function is then given by

$$Q(\theta, \theta^t) = E\!\left[\log p(x, y, z \mid \theta) \mid x, \theta^t\right] = \sum_{m=1}^{M} \sum_{i=1}^{N_m} \sum_{k=1}^{J} \Pr(z_{m,i} = k \mid x_{m,i}, \theta^t) \left( \log \alpha_{m,k}\, \mathcal{N}(x_{m,i} \mid \mu_k, \Sigma_k) + E\!\left[\log p(y_{m,i} \mid x_{m,i}, z_{m,i} = k, \theta) \mid x_{m,i}, \theta^t\right] \right).$$

In the E-step, updates of the posterior distributions are found to be

$$\Pr(z_{m,i} = k \mid x_{m,i}, \theta^t) = \frac{\alpha^t_{m,k}\, \mathcal{N}(x_{m,i} \mid \mu^t_k, \Sigma^t_k)}{\sum_{k'=1}^{J} \alpha^t_{m,k'}\, \mathcal{N}(x_{m,i} \mid \mu^t_{k'}, \Sigma^t_{k'})} \triangleq w^{t+1}_{m,i,k},$$

$$p(y_{m,i} \mid x_{m,i}, \theta^t) = p(y_{m,i} \mid z_{m,i} = k, x_{m,i}, \theta^t) = \mathcal{N}(y_{m,i} \mid \kappa^{t+1}_{m,i,k}, C^{t+1}_k).$$

We define the matrix

$$U_k = C_k \Lambda_k^T \Psi_k^{-1}. \qquad (1)$$

The mean and covariance of the conditional posterior of $y_{m,i}$ for $z_{m,i} = k$ are given by

$$\kappa_{m,i,k} = U_k (x_{m,i} - \mu_k), \qquad (2)$$

$$C_k = (I_r + \Lambda_k^T \Psi_k^{-1} \Lambda_k)^{-1}. \qquad (3)$$

The iterates $C^{t+1}_k$, $U^{t+1}_k$, and $\kappa^{t+1}_{m,i,k}$ are found according to equations (1)-(3) given $\theta^t$. The probabilities $w^{t+1}_{m,i,k}$ require the inverse and determinant of a $(p \times p)$ covariance estimate $\Sigma^t_k$. Instead of $O(p^3)$ computations, the matrix inversion lemma permits computing the inverse with $O(r^3)$ computations, due to the structure imposed by the MFA model. As noted in [3], the inverse of $\Sigma_k$ can be found from

$$\Sigma_k^{-1} = \Psi_k^{-1} - \Psi_k^{-1} \Lambda_k C_k \Lambda_k^T \Psi_k^{-1}.$$

A similar reduction in computations can be achieved in computing the determinant. The determinant of $\Sigma_k$ reduces to [7]

$$|\Sigma_k| = |C_k|^{-1} \prod_{l=1}^{p} \psi_{l,k}.$$

Given the E-step updates of the posteriors, the unknown global parameters are then found in the M-step according to

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t).$$

The maximizer of $Q(\theta, \theta^t)$ for the MFA model has a closed-form solution. We denote the centered observations by $\tilde{x}^t_{m,i,k} = x_{m,i} - \mu^t_k$. Similar to [9, 11], but for an MFA, we define the following local statistics for sensor node $m$ and mixture component $k$:

$$a^{t+1}_{m,k} = \sum_{i=1}^{N_m} w^{t+1}_{m,i,k}, \qquad (4)$$

$$b^{t+1}_{m,k} = \sum_{i=1}^{N_m} w^{t+1}_{m,i,k}\, x_{m,i}, \qquad (5)$$

$$V^{t+1}_{m,k} = \sum_{i=1}^{N_m} w^{t+1}_{m,i,k}\, \tilde{x}^t_{m,i,k} \left(U^{t+1}_k \tilde{x}^t_{m,i,k}\right)^T, \qquad (6)$$

$$s^{t+1}_{m,k} = \mathrm{diag}\!\left(\sum_{i=1}^{N_m} w^{t+1}_{m,i,k}\, \tilde{x}^t_{m,i,k} (\tilde{x}^t_{m,i,k})^T\right), \qquad (7)$$

where $\mathrm{diag}(A)$ is a column vector of the diagonal elements of the square matrix $A$. Note that in the standard EM implementation, the centered observations are relative to $\mu^{t+1}_k$. Instead, we reference the observations to $\mu^t_k$ to simplify the decentralization of EM; otherwise, the global parameter $\mu^{t+1}_k$ would need to be known at all nodes before computing the local statistics. It was shown in [13] that the modified EM (i.e., referencing to $\mu^t_k$ instead of $\mu^{t+1}_k$) is equivalent to standard EM in its convergence properties. The mixture probabilities are updated locally according to

$$\alpha^{t+1}_{m,k} = \frac{1}{N_m}\, a^{t+1}_{m,k}. \qquad (8)$$

The global parameters are updated according to

$$\mu^{t+1}_k = \frac{\sum_{m=1}^{M} b^{t+1}_{m,k}}{\sum_{m=1}^{M} a^{t+1}_{m,k}}, \qquad (9)$$

$$\Lambda^{t+1}_k = \left(\frac{\sum_{m=1}^{M} V^{t+1}_{m,k}}{\sum_{m=1}^{M} a^{t+1}_{m,k}}\right) \left(C^{t+1}_k + U^{t+1}_k \frac{\sum_{m=1}^{M} V^{t+1}_{m,k}}{\sum_{m=1}^{M} a^{t+1}_{m,k}}\right)^{-1}, \qquad (10)$$

$$\psi^{t+1}_k = \frac{\sum_{m=1}^{M} s^{t+1}_{m,k} - \mathrm{diag}\!\left(\Lambda^{t+1}_k \left(\sum_{m=1}^{M} V^{t+1}_{m,k}\right)^T\right)}{\sum_{m=1}^{M} a^{t+1}_{m,k}}. \qquad (11)$$

The update equations (9)-(11) represent the centralized (standard) M-step updates for the MFA model. We outline a decentralized approach to the global updates in the sequel.
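For concreteness, the following Python sketch implements one centralized EM iteration: the E-step quantities (1)-(3) with the inverse and determinant identities above, the local statistics (4)-(7), and the updates (8)-(11). It is our own illustration under the array-shape assumptions noted in the comments, not the authors' implementation.

```python
import numpy as np

def em_step(X, mu, Lam, psi, alpha):
    """One centralized EM iteration for the MFA model of Section 3.
    X: list of (N_m, p) arrays, one per node; mu: (J, p); Lam: (J, p, r);
    psi: (J, p) diagonal noise variances; alpha: (M, J) node mixing weights."""
    M, (J, p, r) = len(X), Lam.shape
    # E-step matrices (1), (3) and the inverse/determinant identities of [3], [7]
    C = np.empty((J, r, r)); U = np.empty((J, r, p))
    Sig_inv = np.empty((J, p, p)); logdet = np.empty(J)
    for k in range(J):
        Pinv = np.diag(1.0 / psi[k])                                  # Psi_k^{-1}
        C[k] = np.linalg.inv(np.eye(r) + Lam[k].T @ Pinv @ Lam[k])    # eq. (3)
        U[k] = C[k] @ Lam[k].T @ Pinv                                 # eq. (1)
        Sig_inv[k] = Pinv - Pinv @ Lam[k] @ C[k] @ Lam[k].T @ Pinv    # Woodbury inverse
        logdet[k] = -np.linalg.slogdet(C[k])[1] + np.log(psi[k]).sum()  # log|Sigma_k|
    # Responsibilities and local statistics (4)-(7), plus local update (8)
    a = np.zeros((M, J)); b = np.zeros((M, J, p))
    V = np.zeros((M, J, p, r)); s = np.zeros((M, J, p))
    alpha_new = np.zeros_like(alpha)
    for m, Xm in enumerate(X):
        d = Xm[:, None, :] - mu[None, :, :]                # centered at mu^t, (N_m, J, p)
        quad = np.einsum('njp,jpq,njq->nj', d, Sig_inv, d)
        logw = np.log(alpha[m]) - 0.5 * (quad + logdet + p * np.log(2 * np.pi))
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)                  # posteriors w_{m,i,k}
        a[m] = w.sum(axis=0)                               # eq. (4)
        b[m] = w.T @ Xm                                    # eq. (5)
        Uy = np.einsum('jrp,njp->njr', U, d)               # U_k (x - mu_k^t)
        V[m] = np.einsum('nj,njp,njr->jpr', w, d, Uy)      # eq. (6)
        s[m] = np.einsum('nj,njp->jp', w, d * d)           # eq. (7), diagonal only
        alpha_new[m] = a[m] / Xm.shape[0]                  # eq. (8)
    # Global updates (9)-(11), here with plain sums over nodes
    A, B, Vs, Ss = a.sum(0), b.sum(0), V.sum(0), s.sum(0)
    mu_new = B / A[:, None]                                # eq. (9)
    Lam_new = np.empty_like(Lam); psi_new = np.empty_like(psi)
    for k in range(J):
        Vbar = Vs[k] / A[k]
        Lam_new[k] = Vbar @ np.linalg.inv(C[k] + U[k] @ Vbar)                 # eq. (10)
        psi_new[k] = Ss[k] / A[k] - np.einsum('pr,pr->p', Lam_new[k], Vbar)   # eq. (11)
    return mu_new, Lam_new, psi_new, alpha_new
```

In the decentralized variant of Section 4, the sums over nodes in the last block are replaced by per-node consensus averages of the same statistics.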
4. CONSENSUS-BASED DECENTRALIZED EM FOR A MIXTURE OF FACTOR ANALYZERS

In a decentralized algorithm, each sensor node computes and communicates local statistics to nearby sensor nodes instead of to a central node. Each node then estimates the global parameters from updates of the local statistics received from its neighbors. The local statistics can be updated by various gossip-type strategies; see [14, 15, 16, 17], among many others, for work on general distributed averaging. Our focus is on a decentralized algorithm for the MFA model.

In the standard EM approach for an MFA model, the local statistics from all sensor nodes are summed and combined to estimate the global parameters. The summation across nodes ($m = 1, 2, \ldots, M$) in equations (9)-(11) can be replaced by sample averages, since the constant $1/M$ is then a common factor in each numerator and denominator of the update equations (9)-(11). Thus, similar to [11], we employ consensus-based averaging [14] in the decentralized algorithm to estimate the sample average of the local statistics across the sensor network. In consensus-based averaging, nodes compute a weighted average of their own information with information from neighboring nodes. Here, prior to the M-step, each sensor node first calculates its local statistics using equations (4)-(7), and then the sensor network establishes consensus on the average of the local statistics. The total number of real values shared under consensus with neighboring nodes for the MFA model is $n_{\mathrm{MFA}} = Jp(r+2) + J$, whereas the total for a standard GMM is $n_{\mathrm{GMM}} = Jp(p/2 + 3/2) + J$. Thus, a reduction in the communications load (and a dimensionality reduction) is achieved when $r < (p-1)/2$.
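As a quick check of this communication-count comparison, one can plug in the experiment's dimensions from Section 5 ($p = 3$, $r = 1$, $J = 12$); the short snippet below is ours and purely illustrative.

```python
# Real values exchanged per consensus round (Section 4 formulas).
p, r, J = 3, 1, 12
n_mfa = J * p * (r + 2) + J            # MFA statistics b, v, s, a per component
n_gmm = J * p * (p / 2 + 3 / 2) + J    # full-covariance GMM statistics
print(n_mfa, n_gmm)  # 120 and 120.0: equal here because r = (p - 1)/2 exactly,
                     # so the MFA saves communications only when r < (p - 1)/2.
```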

Finally, each node updates its (local) iterates of the global parameters using the consensus-based averages of the local statistics. Each node computes its own local estimates of the global parameters. We relabel the global parameters to denote their relation to a given node $j$: $\mu^t_{j,k}$, $\Lambda^t_{j,k}$, and $\Psi^t_{j,k}$. In the E-step at each node, the parameters of the posterior distributions are computed from the locally estimated global parameters of the previous M-step and are similarly relabeled: $w^t_{j,i,k}$, $C^t_{j,k}$, and $U^t_{j,k}$. The mixture probabilities $\alpha^t_{j,k}$ are still updated according to (8), since they are relevant to node $j$ only. Abstractly, sample averages of the form $\frac{1}{M}\sum_{m=1}^{M} y_m$ in the updates of the global parameters (9)-(11) are replaced by consensus averages, denoted by $\bar{y}_j$ at node $j$. For example, the iterate $\mu^t_k$ at node $j$ becomes $\mu^t_{j,k} = \bar{b}^t_{j,k} / \bar{a}^t_{j,k}$ in the decentralized approach.

We assume a connected, undirected communications network. We assume a consensus matrix based on the Metropolis-Hastings algorithm [16], given by

$$[W]_{i,j} = \begin{cases} \dfrac{1}{\max\{d_i, d_j\}}, & \text{if } j \in \mathcal{N}_i, \\[1ex] 1 - \displaystyle\sum_{j' \in \mathcal{N}_i} \dfrac{1}{\max\{d_i, d_{j'}\}}, & \text{if } j = i, \\[1ex] 0, & \text{else}, \end{cases} \qquad (12)$$

where $\mathcal{N}_m$ is the set of nodes in the (network) neighborhood of sensor node $m$, excluding itself, and $d_m = |\mathcal{N}_m|$ is the degree of node $m$. The Metropolis-Hastings construction is a good fit for decentralized computations: the resulting consensus matrix requires only local neighborhood information (i.e., the degrees of neighboring nodes). Additionally, the Metropolis-Hastings algorithm and its convergence properties are well studied in the context of sampling from distributions [18], and its performance is comparable to other heuristic-based constructions [16].

We can write an equivalent, compact expression of the consensus method across the network. Define the vector of local statistics at node $j$ from equations (4)-(7) as

$$\beta^t_j = [b^t_{j,1}, \ldots, b^t_{j,J}, v^t_{j,1}, \ldots, v^t_{j,J}, s^t_{j,1}, \ldots, s^t_{j,J}, a^t_{j,1}, \ldots, a^t_{j,J}],$$

where $v^t_{j,k} = \mathrm{vect}(V^t_{j,k})$. The consensus updates of the sample average of the local statistics across all nodes can be written as

$$\left[\bar{\beta}^{t'}_1, \bar{\beta}^{t'}_2, \ldots, \bar{\beta}^{t'}_M\right] = \left[\bar{\beta}^{t'-1}_1, \bar{\beta}^{t'-1}_2, \ldots, \bar{\beta}^{t'-1}_M\right] W,$$

where $t'$ indicates the consensus iteration and the recursion is initialized as $\bar{\beta}^0_j = \beta^t_j$ from iterate $t$ of EM. The consensus iterates using the Metropolis-Hastings weights are guaranteed to converge to the sample average provided the network is connected and not bipartite [17]. Thus, $\bar{\beta}^{t'}_j \to \frac{1}{M}\sum_{m=1}^{M} \beta^t_m$ as $t' \to \infty$, for each $j = 1, 2, \ldots, M$.

Each node requires a stopping criterion to exit the consensus iterations. The stopping criterion is usually based on the global optimization measure, which in this case is the joint data log-likelihood. The log-likelihood can be written as a sum of local contributions, $l(\theta) = \sum_{m=1}^{M} l_m(\theta)$, where

$$l_m(\theta) = \sum_{i=1}^{N_m} \log \sum_{k=1}^{J} \alpha_{m,k}\, \mathcal{N}(x_{m,i} \mid \mu_k, \Sigma_k). \qquad (13)$$

As a stopping criterion for the consensus iterations, we define, for each node $j = 1, 2, \ldots, M$, the consensus update error

$$e_j(t') = \left| \bar{l}^{t'}_j - \bar{l}^{t'-1}_j \right|, \qquad (14)$$

where $\bar{l}^{t'}_j$ is an estimate, after $t'$ consensus iterations, of the average data log-likelihood $\frac{1}{M}\sum_{m=1}^{M} l_m(\theta)$, with $l_j(\theta)$ evaluated at $\theta = \theta^t_j$ at node $j$. For sufficiently large $t'$, the per-step error behaves according to

$$\log e_j(t') \approx t' \log \rho\!\left(W - \tfrac{1}{M}\mathbf{1}\mathbf{1}^T\right) + \log c, \qquad (15)$$

for some finite constant $c$, where $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T)$ is the spectral radius of $W - \frac{1}{M}\mathbf{1}\mathbf{1}^T$ [17]. Since $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) < 1$ for a connected and non-bipartite network, the local consensus error measure $e_j(t')$ converges to $0$ as $t' \to \infty$ for all nodes $j = 1, 2, \ldots, M$.
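The sketch below (ours, not the authors' code) builds the Metropolis-Hastings weight matrix (12) from a 0/1 adjacency matrix, runs the consensus recursion on stacked local statistic vectors, and uses the log-likelihood change (14) as the stopping rule; the small ring network in the usage example is hypothetical and is not the topology of Fig. 1.

```python
import numpy as np

def metropolis_weights(Adj):
    """Consensus matrix (12) from a symmetric 0/1 adjacency matrix (no self-loops)."""
    M = Adj.shape[0]
    deg = Adj.sum(axis=1)
    W = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if Adj[i, j]:
                W[i, j] = 1.0 / max(deg[i], deg[j])
        W[i, i] = 1.0 - W[i].sum()
    return W

def consensus_average(W, beta, loglik_local, tol=1e-6, max_iter=500):
    """Average rows of beta (one local statistic vector per node) with the
    symmetric weight matrix W until the per-node change in the averaged
    log-likelihood, the error e_j(t') of (14), drops below tol at all nodes."""
    bbar, lbar = beta.copy(), loglik_local.copy()
    for _ in range(max_iter):
        bbar, lbar_new = W @ bbar, W @ lbar
        done = np.all(np.abs(lbar_new - lbar) < tol)
        lbar = lbar_new
        if done:
            break
    return bbar, lbar   # per-node estimates of the network-wide sample averages

# Usage on a hypothetical 9-node ring (odd cycle: connected and not bipartite).
M = 9
Adj = np.zeros((M, M), dtype=int)
for i in range(M):
    Adj[i, (i + 1) % M] = Adj[(i + 1) % M, i] = 1
W = metropolis_weights(Adj)
rate = max(abs(np.linalg.eigvals(W - np.ones((M, M)) / M)))  # the rho in (15)
```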
5. NUMERICAL EXAMPLES

In this section, we provide an example using synthetic data to demonstrate the effectiveness of the decentralized algorithm in approximating a nonlinear, low-dimensional data manifold. As numerical examples, both [4] and [5] apply their techniques to an MFA model approximating a shrinking spiral. We use the shrinking spiral example for comparison with these centralized approaches with published results.

For this numerical example, the number of sensor nodes is $M = 9$ and the number of mixture components is $J = 12$. The number of data points per node is $N_m = N$ for $m = 1, 2, \ldots, M$, and the data are generated according to

$$x = \left[(13 - 0.5t)\cos t,\; (0.5t - 13)\sin t,\; t\right]^T + w,$$

where $t \in [0, 4\pi]$ and $w \sim \mathcal{N}(0, I_3)$. The dimension of the observations is $p = 3$, while the lower intrinsic dimension is $r = 1$. Each sensor node observes an equal-length segment of the data manifold with 5% overlap of the adjacent segments of other nodes, not necessarily those of network neighbors. Sensor nodes should observe at least one mixture component seen by at least one other sensor node. Figure 1 shows the communications network structure. Each sensor node is a vertex on the graph, and each edge between vertices represents an undirected communications link. For the Metropolis-Hastings-based consensus matrix $W$ and the network graph in Fig. 1, the resulting convergence rate is $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$.

The decentralized EM algorithm is initialized with estimates from a consensus-based k-means clustering (see, for example, [12]). While the objective functions of k-means and EM are different, the resulting update equations for the mixture means (i.e., equations (4), (5), (9), and the corresponding consensus updates of (4) and (5)) are identical, with the exception that the posterior probabilities $w^{t+1}_{m,i,k}$ are hard assignments in k-means instead of soft ones. In our experience, the consensus-based k-means converges much faster in both consensus and clustering due to the hard assignments. Since k-means initialization can be seen as a simplification and a special case of the EM algorithm [19, Chapter 9.3], we will not discuss it further here.
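For reference, a short data-generation sketch (ours) of the setup just described. Where the text leaves details unspecified, the choices below are assumptions: the per-node sample count N, the exact handling of the 5% segment overlap, and the random seed.

```python
import numpy as np

def spiral_data(M=9, N=100, overlap=0.05, seed=0):
    """Noisy shrinking-spiral observations split across M nodes.  Each node sees
    an equal-length segment of t in [0, 4*pi], extended by overlap/2 on each
    side to overlap adjacent segments (N per node is our own choice)."""
    rng = np.random.default_rng(seed)
    seg = 4 * np.pi / M
    X = []
    for m in range(M):
        lo = max(0.0, (m - 0.5 * overlap) * seg)
        hi = min(4 * np.pi, (m + 1 + 0.5 * overlap) * seg)
        t = rng.uniform(lo, hi, size=N)
        clean = np.stack([(13 - 0.5 * t) * np.cos(t),
                          (0.5 * t - 13) * np.sin(t),
                          t], axis=1)
        X.append(clean + rng.standard_normal((N, 3)))   # w ~ N(0, I_3)
    return X   # list of (N, 3) arrays, one per node (p = 3, intrinsic r = 1)
```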

Figures 2(a) and 2(b) plot the noiseless decaying spiral (solid blue curve), sample observation points from all $M$ nodes (black scatter points), and 2-$\sigma$ line segments (red segments) from estimates of the global parameters. Note that a 2-$\sigma$ segment from each node is plotted for all $J$ components (i.e., 108 total line segments). In Fig. 2(a) and Fig. 2(b), the number of consensus steps per EM step in estimating the global parameters is 5 and 20, respectively. The sample root-mean-squared error is defined by

$$\text{s-rmse}(t) = \sqrt{\frac{\sum_{m=1}^{M} \sum_{i=1}^{N_m} \sum_{k=1}^{J} w^t_{m,i,k} \left\| \left(I_p - P_k(\theta^t_m)\right)\left(x_{m,i} - \mu^t_{m,k}\right) \right\|_2^2}{p M N_m}},$$

where $P_k(\theta) = \Lambda_k (\Lambda_k^T \Lambda_k)^{-1} \Lambda_k^T$ and, given $\theta = \theta^t_m$, projects the point $(x_{m,i} - \mu^t_{m,k})$ onto the $k$-th subspace estimated by node $m$. After 40 EM iterations, the s-RMSE is 0.8328 and 0.8326 for 5 and 20 consensus steps per EM step, respectively. Surprisingly, the average data log-likelihood with just 5 consensus steps is slightly greater than that with 20 consensus steps. However, Fig. 2(a) and Fig. 2(b) show that the local estimates of the global parameters are in better agreement across the network with more consensus steps. The agreement is made clearer in Fig. 3.

Figure 3 plots the local update error of the data log-likelihood, given in equation (14), versus consensus iterations at the $t = 5$ iterate of the EM steps. The solid lines are the update errors measured at each of the $M = 9$ nodes, and the dashed line represents the predicted convergence rate of $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$ for the simulated network. This demonstrates that after a small number of steps, the consensus updates begin to exhibit the expected linear (in the exponent) convergence. Additionally, the sensor nodes have about an order of magnitude better agreement after 20 consensus steps than after 5 steps. More consensus iterations improve agreement, but at the expense of additional communications.

Figures 4 and 5 show the local update error of the global parameters, given by $\|\theta^{t+1}_j - \theta^t_j\|$ for nodes $j = 1, 2, \ldots, M$, and the average data log-likelihood, $\bar{l}^{t+1}_j$, versus the EM iteration number. The error and average log-likelihood curves are plotted for each of the $M = 9$ sensor nodes, and each is generated with 20 consensus steps per EM iteration. Both figures show tight consensus between the nodes. As seen in Fig. 5, despite the intermediate consensus steps, the data log-likelihood appears non-decreasing, as expected for standard EM algorithms. For comparison, error and log-likelihood curves are also plotted in Figs. 4 and 5 for a centralized EM (i.e., assuming the E-step and M-step updates of Section 3 are available at a central node). After 40 iterations, the centralized EM has an s-RMSE of 0.8374.

Fig. 1. Simulated sensor network communication links (nodes $v_1, \ldots, v_9$). Consensus with the Metropolis-Hastings weights has a convergence rate of $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$.

Fig. 2. 2-$\sigma$ line segments (red) after (a) 5 and (b) 20 consensus iterations per EM iteration and 40 EM iterations. Each line segment corresponds to a single mixture component; segments from the local estimates at each of the $M = 9$ nodes of the $J = 12$ mixture components are plotted.
Fig. 3. Local update error (14) of the average log-likelihood versus consensus iterations at all $M = 9$ nodes at iteration 5 of EM. The dashed line represents the predicted convergence rate of $\rho(W - \frac{1}{M}\mathbf{1}\mathbf{1}^T) = 0.8345$.
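A small sketch (ours) of the s-RMSE metric defined above, assuming per-node arrays of responsibilities and locally estimated parameters with the shapes noted in the docstring:

```python
import numpy as np

def s_rmse(X, W_list, mu_list, Lam_list):
    """s-RMSE of Section 5: RMS residual orthogonal to each node's estimated
    subspaces.  X[m]: (N_m, p) data; W_list[m]: (N_m, J) responsibilities;
    mu_list[m]: (J, p) local means; Lam_list[m]: (J, p, r) local loadings."""
    num, den = 0.0, 0.0
    for Xm, Wm, mu_m, Lam_m in zip(X, W_list, mu_list, Lam_list):
        N_m, p = Xm.shape
        for k in range(Lam_m.shape[0]):
            L = Lam_m[k]
            P = L @ np.linalg.solve(L.T @ L, L.T)     # P_k = L (L^T L)^{-1} L^T
            resid = (Xm - mu_m[k]) @ (np.eye(p) - P)  # (I_p - P_k)(x - mu_{m,k})
            num += (Wm[:, k] * (resid ** 2).sum(axis=1)).sum()
        den += p * N_m   # accumulates to p * M * N_m for equal N_m across nodes
    return np.sqrt(num / den)
```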

Fig. 4. Local update error of the global parameters, $\|\theta^{t+1}_j - \theta^t_j\|$ for $j = 1, 2, \ldots, M$, versus EM iterations. All $M = 9$ error curves from the decentralized EM algorithm are plotted. The dashed curve corresponds to the update error for the centralized EM algorithm.

Fig. 5. Average data log-likelihood versus EM iterations. All $M = 9$ log-likelihood curves of the decentralized EM algorithm are plotted. The dashed curve corresponds to the log-likelihood for the centralized EM algorithm.

6. CONCLUSION

We have presented a decentralized EM-based algorithm for estimating the parameters of a mixture of factor analyzers. This model approximates the joint distribution of noisy observations of a nonlinear manifold from a network of sensor nodes. We have provided results based on simulated data to demonstrate the efficacy of the algorithm.

7. REFERENCES

[1] N. Ramakrishnan, E. Ertin, and R. L. Moses, "Gossip-based algorithm for joint signature estimation and node calibration in sensor networks," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, 2011.

[2] L. Carin, R. G. Baraniuk, V. Cevher, D. Dunson, M. I. Jordan, G. Sapiro, and M. B. Wakin, "Learning low-dimensional signal models," IEEE Signal Processing Magazine, vol. 28, no. 2, 2011.

[3] Z. Ghahramani and G. E. Hinton, "The EM algorithm for mixtures of factor analyzers," Tech. Rep. CRG-TR-96-1, University of Toronto, 1996.

[4] N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton, "SMEM algorithm for mixture models," Neural Computation, vol. 12, no. 9, September 2000.

[5] M. A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, March 2002.

[6] S. Roweis, L. Saul, and G. E. Hinton, "Global coordination of local linear models," in Advances in Neural Information Processing Systems, Dec. 2001.

[7] J. Verbeek, "Learning nonlinear image manifolds by global alignment of local linear models," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, Aug. 2006.

[8] R. M. Neal and G. E. Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants," in Learning in Graphical Models, Springer, 1998.

[9] R. D. Nowak, "Distributed EM algorithms for density estimation and clustering in sensor networks," IEEE Trans. on Signal Processing, vol. 51, no. 8, Aug. 2003.

[10] W. Kowalczyk and N. Vlassis, "Newscast EM," in Advances in Neural Information Processing Systems, MIT Press.

[11] D. Gu, "Distributed EM algorithm for Gaussian mixtures in sensor networks," IEEE Trans. on Neural Networks, vol. 19, no. 7, Jul. 2008.

[12] P. A. Forero, A. Cano, and G. B. Giannakis, "Consensus-based distributed expectation-maximization algorithm for density estimation and classification using wireless sensor networks," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), March 2008.

[13] L. Xu, "Comparative analysis on convergence rates of the EM algorithm and its two modifications for Gaussian mixtures," Neural Processing Letters, vol. 6, no. 3, pp. 69-76, 1997.

[14] M. H. DeGroot, "Reaching a consensus," Journal of the American Statistical Association, vol. 69, no. 345, 1974.

[15] U. A. Khan, S. Kar, and J. M. F. Moura, "Distributed algorithms in sensor networks," in Handbook on Array Processing and Sensor Networks, John Wiley & Sons, Inc.

[16] S. Boyd, P. Diaconis, and L. Xiao, "Fastest mixing Markov chain on a graph," SIAM Review, vol. 46, 2004.
[17] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, vol. 53, pp. 65-78, 2004.

[18] P. Diaconis and L. Saloff-Coste, "What do we know about the Metropolis algorithm?," Journal of Computer and System Sciences, pp. 20-36, 1998.

[19] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006.
