A Monte Carlo scheme for diffusion estimation

L. Martino, J. Plata, F. Louzada

Institute of Mathematical Sciences and Computing, Universidade de São Paulo (ICMC-USP), Brazil. Dep. of Electrical Engineering (ESAT-STADIUS), Leuven, Belgium.

Abstract—In this work, we design an efficient Monte Carlo scheme for diffusion estimation, where global and local parameters are involved in a unique inference problem. This scenario often appears in distributed inference problems in wireless sensor networks. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion for obtaining an efficient estimation of the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework. In order to apply the novel scheme, the only assumption required about the model is that the measurements are conditionally independent given the related parameters.

Keywords: Diffusion estimation; Distributed inference; Parallel MCMC; Importance sampling.

I. INTRODUCTION

Distributed inference has received massive attention in the last decades, for different applications [1], [2], [3], [4], [5], [6], [7], [8]. For instance, in signal processing several algorithms have been proposed for solving inference problems in wireless sensor networks, where each node provides measurements about global and local parameters. Namely, all the received measurements are statistically related to a global parameter, whereas only a subset of the observations provides statistical information about the local variables. Furthermore, Bayesian methods have become very popular in signal processing during the last years and, with them, Monte Carlo (MC) techniques, which are often necessary for the implementation of the optimal a posteriori estimators [9], [10], [11]. Indeed, MC methods are powerful tools for numerical inference and optimization [12], [13], [14], [15], [16]. They are very flexible techniques.
The only requirement needed for applying an MC technique is being able to evaluate the posterior probability density function (pdf) point-wise [9], [10]. In this work, we introduce a simple and flexible approach for the diffusion estimation of global parameters, simultaneously providing the inference of the local parameters. The proposed solution employs parallel MCMC algorithms for analyzing the local features of the network, and an importance sampling (IS) fusion for obtaining complete estimators of the global parameters. Each MCMC method addresses a different target posterior function, obtained by considering a subset of observations. This approach also presents several computational benefits from a Monte Carlo point of view (as remarked exhaustively in Section III-A). For instance, the mixing of the MCMC methods is facilitated by the reduced number of measurements involved in each partial posterior, since this partial posterior distribution is implicitly tempered [17], [18], [19], [6]. Furthermore, several parallel or related schemes proposed in the literature could be adapted to this framework [20], [21], [22], [18], [19], [23], [6], [7], [8]. Numerical simulations show the advantages of the proposed approach.

This work has been supported by the Grant 2014/23160-6 of the São Paulo Research Foundation (FAPESP), by the Grant 305361/2013-3 of the National Council for Scientific and Technological Development (CNPq) and by the ERC Consolidator Grant SEDAL ERC-2014-CoG 647423.

II. PROBLEM STATEMENT

In this work, we are interested in making inference about the following variable of interest,

$$\Theta = [\Theta^{(G)}, \Theta^{(L)}] = [\mathbf{x}, \mathbf{v}_1, \ldots, \mathbf{v}_M] \in \mathcal{A} \subseteq \mathbb{R}^{D_\theta}, \quad (1)$$

composed by the vectors $\mathbf{x} = [x_1, \ldots, x_{d_X}] \in \mathbb{R}^{d_X}$ and $\mathbf{v}_m = [v_{m,1}, \ldots, v_{m,d_m}] \in \mathbb{R}^{d_m}$, for $m = 1, \ldots, M$. The vector $\Theta^{(G)} = \mathbf{x}$ represents a global parameter, whereas $\Theta^{(L)} = [\mathbf{v}_1, \ldots, \mathbf{v}_M] \in \mathbb{R}^{d_L}$ are the local parameters. We receive a set of $d_Y$ measurements, $Y = \{z_1, z_2, \ldots, z_{d_Y}\}$, with each $z_j \in \mathbb{R}$ (we assume $z_j$ to be scalar only for simplicity), related to the variable of interest $\Theta$. We consider $M$ disjoint subsets of $Y$, i.e., we can write $Y = \bigcup_{m=1}^{M} \mathbf{y}_m$, with $\mathbf{y}_j \cap \mathbf{y}_k = \emptyset$ for all $j \neq k$.
We assume that the observations are conditionally independent, i.e., the likelihood function can be factorized as

$$L(\mathbf{y}_{1:M}|\Theta) = \prod_{m=1}^{M} \ell_m(\mathbf{y}_m|\mathbf{x}, \mathbf{v}_m). \quad (2)$$

Considering a prior pdf $p(\Theta) = \prod_{m=1}^{M} [p(\mathbf{x}, \mathbf{v}_m)]^{1/M}$, the complete posterior pdf can be written as

$$\Omega(\Theta|\mathbf{y}_{1:M}) \propto \prod_{m=1}^{M} \bar{\pi}_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m), \quad (3)$$

where the partial posteriors are

$$\pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m) \propto \bar{\pi}_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m) = \ell_m(\mathbf{y}_m|\mathbf{x}, \mathbf{v}_m)\,[p(\mathbf{x}, \mathbf{v}_m)]^{1/M}. \quad (4)$$
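The factorized likelihood of Eq. (2) and the partial posteriors of Eq. (4) can be sketched in a few lines of code. The Gaussian form of `log_lik_m`, its noise scale and the flat prior below are illustrative assumptions, not the model used in the paper:

```python
import numpy as np

# Hypothetical Gaussian partial log-likelihoods, as in Eq. (2): each block
# of data y_m depends on the global x and on its own local v_m.
def log_lik_m(y_m, x, v_m, sigma=1.0):
    # l_m(y_m | x, v_m): i.i.d. Gaussian observations with mean x + v_m
    return -0.5 * np.sum((y_m - (x + v_m)) ** 2) / sigma**2

def log_partial_posterior(y_m, x, v_m, M, log_prior=lambda x, v: 0.0):
    # Eq. (4): pi_m proportional to l_m(y_m|x,v_m) * [p(x,v_m)]^(1/M);
    # the flat prior (log_prior = 0) is an assumption for illustration.
    return log_lik_m(y_m, x, v_m) + log_prior(x, v_m) / M

def log_full_posterior(y_blocks, x, v_list):
    # Eq. (3): the complete log-posterior is the sum of the M partial ones
    M = len(y_blocks)
    return sum(log_partial_posterior(y, x, v, M)
               for y, v in zip(y_blocks, v_list))
```

The key property exploited later is precisely this additivity: each node can evaluate its own partial log-posterior using only its local data block.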
For the inference of $\mathbf{x}$, an important role is played by the marginal posterior density

$$G(\mathbf{x}|\mathbf{y}_{1:M}) = \int_{\mathbb{R}^{d_L}} \Omega(\Theta|\mathbf{y}_{1:M})\, d\mathbf{v}_1 \cdots d\mathbf{v}_M \quad (5)$$
$$\propto \int_{\mathbb{R}^{d_L}} \prod_{m=1}^{M} \pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)\, d\mathbf{v}_1 \cdots d\mathbf{v}_M \quad (6)$$
$$= \prod_{m=1}^{M} g_m(\mathbf{x}|\mathbf{y}_m), \quad (7)$$

where

$$g_m(\mathbf{x}|\mathbf{y}_m) = \int_{\mathbb{R}^{d_m}} \pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)\, d\mathbf{v}_m. \quad (8)$$

Note that the integrals above cannot be computed analytically, in general. This framework appears naturally in several applications, for instance in the so-called Node-specific Parameter Estimation (NSPE) problem within a sensor network [4], [5], [24], [25] (see Figure 1). Furthermore, a similar approach is considered in the Big Data context [23], [6], [7], [8].

Figure 1. Diffusion estimation in a sensor network. In this graphical example, the network is composed of 8 nodes, divided into 3 clusters providing 3 different vectors of measurements $\mathbf{y}_1$, $\mathbf{y}_2$ and $\mathbf{y}_3$. In this case, $Y = \mathbf{y}_1 \cup \mathbf{y}_2 \cup \mathbf{y}_3$, with $\mathbf{y}_j \cap \mathbf{y}_k = \emptyset$ for $j \neq k$ (here, $d_{Y_1} = 5$, $d_{Y_2} = 3$, $d_{Y_3} = 2$). Each sensor can provide more than one measurement.

Remark. Note that we could easily assume that each partial posterior $\pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)$ also depends on other local variables, e.g., $\pi_m(\mathbf{x}, \mathbf{v}_m, \mathbf{v}_k, \mathbf{v}_j|\mathbf{y}_m)$, with $k, j \neq m$. The algorithm proposed in this work can be automatically extended to this case. For simplicity, we have considered only one local variable in each partial posterior.

III. BAYESIAN INFERENCE

Our purpose is to make inference about $\Theta = [\mathbf{x}, \mathbf{v}_{1:M}]$ given $Y = \mathbf{y}_{1:M}$. For instance, we desire to compute the Minimum Mean Square Error (MMSE) estimators of $\mathbf{x}$ and $\mathbf{v}_{1:M}$, i.e.,

$$\hat{\mathbf{x}} = \int_{\mathcal{A}} \mathbf{x}\, \Omega(\mathbf{x}, \mathbf{v}_{1:M}|\mathbf{y}_{1:M})\, d\mathbf{x}\, d\mathbf{v}_{1:M} \quad (9)$$
$$= \int_{\mathbb{R}^{d_X}} \mathbf{x}\, G(\mathbf{x}|\mathbf{y}_{1:M})\, d\mathbf{x}, \quad (10)$$
$$\hat{\mathbf{v}}_m = \int_{\mathcal{A}} \mathbf{v}_m\, \Omega(\mathbf{x}, \mathbf{v}_{1:M}|\mathbf{y}_{1:M})\, d\mathbf{x}\, d\mathbf{v}_{1:M}, \quad (11)$$

for $m = 1, \ldots, M$. In general, we are not able to calculate analytically the integrals above. Thus, we apply a Monte Carlo (MC) approach for computing approximately $\hat{\mathbf{x}}$ and $\hat{\mathbf{v}}_{1:M}$.

A. Benefits of the parallel MC implementation

The previous factorization of the posterior pdf suggests the use of $M$ parallel algorithms and then combining the corresponding outputs. This is convenient from a Monte Carlo point of view.
Namely, the use of $M$ parallel MC methods, each one addressing one partial posterior $\pi_m$, presents several computational benefits:

a) Each partial posterior $\pi_m$ is embedded in a state space of lower dimensionality, specifically, $[\mathbf{x}, \mathbf{v}_m] \in \mathbb{R}^{d_X + d_m}$. This clearly helps the exploration of the space by the MC algorithm.

b) Each partial posterior involves a smaller number of measurements. This is an advantage, since the probability mass is in general more disperse than when a large number of observations is jointly considered, producing a tempering effect (data-tempering) [17], [6], [19]. This again helps the exploration of the state space (as suggested, e.g., in [19]).

c) This scenario automatically allows a parallel implementation.

IV. DISTRIBUTED MONTE CARLO INFERENCE

Let us assume that we are able to draw samples $\theta_m^{(n)} = [\mathbf{x}_m^{(n)}, \mathbf{v}_m^{(n)}]$, with $n = 1, \ldots, N$, directly from each $\pi_m$, i.e., $\theta_m^{(1)}, \ldots, \theta_m^{(N)} \sim \pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)$, with $m = 1, \ldots, M$. Due to the factorization of the complete target pdf $\Omega(\Theta|\mathbf{y}_{1:M})$, for inferring the local variable $\mathbf{v}_m$ we use only the samples $\mathbf{v}_m^{(n)}$, $n = 1, \ldots, N$, obtained from $\pi_m$. For the global variable $\mathbf{x}$, we can build $M$ different partial Monte Carlo estimators. However, all the information contained in the $M$ different partial posteriors should be employed for providing a more efficient unique estimator of $\mathbf{x}$. This can be done by adequately combining the $M$ partial Monte Carlo estimators of $\mathbf{x}$. Let us assume that we are able to draw samples from each marginal pdf $g_m(\mathbf{x}|\mathbf{y}_m)$ in Eq. (8), and also that we are able to evaluate it. In this case, we can use the following IS scheme:

1) Draw $\mathbf{x}_m^{(1)}, \ldots, \mathbf{x}_m^{(N)} \sim g_m(\mathbf{x}|\mathbf{y}_m)$, for $m = 1, \ldots, M$.
2) Assign the weight

$$w_m^{(n)} = \frac{G(\mathbf{x}_m^{(n)}|\mathbf{y}_{1:M})}{g_m(\mathbf{x}_m^{(n)}|\mathbf{y}_m)} \quad (12)$$
$$\propto \prod_{k=1;\,k\neq m}^{M} g_k(\mathbf{x}_m^{(n)}|\mathbf{y}_k), \quad (13)$$

to each sample $\mathbf{x}_m^{(n)}$, for $n = 1, \ldots, N$.
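Under the (idealized) assumption that each marginal $g_m(\mathbf{x}|\mathbf{y}_m)$ can be both sampled and evaluated, the two-step IS scheme above takes only a few lines of code. The stand-in Gaussian marginals (`mus`, `sigmas`) below are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in marginals g_m(x|y_m): 1D Gaussians with known means and stds
# (an assumption for illustration; in practice g_m cannot be sampled
# or evaluated, which motivates the approximate algorithm of Section IV-A).
mus = np.array([0.8, 1.1, 0.9])
sigmas = np.array([0.5, 0.7, 0.6])
M, N = len(mus), 5000

def log_g(m, x):
    # log g_m(x|y_m) up to a common additive constant
    return -0.5 * ((x - mus[m]) / sigmas[m]) ** 2 - np.log(sigmas[m])

# Step 1: draw N samples from each marginal g_m.
X = np.stack([rng.normal(mus[m], sigmas[m], N) for m in range(M)])

# Step 2: weights of Eq. (13): w_m^(n) = prod_{k != m} g_k(x_m^(n)|y_k),
# computed in the log-domain for numerical stability.
logW = np.zeros((M, N))
for m in range(M):
    logW[m] = sum(log_g(k, X[m]) for k in range(M) if k != m)

# Self-normalized IS estimator of the global parameter x.
W = np.exp(logW - logW.max())
x_tilde = float(np.sum(W * X) / np.sum(W))
```

Since the normalizing constants cancel in the self-normalized estimator, the weights only need to be known up to a common factor, which is why the proportionality in Eq. (13) suffices.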
Then, the IS approximation of the MMSE estimator $\hat{\mathbf{x}}$ is

$$\tilde{\mathbf{x}} = \frac{1}{\sum_{m=1}^{M}\sum_{n=1}^{N} w_m^{(n)}} \sum_{m=1}^{M}\sum_{n=1}^{N} w_m^{(n)}\, \mathbf{x}_m^{(n)}. \quad (14)$$

The previous approach has two main problems:
- We are not able to draw from $g_m(\mathbf{x}|\mathbf{y}_m)$ in Eq. (8).
- It is not possible to evaluate $g_m(\mathbf{x}|\mathbf{y}_m)$ and $G(\mathbf{x}|\mathbf{y}_{1:M})$.

A. Proposed Algorithm

A possible approximate solution consists in the following procedure. First, we use $M$ parallel MCMC algorithms for drawing samples $\theta_m^{(n)} = [\mathbf{x}_m^{(n)}, \mathbf{v}_m^{(n)}]$ from each partial posterior $\pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)$. Given an index $m \in \{1, \ldots, M\}$, note that, after the chain has converged, the samples $\mathbf{x}_m^{(n)}$ are distributed as $g_m(\mathbf{x}|\mathbf{y}_m)$, whereas $\theta_m^{(n)}$ is distributed as $\pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)$. Then, we build a kernel density approximation [26],

$$\hat{g}_m(\mathbf{x}|\mathbf{y}_m) = \frac{1}{N}\sum_{n=1}^{N} \varphi(\mathbf{x}|\mathbf{x}_m^{(n)}, \mathbf{C}), \quad (15)$$

for each $m \in \{1, \ldots, M\}$, using the generated samples $\mathbf{x}_m^{(n)}$, $n = 1, \ldots, N$, as means of a kernel function $\varphi$ with bandwidth matrix $\mathbf{C}$. In this way, we can compute an approximate weight

$$\hat{w}_m^{(n)} = \prod_{k=1;\,k\neq m}^{M} \hat{g}_k(\mathbf{x}_m^{(n)}|\mathbf{y}_k), \quad (16)$$

so that the IS estimator $\tilde{\mathbf{x}}$ in Eq. (14) can be approximated with

$$\bar{\mathbf{x}} = \frac{1}{\sum_{m=1}^{M}\sum_{n=1}^{N} \hat{w}_m^{(n)}} \sum_{m=1}^{M}\sum_{n=1}^{N} \hat{w}_m^{(n)}\, \mathbf{x}_m^{(n)}. \quad (17)$$

Table I and Figure 2 summarize the proposed algorithm. Table II summarizes the notation used for the estimators of $\mathbf{x}$.

Figure 2. Graphical representation of the Parallel Marginal Markov Importance Sampling scheme: $M$ local MCMC chains are run, each one addressing one partial posterior $\pi_m$. A kernel density estimation (KDE) approximating the marginal partial posterior $g_m(\mathbf{x}|\mathbf{y}_m)$ is performed locally. Then, an importance sampling fusion is employed for providing a global estimator of the global parameter $\mathbf{x}$.

Table I
PARALLEL MARGINAL MARKOV IMPORTANCE SAMPLER (PMIS)

1. Local MCMC samplers: Generate $M$ chains of length $N$, i.e., $\theta_m^{(1)} = [\mathbf{x}_m^{(1)}, \mathbf{v}_m^{(1)}], \ldots, \theta_m^{(N)} = [\mathbf{x}_m^{(N)}, \mathbf{v}_m^{(N)}]$, with target pdf $\pi_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m) \propto \bar{\pi}_m(\mathbf{x}, \mathbf{v}_m|\mathbf{y}_m)$, $m = 1, \ldots, M$.
2. Kernel density estimation: Build
$$\hat{g}_m(\mathbf{x}|\mathbf{y}_m) = \frac{1}{N}\sum_{n=1}^{N} \varphi(\mathbf{x}|\mathbf{x}_m^{(n)}, \mathbf{C}), \quad (18)$$
given a kernel function $\varphi$ and a scale parameter $\mathbf{C}$.
3. Global IS fusion: Compute the weights
$$\hat{w}_m^{(n)} = \prod_{k=1;\,k\neq m}^{M} \hat{g}_k(\mathbf{x}_m^{(n)}|\mathbf{y}_k), \quad (19)$$
for $m = 1, \ldots, M$ and $n = 1, \ldots, N$.
4. Outputs: Return the Monte Carlo estimators
$$\bar{\mathbf{x}} = \frac{1}{\sum_{m=1}^{M}\sum_{n=1}^{N} \hat{w}_m^{(n)}} \sum_{m=1}^{M}\sum_{n=1}^{N} \hat{w}_m^{(n)}\, \mathbf{x}_m^{(n)}, \quad (20)$$
$$\tilde{\mathbf{v}}_m = \frac{1}{N}\sum_{n=1}^{N} \mathbf{v}_m^{(n)}, \quad m = 1, \ldots, M. \quad (21)$$

Table II
NOTATION OF THE DIFFERENT ESTIMATORS OF $\mathbf{x}$

$\hat{\mathbf{x}}$ : MMSE estimator in Eq. (9).
$\tilde{\mathbf{x}}$ : Monte Carlo estimator (approximation of $\hat{\mathbf{x}}$) in Eq. (14).
$\bar{\mathbf{x}}$ : Approximate Monte Carlo estimator (approximation of $\tilde{\mathbf{x}}$) in Eqs. (17)-(20).

Theoretical support. After a burn-in period, each MCMC chain converges to its invariant target pdf (for $N \to \infty$, the convergence is ensured [9], [10]). Namely, after some iterations, the MCMC methods yield samples $\{\mathbf{x}_m^{(n)}, \mathbf{v}_m^{(n)}\}$ distributed according to $\pi_m$, so that the $\{\mathbf{x}_m^{(n)}\}$ are distributed as the marginal partial posteriors $g_m(\mathbf{x}|\mathbf{y}_m)$ [9]. There exists an optimal bandwidth $\mathbf{C}^*$ [26] such that $\hat{g}_m(\mathbf{x}|\mathbf{y}_m) \to g_m(\mathbf{x}|\mathbf{y}_m)$ for $N \to \infty$, and hence $\bar{\mathbf{x}} \to \tilde{\mathbf{x}}$ for $N \to \infty$. Moreover, the IS estimator is consistent [9], so that $\bar{\mathbf{x}} \to \tilde{\mathbf{x}} \to \hat{\mathbf{x}}$, as $N \to \infty$. In general, the optimal bandwidth $\mathbf{C}^*$ is unknown. However, using a bandwidth $\mathbf{C} \neq \mathbf{C}^*$, Eq. (15) still provides an estimator of $g_m(\mathbf{x}|\mathbf{y}_m)$ [26].

Alternative IS weights. Other proper IS weights can be employed in our framework, providing consistent estimators [27], [28], [29]. For instance, a full deterministic mixture approach [28], [30], [31] for multiple importance sampling (MIS) schemes can be used, i.e.,

$$w_m^{(n)} = \frac{G(\mathbf{x}_m^{(n)}|\mathbf{y}_{1:M})}{\frac{1}{M}\sum_{j=1}^{M} g_j(\mathbf{x}_m^{(n)}|\mathbf{y}_j)} \propto \frac{\prod_{j=1}^{M} g_j(\mathbf{x}_m^{(n)}|\mathbf{y}_j)}{\sum_{j=1}^{M} g_j(\mathbf{x}_m^{(n)}|\mathbf{y}_j)}. \quad (22)$$

As a consequence, we have the approximate weights

$$\hat{w}_m^{(n)} = \frac{\prod_{j=1}^{M} \hat{g}_j(\mathbf{x}_m^{(n)}|\mathbf{y}_j)}{\sum_{j=1}^{M} \hat{g}_j(\mathbf{x}_m^{(n)}|\mathbf{y}_j)}. \quad (23)$$

It is possible to show that the application of these DM-MIS weights provides more efficient IS estimators [29], [28] (i.e.,
with smaller variance).

B. Parallel Metropolis-Hastings algorithms

For simplicity, we consider Metropolis-Hastings (MH) methods in the first step of the novel scheme. However, more sophisticated algorithms can be employed. More specifically, starting with a randomly chosen $\theta_m^{(0)}$, we perform the following steps:

For $m = 1, \ldots, M$:
  For $n = 1, \ldots, N$:
  1) Draw $\theta'$ from a proposal pdf $q_m(\theta|\theta_m^{(n-1)})$.
  2) Set $\theta_m^{(n)} = \theta'$ with probability
  $$\alpha = \min\left[1,\ \frac{\pi_m(\theta'|\mathbf{y}_m)\, q_m(\theta_m^{(n-1)}|\theta')}{\pi_m(\theta_m^{(n-1)}|\mathbf{y}_m)\, q_m(\theta'|\theta_m^{(n-1)})}\right], \quad (24)$$
  otherwise set $\theta_m^{(n)} = \theta_m^{(n-1)}$ (with probability $1-\alpha$).

V. NUMERICAL SIMULATIONS

In order to test PMIS, we consider $M$ Gaussian likelihoods

$$f(\mathbf{y}_m|x, v_m) = \prod_{j=1}^{d_{Y_m}} \mathcal{N}(\mathbf{z}_j|[x, v_m]^\top, \boldsymbol{\Sigma}_m), \quad (25)$$

with $\mathbf{y}_m = [\mathbf{z}_1, \ldots, \mathbf{z}_{d_{Y_m}}]$, $m = 1, \ldots, M$, where $x, v_m \in \mathbb{R}$. Note that $d_Y = \sum_{m=1}^{M} d_{Y_m}$. We consider flat improper priors over $x$ and $v_m$, $m = 1, \ldots, M$. We set $M = 10$, so that we have 10 different partial target pdfs $\pi_m(x, v_m|\mathbf{y}_m) \propto f(\mathbf{y}_m|x, v_m)$. The covariance matrices are $\boldsymbol{\Sigma}_m = [\sigma_{1,m}^2,\ \rho_m;\ \rho_m,\ \sigma_{2,m}^2]$ with

$$\sigma_{1,1:M} = \left[\tfrac{1}{2}, 1, \tfrac{3}{2}, 2, \tfrac{5}{2}, 3, \tfrac{7}{2}, 4, \tfrac{9}{2}, 5\right],$$
$$\sigma_{2,1:M} = \left[\tfrac{1}{3}, \tfrac{2}{3}, 1, \tfrac{4}{3}, \tfrac{5}{3}, 2, \tfrac{7}{3}, \tfrac{8}{3}, 3, \tfrac{10}{3}\right],$$
$$\rho_{1:M} = \left[0, \tfrac{1}{10}, \tfrac{2}{10}, \tfrac{3}{10}, \tfrac{4}{10}, \tfrac{5}{10}, \tfrac{6}{10}, \tfrac{7}{10}, \tfrac{8}{10}, \tfrac{9}{10}\right].$$

We set $x = 1$ and $v_{1:10} = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]$ as the true values of the parameters. Then, given these values, we generate (according to the model in Eq. (25)) different numbers of measurements for each partial likelihood, specifically, $d_{Y_{1:10}} = [2, 2, 50, 2, 5, 20, 5, 100, 2, 10]$. Given a set $Y = \mathbf{y}_{1:M}$ of generated observations, in this toy example we can compute the MMSE estimator $\hat{x} = 0.9$ by a costly deterministic numerical procedure using a thin grid (approximating first the marginal posterior and then the corresponding expected value). Thus, we apply PMIS in 400 independent runs and compare the obtained estimator $\bar{x}$ with $\hat{x} = 0.9$, computing the corresponding MSE. We consider Gaussian functions $\varphi$ for the kernel density estimation, with the optimal bandwidth suggested in [26]. For the proposal pdfs $q_m$ of the MH algorithms, we employ standard Gaussian random walk proposals (with identity covariance matrix $[1, 0;\ 0, 1]$). We test PMIS for different values of the length of the chains, $N$, from 5 to 2000.
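The whole PMIS pipeline of Table I can be sketched end-to-end on a simplified variant of this experiment. This is only an illustrative sketch under stated assumptions: $M = 3$ blocks, diagonal covariances, Silverman's rule of thumb for the KDE bandwidth, and arbitrary constants, none of which match the exact setup above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simplified toy model (assumption for illustration): each 2D measurement
# z_j in block m has mean (x, v_m) and unit-variance independent noise.
M, N = 3, 4000
x_true, v_true = 1.0, [-0.5, 0.0, 0.5]
d = [20, 10, 30]                                  # measurements per block
y = [np.column_stack([x_true + rng.normal(0, 1.0, d[m]),
                      v_true[m] + rng.normal(0, 1.0, d[m])])
     for m in range(M)]

def log_post(m, x, v):
    # partial posterior pi_m(x, v_m | y_m), flat priors, up to a constant
    return -0.5 * (np.sum((y[m][:, 0] - x) ** 2)
                   + np.sum((y[m][:, 1] - v) ** 2))

# 1) M parallel random-walk MH chains, Eq. (24); keep the x-component.
X = np.zeros((M, N))
for m in range(M):
    theta = np.zeros(2)                           # (x, v_m), arbitrary start
    lp = log_post(m, *theta)
    for n in range(N):
        prop = theta + rng.normal(0, 0.2, 2)      # Gaussian random walk
        lp_prop = log_post(m, *prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with prob. alpha
            theta, lp = prop, lp_prop
        X[m, n] = theta[0]
X = X[:, N // 2:]                                 # discard burn-in samples

# 2) Gaussian KDE of each marginal g_m(x | y_m), Eq. (18);
#    Silverman's rule-of-thumb bandwidth (an assumption, not C*).
def log_kde(s, x):
    h = 1.06 * s.std() * s.size ** (-0.2)
    k = np.exp(-0.5 * ((x[:, None] - s[None, :]) / h) ** 2)
    return np.log(k.mean(axis=1) / (h * np.sqrt(2 * np.pi)))

# 3) IS fusion: weights of Eq. (19) and the estimator of Eq. (20).
logW = np.stack([sum(log_kde(X[k], X[m]) for k in range(M) if k != m)
                 for m in range(M)])
W = np.exp(logW - logW.max())
x_bar = float(np.sum(W * X) / np.sum(W))          # fused estimate of x
```

In this identifiable setup, `x_bar` concentrates near the value of the global parameter supported by all three data blocks, while each chain alone only sees its local subset.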
Furthermore, at each run, we also compute a trivial Monte Carlo approximation of $\hat{x}$, given by

$$\bar{x}_{\text{trivial}} = \frac{1}{M}\sum_{m=1}^{M} \bar{x}_m, \qquad \bar{x}_m = \frac{1}{N}\sum_{n=1}^{N} x_m^{(n)},$$

where $\bar{x}_m$ is the Monte Carlo approximation of $\hat{x}$ obtained using only the samples of the $m$-th chain. The results are shown in Figure 3, in terms of MSE versus $N$ (the length of the chains, i.e., the number of MH iterations for each partial target). Three curves are shown, corresponding to the use of the standard IS weights (dashed line), the DM-MIS weights (solid line) and the trivial solution (dotted-dashed line). We can observe that PMIS provides good results, outperforming the trivial solution for $N > 40$. This means that, for $N \leq 40$, the samples generated by the MH methods still belong to the burn-in period and the convergence to the invariant pdf has not been reached. However, with an adequate number of iterations of the chains, PMIS provides good results. In this example, the standard IS and DM-MIS weights perform similarly (with a slight advantage for the DM-MIS weights).

Figure 3. MSE in the estimation of $\hat{x}$, obtained by the Monte Carlo approximation $\bar{x}$ of PMIS using standard IS weights (dashed line), DM-MIS weights (solid line) and the trivial weights (dotted-dashed line) for the final fusion (semilog scale representation).

VI. CONCLUSIONS

In this work, we have introduced a novel Monte Carlo scheme for obtaining an efficient distributed estimation. The new Bayesian method provides a simultaneous estimation of the local and global parameters. The estimation of the global parameters takes into account all the available statistical information. The proposed algorithm is an importance sampler that assigns weights to the samples obtained by the application of parallel MCMC methods. Each MCMC chain addresses a different partial target distribution, considering only a subset of the measurements. As a future line, we consider the possible design of an iterative implementation of the proposed scheme, where the proposal pdfs employed by the MCMC algorithms are adapted online, generating in this way an interaction among the parallel chains.
REFERENCES

[1] J. Tsitsiklis, D. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803-812, 1986.
[2] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, "Diffusion recursive least-squares for distributed estimation over adaptive networks," IEEE Transactions on Signal Processing, vol. 56, no. 5, pp. 1865-1877, 2008.
[3] F. S. Cattivelli and A. H. Sayed, "Diffusion LMS strategies for distributed estimation," IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1035-1048, 2010.
[4] J. Plata-Chaves, N. Bogdanovic, and K. Berberidis, "Distributed diffusion-based LMS for node-specific parameter estimation over adaptive networks," IEEE Transactions on Signal Processing, vol. 63, no. 13, pp. 3448-3460, 2015.
[5] N. Bogdanovic, J. Plata-Chaves, and K. Berberidis, "Distributed incremental-based LMS for node-specific adaptive parameter estimation," IEEE Transactions on Signal Processing, vol. 62, no. 20, pp. 5382-5397, 2014.
[6] S. L. Scott, A. W. Blocker, F. V. Bonassi, H. A. Chipman, E. I. George, and R. E. McCulloch, "Bayes and big data: The consensus Monte Carlo algorithm," in EFaBBayes 250th conference, 2013, vol. 16.
[7] W. Neiswanger, C. Wang, and E. Xing, "Asymptotically exact, embarrassingly parallel MCMC," arXiv:1311.4780, pp. 1-16, 21 Mar. 2014.
[8] R. Bardenet, A. Doucet, and C. Holmes, "On Markov chain Monte Carlo methods for tall data," arXiv:1505.02827, 2015.
[9] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2004.
[10] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer, 2004.
[11] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, New York (USA), 2001.
[12] L. Martino and J. Míguez, "A generalization of the adaptive rejection sampling algorithm," Statistics and Computing, vol. 21, no. 4, pp. 633-647, July 2011.
[13] W. J. Fitzgerald, "Markov chain Monte Carlo methods with applications to signal processing," Signal Processing, vol. 81, no. 1, pp. 3-18, January 2001.
[14] M. Hong, M. F. Bugallo, and P. M. Djurić, "Joint model selection and parameter estimation by population Monte Carlo simulation," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 3, pp. 526-539, 2010.
[15] P. M. Djurić, B. Shen, and M. F. Bugallo, "Population Monte Carlo methodology a la Gibbs sampling," in EUSIPCO, 2011.
[16] P. Del Moral, A. Doucet, and A. Jasra, "Sequential Monte Carlo samplers," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 3, pp. 411-436, 2006.
[17] E. Marinari and G. Parisi, "Simulated tempering: a new Monte Carlo scheme," Europhysics Letters, vol. 19, no. 6, pp. 451-458, July 1992.
[18] A. Jasra, D. A. Stephens, and C. C. Holmes, "On population-based simulation for static inference," Statistics and Computing, vol. 17, no. 3, pp. 263-279, 2007.
[19] N. Chopin, "A sequential particle filter method for static models," Biometrika, vol. 89, no. 3, pp. 539-551, 2002.
[20] L. Martino, V. Elvira, D. Luengo, and J. Corander, "Layered adaptive importance sampling," Statistics and Computing, to appear, 2016.
[21] L. Martino, V. Elvira, D. Luengo, A. Artés, and J. Corander, "Orthogonal MCMC algorithms," IEEE Statistical Signal Processing Workshop (SSP), 2014.
[22] L. Martino, V. Elvira, D. Luengo, A. Artés, and J. Corander, "Smelly parallel MCMC chains," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015.
[23] D. Luengo, L. Martino, V. Elvira, and M. Bugallo, "Efficient linear combination of partial Monte Carlo estimators," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015.
[24] J. Plata-Chaves, N. Bogdanovic, and K. Berberidis, "Distributed incremental-based RLS for node-specific parameter estimation over adaptive networks," in 21st European Signal Processing Conference (EUSIPCO 2013), 2013.
[25] J. Plata-Chaves, A. Bertrand, and M. Moonen, "Distributed signal estimation in a wireless sensor network with partially-overlapping node-specific interests or source observability," in IEEE 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.
[26] M. P. Wand and M. C. Jones, Kernel Smoothing, Chapman and Hall, 1994.
[27] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, "Efficient multiple importance sampling estimators," IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1757-1761, 2015.
[28] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, "Generalized multiple importance sampling," arXiv preprint arXiv:1511.03095, 2015.
[29] L. Martino, V. Elvira, D. Luengo, and J. Corander, "An adaptive population importance sampler: Learning from the uncertainty," IEEE Transactions on Signal Processing, vol. 63, no. 16, pp. 4422-4437, 2015.
[30] E. Veach and L. Guibas, "Optimally combining sampling techniques for Monte Carlo rendering," in SIGGRAPH 1995 Proceedings, pp. 419-428, 1995.
[31] A. Owen and Y. Zhou, "Safe and effective importance sampling," Journal of the American Statistical Association, vol. 95, no. 449, pp. 135-143, 2000.