Ergodicity in Stationary Graph Processes: A Weak Law of Large Numbers


IEEE TRANSACTIONS ON SIGNAL PROCESSING (SUBMITTED)

Ergodicity in Stationary Graph Processes: A Weak Law of Large Numbers

Fernando Gama and Alejandro Ribeiro

arXiv:1803.04550v1 [eess.SP] 12 Mar 2018

Abstract: For stationary signals in time, the weak law of large numbers (WLLN) states that ensemble and realization averages are within $\epsilon$ of each other with a probability of order $O(1/(N\epsilon^2))$ when considering $N$ signal components. The graph WLLN introduced in this paper shows that the same is essentially true for signals supported on graphs. However, the notions of stationarity, ensemble mean, and realization mean are different. Recent papers have defined graph stationary signals as those that satisfy a form of invariance with respect to graph diffusion. The ensemble mean of a graph stationary signal is not a constant but a node-varying signal whose structure depends on the spectral properties of the graph. The realization average of a graph signal is defined here as an average of successive weighted averages of local signal values with signal values of neighboring nodes. The graph WLLN shows that these two node-varying signals are within $\epsilon$ of each other with probability of order $O(1/(N\epsilon^2))$ in at least some nodes. In stationary time signals, the realization average is not only a consistent estimator of the ensemble mean but also optimal in terms of mean squared error (MSE). This is not true of graph signals. Optimal MSE graph filter designs are also presented. An example problem concerning the estimation of the mean of a Gaussian random field is presented.

Index Terms: Graph signal processing, ergodicity, law of large numbers, unbiased, consistent, optimal estimators

I. INTRODUCTION

Random signals and stochastic processes provide the foundation of statistical signal processing, which is concerned with the problem of extracting useful information out of random (noisy) data. One important concept in the field is the notion of stationarity, which can be defined on signals with regular structure such as images or signals in time [2], [3]. In particular, wide sense stationarity (WSS) of time signals models processes that have constant mean and a correlation function that only depends on the time elapsed between different signal components. A fundamental property that arises in WSS processes is ergodicity: the equivalence between ensemble and realization averages [4], [5]. This property is valuable in situations where we have access to a single realization of the process because it allows estimation of ensemble averages (a property of the process) using realization averages (a property of the given individual realization). Ergodicity results are a manifestation of the law of large numbers (LLN), as they state convergence of the sample mean obtained from averaging all the time samples of a single realization to the true mean of the process. The weak LLN (WLLN) states that this convergence holds in probability [6, Ch. 3]. The pointwise ergodic theorem proves almost sure convergence for stationary processes [7, Thm. 7.2.1].

Supported by NSF CCF 1717120 and ARO W911NF1710438. The authors are with the Dept. of Electrical and Systems Eng., Univ. of Pennsylvania, {fgama, aribeiro}@seas.upenn.edu. Part of the results in this paper appeared in [1].

The goal of this paper is to generalize the WLLN to signals supported in irregular domains described by arbitrary graphs [8], [9]. We do so by building on the theory of graph signal processing (GSP) [10]-[13] and, more specifically, on the concept of WSS graph signals [14]-[16]. GSP is built on the notion of a graph shift operator.
This is a matrix representation of the graph that defines a linear operator that can be applied to graph signals. The graph shift diffuses the signal through the graph; it represents a local operation that permits comparison of the value of the signal at a node with the values at neighboring nodes. This is a natural generalization of a time shift, which compares signal values at adjacent points in time, and, indeed, the time shift is a particular case of a graph shift. This motivates the definition of the graph Fourier transform (GFT) as a projection onto the eigenvector space of the graph shift. Since eigenvectors can be shown to represent different modes of variability, the GFT provides a natural description of (graph) frequency components. The relationship between time and graph shifts is also exploited in the definition of graph WSS signals, which are required to satisfy a form of invariance with respect to the graph shift operator [16]. As in the case of time signals, this invariance induces conditions on the first and second moments of the signal. The mean of a WSS graph signal must be aligned with an eigenvector of the graph shift operator, and the covariance matrix is diagonalized by its eigenvector basis. As a consequence of the latter, WSS graph signals have a covariance structure that is completely characterized by a power spectral density (PSD).

The question we ask in this paper is whether an analogous notion of ergodicity can be defined for WSS graph signals. We will provide an affirmative answer, but in understanding our results it is important to explain the differences between ensemble and realization averages for graph signals and their respective counterparts for time signals. E.g., when we have a Markov random field observed by a sensor network, it is possible to construct a graph linking the sensors so that the sensors' observations are WSS with respect to the graph. Fig. 5-(c) provides an illustration of the ensemble mean of the Markov random field monitored by the sensor network, Fig. 5-(a) the measurements of a single realization, and Fig. 5-(b) the graph realization mean of this set of observations. It is apparent that the ensemble mean of a graph signal is not a constant and that, consequently, the graph realization average is not a simple average of the values observed by individual nodes. Still, the similarity between Fig. 5-(b) and Fig. 5-(c) illustrates that it is possible to define an ensemble mean for graph stationary signals (Def. 2) and to define a graph realization mean (Def. 1) that is close to the true ensemble mean.

We begin the paper by introducing the GSP concepts necessary to define graph realization averages and graph WSS signals (Sec. II). To emphasize the difference between graph realization averages and simple averages, we refer to the former as graph shift averages. A graph shift average considers subsequent applications of the graph shift and averages the resulting values (Def. 1). From the perspective of a given node, this is tantamount to computing a weighted average of its own information with the information of neighbors, followed by a weighted average with neighboring weighted averages, and so on. Eventually, each node utilizes network-wide information to refine an estimate of the mean at its own location while relying on interactions with neighboring nodes only; this graph shift average is used to produce the signal in Fig. 5-(b) from the observations in Fig. 5-(a).

To elucidate ergodicity properties (Sec. III) we study the moments of graph shift averages (Sec. III-A). We show that graph shift averages are unbiased estimators of the graph ensemble mean (Prop. 1) and that they are themselves stationary with respect to the graph shift operator (Prop. 2). The latter fact allows us to show that the graph shift average behaves like a low pass filter that keeps the DC graph frequency component unchanged but attenuates all other coefficients. From these spectral properties we derive the graph weak law of large numbers, which states that the difference between the ensemble average and the graph shift average exceeds $\epsilon$ with a probability of order $O(1/(N\epsilon^2))$ for a graph with $N$ nodes (Thm. 1). This is the same property that can be established for signals in time. We observe that in the latter case it is possible to let $N$ grow to conclude convergence in probability; a result that we can recover as a particular case (Cor. 1). In the case of graph signals we cannot let $N$ grow in general, except if we assume the graph belongs to a certain class that gives proper meaning to the limit of growing $N$ [17], [18]. We do that for the case of random Erdős-Rényi graphs to exemplify how convergence in probability results can be obtained from our probability bounds for finite $N$ (Cor. 2).

For time signals the realization mean is also an optimal estimator in the sense of minimizing the mean squared error (MSE). In the case of graph signals this is not true. The MSE of graph shift averages can be further improved by adequately rescaling each subsequent application of the graph shift, i.e., through the use of a graph filter (Sec. IV). Not surprisingly, the filter that minimizes the MSE is an ideal low pass filter (Thm. 2), which we further show is also optimal in terms of minimizing the volume of the error ellipsoid (Thm. 3). Numerical experiments on the application of graph shift averages are conducted on Erdős-Rényi graphs, covariance graphs and stochastic block models to corroborate convergence to the true ensemble mean as the number of nodes increases (Sec. V). An example application consisting of the aforementioned problem of estimating the mean of a Markov random field using a single realization of the field and relying on local information exchanges is presented and compared to existing approaches [19]. We close the paper with concluding remarks (Sec. VI).

II. GRAPH SHIFT AVERAGES

Let $G = (V, E, W)$ be a connected graph with $N$ vertices $n \in V$, edges $e \in E \subseteq V \times V$, and weight function $W : E \to \mathbb{R}$.
Further define the graph shift operator as a matrix $S \in \mathbb{R}^{N \times N}$ having the same sparsity pattern as the graph $G$, so that we can have $[S]_{ij} = s_{ij} \neq 0$ only when $(j, i) \in E$. The graph shift operator is a stand-in for any of the various matrix representations associated with the graph $G$ [10]. Particular cases include the adjacency matrix $A$ with nonzero elements $[A]_{ij} = a_{ij} = W(j, i)$ for all $(j, i) \in E$, the diagonal degree matrix $D = \mathrm{diag}(A\mathbf{1})$, and the normalized adjacency $\bar{A} = D^{-1/2} A D^{-1/2}$. We assume that the graph shift operator is normal, which implies the existence of an orthonormal basis of eigenvectors $V = [v_1, \ldots, v_N] \in \mathbb{C}^{N \times N}$ satisfying $V V^H = I$ and a diagonal eigenvalue matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N) \in \mathbb{C}^{N \times N}$ such that we can write the shift as $S = V \Lambda V^H$. Adjacency matrices of undirected graphs are symmetric and therefore always normal. We further restrict attention to nonnegative operators satisfying $s_{ij} \geq 0$ for all $i, j$. For these operators the Perron-Frobenius theorem applies and we can guarantee that a unique real nonnegative eigenvalue attains maximal norm [20]. Without loss of generality we assume that $\lambda_1$ is such an eigenvalue, so that we have

$|\lambda_n| \leq \lambda_1, \quad \lambda_1 \in \mathbb{R}_+, \quad \text{for all } n.$    (1)

Although we may have eigenvalues with $|\lambda_n| = \lambda_1$, we will relax nomenclature to say that $\lambda_1$ is the largest eigenvalue of $S$. The eigenvector $v_1 = [v_{1,1}, \ldots, v_{N,1}]^T \in \mathbb{C}^N$ associated with eigenvalue $\lambda_1$ can be chosen to be real and nonnegative, and it plays an important role in the definition of graph stationary signals (Section II-A) and the ergodicity theorems in Sections III and IV. The adjacency and normalized adjacency are nonnegative shift operators, but the Laplacian $L = D - A$ does not belong to this category; see Remark 2.

Associated with the nodes of the graph we have a random signal $x = [x_1, \ldots, x_N]^T \in \mathbb{R}^N$. We use $\mu := \mathbb{E}[x]$ to denote the mean of the signal $x$, which represents an ensemble average. We use $C_x := \mathbb{E}[(x - \mu)(x - \mu)^H]$ to represent the covariance matrix of the random graph signal $x$. The signal $x$ is defined on an irregular domain, and the graph shift $S$ is a description of the underlying signal structure. The structure of the graph induces a domain-specific transform that we call the graph Fourier transform (GFT) of $x$ associated with the shift $S$ [11]. The GFT of $x$ is the signal $\tilde{x}$ obtained after projecting onto the eigenvector basis of $S$, and the inverse (i)GFT is defined as the inverse projection operation,

$\tilde{x} = V^H x \quad \Longleftrightarrow \quad x = V \tilde{x}.$    (2)

The elements $[\tilde{x}]_n = \tilde{x}_n$ of $\tilde{x}$ are known as the frequency coefficients of the signal, and the eigenvectors $v_n$ are the frequency components or frequency basis. The GFT $\tilde{x}$ is also a random signal with mean $\tilde{\mu} := \mathbb{E}[\tilde{x}] = V^H \mu$ and covariance matrix $C_{\tilde{x}} := \mathbb{E}[(\tilde{x} - \mathbb{E}[\tilde{x}])(\tilde{x} - \mathbb{E}[\tilde{x}])^H]$. To relate $C_{\tilde{x}}$ with the covariance matrix $C_x$, observe that it follows from the definition of the GFT in (2) that $C_{\tilde{x}} = V^H \mathbb{E}[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^H] V$. The expectation in this latter expression is the covariance $C_x$ of the signal in the node domain, implying that $C_{\tilde{x}} = V^H C_x V$.
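To fix ideas, here is a minimal numpy sketch (ours, not part of the paper) of the eigendecomposition and the GFT pair in (2); the toy 4-node adjacency matrix is an assumption chosen for illustration only.

```python
import numpy as np

# A toy 4-node ring adjacency (our assumption, for illustration only).
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])

# S = V Lam V^H; for a symmetric shift, eigh returns an orthonormal V.
lam, V = np.linalg.eigh(A)
order = np.argsort(lam)[::-1]          # put the Perron eigenvalue first
lam, V = lam[order], V[:, order]

x = np.array([1., 2., 3., 4.])         # a graph signal
x_tilde = V.conj().T @ x               # GFT, cf. (2)
assert np.allclose(V @ x_tilde, x)     # the inverse GFT recovers x
```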

[Figure 1: Graph shift average in discrete-time processes. A discrete-time process can be described as supported on a directed cycle graph. Application of the graph shift $S_{dc} = A_{dc}$ moves the value of the signal at one node to the next, generating the causality typical of time. By aggregating all shifted versions of the signal (i.e., computing the graph shift average) at a single node, one can construct the arithmetic mean at every node.]

Of particular importance in our future discussions is the cyclic graph $G_{dc}$ with edge set composed of all the edges of the form $(n, 1 + n \bmod N)$ for $n \in [1, N]$; see Figure 1. Let $A_{dc}$ denote the corresponding adjacency matrix with nonzero elements $[A_{dc}]_{1 + n \bmod N,\, n} = 1$. The cyclic graph is a natural description of discrete (periodic) time, and we can therefore say that a discrete time signal $x$ is a graph signal defined on the cyclic graph $G_{dc}$. Since the cycle is a circulant graph, it is diagonalized by the Fourier basis, and it follows that the GFT reduces to the conventional DFT for this particular graph.

Our interest in this paper is to define and study graph shift averages of random graph signals. To motivate the definition, recall that in the case of time signals the time average of the signal $x$ is the arithmetic mean

$\hat{\mu} = \frac{1}{N} \sum_{n=1}^{N} x_n.$    (3)

If we think of $x$ as a windowed version of a stationary stochastic process, we know that as $N$ grows, the realization average $\hat{\mu}$ converges to the ensemble average of the process $\mathbb{E}[x_n] = \mu$ [5]. For general graph signals, notice that the shift operator $S$ represents a local transformation that, when applied to the graph signal $x$, produces the signal $y = Sx$, in which the component $[y]_i = y_i = \sum_{(j,i) \in E} s_{ij} x_j$ depends on the values of $x$ at nodes $j$ that are adjacent to $i$. The locality of the shift operator motivates the following definition of graph shift averages.

Definition 1. Given a graph with shift operator $S$ and a corresponding graph signal $x$, the graph shift average of $x$ is the graph signal

$\hat{\mu}_N := \frac{1}{\alpha(S)} \sum_{l=0}^{N-1} S^l x,$    (4)

for some constant $\alpha(S) > 0$ that depends on the topology of the underlying graph support; see Proposition 1.

The graph shift average is a simple diffusion of the signal $x$ through the graph. A practically important property of diffusions is that they can be implemented in a distributed manner. Node $n$ can evaluate the $n$th component of $\hat{\mu}_N$ through communication exchanges with neighboring nodes. When $S = A_{dc}$, application of the shift operator results in the signal $y = A_{dc} x$ with components $y_{1+n \bmod N} = x_n$, implying that $y$ is a time shifted copy of $x$. Making $\alpha(S) = N$ and $S = A_{dc}$ in (4), it follows that $\hat{\mu}_N = \hat{\mu} \mathbf{1}$, i.e., the arithmetic mean (3) at every node. This justifies the use of (4) as a generalization of (3). As we will see in Section III, the use of the graph shift average as a generalization of realization averages is further justified by its relationship to the ensemble average $\mu$. This relationship will hold for graph stationary signals [14]-[16], which we introduce in the following section.
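A minimal sketch of Definition 1 (ours, not from the paper): the diffusion is implemented with repeated matrix-vector products, using the unbiased normalization $\alpha(S) = \sum_{l=0}^{N-1} \lambda_1^l$ that Proposition 1 derives below.

```python
import numpy as np

def graph_shift_average(S, x, lam1):
    """Graph shift average, cf. (4): (1/alpha(S)) * sum_{l=0}^{N-1} S^l x,
    with the unbiased choice alpha(S) = sum_{l=0}^{N-1} lam1^l derived in
    Proposition 1 below (lam1 is the largest eigenvalue of S)."""
    N = S.shape[0]
    acc = np.zeros(N)
    Slx = x.astype(float).copy()       # holds S^l x
    alpha = 0.0
    for l in range(N):                 # l = 0, ..., N-1
        acc += Slx
        alpha += lam1 ** l
        Slx = S @ Slx                  # one more local diffusion step
    return acc / alpha

# On the directed cycle (lam1 = 1) this reduces to the arithmetic mean (3).
N = 6
A_dc = np.roll(np.eye(N), 1, axis=0)   # [A_dc]_{1+n mod N, n} = 1
x = np.arange(1., N + 1.)
assert np.allclose(graph_shift_average(A_dc, x, 1.0), x.mean())
```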
A. Wide Sense Stationarity of Graph Signals

Wide sense stationarity (WSS) of graph signals is defined with respect to an associated graph shift operator $S$ and imposes certain statistical structure on the random graph signal $x$. We adopt the following definition [16].

Definition 2 (WSS Graph Signal). Let $x \in \mathbb{R}^N$ be a random graph signal supported on the $N$-node graph $G$ having associated graph shift $S \in \mathbb{R}^{N \times N}$, which we assume nonnegative and normal with eigenvalue decomposition $S = V \Lambda V^H$. The signal $x$ is a WSS graph signal with respect to the graph shift $S$ if:

(i) The expectation of $x$ is proportional to the eigenvector of the graph shift $S$ associated to its largest eigenvalue. I.e., there exists a constant $\mu_1 \in \mathbb{R}$ such that

$\mu := \mathbb{E}[x] = \mu_1 v_1.$    (5)

(ii) The covariance matrix of $x$ is diagonalized by the eigenvector basis of the shift operator. I.e., there exists a vector $p \in \mathbb{R}^N$ such that

$C_x := \mathbb{E}\left[(x - \mu)(x - \mu)^H\right] = V \mathrm{diag}(p) V^H.$    (6)

We say that $p$ is the power spectral density (PSD) of $x$.

In aligning the mean with an eigenvector of $S$, property (i) in Def. 2 implies that the direction of the mean remains unchanged after successive applications of the shift. This is an intuitively reasonable invariance that is consistent with stationarity of time signals, because the Perron-Frobenius root of $A_{dc}$ is $v_1 = (1/\sqrt{N})\mathbf{1}$. Further observe that because $\mu = \mu_1 v_1$, the expected value of the frequency representation of $x$ is $\mathbb{E}[\tilde{x}] = V^H \mu = \mu_1 e_1$, where $e_1 = [1, 0, \ldots, 0]^T$ is the first element of the canonical basis of $\mathbb{R}^N$. Property (ii) implies that the graph frequency representation of $x$ is comprised of uncorrelated components. Indeed, since we know that the spectral representation of the covariance is $C_{\tilde{x}} = V^H C_x V$ and the covariance accepts the form in (6), we conclude that

$C_{\tilde{x}} := \mathbb{E}\left[(\tilde{x} - \mathbb{E}[\tilde{x}])(\tilde{x} - \mathbb{E}[\tilde{x}])^H\right] = \mathrm{diag}(p).$    (7)

The covariance in (7) not only implies that the frequency coefficients are uncorrelated but also states that the components of the power spectral density $p = [p_1, \ldots, p_N]^T$ represent the energy of the frequency coefficients of $x$; namely, that $\mathbb{E}[(\tilde{x}_n - \mathbb{E}[\tilde{x}_n])^2] = p_n$. This justifies referring to $p$ as the PSD of $x$. When particularized to time signals, we recover the usual definition of the PSD for time stationary signals. Property (ii) can be shown to be equivalent to asking the covariance matrix $C_x$ to obey a form of invariance with respect to applications of the graph shift operator [16].

The goal of this paper is to study the relationship between the graph shift average $\hat{\mu}_N$ in (4) and the ensemble average $\mu$, which in the case of graph WSS signals satisfies (5). We will see that for graph WSS signals these two averages are close, in terms of expressions that are similar to what follows from laws of large numbers for ergodic processes.

Remark 1. The selection of the eigenvector $v_1$ to codify the direction of the mean of a stationary graph signal is justified by the unique properties of the Perron-Frobenius root. We know that $\lambda_1$ is real and that $v_1$ can be chosen to be real with nonnegative components [21]. This means $v_1$ has no zero crossings, and in that sense it makes for a reasonable choice of DC component [12]. Another important property of $v_1$ is that it corresponds to the eigenvalue with minimal variation [12]. Specifically, define the total variation of a graph signal $x$ as

$TV(x) := \left\| x - (1/\lambda_1) S x \right\|_1.$    (8)

The expression in (8) compares the components of $x$ with the neighborhood average $Ax$ normalized by $\lambda_1$, and it is therefore a proper measure of signal variability. The normalization by $\lambda_1$ ensures that shifting the signal around does not cause an unbounded growth of the signal energy [12]; recall that $\lambda_1$ is the largest eigenvalue of $S$. The eigenvector $v_1$ is the one with minimal variation, i.e., $TV(v_1) < TV(v_n)$ for all $n \neq 1$.

III. GRAPH ERGODICITY

If we restrict attention to the cyclic graph adjacency $A_{dc}$, classical results for stationary signals state that the graph shift average in (4) is unbiased, $\mathbb{E}[\hat{\mu}_N] = \mu = \mu_1 v_1$. We further know that the weak law of large numbers (WLLN) holds [22], and that we therefore have

$P\left( \left| [\hat{\mu}_N - \mu]_n \right| \geq \epsilon \right) \leq O\left( \frac{1}{N \epsilon^2} \right).$    (9)

We begin by showing that $\hat{\mu}_N$ can be made unbiased by properly defining $\alpha(S)$ (Section III-A) and follow by considering generalizations of (9) (Section III-B).

A. Moments of Graph Shift Averages

As we show in the following proposition, the expectation of the graph shift average $\hat{\mu}_N$ is aligned with $v_1$, which is itself aligned with the mean $\mu$. To make the estimator unbiased we need to select $\alpha(S)$ properly, as we also show in the following proposition.

Proposition 1. Let $x$ be a WSS graph signal on the $N$-node graph $G$ described by a normal graph shift operator $S = V \Lambda V^H$. Let $\mu$ be the ensemble average [cf. (5)] and $\hat{\mu}_N$ be the graph shift average [cf. (4)]. Then,

$\mathbb{E}[\hat{\mu}_N] = \frac{1}{\alpha(S)} \sum_{l=0}^{N-1} \lambda_1^l \, \mu.$    (10)

In particular, making $\alpha(S) = \sum_{l=0}^{N-1} \lambda_1^l$, the graph shift average is an unbiased estimator, as (10) reduces to $\mathbb{E}[\hat{\mu}_N] = \mu$.

Proof. To compute the expectation of $\hat{\mu}_N$, use the expression in (4) and the linearity of the expectation operator to write

$\mathbb{E}[\hat{\mu}_N] = \frac{1}{\alpha(S)} \sum_{l=0}^{N-1} S^l \, \mathbb{E}[x].$    (11)

Observe now that since the signal $x$ is stationary on the graph, its expectation is $\mathbb{E}[x] = \mu_1 v_1$ for some scalar $\mu_1$. Substituting this fact into (11) and reordering terms yields

$\mathbb{E}[\hat{\mu}_N] = \frac{1}{\alpha(S)} \sum_{l=0}^{N-1} S^l \mu_1 v_1 = \frac{\mu_1}{\alpha(S)} \sum_{l=0}^{N-1} S^l v_1.$    (12)

But since $v_1$ is an eigenvector of $S$ associated with eigenvalue $\lambda_1$, we have $S^l v_1 = \lambda_1^l v_1$, which reduces (12) to

$\mathbb{E}[\hat{\mu}_N] = \frac{\mu_1}{\alpha(S)} \sum_{l=0}^{N-1} \lambda_1^l \, v_1.$    (13)

Using the condition in (5) we substitute $\mu_1 v_1 = \mu$ in (13) and reorder terms to obtain (10). ∎
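As an illustration of Prop. 1, the following sketch (ours; it continues the snippet after Definition 1 and reuses graph_shift_average, while the random graph model, PSD values, and sample size are arbitrary assumptions) draws realizations of a synthetic WSS graph signal and checks unbiasedness empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
U = rng.random((N, N)) < 0.3
A = np.triu(U, 1).astype(float)
A = A + A.T                            # symmetric random adjacency
lam, V = np.linalg.eigh(A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
v1 = V[:, 0] * np.sign(V[:, 0].sum())  # leading eigenvector, nonnegative

mu1 = 3.0
p = 0.5 * np.ones(N)                   # flat PSD (assumption)
mu = mu1 * v1                          # ensemble mean, cf. (5)

def wss_sample():
    # x = mu + V diag(p)^{1/2} w with white w, so C_x = V diag(p) V^H
    return mu + V @ (np.sqrt(p) * rng.standard_normal(N))

est = np.mean([graph_shift_average(A, wss_sample(), lam[0])
               for _ in range(5000)], axis=0)
print(np.abs(est - mu).max())          # close to 0: unbiased, cf. Prop. 1
```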
Henceforth, we use $\alpha(S) = \sum_{l=0}^{N-1} \lambda_1^l$ in all subsequent analyses, so that $\hat{\mu}_N$ is an unbiased estimator of $\mu$,

$\hat{\mu}_N = \frac{1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l x.$    (14)

The term $1/\sum_{l=0}^{N-1} \lambda_1^l$ is a normalization that plays a role equivalent to the $1/N$ factor in the time average in (3). In fact, if we make $\lambda_1 = 1$, we have $1/\sum_{l=0}^{N-1} \lambda_1^l = 1/N$. This is always possible, as it just entails a normalization of the unit used to measure proximity between the nodes of the graph. We keep working with arbitrary $\lambda_1$ for generality.

Writing the componentwise version of $\mathbb{E}[\hat{\mu}_N] = \mu = \mathbb{E}[x]$, we observe that $\mathbb{E}[[\hat{\mu}_N]_k] = \mu_k = \mathbb{E}[x_k]$ for $k = 1, \ldots, N$, which means that node $k$ can use the graph shift average to estimate the mean of the local random signal $x_k$. This is of practical importance, we recall, because the diffusion process that generates (14) can be implemented in a distributed manner. This means that node $k$ can rely on communication with neighboring nodes to aggregate information from the network and refine its estimate of the mean of the signal component it has observed. Of course, the mean of $x_k$ is itself $\mathbb{E}[x_k] = \mu_k$, so the question arises of whether the graph shift average is actually a refinement over simply using $x_k$ as an estimate of $\mu_k$. To investigate this question, notice that $\hat{\mu}_N$ is a random signal that is WSS with respect to the shift operator $S$. The following proposition states this formally and computes the PSD of $\hat{\mu}_N$.

Proposition 2. The graph shift average in (14) is WSS with respect to the graph shift $S$ [cf. Def. 2]. The covariance can

be written as $C_{\hat{\mu}} = V \mathrm{diag}(q) V^H$, with the elements of the PSD $q = [q_1, \ldots, q_N]^T$ explicitly given by

$q_n = p_n \left| \sum_{l=0}^{N-1} \lambda_n^l \right|^2 \Big/ \left( \sum_{l=0}^{N-1} \lambda_1^l \right)^2.$    (15)

Proof. First, we prove that the graph shift average (14) is WSS with respect to $S$ [cf. Def. 2]. Property (i) is satisfied since $\mathbb{E}[\hat{\mu}_N] = \mu = \mu_1 v_1$, where $v_1$ is an eigenvector of $S$. To prove that (ii) holds, we compute the covariance matrix $C_{\hat{\mu}}$ and show that it is diagonalized by $V$. We start by using (14) to write

$\hat{\mu}_N - \mu = \frac{1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l x - \mu$    (16)

$= \frac{1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l (x - \mu) + \frac{1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l \mu - \mu.$    (17)

Using the fact that $\mu = \mu_1 v_1$, the second term in (17) is equivalent to

$\frac{1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l \mu_1 v_1 - \mu = \frac{\mu_1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l v_1 - \mu.$    (18)

Recalling that $S^l v_1 = \lambda_1^l v_1$, since $v_1$ is the eigenvector of $S$ associated to $\lambda_1$, and reordering terms, we further get

$\frac{\mu_1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} \lambda_1^l v_1 - \mu = \mu_1 v_1 - \mu = 0,$    (19)

where property (i) of WSS graph signals, $\mu = \mu_1 v_1$, was used once more. Canceling out the second term of (17), (16) then yields

$\hat{\mu}_N - \mu = \frac{1}{\sum_{l=0}^{N-1} \lambda_1^l} \sum_{l=0}^{N-1} S^l (x - \mu).$    (20)

Eq. (20) can be immediately used to compute the covariance matrix $C_{\hat{\mu}} = \mathbb{E}[(\hat{\mu}_N - \mu)(\hat{\mu}_N - \mu)^H]$ as follows:

$C_{\hat{\mu}} = \frac{1}{\left( \sum_{l=0}^{N-1} \lambda_1^l \right)^2} \left( \sum_{l=0}^{N-1} S^l \right) C_x \left( \sum_{l=0}^{N-1} S^l \right)^H,$    (21)

where linearity of the expectation and the fact that $C_x = \mathbb{E}[(x - \mu)(x - \mu)^H]$ were used. But $x$ is WSS on $S$ and thus $C_x = V \mathrm{diag}(p) V^H$. This fact, together with $S = V \Lambda V^H$, and reordering terms, yields

$C_{\hat{\mu}} = \frac{1}{\left( \sum_{l=0}^{N-1} \lambda_1^l \right)^2} V \left( \sum_{l=0}^{N-1} \Lambda^l \right) \mathrm{diag}(p) \left( \sum_{l=0}^{N-1} \Lambda^l \right)^H V^H.$    (22)

Since $\Lambda$ is a diagonal matrix, (22) can be written as

$C_{\hat{\mu}} = V \mathrm{diag}(q) V^H$    (23)

for a PSD vector $q \in \mathbb{R}^N$, proving that (ii) in Def. 2 holds and thus $\hat{\mu}_N$ is WSS on $S$. Furthermore, each element $q_n$ of the PSD $q$ is given by (15), completing the proof. ∎

Note that the first element of the PSD is $q_1 = p_1$, so that this frequency coefficient remains unchanged while the rest are attenuated. In other words, the graph shift average (14) acts as a low-pass graph filter that decreases the power of the signal at all frequencies except the lowest frequency coefficient, which corresponds to the mean. It is further observed that estimating the mean by simply using $x_k$ incurs a variance $\mathrm{var}(x_k) = \mathbb{E}[(x_k - \mu_k)^2]$ that is given by

$\mathbb{E}[(x_k - \mu_k)^2] = [C_x]_{k,k} = \sum_{n=1}^{N} p_n |v_{k,n}|^2,$    (24)

which depends on the power $p_n$ of all frequencies. It becomes clear, then, that if we lower the power $p_n$, the variance is reduced. Thus, the fact that $\hat{\mu}_N$ acts as a low-pass graph filter plays a key role in the convergence of the graph shift average to the ensemble average. This is further elaborated in the next section.

Remark 2. Our assumption of the graph shift being nonnegative precludes the use of the Laplacian $L = D - A$ as a graph shift. The nonnegativity assumption can be relaxed to the existence of a real maximal eigenvalue whose associated eigenvector defines the direction of the mean. This would allow the use of symmetric Laplacians, since their eigenvalues are real and nonnegative. We could then define a graph shift average as in (4), to which Propositions 1 and 2 would apply. While mathematically correct, the results are irrelevant, because when $S = L$ the mean of the process is naturally aligned with the eigenvector associated with the smallest eigenvalue, as opposed to the one associated with the largest eigenvalue. Indeed, since $L\mathbf{1} = 0$, it follows that requiring $\mathbb{E}[x] = \mu_1 \mathbf{1}$ is a natural choice for the mean of the WSS graph signal $x$. The eigenvalue associated with eigenvector $\mathbf{1}$ is 0, which is the smallest a Laplacian eigenvalue can be. This observation is consistent with the fact that the signal $Lx$ computes the difference between the value at a node and the average value of its neighbors.
The Laplacian diffusion thus acts as a high-pass operation, and it is therefore natural that it amplifies high frequency components, as opposed to adjacency diffusions, which are averaging operations and therefore amplify low frequency components.

B. Weak Law of Large Numbers

Convergence of the graph shift average (14) to the ensemble average $\mu$ of the process is determined by the following theorem.

Theorem 1 (Weak law of large numbers). Let $x$ be a WSS graph signal on an $N$-node graph $G$ characterized by a normal graph shift operator $S = V \Lambda V^H$. Let $\lambda_1 \in \mathbb{R}$ be the largest positive eigenvalue, such that $|\lambda_n| \leq \lambda_1$, $\lambda_1 \neq \lambda_n$ for all $n = 2, \ldots, N$. If $\lambda_1 > 1$ and $|\lambda_n|/\lambda_1 = o(N^{-\delta/(2(N-1))})$ for some $\delta > 0$, or if $\lambda_1 = 1$, then, provided that $p_n < \infty$ for all $n$, we have that

$\min_{k=1,\ldots,N} P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{p_1}{N \epsilon^2} + o(N^{-\delta}).$    (25)

Additionally,

$\max_{k=1,\ldots,N} P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{p_1}{\epsilon^2} + o(N^{-\delta}).$    (26)
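Before turning to the proof, here is a quick Monte Carlo sanity check (ours; all parameter values are arbitrary assumptions) of the simplest case covered by Thm. 1: the directed cycle with i.i.d. signal entries, where the graph shift average reduces to the arithmetic mean and the bound becomes $p_1/(N\epsilon^2)$ with $p_1 = \sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, sigma2, trials = 50, 0.3, 1.0, 20000
mu_k = 3.0 / np.sqrt(N)              # node mean [mu]_k = mu_1 * v_{k,1}
hits = 0
for _ in range(trials):
    x = mu_k + np.sqrt(sigma2) * rng.standard_normal(N)   # i.i.d. entries
    hits += abs(x.mean() - mu_k) > eps
# Empirical error probability versus the Chebyshev-type bound p1/(N eps^2):
print(hits / trials, sigma2 / (N * eps ** 2))
```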

Thm. 1 states that the probability of the graph shift average (14) being far from the true ensemble average $\mu$ at any node decreases polynomially as the size of the graph increases, provided that the graph spectrum satisfies certain conditions. In particular, eq. (25) states that there exists a node for which the probability of the graph shift average being arbitrarily far from the ensemble average decreases to zero as the number of nodes in the graph increases. Alternatively, we can make $\epsilon = 1/\sqrt{N}$ and get a concentration inequality by which the graph shift average gets arbitrarily close to the mean at a rate of $1/\sqrt{N}$ with constant probability. It is shown later that these conditions on the graph spectrum do hold for some practical graphs, such as the directed cycle (Cor. 1) and Erdős-Rényi graphs (Cor. 2). Eq. (25) is reminiscent of the traditional WLLN for time sequences [cf. (9)] and thus proves ergodicity of the first moment of WSS graph signals.

In order to prove Thm. 1 some preliminary results are needed. First, we compute a bound on the probability of error when estimating the mean at a single node, resulting in Lemma 1. We observe that this bound depends on the PSD $q$ of the estimator (15) and on the values of the eigenvectors at the given node. We then study the behavior of the PSD $q$ with the number of nodes and find, in Lemma 2, the conditions on the graph spectrum under which the PSD $q$ decreases with increasing $N$. Finally, we couple this result with Lemma 1 to prove ergodicity in Thm. 1. Corollaries 1 and 2 show applications of Thm. 1 to the particular cases of directed cycle and Erdős-Rényi graphs.

Lemma 1 (Error bound). Let $x$ be a WSS graph signal on an $N$-node graph $G$ described by a normal graph shift operator $S = V \Lambda V^H$. Let $\mathbb{E}[x] = \mu_1 v_1$ be the mean of the process and $C_x = V \mathrm{diag}(p) V^H$ be the covariance matrix, where $p_n < \infty$ for all $n = 1, \ldots, N$. Then, at node $k \in \{1, \ldots, N\}$ we have

$P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{1}{\epsilon^2} \sum_{n=1}^{N} q_n |v_{k,n}|^2.$    (27)

Proof. Let $e_k$ be a vector containing all zeros except for a 1 in position $k \in \{1, \ldots, N\}$. Then, we can write $[\hat{\mu}_N - \mu]_k = e_k^T (\hat{\mu}_N - \mu)$. Since $\mathbb{E}[e_k^T \hat{\mu}_N] = [\mu]_k$, we have that $\mathrm{var}(e_k^T(\hat{\mu}_N - \mu)) = e_k^T C_{\hat{\mu}} e_k = e_k^T V \mathrm{diag}(q) V^H e_k < \infty$. Noting that $e_k^T V = [v_{k,1}, \ldots, v_{k,N}]$ is the $k$th row of $V$, the variance of $e_k^T (\hat{\mu}_N - \mu)$ turns out to be

$\mathrm{var}\left( e_k^T (\hat{\mu}_N - \mu) \right) = \sum_{n=1}^{N} q_n |v_{k,n}|^2.$    (28)

Finally, Chebyshev's inequality [7, Thm. 1.6.4] is applied to obtain (27). ∎

The bound given in Lemma 1 is a Chebyshev-type bound for estimating the mean at a single node. The Chebyshev bound is the basic building block used to prove the traditional WLLN. Note that, if we let $x$ have i.i.d. random variables with variance $\sigma^2$ as elements and consider the directed cycle, then $q_n = 0$ for all $n = 2, \ldots, N$, so that $q = [q_1, 0, \ldots, 0]^T$ with $q_1 = p_1 = \sigma^2$. Then, only the value $v_{k,1}$ remains in (27), and it has a value of $v_{k,1} = 1/\sqrt{N}$; see the specifics in Corollary 1. Also, observe that $\{v_{k,n},\ n = 1, \ldots, N\}$ are the values contained in the $k$th row of $V$, as noted in the proof of the lemma. Because $V$ is a unitary matrix, its rows also form an orthonormal set. Understanding (27) from a GFT viewpoint, we note that the performance of the estimation at a single node $k$ depends on the variance of the nodes within the $(N-1)$-hop neighborhood and on all frequency coefficients (all eigenvalues) of the graph shift operator. Additionally, it is affected by the value of each frequency component (each eigenvector) at node $k$ alone.
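Both the PSD expression (15) and the bound (27) are straightforward to evaluate numerically. A sketch (ours; it continues the synthetic example above, reusing lam, V, p and N):

```python
import numpy as np

def shift_average_psd(lam, p):
    """q_n = p_n |sum_{l=0}^{N-1} lam_n^l|^2 / (sum_l lam_1^l)^2, cf. (15)."""
    powers = np.vander(lam.astype(complex), N=len(lam), increasing=True)
    sums = powers.sum(axis=1)                 # geometric sums per eigenvalue
    return p * np.abs(sums) ** 2 / np.abs(sums[0]) ** 2

def node_error_bound(V, q, k, eps):
    """Right-hand side of the Chebyshev-type bound (27) at node k."""
    return np.dot(q, np.abs(V[k, :]) ** 2) / eps ** 2

q = shift_average_psd(lam, p)
assert np.isclose(q[0], p[0]) and np.all(q[1:] <= p[1:] + 1e-9)  # low pass
bounds = [node_error_bound(V, q, k, eps=0.5) for k in range(N)]
print(min(bounds), max(bounds))               # cf. (25) and (26)
```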
Another important observation stemming from Lemma 1 is that the graph shift average (14) improves the estimation over simply looking at $x_k$, for every node $k = 1, \ldots, N$. More precisely, the variance of $x_k$ given in (24) has a form similar to the variance of $[\hat{\mu}_N]_k$ [cf. (28)]. But since $q_n \leq p_n$ for every $n$ [cf. (15)], we have that, for every $k = 1, \ldots, N$,

$\mathrm{var}\left( [\hat{\mu}_N]_k \right) \leq \mathrm{var}(x_k),$    (29)

proving that the graph shift average improves the estimation of the ensemble average at every node.

In order to prove convergence of the graph shift average, we need to analyze how $q_n$ and $v_{k,n}$ behave with the size $N$ of the graph.

Lemma 2 (Behavior of $q_n$ with the size of the graph $N$). Let $G = (V, E, W)$ be an $N$-node weighted graph that admits a normal graph shift operator $S = V \Lambda V^H$. Let $\lambda_1 \in \mathbb{R}$ be the largest positive eigenvalue, such that $|\lambda_n| \leq \lambda_1$, $\lambda_1 \neq \lambda_n$ for all $n = 2, \ldots, N$. If $\lambda_1 > 1$ and $|\lambda_n|/\lambda_1 = o(N^{-\delta/(2(N-1))})$ for some $\delta > 0$, or if $\lambda_1 = 1$, then

$q_n = o(N^{-\delta}), \quad n = 2, \ldots, N.$    (30)

For $n = 1$ we always have $q_1 = p_1$.

Proof. Let $\lambda_n = R e^{j\theta}$ for $n = 2, \ldots, N$, where $j$ denotes the imaginary unit, $j^2 = -1$. Assume first that $\lambda_1 > 1$ and, without loss of generality, that $R \neq 1$. Then, using the geometric sum in (15), we have

$q_n = p_n \left| \frac{1 - \lambda_n^N}{1 - \lambda_n} \right|^2 \left( \frac{1 - \lambda_1}{1 - \lambda_1^N} \right)^2 \leq p_n \frac{(1 - \lambda_1)^2 (1 + R^N)^2}{(1 - R)^2 (1 - \lambda_1^N)^2}.$    (31)

Now, because $\lambda_1 > 1$, we have $1 - \lambda_1 = O(\lambda_1)$ and $1 - \lambda_1^N = O(\lambda_1^N)$. Likewise, if $R > 1$, then $(1 - R)^2 = O(R^2)$ and $(1 + R^N)^2 = O(R^{2N})$, so that

$\frac{(1 - \lambda_1)^2 (1 + R^N)^2}{(1 - R)^2 (1 - \lambda_1^N)^2} = O\left( \frac{\lambda_1^2 R^{2N}}{R^2 \lambda_1^{2N}} \right) = O\left( \frac{R^{2(N-1)}}{\lambda_1^{2(N-1)}} \right).$    (32)

And, by hypothesis, $|\lambda_n|/\lambda_1 = R/\lambda_1 = o(N^{-\delta/(2(N-1))})$, so that

$\frac{(1 - \lambda_1)^2 (1 + R^N)^2}{(1 - R)^2 (1 - \lambda_1^N)^2} = O\left( \left( \frac{|\lambda_n|}{\lambda_1} \right)^{2(N-1)} \right) = o(N^{-\delta}).$    (33)

If $R < 1$, and since $\lambda_1 > 1$, then

$\frac{(1 - \lambda_1)^2 (1 + R^N)^2}{(1 - R)^2 (1 - \lambda_1^N)^2} = O\left( \frac{\lambda_1^2}{\lambda_1^{2N}} \right) = O\left( \frac{1}{\lambda_1^{2(N-1)}} \right) = o(N^{-\delta}).$    (34)

For $\lambda_1 = 1$, we have that $\sum_{l=0}^{N-1} \lambda_1^l = N$, so that

$q_n = \frac{p_n}{N^2} \left| \sum_{l=0}^{N-1} \lambda_n^l \right|^2 = \frac{p_n}{N^2} \left| \frac{1 - R^N e^{jN\theta}}{1 - \lambda_n} \right|^2.$    (35)

For $R < \lambda_1 = 1$ we have that $|1 - R^N e^{jN\theta}| = O(1)$. If $R = o(1)$, then $|1 - R e^{j\theta}|^{-1} = O(1)$ and the $1/N^2$ factor guarantees that $q_n = o(1/N)$. If $R = 1$, then $(1 - e^{jN\theta})/(1 - e^{j\theta})$ oscillates in a bounded fashion so that, again, the factor $1/N^2$ guarantees $q_n = o(1/N)$, completing the proof. ∎

Lemma 2 shows that $q_n$ is polynomially decreasing for $n = 2, \ldots, N$, under specific restrictions on the spectrum of the graph. Lemma 2 is the building block used to prove convergence of the graph shift average $\hat{\mu}_N$. We are finally equipped with all the necessary tools to prove Thm. 1.

Proof of Thm. 1. Consider (27). First, observe that because $\|v_1\| = 1$ there is always a node $k$ for which $|v_{k,1}|^2 \leq 1/N$, so that

$\min_{k=1,\ldots,N} |v_{k,1}|^2 \leq \frac{1}{N}.$    (36)

Similarly, we have that

$\max_{k=1,\ldots,N} |v_{k,1}|^2 \leq 1.$    (37)

Let $q_{\max} = \max_{n=2,\ldots,N} \{q_n\}$. Then, from (27) together with (36) we have that

$\min_{k=1,\ldots,N} P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{1}{\epsilon^2} \left( \frac{p_1}{N} + q_{\max} \right),$    (38)

where the fact that $\sum_{n=2}^{N} |v_{k,n}|^2 \leq 1$ for all $k$ was used (because the rows of $V$ also form an orthonormal basis, since $V$ is unitary). Analogously, using (27) in combination with (37) yields

$\max_{k=1,\ldots,N} P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{1}{\epsilon^2} \left( p_1 + q_{\max} \right).$    (39)

Now, because the eigenvalues of $S$ satisfy the assumptions of Lemma 2 by hypothesis, and because $q_{\max} = q_n$ for some $n = 2, \ldots, N$, we know that $q_{\max} = o(N^{-\delta})$, thus turning (38) into (25) and (39) into (26), completing the proof. ∎

To close this section we consider two examples of widespread use that satisfy the conditions in Thm. 1, namely the directed cycle and Erdős-Rényi graphs.

Directed cycle (classical WLLN). The directed cycle $G_{dc}$ represents the graph support for time-stationary signals; see Fig. 1. By applying Thm. 1 we then expect to recover the traditional WLLN.

Corollary 1 (Convergence for the Directed Cycle). Let $G_{dc}$ be the directed cycle graph. Then, for any node $k \in \{1, \ldots, N\}$ the error bound is

$P\left( \left| \frac{1}{N} \sum_{n=1}^{N} x_n - [\mu]_k \right| > \epsilon \right) \leq \frac{p_1}{N \epsilon^2}.$    (40)

Proof. Note that $\lambda_1 = 1$, so that Thm. 1 holds. More specifically, $q_n = 0$ for all $n = 2, \ldots, N$, and $v_{k,1} = 1/\sqrt{N}$ for all $k \in \{1, \ldots, N\}$, so that $\mu = \mu_1 v_1 = (\mu_1/\sqrt{N}) \mathbf{1}$ is a constant vector. Finally, $\sum_{l=0}^{N-1} S^l x = (\sum_{n=1}^{N} x_n) \mathbf{1}$, because after $N-1$ shifts the values of the signal have been aggregated at all nodes, due to the nature of the directed cycle; see Fig. 1. ∎

Corollary 1 is a statement of the WLLN for signals that are stationary in time. The result in this case is stronger than the one in (25) because it lacks the order term $o(N^{-\delta})$. This term vanishes because, in the case of a cycle graph, the $n$th component of the estimator's PSD is $q_n = 0$ for $n = 2, \ldots, N$. It is also stronger in that the minimum disappears since, after $N-1$ shifts, all nodes have aggregated the sample mean, so any node yields the same estimator and thus the same probability of error. Finally, note that if the elements of $x$ are i.i.d. random variables, then the DC component of the signal is $p_1 = \sigma^2$ and (40) is the Chebyshev bound that leads to the classical WLLN.

Erdős-Rényi (ER) graphs. Another family of graphs that satisfies Thm. 1 is that of ER graphs. These graphs have the particularity that the largest eigenvalue grows linearly with the number of nodes, whereas the rest of the eigenvalues have a growth rate that does not exceed $\sqrt{N}$. This means that the graph is well suited for estimation, since the PSD of the graph shift average concentrates on the largest eigenvalue, which corresponds to the mean of the process. This shows in the proof of the following corollary.

Corollary 2 (WLLN for Erdős-Rényi graphs). Let $G_{ER}$ be an ER graph of size $N$ with edge probability $p_{ER} \geq \beta$ for some constant $\beta > 0$. Then, for any node $k \in \{1, \ldots, N\}$ and any $0 < \delta < 1$ we have that

$P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{p_1}{N \epsilon^2} + o(N^{-\delta}).$    (41)

Proof.
First, note that $\lambda_1 = N p_{ER} + o(N)$ and that, by the semicircle law, with probability $1 - o(1)$ all eigenvalues except the largest one lie in the interval $(-2c\sqrt{N}, 2c\sqrt{N})$ for any $c > \sqrt{p_{ER}(1 - p_{ER})}$ [23], [24]. Then, we have $|\lambda_2| \leq 2c\sqrt{N}$, so that $|\lambda_2|/\lambda_1 = o(N^{-\delta/(2(N-1))})$ for any $0 < \delta < 1$, satisfying Thm. 1. Additionally, because $v_{k,1}^2 = 1/N + o(N^{-(1/2+r)})$, $0 < r < 1/2$, with probability $1 - o(1)$, see [25], any node $k \in \{1, \ldots, N\}$ yields a similar probability of error. ∎

Corollary 2 states that the estimator obtained at any node is arbitrarily close to the ensemble average at that node, with a convergence rate that is polynomial in the size of the graph.

Remark 3 (Infinite diffusions on a fixed size graph). Alternatively, one could think of considering a graph of fixed size $N$ and the unbiased estimator $\hat{\mu}_L = (\sum_{l=0}^{L-1} \lambda_1^l)^{-1} \sum_{l=0}^{L-1} S^l x$, in which the signal is diffused $L - 1$ times, with $L$ independent of $N$. We observe that, if $\lambda_1 > 1$ and $|\lambda_n| < \lambda_1$, or if $\lambda_1 = 1$, then $q_n \to 0$ as $L \to \infty$ for $n = 2, \ldots, N$, and $q_1 = p_1$. This implies that the bound in Lemma 1 hits a fundamental limit given by $\lim_{L \to \infty} P(|[\hat{\mu}_L - \mu]_k| > \epsilon) \leq q_1 v_{k,1}^2 / \epsilon^2$. Since $v_{k,1}$ depends on the graph topology, and is fixed for fixed $N$, the bound cannot be shown to decrease any further, even if the signal is diffused an infinite number of times. The latter situation is typically known as the consensus problem, and we note that the result obtained in Thm. 1 is fundamentally

different, since it deals with convergence of the graph shift average (14) as the size of the graph gets larger.

Remark 4 (Power method). The power method is an iterative algorithm for computing the eigenvector associated to the largest eigenvalue of a matrix [26, Sec. 10.3]. The procedure consists of repeatedly applying the given matrix to an initial vector, so that the result, after proper normalization, eventually converges to the eigenvector: $(1/\|S^l x\|) S^l x \to v_1$ as $l \to \infty$. If we view the graph shift average (14) in product form [27], then it consists of repeatedly applying a collection of different matrices to the available realization. We observe two main differences. First, we apply a different collection of matrices depending on the number of nodes and the diffusion steps, and these matrices might not all be the same as in the power method; this is necessary to estimate the mean instead of just the eigenvector. Second, the rate of convergence of the power method depends on $\lambda_2/\lambda_1 = o(N^{-\delta/(2(N-1))})$, whereas for (14) we have $o(N^{-\delta})$, which is faster.

IV. OPTIMAL MEAN ESTIMATION WITH GRAPH FILTERS

In cases where the graph spectrum does not satisfy the conditions of Lemma 2, so that Thm. 1 cannot be applied, we analyze the convergence of the graph shift average (14) in the following lemma.

Lemma 3 (Non-convergent graphs). Let $G = (V, E, W)$ be a weighted $N$-node graph that admits a normal graph shift operator $S = V \Lambda V^H$. Let $\lambda_1 \in \mathbb{R}$ be the largest positive eigenvalue, such that $|\lambda_n| \leq \lambda_1$, $\lambda_n \neq \lambda_1$ for all $n = 2, \ldots, N$. Let $M$ be the set of indices $m$ such that $|\lambda_m|/\lambda_1$ does not satisfy $o(N^{-\delta/(2(N-1))})$ for any $\delta > 0$. If $\lambda_1 > 1$ and $M$ is nonempty, or if $\lambda_1 < 1$, then for any node $k \in \{1, \ldots, N\}$ it holds that

$P\left( \left| [\hat{\mu}_N - \mu]_k \right| > \epsilon \right) \leq \frac{p_1}{\epsilon^2} |v_{k,1}|^2 + o(1) + \frac{1}{\epsilon^2} \sum_{m \in M} p_m \left| \frac{\lambda_m}{\lambda_1} \right|^{2(N-1)} |v_{k,m}|^2 \left( 1 + o(1) \right).$    (42)

If $\lambda_1 < 1$, then $M = \{2, \ldots, N\}$.

Proof. In analogy with the proof of Lemma 2, we prove that under the condition $\lambda_1 < 1$, for which $M = \{2, \ldots, N\}$, or for the case when $\lambda_1 > 1$ and $M$ is nonempty, it holds for $m \in M$ that

$q_m = p_m \left| \frac{\lambda_m}{\lambda_1} \right|^{2(N-1)} (1 + o(1)).$    (43)

First, let $m \in M$ and $\lambda_m = R e^{j\theta}$, with $\lambda_1 > 1$ and $j^2 = -1$. Then, since $R/\lambda_1$ does not decrease any faster than $N^{-\delta/(2(N-1))}$, we have

$\left| \frac{1 - \lambda_m}{1 - \lambda_1} \right|^2 = \frac{1 - 2R\cos(\theta) + R^2}{1 - 2\lambda_1 + \lambda_1^2} = \left| \frac{\lambda_m}{\lambda_1} \right|^2 (1 + o(1)).$    (44)

For $\lambda_1 < 1$ we have that, since $R \leq \lambda_1$, both $R^N = o(1)$ and $\lambda_1^N = o(1)$, so that $|1 - R^N e^{jN\theta}| = 1 + o(1)$ and $1 - \lambda_1^N = 1 + o(1)$, completing the proof. ∎

For the case in which $\lambda_1 < 1$ we reach a fundamental limit below which it is not possible to achieve a better estimation. This situation occurs because on each successive step of the diffusion process the information harnessed from neighboring nodes is smaller and smaller (because all the eigenvalues are less than 1), eventually making it impossible to accurately estimate the mean. Alternatively, when $\lambda_1 > 1$, if $|M| = o(N)$ then the graph shift average (14) is still consistent, since at most finitely many values of $q_m$ do not follow the rate $o(N^{-\delta})$. When $|M| = O(N)$, no assertions about the convergence rate of the graph shift average can be made.

Alternatively, convergence rates can be tuned by the use of graph filters. Linear shift-invariant (LSI) graph filters are linear transformations that can be applied to the signal in a local fashion, operating only with the values of the signal in the neighborhood of each node [27]. More precisely, let $\{h_l\}_{l=0}^{L-1}$ be a set of $L$ filter taps; then an LSI graph filter is the matrix $H = \sum_{l=0}^{L-1} h_l S^l$, where $S^0 = I$. Note that the output signal

$y = Hx = \sum_{l=0}^{L-1} h_l (S^l x)$    (45)

can be computed by accessing the values on the nodes in the $(L-1)$-hop neighborhood at most. For each hop $l$, the resulting shifted value at the node is further weighted by filter tap $h_l$, $l = 0, \ldots, L-1$.
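To make (45) concrete, here is a minimal sketch (ours; it reuses A, V, lam, N and rng from the synthetic example above, and the taps h are arbitrary) that applies an LSI filter using local shifts only and checks the frequency-domain relation anticipated in (46) and (47) below.

```python
import numpy as np

def apply_graph_filter(S, x, h):
    """y = sum_l h_l S^l x, cf. (45), computed with local shifts only."""
    y = np.zeros(len(x))
    Slx = x.astype(float).copy()
    for hl in h:
        y += hl * Slx
        Slx = S @ Slx
    return y

def filter_gft(lam, h):
    """h_tilde = Psi h with [Psi]_{n,l} = lam_n^l, cf. (46) below."""
    Psi = np.vander(lam.astype(complex), N=len(h), increasing=True)
    return Psi @ h

h = np.array([1.0, 0.5, 0.25])               # L = 3 arbitrary taps
x = rng.standard_normal(N)
y = apply_graph_filter(A, x, h)
# Convolution theorem for graphs, cf. (47): y_tilde = h_tilde (Hadamard) x_tilde
assert np.allclose(V.conj().T @ y, filter_gft(lam, h) * (V.conj().T @ x))
```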
The effect of LSI graph filters on the signal can also be analyzed by projecting the output $y$ onto the frequency basis, $\tilde{y} = V^H y$. In order to do this, first define the GFT of the graph filter, $\tilde{h} \in \mathbb{C}^N$, as follows [12]:

$\tilde{h} = \Psi h = \begin{bmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{L-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{L-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_N & \cdots & \lambda_N^{L-1} \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{L-1} \end{bmatrix}.$    (46)

The matrix $\Psi \in \mathbb{C}^{N \times L}$ is a Vandermonde matrix that acts as the linear transformation computing the frequency coefficients $\tilde{h} \in \mathbb{C}^N$ of the graph filter from the filter taps given in $h \in \mathbb{R}^L$. Observe that, unlike for temporal signals and filters, the GFTs of graph signals and of graph filters are computed differently: the former depends on the eigenvectors of the graph shift operator, whereas the latter depends only on the eigenvalues. Denoting by $\circ$ the elementwise (Hadamard) product of two vectors, we can then obtain the frequency coefficients of the output signal directly from the frequency coefficients of the filter and the signal as follows:

$\tilde{y} = \mathrm{diag}(\tilde{h}) \tilde{x} = \tilde{h} \circ \tilde{x}.$    (47)

Note that (47) is analogous to the convolution theorem for temporal signals [28, Sec. 2.9.6]. Graph filters are useful for shaping graph signals and their frequency coefficients by means of local linear operations only. As shown in Lemma 3, the graph frequency coefficients $q_n$ of $\hat{\mu}_N$ determine its convergence. Therefore, by carefully designing the filter taps $\{h_l\}_{l=0}^{L-1}$ we can obtain a graph signal with specific frequency characteristics that aid the convergence of the estimator. More precisely, we propose an optimal design of an LSI graph filter to be applied to the single available realization of the graph signal that not only improves

the performance (by minimizing the mean squared error and the volume of the error ellipsoid) but is also shown to converge for any graph; see Theorems 2 and 3. First, we need to restrict the possible filter taps to those yielding an unbiased estimator.

Proposition 3. Let $x$ be a WSS graph signal on an $N$-node graph $G$ described by a normal graph shift operator $S = V \Lambda V^H$. Let $\mathbb{E}[x] = \mu = \mu_1 v_1$, where $v_1$ is the eigenvector associated to $\lambda_1$, the largest positive eigenvalue, such that $|\lambda_n| \leq \lambda_1$, $\lambda_n \neq \lambda_1$ for all $n = 2, \ldots, N$. Let $\{h_l\}_{l=0}^{N-1}$ be a set of filter taps, at least one of which is nonzero. Let $y$ be the output of processing $x$ through the graph filter with taps given by $h = [h_0, \ldots, h_{N-1}]^T$; that is, $y = \sum_{l=0}^{N-1} h_l S^l x$. Then, the estimator

$z_N = \frac{y}{\sum_{l=0}^{N-1} h_l \lambda_1^l} = \frac{1}{\sum_{l=0}^{N-1} h_l \lambda_1^l} \sum_{l=0}^{N-1} h_l S^l x$    (48)

is unbiased for any choice of $\{h_l\}$.

Proof. The expectation of (48) is computed as follows. First, use the fact that the expectation is a linear operator:

$\mathbb{E}[z_N] = \frac{1}{\sum_{l=0}^{N-1} h_l \lambda_1^l} \sum_{l=0}^{N-1} h_l S^l \, \mathbb{E}[x].$    (49)

Now, by the definition of WSS graph signals (Def. 2), it holds that $\mathbb{E}[x] = \mu_1 v_1$, where $v_1$ is the eigenvector associated to eigenvalue $\lambda_1$, so that $S^l v_1 = \lambda_1^l v_1$. Then,

$\mathbb{E}[z_N] = \frac{\mu_1}{\sum_{l=0}^{N-1} h_l \lambda_1^l} \sum_{l=0}^{N-1} h_l S^l v_1 = \frac{\mu_1}{\sum_{l=0}^{N-1} h_l \lambda_1^l} \sum_{l=0}^{N-1} h_l \lambda_1^l \, v_1.$    (50)

Finally, noting that $v_1$ does not depend on the index of the summation, and taking it out of the sum, we observe that numerator and denominator are the same, and thus

$\mathbb{E}[z_N] = \mu_1 \frac{\sum_{l=0}^{N-1} h_l \lambda_1^l}{\sum_{l=0}^{N-1} h_l \lambda_1^l} v_1 = \mu_1 v_1 = \mu,$    (51)

which completes the proof. ∎

Prop. 3 determines how to obtain an unbiased estimator from filtering a WSS graph signal with an arbitrary LSI graph filter. The filtered output $z_N$ is itself a WSS graph signal, with covariance matrix as specified in the following proposition.

Proposition 4. The output $z_N$ of the LSI graph filter (48) is WSS with respect to the graph shift $S$ [cf. Def. 2]. The covariance can be written as $C_z = V \mathrm{diag}(r) V^H$, with the elements of the PSD $r = [r_1, \ldots, r_N]^T$ explicitly given by

$r_n = p_n \left| \frac{\sum_{l=0}^{N-1} h_l \lambda_n^l}{\sum_{l=0}^{N-1} h_l \lambda_1^l} \right|^2 = p_n \frac{|\tilde{h}_n|^2}{|\tilde{h}_1|^2}.$    (52)

Proof. The unbiased LSI graph filter (48) has GFT coefficients given by $\tilde{h}_n / \tilde{h}_1$ [cf. (46)]. Then, from [16, Property 2] it is obtained that each element $r_n$ of the PSD of the output of an LSI graph filter is equal to the squared magnitude of the corresponding frequency coefficient of the filter, $|\tilde{h}_n / \tilde{h}_1|^2$, multiplied by the corresponding PSD coefficient $p_n$ of the input. This yields (52). The expression for the covariance matrix $C_z$ readily follows; cf. Sec. II. ∎

First, it is noted that if $h_l = 1$ for all $l = 0, \ldots, N-1$, then (52) boils down to (15). Also, it is observed that $r_1 = p_1$, while the rest of the frequencies are rescaled proportionally to the amplitude of the specific frequency coefficient $|\tilde{h}_n|$. It is finally noted that any LSI graph filter, when used as an unbiased estimator (48), leaves the first frequency coefficient unchanged, which is consistent with the notion of the mean being associated with the first frequency coefficient, and also with the intuition of trying to estimate this specific frequency coefficient (and hence leaving it unmodified).

Now that we have a systematic way to construct unbiased estimators from arbitrary LSI graph filters, we can determine optimality criteria to obtain the best performing estimator. For instance, consider the estimator that minimizes the MSE, which is given by

$\mathrm{tr}[C_z] = \sum_{n=1}^{N} p_n \frac{|\tilde{h}_n|^2}{|\tilde{h}_1|^2}.$    (53)

The design that minimizes the MSE is given by $\tilde{h}_n = 0$ for $n = 2, \ldots, N$, as shown in the following theorem.

Theorem 2. Let $x$ be a WSS graph signal on an $N$-node graph $G$ described by a normal graph shift operator $S = V \Lambda V^H$. Let $\lambda_1$ be the largest positive eigenvalue, such that $|\lambda_n| \leq \lambda_1$, $\lambda_n \neq \lambda_1$ for all $n = 2, \ldots, N$. Let $\{h_l\}_{l=0}^{N-1}$ be a set of filter taps and $\{\tilde{h}_n\}_{n=1}^{N}$ the frequency coefficients of the filter. Then, the filter that minimizes the MSE (53) is given by

$\tilde{h}_n = 0, \quad n = 2, \ldots, N,$    (54)

and any $\tilde{h}_1 \neq 0$.

Proof. Start by taking the derivative of (53) with respect to each frequency coefficient and setting it to zero:

$\frac{\partial \, \mathrm{tr}[C_z]}{\partial \tilde{h}_1} = -\frac{2}{\tilde{h}_1^3} \sum_{n=2}^{N} p_n \tilde{h}_n^2 = 0;$    (55)

$\frac{\partial \, \mathrm{tr}[C_z]}{\partial \tilde{h}_n} = 2 p_n \frac{\tilde{h}_n}{\tilde{h}_1^2} = 0, \quad n = 2, \ldots, N.$    (56)

Note that by setting $\tilde{h}_1 \neq 0$ and $\tilde{h}_n = 0$ for all $n = 2, \ldots, N$, both necessary and sufficient conditions are satisfied, and thus these are the optimal frequency coefficients of the filter. ∎

Thm. 2 is in agreement with intuition: the optimal design is the one that leaves the mean frequency component unfiltered while suppressing the rest of the frequencies (i.e., an ideal low-pass graph filter), yielding an output that contains only the desired value. Interestingly enough, a similar solution also minimizes the volume of the error ellipsoid given by the covariance matrix, subject to an energy constraint.

Theorem 3. Under the conditions of Thm. 2, the frequency coefficients of the filter taps that solve

$\underset{\tilde{h}_1, \ldots, \tilde{h}_N}{\text{minimize}} \quad \log(\det(C_z))$    (57)
Let {h l } be a set of filter taps and { h n } n= the frequency coefficients of the filter. Then, the filter that minimizes the MSE (53) is given by and any h 0. h n = 0, n =,..., (54) Proof. Start by taking the derivative of (53) with respect to each frequency coefficient and set it to zero tr[c z ] = h h n p n = 0; (55) h h h n= tr[c z ] hn = p n = 0, n =,...,. (56) h n h ote that by setting h 0 and h n = 0 for all n =,..., both necessary and sufficient conditions are satisfied, and thus, these are the optimal frequency coefficients of the filter. Thm. is in agreement with intuition: the optimal design is the one that lets the mean frequency component unfiltered while suppressing the rest of the frequencies (i.e. an ideal low-pass graph filter), yielding an output that contains only the desired value. Interestingly enough, a similar solution also minimizes the volume of the ellipsoid given by the covariance matrix subject to an energy constraint. Theorem 3. Under the conditions of Thm., the frequency coefficients of the filter taps that solve minimize log(det(c z )) (57) h,..., h subject to h ν max (58)

are also given by (54), with $|\tilde{h}_1|^2 = \nu_{\max}$ for some $\nu_{\max} > 0$ that determines the maximum energy in the GFT of the filter taps.

Proof. Recall that the determinant is the product of the eigenvalues, so that

$\log(\det(C_z)) = \log\left( \prod_{n=1}^{N} r_n \right) = \log\left( \prod_{n=1}^{N} p_n \frac{|\tilde{h}_n|^2}{|\tilde{h}_1|^2} \right).$    (59)

This, in turn, can be rewritten as

$\log(\det(C_z)) = \sum_{n=1}^{N} \left[ \log(p_n) + \log(|\tilde{h}_n|^2) \right] - N \log(|\tilde{h}_1|^2).$    (60)

Note that $\log(|\tilde{h}_n|^2)$ is minimized when $\tilde{h}_n = 0$. Also, $\log(|\tilde{h}_1|^2)$ is a monotone increasing function, so that $-N\log(|\tilde{h}_1|^2)$ is minimized for the largest possible value of $|\tilde{h}_1|^2$. Then, by setting $\tilde{h}_n = 0$ for all $n = 2, \ldots, N$ and $|\tilde{h}_1|^2 = \nu_{\max}$, so that the constraint is satisfied with the largest possible value of $|\tilde{h}_1|^2$, we effectively minimize the objective function, thus completing the proof. ∎

[Figure 2: Erdős-Rényi graph. Full lines correspond to the mean bound and mean probability of error over 50 graph realizations; dashed lines correspond to the maximum and minimum values obtained across realizations. (a) Estimated probability of error and theoretical bound for ER graphs of varying size $N$ from 10 to 100, for the graph shift average $\hat{\mu}_N$ (14). (b) The same, for the optimal estimator $z_N$ (48)-(54). (c) Comparison of the mean probability of error and mean theoretical bound for both estimators.]

V. NUMERICAL EXPERIMENTS

In this section we consider numerical experiments to illustrate the effect of the graph shift average (14), the error bound (27) and the optimal estimator (48)-(54). In Sections V-A, V-B and V-C we use these estimators in the context of Erdős-Rényi graphs, covariance graphs and stochastic block models, respectively. In Section V-D we deploy the estimator in the context of distributed estimation in sensor networks, in which we want to estimate the mean of a Gaussian-Markov random field (GMRF). In this last section we compare the performance of the unbiased diffusion estimator with the distributed LMS estimator proposed in [19].

Unless otherwise specified, we consider an $N$-node graph $G$ described by a graph shift operator given by the adjacency matrix $S = A = V \Lambda V^H \in \mathbb{R}^{N \times N}$, which is normal since the adjacency matrix is symmetric (undirected graph). We consider a single realization $x$ of a WSS graph signal with mean $\mu = \mu_1 v_1$, proportional to the eigenvector associated to the largest eigenvalue. The covariance matrix is determined by the PSD $p \in \mathbb{R}^N$. We define the signal-to-noise ratio as $\mathrm{SNR} = 10 \log_{10}(\mu_1^2 / p_1)$ (in dB). For each graph size, 50 different graphs are generated. For each one of these graphs, $10^5$ different signal realizations are simulated, and the error probability for some $\epsilon$ is estimated from these; see (27). Additionally, for each graph realization, we aggregate the results at a node $k$ determined as the node with the largest $v_{k,1}$ such that $v_{k,1} < 1/\sqrt{N}$. Results presented include those obtained averaging across all graph realizations (full lines), as well as the maximum and minimum results (dashed lines).

A. Erdős-Rényi Graphs

In this first example, we consider an $N$-node Erdős-Rényi (ER) graph where each edge is drawn with probability $p_{ER} = 0.2$, independently of all other edges [29]. Only realizations of this graph that are connected are considered. We set $\mu_1 = 3$ and $\mathrm{SNR} = 10$ dB.
For computing the bound (27) we consider $\epsilon = 0.1 \cdot 10^{-\mathrm{SNR}/20}$. The PSD is given by $p_n = p_1 \, 10^{\,1 - 5(n-2)/(N-2)}$ for $n = 2, \ldots, N$, i.e., $p$ consists of points logarithmically spaced between $10\,p_1$ and $10^{-4}\,p_1$. In the experiment, we vary $N$ from 10 to 100 and also simulate the optimal estimator (48). In Figs. 2a and 2b we show the error probability and the bound as functions of $N$ for the graph shift average and the optimal estimator, respectively. We observe that both decrease as the size of the graph grows larger, as expected from the convergence result of Thm. 1. In Fig. 2c we compare the results for both estimators. Note that, because ER graphs satisfy Thm. 1 (see Cor. 2), the graph shift average indeed converges to the optimal estimator.

B. Covariance Graphs

As a second example, we consider covariance graphs of size $N$. That is, we create a random covariance matrix $\Sigma$ of size $N$, and then generate $10^5$ training samples of a zero-mean Gaussian random vector with covariance matrix $\Sigma$. We use these training samples to estimate the covariance matrix and set this estimate $\hat{\Sigma}$ as the graph shift operator, $S = \hat{\Sigma}$; a construction sketch is given below. Then, we generate a WSS graph signal over this graph with mean given by $\mu = 3 v_1$ and PSD given by $p = p_1 \mathbf{1}$.
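As an illustration of this setup, the following self-contained snippet (ours; the random construction of $\Sigma$, the sizes, and the seed are assumptions) estimates a covariance from training samples, uses it as the shift, and applies the spectral form of the ideal low-pass design of Thms. 2 and 3, $z = v_1 v_1^H x$, which keeps only the first frequency coefficient of one WSS realization.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, p1 = 20, 10 ** 5, 1.0
B = rng.standard_normal((N, N))
Sigma = B @ B.T / N + np.eye(N)              # a random valid covariance
train = rng.multivariate_normal(np.zeros(N), Sigma, size=T)
S = train.T @ train / T                      # sample covariance as shift S

lam, V = np.linalg.eigh(S)                   # symmetric, hence normal
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
v1 = V[:, 0]                                 # leading eigenvector

mu = 3.0 * v1                                # mean mu = 3 v1, as above
x = mu + V @ (np.sqrt(p1) * rng.standard_normal(N))   # flat PSD p = p1*1
z = v1 * (v1 @ x)                            # ideal low-pass estimate, Thm. 2
print(np.abs(z - mu).max())                  # unbiased: v1^H mu = mu_1
```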

Figure 3. Covariance graph. The full lines correspond to the mean bound and mean probability of error for 50 graph realizations. The dashed lines correspond to the maximum and minimum values obtained in some realization. (a)-(b) Estimated probability of error and theoretical bound for covariance graphs of varying size from 10 to 100 for the graph shift average and the optimal estimator, respectively. (c) Comparison of the mean probability of error and mean theoretical bound for both the graph shift average and the optimal estimator.

Figure 4. Stochastic block model. The full lines correspond to the mean bound and mean probability of error for 50 graph realizations. The dashed lines correspond to the maximum and minimum values obtained in some realization. (a)-(b) Estimated probability of error and theoretical bound for stochastic block models of varying size from 10 to 100 for the graph shift average and the optimal estimator, respectively. (c) Comparison of the mean probability of error and mean theoretical bound for both the graph shift average and the optimal estimator.

For the simulation, we vary $N$ from 10 to 100. Estimated error probabilities and bounds are shown in Figs. 3a and 3b. Both the bound and the estimated error probability decrease with $N$. Fig. 3c compares the graph shift average with the optimal estimator. Since a covariance graph does not necessarily satisfy the conditions of Thm. 1, the optimal estimator performs noticeably better here: while the graph shift average still exhibits a decreasing error probability, the optimal estimator achieves up to 3 orders of magnitude lower error for $N = 60$.

C. Stochastic Block Models

In the third example, we study the performance of the proposed estimators for a stochastic block model (SBM) [30]. A stochastic block model of $N$ nodes with $C$ communities $\{\mathcal{C}_\alpha,\ \alpha = 1, \ldots, C\}$, with $\mathcal{C}_\alpha \cap \mathcal{C}_\beta = \emptyset$ for $\alpha \neq \beta$ and $\cup_{\alpha=1}^{C}\mathcal{C}_\alpha = \mathcal{V}$, is constructed in such a way that edges within the same community $\mathcal{C}_\alpha$ are drawn independently with probability $p_\alpha$, and edges between nodes belonging to different communities $\mathcal{C}_\alpha$ and $\mathcal{C}_\beta$, $\alpha \neq \beta$, are drawn independently with probability $p_{\alpha,\beta}$. We consider $C = 4$, $p_\alpha = 0.6$ and $p_{\alpha,\beta} = 0.1$ for all $\alpha, \beta = 1, \ldots, C$, $\alpha \neq \beta$. When $N$ is not divisible by $C$, we assign the remaining nodes to the communities, one to each, until the graph has $N$ nodes. The WSS graph signal is considered to have PSD $p_n = p_1 10^{-3(n-1)/(N-1)+1}$, as before. We run simulations for graph sizes varying from 10 to 100. Figs. 4a and 4b show the estimated probability of error and the theoretical bound as a function of $N$; note that both decrease as $N$ increases. Finally, Fig. 4c compares the error probability of the graph shift average with that of the optimal estimator. It is observed that for $N > 10$ both estimators yield the same result.
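For reference, the SBM construction used in this experiment can be sketched as follows (the helper name is ours; remainder nodes are spread one per community, as described above):

```python
import numpy as np

def sbm_adjacency(N, C, p_in, p_out, rng):
    """Symmetric SBM adjacency: p_in within a community, p_out across communities."""
    base, rem = divmod(N, C)
    sizes = [base + (1 if c < rem else 0) for c in range(C)]  # remainder: one extra node per community
    labels = np.repeat(np.arange(C), sizes)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((N, N)) < probs, k=1)
    return (upper | upper.T).astype(float)

rng = np.random.default_rng(2)
A = sbm_adjacency(N=50, C=4, p_in=0.6, p_out=0.1, rng=rng)
```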
We conjecture that, since the SBM is a combination of ER graphs, its eigenvalues satisfy the conditions of Thm. 1, and thus the graph shift average is optimal.

D. Gaussian-Markov Random Fields

As a final example, we consider the problem of estimating the mean of a Gaussian-Markov random field (GMRF) [31]. This problem arises, in particular, in sensor networks in which measurements are correlated according to the distance between sensors. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{W})$ be the $N$-sensor network; see Fig. 6a for an example. We consider $N$ sensors deployed at random over a field. The influence between sensors $i$ and $j$ is described by the function $\rho(i,j) = \alpha e^{-\beta\|\mathbf{u}_i - \mathbf{u}_j\|}$, where $\mathbf{u}_i$ and $\mathbf{u}_j$ are the $\mathbb{R}^2$ vectors describing the positions of sensors $i$ and $j$, $i \neq j$, $i, j = 1, \ldots, N$. The constants $\alpha$ and $\beta$ are chosen such that $\max_{i,j}\rho(i,j) = \rho_{\max}$ and $\min_{i,j}\rho(i,j) = \rho_{\min}$, and we set $\rho(i,i) = 0$. An edge between two sensors is drawn whenever the influence function exceeds a certain threshold, i.e., $(i,j) \in \mathcal{E}$ if and only if $\rho(i,j) \geq \rho_{\mathrm{thres}}$. The weight function $\mathcal{W}$ is given by the influence function, $\mathcal{W} = \rho$; note that we consider graphs without self-loops, so that $(i,i) \notin \mathcal{E}$. A WSS graph signal is a GMRF if it has a Gaussian distribution with covariance matrix $\mathbf{C}_x = a_0(\mathbf{I} - a\mathbf{S})^{-1}\big[(\mathbf{I} - a\mathbf{S})^{-1}\big]^H$ [6].
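A minimal sketch of the sensor-graph construction and the GMRF covariance just defined is given below. The unit-square deployment and the values of $\alpha$, $\beta$, $N$ and the threshold multiplier are illustrative assumptions, and $a$ is chosen here simply so that $\mathbf{I} - a\mathbf{S}$ is safely invertible.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
pos = rng.random((N, 2))            # sensor positions u_i, uniform over the unit square

# Influence function rho(i, j) = alpha * exp(-beta * ||u_i - u_j||), zero on the diagonal.
alpha, beta = 1.0, 2.0              # illustrative; the text fixes them through rho_min and rho_max
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
rho = alpha * np.exp(-beta * dist)
np.fill_diagonal(rho, 0.0)

# Weighted graph: keep an edge, with weight rho(i, j), wherever rho exceeds the threshold.
rho_bar = rho[rho > 0].mean()       # average influence over all pairs i != j
W = np.where(rho >= 1.75 * rho_bar, rho, 0.0)

# GMRF covariance C_x = a0 (I - a S)^{-1} [(I - a S)^{-1}]^H with S = W.
lam1 = np.linalg.eigvalsh(W).max()
a, a0 = 1.0 / (2.0 * lam1), 1.0     # a < 1/lam1 keeps I - aW invertible
M = np.linalg.inv(np.eye(N) - a * W)
C_x = a0 * (M @ M.T)
```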

Figure 5. Estimating the mean of a Gaussian-Markov random field (GMRF). (a) A collection of $N = 1000$ sensors distributed uniformly take measurements of a GMRF. (b) The result of computing the graph shift average $\hat{\boldsymbol{\mu}}$ (4). (c) The true mean field of the GMRF. It is observed that the graph shift average yields an accurate estimate of the true mean field.

Figure 6. Gaussian-Markov random field estimation. (a) Example of an $N = 50$ sensor network. Sensors are located at random. A weight function $\rho(i,j) = \alpha e^{-\beta\|\mathbf{u}_i - \mathbf{u}_j\|}$ is computed between nodes, and a weighted edge between nodes $i$ and $j$ is drawn if $\rho(i,j) > \rho_{\mathrm{thres}}$ (the weight of the edge is $\rho(i,j)$). (b) Estimated MSE of the graph shift average and of the ATC-GMRF diffusion LMS estimator for sensor networks of varying size.

As an illustrative simulation, consider Fig. 5. In this scenario, $N = 1000$ sensors are distributed uniformly at random over the area, each of which takes one measurement; the measurements are shown in Fig. 5a. This realization of the GMRF process is then diffused through the graph to compute the graph shift average $\hat{\boldsymbol{\mu}}$ as in (4). The resulting value at each sensor is shown in Fig. 5b. We can see that the output of the graph shift average is an accurate estimate of the true mean field of the GMRF process, which is shown in Fig. 5c.

For the remaining simulations, reported in Fig. 6, we set the mean of $\mathbf{x}$ to $\boldsymbol{\mu} = \mu_1\mathbf{v}_1$ with $\mu_1 = 3$. For building the graph we consider $\rho_{\min} = 0.01$, $\rho_{\max} = 1$ and $\rho_{\mathrm{thres}} = 1.75\,\bar{\rho}$, where $\bar{\rho}$ is the average of all the elements in $\{\rho(i,j),\ i \neq j,\ i, j = 1, \ldots, N\}$. We set $a = 1/(2\lambda_1)$ and choose $a_0$ so that $\mathrm{SNR} = 10$ dB. For each value of $N$ we simulate 50 different sensor networks, and for each network we simulate $10^3$ realizations of the WSS graph signal to compute the estimated MSE. In our simulations we compare the MSE resulting from using the graph shift average (4) with the MSE obtained from estimating the mean according to the ATC-GMRF diffusion LMS algorithm introduced in [9]. Fig. 6b shows the estimated MSE as a function of the graph size $N$. We observe that the graph shift average (4) performs better than the ATC-GMRF diffusion LMS. It is worth pointing out, however, that the graph shift average is designed to work on stationary processes, whereas the ATC-GMRF diffusion LMS algorithm also works for nonstationary GMRFs.

VI. CONCLUSIONS

In this paper we set out to expand the field of statistical graph signal processing by developing a first notion of ergodicity. More precisely, we computed the realization average as a graph shift average: a diffusion of a single realization through the graph. We proved, in a result reminiscent of the WLLN, that this graph shift average converges, under some mild conditions on the graph, to the ensemble mean.
For graphs that do not satisfy the conditions of the WLLN, we proposed an LSI graph filter that, when applied to a single realization of the WSS graph signal, yields an unbiased estimator that converges to the ensemble mean on any graph;