Graph Signal Processing: Fundamentals and Applications to Diffusion Processes

1 Graph Signal Processing: Fundamentals and Applications to Diffusion Processes. Antonio G. Marques (King Juan Carlos University), Alejandro Ribeiro (University of Pennsylvania), Santiago Segarra (Massachusetts Institute of Technology). Thanks: Gonzalo Mateos, Geert Leus, and Weiyu Huang. ICASSP'17, New Orleans, USA, March 5-9, 2017.

3 Network Science analytics: online social media, the Internet, clean energy and grid analytics. Desiderata: process, analyze and learn from network data [Kolaczyk 09]. Network as graph G = (V, E, W): encodes pairwise relationships. Interest here is not in G itself, but in data associated with the nodes in V. The object of study is a graph signal x. Networks are common and interesting. So are graph signals?

4 Network of economic sectors of the United States. Source: Bureau of Economic Analysis of the U.S. Department of Commerce. E = output of sector i is an input to sector j (62 sectors in V). [Figure: network with clusters for Oil and Gas, Services, and Finance.] Legend: Oil extraction (OG), Petroleum and coal products (PC), Construction (CO), Administrative services (AS), Professional services (MP), Credit intermediation (FR), Securities (SC), Real estate (RA), Insurance (IC). Only interactions stronger than a threshold are shown.

5 Network of economic sectors of the United States. A few sectors have widespread strong influence (services, finance, energy); some sectors have strong indirect influences (oil); the heavy last row is final consumption. This is an interesting network, and signals on this graph are as well.

7 Disaggregated GDP of the United States. Signal x = output per sector = disaggregated GDP. The network structure can be used to, e.g., reduce GDP estimation noise. The signal is as interesting as the network itself; arguably more. The same is true for brain connectivity and fMRI brain signals, gene regulatory networks and gene expression levels, online social networks and information cascades, alignment of customer preferences and product ratings, ...

8 Graph signal processing. Graph SP: broaden classical SP to graph signals [Shuman et al. 13]. Our view: GSP is well suited to study network (diffusion) processes. Assumption: signal properties are related to the topology of G (locality, smoothness). Goal: algorithms that fruitfully leverage this relational structure. Q: Why do we expect the graph structure to be useful in processing x?

9 Importance of signal structure in time. Signal and Information Processing is about exploiting signal structure. Discrete time is described by a cyclic graph: time n follows time n-1, and the signal value x_n is similar to x_{n-1}. Formalized with the notion of frequency. Cyclic structure => Fourier transform: x̃ = F^H x, with F_{kn} = (1/√N) e^{j(2π/N)kn}. Fourier transform => projection on the eigenvector space of the cycle.

11 Covariances and principal components. Random signal with mean E[x] = 0 and covariance C_x = E[xx^H]; eigenvector decomposition C_x = V Λ V^H. The covariance matrix C_x is a graph. Not a very good graph, but still. The precision matrix C_x^{-1} is a common graph too: conditional dependencies of a Gaussian x. Covariance matrix structure => principal components (PCA): x̃ = V^H x. PCA transform => projection on the eigenvector space of the (inverse) covariance. GSP: extend these principles to general graphs and corresponding (graph) signals.

12 Outline. Motivation and preliminaries. Part I: Fundamentals: Graphs 101; graph signals and the shift operator; Graph Fourier Transform (GFT); graph filters and network processes. Part II: Applications: sampling graph signals; stationarity of graph processes; network topology inference. Concluding remarks.

13 Graphs 101. Formally, a graph G (or a network) is a triplet (V, E, W). V = {1, 2, ..., N} is a finite set of N nodes or vertices. E ⊆ V × V is a set of edges defined as ordered pairs (n, m). Write N(n) = {m ∈ V : (m, n) ∈ E} for the in-neighbors of n. W : E → R is a map from the set of edges to scalar values w_nm, representing the level of relationship from n to m. Often the weights are strictly positive, W : E → R_{++}. Unweighted graphs: w_nm ∈ {0, 1} for all (n, m) ∈ E. Undirected graphs: (n, m) ∈ E if and only if (m, n) ∈ E, and w_nm = w_mn for all (n, m) ∈ E.

14 Time, images, and covariances. Time: unweighted and directed graphs. V = {0, 1, ..., 23}, E = {(0, 1), (1, 2), ..., (22, 23), (23, 0)}, W : (n, m) ↦ 1 for all (n, m) ∈ E. Images: unweighted and undirected graphs. V = {1, 2, 3, ..., 9}, E = {(1, 2), (2, 3), ..., (8, 9), (1, 4), ..., (6, 9)}, W : (n, m) ↦ 1 for all (n, m) ∈ E. Covariances: weighted and undirected graphs. V = {1, 2, 3, 4}, E = {(1, 1), (1, 2), ..., (4, 4)} = V × V, W : (n, m) ↦ σ_nm = σ_mn for all (n, m).

15 Adjacency matrix. Algebraic graph theory: matrices associated with a graph G, chiefly the adjacency A and Laplacian L matrices. Spectral graph theory: properties of G from the spectrum of A or L. Given G = (V, E, W), the adjacency matrix A ∈ R^{N×N} is A_nm = w_nm if (n, m) ∈ E, and A_nm = 0 otherwise. A matrix representation incorporating all information about G. For unweighted graphs, positive entries represent connected pairs; for weighted graphs, they also denote proximities between pairs.

16 Degree and k-hop neighbors. If G is unweighted and undirected, the degree of node i is |N(i)|. In directed graphs, nodes have an out-degree and an in-degree. Using the adjacency matrix in the undirected case: for node i, deg(i) = Σ_{j∈N(i)} A_ij = Σ_j A_ij; for all N nodes, d = A1. Degree matrix: D := diag(d). The k-th power of A describes the k-hop neighborhoods of each node: [A^k]_ij is non-zero only if there exists a path of length k from i to j. Support of A^k: pairs that can be reached in k hops.

17 Laplacian of a graph. Given an undirected G with A and D, the Laplacian matrix L ∈ R^{N×N} is L = D − A. Equivalently, L can be defined element-wise as L_ij = deg(i) if i = j; L_ij = −w_ij if (i, j) ∈ E; and L_ij = 0 otherwise. Normalized Laplacian: L = D^{−1/2} L D^{−1/2} (we will focus on L).
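
A minimal numerical sketch of these definitions, on a toy 4-node weighted undirected graph (the graph and weights are made up for illustration, not taken from the slides; numpy assumed):

```python
# Toy illustration: degree matrix, Laplacian, and normalized Laplacian.
import numpy as np

A = np.array([[0., 1., 0., 2.],
              [1., 0., 3., 0.],
              [0., 3., 0., 1.],
              [2., 0., 1., 0.]])   # symmetric adjacency, w_nm = w_mn

d = A.sum(axis=1)                  # degree vector d = A 1
D = np.diag(d)                     # degree matrix D = diag(d)
L = D - A                          # combinatorial Laplacian L = D - A

# normalized Laplacian D^{-1/2} L D^{-1/2}
L_norm = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)

assert np.allclose(L @ np.ones(4), 0.)   # rows of L sum to zero
```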

18 Spectral properties of the Laplacian. Denote by λ_i and v_i the eigenvalues and eigenvectors of L. L is positive semi-definite: x^T L x = (1/2) Σ_{(i,j)∈E} w_ij (x_i − x_j)² ≥ 0 for all x. All eigenvalues are nonnegative, i.e., λ_i ≥ 0 for all i. The constant vector 1 is an eigenvector of L with eigenvalue 0: [L1]_i = Σ_{j∈N(i)} w_ij (1 − 1) = 0. Thus, λ_1 = 0 and v_1 = (1/√N) 1. In connected graphs, λ_i > 0 for i = 2, ..., N. Multiplicity of {λ = 0} = number of connected components.
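
These properties are easy to verify numerically; continuing the toy graph above (a sketch, not slide code):

```python
# Eigenvalues of L are nonnegative; lambda_1 = 0 with a constant eigenvector,
# and the multiplicity of 0 equals the number of connected components (1 here).
lam, V = np.linalg.eigh(L)            # eigh applies since L is symmetric
assert np.all(lam >= -1e-9)           # positive semi-definiteness
assert np.isclose(lam[0], 0.)         # lambda_1 = 0
assert np.allclose(np.abs(V[:, 0]), 1/np.sqrt(4))   # v_1 = (1/sqrt(N)) 1, up to sign
```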

19 Graph signals and the shift operator

20 Graph signals. Consider a graph G = (V, E, W). Graph signals are mappings x : V → R, defined on the vertices of the graph (data tied to nodes). Ex: opinion profile, buffer congestion levels, neural activity, epidemic. May be represented as a vector x ∈ R^N, where x_n denotes the signal value at the n-th vertex in V. This implies an ordering of the vertices (the same as in A or L). Data associated with the links of G? Use the line graph of G.

21 Graph signals: genetic profiles. Graphs represent gene-gene interactions: each node denotes a single gene (loosely speaking), and two genes are connected if their coded proteins participate in the same metabolism. The genetic profile of each patient can be considered a graph signal: the signal on each node is 1 if the gene is mutated and 0 otherwise. [Figure: profiles x_1 and x_2 of two patients with subtype 1.] To understand a graph signal, the structure of G must be considered.

22 Graph-shift operator. To understand and analyze x, it is useful to account for the structure of G. Associated with G is the graph-shift operator S ∈ R^{N×N}: S_ij = 0 for i ≠ j and (i, j) ∉ E (captures the local structure of G). S can take nonzero values on the edges of G or on its diagonal. Ex: adjacency A, degree D, and Laplacian L = D − A matrices.

23 Relevance of the graph-shift operator. Q: Why is S called a shift? A: Resemblance to time shifts. S will be the building block for GSP algorithms (more soon). The same is true in the time domain (filters and delay).

24 Local structure of the graph-shift operator. S represents a linear transformation that can be computed locally at the nodes of the graph. More rigorously, if y = Sx, then node i can compute y_i provided it has access to x_j at its neighbors j ∈ N(i). Straightforward because [S]_ij ≠ 0 only if i = j or (j, i) ∈ E. What if y = S²x? As with powers of A, neighborhoods grow: y_i is found using values within 2 hops.

25 Graph Fourier Transform (GFT)

26 Discrete Fourier Transform (DFT). Let x be a temporal signal; its DFT is x̃ = F^H x, with F_kn = (1/√N) e^{+j(2π/N)kn}. An equivalent description that provides insights. Oftentimes more parsimonious (bandlimited). Facilitates the design of SP algorithms, e.g., filters. Many other transformations (orthogonal dictionaries) exist. Q: What transformation is suitable for graph signals?

27 Graph Fourier Transform (GFT). A useful transformation when S is involved in the generation/description of x. Let S = V Λ V^{−1} be the shift associated with G. The Graph Fourier Transform (GFT) of x is defined as x̃ = V^{−1} x, while the inverse GFT (iGFT) of x̃ is defined as x = V x̃. The eigenvectors V = [v_1, ..., v_N] are the frequency basis (atoms). Additional structure: if S is normal, then V^{−1} = V^H and x̃_k = v_k^H x = ⟨v_k, x⟩; Parseval holds, ‖x‖² = ‖x̃‖². GFT => projection on the eigenvector space of the shift operator S.
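
A minimal sketch of the GFT and iGFT, using the toy Laplacian above as the shift (so S is normal and V^{−1} = V^T):

```python
# GFT / iGFT with S = L from the toy example.
lam, V = np.linalg.eigh(L)            # S = V Lambda V^{-1}; V is orthonormal here
x = np.random.randn(4)                # an arbitrary graph signal
x_tilde = V.T @ x                     # GFT: x_tilde = V^{-1} x = V^H x (normal S)
x_back = V @ x_tilde                  # iGFT: x = V x_tilde
assert np.allclose(x, x_back)
# Parseval for a normal shift: ||x|| = ||x_tilde||
assert np.isclose(np.linalg.norm(x), np.linalg.norm(x_tilde))
```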

28 Is this a reasonable transform? Particularized to cyclic graphs: GFT => Fourier transform. Particularized to covariance matrices: GFT => PCA transform. But really, this is an empirical question. The GFT of disaggregated GDP is characterized by a few coefficients. Notion of bandlimitedness: x = Σ_{k=1}^K x̃_k v_k. Enables sampling, compression, filtering, pattern recognition.

29 Meaning of the eigenvalues. The columns of V are the frequency atoms: x = Σ_k x̃_k v_k. Q: What about the eigenvalues λ_k = Λ_kk? When S = A_dc (directed cycle), we get λ_k = e^{−j(2π/N)(k−1)}, so the λ_k can be viewed as frequencies! In time there is a well-defined relation between frequency and variation: higher k => faster oscillations, with bounds on the total variation TV(x) = Σ_n (x_n − x_{n−1})². Q: Does this carry over to graph signals? No, only if S = L. The {λ_k}_{k=1}^N will be very important when analyzing graph filters.

30 Interpretation for the Laplacian. Consider a graph G, let x be a signal on G, and set S = L. Define the quadratic form x^T L x = (1/2) Σ_{(i,j)∈E} w_ij (x_i − x_j)². x^T L x quantifies the (aggregated) local variation of the signal x: a natural measure of signal smoothness w.r.t. G. Q: Interpretation of the frequencies {λ_k}_{k=1}^N when S = L? If x = v_k, we get x^T L x = λ_k => the local variation of v_k. Frequencies account for local variation, so they can be ordered; the eigenvector associated with eigenvalue 0 is constant.
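
The two expressions for the local variation can be checked numerically; a sketch on the toy graph above:

```python
# x^T L x equals (1/2) sum_{(i,j)} w_ij (x_i - x_j)^2; each undirected edge
# appears twice in the double sum, hence the 1/2 factor.
def local_variation(A, x):
    return 0.5 * np.sum(A * (x[:, None] - x[None, :])**2)

x = np.random.randn(4)
assert np.isclose(x @ L @ x, local_variation(A, x))
```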

31 Frequencies of the Laplacian. The Laplacian eigenvalue λ_k accounts for the local variation of v_k. Let us plot some of the eigenvectors of L (they are graph signals too). Ex: gene network, N = 10; eigenvectors for k = 1, k = 2, k = 9. Ex: smooth natural images, N = 2^16; eigenvectors for k = 2, ..., 6.

32 Application: cancer subtype classification. Patients diagnosed with the same disease exhibit different behaviors. Each patient has a genetic profile describing gene mutations. It would be beneficial to infer phenotypes from genotypes: targeted treatments, more suitable suggestions, etc. Traditional approaches consider different genes to be independent. Not ideal, as different genes may affect the same metabolism. Alternative: consider the genetic network, so that genetic profiles become graph signals on it. We will see how this consideration improves subtype classification.

33 Genetic network. Undirected and unweighted gene-to-gene interaction graph. The 2458 nodes are genes in human DNA related to breast cancer. An edge between two genes represents an interaction: their coded proteins participate in the same metabolic process. [Figure: adjacency matrix of the gene-interaction network.]

34 Genetic profiles. Genetic profiles of 240 women with breast cancer: 44 with the serous subtype and 196 with the endometrioid subtype. Patient i has an associated profile x_i ∈ {0, 1}^2458. Mutations are very varied across patients; some patients present a lot of mutations, and some genes are consistently mutated across patients. Q: Can we use genetic profiles to classify patients across subtypes?

35 Improving k-nearest-neighbor classification. Distance between genetic profiles: d(i, j) = ‖x_i − x_j‖_2. N-fold cross-validation error from k-NN classification for several values of k (error percentages shown in the original slide). Q: Can we do any better using graph signal processing? Each genetic profile x_i is a graph signal on the genetic network. Look at the frequency components x̃_i using the GFT (S = L). [Figure: an example signal x_i per gene, and its frequency representation x̃_i per frequency.]

36 Discriminative power. Define the discriminative power of frequency v_k as DP(v_k) = | Σ_{i:y_i=1} x̃_i(k) / Σ_i 1{y_i = 1} − Σ_{i:y_i=2} x̃_i(k) / Σ_i 1{y_i = 2} | / Σ_i |x̃_i(k)|: the normalized difference between the mean GFT coefficients for v_k among patients with serous and endometrioid subtypes. The discriminative power is not equal across frequencies. [Figure: distinguishing power per frequency.] The discriminative power defined here is one of many proper heuristics.

37 Retaining the most discriminative frequencies. Keep the information in the frequencies with higher discriminative power. Filter, i.e., multiply x̃_i by diag(h̃_p), where [h̃_p]_k = 1 if DP(v_k) ≥ p-th percentile of DP, and 0 otherwise. Perform the iGFT to get the graph signal x̂_i, and classify with k-NN. [Figure: error percentage for the original signal, one frequency, p = 60, p = 75, p = 90; for k = 3, 5, 7.]
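
A sketch of this masking step (function and variable names are ours, not from the tutorial); dp holds precomputed DP values, one per frequency:

```python
# Keep only GFT coefficients whose discriminative power reaches the p-th
# percentile, then map back to the vertex domain with the iGFT.
def mask_frequencies(x, V, dp, p):
    x_tilde = V.T @ x                       # GFT (S = L, so V^{-1} = V^T)
    keep = (dp >= np.percentile(dp, p))     # diag(h_p) as a 0/1 mask
    return V @ (keep * x_tilde)             # iGFT of the masked coefficients
```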

38 Graph filters and network processes

39 Linear (shift-invariant) graph filter. A graph filter H : R^N → R^N is a map between graph signals. Focus on linear filters => the map is represented by an N × N matrix. Def. 1: polynomial in S of degree L, with coefficients h = [h_0, ..., h_L]^T: H := h_0 S^0 + h_1 S^1 + ... + h_L S^L = Σ_{l=0}^L h_l S^l [Sandryhaila13]. Def. 2: orthogonal operator in the frequency domain: H := V diag(h̃) V^{−1}, with h̃_k = g(λ_k). With [Ψ]_{k,l} := λ_k^{l−1}, we have h̃ = Ψh. The two defs can be rendered equivalent. More on this later; for now, focus on Def. 1.

40 Graph filters as linear network operators. Def. 1 says H = Σ_{l=0}^L h_l S^l. Suppose H acts on a graph signal x to generate y = Hx. If we define x^(l) := S^l x = S x^(l−1), then y = Σ_{l=0}^L h_l x^(l): y is a linear combination of successive shifted versions of x. After introducing S, we stressed that y = Sx can be computed locally; x^(l) can be found locally if x^(l−1) is known, so the output of the filter can be found in L local steps. A graph filter represents a linear transformation that accounts for the local structure of the graph, can be implemented distributedly in L steps, and only requires info in an L-hop neighborhood [Shuman13, Sandryhaila14].

41 An example of a graph filter. x = [1, 2, 0, 0, 0, 0]^T, h = [1, 1, 0.5]^T, y = (Σ_{l=0}^L h_l S^l) x = Σ_{l=0}^L h_l x^(l). [Figure: successive shifts of x on the example graph.]
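
A sketch of the local (space) implementation of such an FIR filter, run on the toy 4-node graph from before with S = A (the 6-node example graph of the slide is not reproduced here):

```python
# y = sum_l h_l S^l x via successive local shifts x^(l) = S x^(l-1).
def graph_filter(S, h, x):
    y = np.zeros_like(x)
    x_l = x.copy()                 # x^(0) = x
    for h_l in h:
        y += h_l * x_l             # accumulate h_l x^(l)
        x_l = S @ x_l              # one local exchange: x^(l+1) = S x^(l)
    return y

y = graph_filter(A, [1., 1., 0.5], np.array([1., 2., 0., 0.]))
```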

43 Frequency response of a graph filter. Def. 2 says H = V diag(h̃) V^{−1}. Since S = V Λ V^{−1}, Def. 1 H = Σ_{l=0}^L h_l S^l yields H = Σ_{l=0}^L h_l V Λ^l V^{−1} = V (Σ_{l=0}^L h_l Λ^l) V^{−1}. The application Hx of filter H to x can be split into three parts: V^{−1} takes the signal x to the graph frequency domain, x̃; H̃ := Σ_{l=0}^L h_l Λ^l modulates the frequency coefficients x̃ to obtain ỹ; V brings the signal ỹ back to the graph domain, y. Since H̃ is diagonal, define H̃ =: diag(h̃); h̃ is the frequency response of the filter H. The output at frequency k depends only on the input at frequency k: ỹ_k = h̃_k x̃_k.

45 Frequency response and filter coefficients. Relation between h and h̃ in a friendlier manner? Since h̃ = diag(Σ_{l=0}^L h_l Λ^l), we have h̃_k = Σ_{l=0}^L h_l λ_k^l. Define the N × (L+1) Vandermonde matrix Ψ with entries [Ψ]_{k,l} = λ_k^{l−1}, i.e., rows [1, λ_k, ..., λ_k^L]. Frequency response of a graph filter: if h are the coefficients of a graph filter, its frequency response is h̃ = Ψh. Given a desired h̃, we can find the coefficients as h = Ψ^{−1} h̃. Since Ψ is Vandermonde, it is invertible as long as λ_k ≠ λ_{k'} for k ≠ k'.
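
A sketch of this design step: pick a desired response h̃ (here an assumed exponential low-pass shape) and solve the Vandermonde system h̃ = Ψh on the toy graph, whose Laplacian eigenvalues are distinct:

```python
# Design FIR coefficients from a desired frequency response.
lam, V = np.linalg.eigh(L)
Lf = 3                                            # filter degree
Psi = np.vander(lam, Lf + 1, increasing=True)     # Psi[k, l] = lam_k^l
h_des = np.exp(-lam)                              # desired response (assumed)
h = np.linalg.solve(Psi, h_des)                   # h = Psi^{-1} h_des (here N = L+1)
H = V @ np.diag(Psi @ h) @ V.T                    # the resulting filter matrix
assert np.allclose(np.sort(np.linalg.eigvalsh(H)), np.sort(h_des))
```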

46 More on the frequency response. Since h = Ψ^{−1} h̃: if all {λ_k}_{k=1}^N are distinct, then any h̃ can be implemented with at most L + 1 = N coefficients. Since h̃ = Ψh: if λ_k = λ_{k'}, then the corresponding frequency responses coincide, h̃_k = h̃_{k'}. For the particular case S = A_dc, we have λ_k = e^{−j(2π/N)(k−1)}, so [Ψ]_{k,l} = e^{−j(2π/N)(k−1)(l−1)} and Ψ = F^H: the frequency response is the DFT of the impulse response, h̃ = F^H h.

47 Frequency response for graph signals and filters. Suppose that we have a signal x and filter coefficients h. For time signals, the output y satisfies ỹ = diag(F^H h) F^H x. For graph signals, the output y in the frequency domain is ỹ = diag(Ψh) V^{−1} x. The GFT for filters (Ψ) is different from the GFT for signals (V^{−1}). Symmetry is lost, but both depend on the spectrum of S. Some classical properties no longer hold for graphs, and several options exist to generalize operations.

48 Deltas and system identification. In time, given the canonical delta δ_1 = e_1 = [1, 0, ..., 0]^T: any other delta e_i is a shifted version of e_1; its Fourier transform is constant, ẽ_1 = 1 (all frequencies are excited); the output to e_1 characterizes the filter (in fact, the output to e_1 is h). In graph SP none of the above is true! Shifting is not translation: e_j is not a shifted version of e_i. The GFT ẽ_i = V^{−1} e_i reflects how strongly node i expresses each frequency; ẽ_i is not an eigenvector, ẽ_i ≠ 1, and entries of ẽ_i can be zero. System id? Let y_i be the filter output when x = e_i; then h = Ψ^{−1} diag^{−1}(V^{−1} e_i) V^{−1} y_i = Ψ^{−1} diag^{−1}(ẽ_i) V^{−1} y_i. More involved; works if all [ẽ_i]_k are non-zero (see [Segarra16] for blind identification).

50 Implementing graph filters: frequency or space. Frequency or space? y = V diag(h̃) V^{−1} x vs. y = Σ_{l=0}^L h_l S^l x. In space: leverage the fact that Sx can be computed locally. The signal x is percolated L times to find {x^(l)}_{l=0}^L, and every node finds its own y_i by computing Σ_{l=0}^L h_l [x^(l)]_i. The frequency implementation is useful for processing if, e.g., the filter is bandlimited and the eigenvectors are easy to find => low complexity [Anis16, Tremblay16]. The space definition is useful for modeling: diffusion, percolation, opinion formation, ... (more on this soon). More on filter design: Chebyshev polynomials [Shuman12]; ARMA [Isufi-Leus15]; node-variant [Segarra15]; time-variant [Isufi-Leus16]; median filters [Segarra16].

52 Linear network processes via graph filters. Consider linear dynamics of the form x_t − x_{t−1} = −αJ x_{t−1} => x_t = (I − αJ) x_{t−1}. If x is a network process, [x_t]_i depends only on [x_{t−1}]_j for j ∈ N(i) => [S]_ij = [J]_ij, so x_t = (I − αS) x_{t−1} => x_t = (I − αS)^t x_0. Hence x_t = H x_0, with H a polynomial of S => a linear graph filter. If the system has memory => the output is a weighted sum of previous exchanges (opinion dynamics) => still a polynomial of S: y = Σ_{t=0}^T β^t x_t => y = Σ_{t=0}^T (βI − βαS)^t x_0. Everything holds true if α_t or β_t are time varying.
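
A sketch of this equivalence on the toy graph, with S = A and an assumed step size α:

```python
# x_T = (I - alpha*S)^T x_0: the T-step dynamics is a degree-T polynomial of S.
alpha, T = 0.05, 5
x0 = np.random.randn(4)
x = x0.copy()
for _ in range(T):
    x = x - alpha * (A @ x)        # each step uses only neighbor values
H_T = np.linalg.matrix_power(np.eye(4) - alpha * A, T)
assert np.allclose(x, H_T @ x0)    # the dynamics is a graph filter
```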

54 Diffusion dynamics and AR (IIR) filters. Before: finite-time dynamics (FIR filters). Consider now the diffusion dynamics x_t = αS x_{t−1} + w => x_t = α^t S^t x_0 + Σ_{t'=0}^{t−1} α^{t'} S^{t'} w. When t → ∞: x = (I − αS)^{−1} w => an AR graph filter. Higher orders [Isufi-Leus16]: M successive diffusion dynamics => AR of order M, x = Π_{m=1}^M (I − α_m S)^{−1} w; the sum of M parallel diffusions => ARMA of order M, x = Σ_{m=1}^M (I − α_m S)^{−1} w.
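
The steady state can be checked by iterating the recursion; a sketch with α chosen so that the spectral radius of αS is below one, ensuring convergence:

```python
# Diffusion x_t = alpha*S x_{t-1} + w converges to x = (I - alpha*S)^{-1} w.
w = np.random.randn(4)
alpha = 0.9 / np.max(np.abs(np.linalg.eigvalsh(A)))   # spectral radius of alpha*A = 0.9
x = np.zeros(4)
for _ in range(200):
    x = alpha * (A @ x) + w
assert np.allclose(x, np.linalg.solve(np.eye(4) - alpha * A, w))
```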

55 General linear network processes. Combinations of all the previous are possible: x_t = H_t^a(S) x_{t−1} + H_t^b(S) w => x_t = H_t^A(S) x_0 + H_t^B(S) w, with y = x_t; sequential/parallel application; linear combinations. This expands the range of processes that can be modeled via GSP; the coefficients can change according to some control inputs. A number of linear processes can be modeled using graph filters, so theoretical GSP results can be applied to distributed networking: deconvolution, filtering, system id, ... Beyond linearity is possible too (more at the end of the talk). Links with control theory (of networks and complex systems): controllability, observability.

56 Graph filters: summary. Linear graph-signal operators that account for the structure of G: polynomials of S, orthogonal operators in the frequency domain. A few take-home messages: the output is a weighted sum of shifted inputs; shifting is not a translation, but a local diffusion => distributed implementation across L-hop neighborhoods; the GFT for filters is given by the Vandermonde matrix Ψ, different from V^{−1}; system identification is more involved. Graph filter design: designing h, designing h̃, and ... designing S? Useful to process signals, but also to model networked phenomena.

57 Application: explaining human learning rates. Why do some people learn faster than others? Can we answer this by looking at their brain activity? Brain activity during learning of a motor skill in 112 cortical regions: fMRI while learning a piano pattern, for 20 individuals. The pattern is repeated, reducing the time needed for execution; learning rate = rate of decrease in execution time. Define a functional brain graph based on correlated activity. fMRI outputs a series of graph signals x(t) ∈ R^112 describing brain states. Does brain state variability correlate with learning?

58 Measuring brain state variability. We propose three different measures capturing different time scales: changes at micro, meso, and macro scales. Micro: instantaneous changes higher than a threshold, m_1(x) = Σ_{t=1}^T 1{ ‖x(t) − x(t−1)‖_2 / ‖x(t)‖_2 > α }. Meso: cluster brain states and count the changes in clusters, m_2(x) = Σ_{t=1}^T 1{ c(t) ≠ c(t−1) }, where c(t) is the cluster to which x(t) belongs. Macro: sample entropy, a measure of the complexity of a time series, m_3(x) = log( Σ_t Σ_{s≠t} 1{‖x_3(t) − x_3(s)‖ > α} / Σ_t Σ_{s≠t} 1{‖x_2(t) − x_2(s)‖ > α} ), where x_r(t) = [x(t), x(t+1), ..., x(t+r−1)].

59 Diffusion as low-pass filtering. We diffuse each time signal x(t) across the brain graph: x_diff(t) = (I + βL)^{−1} x(t), with Laplacian L = V Λ V^{−1} and β the diffusion rate. Analyzing the diffusion in the frequency domain: x̃_diff(t) = (I + βΛ)^{−1} V^{−1} x(t) = diag(h̃) x̃(t). Frequency response: h̃_k = 1/(1 + βλ_k). Diffusion acts as low-pass filtering: high-frequency components are attenuated, and β controls the level of attenuation. [Figure: frequency response vs. frequency.]
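
A sketch of this low-pass diffusion on the toy Laplacian, verifying the frequency-domain view (β is an assumed diffusion rate):

```python
# x_diff = (I + beta*L)^{-1} x equals V diag(1/(1+beta*lam_k)) V^T x.
beta = 1.0
lam, V = np.linalg.eigh(L)
x = np.random.randn(4)
x_diff = np.linalg.solve(np.eye(4) + beta * L, x)
h_tilde = 1. / (1. + beta * lam)               # low-pass frequency response
assert np.allclose(x_diff, V @ (h_tilde * (V.T @ x)))
```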

60 Computing correlation for three signals. The variability measures consider the order of brain signal activity. As a control, we include in our analysis a null time series x_null, with x_null(t) = x_diff(π_t), where π_t is a random permutation of the time indices. Correlation between variability (m_1, m_2, and m_3) and learning? We consider three time series of brain activity: the original fMRI data x, the filtered data x_diff, and the null signal x_null.

61 Low-pass filtering reveals correlation. Correlation coefficients between learning rate and brain state variability for the original, filtered, and null signals (for m_1, m_2, m_3; values in the original slide). Correlation is clear when the signal is filtered; the result for the original signal is similar to the null signal. [Figure: scatter plots of learning rate vs. m_2 variability for the original, filtered, and null signals.]

62 Part II: Applications

63 Application domains. Design graph filters to approximate desired network operators. Sampling bandlimited graph signals. Blind graph filter identification: infer diffusion coefficients from the observed output. Statistical GSP: understanding random graph signals. Network topology inference: infer the shift from a collection of network-diffused signals. Many more (glad to discuss or redirect): filter banks; windowing, convolution, duality, ...; nonlinear GSP.

64 Sampling bandlimited graph signals

66 Motivation and preliminaries. Sampling and interpolation are cornerstone problems in classical SP. How to recover a signal using only a few observations? Need to limit the degrees of freedom: subspace, smoothness. What are reasonable observation models for graph signals? Subset selection model: static graph signal; have access to a subset of the nodes. Node aggregation model: incorporate local graph structure into the observation model; recover the signal from local observations at one node.

68 Sampling bandlimited graph signals: overview. How to find x ∈ R^N using P < N observations? Our focus is on bandlimited signals, but other models are possible: x̃ = V^{−1} x is sparse => x = Σ_{k∈K} x̃_k v_k, with |K| = K < N. S is involved in the generation of x; we are agnostic to the particular form of S. Two sampling schemes were introduced in the literature, corresponding to the previous two observation models: selection [Anis14, Chen15, Tsitsvero15, Puy15, Wang15] and aggregation [Segarra15], [Marques15]. A hybrid scheme combining both => space-shift sampling.

69 Revisiting sampling in time. There are two ways of interpreting the sampling of time signals: we can either freeze the signal and sample its values at different times, or fix a point (the present) and sample the evolution of the signal. Both strategies coincide for time signals but not for general graphs, giving rise to selection and aggregation sampling.

70 Selection sampling: definition. An intuitive generalization of time sampling to graph signals. C ∈ {0, 1}^{P×N} is a selection matrix (P rows of I_N); the sampled signal is x̄ = Cx. Goal: recover x based on x̄. Assume that the frequency support K is known (w.l.o.g. K = {k}_{k=1}^K). Since x̃_k = 0 for k > K, define x̃_K := [x̃_1, ..., x̃_K]^T = E_K^T x̃. Approach: use x̄ to find x̃_K, and then recover x as x = V(E_K x̃_K) = (V E_K) x̃_K = V_K x̃_K.

73 Selection sampling: recovery. Number of samples P ≥ K: x̄ = Cx = C V_K x̃_K, where C V_K is a P × K submatrix of V. Recovery of selection sampling: if rank(C V_K) ≥ K, then x can be recovered from the P values in x̄ as x = V_K x̃_K = V_K (C V_K)† x̄. With P = K, invertibility is hard to check by inspection; the columns of V_K (C V_K)^{−1} are the interpolators. In time (S = A_dc), if the samples in C are equally spaced, C V_K is Vandermonde (DFT) and the columns of V_K (C V_K)^{−1} are sincs.
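
A sketch of selection sampling with P = K on the toy graph (the sampled nodes are chosen arbitrarily; the recovery assumes C V_K is invertible, which holds generically):

```python
# Sample a K-bandlimited signal at K nodes; interpolate with V_K (C V_K)^{-1}.
N, K = 4, 2
lam, V = np.linalg.eigh(L)
V_K = V[:, :K]                       # known frequency support: first K atoms
x = V_K @ np.random.randn(K)         # a K-bandlimited graph signal
C = np.eye(N)[[0, 2]]                # selection matrix: P = K rows of I_N
x_bar = C @ x                        # observed samples
x_rec = V_K @ np.linalg.solve(C @ V_K, x_bar)
assert np.allclose(x, x_rec)
```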

74 Aggregation sampling: definition. Idea: incorporate S into the sampling procedure; reduces to classical sampling for time signals. Consider shifted (aggregated) signals y^(l) = S^l x; y^(l) = S y^(l−1) is found sequentially with only local exchanges. Form y_i = [y_i^(0), y_i^(1), ..., y_i^(N−1)]^T (obtained locally by node i). The sampled signal is ȳ_i = C y_i. Goal: recover x based on ȳ_i.

76 Aggregation sampling: recovery. Goal: recover x based on ȳ_i. Same approach as before: use ȳ_i to find x̃_K, and then recover x as x = V_K x̃_K. Define ū_i := V_K^T e_i and recall Ψ_kl = λ_k^{l−1}. Recovery of aggregation sampling: the signal x can be recovered from the first K samples in ȳ_i as x = V_K x̃_K = V_K diag^{−1}(ū_i) (C Ψ^T E_K)^{−1} ȳ_i, provided that [ū_i]_k ≠ 0 and all {λ_k}_{k=1}^K are distinct. If C = E_K^T, node i can recover x with info from K − 1 hops! Node i has to be able to capture the frequencies in K, and the frequencies have to be distinguishable. Bandlimited signals: signals that can be well estimated locally.
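
A sketch of aggregation sampling at a single node, continuing the setup of the previous snippet (it relies on the stated conditions: distinct eigenvalues among the first K and [ū_i]_k ≠ 0):

```python
# Node i aggregates y_i = [x_i, (Sx)_i, ...] and inverts the
# diagonal-times-Vandermonde observation model, with S = L.
i = 0
y_i = np.empty(K)
x_l = x.copy()
for l in range(K):
    y_i[l] = x_l[i]                  # local observation of (S^l x)_i
    x_l = L @ x_l                    # one more local exchange
u_i = V_K[i, :]                      # u_i = V_K^T e_i
M = np.vander(lam[:K], K, increasing=True).T * u_i   # M[l,k] = lam_k^l [u_i]_k
x_tilde_K = np.linalg.solve(M, y_i)
assert np.allclose(x, V_K @ x_tilde_K)
```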

77 Aggregation and selection sampling: example. In time (S = A_dc), selection and aggregation are equivalent. Differences for a more general graph? Erdős-Rényi graph with p = 0.2, S = A, K = 3, non-smooth signal. First 3 observations at node 4: y_4 = [0.55, 1.27, 2.94]^T, with [y_4]_1 = x_4 = 0.55 and [y_4]_2 = x_2 + x_3 + x_5 + x_6 + x_7 = 1.27. For this example, any node guarantees recovery, while selection sampling fails if, e.g., the nodes {1, 3, 4} are chosen.

78 Sampling: discussion and extensions. Discussion on aggregation sampling: the observation matrix is diagonal times Vandermonde; very appropriate in distributed scenarios; different nodes will lead to different performance (soon); it determines the types of signals that are actually bandlimited (role of S). Three extensions: sampling in the presence of noise; unknown frequency support; space-shift sampling (hybrid).

80 Presence of noise. Linear observation model with additive noise w_i of covariance R_w^(i). BLUE interpolation of x̃_K: x̃̂_K^(i) = [Ψ_i^H C^H (R_w^(i))^{−1} C Ψ_i]^{−1} Ψ_i^H C^H (R_w^(i))^{−1} z̄_i. If P = K, then x̂^(i) = V_K (C Ψ_i)^{−1} z̄_i. Error covariances (R_e^(i), R̃_e^(i)) in closed form. Noise covariances? Colored, different models: white noise in z̄_i, in x, or in x̃_K. Metric to optimize? trace(R_e^(i)), λ_max(R_e^(i)), log det(R̃_e^(i)), or [trace((R̃_e^(i))^{−1})]^{−1}. Select i and C to minimize the error; the choice depends on the metric and the noise [Marques16].

82 Unknown frequency support. Falls into the class of sparse reconstruction; what is the observation matrix? Selection: a submatrix of the unitary V_K. Aggregation: Vandermonde times diagonal; if [ū_i]_k ≠ 0 and λ_k ≠ λ_{k'}, it is full spark. Joint recovery and support identification (noiseless): x̃* := argmin_{x̃} ‖x̃‖_0 s.t. C y_i = C Ψ_i x̃. If full spark, P = 2K samples suffice. Different relaxations are possible. Conditioning will depend on Ψ_i (e.g., how different the {λ_k} are). Noisy case: the choice of sampling nodes is critical.

83 Recovery with unknown support: example. Erdős-Rényi graphs with p = 0.15, 0.20, 0.25; K = 3, non-smooth signals. Three different shifts: A, (I − A), and (1/2)A². [Figure: recovery results.]

84 Space-shift sampling. Space-shift sampling (hybrid): multiple nodes and multiple shifts. Selection: 4 nodes, 1 sample each. Space-shift: 2 nodes, 2 samples each. Aggregation: 1 node, 4 samples. Selection and aggregation sampling arise as particular cases. With Ū := [diag(ū_1), ..., diag(ū_N)]^T, the sampled signal is z̄ = C(I ⊗ (Ψ^T E_K)) Ū x̃_K + C w̄. As before, the BLUE and its error covariance are available in closed form. Optimizing the sample selection is more challenging; more structured schemes are easier, e.g., message passing: node i knowing y_i^(l) implies node i knows y_j^(l') for all j ∈ N_i and l' < l.

85 Sampling the US economy. 62 economic sectors in the USA + 2 artificial sectors. Graph: average inter-sector flows, S = A. Signal x: production in 2011. x is approximately bandlimited with K = 4.

86 Sampling the US economy: results. Setup 1: we add different types of noise; the error depends on the sampling node (better if more connected). Setup 2: we try different shift-space strategies. [Figure: reconstruction errors.]

87 More on sampling graph signals. Beyond bandlimitedness: smooth signals [Chen15]; parsimonious in a kernelized domain [Romero16]. Strategies to select the sampling nodes: random (sketching) [Varma15]; optimal reconstruction [Marques16, Chepuri-Leus16]; designed based on the posterior task [Gama16]. And more: low-complexity implementations [Tremblay16, Anis16]; local implementations [Wang14, Segarra15]; unknown spectral decomposition [Anis16].

88 Stationarity of random graph processes

89 Motivation and context. We frequently encounter stochastic processes => statistical SP tools for their understanding. Stationarity facilitates the analysis of random signals in time: statistical properties are time-invariant. We seek to extend the concept of stationarity to graph processes, motivated by network data and irregular domains. The lack of regularity leads to multiple definitions. Classical SSP can be generalized: spectral estimation, periodograms, ... Better understanding and estimation of graph processes. Related works: [Girault 15], [Perraudin 16]. Segarra, Marques, Leus, Ribeiro, "Stationary Graph Processes: Nonparametric Spectral Estimation," SAM 2016. Marques, Segarra, Leus, Ribeiro, "Stationary Graph Processes and Spectral Estimation," IEEE TSP (submitted).

90 Preliminaries: weak stationarity in time. (1) The correlation of a stationary discrete-time signal is invariant to shifts: C_x := E[xx^H] = E[S^l x (S^l x)^H], where S^l x collects the values x((n − l) mod N). (2) The signal is the output of an LTI filter H excited with white noise w: x = Hw, with E[ww^H] = I. (3) The covariance matrix C_x is diagonalized by the Fourier matrix: C_x = F diag(p) F^H. The process has a power spectral density p := diag(F^H C_x F). Each of these definitions can be generalized to graph signals.

91 Stationary graph processes: shifts. Definition (shift invariance): the process x is weakly stationary with respect to S if and only if, for all a, b, and c (with b ≥ c), E[ (S^a x)((S^H)^b x)^H ] = E[ (S^{a+c} x)((S^H)^{b−c} x)^H ]. Use a and b shifts as reference; shift by c forward and backward. The signal is stationary if these shifts do not alter its covariance. It reduces to E[xx^H] = E[S^l x (S^l x)^H] when S is a directed cycle: the time shift is orthogonal, S^H = S^{−1} (a = 0, b = N and c = l). Reference shifts are needed because S can change the energy of the signal.

92 Stationary graph processes: filtering. Definition (filtering of white noise): the process x is weakly stationary with respect to S if it can be written as the output of a linear graph filter H with a white input w: x = Hw, with E[ww^H] = I. The filter H is linear shift-invariant if H(Sx) = S(Hx); equivalently, H is a polynomial in the shift operator, H = Σ_{l=0}^L h_l S^l. The filter H determines the color: C_x = E[(Hw)(Hw)^H] = HH^H.

94 Stationary graph processes: diagonalization. Definition (simultaneous diagonalization): the process x is weakly stationary with respect to S if the covariance C_x and the shift S are simultaneously diagonalizable: S = V Λ V^H => C_x = V diag(p) V^H. Equivalent to the time definition because F diagonalizes the cycle graph. The process has a power spectral density p := diag(V^H C_x V).
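
A sketch checking this definition numerically: generate a process by filtering white noise through a polynomial of L and verify that V^H C_x V is diagonal (the filter taps are made up):

```python
# x = H w with H a polynomial of S = L  =>  C_x = H H^T is diagonalized by V.
lam, V = np.linalg.eigh(L)
h = np.array([1.0, -0.3, 0.05])                  # assumed filter taps
H = sum(c * np.linalg.matrix_power(L, l) for l, c in enumerate(h))
C_x = H @ H.T                                    # covariance of x = H w
C_tilde = V.T @ C_x @ V
p = np.diag(C_tilde)                             # the PSD
assert np.allclose(C_tilde, np.diag(p), atol=1e-9)
```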

95 Equivalence of definitions and PSD. We have introduced three equally valid definitions of weak stationarity; they are different but, under mild conditions, equivalent. Proposition: the process x has a shift-invariant correlation matrix <=> it is the output of a linear shift-invariant filter <=> its covariance is jointly diagonalizable with the shift. Shift and filtering definitions => how stationary signals look (local invariance). Simultaneous diagonalization => a PSD exists, p := diag(V^H C_x V); the PSD collects the eigenvalues of C_x and is nonnegative. Proposition: let x be stationary in S and define the process x̃ := V^H x. Then x̃ is uncorrelated, with covariance matrix C_x̃ = E[x̃ x̃^H] = diag(p).

96 Weakly stationary graph processes: examples. Example (white noise): white noise w is stationary with respect to any graph shift S = VΛV^H; the covariance C_w = σ²I is simultaneously diagonalizable with every S. Example (covariance matrix graphs and precision matrices): every process is stationary in the graph defined by its covariance matrix; if S = C_x, the shift S and the covariance C_x are diagonalized by the same basis. The process is also stationary on the precision matrix S = C_x^{−1}. Example (heat diffusion and ARMA processes): the heat diffusion process on a graph, x = α_0 (I − αL)^{−1} w, is stationary in L since α_0 (I − αL)^{−1} is a polynomial in L; so is any autoregressive moving average (ARMA) process on a graph.

97 Power spectral density: examples. Example (white noise): p = diag(V^H (σ²I) V) = σ²1. Example (covariance matrix graphs and precision matrices): p = diag(V^H (VΛV^H) V) = diag(Λ). Example (heat diffusion processes and ARMA processes): p = diag(α_0² (I − αΛ)^{−2}).

98 PSD estimation with correlogram and periodogram. Given a process x, the covariance of x̃ = V^H x is C_x̃ := E[x̃ x̃^H] = E[(V^H x)(V^H x)^H] = diag(p). Periodogram: given samples {x_r}_{r=1}^R, average the GFTs of the samples, p̂_pg := (1/R) Σ_{r=1}^R |x̃_r|² = (1/R) Σ_{r=1}^R |V^H x_r|². Correlogram: replace C_x in the PSD definition by the sample covariance, p̂_cg := diag(V^H Ĉ_x V) = diag(V^H [(1/R) Σ_{r=1}^R x_r x_r^H] V). Periodogram and correlogram lead to identical estimates: p̂_pg = p̂_cg.
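
A sketch of the periodogram for the process defined in the previous snippet, averaging R realizations (Gaussian input assumed):

```python
# Periodogram p_hat = (1/R) sum_r |V^H x_r|^2, approaching p = diag(V^H C_x V).
rng = np.random.default_rng(0)
R = 5000
W = rng.standard_normal((4, R))                  # white inputs w_r
X = H @ W                                        # realizations x_r = H w_r
p_hat = np.mean((V.T @ X)**2, axis=1)            # periodogram (= correlogram)
print(np.round(p_hat, 2), np.round(p, 2))        # close for large R
```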

99 Accuracy of the periodogram estimator. Theorem: if the process x is Gaussian, the periodogram estimates have bias b_pg := E[p̂_pg] − p = 0 and variance Σ_pg := E[(p̂_pg − p)(p̂_pg − p)^H] = (2/R) diag²(p). The periodogram is unbiased, but the variance is not too good: quadratic in p, the same as for time processes. Alternative nonparametric methods reduce the variance: averaged windowed periodograms, filter banks. The bias-variance tradeoff is characterized in [Marques16, Segarra16].

101 Parametric PSD estimation. Until now, non-parametric estimators; what about parametric estimation? View x as the output of a graph filter with P ≪ N parameters. Key: how to use x to estimate ĥ.

102 Parametric PSD estimation: finding h. How to use a realization x to estimate ĥ? The input w is white and x is stationary => rely on correlations. The problem can be formulated in the vertex or in the frequency domain. Vertex domain: ĥ = argmin_h d(C(h), Ĉ(x)), with C(h) = H(h, S) H(h, S)^H and Ĉ(x) = xx^H. Frequency domain: ĥ = argmin_h d(p(h), p̂(x)), with p(h) = diag(Ψ(hh^H)Ψ^H) and p̂(x) = diag(V^H (xx^H) V). C(·) and p(·) are quadratic in h => general estimation is nonconvex; if d(·,·) is quadratic, this is a phase-retrieval problem. Particular cases (S ⪰ 0 and h ≥ 0) are tractable; not so in time.

105 Parametric PSD estimation: an example. Intuition on parametric estimation: better performance because fewer parameters must be estimated, giving rise to smoother PSD estimates. Example: Laplacian diffusion with positive coefficients; diffusion will favor an exponential spectral decay. [Figure: estimated vs. true PSD.]

106 Alternative parametric formulations. Choose the vertex or the frequency formulation based on the type of filter; MA, AR, and ARMA models can all be generalized. If the number of parameters (filter degree) is not known, use regularizers so that h is sparse. Alternative parametric models: x is a sum of a few frequency basis vectors => x̃ = V^H x is sparse; x is the diffusion of a sparse initial state => w is sparse.

107 Average periodogram. MSE of the periodogram as a function of the number of observations R. Baseline: ER random graph (N = 100 and p = 0.05) with S = A; observe filtered white Gaussian noise and estimate the PSD. [Figure: normalized MSE vs. R for the baseline ER, smaller ER, larger PSD, and small-world settings.] The normalized MSE evolves as 2/R, as expected, and is invariant to size, topology, and PSD. The same behavior is observed for non-Gaussian processes (where the theory does not apply).

108 Windowed average periodogram. Performance of local windows and random windows. Stochastic block model graph (N = 100, 10 communities) and small world; the process filters white noise with different numbers of taps. [Figure: normalized MSE (bias squared, variance, theoretical and empirical error) vs. number of windows, for a single window, ten local windows, and ten random windows, with L = 2 and L = 10.] The use of windows introduces bias but reduces the total error (MSE). Local windows work better than random windows, and their advantage is larger for local processes.

109 Opinion source identification. Opinion diffusion in Zachary's karate club network (N = 34). The observed opinion x is obtained by diffusing a sparse white rumor w. Given {x_r}_{r=1}^R generated from unknown {w_r}_{r=1}^R, diffused through a filter with unknown nonnegative coefficients β. Goal: identify the support of each rumor w_r. First, estimate β via moving-average PSD estimation; second, solve R sparse linear regressions to recover supp(w_r). [Figure: source identification error vs. number of observations for several noise levels.]

110 PSD of face images. PSD estimation for spectral signatures of the faces of different people. 100 grayscale face images {x_i}_{i=1}^100 ⊂ R^10304 (10 images x 10 people). Consider x_i as the realization of a graph process that is stationary on Ĉ_x. Construct Ĉ_x^(j) = V^(j) Λ_c^(j) V^(j)H based on the images of person j. [Figure: the process of person j is approximately stationary in Ĉ_x.] Use the windowed average periodogram to estimate the PSD of a new face.

111 Stationarity: takeaways. Extended the notion of weak stationarity to graph processes: three definitions inspired by stationary time processes, shown to be equivalent. Defined the power spectral density and studied its estimation. Generalized classical non-parametric estimation methods: periodogram and correlogram (shown to be equivalent), windowed average periodogram, filter banks. Generalized classical ARMA parametric estimation methods; particular cases are tractable. Extensions: other parametric schemes; space-time variation.

112 Network topology inference

114 Motivation and context. Network topology inference from nodal observations [Kolaczyk 09]. Approaches use Pearson correlations to construct graphs [Brovelli04], or partial correlations and conditional dependence [Friedman08, Karanikolas16]. Key in neuroscience [Sporns 10]: functional nets inferred from activity. Most GSP works study how a known graph S affects signals and filters. Here, the reverse path: how to use GSP to infer the graph topology? Gaussian graphical models [Egilmez16]; smooth signals [Dong15], [Kalofolias16]; stationary signals [Segarra16], [Pasdeloup16]; directed graphs [Mei-Moura15], [Shen16]. Segarra, Marques, Mateos, Ribeiro, "Network Topology Identification from Spectral Templates," IEEE SSP 2016. Segarra, Marques, Mateos, Ribeiro, "Network Topology Inference from Spectral Templates," JSTSP (submitted).

115 Our approach for topology inference. We propose a two-step approach for graph topology identification. Step 1: identify the eigenvectors of the shift (from the observations and a priori info). Step 2: identify the eigenvalues to obtain a suitable shift, given the desired topological features => the inferred network. Beyond diffusion, there are alternative sources for the spectral templates V: graph sparsification, network deconvolution, ...

116 Step 1: eigenvectors from graph stationarity. Given a set of signals {x_r}_{r=1}^R, find S. We view the signals as samples of a random graph process x. Assumption: x is stationary in S. Equivalent to: x is the linear diffusion of a white input, x = α_0 Π_{l=1}^∞ (I − α_l S) w = Σ_{l=0}^∞ β_l S^l w, i.e., x = Hw. Examples: heat diffusion, x = (I − αL)^{−1} w; structural equation models, x = Ax + w. We say the graph shift S explains the structure of the signal x. Key point after assuming stationarity: the eigenvectors of the covariance.

117 Step 1: Eigenvectors from covariance The covariance matrix of the stationary signal $x = Hw$ is

$C_x = \mathbb{E}[xx^T] = H\,\mathbb{E}[ww^T]\,H^T = HH^T$

Since H is diagonalized by V, so is the covariance $C_x$:

$C_x = V \big| \textstyle\sum_{l=0}^{L-1} h_l \Lambda^l \big|^2 V^T = V\,\mathrm{diag}(p)\,V^T$

Any shift with eigenvectors V can explain x; G and its specific eigenvalues have been obscured by diffusion. Observations: (a) identifying S ⇔ identifying the eigenvalues; (b) correlation methods ⇒ the eigenvalues are kept unchanged; (c) precision methods ⇒ the eigenvalues are inverted. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 97 / 120
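
In practice the spectral templates come from data: eigendecompose a sample covariance of the diffused signals. A sketch under the stationarity assumption (filter choice and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, R = 20, 5000
A = rng.random((N, N)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
lam, V = np.linalg.eigh(A)                # ground-truth eigenvectors
H = V @ np.diag(np.exp(-lam)) @ V.T       # some diffusion filter on S = A
X = H @ rng.standard_normal((N, R))       # R stationary realizations
C_hat = X @ X.T / R                       # sample covariance
_, V_hat = np.linalg.eigh(C_hat)          # spectral templates
# V_hat matches V up to sign and ordering; the eigenvalues of C_hat are
# |h(lam)|^2, so the eigenvalues of S itself remain unknown (step 2).
```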

118 Step 1: Other sources of spectral templates 1) Graph sparsification. Goal: given $S_f$, find a sparser S with the same eigenvectors. Decompose $S_f = V_f \Lambda_f V_f^T$ and set $V = V_f$. Oftentimes referred to as the network deconvolution problem. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 98 / 120

119 Step 1: Other sources of spectral templates 1) Graph sparsification. Goal: given $S_f$, find a sparser S with the same eigenvectors. Decompose $S_f = V_f \Lambda_f V_f^T$ and set $V = V_f$. Oftentimes referred to as the network deconvolution problem. 2) Nodal relation assumed by a given transform. GSP: decompose $S = V \Lambda V^T$ and set $V^T$ as the GFT. SP: some transforms T are known to work well on specific data. Goal: given T, set $V^T = T$ and identify S ⇒ intuition on the data relation (Ex: DCTs i–iii). Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 98 / 120

120 Step 1: Other sources of spectral templates 1) Graph sparsification. Goal: given $S_f$, find a sparser S with the same eigenvectors. Decompose $S_f = V_f \Lambda_f V_f^T$ and set $V = V_f$. Oftentimes referred to as the network deconvolution problem. 2) Nodal relation assumed by a given transform. GSP: decompose $S = V \Lambda V^T$ and set $V^T$ as the GFT. SP: some transforms T are known to work well on specific data. Goal: given T, set $V^T = T$ and identify S ⇒ intuition on the data relation (Ex: DCTs i–iii). 3) Implementation of linear network operators. Goal: distributed implementation of a linear operator B via graph filters. Feasible if B and S share eigenvectors ⇒ like 1) with $S_f = B$. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 98 / 120

121 Step 2: Obtaining the eigenvalues Given V, there are many possible $S = V\,\mathrm{diag}(\lambda)\,V^T$. We can use extra knowledge/assumptions to choose one graph. Of all graphs, select one that is optimal in some sense:

$S^* := \arg\min_{S,\lambda} f(S, \lambda) \;\;\text{s. to}\;\; S = \sum_{k=1}^{N} \lambda_k v_k v_k^T, \; S \in \mathcal{S} \quad (1)$

The set $\mathcal{S}$ contains all admissible scaled adjacency matrices, $\mathcal{S} := \{S \mid S_{ij} \geq 0, \; S \in \mathcal{M}^N, \; S_{ii} = 0, \; \sum_j S_{1j} = 1\}$; Laplacian matrices can be accommodated as well. The problem is convex if we select a convex objective $f(S, \lambda)$: minimum energy ($f(S) = \|S\|_F$), fast mixing ($f(\lambda) = -\lambda_2$). Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 99 / 120
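
Problem (1) maps directly onto a convex modeling language. A sketch using CVXPY (an assumed dependency; the helper name and scale normalization follow the set $\mathcal{S}$ above):

```python
import cvxpy as cp

def identify_shift(V, objective="fro"):
    """Step-2 sketch: find S = V diag(lam) V^T within the set of
    scaled adjacency matrices, minimizing a convex objective."""
    N = V.shape[0]
    lam = cp.Variable(N)
    S = V @ cp.diag(lam) @ V.T                # affine in lam
    constraints = [S >= 0,                    # nonnegative weights
                   cp.diag(S) == 0,           # no self-loops
                   cp.sum(S[0, :]) == 1]      # fix the scale
    f = cp.norm(S, "fro") if objective == "fro" else cp.norm(cp.vec(S), 1)
    cp.Problem(cp.Minimize(f), constraints).solve()
    return S.value, lam.value
```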

122 Size of the feasibility set A small feasible set in (1) helps the optimization: the search is over $\lambda \in \mathbb{R}^N$ with N linear constraints $S_{ii} = 0$. To be rigorous, define $W := V \odot V$ (Khatri-Rao) and let D be the index set of the diagonal elements, so that $W_D$ collects the corresponding rows. Assume that (1) is feasible; then it holds that $\mathrm{rank}(W_D) \leq N - 1$. If $\mathrm{rank}(W_D) = N - 1$, then the feasible set of (1) is a singleton. Take-aways: a convex and small feasibility set ⇒ exhaustive search is affordable; $\mathrm{rank}(W_D) = N - 1$ arises in practice. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 100 / 120
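
The rank condition is cheap to test numerically. A sketch assuming SciPy's column-wise Khatri-Rao product (graph parameters are illustrative):

```python
import numpy as np
from scipy.linalg import khatri_rao

def rank_WD(V):
    """Rank of W_D: the rows of W = V (Khatri-Rao) V that correspond
    to the diagonal entries of S under vectorization."""
    N = V.shape[0]
    W = khatri_rao(V, V)                  # column k equals v_k kron v_k
    D = [i * (N + 1) for i in range(N)]   # vec-indices of the S_ii entries
    return np.linalg.matrix_rank(W[D, :])

rng = np.random.default_rng(5)
N = 10
A = np.triu((rng.random((N, N)) < 0.2).astype(float), 1)
A = A + A.T                               # ER-style adjacency
_, V = np.linalg.eigh(A)
print(rank_WD(V))                         # frequently N - 1 in practice
```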

123 Non-convex objective: Sparse recovery Whenever the feasibility set of (1) is non-trivial, $f(S, \lambda)$ determines the features of the recovered graph. Ex: identify the sparsest shift $S_0^*$ that explains the observed signal structure ⇒ set the cost $f(S, \lambda) = \|S\|_0$:

$S_0^* = \arg\min_{S,\lambda} \|S\|_0 \;\;\text{s. to}\;\; S = \sum_{k=1}^{N} \lambda_k v_k v_k^T, \; S \in \mathcal{S}$

Non-convex problem; relax to $\ell_1$-norm minimization (e.g. [Tropp06]):

$S_1^* := \arg\min_{S,\lambda} \|S\|_1 \;\;\text{s. to}\;\; S = \sum_{k=1}^{N} \lambda_k v_k v_k^T, \; S \in \mathcal{S}$

Does the solution $S_1^*$ coincide with the $\ell_0$ solution $S_0^*$? Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 101 / 120
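
The $\ell_1$ relaxation reuses the same constraint set with a different objective; with the identify_shift sketch above it is a one-liner, or standalone as follows (CVXPY assumed):

```python
import cvxpy as cp

def sparsest_shift(V):
    """l1 surrogate for the l0 problem: sparsest S diagonalized by V."""
    N = V.shape[0]
    lam = cp.Variable(N)
    S = V @ cp.diag(lam) @ V.T
    cp.Problem(cp.Minimize(cp.norm(cp.vec(S), 1)),
               [S >= 0, cp.diag(S) == 0, cp.sum(S[0, :]) == 1]).solve()
    return S.value
```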

124 Recovery guarantee for $\ell_1$ relaxation Recall that $W := V \odot V$. Build $M := (I - WW^{\dagger})_{D^c}$, where $WW^{\dagger}$ is the orthogonal projector onto $\mathrm{range}(W)$. Construct $R := [M, \; e_1 \otimes 1_{N-1}]$. Denote by K the indices of the support of $s_0^* = \mathrm{vec}(S_0^*)$. $S_1^*$ and $S_0^*$ coincide if the two following conditions are satisfied: 1) $\mathrm{rank}(R_K) = |K|$; and 2) there exists a constant $\delta > 0$ such that $\psi_R := \| I_{K^c} (\delta^{-2} RR^T + I_{K^c}^T I_{K^c})^{-1} I_K^T \| < 1$. Cond. 1) ensures uniqueness of the solution $S_1^*$; Cond. 2) guarantees the existence of a dual certificate for $\ell_0$ optimality. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 102 / 120

125 Robust shift identification So far, a two-step algorithm based on perfect spectral templates. However, perfect knowledge of V may not be available ⇒ robust designs? Q1: How to modify the optimization in step 2? A distance term for noisy templates, an orthogonal subspace for incomplete ones. Q2: Recovery guarantees? Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 103 / 120

126 Noisy spectral templates We might have access to $\hat{V}$, a noisy version of the spectral templates. With $d(\cdot,\cdot)$ denoting a (convex) distance between matrices:

$\min_{\{S, \lambda, \hat{S}\}} \|S\|_1 \;\;\text{s. to}\;\; \hat{S} = \sum_{k=1}^{N} \lambda_k \hat{v}_k \hat{v}_k^T, \; S \in \mathcal{S}, \; d(S, \hat{S}) \leq \epsilon$

How does the noise in $\hat{V}$ affect the recovery? Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 104 / 120
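
A sketch of the robust formulation with the Frobenius norm as one convex choice for $d(\cdot,\cdot)$ (CVXPY assumed; names illustrative):

```python
import cvxpy as cp

def shift_from_noisy_templates(V_hat, eps):
    """S must lie in the admissible set and be eps-close to a shift
    that is exactly diagonalized by the noisy templates V_hat."""
    N = V_hat.shape[0]
    lam = cp.Variable(N)
    S = cp.Variable((N, N), symmetric=True)
    S_hat = V_hat @ cp.diag(lam) @ V_hat.T
    cp.Problem(cp.Minimize(cp.norm(cp.vec(S), 1)),
               [S >= 0, cp.diag(S) == 0, cp.sum(S[0, :]) == 1,
                cp.norm(S - S_hat, "fro") <= eps]).solve()
    return S.value
```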

127 Noisy spectral templates We might have access to $\hat{V}$, a noisy version of the spectral templates. With $d(\cdot,\cdot)$ denoting a (convex) distance between matrices:

$\min_{\{S, \lambda, \hat{S}\}} \|S\|_1 \;\;\text{s. to}\;\; \hat{S} = \sum_{k=1}^{N} \lambda_k \hat{v}_k \hat{v}_k^T, \; S \in \mathcal{S}, \; d(S, \hat{S}) \leq \epsilon$

How does the noise in $\hat{V}$ affect the recovery? Stability (robustness) can be established: under conditions 1) and 2), now based on $\hat{R}$, it is guaranteed that $d(S^*, S_0^*) \leq C\epsilon$, with $\epsilon$ large enough to guarantee feasibility of $S_0^*$; the constant C depends on $\hat{V}$ and the support K. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 104 / 120

128 Incomplete spectral templates Partial access to V: only K known eigenvectors $V_K = [v_1, \ldots, v_K]$, e.g., if the (sample) covariance is rank-deficient.

$\min_{\{S, S_K, \lambda\}} \|S\|_1 \;\;\text{s. to}\;\; S = S_K + \sum_{k=1}^{K} \lambda_k v_k v_k^T, \; S \in \mathcal{S}, \; S_K v_k = 0$

Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 105 / 120
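
The incomplete-templates variant in the same style (CVXPY assumed; S_K is the free component constrained to be orthogonal to the known eigenvectors):

```python
import cvxpy as cp

def shift_from_partial_templates(V_K):
    """Only K eigenvectors are known; S_K absorbs the unknown part of
    the shift and must annihilate the known templates."""
    N, K = V_K.shape
    lam = cp.Variable(K)
    S_K = cp.Variable((N, N), symmetric=True)
    S = S_K + V_K @ cp.diag(lam) @ V_K.T
    cp.Problem(cp.Minimize(cp.norm(cp.vec(S), 1)),
               [S >= 0, cp.diag(S) == 0, cp.sum(S[0, :]) == 1,
                S_K @ V_K == 0]).solve()
    return S.value
```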

129 Incomplete spectral templates Partial access to V: only K known eigenvectors $V_K = [v_1, \ldots, v_K]$, e.g., if the (sample) covariance is rank-deficient.

$\min_{\{S, S_K, \lambda\}} \|S\|_1 \;\;\text{s. to}\;\; S = S_K + \sum_{k=1}^{K} \lambda_k v_k v_k^T, \; S \in \mathcal{S}, \; S_K v_k = 0$

How does the (partial) knowledge of V affect the recovery? A bit more involved: conditions 1) and 2) now depend on $V_K$. For K = N, the guarantees boil down to the noiseless case. Incomplete and noisy scenarios can be combined. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 105 / 120

130 Topology inference in random graphs Erdős-Rényi graphs of varying size $N \in \{10, 20, \ldots, 50\}$ and edge probabilities $p \in \{0.1, 0.2, \ldots, 0.9\}$. [Figure: recovery rates over (N, p) for the adjacency (left) and the normalized Laplacian (mid); histogram of rank(W_D) over all realizations, successful recoveries, and unique solutions (right)] Recovery is easier for intermediate values of p. The rate of recovery is related to rank(W_D) (histogram for N = 10, p = 0.2): when the rank is N − 1, recovery is guaranteed; as the rank decreases, there is a detrimental effect on recovery. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 106 / 120

131 Sparse recovery guarantee Generate 1000 ER random graphs (N = 20, p = 0.1) such that the feasible set is not a singleton and Cond. 1) in the sparse recovery theorem is satisfied. Noiseless case: $\ell_1$-norm minimization guarantees recovery as long as $\psi_R < 1$. [Figure: histogram of $\psi_R$ for recovery successes vs. failures] The condition is sufficient but not necessary; it is the tightest possible bound on this matrix norm. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 107 / 120

132 Inferring brain graphs from noisy templates Identification of structural brain graphs (N = 66). Test recovery for noisy spectral templates $\hat{V}$, obtained from sample covariances of diffused signals. [Figure: recovery error vs. number of observations for patients 1–3] Recovery error decreases with an increasing number of observed signals: a more reliable estimate of the covariance ⇒ a less noisy $\hat{V}$. The brain of patient 1 is consistently the hardest to identify. Robustness for identification in noisy scenarios: traditional methods like graphical lasso fail to recover S. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 108 / 120

133 Inferring social graphs from incomplete templates Identification of multiple social networks (N = 32) defined on the same node set of students from Ljubljana. Test recovery for incomplete spectral templates $\hat{V} = [\hat{v}_1, \ldots, \hat{v}_K]$, obtained from a low-pass diffusion process. Repeated eigenvalues in $C_x$ introduce a rotation ambiguity in V. [Figure: recovery error vs. number of spectral templates for networks 1–4] Recovery error decreases with an increasing number of spectral templates; the performance improvement is sharp. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 109 / 120

134 Performance comparisons Comparison with graphical lasso and sparse correlation methods, evaluated on 100 realizations of ER graphs with N = 20. [Figure: F-measure vs. number of observations for our method, correlation, and graphical lasso, under filters $H_1$ and $H_2$] Graphical lasso implicitly assumes the filter $H_1 = (\rho I + S)^{-1/2}$. For this filter the spectral templates work, but not as well as the (MLE-based) graphical lasso. For general diffusion filters $H_2$, the spectral templates still work fine. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 110 / 120
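
Why graphical lasso matches the filter $H_1$: for $x = H_1 w$ with $H_1 = (\rho I + S)^{-1/2}$, the precision matrix is exactly $\rho I + S$, i.e., S plus a diagonal shift. A quick NumPy check of this identity (the graph and $\rho$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10
S = rng.random((N, N)); S = (S + S.T) / 2; np.fill_diagonal(S, 0)
rho = 1 + np.abs(np.linalg.eigvalsh(S)).max()   # keep rho*I + S > 0
lam, V = np.linalg.eigh(S)
H1 = V @ np.diag((rho + lam) ** -0.5) @ V.T
C_x = H1 @ H1.T                                 # equals (rho*I + S)^{-1}
print(np.allclose(np.linalg.inv(C_x), rho * np.eye(N) + S))   # True
```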

135 Inferring direct relations Our method can be used to sparsify a given network: keep direct and important edges or relations, discard indirect relations that can be explained by direct ones. Use the eigenvectors $\hat{V}$ of the given network as noisy templates. Infer contacts between amino-acid residues in BPT1 BOVIN, using the mutual information of amino-acid covariation as input. [Figure: ground truth, mutual information, network deconvolution, our approach] Network deconvolution assumes a specific filter model [Feizi13]; we achieve better performance by being agnostic to this. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 111 / 120

136 Topology ID: Takeaways Network topology inference is a cornerstone problem in Network Science. Most GSP works analyze how S affects signals and filters; here, the reverse path: how to use GSP to infer the graph topology? Our GSP approach to network topology inference is a two-step approach: i) obtain V; ii) estimate S given V. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 112 / 120

137 Topology ID: Takeaways Network topology inference is a cornerstone problem in Network Science. Most GSP works analyze how S affects signals and filters; here, the reverse path: how to use GSP to infer the graph topology? Our GSP approach to network topology inference is a two-step approach: i) obtain V; ii) estimate S given V. How to obtain the spectral templates V: based on the covariance of diffused signals; other sources too (network operators, data transforms). Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 112 / 120

138 Topology ID: Takeaways Network topology inference is a cornerstone problem in Network Science. Most GSP works analyze how S affects signals and filters; here, the reverse path: how to use GSP to infer the graph topology? Our GSP approach to network topology inference is a two-step approach: i) obtain V; ii) estimate S given V. How to obtain the spectral templates V: based on the covariance of diffused signals; other sources too (network operators, data transforms). Infer S via convex optimization: the objective promotes desirable properties, the constraints encode a priori info and structure. Formulations for perfect and imperfect templates. Sparse recovery results for both the adjacency and the Laplacian. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 112 / 120

139 Wrapping up Motivation and preliminaries Part I: Fundamentals Graphs 101 Graph signals and the shift operator Graph Fourier Transform (GFT) Graph filters and network processes Part II: Applications Sampling graph signals Stationarity of graph processes Network topology inference Concluding remarks Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 113 / 120

140 Concluding remarks Network science and big data pose new challenges; GSP can contribute to solving some of those challenges, being well suited for network (diffusion) processes. Central elements in GSP: the graph-shift operator and the Fourier transform. Graph filters operate on graph signals: polynomials of the shift operator that can be implemented locally. Network diffusion/percolation processes via graph filters: successive/parallel combinations of local linear dynamics, possibly with time-varying diffusion coefficients; accurate models for certain setups. GSP yields insights on how those processes behave. Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 114 / 120
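
The local implementability claimed above follows because each multiplication by S only exchanges values between neighbors; a minimal sketch (the filter coefficients are illustrative):

```python
import numpy as np

def graph_filter_local(S, h, x):
    """Apply y = sum_l h[l] * S^l x via repeated local exchanges; after
    l rounds each node has aggregated information from its l-hop
    neighborhood only."""
    y, z = np.zeros_like(x), x.copy()
    for hl in h:
        y += hl * z
        z = S @ z              # one round of communication with neighbors
    return y

rng = np.random.default_rng(4)
N = 15
S = rng.random((N, N)); S = (S + S.T) / 2; np.fill_diagonal(S, 0)
y = graph_filter_local(S, h=[1.0, -0.5, 0.25], x=rng.standard_normal(N))
```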

141 Concluding remarks GSP results can be applied to solve practical problems: sampling and interpolation (network control), input and system ID (rumor ID), shift design (network topology ID). Marques, Ribeiro, Segarra Graph SP: Fundamentals and Applications 115 / 120
