A Statistical Look at Spectral Graph Analysis. Deep Mukhopadhyay

Deep Mukhopadhyay, Department of Statistics, Temple University (Office: Speakman). Graph Signal Processing Workshop, University of Pennsylvania, Philadelphia, May 27, 2016.

1. Graph Signal Processing: A Practitioner's Guide
1. Graph signal processing = spectral graph theory + harmonic analysis.
2. Choose the basis as the eigenfunctions of a graph shift operator S. Common choices of S:
(1) $L = D^{-1/2} A D^{-1/2}$; Fiedler (1973)
(2) $R = D^{-1} A$; Coifman and Lafon (2006)
(3) $B = A - N^{-1} d d^T$; Newman (2006)
(4) Type-I Reg. $L_\tau = D_\tau^{-1/2} A D_\tau^{-1/2}$; Chaudhuri et al. (2012)
(5) Type-II Reg. $L_\tau = D_\tau^{-1/2} A_\tau D_\tau^{-1/2}$; Amini et al. (2013)
Here $A$ is the adjacency matrix and $D = \operatorname{diag}(d_1, \dots, d_n) \in \mathbb{R}^{n \times n}$, where $d_i$ denotes the degree of node i; $d = (d_1, \dots, d_n)^T$ and $N = \sum_i d_i$.
3. Compute the graph Fourier transform by expanding signals defined over the vertices of the graph as linear combinations of the selected eigenbasis, and carry out learning tasks such as regression, clustering, classification, smoothing and kriging in a straightforward manner.
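The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the author's code: the graph is small, undirected and connected and is stored as a dense array; `shift_operators` is a hypothetical helper; τ = 0.5 is arbitrary; and for the Type-II form we take D_τ = diag(d_i + τ), which equals the degree matrix of A_τ = A + (τ/n)11^T. The last lines expand a vertex signal in the eigenbasis of L (a graph Fourier transform) and invert it.

```python
import numpy as np

def shift_operators(A, tau=1.0):
    """Return the five shift operators S listed above (hypothetical helper).

    A is a symmetric, non-negative adjacency matrix without isolated vertices;
    tau is the regularization constant used by the Type-I and Type-II forms.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    d = A.sum(axis=1)                        # degree sequence d_1, ..., d_n
    N = d.sum()                              # N = sum of all degrees
    r = 1.0 / np.sqrt(d)                     # diagonal of D^{-1/2}
    r_tau = 1.0 / np.sqrt(d + tau)           # diagonal of D_tau^{-1/2}, D_tau = diag(d_i + tau)
    A_tau = A + (tau / n) * np.ones((n, n))  # A_tau = A + (tau/n) 1 1^T; its degrees are d_i + tau
    return {
        "L": r[:, None] * A * r[None, :],                     # D^{-1/2} A D^{-1/2}
        "R": A / d[:, None],                                  # D^{-1} A
        "B": A - np.outer(d, d) / N,                          # A - N^{-1} d d^T
        "L_tau_I": r_tau[:, None] * A * r_tau[None, :],       # Type-I regularized
        "L_tau_II": r_tau[:, None] * A_tau * r_tau[None, :],  # Type-II regularized
    }

# Example graph (our choice): triangle {1, 2, 3} plus pendant vertex 4 attached to 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
ops = shift_operators(A, tau=0.5)

# Graph Fourier transform of a vertex signal f in the eigenbasis of L
evals, U = np.linalg.eigh(ops["L"])  # L is symmetric, so U is orthogonal
f = np.array([1.0, 2.0, 0.5, -1.0])
f_hat = U.T @ f                      # forward transform: coefficients of f
f_back = U @ f_hat                   # inverse transform recovers f
```

Any of the other symmetric operators can be swapped in for `ops["L"]`; R is not symmetric, so it would need `np.linalg.eig` rather than `eigh`.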

2. Can Beauty and Utility Coexist?
The way spectral graph analysis is currently taught and practiced is rather mechanical, consisting of a series of matrix calculations; this has a strongly negative bearing on our understanding.
How can we make the approach less mechanistic and more systematic (i.e., scientific)?
What unifying feature is shared by all spectral graph analysis approaches based on different shift operators?
How can we establish a statistical path to discover these different spectral methods from a few general principles?
Most of these questions are either unsolved or unasked. The main challenge is to discover the right starting point (NOT the end products). Where should we start? Can we develop a systematic constructive theory from scratch, starting from that fundamental object?

3. Unified Construction Principle
Step 1. For a given discrete graph G of size n, construct the GraField kernel function $C: [0,1]^2 \to \mathbb{R}_+ \cup \{0\}$, defined almost everywhere by
$$C(u, v; G_n) = \frac{p\big(Q(u;X), Q(v;Y); G_n\big)}{p\big(Q(u;X)\big)\, p\big(Q(v;Y)\big)}, \qquad 0 < u, v < 1, \tag{1}$$
where $u = F(x;X)$ and $v = F(y;Y)$ for $x, y \in \{1, 2, \dots, n\}$, and the graph mass functions induced by the degree sequence are
$$p(x;X) = \sum_{y=1}^{n} A(x,y)/N, \qquad p(y;Y) = \sum_{x=1}^{n} A(x,y)/N, \qquad p(x,y;G) = A(x,y)/N,$$
with $N = \sum_{x,y} A(x,y)$ the total degree; $Q(u;X)$ and $Q(v;Y)$ are the respective quantile functions.
Step 2. Let $\xi_j(X; F(X))$ denote polynomial rank transforms that form an orthonormal basis of $L^2(F)$: $\mathbb{E}_F[\xi_j(X;F(X))] = 0$ and $\mathbb{E}_F[\xi_j(X;F(X))\,\xi_k(X;F(X))] = \delta_{jk}$.
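A minimal sketch of Step 1, assuming a symmetric adjacency matrix with no isolated vertices (so every p(x; X) > 0). Because the kernel is constant on each rectangle of the vertex grid, it suffices to store one value per vertex pair. `grafield`, `quantile` and `C_at` are hypothetical helper names, and the graph is an arbitrary small example.

```python
import numpy as np

def grafield(A):
    """Cell values of the piecewise-constant GraField kernel in Eq. (1).

    C[x, y] is the constant value of C(u, v; G) on the rectangle
    (F(x-1), F(x)] x (F(y-1), F(y)]; p is the degree-induced mass function.
    Assumes a symmetric, non-negative A with no isolated vertices.
    """
    A = np.asarray(A, dtype=float)
    N = A.sum()                   # N = sum_{x,y} A(x, y)
    p_joint = A / N               # p(x, y; G)
    p = p_joint.sum(axis=1)       # p(x; X) = d_x / N, equal to p(y; Y) here
    C = p_joint / np.outer(p, p)  # ratio in Eq. (1), one value per cell
    return C, p

def quantile(u, p):
    """Q(u; X) = min{x : F(x; X) >= u} for 0 < u < 1, vertices labelled 1..n."""
    F = np.cumsum(p)
    return min(int(np.searchsorted(F, u, side="left")) + 1, len(p))

def C_at(u, v, C, p):
    """Evaluate C(u, v; G) at a point of the unit square via the quantile maps."""
    return C[quantile(u, p) - 1, quantile(v, p) - 1]

# Triangle {1, 2, 3} plus pendant vertex 4 attached to 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C, p = grafield(A)
value = C_at(0.6, 0.95, C, p)  # u falls in vertex 3's cell, v in vertex 4's cell
```

The check that `(C * np.outer(p, p)).sum()` equals 1 is the discrete form of the unit-mass property of C, since cell (x, y) has area p(x) p(y).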

Step 3. Transform coding of graphs. Construct the generalized graph matrix $M(G,\xi) \in \mathbb{R}^{n \times n}$ with respect to an orthonormal system $\xi$:
$$M[j,k;G,\xi] = \Big\langle \xi_j, \int_0^1 (C - 1)\,\xi_k \Big\rangle_{L^2[0,1]}, \qquad j, k = 1, \dots, n. \tag{2}$$
Its entries can be viewed as the coefficients of the orthogonal series expansion of $C(u,v;G)$ with respect to the product basis $\{\xi_j \xi_k\}_{1 \le j,k \le n}$.
Step 4. Perform the singular value decomposition (SVD) $M = U \Lambda U^T = \sum_k \mu_k u_k u_k^T$, where $U = (u_1, \dots, u_n)$ is the matrix of singular vectors of the moment matrix, with elements $u_{jk}$, and $\Lambda = \operatorname{diag}(\mu_1, \dots, \mu_n)$ with $\mu_1 \ge \dots \ge \mu_n \ge 0$.
Step 5. Obtain the approximate Karhunen-Loève (KL) representation basis of the graph G (which acts as a Fourier basis) by
$$\phi_k(u) = \sum_{j=1}^{n} u_{jk}\,\xi_j(u), \qquad k = 1, \dots, n-1.$$
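Steps 3 to 5 become finite matrix computations when every basis function is constant on the vertex cells. The sketch below assumes such a basis and defaults to the block-pulse functions of Eq. (6); it is our illustration, not the author's implementation, and `graph_kl_basis` is a hypothetical name. Since M is symmetric, it uses a symmetric eigendecomposition (`numpy.linalg.eigh`) in place of a general SVD; the two coincide only when M has no negative eigenvalues.

```python
import numpy as np

def graph_kl_basis(A, Xi=None):
    """Steps 3-5 for a basis whose functions are constant on each vertex cell.

    Xi[x, j] is the value of xi_j on cell x = (F(x-1), F(x)]. The columns must be
    orthonormal in L2[0, 1], i.e. Xi.T @ diag(p) @ Xi = I. If Xi is None we use
    the block-pulse basis xi_j = p(j)^{-1/2} on cell j (our default choice).
    """
    A = np.asarray(A, dtype=float)
    N = A.sum()
    p = A.sum(axis=1) / N
    C = (A / N) / np.outer(p, p)        # GraField cell values, Eq. (1)
    if Xi is None:
        Xi = np.diag(1.0 / np.sqrt(p))
    W = np.diag(p)                       # cell widths act as exact quadrature weights
    # Step 3: M[j, k] = <xi_j, int (C - 1) xi_k>, exact for cell-constant functions
    M = Xi.T @ W @ (C - 1.0) @ W @ Xi
    # Step 4: symmetric eigendecomposition M = U diag(mu) U^T, by decreasing mu
    mu, U = np.linalg.eigh(M)
    order = np.argsort(mu)[::-1]
    mu, U = mu[order], U[:, order]
    # Step 5: phi_k = sum_j U[j, k] xi_j; row x of Phi is phi_k on cell x
    Phi = Xi @ U
    return mu, Phi, p, C

# Triangle {1, 2, 3} plus pendant vertex 4 attached to 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
mu, Phi, p, C = graph_kl_basis(A)
```

With this basis the constant function lies in the null space of the operator (because the integral of C(u, v; G) - 1 over v is zero for every u), so one μ is numerically zero; this is consistent with Step 5 keeping n-1 functions.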

4. GraField: Some Insights and Properties
1. GraField is a non-negative piecewise-constant kernel satisfying
$$\iint_{[0,1]^2} C(u,v;G)\,du\,dv = \sum_{(i,j) \in \{1,\dots,n\}^2} \iint C(u,v;G)\, I_{ij}(u,v)\,du\,dv = 1,$$
where, with $F(0;X) = F(0;Y) = 0$,
$$I_{ij}(u,v) = \begin{cases} 1, & \text{if } (u,v) \in (F(i-1;X), F(i;X)] \times (F(j-1;Y), F(j;Y)], \\ 0, & \text{elsewhere.} \end{cases}$$
2. In the continuum limit (as the size of the graph $n \to \infty$), the piecewise-constant discrete kernel C approaches a continuous field over the unit square.
3. The slices of the GraField kernel can be expressed in the vertex domain as $p(y \mid x; G)/p(y; G)$, suggesting a connection with the random walk on the graph: interpret $p(y \mid x; G)$ as the transition probability from vertex x to vertex y, and $p(y; G)$ as the stationary distribution (to which the walk converges when G is connected and non-bipartite).
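Properties 1 and 3 are easy to check numerically. The sketch below uses an arbitrary connected, non-bipartite example graph and confirms that the cell values of C integrate to one, that each slice equals p(y | x; G)/p(y; G) with T = D^{-1}A, and that p(y; G) = d_y/N is stationary for the walk; the power-iteration check at the end is our own addition.

```python
import numpy as np

# Triangle {1, 2, 3} plus pendant vertex 4 attached to 3 (connected, non-bipartite)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
N = d.sum()
p = d / N                        # p(y; G) = d_y / N
T = A / d[:, None]               # T = D^{-1} A, so T[x, y] = p(y | x; G)
C = (A / N) / np.outer(p, p)     # GraField cell values, Eq. (1)

# Property 1: the kernel integrates to one (cell (x, y) has area p(x) p(y))
total = (C * np.outer(p, p)).sum()
# Property 3: each slice C[x, :] equals p(y | x; G) / p(y; G)
slices_match = np.allclose(C, T / p[None, :])
# p is invariant under the walk: p^T T = p^T
is_stationary = np.allclose(p @ T, p)
# For this connected, non-bipartite graph the walk also converges to p
p_limit = np.full(len(p), 1.0 / len(p)) @ np.linalg.matrix_power(T, 200)
```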

5. Karhunen-Loève (KL) Representation of a Graph
We define the Karhunen-Loève (KL) representation of a graph G through the spectral expansion of its GCD function $C(u,v;G)$. The pioneering work of Schmidt (1907) guarantees the existence of the following decomposition for undirected graphs.
Theorem 1. The $L^2$ GCD bivariate kernel $C: [0,1]^2 \to \mathbb{R}_+ \cup \{0\}$ admits the canonical representation
$$C(u,v;G_n) = 1 + \sum_{k=1}^{n} \lambda_k \phi_k(u)\phi_k(v), \tag{3}$$
where the non-negative $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$ are the singular values and $\{\phi_k\}_{k \ge 1}$ are the orthonormal singular functions, $\langle \phi_j, \phi_k \rangle_{L^2[0,1]} = \delta_{jk}$ for $j, k = 1, \dots, n$, which can be evaluated as the solution of the integral equation
$$\int_{[0,1]} [C(u,v;G) - 1]\,\phi_k(v)\,dv = \lambda_k \phi_k(u), \qquad k = 1, 2, \dots, n. \tag{4}$$
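Because the GraField kernel is piecewise constant, Eq. (4) can be checked numerically. The sketch below (graph, grid size m and variable names are our own choices) discretizes the integral operator on a uniform grid of midpoints (a Nyström-type rule), compares its leading eigenvalues with those of the n×n matrix D_p^{1/2}(C - 1)D_p^{1/2}, and rebuilds C from the expansion (3). It uses a symmetric eigendecomposition, so the λ_k it reports may be negative; under that reading the singular values of the theorem would be their absolute values.

```python
import numpy as np

# Example graph (our choice): 4-cycle 1-2-3-4 plus the chord 1-3.
# Degrees are (3, 2, 3, 2) and N = 10, so each cell width p(x) is a multiple of 0.1.
A = np.zeros((4, 4))
for x, y in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[x, y] = A[y, x] = 1.0
N = A.sum()
p = A.sum(axis=1) / N
C = (A / N) / np.outer(p, p)             # GraField cell values, Eq. (1)

# Nystrom discretization of Eq. (4): m midpoints with equal weights 1/m.
# With m a multiple of 10 no cell boundary splits a grid cell, so the rule is exact.
m = 200
u = (np.arange(m) + 0.5) / m
cell = np.searchsorted(np.cumsum(p), u)  # index of the vertex cell containing u
K = C[np.ix_(cell, cell)] - 1.0          # kernel C(u, v) - 1 on the grid
lam_grid = np.linalg.eigvalsh(K / m)

# n x n counterpart: M = D_p^{1/2} (C - 1) D_p^{1/2} = U diag(lam) U^T
sq = np.sqrt(p)
M = sq[:, None] * (C - 1.0) * sq[None, :]
lam, U = np.linalg.eigh(M)
Phi = U / sq[:, None]                    # phi_k on cell x: U[x, k] / sqrt(p(x))

# The grid operator has rank <= n, so its n largest-magnitude eigenvalues are lam
top = np.sort(lam_grid[np.argsort(-np.abs(lam_grid))[:len(p)]])
# Eq. (3): C = 1 + sum_k lam_k phi_k(u) phi_k(v), checked cell by cell
C_rebuilt = 1.0 + (Phi * lam) @ Phi.T
```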

6. Nonparametric Spectral Approximation
Definition 3 (the FUNDAMENTAL statistical modeling problem). We define a spectral graph learning algorithm as a method $\mathcal{A}$ of approximating the singular system $(\lambda_k, \phi_k)_{k \ge 1}$ that satisfies the integral equation (4):
$$\mathcal{A} \;\longrightarrow\; \big\{ (\hat\lambda_1, \hat\phi_1), \dots, (\hat\lambda_n, \hat\phi_n) \big\}.$$
Definition 4. Orthogonal series spectral approximation (SOS): approximate the unknown function $\phi_k$ by a linear combination of elements of a complete orthogonal system in $L^2[0,1]$. Let $\{\xi_k\}$ be such a complete basis defined on the unit interval $[0,1]$. Then each singular function $\phi_k$ can be expanded over this basis as
$$\phi_k(u) = \sum_j \alpha_{jk}\,\xi_j(u), \qquad u \in [0,1], \tag{5}$$
where the $\alpha_{jk}$ are unknown coefficients to be determined.

7. Connection with Laplacian Spectral Analysis
Degree-adaptive block-pulse basis functions. Define block-pulse basis functions (BPFs) on the non-uniform mesh $0 = u_0 < u_1 < \dots < u_n = 1$ over $[0,1]$, where $u_j = \sum_{x \le j} p(x;X)$, with local support
$$\xi_j(u) = \begin{cases} p^{-1/2}(j), & u_{j-1} < u \le u_j; \\ 0, & \text{elsewhere.} \end{cases} \tag{6}$$
They form a disjoint, orthonormal and complete set of functions satisfying
$$\int_0^1 \xi_j(u)\,du = p^{1/2}(j), \qquad \int_0^1 \xi_j^2(u)\,du = 1, \qquad \int_0^1 \xi_j(u)\,\xi_k(u)\,du = \delta_{jk}.$$
Theorem 2. With the block-pulse orthogonal series approximation (6), the Fourier coefficients $\{\alpha_{jk}\}$ that solve the integral equation (4) can equivalently be obtained in closed form from the matrix eigenvalue problem
$$\mathcal{L}[\alpha] = \lambda \alpha, \tag{7}$$
where $\mathcal{L} = L - uu^T$, $L$ is the Laplacian matrix, $u = D_p^{1/2} \mathbf{1}_n$, and $D_p = \operatorname{diag}(p(1), \dots, p(n))$.
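Theorem 2 can be checked numerically on a small graph. The sketch below is our illustration: it reads "the Laplacian matrix" as L = D^{-1/2} A D^{-1/2}, the first operator in the list of common shift operators, builds the block-pulse basis (6), verifies its moment identities, forms the orthogonal-series matrix of the operator in (4), and compares it with L - uu^T.

```python
import numpy as np

# Triangle {1, 2, 3} plus pendant vertex 4 attached to 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
N = d.sum()
p = d / N
C = (A / N) / np.outer(p, p)          # GraField cell values, Eq. (1)

# Block-pulse basis (6): column j is p(j)^{-1/2} on cell j and 0 elsewhere
Xi = np.diag(1.0 / np.sqrt(p))
W = np.diag(p)                        # cell widths u_j - u_{j-1} = p(j)
gram = Xi.T @ W @ Xi                  # int xi_j xi_k du, should be the identity
means = Xi.T @ p                      # int xi_j du, should be p(j)^{1/2}

# Orthogonal-series (Galerkin) matrix of the integral operator in Eq. (4)
M_series = Xi.T @ W @ (C - 1.0) @ W @ Xi

# Closed form of Theorem 2: L - u u^T with L = D^{-1/2} A D^{-1/2}, u = D_p^{1/2} 1_n
L = A / np.sqrt(np.outer(d, d))
u = np.sqrt(p)
M_closed = L - np.outer(u, u)

lam, alpha = np.linalg.eigh(M_closed)  # Eq. (7): columns of alpha hold the coefficients
```

Subtracting uu^T removes only the trivial eigenvalue 1 of L, whose unit eigenvector is u = D_p^{1/2} 1_n, and replaces it by 0; the rest of the spectrum is unchanged.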

8. Connection with Diffusion Map
Theorem 3. The empirical GraField admits the following vertex-domain spectral decomposition:
$$\frac{p(y \mid x; G)}{p(y; G)} = 1 + \sum_k \lambda_k \phi_k(x)\,\phi_k(y), \tag{8}$$
where $\phi_k = D_p^{-1/2} u_k$ ($u_k$ is the kth eigenvector of the Laplacian matrix L), $(\phi_k \circ F)(\cdot)$ is abbreviated as $\phi_k(\cdot)$, $p(y \mid x; G) = T(x,y)$, and $T = D^{-1}A$ is the transition matrix of a random walk on G with stationary distribution $p(y; G) = d_y/N$.
NOTE: Since $\{\phi_k\}_{k=1}^{n-1}$ approximate the optimal Karhunen-Loève representation basis, it is natural to use them for non-linear embedding of graphs. Our approach provides additional insight into, and justification for, the diffusion coordinates by interpreting them as the strength of the connectivity profile of each vertex.
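A minimal sketch of the embedding suggested in the NOTE, assuming a connected graph and reading L as D^{-1/2} A D^{-1/2}. The helper `diffusion_coordinates`, the number of components and the diffusion time t are our own illustrative choices. In the check, the trivial pair (λ = 1, constant φ) supplies the leading 1 in Eq. (8), and each φ_k is a right eigenvector of T, which is what diffusion maps use.

```python
import numpy as np

def diffusion_coordinates(A, n_components=2, t=1):
    """Vertex embedding from phi_k = D_p^{-1/2} u_k (hypothetical helper).

    Eigenpairs of L = D^{-1/2} A D^{-1/2} are sorted by decreasing eigenvalue; for a
    connected graph the first is trivial (lambda = 1, constant phi) and is skipped.
    Coordinates are scaled by lambda_k^t, as in diffusion maps.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    p = d / d.sum()
    L = A / np.sqrt(np.outer(d, d))
    lam, U = np.linalg.eigh(L)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    Phi = U / np.sqrt(p)[:, None]     # phi_k(x) = u_k(x) / sqrt(p(x))
    emb = Phi[:, 1:1 + n_components] * lam[1:1 + n_components] ** t
    return lam, Phi, emb

# Triangle {1, 2, 3} plus pendant vertex 4 attached to 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lam, Phi, emb = diffusion_coordinates(A)
d = A.sum(axis=1)
p = d / d.sum()
T = A / d[:, None]                    # random-walk transition matrix D^{-1} A
```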

9. Connection with Modularity Spectral Analysis
Theorem 4. To approximate the KL graph basis $\phi_k = \sum_j \alpha_{jk}\xi_j$, choose $\xi_j(u) = \mathbb{I}(u_{j-1} < u \le u_j)$ to be the characteristic function satisfying $\int_0^1 \xi_j(u)\,du = \int_0^1 \xi_j^2(u)\,du = p(j;G)$. Then the corresponding spectral estimating equation reduces to the following generalized eigenvalue equation in terms of the matrix $B = A - N^{-1} d d^T$:
$$B\alpha = \lambda D\alpha, \tag{9}$$
where D is the degree matrix.
NOTE: The matrix B, known as the modularity matrix, was introduced by Newman (2006) from an entirely different motivation.
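A minimal sketch of solving Eq. (9), assuming D is the degree matrix and every degree is positive. Instead of a dedicated generalized eigensolver, it symmetrizes with D^{-1/2}; in this form the problem matrix is exactly D^{-1/2} B D^{-1/2} = L - uu^T with u = D_p^{1/2} 1_n, so its eigenvalues agree with those of the block-pulse problem (7). The two-triangle graph and the sign-based split are our own illustration.

```python
import numpy as np

# Example (our choice): two triangles {1,2,3} and {4,5,6} joined by the edge 3-4
A = np.zeros((6, 6))
for x, y in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[x, y] = A[y, x] = 1.0
d = A.sum(axis=1)
N = d.sum()
B = A - np.outer(d, d) / N             # modularity matrix in Eq. (9)
D = np.diag(d)

# Solve B alpha = lambda D alpha by symmetrizing: with s = D^{-1/2},
# (s B s) beta = lambda beta and alpha = s beta
s = 1.0 / np.sqrt(d)
lam, beta = np.linalg.eigh(s[:, None] * B * s[None, :])
alpha = s[:, None] * beta              # generalized eigenvectors, ascending lam

# The sign pattern of the leading generalized eigenvector splits the vertices in two
labels = alpha[:, -1] > 0
```

The sign-based split mirrors Newman's leading-eigenvector approach, although that method uses the ordinary leading eigenvector of B; on this graph it separates the two triangles.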

Spectral Regularization and High-Dimensional Discrete Parameter Space
1. Recall that the amplitude of the top-hat indicator basis functions $\{\xi_k\}$ depends on the unknown distribution $p(x;G)$ for $x \in [n]$.
2. The MLE of the unknown distribution $(p_1, p_2, \dots, p_n)$ (support size = size of the graph = n) is known to be extremely inefficient when $N/n = O(1)$, i.e., for large sparse graphs in which n and N are of comparable order.
3. A simple and remarkably serviceable solution is Laplace (additive) smoothing:
Raw empirical MLE: $\hat p(j;G) = d_j/N$;
Smoothed Laplace estimate: $\hat p_\tau(j;G) = \dfrac{d_j + \tau}{N + n\tau}$, $j = 1, \dots, n$, with $\tau = 1$ (Laplace estimator) or $\tau = 1/2$ (Krichevsky-Trofimov estimator).
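The two estimators differ only in τ. A minimal sketch with a hypothetical sparse degree sequence that contains an isolated vertex, where the raw MLE assigns zero mass and the block-pulse amplitude p^{-1/2}(j) breaks down; `degree_mass` is a hypothetical helper.

```python
import numpy as np

def degree_mass(d, tau=0.0):
    """Estimate p(j; G) from the degree sequence by additive smoothing.

    tau = 0 gives the raw MLE d_j / N, tau = 1 the Laplace estimator and
    tau = 1/2 the Krichevsky-Trofimov estimator: (d_j + tau) / (N + n tau).
    """
    d = np.asarray(d, dtype=float)
    return (d + tau) / (d.sum() + len(d) * tau)

# Hypothetical sparse degree sequence with an isolated vertex (N = 10, n = 5)
d = np.array([0, 1, 1, 5, 3])
p_mle = degree_mass(d)           # zero mass for vertex 1, so p^{-1/2}(1) is undefined
p_laplace = degree_mass(d, 1.0)  # every vertex gets positive mass
p_kt = degree_mass(d, 0.5)
```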

10. Connection with Type-I Regularized Laplacian
τ-regularized indicator basis. Construct the τ-regularized top-hat indicator basis $\xi_{j;\tau}$ by replacing the amplitude $p^{-1/2}(j)$ with $p_\tau^{-1/2}(j)$.
Theorem 5. The τ-regularized block-pulse series spectral approximation scheme is equivalent to representing, or embedding, discrete graphs in the continuous eigenspace of the
$$\text{Type-I regularized Laplacian} = D_\tau^{-1/2} A D_\tau^{-1/2}, \tag{10}$$
where $D_\tau$ is a diagonal matrix with ith entry $d_i + \tau$.
More surprise: this EXACT regularized Laplacian formula was proposed by Chaudhuri et al. (2012) and Qin and Rohe (2013).
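Theorem 5 states the equivalence without spelling out the algebra. The sketch below is our own reading of it, not the author's derivation: keeping the block-pulse cells of width p(j) and using amplitudes p_τ^{-1/2}(j), it checks numerically that the resulting moment matrix equals the Type-I regularized Laplacian rescaled by c = (N + nτ)/N minus a rank-one term. The example graph (with an isolated vertex) and τ = 1 are our choices; both matrices are computed without dividing by the zero degree.

```python
import numpy as np

# Example (our choice): triangle {1,2,3}, pendant 4 attached to 3, isolated vertex 5
A = np.zeros((5, 5))
for x, y in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[x, y] = A[y, x] = 1.0
tau = 1.0
d = A.sum(axis=1)
N, n = d.sum(), len(d)
p = d / N                                # raw mass; p(5) = 0
p_tau = (d + tau) / (N + n * tau)        # Laplace-smoothed mass

# Moment matrix with tau-regularized amplitudes p_tau^{-1/2}(j) on the block-pulse cells.
# On cell (j, k): p(j) p(k) (C_jk - 1) = p(j, k) - p(j) p(k), so no division by p is needed.
s = 1.0 / np.sqrt(p_tau)
M_tau = s[:, None] * (A / N - np.outer(p, p)) * s[None, :]

# Type-I regularized Laplacian, Eq. (10), with D_tau = diag(d_i + tau)
r = 1.0 / np.sqrt(d + tau)
L_tau = r[:, None] * A * r[None, :]

# Under these assumptions M_tau = c * L_tau - v v^T: a rescaled L_tau plus a rank-one term
c = (N + n * tau) / N
v = p * s
```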

11. Connection with Type-II Regularized Laplacian
Theorem 6. Estimate the joint probability $p(j,k;G)$ by extending the univariate formula to the two-dimensional case as follows:
$$p_\tau(j,k;G) = \frac{N}{N+n\tau}\, p(j,k;G) + \frac{n\tau}{N+n\tau}\,\frac{1}{n^2}, \tag{11}$$
which is equivalent to replacing the original adjacency matrix by $A_\tau = A + (\tau/n)\mathbf{1}\mathbf{1}^T$. Note that this strategy automatically produces the Laplace-smoothed marginals. One can verify that this leads to the spectral graph matrix
$$\text{Type-II regularized Laplacian} = D_\tau^{-1/2} A_\tau D_\tau^{-1/2}. \tag{12}$$
Surprise: this is IDENTICAL to the proposal of Amini et al. (2013), and it thus provides a more FUNDAMENTAL understanding of spectral regularization (as a consequence of high-dimensional discrete data smoothing), which was previously regarded as an empirical, fine-tuned guess.
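Theorem 6 can be verified numerically. The sketch below (example graph and τ are our own choices) forms A_τ, checks Eq. (11) entrywise and the Laplace-smoothed marginals, and builds the Type-II regularized Laplacian (12) with D_τ taken as the degree matrix of A_τ, whose diagonal is d_i + τ. The example includes an isolated vertex, which A_τ connects to the rest of the graph.

```python
import numpy as np

# Example (our choice): triangle {1,2,3}, pendant 4 attached to 3, isolated vertex 5
A = np.zeros((5, 5))
for x, y in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[x, y] = A[y, x] = 1.0
tau = 0.5
d = A.sum(axis=1)
N, n = d.sum(), len(d)

A_tau = A + (tau / n) * np.ones((n, n))  # A_tau = A + (tau/n) 1 1^T
p_joint = A / N                          # p(j, k; G)
p_joint_tau = A_tau / A_tau.sum()        # smoothed joint mass; A_tau.sum() = N + n tau

# Right-hand side of Eq. (11)
rhs = N / (N + n * tau) * p_joint + (n * tau) / (N + n * tau) / n**2

# Marginals of the smoothed joint mass are the Laplace-smoothed p_tau(j)
marginals = p_joint_tau.sum(axis=1)

# Type-II regularized Laplacian, Eq. (12): D_tau holds the degrees of A_tau, d_i + tau
r = 1.0 / np.sqrt(A_tau.sum(axis=1))
L_tau2 = r[:, None] * A_tau * r[None, :]
```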

Final Remark: Nonparametric Spectral Analysis of Graphs
The gist: spectral graph analysis can be transformed into the following canonical graph learning problem, namely a method of obtaining approximate Karhunen-Loève basis functions of the GraField $C(u,v;G)$ via orthogonal series expansion, by solving a graph co-moment based estimating equation.
This technique serves as a general-purpose unified framework for understanding spectral graph theory (SGT) from a statistical perspective. Our formalism extracts ALL the pieces of SGT in a coherent manner (by revealing the underlying connections), pieces that were discovered by many researchers using different approaches and reasoning.
Our statistical viewpoint allows generalization and may inspire new computational methods needed to handle large-scale discrete graph problems.
Thanks.
