LAPLACIAN MATRIX LEARNING FOR SMOOTH GRAPH SIGNAL REPRESENTATION


Xiaowen Dong, Dorina Thanou, Pascal Frossard and Pierre Vandergheynst
Media Lab, MIT, USA — xdong@mit.edu
Signal Processing Laboratories, EPFL, Switzerland — {dorina.thanou, pascal.frossard, pierre.vandergheynst}@epfl.ch

ABSTRACT

The construction of a meaningful graph plays a crucial role in the emerging field of signal processing on graphs. In this paper, we address the problem of learning graph Laplacians, which is similar to learning graph topologies, such that the input data form graph signals with smooth variations on the resulting topology. We adopt a factor analysis model for the graph signals and impose a Gaussian probabilistic prior on the latent variables that control these graph signals. We show that the Gaussian prior leads to an efficient representation that favours the smoothness property of the graph signals, and propose an algorithm for learning graphs that enforces such a property. Experiments demonstrate that the proposed framework can efficiently infer meaningful graph topologies from only the signal observations.

Index Terms— Graph learning, graph signal processing, representation theory, factor analysis, Gaussian prior.

1. INTRODUCTION

Modern data processing tasks often manipulate structured data, where signal values are defined on the vertex set V of a weighted and undirected graph G. We refer to such data as graph signals. Due to the irregular structure of the graph domain, processing these signals is a challenging task that combines tools from algebraic and spectral graph theory with computational harmonic analysis [1, 2]. Currently, most of the research effort in the emerging field of signal processing on graphs has been devoted to the analysis and processing of the graph signals in both the vertex and the spectral domain of the graph. The graph, however, which is crucial for the successful processing of these signals, is considered to be known a priori or naturally chosen from the application domain. However, there are cases where a good graph is not readily available. It is therefore desirable in these situations to learn the graph topology from the observed data such that it captures the intrinsic relationships between the entities. This is exactly the motivation and objective of this paper.

The key challenge in the problem of graph learning is to choose some meaningful criteria to evaluate the relationships between the signals and the graph topology. In this paper, we are interested in a family of signals that are smooth on a graph. Given a set of signals X = {x_i}_{i=1}^p, x_i ∈ R^n, defined on a weighted and undirected graph G of n vertices, we would like to infer an optimal topology of G, namely, its edges and the associated weights, which results in the smoothness of these signals on that graph. More precisely, we want to find an optimal Laplacian matrix for the graph G from the signal observations. (This work was done while the first author was at EPFL. It was partially supported by the LOGAN project funded by the Hasler Foundation, Switzerland.)

We define the relationship between signals and graphs by revisiting representation learning theory [3]. Specifically, we consider a factor analysis model for the graph signals, and impose a Gaussian prior on the latent variables that control the observed signals. The transformation from the latent variables to the observed signals involves information about the topology of the graph. As a result, we can define joint properties between the signals and the graph, such that the signal representation is consistent with the Gaussian prior.
We then propose an algorithm for graph learning that favours signal representations which are smooth and consistent with the statistical prior defined on the data. Specifically, given the input signal observations, our algorithm iterates between the updates of the graph Laplacian and the signal estimates whose variations on the learned graph are minimized upon convergence. We test our graph learning algorithm on synthetic data, where we show that it efficiently infers the topology of the groundtruth graphs by recovering the correct edge positions. We further demonstrate the meaningfulness of the proposed framework on some meteorological signals, where we exploit the spectral properties of the learned graph for clustering its nodes through spectral clustering [4].

The proposed framework is one of the first rigorous frameworks to solve the challenging problem of graph learning in graph signal processing. It provides new insights into the understanding of the interactions between signals and graphs, which could be beneficial in many real world applications, such as the analysis of transportation, biomedical, and social networks. Finally, it is important to notice that the objective of our graph learning problem is to infer a graph Laplacian operator that can be used for analysing or processing graph signals of the same class as the training signals. This is clearly different from the objective of frameworks for learning Gaussian graphical models [5, 6, 7] proposed in machine learning, where the estimated inverse covariance matrix only represents the conditional dependence structure between the random variables, and cannot be used directly for forming graph signals of given properties. (Although the work in [7] does learn a valid graph topology, their method is essentially similar to the classical approach for sparse inverse covariance estimation, but with a regularized Laplacian matrix.)

2. FACTOR ANALYSIS FRAMEWORK

We consider the factor analysis [8, 9] model as our signal model, which is a generic linear statistical model that tries to explain observations of a given dimension with a potentially smaller number of unobserved latent variables. Such latent variables usually obey given probabilistic priors and lead to effective signal representations in the graph signal processing setting, as we show next.

We start with the definition of the Laplacian matrix of a graph G. The unnormalized (or combinatorial) graph Laplacian matrix L is defined as L = D − W, where D is the degree matrix that contains the degrees of the vertices along the diagonal, and W is the adjacency matrix of G. Since L is a real and symmetric matrix, it can be decomposed as L = χΛχ^T, where χ is the complete set of orthonormal eigenvectors and Λ is the diagonal eigenvalue matrix where the eigenvalues are sorted in increasing order. The smallest eigenvalue is 0 with a multiplicity equal to the number of connected components of the graph [10]. We consider the following model:

x = χh + u_x + ε,   (1)

where x ∈ R^n represents the observed graph signal, h ∈ R^n represents the latent variable that controls the graph signal x, χ is the representation matrix that linearly relates the two random variables, u_x ∈ R^n is the mean of x, and ε is a multivariate Gaussian noise with mean zero and covariance σ_ε^2 I_n. The probability density function of ε is given by:

p(ε) ~ N(0, σ_ε^2 I_n).   (2)

Moreover, we impose a Gaussian prior on the latent variable h. Specifically, we assume that the latent variable h follows a degenerate zero-mean multivariate Gaussian distribution with precision matrix defined as the eigenvalue matrix Λ of the graph Laplacian L:

p(h) ~ N(0, Λ†),   (3)

where Λ† is the Moore-Penrose pseudoinverse of Λ. The conditional probability of x given h, and the probability of x, are respectively given as:

p(x | h) ~ N(χh + u_x, σ_ε^2 I_n),   (4)
p(x) ~ N(u_x, L† + σ_ε^2 I_n),   (5)

where we have used in Eq. (5) the fact that the pseudoinverse of L, L†, admits the eigendecomposition L† = χΛ†χ^T.

The representation in Eq. (1) leads to smoothness properties for the signal on the graph. To see this, recall that the latent variable h explains the graph signal x through the representation matrix χ, namely, the eigenvector matrix of the graph Laplacian. Given the observation x and the multivariate Gaussian prior distribution of h in Eq. (3), we are thus interested in a maximum a posteriori (MAP) estimate of h. Specifically, by applying Bayes' rule and assuming without loss of generality that u_x = 0, the MAP estimate of the latent variable h can be written as follows [11]:

h_MAP(x) := argmax_h p(h | x) = argmax_h p(x | h) p(h) = argmin_h ( −log p_E(x − χh) − log p_H(h) ).   (6)

From the probability distributions shown in Eq. (2) and Eq. (3), the above MAP estimate of Eq. (6) can be expressed as:

h_MAP(x) = argmin_h ||x − χh||_2^2 + α h^T Λ h,   (7)

where α is some constant parameter. In a noise-free scenario where x = χh, Eq. (7) corresponds to minimizing the following quantity:

h^T Λ h = (χ^T x)^T Λ (χ^T x) = x^T χΛχ^T x = x^T L x.   (8)

The Laplacian quadratic term in Eq. (8) is usually considered as a measure of smoothness of the signal x on G [12]. Therefore, we see that in the factor analysis model in Eq. (1), a Gaussian prior in Eq. (3) imposed on the latent variable h leads to smoothness properties for the graph signal. Similar observations can be made in a noisy scenario, where the main component of the signal x, namely χh, is smooth on the graph. We are going to make use of the above observations in our graph learning algorithm in the following section.
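For illustration, the following minimal numpy sketch builds a combinatorial Laplacian for a small toy graph, draws a signal according to the model of Eqs. (1)-(5) with u_x = 0, and evaluates the smoothness measure x^T L x of Eq. (8); the example graph, seed and variable names are illustrative assumptions rather than part of the original experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small weighted, undirected graph given by its adjacency matrix W.
W = np.array([[0.0, 1.0, 0.8, 0.0],
              [1.0, 0.0, 0.0, 0.5],
              [0.8, 0.0, 0.0, 1.0],
              [0.0, 0.5, 1.0, 0.0]])
D = np.diag(W.sum(axis=1))          # degree matrix
L = D - W                           # combinatorial graph Laplacian, L = D - W

# Eigendecomposition L = chi @ Lambda @ chi.T, eigenvalues in increasing order.
eigvals, chi = np.linalg.eigh(L)

# Gaussian prior of Eq. (3): h ~ N(0, pinv(Lambda)), i.e. variance 1/lambda_k
# along each eigenvector with lambda_k > 0, and no energy on the null space.
sigma_eps = 0.1
inv_vals = np.where(eigvals > 1e-10, 1.0 / np.maximum(eigvals, 1e-10), 0.0)
h = rng.normal(size=eigvals.shape) * np.sqrt(inv_vals)

# Observation model of Eq. (1) with u_x = 0: x = chi h + eps.
x = chi @ h + sigma_eps * rng.normal(size=W.shape[0])

# Laplacian quadratic form of Eq. (8): small values indicate a smooth signal.
print("smoothness x^T L x =", float(x @ L @ x))
```

Signals generated this way concentrate their energy on the eigenvectors of L associated with small eigenvalues, which is precisely the smoothness property that the learning algorithm of the next section enforces.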
3. LEARNING GRAPH LAPLACIAN UNDER SIGNAL SMOOTHNESS PRIOR

As shown above, given a Gaussian prior in the factor analysis model of the graph signals, the MAP estimate of h in Eq. (7) implies that the signal observations form smooth graph signals. Specifically, notice in Eq. (7) that both the representation matrix χ and the precision matrix Λ of the Gaussian prior distribution imposed on h come from the graph Laplacian L. They respectively represent the eigenvector and eigenvalue matrices of L. When the graph is unknown, we can therefore have the following joint optimization problem of χ, Λ and h in order to infer the graph topology:

min_{χ,Λ,h}  ||x − χh||_2^2 + α h^T Λ h.   (9)

Eq. (9) can be simplified with the change of variable y = χh to:

min_{L,y}  ||x − y||_2^2 + α y^T L y.   (10)

According to the factor analysis model in Eq. (1), y can be considered as a noiseless version of the zero-mean observation x. Due to the properties of the graph Laplacian L, the quadratic form y^T L y in Eq. (10) is usually considered as a measure of smoothness of the signal y on G. Solving the problem of Eq. (10) is thus equivalent to finding jointly the Laplacian L (which is equivalent to the topology of the graph) and the signal y that is close to the observation x and at the same time smooth on the learned graph G. As a result, it enforces the smoothness property of the observed signals on the learned graph. We propose to solve the optimization problem of Eq. (10) with the following objective function given in matrix form:

min_{L ∈ R^{n×n}, Y ∈ R^{n×p}}  ||X − Y||_F^2 + α tr(Y^T L Y) + β ||L||_F^2,
s.t.  tr(L) = n,  L_ij = L_ji ≤ 0 for i ≠ j,  L·1 = 0,   (11)

where X ∈ R^{n×p} contains the p input data samples {x_i}_{i=1}^p as columns, α and β are two positive regularization parameters, and 1 and 0 denote the constant one and zero vectors. The first constraint (the trace constraint) in Eq. (11) permits to avoid trivial solutions, and the second and third constraints guarantee that the learned L is a valid Laplacian matrix. The latter is particularly important for two reasons: (i) only a valid Laplacian matrix can lead to the interpretation of the input data as smooth graph signals; (ii) a valid Laplacian allows us to define notions of frequencies in the irregular graph domain, and to successfully use already existing signal processing tools on graphs [1]. Furthermore, under the latter constraints, the trace constraint essentially fixes the L_1-norm of L, while the Frobenius norm is added as a penalty term in the objective function to control the distribution of the off-diagonal entries in L, namely, the edge weights of the learned graph.

The optimization problem of Eq. (11) is not jointly convex in L and Y.
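To make the matrix form of Eq. (11) concrete, a short numpy sketch of the objective and of the constraint set is given below; the function names are hypothetical and only serve to spell out what a "valid Laplacian" means in this formulation.

```python
import numpy as np

def objective(X, Y, L, alpha, beta):
    """Objective of Eq. (11): ||X - Y||_F^2 + alpha tr(Y^T L Y) + beta ||L||_F^2."""
    return (np.linalg.norm(X - Y, "fro") ** 2
            + alpha * np.trace(Y.T @ L @ Y)
            + beta * np.linalg.norm(L, "fro") ** 2)

def is_valid_laplacian(L, n, tol=1e-8):
    """Constraints of Eq. (11): tr(L) = n, symmetric with non-positive
    off-diagonal entries, and rows summing to zero (L 1 = 0)."""
    off_diag = L - np.diag(np.diag(L))
    return (abs(np.trace(L) - n) < tol
            and np.allclose(L, L.T, atol=tol)
            and np.all(off_diag <= tol)
            and np.allclose(L @ np.ones(L.shape[0]), 0.0, atol=tol))
```

Any matrix passing is_valid_laplacian can be interpreted as a combinatorial graph Laplacian, which is what allows the learned L to be reused directly by standard signal processing tools on graphs [1].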

Therefore, we adopt an alternating optimization scheme where, at each step, we fix one variable and solve for the other variable. Specifically, at the first step, for a given Y (which at the first iteration is initialized as the input X), we solve the following optimization problem with respect to L:

min_L  α tr(Y^T L Y) + β ||L||_F^2,
s.t.  tr(L) = n,  L_ij = L_ji ≤ 0 for i ≠ j,  L·1 = 0.   (12)

At the second step, L is fixed and we solve the following optimization problem with respect to Y:

min_Y  ||X − Y||_F^2 + α tr(Y^T L Y).   (13)

Both Eq. (12) and Eq. (13) can be cast as convex optimization problems. The first one is a quadratic program that can be solved efficiently with state-of-the-art convex optimization packages, while the second one has a closed form solution. A detailed description of how to solve these two problems is presented in [13]. We then alternate between these two steps to get the final solution to the problem of Eq. (11), and we generally observe convergence to a local minimum within a few iterations.

We finally remark that the proposed learning framework has some similarity with the one in [14], where the authors have proposed a similar objective to the one in Eq. (11), based on a smoothness or fitness metric of the signals on graphs. However, we rather take here a probabilistic approach that is analogous to the one in the traditional signal representation setting with the factor analysis model. This gives us an extra data fitting term ||X − Y||_F^2 in the objective function of the optimization problem of Eq. (11). In practice, when the power of the Laplacian is chosen to be 1, the problem in [14] corresponds to finding the solution to a single instance of the problem of Eq. (12) by assuming that X = Y.
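A compact sketch of this alternating scheme follows. It is not the reference implementation (the experiments in Section 4 use CVX for Matlab [15, 16]); as an assumption, the L-step of Eq. (12) is written with the cvxpy package, and the Y-step of Eq. (13) uses its closed-form solution Y = (I_n + αL)^{-1} X, obtained by setting the gradient of Eq. (13) to zero.

```python
import numpy as np
import cvxpy as cp

def laplacian_step(Y, alpha, beta):
    """L-step, Eq. (12): quadratic program over valid Laplacian matrices."""
    n = Y.shape[0]
    L = cp.Variable((n, n), symmetric=True)
    # alpha tr(Y^T L Y) = alpha <L, Y Y^T>, plus beta ||L||_F^2.
    objective = cp.Minimize(alpha * cp.sum(cp.multiply(L, Y @ Y.T))
                            + beta * cp.sum_squares(L))
    constraints = [cp.trace(L) == n,               # fixes the scale of L
                   L @ np.ones(n) == 0,            # rows sum to zero (L 1 = 0)
                   L - cp.diag(cp.diag(L)) <= 0]   # non-positive off-diagonal entries
    cp.Problem(objective, constraints).solve()
    return L.value

def signal_step(X, L, alpha):
    """Y-step, Eq. (13): closed-form solution Y = (I + alpha L)^{-1} X."""
    n = X.shape[0]
    return np.linalg.solve(np.eye(n) + alpha * L, X)

def gl_sigrep(X, alpha, beta, n_iter=20):
    """Alternate between the two steps, starting from Y = X."""
    Y = X.copy()
    for _ in range(n_iter):
        L = laplacian_step(Y, alpha, beta)
        Y = signal_step(X, L, alpha)
    return L, Y
```

The L-step dominates the computational cost; [13] discusses both subproblems in detail.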
4. EXPERIMENTS

4.1. Experimental settings

We denote the proposed algorithm as GL-SigRep and test its performance by comparing the graph learned from sets of synthetic or real world observations to the groundtruth graph. We provide both visual and quantitative comparisons, where we compare the existence of edges in the learned graph to the ones of the groundtruth graph. In our experiments, we solve the optimization of Eq. (12) using the convex optimization package CVX [15, 16]. The experiments are carried out on different sets of parameters, namely, for different values of α and β in Eq. (11). Finally, we prune insignificant edges that have a weight smaller than 10^-4 in the learned graph.

We compare the proposed graph learning framework to a state-of-the-art approach for estimating a sparse inverse covariance matrix for a Gaussian Markov Random Field (GMRF). Specifically, the works in [5, 6] propose to solve the following L_1-regularized log-determinant program:

min_{L_pre ∈ R^{n×n}}  tr(S L_pre) − log det(L_pre) + λ ||L_pre||_1,   (14)

where L_pre is the inverse covariance matrix (or precision matrix) to estimate, S = XX^T is the sample covariance matrix, λ is a regularization parameter, det(·) denotes the determinant, and ||·||_1 denotes the L_1-norm. The problem of Eq. (14) is conceptually similar to the problem of Eq. (11), in the sense that both can be interpreted as estimating the precision matrix of a multivariate Gaussian distribution. An important difference is however that the precision matrix in our framework is a valid graph Laplacian, while the one in Eq. (14) is not. Therefore, L_pre cannot be interpreted as a graph topology for defining graph signals; it rather only reflects the partial correlations between the random variables that control the observations. As a result, the learning of L_pre is not directly linked to the desired properties of the input graph signals. In our experiments, we solve the L_1-regularized log-determinant program of Eq. (14) with the ADMM [17]. We denote this algorithm as GL-LogDet. We test GL-LogDet based on different choices of the parameter λ in Eq. (14). In the evaluation, all the off-diagonal non-zero entries whose absolute values are above the threshold of 10^-4 are considered as valid correlations. These correlations are then considered as learned edges and compared against the edges in the groundtruth graph for performance evaluation.

4.2. Results on synthetic data

We first carry out experiments on a synthetic graph of 20 vertices. More specifically, we generate the coordinates of the vertices uniformly at random in the unit square, and compute the edge weights between every pair of vertices using the Euclidean distances between them and a Gaussian radial basis function (RBF): exp(−d(i, j)^2 / 2σ^2), with the width parameter σ = 0.5. We remove all the edges whose weights are smaller than 0.75. We then compute the graph Laplacian L and normalize the trace according to Eq. (11). Moreover, we generate 100 signals X = {x_i}_{i=1}^100 that follow the distribution shown in Eq. (5) with u_x = 0 and σ_ε = 0.5. We then apply GL-SigRep and GL-LogDet to learn the graph Laplacian or the precision matrix, respectively, given only X.

In Fig. 1, we show visually, from the left to the right columns, the Laplacian matrix of the groundtruth graph, the graph Laplacian learned by GL-SigRep, the precision matrix learned by GL-LogDet, and the sample covariance matrix S = XX^T, for one random instance of the Gaussian RBF graph. (These results are obtained based on the parameters, namely, α and β in GL-SigRep and λ in GL-LogDet, that lead to a similar number of edges as the ones in the groundtruth graph. The values of the sample covariance matrix are scaled before the visualization.) We see clearly that the graph Laplacian matrix learned by GL-SigRep is visually more consistent with the groundtruth data than the precision matrix learned by GL-LogDet and the sample covariance matrix.

Next, we evaluate quantitatively the performance of our graph learning algorithm in recovering the positions of the edges in the groundtruth, and we compare to that obtained by GL-LogDet. In Table 1, we show the best F-measure, Precision, Recall and Normalized Mutual Information (NMI) [18] scores achieved by the two algorithms, averaged over ten random instances of the Gaussian RBF graph with the associated signals X. Our algorithm clearly outperforms GL-LogDet in terms of all the evaluation criteria. Especially, GL-SigRep achieves an average F-measure score close to 0.9, which means that the learned graphs have topologies that are very similar to the groundtruth ones. Further discussions about the influence of the parameters in the algorithms, the number of training signals, and the noise level are presented in [13].
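The following sketch mirrors the spirit of this synthetic setup rather than the exact experimental code: it generates a Gaussian RBF graph as described above and draws smooth signals from the distribution of Eq. (5); the seed and helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_rbf_graph(n=20, sigma=0.5, threshold=0.75, rng=None):
    """Vertices uniform in the unit square, RBF edge weights, weak edges removed,
    and the Laplacian trace normalized to n (cf. the constraint in Eq. (11))."""
    rng = np.random.default_rng() if rng is None else rng
    coords = rng.uniform(size=(n, 2))
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    W[W < threshold] = 0.0
    L = np.diag(W.sum(axis=1)) - W
    return L * (n / np.trace(L))

def smooth_signals(L, p=100, sigma_eps=0.5, rng=None):
    """Draw p signals x ~ N(0, pinv(L) + sigma_eps^2 I), as in Eq. (5) with u_x = 0."""
    rng = np.random.default_rng() if rng is None else rng
    n = L.shape[0]
    cov = np.linalg.pinv(L) + sigma_eps ** 2 * np.eye(n)
    return rng.multivariate_normal(np.zeros(n), cov, size=p).T  # n x p data matrix X

rng = np.random.default_rng(42)
L_true = gaussian_rbf_graph(rng=rng)
X = smooth_signals(L_true, rng=rng)
```

The Laplacian learned from X, e.g. with the gl_sigrep sketch of Section 3, can then be compared edge-by-edge against L_true using the F-measure, Precision, Recall and NMI scores reported in Table 1.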

Fig. 1. The learned graph Laplacian or precision matrices for one random instance of the Gaussian RBF graph. From the left to the right columns: (a) the groundtruth Laplacian, (b) the Laplacian learned by GL-SigRep, (c) the precision matrix learned by GL-LogDet, and (d) the sample covariance.

Table 1. Performance comparison for GL-SigRep and GL-LogDet.

Algorithm   | F-measure | Precision | Recall | NMI
GL-SigRep   | 0.8803    | 0.8535    | 0.9108 | 0.5902
GL-LogDet   | 0.4379    | 0.2918    | 0.8851 | 0.0220

4.3. Learning meteorological graph from temperature data

We now test the proposed graph learning framework on real world data. Specifically, we consider the average monthly temperature data collected at 89 measuring stations in Switzerland during the period between 1981 and 2010. This leads to 12 signals (i.e., one per month), each of dimension 89, which correspond to the average temperatures at each of the measuring stations. By applying the proposed graph learning algorithm, we would like to infer a graph where stations with similar temperature evolutions across the year are connected. In other words, we aim at learning a graph on which the observed temperature signals are smooth. In this case, the natural choice of a geographical graph based on physical distances between the stations does not seem appropriate for representing the similarity of temperature values between these stations. Indeed, we observe that the evolution of temperatures at most of the stations follows very similar trends across the year and is thus highly correlated, regardless of the geographical distances between them. On the other hand, it turns out that altitude is a more reliable source of information to determine temperature evolutions. For instance, as we observed from the data, temperatures at two stations, Jungfraujoch and Piz Corvatsch, follow similar trends that are clearly different from other stations, possibly due to their similar altitudes (both are more than 3000 metres above sea level). Therefore, the goal of our experiment is then to hopefully learn a graph that reflects the altitude relationship between the stations given the observed temperature signals.

We verify our results by separating these measuring stations into disjoint clusters based on the graph learned by GL-SigRep, such that different clusters correspond to different characteristics of the stations. In particular, since the learned graph is a valid Laplacian, we can apply the spectral clustering algorithm [4] to partition the vertex set into two disjoint clusters. The results are shown in Fig. 2, where the red and blue dots represent two different clusters of stations. As we can see, the stations in the red cluster are mainly those built on the mountains, such as those in the Jura Mountains and the Alps, while the ones in the blue cluster are mainly stations in flat regions. It is especially interesting to notice that the blue stations in the Alps region (from the centre to the bottom right of the map) mainly lie in the valleys along main roads (such as those in the canton of Valais) or in the Lugano region. This shows that the obtained clusters indeed capture the altitude information of the measuring stations, and hence confirms the quality of the learned graph topology.

Fig. 2. Two clusters of the measuring stations obtained by applying spectral clustering to the learned graph. The red and blue clusters include stations at higher and lower altitudes, respectively.
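As a rough illustration of this verification step, the sketch below applies a basic spectral clustering to a learned Laplacian: the vertices are embedded with the eigenvectors of L associated with the smallest eigenvalues and then partitioned with k-means. The use of scikit-learn's KMeans is an assumed convenience; the paper does not prescribe a particular implementation of [4].

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clusters(L, k=2, seed=0):
    """Partition the vertices of a learned graph Laplacian L into k clusters.

    Uses the k eigenvectors of L with the smallest eigenvalues as a spectral
    embedding, then runs k-means on the embedded vertices.
    """
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    embedding = eigvecs[:, :k]             # one row per vertex
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
    return labels

# e.g., labels = spectral_clusters(L_learned, k=2) would split the 89 stations
# into the two groups (higher vs. lower altitude) discussed above.
```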
5. CONCLUSION

We have presented a framework for learning graph topologies from signal observations under the assumption that the resulting graph signals are smooth. The framework is based on the factor analysis model and leads to the learning of a valid graph Laplacian matrix that can be used for analysing and processing graph signals. We have demonstrated through experimental results the efficiency of our algorithm in inferring meaningful graph topologies. We believe that the proposed graph learning framework can open new perspectives in the field of signal processing on graphs and can also benefit applications where one is interested in exploiting spectral graph methods for processing data whose structure is not explicitly available.

6. REFERENCES

[1] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83-98, May 2013.

[2] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644-1656, Apr 2013.

[3] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, Aug 2013.

[4] A. Ng, M. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Advances in Neural Information Processing Systems 14 (NIPS), 2001, pp. 849-856.

[5] O. Banerjee, L. El Ghaoui, and A. d'Aspremont, "Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," Journal of Machine Learning Research, vol. 9, pp. 485-516, Jun 2008.

[6] J. Friedman, T. Hastie, and R. Tibshirani, "Sparse inverse covariance estimation with the graphical lasso," Biostatistics, vol. 9, no. 3, pp. 432-441, Jul 2008.

[7] B. Lake and J. Tenenbaum, "Discovering structure by learning sparse graph," in Proceedings of the 33rd Annual Cognitive Science Conference, 2010.

[8] D. J. Bartholomew, M. Knott, and I. Moustaki, Latent Variable Models and Factor Analysis: A Unified Approach, 3rd Edition, Wiley, Jul 2011.

[9] A. Basilevsky, Statistical Factor Analysis and Related Methods, Wiley, Jun 1994.

[10] F. R. K. Chung, Spectral Graph Theory, American Mathematical Society, 1997.

[11] R. Gribonval, "Should penalized least squares regression be interpreted as maximum a posteriori estimation?," IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2405-2410, May 2011.

[12] D. Zhou and B. Schölkopf, "A regularization framework for learning from graph data," in ICML Workshop on Statistical Relational Learning, 2004, pp. 132-137.

[13] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst, "Learning graphs from signal observations under smoothness prior," arXiv:1406.7842, 2014.

[14] C. Hu, L. Cheng, J. Sepulcre, G. El Fakhri, Y. M. Lu, and Q. Li, "A graph theoretical regression model for brain connectivity learning of Alzheimer's disease," in Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), 2013.

[15] M. Grant and S. Boyd, "CVX: Matlab software for disciplined convex programming, version 2.0 beta," http://cvxr.com/cvx, September 2013.

[16] M. Grant and S. Boyd, "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control, V. Blondel, S. Boyd, and H. Kimura, Eds., Lecture Notes in Control and Information Sciences, pp. 95-110, Springer-Verlag Limited, 2008, http://stanford.edu/~boyd/graph_dcp.html.

[17] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.

[18] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.