Supplement to Graph Sparsification Approaches for Laplacian Smoothing

Size: px

Start display at page:

Download "Supplement to Graph Sparsification Approaches for Laplacian Smoothing"

Hugo Stone
5 years ago
Views:

1 Supplemet to Graph Sparsificatio Approaches for Laplacia Smoothig Veerajaeyulu Sadhaala Yu-Xiag Wag Rya J. Tibshirai Machie Learig Departmet Caregie Mello Uiversity Pittsburgh PA 53 Departmet of Statistics Caregie Mello Uiversity Pittsburgh PA 53 This documet cotais proofs supplemetary details ad supplemetary experimets for the paper Graph Sparsificatio Approaches for Laplacia Smoothig. All sectio umbers equatio umbers ad figure umbers i this supplemetary documet are preceded by the letter A to distiguish them from those from the mai paper. A. Proof of Theorem Part a. By optimality of ˆθ for problem 5 y ˆθ + λˆθ T Lˆθ y ˆβ + λ ˆβ T L ˆβ y ˆβ + λ + ɛ ˆβ T L ˆβ where we have used the spectral similarity of L L. Rearragig we see that ˆθ ˆβ y T ˆθ ˆβ + λ + ɛ ˆβ T L ˆβ λˆθ T Lˆθ. Substitutig y = ˆβ + y ˆβ o the right-had side ad agai rearragig ˆθ ˆβ y ˆβ T ˆθ ˆβ + λ + ɛ ˆβ T L ˆβ λˆθ T Lˆθ. Usig y ˆβ = λl ˆβ from the statioarity coditio for ˆθ ˆβ λˆθ T L ˆβ λ ɛ ˆβ T L ˆβ λˆθ T Lˆθ λ L / ˆθ L / ˆβ λ ɛ ˆβ T L ˆβ λˆθ T Lˆθ λ + ɛ L / ˆθ L / ˆβ λ ɛ ˆβ T L ˆβ λˆθ T Lˆθ where we have agai used the spectral similarity of L L. Now we examie two cases for last lie above. If + ɛ L/ ˆθ L / ˆβ the ˆθ ˆβ λ + ɛ ˆβ T L ˆβ λˆθ T Lˆθ. If + ɛ L / ˆθ > L / ˆβ the ˆθ ˆβ λ + ɛˆθ T Lˆθ λ ɛ ˆβT L ˆβ. Puttig these together we get the desired fial boud. Proof of b. Followig the proof strategy for part a except with the regularizatio parameter deoted by

2 λ for problem 5 we have ˆθ ˆβ λˆθ T L ˆβ + + ɛλ λ ˆβ T L ˆβ λ ˆθT Lˆθ λ L / ˆθ L / ˆβ + + ɛλ λ ˆβ T L ˆβ λ ˆθT Lˆθ λ + ɛ L / ˆθ L / ˆβ + + ɛλ λ ˆβ T L ˆβ λ ˆθT Lˆθ = λ + ɛ L / ˆθ L / λ ˆβ λ + ɛ L / ˆθ + + ɛλ λ ˆβ T L ˆβ. }{{} a We ow examie two cases for the first term a o the lie above. If λ /λ + ɛ L / ˆθ L / ˆβ the a 4λ + ɛ/λ ˆβ T L ˆβ; if λ /λ + ɛ L / ˆθ > L / ˆβ the a 0. Therefore ˆθ ˆβ 4λ + ɛ + + ɛλ λ ˆβT L ˆβ. λ A easy calculatio shows that the optimal value of λ makig the leadig factor above as small as possible is λ = λ. Pluggig this i gives the desired result. A. Proof of Theorem Let us first cosider the uivariate logistic fuctio g : R R defied by gx = ax + log + e x for a costat a R. It is clear that g is covex with first ad secod derivatives g x = a + πx g x = πx πx where πx = e x / + e x. If x R the g x δ δ where δ = π R. Thus by strog covexity of g over the iterval [ R R] we have gz gx g xz x or equivaletly by the mootoicity of the iverse lik fuctio π gz gx g xz x δ δ z x for all x z [ R R]. δ δ z x for all πx πz [δ δ]. Now write the logistic loss i 3 as fβ = i= gβ i; y i ; the the above shows that fθ fβ fβ T θ β δ δ θ β wheever πβ i πθ i [δ δ] i =.... A. Hece uder the assumptios of the theorem we may begi with the assertio that fˆθ + λ ˆθT Lˆθ f ˆβ + λ ˆβT L ˆβ by the optimality of ˆθ for its ow problem the rearrage ad use A. to arrive at δ δ ˆβ ˆθ f ˆβ T ˆθ ˆβ + λ ˆβT L ˆβ λ ˆθT Lˆθ f ˆβ T ˆθ ˆβ + λ + ɛ ˆβ T L ˆβ λ ˆθT Lˆθ. Usig f ˆβ = λl ˆβ from the statioarity coditio for ˆβ i we have δ δ ˆβ ˆθ λˆθ T L ˆβ + + ɛλ λ ˆβ T L ˆβ λ ˆθT Lˆθ. The right-had side is precisely of the same form as that aalyzed i parts a ad b of Theorem ad the results follow.

3 A.3 Estimatio Error Bouds ad Stability The followig is a estimatio error boud derived for Laplacia smoothig i regressio where the error rate is show to scale with λ/. Theorem A.. Assume that y Nβ σ I ad deote by... the eigevalues of the graph Laplacia matrix L. Let i 0 {... } ad cosider a choice of tuig parameter λ = Θ i=i 0+ i Dβ where D the graph differece operator so that L = D T D. The the graph Laplacia smoothig estimate ˆβ i with the regressio loss satisfies ˆβ β ullityl = O P + i 0 + i=i 0+ Dβ. i Proof. Let R = rowl the row space of L ad R = ulll the ull space of L. Also let P R be the projectio oto R ad P R be the projectio oto R ad abbreviate x R = P R x x R = P R x. We ca decompose ˆβ β = ˆβ ˆβ R + ˆβ ˆβ R. The first term ˆβ ˆβ R is o the order of ullityl which is the umber of coected compoets i the uderlyig graph G. This cotributes the first term i the error rate of the theorem. Now it suffices to cosider ˆβ ˆβ R. Usig the optimality of ˆβ i y ˆβ + λ ˆβ T L ˆβ y β + λβ T Lβ. After settig y = β + ɛ with ɛ N0 σ I expadig ad rearragig we arrive at the basic iequality ˆβ β R ˆβ β P R ɛ + λβ T Lβ λ ˆβ T L ˆβ. A. Abbreviatig δ = ˆβ β let us write ɛ T P R δ = ɛ T P i0 P R δ + ɛ T I P i0 P R δ where P i0 is the projectio oto the first i 0 eigevectors of L i.e. the eigevectors associated with eigevalues... i0. The first term above satisfies whereas the secod term satisfies ɛ T P i0 P R δ P i0 ɛ δ R = O P i0 δ R ɛ T I P i0 P R δ = ɛ T I P i0 D + Dδ = ɛt I P i0 D + D + T I P i0 ɛ λdδ + λ λ λ Dδ. A.3 I the last lie here we used the iequality a T b a + b. Directly from the sigular value decompositio of D it is easy to verify that D + T I P i0 ɛ = O P. i=i i 0+ 3

4 Pluggig these bouds ito the basic iequality A. we see that ˆβ β R O P i0 ˆβ β R + O P i=i i λ + λ D ˆβ D ˆβ 0 + λ Dβ λ D ˆβ 0+ O P i0 ˆβ β R + O P λ + λ Dβ. i=i i 0+ I the last lie we have agai used a + b a + b. Pluggig i the value of λ as described i the statemet of the theorem yields ˆβ β R O P i0 ˆβ β R + O P i=i i 0+ Dβ. We ca view this as a quadratic equatio of form ax bx c 0 i x = ˆβ β R. As a > 0 the larger of its two roots serves as a boud for x i.e. x b + b + 4ac/a b/a + c/a or x b /a + c/a which meas that ˆβ β R = O P i 0 + Dβ. i This completes the proof. i=i 0+ Remark. Assumig that the umber of coected compoets ullityl stays bouded as grows the optimal i 0 i the theorem would be chose to balace i 0 / with the last term i the rate. This depeds of course o the decay of eigevalues of L; for differet graphs G the eigevalues will decay at differet rates leadig to differet error bouds. But i ay case the result i the theorem reduces to ˆβ β λ = O P Dβ which whe Dβ = O is precisely as claimed i 7 i the mai paper. Our ext result shows that whe the solutio ˆβ is tued as i Theorem A. the achieved pealty is o the same order as that for β. Lemma A.. Uder the same coditios as i Theorem A. if ullityl = O ad we were to choose i 0 = Oλ Dβ the the achieved pealty term satisfies D ˆβ = O P Dβ. Hece i particular if Dβ = O the D ˆβ = O P. Proof. Returig to the proof of Theorem A. cosider replacig the step i A.3 by ɛ T I P i0 P R δ = ɛ T I P i0 D + Dδ = ɛt I P i0 D + D + T I P i0 ɛ 0.5λDδ + λ 0.5λ λ 4 Dδ. Carryig o as i the proof of Theorem A. we arrive at ˆβ β R O P i0 ˆβ β R + O P i=i i 0+ λ + 3λ Dβ λ D ˆβ. Usig ˆβ β R 0 ad rearragig we have D ˆβ i 0 O P λ ˆβ β R + 3 Dβ = O P Dβ where we have used the choice of i 0 ad the correspodig rate of ˆβ β R from Theorem A.. Lastly we show that uder the coditios of the last lemma both ˆβ ad ˆθ the latter beig the solutio of the sparsified problem 5 achieve the same error rate. 4

5 Corollary A.. Uder the same coditios as i Lemma A. assumig that L is a + ɛ approximatio of L the solutio ˆθ of the sparsified problem 5 with the regressio loss ad tuig parameter λ = Θ i=i 0+ i Dβ satisfies just as ˆβ does. ˆθ β λ = O P Dβ Proof. There are two ways to prove this result. The first strategy is simply to ote that ˆθ β ˆθ ˆβ + ˆβ β ad both terms are O P λ Dβ / the first term cotrolled by part b of Theorem i the mai paper i combiatio with Lemma A. ad the secod term hadled by Theorem A.. The secod strategy is to replicate the proof of Theorem A. applied to the problem 5 with L i place of L ad the coclude that the resultig error rate is ot chaged sice all eigevalues of L are withi a multiplicative factor betwee / + ɛ ad + ɛ of those of L. A.4 Proof of Lemma We use Hoeffdig s iequality as a mai tool for provig the results for both the uiform ad kn samplers first treatig the uiform samplig case. For i =... q let e i = u i v i be the ith edge sampled ad deote by X i the term added to x T Lx due to samplig ei. The X i = W q x u x v with probability w e /W for each e = u v E recallig W = e E w e. Note that q i= X i = x T Lx. The variables X X... X q are idepedet ad lie i a iterval of legth W r/q where r = max uv E x u x v mi uv E x u x v. Usig Hoeffdig s iequality P q i= X i EX i > t exp t q i= W r /q = exp t q W r. A.4 As L is a ubiased estimator of L we have q i= EX i = Ex T Lx = x T Lx. Abbreviatig µ = x T Lx ad substitutig t = δµ ito the Hoeffdig boud A.4 we ifer that x T Lx µ δµ A.5 with probability at least exp δ µ q/w r. Now we use the smoothess assumptio 8 o x i.e. Dx µsw mi /W as well as r max x u x v uv E max uv E w uv w mi x u x v = Dx w mi recallig w mi = mi e E w e. Therefore r µs/w ad the result i A.5 holds with probability at least exp δ q/s. To make this probability at least / we eed to choose q s /δ log samples. Lastly to give the result as writte i the lemma we simply substitute δ = ɛ/ + ɛ ad observe that A.5 the implies + ɛ µ xt Lx + ɛ µ + ɛµ. + ɛ 5

6 The argumets for kn samplig are similar. We sample oly from odes with degree greater tha k; call this set U. Whe samplig edges icidet to ode u U for i =... k let e ui deote the ith edge sampled ad let X ui be its cotributio to x T Lx. The X ui = W u k x u x v with probability w e /W u for each v Nu recallig W u = v Nu w uv ad Nu deotes the eighbors of u. The variables X ui u U i =... k are idepedet ad lie i a iterval of legth at most W max r/k where W max = max u V W u ad r is as above i the proof for the uiform samplig part. Usig Hoeffdig s iequality oce agai P k X ui EX ui > t exp u U k i= t Wmax r/k = exp 8t k r u U i= u U W max A.6 Deotig z u = v Nu w uvx u x v for all u V ote that EX ui = z u /k for all u U. Thus ad k u U i= u U i= X ui = x T Lx k EX ui = x T Lx u V \U u V \U This meas that the Hoeffdig boud A.6 settig t = δµ implies the statemet i A.5 with probability exp 8δ µ k/r u U W max exp 8δ µ k/r W max. Boudig r sµ/w from the smoothess coditio 8 as argued i the proof for the uiform samplig case this is further lower bouded by exp 8δ W k/s W max. To make this probability at least / we eed to choose k sw max /W /4δ log. The rest follows as i the uiform samplig case i.e. to get the result as stated i the lemma we take δ = ɛ/ + ɛ. A.5 Proof of Lemma The proof is simple. Deote by L the Laplacia matrix of G H ad L the Laplacia matrix of G H. Also deote the edge weights of G H by w uu w vv ad the edge weights of G H by w uu w vv. Observe x T Lx = w vv x uv x uv + w uu x uv x u v u V G {vv } E H v V H {uu } E G x T Lx = w vv x uv x uv + w uu x uv x u v. z u z u.. u V G {vv } E H v V H {uu } E G Applyig the appropriate spectral sparsificatio bouds to the ier sums gives the result. A.6 Proof of Theorem 3 Let us deote F β = i Ω lx T i β; y i + µ β v. Note that by costructio F is strogly covex with parameter µ > 0 which implies that F θ F β g T θ β + µ θ β A.7 for ay subgradiet g of F at β. Therefore by the optimality of ˆθ for problem 0 F ˆθ + λ ˆθT Lˆθ F ˆβ + λ ˆβT L ˆβ 6

7 MSE Time secods ad after rearragig ad applyig A.7 we have µ ˆβ ˆθ g T ˆθ ˆβ + λ ˆβT L ˆβ λ ˆθT Lˆθ g T ˆθ ˆβ + λ + ɛ ˆβ T L ˆβ λ ˆθT Lˆθ for ay subgradiet g of F at β. As there exists a subgradiet such that g = λl ˆβ from the statioarity coditio for ˆβ i 9 we have µ ˆβ ˆθ λˆθ T L ˆβ + + ɛλ λ ˆβ T L ˆβ λ ˆθT Lˆθ ad the remaider of the aalysis proceeds exactly as i parts a ad b of Theorem. A.7 Gaussia Smoothig with the Google+ Data I this experimet we added i.i.d. N0 5.5 oise to the compoets of β the smooth sigal costructed by a simulated diffusio over the Google+ graph as described i the mai paper. Figure A. shows the results whe cosiderig a Gaussia loss i the Laplacia smoothig problems MSE Sparse kn Uif Dese Sparse kn Uif Dese Time Number of problems Figure A.: MSE ad timig results for a Gaussia smoothig problem with the Google+ data. 7

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality