Computing approximate PageRank vectors by Basis Pursuit Denoising
Michael Saunders, Systems Optimization Laboratory, Stanford University
Joint work with Holly Jin, LinkedIn Corp
SIAM Annual Meeting, San Diego, July 7-11, 2008
SIAM AN08 1/36
Outline
1 Introduction
2 PageRank
3 Basis Pursuit Denoising
4 Sparse PageRank (Neumann)
5 Sparse PageRank (LPdual)
6 Preconditioning
7 Summary
Abstract
The PageRank eigenvector problem involves a square system Ax = b in which x is naturally nonnegative and somewhat sparse (depending on b). We seek an approximate x that is nonnegative and extremely sparse. We experiment with an active-set optimization method designed for the dual of the BPDN problem, and find that it tends to extract the important elements of x in a greedy fashion.
Supported by the Office of Naval Research
Introduction
Many applications (statistics, signal analysis, imaging) seek sparse solutions to square or rectangular systems

  Ax ≈ b,  x sparse

Basis Pursuit Denoising (BPDN) seeks sparsity by solving

  min (1/2)||Ax - b||^2 + λ||x||_1

Personalized PageRank eigenvectors involve square systems

  (I - αH^T)x = v,  v sparse

where 0 < α < 1, He ≤ e, and x ≥ 0 is sparse.

Perhaps BPDN can find a very sparse approximate x.
PageRank
PageRank  (Langville and Meyer 2006)

H = ( 0    1/3  1/3  1/3 )    Hyperlink matrix, He ≤ e
    ( 1/2   0   1/2   0  )
    ( 0     0    0    0  )    (zero row: dangling node)
    ( 0     0   1/2  1/2 )

v = (1/n)e or e_i             Teleportation (personalization) vector, e^T v = 1

S = H + a v^T                 Stochastic matrix, Se = e  (a flags the zero rows of H)

G = αS + (1 - α)e v^T         Google matrix, α = 0.85 say, Ge = e

eigenvector G^T x = x  <=>  linear system (I - αH^T)x = v

In both cases, x <- x / e^T x  (and x ≥ 0)
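The construction above, and the equivalence between the eigenvector and linear-system formulations, can be sanity-checked numerically. A minimal sketch on a hypothetical 4-node H (the placement of its zeros is an assumption, not taken from the slides):

```python
import numpy as np

# Hypothetical 4-node hyperlink matrix; row 3 is a dangling node (all zeros).
H = np.array([[0.0, 1/3, 1/3, 1/3],
              [1/2, 0.0, 1/2, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1/2, 1/2]])
n = H.shape[0]
alpha = 0.85
v = np.full(n, 1.0/n)                       # teleportation vector, e^T v = 1
a = (H.sum(axis=1) == 0).astype(float)      # indicator of dangling nodes
S = H + np.outer(a, v)                      # stochastic matrix: S e = e
G = alpha*S + (1 - alpha)*np.outer(np.ones(n), v)   # Google matrix: G e = e

# Route 1: dominant eigenvector of G^T (eigenvalue 1).
w, V = np.linalg.eig(G.T)
x_eig = np.real(V[:, np.argmax(np.real(w))])
x_eig /= x_eig.sum()

# Route 2: solve (I - alpha*H^T) x = v, then normalize x <- x / e^T x.
x_lin = np.linalg.solve(np.eye(n) - alpha*H.T, v)
x_lin /= x_lin.sum()

print(np.allclose(x_eig, x_lin))            # True: both routes agree
```

Both routes return the same nonnegative vector after the x <- x/e^T x normalization, since G^T x = x reduces to (I - αH^T)x = cv for a positive scalar c.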
Web matrices H

Name               n (pages)  nnz (links)  i (v = e_i)
csstanford         9914       36854        4 (cs.stanford.edu)
Stanford           275689     1623817      23036 (cs.stanford.edu), 6753 (calendus.stanford.edu)
Stanford-Berkeley  683446     7583376      6753 (unknown)

All data collected around 2001 (Cleve Moler, Sep Kamvar, David Gleich)

(I - αH^T)x = v
Power method on G^T x = x
Stanford, n = 275000, v = e_6753 (calendus.stanford.edu)
[Figure: time (secs) vs α = 0.1:0.01:0.99; runtime grows steeply toward α = 1, reaching about 60 secs]
Power method
Stanford-Berkeley, n = 683000, v = e_6753
[Figure: time (secs) vs α = 0.1:0.01:0.99; runtime reaches about 180 secs as α -> 1]
Exact x(α)
csstanford, n = 9914, v = e_4 (cs.stanford.edu), α = 0.85
[Figure: x_j(α), j = 1:n, computed as xtrue = A\v; a few spikes up to about 0.25, most entries near zero]
Exact x(α)
Stanford, n = 275000, v = e_23036 (cs.stanford.edu), α = 0.85
[Figure: x_j(α), j = 1:n]
Sparsity of x(α)
Stanford, n = 275000, v = e_23036 (cs.stanford.edu), α = 0.1:0.01:0.99 (90 values)
[Spy plots of the entries exceeding various thresholds, one column per α:
  x > 250/n:  nz = 4638      x > 500/n:  nz = 3076
  x > 1000/n: nz = 2305      x > 2000/n: nz = 1909]
Basis Pursuit Denoising
Lasso(ν)  (Tibshirani 1996)
  min (1/2)||Ax - b||^2  s.t.  ||x||_1 ≤ ν

Basis Pursuit  (Chen, Donoho & S 1998)
  min ||x||_1  s.t.  Ax = b
  A = fast operator (Ax, A^T y are efficient)

BPDN(λ)  (same paper)
  min λ||x||_1 + (1/2)||Ax - b||^2
  A = fast operator, any shape
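The BPDN objective is easy to experiment with directly. A minimal sketch using iterative soft-thresholding (ISTA), a simple proximal-gradient method rather than the active-set approach of these slides; the random A and planted sparse solution are illustrative only:

```python
import numpy as np

def bpdn_ista(A, b, lam, iters=2000):
    """Minimize lam*||x||_1 + 0.5*||Ax - b||^2 by soft-thresholded gradient steps."""
    L = np.linalg.norm(A, 2)**2             # Lipschitz constant of the smooth term
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - b) / L       # gradient step on 0.5*||Ax - b||^2
        x = np.sign(z) * np.maximum(np.abs(z) - lam/L, 0.0)   # prox of lam*||.||_1
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x0 = np.zeros(100)
x0[[3, 30, 70]] = [2.0, -1.5, 1.0]          # planted sparse solution
b = A @ x0
x = bpdn_ista(A, b, lam=0.05)
print(np.flatnonzero(np.abs(x) > 0.1))      # indices of the large entries
```

With a small λ the minimizer concentrates on the planted support, illustrating why the λ||x||_1 term is a sparsity-inducing penalty.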
BP and BPDN algorithms    min λ||x||_1 + (1/2)||Ax - b||^2

OMP            Davis, Mallat et al 1997            Greedy
BPDN-interior  Chen, Donoho & S 1998, 2001         Interior, CG
PDSCO, PDCO    Saunders 1997, 2002                 Interior, LSQR
BCR            Sardy, Bruce & Tseng 2000           Orthogonal blocks
Homotopy       Osborne et al 2000                  Active-set, all λ
LARS           Efron, Hastie, Tibshirani 2004      Active-set, all λ
STOMP          Donoho, Tsaig, et al 2006           Double greedy
l1_ls          Kim, Koh, Lustig, Boyd et al 2007   Primal barrier, PCG
GPSR           Figueiredo, Nowak & Wright 2007     Gradient projection
BCLS           Friedlander 2006                    Projection, LSQR
SPGL1          van den Berg & Friedlander 2007     Spectral GP, all λ
BPdual         Friedlander & S 2007                Active-set on dual
LPdual         Friedlander & S 2007                For x ≥ 0
||x||_1 = e^T x when x ≥ 0. This suggests regularized LP problems:

LPprimal(λ)   min_{x,y}  e^T x + (λ/2)||y||^2    s.t.  Ax + λy = b,  x ≥ 0

LPdual(λ)     min_y  -b^T y + (λ/2)||y||^2       s.t.  A^T y ≤ e

Friedlander and S 2007: new active-set Matlab codes
LPdual solver for Ax ≈ b, x ≥ 0

  min_y  -b^T y + (λ/2)||y||^2   s.t.  A^T y ≤ e

A few active constraints: B^T y = e
Initially B is empty, y = 0
Select columns of B in an almost greedy manner

Main work per iteration:
  Solve min ||Bx - g||      (update QB = (R; 0) without storing Q)
  Form dy = (g - Bx)/λ
  Form dz = A^T dy
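The loop above can be sketched in a few lines. This is a simplified stand-in for LPdual, not the Friedlander-Saunders code: it starts from the unconstrained minimizer rather than y = 0, omits constraint dropping, and re-solves each subproblem by normal equations instead of updating a QR factorization:

```python
import numpy as np

def lpdual_greedy(A, b, lam, max_iter=50):
    """Greedy active-set sketch for  min_y -b'y + (lam/2)||y||^2  s.t.  A'y <= e.
    The multipliers x on the active constraints estimate the sparse primal x >= 0."""
    m, n = A.shape
    active = []                             # columns of A forming B
    x = np.zeros(0)
    y = b / lam                             # unconstrained minimizer
    for _ in range(max_iter):
        z = A.T @ y
        j = int(np.argmax(z))
        if z[j] <= 1.0 + 1e-10:             # A'y <= e holds: done
            break
        active.append(j)                    # add most violated constraint to B
        B = A[:, active]
        # Subproblem with B'y = e: stationarity gives y = (b - Bx)/lam,
        # and the constraint then gives B'B x = B'b - lam*e.
        x = np.linalg.solve(B.T @ B, B.T @ b - lam*np.ones(len(active)))
        y = (b - B @ x) / lam
    xfull = np.zeros(n)
    xfull[active] = x
    return xfull, y

# Tiny PageRank-style example: A = I - alpha*H^T, b = v = e_1 (hypothetical H).
H = np.array([[0.0, 1/3, 1/3, 1/3],
              [1/2, 0.0, 1/2, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1/2, 1/2]])
alpha = 0.85
A = np.eye(4) - alpha*H.T
xfull, y = lpdual_greedy(A, np.array([1.0, 0, 0, 0]), lam=0.1)
print(np.flatnonzero(xfull))                # the greedily selected pages
```

Each pass adds one constraint (one candidate page), which is the greedy behavior the slides exploit: stopping early yields a very sparse nonnegative x.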
Sparse PageRank with Neumann Series
Neumann series

  (I - αH^T)x = v    (*)

  x = (I - αH^T)^{-1} v = (I + αH^T + (αH^T)^2 + ...) v

A sum of nonnegative terms
Equivalent to Jacobi's method on (*)   (Thanks David!)
||x - x_k|| decreases monotonically
||r_k||/(1+α) ≤ ||x - x_k|| ≤ ||r_k||/(1-α)   Error bounded by residual
Eventually small (x_k)_j <- 0, update remainder
See Gleich and Polito (2006) for sparse Power Method
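A minimal sketch of the truncated series on a hypothetical 4-node H; zeroing tiny entries of each term is a crude stand-in for the remainder updating described above:

```python
import numpy as np

def pagerank_neumann(H, v, alpha, tol=1e-6, drop=1e-8):
    """Approximate x = (I + alpha*H^T + (alpha*H^T)^2 + ...) v,
    zeroing entries of each term below `drop` to keep the iterate sparse."""
    x = v.copy()
    term = v.copy()
    while np.linalg.norm(term, 1) > tol:    # terms shrink by a factor <= alpha
        term = alpha * (H.T @ term)
        term[np.abs(term) < drop] = 0.0
        x += term
    return x

H = np.array([[0.0, 1/3, 1/3, 1/3],
              [1/2, 0.0, 1/2, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1/2, 1/2]])
alpha = 0.85
v = np.array([1.0, 0.0, 0.0, 0.0])
x = pagerank_neumann(H, v, alpha)
xtrue = np.linalg.solve(np.eye(4) - alpha*H.T, v)
print(np.linalg.norm(x - xtrue, 1))         # small: error bounded by the residual
```

Since ||αH^T||_1 ≤ α < 1, each term is at most α times the previous one in the 1-norm, so the loop terminates and the truncation error is bounded as on the slide.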
Sparse PageRank with BPDN (LPdual)
Sparse PageRank

Regard (I - αH^T)x = v  as  Ax ≈ v,  v sparse  (v = e_i say)

Apply active-set solver LPdual to the dual of

  min_{x,r}  λe^T x + (1/2)||r||^2   s.t.  Ax + r = v,  x ≥ 0
[Figure: "Stanford alpha=0.950 v=e(4) x(active)" — nonzero x_j in the order chosen, about 20 entries, values up to about 0.08]
Stanford web, n = 275000, α = 0.85, v = e_23036 (cs.stanford.edu)
Direct solve: xtrue = A\v
BPDN with λ = 2e-3: 76 nonzero x_j
Sparsity of xbp(α)
Stanford, n = 275000, v = e_23036 (cs.stanford.edu), α = 0.1:0.01:0.99 (90 values)
[Spy plots: xpower > 1000/n (nz = 2305) vs xbp > 1/n (nz = 2382)]
xbp computed with λ = 4e-3; by contrast, xpower > 1/n has 6000 rows, 155000 nonzeros
Power method vs BP
Stanford, n = 275000, v = e_23036 (cs.stanford.edu), λ = 4e-3
[Figure: time (secs) for each α = 0.1:0.01:0.99 for BP, Direct, and Power; y-axis 0-70]
Power method vs BP
Stanford-Berkeley, n = 683000, v = e_6753 (unknown), λ = 5e-3
[Figure: time (secs) for each α = 0.1:0.01:0.99 for BP, Direct, and Power; y-axis 0-180]
Varying α and λ

  min_{x,r}  λ||x||_1 + (1/2)||r||^2   s.t.  (I - αH^T)x + r = v

H = 9914 x 9914 csstanford web,  v = e_4 (cs.stanford.edu)
α = 0.75, 0.85, 0.95
λ = 1e-3, 2e-3, 4e-3, 6e-3
Plot nonzero x_j in the order they are chosen
csstanford, v = e_4, min λe^T x + (1/2)||r||^2
[Figures for α = 0.75, 0.85, 0.95, each with panels for λ = 1e-3, 2e-3, 4e-3, 6e-3: nonzero x_j in the order chosen. Larger λ selects fewer entries (roughly 70-100 at λ = 1e-3 down to roughly 20-25 at λ = 6e-3); larger α spreads the weight over smaller values (peaks near 0.25 at α = 0.75, near 0.08 at α = 0.95)]
SPAI: Sparse Approximate Inverse
SPAI preconditioning: a plausible future application

Find sparse X such that AX ≈ I
Equivalently, find sparse x_j such that Ax_j ≈ e_j for j = 1:n
Embarrassingly parallel use of sparse BP solutions
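A column-by-column sketch of the idea, with a plain soft-thresholding loop standing in for a sparse BP solver (the test matrix and λ are illustrative; the n solves are independent and could run in parallel):

```python
import numpy as np

def spai_bpdn(A, lam=1e-2, iters=1000):
    """Sparse approximate inverse: for each j, approximately minimize
    lam*||x||_1 + 0.5*||A x - e_j||^2 and take x as column j of X."""
    n = A.shape[1]
    L = np.linalg.norm(A, 2)**2             # step-size bound for gradient steps
    X = np.zeros((n, n))
    for j in range(n):                      # embarrassingly parallel over j
        x = np.zeros(n)
        ej = np.eye(n)[:, j]
        for _ in range(iters):
            z = x - A.T @ (A @ x - ej) / L
            x = np.sign(z) * np.maximum(np.abs(z) - lam/L, 0.0)
        X[:, j] = x
    return X

A = np.eye(5) + 0.1*np.diag(np.ones(4), 1)  # illustrative well-conditioned matrix
X = spai_bpdn(A)
print(np.linalg.norm(A @ X - np.eye(5)))    # AX is close to I
print(np.count_nonzero(X))                  # and X is sparse
```

The λ penalty trades accuracy of AX ≈ I against sparsity of X, exactly the trade-off a preconditioner wants.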
Summary
(I - αH^T)x ≈ v

Comments
- Sparse v sometimes gives sparse x
- LPprimal and LPdual work in a greedy manner
- Greedy solvers guarantee sparse x (unlike the power method)
- Not sensitive to α
- Sensitive to λ, but we can limit iterations

Questions
- Run much bigger examples
- Which properties of I - αH^T lead to the success of greedy?
Thanks
Shaobing Chen and David Donoho, Michael Friedlander (BP solvers)
Sou-Cheng Choi and Lek-Heng Lim (ICIAM 07)
Amy Langville and Carl Meyer (splendid book)
David Gleich
Holly Jin