Approximating a Single Component of the Solution to a Linear System
Christina E. Lee (celee@mit.edu), Asuman Ozdaglar (asuman@mit.edu), Devavrat Shah (devavrat@mit.edu)
MIT LIDS
How do I compare to my competitors?
- PageRank
- Bonacich centrality
- Rank Centrality
Q: Do I need to compute the entire solution vector just to estimate my own centrality and that of my competitors? That would be far too expensive.
These centralities (PageRank, Bonacich centrality, Rank Centrality) arise as the stationary distribution of a Markov chain or as the solution to a linear system.
Q: How do we efficiently compute the solution for a single coordinate within a large network?
(1) Stationary distribution of a Markov chain: G = P (a stochastic matrix), z = 0. Sample short random walks; relies on a sharp characterization of π_i.
(2) Solution to a linear system of equations: if A is positive definite, we can choose G and z such that spectral radius(G) < 1 and Ax = b is equivalent to x = Gx + z. Expand the computation in a local neighborhood, using the sparsity of G to bound the size of that neighborhood.
We would like a local method, i.e., one cheaper than computing the full vector.
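As a concrete (hypothetical) instance of the x = Gx + z reformulation, the sketch below uses a Richardson splitting G = I - A/alpha, z = b/alpha for a small positive definite A; the matrix, right-hand side, and choice of alpha are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Hypothetical small positive definite system Ax = b (example values).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

# Richardson splitting: x = Gx + z with G = I - A/alpha, z = b/alpha.
# For symmetric positive definite A, any alpha > lambda_max(A)/2 gives
# spectral radius(G) < 1; alpha = lambda_max(A) is a safe choice.
alpha = np.linalg.eigvalsh(A)[-1]
G = np.eye(3) - A / alpha
z = b / alpha

# The fixed-point iteration x <- Gx + z converges to the solution of Ax = b.
x = np.zeros(3)
for _ in range(500):
    x = G @ x + z
# x now approximates np.linalg.solve(A, b)
```

The same splitting idea is what lets the later slides assume spectral radius(G) < 1 when starting from a positive definite system.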
Overview of Literature: Probabilistic Monte Carlo
- Stationary distribution of a Markov chain: MCMC methods
- x = Gx + z with spectral radius(G) < 1: Ulam-von Neumann
- Samples long random walks
- Good if G has good spectral properties
- The challenge is to control the variance of the estimator
- Naturally parallelizable
Overview of Literature: Iterative Algebraic
- Stationary distribution of a Markov chain: power method
- x = Gx + z with spectral radius(G) < 1: linear iterative updates (Jacobi, Gauss-Seidel)
- Iterative matrix-vector multiplication; each multiplication is O(n^2)
- Good if G is sparse and has good spectral properties
- Distributed, but synchronous
Overview of Literature: Other
- SVD, Gaussian elimination
- Not designed for single-component computation; requires global computation
What if we specifically thought about solving for a single component?
Contributions: Probabilistic Monte Carlo
New Monte Carlo method which:
- Samples local truncated returning random walks
- Convergence is a function of the mixing time
- The estimate is always guaranteed to be an upper bound
- Suggests and analyzes specific termination criteria which give multiplicative error bounds for large π_i and threshold small π_i
[L., Ozdaglar, Shah, NIPS 2013]
Contributions: Iterative Algebraic
- Can be implemented asynchronously
- Maintains sparse intermediate vectors
- Provides invariants which track the progress of the method
[L., Ozdaglar, Shah, arXiv:1411.2647, 2014]
Both methods are inspired by previous algorithms designed for computing PageRank:
- Monte Carlo methods: [JW03, H03, FRCS05, ALNO07, BCG10, BBCT12]
- Linear iterative updates (Jacobi, Gauss-Seidel): [ABCHMT07]
Computing Stationary Probability
Consider a Markov chain with:
- Finite state space
- Transition matrix P
- Irreducible, positive recurrent
Goal: compute the stationary probability π_i, focusing on nodes with higher values.
Node-Centric Monte Carlo
The standard Monte Carlo method samples long random walks until convergence to stationarity (ergodic theorem).
Our method is based on the property π_i = 1 / E_i[T_i], where T_i is the time for the walk started at i to return to i.
Intuition
Estimate π_i by the inverse of the average return time to i, computed from truncated return paths.
Algorithm: Gather Samples(i, N, θ)
- Sample N truncated return paths to i (a path is truncated if its length exceeds θ before the walk returns to i)
- p̂ = fraction of samples truncated
Q: How do we choose appropriate N and θ?
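A sketch of the Gather Samples(i, N, θ) step, assuming the chain is given as a dict-of-dicts transition matrix and that the estimate is the inverse of the empirical mean truncated return time; names and estimator details are illustrative, not the paper's exact pseudocode.

```python
import random

def gather_samples(P, i, N, theta, rng=random):
    """Sketch of the Gather Samples(i, N, theta) step.

    P is a dict-of-dicts transition matrix: P[u][v] = Prob(u -> v).
    Samples N random-walk return paths to state i, each truncated at
    length theta, and returns (pi_hat, fraction of truncated samples),
    where pi_hat is the inverse of the empirical mean (truncated)
    return time.  Truncation can only shorten return times, which is
    why the estimate is an upper bound on pi_i.
    """
    total_steps = 0
    truncated = 0
    for _ in range(N):
        state, steps = i, 0
        while True:
            states, probs = zip(*P[state].items())
            state = rng.choices(states, weights=probs)[0]
            steps += 1
            if state == i:          # returned to i
                break
            if steps >= theta:      # path length exceeded: truncate
                truncated += 1
                break
        total_steps += steps
    return N / total_steps, truncated / N
```

Every walk stays within θ hops of i, which is what makes the method local.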
Algorithm (increment k):
- Gather Samples(i, N^(k), θ^(k))
- Terminate if satisfied
- Otherwise compute θ^(k+1) and N^(k+1):
  - Double θ: θ^(k+1) = 2·θ^(k)
  - Increase N^(k+1) as required by Chernoff's bound
Convergence
Q: Can we design a smart termination criterion?
Algorithm (increment k):
- Gather Samples(i, N^(k), θ^(k))
- Terminate if satisfied(Δ): output the current estimate if (a) the stationary probability is small, or (b) the fraction of truncated samples is small
- Otherwise compute θ^(k+1) and N^(k+1)
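The outer loop can be sketched end-to-end as below; the constants, the growth rule for N, and the exact forms of termination tests (a) and (b) are illustrative assumptions standing in for the Chernoff-bound-based choices in the paper.

```python
import random

def estimate_pi(P, i, eps=0.1, delta=1e-3, seed=0):
    """Sketch of the outer loop: repeatedly gather truncated return
    paths to i, then either terminate or double theta and grow N.

    P is a dict-of-dicts transition matrix: P[u][v] = Prob(u -> v).
    Returns (estimate of pi_i, reason for termination).
    """
    rng = random.Random(seed)
    theta, N = 2, 200
    while True:
        total_steps, truncated = 0, 0
        for _ in range(N):
            state, steps = i, 0
            while True:
                states, probs = zip(*P[state].items())
                state = rng.choices(states, weights=probs)[0]
                steps += 1
                if state == i or steps >= theta:
                    break
            total_steps += steps
            truncated += (state != i)
        pi_hat = N / total_steps
        frac_truncated = truncated / N
        if pi_hat < delta:                 # (a) stationary prob. is small
            return pi_hat, "small"
        if frac_truncated < eps * pi_hat:  # (b) few samples truncated
            return pi_hat, "converged"
        theta *= 2                         # double the truncation length
        N *= 2                             # grow N (Chernoff-style growth)
```

Doubling θ shrinks the truncation bias geometrically, so the loop stops once truncated paths are rare relative to the current estimate.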
Results for Termination Criteria
- Termination condition (a) guarantees that π_i is small
- Termination condition (b) guarantees a multiplicative error bound
- The guarantees depend on the algorithm parameters, not on the input Markov chain
- The results extend to countable-state-space Markov chains
Simulation: PageRank
- Random graph generated using the configuration model with a power-law degree distribution
- Figure: stationary probability vs. nodes sorted by stationary probability
- We obtain close estimates for the important (high-probability) nodes
- Dividing by the fraction of samples not truncated corrects for the bias
Compute stationary probability by sampling local random walks (PageRank, Rank Centrality).
Q: Can we extend this to solving a linear system (PageRank, Bonacich centrality)?
Ulam-von Neumann algorithm:
- When the spectral radius of G is less than 1, importance sampling (sample walks and reweight) gives an unbiased estimator
- For some G, there may not exist a sampling matrix such that the variance is finite
- Does not exploit sparsity
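A sketch of the Ulam-von Neumann estimator for a single component x_i of x = Gx + z. The sampling matrix (proportional to |G| row-wise) and the per-step kill probability q are assumptions for the example; other sampling matrices are possible, and, as the slide notes, for some G no choice keeps the variance finite even though the estimator stays unbiased.

```python
import numpy as np

def ulam_von_neumann(G, z, i, n_samples=20000, q=0.3, seed=0):
    """Estimate x_i of x = Gx + z by sampling killed random walks and
    importance-reweighting them.  The walk moves with transition
    probabilities proportional to |G| row-wise and dies with
    probability q per step, corrected by the 1/(1-q) factor."""
    rng = np.random.default_rng(seed)
    n = len(z)
    absG = np.abs(G)
    row_sums = absG.sum(axis=1)
    total = 0.0
    for _ in range(n_samples):
        s, w, acc = i, 1.0, 0.0
        while True:
            acc += w * z[s]        # contributes w * z_s to sum_k (G^k z)_i
            if rng.random() < q or row_sums[s] == 0:
                break              # walk killed (or absorbed)
            v = rng.choice(n, p=absG[s] / row_sums[s])
            w *= G[s, v] / (absG[s, v] / row_sums[s]) / (1 - q)
            s = v
        total += acc
    return total / n_samples
```

When the maximum row sum of |G| is below 1 - q, the per-step weight multiplier stays below 1 in magnitude and the variance remains finite; outside that regime the weights can blow up, which is exactly the variance-control challenge from the literature overview.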
Synchronous Algorithm
- The standard stationary linear iterative method makes x^(t) as dense as z
- It relies on the Neumann series representation x = Σ_{k≥0} G^k z
- When solving for a single coordinate i, the sparsity pattern needed is the k-hop neighborhood of i
Synchronous Algorithm
- Initialize: p^(0) = e_i, estimate 0
- Update step: add ⟨p^(t), z⟩ to the estimate and set p^(t+1) = G^T p^(t)
- Terminate if: p^(t) is small
- Output: the accumulated estimate of x_i
The sparsity of p^(t) grows as a breadth-first traversal from i.
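A dense-vector sketch of the synchronous single-coordinate method (a real implementation would store only the nonzero entries of p^(t), whose support grows like a breadth-first traversal from i); the termination tolerance is an assumption.

```python
import numpy as np

def solve_component_sync(G, z, i, eps=1e-10, max_iter=1000):
    """Estimate x_i for x = Gx + z (assuming spectral radius(G) < 1)
    by expanding the Neumann series locally:
        x_i = e_i^T (z + Gz + G^2 z + ...) = sum_t <p^(t), z>,
    where p^(t) = (G^T)^t e_i.  Only coordinates in the t-hop
    neighborhood of i are ever touched."""
    n = len(z)
    p = np.zeros(n)
    p[i] = 1.0                        # p^(0) = e_i
    x_hat = 0.0
    for _ in range(max_iter):
        x_hat += p @ z                # accumulate e_i^T G^t z
        p = G.T @ p                   # advance the neighborhood one hop
        if np.linalg.norm(p, 1) < eps:
            break                     # remaining tail is negligible
    return x_hat
```

When G is sparse, each step only touches the frontier of the BFS neighborhood, which is what makes the per-iteration cost independent of n.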
Synchronous Algorithm: Guarantee
If the spectral radius of G is less than 1, the algorithm terminates with
(a) an estimate satisfying the desired error bound, and
(b) the total number of multiplications bounded by a quantity independent of the size of the matrix.
Synchronous to Asynchronous
The synchronous update step updates every node u in the support of the residual r^(t) at once.
What if we implemented the updates asynchronously instead?
Asynchronous Algorithm
- Initialize: r^(0) = e_i, estimate 0
- Update step: choose a coordinate u and update it
- Terminate if: the residual r^(t) is small
- Output: the current estimate of x_i
Q: Does this converge? At what rate?
Yes, if ρ(G) < 1 and every coordinate u is chosen infinitely often; the error bound holds for all t and for any chosen sequence of u.
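A sketch of the asynchronous variant, choosing the coordinate u uniformly at random from the support of the residual r (the selection rule and the tolerance are assumptions). The key point is the invariant x_i = x_hat + ⟨r, x⟩, which holds after every single-coordinate update and so remains valid for any update order.

```python
import numpy as np

def solve_component_async(G, z, i, eps=1e-10, seed=0):
    """Estimate x_i for x = Gx + z (assuming spectral radius(G) < 1).
    Invariant maintained by every single-coordinate update:
        x_i = x_hat + <r, x>,
    starting from x_hat = 0, r = e_i.  Convergence requires every
    coordinate with nonzero residual to be chosen infinitely often."""
    rng = np.random.default_rng(seed)
    n = len(z)
    r = np.zeros(n)
    r[i] = 1.0
    x_hat = 0.0
    while np.linalg.norm(r, 1) > eps:
        u = rng.choice(np.flatnonzero(r))  # any fair selection rule works
        r_u = r[u]
        x_hat += r_u * z[u]                # settle the z-part of coordinate u
        r[u] = 0.0
        r += r_u * G[u]                    # push residual mass to u's neighbors
    return x_hat
```

Each update substitutes x_u = z_u + (Gx)_u into ⟨r, x⟩, so the invariant is preserved exactly; the residual's 1-norm tracks the remaining error, giving the progress invariant mentioned in the contributions.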
Asynchronous Algorithm: Guarantee
If the spectral radius of G is less than 1, the algorithm terminates with
(a) an estimate satisfying the desired error bound, and
(b) with probability > 1 - δ, the total number of multiplications bounded by a quantity independent of the size of the matrix.
Simulations
Compare different algorithmic choices: Bonacich centrality in an Erdos-Renyi graph with n = 1000 and p = 0.0276.
Summary
Compute stationary probabilities by sampling local truncated return random walks:
- Convergence is a function of the mixing time
- The estimate is always guaranteed to be an upper bound
- Specific termination criteria give multiplicative error bounds for large π_i and threshold small π_i
[L., Ozdaglar, Shah, NIPS 2013]
Solve Ax = b by expanding the computation within a local neighborhood and utilizing sparsity:
- Can be implemented asynchronously
- Maintains sparse intermediate vectors
- Provides invariants which track the progress of the method
[L., Ozdaglar, Shah, arXiv:1411.2647, 2014]