Lecture 8: Path Technology


Counting and Sampling, Fall 2017
Lecture 8: Path Technology
Lecturer: Shayan Oveis Gharan (October 2017)

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

8.1 Dirichlet Form

For two functions f, g define

    E(f, g) = <(I - K)f, g>_π,

where <f, g>_π = Σ_x f(x) g(x) π(x) is the π-weighted inner product. The operator (matrix) I - K is also called the Laplacian matrix. Note that this is slightly different from the usual Laplacian because I - K is not a symmetric matrix. Next, we study quadratic forms of the Laplacian matrix.

Fact 8.1. E(f, g) = (1/2) Σ_{x,y} (f(x) - f(y)) (g(x) - g(y)) π(x) K(x, y).

Proof. For the left-hand side,

    LHS = Σ_x (I - K)f(x) g(x) π(x)
        = Σ_x (f(x) - Kf(x)) g(x) π(x)
        = Σ_x f(x) g(x) π(x) - Σ_{x,y} K(x, y) f(y) g(x) π(x).

On the other hand, expanding the product and using reversibility, π(x) K(x, y) = π(y) K(y, x),

    RHS = (1/2) [ Σ_{x,y} f(x) g(x) π(x) K(x, y) + Σ_{x,y} f(y) g(y) π(y) K(y, x)
                  - Σ_{x,y} f(y) g(x) K(x, y) π(x) - Σ_{x,y} f(x) g(y) K(y, x) π(y) ].

Relabeling y with x in the second and fourth sums (and using Σ_y K(x, y) = 1) we obtain the equality.

Therefore,

    E(f, f) = (1/2) Σ_{x,y} (f(x) - f(y))^2 π(x) K(x, y).

E(f, f) is called the Dirichlet form associated with f.

Definition 8.2 (Variance). For a function f let

    Var(f) = ||f - E_π[f]||_π^2 = Σ_x π(x) (f(x) - E_π[f])^2,

where E_π[f] is the average of f with respect to π.
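As a quick numerical sanity check of Fact 8.1, the following sketch builds a small reversible kernel with the Metropolis rule and compares the two sides of the identity; the uniform proposal, the state-space size n = 6, and the random seed are arbitrary illustrative choices, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)
n = 6

# A random stationary distribution pi and a reversible kernel K built from a
# uniform proposal via the Metropolis rule, so that pi(x) K(x,y) = pi(y) K(y,x).
pi = rng.random(n)
pi /= pi.sum()
K = (np.ones((n, n)) / n) * np.minimum(1.0, pi[None, :] / pi[:, None])
np.fill_diagonal(K, 0.0)
np.fill_diagonal(K, 1.0 - K.sum(axis=1))   # rejected proposals stay put

f, g = rng.random(n), rng.random(n)

# E(f, g) = <(I - K) f, g>_pi
lhs = np.sum(pi * ((np.eye(n) - K) @ f) * g)

# (1/2) sum_{x,y} (f(x) - f(y)) (g(x) - g(y)) pi(x) K(x, y)
rhs = 0.5 * np.sum(pi[:, None] * K * np.subtract.outer(f, f) * np.subtract.outer(g, g))

print(lhs, rhs)   # the two values agree, as Fact 8.1 claims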

The following fact is immediate:

Fact 8.3. For any function f,

    Var(f) = (1/2) Σ_{x,y} (f(x) - f(y))^2 π(x) π(y).

Proof.

    Var(f) = Σ_x π(x) f(x)^2 - E_π[f]^2
           = (1/2) [ Σ_x π(x) f(x)^2 + Σ_y π(y) f(y)^2 ] - Σ_{x,y} f(x) f(y) π(x) π(y)
           = (1/2) Σ_{x,y} (f(x) - f(y))^2 π(x) π(y).

Now, we are ready for the following important definition:

Definition 8.4 (Poincaré Constant). For a (not necessarily reversible) kernel K the Poincaré constant is defined as

    min_f E(f, f) / Var(f),

where the minimum is taken over all functions f with nonzero variance.

Lemma 8.5. If K is reversible, then the Poincaré constant is equal to 1 - λ_2.

Proof. We leave the proof as an exercise. It simply follows from the variational characterization of eigenvalues.

Now observe that when we make a chain lazy, we work with (I + K)/2 as the Markov kernel. Since all eigenvalues of K are at least -1, (I + K)/2 is a PSD operator, i.e., all of its eigenvalues are nonnegative. So, λ* := max{λ_2, |λ_n|} is equal to λ_2. Therefore, it follows from Theorem 7.6 that for a lazy (reversible) chain with Poincaré constant α, and any starting state x,

    τ_x(ε) ≤ O( (1/α) · log(1/(ε π(x))) ).

There is a very different proof of this fact that holds for any, not necessarily reversible, chain. The proof goes by showing that for a lazy Markov chain and any function f,

    Var(Kf) ≤ Var(f) - E(f, f).

From this one can deduce that

    Var(K^t f) ≤ (1 - α)^t Var(f),

where α is the Poincaré constant. Therefore, for t = O(log(1/π(x)) / α) the chain mixes.
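The next sketch illustrates Lemma 8.5 and the variance-decay statement numerically. The Metropolis chain and all parameters are again arbitrary illustrative choices; in the code, gap is 1 - λ_2 of K, and the Poincaré constant of the lazy chain (I + K)/2 is gap/2.

import numpy as np

rng = np.random.default_rng(1)
n = 8

# Reversible kernel via the Metropolis rule, as in the previous snippet.
pi = rng.random(n)
pi /= pi.sum()
K = (np.ones((n, n)) / n) * np.minimum(1.0, pi[None, :] / pi[:, None])
np.fill_diagonal(K, 0.0)
np.fill_diagonal(K, 1.0 - K.sum(axis=1))

def dirichlet(f):   # E(f, f) = (1/2) sum_{x,y} (f(x)-f(y))^2 pi(x) K(x,y)
    return 0.5 * np.sum(pi[:, None] * K * np.subtract.outer(f, f) ** 2)

def var(f):         # Var(f) with respect to pi
    return np.sum(pi * (f - np.sum(pi * f)) ** 2)

# Lemma 8.5: the eigenvalues of the reversible K are those of the symmetric
# matrix D^{1/2} K D^{-1/2}, and the Poincare constant equals 1 - lambda_2.
d = np.sqrt(pi)
vals, vecs = np.linalg.eigh(np.diag(d) @ K @ np.diag(1.0 / d))
gap = 1.0 - np.sort(vals)[-2]
f2 = vecs[:, np.argsort(vals)[-2]] / d      # minimizer of E(f,f)/Var(f)
print(gap, dirichlet(f2) / var(f2))         # equal
print(min(dirichlet(f) / var(f) for f in rng.random((1000, n))) >= gap - 1e-12)

# Variance decay for the lazy chain: Var(K_lazy^t f) <= (1 - alpha)^t Var(f),
# where alpha = gap/2 is the Poincare constant of (I + K)/2.
K_lazy = 0.5 * (np.eye(n) + K)
f = rng.random(n)
for t in range(1, 5):
    print(var(np.linalg.matrix_power(K_lazy, t) @ f) <= (1 - gap / 2) ** t * var(f))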

8.2 Path Technology

Next we discuss a new method to bound the Poincaré constant. Note that by Theorem 7.6 this gives a direct upper bound on the mixing time.

Think of a multicommodity flow problem where we want to send π(x)π(y) units of flow between each pair of states x, y. These are supposed to be disjoint commodities: if one unit of x-y flow goes over an edge (u, v) from u to v and one unit of x'-y' flow goes over the same edge from v to u, they do not cancel each other out. The capacity of each edge e = (u, v) is

    Q(e) = π(u) K(u, v) = π(v) K(v, u).

Recall that Q(e) represents the flow of probability mass along edge e at stationarity. Suppose we choose a path P_{x,y} between each pair of states x, y (more generally, this may be a distribution over paths). For an edge e, the flow of e is defined as

    f(e) = Σ_{x,y : e ∈ P_{x,y}} π(x) π(y).

The congestion of e is defined as f(e)/Q(e). In the following lemma we show that the inverse of the Poincaré constant is at most max_{x,y} |P_{x,y}| · max_e f(e)/Q(e).

Lemma 8.6. For any reversible Markov chain, suppose for every pair of states x, y ∈ Ω we choose a path P_{x,y}. Then

    1/α ≤ max_{x,y} |P_{x,y}| · max_e f(e)/Q(e),

where α is the Poincaré constant.

Proof. Consider an arbitrary function f. Note that for every path P_{x,y} we can choose an orientation of its edges, say e+ is the head and e- is the tail of e, such that

    f(x) - f(y) = Σ_{e ∈ P_{x,y}} ( f(e+) - f(e-) ).

We can write

    Var(f) = (1/2) Σ_{x,y} (f(x) - f(y))^2 π(x) π(y)
           = (1/2) Σ_{x,y} ( Σ_{e ∈ P_{x,y}} f(e+) - f(e-) )^2 π(x) π(y)
           ≤ (1/2) Σ_{x,y} |P_{x,y}| Σ_{e ∈ P_{x,y}} (f(e+) - f(e-))^2 π(x) π(y)
           = (1/2) Σ_{e=(x',y')} (f(x') - f(y'))^2 Q(e) · (1/Q(e)) Σ_{x,y : e ∈ P_{x,y}} |P_{x,y}| π(x) π(y)
           ≤ max_{x,y} |P_{x,y}| · max_e f(e)/Q(e) · E(f, f),

where the first inequality follows from the Cauchy-Schwarz inequality.

Note that we typically let P_{x,y} be a shortest path from x to y, so |P_{x,y}| is at most the diameter. Since in all Markov chains that we construct the diameter is only polynomial in the size of the input, usually the most important parameter in bounding the mixing time is the maximum congestion.

Let us use the above machinery to bound the mixing time of a simple lazy random walk on a path. Consider a simple path of length n. The stationary distribution is almost uniform, π(v) ≈ 1/n. Observe that there is a unique path between each pair of vertices, so there is no choice in routing. The middle edge (n/2, n/2 + 1) has the maximum congestion, which is about

    f(e)/Q(e) ≈ (n/2)(n/2)(1/n)(1/n) / (1/(4n)) ≈ n.

Since diam(G) = n, we get α ≳ 1/n^2. It follows that the chain mixes in O(n^2 log n) steps. Note that this bound is a factor O(log n) off from the bound we proved using strong stationary times. The reason is that here we only upper bound the second eigenvalue of K and use a very crude bound on all other eigenvalues (we upper bound each λ_i by λ_2). So, in many applications of the path technology this O(log n) loss is inherent, but usually it does not change the mixing time significantly, because it is only logarithmic in the size of the state space.
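The calculation above is easy to check numerically. The sketch below uses the lazy walk that moves to each neighbour with probability 1/4 (consistent with Q(e) ≈ (1/n)(1/4) above); n = 64 is an arbitrary choice. It compares the canonical-path lower bound of Lemma 8.6 with the true Poincaré constant.

import numpy as np

n = 64   # number of vertices on the path (an arbitrary choice for illustration)

# The lazy walk used in the calculation above: move to each neighbour with
# probability 1/4 and stay put otherwise; the uniform distribution is stationary.
K = np.zeros((n, n))
for v in range(n):
    for u in (v - 1, v + 1):
        if 0 <= u < n:
            K[v, u] = 0.25
    K[v, v] = 1.0 - K[v].sum()
pi = np.ones(n) / n

# Canonical paths are forced: the unique path from x < y uses edges (x,x+1),...,(y-1,y).
# flow[i] is the total pi(x) pi(y) routed through edge (i, i+1); Q(e) = pi(i) K(i, i+1).
flow = np.array([(i + 1) * (n - 1 - i) / n**2 for i in range(n - 1)])
congestion = flow / (pi[:-1] * 0.25)
diam = n - 1                                    # length of the longest canonical path

lower_bound = 1.0 / (diam * congestion.max())   # Lemma 8.6: alpha >= lower_bound
gap = 1.0 - np.sort(np.linalg.eigvalsh(K))[-2]  # true Poincare constant (K is symmetric)
print(lower_bound, gap)                         # both scale as Theta(1/n^2)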

Dumbbell graph. Note that if we have a Markov chain whose diameter is polynomial in the size of the state space, then we should expect very slow mixing; this is essentially what happens in the path example. So, one can ask: if a Markov chain has a small (say logarithmic) diameter, can it still have a mixing time that is polynomial in the size of the state space? The answer is yes, and the famous dumbbell graph, two cliques joined by a single edge, is perhaps the worst example. In this graph the single edge connecting the two cliques is a bottleneck, because all of the flow between the two cliques must go over this edge. Note that, although we did not prove it, the path technology gives a necessary and sufficient condition for bounding the Poincaré constant: if we can show that for any routing of the multicommodity flow there is an edge with large congestion, it follows that the chain mixes very slowly.

[Figure: the dumbbell graph; the single edge joining the two cliques is the bottleneck.]

8.3 All or Nothing Theorem

In this section we use the path technology to prove the "all or nothing" theorem in counting.

Theorem 8.7. For any self-reducible problem, if there exists a polynomial time counting algorithm that gives a (1 + poly(n)) multiplicative factor approximation, then there is an FPRAS.

First, let us prove a weaker version of this theorem. Suppose we can get a constant factor α approximation for a counting problem, say counting colorings. So, for a given graph G, if it has C possible colorings, the algorithm returns a number p(G) where

    C/α ≤ p(G) ≤ α C.

Now, we feed this algorithm k disjoint copies of G, say G^k. The number of solutions of this instance is of course C^k, so the algorithm must return a number p(G^k) where

    C^k/α ≤ p(G^k) ≤ α C^k.

It follows that

    C/α^{1/k} ≤ p(G^k)^{1/k} ≤ α^{1/k} C.

So, we can get down to a (1 + ε) factor by letting k = log(α)/ε.
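A tiny numerical illustration of this amplification; the count C, the factor α, and ε below are made-up values.

import math

C = 12345.0       # (hypothetical) number of colorings of G
alpha = 4.0       # the counter is only guaranteed up to a factor of 4
eps = 0.1
k = math.ceil(math.log(alpha) / eps)

# Worst-case answers of the counter on G^k, after taking k-th roots:
high = (alpha * C**k) ** (1.0 / k)
low = (C**k / alpha) ** (1.0 / k)
print(low / C, high / C)                  # both lie in [e^{-eps}, e^{eps}]
print(alpha ** (1.0 / k), math.exp(eps))  # alpha^{1/k} <= e^eps = 1 + eps + O(eps^2)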

Next, we do the general case. The above simple idea does not work if the approximation factor depends on n, because once we blow up the instance the approximation factor deteriorates.

Proof. Say we work with the matching counting problem as a template. Recall that we consider an ordering of all edges of G, say e_1, ..., e_m. We consider a tree where every node at level i corresponds to a (valid) configuration of the first i edges, i.e., which of them are forced to be in the matching and which are forced out. For example, for i = 1 we have two graphs: G_1, where we delete e_1 together with its endpoints (e_1 is in), and G_2, where we delete only the edge e_1 (e_1 is out). For any node (graph) z of this tree, let N_z denote the number of matchings of the graph corresponding to z, or equivalently the number of leaves of the subtree T_z rooted at z. We also assume that we have access to an oracle that for any such z returns an estimate Ñ_z where

    N_z/α ≤ Ñ_z ≤ α N_z

for some α = poly(n). For simplicity we assume that the counting oracle is deterministic. Our goal is to build a uniform sampler; we can then use the sampler to construct an FPRAS.

For each edge (y, z) of the tree, where y is the parent of z, we let w_{y,z} = Ñ_z. Now, we run a random walk on this weighted tree starting from the root: at each vertex we choose an adjacent edge with probability proportional to its weight. Note that the leaves of the tree correspond to matchings, and the edges incident to the leaves all have weight 1 (we do not need to run the counting oracle on the leaves). Note also that we may move upward in the tree. Here is the high-level intuition for this random walk: the downward moves correspond to the previous idea we had for designing a sampling algorithm. Unfortunately, if the error in counting is large, the multiplicative errors accumulate as we go down the tree, so we may end up selecting some matching with probability a factor c^m larger than under the uniform distribution. Roughly speaking, the upward edges of the Markov chain are there to correct these incorrect estimates.

The following facts, together with a bound on the mixing time proved below, imply that the Markov chain can be used to generate a random matching.

Fact 8.8. The stationary distribution π is uniform over all leaves.

This is simply because the weight of the single edge incident to every leaf is 1; recall that in the stationary distribution of a random walk on an undirected weighted graph, the probability of every vertex is proportional to its (weighted) degree. This fact implies that if the chain mixes, and at stationarity we sample a state which happens to be a leaf, then that leaf gives us a uniformly random matching. Next, we study the probability of sampling a leaf in the stationary distribution.

Fact 8.9. The total stationary probability of the leaves is at least Ω(1/(αm)), i.e., a sample X ~ π is a leaf with probability at least Ω(1/(αm)).

Let N be the correct count that we are trying to find, i.e., the tree has N leaves. Since every leaf has weighted degree 1,

    Σ_{x a leaf} π(x) = N/D,

where D is the sum of the weighted degrees of all nodes of the tree. Now observe that the sum of the edge weights at each level of the tree is at most α M(G) = α N, where M(G) denotes the number of matchings of G. This is because every estimate returned by the approximate counter is within an α factor of the number of leaves of the corresponding subtree. It follows that

    D ≤ 2 α m N.                                                        (8.1)

Therefore, the total probability of the leaves at stationarity is at least

    N/D ≥ 1/(2αm).
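To make the construction concrete, here is a self-contained toy sketch of the weighted tree walk. The instance (a 4-cycle), the oracle (the exact matching count perturbed by a random factor in [1/α, α]), and all function names are hypothetical choices for the sketch, but the tree, the edge weights Ñ_z, and the walk itself follow the description above.

import random

random.seed(0)

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle, edges in a fixed order
m = len(edges)
ALPHA = 2.0

def valid(z):
    """z[i] = 1 if e_{i+1} is forced into the matching, 0 if forced out."""
    used = set()
    for d, (u, v) in zip(z, edges):
        if d:
            if u in used or v in used:
                return False
            used.update((u, v))
    return True

def exact_count(z):
    """N_z: number of matchings of G consistent with the partial assignment z."""
    if not valid(z):
        return 0
    if len(z) == m:
        return 1
    return exact_count(z + (0,)) + exact_count(z + (1,))

cache = {}
def approx_count(z):
    """A deterministic alpha-approximate oracle: N_z/alpha <= value <= alpha N_z."""
    if z not in cache:
        cache[z] = ALPHA ** random.uniform(-1, 1) * exact_count(z)
    return cache[z]

def edge_weight(z):
    """Weight of the tree edge between z and its parent (weight 1 at the leaves)."""
    return 1.0 if len(z) == m else approx_count(z)

def step(z):
    nbrs = ([] if len(z) == 0 else [z[:-1]]) + \
           ([z + (d,) for d in (0, 1) if valid(z + (d,))] if len(z) < m else [])
    weights = [edge_weight(c if len(c) > len(z) else z) for c in nbrs]
    return random.choices(nbrs, weights=weights)[0]

# Run the walk from the root and record the matchings seen at leaf visits; in
# line with Fact 8.8, the seven matchings of the 4-cycle appear about equally often.
z, counts = (), {}
for _ in range(100000):
    z = step(z)
    if len(z) == m:
        counts[z] = counts.get(z, 0) + 1
for leaf in sorted(counts):
    print(leaf, counts[leaf])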

The above fact shows that if we generate O(αm) samples from the chain after it has mixed, then with high probability we obtain a uniformly random leaf, i.e., a uniformly random matching. So, to prove the theorem, it remains to bound the mixing time of the chain. We show that the chain mixes in O(α^2 m^2 log(αm)) steps. This implies that we can obtain a uniformly random matching by running the chain for O(α^3 m^3 log(αm)) steps in total, which is polynomial in m and n.

To bound the mixing time we use the path technology. Note that there is a unique path between each pair of vertices of the tree, so there is no choice in constructing the paths; let us just calculate the congestion of every edge. The longest path has length O(m), so it is enough to show that the maximum congestion is at most O(α^2 m).

Fix an edge e = (z', z), where z' is the parent of z. First observe that

    Q(e) = π(z) · Ñ_z/D(z) = (D(z)/D) · (Ñ_z/D(z)) = Ñ_z/D,

where D(z) denotes the weighted degree of z. It remains to upper bound f(e). Let T_z be the subtree rooted at z; removing e separates T_z from the rest of the tree, so

    f(e) = Σ_{x ∈ T_z, y ∉ T_z} π(x) π(y) ≤ Σ_{x ∈ T_z} π(x) = π(T_z).

Using a calculation similar to the one we did before,

    π(T_z) ≤ 2 α m N_z / D.

This is because the sum of the edge weights at each level of T_z is at most α N_z. It follows that

    f(e)/Q(e) ≤ (2 α m N_z / D) / (Ñ_z / D) = 2 α m N_z / Ñ_z ≤ 2 α^2 m,

using Ñ_z ≥ N_z/α. Therefore, the inverse of the Poincaré constant is at most O(α^2 m^2), and the chain mixes in O(α^2 m^2 log(1/π(root))) steps. But

    π(root) ≥ (N/α)/D ≥ 1/(2 α^2 m),

where we used (8.1) and the fact that the weighted degree of the root is Ñ_{z_1} + Ñ_{z_2} ≥ (N_{z_1} + N_{z_2})/α = N/α for the two children z_1, z_2 of the root. Hence the mixing time is O(α^2 m^2 log(αm)), as claimed.
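To recap the parameter chase in one place (a summary of the bounds just derived, with the constants as computed above):

    max_e f(e)/Q(e) ≤ 2 α^2 m  and  max_{x,y} |P_{x,y}| ≤ 2m
        ⟹  1/(Poincaré constant) ≤ 4 α^2 m^2
        ⟹  mixing time from the root ≤ O(α^2 m^2 log(1/π(root))) = O(α^2 m^2 log(αm)),

and since a stationary sample is a leaf with probability Ω(1/(αm)), repeating the walk O(αm) times, i.e., O(α^3 m^3 log(αm)) steps in total, yields a uniformly random matching with high probability.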