Part 1: Les rendez-vous symétriques (Symmetric rendezvous)

1 Part 1: Les rendez-vous symétriques (Symmetric rendezvous). Cris Moore, Santa Fe Institute. Joint work with Alex Russell, University of Connecticut.

2 A walk in the park: there are n locations in a park. you and I each call them 1,...,n, but our labels differ by a uniformly random permutation π. at each step we can each move wherever we like (a complete graph). our initial positions are uniformly random, and we can't signal or leave messages. how can we minimize the expected time for us to rendezvous?

3 Wait for Mommy! it's optimal if you stay put and I visit 1,...,n (in any order): the expected time is (n+1)/2. at time t, you visit x_t and I visit y_t, so $t_{\mathrm{rend}} = \min\{t : x_t = \pi(y_t)\}$. no matter what we do, $\Pr_\pi[x_t = \pi(y_t)] = 1/n$, so by the union bound no strategy can do better:
$E[t_{\mathrm{rend}}] = \sum_{t=0}^{\infty} \Pr[t_{\mathrm{rend}} > t] = \sum_{t=0}^{\infty} \Pr[x_1 \ne \pi(y_1) \wedge \cdots \wedge x_t \ne \pi(y_t)] \ge \sum_{t=0}^{n} \left(1 - \frac{t}{n}\right) = \frac{n+1}{2}$
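
As a quick sanity check, here is a minimal Monte Carlo sketch of this wait-and-sweep strategy (not from the talk; the names and parameter values are mine). The average should come out near (n+1)/2.

```python
import random

def wait_for_mommy(n, rng):
    """One player waits at a uniform spot, the other sweeps 1..n in order.

    Locations are only defined up to a uniform relabeling, so the waiter's
    spot, seen in the sweeper's labels, is simply uniform on 1..n."""
    waiter = rng.randrange(1, n + 1)        # the waiter's spot, in the sweeper's labels
    for t, spot in enumerate(range(1, n + 1), start=1):
        if spot == waiter:
            return t                        # rendezvous time

n, trials = 100, 20000
rng = random.Random(0)
avg = sum(wait_for_mommy(n, rng) for _ in range(trials)) / trials
print(avg, (n + 1) / 2)                     # the two numbers should be close
```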

4 Symmetric strategies: what if we haven't agreed on who should wander and who should wait? we must each follow the same randomized strategy. e.g. if we both wander, visiting uniform locations, at each step we meet with probability 1/n; these events are independent, so E[t_rend] = n. can we do better? we need negative correlations:
$\Pr\bigl[x_{t+1} = \pi(y_{t+1}) \mid x_1 \ne \pi(y_1) \wedge \cdots \wedge x_t \ne \pi(y_t)\bigr] > \frac{1}{n}$

5 Anderson and Weber [1990]: flip a biased coin, and either wait or wander. in each round of n steps: with probability θ, choose a uniform location and stay there; with probability 1 − θ, choose a uniform permutation σ and visit σ(1),...,σ(n). optimize over θ.

6 Analyzing Anderson and Weber: if one of us wanders and the other waits, we meet in expected time n/2. if we both wander, we meet at the first t such that σ_x(t) = π(σ_y(t)), i.e. at the first fixed point (if any) of $\sigma_x^{-1} \pi \sigma_y$. when n is large, the number q of fixed points of a random permutation is Poisson with mean 1. if there are q fixed points, we'll find the first one in expected time n/(q+1). if q ≥ 1, we'll meet in expected time
$\frac{n}{1 - 1/e} \sum_{q=1}^{\infty} \frac{e^{-1}}{q!}\,\frac{1}{q+1} = n\,\frac{e-2}{e-1}$

7 Analyzing Anderson and Weber: putting all this together, we have
$E[t_{\mathrm{rend}}] = 2\theta(1-\theta)\,\frac{n}{2} \;+\; (1-\theta)^2\,\frac{e-2}{e}\,n \;+\; \left(\theta^2 + \frac{(1-\theta)^2}{e}\right)\bigl(n + E[t_{\mathrm{rend}}]\bigr)$
optimizing over θ gives $E[t_{\mathrm{rend}}] \approx 0.829\,n$. is this optimal? we don't know... but we can prove that there is a β > 1/2 such that for any symmetric strategy, $E[t_{\mathrm{rend}}] > \beta n$.
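
The analysis is easy to corroborate with a quick simulation. This is a throwaway sketch of the Anderson-Weber strategy (θ = 0.25 is an assumed value near the optimum, and the helper names are mine); the printed ratio should be roughly 0.83.

```python
import random

def aw_round(n, theta, rng):
    """One player's plan for a round of n steps of the Anderson-Weber strategy."""
    if rng.random() < theta:                 # wait at a uniform spot for the whole round
        return [rng.randrange(n)] * n
    perm = list(range(n))                    # wander along a uniform permutation
    rng.shuffle(perm)
    return perm

def aw_rendezvous(n, theta, rng):
    """Steps until two independent, identically randomized players coincide.

    The unknown relabeling pi is absorbed by the players' own uniform choices,
    so both plans can be simulated directly in a common labeling."""
    t = 0
    while True:
        a, b = aw_round(n, theta, rng), aw_round(n, theta, rng)
        for x, y in zip(a, b):
            t += 1
            if x == y:
                return t

n, trials, theta = 200, 2000, 0.25
rng = random.Random(1)
avg = sum(aw_rendezvous(n, theta, rng) for _ in range(trials)) / trials
print(avg / n)                               # roughly 0.83
```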

8 A clever failure: you and I independently choose random primes $p, q = O(\sqrt{n})$. we choose random subsets of {1,...,n} of size p and q respectively, and cycle through them deterministically. with high probability p and q are distinct, so we visit all $pq = O(n)$ pairs, and rendezvous with constant probability (birthdays!). the expected time for this to work is... wait for it... n. the events that π hits each of these pq pairs are nearly independent: we failed to create correlations that make the conditional probability of meeting increase.

9 Rendezvous and permanents: the permanent of a 0-1 matrix is the number of permutations in its support,
$\mathrm{perm}\,A = \sum_\pi \prod_i A_{i,\pi(i)}$
let $S_t \subseteq \{1,\ldots,n\} \times \{1,\ldots,n\}$ be the set of pairs we visited in the first t steps, and let $J_t$ be the matrix with $(J_t)_{ij} = 0$ if $(i,j) \in S_t$ and 1 otherwise. the probability we haven't met yet is the probability that π lies in the support of $J_t$, so
$E[t_{\mathrm{rend}}] = \sum_{t=0}^{\infty} \Pr[t_{\mathrm{rend}} > t] = \sum_{t=0}^{\infty} \frac{\mathrm{perm}\,J_t}{n!}$
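
For a tiny n this identity is easy to confirm by enumerating permutations directly; a throwaway sketch with a hand-picked example (all names are mine):

```python
from itertools import permutations

def perm01(M):
    """Permanent of a 0/1 matrix: the number of permutations p with M[i][p[i]] = 1 for all i."""
    n = len(M)
    return sum(all(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def check(xs, ys):
    """Count permutations pi avoiding a meeting, directly and via perm J_t."""
    n = 1 + max(max(xs), max(ys))
    S = set(zip(xs, ys))                           # the pairs visited so far
    J = [[0 if (i, j) in S else 1 for j in range(n)] for i in range(n)]
    direct = sum(all(x != p[y] for x, y in zip(xs, ys)) for p in permutations(range(n)))
    return direct, perm01(J)                       # the two counts should be equal

print(check(xs=[0, 1, 2], ys=[0, 0, 1]))           # e.g. n = 3 locations, t = 3 steps
```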

10 How fast can we drive down the permanent? when you wait for Mommy, $\mathrm{perm}\,J_t/n! = 1 - t/n$, and the union bound is tight

22 How fast can we drive down the permanent? when we both wander, $\mathrm{perm}\,J_t/n! \approx (1 - 1/n)^t \approx e^{-t/n}$, and each step is independent. after n steps, all we know is that π is a derangement: $\mathrm{perm}\,J_t/n! \approx 1/e$

23 Inclusion and exclusion: an arrangement is a subset of $S_t$ in which no two pairs share a row or column. there are (n−k)! permutations π that include a given arrangement of size k. let $A_k$ = number of arrangements of size k. the fraction of permutations that avoid all of them is
$\frac{\mathrm{perm}\,J_t}{n!} = \sum_{k=0}^{n} (-1)^k \frac{(n-k)!}{n!}\,A_k = \sum_{k=0}^{n} \frac{(-1)^k}{k!}\,\frac{A_k}{\binom{n}{k}} = 1 - \frac{A_1}{n} + \frac{A_2}{n(n-1)} - \frac{A_3}{n(n-1)(n-2)} + \cdots$
for instance, if t = n and $S_t$ is the diagonal, then $A_k = \binom{n}{k}$ and $\mathrm{perm}\,J_t/n! = \sum_{k=0}^{n} \frac{(-1)^k}{k!} \approx \frac{1}{e}$
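
The identity can be checked by brute force for a small $S_t$; this sketch (a toy example of mine) enumerates arrangements directly and compares against the permanent:

```python
from itertools import permutations, combinations
from math import factorial

def perm01(M):
    n = len(M)
    return sum(all(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def inclusion_exclusion(S, n):
    """sum_k (-1)^k A_k (n-k)!/n!, where A_k counts size-k arrangements inside S."""
    total = 0.0
    for k in range(n + 1):
        A_k = sum(1 for sub in combinations(S, k)
                  if len({i for i, j in sub}) == k and len({j for i, j in sub}) == k)
        total += (-1) ** k * A_k * factorial(n - k) / factorial(n)
    return total

n = 4
S = [(0, 0), (1, 0), (1, 2), (2, 1), (3, 3)]                 # a small, arbitrary S_t
J = [[0 if (i, j) in S else 1 for j in range(n)] for i in range(n)]
print(perm01(J) / factorial(n), inclusion_exclusion(S, n))   # the two values should agree
```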

24 A lower bound: cut inclusion-exclusion off at third order:
$\frac{\mathrm{perm}\,J_t}{n!} \ge 1 - \frac{A_1}{n} + \frac{A_2}{n(n-1)} - \frac{A_3}{n(n-1)(n-2)} \ge 1 - \frac{|S_t|}{n} + \frac{A_2}{n(n-1)}\left(1 - \frac{|S_t|}{3(n-2)}\right) \approx 1 - \frac{|S_t|}{n} + \frac{A_2}{n^2}\left(1 - \frac{t}{3n}\right)$
where we used $A_1 = |S_t| \le t$ and $A_3 \le |S_t|\,A_2/3$. hope: either $|S_t|$ is small, or $A_2$ is large (in expectation); then in either case we do better than 1 − t/n

25 Arrangements in space and time: $A_2$ = number of pairs of elements of $S_t$ that aren't in the same row or column. let $T_2$ = number of pairs of times t < t′ at which both players visit distinct places. let $X_{t,t'}$ and $Y_{t,t'}$ be the indicator random variables for the events $x_t \ne x_{t'}$ and $y_t \ne y_{t'}$; then
$T_2 = \sum_{t < t'} X_{t,t'}\,Y_{t,t'}$
let $Q_{t,t'} = E[X_{t,t'}] = \Pr[x_t \ne x_{t'}]$. since the players are identical and independent,
$E[T_2] = \sum_{t < t'} Q_{t,t'}^2$
a crude bound on how much $T_2$ overcounts $A_2$: $T_2 - A_2 \le \binom{t}{2} - \binom{|S_t|}{2}$

26 Arrangements in space and time: $T_2$ = number of pairs of times t < t′ at which both players visit distinct places. we have
$E[T_2] = \sum_{t < t'} Q_{t,t'}^2 \qquad\text{and}\qquad E\binom{|S_t|}{2} \le \sum_{t < t'} \bigl(2 Q_{t,t'} - Q_{t,t'}^2\bigr)$
we can relate these with Cauchy-Schwarz, giving
$E[T_2] = \sum_{t < t'} Q_{t,t'}^2 \ge \frac{1}{\binom{t}{2}} \left(\sum_{t < t'} Q_{t,t'}\right)^{2} \ge \frac{1}{\binom{t}{2}} \left(\frac{1}{2}\,E\binom{|S_t|}{2}\right)^{2}$
if $|S_t|$ is large, then $T_2$ is too

27 Putting it all together: combining all these bounds, and writing $E|S_t| = \alpha t$ and $t = \tau n$, gives a lower bound on $E[\mathrm{perm}\,J_t/n!]$ as a function of α and τ, while the union bound gives $E[\mathrm{perm}\,J_t/n!] \ge 1 - \tau$. for each τ we optimize over α and take whichever bound is better. integrating over τ gives $E[t_{\mathrm{rend}}] \ge \beta n + o(n)$, where β is the area under the curve. we get β = 0.548 (vs. 0.829 for Anderson-Weber)

28 What I think of this proof

29 A better parametrization? the Anderson-Weber strategy has two extremes: wander or stay put. the probability we don't meet after t steps is either 1 − t/n or $e^{-t/n}$ (or 1). how can we show that we often have $e^{-t/n}$? lemma: let $|S_t| = t$ and let τ = t/n, and suppose $S_t$ has the property that two pairs (i,j), chosen uniformly without replacement, collide on a row or column with probability c. then $\mathrm{perm}\,J_t/n! \le e^{-c\tau^2/2} \cosh(\cdots)$. proof: use $\frac{\mathrm{perm}\,J_t}{n!} = \sum_{k=0}^{t} \frac{(-1)^k}{k!}\,\frac{A_k}{\binom{n}{k}}$ together with a lower bound on $A_k$ of the form $\binom{t}{k}(1-c)^{\binom{k}{2}}$ for even k. this is a good bound when c is small; when c = 1, the union bound is tight. goal: prove that it's one or the other, so that Anderson-Weber is optimal

30 Variations on a theme we can put a graph structure on the locations suppose you and I are walking on a cycle: our initial positions are uniform, and so are our orientations conjecture [Alpern]: the optimal strategy is to flip a coin to choose a direction, travel halfway around the cycle, then try again lattices? hypercubes? Cayley graphs?

31 Part 2: Spectral clustering for sparse graphs joint work with Florent Krzakala, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborova, and Pan Zhang

32 suppose a graph G is generated as follows: divide n vertices into two groups of n/2 vertices each; for each pair (u,v), connect them with probability $c_{\mathrm{in}}/n$ if they are in the same group, and with probability $c_{\mathrm{out}}/n$ if they are in different groups. the average degree is $c = (c_{\mathrm{in}} + c_{\mathrm{out}})/2$. given just the graph, label the vertices according to their group! [figure: divisions of the political blog network found with the uncorrected and degree-corrected blockmodels; the degree-corrected division corresponds roughly to the split between liberal and conservative blogs]
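
A minimal numpy sketch of this generative model (the function name and parameter values are my own):

```python
import numpy as np

def sbm(n, c_in, c_out, rng):
    """Two-group stochastic block model: groups of size n/2, edge probability
    c_in/n within a group and c_out/n between groups."""
    labels = np.repeat([0, 1], n // 2)
    p = np.where(labels[:, None] == labels[None, :], c_in / n, c_out / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)     # sample each pair once
    A = (upper | upper.T).astype(int)                # symmetric adjacency, no self-loops
    return A, labels

rng = np.random.default_rng(0)
A, labels = sbm(1000, c_in=5.0, c_out=1.0, rng=rng)  # average degree c = 3
print(A.sum(axis=1).mean())                          # should be close to 3
```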

33 Warning: overfitting! the most likely group assignment, or MAP (maximum a posteriori) estimate, maximizes the probability of the graph but this overfits! e.g. random 3-regular graphs have bisections with only 11% of the edges crossing the cut [Zdeborová & Boettcher] indeed, there are many such bisections, and they have nothing in common! can we even distinguish G from an Erdős-Rényi graph?

34 A phase transition. theorem: we can label the vertices better than chance if and only if $|c_{\mathrm{in}} - c_{\mathrm{out}}| > 2\sqrt{c}$. conjectured by Decelle, Krzakala, Moore, and Zdeborová [2011] based on the cavity method; proved on the negative side by Mossel, Neeman, and Sly [2012], and on the positive side by Massoulié and MNS [2013]. [figure: overlap vs. $c_{\mathrm{out}}/c_{\mathrm{in}}$ for q = 4, c = 16, from belief propagation and Monte Carlo at sizes N = 128 up to N = 100k; near $c_{\mathrm{out}}/c_{\mathrm{in}} = 1$ the graph looks like an Erdős-Rényi graph and the labeling is undetectable, while small $c_{\mathrm{out}}/c_{\mathrm{in}}$ gives strong communities]

35 Clustering nodes with eigenvalues: take your favorite linear operator associated with a graph: the adjacency matrix, graph Laplacian, or stochastic transition matrix. if there are 2 groups, label nodes according to the sign of the 2nd eigenvector. if there are k groups, look at the first k eigenvectors, and use your favorite clustering algorithm in $\mathbb{R}^k$.
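
A hedged sketch of the simplest version (adjacency matrix, two groups, sign of the second eigenvector); the parameters are mine and are chosen well above the transition, so the recovered labeling should be well above 50% accurate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c_in, c_out = 2000, 20.0, 4.0                  # average degree 12, strong communities
labels = np.repeat([0, 1], n // 2)
p = np.where(labels[:, None] == labels[None, :], c_in / n, c_out / n)
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)

vals, vecs = np.linalg.eigh(A)                    # eigenvalues in ascending order
guess = (vecs[:, -2] > 0).astype(int)             # sign of the second-largest eigenvector

accuracy = max(np.mean(guess == labels), np.mean(guess != labels))   # the sign of the split is arbitrary
print(accuracy)
```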

36 When does this work? using tools from random matrix theory, we can compute the typical spectrum of the adjacency matrix of a graph generated by the stochastic block model. the bulk of the spectral density ρ(λ) follows the Wigner semicircle law, supported on $[-2\sqrt{c}, 2\sqrt{c}]$. the communities are detectable as long as $\lambda_2$ lies outside this bulk... and it crosses the edge of the bulk exactly at the detectability transition! [Nadakuditi and Newman, Phys. Rev. Lett. 2012]

37 But in the sparse case... if v has degree d, a walk applying $A^2$ has d ways to return to v; thus A has an eigenvector, localized near v, with eigenvalue at least $\sqrt{d}$. these localized eigenvalues deviate from the semicircle law, and the informative eigenvectors get lost in the bulk
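
A tiny sketch (my own toy example) makes the localization concrete: plant a single high-degree vertex in a sparse random graph and look at the top of the adjacency spectrum.

```python
import numpy as np

rng = np.random.default_rng(4)
n, c, d = 2000, 3.0, 100
upper = np.triu(rng.random((n, n)) < c / n, k=1)
A = (upper | upper.T).astype(float)
A[0, 1:d + 1] = A[1:d + 1, 0] = 1.0              # plant a hub: vertex 0 gets degree ~ d

vals, vecs = np.linalg.eigh(A)
print(vals[-1], np.sqrt(d), 2 * np.sqrt(c))      # top eigenvalue >= sqrt(d), far outside the semicircle
print(vecs[0, -1] ** 2)                          # a large share of that eigenvector's weight sits on the hub
```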

38 The non-backtracking operator: B describes a walk on the directed edges of the network, with backtracking prohibited: this prevents paths from returning again and again to a high-degree vertex. it appears that the bulk of B's spectrum is confined to a disk of radius $\sqrt{c}$, and spectral clustering with B works all the way down to the detectability transition
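
Here is a rough sketch of spectral clustering with B (my own construction, not the authors' implementation); reading off a vertex label by summing the edge eigenvector over incoming edges is one common convention, and the parameters are my choice, a bit above the transition:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigs

rng = np.random.default_rng(2)
n, c_in, c_out = 2000, 5.0, 1.0                        # sparse: average degree 3
labels = np.repeat([0, 1], n // 2)
p = np.where(labels[:, None] == labels[None, :], c_in / n, c_out / n)
upper = np.triu(rng.random((n, n)) < p, k=1)
A = upper | upper.T

# the non-backtracking operator on directed edges: B[(u->v), (v->w)] = 1 whenever w != u
src, dst = np.nonzero(A)
edge_id = {(u, v): k for k, (u, v) in enumerate(zip(src, dst))}
rows, cols = [], []
for k, (u, v) in enumerate(zip(src, dst)):
    for w in np.nonzero(A[v])[0]:
        if w != u:
            rows.append(k)
            cols.append(edge_id[(v, w)])
B = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(len(src), len(src)))

# second eigenvector of B, summed over the edges pointing into each vertex
vals, vecs = eigs(B, k=2, which='LR')
v2 = vecs[:, np.argsort(vals.real)[-2]].real
score = np.zeros(n)
np.add.at(score, dst, v2)
guess = (score > 0).astype(int)
print(max(np.mean(guess == labels), np.mean(guess != labels)))   # should be noticeably above 1/2
```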

39

40 Comparing with standard spectral methods: [figure: overlap achieved by spectral algorithms based on different linear operators (non-backtracking, modularity, random walk, adjacency, Laplacian) and by belief propagation, as a function of $c_{\mathrm{in}} - c_{\mathrm{out}}$ at fixed average degree c = 3, each point averaged over 20 instances of size $n = 10^5$; the detectability transition occurs at $c_{\mathrm{in}} - c_{\mathrm{out}} = 2\sqrt{c}$] [Krzakala, Moore, Mossel, Neeman, Sly, Zdeborová, Zhang, PNAS 2013]

41 A sharp boundary for the bulk? conjecture: let G = G(n, c/n). for any ε > 0, with high probability, except for the leading eigenvalue $\lambda_1 \approx c$, all eigenvalues of B have absolute value at most $\sqrt{c} + \varepsilon$. a simple argument shows that almost all eigenvalues are in this disk: since eigenvalues are at most singular values in absolute value,
$\sum_i |\lambda_i|^{2r} \le \mathrm{tr}\,B^r (B^r)^T$
since G is locally treelike, for small r each diagonal entry $\bigl(B^r (B^r)^T\bigr)_{ee}$ counts the vertices r steps ahead of the edge e; in expectation this is $c^r$, so the moments are bounded: averaging over the spectrum, $E\,|\lambda|^{2r} \le c^r$. but this only limits the number of bad eigenvalues to $n^\alpha$ for some α = α(c) < 1. proving the conjecture requires r ~ diam(G), far beyond treelike neighborhoods
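
The conjecture is easy to probe numerically at small sizes (a sketch of mine; at these sizes the bulk edge is only approximate):

```python
import numpy as np

rng = np.random.default_rng(5)
n, c = 400, 3.0
upper = np.triu(rng.random((n, n)) < c / n, k=1)
A = upper | upper.T

# dense non-backtracking matrix on the 2m directed edges:
# B[k, l] = 1 if edge k ends where edge l starts and edge l is not the reversal of edge k
src, dst = np.nonzero(A)
B = ((dst[:, None] == src[None, :]) & (src[:, None] != dst[None, :])).astype(float)

lam = np.linalg.eigvals(B)
lam = lam[np.argsort(-np.abs(lam))]
print(abs(lam[0]), c)                  # leading eigenvalue, roughly c
print(abs(lam[1]), np.sqrt(c))         # next-largest modulus, roughly the conjectured bulk radius sqrt(c)
```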

42 A more compact form: for a graph with n vertices and m edges, B is a 2m × 2m matrix, but except for ±1 eigenvalues its spectrum is the same as that of the 2n × 2n matrix
$B' = \begin{pmatrix} 0 & D - \mathbb{1} \\ -\mathbb{1} & A \end{pmatrix}$
(here $\mathbb{1}$ is the identity and D the diagonal matrix of degrees). think of a directed edge as having a head and a tail: each neighbor of the head gains a head (A), the tail becomes an antihead ($-\mathbb{1}$), and the head becomes d − 1 tails ($D - \mathbb{1}$).

43 A more compact form:
$B' = \begin{pmatrix} 0 & D - \mathbb{1} \\ -\mathbb{1} & A \end{pmatrix}$
gives a quadratic eigenvalue equation:
$B' \begin{pmatrix} u \\ v \end{pmatrix} = \lambda \begin{pmatrix} u \\ v \end{pmatrix} \;\Rightarrow\; \bigl(\lambda A - D - (\lambda^2 - 1)\mathbb{1}\bigr)\,v = 0$
so B has an eigenvalue λ if and only if λA − D has an eigenvalue $\lambda^2 - 1$. the spectral density of λA − D is the fixed point of a recursive distribution [Bordenave and Lelarge; Khorunzhy, Shcherbina, and Vengerovsky; Saade, Krzakala, Zdeborová] but proving zero density doesn't prove a gap...
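
A quick numerical check of this correspondence, again just a sketch with sizes of my choosing: build B and B′ for a small graph, compare their leading eigenvalues, and verify the quadratic relation for that eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(6)
n, c = 200, 4.0
upper = np.triu(rng.random((n, n)) < c / n, k=1)
A = (upper | upper.T).astype(float)
D = np.diag(A.sum(axis=1))
I = np.eye(n)

# B on directed edges, and the 2n x 2n form B' = [[0, D - I], [-I, A]]
src, dst = np.nonzero(A)
B = ((dst[:, None] == src[None, :]) & (src[:, None] != dst[None, :])).astype(float)
Bp = np.block([[np.zeros((n, n)), D - I], [-I, A]])

lam = max(np.linalg.eigvals(B), key=abs)            # leading eigenvalue of B, roughly c
lamp = max(np.linalg.eigvals(Bp), key=abs)
print(lam, lamp)                                    # the two leading eigenvalues should agree

# lam^2 - 1 should then be an eigenvalue of lam*A - D
ev = np.linalg.eigvals(lam * A - D)
print(min(abs(ev - (lam ** 2 - 1))))                # close to zero, up to numerical error
```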

44 Shameless Plug To put it bluntly: this book rocks! It somehow manages to combine the fun of a popular book with the intellectual heft of a textbook. Scott Aaronson, MIT This is, simply put, the best-written book on the theory of computation I have ever read; one of the best-written mathematical books I have ever read, period. Cosma Shalizi, Carnegie Mellon
