Spectral Partitioning in the Planted Partition Model

Spectral Graph Theory                                        Lecture 21
Spectral Partitioning in the Planted Partition Model
Daniel A. Spielman                                    November 11, 2009

21.1 Introduction

In this lecture, we will perform a crude analysis of the performance of spectral partitioning algorithms in the planted partition model. In this model, we build a random graph that has a natural partition. The simplest model of this form is for the graph bisection problem: the problem of partitioning the vertices of a graph into two equal-sized sets while minimizing the number of edges bridging the sets.

To create an instance of the planted bisection problem, we first choose a partition of the vertices into equal-sized sets $V_1$ and $V_2$. We then choose probabilities $p > q$, and place edges between vertices with the following probabilities:

\[
\Pr[(u,v) \in E] =
\begin{cases}
p & \text{if } u \in V_1 \text{ and } v \in V_1 \\
p & \text{if } u \in V_2 \text{ and } v \in V_2 \\
q & \text{otherwise.}
\end{cases}
\]

The expected number of edges crossing between $V_1$ and $V_2$ will be $q |V_1| |V_2|$. If $p$ is sufficiently larger than $q$, then every other bisection will have more crossing edges. In this lecture, we will show that this partition can be recovered from the second eigenvector of the adjacency matrix of the graph. This will be a crude version of an analysis of McSherry [McS01].

There have been many analyses of graph partitioning algorithms under planted partition models such as this. The model is motivated by the idea that vertices (or general items) belong to certain categories, and that vertices in the same categories are more likely to be connected. Such models also arise in the analysis of clustering algorithms. However, it is not clear that these models represent practice very well.

21.2 The Perturbation Approach

As long as we don't tell our algorithm, we can choose $V_1 = \{1, \ldots, n/2\}$ and $V_2 = \{n/2 + 1, \ldots, n\}$. Let's do this for simplicity.
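To make the model concrete, here is a minimal numpy sketch (not part of the original notes; the function name and parameters are illustrative) that samples an instance of the planted bisection problem:

```python
import numpy as np

def planted_bisection(n, p, q, rng):
    """Sample the adjacency matrix of a planted-bisection graph:
    V1 = {0, ..., n/2 - 1}, V2 = {n/2, ..., n - 1}; an edge appears
    with probability p within a side and q across the two sides."""
    half = n // 2
    # Edge probabilities: p on the two diagonal blocks, q elsewhere.
    probs = np.full((n, n), q)
    probs[:half, :half] = p
    probs[half:, half:] = p
    # Sample the strict upper triangle, then symmetrize (no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float)

rng = np.random.default_rng(0)
A = planted_bisection(200, 0.5, 0.1, rng)
# Expected number of crossing edges: q * |V1| * |V2| = 0.1 * 100 * 100 = 1000.
print(A[:100, 100:].sum())
```

Sampling only the upper triangle and mirroring it is what enforces the symmetry condition $A(i,j) = A(j,i)$ from the model.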

Define the matrix

\[
M = \begin{bmatrix}
p J_{n/2} & q J_{n/2} \\
q J_{n/2} & p J_{n/2}
\end{bmatrix},
\]

where we write $J_{n/2}$ for the square all-1s matrix of size $n/2$.

The adjacency matrix of the planted partition graph is obtained by setting $A(i,j) = 1$ with probability $M(i,j)$, subject to $A(i,j) = A(j,i)$. So, this is a random graph, but the probabilities of some edges are different from others.

We will study a very simple algorithm for finding an approximation of the planted bisection: compute $v_2$, the eigenvector of the second-largest eigenvalue of $A$, and then set $S = \{i : v_2(i) < 0\}$. We guess that $S$ is one of the sets in the bisection. We will show that under reasonable conditions on $p$ and $q$, $S$ will be mostly right.

Intuitively, the reason is that $A$ is a slight perturbation of $M$, and so the eigenvectors of $A$ should look like the eigenvectors of $M$. For that to make sense, I should have said what the eigenvectors of $M$ look like. Of course, the constant vectors are eigenvectors of $M$. We have

\[
M \mathbf{1} = \frac{n}{2}(p + q) \mathbf{1}.
\]

The second eigenvector of $M$ has two values: one on $V_1$ and one on $V_2$. Let's be careful to make this a unit vector. We take

\[
w_2(i) = \begin{cases}
1/\sqrt{n} & i \in V_1 \\
-1/\sqrt{n} & i \in V_2.
\end{cases}
\]

Then,

\[
M w_2 = \frac{n}{2}(p - q) w_2.
\]

So, the second-largest eigenvalue of $M$ is $(n/2)(p - q)$. As $M$ has rank 2, all the other eigenvalues of $M$ are zero.

Now, let $R = A - M$. For $(u,v)$ in the same component,

\[
\Pr[R(u,v) = 1 - p] = p \quad \text{and} \quad \Pr[R(u,v) = -p] = 1 - p,
\]

and for $(u,v)$ in different components,

\[
\Pr[R(u,v) = 1 - q] = q \quad \text{and} \quad \Pr[R(u,v) = -q] = 1 - q.
\]

As in the last lecture, we can bound the probability that the norm of $R$ is large. While I'm not yet sure if we can bound it using the same technique, we can appeal to a result of Vu [Vu07, Theorem 1.4], which implies the following.
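The whole algorithm fits in a few lines. The following sketch (again numpy, not from the notes; all variable names and parameter values are illustrative) builds $M$, samples $A$, and thresholds the second eigenvector:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 400, 0.5, 0.2     # q > p/3, matching the assumption used later
half = n // 2

# Edge-probability matrix M: p within each side, q across sides.
M = np.full((n, n), q)
M[:half, :half] = p
M[half:, half:] = p

# Sample a symmetric 0/1 adjacency matrix A with Pr[A(i,j) = 1] = M(i,j).
U = np.triu(rng.random((n, n)) < M, k=1)
A = (U | U.T).astype(float)

# v2 = eigenvector of the second-largest eigenvalue of A
# (eigh returns eigenvalues in ascending order).
v2 = np.linalg.eigh(A)[1][:, -2]
S = set(np.flatnonzero(v2 < 0))

# S should be (mostly) one of the two sides; the sign of v2 is
# arbitrary, so compare against both sides and keep the better match.
side1 = set(range(half))
side2 = set(range(half, n))
errors = min(len(S ^ side1), len(S ^ side2))
print("misclassified vertices:", errors)
```

Comparing $S$ against both sides is necessary because an eigenvector is only defined up to sign, so the algorithm cannot know which side it has labeled negative.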

Theorem 21.2.1. There exist constants $c_1$ and $c_2$ such that, with probability approaching 1,

\[
\|R\| \le 2\sqrt{pn} + c_1 (pn)^{1/4} \ln n,
\]

provided that $pn \ge c_2 \ln^4 n$.

We will apply the following corollary.

Corollary 21.2.2. There exists a constant $c_0$ such that, with probability approaching 1,

\[
\|R\| \le 3\sqrt{pn},
\]

provided that $pn \ge c_0 \ln^4 n$.

In fact, Krivelevich and Vu [?, Theorem ??] prove that the probability that the norm of $R$ exceeds this value by more than $t$ is exponentially small in $t$. However, we will not need that fact for this lecture.

Ignoring the details of the asymptotics, let's just assume that $\|R\|$ is small, and investigate the consequences.

21.3 Perturbation Theory for Eigenvectors

Let $\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_n$ be the eigenvalues of $A$, and let $\mu_1 > \mu_2 > \mu_3 = \cdots = \mu_n$ be the eigenvalues of $M$. We know from a problem set that

\[
|\alpha_i - \mu_i| \le \|R\|.
\]

In particular, if

\[
\|R\| < \frac{n}{4}(p - q),
\]

then

\[
\frac{n}{4}(p - q) < \alpha_2 < \frac{3n}{4}(p - q),
\]

and, assuming $q > p/3$, we have

\[
\alpha_1 > \frac{3n}{4}(p - q).
\]

So, we can view $\alpha_2$ as a perturbation of $\mu_2$. The natural question is whether we can view $v_2$ as a perturbation of $w_2$. Here is the theory that says we can.
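As a quick empirical sanity check of Corollary 21.2.2 (a sketch, not part of the notes; the parameter values are arbitrary illustrations), one can sample $R = A - M$ and compare its spectral norm to $3\sqrt{pn}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 500, 0.4, 0.2
half = n // 2

M = np.full((n, n), q)
M[:half, :half] = p
M[half:, half:] = p

U = np.triu(rng.random((n, n)) < M, k=1)
A = (U | U.T).astype(float)
R = A - M

# Spectral norm of the symmetric noise matrix R (largest |eigenvalue|),
# compared against the corollary's bound 3 * sqrt(p * n).
norm_R = np.abs(np.linalg.eigvalsh(R)).max()
print(norm_R, 3 * np.sqrt(p * n))
```

In a typical run the observed norm sits near the $2\sqrt{pn}$ leading term of Theorem 21.2.1, comfortably inside the corollary's bound.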

Theorem 21.3.1. Let $A$ and $M$ be symmetric matrices. Let $R = M - A$. Let $\alpha_1 \ge \cdots \ge \alpha_n$ be the eigenvalues of $A$ with corresponding eigenvectors $v_1, \ldots, v_n$, and let $\mu_1 \ge \cdots \ge \mu_n$ be the eigenvalues of $M$ with corresponding eigenvectors $w_1, \ldots, w_n$. Let $\theta_i$ be the angle between $v_i$ and $w_i$. Then,

\[
\sin \theta_i \le \frac{2 \|R\|}{\min_{j \ne i} |\alpha_i - \alpha_j|},
\]

and

\[
\sin \theta_i \le \frac{2 \|R\|}{\min_{j \ne i} |\mu_i - \mu_j|}.
\]

We remark that this bound may be tightened slightly, essentially eliminating the 2. We defer the proof of this theorem for a few minutes, and first see what it implies.

21.4 Partitioning

Consider $\delta = v_2 - w_2$. For every vertex $i$ that is mis-classified by $v_2$, we have $|\delta(i)| \ge 1/\sqrt{n}$. So, if $v_2$ mis-classifies $k$ vertices, then $\sqrt{k/n} \le \|\delta\|$. As $w_2$ and $v_2$ are unit vectors, we may apply the crude inequality

\[
\|\delta\| \le \sqrt{2} \sin \theta_2
\]

(the $\sqrt{2}$ disappears as $\theta_2$ gets small, so we will drop it below).

To combine this with the perturbation bound, we assume $q > p/3$, and find

\[
\min_{j \ne 2} |\mu_2 - \mu_j| = \frac{n}{2}(p - q).
\]

Assuming that $\|R\| \le 3\sqrt{pn}$, the tightened form of the bound gives

\[
\sin \theta_2 \le \frac{3\sqrt{pn}}{\frac{n}{2}(p - q)} = \frac{6\sqrt{p}}{\sqrt{n}(p - q)}.
\]

So, the number $k$ of misclassified vertices satisfies

\[
\sqrt{k/n} \le \frac{6\sqrt{p}}{\sqrt{n}(p - q)},
\]

which implies

\[
k \le \frac{36 p}{(p - q)^2}.
\]
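The chain of inequalities above can be checked numerically. The following sketch (illustrative, not from the notes; the parameter values are assumptions) compares $\sin \theta_2$ against the bound of Theorem 21.3.1 and the misclassification count against $36p/(p-q)^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 600, 0.4, 0.2     # note q > p/3, as the analysis assumes
half = n // 2

# Edge-probability matrix M and a sampled symmetric adjacency matrix A.
M = np.full((n, n), q)
M[:half, :half] = p
M[half:, half:] = p
U = np.triu(rng.random((n, n)) < M, k=1)
A = (U | U.T).astype(float)

# w2: second eigenvector of M, +-1/sqrt(n) on the two sides.
w2 = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(n)
v2 = np.linalg.eigh(A)[1][:, -2]
if v2 @ w2 < 0:
    v2 = -v2                           # eigenvectors are defined up to sign

sin_theta = np.sqrt(max(0.0, 1.0 - float(v2 @ w2) ** 2))
R_norm = np.abs(np.linalg.eigvalsh(A - M)).max()
gap = (n / 2) * (p - q)                # min_{j != 2} |mu_2 - mu_j| for q > p/3
print(sin_theta, 2 * R_norm / gap)     # observed angle vs. Theorem 21.3.1

k = int(np.sum(np.sign(v2) != np.sign(w2)))
print(k, 36 * p / (p - q) ** 2)        # misclassified vs. 36 p / (p - q)^2
```

In practice both observed quantities tend to be far below their bounds, which is exactly the pessimism discussed in Section 21.6.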

So, if $p$ and $q$ are both constants, we expect to misclassify at most a constant number of vertices. If $p = 1/2$ and $q = p - 12/\sqrt{n}$, then we get

\[
\frac{36 p}{(p - q)^2} = \frac{n}{8},
\]

so we expect to mis-classify at most a constant fraction of the vertices.

21.5 Proof of Eigenvector Perturbation

Proof of Theorem 21.3.1. By considering the matrices $M - \mu_i I$ and $A - \mu_i I$ instead of $M$ and $A$, we can assume that $\mu_i = 0$. As the theorem is vacuous if $\mu_i$ has multiplicity more than 1, we may also assume that $\mu_i$ has multiplicity 1 as an eigenvalue, and that $w_i$ is a unit vector in the nullspace of $M$. Our assumption that $\mu_i = 0$ also leads to $|\alpha_i| \le \|R\|$.

Expand $v_i$ in the eigenbasis of $M$, as

\[
v_i = \sum_j c_j w_j, \quad \text{where } c_j = w_j^T v_i.
\]

Setting

\[
\delta = \min_{j \ne i} |\mu_j|,
\]

we may compute

\[
\|M v_i\|^2 = \sum_j c_j^2 \mu_j^2 \ge \sum_{j \ne i} c_j^2 \delta^2 = \delta^2 \sum_{j \ne i} c_j^2 = \delta^2 (1 - c_i^2) = \delta^2 \sin^2 \theta_i.
\]

On the other hand,

\[
\|M v_i\| \le \|A v_i\| + \|R v_i\| = |\alpha_i| + \|R v_i\| \le 2 \|R\|.
\]

So,

\[
\sin \theta_i \le \frac{2 \|R\|}{\delta}.
\]
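Since Theorem 21.3.1 is a deterministic statement, its conclusion can be tested directly on small random symmetric matrices. The following sketch (not from the notes; the matrices and scaling are arbitrary assumptions) checks the bound for every eigenvector pair:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8

# A symmetric M with well-separated eigenvalues 1, ..., n, plus a small
# symmetric perturbation R; A = M + R.
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
M = Q @ np.diag(np.arange(1, n + 1, dtype=float)) @ Q.T
R = rng.standard_normal((n, n))
R = 0.05 * (R + R.T) / 2
A = M + R

mu, W = np.linalg.eigh(M)
alpha, V = np.linalg.eigh(A)
R_norm = np.abs(np.linalg.eigvalsh(R)).max()

for i in range(n):
    gap = min(abs(mu[i] - mu[j]) for j in range(n) if j != i)
    cos = abs(W[:, i] @ V[:, i])       # |cos(theta_i)|, sign-independent
    sin_theta = np.sqrt(max(0.0, 1.0 - cos ** 2))
    assert sin_theta <= 2 * R_norm / gap   # the bound of Theorem 21.3.1
```

Here the eigenvalue gaps are all 1 while $\|R\|$ is small, so every angle stays well inside the bound; shrinking the gaps is what lets the eigenvectors rotate, as the next example shows.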

It may seem surprising that the amount by which eigenvectors move depends upon how close their respective eigenvalues are to the other eigenvalues. However, this dependence is necessary. To see why, first consider a matrix with a repeated eigenvalue, such as

\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]

Now, let $v$ be any unit vector, and consider

\[
B = A + \epsilon v v^T.
\]

The matrix $B$ will have $v$ as an eigenvector of eigenvalue $1 + \epsilon$, as well as an eigenvalue of 1. So, by making an arbitrarily small perturbation, we were able to select which eigenvector of $B$ had the largest eigenvalue.

To make this effect clearer, let $w$ be any other unit vector, and consider the matrix

\[
C = A + \epsilon w w^T.
\]

So, $w$ is the eigenvector of $C$ of eigenvalue $1 + \epsilon$, and the other eigenvalue is 1. On the other hand,

\[
\|C - B\| \le \|\epsilon w w^T\| + \|\epsilon v v^T\| = 2 \epsilon.
\]

So, while $B$ and $C$ differ very little, their dominant eigenvectors can be completely different. This is because the eigenvalues were close together.

21.6 Improving the Partition

If I get a chance, I'll describe how one improves such a partition in practice, and how McSherry did it in theory. I'll begin by observing that the analysis we performed is very pessimistic. It relies on an upper bound on $\|w_2 - v_2\|$. But, $v_2$ was produced by a random process, so it seems unlikely that all of its weight would be concentrated on a few vertices.

References

[McS01] F. McSherry. Spectral partitioning of random graphs. In FOCS '01: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, page 529, Washington, DC, USA, 2001. IEEE Computer Society.

[Vu07] Van Vu. Spectral norm of random matrices. Combinatorica, 27(6):721-736, 2007.