Supplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting

Size: px

Start display at page:

Download "Supplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting"

Joan Hodges
6 years ago
Views:

1 Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity of Illiois at Urbaa-Champaig jxu18@illiois.edu Abstract We provide the proofs for the theorems i the mai paper. 1 Proofs for Plated Clusterig I this sectio, Theorems 1 6 refer to the theorems i the mai paper. Equatios are umbered cotiuously from the mai paper. We let 1 := r ad 2 := r be the umbers of oisolated ad isolated odes, respectively. 1.1 Proof of Theorem 1 The proof relies o iformatio theoretical argumets ad the Fao s iequaliy [4]. We use D (Ber(p Ber(q to deote the L divergece betwee two Beroulli distributios with mea p ad q. We first state a upper boud o D (Ber(p Ber(q, which is used later i the proof: D (Ber(p Ber(q = p log p q + (1 p log 1 p 1 q (a p p q + (1 p q p (p q2 = q 1 q q(1 q, (16 where (a follows from the iequality log x x 1, x 0. Let P (Y,A be the joit distributio of Y ad A whe Y is sampled from Y uiformly at radom ad A is geerated accordig to the plated clusterig model. Because the supremum is lower bouded by the average, we have if Ŷ sup P [Ŷ ] Y if P (Y,A [Ŷ ] Y. (17 Y Y Ŷ Let H(X be the etropy of a radom variable X ad I(X; Z the mutual iformatio betwee X ad Z. By Fao s iequality, we have for ay Ŷ, P (Y,A(Ŷ Y 1 I(Y ; A + 1. (18 log Y Simple coutig gives that Y = ( 1! 1 r!(!. Note that ( r 1 ( 1 1 ad ( e! e ( e. It follows that Y (/ ( 1 /e 1 ( e r(r/e r e r r/2 (/e e(r. r 1

2 This implies log Y log uder the assumptio that 8 /2 ad 32. O the other had, ote that H(A ( 2 H(A12 because the A ij s are idetically distributed by symmetry. Furthermore,the A ij s are idepedet coditioed o Y, so H(A Y = ( 2 H(A12 Y12. It follows that I(Y ; A = H(A H(A Y ( 2 I(Y 12 ; A 12. We boud I(Y12 ; A 12 below. Observe that ( 2 ( ( P(Y12 2 r+ 1 (r 1! = 1 = = α := Y, ad thus P(A 12 = 1 = β := αp + (1 αq. It follows that I(Y 12; A 12 = αd (Ber(p Ber(β + (1 αd (Ber(q Ber(β (a (p β2 α β(1 β = α(1 α(p q2 β(1 β + (1 αq (q β2 β(1 β (b α(p q2 q(1 q, where (a follows from (16 ad (b follows because β(1 β αp(1 p + (1 αq(1 q due to the cocavity of x(1 x. Hece we have I(Y ; A 1( 1(p q 2 2q(1 q. Combiig with (18, we obtai P (Y,A(Y Y 1 1 ( 1(p q 2 q(1 q log 3/4 ( 1(p q2 q(1 q log, (19 where the last iequality holds because 1 log 8 whe /2 ad 32. The RHS above is at least 1/2 whe the first coditio (1 i the theorem holds. Substitutig ito (17 proves sufficiecy of (1. We ow tur to the third coditio (3. Sice p > q, we have β p by the defiitio of β. Moreover, β αp ad 1 β = 1 q α(p q (1 α(1 q. It follows that I(Y12; A 12 = αp log p (1 p + α(1 p log + (1 αd (Ber(q Ber(β β 1 β αp log 1 (q β2 + (1 α α β(1 β = αp log 1 α + α2 (1 α(p q 2 β(1 β αp log 1 α + α(p q2 p(1 q αp log e α. where the first iequality follows from p/β 1/α, 1 β 1 p ad (16, the secod iequality follows from β(1 β α(1 αp(1 q, ad the last iequality follows from p q p(1 q. By the defiitio of α, we have α ( 1 ( 1 e whe 8. It follows that I(Y 12 ; A 12 αp log e2, ad thus I(Y ; A 1 2 1p log e2 1. By equatio (19, if p 16, i.e., the coditio (3 i the theorem holds, the P(Y Y 1/2. It remais to prove the sufficiecy of the secod coditio (3. Let M = ad Ȳ = {Y 0, Y 1,..., Y M} be a subset of Y with cardiality M + 1, which is specified later. Let P (Y,A deote the joit distributio of Y ad A whe Y is sampled from Ȳ uiformly at radom ad the A is geerated accordig to the plated clusterig model. By Fao s iequality, we have if Ŷ sup P [Ŷ ] Y if P (Y,A [Ŷ ] Y if Y Y Ŷ Ŷ { 1 I(Y ; A + 1 log Ŷ }. (20 2

3 We costruct Ȳ as follows. Let Y 0 be the clusterig matrix such that the clusters {C l } r l=1 are give by C l = {(l 1 + 1,..., l}. Iformally, each Y i with i 1 is obtaied from Y 0 by swappig two odes i two differet clusters. More specifically, for each i [ M], (i if the ( + i-th ode belogs to cluster C l for some l, the Y i has the first right cluster as {1, 2,..., 1, + i} ad the l-th right cluster as D l \ { + i} {}, ad all the other clusters idetical to Y 0 ; (ii if the ( +i-th ode is a isolated ode i Y 0, the Y i has the first right cluster as {1, 2,..., 1, +i} ad ode as a isolated ode, ad all the other clusters idetical to Y 0. Let P i be the distributio of the graph A coditioed o Y = Y i. Note that each P i is a product of 1 2( 1 Beroulli distributios. We have the followig chai of iequalities: I(Y ; A (a 1 (M M i,i =0 max D (P i P i i,i =0,...,M D (P i P i (b 3D (Ber(p Ber(q + 3D (Ber(q Ber(p (c 3(p q 2 1 mi{p(1 p, q(1 q} where (a follows from the covexity of L divergece, (b follows by our costructio of {Y i }, ad (c follows from (16. If (2 holds, the I(Y ; A 1 4 log( = 1 4 log Ȳ. Sice log( log(/2 4 if 128, it follows from (20 that the miimax error probability is at least 1/ Proof of Theorem 2 Let X, Y := Tr(X Y deote the ier product betwee two matrices. Assume that p > q first. For ay feasible solutio Y Y of (4, we defie (Y := A, Y Y. To prove the theorem, it suffices to show that (Y > 0 for all feasible Y with Y Y. For simplicity, i this proof we use a differet covetio that Y ii = 0 ad Y ii = 0 for all i V. Note that (Y = E[A], Y Y + A E[A], Y Y, (21 where E[A] = q11 +(p qy qi, 1 is the all oe vector i R ad I is the idetity matrix. Let d(y := Y, Y Y ; sice i,j Y ij = i,j Y ij, we have O the other had, observe that A E[A], Y Y = 2 E[A], Y Y = (p qd(y. (22 (i<j: Y ij =1 Y ij =0 (A ij p 2 } {{ } T 1 (Y (i<j: Y ij =0 Y ij =1 (A ij q. } {{ } T 2 (Y Here T 1 (Y (T 2 (Y, resp. is the sum of 1 2d(Y i.i.d. cetered Beroulli radom variables with parameter p (q, resp.. Let δ 1 = (p q/(2p ad δ 2 = (p q/(2q. By the Berstei iequality, we have for each fix Y Y, { P T 1 (Y δ } 1 d(y p 2 { } P T 2 (Y δ 2 d(y q 2 ( exp ( exp δ 2 1 4(1 p + 4δ 1 /3 δ 2 2 4(1 q + 4δ 2 /3 (a d(y p exp ( (b d(y q exp ( (p q2 20p(1 q d(y (p q2 20p(1 q d(y,, 3

4 where (a ad (b hold because p > q ad p q p(1 q. It follows from the uio boud that { 1 P 2 A E[A], Y Y = T 1 (Y T 2 (Y 1 } (p q2 (p qd(y 2 exp ( 2 20p(1 q d(y. This implies (p q2 P { (Y 0} 2 exp ( 20p(1 q d(y (23 i view of (21 ad (22. This boud holds for each fixed Y Y. We proceed to boud the probability of the evet { Y Y : Y Y, (Y 0}. Note that 2( 1 d(y r 2 for ay feasible Y Y, where the lower boud is achieved by swappig a ode i V 1 with a ode i V 2, ad the upper boud follows from i,j Y ij r2. The key step is to upper-boud the cardiality of the set {Y Y : d(y = t} for each t. This is doe i the followig combiatorial lemma. Lemma 1.1. For each t [2( 1, r 2 ], we have {Y Y : d(y = t} 25t2 2 20t/. (24 We prove the lemma i Sectio Combiig the lemma with (23 ad the uio boud, we obtai 2 2 P { Y Y : Y Y, (Y 0} (a 50 r 2 t=2 2 r 2 t=2 2 r 2 t=2 2 r 2 t=2 2 P { Y Y : d(y = t, (Y 0} ( { Y Y : d(y = t} exp (p q2 t 20p(1 q 25t 2 ( 2 20t/ exp (p q2 t 20p(1 q 2 5t/ 50r , where (a follows from the assumptio that (p q 2 C p(1 q log for a large costat C. This meas Y is the uique optimal solutio with high probability. A similar argumet applies to the case with q > p. This completes the proof of the theorem Proof of Lemma 1.1 Recall that C 1,..., C r are the true clusters associated with Y. Fix a Y Y with d(y = t. The cluster matrix Y defies a ew ordered partitio (C 1,..., C r+1 of V accordig to the followig procedure. 1. Let C r+1 := {i : Y ij = 0, j}. 4

5 2. The odes i V \C r+1 ca be further partitioed ito r ew clusters of size, where odes i ad j are i the same cluster if ad oly if Y ij = 1; we defie a orderig C 1,..., C r of these r ew clusters as follows. (a For each ew cluster C, if there exists a k [r] such that C Ck > /2, the we label this ew cluster as C k ; this label is uique because the cluster size is. (b The remaiig clusters are labeled arbitrarily. This ew partitio has the followig properties: (A0 (C 1,..., C r, C r+1 is a partitio of V, ad C k = for all k [r]. (A1 For every k [r], either C k Ck > /2, or C k C k /2 for all k [r]; (A2 We have r C k C r+1 2 Ck C r+1 + Ck C k C k C k = t. k=1 Here, Properties (A0 ad (A1 are direct cosequeces of how we label the ew clusters, ad Property (A2 follows from the followig equalities t = d(y = {(i, j V V : Yij = 1, Y ij = 0} r = {(i, j : (i, j Ck C k, i j, Y ij = 0} = k=1 r {(i, j : (i, j Ck C k, i j, (i, j C r+1 C r+1 } k=1 + r k=1 {(i, j : (i, j C k C k, (i, j C k C k }. Sice these properties are satisfied by ay Y with d(y = t, we have {Y Y : d(y = t} {(C 1,..., C r, C r+1 : it has properties (A0 (A2}. (25 To prove the lemma, it suffices to upper boud the right had side of (25. Fix a ordered partitio C := (C 1,..., C r, C r+1 with properties (A0 (A2. We cosider the first true cluster, C 1. Defie m 1 := k [r+1],k 1 C k C 1, which is the umber of odes i C 1 that are misclassified by C. We cosider two cases. If C 1 C1 > /10, the C1 C k C1 C k 2 C1 C 1 k [r+1] k 1 C 1 C k > m 1 /5. If C 1 C 1 /10, the by coditio (A1 we must have C k C 1 /2 for all 1 k r 5

6 ad m 1 > 9/10. Hece, =m k r C 1 C k C 1 C k + C 1 C r+1 2 C 1 C r+1 I {k 1}I {k 1} C k C 1 C k C 1 + C 1 C r+1 2 C 1 C r+1 m m m 1. C k C 1 2 C 1 C r+1 We coclude that we always have C 1 C k C 1 C k + C 1 C r+1 2 C 1 C r m 1. The above iequality holds if we replace C1 by C k ad m 1 by m k (defied similarly for each k [r]. It follows that r C k C r+1 2 Ck C r+1 + Ck C k C k C k r m k. 5 k=1 Thus by Property (A2, we have k [r] m k 5t/, i.e., the total umber of misclassified odes i V 1 is upper bouded by 5t/. This meas that the total umber of misclassified odes i V 2 is also upper bouded by 5t/ because by our cluster size costrait, oe misclassified ode i V 2 must produce oe misclassified ode i V 1. Therefore, the total umber of misclassified odes i V 1 ca at most take 5t/ differet values ad the same is true for the total umber of misclassified odes i V 2. For a fixed umber of misclassified odes i V 1 ad V 2, there are at most 5t/ 1 5t/ 2 differet ways to choose these misclassified odes. Each misclassified ode i V 1 ca be assiged to oe of r 1 differet clusters or leave isolated, ad each misclassified ode i V 2 ca be assiged to oe of r differet clusters. Hece, the right had side of (25 is upper bouded by 25t 2 5t/ 2 1 5t/ 2 r 10t/ 25t2 20t/, which proves the lemma Proof of Theorem 3 The proof uses several matrix orms. The spectral orm X of a matrix X is the largest sigular value of X. The uclear orm X is the sum of sigular values. We also eed the L 1 orm X 1 = i,j X ij ad the L orm X = max i,j X ij. Let X, Y = Tr(X Y deote the ier product betwee two matrices. For a vector x, x 2 is the usual Euclidea orm. Defie (Y = Y Y, A. It suffices to show that (Y > 0 for all feasible solutio Y of the program (6 (8 with Y Y. Rewrite (Y as (Y = Ā, Y Y + A Ā, Y Y = Ā, Y Y + λ W, Y Y, (26 k=1 6

7 where Ā = q11 + (p qy ad W := (A Ā/λ. Because i,j Y ij = r 2 = i,j Y ij ad Y ij [0, 1], the first term i the RHS of (26 satisfies Ā, Y Y = (p q Y, Y Y = p q 2 Y Y 1. We ow cotrol the secod term i (26. Note that Var[A ij ] p(1 q. Defie σ 2 := max{p(1 q, q(1 p}. Note that Ā E[A] 1. By assumptio, σ2 C log / for a costat C. We eed the followig boud, which is proved i Sectio to follow. Lemma 1.2. Uder the otatio above, if σ 2 C log / for a costat C, the there exists a costat C such that with high probability A E[A] C σ 2 log + q(1 q. It follows that with high probability A Ā A E[A] + Ā E[A] C σ 2 log + q(1 q for a uiversal costat C. Defie λ := C σ log + q(1 q ad the ormalized oise matrix. Thus w.h.p. W 1. Let u k be the ormalized characteristic vector of cluster C k, i.e., u k(i = 1/ if ode i is i cluster C k ad u k(i = 0 otherwise. Let U = [u 1,..., u r ]. The Y = UU is the sigular value decompositio of Y. Defie the projectios P T (M = UU M + MUU UU MUU ad P T (M = M P T (M. Because P T (W W 1 w.h.p., UU + P T (W is a subgradiet of X at X = Y. Sice Y = 1, it follows that for ay feasible Y, 0 Y Y UU + P T (W, Y Y. Substitutig ito (26, we obtai that for ay feasible Y, (Y p q 2 Y Y 1 + λ P T (W UU, Y Y ( p q λ UU P T (λw Y Y 1 2 ( p q = λ 2 P T (λw Y Y 1, (27 where the last equality holds because UU = 1/. We proceed by boudig the term P T (λw i (27. From the defiitio of P T, we have P T (λw UU (λw + (λw UU + UU (λw UU 3 UU (λw. Assume ode i belogs to cluster k. The (UU (λw ij = (u k u k (λw ij = (1/ i C k (λw i j, which is the average of idepedet radom variables. By Berstei s iequality (Theorem A.1 we have with probability at least 1 4, i Ck 6σ 2 log + 2 log C 1 σ log, for some costat C 1, where the last iequality follows from the assumptio (9. It follows from the uio boud over all (i, j that that P T (λw 3 UU (λw 3C 1 σ log / with probability at least 1 2. Substitutig back to (27, we coclude that with probability at least 1 1, ( p q (Y λ 2 3C 1σ log / Y Y 1 > 0 for all feasible Y Y, where the last iequality follows from the assumptio (9. 7

8 1.3.1 Proof of Lemma 1.2 Let R := support(y ad P R ( : R R be the operator which sets the etries outside of R to be zero. Let B 1 = P R (A E[A] ad B 2 = A E[A] B 1. The B 1 is a block-diagoal matrix with r blocks of size ad has etries with variace bouded by σ 2. Applyig the matrix Berstei iequality [5], we get that with high probability, B 1 c 1 σ 2 log for a costat c 1. O the other had, B 2 has etries with variace bouded by max{q(1 q, c 2 log /} for a costat c 2. By Lemma 1.3 below, we obtai that B 2 c 3 max{ q(1 q, log } for a costat c 3. It follows that A E[A] B 1 + B 2 c 1 σ 2 log + c 3 max{ q(1 q, log } which completes the proof of the lemma. C σ 2 log + q(1 q, Lemma 1.3. Let M deote the symmetric matrix such that M ij (1 i < j are idepedet radom variables with P(M ij = 1 p ij = p ij ad P(M ij = p ij = 1 p ij, ad M ii = 0. Suppose that Var[M ij ] σ 2 with σ 2 C log / for a costat C, the with high probability M Cσ for a costat C. Proof. If σ 2 log7, the Theorem 8.4 i [1] implies that M 3σ w.h.p. If C log σ 2 log 7, the Lemma 2 i [2] implies that M Cσ w.h.p. for some uiversal costat C. 1.4 Proof of Theorem 4 We first claim that (p q c 2 p + q implies (p q c2 2q uder the assumptio that /2 ad q c 1 log. I fact, if p q, the the claim trivially holds. If p > q, the q < p/ p/2. It follows that p/2 < (p q c 2 p + q c2 2p. Thus, p < 8c 2 2 which cotradicts the assumptio that p > q c 1 log. Therefore, p > q caot hold. Hece, it suffices to show that if (p q c 2 2q, the Y is ot a optimal solutio of the covex program (6 (8. We do this by showig that the optimality of Y implies (p q > c 2 2q. Let I be the all-oe matrix. Let R := support(y ad A := support(a. Recall that Y = UU is the sigular value decompositio of Y, ad the orthogoal projectio oto the space T is give by P T (M = UU M + MUU UU MUU. Cosider the Lagragia L(Y ; λ, µ, F, G := A, Y + λ ( Y Y + η ( I, Y r 2 F, Y + G, Y I, where λ 0, η R, F ij 0 ad G ij 0, i, j are the Lagragia multipliers. Note that Y = r2 I 2 is strictly feasible so strog duality holds by Slater s Theorem. By stadard covex aalysis, if Y = Y is a optimal solutio, the there must exists some F, G ad λ for which the followig 8

9 T coditios hold: 0 L(Y ; λ, µ, F, G Y Y =Y, F ij 0, (i, j, G ij 0, (i, j, λ 0, F ij = 0, (i, j R, G ij = 0, (i, j R c. } Statioary coditio Dual feasibility } Complemetary slackess Recall that M R is a subgradiet of X at X = Y if ad oly if P T (M = UU ad P T (M 1. Let H = F G; the T coditios imply that there exist some λ, η, W ad H obeyig ( A λ UU + W ηi + H = 0, (28 λ 0, (29 P T W = 0, (30 W 1, (31 H ij 0, (i, j R, (32 H ij 0, (i, j R c. (33 Now observe that UU W UU = 0 by (30. We left ad right multiply (28 by UU to obtai Ā λuu ηi + H = 0, where for ay matrix X R, X := UU XUU is the matrix obtaied by takig the average i each block of X. Cosider the last display equatio o etries i R ad R c respectively. Applyig the Berstei iequality (Theorem A.1 for each etry of Ā ij, we get that with high probability, p λ η + H ij c 3 p(1 p log q(1 q log q η + H ij c 3 c 4 log c 4 log 2 2 (a ɛ 0, (i, j R (34 8 (b ɛ 0 8, (i, j Rc (35 for some costats c 3, c 4 > 0, where (a ad (b follow from the assumptio c 1 log with c1 sufficietly large. Usig (32 ad (33, we get q ɛ 0 8 q c 3 q(1 q log c 4 log 2 2 η p + c 3 p(1 p log + c 4 log 2 2 λ p + ɛ 0 8 λ. (36 It follows that λ (p q + c 3 ( p(1 p log + q(1 q log + c 4 log { } c 4 log 4 max (p q, c 3 p(1 p log, c3 q(1 q log,. (37 c 1 9

10 O the other had, (31, (30 ad (28 imply λ 2 = λ(uu + W 2 1 λ(uu + W 2 F = 1 A ηi + H 2 F 1 A R c ηi R c + H R c 2 F 1 (1 η 2 A ij, (i,j R c where X R c deotes that matrix obtaied from X by settig the etries outside R c to zero. Usig η ɛ 0, which is a cosequece of (36, (29 ad the assumptio p 1 ɛ 0, we obtai λ ɛ2 0 (i,j R c A ij,. (38 Note that (i,j R c A ij equals two times the sum of ( ( 2 r 2 i.i.d. Beroulli radom variables with parameter q. By the Cheroff boud of Biomial distributios ad the assumptio that q c 1 log, we kow with high probability (i<j R c A ij Cq 2 for some costat C. It follows from (38 that λ 2 C q for some costat C > 0. Combiig with (37 ad the assumptio that q c 1 log, we coclude that (p q C q for some costat C > 0. This completes the proof of the theorem. 1.5 Proof of Theorem 5 The degree d i of ode i is distributed as Bi( 1, p plus a idepedet Bi(, q if i V 1. Otherwise d i is distributed as Bi( 1, q if i V 2. It follows that E[d i ] = ( 1q +( 1(p q if i V 1 ad E[d i ] = ( 1q if i V 2. If we defie σ 2 = p(1 q + q(1 q, the we further have Var[d i ] σ 2. Set t := ( 1 p q /2 σ 2 ; the Berstei iequality (Theorem A.1 gives ( t 2 P{ d i E[d i ] t} 2 exp 2σ 2 2 exp ( ( 12 (p q 2 + 2t/3 12σ 2 2 2, where the last iequality follows from assumptio (12. By the uio boud, with probability at least 1 2 1, d i > (p q 2 + q for all odes i V 1 ad d i < (p q 2 + q for all odes i V 2. Therefore, all odes i V 2 are correctly declared to be isolated with high probability. The umber of commo eighbors S ij betwee the odes i ad j is distributed as Bi( 2, p 2 plus a idepedet Bi(, q 2 if i ad j are i the same cluster, ad it is distributed as Bi(2( 1, pq plus a idepedet Bi( 2, q 2 if i ad j are i differet clusters. Hece, E[S ij ] equals ( 2p 2 + ( q 2 if i ad j are i the same cluster ad 2( 1pq + ( 2q 2 otherwise. The differece i mea equals (p q 2 2p(p q. Let σ 2 = 2p 2 (1 q 2 +q 2 (1 q 2. The Var[S ij ] σ 2. Set t := (p q 2 /3 σ 2. Assumptio (13 implies that t > 2p(p q. Applyig the Berstei iequality (Theorem A.1, we obtai ( P{ S ij E[S ij ] t t 2 } 2 exp 2σ 2 2 exp ( 2 (p q 4 + 2t/3 27σ 2 2 3, where the last iequality follows from the assumptio (13. By uio boud, with probability at least 1 2 1, S ij > (p q pq + q 2 for all odes i, j from the same cluster ad S ij < (p q pq + q 2 for all odes i, j from two differet clusters. Therefore the simple algorithm returs the true clusters w.h.p. 10

11 1.6 Proof of Theorem 6 For simplicity we assume ad 2 are eve umbers. We partitio V 1 ito two equal-sized subsets V 1+ ad V 1 such that half of the odes of each cluster are i V 1+. Similarly, V 2 is partitioed ito two equal-sized subsets V 2+ ad V 2. To prove the theorem, we eed the followig aticocetratio iequality. Theorem 1.4 (Theorem i [3]. Let X 1,..., X be idepedet radom variables such that 0 X i 1 for all i. Suppose X = i=1 X i ad σ 2 := i=1 Var[X i] 200. The for all 0 t σ 2 /100 ad some uiversal costat c > 0, we have P [X E[X] + t] ce t2 /(3σ 2. Idetifyig isolated odes For each ode i i V 1+ V 2+, let d i+ be the umber of its eighbors i V 1+ V 2+ ad d i be the umber of its eighbors i V 1 V 2, so d i = d i+ + d i. We cosider two cases. Case 1: Suppose (p + ( q log 1 q log 2. I this case we have ( 1 2 (p q 2 2c 2 (p+q log 1 by (14. For each i V 1+, d i is distributed as Bi(/2, p plus a idepedet Bi(( /2, q. Let t := ( 1(p q + 2, γ 1 := E[d i ] t = q/2 + (p q/2 t ad σd 2 := Var[d i ] = 1 2 p(1 p ( q(1 q. Sice /2, p, q are bouded away from 1 ad p + q p 2 + q 2 c 1 log, we have σd 2 c log 200. Combiig wit (14, we further have σd 4 c 2 (p q 2 / log 1 c log t 2. We ca thus apply Theorem 1.4 ad get ( P [d i γ 1 ] c exp ( t2 (( 1(p q σd 2 = exp c1 c, 3(p(1 p + ( q(1 q for some costat c > 0 that ca be made small by choosig c 2 above sufficietly small. Let i := arg mi i V1+ d i. Sice the radom variables {d i : i V 1+ } are mutually idepedet, we have P [d i γ 1 ] = ( P [d i γ 1 ] (1 c1 c 1/2 exp c 1 c 1 /2 1/4. i V 1+ O the other had, for each i V 1+, d i+ is distributed as Bi(/2 1, p plus a idepedet Bi(( /2, q. Sice the media of Bi(, p is at most p + 1, we kow that with probability at least 1/2, d i+ γ 2 := q/2 + (p q/2 p + 2. Now observe that the two sets of radom variables {d i+, i V 1+ } ad {d i, i V 1+ } are idepedet of each other, so d i+ is idepedet of i for each i V 1+. It follows that P [d i + γ 2 ] = i V 1+ P [d i+ γ 2 i = i] P [i = i] = i V 1+ P [d i+ γ 2 ] P [i = i] 1 2. Combiig the last two display equatios by the uio boud, we obtai that with probability at least 1/4, d i = d i + d i + γ 1 + γ 2 = ( 1q, ad thus ode i will be icorrectly declared as a isolated ode. Case 2: Suppose (p + q log 1 q log 2. I this case we have ( 1 2 (p q 2 2c 2 q log 2 by assumptio. Defie i = arg max i V2+ d i. Usig the same argumet as i Case 1, we ca show that with probability at least 1/4, d i ( 1q + ( 1(p q ad thus ode i will icorrectly declared as a o-isolated ode. 11

12 Recoverig clusters For two odes i, j V 1, let S ij+ be the umber of their commo eighbors i V 1+ V 2+ ad S ij be the umber of their commo eighbors i V 1 V 2, so S ij+ = S ij+ +S ij. For each pair of odes i, j i V 1+ that are from the same cluster, S ij is distributed as Bi(/2, p 2 plus a idepedet Bi(( /2, q 2. Let t := (p q 2 + 4, γ 3 := E[S ij ] t = q 2 /2+(p 2 q 2 /2 t, ad σs 2 := Var[S ij ] = 1 2 p2 (1 p ( q2 (1 q 2. Sice /2, p, q are bouded away from 1 ad p 2 + q 2 c 1 log, we have that σs ad σ2 S 100t. Theorem 1.4 implies that there exists a costat c > 0 such that ( P [S ij γ 3 ] c exp ( t 2 ((p q = c exp 3(p 2 (1 p 2 + ( q 2 (1 q 2 c c 1, 3σ 2 S where the costat c > 0 ca be made sufficietly small by choosig c 2 sufficietly small i the statemet of the lemma. Without loss of geerality, we may re-label the odes such that V 1+ = {1, 2,..., 1 /2} ad for each k = 1,..., 1 /4, the odes 2k 1 ad 2k are i the same cluster. Note that the radom variables {S (2k 12k : k = 1, 2,..., 1 /4} are mutually idepedet. Let i = arg mi k=1,2,...,1 /4 S (2k 12k ad j = i + 1; it follows that P [S i j γ 3 ] (1 c c 1 1/4 exp( c 1 c 1 /4 1/4. O the other had, sice S ij+ is distributed as Bi(/2 2, p 2 plus a idepedet Bi(( /2, q 2, we use the media argumet to obtai that with probability at least 1/2, S ij+ γ 4 := q 2 /2 + (p 2 q 2 /2 2p Because {S ij+, i, j V 1+ } oly depeds o the edges betwee V 1+ ad V 1+ V 2+, ad (i, j oly depeds o the edges betwee V 1+ ad V 1 V 2, we kow {S ij+, i, j V 1+ } ad (i, j are idepedet of each other. It follows that S i j + γ 4 with probability at least 1/2. It follows that with probability at least 1/4, S i j = S i j + S i j + γ 3 + γ 4 = 2( 1pq + ( 2q 2 ad thus the odes i, j will be icorrectly assiged to two differet clusters. Appedices A Stadard Berstei Iequality Theorem A.1. Let X 1,..., X be idepedet radom variables such that X i M almost surely. Let σ 2 = i=1 Var(X i, the for ay t 0 [ ] ( t 2 P X i t exp 2σ Mt. i=1 A cosequet of the above iequality is P [ i=1 X i 2σ 2 u + 2Mu 3 ] e u for ay u > 0. 12

13 Refereces [1] S. Chattergee. Matrix estimatio by uiversal sigular value thresholdig. arxiv: , [2] L. Massoulié ad D.-C. Tomozei. Distributed user profilig via spectral methods. arxiv: , [3] J. Matoušek ad J. Vodrák. The probabilistic method, lecture otes. kam.mff.cui. cz/ matousek/ prob-l-2pp.ps.gz, [4]. Rohe, S. Chatterjee, ad B. Yu. Spectral clusterig ad the high-dimesioal stochastic blockmodel. The Aals of Statistics, 39(4, [5] J. Tropp. User-friedly tail bouds for sums of radom matrices. Foudatios of Computatioal Mathematics, 12(4: ,

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space