Supplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting

Size: px
Start display at page:

Download "Supplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting"

Transcription

1 Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity of Illiois at Urbaa-Champaig jxu18@illiois.edu Abstract We provide the proofs for the theorems i the mai paper. 1 Proofs for Plated Clusterig I this sectio, Theorems 1 6 refer to the theorems i the mai paper. Equatios are umbered cotiuously from the mai paper. We let 1 := r ad 2 := r be the umbers of oisolated ad isolated odes, respectively. 1.1 Proof of Theorem 1 The proof relies o iformatio theoretical argumets ad the Fao s iequaliy [4]. We use D (Ber(p Ber(q to deote the L divergece betwee two Beroulli distributios with mea p ad q. We first state a upper boud o D (Ber(p Ber(q, which is used later i the proof: D (Ber(p Ber(q = p log p q + (1 p log 1 p 1 q (a p p q + (1 p q p (p q2 = q 1 q q(1 q, (16 where (a follows from the iequality log x x 1, x 0. Let P (Y,A be the joit distributio of Y ad A whe Y is sampled from Y uiformly at radom ad A is geerated accordig to the plated clusterig model. Because the supremum is lower bouded by the average, we have if Ŷ sup P [Ŷ ] Y if P (Y,A [Ŷ ] Y. (17 Y Y Ŷ Let H(X be the etropy of a radom variable X ad I(X; Z the mutual iformatio betwee X ad Z. By Fao s iequality, we have for ay Ŷ, P (Y,A(Ŷ Y 1 I(Y ; A + 1. (18 log Y Simple coutig gives that Y = ( 1! 1 r!(!. Note that ( r 1 ( 1 1 ad ( e! e ( e. It follows that Y (/ ( 1 /e 1 ( e r(r/e r e r r/2 (/e e(r. r 1

2 This implies log Y log uder the assumptio that 8 /2 ad 32. O the other had, ote that H(A ( 2 H(A12 because the A ij s are idetically distributed by symmetry. Furthermore,the A ij s are idepedet coditioed o Y, so H(A Y = ( 2 H(A12 Y12. It follows that I(Y ; A = H(A H(A Y ( 2 I(Y 12 ; A 12. We boud I(Y12 ; A 12 below. Observe that ( 2 ( ( P(Y12 2 r+ 1 (r 1! = 1 = = α := Y, ad thus P(A 12 = 1 = β := αp + (1 αq. It follows that I(Y 12; A 12 = αd (Ber(p Ber(β + (1 αd (Ber(q Ber(β (a (p β2 α β(1 β = α(1 α(p q2 β(1 β + (1 αq (q β2 β(1 β (b α(p q2 q(1 q, where (a follows from (16 ad (b follows because β(1 β αp(1 p + (1 αq(1 q due to the cocavity of x(1 x. Hece we have I(Y ; A 1( 1(p q 2 2q(1 q. Combiig with (18, we obtai P (Y,A(Y Y 1 1 ( 1(p q 2 q(1 q log 3/4 ( 1(p q2 q(1 q log, (19 where the last iequality holds because 1 log 8 whe /2 ad 32. The RHS above is at least 1/2 whe the first coditio (1 i the theorem holds. Substitutig ito (17 proves sufficiecy of (1. We ow tur to the third coditio (3. Sice p > q, we have β p by the defiitio of β. Moreover, β αp ad 1 β = 1 q α(p q (1 α(1 q. It follows that I(Y12; A 12 = αp log p (1 p + α(1 p log + (1 αd (Ber(q Ber(β β 1 β αp log 1 (q β2 + (1 α α β(1 β = αp log 1 α + α2 (1 α(p q 2 β(1 β αp log 1 α + α(p q2 p(1 q αp log e α. where the first iequality follows from p/β 1/α, 1 β 1 p ad (16, the secod iequality follows from β(1 β α(1 αp(1 q, ad the last iequality follows from p q p(1 q. By the defiitio of α, we have α ( 1 ( 1 e whe 8. It follows that I(Y 12 ; A 12 αp log e2, ad thus I(Y ; A 1 2 1p log e2 1. By equatio (19, if p 16, i.e., the coditio (3 i the theorem holds, the P(Y Y 1/2. It remais to prove the sufficiecy of the secod coditio (3. Let M = ad Ȳ = {Y 0, Y 1,..., Y M} be a subset of Y with cardiality M + 1, which is specified later. Let P (Y,A deote the joit distributio of Y ad A whe Y is sampled from Ȳ uiformly at radom ad the A is geerated accordig to the plated clusterig model. By Fao s iequality, we have if Ŷ sup P [Ŷ ] Y if P (Y,A [Ŷ ] Y if Y Y Ŷ Ŷ { 1 I(Y ; A + 1 log Ŷ }. (20 2

3 We costruct Ȳ as follows. Let Y 0 be the clusterig matrix such that the clusters {C l } r l=1 are give by C l = {(l 1 + 1,..., l}. Iformally, each Y i with i 1 is obtaied from Y 0 by swappig two odes i two differet clusters. More specifically, for each i [ M], (i if the ( + i-th ode belogs to cluster C l for some l, the Y i has the first right cluster as {1, 2,..., 1, + i} ad the l-th right cluster as D l \ { + i} {}, ad all the other clusters idetical to Y 0 ; (ii if the ( +i-th ode is a isolated ode i Y 0, the Y i has the first right cluster as {1, 2,..., 1, +i} ad ode as a isolated ode, ad all the other clusters idetical to Y 0. Let P i be the distributio of the graph A coditioed o Y = Y i. Note that each P i is a product of 1 2( 1 Beroulli distributios. We have the followig chai of iequalities: I(Y ; A (a 1 (M M i,i =0 max D (P i P i i,i =0,...,M D (P i P i (b 3D (Ber(p Ber(q + 3D (Ber(q Ber(p (c 3(p q 2 1 mi{p(1 p, q(1 q} where (a follows from the covexity of L divergece, (b follows by our costructio of {Y i }, ad (c follows from (16. If (2 holds, the I(Y ; A 1 4 log( = 1 4 log Ȳ. Sice log( log(/2 4 if 128, it follows from (20 that the miimax error probability is at least 1/ Proof of Theorem 2 Let X, Y := Tr(X Y deote the ier product betwee two matrices. Assume that p > q first. For ay feasible solutio Y Y of (4, we defie (Y := A, Y Y. To prove the theorem, it suffices to show that (Y > 0 for all feasible Y with Y Y. For simplicity, i this proof we use a differet covetio that Y ii = 0 ad Y ii = 0 for all i V. Note that (Y = E[A], Y Y + A E[A], Y Y, (21 where E[A] = q11 +(p qy qi, 1 is the all oe vector i R ad I is the idetity matrix. Let d(y := Y, Y Y ; sice i,j Y ij = i,j Y ij, we have O the other had, observe that A E[A], Y Y = 2 E[A], Y Y = (p qd(y. (22 (i<j: Y ij =1 Y ij =0 (A ij p 2 } {{ } T 1 (Y (i<j: Y ij =0 Y ij =1 (A ij q. } {{ } T 2 (Y Here T 1 (Y (T 2 (Y, resp. is the sum of 1 2d(Y i.i.d. cetered Beroulli radom variables with parameter p (q, resp.. Let δ 1 = (p q/(2p ad δ 2 = (p q/(2q. By the Berstei iequality, we have for each fix Y Y, { P T 1 (Y δ } 1 d(y p 2 { } P T 2 (Y δ 2 d(y q 2 ( exp ( exp δ 2 1 4(1 p + 4δ 1 /3 δ 2 2 4(1 q + 4δ 2 /3 (a d(y p exp ( (b d(y q exp ( (p q2 20p(1 q d(y (p q2 20p(1 q d(y,, 3

4 where (a ad (b hold because p > q ad p q p(1 q. It follows from the uio boud that { 1 P 2 A E[A], Y Y = T 1 (Y T 2 (Y 1 } (p q2 (p qd(y 2 exp ( 2 20p(1 q d(y. This implies (p q2 P { (Y 0} 2 exp ( 20p(1 q d(y (23 i view of (21 ad (22. This boud holds for each fixed Y Y. We proceed to boud the probability of the evet { Y Y : Y Y, (Y 0}. Note that 2( 1 d(y r 2 for ay feasible Y Y, where the lower boud is achieved by swappig a ode i V 1 with a ode i V 2, ad the upper boud follows from i,j Y ij r2. The key step is to upper-boud the cardiality of the set {Y Y : d(y = t} for each t. This is doe i the followig combiatorial lemma. Lemma 1.1. For each t [2( 1, r 2 ], we have {Y Y : d(y = t} 25t2 2 20t/. (24 We prove the lemma i Sectio Combiig the lemma with (23 ad the uio boud, we obtai 2 2 P { Y Y : Y Y, (Y 0} (a 50 r 2 t=2 2 r 2 t=2 2 r 2 t=2 2 r 2 t=2 2 P { Y Y : d(y = t, (Y 0} ( { Y Y : d(y = t} exp (p q2 t 20p(1 q 25t 2 ( 2 20t/ exp (p q2 t 20p(1 q 2 5t/ 50r , where (a follows from the assumptio that (p q 2 C p(1 q log for a large costat C. This meas Y is the uique optimal solutio with high probability. A similar argumet applies to the case with q > p. This completes the proof of the theorem Proof of Lemma 1.1 Recall that C 1,..., C r are the true clusters associated with Y. Fix a Y Y with d(y = t. The cluster matrix Y defies a ew ordered partitio (C 1,..., C r+1 of V accordig to the followig procedure. 1. Let C r+1 := {i : Y ij = 0, j}. 4

5 2. The odes i V \C r+1 ca be further partitioed ito r ew clusters of size, where odes i ad j are i the same cluster if ad oly if Y ij = 1; we defie a orderig C 1,..., C r of these r ew clusters as follows. (a For each ew cluster C, if there exists a k [r] such that C Ck > /2, the we label this ew cluster as C k ; this label is uique because the cluster size is. (b The remaiig clusters are labeled arbitrarily. This ew partitio has the followig properties: (A0 (C 1,..., C r, C r+1 is a partitio of V, ad C k = for all k [r]. (A1 For every k [r], either C k Ck > /2, or C k C k /2 for all k [r]; (A2 We have r C k C r+1 2 Ck C r+1 + Ck C k C k C k = t. k=1 Here, Properties (A0 ad (A1 are direct cosequeces of how we label the ew clusters, ad Property (A2 follows from the followig equalities t = d(y = {(i, j V V : Yij = 1, Y ij = 0} r = {(i, j : (i, j Ck C k, i j, Y ij = 0} = k=1 r {(i, j : (i, j Ck C k, i j, (i, j C r+1 C r+1 } k=1 + r k=1 {(i, j : (i, j C k C k, (i, j C k C k }. Sice these properties are satisfied by ay Y with d(y = t, we have {Y Y : d(y = t} {(C 1,..., C r, C r+1 : it has properties (A0 (A2}. (25 To prove the lemma, it suffices to upper boud the right had side of (25. Fix a ordered partitio C := (C 1,..., C r, C r+1 with properties (A0 (A2. We cosider the first true cluster, C 1. Defie m 1 := k [r+1],k 1 C k C 1, which is the umber of odes i C 1 that are misclassified by C. We cosider two cases. If C 1 C1 > /10, the C1 C k C1 C k 2 C1 C 1 k [r+1] k 1 C 1 C k > m 1 /5. If C 1 C 1 /10, the by coditio (A1 we must have C k C 1 /2 for all 1 k r 5

6 ad m 1 > 9/10. Hece, =m k r C 1 C k C 1 C k + C 1 C r+1 2 C 1 C r+1 I {k 1}I {k 1} C k C 1 C k C 1 + C 1 C r+1 2 C 1 C r+1 m m m 1. C k C 1 2 C 1 C r+1 We coclude that we always have C 1 C k C 1 C k + C 1 C r+1 2 C 1 C r m 1. The above iequality holds if we replace C1 by C k ad m 1 by m k (defied similarly for each k [r]. It follows that r C k C r+1 2 Ck C r+1 + Ck C k C k C k r m k. 5 k=1 Thus by Property (A2, we have k [r] m k 5t/, i.e., the total umber of misclassified odes i V 1 is upper bouded by 5t/. This meas that the total umber of misclassified odes i V 2 is also upper bouded by 5t/ because by our cluster size costrait, oe misclassified ode i V 2 must produce oe misclassified ode i V 1. Therefore, the total umber of misclassified odes i V 1 ca at most take 5t/ differet values ad the same is true for the total umber of misclassified odes i V 2. For a fixed umber of misclassified odes i V 1 ad V 2, there are at most 5t/ 1 5t/ 2 differet ways to choose these misclassified odes. Each misclassified ode i V 1 ca be assiged to oe of r 1 differet clusters or leave isolated, ad each misclassified ode i V 2 ca be assiged to oe of r differet clusters. Hece, the right had side of (25 is upper bouded by 25t 2 5t/ 2 1 5t/ 2 r 10t/ 25t2 20t/, which proves the lemma Proof of Theorem 3 The proof uses several matrix orms. The spectral orm X of a matrix X is the largest sigular value of X. The uclear orm X is the sum of sigular values. We also eed the L 1 orm X 1 = i,j X ij ad the L orm X = max i,j X ij. Let X, Y = Tr(X Y deote the ier product betwee two matrices. For a vector x, x 2 is the usual Euclidea orm. Defie (Y = Y Y, A. It suffices to show that (Y > 0 for all feasible solutio Y of the program (6 (8 with Y Y. Rewrite (Y as (Y = Ā, Y Y + A Ā, Y Y = Ā, Y Y + λ W, Y Y, (26 k=1 6

7 where Ā = q11 + (p qy ad W := (A Ā/λ. Because i,j Y ij = r 2 = i,j Y ij ad Y ij [0, 1], the first term i the RHS of (26 satisfies Ā, Y Y = (p q Y, Y Y = p q 2 Y Y 1. We ow cotrol the secod term i (26. Note that Var[A ij ] p(1 q. Defie σ 2 := max{p(1 q, q(1 p}. Note that Ā E[A] 1. By assumptio, σ2 C log / for a costat C. We eed the followig boud, which is proved i Sectio to follow. Lemma 1.2. Uder the otatio above, if σ 2 C log / for a costat C, the there exists a costat C such that with high probability A E[A] C σ 2 log + q(1 q. It follows that with high probability A Ā A E[A] + Ā E[A] C σ 2 log + q(1 q for a uiversal costat C. Defie λ := C σ log + q(1 q ad the ormalized oise matrix. Thus w.h.p. W 1. Let u k be the ormalized characteristic vector of cluster C k, i.e., u k(i = 1/ if ode i is i cluster C k ad u k(i = 0 otherwise. Let U = [u 1,..., u r ]. The Y = UU is the sigular value decompositio of Y. Defie the projectios P T (M = UU M + MUU UU MUU ad P T (M = M P T (M. Because P T (W W 1 w.h.p., UU + P T (W is a subgradiet of X at X = Y. Sice Y = 1, it follows that for ay feasible Y, 0 Y Y UU + P T (W, Y Y. Substitutig ito (26, we obtai that for ay feasible Y, (Y p q 2 Y Y 1 + λ P T (W UU, Y Y ( p q λ UU P T (λw Y Y 1 2 ( p q = λ 2 P T (λw Y Y 1, (27 where the last equality holds because UU = 1/. We proceed by boudig the term P T (λw i (27. From the defiitio of P T, we have P T (λw UU (λw + (λw UU + UU (λw UU 3 UU (λw. Assume ode i belogs to cluster k. The (UU (λw ij = (u k u k (λw ij = (1/ i C k (λw i j, which is the average of idepedet radom variables. By Berstei s iequality (Theorem A.1 we have with probability at least 1 4, i Ck 6σ 2 log + 2 log C 1 σ log, for some costat C 1, where the last iequality follows from the assumptio (9. It follows from the uio boud over all (i, j that that P T (λw 3 UU (λw 3C 1 σ log / with probability at least 1 2. Substitutig back to (27, we coclude that with probability at least 1 1, ( p q (Y λ 2 3C 1σ log / Y Y 1 > 0 for all feasible Y Y, where the last iequality follows from the assumptio (9. 7

8 1.3.1 Proof of Lemma 1.2 Let R := support(y ad P R ( : R R be the operator which sets the etries outside of R to be zero. Let B 1 = P R (A E[A] ad B 2 = A E[A] B 1. The B 1 is a block-diagoal matrix with r blocks of size ad has etries with variace bouded by σ 2. Applyig the matrix Berstei iequality [5], we get that with high probability, B 1 c 1 σ 2 log for a costat c 1. O the other had, B 2 has etries with variace bouded by max{q(1 q, c 2 log /} for a costat c 2. By Lemma 1.3 below, we obtai that B 2 c 3 max{ q(1 q, log } for a costat c 3. It follows that A E[A] B 1 + B 2 c 1 σ 2 log + c 3 max{ q(1 q, log } which completes the proof of the lemma. C σ 2 log + q(1 q, Lemma 1.3. Let M deote the symmetric matrix such that M ij (1 i < j are idepedet radom variables with P(M ij = 1 p ij = p ij ad P(M ij = p ij = 1 p ij, ad M ii = 0. Suppose that Var[M ij ] σ 2 with σ 2 C log / for a costat C, the with high probability M Cσ for a costat C. Proof. If σ 2 log7, the Theorem 8.4 i [1] implies that M 3σ w.h.p. If C log σ 2 log 7, the Lemma 2 i [2] implies that M Cσ w.h.p. for some uiversal costat C. 1.4 Proof of Theorem 4 We first claim that (p q c 2 p + q implies (p q c2 2q uder the assumptio that /2 ad q c 1 log. I fact, if p q, the the claim trivially holds. If p > q, the q < p/ p/2. It follows that p/2 < (p q c 2 p + q c2 2p. Thus, p < 8c 2 2 which cotradicts the assumptio that p > q c 1 log. Therefore, p > q caot hold. Hece, it suffices to show that if (p q c 2 2q, the Y is ot a optimal solutio of the covex program (6 (8. We do this by showig that the optimality of Y implies (p q > c 2 2q. Let I be the all-oe matrix. Let R := support(y ad A := support(a. Recall that Y = UU is the sigular value decompositio of Y, ad the orthogoal projectio oto the space T is give by P T (M = UU M + MUU UU MUU. Cosider the Lagragia L(Y ; λ, µ, F, G := A, Y + λ ( Y Y + η ( I, Y r 2 F, Y + G, Y I, where λ 0, η R, F ij 0 ad G ij 0, i, j are the Lagragia multipliers. Note that Y = r2 I 2 is strictly feasible so strog duality holds by Slater s Theorem. By stadard covex aalysis, if Y = Y is a optimal solutio, the there must exists some F, G ad λ for which the followig 8

9 T coditios hold: 0 L(Y ; λ, µ, F, G Y Y =Y, F ij 0, (i, j, G ij 0, (i, j, λ 0, F ij = 0, (i, j R, G ij = 0, (i, j R c. } Statioary coditio Dual feasibility } Complemetary slackess Recall that M R is a subgradiet of X at X = Y if ad oly if P T (M = UU ad P T (M 1. Let H = F G; the T coditios imply that there exist some λ, η, W ad H obeyig ( A λ UU + W ηi + H = 0, (28 λ 0, (29 P T W = 0, (30 W 1, (31 H ij 0, (i, j R, (32 H ij 0, (i, j R c. (33 Now observe that UU W UU = 0 by (30. We left ad right multiply (28 by UU to obtai Ā λuu ηi + H = 0, where for ay matrix X R, X := UU XUU is the matrix obtaied by takig the average i each block of X. Cosider the last display equatio o etries i R ad R c respectively. Applyig the Berstei iequality (Theorem A.1 for each etry of Ā ij, we get that with high probability, p λ η + H ij c 3 p(1 p log q(1 q log q η + H ij c 3 c 4 log c 4 log 2 2 (a ɛ 0, (i, j R (34 8 (b ɛ 0 8, (i, j Rc (35 for some costats c 3, c 4 > 0, where (a ad (b follow from the assumptio c 1 log with c1 sufficietly large. Usig (32 ad (33, we get q ɛ 0 8 q c 3 q(1 q log c 4 log 2 2 η p + c 3 p(1 p log + c 4 log 2 2 λ p + ɛ 0 8 λ. (36 It follows that λ (p q + c 3 ( p(1 p log + q(1 q log + c 4 log { } c 4 log 4 max (p q, c 3 p(1 p log, c3 q(1 q log,. (37 c 1 9

10 O the other had, (31, (30 ad (28 imply λ 2 = λ(uu + W 2 1 λ(uu + W 2 F = 1 A ηi + H 2 F 1 A R c ηi R c + H R c 2 F 1 (1 η 2 A ij, (i,j R c where X R c deotes that matrix obtaied from X by settig the etries outside R c to zero. Usig η ɛ 0, which is a cosequece of (36, (29 ad the assumptio p 1 ɛ 0, we obtai λ ɛ2 0 (i,j R c A ij,. (38 Note that (i,j R c A ij equals two times the sum of ( ( 2 r 2 i.i.d. Beroulli radom variables with parameter q. By the Cheroff boud of Biomial distributios ad the assumptio that q c 1 log, we kow with high probability (i<j R c A ij Cq 2 for some costat C. It follows from (38 that λ 2 C q for some costat C > 0. Combiig with (37 ad the assumptio that q c 1 log, we coclude that (p q C q for some costat C > 0. This completes the proof of the theorem. 1.5 Proof of Theorem 5 The degree d i of ode i is distributed as Bi( 1, p plus a idepedet Bi(, q if i V 1. Otherwise d i is distributed as Bi( 1, q if i V 2. It follows that E[d i ] = ( 1q +( 1(p q if i V 1 ad E[d i ] = ( 1q if i V 2. If we defie σ 2 = p(1 q + q(1 q, the we further have Var[d i ] σ 2. Set t := ( 1 p q /2 σ 2 ; the Berstei iequality (Theorem A.1 gives ( t 2 P{ d i E[d i ] t} 2 exp 2σ 2 2 exp ( ( 12 (p q 2 + 2t/3 12σ 2 2 2, where the last iequality follows from assumptio (12. By the uio boud, with probability at least 1 2 1, d i > (p q 2 + q for all odes i V 1 ad d i < (p q 2 + q for all odes i V 2. Therefore, all odes i V 2 are correctly declared to be isolated with high probability. The umber of commo eighbors S ij betwee the odes i ad j is distributed as Bi( 2, p 2 plus a idepedet Bi(, q 2 if i ad j are i the same cluster, ad it is distributed as Bi(2( 1, pq plus a idepedet Bi( 2, q 2 if i ad j are i differet clusters. Hece, E[S ij ] equals ( 2p 2 + ( q 2 if i ad j are i the same cluster ad 2( 1pq + ( 2q 2 otherwise. The differece i mea equals (p q 2 2p(p q. Let σ 2 = 2p 2 (1 q 2 +q 2 (1 q 2. The Var[S ij ] σ 2. Set t := (p q 2 /3 σ 2. Assumptio (13 implies that t > 2p(p q. Applyig the Berstei iequality (Theorem A.1, we obtai ( P{ S ij E[S ij ] t t 2 } 2 exp 2σ 2 2 exp ( 2 (p q 4 + 2t/3 27σ 2 2 3, where the last iequality follows from the assumptio (13. By uio boud, with probability at least 1 2 1, S ij > (p q pq + q 2 for all odes i, j from the same cluster ad S ij < (p q pq + q 2 for all odes i, j from two differet clusters. Therefore the simple algorithm returs the true clusters w.h.p. 10

11 1.6 Proof of Theorem 6 For simplicity we assume ad 2 are eve umbers. We partitio V 1 ito two equal-sized subsets V 1+ ad V 1 such that half of the odes of each cluster are i V 1+. Similarly, V 2 is partitioed ito two equal-sized subsets V 2+ ad V 2. To prove the theorem, we eed the followig aticocetratio iequality. Theorem 1.4 (Theorem i [3]. Let X 1,..., X be idepedet radom variables such that 0 X i 1 for all i. Suppose X = i=1 X i ad σ 2 := i=1 Var[X i] 200. The for all 0 t σ 2 /100 ad some uiversal costat c > 0, we have P [X E[X] + t] ce t2 /(3σ 2. Idetifyig isolated odes For each ode i i V 1+ V 2+, let d i+ be the umber of its eighbors i V 1+ V 2+ ad d i be the umber of its eighbors i V 1 V 2, so d i = d i+ + d i. We cosider two cases. Case 1: Suppose (p + ( q log 1 q log 2. I this case we have ( 1 2 (p q 2 2c 2 (p+q log 1 by (14. For each i V 1+, d i is distributed as Bi(/2, p plus a idepedet Bi(( /2, q. Let t := ( 1(p q + 2, γ 1 := E[d i ] t = q/2 + (p q/2 t ad σd 2 := Var[d i ] = 1 2 p(1 p ( q(1 q. Sice /2, p, q are bouded away from 1 ad p + q p 2 + q 2 c 1 log, we have σd 2 c log 200. Combiig wit (14, we further have σd 4 c 2 (p q 2 / log 1 c log t 2. We ca thus apply Theorem 1.4 ad get ( P [d i γ 1 ] c exp ( t2 (( 1(p q σd 2 = exp c1 c, 3(p(1 p + ( q(1 q for some costat c > 0 that ca be made small by choosig c 2 above sufficietly small. Let i := arg mi i V1+ d i. Sice the radom variables {d i : i V 1+ } are mutually idepedet, we have P [d i γ 1 ] = ( P [d i γ 1 ] (1 c1 c 1/2 exp c 1 c 1 /2 1/4. i V 1+ O the other had, for each i V 1+, d i+ is distributed as Bi(/2 1, p plus a idepedet Bi(( /2, q. Sice the media of Bi(, p is at most p + 1, we kow that with probability at least 1/2, d i+ γ 2 := q/2 + (p q/2 p + 2. Now observe that the two sets of radom variables {d i+, i V 1+ } ad {d i, i V 1+ } are idepedet of each other, so d i+ is idepedet of i for each i V 1+. It follows that P [d i + γ 2 ] = i V 1+ P [d i+ γ 2 i = i] P [i = i] = i V 1+ P [d i+ γ 2 ] P [i = i] 1 2. Combiig the last two display equatios by the uio boud, we obtai that with probability at least 1/4, d i = d i + d i + γ 1 + γ 2 = ( 1q, ad thus ode i will be icorrectly declared as a isolated ode. Case 2: Suppose (p + q log 1 q log 2. I this case we have ( 1 2 (p q 2 2c 2 q log 2 by assumptio. Defie i = arg max i V2+ d i. Usig the same argumet as i Case 1, we ca show that with probability at least 1/4, d i ( 1q + ( 1(p q ad thus ode i will icorrectly declared as a o-isolated ode. 11

12 Recoverig clusters For two odes i, j V 1, let S ij+ be the umber of their commo eighbors i V 1+ V 2+ ad S ij be the umber of their commo eighbors i V 1 V 2, so S ij+ = S ij+ +S ij. For each pair of odes i, j i V 1+ that are from the same cluster, S ij is distributed as Bi(/2, p 2 plus a idepedet Bi(( /2, q 2. Let t := (p q 2 + 4, γ 3 := E[S ij ] t = q 2 /2+(p 2 q 2 /2 t, ad σs 2 := Var[S ij ] = 1 2 p2 (1 p ( q2 (1 q 2. Sice /2, p, q are bouded away from 1 ad p 2 + q 2 c 1 log, we have that σs ad σ2 S 100t. Theorem 1.4 implies that there exists a costat c > 0 such that ( P [S ij γ 3 ] c exp ( t 2 ((p q = c exp 3(p 2 (1 p 2 + ( q 2 (1 q 2 c c 1, 3σ 2 S where the costat c > 0 ca be made sufficietly small by choosig c 2 sufficietly small i the statemet of the lemma. Without loss of geerality, we may re-label the odes such that V 1+ = {1, 2,..., 1 /2} ad for each k = 1,..., 1 /4, the odes 2k 1 ad 2k are i the same cluster. Note that the radom variables {S (2k 12k : k = 1, 2,..., 1 /4} are mutually idepedet. Let i = arg mi k=1,2,...,1 /4 S (2k 12k ad j = i + 1; it follows that P [S i j γ 3 ] (1 c c 1 1/4 exp( c 1 c 1 /4 1/4. O the other had, sice S ij+ is distributed as Bi(/2 2, p 2 plus a idepedet Bi(( /2, q 2, we use the media argumet to obtai that with probability at least 1/2, S ij+ γ 4 := q 2 /2 + (p 2 q 2 /2 2p Because {S ij+, i, j V 1+ } oly depeds o the edges betwee V 1+ ad V 1+ V 2+, ad (i, j oly depeds o the edges betwee V 1+ ad V 1 V 2, we kow {S ij+, i, j V 1+ } ad (i, j are idepedet of each other. It follows that S i j + γ 4 with probability at least 1/2. It follows that with probability at least 1/4, S i j = S i j + S i j + γ 3 + γ 4 = 2( 1pq + ( 2q 2 ad thus the odes i, j will be icorrectly assiged to two differet clusters. Appedices A Stadard Berstei Iequality Theorem A.1. Let X 1,..., X be idepedet radom variables such that X i M almost surely. Let σ 2 = i=1 Var(X i, the for ay t 0 [ ] ( t 2 P X i t exp 2σ Mt. i=1 A cosequet of the above iequality is P [ i=1 X i 2σ 2 u + 2Mu 3 ] e u for ay u > 0. 12

13 Refereces [1] S. Chattergee. Matrix estimatio by uiversal sigular value thresholdig. arxiv: , [2] L. Massoulié ad D.-C. Tomozei. Distributed user profilig via spectral methods. arxiv: , [3] J. Matoušek ad J. Vodrák. The probabilistic method, lecture otes. kam.mff.cui. cz/ matousek/ prob-l-2pp.ps.gz, [4]. Rohe, S. Chatterjee, ad B. Yu. Spectral clusterig ad the high-dimesioal stochastic blockmodel. The Aals of Statistics, 39(4, [5] J. Tropp. User-friedly tail bouds for sums of radom matrices. Foudatios of Computatioal Mathematics, 12(4: ,

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory 1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

5.1 Review of Singular Value Decomposition (SVD)

5.1 Review of Singular Value Decomposition (SVD) MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

STAT Homework 1 - Solutions

STAT Homework 1 - Solutions STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better

More information

Problem Set 2 Solutions

Problem Set 2 Solutions CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S

More information

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002

ECE 330:541, Stochastic Signals and Systems Lecture Notes on Limit Theorems from Probability Fall 2002 ECE 330:541, Stochastic Sigals ad Systems Lecture Notes o Limit Theorems from robability Fall 00 I practice, there are two ways we ca costruct a ew sequece of radom variables from a old sequece of radom

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

1 Duality revisited. AM 221: Advanced Optimization Spring 2016

1 Duality revisited. AM 221: Advanced Optimization Spring 2016 AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R

More information

Supplemental Material: Proofs

Supplemental Material: Proofs Proof to Theorem Supplemetal Material: Proofs Proof. Let be the miimal umber of traiig items to esure a uique solutio θ. First cosider the case. It happes if ad oly if θ ad Rak(A) d, which is a special

More information

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

Introduction to Probability. Ariel Yadin

Introduction to Probability. Ariel Yadin Itroductio to robability Ariel Yadi Lecture 2 *** Ja. 7 ***. Covergece of Radom Variables As i the case of sequeces of umbers, we would like to talk about covergece of radom variables. There are may ways

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Lecture 9: Expanders Part 2, Extractors

Lecture 9: Expanders Part 2, Extractors Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess

More information

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We

More information

Notes 5 : More on the a.s. convergence of sums

Notes 5 : More on the a.s. convergence of sums Notes 5 : More o the a.s. covergece of sums Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: Dur0, Sectios.5; Wil9, Sectio 4.7, Shi96, Sectio IV.4, Dur0, Sectio.. Radom series. Three-series

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Self-normalized deviation inequalities with application to t-statistic

Self-normalized deviation inequalities with application to t-statistic Self-ormalized deviatio iequalities with applicatio to t-statistic Xiequa Fa Ceter for Applied Mathematics, Tiaji Uiversity, 30007 Tiaji, Chia Abstract Let ξ i i 1 be a sequece of idepedet ad symmetric

More information

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)

LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Solution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1

Solution. 1 Solutions of Homework 1. Sangchul Lee. October 27, Problem 1.1 Solutio Sagchul Lee October 7, 017 1 Solutios of Homework 1 Problem 1.1 Let Ω,F,P) be a probability space. Show that if {A : N} F such that A := lim A exists, the PA) = lim PA ). Proof. Usig the cotiuity

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

arxiv: v1 [math.pr] 13 Oct 2011

arxiv: v1 [math.pr] 13 Oct 2011 A tail iequality for quadratic forms of subgaussia radom vectors Daiel Hsu, Sham M. Kakade,, ad Tog Zhag 3 arxiv:0.84v math.pr] 3 Oct 0 Microsoft Research New Eglad Departmet of Statistics, Wharto School,

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

6. Kalman filter implementation for linear algebraic equations. Karhunen-Loeve decomposition

6. Kalman filter implementation for linear algebraic equations. Karhunen-Loeve decomposition 6. Kalma filter implemetatio for liear algebraic equatios. Karhue-Loeve decompositio 6.1. Solvable liear algebraic systems. Probabilistic iterpretatio. Let A be a quadratic matrix (ot obligatory osigular.

More information

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Generalized Semi- Markov Processes (GSMP)

Generalized Semi- Markov Processes (GSMP) Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual

More information

Linear Support Vector Machines

Linear Support Vector Machines Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate

More information

Homework Set #3 - Solutions

Homework Set #3 - Solutions EE 15 - Applicatios of Covex Optimizatio i Sigal Processig ad Commuicatios Dr. Adre Tkaceko JPL Third Term 11-1 Homework Set #3 - Solutios 1. a) Note that x is closer to x tha to x l i the Euclidea orm

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Notes 27 : Brownian motion: path properties

Notes 27 : Brownian motion: path properties Notes 27 : Browia motio: path properties Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces:[Dur10, Sectio 8.1], [MP10, Sectio 1.1, 1.2, 1.3]. Recall: DEF 27.1 (Covariace) Let X = (X

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Technical Proofs for Homogeneity Pursuit

Technical Proofs for Homogeneity Pursuit Techical Proofs for Homogeeity Pursuit bstract This is the supplemetal material for the article Homogeeity Pursuit, submitted for publicatio i Joural of the merica Statistical ssociatio. B Proofs B. Proof

More information

Lecture 14: Graph Entropy

Lecture 14: Graph Entropy 15-859: Iformatio Theory ad Applicatios i TCS Sprig 2013 Lecture 14: Graph Etropy March 19, 2013 Lecturer: Mahdi Cheraghchi Scribe: Euiwoog Lee 1 Recap Bergma s boud o the permaet Shearer s Lemma Number

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h

More information

APPENDIX A SMO ALGORITHM

APPENDIX A SMO ALGORITHM AENDIX A SMO ALGORITHM Sequetial Miimal Optimizatio SMO) is a simple algorithm that ca quickly solve the SVM Q problem without ay extra matrix storage ad without usig time-cosumig umerical Q optimizatio

More information

The log-behavior of n p(n) and n p(n)/n

The log-behavior of n p(n) and n p(n)/n Ramauja J. 44 017, 81-99 The log-behavior of p ad p/ William Y.C. Che 1 ad Ke Y. Zheg 1 Ceter for Applied Mathematics Tiaji Uiversity Tiaji 0007, P. R. Chia Ceter for Combiatorics, LPMC Nakai Uivercity

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Supplement to Graph Sparsification Approaches for Laplacian Smoothing

Supplement to Graph Sparsification Approaches for Laplacian Smoothing Supplemet to Graph Sparsificatio Approaches for Laplacia Smoothig Veerajaeyulu Sadhaala Yu-Xiag Wag Rya J. Tibshirai Machie Learig Departmet Caregie Mello Uiversity Pittsburgh PA 53 Departmet of Statistics

More information

Lecture 2: Concentration Bounds

Lecture 2: Concentration Bounds CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

b i u x i U a i j u x i u x j

b i u x i U a i j u x i u x j M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here

More information

Online hypergraph matching: hiring teams of secretaries

Online hypergraph matching: hiring teams of secretaries Olie hypergraph matchig: hirig teams of secretaries Rafael M. Frogillo Advisor: Robert Kleiberg May 29, 2008 Itroductio The goal of this paper is to fid a competitive algorithm for the followig problem.

More information

Math 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions

Math 451: Euclidean and Non-Euclidean Geometry MWF 3pm, Gasson 204 Homework 3 Solutions Math 451: Euclidea ad No-Euclidea Geometry MWF 3pm, Gasso 204 Homework 3 Solutios Exercises from 1.4 ad 1.5 of the otes: 4.3, 4.10, 4.12, 4.14, 4.15, 5.3, 5.4, 5.5 Exercise 4.3. Explai why Hp, q) = {x

More information

Math 216A Notes, Week 5

Math 216A Notes, Week 5 Math 6A Notes, Week 5 Scribe: Ayastassia Sebolt Disclaimer: These otes are ot early as polished (ad quite possibly ot early as correct) as a published paper. Please use them at your ow risk.. Thresholds

More information

EE 4TM4: Digital Communications II Information Measures

EE 4TM4: Digital Communications II Information Measures EE 4TM4: Digital Commuicatios II Iformatio Measures Defiitio : The etropy H(X) of a discrete radom variable X is defied by We also write H(p) for the above quatity. Lemma : H(X) 0. H(X) = x X Proof: 0

More information

Lecture 6: Coupon Collector s problem

Lecture 6: Coupon Collector s problem Radomized Algorithms Lecture 6: Coupo Collector s problem Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Radomized Algorithms - Lecture 6 1 / 16 Variace: key features

More information

Asymptotic distribution of products of sums of independent random variables

Asymptotic distribution of products of sums of independent random variables Proc. Idia Acad. Sci. Math. Sci. Vol. 3, No., May 03, pp. 83 9. c Idia Academy of Scieces Asymptotic distributio of products of sums of idepedet radom variables YANLING WANG, SUXIA YAO ad HONGXIA DU ollege

More information

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other

More information

The random version of Dvoretzky s theorem in l n

The random version of Dvoretzky s theorem in l n The radom versio of Dvoretzky s theorem i l Gideo Schechtma Abstract We show that with high probability a sectio of the l ball of dimesio k cε log c > 0 a uiversal costat) is ε close to a multiple of the

More information

Entropy Rates and Asymptotic Equipartition

Entropy Rates and Asymptotic Equipartition Chapter 29 Etropy Rates ad Asymptotic Equipartitio Sectio 29. itroduces the etropy rate the asymptotic etropy per time-step of a stochastic process ad shows that it is well-defied; ad similarly for iformatio,

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Mathematics 170B Selected HW Solutions.

Mathematics 170B Selected HW Solutions. Mathematics 17B Selected HW Solutios. F 4. Suppose X is B(,p). (a)fidthemometgeeratigfuctiom (s)of(x p)/ p(1 p). Write q = 1 p. The MGF of X is (pe s + q), sice X ca be writte as the sum of idepedet Beroulli

More information

Lecture 19. sup y 1,..., yn B d n

Lecture 19. sup y 1,..., yn B d n STAT 06A: Polyomials of adom Variables Lecture date: Nov Lecture 19 Grothedieck s Iequality Scribe: Be Hough The scribes are based o a guest lecture by ya O Doell. I this lecture we prove Grothedieck s

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

ECE534, Spring 2018: Final Exam

ECE534, Spring 2018: Final Exam ECE534, Srig 2018: Fial Exam Problem 1 Let X N (0, 1) ad Y N (0, 1) be ideedet radom variables. variables V = X + Y ad W = X 2Y. Defie the radom (a) Are V, W joitly Gaussia? Justify your aswer. (b) Comute

More information

Symmetric Two-User Gaussian Interference Channel with Common Messages

Symmetric Two-User Gaussian Interference Channel with Common Messages Symmetric Two-User Gaussia Iterferece Chael with Commo Messages Qua Geg CSL ad Dept. of ECE UIUC, IL 680 Email: geg5@illiois.edu Tie Liu Dept. of Electrical ad Computer Egieerig Texas A&M Uiversity, TX

More information

Application to Random Graphs

Application to Random Graphs A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic

More information

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i

More information

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT

Geometry of LS. LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT OCTOBER 7, 2016 LECTURE 3 GEOMETRY OF LS, PROPERTIES OF σ 2, PARTITIONED REGRESSION, GOODNESS OF FIT Geometry of LS We ca thik of y ad the colums of X as members of the -dimesioal Euclidea space R Oe ca

More information

Math 61CM - Solutions to homework 3

Math 61CM - Solutions to homework 3 Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Week 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed

Week 10. f2 j=2 2 j k ; j; k 2 Zg is an orthonormal basis for L 2 (R). This function is called mother wavelet, which can be often constructed Wee 0 A Itroductio to Wavelet regressio. De itio: Wavelet is a fuctio such that f j= j ; j; Zg is a orthoormal basis for L (R). This fuctio is called mother wavelet, which ca be ofte costructed from father

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

ECE 6980 An Algorithmic and Information-Theoretic Toolbox for Massive Data

ECE 6980 An Algorithmic and Information-Theoretic Toolbox for Massive Data ECE 6980 A Algorithmic ad Iformatio-Theoretic Toolbo for Massive Data Istructor: Jayadev Acharya Lecture # Scribe: Huayu Zhag 8th August, 017 1 Recap X =, ε is a accuracy parameter, ad δ is a error parameter.

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Diagonal approximations by martingales

Diagonal approximations by martingales Alea 7, 257 276 200 Diagoal approximatios by martigales Jaa Klicarová ad Dalibor Volý Faculty of Ecoomics, Uiversity of South Bohemia, Studetsa 3, 370 05, Cese Budejovice, Czech Republic E-mail address:

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

Lecture 2: April 3, 2013

Lecture 2: April 3, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Lecture 5: April 17, 2013

Lecture 5: April 17, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 5: April 7, 203 Scribe: Somaye Hashemifar Cheroff bouds recap We recall the Cheroff/Hoeffdig bouds we derived i the last lecture idepedet

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

Solution to Chapter 2 Analytical Exercises

Solution to Chapter 2 Analytical Exercises Nov. 25, 23, Revised Dec. 27, 23 Hayashi Ecoometrics Solutio to Chapter 2 Aalytical Exercises. For ay ε >, So, plim z =. O the other had, which meas that lim E(z =. 2. As show i the hit, Prob( z > ε =

More information

Detailed proofs of Propositions 3.1 and 3.2

Detailed proofs of Propositions 3.1 and 3.2 Detailed proofs of Propositios 3. ad 3. Proof of Propositio 3. NB: itegratio sets are geerally omitted for itegrals defied over a uit hypercube [0, s with ay s d. We first give four lemmas. The proof of

More information

Law of the sum of Bernoulli random variables

Law of the sum of Bernoulli random variables Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible

More information

Large holes in quasi-random graphs

Large holes in quasi-random graphs Large holes i quasi-radom graphs Joaa Polcy Departmet of Discrete Mathematics Adam Mickiewicz Uiversity Pozań, Polad joaska@amuedupl Submitted: Nov 23, 2006; Accepted: Apr 10, 2008; Published: Apr 18,

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information