36-755, Fall 2017 Homework 5 Solution Due Wed Nov 15 by 5:00pm in Jisu s mailbox

Size: px

Start display at page:

Download "36-755, Fall 2017 Homework 5 Solution Due Wed Nov 15 by 5:00pm in Jisu s mailbox"

Meghan Marilynn Owen
5 years ago
Views:

1 Poits: 00+ pts total for the assigmet , Fall 07 Homework 5 Solutio Due Wed Nov 5 by 5:00pm i Jisu s mailbox We first review some basic relatios with orms ad the sigular value decompositio o matrices Lemma 0 For ay matrix A R m, let A = UΣV be its sigular value decompositio with U U = UU = I m ad V V = V V = I, ad Σ beig m diagoal matrix with oegative diagoal values σ,, σ mim, i Frobeius orm ca be calculated as A F = mim, ii Operator -orm A op ca be calculated as A op = σ i i Proof i Note that Frobeius orm equals A F = m j= A ij = tra A Ad hece from the sigular value decompositio of A, σ i A F = tra A = truσv V Σ U = truσσ U = trσσ U U = mim, σi ii Let r = mim,, ad let colums of U ad V be u,, u m ad v,, v, respectively Expad ay w R usig v,, v as basis, so that w = a iv i with a,, a R The Aw = a i Av i = r a i σ i u i, ad hece Aw = r σi a i i r σ i a i = σ i w, i r ad sufficiet coditio for the equality is whe a = ad a = = a = 0, ie w = v Hece Aw A op = sup w 0 w = i r σ i I this exercise you will fill i some of the details from the proof of the upper boud for sparse PCA uder a spike covariace model

2 a For p the Schatte p-orm of a m matrix A is the l p orm of its sigular values: r /p A p = σ p i, where σ σ σ r 0 are the sigular values of A ad r = mi{m, } Prove the ocommutative Hölder iequality for coformal matrices A ad B: for all p, q such that p + q = tra B A p B q, b Let u ad v be two uit orm vectors i R d Show that uu vv op = uu vv F = u v = si u, v where u, v = cos u v Poits: 5 pts = Solutio a Let A be m matrix ad let B be m matrix, ad let r = mi{, m }, s = mi{, m } Let σ σ r 0 be sigular values of A, the from sigular value decompositio, there exists a orthoormal set of vectors {u,, u r } R ad a orthoormal set of vectors {v,, v r } R m such that A = r σ iu i vi These {u i} s are first r colums of U ad {v i } s are first r colums of V, where A = UΣV is the sigular value decompositio of A Similarly, let λ λ s 0 be sigular values of B, the there exists a orthoormal set of vectors {w,, w s } R ad a orthoormal set of vectors {t,, t s } R s such that s j= λq j B = s j= λ jw j t j Note that A p = r σp i /p ad B q = LHS ca be expaded as tra r s B = tr σ i v i u i λ j w j t j j= r s = σ i λ j tr v i u i w j t j j= r s u tr σ i λ j i w j v i t j j= /q Meawhile, Now, ote that for ay u, w R with u = v =, Cauchy-Schwarz iequality yields u w u w = Also, for ay v R m ad t R m with v = t =, Cauchy-Schwarz iequality yields tr v i t j = mi{m,m } v i t i mi{m,m } v mi{m,m } i t i v t =

3 Hece applyig this ad further applyig Hölder s iequality gives tra B r j= r s σ i λ j σ p i p s j= λ q j q = A p B q Note that above iequality holds for p =, q = or p =, q = as well, where A = i r σ i b The rest of the iequality follows from HW4 Problem 4c, hece we are oly left to show the first iequality uu vv op = uu vv F Note that the rak of uu vv is at most, so ozero eigevalues ca be at most Hece uu vv ca be expressed as uu vv = λ w w + λ w w, where λ, λ R ad w, w R d with w w The from λ = λ, ie there exists λ > 0 such that λ + λ = truu vv = u u v v = 0, uu vv = λw w λw w The sice uu vv is real symmetric, all the sigular values are absolute values of eigevalues Hece sigular values of uu vv are λ, λ, 0,, 0 Hece from Lemma 0, uu vv op = λ ad uu vv F = λ + λ = λ, ad hece uu vv op = uu vv F holds This is a result that I cited whe discussig spectra clusterig for stochastic block models A radom matrix A of dimesio m is sub-gaussia with parameter σ, writte as A SG m, σ, whe y Ax is SGσ for ay y S ad x S m You may assume that E[A] = 0 or otherwise replace A by A E[A] a Suppose that the etries of A are idepedet variables that are SGσ SG m, σ b Let A SG,m σ ad recall that the operator orm of A is Show that A A op = Ax x R m,x 0 x = y S,x S y Ax m Show that, for some C > 0, E [ A op ] C + m c Fid a cocetratio iequality for A op 3

4 Hit: work with a /4 et for S ad a /4 et for S m Poits: 5 pts = Solutio Assume that E[A] = 0 a For all y S, x S m, ad λ R, by usig the etries of A beig idepedet ad SGσ, the mgf of y Ax is bouded as [ ] E exp λy Ax = E exp λ = j= j= j= m y i A ij x j m E [exp λy i x j A ij ] usig idepedece of A ij m exp σ λy i x j = exp m σ λ yi x j j= = exp σ λ, y S, x S m hece A SG m, σ b Let B S ad B m S m be the 4 -et for S ad S m with respect to l distaces The from Lecture ote 5Sep 8, B V ol /4 B0, + B0, V olb0, = V ol 9B0, V olb0, = 9, ad similarly B m 9 m holds Also for all y S ad x S m, there exists y 4 S, x 4 Sm, w B, z B m with y = w + y ad x = z + x Hece y S,x S y Ax m w Az + w Ax + y Az + y Ax w B,z B m w B,x 4 Sm y 4 S,z B m y 4 S,x 4 Sm = w B,z B m w Az + 4 Now, ote that w B,x S m w Ax + 4 w B w Ax,x S m y Az + y S,z B m 6 w Az + w Ax w B,z B m w B,x 4 Sm w Az + w B,z B m 4 w B w Ax,,x S m y S,x S y Ax m 4

5 so Ax 4 w B,x S m w 3 w Az, ad similarly y Az 4 w B,z B m y S 3 w Az,z B m w B,z B m Ad applyig these to yields y S,x S y Ax 6 w Az m 9 w B,z B m Hece these yields the upper boud of E [ A op ] as [ ] E [ A op ] = E y S,x S y Ax m 6 [ ] 9 E w Az usig w B,z B m 6σ log Bm B 9 m + log 9 6σ 9 < 3σ log 3 9 m + c From, the tail probability P A op t ca be bouded as usig m + < m + for m, > 0 P A op t = P y S,x S y Ax t m 6 P w Az t usig 9 w B,z B m w Az 96 t 3 w B,z B m P The for each w B ad z B m, sice w Az SGσ with E [ w Az ] = 0, so by applyig Hoeffdig s iequality, P w Az 96 t exp 8t 5σ Hece applyig this to 3 gives the boud for the tail probability P A op t as P A op t B m B exp 8t 5σ 9 m+ exp 8t 5σ 3 Exercise 84: Show that the orthogoal matrix V R d r imizig the criterio 8 has colums formed by the top r eigevectors of Σ = covx: Poits: 5 pts E V X = r j= [ E v j, X ] 5

6 Solutio E V X ] [ ] [X = E V V X = E tr X V V X [ = E tr V V XX ] = tr V V E [XX ] = tr V V Σ Let eigevalues ad correspodig eigevectors of Σ be λ λ d ad u,, u d The E V X d d = tr V V λ i u i u i = λ i tr V V u i u i = d λ i tr u i V V u i = d V λ i u i Now, ote that V V V = V V V = V I r = V, ad hece for ay w ColV, V V w = w, where ColV is the colum space of V Also, for ay w ColV, V w = 0, which implies V V w = 0 Hece for ay u R d with u =, let u = w + w with w ColV ad w ColV, the hece V u i satisfy V u = w + w V V w + w = w + w w = w u =, 0 V u i 4 Also, V u i = holds if ad oly if u i ColV ie ColV part is 0 ad V u i = 0 if ad oly if u i ColV ie ColV part is 0 Also, ote that d V d d u i = tr V V u i u i = tr V V u i u i = tr V V I d = tr V V = tr I r = r 5 Now, cosider the followig optimizatio problem: d imize λ i x i, subject to i, 0 x i, d x i = r 6 The, the costraits imply that i d, i j= x j mi {i, r} Let λ d+ = 0 for cove- 6

7 iece, ad ote that d λ i x i = = d i λ i λ i+ r λ i λ i+ i + r λ i, j= x j d λ i λ i+ r i=r+ ad the equality holds if ad oly if i j= x j = mi {i, r} wheever λ i > λ i+ Hece x = = x r =, x r+ = = x d = 0 7 imizes the objective fuctio d λ ix i, ad if we further have λ r > λ r+, it is the uique imizer Hece, applyig 4 ad 5 to 6 implies that E V X = d V λ r i u i λ i Ad from 7, the equality holds if V V u = = V u r = ad V u r+ = = u d = 0, 8 ad oly if holds uder λ r > λ r+ The from equality coditios from 4, 8 is equivalet to ColV = u,, u r, where u,, u r R d is the liear subspace spaed by u,, u r I coclusio, E V X r λ i, ad the equality holds if ColV = u,, u r, ad uder λ r > λ r+, it is ecessary as well I particular, whe top r eigevectors {u,, u r } form the colums of V, E V X is imized 4 Suppose we observe a iid sample X,, X from the mixture distributio P + P, where P = N d µ, I d ad P = N d µ, I d, with µ R d a o-zero vector Ours task is to cluster the sample poits ito two groups, where poits i the same group origiated from the same compoet of the mixture ie Either P or P We will use spectral clusterig: compute the leadig eigevector ˆν of the empirical covariace matrix ad cluster the poits depedig o the sig of Xi ˆν, i =,, Use the Davis-Kaha theorem to derive a upper boud o the proportio of misclustered odes Poits: 0 + pts 7

8 Solutio X has the same distributio as W µ + Y, where W Rademacher, Y N d 0, I d E[X] = 0 ad [ Σ := V ar [X] = E XX ] = µµ + I d The Hece by lettig v := µ µ, the v is the leadig eigevector of Σ with eigevalue + µ Hece from Davis-Kaha theorem, ˆΣ Σ op si v, ˆv µ 9 The ote that the umber of misclustered odes is upper bouded by I { } { } Xi ˆv 0, W i = Xi ˆv > 0, W i = Hece the misclusterig proportio R is bouded as R P Xi ˆv 0, W i = + P Xi ˆv > 0, W i = P µ + Y i ˆv 0 + P µ + Y i ˆv > 0 Now, cosider the first evet { µ + Y i ˆv 0 } Note that Hece µ + Y i, ˆv 0 implies µ + Y i, ˆv = µ, ˆv + Y i, ˆv µ, ˆv Y i, Y i µ, ˆv µ + Y i, ˆv µ, ˆv Its geometric iterpretatio is i Figure Ad applyig Y i istead of Y i gives that µ + Y i, ˆv > 0 implies Y i > µ, ˆv as well Sice ˆv is aliged with µ so that µ, ˆv 0, ad hece the misclusterig proportio R is bouded as R P Y i µ, ˆv The applyig the result of Davis-Kaha Theorem i 9 implies µ, ˆv = µ cos v, ˆv = µ si v, ˆv 4 ˆΣ Σ µ op µ 4 Hece the misclusterig proportio R is bouded as 4 ˆΣ Σ R P Y i µ µ 4 op 8

9 µ, ˆv µ ˆv X i, ˆv 0 Figure : Geometrical iterpretatio of µ + Y i, ˆv 0 implyig Y i µ, ˆv µ + Y i, ˆv 0 is equivalet to X i = µ + Y i lyig o the bottom-left regio of the hyperplae which is perpedicular to ˆv The X i lies outside of a sphere cetered at µ ad of radius µ, ˆv Hece X i µ = Y i µ, ˆv Now, ote that for ay v R d with v =, [ ] E exp λv X i = [ ] E exp λv µ + Y i + [ E ] λv µ + Y i exp = exp λv µ + λ + exp λv µ + λ = exp λ cosh λv µ exp λ + v µ cosh x expx / exp λ + µ, Cauchy-Schwarz Hece X i SG + µ Hece from Lecture ote 6 o Sep 0, for all δ 0,, there exists some C > 0 such that { } op d + log/δ P ˆΣ Σ C + µ, d + log/δ δ 0 { } Let E be the evet that ˆΣ Σ C+ µ d+log/δ op, d+log/δ happes The Y i µ 4 ˆΣ Σ op ad E implies µ 4 Y i µ µ := K,d,δ, µ,c, 4C + µ µ 4 8C + µ 4 { } d + log/δ d + log/δ, { } d + log/δ d + log/δ, 9

10 { } where K,d,δ, µ,c = µ 8C + d+log/δ µ 4, d+log/δ Ad hece 4 ˆΣ Σ R P Y i µ µ 4 op 4 ˆΣ Σ + P Y i µ µ 4 P Y i K,d,δ, µ,c + PE c P Y i K,d,δ, µ,c + δ E op Ec Now, sice Y i N d 0, I d, Y i χ d, chi-square distributio with degree d Hece P Y i K,d,δ, µ,c = Fχ d K,d,δ, µ,c, where F χ is the CDF fuctio for χ d d To coclude, the misclusterig proportio R is bouded as { } R if F χ K d,d,δ, µ,c + δ, δ 0, where K,d,δ, µ,c = µ 8C + µ 4 { d+log/δ, } d+log/δ Note that whe gettig the boud 0, matrix Berstei iequality is a bit icoveiet to use sice X i is ot bouded There is bous poit for usig the correct covariace boud for gettig 0 5 Exercise 43: a Recall the re-cetered fuctio class F = {f E[f] f F} Show that E X,ɛ [ R F] E X,ɛ [ R F ] sup f F E[f] b Use cocetratio results to complete the proof of Propositio 4 Poits: 5 pts = Solutio a Note that for ay f F, there exists f F such that fx = fx + E[f] Hece R F = sup f F = sup f F sup f F ɛ i fxi ɛ i fx i ɛ i E[f] ɛ i fx i sup ɛ i E[f] f F 0

11 [ ] Now, ote that by applyig Cauchy Schwarz, E ɛ i ca be bouded as [ E ] ɛ i E = = ɛ i = E ɛ i + ɛ i ɛ j i j By pluggig this to expectatio of, we get E X,ɛ [ R F] E X,ɛ [ R F ] sup f F [ E[f] E E X,ɛ [ R F ] sup f F E[f] ] ɛ i b Note that from a ad Propositio 4 a, we have ] E X [ P P F ] E X,ɛ [ R F E X,ɛ [ R F ] sup f F E[f] Now, for each f F let fx = fx E[f] as i a, ad let Gx,, x = sup fx i f F so that P P F = GX,, X Note that if x, y X satisfies x j = y j for all j i, the for ay f F, fx j j= sup hy j h F j= fx j j= fy j j= fx j fy j j= j= = fxj fy j f b Hece takig supremum over f F yields Gx Gy b Sice the same argumet ca be applied to get Gy Gx b b, so Gx Gy Hece from Bouded differeces iequality, P GX E [GX] δ exp δ b The applyig GX = P P F ad to this gives P P P F E X,ɛ [ R F ] sup f F E[f] P P P F E [ P P F ] δ exp δ b δ

12 Hece P P F E X,ɛ [ R F ] sup f F E[f] + δ with P-probability at least e δ b 6 Massart s fiite class Lemma Let A be a fiite subset of R ad let ɛ = ɛ,, ɛ be a vector of iid Rademacher variables Show that [ ] log A E sup a ɛ D a A where D = a A a Use this result to prove Lemma 4 of Chapter 4 I provig both claims it is OK if you get differet costats; kow however that the costats i Lemma 4 are sub-optimal Poits: 0 pts Solutio Note that E [ e λɛ i] ca be bouded as E [exp λɛ i ] = eλ + e λ = λ k k=0 k! k! k! k=0 k=0 λ k k! λ k k! k = e σ, hece ɛ i SG Sice ɛ i s are idepedet, from HW Problem 6 Details, a ɛ SG a SG D Hece from imum iequality for subgaussia radom variables, [ ] log A E sup a ɛ D, a A ad from HW Problem, we also have a boud for absolute value versio, [ ] E sup a log A ɛ D 3 a A For provig Lemma 4 of Chapter 4, let Fx := {fx,, fx R f F} as i 4 i Waiwright, the a Fx a = Dx Hece by applyig 3, [ ] log Fx + log E sup f F ɛ i fx i Dx The F beig polyomial discrimiatio of order ν implies Fx + ν, hece [ ] ν log + + log E sup ɛ i fx i Dx f F ν log + Dx

5.1 Review of Singular Value Decomposition (SVD)

5.1 Review of Singular Value Decomposition (SVD) MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of