Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities



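We give the complete proofs of the two theorems from the main text. We first introduce the well-known concentration inequality, so that the covariance estimator can be bounded. Then we analyze the convergence of PMPU.

Matrix Concentration Inequalities

Lemma 1 (Matrix Bernstein's inequality). Consider a finite sequence $\{S_i\}$ of independent random matrices of dimension $d_1 \times d_2$. Assume that each matrix has uniformly bounded deviation from its mean: $\|S_i - \mathbb{E}S_i\| \le L$ for each index $i$. Introduce the random matrix $Z = \sum_i S_i$, and let $\nu(Z)$ be the matrix variance of $Z$, where
$$\nu(Z) = \max\big\{\|\mathbb{E}(Z-\mathbb{E}Z)(Z-\mathbb{E}Z)^\top\|,\ \|\mathbb{E}(Z-\mathbb{E}Z)^\top(Z-\mathbb{E}Z)\|\big\} = \max\Big\{\Big\|\sum_i \mathbb{E}(S_i-\mathbb{E}S_i)(S_i-\mathbb{E}S_i)^\top\Big\|,\ \Big\|\sum_i \mathbb{E}(S_i-\mathbb{E}S_i)^\top(S_i-\mathbb{E}S_i)\Big\|\Big\}.$$
Then
$$\mathbb{E}\|Z-\mathbb{E}Z\| \le \sqrt{2\nu(Z)\log(d_1+d_2)} + \tfrac{1}{3}L\log(d_1+d_2).$$
Furthermore, for all $t > 0$,
$$P\{\|Z-\mathbb{E}Z\| \ge t\} \le (d_1+d_2)\exp\Big(\frac{-t^2/2}{\nu(Z)+Lt/3}\Big).$$

With matrix Bernstein's inequality, it is standard to get the concentration of covariance estimation:

Proposition 1. Suppose $\{x_i\}_{i=1}^N \subset \mathbb{R}^d$ are independent and identically distributed (i.i.d.) sub-gaussian random vectors and $X = [x_1, x_2, \dots, x_N]$. Then with probability at least $1-\delta$, provided $N \ge C_\delta\, d\log d/\epsilon^2$,
$$\Big\|\frac{1}{N}XX^\top - I\Big\| \le \epsilon.$$

Lemma 2. Let $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{d\times N}$. Suppose the $x_i$ are independently sampled from the truncated Gaussian distribution with positive margin. Then for $w \in \mathbb{R}^d$ with $\|w\| = 1$, we have
$$\mathbb{E}[\mathrm{sgn}(\langle x, w\rangle)\,x] = \lambda w,$$
where $\lambda$ is a constant depending on the margin.

Proof. It is well known that when $x$ is a standard Gaussian random vector, $\lambda = \sqrt{2/\pi}$. In our setting, the first dimension of $x$ is a truncated Gaussian, hence $\mathbb{E}[\mathrm{sgn}(\langle x, w\rangle)\,x] = \mathbb{E}|x_1|\,w = \lambda w$.

Lemma 3. Let $g = [g_1, g_2, \dots, g_d]$, where $g_1$ is a truncated Gaussian random variable and the remaining $d-1$ dimensions are i.i.d. from the standard Gaussian distribution. For two different vectors $w, w' \in \mathbb{R}^d$, if $\arccos\langle w, w'\rangle$ is sufficiently small, then both
$$\big\|\mathbb{E}\big[gg^\top(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big]\big\| \quad\text{and}\quad \mathbb{E}\big[\|g\|^2(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big]$$
are bounded by a constant multiple of $\|w - w'\|$, where the constant is linear in $d$ and otherwise depends only on the margin.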
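As a numerical sanity check on the covariance concentration stated in the Proposition above, the following NumPy sketch draws standard Gaussian vectors (one particular sub-gaussian distribution) and reports the spectral-norm error of the sample covariance as N grows. The function name `covariance_error`, the dimension, and the sample sizes are illustrative choices and do not come from the paper.

```python
import numpy as np


def covariance_error(n_samples: int, dim: int, rng: np.random.Generator) -> float:
    """Spectral norm of (1/N) X X^T - I for X with i.i.d. N(0, 1) entries."""
    # Columns of X are the samples x_1, ..., x_N (assumed standard Gaussian).
    X = rng.standard_normal((dim, n_samples))
    sample_cov = X @ X.T / n_samples
    # ord=2 gives the largest singular value, i.e. the spectral norm.
    return float(np.linalg.norm(sample_cov - np.eye(dim), ord=2))


rng = np.random.default_rng(0)
dim = 20
errors = {n: covariance_error(n, dim, rng) for n in (200, 2_000, 20_000)}
for n, err in errors.items():
    print(f"N = {n:6d}   ||XX^T/N - I|| = {err:.3f}")
```

The error should shrink roughly like $\sqrt{d/N}$, which is the scaling behind the sample-size requirement in the Proposition.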

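Proof. Define $\alpha = \arccos\langle w, w'\rangle$, with $w = [1, 0, \dots, 0]$ and $w' = [\cos\alpha, \sin\alpha, 0, \dots, 0]$, so that $\langle g, w\rangle = g_1$ and $\langle g, w'\rangle = g_1\cos\alpha + g_2\sin\alpha$. We will prove the two inequalities under the stated condition on $\alpha$.

(a) Since
$$\mathbb{E}\big[gg^\top(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big] = \mathbb{E}\left[\begin{pmatrix} g_1^2 & g_1g_2 & \cdots & g_1g_d \\ g_2g_1 & g_2^2 & \cdots & g_2g_d \\ \vdots & \vdots & \ddots & \vdots \\ g_dg_1 & g_dg_2 & \cdots & g_d^2 \end{pmatrix}(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\right],$$
we need to estimate each $\mathbb{E}[g_ig_j(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2]$. Observe that $\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle$ is nonzero only when $g_1 > 0$, $g_1\cos\alpha + g_2\sin\alpha < 0$ or $g_1 < 0$, $g_1\cos\alpha + g_2\sin\alpha > 0$. Otherwise it is 0. Hence, the domain of the expectation is
$$\Omega = \{(g_1, g_2) : g_1 > 0,\ g_1\cos\alpha + g_2\sin\alpha < 0\} \cup \{(g_1, g_2) : g_1 < 0,\ g_1\cos\alpha + g_2\sin\alpha > 0\},$$
with all other Gaussian variables $g_3, \dots, g_d \in (-\infty, \infty)$.

Let $i, j \in [d]$ and write $\phi$ for the coordinate densities. For $i = j = 1$, the expectation factorizes into an integral of $g_1^2\,\phi(g_1)\phi(g_2)$ over $(g_1, g_2) \in \Omega$ times $\int \phi(g_3)\cdots\phi(g_d)\,dg_3\cdots dg_d$. By the polar transformation, the first factor becomes an integral of $\sin^2\theta\, e^{-r^2/2} r^3$ over the angular range $[\pi/2, \pi/2+\alpha]$, so that
$$\mathbb{E}\big[g_1^2(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big] \le c(\alpha + \sin\alpha),$$
where $c$ depends only on the margin. Since $\alpha$ is small, we have $c(\alpha + \sin\alpha) \le 3c\alpha$.

For $i = j \ge 3$, $g_i$ is independent of $(g_1, g_2)$, so the expectation is $\mathbb{E}g_i^2$ times the integral of $(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2$ over $\Omega$, which is a margin-dependent constant multiple of $\alpha$.

For $i \in \{1, 2\}$, $j \in \{3, \dots, d\}$ or $j \in \{1, 2\}$, $i \in \{3, \dots, d\}$, the coordinate with index at least 3 is independent of $(g_1, g_2)$ and has zero mean, so the expectation factorizes into a product containing $\mathbb{E}g_j$ (respectively $\mathbb{E}g_i$). The remaining entries with both indices in $\{1, 2\}$ are integrals over $\Omega$ and are bounded by margin-dependent constant multiples of $\alpha$ in the same way as the case $i = j = 1$. For all the other cases with $i \ne j$, we can see that $\mathbb{E}[g_ig_j(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2] = 0$.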

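Therefore, writing $A$ for the matrix of entrywise bounds obtained above, whose $(1,1)$ entry is at most $3c\alpha$ and whose other nonzero entries are margin-dependent constant multiples of $\alpha$,
$$\big\|\mathbb{E}\big[gg^\top(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big]\big\| \le \sqrt{\|A\|_1\|A\|_\infty} \le C\alpha \le C\|w - w'\|,$$
in which the first inequality holds because $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$, the constants $C$ (which may change from one inequality to the next) are linear in $d$ and otherwise depend only on the margin, and the last step uses $\alpha \le \frac{\pi}{2}\|w - w'\|$ for unit vectors.

(b) The proof is similar to that of (a). We have
$$\mathbb{E}\big[\|g\|^2(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big] = \sum_{i=1}^d \mathbb{E}\big[g_i^2(\mathrm{sgn}\langle g, w\rangle - \mathrm{sgn}\langle g, w'\rangle)^2\big].$$
Summing the diagonal bounds from (a), where the leading coordinates contribute terms of order $3c\alpha$ and the remaining $d-2$ coordinates each contribute a constant multiple of $\alpha$, gives a bound of the same form, $C\|w - w'\|$ with $C$ linear in $d$.

Proof of the first theorem

Proof. According to the rotation invariance of the Euclidean space, there exists a rotation matrix $Q$ such that $Qw^* = [1, 0, \dots, 0]$. Without loss of generality, we assume that $w^* = [1, 0, \dots, 0] \in \mathbb{R}^d$. For simplicity, we will discard the superscript $t$ in $X^t$, but the reader should be aware that the feature matrix $X$ is always re-sampled in each iteration. Let $x = [x_1, x_2]$, where $x_1$ denotes the first dimension of $x$ and $x_2$ denotes the remaining $d-1$ dimensions. Similarly, we denote $w = [w_1, w_2]$. Denote by $\Delta y = y - \hat{y}$ the initial error. Since at the $t$-th iteration
$$w_t = w_{t-1} - \frac{1}{\lambda m_t}X(y_t - \hat{y}_t),$$
we have
$$w_t - w^* = w_{t-1} - w^* - \frac{1}{\lambda m_t}X\big[\mathrm{sgn}(X^\top w_{t-1}) - \mathrm{sgn}(X^\top w^*) + \mathrm{sgn}(X^\top w^*) - S(X^\top w_{t-1})\big]$$
$$= w_{t-1} - w^* - \frac{1}{\lambda m_t}X\big[\mathrm{sgn}(X^\top w_{t-1}) - \mathrm{sgn}(X^\top w^*)\big] - \frac{1}{\lambda m_t}X\Delta_t,$$
where $\Delta_t = \mathrm{sgn}(X^\top w^*) - S(X^\top w_{t-1})$. To bound the first two terms, using Lemma 2 and Lemma 3, we have with probability at least $1-\delta$,
$$\Big\|w_{t-1} - w^* - \frac{1}{\lambda m_t}X\big[\mathrm{sgn}(X^\top w_{t-1}) - \mathrm{sgn}(X^\top w^*)\big]\Big\| \le \epsilon\max\big\{\|w_{t-1} - w^*\|,\ \|w_{t-1} - w^*\|^{1/2}\big\},$$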

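provided $m_t$ is at least of order $d\log d/\epsilon^2$, up to an exponential factor in the margin. As we assume $m$ is sufficiently large, it is easy to satisfy $\|w_{t-1} - w^*\| \le 1$. Then
$$\Big\|w_{t-1} - w^* - \frac{1}{\lambda m_t}X\big[\mathrm{sgn}(X^\top w_{t-1}) - \mathrm{sgn}(X^\top w^*)\big]\Big\| \le \epsilon.$$

Next let us first consider the initial error term. It is clear that with probability at least $1-\delta$,
$$\Big\|\frac{1}{m_t}X\Delta\Big\| \le C_\delta\sqrt{\frac{d\log d}{m_t}} + \mathbb{E}\big[|\mathrm{sgn}(x^\top w^*) - S(x^\top w)|\big].$$
The estimation of this expectation includes two cases, i.e., $E_+$ and $E_-$, where $E_+$ is the error on $\{x^\top w \le \eta,\ z > 0\}$ and $E_-$ is the error on $\{x^\top w > \eta,\ z < 0\}$, where $z = x^\top w^*$. Denote the cumulative distribution function of the standard Gaussian distribution by
$$\Phi(z) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-t^2/2}\,dt.$$
Conditionally on $x_1 = \alpha$, the variable $x_2^\top w_2$ is Gaussian with standard deviation $\|w_2\|$, so up to the normalizing constant of the truncated density we obtain
$$E_+ = \int P(x_1w_1 + x_2^\top w_2 < \eta,\ x_1 = \alpha)\,d\alpha \propto \int e^{-\alpha^2/2}\,\Phi\Big(\frac{\eta - \alpha w_1}{\|w_2\|}\Big)d\alpha = \frac{1}{2}\int e^{-\alpha^2/2}\Big[1 + \mathrm{erf}\Big(\frac{\eta - \alpha w_1}{\sqrt{2}\|w_2\|}\Big)\Big]d\alpha = \frac{1}{2}\int e^{-\alpha^2/2}\,\mathrm{erfc}\Big(\frac{\alpha w_1 - \eta}{\sqrt{2}\|w_2\|}\Big)d\alpha,$$
where $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-x^2}dx$ denotes the error function and $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$ is the complementary error function. The equality introducing erf holds because the cumulative function satisfies $\Phi(z) = \frac{1}{2} + \frac{1}{2}\mathrm{erf}(z/\sqrt{2})$. Carrying out the integration gives a closed-form expression in $w_1$, $\|w_2\|$ and $\eta$ built from Gaussian exponentials and erfc terms.

Similarly, we have
$$E_- = \int P(x_1w_1 + x_2^\top w_2 \ge \eta,\ x_1 = \beta)\,d\beta \propto \frac{1}{2}\int e^{-\beta^2/2}\,\mathrm{erfc}\Big(\frac{\eta - \beta w_1}{\sqrt{2}\|w_2\|}\Big)d\beta,$$
which again evaluates to a closed-form expression of the same type.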

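Then,
$$\mathbb{E}\big[|\mathrm{sgn}(x^\top w^*) - S(x^\top w)|\big] = E_+ + E_- \overset{(a)}{=} \text{(the sum of the two closed forms in erfc and exponential terms of } w_1, \|w_2\|, \eta\text{)} \overset{(b)}{\le} \hat{c}\,[\exp(-c) + \delta_m]\,\|w - w^*\| \le c_1\exp(-c_2)\|w - w^*\|.$$
Here (b) is a simplification of (a). The constants $\hat{c}$ and $c$ actually depend on the margin and many other factors. However, once we fix the parameters, they are constants and do not control the order of our bound. $\delta_m$ is a small number if $m$ is large, because when $m$ is large $w_1 \to 1$ and $\|w_2\| \to 0$. As we always assume $m$ is sufficiently large, $\delta_m$ is small due to the exponential decay. Similarly, the upper bound of the error at the $t$-th step is
$$\mathbb{E}\big[|\mathrm{sgn}(x^\top w^*) - S(x^\top w_{t-1})|\big] \le c_1\exp(-c_2)\|w_{t-1} - w^*\|.$$
Combining everything above, we have with probability at least $1-\delta$,
$$\|w_t - w^*\| \le \epsilon + C_\delta\sqrt{\frac{d\log d}{m_t}} + c_1\exp(-c_2)\|w_{t-1} - w^*\|.$$
As $m_t$ is sampled on the unlabeled dataset, it can be as large as we want. Therefore the above inequality can be simplified when $m_t$ is sufficiently large, that is,
$$\|w_t - w^*\| \le c_1\exp(-c_2)\|w_{t-1} - w^*\|.$$

Proof of the second theorem

Proof. Let $B_i = \frac{1}{\lambda}\big[x_i\,\mathrm{sgn}\langle x_i, w\rangle - x_i\,\mathrm{sgn}\langle x_i, w^*\rangle\big]$. Then by Lemma 2, we have
$$\mathbb{E}B_i = w - w^*.$$
Further, we set
$$Z_i = B_i - \mathbb{E}B_i, \quad 1 \le i \le m, \qquad B = \frac{1}{\lambda}\big[X\,\mathrm{sgn}(X^\top w) - X\,\mathrm{sgn}(X^\top w^*)\big].$$
In order to utilize the matrix Bernstein inequality, we need to bound the terms $\max_i\|Z_i\|$, $\|\mathbb{E}Z_iZ_i^\top\|$ and $\|\mathbb{E}Z_i^\top Z_i\|$ respectively. For the first term, we have
$$\max_i\|Z_i\| = \max_i\|B_i - \mathbb{E}B_i\| \le \max_i\|B_i\| + \|\mathbb{E}B_i\| = \max_i\frac{1}{\lambda}\big\|x_i\,\mathrm{sgn}\langle x_i, w\rangle - x_i\,\mathrm{sgn}\langle x_i, w^*\rangle\big\| + \|w - w^*\|.$$
The first summand is controlled by the norm of $x_i$, which depends only on the dimension $d$. When $\|w - w^*\|$ is sufficiently small, the whole expression is therefore at most $c_d/\lambda$, with $c_d$ depending only on $d$.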

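For the second term, we get
$$\|\mathbb{E}Z_iZ_i^\top\| = \big\|\mathbb{E}(B_i - \mathbb{E}B_i)(B_i - \mathbb{E}B_i)^\top\big\|.$$
Since
$$\mathbb{E}(B_i - \mathbb{E}B_i)(B_i - \mathbb{E}B_i)^\top = \mathbb{E}B_iB_i^\top - \mathbb{E}B_i\,\mathbb{E}B_i^\top, \qquad \big\|\mathbb{E}B_iB_i^\top - \mathbb{E}B_i\,\mathbb{E}B_i^\top\big\| \le \|\mathbb{E}B_iB_i^\top\| + \|\mathbb{E}B_i\,\mathbb{E}B_i^\top\|,$$
and
$$\|\mathbb{E}B_i\,\mathbb{E}B_i^\top\| = \|w - w^*\|^2, \qquad \|\mathbb{E}B_iB_i^\top\| = \frac{1}{\lambda^2}\big\|\mathbb{E}\big[xx^\top(\mathrm{sgn}\langle x, w\rangle - \mathrm{sgn}\langle x, w^*\rangle)^2\big]\big\| \le \frac{Cd}{\lambda^2}\|w - w^*\| \quad\text{by Lemma 3},$$
we have
$$\|\mathbb{E}Z_iZ_i^\top\| \le \frac{Cd}{\lambda^2}\|w - w^*\| + \|w - w^*\|^2.$$
Note that if $\|w - w^*\| < 1$, then $\|w - w^*\| > \|w - w^*\|^2$, and if $\|w - w^*\| \ge 1$, then $\|w - w^*\| \le \|w - w^*\|^2$. Hence, the above inequality can be rewritten as
$$\|\mathbb{E}Z_iZ_i^\top\| \le \frac{Cd}{\lambda^2}\max\{\|w - w^*\|,\ \|w - w^*\|^2\}.$$

For the third term, we have
$$\|\mathbb{E}Z_i^\top Z_i\| = \mathbb{E}(B_i - \mathbb{E}B_i)^\top(B_i - \mathbb{E}B_i).$$
Since
$$\mathbb{E}(B_i - \mathbb{E}B_i)^\top(B_i - \mathbb{E}B_i) \le \mathbb{E}B_i^\top B_i + \mathbb{E}B_i^\top\mathbb{E}B_i, \qquad \mathbb{E}B_i^\top\mathbb{E}B_i = \|w - w^*\|^2,$$
and
$$\mathbb{E}B_i^\top B_i = \frac{1}{\lambda^2}\mathbb{E}\big[\|x\|^2(\mathrm{sgn}\langle x, w\rangle - \mathrm{sgn}\langle x, w^*\rangle)^2\big] \le \frac{C}{\lambda^2}\|w - w^*\| \quad\text{by Lemma 3},$$
we derive
$$\|\mathbb{E}Z_i^\top Z_i\| \le \frac{C}{\lambda^2}\|w - w^*\| + \|w - w^*\|^2,$$
which can be rewritten as
$$\|\mathbb{E}Z_i^\top Z_i\| \le \frac{C}{\lambda^2}\max\{\|w - w^*\|,\ \|w - w^*\|^2\}.$$
Now we can apply the matrix Bernstein inequality to obtain the final result.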
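The concentration that this proof establishes can be illustrated numerically. The sketch below is a simplified stand-in rather than the paper's setting: it assumes plain standard Gaussian features with no margin truncation, for which $\lambda = \sqrt{2/\pi}$, and checks that the empirical average of the $B_i$ approaches $w - w^*$ as $m$ grows. The helper name `mean_B`, the dimension, and the angle between $w$ and $w^*$ are hypothetical choices made for illustration.

```python
import numpy as np

# Assumption: standard Gaussian features, so lambda = E|g| = sqrt(2/pi).
# The truncated Gaussian used in the paper would give a different lambda.
LAMBDA_STD_GAUSSIAN = np.sqrt(2.0 / np.pi)


def mean_B(w: np.ndarray, w_star: np.ndarray, m: int,
           rng: np.random.Generator) -> np.ndarray:
    """Average of B_i = (1/lambda) * x_i * (sgn<x_i, w> - sgn<x_i, w*>)."""
    X = rng.standard_normal((w.size, m))            # columns are samples x_i
    sign_gap = np.sign(X.T @ w) - np.sign(X.T @ w_star)
    return (X @ sign_gap) / (LAMBDA_STD_GAUSSIAN * m)


rng = np.random.default_rng(0)
d, angle = 10, 0.3
w_star = np.zeros(d)
w_star[0] = 1.0                                     # w* = [1, 0, ..., 0]
w = np.zeros(d)
w[0], w[1] = np.cos(angle), np.sin(angle)           # unit vector at the chosen angle

target = w - w_star                                 # E[B_i] for standard Gaussian x
deviations = {
    m: float(np.linalg.norm(mean_B(w, w_star, m, rng) - target))
    for m in (1_000, 100_000)
}
for m, dev in deviations.items():
    print(f"m = {m:7d}   ||mean(B) - (w - w*)|| = {dev:.4f}")
```

The deviation should decay on the order of $m^{-1/2}$, consistent with the variance terms bounded above being proportional to $\|w - w^*\|$ when the two vectors are close.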
