1 Review and Overview


CS229T/STATS231: Statistical Learning Theory
Lecturer: Tengyu Ma
Lecture #12
Scribe: Garrett Thomas, Pengda Liu
October 31

Recall the GAN setup: we have independent samples $x_1, \dots, x_n$ drawn from some true unknown distribution $P$. Let $\hat{P}$ be the uniform distribution over these samples. We assume a latent variable $Z \sim P_Z$, for example $P_Z = N(0, I)$. Let $P_\theta$ be the distribution of $G_\theta(Z)$. Our goal is to learn $\theta$ so that $P_\theta$ approximates $P$. For a fixed set of samples $z_1, \dots, z_n \sim P_Z$, we define $\hat{P}_\theta$ to be the uniform distribution over $\{G_\theta(z_1), \dots, G_\theta(z_n)\}$.

In training, we will minimize the integral probability metric (IPM) $W_{\mathcal{F}}(\hat{P}_\theta, \hat{P})$, where $\mathcal{F} = \{G_\theta : \theta \in \mathbb{R}^p\}$. Ideally, a small $W_{\mathcal{F}}(\hat{P}_\theta, \hat{P})$ would guarantee a small $W_1(P_\theta, P)$, and conversely a large $W_{\mathcal{F}}(\hat{P}_\theta, \hat{P})$ would guarantee a large $W_1(P_\theta, P)$. That way, we know that the quantity being optimized (the empirical IPM) tells us something about the quantity we really care about (the Wasserstein distance between the true distributions).

Our approach is to relate the following quantities:
\[ W_{\mathcal{F}}(\hat{P}_\theta, \hat{P}) \;\longleftrightarrow\; W_{\mathcal{F}}(P_\theta, P) \;\longleftrightarrow\; W_1(P_\theta, P). \]
The first arrow is a question of generalization: is the empirical IPM close to the population IPM? The second arrow is a question of approximation: is the population IPM close to the population Wasserstein distance? We will see that the answers depend on the complexity of the generator class $\mathcal{F}$.

2 What happens when F is too complex?

Lemma 1. Suppose $\mathcal{F}$ is the set of all 1-Lipschitz functions. Note this means that $W_{\mathcal{F}} = W_1$. Assume $n = \mathrm{poly}(d)$. Then there exist distributions $P$ and $Q$ such that $W_1(P, Q) = W_{\mathcal{F}}(P, Q) = 0$ (and therefore $P = Q$), but with high probability $W_{\mathcal{F}}(\hat{P}, \hat{Q}) \gtrsim 1$, where $\hat{P}$ and $\hat{Q}$ are uniform distributions over fixed sets of independent samples $u_1, \dots, u_n \sim P$ and $v_1, \dots, v_n \sim Q$.

This lemma implies an undesirable result: if $\mathcal{F}$ is too rich, it is possible to learn the true distribution (in the sense that $P = Q$) without realizing it (in the sense that $W_{\mathcal{F}}(\hat{P}, \hat{Q})$ is large).

Proof of lemma. Let $V = \{\pm 1/\sqrt{d}\}^d$ be the vertices of a hypercube in $d$ dimensions. Let $P$ be the uniform distribution over $V$, and let $Q = P$. Then immediately
\[ W_{\mathcal{F}}(P, Q) = W_1(P, Q) = 0. \]
Now let $\hat{P} = \mathrm{Uniform}\{u_1, \dots, u_n\}$ and $\hat{Q} = \mathrm{Uniform}\{v_1, \dots, v_n\}$, where $u_1, \dots, u_n, v_1, \dots, v_n$ are sampled independently from $P$.

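Our claim is that random vectors from $P$ have an inner product bounded like $1/\sqrt{d}$. More precisely, if $u$ and $v$ are independent samples from $P$, then for every $c \ge 0$,
\[ \mathbb{P}\left( |u^\top v| \ge \sqrt{\frac{2c \log d}{d}} \right) \le 2 d^{-c}. \]

Proof of claim. Write $u^\top v = \sum_{i=1}^d u_i v_i$, where $u_i$ and $v_i$ denote the $i$-th coordinates of $u$ and $v$. Note that for each $u_i$, $v_i$, we have
\[ \mathbb{E}[u_i] = \frac{1}{2} \cdot \frac{1}{\sqrt{d}} + \frac{1}{2} \cdot \left(-\frac{1}{\sqrt{d}}\right) = 0, \]
and likewise for $v_i$. This implies that
\[ \mathbb{E}[u^\top v] = \sum_{i=1}^d \mathbb{E}[u_i v_i] = \sum_{i=1}^d \mathbb{E}[u_i]\,\mathbb{E}[v_i] = 0, \]
where we have used the independence of $u_i$ and $v_i$ to factor the expectation. Moreover, since $u_i v_i \in [-1/d, 1/d]$, we can apply Hoeffding's inequality to obtain
\[ \mathbb{P}\left( |u^\top v| \ge t \right) = \mathbb{P}\left( |u^\top v - \mathbb{E}[u^\top v]| \ge t \right) \le 2 \exp\left( -\frac{2t^2}{d\,(2/d)^2} \right) = 2 \exp\left( -\frac{d t^2}{2} \right) \]
for any $t \ge 0$. Taking $t = \sqrt{2c \log d / d}$, we obtain
\[ \mathbb{P}\left( |u^\top v| \ge \sqrt{\frac{2c \log d}{d}} \right) \le 2 \exp\left( -\frac{d}{2} \cdot \frac{2c \log d}{d} \right) = 2 d^{-c}, \]
as stated.

From here on, $u_i$ and $v_j$ again denote samples rather than coordinates. By a union bound over all $n^2$ pairs $u_i, v_j$, it follows that
\[ \mathbb{P}\left( \exists\, i, j : |u_i^\top v_j| \ge \sqrt{\frac{2c \log d}{d}} \right) \le 2 n^2 d^{-c} \]
for any $c \ge 0$. Then if $n \le d^{O(1)}$ (polynomial sample complexity), we have with high probability that
\[ \forall\, i, j : \quad |u_i^\top v_j| \le O\left( \sqrt{\frac{\log d}{d}} \right), \]
which implies, for $d$ large enough,
\[ \|u_i - v_j\|_2^2 = \|u_i\|_2^2 + \|v_j\|_2^2 - 2 u_i^\top v_j \ge 2 - O\left( \sqrt{\frac{\log d}{d}} \right) \ge 1. \]

Now let $\Gamma$ be a coupling of $\hat{P}$ and $\hat{Q}$. Every pair $(x, y) \sim \Gamma$ is of the form $(u_i, v_j)$, so with high probability it satisfies $\|x - y\|_2 \ge 1$. Conditioned on the bound on $|u_i^\top v_j|$ holding for all $i, j$ (which is a high-probability event), for any $\Gamma$ we have
\[ \mathbb{E}_{(x, y) \sim \Gamma} \|x - y\|_2 \ge 1, \]
and therefore
\[ W_{\mathcal{F}}(\hat{P}, \hat{Q}) = W_1(\hat{P}, \hat{Q}) = \inf_{\Gamma} \mathbb{E}_{(x, y) \sim \Gamma} \|x - y\|_2 \ge 1, \]
which proves the lemma.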
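The argument for Lemma 1 can be checked numerically. The following is a minimal sketch, not part of the original notes: the sizes `d = 400` and `n = 20`, the seed, and all function names are illustrative assumptions. It draws two independent sample sets from the uniform distribution on $\{\pm 1/\sqrt{d}\}^d$, compares the largest inner product against the threshold $\sqrt{2c \log d / d}$, and uses the smallest pairwise distance as a lower bound on $W_1(\hat{P}, \hat{Q})$, since every coupling only moves mass between some $u_i$ and some $v_j$.

```python
import math
import random


def sample_vertex(d, rng):
    """Draw one point uniformly from the hypercube vertices {+-1/sqrt(d)}^d."""
    s = 1.0 / math.sqrt(d)
    return [s if rng.random() < 0.5 else -s for _ in range(d)]


def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hoeffding_tail(d, t):
    """Right-hand side 2*exp(-d*t^2/2) of the Hoeffding bound on P(|u.v| >= t)."""
    return 2.0 * math.exp(-d * t * t / 2.0)


def threshold(c, d):
    """Deviation level sqrt(2*c*log(d)/d) used in the claim."""
    return math.sqrt(2.0 * c * math.log(d) / d)


def min_pairwise_distance(us, vs):
    """Smallest ||u_i - v_j||_2 over all pairs; any coupling of the two
    empirical distributions has expected cost at least this value."""
    return min(math.dist(u, v) for u in us for v in vs)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
d, n = 400, 20          # illustrative sizes (assumptions, not from the notes)
us = [sample_vertex(d, rng) for _ in range(n)]
vs = [sample_vertex(d, rng) for _ in range(n)]

max_inner = max(abs(inner(u, v)) for u in us for v in vs)
w1_lower_bound = min_pairwise_distance(us, vs)
print("largest |u_i . v_j|:", max_inner)
print("threshold for c = 2:", threshold(2.0, d))
print("lower bound on W_1 of the empirical distributions:", w1_lower_bound)
```

Even though both sample sets come from the same distribution, the printed lower bound on the empirical Wasserstein distance stays at least 1, which is the behaviour the lemma describes.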

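3 What happens when F is too simple?

3.1 Good generalization

Theorem 1 (Heuristic). For any fixed $P$, $Q$, with high probability over the randomness of $\hat{P}$, $\hat{Q}$, we have
\[ \left| W_{\mathcal{F}}(\hat{P}, \hat{Q}) - W_{\mathcal{F}}(P, Q) \right| \lesssim R_n(\mathcal{F}). \]

Remark 1. This theorem is not enough. We need a uniform-convergence-style statement, that is, a bound on the difference that holds for a fixed $P$ and simultaneously for every $Q$.

Definition 1. Let $\mathcal{G} = \{P_\theta\}$ be the set of all possible generated distributions, and assume $0 \in \mathcal{F}$. For any $P \in \mathcal{G}$, define
\[ R_n(\mathcal{F}, P) = \mathbb{E}_{z_1, \dots, z_n \sim P} \left[ \mathbb{E}_{\sigma} \left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \sigma_i f(z_i) \right] \right] \]
and define
\[ R_n(\mathcal{F}, \mathcal{G}) = \sup_{P \in \mathcal{G}} R_n(\mathcal{F}, P). \]

Recall that over the randomness of the training examples, we have $\hat{L}(\theta) \approx L(\theta)$ for all $\theta$. We can apply something similar here by redefining the training error as $\mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})]$. More specifically, we introduce the following theorem.

Theorem 2. Assume that $\|f\|_\infty \le M$ for all $f \in \mathcal{F}$. Then for fixed $P \in \mathcal{G}$, with probability at least $1 - \delta$ over the randomness of $\hat{P}$, for all $Q \in \mathcal{G}$ we have
\[ \left| W_{\mathcal{F}}(P, Q) - \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})] \right| \lesssim R_n(\mathcal{F}, \mathcal{G}) + M \sqrt{\frac{\log(1/\delta)}{n}}. \]

Proof. By the triangle inequality,
\begin{align*}
W_{\mathcal{F}}(P, Q) - \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})]
&= \mathbb{E}_{\hat{Q}}\left[ W_{\mathcal{F}}(P, Q) - W_{\mathcal{F}}(\hat{P}, \hat{Q}) \right] \\
&\le \mathbb{E}_{\hat{Q}}\left[ W_{\mathcal{F}}(P, \hat{P}) + W_{\mathcal{F}}(\hat{P}, \hat{Q}) + W_{\mathcal{F}}(\hat{Q}, Q) - W_{\mathcal{F}}(\hat{P}, \hat{Q}) \right] \\
&= \mathbb{E}_{\hat{Q}}\left[ W_{\mathcal{F}}(P, \hat{P}) + W_{\mathcal{F}}(\hat{Q}, Q) \right] \\
&= W_{\mathcal{F}}(P, \hat{P}) + \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{Q}, Q)].
\end{align*}
Thus we have
\[ W_{\mathcal{F}}(P, Q) - \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})] \le W_{\mathcal{F}}(P, \hat{P}) + \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{Q}, Q)]. \tag{1} \]
We bound those two terms individually. By Lemma 1 in lecture note 5,
\[ \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{Q}, Q)] = \mathbb{E}_{z_1, \dots, z_n \sim Q} \left[ \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{z \sim Q}[f(z)] - \frac{1}{n} \sum_{i=1}^n f(z_i) \right| \right] \lesssim R_n(\mathcal{F}, Q). \tag{2} \]
Also, by Theorem 2 in lecture note 6, with probability at least $1 - \delta$,
\[ W_{\mathcal{F}}(P, \hat{P}) \lesssim R_n(\mathcal{F}, P) + M \sqrt{\frac{\log(1/\delta)}{n}}. \tag{3} \]
Then by (1), (2), (3) we have
\[ W_{\mathcal{F}}(P, Q) - \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})] \lesssim R_n(\mathcal{F}, Q) + R_n(\mathcal{F}, P) + M \sqrt{\frac{\log(1/\delta)}{n}} \lesssim R_n(\mathcal{F}, \mathcal{G}) + M \sqrt{\frac{\log(1/\delta)}{n}}. \]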

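Similarly, we can show that
\[ \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})] - W_{\mathcal{F}}(P, Q) \lesssim R_n(\mathcal{F}, \mathcal{G}) + M \sqrt{\frac{\log(1/\delta)}{n}}. \]
Thus
\[ \left| W_{\mathcal{F}}(P, Q) - \mathbb{E}_{\hat{Q}}[W_{\mathcal{F}}(\hat{P}, \hat{Q})] \right| \lesssim R_n(\mathcal{F}, \mathcal{G}) + M \sqrt{\frac{\log(1/\delta)}{n}}. \]

3.2 Maybe bad approximation

We introduce the following lemma regarding the approximation quality of $\mathcal{F}$ with small complexity.

Lemma 2. Assume $P$ is uniform over $\{\pm 1/\sqrt{d}\}^d$, and suppose $R_n(\mathcal{F}, \mathcal{G}) \le \sqrt{c/n}$. Then for every $\epsilon > 1/\mathrm{poly}(d)$, there is a $Q$ such that $W_{\mathcal{F}}(P, Q) \le \epsilon$ but $W_1(P, Q) \gtrsim 1$.

Proof. Take $m \gtrsim c/\epsilon^2$ (that is, $m$ at least a sufficiently large constant multiple of $c/\epsilon^2$) and let $Q = \hat{P}_m$ be uniform over $\{x_1, \dots, x_m\}$, where each $x_i \sim P$. Then by (3), if we pick some large enough $m$ and ignore the $M\sqrt{\log(1/\delta)/m}$ term, we have
\[ W_{\mathcal{F}}(P, Q) = W_{\mathcal{F}}(P, \hat{P}_m) \lesssim R_m(\mathcal{F}, P) \le R_m(\mathcal{F}, \mathcal{G}) \le \sqrt{\frac{c}{m}} \lesssim \epsilon. \]
Since $\epsilon \ge 1/\mathrm{poly}(d)$, we only require $m = \mathrm{poly}(d)$.

We also have
\[ W_1(P, Q) = W_1(P, \hat{P}_m) = \inf_{p} \mathbb{E}_{(x, y) \sim p}\left[ \|x - y\|_2 \right], \]
where the infimum is over couplings $p$ of $P$ and $Q$. Furthermore, we note that $\mathbb{E}_{x \sim P}\left[ \|x - x_i\|_2^2 \right] = 2$, and $\|x - x_i\|_2^2$ is a sum of independent random variables. Therefore, $\Pr_{x \sim P}\left[ \|x - x_i\|_2 \le 1 \right]$ is exponentially small in $d$ (see Lemma 3 for a formal statement). We can then union bound over all $m = \mathrm{poly}(d)$ choices of $i$ to get
\[ \Pr_{x \sim P}\left[ \exists\, i : \|x - x_i\|_2 \le 1 \right] \le m \exp(-d/8), \]
which gives that for any coupling $p$ of $P$ and $Q$,
\[ \Pr_{x \sim P} \Pr_{y \sim p(y \mid x)}\left[ \|x - y\|_2 \ge 1 \right] \ge 1 - m \exp(-d/8). \]
Thus
\[ W_1(P, Q) = \inf_{p} \mathbb{E}_{(x, y) \sim p}\left[ \|x - y\|_2 \right] \ge 1 - m \exp(-d/8) \gtrsim 1. \]

Points are widespread in high dimension

Lemma 3. Let $x \sim \mathrm{Unif}\{\pm 1/\sqrt{d}\}^d$ and let $y$ be an arbitrary fixed vector in $\{\pm 1/\sqrt{d}\}^d$. Then we have
\[ \Pr\left[ \|x - y\|_2 \le 1 \right] \le \exp(-d/8). \]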
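The bound stated in Lemma 3 can be sanity-checked exactly, without sampling. The sketch below is an illustration rather than part of the notes, and its function names are hypothetical. It relies on the observation that $\langle x, y \rangle = (2k - d)/d$, where $k$ is the number of coordinates on which $x$ and $y$ agree and $k \sim \mathrm{Binomial}(d, 1/2)$. The event $\|x - y\|_2 \le 1$ is therefore the event $k \ge 3d/4$, whose probability is a finite binomial sum.

```python
import math


def exact_close_probability(d):
    """Exact Pr[||x - y||_2 <= 1] for x uniform on {+-1/sqrt(d)}^d and fixed y.

    ||x - y||^2 = 2 - 2<x, y> and <x, y> = (2k - d)/d, where k counts the
    coordinates on which x and y agree, so the event is k >= 3d/4.
    """
    k_min = (3 * d + 3) // 4  # ceil(3d/4) using integer arithmetic
    favourable = sum(math.comb(d, k) for k in range(k_min, d + 1))
    return favourable / 2 ** d


def lemma3_bound(d):
    """The upper bound exp(-d/8) from the lemma statement."""
    return math.exp(-d / 8.0)


for d in (4, 16, 64, 256):
    print(d, exact_close_probability(d), lemma3_bound(d))
```

For every dimension printed, the exact probability sits below $\exp(-d/8)$, and the gap widens as $d$ grows.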

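Proof. By the fact that $\|x - y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\langle x, y \rangle = 2 - 2\langle x, y \rangle$, we have
\[ \Pr\left[ \|x - y\|_2 \le 1 \right] = \Pr\left[ \langle x, y \rangle \ge 1/2 \right]. \]
Now, regardless of the value of $y$, each variable $x_i y_i$ is uniformly distributed in $\{\pm 1/d\}$, and is thus mean-zero and sub-Gaussian with variance proxy $1/d^2$. As the coordinates are furthermore independent, we get that $\langle x, y \rangle = \sum_{i=1}^d x_i y_i$ is mean-zero and sub-Gaussian with variance proxy $1/d$. Applying sub-Gaussian concentration, we get
\[ \Pr\left[ \langle x, y \rangle \ge 1/2 \right] \le \exp\left( -\frac{(1/2)^2}{2 \cdot (1/d)} \right) = \exp(-d/8). \]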
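The gap described in Section 3.2 can be made concrete with a very simple function class. The sketch below is an illustration under assumptions that are not in the notes: it takes $\mathcal{F}$ to be the linear functions $x \mapsto \langle w, x \rangle$ with $\|w\|_2 \le 1$, for which the IPM between two distributions is the Euclidean norm of the difference of their means. The sizes `d = 200` and `m = 100`, the seed, and the function names are likewise hypothetical. With $P$ uniform on $\{\pm 1/\sqrt{d}\}^d$ (mean zero) and $Q = \hat{P}_m$, the IPM is the norm of the sample mean, which is of order $1/\sqrt{m}$. The Wasserstein distance is bounded below by $\mathbb{E}_{x \sim P}[\min_i \|x - x_i\|_2]$, because under any coupling $y$ always lands on some $x_i$.

```python
import math
import random


def sample_vertex(d, rng):
    """Draw one point uniformly from the hypercube vertices {+-1/sqrt(d)}^d."""
    s = 1.0 / math.sqrt(d)
    return [s if rng.random() < 0.5 else -s for _ in range(d)]


def linear_ipm_to_uniform(points):
    """W_F(P, Q) for the assumed class F = {x -> <w, x> : ||w||_2 <= 1},
    with P uniform on the cube vertices (mean zero) and Q uniform on
    `points`: this equals the Euclidean norm of the mean of Q."""
    m, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / m for j in range(d)]
    return math.sqrt(sum(c * c for c in mean))


def w1_lower_bound(points, d, rng, trials=200):
    """Monte Carlo estimate of E_x[min_i ||x - x_i||_2] for x ~ P, which
    lower-bounds E||x - y||_2 under every coupling of P and Q."""
    total = 0.0
    for _ in range(trials):
        x = sample_vertex(d, rng)
        total += min(math.dist(x, p) for p in points)
    return total / trials


rng = random.Random(1)  # fixed seed, chosen arbitrarily
d, m = 200, 100         # illustrative sizes (assumptions, not from the notes)
support = [sample_vertex(d, rng) for _ in range(m)]

ipm = linear_ipm_to_uniform(support)
w1_lb = w1_lower_bound(support, d, rng)
print("W_F(P, Q) for the linear class:", ipm)
print("1/sqrt(m):", 1.0 / math.sqrt(m))
print("estimated lower bound on W_1(P, Q):", w1_lb)
```

The linear class reports that $P$ and $Q$ are close, while the lower bound on $W_1(P, Q)$ stays at least 1. This is the sense in which a low-complexity $\mathcal{F}$ may approximate the Wasserstein distance badly.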
