Supplemental Material: Proofs


Proof to Theorem 1

Proof. Let $n$ be the minimal number of training items to ensure a unique solution $\theta^*$. First consider the case $n = 0$. It happens if and only if $\theta^* = 0$ and $\mathrm{Rank}(A) = d$, which is a special case of $A\theta^* = 0$. Clearly, this case is consistent with LB1. Next consider the case $n \ge 1$. Since $\theta^*$ solves (1), the KKT condition holds:

    $\lambda A \theta^* + \sum_{i=1}^n \partial l(x_i^\top \theta^*, y_i)\, x_i \ni 0.$    (29)

We seek all $\delta$ such that $\theta^* + \delta$ satisfies

    $A(\theta^* + \delta) = A\theta^* \quad\text{and}\quad x_i^\top(\theta^* + \delta) = x_i^\top \theta^* \quad \forall i = 1, \ldots, n.$    (30)

For any such $\delta$, simple algebra verifies that $\theta^* + t\delta$ satisfies the KKT condition (29) for any $t \in [0, 1]$. Consequently, $\theta^* + \delta$ also solves the problem in (1). To see this, we consider two situations:

- If the loss function $l(\cdot, \cdot)$ is convex in the first argument, the KKT condition is a sufficient optimality condition, which means that $\theta^* + \delta$ solves (1).
- If the loss function $l(\cdot, \cdot)$ is smooth (not necessarily convex) in the first argument, we have $f(\theta^*) = f(\theta^* + \delta)$ by using the Taylor expansion (recall $f$ is the objective defined in (1)):

    $f(\theta^* + \delta) = f(\theta^*) + \langle \nabla f(\theta^* + t\delta), \delta \rangle$  (for some $t \in [0, 1]$)
    $= f(\theta^*) + \big\langle \textstyle\sum_{i=1}^n \nabla l(x_i^\top(\theta^* + t\delta), y_i)\, x_i + \lambda A(\theta^* + t\delta),\ \delta \big\rangle$
    $= f(\theta^*) + \big\langle \textstyle\sum_{i=1}^n \nabla l(x_i^\top \theta^*, y_i)\, x_i + \lambda A\theta^*,\ \delta \big\rangle$  (applying (30); the inner product is $0$ due to the KKT condition (29))
    $= f(\theta^*).$

Therefore, $\theta^* + \delta$ also solves (1). However, the uniqueness of $\theta^*$ requires $\delta = 0$ to be the only value satisfying (30). This is equivalent to saying

    $\mathrm{Null}(A) \cap \mathrm{Null}(\mathrm{Span}\{x_1, \ldots, x_n\}) = \{0\}.$    (31)

It indicates that

    $\mathrm{Rank}(A) + \mathrm{Dim}(\mathrm{Span}\{x_1, \ldots, x_n\}) \ge d.$

From $\mathrm{Dim}(\mathrm{Span}\{x_1, \ldots, x_n\}) \le n$, we have $n \ge d - \mathrm{Rank}(A)$. We proved the general case for LB1.

If we have $A\theta^* \ne 0$, we can further improve LB1. Let $g^* = (g_1^*, \ldots, g_n^*)$ be the vector satisfying

    $\lambda A\theta^* = -\sum_{i=1}^n g_i^* x_i \quad\text{and}\quad g_i^* \in \partial l(x_i^\top \theta^*, y_i) \quad \forall i = 1, 2, \ldots, n.$    (32)

Since $\theta^*$ satisfies the KKT condition, such a vector $g^*$ must exist. Applying $A\theta^* \ne 0$ to (32), we have $g^* \ne 0$ and

    $\mathrm{Dim}\,\big(\mathrm{Span}\{A_{\cdot 1}, A_{\cdot 2}, \ldots, A_{\cdot d}\} \cap \mathrm{Span}\{x_1, \ldots, x_n\}\big) \ge 1.$    (33)

To satisfy (31), we must have

    $d \le \mathrm{Dim}\,\big(\mathrm{Span}\{A_{\cdot 1}, A_{\cdot 2}, \ldots, A_{\cdot d}, x_1, \ldots, x_n\}\big).$

Using the fact in linear algebra that

    $\mathrm{Dim}\,\big(\mathrm{Span}\{A_{\cdot 1}, \ldots, A_{\cdot d}, x_1, \ldots, x_n\}\big)$
    $= \mathrm{Dim}\,\big(\mathrm{Span}\{A_{\cdot 1}, \ldots, A_{\cdot d}\}\big) + \mathrm{Dim}\,\big(\mathrm{Span}\{x_1, \ldots, x_n\}\big) - \mathrm{Dim}\,\big(\mathrm{Span}\{A_{\cdot 1}, \ldots, A_{\cdot d}\} \cap \mathrm{Span}\{x_1, \ldots, x_n\}\big)$
    $\le \mathrm{Rank}(A) + n - 1$  (from (33)),

we conclude that $n \ge d + 1 - \mathrm{Rank}(A)$. We completed the proof for LB1.

Proof to Theorem 2

Proof. When $A$ has full rank we have an equivalent expression for the KKT condition (29):

    $\lambda A^{1/2}\theta^* + \sum_{i=1}^n A^{-1/2} x_i\, g_i = 0, \quad g_i \in \partial l(x_i^\top \theta^*, y_i), \quad i = 1, \ldots, n.$    (34)

Let us decompose $A^{-1/2} x_i$ for all $i = 1, \ldots, n$ into $A^{-1/2} x_i = \alpha_i A^{1/2}\theta^* + u_i$, where $u_i$ is orthogonal to $A^{1/2}\theta^*$: $u_i \perp A^{1/2}\theta^*$. Equivalently $x_i = \alpha_i A\theta^* + A^{1/2} u_i$. Applying this decomposition, we have

    $x_i^\top \theta^* = \alpha_i \|\theta^*\|_A^2 + u_i^\top A^{1/2}\theta^* = \alpha_i \|\theta^*\|_A^2.$

Putting it back in (34) we obtain

    $\lambda A^{1/2}\theta^* + \sum_{i=1}^n \big(\alpha_i A^{1/2}\theta^* + u_i\big)\, \partial l(\alpha_i \|\theta^*\|_A^2, y_i) \ni 0.$    (35)

Since $u_i$ is orthogonal to $A^{1/2}\theta^*$, (35) can be rewritten as: there exist

    $\alpha_i \in \mathbb{R}, \quad y_i \in \mathcal{Y}, \quad g_i \in \partial l(\alpha_i \|\theta^*\|_A^2, y_i) \quad \forall i = 1, \ldots, n$    (36)

satisfying

    $\sum_{i=1}^n g_i u_i = 0 \quad\text{and}\quad \lambda A^{1/2}\theta^* = -A^{1/2}\theta^* \sum_{i=1}^n \alpha_i g_i.$    (37)

Since $A\theta^* \ne 0$, we have $A^{1/2}\theta^* \ne 0$ and (37) is equivalent to $\lambda = -\sum_{i=1}^n \alpha_i g_i$. It follows that

    $\lambda = -\sum_{i=1}^n \alpha_i g_i \le \sum_{i=1}^n \sup_{\alpha \in \mathbb{R},\, y \in \mathcal{Y},\, g \in \partial l(\alpha\|\theta^*\|_A^2,\, y)} (-\alpha g) = n \sup_{\alpha \in \mathbb{R},\, y \in \mathcal{Y},\, g \in \partial l(\alpha\|\theta^*\|_A^2,\, y)} (-\alpha g).$

It indicates the lower bound

    $n \ge \frac{\lambda}{\sup_{\alpha \in \mathbb{R},\, y \in \mathcal{Y},\, g \in \partial l(\alpha\|\theta^*\|_A^2,\, y)} (-\alpha g)},$

which is LB2.
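
As a quick, non-authoritative sanity check of the LB2 argument, the following Python sketch approximates the denominator $\sup(-\alpha g)$ on a grid for the hinge loss $l(a, y) = \max(1 - ya, 0)$ and compares $\lambda/\sup$ against $\lambda\|\theta^*\|_A^2$. The dimension, regularizer $A$, $\theta^*$, and $\lambda$ below are arbitrary assumed values, and the grid search is only approximate.

```python
import numpy as np

# Approximate sup(-alpha * g) over alpha in R, y in {-1, +1}, and
# g in the subdifferential of a -> max(1 - y*a, 0) evaluated at
# a = alpha * ||theta*||_A^2. Theory predicts 1 / ||theta*||_A^2,
# hence the bound n >= lambda * ||theta*||_A^2.

rng = np.random.default_rng(0)
d = 5
theta_star = rng.normal(size=d)
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)            # an arbitrary full-rank PSD regularizer
lam = 0.7
t = theta_star @ A @ theta_star    # ||theta*||_A^2

best = 0.0
for y in (-1.0, 1.0):
    for alpha in np.linspace(-10.0, 10.0, 100001) / t:
        a = alpha * t
        # extreme points of the subdifferential of max(1 - y*a, 0) in a
        grads = [-y] if y * a < 1 else ([-y, 0.0] if y * a == 1 else [0.0])
        best = max(best, max(-alpha * g for g in grads))

print("sup(-alpha*g) ~", best, "   expected:", 1.0 / t)
print("lambda / sup  ~", lam / best, "   expected LB2 level:", lam * t)
```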

Proof to Theorem 3

Proof. Let $D = \{(x_i, y_i)\}_{i=1,\ldots,n}$ be a teaching set for $[w^*; b^*]$. The following KKT condition needs to be satisfied:

    $\sum_{i=1}^n \partial l\big(y_i (x_i^\top w^* + b^*)\big)\, y_i\, [x_i; 1] + [\lambda A w^*; 0] \ni 0.$    (38)

If we construct a new training set

    $\hat{D} = \Big\{ \hat{x}_i = x_i + \frac{b^*}{\|w^*\|_A^2} A w^*, \;\; \hat{y}_i = y_i \Big\}, \quad i = 1, \ldots, n,$

then $[w^*; 0]$ satisfies the KKT condition defined on $\hat{D}$. This can be verified as follows:

    $\sum_{i=1}^n \partial l(\hat{y}_i \hat{x}_i^\top w^*)\, \hat{y}_i\, [\hat{x}_i; 1] + [\lambda A w^*; 0]$
    $= \sum_{i=1}^n \partial l\big(y_i (x_i^\top w^* + b^*)\big)\, y_i\, \Big[x_i + \frac{b^*}{\|w^*\|_A^2} A w^*;\ 1\Big] + [\lambda A w^*; 0]$
    $= \underbrace{\sum_{i=1}^n \partial l\big(y_i (x_i^\top w^* + b^*)\big)\, y_i\, [x_i; 1] + [\lambda A w^*; 0]}_{\ni\, 0 \text{ from (38)}} + \frac{b^*}{\|w^*\|_A^2} \underbrace{\Big(\sum_{i=1}^n \partial l\big(y_i (x_i^\top w^* + b^*)\big)\, y_i\Big)}_{\ni\, 0 \text{ from the bias dimension in (38)}} [A w^*; 0]$
    $\ni 0,$    (39)

where $\hat{y}_i \hat{x}_i^\top w^* = y_i (x_i^\top w^* + b^*)$ by the construction of $\hat{D}$. It follows that

    $\sum_{i=1}^n \partial l(\hat{y}_i \hat{x}_i^\top w^*)\, \hat{y}_i \hat{x}_i + \lambda A w^* \ni 0,$

which is equivalent to (writing $z_i := \hat{y}_i \hat{x}_i$)

    $\sum_{i=1}^n \partial l(z_i^\top w^*)\, A^{-1/2} z_i + \lambda A^{1/2} w^* \ni 0.$    (40)

We decompose $A^{-1/2} z_i = \alpha_i A^{1/2} w^* + u_i$ where $u_i$ satisfies $u_i \perp A^{1/2} w^*$. Applying this decomposition to (40), we have

    $\lambda A^{1/2} w^* \in -\sum_{i=1}^n \partial l(\alpha_i \|w^*\|_A^2)\,\big(\alpha_i A^{1/2} w^* + u_i\big).$    (41)

Since $u_i$ is orthogonal to $A^{1/2} w^*$, (41) implies that

    $\lambda A^{1/2} w^* \in -\sum_{i=1}^n \partial l(\alpha_i \|w^*\|_A^2)\, \alpha_i\, A^{1/2} w^*.$

Since $w^* \ne 0$ we have

    $\lambda \in -\sum_{i=1}^n \partial l(\alpha_i \|w^*\|_A^2)\, \alpha_i, \quad\text{hence}\quad \lambda \le n \sup_{\alpha \in \mathbb{R},\, g \in \partial l(\alpha\|w^*\|_A^2)} (-\alpha g).$

Together with the integrality of $n$, we obtain LB3.
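
The bias-elimination step above is easy to check mechanically: shifting every $x_i$ by $(b^*/\|w^*\|_A^2) A w^*$ leaves the signed margins unchanged while zeroing the offset. A minimal numpy sketch follows; all data below are arbitrary assumed values.

```python
import numpy as np

# Verify y_i_hat * x_i_hat^T w*  ==  y_i * (x_i^T w* + b*)
# for the shift x_i_hat = x_i + (b* / ||w*||_A^2) * A w*.
rng = np.random.default_rng(1)
d, n = 4, 6
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w_star = rng.normal(size=d)
b_star = 0.8
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)                        # arbitrary full-rank PSD A

norm2_A = w_star @ A @ w_star                  # ||w*||_A^2
X_hat = X + (b_star / norm2_A) * (A @ w_star)  # shifted teaching points

print(np.allclose(y * (X_hat @ w_star),        # homogeneous margins on D_hat
                  y * (X @ w_star + b_star)))  # affine margins on D -> True
```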

Proof to Proposition 1

Proof. We simply verify the KKT condition to see that $\theta^*$ is a solution to (10) by applying the construction in (11). The uniqueness of $\theta^*$ is guaranteed by the strong convexity of (10).

Proof to Proposition 2

Proof. We only need to verify that the KKT condition holds for $\theta^*$. Due to the strong convexity of (12), uniqueness is guaranteed automatically. We denote the subgradient $\partial_a \max(1 - a, 0) = -I(a)$, where

    $I(a) = \begin{cases} 1, & \text{if } a < 1 \\ [0, 1], & \text{if } a = 1 \\ 0, & \text{otherwise.} \end{cases}$    (42)

The KKT condition is, substituting the teaching set (13) with $n = \lceil \lambda\|\theta^*\|^2 \rceil$ items satisfying $y_i x_i = (\lambda/n)\theta^*$,

    $-\sum_{i=1}^n y_i x_i\, I(y_i x_i^\top \theta^*) + \lambda\theta^*$
    $= \lambda\theta^* - \lceil \lambda\|\theta^*\|^2 \rceil \cdot \frac{\lambda\theta^*}{\lceil \lambda\|\theta^*\|^2 \rceil} \cdot I\Big(\frac{\lambda\|\theta^*\|^2}{\lceil \lambda\|\theta^*\|^2 \rceil}\Big)$
    $= \lambda\theta^* \Big(1 - I\Big(\frac{\lambda\|\theta^*\|^2}{\lceil \lambda\|\theta^*\|^2 \rceil}\Big)\Big) \ni 0,$

where the last line is due to $I\big(\lambda\|\theta^*\|^2 / \lceil \lambda\|\theta^*\|^2 \rceil\big)$ giving either the set $[0, 1]$ or the value $1$, both of which contain $1$.

Proof to Corollary 2

Proof. We show this number matches LB2. Let $A = I$, $l(a, b) = \max(1 - ab, 0)$, and consider the denominator of (7):

    $\sup_{\alpha \in \mathbb{R},\, y \in \mathcal{Y},\, g \in \partial l(\alpha\|\theta^*\|^2,\, y)} (-\alpha g) = \sup_{\alpha,\, y \in \{-1, 1\},\, g \in -y I(y\alpha\|\theta^*\|^2)} (-\alpha g) = \sup_{\alpha,\, g \in I(\alpha\|\theta^*\|^2)} \alpha g = \frac{1}{\|\theta^*\|^2},$

where the first equality is due to $\partial_a l(a, b) = -b\, I(ab)$. Therefore, LB2 $= \lceil \lambda\|\theta^*\|^2 \rceil$, which matches the construction in (13).
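
A small numerical sketch of the Proposition 2 verification follows. The construction used here is my reading of (13) ($n = \lceil \lambda\|\theta^*\|^2 \rceil$ identical items with $y_i x_i = (\lambda/n)\theta^*$); treat that scaling as an assumption. The script checks that a subgradient vanishes at $\theta^*$ and that subgradient descent on the teaching set recovers $\theta^*$.

```python
import numpy as np

# Homogeneous SVM: f(theta) = sum_i max(1 - y_i x_i^T theta, 0)
#                             + (lam/2) ||theta||^2.
rng = np.random.default_rng(2)
d = 3
theta_star = rng.normal(size=d)
lam = 2.5
n = int(np.ceil(lam * (theta_star @ theta_star)))
X = np.tile((lam / n) * theta_star, (n, 1))   # assumed teaching items
y = np.ones(n)                                # all labels +1

def subgrad(theta):
    # one element of the subdifferential of f at theta
    active = (y * (X @ theta) < 1).astype(float)
    return -(active * y) @ X + lam * theta

print("margin:", X[0] @ theta_star, "(should be <= 1)")
print("||subgradient at theta*||:", np.linalg.norm(subgrad(theta_star)))

theta = np.zeros(d)                 # independent check: subgradient descent
for k in range(1000):
    theta -= subgrad(theta) / (lam * (k + 1))
print("||theta_hat - theta*||:", np.linalg.norm(theta - theta_star))
```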

Proof to Proposition 3

Proof. We first verify that $\theta^*$ is a solution to (14) based on the teaching set construction in (16); we only need to verify that the gradient of (14) is zero at $\theta^*$. Recall from (16) that the teaching set consists of $n = \lceil \lambda\|\theta^*\|^2/\tau_{\max} \rceil$ copies of the item with $x_i = \tau^{-1}(\lambda\|\theta^*\|^2/n)\, \theta^*/\|\theta^*\|^2$ and $y_i = 1$, where $\tau(t) = t/(1 + \exp\{t\})$ and $\tau_{\max} = \max_t \tau(t)$. Computing the gradient of (14), we have

    $-\sum_{i=1}^n \frac{y_i x_i}{1 + \exp\{y_i x_i^\top \theta^*\}} + \lambda\theta^*$
    $= -n\, \frac{\tau^{-1}(\lambda\|\theta^*\|^2/n)}{1 + \exp\{\tau^{-1}(\lambda\|\theta^*\|^2/n)\}} \cdot \frac{\theta^*}{\|\theta^*\|^2} + \lambda\theta^*$
    $= -n \cdot \frac{\lambda\|\theta^*\|^2}{n} \cdot \frac{\theta^*}{\|\theta^*\|^2} + \lambda\theta^*$
    $= -\lambda\theta^* + \lambda\theta^* = 0,$

where the second equality uses the fact $\lambda\|\theta^*\|^2/n \le \tau_{\max}$ (so that $\tau^{-1}$ is well defined) and the property $\tau^{-1}(a)/\big(1 + \exp\{\tau^{-1}(a)\}\big) = a$. The strong convexity of (14) automatically implies uniqueness.

Proof to Corollary 3

Proof. We show that the number matches LB2. In (7) let $A = I$ and $l(a, b) = \log(1 + \exp\{-ab\})$. The denominator of LB2 is:

    $\sup_{\alpha \in \mathbb{R},\, y \in \mathcal{Y},\, g \in \partial l(\alpha\|\theta^*\|^2,\, y)} (-\alpha g) = \sup_{\alpha,\, y \in \{-1, 1\},\, g = -y/(1 + \exp\{y\alpha\|\theta^*\|^2\})} (-\alpha g) = \sup_{\alpha} \frac{\alpha}{1 + \exp\{\alpha\|\theta^*\|^2\}} = \frac{1}{\|\theta^*\|^2} \sup_t \frac{t}{1 + \exp\{t\}} = \frac{\tau_{\max}}{\|\theta^*\|^2},$

by applying the change of variable $t = \alpha\|\theta^*\|^2$. This implies LB2 $= \lceil \lambda\|\theta^*\|^2/\tau_{\max} \rceil$, which matches the size of the teaching set in (16).
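
The logistic construction can be checked the same way. The sketch below follows my reading of (16) ($n = \lceil \lambda\|\theta^*\|^2/\tau_{\max} \rceil$ copies of $(x, +1)$ with $x = \tau^{-1}(\lambda\|\theta^*\|^2/n)\,\theta^*/\|\theta^*\|^2$) and inverts $\tau$ numerically with scipy; all concrete values are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import brentq

tau = lambda t: t / (1.0 + np.exp(t))
# argmax of tau solves 1 + e^t - t e^t = 0; tau_max = tau(t_max)
t_max = brentq(lambda t: 1 + np.exp(t) - t * np.exp(t), 1.0, 2.0)

rng = np.random.default_rng(3)
d = 4
theta_star = rng.normal(size=d)
lam = 0.3
s = lam * (theta_star @ theta_star)                   # lam * ||theta*||^2
n = int(np.ceil(s / tau(t_max)))
t_star = brentq(lambda t: tau(t) - s / n, 0.0, t_max)  # tau^{-1}(s/n)
x = t_star * theta_star / (theta_star @ theta_star)   # assumed item, y = +1

# gradient of sum_i log(1 + exp(-y_i x_i^T theta)) + (lam/2)||theta||^2
grad = -n * x / (1.0 + np.exp(x @ theta_star)) + lam * theta_star
print("||gradient at theta*||:", np.linalg.norm(grad))  # ~0
```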

Proof to Proposition 4

Proof. We first prove the case for $w^* = 0$. We can verify that the KKT condition is satisfied by designing $x_1$ and $y_1$ as in (18):

    $(x_1^\top w^* + b^* - y_1)\, x_1 + \lambda w^* = 0, \qquad x_1^\top w^* + b^* - y_1 = 0.$

The uniqueness of $[w^*; b^*]$ is indicated by the strong convexity of (17) when $n = 1$.

We then prove the case for $w^* \ne 0$. With simple algebra, we can verify the KKT condition holds via the construction in (19):

    $(x_1^\top w^* + b^* - y_1)\, x_1 + (x_2^\top w^* + b^* - y_2)\, x_2 + \lambda w^* = 0,$
    $(x_1^\top w^* + b^* - y_1) + (x_2^\top w^* + b^* - y_2) = 0.$

Similarly, the uniqueness is implied by the strong convexity of (17) when $n = 2$.

Proof to Corollary 4

Proof. We match the lower bound LB1 in (6). Note $\theta^* = [w^*; b^*] \in \mathbb{R}^{d+1}$, and $A$ in this case is the $(d+1) \times (d+1)$ matrix given by the $d \times d$ identity matrix $I_d$ padded with one additional row and column of zeros for the offset. Therefore $\mathrm{Rank}(A) = \mathrm{Rank}(I_d) = d$. When $w^* = 0$, $A\theta^* = 0$ and LB1 $= (d+1) - \mathrm{Rank}(A) = 1$. When $w^* \ne 0$, $A\theta^* \ne 0$ and LB1 $= (d+1) - \mathrm{Rank}(A) + 1 = 2$. These lower bounds match the teaching set sizes in (18) and (19), respectively.
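
For the ridge learner with bias, the two stationarity equations above pin down a valid two-item set directly. The pair below is my own instantiation rather than necessarily the paper's (19): $x_1 = w^*$, $x_2 = -w^*$, with targets chosen so the residuals are $\mp\lambda/2$; the closed-form normal equations then recover $[w^*; b^*]$.

```python
import numpy as np

# Objective: (1/2) sum_i (x_i^T w + b - y_i)^2 + (lam/2)||w||^2.
rng = np.random.default_rng(4)
d = 3
w_star = rng.normal(size=d)
b_star = -0.4
lam = 1.1

X = np.vstack([w_star, -w_star])                 # assumed two-item set
y = np.array([w_star @ w_star + b_star + lam / 2,
              -(w_star @ w_star) + b_star - lam / 2])

# (d+1)x(d+1) normal equations for [w; b]
n = len(y)
H = np.zeros((d + 1, d + 1))
H[:d, :d] = X.T @ X + lam * np.eye(d)
H[:d, d] = H[d, :d] = X.sum(axis=0)
H[d, d] = n
rhs = np.concatenate([X.T @ y, [y.sum()]])
sol = np.linalg.solve(H, rhs)

print("recovered [w; b]:", np.round(sol, 6))
print("target    [w; b]:", np.round(np.concatenate([w_star, [b_star]]), 6))
```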

Proof to Proposition 5

Proof. Unlike in previous learners (including the homogeneous SVM), we no longer have strong convexity w.r.t. $b$. In order to prove that (21) is a teaching set, we need to verify the KKT condition and verify solution uniqueness.

We first verify the KKT condition to show that the solution under (21) includes the target model $[w^*; b^*]$. From (21), we have

    $x_1^\top w^* + b^* = 1, \qquad x_2^\top w^* + b^* = -1.$    (43)

Applying them to the KKT condition and using the notation in (42) we obtain

    $-\frac{n}{2}\, I(x_1^\top w^* + b^*)\, [x_1; 1] + \frac{n}{2}\, I\big(-(x_2^\top w^* + b^*)\big)\, [x_2; 1] + [\lambda w^*; 0]$
    $= -\frac{n}{2}\, I(1)\, [x_1; 1] + \frac{n}{2}\, I(1)\, [x_2; 1] + [\lambda w^*; 0]$  (applying (43))
    $\ni \frac{n}{2} \cdot \frac{\lambda\|w^*\|^2}{n}\, [x_2 - x_1; 0] + [\lambda w^*; 0]$  (choosing the same $g = \lambda\|w^*\|^2/n \in I(1) = [0, 1]$ in both terms, which sets the last dimension to $0$; this is feasible by observing $\lambda\|w^*\|^2 \le n$)
    $= -[\lambda w^*; 0] + [\lambda w^*; 0] = 0,$

using $x_1 - x_2 = 2 w^*/\|w^*\|^2$ from the construction (21). It proves that $[w^*; b^*]$ solves (20) by our teaching set construction.

Next we prove uniqueness by contradiction. We use $f(w, b)$ to denote the objective function in (20) under the teaching set. It is easy to verify that $f(w^*, b^*) = \frac{\lambda}{2}\|w^*\|^2$. Assume that there exists another solution $[\tilde{w}; \tilde{b}]$ different from $[w^*; b^*]$. We can obtain $\|\tilde{w}\|^2 \le \|w^*\|^2$ due to

    $\frac{\lambda}{2}\|w^*\|^2 = f(w^*, b^*) = f(\tilde{w}, \tilde{b}) \ge \frac{\lambda}{2}\|\tilde{w}\|^2.$

The second equality is due to $[\tilde{w}; \tilde{b}]$ being a solution; the inequality is due to the whole-part relationship. Therefore, there are only two possibilities for the norm of $\tilde{w}$: $\|\tilde{w}\| = \|w^*\|$ or $\|\tilde{w}\| = t\|w^*\|$ for some $t < 1$. Next we will show that both cases are impossible.

(Case 1) For the case $\|\tilde{w}\| = \|w^*\|$, we have

    $f(\tilde{w}, \tilde{b}) = \frac{n}{2}\max\big(1 - (x_1^\top \tilde{w} + \tilde{b}), 0\big) + \frac{n}{2}\max\big(1 + (x_2^\top \tilde{w} + \tilde{b}), 0\big) + \frac{\lambda}{2}\|\tilde{w}\|^2$
    $= \frac{n}{2}\max\big(\underbrace{x_1^\top(w^* - \tilde{w}) + (b^* - \tilde{b})}_{=: \Delta_1}, 0\big) + \frac{n}{2}\max\big(\underbrace{-x_2^\top(w^* - \tilde{w}) - (b^* - \tilde{b})}_{=: \Delta_2}, 0\big) + \frac{\lambda}{2}\|\tilde{w}\|^2$
    $= \frac{n}{2}\max(\Delta_1, 0) + \frac{n}{2}\max(\Delta_2, 0) + f(w^*, b^*) \ge f(w^*, b^*).$

From $f(\tilde{w}, \tilde{b}) = f(w^*, b^*)$, it follows $\Delta_1 \le 0$ and $\Delta_2 \le 0$. Since

    $\Delta_1 + \Delta_2 = (x_1 - x_2)^\top(w^* - \tilde{w}) = \frac{2 (w^*)^\top(w^* - \tilde{w})}{\|w^*\|^2} = 2 - \frac{2 (w^*)^\top \tilde{w}}{\|w^*\|^2} \ge 2 - \frac{2\|w^*\|\|\tilde{w}\|}{\|w^*\|^2} = 0,$

we have $\Delta_1 = \Delta_2 = 0$ and $(w^*)^\top \tilde{w} = \|w^*\|^2$. But because $\|\tilde{w}\| = \|w^*\|$, we must have $\tilde{w} = w^*$. Applying this new observation to $\Delta_1 = 0$ and $\Delta_2 = 0$, we obtain $\tilde{b} = b^*$. It means that $[w^*; b^*] = [\tilde{w}; \tilde{b}]$, contradicting our assumption $[w^*; b^*] \ne [\tilde{w}; \tilde{b}]$.

(Case 2) Next we turn to the case $\|\tilde{w}\| = t\|w^*\|$ for some $t \in [0, 1)$. Recall our assumption that $[\tilde{w}; \tilde{b}]$ solves (20). Then it follows that the following specific construction $[\hat{w}; \hat{b}]$ solves (20) as well:

    $\hat{w} = t w^*, \qquad \hat{b} = t b^*.$    (44)

To see this, we consider the following optimization problem:

    $\min_{w, b} \;\; L(w, b) := \frac{n}{2}\max\big(1 - (x_1^\top w + b), 0\big) + \frac{n}{2}\max\big(1 + (x_2^\top w + b), 0\big)$
    $\text{s.t.} \;\; \|w\| \le t\|w^*\|.$    (45)

Since $[\tilde{w}; \tilde{b}]$ solves (20), it is easy to see that $[\tilde{w}; \tilde{b}]$ solves (45) too, otherwise there exists a solution of (45) which gives a lower function value in (20). Then we can verify that $[\hat{w}; \hat{b}]$ solves (45) as well by showing the following optimality condition holds:

    $-\Big[\frac{\partial L(w, b)}{\partial w}; \frac{\partial L(w, b)}{\partial b}\Big]\Big|_{[\hat{w}; \hat{b}]} \;\cap\; N_{\|w\| \le t\|w^*\|}(\hat{w}, \hat{b}) \ne \emptyset,$

where $N_{\|w\| \le t\|w^*\|}(\hat{w}, \hat{b})$ is the normal cone to the set $\{[w; b] : \|w\| \le t\|w^*\|\}$ at $[\hat{w}; \hat{b}]$. Given a convex closed set $\Omega$ and a point $\theta \in \Omega$, the normal cone at point $\theta$ is defined to be the set

    $N_\Omega(\theta) = \{\phi : \langle \phi, \psi - \theta \rangle \le 0 \;\; \forall \psi \in \Omega\}.$    (46)

The optimality condition basically suggests that at the optimal point, the negative (sub)gradient direction overlaps with the normal cone. In other words, there is no direction that decreases the objective at the optimal point. Readers can refer to Bertsekas & Nedic (2003) for more details about the geometric optimality condition.

Because of (43) and (44), we have $x_1^\top \hat{w} + \hat{b} = t < 1$. Thus at $[\hat{w}; \hat{b}]$ the subgradient is

    $\Big[\frac{\partial L(w, b)}{\partial w}; \frac{\partial L(w, b)}{\partial b}\Big]\Big|_{[\hat{w}; \hat{b}]} = \frac{n}{2}\,[x_2 - x_1; 0] = -\frac{n}{\|w^*\|^2}\,[w^*; 0].$    (47)

And the normal cone is

    $N_{\|w\| \le t\|w^*\|}(\hat{w}, \hat{b}) = \big\{ s\,[w^*; 0] : s \ge 0 \big\}.$    (48)

The intersection is non-empty by choosing $s = n/\|w^*\|^2$. Since both $[\hat{w}; \hat{b}]$ and $[\tilde{w}; \tilde{b}]$ solve (45), we have $L(\hat{w}, \hat{b}) = L(\tilde{w}, \tilde{b})$. Together with $\|\hat{w}\| = \|\tilde{w}\|$, we have

    $f(\hat{w}, \hat{b}) = L(\hat{w}, \hat{b}) + \frac{\lambda}{2}\|\hat{w}\|^2 = f(\tilde{w}, \tilde{b}) = f(w^*, b^*).$

Therefore, we proved that $[\hat{w}; \hat{b}]$ solves (20) as well. To see the contradiction, let us check the function value of $f(\hat{w}, \hat{b})$

via a different route:

    $f(\hat{w}, \hat{b}) = f(t w^*, t b^*)$
    $= \frac{n}{2}\max\big(1 - t(x_1^\top w^* + b^*), 0\big) + \frac{n}{2}\max\big(1 + t(x_2^\top w^* + b^*), 0\big) + \frac{\lambda}{2}\|w^*\|^2 t^2$
    $= \frac{n}{2}\max(1 - t, 0) + \frac{n}{2}\max(1 - t, 0) + \frac{\lambda}{2}\|w^*\|^2 t^2$
    $= n(1 - t) + \frac{\lambda}{2}\|w^*\|^2 t^2$
    $= f(w^*, b^*) + n(1 - t) - \frac{\lambda}{2}\|w^*\|^2 (1 - t^2)$
    $\ge f(w^*, b^*) + \lambda\|w^*\|^2 (1 - t) - \frac{\lambda}{2}\|w^*\|^2 (1 - t^2)$
    $= f(w^*, b^*) + \frac{\lambda}{2}\|w^*\|^2 (1 - t)^2$
    $> f(w^*, b^*),$

where the first inequality uses the fact that $n \ge \lambda\|w^*\|^2$. It contradicts our earlier assertion $f(\hat{w}, \hat{b}) = f(w^*, b^*)$.

Putting Cases 1 and 2 together we prove uniqueness.

Proof to Corollary 5

Proof. The upper bound directly follows Proposition 5. We only need to show the lower bound LB3 $= \lceil \lambda\|w^*\|^2 \rceil$ in Theorem 3. Let $A = I$, $l(a) = \max(1 - a, 0)$, and consider the denominator of (9):

    $\sup_{\alpha \in \mathbb{R},\, g \in \partial l(\alpha\|w^*\|^2)} (-\alpha g) = \sup_{\alpha,\, g \in I(\alpha\|w^*\|^2)} \alpha g = \frac{1}{\|w^*\|^2},$

where the first equality is due to $\partial l(a) = -I(a)$. Therefore, LB3 $= \lceil \lambda\|w^*\|^2 \rceil$, which proves the lower bound.
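
The KKT verification in Proposition 5 is concrete enough to check in code. The instantiation below is my reading of (21): $n/2$ copies each of $(x_1, +1)$ and $(x_2, -1)$ with $x_1 = (1 - b^*) w^*/\|w^*\|^2$, $x_2 = -(1 + b^*) w^*/\|w^*\|^2$, and $n = 2\lceil \lambda\|w^*\|^2/2 \rceil$; treat those formulas as assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
w_star = rng.normal(size=d)
b_star = 0.3
lam = 1.7
nrm2 = w_star @ w_star
n = 2 * int(np.ceil(lam * nrm2 / 2))

x1 = (1 - b_star) * w_star / nrm2     # margin +1 point
x2 = -(1 + b_star) * w_star / nrm2    # margin -1 point

print("margins:", x1 @ w_star + b_star, -(x2 @ w_star + b_star))  # both 1
g = lam * nrm2 / n                    # chosen hinge subgradient, in I(1)=[0,1]
print("g in [0,1]:", 0.0 <= g <= 1.0)
stat_w = -(n / 2) * g * x1 + (n / 2) * g * x2 + lam * w_star  # w-stationarity
stat_b = -(n / 2) * g + (n / 2) * g                           # b-stationarity
print("||stationarity in w||:", np.linalg.norm(stat_w), " in b:", stat_b)
```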

Proof to Proposition 6

Proof. We first point out that for $t^*$ to be well-defined the argument to $\tau^{-1}(\cdot)$ has to be bounded by $\tau_{\max}$: $\lambda\|w^*\|^2/n \le \tau_{\max}$. This implies $n \ge \lambda\|w^*\|^2/\tau_{\max}$. The size of our proposed teaching set is the smallest among all such symmetric constructions that satisfy this constraint.

We verify the KKT condition to show that the construction in (23) includes the solution $[w^*; b^*]$. From (23), we have

    $x_1^\top w^* + b^* = t^*, \qquad x_2^\top w^* + b^* = -t^*.$

We apply them and the teaching set construction to compute the gradient of (22):

    $-\frac{n}{2} \cdot \frac{[x_1; 1]}{1 + \exp\{x_1^\top w^* + b^*\}} + \frac{n}{2} \cdot \frac{[x_2; 1]}{1 + \exp\{-(x_2^\top w^* + b^*)\}} + [\lambda w^*; 0]$
    $= \frac{n}{2} \cdot \frac{[x_2 - x_1; 0]}{1 + \exp\{t^*\}} + [\lambda w^*; 0]$
    $= -\frac{n\, t^*}{(1 + \exp\{t^*\})\|w^*\|^2}\, [w^*; 0] + [\lambda w^*; 0]$  (using $x_1 - x_2 = 2 t^* w^*/\|w^*\|^2$ from (23))
    $= -[\lambda w^*; 0] + [\lambda w^*; 0] = 0$  (using $n\,\tau(t^*) = \lambda\|w^*\|^2$, i.e., $t^* = \tau^{-1}(\lambda\|w^*\|^2/n)$).

This verifies the KKT condition.

Finally we show uniqueness. The Hessian matrix of the objective function (22) under our training set (23) is

    $\underbrace{\frac{n}{2} \cdot \frac{\exp\{t^*\}}{(1 + \exp\{t^*\})^2}}_{=:\, a}\ \underbrace{\big([x_1; 1][x_1; 1]^\top + [x_2; 1][x_2; 1]^\top\big)}_{=:\, A} + \lambda \underbrace{\begin{bmatrix} I_d & 0 \\ 0 & 0 \end{bmatrix}}_{=:\, B}.$

Note $a > 0$ and $A = [x_1; 1][x_1; 1]^\top + [x_2; 1][x_2; 1]^\top$ is positive semi-definite. We show that $aA + \lambda B$ is positive definite. Suppose not. Then there exists $[u; v] \ne 0$ such that $[u; v]^\top (aA + \lambda B)[u; v] = 0$. This implies $[u; v]^\top (aA)[u; v] = 0$ and $\lambda u^\top u = 0$. Since the first term is non-negative due to $A$ being positive semi-definite, $u = 0$. But then we have $2 a v^2 = 0$, which implies $[u; v] = 0$, a contradiction. Therefore uniqueness is guaranteed.

Proof to Corollary 6

Proof. The upper bound directly follows Proposition 6. We only need to show the lower bound LB3 in Theorem 3. Let $A = I$ and $l(a) = \log(1 + \exp\{-a\})$ and consider the denominator of (9):

    $\sup_{\alpha \in \mathbb{R},\, g \in \partial l(\alpha\|w^*\|^2)} (-\alpha g) = \sup_{\alpha} \frac{\alpha}{1 + \exp\{\alpha\|w^*\|^2\}} = \frac{1}{\|w^*\|^2} \sup_t \frac{t}{1 + \exp\{t\}} = \frac{\tau_{\max}}{\|w^*\|^2},$

using $g = -1/(1 + \exp\{\alpha\|w^*\|^2\})$ and the change of variable $t = \alpha\|w^*\|^2$, which implies LB3 $= \lceil \lambda\|w^*\|^2/\tau_{\max} \rceil$ by applying Theorem 3.
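
To close, an end-to-end numerical sketch of Proposition 6 under my instantiation of (23): $x_1 = (t^* - b^*) w^*/\|w^*\|^2$, $x_2 = -(t^* + b^*) w^*/\|w^*\|^2$, $n = 2\lceil \lambda\|w^*\|^2/(2\tau_{\max}) \rceil$, and $t^* = \tau^{-1}(\lambda\|w^*\|^2/n)$. All formulas and constants here are assumptions for illustration; a generic convex solver should then return $[w^*; b^*]$ as the unique minimizer.

```python
import numpy as np
from scipy.optimize import brentq, minimize

tau = lambda t: t / (1.0 + np.exp(t))
t_max = brentq(lambda t: 1 + np.exp(t) - t * np.exp(t), 1.0, 2.0)

rng = np.random.default_rng(6)
d = 3
w_star = rng.normal(size=d)
b_star = 0.5
lam = 0.4
nrm2 = w_star @ w_star
n = 2 * int(np.ceil(lam * nrm2 / (2 * tau(t_max))))
t_star = brentq(lambda t: tau(t) - lam * nrm2 / n, 0.0, t_max)

x1 = (t_star - b_star) * w_star / nrm2
x2 = -(t_star + b_star) * w_star / nrm2
X = np.vstack([np.tile(x1, (n // 2, 1)), np.tile(x2, (n // 2, 1))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

def f(wb):  # logistic loss with unregularized bias
    w, b = wb[:d], wb[d]
    return np.log1p(np.exp(-y * (X @ w + b))).sum() + 0.5 * lam * w @ w

res = minimize(f, np.zeros(d + 1))
target = np.concatenate([w_star, [b_star]])
print("max |recovered - target|:", np.max(np.abs(res.x - target)))  # ~0
```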