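Supplementary Material

Wenzhuo Yang (a0096049@nus.edu.sg)
Department of Mechanical Engineering, National University of Singapore, Singapore 117576

Huan Xu (mpexuh@nus.edu.sg)
Department of Mechanical Engineering, National University of Singapore, Singapore 117576

1. Notation Table

$[m]$ : The set $\{1, \dots, m\}$
$g$ : A subset of $[m]$
$g^c$ : The complement of $g$, $g^c = [m] \setminus g$
$I$ : The identity matrix
$X$ : The sample matrix, $X \in \mathbb{R}^{n \times m}$
$X_i$ : The $i$th column of $X$
$\beta$ : Vector $\beta \in \mathbb{R}^m$
$\beta_i$ : The $i$th element of $\beta$
$\beta_g$ : The vector whose $i$th element is $\beta_i$ if $i \in g$ or 0 otherwise
$\Delta^{(i)}$ : The $i$th disturbance matrix
$\Delta^{(i)}_j$ : The $j$th column of $\Delta^{(i)}$
$\Delta_g$ : The matrix whose $i$th column is $\Delta_i$ if $i \in g$ or 0 otherwise
$W_i$ : Matrix $W_i \in \mathbb{R}^{m \times m}$
$\mathrm{vec}(\cdot)$ : The operator vectorizing a matrix by stacking its columns
$\|X\|_p$ : The $\ell_p$-norm of $\mathrm{vec}(X)$, i.e., $\|\mathrm{vec}(X)\|_p$

Throughout, $p^*$ denotes the dual exponent of $p$, i.e., $1/p + 1/p^* = 1$.

2. Proofs in Section 2

To prove the corollaries in Section 2, we give the following lemma.

Lemma 1. If any two different groups $g_p$ and $g_q$ in $G_i$ in the uncertainty set $\mathcal{U}$ (4) are non-overlapping for $i = 1, \dots, t$, which means $g_p \cap g_q = \emptyset$, then the optimization problem (5) is equivalent to

$$\min_{\beta \in \mathbb{R}^m} \Big\{ \|y - X\beta\|_p + \sum_{i=1}^{t} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} \Big\}. \tag{1}$$

Proof. Since any two different groups $g_p$ and $g_q$ in $G_i$ are non-overlapping, the maximization separates over the groups, and we have

$$\max_{\forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \alpha^{(i)\top} W_i \beta = \sum_{g \in G_i} \max_{\|\alpha^{(i)}_g\|_p \le c_g} \alpha^{(i)\top}_g (W_i \beta)_g = \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*}. \tag{2}$$

Hence the lemma holds.

The corollaries of Section 2 now follow from Theorem 3 and Lemma 1.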
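Equation (2) can be checked numerically on a small instance. The sketch below is only an illustration, not part of the original proofs: it assumes $1 < p < \infty$, takes a vector `v` to play the role of $W_i \beta$, and uses hypothetical helper names (`group_penalty`, `hoelder_maximizer`). For disjoint groups it builds the feasible $\alpha$ given by the equality case of Hölder's inequality and compares $\alpha^\top v$ with $\sum_g c_g \|v_g\|_{p^*}$.

```python
import math

def lp_norm(v, p):
    """l_p norm of a list of floats, for finite p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def dual_exponent(p):
    """Return p* with 1/p + 1/p* = 1 (assumes 1 < p < infinity)."""
    return p / (p - 1.0)

def inner(a, b):
    """Euclidean inner product of two lists."""
    return sum(x * y for x, y in zip(a, b))

def group_penalty(v, groups, c, p):
    """Right-hand side of (2): sum over groups of c_g * ||v_g||_{p*}."""
    q = dual_exponent(p)
    return sum(c_g * lp_norm([v[j] for j in g], q) for g, c_g in zip(groups, c))

def hoelder_maximizer(v, groups, c, p):
    """Feasible alpha (||alpha_g||_p = c_g on each disjoint group) that
    attains the maximum of alpha^T v, via the Hoelder equality case."""
    q = dual_exponent(p)
    alpha = [0.0] * len(v)
    for g, c_g in zip(groups, c):
        norm_q = lp_norm([v[j] for j in g], q)
        if norm_q == 0.0:
            continue  # any feasible alpha_g is optimal when v_g = 0
        for j in g:
            magnitude = abs(v[j]) ** (q - 1.0) / norm_q ** (q - 1.0)
            alpha[j] = c_g * math.copysign(magnitude, v[j])
    return alpha

# Illustrative data (not from the paper): three disjoint groups.
v = [3.0, -4.0, 1.0, 2.0, -2.0]
groups = [[0, 1], [2], [3, 4]]
c = [1.0, 0.5, 2.0]
alpha = hoelder_maximizer(v, groups, c, 2.0)
print(inner(alpha, v), group_penalty(v, groups, c, 2.0))
```

The two printed values should agree up to rounding. With overlapping groups the maximization no longer separates, which is why Lemma 1 requires $g_p \cap g_q = \emptyset$.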
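Submission and Formatting Instructions for ICML 2013

1. Proof of Corollary 1: $G_1 = \{[m]\}$ satisfies the condition of Lemma 1, so we have

$$\sum_{i=1}^{t} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} = \sum_{g \in G_1} c_g \|\beta_g\|_{2^*} = c \|\beta\|_2. \tag{3}$$

2. Proof of Corollary 2: $G_1 = \{\{1\}, \dots, \{m\}\}$ satisfies the condition of Lemma 1, so

$$\sum_{i=1}^{t} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} = \sum_{g \in G_1} c_g \|\beta_g\|_{p^*} = \sum_{i=1}^{m} c_i |\beta_i|. \tag{4}$$

3. Proof of Corollary 3: $G_1 = \{g_1, \dots, g_k\}$ satisfies the condition of Lemma 1, so we have

$$\sum_{i=1}^{t} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} = \sum_{i=1}^{k} c_{g_i} \|\beta_{g_i}\|_{p^*}. \tag{5}$$

4. Proof of Theorem 2: $G_i = \{g_i, g_i^c\}$ satisfies the condition of Lemma 1 and $c_{g_i^c} = 0$, so that

$$\sum_{i=1}^{k} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} = \sum_{i=1}^{k} \big( c_{g_i} \|\beta_{g_i}\|_{p^*} + c_{g_i^c} \|\beta_{g_i^c}\|_{p^*} \big) = \sum_{i=1}^{k} c_{g_i} \|\beta_{g_i}\|_{p^*}. \tag{6}$$

5. Proof of Corollary 4: The dual problem of the optimization problem

$$\min_{\sum_{i=1}^{k} v_i = \beta,\ \mathrm{supp}(v_i) \subseteq g_i} \ \sum_{i=1}^{k} c_{g_i} \|v_i\|_{p^*}$$

can be formulated as

$$\begin{aligned}
& \max_{\alpha} \min_{\forall i,\ \mathrm{supp}(v_i) \subseteq g_i} \Big\{ \sum_{i=1}^{k} c_{g_i} \|v_i\|_{p^*} - \alpha^\top \sum_{i=1}^{k} v_i + \alpha^\top \beta \Big\} \\
&= \max_{\alpha} \Big\{ \alpha^\top \beta + \min_{\forall i,\ \mathrm{supp}(v_i) \subseteq g_i} \Big\{ \sum_{i=1}^{k} c_{g_i} \|v_i\|_{p^*} - \sum_{i=1}^{k} \alpha_{g_i}^\top v_i \Big\} \Big\} \\
&= \max_{\alpha} \Big\{ \alpha^\top \beta - \max_{\forall i,\ \mathrm{supp}(v_i) \subseteq g_i} \Big\{ \sum_{i=1}^{k} \alpha_{g_i}^\top v_i - \sum_{i=1}^{k} c_{g_i} \|v_i\|_{p^*} \Big\} \Big\} \\
&= \max_{\forall i,\ \|\alpha_{g_i}\|_p \le c_{g_i}} \alpha^\top \beta.
\end{aligned} \tag{7}$$

Since the constraints in the primal problem satisfy Slater's condition, strong duality holds. From the duality and the condition in Corollary 4, we have

$$\begin{aligned}
& \min_{\beta \in \mathbb{R}^m} \Big\{ \|y - X\beta\|_p + \sum_{i=1}^{t} \max_{\forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \alpha^{(i)\top} W_i \beta \Big\} \\
&= \min_{\beta \in \mathbb{R}^m} \Big\{ \|y - X\beta\|_p + \max_{\forall g \in G_1,\ \|\alpha_g\|_p \le c_g} \alpha^\top \beta \Big\} \\
&= \min_{\beta \in \mathbb{R}^m} \Big\{ \|y - X\beta\|_p + \min_{\sum_{i=1}^{k} v_i = \beta,\ \mathrm{supp}(v_i) \subseteq g_i} \ \sum_{i=1}^{k} c_{g_i} \|v_i\|_{p^*} \Big\}.
\end{aligned} \tag{8}$$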
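6. Proof of Corollary 5: From Theorem 2 and Lemma 1, we have

$$\sum_{i=1}^{t} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} = \sum_{g \in G_1} c_g \|\beta_g\|_{p^*} + \sum_{g \in G_2} c_g \|(W_2 \beta)_g\|_{p^*} = \sum_{i=1}^{m} c_i |\beta_i| + \sum_{i=1}^{m-1} c_i |\beta_i - \beta_{i+1}|, \tag{9}$$

where the coefficients in the two sums are the bounds of the groups in $G_1$ and $G_2$, respectively.

7. Proof of Corollary 6: Corollary 6 follows from the proofs of Corollary 1 and Corollary 3.

8. Proof of Corollary 7: $G_1 = \{\{1\}, \dots, \{m\}\}$ satisfies the condition of Lemma 1. Since $t = 1$, $c_{\{i\}} = \lambda$ and $W_1 = D$, we have

$$\sum_{i=1}^{t} \sum_{g \in G_i} c_g \|(W_i \beta)_g\|_{p^*} = \sum_{g \in G_1} \lambda \|(D\beta)_g\|_{p^*} = \lambda \sum_{i} |(D\beta)_i| = \lambda \|D\beta\|_1. \tag{10}$$

3. Proofs in Section 3

3.1. Proof of Theorem 4:

From the definition of $\hat{\mathcal{U}}$, we have

$$\begin{aligned}
\max_{\Delta \in \hat{\mathcal{U}}} \|y - (X + \Delta)\beta\|_p
&= \max_{c \in Z} \ \max_{\forall i,\ \forall g \in G_i,\ \|\Delta^{(i)}_g\|_p \le c_g} \|y - (X + \Delta)\beta\|_p \\
&= \|y - X\beta\|_p + \max_{c \in Z} \sum_{i=1}^{t} \max_{\forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \alpha^{(i)\top} W_i \beta \\
&= \|y - X\beta\|_p + \max_{c \ge 0;\ f_i(c) \le 0} \sum_{i=1}^{t} \max_{\forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \alpha^{(i)\top} W_i \beta \\
&= \|y - X\beta\|_p + \min_{\lambda \in \mathbb{R}^q_+,\ \kappa \in \mathbb{R}^k_+} \max_{c \in \mathbb{R}^k} \Big\{ \sum_{i=1}^{t} \max_{\forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \alpha^{(i)\top} W_i \beta + \kappa^\top c - \sum_{i=1}^{q} \lambda_i f_i(c) \Big\} \\
&= \|y - X\beta\|_p + \min_{\lambda \in \mathbb{R}^q_+,\ \kappa \in \mathbb{R}^k_+} \upsilon(\lambda, \kappa, \beta).
\end{aligned} \tag{11}$$

Hence we establish the theorem by taking the minimum over $\beta$ on both sides.

Now we show that the optimization problem is convex and tractable. We first prove that $\upsilon(\lambda, \kappa, \beta)$ is a convex function of $(\lambda, \kappa, \beta)$. By definition,

$$\upsilon(\lambda, \kappa, \beta) = \max_{c \in \mathbb{R}^k,\ \forall i,\ \forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \Big\{ \sum_{i=1}^{t} \alpha^{(i)\top} W_i \beta + \kappa^\top c - \sum_{i=1}^{q} \lambda_i f_i(c) \Big\} = \max_{c \in \mathbb{R}^k,\ \forall i,\ \forall g \in G_i,\ \|\alpha^{(i)}_g\|_p \le c_g} \mu(\lambda, \kappa, \beta). \tag{12}$$

For fixed $c$ and $\alpha^{(i)}$, $\mu(\lambda, \kappa, \beta)$ is a linear function of $(\lambda, \kappa, \beta)$. Thus $\upsilon(\lambda, \kappa, \beta)$, being a pointwise maximum of linear functions, is convex, which implies that the optimization problem is convex. By choosing the parameter $\gamma$, the optimization problem can be reformulated as

$$\min \ \|y - X\beta\|_p \quad \text{s.t.} \quad \upsilon(\lambda, \kappa, \beta) \le \gamma, \quad \lambda \in \mathbb{R}^q_+,\ \kappa \in \mathbb{R}^k_+,\ \beta \in \mathbb{R}^m.$$

To show that the problem is tractable, it suffices to construct a polynomial-time separation oracle for the feasible set $S$ (Grötschel et al., 1988). A separation oracle is a routine such that for a solution $(\lambda_0, \kappa_0, \beta_0)$,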
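it can find, in polynomial time, (a) whether $(\lambda_0, \kappa_0, \beta_0)$ belongs to $S$ or not; and (b) if $(\lambda_0, \kappa_0, \beta_0) \notin S$, a hyperplane that separates $(\lambda_0, \kappa_0, \beta_0)$ from $S$.

To verify the feasibility of $(\lambda_0, \kappa_0, \beta_0)$, notice that $(\lambda_0, \kappa_0, \beta_0) \in S$ if and only if the optimal value of the optimization problem (12) is smaller than or equal to $\gamma$, which can be verified in polynomial time. If $(\lambda_0, \kappa_0, \beta_0) \notin S$, then by solving (12) we can find in polynomial time $c_0$ and $\alpha^{(i)}_0$ such that

$$\sum_{i=1}^{t} \alpha^{(i)\top}_0 W_i \beta + \kappa^\top c_0 - \sum_{i=1}^{q} \lambda_i f_i(c_0) > \gamma,$$

which is the hyperplane that separates $(\lambda_0, \kappa_0, \beta_0)$ from $S$.

3.2. Extension of Corollary 8:

Theorem 1. Let $g_1, \dots, g_t$ be $t$ groups such that $\bigcup_{i=1}^{t} g_i = [m]$, and let $\Delta_i$ be a matrix whose columns except the $i$th one are all zero. Suppose that $c_{g_i}$ is a $|g_i|$-dimensional vector whose elements give the norm bound of $\Delta_j$ for $j \in g_i$, e.g. $\|\Delta_j\|_2 \le c^j_{g_i}$, and $c = (c_{g_1}, \dots, c_{g_t})$. We define the uncertainty set as

$$\hat{\mathcal{U}} = \Big\{ \sum_{i=1}^{t} \sum_{j \in g_i} \Delta_j \ \Big|\ \exists c \text{ such that } c \ge 0 \text{ and } \|c_{g_i}\|_q \le s_i,\ \forall i \in [t];\ \|\Delta_j\|_2 \le c^j_{g_i},\ \forall i \in [t],\ \forall j \in g_i \Big\},$$

then the equivalent linear regularized regression problem is

$$\min_{\beta \in \mathbb{R}^m} \Big\{ \|y - X\beta\|_p + \sum_{i=1}^{t} s_i \|\beta_{g_i}\|_{q^*} \Big\},$$

where $q^*$ is the dual norm of $q$.

Proof. From Theorem 3 and Theorem 4, we have

$$\min_{\lambda \in \mathbb{R}^t_+,\ \kappa \in \mathbb{R}^k_+} \upsilon(\lambda, \kappa, \beta) = \min_{\lambda \in \mathbb{R}^t_+,\ \kappa \in \mathbb{R}^k_+} \max_{c \in \mathbb{R}^k} \Big\{ \sum_{i=1}^{t} (\kappa_{g_i} + |\beta_{g_i}|)^\top c_{g_i} + \sum_{i=1}^{t} \lambda_i \big( -\|c_{g_i}\|_q + s_i \big) \Big\},$$

where $k$ is the dimension of $c$ and $|\beta_{g_i}|$ is taken element-wise. Define $r_i$ as the vector whose elements are $\kappa_j + |\beta_j|$ for $j \in g_i$. The inner maximum over $c_{g_i}$ is finite (and equal to zero) only if $\|r_i\|_{q^*} \le \lambda_i$, so the equation above is equivalent to

$$\min_{\lambda \in \mathbb{R}^t_+,\ \kappa \in \mathbb{R}^k_+:\ \|r_i\|_{q^*} \le \lambda_i,\ \forall i \in [t]} \lambda^\top s = \sum_{i=1}^{t} s_i \|\beta_{g_i}\|_{q^*},$$

where the minimum is attained at $\kappa = 0$. This establishes the theorem.

4. Proofs in Section 5

Recall that the uncertainty set considered in this paper is

$$\mathcal{U} = \big\{ \Delta^{(1)} W_1 + \dots + \Delta^{(t)} W_t \ \big|\ \forall i,\ \forall g \in G_i,\ \|\Delta^{(i)}_g\|_2 \le c_g \big\}, \tag{13}$$

where $G_i$ is the set of the groups of $\Delta^{(i)}$ and $c_g$ gives the bound of $\Delta^{(i)}_g$ for group $g$. We denote by $\bar{G}_i$ and $\bar{G}_i^c$ the sets $\{ g \in G_i \mid c_g \ne 0 \}$ and $G_i \setminus \bar{G}_i$, respectively. In this theorem, we restrict our discussion to the case that $W_i = I$ for $i = 1, \dots, t$ and the bound $c_g$ of $\Delta^{(i)}_g$ for each group equals $c_n$ or 0, so the uncertainty set can be rewritten as

$$\mathcal{U} = \big\{ \Delta^{(1)} + \dots + \Delta^{(t)} \ \big|\ \forall i,\ \forall g \in \bar{G}_i,\ \|\Delta^{(i)}_g\|_2 \le c_n \big\}. \tag{14}$$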
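For intuition about the uncertainty set (14), the following sketch builds an explicit worst-case disturbance in a simplified setting and checks that it attains the regularized value given by Theorem 3 and Lemma 1. It is an illustration under stated assumptions rather than the authors' construction: a single disturbance matrix ($t = 1$), $W_1 = I$, $p = 2$, disjoint column groups sharing one bound `c`, and a nonzero residual. The function names and the data are hypothetical.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def residual_norm(X, y, beta, delta=None):
    """||y - (X + delta) beta||_2, with delta = 0 if omitted."""
    if delta is not None:
        X = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, delta)]
    return norm2([yi - p for yi, p in zip(y, matvec(X, beta))])

def robust_bound(X, y, beta, groups, c):
    """||y - X beta||_2 + c * sum_g ||beta_g||_2 (p = 2, disjoint groups)."""
    penalty = sum(norm2([beta[j] for j in g]) for g in groups)
    return residual_norm(X, y, beta) + c * penalty

def group_frobenius(delta, g):
    """Frobenius norm of the columns of delta indexed by group g."""
    return math.sqrt(sum(row[j] ** 2 for row in delta for j in g))

def worst_case_perturbation(X, y, beta, groups, c):
    """Rank-one blocks Delta_g = -c * u * beta_g^T / ||beta_g||_2, where u
    is the unit residual direction (assumes y != X beta)."""
    r = [yi - p for yi, p in zip(y, matvec(X, beta))]
    u = [ri / norm2(r) for ri in r]
    delta = [[0.0] * len(beta) for _ in X]
    for g in groups:
        bg = norm2([beta[j] for j in g])
        if bg == 0.0:
            continue  # this group cannot change the residual
        for i in range(len(X)):
            for j in g:
                delta[i][j] = -c * u[i] * beta[j] / bg
    return delta

# Illustrative data (not from the paper): n = 4 samples, m = 3 features.
X = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 0.0, 2.0, -1.0]
beta = [0.5, -1.0, 2.0]
groups = [[0, 1], [2]]
c = 0.3
delta = worst_case_perturbation(X, y, beta, groups, c)
print(residual_norm(X, y, beta, delta), robust_bound(X, y, beta, groups, c))
```

Each block of `delta` has Frobenius norm `c`, so `delta` is feasible, and the perturbed residual norm should match the bound up to rounding. Any other feasible disturbance, for example the negated one, gives a value no larger than the bound.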
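Note that the constraint $\|\Delta\|_2 \le c_n$ can be reformulated as the union of several element-wise constraints. Denote $\mathcal{D} = \{ D \mid \sum_i \sum_j D_{ij}^2 = c_n^2,\ D_{ij} \ge 0 \}$ (we call an element $D \in \mathcal{D}$ a decomposition), then we have

$$\{ \Delta \mid \|\Delta\|_2 \le c_n \} = \bigcup_{D \in \mathcal{D}} \{ \Delta \mid \forall i, j,\ |\Delta_{ij}| \le D_{ij} \}.$$

Similarly, the uncertainty set $\{ \Delta_g \mid \|\Delta_g\|_2 \le c_n \}$ is equivalent to

$$\bigcup_{D \in \mathcal{D}_g} \{ \Delta_g \mid \forall i,\ \forall j \in g,\ |\Delta_{ij}| \le D_{ij} \},$$

where $\mathcal{D}_g = \{ D \mid \sum_i \sum_{j \in g} D_{ij}^2 = c_n^2,\ D_{ij} \ge 0 \}$.

After the constraints of the uncertainty sets are decomposed into element-wise constraints, the set $\{ X + \Delta^{(1)} + \dots + \Delta^{(t)} \}$ can also be represented in an element-wise way. The notation is a little complicated, so we first consider three simple cases.

One uncertainty set $\Delta$ such that $\|\Delta\|_2 \le c_n$: for fixed $D \in \mathcal{D}$, we have $\{ X_{ij} + \Delta_{ij} \} = [X_{ij} - D_{ij},\ X_{ij} + D_{ij}]$.

Two uncertainty sets $\Delta^{(1)}$ and $\Delta^{(2)}$ such that $\|\Delta^{(1)}\|_2 \le c_n$ and $\|\Delta^{(2)}\|_2 \le c_n$: for fixed $D^{(1)} \in \mathcal{D}$ and $D^{(2)} \in \mathcal{D}$, we have $\{ X_{ij} + \Delta^{(1)}_{ij} + \Delta^{(2)}_{ij} \} = [X_{ij} - D^{(1)}_{ij} - D^{(2)}_{ij},\ X_{ij} + D^{(1)}_{ij} + D^{(2)}_{ij}]$.

One uncertainty set $\Delta$ and two overlapping groups $g_p$ and $g_q$ such that $\|\Delta_{g_p}\|_2 \le c_n$ and $\|\Delta_{g_q}\|_2 \le c_n$: for fixed $P \in \mathcal{D}_{g_p}$ and $Q \in \mathcal{D}_{g_q}$, we have

$$\{ X_{ij} + \Delta_{ij} \} = \begin{cases}
[X_{ij} - P_{ij},\ X_{ij} + P_{ij}] & j \in g_p,\ j \notin g_q, \\
[X_{ij} - Q_{ij},\ X_{ij} + Q_{ij}] & j \notin g_p,\ j \in g_q, \\
[X_{ij} - \min\{P_{ij}, Q_{ij}\},\ X_{ij} + \min\{P_{ij}, Q_{ij}\}] & j \in g_p,\ j \in g_q.
\end{cases}$$

Thus, if the decomposition $D_g \in \mathcal{D}_g$ for each $\Delta^{(i)}_g$ is fixed, we have $\{ X_{ij} + \Delta^{(1)}_{ij} + \dots + \Delta^{(t)}_{ij} \} = [X_{ij} - \gamma_{ij},\ X_{ij} + \gamma_{ij}]$, where $\gamma_{ij}$ is determined by the decompositions $D_g$. Since the number of elements of $\Delta^{(i)}_g$ is less than or equal to $mn$ ($m$ is the feature dimension and $n$ is the number of samples), there exists a decomposition $D_g$ for each $\Delta^{(i)}_g$ such that

$$\Big[ X_{ij} - \frac{c_n}{\sqrt{mn}},\ X_{ij} + \frac{c_n}{\sqrt{mn}} \Big] \subseteq [X_{ij} - \gamma_{ij},\ X_{ij} + \gamma_{ij}].$$

We now prove the theorem.

Proposition 1. (Xu et al., 2010) Given a function $h : \mathbb{R}^{m+1} \to \mathbb{R}$ and Borel sets $Z_1, \dots, Z_n \subseteq \mathbb{R}^{m+1}$, let

$$\mathcal{P}_n = \{ \mu \in \mathcal{P} \mid \forall S \subseteq \{1, \dots, n\} : \mu(\textstyle\bigcup_{i \in S} Z_i) \ge |S|/n \}.$$

The following holds:

$$\frac{1}{n} \sum_{i=1}^{n} \sup_{(b_i, r_i) \in Z_i} h(b_i, r_i) = \sup_{\mu \in \mathcal{P}_n} \int_{\mathbb{R}^{m+1}} h(b, r)\, d\mu(b, r).$$

Step 1: Using the notation above, we first give the following corollary:

Corollary 1.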
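Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times m}$, the following equation holds for any $\beta \in \mathbb{R}^m$,

$$\|y - X\beta\|_2 + c_n + \sum_{i=1}^{t} \max_{\forall g \in \bar{G}_i,\ \|\alpha^{(i)}_g\|_2 \le c_n} \alpha^{(i)\top} \beta = \sup_{\mu \in \hat{\mathcal{P}}(n)} \sqrt{ n \int_{\mathbb{R}^{m+1}} (b' - r'^\top \beta)^2\, d\mu(b', r') }. \tag{15}$$

Here,

$$\hat{\mathcal{P}}(n) = \bigcup_{S = \{D^{(i)}_g\} \,\mid\, D^{(i)}_g \in \mathcal{D}_g,\ \forall i,\ \forall g \in \bar{G}_i} \mathcal{P}_n(X, S, y, c_n);$$

$$\mathcal{P}_n(X, S, y, c_n) = \Big\{ \mu \in \mathcal{P} \ \Big|\ Z_i = \Big[ y_i - \frac{c_n}{\sqrt{n}},\ y_i + \frac{c_n}{\sqrt{n}} \Big] \times \prod_{j=1}^{m} [X_{ij} - \gamma_{ij},\ X_{ij} + \gamma_{ij}];\ \forall S \subseteq \{1, \dots, n\} : \mu(\textstyle\bigcup_{i \in S} Z_i) \ge |S|/n \Big\},$$

where $\gamma_{ij}$ depends on the decomposition set $S$.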
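Proof. The right-hand side of Equation (15) is equal to

$$\sup_{S = \{D^{(i)}_g\} \,\mid\, \forall i,\ \forall g \in \bar{G}_i,\ D^{(i)}_g \in \mathcal{D}_g} \Big\{ \sup_{\mu \in \mathcal{P}_n(X, S, y, c_n)} \sqrt{ n \int_{\mathbb{R}^{m+1}} (b' - r'^\top \beta)^2\, d\mu(b', r') } \Big\}.$$

From Theorem 2, we know that the left-hand side is equal to

$$\begin{aligned}
& \sup_{\forall i,\ \forall g \in \bar{G}_i,\ \|\delta_y\|_2 \le c_n,\ \|\Delta^{(i)}_g\|_2 \le c_n} \|y + \delta_y - (X + \Delta)\beta\|_2 \\
&= \sup_{\forall i,\ \forall g \in \bar{G}_i,\ D^{(i)}_g \in \mathcal{D}_g} \Big\{ \sup_{\|\delta_y\|_2^2 \le c_n^2,\ |\Delta^{(i)}_g| \le D^{(i)}_g} \|y + \delta_y - (X + \Delta)\beta\|_2 \Big\} \\
&= \sup_{\forall i,\ \forall g \in \bar{G}_i,\ D^{(i)}_g \in \mathcal{D}_g} \sqrt{ \sum_{i=1}^{n} \ \sup_{(b_i, r_i) \in [y_i - c_n/\sqrt{n},\ y_i + c_n/\sqrt{n}] \times \prod_{j=1}^{m} [X_{ij} - \gamma_{ij},\ X_{ij} + \gamma_{ij}]} (b_i - r_i^\top \beta)^2 }.
\end{aligned}$$

Furthermore, applying Proposition 1 yields

$$\sqrt{ \sum_{i=1}^{n} \ \sup_{(b_i, r_i) \in [y_i - c_n/\sqrt{n},\ y_i + c_n/\sqrt{n}] \times \prod_{j=1}^{m} [X_{ij} - \gamma_{ij},\ X_{ij} + \gamma_{ij}]} (b_i - r_i^\top \beta)^2 } = \sup_{\mu \in \mathcal{P}_n(X, S, y, c_n)} \sqrt{ n \int_{\mathbb{R}^{m+1}} (b' - r'^\top \beta)^2\, d\mu(b', r') },$$

which proves the corollary.

Step 2: As in (Xu et al., 2010), we consider the following kernel estimator given samples $(b_i, r_i)_{i=1}^{n}$:

$$h_n(b, r) = (n c^{m+1})^{-1} \sum_{i=1}^{n} K\Big( \frac{b - b_i,\ r - r_i}{c} \Big), \quad \text{where } K(x) = I_{[-1,1]^{m+1}}(x) / 2^{m+1}, \text{ and } c = \frac{c_n}{\sqrt{mn}}. \tag{16}$$

Observe that the estimated distribution above belongs to the set of distributions

$$\mathcal{P}_n(X, S, y, c_n) = \Big\{ \mu \in \mathcal{P} \ \Big|\ Z_i = \Big[ y_i - \frac{c_n}{\sqrt{n}},\ y_i + \frac{c_n}{\sqrt{n}} \Big] \times \prod_{j=1}^{m} [X_{ij} - \gamma_{ij},\ X_{ij} + \gamma_{ij}];\ \forall S \subseteq \{1, \dots, n\} : \mu(\textstyle\bigcup_{i \in S} Z_i) \ge |S|/n \Big\}$$

and hence belongs to $\hat{\mathcal{P}}(n) = \bigcup_{S = \{D^{(i)}_g\} \,\mid\, D^{(i)}_g \in \mathcal{D}_g,\ \forall i,\ \forall g \in \bar{G}_i} \mathcal{P}_n(X, S, y, c_n)$.

Step 3: We combine the last two steps with the fact that $\int |h_n(b, r) - h(b, r)|\, d(b, r)$ goes to zero almost surely when $c_n \downarrow 0$ and $n c_n^{m+1} \uparrow \infty$, or equivalently $c \downarrow 0$ and $n c^{m+1} \uparrow \infty$.

Now we prove consistency of robust regression.

Proof. Let $f(\cdot)$ be the true probability density function of the samples, let $\hat{\mu}_n$ be the estimated distribution using Equation (16) given $S_n$ and $c_n$, and denote its density function by $f_n(\cdot)$. The condition that $\|\beta(c_n, S_n)\|_2 \le H$ almost surely and that $\mathbb{P}$ has a bounded support implies that there exists a universal constant $C$ such that

$$\max_{b, r} (b - r^\top \beta(c_n, S_n))^2 \le C$$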
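almost surely. By Corollary 1 and $\hat{\mu}_n \in \hat{\mathcal{P}}(n)$, we have

$$\begin{aligned}
\sqrt{ \int (b - r^\top \beta(c_n, S_n))^2\, d\hat{\mu}_n(b, r) }
&\le \sup_{\mu \in \hat{\mathcal{P}}(n)} \sqrt{ \int (b - r^\top \beta(c_n, S_n))^2\, d\mu(b, r) } \\
&= \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (b_i - r_i^\top \beta(c_n, S_n))^2 } + \frac{1}{\sqrt{n}} \Big( \sum_{i=1}^{t} \max_{\forall g \in \bar{G}_i,\ \|\alpha^{(i)}_g\|_2 \le c_n} \alpha^{(i)\top} \beta(c_n, S_n) + c_n \Big) \\
&\le \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (b_i - r_i^\top \beta(\mathbb{P}))^2 } + \frac{1}{\sqrt{n}} \Big( \sum_{i=1}^{t} \max_{\forall g \in \bar{G}_i,\ \|\alpha^{(i)}_g\|_2 \le c_n} \alpha^{(i)\top} \beta(\mathbb{P}) + c_n \Big).
\end{aligned}$$

Notice that $\frac{1}{\sqrt{n}} \big( \sum_{i=1}^{t} \max_{\forall g \in \bar{G}_i,\ \|\alpha^{(i)}_g\|_2 \le c_n} \alpha^{(i)\top} \beta(\mathbb{P}) + c_n \big)$ converges to 0 as $c_n \downarrow 0$ almost surely, so the right-hand side converges to $\sqrt{ \int (b - r^\top \beta(\mathbb{P}))^2\, d\mathbb{P}(b, r) }$ as $n \uparrow \infty$ and $c_n \downarrow 0$ almost surely. Furthermore, we have

$$\begin{aligned}
\int (b - r^\top \beta(c_n, S_n))^2\, d\mathbb{P}(b, r)
&\le \int (b - r^\top \beta(c_n, S_n))^2\, d\hat{\mu}_n(b, r) + \max_{b, r} (b - r^\top \beta(c_n, S_n))^2 \int |f_n(b, r) - f(b, r)|\, d(b, r) \\
&\le \int (b - r^\top \beta(c_n, S_n))^2\, d\hat{\mu}_n(b, r) + C \int |f_n(b, r) - f(b, r)|\, d(b, r),
\end{aligned}$$

where the last inequality follows from the definition of $C$. Notice that $\int |f_n(b, r) - f(b, r)|\, d(b, r)$ goes to zero almost surely when $c_n \downarrow 0$ and $n c_n^{m+1} \uparrow \infty$. Hence the theorem follows.

As mentioned in the paper, the assumption that $\|\beta(c_n, S_n)\|_2 \le H$ in Theorem 7 can be removed; we then have the following result.

Theorem 2. Let $\{c_n\}$ converge to zero sufficiently slowly. Then

$$\lim_{n \to \infty} \sqrt{ \int (b - r^\top \beta(c_n, S_n))^2\, d\mathbb{P}(b, r) } = \sqrt{ \int (b - r^\top \beta(\mathbb{P}))^2\, d\mathbb{P}(b, r) }$$

almost surely.

We now prove this theorem. We establish the following lemma first.

Lemma 2. Partition the support of $\mathbb{P}$ as $V_1, \dots, V_T$ such that the $\ell_\infty$ radius of each set is less than $c = c_n/\sqrt{mn}$. If a distribution $\mu$ satisfies

$$\mu(V_t) = \#\big( (b_i, r_i) \in V_t \big) / n; \quad t = 1, \dots, T, \tag{17}$$

then $\mu \in \hat{\mathcal{P}}(n)$.

Proof. Let $Z_i = [y_i - c,\ y_i + c] \times \prod_{j=1}^{m} [X_{ij} - c,\ X_{ij} + c]$; recall that $X_{ij}$ is the $j$th element of $r_i$. Since the $\ell_\infty$ radius of $V_t$ is less than $c$, we have

$$(b_i, r_i) \in V_t \ \Rightarrow\ V_t \subseteq Z_i.$$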
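Therefore, for any $S \subseteq \{1, \dots, n\}$, the following holds:

$$\mu\big( \textstyle\bigcup_{i \in S} Z_i \big) \ge \mu\big( \textstyle\bigcup \{ V_t \mid \exists i \in S : (b_i, r_i) \in V_t \} \big) = \sum_{t \,\mid\, \exists i \in S : (b_i, r_i) \in V_t} \mu(V_t) = \sum_{t \,\mid\, \exists i \in S : (b_i, r_i) \in V_t} \#\big( (b_i, r_i) \in V_t \big) / n \ \ge\ |S|/n.$$

Hence $\mu \in \mathcal{P}_n(X, S, y, c_n)$, which implies $\mu \in \hat{\mathcal{P}}(n)$.

Partition the support of $\mathbb{P}$ into $T$ subsets such that the $\ell_\infty$ radius of each set is less than $c$. Denote by $\tilde{\mathcal{P}}(n)$ the set of probability measures satisfying Equation (17). Hence $\tilde{\mathcal{P}}(n) \subseteq \hat{\mathcal{P}}(n)$ by Lemma 2. Further notice that there exists a universal constant $K$ such that $\|\beta(c_n, S_n)\|_2 \le K/c_n$, due to the fact that the square loss of the solution $\beta = 0$ is bounded by a constant that depends only on the support of $\mathbb{P}$. Thus, there exists a constant $C$ such that $\max_{b, r} (b - r^\top \beta(c_n, S_n))^2 \le C/c_n^2$.

Following a similar argument as in the proof of Theorem 6, we have

$$\sup_{\mu \in \tilde{\mathcal{P}}(n)} \sqrt{ \int (b - r^\top \beta(c_n, S_n))^2\, d\mu(b, r) } \le \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (b_i - r_i^\top \beta(\mathbb{P}))^2 } + \frac{1}{\sqrt{n}} \Big( \sum_{i=1}^{t} \max_{\forall g \in \bar{G}_i,\ \|\alpha^{(i)}_g\|_2 \le c_n} \alpha^{(i)\top} \beta(\mathbb{P}) + c_n \Big) \tag{18}$$

and

$$\begin{aligned}
\int (b - r^\top \beta(c_n, S_n))^2\, d\mathbb{P}(b, r)
&\le \inf_{\mu \in \tilde{\mathcal{P}}(n)} \Big\{ \int (b - r^\top \beta(c_n, S_n))^2\, d\mu(b, r) + \max_{b, r} (b - r^\top \beta(c_n, S_n))^2 \int |f_\mu(b, r) - f(b, r)|\, d(b, r) \Big\} \\
&\le \sup_{\mu \in \tilde{\mathcal{P}}(n)} \int (b - r^\top \beta(c_n, S_n))^2\, d\mu(b, r) + \frac{2C}{c_n^2} \inf_{\mu \in \tilde{\mathcal{P}}(n)} \int |f_\mu(b, r) - f(b, r)|\, d(b, r),
\end{aligned}$$

where $f_\mu$ stands for the density function of a measure $\mu$. Notice that $\tilde{\mathcal{P}}(n)$ is the set of distributions satisfying Equation (17), hence $\inf_{\mu \in \tilde{\mathcal{P}}(n)} \int |f_\mu(b, r) - f(b, r)|\, d(b, r)$ is upper-bounded by $\sum_{t=1}^{T} \big| \mathbb{P}(V_t) - \#\big( (b_i, r_i) \in V_t \big) / n \big|$, which goes to zero as $n$ increases for any fixed $c_n$. Therefore,

$$\frac{2C}{c_n^2} \inf_{\mu \in \tilde{\mathcal{P}}(n)} \int |f_\mu(b, r) - f(b, r)|\, d(b, r) \to 0,$$

if $c_n \downarrow 0$ sufficiently slowly. Combining this with Inequality (18) proves the theorem.

References

Grötschel, Martin, Lovász, László, and Schrijver, Alexander. Geometric Algorithms and Combinatorial Optimization, volume 2. Springer, 1988.

Xu, H., Caramanis, C., and Mannor, S. Robust regression and lasso. IEEE Transactions on Information Theory, 56(7):3561–3574, 2010.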