A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers


1 A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Sahand Negahban, UC Berkeley; Pradeep Ravikumar, UT Austin; Martin Wainwright, UC Berkeley; Bin Yu, UC Berkeley. NIPS Conference.

2 Loss functions and regularization.
Model class: parameter space $\Omega \subseteq \mathbb{R}^p$, and set of probability distributions $\{\mathbb{P}_\theta \mid \theta \in \Omega\}$.
Data: samples $X_1^n = (x_i, y_i)$, $i = 1, \ldots, n$, are drawn from an unknown $\mathbb{P}_{\theta^*}$.
Estimation: minimize a loss function plus a regularization term:
$$\underbrace{\widehat{\theta}}_{\text{Estimate}} \in \arg\min_{\theta \in \mathbb{R}^p} \Big\{ \underbrace{\mathcal{L}_n(\theta; X_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{r(\theta)}_{\text{Regularizer}} \Big\}.$$
Analysis: bound the error $d(\widehat{\theta} - \theta^*)$ under high-dimensional scaling $(n, p) \to +\infty$.
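To make the template concrete, here is a minimal Python sketch of a regularized M-estimator with least-squares loss and $\ell_1$ regularizer, solved by proximal gradient descent; the solver, step size, and iteration count are illustrative choices and not part of the slides.

import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: coordinatewise soft thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def regularized_m_estimator(X, y, lam, n_iter=500):
    # Minimize (1/2n)||y - X theta||_2^2 + lam * ||theta||_1
    # by proximal gradient descent.
    n, p = X.shape
    theta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L, with L the loss smoothness
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n   # gradient of the loss at theta
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta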

3 Example: Sparse regression.
[Figure: block diagram of the observation model $y = X\theta^* + w$, with the rows of $\theta^*$ split into the support $S$ and its complement $S^c$.]
Set-up: noisy observations $y = X\theta^* + w$ with sparse $\theta^*$.
Estimator: Lasso program
$$\widehat{\theta} \in \arg\min_\theta \frac{1}{n} \sum_{i=1}^n (y_i - x_i^T\theta)^2 + \lambda_n \sum_{j=1}^p |\theta_j|.$$
Some past work: Tibshirani, 1996; Chen et al., 1998; Donoho/Huo, 2001; Tropp, 2004; Fuchs, 2004; Meinshausen/Bühlmann, 2005; Candes/Tao, 2005; Donoho, 2005; Haupt & Nowak, 2006; Zhao/Yu, 2006; Wainwright, 2006; Zou, 2006; Koltchinskii, 2007; Meinshausen/Yu, 2007; Tsybakov et al., 2008.
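A small, hedged example of this program using scikit-learn (design, noise level, and penalty are illustrative choices; scikit-learn's Lasso minimizes $(1/2n)\|y - X\theta\|_2^2 + \alpha\|\theta\|_1$, so $\alpha$ plays the role of $\lambda_n$ up to a constant factor):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k, sigma = 200, 500, 10, 0.5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:k] = 1.0                        # k-sparse truth
y = X @ theta_star + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(np.log(p) / n)    # order suggested by the theory
theta_hat = Lasso(alpha=lam).fit(X, y).coef_
print("l2 error:", np.linalg.norm(theta_hat - theta_star))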

4 Example: Structured inverse covariance matrices.
[Figure: zero pattern of the inverse covariance.]
Set-up: samples from a random vector with sparse inverse covariance $\Theta^*$.
Estimator:
$$\widehat{\Theta} \in \arg\min_\Theta \Big\{ \Big\langle\!\Big\langle \frac{1}{n}\sum_{i=1}^n x_i x_i^T, \, \Theta \Big\rangle\!\Big\rangle - \log\det(\Theta) + \lambda_n \sum_{j \neq k} |\Theta_{jk}| \Big\}.$$
Some past work: Yuan & Lin, 2006; d'Aspremont et al., 2007; Bickel & Levina, 2007; El Karoui, 2007; Rothman et al., 2007; Zhou et al., 2007; Friedman et al., 2008; Ravikumar et al., 2008.
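A hedged example of an $\ell_1$-penalized log-determinant program of this form via scikit-learn's GraphicalLasso; the toy precision matrix, sample size, and penalty level below are illustrative choices.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p, n = 5, 2000
Theta_star = np.eye(p)
Theta_star[0, 1] = Theta_star[1, 0] = 0.4   # a single off-diagonal edge
Sigma = np.linalg.inv(Theta_star)           # corresponding covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

est = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(est.precision_, 2))          # off-edge entries shrunk toward 0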

5 Example: Low-rank matrix approximation.
[Figure: singular value decomposition $\Theta^* = U D V^T$, with $\Theta^* \in \mathbb{R}^{k \times m}$, $U \in \mathbb{R}^{k \times r}$, $D \in \mathbb{R}^{r \times r}$, $V \in \mathbb{R}^{m \times r}$.]
Set-up: matrix $\Theta^* \in \mathbb{R}^{k \times m}$ with rank $r \ll \min\{k, m\}$.
Estimator:
$$\widehat{\Theta} \in \arg\min_\Theta \frac{1}{n} \sum_{i=1}^n \big(y_i - \langle\!\langle X_i, \Theta \rangle\!\rangle\big)^2 + \lambda_n \sum_{j=1}^{\min\{k,m\}} \sigma_j(\Theta).$$
Some past work: Frieze et al., 1998; Achlioptas & McSherry, 2001; Srebro et al., 2004; Drineas et al., 2005; Rudelson & Vershynin, 2006; Recht et al., 2007; Bach, 2008; Meka et al., 2008; Candes & Tao, 2009; Keshavan et al., 2009.
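The basic computational primitive for this nuclear-norm penalty $\lambda_n \sum_j \sigma_j(\Theta)$ is singular-value soft thresholding, its proximal operator; a minimal sketch (the helper name is ours, not from the slides):

import numpy as np

def nuclear_prox(Theta, t):
    # argmin_Z 0.5*||Z - Theta||_F^2 + t * sum_j sigma_j(Z):
    # soft-threshold the singular values of Theta by t.
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt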

6 Important properties of regularizer/loss.
1. Decomposability of the regularizer: for vectors $u \in A$ and $v \in B^\perp$,
$$r(u + v) = r(u) + r(v),$$
which constrains the error $\Delta = \widehat{\theta} - \theta^*$ to a smaller set $C$.
2. Restricted strong convexity: loss functions are not strictly convex in high dimensions, so require curvature only for directions $\Delta \in C$. The loss function $\mathcal{L}(\theta) := \mathcal{L}_n(\theta; X_1^n)$ satisfies
$$\underbrace{\mathcal{L}(\theta^* + \Delta) - \mathcal{L}(\theta^*)}_{\text{Excess loss}} - \underbrace{\langle \nabla\mathcal{L}(\theta^*), \Delta \rangle}_{\text{Score function}} \;\geq\; \gamma(\mathcal{L})\, \underbrace{d^2(\Delta)}_{\text{Squared error}}$$
for all $\Delta \in C$.
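For the $\ell_1$ norm with $A$ the subspace of vectors supported on a set $S$ (and $B = A$), decomposability is elementary: if $u$ lives on $S$ and $v$ on $S^c$, their norms add. A quick hedged numerical check, with an illustrative choice of $S$:

import numpy as np

rng = np.random.default_rng(1)
p, S = 10, np.array([0, 1, 2])
u = np.zeros(p); u[S] = rng.standard_normal(S.size)   # u in A (support S)
v = rng.standard_normal(p); v[S] = 0.0                # v supported on S^c
lhs = np.linalg.norm(u + v, 1)
rhs = np.linalg.norm(u, 1) + np.linalg.norm(v, 1)
print(np.isclose(lhs, rhs))                           # True: r(u+v) = r(u)+r(v)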

7 Main theorem.
Quantities that control rates:
restricted strong convexity parameter: $\gamma(\mathcal{L})$;
dual norm of the regularizer: $r^*(v) := \sup_{r(u) = 1} \langle v, u \rangle$;
optimal subspace constant: $\Psi(A) = \min\{ c \in \mathbb{R} \mid r(\theta) \leq c\, d(\theta) \text{ for all } \theta \in A \}$.

Theorem. With regularization constant $\lambda_n \geq 2\, r^*\big(\nabla\mathcal{L}(\theta^*; X_1^n)\big)$, any solution $\widehat{\theta}$ satisfies
$$d(\widehat{\theta} - \theta^*) \leq \frac{1}{\gamma(\mathcal{L})} \big[ \Psi(B)\, \lambda_n \big].$$
Assumptions: $\theta^*$ belongs to a subspace $A$; the regularizer $r$ is decomposable over the subspace pair $(A, B)$; the loss obeys restricted strong convexity with parameter $\gamma(\mathcal{L}) > 0$.
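To illustrate the theorem's ingredients in the Lasso case: the dual of the $\ell_1$ norm is the $\ell_\infty$ norm, so the condition $\lambda_n \geq 2 r^*(\nabla\mathcal{L}(\theta^*))$ reads $\lambda_n \geq 2\|X^T\varepsilon/n\|_\infty$. A hedged simulation of its typical size, with illustrative distributional choices:

import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 500, 0.5
X = rng.standard_normal((n, p))
eps = sigma * rng.standard_normal(n)

score = X.T @ eps / n                       # gradient of the loss at theta*
lam = 2 * np.linalg.norm(score, np.inf)     # theorem's lower bound on lambda_n
print(lam, 2 * sigma * np.sqrt(2 * np.log(p) / n))  # lam concentrates near this scale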

8 Application: Linear regression (hard sparsity).
RSC reduces to a lower bound on restricted eigenvalues of $X^T X$; for a $k$-sparse vector, we have $\|\theta\|_1 \leq \sqrt{k}\, \|\theta\|_2$.

Corollary. Suppose that the true parameter $\theta^*$ is exactly $k$-sparse. Under RSC and with $\lambda_n \geq \frac{2}{n}\|X^T\varepsilon\|_\infty$, any Lasso solution satisfies
$$\|\widehat{\theta} - \theta^*\|_2 \leq \frac{1}{\gamma(\mathcal{L})} \sqrt{k}\, \lambda_n.$$
Some stochastic instances: recover known results.
Compressed sensing: $X_{ij} \sim N(0,1)$ and bounded noise $\|\varepsilon\|_2 \leq \sigma$.
Deterministic design: $X$ with bounded columns and $\varepsilon_i \sim N(0, \sigma^2)$:
$$\frac{\|X^T\varepsilon\|_\infty}{n} \leq 2\sigma\sqrt{\frac{2\log p}{n}} \ \text{ w.h.p.} \;\Longrightarrow\; \|\widehat{\theta} - \theta^*\|_2 \leq \frac{8\sigma}{\gamma(\mathcal{L})} \sqrt{\frac{k \log p}{n}}.$$
(e.g., Candes & Tao, 2007; Meinshausen/Yu, 2007; Bickel et al., 2008)
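The $\sqrt{k}$ factor here is the subspace constant, coming from the elementary fact $\|\theta\|_1 \leq \sqrt{k}\,\|\theta\|_2$ for $k$-sparse $\theta$ (Cauchy-Schwarz on the support); a quick hedged numerical check with illustrative dimensions:

import numpy as np

rng = np.random.default_rng(0)
p, k = 1000, 25
theta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
theta[support] = rng.standard_normal(k)     # a k-sparse vector
print(np.linalg.norm(theta, 1) <= np.sqrt(k) * np.linalg.norm(theta, 2))  # True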

9 Application: Linear regression (weak sparsity).
For some $q \in [0, 1]$, say $\theta^*$ belongs to the $\ell_q$-ball
$$B_q(R_q) := \Big\{ \theta \in \mathbb{R}^p \;\Big|\; \sum_{j=1}^p |\theta_j|^q \leq R_q \Big\}.$$
Corollary. Under RSC, any Lasso solution satisfies (w.h.p.)
$$\|\widehat{\theta} - \theta^*\|_2^2 \leq O\Big[ \sigma^2 R_q \Big( \frac{\log p}{n} \Big)^{1 - q/2} \Big].$$
New result; the rate is known to be minimax-optimal (Raskutti et al., 2009).

10 Multivariate regression with block regularizers.
[Figure: observation model $Y = X\Theta^* + W$, with $Y \in \mathbb{R}^{n \times m}$, $X \in \mathbb{R}^{n \times p}$, and the rows of $\Theta^*$ split into $S$ and $S^c$.]
$\ell_1/\ell_q$-regularized group Lasso:
$$\widehat{\Theta} \in \arg\min_{\Theta \in \mathbb{R}^{p \times m}} \Big\{ \frac{1}{2n} \|Y - X\Theta\|_F^2 + \lambda_n \|\Theta\|_{1,q} \Big\},$$
with $\lambda_n \geq \frac{2}{n}\|X^T W\|_{\infty, \tilde{q}}$, where $1/q + 1/\tilde{q} = 1$.

Corollary. Say $\Theta^*$ is supported on $|S| = s$ rows, $X$ satisfies RSC, and $W_{ij} \sim N(0, \sigma^2)$. Then we have
$$\|\widehat{\Theta} - \Theta^*\|_F \leq \frac{2}{\gamma(\mathcal{L})} \Psi_q(S)\, \lambda_n, \quad \text{where } \Psi_q(S) = \begin{cases} m^{1/q - 1/2}\sqrt{s} & \text{if } q \in [1, 2), \\ \sqrt{s} & \text{if } q \geq 2. \end{cases}$$
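For the common case $q = 2$, the block penalty $\|\Theta\|_{1,2}$ has a simple proximal operator (group soft thresholding of the rows); a minimal hedged sketch, assuming $q = 2$ (other $q$ require different shrinkage):

import numpy as np

def group_soft_threshold(Theta, t):
    # Prox of t * ||Theta||_{1,2}: shrink each row of Theta
    # toward zero by t in l2 norm, zeroing rows with norm <= t.
    row_norms = np.linalg.norm(Theta, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(row_norms, 1e-12), 0.0)
    return Theta * scale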

11 Multivariate regression with block regularizers (continued).
Effect of varying $q \in [1, \infty]$:
For $q = 1$, the problem reduces to an ordinary Lasso with $pm$ parameters and sparsity $sm$:
$$\|\widehat{\Theta} - \Theta^*\|_F \leq O\Big( \sqrt{\frac{sm\log(pm)}{n}} \Big).$$
For $q = 2$, the rate decouples into two terms:
$$\|\widehat{\Theta} - \Theta^*\|_F \leq O\Big( \underbrace{\sqrt{\frac{s\log p}{n}}}_{\text{Search term (find } s \text{ rows)}} + \underbrace{\sqrt{\frac{sm}{n}}}_{\text{Estimate } sm \text{ parameters}} \Big).$$
Similar rates for $q = 2$: Lounici et al. (2009) and Huang and Zhang (2009).

12 Application: Low-rank matrices and nuclear norm.
Low-rank matrix $\Theta^* \in \mathbb{R}^{k \times m}$ with rank $r \ll \min\{k, m\}$; noisy/partial observations of the form
$$y_i = \langle\!\langle X_i, \Theta^* \rangle\!\rangle + \varepsilon_i, \quad i = 1, \ldots, n, \quad \varepsilon_i \sim N(0, \sigma^2).$$
Corollary. With regularization parameter $\lambda_n \geq 16\sigma\Big(\sqrt{\frac{k}{n}} + \sqrt{\frac{m}{n}}\Big)$, we have w.h.p.
$$\|\widehat{\Theta} - \Theta^*\|_F \leq \frac{32\sigma}{\gamma(\mathcal{L})} \Big[ \sqrt{\frac{rk}{n}} + \sqrt{\frac{rm}{n}} \Big].$$
For a rank-$r$ matrix $M$, we have $\|M\|_1 \leq \sqrt{r}\, \|M\|_F$ (nuclear vs. Frobenius norm); solve the nuclear-norm-regularized program with $\lambda_n \geq 2 \big\| \frac{1}{n}\sum_{i=1}^n X_i \varepsilon_i \big\|_2$.
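The fact $\|M\|_1 \leq \sqrt{r}\,\|M\|_F$ is the matrix analogue of the sparse-vector bound on slide 8 (Cauchy-Schwarz on the $r$ nonzero singular values); a quick hedged numerical check with illustrative dimensions:

import numpy as np

rng = np.random.default_rng(0)
k, m, r = 40, 30, 5
M = rng.standard_normal((k, r)) @ rng.standard_normal((r, m))  # rank-r matrix
s = np.linalg.svd(M, compute_uv=False)                         # singular values
print(s.sum() <= np.sqrt(r) * np.linalg.norm(M, "fro"))        # True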

13 Summary.
Unified approach to convergence rates for high-dimensional estimators:
decomposability of the regularizer $r$;
restricted strong convexity of the loss function.
Actual rates determined by:
noise measured in the dual function $r^*$;
subspace constant $\Psi$ in moving from $r$ to the error norm $d$;
restricted strong convexity constant.
Recovered some known results as corollaries: Lasso with exact sparsity; multivariate group Lasso; inverse covariance matrix estimation.
Derived new results on: low-rank matrix estimation; approximately sparse models.
Other models?
