A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers


1 A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Sahand Negahban, UC Berkeley; Pradeep Ravikumar, UT Austin; Martin Wainwright, UC Berkeley; Bin Yu, UC Berkeley. NIPS Conference.

2 Loss functions and regularization.
Model class: parameter space $\Omega \subseteq \mathbb{R}^p$, and set of probability distributions $\{\mathbb{P}_\theta \mid \theta \in \Omega\}$.
Data: samples $X_1^n = (x_i, y_i)$, $i = 1, \ldots, n$, are drawn from an unknown $\mathbb{P}_{\theta^*}$.
Estimation: minimize a loss function plus a regularization term:
$$\underbrace{\widehat{\theta}}_{\text{Estimate}} \in \arg\min_{\theta \in \mathbb{R}^p} \Big\{ \underbrace{\mathcal{L}_n(\theta; X_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{r(\theta)}_{\text{Regularizer}} \Big\}.$$
Analysis: bound the error $d(\widehat{\theta} - \theta^*)$ under high-dimensional scaling $(n, p) \to +\infty$.
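To make the template concrete, here is a minimal Python sketch of a regularized M-estimator with least-squares loss and $\ell_1$ regularizer, solved by proximal gradient descent; the solver, step size, and iteration count are illustrative choices and not part of the slides.

import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: coordinatewise soft thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def regularized_m_estimator(X, y, lam, n_iter=500):
    # Minimize (1/2n)||y - X theta||_2^2 + lam * ||theta||_1
    # by proximal gradient descent.
    n, p = X.shape
    theta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L, with L the loss smoothness
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n   # gradient of the loss at theta
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta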

3 Example: Sparse regression.
[Figure: block diagram of the observation model $y = X\theta^* + w$, with the rows of $\theta^*$ split into the support $S$ and its complement $S^c$.]
Set-up: noisy observations $y = X\theta^* + w$ with sparse $\theta^*$.
Estimator: Lasso program
$$\widehat{\theta} \in \arg\min_\theta \frac{1}{n} \sum_{i=1}^n (y_i - x_i^T\theta)^2 + \lambda_n \sum_{j=1}^p |\theta_j|.$$
Some past work: Tibshirani, 1996; Chen et al., 1998; Donoho/Huo, 2001; Tropp, 2004; Fuchs, 2004; Meinshausen/Bühlmann, 2005; Candes/Tao, 2005; Donoho, 2005; Haupt & Nowak, 2006; Zhao/Yu, 2006; Wainwright, 2006; Zou, 2006; Koltchinskii, 2007; Meinshausen/Yu, 2007; Tsybakov et al., 2008.
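A small, hedged example of this program using scikit-learn (design, noise level, and penalty are illustrative choices; scikit-learn's Lasso minimizes $(1/2n)\|y - X\theta\|_2^2 + \alpha\|\theta\|_1$, so $\alpha$ plays the role of $\lambda_n$ up to a constant factor):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k, sigma = 200, 500, 10, 0.5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:k] = 1.0                        # k-sparse truth
y = X @ theta_star + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(np.log(p) / n)    # order suggested by the theory
theta_hat = Lasso(alpha=lam).fit(X, y).coef_
print("l2 error:", np.linalg.norm(theta_hat - theta_star))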

4 Example: Structured inverse covariance matrices.
[Figure: zero pattern of the inverse covariance.]
Set-up: samples from a random vector with sparse inverse covariance $\Theta^*$.
Estimator:
$$\widehat{\Theta} \in \arg\min_\Theta \Big\{ \Big\langle\!\Big\langle \frac{1}{n}\sum_{i=1}^n x_i x_i^T, \, \Theta \Big\rangle\!\Big\rangle - \log\det(\Theta) + \lambda_n \sum_{j \neq k} |\Theta_{jk}| \Big\}.$$
Some past work: Yuan & Lin, 2006; d'Aspremont et al., 2007; Bickel & Levina, 2007; El Karoui, 2007; Rothman et al., 2007; Zhou et al., 2007; Friedman et al., 2008; Ravikumar et al., 2008.
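A hedged example of an $\ell_1$-penalized log-determinant program of this form via scikit-learn's GraphicalLasso; the toy precision matrix, sample size, and penalty level below are illustrative choices.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p, n = 5, 2000
Theta_star = np.eye(p)
Theta_star[0, 1] = Theta_star[1, 0] = 0.4   # a single off-diagonal edge
Sigma = np.linalg.inv(Theta_star)           # corresponding covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

est = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(est.precision_, 2))          # off-edge entries shrunk toward 0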

5 Example: Low-rank matrix approximation.
[Figure: singular value decomposition $\Theta^* = U D V^T$, with $\Theta^* \in \mathbb{R}^{k \times m}$, $U \in \mathbb{R}^{k \times r}$, $D \in \mathbb{R}^{r \times r}$, $V \in \mathbb{R}^{m \times r}$.]
Set-up: matrix $\Theta^* \in \mathbb{R}^{k \times m}$ with rank $r \ll \min\{k, m\}$.
Estimator:
$$\widehat{\Theta} \in \arg\min_\Theta \frac{1}{n} \sum_{i=1}^n \big(y_i - \langle\!\langle X_i, \Theta \rangle\!\rangle\big)^2 + \lambda_n \sum_{j=1}^{\min\{k,m\}} \sigma_j(\Theta).$$
Some past work: Frieze et al., 1998; Achlioptas & McSherry, 2001; Srebro et al., 2004; Drineas et al., 2005; Rudelson & Vershynin, 2006; Recht et al., 2007; Bach, 2008; Meka et al., 2008; Candes & Tao, 2009; Keshavan et al., 2009.
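The basic computational primitive for this nuclear-norm penalty $\lambda_n \sum_j \sigma_j(\Theta)$ is singular-value soft thresholding, its proximal operator; a minimal sketch (the helper name is ours, not from the slides):

import numpy as np

def nuclear_prox(Theta, t):
    # argmin_Z 0.5*||Z - Theta||_F^2 + t * sum_j sigma_j(Z):
    # soft-threshold the singular values of Theta by t.
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt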

6 Important properties of regularizer/loss.
1. Decomposability of the regularizer: for vectors $u \in A$ and $v \in B^\perp$,
$$r(u + v) = r(u) + r(v),$$
which constrains the error $\Delta = \widehat{\theta} - \theta^*$ to a smaller set $C$.
2. Restricted strong convexity: loss functions are not strictly convex in high dimensions, so require curvature only for directions $\Delta \in C$. The loss function $\mathcal{L}(\theta) := \mathcal{L}_n(\theta; X_1^n)$ satisfies
$$\underbrace{\mathcal{L}(\theta^* + \Delta) - \mathcal{L}(\theta^*)}_{\text{Excess loss}} - \underbrace{\langle \nabla\mathcal{L}(\theta^*), \Delta \rangle}_{\text{Score function}} \;\geq\; \gamma(\mathcal{L})\, \underbrace{d^2(\Delta)}_{\text{Squared error}}$$
for all $\Delta \in C$.
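For the $\ell_1$ norm with $A$ the subspace of vectors supported on a set $S$ (and $B = A$), decomposability is elementary: if $u$ lives on $S$ and $v$ on $S^c$, their norms add. A quick hedged numerical check, with an illustrative choice of $S$:

import numpy as np

rng = np.random.default_rng(1)
p, S = 10, np.array([0, 1, 2])
u = np.zeros(p); u[S] = rng.standard_normal(S.size)   # u in A (support S)
v = rng.standard_normal(p); v[S] = 0.0                # v supported on S^c
lhs = np.linalg.norm(u + v, 1)
rhs = np.linalg.norm(u, 1) + np.linalg.norm(v, 1)
print(np.isclose(lhs, rhs))                           # True: r(u+v) = r(u)+r(v)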

7 Main theorem.
Quantities that control rates:
restricted strong convexity parameter: $\gamma(\mathcal{L})$;
dual norm of the regularizer: $r^*(v) := \sup_{r(u) = 1} \langle v, u \rangle$;
optimal subspace constant: $\Psi(A) = \min\{ c \in \mathbb{R} \mid r(\theta) \leq c\, d(\theta) \text{ for all } \theta \in A \}$.

Theorem. With regularization constant $\lambda_n \geq 2\, r^*\big(\nabla\mathcal{L}(\theta^*; X_1^n)\big)$, any solution $\widehat{\theta}$ satisfies
$$d(\widehat{\theta} - \theta^*) \leq \frac{1}{\gamma(\mathcal{L})} \big[ \Psi(B)\, \lambda_n \big].$$
Assumptions: $\theta^*$ belongs to a subspace $A$; the regularizer $r$ is decomposable over the subspace pair $(A, B)$; the loss obeys restricted strong convexity with parameter $\gamma(\mathcal{L}) > 0$.
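To illustrate the theorem's ingredients in the Lasso case: the dual of the $\ell_1$ norm is the $\ell_\infty$ norm, so the condition $\lambda_n \geq 2 r^*(\nabla\mathcal{L}(\theta^*))$ reads $\lambda_n \geq 2\|X^T\varepsilon/n\|_\infty$. A hedged simulation of its typical size, with illustrative distributional choices:

import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 500, 0.5
X = rng.standard_normal((n, p))
eps = sigma * rng.standard_normal(n)

score = X.T @ eps / n                       # gradient of the loss at theta*
lam = 2 * np.linalg.norm(score, np.inf)     # theorem's lower bound on lambda_n
print(lam, 2 * sigma * np.sqrt(2 * np.log(p) / n))  # lam concentrates near this scale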

8 Application: Linear regression (hard sparsity).
RSC reduces to a lower bound on restricted eigenvalues of $X^T X$; for a $k$-sparse vector, we have $\|\theta\|_1 \leq \sqrt{k}\, \|\theta\|_2$.

Corollary. Suppose that the true parameter $\theta^*$ is exactly $k$-sparse. Under RSC and with $\lambda_n \geq \frac{2}{n}\|X^T\varepsilon\|_\infty$, any Lasso solution satisfies
$$\|\widehat{\theta} - \theta^*\|_2 \leq \frac{1}{\gamma(\mathcal{L})} \sqrt{k}\, \lambda_n.$$
Some stochastic instances: recover known results.
Compressed sensing: $X_{ij} \sim N(0,1)$ and bounded noise $\|\varepsilon\|_2 \leq \sigma$.
Deterministic design: $X$ with bounded columns and $\varepsilon_i \sim N(0, \sigma^2)$:
$$\frac{\|X^T\varepsilon\|_\infty}{n} \leq 2\sigma\sqrt{\frac{2\log p}{n}} \ \text{ w.h.p.} \;\Longrightarrow\; \|\widehat{\theta} - \theta^*\|_2 \leq \frac{8\sigma}{\gamma(\mathcal{L})} \sqrt{\frac{k \log p}{n}}.$$
(e.g., Candes & Tao, 2007; Meinshausen/Yu, 2007; Bickel et al., 2008)
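The $\sqrt{k}$ factor here is the subspace constant, coming from the elementary fact $\|\theta\|_1 \leq \sqrt{k}\,\|\theta\|_2$ for $k$-sparse $\theta$ (Cauchy-Schwarz on the support); a quick hedged numerical check with illustrative dimensions:

import numpy as np

rng = np.random.default_rng(0)
p, k = 1000, 25
theta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
theta[support] = rng.standard_normal(k)     # a k-sparse vector
print(np.linalg.norm(theta, 1) <= np.sqrt(k) * np.linalg.norm(theta, 2))  # True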

9 Application: Linear regression (weak sparsity).
For some $q \in [0, 1]$, say $\theta^*$ belongs to the $\ell_q$-ball
$$B_q(R_q) := \Big\{ \theta \in \mathbb{R}^p \;\Big|\; \sum_{j=1}^p |\theta_j|^q \leq R_q \Big\}.$$
Corollary. Under RSC, any Lasso solution satisfies (w.h.p.)
$$\|\widehat{\theta} - \theta^*\|_2^2 \leq O\Big[ \sigma^2 R_q \Big( \frac{\log p}{n} \Big)^{1 - q/2} \Big].$$
New result; the rate is known to be minimax-optimal (Raskutti et al., 2009).

10 Multivariate regression with block regularizers.
[Figure: observation model $Y = X\Theta^* + W$, with $Y \in \mathbb{R}^{n \times m}$, $X \in \mathbb{R}^{n \times p}$, and the rows of $\Theta^*$ split into $S$ and $S^c$.]
$\ell_1/\ell_q$-regularized group Lasso:
$$\widehat{\Theta} \in \arg\min_{\Theta \in \mathbb{R}^{p \times m}} \Big\{ \frac{1}{2n} \|Y - X\Theta\|_F^2 + \lambda_n \|\Theta\|_{1,q} \Big\},$$
with $\lambda_n \geq \frac{2}{n}\|X^T W\|_{\infty, \tilde{q}}$, where $1/q + 1/\tilde{q} = 1$.

Corollary. Say $\Theta^*$ is supported on $|S| = s$ rows, $X$ satisfies RSC, and $W_{ij} \sim N(0, \sigma^2)$. Then we have
$$\|\widehat{\Theta} - \Theta^*\|_F \leq \frac{2}{\gamma(\mathcal{L})} \Psi_q(S)\, \lambda_n, \quad \text{where } \Psi_q(S) = \begin{cases} m^{1/q - 1/2}\sqrt{s} & \text{if } q \in [1, 2), \\ \sqrt{s} & \text{if } q \geq 2. \end{cases}$$
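For the common case $q = 2$, the block penalty $\|\Theta\|_{1,2}$ has a simple proximal operator (group soft thresholding of the rows); a minimal hedged sketch, assuming $q = 2$ (other $q$ require different shrinkage):

import numpy as np

def group_soft_threshold(Theta, t):
    # Prox of t * ||Theta||_{1,2}: shrink each row of Theta
    # toward zero by t in l2 norm, zeroing rows with norm <= t.
    row_norms = np.linalg.norm(Theta, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(row_norms, 1e-12), 0.0)
    return Theta * scale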

11 Multivariate regression with block regularizers (continued).
Effect of varying $q \in [1, \infty]$:
For $q = 1$, the problem reduces to an ordinary Lasso with $pm$ parameters and sparsity $sm$:
$$\|\widehat{\Theta} - \Theta^*\|_F \leq O\Big( \sqrt{\frac{sm\log(pm)}{n}} \Big).$$
For $q = 2$, the rate decouples into two terms:
$$\|\widehat{\Theta} - \Theta^*\|_F \leq O\Big( \underbrace{\sqrt{\frac{s\log p}{n}}}_{\text{Search term (find } s \text{ rows)}} + \underbrace{\sqrt{\frac{sm}{n}}}_{\text{Estimate } sm \text{ parameters}} \Big).$$
Similar rates for $q = 2$: Lounici et al. (2009) and Huang and Zhang (2009).

12 Application: Low-rank matrices and nuclear norm.
Low-rank matrix $\Theta^* \in \mathbb{R}^{k \times m}$ with rank $r \ll \min\{k, m\}$; noisy/partial observations of the form
$$y_i = \langle\!\langle X_i, \Theta^* \rangle\!\rangle + \varepsilon_i, \quad i = 1, \ldots, n, \quad \varepsilon_i \sim N(0, \sigma^2).$$
Corollary. With regularization parameter $\lambda_n \geq 16\sigma\Big(\sqrt{\frac{k}{n}} + \sqrt{\frac{m}{n}}\Big)$, we have w.h.p.
$$\|\widehat{\Theta} - \Theta^*\|_F \leq \frac{32\sigma}{\gamma(\mathcal{L})} \Big[ \sqrt{\frac{rk}{n}} + \sqrt{\frac{rm}{n}} \Big].$$
For a rank-$r$ matrix $M$, we have $\|M\|_1 \leq \sqrt{r}\, \|M\|_F$ (nuclear vs. Frobenius norm); solve the nuclear-norm-regularized program with $\lambda_n \geq 2 \big\| \frac{1}{n}\sum_{i=1}^n X_i \varepsilon_i \big\|_2$.
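The fact $\|M\|_1 \leq \sqrt{r}\,\|M\|_F$ is the matrix analogue of the sparse-vector bound on slide 8 (Cauchy-Schwarz on the $r$ nonzero singular values); a quick hedged numerical check with illustrative dimensions:

import numpy as np

rng = np.random.default_rng(0)
k, m, r = 40, 30, 5
M = rng.standard_normal((k, r)) @ rng.standard_normal((r, m))  # rank-r matrix
s = np.linalg.svd(M, compute_uv=False)                         # singular values
print(s.sum() <= np.sqrt(r) * np.linalg.norm(M, "fro"))        # True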

13 Summary.
Unified approach to convergence rates for high-dimensional estimators:
decomposability of the regularizer $r$;
restricted strong convexity of the loss function.
Actual rates determined by:
noise measured in the dual function $r^*$;
subspace constant $\Psi$ in moving from $r$ to the error norm $d$;
restricted strong convexity constant.
Recovered some known results as corollaries: Lasso with exact sparsity; multivariate group Lasso; inverse covariance matrix estimation.
Derived new results on: low-rank matrix estimation; approximately sparse models.
Other models?
