The deterministic Lasso

Sara van de Geer
Seminar für Statistik, ETH Zürich

Abstract. We study high-dimensional generalized linear models and empirical risk minimization using the Lasso. An oracle inequality is presented, under a so-called compatibility condition. Our aim is threefold: to prove a result announced in van de Geer (2007), to provide a simple proof with simple constants, and to separate the stochastic problem from the deterministic one.

KEY WORDS: coherence, Lasso, oracle, sparsity

1 Introduction

We consider the Lasso for high-dimensional generalized linear models, introducing three new elements. Firstly, we complete the proof of an oracle inequality which was announced in van de Geer (2007). The result is shown to hold under a so-called compatibility condition for certain norms. This condition is a weaker version of Assumption C* in van de Geer (2007), and is related to coherence conditions in e.g. Candes and Tao (2007) and Bunea et al. (2006, 2007). Secondly, we state the result with simple constants, and give a new and rather straightforward proof. Thirdly, in Subsection 1.3, we separate the stochastic part of the problem from the deterministic part. In the subsequent sections, we then only concentrate on the deterministic part (whence the title of this paper). The advantage of this approach is that the behavior of the Lasso under various sampling schemes (e.g. dependent observations) can immediately be deduced once the stochastic problem (a probability inequality for the empirical process) is clear.

Consider data {Z_i}_{i=1}^n, with Z_i in some space Z for i = 1, ..., n. Let F̄ := (F̄, ‖·‖) be a normed real vector space, and, for each f ∈ F̄, let ρ_f : Z → R be a loss function. We assume that the map f ↦ ρ_f(z) is convex for all z ∈ Z. For example, in a regression setup, one has for i = 1, ..., n, Z_i = (X_i, Y_i), with covariables X_i ∈ X and response variable Y_i ∈ Y ⊂ R, and f : X → R is a regression function. Examples are quadratic loss,

ρ_f(·, y) = (y − f(·))^2,

or logistic loss,

ρ_f(·, y) = −y f(·) + log(1 + exp[f(·)]),

etc.

1.1 The target

We denote, for a function ρ : Z → R, the theoretical measure by

P ρ := Σ_{i=1}^n E ρ(Z_i)/n,

and the empirical measure by

P_n ρ := Σ_{i=1}^n ρ(Z_i)/n.

Define the target

f^0 := arg min_{f ∈ F̄} P ρ_f.

We assume for simplicity that the minimum is attained, and that it is unique. For f ∈ F̄, the excess risk is

E(f) := P(ρ_f − ρ_{f^0}).

1.2 The Lasso

Consider a given linear subspace F := {f_β : β ∈ R^p} ⊂ F̄, where β ↦ f_β is linear. The collection F will be the model class over which we perform empirical risk minimization. When F is high-dimensional, a complexity penalty can prevent overfitting. The Lasso is

β̂ = arg min_β { P_n ρ_{f_β} + λ‖β‖_1 },  (1)

where

‖β‖_1 := Σ_{j=1}^p |β_j|,

i.e., ‖·‖_1 is the l_1-norm. The parameter λ is a smoothing (or tuning) parameter. We examine the excess risk E(f_β̂) of the Lasso, and show that it behaves as if it knew which variables are relevant for a linear approximation of the target f^0. Section 2 does this under a so-called compatibility condition. Section 3 examines under what circumstances the compatibility condition is met.

In (1), the penalty weights all coefficients equally. In practice, certain terms (e.g. the constant term) will not be penalized, and a weighted version of the l_1 norm is used, with e.g. more weight assigned to variables with large sample variance. In essence, (possibly random) weights do not alter the main issues in the theory. We therefore consider the non-weighted l_1 penalty to clarify these issues. More details on the case of empirical weights are in Bunea et al. (2006) and van de Geer (2007).
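For the quadratic loss example above, the estimator in (1) can be computed by standard convex optimization. The following sketch is not part of the paper; it is a minimal illustration, under the assumption of quadratic loss and a fixed design matrix X, of computing β̂ by proximal gradient descent (ISTA) with soft-thresholding.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimal sketch: minimize (1/n) * ||y - X beta||^2 + lam * ||beta||_1
    by proximal gradient descent (ISTA). Illustration only."""
    n, p = X.shape
    beta = np.zeros(p)
    # step size = 1 / Lipschitz constant of the gradient of the smooth part
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 / n)
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (y - X @ beta) / n                     # gradient of the empirical risk
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)  # soft-thresholding (prox of the l_1 penalty)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 200                                    # high-dimensional: p > n
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p)
    beta0[:3] = [2.0, -1.5, 1.0]                       # a sparse linear approximation of the target
    y = X @ beta0 + 0.5 * rng.standard_normal(n)
    lam = 2.0 * np.sqrt(np.log(2 * p) / n)             # tuning parameter of the order discussed in Subsection 2.3
    print(np.round(lasso_ista(X, y, lam)[:6], 3))
```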

1.3 The empirical process

Throughout, the study of the stochastic part of the problem (involving the empirical process) is set aside. It can be handled using empirical process theory. In the current paper, we simply restrict ourselves to a set S where the empirical process behaves well (see (6) in Lemma 2.1). This allows us to proceed with purely deterministic, and relatively straightforward, arguments. More precisely, write for M > 0,

Z_M := sup_{‖β − β*‖_1 ≤ M} |(P_n − P)(ρ_{f_β} − ρ_{f_{β*}})|.  (2)

Here, β* is some fixed parameter value, which will be specified in Section 2. Our results hold on the set S = {Z_M ≤ ε*}, with M and ε* appropriately chosen. The ratio ε*/M will play a major role: it has to be chosen large enough so that S has large probability. On the other hand, ε*/M will form a lower bound for the smoothing parameter λ (see (5) in Lemma 2.1).

It is shown in van de Geer (2007) that, for independent observations Z_i, under mild conditions, and with ε*/M ≍ √(log(2p)/n), the set S has large probability in the case of Lipschitz loss functions, that is, when f ↦ ρ_f(z) is Lipschitz for all z. For other loss functions (e.g. quadratic loss), similar behavior can be obtained, but only for small enough values of M.

In the result of Lemma 2.1, one may replace P_n by any other random probability measure, say P̂, in the definition (1) of β̂, provided one also adjusts the definition (2) of Z_M. The result for the Lasso goes through on S. One should then ensure that S contains most of the probability mass by taking ε*/M appropriately, and this will then put lower bounds on the smoothing parameter.

1.4 The margin behavior

We will make use of the so-called margin condition (see below), which is assumed for a neighborhood, denoted by F_η, of the target f^0. Some of the effort goes into proving that the estimator is indeed in this neighborhood. Here, we use arguments that rely on the assumed convexity of f ↦ ρ_f. These arguments also show that the stochastic part of the problem needs only to be studied locally (i.e., (2) for small values of M).

The margin condition requires that in the neighborhood F_η ⊂ F̄ of f^0, the excess risk E is bounded from below by a strictly convex function. This is true in many particular cases for an L_∞ neighborhood F_η = {f : ‖f − f^0‖_∞ ≤ η}, for F̄ being a class of functions.

Definition. We say that the margin condition holds with strictly convex function G, if for all f ∈ F_η, we have

E(f) ≥ G(‖f − f^0‖).

Indeed, in typical cases, the margin condition holds with a quadratic function G, that is, G(u) = cu^2, u ≥ 0, where c is a positive constant.

We also need the notion of convex conjugate of a function.

Definition. Let G be a strictly convex function on [0, ∞), with G(0) = 0. The convex conjugate H of G is defined as

H(v) = sup_u {uv − G(u)}, v ≥ 0.

Note that when G is quadratic, its convex conjugate H is quadratic as well. The function H will appear in the estimation error term.

2 An oracle inequality

Our goal is now to show that the estimator has oracle properties, namely, to show, for a set B ⊂ R^p as large as possible, that

E(f_β̂) ≤ const. × min_{β ∈ B} { E(f_β) + H(λ√(J_β)) },  (3)

with large probability, where J_β denotes the number of non-zero coefficients of β. Here, H is the convex conjugate of the function G appearing in the margin condition. The left hand side of (3) is small if the target f^0 can be well approximated by an f_β with only a few of the coefficients β_j (j = 1, ..., p) non-zero, that is, by a sparse f_β. To arrive at such an inequality, we need a certain amount of compatibility between the l_1 norm and the metric on the vector space F̄.
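As a worked illustration (not taken from the paper) of the quadratic case just mentioned, the following display computes the convex conjugate of G(u) = cu^2 and shows why the estimation error term in (3) then scales like λ^2 J_β.

```latex
% Convex conjugate of the quadratic margin function G(u) = c u^2, c > 0.
% The supremum of u*v - c*u^2 over u >= 0 is attained at u = v/(2c), so that
\[
  H(v) \;=\; \sup_{u \ge 0}\bigl\{\,uv - c\,u^{2}\bigr\} \;=\; \frac{v^{2}}{4c},
  \qquad
  H\bigl(\lambda\sqrt{J_\beta}\bigr) \;=\; \frac{\lambda^{2} J_\beta}{4c}.
\]
% With lambda of order sqrt(log(2p)/n) (Subsection 2.3), this estimation error
% term behaves like J_beta * log(2p) / n.
```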
νj > 0, if for all β R p that satisfy j / J β j 3 β j, it holds that β j J f β /φj + j / J β j / + νj 4 More details on this condition are in Section 3 22 The result Let 0 < δ < be fixed but otherwise arbitrary For β R p, set J β := {j : β j 0} and J β := J β The oracle β is defined as { } 2λ β Jβ := arg min + δef β + 2δH, β B φj δ Here, the minimum is over the set B of all β, for which the compatibility condition holds for J β Let J := {j : βj 0}, and let φ = φj and = νj Set 2λ ε Jβ := + δef β + 2δH φ δ

Lemma 2.1 (Lasso under the compatibility condition). Consider a constant λ_0 satisfying the inequality

λ ≥ 8(2 + ν*)λ_0/ν*,  (5)

and let M = ε*/λ_0. Assume the margin condition with strictly convex function G, and that the oracle f_{β*} is in the neighborhood F_η of the target, as well as f_β ∈ F_η for all ‖β − β*‖_1 ≤ M. Then on the set

S := {Z_M ≤ ε*},  (6)

we have either

(1 − δ)E(f_β̂) + ν*λ‖β̂ − β*‖_1/(2 + ν*) ≤ 2ε*,

or

E(f_β̂) + λ‖β̂ − β*‖_1 ≤ 4(2 + ν*)ε*/ν*.

2.3 Asymptotics in n

In the case of independent observations, the set S has large probability under general conditions, for λ_0 ≍ √(log(2p)/n) (see van de Geer (2007)). Taking λ also of this order, assuming a quadratic margin, and in addition that φ* stays away from zero, the estimation error H(2λ√(J*)/(φ*δ)) behaves like J* log(2p)/n.

2.4 The case ν* = ∞

We observe that often λ_0 can be taken independently of oracle values, whereas (5) requires a lower bound on the smoothing parameter λ which depends on the unknown oracle value ν*. The best possible situation is when ν* is arbitrarily large. Now, the case ν(J) > 2 corresponds, after a reparametrization, to the case ν(J) = ∞. Indeed, if ν(J) > 2 and Σ_{j∉J}|β_j| ≤ 3Σ_{j∈J}|β_j|, then (4) implies

(ν(J) − 2) Σ_{j∈J}|β_j| ≤ (1 + ν(J)) √|J| ‖f_β‖/φ(J),

i.e., the compatibility condition holds with ν(J) = ∞ and φ(J) replaced by φ(J)(ν(J) − 2)/(1 + ν(J)). With ν* = ∞, Lemma 2.1 reads:

Corollary 2.1. Let λ_0 ≤ λ/8, and M := ε*/λ_0. Assume the conditions of Lemma 2.1 with ν* = ∞. Then on the set S given in (6), we have either

(1 − δ)E(f_β̂) + λ‖β̂ − β*‖_1 ≤ 2ε*,

or

E(f_β̂) + λ‖β̂ − β*‖_1 ≤ 4ε*.

3 On the compatibility condition

Let ‖f_β‖^2 := β^T Σ β, with Σ a p × p matrix. The entries of Σ are denoted by σ_{j,k} (j, k ∈ {1, ..., p}). We first present a simple lemma.

Lemma 3.1. Suppose that Σ has smallest eigenvalue φ^2 > 0. Then the compatibility condition holds for all index sets J, with φ(J) = φ and ν(J) = ∞.

3.1 The coherence lemma

For a vector v, and for 1 ≤ q < ∞, we shall write ‖v‖_q = (Σ_j |v_j|^q)^{1/q}, with the usual extension for q = ∞. We now fix an L ∈ {1, ..., p}. Consider a partition of the form

Σ = ( Σ_{1,1}  Σ_{1,2} ; Σ_{2,1}  Σ_{2,2} ),

where Σ_{1,1} is an L × L matrix, Σ_{2,1} is a (p − L) × L matrix and Σ_{1,2} = Σ_{2,1}^T is its transpose, and where Σ_{2,2} is a (p − L) × (p − L) matrix. Then for any b_1 ∈ R^L and b_2 ∈ R^{p−L}, and for b^T := (b_1^T, b_2^T), one has

b_1^T Σ_{1,1} b_1 ≤ b^T Σ b + 2‖Σ_{2,1}‖_q ‖b_1‖_2 ‖b_2‖_r,

with 1/q + 1/r = 1, and

‖Σ_{2,1}‖_q := sup_{a ∈ R^L, ‖a‖_2 = 1} ‖Σ_{2,1} a‖_q.

Consider now a subset L ⊂ {1, ..., p} of cardinality |L| = L. We introduce the L × L matrix

Σ_{1,1}(L) := (σ_{j,k})_{j,k ∈ L},

and the (p − L) × L matrix

Σ_{2,1}(L) := (σ_{j,k})_{j ∉ L, k ∈ L}.

We let ρ^2(L) be the smallest eigenvalue of Σ_{1,1}(L), and take ϑ_q^2(L) := ‖Σ_{2,1}(L)‖_q. We moreover define

φ(J) := min_{L ⊃ J : |L| = L} ρ(L),

and

θ_q(J) := max_{L ⊃ J : |L| = L} ϑ_q(L).

Lemma 3.2 (Coherence lemma). Fix J ⊂ {1, ..., p}, with cardinality |J| ≤ L. Then we have, for any β ∈ R^p,

Σ_{j ∈ J} |β_j| ≤ √|J| ‖f_β‖/φ(J) + Σ_{j ∉ J} |β_j|/(1 + ν(J)),

with, for 1/q + 1/r = 1,

1 + ν(J) = φ^2(J)(L + 1 − |J|)^{1/q} / ( 2√|J| q^{1/r} θ_q^2(J) )  for q < ∞,

1 + ν(J) = φ^2(J) / ( 2√|J| θ_∞^2(J) )  for q = ∞.

The coherence lemma uses an auxiliary lemma (Lemma 4.1), which is based on ideas in Candes and Tao (2007).
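For a given Gram matrix, the quantities entering Lemma 3.2 are easy to compute once the superset L of J is fixed. The sketch below is not from the paper; it evaluates ρ(L), ϑ_∞(L) and the resulting constants for the single choice L = J with q = ∞ (the choice discussed in Subsection 3.2), using that ‖Σ_{2,1}(L)‖_∞ equals the largest Euclidean norm of a row of Σ_{2,1}(L).

```python
import numpy as np

def coherence_constants(Sigma, J):
    """Sketch for Lemma 3.2 with the particular choice L = J and q = infinity:
        rho^2(L)        = smallest eigenvalue of Sigma_{1,1}(L),
        theta_inf^2(J)  = ||Sigma_{2,1}(L)||_inf = max_{j not in L} ||(sigma_{j,k})_{k in L}||_2,
        1 + nu(J)       = phi^2(J) / (2 * sqrt(|J|) * theta_inf^2(J)).
    The lemma allows any superset L of J; this is an illustration only."""
    p = Sigma.shape[0]
    L = np.asarray(J)                                  # the choice L = J
    Lc = np.setdiff1d(np.arange(p), L)
    S11 = Sigma[np.ix_(L, L)]
    S21 = Sigma[np.ix_(Lc, L)]
    rho2 = np.linalg.eigvalsh(S11)[0]                  # rho^2(L)
    theta2_inf = np.max(np.linalg.norm(S21, axis=1))   # ||Sigma_{2,1}(L)||_inf
    phi = np.sqrt(rho2)                                # phi(J) for this single L
    one_plus_nu = rho2 / (2.0 * np.sqrt(len(L)) * theta2_inf)
    return phi, one_plus_nu

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    p, n = 50, 500
    X = rng.standard_normal((n, p))
    Sigma = X.T @ X / n                                # empirical Gram matrix, close to identity
    phi, one_plus_nu = coherence_constants(Sigma, J=[0, 1, 2])
    # the compatibility condition needs nu(J) > 0, i.e. 1 + nu(J) > 1
    print(f"phi(J) = {phi:.3f},  1 + nu(J) = {one_plus_nu:.3f}")
```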

3.2 Relation with other coherence conditions

In the coherence lemma, the choice of L ≥ |J| and q is open. They are allowed to depend on J, and for our purposes should be taken in such a way that φ(J) and ν(J) are as large as possible. To gain insight into what the coherence lemma can bring us, let us again consider the partition

Σ = ( Σ_{1,1}  Σ_{1,2} ; Σ_{2,1}  Σ_{2,2} ).

Note first that ‖Σ_{2,1}‖_2^2 is the largest eigenvalue of the matrix Σ_{1,2}Σ_{2,1}. The coherence lemma with q = 2 is along the lines of the coherence condition in Candes and Tao (2007). They choose L not larger than 3|J|. It is easy to see that

‖Σ_{2,1}‖_q ≤ ( Σ_{j=L+1}^p ( Σ_{k=1}^L σ_{j,k}^2 )^{q/2} )^{1/q}.  (7)

For q = ∞, this reads

‖Σ_{2,1}‖_∞ ≤ √L max_{L+1 ≤ j ≤ p} max_{1 ≤ k ≤ L} |σ_{j,k}|.

Note also that with q = ∞, the choice L = |J| in the coherence lemma gives the best result. With this choice of L, the coherence lemma with q = ∞ gives a positive value for ν(J), as required in the compatibility condition, if

2|J| max_{j ∉ J} max_{k ∈ J} |σ_{j,k}| / φ^2(J) < 1.

This is in the spirit of the maximal local coherence condition in Bunea et al. (2007). It also follows from (7) that

‖Σ_{2,1}‖_q ≤ ( Σ_{j=L+1}^p ( Σ_{k=1}^L |σ_{j,k}| )^q )^{1/q}.

Invoking this bound with q = 1 and L = |J| in the coherence lemma leads to a condition which is similar to the cumulative local coherence condition in Bunea et al. (2007).
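The bound (7) and its q = ∞ specialization above are elementary to check numerically. The following sketch is not from the paper; it compares ‖Σ_{2,1}‖_q with the corresponding right hand sides for a random Gram matrix, for q = 2 (where (7) gives the Frobenius norm) and for q = ∞.

```python
import numpy as np

rng = np.random.default_rng(3)
p, L, n = 40, 5, 400
X = rng.standard_normal((n, p))
Sigma = X.T @ X / n
S21 = Sigma[L:, :L]                          # the (p - L) x L block Sigma_{2,1}

# ||Sigma_{2,1}||_2 is the largest singular value; the bound (7) with q = 2 is the Frobenius norm
op2 = np.linalg.norm(S21, 2)
bound2 = np.sqrt(np.sum(S21 ** 2))

# ||Sigma_{2,1}||_inf is the largest row norm; the q = infinity bound is sqrt(L) * max |sigma_{j,k}|
opinf = np.max(np.linalg.norm(S21, axis=1))
boundinf = np.sqrt(L) * np.max(np.abs(S21))

print(f"q = 2:   {op2:.4f} <= {bound2:.4f}")
print(f"q = inf: {opinf:.4f} <= {boundinf:.4f}")
```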

4 Proofs

Proof of Lemma 2.1. We write β* = β*_in + β*_out, with

β*_{j,in} = β*_j 1{j ∈ J*} and β*_{j,out} = β*_j 1{j ∉ J*}, j = 1, ..., p.

Note thus that ‖β*_in‖_1 = ‖β*‖_1 and β*_out ≡ 0. Throughout the proof, we assume we are on the set S. Let

s := M/(M + ‖β̂ − β*‖_1), and β̃ := sβ̂ + (1 − s)β*.

Write f̃ := f_{β̃}, Ẽ := E(f̃), and f* := f_{β*}, E* := E(f*). The proof is structured as follows. We first note that ‖β̃ − β*‖_1 ≤ M, and prove the result with β̂ replaced by β̃. As a byproduct, we obtain that ‖β̂ − β*‖_1 ≤ M. The latter allows us to repeat the proof with β̃ replaced by β̂.

By the convexity of f ↦ ρ_f, and of ‖·‖_1,

P_n ρ_{f̃} + λ‖β̃‖_1 ≤ s[P_n ρ_{f_β̂} + λ‖β̂‖_1] + (1 − s)[P_n ρ_{f*} + λ‖β*‖_1] ≤ P_n ρ_{f*} + λ‖β*‖_1.  (8)

Thus,

Ẽ + λ‖β̃‖_1 = (P_n − P)(ρ_{f*} − ρ_{f̃}) + P_n(ρ_{f̃} − ρ_{f*}) + E* + λ‖β̃‖_1
           ≤ (P_n − P)(ρ_{f*} − ρ_{f̃}) + E* + λ‖β*‖_1
           ≤ Z_M + E* + λ‖β*‖_1.

So on S,

Ẽ + λ‖β̃‖_1 ≤ ε* + E* + λ‖β*‖_1.  (9)

Since ‖β̃‖_1 = ‖β̃_in‖_1 + ‖β̃_out‖_1, ‖β*‖_1 = ‖β*_in‖_1 and E* ≤ ε*, we therefore have

Ẽ + λ‖β̃_out‖_1 ≤ ε* + E* + λ‖β*_in − β̃_in‖_1 ≤ 2ε* + λ‖β*_in − β̃_in‖_1.  (10)

Case 1. If

λ‖β*_in − β̃_in‖_1 ≤ (2 + ν*)ε*/ν*,

then (10) implies

Ẽ + λ‖β̃_out‖_1 ≤ 2ε* + (2 + ν*)ε*/ν*,

and hence

Ẽ + λ‖β̃ − β*‖_1 ≤ 4(2 + ν*)ε*/ν*.  (11)

But then

‖β̃ − β*‖_1 ≤ 4(2 + ν*)ε*/(ν*λ) = 4(2 + ν*)λ_0 M/(ν*λ) ≤ M/2,

since λ ≥ 8(2 + ν*)λ_0/ν*. This implies that ‖β̂ − β*‖_1 ≤ M.

Case 2. If

λ‖β*_in − β̃_in‖_1 ≥ (2 + ν*)ε*/ν*,

we get from (10), using (2 + ν*)/ν* ≥ 1, that

λ‖β̃_out‖_1 ≤ 2ε* + λ‖β*_in − β̃_in‖_1 ≤ 3λ‖β*_in − β̃_in‖_1.

This means that we can apply the compatibility condition. Recall that inequality (9) implies

Ẽ + λ‖β̃_out‖_1 ≤ ε* + E* + λ‖β*_in‖_1 − λ‖β̃_in‖_1 ≤ ε* + E* + λ‖β*_in − β̃_in‖_1.

We have

λ‖β̃_out‖_1 = 2λ‖β̃_out‖_1/(2 + ν*) + ν*λ‖β̃_out‖_1/(2 + ν*),

and

‖β̃_out‖_1 = ‖β̃ − β*‖_1 − ‖β̃_in − β*_in‖_1.

So

Ẽ + 2λ‖β̃_out‖_1/(2 + ν*) + ν*λ‖β̃ − β*‖_1/(2 + ν*) ≤ ε* + E* + (2 + 2ν*)λ‖β*_in − β̃_in‖_1/(2 + ν*).

With the compatibility condition on J*, we find

‖β*_in − β̃_in‖_1 ≤ √(J*)‖f_{β̃} − f_{β*}‖/φ* + ‖β̃_out‖_1/(1 + ν*).

Thus

Ẽ + 2λ‖β̃_out‖_1/(2 + ν*) + ν*λ‖β̃ − β*‖_1/(2 + ν*) ≤ ε* + E* + (2 + 2ν*)/(2 + ν*) [ λ√(J*)‖f_{β̃} − f_{β*}‖/φ* + λ‖β̃_out‖_1/(1 + ν*) ].

The term with ‖β̃_out‖_1 cancels out. Moreover, we may use the bound (2 + 2ν*)/(2 + ν*) ≤ 2. This allows us to conclude that

Ẽ + ν*λ‖β̃ − β*‖_1/(2 + ν*) ≤ ε* + E* + 2λ√(J*)‖f_{β̃} − f_{β*}‖/φ*.

Now, because we assumed f_{β*} ∈ F_η, and since also f_{β̃} ∈ F_η, we can invoke the margin condition to arrive at

2λ√(J*)‖f_{β̃} − f_{β*}‖/φ* ≤ 2δH(2λ√(J*)/(φ*δ)) + δẼ + δE*.

It follows that

Ẽ + ν*λ‖β̃ − β*‖_1/(2 + ν*) ≤ ε* + (1 + δ)E* + 2δH(2λ√(J*)/(φ*δ)) + δẼ = ε* + ε* + δẼ = 2ε* + δẼ,

or

(1 − δ)Ẽ + ν*λ‖β̃ − β*‖_1/(2 + ν*) ≤ 2ε*.  (12)

This yields

‖β̃ − β*‖_1 ≤ 2(2 + ν*)ε*/(ν*λ) = 2(2 + ν*)λ_0 M/(ν*λ) ≤ M/4,

where the last inequality is ensured by the assumption λ ≥ 8(2 + ν*)λ_0/ν*. But ‖β̃ − β*‖_1 ≤ M/4 implies ‖β̂ − β*‖_1 ≤ M/3 ≤ M.

So in both Case 1 and Case 2, we arrive at ‖β̂ − β*‖_1 ≤ M. Now, repeat the above arguments with β̃ replaced by β̂. Let Ê = E(f_β̂). Then, either, from Case 1 with this replacement,

Ê + λ‖β̂ − β*‖_1 ≤ 4(2 + ν*)ε*/ν*.

Or we are in Case 2 and can apply the compatibility assumption. Then we arrive at (12) with this replacement:

(1 − δ)Ê + ν*λ‖β̂ − β*‖_1/(2 + ν*) ≤ 2ε*.

Proof of Lemma 3.1. Let β ∈ R^p be arbitrary. It is clear that ‖β‖_2^2 ≤ ‖f_β‖^2/φ^2. Hence,

Σ_{j ∈ J} |β_j| ≤ √|J| ( Σ_{j ∈ J} β_j^2 )^{1/2} ≤ √|J| ‖β‖_2 ≤ √|J| ‖f_β‖/φ.
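The conjugate step in the proof of Lemma 2.1 above can be written out in more detail. The following display is an added illustration (not verbatim from the paper): it uses the Fenchel inequality uv ≤ G(u) + H(v), the triangle inequality, and the margin condition on F_η.

```latex
% Write v = 2*lambda*sqrt(J*)/(phi* * delta), so that delta*v = 2*lambda*sqrt(J*)/phi*.
% Use || f_{tilde beta} - f_{beta*} || <= || f_{tilde beta} - f^0 || + || f_{beta*} - f^0 ||,
% then u*v <= G(u) + H(v) for each term, and finally the margin condition G(||f - f^0||) <= E(f):
\begin{align*}
\frac{2\lambda\sqrt{J^*}}{\phi^*}\,\bigl\|f_{\tilde\beta}-f_{\beta^*}\bigr\|
  &\le \delta v\,\bigl\|f_{\tilde\beta}-f^{0}\bigr\| + \delta v\,\bigl\|f_{\beta^*}-f^{0}\bigr\| \\
  &\le \delta\Bigl(H(v)+G\bigl(\|f_{\tilde\beta}-f^{0}\|\bigr)\Bigr)
     + \delta\Bigl(H(v)+G\bigl(\|f_{\beta^*}-f^{0}\|\bigr)\Bigr) \\
  &\le 2\delta\,H\!\Bigl(\frac{2\lambda\sqrt{J^*}}{\phi^*\delta}\Bigr)
     + \delta\,\tilde{E} + \delta\,E^{*}.
\end{align*}
```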

Proof of Lemma 3.2. Let β = β_in + β_out, with β_{j,in} := β_j 1{j ∈ J} and β_{j,out} := β_j 1{j ∉ J}, j = 1, ..., p. Let L ⊃ J, with L\J being the set of indices of the L − |J| largest |β_j| with j ∉ J. Define b_{j,1} := β_j 1{j ∈ L} and b_{j,2} := β_j 1{j ∉ L}. We have

b_1^T Σ_{1,1}(L) b_1 ≤ b^T Σ b + 2θ_q^2(J)‖b_1‖_2‖b_2‖_r.

By the auxiliary lemma below, for q < ∞,

‖b_2‖_r ≤ q^{1/r} ‖β_out‖_1/(L + 1 − |J|)^{1/q}.  (13)

This yields

b_1^T Σ_{1,1}(L) b_1 ≤ b^T Σ b + 2θ_q^2(J) q^{1/r} ‖b_1‖_2 ‖β_out‖_1/(L + 1 − |J|)^{1/q}.

Hence, since ‖f_b‖ = ‖f_β‖,

b_1^T Σ_{1,1}(L) b_1 ≤ ‖f_β‖^2 + 2θ_q^2(J) q^{1/r} ‖b_1‖_2 ‖β_out‖_1/(L + 1 − |J|)^{1/q}.

It follows that

‖b_1‖_2^2 ≤ b_1^T Σ_{1,1}(L) b_1/φ^2(J) ≤ ‖f_β‖^2/φ^2(J) + 2 (θ_q^2(J)/φ^2(J)) q^{1/r} ‖b_1‖_2 ‖β_out‖_1/(L + 1 − |J|)^{1/q}.

This implies

‖b_1‖_2 ≤ ‖f_β‖/φ(J) + 2 (θ_q^2(J)/φ^2(J)) q^{1/r} ‖β_out‖_1/(L + 1 − |J|)^{1/q}.

We thus arrive at

Σ_{j ∈ J} |β_j| ≤ √|J| ‖β_in‖_2 ≤ √|J| ‖b_1‖_2 ≤ √|J| ‖f_β‖/φ(J) + 2√|J| (θ_q^2(J)/φ^2(J)) q^{1/r} ‖β_out‖_1/(L + 1 − |J|)^{1/q}.

The result with q = ∞ follows in the same manner, using, instead of (13), the trivial bound ‖b_2‖_1 ≤ ‖β_out‖_1.

Lemma 4.1 (Auxiliary lemma). Let b_1 ≥ b_2 ≥ ... ≥ 0. For 1 < r ≤ ∞, 1/q + 1/r = 1, and L ∈ {0, 1, ...}, we have

( Σ_{j=L+1}^∞ b_j^r )^{1/r} ≤ q^{1/r} (L + 1)^{−1/q} ‖b‖_1.

Proof. We have

Σ_{j=L+1}^∞ j^{−r} ≤ (L + 1)^{−r} + ∫_{L+1}^∞ x^{−r} dx = (L + 1)^{−r} + (L + 1)^{1−r}/(r − 1) ≤ q(L + 1)^{−r/q},

since 1/(r − 1) = q/r and 1 + q/r = q. Moreover, for all j, j b_j ≤ Σ_{l=1}^j b_l ≤ ‖b‖_1, so that b_j ≤ ‖b‖_1/j. Hence

Σ_{j=L+1}^∞ b_j^r ≤ ‖b‖_1^r Σ_{j=L+1}^∞ j^{−r} ≤ q ‖b‖_1^r (L + 1)^{−r/q}.

REFERENCES

Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2006). Sparsity oracle inequalities for the Lasso. Submitted.

Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2007). Sparse density estimation and aggregation with l_1 penalties. COLT.

Candes, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. To appear in Ann. Statist.

van de Geer, S.A. (2007). High-dimensional generalized linear models and the Lasso. To appear in Ann. Statist.
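As an appended illustration (not part of the paper), the following sketch numerically checks the inequality of Lemma 4.1 as stated above, for an arbitrary nonincreasing nonnegative sequence.

```python
import numpy as np

def check_aux_lemma(b, L, r):
    """Check ( sum_{j > L} b_j^r )^{1/r} <= q^{1/r} * (L + 1)^{-1/q} * ||b||_1,
    with 1/q + 1/r = 1, for a nonincreasing sequence b >= 0 (Lemma 4.1). Illustration only."""
    b = np.sort(np.asarray(b, dtype=float))[::-1]    # enforce b_1 >= b_2 >= ... >= 0
    q = r / (r - 1.0)
    lhs = np.sum(b[L:] ** r) ** (1.0 / r)
    rhs = q ** (1.0 / r) * (L + 1) ** (-1.0 / q) * np.sum(b)
    return lhs, rhs

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    b = rng.exponential(size=200)                    # an arbitrary nonnegative sequence
    for L in (0, 5, 50):
        lhs, rhs = check_aux_lemma(b, L, r=2.0)
        print(f"L = {L:3d}:  {lhs:.4f} <= {rhs:.4f}")
```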
