(Part 1) High-dimensional statistics

1 Theory for the Lasso

Recall the linear model
    $Y_i = \sum_{j=1}^p \beta_j X_i^{(j)} + \epsilon_i, \quad i = 1, \dots, n,$
or, in matrix notation, $Y = X\beta + \epsilon$. To simplify, we assume that the design $X$ is fixed, and that $\epsilon$ is $\mathcal{N}(0, \sigma^2 I)$-distributed. We moreover assume that the linear model holds exactly, with some true parameter value $\beta^0$.

2 What is an oracle inequality?

Suppose for the moment that $p \le n$ and that $X$ has full rank $p$. Consider the least squares estimator in the linear model,
    $\hat\beta_{\mathrm{LM}} := (X^T X)^{-1} X^T Y.$
Then the prediction error $\|X(\hat\beta_{\mathrm{LM}} - \beta^0)\|_2^2 / \sigma^2$ is $\chi^2_p$-distributed. In particular, this means that
    $\mathbb{E}\, \|X(\hat\beta_{\mathrm{LM}} - \beta^0)\|_2^2 / n = (\sigma^2/n)\, p.$
In words: each parameter $\beta_j^0$, $j = 1, \dots, p$, is estimated with squared accuracy $\sigma^2/n$. The overall squared accuracy is then $(\sigma^2/n)\, p$.
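This $\chi^2_p$ calculation is easy to check by simulation. Below is a minimal sketch (not from the slides; the sizes $n$, $p$, $\sigma$ and the number of repetitions are illustrative) that fits least squares on simulated data and averages the rescaled prediction error.

```python
# A minimal simulation sketch checking that the prediction error
# ||X(beta_hat - beta0)||_2^2 / sigma^2 of least squares has mean p.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 10, 2.0
X = rng.standard_normal((n, p))            # fixed design, full rank p
beta0 = rng.standard_normal(p)             # true parameter

errors = []
for _ in range(2000):
    eps = sigma * rng.standard_normal(n)
    y = X @ beta0 + eps
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    errors.append(np.sum((X @ (beta_hat - beta0)) ** 2) / sigma ** 2)

print(np.mean(errors))   # close to p = 10 (chi-squared with p degrees of freedom)
```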

3 Sparsity

We now turn to the situation where possibly $p > n$. The philosophy that will generally rescue us is to believe that in fact only a few, say $s_0$, of the $\beta_j^0$ are non-zero. We use the notation
    $S_0 := \{j : \beta_j^0 \neq 0\},$
so that $s_0 = |S_0|$. We call $S_0$ the active set, and $s_0$ the sparsity index of $\beta^0$.

4 Notation

    $\beta_{j,S_0} := \beta_j\, 1\{j \in S_0\}, \qquad \beta_{j,S_0^c} := \beta_j\, 1\{j \notin S_0\}.$
Clearly,
    $\beta = \beta_{S_0} + \beta_{S_0^c},$
and
    $\beta^0_{S_0^c} = 0.$

5 If we knew $S_0$, we could simply neglect all variables $X^{(j)}$ with $j \notin S_0$. Then, by the above argument, the overall squared accuracy would be $(\sigma^2/n)\, s_0$.

With $S_0$ unknown, we apply the $\ell_1$-penalty, i.e., the Lasso
    $\hat\beta := \arg\min_\beta \left\{ \|Y - X\beta\|_2^2/n + \lambda \|\beta\|_1 \right\}.$

Definition: Sparsity oracle inequality. The sparsity constant $\phi_0$ is the largest value $\phi_0 > 0$ such that the Lasso $\hat\beta$ satisfies the $\phi_0$-sparsity oracle inequality
    $\|X(\hat\beta - \beta^0)\|_2^2/n + \lambda \|\hat\beta_{S_0^c}\|_1 \le \frac{\lambda^2 s_0}{\phi_0^2}.$
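The estimator above is easy to try out numerically. Below is a minimal sketch (not from the slides; the sizes, seed and choice of $\lambda$ are illustrative) that generates data from a sparse linear model and fits the Lasso with scikit-learn. Note that scikit-learn's Lasso minimizes $\|Y - X\beta\|_2^2/(2n) + \alpha\|\beta\|_1$, so $\alpha$ corresponds to $\lambda/2$ in the notation above.

```python
# A minimal sketch of the Lasso on simulated sparse data (illustrative only).
# scikit-learn's Lasso minimizes ||y - Xb||_2^2/(2n) + alpha*||b||_1,
# so alpha corresponds to lambda/2 in the notation of the slides.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s0, sigma = 100, 500, 5, 1.0
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s0] = 3.0                               # active set S0 = {0,...,4}
y = X @ beta0 + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)   # the usual order of the tuning parameter
fit = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y)

pred_err = np.sum((X @ (fit.coef_ - beta0)) ** 2) / n
print("prediction error:", pred_err)
print("selected variables:", np.flatnonzero(fit.coef_))
```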

6 A digression: the noiseless case

Let $\mathcal{X}$ be some measurable space, $Q$ be a probability measure on $\mathcal{X}$, and $\|\cdot\|$ be the $L_2(Q)$ norm. Consider a fixed dictionary of functions $\{\psi_j\}_{j=1}^p \subset L_2(Q)$. Consider linear functions
    $f_\beta(\cdot) = \sum_{j=1}^p \beta_j \psi_j(\cdot), \quad \beta \in \mathbb{R}^p.$
Consider moreover a fixed target
    $f^0 := \sum_{j=1}^p \beta_j^0 \psi_j.$
We let $S_0 := \{j : \beta_j^0 \neq 0\}$ be its active set, and $s_0 := |S_0|$ be the sparsity index of $f^0$.

7 For some fixed $\lambda > 0$, the Lasso for the noiseless problem is
    $\beta^* := \arg\min_\beta \left\{ \|f_\beta - f^0\|^2 + \lambda \|\beta\|_1 \right\},$
where $\|\cdot\|_1$ is the $\ell_1$-norm. We write $f^* := f_{\beta^*}$ and let $S^*$ be the active set of the Lasso. The Gram matrix is
    $\Sigma := \int \psi \psi^T \, dQ,$
where $\psi := (\psi_1, \dots, \psi_p)^T$.

8 We will need certain conditions on the Gram matrix to make the theory work. We require a certain compatibility of $\ell_1$-norms with $\ell_2$-norms.

Compatibility. Let $L > 0$ be some constant. The compatibility constant is
    $\phi^2_\Sigma(L, S_0) := \phi^2(L, S_0) := \min\{ s_0\, \beta^T \Sigma \beta : \|\beta_{S_0}\|_1 = 1,\ \|\beta_{S_0^c}\|_1 \le L \}.$
We say that the $(L, S_0)$-compatibility condition is met if $\phi(L, S_0) > 0$.

9 Back to the noisy case

Lemma (Basic Inequality). We have
    $\|X(\hat\beta - \beta^0)\|_2^2/n + 2\lambda \|\hat\beta\|_1 \le 2\epsilon^T X(\hat\beta - \beta^0)/n + 2\lambda \|\beta^0\|_1.$

10 We introduce the set
    $\mathcal{T} := \left\{ \max_{1 \le j \le p} |\epsilon^T X^{(j)}|/n \le \lambda_0 \right\}.$
We assume that $\lambda > \lambda_0$, to make sure that on $\mathcal{T}$ we can get rid of the random part of the problem.

11 Let us denote the diagonal elements of the Gram matrix $\hat\Sigma := X^T X/n$ by $\hat\sigma_j^2 := \hat\Sigma_{j,j}$, $j = 1, \dots, p$.

Lemma. Suppose that $\sigma^2 = \hat\sigma_j^2 = 1$ for all $j$. Then we have, for all $t > 0$ and for
    $\lambda_0 := \sqrt{\frac{2t + 2\log p}{n}},$
that $P(\mathcal{T}) \ge 1 - 2\exp[-t]$.
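The lemma lends itself to a quick Monte Carlo check. The sketch below is illustrative only; the columns of $X$ are rescaled so that $\hat\sigma_j^2 = 1$ and $\sigma^2 = 1$, matching the assumptions of the lemma.

```python
# Monte Carlo sketch of the probability bound for the set T (illustrative sizes).
import numpy as np

rng = np.random.default_rng(2)
n, p, t = 100, 1000, 1.0
X = rng.standard_normal((n, p))
X /= np.sqrt((X ** 2).mean(axis=0))        # enforce X_j^T X_j / n = 1

lam0 = np.sqrt((2 * t + 2 * np.log(p)) / n)

hits, reps = 0, 2000
for _ in range(reps):
    eps = rng.standard_normal(n)
    if np.max(np.abs(X.T @ eps) / n) <= lam0:
        hits += 1

print("empirical P(T):", hits / reps, " lower bound:", 1 - 2 * np.exp(-t))
```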

12 Compatibility condition (noisy case). Let $L > 0$ be some constant. The compatibility constant is
    $\phi^2_{\hat\Sigma}(L, S_0) := \phi^2(L, S_0) := \min\{ s_0\, \beta^T \hat\Sigma \beta : \|\beta_{S_0}\|_1 = 1,\ \|\beta_{S_0^c}\|_1 \le L \}.$
We say that the $(L, S_0)$-compatibility condition is met if $\phi(L, S_0) > 0$.

13 Theorem. Suppose $\lambda > \lambda_0$ and that the compatibility condition holds for $S_0$, with
    $L = \frac{\lambda + \lambda_0}{\lambda - \lambda_0}.$
Then on
    $\mathcal{T} := \left\{ \max_{1 \le j \le p} |\epsilon^T X^{(j)}|/n \le \lambda_0 \right\},$
we have
    $\|X(\hat\beta - \beta^0)\|_2^2/n \le 4(\lambda + \lambda_0)^2 s_0 / \phi^2(L, S_0),$
    $\|\hat\beta_{S_0} - \beta^0_{S_0}\|_1 \le 2(\lambda + \lambda_0)\, s_0 / \phi^2(L, S_0),$
and
    $\|\hat\beta_{S_0^c}\|_1 \le 2L(\lambda + \lambda_0)\, s_0 / \phi^2(L, S_0).$

14 When does the compatibility condition hold?

[Diagram: a hierarchy of conditions implying oracle inequalities for prediction and estimation — the RIP and weak (S,2s)-RIP, adaptive and weak (S,2s)-restricted regression, the (S,2s)- and (S,s)-restricted eigenvalue conditions, coherence, the (S,2s)-, weak (S,2s)- and (S,s)-uniform irrepresentable conditions, and S-compatibility.]

15 If $\Sigma$ is non-singular, the compatibility condition holds, with $\phi^2(S_0) \ge \Lambda^2_{\min}$, the latter being the smallest eigenvalue of $\Sigma$.

Example. Consider the matrix
    $\Sigma := (1 - \rho) I + \rho\, \iota \iota^T = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$
with $0 < \rho < 1$, and $\iota := (1, \dots, 1)^T$ a vector of 1's. Then the smallest eigenvalue of $\Sigma$ is $\Lambda^2_{\min} = 1 - \rho$, so the compatibility condition holds with $\phi(S_0) \ge \sqrt{1 - \rho}$. (The uniform $S_0$-irrepresentable condition is met as well.)
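The claim about the smallest eigenvalue is easy to verify numerically; a minimal sketch (the dimension and $\rho$ are illustrative):

```python
# Numerical check: the smallest eigenvalue of (1 - rho) I + rho * iota iota^T
# equals 1 - rho.
import numpy as np

p, rho = 50, 0.6
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
print(np.linalg.eigvalsh(Sigma).min())     # prints 1 - rho = 0.4 (up to rounding)
```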

17 Geometric interpretation

Let $X_j \in \mathbb{R}^n$ denote the $j$-th column of $X$ ($j = 1, \dots, p$). The set
    $\mathcal{A} := \{X\beta_S : \|\beta_S\|_1 = 1\}$
is the convex hull of the vectors $\{\pm X_j\}_{j \in S}$ in $\mathbb{R}^n$. Likewise, the set
    $\mathcal{B} := \{X\beta_{S^c} : \|\beta_{S^c}\|_1 \le L\}$
is the convex hull, including its interior, of the vectors $\{\pm L X_j\}_{j \in S^c}$. The $\ell_1$-eigenvalue $\delta(L, S)$ is the distance between these two sets.

[Figure: the sets $\mathcal{A}$ and $\mathcal{B}$, with $\delta(L, S)$ the distance between them.]

18 We note that:
- if $L$ is large, the $\ell_1$-eigenvalue will be small;
- it will also be small if the vectors in $S$ exhibit strong correlation with those in $S^c$;
- when the vectors in $\{X_j\}_{j \in S}$ are linearly dependent, it holds that $\{X\beta_S : \|\beta_S\|_1 = 1\} = \{X\beta_S : \|\beta_S\|_1 \le 1\}$, and hence then $\delta(L, S) = 0$.

19 The difference between the compatibility constant and the squared $\ell_1$-eigenvalue lies only in the normalization by the size $|S|$ of the set $S$. This normalization is inspired by the orthogonal case, which we detail in the following example.

Example. Suppose that the columns of $X$ are all orthogonal: $X_j^T X_k = 0$ for all $j \neq k$. Then $\delta(L, S) = 1/\sqrt{|S|}$ and $\phi(L, S) = 1$.

20 Let $S_\beta := \{j : \beta_j \neq 0\}$. We call $|S_\beta|$ the sparsity index of $\beta$. More generally, we call $|S|$ the sparsity index of the set $S$.

Definition. For a set $S$ and constant $L > 0$, the effective sparsity $\Gamma^2(L, S)$ is the inverse of the squared $\ell_1$-eigenvalue, that is,
    $\Gamma^2(L, S) = \frac{1}{\delta^2(L, S)}.$

21 Example. As a simple numerical example, let us suppose $n = 2$, $p = 3$, $S = \{3\}$, and $X$ scaled by $\sqrt{n}$, with third column proportional to $(5, 12)^T$. The $\ell_1$-eigenvalue $\delta(L, S)$ is equal to the distance of $X_3$ to the line that connects $L X_1$ and $L X_2$, which here is
    $\delta(L, S) = \max\{(5 - L)/\sqrt{26},\ 0\}.$
Hence, for example, for $L = 3$ the effective sparsity is $\Gamma^2(3, S) = 26/4 = 13/2$. Alternatively, when the two entries of the third column are swapped (proportional to $(12, 5)^T$), then, for example, $\delta(3, S) = 0$ and hence $\Gamma^2(3, S) = \infty$. This is due to the sharper angle between $X_1$ and $X_3$.

22 The compatibility condition is slightly weaker than the restricted eigenvalue condition of Bickel et al. [2009]. The restricted isometry property of Candes [2005] implies the restricted eigenvalue condition. (Part 1) High-dimensional statistics May / 41

24 Approximating the Gram matrix

For two (positive semi-definite) matrices $\Sigma_0$ and $\Sigma_1$, we define the supremum distance
    $\|\Sigma_1 - \Sigma_0\|_\infty := \max_{j,k} |(\Sigma_1)_{j,k} - (\Sigma_0)_{j,k}|.$

Lemma. Assume $\|\beta_{S_0^c}\|_1 \le 3\|\beta_{S_0}\|_1$ and $\|\Sigma_1 - \Sigma_0\|_\infty \le \lambda$. Then
    $\left| \frac{\|f_\beta\|_{\Sigma_1}^2}{\|f_\beta\|_{\Sigma_0}^2} - 1 \right| \le \frac{16\, \lambda\, s_0}{\phi^2_{\mathrm{compatible}}(\Sigma_0, S_0)},$
where $\|f_\beta\|_\Sigma^2 := \beta^T \Sigma \beta$.

25 Corollary. We have
    $\phi_{\Sigma_1}(3, S_0) \ge \phi_{\Sigma_0}(3, S_0) - 4\sqrt{\|\Sigma_1 - \Sigma_0\|_\infty\, s_0}.$

26 Example. Suppose we have a Gaussian random matrix $\hat\Sigma := X^T X/n = (\hat\sigma_{j,k})$, where $X = (X_{i,j})$ is an $n \times p$ matrix with i.i.d. $\mathcal{N}(0,1)$-distributed entries in each column. For all $t > 0$, and for
    $\lambda(t) := \sqrt{\frac{4t + 8\log p}{n}} + \frac{4t + 8\log p}{n},$
one has the inequality
    $P\left( \|\hat\Sigma - \Sigma\|_\infty \ge \lambda(t) \right) \le 2\exp[-t].$
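A small Monte Carlo sketch of this tail bound (illustrative sizes; with i.i.d. $\mathcal{N}(0,1)$ entries the population Gram matrix is $\Sigma = I$, and $\lambda(t)$ is taken as stated above):

```python
# Monte Carlo sketch for the Gaussian-design example above (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
n, p, t = 100, 50, 1.0
lam_t = np.sqrt((4 * t + 8 * np.log(p)) / n) + (4 * t + 8 * np.log(p)) / n

exceed, reps = 0, 500
for _ in range(reps):
    X = rng.standard_normal((n, p))
    Sigma_hat = X.T @ X / n
    if np.max(np.abs(Sigma_hat - np.eye(p))) >= lam_t:
        exceed += 1

print("empirical exceedance:", exceed / reps, " bound:", 2 * np.exp(-t))
```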

27 Example (continued). Hence, we know for example that, with probability at least $1 - 2\exp[-t]$,
    $\phi_{\mathrm{compatible}}(\hat\Sigma, S_0) \ge \Lambda_{\min}(\Sigma) - 4\sqrt{\lambda(t)\, s_0}.$
This leads to a bound on the sparsity of the form $s_0 = o(1/\lambda(t))$, which roughly says that $s_0$ should be of small order $\sqrt{n/\log p}$.

28 Definition. We call a random variable $X$ sub-Gaussian if, for some constants $K$ and $\sigma_0^2$,
    $\mathbb{E}\exp[X^2/K^2] \le \sigma_0^2.$

Theorem. Suppose $X_1, \dots, X_n$ are uniformly sub-Gaussian with constants $K$ and $\sigma_0^2$. Then, for a constant $\eta = \eta(K, \sigma_0^2)$, it holds that
    $\beta^T \hat\Sigma \beta \ge \tfrac{1}{3}\, \beta^T \Sigma \beta - \frac{t + \log p}{n}\, \|\beta\|_1^2 / \eta^2,$
with probability at least $1 - 2\exp[-t]$. See Raskutti, Wainwright and Yu [2010].

29 General convex loss

Consider data $\{Z_i\}_{i=1}^n$, with $Z_i$ in some space $\mathcal{Z}$. Consider a linear space $\mathcal{F} := \{f_\beta(\cdot) = \sum_{j=1}^p \beta_j \psi_j(\cdot) : \beta \in \mathbb{R}^p\}$. For each $f \in \mathcal{F}$, let $\rho_f : \mathcal{Z} \to \mathbb{R}$ be a loss function. We assume that the map $f \mapsto \rho_f(z)$ is convex for all $z \in \mathcal{Z}$.

For example, $Z_i = (X_i, Y_i)$, and $\rho$ is quadratic loss
    $\rho_f(\cdot, y) = (y - f(\cdot))^2,$
or logistic loss
    $\rho_f(\cdot, y) = -y f(\cdot) + \log(1 + \exp[f(\cdot)]),$
or minus log-likelihood loss $\rho_f = -f + \log \int \exp[f]$, etc.

30 We denote, for a function $\rho : \mathcal{Z} \to \mathbb{R}$, the empirical average by
    $P_n \rho := \sum_{i=1}^n \rho(Z_i)/n,$
and the theoretical mean by
    $P\rho := \sum_{i=1}^n \mathbb{E}\rho(Z_i)/n.$
The Lasso is
    $\hat\beta = \arg\min_\beta \left\{ P_n \rho_{f_\beta} + \lambda \|\beta\|_1 \right\}. \qquad (1)$
We write $\hat f = f_{\hat\beta}$.
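As a concrete instance of (1) with a non-quadratic convex loss, the sketch below fits an $\ell_1$-penalized logistic regression with scikit-learn (illustrative only; scikit-learn's LogisticRegression with penalty='l1' minimizes the summed logistic loss plus $\|\beta\|_1/C$, so $C$ corresponds to $1/(n\lambda)$ in the notation of (1)).

```python
# A concrete instance of the penalized empirical risk (1) with logistic loss
# (a sketch; parameter names and sizes are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, p, s0 = 200, 300, 4
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s0] = 2.0
prob = 1.0 / (1.0 + np.exp(-X @ beta0))
y = rng.binomial(1, prob)

lam = np.sqrt(np.log(p) / n)               # typical order of the tuning parameter
fit = LogisticRegression(penalty="l1", C=1.0 / (n * lam),
                         solver="liblinear", fit_intercept=False).fit(X, y)
print("selected variables:", np.flatnonzero(fit.coef_.ravel()))
```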

31 We furthermore define the target as the minimizer of the theoretical risk,
    $f^0 := \arg\min_{f \in \mathcal{F}} P\rho_f.$
The excess risk is
    $\mathcal{E}(f) := P(\rho_f - \rho_{f^0}).$
Note that by definition, $\mathcal{E}(f) \ge 0$ for all $f \in \mathcal{F}$. We will mainly examine the excess risk $\mathcal{E}(\hat f)$ of the Lasso.

32 Definition. We say that the margin condition holds with strictly convex function $G$ if
    $\mathcal{E}(f) \ge G(\|f - f^0\|).$
In typical cases, the margin condition holds with quadratic function $G$, that is, $G(u) = cu^2$, $u \ge 0$, where $c$ is a positive constant.

Definition. Let $G$ be a strictly convex function on $[0, \infty)$. Its convex conjugate $H$ is defined as
    $H(v) = \sup_{u \ge 0} \{uv - G(u)\}, \quad v \ge 0.$

[Figure: $G(u)$, the line $uv$, the gap $uv - G(u)$, and the conjugate value $H(v)$.]
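For the quadratic case the conjugate is explicit; a short derivation (not on the slides): for $G(u) = cu^2$, $u \ge 0$, the function $u \mapsto uv - cu^2$ is maximized at $u = v/(2c)$, so
    $H(v) = \frac{v^2}{2c} - \frac{v^2}{4c} = \frac{v^2}{4c}, \qquad v \ge 0.$
In particular, $G(u) = u^2$ (i.e. $c = 1$) gives $H(v) = v^2/4$, as used in the corollary below.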

33 Set
    $Z_M := \sup_{\|\beta - \beta^0\|_1 \le M} \left| (P_n - P)(\rho_{f_\beta} - \rho_{f_{\beta^0}}) \right|, \qquad (2)$
and
    $M_0 := H\!\left( \frac{4\lambda \sqrt{s_0}}{\phi(S_0)} \right) \Big/ \lambda_0, \qquad (3)$
where $\phi(S_0) = \phi_{\mathrm{compatible}}(S_0)$. Set
    $\mathcal{T} := \{ Z_{M_0} \le \lambda_0 M_0 \}. \qquad (4)$

34 Theorem (Oracle inequality for the Lasso). Assume the compatibility condition and the margin condition with strictly convex function $G$. Take $\lambda \ge 8\lambda_0$. Then on the set $\mathcal{T}$ given in (4), we have
    $\mathcal{E}(\hat f) + \lambda \|\hat\beta - \beta^0\|_1 \le 4 H\!\left( \frac{4\lambda \sqrt{s_0}}{\phi(S_0)} \right).$

35 Corollary. Assume quadratic margin behavior, i.e., $G(u) = u^2$. Then $H(v) = v^2/4$, and we obtain on $\mathcal{T}$,
    $\mathcal{E}(\hat f) + \lambda \|\hat\beta - \beta^0\|_1 \le \frac{4\lambda^2 s_0}{\phi^2(S_0)}.$

36 $\ell_2$-rates

To derive rates for $\|\hat\beta - \beta^0\|_2$, we need a stronger compatibility condition.

Definition. We say that the $(S_0, 2s_0)$-restricted eigenvalue condition is satisfied, with constant $\phi = \phi(S_0, 2s_0) > 0$, if for all $\mathcal{N} \supset S_0$ with $|\mathcal{N}| = 2s_0$, and all $\beta \in \mathbb{R}^p$ that satisfy $\|\beta_{S_0^c}\|_1 \le 3\|\beta_{S_0}\|_1$ and $|\beta_j| \le \min_{k \in \mathcal{N} \setminus S_0} |\beta_k|$ for all $j \notin \mathcal{N}$, it holds that
    $\|\beta_{\mathcal{N}}\|_2 \le \|f_\beta\|/\phi.$

37 Lemma. Suppose the conditions of the previous theorem are met, but now with the stronger $(S_0, 2s_0)$-restricted eigenvalue condition. On $\mathcal{T}$,
    $\|\hat\beta - \beta^0\|_2^2 \le 16 \left( H\!\left( \frac{4\lambda \sqrt{s_0}}{\phi} \right) \right)^2 \Big/ (\lambda^2 s_0) + \frac{4\lambda^2 s_0}{\phi^4}.$
In the case of quadratic margin behavior, with $G(u) = u^2$, we then get on $\mathcal{T}$,
    $\|\hat\beta - \beta^0\|_2^2 \le C\, \frac{\lambda^2 s_0}{\phi^4}$
for a numerical constant $C$.

38 Theory for $\ell_1/\ell_2$-penalties

Group Lasso:
    $Y_i = \sum_{j=1}^p \left( \sum_{t=1}^T X^{(j)}_{i,t} \beta^0_{j,t} \right) + \epsilon_i, \quad i = 1, \dots, n,$
where the $\beta^0_j := (\beta^0_{j,1}, \dots, \beta^0_{j,T})^T$ have the sparsity property $\beta^0_j \equiv 0$ for most $j$.

$\ell_1/\ell_2$-penalty:
    $\|\beta\|_{2,1} := \sum_{j=1}^p \|\beta_j\|_2.$
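In code, the $\ell_1/\ell_2$ penalty is just a sum of group-wise Euclidean norms. A small helper (illustrative; it assumes the coefficient vector is stored group by group):

```python
# Helper computing the l1/l2 penalty ||beta||_{2,1} for p groups of size T.
import numpy as np

def l1_l2_norm(beta, p, T):
    """Sum over groups j = 1..p of the Euclidean norm of (beta_{j,1},...,beta_{j,T})."""
    return np.linalg.norm(beta.reshape(p, T), axis=1).sum()

beta = np.arange(12, dtype=float)          # p = 4 groups of size T = 3
print(l1_l2_norm(beta, p=4, T=3))
```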

39 Multivariate linear model:
    $Y_{i,t} = \sum_{j=1}^p X^{(j)}_{i,t} \beta^0_{j,t} + \epsilon_{i,t}, \quad i = 1, \dots, n,\ t = 1, \dots, T,$
with, for $\beta^0_j := (\beta^0_{j,1}, \dots, \beta^0_{j,T})^T$, the sparsity property $\beta^0_j \equiv 0$ for most $j$.

Linear model with time-varying coefficients:
    $Y_i(t) = \sum_{j=1}^p X^{(j)}_i(t) \beta^0_j(t) + \epsilon_i(t), \quad i = 1, \dots, n,\ t = 1, \dots, T,$
where the coefficients $\beta^0_j(\cdot)$ are smooth functions, with the sparsity property that most of the $\beta^0_j \equiv 0$.

40 High-dimensional additive model:
    $Y_i = \sum_{j=1}^p f^0_j(X^{(j)}_i) + \epsilon_i, \quad i = 1, \dots, n,$
where the $f^0_j$ are (non-parametric) smooth functions, with the sparsity property $f^0_j \equiv 0$ for most $j$.

41 Theorem. Consider the group Lasso
    $\hat\beta = \arg\min_\beta \left\{ \|Y - X\beta\|_2^2/n + \lambda \sqrt{T}\, \|\beta\|_{2,1} \right\},$
where $\lambda \ge 4\lambda_0$, with
    $\lambda_0 = \frac{2}{\sqrt{n}} \sqrt{1 + \sqrt{\frac{4x + 4\log p}{T}} + \frac{4x + 4\log p}{T}}.$
Then with probability at least $1 - \exp[-x]$, we have
    $\|X\hat\beta - f^0\|_2^2/n + \lambda \sqrt{T}\, \|\hat\beta - \beta^0\|_{2,1} \le \frac{24\lambda^2 T s_0}{\phi_0^2}.$

42 Theorem. Consider the smoothed group Lasso
    $\hat\beta := \arg\min_\beta \left\{ \|Y - X\beta\|_2^2/n + \lambda \|\beta\|_{2,1} + \lambda^2 \|B\beta\|_{2,1} \right\},$
where $\lambda \ge 4\lambda_0$. Then on
    $\mathcal{T} := \left\{ 2|\epsilon^T X\beta|/n \le \lambda_0 \|\beta\|_{2,1} + \lambda_0^2 \|B\beta\|_{2,1} \ \text{for all } \beta \right\},$
we have
    $\|\hat f - f^0\|_n^2 + \lambda\, \mathrm{pen}(\hat\beta - \beta^0)/2 \le 3\left\{ 16\lambda^2 s_0/\phi_0^2 + 2\lambda^2 \|B\beta^0\|_{2,1} \right\}.$

43 etc....
