(Part 1) High-dimensional statistics May / 41
1 Theory for the Lasso

Recall the linear model
$$Y_i = \sum_{j=1}^p \beta_j X_i^{(j)} + \epsilon_i, \quad i = 1, \ldots, n,$$
or, in matrix notation, $Y = X\beta + \epsilon$. To simplify, we assume that the design $X$ is fixed, and that $\epsilon$ is $\mathcal{N}(0, \sigma^2 I)$-distributed. We moreover assume that the linear model holds exactly, with some true parameter value $\beta^0$.
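As a quick illustration, a sparse instance of this model can be simulated as follows (the dimensions, sparsity level and noise level are my own choices, not the slides'):

```python
import numpy as np

# A sparse instance of the linear model Y = X beta0 + eps.
rng = np.random.default_rng(0)
n, p, s0 = 100, 200, 5            # high-dimensional: p > n
X = rng.standard_normal((n, p))   # the design is drawn once and then held fixed
beta0 = np.zeros(p)
beta0[:s0] = 1.0                  # only s0 coordinates of beta0 are non-zero
sigma = 0.5
eps = rng.normal(0.0, sigma, size=n)
Y = X @ beta0 + eps
print(Y.shape)
```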
2 What is an oracle inequality?

Suppose for the moment that $p \le n$ and that $X$ has full rank $p$. Consider the least squares estimator in the linear model,
$$\hat\beta_{\mathrm{LM}} := (X^T X)^{-1} X^T Y.$$
Then the prediction error $\|X(\hat\beta_{\mathrm{LM}} - \beta^0)\|_2^2 / \sigma^2$ is $\chi_p^2$-distributed. In particular, this means that
$$\frac{E\|X(\hat\beta_{\mathrm{LM}} - \beta^0)\|_2^2}{n} = \frac{\sigma^2}{n}\, p.$$
In words: each parameter $\beta_j^0$ is estimated with squared accuracy $\sigma^2/n$, $j = 1, \ldots, p$. The overall squared accuracy is then $(\sigma^2/n)\, p$.
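The $\chi_p^2$ calculation above can be checked by Monte Carlo; the sketch below (my own, with arbitrary dimensions) estimates $E\|X(\hat\beta_{\mathrm{LM}} - \beta^0)\|_2^2/\sigma^2$, which should come out close to $p$:

```python
import numpy as np

# Monte Carlo check that E ||X(beta_hat - beta0)||_2^2 / sigma^2 = p
# for least squares with p <= n and a full-rank fixed design.
rng = np.random.default_rng(1)
n, p, sigma = 50, 5, 1.0
X = rng.standard_normal((n, p))
beta0 = rng.standard_normal(p)
errs = []
for _ in range(2000):
    Y = X @ beta0 + rng.normal(0.0, sigma, n)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    errs.append(np.sum((X @ (beta_hat - beta0)) ** 2))
ratio = np.mean(errs) / sigma**2
print(ratio)   # close to p = 5
```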
3 Sparsity

We now turn to the situation where possibly $p > n$. The philosophy that will generally rescue us is to believe that in fact only a few, say $s_0$, of the $\beta_j^0$ are non-zero. We use the notation
$$S_0 := \{j : \beta_j^0 \ne 0\},$$
so that $s_0 = |S_0|$. We call $S_0$ the active set, and $s_0$ the sparsity index of $\beta^0$.
4 Notation

$$\beta_{j,S_0} := \beta_j \, \mathbf{1}\{j \in S_0\}, \qquad \beta_{j,S_0^c} := \beta_j \, \mathbf{1}\{j \notin S_0\}.$$
Clearly,
$$\beta = \beta_{S_0} + \beta_{S_0^c}, \quad \text{and} \quad \beta_{S_0^c}^0 = 0.$$
5 If we knew $S_0$, we could simply neglect all variables $X^{(j)}$ with $j \notin S_0$. Then, by the above argument, the overall squared accuracy would be $(\sigma^2/n)\, s_0$.

With $S_0$ unknown, we apply the $\ell_1$-penalty, i.e., the Lasso
$$\hat\beta := \arg\min_\beta \left\{ \|Y - X\beta\|_2^2/n + \lambda \|\beta\|_1 \right\}.$$

Definition: Sparsity oracle inequality. The sparsity constant $\phi_0$ is the largest value $\phi_0 > 0$ such that the Lasso $\hat\beta$ satisfies the $\phi_0$-sparsity oracle inequality
$$\|X(\hat\beta - \beta^0)\|_2^2/n + \lambda \|\hat\beta_{S_0^c}\|_1 \le \lambda^2 s_0 / \phi_0^2.$$
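A sketch of fitting this estimator with scikit-learn (the data and the choice of $\lambda$ are my own). Note that scikit-learn's `Lasso` minimizes $\|Y - Xb\|_2^2/(2n) + \alpha \|b\|_1$, so the objective above with penalty $\lambda$ corresponds to $\alpha = \lambda/2$:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Fit the Lasso; alpha = lambda / 2 matches the slide's objective up to a factor 2.
rng = np.random.default_rng(2)
n, p, s0 = 100, 200, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s0] = 2.0
Y = X @ beta0 + rng.normal(0.0, 0.5, n)

lam = 2.0 * np.sqrt(2.0 * np.log(p) / n)   # a theoretical-order choice for lambda
lasso = Lasso(alpha=lam / 2.0, fit_intercept=False)
lasso.fit(X, Y)
nnz = int(np.sum(lasso.coef_ != 0))
print(nnz)   # a sparse estimate: far fewer than p non-zeros
```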
6 A digression: the noiseless case

Let $\mathcal{X}$ be some measurable space, $Q$ be a probability measure on $\mathcal{X}$, and $\|\cdot\|$ be the $L_2(Q)$ norm. Consider a fixed dictionary of functions $\{\psi_j\}_{j=1}^p \subset L_2(Q)$, and consider linear functions
$$f_\beta(\cdot) = \sum_{j=1}^p \beta_j \psi_j(\cdot), \quad \beta \in \mathbb{R}^p.$$
Consider moreover a fixed target
$$f^0 := \sum_{j=1}^p \beta_j^0 \psi_j.$$
We let $S_0 := \{j : \beta_j^0 \ne 0\}$ be its active set, and $s_0 := |S_0|$ be the sparsity index of $f^0$.
7 For some fixed $\lambda > 0$, the Lasso for the noiseless problem is
$$\beta^* := \arg\min_\beta \left\{ \|f_\beta - f^0\|^2 + \lambda \|\beta\|_1 \right\},$$
where $\|\cdot\|_1$ is the $\ell_1$-norm. We write $f^* := f_{\beta^*}$ and let $S_*$ be the active set of the Lasso. The Gram matrix is
$$\Sigma := \int \psi \psi^T \, dQ.$$
8 We will need certain conditions on the Gram matrix to make the theory work. We require a certain compatibility of $\ell_1$-norms with $\ell_2$-norms.

Compatibility. Let $L > 0$ be some constant. The compatibility constant is
$$\phi_\Sigma^2(L, S_0) := \phi^2(L, S_0) := \min\left\{ s_0 \, \beta^T \Sigma \beta : \ \|\beta_{S_0}\|_1 = 1, \ \|\beta_{S_0^c}\|_1 \le L \right\}.$$
We say that the $(L, S_0)$-compatibility condition is met if $\phi(L, S_0) > 0$.
9 Back to the noisy case

Lemma (Basic Inequality). We have
$$\|X(\hat\beta - \beta^0)\|_2^2/n + 2\lambda \|\hat\beta\|_1 \le 2\epsilon^T X(\hat\beta - \beta^0)/n + 2\lambda \|\beta^0\|_1.$$
10 We introduce the set
$$\mathcal{T} := \left\{ \max_{1 \le j \le p} |\epsilon^T X^{(j)}|/n \le \lambda_0 \right\}.$$
We assume that $\lambda > \lambda_0$, to make sure that on $\mathcal{T}$ we can get rid of the random part of the problem.
11 Let us denote the diagonal elements of the Gram matrix $\hat\Sigma := X^T X/n$ by $\hat\sigma_j^2 := \hat\Sigma_{j,j}$, $j = 1, \ldots, p$.

Lemma. Suppose that $\sigma^2 = \hat\sigma_j^2 = 1$ for all $j$. Then we have for all $t > 0$, and for
$$\lambda_0 := \sqrt{\frac{2t + 2\log p}{n}},$$
that $P(\mathcal{T}) \ge 1 - 2\exp[-t]$.
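The lemma can be sanity-checked by simulation; the sketch below (my own $n$, $p$ and $t$) estimates $P(\mathcal{T})$ and compares it with the lower bound $1 - 2e^{-t}$:

```python
import numpy as np

# Monte Carlo check: with normalized columns and sigma = 1,
# P(max_j |eps^T X^(j)| / n <= lambda_0) >= 1 - 2 exp(-t).
rng = np.random.default_rng(3)
n, p, t = 100, 50, 1.0
X = rng.standard_normal((n, p))
X = X / np.sqrt((X**2).mean(axis=0))          # enforce hat-sigma_j^2 = 1
lam0 = np.sqrt((2.0 * t + 2.0 * np.log(p)) / n)
reps, hits = 2000, 0
for _ in range(reps):
    eps = rng.standard_normal(n)              # sigma^2 = 1
    hits += np.max(np.abs(eps @ X)) / n <= lam0
coverage = hits / reps
print(coverage, 1.0 - 2.0 * np.exp(-t))       # coverage dominates the bound
```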
12 Compatibility condition (noisy case). Let $L > 0$ be some constant. The compatibility constant is
$$\phi_{\hat\Sigma}^2(L, S_0) := \phi^2(L, S_0) := \min\left\{ s_0 \, \beta^T \hat\Sigma \beta : \ \|\beta_{S_0}\|_1 = 1, \ \|\beta_{S_0^c}\|_1 \le L \right\}.$$
We say that the $(L, S_0)$-compatibility condition is met if $\phi(L, S_0) > 0$.
13 Theorem. Suppose $\lambda > \lambda_0$ and that the compatibility condition holds for $S_0$, with
$$L = \frac{\lambda + \lambda_0}{\lambda - \lambda_0}.$$
Then on
$$\mathcal{T} := \left\{ \max_{1 \le j \le p} |\epsilon^T X^{(j)}|/n \le \lambda_0 \right\},$$
we have
$$\|X(\hat\beta - \beta^0)\|_n^2 \le 4(\lambda + \lambda_0)^2 s_0 / \phi^2(L, S_0),$$
$$\|\hat\beta_{S_0} - \beta^0\|_1 \le 2(\lambda + \lambda_0) s_0 / \phi^2(L, S_0),$$
and
$$\|\hat\beta_{S_0^c}\|_1 \le 2L(\lambda + \lambda_0) s_0 / \phi^2(L, S_0).$$
14 When does the compatibility condition hold?

[Diagram: a hierarchy of conditions implying oracle inequalities for prediction and estimation. The RIP, the weak $(S, 2s)$-RIP and the adaptive $(S, 2s)$-restricted regression condition imply the $(S, 2s)$-restricted eigenvalue condition; coherence and the adaptive $(S, s)$-restricted regression condition imply the $(S, s)$-restricted eigenvalue condition; these in turn imply $S$-compatibility. On the variable selection side, with $|S_* \setminus S| \le s$ resp. $|S_* \setminus S| = 0$: the weak $(S, 2s)$-irrepresentable, the $(S, 2s)$-irrepresentable and the $(S, s)$-uniform irrepresentable conditions.]
15 If $\Sigma$ is non-singular, the compatibility condition holds, with $\phi^2(S_0) \ge \Lambda_{\min}^2$, the latter being the smallest eigenvalue of $\Sigma$.

Example. Consider the matrix
$$\Sigma := (1 - \rho) I + \rho \iota \iota^T = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$$
with $0 < \rho < 1$, and $\iota := (1, \ldots, 1)^T$ a vector of 1's. Then the smallest eigenvalue of $\Sigma$ is $\Lambda_{\min}^2 = 1 - \rho$, so the compatibility condition holds with $\phi(S_0) \ge \sqrt{1 - \rho}$. (The uniform $S_0$-irrepresentable condition is met as well.)
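The eigenvalue claim in the example is easy to verify numerically (the values of $p$ and $\rho$ below are my own):

```python
import numpy as np

# The equicorrelated matrix Sigma = (1 - rho) I + rho * iota iota^T has
# smallest eigenvalue 1 - rho (multiplicity p - 1) and largest 1 + (p - 1) rho.
p, rho = 10, 0.3
Sigma = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))
eigs = np.linalg.eigvalsh(Sigma)
print(eigs.min())   # 1 - rho
print(eigs.max())   # 1 + (p - 1) * rho
```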
17 Geometric interpretation

Let $X_j \in \mathbb{R}^n$ denote the $j$-th column of $X$ ($j = 1, \ldots, p$). The set
$$\mathcal{A} := \{X\beta_S : \|\beta_S\|_1 = 1\}$$
is the convex hull of the vectors $\{\pm X_j\}_{j \in S}$ in $\mathbb{R}^n$. Likewise, the set
$$\mathcal{B} := \{X\beta_{S^c} : \|\beta_{S^c}\|_1 \le L\}$$
is the convex hull, including interior, of the vectors $\{\pm L X_j\}_{j \in S^c}$. The $\ell_1$-eigenvalue $\delta(L, S)$ is the distance between these two sets.

[Figure: the sets $\mathcal{A}$ and $\mathcal{B}$, with the distance $\delta(L, S)$ between them.]
18 We note that:

- if $L$ is large, the $\ell_1$-eigenvalue will be small;
- it will also be small if the vectors in $S$ exhibit strong correlation with those in $S^c$;
- when the vectors in $\{X_j\}_{j \in S}$ are linearly dependent, it holds that
$$\{X\beta_S : \|\beta_S\|_1 = 1\} = \{X\beta_S : \|\beta_S\|_1 \le 1\},$$
and hence then $\delta(L, S) = 0$.
19 The difference between the compatibility constant and the squared $\ell_1$-eigenvalue lies only in the normalization by the size $|S|$ of the set $S$. This normalization is inspired by the orthogonal case, which we detail in the following example.

Example. Suppose that the columns of $X$ are all orthogonal: $X_j^T X_k = 0$ for all $j \ne k$. Then $\delta(L, S) = 1/\sqrt{|S|}$ and $\phi(L, S) = 1$.
20 Let $S_\beta := \{j : \beta_j \ne 0\}$. We call $S_\beta$ the active set of $\beta$, and $|S_\beta|$ its sparsity index. More generally, we call $|S|$ the sparsity index of the set $S$.

Definition. For a set $S$ and constant $L > 0$, the effective sparsity $\Gamma^2(L, S)$ is the inverse of the squared $\ell_1$-eigenvalue, that is,
$$\Gamma^2(L, S) = \frac{1}{\delta^2(L, S)}.$$
21 Example. As a simple numerical example, let us suppose $n = 2$, $p = 3$, and $S = \{3\}$. For a suitable $2 \times 3$ design $X$, the $\ell_1$-eigenvalue $\delta(L, S)$ is equal to the distance of $X_3$ to the line that connects $L X_1$ and $L X_2$, that is,
$$\delta(L, S) = \max\{(5 - L)/\sqrt{26}, \, 0\}.$$
Hence, for example for $L = 3$, the effective sparsity is $\Gamma^2(3, S) = 13/2$. Alternatively, interchanging the entries 5 and 12 in the design gives for example $\delta(3, S) = 0$ and hence $\Gamma^2(3, S) = \infty$. This is due to the sharper angle between $X_1$ and $X_3$.
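A toy computation in the spirit of this example (my own design, not the slide's): for $S = \{3\}$ in $\mathbb{R}^2$ with $X_1, X_2$ the standard basis, $\delta(L, S)$ is the Euclidean distance from the point $X_3$ to the diamond $\mathcal{B} = \{b_1 X_1 + b_2 X_2 : |b_1| + |b_2| \le L\}$; the helper `dist_to_diamond` is my own:

```python
import numpy as np

def dist_to_diamond(point, L):
    """Euclidean distance from a 2-d point to the diamond {(u, v): |u|+|v| <= L}."""
    a, b = np.abs(point)                     # by symmetry, work in the first quadrant
    if a + b <= L:
        return 0.0                           # the point lies inside the diamond
    # Orthogonal projection onto the facet u + v = L ...
    proj = np.array([a, b]) - (a + b - L) / 2.0 * np.ones(2)
    proj = np.clip(proj, 0.0, None)          # ... clipped to the facet's endpoints
    proj[proj.argmax()] = L - proj.min()     # keep |u| + |v| = L after clipping
    return float(np.linalg.norm(np.array([a, b]) - proj))

x3 = np.array([0.6, 0.8])                    # a unit vector playing the role of X_3
d_small = dist_to_diamond(x3, 0.5)           # small L: the sets are separated
d_large = dist_to_diamond(x3, 2.0)           # large L: X_3 inside, delta = 0
print(d_small, d_large)
```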
22 The compatibility condition is slightly weaker than the restricted eigenvalue condition of Bickel et al. [2009]. The restricted isometry property of Candes [2005] implies the restricted eigenvalue condition.
24 Approximating the Gram matrix

For two (positive semi-definite) matrices $\Sigma_0$ and $\Sigma_1$, we define the supremum distance
$$\|\Sigma_1 - \Sigma_0\|_\infty := \max_{j,k} |(\Sigma_1)_{j,k} - (\Sigma_0)_{j,k}|.$$

Lemma. Assume $\|\beta_{S_0^c}\|_1 \le 3\|\beta_{S_0}\|_1$ and $\|\Sigma_1 - \Sigma_0\|_\infty \le \lambda$. Then
$$\left| \frac{\|f_\beta\|_{\Sigma_1}^2}{\|f_\beta\|_{\Sigma_0}^2} - 1 \right| \le \frac{16 \lambda s_0}{\phi_{\mathrm{compatible}}^2(\Sigma_0, S_0)}.$$
25 Corollary. We have
$$\phi_{\Sigma_1}(3, S_0) \ge \phi_{\Sigma_0}(3, S_0) - 4\sqrt{\|\Sigma_0 - \Sigma_1\|_\infty \, s_0}.$$
26 Example. Suppose we have a Gaussian random matrix $\hat\Sigma := X^T X/n = (\hat\sigma_{j,k})$, where $X = (X_{i,j})$ is an $n \times p$ matrix with i.i.d. $\mathcal{N}(0, 1)$-distributed entries in each column. For all $t > 0$, and for
$$\lambda(t) := \sqrt{\frac{4t + 8\log p}{n}} + \frac{4t + 8\log p}{n},$$
one has the inequality
$$P\left( \|\hat\Sigma - \Sigma\|_\infty \ge \lambda(t) \right) \le 2\exp[-t].$$
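The sup-norm bound can be sanity-checked by simulation (my own $n$, $p$ and $t$); here $\Sigma = E\hat\Sigma = I$ for a standard Gaussian design:

```python
import numpy as np

# Monte Carlo check: ||hat-Sigma - I||_inf <= lambda(t) with probability
# at least 1 - 2 exp(-t) for an n x p matrix of i.i.d. N(0, 1) entries.
rng = np.random.default_rng(4)
n, p, t = 200, 50, 1.0
a = (4.0 * t + 8.0 * np.log(p)) / n
lam_t = np.sqrt(a) + a
reps, hits = 200, 0
for _ in range(reps):
    X = rng.standard_normal((n, p))
    Shat = X.T @ X / n
    hits += np.max(np.abs(Shat - np.eye(p))) <= lam_t
coverage = hits / reps
print(coverage, 1.0 - 2.0 * np.exp(-t))   # coverage dominates the bound
```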
27 Example (continued). Hence, we know for example that with probability at least $1 - 2\exp[-t]$,
$$\phi_{\mathrm{compatible}}(\hat\Sigma, S_0) \ge \Lambda_{\min}(\Sigma) - 4\sqrt{\lambda(t) s_0}.$$
This leads to a bound on the sparsity of the form $s_0 = o(1/\lambda(t))$, which roughly says that $s_0$ should be of small order $\sqrt{n/\log p}$.
28 Definition. We call a random variable $X$ sub-Gaussian if for some constants $K$ and $\sigma_0^2$,
$$E\exp[X^2/K^2] \le \sigma_0^2.$$

Theorem. Suppose $X_1, \ldots, X_n$ are uniformly sub-Gaussian with constants $K$ and $\sigma_0^2$. Then for a constant $\eta = \eta(K, \sigma_0^2)$, it holds that
$$\beta^T \hat\Sigma \beta \ge \frac{1}{3}\, \beta^T \Sigma \beta - \frac{t + \log p}{n}\, \|\beta\|_1^2 / \eta^2 \quad \text{for all } \beta,$$
with probability at least $1 - 2\exp[-t]$. See Raskutti, Wainwright and Yu [2010].
29 General convex loss

Consider data $\{Z_i\}_{i=1}^n$, with $Z_i$ in some space $\mathcal{Z}$. Consider a linear space $\mathcal{F} := \{f_\beta(\cdot) = \sum_{j=1}^p \beta_j \psi_j(\cdot) : \beta \in \mathbb{R}^p\}$. For each $f \in \mathcal{F}$, let $\rho_f : \mathcal{Z} \to \mathbb{R}$ be a loss function. We assume that the map $f \mapsto \rho_f(z)$ is convex for all $z \in \mathcal{Z}$.

For example, $Z_i = (X_i, Y_i)$, and $\rho$ is quadratic loss or logistic loss,
$$\rho_f(\cdot, y) = (y - f(\cdot))^2, \qquad \rho_f(\cdot, y) = -y f(\cdot) + \log(1 + \exp[f(\cdot)]),$$
or minus log-likelihood loss $\rho_f = -f + \log \int \exp[f]$, etc.
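For the logistic-loss case, an $\ell_1$-penalized estimator of this kind can be fit with scikit-learn (the data are my own; scikit-learn's `C` is an inverse penalty strength, roughly $C \approx 1/(n\lambda)$):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# An l1-penalized M-estimator with logistic loss: one convex-loss
# instance of the penalized estimator discussed above.
rng = np.random.default_rng(5)
n, p, s0 = 200, 50, 3
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s0] = 2.0
prob = 1.0 / (1.0 + np.exp(-(X @ beta0)))
y = rng.binomial(1, prob)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1,
                         fit_intercept=False)
clf.fit(X, y)
nnz = int(np.sum(clf.coef_ != 0))
print(nnz)   # a sparse coefficient vector
```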
30 We denote, for a function $\rho : \mathcal{Z} \to \mathbb{R}$, the empirical average by
$$P_n \rho := \sum_{i=1}^n \rho(Z_i)/n,$$
and the theoretical mean by
$$P\rho := \sum_{i=1}^n E\rho(Z_i)/n.$$
The Lasso is
$$\hat\beta = \arg\min_\beta \left\{ P_n \rho_{f_\beta} + \lambda \|\beta\|_1 \right\}. \tag{1}$$
We write $\hat f = f_{\hat\beta}$.
31 We furthermore define the target as the minimizer of the theoretical risk,
$$f^0 := \arg\min_{f \in \mathcal{F}} P\rho_f.$$
The excess risk is
$$\mathcal{E}(f) := P(\rho_f - \rho_{f^0}).$$
Note that by definition, $\mathcal{E}(f) \ge 0$ for all $f \in \mathcal{F}$. We will mainly examine the excess risk $\mathcal{E}(\hat f)$ of the Lasso.
32 Definition. We say that the margin condition holds with strictly convex function $G$ if
$$\mathcal{E}(f) \ge G(\|f - f^0\|).$$
In typical cases, the margin condition holds with quadratic function $G$, that is, $G(u) = cu^2$, $u \ge 0$, where $c$ is a positive constant.

Definition. Let $G$ be a strictly convex function on $[0, \infty)$. Its convex conjugate $H$ is defined as
$$H(v) = \sup_u \{uv - G(u)\}, \quad v \ge 0.$$

[Figure: the functions $G(u)$, $uv - G(u)$, and $H(v)$.]
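For the typical quadratic case $G(u) = u^2$, the conjugate is $H(v) = v^2/4$; a quick grid check of my own:

```python
import numpy as np

# Check that the convex conjugate of G(u) = u^2 is H(v) = v^2 / 4:
# H(v) = sup_u { u v - G(u) }, attained at u = v / 2.
u = np.linspace(0.0, 10.0, 200001)
for v in (0.5, 1.0, 3.0):
    H_num = np.max(u * v - u**2)
    print(v, H_num, v**2 / 4.0)
```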
33 Set
$$Z_M := \sup_{\|\beta - \beta^0\|_1 \le M} |(P_n - P)(\rho_{f_\beta} - \rho_{f_{\beta^0}})|, \tag{2}$$
and
$$M_0 := H\left( \frac{4\lambda \sqrt{s_0}}{\phi(S_0)} \right) \big/ \lambda_0, \tag{3}$$
where $\phi(S_0) = \phi_{\mathrm{compatible}}(S_0)$. Set
$$\mathcal{T} := \{Z_{M_0} \le \lambda_0 M_0\}. \tag{4}$$
34 Theorem (Oracle inequality for the Lasso). Assume the compatibility condition and the margin condition with strictly convex function $G$. Take $\lambda \ge 8\lambda_0$. Then on the set $\mathcal{T}$ given in (4), we have
$$\mathcal{E}(\hat f) + \lambda \|\hat\beta - \beta^0\|_1 \le 4 H\left( \frac{4\lambda \sqrt{s_0}}{\phi(S_0)} \right).$$
35 Corollary. Assume quadratic margin behavior, i.e., $G(u) = u^2$. Then $H(v) = v^2/4$, and we obtain on $\mathcal{T}$,
$$\mathcal{E}(\hat f) + \lambda \|\hat\beta - \beta^0\|_1 \le \frac{4\lambda^2 s_0}{\phi^2(S_0)}.$$
36 $\ell_2$-rates

To derive rates for $\|\hat\beta - \beta^0\|_2$, we need a stronger compatibility condition.

Definition. We say that the $(S_0, 2s_0)$-restricted eigenvalue condition is satisfied, with constant $\phi = \phi(S_0, 2s_0) > 0$, if for all $\mathcal{N} \supset S_0$ with $|\mathcal{N}| = 2s_0$, and all $\beta \in \mathbb{R}^p$ that satisfy $\|\beta_{S_0^c}\|_1 \le 3\|\beta_{S_0}\|_1$ and
$$|\beta_j| \le \min_{k \in \mathcal{N} \setminus S_0} |\beta_k|, \quad j \notin \mathcal{N},$$
it holds that $\|\beta_{\mathcal{N}}\|_2 \le \|f_\beta\|/\phi$.
37 Lemma. Suppose the conditions of the previous theorem are met, but now with the stronger $(S_0, 2s_0)$-restricted eigenvalue condition. On $\mathcal{T}$,
$$\|\hat\beta - \beta^0\|_2^2 \le 16 \left( H\left( \frac{4\lambda \sqrt{s_0}}{\phi^2} \right) \right)^2 \big/ (\lambda^2 s_0) + \frac{\lambda^2 s_0}{4\phi^4}.$$
In the case of quadratic margin behavior, with $G(u) = u^2$, we then get on $\mathcal{T}$ a bound of the form
$$\|\hat\beta - \beta^0\|_2^2 \le C\, \frac{\lambda^2 s_0}{\phi^4},$$
for a constant $C$.
38 Theory for $\ell_1/\ell_2$-penalties

Group Lasso:
$$Y_i = \sum_{j=1}^p \sum_{t=1}^T X_{i,t}^{(j)} \beta_{j,t}^0 + \epsilon_i, \quad i = 1, \ldots, n,$$
where the $\beta_j^0 := (\beta_{j,1}^0, \ldots, \beta_{j,T}^0)^T$ have the sparsity property $\beta_j^0 \equiv 0$ for most $j$.

The $\ell_1/\ell_2$-penalty:
$$\|\beta\|_{2,1} := \sum_{j=1}^p \|\beta_j\|_2.$$
39 Multivariate linear model:
$$Y_{i,t} = \sum_{j=1}^p X_{i,t}^{(j)} \beta_{j,t}^0 + \epsilon_{i,t}, \quad i = 1, \ldots, n, \ t = 1, \ldots, T,$$
with, for $\beta_j^0 := (\beta_{j,1}^0, \ldots, \beta_{j,T}^0)^T$, the sparsity property $\beta_j^0 \equiv 0$ for most $j$.

Linear model with time-varying coefficients:
$$Y_i(t) = \sum_{j=1}^p X_i^{(j)}(t)\, \beta_j^0(t) + \epsilon_i(t), \quad i = 1, \ldots, n, \ t = 1, \ldots, T,$$
where the coefficients $\beta_j^0(\cdot)$ are smooth functions, with the sparsity property that most of the $\beta_j^0 \equiv 0$.
40 High-dimensional additive model:
$$Y_i = \sum_{j=1}^p f_j^0(X_i^{(j)}) + \epsilon_i, \quad i = 1, \ldots, n,$$
where the $f_j^0$ are (non-parametric) smooth functions, with the sparsity property $f_j^0 \equiv 0$ for most $j$.
41 Theorem. Consider the group Lasso
$$\hat\beta = \arg\min_\beta \left\{ \|Y - X\beta\|_2^2/n + \lambda \sqrt{T}\, \|\beta\|_{2,1} \right\},$$
where $\lambda \ge 4\lambda_0$, with
$$\lambda_0 = \frac{2}{\sqrt{n}} \sqrt{1 + \sqrt{\frac{4x + 4\log p}{T}} + \frac{4x + 4\log p}{T}}.$$
Then with probability at least $1 - \exp[-x]$, we have
$$\|X\hat\beta - f^0\|_2^2/n + \lambda \sqrt{T}\, \|\hat\beta - \beta^0\|_{2,1} \le 24 \lambda^2 T s_0 / \phi_0^2.$$
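Scikit-learn has no group Lasso, so here is a minimal proximal-gradient sketch of the estimator (my own implementation and data; `group_lasso`, the step-size rule and the value of $\lambda$ are assumptions for illustration, not the slides' algorithm):

```python
import numpy as np

def group_lasso(X, Y, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) for ||Y - X b||_2^2 / n + lam * sum_j ||b_j||_2."""
    n, p = X.shape
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 / n)  # 1 / L for the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ b - Y) / n
        z = b - step * grad
        for g in groups:                                # group-wise soft thresholding
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm <= lam * step else (1.0 - lam * step / norm) * z[g]
        b = z
    return b

rng = np.random.default_rng(6)
n, p, T = 100, 40, 4                                    # 10 groups of size T = 4
groups = [list(range(j, j + T)) for j in range(0, p, T)]
beta0 = np.zeros(p)
beta0[:T] = 1.0                                         # only the first group is active
X = rng.standard_normal((n, p))
Y = X @ beta0 + rng.normal(0.0, 0.5, n)
b = group_lasso(X, Y, groups, lam=0.3)
active = [j for j, g in enumerate(groups) if np.linalg.norm(b[g]) > 1e-8]
print(active)   # the estimate is sparse at the group level
```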
42 Theorem. Consider the smoothed group Lasso
$$\hat\beta := \arg\min_\beta \left\{ \|Y - X\beta\|_2^2/n + \lambda \|\beta\|_{2,1} + \lambda^2 \|B\beta\|_{2,1} \right\},$$
where $\lambda \ge 4\lambda_0$. Then on
$$\mathcal{T} := \left\{ 2|\epsilon^T X\beta|/n \le \lambda_0 \|\beta\|_{2,1} + \lambda_0^2 \|B\beta\|_{2,1} \ \text{for all } \beta \right\},$$
we have
$$\|\hat f - f^0\|_n^2 + \lambda\, \mathrm{pen}(\hat\beta - \beta^0)/2 \le 3\left\{ 16\lambda^2 s_0/\phi_0^2 + 2\lambda^2 \|B\beta^0\|_{2,1} \right\}.$$
etc....
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationarxiv: v2 [math.st] 12 Feb 2008
arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig
More informationLecture: Introduction to Compressed Sensing Sparse Recovery Guarantees
Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin
More informationLasso, Ridge, and Elastic Net
Lasso, Ridge, and Elastic Net David Rosenberg New York University February 7, 2017 David Rosenberg (New York University) DS-GA 1003 February 7, 2017 1 / 29 Linearly Dependent Features Linearly Dependent
More informationHigh-dimensional statistics: Some progress and challenges ahead
High-dimensional statistics: Some progress and challenges ahead Martin Wainwright UC Berkeley Departments of Statistics, and EECS University College, London Master Class: Lecture Joint work with: Alekh
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationarxiv: v1 [stat.ml] 3 Nov 2010
Preprint The Lasso under Heteroscedasticity Jinzhu Jia, Karl Rohe and Bin Yu, arxiv:0.06v stat.ml 3 Nov 00 Department of Statistics and Department of EECS University of California, Berkeley Abstract: The
More informationUniversal low-rank matrix recovery from Pauli measurements
Universal low-rank matrix recovery from Pauli measurements Yi-Kai Liu Applied and Computational Mathematics Division National Institute of Standards and Technology Gaithersburg, MD, USA yi-kai.liu@nist.gov
More informationHigh-dimensional Ordinary Least-squares Projection for Screening Variables
1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor
More information10-725/36-725: Convex Optimization Prerequisite Topics
10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the
More informationMulti-stage convex relaxation approach for low-rank structured PSD matrix recovery
Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Department of Mathematics & Risk Management Institute National University of Singapore (Based on a joint work with Shujun
More informationHigh-dimensional Statistical Models
High-dimensional Statistical Models Pradeep Ravikumar UT Austin MLSS 2014 1 Curse of Dimensionality Statistical Learning: Given n observations from p(x; θ ), where θ R p, recover signal/parameter θ. For
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationStatistical Machine Learning for Structured and High Dimensional Data
Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,
More informationOptimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators
Electronic Journal of Statistics ISSN: 935-7524 arxiv: arxiv:503.0388 Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators Yuchen Zhang, Martin J. Wainwright
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationConcentration Inequalities for Random Matrices
Concentration Inequalities for Random Matrices M. Ledoux Institut de Mathématiques de Toulouse, France exponential tail inequalities classical theme in probability and statistics quantify the asymptotic
More informationLecture 2 Part 1 Optimization
Lecture 2 Part 1 Optimization (January 16, 2015) Mu Zhu University of Waterloo Need for Optimization E(y x), P(y x) want to go after them first, model some examples last week then, estimate didn t discuss
More informationarxiv: v1 [math.st] 10 Sep 2015
Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees Department of Statistics Yudong Chen Martin J. Wainwright, Department of Electrical Engineering and
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationA REMARK ON THE LASSO AND THE DANTZIG SELECTOR
A REMARK ON THE LAO AND THE DANTZIG ELECTOR YOHANN DE CATRO Abstract This article investigates a new parameter for the high-dimensional regression with noise: the distortion This latter has attracted a
More informationRobust high-dimensional linear regression: A statistical perspective
Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,
More informationUncertainty quantification in high-dimensional statistics
Uncertainty quantification in high-dimensional statistics Peter Bühlmann ETH Zürich based on joint work with Sara van de Geer Nicolai Meinshausen Lukas Meier 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationBinary matrix completion
Binary matrix completion Yaniv Plan University of Michigan SAMSI, LDHD workshop, 2013 Joint work with (a) Mark Davenport (b) Ewout van den Berg (c) Mary Wootters Yaniv Plan (U. Mich.) Binary matrix completion
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationStepwise Searching for Feature Variables in High-Dimensional Linear Regression
Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Qiwei Yao Department of Statistics, London School of Economics q.yao@lse.ac.uk Joint work with: Hongzhi An, Chinese Academy
More information