
ECE598: Information-theoretic methods in high-dimensional statistics            Spring 2016

Lecture 19: Minimax rates for sparse linear regression

Lecturer: Yihong Wu        Scribe: Subhadeep Paul, April 13/14, 2016

In the last lecture we analyzed the $k$-sparse Gaussian location model in high dimension (the denoising problem) and proved the minimax rate for estimating the location parameter. In this lecture we extend those ideas to sparse linear regression in high dimension. We prove a minimax lower bound and then obtain upper bounds on the risk of a few procedures.

19.1 Problem setup: Sparse linear regression

The sparse linear regression model is

    Y_{n \times 1} = X_{n \times p}\, \theta_{p \times 1} + Z, \qquad Z \sim N(0, I_n),        (19.1)

where $X \in \mathbb{R}^{n \times p}$ is the design matrix and $\theta \in \mathbb{R}^p$ is an unknown $k$-sparse parameter vector. In this lecture we are concerned with the case where $n \ll p$ (but $n \gtrsim k$), i.e., we have more predictors in the design matrix than we have samples.

Interpretation: $Y$ is a noisy linear combination of the columns of the design matrix $X$. The goal is to estimate $\theta$, given $Y$ and $X$. Note that the system has more unknowns than equations and hence is underdetermined even without the noise. Estimation is made possible by the $k$-sparsity structure.

Note: We consider here random design matrices only. More precisely,

    X_{ij} \overset{\text{i.i.d.}}{\sim} N(0, 1/n),

so that the columns have roughly unit norm.
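As a concrete illustration of this setup, here is a minimal Python sketch (not part of the original notes; the helper name make_sparse_instance and the chosen values of n, p, k are only illustrative) that generates one instance of model (19.1) with the random design $X_{ij} \sim N(0, 1/n)$:

# Illustrative sketch, not from the original notes.
import numpy as np

def make_sparse_instance(n, p, k, rng):
    """Draw (X, theta, Y) from model (19.1) with X_ij ~ N(0, 1/n)."""
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))   # columns have roughly unit norm
    theta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)        # k-sparse parameter
    theta[support] = rng.normal(0.0, 1.0, size=k)
    Z = rng.normal(0.0, 1.0, size=n)                      # standard Gaussian noise
    Y = X @ theta + Z
    return X, theta, Y

rng = np.random.default_rng(0)
X, theta, Y = make_sparse_instance(n=100, p=1000, k=5, rng=rng)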

The next theorem gives a lower bound on the minimax risk for estimating $\theta$ in the $k$-sparse regression model.

Theorem 19.1. The minimax risk for estimating $\theta$ in the model defined by (19.1) is lower bounded by

    R^* = R^*(p, k, n) = \inf_{\hat\theta} \sup_{\|\theta\|_0 \le k} \mathbb{E}\, \|\hat\theta - \theta\|_2^2 \gtrsim k \log \frac{ep}{k}, \qquad \text{for all } n.

Proof. Note that the pairs $(X_i, Y_i)$, where $X_i$ denotes the $i$-th row of $X$, are i.i.d. samples from a distribution $P_\theta$. We show that the KL divergence between two parameter values is exactly the same as in the $p$-dimensional Gaussian location model, namely $D(P_{\theta_0} \| P_{\theta_1}) = \frac{1}{2}\|\theta_0 - \theta_1\|_2^2$. Indeed, for a single pair,

    D(P_{X_i, Y_i | \theta_0} \| P_{X_i, Y_i | \theta_1})
      = \mathbb{E}_{X_i}\big[ D(P_{Y_i | X_i, \theta_0} \| P_{Y_i | X_i, \theta_1}) \big]
      = \mathbb{E}_{X_i}\big[ D(N(\langle X_i, \theta_0 \rangle, 1) \| N(\langle X_i, \theta_1 \rangle, 1)) \big]
      = \mathbb{E}_{X_i}\big[ \tfrac{1}{2} \langle X_i, \theta_0 - \theta_1 \rangle^2 \big]
      = \tfrac{1}{2}\, \mathbb{E}_{X_i}\big[ (\theta_0 - \theta_1)^\top X_i X_i^\top (\theta_0 - \theta_1) \big]
      = \tfrac{1}{2}\, (\theta_0 - \theta_1)^\top \mathbb{E}[X_i X_i^\top] (\theta_0 - \theta_1)
      = \tfrac{1}{2n} \|\theta_0 - \theta_1\|_2^2,

since $\mathbb{E}[X_i X_i^\top] = \frac{1}{n} I_p$. Hence, tensorizing over the $n$ i.i.d. pairs,

    D(P_{\theta_0} \| P_{\theta_1}) = D\big(P_{X_i, Y_i | \theta_0}^{\otimes n} \,\big\|\, P_{X_i, Y_i | \theta_1}^{\otimes n}\big) = \tfrac{1}{2} \|\theta_0 - \theta_1\|_2^2.

The result then follows from the analysis of the Gaussian location model in the previous lecture.

From the previous discussion we have $R^* \gtrsim k \log\frac{ep}{k}$ for any $n$, which is identical to the minimax rate of the denoising problem. This is reasonable: even with full observation $n \asymp p$, which roughly corresponds to the denoising problem, we cannot beat this rate. Surprisingly, as long as $n \gtrsim k \log\frac{ep}{k}$, the denoising rate is attainable and we have $R^* \asymp k \log\frac{ep}{k}$, achieved by, e.g., the maximum likelihood estimator (MLE). This is proved in Section 19.2. Note that the MLE is computationally expensive. A computable alternative is the Dantzig selector [CT07]. As analyzed in Section 19.3, it is guaranteed to achieve the rate $R \lesssim k \log p$ as long as $n \gtrsim k \log\frac{ep}{k}$. This falls slightly short of the optimal rate. However, unlike the MLE, the procedure is completely adaptive and can be cast as a linear program. More recently a procedure called SLOPE [SC15] has been proposed which achieves the optimal rate; in particular its risk coincides with the minimax rate with sharp constant, namely $R^* = (2 + o(1))\, k \log\frac{p}{k}$ as $p \to \infty$ and $k = o(p)$, provided that $n \gg k \log\frac{ep}{k}$.

19.2 Analysis of the MLE

The MLE in this case is defined as a solution (which may or may not be unique) of the optimization problem

    \hat\theta_{\mathrm{MLE}} \in \arg\min_{\|\theta\|_0 \le k} \|Y - X\theta\|_2^2.        (19.2)

Unfortunately this optimization problem can in general only be solved by exhaustive search over supports, which is NP-hard in the worst case.
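To make the exhaustive-search nature of (19.2) concrete, here is a brute-force sketch (my own illustration, not from the notes; it is only feasible for small $p$ and $k$) that enumerates all supports of size $k$ and solves a least-squares problem on each:

# Illustrative sketch, not from the original notes.
from itertools import combinations
import numpy as np

def mle_exhaustive(X, Y, k):
    """Brute-force solver of (19.2): minimize ||Y - X theta||_2^2 over k-sparse theta."""
    n, p = X.shape
    best_rss, best_theta = np.inf, np.zeros(p)
    for support in combinations(range(p), k):
        S = list(support)
        # Unrestricted least squares on the candidate support S.
        coef, *_ = np.linalg.lstsq(X[:, S], Y, rcond=None)
        rss = np.sum((Y - X[:, S] @ coef) ** 2)
        if rss < best_rss:
            best_rss = rss
            best_theta = np.zeros(p)
            best_theta[S] = coef
    return best_theta

# Tiny example; the search over C(p, k) supports is exponential in general.
rng = np.random.default_rng(1)
n, p, k = 30, 12, 2
X = rng.normal(0, 1 / np.sqrt(n), (n, p))
theta = np.zeros(p); theta[[1, 7]] = [3.0, -2.0]
Y = X @ theta + rng.normal(size=n)
theta_hat = mle_exhaustive(X, Y, k)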

Theorem 19.2. Whenever $n \ge C k \log\frac{ep}{k}$ for a sufficiently large constant $C$, for every $\theta \in B_0(k)$,

    \|\hat\theta_{\mathrm{MLE}} - \theta\|_2^2 \lesssim k \log\frac{ep}{k},        (19.3)

    \|X(\hat\theta_{\mathrm{MLE}} - \theta)\|_2^2 \lesssim k \log\frac{ep}{k},        (19.4)

hold with high probability.

Proof. Since $\hat\theta_{\mathrm{MLE}}$ is a minimizer, we have $\|Y - X\hat\theta_{\mathrm{MLE}}\|_2^2 \le \|Y - X\theta\|_2^2 = \|Z\|_2^2$. On the left-hand side,

    \|Y - X\hat\theta_{\mathrm{MLE}}\|_2^2 = \|Y - X\theta + X\theta - X\hat\theta_{\mathrm{MLE}}\|_2^2 = \|Z - Xh\|_2^2,

where $h = \hat\theta_{\mathrm{MLE}} - \theta$. Hence $\|Z - Xh\|_2^2 \le \|Z\|_2^2$, which leads to the basic inequality

    \|Xh\|_2^2 \le 2\langle Z, Xh \rangle = 2 \|h\|_2 \Big\langle Z, X \frac{h}{\|h\|_2} \Big\rangle \le 2 \|h\|_2 \sup_{u \in S^{p-1} \cap B_0(2k)} \langle Z, Xu \rangle,

where we used that $h$ is $2k$-sparse. Note that the left-hand side is not the estimation error but the prediction error $\|Xh\|_2^2 = \|X\hat\theta_{\mathrm{MLE}} - X\theta\|_2^2$. Hence, to conclude both (19.3) and (19.4) from the basic inequality, it suffices to show that, with high probability,

(a) $\|h\|_2 \lesssim \|Xh\|_2$ for all $h \in B_0(2k)$ (a restricted isometry property);

(b) $\sup_{u \in S^{p-1} \cap B_0(2k)} \langle Z, Xu \rangle \lesssim \sqrt{k \log\frac{ep}{k}}$.

We first prove (b). Since $Z$ is independent of $X$, conditionally on $Z$ we have $X^\top Z \sim N\big(0, \frac{\|Z\|_2^2}{n} I_p\big)$, and therefore

    \sup_{u \in S^{p-1} \cap B_0(2k)} \langle Z, Xu \rangle = \sup_{u \in S^{p-1} \cap B_0(2k)} \langle X^\top Z, u \rangle \overset{d}{=} \frac{\|Z\|_2}{\sqrt{n}} \sup_{u \in S^{p-1} \cap B_0(2k)} \langle g, u \rangle, \qquad g \sim N(0, I_p),

which concentrates around $\frac{\|Z\|_2}{\sqrt{n}}\, w(G)$, where $w(G)$ is the Gaussian width of the set $S^{p-1} \cap B_0(2k)$. From the last lecture we know $w(G) \lesssim \sqrt{k \log\frac{ep}{k}}$, and $\|Z\|_2 \lesssim \sqrt{n}$ with high probability, which proves (b).

For (a) we will show that

    \inf_{h \ne 0,\; h \in B_0(2k)} \frac{\|Xh\|_2}{\|h\|_2} \ge c \qquad \text{if } n \gtrsim k \log\frac{ep}{k},

where $c$ is a constant. First note that for any matrix $A$,

    \inf_{u \ne 0} \frac{\|Au\|_2}{\|u\|_2} = \sigma_{\min}(A).

For any feasible $h$, $Xh = X_J h_J$, where $J = \mathrm{supp}(h)$ (so $|J| \le 2k$) and $X_J$ is the $n \times |J|$ matrix whose columns are the columns of $X$ indexed by $J$. Then

    \inf_{h \ne 0,\; h \in B_0(2k)} \frac{\|Xh\|_2}{\|h\|_2} = \min_{|J| \le 2k} \sigma_{\min}(X_J).

For a fixed $J$, $\sigma_{\min}(X_J)$ concentrates around 1. Hence a union bound gives

    \mathbb{P}\Big[ \min_{|J| \le 2k} \sigma_{\min}(X_J) < t \Big] \le \binom{p}{2k} \mathbb{P}\big[ \sigma_{\min}(X_{[2k]}) < t \big].

Using the tail bound

    \mathbb{P}\Big[ \sigma_{\min}(X_{[2k]}) < 1 - \sqrt{\tfrac{2k}{n}} - \tfrac{t}{\sqrt{n}} \Big] \le \exp(-t^2/2),

and choosing $t = \sqrt{4 k \log\frac{ep}{k}}$ and $n = 100\, k \log\frac{ep}{k}$, we have $1 - \sqrt{2k/n} - t/\sqrt{n} \ge 0.5$ and consequently $\mathbb{P}[\min_J \sigma_{\min}(X_J) < 0.5] \to 0$. This completes the proof.
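The singular-value concentration used in part (a) is easy to probe numerically. The following sketch (my own illustration; the sizes and number of trials are arbitrary) estimates $\sigma_{\min}(X_J)$ for random $n \times 2k$ blocks with $N(0, 1/n)$ entries and compares it with the $1 - \sqrt{2k/n}$ prediction:

# Illustrative sketch, not from the original notes.
import numpy as np

rng = np.random.default_rng(2)
n, k, trials = 2000, 10, 200

sigmas = []
for _ in range(trials):
    X_J = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, 2 * k))   # one n x 2k block of X
    sigmas.append(np.linalg.svd(X_J, compute_uv=False)[-1])    # smallest singular value

print("smallest sigma_min over trials:", min(sigmas))
print("1 - sqrt(2k/n)                :", 1 - np.sqrt(2 * k / n))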

19.3 Dantzig selector

The Dantzig selector can be written as the following optimization problem:

    \min \|\theta\|_1 \quad \text{s.t.} \quad \|X^\top (Y - X\theta)\|_\infty \le \tau.        (19.5)

This problem can be solved efficiently as a linear program. Another computationally tractable procedure for $k$-sparse regression is the Lasso, which solves

    \min_{\theta \in \mathbb{R}^p} \|Y - X\theta\|_2^2 + \lambda \|\theta\|_1.        (19.6)

Note that $X^\top$ appears in the constraint of the Dantzig selector (19.5) to make the solution rotation-invariant: if $U \in O(n)$ is an $n \times n$ orthogonal matrix, then $UY = UX\theta + UZ$ is a model of the same form, and $\hat\theta(X, Y) = \hat\theta(UX, UY)$.
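The reduction of (19.5) to a linear program is standard: introduce auxiliary variables $u$ with $-u \le \theta \le u$ coordinatewise, minimize $\sum_j u_j$, and write the $\ell_\infty$ constraint as two-sided linear inequalities. Here is a minimal sketch using scipy (my own illustration; the helper name dantzig_selector and the problem sizes are not from the notes):

# Illustrative sketch, not from the original notes.
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, Y, tau):
    """Solve (19.5) as an LP over z = [theta, u]:
    min sum(u)  s.t.  -u <= theta <= u,  |X^T (Y - X theta)| <= tau coordinatewise."""
    n, p = X.shape
    G = X.T @ X
    XtY = X.T @ Y
    c = np.concatenate([np.zeros(p), np.ones(p)])
    A_ub = np.block([
        [ np.eye(p), -np.eye(p)],           #  theta - u <= 0
        [-np.eye(p), -np.eye(p)],           # -theta - u <= 0
        [ G,          np.zeros((p, p))],    #  X^T X theta <= X^T Y + tau
        [-G,          np.zeros((p, p))],    # -X^T X theta <= tau - X^T Y
    ])
    b_ub = np.concatenate([np.zeros(2 * p), XtY + tau, tau - XtY])
    bounds = [(None, None)] * p + [(0, None)] * p   # theta free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]

# Example with tau = 2*sqrt(log p), the choice made in Step 1 of the proof below.
rng = np.random.default_rng(3)
n, p, k = 200, 400, 5
X = rng.normal(0, 1 / np.sqrt(n), (n, p))
theta = np.zeros(p); theta[:k] = 5.0
Y = X @ theta + rng.normal(size=n)
theta_ds = dantzig_selector(X, Y, tau=2 * np.sqrt(np.log(p)))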

Theorem 19.3. Let $\hat\theta_{\mathrm{DS}}$ denote a minimizer of (19.5). Then

    \|\hat\theta_{\mathrm{DS}} - \theta\|_2^2 \lesssim k \log p

with high probability, as long as $n \ge C k \log\frac{ep}{k}$ for a sufficiently large constant $C$.

Proof. As in the proof of the corresponding theorem for the denoising problem, the proof is divided into three steps.

Step 1: set $\tau$ to guarantee that $\theta$ is feasible. We choose

    \tau = 2\sqrt{\log p},

so that $\|X^\top Z\|_\infty \le \tau$ with high probability, and hence the ground truth $\theta$ is feasible.

Step 2: structure of the error $h = \hat\theta_{\mathrm{DS}} - \theta$. Let $J$ be the support of $\theta$. Define the cone

    C_J \triangleq \{ h : \|h_{J^c}\|_1 \le \|h_J\|_1 \}.        (19.7)

Since $\|\hat\theta_{\mathrm{DS}}\|_1 \le \|\theta\|_1$, we have $h \in C_J$.

Claim: $\|X^\top X h\|_\infty \le 2\tau$. To see this, note that

    \|X^\top X h\|_\infty = \|X^\top X (\hat\theta_{\mathrm{DS}} - \theta)\|_\infty
      = \|X^\top (Y - X\theta) - X^\top (Y - X\hat\theta_{\mathrm{DS}})\|_\infty
      \le \|X^\top (Y - X\theta)\|_\infty + \|X^\top (Y - X\hat\theta_{\mathrm{DS}})\|_\infty \le 2\tau.

Step 3: the risk. We have

    \|Xh\|_2^2 = \langle Xh, Xh \rangle = \langle X^\top X h, h \rangle
      \le \|X^\top X h\|_\infty \|h\|_1            (Hölder)
      \le 2\tau \cdot 2\|h_J\|_1                    (since h \in C_J)
      \le 4\tau \sqrt{k}\, \|h_J\|_2                (Cauchy–Schwarz)
      \le 4\tau \sqrt{k}\, \|h\|_2.

It remains to show one last thing to complete the proof.

Claim: $\|h\|_2^2 \lesssim \|Xh\|_2^2$ with high probability, simultaneously for all $h \in C_J$. Given this claim, the display above yields $\|h\|_2^2 \lesssim \tau \sqrt{k}\, \|h\|_2$, hence $\|h\|_2^2 \lesssim \tau^2 k \asymp k \log p$, which is the desired bound.

To prove the claim we use the special feature of the cone $C_J$ defined in (19.7): for any $h \in C_J$, at least half of the energy of $h$ lies on $2k$ coordinates, i.e., $h$ is almost $2k$-sparse. Order the coordinates of $h$ as follows: the block of length $k$ corresponding to $J = \mathrm{supp}(\theta)$, namely $h_J$, comes first; the remaining coordinates are ordered by decreasing magnitude. Divide the coordinates after the first block into blocks of size $k$, named $K_1, K_2, \ldots$, and write $h^i = h_{K_i}$. Let $a = h_{J \cup K_1} = h_J + h^1$. By construction $a$ carries at least half of the energy, i.e., $\|a\|_2^2 \ge \frac{1}{2}\|h\|_2^2$. Define $b \triangleq h_{(J \cup K_1)^c} = \sum_{i \ge 2} h^i$. Then

    \|Xh\|_2^2 = \|Xa + Xb\|_2^2 \ge \|Xa\|_2^2 - 2 |\langle Xa, Xb \rangle|.        (19.8)

Since $X$ satisfies the restricted isometry property, for $n \ge c\, k \log\frac{ep}{k}$ there exists $c_1 = c_1(c)$ with $c_1 \to 1$ as $c \to \infty$, such that

    \|Xa\|_2^2 \ge c_1 \|a\|_2^2 \ge \frac{c_1}{2} \|h\|_2^2.

It remains to show that the cross term $\langle Xa, Xb \rangle$ is small in magnitude. For this we use the following restricted decorrelation lemma.

Lemma 19.1. Let $n \ge c\, k \log\frac{ep}{k}$. Then, with high probability, for all $u, v \in B_0(2k)$ with disjoint supports,

    |\langle Xu, Xv \rangle| \le c_2 \|u\|_2 \|v\|_2,

where $c_2 = c_2(c)$ and $c_2 \to 0$ as $c \to \infty$.

Then we have

    |\langle Xa, Xb \rangle| = \Big| \Big\langle Xa, \sum_{i \ge 2} X h^i \Big\rangle \Big|
      \le \sum_{i \ge 2} |\langle Xa, X h^i \rangle|
      \le c_2 \sum_{i \ge 2} \|a\|_2 \|h^i\|_2                              (Lemma 19.1)
      \le c_2 \|h\|_2 \sum_{i \ge 2} \frac{\|h^{i-1}\|_1}{\sqrt{k}}          (by the ordering, \|h^i\|_2 \le \sqrt{k}\,\|h^i\|_\infty \le \|h^{i-1}\|_1/\sqrt{k})
      \le \frac{c_2}{\sqrt{k}} \|h\|_2 \|h_{J^c}\|_1
      \le \frac{c_2}{\sqrt{k}} \|h\|_2 \|h_J\|_1                             (property of the cone)
      \le c_2 \|h\|_2 \|h_J\|_2 \le c_2 \|h\|_2^2.

Plugging back into (19.8), we obtain

    \|Xh\|_2^2 \ge \Big( \frac{c_1}{2} - 2 c_2 \Big) \|h\|_2^2 \gtrsim \|h\|_2^2

for $c$ large enough. This completes the proof.
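The structural fact used above, that any $h \in C_J$ puts at least half of its squared $\ell_2$ norm on the $2k$ coordinates $J \cup K_1$, can be checked numerically. The sketch below (my own illustration, not from the notes) rescales random vectors so that they satisfy the cone constraint and reports the worst observed energy ratio:

# Illustrative sketch, not from the original notes.
import numpy as np

rng = np.random.default_rng(4)
p, k, trials = 1000, 10, 500

worst_ratio = 1.0
for _ in range(trials):
    h = rng.standard_normal(p)
    J = np.arange(k)                                   # support of theta (first k coords, WLOG)
    # Rescale h_{J^c} so that ||h_{J^c}||_1 <= ||h_J||_1, i.e. h lies in the cone C_J.
    scale = np.sum(np.abs(h[J])) / np.sum(np.abs(h[k:]))
    h[k:] *= min(1.0, scale)
    # K_1: the k largest-magnitude coordinates outside J.
    K1 = k + np.argsort(-np.abs(h[k:]))[:k]
    top_energy = np.sum(h[J] ** 2) + np.sum(h[K1] ** 2)
    worst_ratio = min(worst_ratio, top_energy / np.sum(h ** 2))

print("worst ||h_{J u K1}||^2 / ||h||^2 over trials:", worst_ratio)   # theory: >= 0.5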

Remark 19.1 (Adaptivity issues). Note that the Dantzig selector is adaptive to $k$, but not to $\sigma$. To see this, consider the high-dimensional $k$-sparse regression problem

    Y = X\theta + Z, \qquad Z \sim N(0, \sigma^2 I_n).

If $\sigma$ is known then we can set $\tau = 2\sigma\sqrt{\log p}$, but typically $\sigma$ is not known. A similar problem arises with the Lasso: in (19.6), if $\sigma$ is known then the optimal choice is $\lambda = 2\sigma\sqrt{\log p}$, but if $\sigma$ is unknown then $\lambda$ becomes a tuning parameter. As a remedy, another procedure called the square-root Lasso was proposed, which solves

    \min_{\theta \in \mathbb{R}^p} \|Y - X\theta\|_2 + \lambda \|\theta\|_1.

Its optimal choice $\lambda \asymp \sqrt{\log p}$ does not depend on $\sigma$, even when $\sigma$ is unknown. However, the downside is that this optimization problem is not as easy to solve.
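As a rough illustration of the square-root Lasso objective, here is a sketch assuming the cvxpy package is available (my own illustration; the problem sizes and the pivotal choice lam = sqrt(2 log p) are illustrative, not from the notes). Note that lam is set without any knowledge of sigma:

# Illustrative sketch, not from the original notes.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, p, k, sigma = 200, 500, 5, 3.0                     # sigma is unknown to the estimator
X = rng.normal(0, 1 / np.sqrt(n), (n, p))
theta = np.zeros(p); theta[:k] = 5.0
Y = X @ theta + sigma * rng.normal(size=n)

lam = np.sqrt(2 * np.log(p))                          # tuning parameter: does not involve sigma
theta_var = cp.Variable(p)
objective = cp.Minimize(cp.norm(Y - X @ theta_var, 2) + lam * cp.norm(theta_var, 1))
cp.Problem(objective).solve()
theta_sqrt_lasso = theta_var.value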

References

[CT07] Emmanuel Candès and Terence Tao. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics, 35(6):2313-2351, 2007.

[SC15] Weijie Su and Emmanuel Candès. SLOPE is adaptive to unknown sparsity and asymptotically minimax. arXiv preprint arXiv:1503.08393, 2015.
