Sparsity Models
Tong Zhang, Rutgers University
T. Zhang (Rutgers), Sparsity Models, 1 / 28
Topics
- Standard sparse regression model
  - algorithms: convex relaxation and greedy algorithms
  - sparse recovery analysis: high-level view
- Some extensions (complex regularization)
  - structured sparsity
  - graphical models
  - matrix regularization
Modern Sparsity Analysis: Motivation
- Modern datasets are often high dimensional; statistical estimation suffers from the curse of dimensionality.
- Sparsity: a popular assumption to address the curse of dimensionality, motivated by real applications.
- Challenges: formulation (focusing on efficient computation) and mathematical analysis.
Standard Sparse Regression
Model: Y = X β̄ + ε
- Y ∈ R^n: observation
- X ∈ R^{n×p}: design matrix
- β̄ ∈ R^p: parameter vector to be estimated
- ε ∈ R^n: zero-mean stochastic noise with variance σ²
High-dimensional setting: n ≪ p.
Sparsity: β̄ has few nonzero components. With supp(β̄) = {j : β̄_j ≠ 0}, the quantity ‖β̄‖_0 = |supp(β̄)| is small: ‖β̄‖_0 ≪ n.
Algorithms for Standard Sparsity
- L0 regularization: the natural method (computationally inefficient)
    β̂_{L0} = argmin_β ‖Y − Xβ‖²₂  subject to ‖β‖_0 ≤ k
- L1 regularization (Lasso): convex relaxation (computationally efficient)
    β̂_{L1} = argmin_β [ ‖Y − Xβ‖²₂ + λ‖β‖₁ ]
Theoretical question: how well can we estimate the parameter β̄ (recovery performance)?
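The slides do not fix a particular solver for the Lasso objective above. As a minimal illustration (my own sketch, not from the talk), it can be minimized by proximal gradient descent (ISTA), whose proximal step is coordinate-wise soft-thresholding; the design, noise level, and λ below are arbitrary synthetic choices.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1: shrink each coordinate toward zero
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # minimize ||y - X b||_2^2 + lam * ||b||_1 by proximal gradient (ISTA)
    L = 2.0 * np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]             # sparse ground truth
y = X @ beta_true + 0.01 * rng.standard_normal(100)
b_hat = lasso_ista(X, y, lam=0.5)
```

With this λ the noise coordinates are typically driven exactly to zero while the three true coefficients survive with a small shrinkage bias.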
Greedy Algorithms for Standard Sparse Regularization
Reformulation: find a variable set F ⊆ {1, ..., p} to minimize
    min_β ‖Xβ − Y‖²₂  subject to supp(β) ⊆ F, |F| ≤ k.
Forward greedy algorithm (OMP): select variables one by one.
- Initialize the variable set F_0 = ∅ at k = 0.
- Iterate k = 1, ..., p:
  - find the best variable j to add to F_{k−1} (maximum reduction of the squared error);
  - set F_k = F_{k−1} ∪ {j}.
- Terminate with some criterion; output β̂ from least-squares regression on the selected variables F_k.
Theoretical question: recovery performance?
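The forward greedy loop can be sketched directly in numpy. This is a hedged toy implementation; the synthetic data and all names are mine, not the speaker's.

```python
import numpy as np

def omp(X, y, k):
    # forward greedy selection: grow F one variable at a time,
    # refitting by least squares on the selected columns after each pick
    p = X.shape[1]
    F, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(X.T @ residual)))  # best single variable
        if j not in F:
            F.append(j)
        coef, *_ = np.linalg.lstsq(X[:, F], y, rcond=None)
        residual = y - X[:, F] @ coef               # orthogonal to chosen columns
    b = np.zeros(p)
    b[F] = coef
    return b, sorted(F)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
beta = np.zeros(20)
beta[[2, 7, 11]] = [3.0, -2.0, 2.0]
y = X @ beta + 0.01 * rng.standard_normal(200)
b_hat, support = omp(X, y, k=3)
```

Because the refit makes the residual orthogonal to the selected columns, a variable is never re-selected while the residual is nonzero.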
Conditions and Results
Types of results (sparse recovery):
- Variable selection (can we find the nonzero variables): can we recover the true support F = supp(β̄)? Is supp(β̂) = F?
- Parameter estimation (how well can we estimate β̄): how small is ‖β̂ − β̄‖²₂?
Are efficient algorithms (such as L1 or OMP) good enough? Yes, but they require conditions:
- Irrepresentable condition: for support recovery.
- RIP (restricted isometry property): for parameter recovery.
KKT Condition for the Lasso Solution
Lasso solution: β̂_{L1} = argmin_β [ ‖Y − Xβ‖²₂ + λ‖β‖₁ ].
KKT condition at β̂ = β̂_{L1}: there exists a subgradient that vanishes, i.e. for all j = 1, ..., p (X_j is the j-th column of X):
    0 ∈ 2 X_j^⊤ (X β̂ − y) + λ ∂|β̂_j|.
Subgradient of the L1 norm: ∂|u| = {sign(u)} = {1} if u > 0, {−1} if u < 0, and [−1, 1] if u = 0.
If we can find a β̂ that satisfies the KKT condition, then it is a Lasso solution. A slightly stronger condition implies uniqueness.
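One way to make the KKT characterization concrete is to compute a Lasso solution numerically and check both subgradient conditions. The coordinate-descent solver below is a minimal sketch on synthetic data (solver choice and all parameters are my own assumptions):

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    # coordinate descent for ||y - X b||_2^2 + lam * ||b||_1;
    # each coordinate update is an exact scalar minimization (soft-threshold)
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]     # residual with coordinate j removed
            z = X[:, j] @ r_j
            b[j] = np.sign(z) * max(abs(z) - lam / 2.0, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 10))
y = X @ np.array([1.0, -1.0] + [0.0] * 8) + 0.1 * rng.standard_normal(60)
lam = 1.0
b = lasso_cd(X, y, lam)
g = 2.0 * X.T @ (X @ b - y)                      # gradient of the squared loss
active = np.abs(b) > 1e-10
# KKT check: g_j = -lam * sign(b_j) on the active set, |g_j| <= lam elsewhere
```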
Feature Selection Consistency of Lasso
Idea: construct a solution and check the KKT condition.
Define β̂ such that β̂_F satisfies
    2 X_F^⊤ (X_F β̂_F − y) + λ sign(β̄)_F = 0,
and set β̂_{F^c} = 0.
Condition A:
- X_F^⊤ X_F is full rank;
- sign(β̂_F) = sign(β̄_F);
- 2 |X_j^⊤ (X_F β̂_F − y)| < λ for j ∉ F.
Under Condition A, β̂ is the unique Lasso solution (it satisfies the KKT condition).
Irrepresentable Condition
The condition
    μ = sup_{j ∉ F} |X_j^⊤ X_F (X_F^⊤ X_F)^{−1} sign(β̄)_F| < 1
is called the irrepresentable condition. It implies Condition A when y = X β̄ and λ is sufficiently small.
Under the irrepresentable condition, if
- the noise is sufficiently small, and
- min_{j ∈ F} |β̄_j| is larger than the noise level,
then there exists an appropriate λ such that Condition A holds. Thus the Lasso solution is unique and feature selection consistent.
A condition similar to the irrepresentable condition can be derived for OMP.
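The quantity μ is directly computable for a given design. A small sketch (synthetic design; the support and sign pattern are assumed, not from the slides) that also shows how a column correlated with the support violates the condition:

```python
import numpy as np

def irrepresentable_mu(X, F, sign_F):
    # mu = max_{j not in F} | X_j^T X_F (X_F^T X_F)^{-1} sign(beta)_F |
    Xf = X[:, F]
    w = np.linalg.solve(Xf.T @ Xf, sign_F)
    out = [j for j in range(X.shape[1]) if j not in F]
    return float(np.max(np.abs(X[:, out].T @ (Xf @ w))))

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 10))
F, sgn = [0, 1, 2], np.array([1.0, 1.0, -1.0])
mu = irrepresentable_mu(X, F, sgn)               # well below 1 for this design

# a column that is a signed combination of the support columns breaks the condition
Xc = X.copy()
Xc[:, 5] = (Xc[:, 0] + Xc[:, 1] - Xc[:, 2]) / np.sqrt(3.0)
mu_bad = irrepresentable_mu(Xc, F, sgn)          # evaluates to sqrt(3) > 1
```

For the engineered column X_5 = X_F a with a = sign(β̄)_F / √3, the definition collapses to |a^⊤ sign(β̄)_F| = √3 regardless of the rest of the design.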
RIP Conditions
Feature selection consistency implies good parameter estimation. However, the irrepresentable condition is too strong.
RIP (restricted isometry property): a weaker condition that can be used to obtain parameter estimation results.
Definition of RIP: for some c > 1, the restricted condition number ρ₊(ck)/ρ₋(ck) is bounded by a suitable constant, with k = |F| = ‖β̄‖_0, where
    ρ₊(s) = sup { β^⊤ X^⊤ X β / β^⊤ β : ‖β‖_0 ≤ s },
    ρ₋(s) = inf { β^⊤ X^⊤ X β / β^⊤ β : ‖β‖_0 ≤ s }.
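For tiny p the restricted quantities ρ₊(s) and ρ₋(s) can be computed by brute force over all supports; this is exponential in general, which is why RIP is treated as an assumption rather than something one verifies. A sketch with an assumed synthetic design:

```python
import numpy as np
from itertools import combinations

def restricted_eigs(X, s):
    # brute-force rho_-(s) and rho_+(s): extreme eigenvalues of X_S^T X_S
    # over all supports of size s (only feasible for very small p)
    lo, hi = np.inf, -np.inf
    for S in combinations(range(X.shape[1]), s):
        Xs = X[:, list(S)]
        eig = np.linalg.eigvalsh(Xs.T @ Xs)
        lo, hi = min(lo, eig[0]), max(hi, eig[-1])
    return lo, hi

rng = np.random.default_rng(4)
n, p = 200, 8
X = rng.standard_normal((n, p)) / np.sqrt(n)     # columns roughly unit norm
lo, hi = restricted_eigs(X, 3)                   # hi/lo is the restricted condition number
```

For a well-conditioned random design like this one, both extremes concentrate near 1 and the ratio hi/lo stays small.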
Results under the Restricted Isometry Property
Parameter estimation under RIP:
    ‖β̄ − β̂‖²₂ = O(σ² ‖β̄‖_0 ln p / n),
where σ² is the noise variance.
- This result can be obtained both for Lasso and for OMP.
- It is the best possible rate.
Feature selection under RIP: neither procedure achieves feature selection consistency.
Improvement: a non-convex formulation is needed for optimal feature selection under RIP. It is trickier to analyze; a general theory only appeared very recently.
Complex Regularization: Structured Sparsity
Wavelet domain: the sparsity pattern is not random (it is structured).
[figure: an image in the image domain and its wavelet-domain coefficients]
Can we take advantage of the structure?
Structured Sparsity Characterization
Observation:
- the sparsity pattern is the set of nonzero coefficients;
- not all sparse patterns are equally likely.
Our proposal: an information-theoretical characterization of structure. A sparsity pattern F is associated with a cost c(F), where c(F) is the negative log-likelihood of F (or a multiple of it).
Optimization problem:
    min_β ‖Xβ − Y‖²₂  subject to ‖β‖_0 + c(supp(β)) ≤ s.
- c(supp(β)): cost for selecting the support supp(β).
- ‖β‖_0: cost for estimation after feature selection.
Example: Group Structure
Variables are divided into pre-defined groups G_1, ..., G_{p/m}, with m variables per group.
[figure, m = 4: groups G_1, G_2, ..., G_{p/m} drawn as rows of nodes; nodes are variables, gray nodes are selected variables (groups 1, 2, 4)]
Assumption: coefficients are not completely random; coefficients in each group are simultaneously (or nearly simultaneously) zero or nonzero.
How can we take advantage of the group structure?
Example: Group Structure (continued)
Variables are divided into pre-defined groups G_1, ..., G_{p/m}, with m variables per group.
Assumption: coefficients in each group are simultaneously zero or nonzero.
- Group sparsity pattern cost: ‖β̄‖_0 + m^{−1} ‖β̄‖_0 ln p.
- Standard sparsity pattern cost (for Lasso): ‖β̄‖_0 ln p.
Theoretical question: can we take advantage of the group sparsity structure to improve on Lasso?
Convex Relaxation for Group Sparsity
L1-L2 convex relaxation (group Lasso):
    β̂ = argmin_β ‖Xβ − Y‖²₂ + λ Σ_j ‖β_{G_j}‖₂.
This is supposed to take advantage of the group sparsity structure:
- within a group: L2 regularization (does not encourage sparsity);
- across groups: L1 regularization (encourages sparsity).
Question: what is the benefit of the group Lasso formulation?
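Why the group Lasso penalty zeroes whole groups is visible in its proximal operator, block soft-thresholding: each group's coefficient vector is shrunk toward zero as a unit. A short sketch (the threshold and data are illustrative assumptions):

```python
import numpy as np

def group_soft_threshold(b, groups, t):
    # proximal operator of t * sum_j ||b_{G_j}||_2 (block soft-thresholding):
    # each group is shrunk toward zero as a unit, so groups vanish jointly
    out = b.copy()
    for G in groups:
        nrm = np.linalg.norm(b[G])
        out[G] = 0.0 if nrm <= t else (1.0 - t / nrm) * b[G]
    return out

b = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
z = group_soft_threshold(b, groups, 1.0)
# group 1 has norm 5 and is shrunk to norm 4: [2.4, 3.2]
# group 2 has norm ~0.14 <= 1 and is zeroed entirely
```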
Recovery Analysis for Lasso and Group Lasso
Simple sparsity: s = ‖β̄‖_0 variables out of p variables.
- Information-theoretical complexity (log of the number of choices): O(s ln p).
- Statistical recovery performance: ‖β̂ − β̄‖²₂ = O(σ² ‖β̄‖_0 ln p / n).
Group sparsity: g groups out of p/m groups (ideally g = ‖β̄‖_0 / m).
- Information-theoretical complexity (log of the number of choices): O(g ln(p/m)).
- Statistical recovery performance for group Lasso: if supp(β̄) is covered by g groups, then under group RIP (weaker than RIP)
    ‖β̂ − β̄‖²₂ = O( (σ²/n) ( g ln(p/m) + m g ) ),
  where g ln(p/m) is the cost of group selection and m g the cost of estimation after group selection.
Group Sparsity: Correct Group Structure
[figure: (a) original signal, (b) Lasso estimate, (c) group Lasso estimate]
Group Sparsity: Incorrect Group Structure
[figure: (a) original signal, (b) Lasso estimate, (c) group Lasso estimate]
Matrix Formulation: Graphical Model Example
Learning gene interaction network structure.
Formulation: Gaussian Graphical Model
Multi-dimensional Gaussian vectors: X_1, ..., X_n ~ N(μ, Σ). Precision matrix Θ = Σ^{−1}.
The nonzeros of the precision matrix give the graphical model structure:
    P(X_i) ∝ |Θ|^{1/2} exp[ −(1/2) (X_i − μ)^⊤ Θ (X_i − μ) ],
where |·| is the determinant.
Estimation: L1-regularized maximum likelihood estimator
    Θ̂ = argmin_Θ [ −ln |Θ| + tr(Σ̂ Θ) + λ ‖Θ‖₁ ].
- ‖Θ‖₁: element-wise L1 regularization to encourage sparsity.
- Σ̂: empirical covariance matrix.
Analysis exists (feature selection and parameter estimation): the techniques are similar to the L1 analysis, but the results are not fully satisfactory.
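A minimal numeric check of the formulation (not a graphical-lasso solver): the unpenalized objective −ln|Θ| + tr(Σ̂Θ) is strictly convex over positive definite matrices and minimized at Θ = Σ̂^{−1}, so any symmetric perturbation increases it. The data and ridge term below are assumptions for the demo.

```python
import numpy as np

def neg_loglik(theta, S):
    # unpenalized Gaussian negative log-likelihood in the precision matrix:
    # -ln det(Theta) + tr(S @ Theta), with S the empirical covariance
    return -np.linalg.slogdet(theta)[1] + np.trace(S @ theta)

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 4))
S = np.cov(A, rowvar=False) + 0.1 * np.eye(4)    # make S safely positive definite
theta_star = np.linalg.inv(S)                    # unpenalized minimizer
f0 = neg_loglik(theta_star, S)
P = rng.standard_normal((4, 4))
P = 0.02 * (P + P.T)                             # small symmetric perturbation
f1 = neg_loglik(theta_star + P, S)               # strictly larger by convexity
```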
Matrix Completion
[table: a 5-user × 8-movie rating matrix (users U1-U5, movies M1-M8) with integer ratings and many entries marked "?"]
An m × n matrix: m users and n movies with incomplete ratings. Can we fill in the missing values?
This requires assumptions:
- Intuition: U2 and U3 have similar observed ratings, so assume they have similar preferences.
- Low-rank (rank-r) structure: map user i to u_i ∈ R^r and movie j to v_j ∈ R^r, with rating u_i^⊤ v_j. Let X̄ be the true rating matrix: X̄ ≈ U V^⊤ (U: m × r, V: n × r).
Formulation
- Let S = {observed (i, j) entries}.
- Let y_{ij} be the observed values for (i, j) ∈ S.
- Let X̄ be the true underlying rating matrix.
We want to find X to fit the observed y_{ij}, assuming X is low-rank:
    min_{X ∈ R^{m×n}} Σ_{(i,j) ∈ S} (X_{ij} − y_{ij})² + λ rank(X).
rank(X) is a nonconvex function of X. Convex relaxation: the trace norm ‖X‖_*, defined as the sum of the singular values of X. The convex reformulation is
    min_{X ∈ R^{m×n}} Σ_{(i,j) ∈ S} (X_{ij} − y_{ij})² + λ ‖X‖_*.
The solution of trace-norm regularization is low-rank.
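The claim that trace-norm regularization yields low-rank solutions comes from its proximal operator, singular value thresholding, which soft-thresholds the spectrum and leaves the singular vectors untouched. A sketch on a synthetic noisy low-rank matrix (sizes, rank, and threshold are my own illustrative choices):

```python
import numpy as np

def svt(Z, t):
    # singular value thresholding: proximal operator of t * ||.||_* ;
    # soft-threshold the singular values, keep the singular vectors
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

rng = np.random.default_rng(6)
L = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))  # rank-3 signal
Z = L + 0.01 * rng.standard_normal((20, 15))                     # full-rank noisy copy
X = svt(Z, 1.0)   # small (noise) singular values are cut to zero
```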
Sparsity versus Low-Rank
A vector β ∈ R^p: p parameters; reduce dimension via sparsity.
- ‖β‖_0 is small; the constraint ‖β‖_0 ≤ s is nonconvex.
- Convex relaxation: the convex hull of unit 1-sparse vectors, which gives the L1 ball ‖β‖₁ ≤ 1.
- Vector solutions with L1 regularization are sparse.
A matrix X ∈ R^{m×n}: m·n parameters; reduce dimension via low rank.
- X = Σ_{j=1}^r u_j v_j^⊤, where u_j ∈ R^m and v_j ∈ R^n are vectors; the number of parameters is no more than rm + rn.
- The rank constraint is nonconvex.
- Convex relaxation: the convex hull of unit rank-one matrices, which gives the trace-norm ball ‖X‖_* ≤ 1.
- Matrix solutions with trace-norm regularization are low-rank.
Matrix Regularization Example: Mixed Sparsity and Low Rank
Y (observed) = X̄_L (low-rank) + X̄_S (sparse):
    [X̂_S, X̂_L] = argmin (1/2μ) ‖(X_S + X_L) − Y‖²_F + λ ‖X_S‖₁ + ‖X_L‖_*.
Trace norm: the sum of the singular values of a matrix; it encourages a low-rank matrix.
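One simple way to minimize the mixed objective (my own sketch; the slides do not prescribe an algorithm) is alternating exact block minimization: with X_L fixed, the X_S update is entrywise soft-thresholding of Y − X_L; with X_S fixed, the X_L update is singular value thresholding of Y − X_S. Each block update is an exact minimization, so the objective is nonincreasing. μ, λ, and the data below are arbitrary.

```python
import numpy as np

def soft(Z, t):
    # entrywise soft-thresholding: prox of t * ||.||_1
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def svt(Z, t):
    # singular value thresholding: prox of t * ||.||_*
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def objective(Xs, Xl, Y, mu, lam):
    fit = np.sum((Xs + Xl - Y) ** 2) / (2.0 * mu)
    return fit + lam * np.abs(Xs).sum() + np.linalg.svd(Xl, compute_uv=False).sum()

rng = np.random.default_rng(7)
Y = rng.standard_normal((15, 12))
mu, lam = 1.0, 0.2
Xs, Xl = np.zeros_like(Y), np.zeros_like(Y)
vals = [objective(Xs, Xl, Y, mu, lam)]
for _ in range(10):
    Xs = soft(Y - Xl, lam * mu)   # exact minimizer over X_S with X_L fixed
    Xl = svt(Y - Xs, mu)          # exact minimizer over X_L with X_S fixed
    vals.append(objective(Xs, Xl, Y, mu, lam))
```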
Theoretical Analysis
We want to know under what conditions we can recover X̄_S and X̄_L. The matrix is m × n.
- X̄_S: sparse (spike-like outliers); no more than n_0 outliers per row, no more than m_0 outliers per column.
- X̄_L: rank r; incoherence (X̄_L is flat: no component is large).
Question: how many outliers per row (n_0) and per column (m_0) are allowed while still recovering X̄_S and X̄_L?
Partial answer (not completely satisfactory):
- If the sparsity pattern supp(X̄_S) is random: exact recovery under m_0 = O(m) and n_0 = O(n).
- If the sparsity pattern supp(X̄_S) does not have to be random: m_0 ≤ c(m/r) and n_0 ≤ c(n/r) for some constant c (r is the rank of X̄_L).
References
- Statistical Science Special Issue on Sparsity and Regularization
- Structured sparsity: F. Bach et al.
- General theoretical analysis: S. Negahban et al.
- Graphical models: J. Lafferty et al.
- Nonconvex methods: C.-H. Zhang and T. Zhang
- ...
Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationCollaborative Filtering Matrix Completion Alternating Least Squares
Case Study 4: Collaborative Filtering Collaborative Filtering Matrix Completion Alternating Least Squares Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 19, 2016
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationGeneralized greedy algorithms.
Generalized greedy algorithms. François-Xavier Dupé & Sandrine Anthoine LIF & I2M Aix-Marseille Université - CNRS - Ecole Centrale Marseille, Marseille ANR Greta Séminaire Parisien des Mathématiques Appliquées
More informationA direct formulation for sparse PCA using semidefinite programming
A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon
More informationSignal Recovery from Permuted Observations
EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,
More informationEE 381V: Large Scale Learning Spring Lecture 16 March 7
EE 381V: Large Scale Learning Spring 2013 Lecture 16 March 7 Lecturer: Caramanis & Sanghavi Scribe: Tianyang Bai 16.1 Topics Covered In this lecture, we introduced one method of matrix completion via SVD-based
More informationOWL to the rescue of LASSO
OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,
More informationThe picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R
The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework
More informationLASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA
The Annals of Statistics 2009, Vol. 37, No. 1, 246 270 DOI: 10.1214/07-AOS582 Institute of Mathematical Statistics, 2009 LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA BY NICOLAI
More informationMotivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble
Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting
More informationRecovery of Simultaneously Structured Models using Convex Optimization
Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)
More informationThe lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding
Patrick Breheny February 15 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/24 Introduction Last week, we introduced penalized regression and discussed ridge regression, in which the penalty
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationsparse and low-rank tensor recovery Cubic-Sketching
Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru
More informationScale Mixture Modeling of Priors for Sparse Signal Recovery
Scale Mixture Modeling of Priors for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Outline Outline Sparse
More informationLecture: Introduction to Compressed Sensing Sparse Recovery Guarantees
Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin
More informationProvable Alternating Minimization Methods for Non-convex Optimization
Provable Alternating Minimization Methods for Non-convex Optimization Prateek Jain Microsoft Research, India Joint work with Praneeth Netrapalli, Sujay Sanghavi, Alekh Agarwal, Animashree Anandkumar, Rashish
More informationCOMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION
COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationCovariate-Assisted Variable Ranking
Covariate-Assisted Variable Ranking Tracy Ke Department of Statistics Harvard University WHOA-PSI@St. Louis, Sep. 8, 2018 1/18 Sparse linear regression Y = X β + z, X R n,p, z N(0, σ 2 I n ) Signals (nonzero
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationCompressed Sensing and Sparse Recovery
ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing
More informationCS Homework 3. October 15, 2009
CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website
More informationDifferentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso
Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso Adam Smith asmith@cse.psu.edu Pennsylvania State University Abhradeep Thakurta azg161@cse.psu.edu Pennsylvania
More informationLinear Regression. Aarti Singh. Machine Learning / Sept 27, 2010
Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X
More informationSTATS 306B: Unsupervised Learning Spring Lecture 13 May 12
STATS 306B: Unsupervised Learning Spring 2014 Lecture 13 May 12 Lecturer: Lester Mackey Scribe: Jessy Hwang, Minzhe Wang 13.1 Canonical correlation analysis 13.1.1 Recap CCA is a linear dimensionality
More informationSparsity and the Lasso
Sparsity and the Lasso Statistical Machine Learning, Spring 205 Ryan Tibshirani (with Larry Wasserman Regularization and the lasso. A bit of background If l 2 was the norm of the 20th century, then l is
More informationSOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu
SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 3: Sparse signal recovery: A RIPless analysis of l 1 minimization Yuejie Chi The Ohio State University Page 1 Outline
More informationNoisy and Missing Data Regression: Distribution-Oblivious Support Recovery
: Distribution-Oblivious Support Recovery Yudong Chen Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 7872 Constantine Caramanis Department of Electrical
More informationLecture Notes 9: Constrained Optimization
Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form
More informationNonconvex penalties: Signal-to-noise ratio and algorithms
Nonconvex penalties: Signal-to-noise ratio and algorithms Patrick Breheny March 21 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/22 Introduction In today s lecture, we will return to nonconvex
More informationNumerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725
Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple
More information