Sparse and Low-Rank Tensor Recovery via Cubic-Sketching

1 Sparse and Low-Rank Tensor Recovery via Cubic-Sketching. Guang Cheng, Department of Statistics, Purdue University. Math, Oct. 27, 2017. Joint work with Botao Hao and Anru Zhang.

2 Tensor: Multi-dimensional Array

3 Tensor Data Example

4 Motivation: Compressed Image Transmission

5 Motivation: Interaction Effect Model source: Contraceptive Method Choice dataset from UCI

6 Motivation: Interaction Effect Model

7 Sparse and Low-Rank Tensor Recovery

8 Noisy Cubic Sketching Model
Observe $\{y_i, X_i\}$ from the noisy cubic sketching model
$$y_i = \langle T, X_i \rangle + \epsilon_i, \quad i = 1, \dots, n,$$
where $y_i$ is a scalar, $\langle T, X_i \rangle$ is a tensor inner product, and $\epsilon_i$ is noise.
For two tensors $A \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ and $B \in \mathbb{R}^{p_1 \times p_2 \times p_3}$, the tensor inner product is defined as $\langle A, B \rangle = \sum_{ijk} A_{ijk} B_{ijk}$.
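
To make the sketching model concrete, here is a minimal numpy sketch of the observation process; the function names, dimensions, and noise level below are illustrative assumptions, not from the slides.

```python
import numpy as np

def tensor_inner(A, B):
    # <A, B> = sum_{ijk} A_{ijk} B_{ijk}
    return np.sum(A * B)

def cubic_sketch_observations(T, n, sigma=0.1, rng=None):
    """Generate {y_i, X_i} with X_i = x_i o x_i o x_i and y_i = <T, X_i> + eps_i."""
    rng = rng or np.random.default_rng(0)
    p = T.shape[0]
    ys, Xs = [], []
    for _ in range(n):
        x = rng.standard_normal(p)
        X = np.einsum('a,b,c->abc', x, x, x)  # rank-one cubic sketch
        ys.append(tensor_inner(T, X) + sigma * rng.standard_normal())
        Xs.append(X)
    return np.array(ys), Xs
```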

9 Noisy Cubic Sketching Model
This is a general model including tensor regression (Zhou, Li, Zhu (2013)) and tensor completion (Yuan and Zhang (2014)).
Goal: recover the unknown third-order tensor parameter $T$.
High-dimensional problem: $n \ll \dim(T) = p^3$.

10 Key Assumptions on Tensor Parameter
When $T \in \mathbb{R}^{p \times p \times p}$ is a symmetric tensor...
1. CANDECOMP/PARAFAC (CP) low-rank: $T$ is represented as a sum of $K$ rank-one tensors, $T = \sum_{k=1}^K \eta_k\, \beta_k \circ \beta_k \circ \beta_k$, where $K \ll p$.
2. Sparse components: $\|\beta_k\|_0 \le s$ for $k \in [K]$.
The cubic sketching tensor $X_i$ for the symmetric case is $X_i = x_i \circ x_i \circ x_i$, where $\{x_i\}_{i=1}^n$ are Gaussian random vectors.
$\beta_k$ and $\beta_{k'}$ are not orthogonal: different from the singular-value decomposition in the matrix case.

11 Key Assumptions on Tensor Parameter
When $T \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ is a non-symmetric tensor...
1. CANDECOMP/PARAFAC (CP) low-rank: $T = \sum_{k=1}^K \eta_k\, \beta_{1k} \circ \beta_{2k} \circ \beta_{3k}$.
2. Sparse components: $\|\beta_{1k}\|_0 \le s_1$, $\|\beta_{2k}\|_0 \le s_2$, $\|\beta_{3k}\|_0 \le s_3$ for $k \in [K]$.
The cubic sketching tensor $X_i$ for the non-symmetric case is $X_i = u_i \circ v_i \circ w_i$, where $\{u_i, v_i, w_i\}_{i=1}^n$ are Gaussian random vectors.

12 Reduced Symmetric Tensor Recovery Model
For the symmetric tensor recovery model,
$$y_i = \Big\langle \sum_{k=1}^K \eta_k\, \beta_k \circ \beta_k \circ \beta_k,\; x_i \circ x_i \circ x_i \Big\rangle + \epsilon_i = \underbrace{\sum_{k=1}^K \eta_k (x_i^\top \beta_k)^3}_{\text{non-linear}} + \epsilon_i.$$
Connect with the interaction effect model.
New Goal: recover $\{\eta_k, \beta_k\}_{k=1}^K$.
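
Since $\langle \beta \circ \beta \circ \beta, x \circ x \circ x \rangle = (x^\top \beta)^3$, the reduced model can be simulated without ever forming a $p \times p \times p$ array. A small sketch (hypothetical names; B stacks the factors column-wise):

```python
import numpy as np

def simulate_symmetric(eta, B, n, sigma=0.1, rng=None):
    """y_i = sum_k eta_k (x_i' beta_k)^3 + eps_i, with B = [beta_1, ..., beta_K] of shape (p, K)."""
    rng = rng or np.random.default_rng(0)
    p, K = B.shape
    X = rng.standard_normal((n, p))      # rows are the sketch vectors x_i
    y = (X @ B) ** 3 @ eta + sigma * rng.standard_normal(n)
    return y, X
```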

13 Reduced Non-symmetric Tensor Recovery Model
For the non-symmetric tensor recovery model,
$$y_i = \Big\langle \sum_{k=1}^K \eta_k\, \beta_{1k} \circ \beta_{2k} \circ \beta_{3k},\; u_i \circ v_i \circ w_i \Big\rangle + \epsilon_i = \underbrace{\sum_{k=1}^K \eta_k (u_i^\top \beta_{1k})(v_i^\top \beta_{2k})(w_i^\top \beta_{3k})}_{\text{non-linear}} + \epsilon_i.$$
Connect with the compressed image transmission model.
New Goal: recover $\{\eta_k, \beta_{1k}, \beta_{2k}, \beta_{3k}\}_{k=1}^K$.

14 Empirical Risk Minimization
Consider empirical risk minimization:
$$\widehat{T} = \operatorname*{argmin}_{\{\eta_k, \beta_k\}} \underbrace{\sum_{i=1}^n \Big(y_i - \sum_{k=1}^K \eta_k (x_i^\top \beta_k)^3\Big)^2}_{L_1(\eta_k, \beta_k)} \quad \text{(symmetric case)},$$
$$\widehat{T} = \operatorname*{argmin}_{\{\eta_k, \beta_{jk}\}} \underbrace{\sum_{i=1}^n \Big(y_i - \sum_{k=1}^K \eta_k (u_i^\top \beta_{1k})(v_i^\top \beta_{2k})(w_i^\top \beta_{3k})\Big)^2}_{L_2(\eta_k, \beta_{jk})} \quad \text{(non-symmetric case)}.$$
Difficulties: non-convex optimization! Non-convexity arises from the cube structure or tri-convexity.
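
For reference, the symmetric objective $L_1$ is an ordinary residual sum of squares and is cheap to evaluate; a sketch using the same hypothetical array layout as above:

```python
import numpy as np

def loss_L1(eta, B, y, X):
    """L_1(eta, beta) = sum_i (y_i - sum_k eta_k (x_i' beta_k)^3)^2."""
    fitted = (X @ B) ** 3 @ eta
    return np.sum((y - fitted) ** 2)
```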

15 Our Contributions
1. An efficient two-stage implementation for the non-convex optimization problem.
2. A non-asymptotic analysis providing the optimal estimation rate.

16 Two-stage Implementation

17 Main Algorithm

18 Initial Step: Non-symmetric Unbiased Estimator
Construct an unbiased empirical moment-based tensor $\widetilde{T}(y_i, X_i) \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ as follows:
$$\widetilde{T} := \frac{1}{n} \sum_{i=1}^n y_i\, u_i \circ v_i \circ w_i,$$
which depends only on the observations. This is used for the non-symmetric case, where $u_i, v_i, w_i$ are independent standard Gaussian random vectors.
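
A sketch of this moment estimator with einsum, assuming the sketch vectors are stored row-wise in arrays U, V, W (names illustrative):

```python
import numpy as np

def moment_tensor_nonsym(y, U, V, W):
    """T_tilde = (1/n) sum_i y_i * u_i o v_i o w_i, with U, V, W of shapes (n, p1), (n, p2), (n, p3)."""
    n = y.shape[0]
    return np.einsum('i,ia,ib,ic->abc', y, U, V, W) / n
```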

19 Initial Step: Symmetric Unbiased Estimator
Construct an unbiased empirical moment-based tensor $\widetilde{T}_s(y_i, X_i) \in \mathbb{R}^{p \times p \times p}$ as follows:
$$\widetilde{T}_s := \frac{1}{6}\Big[\frac{1}{n} \sum_{i=1}^n y_i\, x_i \circ x_i \circ x_i - U\Big],$$
which depends only on the observations, where the bias term is
$$U = \sum_{j=1}^p \big(m_1 \circ e_j \circ e_j + e_j \circ m_1 \circ e_j + e_j \circ e_j \circ m_1\big), \qquad m_1 = \frac{1}{n}\sum_{i=1}^n y_i x_i.$$
Here $\{e_j\}_{j=1}^p$ are the canonical basis vectors of $\mathbb{R}^p$. The bias term $U$ is due to the correlation among the three identical Gaussian random vectors.
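
The symmetric estimator and its bias correction can be sketched the same way (variable names are illustrative; the three einsum terms are the three sums in $U$):

```python
import numpy as np

def moment_tensor_sym(y, X):
    """T_tilde_s = (1/6) * [ (1/n) sum_i y_i x_i o x_i o x_i - U ]."""
    n, p = X.shape
    M3 = np.einsum('i,ia,ib,ic->abc', y, X, X, X) / n   # (1/n) sum_i y_i x_i o x_i o x_i
    m1 = X.T @ y / n                                    # m_1 = (1/n) sum_i y_i x_i
    I = np.eye(p)
    U = (np.einsum('a,jb,jc->abc', m1, I, I)            # sum_j m_1 o e_j o e_j
         + np.einsum('ja,b,jc->abc', I, m1, I)          # sum_j e_j o m_1 o e_j
         + np.einsum('ja,jb,c->abc', I, I, m1))         # sum_j e_j o e_j o m_1
    return (M3 - U) / 6
```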

20 Initial Step: Decompose the Unbiased Estimator
Intuition: $\mathbb{E}[\widetilde{T}_s] = T$. Observation: $\widetilde{T}_s$. Noise $E = \widetilde{T}_s - \mathbb{E}(\widetilde{T}_s)$: approximation error.
Decompose $\widetilde{T}_s$ to obtain $\{\eta_k^{(0)}, \beta_k^{(0)}\}$, or decompose $\widetilde{T}$ to obtain $\{\eta_k^{(0)}, \beta_{1k}^{(0)}, \beta_{2k}^{(0)}, \beta_{3k}^{(0)}\}$, through sparse tensor decomposition (see the next slide for details).
This is far from the optimal estimate, but good enough as a warm start.

23 Sparse Tensor Decomposition for Symmetric Tensor
Generate $L$ starting points $\{\beta_l^{\mathrm{start}}\}_{l=1}^L$.
For each starting point, compute a non-sparse factor of the moment-based $\widetilde{T}_s$ via the symmetric tensor power update:
$$\beta_l^{(t+1)} = \frac{\widetilde{T}_s \times_2 \beta_l^{(t)} \times_3 \beta_l^{(t)}}{\big\|\widetilde{T}_s \times_2 \beta_l^{(t)} \times_3 \beta_l^{(t)}\big\|_2},$$
where for $\widetilde{T}_s \in \mathbb{R}^{p \times p \times p}$ and $x \in \mathbb{R}^p$ we define $\widetilde{T}_s \times_2 x \times_3 x := \sum_{j,l} x_j x_l [\widetilde{T}_s]_{:,j,l}$.
Get a sparse solution $\bar{\beta}_l^{(t+1)}$ via thresholding or truncation.
Cluster the $L$ single-component sets $\{\beta_l^{(T)}, \beta_l^{(T)}, \beta_l^{(T)}\}_{l=1}^L$ into $K$ clusters to obtain a rank-$K$ decomposition $\{\beta_k^{(0)}, \beta_k^{(0)}, \beta_k^{(0)}\}_{k=1}^K$.
Different from matrix SVD due to non-orthogonality.
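
One iteration of this symmetric power update, with hard truncation used as the sparsification step, might look as follows (keeping the $s$ largest-magnitude entries and renormalizing is one common choice, stated here as an assumption):

```python
import numpy as np

def sparse_power_update_sym(T_s, beta, s):
    """beta^{(t+1)} = T_s x_2 beta x_3 beta / ||.||_2, followed by hard truncation to s entries."""
    v = np.einsum('ijk,j,k->i', T_s, beta, beta)   # T_s x_2 beta x_3 beta
    v /= np.linalg.norm(v)
    keep = np.argsort(np.abs(v))[-s:]              # indices of the s largest |entries|
    v_sparse = np.zeros_like(v)
    v_sparse[keep] = v[keep]
    return v_sparse / np.linalg.norm(v_sparse)
```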

24 Sparse Tensor Decomposition for Non-symmetric Tensor
Generate $L$ starting points $\{\beta_{1l}^{\mathrm{start}}, \beta_{2l}^{\mathrm{start}}, \beta_{3l}^{\mathrm{start}}\}_{l=1}^L$.
For each starting point, compute a non-sparse factor of the moment-based $\widetilde{T}$ via the alternating tensor power update:
$$\beta_{1l}^{(t+1)} = \frac{\widetilde{T} \times_2 \beta_{2l}^{(t)} \times_3 \beta_{3l}^{(t)}}{\big\|\widetilde{T} \times_2 \beta_{2l}^{(t)} \times_3 \beta_{3l}^{(t)}\big\|_2};$$
the updates for $\beta_{2l}^{(t+1)}$ and $\beta_{3l}^{(t+1)}$ are similar.
Get a sparse solution $\{\bar{\beta}_{1l}^{(t+1)}, \bar{\beta}_{2l}^{(t+1)}, \bar{\beta}_{3l}^{(t+1)}\}$ via thresholding or truncation.
Cluster the $L$ single-component sets $\{\beta_{1l}^{(T)}, \beta_{2l}^{(T)}, \beta_{3l}^{(T)}\}_{l=1}^L$ into $K$ clusters to obtain a rank-$K$ decomposition $\{\beta_{1k}^{(0)}, \beta_{2k}^{(0)}, \beta_{3k}^{(0)}\}_{k=1}^K$.
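
A sketch of one round of the alternating power updates (the sparsification step from the slide is omitted here for brevity; names are illustrative):

```python
import numpy as np

def alternating_power_update(T, b1, b2, b3):
    """One round of alternating power updates for a non-symmetric tensor T of shape (p1, p2, p3)."""
    b1 = np.einsum('ijk,j,k->i', T, b2, b3); b1 /= np.linalg.norm(b1)
    b2 = np.einsum('ijk,i,k->j', T, b1, b3); b2 /= np.linalg.norm(b2)
    b3 = np.einsum('ijk,i,j->k', T, b1, b2); b3 /= np.linalg.norm(b3)
    return b1, b2, b3
```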

25 Gradient Update: Thresholded Gradient Descent
Input the initial estimator $\{\eta_k^{(0)}, \beta_k^{(0)}\}_{k=1}^K$. In each iteration, update $\{\beta_k\}_{k=1}^K$ as
$$\tilde{\beta}_k^{(t+1)} = \beta_k^{(t)} - \frac{\mu_t}{\varphi}\, \nabla_{\beta_k} L_1\big(\eta_k^{(0)}, \beta_k^{(t)}\big),$$
where $\varphi = \frac{1}{n}\sum_{i=1}^n y_i^2$ and $\mu_t$ is the step size.
Sparsify the current update by thresholding: $\beta_k^{(t+1)} = \phi_\rho\big(\tilde{\beta}_k^{(t+1)}\big)$.
Normalize the final update $\widehat{\beta}_k^{(T)} = \beta_k^{(T)} / \|\beta_k^{(T)}\|_2$ and update the weight $\widehat{\eta}_k = \eta_k^{(0)} \|\beta_k^{(T)}\|_2^3$.
Use an alternating update for non-symmetric tensor recovery.
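
A hedged sketch of one thresholded gradient step for the symmetric loss $L_1$; the gradient expression below follows by differentiating $L_1$, while the step size mu, threshold rho, and array layout are illustrative assumptions:

```python
import numpy as np

def thresholded_grad_step(eta0, B, y, X, mu, rho):
    """One step: beta_k <- phi_rho( beta_k - (mu/phi) * grad_{beta_k} L_1 ), for all k at once."""
    phi = np.mean(y ** 2)                           # phi = (1/n) sum_i y_i^2
    proj = X @ B                                    # (n, K) matrix of x_i' beta_k
    resid = y - (proj ** 3) @ eta0                  # r_i = y_i - sum_k eta_k (x_i' beta_k)^3
    # grad_{beta_k} L_1 = -6 * eta_k * sum_i r_i (x_i' beta_k)^2 x_i
    grad = -6 * (X.T @ (resid[:, None] * proj ** 2)) * eta0[None, :]
    B_new = B - (mu / phi) * grad
    B_new[np.abs(B_new) < rho] = 0.0                # hard thresholding phi_rho
    return B_new
```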

26 Gradient Update: Thresholded Gradient Descent
Input the initial estimator $\{\eta_k^{(0)}, \beta_{1k}^{(0)}, \beta_{2k}^{(0)}, \beta_{3k}^{(0)}\}_{k=1}^K$. In each iteration, alternately update $\{\beta_{1k}, \beta_{2k}, \beta_{3k}\}_{k=1}^K$ as
$$\tilde{\beta}_{1k}^{(t+1)} = \beta_{1k}^{(t)} - \frac{\mu_t}{\varphi}\, \nabla_{\beta_{1k}} L_2\big(\eta_k^{(0)}, \beta_{1k}^{(t)}, \beta_{2k}^{(t)}, \beta_{3k}^{(t)}\big),$$
where $\varphi = \frac{1}{n}\sum_{i=1}^n y_i^2$ and $\mu_t$ is the step size. The updates for $\beta_{2k}^{(t+1)}$ and $\beta_{3k}^{(t+1)}$ are similar.
Sparsify the current update by thresholding: $\beta_{jk}^{(t+1)} = \phi_\rho\big(\tilde{\beta}_{jk}^{(t+1)}\big)$ for $j = 1, 2, 3$.
Normalize the final update $\widehat{\beta}_{jk}^{(T)} = \beta_{jk}^{(T)} / \|\beta_{jk}^{(T)}\|_2$ and update the weight $\widehat{\eta}_k = \eta_k^{(0)} \|\beta_{1k}^{(T)}\|_2 \|\beta_{2k}^{(T)}\|_2 \|\beta_{3k}^{(T)}\|_2$.
This is the alternating update for non-symmetric tensor recovery.

27 Non-asymptotic Analysis

28 Non-asymptotic Upper Bound
Theorem. Suppose some regularity conditions on the true tensor parameter hold, and assume $n \ge C_0 s^2 \log p$ for some large constant $C_0$. Denote
$$Z^{(t)} = \sum_{k=1}^K \big\| \eta_k^{1/3} \beta_k^{(t)} - \eta_k^{1/3} \beta_k \big\|_2^2.$$
For any $t = 0, 1, 2, \dots$, the factor-wise estimator satisfies, with high probability,
$$Z^{(t+1)} \le \underbrace{\kappa\, Z^{(t)}}_{\text{computational error}} + \underbrace{\frac{C_1}{\eta_{\min}^{4/3}} \cdot \frac{\sigma^2 s \log p}{n}}_{\text{statistical error}},$$
where $\kappa$ is a contraction parameter between 0 and 1, $\eta_{\min} = \min_k \{\eta_k\}$, $\sigma$ is the noise level, and $C_0, C_1$ are absolute constants.

29 Remarks
Interesting characterization of the computational error and the statistical error.
Geometric convergence to the truth in the noiseless case; minimax optimal statistical rate (shown later).
The error bound is dominated by the computational error in the first several iterations and then by the statistical error. This gives a useful guideline for choosing the stopping rule.
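
Unrolling the contraction above (a standard step, spelled out here to make the remark explicit) gives
$$Z^{(T)} \le \kappa^{T} Z^{(0)} + \frac{1}{1-\kappa}\,(\text{statistical error}),$$
so the geometric term dominates in the first iterations and the statistical term takes over once $\kappa^{T} Z^{(0)}$ has shrunk below it.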

30 Remarks
When $t \ge T$ for some large enough $T$, the final estimator is bounded by
$$\big\| T^{(T)} - T \big\|_F^2 \le \frac{C \sigma^2 K s \log p}{n}$$
with high probability. Minimax optimal rate!

31 Class of Sparse and Low-rank Tensors
Sparse CP decomposition:
$$T = \sum_{k=1}^K \beta_k \circ \beta_k \circ \beta_k, \qquad \|\beta_k\|_0 \le s \ \text{for } k \in [K].$$
Incoherence condition (nearly orthogonal): the components of the true tensor are incoherent such that
$$\max_{i \ne j \in [K]} |\langle \beta_i, \beta_j \rangle| \le \frac{C}{\sqrt{s}}.$$

32 Minimax Lower Bound
Theorem. Consider the class of tensors satisfying the sparse CP decomposition and the incoherence condition. Suppose we sample via cubic measurements with i.i.d. standard normal sketches and i.i.d. $N(0, \sigma^2)$ noise. Then we have the following lower bound on the recovery loss over this class of low-rank tensors:
$$\inf_{\widehat{T}} \sup_{T \in \mathcal{F}} \mathbb{E}\, \| \widehat{T} - T \|_F^2 \ge \frac{c \sigma^2 K s \log(ep/s)}{n}.$$

33 Optimal Estimation Rate
Theorem. Consider the class of tensors $\mathcal{F}_{p,K,s}$ satisfying the sparse CP decomposition and the incoherence condition. Suppose we observe $n$ samples $\{y_i, X_i\}_{i=1}^n$ from the symmetric tensor cubic sketching model, where $n \ge C s^2 \log p$ for some large constant $C$. Then the estimator $\widehat{T}$ achieves
$$\inf_{\widehat{T}} \sup_{T \in \mathcal{F}_{p,K,s}} \mathbb{E}\, \| \widehat{T} - T \|_F^2 \asymp \underbrace{\frac{\sigma^2 K s \log(p/s)}{n}}_{R}$$
when $\log p \asymp \log(p/s)$. Here $\sigma$ is the noise level.

34 Remarks
Our analysis is non-asymptotic and our estimator is rate-optimal.
In general, the rate $R$ is the outcome of a trade-off between statistical error and optimization error.
A similar argument holds for the non-symmetric case, though different technical tools are used.
To overcome the obstacle posed by high-order Gaussian random variables, we develop a novel high-order concentration inequality using a truncation argument and the $\psi_\alpha$-norm.

35 Application to Interaction Effect Model
Given the response $y \in \mathbb{R}^n$ and covariates $X \in \mathbb{R}^{n \times p}$, the regression model with three-way interactions can be formulated as
$$y_l = \beta_0 + \sum_{i=1}^p X_{li} \beta_i + \sum_{i,j=1}^p \gamma_{ij} X_{li} X_{lj} + \sum_{i,j,k=1}^p \eta_{ijk} X_{li} X_{lj} X_{lk} + \epsilon_l = \langle B, \widetilde{X}_l \circ \widetilde{X}_l \circ \widetilde{X}_l \rangle + \epsilon_l,$$
where $\widetilde{X}_l = (1, X_l) \in \mathbb{R}^{p+1}$.
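
A small sketch of how the augmented covariate $\widetilde{X}_l = (1, X_l)$ turns the whole interaction regression into a single tensor inner product, given a coefficient tensor B (names are illustrative):

```python
import numpy as np

def interaction_predictions(B, X):
    """y_hat_l = <B, x_tilde_l o x_tilde_l o x_tilde_l> with x_tilde_l = (1, X_l)."""
    n = X.shape[0]
    X_tilde = np.hstack([np.ones((n, 1)), X])       # prepend the intercept coordinate
    return np.einsum('abc,la,lb,lc->l', B, X_tilde, X_tilde, X_tilde)
```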

36 Application to Interaction Effect Model
It is reasonable to assume that $B$ possesses low-rank and/or sparsity structures in some biometrics studies (Hung et al., 2016):
$$y_l = \Big\langle \sum_{k=1}^K \eta_k\, \beta_k \circ \beta_k \circ \beta_k,\; \widetilde{X}_l \circ \widetilde{X}_l \circ \widetilde{X}_l \Big\rangle + \varepsilon_l.$$
The symmetric tensor recovery model can thus be treated as a high-order interaction effect model.

37 Some Changes to the Algorithm
The unbiased empirical moment-based tensor $A \in \mathbb{R}^{(p+1) \times (p+1) \times (p+1)}$ for the interaction effect model is constructed as follows. Define three quantities
$$a = \frac{1}{n} \sum_{l=1}^n y_l \widetilde{X}_l, \qquad \widetilde{A} = \frac{1}{n} \sum_{l=1}^n y_l\, \widetilde{X}_l \circ \widetilde{X}_l \circ \widetilde{X}_l, \qquad \bar{A} = \frac{1}{6}\Big(\widetilde{A} - \sum_{j=1}^p \big(a \circ e_j \circ e_j + e_j \circ a \circ e_j + e_j \circ e_j \circ a\big)\Big).$$
For $i, j, k \ne 0$, $A_{ijk} = \bar{A}_{ijk}$. For $i \ne 0$, $A_{0,0,i} = \frac{1}{3}\widetilde{A}_{0,0,i} - \frac{1}{6}\big(\sum_{k=1}^p \widetilde{A}_{k,k,i} - (p+2) a_i\big)$. And $A_{0,0,0} = \frac{1}{2p-2}\big(\sum_{k=1}^p \widetilde{A}_{0,k,k} - (p+2) \widetilde{A}_{0,0,0}\big)$.
The additional intercept term changes the model structure dramatically.

38 Numerical Study
Recover a low-rank and sparse symmetric tensor $T \in \mathbb{R}^{p \times p \times p}$ from $y_i = \langle T, X_i \rangle + \epsilon_i$, $i = 1, \dots, n$.
Proportion of non-zero elements in each factor: $s = 0.3$. Tensor CP-rank $K = 3$. Replications: 200.
Stopping rule for initialization: the change $\|\beta_m^{(l+1)} - \beta_m^{(l)}\|$ is small enough. Stopping rule for the gradient update: the change $\|B^{(T+1)} - B^{(T)}\|_F$ is small enough.
The dimension, sample size, and noise level vary across scenarios.

39 [Figure] Left panel: relative error of the initialization. Right panel: percentage of successful recovery with varying sample size. Both are for the noiseless case with $p = 30$.

40 [Figure] Noisy case. Left panel: absolute error for recovering a rank-three tensor with varying noise level and sample size. Right panel: absolute error for recovering a rank-five tensor with varying noise level and sample size.

41 Thanks! And questions?
