High-dimensional statistics: Some progress and challenges ahead


1 High-dimensional statistics: Some progress and challenges ahead
Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
University College London, Master Class: Lecture
Joint work with: Alekh Agarwal, Arash Amini, Po-Ling Loh, Sahand Negahban, Garvesh Raskutti, Pradeep Ravikumar, Bin Yu.

2 Introduction
Classical asymptotic theory: sample size n → +∞ with the number of parameters p fixed:
law of large numbers, central limit theory
consistency of maximum likelihood estimation

3 Introduction
Classical asymptotic theory: sample size n → +∞ with the number of parameters p fixed:
law of large numbers, central limit theory
consistency of maximum likelihood estimation
Modern applications in science and engineering: large-scale problems, where both p and n may be large (possibly p ≫ n);
need for a high-dimensional theory that provides non-asymptotic results for (n, p).

4 Introduction
Classical asymptotic theory: sample size n → +∞ with the number of parameters p fixed:
law of large numbers, central limit theory
consistency of maximum likelihood estimation
Modern applications in science and engineering: large-scale problems, where both p and n may be large (possibly p ≫ n);
need for a high-dimensional theory that provides non-asymptotic results for (n, p).
Curses and blessings of high dimensionality:
exponential explosions in computational complexity
statistical curses (sample complexity)
concentration of measure

5 Introduction
Modern applications in science and engineering: large-scale problems, where both p and n may be large (possibly p ≫ n);
need for a high-dimensional theory that provides non-asymptotic results for (n, p).
Curses and blessings of high dimensionality:
exponential explosions in computational complexity
statistical curses (sample complexity)
concentration of measure
Key ideas: what embedded low-dimensional structures are present in the data? how can they be exploited algorithmically?

6 Vignette I: High-dimensional matrix estimation
Want to estimate a covariance matrix Σ ∈ R^{p×p}, given i.i.d. samples X_i ~ N(0, Σ), for i = 1, 2, ..., n.

7 Vignette I: High-dimensional matrix estimation
Want to estimate a covariance matrix Σ ∈ R^{p×p}, given i.i.d. samples X_i ~ N(0, Σ), for i = 1, 2, ..., n.
Classical approach: estimate Σ via the sample covariance matrix
$\widehat{\Sigma}_n := \frac{1}{n}\sum_{i=1}^{n} X_i X_i^T$, an average of p × p rank-one matrices.

8 Vignette I: High-dimensional matrix estimation
Want to estimate a covariance matrix Σ ∈ R^{p×p}, given i.i.d. samples X_i ~ N(0, Σ), for i = 1, 2, ..., n.
Classical approach: estimate Σ via the sample covariance matrix
$\widehat{\Sigma}_n := \frac{1}{n}\sum_{i=1}^{n} X_i X_i^T$, an average of p × p rank-one matrices.
Reasonable properties (p fixed, n increasing):
Unbiased: $\mathbb{E}[\widehat{\Sigma}_n] = \Sigma$
Consistent: $\widehat{\Sigma}_n \xrightarrow{a.s.} \Sigma$ as n → +∞
Asymptotic distributional properties available

9 Vignette I: High-dimensional matrix estimation
Want to estimate a covariance matrix Σ ∈ R^{p×p}, given i.i.d. samples X_i ~ N(0, Σ), for i = 1, 2, ..., n.
Classical approach: estimate Σ via the sample covariance matrix
$\widehat{\Sigma}_n := \frac{1}{n}\sum_{i=1}^{n} X_i X_i^T$, an average of p × p rank-one matrices.
An alternative experiment: fix some α > 0 and study behavior over sequences with p/n = α.
Does $\widehat{\Sigma}_{n(p)}$ converge to anything reasonable?

10 [Figure: empirical eigenvalue density (rescaled) of the sample covariance matrix versus the Marchenko–Pastur law, α = 0.5.] (Marchenko & Pastur, 1967)

11 [Figure: empirical eigenvalue density (rescaled) of the sample covariance matrix versus the Marchenko–Pastur law, α = 0.2.] (Marchenko & Pastur, 1967)
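To make the preceding experiment concrete, here is a small numerical sketch (my own illustration, not part of the slides; it assumes NumPy is available). It draws samples with p/n = α, forms the sample covariance, and reports how far its eigenvalues spread even though the true Σ is the identity, matching the Marchenko–Pastur support.

import numpy as np

def sample_cov_eigs(n, alpha, rng):
    """Eigenvalues of the sample covariance for p = alpha * n, true Sigma = I."""
    p = int(alpha * n)
    X = rng.standard_normal((n, p))      # rows X_i ~ N(0, I_p)
    Sigma_hat = X.T @ X / n              # sample covariance matrix
    return np.linalg.eigvalsh(Sigma_hat)

rng = np.random.default_rng(0)
for alpha in (0.5, 0.2):
    eigs = sample_cov_eigs(n=4000, alpha=alpha, rng=rng)
    # Marchenko-Pastur support for Sigma = I is [(1 - sqrt(alpha))^2, (1 + sqrt(alpha))^2].
    lo, hi = (1 - np.sqrt(alpha)) ** 2, (1 + np.sqrt(alpha)) ** 2
    print(f"alpha = {alpha}: eigenvalue range [{eigs.min():.2f}, {eigs.max():.2f}], "
          f"MP support [{lo:.2f}, {hi:.2f}]")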

12 Low-dimensional structure: Gaussian graphical models
Zero pattern of the inverse covariance:
$P(x_1, x_2, \dots, x_p) \propto \exp\big(-\tfrac{1}{2}\, x^T \Theta x\big)$.

13 Maximum likelihood with ℓ1-regularization
Zero pattern of the inverse covariance.
Set-up: samples from a random vector with sparse covariance Σ or sparse inverse covariance Θ ∈ R^{p×p}.

14 Maximum likelihood with ℓ1-regularization
Zero pattern of the inverse covariance.
Set-up: samples from a random vector with sparse covariance Σ or sparse inverse covariance Θ ∈ R^{p×p}.
Estimator (for the inverse covariance):
$\widehat{\Theta} \in \arg\min_{\Theta}\Big\{ \Big\langle \tfrac{1}{n}\sum_{i=1}^{n} x_i x_i^T,\, \Theta \Big\rangle - \log\det(\Theta) + \lambda_n \sum_{j \neq k} |\Theta_{jk}| \Big\}$
Some past work: Yuan & Lin, 2006; d'Aspremont et al., 2007; Bickel & Levina, 2007; El Karoui, 2007; Rothman et al., 2007; Zhou et al., 2007; Friedman et al., 2008; Lam & Fan, 2008; Ravikumar et al., 2008; Zhou, Cai & Huang, 2009.
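As a rough illustration of this estimator (a sketch of my own, not the code behind the slides), scikit-learn's GraphicalLasso solves essentially this ℓ1-penalized log-determinant program; the estimated graph is then read off from the sparsity pattern of the fitted precision matrix. The chain-graph set-up below is an assumption chosen for the demo.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p, n = 20, 500

# Sparse precision matrix Theta* (chain graph), then sample from N(0, Theta*^{-1}).
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# l1-regularized maximum likelihood for the inverse covariance.
model = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = model.precision_

# Recovered edge set = off-diagonal support of Theta_hat.
edges = np.argwhere(np.triu(np.abs(Theta_hat) > 1e-3, k=1))
print(f"{len(edges)} edges recovered (true chain graph has {p - 1})")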

15 Gauss–Markov models with hidden variables
[Figure: hidden variable Z connected to observed variables X_1, X_2, X_3, X_4.]
Problems with hidden variables: conditioned on the hidden Z, the vector X = (X_1, X_2, X_3, X_4) is Gauss–Markov.

16 Gauss–Markov models with hidden variables
[Figure: hidden variable Z connected to observed variables X_1, X_2, X_3, X_4.]
Problems with hidden variables: conditioned on the hidden Z, the vector X = (X_1, X_2, X_3, X_4) is Gauss–Markov.
The inverse covariance of X satisfies a {sparse, low-rank} decomposition: the 4 × 4 matrix with 1 + µ on the diagonal and µ off the diagonal splits as $I_{4\times 4} + \mu\, \mathbf{1}\mathbf{1}^T$, a sparse (identity) part plus a rank-one part.
(Chandrasekaran, Parrilo & Willsky, 2010)

17 Vignette II: High-dimensional sparse linear regression
[Figure: observation model y = Xθ* + w, with X of size n × p, support S and complement S^c.]
Set-up: noisy observations y = Xθ* + w with sparse θ*.
Estimator: Lasso program
$\widehat{\theta} \in \arg\min_{\theta}\Big\{ \frac{1}{n}\sum_{i=1}^{n}(y_i - x_i^T\theta)^2 + \lambda_n \sum_{j=1}^{p} |\theta_j| \Big\}$
Some past work: Tibshirani, 1996; Chen et al., 1998; Donoho & Huo, 2001; Tropp, 2004; Fuchs, 2004; Efron et al., 2004; Meinshausen & Buhlmann, 2005; Candes & Tao, 2005; Donoho, 2005; Haupt & Nowak, 2005; Zhao & Yu, 2006; Zou, 2006; Koltchinskii, 2007; van de Geer, ...
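A minimal sketch of this estimator (my illustration, with problem sizes chosen arbitrarily), using scikit-learn's coordinate-descent Lasso on a synthetic sparse regression problem; note that scikit-learn uses the (1/(2n))-scaled squared loss rather than the (1/n) scaling on the slide.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s, sigma = 200, 1000, 10, 0.5

# Sparse ground truth theta*, supported on S = {0, ..., s-1}.
theta_star = np.zeros(p)
theta_star[:s] = 1.0

X = rng.standard_normal((n, p))
y = X @ theta_star + sigma * rng.standard_normal(n)

# Lasso: (1/(2n)) ||y - X theta||_2^2 + lambda ||theta||_1  (sklearn's parametrization).
lam = 2 * sigma * np.sqrt(np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

support_hat = np.flatnonzero(np.abs(theta_hat) > 1e-3)
print("l2 error:", np.linalg.norm(theta_hat - theta_star))
print("estimated support:", support_hat)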

18 Application A: Compressed sensing (Donoho, 2005; Candes & Tao, 2005)
[Figure: y = Xβ*, with random projection matrix X of size n × p.]
(a) Image: vectorize to β* ∈ R^p. (b) Compute n random projections.

19 Application A: Compressed sensing (Donoho, 2005; Candes & Tao, 2005)
In practice, signals are sparse in a transform domain: θ* := Ψβ* is a sparse signal, where Ψ is an orthonormal matrix.
[Figure: y = XΨ^T θ*, with X of size n × p and θ* having s nonzeros.]
Reconstruct θ* (and hence the image β* = Ψ^T θ*) by finding a sparse solution to the under-constrained linear system y = X̃θ, where X̃ = XΨ^T is another random matrix.

20 Noiseless ℓ1 recovery: unrescaled sample size
[Figure: probability of exact recovery versus raw sample size n (µ = 0), for p = 128, 256, 512.]

21 Application B: Graph structure estimation
Let G = (V, E) be an undirected graph on p = |V| vertices.
A pairwise graphical model factorizes over the edges of the graph:
$P(x_1, \dots, x_p; \theta) \propto \exp\Big\{ \sum_{(s,t) \in E} \theta_{st}(x_s, x_t) \Big\}$.
Given n independent and identically distributed (i.i.d.) samples of X = (X_1, ..., X_p), identify the underlying graph structure.

22 Pseudolikelihood and neighborhood regression
Markov properties encode neighborhood structure:
$(X_s \mid X_{V\setminus s}) \stackrel{d}{=} (X_s \mid X_{N(s)})$
(conditioning on the full graph is equivalent in distribution to conditioning on the Markov blanket N(s)).
[Figure: node X_s with Markov blanket N(s) = {t, u, v, w} among neighbors X_t, X_u, X_v, X_w.]
Basis of the pseudolikelihood method (Besag, 1974).
Basis of many graph learning algorithms (Friedman et al., 1999; Csiszar & Talata, 2005; Abbeel et al., 2006; Meinshausen & Buhlmann, 2006).

23 Graph selection via neighborhood regression
[Figure: regress X_s on the remaining variables X_{\setminus s}.]
Predict X_s based on X_{\setminus s} := {X_t, t ≠ s}.

24 Graph selection via neighborhood regression
[Figure: regress X_s on the remaining variables X_{\setminus s}.]
Predict X_s based on X_{\setminus s} := {X_t, t ≠ s}.
1. For each node s ∈ V, compute a (regularized) maximum likelihood estimate:
$\widehat{\theta}[s] := \arg\min_{\theta \in \mathbb{R}^{p}} \Big\{ \underbrace{\tfrac{1}{n}\sum_{i=1}^{n} \mathcal{L}(\theta; X_{i\setminus s})}_{\text{local log-likelihood}} + \underbrace{\lambda_n \|\theta\|_1}_{\text{regularization}} \Big\}$

25 Graph selection via neighborhood regression
[Figure: regress X_s on the remaining variables X_{\setminus s}.]
Predict X_s based on X_{\setminus s} := {X_t, t ≠ s}.
1. For each node s ∈ V, compute a (regularized) maximum likelihood estimate:
$\widehat{\theta}[s] := \arg\min_{\theta \in \mathbb{R}^{p}} \Big\{ \underbrace{\tfrac{1}{n}\sum_{i=1}^{n} \mathcal{L}(\theta; X_{i\setminus s})}_{\text{local log-likelihood}} + \underbrace{\lambda_n \|\theta\|_1}_{\text{regularization}} \Big\}$
2. Estimate the local neighborhood $\widehat{N}(s)$ as the support of the regression vector $\widehat{\theta}[s] \in \mathbb{R}^{p}$.
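A sketch of neighborhood regression for a Gaussian graphical model (my own illustration, under an assumed chain-graph truth): in the Gaussian case the local conditional likelihood reduces to least squares, so the per-node problem is just a Lasso regression, and edges are read off from the supports.

import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_select(X, lam):
    """Estimate each node's neighborhood via an l1-regularized regression on the rest."""
    n, p = X.shape
    edges = set()
    for s in range(p):
        rest = [t for t in range(p) if t != s]
        theta_s = Lasso(alpha=lam, fit_intercept=False).fit(X[:, rest], X[:, s]).coef_
        for idx, t in enumerate(rest):
            if abs(theta_s[idx]) > 1e-3:
                edges.add(tuple(sorted((s, t))))  # "OR" rule: keep edge if either endpoint selects it
    return edges

rng = np.random.default_rng(2)
p, n = 15, 400
Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))   # chain-graph precision matrix
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)
print(sorted(neighborhood_select(X, lam=0.1)))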

26 US Senate network (voting data)

27 Outline
1 Lecture 1 (Today): Basics of sparse recovery
Sparse linear systems: ℓ0/ℓ1 equivalence
Noisy case: Lasso, ℓ2-bounds and variable selection
2 Lecture 2 (Tuesday): A more general theory
A range of structured regularizers
Group sparsity
Low-rank matrices and nuclear norm regularization
Matrix decomposition and robust PCA
Ingredients of a general understanding
3 Lecture 3 (Wednesday): High-dimensional kernel methods
Curse of dimensionality for non-parametric regression
Reproducing kernel Hilbert spaces
A simple but optimal estimator

28 Noiseless linear models and basis pursuit
[Figure: y = Xθ*, an n × p linear system with support S and complement S^c.]
Under-determined linear system: unidentifiable without constraints.
Say θ* ∈ R^p is sparse: supported on S ⊂ {1, 2, ..., p}.
ℓ0-optimization: $\widehat{\theta} = \arg\min_{\theta \in \mathbb{R}^p} \|\theta\|_0$ such that $X\theta = y$ — computationally intractable (NP-hard).
ℓ1-relaxation: $\widehat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \|\theta\|_1$ such that $X\theta = y$ — a linear program (easy to solve): the basis pursuit relaxation.
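The ℓ1 relaxation really is a linear program: splitting θ = θ⁺ − θ⁻ with θ⁺, θ⁻ ≥ 0 gives a standard-form LP. Here is a sketch (an illustration under my own problem sizes, not the slides' experiment) using scipy.optimize.linprog on a noiseless Gaussian design.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, p, s = 80, 200, 5

theta_star = np.zeros(p)
theta_star[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
X = rng.standard_normal((n, p))
y = X @ theta_star                   # noiseless observations

# Basis pursuit: min ||theta||_1 s.t. X theta = y, as an LP in (theta_plus, theta_minus).
c = np.ones(2 * p)                   # objective: sum(theta_plus) + sum(theta_minus)
A_eq = np.hstack([X, -X])            # X (theta_plus - theta_minus) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
theta_hat = res.x[:p] - res.x[p:]

print("exact recovery:", np.allclose(theta_hat, theta_star, atol=1e-5))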

29 Noiseless ℓ1 recovery: unrescaled sample size
[Figure: probability of exact recovery versus raw sample size n (µ = 0), for p = 128, 256, 512.]

30 Noiseless ℓ1 recovery: rescaled sample size
[Figure: probability of exact recovery versus rescaled sample size α := n/(s log(p/s)) (µ = 0), for p = 128, 256, 512.]

31 Restricted nullspace: necessary and sufficient
Definition. For a fixed S ⊆ {1, 2, ..., p}, the matrix X ∈ R^{n×p} satisfies the restricted nullspace property w.r.t. S, or RN(S) for short, if
$\underbrace{\{\Delta \in \mathbb{R}^p \mid X\Delta = 0\}}_{\mathcal{N}(X)} \;\cap\; \underbrace{\{\Delta \in \mathbb{R}^p \mid \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1\}}_{\mathcal{C}(S)} \;=\; \{0\}.$
(Donoho & Huo, 2001; Feuer & Nemirovski, 2003; Cohen et al., 2009)

32 Restricted nullspace: necessary and sufficient
Definition. For a fixed S ⊆ {1, 2, ..., p}, the matrix X ∈ R^{n×p} satisfies the restricted nullspace property w.r.t. S, or RN(S) for short, if
$\underbrace{\{\Delta \in \mathbb{R}^p \mid X\Delta = 0\}}_{\mathcal{N}(X)} \;\cap\; \underbrace{\{\Delta \in \mathbb{R}^p \mid \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1\}}_{\mathcal{C}(S)} \;=\; \{0\}.$
(Donoho & Huo, 2001; Feuer & Nemirovski, 2003; Cohen et al., 2009)
Proposition. Basis pursuit (ℓ1-relaxation) is exact for all S-sparse vectors ⟺ X satisfies RN(S).

33 Restricted nullspace: necessary and sufficient
Definition. For a fixed S ⊆ {1, 2, ..., p}, the matrix X ∈ R^{n×p} satisfies the restricted nullspace property w.r.t. S, or RN(S) for short, if
$\underbrace{\{\Delta \in \mathbb{R}^p \mid X\Delta = 0\}}_{\mathcal{N}(X)} \;\cap\; \underbrace{\{\Delta \in \mathbb{R}^p \mid \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1\}}_{\mathcal{C}(S)} \;=\; \{0\}.$
(Donoho & Huo, 2001; Feuer & Nemirovski, 2003; Cohen et al., 2009)
Proof (sufficiency):
(1) The error vector $\widehat{\Delta} = \widehat{\theta} - \theta^*$ satisfies $X\widehat{\Delta} = 0$, and hence $\widehat{\Delta} \in \mathcal{N}(X)$.
(2) Show that $\widehat{\Delta} \in \mathcal{C}(S)$:
Optimality of $\widehat{\theta}$: $\|\widehat{\theta}\|_1 \le \|\theta^*\|_1 = \|\theta^*_S\|_1$.
Sparsity of $\theta^*$: $\|\widehat{\theta}\|_1 = \|\theta^* + \widehat{\Delta}\|_1 = \|\theta^*_S + \widehat{\Delta}_S\|_1 + \|\widehat{\Delta}_{S^c}\|_1$.
Triangle inequality: $\|\theta^*_S + \widehat{\Delta}_S\|_1 + \|\widehat{\Delta}_{S^c}\|_1 \ge \|\theta^*_S\|_1 - \|\widehat{\Delta}_S\|_1 + \|\widehat{\Delta}_{S^c}\|_1$.
Combining the three displays gives $\|\widehat{\Delta}_{S^c}\|_1 \le \|\widehat{\Delta}_S\|_1$, i.e., $\widehat{\Delta} \in \mathcal{C}(S)$.
(3) Hence $\widehat{\Delta} \in \mathcal{N}(X) \cap \mathcal{C}(S)$, and (RN) ⟹ $\widehat{\Delta} = 0$.

34 Illustration of the restricted nullspace property
[Figure: the cone C(S) in R^3 and the nullspace of X.]
Consider θ* = (0, 0, θ*_3), so that S = {3}.
The error vector $\widehat{\Delta} = \widehat{\theta} - \theta^*$ belongs to the set $\mathcal{C}(S) := \{(\Delta_1, \Delta_2, \Delta_3) \in \mathbb{R}^3 \mid |\Delta_1| + |\Delta_2| \le |\Delta_3|\}$.

35 Some sufficient conditions
How to verify the RN property for a given sparsity s?
1. Elementwise incoherence condition (Donoho & Huo, 2001; Feuer & Nemirovski, 2003):
$\max_{j,k = 1, \dots, p} \Big| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{jk} \Big| \le \frac{\delta}{s}$

36 Some sufficient conditions
How to verify the RN property for a given sparsity s?
1. Elementwise incoherence condition (Donoho & Huo, 2001; Feuer & Nemirovski, 2003):
$\max_{j,k = 1, \dots, p} \Big| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{jk} \Big| \le \frac{\delta}{s}$
2. Restricted isometry, or submatrix incoherence (Candes & Tao, 2005):
$\max_{|U| \le 2s} \Big\| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{UU} \Big\|_{\mathrm{op}} \le \delta_{2s}$

37 Some sufficient conditions
How to verify the RN property for a given sparsity s?
1. Elementwise incoherence condition (Donoho & Huo, 2001; Feuer & Nemirovski, 2003):
$\max_{j,k = 1, \dots, p} \Big| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{jk} \Big| \le \frac{\delta}{s}$
Matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s² log p).
2. Restricted isometry, or submatrix incoherence (Candes & Tao, 2005):
$\max_{|U| \le 2s} \Big\| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{UU} \Big\|_{\mathrm{op}} \le \delta_{2s}$

38 Some sufficient conditions
How to verify the RN property for a given sparsity s?
1. Elementwise incoherence condition (Donoho & Huo, 2001; Feuer & Nemirovski, 2003):
$\max_{j,k = 1, \dots, p} \Big| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{jk} \Big| \le \frac{\delta}{s}$
Matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s² log p).
2. Restricted isometry, or submatrix incoherence (Candes & Tao, 2005):
$\max_{|U| \le 2s} \Big\| \Big( \frac{X^T X}{n} - I_{p \times p} \Big)_{UU} \Big\|_{\mathrm{op}} \le \delta_{2s}$
Matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s log(p/s)).
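A quick numerical look at condition 1 (my own sketch, not from the slides): for a standard Gaussian design, the largest off-diagonal entry of X^T X / n is on the order of √(log p / n), so the δ/s requirement is met once n is roughly of order s² log p.

import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 500
X = rng.standard_normal((n, p))

G = X.T @ X / n - np.eye(p)            # deviation of the Gram matrix from the identity
max_dev = np.max(np.abs(G))
print(f"max |(X^T X / n - I)_jk| = {max_dev:.3f}")
print(f"compare sqrt(log p / n)  = {np.sqrt(np.log(p) / n):.3f}")
# Elementwise incoherence at level delta/s therefore needs roughly n >~ s^2 log p.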

39 Violating matrix incoherence (elementwise/RIP)
Important: incoherence/RIP conditions imply RN, but are far from necessary. It is very easy to violate them...

40 Violating matrix incoherence (elementwise/RIP)
Form a random design matrix $X = [x_1\; x_2\; \cdots\; x_p]$ (p columns) $= [X_1^T;\; X_2^T;\; \dots;\; X_n^T]$ (n rows) $\in \mathbb{R}^{n \times p}$, with each row $X_i \sim N(0, \Sigma)$, i.i.d.
Example: for some µ ∈ (0, 1), consider the covariance matrix $\Sigma = (1-\mu)\, I_{p \times p} + \mu\, \mathbf{1}\mathbf{1}^T$.

41 Violating matrix incoherence (elementwise/RIP)
Form a random design matrix $X = [x_1\; x_2\; \cdots\; x_p]$ (p columns) $= [X_1^T;\; X_2^T;\; \dots;\; X_n^T]$ (n rows) $\in \mathbb{R}^{n \times p}$, with each row $X_i \sim N(0, \Sigma)$, i.i.d.
Example: for some µ ∈ (0, 1), consider the covariance matrix $\Sigma = (1-\mu)\, I_{p \times p} + \mu\, \mathbf{1}\mathbf{1}^T$.
Elementwise incoherence violated: for any j ≠ k,
$\mathbb{P}\Big[ \Big| \frac{\langle x_j, x_k \rangle}{n} - \mu \Big| \ge \epsilon \Big] \le c_1 \exp(-c_2 n \epsilon^2)$,
so the column inner products concentrate around µ rather than 0.

42 Violating matrix incoherence (elementwise/RIP)
Form a random design matrix $X = [x_1\; x_2\; \cdots\; x_p]$ (p columns) $= [X_1^T;\; X_2^T;\; \dots;\; X_n^T]$ (n rows) $\in \mathbb{R}^{n \times p}$, with each row $X_i \sim N(0, \Sigma)$, i.i.d.
Example: for some µ ∈ (0, 1), consider the covariance matrix $\Sigma = (1-\mu)\, I_{p \times p} + \mu\, \mathbf{1}\mathbf{1}^T$.
Elementwise incoherence violated: for any j ≠ k,
$\mathbb{P}\Big[ \Big| \frac{\langle x_j, x_k \rangle}{n} - \mu \Big| \ge \epsilon \Big] \le c_1 \exp(-c_2 n \epsilon^2)$,
so the column inner products concentrate around µ rather than 0.
RIP constants tend to infinity as (n, |S|) increases:
$\mathbb{P}\Big[ \Big\| \frac{X_S^T X_S}{n} - I_{s \times s} \Big\|_{\mathrm{op}} \le \mu(s-1) - \epsilon \Big] \le c_1 \exp(-c_2 n \epsilon^2)$.
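A sketch of this example (illustration only, with my own problem sizes): under Σ = (1 − µ)I + µ11^T the empirical column inner products sit near µ rather than 0, and the operator-norm deviation of the s × s Gram submatrix grows roughly like µ(s − 1), so both sufficient conditions fail.

import numpy as np

rng = np.random.default_rng(5)
n, p, mu = 2000, 500, 0.5

# Rows X_i ~ N(0, Sigma) with Sigma = (1 - mu) I + mu * 11^T.
Sigma = (1 - mu) * np.eye(p) + mu * np.ones((p, p))
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T

G = X.T @ X / n
off_diag = G - np.diag(np.diag(G))
print("typical off-diagonal of X^T X / n:", np.mean(np.abs(off_diag[off_diag != 0])))  # ~ mu, not ~ 0

for s in (10, 50, 200):
    op_dev = np.linalg.norm(G[:s, :s] - np.eye(s), ord=2)
    print(f"s = {s:3d}: ||X_S^T X_S / n - I||_op = {op_dev:.2f}  (vs mu*(s-1) = {mu * (s - 1):.1f})")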

43 Noiseless ℓ1 recovery for µ = 0.5
[Figure: probability of exact recovery versus rescaled sample size α := n/(s log(p/s)) (µ = 0.5), for p = 128, 256, 512.]
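A sketch reproducing one point of this experiment (my own illustration, with hypothetical problem sizes): even with the µ = 0.5 correlated design that violates incoherence and RIP, basis pursuit still recovers exactly once n is a modest multiple of s log(p/s).

import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, y):
    """min ||theta||_1 s.t. X theta = y, as an LP in (theta_plus, theta_minus)."""
    n, p = X.shape
    res = linprog(np.ones(2 * p), A_eq=np.hstack([X, -X]), b_eq=y, bounds=(0, None))
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(8)
p, s, mu = 256, 8, 0.5
n = int(6 * s * np.log(p / s))                       # rescaled sample size alpha = 6

Sigma = (1 - mu) * np.eye(p) + mu * np.ones((p, p))  # correlated design from the example above
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T

theta_star = np.zeros(p)
theta_star[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
theta_hat = basis_pursuit(X, X @ theta_star)

print("exact recovery despite violated incoherence:", np.allclose(theta_hat, theta_star, atol=1e-5))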

44 Direct result for restricted nullspace/eigenvalues
Theorem (Raskutti, W. & Yu, 2010). Consider a random design X ∈ R^{n×p} with each row X_i ~ N(0, Σ) i.i.d., and define κ(Σ) = max_{j=1,2,...,p} Σ_jj. Then for universal constants c_1, c_2,
$\frac{\|X\theta\|_2}{\sqrt{n}} \;\ge\; \frac{1}{2}\,\|\Sigma^{1/2}\theta\|_2 \;-\; 9\,\kappa(\Sigma)\,\sqrt{\frac{\log p}{n}}\;\|\theta\|_1 \qquad \text{for all } \theta \in \mathbb{R}^p,$
with probability greater than $1 - c_1 \exp(-c_2 n)$.

45 Direct result for restricted nullspace/eigenvalues
Theorem (Raskutti, W. & Yu, 2010). Consider a random design X ∈ R^{n×p} with each row X_i ~ N(0, Σ) i.i.d., and define κ(Σ) = max_{j=1,2,...,p} Σ_jj. Then for universal constants c_1, c_2,
$\frac{\|X\theta\|_2}{\sqrt{n}} \;\ge\; \frac{1}{2}\,\|\Sigma^{1/2}\theta\|_2 \;-\; 9\,\kappa(\Sigma)\,\sqrt{\frac{\log p}{n}}\;\|\theta\|_1 \qquad \text{for all } \theta \in \mathbb{R}^p,$
with probability greater than $1 - c_1 \exp(-c_2 n)$.
Much less restrictive than incoherence/RIP conditions; many interesting matrix families are covered:
Toeplitz dependency
constant µ-correlation (previous example)
the covariance matrix Σ can even be degenerate
Extensions to sub-Gaussian matrices (Rudelson & Zhou, 2012).
Related results hold for generalized linear models.

46 Easy verification of restricted nullspace
For any Δ ∈ C(S), we have $\|\Delta\|_1 = \|\Delta_S\|_1 + \|\Delta_{S^c}\|_1 \le 2\|\Delta_S\|_1 \le 2\sqrt{s}\,\|\Delta\|_2$.
Applying the previous result:
$\frac{\|X\Delta\|_2}{\sqrt{n}} \;\ge\; \underbrace{\Big\{ \frac{\lambda_{\min}(\Sigma^{1/2})}{2} - 8\,\kappa(\Sigma)\,\sqrt{\frac{s \log p}{n}} \Big\}}_{\gamma(\Sigma)} \;\|\Delta\|_2.$

47 Easy verification of restricted nullspace
For any Δ ∈ C(S), we have $\|\Delta\|_1 = \|\Delta_S\|_1 + \|\Delta_{S^c}\|_1 \le 2\|\Delta_S\|_1 \le 2\sqrt{s}\,\|\Delta\|_2$.
Applying the previous result:
$\frac{\|X\Delta\|_2}{\sqrt{n}} \;\ge\; \underbrace{\Big\{ \frac{\lambda_{\min}(\Sigma^{1/2})}{2} - 8\,\kappa(\Sigma)\,\sqrt{\frac{s \log p}{n}} \Big\}}_{\gamma(\Sigma)} \;\|\Delta\|_2.$
We have actually proven much more than restricted nullspace...

48 Easy verification of restricted nullspace
For any Δ ∈ C(S), we have $\|\Delta\|_1 = \|\Delta_S\|_1 + \|\Delta_{S^c}\|_1 \le 2\|\Delta_S\|_1 \le 2\sqrt{s}\,\|\Delta\|_2$.
Applying the previous result:
$\frac{\|X\Delta\|_2}{\sqrt{n}} \;\ge\; \underbrace{\Big\{ \frac{\lambda_{\min}(\Sigma^{1/2})}{2} - 8\,\kappa(\Sigma)\,\sqrt{\frac{s \log p}{n}} \Big\}}_{\gamma(\Sigma)} \;\|\Delta\|_2.$
We have actually proven much more than restricted nullspace...
Definition. A design matrix X ∈ R^{n×p} satisfies the restricted eigenvalue (RE) condition over S (denoted RE(S)) with parameters (α, γ), γ > 0, if
$\frac{\|X\Delta\|_2^2}{n} \ge \gamma\,\|\Delta\|_2^2 \qquad \text{for all } \Delta \in \mathbb{R}^p \text{ such that } \|\Delta_{S^c}\|_1 \le \alpha\,\|\Delta_S\|_1.$
(van de Geer, 2007; Bickel, Ritov & Tsybakov, 2008)

49 Lasso and restricted eigenvalues
Turning to noisy observations...
[Figure: y = Xθ* + w, with X of size n × p, support S and complement S^c.]
Estimator: Lasso program
$\widehat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \mathbb{R}^p}\Big\{ \frac{1}{2n}\|y - X\theta\|_2^2 + \lambda_n \|\theta\|_1 \Big\}.$
Goal: obtain bounds on $\|\widehat{\theta}_{\lambda_n} - \theta^*\|_2$ that hold with high probability.

50 Lasso bounds: Four simple steps
Let's analyze the constrained version:
$\min_{\theta \in \mathbb{R}^p} \frac{1}{2n}\|y - X\theta\|_2^2 \quad \text{such that} \quad \|\theta\|_1 \le R = \|\theta^*\|_1.$

51 Lasso bounds: Four simple steps
Let's analyze the constrained version:
$\min_{\theta \in \mathbb{R}^p} \frac{1}{2n}\|y - X\theta\|_2^2 \quad \text{such that} \quad \|\theta\|_1 \le R = \|\theta^*\|_1.$
(1) By optimality of $\widehat{\theta}$ and feasibility of $\theta^*$: $\frac{1}{2n}\|y - X\widehat{\theta}\|_2^2 \le \frac{1}{2n}\|y - X\theta^*\|_2^2$.

52 Lasso bounds: Four simple steps
Let's analyze the constrained version:
$\min_{\theta \in \mathbb{R}^p} \frac{1}{2n}\|y - X\theta\|_2^2 \quad \text{such that} \quad \|\theta\|_1 \le R = \|\theta^*\|_1.$
(1) By optimality of $\widehat{\theta}$ and feasibility of $\theta^*$: $\frac{1}{2n}\|y - X\widehat{\theta}\|_2^2 \le \frac{1}{2n}\|y - X\theta^*\|_2^2$.
(2) Derive a basic inequality by re-arranging in terms of $\widehat{\Delta} = \widehat{\theta} - \theta^*$:
$\frac{\|X\widehat{\Delta}\|_2^2}{n} \le 2\,\Big\langle \widehat{\Delta},\, \frac{X^T w}{n} \Big\rangle.$

53 Lasso bounds: Four simple steps
Let's analyze the constrained version:
$\min_{\theta \in \mathbb{R}^p} \frac{1}{2n}\|y - X\theta\|_2^2 \quad \text{such that} \quad \|\theta\|_1 \le R = \|\theta^*\|_1.$
(1) By optimality of $\widehat{\theta}$ and feasibility of $\theta^*$: $\frac{1}{2n}\|y - X\widehat{\theta}\|_2^2 \le \frac{1}{2n}\|y - X\theta^*\|_2^2$.
(2) Derive a basic inequality by re-arranging in terms of $\widehat{\Delta} = \widehat{\theta} - \theta^*$:
$\frac{\|X\widehat{\Delta}\|_2^2}{n} \le 2\,\Big\langle \widehat{\Delta},\, \frac{X^T w}{n} \Big\rangle.$
(3) Restricted eigenvalue for the LHS; Hölder's inequality for the RHS:
$\gamma\,\|\widehat{\Delta}\|_2^2 \;\le\; \frac{\|X\widehat{\Delta}\|_2^2}{n} \;\le\; 2\,\Big\langle \widehat{\Delta},\, \frac{X^T w}{n} \Big\rangle \;\le\; 2\,\|\widehat{\Delta}\|_1 \Big\|\frac{X^T w}{n}\Big\|_\infty.$

54 Lasso bounds: Four simple steps
Let's analyze the constrained version:
$\min_{\theta \in \mathbb{R}^p} \frac{1}{2n}\|y - X\theta\|_2^2 \quad \text{such that} \quad \|\theta\|_1 \le R = \|\theta^*\|_1.$
(1) By optimality of $\widehat{\theta}$ and feasibility of $\theta^*$: $\frac{1}{2n}\|y - X\widehat{\theta}\|_2^2 \le \frac{1}{2n}\|y - X\theta^*\|_2^2$.
(2) Derive a basic inequality by re-arranging in terms of $\widehat{\Delta} = \widehat{\theta} - \theta^*$:
$\frac{\|X\widehat{\Delta}\|_2^2}{n} \le 2\,\Big\langle \widehat{\Delta},\, \frac{X^T w}{n} \Big\rangle.$
(3) Restricted eigenvalue for the LHS; Hölder's inequality for the RHS:
$\gamma\,\|\widehat{\Delta}\|_2^2 \;\le\; \frac{\|X\widehat{\Delta}\|_2^2}{n} \;\le\; 2\,\Big\langle \widehat{\Delta},\, \frac{X^T w}{n} \Big\rangle \;\le\; 2\,\|\widehat{\Delta}\|_1 \Big\|\frac{X^T w}{n}\Big\|_\infty.$
(4) As before, $\widehat{\Delta} \in \mathcal{C}(S)$, so that $\|\widehat{\Delta}\|_1 \le 2\sqrt{s}\,\|\widehat{\Delta}\|_2$, and hence
$\|\widehat{\Delta}\|_2 \le \frac{4}{\gamma}\,\sqrt{s}\,\Big\|\frac{X^T w}{n}\Big\|_\infty.$

55 Lasso error bounds for different models
Proposition. Suppose that the vector θ* has support S, with cardinality s, and the design matrix X satisfies RE(S) with parameter γ > 0. For the constrained Lasso with R = ‖θ*‖₁, or the regularized Lasso with λ_n = 2‖X^T w‖_∞/n, any optimal solution $\widehat{\theta}$ satisfies the bound
$\|\widehat{\theta} - \theta^*\|_2 \le \frac{4\sqrt{s}}{\gamma}\,\Big\|\frac{X^T w}{n}\Big\|_\infty.$

56 Lasso error bounds for different models
Proposition. Suppose that the vector θ* has support S, with cardinality s, and the design matrix X satisfies RE(S) with parameter γ > 0. For the constrained Lasso with R = ‖θ*‖₁, or the regularized Lasso with λ_n = 2‖X^T w‖_∞/n, any optimal solution $\widehat{\theta}$ satisfies the bound
$\|\widehat{\theta} - \theta^*\|_2 \le \frac{4\sqrt{s}}{\gamma}\,\Big\|\frac{X^T w}{n}\Big\|_\infty.$
This is a deterministic result on the set of optimizers; various corollaries follow for specific statistical models.

57 Lasso error bounds for different models
Proposition. Suppose that the vector θ* has support S, with cardinality s, and the design matrix X satisfies RE(S) with parameter γ > 0. For the constrained Lasso with R = ‖θ*‖₁, or the regularized Lasso with λ_n = 2‖X^T w‖_∞/n, any optimal solution $\widehat{\theta}$ satisfies the bound
$\|\widehat{\theta} - \theta^*\|_2 \le \frac{4\sqrt{s}}{\gamma}\,\Big\|\frac{X^T w}{n}\Big\|_\infty.$
This is a deterministic result on the set of optimizers; various corollaries follow for specific statistical models.
Compressed sensing: $X_{ij} \sim N(0,1)$ and bounded noise $\|w\|_2 \le \sigma\sqrt{n}$.
Deterministic design: X with bounded columns and $w_i \sim N(0, \sigma^2)$.
In either case, $\Big\|\frac{X^T w}{n}\Big\|_\infty \le \sqrt{\frac{3\sigma^2 \log p}{n}}$ w.h.p., so that $\|\widehat{\theta} - \theta^*\|_2 \le \frac{4\sigma}{\gamma}\,\sqrt{\frac{3\, s \log p}{n}}$.
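A sketch of how the √(s log p / n) scaling can be checked in simulation (illustration only; the regularization level and constants are my own choices, not tuned to the statement above):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
p, s, sigma = 2000, 10, 1.0

for n in (200, 400, 800, 1600):
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    X = rng.standard_normal((n, p))
    w = sigma * rng.standard_normal(n)
    y = X @ theta_star + w

    lam = 2 * sigma * np.sqrt(3 * np.log(p) / n)     # lambda_n of the order of ||X^T w||_inf / n
    theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(X, y).coef_

    err = np.linalg.norm(theta_hat - theta_star)
    rate = sigma * np.sqrt(s * np.log(p) / n)
    print(f"n = {n:5d}: ||theta_hat - theta*||_2 = {err:.3f}, "
          f"sigma * sqrt(s log p / n) = {rate:.3f}")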

58 Look-ahead to Lecture 2: A more general theory
Recap: thus far, we have
derived error bounds for basis pursuit and the Lasso (ℓ1-relaxation)
seen the importance of restricted nullspace and restricted eigenvalues

59 Look-ahead to Lecture 2: A more general theory
The big picture: lots of other estimators have the same basic form:
$\underbrace{\widehat{\theta}_{\lambda_n}}_{\text{Estimate}} \in \arg\min_{\theta \in \Omega}\Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$

60 Look-ahead to Lecture 2: A more general theory
The big picture: lots of other estimators have the same basic form:
$\underbrace{\widehat{\theta}_{\lambda_n}}_{\text{Estimate}} \in \arg\min_{\theta \in \Omega}\Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$
Past years have witnessed an explosion of results (compressed sensing, covariance estimation, block-sparsity, graphical models, matrix completion, ...).
Question: is there a common set of underlying principles?

61 Some papers ( wainwrig)
M. J. Wainwright (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, May 2009.
S. Negahban, P. Ravikumar, M. J. Wainwright and B. Yu (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, December 2012.
G. Raskutti, M. J. Wainwright and B. Yu (2011). Minimax rates for linear regression over ℓq-balls. IEEE Transactions on Information Theory, October 2011.
G. Raskutti, M. J. Wainwright and B. Yu (2010). Restricted nullspace and eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research, August 2010.
