High-dimensional statistics: Some progress and challenges ahead
1 High-dimensional statistics: Some progress and challenges ahead

Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
University College London, Master Class: Lecture 1

Joint work with: Alekh Agarwal, Arash Amini, Po-Ling Loh, Sahand Negahban, Garvesh Raskutti, Pradeep Ravikumar, Bin Yu.
2-5 Introduction

Classical asymptotic theory: sample size n → +∞ with the number of parameters p fixed
- law of large numbers, central limit theory
- consistency of maximum likelihood estimation

Modern applications in science and engineering:
- large-scale problems: both p and n may be large (possibly p ≫ n)
- need for a high-dimensional theory that provides non-asymptotic results in (n, p)

Curses and blessings of high dimensionality:
- exponential explosions in computational complexity
- statistical curses (sample complexity)
- concentration of measure

Key ideas: what embedded low-dimensional structures are present in data? how can they be exploited algorithmically?
6-9 Vignette I: High-dimensional matrix estimation

Want to estimate a covariance matrix Σ ∈ R^{p×p}, given i.i.d. samples X_i ~ N(0, Σ), for i = 1, 2, ..., n.

Classical approach: estimate Σ via the sample covariance matrix

    Σ̂_n := (1/n) Σ_{i=1}^n X_i X_i^T,

an average of p × p rank-one matrices.

Reasonable properties (p fixed, n increasing):
- Unbiased: E[Σ̂_n] = Σ
- Consistent: Σ̂_n → Σ almost surely as n → +∞
- Asymptotic distributional properties available

An alternative experiment: fix some α > 0 and study behavior over sequences with p/n = α. Does Σ̂_{n(p)} converge to anything reasonable?
10 [Figure: rescaled empirical eigenvalue density of the sample covariance vs. the Marchenko-Pastur law, α = 0.5 (Marchenko & Pastur, 1967).]

11 [Figure: rescaled empirical eigenvalue density of the sample covariance vs. the Marchenko-Pastur law, α = 0.2 (Marchenko & Pastur, 1967).]
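The phenomenon in these figures is easy to reproduce numerically. A minimal sketch, with arbitrary sizes and seed (not taken from the slides): draw X with i.i.d. N(0, 1) entries so that Σ = I_p, hold the aspect ratio α = p/n fixed, and inspect the eigenvalues of the sample covariance.

```python
import numpy as np

# Sketch: with Sigma = I_p and aspect ratio alpha = p/n held fixed, the
# eigenvalues of the sample covariance do NOT concentrate around 1; they
# spread over the Marchenko-Pastur support [(1-sqrt(alpha))^2, (1+sqrt(alpha))^2].
rng = np.random.default_rng(0)
n, alpha = 1000, 0.5
p = int(alpha * n)                       # p = 500
X = rng.standard_normal((n, p))          # rows X_i ~ N(0, I_p)
S = (X.T @ X) / n                        # sample covariance matrix
evals = np.linalg.eigvalsh(S)

lower = (1 - np.sqrt(alpha)) ** 2        # ~ 0.086
upper = (1 + np.sqrt(alpha)) ** 2        # ~ 2.914
print(evals.min(), evals.max())          # near the MP edges, far from 1
```

Even though every population eigenvalue equals 1, the sample eigenvalues fill out the entire Marchenko-Pastur interval in this regime.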
12 Low-dimensional structure: Gaussian graphical models

Zero pattern of the inverse covariance:

    P(x_1, x_2, ..., x_p) ∝ exp( -(1/2) x^T Θ x ).

Martin Wainwright (UC Berkeley), High-dimensional statistics, February.
13-14 Maximum likelihood with ℓ1-regularization

Zero pattern of the inverse covariance.

Set-up: samples from a random vector with sparse covariance Σ or sparse inverse covariance Θ ∈ R^{p×p}.

Estimator (for the inverse covariance):

    Θ̂ ∈ argmin_Θ { ⟨ (1/n) Σ_{i=1}^n x_i x_i^T, Θ ⟩ − log det(Θ) + λ_n Σ_{j≠k} |Θ_jk| }

Some past work: Yuan & Lin, 2006; d'Aspremont et al., 2007; Bickel & Levina, 2007; El Karoui, 2007; Rothman et al., 2007; Zhou et al., 2007; Friedman et al., 2008; Lam & Fan, 2008; Ravikumar et al., 2008; Zhou, Cai & Huang, 2009.
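One off-the-shelf solver for this ℓ1-regularized MLE is scikit-learn's GraphicalLasso. A hedged sketch (the chain-structured Θ, the sample size, and the regularization level below are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Illustrative sparse precision matrix: a chain graph with edges (j, j+1).
rng = np.random.default_rng(1)
p, n = 10, 2000
Theta = np.eye(p)
for j in range(p - 1):
    Theta[j, j + 1] = Theta[j + 1, j] = 0.4
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# `alpha` plays the role of lambda_n in the estimator above.
est = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = est.precision_
support = np.abs(Theta_hat) > 1e-2       # estimated edge set (off-diagonal)
```

With enough samples the estimated precision matrix recovers the chain's zero pattern; the true edges (j, j+1) correspond to clearly nonzero entries of Theta_hat.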
15-16 Gauss-Markov models with hidden variables

Z
X_1   X_2   X_3   X_4

Problems with hidden variables: conditioned on the hidden variable Z, the vector X = (X_1, X_2, X_3, X_4) is Gauss-Markov.

The inverse covariance of X satisfies a {sparse, low-rank} decomposition:

    Θ_X = I_{4×4} − μ μ^T,   where μ = (μ, μ, μ, μ)^T,

i.e. a sparse (here diagonal) matrix minus a rank-one matrix. (Chandrasekaran, Parrilo & Willsky, 2010)
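A quick numerical check of this decomposition, assuming a joint precision with unit diagonal and edge weight −μ between the hub Z and each X_i (μ = 0.3 is an arbitrary illustrative value): marginalizing out Z via the Schur complement leaves exactly a sparse-minus-rank-one inverse covariance.

```python
import numpy as np

# Joint precision over (Z, X1..X4): Z connected to every X_i, no X-X edges.
mu = 0.3
K = np.eye(5)
K[0, 1:] = K[1:, 0] = -mu

Sigma_X = np.linalg.inv(K)[1:, 1:]        # marginal covariance of X alone
Theta_X = np.linalg.inv(Sigma_X)          # = K_XX - K_XZ K_ZZ^{-1} K_ZX (Schur)

# Sparse (identity) minus rank-one, as in the decomposition above.
expected = np.eye(4) - mu**2 * np.ones((4, 4))
assert np.allclose(Theta_X, expected)
```

The marginal precision is dense (every pair of X_i's appears conditionally dependent), yet it is only a rank-one perturbation away from the sparse conditional model.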
17 Vignette II: High-dimensional sparse linear regression

y = X θ* + w,  with y ∈ R^n, X ∈ R^{n×p}, and θ* supported on S (zero on S^c).

Set-up: noisy observations y = Xθ* + w with sparse θ*.

Estimator: Lasso program

    θ̂ ∈ argmin_{θ ∈ R^p} { (1/n) Σ_{i=1}^n (y_i − x_i^T θ)^2 + λ_n Σ_{j=1}^p |θ_j| }

Some past work: Tibshirani, 1996; Chen et al., 1998; Donoho & Huo, 2001; Tropp, 2004; Fuchs, 2004; Efron et al., 2004; Meinshausen & Bühlmann, 2005; Candes & Tao, 2005; Donoho, 2005; Haupt & Nowak, 2005; Zhao & Yu, 2006; Zou, 2006; Koltchinskii, 2007; van de Geer, 2007.
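The set-up above is easy to simulate; a hedged sketch with illustrative sizes (n, p, sparsity level, and noise scale are arbitrary choices, not from the slides), using scikit-learn's coordinate-descent Lasso:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sparse linear model y = X theta* + w with p > n.
rng = np.random.default_rng(2)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0                      # s-sparse truth, support S = {0..4}
w = 0.1 * rng.standard_normal(n)
y = X @ theta_star + w

# sklearn's `alpha` is lambda_n; theory suggests lambda ~ sigma*sqrt(log p / n).
lam = 0.1 * np.sqrt(np.log(p) / n)
theta_hat = Lasso(alpha=lam, max_iter=50_000).fit(X, y).coef_
err = np.linalg.norm(theta_hat - theta_star)
```

Despite p = 2n unknowns, the ℓ2-error is small and the five active coefficients are estimated near their true value of 1.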
18 Application A: Compressed sensing (Donoho, 2005; Candes & Tao, 2005)

y = X β*,  with y ∈ R^n, X ∈ R^{n×p}, β* ∈ R^p.

(a) Image: vectorize to β* ∈ R^p.
(b) Compute n random projections.
19 Application A: Compressed sensing (Donoho, 2005; Candes & Tao, 2005)

In practice, signals are sparse in a transform domain: θ* := Ψβ* is an s-sparse signal, where Ψ is an orthonormal matrix, so that

    y = X Ψ^T θ* = X̃ θ*.

Reconstruct θ* (and hence the image β* = Ψ^T θ*) by finding a sparse solution to the under-constrained linear system y = X̃θ*, where X̃ = XΨ^T is another random matrix.
20 Noiseless ℓ1 recovery: unrescaled sample size

[Figure: probability of exact recovery versus raw sample size n (μ = 0), for p = 128, 256, and larger.]
21 Application B: Graph structure estimation

Let G = (V, E) be an undirected graph on p = |V| vertices. A pairwise graphical model factorizes over the edges of the graph:

    P(x_1, ..., x_p; θ) ∝ exp{ Σ_{(s,t) ∈ E} θ_st(x_s, x_t) }.

Given n independent and identically distributed (i.i.d.) samples of X = (X_1, ..., X_p), identify the underlying graph structure.
22 Pseudolikelihood and neighborhood regression

Markov properties encode neighborhood structure:

    (X_s | X_{V\s})  =_d  (X_s | X_{N(s)})

i.e. conditioning on the full graph is equivalent in distribution to conditioning on the Markov blanket N(s) (e.g. N(s) = {t, u, v, w}).

- basis of the pseudolikelihood method (Besag, 1974)
- basis of many graph learning algorithms (Friedman et al., 1999; Csiszar & Talata, 2005; Abbeel et al., 2006; Meinshausen & Bühlmann, 2006)
23-25 Graph selection via neighborhood regression

Predict X_s based on X_{\s} := {X_t, t ≠ s}.

1. For each node s ∈ V, compute the regularized maximum likelihood estimate

    θ̂[s] := argmin_{θ ∈ R^p} { (1/n) Σ_{i=1}^n L(θ; X_{i,\s}) + λ_n ‖θ‖_1 }

   (local log-likelihood plus regularization).

2. Estimate the local neighborhood N̂(s) as the support of the regression vector θ̂[s] ∈ R^p.
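For Gaussian data this recipe reduces to running the Lasso at each node; a hedged sketch in the style of Meinshausen & Bühlmann (the 4-node chain, sample size, and penalty below are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative Gaussian chain graph 0 - 1 - 2 - 3 via its precision matrix.
rng = np.random.default_rng(3)
p, n = 4, 5000
Theta = np.eye(p)
for j in range(p - 1):
    Theta[j, j + 1] = Theta[j + 1, j] = 0.35
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)

# Neighborhood regression: lasso-regress each X_s on all other variables,
# then read the edge set off the supports ("OR" rule over neighborhoods).
edges = set()
for s in range(p):
    rest = [t for t in range(p) if t != s]
    coef = Lasso(alpha=0.05).fit(X[:, rest], X[:, s]).coef_
    for c, t in zip(coef, rest):
        if abs(c) > 1e-3:
            edges.add(frozenset((s, t)))
```

With this sample size the recovered edge set is exactly the chain: each node's lasso support picks out its Markov blanket and zeroes out the conditionally independent variables.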
26 US Senate network (voting)
27 Outline

1. Lecture 1 (Today): Basics of sparse recovery
   - Sparse linear systems: ℓ0/ℓ1 equivalence
   - Noisy case: Lasso, ℓ2-bounds and variable selection
2. Lecture 2 (Tuesday): A more general theory
   - A range of structured regularizers: group sparsity; low-rank matrices and nuclear norm regularization; matrix decomposition and robust PCA
   - Ingredients of a general understanding
3. Lecture 3 (Wednesday): High-dimensional kernel methods
   - Curse of dimensionality for non-parametric regression
   - Reproducing kernel Hilbert spaces
   - A simple but optimal estimator
28 Noiseless linear models and basis pursuit

y = Xθ*, with y ∈ R^n, X ∈ R^{n×p}, and θ* supported on S.

- Under-determined linear system: unidentifiable without constraints.
- Say θ* ∈ R^p is sparse: supported on S ⊂ {1, 2, ..., p}.

ℓ0-optimization:

    θ̂ = argmin_{θ ∈ R^p} ‖θ‖_0   such that Xθ = y

(computationally intractable: NP-hard).

ℓ1-relaxation (basis pursuit):

    θ̂ ∈ argmin_{θ ∈ R^p} ‖θ‖_1   such that Xθ = y

(a linear program, easy to solve).
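The ℓ1-relaxation really is a linear program: splitting θ = u − v with u, v ≥ 0 turns min ‖θ‖_1 subject to Xθ = y into a standard-form LP. A sketch using scipy (sizes and the sparse vector are illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

# Noiseless observations of a 3-sparse vector, with n well above s*log(p/s).
rng = np.random.default_rng(4)
n, p, s = 40, 80, 3
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = [1.0, -2.0, 1.5]
y = X @ theta_star

# LP: minimize sum(u) + sum(v) subject to X(u - v) = y, u >= 0, v >= 0.
c = np.ones(2 * p)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
theta_hat = res.x[:p] - res.x[p:]
```

For this Gaussian design the LP recovers theta_star exactly, as the restricted nullspace theory below predicts.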
29 Noiseless ℓ1 recovery: unrescaled sample size

[Figure: probability of exact recovery versus raw sample size n (μ = 0), for p = 128, 256, and larger.]

30 Noiseless ℓ1 recovery: rescaled

[Figure: probability of exact recovery versus rescaled sample size α := n / (s log(p/s)) (μ = 0), for p = 128, 256, and larger.]
31-32 Restricted nullspace: necessary and sufficient

Definition. For a fixed S ⊆ {1, 2, ..., p}, the matrix X ∈ R^{n×p} satisfies the restricted nullspace property w.r.t. S, or RN(S) for short, if

    N(X) ∩ C(S) = {0},

where N(X) := {Δ ∈ R^p : XΔ = 0} is the nullspace and C(S) := {Δ ∈ R^p : ‖Δ_{S^c}‖_1 ≤ ‖Δ_S‖_1}.

(Donoho & Huo, 2001; Feuer & Nemirovski, 2003; Cohen et al., 2009)

Proposition. The basis pursuit ℓ1-relaxation is exact for all S-sparse vectors ⟺ X satisfies RN(S).
33 Restricted nullspace: necessary and sufficient

Proof (sufficiency):

(1) The error vector Δ̂ = θ̂ − θ* satisfies XΔ̂ = 0, and hence Δ̂ ∈ N(X).

(2) Show that Δ̂ ∈ C(S):
- Optimality of θ̂: ‖θ̂‖_1 ≤ ‖θ*‖_1 = ‖θ*_S‖_1.
- Sparsity of θ*: ‖θ̂‖_1 = ‖θ* + Δ̂‖_1 = ‖θ*_S + Δ̂_S‖_1 + ‖Δ̂_{S^c}‖_1.
- Triangle inequality: ‖θ*_S + Δ̂_S‖_1 + ‖Δ̂_{S^c}‖_1 ≥ ‖θ*_S‖_1 − ‖Δ̂_S‖_1 + ‖Δ̂_{S^c}‖_1.
  Combining the three displays gives ‖Δ̂_{S^c}‖_1 ≤ ‖Δ̂_S‖_1, i.e. Δ̂ ∈ C(S).

(3) Hence Δ̂ ∈ N(X) ∩ C(S), and (RN) ⟹ Δ̂ = 0.
34 Illustration of restricted nullspace property

Consider θ* = (0, 0, θ*_3), so that S = {3}. The error vector Δ = θ̂ − θ* belongs to the set

    C(S) := { (Δ_1, Δ_2, Δ_3) ∈ R^3 : |Δ_1| + |Δ_2| ≤ |Δ_3| }.
35-38 Some sufficient conditions

How to verify the RN property for a given sparsity s?

1. Elementwise incoherence condition (Donoho & Huo, 2001; Feuer & Nemirovski, 2003):

    max_{j,k = 1,...,p} | (X^T X / n − I_{p×p})_{jk} | ≤ δ / s.

   For matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s² log p).

2. Restricted isometry, or submatrix incoherence (Candes & Tao, 2005):

    max_{|U| ≤ 2s} ‖ (X^T X / n − I_{p×p})_{UU} ‖_op ≤ δ_{2s}.

   For matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s log(p/s)).
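The elementwise incoherence quantity is cheap to compute, which makes the scaling above easy to sanity-check. A sketch for an i.i.d. Gaussian design (sizes are illustrative choices):

```python
import numpy as np

# Elementwise incoherence of a standard Gaussian design: the largest entry
# of X^T X / n - I behaves like sqrt(log p / n); the condition above then
# only certifies sparsity levels s up to roughly delta / mu_max.
rng = np.random.default_rng(5)
n, p = 2000, 50
X = rng.standard_normal((n, p))
G = X.T @ X / n - np.eye(p)
mu_max = np.abs(G).max()                  # elementwise incoherence

print(mu_max, np.sqrt(np.log(p) / n))     # both on the order of a few percent
```

This is why the elementwise condition needs n = Ω(s² log p): the deviation mu_max must be pushed below δ/s, not just below a constant δ.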
39 Violating matrix incoherence (elementwise/RIP)

Important: incoherence/RIP conditions imply RN, but they are far from necessary. It is very easy to violate them...
40-42 Violating matrix incoherence (elementwise/RIP)

Form a random design matrix X = [x_1 x_2 ... x_p] ∈ R^{n×p} (p columns, n rows), with rows X_i ~ N(0, Σ), i.i.d.

Example: for some μ ∈ (0, 1), consider the covariance matrix

    Σ = (1 − μ) I_{p×p} + μ 𝟙𝟙^T.

Elementwise incoherence is violated: for any j ≠ k,

    P[ | ⟨x_j, x_k⟩ / n − μ | ≥ ε ] ≤ c_1 exp(−c_2 n ε²),

so the off-diagonal entries concentrate around the constant μ rather than decaying.

RIP constants tend to infinity as (n, |S|) increases:

    ‖ X_S^T X_S / n − I_{s×s} ‖_2 ≥ μ(s − 1) − ε   with probability at least 1 − c_1 exp(−c_2 n ε²).
43 Noiseless ℓ1 recovery for μ = 0.5

[Figure: probability of exact recovery versus rescaled sample size α := n / (s log(p/s)), for μ = 0.5 and p = 128, 256, and larger.]
44-45 Direct result for restricted nullspace/eigenvalues

Theorem (Raskutti, W., & Yu, 2010). Consider a random design X ∈ R^{n×p} with rows X_i ~ N(0, Σ), i.i.d., and define κ(Σ) = max_{j=1,2,...,p} Σ_jj. Then for universal constants c_1, c_2,

    ‖Xθ‖_2 / √n ≥ (1/2) ‖Σ^{1/2} θ‖_2 − 9 κ(Σ) √(log p / n) ‖θ‖_1   for all θ ∈ R^p,

with probability greater than 1 − c_1 exp(−c_2 n).

- much less restrictive than incoherence/RIP conditions
- many interesting matrix families are covered: Toeplitz dependency; constant μ-correlation (previous example); the covariance matrix Σ can even be degenerate
- extensions to sub-Gaussian matrices (Rudelson & Zhou, 2012)
- related results hold for generalized linear models
46-48 Easy verification of restricted nullspace

For any Δ ∈ C(S), we have

    ‖Δ‖_1 = ‖Δ_S‖_1 + ‖Δ_{S^c}‖_1 ≤ 2 ‖Δ_S‖_1 ≤ 2 √s ‖Δ‖_2.

Applying the previous result:

    ‖XΔ‖_2 / √n ≥ { (1/2) λ_min(Σ^{1/2}) − 18 κ(Σ) √(s log p / n) } ‖Δ‖_2 =: γ(Σ) ‖Δ‖_2.

We have actually proven much more than restricted nullspace...

Definition. A design matrix X ∈ R^{n×p} satisfies the restricted eigenvalue (RE) condition over S (denoted RE(S)) with parameters (α, γ), γ > 0, if

    ‖XΔ‖_2² / n ≥ γ² ‖Δ‖_2²   for all Δ ∈ R^p such that ‖Δ_{S^c}‖_1 ≤ α ‖Δ_S‖_1.

(van de Geer, 2007; Bickel, Ritov & Tsybakov, 2008)
49 Lasso and restricted eigenvalues

Turning to noisy observations: y = Xθ* + w, with θ* supported on S.

Estimator: Lasso program

    θ̂_{λ_n} ∈ argmin_{θ ∈ R^p} { (1/2n) ‖y − Xθ‖_2² + λ_n ‖θ‖_1 }.

Goal: obtain bounds on ‖θ̂_{λ_n} − θ*‖_2 that hold with high probability.
50-54 Lasso bounds: Four simple steps

Let's analyze the constrained version:

    min_{θ ∈ R^p} (1/2n) ‖y − Xθ‖_2²   such that ‖θ‖_1 ≤ R = ‖θ*‖_1.

(1) By optimality of θ̂ and feasibility of θ*:

    (1/2n) ‖y − Xθ̂‖_2² ≤ (1/2n) ‖y − Xθ*‖_2².

(2) Derive a basic inequality: re-arranging in terms of Δ̂ = θ̂ − θ*,

    (1/2n) ‖XΔ̂‖_2² ≤ (1/n) ⟨Δ̂, X^T w⟩.

(3) Restricted eigenvalue for the LHS; Hölder's inequality for the RHS:

    (γ/2) ‖Δ̂‖_2² ≤ (1/2n) ‖XΔ̂‖_2² ≤ (1/n) ⟨Δ̂, X^T w⟩ ≤ ‖Δ̂‖_1 ‖X^T w / n‖_∞.

(4) As before, Δ̂ ∈ C(S), so that ‖Δ̂‖_1 ≤ 2√s ‖Δ̂‖_2, and hence

    ‖Δ̂‖_2 ≤ (4√s / γ) ‖X^T w / n‖_∞.
55-57 Lasso error bounds for different models

Proposition. Suppose that the vector θ* has support S, with cardinality s, and the design matrix X satisfies RE(S) with parameter γ > 0. For the constrained Lasso with R = ‖θ*‖_1, or the regularized Lasso with λ_n = 2 ‖X^T w / n‖_∞, any optimal solution θ̂ satisfies the bound

    ‖θ̂ − θ*‖_2 ≤ (4√s / γ) ‖X^T w / n‖_∞.

- this is a deterministic result on the set of optimizers
- various corollaries for specific statistical models:
  - Compressed sensing: X_ij ~ N(0, 1) and bounded noise ‖w‖_2 ≤ σ√n
  - Deterministic design: X with bounded columns and w_i ~ N(0, σ²)

In both cases, ‖X^T w / n‖_∞ ≤ √(3σ² log p / n) w.h.p., so that

    ‖θ̂ − θ*‖_2 ≤ (4σ / γ) √(3 s log p / n).
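The √(s log p / n) rate in this corollary can be eyeballed numerically. A rough sketch (constants, sizes, and the choice λ_n = 2σ√(log p / n) are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustration: lasso l2-error should shrink roughly like sqrt(s log p / n)
# as the sample size n grows, with (p, s, sigma) held fixed.
rng = np.random.default_rng(6)
p, s, sigma = 200, 5, 0.5

def lasso_error(n):
    X = rng.standard_normal((n, p))
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    y = X @ theta_star + sigma * rng.standard_normal(n)
    lam = 2 * sigma * np.sqrt(np.log(p) / n)      # theory-driven lambda_n
    theta_hat = Lasso(alpha=lam, max_iter=50_000).fit(X, y).coef_
    return np.linalg.norm(theta_hat - theta_star)

errs = [lasso_error(n) for n in (100, 400, 1600)]
# Quadrupling n should roughly halve the error.
```

The errors decrease monotonically across the three sample sizes, consistent with the √(1/n) dependence in the bound.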
58-60 Look-ahead to Lecture 2: A more general theory

Recap: thus far we have
- derived error bounds for basis pursuit and the Lasso (ℓ1-relaxation)
- seen the importance of restricted nullspace and restricted eigenvalues

The big picture: lots of other estimators have the same basic form:

    θ̂_{λ_n} ∈ argmin_{θ ∈ Ω} { L(θ; Z_1^n) + λ_n R(θ) }

(estimate = loss function + regularizer).

Past years have witnessed an explosion of results (compressed sensing, covariance estimation, block-sparsity, graphical models, matrix completion, ...).

Question: is there a common set of underlying principles?
61 Some papers

- M. J. Wainwright (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Information Theory, May 2009.
- S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, December 2012.
- G. Raskutti, M. J. Wainwright and B. Yu (2011). Minimax rates for linear regression over ℓ_q-balls. IEEE Trans. Information Theory, October 2011.
- G. Raskutti, M. J. Wainwright and B. Yu (2010). Restricted nullspace and eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research, August 2010.
Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered
More informationSimultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 6, JUNE 2011 3841 Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization Sahand N. Negahban and Martin
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationTractable performance bounds for compressed sensing.
Tractable performance bounds for compressed sensing. Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure/INRIA, U.C. Berkeley. Support from NSF, DHS and Google.
More informationCompressed Sensing and Linear Codes over Real Numbers
Compressed Sensing and Linear Codes over Real Numbers Henry D. Pfister (joint with Fan Zhang) Texas A&M University College Station Information Theory and Applications Workshop UC San Diego January 31st,
More informationCompressed Sensing and Sparse Recovery
ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationarxiv: v1 [stat.ml] 3 Nov 2010
Preprint The Lasso under Heteroscedasticity Jinzhu Jia, Karl Rohe and Bin Yu, arxiv:0.06v stat.ml 3 Nov 00 Department of Statistics and Department of EECS University of California, Berkeley Abstract: The
More informationNew Theory and Algorithms for Scalable Data Fusion
AFRL-OSR-VA-TR-23-357 New Theory and Algorithms for Scalable Data Fusion Martin Wainwright UC Berkeley JULY 23 Final Report DISTRIBUTION A: Approved for public release. AIR FORCE RESEARCH LABORATORY AF
More informationRisk and Noise Estimation in High Dimensional Statistics via State Evolution
Risk and Noise Estimation in High Dimensional Statistics via State Evolution Mohsen Bayati Stanford University Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari Statistical
More informationRestricted Strong Convexity Implies Weak Submodularity
Restricted Strong Convexity Implies Weak Submodularity Ethan R. Elenberg Rajiv Khanna Alexandros G. Dimakis Department of Electrical and Computer Engineering The University of Texas at Austin {elenberg,rajivak}@utexas.edu
More informationCausal Inference: Discussion
Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement
More informationAn efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06
More informationA Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los
More informationSparse Covariance Selection using Semidefinite Programming
Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support
More informationOWL to the rescue of LASSO
OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More informationAdaptive estimation of the copula correlation matrix for semiparametric elliptical copulas
Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Department of Mathematics Department of Statistical Science Cornell University London, January 7, 2016 Joint work
More informationSensing systems limited by constraints: physical size, time, cost, energy
Rebecca Willett Sensing systems limited by constraints: physical size, time, cost, energy Reduce the number of measurements needed for reconstruction Higher accuracy data subject to constraints Original
More informationSparse signal recovery using sparse random projections
Sparse signal recovery using sparse random projections Wei Wang Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2009-69 http://www.eecs.berkeley.edu/pubs/techrpts/2009/eecs-2009-69.html
More informationLinear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1
Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1 ( OWL ) Regularization Mário A. T. Figueiredo Instituto de Telecomunicações and Instituto Superior Técnico, Universidade de
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationSparse and Low Rank Recovery via Null Space Properties
Sparse and Low Rank Recovery via Null Space Properties Holger Rauhut Lehrstuhl C für Mathematik (Analysis), RWTH Aachen Convexity, probability and discrete structures, a geometric viewpoint Marne-la-Vallée,
More informationLecture Notes 9: Constrained Optimization
Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationShifting Inequality and Recovery of Sparse Signals
Shifting Inequality and Recovery of Sparse Signals T. Tony Cai Lie Wang and Guangwu Xu Astract In this paper we present a concise and coherent analysis of the constrained l 1 minimization method for stale
More informationSparse Graph Learning via Markov Random Fields
Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning
More informationStepwise Searching for Feature Variables in High-Dimensional Linear Regression
Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Qiwei Yao Department of Statistics, London School of Economics q.yao@lse.ac.uk Joint work with: Hongzhi An, Chinese Academy
More information19.1 Problem setup: Sparse linear regression
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 19: Minimax rates for sparse linear regression Lecturer: Yihong Wu Scribe: Subhadeep Paul, April 13/14, 2016 In
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationSparse and Robust Optimization and Applications
Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability
More informationThresholds for the Recovery of Sparse Solutions via L1 Minimization
Thresholds for the Recovery of Sparse Solutions via L Minimization David L. Donoho Department of Statistics Stanford University 39 Serra Mall, Sequoia Hall Stanford, CA 9435-465 Email: donoho@stanford.edu
More informationIntroduction to Compressed Sensing
Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral
More informationRandom projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016
Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use
More informationLecture: Introduction to Compressed Sensing Sparse Recovery Guarantees
Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin
More informationGreedy Signal Recovery and Uniform Uncertainty Principles
Greedy Signal Recovery and Uniform Uncertainty Principles SPIE - IE 2008 Deanna Needell Joint work with Roman Vershynin UC Davis, January 2008 Greedy Signal Recovery and Uniform Uncertainty Principles
More informationInterpolation via weighted l 1 minimization
Interpolation via weighted l 1 minimization Rachel Ward University of Texas at Austin December 12, 2014 Joint work with Holger Rauhut (Aachen University) Function interpolation Given a function f : D C
More informationConstructing Explicit RIP Matrices and the Square-Root Bottleneck
Constructing Explicit RIP Matrices and the Square-Root Bottleneck Ryan Cinoman July 18, 2018 Ryan Cinoman Constructing Explicit RIP Matrices July 18, 2018 1 / 36 Outline 1 Introduction 2 Restricted Isometry
More informationHigh-dimensional Covariance Estimation Based On Gaussian Graphical Models
Journal of Machine Learning Research 12 (211) 2975-326 Submitted 9/1; Revised 6/11; Published 1/11 High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou Department of Statistics
More informationNon-convex Robust PCA: Provable Bounds
Non-convex Robust PCA: Provable Bounds Anima Anandkumar U.C. Irvine Joint work with Praneeth Netrapalli, U.N. Niranjan, Prateek Jain and Sujay Sanghavi. Learning with Big Data High Dimensional Regime Missing
More informationMathematical Methods for Data Analysis
Mathematical Methods for Data Analysis Massimiliano Pontil Istituto Italiano di Tecnologia and Department of Computer Science University College London Massimiliano Pontil Mathematical Methods for Data
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationPosterior convergence rates for estimating large precision. matrices using graphical models
Biometrika (2013), xx, x, pp. 1 27 C 2007 Biometrika Trust Printed in Great Britain Posterior convergence rates for estimating large precision matrices using graphical models BY SAYANTAN BANERJEE Department
More information