Variations on Nonparametric Additive Models: Computational and Statistical Aspects


1 Variations on Nonparametric Additive Models: Computational and Statistical Aspects. John Lafferty, Department of Statistics & Department of Computer Science, University of Chicago.

2 Collaborators: Sivaraman Balakrishnan (CMU), Mathias Drton (Chicago), Rina Foygel (Chicago), Michael Horrell (Chicago), Han Liu (Princeton), Kriti Puniyani (CMU), Pradeep Ravikumar (Univ. Texas, Austin), Larry Wasserman (CMU).

3 Perspective. Even the simplest models can be interesting, challenging, and useful for large, high-dimensional data.

4 Motivation. Great progress has been made on understanding sparsity for high-dimensional linear models. Many problems have clear nonlinear structure. We are interested in purely functional methods for high-dimensional, nonparametric inference, with no basis expansions.

5 Additive Models. [Figures: additive model fits for the bone mineral density data, and for the diabetes data (component functions for Age, BMI, MAP, TC).]

6 Additive Models. Fully nonparametric methods appear hopeless: logarithmic scaling, $p = \log n$ (e.g., Rodeo, Lafferty and Wasserman (2008)). Additive models are a useful compromise: exponential scaling, $p = \exp(n^c)$ (e.g., SpAM, Ravikumar et al. (2009)).

7 Themes of this talk. Variations on additive models enjoy most of the good statistical and computational properties of sparse linear models. Thresholded backfitting algorithms, via subdifferential calculus. RKHS formulations are problematic. A little nonparametricity goes a long way.

8 Outline. Sparse additive models. Nonparametric reduced rank regression. Functional sparse CCA. The nonparanormal. Conclusions.

9 Sparse Additive Models. Ravikumar, Lafferty, Liu and Wasserman, JRSS B (2009). Additive model: $Y_i = \sum_{j=1}^p m_j(X_{ij}) + \varepsilon_i$, $i = 1, \ldots, n$. High dimensional: $n \ll p$, with most $m_j = 0$. Optimization: minimize $\mathbb{E}\bigl(Y - \sum_j m_j(X_j)\bigr)^2$ subject to $\sum_{j=1}^p \sqrt{\mathbb{E}(m_j^2)} \le L_n$ and $\mathbb{E}(m_j) = 0$. Related work by Bühlmann and van de Geer (2009), Koltchinskii and Yuan (2010), Raskutti, Wainwright and Yu (2011).

10 Sparse Additive Models: Geometry. $C = \bigl\{ m \in \mathbb{R}^4 : \sqrt{m_{11}^2 + m_{21}^2} + \sqrt{m_{12}^2 + m_{22}^2} \le L \bigr\}$. [Figure: projections $\pi_{12}C$ and $\pi_{13}C$ of the constraint set, shown for $p = 0.5, 1, 1.5, 2$.]

11 Stationary Conditions. Lagrangian: $\mathcal{L}(f, \lambda, \mu) = \frac{1}{2}\mathbb{E}\bigl(Y - \sum_{j=1}^p m_j(X_j)\bigr)^2 + \lambda \sum_{j=1}^p \sqrt{\mathbb{E}(m_j^2(X_j))}$. Let $R_j = Y - \sum_{k \ne j} m_k(X_k)$ be the $j$th residual. Stationary condition: $m_j - \mathbb{E}(R_j \mid X_j) + \lambda v_j = 0$ a.e., where $v_j \in \partial\sqrt{\mathbb{E}(m_j^2)}$ satisfies $v_j = m_j / \sqrt{\mathbb{E}(m_j^2)}$ if $\mathbb{E}(m_j^2) \ne 0$, and $\mathbb{E}(v_j^2) \le 1$ otherwise.

12 Stationary Conditions. Rewriting, $m_j + \lambda v_j = \mathbb{E}(R_j \mid X_j) \equiv P_j$, so $\bigl(1 + \lambda/\sqrt{\mathbb{E}(m_j^2)}\bigr) m_j = P_j$ if $\sqrt{\mathbb{E}(P_j^2)} > \lambda$, and $m_j = 0$ otherwise. This implies the soft-thresholding solution $m_j = \bigl[1 - \lambda/\sqrt{\mathbb{E}(P_j^2)}\bigr]_+ P_j$.

13 SpAM Backfitting Algorithm. Input: data $(X_i, Y_i)$, regularization parameter $\lambda$. Iterate until convergence; for each $j = 1, \ldots, p$: compute the residual $R_j = Y - \sum_{k \ne j} \hat m_k(X_k)$; estimate the projection $P_j = \mathbb{E}(R_j \mid X_j)$ by smoothing, $\hat P_j = S_j R_j$; estimate the norm $\hat s_j = \sqrt{\mathbb{E}[\hat P_j^2]}$; soft-threshold, $\hat m_j = [1 - \lambda/\hat s_j]_+ \hat P_j$. Output: estimator $\hat m(X_i) = \sum_j \hat m_j(X_{ij})$.
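
A minimal Python sketch of this loop, assuming a Nadaraya-Watson kernel smoother for $S_j$ (any linear smoother would do) and illustrative choices of bandwidth and iteration count; it illustrates the update rule above, not the authors' implementation.

```python
import numpy as np

def kernel_smooth(x, r, h=0.2):
    """Nadaraya-Watson smoother: estimate E(r | x) at the sample points."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

def spam_backfit(X, y, lam, h=0.2, n_iter=50):
    """SpAM backfitting sketch: X is n x p, y has length n."""
    n, p = X.shape
    m = np.zeros((n, p))                            # fitted components at the data points
    for _ in range(n_iter):
        for j in range(p):
            R_j = y - m.sum(axis=1) + m[:, j]       # residual R_j = Y - sum_{k != j} m_k
            P_j = kernel_smooth(X[:, j], R_j, h)    # P_j = S_j R_j
            s_j = np.sqrt(np.mean(P_j ** 2)) + 1e-12  # empirical norm (guard against 0)
            m[:, j] = max(0.0, 1.0 - lam / s_j) * P_j # soft-threshold
            m[:, j] -= m[:, j].mean()               # enforce E(m_j) = 0
    return m
```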

14 Example: Boston Housing Data. Predict house value $Y$ from 10 covariates; we added 20 irrelevant (random) covariates to test the method. $Y$ = house value; $n = 506$, $p = 30$. $Y = \beta_0 + m_1(\text{crime}) + m_2(\text{tax}) + \cdots + m_{30}(X_{30}) + \epsilon$. Note that $m_{11} = \cdots = m_{30} = 0$. We choose $\lambda$ by minimizing the estimated risk. SpAM yields 6 nonzero functions, and correctly reports that $\hat m_{11} = \cdots = \hat m_{30} = 0$.

15 Example Fits. [Figure: fitted component functions for the Boston housing data.]

16 Component Norms. [Figure: $L_2$ norms of the fitted functions versus $1/\lambda$.]

17 RKHS Version. Raskutti, Wainwright and Yu (2011). Sample optimization: $\min_f \frac{1}{n}\sum_{i=1}^n \bigl(y_i - \sum_{j=1}^p m_j(x_{ij})\bigr)^2 + \lambda \sum_j \|m_j\|_{\mathcal{H}_j} + \mu \sum_j \|m_j\|_{L_2(P_n)}$, where $\|m_j\|_{L_2(P_n)} = \sqrt{\frac{1}{n}\sum_{i=1}^n m_j^2(x_{ij})}$. By the representer theorem, with $m_j(\cdot) = K_j \alpha_j$, this becomes $\min_\alpha \frac{1}{n}\bigl\|y - \sum_{j=1}^p K_j \alpha_j\bigr\|^2 + \lambda \sum_j \sqrt{\alpha_j^T K_j \alpha_j} + \mu \sum_j \sqrt{\alpha_j^T K_j^2 \alpha_j}$. A finite-dimensional SOCP, but no scalable algorithms are known.
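
The finite-dimensional problem can at least be written down directly with a generic conic solver. A hedged cvxpy sketch (the function name is invented here, and a generic solver like this is exactly the kind of approach that does not scale):

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

def rkhs_additive_socp(K, y, lam, mu):
    """K: list of p (n x n) Gram matrices K_j; y: response vector of length n."""
    n, p = len(y), len(K)
    alpha = [cp.Variable(n) for _ in range(p)]
    fit = cp.sum_squares(y - sum(Kj @ a for Kj, a in zip(K, alpha))) / n
    # ||m_j||_{H_j} = sqrt(alpha_j' K_j alpha_j) = ||K_j^{1/2} alpha_j||_2
    hilbert = sum(cp.norm(np.real(sqrtm(Kj)) @ a, 2) for Kj, a in zip(K, alpha))
    # sqrt(alpha_j' K_j^2 alpha_j) = ||K_j alpha_j||_2
    empirical = sum(cp.norm(Kj @ a, 2) for Kj, a in zip(K, alpha))
    cp.Problem(cp.Minimize(fit + lam * hilbert + mu * empirical)).solve()
    return [a.value for a in alpha]
```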

18 Open Problems. Under what conditions do the backfitting algorithms converge? What guarantees can be given on the solution to the infinite-dimensional optimization? Is it possible to simultaneously adapt to unknown smoothness and sparsity?

19 Multivariate Regression. $Y \in \mathbb{R}^q$ and $X \in \mathbb{R}^p$; regression function $M(X) = \mathbb{E}(Y \mid X)$. Linear model: $M(X) = BX$ where $B \in \mathbb{R}^{q \times p}$. Reduced rank regression: $\operatorname{rank}(B) \le C$. Recent work has studied properties and high-dimensional scaling of reduced rank regression with the nuclear norm $\|B\|_* := \sum_{j=1}^{\min(p,q)} \sigma_j(B)$ as a convex surrogate for the rank constraint (Yuan et al., 2007; Negahban and Wainwright, 2011).
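
For illustration, the nuclear-norm-penalized linear fit can be written in a few lines of cvxpy; a hedged sketch only, where the function name and the scaling of the loss are choices made here, not prescribed by the cited papers.

```python
import cvxpy as cp

def reduced_rank_fit(X, Y, lam):
    """Nuclear-norm surrogate for reduced rank regression: X is n x p, Y is n x q."""
    n, p, q = X.shape[0], X.shape[1], Y.shape[1]
    B = cp.Variable((q, p))
    obj = cp.sum_squares(Y - X @ B.T) / (2 * n) + lam * cp.normNuc(B)
    cp.Problem(cp.Minimize(obj)).solve()
    return B.value   # approximately low rank for large enough lam
```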

20 Nonparametric Reduced Rank Regression. Foygel, Horrell, Drton and Lafferty (2012). Nonparametric multivariate regression: $M(X) = (m^1(X), \ldots, m^q(X))^T$, with each component an additive model, $m^k(X) = \sum_{j=1}^p m_j^k(X_j)$. What is the nonparametric analogue of the $\|B\|_*$ penalty?

21 Recall: Sparse Vectors and $\ell_1$ Relaxation. [Figure: the set of sparse vectors $\{\|x\|_0 \le t\}$ and its convex hull $\{\|x\|_1 \le t\}$.]

22 Low-Rank Matrices and Convex Relaxation. [Figure: the set of low-rank matrices $\{\operatorname{rank}(X) \le t\}$ and its convex hull $\{\|X\|_* \le t\}$.]

23 Nuclear Norm Regularization. Algorithms for nuclear norm minimization are a lot like iterative soft-thresholding for lasso problems. To project a matrix $B$ onto the nuclear norm ball $\{\|X\|_* \le t\}$: compute the SVD, $B = U \operatorname{diag}(\sigma) V^T$, then soft-threshold the singular values: $B \leftarrow U \operatorname{diag}(\operatorname{soft}_\lambda(\sigma)) V^T$.
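
A minimal numpy sketch of the soft-thresholding step, i.e., the proximal operator of the nuclear norm at level $\lambda$:

```python
import numpy as np

def svd_soft_threshold(B, lam):
    """Shrink the singular values of B by lam and truncate at zero."""
    U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(sigma - lam, 0.0)) @ Vt
```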

24 Low Rank Functions. What does it mean for a set of functions $m^1(x), \ldots, m^q(x)$ to be low rank? Let $x_1, \ldots, x_n$ be a collection of points; we require that the $n \times q$ matrix $\mathbb{M}(x_{1:n}) = [m^k(x_i)]$ is low rank. Stochastic setting: $\mathbb{M} = [m^k(X_i)]$. The natural penalty is $\|\mathbb{M}\|_* = \sum_{s=1}^q \sigma_s(\mathbb{M}) = \sum_{s=1}^q \sqrt{\lambda_s(\mathbb{M}^T \mathbb{M})}$. Population version: $|||M||| := \|\Sigma(M)^{1/2}\|_*$, where $\Sigma(M) = \operatorname{Cov}(M(X))$.

25 Constrained Rank Additive Models (CRAM). Let $\Sigma_j = \operatorname{Cov}(M_j(X_j))$. Two natural penalties: $\|\Sigma_1^{1/2}\|_* + \|\Sigma_2^{1/2}\|_* + \cdots + \|\Sigma_p^{1/2}\|_*$, and $\bigl\|(\Sigma_1^{1/2}\; \Sigma_2^{1/2}\; \cdots\; \Sigma_p^{1/2})\bigr\|_*$. Population risk functional (first penalty): $\frac{1}{2}\mathbb{E}\bigl\|Y - \sum_j M_j(X_j)\bigr\|_2^2 + \lambda \sum_j |||M_j|||$.

26 Stationary Conditions. The subdifferential of the penalty is $\partial|||F||| = \bigl\{ \bigl(\mathbb{E}(FF^T)\bigr)^{-1/2} F + H : \|H\|_{\mathrm{sp}} \le 1,\ \mathbb{E}(FH^T) = 0,\ \mathbb{E}(FF^T)H = 0 \bigr\}$. Let $P(X) := \mathbb{E}(Y \mid X)$ and consider the optimization $\frac{1}{2}\mathbb{E}\|Y - M(X)\|_2^2 + \lambda|||M|||$. Let $\mathbb{E}(PP^T) = U \operatorname{diag}(\tau) U^T$ be the SVD. Define $M = U \operatorname{diag}\bigl([1 - \lambda/\sqrt{\tau}]_+\bigr) U^T P$. Then $M$ is a stationary point of the optimization, satisfying $\mathbb{E}(Y \mid X) = M(X) + \lambda V(X)$ a.e., for some $V \in \partial|||M|||$.

27 CRAM Backfitting Algorithm (Penalty 1). Input: data $(X_i, Y_i)$, regularization parameter $\lambda$. Iterate until convergence; for each $j = 1, \ldots, p$: compute the residual $R_j = Y - \sum_{k \ne j} \hat M_k(X_k)$; estimate the projection $P_j = \mathbb{E}(R_j \mid X_j)$ by smoothing, $\hat P_j = S_j R_j$; compute the SVD $\frac{1}{n} \hat P_j \hat P_j^T = U \operatorname{diag}(\tau) U^T$; soft-threshold, $\hat M_j = U \operatorname{diag}\bigl([1 - \lambda/\sqrt{\tau}]_+\bigr) U^T \hat P_j$. Output: estimator $\hat M(X_i) = \sum_j \hat M_j(X_{ij})$.
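
The step that distinguishes CRAM from SpAM is the matrix soft-threshold. A minimal numpy sketch, assuming the smoothed residual $\hat P_j$ is stored as a $q \times n$ array (rows are response components at the $n$ sample points); the small eigenvalue floor is a numerical guard, not part of the algorithm.

```python
import numpy as np

def cram_soft_threshold(P_j, lam):
    """Shrink the smoothed residual matrix P_j (q x n) toward low rank."""
    n = P_j.shape[1]
    tau, U = np.linalg.eigh(P_j @ P_j.T / n)   # (1/n) P_j P_j' = U diag(tau) U'
    shrink = np.maximum(1.0 - lam / np.sqrt(np.maximum(tau, 1e-12)), 0.0)
    return U @ np.diag(shrink) @ U.T @ P_j     # M_j = U diag([1 - lam/sqrt(tau)]_+) U' P_j
```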

28 Example. Data of Smith et al. (1962): chemical measurements on 33 individual urine specimens. $q = 5$ response variables: pigment creatinine, and the concentrations (in mg/ml) of phosphate, phosphorus, creatinine and choline. $p = 3$ covariates: weight of subject, volume and specific gravity of specimen. We use Penalty 2 with local linear smoothing, taking $\lambda = 1$ and bandwidth $h = 0.3$.

29 [Figure: grid of fitted component functions, with rows $X_j \in \{$weight, volume, specific gravity$\}$ and columns $Y_k \in \{$pigment creatinine, phosphate, phosphorus, creatinine, choline$\}$.]

30 Statistical Scaling for Prediction. Let $\mathcal{F}$ be the class of matrices of functions that have a functional SVD $M(X) = U D V(X)$, where $\mathbb{E}(V V^T) = I$ and $V(X) = [v_{sj}(X_j)]$ with each $v_{sj}$ in a second-order Sobolev space. Define $\mathcal{M}_n = \Bigl\{ M : M \in \mathcal{F},\ \|D\| = o\Bigl( \bigl( \tfrac{n}{q + \log(pq)} \bigr)^{1/4} \Bigr) \Bigr\}$. Let $\hat M$ minimize the empirical risk $\frac{1}{n}\sum_i \bigl\|Y_i - \sum_j M_j(X_{ij})\bigr\|_2^2$ over the class $\mathcal{M}_n$. Then $R(\hat M) - \inf_{M \in \mathcal{M}_n} R(M) \xrightarrow{P} 0$.

31 Nonparametric CCA. Canonical correlation analysis (CCA; Hotelling, 1936) is a classical method for finding correlations between the components of two random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. Sparse versions have been proposed for high-dimensional data (Witten & Tibshirani, 2009). Sparse additive models can be extended to this setting.

32 Sparse Additive Functional CCA. Balakrishnan, Puniyani and Lafferty (2012). Population version of the optimization: $\max_{f \in \mathcal{F},\, g \in \mathcal{G}} \mathbb{E}\bigl(f(X) g(Y)\bigr)$ subject to $\max_j \mathbb{E}(f_j^2) \le 1$, $\max_k \mathbb{E}(g_k^2) \le 1$, $\sum_{j=1}^p \sqrt{\mathbb{E}(f_j^2)} \le C_f$, and $\sum_{k=1}^q \sqrt{\mathbb{E}(g_k^2)} \le C_g$. Estimated with analogues of SpAM backfitting, together with screening procedures. See the ICML paper.

33 Regression vs. Graphical Models.
assumptions      regression              graphical models
parametric       lasso                   graphical lasso
nonparametric    sparse additive model   nonparanormal

34 The Nonparanormal. Liu, Lafferty and Wasserman, JMLR 2009. A random vector $X = (X_1, \ldots, X_p)^T$ has a nonparanormal distribution, $X \sim \mathrm{NPN}(\mu, \Sigma, f)$, in case $Z = f(X) \sim N(\mu, \Sigma)$, where $f(X) = (f_1(X_1), \ldots, f_p(X_p))$. Joint density: $p_X(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\Bigl\{ -\frac{1}{2}\bigl(f(x) - \mu\bigr)^T \Sigma^{-1} \bigl(f(x) - \mu\bigr) \Bigr\} \prod_{j=1}^p |f_j'(x_j)|$. A semiparametric Gaussian copula.
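
A quick illustration of the definition: draw Gaussian vectors and push each coordinate through a monotone map. The cubic transformation here is an arbitrary choice, used only to produce non-Gaussian marginals with a Gaussian copula.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=1000)  # Z ~ N(0, Sigma)
X = Z ** 3   # X_j = f_j^{-1}(Z_j) with f_j(x) = x^{1/3}, so f(X) ~ N(0, Sigma)
# X has markedly non-normal marginals, but the same copula (and hence the
# same conditional independence structure) as the Gaussian vector Z.
```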

35 Examples. [Figure: example nonparanormal densities.]

36 The Nonparanormal. Define $h_j(x) = \Phi^{-1}(F_j(x))$, where $F_j(x) = P(X_j \le x)$, and let $\Lambda$ be the covariance matrix of $Z = h(X)$. Then $X_j \perp X_k \mid \text{rest}$ if and only if $\Lambda^{-1}_{jk} = 0$. Hence we need to: 1. Estimate $\hat h_j(x) = \Phi^{-1}(\hat F_j(x))$. 2. Estimate the covariance matrix of $Z = \hat h(X)$ using the glasso.

37 Winsorizing the CDF. Truncation to estimate $F_j$ for $n > p$: Winsorize the empirical CDF, clipping $\hat F_j$ into $[\delta_n, 1 - \delta_n]$ with truncation level $\delta_n = \frac{1}{4 n^{1/4} \sqrt{\pi \log n}}$.
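
Combining the last two slides, a hedged sketch of the two-step estimator in Python: Winsorized empirical CDFs, Gaussianization, then the graphical lasso. The use of scikit-learn's GraphicalLasso and the value of alpha are illustrative choices, not the paper's code.

```python
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.covariance import GraphicalLasso

def nonparanormal_glasso(X, alpha=0.1):
    """X: n x p data matrix; returns the estimated precision matrix of f(X)."""
    n = X.shape[0]
    delta = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))  # truncation level
    F = rankdata(X, axis=0) / n              # empirical CDFs F_j evaluated at the data
    F = np.clip(F, delta, 1.0 - delta)       # Winsorize into [delta, 1 - delta]
    Z = norm.ppf(F)                          # h_j(x) = Phi^{-1}(F_j(x))
    return GraphicalLasso(alpha=alpha).fit(Z).precision_
```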

39 Gene-Gene Interactions for Arabidopsis thaliana. Dataset from Affymetrix microarrays; sample size $n = 118$, $p = 40$ genes (isoprenoid pathway). [Image source: wikipedia.org]

40 Example Results. [Figure: graphs estimated by the nonparanormal (NPN) and the glasso, and their difference.]

41 Transformations for 3 Genes. [Figure: estimated transformations $\hat f_j$ for three genes.] These genes have highly non-normal marginal distributions. The graphs are different at these genes.

42 Graphs on the S&P 500. Data from Yahoo! Finance (finance.yahoo.com): daily closing prices for 452 stocks in the S&P 500 between 2003 and 2008 (before onset of the financial crisis). Log returns $X_{tj} = \log(S_{t,j} / S_{t-1,j})$, Winsorized to trim outliers. In the following graphs, each node is a stock, and color indicates GICS industry: Consumer Discretionary, Energy, Health Care, Information Technology, Telecommunications Services, Consumer Staples, Financials, Industrials, Materials, Utilities.

43 S&P Data: Glasso vs. Nonparanormal. [Figure: edges that differ between the two estimated graphs, and edges they have in common.]

44 The Nonparanormal SKEPTIC. Liu, Han, Yuan, Lafferty & Wasserman, 2012. Assuming $X \sim \mathrm{NPN}(f, \Sigma^0)$, we have $\Sigma^0_{jk} = 2 \sin\bigl( \frac{\pi}{6} \rho_{jk} \bigr)$, where $\rho$ is Spearman's rho: $\rho_{jk} := \operatorname{Corr}\bigl( F_j(X_j), F_k(X_k) \bigr)$. Empirical estimate, from the ranks $r_i^j$: $\hat\rho_{jk} = \frac{\sum_{i=1}^n (r_i^j - \bar r^j)(r_i^k - \bar r^k)}{\sqrt{\sum_{i=1}^n (r_i^j - \bar r^j)^2 \sum_{i=1}^n (r_i^k - \bar r^k)^2}}$. A similar relation holds for Kendall's tau.
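
A minimal sketch of the resulting plug-in correlation estimate (scipy's spearmanr returns a full correlation matrix only when X has three or more columns, which is assumed here); the output can replace the Pearson correlation matrix as the input to the glasso.

```python
import numpy as np
from scipy.stats import spearmanr

def skeptic_correlation(X):
    """X: n x d data matrix with d >= 3; returns the SKEPTIC estimate of Sigma^0."""
    rho, _ = spearmanr(X)                  # matrix of Spearman rank correlations
    S = 2.0 * np.sin(np.pi / 6.0 * rho)    # Sigma^0_jk = 2 sin(pi rho_jk / 6)
    np.fill_diagonal(S, 1.0)               # exact on the diagonal
    return S
```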

45 The Nonparanormal SKEPTIC. Using a Hoeffding inequality for U-statistics, we get $\max_{jk} \bigl| \hat\Sigma^\rho_{jk} - \Sigma^0_{jk} \bigr| \le 3\sqrt{2\pi}\, \sqrt{\frac{\log d + \log n}{2n}}$, with probability at least $1 - 1/n^2$. Can thus estimate the covariance at the parametric rate. Punch line: for graph and covariance estimation, no loss in statistical or computational efficiency comes from using the nonparanormal rather than the normal graphical model.

46 Conclusions. Thresholded backfitting algorithms derived from subdifferential calculus. RKHS formulations are problematic. Theory for infinite-dimensional optimizations is still incomplete. Many extensions are possible: nonparanormal component analysis, etc. Variations on additive models enjoy most of the good statistical and computational properties of sparse linear models, with relaxed assumptions. We're building a toolbox for large-scale, high-dimensional nonparametric inference.
