Sparse Learning and Distributed PCA. Jianqing Fan

1 Sparse Learning and Distributed PCA, with control of statistical errors and computing resources. Jianqing Fan, Princeton University

2 Coauthors: Han Liu, Qiang Sun, Tong Zhang, Dong Wang, Kaizheng Wang, Ziwei Zhu

3 Outline Computational Resources and Statistical Errors Sparse Learning: Complexity and Stat Error Distributed PCA: Communication and Stat Error Conclusion

4 Introduction

5 Data Tsunami. Large and complex data arise routinely. Biological Sci.: genomics, genetics, neuroscience, medicine. Engineering: machine learning, surveillance videos, social media, networks. Natural Sci.: astronomy, earth sciences, meteorology. Social Sci.: economics, finance, business, and digital humanities.

6 Big Data are ubiquitous. [Diagram: Big Data at the center of Medicine, Internet, Business, Science, Finance, Biological Science, Digital Humanities, Government, and Engineering, with data volumes measured in exabytes and zettabytes.]

7 Deep Impact System: storage, communication, computation architectures Analysis: statistics, computation, optimization, privacy

8 Data Science: Acquisition & Storage, Computing, Analysis, Applications.

9 Statisticians' Dream: efficient methods that are computable in polynomial time and communicable.

10 About this talk. Control: statistical error + computing resources. Sparse Learning (technique: folded concave penalty; controls complexity). Distributed PCA (approach: divide-and-conquer; controls communication).

11 Sparse Learning with control of computation complexity and statistical error Han Liu Qiang Sun Tong Zhang

12 A General Framework for Sparsity Learning. Penalized M-estimator (Fan & Li, 01; Negahban et al., 12): $\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} \big\{ L(\beta) + \sum_{j=1}^d P_\lambda(|\beta_j|) \big\}$. $L(\beta)$: quadratic, GLIM, Gaussian pseudo-likelihood, Huber loss, etc. $P_\lambda(\cdot)$: Lasso, folded concave (SCAD or MCP). [Figure: penalty functions $P_\lambda(|\beta_j|)$ for Lasso, SCAD and MCP.]
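For concreteness, here is a small sketch (my illustration, not from the slides) of the folded-concave penalty derivatives $p_\lambda'$ that later generate the adaptive weights in LLA/LAMM; the SCAD form follows Fan & Li (2001) and MCP follows Zhang (2010), with a = 3.7 and gamma = 3 as conventional defaults.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """SCAD derivative p'_lambda(|t|) (Fan & Li, 2001): equals lam on [0, lam],
    then decays linearly to 0 on [lam, a*lam]."""
    t = np.abs(t)
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1) * (t > lam)

def mcp_derivative(t, lam, gamma=3.0):
    """MCP derivative p'_lambda(|t|) (Zhang, 2010): (lam - |t|/gamma)_+."""
    return np.maximum(lam - np.abs(t) / gamma, 0.0)

# The Lasso corresponds to the constant derivative p'_lambda(|t|) = lam;
# SCAD/MCP let the weight vanish for large |t|, which removes the Lasso bias.
```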

13 Convex and Folded-Concave Regularization. Lasso is fast to compute, but introduces estimation bias; inferior estimation rate: $\|\hat\beta - \beta^*\|_2 \asymp \sqrt{s\log(d)/n}$ (Bickel et al., 09; Negahban et al., 12; ...). Folded-concave estimators have oracle properties; oracle estimation rate: $\|\hat\beta - \beta^*\|_2 \asymp \sqrt{s/n}$ under a beta-min condition, and they obtain the oracle estimator with high probability, but they require heavy computation.

15 Nonconvex Penalized M-estimator. Theoretical properties hold for the global optimum (or a specific local optimum), which is not obtainable in practice. Local min: Fan & Li (2001), Kim et al. (2008), Fan & Lv (2011), Wang, Liu, Zhang (2014), Loh & Wainwright (2014), ... Global min: Zhang & Zhang (2012), Kim & Kwon (2012), Negahban et al. (2012), Agarwal et al. (2012), ... Practical algorithms without theoretical guarantee: coordinate descent (Friedman et al., 08; Breheny & Huang, 11). What is computed is not what is proved.

17 Algorithmic Approach to Sparse Learning

18 Folded Concave Penalized Regression: $\arg\min L(\beta) + \sum_{j=1}^d p_\lambda(|\beta_j|)$. Given a good initial estimator $\hat\beta^{(0)}$, approximate the penalty by LQA or LLA: LQA: $p_\lambda(|\beta_j|) \approx p_\lambda(|\hat\beta_j^{(0)}|) + \tfrac{1}{2}\,\frac{p_\lambda'(|\hat\beta_j^{(0)}|)}{|\hat\beta_j^{(0)}|}\,\big(\beta_j^2 - (\hat\beta_j^{(0)})^2\big)$ (Fan & Li, 01); LLA: $p_\lambda(|\beta_j|) \approx p_\lambda(|\hat\beta_j^{(0)}|) + p_\lambda'(|\hat\beta_j^{(0)}|)\big(|\beta_j| - |\hat\beta_j^{(0)}|\big)$ (Zou & Li, 08). Iterative use is an MM algorithm (Hunter & Li, 05). [Figure: LQA and LLA approximations of the penalty.]

19 Local Linear Approximation. Given $\hat\beta^{(0)} = 0$, compute $\hat\beta^{(1)} = \arg\min\big\{ L(\beta) + \sum_{j=1}^d \lambda_j^{(0)}|\beta_j| \big\}$ with $\lambda_j^{(0)} = p_\lambda'(|\hat\beta_j^{(0)}|)$, then $\hat\beta^{(2)} = \arg\min\big\{ L(\beta) + \sum_{j=1}^d \lambda_j^{(1)}|\beta_j| \big\}$ with $\lambda_j^{(1)} = p_\lambda'(|\hat\beta_j^{(1)}|)$, and so on. How to solve each step? When to stop?

20 A Scalable Solution. Isotropic LQA: $L(\hat\beta^{(0)}) + \langle \nabla L(\hat\beta^{(0)}), \beta - \hat\beta^{(0)}\rangle + \frac{\phi}{2}\|\beta - \hat\beta^{(0)}\|_2^2$. It avoids storage and computation of the Hessian and has an analytic solution with the LLA penalty $\sum_{j=1}^d w_j|\beta_j|$. Related to (proximal) gradient methods (Nesterov, 83, 13; Tseng, 08; Boyd & Vandenberghe, 09; Agarwal et al.); used by Loh & Wainwright, 14 and Wang, Liu, Zhang, 14. Majorization requires $\phi \ge \|\nabla^2 L(\hat\beta^{(0)})\|$, but it is only needed locally, which is handled by a line search.
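A minimal sketch of one LAMM update under these assumptions: a gradient step followed by componentwise soft-thresholding with the LLA weights $w_j$, with $\phi$ inflated by a line search until the local majorization holds. The function names and the doubling factor are my own choices, not the authors' code.

```python
import numpy as np

def lamm_step(beta, grad_fn, loss_fn, weights, phi=1.0, gamma=2.0, max_doubling=50):
    """One LAMM update: minimize the isotropic LQA plus the weighted l1 term.

    The majorized subproblem has the closed form
        beta_new = soft_threshold(beta - grad/phi, weights/phi),
    and phi is doubled until the local majorization
        loss(beta_new) <= loss(beta) + <grad, d> + (phi/2) * ||d||^2
    is satisfied (line search).
    """
    grad, loss0 = grad_fn(beta), loss_fn(beta)
    for _ in range(max_doubling):
        z = beta - grad / phi
        beta_new = np.sign(z) * np.maximum(np.abs(z) - weights / phi, 0.0)
        d = beta_new - beta
        if loss_fn(beta_new) <= loss0 + grad @ d + 0.5 * phi * d @ d:
            break
        phi *= gamma  # tighten the quadratic majorizer and retry
    return beta_new, phi
```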

22 (Local Adaptive) MM Algorithm. MM principle, to minimize a general function $f(\beta)$: majorize $f(\beta)$ by $g_k(\beta)$ at the point $\beta^{(k)}$, i.e. $f(\beta^{(k)}) = g_k(\beta^{(k)})$ and $f(\beta) \le g_k(\beta)$; then compute $\beta^{(k+1)} = \arg\min_\beta g_k(\beta)$. Examples: LQA and LLA. Property: the target values decrease, $f(\beta^{(k+1)}) \overset{(a)}{\le} g_k(\beta^{(k+1)}) \le g_k(\beta^{(k)}) = f(\beta^{(k)})$, and step (a) requires the majorization only locally.

23 Solution to General Sparse Learning. Idea: iteratively refine the estimator via convex programs, $\hat\beta^{(0)} \to \hat\beta^{(1)} \to \hat\beta^{(2)} \to \cdots \to \hat\beta^{(T)}$, where $\hat\beta^{(l)} = \arg\min_{\beta \in \mathbb{R}^d}\big\{ L(\beta) + \|\lambda^{(l-1)} \circ \beta\|_1 \big\}$, $l = 1, \ldots, T$. LLA applied to the penalty gives $\lambda^{(l-1)} = \big(p_\lambda'(|\hat\beta_1^{(l-1)}|), \ldots, p_\lambda'(|\hat\beta_d^{(l-1)}|)\big)^T$. Each subproblem is solved approximately by iterative use of LAMM.
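A hedged sketch of the resulting I-LAMM outer loop, reusing lamm_step and scad_derivative from the earlier sketches; the simple change-based stopping rule below stands in for the optimization-error tolerances $\epsilon_c$, $\epsilon_t$ described on the next slide, and all defaults are illustrative.

```python
import numpy as np

def ilamm(loss_fn, grad_fn, lam, d, T=5, eps_c=1e-3, eps_t=1e-6,
          penalty_derivative=None, max_inner=10000):
    """I-LAMM sketch: T weighted-l1 convex problems, each solved by LAMM.

    Stage l = 1 (contraction) starts from zero with plain Lasso weights and a
    crude tolerance eps_c; stages l >= 2 (tightening) reweight via p'_lambda
    and use the tighter tolerance eps_t.
    """
    if penalty_derivative is None:
        penalty_derivative = scad_derivative      # from the earlier sketch
    beta = np.zeros(d)
    weights = np.full(d, lam)                     # l = 1: plain Lasso weights
    for l in range(1, T + 1):
        tol = eps_c if l == 1 else eps_t
        for _ in range(max_inner):
            beta_new, _ = lamm_step(beta, grad_fn, loss_fn, weights)
            converged = np.max(np.abs(beta_new - beta)) <= tol
            beta = beta_new
            if converged:
                break
        weights = penalty_derivative(beta, lam)   # LLA reweighting for stage l+1
    return beta
```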

24 Algorithm with Controlled Complexity. Contraction stage: $l = 1$, with optimization error $\epsilon_c$. Tightening stage: $2 \le l \le T$, with optimization error $\epsilon_t$. $\epsilon_c \gg \epsilon_t$: for Gaussian noise, $\epsilon_c \asymp \sqrt{n^{-1}\log d}$ and $\epsilon_t \asymp n^{-1}$. [Diagram: within stage $l$, LAMM iterates $\hat\beta^{(l,0)} = \hat\beta^{(l-1)} \to \hat\beta^{(l,1)} \to \cdots \to \hat\beta^{(l,k_l)} = \hat\beta^{(l)}$, starting from $\hat\beta^{(1,0)} = 0$.] Inner iteration counts: $k_1 \asymp \epsilon_c^{-2}$ and $k_l \asymp \log(1/\epsilon_t)$ for $l \ge 2$; number of stages $T \asymp \log(\lambda\sqrt{n}) \asymp \log(\log d)$.

25 Algorithmic Convergence: Phase Transition. Contracting stage: sub-linear convergence $1/\sqrt{k}$; the problem is not strongly convex, but this stage yields a sparse solution. Tightening stage: linear convergence $\rho^k$, thanks to (restricted) strong convexity. [Figure: computational rate for a constant-correlation design, plotting the log optimization error against the iteration count for the contraction stage ($l = 1$) and the tightening stages ($l = 2, 3, 4$).]

26 Effects on Statistical Errors. Contraction region: sparse strong convexity and smoothness. [Figure: the iterates $\hat\beta^{(0)}, \hat\beta^{(1)}, \ldots, \hat\beta^{(T)}$ contract toward a ball of radius of order $\sqrt{s/n}$ around $\beta^*$.] One-step contraction: $\|\hat\beta^{(l)} - \beta^*\|_2 \le \underbrace{C\sqrt{s/n}}_{\text{stat error}} + \delta\,\|\hat\beta^{(l-1)} - \beta^*\|_2$, for a $\delta \in (0,1)$.

27 Theoretical Properties

28 Localized Sparse Eigenvalue. Definition (Localized Sparse Eigenvalue): $\rho_+(m,r) = \sup_{u,\beta}\big\{ u_J^T \nabla^2 L(\beta)\, u_J : \|u_J\|_2^2 = 1,\ |J| \le m,\ \|\beta - \beta^*\|_2 \le r \big\}$; $\rho_-(m,r)$ is defined by taking the inf. Assumption 1 (LSE Condition): let $s = \|\beta^*\|_0$; there exists $r \gtrsim \lambda\sqrt{s}$ such that $0 < \rho_* \le \rho_-(6s,r) \le \rho_+(6s,r) \le \rho^* < +\infty$. Other assumptions: $\min_{j\in S}|\beta_j^*| \gtrsim \lambda$ and $p_\lambda'(\alpha\lambda) = 0$ for large $\alpha$.

30 Statistical Theory. Theorem (Optimal Statistical Rate). Contracting property: there exists a $\delta \in (0,1)$ such that $\|\hat\beta^{(l)} - \beta^*\|_2 \le C\sqrt{s/n} + \delta\,\|\hat\beta^{(l-1)} - \beta^*\|_2$ for $l \ge 2$. If we take $T \gtrsim \log(\lambda\sqrt{n})$, then $\|\hat\beta^{(T)} - \beta^*\|_2 \lesssim \sqrt{s/n}$. The result is deterministic, requiring only $\|\nabla L(\beta^*)\|_\infty + \epsilon_c \lesssim \lambda \lesssim r/\sqrt{s}$. For $l = 1$, $\|\hat\beta^{(1)} - \beta^*\|_2 \lesssim \lambda\sqrt{s} \asymp \sqrt{s\log(d)/n}$, which is not enough for the post-Lasso.
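Unrolling the displayed recursion from $l = 2$ to $T$ and plugging in the stage-one bound gives a quick check (my own unrolling, under the same conditions) of why $T \gtrsim \log(\lambda\sqrt{n})$ suffices:
$$\|\hat\beta^{(T)} - \beta^*\|_2 \le C\sqrt{s/n}\sum_{k=0}^{T-2}\delta^{k} + \delta^{T-1}\|\hat\beta^{(1)} - \beta^*\|_2 \le \frac{C}{1-\delta}\sqrt{s/n} + \delta^{T-1}\,\lambda\sqrt{s}.$$
Since $\lambda\sqrt{s} \asymp \sqrt{s\log(d)/n}$, the second term drops below $\sqrt{s/n}$ once $\delta^{T-1} \lesssim 1/\sqrt{\log d}$, i.e. $T \gtrsim \log\log d \asymp \log(\lambda\sqrt{n})$.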

32 Computational Theory. Theorem (Algorithmic Complexity). The total number of LAMM iterations needed by I-LAMM is $C_1\,\epsilon_c^{-2} + (T-1)\,C_1\log(\epsilon_t^{-1})$, which is of order $O\big(n/\log d + \log(\log n)\cdot\log n\big)$. The initial (contraction) estimator is cruder, yet it dominates the computation time.

34 Conclusion. I-LAMM provides a statistical optimization algorithm for solving nonconvex penalized problems. It examines the joint effects of iteration (optimization) error and statistical error. It relies on weaker conditions, which require a localized analysis. All approximate solutions produced by I-LAMM enjoy the optimal statistical convergence rate with controlled algorithmic complexity.

35 Distributed PCA Dong Wang Kaizheng Wang Ziwei Zhu

36 Principal Component Analysis. A fundamental tool for statistical machine learning: dimension reduction, data summarization, latent factors. Rate of convergence: $\|\hat v_1 - v_1\|_2 = O_P\big(\sqrt{d/(n\lambda)}\big)$, where $d$ is the dimension, $n$ the sample size, and $\lambda$ the eigengap (Paul, 07; Jung et al., 09; Johnstone and Lu, 12; Shen et al., 16; Wang and Fan, 17). What about distributed PCA?

38 Problem Setup. $N$ samples stored on $m$ servers, i.i.d. sub-Gaussian with covariance $\Sigma$ (generalized to heterogeneous covariances $\{\Sigma^{(l)}\}_{l=1}^m$). Spectral decomposition: $\Sigma = V\Lambda V^T$. Interest: the column space of $V_K$. Error metric: $\rho(\hat V_K, V_K) = \|\hat V_K \hat V_K^T - V_K V_K^T\|_F$.

39 Distributed PCA. Communication: $K$ $d$-dimensional vectors from each machine. Aggregation: $\tilde V_K = \arg\min_{U_K \in \mathbb{R}^{d\times K},\,U_K^T U_K = I_K} \frac{1}{m}\sum_{l=1}^m \rho^2\big(U_K, \hat V_K^{(l)}\big)$. Spill-over: send more than $K$ eigenvectors.
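A minimal sketch of this divide-and-conquer aggregation (my illustration, not the authors' code): each machine computes its top-$K$ eigenvectors locally, and the central machine takes the top-$K$ eigenvectors of the averaged projection matrix, which solves the displayed minimization because $\rho^2(U_K, \hat V_K^{(l)}) = 2K - 2\,\mathrm{tr}\big(U_K U_K^T \hat V_K^{(l)}\hat V_K^{(l)T}\big)$.

```python
import numpy as np

def top_k_eigvecs(mat, k):
    """Top-k eigenvectors (as columns) of a symmetric matrix."""
    _, vecs = np.linalg.eigh(mat)               # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def local_pca(X, k):
    """Top-k eigenvectors of the local sample covariance (one machine)."""
    return top_k_eigvecs(np.cov(X, rowvar=False), k)

def distributed_pca(X_blocks, k):
    """Aggregate local PCs: top-k eigenvectors of the average projection matrix."""
    d = X_blocks[0].shape[1]
    proj_avg = np.zeros((d, d))
    for X in X_blocks:                          # each machine only sends a d x k matrix
        V = local_pca(X, k)
        proj_avg += V @ V.T / len(X_blocks)
    return top_k_eigvecs(proj_avg, k)

def rho(V1, V2):
    """Projection distance rho(V1, V2) = ||V1 V1^T - V2 V2^T||_F."""
    return np.linalg.norm(V1 @ V1.T - V2 @ V2.T, "fro")
```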

40 Statistical Error Analysis Bias and Sample Variance

41 Bias and Variance. Let $\overline V_K$ denote the top-$K$ eigenvectors of $\overline\Sigma = E\big[\hat V_K^{(l)}(\hat V_K^{(l)})^T\big]$. Decomposition: $\rho(\tilde V_K, V_K) \le \underbrace{\rho(\tilde V_K, \overline V_K)}_{\text{sample variance term}} + \underbrace{\rho(\overline V_K, V_K)}_{\text{bias term}}$. Sample variance term: as $\frac{1}{m}\sum_{l=1}^m \hat V_K^{(l)}\hat V_K^{(l)T} \to \overline\Sigma$, we have $\rho(\tilde V_K, \overline V_K) \to 0$.

42 Analysis of Sample Variance. Theorem 1 (variance). Let $r(\Sigma) = (E\|X\|_2)^2/\|\Sigma\|$ (the effective rank). Then $\big\|\,\|\hat V_K^{(l)}\hat V_K^{(l)T} - \overline\Sigma\|_F\,\big\|_{\psi_1} = O\Big(\frac{\|\Sigma\|\sqrt{K\,r(\Sigma)}}{(\lambda_K - \lambda_{K+1})\sqrt{n}}\Big)$ and $\big\|\rho(\tilde V_K, \overline V_K)\big\|_{\psi_1} = O\Big(\frac{\|\Sigma\|\sqrt{K\,r(\Sigma)}}{(\lambda_K - \lambda_{K+1})\sqrt{mn}}\Big)$. Here $\|X\|_{\psi_1} \lesssim a_n$ implies $X = O_P(a_n)$ with exponential concentration.

43 Analysis of Bias. $\rho(\overline V_K, V_K) = \|\overline V_K\,\overline V_K^T - V_K V_K^T\|_F$. Compare two routes: (1) $\hat\Sigma^{(l)} \xrightarrow{\text{PCA}} \hat V_K^{(l)}(\hat V_K^{(l)})^T \xrightarrow{\text{expectation}} \overline\Sigma \xrightarrow{\text{PCA}} \overline V_K\,\overline V_K^T$; (2) $\hat\Sigma^{(l)} \xrightarrow{\text{expectation}} \Sigma \xrightarrow{\text{PCA}} V_K V_K^T$. When are they the same?

45 Unbiasedness. Definition: $Z$ is sign symmetric if $Z \overset{d}{=} I_j Z$ for all $j$, where $I_j$ flips the sign of the $j$-th coordinate. $X$ with covariance $\Sigma = V\Lambda V^T$ has symmetric innovation if $V^T X$ is sign symmetric. Theorem 2 (unbiasedness). If $X$ has symmetric innovation, then $\overline\Sigma$ and $\Sigma$ share the same set of eigenvectors; if in addition $\big\|\rho(\hat V_K^{(l)}, V_K)\big\|_{\psi_1} \le 1/4$, then $\rho(\overline V_K, V_K) = 0$.

46 Analysis of Bias: General Case. Use a Taylor expansion and matrix perturbation: since $\hat V_K^{(l)}$ and $V_K$ are the top-$K$ eigenvectors of $\hat\Sigma^{(l)}$ and $\Sigma$, $\hat V_K^{(l)}(\hat V_K^{(l)})^T = V_K V_K^T + \text{a term linear in }(\hat\Sigma^{(l)} - \Sigma) + O_P\big(\|\hat\Sigma^{(l)} - \Sigma\|^2\big)$, so that $\overline\Sigma = V_K V_K^T + O\big(E\|\hat\Sigma^{(l)} - \Sigma\|^2\big)$. Theorem 3 (general-case bias): $\rho(\overline V_K, V_K) = O\Big(\frac{\|\Sigma\|^2\,K\,r(\Sigma)}{n(\lambda_K - \lambda_{K+1})^2}\Big)$.

47 Statistical Error Analysis. Theorem 4 (MSE). Assume that $n \gtrsim K\,r(\Sigma)\big(\|\Sigma\|/(\lambda_K - \lambda_{K+1})\big)^2$. (1) Symmetric distribution: $\big\|\rho(\tilde V_K, V_K)\big\|_{\psi_1} = O\Big(\frac{\|\Sigma\|\sqrt{K\,r(\Sigma)}}{(\lambda_K - \lambda_{K+1})\sqrt{N}}\Big)$. (2) General distribution: for universal constants $C_1$ and $C_2$, $\big\|\rho(\tilde V_K, V_K)\big\|_{\psi_1} \le \underbrace{\frac{C_1\|\Sigma\|\sqrt{K\,r(\Sigma)}}{(\lambda_K - \lambda_{K+1})\sqrt{N}}}_{\text{sample variance}} + \underbrace{\frac{C_2\|\Sigma\|^2\,K\,r(\Sigma)}{n(\lambda_K - \lambda_{K+1})^2}}_{\text{bias}}$. If $m \lesssim C_3/\mathrm{Bias}$, then the bias is negligible and we obtain the same rate as PCA on the whole data. The analysis extends to heterogeneous data, and the required $n$ is sharp.
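Balancing the two displayed terms makes the "m not too large" requirement explicit (my rearrangement of the bound):
$$\frac{C_2\|\Sigma\|^2 K\,r(\Sigma)}{n(\lambda_K - \lambda_{K+1})^2} \lesssim \frac{C_1\|\Sigma\|\sqrt{K\,r(\Sigma)}}{(\lambda_K - \lambda_{K+1})\sqrt{mn}} \iff m \lesssim \frac{n(\lambda_K - \lambda_{K+1})^2}{\|\Sigma\|^2\,K\,r(\Sigma)},$$
which matches the condition $m \lesssim C_3/\mathrm{Bias}$ stated above.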

48 Simulation Study

49 Simulation Setting. $\{x_i \in \mathbb{R}^d\}_{i=1}^{mn}$ i.i.d. $N(0,\Sigma)$ with $\Sigma = \mathrm{diag}(\lambda, \lambda/2, \lambda/4, 1, \ldots, 1)$; $K = 3$, $V_K = (e_1, e_2, e_3)$, eigengap $\delta = \lambda/4 - 1$. When $\lambda = O(d)$, we have $\big\|\rho(\tilde V_K, V_K)\big\|_{\psi_1} = O\big(\sqrt{d/(mn\delta)}\big)$. Errors are computed from 100 simulations.
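A hedged sketch of this simulation setting, reusing the distributed_pca and rho helpers from the earlier sketch; the sizes below (d, m, n, lambda) are placeholders rather than the values used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, lam = 100, 10, 500, 30.0             # placeholder sizes
sigma_diag = np.ones(d)
sigma_diag[:3] = [lam, lam / 2, lam / 4]      # Sigma = diag(lam, lam/2, lam/4, 1, ..., 1)
V_K = np.eye(d)[:, :3]                        # true top-3 eigenvectors e_1, e_2, e_3

# m machines with n i.i.d. N(0, Sigma) samples each, then one round of aggregation.
X_blocks = [rng.normal(size=(n, d)) * np.sqrt(sigma_diag) for _ in range(m)]
V_tilde = distributed_pca(X_blocks, k=3)
print("rho(V_tilde, V_K) =", rho(V_tilde, V_K))
```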

50 Statistical Error Rate: $\rho(\tilde V_K, V_K)$ vs. $m$. [Figure: $\log(\text{error})$ against $\log(m)$ for $n = 500, 1000, 2000, 4000$ and $d = 100, 200, 400, 800, \ldots$] The slopes confirm $\rho(\tilde V_K, V_K) \propto m^{-1/2}$; the scalings $\rho(\tilde V_K, V_K) \propto n^{-1/2}$, $d^{1/2}$, $\delta^{-1/2}$ are verified similarly.

51 Statistical Error Rate: $\rho(\tilde V_K, V_K)$ vs. $\{d, m, n, \delta\}$. Multiple regression: $\log(\rho(\tilde V_K, V_K)) = \beta_0 + \beta_1\log(d) + \beta_2\log(m) + \beta_3\log(n) + \beta_4\log(\delta) + \varepsilon$. Estimated coefficients $\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\beta_4$ and multiple $R^2$: [table on slide].
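A small sketch of this diagnostic regression (the arrays below are placeholders for the recorded simulation errors and settings): the rate $\sqrt{d/(mn\delta)}$ predicts coefficients near (0.5, -0.5, -0.5, -0.5) for $(\log d, \log m, \log n, \log\delta)$.

```python
import numpy as np

def fit_loglog(errors, d, m, n, delta):
    """Regress log(error) on log(d), log(m), log(n), log(delta) via least squares."""
    errors = np.asarray(errors, dtype=float)
    X = np.column_stack([np.ones_like(errors),
                         np.log(d), np.log(m), np.log(n), np.log(delta)])
    coef, *_ = np.linalg.lstsq(X, np.log(errors), rcond=None)
    return coef  # (beta0_hat, beta1_hat, beta2_hat, beta3_hat, beta4_hat)
```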

52 Statistical Error Rate: $\rho(\tilde V_K, V_K)$ vs. $\{d, m, n, \delta\}$. Figure: observed and fitted values of $\log(\rho(\tilde V_K, V_K))$.

53 Comparison of Three Approaches 1 DP: Distributed PCA 2 FP: Full sample PCA 3 DP5: Distributed PCA with 5 extra eigenvectors Purpose: Examine spill-over effect

54 Comparison of Three Approaches: $\lambda = 30$. [Figure: $\log(\text{error})$ against $\log(n)$ for $m = 5, 10, 20, 50$, comparing DP, FP and DP5 with $\lambda = 30$ and $d$ fixed.] Little spill-over!

55 Conclusion. Distributed PCA is unbiased when the innovation is symmetric; in general, the bias is controllable when $m$ is not too large. Distributed PCA enjoys the aggregated variance effect and the same performance as PCA on the full data.

56 The End. Thank You!
