Variable selection in high-dimensional regression problems


1 Variable selection in high-dimensional regression problems
Jan Mielniczuk, Polish Academy of Sciences and Warsaw University of Technology
Based on joint research with P. Pokarowski, A. Prochenka, P. Teisseyre and M. Kubkowski

2 Outline
Introduction: penalized empirical risk minimisation
Variable selection for linear and logistic regression
Variable selection for a misspecified logistic model: some comments

3–5 Penalized Empirical Risk Minimization (PERM)
Data form $(y, x^T)$: $y$ is the response (quantitative or nominal), $x = (x_1, \ldots, x_p)^T \in \mathbb{R}^p$ is the vector of predictors.
Penalized risk minimization framework: Data $= \{(y_1, x_1^T), \ldots, (y_n, x_n^T)\} = \mathrm{Train} \cup \mathrm{Valid} \cup \mathrm{Test}$; $\beta$ is the model parameter, $\lambda$ the penalty parameter.
Fitting: $\hat\beta(\lambda) = \arg\min_{\beta}\{\mathrm{err}(\beta, \mathrm{Train}) + \mathrm{penalty}(\beta, \lambda)\}$
Selection: $\hat\lambda = \arg\min_{\lambda} \mathrm{err}(\hat\beta(\lambda), \mathrm{Valid})$
Assessment: $\widehat{\mathrm{err}} = \mathrm{err}(\hat\beta(\hat\lambda), \mathrm{Test})$
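A minimal sketch of this fit/select/assess loop in Python, assuming a lasso penalty and scikit-learn (both are illustrative choices of mine, not prescribed by the slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [3.0, 1.5, 2.0]
y = X @ beta + rng.standard_normal(n)

# Data = Train | Valid | Test
X_tr, y_tr = X[:100], y[:100]
X_va, y_va = X[100:200], y[100:200]
X_te, y_te = X[200:], y[200:]

fits = []
for lam in np.logspace(-3, 0, 30):
    m = Lasso(alpha=lam).fit(X_tr, y_tr)                     # Fitting: beta_hat(lambda)
    fits.append((np.mean((y_va - m.predict(X_va)) ** 2), lam, m))

valid_err, lam_hat, best = min(fits, key=lambda t: t[0])     # Selection: lambda_hat
test_err = np.mean((y_te - best.predict(X_te)) ** 2)         # Assessment on Test
print(lam_hat, test_err)
```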

6–7 Penalized Empirical Risk Minimization
The empirical risk err is a generalization of the prediction error and of the negative log-likelihood: $\mathrm{err}(\beta, \mathrm{Train}) = \sum_{i=1}^{n} L(y_i, f(x_i, \beta))$, which is (usually) a convex function of $\beta$; $L(y, f)$ is a loss function.
For $\beta = (\beta_1, \ldots, \beta_p)^T$ the penalty is coordinatewise: $\mathrm{penalty}(\beta, \lambda) = \sum_{j=1}^{p} P_\lambda(|\beta_j|)$, with typical choices of $P_\lambda(t)$ lying between $\lambda\,\mathbf{1}(t > 0)$ and $\lambda t^2$.

8 Figure: classification loss functions (binomial, hinge, square/4, 0–1 prediction error) plotted against the margin $y \cdot f$, and regression loss functions (square, $\varepsilon$-insensitive, Huber) plotted against $y - f$.

9–11 Classical Penalty Functions
Ridge regression, $\ell_2$ penalty (Hoerl and Kennard (1970)): $P_\lambda(t) = \lambda t^2$
Generalized Information Criterion (GIC; includes AIC, BIC), $\ell_0$ penalty (Nishii (1984)): $P_\lambda(t) = 2\lambda\,\mathbf{1}(t > 0)$
Lasso, $\ell_1$ penalty (Chen and Donoho (1995), Tibshirani (1996)): $P_\lambda(t) = \lambda t$
Important for high-dimensional problems: sparsity of the Lasso solution is induced by $P'_\lambda(0+) > 0$.
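For concreteness, the three classical penalties written as Python functions (a trivial illustration; the function names are mine):

```python
import numpy as np

def ridge_penalty(t, lam):
    """Ridge (l2) penalty: P_lambda(t) = lam * t^2."""
    return lam * t ** 2

def l0_penalty(t, lam):
    """GIC-type (l0) penalty: P_lambda(t) = 2 * lam * 1(t > 0)."""
    return 2.0 * lam * (np.abs(t) > 0).astype(float)

def lasso_penalty(t, lam):
    """Lasso (l1) penalty: P_lambda(t) = lam * |t|."""
    return lam * np.abs(t)

t = np.linspace(-2, 2, 5)
print(ridge_penalty(t, 1.0), l0_penalty(t, 1.0), lasso_penalty(t, 1.0))
```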

12 Validation criteria
Choice of penalty: $\hat\lambda = \arg\min_{\lambda} \mathrm{err}(\hat\beta(\lambda), \mathrm{Valid})$
$\mathrm{err}(\hat\beta) = \hat{E}\,\|\hat\beta - \beta\|^2$ (estimation error)
$\mathrm{err}(\hat\beta) = \hat{E}\,\|X(\hat\beta - \beta)\|^2$ (prediction error)
$\mathrm{err}(\hat\beta) = P(y\,x^T\hat\beta < 0)$ (classification error)
$\mathrm{err}(\hat\beta) = P(\mathrm{supp}\,\hat\beta \neq \mathrm{supp}\,\beta)$ (selection error)
others: FDR control etc.

13 Selection consistency
Selection consistency: $P(\hat T \neq T)$ is negligible for large $n$, or equivalently the Type I and Type II error probabilities are negligible for large $n$.
Explanatory value; a fundamental property for the correctness of post-model-selection inference.

14 Linear predictive models
Why is linear regression so important?
The linear predictive model $\hat Y = g(x^T\hat\beta)$ is the cornerstone of prediction; examples: neural nets, compressed sensing, generalized linear models (GLM), ARMA models, etc.
The linear-model solution to the two-class classification problem works well. It is not a fluke!

15 Linear model
$y = (y_1, \ldots, y_n)^T$, $X = [x_{1\cdot}, \ldots, x_{n\cdot}]^T = [x_{\cdot 1}, \ldots, x_{\cdot p}]$.
$y^T 1_n = 0$ and the columns are standardized: $x_{\cdot j}^T 1_n = 0$, $x_{\cdot j}^T x_{\cdot j} = 1$ for $j = 1, \ldots, p$.
Linear regression model: $y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i$, $i = 1, \ldots, n$, where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T \in \mathbb{R}^n$ are iid zero-mean errors.

16–17 High dimensionality and sparsity
Aim: operational algorithms of risk minimisation which work in the high-dimensional setting.
Two features of the problem:
High dimensionality: $p > n$ or $p \gg n$; NP-dimensionality: $p \leq \exp(n^\alpha)$ for some $\alpha > 0$;
Sparsity: the active set $T = \{i : \beta_i \neq 0\}$ satisfies ("bet on sparsity") $|T| \ll \min(n, p)$.

18 Bet on sparsity

19 Bet on sparsity (statistical insight)
Consider $\hat\beta^{OLS}_T$, the OLS fit on the true active set $T$, as an oracle benchmark. Then $n^{-1} E\,\|X\hat\beta^{OLS}_T - X\beta\|^2 = \sigma^2 |T| / n$, which is useless when $|T| \sim n$.
Simple approaches such as OLS on all $p > n$ predictors do not work (perfect fit on the training data).
Penalized approaches are valuable because they can yield sparsity of the solution.
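A quick Monte Carlo check of the oracle-risk identity above (a sketch with simulation settings of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 100, 20, 1.0
T = [0, 1, 2]                                   # true active set, |T| = 3
beta = np.zeros(p); beta[T] = [3.0, 1.5, 2.0]

risks = []
for _ in range(2000):
    X = rng.standard_normal((n, p))
    y = X[:, T] @ beta[T] + sigma * rng.standard_normal(n)
    beta_T = np.linalg.lstsq(X[:, T], y, rcond=None)[0]        # oracle OLS on T
    risks.append(np.sum((X[:, T] @ (beta_T - beta[T])) ** 2) / n)

print(np.mean(risks), sigma ** 2 * len(T) / n)  # both close to 0.03
```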

20 LASSO estimator in the linear model
Least Absolute Shrinkage and Selection Operator:
$\hat\beta_L = \hat\beta_L(\lambda) = \mathrm{argmin}_{\gamma}\,\{\|y - X\gamma\|^2 + 2\lambda\|\gamma\|_1\}$
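In scikit-learn (one possible implementation, not prescribed by the slides) the Lasso minimizes $\frac{1}{2n}\|y - X\gamma\|^2 + \alpha\|\gamma\|_1$, so the $\lambda$ in the objective above corresponds to $\alpha = \lambda/n$:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 100, 200                                        # p > n
X = rng.standard_normal((n, p))
X -= X.mean(axis=0); X /= np.linalg.norm(X, axis=0)    # centered, unit-norm columns
beta = np.zeros(p); beta[:5] = 4.0
sigma = 0.5
y = X @ beta + sigma * rng.standard_normal(n)
y -= y.mean()

lam = sigma * np.sqrt(2 * np.log(p))                   # a common theory-driven choice
fit = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y)
T_hat_L = np.flatnonzero(fit.coef_)                    # T_hat_L = {i : beta_hat_L,i != 0}
print(T_hat_L)
```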

21 Figure: penalty functions, Lasso versus Ridge, shown as constraint regions in the $(\beta_1, \beta_2)$ plane.

22 Figure: inclusion of predictors by the Lasso for the prostate data; coefficient paths plotted against $\log\lambda$.

23–25 Lasso acts as a selector: $\hat T_L = \{i : \hat\beta_{L,i} \neq 0\}$; however:
Strong regularity conditions on the moment matrix $X^TX$ are needed to ensure selection consistency of the Lasso.
The Lasso shrinks and selects at the same time. It tries to compensate for the shrunken coefficient of a true predictor by incorporating a spurious predictor correlated with the true one.
The Lasso is a screening rather than a selection operator (Bühlmann (2011)). It owes its popularity to ease of computation (LARS for linear regression, coordinate descent for other frameworks).

26–27 Post-Lasso World
Folded Concave Penalties (FCP): $P_\lambda(t)$ is increasing and concave, $P_\lambda(0) = 0$, $P'_\lambda(0+) > 0$, and $P_\lambda(t)$ is constant for $t > \gamma\lambda$ for some $\gamma > 1$.
Much more difficult algorithmically, but approximate solutions such as LLA exist. Examples: SCAD, MCP, capped $\ell_1$.
Figure: comparison of penalty functions (GIC/$\ell_0$, MCP, Lasso, ridge).
MCP approximates the $\ell_0$ penalty more closely than the Lasso does.

28 Figure: Minimax Concave Penalty (MCP); the penalty $P_\lambda(t)$ and the corresponding thresholding functions for $\gamma = 25$, $2.5$ and $1.1$.
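A sketch of the MCP penalty and its scalar (orthonormal-design) thresholding rule, written out by me from the standard MCP formulas rather than taken from the slides:

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """MCP: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, and gamma*lam^2/2 beyond."""
    a = np.abs(t)
    return np.where(a <= gamma * lam, lam * a - a ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

def mcp_threshold(z, lam, gamma):
    """Scalar MCP thresholding (gamma > 1): firm shrinkage up to gamma*lam, identity beyond."""
    a = np.abs(z)
    firm = np.sign(z) * np.maximum(a - lam, 0.0) / (1.0 - 1.0 / gamma)
    return np.where(a > gamma * lam, z, firm)

z = np.linspace(-4, 4, 9)
print(mcp_threshold(z, lam=1.0, gamma=2.5))
```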

29–31 Three properties of the Lasso which can be used (at the price of conditions!)
Selection consistency ($T = \{i : \beta_i \neq 0\}$): $\hat T_L = T$, i.e. $\min_{i \in T}|\hat\beta_{L,i}| > \max_{i \notin T}|\hat\beta_{L,i}| = 0$. Never satisfied under realistic assumptions.
Separation: the Lasso separates $T$ from $T^c$: $\min_{i \in T}|\hat\beta_{L,i}| > \max_{i \notin T}|\hat\beta_{L,i}|$. Frequently fails (Su et al. (2015)).
Screening: the Lasso yields screening: $\hat T_L \supseteq T$, i.e. $\min_{i \in T}|\hat\beta_{L,i}| > 0$. Holds under much milder conditions (Zou, 2006).

32 Our generic approach: the SOS algorithm (P. Pokarowski, JM, JMLR (2015))
Use the Lasso as a screening method;
Order the obtained variables;
Use the Generalized Information Criterion.

33 Screening-Selection (SS) procedure
A version of SOS (JMLR (2015)) with the O (ordering) step removed.
Algorithm 1 SS
Input: $y$, $X$ and $\lambda$
Screening (Lasso): $\hat\beta = \hat\beta(\lambda) = \mathrm{argmin}_{\gamma}\,\{\|y - X\gamma\|^2 + 2\lambda\|\gamma\|_1\}$;
order the nonzero coefficients: $|\hat\beta_{j_1}| \geq |\hat\beta_{j_2}| \geq \ldots \geq |\hat\beta_{j_s}|$, where $s = |\mathrm{supp}\,\hat\beta|$;
set $\mathcal{J} = \{\{j_1\}, \{j_1, j_2\}, \ldots, \{j_1, \ldots, j_s\}\}$;
Selection (GIC): $\hat T = \mathrm{argmin}_{J \in \mathcal{J}}\,\{\mathrm{SSE}_J + \lambda^2 |J|\}$
Output: $\hat\beta_{SS} = (X_{\hat T}^T X_{\hat T})^{-1} X_{\hat T}^T y$
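A plain numpy/scikit-learn sketch of the SS procedure as stated above (my own illustrative implementation, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import Lasso

def ss_procedure(X, y, lam):
    """Screening-Selection: Lasso screen, order by |coefficient|, GIC over nested sets."""
    n, p = X.shape
    # Screening: minimize ||y - X g||^2 + 2*lam*||g||_1  (sklearn scaling: alpha = lam/n)
    coef = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_
    nonzero = np.flatnonzero(coef)
    order = nonzero[np.argsort(-np.abs(coef[nonzero]))]          # j_1, ..., j_s
    # Selection: GIC(J) = SSE_J + lam^2 * |J| over the nested family
    best_gic, best_J = np.sum(y ** 2), np.array([], dtype=int)   # empty model as fallback
    for k in range(1, len(order) + 1):
        J = order[:k]
        b_J = np.linalg.lstsq(X[:, J], y, rcond=None)[0]
        gic = np.sum((y - X[:, J] @ b_J) ** 2) + lam ** 2 * k
        if gic < best_gic:
            best_gic, best_J = gic, J
    # Output: OLS refit on the selected set T_hat
    beta_ss = np.zeros(p)
    if len(best_J):
        beta_ss[best_J] = np.linalg.lstsq(X[:, best_J], y, rcond=None)[0]
    return best_J, beta_ss
```

Called with, e.g., $\lambda = \sigma\sqrt{2\log p}$ (the choice used in the experiments below), it returns the selected set $\hat T$ and the refitted coefficients.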

34 Limitations on selection consistency (statistical insight)
To detect the active set, the dependence between the active set and its complement must not be too strong; equivalently, $X^TX = \frac{\partial^2}{\partial\beta\,\partial\beta^T}\big(\|y - X\beta\|^2/2\big)$ must not be too degenerate.
What does this mean for $p \gg n$?

35 Figure: strict convexity of the risk over a certain cone (Tibshirani et al. (2015)).

36–37 Feasibility parameters
Sign-restricted identifiability factor (SCIF): $\zeta_{T,a} = \inf_{\nu \in C_{T,a}} \frac{\|X^TX\nu\|}{\|\nu\|}$, where $C_{T,a}$ for $a \in (0, 1)$ is a certain cone. Restriction to $C_{T,a}$ ensures $\zeta_{T,a} > 0$ for many high-dimensional designs.
Scaled K-L distance: the scaled Kullback-Leibler distance between $T$ and its submodels is $\delta_T = \min_{J \subsetneq T} \frac{\|(I - H_J)X_T\beta_T\|^2}{|T \setminus J|}$.

38 Bound for $P(\hat T_{SS} \neq T)$ (PP & JM, 2015)
Theorem. Under mild assumptions on the feasibility parameters we have
$P(\hat T_{SS} \neq T) \leq p\,\exp\!\big(-\frac{\lambda^2}{2\sigma^2}\big)$ (some constants are omitted).
For the true regressors to be distinguishable from the noise, $\beta_{\min} = \min_{i \in T}|\beta_i|$ has to be sufficiently large; hence a condition of the form $\zeta^2_{T,a}\,\beta^2_{\min} \geq C > 0$.

39 Algorithm 2 SSnet (Screening-Selection algorithm on a net of $\lambda$'s)
Input: $y$, $X$ and $(\lambda_0, \lambda_1, \ldots, \lambda_m)^T$
Screening (Lasso): for $k = 1$ to $m$ do
$\hat\beta^{(k)} = \hat\beta(\lambda_k) = \mathrm{argmin}_{\gamma}\,\{\|y - X\gamma\|^2 + 2\lambda_k\|\gamma\|_1\}$;
order the nonzero coefficients: $|\hat\beta^{(k)}_{j_1}| \geq |\hat\beta^{(k)}_{j_2}| \geq \ldots \geq |\hat\beta^{(k)}_{j_{s_k}}|$, where $s_k = |\mathrm{supp}\,\hat\beta^{(k)}|$;
set $\mathcal{J}_k(y) = \{\{j_1\}, \{j_1, j_2\}, \ldots, \mathrm{supp}\,\hat\beta^{(k)}\}$;
end for
Selection (GIC): $\mathcal{J}(y) = \bigcup_{k=1}^{m} \mathcal{J}_k(y)$; $\hat T = \mathrm{argmin}_{J \in \mathcal{J}(y)}\,\{R_J + \lambda_0^2 |J|\}$
Output: $\hat T$, $\hat\beta_{SSnet} = (X_{\hat T}^T X_{\hat T})^{-1} X_{\hat T}^T y$

40 SOSnet algorithm
Use the Lasso with $\lambda_i$, $i = 0, 1, \ldots, m$, to choose sets of predictors $I_i$;
Fit the linear models $y \sim x_{I_i}$, $i = 0, 1, \ldots, m$;
Order the predictors according to their squared t-statistics;
Construct $\mathcal{M}$ = nested models;
Use GIC on $\mathcal{M}$ to choose the final model (see the sketch below).
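A rough sketch of SOSnet along these lines (illustrative only; the $\lambda$-grid and the GIC constant are assumed to be supplied by the user, and the implementation details are mine, not the authors'):

```python
import numpy as np
from sklearn.linear_model import Lasso

def sosnet(X, y, lambdas, lam0):
    """SOSnet sketch: per-lambda Lasso screens, t-statistic ordering, GIC selection."""
    n, p = X.shape
    candidates = set()                                   # union of nested families
    for lam in lambdas:
        I = np.flatnonzero(Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_)
        if len(I) == 0 or len(I) >= n:
            continue
        XI = X[:, I]
        b = np.linalg.lstsq(XI, y, rcond=None)[0]        # OLS refit y ~ x_I
        resid = y - XI @ b
        sigma2 = resid @ resid / (n - len(I))
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XI.T @ XI)))
        order = I[np.argsort(-(b / se) ** 2)]            # order by squared t-statistic
        candidates.update(tuple(sorted(order[:k])) for k in range(1, len(order) + 1))
    best_gic, best_J = np.sum(y ** 2), ()                # empty model as fallback
    for J in candidates:                                 # GIC over the collected models
        XJ = X[:, list(J)]
        bJ = np.linalg.lstsq(XJ, y, rcond=None)[0]
        gic = np.sum((y - XJ @ bJ) ** 2) + lam0 ** 2 * len(J)
        if gic < best_gic:
            best_gic, best_J = gic, J
    return np.array(best_J, dtype=int)
```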

41 Numerical experiments
Four groups of algorithms:
SS, SSnet, SOSnet;
MCP calibrated by GIC (sparsenet);
MCP calibrated by CV (sparsenet, two settings);
MCP ($a = 1.5$ and $a = 3$) (PLUS).
$\lambda = \sigma\sqrt{2\log(p)}$; penalization term for GIC: $c\lambda^2$ with three values of $c$ taken from $\{1, 1.5, 2, 2.5, 3, 3.5, 4\}$.

42 Experiments cont'd
M I: $\beta_1 = (3, 1.5, 0, 0, 2, 0_{p-5}^T)^T$ from Wang et al. (2013), $p = 3000$;
M II: $\beta_2 = (0_{p-10}^T, \pm 2, \ldots, \pm 2)^T$ from Wang et al. (2014), $p = 2000$; the signs $\pm$ are chosen separately for every run.
$x_1, \ldots, x_p$: normal with autoregressive (exp. a: $\rho = 0.5$, b: $\rho = 0.7$) or equicorrelated (exp. c: $\rho = 0.5$, d: $\rho = 0.7$) correlation structure.
$n = 100$ (M I) and $n = 200$ (M II).

43 Table: True Model selection (TM) rate (%) for experiments 1a-1d and 2a-2d; rows: SS, SSnet and SOSnet (each with three GIC constants $c$), sparsenet calibrated by GIC (three values of $c$), sparsenet calibrated by CV (p.1se and p.min), and two MCP (PLUS) variants. (The numerical entries were not preserved in the transcription.)

44 Table: Relative Mean Squared Error (MSE) for the same experiments and methods as in the previous table. (The numerical entries were not preserved in the transcription.)

45 Comments on results
SOSnet: higher correct-selection probability and lower MSE simultaneously in almost all experimental setups. The difference is most pronounced for higher correlations.
Computing times for SOSnet are more than 2 times shorter than for sparsenet + GIC, more than 4 times shorter than for sparsenet + CV, and more than 20 times shorter than for the PLUS implementation.
Sparsenet tuned by GIC works much better than sparsenet tuned by CV.

46–49 Binary case: remarks
Conceptually the same; the loss function changes, usually to the logistic loss. More difficult algorithmically.
Theoretical analysis is more difficult due to the heteroscedasticity of the response.
NP-dimensional case: filtering based on a ranking of univariate fits (e.g. SIS, Fan et al. (2009)) and then PERM analysis on the chosen subset.
Fitting univariate (e.g. logistic) models to multivariate logistic data is an ultimate type of model misspecification.

50 Misspecified logistic model
A different angle: using the logistic loss in empirical risk minimisation amounts to fitting a logistic model.
The data come from the binary model $P(Y = 1 \mid X) = q(\beta_0 + \beta^T X)$, where $X$ is a random variable in $\mathbb{R}^p$ and $q$ is the response function.
The logistic response $p$, $p(\beta_0 + \beta^T x) = \frac{\exp(\beta_0 + \beta^T x)}{1 + \exp(\beta_0 + \beta^T x)}$, is the most frequently used tool to model the dependence of a binary outcome on attributes.

51 Important special cases: omission of (some) valid predictors from the logistic model itself, filters in particular.
What happens when we misspecify the response function and use the logistic response $p$ instead of $q$? Some bias in the estimation of $\beta$ surely occurs, but how important is the error?
It is obvious that we cannot learn $\beta$ when $q$ is arbitrary, but what about the direction of $\beta$? Can we learn $\mathrm{supp}\,\beta$?

52 Yes, we can (frequently, at least)

53 Simpler framework: minimization of empirical risk ($p < n$), i.e. ML:
$(\hat\beta_0, \hat\beta_{ML}) = \arg\min_{\gamma_0, \gamma}\,\mathrm{err}(\gamma_0, \gamma)$.
Using $(\hat\beta_0^{ML}, \hat\beta^{ML})$ we estimate not $\beta_0$ and $\beta$ but $\beta_0^*$ and $\beta^*$ such that
$(\beta_0^*, \beta^*) = \mathrm{argmin}_{b_0, b \in \mathbb{R}^p}\,E_X\,\mathrm{KL}\big(q(\beta_0 + X^T\beta),\,p(b_0 + X^Tb)\big)$,
where $\mathrm{KL}(q, p) = q\log\!\left(\frac{q}{p}\right) + (1 - q)\log\!\left(\frac{1 - q}{1 - p}\right)$
is the Kullback-Leibler distance between two Bernoulli distributions with success probabilities $q$ and $p$.
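To make the projection $(\beta_0^*, \beta^*)$ concrete: since the $q$-entropy term in the KL distance does not depend on $(b_0, b)$, the projection can be approximated by minimizing the empirical logistic risk on a large simulated sample. A sketch (the probit choice of $q$ and all settings are mine, purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n, p = 100_000, 3
beta0, beta = 0.5, np.array([1.0, -2.0, 0.0])

X = rng.standard_normal((n, p))
q = norm.cdf                                   # a non-logistic true response (probit)
y = rng.binomial(1, q(beta0 + X @ beta))

def logistic_risk(par):
    """Empirical logistic risk; its minimizer approximates the KL projection."""
    b0, b = par[0], par[1:]
    s = b0 + X @ b
    return np.mean(np.maximum(s, 0) + np.log1p(np.exp(-np.abs(s))) - y * s)

opt = minimize(logistic_risk, np.zeros(p + 1), method="BFGS")
print(opt.x)   # slope part is roughly proportional to beta (here eta * beta, eta > 0)
```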

54 What can go wrong: $X_2 = (X_1 + \varepsilon)^2$, $P(Y = 1 \mid x) = q(x_1)$.
Figure: squares and triangles correspond to $Y = 1$ and $Y = 0$; the solid line shows the direction of $\hat\beta$.

55 Positive result: Ruud's theorem (1983)
Assume that the distribution of $X$ is nondegenerate and such that the regressions with respect to $\beta^TX$ are linear:
$E(X \mid \beta^TX) = u\,\beta^TX + u_0$. (R)
Then there exists $\eta$ such that $\beta^* = \eta\beta$.
Important: is $\eta \neq 0$?

56–57 Relevance for selection of predictors ($p < n$)
Order the variables according to their residual deviances $\mathrm{Dev}_{f\setminus\{i_1\}} \geq \mathrm{Dev}_{f\setminus\{i_2\}} \geq \ldots \geq \mathrm{Dev}_{f\setminus\{i_p\}}$ and minimize GIC in the corresponding nested family.
Then, if (R) is satisfied and $q$ is strictly monotone, $\hat T_{GIC}$ is consistent (P. Teisseyre, JM (2015)).
When $\eta > 1$ we are frequently better off misspecifying the model than fitting the correct one.
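A sketch of this deviance-based ordering with GIC selection, using scikit-learn's logistic regression as the fitting engine (an illustrative reimplementation of mine, not the authors' code; the BIC-type constant in the usage line is also my choice):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def deviance(X, y):
    """Residual deviance of a (practically unpenalized) logistic fit of y on X."""
    m = LogisticRegression(C=1e6, max_iter=2000).fit(X, y)       # large C ~ no penalty
    p_hat = np.clip(m.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    return -2 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def gic_logistic_select(X, y, penalty):
    """Order variables by the deviance of the model with that variable removed,
    then minimize GIC = deviance + penalty * |J| over the nested family."""
    n, p = X.shape
    dev_without = np.array([deviance(np.delete(X, i, axis=1), y) for i in range(p)])
    order = np.argsort(-dev_without)                  # most important variables first
    p0 = np.clip(y.mean(), 1e-12, 1 - 1e-12)          # null model as fallback
    best = (-2 * np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0)), [])
    for k in range(1, p + 1):
        J = list(order[:k])
        gic = deviance(X[:, J], y) + penalty * k
        if gic < best[0]:
            best = (gic, J)
    return best[1]

# usage with a BIC-type penalty (illustrative):
# T_hat = gic_logistic_select(X, y, penalty=np.log(len(y)))
```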

58 Figure: correct selection (CS) probability versus $\hat\eta$ for models M2, M3, M4, ...

59 Figure: PSR versus $\hat\eta$ for the same models.

60 Figure: FDR versus $\hat\eta$ for the same models.

61–62 For normal predictors we have (M. Kubkowski, JM 2016)
$\eta = \frac{E\,q'(\beta_0 + \beta^TX)}{E\,p'(\beta_0^* + \beta^{*T}X)} = \frac{E\,q'(\beta_0 + \beta^TX)}{E\,p'(\beta_0^* + \eta\beta^TX)}$
If $(Y, X)$ follows the logistic model and $\beta^*_{lin}$ is the projection onto a linear model, then $\beta^*_{lin} = E\,p'(\beta_0 + \beta^TX)\,\beta$, i.e. the direction $\beta/\|\beta\|$ of $\beta$ can be recovered by fitting a linear model.

63 Some relevant papers
P. Pokarowski and J. Mielniczuk, Combined $\ell_1$ and Greedy $\ell_0$ Penalized Least Squares for Linear Model Selection, Journal of Machine Learning Research, 2015.
J. Mielniczuk and P. Teisseyre, What do we choose when we err? Model selection and testing for misspecified logistic model, Studies in Computational Intelligence, 2015.
P. Ruud, Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models, Econometrica, 1983.
W. Su, M. Bogdan and E. Candès, False discoveries occur early on the Lasso path, preprint.
T. Hastie, R. Tibshirani and M. Wainwright, Statistical Learning with Sparsity, CRC Press, 2015.

64 Machine Learning or Statistics?

65 A few papers from JMLR
P. Bellec and A. Tsybakov, Sharp Oracle Bounds for Monotone and Convex Regression Through Aggregation, JMLR 2015.
J. Jin, C.-H. Zhang and Q. Zhang, Optimality of Graphlet Screening in High Dimensional Variable Selection, JMLR 2014.
X. Li, T. Zhao, Yuan and H. Liu, The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R, JMLR 2015.
P. Pokarowski and J. Mielniczuk, Combined $\ell_1$ and Greedy $\ell_0$ Penalized Least Squares for Linear Model Selection, JMLR 2015.
M. Tan, I. W. Tsang and L. Wang, Towards Ultrahigh Dimensional Feature Selection for Big Data, JMLR 2014.

66 A few papers from the Annals of Statistics
P. Sherwood and L. Wang, Partially linear additive quantile regression in ultra-high dimension, AS 2016.
R. Barber and E. Candès, Controlling the false discovery rate via knockoffs, AS 2015.
Y. Yang and S. Tokdar, Minimax-optimal nonparametric regression in high dimensions, AS 2015.
B. Jiang and J. S. Liu, Variable selection for general index models via sliced inverse regression, AS 2014.
J. Fan, L. Xue and H. Zou, Strong oracle optimality of folded concave penalized estimation, AS 2014.

67 Most cited statistical papers (Pokarowski, 2015); total citations in the Web of Science:
Kaplan-Meier ('58, JASA): 39.4; Cox model ('72, JRSS B): 29.2; FDR ('95, JRSS B): 18.7; EM ('77, JRSS B): 17.2; AIC ('74, IEEE Trans. Autom. Contr.): 15.7; BIC ('78, Ann. Stat.): 12.0; Random Forests ('01, Mach. Learn.): 7.6; SVM ('95, Mach. Learn.): 6.8; Lasso ('96, JRSS B): 5.x (value truncated in the transcription).

68 Computational considerations
The Lasso regularized-path solution requires $O(np\min(n, p))$ flops using LARS;
the selection step requires $O(ns^2)$ flops, $s = |\mathrm{supp}\,\hat\beta_L|$: use a QR decomposition. This follows since $\mathcal{J}$ is nested!
The screening step is the most expensive one in this and the other algorithms (to be presented).
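A sketch of why nestedness helps: with a single QR decomposition of the ordered design, the residual sums of squares of all nested models fall out at marginal extra cost (illustrative numpy code of mine):

```python
import numpy as np

def nested_sse(X_ordered, y):
    """SSE of all nested models {1}, {1,2}, ..., {1,...,s} from one QR decomposition.

    Since the columns enter in order, span(x_1..x_k) = span(q_1..q_k), so
    SSE_k = ||y||^2 - sum_{j<=k} (q_j^T y)^2.
    """
    Q, _ = np.linalg.qr(X_ordered, mode="reduced")
    contrib = (Q.T @ y) ** 2
    return np.sum(y ** 2) - np.cumsum(contrib)

# usage sketch: GIC over the nested family built from Lasso-ordered columns
# sse = nested_sse(X[:, order], y); gic = sse + lam0 ** 2 * np.arange(1, len(order) + 1)
```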
