COMS 4771 Lecture: 1. Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso


Fixed-design linear regression

A simplified fixed-design setting: $x^{(1)}, x^{(2)}, \dots, x^{(n)} \in \mathbb{R}^p$ are assumed to be fixed, i.e., not random; $y^{(1)}, y^{(2)}, \dots, y^{(n)}$ are independent random variables, with
$\mathbb{E}(y^{(i)}) = \langle x^{(i)}, w_* \rangle, \quad \mathrm{var}(y^{(i)}) = \sigma^2.$

In matrix form, $y = X w_* + \varepsilon$, where $y \in \mathbb{R}^n$ has entries $y^{(1)}, \dots, y^{(n)}$; $X \in \mathbb{R}^{n \times p}$ has rows $x^{(1)}, \dots, x^{(n)}$; $w_* \in \mathbb{R}^p$ has entries $w_{*,1}, \dots, w_{*,p}$; and $\varepsilon \in \mathbb{R}^n$ has entries $\varepsilon^{(1)}, \dots, \varepsilon^{(n)}$, which are independent with $\mathbb{E}(\varepsilon^{(i)}) = 0$ and $\mathrm{var}(\varepsilon^{(i)}) = \sigma^2$.

Want to find $\hat w \in \mathbb{R}^p$ based on $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(n)}, y^{(n)})$ so that
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w \|_2^2 \right]$
is small (e.g., $o(1)$ as a function of $n$).
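To make the setting concrete, here is a minimal numpy sketch of the fixed-design model; the dimensions n, p and the noise level sigma are hypothetical choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and noise level (not from the lecture).
n, p, sigma = 200, 10, 1.0

X = rng.normal(size=(n, p))        # design matrix, treated as fixed (not random)
w_star = rng.normal(size=p)        # true parameter w_*
eps = sigma * rng.normal(size=n)   # independent noise with mean 0, variance sigma^2
y = X @ w_star + eps               # observed responses

def in_sample_risk(w_hat):
    """(1/n) * ||X w_* - X w_hat||_2^2, the quantity we want to be small."""
    return np.mean((X @ w_star - X @ w_hat) ** 2)
```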

Ordinary least squares

Recall: assuming $X^\top X$ is invertible,
$\hat w_{\mathrm{ols}} := (X^\top X)^{-1} X^\top y.$

Expressing $X \hat w_{\mathrm{ols}}$ in terms of $X w_*$:
$X \hat w_{\mathrm{ols}} = X (X^\top X)^{-1} X^\top (X w_* + \varepsilon) = X w_* + X (X^\top X)^{-1} X^\top \varepsilon = X w_* + \Pi \varepsilon,$
where $\Pi \in \mathbb{R}^{n \times n}$ is the orthogonal projection operator for $\mathrm{ran}(X)$.

Consequently,
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w_{\mathrm{ols}} \|_2^2 \right] = \frac{\sigma^2 p}{n}.$
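As a sanity check on the $\sigma^2 p / n$ rate, the following sketch (continuing the numpy example above) averages the in-sample risk of OLS over fresh noise draws.

```python
# OLS estimate: solve the normal equations (X^T X) w = X^T y.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("in-sample risk (one draw):", in_sample_risk(w_ols))

# Average over fresh noise draws to approximate the expectation,
# which the theory says equals sigma^2 * p / n.
risks = []
for _ in range(500):
    y_rep = X @ w_star + sigma * rng.normal(size=n)
    w_rep = np.linalg.solve(X.T @ X, X.T @ y_rep)
    risks.append(np.mean((X @ w_star - X @ w_rep) ** 2))
print("Monte Carlo risk:", np.mean(risks), " theory:", sigma**2 * p / n)
```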

Ridge regression and principal components regression

Ridge regression

Ordinary least squares only applies if $X^\top X$ is invertible, which is not possible if $p > n$.

Ridge regression (Hoerl, 1962): for $\lambda > 0$,
$\hat w_\lambda := \arg\min_{w \in \mathbb{R}^p} \ \tfrac{1}{n} \| y - X w \|_2^2 + \lambda \| w \|_2^2.$

The gradient of the objective is zero when
$(X^\top X + n \lambda I) w = X^\top y,$
which always has a unique solution (since $\lambda > 0$):
$\hat w_\lambda = (X^\top X + n \lambda I)^{-1} X^\top y = \left( \tfrac{1}{n} X^\top X + \lambda I \right)^{-1} \left( \tfrac{1}{n} X^\top y \right).$
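Continuing the numpy sketch, the closed-form ridge solution is a single linear solve; the value of lam below is a hypothetical choice of regularization level.

```python
# Ridge regression: solve (X^T X + n*lambda*I) w = X^T y for a hypothetical lambda.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
print("in-sample risk of ridge:", in_sample_risk(w_ridge))
```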

Ridge regression: geometry

The ridge regression objective (as a function of $w$) can be written as
$\tfrac{1}{n} \| X (w - \hat w_{\mathrm{ols}}) \|_2^2 + \lambda \| w \|_2^2 + (\text{stuff not depending on } w).$

[Figure: in the $(w_1, w_2)$ plane, level sets of $\tfrac{1}{n} \| X (w - \hat w_{\mathrm{ols}}) \|_2^2$ (centered at $\hat w_{\mathrm{ols}}$) and level sets of $\lambda \| w \|_2^2$ (centered at $0$).]

Ridge regression: eigendecomposition

Write the eigendecomposition of $\tfrac{1}{n} X^\top X$ as
$\tfrac{1}{n} X^\top X = \sum_{j=1}^p \lambda_j v_j v_j^\top,$
where $v_1, v_2, \dots, v_p \in \mathbb{R}^p$ are orthonormal eigenvectors with corresponding eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$.

The eigenvectors $v_1, v_2, \dots, v_p$ comprise an orthonormal basis for $\mathbb{R}^p$. We'll look at $\hat w_{\mathrm{ols}}$ and $w_*$ in this basis: e.g.,
$\hat w_{\mathrm{ols}} = \sum_{j=1}^p \langle v_j, \hat w_{\mathrm{ols}} \rangle v_j.$

The inverse of $\tfrac{1}{n} X^\top X + \lambda I$ has the form
$\left( \tfrac{1}{n} X^\top X + \lambda I \right)^{-1} = \sum_{j=1}^p \frac{1}{\lambda_j + \lambda} v_j v_j^\top.$
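The spectral form of the inverse is easy to check numerically; this continues the numpy sketch and uses numpy's eigh, which returns eigenvalues in ascending order (the ordering does not affect the sums).

```python
# Eigendecomposition of (1/n) X^T X; columns of V are the eigenvectors v_j.
evals, V = np.linalg.eigh(X.T @ X / n)

# Check: ((1/n) X^T X + lam*I)^{-1} = sum_j v_j v_j^T / (lambda_j + lam).
inv_direct = np.linalg.inv(X.T @ X / n + lam * np.eye(p))
inv_spectral = (V / (evals + lam)) @ V.T
print(np.allclose(inv_direct, inv_spectral))   # True
```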

Ridge regression vs. ordinary least squares

If $\hat w_{\mathrm{ols}}$ exists, then
$\hat w_\lambda = \left( \tfrac{1}{n} X^\top X + \lambda I \right)^{-1} \left( \tfrac{1}{n} X^\top X \right) \hat w_{\mathrm{ols}}$
$\quad = \left( \sum_{j=1}^p \frac{1}{\lambda_j + \lambda} v_j v_j^\top \right) \left( \sum_{j=1}^p \lambda_j v_j v_j^\top \right) \hat w_{\mathrm{ols}}$
$\quad = \left( \sum_{j=1}^p \frac{\lambda_j}{\lambda_j + \lambda} v_j v_j^\top \right) \hat w_{\mathrm{ols}}$  (by orthogonality)
$\quad = \sum_{j=1}^p \frac{\lambda_j}{\lambda_j + \lambda} \langle v_j, \hat w_{\mathrm{ols}} \rangle v_j.$

Interpretation: shrink $\hat w_{\mathrm{ols}}$ towards zero by a factor of $\frac{\lambda_j}{\lambda_j + \lambda}$ in direction $v_j$.
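The shrinkage interpretation can be verified directly in the numpy sketch: transform the OLS solution into the eigenbasis, scale each coordinate by lambda_j / (lambda_j + lambda), and transform back.

```python
# Ridge as coordinate-wise shrinkage of OLS in the eigenbasis of (1/n) X^T X.
shrink = evals / (evals + lam)                 # shrinkage factor per direction
coords_ols = V.T @ w_ols                       # <v_j, w_ols> for each j
w_ridge_shrunk = V @ (shrink * coords_ols)     # sum_j shrink_j <v_j, w_ols> v_j
print(np.allclose(w_ridge_shrunk, w_ridge))    # True
```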

Coefficient profile

[Figure. Horizontal axis: varying $\lambda$ (large $\lambda$ to the left, small $\lambda$ to the right). Vertical axis: coefficient value in $\hat w_\lambda$ for eight different variables.]

Ridge regression: theory

Theorem:
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w_\lambda \|_2^2 \right] = \underbrace{\sum_{j=1}^p \frac{\lambda_j \lambda^2}{(\lambda_j + \lambda)^2} \langle v_j, w_* \rangle^2}_{\text{bias}} + \underbrace{\frac{\sigma^2}{n} \sum_{j=1}^p \left( \frac{\lambda_j}{\lambda_j + \lambda} \right)^2}_{\text{variance}}.$

Corollary: since $\lambda_j \lambda / (\lambda_j + \lambda)^2 \le 1/2$ and $\sum_{j=1}^p \lambda_j = \mathrm{tr}\left( \tfrac{1}{n} X^\top X \right)$,
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w_\lambda \|_2^2 \right] \le \frac{\lambda \| w_* \|_2^2}{2} + \frac{\sigma^2 \, \mathrm{tr}\left( \tfrac{1}{n} X^\top X \right)}{2 n \lambda}.$

For instance, using $\lambda := \sqrt{ \mathrm{tr}\left( \tfrac{1}{n} X^\top X \right) / n }$ guarantees
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w_\lambda \|_2^2 \right] \le \frac{\| w_* \|_2^2 + \sigma^2}{2} \sqrt{ \frac{ \mathrm{tr}\left( \tfrac{1}{n} X^\top X \right) }{n} }.$

No explicit dependence on $p$: the corollary can be meaningful even if $p = \infty$, as long as $\| w_* \|_2^2$ and $\mathrm{tr}\left( \tfrac{1}{n} X^\top X \right)$ are finite.
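In the numpy sketch, the bias and variance terms of the theorem can be computed directly from the eigendecomposition; the resulting number approximates the expected in-sample risk of ridge (and can be checked against a Monte Carlo average over noise draws, as was done for OLS above).

```python
# Bias-variance decomposition of the ridge risk, following the theorem above.
coords_star = V.T @ w_star                                        # <v_j, w_*>
bias = np.sum(evals * lam**2 / (evals + lam)**2 * coords_star**2)
variance = sigma**2 / n * np.sum((evals / (evals + lam)) ** 2)
print("predicted ridge risk:", bias + variance)
```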

Principal components ("keep or kill") regression

Ridge regression as shrinkage:
$\hat w_\lambda = \sum_{j=1}^p \frac{\lambda_j}{\lambda_j + \lambda} \langle v_j, \hat w_{\mathrm{ols}} \rangle v_j.$

Another approach: principal components regression. Instead of shrinking $\hat w_{\mathrm{ols}}$ in all directions, either keep $\hat w_{\mathrm{ols}}$ in direction $v_j$ (if $\lambda_j \ge \lambda$), or kill $\hat w_{\mathrm{ols}}$ in direction $v_j$ (if $\lambda_j < \lambda$):
$\hat w^{\mathrm{pc}}_\lambda := \sum_{j=1}^p \mathbf{1}\{\lambda_j \ge \lambda\} \, \langle v_j, \hat w_{\mathrm{ols}} \rangle v_j.$

[Figure: shrinkage factor as a function of $\lambda_j$, for ridge and for principal components regression.]
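Continuing the sketch, principal components regression replaces the smooth shrinkage factor with a hard 0/1 keep-or-kill rule at the cut-off lambda.

```python
# Principal components regression: keep <v_j, w_ols> if lambda_j >= lam, else zero it.
keep = (evals >= lam).astype(float)
w_pcr = V @ (keep * coords_ols)
print("in-sample risk of PCR:", in_sample_risk(w_pcr))
```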

Principal components regression: theory

Theorem:
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w^{\mathrm{pc}}_\lambda \|_2^2 \right] = \underbrace{\sum_{j=1}^p \mathbf{1}\{\lambda_j < \lambda\} \, \lambda_j \langle v_j, w_* \rangle^2}_{\text{bias}} + \underbrace{\frac{\sigma^2}{n} \sum_{j=1}^p \mathbf{1}\{\lambda_j \ge \lambda\}}_{\text{variance}} \;\le\; 4 \, \mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w_\lambda \|_2^2 \right].$

Should pick $\lambda$ large enough so that $\tfrac{1}{n} \sum_{j=1}^p \mathbf{1}\{\lambda_j \ge \lambda\}$ is small; the count $\sum_{j=1}^p \mathbf{1}\{\lambda_j \ge \lambda\}$ is the effective dimension at the cut-off point $\lambda$.

Never (much) worse than ridge regression; often substantially better.
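The bias and variance terms for principal components regression are just as easy to evaluate in the numpy sketch (reusing coords_star from the ridge computation above).

```python
# Bias-variance decomposition of the PCR risk, following the theorem above.
bias_pcr = np.sum((evals < lam) * evals * coords_star**2)
variance_pcr = sigma**2 / n * np.sum(evals >= lam)
print("predicted PCR risk:", bias_pcr + variance_pcr)
```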

Sparse regression and Lasso

Sparsity

Another way to deal with $p > n$ is to only consider sparse $w$, i.e., $w$ with only a small number ($\ll p$) of non-zero entries.

Other advantages of sparsity (especially relative to ridge / principal components regression): sparse solutions are more interpretable, and it can be more efficient to evaluate $\langle x, w \rangle$ (both in terms of computing variable values and computing the inner product).

Sparse regression methods

For any $T \subseteq \{1, 2, \dots, p\}$, let $\hat w(T) :=$ OLS using only the variables in $T$.

Subset selection (brute-force strategy): pick the $T \subseteq \{1, 2, \dots, p\}$ of size $|T| = k$ for which $\| y - X \hat w(T) \|_2^2$ is minimal, and return $\hat w(T)$.

Gives you exactly what you want (for a given value of $k$), but is only feasible for very small $k$, since the complexity scales with $\binom{p}{k}$. (NP-hard optimization problem.)
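Here is a brute-force subset-selection sketch in the running numpy example; the sparsity level k and the ols_on_subset helper are hypothetical names introduced for illustration.

```python
from itertools import combinations

def ols_on_subset(T):
    """OLS using only the columns in T; all other coefficients are zero."""
    w = np.zeros(p)
    cols = list(T)
    w[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
    return w

k = 3  # hypothetical sparsity level
best_T = min(combinations(range(p), k),
             key=lambda T: np.sum((y - X @ ols_on_subset(T)) ** 2))
w_subset = ols_on_subset(best_T)   # feasible only because p and k are tiny here
```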

Sparse regression methods

Forward stepwise regression (greedy strategy): starting with $T = \emptyset$, repeat until $|T| = k$: pick the $j \in \{1, 2, \dots, p\} \setminus T$ for which $\| y - X \hat w(T \cup \{j\}) \|_2^2$ is minimal, and add this $j$ to $T$. Return $\hat w(T)$.

Gives you a $k$-sparse solution, with no shrinkage within $T$. Primarily effective only when the columns of $X$ are close to orthogonal.
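The greedy strategy is a small loop; this sketch reuses the ols_on_subset helper and k from the subset-selection example above.

```python
# Forward stepwise regression: greedily add the variable that most reduces
# the residual sum of squares, until |T| = k.
T = []
for _ in range(k):
    remaining = [j for j in range(p) if j not in T]
    j_best = min(remaining,
                 key=lambda j: np.sum((y - X @ ols_on_subset(T + [j])) ** 2))
    T.append(j_best)
w_stepwise = ols_on_subset(T)
```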

Lasso (Tibshirani, 1994)

Lasso: least absolute shrinkage and selection operator.
$\hat w^{\mathrm{lasso}}_\lambda := \arg\min_{w \in \mathbb{R}^p} \ \tfrac{1}{n} \| y - X w \|_2^2 + \lambda \| w \|_1.$
(Convex, though not differentiable.)

The objective (as a function of $w$) can be written as
$\tfrac{1}{n} \| X (w - \hat w_{\mathrm{ols}}) \|_2^2 + \lambda \| w \|_1 + (\text{stuff not depending on } w).$

[Figure: in the $(w_1, w_2)$ plane, level sets of $\tfrac{1}{n} \| X (w - \hat w_{\mathrm{ols}}) \|_2^2$ (centered at $\hat w_{\mathrm{ols}}$) and level sets of $\lambda \| w \|_1$ (centered at $0$).]
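The lecture does not prescribe a solver; as one illustration, the following sketch minimizes the lasso objective with plain proximal gradient descent (ISTA) in the running numpy example, with a hypothetical regularization level lam_l1.

```python
def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam_l1, n_iter=2000):
    """Minimize (1/n)||y - Xw||_2^2 + lam_l1 * ||w||_1 by proximal gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    step = n / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)     # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam_l1)
    return w

w_lasso = lasso_ista(X, y, lam_l1=0.1)
print("non-zero coefficients:", np.count_nonzero(w_lasso))
```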

Coefficient profile

[Figure. Horizontal axis: varying $\lambda$ (large $\lambda$ to the left, small $\lambda$ to the right). Vertical axis: coefficient value in $\hat w_\lambda$ for eight different variables.]

Lasso: theory

Many results, mostly roughly of the following flavor. Suppose $w_*$ has $k$ non-zero entries; $X$ satisfies some special properties (typically not efficiently checkable); and $\lambda \approx \sigma \sqrt{2 \log(p) / n}$. Then
$\mathbb{E}\left[ \tfrac{1}{n} \| X w_* - X \hat w^{\mathrm{lasso}}_\lambda \|_2^2 \right] \le O\!\left( \frac{\sigma^2 k \log(p)}{n} \right).$

Very active subject of research; closely related to "compressed sensing"; intersects with the beautiful subject of high-dimensional convex geometry.

Recap

Fixed-design setting for studying linear regression methods.
Ridge and principal components regression make use of the eigenvectors of $\tfrac{1}{n} X^\top X$ ("principal component directions"), and also the corresponding eigenvalues.
Ridge regression: shrink $\hat w_{\mathrm{ols}}$ along principal component directions by an amount related to the eigenvalue and $\lambda$.
Principal components regression: keep-or-kill $\hat w_{\mathrm{ols}}$ along principal component directions, based on comparing the eigenvalue to $\lambda$.
Sparse regression: intractable in general, but some greedy strategies work.
Lasso: shrink coefficients towards zero in a way that tends to lead to sparse solutions.
