Machine Learning. A. Supervised Learning A.1. Linear Regression. Lars Schmidt-Thieme


Machine Learning
A. Supervised Learning
A.1. Linear Regression
Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)
Institute for Computer Science, University of Hildesheim, Germany

Outline
1. Linear Regression via Normal Equations
2. Minimizing a Function via Gradient Descent
3. Learning Linear Regression Models via Gradient Descent
4. Case Weights

1. Linear Regression via Normal Equations

1. Linear Regression via Normal Equations / The Simple Linear Regression Problem
Given a set D^train := {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} ⊆ R × R called training data, compute the parameters (β̂_0, β̂_1) of a linear regression function

    ŷ(x) := β̂_0 + β̂_1 x

s.t. for a set D^test ⊆ R × R called test set the test error

    err(ŷ; D^test) := (1 / |D^test|) ∑_{(x,y) ∈ D^test} (y − ŷ(x))²

is minimal.
Note: D^test has (i) to be from the same data generating process and (ii) not to be available during training.
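
For illustration, a minimal Python sketch of the test error above (not from the slides; the parameter values and test pairs are made-up assumptions):

def test_error(y_hat, D_test):
    """Test error err(y_hat; D_test): mean squared error over the test pairs (x, y)."""
    return sum((y - y_hat(x)) ** 2 for x, y in D_test) / len(D_test)

beta0, beta1 = 0.5, 2.0                          # assumed parameters of a fitted model
y_hat = lambda x: beta0 + beta1 * x              # simple linear regression function
D_test = [(1.0, 2.4), (2.0, 4.6), (3.0, 6.4)]    # assumed test data
print(test_error(y_hat, D_test))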

1. Linear Regression via Normal Equations / The (Multiple) Linear Regression Problem
Given a set D^train := {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} ⊆ R^M × R called training data, compute the parameters (β̂_0, β̂_1, ..., β̂_M) of a linear regression function

    ŷ(x) := β̂_0 + β̂_1 x_1 + ... + β̂_M x_M

s.t. for a set D^test ⊆ R^M × R called test set the test error

    err(ŷ; D^test) := (1 / |D^test|) ∑_{(x,y) ∈ D^test} (y − ŷ(x))²

is minimal.
Note: D^test has (i) to be from the same data generating process and (ii) not to be available during training.

1. Linear Regression via Normal Equations / Several Predictors
Several predictor variables x_{·,1}, x_{·,2}, ..., x_{·,M}:

    ŷ = β̂_0 + β̂_1 x_{·,1} + β̂_2 x_{·,2} + ... + β̂_M x_{·,M}
      = β̂_0 + ∑_{m=1}^{M} β̂_m x_{·,m}

with M + 1 parameters β̂_0, β̂_1, ..., β̂_M.

1. Linear Regression via Normal Equations / Linear Form
Several predictor variables x_{·,1}, x_{·,2}, ..., x_{·,M}:

    ŷ = β̂_0 + ∑_{m=1}^{M} β̂_m x_{·,m} = ⟨β̂, x_·⟩

where

    β̂ := (β̂_0, β̂_1, ..., β̂_M)^T,    x_· := (1, x_{·,1}, ..., x_{·,M})^T.

Thus, the intercept is handled like any other parameter, for the artificial constant predictor x_{·,0} := 1.

1. Linear Regression via Normal Equations / Simultaneous Equations for the Whole Dataset
For the whole dataset D^train := {(x_1, y_1), ..., (x_N, y_N)}:

    ŷ := X β̂

where

    y := (y_1, ..., y_N)^T,    ŷ := (ŷ_1, ..., ŷ_N)^T,

    X := (x_1, ..., x_N)^T =
        ( x_{1,1}  x_{1,2}  ...  x_{1,M}
          ...
          x_{N,1}  x_{N,2}  ...  x_{N,M} )

1. Linear Regression via Normal Equations / Least Squares Estimates
Least squares estimates β̂ minimize

    RSS(β̂, D^train) := ∑_{n=1}^{N} (y_n − ŷ_n)² = ‖y − ŷ‖² = ‖y − X β̂‖².

The least squares estimates β̂ are computed via the normal equations

    X^T X β̂ = X^T y.

Proof:

    ‖y − X β̂‖² = ⟨y − X β̂, y − X β̂⟩

    ∂/∂β̂ ⟨y − X β̂, y − X β̂⟩ = −2 ⟨X, y − X β̂⟩ = −2 (X^T y − X^T X β̂)  =!  0

1. Linear Regression via Normal Equations / How to Compute Least Squares Estimates β̂
Solve the M × M system of linear equations

    X^T X β̂ = X^T y,

i.e., Ax = b (with A := X^T X, b := X^T y, x := β̂).
There are several numerical methods available:
1. Gaussian elimination
2. Cholesky decomposition
3. QR decomposition

1. Linear Regression via Normal Equations / Learn Linear Regression via Normal Equations
1: procedure LEARN-LINREG-NORMEQ(D^train := {(x_1, y_1), ..., (x_N, y_N)})
2:   X := (x_1, x_2, ..., x_N)^T
3:   y := (y_1, y_2, ..., y_N)^T
4:   A := X^T X
5:   b := X^T y
6:   β̂ := SOLVE-SLE(A, b)
7:   return β̂
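
A minimal NumPy sketch of this procedure (our own illustration, not lecture code): it assumes the rows of X already contain the constant predictor 1 and uses numpy.linalg.solve as the SOLVE-SLE step; the example data are made up.

import numpy as np

def learn_linreg_normeq(X, y):
    """Least squares estimates via the normal equations X^T X beta = X^T y."""
    A = X.T @ X                      # small dense system matrix
    b = X.T @ y
    return np.linalg.solve(A, b)     # solve-sle(A, b)

# usage on a tiny assumed example: prepend the constant predictor x_0 = 1
X_raw = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
y = np.array([3.0, 2.5, 6.0, 5.5])
beta = learn_linreg_normeq(X, y)
print(beta, X @ beta)                # parameters and fitted values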

1. Linear Regression via Normal Equations / Example
Given is the following data (columns x_1, x_2, y):

    x_1  x_2  y

Predict a y value for x_1 = 3, x_2 = 4.

1. Linear Regression via Normal Equations / Example: Simple Regression Models for Comparison
Fit two simple regression models separately:

    ŷ = β̂_0 + β̂_1 x_1        ŷ = β̂_0 + β̂_2 x_2

[Figure: data and fitted model plotted against x_1; data and fitted model plotted against x_2.]

    ŷ(x_1 = 3) = 3.25        ŷ(x_2 = 4) = ...

1. Linear Regression via Normal Equations / Example
Now fit to the data:

    ŷ = β̂_0 + β̂_1 x_1 + β̂_2 x_2

From the data table, assemble the design matrix X and the target vector y, and compute X^T X and X^T y.

1. Linear Regression via Normal Equations / Example
i.e., solving the normal equations yields

    β̂ = (1597/..., ..., ...)^T

1. Linear Regression via Normal Equations / Example
[Figure: 3-D plot of the fitted model over the (x_1, x_2) plane, with axes x_1, x_2, y.]

1. Linear Regression via Normal Equations / Example: Visualization of Model Fit
To visually assess the model fit, one can plot the residuals ε̂ := y − ŷ against the true values y.
[Figure: residuals y − ŷ plotted against y.]

2. Minimizing a Function via Gradient Descent

2. Minimizing a Function via Gradient Descent / Gradient Descent
Given a function f : R^N → R, find x with minimal f(x).
Idea: start from a random x_0 and then improve step by step, i.e., choose x_{i+1} with

    f(x_{i+1}) ≤ f(x_i).

Choose the negative gradient −∂f/∂x (x_i) as direction for descent, i.e.,

    x_{i+1} − x_i = −α_i ∂f/∂x (x_i)

with a suitable step length α_i > 0.
[Figure: a one-dimensional function f(x) with successive descent steps.]

2. Minimizing a Function via Gradient Descent / Gradient Descent
1: procedure MINIMIZE-GD-FSL(f : R^N → R, x_0 ∈ R^N, α ∈ R, i_max ∈ N, ε ∈ R^+)
2:   for i = 1, ..., i_max do
3:     x_i := x_{i−1} − α ∂f/∂x (x_{i−1})
4:     if f(x_{i−1}) − f(x_i) < ε then
5:       return x_i
6:   error "not converged in i_max iterations"

x_0: start value
α: (fixed) step length / learning rate
i_max: maximal number of iterations
ε: minimum stepwise improvement
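
A minimal Python sketch of minimize-gd-fsl (our own illustration; it assumes the gradient is supplied as a separate function grad_f):

import numpy as np

def minimize_gd_fsl(f, grad_f, x0, alpha, i_max, eps):
    """Gradient descent with a fixed step length alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(i_max):
        x_new = x - alpha * grad_f(x)
        if f(x) - f(x_new) < eps:      # minimum stepwise improvement reached
            return x_new
        x = x_new
    raise RuntimeError("not converged in i_max iterations")

# usage on the lecture's toy example f(x) = x^2 with alpha = 0.25
f = lambda x: float(x ** 2)
grad_f = lambda x: 2 * x
print(minimize_gd_fsl(f, grad_f, x0=2.0, alpha=0.25, i_max=100, eps=1e-8))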

2. Minimizing a Function via Gradient Descent / Example
Example: f(x) := x², ∂f/∂x (x) = 2x, x_0 := 2, α_i := 0.25.
Then we compute iteratively, using x_{i+1} = x_i − α_i ∂f/∂x (x_i):

    i    x_i     ∂f/∂x (x_i)    x_{i+1}
    0    2       4              1
    1    1       2              0.5
    2    0.5     1              0.25
    3    0.25    0.5            0.125

[Figure: f(x) = x² with the iterates x_0, x_1, x_2, x_3 marked.]

2. Minimizing a Function via Gradient Descent / Step Length
Why do we need a step length? Can we not simply set α_n ≡ 1?
The negative gradient gives a direction of descent only in an infinitesimal neighborhood of x_n. Thus, the step length may be too large, and the function value of the next point does not decrease.
[Figure: a step from x_0 to x_1 that overshoots the minimum, so f(x_1) > f(x_0).]

2. Minimizing a Function via Gradient Descent / Step Length
There are many different strategies to adapt the step length s.t.
1. the function value actually decreases and
2. the step length does not become too small (and thus convergence slow).

Armijo principle:

    α_n := max{ α ∈ {2^{−j} | j ∈ N_0} : f(x_n − α ∂f/∂x(x_n)) ≤ f(x_n) − α δ ⟨∂f/∂x(x_n), ∂f/∂x(x_n)⟩ }

with δ ∈ (0, 1).

2. Minimizing a Function via Gradient Descent / Armijo Step Length
1: procedure STEPLENGTH-ARMIJO(f : R^N → R, x ∈ R^N, d ∈ R^N, δ ∈ (0, 1))
2:   α := 1
3:   while f(x) − f(x + αd) < α δ d^T d do
4:     α := α / 2
5:   return α

x: last position
d: descent direction
δ: minimum steepness (δ ≈ 0: any step will do)
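
A direct Python translation of steplength-armijo (our own sketch; d is assumed to be the negative gradient, so d^T d equals the inner product of the gradient with itself, as in the Armijo principle above):

import numpy as np

def steplength_armijo(f, x, d, delta):
    """Halve alpha until f(x) - f(x + alpha*d) >= alpha * delta * d^T d."""
    alpha = 1.0
    while f(x) - f(x + alpha * d) < alpha * delta * np.dot(d, d):
        alpha /= 2.0
    return alpha

# usage on f(x) = x^2 at x = 2 with descent direction d = -grad f(2)
f = lambda x: np.sum(x ** 2)
x = np.array([2.0])
d = -2 * x
print(steplength_armijo(f, x, d, delta=0.5))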

2. Minimizing a Function via Gradient Descent / Gradient Descent
1: procedure MINIMIZE-GD(f : R^N → R, x_0 ∈ R^N, α, i_max ∈ N, ε ∈ R^+)
2:   for i = 1, ..., i_max do
3:     d := −∂f/∂x (x_{i−1})
4:     α_i := α(f, x_{i−1}, d)
5:     x_i := x_{i−1} + α_i d
6:     if f(x_{i−1}) − f(x_i) < ε then
7:       return x_i
8:   error "not converged in i_max iterations"

x_0: start value
α: step length function, e.g., STEPLENGTH-ARMIJO (with fixed δ)
i_max: maximal number of iterations
ε: minimum stepwise improvement

2. Minimizing a Function via Gradient Descent / Bold Driver Step Length [Bat89]
A variant of the Armijo principle with memory:
1: procedure STEPLENGTH-BOLDDRIVER(f : R^N → R, x ∈ R^N, d ∈ R^N, α_old, α^+, α^− ∈ (0, 1))
2:   α := α_old · α^+
3:   while f(x) − f(x + αd) ≤ 0 do
4:     α := α · α^−
5:   return α

α_old: last step length
α^+: step length increase factor, e.g., 1.1
α^−: step length decrease factor
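
A small Python sketch of the bold driver step length (our own illustration; the decrease factor 0.5 is an assumed example value). Unlike steplength-armijo it starts from the previous step length instead of 1, so successive calls adapt to the local scale of f.

import numpy as np

def steplength_bolddriver(f, x, d, alpha_old, alpha_inc=1.1, alpha_dec=0.5):
    """Increase the last step length, then shrink it until f actually decreases."""
    alpha = alpha_old * alpha_inc
    while f(x) - f(x + alpha * d) <= 0:
        alpha *= alpha_dec
    return alpha

# usage on f(x) = x^2 at x = 2 with descent direction d = -grad f(2)
f = lambda x: np.sum(x ** 2)
x = np.array([2.0])
d = -2 * x
print(steplength_bolddriver(f, x, d, alpha_old=1.0))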

2. Minimizing a Function via Gradient Descent / Simple Step Length Control in Machine Learning
The Armijo and bold driver step lengths evaluate the objective function (including the loss) several times and thus often are too costly and not used. They are useful for debugging, however, as they guarantee a decrease in f.
Constant step lengths α ∈ (0, 1) are frequently used:
  If chosen (too) small, the learning algorithm becomes slow, but usually still converges.
  The step length becomes a hyperparameter that has to be searched.
Regimes of shrinking step lengths are also used:

    α_i := α_{i−1} · γ,    γ ∈ (0, 1) not too far from 1

  If the initial step length α_0 is too large, later iterations will fix it.
  If γ is too small, GD may get stuck before convergence.
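
A minimal sketch of such a shrinking step length regime inside a gradient descent loop (our own illustration; the values of alpha0 and gamma are assumptions):

import numpy as np

def minimize_gd_decay(f, grad_f, x0, alpha0=0.5, gamma=0.99, i_max=1000, eps=1e-8):
    """Gradient descent where the step length shrinks by a factor gamma each iteration."""
    x, alpha = np.asarray(x0, dtype=float), alpha0
    for _ in range(i_max):
        x_new = x - alpha * grad_f(x)
        if f(x) - f(x_new) < eps:
            return x_new
        x, alpha = x_new, alpha * gamma    # alpha_i := alpha_{i-1} * gamma
    raise RuntimeError("not converged in i_max iterations")

# usage on f(x) = x^2
print(minimize_gd_decay(lambda x: float(np.sum(x ** 2)), lambda x: 2 * x, x0=[2.0]))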

2. Minimizing a Function via Gradient Descent / How Many Minima Can a Function Have?
In general, a function f can have several different minima. GD will find one of them, essentially at random (with small step lengths, usually one close to the starting point; GD is a local optimization method).

2. Minimizing a Function via Gradient Descent / Convexity
A function f : R^N → R is called convex if

    f(t x_1 + (1 − t) x_2) ≤ t f(x_1) + (1 − t) f(x_2)    for all x_1, x_2 ∈ R^N, t ∈ [0, 1].

A twice differentiable function is convex if its Hessian is positive semidefinite, i.e.,

    x^T ( ∂²f / (∂x_i ∂x_j) )_{i=1,...,N, j=1,...,N} x ≥ 0    for all x ∈ R^N.

For any matrix A ∈ R^{N×M}, the matrix A^T A is positive semidefinite.
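
The last statement follows from a one-line argument: for any x,

    x^T (A^T A) x = (Ax)^T (Ax) = ‖Ax‖² ≥ 0,

so in particular the Hessian 2 X^T X of the residual sum of squares is positive semidefinite, and the RSS is convex.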

3. Learning Linear Regression Models via Gradient Descent

3. Learning Linear Regression Models via Gradient Descent / Sparse Predictors
Many problems have predictors x ∈ R^M that are
  high-dimensional: M is large, and
  sparse: most x_m are zero.
Example: text regression.
  task: predict the rating of a customer review.
  predictors: a text about a product, i.e., a sequence of words;
    can be represented as a vector via bag of words: x_m encodes the frequency of word m in the given text;
    dimensions 30,000-60,000 for English texts;
    in short texts such as reviews with a couple of hundred words, at most a couple of hundred dimensions are non-zero.
  target: the customer's rating of the product, a (real) number.
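
A tiny sketch of the bag-of-words representation described above (our own illustration; the vocabulary and review text are made up):

from collections import Counter

vocabulary = ["great", "battery", "poor", "screen", "fast"]   # assumed tiny vocabulary
review = "great screen great battery"                         # assumed review text

counts = Counter(review.split())
x = [counts.get(word, 0) for word in vocabulary]   # sparse: most entries are 0
print(x)   # [2, 1, 0, 1, 0]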

3. Learning Linear Regression Models via Gradient Descent / Sparse Predictors, Dense Normal Equations
Recall, the normal equations have dimensions M × M:

    X^T X β̂ = X^T y.

Even if X is sparse, X^T X will generally be rather dense:

    (X^T X)_{m,l} = X_{·,m}^T X_{·,l}.

For text regression, (X^T X)_{m,l} will be non-zero for every pair of words m, l that co-occurs in any text.
Even worse, even if A := X^T X is sparse, standard methods to solve linear systems (such as Gaussian elimination, LU decomposition etc.) do not take advantage of sparsity.

3. Learning Linear Regression Models via Gradient Descent / Learn Linear Regression via Loss Minimization
Alternatively to learning a linear regression model via solving the linear normal equation system, one can minimize the loss directly:

    f(β̂) := (y − X β̂)^T (y − X β̂) = β̂^T X^T X β̂ − 2 y^T X β̂ + y^T y

    ∂f/∂β̂ (β̂) = −2 (X^T y − X^T X β̂) = −2 X^T (y − X β̂)

When computing f and ∂f/∂β̂:
  avoid computing the (dense) X^T X,
  always compute the (sparse) X times a (dense) vector.
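
A small scipy.sparse sketch of this idea (our own illustration, not lecture code): the loss and gradient are computed with two sparse matrix-vector products, and the dense matrix X^T X is never formed. The example design matrix and targets are assumptions.

import numpy as np
import scipy.sparse as sp

def loss_and_gradient(X, y, beta):
    """f(beta) = ||y - X beta||^2 and its gradient, using only sparse mat-vec products."""
    residual = y - X @ beta            # X @ beta: sparse matrix times dense vector
    f = float(residual @ residual)
    grad = -2.0 * (X.T @ residual)     # again a sparse mat-vec product, no X^T X
    return f, grad

# usage with an assumed tiny sparse design matrix
X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]))
y = np.array([1.0, 2.0, 3.0])
beta = np.zeros(3)
print(loss_and_gradient(X, y, beta))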

3. Learning Linear Regression Models via Gradient Descent / Objective Function and Gradient as Sums over Instances

    f(β̂) := (y − X β̂)^T (y − X β̂) = ∑_{n=1}^{N} (y_n − x_n^T β̂)² = ∑_{n=1}^{N} ε_n²

    ∂f/∂β̂ (β̂) = −2 X^T (y − X β̂) = −2 ∑_{n=1}^{N} (y_n − x_n^T β̂) x_n = −2 ∑_{n=1}^{N} ε_n x_n

with ε_n := y_n − x_n^T β̂.

3. Learning Linear Regression Models via Gradient Descent / Learn Linear Regression via Loss Minimization: GD
1: procedure LEARN-LINREG-GD(D^train := {(x_1, y_1), ..., (x_N, y_N)}, α, i_max ∈ N, ε ∈ R^+)
2:   X := (x_1, x_2, ..., x_N)^T
3:   y := (y_1, y_2, ..., y_N)^T
4:   β̂^(0) := (0, ..., 0)
5:   β̂ := MINIMIZE-GD(f(β̂) := (y − X β̂)^T (y − X β̂), β̂^(0), α, i_max, ε)
6:   return β̂
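
Putting the pieces together, a minimal NumPy sketch of learn-linreg-GD with a fixed step length (our own illustration; the values of alpha, i_max, eps and the example data are assumptions):

import numpy as np

def learn_linreg_gd(X, y, alpha=0.01, i_max=10000, eps=1e-10):
    """Fit beta by gradient descent on f(beta) = ||y - X beta||^2."""
    beta = np.zeros(X.shape[1])
    f = lambda b: float(np.sum((y - X @ b) ** 2))
    for _ in range(i_max):
        grad = -2.0 * (X.T @ (y - X @ beta))   # gradient via mat-vec products only
        beta_new = beta - alpha * grad
        if f(beta) - f(beta_new) < eps:
            return beta_new
        beta = beta_new
    raise RuntimeError("not converged in i_max iterations")

# usage: same tiny assumed example as for the normal equations, constant predictor included
X = np.array([[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0], [1.0, 4.0, 3.0]])
y = np.array([3.0, 2.5, 6.0, 5.5])
print(learn_linreg_gd(X, y))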

4. Case Weights

4. Case Weights / Cases of Different Importance
Sometimes different cases are of different importance, e.g., if their measurements are of different accuracy or reliability.
Example: assume the leftmost point is known to be measured with lower reliability. Then the model does not need to fit this point as well as it needs to fit the other points, i.e., the residuals of this point should get a lower weight than the others.
[Figure: data with a fitted model; one low-reliability point on the left.]

4. Case Weights / Case Weights
In such situations, each case (x_n, y_n) is assigned a case weight w_n ≥ 0:
  the higher the weight, the more important the case;
  cases with weight 0 should be treated as if they had been discarded from the data set.
Case weights can be managed as an additional pseudo-variable w in implementations.

4. Case Weights / Weighted Least Squares Estimates
Formally, one tries to minimize the weighted residual sum of squares

    ∑_{n=1}^{N} w_n (y_n − ŷ_n)² = ‖ W^{1/2} (y − ŷ) ‖²

with

    W := diag(w_1, w_2, ..., w_N).

The same argument as for the unweighted case results in the weighted least squares estimates

    X^T W X β̂ = X^T W y.
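
A minimal NumPy sketch of the weighted normal equations (our own illustration; the data and weights are made up, with the first case downweighted):

import numpy as np

def learn_linreg_weighted(X, y, w):
    """Weighted least squares: solve X^T W X beta = X^T W y with W = diag(w)."""
    XtW = X.T * w                       # scales column n of X^T by w_n, i.e., X^T W
    A = XtW @ X
    b = XtW @ y
    return np.linalg.solve(A, b)

# usage with an assumed tiny data set; the first case gets weight 0.1
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([5.0, 2.0, 3.0, 4.0])
w = np.array([0.1, 1.0, 1.0, 1.0])
print(learn_linreg_weighted(X, y, w))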

4. Case Weights / Weighted Least Squares Estimates: Example
To downweight the leftmost point, we assign case weights as follows (columns w, x, y):

    w  x  y

[Figure: data with the model fitted with weights and the model fitted without weights.]

4. Case Weights / Summary
  For regression, linear models of type ŷ = x^T β̂ can be used to predict a quantitative y based on several (quantitative) predictors x.
  A bias term can be modeled as an additional predictor that is constant 1.
  The ordinary least squares (OLS) estimates are the parameters with minimal residual sum of squares (RSS).
  OLS estimates can be computed by solving the normal equations X^T X β̂ = X^T y like any system of linear equations, e.g., via Gaussian elimination.
  Alternatively, OLS estimates can be computed iteratively via gradient descent.
  Especially for high-dimensional, sparse predictors GD is advantageous, as it never has to compute the large, dense X^T X.
  Case weights can be handled seamlessly by both methods to model different importance of cases.

Further Readings
[JWHT13, chapter 3], [Mur12, chapter 7], [HTFF05, chapter 3].

References
[Bat89] Roberto Battiti. Accelerated backpropagation learning: Two optimization methods. Complex Systems, 3(4), 1989.
[HTFF05] Trevor Hastie, Robert Tibshirani, Jerome Friedman, and James Franklin. The elements of statistical learning: data mining, inference and prediction. 2005.
[JWHT13] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning. Springer, 2013.
[Mur12] Kevin P. Murphy. Machine learning: a probabilistic perspective. The MIT Press, 2012.
