STK-IN4300 Statistical Learning Methods in Data Science


Outline of the lecture (STK-IN4300 Statistical Learning Methods in Data Science, Riccardo De Bin, lecture 3):

Model Assessment and Selection
- Cross-Validation
- Bootstrap Methods

Methods using Derived Input Directions
- Principal Component Regression
- Partial Least Squares

Shrinkage Methods
- Ridge Regression

Cross-Validation: k-fold cross-validation

Cross-validation aims at estimating the expected test error, $\mathrm{Err} = \mathrm{E}[L(Y, \hat{f}(X))]$:
- with enough data, we could split them into a training and a test set;
- since this is usually not the case, we mimic this split using the limited amount of data we have:
  - split the data into K folds $F_1, \dots, F_K$ of approximately the same size;
  - use, in turn, K - 1 folds to train the model (derive $\hat{f}^{-k}(x)$);
  - evaluate the model on the remaining fold,
    $$CV(\hat{f}^{-k}) = \frac{1}{|F_k|} \sum_{i \in F_k} L(y_i, \hat{f}^{-k}(x_i));$$
  - estimate the expected test error as an average,
    $$CV(\hat{f}) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|F_k|} \sum_{i \in F_k} L(y_i, \hat{f}^{-k}(x_i)).$$

Cross-Validation: k-fold cross-validation
[Figure: illustration of the k-fold cross-validation split; source reference cut off in the transcription.]
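A minimal R sketch of the k-fold procedure above (my own illustration, not the course's R file), assuming squared-error loss and a simple linear model fitted with lm():

```r
## k-fold cross-validation sketch (illustrative, not the course's R file).
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% rep(1, p) + rnorm(n)
dat <- data.frame(y = y, X)

K <- 5
folds <- sample(rep(1:K, length.out = n))            # assign each observation to a fold

cv_errors <- sapply(1:K, function(k) {
  train <- dat[folds != k, ]                         # K - 1 folds for training
  test  <- dat[folds == k, ]                         # remaining fold for evaluation
  fit   <- lm(y ~ ., data = train)                   # \hat{f}^{-k}
  mean((test$y - predict(fit, newdata = test))^2)    # CV(\hat{f}^{-k})
})
mean(cv_errors)                                      # CV(\hat{f}): estimate of the expected test error
```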

Cross-Validation: choice of K

How to choose K?
- there is no clear solution;
- bias-variance trade-off:
  - the smaller the K, the smaller the variance (but the larger the bias);
  - the larger the K, the smaller the bias (but the larger the variance);
- extreme cases:
  - K = 2: half of the observations for training, half for testing;
  - K = N: leave-one-out cross-validation (LOOCV);
- LOOCV estimates the expected test error approximately unbiasedly;
- LOOCV has very large variance (the training sets are very similar to one another);
- usual choices are K = 5 and K = 10.

Cross-Validation: further aspects

If we want to select a tuning parameter $\alpha$ (e.g., the number of neighbours):
- train $\hat{f}^{-k}(x, \alpha)$ for each $\alpha$;
- compute $CV(\hat{f}, \alpha) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|F_k|} \sum_{i \in F_k} L(y_i, \hat{f}^{-k}(x_i, \alpha))$;
- obtain $\hat{\alpha} = \mathrm{argmin}_{\alpha}\, CV(\hat{f}, \alpha)$.

The generalized cross-validation (GCV),
$$GCV(\hat{f}) = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{y_i - \hat{f}(x_i)}{1 - \mathrm{trace}(S)/N} \right]^2,$$
- is a convenient approximation of LOOCV for linear fitting under squared loss;
- has computational advantages.

Cross-Validation: the wrong and the right way to do cross-validation

Consider the following procedure:
1. find a subset of good (= most correlated with the outcome) predictors;
2. use the selected predictors to build a classifier;
3. use cross-validation to compute the prediction error.

Practical example (see R file):
- generate X, an [N = 50] x [p = 5000] data matrix;
- generate independently $y_i$, i = 1, ..., 50, $y_i \in \{0, 1\}$;
- the true test error is 0.50;
- implement the procedure above. What happens? (A sketch of this simulation is given below.)
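A minimal sketch of the simulation described above (my own illustration, not the course's R file); it assumes a 1-nearest-neighbour classifier, via class::knn(), built on the 20 predictors most correlated with the outcome, just to make the selection bias visible:

```r
## Wrong vs. right cross-validation (illustrative sketch, not the course's R file).
library(class)                               # for knn()
set.seed(1)
n <- 50; p <- 5000
X <- matrix(rnorm(n * p), n, p)
y <- factor(rep(0:1, length.out = n))        # labels independent of X: true error = 0.5
K <- 5
folds <- sample(rep(1:K, length.out = n))

## WRONG: select predictors on the full data, then cross-validate only the classifier
sel_wrong <- order(-abs(cor(X, as.numeric(y))))[1:20]
err_wrong <- mean(sapply(1:K, function(k) {
  pred <- knn(X[folds != k, sel_wrong], X[folds == k, sel_wrong],
              cl = y[folds != k], k = 1)
  mean(pred != y[folds == k])
}))

## RIGHT: redo the variable selection inside each training fold
err_right <- mean(sapply(1:K, function(k) {
  sel <- order(-abs(cor(X[folds != k, ], as.numeric(y[folds != k]))))[1:20]
  pred <- knn(X[folds != k, sel], X[folds == k, sel],
              cl = y[folds != k], k = 1)
  mean(pred != y[folds == k])
}))
c(wrong = err_wrong, right = err_right)      # the "wrong" error is far below the true 0.5
```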

Cross-Validation: the wrong and the right way to do cross-validation

Why is it not correct?
- training and test sets are NOT independent!
- observations in the test sets are used twice.

Correct way to proceed:
- divide the sample into K folds;
- perform both the variable selection and the construction of the classifier using observations from K - 1 folds;
  - a possible choice of the tuning parameter is included;
- compute the prediction error on the remaining fold.

Bootstrap Methods: bootstrap

IDEA: generate pseudo-samples from the empirical distribution function computed on the original sample:
- by sampling with replacement from the original dataset;
- mimic new experiments.

Suppose $Z = \{(x_1, y_1), \dots, (x_N, y_N)\} = \{z_1, \dots, z_N\}$ is the training set:
- by sampling with replacement, $Z^{*1} = \{z_1^{*1}, \dots, z_N^{*1}\}$;
- ...
- by sampling with replacement, $Z^{*B} = \{z_1^{*B}, \dots, z_N^{*B}\}$;
- use the B bootstrap samples $Z^{*1}, \dots, Z^{*B}$ to estimate any aspect of the distribution of a map $S(Z)$.

Bootstrap Methods: bootstrap

For example, to estimate the variance of $S(Z)$,
$$\widehat{\mathrm{Var}}[S(Z)] = \frac{1}{B-1} \sum_{b=1}^{B} \left( S(Z^{*b}) - \bar{S}^* \right)^2,$$
where $\bar{S}^* = \frac{1}{B} \sum_{b=1}^{B} S(Z^{*b})$.

Note that $\widehat{\mathrm{Var}}[S(Z)]$ is the Monte Carlo estimate of $\mathrm{Var}[S(Z)]$ under sampling from the empirical distribution $\hat{F}$.
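A minimal sketch of the bootstrap variance estimate above, assuming (purely for illustration) that S(Z) is the sample median:

```r
## Bootstrap variance of a statistic S(Z); here S is taken to be the median (illustrative choice).
set.seed(1)
z <- rnorm(100)                                       # original sample
B <- 1000
S_star <- replicate(B, {
  z_b <- sample(z, replace = TRUE)                    # bootstrap sample Z^{*b}
  median(z_b)                                         # S(Z^{*b})
})
var_hat <- sum((S_star - mean(S_star))^2) / (B - 1)   # Monte Carlo estimate of Var[S(Z)]
var_hat
```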

Bootstrap Methods: estimate prediction error

Very simple:
- generate B bootstrap samples $Z^{*1}, \dots, Z^{*B}$;
- apply the prediction rule to each bootstrap sample to derive the predictions $\hat{f}^{*b}(x_i)$, b = 1, ..., B;
- compute the error for each point, and take the average,
$$\widehat{\mathrm{Err}}_{\mathrm{boot}} = \frac{1}{B} \frac{1}{N} \sum_{b=1}^{B} \sum_{i=1}^{N} L(y_i, \hat{f}^{*b}(x_i)).$$

Is it correct? NO!!! Again, training and test set are NOT independent!

Bootstrap Methods: example

Consider a classification problem:
- two classes with the same number of observations;
- predictors and class label independent => Err = 0.5.

Using the 1-nearest neighbour:
- if $y_i \in Z^{*b}$ -> error 0;
- if $y_i \notin Z^{*b}$ -> error 0.5;

therefore,
$$\widehat{\mathrm{Err}}_{\mathrm{boot}} = 0 \times \Pr[y_i \in Z^{*b}] + 0.5 \times \Pr[y_i \notin Z^{*b}] \approx 0.5 \times 0.368 = 0.184.$$

Bootstrap Methods: why

Consider Pr[observation i does not belong to the bootstrap sample b]:
- since $\Pr[Z^{*b}_{(s)} \neq z_i] = 1 - \frac{1}{N}$ holds for each position s,
$$\Pr[y_i \notin Z^{*b}] = \left(1 - \frac{1}{N}\right)^N \xrightarrow{N \to \infty} e^{-1} \approx 0.368;$$
- consequently, Pr[observation i is in the bootstrap sample b] ≈ 0.632.

Bootstrap Methods: correct estimate of the prediction error

Note:
- each bootstrap sample has N observations;
- some of the original observations are included more than once;
- some of them (on average, 0.368 N) are not included at all;
- these are not used to compute the predictions;
- they can be used as a test set,
$$\widehat{\mathrm{Err}}^{(1)} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{|C^{-i}|} \sum_{b \in C^{-i}} L(y_i, \hat{f}^{*b}(x_i)),$$
where $C^{-i}$ is the set of indices of the bootstrap samples which do not contain observation i, and $|C^{-i}|$ denotes its cardinality.
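A small R check of the leave-out probability used above (a sketch with an arbitrary N of 50):

```r
## Probability that observation i is left out of a bootstrap sample.
N <- 50
(1 - 1/N)^N      # analytic value for finite N
exp(-1)          # limit as N -> infinity, approx 0.368

## Monte Carlo check: fraction of bootstrap samples not containing observation 1
set.seed(1)
B <- 10000
mean(replicate(B, !(1 %in% sample(1:N, N, replace = TRUE))))
```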

Bootstrap Methods: 0.632 bootstrap

Issue: the average number of unique observations in the bootstrap sample is 0.632 N:
- not so far from the 0.5 N of 2-fold CV;
- similar bias issues to those of 2-fold CV;
- $\widehat{\mathrm{Err}}^{(1)}$ slightly overestimates the prediction error.

To solve this, the 0.632 bootstrap estimator has been developed,
$$\widehat{\mathrm{Err}}^{(0.632)} = 0.368\, \overline{\mathrm{err}} + 0.632\, \widehat{\mathrm{Err}}^{(1)};$$
- in practice it works well;
- in case of strong overfitting, it can break down:
  - consider again the previous classification problem example;
  - with 1-nearest neighbour, $\overline{\mathrm{err}} = 0$;
  - $\widehat{\mathrm{Err}}^{(0.632)} = 0.632 \times \widehat{\mathrm{Err}}^{(1)} \approx 0.632 \times 0.5 = 0.316$.

Bootstrap Methods: 0.632+ bootstrap

Further improvement, the 0.632+ bootstrap:
- based on the no-information error rate $\gamma$;
- $\gamma$ takes into account the amount of overfitting;
- $\gamma$ is the error rate if predictors and response were independent;
- computed by considering all combinations of $x_{i'}$ and $y_i$,
$$\hat{\gamma} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{i'=1}^{N} L(y_i, \hat{f}(x_{i'})).$$

The quantity $\hat{\gamma}$ is used to estimate the relative overfitting rate,
$$\hat{R} = \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}{\hat{\gamma} - \overline{\mathrm{err}}},$$
which is then used in the 0.632+ bootstrap estimator,
$$\widehat{\mathrm{Err}}^{(0.632+)} = (1 - \hat{w})\, \overline{\mathrm{err}} + \hat{w}\, \widehat{\mathrm{Err}}^{(1)}, \qquad \hat{w} = \frac{0.632}{1 - 0.368\, \hat{R}}.$$

Methods using Derived Input Directions: summary
- Principal Components Regression
- Partial Least Squares
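The two estimators above reduce to simple weighted combinations once the training error, the leave-one-out bootstrap error and the no-information rate are available; a sketch with hypothetical input values (the 1-nearest-neighbour example) follows:

```r
## 0.632 and 0.632+ bootstrap estimators as formulas (sketch).
## err_bar: training (resubstitution) error; err1: leave-one-out bootstrap error Err^(1);
## gamma_hat: no-information error rate. The values plugged in below are the ones from the 1-NN example.
boot632 <- function(err_bar, err1) {
  0.368 * err_bar + 0.632 * err1
}
boot632plus <- function(err_bar, err1, gamma_hat) {
  R_hat <- (err1 - err_bar) / (gamma_hat - err_bar)   # relative overfitting rate
  w_hat <- 0.632 / (1 - 0.368 * R_hat)
  (1 - w_hat) * err_bar + w_hat * err1
}
boot632(0, 0.5)            # 0.316: too optimistic under strong overfitting
boot632plus(0, 0.5, 0.5)   # 0.5: the 0.632+ correction recovers the true error
```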

Principal Component Regression: singular value decomposition

Consider the singular value decomposition (SVD) of the N x p (standardized) input matrix X,
$$X = U D V^T,$$
where:
- U is the N x p orthogonal matrix whose columns span the column space of X;
- D is a p x p diagonal matrix, whose diagonal entries $d_1 \geq d_2 \geq \dots \geq d_p \geq 0$ are the singular values of X;
- V is the p x p orthogonal matrix whose columns span the row space of X.

Principal Component Regression: principal components

Simple algebra leads to
$$X^T X = V D^2 V^T,$$
the eigendecomposition of $X^T X$ (and, up to a constant, of the sample covariance matrix $S = X^T X / N$).

Using the eigenvectors $v_j$ (columns of V), we can define the principal components of X, $z_j = X v_j$:
- the first principal component $z_1$ has the largest sample variance (among all linear combinations of the columns of X),
  $$\mathrm{Var}(z_1) = \mathrm{Var}(X v_1) = \frac{d_1^2}{N};$$
- since $d_1 \geq \dots \geq d_p \geq 0$, then $\mathrm{Var}(z_1) \geq \dots \geq \mathrm{Var}(z_p)$.

Principal Component Regression: principal components

Principal component regression (PCR):
- use M <= p principal components as input;
- regress y on $z_1, \dots, z_M$;
- since the principal components are orthogonal,
  $$\hat{y}^{pcr}(M) = \bar{y} + \sum_{m=1}^{M} \hat{\theta}_m z_m,$$
  where $\hat{\theta}_m = \langle z_m, y \rangle / \langle z_m, z_m \rangle$;
- since the $z_m$ are linear combinations of the $x_j$,
  $$\hat{\beta}^{pcr}(M) = \sum_{m=1}^{M} \hat{\theta}_m v_m.$$
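A minimal sketch of PCR through the SVD, on simulated data and with an arbitrary choice of M = 3 components:

```r
## Principal component regression via the SVD (illustrative sketch).
set.seed(1)
n <- 100; p <- 10; M <- 3
X <- scale(matrix(rnorm(n * p), n, p))         # standardized inputs
y <- rnorm(n)

s <- svd(X)                                    # X = U D V^T
Z <- X %*% s$v                                 # principal components z_j = X v_j

theta <- sapply(1:M, function(m)
  sum(Z[, m] * y) / sum(Z[, m]^2))             # theta_m = <z_m, y> / <z_m, z_m>
y_hat    <- mean(y) + Z[, 1:M] %*% theta       # fitted values of PCR with M components
beta_pcr <- s$v[, 1:M] %*% theta               # coefficients on the original (standardized) x_j
```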

Principal Component Regression: remarks

Note that:
- PCR can be used in high dimensions, as long as M < N;
- idea: remove the directions with less information;
- if M = p, $\hat{\beta}^{pcr}(M) = \hat{\beta}^{OLS}$;
- M is a tuning parameter, which may be chosen via cross-validation;
- shrinkage effect (clearer later);
- principal components are scale dependent, so it is important to standardize X!

Partial Least Squares: idea

Partial least squares (PLS) is based on an idea similar to PCR:
- construct a set of linear combinations of X;
- PCR only uses X, ignoring y;
- in PLS we want to also consider the information in y;
- as for PCR, it is important to first standardize X.

Partial Least Squares: algorithm

1. Standardize each $x_j$; set $\hat{y}^{[0]} = \bar{y}\,\mathbf{1}$ and $x_j^{[0]} = x_j$.
2. For m = 1, 2, ..., p:
   (a) $z_m = \sum_{j=1}^{p} \hat{\varphi}_{mj}\, x_j^{[m-1]}$, with $\hat{\varphi}_{mj} = \langle x_j^{[m-1]}, y \rangle$;
   (b) $\hat{\theta}_m = \langle z_m, y \rangle / \langle z_m, z_m \rangle$;
   (c) $\hat{y}^{[m]} = \hat{y}^{[m-1]} + \hat{\theta}_m z_m$;
   (d) orthogonalize each $x_j^{[m-1]}$ with respect to $z_m$,
       $$x_j^{[m]} = x_j^{[m-1]} - \frac{\langle z_m, x_j^{[m-1]} \rangle}{\langle z_m, z_m \rangle}\, z_m, \quad j = 1, \dots, p.$$
3. Output the sequence of fitted vectors $\{\hat{y}^{[m]}\}_{1}^{p}$.

Partial Least Squares: step by step

First step:
(a) compute the first PLS direction, $z_1 = \sum_{j=1}^{p} \hat{\varphi}_{1j} x_j$, based on the relation between each $x_j$ and y, $\hat{\varphi}_{1j} = \langle x_j, y \rangle$;
(b) estimate the related regression coefficient, $\hat{\theta}_1 = \frac{\langle z_1, y \rangle}{\langle z_1, z_1 \rangle}$;
(c) model after the first iteration: $\hat{y}^{[1]} = \bar{y} + \hat{\theta}_1 z_1$;
(d) orthogonalize $x_1, \dots, x_p$ w.r.t. $z_1$: $x_j^{[2]} = x_j - \frac{\langle z_1, x_j \rangle}{\langle z_1, z_1 \rangle} z_1$.

We are now ready for the second step...
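A minimal sketch of the PLS algorithm above in base R, run for an arbitrary M = 2 steps on simulated data:

```r
## Partial least squares, first M steps of the algorithm above (illustrative sketch).
set.seed(1)
n <- 100; p <- 10; M <- 2
X <- scale(matrix(rnorm(n * p), n, p))        # standardized predictors
y <- rnorm(n)

Xm    <- X                                    # current predictors x_j^{[m-1]}
y_hat <- rep(mean(y), n)                      # \hat{y}^{[0]}
for (m in 1:M) {
  phi   <- as.vector(t(Xm) %*% y)             # phi_{mj} = <x_j^{[m-1]}, y>
  z     <- Xm %*% phi                         # m-th PLS direction z_m
  theta <- sum(z * y) / sum(z^2)              # theta_m = <z_m, y> / <z_m, z_m>
  y_hat <- y_hat + theta * z                  # update the fitted values
  ## orthogonalize each x_j^{[m-1]} with respect to z_m
  Xm <- Xm - z %*% t(as.vector(t(z) %*% Xm) / sum(z^2))
}
```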

Partial Least Squares: step by step

... using $x_j^{[2]}$ instead of $x_j$:
(a) compute the second PLS direction, $z_2 = \sum_{j=1}^{p} \hat{\varphi}_{2j} x_j^{[2]}$, based on the relation between each $x_j^{[2]}$ and y, $\hat{\varphi}_{2j} = \langle x_j^{[2]}, y \rangle$;
(b) estimate the related regression coefficient, $\hat{\theta}_2 = \frac{\langle z_2, y \rangle}{\langle z_2, z_2 \rangle}$;
(c) model after the second iteration: $\hat{y}^{[2]} = \bar{y} + \hat{\theta}_1 z_1 + \hat{\theta}_2 z_2$;
(d) orthogonalize $x_1^{[2]}, \dots, x_p^{[2]}$ w.r.t. $z_2$: $x_j^{[3]} = x_j^{[2]} - \frac{\langle z_2, x_j^{[2]} \rangle}{\langle z_2, z_2 \rangle} z_2$;

and so on, until the M <= p step -> M derived inputs.

Partial Least Squares: PLS versus PCR

Differences:
- PCR: the derived input directions are the principal components of X, constructed by looking only at the variability of X;
- PLS: the input directions take into consideration both the variability of X and the correlation between X and y.

Mathematically:
- PCR: $\max_{\alpha} \mathrm{Var}(X\alpha)$, s.t. $\|\alpha\| = 1$ and $\alpha^T S v_l = 0$, $l = 1, \dots, M-1$;
- PLS: $\max_{\alpha} \mathrm{Cor}^2(y, X\alpha)\, \mathrm{Var}(X\alpha)$, s.t. $\|\alpha\| = 1$ and $\alpha^T S \hat{\varphi}_l = 0$, $l = 1, \dots, M-1$.

In practice, the variance term tends to dominate -> similar results!

Ridge Regression: historical notes

- When two predictors are strongly correlated -> collinearity;
- in the extreme case of linear dependency -> super-collinearity;
- in the case of super-collinearity, $X^T X$ is not invertible (not full rank);
- Hoerl & Kennard (1970): $X^T X \to X^T X + \lambda I_p$, where $\lambda > 0$ and $I_p$ is the p x p identity matrix.

With $\lambda > 0$, $(X^T X + \lambda I_p)^{-1}$ exists.

Ridge Regression: estimator

Substituting $X^T X$ with $X^T X + \lambda I_p$ in the LS estimator,
$$\hat{\beta}^{ridge}(\lambda) = (X^T X + \lambda I_p)^{-1} X^T y.$$

Alternatively, the ridge estimator can be seen as the minimizer of
$$\sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2, \quad \text{subject to } \sum_{j=1}^{p} \beta_j^2 \leq t,$$
which is the same as
$$\hat{\beta}^{ridge}(\lambda) = \mathrm{argmin}_{\beta} \Bigg\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Bigg\}.$$
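A minimal sketch of the ridge estimator on standardized inputs and a centred response; the second part checks the penalized criterion through the standard data-augmentation identity (appending sqrt(lambda) * I_p rows to X and p zeros to y gives the same solution):

```r
## Ridge estimator (illustrative sketch, intercept handled by centring and not penalized).
set.seed(1)
n <- 100; p <- 5; lambda <- 2
X  <- scale(matrix(rnorm(n * p), n, p))
y  <- rnorm(n)
yc <- y - mean(y)

beta_ridge <- solve(t(X) %*% X + lambda * diag(p), t(X) %*% yc)

## Data-augmentation check: OLS on the augmented data reproduces the ridge solution
X_aug <- rbind(X, sqrt(lambda) * diag(p))
y_aug <- c(yc, rep(0, p))
beta_aug <- solve(t(X_aug) %*% X_aug, t(X_aug) %*% y_aug)
all.equal(as.vector(beta_ridge), as.vector(beta_aug))   # TRUE
```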

Ridge Regression: visually
[Figures not included in the transcription.]

Ridge Regression: remarks

Note:
- the ridge solution is not equivariant under scaling -> X must be standardized before applying the minimizer;
- the intercept is not involved in the penalization;
- Bayesian interpretation:
  - $Y_i \sim N(\beta_0 + x_i^T \beta, \sigma^2)$;
  - $\beta \sim N(0, \tau^2)$;
  - $\lambda = \sigma^2 / \tau^2$;
  - $\hat{\beta}^{ridge}(\lambda)$ is the posterior mean.

Ridge Regression: bias

$$\mathrm{E}[\hat{\beta}^{ridge}(\lambda)] = \mathrm{E}[(X^T X + \lambda I_p)^{-1} X^T y]
= \underbrace{(I_p + \lambda (X^T X)^{-1})^{-1}}_{w_\lambda}\, \underbrace{\mathrm{E}[(X^T X)^{-1} X^T y]}_{\mathrm{E}[\hat{\beta}^{LS}]}
= w_\lambda \beta
\;\Longrightarrow\; \mathrm{E}[\hat{\beta}^{ridge}(\lambda)] \neq \beta \text{ for } \lambda > 0.$$

- $\lambda \to 0$: $\mathrm{E}[\hat{\beta}^{ridge}(\lambda)] \to \beta$;
- $\lambda \to \infty$: $\mathrm{E}[\hat{\beta}^{ridge}(\lambda)] \to 0$ (without intercept);
- due to correlation among the predictors, $\lambda_a > \lambda_b$ does not necessarily imply $|\hat{\beta}_j^{ridge}(\lambda_a)| < |\hat{\beta}_j^{ridge}(\lambda_b)|$ for every component j.
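A small numerical illustration of the bias expression above, computing the shrinkage matrix $w_\lambda$ for a hypothetical true coefficient vector:

```r
## E[beta_ridge(lambda)] = w_lambda %*% beta (illustrative sketch, hypothetical beta).
set.seed(1)
n <- 200; p <- 3; lambda <- 5
X    <- scale(matrix(rnorm(n * p), n, p))
beta <- c(2, -1, 0.5)                              # hypothetical true coefficients

XtX <- t(X) %*% X
w_lambda <- solve(diag(p) + lambda * solve(XtX))   # (I_p + lambda (X^T X)^{-1})^{-1}
w_lambda %*% beta                                  # expected ridge estimator, != beta for lambda > 0
```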

Ridge Regression: variance

Consider the variance of the ridge estimator,
$$\mathrm{Var}[\hat{\beta}^{ridge}(\lambda)] = \mathrm{Var}[w_\lambda \hat{\beta}^{LS}] = w_\lambda \mathrm{Var}[\hat{\beta}^{LS}] w_\lambda^T = \sigma^2 w_\lambda (X^T X)^{-1} w_\lambda^T.$$

Then,
$$
\begin{aligned}
\mathrm{Var}[\hat{\beta}^{LS}] - \mathrm{Var}[\hat{\beta}^{ridge}(\lambda)]
&= \sigma^2 \left[ (X^T X)^{-1} - w_\lambda (X^T X)^{-1} w_\lambda^T \right] \\
&= \sigma^2 w_\lambda \left[ (I_p + \lambda (X^T X)^{-1}) (X^T X)^{-1} (I_p + \lambda (X^T X)^{-1})^T - (X^T X)^{-1} \right] w_\lambda^T \\
&= \sigma^2 w_\lambda \left[ (X^T X)^{-1} + 2\lambda (X^T X)^{-2} + \lambda^2 (X^T X)^{-3} - (X^T X)^{-1} \right] w_\lambda^T \\
&= \sigma^2 w_\lambda \left[ 2\lambda (X^T X)^{-2} + \lambda^2 (X^T X)^{-3} \right] w_\lambda^T \succeq 0
\end{aligned}
$$
(since all terms are quadratic and therefore positive)
$$\Longrightarrow \mathrm{Var}[\hat{\beta}^{ridge}(\lambda)] \preceq \mathrm{Var}[\hat{\beta}^{LS}].$$

Ridge Regression: degrees of freedom

Note that the ridge solution is a linear combination of y, as the least squares one:
- $\hat{y}^{LS} = \underbrace{X (X^T X)^{-1} X^T}_{H} y \;\longrightarrow\; \mathrm{df} = \mathrm{trace}(H) = p$;
- $\hat{y}^{ridge} = \underbrace{X (X^T X + \lambda I_p)^{-1} X^T}_{H_\lambda} y \;\longrightarrow\; \mathrm{df}(\lambda) = \mathrm{trace}(H_\lambda)$;
- $\mathrm{trace}(H_\lambda) = \sum_{j=1}^{p} \frac{d_j^2}{d_j^2 + \lambda}$, where $d_j$ is the j-th diagonal element of D in the SVD of X;
- $\lambda \to 0$: $\mathrm{df}(\lambda) \to p$; $\lambda \to \infty$: $\mathrm{df}(\lambda) \to 0$.

Ridge Regression: more about shrinkage

Recall the SVD $X = U D V^T$ and the properties $U^T U = I_p = V^T V$.

$$\hat{\beta}^{LS} = (X^T X)^{-1} X^T y = (V D U^T U D V^T)^{-1} V D U^T y = (V D^2 V^T)^{-1} V D U^T y = V D^{-2} D U^T y = V D^{-1} U^T y,$$
$$\hat{y}^{LS} = X \hat{\beta}^{LS} = U D V^T V D^{-1} U^T y = U U^T y;$$

$$\hat{\beta}^{ridge} = (X^T X + \lambda I_p)^{-1} X^T y = (V D U^T U D V^T + \lambda V V^T)^{-1} V D U^T y = V (D^2 + \lambda I_p)^{-1} V^T V D U^T y = V (D^2 + \lambda I_p)^{-1} D U^T y.$$

So:
$$\hat{y}^{ridge} = X \hat{\beta}^{ridge} = U D V^T V (D^2 + \lambda I_p)^{-1} D U^T y = U \underbrace{D^2 (D^2 + \lambda I_p)^{-1}}_{\mathrm{diag}\{d_j^2/(d_j^2 + \lambda)\}} U^T y = \sum_{j=1}^{p} u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^T y;$$
- small singular values $d_j$ correspond to directions of the column space of X with low variance;
- ridge regression penalizes these directions the most.
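A minimal sketch computing the shrinkage factors and the effective degrees of freedom from the SVD, with a check against the trace of the hat matrix:

```r
## Effective degrees of freedom and shrinkage factors of ridge regression (illustrative sketch).
set.seed(1)
n <- 100; p <- 5; lambda <- 10
X <- scale(matrix(rnorm(n * p), n, p))
d <- svd(X)$d                                   # singular values d_1 >= ... >= d_p

shrinkage <- d^2 / (d^2 + lambda)               # factor applied to each direction u_j
df_lambda <- sum(shrinkage)                     # trace(H_lambda)

## Check against the trace of the hat matrix H_lambda
H_lambda <- X %*% solve(t(X) %*% X + lambda * diag(p)) %*% t(X)
all.equal(df_lambda, sum(diag(H_lambda)))       # TRUE
```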

Ridge Regression: more about shrinkage
[Figure; picture source cut off in the transcription.]

References

Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55-67.
