Prediction & Feature Selection in GLM
1 Tarigan Statistical Consulting & Coaching, statistical-coaching.ch
Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL
Hands-on Data Analysis with R, University of Neuchâtel, 10 May 2016
Prediction & Feature Selection in GLM
Bernadetta Tarigan, Dr. sc. ETHZ
2 Bug data
3 Predicting the number of bugs
Goal: TO PREDICT
1. How to predict the number of bugs?
2. Are change metrics better than source metrics?
3. Should we combine them?
4. Are the predictors actually independent?
5. What is the best prediction model to use?
6. What are the minimal metrics (from the combined set) needed to make the best prediction?
7. Are the above questions dependent on the project?
8. Finally, can we actually predict the number of bugs at all?
4 Is it good?
[Table: coefficient estimates on the bug data, with columns LS, BestSubset, Ridge, ElasticNet, Lasso and rows (Intercept), numberofversionsuntil, numberoffixesuntil, numberofrefactoringsuntil, numberofauthorsuntil, linesaddeduntil, maxlinesaddeduntil, avglinesaddeduntil, linesremoveduntil, maxlinesremoveduntil, avglinesremoveduntil, maxcodechurnuntil, avgcodechurnuntil, agewithrespectto, weightedagewithrespectto; the numeric entries did not survive transcription. Bottom rows, TestError and Improvement: 7%, 38%, 33%, 30%, presumably for BestSubset, Ridge, ElasticNet and Lasso relative to LS.]
5 Or is this better?
[Table: same layout as the previous slide but with a Poisson model in place of LS; columns Poisson, BestSubset, Ridge, ElasticNet, Lasso over the same predictor rows; numeric entries lost in transcription. Bottom rows, TestError and Improvement: 11%, 65%, 94%, 94%.]
6 Review: Multiple Least Squares Regression
$Y = f(X) + \varepsilon$; $\varepsilon$ random with $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$
Linear model: $f(X) = \beta_0 + \sum_{j=1}^{p} \beta_j X_j$
Problem: to estimate the unknown $\beta$ from the data
Least squares estimates: $\hat\beta^{ls} = \arg\min_\beta \sum_{i=1}^{n} (y_i - \beta^T x_i)^2$
Matrix notation: $\hat\beta^{ls} = (X^T X)^{-1} X^T y$, where $X$ is the $n \times p$ matrix with each row an input vector and $y$ is the $n$-vector of outputs
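As a quick illustration in R, here is a minimal sketch on simulated data (not the bug dataset) checking that the closed-form solution $(X^T X)^{-1} X^T y$ agrees with lm():

```r
# Least squares via the normal equations vs. lm(), on simulated data.
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- drop(1 + X %*% c(2, -1, 0.5) + rnorm(n))

Xd <- cbind(1, X)                            # design matrix with intercept column
beta_ls <- solve(t(Xd) %*% Xd, t(Xd) %*% y)  # (X^T X)^{-1} X^T y
cbind(normal_eq = beta_ls, lm = coef(lm(y ~ X)))  # identical up to rounding
```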
7 Prediction is different from explanation
Assume $Y = f(X) + \varepsilon$, $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$. Suppose we have some estimator $\hat f(X)$. Will $\hat f(X)$ fit future observations well?
Prediction is different from explanation:
- The quality of a model is no longer measured by the $R^2$ goodness of fit
- It is replaced by the model's generalization performance on future observations
This generalization performance, at a new input point $X = x_0$, is measured by the expected prediction error (EPE):
$\mathrm{EPE}(x_0) = E\big[(Y - \hat f(x_0))^2 \mid X = x_0\big]$
EPE is also called out-of-sample error or test error.
8 Partition of the data into training and test sets
$Y = f(X) + \varepsilon$; $\varepsilon$ random with $E(\varepsilon) = 0$, $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$
Split the data into a training set and a test set.
Obtain a model $\hat f^{ls}$ on the training set: $\hat\beta^{ls} = \arg\min_{\beta \in \mathbb{R}^p} \|y_{\text{training}} - X_{\text{training}}\beta\|_2^2$
Estimate the EPE on the test set: $\widehat{\mathrm{EPE}}(\hat f^{ls}) = \frac{1}{n_{\text{test}}} \|y_{\text{test}} - X_{\text{test}}\hat\beta^{ls}\|_2^2$
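In R, the split looks like the following sketch; `bugs` and its response column `nbugs` are stand-in names (simulated here so the sketch runs end to end), not the actual bug data:

```r
# Train/test estimate of the EPE on a simulated stand-in for the bug data.
set.seed(2016)
bugs <- data.frame(matrix(rnorm(500 * 5), 500, 5))
bugs$nbugs <- rpois(500, lambda = exp(0.3 * bugs$X1))

idx  <- sample(nrow(bugs), size = round(0.7 * nrow(bugs)))  # 70% for training
fit  <- lm(nbugs ~ ., data = bugs[idx, ])        # least squares on training set only
pred <- predict(fit, newdata = bugs[-idx, ])
epe  <- mean((bugs[-idx, "nbugs"] - pred)^2)     # estimated EPE (test error)
```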
9 Improving the least squares fit with feature selection
Why we are not satisfied with the least squares estimates:
- Prediction accuracy
  o LS estimates often have low bias but high variance
  o Prediction accuracy can sometimes be improved by shrinking or setting some coefficients to zero
  o By doing so we sacrifice a little bit of bias to reduce the variance of the predicted value, and hence may improve the overall prediction accuracy
- Interpretability
  o With a large number of predictors, we would often like to determine a smaller subset that exhibits the strongest effects
  o In order to get the big picture, we are willing to sacrifice some of the small details
- Also: the least squares estimate is not defined when p > n
To improve: automatically perform feature selection
- Subset selection methods
- Coefficient shrinkage methods (modern techniques)
10 However...
Performing feature selection means reducing the complexity of the model class. The bias-variance decomposition of the EPE depends on this model complexity: this is the problem of model selection and the bias-variance tradeoff.
We first look at the bias-variance decomposition of the EPE, then at its relationship to model complexity.
11 Bias-Variance Decomposition of EPE
$\mathrm{EPE}(x_0) = E\big[(Y - \hat f(x_0))^2 \mid X = x_0\big] = \sigma_\varepsilon^2 + \big[f(x_0) - E(\hat f(x_0))\big]^2 + \mathrm{Var}(\hat f(x_0))$
$= \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}$
- Irreducible error: the variance of the target around its true mean $f(x_0)$; it cannot be avoided no matter how well we estimate $f(x_0)$, unless $\sigma_\varepsilon^2 = 0$
- Bias: the difference between the average prediction of our model and the true unknown value we are trying to predict
- Variance: the variability of our model's prediction at a given data point
Note that $\mathrm{EPE}(x_0) = \sigma_\varepsilon^2 + \mathrm{MSE}(\hat f(x_0))$, where MSE = mean squared error.
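The decomposition can be checked by simulation. The following sketch (illustrative, not from the slides) repeatedly refits a cubic polynomial on fresh samples and estimates the three terms at a single point $x_0$:

```r
# Estimate irreducible error, squared bias and variance at x0 by refitting
# the same model on many simulated training sets.
set.seed(1)
f     <- function(x) sin(2 * pi * x)   # the (here known) true regression function
x0    <- 0.3
sigma <- 0.3
preds <- replicate(2000, {
  x <- runif(50)
  y <- f(x) + rnorm(50, sd = sigma)
  predict(lm(y ~ poly(x, 3)), newdata = data.frame(x = x0))
})
c(irreducible = sigma^2,
  bias2       = (mean(preds) - f(x0))^2,
  variance    = var(preds))
```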
12 Graphical illustration of Bias and Variance
[Figure: the unknown target $f(x_0)$ versus the model prediction $\hat f(x_0)$]
- Error due to Bias: the difference between the average prediction of our model and the true unknown value we are trying to predict.
- Error due to Variance: the variability of a model's prediction at a given data point. Imagine you could repeat the entire model-building process multiple times (i.e., you have multiple samples). The variance is how much the predictions at a given point vary between different realizations of the model.
13 Bias, Variance and Model Complexity
- As the model $\hat f$ becomes more complex (more terms included), the bias will most likely decrease (local curvature can be picked up)
- However, the variance will increase as more terms are included
- We would of course like to choose our model complexity to trade bias off against variance in such a way as to minimize the test error
14 By the way...
Let $\hat f_p(x) = x^T \hat\beta^{ls}$ be the least squares fit (the parameter vector $\beta$ has $p$ components). Then
$\mathrm{EPE}(x_0) = \sigma_\varepsilon^2 + \big[f(x_0) - E(\hat f_p(x_0))\big]^2 + \mathrm{Var}(\hat f_p(x_0))$
and averaging over the sample points $x_i$:
$\frac{1}{n}\sum_{i=1}^{n} \mathrm{EPE}(x_i) = \sigma_\varepsilon^2 + \frac{1}{n}\sum_{i=1}^{n} \big[f(x_i) - E(\hat f_p(x_i))\big]^2 + \frac{p}{n}\sigma_\varepsilon^2$
- The model complexity of the least squares estimate is directly related to the number of parameters $p$
- Thus, the smaller $p$, the smaller the variance, but the bias might increase
15 Back to feature selection
Recall why we are not satisfied with the least squares estimates:
- Prediction accuracy
  o often low bias but high variance
  o the variance gets smaller when coefficients shrink toward zero
  o the bias increases a bit, but overall accuracy might improve
- Interpretability
  o with a large number of predictors, we would often like to determine a smaller subset that exhibits the strongest effects
  o in order to get the big picture, we are willing to sacrifice some of the small details
- Also: the least squares estimate is not defined when p > n
To improve: automatically perform feature selection. Two classes of methods:
1. Coefficient shrinkage methods (modern techniques)
2. Subset selection methods
16 Shrinkage / penalized / regularization methods
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 \ \text{ subject to } R(\beta) \le t$
or equivalently
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda\, R(\beta)$
- $R(\beta)$ is called the regularizer (or penalty) on the complexity of the model
- $\lambda$ is called the tuning parameter, controlling the amount of regularization
- The larger $\lambda$, the greater the amount of regularization/penalty: $\lambda$ shrinks the coefficient estimates toward 0
- Note that regularization can be applied beyond regression, e.g., in classification, clustering, principal component analysis, etc.
17 Ridge
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$
- shrinks the coefficients toward zero, but not exactly to zero
- so it does not do variable selection
- but it still outperforms the least squares estimates for the prediction goal, and it encourages a grouping effect
18 Lasso
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$
- shrinks some of the coefficients to exactly zero, so it does variable selection (sparse model)
- when p > n, the lasso picks at most n variables
- but it typically fails to do group selection: it tends to select one variable from a group and ignore the others
19 Elastic net
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1$
- simply combines the advantages of the ridge and lasso methods
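The three penalties can be compared in R with the glmnet package (introduced a few slides later); a sketch on simulated data, where glmnet's alpha argument selects the penalty:

```r
# Ridge, lasso and elastic net on the same simulated data via glmnet's
# alpha argument (0 = ridge, 1 = lasso, in between = elastic net).
library(glmnet)
set.seed(1)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- drop(X[, 1:3] %*% c(3, 2, 1) + rnorm(100))

fit_ridge <- glmnet(X, y, alpha = 0)    # shrinks all coefficients, none exactly 0
fit_lasso <- glmnet(X, y, alpha = 1)    # sets some coefficients exactly to 0
fit_enet  <- glmnet(X, y, alpha = 0.5)  # compromise: sparsity plus grouping effect
coef(fit_lasso, s = 0.5)                # sparse coefficient vector at lambda = 0.5
```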
20 Why do shrinkage models work well?
[Figure from The Elements of Statistical Learning, Hastie, Tibshirani & Friedman, 2nd ed., 2009]
21 Some more pictures
[Figures from The Elements of Statistical Learning, Hastie, Tibshirani & Friedman, 2nd ed., 2009]
22 All great, but how to choose $\lambda_{\text{opt}}$?
Recall: we would like to choose our model complexity to trade bias off against variance in such a way as to minimize the test error. But we only have access to the training error, and unfortunately the training error is not a good estimate of the test error.
Solution: Cross Validation
[Figure from The Elements of Statistical Learning, Hastie, Tibshirani & Friedman, 2nd ed., 2009]
23 Cross Validation
[Diagram: an outer loop splits the data into training and test sets; an inner loop splits the training set further into training and validation parts to select $\hat\beta_{\lambda_{\text{opt}}}$]
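To make the inner loop concrete, here is a hand-rolled sketch of 10-fold cross-validation over a lambda grid for the lasso, on simulated stand-in training data Xtr, ytr; this is exactly what cv.glmnet (next slide) automates:

```r
# Hand-rolled 10-fold CV over a lambda grid (simulated training data).
library(glmnet)
set.seed(1)
Xtr <- matrix(rnorm(200 * 20), 200, 20)
ytr <- drop(Xtr[, 1:3] %*% c(3, 2, 1) + rnorm(200))

K       <- 10
folds   <- sample(rep(1:K, length.out = nrow(Xtr)))
lambdas <- 10^seq(1, -3, length.out = 50)          # decreasing grid
cv_err  <- rep(0, length(lambdas))
for (k in 1:K) {
  fit  <- glmnet(Xtr[folds != k, ], ytr[folds != k], alpha = 1, lambda = lambdas)
  pred <- predict(fit, newx = Xtr[folds == k, ])   # one column per lambda
  cv_err <- cv_err + colMeans((ytr[folds == k] - pred)^2) / K
}
lam_opt <- lambdas[which.min(cv_err)]              # lambda minimizing CV error
```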
24 Shrinkage methods in R: the glmnet package
- glmnet is a package that fits a generalized linear model via penalized maximum likelihood
- The algorithm is extremely fast, and it can exploit sparsity in the input matrix
- It fits linear, logistic, multinomial, Poisson, and Cox regression models
- A variety of predictions can be made from the fitted models
- It can also fit multi-response linear regression
glmnet solves
$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \big[(1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1\big]$
- the elastic net penalty is controlled by $\alpha$
- $\alpha$ bridges the gap between the lasso ($\alpha = 1$, the default) and ridge ($\alpha = 0$)
- cv.glmnet is the main function to do cross validation
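A sketch of the basic cv.glmnet workflow; since the bug counts are a count response, the Poisson family fits naturally (the data here are simulated stand-ins for the metrics):

```r
# cv.glmnet workflow for a count response (simulated data).
library(glmnet)
set.seed(1)
x  <- matrix(rnorm(300 * 15), 300, 15)      # stand-in for the metrics
y  <- rpois(300, lambda = exp(0.5 * x[, 1] - 0.3 * x[, 2]))
tr <- sample(300, 210)                      # outer training/test split

cvfit <- cv.glmnet(x[tr, ], y[tr], family = "poisson", alpha = 0.5)
plot(cvfit)                                 # CV error over the lambda path
coef(cvfit, s = "lambda.min")               # coefficients at the chosen lambda
pred <- predict(cvfit, newx = x[-tr, ], s = "lambda.min", type = "response")
```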
25 Subset selection
In this approach we retain only a subset of the variables and eliminate the rest from the model. Least squares regression is used to estimate the coefficients of the inputs that are retained. There are a number of different strategies for choosing the subset:
- Best subset regression
- Forward stepwise selection
- Backward stepwise selection
- Hybrid stepwise selection
26 Best Subset
- Best subset regression finds, for each $k \in \{1, 2, \dots, p\}$, the subset of size $k$ that gives the smallest residual sum of squares (RSS)
- An efficient algorithm, the leaps and bounds procedure (Furnival and Wilson, 1974), makes this feasible for $p$ as large as 30 or 40
- This procedure is available in R through the bestglm package
- The question of how to choose $k$ involves the tradeoff between bias and variance, and there are a number of criteria one may use
- Typically we choose the model that minimizes an estimate of the expected prediction error (EPE)
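A sketch of best subset selection with bestglm; the package expects a data frame with the response in the last column (the data here are illustrative):

```r
# Best subset selection with bestglm (response must be the last column).
library(bestglm)
set.seed(1)
X  <- matrix(rnorm(100 * 8), 100, 8)          # keep p small: 2^p candidate subsets
y  <- drop(X[, 1:2] %*% c(2, -1) + rnorm(100))
Xy <- data.frame(X, y = y)

bs <- bestglm(Xy, family = gaussian, IC = "BIC")  # or IC = "CV" for an EPE-style choice
summary(bs$BestModel)                             # the chosen subset, refit via lm/glm
```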
27 Stepwise selection
Rather than searching through all possible subsets (which becomes infeasible for $p$ much larger than 40), we can seek a good path through them.
Forward stepwise selection
- Starts with the intercept (null model) and sequentially adds to the model, one at a time, the predictor that most improves the fit
- Suppose the current model has $k$ inputs, with estimates $\hat\beta$, and we add a predictor, resulting in estimates $\tilde\beta$. The improvement in fit is often based on the statistic
$F = \dfrac{\mathrm{RSS}(\hat\beta) - \mathrm{RSS}(\tilde\beta)}{\mathrm{RSS}(\tilde\beta)/(n - k - 2)}$
- Strategy: sequentially add the predictor producing the largest value of $F$, stopping when no predictor produces an F-ratio greater than the 90th or 95th percentile of the $F_{1, n-k-2}$ distribution
- Can be used even when $p > n$; the only viable subset method when $p$ is very large
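In R, forward stepwise selection can be sketched with step() from the base stats package; note that step() ranks moves by AIC rather than the F-ratio rule above, and the data frame here is a simulated stand-in:

```r
# Forward stepwise selection with step() (AIC-based rather than the F-ratio rule).
set.seed(1)
dat <- data.frame(matrix(rnorm(200 * 6), 200, 6))
dat$nbugs <- rpois(200, lambda = exp(0.4 * dat$X1 + 0.2 * dat$X2))

null_fit <- glm(nbugs ~ 1, data = dat, family = poisson)  # intercept-only model
full_fit <- glm(nbugs ~ ., data = dat, family = poisson)  # all predictors
fwd <- step(null_fit, scope = formula(full_fit), direction = "forward")
# direction = "backward" or "both" give the variants on the next slide.
```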
28 Stepwise selection (cont.)
Backward stepwise selection
- Starts with the full model containing all $p$ predictors and sequentially deletes the predictor whose removal degrades the fit the least
- Can be used only when $p < n$
Hybrid stepwise selection
- Considers both forward and backward moves at each stage and makes the best move
29 Best subset or stepwise selection?
- Stepwise selection: the F-ratio stopping rule provides only local control of the model search and does not attempt to find the best model along the sequence of models that it examines
- Best subset selection (all-subsets selection): we can choose the model from the sequence that minimizes an estimate of the expected prediction error
When the goal is prediction, best subset is the proper choice.
30 ... or Shrinkage?
- By retaining only a subset of the predictors and eliminating the rest from the model, subset selection produces a model that is interpretable and possibly has lower prediction error than the full model
- However, it is a discrete process: variables are either retained or eliminated. It therefore often exhibits high variance, and so it may not reduce the prediction error of the full model
- Shrinkage methods are more continuous and do not suffer from high variability