Hastie, Tibshirani & Friedman: Elements of Statistical Learning Chapter Model Assessment and Selection. CN700/March 4, 2008.
- Lawrence Palmer
Hastie, Tibshirani & Friedman: Elements of Statistical Learning
Chapter: Model Assessment and Selection
CN700, March 4, 2008
Satyavarta
Auditory Neuroscience Laboratory, Department of Cognitive and Neural Systems, Boston University, Boston MA 02215
Outline: Model Assessment and Selection
- Choosing Model Complexity
- Model Assessment
- Other Loss Functions
- Arriving at a Model
- Comparing Model Classes
- Choosing Model Complexity
- Bias Variance Decomposition
- Choosing Model Complexity
- Bias Variance Decomposition: Special Cases
- Bias and Variance
- Bias-Variance Tradeoff in Model Complexity
- Bias-Variance with 0-1 Loss
- Bias-Variance with 0-1 Loss: knn
- Bias-Variance Tradeoff with 0-1 Loss: Regression
- Optimism
- AIC in model selection
- Estimates of Model Complexity: # of Parameters
- Estimates of Model Complexity: Vapnik-Chervonenkis Dimension
- Shatter
- Example: Error of models picked by criteria relative to best model
- References
model selection (ch ) slide 2 of 27
Choosing Model Complexity
Model Assessment
Given $X$, estimate $Y$ as $\hat{f}(X)$
Loss function, squared error: $L(Y, \hat{f}(X)) = (Y - \hat{f}(X))^2$
Loss function, absolute error: $L(Y, \hat{f}(X)) = |Y - \hat{f}(X)|$
Test error: $\mathrm{Err} = E[L(Y, \hat{f}(X))]$
Training error: $\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{f}(x_i))$
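As a rough illustration of the training and test error defined above, the sketch below averages a loss over a training sample and over a fresh sample. The data-generating rule, the noise level, and the choice of a 1-nearest-neighbour fit are assumptions made for this example only; they do not come from the slides.

```python
import random

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def absolute_error(y, y_hat):
    return abs(y - y_hat)

def mean_loss(loss, xs, ys, predict):
    """Average loss (1/N) * sum_i L(y_i, f_hat(x_i))."""
    return sum(loss(y, predict(x)) for x, y in zip(xs, ys)) / len(xs)

def make_sample(n, rng):
    # Hypothetical data: Y = 2X + Gaussian noise with sd 0.3
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_sample(50, rng)
test_x, test_y = make_sample(2000, rng)   # large fresh sample approximates Err

def f_hat(x):
    # 1-nearest-neighbour fit: return the response of the closest training point
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

train_err = mean_loss(squared_error, train_x, train_y, f_hat)
test_err = mean_loss(squared_error, test_x, test_y, f_hat)
print("training error:", train_err)
print("test error:    ", test_err)
```

Because a 1-nearest-neighbour fit reproduces every training response, its training error is zero while its error on fresh data is not, which is why the training error alone is a poor guide to model quality.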
Other Loss Functions
0-1 loss: $L(G, \hat{G}(X)) = I(G \neq \hat{G}(X))$
Log-likelihood loss: $L(G, \hat{p}(X)) = -2 \log \hat{p}_G(X)$
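The two classification losses can be written out directly. This is a minimal sketch; the class labels and the probability values are hypothetical.

```python
import math

def zero_one_loss(g, g_hat):
    """I(G != G_hat): 1 for a misclassification, 0 otherwise."""
    return 0 if g == g_hat else 1

def log_likelihood_loss(g, p_hat):
    """-2 log p_hat_G(X); p_hat maps each class label to its estimated probability."""
    return -2.0 * math.log(p_hat[g])

# Hypothetical estimated class probabilities at one input X
p_hat = {"a": 0.7, "b": 0.2, "c": 0.1}
g_hat = max(p_hat, key=p_hat.get)   # predicted class: the most probable one

for true_class in ("a", "b", "c"):
    print(true_class,
          zero_one_loss(true_class, g_hat),
          log_likelihood_loss(true_class, p_hat))
```

The 0-1 loss treats every misclassification alike, whereas the log-likelihood loss grows as the probability assigned to the true class shrinks.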
Arriving at a Model
Model training: training set
Model selection: validation set
Model assessment: test set
Data: Train | Validation | Test
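A three-way split of this kind might look as follows. The 50/25/25 proportions and the function name are assumptions for illustration; the slides do not fix the sizes of the three parts.

```python
import random

def train_val_test_split(data, frac_train=0.5, frac_val=0.25, seed=0):
    """Shuffle the data once, then cut it into train / validation / test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(frac_train * len(items))
    n_val = int(frac_val * len(items))
    train = items[:n_train]                    # used to fit each candidate model
    val = items[n_train:n_train + n_val]       # used to choose among the models
    test = items[n_train + n_val:]             # used once, to assess the final model
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The test part is kept aside until the very end so that the reported error is not biased by the selection step.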
Comparing Model Classes
Data rich: validation (Data: Train | Validation | Test)
Data poor: approximate validation
- Analytically: AIC, BIC, MDL, SRM
- Efficient sample re-use: cross-validation, bootstrapping
Choosing Model Complexity
Bias Variance Decomposition
Assumptions: $Y = f(X) + \epsilon$, $\epsilon \sim N(0, \sigma_\epsilon^2)$, squared-error loss
Error:
$\mathrm{Err}(x_0) = E[(Y - \hat{f}(x_0))^2 \mid X = x_0]$
$= \sigma_\epsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2$
$= \sigma_\epsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0))$
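The step from the conditional expectation to the three-term sum is not spelled out on the slide. One way to fill it in, assuming (as is standard) that the noise $\epsilon$ at $x_0$ is independent of the training sample used to build $\hat{f}$:

```latex
\begin{align*}
\mathrm{Err}(x_0)
 &= E\big[(f(x_0) + \epsilon - \hat{f}(x_0))^2\big] \\
 &= E[\epsilon^2] + 2\,E[\epsilon]\,E\big[f(x_0) - \hat{f}(x_0)\big]
    + E\big[(f(x_0) - \hat{f}(x_0))^2\big] \\
 &= \sigma_\epsilon^2
    + E\big[(f(x_0) - E\hat{f}(x_0) + E\hat{f}(x_0) - \hat{f}(x_0))^2\big] \\
 &= \sigma_\epsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2
    + E\big[\hat{f}(x_0) - E\hat{f}(x_0)\big]^2 .
\end{align*}
```

The middle term of the second line vanishes because $E[\epsilon] = 0$, and the cross term in the last step vanishes because $E[E\hat{f}(x_0) - \hat{f}(x_0)] = 0$.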
Choosing Model Complexity
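Bias Variance Decomposition: Special Cases
k-nearest neighbor fit:
$\mathrm{Err}(x_0) = \sigma_\epsilon^2 + \left[f(x_0) - \frac{1}{k}\sum_{l=1}^{k} f(x_{(l)})\right]^2 + \sigma_\epsilon^2 / k$
Linear model fit $\hat{f}_p(x) = \hat{\beta}^T x$:
$\mathrm{Err}(x_0) = \sigma_\epsilon^2 + [f(x_0) - E\hat{f}_p(x_0)]^2 + \|h(x_0)\|^2 \sigma_\epsilon^2$
Linear model family: further decomposition of the bias term for a fit $\hat{f}_\alpha$ in the family, where $\beta$ denotes the parameters of the best linear approximation to $f$ (the identity holds on average over $x_0$):
$[f(x_0) - E\hat{f}_\alpha(x_0)]^2 = [f(x_0) - \beta^T x_0]^2 + [\beta^T x_0 - E\hat{\beta}_\alpha^T x_0]^2$
$= [\text{Model Bias}]^2 + [\text{Estimation Bias}]^2$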
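The k-nearest-neighbor formula can be checked numerically. The sketch below compares the three-term expression with a Monte Carlo estimate of the prediction error at one point; the design points, the true function, and the noise level are all hypothetical choices for this example.

```python
import math
import random

def knn_terms(f, xs, x0, k, sigma):
    """Noise, squared bias and variance of a k-NN average at x0 (fixed design xs)."""
    nearest = sorted(xs, key=lambda x: abs(x - x0))[:k]
    noise = sigma ** 2
    bias2 = (f(x0) - sum(f(x) for x in nearest) / k) ** 2
    var = sigma ** 2 / k
    return noise, bias2, var

def knn_err_monte_carlo(f, xs, x0, k, sigma, n_rep=20000, seed=0):
    """Average squared error of the k-NN fit at x0 over repeated noisy samples."""
    rng = random.Random(seed)
    nearest = sorted(xs, key=lambda x: abs(x - x0))[:k]
    total = 0.0
    for _ in range(n_rep):
        # Fresh training responses at the k nearest design points
        fit = sum(f(x) + rng.gauss(0, sigma) for x in nearest) / k
        # Fresh test response at x0
        y0 = f(x0) + rng.gauss(0, sigma)
        total += (y0 - fit) ** 2
    return total / n_rep

xs = [i / 20 for i in range(21)]               # hypothetical fixed design on [0, 1]
f = lambda x: math.sin(2 * math.pi * x)        # hypothetical true function
x0, sigma = 0.25, 0.5

for k in (1, 5, 15):
    noise, bias2, var = knn_terms(f, xs, x0, k, sigma)
    simulated = knn_err_monte_carlo(f, xs, x0, k, sigma)
    print(k, round(noise + bias2 + var, 3), round(simulated, 3))
```

Increasing k lowers the variance term $\sigma_\epsilon^2 / k$ but averages over points further from $x_0$, which raises the squared bias.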
Bias and Variance
Bias-Variance Tradeoff in Model Complexity
Figure: Hastie et al. 7.2 (labels: model, sample variance, best model (*), total error, model bias, model variance, restricted model, estimation bias)
Choose the restricted model if $B_{\mathrm{est}} + B + \mathrm{Var}_{\mathrm{restricted}} < B + \mathrm{Var}$
Bias-Variance with 0-1 Loss
Assumptions: $Y = f(X) + \epsilon$, $\epsilon \sim N(0, \sigma_\epsilon^2)$, squared-error loss
Error:
$\mathrm{Err}(x_0) = E[(Y - \hat{f}(x_0))^2 \mid X = x_0]$
$= \sigma_\epsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2$
$= \sigma_\epsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0))$
Bias-Variance with 0-1 Loss: knn
Figures: Hastie et al. 7.3. Prediction error (red), bias$^2$ (green) and variance (blue)
Bias-Variance Tradeoff with 0-1 Loss: Regression
Figures: Hastie et al. 7.3. Prediction error (red), bias$^2$ (green) and variance (blue)
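Optimism
Training error: $\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{f}(x_i))$
True error: $\mathrm{Err} = E[L(Y, \hat{f}(X))]$
Model adapts to data: typically $\overline{\mathrm{err}} < \mathrm{Err}$
In-sample error: $\mathrm{Err}_{in} = \frac{1}{N} \sum_{i=1}^{N} E_y E_{Y^{new}} L(Y_i^{new}, \hat{f}(x_i))$
$Y_i^{new}$: new responses observed at each training point $x_i$, $i = 1, 2, \ldots, N$
Optimism: $\mathrm{op} \equiv \mathrm{Err}_{in} - E_y(\overline{\mathrm{err}})$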
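Estimating In-sample error using Optimism
$\mathrm{Err}_{in} = E_y(\overline{\mathrm{err}}) + \mathrm{op}$
For squared error, 0-1 loss, and other loss functions: $\mathrm{op} = \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$
The more tightly the data are fit, the higher the optimism
$\mathrm{Err}_{in} = E_y(\overline{\mathrm{err}}) + \frac{2}{N} \sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i)$
For a linear fit with $d$ inputs: $\sum_{i=1}^{N} \mathrm{Cov}(\hat{y}_i, y_i) = d\sigma_\epsilon^2$, so
$\mathrm{Err}_{in} = E_y(\overline{\mathrm{err}}) + 2\frac{d}{N}\sigma_\epsilon^2$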
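The covariance identity for a linear fit can be checked on a small case. The sketch below uses a straight-line fit with intercept, so $d = 2$ is assumed to count the intercept and the slope; the design points and the true mean are hypothetical. For a linear fit $\hat{y} = Hy$, $\mathrm{Cov}(\hat{y}_i, y_i) = H_{ii}\sigma_\epsilon^2$, so the sum of covariances is $\mathrm{trace}(H)\,\sigma_\epsilon^2$.

```python
import random

def leverages(xs):
    """Diagonal of the hat matrix H for a straight-line fit with intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]

def ols_fitted(xs, ys):
    """Fitted values of the least-squares line through (xs, ys)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    return [b0 + b1 * x for x in xs]

def sum_cov_monte_carlo(xs, sigma, n_rep=5000, seed=0):
    """Monte Carlo estimate of sum_i Cov(y_hat_i, y_i) over repeated responses."""
    rng = random.Random(seed)
    mean = [1.0 + 2.0 * x for x in xs]        # hypothetical true (linear) mean
    total = 0.0
    for _ in range(n_rep):
        eps = [rng.gauss(0, sigma) for _ in xs]
        ys = [m + e for m, e in zip(mean, eps)]
        fitted = ols_fitted(xs, ys)
        # The fit is unbiased here, so (y_hat_i - mean_i) * eps_i averages to the covariance
        total += sum((fh - m) * e for fh, m, e in zip(fitted, mean, eps))
    return total / n_rep

xs = [i / 10 for i in range(30)]
sigma, d = 1.0, 2
exact = sigma ** 2 * sum(leverages(xs))       # trace(H) * sigma^2 = d * sigma^2
approx = sum_cov_monte_carlo(xs, sigma)
optimism = 2 * exact / len(xs)                # (2/N) * d * sigma^2
print(exact, approx, optimism)
```

The exact value depends only on the number of fitted parameters, not on where the design points sit, which is what makes the $2\frac{d}{N}\sigma_\epsilon^2$ correction usable in practice.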
Estimates of in-sample prediction error
$C_p$ statistic: estimate $\hat{\sigma}_\epsilon^2$ from a low-bias model, then
$C_p = \overline{\mathrm{err}} + 2\frac{d}{N}\hat{\sigma}_\epsilon^2$
For logistic regression, using the binomial log-likelihood:
$\overline{\mathrm{err}} = -\frac{2}{N}\mathrm{loglik}$
$\widehat{\mathrm{Err}}_{in} = -\frac{2}{N}\mathrm{loglik} + 2\frac{d}{N}$ (AIC)
Estimates of in-sample prediction error (contd)
Akaike Information Criterion (AIC): maximizing the likelihood is equivalent to minimizing the negative log-likelihood
For a Gaussian model, AIC = $C_p$
In general, for a family of models with tuning parameter $\alpha$:
$\mathrm{AIC}(\alpha) = \overline{\mathrm{err}}(\alpha) + 2\frac{d(\alpha)}{N}\hat{\sigma}_\epsilon^2$
$d(\alpha)$ is the effective number of parameters, e.g. $d(S) = \mathrm{trace}(S)$ for a linear fit $\hat{y} = Sy$
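To make the effective number of parameters concrete, the sketch below builds the smoother matrix $S$ of a k-nearest-neighbor average (a hypothetical choice of linear fit, not one named on the slide) and takes its trace. Each point is its own nearest neighbor, so every diagonal entry is $1/k$ and $\mathrm{trace}(S) = N/k$.

```python
def knn_smoother_matrix(xs, k):
    """Row i holds the weights that turn y into the k-NN average at xs[i]."""
    n = len(xs)
    S = [[0.0] * n for _ in range(n)]
    for i, x0 in enumerate(xs):
        order = sorted(range(n), key=lambda j: abs(xs[j] - x0))
        for j in order[:k]:
            S[i][j] = 1.0 / k
    return S

def trace(S):
    return sum(S[i][i] for i in range(len(S)))

def aic(err_bar, d_eff, n, sigma2_hat):
    """AIC(alpha) = err_bar(alpha) + 2 * d(alpha) / N * sigma_hat^2."""
    return err_bar + 2.0 * d_eff / n * sigma2_hat

xs = [float(i) for i in range(20)]            # hypothetical distinct design points
d_eff = {k: trace(knn_smoother_matrix(xs, k)) for k in (1, 4, 10)}
print(d_eff)
```

A small k behaves like a model with many parameters and a large k like one with few, even though k itself is a single tuning parameter.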
AIC in model selection
Figures: Hastie et al. 7.4
Pick the model with the smallest AIC
More Estimates of in-sample prediction error
Bayes Information Criterion (BIC): $\mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\, d$
Penalizes complexity more heavily than $\mathrm{AIC} = -\frac{2}{N}\mathrm{loglik} + 2\frac{d}{N}$ (once both are put on the same scale, the per-parameter penalty is $\log N$ versus 2, so BIC is heavier whenever $N > e^2 \approx 7.4$)
Asymptotically optimal: picks the correct model (if it lies in the family) as $N \to \infty$
Minimum Description Length: formally the same as BIC, motivated by information theory
Description length: choose the model attaining arg min [len(encoded message) + len(encoding parameters)]
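The difference in penalties is easy to see on numbers. In this sketch the two candidate models and their maximized log-likelihoods are invented purely for illustration.

```python
import math

def aic(loglik, d, n):
    """AIC = -(2/N) loglik + 2 d / N."""
    return -2.0 / n * loglik + 2.0 * d / n

def bic(loglik, d, n):
    """BIC = -2 loglik + log(N) d."""
    return -2.0 * loglik + math.log(n) * d

n = 100
# Hypothetical candidates: number of parameters d -> maximized log-likelihood
candidates = {2: -150.0, 5: -145.5}

best_aic = min(candidates, key=lambda d: aic(candidates[d], d, n))
best_bic = min(candidates, key=lambda d: bic(candidates[d], d, n))
print("AIC picks d =", best_aic)
print("BIC picks d =", best_bic)
```

With these made-up values the gain in log-likelihood from three extra parameters is enough to pay the AIC penalty but not the heavier BIC penalty, so the two criteria disagree.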
Estimates of Model Complexity: # of Parameters
$Y = \beta_0 + \beta_1 X$
$Y = I(\sin(\alpha_1 x + \alpha_0))$
Estimates of Model Complexity: Vapnik-Chervonenkis Dimension
The VC dimension of the class $\{f(x, \alpha)\}$ is defined to be the largest number of points (in some configuration) that can be shattered by members of $\{f(x, \alpha)\}$.
Shatter
A set of points is said to be shattered by a class of functions if, for any binary labeling, a member of the class can perfectly separate them.
Max points shattered: 3
Max points shattered:
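The definition can be tried out on small point sets. The sketch below assumes the class of functions is straight lines in the plane (an assumption; the slide's figure is not reproduced in the transcription) and uses a simple grid search over line directions rather than an exact separability test. The point sets are hypothetical.

```python
import math
from itertools import product

def separable(points, labels, n_angles=720):
    """Grid search over directions w: True if some w gives every +1 point
    a strictly larger projection than every -1 point."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == -1]
    if not pos or not neg:
        return True   # a line placed far from all points handles one-class labelings
    for step in range(n_angles):
        theta = 2 * math.pi * step / n_angles
        w = (math.cos(theta), math.sin(theta))

        def proj(p):
            return w[0] * p[0] + w[1] * p[1]

        if min(proj(p) for p in pos) > max(proj(p) for p in neg):
            return True
    return False

def shattered(points):
    """True if every binary labeling of the points is separable by a line."""
    labelings = product([-1, 1], repeat=len(points))
    return all(separable(points, labels) for labels in labelings)

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print("3 points shattered:", shattered(triangle))
print("4 points shattered:", shattered(square))
```

The square fails because of the labeling that puts opposite corners in the same class. This shows only that this particular four-point set is not shattered; the VC dimension statement requires that no four-point configuration can be.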
Example: Error of models picked by criteria relative to best model
Figures: Hastie et al. 7.7
References
Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001, pp tibs/elemstatlearn/
More informationMidterm Exam Solutions, Spring 2007
1-71 Midterm Exam Solutions, Spring 7 1. Personal info: Name: Andrew account: E-mail address:. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationStatistical and Computational Learning Theory
Statistical and Computational Learning Theory Fundamental Question: Predict Error Rates Given: Find: The space H of hypotheses The number and distribution of the training examples S The complexity of the
More informationFocused fine-tuning of ridge regression
Focused fine-tuning of ridge regression Kristoffer Hellton Department of Mathematics, University of Oslo May 9, 2016 K. Hellton (UiO) Focused tuning May 9, 2016 1 / 22 Penalized regression The least-squares
More informationIntroduction to Machine Learning and Cross-Validation
Introduction to Machine Learning and Cross-Validation Jonathan Hersh 1 February 27, 2019 J.Hersh (Chapman ) Intro & CV February 27, 2019 1 / 29 Plan 1 Introduction 2 Preliminary Terminology 3 Bias-Variance
More information