Learning From Data Lecture 13 Validation and Model Selection
|
|
- Benedict Hicks
- 5 years ago
- Views:
Transcription
1 Learning From Data Lecture 13 Validation and Model Selection The Validation Set Model Selection Cross Validation M. Magdon-Ismail CSCI 4100/6100
2 recap: Regularization Regularization combats the effects of noise by putting a leash on the algorithm. E aug (h) = E in (h)+ λ N Ω(h) Ω(h) smooth, simple h noise is rough, complex. Different regularizers give different results can choose λ, the amount of regularization. λ = 0 λ = λ = 0.01 λ = 1 Data Target Fit y y y y x x x x Overfitting Underfitting Optimal λ balances approximation and generalization, bias and variance. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 2 /29 Peeking at E out
3 Validation: A Sneak Peek at E out E out (g) = E in (g)+overfit penalty }{{} VC bounds this using a complexity error bar for H regularization estimates this through a heuristic complexity penalty for g Validation goes directly for the jugular: E out (g) = E in (g)+overfit penalty. }{{} validation estimates this directly In-sample estimate of E out is the Holy Grail of learning from data. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 3 /29 Test set
4 The Test Set D (N data points) D test (K test points) E test is an estimate for E out (g) E Dtest [e k ] = E out (g) g g E out (g) e k = e(g(x k ),y k ) e 1,e 2,...,e K E test = 1 K K k=1 e k E[E test ] = 1 K = 1 K e 1,...,e K are independent K E[e k ] k=1 K E out (g)= E out (g) k=1 Var[E test ] = 1 K Var[e K 2 k ] k=1 = 1 K Var[e] տ decreases like 1 K bigger K = more reliable E test. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 4 /29 Validation set
5 The Validation Set D (N data points) D train (N K training points) D val (K validation points) 1. Remove K points from D g e k = e(g (x k ),y k ) e 1,e 2,...,e K 2. Learn using D train g. D = D train D val. g E val = 1 K K k=1 e k 3. Test g on D val E val. 4. Use error E val to estimate E out (g ). E out (g ) c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 5 /29 Validation
6 The Validation Set D (N data points) D train (N K training points) D val (K validation points) 1. Remove K points from D g e k = e(g (x k ),y k ) e 1,e 2,...,e K 2. Learn using D train g. D = D train D val. g E val = 1 K K k=1 e k 3. Test g on D val E val. 4. Use error E val to estimate E out (g ). E out (g ) c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 6 /29 Reliability of validation
7 The Validation Set D (N data points) E val is an estimate for E out (g ) E Dval [e k ] = E out (g ) D train (N K training points) g D val (K validation points) e k = e(g (x k ),y k ) e 1,e 2,...,e K E[E test ] = 1 K = 1 K K E[e k ] k=1 K E out (g )= E out (g ) k=1 g E out (g ) E val = 1 K K k=1 e k e 1,...,e K are independent Var[E val ] = 1 K Var[e K 2 k ] k=1 = 1 Var[e(g )] K տ decreases like 1 K depends on g, not H bigger K = more reliable E val? c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 7 /29 E val versus K
8 Sfrag Choosing K Expected Eval Size of Validation Set, K Rule of thumb: K = N 5. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 8 /29 Restoring D
9 Restoring D D (N) D train Primary goal: output best hypothesis. g was trained on all the data. (N K) Secondary goal: estimate E out (g). g is behind closed doors. g D val (K) E out (g) E out (g ) g E val (g ) E in (g) E val (g ) }{{} which should we use? CUSTOMER c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 9 /29 E val versus E in
10 E val Versus E in E out (g) E in (g)+o ( dvc N logn E out (g) E out (g ) E val (g )+O learning curve is decreasing (a practical truth, not a theorem) ւ Biased error bar depends on H. ) ( 1 K ) տ Unbiased error bar depends on g. E val (g) usually wins as an estimate for E out (g), especially when the learning curve is not steep. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 10 /29 Model Selection
11 Model Selection The most important use of validation H 1 H 2 H 3 H M D train g 1 g 2 g 3 g M D val E 1 c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 11 /29 Validation estimate for E out (g 1 )
12 Validation Estimate for (H 1,g 1 ) The most important use of validation H 1 H 2 H 3 H M D train g 1 D val E val (g 1 ) c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 12 /29 Call it E 1
13 Validation Estimate for (H 1,g 1 ) The most important use of validation H 1 H 2 H 3 H M D train g 1 D val E 1 c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 13 /29 Validation estimates E 1,...,E M
14 Compute Validation Estimates for All Models The most important use of validation H 1 H 2 H 3 H M D train g 1 g 2 g 3 g M D val E 1 E 2 E 3 E M c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 14 /29 Pick best validation error
15 Pick The Best Model According to Validation Error The most important use of validation H 1 H 2 H 3 H M D train g 1 g 2 g 3 g M D val E 1 E 2 E 3 E M c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 15 /29 Biased E val (g m )
16 E val (g m ) is not Unbiased For E out (g m ) 0.8 Expected Error E out (g m ) E val (g m )...because we choose one of the M finalists Validation Set Size, K E out (g m ) E val (g m )+O ( ) lnm K VC error bar for selecting a hypothesis from M using a data set of size K. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 16 /29 Restoring D
17 Restoring D H 1 H 2 H 3 H M g 1 g 2 g 3 g M E 1 Model with best g also has best g We can find model with best g using validation leap of faith true modulo E val error bar c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 17 /29 Comparing E in and E val
18 Comparing E in and E val for Model Selection 0.56 validation: g m H 1 H 2 H M D train in-sample: g m g 1 g 2 g M Expected Eout 0.52 validation: g m D val E 1 E E } 2 {{} M pick the best 0.48 optimal Validation Set Size, K D (H m, E m ) g m c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 18 /29 Selecting λ
19 Application to Selecting λ Which regularization parameter to use? λ 1,λ 2,...,λ M. This is a special case of model selection over M models, (H,λ 1 ) (H,λ 2 ) (H,λ 3 ) (H,λ M ) g 1 g 2 g 3 g M Picking a model amounts to chosing the optimal λ c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 19 /29 Tradeoff with K
20 The Dilemma When Choosing K Validation relies on the following chain of reasoning, E out (g) E out (g ) E val (g ) (small K) (large K) c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 20 /29 K = 1?
21 Can we get away with K = 1? Yes, almost! c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 21 /29 Leave one out
22 The Leave One Out Error (K = 1) y e 1 x g 1 E[e 1 ] = E out (g 1 )...but it is a wild estimate c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 22 /29 E cv
23 The Leave One Our Errors e 2 e 3 y y y e 1 x x x E[e 1 ] = E out (g 1 ) E cv = 1 N N n=1 e n c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 23 /29 CV is unbiased
24 Cross Validation is Unbiased Theorem. E cv is an unbiased estimate of Ēout(N 1). տ Expected E out when learning with N 1 points. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 24 /29 Reliability of E cv
25 Reliability of E cv e n and e m are not independent. e n depends on g n which was trained on (x m,y m ). e m is evaluated on (x m,y m ). e n E cv 1 N Effective number of fresh examples giving a comparable estimate of E out c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 25 /29 Computational considerations
26 Cross Validation is Computationally Intensive N epochs of learning each on a data set of size N 1. Analytic approaches, for example linear regression with weight decay w reg = (Z t Z+λI) 1 Z t y H(λ) = Z(Z t Z+λI) 1 Z t. E cv = 1 N N n=1 ( ) ŷn y 2 n 1 H nn (λ) 10-fold cross validation D {}}{ D 1 D 2 D 3 D 4 D 5 D 6 D 7 D 8 D 9 D 10 train validate train c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 26 /29 Restoring D
27 Restoring D D D 1 g 1 D 2 g 2 D N (x 1,y 1 ) (x 2,y 2 ) (x N,y N ) g N e 1 } e 2 {{ e N } take average ( E out (g (N) ) Ēout(N 1) E cv +O learning curve ) 1 N. nearly independent e n g E cv CUSTOMER E cv can be used for model selection just as E val, for example to choose λ. c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 27 /29 Digits
28 Digits Problem: 1 Versus Not E out Symmetry Error E cv 1 Not 1 # Features Used E in Average Intensity x = (1,x 1,x 2 ) z = (1,x 1,x 2,x 2 1,x 1 x 2,x 2 2,x 3 1,x 2 1x 2,x 1 x 2 2,x 3 2,...,x 5 1,x 4 1x 2,x 3 1x 2 2,x 2 1x 3 2,x 1 x 4 2,x 5 2) }{{} 5th order polynomial transform 20 dimensional non linear feature space c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 28 /29 Validation Wins
29 Validation Wins In the Real World Symmetry Symmetry Average Intensity Average Intensity no validation (20 features) E in = 0% E out = 2.5% cross validation (6 features) E in = 0.8% E out = 1.5% c AM L Creator: Malik Magdon-Ismail Validation and Model Selection: 29 /29
Learning From Data Lecture 12 Regularization
Learning From Data Lecture 12 Regularization Constraining the Model Weight Decay Augmented Error M. Magdon-Ismail CSCI 4100/6100 recap: Overfitting Fitting the data more than is warranted Data Target Fit
More informationLearning From Data Lecture 10 Nonlinear Transforms
Learning From Data Lecture 0 Nonlinear Transforms The Z-space Polynomial transforms Be careful M. Magdon-Ismail CSCI 400/600 recap: The Linear Model linear in w: makes the algorithms work linear in x:
More informationLearning From Data Lecture 25 The Kernel Trick
Learning From Data Lecture 25 The Kernel Trick Learning with only inner products The Kernel M. Magdon-Ismail CSCI 400/600 recap: Large Margin is Better Controling Overfitting Non-Separable Data 0.08 random
More informationLearning From Data Lecture 5 Training Versus Testing
Learning From Data Lecture 5 Training Versus Testing The Two Questions of Learning Theory of Generalization (E in E out ) An Effective Number of Hypotheses A Combinatorial Puzzle M. Magdon-Ismail CSCI
More informationLearning From Data Lecture 14 Three Learning Principles
Learning From Data Lecture 14 Three Learning Principles Occam s Razor Sampling Bias Data Snooping M. Magdon-Ismail CSCI 4100/6100 recap: Validation and Cross Validation Validation Cross Validation D (N)
More informationLearning From Data Lecture 8 Linear Classification and Regression
Learning From Data Lecture 8 Linear Classification and Regression Linear Classification Linear Regression M. Magdon-Ismail CSCI 4100/6100 recap: Approximation Versus Generalization VC Analysis E out E
More informationLearning From Data Lecture 7 Approximation Versus Generalization
Learning From Data Lecture 7 Approimation Versus Generalization The VC Dimension Approimation Versus Generalization Bias and Variance The Learning Curve M. Magdon-Ismail CSCI 4100/6100 recap: The Vapnik-Chervonenkis
More informationLearning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I
Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2
More informationLearning From Data Lecture 2 The Perceptron
Learning From Data Lecture 2 The Perceptron The Learning Setup A Simple Learning Algorithm: PLA Other Views of Learning Is Learning Feasible: A Puzzle M. Magdon-Ismail CSCI 4100/6100 recap: The Plan 1.
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationLearning From Data Lecture 3 Is Learning Feasible?
Learning From Data Lecture 3 Is Learning Feasible? Outside the Data Probability to the Rescue Learning vs. Verification Selection Bias - A Cartoon M. Magdon-Ismail CSCI 4100/6100 recap: The Perceptron
More informationLinear models and the perceptron algorithm
8/5/6 Preliminaries Linear models and the perceptron algorithm Chapters, 3 Definition: The Euclidean dot product beteen to vectors is the expression dx T x = i x i The dot product is also referred to as
More informationPreliminaries. Definition: The Euclidean dot product between two vectors is the expression. i=1
90 8 80 7 70 6 60 0 8/7/ Preliminaries Preliminaries Linear models and the perceptron algorithm Chapters, T x + b < 0 T x + b > 0 Definition: The Euclidean dot product beteen to vectors is the expression
More informationLow Bias Bagged Support Vector Machines
Low Bias Bagged Support Vector Machines Giorgio Valentini Dipartimento di Scienze dell Informazione Università degli Studi di Milano, Italy valentini@dsi.unimi.it Thomas G. Dietterich Department of Computer
More informationLearning From Data Lecture 26 Kernel Machines
Learning From Data Lecture 26 Kernel Machines Popular Kernels The Kernel Measures Similarity Kernels in Different Applications M Magdon-Ismail CSCI 4100/6100 recap: The Kernel Allows Us to Bypass Z-space
More informationLinear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1
Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationAn Introduction to Statistical Machine Learning - Theoretical Aspects -
An Introduction to Statistical Machine Learning - Theoretical Aspects - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny,
More informationORIE 4741: Learning with Big Messy Data. Generalization
ORIE 4741: Learning with Big Messy Data Generalization Professor Udell Operations Research and Information Engineering Cornell September 23, 2017 1 / 21 Announcements midterm 10/5 makeup exam 10/2, by
More informationFundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015
Fundamentals of Machine Learning Mohammad Emtiyaz Khan EPFL Aug 25, 25 Mohammad Emtiyaz Khan 24 Contents List of concepts 2 Course Goals 3 2 Regression 4 3 Model: Linear Regression 7 4 Cost Function: MSE
More informationMachine Learning. Model Selection and Validation. Fabio Vandin November 7, 2017
Machine Learning Model Selection and Validation Fabio Vandin November 7, 2017 1 Model Selection When we have to solve a machine learning task: there are different algorithms/classes algorithms have parameters
More informationLearning From Data Lecture 9 Logistic Regression and Gradient Descent
Learning From Data Lecture 9 Logistic Regression and Gradient Descent Logistic Regression Gradient Descent M. Magdon-Ismail CSCI 4100/6100 recap: Linear Classification and Regression The linear signal:
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationLecture 3: Introduction to Complexity Regularization
ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,
More informationcxx ab.ec Warm up OH 2 ax 16 0 axtb Fix any a, b, c > What is the x 2 R that minimizes ax 2 + bx + c
Warm up D cai.yo.ie p IExrL9CxsYD Sglx.Ddl f E Luo fhlexi.si dbll Fix any a, b, c > 0. 1. What is the x 2 R that minimizes ax 2 + bx + c x a b Ta OH 2 ax 16 0 x 1 Za fhkxiiso3ii draulx.h dp.d 2. What is
More informationORIE 4741: Learning with Big Messy Data. Train, Test, Validate
ORIE 4741: Learning with Big Messy Data Train, Test, Validate Professor Udell Operations Research and Information Engineering Cornell December 4, 2017 1 / 14 Exercise You run a hospital. A vendor wants
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationMachine learning - HT Basis Expansion, Regularization, Validation
Machine learning - HT 016 4. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford Feburary 03, 016 Outline Introduce basis function to go beyond linear regression Understanding
More informationIntroduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric
CS 189 Introduction to Machine Learning Fall 2017 Note 5 1 Overview Recall from our previous note that for a fixed input x, our measurement Y is a noisy measurement of the true underlying response f x):
More informationVBM683 Machine Learning
VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data
More informationUnderstanding Generalization Error: Bounds and Decompositions
CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationPerformance Evaluation
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Example:
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationCSC 411 Lecture 6: Linear Regression
CSC 411 Lecture 6: Linear Regression Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 06-Linear Regression 1 / 37 A Timely XKCD UofT CSC 411: 06-Linear Regression
More informationLearning from Data: Regression
November 3, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Classification or Regression? Classification: want to learn a discrete target variable. Regression: want to learn a continuous target variable. Linear
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of
More informationFoundations of Computer Science Lecture 14 Advanced Counting
Foundations of Computer Science Lecture 14 Advanced Counting Sequences with Repetition Union of Overlapping Sets: Inclusion-Exclusion Pigeonhole Principle Last Time To count complex objects, construct
More informationSVMs: nonlinearity through kernels
Non-separable data e-8. Support Vector Machines 8.. The Optimal Hyperplane Consider the following two datasets: SVMs: nonlinearity through kernels ER Chapter 3.4, e-8 (a) Few noisy data. (b) Nonlinearly
More informationFoundations of Computer Science Lecture 23 Languages: What is Computation?
Foundations of Computer Science Lecture 23 Languages: What is Computation? A Formal Model of a Computing Problem Decision Problems and Languages Describing a Language: Regular Expressions Complexity of
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationCSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015
CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin Predic5on of con5nuous variables Billionaire says: Wait, that s not what
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationINTRODUCTION TO PATTERN RECOGNITION
INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take
More informationTrain the model with a subset of the data. Test the model on the remaining data (the validation set) What data to choose for training vs. test?
Train the model with a subset of the data Test the model on the remaining data (the validation set) What data to choose for training vs. test? In a time-series dimension, it is natural to hold out the
More informationINTRODUCTION TO PATTERN
INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationBias-Variance in Machine Learning
Bias-Variance in Machine Learning Bias-Variance: Outline Underfitting/overfitting: Why are complex hypotheses bad? Simple example of bias/variance Error as bias+variance for regression brief comments on
More informationMachine Learning Foundations
Machine Learning Foundations ( 機器學習基石 ) Lecture 11: Linear Models for Classification Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More informationEnsemble Methods and Random Forests
Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?
Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what
More informationLecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher
Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationPDEEC Machine Learning 2016/17
PDEEC Machine Learning 2016/17 Lecture - Model assessment, selection and Ensemble Jaime S. Cardoso jaime.cardoso@inesctec.pt INESC TEC and Faculdade Engenharia, Universidade do Porto Nov. 07, 2017 1 /
More informationSTA 414/2104, Spring 2014, Practice Problem Set #1
STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,
More informationESS2222. Lecture 3 Bias-Variance Trade-off
ESS2222 Lecture 3 Bias-Variance Trade-off Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Bias-Variance Trade-off Overfitting & Regularization Ridge & Lasso Regression Nonlinear
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationDegrees of Freedom in Regression Ensembles
Degrees of Freedom in Regression Ensembles Henry WJ Reeve Gavin Brown University of Manchester - School of Computer Science Kilburn Building, University of Manchester, Oxford Rd, Manchester M13 9PL Abstract.
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationMachine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 4: SVM I Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d. from distribution
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationDimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 03: Multi-layer Perceptron Outline Failure of Perceptron Neural Network Backpropagation Universal Approximator 2 Outline Failure of Perceptron Neural Network Backpropagation
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 27 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationCSCI 5622 Machine Learning
CSCI 5622 Machine Learning DATE READ DUE Mon, Aug 31 1, 2 & 3 Wed, Sept 2 3 & 5 Wed, Sept 9 TBA Prelim Proposal www.rodneynielsen.com/teaching/csci5622f09/ Instructor: Rodney Nielsen Assistant Professor
More informationClassifier Evaluation. Learning Curve cleval testc. The Apparent Classification Error. Error Estimation by Test Set. Classifier
Classifier Learning Curve How to estimate classifier performance. Learning curves Feature curves Rejects and ROC curves True classification error ε Bayes error ε* Sub-optimal classifier Bayes consistent
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationIntroduction to Machine Learning and Cross-Validation
Introduction to Machine Learning and Cross-Validation Jonathan Hersh 1 February 27, 2019 J.Hersh (Chapman ) Intro & CV February 27, 2019 1 / 29 Plan 1 Introduction 2 Preliminary Terminology 3 Bias-Variance
More informationESS2222. Lecture 4 Linear model
ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationLinear Models. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Linear Models DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Linear regression Least-squares estimation
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationLinear classifiers: Overfitting and regularization
Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x
More informationLecture Slides for INTRODUCTION TO. Machine Learning. By: Postedited by: R.
Lecture Slides for INTRODUCTION TO Machine Learning By: alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml Postedited by: R. Basili Learning a Class from Examples Class C of a family car Prediction:
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationMachine Learning. Ensemble Methods. Manfred Huber
Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationHigh-Dimensional Statistical Learning: Introduction
Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationCSC 411: Lecture 02: Linear Regression
CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More information