Cross-validation for detecting and preventing overfitting
Cross-validation for detecting and preventing overfitting

Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
awm@cs.cmu.edu

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http:// Comments and corrections gratefully received.

Copyright 2001, Andrew W. Moore. Oct 5th, 2001.

A Regression Problem

y = f(x) + noise. Can we learn f from this data? Let's consider three methods...

Univariate Linear Regression with a Constant Term

Given data pairs (x_1, y_1), ..., (x_R, y_R), augment each input with a constant term: z_k = (1, x_k). Stack the z_k as rows of a matrix Z and the outputs as a vector Y. Then

    beta = (Z^T Z)^{-1} (Z^T Y)

and the prediction is y_est = beta_0 + beta_1 x.

(Originally discussed in the previous Andrew Lecture: Neural Nets.)
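The fit above can be sketched in a few lines of numpy. The data values below are illustrative toy numbers of my own, not the slides' dataset:

```python
import numpy as np

# Toy data: y = f(x) + noise (illustrative values, roughly y = 2x + 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.9, 5.1, 7.2, 8.8, 11.1])

# Augment each input with a constant term: z_k = (1, x_k)
Z = np.column_stack([np.ones_like(x), x])

# Normal equations: beta = (Z^T Z)^{-1} (Z^T Y)
beta = np.linalg.solve(Z.T @ Z, Z.T @ y)

y_est = beta[0] + beta[1] * x  # y_est = beta_0 + beta_1 * x
```

The quadratic regression of the next slides reuses exactly this machinery; only the basis changes, to `Z = np.column_stack([np.ones_like(x), x, x**2])`.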
Quadratic Regression

Same data, but now each input is mapped to z_k = (1, x_k, x_k^2). As before, beta = (Z^T Z)^{-1} (Z^T Y), and the prediction is y_est = beta_0 + beta_1 x + beta_2 x^2. (Much more about this in the future Andrew Lecture: Favorite Regression Algorithms.)

Join-the-dots

Also known as piecewise linear nonparametric regression, if that makes you feel better.

Which is best?

Why not choose the method with the best fit to the data?

What do we really want?

Why not choose the method with the best fit to the data? Because what we really care about is: how well are you going to predict future data drawn from the same distribution?
The test set method

1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

(Figures: each fitted curve is shown against the held-out test points together with its test-set Mean Squared Error; the quadratic regression example scores MSE = 0.9.)

The test set method: good news and bad news

Good news:
- Very very simple.
- Can then simply choose the method with the best test-set score.

Bad news (what's the downside?):
- Wastes data: we get an estimate of the best method to apply to 30% less data.
- If we don't have much data, our test set might just be lucky or unlucky.

We say the test-set estimator of performance has high variance.
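A minimal sketch of the four steps, assuming a linear fit and toy data of my own:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = f(x) + noise (illustrative only)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=50)

# Steps 1-2: randomly choose 30% of the data to be a test set;
# the remainder is the training set.
idx = rng.permutation(len(x))
n_test = int(0.3 * len(x))
test, train = idx[:n_test], idx[n_test:]

# Step 3: perform the regression on the training set only.
Z = np.column_stack([np.ones_like(x[train]), x[train]])
beta = np.linalg.solve(Z.T @ Z, Z.T @ y[train])

# Step 4: estimate future performance as mean squared error on the test set.
y_pred = beta[0] + beta[1] * x[test]
mse = np.mean((y[test] - y_pred) ** 2)
```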
LOOCV (Leave-one-out Cross Validation)

For k = 1 to R:
1. Let (x_k, y_k) be the k-th record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining R-1 datapoints.
4. Note your error on (x_k, y_k).

When you've done all points, report the mean error. Linear regression example: MSE_LOOCV = 2.2
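The LOOCV loop above, sketched for the linear fit (the helper names are mine):

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y_est = b0 + b1*x via the normal equations."""
    Z = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(Z.T @ Z, Z.T @ y)

def loocv_mse(x, y):
    """Leave-one-out CV: hold out each record in turn, train on the rest."""
    errors = []
    for k in range(len(x)):
        mask = np.arange(len(x)) != k      # temporarily remove record k
        b = fit_linear(x[mask], y[mask])   # train on the remaining R-1
        pred = b[0] + b[1] * x[k]
        errors.append((y[k] - pred) ** 2)  # note the error on (x_k, y_k)
    return np.mean(errors)                 # report the mean error
```

On perfectly linear data every leave-one-out fit recovers the line exactly, so the LOOCV error is (numerically) zero.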
LOOCV for Quadratic Regression

(Same procedure: for k = 1 to R, temporarily remove (x_k, y_k), train on the remaining R-1 datapoints, note the error on (x_k, y_k); report the mean error.) MSE_LOOCV = 0.962

LOOCV for Join-the-dots

(Same procedure; the figure shows the leave-one-out fits and the resulting MSE_LOOCV.)

Which kind of Cross Validation?

- Test set. Downside: variance: an unreliable estimate of future performance. Upside: cheap.
- Leave-one-out. Downside: expensive; has some weird behavior. Upside: doesn't waste data.

..can we get the best of both worlds?

k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we'll have k = 3 partitions, coloured red, green and blue).

- For the red partition: train on all the points not in the red partition. Find the test-set sum of errors on the red points.
- For the green partition: train on all the points not in the green partition. Find the test-set sum of errors on the green points.
- For the blue partition: train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Then report the mean error. Linear regression example: MSE_3FOLD = 2.05
k-fold Cross Validation, continued

Applying the same three-partition procedure to the other regressors: quadratic regression (the figure reports its MSE_3FOLD), and join-the-dots, MSE_3FOLD = 2.9.

Which kind of Cross Validation?

- Test set. Downside: variance: an unreliable estimate of future performance. Upside: cheap.
- Leave-one-out. Downside: expensive; has some weird behavior. Upside: doesn't waste data.
- 10-fold. Downside: wastes 10% of the data; 10 times more expensive than the test set method. Upside: only wastes 10%; only 10 times more expensive instead of R times.
- 3-fold. Downside: wastier than 10-fold; expensivier than the test set method. Upside: slightly better than the test set.
- R-fold. Identical to Leave-one-out.

(But note: one of Andrew's joys in life is algorithmic tricks for making these cheap.)
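The three-partition procedure generalizes directly to any k; a compact sketch for the linear fit (k defaults to 3 as in the slides' example; the names are mine):

```python
import numpy as np

def kfold_mse(x, y, k=3, seed=0):
    """k-fold CV: each partition takes a turn as the test set while
    the model is trained on all points not in that partition."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # random partitions
    sq_errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(x)), test)
        Z = np.column_stack([np.ones_like(x[train]), x[train]])
        b = np.linalg.solve(Z.T @ Z, Z.T @ y[train])
        sq_errors.extend((y[test] - (b[0] + b[1] * x[test])) ** 2)
    return np.mean(sq_errors)  # mean error over all held-out points
```

As with LOOCV, perfectly linear data gives a (numerically) zero 3-fold error, since each training fold recovers the line exactly.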
CV-based Model Selection

We're trying to decide which algorithm to use. We train each machine and make a table: for each candidate f_1, ..., f_6 we record i, f_i, TRAINERR and 10-FOLD-CV-ERR, and mark our choice: the f_i with the lowest 10-fold CV error. (The numeric entries appear only in the slide's table.)

Alternatives to CV-based model selection

Model selection methods:
1. Cross-validation
2. AIC (Akaike Information Criterion)
3. BIC (Bayesian Information Criterion)
4. VC-dimension (Vapnik-Chervonenkis Dimension): only directly applicable to choosing classifiers. Described in a future Andrew Lecture.

Which model selection method is best?

1. (CV) Cross-validation
2. AIC (Akaike Information Criterion)
3. BIC (Bayesian Information Criterion)
4. (SRMVC) Structural Risk Minimization with VC-dimension

- AIC, BIC and SRMVC have the advantage that you only need the training error.
- CV error might have more variance.
- SRMVC is wildly conservative.
- Asymptotically AIC and Leave-one-out CV should be the same.
- Asymptotically BIC and a carefully chosen k-fold should be the same.
- You want BIC if you want the best structure instead of the best predictor (e.g. for clustering or Bayes Net structure finding).
- Many alternatives, including proper Bayesian approaches. It's an emotional issue.

Other Cross-validation issues

- Can do "leave all pairs out" or "leave-all-n-tuples-out" if feeling resourceful.
- Some folks do k-folds in which each fold is an independently-chosen subset of the data.
- Do you know what AIC and BIC are? If so... LOOCV behaves like AIC asymptotically, and k-fold behaves like BIC if you choose k carefully. If not... nardely nardely noo noo.

Cross-validation for regression: example uses

- Choosing the number of hidden units in a neural net
- Feature selection (see later)
- Choosing a polynomial degree
- Choosing which regressor to use
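One concrete instance of CV-based model selection is choosing a polynomial degree by 10-fold CV. In this sketch the candidate "machines" f_1, ..., f_6 are polynomial fits of degree 1 to 6 (a stand-in of my own, not the slides' models), and we pick the degree with the lowest CV error:

```python
import numpy as np

def cv_error_poly(x, y, degree, k=10, seed=0):
    """10-fold CV error of a polynomial fit of the given degree
    (np.polyfit / np.polyval play the role of the machines f_i)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(x)), test)
        coef = np.polyfit(x[train], y[train], degree)
        errs.extend((y[test] - np.polyval(coef, x[test])) ** 2)
    return np.mean(errs)

# Build the model-selection table: CV error per candidate degree,
# then choose the candidate with the lowest 10-fold CV error.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60)
y = x**2 - 2 * x + rng.normal(0, 0.5, 60)  # true f is quadratic
scores = {d: cv_error_poly(x, y, d) for d in range(1, 7)}
best = min(scores, key=scores.get)
```

On this toy data the degree-1 model underfits badly, so its CV error is far above the quadratic's, and CV selects a degree of at least 2.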
Supervising Gradient Descent

This is a weird but common use of test-set validation. Suppose you have a neural net with too many hidden units. It will overfit. As gradient descent progresses, maintain a graph of MSE-test-error vs. iteration.

(Figure: Mean Squared Error vs. Iteration of Gradient Descent, with one curve for the training set and one for the test set; use the weights you found on the iteration where the test-set curve reaches its minimum.)
Relies on an intuition that a not-fully-minimized set of weights is somewhat like having fewer parameters. Works pretty well in practice, apparently.

Cross-validation for classification

Instead of computing the sum squared errors on a test set, you should compute the total number of misclassifications on a test set. But there's a more sensitive alternative: compute log P(all test outputs | all test inputs, your model).

Cross-validation for classification: example uses

- Choosing the pruning parameter for decision trees
- Feature selection (see later)
- What kind of Gaussian to use in a Gaussian-based Bayes Classifier
- Choosing which classifier to use

Cross-validation for density estimation

Compute the sum of log-likelihoods of test points.

Example uses:
- Choosing what kind of Gaussian assumption to use
- Choosing the density estimator
- NOT feature selection (test-set density will almost always look better with fewer features)
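The two classification scores can be computed side by side. A sketch for a binary classifier that outputs P(y=1 | x) (the function name is mine):

```python
import numpy as np

def zero_one_and_loglik(p_pred, y_true):
    """Two test-set scores for a binary classifier that outputs P(y=1|x):
    the misclassification count, and the more sensitive test
    log-likelihood log P(all test outputs | all test inputs, model)."""
    p_pred = np.asarray(p_pred, dtype=float)
    y_true = np.asarray(y_true)
    misclassified = int(np.sum((p_pred >= 0.5) != (y_true == 1)))
    loglik = float(np.sum(np.where(y_true == 1,
                                   np.log(p_pred),
                                   np.log1p(-p_pred))))
    return misclassified, loglik
```

The log-likelihood is "more sensitive" in the sense that two models with identical misclassification counts can still be distinguished: the one that assigns higher probability to the correct labels scores a higher log-likelihood.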
Feature Selection

Suppose you have a learning algorithm LA and a set of input attributes { X_1, X_2, ..., X_m }. You expect that LA will only find some subset of the attributes useful. Question: how can we use cross-validation to find a useful subset?

Four ideas:
- Forward selection
- Backward elimination
- Hill climbing
- Stochastic search (Simulated Annealing or GAs)

(Another fun area in which Andrew has spent a lot of his wild youth.)

Very serious warning

Intensive use of cross-validation can overfit. How? Imagine a dataset with 50 records and 1000 attributes. You try 1000 linear regression models, each one using one of the attributes. The best of those 1000 looks good! But you realize it would have looked good even if the output had been purely random!

What can be done about it? Hold out an additional test set before doing any model selection. Check the best model performs well even on the additional test set.
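Forward selection, the first of the four ideas above, can be sketched on top of a k-fold scorer. This is a greedy sketch under my own naming and toy data, not the lecture's implementation:

```python
import numpy as np

def cv_mse(X, y, k=5, seed=0):
    """k-fold CV mean squared error of a linear fit on feature matrix X."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        Z = np.column_stack([np.ones(len(train)), X[train]])
        b, *_ = np.linalg.lstsq(Z, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ b
        errs.extend((y[test] - pred) ** 2)
    return np.mean(errs)

def forward_selection(X, y):
    """Greedily add the attribute that most improves CV error;
    stop when no remaining attribute helps."""
    chosen, remaining = [], list(range(X.shape[1]))
    best = cv_mse(X[:, []], y)  # intercept-only baseline
    while remaining:
        trial = {j: cv_mse(X[:, chosen + [j]], y) for j in remaining}
        j_best = min(trial, key=trial.get)
        if trial[j_best] >= best:
            break  # no attribute improves the CV score: stop
        best = trial[j_best]
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen

# Toy usage: only attributes 0 and 3 actually drive the output.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=80)
sel = forward_selection(X, y)
```

Note that, per the warning above, the CV scores used to pick `sel` are no longer an unbiased estimate of its future performance; an additional held-out test set is needed for that.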
What you should know

- Why you can't use training-set error to estimate the quality of your learning algorithm on your data.
- Why you can't use training-set error to choose the learning algorithm.
- Test-set cross-validation
- Leave-one-out cross-validation
- k-fold cross-validation
- Feature selection methods
- CV for classification, regression & densities
More informationPAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning Matt Gormley Lecture 14 March 5, 2018 1 ML Big Picture Learning Paradigms:
More informationArtificial Neural Networks 2
CSC2515 Machine Learning Sam Roweis Artificial Neural s 2 We saw neural nets for classification. Same idea for regression. ANNs are just adaptive basis regression machines of the form: y k = j w kj σ(b
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationINF Anne Solberg One of the most challenging topics in image analysis is recognizing a specific object in an image.
INF 4300 700 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter -6 6inDuda and Hart: attern Classification 303 INF 4300 Introduction to classification One of the most challenging
More informationAnnouncements - Homework
Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35
More informationINTRODUCTION TO DIOPHANTINE EQUATIONS
INTRODUCTION TO DIOPHANTINE EQUATIONS In the earl 20th centur, Thue made an important breakthrough in the stud of diophantine equations. His proof is one of the first eamples of the polnomial method. His
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationCPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017
CPSC 340: Machine Learning and Data Mining Stochastic Gradient Fall 2017 Assignment 3: Admin Check update thread on Piazza for correct definition of trainndx. This could make your cross-validation code
More informationClassification and Regression Trees
Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity
More informationThe Perceptron algorithm
The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationResampling techniques for statistical modeling
Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error
More informationFIND A FUNCTION TO CLASSIFY HIGH VALUE CUSTOMERS
LINEAR CLASSIFIER 1 FIND A FUNCTION TO CLASSIFY HIGH VALUE CUSTOMERS x f y High Value Customers Salary Task: Find Nb Orders 150 70 300 100 200 80 120 100 Low Value Customers Salary Nb Orders 40 80 220
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More informationSupport Vector Machines and Kernel Methods
Support Vector Machines and Kernel Methods Geoff Gordon ggordon@cs.cmu.edu July 10, 2003 Overview Why do people care about SVMs? Classification problems SVMs often produce good results over a wide range
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More information1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure
Overview Breiman L (2001). Statistical modeling: The two cultures Statistical learning reading group Aleander L 1 Histor of statistical/machine learning 2 Supervised learning 3 Two approaches to supervised
More informationNon-Convex Optimization. CS6787 Lecture 7 Fall 2017
Non-Convex Optimization CS6787 Lecture 7 Fall 2017 First some words about grading I sent out a bunch of grades on the course management system Everyone should have all their grades in Not including paper
More informationCOMS 4771 Regression. Nakul Verma
COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the
More informationFigure 1: Visualising the input features. Figure 2: Visualising the input-output data pairs.
Regression. Data visualisation (a) To plot the data in x trn and x tst, we could use MATLAB function scatter. For example, the code for plotting x trn is scatter(x_trn(:,), x_trn(:,));........... x (a)
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More information