Prediction problems 3: Validation and Model Checking
1 Prediction problems 3: Validation and Model Checking
Data Science 101 Team, May 17, 2018
2 Outline
- Validation
  - Why is it important?
  - How should we do it?
- Model checking
  - Checking whether your model is a good fit to the data
  - What to do if it is not?
3 Using a very powerful model
Let us do a small experiment predicting handwritten digits again (4 versus 9), where x is the vector of pixel values in an image and $y \in \{-1, 1\}$.
Idea: why use just the pixel values? Why not squares or other powers of the pixel values? Use a more powerful prediction model, with vectors $\beta^{(1)}$, $\beta^{(2)}$, and $\beta^{(3)}$ that weight powers of pixel intensity:
$$\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta^{(1)}_j x_j + \underbrace{\sum_{j=1}^{p} \beta^{(2)}_j x_j^2}_{\text{quadratic terms}} + \underbrace{\sum_{j=1}^{p} \beta^{(3)}_j x_j^3}_{\text{cubic terms}}$$
4 Generating polynomials in R
    Xtrain.powers = cbind(Xtrain, Xtrain * Xtrain, Xtrain * Xtrain * Xtrain)
    Xtest.powers = cbind(Xtest, Xtest * Xtest, Xtest * Xtest * Xtest)
    poly.reg = lm(ytrain ~ Xtrain.powers)
What is your hypothesis about how this model will do?
- Error on the training data?
- Error on the testing data?
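The same expansion can be written once for any maximum degree instead of spelling out each product. This is a minimal sketch, assuming the features sit in a numeric matrix; the helper name `power.features` and the toy matrix `X.toy` are hypothetical and not part of the slides. In R, `^` on a matrix is elementwise, so `X^k` raises each pixel value to the k-th power.

```r
# Hypothetical helper: stack elementwise powers 1..degree of a numeric matrix
# side by side, so column block k holds X^k (same layout as Xtrain.powers).
power.features <- function(X, degree) {
  do.call(cbind, lapply(seq_len(degree), function(k) X^k))
}

# Toy 2 x 2 "image" matrix standing in for Xtrain
X.toy <- matrix(c(0.5, 2, 1, 3), nrow = 2)
P <- power.features(X.toy, 3)
dim(P)  # 2 rows, 2 * 3 = 6 columns
```

With `degree = 3` this reproduces `cbind(X, X * X, X * X * X)`, so `power.features(Xtrain, 3)` would play the role of `Xtrain.powers`.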
5 Experiments with a powerful model on digit recognition
    ## Errors in training
    cat(paste("Training data: ", sum(sign(poly.reg$fitted.values) != ytrain),
              " mistakes of ", length(ytrain), " data points\n", sep = ""))
Output: Training data: 0 mistakes of 1296 data points
Now, the moment of truth: how does our super classifier work?
    beta.0 = poly.reg$coefficients[1]
    beta = poly.reg$coefficients[2:length(poly.reg$coefficients)]
    test.pred = Xtest.powers %*% beta + beta.0
    ## Now, let's find a few example mistakes
    mistakes = which(sign(test.pred) != ytest)
    cat(paste("Test data: ", length(mistakes), " mistakes of ", length(ytest),
              " data points (", round(100 * length(mistakes)/length(ytest), digits = 1),
              "% error)\n", sep = ""))
Output: Test data: 29 mistakes of 377 data points (7.7% error)
6 Experiments with a simple model on digit recognition
    linreg = lm(ytrain ~ Xtrain)
    ## Errors in training
    mistakes = which(sign(linreg$fitted.values) != ytrain)
    cat(paste("Training data: ", length(mistakes), " mistakes of ", length(ytrain),
              " data points (", round(100 * length(mistakes)/length(ytrain), digits = 1),
              "% error)\n", sep = ""))
Output: Training data: 7 mistakes of 1296 data points (0.5% error)
7 What about our old standby simple classifier?
    beta.0 = linreg$coefficients[1]
    beta = linreg$coefficients[2:length(linreg$coefficients)]
    test.pred = Xtest %*% beta + beta.0
    ## Now, let's find a few example mistakes
    mistakes = which(sign(test.pred) != ytest)
    cat(paste("Test data: ", length(mistakes), " mistakes of ", length(ytest),
              " data points (", round(100 * length(mistakes)/length(ytest), digits = 1),
              "% error)\n", sep = ""))
Output: Test data: 11 mistakes of 377 data points (2.9% error)
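Slides 5 to 7 repeat the same count-and-print pattern. A minimal sketch of a reusable version, assuming real-valued scores and labels in {-1, 1}; the helper name `report.errors` and the toy vectors are hypothetical.

```r
# Hypothetical helper: count sign mistakes of a real-valued score against
# labels in {-1, 1} and print them in the same format as the slides.
report.errors <- function(score, y, label) {
  mistakes <- which(sign(score) != y)
  rate <- 100 * length(mistakes) / length(y)
  cat(paste(label, ": ", length(mistakes), " mistakes of ", length(y),
            " data points (", round(rate, digits = 1), "% error)\n", sep = ""))
  invisible(length(mistakes))
}

# Toy check: the 3rd and 4th scores have the wrong sign
score <- c(0.8, -1.2, 0.3, -0.1, 2.0)
y     <- c(1,   -1,   -1,   1,   1)
n.wrong <- report.errors(score, y, "Toy data")
```

On the slides' objects this would be called as `report.errors(test.pred, ytest, "Test data")`. Because `sign(0)` is 0, a score of exactly zero counts as a mistake, which matches the slides' `which(sign(...) != y)` logic.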
8 Validation
How can we check how good our classifier is?
- The basic goal in prediction (machine learning) is to do well on future data.
- Often, the best source of "future" data is to hold some of the training data out as a test set or validation set.
Why do we do this?
- To avoid overfitting: forcing our model to match our training data too closely.
- To confirm that we are making reasonable predictions.
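Holding data out amounts to a random split made before any fitting. Below is a minimal sketch on simulated regression data; the 25% validation fraction and all variable names are assumptions for illustration, since the slides do not specify how the digit data were split.

```r
# Minimal hold-out split: randomly send a fraction of the rows to a
# validation set that the model never sees during fitting.
set.seed(101)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

val.frac <- 0.25                                   # assumption: 25% held out
val.idx  <- sample(n, size = round(val.frac * n))  # random validation rows
train <- dat[-val.idx, ]
valid <- dat[val.idx, ]

fit <- lm(y ~ x, data = train)                     # fit on training rows only
val.mse <- mean((valid$y - predict(fit, newdata = valid))^2)
```

The key design point is that `valid` is touched only after `fit` is final, so `val.mse` estimates error on genuinely unseen data.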
9 A little theory (the classification case)
Suppose we fit a model with parameters β on a training set. We keep out a validation (or test) set of size N, with pairs $(x_i, y_i)$ independent of the training data. For a classification problem, with very high probability, the validation error rate
$$\widehat{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{\hat{y}_i \neq y_i\}$$
is an accurate measure of the true error rate of our classifier on all future data, to within
$$\sqrt{\widehat{\mathrm{err}}/N} + 4/N.$$
10 Classification continued...
[Figure: validation error as a function of validation set size, for several random validation sets]
11 Revisiting validation of our models
For the simple classifier:
    Test data: 11 mistakes of 377 data points (2.9% error)
So the true error rate (on future images) is (likely) no more than $\sqrt{0.029/377} + 4/377 \approx 0.019$ better or worse than 2.9% error.
For the fancier classifier:
    Test data: 29 mistakes of 377 data points (7.7% error)
So the true error rate (on future images) is (likely) no more than $\sqrt{0.077/377} + 4/377 \approx 0.025$ better or worse than 7.7% error.
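The band arithmetic can be checked directly. A small sketch of the half-width formula from slide 9, applied to the two mistake counts above; the function name `error.band` is hypothetical.

```r
# Half-width of the slide's error band: sqrt(err.hat / N) + 4 / N,
# where err.hat is the validation error rate on N held-out points.
error.band <- function(mistakes, N) {
  err.hat <- mistakes / N
  half.width <- sqrt(err.hat / N) + 4 / N
  c(err.hat = err.hat, lower = err.hat - half.width,
    upper = err.hat + half.width, half.width = half.width)
}

simple <- error.band(11, 377)  # half.width about 0.019
fancy  <- error.band(29, 377)  # half.width about 0.025
round(rbind(simple, fancy), 3)
```

The simple classifier's band tops out near 0.049 while the fancier one's starts near 0.052, so under this bound the two bands do not overlap.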
12 What is going on?
Overfitting: we have overfit to our training data. When we use a model that is too powerful for the amount of data we have, we fit spurious junk.
    N = 10
    # Generate data that is nothing but random Normal noise
    y = 0.25 * rnorm(N)
    x = seq(0, 1, length.out = N)
[Figure: scatter plot of y against x]
Fit the noise with a model of the form
$$\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_9 x^9$$
13 Fitting a model that predicts the data perfectly
    X = cbind(x)
    for (ii in 2:(N - 1)) {
      X = cbind(X, x^ii)  # Add the next power, up to x^(N - 1)
    }
    polynomial.linreg = lm(y ~ X)
[Figure: the fitted polynomial plotted with the data, y against x]
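14 Generating a little more data
And yet, if we get a bit more data, it becomes clear we have overfit.
    # Assumed definitions (not shown on the slide): evaluation grid and fitted curve
    x.interp = seq(0, 1, length.out = 100)
    yhat = drop(cbind(1, outer(x.interp, 1:(N - 1), "^")) %*% polynomial.linreg$coefficients)
    y.additional = 0.25 * rnorm(100)
    plot(x, y, xlab = "x", pch = 21, ylim = c(-2, 2), ylab = "y", cex = 2)
    points(x, y, pch = 20)
    points(x.interp, y.additional, pch = 20)
    lines(c(0, 1), c(0, 0), col = "red", lwd = 2)
    lines(x.interp, yhat, col = "blue", lwd = 2)
[Figure: original and additional points, the true mean 0 (red), and the fitted polynomial (blue)]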
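The overfitting can also be quantified rather than eyeballed. This sketch mirrors the slides' pure-noise setup but swaps in orthogonal polynomials via `poly()` instead of raw powers, a substitution made for numerical stability; the seed and variable names are assumptions. It records the mean squared error on the 10 fitting points and on 100 fresh noise points for each degree.

```r
# Sketch: fit polynomials of increasing degree to N = 10 pure-noise points and
# compare the error on those points with the error on 100 fresh noise points.
set.seed(7)
N <- 10
x <- seq(0, 1, length.out = N)
y <- 0.25 * rnorm(N)
x.new <- seq(0, 1, length.out = 100)
y.new <- 0.25 * rnorm(100)

degrees <- 1:(N - 1)
errs <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d))  # orthogonal polynomial basis of degree d
  c(train = mean(residuals(fit)^2),
    fresh = mean((y.new - predict(fit, newdata = data.frame(x = x.new)))^2))
}))
rownames(errs) <- degrees
round(errs, 4)
```

Because the models are nested, training error can only go down as the degree grows, reaching essentially zero at degree 9 (10 coefficients for 10 points), while the error on fresh data does not share that guarantee.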
15 Overfitting and the bias-variance tradeoff
16 Model checking
Defining the model (according to George Box):
- "All models are wrong; some models are useful..."
- "Just as the ability to devise simple but evocative models is the signature of the great scientist, so overelaboration and overparameterization is often the mark of mediocrity."
What do we check for in a regression model?
- Are the assumptions reasonable?
- Have we chosen good features (x variables)?
We consider two diagnostics (also good for modeling):
- Residual plots
- Probability plots
17 Model checking: residual plots
Do we have good features? We make predictions
$$\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$
Consider the errors (residuals) in each prediction:
$$r = y - \hat{y} = y - \beta_0 - \sum_{j=1}^{p} \beta_j x_j$$
For each variable $j \in \{1, 2, \ldots, p\}$, we ask: is the assumption of linearity reasonable?
18 Model checking: what function should we use?
Idea: holding all other variables constant (but at their best fits), what is the right fit for variable j?
Plot the prediction errors $y - \hat{y}$, with the predicted contribution of variable $x_j$ added back, versus $x_j$. That is, plot
$$\underbrace{y - \hat{y}}_{\text{residual}} + \underbrace{\beta_j x_j}_{\text{component for } j} = y - \beta_0 - \sum_{k \neq j} \beta_k x_k \quad \text{vs.} \quad x_j$$
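The identity above can be verified numerically. This is a minimal sketch on simulated data with two predictors; the helper name `partial.residual` and the simulated coefficients are hypothetical, chosen only to exercise the formula.

```r
# Hypothetical helper: partial residuals for column j of a predictor matrix X
# from a fit of the form lm(y ~ X), i.e. (y - y.hat) + beta_j * x_j.
partial.residual <- function(fit, X, j) {
  beta <- coef(fit)[-1]                 # slopes only; drop the intercept
  unname(residuals(fit) + X[, j] * beta[j])
}

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
y <- 1 + 2 * X[, 1] - 3 * X[, 2] + 0.1 * rnorm(n)
fit <- lm(y ~ X)

pr1 <- partial.residual(fit, X, 1)
# Same quantity written as y - beta_0 - sum over k != j of beta_k x_k
b <- coef(fit)
alt <- unname(y - b[1] - X[, 2] * b[3])
# plot(X[, 1], pr1) draws the diagnostic for x1
```

A useful property: regressing `pr1` on `X[, 1]` with an intercept recovers the fitted slope for x1 exactly, because least-squares residuals are orthogonal to both the intercept and x1. Any visible curvature in the plot is therefore structure the linear term missed.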
19 Model checking example: finding the right functions
Generate data from the model
$$y = 2.1 x_1 + 3.5 x_2 - 1.9 x_3 + 2 x_4 + 1.5 x_1^2 + \varepsilon,$$
where ε is normal with mean 0 and variance 1/4.
    p = 4
    n = 100
    betas = c(2.1, 3.5, -1.9, 2, 1.5)
    x.samples = rnorm(n * p)
    X = matrix(x.samples, nrow = n, ncol = p)
    X.with.quadratic = cbind(X, X[, 1]^2)
    y = X.with.quadratic %*% betas + 0.5 * rnorm(n)  # sd 0.5, i.e. variance 1/4
We have generated n = 100 points from this distribution.
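As a sanity check on the simulation, fitting the correct model should recover the generating coefficients and a residual standard deviation near 0.5. A minimal self-contained sketch; the seed is an assumption (the slides do not fix one), and `fit.true` and `est` are illustrative names.

```r
# Self-contained version of the simulation, fitting the correct model
# (linear terms plus x1^2) and checking that the estimates land near betas.
set.seed(2018)  # assumption: any seed works; the slides do not fix one
p <- 4
n <- 100
betas <- c(2.1, 3.5, -1.9, 2, 1.5)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X.with.quadratic <- cbind(X, X[, 1]^2)
y <- drop(X.with.quadratic %*% betas) + 0.5 * rnorm(n)  # noise sd 0.5, variance 1/4

fit.true <- lm(y ~ X.with.quadratic)
est <- coef(fit.true)[-1]          # drop the intercept
round(cbind(true = betas, estimate = est), 2)
summary(fit.true)$sigma            # should be close to 0.5
```

`drop()` turns the one-column matrix product into a plain vector, which keeps `y` a simple numeric response.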
20 Model checking example: basic plots
Plot y against each coordinate of x:
    par(mfrow = c(2, 2))
    plot(X[, 1], y, xlab = "x1", ylab = "y")
    plot(X[, 2], y, xlab = "x2", ylab = "y")
    plot(X[, 3], y, xlab = "x3", ylab = "y")
    plot(X[, 4], y, xlab = "x4", ylab = "y")
[Figure: scatter plots of y against x1, x2, x3, and x4]
21 Model checking example: first diagnostic
Plot the partial residuals against each coordinate of x:
    linreg = lm(formula = y ~ X)
    betas = linreg$coefficients[2:(p + 1)]
    par(mfrow = c(2, 2))
    plot(X[, 1], linreg$residuals + X[, 1] * betas[1], xlab = "x1", ylab = "residual")
    plot(X[, 2], linreg$residuals + X[, 2] * betas[2], xlab = "x2", ylab = "residual")
    plot(X[, 3], linreg$residuals + X[, 3] * betas[3], xlab = "x3", ylab = "residual")
    plot(X[, 4], linreg$residuals + X[, 4] * betas[4], xlab = "x4", ylab = "residual")
[Figure: partial residuals plotted against x1, x2, x3, and x4]
22 Model checking example: adding a quadratic term in x1
Refit with the quadratic term, then plot the partial residuals against each coordinate of x again:
    X.with.quadratic = cbind(X, X[, 1]^2)
    linreg = lm(formula = y ~ X.with.quadratic)
    betas = linreg$coefficients[2:(p + 1)]
    par(mfrow = c(2, 2))
    plot(X[, 1], linreg$residuals + X[, 1] * betas[1], xlab = "x1", ylab = "residual")
    plot(X[, 2], linreg$residuals + X[, 2] * betas[2], xlab = "x2", ylab = "residual")
    plot(X[, 3], linreg$residuals + X[, 3] * betas[3], xlab = "x3", ylab = "residual")
    plot(X[, 4], linreg$residuals + X[, 4] * betas[4], xlab = "x4", ylab = "residual")
[Figure: partial residuals plotted against x1, x2, x3, and x4 after adding the quadratic term]
23 Model checking: do we have fidelity to the data?
The QQ ("quantile-quantile") plot is a plot of the quantiles of one distribution against those of another.
- Quantile of a distribution: for $\alpha \in [0, 1]$, $q_\alpha$ is the value $q$ such that $P(Y \le q) = \alpha$.
- The function qqnorm plots the quantiles of a distribution against those of a normal distribution.
- If things are normal, the qqnorm plot should look linear.
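The QQ construction can be reproduced by hand. This sketch pairs the sorted sample with normal quantiles at the plotting positions from `ppoints()`, which is how base R's `qqnorm` chooses its theoretical quantiles; the simulated sample and variable names are assumptions for illustration.

```r
# Build the normal QQ coordinates by hand: sort the sample, and pair the
# i-th smallest value with the normal quantile at probability ppoints(n)[i].
set.seed(3)
z <- rnorm(50)
n <- length(z)
theoretical <- qnorm(ppoints(n))   # normal quantiles q_alpha, alpha = ppoints(n)
sample.q    <- sort(z)             # empirical quantiles of the sample

# qqnorm() returns the same pairs (in the original data order) without plotting
qq <- qqnorm(z, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(qq$y[order(qq$x)], sample.q)
```

Since the sample here really is normal, the points `(theoretical, sample.q)` lie close to a straight line; a strongly bent pattern would signal heavy tails or skew.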
24 Model checking: do we have fidelity to the data?
Example 1: our simulation
    qqnorm(linreg$residuals / sd(linreg$residuals))
    lines(c(-2, 2), c(-2, 2), col = "red", lwd = 2)
[Figure: Normal Q-Q plot, sample quantiles against theoretical quantiles]
25 Model checking: does our model look good?
Example 2: Boston housing data set
    library(MASS)
    data(Boston)
    ## Remove sales recorded at medv = 50 ($50,000; medv is in $1000s): the value is a cap, so it acts as a category
    boston = Boston[Boston$medv != 50, ]
    fullregression = lm(formula = medv ~ ., data = boston)
    qqnorm(fullregression$residuals / sd(fullregression$residuals))
    lines(c(-3, 3), c(-3, 3), col = "red", lwd = 2)
[Figure: Normal Q-Q plot of the standardized residuals, sample quantiles against theoretical quantiles]
More informationMathematical Tools for Neuroscience (NEU 314) Princeton University, Spring 2016 Jonathan Pillow. Homework 8: Logistic Regression & Information Theory
Mathematical Tools for Neuroscience (NEU 34) Princeton University, Spring 206 Jonathan Pillow Homework 8: Logistic Regression & Information Theory Due: Tuesday, April 26, 9:59am Optimization Toolbox One
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationLinear Classifiers: Expressiveness
Linear Classifiers: Expressiveness Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Lecture outline Linear classifiers: Introduction What functions do linear classifiers express?
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationChapter 5. Transformations
Chapter 5. Transformations In Chapter 4, you learned ways to diagnose violations of model assumptions. What should you do if some assumptions are badly violated? Often, you can use transformations to solve
More information