CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition
1 CLUe Training: An Introduction to Machine Learning in R, with an example from handwritten digit recognition

Ad Feelders
Universiteit Utrecht, Department of Information and Computing Sciences
Algorithmic Data Analysis Group
May 2, 2017

Ad Feelders (Universiteit Utrecht) Machine Learning May 2, 2017 1 / 53
2 Terminology
1. Machine Learning
2. Statistical Learning
3. Data Mining
4. Pattern Recognition
5. ...
It's all about learning from data.
3 What is machine learning?
"The field of pattern recognition/machine learning is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories."
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer 2006, page 1.
4 Example: Handwritten Digit Recognition
[Figure: examples of handwritten digits as pixel images]
5 Example: Handwritten Digit Recognition
[Figure: a single digit shown on its pixel grid]
6 Machine Learning Approach
Use training data D = {(x_1, y_1), ..., (x_n, y_n)} of n labeled examples, and fit a model to the training data. This model can subsequently be used to predict the class (digit) for new input vectors x. The ability to correctly categorize new examples is called generalization.
7 Types of Learning Problems
Supervised Learning
- Numeric target: regression.
- Discrete unordered target: classification.
- Discrete ordered target: ordinal classification/regression; ranking.
Unsupervised Learning
- Clustering.
- Density estimation.
- Frequent pattern mining.
8 Linear Regression Model
The central assumption of linear regression is
E[y | x] = w_0 + w_1 x,
where E stands for expected value ("average"). Alternatively, we can write
y = w_0 + w_1 x + ε, with E[ε | x] = 0.
The observed y values are composed of a structural part, which is a (linear) function of x, and random noise.
9 Minimizing empirical error
Given training data D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, find the values of w_0 and w_1 such that the sum of squared errors is minimized:
SSE(w_0, w_1) = sum_{i=1}^{n} (y_i - (w_0 + w_1 x_i))^2,
where w_0 + w_1 x_i is the prediction for y_i.
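In R, lm() computes exactly this least-squares solution. A minimal sketch on toy data (assumed for illustration, not the course data set):

```r
# Minimal sketch (toy data): lm() finds the (w0, w1) that minimize the SSE.
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)

fit <- lm(y ~ x)        # least-squares fit: minimizes sum((y - (w0 + w1*x))^2)
w0 <- coef(fit)[1]
w1 <- coef(fit)[2]

# At the minimum of SSE(w0, w1) the partial derivatives are zero, which
# means the residuals sum to zero and are orthogonal to x:
res <- y - (w0 + w1 * x)
sum(res)        # numerically zero
sum(res * x)    # numerically zero
```

The two zero sums are just the first-order conditions of the SSE, so they hold for any least-squares fit, whatever the data.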
10 Example: the data generating process
[Figure: scatter plot of training data drawn from
y = sin(2πx) + ε, with ε ~ N(µ = 0, σ = 0.3)]
11 Fitting a linear model: large empirical error
[Figure: the same data with a fitted straight line; a line cannot follow the sine curve]
12 Fitting a third-order polynomial: just about right
[Figure: a cubic fit that follows the sine curve closely]
13 Fitting a ninth-order polynomial: zero error, but overfitting
[Figure: a ninth-order fit that passes through every training point]
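The three fits can be reproduced with lm() and poly(). A sketch assuming the same sine process, with a toy sample of 10 points and an orthogonal polynomial basis:

```r
# Sketch (toy sample, assumed setup): refit the three polynomial degrees on
# data from the sine process and compare the training SSE.
set.seed(42)
n <- 10
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, mean = 0, sd = 0.3)

# Training SSE of a degree-d polynomial fit (orthogonal basis via poly()).
sse <- function(d) {
  fit <- lm(y ~ poly(x, d))
  sum(residuals(fit)^2)
}

sse(1)   # large: a straight line cannot follow the sine
sse(3)   # much smaller: about the right flexibility
sse(9)   # essentially zero: 10 coefficients interpolate 10 points
```

The degree-9 training error is (numerically) zero, which is exactly the overfitting shown on the slide: zero empirical error says nothing about generalization.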
14 Lesson Learned
Minimizing empirical error may be a good way to fit the parameters of a single model, but it is not a good way to compare models of different complexity, as this would lead to overfitting and hence bad generalization. There are different ways to address this problem, for example: evaluate the predictive performance on data that was not used for training.
15-24 Cross-Validation: Training / Prediction
[Ten figure-only slides illustrating cross-validation round by round: in each round one part of the data is held out, the model is trained on the remaining parts, and predictions are made on the held-out part.]
25 K-fold cross-validation
1. Divide the data into K parts.
2. For each of the K parts:
   - Use the remaining K - 1 parts to train the model.
   - Predict on the part that was not used for training.
3. Compute the accuracy of the predictions.
All predictions are made on data that was not used for training!
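The recipe above fits in a few lines of base R. In this sketch the "model" is just the majority class of the training part, and the labels are a hypothetical toy vector:

```r
# Bare-bones K-fold cross-validation (toy setup: the "model" is simply
# the majority class of the training part).
set.seed(1)
y <- factor(rep(c(0, 1), times = c(60, 40)))      # toy labels: 60 zeroes, 40 ones
K <- 5
fold <- sample(rep(1:K, length.out = length(y)))  # random fold assignment

acc <- numeric(K)
for (k in 1:K) {
  train <- y[fold != k]                        # train on K - 1 parts
  test  <- y[fold == k]                        # held-out part
  majority <- names(which.max(table(train)))   # "fit" the model
  acc[k] <- mean(test == majority)             # predict on the held-out part
}
mean(acc)   # cross-validated accuracy: every prediction was out-of-sample
```

Replacing the majority-class line with any real fitting procedure (glm, rpart, ...) gives the general scheme.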
26 K-fold cross-validation: selecting a complexity parameter
C is a complexity parameter, for example the degree of the polynomial in the regression example.
1. Divide the data into K parts.
2. For each value c of C:
   - For each of the K parts:
     - Use the remaining K - 1 parts to train the model with C = c.
     - Predict on the part that was not used for training.
   - Compute the accuracy of the predictions with C = c.
3. Select c* as the value of C with the highest accuracy.
4. Train on the complete data with C = c*.
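A sketch of this procedure for the polynomial-degree example. The data and helper code here are hypothetical stand-ins, not the course script:

```r
# Sketch: select the polynomial degree by K-fold cross-validation
# (hypothetical data from the sine process used earlier).
set.seed(2)
n <- 30
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

K <- 5
fold <- sample(rep(1:K, length.out = n))
degrees <- 1:6

# Cross-validated mean squared error for each candidate degree.
cv_mse <- sapply(degrees, function(d) {
  mean(sapply(1:K, function(k) {
    tr <- data.frame(x = x[fold != k], y = y[fold != k])
    te <- data.frame(x = x[fold == k], y = y[fold == k])
    fit <- lm(y ~ poly(x, d), data = tr)
    mean((te$y - predict(fit, newdata = te))^2)
  }))
})

best <- degrees[which.min(cv_mse)]                               # step 3
final <- lm(y ~ poly(x, best), data = data.frame(x = x, y = y))  # step 4
```

For regression, cross-validated error (here MSE) plays the role that accuracy plays for classification; the lowest error picks c*.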
27 Logistic regression for binary classification
Code the 2 classes as 0 and 1 (the coding is arbitrary, but this coding is often convenient).
y ∈ {0, 1}: why not linear regression?
Logistic regression assumption:
E[y | x] = P(y = 1 | x) = e^(w_0 + w_1 x) / (1 + e^(w_0 + w_1 x))
and therefore
P(y = 0 | x) = 1 / (1 + e^(w_0 + w_1 x)),
since P(y = 0 | x) and P(y = 1 | x) should add up to one.
28 Logistic regression has a linear decision boundary
The log odds
ln( P(y = 1 | x) / P(y = 0 | x) ) = ln( e^(w_0 + w_1 x) ) = w_0 + w_1 x
is a linear function of x.
Both classes are equally probable when
P(y = 1 | x) / P(y = 0 | x) = 1, and therefore when ln( P(y = 1 | x) / P(y = 0 | x) ) = 0.
So the decision boundary is w_0 + w_1 x = 0.
29 Fitting the logistic regression function
The coefficients w_0 and w_1 are estimated by maximum likelihood. Except for some unlikely cases, there is a unique optimal solution. Plug in the estimates to get the fitted response function:
P̂(y = 1 | x) = e^(ŵ_0 + ŵ_1 x) / (1 + e^(ŵ_0 + ŵ_1 x))
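A quick check of this on toy simulated data (assumed for illustration): glm(..., family = binomial) produces the maximum-likelihood fit, and predict(type = "response") returns exactly the fitted logistic response above:

```r
# Sketch (toy data): maximum-likelihood logistic regression with glm(),
# verifying that the fitted response is the logistic function of w0 + w1*x.
set.seed(3)
x <- rnorm(200)
p <- 1 / (1 + exp(-(-0.5 + 2 * x)))   # true P(y = 1 | x)
y <- rbinom(200, size = 1, prob = p)

fit <- glm(y ~ x, family = binomial)  # maximum-likelihood estimation
w0 <- coef(fit)[1]
w1 <- coef(fit)[2]

phat <- predict(fit, type = "response")
manual <- exp(w0 + w1 * x) / (1 + exp(w0 + w1 * x))
max(abs(phat - manual))               # numerically zero: the two agree
```

Classifying as 1 whenever phat > 0.5 is the same as classifying on the sign of ŵ_0 + ŵ_1 x, which is the linear decision boundary of the previous slide.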
30 Analysis of the handwritten digit data
1. We have 42,000 examples of handwritten digits in the data frame mnist.dat.
2. The first column is the class label (digit); the remaining 784 columns are the pixel values.
3. Each class is approximately equally frequent.
We derive 2 features:
1. The amount of ink: the sum of the pixel values of a digit.
2. Horizontal symmetry: subtract the amount of ink in the right half from the amount of ink in the left half of the image.
31 Distribution of digits in the data
[Figure: bar chart of the frequency of each digit]
32 Feature: amount of ink
[Figure: distribution of the amount of ink per digit]
33 Feature: horizontal symmetry
[Figure: distribution of horizontal symmetry per digit]
34 Scatter plot of the sample of zeroes and ones
[Figure: horizontal symmetry against amount of ink for a sample of 0s and 1s]
35 Fitting a logistic regression model

# Fit a logistic regression model to the sample of zeroes and ones.
> digits.logreg <- glm(digit ~ ink + horsym, data = mnist.df[index.s, ],
                       family = "binomial")
# Give some relevant information about the fitted model.
> summary(digits.logreg)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...  ...e-05 ***
ink              ...        ...     ...  ...e-05 ***
horsym           ...        ...     ...      ... *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
36 Logistic Regression Decision Boundary
[Figure: the scatter plot of the two features with the fitted linear decision boundary]
37 Prediction with a logistic regression model

# Use the logistic regression model to make predictions on all zeroes and ones.
# The result is a vector of probabilities of digit 1.
> digits.logreg.pred <- predict(digits.logreg,
                                newdata = mnist.df[index.test, ],
                                type = "response")
# Make a so-called "confusion matrix" of the true class against the predicted class.
# We predict the class with the highest fitted probability.
> digits.logreg.confmat <- table(as.numeric(digits.logreg.pred > 0.5),
                                 mnist.df[index.test, 1])[1:2, 1:2]
# Display the confusion matrix.
> digits.logreg.confmat
# Compute the percentage correctly classified.
> sum(diag(digits.logreg.confmat)) / sum(digits.logreg.confmat)
[1]
38 Assignment 1: Logistic Regression
1. Go to: feeld/teaching.html, download the workspace, and load it into R. Open the script file on the webpage. Reproduce my analysis by copying the relevant lines from the script file, and entering them into R.
2. Perform a similar analysis, but now for digits 8 and 9. Make appropriate changes to the relevant commands in the script file.
39 Crash course in classification trees
1. Growing the tree:
   1. Split the data into two subsets using a test on a single predictor (for example, ink > t for some threshold t).
   2. Try all possible such tests, and choose the most informative one (biggest reduction of error on the training data).
   3. Split the two resulting subsets in a similar manner.
   4. Continue until some stopping condition is met (e.g. the subset has become too small).
2. Pruning the tree: consider pruned subtrees of the tree grown, and pick the one with the smallest cross-validated error.
3. Prediction: pass a new case down the tree, and predict the majority class of the leaf node where it ends up.
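Step 2 of growing ("choose the most informative test") can be made concrete on the loan data of the next slide. A sketch with a simple error-reduction score; the helper functions are mine, and the income values are those reconstructed from the (garbled) table:

```r
# Sketch (hypothetical helpers): score a candidate split by the fraction of
# training errors made when each child node predicts its majority class.
node_errors <- function(yy) {
  if (length(yy) == 0) return(0)
  length(yy) - max(table(yy))      # misclassified under majority vote
}
split_error <- function(left, right) {
  (node_errors(left) + node_errors(right)) / (length(left) + length(right))
}

# Loan data (incomes as reconstructed on the next slide).
income <- c(28, 32, 24, 27, 32, 30, 58, 52, 40, 28) * 1000
gender <- c("m", "f", "m", "m", "f", "f", "m", "m", "f", "f")
class  <- rep(c("bad", "good"), each = 5)

# The income split used in the tree: only 2 of 10 cases misclassified.
split_error(class[income <= 36000], class[income > 36000])   # 0.2

# A split on gender leaves both children impure: 4 of 10 misclassified.
split_error(class[gender == "m"], class[gender == "f"])      # 0.4
```

Trying every threshold on every predictor and keeping the split with the lowest score is exactly the greedy search described above (rpart uses an impurity measure rather than raw error, but the idea is the same).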
40 Example: Loan Data

Record  age  married?  own house  income  gender  class
1       22   no        no         28,000  male    bad
2       46   no        yes        32,000  female  bad
3       24   yes       yes        24,000  male    bad
4       25   no        no         27,000  male    bad
5       29   yes       yes        32,000  female  bad
6       45   yes       yes        30,000  female  good
7       63   yes       yes        58,000  male    good
8       36   yes       no         52,000  male    good
9       23   no        yes        40,000  female  good
10      50   yes       yes        28,000  female  good
41 Credit Scoring Tree
[Figure: classification tree for the loan data. The root (5 bad, 5 good) splits on income > 36,000; records 7, 8, 9 (all good) form one leaf. The other branch (5 bad, 2 good) splits on age > 37: records 1, 3, 4, 5 (all bad) form a leaf, and the remaining node (records 2, 6, 10) splits on married?: the married records 6 and 10 are good, the unmarried record 2 is bad.]
42 Why not split on gender in top node?
[Figure: splitting the root (5 bad, 5 good) on gender gives a male node with records 1, 3, 4, 7, 8 (3 bad, 2 good) and a female node with records 2, 5, 6, 9, 10 (2 bad, 3 good); both children remain highly impure.]
43 Growing a classification tree

# Load the necessary libraries (packages).
> library(rpart)
> library(rpart.plot)
# Set the random seed for reproducibility.
> set.seed(12345)
# Grow a classification tree on the sample.
> digits.rpart <- rpart(digit ~ ink + horsym, data = mnist.df[index.s, ],
                        cp = 0, minsplit = 2, minbucket = 1)
# Show the cost-complexity pruning results.
> digits.rpart$cptable
     CP nsplit rel error xerror xstd
    ...    ...       ...    ...  ...
44 Pruning sequence
[Figure: plotcp output, showing cross-validated relative error against cp and the size of the tree]
45 The Big Tree
[Figure: the full tree grown with cp = 0, splitting repeatedly on ink and horsym; many tiny leaves capture only a handful of training cases each.]
46 Pruning the Big Tree
[Figure: the big tree pruned back to the subtree selected by cross-validation]
47 The Pruned Tree
[Figure: the pruned tree: a first split on ink, then a split on horsym, ending in (almost) pure leaves.]
48 The Decision Boundary
[Figure: the axis-parallel decision boundary of the pruned tree in the (amount of ink, horizontal symmetry) plane]
49 Assignment 2: Classification Trees
1. Reproduce my analysis by copying the relevant lines from the script file, and entering them into R.
2. Perform a similar analysis, but now for digits 8 and 9. Make appropriate changes to the relevant commands in the script file.
3. In pruning, pick the subtree with the lowest cross-validation error.
50 Nearest neighbour classification
1. Intuition: examples tend to have the same class as examples that are close by in feature space.
2. So to classify a new example, find the nearest training example(s) and predict their majority class.
3. Note that we don't actually learn a model; we just have to memorize (store) the training set for future reference.
4. Scale the variables to have mean 0 and standard deviation 1:
x'_i = (x_i - x̄) / s_x,  i = 1, ..., n
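A base-R sketch of both ingredients, scaling and majority voting among the k nearest neighbours. The helpers and toy data are hypothetical; the assignment itself uses knn from the class package:

```r
# Sketch (hypothetical helpers): standardize each variable, then classify a
# new point by majority vote among its k nearest training points.
standardize <- function(x) (x - mean(x)) / sd(x)

knn_predict <- function(train_x, train_y, new_x, k = 3) {
  diffs <- sweep(train_x, 2, new_x)     # subtract new_x from each row
  d <- sqrt(rowSums(diffs^2))           # Euclidean distance to each example
  votes <- train_y[order(d)[1:k]]       # labels of the k nearest neighbours
  names(which.max(table(votes)))        # majority class
}

z <- standardize(c(2, 4, 6, 8))         # mean 0, sd 1 after scaling

# Toy 2-d training set: two well-separated clusters.
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c("zero", "zero", "zero", "one", "one", "one")
knn_predict(train_x, train_y, c(0.5, 0.5), k = 3)   # "zero"
knn_predict(train_x, train_y, c(5.5, 5.5), k = 3)   # "one"
```

Without the scaling step, a variable measured on a large scale (like the amount of ink) would dominate the distance and effectively silence the other features.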
51 Nearest neighbour: example
[Figure: a query point among labeled training points. What is the prediction for k = 1? For k = 3? For k = 9?]
52-53 The 3-NN Decision Boundary
[Two figure slides showing the jagged decision boundary of the 3-nearest-neighbour classifier in feature space.]
54 Assignment 3: Nearest Neighbour
1. Reproduce my analysis by copying the relevant lines from the script file, and entering them into R.
2. Perform a similar analysis, but now for digits 8 and 9. Make appropriate changes to the relevant commands in the script file.
3. Use cross-validation (knn.cv) on the training sample to estimate the accuracy of the knn classifier for different values of k.
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationClassification and Prediction
Classification Classification and Prediction Classification: predict categorical class labels Build a model for a set of classes/concepts Classify loan applications (approve/decline) Prediction: model
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationClassification 2: Linear discriminant analysis (continued); logistic regression
Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:
More informationA Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007
Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized
More informationFrom statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu
From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationSupport Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Support Vector Machines CAP 5610: Machine Learning Instructor: Guo-Jun QI 1 Linear Classifier Naive Bayes Assume each attribute is drawn from Gaussian distribution with the same variance Generative model:
More informationBoosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13
Boosting Ryan Tibshirani Data Mining: 36-462/36-662 April 25 2013 Optional reading: ISL 8.2, ESL 10.1 10.4, 10.7, 10.13 1 Reminder: classification trees Suppose that we are given training data (x i, y
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More information10701/15781 Machine Learning, Spring 2007: Homework 2
070/578 Machine Learning, Spring 2007: Homework 2 Due: Wednesday, February 2, beginning of the class Instructions There are 4 questions on this assignment The second question involves coding Do not attach
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect
More informationAssignment 4. Machine Learning, Summer term 2014, Ulrike von Luxburg To be discussed in exercise groups on May 12-14
Assignment 4 Machine Learning, Summer term 2014, Ulrike von Luxburg To be discussed in exercise groups on May 12-14 Exercise 1 (Rewriting the Fisher criterion for LDA, 2 points) criterion J(w) = w, m +
More informationEmpirical Risk Minimization, Model Selection, and Model Assessment
Empirical Risk Minimization, Model Selection, and Model Assessment CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 5.7-5.7.2.4, 6.5-6.5.3.1 Dietterich,
More informationMachine Learning 2nd Edi7on
Lecture Slides for INTRODUCTION TO Machine Learning 2nd Edi7on CHAPTER 9: Decision Trees ETHEM ALPAYDIN The MIT Press, 2010 Edited and expanded for CS 4641 by Chris Simpkins alpaydin@boun.edu.tr h1p://www.cmpe.boun.edu.tr/~ethem/i2ml2e
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationStatistical Consulting Topics Classification and Regression Trees (CART)
Statistical Consulting Topics Classification and Regression Trees (CART) Suppose the main goal in a data analysis is the prediction of a categorical variable outcome. Such as in the examples below. Given
More informationAdvanced Introduction to Machine Learning CMU-10715
Advanced Introduction to Machine Learning CMU-10715 Risk Minimization Barnabás Póczos What have we seen so far? Several classification & regression algorithms seem to work fine on training datasets: Linear
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationThe exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.
CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More information