B555 - Machine Learning - Homework 4. Enrique Areyan April 28, 2015


Problem 1: Give decision trees to represent the following Boolean functions, where Ā is the negation of A and ⊕ is an exclusive OR operation:

a) A
b) A C
c) Ā
d) A C D
e) A C D

Solution: I will use the convention that the left branch out of a node for attribute A corresponds to that attribute being true (A), and the right branch corresponds to the attribute being false (Ā).

(Decision-tree diagrams for parts a) through e): each tree branches on A versus Ā at the root, with further tests on the remaining attributes such as C and D in the lower levels; for part e) the accompanying note reads "read it like {A C D}".)
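To make the tree-as-Boolean-function idea concrete, here is a minimal Python sketch (an illustration of my own, not taken from the submission) that writes a decision tree for the simple conjunction A AND B as nested attribute tests, using the same left-branch-true, right-branch-false convention as above.

```python
def tree_a_and_b(example):
    """Decision tree for the Boolean function A AND B.

    Each `if` is an internal node testing one attribute; each `return`
    is a leaf. Left branch = attribute true, right branch = attribute false.
    """
    if example["A"]:            # root node tests A
        if example["B"]:        # left child tests B
            return True         # leaf: both A and B are true
        return False            # leaf: A true, B false
    return False                # leaf: A false (B need not be tested)

# Enumerate all four assignments to check the truth table.
for a in (True, False):
    for b in (True, False):
        print(f"A={a}, B={b} -> {tree_a_and_b({'A': a, 'B': b})}")
```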

Problem 2: Consider the data set from the following table:

     Sky    Temperature  Humidity  Wind    Water  Forecast  Enjoy Sport
  1  Sunny  Warm         Normal    Strong  Warm   Same      Yes
  2  Sunny  Warm         High      Strong  Warm   Same      Yes
  3  Rainy  Cold         High      Strong  Warm   Change    No
  4  Sunny  Warm         High      Strong  Cool   Change    Yes

a) Using Enjoy Sport as the target, show the decision tree that would be learned if the splitting criterion is information gain.

b) Add the training example from the table below and compute the new decision tree. This time, show the value of the information gain for each candidate attribute at each step in growing the tree.

     Sky    Temperature  Humidity  Wind    Water  Forecast  Enjoy Sport
  5  Sunny  Warm         Normal    Weak    Warm   Same      No

Solution: I will use the notation from Tom Mitchell's book to denote a set by its counts of positive and negative examples, e.g., S = [3+, 1-] for the first data set (there are 3 positive examples, corresponding to Enjoy Sport = Yes, and 1 negative example, corresponding to Enjoy Sport = No).

a) Let S = [3+, 1-]. Then

Entropy(S) = -\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4} \approx 0.811

Clearly, if the splitting criterion is information gain we will split on either Sky or Temperature, since each of these attributes partitions the training set perfectly in a single step. Nonetheless, let us check that the math works. We use the following formula to compute the information gain of attribute A:

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

i. Values(Sky) = {Sunny, Rainy}. S_Sunny = [3+, 0-], so Entropy(S_Sunny) = 0; S_Rainy = [0+, 1-], so Entropy(S_Rainy) = 0.
   Gain(S, Sky) = 0.811 - (3/4)(0) - (1/4)(0) = 0.811

ii. Values(Temperature) = {Warm, Cold}. S_Warm = [3+, 0-], Entropy = 0; S_Cold = [0+, 1-], Entropy = 0.
   Gain(S, Temperature) = 0.811

iii. Values(Humidity) = {Normal, High}. S_Normal = [1+, 0-], Entropy = 0; S_High = [2+, 1-], Entropy(S_High) = -(2/3)log_2(2/3) - (1/3)log_2(1/3) ≈ 0.918.
   Gain(S, Humidity) = 0.811 - (1/4)(0) - (3/4)(0.918) ≈ 0.123

iv. Values(Wind) = {Strong}. S_Strong = [3+, 1-] = S, so Entropy(S_Strong) = Entropy(S) = 0.811.
   Gain(S, Wind) = 0.811 - (4/4)(0.811) = 0

v. Values(Water) = {Warm, Cool}. S_Warm = [2+, 1-], Entropy ≈ 0.918; S_Cool = [1+, 0-], Entropy = 0.
   Gain(S, Water) = 0.811 - (3/4)(0.918) - (1/4)(0) ≈ 0.123

vi. Values(Forecast) = {Same, Change}. S_Same = [2+, 0-], Entropy = 0; S_Change = [1+, 1-], Entropy = 1.
   Gain(S, Forecast) = 0.811 - (2/4)(0) - (2/4)(1) ≈ 0.311

We have a tie between Sky and Temperature for the attribute with the highest information gain. Choose one at random, say Temperature, to get the decision tree for the concept Enjoy Sport:

Temperature: Warm -> Yes, Cold -> No

b) Let us go through the same process after adding the new training example. Let S = [3+, 2-]. Then

Entropy(S) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 0.971

i. Values(Sky) = {Sunny, Rainy}. S_Sunny = [3+, 1-], Entropy ≈ 0.811; S_Rainy = [0+, 1-], Entropy = 0.
   Gain(S, Sky) = 0.971 - (4/5)(0.811) - (1/5)(0) ≈ 0.322

ii. Values(Temperature) = {Warm, Cold}. S_Warm = [3+, 1-], Entropy ≈ 0.811; S_Cold = [0+, 1-], Entropy = 0.
   Gain(S, Temperature) ≈ 0.322

iii. Values(Humidity) = {Normal, High}. S_Normal = [1+, 1-], Entropy = 1; S_High = [2+, 1-], Entropy ≈ 0.918.
   Gain(S, Humidity) = 0.971 - (2/5)(1) - (3/5)(0.918) ≈ 0.020

iv. Values(Wind) = {Strong, Weak}. S_Strong = [3+, 1-], Entropy ≈ 0.811; S_Weak = [0+, 1-], Entropy = 0.
   Gain(S, Wind) = 0.971 - (4/5)(0.811) - (1/5)(0) ≈ 0.322

v. Values(Water) = {Warm, Cool}. S_Warm = [2+, 2-], Entropy = 1; S_Cool = [1+, 0-], Entropy = 0.
   Gain(S, Water) = 0.971 - (4/5)(1) - (1/5)(0) ≈ 0.171

vi. Values(Forecast) = {Same, Change}. S_Same = [2+, 1-], Entropy ≈ 0.918; S_Change = [1+, 1-], Entropy = 1.
   Gain(S, Forecast) = 0.971 - (3/5)(0.918) - (2/5)(1) ≈ 0.020

There is a three-way tie between the attributes Sky, Temperature and Wind. I choose Temperature first. The Cold branch receives [0+, 1-] and becomes the leaf No; the Warm branch receives [3+, 1-] and is not yet pure:

Temperature: Warm -> [3+, 1-], Cold -> No

Now we repeat the process, considering only the examples that reach the Warm branch, S = {1, 2, 4, 5}. In this case S = [3+, 1-], so

Entropy(S) = -\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4} \approx 0.811

i. Values(Sky) = {Sunny}. S_Sunny = [3+, 1-], Entropy ≈ 0.811.
   Gain(S, Sky) = 0.811 - (4/4)(0.811) = 0

ii. Values(Humidity) = {Normal, High}. S_Normal = [1+, 1-], Entropy = 1; S_High = [2+, 0-], Entropy = 0.
   Gain(S, Humidity) = 0.811 - (2/4)(1) - (2/4)(0) ≈ 0.311

iii. Values(Wind) = {Strong, Weak}. S_Strong = [3+, 0-], Entropy = 0; S_Weak = [0+, 1-], Entropy = 0.
   Gain(S, Wind) = 0.811 - 0 - 0 = 0.811

iv. Values(Water) = {Warm, Cool}. S_Warm = [2+, 1-], Entropy ≈ 0.918; S_Cool = [1+, 0-], Entropy = 0.
   Gain(S, Water) = 0.811 - (3/4)(0.918) - (1/4)(0) ≈ 0.123

v. Values(Forecast) = {Same, Change}. S_Same = [2+, 1-], Entropy ≈ 0.918; S_Change = [1+, 0-], Entropy = 0.
   Gain(S, Forecast) = 0.811 - (3/4)(0.918) - (1/4)(0) ≈ 0.123

Attribute Wind has the highest information gain, so we split on it:

Temperature: Cold -> No, Warm -> Wind; Wind: Strong -> Yes, Weak -> No

All training examples are correctly classified. Hence, this is the final tree.
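To double-check the entropy and information-gain arithmetic above, here is a short Python sketch (my own verification script, separate from the code submitted with the homework) that recomputes the gains for the full five-example table.

```python
from math import log2
from collections import Counter

# (Sky, Temperature, Humidity, Wind, Water, Forecast) -> EnjoySport
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
    (("Sunny", "Warm", "Normal", "Weak",   "Warm", "Same"), "No"),
]
attributes = ["Sky", "Temperature", "Humidity", "Wind", "Water", "Forecast"]

def entropy(labels):
    # Entropy of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(data, attr_index):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    labels = [y for _, y in data]
    gain = entropy(labels)
    for value in {x[attr_index] for x, _ in data}:
        subset = [y for x, y in data if x[attr_index] == value]
        gain -= (len(subset) / len(data)) * entropy(subset)
    return gain

for i, name in enumerate(attributes):
    print(f"Gain(S, {name}) = {information_gain(examples, i):.3f}")
```

Run on the five-example set it reproduces the part b) values (0.322 for Sky, Temperature and Wind, 0.171 for Water, 0.020 for Humidity and Forecast); restricting `examples` to the first four rows reproduces the part a) values.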

Problem 3: Consider a two-layer feed-forward neural network with two inputs x_1 and x_2, one hidden unit z, and one output unit f. This network has five weights (w_{1z}, w_{2z}, w_{0z}, w_{zf}, w_{0f}), where w_{0z} represents the bias for unit z and w_{0f} represents the bias for unit f. Initialize these weights to the values (0.1, -0.1, 0.1, -0.1, 0.1), then give their values after each of the first two training iterations of the backpropagation algorithm. Assume learning rate η = 0. , momentum µ = 0.9, incremental weight updates, and the following training examples: {((x_1, x_2)_i, y_i)} = {((1, 0), 1), ((0, 1), 0)}.

Solution: Using the PyBrain tool in Python we can obtain the weights after each of the first two iterations of backpropagation:

(Table of the weight values w_{1z}, w_{2z}, w_{0z}, w_{zf}, w_{0f} after the first and second iterations.)

Note that this neural network has only one hidden unit, so it will learn a linear concept, analogous to logistic regression. It is interesting to see the lines generated by the hidden unit for the first two iterations:

(Plot of the two separating lines.) We start to see how the neural network adapts to the data, which in this case is linearly separable. The green line corresponds to the second iteration and the blue line to the first iteration.
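For reference, here is a NumPy sketch (my own, not the PyBrain script used in the submission) of incremental backpropagation with momentum for this 2-1-1 sigmoid network. The learning rate 0.2 is an assumed placeholder value, and an "iteration" is taken here to mean one pass over both training examples.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed placeholders: eta = 0.2, mu = 0.9 (momentum as stated in the problem).
eta, mu = 0.2, 0.9
# Weights in the order (w1z, w2z, w0z, wzf, w0f), initialized as in the problem.
w = np.array([0.1, -0.1, 0.1, -0.1, 0.1])
dw_prev = np.zeros_like(w)

examples = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]

for it in range(2):                                   # two training iterations
    for (x1, x2), y in examples:                      # incremental (per-example) updates
        # Forward pass
        z = sigmoid(w[0] * x1 + w[1] * x2 + w[2])     # hidden unit
        f = sigmoid(w[3] * z + w[4])                  # output unit
        # Backward pass (Mitchell-style deltas for squared error)
        delta_f = f * (1 - f) * (y - f)
        delta_z = z * (1 - z) * w[3] * delta_f
        grad = np.array([delta_z * x1, delta_z * x2, delta_z,
                         delta_f * z, delta_f])
        # Incremental update with momentum
        dw = eta * grad + mu * dw_prev
        w, dw_prev = w + dw, dw
    print(f"after iteration {it + 1}: {np.round(w, 4)}")
```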

Problem 4: Consider minimizing the following regularized error function using the backpropagation algorithm:

E = \sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - f_{ij})^2 + \gamma \sum_{i,j} w_{i,j}^2

Derive the gradient descent update rule for this error function. Show that it can be implemented by multiplying each weight by some constant before performing the standard gradient descent update shown in class.

Solution: Following the notation used in class, let us minimize the above regularized error function. Notation: b_{uv} denotes the weight from hidden unit z_v to output unit f_u, so that f_{iu} = ϕ(b_u^T z_i); a_{uv} denotes the weight from input x_v to hidden unit z_u, so that z_{iu} = ϕ(a_u^T x_i). Now we optimize, first for the b's and later for the a's:

∂E/∂b_{uv}
= \frac{\partial}{\partial b_{uv}} \sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - f_{ij})^2 + \gamma \frac{\partial}{\partial b_{uv}} \sum_{i,j} w_{i,j}^2
= \sum_{i=1}^{n} \frac{\partial}{\partial b_{uv}} \left( (y_{i1} - f_{i1})^2 + (y_{i2} - f_{i2})^2 + \dots + (y_{im} - f_{im})^2 \right) + 2\gamma b_{uv}
= -2 \sum_{i=1}^{n} (y_{iu} - f_{iu}) \frac{\partial}{\partial b_{uv}} ϕ(b_u^T z_i) + 2\gamma b_{uv}   (1)

Choose ϕ(x) = \frac{1}{1 + e^{-x}}. Then

ϕ'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = ϕ(x)(1 - ϕ(x))

so that

(1) = -2 \sum_{i=1}^{n} (y_{iu} - f_{iu}) ϕ(b_u^T z_i)(1 - ϕ(b_u^T z_i)) z_{iv} + 2\gamma b_{uv}   (2)

Note that f_{iu} = ϕ(b_u^T z_i), which means that 1 - f_{iu} = 1 - ϕ(b_u^T z_i). Thus

(2) = -2 \sum_{i=1}^{n} (y_{iu} - f_{iu}) f_{iu}(1 - f_{iu}) z_{iv} + 2\gamma b_{uv}

Let β_{iu} = (y_{iu} - f_{iu}) f_{iu}(1 - f_{iu}) to conclude that

∂E/∂b_{uv} = -2 \sum_{i=1}^{n} β_{iu} z_{iv} + 2\gamma b_{uv}

Absorbing the constant factor of 2 into the learning rate η and into γ (as is standard), the gradient descent update rule is

b_{uv}^{(t+1)} = b_{uv}^{(t)} - η \left[ -\sum_{i=1}^{n} β_{iu} z_{iv} + γ b_{uv}^{(t)} \right] = b_{uv}^{(t)} + η \sum_{i=1}^{n} β_{iu} z_{iv} - ηγ b_{uv}^{(t)} = (1 - ηγ) b_{uv}^{(t)} + η \sum_{i=1}^{n} β_{iu} z_{iv}

Optimization for the a's:

∂E/∂a_{uv}
= \frac{\partial}{\partial a_{uv}} \sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - f_{ij})^2 + \gamma \frac{\partial}{\partial a_{uv}} \sum_{i,j} w_{i,j}^2
= -2 \sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - f_{ij}) \frac{\partial}{\partial a_{uv}} ϕ(b_j^T z_i) + 2\gamma a_{uv}   (3)

By definition b_j^T z_i = \sum_{l=1}^{h} b_{jl} z_{il}. Thus

\frac{\partial}{\partial a_{uv}} ϕ(b_j^T z_i)
= ϕ'(b_j^T z_i) \frac{\partial}{\partial a_{uv}} \left( \sum_{l=1}^{h} b_{jl} z_{il} \right)
= ϕ'(b_j^T z_i) b_{ju} \frac{\partial}{\partial a_{uv}} z_{iu}
= ϕ'(b_j^T z_i) b_{ju} \frac{\partial}{\partial a_{uv}} ϕ(a_u^T x_i)
= ϕ'(b_j^T z_i) b_{ju} ϕ'(a_u^T x_i) x_{iv}   (4)

Replacing (4) into (3) and rearranging:

∂E/∂a_{uv}
= -2 \sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - f_{ij}) ϕ'(b_j^T z_i) b_{ju} ϕ'(a_u^T x_i) x_{iv} + 2\gamma a_{uv}
= -2 \sum_{i=1}^{n} \left( \sum_{j=1}^{m} \left[ (y_{ij} - f_{ij}) ϕ'(b_j^T z_i) \right] b_{ju} \right) ϕ'(a_u^T x_i) x_{iv} + 2\gamma a_{uv}
= -2 \sum_{i=1}^{n} \left( \sum_{j=1}^{m} β_{ij} b_{ju} \right) ϕ'(a_u^T x_i) x_{iv} + 2\gamma a_{uv}   (5)

Let α_{iu} = \left( \sum_{j=1}^{m} β_{ij} b_{ju} \right) ϕ'(a_u^T x_i) = \left( \sum_{j=1}^{m} β_{ij} b_{ju} \right) z_{iu}(1 - z_{iu}), and replace in (5) to conclude that

∂E/∂a_{uv} = -2 \sum_{i=1}^{n} α_{iu} x_{iv} + 2\gamma a_{uv}

Again absorbing the constant factor of 2 into η and γ, the gradient descent update rule is

a_{uv}^{(t+1)} = a_{uv}^{(t)} - η \left[ -\sum_{i=1}^{n} α_{iu} x_{iv} + γ a_{uv}^{(t)} \right] = (1 - ηγ) a_{uv}^{(t)} + η \sum_{i=1}^{n} α_{iu} x_{iv}

In summary, the gradient descent update rules for this error function are:

b_{uv}^{(t+1)} = (1 - ηγ) b_{uv}^{(t)} + η \sum_{i=1}^{n} β_{iu} z_{iv}   for every u ∈ {1, 2, ..., m} and every v ∈ {1, 2, ..., h}

a_{uv}^{(t+1)} = (1 - ηγ) a_{uv}^{(t)} + η \sum_{i=1}^{n} α_{iu} x_{iv}   for every u ∈ {1, 2, ..., h} and every v ∈ {1, 2, ..., k}

This shows that the minimization of the regularized error function can be implemented by multiplying each weight by the constant (1 - ηγ) before performing the standard (unregularized) gradient descent update.
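As a quick numerical check of the conclusion, here is a small NumPy sketch (my own illustration, not part of the submission) showing that one gradient step on the regularized error equals shrinking the weights by the factor (1 - ηγ) and then applying the standard unregularized step. It uses the same convention as above, with the constant factor of 2 absorbed into γ, and an arbitrary made-up weight matrix and gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, gamma = 0.1, 0.01                 # illustrative learning rate and regularization strength
W = rng.normal(size=(3, 2))            # some weight matrix of the network
grad_unreg = rng.normal(size=(3, 2))   # gradient of the unregularized error w.r.t. W

# One gradient step on E + gamma * sum(W**2), with the factor of 2 absorbed
# into gamma, so the regularizer contributes gamma * W to the gradient.
step_regularized = W - eta * (grad_unreg + gamma * W)

# Weight-decay formulation: shrink the weights by (1 - eta*gamma) first,
# then take the standard unregularized step.
step_weight_decay = (1 - eta * gamma) * W - eta * grad_unreg

print(np.allclose(step_regularized, step_weight_decay))   # prints True
```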

Problem 5: Using the neural network code from class as a starting point, compare the following ensemble methods in a binary classification scenario: bagged neural networks versus bagged regression trees. In each case, the final decision is made by averaging the raw outputs of the ensemble members. Use at least three different classification data sets to provide comparisons; select appropriate data sets from the UCI Machine Learning Repository. Proper comparisons should be made using 10-fold cross-validation experiments. Use your knowledge about model comparison to formally conclude which of the two algorithms is better. Normalize the variables on the training set, then normalize the test set using the normalization parameters from the training set (e.g., the mean and standard deviation of each feature). If data sets have categorical variables, provide a proper approach to convert them into numerical variables. Use classification error to evaluate the quality of the classifiers.

Solution: All the code for this problem was produced using MatLab. As usual, you can find the code attached to the OnCourse submission. I tested the following three data sets from the UCI repository:

1. Spambase
2. Transfusion
3. Ionosphere

(Table of per-run classification errors for the bagged Neural Networks (NN) and bagged Trees on Spambase, Transfusion, and Ionosphere.)

These results correspond to repeated runs of the corresponding bagged algorithm. Each bag consisted of the same fixed number of models, i.e., either all neural networks or all trees. The neural networks are feedforward networks with 1 hidden layer of 10 neurons, trained using Scaled Conjugate Gradient. The models for the bags were selected by 10-fold cross-validation. Note that all the data was normalized before training the models. All of the data sets are binary, and thus the final decision was made by taking a majority vote of the individual decisions of the constituents of a single bag. In case of a tie, one class was chosen at random with equal probability.
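The original experiments were run in MatLab; as a rough modern equivalent, here is a hedged scikit-learn sketch of the same setup (bagged decision trees versus bagged small neural networks, per-fold standardization, 10-fold cross-validation) on a synthetic stand-in data set. The ensemble size of 20 and the scikit-learn estimators (which train the networks with Adam rather than Scaled Conjugate Gradient) are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a UCI binary data set (the submission used Spambase,
# Transfusion, and Ionosphere loaded in MatLab).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

ensemble_size = 20  # assumed value; the exact bag size is not given here

bagged_trees = BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=ensemble_size, random_state=0)
bagged_nets = BaggingClassifier(
    make_pipeline(StandardScaler(),
                  MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)),
    n_estimators=ensemble_size, random_state=0,
)

for name, model in [("bagged trees", bagged_trees), ("bagged NN", bagged_nets)]:
    # 10-fold cross-validation; classification error = 1 - accuracy.
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean error = {1 - acc.mean():.3f}")
```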

For each run the classification error is defined as

E = (number of examples incorrectly classified) / (total number of examples in the data set)

For example, for Spambase, run 1, the bagged Neural Network (NN) misclassified a noticeably larger fraction of the examples than the bag of Trees. Just by inspecting the data one can conclude that Trees outperformed NN in every case. For mathematically sound evidence, let us perform two different statistical tests:

1. Let us hypothesize that NN performs the same as Trees, i.e., H_0: T = NN vs. H_1: T > NN. Suppose that H_0 is true. Then each run is equally likely to be won by either method, so the probability that Trees won all n runs out of n is

p = \sum_{i=n}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^i \left(\frac{1}{2}\right)^{n-i} = \left(\frac{1}{2}\right)^n

Clearly, for the number of runs performed this is a small enough probability to reject H_0 at any reasonable level. Conclude that it is not the case that T = NN. Note also that this test applies to all data sets, since on all of them Trees won every time.

2. The next test is a t-test for the means of two independent samples. The samples in this case are the per-run results for NN and Trees. For this I used the following function provided by the Python package SciPy:

(ttest, pvalue) = ttest_ind(data_nni, data_treesi)

where data_nni is a vector of numbers with the results for the bagged neural networks and, likewise, data_treesi is a vector of numbers with the results for the bagged trees. According to the function documentation, this function "Calculates the T-test for the means of TWO INDEPENDENT samples of scores. This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances... We can use this test, if we observe two independent samples from the same or different population, e.g. exam scores of boys and girls or of two ethnic groups. The test measures whether the average (expected) value differs significantly across samples. If we observe a large p-value, for example larger than 0.05 or 0.1, then we cannot reject the null hypothesis of identical average scores. If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages."

(Table of t-statistics and p-values for Spambase, Transfusion, and Ionosphere.)

Clearly, in all cases we reject the hypothesis that the mean classification errors of the two algorithms are the same.

Together, tests (1) and (2) provide strong evidence that bagged Trees outperform bagged Neural Networks on all data sets tested. This may be due to the fact that these are small data sets, where trees have a greater chance of fitting the data. An alternative explanation is that we would need more than 10 neurons to obtain better results for the neural networks.
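For reference, here is a short Python sketch of the two tests (my own illustration; the error vectors below are made-up placeholders, since the actual per-run errors are the ones reported in the tables above).

```python
import numpy as np
from scipy.stats import binom, ttest_ind

# Made-up placeholder per-run classification errors for one data set; the real
# values come from the cross-validation runs reported above.
data_nn = np.array([0.085, 0.082, 0.088, 0.081, 0.086])
data_trees = np.array([0.015, 0.012, 0.018, 0.011, 0.016])

# Test 1: sign test. Under H0 (both methods equally good) each run is a fair
# coin flip, so P(Trees win all n runs) = (1/2)^n.
n = len(data_nn)
wins_for_trees = int(np.sum(data_trees < data_nn))
p_sign = binom.sf(wins_for_trees - 1, n, 0.5)   # P(#wins >= observed) under H0
print(f"sign test: {wins_for_trees}/{n} wins for Trees, p = {p_sign:.4f}")

# Test 2: two-sample t-test for equal mean classification error.
tstat, pvalue = ttest_ind(data_nn, data_trees)
print(f"t-test: t = {tstat:.3f}, p = {pvalue:.2e}")
```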

References

1. Tom Mitchell, Machine Learning, McGraw-Hill, 1997 (chapter on decision trees).
2. UC Irvine Machine Learning Repository, archive.ics.uci.edu.
