Decision Support Systems MEIC - Alameda 2010/2011. Homework #8. Due date: 5.Dec.2011
1 Rule Learning

1. Consider once again the decision-tree you computed in Question 1c of Homework #7, used to determine the political affiliation of several Deputies of the Portuguese Parliament from their voting tendencies. For your convenience, we reproduce in Table 1 the dataset you used to construct the decision-tree.

Table 1: Data-set D with Portuguese Parliament vote samples.

Dep. ID   N. Tax Inc. (V1)   Labor Reg. (V2)   Ed. Bud. (V3)   For. Pol. (V4)   Affil.
ID013     Yes                No                 No               Unk             Right
ID030     Yes                No                 Yes              Unk             Right
ID050     No                 Yes                No               Unk             Right
ID063     No                 Unk                Yes              Unk             Right
ID070     Yes                Yes                Yes              Yes             Right
ID072     Yes                Unk                Yes              No              Right
ID102     Yes                No                 Yes              No              Right
ID112     Yes                Yes                No               Yes             Right
ID130     No                 Yes                Yes              No              Right
ID165     Yes                Unk                No               Unk             Left
ID177     No                 Unk                No               Yes             Left
ID217     Yes                Unk                No               Yes             Left
ID221     No                 No                 No               Unk             Left
ID229     No                 No                 Yes              No              Left
(a) (1/2 val.) From the decision-tree you computed, write down all IF-THEN rules necessary to build an equivalent rule-based classifier.

Note: Make sure to use the decision-tree from the official solution to HW7 provided by the faculty. Solutions based on different decision-trees will not be considered.

Recall the decision-tree from the previous homework: the root tests V2; if V2 = Yes, the class is Right; if V2 = No, the tree then tests V1 (No leads to Left, Yes leads to Right); if V2 = Unk, the tree then tests V3 (No leads to Left, Yes leads to Right).

From this tree, we can derive the following rule-based classifier:

1. IF V2 = Yes THEN C = Right
2. IF (V2 = No ∧ V1 = No) THEN C = Left
3. IF (V2 = No ∧ V1 = Yes) THEN C = Right
4. IF (V2 = Unk ∧ V3 = No) THEN C = Left
5. IF (V2 = Unk ∧ V3 = Yes) THEN C = Right

(b) (1/2 val.) Compute the coverage and accuracy of these rules for the dataset D in Table 1.

The coverage corresponds to the fraction of tuples that verify the conditions in the rule. The accuracy represents the percentage of covered tuples that are correctly classified. Since the tree is able to correctly classify all instances in the dataset D, all previously derived rules have an accuracy of 1.0. As for the coverage, we have:

Rule 1 coverage: 4/14 ≈ 0.29
Rule 2 coverage: 2/14 ≈ 0.14
Rule 3 coverage: 3/14 ≈ 0.21
Rule 4 coverage: 3/14 ≈ 0.21
Rule 5 coverage: 2/14 ≈ 0.14
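To make the computation in (b) concrete, the following Python sketch (not part of the original hand-in) evaluates the coverage and accuracy of the five rules over the dataset D; the encoding of each rule as a (condition, class) pair is our own illustrative choice.

```python
# Hedged sketch: recompute rule coverage/accuracy for dataset D (Table 1).
# The tuple layout (V1, V2, V3, V4, Affil.) is an illustrative encoding choice.
D = [
    ("Yes", "No",  "No",  "Unk", "Right"), ("Yes", "No",  "Yes", "Unk", "Right"),
    ("No",  "Yes", "No",  "Unk", "Right"), ("No",  "Unk", "Yes", "Unk", "Right"),
    ("Yes", "Yes", "Yes", "Yes", "Right"), ("Yes", "Unk", "Yes", "No",  "Right"),
    ("Yes", "No",  "Yes", "No",  "Right"), ("Yes", "Yes", "No",  "Yes", "Right"),
    ("No",  "Yes", "Yes", "No",  "Right"), ("Yes", "Unk", "No",  "Unk", "Left"),
    ("No",  "Unk", "No",  "Yes", "Left"),  ("Yes", "Unk", "No",  "Yes", "Left"),
    ("No",  "No",  "No",  "Unk", "Left"),  ("No",  "No",  "Yes", "No",  "Left"),
]

# Each rule is (antecedent over (V1, V2, V3, V4), predicted class).
rules = [
    (lambda v1, v2, v3, v4: v2 == "Yes",                 "Right"),
    (lambda v1, v2, v3, v4: v2 == "No" and v1 == "No",   "Left"),
    (lambda v1, v2, v3, v4: v2 == "No" and v1 == "Yes",  "Right"),
    (lambda v1, v2, v3, v4: v2 == "Unk" and v3 == "No",  "Left"),
    (lambda v1, v2, v3, v4: v2 == "Unk" and v3 == "Yes", "Right"),
]

for i, (cond, label) in enumerate(rules, start=1):
    covered = [row for row in D if cond(*row[:4])]
    correct = [row for row in covered if row[4] == label]
    coverage = len(covered) / len(D)
    accuracy = len(correct) / len(covered) if covered else float("nan")
    print(f"Rule {i}: coverage = {len(covered)}/{len(D)} = {coverage:.2f}, "
          f"accuracy = {accuracy:.2f}")
```

Running it reproduces the coverage fractions listed above, with accuracy 1.0 for every rule.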
2 Neural networks

2. (3 val.) Derive a gradient descent training rule for a network with a single unit with p inputs and output given by

\hat{y}(\mathbf{x}) = w_0 + w_1 x_1 + w_1 x_1^2 + \ldots + w_p x_p + w_p x_p^2,

where \mathbf{x} = [x_1, \ldots, x_p]. Consider that the error in a dataset D = \{(\mathbf{x}_n, y_n), n = 1, \ldots, N\} is given by

E(\mathbf{w}) = \sum_{n=1}^{N} \left(\hat{y}(\mathbf{x}_n) - y_n\right)^2.

The gradient descent update rule can be obtained by differentiating the error with respect to the network parameters. In our case, we can represent the network output as

\hat{y}(\mathbf{x}) = \sum_{i=0}^{p} w_i \phi_i(\mathbf{x}),

where each \phi_i is an input-dependent feature. In particular, we have that

\phi_i(\mathbf{x}) = \begin{cases} 1 & \text{if } i = 0 \\ x_i + x_i^2 & \text{otherwise.} \end{cases}

Differentiating the error with respect to a general weight w_i yields

\frac{\partial E(\mathbf{w})}{\partial w_i} = 2 \sum_{n=1}^{N} \left(\hat{y}(\mathbf{x}_n) - y_n\right) \frac{\partial \hat{y}(\mathbf{x}_n)}{\partial w_i} = 2 \sum_{n=1}^{N} \left(\hat{y}(\mathbf{x}_n) - y_n\right) \phi_i(\mathbf{x}_n).

Finally, we get the update rule w_i \leftarrow w_i + \Delta w_i, where each \Delta w_i is given by

\Delta w_i = 2 \sum_{n=1}^{N} \left(y_n - \hat{y}(\mathbf{x}_n)\right) \phi_i(\mathbf{x}_n) = \begin{cases} 2 \sum_{n=1}^{N} \left(y_n - \hat{y}(\mathbf{x}_n)\right) & \text{if } i = 0 \\ 2 \sum_{n=1}^{N} \left(y_n - \hat{y}(\mathbf{x}_n)\right)\left(x_{n,i} + x_{n,i}^2\right) & \text{otherwise.} \end{cases}
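As an illustration (not part of the original hand-in), the following Python sketch applies the derived batch update rule to synthetic data. The learning-rate constant eta, the averaging over N, and the synthetic dataset are our own assumptions added so the loop converges; the derivation above absorbs the step size into the rule.

```python
import numpy as np

# Hedged sketch of the derived rule: y_hat(x) = w0 + sum_i w_i * (x_i + x_i^2).
rng = np.random.default_rng(0)
p, N = 3, 200
X = rng.uniform(-1, 1, size=(N, p))           # synthetic inputs (assumption)
w_true = np.array([0.5, -1.0, 2.0, 0.3])      # [w0, w1, ..., wp] (assumption)
Phi = np.hstack([np.ones((N, 1)), X + X**2])  # features phi_i(x)
y = Phi @ w_true + 0.01 * rng.standard_normal(N)

w = np.zeros(p + 1)
eta = 0.05                                    # step size (assumption)
for _ in range(2000):
    y_hat = Phi @ w
    # Delta w_i = 2 * sum_n (y_n - y_hat_n) * phi_i(x_n), scaled by eta / N.
    delta_w = 2 * Phi.T @ (y - y_hat)
    w = w + eta * delta_w / N                 # averaging over N keeps steps small
print("learned w:", np.round(w, 3))           # approaches w_true
```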
3. (3 val.) Consider the two-layer neural network depicted in Fig. 1. Initialize the weight vector w = [w_0c, w_ac, w_bc, w_0d, w_cd] to [0.1, 0.1, 0.1, 0.1, 0.1] and indicate the values of the weight vector after the two initial iterations of back-propagation. Assume that the activation function of both units c and d is the logistic sigmoid function, given by σ(x) = 1 / (1 + exp(−x)).

Figure 1: Two-layer neural network with two inputs, x_a and x_b, and output ŷ = z_d. The nodes x_0 and z_0 correspond to the bias. [Diagram not reproduced: weights w_0c, w_ac, w_bc feed unit c; weights w_0d, w_cd feed unit d.]

In your computations use η = 0.3 and the dataset D = {([1, 0], 1), ([0, 1], 0)}, where each point in the dataset is of the form (x, y), with x = [x_a, x_b].

Each iteration of back-propagation consists in processing one data-point in the network. We begin the first iteration with the data-point x_1 = [1, 0], for which the intended output is y_1 = 1. The initial stage of back-propagation consists in computing the activation and output of each unit in the network. This yields:

a_c = w_0c + w_ac x_a + w_bc x_b = 0.1 + 0.1 · 1 + 0.1 · 0 = 0.2
z_c = σ(0.2) ≈ 0.55
a_d = w_0d + w_cd z_c = 0.1 + 0.1 · 0.55 ≈ 0.15
z_d = σ(0.15) ≈ 0.54

We now compute the δ_j, propagating the error back through the network:

δ_d = z_d (1 − z_d)(z_d − y_1) = 0.54 · (1 − 0.54) · (0.54 − 1) ≈ −0.11
δ_c = z_c (1 − z_c) w_cd δ_d = 0.55 · (1 − 0.55) · 0.1 · (−0.11) ≈ 0.00

and we get the updated weights

w_0c ← w_0c − η δ_c x_0 = 0.1 − 0.3 · 0.00 · 1 ≈ 0.10
w_ac ← w_ac − η δ_c x_a = 0.1 − 0.3 · 0.00 · 1 ≈ 0.10
w_bc ← w_bc − η δ_c x_b = 0.1 − 0.3 · 0.00 · 0 = 0.10
w_0d ← w_0d − η δ_d z_0 = 0.1 − 0.3 · (−0.11) · 1 ≈ 0.13
w_cd ← w_cd − η δ_d z_c = 0.1 − 0.3 · (−0.11) · 0.55 ≈ 0.12

In the second iteration, we use the data-point x_2 = [0, 1], for which the intended output is y_2 = 0. Again, the initial stage of back-propagation consists in computing the activation and output of each unit in the network. This yields:

a_c = 0.10 + 0.10 · 0 + 0.10 · 1 = 0.20
z_c = σ(0.20) ≈ 0.55
a_d = 0.13 + 0.12 · 0.55 ≈ 0.20
z_d = σ(0.20) ≈ 0.55

We now compute the δ_j, propagating the error back through the network:

δ_d = 0.55 · (1 − 0.55) · (0.55 − 0) ≈ 0.14
δ_c = 0.55 · (1 − 0.55) · 0.12 · 0.14 ≈ 0.00

and we get the updated weights

w_0c ≈ 0.10
w_ac = 0.10
w_bc ≈ 0.10
w_0d ≈ 0.09
w_cd ≈ 0.10
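For reference, here is a small Python sketch (our own addition, not part of the original solution) that reproduces the two back-propagation iterations above; small rounding differences with respect to the hand computations are to be expected.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights [w0c, wac, wbc, w0d, wcd], all initialized to 0.1 as in the question.
w0c, wac, wbc, w0d, wcd = 0.1, 0.1, 0.1, 0.1, 0.1
eta = 0.3
dataset = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0)]

for (xa, xb), y in dataset:
    # Forward pass through units c and d.
    a_c = w0c + wac * xa + wbc * xb
    z_c = sigmoid(a_c)
    a_d = w0d + wcd * z_c
    z_d = sigmoid(a_d)
    # Backward pass: deltas for the squared-error loss with sigmoid units.
    delta_d = z_d * (1 - z_d) * (z_d - y)
    delta_c = z_c * (1 - z_c) * wcd * delta_d
    # Gradient-descent weight updates.
    w0c -= eta * delta_c * 1.0
    wac -= eta * delta_c * xa
    wbc -= eta * delta_c * xb
    w0d -= eta * delta_d * 1.0
    wcd -= eta * delta_d * z_c
    print(np.round([w0c, wac, wbc, w0d, wcd], 3))
```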
4. (3 val.) Consider the two-layer feed-forward neural network in Fig. 2.

Figure 2: Two-layer neural network with p inputs, x_1 through x_p, and one output. The nodes x_0 and z_0 correspond to the bias. Double indexed weights connect the inputs to the first layer, while single indexed weights connect the first layer to the output layer.

The output of the network is given by

\hat{y}(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} w_j \, h\left( \sum_{i=1}^{p} w_{ij} x_i + w_{0j} \right) + w_0 \right), \qquad (1)

where h is the activation function for the units in the first layer and σ is the logistic sigmoid function σ(x) = 1 / (1 + exp(−x)). Suppose that the activation function h is also the logistic sigmoid function. Show that there exists an equivalent network that computes exactly the same function, but where the activation function for the units in the first layer is h(x) = tanh(x). Recall that the tanh function is defined as

\tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}.

Suggestion: First find the relation between σ(x) and tanh(x), and then show that the parameters in the two networks differ by linear transformations.
We start by noting that

\tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)} = \frac{1 - \exp(-2x)}{1 + \exp(-2x)} = \frac{1}{1 + \exp(-2x)} - \frac{\exp(-2x)}{1 + \exp(-2x)} = \sigma(2x) - \left(1 - \sigma(2x)\right) = 2\sigma(2x) - 1.

Inverting the above relation, we get

\sigma(x) = \frac{1}{2}\left(\tanh\left(\frac{x}{2}\right) + 1\right).

Replacing this in the expression for the output of the network, we get

\hat{y}(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=1}^{M} \frac{w_j}{2} \tanh\left( \sum_{i=1}^{p} \frac{w_{ij}}{2} x_i + \frac{w_{0j}}{2} \right) + \sum_{j=1}^{M} \frac{w_j}{2} + w_0 \right),

or, equivalently,

\hat{y}(\mathbf{x}, \mathbf{w}') = \sigma\left( \sum_{j=1}^{M} w_j' \tanh\left( \sum_{i=1}^{p} w_{ij}' x_i + w_{0j}' \right) + w_0' \right),

with

w_0' = w_0 + \sum_{j=1}^{M} \frac{w_j}{2};
w_j' = \frac{w_j}{2}, for j ≠ 0;
w_{ij}' = \frac{w_{ij}}{2}.

This output expression has the same form as (1), indicating that the network obtained (i) has the same topology as the original network; (ii) uses tanh as the activation function h of the first layer, as required; and (iii) computes the same output as the original network.
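The following Python sketch (our addition) numerically checks this equivalence for a small randomly initialized network; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, M = 4, 3                          # illustrative sizes (assumption)
W1 = rng.standard_normal((p, M))     # w_ij: inputs -> hidden layer
b1 = rng.standard_normal(M)          # w_0j: hidden biases
w2 = rng.standard_normal(M)          # w_j: hidden -> output
b2 = rng.standard_normal()           # w_0: output bias

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

def net_sigmoid(x):
    # Original network: hidden activation h = sigma.
    return sigma(w2 @ sigma(W1.T @ x + b1) + b2)

def net_tanh(x):
    # Equivalent network: hidden activation h = tanh, with transformed weights.
    W1t, b1t = W1 / 2, b1 / 2
    w2t, b2t = w2 / 2, b2 + w2.sum() / 2
    return sigma(w2t @ np.tanh(W1t.T @ x + b1t) + b2t)

x = rng.standard_normal(p)
print(net_sigmoid(x), net_tanh(x))   # the two outputs coincide up to float error
```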
2.1 Practical Questions (Using SQL Server 2008)

5. Consider the 3 models you deployed in Lab 8 and analyzed in Homework #7. You will now compare these models with Microsoft Neural Networks. To this purpose, start from the deployment of the three mining models you analyzed in Homework #7 and add a fourth model, corresponding to Microsoft Neural Networks. Process all models.

(a) (3 val.) For Microsoft Neural Networks, provide a snapshot of the variables pane of the Neural Network viewer, showing the attributes that favor each of the two possible values for the output Bike Buyer. Compare these results with those obtained for the other methods in Homework #7.

We depict below the variables pane for MS Neural Networks on the attribute Bike Buyer. [Snapshot of the variables pane not reproduced in this transcription.]

In contrast with the methods analyzed in HW7, MS Neural Networks indicates the region as a primary attribute to identify bike buyers.¹ Other important attributes include the commute distance, the number of children and the number of cars owned. These results are in accordance with those determined by MS Clustering (as seen in HW7), although the order of the factors is different. It is also important to note that, although not appearing as the most significant attribute, the number of cars owned is also considered an important factor to identify bike buyers, as outlined in both panes portrayed.

¹ In the results above, we have ignored the attribute Geography Key in the training of the classifier.

(b) (2 val.) Provide the lift chart comparing the performance of the four methods. Repeat the analysis in Question 4b of Homework #7, comparing the performance of Microsoft Neural Networks with that of the other methods and indicating which one performs best.

The lift chart comparing the performance of the different methods is shown below. [Lift chart not reproduced in this transcription.]
As seen in HW7, MS Decision Trees exhibits better predictive performance, while MS Naive Bayes and MS Clustering exhibit similar performance. Comparing now with MS Neural Networks, MS Decision Trees remains the method with the best predictive performance. MS Neural Networks, while slightly outperforming MS Naive Bayes and MS Clustering, is very similar to the other two methods and still falls behind MS Decision Trees. This may indicate that MS Decision Trees (and, to a lesser extent, MS Neural Networks) are richer classification models and, as such, are able to better capture the Bike Buyer classification.

(c) (2 val.) Provide the confusion matrix for the neural network model. Compare the performance of this model with that of the other methods in Homework #7 in terms of confusion matrix. Compare also the results in terms of confusion matrix with those obtained in the lift charts. Note: You don't need to include the results from Homework #7.

The confusion matrix for MS Neural Networks has the predicted labels (Positive, Negative) as rows and the actual labels (0 and 1 for Bike Buyer) as columns. [The cell counts are not reproduced in this transcription.]

These results are in accordance with those observed from the lift chart. The confusion matrix again indicates that MS Decision Trees exhibits a better performance, above MS Neural Networks, MS Naive Bayes and MS Clustering. MS Neural Networks, while slightly outperforming MS Naive Bayes and MS Clustering, exhibits a very similar performance, in general. This comparison can be made more explicit by directly comparing the accuracy of all four methods:

Method               Accuracy
MS Decision Trees    71.9%
MS Neural Networks   64.7%
MS Naive Bayes       63.6%
MS Clustering        62.1%
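Since the actual cell counts did not survive this transcription, the following Python sketch (with purely hypothetical counts) only illustrates how accuracy figures such as those above are obtained from a confusion matrix.

```python
# Hypothetical confusion-matrix counts, for illustration only; these are NOT
# the counts produced by the SQL Server 2008 mining models discussed above.
confusion = {
    "predicted positive": {"actual 1": 620, "actual 0": 310},
    "predicted negative": {"actual 1": 240, "actual 0": 430},
}

correct = (confusion["predicted positive"]["actual 1"]
           + confusion["predicted negative"]["actual 0"])
total = sum(sum(row.values()) for row in confusion.values())
print(f"accuracy = {correct / total:.1%}")   # fraction of correctly labeled cases
```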
(d) (3 val.) Perform cross-validation with Fold Count 3, 5 and 10 for the neural network model. Set Max Cases = 1,000. Make sure that the Target attribute is Bike Buyer and that Target State is blank. Indicate the average and standard deviation for the Pass measure, corresponding to the number of correct labels obtained. Perform a comparative analysis of the results obtained with Microsoft Neural Networks and all other methods in Homework #7 (you don't have to repeat the results obtained with the other methods).

Finally, we performed 3-, 5- and 10-fold cross-validation on MS Neural Networks, obtaining the following results (the standard-deviation values are not recoverable from this transcription):

Method            N-fold     Av. Acc. (%)   Std. Dev.
MS Neural Nets.   3-Fold     50.4           —
                  5-Fold     53.6           —
                  10-Fold    50.7           —

It is interesting to note the general tendency observed in these results. In general, MS Neural Networks and MS Decision Trees exhibit a significantly worse performance than that observed when analyzing the confusion matrix and the lift charts. In fact, the accuracy of MS Decision Trees goes from a value of around 70%, as seen in Question 5(c), to a value between 50% and 55%. Similarly, MS Neural Networks goes from around 65% accuracy to a value between 50% and 55%. On the other hand, the accuracy of both MS Naive Bayes and MS Clustering exhibits only a minor decrease, to a value around 60%.

To understand these results, we note that cross-validation is conducted with a dataset of 1,000 data-points (corresponding to the parameter Max Cases), which is then further divided for testing and training. The accuracy results observed are thus obtained with a significantly smaller amount of data. Therefore, it is not surprising that all methods show some decrease in performance, since they are trained with significantly less data. The fact that MS Clustering and MS Naive Bayes seem to be less sensitive to the smaller amount of data may suggest that these are simpler models that require less data for training. On the other hand, one may also venture that the worse performance of MS Decision Trees and MS Neural Networks may be due to overfitting caused by the small dataset used for training. We note, however, that our results do not provide sufficient information to state this conclusively.
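As a generic illustration of the k-fold procedure described above (not tied to SQL Server 2008), the following Python sketch computes the mean and standard deviation of per-fold accuracy with scikit-learn; the synthetic data and the MLPClassifier settings are our own assumptions standing in for the Bike Buyer sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 1,000-case Bike Buyer sample (assumption: the real
# data lives in the SQL Server mining structure and is not available here).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
for k in (3, 5, 10):
    scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
    print(f"{k:2d}-fold: mean acc = {scores.mean():.3f}, std = {scores.std():.3f}")
```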