Learning from Examples
- Leslie Blake
- 6 years ago
1 Learning from Examples
Topics: data fitting; decision trees; cross validation; computational learning theory; linear classifiers; neural networks; nonparametric methods (nearest neighbor); support vector machines; ensemble learning and boosting.
2 Data Fitting
[Figure: four panels (a) to (d), each plotting f(x) against x.]
Considerations: accuracy; simplicity; hypothesis space size; hypothesis space expressive power; accuracy of the best member versus the complexity of finding it.
3 Decision Trees
[Figure 18.2: A decision tree for deciding whether to wait for a table. The root tests Patrons? (None, Some, Full); deeper nodes test WaitEstimate?, Alternate?, Hungry?, Reservation?, Fri/Sat?, Bar?, and Raining?; the leaves are Yes or No.]
4 Construction Algorithm
Input: examples, attributes.
1. If examples is empty, return the plurality parent label.
2. If every example has the same label, return that label.
3. If attributes is empty, return the plurality example label.
4. Pick an attribute, partition the examples, and recurse.
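The four steps can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the representation (examples as `(dict, label)` pairs, `attributes` as a dict from attribute name to its possible values, trees as nested tuples) and the names `learn_tree`, `plurality`, and `classify` are assumptions made here. By default the sketch simply picks the first remaining attribute; a `choose` function can supply a better heuristic.

```python
from collections import Counter

def plurality(examples):
    """Most common label among (attribute_dict, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, parent_examples=(), choose=None):
    """Return a leaf label or a node (attribute, {value: subtree}).

    attributes maps each attribute name to the list of its possible values.
    Labels must not be tuples, because tuples mark internal nodes.
    The top-level call is assumed to receive at least one example.
    """
    if not examples:                         # step 1: plurality parent label
        return plurality(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                     # step 2: all labels agree
        return labels.pop()
    if not attributes:                       # step 3: plurality example label
        return plurality(examples)
    # step 4: pick an attribute, partition the examples, and recurse
    attr = choose(examples, attributes) if choose else next(iter(attributes))
    rest = {a: vals for a, vals in attributes.items() if a != attr}
    branches = {}
    for value in attributes[attr]:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        branches[value] = learn_tree(subset, rest, examples, choose)
    return (attr, branches)

def classify(tree, x):
    """Follow the branches that match x until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[x[attr]]
    return tree

# Tiny hypothetical data set in the spirit of the restaurant domain.
attributes = {"patrons": ["none", "some", "full"], "hungry": [True, False]}
examples = [
    ({"patrons": "none", "hungry": False}, "No"),
    ({"patrons": "some", "hungry": True}, "Yes"),
    ({"patrons": "full", "hungry": True}, "Yes"),
    ({"patrons": "full", "hungry": False}, "No"),
]
tree = learn_tree(examples, attributes)
```

Iterating over every possible value of the chosen attribute, rather than only the values seen in the examples, is what makes step 1 reachable: a value with no matching examples gets the plurality label of its parent.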
5 Picking an Attribute
[Figure 18.4: Splitting the examples by testing on attributes. At each node we show the positive (light boxes) and negative (dark boxes) examples remaining. (a) Splitting on Type (French, Italian, Thai, Burger) brings us no nearer to distinguishing between positive and negative examples. (b) Splitting on Patrons (None, Some, Full) does a good job of separating positive and negative examples. After splitting on Patrons, Hungry is a fairly good second test.]
6 Output Decision Tree
[Figure 18.6: The decision tree induced from the 12-example training set. The root tests Patrons? (None, Some, Full); deeper nodes test Hungry?, Type? (French, Italian, Thai, Burger), and Fri/Sat?.]
7 Impurity
Impurity is a heuristic for decision tree construction.
The impurity of $p$ positive and $n$ negative instances is $\frac{p}{p+n} \cdot \frac{n}{p+n} = \frac{pn}{(p+n)^2}$.
The impurity is unimodal, with minima of 0 at $p = 0$ and $n = 0$ and a maximum of $1/4$ at $p = n$.
Average impurity after a test that yields $k$ subsets with $p_i$ positive and $n_i$ negative instances (up to the constant factor $1/(p+n)$, which does not change which test is best): $\sum_{i=1}^{k} (p_i + n_i) \frac{p_i n_i}{(p_i + n_i)^2} = \sum_{i=1}^{k} \frac{p_i n_i}{p_i + n_i}$.
Pick the test that minimizes this value.
This yields the identical tree for the restaurant example.
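A short sketch of the impurity heuristic, assuming each candidate test is summarized by a list of `(p_i, n_i)` pairs. The function names are illustrative, and the counts below are assumed from the standard 12-example restaurant data set, which the slide itself does not list.

```python
def impurity(p, n):
    """Impurity pn / (p + n)^2 of p positive and n negative instances."""
    total = p + n
    return 0.0 if total == 0 else (p * n) / total ** 2

def split_impurity(subsets):
    """Size-weighted impurity of a split: sum of p_i n_i / (p_i + n_i)."""
    return sum(p * n / (p + n) for p, n in subsets if p + n > 0)

# Assumed (positive, negative) counts per attribute value.
patrons = [(0, 2), (4, 0), (2, 4)]            # None, Some, Full
food_type = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

best = min([("Patrons", patrons), ("Type", food_type)],
           key=lambda item: split_impurity(item[1]))[0]
```

Under these counts the Patrons split scores 8/6 and the Type split scores 3, so the heuristic prefers Patrons, in line with the figure caption above.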
8 Learning Curve
[Figure 18.7: A learning curve for the decision tree learning algorithm on 100 randomly generated examples in the restaurant domain, plotting proportion correct on the test set against training set size. Each data point is the average of 20 trials.]
9 Cross Validation
Split the data into $k$ equal subsets.
Perform $k$ learning rounds.
Round $i$ reserves subset $i$ for testing and trains on the remaining subsets.
Average the results.
$k = 10$ is common.
$k = n$ (singleton sets) is the extreme case.
Finally, construct the classifier from all the data.
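The procedure can be sketched as below. The `train` and `error` callables, the interleaved way of forming folds, and the majority-label learner used in the demonstration are all assumptions chosen to keep the example self-contained.

```python
def k_fold_cross_validation(data, k, train, error):
    """Average test error over k rounds; round i tests on fold i."""
    folds = [data[i::k] for i in range(k)]   # k nearly equal subsets
    errors = []
    for i in range(k):
        test_set = folds[i]
        training_set = [ex for j, fold in enumerate(folds) if j != i
                        for ex in fold]
        model = train(training_set)
        errors.append(error(model, test_set))
    return sum(errors) / k

# Hypothetical learner: always predict the majority training label.
def train_majority(training_set):
    labels = [y for _, y in training_set]
    return max(set(labels), key=labels.count)

def error_rate(model, test_set):
    return sum(1 for _, y in test_set if y != model) / len(test_set)

data = [(i, "yes") for i in range(8)] + [(i, "no") for i in range(8, 10)]
cv_error = k_fold_cross_validation(data, 5, train_majority, error_rate)
final_model = train_majority(data)   # last step: train on all the data
```

Setting `k = len(data)` gives the singleton-set variant, at the cost of one training run per example.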
10 Model Complexity Versus Quality
[Figure: error rate versus tree size, with one curve for validation set error and one for training set error.]
11 Computational Learning Theory I
We will consider Boolean functions of Boolean attributes.
Assumption: training and test data are independent samples from a fixed distribution.
The error of a hypothesis is the probability that it is wrong on a random sample from this distribution.
A hypothesis is approximately correct if its error is less than $\epsilon$.
A hypothesis is probably approximately correct (PAC) if it is approximately correct with probability at least $1 - \delta$.
The parameters $\epsilon$ and $\delta$ must be between 0 and 1 but are otherwise arbitrary.
Goal: compute a PAC hypothesis from a reasonable number of samples with reasonable computational complexity.
Idea: a bad (not approximately correct) hypothesis will usually fail quickly.
Pick a hypothesis space $H$ with $|H|$ members.
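12 Computational Learning Theory II
The probability that a bad $h$ is right on one sample is at most $1 - \epsilon$.
The probability that it is right on $n$ samples is at most $(1 - \epsilon)^n$.
The probability that any bad $h$ in $H$ is right on all $n$ samples is at most $|H| (1 - \epsilon)^n$.
We want this to be at most $\delta$: $|H| (1 - \epsilon)^n \le \delta$.
Fun fact: $1 - \epsilon \le e^{-\epsilon}$.
Take logs and rearrange: $n \ge \frac{1}{\epsilon} \left( \log |H| + \log \frac{1}{\delta} \right)$.
This $n$ is called the sample complexity of $H$.
Any hypothesis that is consistent with $n$ samples is PAC!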
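As a worked example of the sample complexity bound, the sketch below evaluates it numerically. The function name and the choice to pass the natural logarithm of the hypothesis space size (to avoid building huge integers) are assumptions of this illustration.

```python
import math

def pac_sample_complexity(epsilon, delta, ln_h):
    """Smallest integer n with n >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)

# All Boolean functions of m Boolean attributes: |H| = 2^(2^m).
m = 10
ln_h_all_boolean = (2 ** m) * math.log(2)
n = pac_sample_complexity(0.1, 0.05, ln_h_all_boolean)
```

With these example values the bound asks for 7128 samples, and each additional attribute roughly doubles the requirement.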
13 PAC Learning
The sample complexity limits the choice of $H$.
The sample complexity of decision trees is exponential in the number of attributes.
A decision tree on $m$ Boolean attributes is equivalent to a propositional logic formula in disjunctive normal form.
Every formula is expressible in disjunctive normal form.
The truth table of a formula has $2^m$ rows.
Each of the $2^{2^m}$ subsets of rows can be the set of true rows, so $\log |H|$ grows as $2^m$.
We consider a smaller hypothesis space next.
But something is wrong with computational learning theory, because decision trees work well in practice!
Fishy assumptions: no prior knowledge, distribution independent, independent of the structure of $H$.
14 Decision Lists
[Figure: a decision list. If Patrons(x, Some) then Yes; otherwise if Patrons(x, Full) ∧ Fri/Sat(x) then Yes; otherwise No.]
$\log |H| = O(m^k \log m^k)$ for $m$ attributes and tests that are conjunctions of at most $k$ literals.
There are $\binom{2m}{i}$ choices of $i$ literals in $m$ attributes.
Conjunctions can have $i = 0, 1, \ldots, k$ literals, so there are $O(m^k)$ conjunctions altogether.
Each conjunction can classify yes, classify no, or be absent: $3^{m^k}$ choices.
Conjunctions can be in any order: $3^{m^k} (m^k)!$ lists.
Stirling's approximation yields the bound.
Greedy algorithms give good results.
Example: pick the smallest conjunction that matches some instances, all with the same label.
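The greedy idea can be sketched as follows, assuming Boolean attributes and examples stored as `(dict, label)` pairs. The function names and the exhaustive search order (smallest conjunctions first) are choices made for this illustration rather than details given on the slide.

```python
from itertools import combinations, product

def matches(test, x):
    """A test is a tuple of (attribute, value) literals; () matches all."""
    return all(x[a] == v for a, v in test)

def find_rule(remaining, attributes, k):
    """Smallest conjunction matching a non-empty, uniformly labeled subset."""
    for size in range(k + 1):
        for attrs in combinations(attributes, size):
            for values in product([True, False], repeat=size):
                test = tuple(zip(attrs, values))
                labels = {y for x, y in remaining if matches(test, x)}
                if len(labels) == 1:
                    return test, labels.pop()
    return None

def learn_decision_list(examples, attributes, k):
    remaining, decision_list = list(examples), []
    while remaining:
        rule = find_rule(remaining, attributes, k)
        if rule is None:
            raise ValueError("no consistent decision list with this k")
        decision_list.append(rule)
        remaining = [(x, y) for x, y in remaining if not matches(rule[0], x)]
    return decision_list

def classify(decision_list, x, default=None):
    """Return the label of the first test that matches x."""
    for test, label in decision_list:
        if matches(test, x):
            return label
    return default

# xor of two attributes needs tests with two literals.
xor_examples = [({"a": a, "b": b}, a != b)
                for a in (True, False) for b in (True, False)]
xor_list = learn_decision_list(xor_examples, ["a", "b"], k=2)
```

On this data the learner first needs a two-literal test, after which one-literal tests and finally the empty test suffice.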
15 Decision Lists Versus Decision Trees
[Figure: proportion correct on the test set versus training set size, with one curve for a decision tree and one for a decision list.]
16 Least-Squares Fitting
Fit a line to points in the plane.
The line is $h_w(x) = w_0 + w_1 x$ with unknown $w_0, w_1$.
The training data are points $(x_1, y_1), \ldots, (x_n, y_n)$.
Minimize the distance $\| (y_1, \ldots, y_n) - (h_w(x_1), \ldots, h_w(x_n)) \|$.
Square and take partials with respect to $w_0$ and $w_1$:
$\frac{\partial}{\partial w_0} \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 = 0$
$\frac{\partial}{\partial w_1} \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 = 0$
Obtain two linear equations in $w_0$ and $w_1$.
General case: fit a linear combination of basis functions to the data.
Example: $w_0 + w_1 x + w_2 \sin x + w_3 \cos x$.
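Setting the two partial derivatives to zero gives the linear system $n w_0 + w_1 \sum x_i = \sum y_i$ and $w_0 \sum x_i + w_1 \sum x_i^2 = \sum x_i y_i$. A minimal sketch that solves it directly (the function name `fit_line` is illustrative):

```python
def fit_line(points):
    """Least-squares line through (x, y) points; returns (w0, w1)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx          # zero when all x values coincide
    if det == 0:
        raise ValueError("need at least two distinct x values")
    w1 = (n * sxy - sx * sy) / det
    w0 = (sy - w1 * sx) / n
    return w0, w1

w0, w1 = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])   # points on y = 1 + 2x
```

The basis-function case leads to the same kind of linear system, with one equation per weight.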
17 Linear Classifier (Perceptron)
Instances are feature vectors $x = (x_1, x_2)$.
Find a line that separates the classes.
The points are linearly separable if such a line exists.
Approximate separation is useful for non-separable data.
General case: $x = (1, x_1, \ldots, x_n)$.
Linear function: $w \cdot x = w_0 + w_1 x_1 + \cdots + w_n x_n$.
Classes: $y = 0$ and $y = 1$.
The classifier $h_w(x)$ returns 1 if $w \cdot x > 0$ and 0 otherwise.
18 Linearly Separable Data
[Figure: two scatter plots, one labeled separable and one labeled not separable.]
Earthquake versus explosion given body and surface waves.
The larger dataset is more accurate but is not linearly separable.
19 Perceptron Learning
[Figure: three plots of proportion correct versus number of weight updates, labeled separable, not separable, and decaying $\alpha$.]
The error on training data $(x_j, y_j)$ is $e_w = \sum_j (y_j - h_w(x_j))^2$.
Update $w_i \leftarrow w_i + \alpha (y_j - h_w(x_j)) x_{j,i}$ (like gradient descent).
A fixed $\alpha > 0$ converges on linearly separable data.
A decreasing $\alpha$ that is $O(1/t)$ in iteration $t$ usually converges.
Convergence is uneven and can be slow.
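The update rule can be sketched as below, with each instance carrying a leading 1 so that $w_0$ acts as the bias. The stopping rule (a full pass with no mistakes, or a fixed epoch limit) and the Boolean AND data are assumptions added for the demonstration.

```python
def predict(w, x):
    """Hard-threshold classifier: 1 if w . x > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train_perceptron(examples, alpha=1.0, max_epochs=100):
    """Apply w_i <- w_i + alpha (y - h_w(x)) x_i example by example."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            err = y - predict(w, x)
            if err != 0:
                mistakes += 1
                w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
        if mistakes == 0:        # every training example is classified
            break
    return w

# Linearly separable data: Boolean AND, inputs written as (1, x1, x2).
and_examples = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w_and = train_perceptron(and_examples)
```

On non-separable data the loop would stop only at `max_epochs`, which is where a decaying `alpha` becomes useful.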
20 Threshold Functions
[Figure: three threshold functions, labeled hard, soft, and half-wave.]
Performance is greatly improved with the soft threshold $g(z) = \frac{1}{1 + e^{-z}}$.
Classify based on $h_w(x) = g(w \cdot x) > 0.5$.
Recent neural networks use the half-wave rectifier.
21 Learning with Soft Threshold
[Figure: three plots of squared error per example versus number of weight updates, labeled separable, not separable, and decaying $\alpha$.]
Compute $w$ that minimizes $e_w = (y - g(w \cdot x))^2$.
Gradient descent on $f(x)$: iterate $x \leftarrow x - \alpha f'(x)$.
The multivariate version for $e_w$ uses the derivative of $g$:
$g'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z)(1 - g(z))$
$w_i \leftarrow w_i + \alpha (y - g(w \cdot x)) \, g(w \cdot x) (1 - g(w \cdot x)) \, x_i$ (the constant factor 2 is absorbed into $\alpha$).
Converges fast and smoothly even on non-separable data.
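A minimal sketch of the soft-threshold update, applied one example at a time. The learning rate, the number of passes, and the Boolean OR data are assumptions picked for the demonstration, not values from the slides.

```python
import math

def g(z):
    """Soft threshold (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_soft_threshold(examples, alpha=0.5, epochs=5000):
    """Gradient descent on (y - g(w . x))^2, one example per update."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            out = g(dot(w, x))
            step = alpha * (y - out) * out * (1.0 - out)   # uses g' = g(1 - g)
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w

def squared_error(w, examples):
    return sum((y - g(dot(w, x))) ** 2 for x, y in examples)

# Boolean OR, inputs written as (1, x1, x2) so that w[0] is the bias.
or_examples = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w_or = train_soft_threshold(or_examples)
```

Classification then compares `g(dot(w_or, x))` with 0.5, as on the threshold-function slide.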
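22 Neural Networks
[Figure: a single unit $j$. Input links carry activations $a_i$ with weights $w_{i,j}$, including a bias input $a_0 = 1$ with bias weight $w_{0,j}$. The input function computes $in_j = \sum_i w_{i,j} a_i$, and the activation function $g$ produces the output $a_j = g(in_j)$, which is sent along the output links.]
Perceptrons are of limited use because linear separation is rare.
The natural next step is a network of perceptrons.
Each perceptron is analogous to a neuron.
The network is called a neural network.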
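23 Feed-forward Networks
[Figure: (a) a network in which inputs 1 and 2 feed units 3 and 4 through weights $w_{1,3}$, $w_{1,4}$, $w_{2,3}$, $w_{2,4}$; (b) the same network extended with units 5 and 6, fed from units 3 and 4 through weights $w_{3,5}$, $w_{3,6}$, $w_{4,5}$, $w_{4,6}$.]
A feed-forward network is a directed graph of perceptrons.
It is organized into input, hidden, and output layers.
It is trained by gradient descent; the procedure is called back propagation.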
24 What Feed-Forward Networks Can Learn
[Figure: points in the $(x_1, x_2)$ plane for (a) $x_1$ and $x_2$, (b) $x_1$ or $x_2$, (c) $x_1$ xor $x_2$. A separating line exists for (a) and (b) but not for (c).]
No hidden layers: linearly separable functions.
One hidden layer: continuous functions.
Two hidden layers: discontinuous functions.
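To make the xor case concrete, the sketch below wires up a network with one hidden layer of hard-threshold units that computes xor. The weights are hand-picked for illustration (one hidden unit acts as or, the other as and); they are not learned and do not come from the slides.

```python
def step(z):
    """Hard threshold."""
    return 1 if z > 0 else 0

def unit(weights, inputs):
    """weights[0] is the bias weight for the fixed input a_0 = 1."""
    return step(weights[0] + sum(w * a for w, a in zip(weights[1:], inputs)))

def xor_network(x1, x2):
    h_or = unit([-0.5, 1, 1], [x1, x2])        # fires if x1 or x2
    h_and = unit([-1.5, 1, 1], [x1, x2])       # fires if x1 and x2
    return unit([-0.5, 1, -1], [h_or, h_and])  # or, but not and

truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

Each unit on its own is a linear classifier; the composition is what yields a function that no single line can separate.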
25 Deep Learning
The term deep learning refers primarily to neural networks with multiple hidden layers.
The internal layers are meant to learn a hierarchy of domain features without human help.
Deep learning is today's hottest machine learning technique.
The basic ideas, such as back propagation, are 35 years old.
Increased computing power and data storage enable larger networks and training sets.
There are some improvements in network organization, notably convolutional networks, and in training, notably stochastic gradient descent, the half-wave rectifier threshold function, and dropout.
26 Nonparametric Methods
A neural network learns a fixed set of parameters.
Too many parameters cause overfitting; too few cause underfitting.
The user must pick a network that avoids these problems.
Update: deep learning questions this claim.
Nonparametric methods pick the number of parameters based on the training data.
They are more flexible, but use more time and space.
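27 k Nearest Neighbors
[Figure: two plots of classification regions, one for $k = 1$ and one for $k = 5$.]
Store all the training data.
Classify based on the majority vote of the $k$ nearest neighbors.
Distance metrics: Euclidean, Manhattan, Hamming; normalize the features first.
Performance degrades as the dimension grows.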
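A brute-force sketch of the classifier, assuming training data stored as `(point, label)` pairs. The data set, including one deliberately mislabeled-looking point near the origin, is made up to show how `k = 1` and `k = 5` can disagree.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def knn_classify(training, query, k, distance=euclidean):
    """Majority vote among the k stored examples closest to query."""
    neighbors = sorted(training, key=lambda ex: distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((0.25, 0.1), "b"),                      # outlier inside the "a" cluster
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
query = (0.2, 0.1)
label_k1 = knn_classify(training, query, k=1)
label_k5 = knn_classify(training, query, k=5)
```

With `k = 1` the query follows the nearby outlier, while `k = 5` outvotes it. Sorting all stored points for every query is the simplest approach and costs time linear in the training set size or worse, which reflects the time and space price of nonparametric methods.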
28 Nonparametric Regression (Curve Fitting)
[Figure: four curve fits to the same data, labeled linear, 3-nearest average, nearest linear regression, and locally weighted regression.]
29 Locally Weighted Regression
[Figure: a kernel function and the resulting locally weighted regression.]
Weight the error in sample $(x_i, y_i)$ by a function of $\delta = x - x_i$.
The function has a maximum of 1 at $\delta = 0$ and decreases to zero monotonically and symmetrically.
Quadratic kernel function with width $u$: $k(\delta) = \max(0, 1 - (2\delta/u)^2)$.
Compute $w$ that minimizes $\sum_i k(x - x_i) (y_i - w \cdot x_i)^2$.
Predict $y = w \cdot x$.
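For one-dimensional inputs with feature vector $(1, x)$, the weighted minimization has a closed form. The sketch below assumes that setting; the function names and the test data are illustrative.

```python
def quadratic_kernel(delta, u):
    """k(delta) = max(0, 1 - (2 delta / u)^2); nonzero for |delta| < u / 2."""
    return max(0.0, 1.0 - (2.0 * delta / u) ** 2)

def lwr_predict(points, x, u):
    """Fit w0 + w1 x by kernel-weighted least squares around x, then predict."""
    sw = swx = swy = swxx = swxy = 0.0
    for xi, yi in points:
        k = quadratic_kernel(x - xi, u)
        sw += k
        swx += k * xi
        swy += k * yi
        swxx += k * xi * xi
        swxy += k * xi * yi
    det = sw * swxx - swx * swx
    if det <= 1e-12:
        raise ValueError("need two distinct points inside the kernel")
    w1 = (sw * swxy - swx * swy) / det
    w0 = (swy - w1 * swx) / sw
    return w0 + w1 * x

line_points = [(float(i), 3.0 * i + 2.0) for i in range(11)]   # y = 3x + 2
prediction = lwr_predict(line_points, 4.5, u=6.0)
```

A separate small regression is solved for every query point, and only samples whose kernels are nonzero at the query contribute to it.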
30 Support Vector Machines
Make the training data linearly separable by defining extra features as polynomials in the given features.
Use the optimal linear classifier.
Use kernel functions for fast training and classification.
All the rage 5 to 10 years ago.
Deep learning is hotter now.
31 Maximum Margin Separator
Linear separators might misclassify nearby test data.
Support vectors: the points closest to the linear separator.
Maximum margin separator: the separator furthest from its support vectors.
32 Computing the Maximum Margin Separator
$h(x)$ is a function of the support vectors: $h(x) = \mathrm{sign}\left( \sum_i \alpha_i y_i (x \cdot x_i) - b \right)$.
There are usually (but not always) few support vectors.
Compute $\alpha_i$ and $b$ via quadratic programming.
The algorithm and the result use the dot products $x \cdot x_i$.
33 Defining Features for Linear Separability
[Figure: the data in the original 2D space and in the 3D feature space.]
Circular separator in 2D: $x_1^2 + x_2^2 = 1$.
Linear separator in 3D (could have used 2D): $u_1 + u_2 = 1$ with $u_1 = x_1^2$, $u_2 = x_2^2$, $u_3 = \sqrt{2} x_1 x_2$.
34 Kernel Trick
Replace feature vector $x$ with feature vector $F(x)$.
Circle example: $F(x) = (x_1^2, x_2^2, \sqrt{2} x_1 x_2)$.
Training and classification use $F(a) \cdot F(b)$ instead of $a \cdot b$.
Pick $F(x)$ such that $F(a) \cdot F(b) = K(a, b)$.
$K$ is called a kernel function.
Circle example: $K(a, b) = (a \cdot b)^2$.
$(a \cdot b)^2 = (a_1 b_1 + a_2 b_2)^2 = a_1^2 b_1^2 + 2 a_1 a_2 b_1 b_2 + a_2^2 b_2^2$
$F(a) \cdot F(b) = (a_1^2, a_2^2, \sqrt{2} a_1 a_2) \cdot (b_1^2, b_2^2, \sqrt{2} b_1 b_2) = a_1^2 b_1^2 + 2 a_1 a_2 b_1 b_2 + a_2^2 b_2^2$
An explicit definition of $F(x)$ is unnecessary.
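The identity in the circle example can be checked numerically. The sketch below assumes the third feature carries the factor $\sqrt{2}$, which is what makes the cross term come out as $2 a_1 a_2 b_1 b_2$; the sample vectors are arbitrary.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def F(x):
    """Explicit feature map for the circle example."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def K(a, b):
    """Kernel computed in the original 2D space."""
    return dot(a, b) ** 2

a, b = (1.5, -2.0), (0.5, 3.0)
explicit = dot(F(a), F(b))   # three multiplications in feature space
implicit = K(a, b)           # one squared 2D dot product
```

Both routes give 27.5625 for these vectors, up to rounding; the kernel gets there without ever constructing `F(a)` or `F(b)`.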
35 Ensemble Learning
Generate multiple hypotheses and use the majority vote.
This reduces error to the extent that the hypotheses are independent.
It also expands the hypothesis space, e.g. triangles versus lines.
36 Boosted Learning
Use a learning algorithm that accepts samples weighted by importance $u_j$.
Neural network with $u_j$ weights: $e_w = \sum_j u_j (y_j - h_w(x_j))^2$.
Decision tree: make $u_j$ copies of $(x_j, y_j)$.
Construct hypothesis $h_1$ with all weights equal to 1.
Assign $h_1$ a weight equal to the sum of the weights of its correct answers.
Increase the weights of the samples that $h_1$ got wrong and decrease the weights of those it got right.
Construct hypotheses $h_2, \ldots, h_k$ in the same way.
Classify based on the $k$ answers, each weighted by the weight of its hypothesis.
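The slide leaves the exact reweighting and voting rules open. The sketch below fills them in with AdaBoost, a specific boosting algorithm that is substituted here for illustration: sample weights are normalized to sum to 1, and each hypothesis votes with weight $\frac{1}{2} \ln \frac{1 - \text{err}}{\text{err}}$ rather than with the sum of the weights it got right. The weak learner is a one-dimensional decision stump, labels are $+1$ and $-1$, and the ten-point data set is made up so that no single stump is perfect.

```python
import math

def stump_predict(stump, x):
    """Predict sign if x < threshold, otherwise -sign."""
    _, threshold, sign = stump
    return sign if x < threshold else -sign

def best_stump(xs, ys, weights):
    """Stump (weighted_error, threshold, sign) with the lowest weighted error."""
    points = sorted(set(xs))
    best = None
    for threshold in [(a + b) / 2 for a, b in zip(points, points[1:])]:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x < threshold else -sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(xs, ys, weights)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)       # vote of this stump
        ensemble.append((alpha, stump))
        # Raise the weights of mistakes, lower the rest, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def boosted_predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score > 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_stump = adaboost(xs, ys, rounds=1)
three_stumps = adaboost(xs, ys, rounds=3)
```

On this data a single stump misclassifies three points, while the weighted vote of three stumps fits all ten.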
37 Boosted Learning of Decision Trees
[Figure: the individual hypotheses $h_1$, $h_2$, $h_3$, $h_4$ and the combined hypothesis $h$.]
38 Restaurant Data
[Figure: (left) proportion correct on the test set versus training set size, for boosted decision stumps and for a single decision stump; (right) training/test accuracy versus number of hypotheses K, with curves for training error and test error.]
39 Character Recognition
40 Learning Algorithm versus Dataset Size
[Figure: proportion correct on the test set versus training set size (millions of words).]
Decision Trees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 5 th, 2007 2005-2007 Carlos Guestrin 1 Linear separability A dataset is linearly separable iff 9 a separating
More informationThe Perceptron algorithm
The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationPart of the slides are adapted from Ziko Kolter
Part of the slides are adapted from Ziko Kolter OUTLINE 1 Supervised learning: classification........................................................ 2 2 Non-linear regression/classification, overfitting,
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationMachine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)
Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationRegression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011!
Regression and Classification" with Linear Models" CMPSCI 383 Nov 15, 2011! 1 Todayʼs topics" Learning from Examples: brief review! Univariate Linear Regression! Batch gradient descent! Stochastic gradient
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 9 Learning Classifiers: Bagging & Boosting Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline
More informationComputational Learning Theory. Definitions
Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational
More informationCSC242: Intro to AI. Lecture 23
CSC242: Intro to AI Lecture 23 Administrivia Posters! Tue Apr 24 and Thu Apr 26 Idea! Presentation! 2-wide x 4-high landscape pages Learning so far... Input Attributes Alt Bar Fri Hun Pat Price Rain Res
More informationNeural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron
More information22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1
Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable
More informationWeb-Mining Agents Computational Learning Theory
Web-Mining Agents Computational Learning Theory Prof. Dr. Ralf Möller Dr. Özgür Özcep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercise Lab) Computational Learning Theory (Adapted)
More informationMulticlass Boosting with Repartitioning
Multiclass Boosting with Repartitioning Ling Li Learning Systems Group, Caltech ICML 2006 Binary and Multiclass Problems Binary classification problems Y = { 1, 1} Multiclass classification problems Y
More informationArtificial Neuron (Perceptron)
9/6/208 Gradient Descent (GD) Hantao Zhang Deep Learning with Python Reading: https://en.wikipedia.org/wiki/gradient_descent Artificial Neuron (Perceptron) = w T = w 0 0 + + w 2 2 + + w d d where
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More information