# Numerical Learning Algorithms


## Transcription

Outline:

- Introduction
- Naive Bayes
  - Naive Bayes
  - Naive Bayes Example
  - Naive Bayes Example Continued
- Linear Models
  - Linear Models
  - Example of Numeric Examples
  - Linear Regression
  - Least Squares Gradient Descent
  - Perceptron Learning Rule
  - Perceptrons Continued
  - Example of Perceptron Learning
- The Nearest Neighbor Algorithm
- Neural Networks
  - Artificial Neural Networks
  - ANN Structure
  - ANN Illustration
  - Sigmoid Activation
  - Plot of Sigmoid Function
  - Backpropagation
  - Applying the Chain Rule
- Support Vector Machines
  - Support Vector Machines
  - Example SVM for Separable Examples
  - Example SVM for Nonseparable Examples
  - Example Gaussian Kernel SVM
  - Example Gaussian Kernel, Zoomed In
- Ensemble Learning
  - Ensemble Learning
  - Boosting
  - Example Boosting Algorithm
  - Example Run of AdaBoost
  - Example Run of AdaBoost, Continued

### Introduction

Numerical learning methods learn the parameters or weights of a model, often by optimizing an error function. Examples include:

- Calculate the parameters of a probability distribution.
- Separate positive from negative examples by a decision boundary.
- Find points close to positive but far from negative examples.
- Update parameters to decrease error.

CS 79 Artificial Intelligence Numerical Learning Algorithms

### Naive Bayes

For class $C$ and attributes $X_i$, assume:

$$P(C, X_1, \ldots, X_n) = P(C)\,P(X_1 \mid C) \cdots P(X_n \mid C)$$

This corresponds to a Bayesian network where $C$ is the sole parent of each $X_i$.

Estimate prior and conditional probabilities by counting. If an outcome occurs $m$ times out of $n$ trials, Laplace's law of succession recommends the estimate $(m + 1)/(n + k)$, where $k$ is the number of possible outcomes.

### Naive Bayes Example

Using Laplace's law of succession on the 14 examples:

$$P(\text{pos}) = (9 + 1)/(14 + 2) = 10/16$$

$$P(\text{neg}) = (5 + 1)/(14 + 2) = 6/16$$

$$P(\text{sunny} \mid \text{pos}) = (2 + 1)/(9 + 3) = 3/12$$

$$P(\text{overcast} \mid \text{pos}) = (4 + 1)/(9 + 3) = 5/12$$

$$P(\text{rain} \mid \text{pos}) = (3 + 1)/(9 + 3) = 4/12$$

### Naive Bayes Example Continued

For the first example:

$$P(\text{pos} \mid \text{sunny}, \text{hot}, \text{high}, \text{false}) = \alpha\,(10/16)(3/12)(3/12)(4/11)(7/11) \approx \alpha \cdot 0.00904$$

$$P(\text{neg} \mid \text{sunny}, \text{hot}, \text{high}, \text{false}) = \alpha\,(6/16)(4/8)(3/8)(5/7)(3/7) \approx \alpha \cdot 0.0215$$

Here $\alpha$ is the normalizing constant that makes the two posteriors sum to 1.

### Linear Models

For a linear model, the output and each attribute must be numeric.

The input of an example is a numeric vector $\mathbf{x} = (1, x_1, \ldots, x_n)$. A hypothesis is a weight vector $\mathbf{w} = (w_0, w_1, \ldots, w_n)$, where $w_0$ is the bias weight. The output of a hypothesis is computed by

$$\hat{y} = w_0 + w_1 x_1 + \cdots + w_n x_n = \mathbf{w} \cdot \mathbf{x}$$

The loss on example $(\mathbf{x}, y)$ is typically one of:

- Squared error loss: $L_2(y, \hat{y}) = (y - \hat{y})^2$
- Absolute error loss: $L_1(y, \hat{y}) = |y - \hat{y}|$
- 0/1 loss: $L_{0/1}(y, \hat{y}) = 0$ if $y = \hat{y}$, else $1$
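Returning to the Naive Bayes example earlier in this section, the Laplace estimates and the two unnormalized scores can be checked with exact fractions. This is a minimal sketch, not the course's own code: the counts (9 positive and 5 negative examples out of 14; 2, 4, and 3 positive examples with sunny, overcast, and rainy outlook) are assumptions read off the worked example, and the helper names `laplace` and `normalize` are hypothetical.

```python
from fractions import Fraction

def laplace(m, n, k):
    """Laplace's law of succession: (m + 1) / (n + k) for k possible outcomes."""
    return Fraction(m + 1, n + k)

# Assumed counts: 14 examples, 9 positive and 5 negative, two classes.
p_pos = laplace(9, 14, 2)
p_neg = laplace(5, 14, 2)

# Assumed outlook counts among the 9 positive examples (3 outlook values).
p_sunny_pos = laplace(2, 9, 3)
p_overcast_pos = laplace(4, 9, 3)
p_rain_pos = laplace(3, 9, 3)

# Unnormalized scores for the first example (sunny, hot, high, false),
# using the conditional estimates quoted in the worked example.
score_pos = p_pos * Fraction(3, 12) * Fraction(3, 12) * Fraction(4, 11) * Fraction(7, 11)
score_neg = p_neg * Fraction(4, 8) * Fraction(3, 8) * Fraction(5, 7) * Fraction(3, 7)

def normalize(scores):
    """Divide by the total so the class posteriors sum to 1 (this plays the role of alpha)."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

posterior = normalize({"pos": score_pos, "neg": score_neg})
print(float(score_pos), float(score_neg))
print({label: float(p) for label, p in posterior.items()})
```

Under these assumptions the negative class receives the larger posterior for the first example, since its unnormalized score is more than twice the positive one.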

### Example of Numeric Examples

[Table: example number, input attributes (Sunny, Rainy, Hot, Cool, Humid, Windy), and output; cell values not recoverable.]

### Linear Regression

Linear regression finds the weights that minimize loss over the training set.

Gradient descent changes the weights based on the gradient, that is, the derivatives of the loss with respect to the weights (see Least Squares Gradient Descent).

The linear least squares algorithm calculates the weights by:

$$\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$$

where $X$ is the data matrix and $\mathbf{y}$ is the vector of outputs.

Classification can be performed by: if $\mathbf{w} \cdot \mathbf{x} > 0$ then positive else negative.

### Least Squares Gradient Descent

- $\mathbf{w} \leftarrow$ zeroes
- loop until convergence:
  - for each example $(\mathbf{x}_j, y_j)$:
    - $\hat{y}_j \leftarrow \mathbf{w} \cdot \mathbf{x}_j$
    - for each $w_i$ in $\mathbf{w}$: $w_i \leftarrow w_i + \alpha (y_j - \hat{y}_j)\, x_{ij}$

where $\alpha$ is the learning rate. This is a small number chosen to trade off speed of convergence against closeness to the optimal weights.

### Perceptron Learning Rule

[differs from book]

A perceptron does gradient descent for absolute error loss (more accurately, "ramp loss"). This assumes each $y_j$ is $1$ or $-1$.

- $\mathbf{w} \leftarrow$ zeroes
- loop until convergence:
  - for each example $(\mathbf{x}_j, y_j)$:
    - $\hat{y}_j \leftarrow \mathbf{w} \cdot \mathbf{x}_j$
    - if $(y_j = 1 \wedge \hat{y}_j < 1) \vee (y_j = -1 \wedge \hat{y}_j > -1)$ then for each $w_i$ in $\mathbf{w}$: $w_i \leftarrow w_i + \alpha\, y_j\, x_{ij}$

Again, $\alpha$ is the learning rate.

### Perceptrons Continued

The perceptron convergence theorem states that if some $\mathbf{w}$ classifies all the training examples correctly, then the perceptron learning rule will converge to zero error on the training examples.

Usually, many epochs (passes over the training examples) are needed until convergence.

If zero error is not possible, use a small learning rate proportional to $1/n$, where $n$ is the number of normalized or binary inputs.
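The two update loops above can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than the course's implementation: a fixed number of epochs stands in for "until convergence" in the least squares loop, the perceptron update threshold of 1 follows the ramp-loss reading of the rule, and both toy datasets are hypothetical. Every input vector starts with a constant 1 so that `w[0]` acts as the bias weight.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def lms_gradient_descent(examples, alpha=0.05, epochs=500):
    """Least squares gradient descent: w_i <- w_i + alpha * (y - y_hat) * x_i."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):  # fixed epoch count stands in for a convergence test
        for x, y in examples:
            y_hat = dot(w, x)
            w = [wi + alpha * (y - y_hat) * xi for wi, xi in zip(w, x)]
    return w

def perceptron(examples, alpha=1.0, max_epochs=100):
    """Perceptron rule for labels in {+1, -1}: w_i <- w_i + alpha * y * x_i."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        updated = False
        for x, y in examples:
            y_hat = dot(w, x)
            # Assumed threshold: update unless the example is on the right side by at least 1.
            if (y == 1 and y_hat < 1) or (y == -1 and y_hat > -1):
                w = [wi + alpha * y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:  # a full pass with no updates means the rule has converged
            break
    return w

# Hypothetical regression data generated from y = 1 + 2x.
line_data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]
w_line = lms_gradient_descent(line_data)

# Hypothetical separable data: logical AND with labels in {+1, -1}.
and_data = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
w_and = perceptron(and_data)
print(w_line, w_and)
```

On the separable AND data the perceptron loop stops after a pass with no updates, which is the behavior the convergence theorem describes.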

### Example of Perceptron Learning

[Table: a run of the perceptron learning rule with a fixed learning rate $\alpha$; columns are the inputs $x_1, \ldots, x_4$, the target $y$, the prediction $\hat{y}$, the loss $L$, and the weights $w_0, \ldots, w_4$; cell values not recoverable.]

### The Nearest Neighbor Algorithm

The k-nearest neighbor algorithm classifies a test example by finding the k closest training examples and returning the most common class among them.

Suppose there is 10% noise (so the best possible test error is 10%). With sufficient training examples, a test example will agree with its nearest neighbor with probability $(0.9)(0.9) + (0.1)(0.1) = 0.82$ (both not noisy or both noisy) and disagree with probability $(0.9)(0.1) + (0.1)(0.9) = 0.18$.

In general, 1-nearest neighbor converges to less than twice the optimal error (3-NN to less than 32% higher than optimal).

### Artificial Neural Networks

An (artificial) neural network consists of units, connections, and weights. Inputs and outputs are numeric.

| Biological NN | Artificial NN |
| --- | --- |
| soma | unit |
| axon, dendrite | connection |
| synapse | weight |
| potential | weighted sum |
| threshold | bias weight |
| signal | activation |

### ANN Structure

A typical unit $j$ receives inputs $a_1, a_2, \ldots$ from other units and performs a weighted sum:

$$in_j = w_{0j} + \sum_i w_{ij}\, a_i$$

and outputs the activation $a_j = g(in_j)$.

Typically, input units store the inputs, hidden units transform the inputs into an internal numeric vector, and an output unit transforms the hidden values into the prediction.

An ANN is a function $f(\mathbf{x}, W) = a$, where $\mathbf{x}$ is an example, $W$ is the weights, and $a$ is the prediction (the activation value from the output unit). Learning is finding a $W$ that minimizes error.
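The unit computation and the view of a network as a function $f(\mathbf{x}, W)$ can be sketched as follows. This is a minimal sketch: the dictionary layout of `W`, the function names, and the default activation `math.tanh` are assumptions made for illustration, since the text leaves the activation $g$ unspecified at this point.

```python
import math

def unit_activation(bias, weights, inputs, g):
    """Compute in_j = w_0j + sum_i w_ij * a_i, then return a_j = g(in_j)."""
    in_j = bias + sum(w * a for w, a in zip(weights, inputs))
    return g(in_j)

def network(x, W, g=math.tanh):
    """f(x, W) = a for one hidden layer and a single output unit.

    W is a hypothetical layout:
    {"hidden": [(bias, weights), ...], "output": (bias, weights)}.
    """
    hidden = [unit_activation(b, w, x, g) for b, w in W["hidden"]]
    b_out, w_out = W["output"]
    return unit_activation(b_out, w_out, hidden, g)

# Hypothetical weights for a network with 2 inputs and 2 hidden units.
W = {
    "hidden": [(0.5, [1.0, 2.0]), (-1.0, [0.0, 1.0])],
    "output": (0.1, [2.0, -1.0]),
}
x = [1.0, 3.0]
print(network(x, W))                   # tanh activation
print(network(x, W, g=lambda v: v))    # identity activation, easy to check by hand
```

With the identity activation the hidden values are 7.5 and 2.0, so the output is 0.1 + 2(7.5) - 2.0 = 13.1, which makes the weighted-sum bookkeeping easy to verify by hand.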

### ANN Illustration

[Figure: a feed-forward network with four input units $x_1, \ldots, x_4$, two hidden units $a_5$ and $a_6$, and one output unit $a_7$. Weights $w_{15}, \ldots, w_{45}$ and $w_{16}, \ldots, w_{46}$ connect the inputs to the hidden units, $w_{57}$ and $w_{67}$ connect the hidden units to the output, and $w_{05}$, $w_{06}$, $w_{07}$ are the bias weights.]

### Sigmoid Activation

The sigmoid function is defined as:

$$\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

It is commonly used for ANN activation functions:

$$a_j = \text{sigmoid}(in_j) = \text{sigmoid}\Big(w_{0j} + \sum_i w_{ij}\, a_i\Big)$$

Note that

$$\frac{\partial\, \text{sigmoid}(x)}{\partial x} = \text{sigmoid}(x)\,(1 - \text{sigmoid}(x))$$

### Plot of Sigmoid Function

[Figure: plot of sigmoid(x) against x.]
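The derivative identity for the sigmoid can be checked numerically. The sketch below compares the closed form against a central finite difference; the helper `numeric_derivative` and the sample points are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Closed form from the identity: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_derivative(f, x, eps=1e-6):
    """Central finite difference, used only to check the closed form."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

for x in (-2.0, 0.0, 0.5, 3.0):
    closed = sigmoid_derivative(x)
    approx = numeric_derivative(sigmoid, x)
    print(x, closed, approx)
```

The derivative peaks at x = 0, where sigmoid(0) = 0.5 and the slope is 0.25, and shrinks toward zero for large positive or negative inputs.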

### Backpropagation

One learning method is backpropagating the error from the output to all of the weights. It is an application of the delta rule.

Given loss $L(W, \mathbf{x}, y)$, obtain the gradient:

$$\nabla L(W, \mathbf{x}, y) = \left[\ldots, \frac{\partial L}{\partial w_{ij}}, \ldots\right]$$

To decrease error, use the update rule:

$$w_{ij} \leftarrow w_{ij} - \alpha \frac{\partial L}{\partial w_{ij}}$$

where $\alpha$ is the learning rate.

### Applying the Chain Rule

Using $L = (y_k - a_k)^2$ for output unit $k$:

$$\frac{\partial L}{\partial w_{jk}} = \frac{\partial L}{\partial a_k}\,\frac{\partial a_k}{\partial in_k}\,\frac{\partial in_k}{\partial w_{jk}} = -2(y_k - a_k)\; a_k(1 - a_k)\; a_j$$

For weights from input to hidden units:

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial a_k}\,\frac{\partial a_k}{\partial in_k}\,\frac{\partial in_k}{\partial a_j}\,\frac{\partial a_j}{\partial in_j}\,\frac{\partial in_j}{\partial w_{ij}} = -2(y_k - a_k)\; a_k(1 - a_k)\; w_{jk}\; a_j(1 - a_j)\; x_i$$

### Support Vector Machines

An SVM assigns a weight $\alpha_i$ to each example $(\mathbf{x}_i, y_i)$, where $\mathbf{x}_i$ is an attribute value vector and $y_i$ is either $1$ or $-1$.

An SVM computes a discriminant by:

$$h(\mathbf{x}) = \text{sign}\Big(b + \sum_i \alpha_i\, y_i\, K(\mathbf{x}, \mathbf{x}_i)\Big)$$

where $K$ is a kernel function.

An SVM learns by optimizing the error function:

$$\text{minimize } \frac{\lVert h \rVert^2}{2} + \sum_i \max(0,\, 1 - y_i\, h(\mathbf{x}_i)) \quad \text{subject to } 0 \le \alpha_i \le C$$

where $\lVert h \rVert$ is the size of $h$ in kernel space.

### Example SVM for Separable Examples

[Figure: separable examples with the lines $\mathbf{w} \cdot \mathbf{x} + b = -1$, $\mathbf{w} \cdot \mathbf{x} + b = 0$, and $\mathbf{w} \cdot \mathbf{x} + b = 1$.]
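The two chain-rule formulas from the backpropagation slides above can be sketched for a small network and compared against finite differences. This is a sketch under simplifying assumptions, not the course's code: bias weights are omitted, there is one hidden layer and one sigmoid output unit, and the weights and the example are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_in, w_out):
    """w_in[j][i]: weight from input i to hidden unit j; w_out[j]: hidden unit j to output k."""
    a_hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    a_k = sigmoid(sum(w * a for w, a in zip(w_out, a_hidden)))
    return a_hidden, a_k

def loss(x, y, w_in, w_out):
    """Squared error L = (y - a_k)^2."""
    return (y - forward(x, w_in, w_out)[1]) ** 2

def gradients(x, y, w_in, w_out):
    a_hidden, a_k = forward(x, w_in, w_out)
    delta_k = -2.0 * (y - a_k) * a_k * (1.0 - a_k)        # dL/d in_k
    g_out = [delta_k * a_j for a_j in a_hidden]           # dL/d w_jk
    g_in = [[delta_k * w_jk * a_j * (1.0 - a_j) * xi for xi in x]
            for w_jk, a_j in zip(w_out, a_hidden)]        # dL/d w_ij
    return g_in, g_out

def train_step(x, y, w_in, w_out, alpha=0.5):
    """One update w <- w - alpha * dL/dw on every weight."""
    g_in, g_out = gradients(x, y, w_in, w_out)
    new_in = [[w - alpha * g for w, g in zip(wr, gr)] for wr, gr in zip(w_in, g_in)]
    new_out = [w - alpha * g for w, g in zip(w_out, g_out)]
    return new_in, new_out

# Hypothetical example and weights: 3 inputs, 2 hidden units.
x, y = [1.0, 0.5, -0.3], 1.0
w_in = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
w_out = [0.2, -0.3]
before = loss(x, y, w_in, w_out)
w_in2, w_out2 = train_step(x, y, w_in, w_out)
after = loss(x, y, w_in2, w_out2)
print(before, after)
```

A single update on this example lowers the loss, as the update rule intends; comparing `gradients` with finite differences of `loss` is a common sanity check before training with a hand-derived gradient.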

### Example SVM for Nonseparable Examples

[Figure: nonseparable examples with the lines $\mathbf{w} \cdot \mathbf{x} + b = -1$, $\mathbf{w} \cdot \mathbf{x} + b = 0$, and $\mathbf{w} \cdot \mathbf{x} + b = 1$.]

### Example Gaussian Kernel SVM

[Figure: decision boundary of an SVM with a Gaussian kernel.]

### Example Gaussian Kernel, Zoomed In

[Figure: the Gaussian kernel SVM decision boundary, zoomed in.]
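The SVM discriminant $h(\mathbf{x})$ with a Gaussian kernel can be sketched as below. Several pieces are assumptions: the kernel form $K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^2)$ with parameter `gamma` is the common definition rather than one given in the slides, and the example weights $\alpha_i$ are hand-picked rather than learned by solving the optimization problem.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Assumed kernel form: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_score(x, support, b=0.0, kernel=gaussian_kernel):
    """b + sum_i alpha_i * y_i * K(x, x_i) over (alpha_i, y_i, x_i) triples."""
    return b + sum(alpha * y * kernel(x, xi) for alpha, y, xi in support)

def svm_predict(x, support, b=0.0, kernel=gaussian_kernel):
    """h(x) = sign of the score; a score of exactly 0 is mapped to +1 here."""
    return 1 if svm_score(x, support, b, kernel) >= 0 else -1

def hinge_loss(y, score):
    """max(0, 1 - y * score), the per-example term in the error function."""
    return max(0.0, 1.0 - y * score)

# Hypothetical weighted examples: one positive near the origin, one negative near (2, 2).
support = [(1.0, 1, (0.0, 0.0)), (1.0, -1, (2.0, 2.0))]
print(svm_predict((0.1, 0.1), support))
print(svm_predict((1.9, 2.0), support))
print(hinge_loss(1, svm_score((0.1, 0.1), support)))
```

Only examples with nonzero $\alpha_i$ contribute to the sum, which is why the prediction depends on a subset of the training set.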

### Ensemble Learning

There are many algorithms for learning a single hypothesis. Ensemble learning learns and combines a collection of hypotheses by running the algorithm on different training sets.

Bagging (briefly mentioned in the book) runs a learning algorithm on repeated subsamples of the training set. If there are $n$ examples, then a subsample of $n$ examples is generated by sampling with replacement. On a test example, each hypothesis casts 1 vote for the class it predicts.

### Boosting

In boosting, the hypotheses are learned in sequence. Both hypotheses and examples have weights, with different purposes.

After each hypothesis is learned, its weight is based on its error rate, and the weights of the training examples (initially all equal) are also modified.

On a test example, when each hypothesis predicts a class, its weight is the size of its vote. The ensemble predicts the class with the highest vote.

### Example Run of AdaBoost

Using the 14 examples as a training set:

- The hypothesis windy = false → class = pos is wrong on 5 of the 14 examples.
- The weights of the correctly classified examples are multiplied by 5/9, then all examples are multiplied by 14/10 so they sum to 1 again.
- This hypothesis has a weight of log(9/5).

Note that after weight updating, the sum of the weights of the correctly classified examples equals the sum of the weights of the incorrectly classified examples.

### Example Run of AdaBoost, Continued

- The next hypothesis must be different from the previous one to have error less than 1/2.
- Now the hypothesis outlook = overcast → class = pos has an error rate of 29/90.
- The weights of the correctly classified examples are multiplied by 29/61 ≈ 0.475, then all examples are multiplied by 90/58 ≈ 1.552 so they sum to 1 again.
- This hypothesis has a weight of log(61/29).
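The arithmetic in the two AdaBoost rounds above can be reproduced with exact fractions. This is a sketch under assumptions: the helper `reweight` is hypothetical, the examples are simply ordered so that the 9 correctly classified ones come first, and the second hypothesis is assumed to be wrong on 4 examples the first one got right plus 1 it got wrong, which is a split consistent with the stated 29/90 error rate.

```python
from fractions import Fraction

def reweight(weights, correct):
    """One AdaBoost reweighting step.

    Returns (error, factor, scale, new_weights): correct examples are multiplied by
    factor = error / (1 - error), then every weight is multiplied by scale so they sum to 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    factor = error / (1 - error)
    scaled = [w * factor if ok else w for w, ok in zip(weights, correct)]
    scale = 1 / sum(scaled)
    return error, factor, scale, [w * scale for w in scaled]

# Round 1: 14 equally weighted examples, 9 classified correctly and 5 incorrectly.
weights = [Fraction(1, 14)] * 14
correct1 = [True] * 9 + [False] * 5
error1, factor1, scale1, weights = reweight(weights, correct1)

# Round 2 (assumed split): wrong on 4 previously correct examples and 1 previously wrong one.
correct2 = [False] * 4 + [True] * 5 + [False] + [True] * 4
error2, factor2, scale2, weights = reweight(weights, correct2)

print(error1, factor1, scale1)   # 5/14 5/9 7/5
print(error2, factor2, scale2)   # 29/90 29/61 45/29
```

The printed scales 7/5 and 45/29 are the reduced forms of 14/10 and 90/58.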
### Example Boosting Algorithm

AdaBoost(examples, algorithm, iterations)

1. n ← number of examples
2. initialize weights w[1...n] to 1/n
3. for i from 1 to iterations
4. &nbsp;&nbsp;h[i] ← algorithm(examples)
5. &nbsp;&nbsp;error ← sum of the weights of the examples misclassified by h[i]
6. &nbsp;&nbsp;for j from 1 to n
7. &nbsp;&nbsp;&nbsp;&nbsp;if h[i] is correct on example j
8. &nbsp;&nbsp;&nbsp;&nbsp;then w[j] ← w[j] · error/(1 − error)
9. &nbsp;&nbsp;normalize w[1...n] so it sums to 1
10. &nbsp;&nbsp;weight of h[i] ← log((1 − error)/error)
11. return h[1...iterations] and their weights
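The AdaBoost steps above can be sketched in Python with a simple weak learner. Several choices here are assumptions rather than part of the pseudocode: the weak learner `best_stump` (a one-dimensional threshold rule) is hypothetical, the example weights are passed to it explicitly, labels are +1 and -1, and the loop stops early when the error is 0 or at least 1/2, which the pseudocode does not address.

```python
import math

def best_stump(examples, weights):
    """Hypothetical weak learner: threshold rule on a scalar x with least weighted error."""
    best = None
    for threshold in sorted({x for x, _ in examples}):
        for sign in (1, -1):
            h = lambda x, t=threshold, s=sign: s if x >= t else -s
            err = sum(w for (x, y), w in zip(examples, weights) if h(x) != y)
            if best is None or err < best[0]:
                best = (err, h)
    return best[1]

def adaboost(examples, algorithm, iterations):
    n = len(examples)
    w = [1.0 / n] * n
    ensemble = []  # list of (hypothesis, hypothesis weight)
    for _ in range(iterations):
        h = algorithm(examples, w)
        error = sum(wj for (x, y), wj in zip(examples, w) if h(x) != y)
        if error == 0:            # assumed guard: a perfect hypothesis decides alone
            return [(h, 1.0)]
        if error >= 0.5:          # assumed guard: no useful weak hypothesis remains
            break
        # Shrink the weights of correctly classified examples, then normalize.
        w = [wj * error / (1 - error) if h(x) == y else wj
             for (x, y), wj in zip(examples, w)]
        total = sum(w)
        w = [wj / total for wj in w]
        ensemble.append((h, math.log((1 - error) / error)))
    return ensemble

def predict(ensemble, x):
    """Weighted vote: each hypothesis votes for its class with its weight."""
    vote = sum(weight * h(x) for h, weight in ensemble)
    return 1 if vote >= 0 else -1

# Hypothetical data that no single threshold rule fits: +, -, + along the line.
data = [(x, 1) for x in (0, 1, 2)] + [(x, -1) for x in (3, 4, 5)] + [(x, 1) for x in (6, 7, 8)]
ensemble = adaboost(data, best_stump, iterations=3)
print([predict(ensemble, x) for x, _ in data])
```

On this toy data no single stump is correct everywhere, but the weighted vote of three stumps classifies all nine points correctly.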

### Artifical Neural Networks

Neural Networks Artifical Neural Networks Neural Networks Biological Neural Networks.................................. Artificial Neural Networks................................... 3 ANN Structure...........................................

### Mining Classification Knowledge

Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology COST Doctoral School, Troina 2008 Outline 1. Bayesian classification

### Mining Classification Knowledge

Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification

### Artificial neural networks

Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural

### Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

### Machine Learning (CSE 446): Neural Networks

Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /

### CSC242: Intro to AI. Lecture 21

CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages

### CS7267 MACHINE LEARNING

CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning

### Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

### CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

### FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

### Neural networks. Chapter 20. Chapter 20 1

Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms

### Multilayer Perceptron

Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

### CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#\$

### Neural networks. Chapter 19, Sections 1 5 1

Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10

### Final Examination CS 540-2: Introduction to Artificial Intelligence

Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11

### Revision: Neural Network

Revision: Neural Network Exercise 1 Tell whether each of the following statements is true or false by checking the appropriate box. Statement True False a) A perceptron is guaranteed to perfectly learn

### Neural networks. Chapter 20, Section 5 1

Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of

### MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on

### Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21

Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural

### Decision Trees. Data Science: Jordan Boyd-Graber University of Maryland MARCH 11, Data Science: Jordan Boyd-Graber UMD Decision Trees 1 / 1

Decision Trees Data Science: Jordan Boyd-Graber University of Maryland MARCH 11, 2018 Data Science: Jordan Boyd-Graber UMD Decision Trees 1 / 1 Roadmap Classification: machines labeling data for us Last

Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

### Logistic Regression. Machine Learning Fall 2018

Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

### Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

### MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

### Algorithms for Classification: The Basic Methods

Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.

### SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension

### AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009

AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is

### Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

### Lecture 7 Artificial neural networks: Supervised learning

Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in

### Artificial Neural Network

Artificial Neural Network Eung Je Woo Department of Biomedical Engineering Impedance Imaging Research Center (IIRC) Kyung Hee University Korea ejwoo@khu.ac.kr Neuron and Neuron Model McCulloch and Pitts

### The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please

### AE = q < H(p < ) + (1 q < )H(p > ) H(p) = p lg(p) (1 p) lg(1 p)

1 Decision Trees (13 pts) Data points are: Negative: (-1, 0) (2, 1) (2, -2) Positive: (0, 0) (1, 0) Construct a decision tree using the algorithm described in the notes for the data above. 1. Show the

### B555 - Machine Learning - Homework 4. Enrique Areyan April 28, 2015

- Machine Learning - Homework Enrique Areyan April 8, 01 Problem 1: Give decision trees to represent the following oolean functions a) A b) A C c) Ā d) A C D e) A C D where Ā is a negation of A and is

### Supervised Learning (contd) Decision Trees. Mausam (based on slides by UW-AI faculty)

Supervised Learning (contd) Decision Trees Mausam (based on slides by UW-AI faculty) Decision Trees To play or not to play? http://www.sfgate.com/blogs/images/sfgate/sgreen/2007/09/05/2240773250x321.jpg

### Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

### Midterm: CS 6375 Spring 2018

Midterm: CS 6375 Spring 2018 The exam is closed book (1 cheat sheet allowed). Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, use an additional

### Data Mining und Maschinelles Lernen

Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting

### Machine Learning. Ensemble Methods. Manfred Huber

Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected

### Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12

Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose

### Neural Networks and Ensemble Methods for Classification

Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated

### Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

### LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary

### Bayesian Learning. Bayesian Learning Criteria

Bayesian Learning In Bayesian learning, we are interested in the probability of a hypothesis h given the dataset D. By Bayes theorem: P (h D) = P (D h)p (h) P (D) Other useful formulas to remember are:

### Linear discriminant functions

Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative

### Inteligência Artificial (SI 214) Aula 15 Algoritmo 1R e Classificador Bayesiano

Inteligência Artificial (SI 214) Aula 15 Algoritmo 1R e Classificador Bayesiano Prof. Josenildo Silva jcsilva@ifma.edu.br 2015 2012-2015 Josenildo Silva (jcsilva@ifma.edu.br) Este material é derivado dos

### 2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks. Todd W. Neller

2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks Todd W. Neller Machine Learning Learning is such an important part of what we consider "intelligence" that

### Final Exam, Fall 2002

15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work

### MIRA, SVM, k-nn. Lirong Xia

MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output

### Artificial Intelligence Roman Barták

Artificial Intelligence Roman Barták Department of Theoretical Computer Science and Mathematical Logic Introduction We will describe agents that can improve their behavior through diligent study of their

### CS325 Artificial Intelligence Chs. 18 & 4 Supervised Machine Learning (cont)

CS325 Artificial Intelligence Cengiz Spring 2013 Model Complexity in Learning f(x) x Model Complexity in Learning f(x) x Let s start with the linear case... Linear Regression Linear Regression price =

### 22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable

### CS 6375 Machine Learning

CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

### Machine Learning Lecture 5

Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

### CSCE 478/878 Lecture 6: Bayesian Learning

Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell

### Machine Learning. Yuh-Jye Lee. March 1, Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU

Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU March 1, 2017 1 / 13 Bayes Rule Bayes Rule Assume that {B 1, B 2,..., B k } is a partition of S

### Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

### Statistical Machine Learning from Data

January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole

### Machine Learning Algorithm. Heejun Kim

Machine Learning Algorithm Heejun Kim June 12, 2018 Machine Learning Algorithms Machine Learning algorithm: a procedure in developing computer programs that improve their performance with experience. Types

### The Naïve Bayes Classifier. Machine Learning Fall 2017

The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning

### Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron

### Artificial Neural Networks

Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

### CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

### Classification. Classification. What is classification. Simple methods for classification. Classification by decision tree induction

Classification What is classification Classification Simple methods for classification Classification by decision tree induction Classification evaluation Classification in Large Databases Classification

### Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs artifical neural networks

### Neural Networks: Introduction

Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1

### Pattern Recognition and Machine Learning

Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

### Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3

### Evaluation. Andrea Passerini Machine Learning. Evaluation

Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

### AI Programming CS F-20 Neural Networks

AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

### NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

### Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs. artifical neural

### ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting Readings: Murphy 16.4; Hastie 16 Dhruv Batra Virginia Tech Administrativia HW3 Due: April 14, 11:55pm You will implement

### Learning from Examples

Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble

### Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive

### Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

### Neural Networks DWML, /25

DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.001 second Number of neurons 10^10 Connections per neuron 10^4 to 10^5 Scene recognition time 0.1 sec 100 inference

### Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

### Evaluation requires to define performance measures to be optimized

Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

### VBM683 Machine Learning

VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data

### Artificial Neural Network

Artificial Neural Network Contents 2 What is ANN? Biological Neuron Structure of Neuron Types of Neuron Models of Neuron Analogy with human NN Perceptron OCR Multilayer Neural Network Back propagation

### CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 Administration TA3 is due this week March

### ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided

### Feedforward Neural Nets and Backpropagation

Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features

### 18.9 SUPPORT VECTOR MACHINES

744 Chapter 18. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight, that is, the examples whose kernels overlap the

### Hierarchical Boosting and Filter Generation

January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers

### COMS 4771 Introduction to Machine Learning. Nakul Verma

COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative

### Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

3/9/17 Neural Networks Emily Fox University of Washington March 10, 2017 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network Perceptron as a neural

### Lecture 4: Perceptrons and Multilayer Perceptrons

Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons

### Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions

### ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

### Bayesian Learning. Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2.

Bayesian Learning Reading: Tom Mitchell, Generative and discriminative classifiers: Naive Bayes and logistic regression, Sections 1-2. (Linked from class website) Conditional Probability Probability of

### FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE

FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE You are allowed a two-page cheat sheet. You are also allowed to use a calculator. Answer the questions in the spaces provided on the question sheets.

### Final Examination CS540-2: Introduction to Artificial Intelligence

Final Examination CS540-2: Introduction to Artificial Intelligence May 9, 2018 LAST NAME: SOLUTIONS FIRST NAME: Directions 1. This exam contains 33 questions worth a total of 100 points 2. Fill in your

### Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes

Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes Support Vector Machine, Random Forests, Boosting December 2, 2012 Neural networks Artificial Neural networks: Networks

### Logistic Regression & Neural Networks

Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability