Security Analytics. Topic 6: Perceptron and Support Vector Machine

Security Analytics, Topic 6: Perceptron and Support Vector Machine. Purdue University, Prof. Ninghui Li. Based on slides by Prof. Jennifer Neville and Prof. Chris Clifton.

Readings: Principles of Data Mining, Chapter 10: Predictive Modeling for Classification, Section 10.3 Perceptron.

PERCEPTRON

Input signals are sent from other neurons. Connection strengths determine how the signals are accumulated. If sufficient signals accumulate, the neuron fires a signal.

Input signals x and coefficients w are multiplied; the weights correspond to connection strengths. The weighted signals are added up, and if the sum is large enough, the neuron fires. Activation: $a = \sum_{i=1}^{M} x_i w_i$; if $(a > t)$ then output 1, else output 0. (Diagram: incoming signals $x_1, x_2, x_3$ with connection strengths $w_1, w_2, w_3$ feeding the activation level.)

Calculation: $a = \sum_{i=1}^{M} x_i w_i$. Sum notation (just like a loop from 1 to M over arrays double[] x and double[] w): multiply corresponding elements and add them up to get a (the activation); if (activation > threshold) FIRE!

The Perceptron Decision Rule: if $\sum_{i=1}^{M} x_i w_i > t$ then output 1, else output 0.
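The rule above is just a loop and a comparison. Below is a minimal Java sketch (the class and method names are illustrative, not from the slides); the input, weight, and threshold values are taken from the worked example a few slides further on.

```java
public class Perceptron {
    // Activation: a = sum_i x[i] * w[i] (the sum written as a loop from 1 to M)
    static double activation(double[] x, double[] w) {
        double a = 0.0;
        for (int i = 0; i < x.length; i++) {
            a += x[i] * w[i];
        }
        return a;
    }

    // Decision rule: if activation > threshold t then output 1, else output 0
    static int output(double[] x, double[] w, double t) {
        return activation(x, w) > t ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5, 2.0};  // input signals (from the worked example slide)
        double[] w = {0.2, 0.5, 0.5};  // connection strengths
        double t = 1.0;                // firing threshold
        System.out.println("activation a = " + activation(x, w)); // approximately 1.45
        System.out.println("fires: " + output(x, w, t));          // 1, since 1.45 > 1.0
    }
}
```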

Decision boundary: if $\sum_{i=1}^{M} x_i w_i > t$ then output 1, else output 0. The boundary separates the region where output = 1 from the region where output = 0 (rugby player = 1, ballet dancer = 0).

Is this a good decision boundary? if $\sum_{i=1}^{M} x_i w_i > t$ then output 1, else output 0

$w_1 = 1.0$, $w_2 = 0.2$, $t = 0.05$: if $\sum_{i=1}^{M} x_i w_i > t$ then output 1, else output 0

$w_1 = 2.1$, $w_2 = 0.2$, $t = 0.05$: if $\sum_{i=1}^{M} x_i w_i > t$ then output 1, else output 0

$w_1 = 1.9$, $w_2 = 0.02$, $t = 0.05$: if $\sum_{i=1}^{M} x_i w_i > t$ then output 1, else output 0

$w_1 = 0.8$, $w_2 = -0.03$, $t = 0.05$: Changing the weights/threshold makes the decision boundary move. It is pointless (or impossible) to do this by hand; that only works for the simple 2-D case. We need an algorithm.

x = [1.0, 0.5, 2.0], w = [0.2, 0.5, 0.5], t = 1.0. Q1. What is the activation, a, of the neuron? $a = \sum_{i=1}^{M} x_i w_i = (1.0 \times 0.2) + (0.5 \times 0.5) + (2.0 \times 0.5) = 1.45$. Q2. Does the neuron fire? if (activation > threshold) output=1 else output=0. Since 1.45 > 1.0, yes, it fires.

x = [1.0, 0.5, 2.0], w = [0.2, 0.5, 0.5]. Q3. What if we set the threshold to 0.5 and weight #3 to zero? $a = (1.0 \times 0.2) + (0.5 \times 0.5) + (2.0 \times 0.0) = 0.45$; if (activation > threshold) output=1 else output=0. Since 0.45 < 0.5, no, it does not fire.

The Perceptron Model: if $\sum_{i=1}^{d} w_i x_i > t$ then $\hat{y} = 1$ ("player") else $\hat{y} = 0$ ("dancer"). Error function: the number of mistakes (a.k.a. classification error). Learning algorithm: ??? We need to optimise the w and t values. (Plot axes: height, weight.)

Perceptron Learning Rule: new weight = old weight + 0.1 × (true label − output) × input. What weight updates do these cases produce? if (target = 0, output = 0) then update = ? if (target = 0, output = 1) then update = ? if (target = 1, output = 0) then update = ? if (target = 1, output = 1) then update = ?

Learning algorithm for the Perceptron (a runnable sketch follows below):
initialise weights to random numbers in range -1 to +1
for n = 1 to NUM_ITERATIONS
    for each training example (x, y)
        calculate activation
        for each weight
            update weight by learning rule
        end
    end
end
Perceptron convergence theorem: If the data is linearly separable, then application of the Perceptron learning rule will find a separating decision boundary within a finite number of iterations.
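Below is a minimal Java sketch of this algorithm, assuming the learning rule from the previous slide (learning rate 0.1) and, for simplicity, a fixed threshold t; the slides note that t should also be optimised, and folding it into the weights as a bias term is the usual trick. The class and variable names are illustrative.

```java
import java.util.Random;

public class PerceptronTrainer {
    // Minimal sketch of the slide's learning algorithm. The threshold t is kept
    // fixed here for simplicity; see the note above about optimising t as well.
    static double[] train(double[][] X, int[] y, double t, int numIterations) {
        Random rng = new Random();
        int m = X[0].length;
        double[] w = new double[m];
        // initialise weights to random numbers in range -1 to +1
        for (int i = 0; i < m; i++) {
            w[i] = rng.nextDouble() * 2.0 - 1.0;
        }
        for (int n = 0; n < numIterations; n++) {
            for (int j = 0; j < X.length; j++) {       // for each training example (x, y)
                // calculate activation
                double a = 0.0;
                for (int i = 0; i < m; i++) {
                    a += X[j][i] * w[i];
                }
                int output = (a > t) ? 1 : 0;
                // update each weight by the learning rule:
                // new weight = old weight + 0.1 * (true label - output) * input
                for (int i = 0; i < m; i++) {
                    w[i] += 0.1 * (y[j] - output) * X[j][i];
                }
            }
        }
        return w;
    }
}
```

With linearly separable training data, repeated passes of this loop will, by the convergence theorem above, reach weights that separate the classes.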

Supervised Learning Pipeline for the Perceptron: Training data → Learning algorithm (search for good parameters) → Model (if ... then ...); Testing data (no labels) → Model → Predicted labels.

New data: non-linearly separable (plot axes: height, weight). Our model, if $\sum_{i=1}^{d} w_i x_i > t$ then "player" else "dancer", does not match the problem (AGAIN!) and makes many mistakes!

One Approach: Multilayer Perceptron. (Network diagram: inputs x1 ... x5 feeding a layered network of perceptrons.)

Another Approach: Sigmoid activation (no more thresholds needed). Instead of: if $\sum_{i=1}^{d} w_i x_i > t$ then $\hat{y} = 1$ else $\hat{y} = 0$, use the smooth activation $a = \frac{1}{1 + \exp(-\sum_{i=1}^{d} w_i x_i)}$, plotted as the activation level against $\sum_{i=1}^{d} w_i x_i$.
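As a small illustration (not from the slides), the sigmoid activation can be computed as follows; Math.exp is the standard Java exponential, and the example values are the ones from the earlier worked example.

```java
public class SigmoidNeuron {
    // Sigmoid activation: a = 1 / (1 + exp(-sum_i w[i] * x[i]))
    // Produces a smooth value in (0, 1) instead of a hard 0/1 threshold decision.
    static double sigmoidActivation(double[] x, double[] w) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += w[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-s));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5, 2.0};
        double[] w = {0.2, 0.5, 0.5};
        System.out.println(sigmoidActivation(x, w)); // about 0.81 for these values
    }
}
```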

MLP decision boundary: nonlinear problems, solved! (Plot axes: height, weight.)

Neural Networks - summary. Perceptrons are a (simple) emulation of a neuron. Layering perceptrons gives you a multilayer perceptron. An MLP is one type of neural network; there are others. An MLP with sigmoid activation functions can solve highly nonlinear problems. Downside: we cannot use the simple perceptron learning algorithm. Instead we have the backpropagation algorithm. We will cover this later.

Perceptron Revisited: Linear Separators. Binary classification can be viewed as the task of separating classes in feature space: $w^T x + b > 0$ on one side of the separating hyperplane $w^T x + b = 0$, and $w^T x + b < 0$ on the other. Classifier: $f(x) = \mathrm{sign}(w^T x + b)$.
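A minimal sketch of this classifier, assuming the bias b plays the role of the negative threshold ($b = -t$) from the earlier perceptron formulation; the class and method names are illustrative.

```java
public class LinearSeparator {
    // f(x) = sign(w^T x + b); equivalent to the earlier rule "sum_i w_i x_i > t" with b = -t
    static int classify(double[] x, double[] w, double b) {
        double s = b;
        for (int i = 0; i < x.length; i++) {
            s += w[i] * x[i];
        }
        return s > 0 ? 1 : -1; // sign of w^T x + b
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5, 2.0};
        double[] w = {0.2, 0.5, 0.5};
        double b = -1.0; // b = -t with the earlier threshold t = 1.0
        System.out.println(classify(x, w, b)); // 1, since w^T x + b = 0.45 > 0
    }
}
```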

SEE SLIDES FOR SVM