Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Linear Classifiers (Perceptrons). A linear classifier is a mapping that partitions feature space using a linear function (a straight line, or a hyperplane): it separates the two classes using a straight line in feature space, so in 2 dimensions the decision boundary is a straight line. (Figure: linearly separable vs. linearly non-separable data plotted over feature 1, x1 and feature 2, x2, with the decision boundary shown.)

Perceptron Classifier (2 features). Classifier: f(x) = w1 x1 + w2 x2 + w0, a weighted sum of the inputs; the threshold function T(f) in {-1, +1} converts this to the output class decision. The decision boundary is at f(x) = 0; solving gives x2 = -(w1/w2) x1 - w0/w2, a line.
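To make the threshold rule concrete, here is a minimal NumPy sketch of a two-feature perceptron (not the lecture's code; the weight values are made up for illustration):

    import numpy as np

    def perceptron_predict(x, w, w0):
        # threshold a weighted sum of the inputs: T(f) in {-1, +1}
        f = w[0] * x[0] + w[1] * x[1] + w0   # linear response f(x)
        return 1 if f > 0 else -1

    # example weights; the boundary is the line x2 = -(w1/w2) x1 - w0/w2
    w, w0 = np.array([1.0, -2.0]), 0.5
    print(perceptron_predict([0.0, 1.0], w, w0))   # -> -1
    print(perceptron_predict([2.0, 0.0], w, w0))   # -> +1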

Perceptron (Linear classifier). Classifier: f(x) = w1 x1 + w2 x2 + w0, a weighted sum of the inputs; the threshold function T(f) in {0, 1} gives the output class decision, with T(f) = 0 if f < 0 and T(f) = 1 if f > 0. 1D example: the decision boundary is the x at which T(w1 x + w0) transitions.

Features and perceptrons. Recall the role of features: we can create extra features that allow more complex decision boundaries. Linear classifier with features [1, x]: decision rule T(ax + b), i.e. ax + b >/< 0; the boundary ax + b = 0 is a single point. With features [1, x, x^2]: decision rule T(ax^2 + bx + c); boundary ax^2 + bx + c = 0 = ? What features can produce this decision rule?
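A small sketch (not from the slides) of the quadratic-feature case: the same threshold-of-a-linear-function rule applied to the expanded features [1, x, x^2] yields a boundary with up to two crossing points; the coefficients below are made up for illustration.

    import numpy as np

    def predict_quadratic(x, a, b, c):
        # perceptron on features [1, x, x^2]: threshold a*x^2 + b*x + c
        f = a * x**2 + b * x + c
        return np.where(f > 0, 1, 0)

    # boundary at x = 1 and x = 3 (roots of x^2 - 4x + 3)
    xs = np.array([0.0, 2.0, 4.0])
    print(predict_quadratic(xs, 1.0, -4.0, 3.0))   # -> [1 0 1]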

Features and perceptrons. Recall the role of features: we can create extra features that allow more complex decision boundaries, for example polynomial features Φ(x) = [1, x, x^2, x^3]. What other kinds of features could we choose? Step functions? (Figure: step features F1, F2, F3 and a linear function of those features, a F1 + b F2 + c F3 + d, e.g. a combination of F1, F2, and F3.)

Multi-layer perceptron model. Step functions are just perceptrons! Features are the outputs of a perceptron, and a combination of those features is the output of another. (Figure: input x1 and a constant 1 feed hidden units F1, F2, F3 through weights (w10, w11), (w20, w21), (w30, w31); the hidden layer feeds the output unit through weights w1, w2, w3.) Hidden layer weights: W1 = [w10 w11; w20 w21; w30 w31]; output layer weights: W2 = [w1 w2 w3]. The output is a linear function of the features: a F1 + b F2 + c F3 + d, e.g. a combination of F1, F2, and F3.

Multi-layer perceptron model. Step functions are just perceptrons! Features are the outputs of a perceptron, and a combination of those features is the output of another. Regression version: remove the activation function from the output. (Same network and weights as above: hidden layer W1 = [w10 w11; w20 w21; w30 w31], output layer W2 = [w1 w2 w3]; the output is a linear function of the features a F1 + b F2 + c F3 + d.)
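A minimal sketch (not the lecture's code) of this two-layer model with the weight matrices W1 and W2: the hidden features are step-function perceptrons of the input, and the output is a linear combination of those features (thresholded for classification, left linear for regression). The weight values are made up for illustration.

    import numpy as np

    def mlp_forward(x, W1, W2, regression=False):
        x1 = np.concatenate(([1.0], x))       # add constant feature
        F = (W1.dot(x1) > 0).astype(float)    # hidden features F1..F3 = step(perceptron)
        f1 = np.concatenate(([1.0], F))       # constant feature for the output layer
        out = W2.dot(f1)                      # linear function of the features
        return out if regression else float(out > 0)

    W1 = np.array([[-1.0, 1.0], [-2.0, 1.0], [-3.0, 1.0]])   # rows: [w_i0, w_i1]
    W2 = np.array([0.0, 1.0, -1.0, 1.0])                     # [d, a, b, c]
    print(mlp_forward(np.array([2.5]), W1, W2, regression=True))   # -> 0.0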

Features of MLPs. Simple building blocks: each element is just a perceptron function, and we can build upwards from the input features. A single perceptron gives a step function, a linear partition of the input. A 2-layer network's features are those partitions, and its output is any linear combination of them. A 3-layer network's features are now complex functions, and its output is any linear combination of those. Current research: deep architectures (many layers). The result is flexible function approximation: with enough hidden nodes we can approximate arbitrary functions. (Figure: input features x0, x1 feed hidden units h1, h2, h3, which are combined with weights v0, v1, ... to give the output y.)

Neural networks. Another term for MLPs, with a biological motivation: neurons are simple cells whose dendrites sense charge; the cell weighs its inputs (w1, w2, w3) and fires along its axon. (Image: How Stuff Works, the brain.)

Activation functions: logistic, hyperbolic tangent, Gaussian, linear, and many others.
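For reference, a minimal NumPy sketch of the activations named above (the exact scaling conventions vary; these are common choices):

    import numpy as np

    def logistic(z): return 1.0 / (1.0 + np.exp(-z))    # output in (0, 1)
    def tanh(z):     return np.tanh(z)                   # output in (-1, 1)
    def gaussian(z): return np.exp(-z**2)                # localized "bump"
    def linear(z):   return z                            # identity (e.g. regression output)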

Feed-forward networks. Information flows left to right: input the observed features, compute the hidden nodes (in parallel), then compute the next layer, X -> W[0] -> H1 -> W[1] -> H2:
    X1 = _add1(X)        # add constant feature
    T  = X1.dot(W[0].T)  # linear response
    H  = Sig(T)          # activation function
    H1 = _add1(H)        # add constant feature
    S  = H1.dot(W[1].T)  # linear response
    H2 = Sig(S)          # activation function
    # ...
Alternative: recurrent NNs.
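The slide's code assumes helper functions Sig and _add1; here is a self-contained sketch that actually runs, with made-up weights and shapes following the slide's W[l]: (outputs x inputs+1) convention:

    import numpy as np

    def Sig(z):                     # logistic activation
        return 1.0 / (1.0 + np.exp(-z))

    def _add1(A):                   # prepend a constant (bias) feature to each row
        A = np.atleast_2d(A)
        return np.hstack([np.ones((A.shape[0], 1)), A])

    rng = np.random.default_rng(0)
    W = [rng.normal(size=(3, 3)),   # W[0]: (N2 x N1+1), 2 inputs -> 3 hidden
         rng.normal(size=(1, 4))]   # W[1]: (N3 x N2+1), 3 hidden -> 1 output
    X = np.array([[0.5, -1.0]])     # one data point as a row

    H  = Sig(_add1(X).dot(W[0].T))  # hidden layer responses
    Yh = Sig(_add1(H).dot(W[1].T))  # network output
    print(Yh)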

Feed-forward networks: a note on multiple outputs. Regression: predict a multi-dimensional y; the shared representation means fewer parameters. Classification: predict a binary vector, either multi-class classification (y = 2 encoded as [0 0 1 0 ...]) or multiple joint binary predictions (image tagging, etc.); often trained as regression (MSE) with a saturating activation.
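A small sketch (not from the slides) of the one-hot target encoding described above, where class y = 2 out of four classes becomes [0 0 1 0]:

    import numpy as np

    def one_hot(y, num_classes):
        # encode integer class labels as binary indicator vectors
        Y = np.zeros((len(y), num_classes))
        Y[np.arange(len(y)), y] = 1.0
        return Y

    print(one_hot(np.array([2, 0]), 4))   # -> [[0 0 1 0], [1 0 0 0]]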

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Backpropagation. Prof. Alexander Ihler

Training MLPs. Observe features x with target y, push x through the NN to get the output ŷ, and measure the error (y - ŷ)². How should we update the weights to improve? (Different loss functions can be used if desired.) With logistic sigmoid activations the network (inputs, hidden layer, outputs) is smooth and differentiable, so we can optimize using batch gradient descent or stochastic gradient descent.
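As a warm-up for backpropagation, a minimal sketch (not the lecture's code) of stochastic gradient descent on a single sigmoid unit with squared error; the toy data and step size are made up for illustration.

    import numpy as np

    def sig(z):  return 1.0 / (1.0 + np.exp(-z))
    def dsig(z): return sig(z) * (1.0 - sig(z))          # derivative of the sigmoid

    X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # constant feature + one input
    y = np.array([0.0, 0.0, 1.0])
    w = np.zeros(2)
    for epoch in range(100):
        for xi, yi in zip(X, y):                         # stochastic: one example at a time
            f = xi.dot(w)                                # linear response
            err = yi - sig(f)                            # (y - yhat)
            w += 0.5 * err * dsig(f) * xi                # descent step on (y - yhat)^2
    print(w, sig(X.dot(w)))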

Forward pass / Backpropagation. Backpropagation is just gradient descent: apply the chain rule to the MLP, differentiating the loss function through the output layer and then the hidden layer. The output-layer update is identical to logistic MSE regression with inputs h_j; the chain rule then carries the error from the outputs ŷ_k back through the hidden responses h_j to the inputs x_i. (Equations on the slide: the loss function and the gradients for the output-layer and hidden-layer weights.)

Forward pass / Backpropagation. Just gradient descent: apply the chain rule to the MLP (loss function, output layer, hidden layer). In code, with layer sizes N1 (inputs), N2 (hidden), N3 (outputs):
    # Forward pass:  X : (1 x N1)
    X1 = _add1(X)                       # add constant feature
    T  = X1.dot(W[0].T)                 # W[0]: (N2 x N1+1)
    H  = Sig(T)                         # H : (1 x N2)
    H1 = _add1(H)
    S  = H1.dot(W[1].T)                 # W[1]: (N3 x N2+1)
    Yh = Sig(S)                         # Yh: (1 x N3)
    # Backward pass:
    B2 = (Y - Yh) * dSig(S)             # (1 x N3)
    G2 = B2.T.dot(H1)                   # (N3 x 1).(1 x N2+1) = (N3 x N2+1)
    B1 = B2.dot(W[1][:, 1:]) * dSig(T)  # (1 x N3).(N3 x N2) * (1 x N2)
    G1 = B1.T.dot(X1)                   # (N2 x 1).(1 x N1+1) = (N2 x N1+1)
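Putting the pieces together, a self-contained sketch (not the lecture's code) that repeats these forward and backward passes and takes gradient steps on the squared error; Sig, dSig, _add1, the weights, and the data are all made up here for illustration.

    import numpy as np

    def Sig(z):   return 1.0 / (1.0 + np.exp(-z))
    def dSig(z):  return Sig(z) * (1.0 - Sig(z))
    def _add1(A): return np.hstack([np.ones((A.shape[0], 1)), A])

    rng = np.random.default_rng(0)
    W = [rng.normal(scale=0.5, size=(3, 3)),   # W[0]: (N2 x N1+1), N1=2 inputs, N2=3 hidden
         rng.normal(scale=0.5, size=(2, 4))]   # W[1]: (N3 x N2+1), N3=2 outputs
    X = np.array([[0.5, -1.0]])                # one example
    Y = np.array([[1.0, 0.0]])                 # one-hot target

    for step in range(100):
        X1 = _add1(X);  T = X1.dot(W[0].T);  H = Sig(T)      # forward pass
        H1 = _add1(H);  S = H1.dot(W[1].T);  Yh = Sig(S)
        B2 = (Y - Yh) * dSig(S)                              # backward pass
        G2 = B2.T.dot(H1)
        B1 = B2.dot(W[1][:, 1:]) * dSig(T)
        G1 = B1.T.dot(X1)
        W[1] += 0.5 * G2                                     # descent on the squared error
        W[0] += 0.5 * G1
    print(Yh)                                                # approaches [1, 0]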

Example: regression, MCycle data. Train a 2-layer NN model: 1 input feature => 1 input unit, 10 hidden units, 1 target => 1 output unit; logistic sigmoid activation for the hidden layer, linear activation for the output layer. (Figure: the data and the learned prediction function; the responses of the hidden nodes, i.e. the features of the linear regression, select out useful regions of x.)
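The motorcycle data itself isn't included in the transcript, but the same setup is easy to reproduce; a sketch using scikit-learn's MLPRegressor (an assumption, the lecture doesn't name a library) on synthetic 1-D data as a stand-in:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)                       # synthetic stand-in data
    x = rng.uniform(0, 6, size=200).reshape(-1, 1)
    y = np.sin(x).ravel() + rng.normal(scale=0.2, size=200)

    # 1 input -> 10 logistic-sigmoid hidden units -> 1 linear output
    nn = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                      solver='lbfgs', max_iter=2000, random_state=0)
    nn.fit(x, y)
    print(nn.predict(np.array([[1.0], [3.0], [5.0]])))   # learned prediction function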

Example: classification, Iris data. Train a 2-layer NN model: 2 input features => 2 input units, 10 hidden units, 3 classes => 3 output units (y = [0 0 1], etc.); logistic sigmoid activation functions. Optimize the MSE of the predictions using stochastic gradient descent.
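A comparable sketch for the Iris setup, again using scikit-learn as a stand-in (note that MLPClassifier optimizes log-loss rather than the MSE objective mentioned on the slide):

    from sklearn.datasets import load_iris
    from sklearn.neural_network import MLPClassifier

    X, y = load_iris(return_X_y=True)
    X = X[:, :2]                          # keep 2 input features, as in the example
    nn = MLPClassifier(hidden_layer_sizes=(10,), activation='logistic',
                       solver='sgd', learning_rate_init=0.1,
                       max_iter=2000, random_state=0)
    nn.fit(X, y)                          # 3 classes -> 3 output units internally
    print(nn.score(X, y))                 # training accuracy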

MLPs in practice. Example: deep belief nets (Hinton et al. 2007) for handwriting recognition (online demo). Architecture: 784 pixels ↔ 500 mid ↔ 500 high ↔ 2000 top ↔ 10 labels. (Figure: the layer stack x, h1, h2, h3, ŷ.)

MLPs in practice, continued. The same deep belief net (784 pixels ↔ 500 mid ↔ 500 high ↔ 2000 top ↔ 10 labels) can also fix the output and simulate the inputs.

Neural networks & DBNs. Want to try them out? Matlab Deep Learning Toolbox (https://github.com/rasmusbergpalm/deeplearntoolbox), PyLearn2 (https://github.com/lisa-lab/pylearn2), TensorFlow.

Summary. Neural networks (multi-layer perceptrons) are a cascade of simple perceptrons, each just a linear classifier, with the hidden units used to create new features. Together they are general function approximators: with enough hidden units (features) they can approximate any function, so they can create nonlinear classifiers and are also used for function approximation and regression. Training is via backpropagation: gradient descent through the smooth (e.g. logistic) activations, applying the chain rule.