Introduction to the Introduction to Artificial Neural Network

Introduction to the Introduction to Artificial Neural Network. Vuong Le, with Hao Tang's slides. Part of the content of the slides is from the Internet (possibly with modifications). The lecturer does not claim any ownership of any of the content of these slides.

Outline: Biological Inspirations; Applications & Properties of ANN; Perceptron; Multi-Layer Perceptrons; Error Backpropagation Algorithm; Remarks on ANN.

Biological Inspirations. Humans perform complex tasks like vision, motor control, or language understanding very well. One way to build intelligent machines is to try to imitate the (organizational principles of the) human brain.

Biological Neuron

Artificial Neural Networks. ANNs have been widely used in various domains for: pattern recognition, function approximation, etc.

Perceptron (Artificial Neuron). A perceptron takes a vector of real-valued inputs, calculates a linear combination of the inputs, and outputs 1 if the result is greater than some threshold and -1 (or 0) otherwise.

Perceptron. To simplify notation, assume an additional constant input x_0 = 1. We can write the perceptron function as o(x) = sgn(w · x) = sgn(w_0 x_0 + w_1 x_1 + ... + w_n x_n), where sgn(y) = 1 if y > 0 and -1 otherwise.
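A minimal Python sketch of this thresholded unit; the particular input and weight values are illustrative, not from the slides:

```python
import numpy as np

def perceptron(x, w):
    """Thresholded perceptron: output 1 if w . x > 0, else -1.
    Both x and w include the constant input x_0 = 1 and its weight w_0."""
    return 1 if np.dot(w, x) > 0 else -1

# Illustrative example with two real inputs plus the constant x_0 = 1.
x = np.array([1.0, 0.3, 0.8])   # [x_0, x_1, x_2]
w = np.array([-0.5, 1.0, 1.0])  # [w_0, w_1, w_2], assumed values
print(perceptron(x, w))         # -> 1, since -0.5 + 0.3 + 0.8 > 0
```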

Representational Power of the Perceptron. The perceptron represents a hyperplane decision surface in the n-dimensional space of instances. (Figure: a separating line defined by the weights <w_1, w_2> in the x_1-x_2 plane, with + examples on one side and - examples on the other: linearly separable data.)

Boolean Functions. A single perceptron can be used to represent many Boolean functions, using 1 (true) and 0 (false). Perceptrons can represent all of the primitive Boolean functions AND, OR, NOT.

Implementing AND. Inputs x_1 and x_2 with weights of 1 and bias weight W_0 = -1.5: o(x_1, x_2) = 1 if -1.5 + x_1 + x_2 > 0, and 0 otherwise.

Implementing OR. Inputs x_1 and x_2 with weights of 1 and bias weight W_0 = -0.5: o(x_1, x_2) = 1 if -0.5 + x_1 + x_2 > 0, and 0 otherwise.

Implementing NOT. Input x_1 with weight -1 and bias weight W_0 = 0.5: o(x_1) = 1 if 0.5 - x_1 > 0, and 0 otherwise.
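A small Python sketch of these three gates as thresholded units, using the bias weights from the slides; the input weights of 1 and -1 are read off the diagrams, so treat them as assumptions:

```python
def step(z):
    """0/1 threshold unit: fires when the weighted sum is positive."""
    return 1 if z > 0 else 0

def AND(x1, x2):
    return step(-1.5 + 1.0 * x1 + 1.0 * x2)

def OR(x1, x2):
    return step(-0.5 + 1.0 * x1 + 1.0 * x2)

def NOT(x1):
    return step(0.5 - 1.0 * x1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))
```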

The XOR Function. Unfortunately, some Boolean functions cannot be represented by a single perceptron. Truth table for XOR(x_1, x_2): o(0,0) = 0, o(0,1) = 1, o(1,0) = 1, o(1,1) = 0. A single perceptron with weights w_0, w_1, w_2 would have to satisfy w_0 ≤ 0, w_0 + w_2 > 0, w_0 + w_1 > 0, and w_0 + w_1 + w_2 ≤ 0. There is no assignment of values to w_0, w_1, and w_2 that satisfies the above inequalities. XOR cannot be represented!

Linear Separability. Boolean AND is linearly separable; Boolean XOR is not. Representation Theorem: perceptrons can only represent linearly separable functions. That is, the decision surface separating the output values has to be a plane. (Minsky & Papert, 1969)

Remarks on the perceptron. Perceptrons can represent all the primitive Boolean functions AND, OR, and NOT. Some Boolean functions cannot be represented by a single perceptron, such as the XOR function. Every Boolean function can be represented by some combination of AND, OR, and NOT. We want networks of perceptrons.

Implementing XOR with a Multi-Layer Perceptron (MLP). A hidden layer with an OR unit and a NAND unit, followed by an AND output unit: x_1 XOR x_2 = (x_1 OR x_2) AND (NOT(x_1 AND x_2)).
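Continuing the gate sketch above, the composition on the slide becomes a two-layer network directly (AND, OR, and NOT are the illustrative units defined earlier):

```python
def NAND(x1, x2):
    return NOT(AND(x1, x2))

def XOR(x1, x2):
    # Hidden layer: an OR unit and a NAND unit; output layer: AND of the two.
    return AND(OR(x1, x2), NAND(x1, x2))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))   # 0, 1, 1, 0
```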

Multi-Layer Perceptrons (MLP)

Representation Power of MLP: a conjunction of piece-wise hyperplanes.

Representation Power of ANN. Boolean functions: every Boolean function can be represented exactly by some network with two layers of units. Continuous functions: every bounded continuous function can be approximated with arbitrarily small error (under a finite norm) by a network with two layers of units. Arbitrary functions: any function can be approximated to arbitrary accuracy by a network with three layers of units.

Neural network design for non-closed-form problems. (Diagram: training data {x_i, y_i} goes through the training process to produce the neural network; testing data x* then goes through the testing process to produce the result y*.)

Definition of Training Error. Training error E(w): a function of the weight vector w over the training data set D, E(w) = (1/2) Σ_{d∈D} (t_d - o_d)², for an unthresholded perceptron (linear unit) with output o_d = w · x_d.
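A one-function sketch of this squared-error measure for a linear unit; the array names are illustrative:

```python
import numpy as np

def training_error(w, X, t):
    """E(w) = 1/2 * sum over the training set of (t_d - o_d)^2,
    with the linear-unit output o_d = w . x_d."""
    o = X @ w                        # one output per training example (rows of X)
    return 0.5 * np.sum((t - o) ** 2)
```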

Gradient Descent. To reduce the error, update the weight vector in the direction of steepest descent along the error surface: w ← w + Δw with Δw = -η ∇E(w), where ∇E(w) = (∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n).

Gradient Descent

Weight Update Rule. Δw_i = -η ∂E/∂w_i = η Σ_{d∈D} (t_d - o_d) x_{i,d}.

Gradient Descent Search Algorithm. repeat: initialize each Δw_i to 0; for each training example <x, t(x)>: compute o(x) = w · x, and for each w_i accumulate Δw_i ← Δw_i + η (t(x) - o(x)) x_i; then for each w_i update w_i ← w_i + Δw_i; until (termination condition).
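A runnable sketch of this batch update loop for a linear unit; the training data, learning rate, and fixed epoch count standing in for the termination condition are assumptions for illustration:

```python
import numpy as np

def train_linear_unit(X, t, eta=0.05, epochs=100):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2 with o = w . x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):              # stands in for "until termination condition"
        delta_w = np.zeros_like(w)
        for x_d, t_d in zip(X, t):       # for each training example
            o_d = w @ x_d                # unthresholded output
            delta_w += eta * (t_d - o_d) * x_d
        w += delta_w
    return w

# Illustrative data: OR-like targets, with a constant input x_0 = 1 in each row.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
print(train_linear_unit(X, t))
```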

Perceptron Learning Rule vs. Delta Rule. Perceptron learning rule: Δw_i = η (t - o) x_i, where o is the thresholded output. Delta rule: Δw_i = η (t - o) x_i, where o is the net (unthresholded) output. The perceptron learning rule uses the output of the threshold function (either -1 or 1) for learning. The delta rule uses the net output without further mapping into the output values -1 or 1.

Perceptron Learning Algorithm. Guaranteed to converge within a finite time if the training data are linearly separable and η is sufficiently small. If the data are not linearly separable, convergence is not assured.

Feed-forward Networks

Definition of Error for MLP. The error is again E(w) = (1/2) Σ_{d∈D} (t_d - o_d)², but the output now passes through the network: o = σ(Σ_j w_j h_j) with hidden-unit outputs h_j = σ(Σ_k w_{jk} x_k), so E is a function of all the weights w_j and w_{jk}. Each weight is still updated by gradient descent, Δw = -η ∂E/∂w.

Output Layer's Weight Update. With o = σ(Σ_m w_m h_m) and σ'(z) = σ(z)(1 - σ(z)): ∂E/∂w_j = ∂/∂w_j [(1/2)(t - o)²] = -(t - o) · o (1 - o) · h_j, so Δw_j = -η ∂E/∂w_j = η (t - o) o (1 - o) h_j.

Hidden Layer's Weight Update. Distribute the output error to an error for each h_j, in proportion to the weight w_j. Similar to the output layer: Δw_{jk} = η [(t - o) o (1 - o) w_j] · h_j (1 - h_j) · x_k. This is Error Back-propagation.

Error Back-Propagation Algorithm. Initialize all weights to small random numbers. repeat: for each training example <x, t(x)>: for each hidden node j, compute h_j = σ(Σ_k w_{jk} x_k); for the output node, compute o = σ(Σ_j w_j h_j); for each output-node weight, Δw_j = o (1 - o)(t - o) h_j; for each hidden-node weight, Δw_{jk} = [o (1 - o)(t - o) w_j] h_j (1 - h_j) x_k; for each hidden-node weight, w_{jk} ← w_{jk} + η Δw_{jk}; for each output-node weight, w_j ← w_j + η Δw_j; until (termination condition).
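A compact, runnable sketch of this algorithm for one sigmoid hidden layer and a single sigmoid output; the network size, data set, learning rate, epoch count, and the extra output-bias weight are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, t, n_hidden=2, eta=0.5, epochs=10000, seed=0):
    """Stochastic error back-propagation for a 1-hidden-layer, 1-output MLP."""
    rng = np.random.default_rng(seed)
    W_hid = rng.normal(0, 0.5, (n_hidden, X.shape[1]))   # w_jk: input k -> hidden j
    w_out = rng.normal(0, 0.5, n_hidden + 1)              # w_j, plus an output bias
    for _ in range(epochs):
        for x, t_d in zip(X, t):
            h = np.append(sigmoid(W_hid @ x), 1.0)   # hidden outputs h_j, constant 1
            o = sigmoid(w_out @ h)                   # forward pass: output o
            delta_o = o * (1 - o) * (t_d - o)                       # output error term
            delta_h = h[:-1] * (1 - h[:-1]) * w_out[:-1] * delta_o  # back-propagated
            w_out += eta * delta_o * h                              # output-layer update
            W_hid += eta * np.outer(delta_h, x)                     # hidden-layer update
    return W_hid, w_out

# Illustrative use: learn XOR, with a constant input x_0 = 1 in each example.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 0], dtype=float)
W_hid, w_out = train_mlp(X, t)
h = np.hstack([sigmoid(X @ W_hid.T), np.ones((4, 1))])
print(np.round(sigmoid(h @ w_out), 2))   # should approach 0, 1, 1, 0 (init permitting)
```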

Generalization, Overfitting, etc. Artificial neural networks with a large number of weights tend to overfit the training data. To increase generalization accuracy, use a validation set: find the optimal number of perceptrons, find the optimal number of training iterations, and stop when overfitting happens (see the early-stopping sketch at the end of this section).

Generalization, Overfitting, etc.
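A minimal early-stopping sketch of that advice; one_epoch and error are hypothetical helpers standing in for a single pass of the back-propagation loop above and a validation-error measure, and the patience threshold is an assumption:

```python
def train_with_early_stopping(one_epoch, error, weights, val_data,
                              max_epochs=1000, patience=20):
    """Keep the weights that do best on the validation set; stop once the
    validation error has not improved for `patience` consecutive epochs."""
    best_weights, best_err, since_best = weights, float("inf"), 0
    for _ in range(max_epochs):
        weights = one_epoch(weights)           # one training pass over the data
        val_err = error(weights, val_data)     # error on the held-out validation set
        if val_err < best_err:
            best_weights, best_err, since_best = weights, val_err, 0
        else:
            since_best += 1
            if since_best >= patience:         # validation error rising: overfitting
                break
    return best_weights
```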