Other NN Models: Reinforcement learning (RL), Probabilistic neural networks

Other NN Models: Reinforcement learning (RL), Probabilistic neural networks, Support vector machines (SVM)

Reinforcement learning (RL). Basic ideas:
- Supervised learning (delta rule, BP): samples (x, f(x)) are used to learn the function f(.); a precise error can be determined and is used to drive the learning.
- Unsupervised learning (competitive, SOM, BM): no target/desired output is provided to help learning; learning is self-organized/clustering.
- Reinforcement learning: in between the two; there is no target output for the input vectors in the training samples; a judge/critic evaluates the output: good: reward signal (+1), bad: penalty signal (-1).

RL exists in many places. It originated from psychology (the conditioned reflex). In many applications, it is much easier to determine good/bad, right/wrong, acceptable/unacceptable than to provide a precise correct answer/error. It is up to the learning process to improve the system's performance based on the critic's signal. In the machine learning community there are different theories and algorithms. The major difficulty is credit/blame distribution: chess playing: W/L (multi-step); soccer playing: W/L (multi-player).

Principle of RL. Let r = +1 denote reward (good output) and r = -1 denote penalty (bad output). If r = +1, the system is encouraged to continue what it is doing. If r = -1, the system is encouraged not to do what it is doing, and it needs to search for a better output because r = -1 does not indicate what the good output should be. A common method is random search.

ARP: the associative reward-and-penalty algorithm for NN RL (Barto and Anandan, 1985). Architecture: input x(k); stochastic units z(k) for random search; output y(k); a critic evaluates y(k).

Random search by stochastic units z:
p(z = 1) = 1 / (1 + e^(-2·net/T)),   p(z = -1) = 1 / (1 + e^(2·net/T));
or let z obey a continuous probability distribution function; or let z = net + ε, where ε is a random noise obeying a certain distribution.
Key: z is not a deterministic function of x; this gives z a chance to be a good output.
Prepare the (temporary) desired output:
d(k) = y(k) if r(k) = 1,   d(k) = -y(k) if r(k) = -1.
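As a concrete illustration, here is a minimal sketch of sampling such a bipolar stochastic unit (Python; the names `net`, `T`, and the function interface are illustrative, not part of the original slide):

```python
import numpy as np

def sample_stochastic_unit(net, T=1.0, rng=None):
    """Sample a bipolar stochastic unit z in {-1, +1} from its net input."""
    rng = np.random.default_rng() if rng is None else rng
    # p(z = +1) = 1 / (1 + e^(-2*net/T)); p(z = -1) is the complement
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * net / T))
    return 1 if rng.random() < p_plus else -1
```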

Compute the errors at the z layer: e(k) = d(k) - E(z(k)), where E(z(k)) is the expected value of z(k), because z is a random variable.
How to compute E(z(k)): take the average of z over a period of time, or compute it from the distribution if possible. If the logistic sigmoid function is used,
E(z) = (+1)·g(net) + (-1)·(1 - g(net)) = tanh(net/T).
Training: delta rule to learn the weights of the output nodes,
Δw = η·e·y if r = +1,   Δw = λ·η·e·y if r = -1 (with λ < 1);
BP or other methods to modify the weights at the lower layers.
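A rough sketch of one such update for a single stochastic unit, under the reconstruction above (the names `eta`, `lam`, `T` and the scaled-down penalty step are assumptions for illustration, not taken from the slide):

```python
import numpy as np

def arp_step(w, x, y, r, T=1.0, eta=0.1, lam=0.05):
    """One reward/penalty-style update for a single stochastic unit.

    w : weight vector of the unit      x : input pattern
    y : sampled output, +1 or -1       r : critic signal, +1 or -1
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    net = w @ x
    e_z = np.tanh(net / T)               # E[z] for the bipolar logistic unit
    d = y if r == 1 else -y              # temporary desired output d(k)
    err = d - e_z                        # e(k) = d(k) - E(z(k))
    rate = eta if r == 1 else lam * eta  # penalty steps scaled down (assumption)
    return w + rate * err * x            # delta-rule style weight change
```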

Probabilistic Neural Networks
1. Purpose: classify a given input pattern x into one of the predefined classes by the Bayesian decision rule. Suppose there are k predefined classes s_1, ..., s_k.
P(s_i): prior probability of class s_i
P(x|s_i): conditional probability of x, given s_i
P(x): probability of x
P(s_i|x): posterior probability of s_i, given x
Example: S = {s_1, ..., s_k} is the set of all patients; s_i is the set of all patients having disease i; x is a description (manifestations) of a patient.

P(x|s_i): the probability that a patient with disease s_i will have description x. P(s_i|x): the probability that a patient with description x has disease s_i. Classify x into class s_j if P(s_j|x) = max_i P(s_i|x). By Bayes' theorem, P(s_i|x) = P(x|s_i)·P(s_i) / P(x). Because P(x) is a constant, P(s_j|x) = max_i P(s_i|x) iff P(x|s_j)·P(s_j) = max_i P(x|s_i)·P(s_i). In PNN, the P(x|s_i) are learned from exemplars.
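For instance, a minimal sketch of this decision rule (the function name and interface are illustrative):

```python
import numpy as np

def bayes_decide(likelihoods, priors):
    """Pick the class maximizing P(x|s_i) * P(s_i); P(x) is the same for
    every class, so it can be ignored."""
    scores = np.asarray(likelihoods) * np.asarray(priors)
    return int(np.argmax(scores))

# Example: P(x|s_1) = 0.05, P(x|s_2) = 0.20 with priors 0.9 and 0.1 gives
# scores 0.045 vs 0.020, so s_1 (index 0) wins despite its smaller likelihood.
print(bayes_decide([0.05, 0.20], [0.9, 0.1]))  # -> 0
```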

2. Estimate probabilities.
- Training exemplars: x_j^(i), the j-th exemplar belonging to class s_i.
- Priors can be obtained either from experts' estimates or calculated from the exemplars: P(s_i) = |s_i| / Σ_{l=1..k} |s_l|.
- Conditionals are estimated by the Parzen estimator:
  P(x|s_i) = (1 / ((2π)^(m/2) · σ^m · n_i)) · Σ_{j=1..n_i} exp(-||x - x_j^(i)||² / (2σ²)),
  where m is the dimension of the pattern, n_i is the number of exemplars in s_i, and x is the input pattern.
- This is closely related to the Gaussian radial basis function f(x) = (1 / (√(2π)·σ)) · exp(-(x - u)² / (2σ²)).
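A small sketch of the Parzen estimate above (Python; `sigma` stands for the kernel width σ, and the function name is illustrative):

```python
import numpy as np

def parzen_conditional(x, exemplars, sigma=1.0):
    """Estimate P(x|s_i) from the exemplars x_j^(i) of one class.

    x         : input pattern, shape (m,)
    exemplars : exemplars of class s_i, shape (n_i, m)
    """
    x = np.asarray(x, dtype=float)
    ex = np.asarray(exemplars, dtype=float)
    n_i, m = ex.shape
    sq_dists = np.sum((ex - x) ** 2, axis=1)          # ||x - x_j^(i)||^2
    norm = (2 * np.pi) ** (m / 2) * sigma ** m * n_i  # normalizing constant
    return float(np.sum(np.exp(-sq_dists / (2 * sigma ** 2))) / norm)
```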

3. PNN architecture: feed-forward with 4 layers: input layer, exemplar layer, class layer (z), decision layer.
- Exemplar layer: RBF nodes, one per exemplar, centered on x_j^(i); y_j^(i) = exp(-||x - x_j^(i)||² / σ²), so y_j^(i) is determined by the distance between x and x_j^(i) and is large if x is close to x_j^(i).
- Class layer: node z_i connects to all exemplars belonging to class s_i; z_i is approximately the Parzen estimate of P(x|s_i); z_i is large if x is close to more exemplars x_j^(i).
- Decision layer: picks the winner based on z_i · P(s_i). If necessary, training adjusts the weights of the upper layers.
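Putting the layers together, a compact sketch of the PNN forward pass (assuming the exemplar-layer activation exp(-||x - x_j^(i)||²/σ²) as above; the interface names are illustrative):

```python
import numpy as np

def pnn_classify(x, class_exemplars, priors, sigma=1.0):
    """Return the index of the winning class for input pattern x.

    class_exemplars : list of (n_i, m) arrays, one per class s_i
    priors          : list of prior probabilities P(s_i)
    """
    x = np.asarray(x, dtype=float)
    scores = []
    for ex, p in zip(class_exemplars, priors):
        ex = np.asarray(ex, dtype=float)
        # exemplar layer: one RBF node per exemplar, centered on x_j^(i)
        y = np.exp(-np.sum((ex - x) ** 2, axis=1) / sigma ** 2)
        # class layer: z_i sums its exemplar activations
        # (proportional to the Parzen estimate of P(x|s_i))
        z = y.sum()
        # decision layer compares z_i weighted by the prior P(s_i)
        scores.append(z * p)
    return int(np.argmax(scores))
```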

4. Comments:
- Classification by Bayes rule.
- Fast classification.
- Fast learning.
- Guaranteed to approach the Bayes-optimal decision surface, provided that the class probability density functions are smooth and continuous.
- Trades nodes for time (not good with large training samples).
- The probability density function to be represented must be smooth and continuous.