Lecture 12: Multilayer perceptrons II


Lecture 12: Multilayer perceptrons II

- Bayes discriminants and MLPs
- The role of hidden units
- An example

Introduction to Pattern Recognition, Ricardo Gutierrez-Osuna, Wright State University

Bayes discriminants and MLPs (1)

As we have seen throughout the course, the classifier that minimizes the probability of error can be expressed as a family of discriminant functions defined by the maximum a posteriori rule:

$$g_i(x) = P(\omega_i|x); \quad \text{choose } \omega_i \text{ with } i = \arg\max_i \{P(\omega_i|x)\}$$

How does the output of an MLP relate to this optimal classifier?

Assume an MLP with a one-of-C encoding for the targets:

$$t_k(x) = \begin{cases} 1 & x \in \omega_k \\ 0 & \text{otherwise} \end{cases}$$

The contribution to the error of the k-th output neuron is

$$J_k(W) = \sum_x \left[g_k(x;W) - t_k(x)\right]^2 = \sum_{x \in \omega_k} \left[g_k(x;W) - 1\right]^2 + \sum_{x \notin \omega_k} \left[g_k(x;W) - 0\right]^2$$

where $g_k(x;W)$ is the discriminant function computed by the MLP for the k-th class and the set of weights W.
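The one-of-C encoding and the per-output error $J_k$ can be sketched in a few lines of numpy. The data below is a hypothetical stand-in; `G` plays the role of the MLP outputs $g_k(x;W)$:

```python
import numpy as np

# Hypothetical toy setup: N examples, C classes, one-of-C target encoding.
rng = np.random.default_rng(0)
N, C = 6, 3
labels = rng.integers(0, C, size=N)   # class index of each example
T = np.eye(C)[labels]                 # t_k(x) = 1 iff x belongs to class k

# Stand-in for the MLP outputs g(x; W), one row per example.
G = rng.random((N, C))

# Contribution of the k-th output neuron to the sum-squared error:
# J_k = sum over x in omega_k of (g_k - 1)^2 + sum over x not in omega_k of (g_k - 0)^2
k = 0
in_k = labels == k
J_k = np.sum((G[in_k, k] - 1.0) ** 2) + np.sum(G[~in_k, k] ** 2)

# This equals the ordinary squared error against the one-of-C target column.
assert np.isclose(J_k, np.sum((G[:, k] - T[:, k]) ** 2))
```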

Bayes discriminants and MLPs (2)

For an infinite number of examples, the previous criterion function becomes

$$\lim_{N\to\infty} \frac{1}{N} J_k(W) = P(\omega_k) \int \left[g_k(x;W) - 1\right]^2 p(x|\omega_k)\,dx + P(\omega_{\bar k}) \int g_k(x;W)^2\, p(x|\omega_{\bar k})\,dx$$

$$= \int \left[g_k(x;W) - 1\right]^2 p(x,\omega_k)\,dx + \int g_k(x;W)^2\, p(x,\omega_{\bar k})\,dx$$

Expanding the squares, using $p(x,\omega_k) + p(x,\omega_{\bar k}) = p(x)$ and $P(\omega_k|x) = p(x,\omega_k)/p(x)$, and completing the square:

$$= \int \left[g_k(x;W) - P(\omega_k|x)\right]^2 p(x)\,dx + \underbrace{\int P(\omega_k|x)\,P(\omega_{\bar k}|x)\,p(x)\,dx}_{\text{independent of } W}$$

Bayes discriminants and MLPs (3)

The back-propagation rule changes W to minimize J(W), so in fact it is minimizing

$$\int \left[g_k(x;W) - P(\omega_k|x)\right]^2 p(x)\,dx$$

Summing over all classes (output neurons), we conclude that back-prop also minimizes

$$\sum_{k=1}^{C} \int \left[g_k(x;W) - P(\omega_k|x)\right]^2 p(x)\,dx$$

So, in the limit of infinite examples, the outputs of the MLP will approximate (in a least-squares sense) the true a posteriori probabilities: $g_k(x;W) \cong P(\omega_k|x)$.

Notice that nothing said here is specific to MLPs: any discriminant function with adaptive parameters trained to minimize the sum-squared error at the output of a one-of-C encoding will approximate the a posteriori probabilities.

This result will be true if and only if
- the MLP has enough hidden units to represent the a posteriori densities,
- we have an infinite number of examples, and
- the MLP does not get trapped in a local minimum.

In practice we will have a limited number of examples, so the outputs will not always represent probabilities; for instance, there is no guarantee that they will sum up to 1.

We can use this result to determine if the network has trained properly: if the sum of the outputs differs significantly from 1, it is an indication that the MLP is not modeling the a posteriori densities properly and that we may have to change the MLP (topology, number of hidden units, etc.)
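The claim that any least-squares-trained discriminant approximates the posteriors can be checked numerically. The sketch below uses an assumed setup: two 1-D Gaussian classes with unit variance, means at ±1, and equal priors, for which the true posterior is the logistic $P(\omega_1|x) = 1/(1+e^{-2x})$. A linear-in-parameters polynomial model fit by least squares to 0/1 targets comes close to that posterior:

```python
import numpy as np

# Two 1-D Gaussian classes (assumed example): N(-1,1) for class 0, N(+1,1) for class 1.
rng = np.random.default_rng(1)
n = 20000
x = np.concatenate([rng.normal(-1.0, 1.0, n), rng.normal(+1.0, 1.0, n)])
t = np.concatenate([np.zeros(n), np.ones(n)])   # one-of-2 target for class 1

# Linear-in-parameters model (degree-5 polynomial features), fit by least squares.
Phi = np.vander(x, 6, increasing=True)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# True posterior for this setup: P(w1|x) = 1 / (1 + exp(-2x)).
xs = np.linspace(-2.0, 2.0, 41)
true_post = 1.0 / (1.0 + np.exp(-2.0 * xs))
model = np.vander(xs, 6, increasing=True) @ w
max_err = np.max(np.abs(model - true_post))     # small approximation error
```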

The role of hidden units (1)

Let us assume an MLP with non-linear activation functions for the hidden layer(s) and a linear activation function for the output layer.

[Figure: feed-forward network with inputs $x_1..x_D$, hidden layers with weights $w^{<m>}$, $m = 1..M-1$, and a linear output layer $w^{<M>}$ producing outputs $\omega_1..\omega_C$.]

If we hold constant the set of hidden weights $w^{<m>}$, $m = 1..M-1$, the minimization of the objective function J(W) with respect to the output weights $w^{<M>}$ becomes a linear optimization problem and can, therefore, be solved in closed form:

$$W^{<M>\star} = \arg\min_{W^{<M>}}\ \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{C} \left( \sum_{j=1}^{H_{M-1}} w_{kj}^{<M>} y_j^{(n)} - t_k^{(n)} \right)^2$$

It can be shown [Bishop, 1992] that the role of the output biases is to compensate for the difference between the averages (over the data set) of the target values and the weighted sum of the averages of the hidden unit outputs:

$$w_{0k}^{<M>} = E[t_k] - \sum_{j=1}^{H_{M-1}} w_{kj}^{<M>} E[y_j]$$

This allows us to ignore the means of the outputs and targets and express the objective function in terms of zero-mean variables (dropping the index $(n)$ for clarity):

$$\tilde J(W) = \frac{1}{2} \sum_{k=1}^{C} \left( \sum_{j=1}^{H_{M-1}} w_{kj}^{<M>}\,\tilde y_j - \tilde t_k \right)^2 \quad\text{where}\quad \tilde y_j = y_j - E[y_j],\ \ \tilde t_k = t_k - E[t_k]$$

The role of hidden units (2)

To find the optimal output weights $w^{<M>}$ we form the partial derivative and set it to zero:

$$\frac{\partial \tilde J}{\partial w_{kj}^{<M>}} = \sum_n \left( \sum_{j'} w_{kj'}^{<M>}\,\tilde y_{j'}^{(n)} - \tilde t_k^{(n)} \right) \tilde y_j^{(n)} = 0$$

We introduce the following matrix notation:
- $W^{<M>}$ denotes the weights of the linear layer,
- $\tilde Y$ denotes the zero-mean outputs of the last hidden layer (each column is an example, each row is a hidden unit), and
- $\tilde T$ denotes the zero-mean targets (each column is an example, each row is an output).

Using matrix notation we can again express the objective function as

$$\tilde J(W) = \frac{1}{2}\,\mathrm{Tr}\!\left[ \left( W^{<M>} \tilde Y - \tilde T \right) \left( W^{<M>} \tilde Y - \tilde T \right)^T \right]$$

So the previous minimization problem becomes

$$\left( W^{<M>} \tilde Y - \tilde T \right) \tilde Y^T = 0$$

and the optimal set of weights $W^{<M>}$ becomes

$$W^{<M>} = \tilde T\,\tilde Y^{\dagger} \quad\text{where}\quad \tilde Y^{\dagger} = \tilde Y^T \left( \tilde Y\,\tilde Y^T \right)^{-1} \ \text{is the pseudo-inverse of } \tilde Y$$

It is important to notice that this solution can be calculated explicitly; no iterative procedure (i.e., steepest descent) is necessary. We now turn our attention to the hidden layer(s).
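The closed-form output-layer solve can be sketched in numpy. The matrices below are random stand-ins; shapes follow the convention above (one example per column):

```python
import numpy as np

# Random stand-ins: H hidden units, C outputs, N examples.
rng = np.random.default_rng(2)
H, C, N = 4, 5, 100
Y = rng.random((H, N))                       # outputs of the last hidden layer
T = np.eye(C)[rng.integers(0, C, N)].T       # one-of-C targets, one column per example

# Remove the means, as in the derivation above.
Yt = Y - Y.mean(axis=1, keepdims=True)
Tt = T - T.mean(axis=1, keepdims=True)

# Optimal output weights via the pseudo-inverse: W = T~ Y~^+
W = Tt @ np.linalg.pinv(Yt)

# The biases compensate for the means: w0_k = E[t_k] - sum_j w_kj E[y_j]
w0 = T.mean(axis=1) - W @ Y.mean(axis=1)

# Check: W solves the least-squares problem, so the gradient (W Y~ - T~) Y~^T vanishes.
grad = (W @ Yt - Tt) @ Yt.T
assert np.allclose(grad, 0, atol=1e-8)
```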

The role of hidden units (3)

Substituting the optimal value of $W^{<M>}$ into the objective function yields

$$\tilde J = \frac{1}{2}\,\mathrm{Tr}\!\left[ \tilde T \tilde T^T \right] - \frac{1}{2}\,\mathrm{Tr}\!\left[ S_T^{-1} S_B \right] \quad\text{where}\quad S_T = \tilde Y \tilde Y^T,\ \ S_B = \tilde Y \tilde T^T \tilde T \tilde Y^T$$

Since $\mathrm{Tr}[\tilde T \tilde T^T]$ is independent of W, the minimization of J(W) is equivalent to maximizing

$$J'(W) = \mathrm{Tr}\!\left[ S_T^{-1} S_B \right]$$

Since we are using a one-of-C encoding in the output layer, it can be shown that $S_B$ becomes

$$S_B = \sum_{k=1}^{C} N_k^2 \left( \bar y_k - \bar y \right) \left( \bar y_k - \bar y \right)^T \quad\text{where}\quad \bar y_k = E[y\,|\,\omega_k]$$

Notice that this $S_B$ differs from the conventional between-class covariance matrix by having $N_k^2$ instead of $N_k$. This means that the MLP will have a strong bias in favor of classes that have a large number of examples.

CONCLUSION
- Choosing the optimum weights of an MLP to minimize the squared error at the output layer forces the weights of the hidden layer(s) to be chosen so that the transformation from the input data to the output of the (last) hidden layer maximizes the discriminant function $\mathrm{Tr}[S_T^{-1} S_B]$ measured at the output of the (last) hidden layer.
- This is precisely why MLPs have been demonstrated to perform classification tasks so well.
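The $N_k^2$ weighting can be made concrete with a small numpy sketch (hypothetical hidden-layer outputs; class sizes deliberately unbalanced so the large class dominates $S_B$):

```python
import numpy as np

# Hypothetical hidden-layer outputs for C classes with unbalanced sizes.
rng = np.random.default_rng(3)
H, C = 4, 3
counts = [10, 20, 70]                        # strongly unbalanced classes
Y = [rng.normal(k, 1.0, (n, H)) for k, n in enumerate(counts)]

# Modified between-class matrix from the derivation above: N_k^2, not N_k.
mean_all = np.concatenate(Y).mean(axis=0)
S_B = np.zeros((H, H))
for n, Yk in zip(counts, Y):
    d = Yk.mean(axis=0) - mean_all
    S_B += n**2 * np.outer(d, d)             # note the squared class count

# Total scatter S_T of the (centered) hidden-unit outputs.
Z = np.concatenate(Y) - mean_all
S_T = Z.T @ Z

# The criterion that the hidden layer implicitly maximizes.
criterion = np.trace(np.linalg.inv(S_T) @ S_B)
```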

An example

We train a two-layer MLP to classify five odors from an array of sixty gas sensors:
- The MLP has sixty inputs, one for each gas sensor.
- The MLP has five outputs, one for each odor; the output neurons use the one-of-C encoding of classes.
- Four hidden neurons are used (as many as LDA projections).
- The hidden layer has the logistic sigmoidal activation function; the output layer has a linear activation function.

Training:
- Hidden weights and biases trained with the steepest descent rule.
- Output weights and biases trained with the pseudo-inverse rule.

[Figure: sensor response traces for the five odors: orange, apple, cherry, fruit-punch, tropical-punch.]
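The hybrid training scheme just described (steepest descent for the hidden weights, pseudo-inverse for the output weights) can be sketched as follows. The data here is a random stand-in, not the odor data set; only the dimensions (60 sensor inputs, 4 hidden units, 5 odor classes) follow the example:

```python
import numpy as np

# Random stand-in data with the dimensions of the example above.
rng = np.random.default_rng(4)
D, H, C, N = 60, 4, 5, 90                    # 60 sensors, 4 hidden units, 5 odors
X = rng.normal(size=(D, N))
T = np.eye(C)[rng.integers(0, C, N)].T       # one-of-C targets, one column per example

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
W1 = rng.normal(scale=0.1, size=(H, D))      # hidden weights (biases omitted for brevity)

lr = 0.01
for _ in range(50):
    Y = sigmoid(W1 @ X)                      # hidden-layer outputs
    W2 = T @ np.linalg.pinv(Y)               # output layer: closed-form pseudo-inverse solve
    E = W2 @ Y - T                           # residual of the linear output layer
    # Back-propagate the squared error through the sigmoid to the hidden weights.
    G = (W2.T @ E) * Y * (1.0 - Y)
    W1 -= lr * (G @ X.T) / N                 # steepest descent step on the hidden weights

# Final forward pass with a matching output-layer solve.
Y = sigmoid(W1 @ X)
W2 = T @ np.linalg.pinv(Y)
final_err = np.mean((W2 @ Y - T) ** 2)
```

Solving the output layer exactly at every step keeps the gradient updates focused on the hidden-layer transformation, which is the part the closed-form solve cannot reach.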

An example: results

[Figures: scatter plots of hidden-neuron activations (hidden neuron 1 vs. 2, and hidden neuron 3 vs. 4) showing the class separation found by the hidden layer; sensor response traces; output-neuron activations across the examples for the five odors: orange, apple, cherry, fruit-punch, tropical-punch.]