Evaluation of classifiers; MLPs

Lecture: Evaluation of classifiers; MLPs
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Evaluation

For any data set we use to test the model we can build a confusion matrix: counts of examples with class label ω_j that are classified with a label α_i.

                 target
                 ω_0    ω_1
  predict  α_0   140     27
           α_1    10     54

The diagonal entries (140 and 54) count the agreement between the predicted and the target labels; the off-diagonal entries are errors.

  Error = (27 + 10) / 231 = 37/231
  Accuracy = 1 - Error = 194/231
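As a quick check of the arithmetic, here is a minimal sketch (plain NumPy, with the counts above hard-coded) that computes the error and accuracy from a confusion matrix:

```python
import numpy as np

# Rows = predicted label (α_0, α_1), columns = target label (ω_0, ω_1).
C = np.array([[140, 27],
              [ 10, 54]])

total = C.sum()                      # 231 examples
agreement = np.trace(C)              # 140 + 54 = 194 on the diagonal
error = (total - agreement) / total  # 37/231
accuracy = agreement / total         # 194/231
print(f"error = {error:.3f}, accuracy = {accuracy:.3f}")
```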

Evaluation for binary classification

Entries in the confusion matrix for binary classification have names:

                 target
                 ω_1    ω_0
  predict  α_1    TP     FP
           α_0    FN     TN

  TP: true positive (a hit)
  FP: false positive (a false alarm)
  TN: true negative (a correct rejection)
  FN: false negative (a miss)

Additional statistics:

  Sensitivity (recall):                  SENS = TP / (TP + FN)
  Specificity:                           SPEC = TN / (TN + FP)
  Positive predictive value (precision): PPV = TP / (TP + FP)
  Negative predictive value:             NPV = TN / (TN + FN)
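These four statistics are simple ratios of the confusion-matrix entries; a minimal sketch (the counts in the closing comment come from the example on the next slide):

```python
def sensitivity(TP, FN):   # recall: fraction of actual positives detected
    return TP / (TP + FN)

def specificity(TN, FP):   # fraction of actual negatives correctly rejected
    return TN / (TN + FP)

def ppv(TP, FP):           # precision: fraction of flagged positives that are real
    return TP / (TP + FP)

def npv(TN, FN):           # fraction of predicted negatives that are real
    return TN / (TN + FN)

# With TP=40, FP=10, FN=20, TN=80:
# sensitivity(40, 20) -> 40/60, specificity(80, 10) -> 80/90,
# ppv(40, 10) -> 40/50, npv(80, 20) -> 80/100.
```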

Binary classification: additional statistics

Example confusion matrix:

                 target
                 ω_1    ω_0
  predict  α_1    40     10
           α_0    20     80

  SENS = 40/60
  SPEC = 80/90
  PPV  = 40/50
  NPV  = 80/100

Row and column quantities: sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV).

Binary decisions: ROC

[Figure: the class-conditional densities of x for ω_0 and ω_1, with a decision threshold x*.]

Probabilities:
  SENS = p(x > x* | x ∈ ω_1)
  SPEC = p(x < x* | x ∈ ω_0)

Receiver operating characteristic (ROC)

The ROC curve plots

  SENS     = p(x > x* | x ∈ ω_1)   against
  1 - SPEC = p(x > x* | x ∈ ω_0)

for different values of the threshold x*.

[Figure: the two class-conditional densities with a threshold x*, and the resulting ROC curve with SENS on the vertical axis and 1 - SPEC on the horizontal axis.]

ROC curve

[Figure: three cases (Case 1, Case 2, Case 3) of class-conditional densities with decreasing overlap, and the corresponding ROC curves.]
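To make the threshold sweep concrete, here is a small sketch that traces an empirical ROC curve for the one-dimensional setting above; the two Gaussian class-conditional densities and their parameters are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
neg = rng.normal(-1.0, 1.0, 1000)   # samples of x for class ω_0 (assumed density)
pos = rng.normal(+1.0, 1.0, 1000)   # samples of x for class ω_1 (assumed density)

# Sweep the decision threshold x*; predict ω_1 whenever x > x*.
for x_star in np.linspace(-4, 4, 9):
    sens = np.mean(pos > x_star)             # SENS   = p(x > x* | ω_1)
    one_minus_spec = np.mean(neg > x_star)   # 1-SPEC = p(x > x* | ω_0)
    print(f"x* = {x_star:+.1f}: SENS = {sens:.2f}, 1-SPEC = {one_minus_spec:.2f}")
```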

Receiver operating characteristic (ROC)

The ROC curve shows the discriminability between the two classes under different decision biases. The decision bias can be changed by using a different loss function.

Zero-one loss function

The misclassification error is based on the zero-one loss function:
  any misclassified example counts as 1,
  a correctly classified example counts as 0.

[A three-class confusion matrix illustrates this: every off-diagonal count contributes to the error, while the diagonal counts the agreement between prediction and target.]

General loss function

The error function can be based on a more general loss function, in which different misclassifications have a different weight (loss):

  α_i:          our choice (action)
  ω_j:          the true label
  λ(α_i, ω_j):  the loss for choosing α_i when the true label is ω_j

Example: a 3 x 3 loss matrix λ(α_i, ω_j) in which a correct decision costs 0 and different misclassifications carry different losses (e.g., 3 or 5).

Bayesian decision theory

With a more general loss function λ(α_i, ω_j), the expected loss for the classification choice α_i is

  R(α_i | x) = Σ_j λ(α_i, ω_j) P(y = ω_j | x),

also called the conditional risk. A decision rule α(x) chooses a label (action) according to the input. The optimal decision rule is

  α*(x) = argmin_{α_i} Σ_j λ(α_i, ω_j) P(y = ω_j | x)
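A minimal sketch of this decision rule; the loss matrix and the posterior values below are made-up illustrative numbers, not the slide's example:

```python
import numpy as np

# lam[i, j] = λ(α_i, ω_j): loss for choosing α_i when the true class is ω_j.
# Illustrative values: correct decisions cost 0, and mistaking ω_1 for ω_0
# is penalized much more heavily (cost 5) than the opposite error (cost 1).
lam = np.array([[0.0, 5.0],
                [1.0, 0.0]])

posterior = np.array([0.7, 0.3])   # P(y = ω_j | x), assumed for illustration

risk = lam @ posterior             # R(α_i | x) = Σ_j λ(α_i, ω_j) P(y = ω_j | x)
best_action = np.argmin(risk)      # α*(x) = argmin_i R(α_i | x)
print(risk, "-> choose α_%d" % best_action)
```

Note that with these numbers the rule picks α_1 even though ω_0 is the more probable class: the asymmetric loss changes the decision bias.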

Bayesian decision theory (cont.)

The optimal decision rule:

  α*(x) = argmin_{α_i} Σ_j λ(α_i, ω_j) P(y = ω_j | x)

How do we modify classifiers to handle a different loss?

  Discriminative models: directly optimize the parameters according to the new loss function.
  Generative models: learn the probabilities as before; the decisions about classes are biased so as to minimize the loss, as seen above.

Calculating the loss for data

Confusion matrix: counts N(α_i, ω_j) of examples with class label ω_j that are classified with a label α_i. The empirical loss on a data set of N examples is

  Loss = (1/N) Σ_i Σ_j λ(α_i, ω_j) N(α_i, ω_j)
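The empirical loss is a single weighted sum over the confusion-matrix counts; a sketch with illustrative placeholder counts and losses:

```python
import numpy as np

# N_counts[i, j] = N(α_i, ω_j): confusion-matrix counts (illustrative values).
N_counts = np.array([[50,  4],
                     [ 6, 40]])
# lam[i, j] = λ(α_i, ω_j). With the zero-one loss this sum recovers the
# plain misclassification error.
lam = np.array([[0.0, 2.0],
                [1.0, 0.0]])

N = N_counts.sum()
loss = (lam * N_counts).sum() / N   # Loss = (1/N) Σ_i Σ_j λ(α_i, ω_j) N(α_i, ω_j)
print(loss)
```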

Multilayer neural networks

Linear units:

  Linear regression:    f(x) = w_0 + Σ_{i=1..d} w_i x_i
  Logistic regression:  f(x) = p(y = 1 | x, w) = g(w_0 + Σ_{i=1..d} w_i x_i)

[Figure: a unit with inputs x_1 ... x_d; the weighted sum z = w_0 + Σ_i w_i x_i feeds the output f(x).]

On-line gradient update for linear regression:

  w_0 ← w_0 + α (y - f(x))
  w_i ← w_i + α (y - f(x)) x_i

The on-line gradient update for logistic regression is the same:

  w_0 ← w_0 + α (y - f(x))
  w_i ← w_i + α (y - f(x)) x_i
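A sketch of the shared on-line update for a logistic unit in plain NumPy; the toy data, the learning rate, and the iteration count are assumptions:

```python
import numpy as np

def g(z):                          # logistic (sigmoid) function
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = np.zeros(3)                    # [w_0, w_1, w_2]: bias plus d = 2 weights
alpha = 0.1                        # learning rate (assumed)

for _ in range(1000):
    x = rng.uniform(-1, 1, 2)      # toy input
    y = float(x[0] + x[1] > 0)     # toy label from a linear rule (assumed)
    x_ext = np.concatenate(([1.0], x))   # prepend 1 so w_0 uses the same update
    f = g(w @ x_ext)               # f(x) = g(w_0 + Σ_i w_i x_i)
    w += alpha * (y - f) * x_ext   # w_i ← w_i + α (y - f(x)) x_i
print(w)
```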

Limitations of basic linear units

  Linear regression:    f(x) = w_0 + Σ_{i=1..d} w_i x_i
  Logistic regression:  f(x) = p(y = 1 | x, w) = g(w_0 + Σ_{i=1..d} w_i x_i)

The function is linear in the inputs! The decision boundary is linear!

[Figure: regression with the quadratic model.] The linear unit fits a linear hyper-plane only; a non-linear surface can fit the data better.

Classification with the linear model

The logistic regression model defines a linear decision boundary.

Example: two classes (blue and red points).
[Figure: the data points and the linear decision boundary between them.]

A linear decision boundary: the logistic regression model is not optimal, but not that bad.
[Figure: a data set on which the linear boundary misclassifies only a few points.]

When does logistic regression fail?

An example in which the logistic regression model fails:
[Figure: a data set that no linear boundary can separate.]

Limitations of linear units

Logistic regression does not work for parity functions: no linear decision boundary exists.
[Figure: the XOR/parity configuration of points.]

Solution: a model of a non-linear decision boundary.

Extensions of simple linear units

Use feature (basis) functions to model nonlinearities:

  Linear regression:    f(x) = w_0 + Σ_{j=1..m} w_j φ_j(x)
  Logistic regression:  f(x) = g(w_0 + Σ_{j=1..m} w_j φ_j(x))

where φ_j(x) is an arbitrary function of x.

[Figure: inputs x_1 ... x_d feed the basis functions φ_1(x) ... φ_m(x), whose outputs are linearly combined into f(x).]

Learning with extended linear units

Feature (basis) functions model the nonlinearities. Important property: this is the same problem as learning the weights for linear units; the input has changed, but the weights are still linear in the new input.

Problem: too many weights to learn.
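As a sketch, a quadratic basis expansion for d = 2 plugged into the same logistic unit; the particular choice of φ_j is an illustrative assumption:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def phi(x):
    # Quadratic basis functions of x = (x1, x2); the unit still sees a
    # linear function of these features, so the same update rules apply.
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

w = np.zeros(6)                    # one weight per basis function
x = np.array([0.5, -1.0])
f = g(w @ phi(x))                  # f(x) = g(w_0 + Σ_j w_j φ_j(x))
print(f)
# For degree-2 features the weight count grows as O(d^2); higher degrees
# blow up quickly: the "too many weights to learn" problem.
```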

Multi-layered neural networks

An alternative way to introduce nonlinearities into regression/classification models.

Idea: cascade several simple neural models with logistic units, much like neuron connections.

Multilayer neural network

Also called a multilayer perceptron (MLP); it cascades multiple logistic regression units.

Example: a 2-layer classifier with non-linear decision boundaries.
[Figure: inputs x_1 ... x_d (input layer), logistic units z_1, z_2 (hidden layer), and an output unit computing p(y = 1 | x) (output layer).]

Multilayer neural network

Models non-linearities through logistic regression units; can be applied to both regression and binary classification problems.

[Figure: input layer x_1 ... x_d, hidden layer z_1, z_2, and an output layer computing
  f(x) = w · z for regression, or
  f(x) = p(y = 1 | x) for classification (option).]

Non-linearities are modeled using multiple hidden logistic regression units organized in layers. The output layer determines whether it is a regression or a binary classification problem.

[Figure: input layer, several hidden layers, output layer.]
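A minimal forward pass for a network with one hidden layer of logistic units, as in the figures above; the sizes and random weights are illustrative:

```python
import numpy as np

def g(z):                                   # logistic function
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, h = 3, 2                                 # input size, number of hidden units
W1 = rng.normal(0, 1, (h, d + 1))           # hidden-layer weights (bias in col 0)
W2 = rng.normal(0, 1, h + 1)                # output-layer weights (bias in entry 0)

x = rng.uniform(-1, 1, d)
z = g(W1 @ np.concatenate(([1.0], x)))      # hidden outputs z_1 ... z_h
out = W2 @ np.concatenate(([1.0], z))       # linear output: regression f(x)
p = g(out)                                  # logistic output: p(y = 1 | x)
print(out, p)
```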

Learning with MLP

How to learn the parameters of the neural network? The gradient descent algorithm, with weight updates based on the error J(D):

  w ← w - α ∂J(D)/∂w

We need to compute the gradients for the weights in all units. They can be computed in one backward sweep through the net! The process is called back-propagation.

Backpropagation

Notation for the (k-1)-th, k-th, and (k+1)-th levels:

  x_j(k):      output of unit j on level k
  z_i(k):      input to the sigmoid function of unit i on level k
  w_{i,j}(k):  weight between unit j on level k-1 and unit i on level k

  z_i(k) = w_{i,0}(k) + Σ_j w_{i,j}(k) x_j(k-1)
  x_i(k) = g(z_i(k))

Backpropagation (cont.)

Update the weight w_{i,j}(k) using a data point D_u = ⟨x, y⟩:

  w_{i,j}(k) ← w_{i,j}(k) - α ∂J(D_u)/∂w_{i,j}(k)

Let δ_i(k) = ∂J(D_u)/∂z_i(k). Then:

  ∂J(D_u)/∂w_{i,j}(k) = δ_i(k) x_j(k-1)

where δ_j(k) is computed from the outputs x_j(k) and the δ's of the next layer:

  δ_j(k) = [ Σ_i δ_i(k+1) w_{i,j}(k+1) ] x_j(k) (1 - x_j(k))

For the last unit it is the same as for the regular linear units:

  δ(K) = -(y - f(x))

It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!

Learning with MLP

Gradient descent algorithm, weight update:

  w_{i,j}(k) ← w_{i,j}(k) - α δ_i(k) x_j(k-1)

where
  x_j(k-1): the j-th output of layer k-1
  δ_i(k):   the derivative computed via backpropagation
  α:        a learning rate
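A sketch of the backward sweep for a single data point and one hidden layer, following the equations above; the network size and weights are illustrative:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = np.array([1.0, 0.0])                     # one data point <x, y>
y = 1.0
W1 = rng.normal(0, 1, (2, 3))                # hidden weights (bias in column 0)
W2 = rng.normal(0, 1, 3)                     # output weights (bias in entry 0)

# Forward sweep: the outputs x_j(k) of every unit.
x0 = np.concatenate(([1.0], x))
z1 = g(W1 @ x0)                              # hidden-layer outputs
z1_ext = np.concatenate(([1.0], z1))
f = g(W2 @ z1_ext)                           # network output

# Backward sweep: δ(K) = -(y - f(x)) at the last unit, then propagate back.
delta_out = -(y - f)
delta_hidden = delta_out * W2[1:] * z1 * (1.0 - z1)   # δ_j(k) via next layer

grad_W2 = delta_out * z1_ext                 # ∂J/∂w_{i,j}(k) = δ_i(k) x_j(k-1)
grad_W1 = np.outer(delta_hidden, x0)
print(grad_W2, grad_W1)
```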

Learning with MLP (cont.)

On-line gradient descent algorithm, weight update:

  w_{i,j}(k) ← w_{i,j}(k) - α ∂J_online(D_u)/∂w_{i,j}(k)

with ∂J_online(D_u)/∂w_{i,j}(k) = δ_i(k) x_j(k-1), where x_j(k-1) is the j-th output of layer k-1, δ_i(k) is the derivative computed via backpropagation, and α is a learning rate.

On-line gradient descent algorithm for MLP:

  Online-gradient-descent(D, number of iterations)
    initialize all weights w_{i,j}(k)
    for i = 1 to number of iterations do
      select a data point D_u = ⟨x, y⟩ from D
      set the learning rate α
      compute the outputs x_j(k) for each unit
      compute the derivatives δ_i(k) via backpropagation
      update all weights (in parallel):
        w_{i,j}(k) ← w_{i,j}(k) - α δ_i(k) x_j(k-1)
    end for
    return the weights w
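Putting the pseudocode together, a runnable sketch of the on-line training loop; the architecture, learning rate, and iteration count are assumptions:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, Y, hidden=2, alpha=0.5, iterations=20000, seed=0):
    """On-line gradient descent for a 1-hidden-layer MLP classifier."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (hidden, X.shape[1] + 1))
    W2 = rng.normal(0, 1, hidden + 1)
    for _ in range(iterations):
        u = rng.integers(len(X))             # select a data point D_u = <x, y>
        x0 = np.concatenate(([1.0], X[u]))
        z1 = g(W1 @ x0)                      # compute outputs for each unit
        z1_ext = np.concatenate(([1.0], z1))
        f = g(W2 @ z1_ext)
        d_out = -(Y[u] - f)                  # derivatives via backpropagation
        d_hid = d_out * W2[1:] * z1 * (1.0 - z1)
        W2 -= alpha * d_out * z1_ext         # update all weights in parallel
        W1 -= alpha * np.outer(d_hid, x0)
    return W1, W2
```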

XOR example

A linear decision boundary does not exist.
[Figure: the four XOR points; no line separates the two classes.]

XOR example: linear unit.
[Figure: the decision surface learned by a single linear (logistic) unit fails to separate the classes.]

XOR example: a neural network with 2 hidden units.
[Figures: the learned non-linear decision surfaces; the hidden logistic units let the network separate the XOR classes.]
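The training sketch above can be pointed at the XOR data; with 2 hidden units it typically finds a non-linear boundary fitting all four points (learning rate and iteration count as assumed above):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)     # XOR labels

W1, W2 = train_mlp(X, Y)                    # sketch defined above

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

for x, y in zip(X, Y):
    z1 = g(W1 @ np.concatenate(([1.0], x)))
    f = g(W2 @ np.concatenate(([1.0], z1)))
    print(x, y, round(float(f), 2))
```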

MLP in practice

Optical character recognition of digits: automatic sorting of mail. A 5-layer network with multiple output functions (10 outputs, one per digit) and 784 inputs.
[Table: the number of neurons and weights in each layer.]