Artificial Neural Networks


Artificial Neural Networks. Stephan Dreiseitl, University of Applied Sciences Upper Austria at Hagenberg. Harvard-MIT Division of Health Sciences and Technology, HST.951J: Medical Decision Support

Knowledge: textbook knowledge (verbal rules) leads to rule-based systems; experience (non-verbal patterns) leads to pattern recognition

A real-life situation

and its abstraction: feature vectors such as (f, 30, 1, 0, 67.8, 12.2, ...), (m, 52, 1, 1, 57.4, 8.9, ...), (m, 28, 1, 1, 51.1, 19.2, ...), (f, 46, 1, 1, 16.3, 9.5.2, ...), (m, 65, 1, 0, 56.1, 17.4, ...), (m, 38, 1, 0, 22.8, 19.2, ...) feeding a Model(p)

Another real-life situation: benign lesion vs. malignant lesion

Example: logistic regression: y = 1 / (1 + exp(-(b_1 x_1 + b_2 x_2 + b_0)))
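
As an illustration, a minimal sketch of this logistic regression model in Python (NumPy only); the coefficient values b0, b1, b2 and the example inputs are made up for the example:

    import numpy as np

    def logistic_regression(x1, x2, b0=-1.0, b1=0.8, b2=0.5):
        # y = 1 / (1 + exp(-(b1*x1 + b2*x2 + b0))); coefficients are illustrative only
        z = b1 * x1 + b2 * x2 + b0
        return 1.0 / (1.0 + np.exp(-z))

    # Predicted probability for one hypothetical case
    print(logistic_regression(1.2, 0.4))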

So why use ANNs? The human brain is good at pattern recognition. ANNs mimic the structure and processing of the brain: parallel processing, distributed representation. Expected benefits: fault tolerance, good generalization capability, more flexibility than logistic regression

Overview: Motivation, Perceptrons, Multilayer perceptrons, Improving generalization, Bayesian perspective

Terminology (ANN term = statistics term): input = covariate; output = dependent variable; weights = parameters; learning = estimation

ANN topology

Artificial neurons

Activation functions

Hyperplanes. A vector w = (w_1, ..., w_n) defines a hyperplane. The hyperplane divides the n-space of points x = (x_1, ..., x_n) into w_1 x_1 + ... + w_n x_n > 0, w_1 x_1 + ... + w_n x_n = 0 (the plane itself), and w_1 x_1 + ... + w_n x_n < 0. Abbreviation: w·x := w_1 x_1 + ... + w_n x_n

Linear separability. Hyperplane through the origin: w·x = 0. A bias w_0 moves the hyperplane away from the origin: w·x + w_0 = 0

Linear separability. Convention: w := (w_0, w), x := (1, x). Class labels t_i ∈ {+1, -1}. Error measure E = -Σ_{i miscl.} t_i (w·x_i). How to minimize E?

Linear separability. Error measure E = -Σ_{i miscl.} t_i (w·x_i) ≥ 0. [Figure: hyperplane separating the regions { x | w·x > 0 } (class +1) and { x | w·x < 0 } (class -1).]

Gradient descent: a simple function minimization algorithm. The gradient is the vector of partial derivatives; the negative gradient is the direction of steepest descent

Perceptron learning. Find the minimum of E by iterating w_{k+1} = w_k - η grad_w E, with E = -Σ_{i miscl.} t_i (w·x_i) and grad_w E = -Σ_{i miscl.} t_i x_i. Online version: pick a misclassified x_i and update w_{k+1} = w_k + η t_i x_i

Perceptron learning. Update rule: w_{k+1} = w_k + η t_i x_i. Theorem: perceptron learning converges for linearly separable sets
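
A minimal sketch of the online perceptron update in Python (NumPy only); the toy data, learning rate, and epoch count are assumptions made for the example:

    import numpy as np

    def perceptron_train(X, t, eta=0.1, epochs=100):
        """Online perceptron learning for targets t in {+1, -1}."""
        X = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend bias input x_0 = 1
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x_i, t_i in zip(X, t):
                if t_i * np.dot(w, x_i) <= 0:           # misclassified point
                    w = w + eta * t_i * x_i             # w_{k+1} = w_k + eta * t_i * x_i
        return w

    # Toy linearly separable data (assumed for illustration)
    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
    t = np.array([+1, +1, -1, -1])
    print(perceptron_train(X, t))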

Why? From perceptrons to multilayer perceptrons

Multilayer perceptrons. Sigmoidal hidden layer. Can represent arbitrary decision regions. Can be trained similarly to perceptrons

Decision theory. Pattern recognition is not deterministic and needs the language of probability theory. Given an abstraction x, the model assigns it to class C_1 or C_2: decide C_1 if P(C_1|x) > P(C_2|x)

Some background math. Have a data set D = {(x_i, t_i)} drawn from a probability distribution P(x, t). Model P(x, t) given the samples D by an ANN with adjustable parameters w. Statistics analogy:

Some background math. Maximize the likelihood of the data D. Likelihood L = Π_i p(x_i, t_i) = Π_i p(t_i|x_i) p(x_i). Minimize -log L = -Σ_i log p(t_i|x_i) - Σ_i log p(x_i). Drop the second term: it does not depend on w. Two cases: regression and classification

Likelihood for regression. For regression, the targets t are real values. Minimize -Σ_i log p(t_i|x_i). Assume the targets t_i are the network outputs y(x_i, w) corrupted by Gaussian noise. Minimizing -log L is then equivalent to minimizing Σ_i (y(x_i, w) - t_i)^2 (sum-of-squares error)
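
A short sketch of why this equivalence holds, assuming Gaussian noise with a fixed variance σ² (terms independent of w are collected into "const"):

    % Gaussian noise model for the regression targets
    p(t_i \mid x_i) = \mathcal{N}\bigl(t_i \mid y(x_i, w), \sigma^2\bigr)

    % Negative log-likelihood, up to terms independent of w
    -\log L = \frac{1}{2\sigma^2} \sum_i \bigl(y(x_i, w) - t_i\bigr)^2 + \text{const}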

Likelihood for classification. For classification, the targets t are class labels. Minimize -Σ_i log p(t_i|x_i). Assume the network output y(x_i, w) models P(C_1|x_i). Minimizing -log L is then equivalent to minimizing -Σ_i [t_i log y(x_i, w) + (1 - t_i) log(1 - y(x_i, w))] (cross-entropy error)
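
For concreteness, a minimal sketch of the two error functions in Python (NumPy only); the function names are made up for the example:

    import numpy as np

    def sum_of_squares_error(y, t):
        # Regression: E = sum_i (y_i - t_i)^2
        return np.sum((y - t) ** 2)

    def cross_entropy_error(y, t, eps=1e-12):
        # Classification with targets t in {0, 1} and outputs y modeling P(C_1 | x)
        y = np.clip(y, eps, 1.0 - eps)   # avoid log(0)
        return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))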

Backpropagation algorithm. Minimize the error function by gradient descent: w_{k+1} = w_k - η grad_w E. Iterative gradient calculation by propagating error signals

Backpropagation algorithm. Problem: how to set the learning rate η? Better: use more advanced minimization algorithms that exploit second-order information

Backpropagation algorithm. Error function by task: cross-entropy for classification, sum-of-squares for regression
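
A compact sketch of backpropagation for a single-hidden-layer network trained by gradient descent, in Python (NumPy only); the network size, learning rate, epoch count, and the regression/sum-of-squares setting are assumptions made for the example:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def train_mlp(X, t, n_hidden=5, eta=0.01, epochs=1000, seed=0):
        """One-hidden-layer MLP for regression, trained with sum-of-squares error."""
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0, 0.1, (X.shape[1], n_hidden))   # input -> hidden weights
        b1 = np.zeros(n_hidden)
        W2 = rng.normal(0, 0.1, (n_hidden, 1))            # hidden -> output weights
        b2 = np.zeros(1)
        t = t.reshape(-1, 1)
        for _ in range(epochs):
            # Forward pass
            h = sigmoid(X @ W1 + b1)                      # hidden activations
            y = h @ W2 + b2                               # linear output (regression)
            # Backward pass: propagate error signals
            delta_out = y - t                             # output error for sum-of-squares
            delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden error via sigmoid derivative
            # Gradient descent step: w <- w - eta * grad_w E
            W2 -= eta * h.T @ delta_out
            b2 -= eta * delta_out.sum(axis=0)
            W1 -= eta * X.T @ delta_hid
            b1 -= eta * delta_hid.sum(axis=0)
        return W1, b1, W2, b2

For classification one would instead pass the output through a sigmoid and minimize the cross-entropy error; conveniently, the output-layer error signal then still has the form y - t.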

ANN output for regression: the mean of p(t|x)

ANN output for classification: P(t = 1|x)

Improving generalization. Problem: memorizing (x, t) combinations ("overtraining"). [Table of example (x, t) training pairs with a query input marked "?" omitted.]

Improving generalization. Need a test set to judge performance. Goal: represent the information in the data set, not the noise. How to improve generalization? Limit network topology, early stopping, weight decay

Limit network topology. Idea: fewer weights mean less flexibility. Analogy to polynomial interpolation:

Limit network topology

Early stopping. Idea: stop training when the information (but not the noise) has been modeled. Need a validation set to determine when to stop training
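
A minimal sketch of early stopping with a patience criterion in Python; the callables train_step, validation_error, and get_weights are assumed hooks into whatever training loop is used (they are not part of the original slides):

    def train_with_early_stopping(train_step, validation_error, get_weights,
                                  max_epochs=1000, patience=10):
        """Early stopping: keep the weights with the lowest validation error.

        train_step, validation_error, and get_weights are assumed callables
        (one backpropagation pass, validation-set error, snapshot of current weights).
        """
        best_error = float("inf")
        best_weights = get_weights()
        epochs_without_improvement = 0
        for _ in range(max_epochs):
            train_step()                        # one gradient descent pass on the training set
            error = validation_error()          # error on the held-out validation set
            if error < best_error:
                best_error, best_weights = error, get_weights()
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                       # validation error stopped improving
        return best_weights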

Early stopping

Weight decay. Idea: control the smoothness of the network output by controlling the size of the weights. Add a term α‖w‖² to the error function
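
In symbols, with a regularization coefficient α (a sketch of the standard formulation):

    \tilde{E}(w) = E(w) + \alpha \, \lVert w \rVert^2,
    \qquad
    \nabla_w \tilde{E} = \nabla_w E + 2\alpha w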

Weight decay

Bayesian perspective. Error function minimization corresponds to a maximum likelihood (ML) estimate: a single best solution w_ML. This can lead to overtraining. Bayesian approach: consider the weight posterior distribution p(w|D). Advantage: error bars for regression, averaged estimates for classification

Bayesian perspective. Posterior = likelihood × prior: p(w|D) = p(D|w) p(w) / p(D). Two approaches to approximating p(w|D): sampling and Gaussian approximation

Sampling from p(w|D): prior × likelihood ∝ posterior

Gaussian approximation to p(w|D). Find the maximum w_MAP of p(w|D). Approximate p(w|D) by a Gaussian around w_MAP. Fit the curvature:

Gaussian approximation to p(w|D). Maximizing p(w|D) is the same as minimizing -log p(w|D) = -log p(D|w) - log p(w). Minimizing the first term finds the ML solution. Minimizing the second term, for a zero-mean Gaussian prior p(w), adds the term α‖w‖². Therefore, adding weight decay amounts to finding the MAP solution!
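
The correspondence spelled out, assuming a zero-mean Gaussian prior whose precision is proportional to α (a sketch; constants independent of w are collected into "const", and E(w) denotes -log p(D|w)):

    -\log p(w \mid D) = -\log p(D \mid w) - \log p(w) + \text{const}

    p(w) \propto \exp\bigl(-\alpha \lVert w \rVert^2\bigr)
    \;\Longrightarrow\;
    -\log p(w) = \alpha \lVert w \rVert^2 + \text{const}

    w_{\text{MAP}} = \arg\min_w \bigl[\, E(w) + \alpha \lVert w \rVert^2 \,\bigr]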

Bayesian example for regression

Bayesian example for classification

Summary. ANNs are inspired by the functionality of the brain. They are nonlinear data models, trained by minimizing an error function. The goal is to generalize well: avoid overtraining, and distinguish ML and MAP solutions

Pointers to the literature:
Lisboa PJ. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Netw. 2002 Jan;15(1):11-39.
Almeida JS. Predictive non-linear modeling of complex data by artificial neural networks. Curr Opin Biotechnol. 2002 Feb;13(1):72-6.
Kononenko I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med. 2001 Aug;23(1):89-109.
Dayhoff JE, DeLeo JM. Artificial neural networks: opening the black box. Cancer. 2001 Apr 15;91(8 Suppl):1615-35.
Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. J Microbiol Methods. 2000 Dec 1;43(1):3-31.
Bishop CM. Neural Networks for Pattern Recognition. Oxford University Press, 1995.