
Intelligent Systems: Discriminative Learning, Neural Networks
Carsten Rother, Dmitrij Schlesinger, WS 2014/2015

Outline
1. Discriminative learning
2. Neurons and linear classifiers:
   1) Perceptron algorithm
   2) Non-linear decision rules
3. Feed-Forward Neural Networks:
   1) Architecture
   2) Modeling abilities
   3) Learning: Error Back-Propagation

Discriminant Functions
Let a parameterized family of probability distributions be given. Each particular probability distribution leads to a classifier (for a fixed loss). The final goal is the classification (applying the classifier).

Generative approach:
1. Learn the parameters of the probability distribution (e.g. by ML)
2. Derive the corresponding classifier (e.g. the Bayes classifier)
3. Apply the classifier to test data

Discriminative approach:
1. Learn the unknown parameters of the classifier directly
2. Apply the classifier to test data

If the family of classifiers is well parameterized, it is not necessary to consider the underlying probability distribution at all!

Example: two Gaussians
Assume we know the probability model: two Gaussians of equal variance, i.e. $k \in \{1, 2\}$, $x \in \mathbb{R}^n$, $p(k, x) = p(k)\, p(x \mid k)$ with
$$p(x \mid k) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp\left(-\frac{\|x - \mu_k\|^2}{2\sigma^2}\right)$$
Assume that in the end we want to make the Maximum A-posteriori decision. Consequently (see the previous lecture), the classifier is a separating plane (a linear classifier). So let us search during learning directly for a good separating plane instead of learning the unknown parameters of the underlying probability model.
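Why the MAP decision is a plane (a short derivation, not on the original slide): decide for class 1 iff the log-posterior of class 1 exceeds that of class 2; the quadratic term $\|x\|^2$ cancels, leaving a linear inequality.
$$\log p(1) - \frac{\|x-\mu_1\|^2}{2\sigma^2} \;>\; \log p(2) - \frac{\|x-\mu_2\|^2}{2\sigma^2}
\;\;\Leftrightarrow\;\;
\Big\langle x, \frac{\mu_1-\mu_2}{\sigma^2}\Big\rangle \;>\; \frac{\|\mu_1\|^2-\|\mu_2\|^2}{2\sigma^2} + \log\frac{p(2)}{p(1)}
\;\;\Leftrightarrow\;\;
\langle x, w\rangle > b$$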

Neuron
Human vs. computer (two nice pictures from Wikipedia).

Neuron (McCulloch and Pitts, 1943)
Input: $x \in \mathbb{R}^n$; weights: $w \in \mathbb{R}^n$; bias: $b \in \mathbb{R}$; output: $y = f(y' - b) = f(\langle w, x\rangle - b)$, where $y' = \langle w, x\rangle$ is the activation.

Step function: $f(y') = 1$ if $y' > 0$ and $0$ otherwise, i.e. the neuron tests $\langle x, w\rangle \gtrless b$.

Sigmoid function (differentiable!): $f(y') = \dfrac{1}{1 + \exp(-y')}$
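A minimal sketch of this neuron model (not from the slides; the function names are illustrative), with both activation functions:

    import numpy as np

    def step(a):
        # step function: 1 if the activation exceeds the threshold, else 0
        return np.where(a > 0, 1.0, 0.0)

    def sigmoid(a):
        # differentiable alternative: the logistic sigmoid
        return 1.0 / (1.0 + np.exp(-a))

    def neuron(x, w, b, f=step):
        # y = f(<w, x> - b)
        return f(np.dot(w, x) - b)

    print(neuron(np.array([0.5, 2.0]), np.array([1.0, -1.0]), b=0.0))  # 0.0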

Geometric interpretation
$\langle x, w\rangle = \|x\|\,\|w\|\cos\alpha$. Let $w$ be normalized, i.e. $\|w\| = 1$; then $\|x\|\cos\alpha$ is the length of the projection of $x$ onto $w$. Separating plane: $\langle x, w\rangle = \text{const}$. The neuron implements a linear classifier.

A special case: Boolean functions
Input: $x = (x_1, x_2)$, $x_i \in \{0, 1\}$; output: $y = x_1 \,\&\, x_2$. Find $w$ and $b$ such that $\mathrm{step}(w_1 x_1 + w_2 x_2 - b) = x_1 \,\&\, x_2$.

    x1  x2  y
     0   0  0
     0   1  0
     1   0  0
     1   1  1

A solution: $w_1 = w_2 = 1$, $b = 1.5$. Disjunction and the other Boolean functions work likewise, but not XOR.
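The stated solution can be checked directly (a quick sketch):

    # verify that step(1*x1 + 1*x2 - 1.5) reproduces the AND truth table
    w1, w2, b = 1.0, 1.0, 1.5
    for x1 in (0, 1):
        for x2 in (0, 1):
            y = 1 if (w1 * x1 + w2 * x2 - b) > 0 else 0
            print(x1, x2, y)   # y is 1 only for x1 = x2 = 1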

The (one possible) learning task
Given: training data $L = \big((x^1, y^1), (x^2, y^2), \ldots, (x^L, y^L)\big)$, $x^l \in \mathbb{R}^n$, $y^l \in \{0, 1\}$.
Find: $w \in \mathbb{R}^n$, $b \in \mathbb{R}$ such that $f(\langle x^l, w\rangle - b) = y^l$ for all $l = 1, \ldots, L$.
For a step-neuron this is a system of linear inequalities:
$\langle x^l, w\rangle > b$ if $y^l = 1$, and $\langle x^l, w\rangle < b$ if $y^l = 0$.
The solution is not unique in general!

Preparation 1: eliminate the bias
The trick: modify the training data,
$x = (x_1, x_2, \ldots, x_n) \;\Rightarrow\; \tilde{x} = (x_1, x_2, \ldots, x_n, -1)$
$w = (w_1, w_2, \ldots, w_n) \;\Rightarrow\; \tilde{w} = (w_1, w_2, \ldots, w_n, b)$
$\langle x^l, w\rangle \gtrless b \;\Rightarrow\; \langle \tilde{x}^l, \tilde{w}\rangle \gtrless 0$
(Figure: a 1D example that is not separable without the bias, but after the augmentation becomes separable without a bias.)

Preparation 2: remove the sign
The trick is the same: $\hat{x}^l = \tilde{x}^l$ for all $l$ with $y^l = 1$, and $\hat{x}^l = -\tilde{x}^l$ for all $l$ with $y^l = 0$.
All in all:
$\langle x^l, w\rangle > b$ if $y^l = 1$ and $\langle x^l, w\rangle < b$ if $y^l = 0$ $\;\Rightarrow\;$ $\langle \hat{x}^l, \tilde{w}\rangle > 0$ for all $l$.

Perceptron Algorithm (Rosenblatt, 1958)
Solution of a system of linear inequalities:
1. Search for an inequality that is not satisfied, i.e. $\langle \hat{x}^l, w\rangle \le 0$.
2. If none is found: stop. Otherwise update $w_{\text{new}} = w_{\text{old}} + \hat{x}^l$ and go to 1.

The algorithm terminates in a finite number of steps if a solution exists (i.e. the training data are separable). The solution is a convex combination of the data points.
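A compact sketch (assumed details, not from the slides) that combines the two preparation steps with the perceptron update and returns a weight vector and bias for separable data:

    import numpy as np

    def perceptron(X, y, max_iter=1000):
        """X: (L, n) data matrix, y: array of labels in {0, 1}."""
        y = np.asarray(y)
        # Preparation 1: append -1 so the bias b becomes the last weight
        X_aug = np.hstack([X, -np.ones((X.shape[0], 1))])
        # Preparation 2: flip the sign of the class-0 examples
        X_hat = np.where(y[:, None] == 1, X_aug, -X_aug)
        w = np.zeros(X_hat.shape[1])
        for _ in range(max_iter):
            # search for a violated inequality <x_hat, w> <= 0
            violated = np.nonzero(X_hat @ w <= 0)[0]
            if violated.size == 0:
                break                      # all inequalities satisfied
            w = w + X_hat[violated[0]]     # perceptron update
        return w[:-1], w[-1]               # weights and bias

For the AND data from the previous slide it returns some separating pair $(w, b)$, not necessarily $(1, 1)$ and $1.5$, since the solution is not unique.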

An example problem
Consider another decision rule for a real-valued feature $x \in \mathbb{R}$:
$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = \sum_i a_i x^i \;\gtrless\; 0$$
It is not a linear classifier anymore but a polynomial one. The task is again to learn the unknown coefficients $a_i$ given the training data $(x^l, y^l), \ldots$, $x^l \in \mathbb{R}$, $y^l \in \{0, 1\}$. Is it also possible to do that in a Perceptron-like fashion?

An example problem
The idea: reduce the given problem to the Perceptron task. Observation: although the decision rule is not linear with respect to $x$, it is still linear with respect to the unknown coefficients $a_i$. The same trick again, modify the data:
$\tilde{w} = (a_n, a_{n-1}, \ldots, a_1, a_0)$, $\tilde{x} = (x^n, x^{n-1}, \ldots, x, 1)$ $\;\Rightarrow\;$ $\sum_i a_i x^i = \langle \tilde{x}, \tilde{w}\rangle$
In general, it is very often possible to learn non-linear decision rules with the Perceptron algorithm by an appropriate transformation of the input space (more examples in the seminars). Extension: Support Vector Machines, kernels.
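Applying the perceptron from above to the transformed data is then straightforward; a small sketch (the degree and the helper name are illustrative):

    import numpy as np

    def poly_features(x, degree=3):
        # map a scalar x to (x^degree, ..., x^1, x^0) so that
        # sum_i a_i x^i becomes the scalar product <x_tilde, w_tilde>
        return np.array([x ** i for i in range(degree, -1, -1)])

    # embed 1D training points into R^(degree+1); any linear learner
    # (e.g. the perceptron above) can then be applied to X_poly
    X_poly = np.stack([poly_features(x) for x in (-1.0, 0.5, 2.0)])
    print(X_poly.shape)   # (3, 4)

Note that the constant term $x^0 = 1$ already plays the role of the bias feature, so no separate bias preparation is needed.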

Many classes
Before: two classes, i.e. a mapping $\mathbb{R}^n \to \{0, 1\}$. Now: many classes, i.e. a mapping $\mathbb{R}^n \to \{1, 2, \ldots, K\}$. How to generalize? How to learn?
Two simple (straightforward) approaches. The first one: one vs. all, i.e. there is one binary classifier per class that separates this class from all others. The classification is ambiguous in some areas.

Many classes
Another one: pairwise classifiers, i.e. there is a classifier for each pair of classes. Less ambiguous, better separable. However: $K(K-1)/2$ binary classifiers instead of $K$ in the previous case.
The goal: no ambiguities with only $K$ parameter vectors $\;\Rightarrow\;$ Fisher classifier.

Fisher classifier
Idea: in the binary case, the greater the scalar product $\langle x, w\rangle$, the more likely the output $y$ is to be 1. The generalization: $y = \arg\max_k \langle x, w_k\rangle$.
Geometric interpretation (let the $w_k$ be normalized): consider the projections of an input vector $x$ onto the vectors $w_k$. The input space is partitioned into a set of convex cones.
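The decision rule $y = \arg\max_k \langle x, w_k\rangle$ in code (a sketch; the weight matrix and its shape are assumptions):

    import numpy as np

    def fisher_predict(x, W):
        # W holds one parameter vector w_k per row; choose the class
        # whose projection <x, w_k> is largest
        return int(np.argmax(W @ x))

    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # K = 3 classes
    print(fisher_predict(np.array([0.2, 0.9]), W))        # 1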

Feed-Forward Neural Networks
(Figure: levels of neurons from the input level through the first level and the $i$-th level up to the output level.)
$$y_{ij} = f\Big(\sum_{j'} w_{ijj'}\, y_{i-1,j'} - b_{ij}\Big)$$
Special case: $m = 1$ and step-neurons, i.e. a mapping $\mathbb{R}^n \to \{0, 1\}$.
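A minimal forward pass matching the formula above (a sketch; the layer sizes and weights are made up):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, weights, biases, f=sigmoid):
        # weights[i] holds the matrix (w_ijj'), biases[i] the vector (b_ij)
        y = x
        for W, b in zip(weights, biases):
            y = f(W @ y - b)   # y_ij = f(sum_j' w_ijj' * y_(i-1)j' - b_ij)
        return y

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
    biases = [np.zeros(4), np.zeros(1)]
    print(forward(np.array([0.1, -0.5, 2.0]), weights, biases))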

What can we do with it?
One level, a single step-neuron: a linear classifier.

What can we do with it?
Two levels with an AND-neuron as the output: an intersection of half-spaces. If the number of neurons is not limited, all convex subspaces can be implemented with arbitrary precision.

What can we do with it?
Three levels: all possible mappings $\mathbb{R}^n \to \{0, 1\}$, as unions of convex subspaces. Three levels (actually even fewer) are enough to implement all possible mappings!

Learning: Error Back-Propagation
Learning task. Given: training data $(x^l, k^l), \ldots$, $x^l \in \mathbb{R}^n$, $k^l \in \mathbb{R}$. Find: all weights and biases of the net.
Error Back-Propagation is a gradient descent method for feed-forward networks with sigmoid neurons. First, we need an objective (the error to be minimized):
$$F(w, b) = \sum_l \big(k^l - y(x^l; w, b)\big)^2 \;\to\; \min_{w, b}$$
Now: differentiate, build the gradient, and go.

Error Back-Propagation
We start from a single neuron and just one example $(x, k)$. Remember:
$$F(w, b) = (k - y)^2, \qquad y = \frac{1}{1 + \exp(-y')}, \qquad y' = \langle x, w\rangle = \sum_j x_j w_j$$
Differentiation according to the chain rule (dropping the constant factor 2, which only rescales the gradient step):
$$\frac{\partial F(w, b)}{\partial w_j} = \frac{\partial F}{\partial y}\,\frac{\partial y}{\partial y'}\,\frac{\partial y'}{\partial w_j} = (y - k)\,\frac{\exp(-y')}{\big(1 + \exp(-y')\big)^2}\, x_j = \delta\, d(y')\, x_j$$
with the error $\delta = y - k$ and the sigmoid derivative $d(y') = \exp(-y')/(1 + \exp(-y'))^2$.
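The analytic derivative can be verified numerically; a small sketch for one example (here the constant factor 2 is kept, unlike on the slide):

    import numpy as np

    x, k = np.array([0.5, -1.0, 2.0]), 1.0
    w = np.array([0.1, 0.2, -0.3])

    def F(w):
        y = 1.0 / (1.0 + np.exp(-np.dot(x, w)))
        return (k - y) ** 2

    # analytic gradient: 2 (y - k) d(y') x_j, with d(y') = y (1 - y)
    y = 1.0 / (1.0 + np.exp(-np.dot(x, w)))
    grad = 2 * (y - k) * y * (1 - y) * x

    # numerical gradient by central differences
    eps = 1e-6
    num = np.array([(F(w + eps * e) - F(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])
    print(np.allclose(grad, num))   # True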

Error Back-Propagation
In general: compute the errors $\delta$ at the $i$-th level from all $\delta$'s at the $(i+1)$-th level, i.e. propagate the error backwards.
The algorithm (for just one example $(x, k)$):
1. Forward: compute all $y$ and $y'$ (apply the network); compute the output error $\delta_n = y_n - k$.
2. Backward: compute the errors at the intermediate levels: $\delta_{ij} = \sum_{j'} \delta_{i+1,j'}\, d(y'_{i+1,j'})\, w_{i+1,j',j}$
3. Compute the gradient and apply it: $\dfrac{\partial F}{\partial w_{ijj'}} = \delta_{ij}\, d(y'_{ij})\, y_{i-1,j'}$
For many examples, just sum the gradients up.
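A compact sketch of the three steps for a fully connected sigmoid net and one example (variable names and shapes are assumptions, not from the slides; the constant factor 2 is again dropped):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop(x, k, weights, biases):
        # 1. Forward: store the output y of every level
        ys = [x]
        for W, b in zip(weights, biases):
            ys.append(sigmoid(W @ ys[-1] - b))
        delta = ys[-1] - k                   # output error delta_n = y_n - k
        grads_W, grads_b = [], []
        # 2./3. Backward: propagate the error and collect the gradients
        for i in reversed(range(len(weights))):
            d = ys[i + 1] * (1 - ys[i + 1])          # sigmoid derivative d(y')
            grads_W.insert(0, np.outer(delta * d, ys[i]))
            grads_b.insert(0, -(delta * d))          # y' = <w, y> - b, hence the minus
            delta = weights[i].T @ (delta * d)       # error for the level below
        return grads_W, grads_b

For many examples, the per-example gradients are simply summed over the training set, exactly as stated above.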

A special case: Convolutional Networks
Local features: convolutions with a set of predefined masks (in more detail in the Computer Vision lectures).

Convolutional Networks
(See: Yann LeCun, Koray Kavukcuoglu and Clement Farabet, "Convolutional Networks and Applications in Vision".)

Neural Networks: Summary
1. Discriminative learning: learn decision strategies directly, without considering the underlying probability model at all.
2. Neurons are linear classifiers.
3. Learning by the Perceptron algorithm.
4. Many non-linear decision rules can be transformed into linear ones by a corresponding transformation of the input space.
5. Classification into more than two classes is possible either by naive approaches (e.g. a set of one-vs-all binary classifiers) or, in a more principled way, by the Fisher classifier.
6. Feed-Forward Neural Networks implement (arbitrarily complex) decision strategies.
7. Error Back-Propagation for learning (gradient descent).
8. Some special cases are useful in Computer Vision.