Statistical Machine Learning from Data


1 Statistical Machine Learning from Data: Multi-Layer Perceptrons
Samy Bengio, IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland. bengio@idiap.ch
January 17, 2006

2-3 Outline: 1 Generalities

4 Artificial Neural Networks
[Figure: a single unit with inputs x_1, x_2, weights w_1, w_2, integration s = f(x; w) and transfer y = g(s).]
An ANN is a set of units (neurons) connected to each other. Each unit may have multiple inputs but has one output. Each unit performs 2 functions:
integration: s = f(x; θ)
transfer: y = g(s)

5 Artificial Neural Networks: Functions
Example of integration function: s = θ_0 + Σ_i x_i θ_i
Examples of transfer functions:
tanh: y = tanh(s)
sigmoid: y = 1 / (1 + exp(-s))
Some units receive inputs from the outside world. Some units generate outputs to the outside world. The other units are often named hidden. Hence, from the outside, an ANN can be viewed as a function. There are various forms of ANNs. The most popular is the Multi Layer Perceptron (MLP).
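
A minimal NumPy sketch of such a unit, assuming a tanh transfer function; the names integrate and transfer and the example values are illustrative, not from the course:

    import numpy as np

    def integrate(x, theta0, theta):
        # integration: bias plus the weighted sum of the inputs
        return theta0 + np.dot(x, theta)

    def transfer(s):
        # transfer: squash the weighted sum through a tanh non-linearity
        return np.tanh(s)

    x = np.array([0.5, -1.2])                  # two inputs
    theta = np.array([0.3, 0.7])               # one weight per input
    y = transfer(integrate(x, 0.1, theta))     # output of the unit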

6 Transfer Functions (Graphical View)
[Figure: plots of y = tanh(x), y = sigmoid(x) and y = x.]

7 Outline: 1 Generalities (Introduction, Characteristics)

8 (Graphical View)
[Figure: an MLP, from bottom to top: the inputs, the input layer, one or more hidden layers, and the output layer producing the outputs; the parameters sit on the connections between layers.]

9 An MLP is a function: ŷ = MLP(x; θ)
The parameters are θ = {w^l_{i,j}, b^l_i : i, j, l}.
From now on, let x_i(p) be the i-th value in the p-th example, represented by vector x(p) (and when possible, let us drop p).
Each layer l (1 ≤ l ≤ M) is fully connected to the previous layer.
Integration: s^l_i = b^l_i + Σ_j y^{l-1}_j w^l_{i,j}
Transfer: y^l_i = tanh(s^l_i) or y^l_i = sigmoid(s^l_i) or y^l_i = s^l_i
The output of the zeroth layer contains the inputs: y^0_i = x_i
The output of the last layer M contains the outputs: ŷ_i = y^M_i
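
A sketch of this forward pass, assuming one tanh hidden layer and a linear output layer; the layer sizes and the name mlp_forward are illustrative:

    import numpy as np

    def mlp_forward(x, weights, biases):
        # weights[l] has shape (units in layer l, units in layer l-1)
        y = x                                   # y^0 = the inputs
        for l, (W, b) in enumerate(zip(weights, biases)):
            s = b + W @ y                       # integration
            y = s if l == len(weights) - 1 else np.tanh(s)  # linear last layer, tanh elsewhere
        return y                                # y^M = the outputs

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
    biases = [np.zeros(3), np.zeros(1)]
    y_hat = mlp_forward(np.array([0.2, -0.4]), weights, biases)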

10 Characteristics of MLPs
An MLP can approximate any continuous function.
However, it needs to have at least 1 hidden layer (sometimes easier with 2), and enough units in each layer.
Moreover, we have to find the correct value of the parameters θ. This is an NP-complete problem!
How can we find these parameters? Answer: optimize a given criterion using a gradient method.
Note: capacity is controlled by the number of parameters.

11 Separability
[Figure: separability achieved by Linear, Linear+Sigmoid, Linear+Sigmoid+Linear, and Linear+Sigmoid+Linear+Sigmoid+Linear models.]

12 Outline: Criterion, Basics, Derivation, Algorithm and Example, Universal Approximator

13 Objective: minimize a criterion C over a set of data D_n:
C(D_n, θ) = Σ_{p=1}^{n} L(y(p), ŷ(p)) where ŷ(p) = MLP(x(p); θ)
We are searching for the best parameters θ*:
θ* = arg min_θ C(D_n, θ)
Gradient descent: an iterative procedure where, at each iteration s, we modify the parameters θ:
θ_{s+1} = θ_s - η ∂C(D_n, θ_s)/∂θ_s
where η is the learning rate. WARNING: local optima.
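
As an illustration (not the lecture's code), gradient descent on a toy one-parameter criterion; grad_C is assumed to return ∂C(D_n, θ)/∂θ over the whole data set:

    import numpy as np

    def gradient_descent(theta, grad_C, data, eta=0.05, n_iterations=100):
        # theta_{s+1} = theta_s - eta * dC(D_n, theta_s)/dtheta_s
        for _ in range(n_iterations):
            theta = theta - eta * grad_C(data, theta)
        return theta

    # toy criterion: C(D_n, theta) = sum_p (theta - x_p)^2, minimized at the mean of the data
    data = np.array([1.0, 2.0, 3.0])
    grad = lambda D, t: np.sum(2.0 * (t - D))
    theta_star = gradient_descent(0.0, grad, data)   # converges to approximately 2.0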

14 The Basics
Chain Rule: if a = f(b) and b = g(c), then
∂a/∂c = ∂a/∂b · ∂b/∂c = f'(b) · g'(c)

15 The Basics
Sum Rule: if a = f(b, c) and b = g(d) and c = h(d), then
∂a/∂d = ∂a/∂b · ∂b/∂d + ∂a/∂c · ∂c/∂d = ∂f(b, c)/∂b · g'(d) + ∂f(b, c)/∂c · h'(d)

16 Basics (Graphical View)
[Figure: from the inputs and the parameters to tune, through the units, up to the outputs, the targets and the criterion, with the error back-propagated from the criterion down to the parameters.]

17 Criterion
First: we need to pass the gradient through the criterion.
The global criterion C is: C(D_n, θ) = Σ_{p=1}^{n} L(y(p), ŷ(p))
Example: the mean squared error (MSE) criterion: L(y, ŷ) = Σ_{i=1}^{d} (1/2)(y_i - ŷ_i)^2
And the derivative with respect to the output ŷ_i: ∂L(y, ŷ)/∂ŷ_i = ŷ_i - y_i
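
A small sketch of the MSE criterion and its gradient with respect to the outputs (illustrative names, not the lecture's code):

    import numpy as np

    def mse(y, y_hat):
        # L(y, y_hat) = sum_i 0.5 * (y_i - y_hat_i)^2
        return 0.5 * np.sum((y - y_hat) ** 2)

    def mse_grad(y, y_hat):
        # dL/dy_hat_i = y_hat_i - y_i
        return y_hat - y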

18 Last Layer
Second: derivative wrt the parameters of the last layer M.
ŷ_i = y^M_i = tanh(s^M_i), with s^M_i = b^M_i + Σ_j y^{M-1}_j w^M_{i,j}
Hence the derivative with respect to w^M_{i,j} is:
∂ŷ_i/∂w^M_{i,j} = ∂ŷ_i/∂s^M_i · ∂s^M_i/∂w^M_{i,j} = (1 - (y^M_i)^2) · y^{M-1}_j
And the derivative with respect to b^M_i is:
∂ŷ_i/∂b^M_i = ∂ŷ_i/∂s^M_i · ∂s^M_i/∂b^M_i = (1 - (y^M_i)^2) · 1

19 Other Layers
Third: derivative wrt the output of a hidden layer y^l_j:
∂ŷ_i/∂y^l_j = Σ_k ∂ŷ_i/∂y^{l+1}_k · ∂y^{l+1}_k/∂y^l_j
where ∂y^{l+1}_k/∂y^l_j = ∂y^{l+1}_k/∂s^{l+1}_k · ∂s^{l+1}_k/∂y^l_j = (1 - (y^{l+1}_k)^2) · w^{l+1}_{k,j}
and, at the last layer, ∂ŷ_i/∂y^M_i = 1 and ∂ŷ_i/∂y^M_k = 0 for k ≠ i.

20 Other Parameters
Fourth: derivative wrt the parameters of a hidden layer l:
∂ŷ_i/∂w^l_{j,k} = ∂ŷ_i/∂y^l_j · ∂y^l_j/∂w^l_{j,k} = ∂ŷ_i/∂y^l_j · (1 - (y^l_j)^2) · y^{l-1}_k
∂ŷ_i/∂b^l_j = ∂ŷ_i/∂y^l_j · ∂y^l_j/∂b^l_j = ∂ŷ_i/∂y^l_j · (1 - (y^l_j)^2) · 1

21 Global Algorithm
For each iteration:
1 Initialize the gradients: ∂C/∂θ_i = 0 for each θ_i
2 For each example z(p) = (x(p), y(p)):
  1 Forward phase: compute ŷ(p) = MLP(x(p); θ)
  2 Compute ∂L(y(p), ŷ(p))/∂ŷ(p)
  3 For each layer l from M down to 1:
    1 Compute ∂ŷ(p)/∂y^l_j
    2 Compute ∂y^l_j/∂w^l_{j,k} and ∂y^l_j/∂b^l_j
    3 Accumulate the gradients:
      ∂C/∂b^l_j += ∂L/∂ŷ(p) · ∂ŷ(p)/∂y^l_j · ∂y^l_j/∂b^l_j
      ∂C/∂w^l_{j,k} += ∂L/∂ŷ(p) · ∂ŷ(p)/∂y^l_j · ∂y^l_j/∂w^l_{j,k}
3 Update the parameters: θ_{s+1,i} = θ_{s,i} - η ∂C/∂θ_{s,i}
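
A compact sketch of this algorithm for a one-hidden-layer tanh MLP with a linear output and the MSE criterion; the shapes, names and toy data are assumptions for illustration, not the lecture's code:

    import numpy as np

    def backprop_epoch(X, Y, W1, b1, W2, b2, eta=0.01):
        # batch version: accumulate the gradients over all examples, then update once
        gW1, gb1, gW2, gb2 = [np.zeros_like(a) for a in (W1, b1, W2, b2)]
        for x, y in zip(X, Y):
            # forward phase
            y1 = np.tanh(b1 + W1 @ x)            # hidden layer (tanh)
            y_hat = b2 + W2 @ y1                 # output layer (linear)
            # backward phase, MSE criterion: dL/dy_hat = y_hat - y
            d2 = y_hat - y                       # gradient wrt the output pre-activation
            d1 = (W2.T @ d2) * (1.0 - y1 ** 2)   # gradient wrt the hidden pre-activation
            gW2 += np.outer(d2, y1)
            gb2 += d2
            gW1 += np.outer(d1, x)
            gb1 += d1
        # parameter update: theta <- theta - eta * dC/dtheta
        return W1 - eta * gW1, b1 - eta * gb1, W2 - eta * gW2, b2 - eta * gb2

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    Y = np.sin(X[:, :1])                         # toy regression targets
    W1, b1 = 0.5 * rng.normal(size=(4, 2)), np.zeros(4)
    W2, b2 = 0.5 * rng.normal(size=(1, 4)), np.zeros(1)
    for _ in range(100):
        W1, b1, W2, b2 = backprop_epoch(X, Y, W1, b1, W2, b2)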

22 An Example (1)
Let us start with a simple MLP:
[Figure: a small MLP with two tanh hidden units and one linear output unit, annotated with its initial weight and bias values (e.g. 0.3).]

23 An Example (2)
We forward one example and compute its MSE:
[Figure: the same MLP; the linear output is 1.07 and the MSE with respect to the target is 1.23.]

24 An Example (3)
We backpropagate the gradient everywhere:
[Figure: the same MLP annotated with the gradients: at the output, dMSE = dy = 1.57 (output 1.07, target -0.5), and the corresponding ds, dw and db values for every unit and parameter (e.g. dw = 0.53).]

25 An Example (4)
We modify each parameter with learning rate 0.1:
[Figure: the same MLP with its updated weight and bias values.]

26 An Example (5)
We forward the same example and compute its (smaller) MSE:
[Figure: the same MLP; the linear output is now 0.24 and the MSE is 0.27.]

27 MLPs are Universal Approximators
It can be shown that, under reasonable assumptions, one can approximate any smooth function with an MLP with one layer of hidden units.
First intuition: let us consider a classification task.
Let us consider hard transfer functions for hidden units: y = step(s) = 1 if s > 0, 0 otherwise.
Let us consider linear transfer functions for output units: y = s.
First attempt: ŷ = c + Σ_{i=1}^{N} v_i · sign(Σ_{j=1}^{M} x_j w_{i,j} + b_i)

28-31 Illustration: Universal Approximators
[Figures only.]

32 Illustration: Universal Approximators
... but what about that?
[Figure only.]

33 Universal Approximation by Cosines
Let us consider simple functions of two variables y(x_1, x_2).
Fourier decomposition: y(x_1, x_2) ≈ Σ_s A_s(x_1) cos(s x_2), where the coefficients A_s are functions of x_1.
Further Fourier decomposition: y(x_1, x_2) ≈ Σ_s Σ_l A_{s,l} cos(l x_1) cos(s x_2)
We know that cos(α) cos(β) = (1/2) cos(α + β) + (1/2) cos(α - β), so:
y(x_1, x_2) ≈ Σ_s Σ_l A_{s,l} [(1/2) cos(l x_1 + s x_2) + (1/2) cos(l x_1 - s x_2)]

34 Universal Approximation by Cosines
The cos function can be approximated with a linear combination of step functions:
f(z) ≈ f_0 + Σ_i (f_{i+1} - f_i) step(z - z_i)
So y(x_1, x_2) can be approximated by a linear combination of step functions whose arguments are linear combinations of x_1 and x_2, and the step functions can in turn be approximated by tanh functions.
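
A quick numerical sketch (an illustration, not from the slides) of approximating cos on [0, 2π] with such a linear combination of step functions:

    import numpy as np

    def step(z):
        return (z > 0).astype(float)

    # approximate f(z) = cos(z) on [0, 2*pi] with steps placed at grid points z_i
    z_grid = np.linspace(0.0, 2.0 * np.pi, 50)
    f = np.cos(z_grid)

    def approx_cos(z):
        # f(z) ~ f_0 + sum_i (f_{i+1} - f_i) * step(z - z_i)
        out = f[0] * np.ones_like(z)
        for i in range(len(z_grid) - 1):
            out += (f[i + 1] - f[i]) * step(z - z_grid[i])
        return out

    z = np.linspace(0.0, 2.0 * np.pi, 500)
    max_err = np.max(np.abs(approx_cos(z) - np.cos(z)))   # shrinks as the grid gets finer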

35 Outline: Binary, Multiclass, Error Correcting Output Codes

36 ANN for Binary Classification
One output, with target coded as {-1, 1} or {0, 1} depending on the output function of the last layer (linear, sigmoid, tanh, ...).
For a given output, the associated class corresponds to the nearest target.
How to obtain class posterior probabilities: use a sigmoid with targets {0, 1}. If the model is correctly trained (with, for instance, the MSE criterion), then
ŷ(x) = E[Y | X = x] = 1 · P(Y = 1 | X = x) + 0 · P(Y = 0 | X = x)
so the output encodes P(Y = 1 | X = x).
Note: we do not optimize the classification error directly...

37 ANN for Multiclass Classification
Simplest solution: one-hot encoding.
One output per class, coded for instance as (0, ..., 1, ..., 0).
For a given output vector, the associated class corresponds to the index of the maximum value in the output vector.
How to obtain class posterior probabilities: use a softmax:
ŷ_i = exp(s_i) / Σ_j exp(s_j)
each output i will then encode P(Y = i | X = x).
Otherwise: each class corresponds to a different binary code.
For example, for a 4-class problem, we could have an 8-dimensional code for each class.
For a given output, the associated class corresponds to the nearest code (according to a given distance).
Example: Error Correcting Output Codes (ECOC).
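
A sketch of the softmax output (with the usual subtraction of the maximum for numerical stability, an implementation detail not on the slide):

    import numpy as np

    def softmax(s):
        # y_hat_i = exp(s_i) / sum_j exp(s_j)
        e = np.exp(s - np.max(s))     # subtracting the max does not change the result
        return e / np.sum(e)

    probs = softmax(np.array([1.0, 2.0, 0.5]))   # one value per class, sums to 1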

38 Error Correcting Output Codes
Let us represent a 4-class problem with 6 bits:
class 1: ...   class 2: ...   class 3: ...   class 4: ...
We then create 6 classifiers (or 1 classifier with 6 outputs).
For example: the first classifier will try to separate classes 1 and 2 from classes 3 and 4.

39 Error Correcting Output Codes
Given our 4-class problem represented with 6 bits (class 1: ..., class 2: ..., class 3: ..., class 4: ...):
When a new example comes, we compute the distance between the code obtained by the 6 classifiers and the code of each of the 4 classes:
obtained: ...
distances (let us use the Manhattan distance): to class 1: 5, to class 2: 4, to class 3: 2, to class 4: 3.
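
A sketch of ECOC decoding; the 6-bit codewords below are made up, since the actual codes on the slide were not preserved, and the classifier outputs are illustrative:

    import numpy as np

    # hypothetical 6-bit codewords for a 4-class problem (one row per class)
    codes = np.array([[1, 1, 1, 0, 0, 0],
                      [1, 0, 0, 1, 1, 0],
                      [0, 1, 0, 1, 0, 1],
                      [0, 0, 1, 0, 1, 1]])

    def ecoc_decode(outputs, codes):
        # Manhattan (L1) distance between the 6 classifier outputs and each codeword
        distances = np.sum(np.abs(codes - outputs), axis=1)
        return int(np.argmin(distances)), distances

    outputs = np.array([0, 1, 0, 1, 1, 1])   # hypothetical outputs of the 6 classifiers
    predicted_class, dists = ecoc_decode(outputs, codes)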

40 What is a Good Error Correcting Output Code?
How to devise a good error correcting output code? Maximize the minimum Hamming distance between any pair of code words.
A good ECOC should satisfy two properties:
Row separation (Hamming distance).
Column separation: column functions should be as uncorrelated as possible with each other.

41 Outline: Stochastic Gradient, Initialization, Learning Rate, Weight Decay, Training Criteria

42 A good book to make ANNs work: G. B. Orr and K. Müller, Neural Networks: Tricks of the Trade, Springer.
Content: Stochastic Gradient, Initialization, Learning Rate and Learning Rate Decay, Weight Decay.

43 Stochastic Gradient
The gradient descent technique presented so far is batch: first accumulate the gradient from all examples, then adjust the parameters.
What if the data set is very big, and contains redundancies?
Other solution: stochastic gradient descent. Adjust the parameters after each example instead.
Stochastic: we approximate the full gradient with its estimate at each example.
Nevertheless, convergence proofs exist for such a method. Moreover, it is much faster for large data sets!
Other gradient techniques: second order methods such as conjugate gradient are good for small data sets.
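
The difference between the two regimes, sketched as plain Python; grad_L is assumed to return the gradient of the per-example loss:

    def batch_epoch(theta, data, grad_L, eta):
        # accumulate the gradient over all examples, then update once
        g = sum(grad_L(z, theta) for z in data)
        return theta - eta * g

    def stochastic_epoch(theta, data, grad_L, eta):
        # update the parameters after each example
        for z in data:
            theta = theta - eta * grad_L(z, theta)
        return theta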

44 Initialization
How should we initialize the parameters of an ANN?
One common problem: saturation. When the weighted sum is big, the output of the tanh (or sigmoid) saturates, and the gradient tends towards 0.
[Figure: a tanh curve as a function of the weighted sum; the derivative is good near zero and almost zero when the weighted sum is large.]

45 Initialization
Hence, we should initialize the parameters such that the average weighted sum is in the linear part of the transfer function (see Leon Bottou's thesis for details):
input data: normalized with zero mean and unit variance;
targets:
  regression: normalized with zero mean and unit variance;
  classification: 0.6 and -0.6 if the output transfer function is tanh, 0.8 and 0.2 if it is sigmoid, 0.6 and -0.6 if it is linear;
parameters: uniformly distributed in [-1/√fan_in, 1/√fan_in].
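
A sketch of this initialization, drawing each weight uniformly in [-1/√fan_in, 1/√fan_in]; the layer sizes and the choice of zero biases are assumptions:

    import numpy as np

    def init_layer(fan_in, fan_out, rng):
        bound = 1.0 / np.sqrt(fan_in)
        W = rng.uniform(-bound, bound, size=(fan_out, fan_in))
        b = np.zeros(fan_out)        # assumption: biases start at zero
        return W, b

    rng = np.random.default_rng(0)
    W1, b1 = init_layer(fan_in=10, fan_out=5, rng=rng)   # hidden layer
    W2, b2 = init_layer(fan_in=5, fan_out=1, rng=rng)    # output layer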

46 Learning Rate and Learning Rate Decay
How to select the learning rate η?
If η is too big: the optimization diverges.
If η is too small: the optimization is very slow and may get stuck in local minima.
One solution: progressive decay:
initial learning rate η_0, learning rate decay η_d;
at each iteration s: η(s) = η_0 / (1 + s η_d)
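
The decay schedule above as a small Python helper (the default values are illustrative):

    def learning_rate(s, eta0=0.01, eta_d=0.1):
        # eta(s) = eta_0 / (1 + s * eta_d)
        return eta0 / (1.0 + s * eta_d)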

47 Learning Rate Decay (Graphical View)
[Figure: plot of 1 / (1 + 0.1 x).]

48 Weight Decay
One way to control the capacity: regularization.
For MLPs, when the weights tend to 0, sigmoid or tanh functions are almost linear, hence with low capacity.
Weight decay: penalize solutions with large weights and biases (in amplitude):
C(D_n, θ) = Σ_{p=1}^{n} L(y(p), ŷ(p)) + β Σ_j θ_j^2
where β controls the weight decay. Easy to implement:
θ_{j,s+1} = θ_{j,s} - η Σ_{p=1}^{n} ∂L(y(p), ŷ(p))/∂θ_{j,s} - η β θ_{j,s}
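
A sketch of the resulting update; grad stands for the accumulated gradient of the data term and beta for the decay coefficient (both names are assumptions):

    def update_with_weight_decay(theta, grad, eta=0.01, beta=1e-4):
        # theta <- theta - eta * dL/dtheta - eta * beta * theta
        return theta - eta * grad - eta * beta * theta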

49 Examples of Training Criteria
Mean-squared error, for regression: L(y, ŷ) = Σ_{i=1}^{d} (1/2)(y_i - ŷ_i)^2
Cross-entropy criterion, for classification (targets in {-1, 1}): L(y, ŷ) = Σ_i log(1 + exp(-y_i ŷ_i))
Hard version: L(y, ŷ) = Σ_i |1 - y_i ŷ_i|_+
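
Sketches of the two classification criteria above, with targets in {-1, 1} (illustrative names):

    import numpy as np

    def cross_entropy(y, y_hat):
        # L(y, y_hat) = sum_i log(1 + exp(-y_i * y_hat_i))
        return np.sum(np.log1p(np.exp(-y * y_hat)))

    def hard_margin(y, y_hat):
        # L(y, y_hat) = sum_i |1 - y_i * y_hat_i|_+
        return np.sum(np.maximum(0.0, 1.0 - y * y_hat))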

50 Examples of Training Criteria (Graphical View)
[Figure: plots of log(1 + exp(-x)), |1 - x|_+ and 0.5 x^2.]
