Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks

1 Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks. Jan Drchal, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Computer Science

2 Outline. Topics covered in the lecture: neuron types, layers, loss functions, computing loss gradients via backpropagation, learning neural networks, regularization.

3 Neural Networks Overview. Composition of simple linear or non-linear functions (neurons) parametrized by weights and biases. Training examples: $\mathcal{T}^m = \{(x_i, y_i) \in (\mathcal{X} \times \mathcal{Y}) \mid i = 1, \dots, m\}$, where $\mathcal{X} \subseteq \mathbb{R}^n$ and $\mathcal{Y} \subseteq \mathbb{R}^K$. Here we consider $\mathcal{H}$, a hypothesis space of neural networks having a fixed architecture. Learning methods are based on Empirical Risk Minimization: $R_{\mathcal{T}^m}(h_{(w,b)}) = \frac{1}{m} \sum_{i=1}^{m} \ell(y_i, h_{(w,b)}(x_i))$, where $h_{(w,b)} \in \mathcal{H}$ denotes a neural network parametrized by $w$ and $b$. Note that in the following I will use $L(w) \equiv m\, R_{\mathcal{T}^m}(h_{(w,b)})$ and $\hat{y}_i \equiv h_{(w,b)}(x_i)$ to simplify the notation.

4 McCulloch-Pitts Perceptron. $x = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$ is the input (feature vector), $w = (w_1, w_2, \dots, w_n)^T \in \mathbb{R}^n$ the weights, $b \in \mathbb{R}$ the bias (threshold), $s = \langle w, x \rangle \in \mathbb{R}$ the inner potential, the activation function $f(s) = -1$ if $s < 0$ and $1$ else, and $\hat{y} = h_{(w,b)}(x) \in \{-1, 1\}$ the output (activity): $\hat{y} = f(s) = f\left(\sum_{i=1}^{n} w_i x_i + b\right) = f(\langle w, x \rangle + b)$. It is the linear classifier we have already seen.

5 McCulloch-Pitts Perceptron: Treating Bias. Treat the bias as an extra fixed input $x_0 = 1$ weighted by $w_0 = b$: $\hat{y} = f(\langle w, x \rangle + b) = f(\langle \tilde{w}, \tilde{x} \rangle)$, where $\tilde{x} = (1, x_1, \dots, x_n)^T \in \mathbb{R}^{n+1}$ and $\tilde{w} = (w_0, w_1, \dots, w_n)^T \in \mathbb{R}^{n+1}$. Unless otherwise noted we will use $x$, $w$ instead of $\tilde{x}$, $\tilde{w}$.

6 Activation Functions: step function, bipolar step function, linear, sigmoid, hyperbolic tangent, ReLU. Logistic sigmoid: $\sigma(s) = \frac{e^s}{1 + e^s} = \frac{1}{1 + e^{-s}}$. Note: $\tanh(s) = \frac{e^s - e^{-s}}{e^s + e^{-s}} = 2\sigma(2s) - 1$.

7 Linear Neuron. Training examples: $\mathcal{T}^m = \{(x_i, y_i) \in (\mathbb{R}^{n+1} \times \mathbb{R}) \mid i = 1, \dots, m\}$. A single neuron with linear activation function performs linear regression: $\hat{y} = s = \langle x, w \rangle$, $\hat{y} \in \mathbb{R}$. Inputs: the matrix $X \in \mathbb{R}^{m \times (n+1)}$ with rows $x_i^T$. Targets: $y = (y_1, \dots, y_m)^T$, $y_i \in \mathbb{R}$. Outputs: $\hat{y} = (\hat{y}_1, \dots, \hat{y}_m)^T$, $\hat{y}_i \in \mathbb{R}$. For the whole dataset we get: $\hat{y} = Xw$, $\hat{y} \in \mathbb{R}^m$.

8 Linear Neuron: Maximum Likelihood Estimation. Assumption: the data are Gaussian distributed with mean $\langle x_i, w \rangle$ and variance $\sigma^2$: $y_i \sim \mathcal{N}(\langle x_i, w \rangle, \sigma^2)$, i.e., $y_i = \langle x_i, w \rangle + \mathcal{N}(0, \sigma^2)$. Likelihood for i.i.d. data: $p(y \mid w, X, \sigma) = \prod_{i=1}^{m} p(y_i \mid w, x_i, \sigma) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y_i - \langle w, x_i \rangle)^2} = (2\pi\sigma^2)^{-\frac{m}{2}} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(y_i - \langle w, x_i \rangle)^2} = (2\pi\sigma^2)^{-\frac{m}{2}} e^{-\frac{1}{2\sigma^2}(y - Xw)^T(y - Xw)}$. Negative log likelihood (switching to minimization): $L(w) = \frac{m}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}(y - Xw)^T(y - Xw)$.

9 Linear Neuron: Maximum Likelihood Estimation (contd). Note that $\sum_{i=1}^{m} \underbrace{(y_i - \langle w, x_i \rangle)^2}_{\ell(y_i, \hat{y}_i)} = (y - Xw)^T(y - Xw)$ is the sum-of-squares or squared error (SE). Minimization of $L(w)$ corresponds to least squares estimation. Solving $\frac{\partial L}{\partial w} = 0$ we get $w = (X^T X)^{-1} X^T y$ (see seminar). [Figure: data points and the fitted linear regression model.]

10 Logistic Sigmoid and Probability. Denote: $\hat{y} = \sigma(s)$, $\hat{y} \in (0, 1)$. The sigmoid output can represent a parameter of the Bernoulli distribution: $p(y \mid \hat{y}) = \mathrm{Ber}(y \mid \hat{y}) = \hat{y}^y (1 - \hat{y})^{1-y}$, which equals $\hat{y}$ for $y = 1$ and $1 - \hat{y}$ for $y = 0$. Motivation: log-odds linear model (see AE4B33RPZ). Binary classifier: $h(\hat{y}) = 1$ if $\hat{y} > \frac{1}{2}$, else $0$.

11 Logistic Regression. An MCP neuron using the sigmoid activation function performs logistic regression: $\hat{y} = \sigma(\langle w, x \rangle)$, $\hat{y} \in (0, 1)$. Inputs: the matrix $X \in \mathbb{R}^{m \times (n+1)}$ with rows $x_i^T$. Target class: $y = (y_1, \dots, y_m)^T$, $y_i \in \{0, 1\}$. Output class: $\hat{y} = (\hat{y}_1, \dots, \hat{y}_m)^T$, $\hat{y}_i \in (0, 1)$.

12 Logistic Regression MLE Leads to the Cross-Entropy. Likelihood for the logistic regression: $p(y \mid w, X) = \prod_{i=1}^{m} \mathrm{Ber}(y_i \mid \hat{y}_i) = \prod_{i=1}^{m} \hat{y}_i^{y_i} (1 - \hat{y}_i)^{1 - y_i}$. Negative log likelihood: $L(w) = \sum_{i=1}^{m} \underbrace{-\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]}_{\ell(y_i, \hat{y}_i)}$. This loss function is called the cross-entropy. The $\ell(y_i, \hat{y}_i)$ is the negative log probability of the correct answer $y_i \in \{0, 1\}$ given by the model output $\hat{y}_i \in (0, 1)$.

13 Maximum Likelihood Estimation. Maximum likelihood estimation: $w^* = \operatorname{argmin}_w L(w)$. Derivative of the loss w.r.t. the sigmoid argument: $\frac{\partial L}{\partial s_i} = \hat{y}_i - y_i$ (see seminar). Gradient w.r.t. the logistic regression parameters: $\frac{\partial L}{\partial w} = \sum_{i=1}^{m} \frac{\partial L}{\partial s_i} \frac{\partial s_i}{\partial w} = \sum_{i=1}^{m} x_i (\hat{y}_i - y_i) = X^T(\hat{y} - y)$. $\frac{\partial L}{\partial w} = 0$ has no analytical solution $\Rightarrow$ use numerical methods.

14 Rectified Linear Unit (ReLU). Definition: $f(s) = \max(0, s)$. Fast to compute. Helps with the vanishing gradients problem: the gradient is constant for $s > 0$, while for sigmoid-like activations it becomes increasingly small. Leads to sparse representations: $s < 0$ turns the neuron completely off. Unbounded: use regularization to prevent numerical problems. Might block gradient propagation (dead units); the Leaky ReLU variant addresses this.

15 Multilayer Perceptron (MLP). [Figure: fully connected hidden layers followed by a softmax output layer.] Feed-forward ANN. Fully-connected layers. An MLP for regression would typically use a linear output layer.

16 Recurrent Neural Network (RNN). Fully-Connected Recurrent Neural Network (FRNN). Both inputs and outputs are sequences. Feedback connections provide memory.

17 Modular and Hierarchical Architectures. Layers can be organized in modules. Hierarchies of modules. Module reuse.

18 Linear Layer. Output $k$: $\hat{y}_k = \langle x, w_k \rangle$, $k = 1, \dots, K$. All outputs using the weight matrix $W$: $\hat{y} = x^T W$. Multiple samples: $\hat{Y} = XW$, where $W = (w_1, \dots, w_K) \in \mathbb{R}^{(n+1) \times K}$ with elements $w_{0k}, \dots, w_{nk}$ in column $k$, $X \in \mathbb{R}^{m \times (n+1)}$ has rows $x_i^T$, and $\hat{Y} \in \mathbb{R}^{m \times K}$ has rows $\hat{y}_i^T$.

19 Softmax Layer. Multinomial classification, $K$ mutually exclusive classes. Definition: $\sigma_k(s) \equiv \frac{e^{s_k}}{\sum_{c=1}^{K} e^{s_c}}$, where $K$ is the number of classes. Softmax represents a probability distribution: $\sigma_k \in (0, 1)$ for $k \in \{1, \dots, K\}$ and $\sum_{c=1}^{K} \sigma_c = 1$. Describes class membership probabilities: $p(y = k \mid s) = \sigma_k(s)$.

20 Softmax Layer MLE. Target: $y = (y_1, \dots, y_m)^T$, $y_i \in \{1, 2, \dots, K\}$. One-hot encoding for sample $i$ and class $k$: let $y_{ik} = [y_i = k]$. Likelihood: $p(y \mid w, X) = \prod_{i=1}^{m} \prod_{c=1}^{K} \hat{y}_{ic}^{y_{ic}}$. Negative log likelihood: $L(w) = -\sum_{i=1}^{m} \sum_{c=1}^{K} y_{ic} \log(\hat{y}_{ic})$. Again the cross-entropy. See the seminar for the gradient.

21 Multinomial Logistic Regression. Linear layer + softmax layer = multinomial logistic regression: $\hat{y}_k = \sigma_k(x^T W)$. Classifier: $h(x, W) = \operatorname{argmax}_k \hat{y}_k$.

22 Loss Functions: Summary.
problem: suggested loss function
binary classification: cross-entropy $-\sum_{i=1}^{m} \left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$
multinomial classification: multinomial cross-entropy $-\sum_{i=1}^{m} \sum_{c=1}^{K} y_{ic} \log(\hat{y}_{ic})$
regression: squared error $\sum_{i=1}^{m} (y_i - \hat{y}_i)^2$
multi-output regression: squared error $\sum_{i=1}^{m} \sum_{c=1}^{K} (y_{ic} - \hat{y}_{ic})^2$
The mean w.r.t. $m$ is often used; in that case these losses exactly correspond to the empirical risk $R_{\mathcal{T}^m}(h)$.

23 Backpropagation Overview. A method to compute the gradient of the loss function with respect to its parameters: $\nabla L(w)$. $\nabla L(w)$ is in turn used by optimization methods like gradient descent. Here, we present the "modular" backpropagation (see the Nando de Freitas Machine Learning course: nandodefreitas/machinelearning/). Let us use multinomial logistic regression as an example: the input $x$ feeds a linear layer producing $s$, a softmax layer producing $\hat{y}$, and finally the loss $L$ computed from $\hat{y}$ and $y$.

24 Backpropagation: the Loss Function. The loss function is the multinomial cross-entropy in this case: $L(w) = -\sum_{i=1}^{m} \sum_{c=1}^{K} [y_i = c] \log\left(\frac{\exp(\langle x_i, w_c \rangle)}{\sum_{k=1}^{K} \exp(\langle x_i, w_k \rangle)}\right)$. The computation is split into stages: $z_1 = x$, $z_2 = s$, $z_3 = \hat{y}$, $z_4 = L$.

25 Backpropagation Based on Modules. Computation of $\nabla L(w)$ involves repetitive use of the chain rule. We can make things simpler by a divide and conquer approach: divide into the simplest possible modules (these can later be combined into complex hierarchies) and represent even the loss function as a module. Modules communicate by passing messages: module $l$ takes the input $z^l$ and produces the output $z^{l+1}$.

26 Backpropagation: Backward Pass Message. Let $\delta^l = \frac{\partial L}{\partial z^l}$ be the sensitivity of the loss to the module input for layer $l$; then: $\delta_i^l = \frac{\partial L}{\partial z_i^l} = \sum_j \frac{\partial L}{\partial z_j^{l+1}} \frac{\partial z_j^{l+1}}{\partial z_i^l} = \sum_j \delta_j^{l+1} \frac{\partial z_j^{l+1}}{\partial z_i^l}$. We need to know how to compute derivatives of outputs w.r.t. inputs only!

27 Backpropagation: Parameters. Similarly, if the module has parameters we want to know how the loss changes w.r.t. them: $\frac{\partial L}{\partial w_i^l} = \sum_j \frac{\partial L}{\partial z_j^{l+1}} \frac{\partial z_j^{l+1}}{\partial w_i^l} = \sum_j \delta_j^{l+1} \frac{\partial z_j^{l+1}}{\partial w_i^l}$. Derivatives of module outputs w.r.t. the parameters are all we need.

28 Backpropagation: Steps. For each module we only need to specify these three messages: forward: $z^{l+1} = f(z^l)$; backward: $\frac{\partial z^{l+1}}{\partial z^l}$; parameter (optional): $\frac{\partial z^{l+1}}{\partial w^l}$. The forward pass propagates the input up to the loss, the backward pass (backpropagation) propagates the sensitivities back and collects the parameter gradients.

29 Example: Linear Layer. Forward: $z_j^{l+1} = \sum_{i=0}^{n} w_{ij} z_i^l$, $j = 1, \dots, K$. Backward: $\frac{\partial z_j^{l+1}}{\partial z_i^l} = w_{ij}$, $i = 0, \dots, n$, $j = 1, \dots, K$. Parameter: $\frac{\partial z_j^{l+1}}{\partial w_{ik}} = [j = k]\, z_i^l$.

30 Example: Squared Error. Forward: $z^{l+1} = \sum_{i=1}^{n} (y_i - z_i^l)^2$. Backward: $\frac{\partial z^{l+1}}{\partial z_i^l} = -2\,(y_i - z_i^l)$, $i \in \{1, \dots, n\}$.

31 Gradient Descent. Task: find the parameters which minimize the loss over the training dataset: $\theta^* = \operatorname{argmin}_\theta L(\theta)$, where $\theta$ is the set of all parameters defining the ANN (e.g., all weight matrices). Gradient descent: $\theta^{(t+1)} = \theta^{(t)} - \eta^{(t)} \nabla L(\theta^{(t)})$, where $\eta^{(t)} > 0$ is the learning rate or step size at iteration $t$.

32 Batch, Online and Mini-Batch Learning. When to update the weights? (Full) batch learning: after all patterns have been used (an epoch); inefficient for redundant datasets. Online learning: after each training pattern; the noise can help overcome local minima but can also harm convergence in the final stages of fine-tuning; Stochastic Gradient Descent (SGD) does this and converges almost surely to a local minimum when $\eta^{(t)}$ decreases appropriately in time. Mini-batch learning: after a small sample of training patterns.

33 Momentum. Simulate inertia to overcome plateaus in the error landscape: $v^{(t+1)} = \mu v^{(t)} - \eta^{(t)} \nabla L(\theta^{(t)})$, $\theta^{(t+1)} = \theta^{(t)} + v^{(t+1)}$, where $\mu \in [0, 1]$ is the momentum parameter. Momentum damps oscillations in directions of high curvature. It builds up velocity in directions with a consistent (possibly small) gradient.

34 Adagrad. Adaptive Gradient method (Duchi, Hazan and Singer, 2011). Motivation: the magnitude of the gradient differs a lot for different parameters. Idea: reduce the learning rates for parameters having high gradient values: $g_i^{(t+1)} = g_i^{(t)} + \left(\frac{\partial L}{\partial \theta_i^{(t)}}\right)^2$, $\theta_i^{(t+1)} = \theta_i^{(t)} - \frac{\eta}{\sqrt{g_i^{(t+1)}} + \epsilon} \frac{\partial L}{\partial \theta_i^{(t)}}$. Here $g_i$ accumulates the squared partial derivatives w.r.t. the parameter $\theta_i$ and $\epsilon$ is a small positive number to prevent division by zero. Weakness: the ever-increasing $g_i$ eventually leads to slow convergence.

35 RMSProp. Similar to Adagrad but employs a moving average: $g_i^{(t+1)} = \gamma g_i^{(t)} + (1 - \gamma)\left(\frac{\partial L}{\partial \theta_i^{(t)}}\right)^2$, where $\gamma$ is a decay parameter (typical value $\gamma = 0.9$). Unlike for Adagrad, the updates do not get infinitesimally small.

36 Regularization. How to deal with overfitting? Get more data. Find a simpler model; search for the optimal architecture, e.g., the number, type and size of layers. Use regularization. Most types of regularization are based on penalties for model complexity. Bayesian point of view: introduce a prior distribution on the model parameters.

37 L2 Regularization. Recall the solution for linear regression: $w = (X^T X)^{-1} X^T y$. What if $X^T X$ has no inverse? We can modify the solution by adding a small element to the diagonal: $w = (X^T X + \lambda I)^{-1} X^T y$, $\lambda > 0$. It turns out that this approach not only helps with inverting $X^T X$ but also improves model generalization. It is the solution of the regularized loss function: $L(w) = (y - Xw)^T(y - Xw) + \lambda w^T w$; this is called L2 regularization, see the seminar for the derivation. The term $\lambda w^T w = \lambda \lVert w \rVert_2^2$ minimizes the size of the weight vector. Note that we omit the bias in $\lambda w^T w$.

38 L2 Regularization as Gaussian Prior. Recall the likelihood: $p(y \mid w, X) = (2\pi\sigma^2)^{-\frac{m}{2}} e^{-\frac{1}{2\sigma^2}(y - Xw)^T(y - Xw)}$. Define a Gaussian prior with zero mean and variance $\sigma_0^2$ for the parameters: $p(w) = (2\pi\sigma_0^2)^{-\frac{n+1}{2}} e^{-\frac{1}{2\sigma_0^2} w^T w}$. Then the posterior is: $p(w \mid y, X) = \frac{p(y \mid w, X)\, p(w)}{p(y \mid X)}$. The denominator does not depend on the parameters $w$: $p(w \mid y, X) \propto p(y \mid w, X)\, p(w)$.

39 MAP Estimate. Maximizing $p(w \mid y, X)$ gives us the Maximum a posteriori (MAP) estimate: $w_{\mathrm{MAP}} = \operatorname{argmax}_w p(w \mid y, X) = \operatorname{argmin}_w \left(-\log p(w \mid y, X)\right)$, where $-\log p(w \mid y, X) = \frac{1}{2\sigma^2}(y - Xw)^T(y - Xw) + \frac{1}{2\sigma_0^2} w^T w + C$. We can omit $C$, define $\lambda = \frac{\sigma^2}{\sigma_0^2}$ and minimize the loss function we already know: $L(w) = (y - Xw)^T(y - Xw) + \lambda w^T w$.

40 Weight Decay Discussion. Having a zero mean Gaussian prior keeps the weights smaller. Weight decay is widely used for most types of layers in ANNs. Intuition: sigmoid-like neurons kept near zero potential (via small weights) behave similarly to linear neurons. The same works for other models, e.g., polynomial regression. $\lambda$ is usually set using cross validation.

41 Other Regularization Approaches. L1 regularization: sum of absolute values, i.e., use $\lambda \lVert w \rVert_1$. Early stopping: start with small weights, stop when the validation loss starts to grow. Randomizing inputs: same effect as weight decay for linear neurons. Weight sharing and sparse connectivity: Convolutional Neural Networks. Model averaging. Dropout and DropConnect. Augmenting the dataset.

42 Deep Neural Networks (Next Lecture): Convolutional Neural Networks, Transfer learning.
