Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning


1 Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning. Nicolas Thome, Département Informatique, Conservatoire National des Arts et Métiers (Cnam).

2 8 Weeks on Deep Learning and Structured Prediction. Week 1-5: Deep Learning. Week 6-8: Structured Prediction and Applications.

3 Outline: 1 Context, 2 Neural Networks, 3 Training Deep Neural Networks.

4 Context: Big Data. Superabundance of data: images, videos, audio, text, usage traces, etc. BBC: 2.4M videos; Facebook: 350B images; 100M monitoring cameras, 1B each day. Obvious need to access, search, or classify these data: Recognition. Huge number of applications: mobile visual search, robotics, autonomous driving, augmented reality, medical imaging, etc. Leading track in major ML/CV conferences during the last decade.

5 Recognition and classification. Classification: assign a given datum to one of a set of pre-defined classes. Recognition is much more general than classification, e.g. object localization in images, ranking for document indexing, sequence prediction for text, speech, audio, etc. Many tasks can be cast as classification problems, hence the importance of classification.

6 Focus on Visual Recognition: Perceiving the Visual World. Visual Recognition: archetype of low-level signal understanding. Supposed to be a master class problem in the early 80s. Certainly the topic most impacted by deep learning. Scene categorization, object localization, context & attribute recognition, rough 3D layout and depth ordering, rich description of scenes, e.g. sentences.

7 Recognition of low-level signals. Challenge: filling the semantic gap between what we perceive and what a computer sees. Illumination variations, view-point variations, deformable objects, intra-class variance, etc. How to design "good" intermediate representations?

8 Deep Learning (DL) & Recognition of low-level signals. DL: breakthrough for the recognition of low-level signal data. Before DL: handcrafted intermediate representations for each task; needs expertise (PhD level) in each field; weak level of semantics in the representation.

9 Deep Learning (DL) & Recognition of low-level signals. Since DL: automatically learning intermediate representations. Outstanding experimental performances >> handcrafted features. Able to learn high-level intermediate representations. Common learning methodology: field independent, no expertise required.

10 Deep Learning (DL) & Representation Learning. DL: breakthrough for representation learning, automatically learning intermediate levels of representation. Ex: Natural Language Processing (NLP).

11 Outline: 1 Context, 2 Neural Networks, 3 Training Deep Neural Networks.

12 The Formal Neuron: 1943 [MP43]. Basis of Neural Networks. Input: vector $\mathbf{x} \in \mathbb{R}^m$, i.e. $\mathbf{x} = \{x_i\}_{i \in \{1,2,...,m\}}$. Neuron output $\hat{y} \in \mathbb{R}$: scalar.

13 The Formal Neuron: 1943 [MP43]. Mapping from $\mathbf{x}$ to $\hat{y}$: 1. linear (affine) mapping $s = \mathbf{w}^\top \mathbf{x} + b$; 2. non-linear activation function $f$: $\hat{y} = f(s)$.

14 The Formal Neuron: Linear Mapping. Linear (affine) mapping: $s = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{m} w_i x_i + b$. $\mathbf{w}$: normal vector to a hyperplane in $\mathbb{R}^m$, i.e. a linear boundary. $b$: bias, shifts the hyperplane position. 2D hyperplane: line; 3D hyperplane: plane.

15 The Formal Neuron: Activation Function. $\hat{y} = f(\mathbf{w}^\top \mathbf{x} + b)$, $f$ activation function. Popular choices for $f$: step, sigmoid, tanh. Step (Heaviside) function: $H(z) = 1$ if $z \geq 0$, $0$ otherwise.
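For concreteness, here is a small NumPy sketch (not from the original slides) of the three activation functions listed above; exposing the sigmoid slope parameter a is an assumption anticipating the next slides.

```python
import numpy as np

def step(z):
    """Heaviside step: 1 if z >= 0, else 0."""
    return (z >= 0).astype(float)

def sigmoid(z, a=1.0):
    """Sigmoid with slope parameter a: (1 + exp(-a z))^-1."""
    return 1.0 / (1.0 + np.exp(-a * z))

def tanh(z):
    """Hyperbolic tangent, a zero-centered saturating activation."""
    return np.tanh(z)

z = np.linspace(-5.0, 5.0, 11)
print(step(z), sigmoid(z), tanh(z), sep="\n")
```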

16 Step function: Connection to Biological Neurons. Formal neuron with step activation $H$: $\hat{y} = H(\mathbf{w}^\top \mathbf{x} + b)$. $\hat{y} = 1$ (activated) $\Leftrightarrow$ $\mathbf{w}^\top \mathbf{x} \geq -b$; $\hat{y} = 0$ (unactivated) $\Leftrightarrow$ $\mathbf{w}^\top \mathbf{x} < -b$. Biological neurons: output activated when the input, weighted by the synaptic weights, exceeds a threshold.

17 Sigmoid Activation Function. Neuron output $\hat{y} = f(\mathbf{w}^\top \mathbf{x} + b)$, $f$ activation function. Sigmoid: $\sigma(z) = (1 + e^{-az})^{-1}$. Larger $a$: more similar to the step function (step: $a \to \infty$). Sigmoid: linear and saturating regimes.

18 The Formal Neuron: Application to Binary Classification. Binary classification: label input $\mathbf{x}$ as belonging to class 1 or 0. Neuron output with sigmoid: $\hat{y} = \frac{1}{1 + e^{-a(\mathbf{w}^\top \mathbf{x} + b)}}$. Sigmoid: probabilistic interpretation, $\hat{y} \sim P(1|\mathbf{x})$. Input $\mathbf{x}$ classified as 1 if $P(1|\mathbf{x}) > 0.5 \Leftrightarrow \mathbf{w}^\top \mathbf{x} + b > 0$; classified as 0 if $P(1|\mathbf{x}) < 0.5 \Leftrightarrow \mathbf{w}^\top \mathbf{x} + b < 0$. $\mathrm{sign}(\mathbf{w}^\top \mathbf{x} + b)$: linear decision boundary in input space!

19 The Formal Neuron: Toy Example for Binary Classification. 2d example: $m = 2$, $\mathbf{x} = \{x_1, x_2\} \in [-5; 5] \times [-5; 5]$. Linear mapping: $\mathbf{w} = [1; 1]$ and $b = 2$. Result of the linear mapping: $s = \mathbf{w}^\top \mathbf{x} + b$.

20 The Formal Neuron: Toy Example for Binary Classification. 2d example: $m = 2$, $\mathbf{x} = \{x_1, x_2\} \in [-5; 5] \times [-5; 5]$. Linear mapping: $\mathbf{w} = [1; 1]$ and $b = 2$, $s = \mathbf{w}^\top \mathbf{x} + b$. Sigmoid activation function: $\hat{y} = (1 + e^{-a(\mathbf{w}^\top \mathbf{x} + b)})^{-1}$, $a = 10$.

21 The Formal Neuron: Toy Example for Binary Classification. 2d example: $m = 2$, $\mathbf{x} = \{x_1, x_2\} \in [-5; 5] \times [-5; 5]$. Linear mapping: $\mathbf{w} = [1; 1]$ and $b = 2$, $s = \mathbf{w}^\top \mathbf{x} + b$. Sigmoid activation function: $\hat{y} = (1 + e^{-a(\mathbf{w}^\top \mathbf{x} + b)})^{-1}$, $a = 1$.

22 The Formal Neuron: Toy Example for Binary Classification. 2d example: $m = 2$, $\mathbf{x} = \{x_1, x_2\} \in [-5; 5] \times [-5; 5]$. Linear mapping: $\mathbf{w} = [1; 1]$ and $b = 2$, $s = \mathbf{w}^\top \mathbf{x} + b$. Sigmoid activation function: $\hat{y} = (1 + e^{-a(\mathbf{w}^\top \mathbf{x} + b)})^{-1}$, $a = 0.1$.
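As an illustration (not part of the slides), the sketch below evaluates this toy neuron on a grid, assuming the $[-5, 5]^2$ domain and the parameters $\mathbf{w} = [1; 1]$, $b = 2$ given above, and varies the sigmoid slope $a$ to reproduce the effect shown on slides 20-22:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), 2.0
x1, x2 = np.meshgrid(np.linspace(-5, 5, 101), np.linspace(-5, 5, 101))
s = w[0] * x1 + w[1] * x2 + b                 # linear mapping on the grid

for a in (10.0, 1.0, 0.1):
    y_hat = 1.0 / (1.0 + np.exp(-a * s))      # sigmoid activation
    # large a: outputs saturate near 0/1 (step-like); small a: nearly linear regime
    print(f"a={a}: min={y_hat.min():.3f}, max={y_hat.max():.3f}")
```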

23 From Formal Neuron to Neural Networks. Formal neuron: 1. a single scalar output; 2. a linear decision boundary for binary classification. A single scalar output is limited for several tasks, e.g. multi-class classification (MNIST or CIFAR).

24 Perceptron and Multi-Class Classification. Formal neuron: limited to binary classification. Multi-class classification: use several output neurons instead of a single one! Perceptron: input $\mathbf{x} \in \mathbb{R}^m$; output neuron $\hat{y}_1$ is a formal neuron: linear (affine) mapping $s_1 = \mathbf{w}_1^\top \mathbf{x} + b_1$, non-linear activation function $f$: $\hat{y}_1 = f(s_1)$. Linear mapping parameters: $\mathbf{w}_1 = \{w_{11}, ..., w_{m1}\} \in \mathbb{R}^m$, $b_1 \in \mathbb{R}$.

25 Perceptron and Multi-Class Classification. Input $\mathbf{x} \in \mathbb{R}^m$; output neuron $\hat{y}_k$ is a formal neuron: linear (affine) mapping $s_k = \mathbf{w}_k^\top \mathbf{x} + b_k$, non-linear activation function $f$: $\hat{y}_k = f(s_k)$. Linear mapping parameters: $\mathbf{w}_k = \{w_{1k}, ..., w_{mk}\} \in \mathbb{R}^m$, $b_k \in \mathbb{R}$.

26 Perceptron and Multi-Class Classification. Input $\mathbf{x} \in \mathbb{R}^m$ ($1 \times m$), output $\hat{\mathbf{y}}$: concatenation of $K$ formal neurons. Linear (affine) mapping $\Leftrightarrow$ matrix multiplication: $\mathbf{s} = \mathbf{x}\mathbf{W} + \mathbf{b}$, with $\mathbf{W}$ a matrix of size $m \times K$ (columns are the $\mathbf{w}_k$) and $\mathbf{b}$ a bias vector of size $1 \times K$. Element-wise non-linear activation: $\hat{\mathbf{y}} = f(\mathbf{s})$.

27 Perceptron and Multi-Class Classification. Soft-max activation: $\hat{y}_k = f(s_k) = \frac{e^{s_k}}{\sum_{k'=1}^{K} e^{s_{k'}}}$. Probabilistic interpretation for multi-class classification: each output neuron $\Leftrightarrow$ a class, $\hat{y}_k \sim P(k|\mathbf{x}, \mathbf{W})$. This is the Logistic Regression (LR) model!
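A minimal NumPy soft-max, written here as an illustrative sketch (the shift by the row maximum is a standard numerical-stability trick, not something stated on the slide):

```python
import numpy as np

def softmax(s):
    """Soft-max over the last axis: y_k = exp(s_k) / sum_k' exp(s_k')."""
    s = s - s.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities that sum to 1
```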

28 2d Toy Example for Multi-Class Classification. $\mathbf{x} = \{x_1, x_2\} \in [-5; 5] \times [-5; 5]$, $\hat{\mathbf{y}}$: 3 outputs (classes). Linear mapping for each class: $s_k = \mathbf{w}_k^\top \mathbf{x} + b_k$, with $\mathbf{w}_1 = [1; 1]$, $b_1 = 2$; $\mathbf{w}_2 = [0; 1]$, $b_2 = 1$; $\mathbf{w}_3 = [1; 0.5]$, $b_3 = 10$. Soft-max output: $P(k|\mathbf{x}, \mathbf{W})$.

29 2d Toy Example for Multi-Class Classification. $\mathbf{x} = \{x_1, x_2\} \in [-5; 5] \times [-5; 5]$, $\hat{\mathbf{y}}$: 3 outputs (classes), with $\mathbf{w}_1 = [1; 1]$, $b_1 = 2$; $\mathbf{w}_2 = [0; 1]$, $b_2 = 1$; $\mathbf{w}_3 = [1; 0.5]$, $b_3 = 10$. Soft-max output: $P(k|\mathbf{x}, \mathbf{W})$. Class prediction: $k^* = \arg\max_k P(k|\mathbf{x}, \mathbf{W})$.
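A hedged sketch of this toy example: the parameters are copied as written on the slide (signs may have been lost in extraction), and the test point x is an arbitrary choice for illustration.

```python
import numpy as np

# Columns of W are w_1 = [1;1], w_2 = [0;1], w_3 = [1;0.5]; biases b_k as on the slide
W = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.5]])
b = np.array([2.0, 1.0, 10.0])

x = np.array([3.0, -2.0])                    # arbitrary point in [-5, 5]^2
s = x @ W + b                                # one score per class
p = np.exp(s - s.max()); p /= p.sum()        # soft-max: P(k | x, W)
print(p, "predicted class:", p.argmax() + 1) # k* = argmax_k P(k | x, W)
```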

30 Beyond Linear Classification: the X-OR Problem. Logistic Regression (LR): NN with 1 input layer & 1 output layer. LR: limited to linear decision boundaries. X-OR: (NOT $x_1$ AND $x_2$) OR (NOT $x_2$ AND $x_1$). X-OR: non-linear decision function.

31 Beyond Linear Classification. LR: limited to linear boundaries. Solution: add a layer! Input $\mathbf{x} \in \mathbb{R}^m$, e.g. $m = 4$; output $\hat{\mathbf{y}} \in \mathbb{R}^K$ ($K$ = # classes), e.g. $K = 2$; hidden layer $\mathbf{h} \in \mathbb{R}^L$.

32 Multi-Layer Perceptron. Hidden layer $\mathbf{h}$: projection of $\mathbf{x}$ into a new space $\mathbb{R}^L$. Neural net with 1 hidden layer: Multi-Layer Perceptron (MLP). $\mathbf{h}$: intermediate representation of $\mathbf{x}$ for the classification $\hat{\mathbf{y}}$: $\mathbf{h} = f(\mathbf{x}\mathbf{W} + \mathbf{b})$. Mapping from $\mathbf{x}$ to $\hat{\mathbf{y}}$: non-linear boundary! The activation $f$ is crucial!
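A sketch of the one-hidden-layer forward pass in NumPy (illustrative only; the layer sizes, the sigmoid hidden activation, and the soft-max output are assumptions consistent with the slides, and the weight names Wh/Wy mirror the $\mathbf{W}^h$/$\mathbf{W}^y$ notation used later):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, Wh, bh, Wy, by):
    """One-hidden-layer MLP, row-vector convention: x is 1 x m."""
    h = sigmoid(x @ Wh + bh)                     # hidden representation, 1 x L
    s = h @ Wy + by                              # class scores, 1 x K
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return h, e / e.sum(axis=-1, keepdims=True)  # soft-max output y_hat

rng = np.random.default_rng(0)
m, L, K = 4, 8, 2                                # example sizes, as on the previous slide
x = rng.normal(size=(1, m))
Wh, bh = rng.normal(size=(m, L)), np.zeros(L)
Wy, by = rng.normal(size=(L, K)), np.zeros(K)
h, y_hat = mlp_forward(x, Wh, bh, Wy, by)
print(y_hat)
```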

33 Deep Neural Networks. Adding more hidden layers: Deep Neural Networks (DNN), the basis of Deep Learning. Each layer $\mathbf{h}^l$ projects the previous layer $\mathbf{h}^{l-1}$ into a new space, gradually learning intermediate representations useful for the task.

34 Conclusion. Deep Neural Networks: applicable to classification problems with non-linear decision boundaries. So far: visualize predictions from fixed model parameters. Reverse problem: Supervised Learning.

35 Outline: 1 Context, 2 Neural Networks, 3 Training Deep Neural Networks.

36 Training Multi-Layer Perceptrons (MLP). Input $\mathbf{x}$, output $\mathbf{y}$. A parametrized model $\mathbf{x} \to \mathbf{y}$: $f_{\mathbf{w}}(\mathbf{x}_i) = \hat{\mathbf{y}}_i$. Supervised context: training set $\mathcal{A} = \{(\mathbf{x}_i, \mathbf{y}^*_i)\}_{i \in \{1,2,...,N\}}$. A loss function $\ell(\hat{\mathbf{y}}_i, \mathbf{y}^*_i)$ for each annotated pair $(\mathbf{x}_i, \mathbf{y}^*_i)$. Assumptions: parameters $\mathbf{w} \in \mathbb{R}^d$ continuous, loss $\mathcal{L}$ differentiable. Gradient $\nabla_{\mathbf{w}} = \frac{\partial \mathcal{L}}{\partial \mathbf{w}}$: steepest direction to decrease the loss $\mathcal{L}(\mathbf{w})$.

37 MLP Training. Gradient descent algorithm: initialize parameters $\mathbf{w}$; update $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{w}}$; iterate until convergence, e.g. $\|\nabla_{\mathbf{w}}\|^2 \approx 0$.
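A generic gradient descent loop, given as a sketch (the toy quadratic loss and the stopping tolerance are illustrative choices, not values from the course):

```python
import numpy as np

def gradient_descent(grad_L, w0, eta=0.1, tol=1e-10, max_iter=10_000):
    """Plain gradient descent: w <- w - eta * dL/dw, until the gradient (nearly) vanishes."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_L(w)
        if np.sum(g ** 2) < tol:        # convergence test: ||grad||^2 ~ 0
            break
        w = w - eta * g
    return w

# Toy convex loss L(w) = ||w - 3||^2 with gradient 2(w - 3); the minimum is at w = 3.
print(gradient_descent(lambda w: 2.0 * (w - 3.0), w0=[0.0, 0.0]))
```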

38 Gradient Descent. Update rule: $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{w}}$, $\eta$ learning rate. Convergence ensured? Only provided a "well chosen" learning rate $\eta$.

39 Gradient Descent. Update rule: $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{w}}$. Global minimum? Convex a) vs non-convex b) loss $\mathcal{L}(\mathbf{w})$. [Figures: a) convex function, b) non-convex function]

40 Supervised Learning: Multi-Class Classification. Logistic Regression for multi-class classification: $\mathbf{s}_i = \mathbf{x}_i \mathbf{W} + \mathbf{b}$; Soft-Max (SM): $\hat{y}_k \sim P(k|\mathbf{x}_i, \mathbf{W}, \mathbf{b}) = \frac{e^{s_k}}{\sum_{k'=1}^{K} e^{s_{k'}}}$. Supervised loss function: $\mathcal{L}(\mathbf{W}, \mathbf{b}) = \frac{1}{N}\sum_{i=1}^{N} \ell(\hat{\mathbf{y}}_i, y^*_i)$, with 1. $y^* \in \{1; 2; ...; K\}$; 2. $\hat{y}_i = \arg\max_k P(k|\mathbf{x}_i, \mathbf{W}, \mathbf{b})$; 3. $\ell_{0/1}(\hat{y}_i, y^*_i) = 1$ if $\hat{y}_i \neq y^*_i$, $0$ otherwise (0/1 loss).

41 Logistic Regression Training Formulation. Input $\mathbf{x}_i$, ground truth output supervision $y^*_i$. One-hot encoding for $y^*_i$: $y^*_{c,i} = 1$ if $c$ is the ground truth class for $\mathbf{x}_i$, $0$ otherwise.

42 Logistic Regression Training Formulation. Loss function: multi-class Cross-Entropy (CE) $\ell_{CE}$. $\ell_{CE}$: Kullback-Leibler divergence between $\mathbf{y}^*_i$ and $\hat{\mathbf{y}}_i$: $\ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i) = KL(\mathbf{y}^*_i, \hat{\mathbf{y}}_i) = -\sum_{c=1}^{K} y^*_{c,i} \log(\hat{y}_{c,i}) = -\log(\hat{y}_{c^*,i})$. Beware: KL is asymmetric, $KL(\hat{\mathbf{y}}_i, \mathbf{y}^*_i) \neq KL(\mathbf{y}^*_i, \hat{\mathbf{y}}_i)$.
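A small sketch of the multi-class cross-entropy with one-hot targets (the epsilon guard against log(0) is an implementation detail added here, not part of the slide):

```python
import numpy as np

def cross_entropy(Y_hat, Y_true, eps=1e-12):
    """Mean over examples of -log(y_hat at the ground-truth class), with one-hot Y_true."""
    return -np.mean(np.sum(Y_true * np.log(Y_hat + eps), axis=1))

Y_hat  = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])        # soft-max outputs
Y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])              # one-hot encodings y*
print(cross_entropy(Y_hat, Y_true))         # = -(log 0.7 + log 0.8) / 2
```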

43 Logistic Regression Training. $\mathcal{L}_{CE}(\mathbf{W}, \mathbf{b}) = \frac{1}{N}\sum_{i=1}^{N} \ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i) = -\frac{1}{N}\sum_{i=1}^{N} \log(\hat{y}_{c^*,i})$. $\ell_{CE}$: smooth, convex upper bound of $\ell_{0/1}$, hence gradient descent optimization. Gradient descent: $\mathbf{W}^{(t+1)} = \mathbf{W}^{(t)} - \eta \frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{W}}$ (and $\mathbf{b}^{(t+1)} = \mathbf{b}^{(t)} - \eta \frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{b}}$). MAIN CHALLENGE: computing $\frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{W}} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_{CE}}{\partial \mathbf{W}}$? Key property: chain rule $\frac{\partial x}{\partial z} = \frac{\partial x}{\partial y} \frac{\partial y}{\partial z}$, i.e. backpropagation of the gradient error!

44 Chain Rule. $\frac{\partial \ell}{\partial x} = \frac{\partial \ell}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial x}$. Logistic regression: $\frac{\partial \ell_{CE}}{\partial \mathbf{W}} = \frac{\partial \ell_{CE}}{\partial \hat{\mathbf{y}}_i} \frac{\partial \hat{\mathbf{y}}_i}{\partial \mathbf{s}_i} \frac{\partial \mathbf{s}_i}{\partial \mathbf{W}}$.

45 Logistic Regression Training: Backpropagation. $\frac{\partial \ell_{CE}}{\partial \mathbf{W}} = \frac{\partial \ell_{CE}}{\partial \hat{\mathbf{y}}_i} \frac{\partial \hat{\mathbf{y}}_i}{\partial \mathbf{s}_i} \frac{\partial \mathbf{s}_i}{\partial \mathbf{W}}$, with $\ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i) = -\log(\hat{y}_{c^*,i})$. Update for 1 example: $\frac{\partial \ell_{CE}}{\partial \hat{y}_{c^*,i}} = -\frac{1}{\hat{y}_{c^*,i}}$; $\frac{\partial \ell_{CE}}{\partial \mathbf{s}_i} = \hat{\mathbf{y}}_i - \mathbf{y}^*_i = \boldsymbol{\delta}^y_i$ (using $\frac{\partial \hat{y}_{c,i}}{\partial s_{c',i}} = \hat{y}_{c,i}(\delta_{c,c'} - \hat{y}_{c',i})$); $\frac{\partial \ell_{CE}}{\partial \mathbf{W}} = \mathbf{x}_i^\top \boldsymbol{\delta}^y_i$.

46 Logistic Regression Training: Backpropagation. Whole dataset: data matrix $\mathbf{X}$ ($N \times m$), prediction matrix $\hat{\mathbf{Y}}$ and label matrix $\mathbf{Y}^*$ ($N \times K$). $\mathcal{L}_{CE}(\mathbf{W}, \mathbf{b}) = -\frac{1}{N}\sum_{i=1}^{N} \log(\hat{y}_{c^*,i})$, $\frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{W}} = \frac{\partial \mathcal{L}_{CE}}{\partial \hat{\mathbf{Y}}} \frac{\partial \hat{\mathbf{Y}}}{\partial \mathbf{S}} \frac{\partial \mathbf{S}}{\partial \mathbf{W}}$. $\frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{S}} = \hat{\mathbf{Y}} - \mathbf{Y}^* = \boldsymbol{\Delta}^y$, $\frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{W}} = \mathbf{X}^\top \boldsymbol{\Delta}^y$.
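These matrix formulas translate directly into a full-batch update; the sketch below is an illustrative implementation under the same row-vector convention (the variable names and the synthetic data are assumptions made here):

```python
import numpy as np

def softmax(S):
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def logreg_step(X, Y, W, b, eta):
    """One full-batch gradient step: dL/dS = (Y_hat - Y*)/N, dL/dW = X^T dL/dS."""
    Y_hat = softmax(X @ W + b)
    delta = (Y_hat - Y) / X.shape[0]          # Delta^y averaged over the N examples
    W = W - eta * X.T @ delta
    b = b - eta * delta.sum(axis=0)
    return W, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # synthetic data matrix X (N x m)
Y = np.eye(3)[rng.integers(0, 3, size=100)]   # one-hot label matrix Y* (N x K)
W, b = np.zeros((5, 3)), np.zeros(3)
for _ in range(200):
    W, b = logreg_step(X, Y, W, b, eta=0.5)
```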

47 Perceptron Training: Backpropagation. Perceptron vs Logistic Regression: adding a hidden layer (sigmoid). Goal: train the parameters $\mathbf{W}^y$ and $\mathbf{W}^h$ (+ biases) with backpropagation, i.e. computing $\frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{W}^y} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_{CE}}{\partial \mathbf{W}^y}$ and $\frac{\partial \mathcal{L}_{CE}}{\partial \mathbf{W}^h} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_{CE}}{\partial \mathbf{W}^h}$. Last (output) layer: Logistic Regression. First hidden layer: $\frac{\partial \ell_{CE}}{\partial \mathbf{W}^h} = \mathbf{x}_i^\top \frac{\partial \ell_{CE}}{\partial \mathbf{u}_i}$, which requires computing $\frac{\partial \ell_{CE}}{\partial \mathbf{u}_i} = \boldsymbol{\delta}^h_i$.

48 Perceptron Training: Backpropagation. Computing $\frac{\partial \ell_{CE}}{\partial \mathbf{u}_i} = \boldsymbol{\delta}^h_i$: use the chain rule, $\frac{\partial \ell_{CE}}{\partial \mathbf{u}_i} = \frac{\partial \ell_{CE}}{\partial \mathbf{v}_i} \frac{\partial \mathbf{v}_i}{\partial \mathbf{h}_i} \frac{\partial \mathbf{h}_i}{\partial \mathbf{u}_i}$. Leading to: $\boldsymbol{\delta}^h_i = \frac{\partial \ell_{CE}}{\partial \mathbf{u}_i} = \boldsymbol{\delta}^y_i {\mathbf{W}^y}^\top \odot \sigma'(\mathbf{u}_i) = \boldsymbol{\delta}^y_i {\mathbf{W}^y}^\top \odot (\mathbf{h}_i (1 - \mathbf{h}_i))$.

49 Deep Neural Network Training: Backpropagation. Multi-Layer Perceptron (MLP): adding more hidden layers. Backpropagation update as in the Perceptron case: assuming $\frac{\partial \mathcal{L}}{\partial \mathbf{U}^{l+1}} = \boldsymbol{\Delta}^{l+1}$ known, $\frac{\partial \mathcal{L}}{\partial \mathbf{W}^{l+1}} = {\mathbf{H}^{l}}^\top \boldsymbol{\Delta}^{l+1}$. Computing $\frac{\partial \mathcal{L}}{\partial \mathbf{U}^{l}} = \boldsymbol{\Delta}^{l} = \boldsymbol{\Delta}^{l+1} {\mathbf{W}^{l+1}}^\top \odot \mathbf{H}^l(1 - \mathbf{H}^l)$ (sigmoid). Then $\frac{\partial \mathcal{L}}{\partial \mathbf{W}^{l}} = {\mathbf{H}^{l-1}}^\top \boldsymbol{\Delta}^{l}$.
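A compact sketch of these recursions for a stack of sigmoid layers with a soft-max output (illustrative; the list-based parameter layout and the averaging by N inside the output delta are choices made here, consistent with the formulas above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, weights, biases):
    """Sigmoid hidden layers + soft-max output; keeps the activations H^l for backprop."""
    Hs = [X]
    for W, b in zip(weights[:-1], biases[:-1]):
        Hs.append(sigmoid(Hs[-1] @ W + b))
    S = Hs[-1] @ weights[-1] + biases[-1]
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return Hs, E / E.sum(axis=1, keepdims=True)

def backward(Hs, Y_hat, Y, weights):
    """Backprop: grad W^l = H^{l-1}^T Delta^l, with Delta^l = Delta^{l+1} W^{l+1}^T * H^l (1 - H^l)."""
    grads_W, grads_b = [], []
    delta = (Y_hat - Y) / Y.shape[0]               # output error (soft-max + cross-entropy)
    for l in range(len(weights) - 1, -1, -1):
        grads_W.insert(0, Hs[l].T @ delta)
        grads_b.insert(0, delta.sum(axis=0))
        if l > 0:                                  # propagate through the sigmoid layer below
            delta = (delta @ weights[l].T) * Hs[l] * (1.0 - Hs[l])
    return grads_W, grads_b
```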

50 Neural Network Training: Optimization Issues. Classification loss over the training set (vectorized $\mathbf{w}$, $\mathbf{b}$ ignored): $\mathcal{L}_{CE}(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N} \ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i) = -\frac{1}{N}\sum_{i=1}^{N} \log(\hat{y}_{c^*,i})$. Gradient descent optimization: $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla^{(t)}_{\mathbf{w}}$, with gradient $\nabla^{(t)}_{\mathbf{w}} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i)}{\partial \mathbf{w}}(\mathbf{w}^{(t)})$. The cost scales linearly with the dimension of $\mathbf{w}$ and the training set size $N$: too slow even for moderate dimensionality & dataset size!

51 Stochastic Gradient Descent. Solution: approximate $\nabla^{(t)}_{\mathbf{w}} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i)}{\partial \mathbf{w}}(\mathbf{w}^{(t)})$ with a subset of examples: Stochastic Gradient Descent (SGD). Use a single example (online): $\nabla^{(t)}_{\mathbf{w}} \approx \frac{\partial \ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i)}{\partial \mathbf{w}}(\mathbf{w}^{(t)})$. Mini-batch: use $B < N$ examples: $\nabla^{(t)}_{\mathbf{w}} \approx \frac{1}{B}\sum_{i=1}^{B} \frac{\partial \ell_{CE}(\hat{\mathbf{y}}_i, \mathbf{y}^*_i)}{\partial \mathbf{w}}(\mathbf{w}^{(t)})$. [Figure: full gradient vs SGD (online) vs SGD (mini-batch)]

52 Stochastic Gradient Descent. SGD: approximation of the true gradient $\nabla_{\mathbf{w}}$! A noisy gradient can lead to a bad direction and increase the loss. BUT: many more parameter updates per epoch: online $\times N$, mini-batch $\times N/B$. Faster convergence; at the core of Deep Learning for large-scale datasets. [Figure: full gradient vs SGD (online) vs SGD (mini-batch)]
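A mini-batch SGD loop, sketched under assumptions: grad_fn is a hypothetical helper returning the mini-batch gradients (e.g. the gradients of logreg_step above), and the batch size and epoch count are illustrative defaults.

```python
import numpy as np

def sgd(X, Y, W, b, grad_fn, eta=0.1, batch_size=32, epochs=10, seed=0):
    """Mini-batch SGD: approximate the full gradient with B examples per update."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    for _ in range(epochs):
        perm = rng.permutation(N)                     # reshuffle the training set each epoch
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            gW, gb = grad_fn(X[idx], Y[idx], W, b)    # gradient on the mini-batch only
            W, b = W - eta * gW, b - eta * gb
    return W, b
```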

53 Optimization: Learning Rate Decay. Gradient descent optimization: $\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \nabla^{(t)}_{\mathbf{w}}$. How to set $\eta$? Open question. Learning Rate Decay: decrease $\eta$ as training progresses. Inverse (time-based) decay: $\eta_t = \frac{\eta_0}{1 + r t}$, $r$ decay rate. Exponential decay: $\eta_t = \eta_0 e^{-\lambda t}$. Step decay: $\eta_t = \eta_0\, r^{\lfloor t/t_u \rfloor}$, etc. [Figures: exponential decay ($\eta_0 = 0.1$, $\lambda = 0.1$), step decay ($\eta_0 = 0.1$, $r = 0.5$, $t_u = 10$)]
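The three schedules written as small Python helpers (a sketch; the default hyper-parameter values simply echo the figure captions above, and the step decay uses an integer floor of t/t_u):

```python
import numpy as np

def inverse_decay(t, eta0=0.1, r=0.1):
    """Time-based decay: eta_t = eta_0 / (1 + r t)."""
    return eta0 / (1.0 + r * t)

def exponential_decay(t, eta0=0.1, lam=0.1):
    """Exponential decay: eta_t = eta_0 exp(-lambda t)."""
    return eta0 * np.exp(-lam * t)

def step_decay(t, eta0=0.1, r=0.5, tu=10):
    """Step decay: eta_t = eta_0 * r^floor(t / tu)."""
    return eta0 * r ** (t // tu)

for t in (0, 10, 20, 50):
    print(t, inverse_decay(t), exponential_decay(t), step_decay(t))
```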

54 Generalization and Overfitting. Learning: minimizing the classification loss $\mathcal{L}_{CE}$ over the training set. Training set: a sample representing the data vs labels distribution. Ultimate goal: train a prediction function with low prediction error on the true (unknown) data distribution. [Figure: two models, e.g. $\mathcal{L}_{train} = 4$ vs $\mathcal{L}_{train} = 9$, but $\mathcal{L}_{test} = 15$ vs $\mathcal{L}_{test} = 13$.] Optimization ≠ Machine Learning! Generalization / Overfitting!

55 Regularization. Regularization: improving generalization, i.e. test (≠ train) performance. Structural regularization: add a prior $R(\mathbf{w})$ to the training objective: $\mathcal{L}(\mathbf{w}) = \mathcal{L}_{CE}(\mathbf{w}) + \alpha R(\mathbf{w})$. $L_2$ regularization (weight decay): $R(\mathbf{w}) = \|\mathbf{w}\|^2$; commonly used in neural networks; theoretical justifications, generalization bounds (SVM). Other possible $R(\mathbf{w})$: $L_1$ regularization, dropout, etc.
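With $R(\mathbf{w}) = \|\mathbf{w}\|^2$, the regularized gradient step only adds a $2\alpha\mathbf{w}$ term; a one-line sketch (the value of alpha is an illustrative placeholder):

```python
def regularized_update(W, grad_W, eta=0.1, alpha=1e-4):
    """Step on L(w) = L_CE(w) + alpha ||w||^2: the L2 term contributes 2 * alpha * W to the gradient."""
    return W - eta * (grad_W + 2.0 * alpha * W)
```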

56 Regularization and hyper-parameters. Neural networks have hyper-parameters to tune. Training parameters: learning rate, weight decay, learning rate decay, # epochs, etc. Architectural parameters: number of layers, number of neurons, type of non-linearity, etc. Hyper-parameter tuning to improve generalization: estimate performance on a validation set.

57 Neural networks: Conclusion. Training issues at several levels: optimization, generalization, cross-validation. Limits of fully connected layers, and Convolutional Neural Nets: next course!

58 References. [MP43] Warren S. McCulloch and Walter Pitts, A logical calculus of the ideas immanent in nervous activity, The Bulletin of Mathematical Biophysics 5 (1943), no. 4, 115-133.
