Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
|
|
- Kory Carson
- 6 years ago
- Views:
Transcription
1 Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Département Informatique Conservatoire Nationnal des Arts et Métiers (Cnam)
2 8 Weeks on Deep Learning and Strutured Prediction Week 1-5: Deep Learning Week 6-8: Structured Prediction and Applications RCP209 / Deep Learning 2/ 54
3 Outline 1 Context 2 Neural Networks 3 Training Deep Neural Networks nicolas.thome@cnam.fr RCP209 / Deep Learning 3/ 54
4 Context Neural Nets Backprop Context Big Data Superabundance of data: images, videos, audio, text, use traces, etc BBC: 2.4M videos Facebook: 350B images 100M monitoring cameras 1B each day Obvious need to access, search, or classify these data: Recognition Huge number of applications: mobile visual search, robotics, autonomous driving, augmented reality, medical imaging etc Leading track in major ML/CV conferences during the last decade RCP209 / Deep Learning 4/ 54
5 Recognition and classification Classification : assign a given data to a given set of pre-defined classes Recognition much more general than classification, e.g. Object Localization in images Ranking for document indexing Sequence prediction for text, speech, audio, etc Many tasks can be cast as classification problems importance of classification nicolas.thome@cnam.fr RCP209 / Deep Learning 5/ 54
6 Focus on Visual Recognition: Perceiving Visual World Visual Recognition: archetype of low-level signal understanding Supposed to be a master class problem in the early 80 s Certainly the most impacted topic by deep learning Scene categorization Object localization Context & Attribute recognition Rough 3D layout, depth ordering Rich description of scene, e.g. sentences nicolas.thome@cnam.fr RCP209 / Deep Learning 6/ 54
7 Context Neural Nets Backprop Recognition of low-level signals Challenge: filling the semantic gap What we perceive vs What a computer sees Illumination variations View-point variations Deformable objects intra-class variance etc How to design "good" intermediate representation? nicolas.thome@cnam.fr RCP209 / Deep Learning 7/ 54
8 Deep Learning (DL) & Recognition of low-level signals DL: breakthrough for the recognition of low-level signal data Before DL: handcrafted intermediate representations for each task Needs expertise (PhD level) in each field Weak level of semantics in the representation RCP209 / Deep Learning 8/ 54
9 Deep Learning (DL) & Recognition of low-level signals DL: breakthrough for the recognition of low-level signal data Since DL: automatically learning intermediate representations Outstanding experimental performances >> handcrafted features Able to learn high level intermediate representations Common learning methodology field independent, no expertise RCP209 / Deep Learning 9/ 54
10 Deep Learning (DL) & Representation Learning DL: breakthrough for representation learning Automatically learning intermediate levels of representation Ex: Natural language Processing (NLP) RCP209 / Deep Learning 10/ 54
11 Outline 1 Context 2 Neural Networks 3 Training Deep Neural Networks nicolas.thome@cnam.fr RCP209 / Deep Learning 11/ 54
12 The Formal Neuron: 1943 [MP43] Basis of Neural Networks Input: vector x R m, i.e. x = {x i } i {1,2,...,m} Neuron output ŷ R: scalar nicolas.thome@cnam.fr RCP209 / Deep Learning 12/ 54
13 The Formal Neuron: 1943 [MP43] Mapping from x to ŷ: 1 Linear (affine) mapping: s = w x + b 2 Non-linear activation function: f : ŷ = f (s) nicolas.thome@cnam.fr RCP209 / Deep Learning 13/ 54
14 The Formal Neuron: Linear Mapping Linear (affine) mapping: s = w x + b = m w i x i + b i=1 w: normal vector to an hyperplane in R m linear boundary b bias, shift the hyperplane position 2D hyperplane: line 3D hyperplane: plane nicolas.thome@cnam.fr RCP209 / Deep Learning 14/ 54
15 The Formal Neuron: Activation Function ŷ = f (w x + b), f activation function Popular f choices: step, sigmoid, tanh 1 if z 0 Step (Heaviside) function: H(z) = 0 otherwise nicolas.thome@cnam.fr RCP209 / Deep Learning 15/ 54
16 Step function: Connection to Biological Neurons Formal neuron, step activation H: ŷ = H(w x + b) ŷ = 1 (activated) w x b ŷ = 0 (unactivated) w x < b Biological Neurons: output activated input weighted by synaptic weight threshold nicolas.thome@cnam.fr RCP209 / Deep Learning 16/ 54
17 Sigmoid Activation Function Neuron output ŷ = f (w x + b), f activation function Sigmoid: σ(z) = (1 + e az ) 1 a : more similar to step function (step: a ) Sigmoid: linear and saturating regimes nicolas.thome@cnam.fr RCP209 / Deep Learning 17/ 54
18 The Formal neuron: Application to Binary Classification Binary Classification: label input x as belonging to class 1 or 0 Neuron output with sigmoid: ŷ = 1 1+e a(w x+b) Sigmoid: probabilistic interpretation ŷ P(1/x) Input x classified as 1 if P(1/x) > 0.5 w x + b > 0 Input x classified as 0 if P(1/x) < 0.5 w x + b < 0 sign(w x + b): linear boundary decision in input space! nicolas.thome@cnam.fr RCP209 / Deep Learning 18/ 54
19 The Formal neuron: Toy Example for Binary Classification 2d example: m = 2, x = {x 1, x 2} [ 5; 5] [ 5; 5] Linear mapping: w = [1; 1] and b = 2 Result of linear mapping : s = w x + b nicolas.thome@cnam.fr RCP209 / Deep Learning 19/ 54
20 The Formal neuron: Toy Example for Binary Classification 2d example: m = 2, x = {x 1, x 2} [ 5; 5] [ 5; 5] Linear mapping: w = [1; 1] and b = 2 Result of linear mapping : s = w x + b Sigmoid activation function: ŷ = (1 + e a(w x+b) ) 1, a = 10 nicolas.thome@cnam.fr RCP209 / Deep Learning 19/ 54
21 The Formal neuron: Toy Example for Binary Classification 2d example: m = 2, x = {x 1, x 2} [ 5; 5] [ 5; 5] Linear mapping: w = [1; 1] and b = 2 Result of linear mapping : s = w x + b Sigmoid activation function: ŷ = (1 + e a(w x+b) ) 1, a = 1 nicolas.thome@cnam.fr RCP209 / Deep Learning 19/ 54
22 The Formal neuron: Toy Example for Binary Classification 2d example: m = 2, x = {x 1, x 2} [ 5; 5] [ 5; 5] Linear mapping: w = [1; 1] and b = 2 Result of linear mapping : s = w x + b Sigmoid activation function: ŷ = (1 + e a(w x+b) ) 1, a = 0.1 nicolas.thome@cnam.fr RCP209 / Deep Learning 19/ 54
23 From Formal Neuron to Neural Networks Formal Neuron: 1 A single scalar output 2 Linear decision boundary for binary classification Single scalar output: limited for several tasks Ex: multi-class classification, e.g. MNIST or CIFAR nicolas.thome@cnam.fr RCP209 / Deep Learning 20/ 54
24 Perceptron and Multi-Class Classification Formal Neuron: limited to binary classification Multi-Class Classification: use several output neurons instead of a single one! Perceptron Input x in R m Output neuron ŷ 1 is a formal neuron: Linear (affine) mapping: s 1 = w 1 x + b 1 Non-linear activation function: f : ŷ 1 = f (s 1 ) Linear mapping parameters: w 1 = {w 11,..., w m1 } R m b 1 R nicolas.thome@cnam.fr RCP209 / Deep Learning 21/ 54
25 Perceptron and Multi-Class Classification Input x in R m Output neuron ŷ k is a formal neuron: Linear (affine) mapping: s k = w k x + b k Non-linear activation function: f : yˆ k = f (s k ) Linear mapping parameters: w k = {w 1k,..., w mk } R m b k R nicolas.thome@cnam.fr RCP209 / Deep Learning 22/ 54
26 Perceptron and Multi-Class Classification Input x in R m (1 m), output ŷ: concatenation of K formal neurons Linear (affine) mapping matrix multiplication: s = xw + b W matrix of size m K - columns are w k b: bias vector - size 1 K Element-wise non-linear activation: ŷ = f (s) nicolas.thome@cnam.fr RCP209 / Deep Learning 23/ 54
27 Perceptron and Multi-Class Classification Soft-max Activation: ŷ k = f (s k ) = e s k K e s k k =1 Probabilistic interpretation for multi-class classification: Each output neuron class yˆ k P(k/x, w) Logistic Regression (LR) Model! nicolas.thome@cnam.fr RCP209 / Deep Learning 24/ 54
28 2d Toy Example for Multi-Class Classification x = {x 1, x 2} [ 5; 5] [ 5; 5], ŷ: 3 outputs (classes) Linear mapping for each class: s k = w k x + b k w 1 = [1; 1], b 1 = 2 w 2 = [0; 1], b 2 = 1 w 3 = [1; 0.5], b 3 = 10 Soft-max output: P(k/x, W) nicolas.thome@cnam.fr RCP209 / Deep Learning 25/ 54
29 2d Toy Example for Multi-Class Classification x = {x 1, x 2} [ 5; 5] [ 5; 5], ŷ: 3 outputs (classes) w 1 = [1; 1], b 1 = 2 w 2 = [0; 1], b 2 = 1 w 3 = [1; 0.5], b 3 = 10 Soft-max output: P(k/x, W) Class Prediction: k = arg max P(k/x, W) k nicolas.thome@cnam.fr RCP209 / Deep Learning 25/ 54
30 Beyond Linear Classification X-OR Problem Logistic Regression (LR): NN with 1 input layer & 1 output layer LR: limited to linear decision boundaries X-OR: NOT 1 and 2 OR NOT 2 AND 1 X-OR: Non linear decision function nicolas.thome@cnam.fr RCP209 / Deep Learning 26/ 54
31 Beyond Linear Classification LR: limited to linear boundaries Solution: add a layer! Input x in R m, e.g. m = 4 Output ŷ in R K (K # classes), e.g. K = 2 Hidden layer h in R L nicolas.thome@cnam.fr RCP209 / Deep Learning 27/ 54
32 Multi-Layer Perceptron Hidden layer h: x projection to a new space R L Neural Net with 1 hidden layer: Multi-Layer Perceptron (MLP) h: intermediate representations of x for classification ŷ: h = f (xw + b) Mapping from x to ŷ: non-linear boundary! activation f crucial! nicolas.thome@cnam.fr RCP209 / Deep Learning 28/ 54
33 Deep Neural Networks Adding more hidden layers: Deep Neural Networks (DNN) Basis of Deep Learning Each layer h l projects layer h l 1 into a new space Gradually learning intermediate representations useful for the task nicolas.thome@cnam.fr RCP209 / Deep Learning 29/ 54
34 Conclusion Deep Neural Networks: applicable to classification problems with non-linear decision boundaries Visualize prediction from fixed model parameters Reverse problem: Supervised Learning RCP209 / Deep Learning 30/ 54
35 Outline 1 Context 2 Neural Networks 3 Training Deep Neural Networks nicolas.thome@cnam.fr RCP209 / Deep Learning 31/ 54
36 Training Multi-Layer Perceptron (MLP) Input x, output y A parametrized model x y: f w(x i ) = ŷ i Supervised context: training set A = {(x i, y i )} i {1,2,...,N} A loss function l(ŷ i, y i ) for each annotated pair (x i, y i ) Assumptions: parameters w R d continuous, L differentiable Gradient w = L : steepest direction to decrease loss L w nicolas.thome@cnam.fr RCP209 / Deep Learning 32/ 54
37 MLP Training Gradient descent algorithm: Initialyze parameters w Update: w (t+1) = w (t) η L w Until convergence, e.g. w 2 0 nicolas.thome@cnam.fr RCP209 / Deep Learning 33/ 54
38 Gradient Descent Update rule: w (t+1) = w (t) η L w η learning rate Convergence ensured? provided a "well chosen" learning rate η nicolas.thome@cnam.fr RCP209 / Deep Learning 34/ 54
39 Gradient Descent Update rule: w (t+1) = w (t) η L w Global minimum? convex a) vs non convex b) loss L(w) a) Convex function a) Non convex function nicolas.thome@cnam.fr RCP209 / Deep Learning 35/ 54
40 Supervised Learning: Multi-Class Classification Logistic Regression for multi-class classification s i = x i W + b Soft-Max (SM): ŷ k P(k/x i, W, b) = Supervised loss function: L(W, b) = 1 N e s k K e s k k =1 N i=1 l(ŷ i, y i ) 1 y {1; 2;...; K} 2 ŷ i = arg max P(k/x i, W, b) k 3 l 0/1 (ŷ i, yi 1 if ŷ i yi ) = : 0/1 loss 0 otherwise nicolas.thome@cnam.fr RCP209 / Deep Learning 36/ 54
41 Logistic Regression Training Formulation Input x i, ground truth output supervision yi One hot-encoding for yi : y 1 if c is the groud truth class for x i c,i = 0 otherwise nicolas.thome@cnam.fr RCP209 / Deep Learning 37/ 54
42 Logistic Regression Training Formulation Loss function: multi-class Cross-Entropy (CE) l CE l CE : Kullback-Leiber divergence between y i and ^y i K l CE (^y i, y i ) = KL(y i,^y i ) = y c,i log(ŷ c,i ) = log(ŷ c,i ) c=1 B KL asymmetric: KL(^y i, y i ) KL(y i,^y i ) B nicolas.thome@cnam.fr RCP209 / Deep Learning 38/ 54
43 Logistic Regression Training L CE (W, b) = 1 N N i=1 l CE (ŷ i, y i ) = 1 N N i=1 l CE smooth convex upper bound of l 0/1 gradient descent optimization Gradient descent: W (t+1) = W (t) η L CE W MAIN CHALLENGE: computing L CE W log(ŷ c,i ) = 1 N x Key Property: chain rule z = x y y z Backpropagation of gradient error! (b (t+1) = b (t) η L CE b ) N l CE? W i=1 nicolas.thome@cnam.fr RCP209 / Deep Learning 39/ 54
44 Chain Rule l x = l ^y ^y x Logistic regression: l CE W = l CE ^y i ^y i s i s i W nicolas.thome@cnam.fr RCP209 / Deep Learning 40/ 54
45 Logistic Regression Training: Backpropagation l CE W = l CE ^y i ^y i s i, l s i W CE (ŷ i, yi ) = log(ŷ c,i ) Update for 1 example: l CE = 1 ^y i ŷ c,i = 1 ^y i l CE = ^y s i y i i = δ y i δ c,c l CE W = x i T δ y i nicolas.thome@cnam.fr RCP209 / Deep Learning 41/ 54
46 Logistic Regression Training: Backpropagation Whole dataset: data matrix X (N m), label matrix ^Y, Y (N K) L CE (W, b) = 1 N N i=1 log(ŷ c,i ), L CE W = L CE ^Y ^Y S S W L CE s L CE W = ^Y Y = y = XT y nicolas.thome@cnam.fr RCP209 / Deep Learning 42/ 54
47 Perceptron Training: Backpropagation Perceptron vs Logistic Regression: adding hidden layer (sigmoid) Goal: Train parameters W y and W h (+bias) with Backpropagation computing L CE W y = 1 N N l CE and L W i=1 y CE W h = 1 N N l CE i=1 W h Last hidden layer Logistic Regression First hidden layer: l CE W h = x i T l CE u i computing l CE u i = δ h i nicolas.thome@cnam.fr RCP209 / Deep Learning 43/ 54
48 Perceptron Training: Backpropagation Computing l CE u i = δ h i use chain rule:... Leading to: l CE u i = l CE v i v i h i h i u i l CE = δ h u i i = δ yt i W y σ (h i ) = δ yt i W y (h i (1 h i )) nicolas.thome@cnam.fr RCP209 / Deep Learning 44/ 54
49 Deep Neural Network Training: Backpropagation Multi-Layer Perceptron (MLP): adding more hidden layers Backpropagation update Perceptron: assuming L W l +1 = H l T l+1 L = l+1 known U l+1 Computing L U l = l (= l+1t W l+1 H l (1 H l ) sigmoid) L W l = H l 1 T h l nicolas.thome@cnam.fr RCP209 / Deep Learning 45/ 54
50 Neural Network Training: Optimization Issues Classification loss over training set (vectorized w, b ignored): L CE (w) = 1 N N i=1 l CE (ŷ i, y i ) = 1 N Gradient descent optimization: w (t+1) = w (t) η L CE w Gradient (t) w = 1 N wrt: w dimension Training set size N i=1 log(ŷ c,i ) (w(t) ) = w (t) η (t) w N l CE (ŷ i,y i ) (w (t) ) linearly scales w i=1 Too slow even for moderate dimensionality & dataset size! nicolas.thome@cnam.fr RCP209 / Deep Learning 46/ 54
51 Stochastic Gradient Descent Solution: approximate (t) w = 1 N Stochastic Gradient Descent (SGD) Use a single example (online): N l CE (ŷ i,y i ) (w (t) ) with subset of examples w i=1 (t) w l CE (ŷ i, yi ) (w (t) ) w Mini-batch: use B < N examples: (t) w 1 B B l CE (ŷ i, yi ) (w (t) ) i=1 w Full gradient SGD (online) SGD (mini-batch) nicolas.thome@cnam.fr RCP209 / Deep Learning 47/ 54
52 Stochastic Gradient Descent SGD: approximation of the true Gradient w! Noisy gradient can lead to bad direction, increase loss BUT: much more parameter updates: online N, mini-batch N B Faster convergence, at the core of Deep Learning for large scale datasets Full gradient SGD (online) SGD (mini-batch) nicolas.thome@cnam.fr RCP209 / Deep Learning 48/ 54
53 Optimization: Learning Rate Decay Gradient descent optimization: w (t+1) = w (t) η (t) w η setup? open question Learning Rate Decay: decrease η during training progress Inverse (time-based) decay: η t = η 0, r decay rate 1+r t Exponential decay: η t = η 0 e λt Step Decay η t = η 0 r tu t... Exponential Decay (η 0 = 0.1, λ = 0.1s) Step Decay (η 0 = 0.1, r = 0.5, tu = 10) nicolas.thome@cnam.fr RCP209 / Deep Learning 49/ 54
54 Generalization and Overfitting Learning: minimizing classification loss L CE over training set Training set: sample representing data vs labels distributions Ultimate goal: train a prediction function with low prediction error on the true (unknown) data distribution L train = 4, L train = 9 L test = 15, L test = 13 Optimization Machine Learning! Generalization / Overfitting! nicolas.thome@cnam.fr RCP209 / Deep Learning 50/ 54
55 Regularization Regularization: improving generalization, i.e. test ( train) performances Structural regularization: add Prior R(w) in training objective: L(w) = L CE (w) + αr(w) L 2 regularization: weight decay, R(w) = w 2 Commonly used in neural networks Theoretical justifications, generalization bounds (SVM) Other possible R(w): L 1 regularization, dropout, etc nicolas.thome@cnam.fr RCP209 / Deep Learning 51/ 54
56 Regularization and hyper-parameters Neural networks: hyper-parameters to tune: Training parameters: learning rate, weight decay, learning rate decay, # epochs, etc Architectural parameters: number of layers, number neurones, non-linearity type, etc Hyper-parameters tuning: improve generalization: estimate performances on a validation set nicolas.thome@cnam.fr RCP209 / Deep Learning 52/ 54
57 Neural networks: Conclusion Training issues at several levels: optimization, generalization, cross-validation Limits of fully connected layers and Convolutional Neural Nets? next course! RCP209 / Deep Learning 53/ 54
58 References I Warren S McCulloch and Walter Pitts, A logical calculus of the ideas immanent in nervous activity, The bulletin of mathematical biophysics 5 (1943), no. 4, nicolas.thome@cnam.fr RCP209 / Deep Learning 54/ 54
Lecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationIntro to Neural Networks and Deep Learning
Intro to Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi UVA CS 6316 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationNeural Networks. Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016
Neural Networks Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016 Outline Part 1 Introduction Feedforward Neural Networks Stochastic Gradient Descent Computational Graph
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationComments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More informationMachine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler
+ Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationIntroduction to Machine Learning
Introduction to Machine Learning Neural Networks Varun Chandola x x 5 Input Outline Contents February 2, 207 Extending Perceptrons 2 Multi Layered Perceptrons 2 2. Generalizing to Multiple Labels.................
More informationOnline Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?
Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More informationClassification with Perceptrons. Reading:
Classification with Perceptrons Reading: Chapters 1-3 of Michael Nielsen's online book on neural networks covers the basics of perceptrons and multilayer neural networks We will cover material in Chapters
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationStochastic gradient descent; Classification
Stochastic gradient descent; Classification Steve Renals Machine Learning Practical MLP Lecture 2 28 September 2016 MLP Lecture 2 Stochastic gradient descent; Classification 1 Single Layer Networks MLP
More informationDeep Learning Lab Course 2017 (Deep Learning Practical)
Deep Learning Lab Course 207 (Deep Learning Practical) Labs: (Computer Vision) Thomas Brox, (Robotics) Wolfram Burgard, (Machine Learning) Frank Hutter, (Neurorobotics) Joschka Boedecker University of
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationAdvanced Machine Learning
Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Neural Networks: A brief touch Yuejie Chi Department of Electrical and Computer Engineering Spring 2018 1/41 Outline
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron
More informationIntroduction Biologically Motivated Crude Model Backpropagation
Introduction Biologically Motivated Crude Model Backpropagation 1 McCulloch-Pitts Neurons In 1943 Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published A logical calculus of the
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationSupervised Learning. George Konidaris
Supervised Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom Mitchell,
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationCSCI567 Machine Learning (Fall 2018)
CSCI567 Machine Learning (Fall 2018) Prof. Haipeng Luo U of Southern California Sep 12, 2018 September 12, 2018 1 / 49 Administration GitHub repos are setup (ask TA Chi Zhang for any issues) HW 1 is due
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationArtificial Neural Networks The Introduction
Artificial Neural Networks The Introduction 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001 00100000
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationLinear classifiers: Overfitting and regularization
Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x
More informationFeedforward Neural Nets and Backpropagation
Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 4: Word Window Classification and Neural Networks Richard Socher Organization Main midterm: Feb 13 Alternative midterm: Friday Feb
More informationMultilayer Perceptrons (MLPs)
CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationCOMP 551 Applied Machine Learning Lecture 14: Neural Networks
COMP 551 Applied Machine Learning Lecture 14: Neural Networks Instructor: Ryan Lowe (ryan.lowe@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted,
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationCOMP-4360 Machine Learning Neural Networks
COMP-4360 Machine Learning Neural Networks Jacky Baltes Autonomous Agents Lab University of Manitoba Winnipeg, Canada R3T 2N2 Email: jacky@cs.umanitoba.ca WWW: http://www.cs.umanitoba.ca/~jacky http://aalab.cs.umanitoba.ca
More informationLecture 2 - Learning Binary & Multi-class Classifiers from Labelled Training Data
Lecture 2 - Learning Binary & Multi-class Classifiers from Labelled Training Data DD2424 March 23, 2017 Binary classification problem given labelled training data Have labelled training examples? Given
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationArtificial Neural Network
Artificial Neural Network Eung Je Woo Department of Biomedical Engineering Impedance Imaging Research Center (IIRC) Kyung Hee University Korea ejwoo@khu.ac.kr Neuron and Neuron Model McCulloch and Pitts
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More information1 What a Neural Network Computes
Neural Networks 1 What a Neural Network Computes To begin with, we will discuss fully connected feed-forward neural networks, also known as multilayer perceptrons. A feedforward neural network consists
More informationArtificial Neural Networks
Artificial Neural Networks Threshold units Gradient descent Multilayer networks Backpropagation Hidden layer representations Example: Face Recognition Advanced topics 1 Connectionist Models Consider humans:
More informationNeural Networks. Advanced data-mining. Yongdai Kim. Department of Statistics, Seoul National University, South Korea
Neural Networks Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea What is Neural Networks? One of supervised learning method using one or more hidden layer.
More informationDeep Learning for NLP
Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning
More informationCourse Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch
Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure
More informationLecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning
Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationDeepLearning on FPGAs
DeepLearning on FPGAs Introduction to Deep Learning Sebastian Buschäger Technische Universität Dortmund - Fakultät Informatik - Lehrstuhl 8 October 21, 2017 1 Recap Computer Science Approach Technical
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More information11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)
11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes
More informationCENG 793. On Machine Learning and Optimization. Sinan Kalkan
CENG 793 On Machine Learning and Optimization Sinan Kalkan 2 Now Introduction to ML Problem definition Classes of approaches K-NN Support Vector Machines Softmax classification / logistic regression Parzen
More informationMaking Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation
Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation Dr. Yanjun Qi Department of Computer Science University of Virginia Tutorial @ ACM BCB-2018 8/29/18 Yanjun Qi / UVA
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
More informationMachine Learning
Machine Learning 10-315 Maria Florina Balcan Machine Learning Department Carnegie Mellon University 03/29/2019 Today: Artificial neural networks Backpropagation Reading: Mitchell: Chapter 4 Bishop: Chapter
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationMachine Learning. Neural Networks
Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE
More informationArtificial Neural Networks
Artificial Neural Networks Oliver Schulte - CMPT 310 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will focus on
More informationAdvanced statistical methods for data analysis Lecture 2
Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline
More information