Deep Learning Lab Course 2017 (Deep Learning Practical)

1 Deep Learning Lab Course 2017 (Deep Learning Practical)
Labs: (Computer Vision) Thomas Brox, (Robotics) Wolfram Burgard, (Machine Learning) Frank Hutter, (Neurorobotics) Joschka Boedecker
University of Freiburg
Andreas Eitel, January 2, 2018

2 Technical Issues
Location: Monday, 14:00-16:00, building 082.
Remark: We will be there for questions every week from 14:00-16:00. We expect you to work on your own. Your attendance is required during lectures/presentations.
We expect you to have basic knowledge in ML (e.g. heard the Machine Learning lecture).
Contact information (tutors): Aaron Klein, Artemij Amiranashvili, Mohammadreza Zolfaghari, Maria Hügle, Jingwhei Zhang, Andreas Eitel
Homepage: deep learning course page

3 Schedule and outline
Phase 1
Today: Introduction to Deep Learning (lecture). Assignment 1) Implementation of a feed-forward neural network from scratch (each person individually, 1-2 page report).
23.10: meeting, solving open questions
30.10: meeting, solving open questions
06.11: Intro to CNNs and Tensorflow (mandatory lecture), hand in Assignment 1. Assignment 2) Train a CNN in Tensorflow (each person individually, 1-2 page report)
13.11: meeting, solving open questions

4 Schedule and outline
Phase 1
Today: Introduction to Deep Learning (lecture). Assignment 1) Implementation of a feed-forward neural network from scratch (each person individually, 1-2 page report).
23.10: meeting, solving open questions
30.10: meeting, solving open questions
06.11: Intro to CNNs and Tensorflow (mandatory lecture), hand in Assignment 1. Assignment 2) Train a CNN in Tensorflow (each person individually, 1-2 page report)
13.11: meeting, solving open questions
Phase 2 (split into three tracks)
20.11: First meeting in tracks, hand in Assignment 2
Assignment 3) Group work, track-specific task, 1-2 page report
Assignment 4) Group work, advanced topic, 1-2 page report

5 Schedule and outline
Phase 1
Today: Introduction to Deep Learning (lecture). Assignment 1) Implementation of a feed-forward neural network from scratch (each person individually, 1-2 page report).
23.10: meeting, solving open questions
30.10: meeting, solving open questions
06.11: Intro to CNNs and Tensorflow (mandatory lecture), hand in Assignment 1. Assignment 2) Train a CNN in Tensorflow (each person individually, 1-2 page report)
13.11: meeting, solving open questions
Phase 2 (split into three tracks)
20.11: First meeting in tracks, hand in Assignment 2
Assignment 3) Group work, track-specific task, 1-2 page report
Assignment 4) Group work, advanced topic, 1-2 page report
Phase 3
15.01: meeting, final project topics and solving open questions
Assignment 5) Solve a challenging problem of your choice using the tools from Phase 2 (group work + 5 min presentation)

6 Tracks
Track 1 Neurorobotics/Robotics
Robot navigation
Deep reinforcement learning

7 Tracks
Track 1 Neurorobotics/Robotics
Robot navigation
Deep reinforcement learning
Track 2 Machine Learning
Architecture search and hyperparameter optimization
Visual recognition

8 Tracks
Track 1 Neurorobotics/Robotics
Robot navigation
Deep reinforcement learning
Track 2 Machine Learning
Architecture search and hyperparameter optimization
Visual recognition
Track 3 Computer Vision
Image segmentation
Autoencoders
Generative adversarial networks

9 Groups and Evaluation
In total we have three practical phases.
Reports: Short 1-2 page report explaining your results, typically accompanied by 1-2 figures (e.g. a learning curve). Hand in the code you used to generate these results as well, send us your github/bitbucket repo.
Presentations: Phase 3: 5 min group work presentation in the last week.
The reports and presentation after each exercise will be evaluated w.r.t. scientific quality and quality of presentation.
Requirements for passing: attending the class in the lecture and presentation weeks; active participation in the small groups, i.e., programming, testing, writing the report.

10 Today...
Groups: You will work on your own for the first exercise.
Assignment: You will have three weeks to build an MLP and apply it to the MNIST dataset (more on this at the end).
Lecture: Short recap on how MLPs (feed-forward neural networks) work and how to train them.

11 (Deep) Machine Learning Overview
(figure: Data, Model M, inference)
1. Learn model M from the data
2. Let the model M infer unknown quantities from data

12 (Deep) Machine Learning Overview
(figure: Data, Model M, inference)

13 Machine Learning Overview
What is the difference between deep learning and a standard machine learning pipeline?

14 Standard Machine Learning Pipeline
(1) Engineer good features (not learned)
(2) Learn model
(3) Inference, e.g. classes of unseen data
(figure: data points described by hand-designed features such as color and height, e.g. 4 cm, 3.5 cm, 10 cm)

15 Unsupervised Feature Learning Pipeline
(1a) Maybe engineer good features (not learned)
(1b) Learn a (deep) representation unsupervisedly
(2) Learn model
(3) Inference, e.g. classes of unseen data
(figure: pipeline from raw pixels over a learned representation to features such as color and height)

16 Supervised Deep Learning Pipeline
(1) Jointly learn everything with a deep architecture
(2) Inference, e.g. classes of unseen data
(figure: raw pixel data mapped directly to classes by a deep network)

17 Training supervised feed-forward neural networks
Let's formalize! We are given:
A dataset D = {(x_1, y_1), ..., (x_N, y_N)}
A neural network with parameters θ which implements a function f_θ(x)
We want to learn:
The parameters θ such that ∀i ∈ [1, N]: f_θ(x_i) = y_i

18 Training supervised feed-forward neural networks
A neural network with parameters θ which implements a function f_θ(x)
θ is given by the network weights w and bias terms b
(figure: network with inputs x_1 ... x_n, hidden layers h^(1)(x), h^(2)(x) and softmax output f(x))

19 Neural network forward-pass
Computing f_θ(x) for a neural network is a forward-pass.
Unit i activation: a_i = Σ_{j=1}^{N} w_{i,j} x_j + b_i
Unit i output: h^(1)_i(x) = t(a_i), where t(·) is an activation or transfer function
(figure: network diagram highlighting unit i with incoming weights w_{i,j} and bias b_i)

20 Neural network forward-pass
Computing f_θ(x) for a neural network is a forward-pass.
Alternatively (and much faster) use vector notation:
Layer activation: a^(1) = W^(1) x + b^(1)
Layer output: h^(1)(x) = t(a^(1)), where t(·) is applied element-wise
(figure: network diagram with the first-layer weight matrix W^(1))
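
A minimal numpy sketch of this vectorized forward pass for a single layer (the layer sizes, the tanh transfer function and names like W1, b1 are illustrative choices, not part of the provided stub):

```python
import numpy as np

def layer_forward(x, W, b, t=np.tanh):
    """One layer: activation a = W x + b, output h = t(a)."""
    a = W @ x + b          # layer activation
    return t(a)            # element-wise transfer function

# Example: 3 inputs, 5 hidden units
rng = np.random.RandomState(0)
W1 = rng.randn(5, 3) * 0.1
b1 = np.zeros(5)
x = rng.randn(3)
h1 = layer_forward(x, W1, b1)
print(h1.shape)  # (5,)
```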

21 Neural network forward-pass
Computing f_θ(x) for a neural network is a forward-pass.
Second layer:
Layer 2 activation: a^(2) = W^(2) h^(1)(x) + b^(2)
Layer 2 output: h^(2)(x) = t(a^(2)), where t(·) is applied element-wise
(figure: network diagram with weight matrices W^(1) and W^(2))

22 Neural network forward-pass
Computing f_θ(x) for a neural network is a forward-pass.
Output layer:
Output layer activation: a^(3) = W^(3) h^(2)(x) + b^(3)
Network output: f(x) = o(a^(3)), where o(·) is the output nonlinearity
For classification use softmax: o_i(z) = e^{z_i} / Σ_j e^{z_j}
(figure: network diagram with weight matrices W^(1), W^(2), W^(3))
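
Putting the layers together, a possible sketch of the complete forward pass with a softmax output (the max is subtracted inside the softmax for numerical stability, which does not change the result; layer sizes are made up for illustration):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # softmax is shift-invariant, this avoids overflow
    e = np.exp(z)
    return e / np.sum(e)

def forward(x, params, t=np.tanh):
    """params = [(W1, b1), (W2, b2), (W3, b3)]; softmax on the last layer."""
    h = x
    for W, b in params[:-1]:
        h = t(W @ h + b)           # hidden layers: h^(k) = t(W^(k) h^(k-1) + b^(k))
    W_out, b_out = params[-1]
    return softmax(W_out @ h + b_out)   # f(x) = o(a^(3))

rng = np.random.RandomState(0)
sizes = [784, 100, 100, 10]        # e.g. MNIST-sized input, 10 classes
params = [(rng.randn(m, n) * 0.01, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(forward(rng.randn(784), params).sum())  # probabilities sum to 1
```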

23 Training supervised feed-forward neural networks
Neural network activation functions. Typical nonlinearities for hidden layers are:
tanh(a_i)
sigmoid σ(a_i) = 1 / (1 + e^{-a_i})
ReLU relu(a_i) = max(a_i, 0)
tanh and sigmoid are both squashing nonlinearities; ReLU just thresholds at 0.
Why not linear?
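
The three nonlinearities written out in numpy (tanh is already provided by numpy; the other two are one-liners):

```python
import numpy as np

def sigmoid(a):
    # squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    # thresholds at 0
    return np.maximum(a, 0.0)

a = np.array([-2.0, 0.0, 3.0])
print(np.tanh(a), sigmoid(a), relu(a))
```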

24 Training supervised feed-forward neural networks
Train parameters θ such that ∀i ∈ [1, N]: f_θ(x_i) = y_i

25 Training supervised feed-forward neural networks
Train parameters θ such that ∀i ∈ [1, N]: f_θ(x_i) = y_i
We can do this via minimizing the empirical risk on our dataset D:
min_θ L(f_θ, D) = min_θ (1/N) Σ_{i=1}^{N} l(f_θ(x_i), y_i),   (1)
where l(·,·) is a per-example loss.

26 Training supervised feed-forward neural networks
Train parameters θ such that ∀i ∈ [1, N]: f_θ(x_i) = y_i
We can do this via minimizing the empirical risk on our dataset D:
min_θ L(f_θ, D) = min_θ (1/N) Σ_{i=1}^{N} l(f_θ(x_i), y_i),   (1)
where l(·,·) is a per-example loss.
For regression often use the squared loss: l(f_θ(x), y) = (1/2) Σ_{j=1}^{M} (f_{j,θ}(x) - y_j)^2
For M-class classification use the negative log likelihood: l(f_θ(x), y) = -Σ_{j=1}^{M} y_j log(f_{j,θ}(x))
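
Both per-example losses as small numpy helpers (the tiny constant added inside the log is a common safeguard against log(0), not part of the formula on the slide):

```python
import numpy as np

def squared_loss(f_x, y):
    # l(f(x), y) = 1/2 * sum_j (f_j(x) - y_j)^2
    return 0.5 * np.sum((f_x - y) ** 2)

def nll_loss(f_x, y_onehot):
    # l(f(x), y) = -sum_j y_j * log f_j(x), with y one-hot encoded
    return -np.sum(y_onehot * np.log(f_x + 1e-12))

f_x = np.array([0.7, 0.2, 0.1])     # softmax output for 3 classes
y = np.array([1.0, 0.0, 0.0])       # true class 0, one-hot
print(squared_loss(f_x, y), nll_loss(f_x, y))
```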

27 Gradient descent
The simplest approach to minimizing min_θ L(f_θ, D) is gradient descent.
Gradient descent:
θ^0 ← init randomly
do
  θ^{t+1} = θ^t - γ ∂L(f_θ, D)/∂θ
while (L(f_{θ^{t+1}}, V) - L(f_{θ^t}, V))^2 > ε

28 Gradient descent
The simplest approach to minimizing min_θ L(f_θ, D) is gradient descent.
Gradient descent:
θ^0 ← init randomly
do
  θ^{t+1} = θ^t - γ ∂L(f_θ, D)/∂θ
while (L(f_{θ^{t+1}}, V) - L(f_{θ^t}, V))^2 > ε
where V is a validation dataset (why not use D?)
Remember, in our case: L(f_θ, D) = (1/N) Σ_{i=1}^{N} l(f_θ(x_i), y_i)
We will get to computing the derivatives shortly.
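
A compact sketch of this loop, with a toy quadratic standing in for the empirical risk (the learning rate, tolerance and iteration cap are arbitrary illustrative values):

```python
import numpy as np

def gradient_descent(grad_L, L, theta0, gamma=0.1, eps=1e-8, max_iter=10000):
    """Plain gradient descent; stops when the monitored loss stops changing.
    grad_L(theta) stands in for dL/dtheta on D, L(theta) for the loss on V."""
    theta = theta0
    prev = L(theta)
    for _ in range(max_iter):
        theta = theta - gamma * grad_L(theta)
        cur = L(theta)
        if (cur - prev) ** 2 <= eps:
            break
        prev = cur
    return theta

# Toy check: minimize L(theta) = (theta - 3)^2, gradient 2*(theta - 3)
print(gradient_descent(lambda t: 2 * (t - 3.0), lambda t: (t - 3.0) ** 2, 0.0))
```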

29 Gradient descent
Gradient descent example: D = {(x_1, y_1), ..., (x_100, y_100)} with
x ~ U[0, 1]
y = 3x + ε, where ε ~ N(0, 0.1)

30 Gradient descent
Gradient descent example: D = {(x_1, y_1), ..., (x_100, y_100)} with
x ~ U[0, 1]
y = 3x + ε, where ε ~ N(0, 0.1)
Learn the parameter θ of the function f_θ(x) = θx using the loss
l(f_θ(x), y) = (1/2) ||f_θ(x) - y||_2^2 = (1/2) (f_θ(x) - y)^2
∂L(f_θ, D)/∂θ = (1/N) Σ_{i=1}^{N} ∂l/∂θ = (1/N) Σ_{i=1}^{N} (θx_i - y_i) x_i

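The same 1-D example as a runnable numpy snippet (0.1 is interpreted here as the standard deviation of the noise; the learning rate and iteration count are ad-hoc choices for this toy problem):

```python
import numpy as np

rng = np.random.RandomState(0)
N = 100
x = rng.uniform(0, 1, N)                 # x ~ U[0, 1]
y = 3.0 * x + rng.normal(0, 0.1, N)      # y = 3x + eps, eps ~ N(0, 0.1)

def grad_L(theta):
    # dL/dtheta = 1/N * sum_i (theta * x_i - y_i) * x_i
    return np.mean((theta * x - y) * x)

theta, gamma = 0.0, 0.5
for _ in range(200):
    theta -= gamma * grad_L(theta)
print(theta)                             # ends up close to 3
```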

32 Gradient descent
Gradient descent example, γ = 2 (figure: gradient descent iterations on the example problem)

33 Stochastic Gradient Descent (SGD)
There are two problems with gradient descent:
1. We have to find a good γ
2. Computing the gradient is expensive if the training dataset is large!
Problem 2 can be attacked with online optimization (we will have a look at this).
Problem 1 remains but can be tackled via second order methods or other advanced optimization algorithms (rprop/rmsprop, adagrad).

34 Gradient descent
1. We have to find a good γ (figure: gradient descent runs with γ = 2 and γ = 5)

35 Stochastic Gradient Descent (SGD)
2. Computing the gradient is expensive if the training dataset is large!
What if we would only evaluate f on parts of the data?
Stochastic Gradient Descent:
θ^0 ← init randomly
do
  (x', y') ~ D   (sample an example from dataset D)
  θ^{t+1} = θ^t - γ^t ∂l(f_θ(x'), y')/∂θ
while (L(f_{θ^{t+1}}, V) - L(f_{θ^t}, V))^2 > ε
where Σ_{t=1}^{∞} γ^t = ∞ and Σ_{t=1}^{∞} (γ^t)^2 < ∞
(γ should go to zero but not too fast)

36 Stochastic Gradient Descent (SGD)
2. Computing the gradient is expensive if the training dataset is large!
What if we would only evaluate f on parts of the data?
Stochastic Gradient Descent:
θ^0 ← init randomly
do
  (x', y') ~ D   (sample an example from dataset D)
  θ^{t+1} = θ^t - γ^t ∂l(f_θ(x'), y')/∂θ
while (L(f_{θ^{t+1}}, V) - L(f_{θ^t}, V))^2 > ε
where Σ_{t=1}^{∞} γ^t = ∞ and Σ_{t=1}^{∞} (γ^t)^2 < ∞
(γ should go to zero but not too fast)
SGD can speed up optimization for large datasets but can yield very noisy updates.
In practice mini-batches are used (compute l(·,·) for several samples and average).
We still have to find a learning rate schedule γ^t.
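
A mini-batch SGD sketch with a simple 1/t learning-rate decay, reusing the 1-D regression example from before (batch size, γ_0 and the number of epochs are illustrative; the decay is just one schedule satisfying the conditions above):

```python
import numpy as np

def sgd(grad_l, theta0, X, Y, batch_size=32, gamma0=1.0, epochs=10, seed=0):
    """Mini-batch SGD: sample a batch, take a step with a decaying learning rate."""
    rng = np.random.RandomState(seed)
    theta, t, n = theta0, 0, len(X)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            t += 1
            gamma_t = gamma0 / t                 # sum gamma_t = inf, sum gamma_t^2 < inf
            theta = theta - gamma_t * grad_l(theta, X[idx], Y[idx])
    return theta

# 1-D regression example: gradient of the squared loss averaged over a batch
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, 100)
Y = 3.0 * X + rng.normal(0, 0.1, 100)
g = lambda th, xb, yb: np.mean((th * xb - yb) * xb)
print(sgd(g, 0.0, X, Y, gamma0=2.0, epochs=50))  # close to 3 for this toy problem
```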

37 Stochastic Gradient Descent (SGD)
Same data, assuming that gradient evaluation on all data takes 4 times as much time as evaluating a single datapoint (figure: gradient descent (γ = 2) vs. stochastic gradient descent with a decaying learning rate γ^t)

38 Neural Network backward pass
Now how do we compute the gradient for a network?
Use the chain rule: ∂f(g(x))/∂x = (∂f(g(x))/∂g(x)) · (∂g(x)/∂x)
First compute the loss l(f(x), y) on the output layer, then backpropagate to get ∂l(f(x), y)/∂W^(3) and ∂l(f(x), y)/∂a^(3)
(figure: network diagram with the loss l(f(x), y) at the output)

39 Neural Network backward pass
Now how do we compute the gradient for a network?
Gradient w.r.t. layer 3 weights: ∂l(f(x), y)/∂W^(3) = (∂l(f(x), y)/∂a^(3)) · (∂a^(3)/∂W^(3))
Assuming l is the NLL and softmax outputs, the gradient w.r.t. the layer 3 activation is: ∂l(f(x), y)/∂a^(3) = -(y - f(x)), where y is one-hot encoded
Gradient of a^(3) w.r.t. W^(3): ∂a^(3)/∂W^(3) = (h^(2)(x))^T
(recall a^(3) = W^(3) h^(2)(x) + b^(3))

40 Neural Network backward pass
Now how do we compute the gradient for a network?
Gradient w.r.t. layer 3 weights: ∂l(f(x), y)/∂W^(3) = (∂l(f(x), y)/∂a^(3)) · (∂a^(3)/∂W^(3))
Assuming l is the NLL and softmax outputs, the gradient w.r.t. the layer 3 activation is: ∂l(f(x), y)/∂a^(3) = -(y - f(x)), where y is one-hot encoded
Gradient of a^(3) w.r.t. W^(3): ∂a^(3)/∂W^(3) = (h^(2)(x))^T
Combined: ∂l(f(x), y)/∂W^(3) = -(y - f(x)) (h^(2)(x))^T
(recall a^(3) = W^(3) h^(2)(x) + b^(3))

41 Neural Network backward pass
Now how do we compute the gradient for a network?
Gradient w.r.t. layer 3 weights: ∂l(f(x), y)/∂W^(3) = (∂l(f(x), y)/∂a^(3)) · (∂a^(3)/∂W^(3))
Assuming l is the NLL and softmax outputs, the gradient w.r.t. the layer 3 activation is: ∂l(f(x), y)/∂a^(3) = -(y - f(x)), where y is one-hot encoded
Gradient of a^(3) w.r.t. W^(3): ∂a^(3)/∂W^(3) = (h^(2)(x))^T
Gradient w.r.t. the previous layer: ∂l(f(x), y)/∂h^(2)(x) = (∂a^(3)/∂h^(2)(x))^T · ∂l(f(x), y)/∂a^(3) = (W^(3))^T ∂l(f(x), y)/∂a^(3)
(recall a^(3) = W^(3) h^(2)(x) + b^(3))

42 Neural Network backward pass
Now how do we compute the gradient for a network?
Gradient w.r.t. layer 2 weights: ∂l(f(x), y)/∂W^(2) = (∂l(f(x), y)/∂h^(2)(x)) · (∂h^(2)(x)/∂a^(2)) · (∂a^(2)/∂W^(2))
Same schema as before, we just have to consider computing the derivative of the activation function ∂h^(2)(x)/∂a^(2), e.g. for the sigmoid σ(·): ∂h^(2)_i(x)/∂a^(2)_i = σ(a_i)(1 - σ(a_i))
... and backprop even further
(recall a^(3) = W^(3) h^(2)(x) + b^(3))
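
A possible numpy sketch of the full backward pass for a network with two sigmoid hidden layers and a softmax/NLL output, following the formulas above (layer sizes and variable names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, params):
    """Forward pass that caches layer outputs for the backward pass."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    f = softmax(W3 @ h2 + b3)
    return f, (x, h1, h2)

def backward(f, y_onehot, cache, params):
    """Gradients of the NLL loss w.r.t. all weights and biases."""
    x, h1, h2 = cache
    (W1, b1), (W2, b2), (W3, b3) = params
    delta3 = f - y_onehot                      # dl/da3 = -(y - f(x)) for softmax + NLL
    dW3, db3 = np.outer(delta3, h2), delta3    # dl/dW3 = delta3 (h2)^T
    delta2 = (W3.T @ delta3) * h2 * (1 - h2)   # chain rule through sigmoid: s(a)(1 - s(a))
    dW2, db2 = np.outer(delta2, h1), delta2
    delta1 = (W2.T @ delta2) * h1 * (1 - h1)
    dW1, db1 = np.outer(delta1, x), delta1
    return [(dW1, db1), (dW2, db2), (dW3, db3)]

# Small usage example with made-up sizes
rng = np.random.RandomState(0)
sizes = [4, 8, 8, 3]
params = [(rng.randn(m, n) * 0.1, np.zeros(m)) for n, m in zip(sizes[:-1], sizes[1:])]
x, y = rng.randn(4), np.array([0.0, 1.0, 0.0])
f, cache = forward(x, params)
grads = backward(f, y, cache, params)
```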

43 Gradient Checking
The backward pass is just repeated application of the chain rule. However, there is huge potential for bugs... Gradient checking to the rescue (simply check the code via finite differences):
Gradient Checking:
θ = (W^(1), b^(1), ...) ← init randomly
x ← init randomly; y ← init randomly
g_analytic ← compute gradient ∂l(f_θ(x), y)/∂θ
for i in #θ:
  θ̂ = θ
  θ̂_i = θ̂_i + ε
  g_numeric = (l(f_θ̂(x), y) - l(f_θ(x), y)) / ε
  assert(|g_numeric - g_analytic,i| < ε)
Can also be used to test partial implementations (i.e. layers, activation functions): simply remove the loss computation and backprop ones via backprop.
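
A stand-alone finite-difference checker along the lines of the pseudo-code (here a separate tolerance is used instead of reusing ε, and a toy loss is checked; in practice you would pass your network's loss and analytic gradient):

```python
import numpy as np

def check_gradient(loss, grad, theta, eps=1e-5, tol=1e-4):
    """Compare an analytic gradient with forward finite differences, one parameter at a time."""
    g_analytic = grad(theta)
    g_numeric = np.zeros_like(theta)
    for i in range(theta.size):
        theta_hat = theta.copy()
        theta_hat.flat[i] += eps                           # perturb a single parameter
        g_numeric.flat[i] = (loss(theta_hat) - loss(theta)) / eps
    assert np.max(np.abs(g_numeric - g_analytic)) < tol, "gradient check failed"
    return True

# Toy check: l(theta) = sum(theta^2), dl/dtheta = 2*theta
theta = np.random.RandomState(0).randn(4)
print(check_gradient(lambda t: np.sum(t ** 2), lambda t: 2 * t, theta))
```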

44 Overfitting
If you train the parameters of a large network θ = (W^(1), b^(1), ...) you will see overfitting!
L(f_θ(x), D) ≪ L(f_θ(x), V)
This can be at least partly conquered with regularization.

45 Overfitting
If you train the parameters of a large network θ = (W^(1), b^(1), ...) you will see overfitting!
L(f_θ(x), D) ≪ L(f_θ(x), V)
This can be at least partly conquered with regularization.
Weight decay: change the cost (and gradient)
min_θ L(f_θ, D) = (1/N) Σ_{i=1}^{N} l(f_θ(x_i), y_i) + (1/#θ) Σ_{i=1}^{#θ} θ_i^2
Enforces small weights (Occam's razor).

46 Overfitting
If you train the parameters of a large network θ = (W^(1), b^(1), ...) you will see overfitting!
L(f_θ(x), D) ≪ L(f_θ(x), V)
This can be at least partly conquered with regularization.
Weight decay: change the cost (and gradient)
min_θ L(f_θ, D) = (1/N) Σ_{i=1}^{N} l(f_θ(x_i), y_i) + (1/#θ) Σ_{i=1}^{#θ} θ_i^2
Enforces small weights (Occam's razor).
Dropout: kill 50% of the activations in each hidden layer during the training forward pass. Multiply hidden activations by 1/2 during testing.
Prevents co-adaptation / enforces robustness to noise.

47 Overfitting
If you train the parameters of a large network θ = (W^(1), b^(1), ...) you will see overfitting!
L(f_θ(x), D) ≪ L(f_θ(x), V)
This can be at least partly conquered with regularization.
Weight decay: change the cost (and gradient)
min_θ L(f_θ, D) = (1/N) Σ_{i=1}^{N} l(f_θ(x_i), y_i) + (1/#θ) Σ_{i=1}^{#θ} θ_i^2
Enforces small weights (Occam's razor).
Dropout: kill 50% of the activations in each hidden layer during the training forward pass. Multiply hidden activations by 1/2 during testing.
Prevents co-adaptation / enforces robustness to noise.
Many, many more!
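
Sketches of the two regularizers, assuming a 50% drop rate as on the slide (the exact scaling and coefficient of the weight-decay term are implementation choices):

```python
import numpy as np

def dropout_forward(h, p_drop=0.5, train=True, rng=np.random):
    """Standard (non-inverted) dropout: drop units during training,
    scale activations by (1 - p_drop) at test time."""
    if train:
        mask = rng.rand(*h.shape) >= p_drop    # kill ~50% of the activations
        return h * mask
    return h * (1.0 - p_drop)                  # multiply by 1/2 at test time for p_drop = 0.5

def l2_penalty(params, lam=1e-4):
    """Weight-decay term added to the cost (biases are commonly left out)."""
    weights = np.concatenate([W.ravel() for W, _ in params])
    return lam * np.sum(weights ** 2) / weights.size

h = np.ones(10)
print(dropout_forward(h, train=True, rng=np.random.RandomState(0)))
print(dropout_forward(h, train=False))
```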

48 Assignment
Implementation: Implement a simple feed-forward neural network by completing the provided stub. This includes:
possibility to use 2-4 layers
sigmoid/tanh and ReLU for the hidden layers
softmax output layer
optimization via gradient descent (gd)
optimization via stochastic gradient descent (sgd)
gradient checking code (!!!)
weight initialization with random noise (!!!) (use a normal distribution with changing std. deviation for now)
Bonus points for testing some advanced ideas:
implement dropout, l2 regularization
implement a different optimization scheme (rprop, rmsprop, adagrad)
Code stub:
Evaluation: Find good parameters (learning rate, number of iterations etc.) using a validation set (usually take the last 10k examples from the training set). After optimizing parameters run on the full dataset and test once on the test set (you should be able to reach 1.6-1.8% error).
Submission: Clone our github repo and send us the link to your github/bitbucket repo including your solution code and report. Email to eitel@cs.uni-freiburg.de, hueglem@cs.uni-freiburg.de and zhang@cs.uni-freiburg.de

49 Slide Information
These slides have been created by Tobias Springenberg for the DL Lab Course WS 2016/2017.
