Continuous NN. Numerical Methods for Deep Learning. Lars Ruthotto. Departments of Mathematics and Computer Science, Emory University.


2 Course Overview

3 Course Overview

Lecture 1: Linear Models
1. Introduction and Applications
2. Linear Models: Least-Squares and Logistic Regression

Lecture 2: Neural Networks
1. Introduction to Nonlinear Models
2. Single Layer Neural Networks
3. Training Algorithms for Single Layer Neural Networks
4. Neural Networks and Residual Neural Networks (ResNets)

Lecture 3: Neural Networks as Differential Equations
1. ResNets as ODEs
2. Residual CNNs and their relation to PDEs

4 Stability

5 Motivation: Nonlinear Models

In general, it is impossible to find a linear separator between the classes.

(Figure: input features vs. transformed features.)

Goal/Trick: Embed the points in a higher dimension and/or move the points to make them linearly separable.

6 Stability of Deep Residual Networks

Why are ResNets more stable? A small change:

$Y_1 = Y_0 + h \sigma(K_0 Y_0 + b_0)$
$\quad\vdots$
$Y_N = Y_{N-1} + h \sigma(K_{N-1} Y_{N-1} + b_{N-1})$

This is nothing but a forward Euler discretization of the Ordinary Differential Equation (ODE)

$\partial_t Y(t) = \sigma(K(t) Y(t) + b(t)), \quad Y(0) = Y_0.$

We can understand the behavior by learning the dynamics of nonlinear ODEs [6, 5].
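
As a minimal illustration (not the lecture code; the sizes and the random weights below are assumptions), the forward propagation of such a ResNet is just a forward Euler loop:

```matlab
% Minimal sketch: forward propagation of a ResNet as a forward Euler
% discretization of dY/dt = sigma(K(t)*Y + b(t)), Y(0) = Y0.
% All sizes and the random weights are illustrative assumptions.
nf = 2;                        % number of features
n  = 50;                       % number of examples
N  = 10;                       % number of layers / time steps
h  = 1/N;                      % step size (final time T = 1)

Y = randn(nf, n);              % initial features Y_0
for j = 1:N
    K = randn(nf, nf)/nf;      % weights K_j (learned in practice)
    b = randn(nf, 1);          % bias b_j (added to each column by implicit expansion)
    Y = Y + h*tanh(K*Y + b);   % Y_{j+1} = Y_j + h*sigma(K_j*Y_j + b_j)
end
disp(norm(Y, 'fro'))
```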

7 Crash Course: ODE

8 Crash Course on ODEs

Given the ODE

$\partial_t y(t) = f(t, y(t))$

Assumptions:
1. $f$ differentiable with Jacobian $J(t, y) = \frac{\partial f}{\partial y}$
2. $J$ changes sufficiently slowly in time

Then (see also [2, 3, 1]):
If $\mathrm{Re}(\mathrm{eig}(J)) > 0$: unstable
If $\mathrm{Re}(\mathrm{eig}(J)) < 0$: stable (converges to a stationary point)
If $\mathrm{Re}(\mathrm{eig}(J)) = 0$: stable, energy bounded

9 Stability of Residual Networks

Assume forward propagation of a single example $y_0$:

$\partial_t y(t) = \sigma(K(t) y(t) + b(t)), \quad y(0) = y_0$

The Jacobian is

$J(t) = \mathrm{diag}\big(\sigma'(K(t) y(t) + b(t))\big)\, K(t)$

Here, $\sigma'(x) \geq 0$ for tanh, ReLU, ... Hence, the problem is stable when
1. $J$ changes slowly in time,
2. $\mathrm{Re}(\mathrm{eig}(K(t))) \leq 0$ for every $t$.

Remember that we learn $K$: ensure stability by regularization/constraints!

10 Residual Networks as a Path Planning Problem

Forward propagation in a residual network (continuous):

$\partial_t Y(t) = \sigma(K(t) Y(t) + b(t)), \quad Y(0) = Y_0$

The goal is to plan a path (via $K$ and $b$) such that the initial data can be linearly separated.

Question: What is a layer, what is depth?

11 Stability: Continuous vs. Discrete

Assume $K$ is chosen so that the (continuous) forward propagation is stable:

$\partial_t Y(t) = \sigma(K Y(t) + b(t)), \quad Y(0) = Y_0$

And assume we use the forward Euler method to discretize:

$Y_{j+1} = Y_j + h \sigma(K_j Y_j + b_j)$

Is the network stable? Not always ...

12 Stability: A Simple Example

Look at the simplest possible forward propagation:

$\partial_t Y(t) = \lambda Y(t)$

And assume we use forward Euler to discretize:

$Y_{j+1} = Y_j + h \lambda Y_j = (1 + h\lambda) Y_j$

Then the method is stable only if $|1 + h\lambda| \leq 1$.

Not every network is stable! The time step size depends on $\lambda$.
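
A tiny numerical check of this stability bound (the value of lambda and the two step sizes below are illustrative assumptions):

```matlab
% Minimal sketch: forward Euler applied to dy/dt = lambda*y.
% The iteration y_{j+1} = (1+h*lambda)*y_j stays bounded only if
% |1+h*lambda| <= 1; lambda and the step sizes are assumptions.
lambda = -10;                 % stable ODE (Re(lambda) < 0)
y0     = 1;
for h = [0.05, 0.21]          % below / above the stability limit h <= 2/|lambda|
    y = y0;
    for j = 1:50
        y = (1 + h*lambda)*y; % forward Euler step
    end
    fprintf('h = %4.2f,  |1+h*lambda| = %4.2f,  |y_50| = %8.2e\n', ...
            h, abs(1+h*lambda), abs(y));
end
```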

13 Why You Should Care About Stability

$\min_\theta \ \tfrac{1}{2}\, \| Y_N(\theta) - C \|_F^2, \quad \text{where} \quad Y_{j+1}(\theta) = Y_j(\theta) + \tfrac{10}{N} \tanh\big(K(\theta)\, Y_j(\theta)\big),$

$C = Y_{200}(\theta)$ for $\theta = (1, 1)$, $Y_0 \sim N(0, 1)$, and

$K(\theta) = \begin{pmatrix} \theta_1 & \theta_2 & \theta_1 & \theta_2 \\ \theta_2 & \theta_1 & \theta_2 & \theta_1 \\ \theta_1 & \theta_2 & \theta_1 & \theta_2 \\ \theta_2 & \theta_1 & \theta_2 & \theta_1 \end{pmatrix}$

(Figures: the objective for N = 5 and for N = 100.)

Next: compare different inputs (generalization).

14 Why You Should Care About Stability - 2

(Figures: the objective for the training data $Y_0^{\mathrm{train}}$, the objective for the test data $Y_0^{\mathrm{test}}$, and their absolute difference; shown for the stable network, N = 100, and the unstable one, N = 5.)

15 Stability: A Non-Trivial Example

Consider the antisymmetric kernel model $K(t) \to K(t) - K(t)^\top$. Here, $\mathrm{Re}(\mathrm{eig}(J(t))) = 0$ for all $\theta$.

Assume we use forward Euler to discretize:

$Y_{j+1} = Y_j + h \sigma\big((K_j - K_j^\top) Y_j + b_j\big).$

Tricky question: How to pick $h$ to ensure stability?

Answer: Impossible, since the eigenvalues of the Jacobian are imaginary. We need a method other than forward Euler.

16 Experiment: Peaks

Compare three approaches for training a residual neural network:

EResNet PeaksSGD.m - stochastic gradient descent
EResNet PeaksNewtonCG.m - Newton-CG with block-diagonal Hessian approximation
EResNet PeaksVarPro.m - fully coupled solver: eliminate θ and use steepest descent / Newton-CG for the reduced problem

17 Parametric Models

18 Motivation

Recall the single layer

$Z = \sigma(K Y + b),$

where $Y \in \mathbb{R}^{n_f \times n}$, $K \in \mathbb{R}^{m \times n_f}$, $b \in \mathbb{R}$, and $\sigma$ is an element-wise activation.

We saw that $m \approx n_f$ is needed to fit the training data. Conservative example: Consider MNIST ($n_f = 28^2$) and use $m = n_f$: 614,656 unknowns for a single layer.

Famous quote: "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."

Possible remedies:
Regularization: penalize $K$
Parametric model: $K(\theta)$, where $\theta \in \mathbb{R}^p$ with $p \ll m \cdot n_f$

19 Some Simple Parametric Models

Diagonal scaling: $K(\theta) = \mathrm{diag}(\theta) \in \mathbb{R}^{n_f \times n_f}$
Advantage: preserves size and structure of the data.

Antisymmetric kernel:

$K(\theta) = \begin{pmatrix} 0 & \theta_1 & \theta_2 \\ -\theta_1 & 0 & \theta_3 \\ -\theta_2 & -\theta_3 & 0 \end{pmatrix}$

Advantage?: $\mathrm{Re}(\lambda_i(K(\theta))) = 0$.

M-matrix:

$K(\theta) = \begin{pmatrix} \theta_1 + \theta_2 & -\theta_1 & -\theta_2 \\ -\theta_3 & \theta_3 + \theta_4 & -\theta_4 \\ -\theta_5 & -\theta_6 & \theta_5 + \theta_6 \end{pmatrix}$

Advantage: acts like a differential operator.
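
An illustrative sketch of these three models for $n_f = 3$ (not the lecture code; the parameter values, and the sign convention of the M-matrix as reconstructed above, are assumptions):

```matlab
% Illustrative sketch of the three parametric kernel models for nf = 3;
% the (nonnegative) parameter values are assumptions.
theta = rand(6,1);                          % nonnegative parameters (for the M-matrix)

Kdiag = diag(theta(1:3));                   % diagonal scaling, p = nf

Kanti = [ 0         theta(1)  theta(2);     % antisymmetric kernel
         -theta(1)  0         theta(3);
         -theta(2) -theta(3)  0       ];
disp(real(eig(Kanti))')                     % real parts are (numerically) zero

Kmm = [ theta(1)+theta(2), -theta(1),          -theta(2);           % M-matrix:
       -theta(3),           theta(3)+theta(4), -theta(4);           % nonpositive off-diagonals,
       -theta(5),          -theta(6),           theta(5)+theta(6)]; % zero row sums
disp(sum(Kmm, 2)')                          % row sums vanish, like a differential operator
```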

20 Differentiating Parametric Models

We need derivatives of the model to optimize $\theta$ in $E(W \sigma(K(\theta) Y + b), C)$ (we can re-use previous derivatives and apply the chain rule).

Note that all previous models are linear in the following sense:

$K(\theta) = \mathrm{mat}(Q \theta).$

Therefore, matrix-vector products with the Jacobian are simply

$J_\theta(K(\theta))\, v = \mathrm{mat}(Q v) \quad \text{and} \quad J_\theta(K(\theta))^\top w = Q^\top w,$

where $v \in \mathbb{R}^p$ and $w \in \mathbb{R}^{m \cdot n_f}$ (a vectorized perturbation of $K$).

21 Example: Derivative of the M-Matrix

$K(\theta) = \begin{pmatrix} \theta_1 + \theta_2 & -\theta_1 & -\theta_2 \\ -\theta_3 & \theta_3 + \theta_4 & -\theta_4 \\ -\theta_5 & -\theta_6 & \theta_5 + \theta_6 \end{pmatrix}$

Verify that this can be written as $K(\theta) = \mathrm{mat}(Q \theta)$, where $Q \in \mathbb{R}^{9 \times 6}$ is a sparse matrix with entries in $\{-1, 0, 1\}$ (one row per entry of $\mathrm{vec}(K)$).

Note: it is not efficient to construct $Q$ when $p$ is large, but it is helpful when computing derivatives.
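
To make the derivative remark concrete, here is a hedged sketch of the Jacobian mat-vecs and checks for any model that is linear in $\theta$ (a random $Q$ and the sizes are assumptions, not the M-matrix $Q$ above):

```matlab
% Minimal sketch: for K(theta) = mat(Q*theta), Jacobian mat-vecs are just
% products with Q (and Q' for the transpose). Q and the sizes are assumptions.
m = 3; nf = 3; p = 6;
Q     = randn(m*nf, p);                    % any fixed "basis" matrix
Kfun  = @(theta) reshape(Q*theta, m, nf);  % K(theta) = mat(Q*theta)

theta = randn(p,1);  v = randn(p,1);
Jv    = reshape(Q*v, m, nf);               % J_theta(K)*v

% directional-derivative check (exact up to rounding, since the model is linear)
fd = (Kfun(theta + 1e-6*v) - Kfun(theta))/1e-6;
fprintf('derivative check: %8.2e\n', norm(fd - Jv, 'fro'));

% adjoint test: <J*v, W> == <v, J^T*W> with J^T*W = Q'*vec(W)
W = randn(m, nf);
fprintf('adjoint test:     %8.2e\n', abs(sum(sum(Jv.*W)) - v'*(Q'*W(:))));
```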

22 Convolutional Neural Networks [7]

$y$: input features, $z$: output features, $\theta \in \mathbb{R}^{5 \times 5}$: convolution kernel

useful for speech, images, videos, ...
efficient parameterization, efficient codes (GPUs, ...)
now: CNNs as parametric models and PDEs, simple code, see E13Conv2D.m

23 Convolutions in 1D

Let $y, z, \theta : \mathbb{R} \to \mathbb{R}$ be continuous functions. Then

$z(x) = (\theta * y)(x) = \int \theta(x - t)\, y(t)\, dt.$

Assume $\theta(x) \neq 0$ only in the interval $[-a, a]$ (compact support).

A few properties:
$\theta * y = \mathcal{F}^{-1}\big((\mathcal{F}\theta)(\mathcal{F}y)\big)$, where $\mathcal{F}$ is the Fourier transform
$\theta * y = y * \theta$

24 Discrete Convolutions in 1D

Let $\theta \in \mathbb{R}^{2k+1}$ be the stencil and $y \in \mathbb{R}^{n_f}$ a grid function:

$z_i = (\theta * y)_i = \sum_{j=-k}^{k} \theta_j\, y_{i-j}.$

Example: Discretize $\theta \in \mathbb{R}^3$ (non-zeros only) and $y, z \in \mathbb{R}^4$ on a regular grid:

$z_1 = \theta_3 w_1 + \theta_2 x_1 + \theta_1 x_2$
$z_2 = \theta_3 x_1 + \theta_2 x_2 + \theta_1 x_3$
$z_3 = \theta_3 x_2 + \theta_2 x_3 + \theta_1 x_4$
$z_4 = \theta_3 x_3 + \theta_2 x_4 + \theta_1 w_2$

where $w_1, w_2$ are used to implement different boundary conditions (the right choice? depends ...).
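
A minimal sketch of these four equations (the stencil and grid values are assumptions; zero boundary conditions are chosen as one possibility):

```matlab
% Minimal sketch of the 1D discrete convolution above for a 3-point stencil
% and a length-4 grid function; w1, w2 implement the boundary condition.
theta = [1; 2; 3];              % stencil (theta_1, theta_2, theta_3), assumed values
x     = [4; 3; 2; 1];           % grid function
w1 = 0; w2 = 0;                 % zero boundary conditions (other choices possible)

xpad = [w1; x; w2];             % padded vector (w1, x1, ..., x4, w2)
z    = zeros(4,1);
for i = 1:4
    % z_i = theta_3*xpad(i) + theta_2*xpad(i+1) + theta_1*xpad(i+2)
    z(i) = theta(3)*xpad(i) + theta(2)*xpad(i+1) + theta(1)*xpad(i+2);
end
disp(z')
% the same result with MATLAB's built-in convolution (fully overlapping part)
disp(conv(xpad, theta, 'valid')')
```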

25 Structured Matrices - 1

$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} \theta_3 & \theta_2 & \theta_1 & & & \\ & \theta_3 & \theta_2 & \theta_1 & & \\ & & \theta_3 & \theta_2 & \theta_1 & \\ & & & \theta_3 & \theta_2 & \theta_1 \end{pmatrix} \begin{pmatrix} w_1 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ w_2 \end{pmatrix}$

Different boundary conditions lead to different structures.

Zero boundary conditions: $w_1 = w_2 = 0$

$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} \theta_2 & \theta_1 & & \\ \theta_3 & \theta_2 & \theta_1 & \\ & \theta_3 & \theta_2 & \theta_1 \\ & & \theta_3 & \theta_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$

This is a Toeplitz matrix (constant along diagonals).

26 Structured Matrices - 2

Periodic boundary conditions: $w_1 = x_4$ and $w_2 = x_1$

$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} \theta_2 & \theta_1 & & \theta_3 \\ \theta_3 & \theta_2 & \theta_1 & \\ & \theta_3 & \theta_2 & \theta_1 \\ \theta_1 & & \theta_3 & \theta_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$

This is a circulant matrix (each row/column is a periodic shift of the previous row/column).

An attractive property of a circulant matrix is that we can efficiently compute its eigendecomposition

$K(\theta) = F^* \mathrm{diag}(\lambda)\, F,$

where $F$ is the discrete Fourier transform and the eigenvalues $\lambda \in \mathbb{C}^4$ can be computed from the first column,

$\lambda = F\big(K(\theta)\, u_1\big), \quad u_1 = (1, 0, 0, 0)^\top.$

27 Coding: 1D Convolution using FFTs

Let $\theta \in \mathbb{R}^3$ be some stencil and $n_f = m$.

1. Build a sparse matrix $K$ for computing the convolution with periodic boundary conditions. Hint: spdiags
2. Compute the eigenvalues of $K$ using eig(full(K)) and using fft and the first column of $K$. Compare!
3. Verify that norm(K*y - real(ifft(lam.*fft(y)))) is small.
4. Repeat the previous item for the transpose.
5. Write code that computes the eigenvalues for an arbitrary stencil size without building $K$. Hint: circshift
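
A hedged sketch of items 2-4 (built with circshift rather than spdiags, which also covers item 5; the stencil values and the size n are assumptions, not the lecture's solution):

```matlab
% Hedged sketch: circulant 1D convolution, its eigenvalues via fft, and
% FFT-based mat-vecs; the stencil and the grid size are assumptions.
theta = [2; -5; 3];                  % 3-point stencil (theta_1, theta_2, theta_3)
n     = 8;                           % n_f = m = n

% first column of the circulant matrix for periodic BC (cf. slide 26):
c      = zeros(n,1);
c(1)   = theta(2);  c(2) = theta(3);  c(n) = theta(1);
K      = zeros(n);
for j = 1:n
    K(:,j) = circshift(c, j-1);      % each column is a periodic shift of the first
end

lam = fft(c);                        % eigenvalues = DFT of the first column
E   = eig(K);                        % check: every lam(k) appears among eig(K)
err = 0;
for k = 1:n, err = max(err, min(abs(E - lam(k)))); end
fprintf('eigenvalue check: %8.2e\n', err);

y = randn(n,1);
fprintf('matvec:    %8.2e\n', norm(K*y  - real(ifft(lam .*fft(y)))));
fprintf('transpose: %8.2e\n', norm(K'*y - real(ifft(conj(lam).*fft(y)))));
```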

28 Derivatives of 1D Convolution - 1

Recall that we need a way to compute

$J_\theta(K(\theta) Y)\, v \quad \text{and} \quad J_\theta(K(\theta) Y)^\top w, \qquad J_\theta \in \mathbb{R}^{m \times p}$

(note that we put $Y$ inside the bracket to avoid tensors).

Assume a single example, $y$. Since we have periodic boundary conditions,

$K(\theta)\, y = \mathrm{real}\big(F^* (\lambda(\theta) \odot F y)\big) = \mathrm{real}\big(F^* \mathrm{diag}(F y)\, \lambda(\theta)\big), \qquad \lambda(\theta) = F\big(K(\theta)\, u_1\big).$

We need to differentiate the eigenvalues with respect to $\theta$. Note the linearity:

$K(\theta)\, u_1 = Q \theta, \quad Q = ?$

29 Derivatives of 1D Convolution - 2

Assume we have

$K(\theta)\, y = \mathrm{real}\big(F^* \mathrm{diag}(F y)\, F Q \theta\big).$

Then mat-vecs with the Jacobian are easy to compute:

$J_\theta(K(\theta) y)\, v = \mathrm{real}\big(F^* \mathrm{diag}(F y)\, F Q v\big)$

and (note that $F^\top = F$ and $(F^*)^\top = F^*$)

$J_\theta(K(\theta) y)^\top w = \mathrm{real}\big(Q^\top F\, \mathrm{diag}(F y)\, F^* w\big).$

Code this and check the Jacobian and its transpose using conv1d.m!
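
A hedged sketch of such a check (this is not the lecture's conv1d.m; the stencil size, the grid size, and the matrix Q mapping the stencil to the first column of the circulant matrix are assumptions, and MATLAB's unnormalized fft means the adjoint uses ifft):

```matlab
% Hedged sketch: Jacobian of the periodic 1D convolution with respect to
% the stencil theta, plus a derivative and an adjoint test.
n = 8;  theta = randn(3,1);  y = randn(n,1);

% Q maps the stencil to the first column of the circulant matrix: K(theta)*u1 = Q*theta
Q = zeros(n,3);  Q(1,2) = 1;  Q(2,3) = 1;  Q(n,1) = 1;

Kmv  = @(th) real(ifft(fft(Q*th).*fft(y)));     % theta -> K(theta)*y
Jmv  = @(v)  real(ifft(fft(Q*v ).*fft(y)));     % J*v (the model is linear in theta)
JTmv = @(w)  real(Q'*fft(fft(y).*ifft(w)));     % J^T*w (ifft accounts for fft scaling)

% derivative check (exact up to rounding, since K(theta)*y is linear in theta)
v = randn(3,1);
fprintf('Jacobian check: %8.2e\n', norm(Kmv(theta+v) - Kmv(theta) - Jmv(v)));

% adjoint test: <J*v, w> == <v, J^T*w>
w = randn(n,1);
fprintf('adjoint test:   %8.2e\n', abs(w'*Jmv(v) - v'*JTmv(w)));
```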

30 Extension: Width of CNNs

(Figure: an RGB image, three input channels mapped to four output channels.)

The width of a CNN can be controlled by the number of input and output channels of each layer. Let $y = (y_R, y_G, y_B)$; then we might compute

$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} K_{11}(\theta_{11}) & K_{12}(\theta_{12}) & K_{13}(\theta_{13}) \\ K_{21}(\theta_{21}) & K_{22}(\theta_{22}) & K_{23}(\theta_{23}) \\ K_{31}(\theta_{31}) & K_{32}(\theta_{32}) & K_{33}(\theta_{33}) \\ K_{41}(\theta_{41}) & K_{42}(\theta_{42}) & K_{43}(\theta_{43}) \end{pmatrix} \begin{pmatrix} y_R \\ y_G \\ y_B \end{pmatrix},$

where $K_{ij}$ is a 2D convolution operator with stencil $\theta_{ij}$.

31 Outlook: Possible Extensions

For now, we have only introduced the very basic convolution layer. CNNs used in practice also use the following components:
pooling: reduce the image resolution (e.g., average over patches)
stride: e.g., a stride of two reduces the image resolution by computing $z$ only at every other pixel

Build your own parametric model (ideas for projects):
M-matrix for convolution
cheaper convolution models: separable kernels, doubly symmetric kernels, wavelets, ...
other sparsity patterns

32 PDE-based Neural Networks

33 Deep Convolutional Neural Networks

Deep network with convolution operators parameterized by stencils. A common layer is

$Y_{j+1} = P_j Y_j + h K_{j,2}\, \sigma(K_{j,1} Y_j + b_j).$

Motivation for the second convolution?
approximation properties of the double layer
if $\sigma(x) = \max(x, 0)$, entries in $Y$ can only grow

Learning tasks: image classification, semantic segmentation

34 Convolutions and PDEs

Let $y$ be a 1D grid function, $y \approx \mathbf{y}$ ($n$ cells of width $h_x = 1/n$):

$K(\theta)\, y = [\theta_1\ \theta_2\ \theta_3] * y = \Big( \tfrac{\beta_1}{4} [1\ 2\ 1] + \tfrac{\beta_2}{2 h_x} [-1\ 0\ 1] + \tfrac{\beta_3}{h_x^2} [1\ -2\ 1] \Big) * y$

The relation between $\beta$ and $\theta$ is given by

$\begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix} = \underbrace{\begin{pmatrix} 1/4 & -1/(2 h_x) & 1/h_x^2 \\ 1/2 & 0 & -2/h_x^2 \\ 1/4 & 1/(2 h_x) & 1/h_x^2 \end{pmatrix}}_{=A(h_x)} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}.$

In the limit $h_x \to 0$ this gives

$K(\theta(t)) = \beta_1(t) + \beta_2(t)\, \partial_x + \beta_3(t)\, \partial_x^2.$
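
A tiny numerical sketch of this mapping, using $A(h_x)$ as reconstructed above (the mesh width and the values of $\beta$ are assumptions):

```matlab
% Minimal sketch: map (beta1, beta2, beta3) to a 3-point stencil theta
% via A(hx); the mesh width and beta values are illustrative assumptions.
hx   = 1/16;                                  % mesh width
beta = [0.5; 1.0; 0.1];                       % reaction / convection / diffusion weights

A = [1/4, -1/(2*hx),  1/hx^2;
     1/2,  0,        -2/hx^2;
     1/4,  1/(2*hx),  1/hx^2];
theta = A*beta;                               % theta = A(hx)*beta

% sanity check: the derivative stencils sum to zero, so the stencil entries
% sum to beta1 (the reaction term) regardless of hx
disp(sum(theta))                              % = beta1
```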

35 Algebraic Prolongation of Convolution Kernels

$K_H = R\, K_h\, P,$

where
$K_h$: fine mesh operator (given)
$R$: restriction (e.g., averaging)
$P$: prolongation (e.g., interpolation)

Remarks:
Galerkin: $R = \gamma P^\top$
Coarse $\to$ Fine: unique if the kernel size is constant; only a small linear solve required

(Figure: prolongation, coarse convolution, restriction vs. fine convolution.)

36 Scaling Convolution Operators: Algebraic

Let $n_c = 3$, $n_f = 6$ (see E15CoarseToFineConv1D.m), with prolongation $P \in \mathbb{R}^{6 \times 3}$ (linear interpolation, entries such as 1, 3/4, 1/2, 1/4) and restriction $R \in \mathbb{R}^{3 \times 6}$ (entries 1/2).

1. Build $K_H = R K_h P$ for the $j$th unit vector as stencil.
2. Extract a column associated with an interior node.
3. Reshape and get the patch around the node.
4. Vectorize and use as one column.

37 Stability of Convolutional ResNets

A residual neural network is a discretization of

$\partial_t y(t) = K_2(t)\, \sigma\big(K_1(t)\, y(t) + b(t)\big), \quad y(0) = y_0.$

To analyze stability, we need to look at the eigenvalues of

$J(t) = K_2(t)\, \mathrm{diag}\big(\sigma'(K_1(t)\, y(t) + b(t))\big)\, K_1(t).$

It is difficult to analyze the eigenvalues for general $K_2$ and $K_1$.

Easy option: set $K_2 = -K_1^\top$ (and drop the subscript index). Doing so gives

$J_{\mathrm{sym}}(t) = -K(t)^\top \mathrm{diag}\big(\sigma'(K(t)\, y(t) + b(t))\big)\, K(t),$

which is symmetric negative semidefinite: stability if $\partial_t K$ is small. See E15ParabolicCNN.m.
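
A small numerical check of this claim (illustrative random K, y, b; a tanh activation is assumed):

```matlab
% Small numerical check: with K2 = -K1', the Jacobian
% K2*diag(sigma'(K1*y+b))*K1 is symmetric negative semidefinite.
n = 5;
K = randn(n);  y = randn(n,1);  b = randn(n,1);
sigmaPrime = 1 - tanh(K*y + b).^2;            % derivative of the tanh activation (>= 0)
Jsym = -K'*diag(sigmaPrime)*K;                % the K2 = -K1' case
disp(max(eig((Jsym + Jsym')/2)))              % largest eigenvalue <= 0 up to rounding
```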

38 Parabolic CNN

In the original residual net, choose $K_2 = -K_1^\top = -K^\top$. This gives the parabolic CNN

$\partial_t Y(t) = -K(t)^\top \sigma\big(K(t)\, Y(t) + b(t)\big), \quad Y(0) = Y_0.$

Theorem: If $\sigma$ is monotonically non-decreasing, then the forward propagation is stable, i.e., there is an $M > 0$ such that for all $T$

$\| Y(T) - Y_\epsilon(T) \|_F \leq M\, \| Y(0) - Y_\epsilon(0) \|_F,$

where $Y$ and $Y_\epsilon$ are solutions for different initial values.

Use a forward Euler discretization with $h$ small enough:

$Y_{j+1} = Y_j - h\, K_j^\top \sigma(K_j Y_j + b_j), \quad j = 0, 1, \ldots, N-1.$

Similar to anisotropic diffusion (popular in image processing) [4].
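
A hedged sketch of this forward propagation and of the stability statement (sizes, the fixed random weights, the perturbation, and the ReLU activation are assumptions; a trained network would learn K and b, and the weights would vary per layer):

```matlab
% Hedged sketch: parabolic forward propagation (forward Euler) applied to two
% nearby initial values; their distance should not grow for h small enough.
nf = 4;  n = 20;  N = 100;  h = 1/N;
K = randn(nf);  b = randn(nf,1);

Y  = randn(nf, n);                       % initial features
Ye = Y + 1e-2*randn(nf, n);              % perturbed initial features
d0 = norm(Y - Ye, 'fro');
for j = 1:N
    Y  = Y  - h*K'*max(K*Y  + b, 0);     % Y_{j+1} = Y_j - h*K'*sigma(K*Y_j + b), sigma = ReLU
    Ye = Ye - h*K'*max(K*Ye + b, 0);
end
fprintf('||Y(0)-Ye(0)|| = %8.2e,  ||Y(T)-Ye(T)|| = %8.2e\n', d0, norm(Y-Ye,'fro'));
```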

39 Proof: Stability of Parabolic Networks

For ease of notation, assume no bias. We show that

$\partial_t\, \| Y(t) - Y_\epsilon(t) \|_F^2 \leq 0.$

Integrating this over $[0, T]$ yields the stability result. Why? Note that for all $t \in [0, T]$, taking the derivative gives

$\partial_t\, \| Y - Y_\epsilon \|_F^2 = 2\,\big( -K(t)^\top \sigma(K(t) Y) + K(t)^\top \sigma(K(t) Y_\epsilon),\ Y - Y_\epsilon \big) = -2\,\big( \sigma(K(t) Y) - \sigma(K(t) Y_\epsilon),\ K(t)(Y - Y_\epsilon) \big) \leq 0,$

where $(\cdot, \cdot)$ is the inner product and the inequality follows from the monotonicity of the activation function.

40 Relation to Total Variation

Note that the parabolic network can be seen as a gradient flow for

$\phi(Y) = \sum s\big(K(t)\, Y(t) + b(t)\big), \quad \text{where } \sigma(x) = s'(x).$

What is $s$?
$\sigma(x) = \max(x, 0)$: $\quad s(x) = \tfrac{1}{2} \max(x, 0)^2$
$\sigma(x) = \tanh(x)$: $\quad s(x) = \log(\exp(x) + \exp(-x))$

For $K(t) = \nabla_{h_x}$ (a discrete gradient), this shows a relation to total variation [8].

Thanks to Stanley Osher for providing this example.

41 References

[1] U. Ascher. Numerical Methods for Evolutionary Differential Equations. SIAM, Philadelphia, 2008.
[2] U. Ascher and L. Petzold. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. SIAM, Philadelphia, PA, 1998.
[3] U. M. Ascher and C. Greif. A First Course on Numerical Methods. SIAM, Philadelphia, 2011.
[4] Y. Chen and T. Pock. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1256-1272, June 2017.
[5] W. E. A Proposal on Machine Learning via Dynamical Systems. Communications in Mathematics and Statistics, 5(1):1-11, March 2017.
[6] E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34:014004, 2017.
[7] Y. LeCun, B. E. Boser, and J. S. Denker. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, pages 396-404, 1990.
[8] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259-268, 1992.
