Continuous NN. Numerical Methods for Deep Learning. Lars Ruthotto. Departments of Mathematics and Computer Science, Emory University.
2 Course Overview

3 Course Overview
Lecture 1: Linear Models
1. Introduction and Applications
2. Linear Models: Least-Squares and Logistic Regression
Lecture 2: Neural Networks
1. Introduction to Nonlinear Models
2. Single Layer Neural Networks
3. Training Algorithms for Single Layer Neural Networks
4. Neural Networks and Residual Neural Networks (ResNets)
Lecture 3: Neural Networks as Differential Equations
1. ResNets as ODEs
2. Residual CNNs and their relation to PDEs

4 Stability

5 Motivation: Nonlinear Models
In general, impossible to find a linear separator between classes.
[Figure: input features (left) and transformed features (right)]
Goal/Trick: Embed the points in higher dimension and/or move the points to make them linearly separable.

6 Stability of Deep Residual Networks
Why are ResNets more stable? A small change:
$$Y_1 = Y_0 + h\,\sigma(K_0 Y_0 + b_0)$$
$$\vdots$$
$$Y_N = Y_{N-1} + h\,\sigma(K_{N-1} Y_{N-1} + b_{N-1})$$
This is nothing but a forward Euler discretization of the Ordinary Differential Equation (ODE)
$$\partial_t Y(t) = \sigma(K(t)Y(t) + b(t)), \qquad Y(0) = Y_0.$$
We can understand the behavior by studying the dynamics of nonlinear ODEs [6, 5].

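The layer-by-layer update above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the slides: a single example stored as a list, tanh as the activation, and the hypothetical helper names `matvec` and `resnet_forward`.

```python
import math

def matvec(K, y):
    """Dense matrix-vector product K @ y for lists of lists."""
    return [sum(k_ij * y_j for k_ij, y_j in zip(row, y)) for row in K]

def resnet_forward(y0, Ks, bs, h):
    """Forward Euler / ResNet propagation: y_{j+1} = y_j + h * tanh(K_j y_j + b_j)."""
    y = list(y0)
    for K, b in zip(Ks, bs):
        z = matvec(K, y)
        y = [y_i + h * math.tanh(z_i + b_i) for y_i, z_i, b_i in zip(y, z, b)]
    return y

# Illustrative weights (assumed): the same 2x2 kernel in each of N layers.
N = 4
h = 0.5
K = [[0.0, 1.0], [-1.0, 0.0]]
Ks = [K] * N
bs = [[0.0, 0.0]] * N
y_final = resnet_forward([1.0, 0.0], Ks, bs, h)
print(y_final)
```

Each pass through the loop is one layer and, at the same time, one Euler step of size h.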
7 Crash Course: ODE

8 Crash Course on ODEs
Given the ODE
$$\partial_t y(t) = f(t, y(t))$$
Assumptions:
1. $f$ differentiable with Jacobian $J(t, y) = \left(\frac{\partial f}{\partial y}\right)$
2. $J$ changes sufficiently slowly in time
Then (see also [2, 3, 1]):
- If $\mathrm{Re}(\mathrm{eig}(J)) > 0$: Unstable
- If $\mathrm{Re}(\mathrm{eig}(J)) < 0$: Stable (converge to a stationary point)
- If $\mathrm{Re}(\mathrm{eig}(J)) = 0$: Stable, energy bounded

9 Stability of Residual Network
Assume forward propagation of single example $y_0$
$$\partial_t y(t) = \sigma(K(t)y(t) + b(t)), \qquad y(0) = y_0.$$
The Jacobian is
$$J(t) = \mathrm{diag}\big(\sigma'(K(t)y(t) + b(t))\big)\,K(t).$$
Here, $\sigma'(x) \ge 0$ for tanh, ReLU, ...
Hence, problem is stable when
1. $J$ changes slowly in time
2. $\mathrm{Re}(\mathrm{eig}(K(t))) \le 0$ for every $t$
Remember that we learn $K$: ensure stability by regularization/constraints!

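The Jacobian formula above can be checked numerically. The sketch below assumes tanh as the activation (so that the derivative is 1 - tanh^2) and compares the analytic Jacobian with central finite differences; the function names and the sample values of K, b, and y are illustrative only.

```python
import math

def layer(K, b, y):
    """Right-hand side f(y) = tanh(K y + b)."""
    return [math.tanh(sum(k * v for k, v in zip(row, y)) + b_i)
            for row, b_i in zip(K, b)]

def jacobian(K, b, y):
    """J = diag(sigma'(K y + b)) K with sigma = tanh, sigma' = 1 - tanh^2."""
    f = layer(K, b, y)
    return [[(1.0 - f_i ** 2) * k for k in row] for f_i, row in zip(f, K)]

def jacobian_fd(K, b, y, eps=1e-6):
    """Central finite-difference approximation of df/dy."""
    n = len(y)
    cols = []
    for j in range(n):
        yp = list(y)
        ym = list(y)
        yp[j] += eps
        ym[j] -= eps
        fp, fm = layer(K, b, yp), layer(K, b, ym)
        cols.append([(p - m) / (2 * eps) for p, m in zip(fp, fm)])
    # cols[j][i] holds d f_i / d y_j; transpose into row-major form
    return [[cols[j][i] for j in range(n)] for i in range(len(K))]

K = [[0.5, -1.0], [2.0, 0.3]]
b = [0.1, -0.2]
y = [0.4, -0.7]
J = jacobian(K, b, y)
J_fd = jacobian_fd(K, b, y)
max_err = max(abs(a - c) for ra, rc in zip(J, J_fd) for a, c in zip(ra, rc))
print(max_err)
```

The diagonal factor 1 - tanh^2 is never negative, which is the property the slide uses to reduce the stability condition to one on K.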
10 Residual Network as a Path Planning Problem
Forward propagation in residual network (continuous)
$$\partial_t Y(t) = \sigma(K(t)Y(t) + b(t)), \qquad Y(0) = Y_0.$$
The goal is to plan a path (via $K$ and $b$) such that the initial data can be linearly separated.
Question: What is a layer, what is depth?

11 Stability: Continuous vs. Discrete
Assume $K$ is chosen so that the (continuous) forward propagation is stable
$$\partial_t Y(t) = \sigma(K(t)Y(t) + b(t)), \qquad Y(0) = Y_0.$$
And assume we use the forward Euler method to discretize
$$Y_{j+1} = Y_j + h\,\sigma(K_j Y_j + b_j).$$
Is the network stable? Not always...

12 Stability: A Simple Example
Look at the simplest possible forward propagation
$$\partial_t Y(t) = \lambda Y(t).$$
And assume we use the forward Euler method to discretize
$$Y_{j+1} = Y_j + h\lambda Y_j = (1 + h\lambda)Y_j.$$
Then the method is stable only if $|1 + h\lambda| \le 1$.
Not every network is stable! Time step size depends on $\lambda$.

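The stability condition for the scalar test equation is easy to try out. In this small sketch the value lambda = -10 and the step sizes are chosen purely for illustration; with this lambda the condition |1 + h*lambda| <= 1 holds exactly when h <= 0.2.

```python
def euler_growth(lam, h):
    """Magnitude of the per-step factor 1 + h*lam of forward Euler."""
    return abs(1 + h * lam)

def euler_solve(lam, h, n_steps, y0=1.0):
    """Apply y_{j+1} = (1 + h*lam) * y_j for n_steps steps."""
    y = y0
    for _ in range(n_steps):
        y = (1 + h * lam) * y
    return y

lam = -10.0
for h in (0.05, 0.25):
    status = "stable" if euler_growth(lam, h) <= 1 else "unstable"
    print(h, euler_growth(lam, h), status, euler_solve(lam, h, 50))
```

The continuous problem decays for every negative lambda, but the discrete iteration with h = 0.25 blows up, which is the point of the slide.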
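13 Why you should care about stability
$$\min_\theta \frac{1}{2}\|Y_N(\theta) - C\|_F^2, \qquad Y_{j+1}(\theta) = Y_j(\theta) + \frac{10}{N}\tanh\big(K(\theta)Y_j(\theta)\big),$$
where $C = Y_{200}(1, 1)$, $Y_0 \sim \mathcal{N}(0, 1)$, and
$$K(\theta) = \begin{pmatrix} -\theta_1 - \theta_2 & \theta_1 & \theta_2 \\ \theta_2 & -\theta_1 - \theta_2 & \theta_1 \\ \theta_1 & \theta_2 & -\theta_1 - \theta_2 \end{pmatrix}.$$
[Figure: objective for $N = 5$ and objective for $N = 100$]
Next: Compare different inputs (generalization)
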
14 Why you should care about stability - 2
[Figure: objective for $Y_0^{\mathrm{train}}$, objective for $Y_0^{\mathrm{test}}$, and their absolute difference; rows: stable ($N = 100$) and unstable ($N = 5$)]

15 Stability: A Non-Trivial Example
Consider the antisymmetric kernel model, which uses $K(t) - K(t)^\top$ in place of $K(t)$. Here, $\mathrm{Re}(\mathrm{eig}(J(t))) = 0$ for all $\theta$.
Assume we use the forward Euler method to discretize
$$Y_{j+1} = Y_j + h\,\sigma\big((K_j - K_j^\top)Y_j + b_j\big).$$
Tricky question: How to pick $h$ to ensure stability?
Answer: Impossible since eigenvalues of Jacobian are imaginary. Need other method than forward Euler.

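Why no step size works can be seen on a linear test problem. The sketch below drops the activation (an assumption made only to keep the example short) and uses the antisymmetric matrix A = [[0, 1], [-1, 0]], whose eigenvalues are +i and -i.

```python
import math

def euler_amplification(lam, h):
    """Magnitude of the forward Euler factor 1 + h*lam for one step."""
    return abs(1 + h * lam)

def rotate_euler(y, h, n_steps):
    """Forward Euler for y' = A y with antisymmetric A = [[0, 1], [-1, 0]]."""
    y1, y2 = y
    for _ in range(n_steps):
        y1, y2 = y1 + h * y2, y2 - h * y1
    return y1, y2

# For a purely imaginary eigenvalue the factor exceeds 1 for every h > 0.
factors = {h: euler_amplification(1j, h) for h in (1.0, 0.1, 0.01)}
print(factors)

# The exact flow only rotates y, but the Euler iterates grow in norm.
y_end = rotate_euler((1.0, 0.0), h=0.1, n_steps=100)
norm_end = math.hypot(*y_end)
print(norm_end)
```

Each step multiplies the norm by sqrt(1 + h^2), so shrinking h slows the growth but never removes it.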
16 Experiment: Peaks
Compare the three approaches for training a residual neural network:
- EResNet PeaksSGD.m: stochastic gradient descent
- EResNet PeaksNewtonCG.m: Newton-CG with block-diagonal Hessian approximation
- EResNet PeaksVarPro.m: fully coupled solver. Eliminate $\theta$ and use steepest descent/Newton-CG for the reduced problem.

17 Parametric Models

18 Motivation
Recall single layer
$$Z = \sigma(KY + b),$$
where $Y \in \mathbb{R}^{n_f \times n}$, $K \in \mathbb{R}^{m \times n_f}$, $b \in \mathbb{R}$, and $\sigma$ element-wise activation. We saw that $m \gg n_f$ needed to fit training data.
Conservative example: Consider MNIST ($n_f = 28^2$) and use $m = n_f$. This gives 614,656 unknowns for a single layer.
Famous quote: "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
Possible remedies:
- Regularization: penalize $K$
- Parametric model: $K(\theta)$ where $\theta \in \mathbb{R}^p$ with $p \ll m \cdot n_f$.

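19 Some Simple Parametric Models
Diagonal scaling:
$$K(\theta) = \mathrm{diag}(\theta) \in \mathbb{R}^{n_f \times n_f}$$
Advantage: preserves size and structure of data.
Antisymmetric kernel:
$$K(\theta) = \begin{pmatrix} 0 & \theta_1 & \theta_2 \\ -\theta_1 & 0 & \theta_3 \\ -\theta_2 & -\theta_3 & 0 \end{pmatrix}$$
Advantage?: $\mathrm{real}(\lambda_i(K(\theta))) = 0$.
M-matrix:
$$K(\theta) = \begin{pmatrix} \theta_1 + \theta_2 & -\theta_1 & -\theta_2 \\ -\theta_3 & \theta_3 + \theta_4 & -\theta_4 \\ -\theta_5 & -\theta_6 & \theta_5 + \theta_6 \end{pmatrix}, \qquad \theta \ge 0$$
Advantage: like differential operator.
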
20 Differentiating Parametric Models
Need derivatives of model to optimize $\theta$ in
$$E(W\sigma(K(\theta)Y + b), C)$$
(we can re-use previous derivatives and use chain rule).
Note that all previous models are linear in the following sense:
$$K(\theta) = \mathrm{mat}(Q\theta).$$
Therefore, matrix-vector products with the Jacobian simply are
$$J_\theta(K(\theta))\,v = \mathrm{mat}(Qv) \quad\text{and}\quad J_\theta(K(\theta))^\top w = Q^\top w,$$
where $v \in \mathbb{R}^p$ and $w \in \mathbb{R}^m$.

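21 Example: Derivative of M-matrix
$$K(\theta) = \begin{pmatrix} \theta_1 + \theta_2 & -\theta_1 & -\theta_2 \\ -\theta_3 & \theta_3 + \theta_4 & -\theta_4 \\ -\theta_5 & -\theta_6 & \theta_5 + \theta_6 \end{pmatrix}, \qquad \theta \ge 0$$
Verify that this can be written as $K(\theta) = \mathrm{mat}(Q\theta)$ where $Q \in \mathbb{R}^{9 \times 6}$.
Note: not efficient to construct $Q$ when $p$ large but helpful when computing derivatives.
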
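One way to carry out this verification is to build Q column by column from unit vectors. The sketch below assumes a column-major (MATLAB-style) vec/mat convention, which the slides do not state; the helper names are hypothetical.

```python
def m_matrix(theta):
    """3x3 M-matrix model K(theta) with six parameters."""
    t1, t2, t3, t4, t5, t6 = theta
    return [[t1 + t2, -t1, -t2],
            [-t3, t3 + t4, -t4],
            [-t5, -t6, t5 + t6]]

def vec(K):
    """Column-major vectorization of a 3x3 matrix (assumed convention)."""
    return [K[r][c] for c in range(3) for r in range(3)]

def mat(v):
    """Inverse of vec: 9-vector -> 3x3 matrix."""
    return [[v[c * 3 + r] for c in range(3)] for r in range(3)]

def build_Q():
    """Column j of Q is vec(K(e_j)) because K is linear in theta."""
    cols = []
    for j in range(6):
        e = [0.0] * 6
        e[j] = 1.0
        cols.append(vec(m_matrix(e)))
    return [[cols[j][i] for j in range(6)] for i in range(9)]  # 9 x 6

def Q_times(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def Qt_times(Q, w):
    return [sum(Q[i][j] * w[i] for i in range(len(Q))) for j in range(len(Q[0]))]

Q = build_Q()
theta = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
K_from_Q = mat(Q_times(Q, theta))   # equals m_matrix(theta)
Jt_w = Qt_times(Q, [1.0] * 9)       # transpose mat-vec with w = vec(ones)
print(K_from_Q, Jt_w)
```

With Q in hand, the Jacobian products from the previous slide are just products with Q and its transpose.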
22 Convolutional Neural Networks [7]
- $y$: input features
- $z$: output features
- $\theta \in \mathbb{R}^{5 \times 5}$: convolution kernel
- useful for speech, images, videos, ...
- efficient parameterization, efficient codes (GPUs, ...)
- now: CNNs as parametric model and PDEs, simple code
- see E13Conv2D.m

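23 Convolutions in 1D
Let $y, z, \theta : \mathbb{R} \to \mathbb{R}$ be continuous functions, then
$$z(x) = (\theta * y)(x) = \int_{-\infty}^{\infty} \theta(x - t)\,y(t)\,dt.$$
Assume $\theta(x) \ne 0$ only in interval $[-a, a]$ (compact support).
A few properties:
- $\theta * y = \mathcal{F}^{-1}\big((\mathcal{F}\theta)(\mathcal{F}y)\big)$, where $\mathcal{F}$ is the Fourier transform
- $\theta * y = y * \theta$
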
24 Discrete Convolutions in 1D
Let $\theta \in \mathbb{R}^{2k+1}$ be stencil, $y \in \mathbb{R}^{n_f}$ grid function
$$z_i = (\theta * y)_i = \sum_{j=-k}^{k} \theta_j\, y_{i-j}.$$
Example: Discretize $\theta \in \mathbb{R}^3$ (non-zeros only), $y, z \in \mathbb{R}^4$ on regular grid, writing the entries of $y$ as $x_1, \dots, x_4$:
$$z_1 = \theta_3 w_1 + \theta_2 x_1 + \theta_1 x_2$$
$$z_2 = \theta_3 x_1 + \theta_2 x_2 + \theta_1 x_3$$
$$z_3 = \theta_3 x_2 + \theta_2 x_3 + \theta_1 x_4$$
$$z_4 = \theta_3 x_3 + \theta_2 x_4 + \theta_1 w_2$$
where $w_1, w_2$ are used to implement different boundary conditions (right choice? depends...).

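The four equations of the example translate directly into code. In this sketch the ghost values w1 and w2 are passed explicitly, so the same hypothetical function `conv1d` covers zero and periodic boundary conditions; the numbers are illustrative.

```python
def conv1d(theta, x, w1, w2):
    """z_i = theta3*x_{i-1} + theta2*x_i + theta1*x_{i+1} with ghost values w1, w2."""
    t1, t2, t3 = theta
    padded = [w1] + list(x) + [w2]
    return [t3 * padded[i - 1] + t2 * padded[i] + t1 * padded[i + 1]
            for i in range(1, len(x) + 1)]

theta = (1.0, 2.0, 3.0)
x = [1.0, 2.0, 3.0, 4.0]

z_zero = conv1d(theta, x, 0.0, 0.0)        # zero boundary conditions
z_periodic = conv1d(theta, x, x[-1], x[0])  # periodic: w1 = x4, w2 = x1
print(z_zero, z_periodic)
```

Only the first and last outputs differ between the two choices, since interior points never touch the ghost values.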
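25 Structured Matrices - 1
$$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} \theta_3 & \theta_2 & \theta_1 & & & \\ & \theta_3 & \theta_2 & \theta_1 & & \\ & & \theta_3 & \theta_2 & \theta_1 & \\ & & & \theta_3 & \theta_2 & \theta_1 \end{pmatrix} \begin{pmatrix} w_1 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ w_2 \end{pmatrix}$$
Different boundary conditions lead to different structures.
Zero boundary conditions: $w_1 = w_2 = 0$
$$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} \theta_2 & \theta_1 & & \\ \theta_3 & \theta_2 & \theta_1 & \\ & \theta_3 & \theta_2 & \theta_1 \\ & & \theta_3 & \theta_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$$
This is a Toeplitz matrix (constant along diagonals).
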
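26 Structured Matrices - 2
Periodic boundary conditions: $w_1 = x_4$ and $w_2 = x_1$
$$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} \theta_2 & \theta_1 & & \theta_3 \\ \theta_3 & \theta_2 & \theta_1 & \\ & \theta_3 & \theta_2 & \theta_1 \\ \theta_1 & & \theta_3 & \theta_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$$
This is a circulant matrix (each row/column is a periodic shift of the previous row/column).
An attractive property of a circulant matrix is that we can efficiently compute its eigendecomposition
$$K(\theta) = F^{-1}\,\mathrm{diag}(\lambda)\,F,$$
where $F$ is the discrete Fourier transform and the eigenvalues, $\lambda \in \mathbb{C}^4$, can be computed using the first column
$$\lambda = F(K(\theta)u_1), \qquad u_1 = (1, 0, 0, 0)^\top.$$
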
27 Coding: 1D Convolution using FFTs
Let $\theta \in \mathbb{R}^3$ be some stencil and $n_f = m$.
1. build a sparse matrix K for computing the convolution with periodic boundary conditions. Hint: spdiags
2. compute the eigenvalues of K using eig(full(K)) and using fft and first column of K. Compare!
3. verify that norm(K*y - real(ifft(lam.*fft(y)))) is small.
4. repeat previous item for transpose.
5. write code that computes eigenvalues for arbitrary stencil size without building K. Hint: circshift

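For readers without MATLAB, steps 2 and 3 can be mirrored in plain Python. This sketch is not the course code: it uses a dense matrix and a naive O(n^2) DFT written with `cmath` so that it needs only the standard library, and the size n = 8 and stencil values are arbitrary.

```python
import cmath

def dft(v):
    """Naive unnormalized DFT (same sign convention as MATLAB's fft)."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(v):
    """Inverse of dft, including the 1/n factor."""
    n = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]

def periodic_conv_matrix(theta, n):
    """Dense circulant matrix for a 3-point stencil with periodic boundaries."""
    t1, t2, t3 = theta
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][(i - 1) % n] += t3
        K[i][i] += t2
        K[i][(i + 1) % n] += t1
    return K

theta = (1.0, 2.0, 3.0)
n = 8
K = periodic_conv_matrix(theta, n)
lam = dft([row[0] for row in K])  # eigenvalues from the first column
y = [float(i + 1) for i in range(n)]

Ky_direct = [sum(K[i][j] * y[j] for j in range(n)) for i in range(n)]
Ky_fft = [z.real for z in idft([l * f for l, f in zip(lam, dft(y))])]
residual = max(abs(a - b) for a, b in zip(Ky_direct, Ky_fft))
print(residual)
```

A real implementation would use an FFT library, which brings the cost down from O(n^2) to O(n log n).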
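28 Derivatives of 1D Convolution - 1
Recall that we need a way to compute
$$J_\theta(K(\theta)Y)\,v \quad\text{and}\quad J_\theta(K(\theta)Y)^\top w, \qquad (J_\theta \in \mathbb{R}^{m \times p})$$
(note that we put $Y$ inside the bracket to avoid tensors).
Assume single example, $y$. Since we have periodic boundary conditions
$$K(\theta)y = \mathrm{real}\big(F^{-1}(\lambda(\theta) \odot Fy)\big) = \mathrm{real}\big(F^{-1}\,\mathrm{diag}(Fy)\,\lambda(\theta)\big), \qquad \lambda(\theta) = F(K(\theta)u_1).$$
Need to differentiate eigenvalues w.r.t. $\theta$. Note linearity
$$K(\theta)u_1 = Q\theta, \qquad Q = \;?$$
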
29 Derivatives of 1D Convolution - 2
Assume we have
$$K(\theta)y = \mathrm{real}\big(F^{-1}\,\mathrm{diag}(Fy)\,FQ\theta\big).$$
Then mat-vecs with Jacobian are easy to compute
$$J_\theta(K(\theta)y)\,v = \mathrm{real}\big(F^{-1}(\mathrm{diag}(Fy)\,FQv)\big)$$
and (note that $F^\top = F$ and $(F^{-1})^\top = F^{-1}$)
$$J_\theta(K(\theta)y)^\top w = \mathrm{real}\big(Q^\top F\,\mathrm{diag}(Fy)\,F^{-1}w\big).$$
Code this and check Jacobian and its transpose using conv1d.m!

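A possible Python version of this exercise is sketched below; it is not the referenced conv1d.m. It assumes a 3-point stencil with periodic boundaries, so that Q simply places (theta2, theta3, ..., theta1) into the first column of K, and it again uses a naive DFT. The check at the end is the usual adjoint test, w'(Jv) = v'(J'w).

```python
import cmath

def dft(v):
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(v):
    n = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]

def Q_times(v, n):
    """Q v: first column of the periodic matrix K(v) for a stencil v = (v1, v2, v3)."""
    col = [0.0] * n
    col[0] = v[1]
    col[1] = v[2]
    col[n - 1] = v[0]
    return col

def Qt_times(w):
    """Q^T w: gathers the three entries that Q scatters."""
    n = len(w)
    return [w[n - 1], w[0], w[1]]

def jac_vec(y, v):
    """J v = real(F^{-1}(diag(F y) F Q v))."""
    Fy = dft(y)
    FQv = dft(Q_times(v, len(y)))
    return [z.real for z in idft([a * b for a, b in zip(Fy, FQv)])]

def jac_t_vec(y, w):
    """J^T w = real(Q^T F diag(F y) F^{-1} w)."""
    Fy = dft(y)
    u = dft([a * b for a, b in zip(Fy, idft(w))])
    return Qt_times([z.real for z in u])

y = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [0.5, -1.0, 2.0]
w = [1.0, 0.0, -2.0, 3.0, 1.0]
lhs = sum(wi * zi for wi, zi in zip(w, jac_vec(y, v)))
rhs = sum(vi * gi for vi, gi in zip(v, jac_t_vec(y, w)))
print(lhs, rhs)
```

Because K(theta)y is linear in theta, jac_vec(y, v) is the same as applying the convolution with stencil v to y, which gives a second independent check.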
30 Extension: Width of CNNs
[Figure: RGB image, input channels, output channels]
Width of CNN can be controlled by number of input and output channels of each layer. Let $y = (y_R, y_G, y_B)$, then we might compute
$$\begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix} = \begin{pmatrix} K_{11}(\theta_{11}) & K_{12}(\theta_{12}) & K_{13}(\theta_{13}) \\ K_{21}(\theta_{21}) & K_{22}(\theta_{22}) & K_{23}(\theta_{23}) \\ K_{31}(\theta_{31}) & K_{32}(\theta_{32}) & K_{33}(\theta_{33}) \\ K_{41}(\theta_{41}) & K_{42}(\theta_{42}) & K_{43}(\theta_{43}) \end{pmatrix} \begin{pmatrix} y_R \\ y_G \\ y_B \end{pmatrix},$$
where $K_{ij}$ is a 2D convolution operator with stencil $\theta_{ij}$.

31 Outlook: Possible Extensions
For now, we just introduced the very basic convolution layer. CNNs used in practice also use the following components:
- pooling: reduce image resolution (e.g. average over patches)
- stride: for example, a stride of two reduces image resolution by computing $z$ only at every other pixel.
Build your own parametric model (ideas for projects):
- M-matrix for convolution
- cheaper convolution models: separable kernels, doubly symmetric kernels
- Wavelet, ...
- other sparsity patterns

32 PDE-based Neural Networks

33 Deep Convolutional Neural Networks
Deep network with convolution operators parameterized by stencils. Common:
$$Y_{j+1} = P_j Y_j + h\,K_{j,2}\,\sigma(K_{j,1} Y_j + b_j).$$
Motivation for second convolution?
- approximation properties of double layer
- if $\sigma(x) = \max(x, 0)$, entries in $Y$ can only grow
Learning tasks:
- image classification
- semantic segmentation

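34 Convolutions and PDEs
Let $y$ be 1D grid function, $y \approx y(x)$ ($n$ cells of width $h_x = 1/n$)
$$K(\theta)y = [\theta_1\ \theta_2\ \theta_3] * y = \left(\frac{\beta_1}{4}[1\ 2\ 1] + \frac{\beta_2}{2h_x}[-1\ 0\ 1] + \frac{\beta_3}{h_x^2}[-1\ 2\ -1]\right) * y.$$
Relation between $\beta$ and $\theta$ given by
$$\underbrace{\begin{pmatrix} 1/4 & -1/(2h_x) & -1/h_x^2 \\ 1/2 & 0 & 2/h_x^2 \\ 1/4 & 1/(2h_x) & -1/h_x^2 \end{pmatrix}}_{=A(h_x)} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix}.$$
In the limit $h_x \to 0$ this gives
$$K(\theta(t)) = \beta_1(t) + \beta_2(t)\,\partial_x + \beta_3(t)\,\partial_x^2.$$
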
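The change of variables between beta and theta is a small 3x3 linear system with a closed-form inverse. The sketch below applies A(h_x) row by row and inverts it by hand (summing the rows gives beta1, the difference of the outer rows gives beta2, and the remaining combination gives beta3); the function names and sample values are illustrative.

```python
def theta_from_beta(beta, h):
    """theta = A(h) beta, written out row by row."""
    b1, b2, b3 = beta
    return (b1 / 4 - b2 / (2 * h) - b3 / h ** 2,
            b1 / 2 + 2 * b3 / h ** 2,
            b1 / 4 + b2 / (2 * h) - b3 / h ** 2)

def beta_from_theta(theta, h):
    """Inverse map, derived by adding and subtracting the rows of A(h)."""
    t1, t2, t3 = theta
    return (t1 + t2 + t3,
            h * (t3 - t1),
            h ** 2 * (t2 - t1 - t3) / 4)

h = 0.1
beta = (1.0, 2.0, 3.0)
theta = theta_from_beta(beta, h)
beta_back = beta_from_theta(theta, h)
print(theta, beta_back)

# A pure averaging stencil corresponds to beta = (1, 0, 0).
print(theta_from_beta((1.0, 0.0, 0.0), h))
```

The closed form shows why the solve is cheap: no matrix factorization is needed for a 3-point stencil.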
35 Algebraic Prolongation of Convolution Kernels
$$K_H = R\,K_h\,P,$$
where
- $K_h$: fine mesh operator (given)
- $R$: restriction (e.g., averaging)
- $P$: prolongation (e.g., interpolation)
Remarks:
- Galerkin: $R = \gamma P^\top$
- Coarse to fine: unique if kernel size constant.
- only small linear solve required
[Figure: prolongation, coarse convolution, restriction, fine convolution]

36 Scaling Convolution Operators: Algebraic
Let $n_c = 3$, $n_f = 6$ (see E15CoarseToFineConv1D.m).
Prolongation and restriction:
$$P = \begin{pmatrix} 1 & & \\ 3/4 & 1/4 & \\ 1/4 & 3/4 & \\ & 3/4 & 1/4 \\ & 1/4 & 3/4 \\ & & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1/2 & 1/2 & & & & \\ & & 1/2 & 1/2 & & \\ & & & & 1/2 & 1/2 \end{pmatrix}$$
1. build $K_H = R\,K_h\,P$ for $j$th unit vector as stencil
2. extract a column associated with interior node
3. reshape and get patch around node
4. vectorize and use as one column

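The four steps can be sketched for the 1D case with n_c = 3 and n_f = 6. This is not the referenced E15CoarseToFineConv1D.m: the entries of P and R, the zero boundary conditions of the fine operator, and all helper names are assumptions made for illustration. In 1D the reshape in step 3 is trivial, so the interior column itself serves as the patch.

```python
def fine_operator(theta, n):
    """Zero-boundary convolution matrix: K[i][i-1]=theta3, K[i][i]=theta2, K[i][i+1]=theta1."""
    t1, t2, t3 = theta
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            K[i][i - 1] = t3
        K[i][i] = t2
        if i < n - 1:
            K[i][i + 1] = t1
    return K

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Assumed cell-centered interpolation (P) and averaging (R) for n_c = 3, n_f = 6.
P = [[1, 0, 0], [0.75, 0.25, 0], [0.25, 0.75, 0],
     [0, 0.75, 0.25], [0, 0.25, 0.75], [0, 0, 1]]
R = [[0.5, 0.5, 0, 0, 0, 0], [0, 0, 0.5, 0.5, 0, 0], [0, 0, 0, 0, 0.5, 0.5]]

def coarse_operator(theta):
    """Step 1: K_H = R K_h P for a given fine stencil."""
    return matmul(R, matmul(fine_operator(theta, 6), P))

def interior_column(K_H):
    """Steps 2-3: column of the interior coarse node (index 1)."""
    return [row[1] for row in K_H]

# Step 4: one column per unit stencil gives the map from fine to coarse stencil.
cols = [interior_column(coarse_operator(e)) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
M = [[cols[j][i] for j in range(3)] for i in range(3)]

def coarse_stencil(theta):
    return [sum(M[i][j] * theta[j] for j in range(3)) for i in range(3)]

print(coarse_stencil([1.0, -2.0, 1.0]))
```

Going from a coarse stencil to a fine one then amounts to solving the small linear system with M, which is the "small linear solve" mentioned on the previous slide.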
37 Stability of Convolutional ResNets
Residual Neural Network is discretization of
$$\partial_t y(t) = K_2(t)\,\sigma(K_1(t)y(t) + b(t)), \qquad y(0) = y_0.$$
To analyze stability need to look at eigenvalues of
$$J(t) = K_2(t)\,\mathrm{diag}\big(\sigma'(K_1(t)y(t) + b(t))\big)\,K_1(t).$$
Difficult to analyze eigenvalues for general $K_2$ and $K_1$.
Easy option: Set $K_2 = -K_1^\top$ (and drop subscript index). Doing so gives:
$$J_{\mathrm{sym}}(t) = -K(t)^\top \mathrm{diag}\big(\sigma'(K(t)y(t) + b(t))\big)\,K(t)$$
- symmetric negative semidefinite
- stability if $\partial_t K$ small
- E15ParabolicCNN.m

38 Parabolic CNN
In original Residual Net choose $K_2 = -K_1^\top = -K^\top$. This gives the parabolic CNN
$$\partial_t Y(t) = -K(t)^\top \sigma(K(t)Y(t) + b(t)), \qquad Y(0) = Y_0.$$
Theorem: If $\sigma$ is monotonically non-decreasing, then the forward propagation is stable, i.e., there is an $M > 0$ such that for all $T$
$$\|Y(T) - Y_\epsilon(T)\|_F \le M\,\|Y(0) - Y_\epsilon(0)\|_F,$$
where $Y$ and $Y_\epsilon$ are solutions for different initial values.
Use forward Euler discretization with $h$ small enough
$$Y_{j+1} = Y_j - h\,K_j^\top \sigma(K_j Y_j + b_j), \qquad j = 0, 1, \dots, N - 1.$$
Similar to anisotropic diffusion (popular in image processing) [4].

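The stability statement can be observed numerically on a toy problem. The sketch below assumes tanh as the activation, a fixed 3x2 kernel shared by all layers, and a step size h = 0.1, which is below 2 divided by the squared Frobenius norm of K; none of these values come from the slides. It tracks the distance between two trajectories started from nearby initial values.

```python
import math

def matvec(K, y):
    return [sum(k * v for k, v in zip(row, y)) for row in K]

def mat_t_vec(K, z):
    """K^T z for a list-of-lists matrix K."""
    return [sum(K[i][j] * z[i] for i in range(len(K))) for j in range(len(K[0]))]

def parabolic_step(K, b, y, h):
    """One forward Euler step: y - h * K^T tanh(K y + b)."""
    z = [math.tanh(v + b_i) for v, b_i in zip(matvec(K, y), b)]
    g = mat_t_vec(K, z)
    return [y_i - h * g_i for y_i, g_i in zip(y, g)]

K = [[1.0, 2.0], [0.0, 1.0], [-1.0, 1.0]]
b = [0.1, -0.2, 0.0]
h = 0.1

y = [1.0, -0.5]
y_eps = [1.05, -0.45]  # perturbed initial value
distances = [math.dist(y, y_eps)]
for _ in range(30):
    y = parabolic_step(K, b, y, h)
    y_eps = parabolic_step(K, b, y_eps, h)
    distances.append(math.dist(y, y_eps))
print(distances[0], distances[-1])
```

In this setup the recorded distances never increase, in line with the theorem with M = 1; a much larger h would break this property.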
39 Proof: Stability of Parabolic Networks
For ease of notation, assume no bias. We show that
$$\partial_t \|Y(t) - Y_\epsilon(t)\|_F^2 \le 0.$$
Integrating this over $[0, T]$ yields the stability result.
Why? Note that for all $t \in [0, T]$, taking the derivative gives (up to a factor of 2)
$$\big(-K(t)^\top \sigma(K(t)Y) + K(t)^\top \sigma(K(t)Y_\epsilon),\; Y - Y_\epsilon\big) = -\big(\sigma(K(t)Y) - \sigma(K(t)Y_\epsilon),\; K(t)(Y - Y_\epsilon)\big) \le 0,$$
where $(\cdot, \cdot)$ is the inner product and the inequality follows from the monotonicity of the activation function.

40 Relation to Total Variation
Note that the parabolic network can be seen as a gradient flow for
$$\phi(Y) = s(K(t)Y(t) + b(t)),$$
where $\sigma(x) = s'(x)$. What is $s$?
- $\sigma(x) = \max(x, 0)$: $s(x) = \frac{1}{2}\max(x, 0)^2$
- $\sigma(x) = \tanh(x)$: $s(x) = \log(\exp(x) + \exp(-x))$
For $K(t) = \nabla_{h_x}$, this shows a relation to total variation [8].
Thanks to Stanley Osher for providing this example.

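The relation between each activation and its potential can be checked with finite differences. The sketch assumes the factor 1/2 in the ReLU potential, so that its derivative is exactly max(x, 0), and uses central differences at a few arbitrary sample points.

```python
import math

def s_relu(x):
    """Potential whose derivative is max(x, 0)."""
    return 0.5 * max(x, 0.0) ** 2

def s_tanh(x):
    """Potential whose derivative is tanh(x)."""
    return math.log(math.exp(x) + math.exp(-x))

def derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

points = (-1.3, 0.7, 2.0)
err_relu = max(abs(derivative(s_relu, x) - max(x, 0.0)) for x in points)
err_tanh = max(abs(derivative(s_tanh, x) - math.tanh(x)) for x in points)
print(err_relu, err_tanh)
```

Both errors are at the level of the finite-difference accuracy, which is consistent with the two potentials listed above.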
41 References
[1] U. Ascher. Numerical Methods for Evolutionary Differential Equations. SIAM, Philadelphia.
[2] U. Ascher and L. Petzold. Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations. SIAM, Philadelphia, PA.
[3] U. M. Ascher and C. Greif. A First Course in Numerical Methods. SIAM, Philadelphia.
[4] Y. Chen and T. Pock. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), June.
[5] W. E. A Proposal on Machine Learning via Dynamical Systems. Communications in Mathematics and Statistics, 5(1):1-11, Mar.
[6] E. Haber and L. Ruthotto. Stable architectures for deep neural networks. Inverse Problems, 34:014004.
[7] Y. LeCun, B. E. Boser, and J. S. Denker. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems.
[8] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4), 1992.
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationRegularization in Neural Networks
Regularization in Neural Networks Sargur Srihari 1 Topics in Neural Network Regularization What is regularization? Methods 1. Determining optimal number of hidden units 2. Use of regularizer in error function
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable
More informationAPPLICATIONS OF FD APPROXIMATIONS FOR SOLVING ORDINARY DIFFERENTIAL EQUATIONS
LECTURE 10 APPLICATIONS OF FD APPROXIMATIONS FOR SOLVING ORDINARY DIFFERENTIAL EQUATIONS Ordinary Differential Equations Initial Value Problems For Initial Value problems (IVP s), conditions are specified
More informationIPAM Summer School Optimization methods for machine learning. Jorge Nocedal
IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationFeedforward Neural Networks. Michael Collins, Columbia University
Feedforward Neural Networks Michael Collins, Columbia University Recap: Log-linear Models A log-linear model takes the following form: p(y x; v) = exp (v f(x, y)) y Y exp (v f(x, y )) f(x, y) is the representation
More informationNeural Networks. Intro to AI Bert Huang Virginia Tech
Neural Networks Intro to AI Bert Huang Virginia Tech Outline Biological inspiration for artificial neural networks Linear vs. nonlinear functions Learning with neural networks: back propagation https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg
More informationThe XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic
The XOR problem Fall 2013 x 2 Lecture 9, February 25, 2015 x 1 The XOR problem The XOR problem x 1 x 2 x 2 x 1 (picture adapted from Bishop 2006) It s the features, stupid It s the features, stupid The
More information9.2 Support Vector Machines 159
9.2 Support Vector Machines 159 9.2.3 Kernel Methods We have all the tools together now to make an exciting step. Let us summarize our findings. We are interested in regularized estimation problems of
More informationCSC321 Lecture 15: Exploding and Vanishing Gradients
CSC321 Lecture 15: Exploding and Vanishing Gradients Roger Grosse Roger Grosse CSC321 Lecture 15: Exploding and Vanishing Gradients 1 / 23 Overview Yesterday, we saw how to compute the gradient descent
More informationConvolutional Neural Networks
Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»
More informationLecture 15: Exploding and Vanishing Gradients
Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train
More informationChannel Pruning and Other Methods for Compressing CNN
Channel Pruning and Other Methods for Compressing CNN Yihui He Xi an Jiaotong Univ. October 12, 2017 Yihui He (Xi an Jiaotong Univ.) Channel Pruning and Other Methods for Compressing CNN October 12, 2017
More informationIntroduction to Deep Neural Networks
Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More informationarxiv: v3 [cs.lg] 14 Jan 2018
A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation Gang Chen Department of Computer Science and Engineering, SUNY at Buffalo arxiv:1610.02583v3 [cs.lg] 14 Jan 2018 1 abstract We describe
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationarxiv: v1 [stat.ml] 27 Jul 2016
Convolutional Neural Networks Analyzed via Convolutional Sparse Coding arxiv:1607.08194v1 [stat.ml] 27 Jul 2016 Vardan Papyan*, Yaniv Romano*, Michael Elad October 17, 2017 Abstract In recent years, deep
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More information