RegML 2018 Class 8 Deep learning
Lorenzo Rosasco (UNIGE-MIT-IIT), June 18, 2018

Supervised vs unsupervised learning?
So far we have been thinking of learning schemes made in two steps:
$f(x) = \langle w, \Phi(x)\rangle_{\mathcal F}, \quad x \in X$
- unsupervised learning of $\Phi$
- supervised learning of $w$
But can we perform only one learning step?

In practice all is multi-layer! (an old slide)
Typical data representation schemes, e.g. in vision or speech, involve multiple stages (layers).
Pipeline: raw data are often processed by first computing some low-level features, then learning some mid-level representation, ..., and finally using supervised learning.
These stages are often done separately, but is it possible to design end-to-end learning systems?

In practice all is (becoming) deep learning! (updated slide)
Typical data representation schemes, e.g. in vision or speech, involve deep learning.
Pipeline: design some wild, but differentiable, hierarchical architecture and proceed with end-to-end learning!
Ok, maybe not all is deep learning, but let's take a look.

Shallow nets
$f(x) = \langle w, \Phi(x)\rangle$, with $x \mapsto \Phi(x)$ fixed.
Empirical Risk Minimization (ERM):
$\min_w \frac{1}{n}\sum_{i=1}^n (y_i - \langle w, \Phi(x_i)\rangle)^2$
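
Since $\Phi$ is fixed, the ERM problem above is just linear least squares in the feature space. A minimal numpy sketch; the random sinusoidal feature map and the small ridge term are illustrative choices, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(X, W_rand):
    """A fixed (not learned) feature map; here random sinusoidal features."""
    return np.sin(X @ W_rand)

X = rng.normal(size=(100, 5))           # n = 100 points in d = 5
y = rng.normal(size=100)                # toy targets
W_rand = rng.normal(size=(5, 30))       # fixed map into a 30-dim feature space

Phi = phi(X, W_rand)                    # n x 30 design matrix
lam = 0.1                               # small ridge term for numerical stability
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(30), Phi.T @ y)

f = lambda Xnew: phi(Xnew, W_rand) @ w  # learned f(x) = <w, Phi(x)>
```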

Neural Nets
Basic idea of neural networks: functions obtained by composition,
$\Phi = \Phi_L \circ \cdots \circ \Phi_2 \circ \Phi_1.$
Let $d_0 = d$ and $\Phi_\ell : \mathbb{R}^{d_{\ell-1}} \to \mathbb{R}^{d_\ell}$, $\ell = 1, \dots, L$, and in particular
$\Phi_\ell = \sigma \circ W_\ell, \quad \ell = 1, \dots, L,$
where $W_\ell : \mathbb{R}^{d_{\ell-1}} \to \mathbb{R}^{d_\ell}$ is linear/affine and $\sigma$ is a nonlinear map acting component-wise, $\sigma : \mathbb{R} \to \mathbb{R}$.

Deep neural nets
$f(x) = \langle w, \Phi_L(x)\rangle$, with the compositional representation $\Phi_L = \Phi_L \circ \cdots \circ \Phi_1$, where $\Phi_1 = \sigma \circ W_1, \dots, \Phi_L = \sigma \circ W_L$.
ERM:
$\min_{w, (W_\ell)_\ell} \frac{1}{n}\sum_{i=1}^n (y_i - \langle w, \Phi_L(x_i)\rangle)^2$

Neural networks terminology
$\Phi_L(x) = \sigma(W_L \cdots \sigma(W_2\, \sigma(W_1 x)))$
- Each intermediate representation corresponds to a (hidden) layer.
- The dimensionalities $(d_\ell)_\ell$ correspond to the number of hidden units.
- The nonlinearity is called the activation function.
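
The compositional representation can be computed with a loop over layers. A minimal numpy sketch, assuming the hinge/ReLU activation and illustrative layer sizes:

```python
import numpy as np

def forward(x, Ws, bs, sigma=lambda a: np.maximum(a, 0.0)):
    """Phi_L(x) = sigma(W_L ... sigma(W_2 sigma(W_1 x))).

    Ws = [W_1, ..., W_L], with W_l of shape (d_l, d_{l-1});
    bs holds the affine offsets; sigma acts component-wise."""
    h = x
    for W, b in zip(Ws, bs):
        h = sigma(W @ h + b)        # one layer: affine map, then nonlinearity
    return h

rng = np.random.default_rng(0)
dims = [5, 16, 16, 8]                                    # d_0, ..., d_3
Ws = [0.1 * rng.normal(size=(dims[l + 1], dims[l])) for l in range(3)]
bs = [np.zeros(dims[l + 1]) for l in range(3)]
x = rng.normal(size=dims[0])
print(forward(x, Ws, bs).shape)                          # (8,)
```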

Neural networks illustrated
[Figure: a one-hidden-layer network mapping input x through weight matrix W to output y.]
Each neuron computes an inner product based on a column of a weight matrix W. The nonlinearity σ is the neuron activation function.

Activation functions
- logistic function: $s(\alpha) = (1 + e^{-\alpha})^{-1}$, $\alpha \in \mathbb{R}$,
- hyperbolic tangent: $s(\alpha) = (e^{\alpha} - e^{-\alpha})/(e^{\alpha} + e^{-\alpha})$, $\alpha \in \mathbb{R}$,
- hinge: $s(\alpha) = |\alpha|_+ = \max(\alpha, 0)$, $\alpha \in \mathbb{R}$.
Note: if the activation is chosen to be linear, the architecture is equivalent to one layer.
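
The three activations in numpy, acting component-wise on an array (a sketch):

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))        # (1 + e^{-a})^{-1}

def hyperbolic_tangent(a):
    return np.tanh(a)                      # (e^a - e^{-a}) / (e^a + e^{-a})

def hinge(a):
    return np.maximum(a, 0.0)              # |a|_+, a.k.a. ReLU
```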

Neural networks function spaces
Consider the nonlinear space of functions of the form
$f_{w,(W_\ell)_\ell} : X \to \mathbb{R}, \quad f_{w,(W_\ell)_\ell}(x) = \langle w, \Phi_{(W_\ell)_\ell}(x)\rangle, \quad \Phi_{(W_\ell)_\ell}(x) = \sigma(W_L \cdots \sigma(W_2\, \sigma(W_1 x)))$
Very little structure, but we can:
- train by gradient descent (next)
- get (some) approximation/statistical guarantees (later)

One layer neural networks
Consider only one hidden layer:
$f_{w,W}(x) = \langle w, \sigma(Wx)\rangle = \sum_{j=1}^{u} w_j\, \sigma(\langle W_j, x\rangle)$
typically optimized given supervised data by minimizing
$\frac{1}{n}\sum_{i=1}^n (y_i - f_{w,W}(x_i))^2,$
possibly with norm constraints on the weights (regularization).
The problem is non-convex! (though possibly smooth, depending on $\sigma$)

Back-propagation
Empirical risk minimization:
$\min_{w,W} \hat E(w, W), \quad \hat E(w, W) = \sum_{i=1}^n (y_i - f_{(w,W)}(x_i))^2.$
An approximate minimizer is computed via the following update rules:
$w_j^{t+1} = w_j^t - \gamma_t \frac{\partial \hat E}{\partial w_j}(w^t, W^t)$
$W_{j,k}^{t+1} = W_{j,k}^t - \gamma_t \frac{\partial \hat E}{\partial W_{j,k}}(w^{t+1}, W^t)$
where the step-size $(\gamma_t)_t$ is often called the learning rate.

Back-propagation & chain rule
Direct computations show that, with errors $c_i = -2\,(y_i - f_{(w,W)}(x_i))$ and hidden values $h_{j,i} = \sigma(\langle W_j, x_i\rangle)$:
$\frac{\partial \hat E}{\partial w_j}(w, W) = \sum_{i=1}^n c_i\, h_{j,i}$
$\frac{\partial \hat E}{\partial W_{j,k}}(w, W) = \sum_{i=1}^n c_i\, w_j\, \sigma'(\langle W_j, x_i\rangle)\, x_i^k$
Using the above equations, the updates are performed in two steps:
- Forward pass: compute function values keeping weights fixed.
- Backward pass: compute the errors at the output and propagate them back through the network.
Hence the weights are updated.
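
For the one-hidden-layer case, the forward and backward passes above fit in a few lines. A minimal numpy sketch with the logistic activation; the toy data, sizes, and constant step size are illustrative, and the $W$-gradient is evaluated at $(w^{t+1}, W^t)$ as in the update rules of the previous slide:

```python
import numpy as np

def sigma(a):  return 1.0 / (1.0 + np.exp(-a))
def dsigma(a): return sigma(a) * (1.0 - sigma(a))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # toy data: n = 200, d = 5
y = rng.normal(size=200)
u = 20                                     # hidden units
W = 0.1 * rng.normal(size=(u, 5))          # hidden-layer weights
w = 0.1 * rng.normal(size=u)               # output weights
gamma = 0.01                               # learning rate

for t in range(500):
    # forward pass: compute function values keeping weights fixed
    A = X @ W.T                            # pre-activations <W_j, x_i>
    H = sigma(A)                           # hidden values h_{j,i}
    f = H @ w                              # f(x_i) = <w, sigma(W x_i)>
    # backward pass: compute errors and propagate them
    c = -2.0 * (y - f)                     # errors c_i
    # update w first (1/n averaging folded in for a stable step) ...
    w = w - gamma * (H.T @ c) / len(y)
    # ... then W, using the already-updated w as in the update rules
    W = W - gamma * (np.outer(c, w) * dsigma(A)).T @ X / len(y)
```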

Few remarks
- Multiple layers can be analogously considered.
- Batch gradient descent can be replaced by stochastic gradient (see the sketch below).
- Faster iterations are available, e.g. variable metric/accelerated gradient.
- Online update rules are potentially biologically plausible, cf. Hebbian learning rules describing neuron plasticity.
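
A sketch of the stochastic variant, reusing the definitions of the previous sketch: each step uses the gradient on a sampled mini-batch instead of all $n$ points. The batch size is an illustrative choice:

```python
batch = 16
for t in range(5000):
    idx = rng.integers(0, len(y), size=batch)   # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    A = Xb @ W.T
    H = sigma(A)
    c = -2.0 * (yb - H @ w)                     # errors on the batch only
    w = w - gamma * (H.T @ c) / batch
    W = W - gamma * (np.outer(c, w) * dsigma(A)).T @ Xb / batch
```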

Computations
$\min_{w,W} \hat E(w, W), \quad \hat E(w, W) = \sum_{i=1}^n (y_i - f_{(w,W)}(x_i))^2.$
[Figure: convex vs. non-convex optimization; a convex problem has a unique (global) optimum, a non-convex one has multiple local optima.]
- The problem is non-convex: in practice there is no access to a global minimizer, but only to approximate minimizers.
- Convergence of back-prop to a reasonable local minimum can depend heavily on the initialization.
- Empirically: the more the layers, the easier it is to find good minima.

An older idea: pre-training and unsupervised learning
Pre-training: use unsupervised training of each layer to initialize supervised training.
Potential benefit of unlabeled data.

Auto-encoders
[Figure: a network mapping the input x through a hidden layer with weights W back to x.]
A neural network with one input layer, one output layer, and one (or more) hidden layers connecting them.
- The output layer has as many nodes as the input layer.
- It is trained to predict the input itself rather than some target output.

Auto-encoders (cont.)
An auto-encoder with one hidden layer of k units can be seen as a representation-reconstruction pair, with $\mathcal F_k = \mathbb{R}^k$, $k < d$, and
$\Phi : X \to \mathcal F_k, \quad \Phi(x) = \sigma(W x), \quad x \in X$
$\Psi : \mathcal F_k \to X, \quad \Psi(\beta) = \sigma(W' \beta), \quad \beta \in \mathcal F_k,$
where $W'$ denotes the output-layer weights mapping the code back to $\mathbb{R}^d$.
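
A minimal numpy sketch of the pair $(\Phi, \Psi)$, trained by gradient descent on the squared reconstruction error $\sum_i \|x_i - \Psi(\Phi(x_i))\|^2$; the data in $[0,1]$, the sizes, and the step size are illustrative choices:

```python
import numpy as np

def s(a):  return 1.0 / (1.0 + np.exp(-a))     # logistic activation
def ds(a): return s(a) * (1.0 - s(a))

rng = np.random.default_rng(0)
d, k = 20, 5                                   # k < d: compressed code
X = rng.uniform(size=(500, d))                 # toy data scaled to [0, 1]
W  = 0.1 * rng.normal(size=(k, d))             # encoder: Phi(x) = s(W x)
Wp = 0.1 * rng.normal(size=(d, k))             # decoder: Psi(b) = s(W' b)
gamma = 0.5

for t in range(2000):
    Z = X @ W.T;  H = s(Z)                     # codes beta_i = Phi(x_i)
    U = H @ Wp.T; Xhat = s(U)                  # reconstructions Psi(beta_i)
    Gu = 2.0 * (Xhat - X) * ds(U) / len(X)     # error at the output layer
    Gz = (Gu @ Wp) * ds(Z)                     # propagated to the code layer
    Wp -= gamma * Gu.T @ H                     # gradient steps on both maps
    W  -= gamma * Gz.T @ X
```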

Auto-encoders & dictionary learning
$\Phi(x) = \sigma(Wx), \quad \Psi(\beta) = \sigma(W'\beta)$
- The above formulation is closely related to dictionary learning: the weights can be seen as dictionary atoms.
- Reconstructive approaches have connections with so-called energy models [LeCun et al. ...].
- Possible probabilistic/Bayesian interpretations/variations (e.g. Boltzmann machines [Hinton et al. ...]).

Stacked auto-encoders
Multiple layers of auto-encoders can be stacked [Hinton et al. '06],
$\cdots \circ (\Phi_1 \circ \Psi_1) \circ (\Phi_2 \circ \Psi_2) \circ \cdots \circ (\Phi_\ell \circ \Psi_\ell) \circ \cdots$
(each pair $\Phi_j \circ \Psi_j$ an auto-encoder), with the potential of obtaining richer representations.

Beyond reconstruction
In many applications the connectivity of neural networks is limited in a specific way:
- weights in the first few layers have smaller support and are repeated;
- subsampling (pooling) is interleaved with standard neural net computations.
The obtained architectures are called convolutional neural networks.

Convolutional layers
Consider the composite representation $\Phi : X \to \mathcal F$, $\Phi = \sigma \circ W$, with
- representation by filtering: $W : X \to \mathcal F$,
- representation by pooling: $\sigma : \mathcal F \to \mathcal F$.
Note: $\sigma$ and $W$ are more complex than in standard neural nets.

Convolution and filtering
The matrix $W$ is made of blocks $W = (G_{t_1}, \dots, G_{t_T})$; each block is a convolution matrix obtained by transforming a vector (template) $t$, i.e. $G_t = (g_1 t, \dots, g_N t)$, e.g.
$G_t = \begin{pmatrix} t_1 & t_2 & t_3 & \cdots & t_d \\ t_d & t_1 & t_2 & \cdots & t_{d-1} \\ t_{d-1} & t_d & t_1 & \cdots & t_{d-2} \\ \vdots & & & & \vdots \\ t_2 & t_3 & t_4 & \cdots & t_1 \end{pmatrix}$
For all $x \in X$, $W(x)(j, i) = \langle g_i t_j, x\rangle$.
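
A sketch of $G_t$ as a circulant matrix built from cyclic shifts (the $g_i$ here are cyclic shifts; in general they can be other transformations), and of $W$ as a stack of such blocks; sizes are illustrative:

```python
import numpy as np

def circulant(t):
    """G_t: each row is a cyclic shift g_i t of the template t."""
    return np.stack([np.roll(t, i) for i in range(len(t))])

rng = np.random.default_rng(0)
d, T = 8, 3
templates = [rng.normal(size=d) for _ in range(T)]   # t_1, ..., t_T
W = np.vstack([circulant(t) for t in templates])     # blocks G_{t_1}, ..., G_{t_T}

x = rng.normal(size=d)
Wx = (W @ x).reshape(T, d)     # Wx[j, i] = <g_i t_j, x>
```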

Pooling
The pooling map aggregates (pools) the values corresponding to the same transformed template,
$\langle g_1 t, x\rangle, \dots, \langle g_N t, x\rangle,$
and can be seen as a form of subsampling.

Pooling functions
Given a template $t$, let
$\beta = (s(\langle g_1 t, x\rangle), \dots, s(\langle g_N t, x\rangle))$
for some nonlinearity $s$, e.g. $s(\cdot) = |\cdot|_+$.
Examples of pooling:
- max pooling: $\max_{j=1,\dots,N} \beta_j$
- average pooling: $\frac{1}{N}\sum_{j=1}^N \beta_j$
- $\ell^p$ pooling: $\|\beta\|_p = \big(\sum_{j=1}^N |\beta_j|^p\big)^{1/p}$
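
The three pooling functions over a vector of responses $\beta$ (a sketch):

```python
import numpy as np

def pool(beta, kind="max", p=2):
    """Aggregate beta = (s(<g_1 t, x>), ..., s(<g_N t, x>)) into one number."""
    if kind == "max":
        return beta.max()                              # max pooling
    if kind == "average":
        return beta.mean()                             # (1/N) sum_j beta_j
    if kind == "lp":
        return (np.abs(beta) ** p).sum() ** (1.0 / p)  # ||beta||_p
    raise ValueError(f"unknown pooling: {kind}")
```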

Why pooling?
The intuition is that pooling can provide some form of robustness, and even invariance, to the transformations.
Invariance & selectivity: a good representation should be invariant to semantically irrelevant transformations, yet discriminative with respect to relevant information (selective).

Basic computations: simple & complex cells (Hubel, Wiesel '62)
- Simple cells: $x \mapsto (\langle x, g_1 t\rangle, \dots, \langle x, g_N t\rangle)$
- Complex cells: $(\langle x, g_1 t\rangle, \dots, \langle x, g_N t\rangle) \mapsto \sum_g |\langle x, g t\rangle|_+$

Basic computations: convolutional networks (LeCun '88)
- Convolutional filters: $x \mapsto (\langle x, g_1 t\rangle, \dots, \langle x, g_N t\rangle)$
- Subsampling/pooling: $(\langle x, g_1 t\rangle, \dots, \langle x, g_N t\rangle) \mapsto \sum_g |\langle x, g t\rangle|_+$

Deep convolutional networks
[Figure: Input → Filtering → Pooling (first layer) → Filtering → Pooling (second layer) → Classifier → Output]
In practice:
- multiple convolution layers are stacked,
- pooling is not global, but over a subset of transformations (receptive field),
- the receptive field size increases in higher layers.
A sketch of one filtering-plus-pooling stage is given below.
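
A sketch of one such stage with a local receptive field, i.e. max pooling over non-overlapping windows of shifted responses; the 1-D signals, cyclic shifts, and window size are illustrative simplifications:

```python
import numpy as np

def conv_pool_stage(x, t, window=2):
    """Filter x with all cyclic shifts of template t, apply the hinge
    nonlinearity, then max-pool over windows of `window` shifts."""
    r = np.array([np.roll(t, i) @ x for i in range(len(x))])   # filtering
    r = np.maximum(r, 0.0)                                     # hinge
    return r.reshape(-1, window).max(axis=1)                   # local pooling

rng = np.random.default_rng(0)
x = rng.normal(size=16)
h1 = conv_pool_stage(x, rng.normal(size=16))    # first layer:  16 -> 8
h2 = conv_pool_stage(h1, rng.normal(size=8))    # second layer:  8 -> 4
```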

A biological motivation
The processing in deep convolutional networks has analogies with computational neuroscience models of information processing in the visual cortex, see [Poggio et al. ...].
[Figure: sketch of the HMAX hierarchical model of visual processing, from classification units down through PIT/AIT, V4/PIT, V2/V4, V1/V2. Acronyms: V1, V2 and V4 are the primary, second and fourth visual areas; PIT and AIT the posterior and anterior inferotemporal areas.]

Theory
$\Phi_L(x) = \sigma(W_L \cdots \sigma(W_2\, \sigma(W_1 x)))$
- No pooling: metric properties of networks with random weights, connection with compressed sensing [Giryes et al. '15].
- Invariance: $x' = gx \Rightarrow \Phi(x') = \Phi(x)$ [Anselmi et al. '12; R., Poggio '15; Mallat '12; Soatto, Chiuso '13], and covariance for multiple layers [Anselmi et al. '12].
- Selectivity/maximal invariance, i.e. injectivity modulo transformations: $\Phi(x') = \Phi(x) \Rightarrow x' = gx$ [R., Poggio '15; Soatto, Chiuso '15].
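
The invariance statement can be checked numerically in the convolutional setting: pooling over all cyclic shifts of a template yields a value that does not change when the input itself is cyclically shifted. A small sketch (the group of cyclic shifts and max pooling are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)
t = rng.normal(size=16)

def phi(x, t):
    """Max-pool the hinge responses over all cyclic shifts g_i of t."""
    return max(max(np.roll(t, i) @ x, 0.0) for i in range(len(t)))

x_shift = np.roll(x, 5)                          # x' = g x for a cyclic shift g
print(np.isclose(phi(x, t), phi(x_shift, t)))    # True: Phi(x') = Phi(x)
```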

Theory (cont.)
- Similarity preservation: $\|\Phi(x') - \Phi(x)\| \approx \min_g \|x' - gx\|$ ???
- Stability to diffeomorphisms [Mallat '12]: $\|\Phi(x) - \Phi(d(x))\| \lesssim \|d\|\, \|x\|$, where $\|d\|$ measures the size of the diffeomorphism $d$.
- Reconstruction: connection to phase retrieval/one-bit compressed sensing [Bruna et al. '14].

This class
- Neural nets
- Autoencoders
- Convolutional neural nets

FINE ("The End")