Deep Neural Networks
Slide 1: Deep Neural Networks. DT2118 Speech and Speaker Recognition. Giampiero Salvi, KTH/CSC/TMH, VT.
Slide 2: Outline
- State-to-Output Probability Model
- Artificial Neural Networks
  - Perceptron
  - Multi-Layer Perceptron
  - Error Backpropagation
- Hybrid HMM-MLP
- Deep Learning (Initialization)
  - Deep Neural Networks
  - Restricted Boltzmann Machines
  - Deep Belief Networks
Slide 4: State-to-Output Probability Model
[Figure: HMM with hidden states z_{n-1}, z_n, z_{n+1} and observations x_{n-1}, x_n, x_{n+1}.]
- The state-to-output model is responsible for the discriminative power of the whole model.
- GMMs are used because they are easy to train and adapt.
- Discriminative training can improve results.
Slide 5: State-to-Output Probability Model
[Figure: same HMM as on slide 4.]
- The state-to-output model is responsible for the discriminative power of the whole model.
- Alternatives: artificial neural networks (ANNs), deep neural networks (DNNs), support vector machines (SVMs, not used for ASR).
Slide 6: Outline (repeated as section divider).
Slide 7: Perceptron
Known since the 1950s [4].
[Figure: perceptron with inputs x_0 ... x_4, weights w_0 ... w_4, a summation node, a transfer function f, and output y.]
[4] F. Rosenblatt. The perceptron: A perceiving and recognizing automaton. Tech. rep., Cornell Aeronautical Laboratory, 1957.
Slide 8: Perceptron input/output
y = f(b + \sum_i w_i x_i)
where f can be:
- sigmoid: f(z) = \frac{1}{1 + e^{-z}}
- hyperbolic tangent: f(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}
- rectified linear unit: f(z) = \max(0, z)
Equivalent to logistic regression (the bias b can be written as w_0 x_0 with x_0 = 1).
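As an illustration (a sketch of mine, not from the slides), the perceptron computation with the three transfer functions in NumPy; the example input values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # same as np.tanh(z), written out as on the slide
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def perceptron(x, w, b, f=sigmoid):
    """y = f(b + sum_i w_i x_i)"""
    return f(b + np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(perceptron(x, w, b=0.3))          # sigmoid output in (0, 1)
print(perceptron(x, w, b=0.3, f=tanh))  # tanh output in (-1, 1)
print(perceptron(x, w, b=0.3, f=relu))  # ReLU output >= 0
```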
Slide 9: Perceptron: Linear Classification
Learning: adjust the weights to correct classification errors.
[Figure: two linearly separable classes and a linear decision boundary in the (y_1, y_2) plane.]
Slide 10: Multi-layer Perceptron [3]
[Figure: MLP with input layer x_0 ... x_4, hidden layer h_0 ... h_5, and output layer y_1 ... y_3.]
[3] F. Rosenblatt. Principles of neurodynamics: perceptrons and the theory of brain mechanisms. Tech. rep., DTIC Document, 1961.
Slide 11: Universal Approximation Theorem
First proposed by Cybenko [1]: a single hidden layer with a finite but sufficiently large number of neurons can approximate any continuous function on compact subsets of R^N, under mild assumptions on the activation function.
[1] G. Cybenko. Approximation by superpositions of a sigmoidal function. In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303-314.
Slides 12-13: Multi-layer Perceptron: Training
Backpropagation algorithm [5].
[Figure, built up over the two slides: network with input units x_0 ... x_d, a bias unit, a hidden layer and an output unit; the error E is propagated backwards from the output.]
[5] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. Tech. rep., DTIC Document, 1985.
Slide 14: Learning Criteria
Ideally, minimise the expected loss
J_{EL} = E[J(W, B, o, y)] = \int J(W, B, o, y) \, p(o) \, do
where o are the features and y the labels, but we do not know p(o). Use empirical learning criteria instead:
- Mean Square Error (MSE)
- Cross Entropy (CE)
Slide 15: Mean Square Error Criterion
J_{MSE} = \frac{1}{M} \sum_{m=1}^{M} J_{MSE}(W, B, o^m, y^m)
J_{MSE}(W, B, o^m, y^m) = \frac{1}{2} \| v^L - y \|_2^2 = \frac{1}{2} (v^L - y)^T (v^L - y)
Slide 16: Cross Entropy Criterion
J_{CE} = \frac{1}{M} \sum_{m=1}^{M} J_{CE}(W, B, o^m, y^m)
J_{CE}(W, B, o^m, y^m) = -\sum_{i=1}^{C} y_i \log v_i^L
Equivalent to minimising the Kullback-Leibler divergence (KLD).
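A small sketch of mine (assuming one-hot labels y and per-sample network outputs v^L stacked as rows) of the two empirical criteria averaged over M samples:

```python
import numpy as np

def mse_loss(v, y):
    """J_MSE = (1/M) * sum_m 0.5 * ||v_m - y_m||^2"""
    return np.mean(0.5 * np.sum((v - y) ** 2, axis=1))

def cross_entropy_loss(v, y, eps=1e-12):
    """J_CE = -(1/M) * sum_m sum_i y_mi * log(v_mi)"""
    return -np.mean(np.sum(y * np.log(v + eps), axis=1))

# two samples, three classes; rows of v sum to 1 (softmax outputs)
v = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(mse_loss(v, y), cross_entropy_loss(v, y))
```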
Slide 17: Update rules
W_{t+1}^l \leftarrow W_t^l - \epsilon \, \nabla_{W_t^l} J
b_{t+1}^l \leftarrow b_t^l - \epsilon \, \nabla_{b_t^l} J
To compute the updates we need the gradient of the criterion function. Key trick, the chain rule of gradients for f(g(x)):
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial x}
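To make the update rule and the chain rule concrete, here is a hedged sketch of mine (not from the slides) of one stochastic gradient step for a one-hidden-layer network with sigmoid hidden units, a softmax output and the cross-entropy criterion; all names and shapes are my assumptions:

```python
import numpy as np

def sgd_step(W1, b1, W2, b2, x, y, eps=0.1):
    # forward pass
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # hidden activations
    z = W2 @ h + b2
    v = np.exp(z - z.max()); v /= v.sum()       # softmax output
    # backward pass (chain rule); for softmax + CE: dJ/dz = v - y
    dz = v - y
    dW2 = np.outer(dz, h)
    dh = W2.T @ dz
    dz1 = dh * h * (1.0 - h)                    # sigmoid derivative
    dW1 = np.outer(dz1, x)
    # updates: W <- W - eps * dJ/dW, b <- b - eps * dJ/db
    return (W1 - eps * dW1, b1 - eps * dz1,
            W2 - eps * dW2, b2 - eps * dz)

# tiny example: 3 inputs, 4 hidden units, 2 classes
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((4, 3)); b1 = np.zeros(4)
W2 = 0.1 * rng.standard_normal((2, 4)); b2 = np.zeros(2)
x = np.array([0.2, -0.4, 1.0]); y = np.array([1.0, 0.0])
W1, b1, W2, b2 = sgd_step(W1, b1, W2, b2, x, y)
```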
Slide 18: Backpropagation: Properties
- weight updates depend only on neighbouring variables (local computation)
- the algorithm finds a local optimum
- sensitive to initialisation
Slide 19: Practical Issues
- preprocessing: cepstral mean normalisation
- initialisation: random (to break symmetry), within the linear range of the activation function
- regularisation (weight decay, dropout)
- batch size selection
- sample randomisation
- momentum
- learning rate and stopping criterion
Slide 20: Output Layer
Regression tasks: linear layer
v^L = z^L = W^L v^{L-1} + b^L
Classification tasks: softmax layer
v_i^L = \mathrm{softmax}_i(z^L) = \frac{e^{z_i^L}}{\sum_{j=1}^{C} e^{z_j^L}}
Slide 21: Probabilistic Interpretation
1. v_i^L \in [0, 1] \; \forall i
2. \sum_{j=1}^{C} v_j^L = 1
The output activations are posterior probabilities of the classes given the observations: v_i^L = P(i | o). In speech: P(state | sounds).
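As an aside (my own sketch, not from the slides), a NumPy softmax with the usual max-subtraction for numerical stability; the demo checks exactly the two properties above:

```python
import numpy as np

def softmax(z):
    """v_i = exp(z_i) / sum_j exp(z_j), computed stably."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, -1.0])
v = softmax(z)
print(v, v.sum())  # components in [0, 1], summing to 1
```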
Slide 22: Hybrid HMM + Multi-Layer Perceptron
[Figure from Yu and Deng: hybrid HMM-MLP architecture.]
Slide 23: Combining probabilities
- HMMs use likelihoods P(sound | state)
- MLPs and DNNs estimate posteriors P(state | sound)
We can combine them with Bayes' rule:
P(sound | state) = \frac{P(state | sound) \, P(sound)}{P(state)}
P(state) can be estimated from the training set; P(sound) is constant over states and can be ignored. Use scaled likelihoods:
P(sound | state) \propto \frac{P(state | sound)}{P(state)}
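A minimal sketch of mine of this conversion for decoding; the posteriors matrix, the priors vector (relative state frequencies from the training set) and the eps smoothing constant are my assumptions:

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors, eps=1e-12):
    """log P(sound|state) up to a constant: log P(state|sound) - log P(state)."""
    return np.log(posteriors + eps) - np.log(priors + eps)

# posteriors: (frames, states) from the network; priors: relative state frequencies
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.2, 0.5, 0.3]])
priors = np.array([0.5, 0.3, 0.2])
print(scaled_log_likelihoods(posteriors, priors))
```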
Slide 24: Time Dependent and Recurrent ANNs
Slide 25: ANNs in ASR: Advantages
- discriminative in nature
- powerful time models: Time-Delayed Neural Networks (TDNNs), Recurrent Neural Networks (RNNs)
Slide 26: ANNs in ASR: Disadvantages
- training requires state-level annotations (no EM available)
- annotations are usually obtained with forced alignment
- not easy to adapt
Slide 27: Outline (repeated as section divider).
Slide 28: Deep Neural Network
[Figure: deep network with input layer x_0 ... x_4, several hidden layers of units h, and output layer y_1 ... y_3.]
Slide 29: DNN: Motivation
- depth gives abstraction
- good initialisation (see later)
- fast computers, large datasets
Slide 30: DNNs and MLPs
- no conceptual difference from MLPs
- backpropagation alone is not powerful enough: local minima, vanishing gradients
Slide 31: Deep Learning for Acoustic Modelling
- most promising technique at the moment
- pioneered by Geoffrey Hinton (Univ. of Toronto)
- most of the large companies are using it (Microsoft, Google, Nuance, IBM)
- unifies properties of generative and discriminative models
Slides 32-36: Deep Learning: Idea
[Figure, built up over several slides: a discriminative model maps observation to label; a generative model maps label to observation.]
... but HMMs with Gaussian Mixture Models are also generative: why is this better?
Slide 37: Deep Learning: Idea #2
1. initialise the DNN with Restricted Boltzmann Machines (RBMs), which can be trained unsupervised
2. use a fast learning procedure (Hinton)
3. use ridiculous amounts of unlabelled (cheap) data to train a ridiculous number of parameters in an unsupervised fashion
4. at the end, use small amounts of labelled (expensive) data and backpropagation to learn the labels
Slide 38: Restricted Boltzmann Machines (RBMs)
First called Harmonium [6].
- binary nodes: Bernoulli distribution
- continuous nodes: Gaussian-Bernoulli
[6] P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Tech. rep., Department of Computer Science, University of Colorado, Boulder, 1986. Chap. 6.
Slide 39: Restricted Boltzmann Machines (RBMs)
Energy (Bernoulli):
E(v, h) = -a^T v - b^T h - h^T W v
Energy (Gaussian-Bernoulli):
E(v, h) = \frac{1}{2} (v - a)^T (v - a) - b^T h - h^T W v
Slide 40: RBM: Probabilistic Interpretation
P(v, h) = \frac{e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}}
Posteriors (conditional independence):
P(h | v) = \prod_i P(h_i | v) \quad \text{and} \quad P(v | h) = \prod_i P(v_i | h)
Slide 41: Binary Units: Conditional Probability
The posterior equals the sigmoid function:
P(h_i = 1 | v) = \frac{e^{b_i \cdot 1 + 1 \cdot W_{i,:} v}}{e^{b_i \cdot 1 + 1 \cdot W_{i,:} v} + e^{b_i \cdot 0 + 0 \cdot W_{i,:} v}} = \frac{e^{b_i + W_{i,:} v}}{e^{b_i + W_{i,:} v} + 1} = \sigma(b_i + W_{i,:} v)
Same form as in a Multi-Layer Perceptron (viable for initialisation!).
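A short sketch in my notation (assuming W of shape hidden x visible) of the two conditionals for a Bernoulli-Bernoulli RBM; both are sigmoids of affine functions, exactly the MLP layer form noted above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_v(v, W, b):
    """P(h_i = 1 | v) = sigmoid(b_i + W[i,:] @ v)"""
    return sigmoid(b + W @ v)

def p_v_given_h(h, W, a):
    """P(v_j = 1 | h) = sigmoid(a_j + h @ W[:,j]), by symmetry"""
    return sigmoid(a + W.T @ h)
```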
Slide 42: Gaussian Units: Conditional Probability
P(v | h) = \mathcal{N}(v; \mu, \Sigma)
with
\mu = W^T h + a, \quad \Sigma = I
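A one-line sketch of mine for sampling the visible layer under this model (identity covariance, so mean plus unit-variance noise):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_v_given_h(h, W, a):
    """v ~ N(W^T h + a, I): the mean plus unit-variance Gaussian noise."""
    return W.T @ h + a + rng.standard_normal(a.shape)
```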
Slide 43: RBM Training
Stochastic gradient descent (minimise the negative log likelihood):
J_{NLL}(W, a, b, v) = -\log P(v) = F(v) + \log \sum_{\tilde{v}} e^{-F(\tilde{v})}
where
F(v) = -\log \sum_h e^{-E(v,h)}
is the free energy of the system. BUT: the gradient cannot be computed exactly.
Slide 44: RBM Gradient
\frac{\partial J_{NLL}(W, a, b, v)}{\partial \theta} = \frac{\partial F(v)}{\partial \theta} - \sum_{\tilde{v}} p(\tilde{v}) \frac{\partial F(\tilde{v})}{\partial \theta}
- the first term increases the probability of the training data
- the second term decreases the probability density defined by the model
Slide 45: RBM Stochastic Gradient
The general form is:
\nabla_\theta J_{NLL}(W, a, b, v) = \left\langle \frac{\partial E(v, h)}{\partial \theta} \right\rangle_{\text{data}} - \left\langle \frac{\partial E(v, h)}{\partial \theta} \right\rangle_{\text{model}}
Example, visible-hidden weights:
\nabla_{w_{ij}} J_{NLL}(W, a, b, v) = -\left[ \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right]
Slide 46: Gibbs Sampling
\langle v_i h_j \rangle_{\text{model}} is computed with sampling. Sample the joint distribution of N variables one at a time:
P(X_i | X_{-i})
where X_{-i} are all the other variables. BUT: it takes exponential time to compute exactly.
Slide 47: Contrastive Divergence
Two tricks:
1. initialise the chain with a training sample
2. do not wait for convergence
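Putting the two tricks into code, a hedged sketch of mine of one CD-1 update for a Bernoulli-Bernoulli RBM; the function name cd1_step and the learning rate are my assumptions, and the exact model expectation is replaced by a one-step reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(W, a, b, v0, eps=0.01):
    # positive phase: start the Gibbs chain at a training sample (trick 1)
    ph0 = sigmoid(b + W @ v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # one Gibbs step only, no waiting for convergence (trick 2)
    pv1 = sigmoid(a + W.T @ h0)
    ph1 = sigmoid(b + W @ pv1)
    # approximate gradient: <v h>_data - <v h>_model
    W += eps * (np.outer(ph0, v0) - np.outer(ph1, pv1))
    a += eps * (v0 - pv1)
    b += eps * (ph0 - ph1)
    return W, a, b
```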
Slide 48: RBMs and Deep Belief Networks
[Figure from Yu and Deng: stacking RBMs into a Deep Belief Network.]
Slide 49: Deep Belief Networks: Training
Yee-Whye Teh (one of Hinton's students) observed that DBNs can be trained greedily, for each layer:
1. train an RBM unsupervised
2. excite the network with training data to produce outputs
3. use the outputs to train the next RBM
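A sketch of mine of this greedy layer-wise loop, reusing the hypothetical cd1_step from the previous sketch and assuming binary data rows in [0, 1]:

```python
import numpy as np

def train_dbn(data, layer_sizes, epochs=5, eps=0.01):
    """Greedy layer-wise pre-training: one RBM per layer, bottom-up."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    rng = np.random.default_rng(0)
    stack, x = [], data                      # x: (samples, visible_dim)
    for n_hidden in layer_sizes:
        n_visible = x.shape[1]
        W = 0.01 * rng.standard_normal((n_hidden, n_visible))
        a, b = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            for v0 in x:                     # 1. train this RBM unsupervised
                W, a, b = cd1_step(W, a, b, v0, eps)  # from the sketch above
        stack.append((W, a, b))
        x = sigmoid(x @ W.T + b)             # 2./3. outputs feed the next RBM
    return stack
```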
Slide 50: Final Step: Supervised Training
[Figure: the same deep network as on slide 28 (input x_0 ... x_4, hidden layers, outputs y_1 ... y_3), now trained with labels.]
Slide 51: Deep Learning: Performance
- state-of-the-art on most ASR tasks
- made people from the University of Toronto, Microsoft, Google and IBM write a paper together [2]
- experiments with learning the features directly from the speech signal
- Yu and Deng's book has many examples
[2] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition. In: IEEE Signal Processing Magazine (2012).
More information