Speech recognition. Lecture 14: Neural Networks. Andrew Senior, Google NYC, December 12, 2013


Slide 1: Speech recognition. Lecture 14: Neural Networks
Andrew Senior, Google NYC, December 12, 2013

Slide 2: Outline
1 Introduction to Neural networks
2 Neural networks for speech recognition
  - Neural network features for speech recognition
  - Hybrid neural networks
  - History
  - Variations
3 Language modelling

Slide 3: The perceptron
[Diagram: inputs x_1, ..., x_5 connected by weights w to a single output unit.]
A perceptron is a linear classifier:
    f(x) = 1 if w·x > 0    (1)
         = 0 otherwise.    (2)
Add an extra, always-one input to provide an offset or bias. The weights w can be learned for a given task with the Perceptron Algorithm.

Slide 4: Perceptron algorithm (Rosenblatt, 1957)
Adapt the weights w example by example:
1 Initialise the weights and the threshold.
2 For each example j in our training set D, perform the following steps over the input x_j and desired output ŷ_j:
  1 Calculate the actual output:
      y_j(t) = f[w(t)·x_j] = f[w_0(t) + w_1(t) x_{j,1} + w_2(t) x_{j,2} + ... + w_n(t) x_{j,n}]
  2 Update the weights:
      w_i(t+1) = w_i(t) + α (ŷ_j − y_j(t)) x_{j,i}, for all nodes 0 ≤ i ≤ n.
3 Repeat step 2 until the iteration error (1/s) Σ_j |ŷ_j − y_j(t)| is less than a user-specified error threshold γ, or a predetermined number of iterations has been completed.
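
A minimal numpy sketch of the algorithm above. The data arrays X (one example per row) and y_target (0/1 labels), the learning rate alpha and the stopping parameters are illustrative assumptions, not from the slides.

    import numpy as np

    def train_perceptron(X, y_target, alpha=0.1, max_epochs=100, tol=0.0):
        """Perceptron algorithm on (num_examples, n) inputs X with 0/1 labels y_target."""
        X = np.hstack([np.ones((X.shape[0], 1)), X])   # extra always-one input for the bias
        w = np.zeros(X.shape[1])                        # step 1: initialise the weights
        for epoch in range(max_epochs):
            errors = 0.0
            for x_j, t_j in zip(X, y_target):
                y_j = 1.0 if w @ x_j > 0 else 0.0       # f(x) = 1 if w.x > 0, else 0
                w += alpha * (t_j - y_j) * x_j          # w_i <- w_i + alpha * (target - output) * x_i
                errors += abs(t_j - y_j)
            if errors / len(y_target) <= tol:           # step 3: stop when the iteration error is small
                break
        return w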

Slide 5: Nonlinear perceptrons
Introduce a nonlinearity:
    y_i = σ(Σ_j w_ij x_j)
Each unit is a simple nonlinear function of a linear combination of its inputs. Typically the logistic sigmoid:
    σ(z) = 1 / (1 + e^{−z})
or tanh:
    σ(z) = tanh z

Slide 6: Multilayer perceptrons
Extend the network to multiple layers. Now a hidden layer of nodes computes a function of the inputs, and output nodes compute a function of the hidden nodes' activations.
[Diagram: input layer (x_1, ..., x_4), hidden layer, output layer (y_1, y_2, y_3).]
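
A sketch of the forward computation just described, with one hidden layer of logistic units; the weight matrices, biases and the choice of a logistic output layer are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
        """Hidden units are a function of the inputs; output units are a
        function of the hidden activations."""
        h = sigmoid(W_hidden @ x + b_hidden)   # hidden-layer activations
        y = sigmoid(W_out @ h + b_out)         # output-layer activations
        return h, y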

Slide 7: Cost function
Such networks can be optimized ("trained") to minimize a cost function (loss function or objective function): a numerical score of the network's performance with respect to targets ŷ_i(t).
Squared Error:
    L_SE = ½ Σ_t Σ_i (y_i(t) − ŷ_i(t))²
This is a frame-based criterion: ideally t would range over the entire space of decoding frames, but in practice it ranges over the training set, and we measure it on a development set.
Cross Entropy:
    L_CE = −Σ_t Σ_i ŷ_i(t) log y_i(t)
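
The two criteria above, written out for a single frame; y (network outputs) and y_hat (targets) are illustrative arrays, and the eps guard is my addition.

    import numpy as np

    def squared_error(y, y_hat):
        # L_SE = 1/2 * sum_i (y_i - y_hat_i)^2
        return 0.5 * np.sum((y - y_hat) ** 2)

    def cross_entropy(y, y_hat, eps=1e-12):
        # L_CE = - sum_i y_hat_i * log(y_i); eps guards against log(0)
        return -np.sum(y_hat * np.log(y + eps))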

Slide 8: Targets
We need targets (labels) ŷ_i(t) for each frame, usually provided by forced alignment (Lecture 8).
- Viterbi alignment gives one target class for each frame t.
- Baum-Welch soft alignment gives a target distribution across ŷ_i(t) for each t.

Slide 9: Softmax output layer
If the output units are logistic, they are suitable for representing multivariate Bernoulli random variables P(ŷ_i = 1 | x). To model a multi-class categorical distribution we use the softmax:
    y_i = P(c_i | x) = exp(z_i) / Σ_j exp(z_j)
which is normalized to sum to one. This reduces to the logistic sigmoid when there are two output classes.
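
A sketch of the softmax; subtracting max(z) before exponentiating is a standard numerical-stability trick, not something stated on the slide.

    import numpy as np

    def softmax(z):
        # y_i = exp(z_i) / sum_j exp(z_j), normalised to sum to one
        e = np.exp(z - np.max(z))   # shift for stability; does not change the result
        return e / np.sum(e)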

Slide 10: Gradient descent
To minimize the loss L, compute a gradient ∂L/∂w for each parameter w and update it using simple gradient descent:
    w ← w − η ∂L/∂w
η is a learning rate, which is chosen (typically by cross-validation) but may be set automatically. We can apply the chain rule to compute ∂L/∂w for parameters deep in the network.

Slide 11: Back-propagation 0
Derivatives of the loss functions:
    ∂L_CE/∂y_i = ∂/∂y_i [ −Σ_j ŷ_j(t) log y_j(t) ]     (3)
               = −ŷ_i(t) / y_i(t)                       (4)
    ∂L_SE/∂y_i = ∂/∂y_i [ ½ Σ_j (y_j(t) − ŷ_j(t))² ]    (5)
               = y_i(t) − ŷ_i(t)                        (6)

Slide 12: Back-propagation 0
Derivative of the logistic activation function:
    y_i = 1 / (1 + e^{−z_i})                    (7)
    ∂y_i/∂z_i = e^{−z_i} / (1 + e^{−z_i})²      (8)
              = y_i (1 − y_i)                   (9)
because
    1 − y_i = 1 − 1/(1 + e^{−z_i}) = e^{−z_i}/(1 + e^{−z_i})    (10)

Slide 13: Back-propagation 0
Derivative of the softmax activation function:
    ∂y_k/∂z_i = ∂/∂z_i [ e^{z_k} / Σ_j e^{z_j} ]                                          (13)
              = [ δ_ik (Σ_j e^{z_j}) e^{z_k} − e^{z_k} e^{z_i} ] / (Σ_j e^{z_j})²          (14)
              = (e^{z_i} / Σ_j e^{z_j}) · [ (Σ_j e^{z_j}) δ_ik − e^{z_k} ] / Σ_j e^{z_j}   (15)
              = y_i (δ_ik − y_k)                                                           (16)

Slide 14: Back-propagation I
For a weight in the final layer, by the chain rule, for one example:
    ∂L/∂w_ij = Σ_k (∂L/∂y_k)(∂y_k/∂z_i)(∂z_i/∂w_ij)                                          (17)
For softmax & L_CE:
    ∂L_CE/∂y_k = −ŷ_k / y_k                        [Outer gradient.]                          (18)
    ∂y_k/∂z_i = y_i (δ_ik − y_k) = y_k (δ_ik − y_i)  [Derivative of softmax activation function.]   (19)
    ∂z_i/∂w_ij = x_j                                                                          (20)
    ∂L/∂w_ij = −Σ_k (ŷ_k / y_k) y_k (δ_ik − y_i) x_j                                          (21)
             = −x_j Σ_k ŷ_k (δ_ik − y_i)                                                      (22)
             = x_j (y_i − ŷ_i)      [using Σ_k ŷ_k = 1]                                       (23)
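
A sketch of the result derived above: for softmax outputs with a cross-entropy loss, the output-layer gradient reduces to x_j (y_i − ŷ_i). The function and variable names are my assumptions.

    import numpy as np

    def output_layer_gradients(x, y, y_hat):
        """x: inputs to the final layer, y: softmax outputs, y_hat: targets.
        Returns dL/dW with dL/dw_ij = (y_i - y_hat_i) * x_j (eqn. 23)."""
        delta = y - y_hat              # dL/dz_i for softmax + cross-entropy
        return np.outer(delta, x)      # dL/dw_ij = delta_i * x_j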

Slide 15: Back-propagation II
Back-propagating (Rumelhart et al., 1986) to an earlier hidden layer with weights w_jk, activations x_j and inputs x_k:
    x_j = σ(z_j) = σ(Σ_k w_jk x_k)                      (24)
First find the gradient w.r.t. the hidden-layer activation x_j:
    ∂L/∂x_j = Σ_i (∂L/∂y_i)(∂y_i/∂z_i)(∂z_i/∂x_j)       (25)
    ∂z_i/∂x_j = w_ij                                     (26)
i.e. we pass the vector of gradients ∂L/∂y_i back through the nonlinearity and then project it back through the weight matrix W.

Slide 16: Back-propagation III
    ∂L/∂w_jk = (∂L/∂x_j)(∂x_j/∂z_j)(∂z_j/∂w_jk)    [Same form as eqn. 17.]                           (27)
    ∂x_j/∂z_j = x_j (1 − x_j)                       [Derivative of sigmoid activation function.]      (28)
    ∂z_j/∂w_jk = x_k                                                                                  (29)
Continue to arbitrary depth: compute activation gradients and then weight gradients for each layer.
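
Putting equations 17 to 29 together for a network with one sigmoid hidden layer and a softmax + cross-entropy output layer. This is a sketch for a single training example, with made-up variable names.

    import numpy as np

    def backprop_one_hidden(x, y_hat, W1, b1, W2, b2):
        """One example: sigmoid hidden layer, softmax + cross-entropy output."""
        # Forward pass
        h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))     # hidden activations x_j = sigmoid(z_j)
        z = W2 @ h + b2
        y = np.exp(z - z.max()); y /= y.sum()        # softmax outputs

        # Backward pass
        delta_out = y - y_hat                        # dL/dz at the output (eqn. 23 without x_j)
        dW2 = np.outer(delta_out, h)                 # dL/dw_ij = delta_i * x_j
        db2 = delta_out
        dh = W2.T @ delta_out                        # dL/dx_j: back through the weight matrix (eqns. 25-26)
        delta_hid = dh * h * (1.0 - h)               # back through the sigmoid (eqn. 28)
        dW1 = np.outer(delta_hid, x)                 # eqns. 27 and 29
        db1 = delta_hid
        return dW1, db1, dW2, db2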

Slide 17: Stochastic Gradient Descent
Since L is typically defined on the entire training set, it takes a long time to compute it and its derivatives (summed across all exemplars), and it is only an approximation to the true loss on the theoretical set of all utterances. We can instead compute a noisy estimate of ∂L/∂w on a small subset of the training set and make a Stochastic Gradient Descent (SGD) update very quickly. In the limit we could update on every frame, but a useful compromise is to use a minibatch of around 200 frames.
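
A sketch of the minibatch SGD loop described above; grad_fn (which returns the gradient list for a minibatch), the data arrays, the batch size and the learning rate are all illustrative assumptions.

    import numpy as np

    def sgd(params, grad_fn, X, Y, eta=0.01, batch_size=200, num_steps=1000):
        """grad_fn(params, X_batch, Y_batch) returns a noisy estimate of dL/dparams
        (one array per parameter) computed on a minibatch of frames."""
        n = X.shape[0]
        for step in range(num_steps):
            idx = np.random.randint(0, n, size=batch_size)   # sample a minibatch of frames
            grads = grad_fn(params, X[idx], Y[idx])
            for p, g in zip(params, grads):
                p -= eta * g                                  # w <- w - eta * dL/dw
        return params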

Slide 18: Second-order optimization
Compute the second derivative and optimize a second-order approximation to the error surface.
- More computation per step.
- Requires less-noisy estimates of gradient / curvature (bigger batches).
- Each step is more effective.
Variants: Newton-Raphson, Quickprop, L-BFGS, Hessian-free, conjugate gradient.

Slide 19: Outline (section divider; same as Slide 2)

Slide 20: Two main paradigms for neural networks for speech
- Use neural networks to compute nonlinear feature representations.
  - "Bottleneck" or "tandem" features (Hermansky et al., 2000).
  - The low-dimensional representation is modelled conventionally with GMMs.
  - Allows all the GMM machinery and tricks to be exploited.
- Use neural networks to estimate CD state probabilities.

Slide 21: Outline (section divider; same as Slide 2)

Slide 22: Neural network features
Train a neural network to discriminate classes. Use the output, or a low-dimensional bottleneck-layer representation, as features.
[Diagram: inputs x_1 ... x_4, hidden layers, a narrow bottleneck layer, and outputs y_1 ... y_5.]

Slide 23: Neural network features
- TRAP: Concatenate PLP-HLDA features and NN features.
- Bottleneck features outperform posterior features (Grezl et al., 2007).
- Generally DNN features + GMMs reach about the same performance as hybrid DNN-HMM systems, but are much more complex.

Slide 24: Outline (section divider; same as Slide 2)

Slide 25: Hybrid networks: Decoding (recap)
Recall (Lecture 1) that we choose the decoder output as the optimal word sequence ŵ for an observation sequence o:
    ŵ = argmax_{w∈Σ*} Pr[w | o]              (30)
      = argmax_{w∈Σ*} Pr[o | w] Pr[w]         (31)
and
    Pr(o | w) = Σ_{c,p} Pr(o | c) Pr(c | p) Pr(p | w)    (32)
where p is the phone sequence and c is the CD state sequence.

Slide 26: Hybrid neural network decoding
Now we model P(o|c) with a neural network instead of a Gaussian mixture model. Everything else stays the same.
    P(o | c) = Π_t P(o_t | c_t)                        (33)
    P(o_t | c_t) = P(c_t | o_t) P(o_t) / P(c_t)        (34)
                 ∝ P(c_t | o_t) / P(c_t)               (35)
for observations o_t at time t and a CD state sequence c_t. We can ignore P(o_t) since it is the same for all decoding paths. The last term is called the "scaled posterior":
    log P(o_t | c_t) = log P(c_t | o_t) − α log P(c_t)    (36)
Empirically (by cross-validation) we actually find better results with a prior smoothing term α ≈ 0.8.
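
A sketch of the scaled-posterior computation in eqn. 36: the acoustic score passed to the decoder is the log of the network's posterior minus a scaled log state prior. The array names, the prior estimate and the eps guard are assumptions.

    import numpy as np

    def scaled_log_likelihoods(log_posteriors, state_priors, alpha=0.8, eps=1e-12):
        """log_posteriors: (num_frames, num_CD_states) array of log P(c_t | o_t) from the network.
        state_priors: (num_CD_states,) array of P(c_t), e.g. relative frequencies from the alignment.
        Returns log P(o_t | c_t) up to a constant: log P(c|o) - alpha * log P(c)."""
        return log_posteriors - alpha * np.log(state_priors + eps)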

Slide 27: Input features
Neural networks can handle high-dimensional, correlated features. We use 26 stacked frames of filterbank inputs (40-dimensional mel-spaced filterbanks).
[Figure: example filters learned in the first layer.]

Slide 28: Outline (section divider; same as Slide 2)

Slide 29: Rough History
- Multi-layer perceptron: 1986
- Speech recognition with neural networks; later superseded by GMMs
- Neural network features: 2002
- Deep networks: 2006 (Hinton, 2002)
- Deep networks for speech recognition: good results on TIMIT (Mohamed et al., 2009)
- Results on large vocabulary systems: 2010 (Dahl et al., 2011)
- Google launches DNN ASR product: 2011
- Dominant paradigm for ASR: 2012 (Hinton et al., 2012)

Slide 30: What is new?
- Fast GPU-based training (distributed CPU-based training is even faster)
- Pretraining (turns out not to be important)
- Deeper networks, enabled by faster training
- Large datasets
- Machine learning understanding

Slide 31: State of the art
Google's current production speech systems:
- 26 frames of 40-dimensional filterbank inputs
- 8 hidden layers of 2560 hidden units
- Rectified Linear nonlinearity (Zeiler et al., 2013)
- 14,000 outputs
- 85 million parameters, trained on 2,000 hours of speech data
- Run quantized, with 8-bit integer weights
- On Android phones we run a smaller model with 2.7M parameters.

Slide 32: Outline (section divider; same as Slide 2)

Slide 33: Sequence training for neural networks
- Neural networks are trained with a frame-level discriminative criterion (cross-entropy L_CE), far from the minimum-WER criterion we care about.
- GMM-HMMs trained with sequence-level discriminative training (MMI, bMMI (Povey et al., 2008), MPE, MBR etc.) outperform maximum-likelihood models.
- Kingsbury (2009) shows how to compute a gradient for back-propagation from the numerator and denominator statistics of the truth / alternative-hypothesis lattices.
- Given this outer gradient, we use back-propagation to compute parameter updates for the neural network.

Slide 34: Pretraining
If we have a small amount of supervised data, we can use unlabelled data to get the parameters into reasonable places by modelling the distribution of the inputs, without knowing the labels. Pretraining is done layer by layer, so it is faster than supervised training. There are several methods:
- Contrastive divergence RBM training
- Autoencoders
- Greedy layer-wise training [actually supervised]
but none seems necessary for large speech corpora.

Slide 35: Alternative nonlinearities
    Sigmoid:   σ(z) = 1 / (1 + e^{−z})     (37)
    Tanh:      σ(z) = tanh(z)              (38)
    ReLU:      σ(z) = max(z, 0)            (39)
    Softsign:  σ(z) = z / (1 + |z|)        (40)
    Softplus:  σ(z) = log(1 + e^z)         (41)
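
The five nonlinearities of eqns. 37-41, written out as numpy one-liners; a sketch, with function names of my choosing.

    import numpy as np

    def sigmoid(z):  return 1.0 / (1.0 + np.exp(-z))      # eqn. 37
    def tanh(z):     return np.tanh(z)                     # eqn. 38
    def relu(z):     return np.maximum(z, 0.0)             # eqn. 39
    def softsign(z): return z / (1.0 + np.abs(z))          # eqn. 40
    def softplus(z): return np.log1p(np.exp(z))            # eqn. 41: log(1 + e^z)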

Slide 36: Alternative nonlinearities
- Note: ReLU gives sparse activations.
- The ReLU gradient is zero for z < 0 and one for z > 0, so propagated gradients don't attenuate as much as with other nonlinearities.
- ReLU and softplus are unbounded.
- The gradients of the other nonlinearities asymptote (saturate) differently.

Slide 37: Neural network variants
Many variations:
- Convolutional neural networks (Abdel-Hamid et al., 2012): convolve a filter with the input; weight sharing saves parameters and gives invariance to frequency shifts.
- Recurrent neural networks: take one frame at a time but store a history of the previous frames, so they can in principle model long-term context.
- Long Short-Term Memory (Graves et al., 2013): a successful specialization of the recurrent neural network, with more complex memory cells.

Slide 38: Recurrent neural networks
A recurrent neural network has additional output nodes which are copied back to its inputs with a time delay (Robinson et al., 1993). Training is with Back-Propagation Through Time.
[Diagram: inputs x_1 ... x_4 and recurrent nodes r_1 ... r_6 feeding outputs y_1 ... y_5, with the recurrent nodes fed back to the input with a one-frame delay.]
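
A sketch of one step of a simple (Elman-style) recurrent network: the recurrent state r is fed back in with a one-frame delay. Dimensions, weight names and the tanh state nonlinearity are assumptions.

    import numpy as np

    def rnn_step(x_t, r_prev, W_in, W_rec, W_out, b_h, b_y):
        """One time step: new state from the current frame and the delayed state,
        then an output computed from the state."""
        r_t = np.tanh(W_in @ x_t + W_rec @ r_prev + b_h)   # recurrent state, copied back at the next frame
        y_t = W_out @ r_t + b_y                             # output (e.g. logits over CD states)
        return y_t, r_t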

Slide 39: Neural network language modelling
- Model P(w_n | w_{n−1}, w_{n−2}, w_{n−3}, ...) with a neural network instead of an n-gram (pure frequency counts with back-off).
- Simply train a softmax over each w_n, using an input representation of w_{n−1}, w_{n−2}, w_{n−3}, .... Even more effectively, train a recurrent neural network (Mikolov et al., 2010).
- Leads to word embeddings: a linear projection of sparse word identities (O(millions)) into a lower-dimensional (O(hundreds)) dense vector space.
- Easy to add other features (class, part-of-speech).
- Best performance when combined with an n-gram.
- Hard to do real-time decoding, though much of the performance can be retained when the knowledge is extracted and stored in a WFST (Arisoy et al., 2013).
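
A sketch of the feedforward neural LM described above: the previous words are looked up in a shared embedding matrix, concatenated, passed through a hidden layer, and a softmax predicts the next word. Vocabulary size, embedding dimension and all parameter names are illustrative assumptions.

    import numpy as np

    def nnlm_predict(context_word_ids, E, W_h, b_h, W_out, b_out):
        """context_word_ids: ids of w_{n-1}, w_{n-2}, w_{n-3}.
        E: (vocab_size, embed_dim) embedding matrix projecting sparse word
        identities into a dense, low-dimensional space."""
        h_in = np.concatenate([E[w] for w in context_word_ids])    # embed and concatenate the context
        h = np.tanh(W_h @ h_in + b_h)                              # hidden layer
        z = W_out @ h + b_out                                      # one logit per vocabulary word
        z -= z.max()
        p = np.exp(z); p /= p.sum()                                # softmax over the next word w_n
        return p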

Slide 40: Bibliography

Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., and Penn, G. (2012). Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In ICASSP. IEEE.
Arisoy, E., Chen, S. F., Ramabhadran, B., and Sethy, A. (2013). Converting neural network language models into back-off language models for efficient decoding in automatic speech recognition. In ICASSP. IEEE.
Dahl, G., Yu, D., Deng, L., and Acero, A. (2011). Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In ICASSP.
Graves, A., Jaitly, N., and Mohamed, A. (2013). Hybrid speech recognition with deep bidirectional LSTM. In ASRU.
Grezl, Karafiat, and Cernocky (2007). Neural network topologies and bottleneck features. Speech Recognition.
Hermansky, H., Ellis, D., and Sharma, S. (2000). Tandem connectionist feature extraction for conventional HMM systems. In ICASSP.
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29.
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation.
Kingsbury, B. (2009). Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling. In ICASSP.
Mikolov, T., Karafiát, M., Burget, L., Cernocky, J., and Khudanpur, S. (2010). Recurrent neural network based language model. In Interspeech.
Mohamed, A., Dahl, G., and Hinton, G. (2009). Deep belief networks for phone recognition. In NIPS.
Povey, D., Kanevsky, D., Kingsbury, B., Ramabhadran, B., Saon, G., and Visweswariah, K. (2008). Boosted MMI for model and feature-space discriminative training. In Proc. ICASSP.
Robinson, A. J., Almeida, L., Boite, J.-M., Bourlard, H., Fallside, F., Hochberg, M., Kershaw, D., Kohn, P., Konig, Y., Morgan, N., Neto, J. P., Renals, S., Saerens, M., and Wooters, C. (1993). A neural network based, speaker independent, large vocabulary, continuous speech recognition system: The Wernicke project. In Proc. Eurospeech '93.
Rosenblatt, F. (1957). The perceptron: a perceiving and recognizing automaton. Technical report, Cornell Aeronautical Laboratory.
Rumelhart, D. E., Hinton, G., and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323(6088).
Zeiler, M., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., and Hinton, G. (2013). On rectified linear units for speech processing. In ICASSP.
