Neural Architectures for Image, Language, and Speech Processing


Neural Architectures for Image, Language, and Speech Processing
Karl Stratos, June 26

Overview
- Feedforward Networks
- Need for Specialized Architectures
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory Networks (LSTMs)
- Example: Bidirectional LSTM Network for POS Tagging
- Encoder-Decoder Models
- Example: RNN-Based Seq2Seq
- Bonus: Connectionist Temporal Classification (CTC)

What's a Neural Network?
Just a composition of linear/nonlinear functions:
$$f(x) = W^{(L)} \tanh\left( W^{(L-1)} \cdots \tanh\left( W^{(1)} x \right) \cdots \right)$$
More like a paradigm, not a specific model.
1. Transform your input $x \to f(x)$.
2. Define a loss between $f(x)$ and the target label $y$.
3. Train the parameters by minimizing the loss.

You've Already Seen Some Neural Networks...
A log-linear model is a neural network with 0 hidden layers and a softmax output layer:
$$p(y \mid x) := \frac{\exp([Wx]_y)}{\sum_{y'} \exp([Wx]_{y'})} = \mathrm{softmax}_y(Wx)$$
Get $W$ by minimizing $L(W) = -\sum_i \log p(y_i \mid x_i)$.
Linear regression is a neural network with 0 hidden layers and the identity output layer:
$$f(x) := Wx$$
Get $W$ by minimizing $L(W) = \sum_i (y_i - f(x_i))^2$.
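The two losses above can be written out directly. The following is a minimal sketch in plain Python, not code from the lecture; the helper names (`softmax`, `matvec`, `log_linear_nll`, `squared_error`) and the list-of-rows matrix layout are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def log_linear_nll(W, data):
    """L(W) = -sum_i log p(y_i | x_i), where p(. | x) = softmax(W x).

    data is a list of (x, y) pairs with y an integer label index.
    """
    return -sum(math.log(softmax(matvec(W, x))[y]) for x, y in data)

def squared_error(W, data):
    """L(W) = sum_i ||y_i - W x_i||^2 for the identity output layer.

    data is a list of (x, y) pairs with y a target vector.
    """
    return sum(
        sum((yi - fi) ** 2 for yi, fi in zip(y, matvec(W, x)))
        for x, y in data
    )
```

With an all-zero $W$ the softmax is uniform, so each example contributes $\log K$ to the log-linear loss for $K$ labels, which is a convenient sanity check.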

Feedforward Network
Think: log-linear with extra transformation.
With 1 hidden layer:
$$h^{(1)} = \tanh(W^{(1)} x), \qquad p(y \mid x) = \mathrm{softmax}_y(h^{(1)})$$
With 2 hidden layers:
$$h^{(1)} = \tanh(W^{(1)} x), \qquad h^{(2)} = \tanh(W^{(2)} h^{(1)}), \qquad p(y \mid x) = \mathrm{softmax}_y(h^{(2)})$$
Again, get parameters $W^{(l)}$ by minimizing $-\sum_i \log p(y_i \mid x_i)$.
Q. What's the catch?
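The forward pass above is a short loop over layers. This is a hedged sketch rather than the lecture's implementation: the function name `feedforward` and the representation of each $W^{(l)}$ as a list of rows are assumptions, and, as on the slide, the softmax is applied directly to the last hidden layer.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def feedforward(x, weights):
    """Apply h^(l) = tanh(W^(l) h^(l-1)) for each layer, then softmax.

    weights is a list [W1, W2, ...]; one entry gives 1 hidden layer,
    two entries give 2 hidden layers, and so on.
    """
    h = x
    for W in weights:
        h = [math.tanh(v) for v in matvec(W, h)]
    return softmax(h)
```

Passing an empty `weights` list reduces this to a softmax over the raw input, which mirrors the "0 hidden layers" case.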

Training = Loss Minimization
We can decrease any continuous loss by following the negative subgradient.
1. Differentiate the loss with respect to the model parameters (backprop).
2. Take a gradient step.
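The two steps can be seen on the smallest possible model. The sketch below assumes a scalar linear regression $f(x) = wx$ with a hand-derived gradient standing in for backprop; the learning rate, step count, and toy data are illustrative choices, not values from the lecture.

```python
def loss(w, data):
    """Squared-error loss sum_i (y_i - w * x_i)^2 for a scalar weight w."""
    return sum((y - w * x) ** 2 for x, y in data)

def grad(w, data):
    """Step 1: derivative of the loss with respect to w."""
    return sum(-2.0 * x * (y - w * x) for x, y in data)

def gradient_descent(w, data, lr=0.01, steps=200):
    """Step 2, repeated: move w against the gradient."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

# Toy data generated by y = 2x, so the minimizer is w = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_hat = gradient_descent(0.0, data)
```

The step size matters: on this data a much larger `lr` would overshoot and diverge, which is why it is kept small here.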

Neural Networks are (Finite-Sample) Universal Learners!
Theorem (Zhang et al., 2016). Give me any
1. Set of $n$ samples $S = \{x^{(1)}, \ldots, x^{(n)}\} \subset \mathbb{R}^d$
2. Function $f : S \to \mathbb{R}$ that assigns some arbitrary value $f(x^{(i)})$ to each $i = 1 \ldots n$
Then I can specify a 1-hidden-layer feedforward network $C : S \to \mathbb{R}$ with $2n + d$ parameters such that $C(x^{(i)}) = f(x^{(i)})$ for all $i = 1 \ldots n$.
Proof. Define $C(x) = w^\top \mathrm{relu}((a^\top x, \ldots, a^\top x) - b)$ where $w, b \in \mathbb{R}^n$ and $a \in \mathbb{R}^d$ are network parameters. Choose $a, b$ so that the matrix $A_{i,j} := \max\{0, a^\top x^{(i)} - b_j\}$ is triangular with a nonzero diagonal (possible when the samples are distinct). Solve for $w$ in
$$\left( f(x^{(1)}), \ldots, f(x^{(n)}) \right)^\top = A w.$$
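The construction in the proof can be traced in a few lines. The sketch below is an illustration under stated assumptions: the projection vector `a` is supplied by the caller and must give distinct values $a^\top x^{(i)}$, and the thresholds are placed halfway between consecutive sorted projections, which is one valid choice among many and not something the slide specifies.

```python
def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def fit_memorizing_network(xs, ys, a):
    """Return (b, w) such that network(x_i, a, b, w) == y_i for all i.

    Assumes the projections dot(a, x_i) are pairwise distinct.
    """
    n = len(xs)
    z = [dot(a, x) for x in xs]
    order = sorted(range(n), key=lambda i: z[i])
    zs = [z[i] for i in order]
    # Interleave thresholds: b_1 < z_1 < b_2 < z_2 < ... < b_n < z_n,
    # so A[i][j] = max(0, z_i - b_j) is lower triangular, positive diagonal.
    b = [zs[0] - 1.0] + [(zs[j - 1] + zs[j]) / 2.0 for j in range(1, n)]
    # Forward substitution on the triangular system A w = y.
    w = []
    for i in range(n):
        partial = sum(max(0.0, zs[i] - b[j]) * w[j] for j in range(i))
        w.append((ys[order[i]] - partial) / (zs[i] - b[i]))
    return b, w

def network(x, a, b, w):
    """C(x) = w . relu(a.x - b), a 1-hidden-layer ReLU network."""
    z = dot(a, x)
    return sum(wj * max(0.0, z - bj) for wj, bj in zip(w, b))

# Three arbitrary points in R^2 with arbitrary targets.
xs = [(0.0, 1.0), (2.0, -1.0), (1.0, 3.0)]
ys = [3.0, -1.0, 7.5]
a = (1.0, 0.5)
b, w = fit_memorizing_network(xs, ys, a)
```

The parameter count matches the theorem: `a` has $d$ entries, and `b` and `w` have $n$ entries each.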

So Why Not Use a Simple Feedforward for Everything?
Computational reasons
- For example, using a GIANT feedforward network to cover instances of different sizes is clearly inefficient.
Empirical reasons
- In principle, we can learn any function. This tells us nothing about how to get there. How many samples do we need? How can we find the right parameters?
- Specializing an architecture to a particular type of computation allows us to incorporate inductive bias.
- The right architecture is absolutely critical in practice.

Image = Cube
$X \in \mathbb{R}^{w \times h \times d}$: width $w$, height $h$, depth $d$ (e.g., 3 RGB values)
(Figure: scene from Your Name (2016))

Convolutional Neural Network (CNN)
Think: slide various types of filters across the image.
We say we apply a filter $F^i \in \mathbb{R}^{f \times f \times d}$ with stride $s \in \mathbb{N}$ and zero-padding size $p \in \mathbb{N}$ to an image $X \in \mathbb{R}^{w \times h \times d}$ and obtain a slice $Z^i \in \mathbb{R}^{w' \times h' \times 1}$, where
$$w' = \frac{w - f + 2p}{s} + 1, \qquad h' = \frac{h - f + 2p}{s} + 1.$$
Each entry of $Z^i$ is given by a dot product between $F^i$ and the corresponding receptive field in $X$ (with zero-padding; indices shown for stride $s = 1$):
$$Z^i_{t,t',1} = \sum_{a,b,c} F^i_{a,b,c} \, X_{t+a,\,t'+b,\,c}$$
We use multiple filters and stack the slices: if $m$ filters are used, the output is $Z \in \mathbb{R}^{w' \times h' \times m}$.
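The size formula and the dot product over receptive fields can be checked with a direct, unoptimized implementation. This is a sketch under assumptions: images and filters are nested Python lists indexed `[row][column][channel]`, indices are 0-based, and the names `conv_output_size` and `apply_filter` are invented for illustration.

```python
def conv_output_size(n, f, p, s):
    """Output length along one axis: (n - f + 2p) / s + 1."""
    if (n - f + 2 * p) % s != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return (n - f + 2 * p) // s + 1

def apply_filter(X, F, stride=1, pad=0):
    """Slide one f x f x d filter F over a w x h x d image X.

    Returns a w' x h' slice; entries outside X count as zero (zero-padding).
    """
    w, h, d = len(X), len(X[0]), len(X[0][0])
    f = len(F)
    w_out = conv_output_size(w, f, pad, stride)
    h_out = conv_output_size(h, f, pad, stride)

    def pixel(i, j, c):
        # Shift from padded coordinates back to image coordinates.
        i, j = i - pad, j - pad
        if 0 <= i < w and 0 <= j < h:
            return X[i][j][c]
        return 0.0

    return [
        [
            sum(
                F[a][b][c] * pixel(stride * t + a, stride * u + b, c)
                for a in range(f)
                for b in range(f)
                for c in range(d)
            )
            for u in range(h_out)
        ]
        for t in range(w_out)
    ]
```

Stacking the slices from $m$ such filters along a third axis gives the $w' \times h' \times m$ output described above.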

An Example ConvNet Architecture
- Input: image $X \in \mathbb{R}^{w \times h \times d}$
- Convolutional layer: use $m$ filters to obtain $Z \in \mathbb{R}^{w' \times h' \times m}$.
- ReLU layer: apply element-wise relu to $Z$.
- Max pooling layer: use a $2 \times 2$ filter with stride 2 and appropriate zero-padding size to downsample $Z$ to $P \in \mathbb{R}^{(w'/2) \times (h'/2) \times m}$.
- Fully-connected layer: equivalent to a convolutional layer with $K$ giant filters $Q^k \in \mathbb{R}^{(w'/2) \times (h'/2) \times m}$.
- Output: vector $v \in \mathbb{R}^{1 \times 1 \times K}$ used for $K$-way classification of $X$.
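The ReLU and max pooling layers in the list above are simple enough to sketch on a single slice. The code below assumes a 2D slice stored as a list of rows with even height and width, so no padding is needed; the function names are illustrative.

```python
def relu(Z):
    """Element-wise max(0, z) on a 2D slice."""
    return [[max(0.0, v) for v in row] for row in Z]

def max_pool_2x2(Z):
    """2 x 2 max pooling with stride 2; assumes even height and width."""
    rows, cols = len(Z), len(Z[0])
    return [
        [
            max(Z[i][j], Z[i][j + 1], Z[i + 1][j], Z[i + 1][j + 1])
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]

Z = [
    [1.0, -2.0, 3.0, 0.0],
    [-1.0, 5.0, -6.0, 2.0],
    [0.0, 0.0, -3.0, -4.0],
    [7.0, -8.0, -1.0, -2.0],
]
P = max_pool_2x2(relu(Z))  # a 4 x 4 slice becomes 2 x 2
```

In a full network the same two operations would be applied independently to each of the $m$ slices.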

In Practice, Many Such Layers

Filters Correspond to Patterns

Filters at Lower Layer vs Higher Layer (Zeiler and Fergus, 2013)

Vanishing/Exploding Gradient Problem in Deep Networks
$$h_l = \mathrm{Conv}(\mathrm{Conv}(\mathrm{Conv}(\mathrm{Conv}(\mathrm{Conv}(\cdots(x)\cdots)))))$$
ResNet (He et al., 2015): at each layer $l$,
$$h_l = \mathrm{Conv}(h_{l-1}) + h_{l-1}$$
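One way to see why the skip connection can help is to differentiate the residual update. This is a standard chain-rule calculation rather than something stated on the slide; $J_l$ is an assumed shorthand for the Jacobian of the $l$-th Conv layer with respect to its input, and $L$ is the number of layers.

```latex
\frac{\partial h_l}{\partial h_{l-1}} = J_l + I,
\qquad
\frac{\partial h_L}{\partial h_0}
  = (J_L + I)(J_{L-1} + I) \cdots (J_1 + I)
```

Without the skip connection the product would be $J_L J_{L-1} \cdots J_1$, which can shrink or blow up as $L$ grows. With it, expanding the product always leaves an identity term, so the gradient has a direct path from the output back to the input.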

Language/Speech = Sequence
$x_1 \ldots x_N \in \mathbb{R}^d$: each $x_i$ is a $d$-dimensional embedding of the $i$-th input
- Text: $x_i$ is a word embedding (also to be learned)
- Speech: $x_i$ is an acoustic feature vector
Two properties of this type of data
1. Length $N$ is not fixed
2. Typically processed from left to right (most of the time)

Recurrent Neural Network (RNN)
Think: HMM (or Kalman filter) with extra transformation
Input: sequence $x_1 \ldots x_N \in \mathbb{R}^d$
For $i = 1 \ldots N$,
$$h_i = \tanh(W x_i + V h_{i-1})$$
Output: sequence $h_1 \ldots h_N \in \mathbb{R}^d$
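The recurrence is a single loop that reuses the same $W$ and $V$ at every position. The following is a minimal sketch, assuming matrices are lists of rows and that the caller supplies the initial state $h_0$ (the choice of $h_0$ is left open here); `rnn_forward` is a hypothetical name.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, W, V, h0):
    """Return [h_1, ..., h_N] with h_i = tanh(W x_i + V h_{i-1})."""
    hs = []
    h = h0
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W, x), matvec(V, h))]
        h = [math.tanh(p) for p in pre]
        hs.append(h)
    return hs
```

Because the loop runs once per input, the same function handles sequences of any length $N$ without changing the parameters.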

RNN ≈ Deep Feedforward
Unroll the expression for the last output vector $h_N$:
$$h_N = \tanh\left( W x_N + V \tanh\left( W x_{N-1} + V \cdots \tanh\left( W x_1 + V h_0 \right) \cdots \right) \right)$$
It's just a deep feedforward network with one important difference: the parameters are reused, so $(V, W)$ are applied $N$ times.
Training: do backprop on this unrolled network, then update the parameters.

LSTM
An RNN produces a sequence of output vectors:
$$x_1 \ldots x_N \to h_1 \ldots h_N$$
An LSTM produces memory cell vectors along with the output:
$$x_1 \ldots x_N \to c_1 \ldots c_N,\; h_1 \ldots h_N$$
These $c_1 \ldots c_N$ enable the network to keep or drop information from previous states.

LSTM: Details
At each time step $i$,
- Compute a masking vector for the memory cell:
$$q_i = \sigma\left( U^q x_i + V^q h_{i-1} + W^q c_{i-1} \right) \in [0, 1]^d$$
- Use $q_i$ to keep/forget dimensions in the previous memory cell:
$$c_i = (1 - q_i) \odot c_{i-1} + q_i \odot \tanh\left( U^c x_i + V^c h_{i-1} \right)$$
- Compute another masking vector for the output:
$$o_i = \sigma\left( U^o x_i + V^o h_{i-1} + W^o c_i \right) \in [0, 1]^d$$
- Use $o_i$ to keep/forget dimensions in the current memory cell:
$$h_i = o_i \odot \tanh(c_i)$$
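The four updates above translate line by line into code. The sketch below follows the variant on this slide (a single gate $q_i$ shared between keeping and writing, with the memory cell feeding both gates), which differs from the LSTM layers shipped in common libraries. The parameter dictionary `P` and its key names are assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vs):
    """Element-wise sum of several vectors."""
    return [sum(t) for t in zip(*vs)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]

def lstm_step(x, h_prev, c_prev, P):
    """One time step; P maps 'Uq', 'Vq', 'Wq', 'Uc', 'Vc', 'Uo', 'Vo', 'Wo' to matrices."""
    # Masking vector for the memory cell.
    q = sigmoid(add(matvec(P["Uq"], x), matvec(P["Vq"], h_prev), matvec(P["Wq"], c_prev)))
    # Candidate content, then keep/forget per dimension.
    cand = [math.tanh(t) for t in add(matvec(P["Uc"], x), matvec(P["Vc"], h_prev))]
    c = [(1.0 - qi) * ci + qi * gi for qi, ci, gi in zip(q, c_prev, cand)]
    # Masking vector for the output, computed from the new cell.
    o = sigmoid(add(matvec(P["Uo"], x), matvec(P["Vo"], h_prev), matvec(P["Wo"], c)))
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c
```

With all parameters set to zero, both gates equal 0.5, so the cell is halved at each step; this makes a quick hand check of the wiring.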

Build Your Network
Model parameters
- Embedding $e_c$ for each character type $c$
- 2 LSTMs (forward/backward) at character level
- Embedding $e_w$ for each word type $w$
- 2 LSTMs (forward/backward) at word level
- Feedforward network $(U, V)$ at the output
Example sentence: the dog saw the cat

1. Character-Level LSTMs
Forward character-level LSTM: $e_d, e_o, e_g \to f^c_1, f^c_2, f^c_3 \in \mathbb{R}^{d_c}$
Backward character-level LSTM: $e_g, e_o, e_d \to b^c_1, b^c_2, b^c_3 \in \mathbb{R}^{d_c}$
Get a character-aware encoding of dog:
$$x_{dog} = \begin{bmatrix} f^c_3 \\ b^c_3 \\ e_{dog} \end{bmatrix} \in \mathbb{R}^{2 d_c + d_w}$$

2. Word-Level LSTMs
Forward word-level LSTM: $x_{the}, x_{dog}, x_{saw}, x_{the}, x_{cat} \to f^w_1, f^w_2, f^w_3, f^w_4, f^w_5 \in \mathbb{R}^d$
Backward word-level LSTM: $x_{cat}, x_{the}, x_{saw}, x_{dog}, x_{the} \to b^w_1, b^w_2, b^w_3, b^w_4, b^w_5 \in \mathbb{R}^d$
Get a sentence-aware encoding of dog:
$$z_{dog} = \begin{bmatrix} f^w_2 \\ b^w_4 \end{bmatrix} \in \mathbb{R}^{2d}$$
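The index bookkeeping (forward state 2 paired with backward state 4 for the second of five words) is the part that is easy to get wrong. The sketch below isolates it: `run` and `bidirectional` are hypothetical helpers, and a running sum stands in for the LSTM step purely so the alignment is visible; it is not a real recurrent cell.

```python
def run(step, xs, h0):
    """Apply a recurrent step function left to right, collecting every state."""
    hs = []
    h = h0
    for x in xs:
        h = step(x, h)
        hs.append(h)
    return hs

def bidirectional(xs, fwd_step, bwd_step, h0):
    """Concatenate, for each position, the forward and backward states."""
    n = len(xs)
    f = run(fwd_step, xs, h0)
    b = run(bwd_step, list(reversed(xs)), h0)
    # 0-based position i is step i of the forward pass
    # and step n - 1 - i of the backward pass.
    return [f[i] + b[n - 1 - i] for i in range(n)]

# Stand-in step (an assumption for illustration): a running sum.
def running_sum(x, h):
    return [h[0] + x[0]]

words = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # the, dog, saw, the, cat
z = bidirectional(words, running_sum, running_sum, [0.0])
```

Here `z[1]`, the encoding of the second word, combines the forward sum over the first two inputs with the backward sum over the last four, matching the $f^w_2$, $b^w_4$ pairing above.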

3. Feedforward
The final vector $h_{dog} \in \mathbb{R}^m$ has dimension equal to the number of tag types $m$, computed by a feedforward network:
$$h_{dog} = V \tanh(U z_{dog})$$
Think: $[h_{dog}]_N$ = score of tag N for dog

Greedy Model
Define the distribution over tags at each position as:
$$p(N \mid dog) = \mathrm{softmax}_N(h_{dog})$$
Given tagged words $(x^{(1)}, y^{(1)}) \ldots (x^{(n)}, y^{(n)})$, minimize the loss
$$L(\Theta) = -\sum_{i=1}^{n} \log p(y^{(i)} \mid x^{(i)})$$
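The greedy model scores each word independently, so prediction is an argmax over the softmax and training minimizes the summed negative log-probability of the gold tags. The sketch below assumes the sentence-aware encoding $z$ has already been computed; `tag_distribution`, `greedy_tag`, and `nll` are hypothetical names, and the tiny matrices in the example are made up.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tag_distribution(z, U, V):
    """p(. | word) = softmax(V tanh(U z))."""
    hidden = [math.tanh(t) for t in matvec(U, z)]
    return softmax(matvec(V, hidden))

def greedy_tag(z, U, V, tags):
    """Pick the most probable tag for this position on its own."""
    probs = tag_distribution(z, U, V)
    best = max(range(len(tags)), key=lambda k: probs[k])
    return tags[best]

def nll(examples, U, V):
    """-sum_i log p(y_i | x_i); examples are (encoding, gold tag index) pairs."""
    return -sum(math.log(tag_distribution(z, U, V)[y]) for z, y in examples)

# Made-up parameters: 2-dimensional encoding, 3 tag types.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
tags = ["N", "V", "D"]
prediction = greedy_tag([2.0, -1.0], U, V, tags)
```

Because each position is decided on its own, no dependencies between neighboring tags are modeled at prediction time.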


Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as

More information

Slide credit from Hung-Yi Lee & Richard Socher

Slide credit from Hung-Yi Lee & Richard Socher Slide credit from Hung-Yi Lee & Richard Socher 1 Review Recurrent Neural Network 2 Recurrent Neural Network Idea: condition the neural network on all previous words and tie the weights at each time step

More information

Machine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016

Machine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 Machine Learning for Signal Processing Neural Networks Continue Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 1 So what are neural networks?? Voice signal N.Net Transcription Image N.Net Text

More information

Neural Networks 2. 2 Receptive fields and dealing with image inputs

Neural Networks 2. 2 Receptive fields and dealing with image inputs CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There

More information

Lecture 17: Neural Networks and Deep Learning

Lecture 17: Neural Networks and Deep Learning UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions

More information

arxiv: v3 [cs.lg] 14 Jan 2018

arxiv: v3 [cs.lg] 14 Jan 2018 A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation Gang Chen Department of Computer Science and Engineering, SUNY at Buffalo arxiv:1610.02583v3 [cs.lg] 14 Jan 2018 1 abstract We describe

More information

Deep Learning Sequence to Sequence models: Attention Models. 17 March 2018

Deep Learning Sequence to Sequence models: Attention Models. 17 March 2018 Deep Learning Sequence to Sequence models: Attention Models 17 March 2018 1 Sequence-to-sequence modelling Problem: E.g. A sequence X 1 X N goes in A different sequence Y 1 Y M comes out Speech recognition:

More information

CSC321 Lecture 16: ResNets and Attention

CSC321 Lecture 16: ResNets and Attention CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the

More information

Deep Learning Recurrent Networks 2/28/2018

Deep Learning Recurrent Networks 2/28/2018 Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Pushpak Bhattacharyya CSE Dept, IIT Patna and Bombay LSTM 15 jun, 2017 lgsoft:nlp:lstm:pushpak 1 Recap 15 jun, 2017 lgsoft:nlp:lstm:pushpak 2 Feedforward Network and Backpropagation

More information

CSC321 Lecture 20: Reversible and Autoregressive Models

CSC321 Lecture 20: Reversible and Autoregressive Models CSC321 Lecture 20: Reversible and Autoregressive Models Roger Grosse Roger Grosse CSC321 Lecture 20: Reversible and Autoregressive Models 1 / 23 Overview Four modern approaches to generative modeling:

More information

Sequence Models. Ji Yang. Department of Computing Science, University of Alberta. February 14, 2018

Sequence Models. Ji Yang. Department of Computing Science, University of Alberta. February 14, 2018 Sequence Models Ji Yang Department of Computing Science, University of Alberta February 14, 2018 This is a note mainly based on Prof. Andrew Ng s MOOC Sequential Models. I also include materials (equations,

More information

Recurrent and Recursive Networks

Recurrent and Recursive Networks Neural Networks with Applications to Vision and Language Recurrent and Recursive Networks Marco Kuhlmann Introduction Applications of sequence modelling Map unsegmented connected handwriting to strings.

More information

Recurrent Neural Network

Recurrent Neural Network Recurrent Neural Network Xiaogang Wang March 2, 2017 Xiaogang Wang (linux) Recurrent Neural Network March 2, 2017 1 / 48 Outline 1 Recurrent neural networks Recurrent neural networks

More information

Conditional Language modeling with attention

Conditional Language modeling with attention Conditional Language modeling with attention 2017.08.25 Oxford Deep NLP 조수현 Review Conditional language model: assign probabilities to sequence of words given some conditioning context x What is the probability

More information

Lecture 35: Optimization and Neural Nets

Lecture 35: Optimization and Neural Nets Lecture 35: Optimization and Neural Nets CS 4670/5670 Sean Bell DeepDream [Google, Inceptionism: Going Deeper into Neural Networks, blog 2015] Aside: CNN vs ConvNet Note: There are many papers that use

More information

Recurrent Neural Networks. Jian Tang

Recurrent Neural Networks. Jian Tang Recurrent Neural Networks Jian Tang 1 RNN: Recurrent neural networks Neural networks for sequence modeling Summarize a sequence with fix-sized vector through recursively updating

More information

Modelling Time Series with Neural Networks. Volker Tresp Summer 2017

Modelling Time Series with Neural Networks. Volker Tresp Summer 2017 Modelling Time Series with Neural Networks Volker Tresp Summer 2017 1 Modelling of Time Series The next figure shows a time series (DAX) Other interesting time-series: energy prize, energy consumption,

More information

Deep Learning for Automatic Speech Recognition Part II

Deep Learning for Automatic Speech Recognition Part II Deep Learning for Automatic Speech Recognition Part II Xiaodong Cui IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Fall, 2018 Outline A brief revisit of sampling, pitch/formant and MFCC DNN-HMM

More information

Neural Networks in Structured Prediction. November 17, 2015

Neural Networks in Structured Prediction. November 17, 2015 Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 20: Sequence labeling (April 9, 2019) David Bamman, UC Berkeley POS tagging NNP Labeling the tag that s correct for the context. IN JJ FW SYM IN JJ

More information

Convolutional Neural Networks

Convolutional Neural Networks Convolutional Neural Networks Books» Books reviews»»»

More information

CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning

CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning Lei Lei Ruoxuan Xiong December 16, 2017 1 Introduction Deep Neural Network

More information

Convolutional Neural Networks II. Slides from Dr. Vlad Morariu

Convolutional Neural Networks II. Slides from Dr. Vlad Morariu Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate

More information

CSC321 Lecture 15: Exploding and Vanishing Gradients

CSC321 Lecture 15: Exploding and Vanishing Gradients CSC321 Lecture 15: Exploding and Vanishing Gradients Roger Grosse Roger Grosse CSC321 Lecture 15: Exploding and Vanishing Gradients 1 / 23 Overview Yesterday, we saw how to compute the gradient descent

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information

Lecture 5 Neural models for NLP

Lecture 5 Neural models for NLP CS546: Machine Learning in NLP (Spring 2018) Lecture 5 Neural models for NLP Julia Hockenmaier 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm

More information

Recurrent Neural Networks. Why sequence models?

Recurrent Neural Networks. Why sequence models? Recurrent Neural Networks Why sequence models? Examples of sequence data The quick brown fox jumped over the lazy dog. Speech recognition Music generation Sentiment classification There

More information

End-to-end Automatic Speech Recognition

End-to-end Automatic Speech Recognition End-to-end Automatic Speech Recognition Markus Nussbaum-Thom IBM Thomas J. Watson Research Center Yorktown Heights, NY 10598, USA Markus Nussbaum-Thom. February 22, 2017 Nussbaum-Thom: IBM Thomas J. Watson

More information

EE-559 Deep learning Recurrent Neural Networks

EE-559 Deep learning Recurrent Neural Networks EE-559 Deep learning 11.1. Recurrent Neural Networks François Fleuret Sun Feb 24 20:33:31 UTC 2019 Inference from sequences François Fleuret EE-559 Deep learning / 11.1. Recurrent

More information

Long-Short Term Memory and Other Gated RNNs

Long-Short Term Memory and Other Gated RNNs Long-Short Term Memory and Other Gated RNNs Sargur Srihari This is part of lecture slides on Deep Learning: 1 Topics in Sequence Modeling

More information

Stephen Scott.

Stephen Scott. 1 / 35 (Adapted from Vinod Variyam and Ian Goodfellow) 2 / 35 All our architectures so far work on fixed-sized inputs neural networks work on sequences of inputs E.g., text, biological

More information

Introduction to Convolutional Neural Networks 2018 / 02 / 23

Introduction to Convolutional Neural Networks 2018 / 02 / 23 Introduction to Convolutional Neural Networks 2018 / 02 / 23 Buzzword: CNN Convolutional neural networks (CNN, ConvNet) is a class of deep, feed-forward (not recurrent) artificial neural networks that

More information

Agenda. Digit Classification using CNN Digit Classification using SAE Visualization: Class models, filters and saliency 2 DCT

Agenda. Digit Classification using CNN Digit Classification using SAE Visualization: Class models, filters and saliency 2 DCT versus 1 Agenda Deep Learning: Motivation Learning: Backpropagation Deep architectures I: Convolutional Neural Networks (CNN) Deep architectures II: Stacked Auto Encoders (SAE) Caffe Deep Learning Toolbox:

More information

Introduction to RNNs!

Introduction to RNNs! Introduction to RNNs Arun Mallya Best viewed with Computer Modern fonts installed Outline Why Recurrent Neural Networks (RNNs)? The Vanilla RNN unit The RNN forward pass Backpropagation refresher The RNN

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

An overview of deep learning methods for genomics

An overview of deep learning methods for genomics An overview of deep learning methods for genomics Matthew Ploenzke STAT115/215/BIO/BIST282 Harvard University April 19, 218 1 Snapshot 1. Brief introduction to convolutional neural networks What is deep

More information

Memory-Augmented Attention Model for Scene Text Recognition

Memory-Augmented Attention Model for Scene Text Recognition Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences

More information

Recurrent Neural Networks

Recurrent Neural Networks Charu C. Aggarwal IBM T J Watson Research Center Yorktown Heights, NY Recurrent Neural Networks Neural Networks and Deep Learning, Springer, 218 Chapter 7.1 7.2 The Challenges of Processing Sequences Conventional

More information

Recurrent Neural Networks (Part - 2) Sumit Chopra Facebook

Recurrent Neural Networks (Part - 2) Sumit Chopra Facebook Recurrent Neural Networks (Part - 2) Sumit Chopra Facebook Recap Standard RNNs Training: Backpropagation Through Time (BPTT) Application to sequence modeling Language modeling Applications: Automatic speech

More information

Lecture 15: Exploding and Vanishing Gradients

Lecture 15: Exploding and Vanishing Gradients Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train

More information

Structured Neural Networks (I)

Structured Neural Networks (I) Structured Neural Networks (I) CS 690N, Spring 208 Advanced Natural Language Processing http://peoplecsumassedu/~brenocon/anlp208/ Brendan O Connor College of Information and Computer Sciences University

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Vanishing and Exploding Gradients. ReLUs. Xavier Initialization

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Vanishing and Exploding Gradients. ReLUs. Xavier Initialization TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Vanishing and Exploding Gradients ReLUs Xavier Initialization Batch Normalization Highway Architectures: Resnets, LSTMs and GRUs Causes

More information

Bayesian Networks (Part I)

Bayesian Networks (Part I) 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part I) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,

More information

Lecture 4 Backpropagation

Lecture 4 Backpropagation Lecture 4 Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 5, 2017 Things we will look at today More Backpropagation Still more backpropagation Quiz

More information

Lecture 11 Recurrent Neural Networks I

Lecture 11 Recurrent Neural Networks I Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 01, 2017 Introduction Sequence Learning with Neural Networks Some Sequence Tasks

More information

Task-Oriented Dialogue System (Young, 2000)

Task-Oriented Dialogue System (Young, 2000) 2 Review Task-Oriented Dialogue System (Young, 2000) 3 Speech Signal Speech Recognition Hypothesis are there any action movies to see

More information

Neural Networks. Intro to AI Bert Huang Virginia Tech

Neural Networks. Intro to AI Bert Huang Virginia Tech Neural Networks Intro to AI Bert Huang Virginia Tech Outline Biological inspiration for artificial neural networks Linear vs. nonlinear functions Learning with neural networks: back propagation

More information

Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)

Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) Sanjeev Arora Elad Hazan Recap: Structure of a deep

More information

Convolutional Neural Network Architecture

Convolutional Neural Network Architecture Convolutional Neural Network Architecture Zhisheng Zhong Feburary 2nd, 2018 Zhisheng Zhong Convolutional Neural Network Architecture Feburary 2nd, 2018 1 / 55 Outline 1 Introduction of Convolution Motivation

More information

Introduction to Convolutional Neural Networks (CNNs)

Introduction to Convolutional Neural Networks (CNNs) Introduction to Convolutional Neural Networks (CNNs) Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei

More information

Recurrent Neural Networks Deep Learning Lecture 5. Efstratios Gavves

Recurrent Neural Networks Deep Learning Lecture 5. Efstratios Gavves Recurrent Neural Networks Deep Learning Lecture 5 Efstratios Gavves Sequential Data So far, all tasks assumed stationary data Neither all data, nor all tasks are stationary though Sequential Data: Text

More information

Machine Translation. 10: Advanced Neural Machine Translation Architectures. Rico Sennrich. University of Edinburgh. R. Sennrich MT / 26
