Recurrent Neural Networks Deep Learning Lecture 5. Efstratios Gavves


1 Recurrent Neural Networks Deep Learning Lecture 5 Efstratios Gavves

2 Sequential Data So far, all tasks assumed stationary data. Neither all data nor all tasks are stationary, though.

3 Sequential Data: Text What

4 Sequential Data: Text What about

5 Sequential Data: Text What about inputs that appear in sequences, such as text? Could a neural network handle such modalities?

6 Memory needed $\Pr(x) = \prod_i \Pr(x_i \mid x_1, \dots, x_{i-1})$ What about inputs that appear in sequences, such as text? Could a neural network handle such modalities?

7 Sequential data: Video

8 Quiz: Other sequential data?

9 Quiz: Other sequential data? Time series data: stock exchange, biological measurements, climate measurements, market analysis, speech/music, user behavior in websites, ...

10 Applications Click to go to the video in Youtube Click to go to the website

11 Machine Translation The phrase in the source language is one sequence: "Today the weather is good". The phrase in the target language is also a sequence: "Погода сегодня хорошая". [Diagram: an encoder of LSTM cells reads "Today the weather is good <EOS>"; a decoder of LSTM cells then emits "Погода сегодня хорошая <EOS>".]
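
A minimal sketch of this encoder-decoder setup, assuming PyTorch; the vocabulary sizes, dimensions, and module names are illustrative, not the lecture's code:

```python
# Encoder-decoder sketch: an encoder LSTM summarizes the source sentence,
# a decoder LSTM generates the target sentence from that summary.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder compresses the source sentence into its final (h, c) state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decoder starts from that state and predicts the target tokens.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)              # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 6))          # e.g. "Today the weather is good <EOS>"
tgt = torch.randint(0, 1000, (2, 4))          # e.g. "Погода сегодня хорошая <EOS>"
print(model(src, tgt).shape)                  # torch.Size([2, 4, 1000])
```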

12 Image captioning An image is a thousand words, literally! Pretty much the same as machine translation: replace the encoder part with the output of a ConvNet, e.g. an AlexNet or a VGG16 network, and keep the decoder part to operate as a translator. [Diagram: a ConvNet encodes the image; a decoder of LSTM cells then generates "Today the weather is good <EOS>".]

13 Demo Click to go to the video in Youtube

14 Question answering Bleeding-edge research, no real consensus; very interesting open research problems. Again, pretty much like machine translation: the Encoder-Decoder paradigm. Insert the question into the encoder part and model the answer at the decoder part. Question answering with images also exists; again, bleeding-edge research. How/where to add the image? What has been working so far is to add the image only in the beginning. Q: John entered the living room, where he met Mary. She was drinking some wine and watching a movie. What room did John enter? A: John entered the living room. Q: What are the people playing? A: They play beach football.

15 Demo Click to go to the website

16 Handwriting Click to go to the website

17 Recurrent Networks [Diagram: an NN cell with an input, an output, an internal cell state, and recurrent connections feeding the state back into the cell.]

18 Sequences Next data depend on previous data. Roughly equivalent to predicting what comes next: $\Pr(x) = \prod_i \Pr(x_i \mid x_1, \dots, x_{i-1})$ What about inputs that appear in sequences, such as text? Could a neural network handle such modalities?
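
A tiny sketch of this chain-rule factorization; the toy conditional scorer below stands in for a learned model and its probabilities are made up:

```python
# Chain rule for sequences: Pr(x) = prod_i Pr(x_i | x_1, ..., x_{i-1}).
import numpy as np

def sequence_log_prob(tokens, cond_prob):
    logp, history = 0.0, []
    for tok in tokens:
        logp += np.log(cond_prob(tok, history))  # log Pr(x_i | history)
        history.append(tok)
    return logp

def cond_prob(token, history):
    # Toy rule: "weather" is likely right after "the"; otherwise a flat 0.2.
    if history and history[-1] == "the" and token == "weather":
        return 0.5
    return 0.2

print(sequence_log_prob(["today", "the", "weather", "is", "good"], cond_prob))
```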

19 Why sequences? Parameters are reused Fewer parameters Easier modelling Generalizes well to arbitrary lengths Not constrained to specific length RecurrentModel( RecurrentModel( I think, therefore, I am! ) Everything is repeated, in a circle. History is a master because it teaches us that it doesn't exist. It's the permutations that matter. ) However, often we pick a frame T instead of an arbitrary length Pr x = Pr x i x i T,, x i 1 ) i

20 Quiz: What is really a sequence? Data inside a sequence are...? [Slide graphic: the words shown shuffled: "McGuire Bond I am Bond, James tired am!"]

21 Quiz: What is really a sequence? Data inside a sequence are non-i.i.d. (i.i.d.: independently, identically distributed). The next word depends on the previous words, ideally on all of them. We need context, and we need memory! How to model context and memory? [Slide graphic: the shuffled words rearranged into "I am Bond, James Bond".]

22 One-hot vectors A one-hot vector $x_i$ has all zeros except for the active dimension. 12 words in a sequence → 12 one-hot vectors. After the one-hot vectors, apply an embedding (Word2Vec, GloVe). [Slide graphic: the vocabulary "I am Bond James tired , McGuire !" with its one-hot vectors, each having a single 1 at its word's position.]
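
A minimal NumPy sketch of one-hot vectors followed by an embedding lookup; the vocabulary and the embedding size are illustrative:

```python
import numpy as np

vocab = ["I", "am", "Bond", "James", "tired", ",", "McGuire", "!"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word, vocab_size=len(vocab)):
    v = np.zeros(vocab_size)
    v[word_to_idx[word]] = 1.0        # all zeros except the active dimension
    return v

# An embedding is just a matrix: one-hot times E selects a row of E.
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 3))  # vocab_size x embedding_dim

x = one_hot("Bond")
print(x)                                              # [0. 0. 1. 0. 0. 0. 0. 0.]
print(np.allclose(x @ E, E[word_to_idx["Bond"]]))     # True: lookup == row select
```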

23 Quiz: Why not just indices? One-hot representation or index representation? For $x_{t=1,2,3,4}$ = "I am James McGuire", the index representation would be x("I") = 1, x("am") = 2, x("James") = 4, x("McGuire") = 7.

24 Quiz: Why not just indices? One-hot representation: distance(x("am"), x("McGuire")) = 1 and distance(x("I"), x("am")) = 1, so all words are equally far apart. Index representation: q("I") = 1, q("am") = 2, q("James") = 4, q("McGuire") = 7, so distance(q("am"), q("McGuire")) = 5 while distance(q("I"), q("am")) = 1. No, because then some words get closer together for no good reason (an artificial bias between words).

25 Memory A representation of the past. A memory must project information at timestep t onto a latent space $c_t$ using parameters $\theta$, then re-use the projected information from t at t+1: $c_{t+1} = h(x_{t+1}, c_t; \theta)$. Memory parameters $\theta$ are shared for all timesteps $t = 0, 1, \dots$: $c_{t+1} = h(x_{t+1}, h(x_t, h(x_{t-1}, \dots h(x_1, c_0; \theta)\dots; \theta); \theta); \theta)$

26 Memory as a Graph Simplest model: input with parameters U, memory embedding with parameters W, output with parameters V. [Diagram: input $x_t$ feeds memory $c_t$ through U; $c_t$ feeds output $y_t$ through V; W is the recurrent (memory) parameter.]

27 Memory as a Graph Simplest model: input with parameters U, memory embedding with parameters W, output with parameters V. [Diagram: two timesteps unrolled; W connects $c_t$ to $c_{t+1}$.]

28 Memory as a Graph Simplest model: input with parameters U, memory embedding with parameters W, output with parameters V. [Diagram: four timesteps unrolled; each $x_t$ enters through U, each $y_t$ exits through V, and W links $c_t$ to $c_{t+1}$.]

29 Folding the memory [Diagram: the unrolled/unfolded network ($x_t, x_{t+1}, x_{t+2}$ entering through U, memories $c_t, c_{t+1}, c_{t+2}$ linked by W, outputs through V) next to the folded network (a single cell with a recurrent W connection from $c_{t-1}$).]

30 Recurrent Neural Network (RNN) Only two equations: $c_t = \tanh(U x_t + W c_{t-1})$ and $y_t = \mathrm{softmax}(V c_t)$.

31 RNN Example A vocabulary of 5 words. A memory of 3 units (a hyperparameter that we choose, like layer size): $c_t$: $3 \times 1$, $W$: $3 \times 3$. An input projection of 3 dimensions: $U$: $3 \times 5$. An output projection of 1 dimension: $V$: $1 \times 3$. For a one-hot input, $U x_{t=4}$ simply selects the corresponding column of U. $c_t = \tanh(U x_t + W c_{t-1})$, $y_t = \mathrm{softmax}(V c_t)$
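
A minimal NumPy sketch of this exact setup (vocabulary of 5, memory of 3, V of shape 1×3); the random initialization and the toy input sequence are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 5))   # input projection
W = rng.normal(scale=0.1, size=(3, 3))   # memory (recurrent) weights
V = rng.normal(scale=0.1, size=(1, 3))   # output projection (1-dim, as on the slide)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, c_prev):
    c_t = np.tanh(U @ x_t + W @ c_prev)   # c_t = tanh(U x_t + W c_{t-1})
    y_t = softmax(V @ c_t)                # y_t = softmax(V c_t); for a single
    return c_t, y_t                       # output dimension this is trivially [1.0]

c = np.zeros(3)                           # c_0
for idx in [3, 1, 4, 0]:                  # word indices of the sequence
    x = np.zeros(5); x[idx] = 1.0         # one-hot input
    c, y = rnn_step(x, c)
print(c.shape, y.shape)                   # (3,) (1,)
```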

32 Rolled Network vs. MLP? What is really different? Steps instead of layers. Step parameters are shared in a Recurrent Network; in a Multi-Layer Network the parameters are different. [Diagram: a 3-gram unrolled recurrent network with the same U, W, V at every step, next to a 3-layer neural network with distinct $W_1, W_2, W_3$.]

33 Rolled Network vs. MLP? What is really different? Steps instead of layers. Step parameters are shared in a Recurrent Network; in a Multi-Layer Network the parameters are different. [Same comparison diagram as on the previous slide.]

34 Rolled Network vs. MLP? What is really different? Steps instead of layers. Sometimes intermediate outputs are not even needed; removing them, we almost end up with a standard Neural Network. Step parameters are shared in a Recurrent Network; in a Multi-Layer Network the parameters are different. [Diagram: 3-gram unrolled recurrent network vs. 3-layer neural network.]

35 Rolled Network vs. MLP? What is really different? Steps instead of layers. Step parameters are shared in a Recurrent Network; in a Multi-Layer Network the parameters are different. [Diagram: the unrolled recurrent network with only the final output kept, next to the 3-layer neural network.]

36 Training Recurrent Networks Cross-entropy loss: $P = \prod_{t,k} y_{tk}^{l_{tk}}$, so $L = -\log P = \sum_{t=1}^{T} L_t$ with $L_t = -l_t \log y_t$. Backpropagation Through Time (BPTT): again the chain rule; the only difference is that gradients survive over the time steps.
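
A small numeric sketch of summing the per-timestep cross-entropy terms $L_t$; the predictions and targets below are made up for illustration:

```python
import numpy as np

y_pred = np.array([[0.7, 0.2, 0.1],     # softmax outputs at t = 1, 2, 3
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
targets = np.array([0, 1, 2])           # index of the correct class per timestep

L_t = -np.log(y_pred[np.arange(len(targets)), targets])   # per-timestep losses
print(L_t, L_t.sum())                                      # L = sum over timesteps
```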

37 Training RNNs is hard Vanishing gradients: after a few time steps the gradients become almost zero. Exploding gradients: after a few time steps the gradients become huge. The network can't capture long-term dependencies.

38 Alternative formulation for RNNs An alternative formulation to derive conclusions and intuitions: $c_t = W \tanh(c_{t-1}) + U x_t + b$, with loss $L = \sum_t L_t(c_t)$.

39 Another look at the gradients $L = L(c_T(c_{T-1}(\dots c_1(x_1, c_0; W); W)\dots; W); W)$ and $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial c_t} \frac{\partial c_t}{\partial c_\tau} \frac{\partial c_\tau}{\partial W}$, where $\frac{\partial c_t}{\partial c_\tau} = \frac{\partial c_t}{\partial c_{t-1}} \frac{\partial c_{t-1}}{\partial c_{t-2}} \cdots \frac{\partial c_{\tau+1}}{\partial c_\tau}$. The factors with $t \gg \tau$ are the long-term factors; the rest are the short-term factors. The RNN gradient is a recursive product of $\frac{\partial c_t}{\partial c_{t-1}}$.

40 Exploding/Vanishing gradients $\frac{\partial L}{\partial c_t} = \frac{\partial L}{\partial c_T} \frac{\partial c_T}{\partial c_{T-1}} \frac{\partial c_{T-1}}{\partial c_{T-2}} \cdots \frac{\partial c_{t+1}}{\partial c_t}$. If every factor has norm $< 1$, then $\frac{\partial L}{\partial W} \ll 1$: vanishing gradient. If every factor has norm $> 1$, then $\frac{\partial L}{\partial W} \gg 1$: exploding gradient.
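
A tiny numeric illustration (not from the slides) of why a long product of per-step factors vanishes or explodes:

```python
import numpy as np

factors_small = np.full(50, 0.9)   # ||dc_k/dc_{k-1}|| < 1 at every step
factors_large = np.full(50, 1.1)   # ||dc_k/dc_{k-1}|| > 1 at every step

print(np.prod(factors_small))      # ~5e-3  -> vanishing over 50 steps
print(np.prod(factors_large))      # ~117   -> exploding over 50 steps
```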

41 Vanishing gradients The gradient of the error w.r.t. an intermediate cell state: $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial y_t} \frac{\partial y_t}{\partial c_t} \frac{\partial c_t}{\partial c_\tau} \frac{\partial c_\tau}{\partial W}$, with $\frac{\partial c_t}{\partial c_\tau} = \prod_{t \geq k > \tau} \frac{\partial c_k}{\partial c_{k-1}}$ and $\frac{\partial c_k}{\partial c_{k-1}} = W \, \partial\tanh(c_{k-1})$.

42 Vanishing gradients The gradient of the error w.r.t. an intermediate cell state: $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial y_t} \frac{\partial y_t}{\partial c_t} \frac{\partial c_t}{\partial c_\tau} \frac{\partial c_\tau}{\partial W}$, with $\frac{\partial c_t}{\partial c_\tau} = \prod_{t \geq k > \tau} \frac{\partial c_k}{\partial c_{k-1}}$. Long-term dependencies get exponentially smaller weights.

43 Gradient clipping for exploding gradients Scale the gradients to a threshold $\theta$. Pseudocode: 1. $g \leftarrow \frac{\partial L}{\partial W}$ 2. if $\|g\| > \theta$: $g \leftarrow \frac{\theta}{\|g\|} g$, else: do nothing. Simple, but it works!
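
A minimal runnable version of the clipping pseudocode, assuming the gradient is a NumPy array and theta is the chosen threshold:

```python
import numpy as np

def clip_gradient(g, theta):
    """Rescale g to norm theta if its norm exceeds theta; otherwise leave it."""
    norm = np.linalg.norm(g)
    if norm > theta:
        g = (theta / norm) * g
    return g

g = np.array([3.0, 4.0])             # ||g|| = 5
print(clip_gradient(g, theta=1.0))   # [0.6 0.8], norm rescaled to 1
print(clip_gradient(g, theta=10.0))  # unchanged
```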

44 Rescaling vanishing gradients? Not a good solution. Weights are shared between timesteps and the loss is summed over timesteps: $L = \sum_t L_t$, $\frac{\partial L}{\partial W} = \sum_t \frac{\partial L_t}{\partial W}$, and $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial c_\tau}\frac{\partial c_\tau}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial c_t}\frac{\partial c_t}{\partial c_\tau}\frac{\partial c_\tau}{\partial W}$. Rescaling for one timestep ($\frac{\partial L_t}{\partial W}$) affects all timesteps, and the rescaling factor for one timestep does not work for another.

45 More intuitively $\frac{\partial L}{\partial W} = \frac{\partial L_1}{\partial W} + \frac{\partial L_2}{\partial W} + \frac{\partial L_3}{\partial W} + \frac{\partial L_4}{\partial W} + \frac{\partial L_5}{\partial W}$

46 More intuitively Let's say $\frac{\partial L_1}{\partial W} \approx 1$, $\frac{\partial L_2}{\partial W} \approx 1/10$, $\frac{\partial L_3}{\partial W} \approx 1/100$, $\frac{\partial L_4}{\partial W} \approx 1/1000$, $\frac{\partial L_5}{\partial W} \approx 1/10000$, so $\frac{\partial L}{\partial W} = \sum_r \frac{\partial L_r}{\partial W} \approx 1$. If $\frac{\partial L}{\partial W}$ is rescaled to 1, $\frac{\partial L_5}{\partial W}$ still stays around $10^{-5}$: the longer-term dependencies remain negligible, the recurrent modelling is weak, and learning focuses on the short term only.

47 Recurrent networks Chaotic systems

48 Fixing vanishing gradients Regularization on the recurrent weights: force the error signal not to vanish, $\Omega = \sum_t \Omega_t = \sum_t \left( \frac{\left\| \frac{\partial L}{\partial c_{t+1}} \frac{\partial c_{t+1}}{\partial c_t} \right\|}{\left\| \frac{\partial L}{\partial c_{t+1}} \right\|} - 1 \right)^2$. Advanced recurrent modules: the Long Short-Term Memory (LSTM) module and the Gated Recurrent Unit (GRU) module. Pascanu et al., On the difficulty of training Recurrent Neural Networks, 2013

49 Advanced Recurrent Nets [LSTM cell diagram: the previous cell state $c_{t-1}$ flows to $c_t$ through the forget gate $f_t$ and an addition with the input-gated candidate $\tilde{c}_t$; the output gate $o_t$ and a tanh produce the new memory $m_t$ from $c_t$; the inputs are $x_t$ and $m_{t-1}$.]

50 How to fix the vanishing gradients? The error signal over time must have a norm that is neither too large nor too small. Let's have a look at the loss gradient: $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial y_t}\frac{\partial y_t}{\partial c_t}\frac{\partial c_t}{\partial c_\tau}\frac{\partial c_\tau}{\partial W}$, with $\frac{\partial c_t}{\partial c_\tau} = \prod_{t \geq k > \tau}\frac{\partial c_k}{\partial c_{k-1}}$.

51 How to fix the vanishing gradients? The error signal over time must have a norm that is neither too large nor too small. Let's have a look at the loss gradient: $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial y_t}\frac{\partial y_t}{\partial c_t}\frac{\partial c_t}{\partial c_\tau}\frac{\partial c_\tau}{\partial W}$, with $\frac{\partial c_t}{\partial c_\tau} = \prod_{t \geq k > \tau}\frac{\partial c_k}{\partial c_{k-1}}$.

52 How to fix the vanishing gradients? The error signal over time must have a norm that is neither too large nor too small. Solution: use an activation function with gradient equal to 1. $\frac{\partial L_t}{\partial W} = \sum_{\tau=1}^{t} \frac{\partial L_t}{\partial y_t}\frac{\partial y_t}{\partial c_t}\frac{\partial c_t}{\partial c_\tau}\frac{\partial c_\tau}{\partial W}$, $\frac{\partial c_t}{\partial c_\tau} = \prod_{t \geq k > \tau}\frac{\partial c_k}{\partial c_{k-1}}$. The identity function has a gradient equal to 1; by using it, the gradients become neither too small nor too large.

53 Long Short-Term Memory LSTM: a beefed-up RNN. Simple RNN: $c_t = W\tanh(c_{t-1}) + U x_t + b$. LSTM: $i = \sigma(x_t U^{(i)} + m_{t-1} W^{(i)})$, $f = \sigma(x_t U^{(f)} + m_{t-1} W^{(f)})$, $o = \sigma(x_t U^{(o)} + m_{t-1} W^{(o)})$, $\tilde{c}_t = \tanh(x_t U^{(g)} + m_{t-1} W^{(g)})$, $c_t = c_{t-1} \odot f + \tilde{c}_t \odot i$, $m_t = \tanh(c_t) \odot o$. The previous state $c_{t-1}$ is connected to the new $c_t$ with no nonlinearity (identity function). The only other factor is the forget gate $f$, which rescales the previous LSTM state. More info at:
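
A minimal NumPy sketch of one LSTM step following the slide's equations; the sigmoid on the gates matches the description on slide 55, and the shapes and initialization are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, m_prev, c_prev, params):
    U_i, W_i, U_f, W_f, U_o, W_o, U_g, W_g = params
    i = sigmoid(x_t @ U_i + m_prev @ W_i)        # input gate
    f = sigmoid(x_t @ U_f + m_prev @ W_f)        # forget gate
    o = sigmoid(x_t @ U_o + m_prev @ W_o)        # output gate
    c_tilde = np.tanh(x_t @ U_g + m_prev @ W_g)  # candidate memory
    c_t = c_prev * f + c_tilde * i               # new cell state
    m_t = np.tanh(c_t) * o                       # new memory / output
    return m_t, c_t

rng = np.random.default_rng(0)
d_in, d_hid = 5, 3
params = [rng.normal(scale=0.1, size=(d_in, d_hid)) if k % 2 == 0
          else rng.normal(scale=0.1, size=(d_hid, d_hid)) for k in range(8)]
m, c = np.zeros(d_hid), np.zeros(d_hid)
m, c = lstm_step(np.ones(d_in), m, c, params)
print(m, c)
```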

54 Cell state The cell state carries the essential information over time. [Diagram: the cell state line runs from $c_{t-1}$ to $c_t$, modified only by the forget gate and the gated candidate input; the LSTM equations are as on slide 53.]

55 LSTM nonlinearities $\sigma \in (0, 1)$: control gate, something like a switch. $\tanh \in (-1, 1)$: recurrent nonlinearity. [LSTM equations and cell diagram as on slide 53.]

56 LSTM Step-by-Step: Step (1) E.g. an LSTM on "Yesterday she slapped me. Today she loves me." Decide what to forget and what to remember for the new memory: sigmoid output 1 means remember everything, sigmoid output 0 means forget everything. $f_t = \sigma(x_t U^{(f)} + m_{t-1} W^{(f)})$ [the remaining LSTM equations and the cell diagram are as on slide 53].

57 LSTM Step-by-Step: Step (2) Decide what new information is relevant from the new input and should be added to the new memory. Modulate the input: $i_t = \sigma(x_t U^{(i)} + m_{t-1} W^{(i)})$. Generate candidate memories: $\tilde{c}_t = \tanh(x_t U^{(g)} + m_{t-1} W^{(g)})$.

58 LSTM Step-by-Step: Step (3) Compute and update the current cell state $c_t$. It depends on the previous cell state, what we decide to forget, what inputs we allow, and the candidate memories: $c_t = c_{t-1} \odot f + \tilde{c}_t \odot i$.

59 LSTM Step-by-Step: Step (4) Modulate the output: is the new cell state relevant? If yes, sigmoid output near 1; if not, sigmoid output near 0. Generate the new memory: $o_t = \sigma(x_t U^{(o)} + m_{t-1} W^{(o)})$, $m_t = \tanh(c_t) \odot o_t$.

60 LSTM Unrolled Network Macroscopically very similar to standard RNNs; the engine is a bit different (more complicated). Because of their gates, LSTMs capture long- and short-term dependencies. [Diagram: several LSTM cells unrolled over time.]
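
A minimal sketch of unrolling an LSTM over a whole sequence, using PyTorch's built-in module as a stand-in; the library choice and the sizes are assumptions, not the lecture's code:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=5, hidden_size=3, batch_first=True)
x = torch.randn(1, 7, 5)                      # one sequence of 7 steps, 5 features each
outputs, (m_T, c_T) = lstm(x)                 # outputs holds the memory m_t at every step
print(outputs.shape, m_T.shape, c_T.shape)    # (1, 7, 3) (1, 1, 3) (1, 1, 3)
```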

61 Beyond RNN & LSTM LSTM with peephole connections: the gates also have access to the previous cell state $c_{t-1}$ (not only the memories). Coupled forget and input gates: $c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{c}_t$. Bi-directional recurrent networks. Gated Recurrent Units (GRU), sketched below. Deep recurrent architectures. Recursive neural networks (tree-structured). Multiplicative interactions. Generative recurrent architectures. [Diagram: a two-layer stacked recurrent network, LSTM(1) feeding LSTM(2) at every timestep.]
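
For reference, a minimal NumPy sketch of the standard GRU update mentioned in the list above; the equations follow the common GRU formulation, not these slides, and the sizes are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, U_z, W_z, U_r, W_r, U_h, W_h):
    z = sigmoid(x_t @ U_z + h_prev @ W_z)                 # update gate
    r = sigmoid(x_t @ U_r + h_prev @ W_r)                 # reset gate
    h_tilde = np.tanh(x_t @ U_h + (r * h_prev) @ W_h)     # candidate state
    return (1 - z) * h_prev + z * h_tilde                 # single state, no separate cell

rng = np.random.default_rng(0)
d_in, d_hid = 5, 3
U_z, U_r, U_h = (rng.normal(scale=0.1, size=(d_in, d_hid)) for _ in range(3))
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(d_hid, d_hid)) for _ in range(3))
h = np.zeros(d_hid)
h = gru_step(np.ones(d_in), h, U_z, W_z, U_r, W_r, U_h, W_h)
print(h)
```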

62 Take-away message Recurrent Neural Networks (RNN) for sequences Backpropagation Through Time Vanishing and Exploding Gradients and Remedies RNNs using Long Short-Term Memory (LSTM) Applications of Recurrent Neural Networks

63 Thank you!
