Natural Language Understanding, Lecture 12: Recurrent Neural Networks and LSTMs


Natural Language Understanding, Lecture 12: Recurrent Neural Networks and LSTMs. Adam Lopez (credits: Mirella Lapata and Frank Keller), 26 January 2018, School of Informatics, University of Edinburgh.

Outline: Recap: probability, language models, and feedforward networks; Simple Recurrent Networks; Backpropagation Through Time; Long short-term memory. Reading: Mikolov et al. (2010), Olah (2015).

Recap: probability, language models, and feedforward networks

Most models in NLP are probabilistic models, e.g. a language model decomposed with the chain rule of probability:

P(w_1 \ldots w_k) = \prod_{i=1}^{k} P(w_i \mid w_1, \ldots, w_{i-1})

Modeling decision: the Markov assumption

P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})

Rules of probability (remember: the vocabulary V is finite): P maps V to the reals, and

\sum_{w \in V} P(w \mid w_{i-n+1}, \ldots, w_{i-1}) = 1
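To make the recap concrete, here is a minimal sketch (not part of the slides) of a maximum-likelihood bigram model: the chain rule with a Markov assumption of n = 2. The toy corpus and the start-of-sentence handling are simplifying assumptions.

```python
from collections import Counter

corpus = "summer is hot winter is cold".split()

# Maximum-likelihood estimates: P(w | prev) = c(prev, w) / c(prev)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    """P(w | prev) under the Markov assumption with n = 2."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_sentence(words):
    """Chain rule: P(w_1 ... w_k) = prod_i P(w_i | w_{i-1}), ignoring the P(w_1) term for brevity."""
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_sentence("winter is cold".split()))
```

Note that for these maximum-likelihood estimates the conditional probabilities for a given history sum to 1 over the vocabulary, matching the normalization rule above.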

MLPs (aka deep NNs) are functions from a vector to a vector. Probability distributions are vectors! Example: given the context "Summer is hot, winter is ___", the output might be the distribution cold 0.6, grey 0.3, winter 0.1, is 0, hot 0, summer 0.

What functions can we use? Matrix multiplication: converts an m-element vector to an n-element vector; parameters are usually of this form. Sigmoid, exp, tanh, ReLU, etc.: elementwise nonlinear transforms from an m-element vector to an m-element vector. Concatenation: combines an m-element and an n-element vector into an (m+n)-element vector. Multiple functions can also share inputs and substructure. Softmax will convert any vector to a probability distribution.

Elements of discrete vocabularies are vectors! Use a one-hot encoding to represent any element of a finite set (e.g. is, cold, grey, hot, summer, winter). A feedforward LM is then a function from input vectors (the context words) to an output vector: a distribution over the next word.
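As an illustration, here is a minimal numpy sketch (with made-up dimensions and random weights, not the lecture's trained model) of how these building blocks compose into a feedforward language model: one-hot inputs, concatenation, a matrix multiplication, an elementwise nonlinearity, and a softmax over the vocabulary.

```python
import numpy as np

vocab = ["cold", "grey", "winter", "is", "hot", "summer"]
V = len(vocab)

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
hidden = 8
W1 = rng.normal(scale=0.1, size=(hidden, 2 * V))   # operates on the concatenated context
W2 = rng.normal(scale=0.1, size=(V, hidden))

# Fixed-size context: concatenate one-hot vectors for "winter" and "is"
x = np.concatenate([one_hot("winter"), one_hot("is")])
h = np.tanh(W1 @ x)              # matrix multiplication + elementwise nonlinearity
p = softmax(W2 @ h)              # probability distribution over the next word

print(dict(zip(vocab, p.round(3))))
```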

How much context do we need? The roses are red. The roses in the vase are red. The roses in the vase by the door are red. The roses in the vase by the door to the kitchen are red. Captain Ahab nursed his grudge for many years before seeking the White ___. Donald Trump nursed his grudge for many years before seeking the White ___.

Simple Recurrent Networks

Modeling Context. Context is important in language modeling: n-gram language models use a limited context (fixed n); feedforward networks can be used for language modeling, but their input is also of fixed size; yet linguistic dependencies can be arbitrarily long. This is where recurrent neural networks come in: the input of an RNN includes a copy of the previous hidden layer of the network; effectively, the RNN buffers all the inputs it has seen before; it can thus model context dependencies of arbitrary length. We will look at simple recurrent networks first.

Architecture. The simple recurrent network only looks back one time step. [Figure: the input x(t) and the previous state s(t-1) feed into the hidden state s(t) through weight matrices V and U; s(t) feeds into the output y(t) through W.]

Architecture. We have an input layer x, a hidden layer s (the state), and an output layer y. The input at time t is x(t), the output is y(t), and the hidden layer is s(t).

s_j(t) = f(net_j(t))   (1)
net_j(t) = \sum_i x_i(t) v_{ji} + \sum_l s_l(t-1) u_{jl}   (2)
y_k(t) = g(net_k(t))   (3)
net_k(t) = \sum_j s_j(t) w_{kj}   (4)

where f(z) is the sigmoid and g(z) the softmax function:

f(z) = \frac{1}{1 + e^{-z}}    g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}

Input and Output. For initialization, set s and x to small random values; for each time step, copy s(t-1) and use it to compute s(t); the vector x(t) uses a 1-of-N (one-hot) encoding over the words in the vocabulary; the output vector y(t) is a probability distribution over the next word given the current word w(t) and context s(t-1); the size of the hidden layer depends on the size of the training data (Mikolov et al. use on the order of 30 to 500 units).

Training. We can use standard backprop with stochastic gradient descent: simply treat the network as a feedforward network with s(t-1) as additional input; backpropagate the error to adjust the weight matrices U and V; present all of the training data in each epoch; test on validation data to see if the log-likelihood of the training data improves; adjust the learning rate if necessary. The error signal for training is error(t) = desired(t) - y(t), where desired(t) is the one-hot encoding of the correct next word.
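A minimal numpy sketch of one forward step of the simple recurrent network, following equations (1)-(4) above; the dimensions and random weights are placeholders, not the lecture's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
vocab_size, hidden_size = 6, 4
V = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input -> hidden
U = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden(t-1) -> hidden(t)
W = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> output

def srn_step(x_t, s_prev):
    """Equations (1)-(4): s(t) = f(V x(t) + U s(t-1)), y(t) = g(W s(t))."""
    s_t = sigmoid(V @ x_t + U @ s_prev)
    y_t = softmax(W @ s_t)
    return s_t, y_t

# One-hot current word and a small random initial state
x = np.zeros(vocab_size); x[3] = 1.0
s = rng.normal(scale=0.1, size=hidden_size)
s, y = srn_step(x, s)
print(y.sum())   # the softmax output sums to 1: a distribution over the next word
```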

Backpropagation Through Time

From Simple to Full RNNs. Let's drop the assumption that only the hidden layer from the previous time step is used; instead use all previous time steps. We can think of this as unfolding over time: the RNN is unfolded into a sequence of feedforward networks; we need a new learning algorithm: backpropagation through time (BPTT).

Architecture. The full RNN looks at all the previous time steps. [Figure: the network unfolded over time, with inputs x(t-2), x(t-1), x(t) feeding states s(t-3), s(t-2), s(t-1), s(t) through V and U, and s(t) feeding the output y(t) through W.]

Standard Backpropagation. For output units, we update the weights W using:

\Delta w_{kj} = \eta \sum_p \delta_{pk} s_{pj}    \delta_{pk} = (d_{pk} - y_{pk}) \, g'(net_{pk})

where d_{pk} is the desired output of unit k for training pattern p. For hidden units, we update the weights V using:

\Delta v_{ji} = \eta \sum_p \delta_{pj} x_{pi}    \delta_{pj} = \Big( \sum_k \delta_{pk} w_{kj} \Big) f'(net_{pj})

This is just standard backprop, with notation adjusted for RNNs!

Going Back in Time. If we only go back one time step, then we can update the weights U using the standard delta rule:

\Delta u_{jh} = \eta \sum_p \delta_{pj}(t) \, s_{ph}(t-1)    \delta_{pj}(t) = \Big( \sum_k \delta_{pk} w_{kj} \Big) f'(net_{pj})

However, if we go further back in time, then we need to apply the delta rule to the previous time step as well:

\delta_{pj}(t-1) = \sum_h \delta_{ph}(t) \, u_{hj} \, f'(s_{pj}(t-1))

where h is the index of the hidden unit at time step t, and j of the hidden unit at time step t-1.
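The sketch below (a numpy illustration with assumed shapes and cached forward values, not the lecture's exact notation) shows how the delta rule is pushed one step back in time: compute the output deltas, fold them into hidden deltas at time t, then propagate through U and the sigmoid derivative to get deltas at t-1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
vocab_size, hidden_size = 6, 4
U = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W = rng.normal(scale=0.1, size=(vocab_size, hidden_size))

# Quantities assumed cached from the forward pass
s_t   = sigmoid(rng.normal(size=hidden_size))     # hidden state at time t
s_tm1 = sigmoid(rng.normal(size=hidden_size))     # hidden state at time t-1
s_tm2 = sigmoid(rng.normal(size=hidden_size))     # hidden state at time t-2
y_t   = np.full(vocab_size, 1.0 / vocab_size)     # predicted distribution
desired = np.zeros(vocab_size); desired[2] = 1.0  # one-hot correct next word

# Output deltas: error(t) = desired(t) - y(t) (softmax with cross-entropy loss)
delta_out = desired - y_t

# Hidden deltas at time t: sum output deltas through W, times f'(net) = s(1 - s) for the sigmoid
delta_t = (W.T @ delta_out) * s_t * (1.0 - s_t)

# Delta rule for the recurrent weights, one step back: Delta u_{jh} = eta * delta_j(t) * s_h(t-1)
eta = 0.1
dU = eta * np.outer(delta_t, s_tm1)

# Going one step further back: push delta(t) through U and f'(s(t-1)), then update against s(t-2)
delta_tm1 = (U.T @ delta_t) * s_tm1 * (1.0 - s_tm1)
dU += eta * np.outer(delta_tm1, s_tm2)
print(dU.shape)
```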

Going Back in Time. We can do this for an arbitrary number of time steps \tau, adding up the resulting deltas to compute \Delta u_{ji}. The RNN effectively becomes a deep network of depth \tau. For language modeling, Mikolov et al. show that increasing \tau improves performance.

As we backpropagate through time, gradients tend toward 0. We adjust U using backprop through time. For timestep t:

\Delta u_{jh} = \eta \sum_p \delta_{pj}(t) \, s_{ph}(t-1)    \delta_{pj}(t) = \Big( \sum_k \delta_{pk} w_{kj} \Big) f'(net_{pj})

For timestep t-1:

\delta_{pj}(t-1) = \sum_h \delta_{ph}(t) \, u_{hj} \, f'(s_{pj}(t-1))

For timestep t-2:

\delta_{pj}(t-2) = \sum_h \delta_{ph}(t-1) \, u_{hj} \, f'(s_{pj}(t-2)) = \sum_{h,h'} \delta_{ph'}(t) \, u_{h'h} \, f'(s_{ph}(t-1)) \, u_{hj} \, f'(s_{pj}(t-2))

At every time step, we multiply the weights with another gradient. The gradients are < 1, so the deltas become smaller and smaller. So in fact the RNN is not able to learn long-range dependencies well, as the gradient vanishes: it rapidly forgets previous inputs. [Sources: https://theclevermachine.wordpress.com/; Graves, Supervised Sequence Labelling with RNNs, 2012.]
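A tiny numeric illustration (made-up numbers, not from the slides) of why the deltas shrink: each step back multiplies by a recurrent weight and a sigmoid derivative, and since f'(s) = s(1 - s) is at most 0.25, the per-step factor is typically well below 1.

```python
# Each step back in time multiplies the delta by u * f'(s).
u, s = 0.8, 0.6
factor = u * s * (1 - s)          # 0.8 * 0.24 = 0.192

delta = 1.0
for step in range(1, 11):
    delta *= factor
    print(f"{step:2d} steps back: delta = {delta:.2e}")
```

After ten steps the delta is already of the order of 1e-8, so weight updates driven by distant inputs are negligible.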

Long short-term memory

A better RNN: Long Short-term Memory. Solution: the network can sometimes pass on information from previous time steps unchanged, so that it can learn from distant inputs.

Architecture of the LSTM. To achieve this, we need to make the units of the network more complicated: LSTMs have a hidden layer of memory blocks; each block contains a memory cell and three multiplicative units: the input, output and forget gates; the gates are trainable: each block can learn whether to keep information across time steps or not. In contrast, the RNN uses simple hidden units, which just sum their input and pass it through an activation function.

The Gates and the Memory Cell. Each memory block consists of four units. [Figure: a memory block; O marks an open gate, -- a closed gate, black high activation, white low activation. Source: Graves, Supervised Sequence Labelling with RNNs, 2012.]

Input gate: controls whether the input to the block is passed on to the memory cell or ignored. Output gate: controls whether the current activation vector of the memory cell is passed on to the output layer or not. Forget gate: controls whether the activation vector of the memory cell is reset to zero or maintained. Memory cell: stores the current activation vector, with a connection to itself controlled by the forget gate. There are also peephole connections; we won't discuss these.

A Single LSTM Memory Block; RNN Unit compared to LSTM Memory Block. [Figure: an SRN unit next to a full LSTM block with cell, input, output and forget gates, and peepholes. Legend: unweighted connection, weighted connection, connection with time-lag, branching point, multiplication, sum over all inputs; gate activation function (always sigmoid), input activation function g (usually tanh), output activation function h (usually tanh). Sources: Klaus Greff et al.: LSTM: A Search Space Odyssey, 2015; Graves, Supervised Sequence Labelling with RNNs, 2012.]

The Gates and the Memory Cell. Gates are regular hidden units: they sum their input and pass it through a sigmoid activation function; all four inputs to the block are the same: the input layer and the recurrent layer (the hidden layer at the previous time step); all gates have multiplicative connections: if the activation is close to zero, then the gate doesn't let anything through; the memory cell itself is linear: it has no activation function; but the block as a whole has input and output activation functions (which can be tanh or sigmoid); all connections within the block are unweighted: they just pass on information (i.e., copy the incoming vector); the only output that the rest of the network sees is what the output gate lets through.

Putting LSTM Memory Blocks Together. Network with four input units, a hidden layer of two memory blocks and five output units. [Source: Graves, Supervised Sequence Labelling with RNNs, 2012.]
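A minimal numpy sketch of one step of an LSTM memory block, following the description above: sigmoid gates, a linear memory cell guarded by the forget and input gates, and tanh input/output activations. The weight shapes, the absence of biases and peephole connections, and the random initialization are simplifying assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
input_size, block_size = 6, 4

def weights():
    # Every unit in the block sees the same input: the current input plus the previous hidden state
    return rng.normal(scale=0.1, size=(block_size, input_size + block_size))

W_i, W_f, W_o, W_g = weights(), weights(), weights(), weights()

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W_i @ z)                # input gate
    f = sigmoid(W_f @ z)                # forget gate
    o = sigmoid(W_o @ z)                # output gate
    g = np.tanh(W_g @ z)                # block input activation
    c = f * c_prev + i * g              # linear memory cell: no squashing of c itself
    h = o * np.tanh(c)                  # the output gate decides what the rest of the network sees
    return h, c

x = np.zeros(input_size); x[1] = 1.0
h = np.zeros(block_size); c = np.zeros(block_size)
h, c = lstm_step(x, h, c)
print(h.round(3), c.round(3))
```

If the forget gate stays near 1 and the input gate near 0, the line `c = f * c_prev + i * g` simply copies the cell state forward, which is exactly how the block retains information across many time steps.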

Vanishing Gradients Again. Why does this solve the vanishing gradient problem? The memory cell is linear, so its gradient doesn't vanish; an LSTM block can retain information indefinitely: if the forget gate is open (close to 1) and the input gate is closed (close to 0), then the activation of the cell persists; in addition, the block can decide when to output information by opening the output gate; the block can therefore retain information over an arbitrary number of time steps before it outputs it; the block learns when to accept input, produce output, and forget information: the gates have trainable weights.

Applications. LSTMs are useful for lots of sequence labeling tasks: part-of-speech tagging and parsing; semantic role labeling; opinion mining. With modification, they are also widely used for sequence-to-sequence problems: machine translation; question answering; summarization; sentence compression and simplification. We will see some of these applications in the rest of the course.

Summary. Recurrent networks encode a complete sequence. RNNs can be trained with standard backprop. We can also unfold an RNN over time and train it with backpropagation through time; this turns the RNN into a deep network, and gives even better language modeling performance. Backprop through time with RNNs has the problem that gradients vanish with increasing timesteps. The LSTM is a way of addressing this problem: it replaces simple additive hidden units with complex memory blocks.
