CS224n: Natural Language Processing with Deep Learning
Lecture Notes, Part V: Language Models, RNN, GRU and LSTM
Winter 2019


Course Instructors: Christopher Manning, Richard Socher
Authors: Milad Mohammadi, Rohit Mundra, Richard Socher, Lisa Wang, Amita Kamath
Keyphrases: Language Models. RNN. Bi-directional RNN. Deep RNN. GRU. LSTM.

1 Language Models

1.1 Introduction

Language models compute the probability of occurrence of a number of words in a particular sequence. The probability of a sequence of m words {w_1, ..., w_m} is denoted as P(w_1, ..., w_m). Since the number of words coming before a word w_i varies depending on its location in the input document, P(w_1, ..., w_m) is usually conditioned on a window of n previous words rather than all previous words:

P(w_1, ..., w_m) = ∏_{i=1}^{m} P(w_i | w_1, ..., w_{i-1}) ≈ ∏_{i=1}^{m} P(w_i | w_{i-n}, ..., w_{i-1})    (1)

Equation 1 is especially useful for speech and translation systems when determining whether a word sequence is an accurate translation of an input sentence. In existing language translation systems, for each phrase/sentence translation, the software generates a number of alternative word sequences (e.g. {I have, I had, I has, me have, me had}) and scores them to identify the most likely translation sequence.

In machine translation, the model chooses the best word ordering for an input phrase by assigning a goodness score to each output word sequence alternative. To do so, the model may choose between different word ordering or word choice alternatives. It achieves this objective by running all word sequence candidates through a probability function that assigns each a score. The sequence with the highest score is the output of the translation. For example, the machine would give a higher score to "the cat is small" compared to "small the is cat", and a higher score to "walking home after school" compared to "walking house after school".

1.2 n-gram Language Models

To compute the probabilities mentioned above, the count of each n-gram can be compared against the frequency of each word.

This is called an n-gram Language Model. For instance, if the model takes bi-grams, the frequency of each bi-gram, calculated by combining a word with its previous word, would be divided by the frequency of the corresponding uni-gram. Equations 2 and 3 show this relationship for bigram and trigram models.

p(w_2 | w_1) = count(w_1, w_2) / count(w_1)    (2)

p(w_3 | w_1, w_2) = count(w_1, w_2, w_3) / count(w_1, w_2)    (3)

The relationship in Equation 3 focuses on making predictions based on a fixed window of context (i.e. the n previous words) used to predict the next word. But how long should the context be? In some cases, the window of the past n consecutive words may not be sufficient to capture the context. For instance, consider the sentence "As the proctor started the clock, the students opened their ___". If the window only conditions on the previous three words "the students opened their", the probabilities calculated based on the corpus may suggest that the next word be "books"; however, if n had been large enough to include the "proctor" context, the probability might have suggested "exam".

This leads us to two main issues with n-gram Language Models: Sparsity and Storage.

1. Sparsity problems with n-gram Language Models. Sparsity problems with these models arise due to two issues. Firstly, note the numerator of Equation 3. If w_1, w_2 and w_3 never appear together in the corpus, the probability of w_3 is 0. To solve this, a small δ could be added to the count for each word in the vocabulary. This is called smoothing. Secondly, consider the denominator of Equation 3. If w_1 and w_2 never occurred together in the corpus, then no probability can be calculated for w_3. To solve this, we could condition on w_2 alone. This is called backoff. Increasing n makes sparsity problems worse; typically, n ≤ 5.

2. Storage problems with n-gram Language Models. We know that we need to store the counts for all n-grams we saw in the corpus. As n increases (or the corpus size increases), the model size increases as well.
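As a concrete illustration of Equation 2, here is a minimal Python sketch (not from the notes; the toy corpus, function name, and the add-δ smoothing argument are made up for illustration) that estimates bigram probabilities from raw counts:

    from collections import Counter

    def bigram_prob(corpus_tokens, delta=0.0):
        """Estimate p(w2 | w1) = count(w1, w2) / count(w1), with optional add-delta smoothing."""
        unigrams = Counter(corpus_tokens)
        bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        V = len(unigrams)

        def p(w2, w1):
            # add-delta smoothing avoids zero probabilities for unseen bigrams
            return (bigrams[(w1, w2)] + delta) / (unigrams[w1] + delta * V)
        return p

    tokens = "the cat is small the cat is black".split()
    p = bigram_prob(tokens, delta=0.1)
    print(p("is", "cat"))    # relatively high: "cat is" occurs twice
    print(p("small", "is"))  # lower: "is small" occurs once out of two "is"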

1.3 Window-based Neural Language Model

The "curse of dimensionality" above was first tackled by Bengio et al. in A Neural Probabilistic Language Model, which introduced the first large-scale deep learning model for natural language processing. This model learns a distributed representation of words, along with the probability function for word sequences expressed in terms of these representations. Figure 1 shows the corresponding neural network architecture. The input word vectors are used by both the hidden layer and the output layer. Equation 4 represents Figure 1 and shows the parameters of the softmax() function, consisting of the standard tanh() function (i.e. the hidden layer) as well as the linear function, W^{(3)} x + b^{(3)}, that captures all the previous n input word vectors.

ŷ = softmax(W^{(2)} tanh(W^{(1)} x + b^{(1)}) + W^{(3)} x + b^{(3)})    (4)

Note that the weight matrix W^{(1)} is applied to the word vectors (solid green arrows in Figure 1), W^{(2)} is applied to the hidden layer (also a solid green arrow), and W^{(3)} is applied to the word vectors (dashed green arrows).

Figure 1: The first deep neural network architecture model for NLP presented by Bengio et al.

A simplified version of this model can be seen in Figure 2, where the blue layer signifies the concatenated word embeddings for the input words, e = [e^{(1)}; e^{(2)}; e^{(3)}; e^{(4)}], the red layer signifies the hidden layer, h = f(We + b_1), and the green output distribution is a softmax over the vocabulary, ŷ = softmax(Uh + b_2).

Figure 2: A simplified representation of Figure 1.
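The simplified model of Figure 2 amounts to concatenated embeddings, one tanh hidden layer, and a softmax over the vocabulary. The NumPy sketch below (dimensions, random initialization, and the example word indices are assumptions for illustration, not the notes' code) shows a single forward pass, ŷ = softmax(U f(We + b_1) + b_2):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # Hypothetical sizes: embedding dim d, window size n, hidden dim D_h, vocab size V.
    d, n, D_h, V = 50, 4, 100, 10000
    rng = np.random.default_rng(0)
    E = rng.normal(scale=0.1, size=(V, d))      # embedding matrix
    W = rng.normal(scale=0.1, size=(D_h, n * d))
    b1 = np.zeros(D_h)
    U = rng.normal(scale=0.1, size=(V, D_h))
    b2 = np.zeros(V)

    window = [12, 7, 403, 8]                    # indices of the previous n words (made up)
    e = np.concatenate([E[i] for i in window])  # e = [e(1); e(2); e(3); e(4)]
    h = np.tanh(W @ e + b1)                     # h = f(We + b1)
    y_hat = softmax(U @ h + b2)                 # distribution over the next word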

2 Recurrent Neural Networks (RNN)

Unlike the conventional translation models, where only a finite window of previous words would be considered for conditioning the language model, Recurrent Neural Networks (RNNs) are capable of conditioning the model on all previous words in the corpus.

Figure 3 introduces the RNN architecture, where each vertical rectangular box is a hidden layer at a time-step t. Each such layer holds a number of neurons, each of which performs a linear matrix operation on its inputs followed by a non-linear operation (e.g. tanh()). At each time-step, there are two inputs to the hidden layer: the output of the previous layer h_{t-1}, and the input at that time-step x_t. The former input is multiplied by a weight matrix W^{(hh)} and the latter by a weight matrix W^{(hx)} to produce output features h_t, which are multiplied with a weight matrix W^{(S)} and run through a softmax over the vocabulary to obtain a prediction output ŷ_t of the next word (Equations 5 and 6). The inputs and outputs of each single neuron are illustrated in Figure 4.

Figure 3: A Recurrent Neural Network (RNN). Three time-steps are shown.

h_t = σ(W^{(hh)} h_{t-1} + W^{(hx)} x_t)    (5)

ŷ_t = softmax(W^{(S)} h_t)    (6)

What is interesting here is that the same weights W^{(hh)} and W^{(hx)} are applied repeatedly at each time-step. Thus, the number of parameters the model has to learn is smaller and, most importantly, is independent of the length of the input sequence, thus defeating the curse of dimensionality!

Below are the details associated with each parameter in the network:

- x_1, ..., x_{t-1}, x_t, x_{t+1}, ..., x_T: the word vectors corresponding to a corpus with T words.
- h_t = σ(W^{(hh)} h_{t-1} + W^{(hx)} x_t): the relationship used to compute the hidden layer output features at each time-step t, where:
  - x_t ∈ R^d: the input word vector at time t.
  - W^{(hx)} ∈ R^{D_h × d}: the weight matrix used to condition the input word vector, x_t.
  - W^{(hh)} ∈ R^{D_h × D_h}: the weight matrix used to condition the output of the previous time-step, h_{t-1}.
  - h_{t-1} ∈ R^{D_h}: the output of the non-linear function at the previous time-step, t-1. h_0 ∈ R^{D_h} is an initialization vector for the hidden layer at time-step t = 0.
  - σ(): the non-linearity function (sigmoid here).
- ŷ_t = softmax(W^{(S)} h_t): the output probability distribution over the vocabulary at each time-step t. Essentially, ŷ_t is the next predicted word given the document context score so far (i.e. h_{t-1}) and the last observed word vector x^{(t)}. Here, W^{(S)} ∈ R^{|V| × D_h} and ŷ ∈ R^{|V|}, where V is the vocabulary.

Figure 4: The inputs and outputs of a neuron of an RNN.

An example of an RNN language model is shown in Figure 5. The notation in that image is slightly different: there, the equivalent of W^{(hh)} is W_h, of W^{(hx)} is W_e, and of W^{(S)} is U. E converts word inputs x^{(t)} to word embeddings e^{(t)}. The final softmax over the vocabulary shows us the probability of the various options for token x^{(5)}, conditioned on all previous tokens. The input could be much longer than 4-5 tokens.

Figure 5: An RNN Language Model.
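Equations 5 and 6 describe one recurrence step; a minimal sketch in NumPy follows (weight shapes and the toy sequence are assumptions for illustration). Note that the same W^{(hh)}, W^{(hx)} and W^{(S)} are reused at every time-step, which is the parameter sharing discussed above.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def rnn_step(x_t, h_prev, W_hh, W_hx, W_S):
        """One RNN time-step: Equations 5 and 6."""
        h_t = sigmoid(W_hh @ h_prev + W_hx @ x_t)   # Eq. 5
        y_hat = softmax(W_S @ h_t)                  # Eq. 6: distribution over the vocabulary
        return h_t, y_hat

    # Hypothetical sizes: d-dimensional word vectors, D_h hidden units, vocabulary size V.
    d, D_h, V = 50, 64, 10000
    rng = np.random.default_rng(1)
    W_hx = rng.normal(scale=0.1, size=(D_h, d))
    W_hh = rng.normal(scale=0.1, size=(D_h, D_h))
    W_S = rng.normal(scale=0.1, size=(V, D_h))

    h = np.zeros(D_h)                      # h_0: initialization vector
    for x_t in rng.normal(size=(5, d)):    # a toy sequence of 5 word vectors
        h, y_hat = rnn_step(x_t, h, W_hh, W_hx, W_S)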

2.1 RNN Loss and Perplexity

The loss function used in RNNs is often the cross entropy error introduced in earlier notes. Equation 7 shows this function as the sum over the entire vocabulary at time-step t.

J^{(t)}(θ) = − Σ_{j=1}^{|V|} y_{t,j} log(ŷ_{t,j})    (7)

The cross entropy error over a corpus of size T is:

J = (1/T) Σ_{t=1}^{T} J^{(t)}(θ) = − (1/T) Σ_{t=1}^{T} Σ_{j=1}^{|V|} y_{t,j} log(ŷ_{t,j})    (8)

Equation 9 is called the perplexity relationship; it is 2 raised to the power of the cross entropy error (the average negative log probability) from Equation 8. Perplexity is a measure of confusion, where lower values imply more confidence in predicting the next word in the sequence (compared to the ground truth outcome).

Perplexity = 2^J    (9)
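Because each label y_t is one-hot, only the probability assigned to the true next word survives the inner sum in Equation 8, so cross entropy and perplexity can be computed from those probabilities alone. A small sketch follows (assuming base-2 logarithms so that the 2^J convention of Equation 9 gives the usual perplexity; the example probabilities are made up):

    import numpy as np

    def cross_entropy_and_perplexity(probs_of_true_words):
        """Corpus-level cross entropy (Eq. 8, log base 2) and perplexity (Eq. 9).

        probs_of_true_words: model probability of the correct next word at each
        of the T time-steps.
        """
        probs = np.asarray(probs_of_true_words)
        J = -np.mean(np.log2(probs))   # only the true word's term survives the sum over j
        return J, 2.0 ** J

    # A model that puts probability 0.25 on every correct word has perplexity 4.
    J, ppl = cross_entropy_and_perplexity([0.25, 0.25, 0.25, 0.25])
    print(J, ppl)   # 2.0 4.0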

2.2 Advantages, Disadvantages and Applications of RNNs

RNNs have several advantages:

1. They can process input sequences of any length.
2. The model size does not increase for longer input sequence lengths.
3. Computation for step t can (in theory) use information from many steps back.
4. The same weights are applied to every time-step of the input, so there is symmetry in how inputs are processed.

However, they also have some disadvantages:

1. Computation is slow; because it is sequential, it cannot be parallelized.
2. In practice, it is difficult to access information from many steps back due to problems like vanishing and exploding gradients, which we discuss in the following subsection.

The amount of memory required to run a layer of an RNN is proportional to the number of words in the corpus. We can consider a sentence as a minibatch, and a sentence with k words would have k word vectors to be stored in memory. Also, the RNN must maintain two pairs of W, b matrices. As aforementioned, while the size of W could be very large, it does not scale with the size of the corpus (unlike the traditional language models). For an RNN with a recurrent layer of 1000 hidden units, the matrix would be 1000 × 1000 regardless of the corpus size.

RNNs can be used for many tasks, such as tagging (e.g. part-of-speech tagging, named entity recognition), sentence classification (e.g. sentiment classification), and as encoder modules (e.g. for question answering, machine translation, and many other tasks). In the latter two applications, we want a representation for the sentence, which we can obtain by taking the element-wise max or mean of all hidden states of the time-steps in that sentence.

Note: Figure 6 is an alternative representation of RNNs used in some publications. It represents the RNN hidden layer as a loop.

Figure 6: The illustration of an RNN as a loop over time-steps.

2.3 Vanishing Gradient & Gradient Explosion Problems

Recurrent neural networks propagate weight matrices from one time-step to the next. Recall that the goal of an RNN implementation is to enable propagating context information through faraway time-steps. For example, consider the following two sentences:

Sentence 1: "Jane walked into the room. John walked in too. Jane said hi to ___"

Sentence 2: "Jane walked into the room. John walked in too. It was late in the day, and everyone was walking home after a long day at work. Jane said hi to ___"

In both sentences, given their context, one can tell the answer to both blank spots is most likely "John". It is important that the RNN predicts the next word as "John", the second person, who appeared several time-steps back in both contexts. Ideally, this should be possible given what we know about RNNs so far. In practice, however, it turns out RNNs are more likely to correctly predict the blank spot in Sentence 1 than in Sentence 2. This is because during the back-propagation phase, the contribution of gradient values gradually vanishes as they propagate to earlier time-steps, as we will show below. Thus, for long sentences, the probability that "John" would be recognized as the next word reduces with the size of the context.

Below, we discuss the mathematical reasoning behind the vanishing gradient problem. Consider Equations 5 and 6 at a time-step t; to compute the RNN error, ∂E/∂W, we sum the error at each time-step. That is, ∂E_t/∂W for every time-step t is computed and accumulated.

∂E/∂W = Σ_{t=1}^{T} ∂E_t/∂W    (10)

The error for each time-step is computed by applying the chain rule to Equations 6 and 5; Equation 11 shows the corresponding differentiation. Notice that ∂h_t/∂h_k refers to the partial derivative of h_t with respect to all previous k time-steps.

∂E_t/∂W = Σ_{k=1}^{t} (∂E_t/∂y_t) (∂y_t/∂h_t) (∂h_t/∂h_k) (∂h_k/∂W)    (11)

Equation 12 shows the relationship used to compute each ∂h_t/∂h_k; this is simply a chain rule differentiation over all hidden layers within the [k, t] time interval.

∂h_t/∂h_k = ∏_{j=k+1}^{t} ∂h_j/∂h_{j-1} = ∏_{j=k+1}^{t} W^T diag[f'(h_{j-1})]    (12)

Because h ∈ R^{D_n}, each ∂h_j/∂h_{j-1} is the Jacobian matrix for h:

∂h_j/∂h_{j-1} = [ ∂h_j/∂h_{j-1,1}  ⋯  ∂h_j/∂h_{j-1,D_n} ]
              = [ ∂h_{j,1}/∂h_{j-1,1}     ⋯   ∂h_{j,1}/∂h_{j-1,D_n}   ]
                [          ⋮               ⋱             ⋮            ]
                [ ∂h_{j,D_n}/∂h_{j-1,1}   ⋯   ∂h_{j,D_n}/∂h_{j-1,D_n} ]    (13)

Putting Equations 10, 11 and 12 together, we have the following relationship:

∂E/∂W = Σ_{t=1}^{T} Σ_{k=1}^{t} (∂E_t/∂y_t) (∂y_t/∂h_t) ( ∏_{j=k+1}^{t} ∂h_j/∂h_{j-1} ) (∂h_k/∂W)    (14)

Equation 15 shows the norm of the Jacobian matrix relationship in Equation 13. Here, β_W and β_h represent the upper bound values for the norms of the two matrices. The norm of the partial gradient at each time-step t is therefore bounded through the relationship shown in Equation 15.

‖∂h_j/∂h_{j-1}‖ ≤ ‖W^T‖ ‖diag[f'(h_{j-1})]‖ ≤ β_W β_h    (15)

The norm of both matrices is calculated by taking their L2-norm. The norm of f'(h_{j-1}) can only be as large as 1 given the sigmoid non-linearity function.

‖∂h_t/∂h_k‖ = ‖ ∏_{j=k+1}^{t} ∂h_j/∂h_{j-1} ‖ ≤ (β_W β_h)^{t-k}    (16)

The exponential term (β_W β_h)^{t-k} can easily become a very small or very large number when β_W β_h is much smaller or larger than 1 and t − k is sufficiently large. Recall that a large t − k corresponds to the cross entropy error due to faraway words. The contribution of faraway words to predicting the next word at time-step t diminishes when the gradient vanishes early on.
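The bound in Equation 16 can also be observed numerically. The toy sketch below (hidden size, weight scale, and the random sequence are all assumptions, not part of the notes) multiplies successive Jacobians of a sigmoid recurrence and prints the 2-norm of the product, which shrinks roughly geometrically in t − k:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    D_h = 20
    W = rng.normal(scale=0.3, size=(D_h, D_h))   # beta_W stays modest; sigmoid' is at most 0.25
    h = rng.normal(size=D_h)

    J_prod = np.eye(D_h)                         # accumulates prod_j dh_j/dh_{j-1}
    for step in range(1, 21):
        h = sigmoid(W @ h)
        jac = np.diag(h * (1.0 - h)) @ W         # dh_j/dh_{j-1} = diag(f'(z_j)) W for a sigmoid unit
        J_prod = jac @ J_prod
        if step % 5 == 0:
            # the 2-norm decays roughly like (beta_W * beta_h)^(t-k), as in Equation 16
            print(step, np.linalg.norm(J_prod, 2))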

During experimentation, once the gradient value grows extremely large, it causes an overflow (i.e. NaN), which is easily detectable at runtime; this issue is called the Gradient Explosion Problem. When the gradient value goes to zero, however, it can go undetected while drastically reducing the learning quality of the model for far-away words in the corpus; this issue is called the Vanishing Gradient Problem. Due to vanishing gradients, we don't know whether there is no dependency between steps t and t + n in the data, or whether we simply cannot capture the true dependency because of this issue. To gain practical intuition about the vanishing gradient problem, you may visit the example website linked in the original notes.

2.4 Solution to the Exploding & Vanishing Gradients

Now that we have gained intuition about the nature of the vanishing gradients problem and how it manifests itself in deep neural networks, let us focus on a simple and practical heuristic to solve these problems.

To solve the problem of exploding gradients, Thomas Mikolov first introduced a simple heuristic solution that clips gradients to a small number whenever they explode. That is, whenever they reach a certain threshold, they are set back to a small number as shown in Algorithm 1.

Algorithm 1: Pseudo-code for norm clipping of the gradients whenever they explode
    ĝ ← ∂E/∂W
    if ‖ĝ‖ ≥ threshold then
        ĝ ← (threshold / ‖ĝ‖) ĝ
    end if

Figure 7 visualizes the effect of gradient clipping. It shows the decision surface of a small recurrent neural network with respect to its W matrix and its bias terms, b. The model consists of a single recurrent neural network unit running through a small number of time-steps; the solid arrows illustrate the training progress on each gradient descent step. When the gradient descent model hits the high error wall in the objective function, the gradient is pushed off to a far-away location on the decision surface. The clipping model produces the dashed line, where it instead pulls back the error gradient to somewhere close to the original gradient landscape.

Figure 7: Gradient explosion clipping visualization.
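Algorithm 1 translates directly into a few lines of NumPy; the sketch below (threshold and gradient values are made up for illustration) rescales the gradient to the threshold norm while preserving its direction:

    import numpy as np

    def clip_gradient(g, threshold):
        """Norm clipping as in Algorithm 1: rescale g when its norm exceeds the threshold."""
        norm = np.linalg.norm(g)
        if norm >= threshold:
            g = (threshold / norm) * g
        return g

    g = np.array([30.0, 40.0])               # exploded gradient with norm 50
    print(clip_gradient(g, threshold=5.0))   # rescaled to norm 5, direction preserved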

To solve the problem of vanishing gradients, we introduce two techniques. The first technique is, instead of initializing W^{(hh)} randomly, to start off from an identity matrix initialization. The second technique is to use Rectified Linear Units (ReLU) instead of the sigmoid function. The derivative of the ReLU is either 0 or 1. This way, gradients flow through the neurons whose derivative is 1 without getting attenuated while propagating back through time-steps.

2.5 Deep Bidirectional RNNs

So far, we have focused on RNNs that condition on past words to predict the next word in the sequence. It is possible to make predictions based on future words by having the RNN model read through the corpus backwards. Irsoy et al. show a bi-directional deep neural network; at each time-step t, this network maintains two hidden layers, one for the left-to-right propagation and another for the right-to-left propagation. To maintain two hidden layers at any time, this network consumes twice as much memory space for its weight and bias parameters. The final classification result, ŷ_t, is generated by combining the score results produced by both RNN hidden layers. Figure 8 shows the bi-directional network architecture, and Equations 17 and 18 show the mathematical formulation behind setting up the bi-directional RNN hidden layer. The only difference between these two relationships is in the direction of recursing through the corpus. Equation 19 shows the classification relationship used for predicting the next word via summarizing past and future word representations.

h_t^→ = f(W^→ x_t + V^→ h_{t-1}^→ + b^→)    (17)

h_t^← = f(W^← x_t + V^← h_{t+1}^← + b^←)    (18)

ŷ_t = g(U h_t + c) = g(U [h_t^→ ; h_t^←] + c)    (19)

Figure 8: A bi-directional RNN model. The hidden state h_t = [h_t^→ ; h_t^←] now represents (summarizes) the past and the future around a single token.
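A minimal sketch of Equations 17 and 18 follows (function and parameter names, shapes, and the sigmoid choice for f are assumptions for illustration): one left-to-right pass, one right-to-left pass, and a concatenation [h_t^→ ; h_t^←] at each time-step.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def birnn(xs, Wf, Vf, bf, Wb, Vb, bb):
        """Bidirectional RNN layer (Eqs. 17-18): one left-to-right pass, one
        right-to-left pass, then concatenate the two hidden states per time-step."""
        T, D_h = len(xs), bf.shape[0]
        h_fwd = np.zeros((T, D_h))
        h_bwd = np.zeros((T, D_h))
        h = np.zeros(D_h)
        for t in range(T):                        # left-to-right recursion (Eq. 17)
            h = sigmoid(Wf @ xs[t] + Vf @ h + bf)
            h_fwd[t] = h
        h = np.zeros(D_h)
        for t in reversed(range(T)):              # right-to-left recursion (Eq. 18)
            h = sigmoid(Wb @ xs[t] + Vb @ h + bb)
            h_bwd[t] = h
        return np.concatenate([h_fwd, h_bwd], axis=1)   # [h_t -> ; h_t <-] per time-step

    d, D_h, T = 8, 16, 5
    rng = np.random.default_rng(2)
    xs = rng.normal(size=(T, d))
    shapes = [(D_h, d), (D_h, D_h), (D_h,), (D_h, d), (D_h, D_h), (D_h,)]
    Wf, Vf, bf, Wb, Vb, bb = [rng.normal(scale=0.1, size=s) for s in shapes]
    H = birnn(xs, Wf, Vf, bf, Wb, Vb, bb)
    print(H.shape)   # (5, 32): each row summarizes past and future around one token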

RNNs can also be multi-layered. Figure 9 shows a multi-layer bi-directional RNN in which each lower layer feeds the next layer. As shown in this figure, at time-step t each intermediate neuron receives one set of parameters from the previous time-step (in the same RNN layer) and two sets of parameters from the previous RNN hidden layer: one input comes from the left-to-right RNN and the other from the right-to-left RNN.

To construct a Deep RNN with L layers, the above relationships are modified to the relationships in Equations 20 and 21, where the input to each intermediate neuron at level i is the output of the RNN at layer i − 1 at the same time-step t. The output, ŷ_t, at each time-step is the result of propagating the input through all hidden layers (Equation 22).

h_t^{→(i)} = f(W^{→(i)} h_t^{(i-1)} + V^{→(i)} h_{t-1}^{→(i)} + b^{→(i)})    (20)

h_t^{←(i)} = f(W^{←(i)} h_t^{(i-1)} + V^{←(i)} h_{t+1}^{←(i)} + b^{←(i)})    (21)

ŷ_t = g(U h_t + c) = g(U [h_t^{→(L)} ; h_t^{←(L)}] + c)    (22)

Figure 9: A deep bi-directional RNN with three RNN layers. Each layer passes an intermediate sequential representation to the next.

2.6 Application: RNN Translation Model

Traditional translation models are quite complex; they consist of numerous machine learning algorithms applied to different stages of the language translation pipeline. In this section, we discuss the potential for adopting RNNs as a replacement for traditional translation modules. Consider the RNN example model shown in Figure 10; here, the German phrase "Echt dicke Kiste" is translated to "Awesome sauce". The first three hidden-layer time-steps encode the German words into language word features (h_3). The last two time-steps decode h_3 into English word outputs. Equation 23 shows the relationship for the encoder stage, and Equations 24 and 25 show the equations for the decoder stage.

h_t = φ(h_{t-1}, x_t) = f(W^{(hh)} h_{t-1} + W^{(hx)} x_t)    (23)

h_t = φ(h_{t-1}) = f(W^{(hh)} h_{t-1})    (24)

y_t = softmax(W^{(S)} h_t)    (25)

Figure 10: An RNN-based translation model. The first three RNN hidden layers belong to the source language model encoder, and the last two belong to the destination language model decoder.

One may naively assume that this RNN model, along with the cross-entropy function shown in Equation 26, can produce high-accuracy translation results. In practice, however, several extensions need to be added to the model to improve its translation accuracy.

max_θ (1/N) Σ_{n=1}^{N} log p_θ(y^{(n)} | x^{(n)})    (26)

Extension I: train different RNN weights for encoding and decoding. This decouples the two units and allows for more accurate prediction by each of the two RNN modules. This means the φ() functions in Equations 23 and 24 would have different W^{(hh)} matrices.

Extension II: compute every hidden state in the decoder using three different inputs:

- the previous hidden state (standard),
- the last hidden layer of the encoder (c = h_T in Figure 11), and
- the previous predicted output word, ŷ_{t-1}.

Combining the above three inputs transforms the φ function in the decoder of Equation 24 into the one in Equation 27. Figure 11 illustrates this model.

h_t = φ(h_{t-1}, c, y_{t-1})    (27)

Figure 11: Language model with three inputs to each decoder neuron: (h_{t-1}, c, y_{t-1}).
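The basic encoder-decoder of Equations 23-25, before any of the extensions, reads as follows in NumPy (a sketch with made-up dimensions and random weights; a real model would separate encoder and decoder weights as Extension I suggests and feed the decoder the extra inputs of Equation 27):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def encode_decode(src_vectors, n_out, W_hh, W_hx, W_S):
        """Sketch of Eqs. 23-25: encode the source words into a final hidden state,
        then decode output distributions from that state alone."""
        h = np.zeros(W_hh.shape[0])
        for x_t in src_vectors:                  # encoder: h_t = f(W_hh h_{t-1} + W_hx x_t)
            h = sigmoid(W_hh @ h + W_hx @ x_t)
        outputs = []
        for _ in range(n_out):                   # decoder: h_t = f(W_hh h_{t-1}), y_t = softmax(W_S h_t)
            h = sigmoid(W_hh @ h)
            outputs.append(softmax(W_S @ h))
        return outputs

    d, D_h, V = 8, 16, 100
    rng = np.random.default_rng(3)
    src = rng.normal(size=(3, d))                # e.g. three source word vectors ("Echt dicke Kiste")
    W_hh = rng.normal(scale=0.1, size=(D_h, D_h))
    W_hx = rng.normal(scale=0.1, size=(D_h, d))
    W_S = rng.normal(scale=0.1, size=(V, D_h))
    ys = encode_decode(src, n_out=2, W_hh=W_hh, W_hx=W_hx, W_S=W_S)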

Extension III: train deep recurrent neural networks using multiple RNN layers, as discussed earlier in this chapter. Deeper layers often improve prediction accuracy due to their higher learning capacity. Of course, this implies that a large training corpus must be used to train the model.

Extension IV: train bi-directional encoders to improve accuracy, similar to what was discussed earlier in this chapter.

Extension V: given a word sequence A B C in German whose translation is X Y in English, instead of training the RNN using A B C → X Y, train it using C B A → X Y. The intuition behind this technique is that A is more likely to be translated to X. Thus, given the vanishing gradient problem discussed earlier, reversing the order of the input words can help reduce the error rate in generating the output phrase.

3 Gated Recurrent Units

Beyond the extensions discussed so far, RNNs have been found to perform better with the use of more complex units for activation. So far, we have discussed methods that transition from hidden state h_{t-1} to h_t using an affine transformation and a point-wise nonlinearity. Here, we discuss the use of a gated activation function, thereby modifying the RNN architecture. What motivates this? Well, although RNNs can theoretically capture long-term dependencies, they are very hard to actually train to do this. Gated recurrent units are designed in a manner that gives them more persistent memory, thereby making it easier for RNNs to capture long-term dependencies. Let us see mathematically how a GRU uses h_{t-1} and x_t to generate the next hidden state h_t. We will then dive into the intuition behind this architecture.

z_t = σ(W^{(z)} x_t + U^{(z)} h_{t-1})    (Update gate)
r_t = σ(W^{(r)} x_t + U^{(r)} h_{t-1})    (Reset gate)
h̃_t = tanh(r_t ∘ U h_{t-1} + W x_t)    (New memory)
h_t = (1 − z_t) ∘ h̃_t + z_t ∘ h_{t-1}    (Hidden state)
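The four GRU equations above map one-to-one onto code. A minimal NumPy step function follows (shapes and the toy loop are assumptions for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
        """One GRU step, following the four stages above."""
        z = sigmoid(Wz @ x_t + Uz @ h_prev)             # update gate
        r = sigmoid(Wr @ x_t + Ur @ h_prev)             # reset gate
        h_tilde = np.tanh(r * (U @ h_prev) + W @ x_t)   # new memory
        h_t = (1.0 - z) * h_tilde + z * h_prev          # hidden state
        return h_t

    d, D_h = 8, 16
    rng = np.random.default_rng(4)
    Wz, Wr, W = [rng.normal(scale=0.1, size=(D_h, d)) for _ in range(3)]
    Uz, Ur, U = [rng.normal(scale=0.1, size=(D_h, D_h)) for _ in range(3)]
    h = np.zeros(D_h)
    for x_t in rng.normal(size=(5, d)):   # a toy sequence of 5 word vectors
        h = gru_step(x_t, h, Wz, Uz, Wr, Ur, W, U)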

The above equations can be thought of as a GRU's four fundamental operational stages, and they have intuitive interpretations that make this model much more intellectually satisfying (see Figure 12):

1. New memory generation: A new memory h̃_t is the consolidation of a new input word x_t with the past hidden state h_{t-1}. Anthropomorphically, this stage is the one that knows the recipe for combining a newly observed word with the past hidden state h_{t-1} to summarize this new word in light of the contextual past as the vector h̃_t.

2. Reset gate: The reset signal r_t is responsible for determining how important h_{t-1} is to the summarization h̃_t. The reset gate has the ability to completely diminish the past hidden state if it finds that h_{t-1} is irrelevant to the computation of the new memory.

3. Update gate: The update signal z_t is responsible for determining how much of h_{t-1} should be carried forward to the next state. For instance, if z_t ≈ 1, then h_{t-1} is almost entirely copied out to h_t. Conversely, if z_t ≈ 0, then mostly the new memory h̃_t is forwarded to the next hidden state.

4. Hidden state: The hidden state h_t is finally generated using the past hidden input h_{t-1} and the new memory h̃_t, with the advice of the update gate.

It is important to note that to train a GRU we need to learn all the different parameters: W, U, W^{(r)}, U^{(r)}, W^{(z)}, U^{(z)}. These follow the same backpropagation procedure we have seen in the past.

Figure 12: The detailed internals of a GRU.

4 Long Short-Term Memories

Long Short-Term Memories (LSTMs) are another type of complex activation unit that differs a little from GRUs. The motivation for using these is similar to that for GRUs; however, the architecture of such units does differ. Let us first take a look at the mathematical formulation of LSTM units before diving into the intuition behind this design:

i_t = σ(W^{(i)} x_t + U^{(i)} h_{t-1})    (Input gate)
f_t = σ(W^{(f)} x_t + U^{(f)} h_{t-1})    (Forget gate)
o_t = σ(W^{(o)} x_t + U^{(o)} h_{t-1})    (Output/Exposure gate)
c̃_t = tanh(W^{(c)} x_t + U^{(c)} h_{t-1})    (New memory cell)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t    (Final memory cell)
h_t = o_t ∘ tanh(c_t)

Figure 13: The detailed internals of an LSTM.
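Likewise, the LSTM equations above can be transcribed directly; the sketch below (the parameter dictionary, names, and sizes are assumptions for illustration) carries both the memory cell c_t and the exposed hidden state h_t across time-steps:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, p):
        """One LSTM step following the equations above; p holds the weight matrices."""
        i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev)        # input gate
        f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev)        # forget gate
        o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev)        # output/exposure gate
        c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev)  # new memory cell
        c_t = f * c_prev + i * c_tilde                       # final memory cell
        h_t = o * np.tanh(c_t)                               # exposed hidden state
        return h_t, c_t

    d, D_h = 8, 16
    rng = np.random.default_rng(5)
    p = {k: rng.normal(scale=0.1, size=(D_h, d)) for k in ["Wi", "Wf", "Wo", "Wc"]}
    p.update({k: rng.normal(scale=0.1, size=(D_h, D_h)) for k in ["Ui", "Uf", "Uo", "Uc"]})
    h, c = np.zeros(D_h), np.zeros(D_h)
    for x_t in rng.normal(size=(5, d)):   # a toy sequence of 5 word vectors
        h, c = lstm_step(x_t, h, c, p)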

We can gain intuition for the structure of an LSTM by thinking of its architecture as consisting of the following stages:

1. New memory generation: This stage is analogous to the new memory generation stage we saw in GRUs. We essentially use the input word x_t and the past hidden state h_{t-1} to generate a new memory c̃_t which includes aspects of the new word x^{(t)}.

2. Input gate: We see that the new memory generation stage doesn't check whether the new word is even important before generating the new memory; this is exactly the input gate's function. The input gate uses the input word and the past hidden state to determine whether or not the input is worth preserving, and it is thus used to gate the new memory. It produces i_t as an indicator of this information.

3. Forget gate: This gate is similar to the input gate except that it does not make a determination about the usefulness of the input word; instead, it makes an assessment of whether the past memory cell is useful for the computation of the current memory cell. Thus, the forget gate looks at the input word and the past hidden state and produces f_t.

4. Final memory generation: This stage first takes the advice of the forget gate f_t and accordingly forgets the past memory c_{t-1}. Similarly, it takes the advice of the input gate i_t and accordingly gates the new memory c̃_t. It then sums these two results to produce the final memory c_t.

5. Output/Exposure gate: This is a gate that does not explicitly exist in GRUs. Its purpose is to separate the final memory from the hidden state. The final memory c_t contains a lot of information that is not necessarily required to be saved in the hidden state. Hidden states are used in every single gate of an LSTM, and thus this gate makes the assessment regarding which parts of the memory c_t need to be exposed/present in the hidden state h_t. The signal it produces to indicate this is o_t, and it is used to gate the point-wise tanh of the memory.
