arXiv: v1 [stat.ML] 24 May 2016


Sequential Neural Models with Stochastic Layers

Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther
Technical University of Denmark; University of Copenhagen, Denmark

Abstract

How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.

1 Introduction

Recurrent neural networks (RNNs) are able to represent long-term dependencies in sequential data, by adapting and propagating a deterministic hidden (or latent) state [6, 17]. There is recent evidence that when complex sequences such as speech and music are modeled, the performances of RNNs can be dramatically improved when uncertainty is included in their hidden states [3, 4, 8, 12, 13, 16]. In this paper we add a new direction to the explorer's map of treating the hidden RNN states as uncertain paths, by including the world of state space models (SSMs) as an RNN layer. By cleanly delineating a SSM layer, certain independence properties of variables arise, which are beneficial for making efficient posterior inferences. The result is a generative model for sequential data, with a matching inference network that has its roots in variational auto-encoders (VAEs).

SSMs can be viewed as a probabilistic extension of RNNs, where the hidden states are assumed to be random variables.
Although SSMs have an illustrious history [27], their stochasticity has limited their widespread use in the deep learning community, as inference can only be exact for two relatively simple classes of SSMs, namely hidden Markov models and linear Gaussian models, neither of which is well-suited to modeling long-term dependencies and complex probability distributions over high-dimensional sequences. On the other hand, modern RNNs rely on gated nonlinearities such as long short-term memory (LSTM) [17] cells or gated recurrent units (GRUs) [7], that let the deterministic hidden state of the RNN act as an internal memory for the model. This internal memory seems fundamental to capturing complex relationships in the data through a statistical model.

This paper introduces the stochastic recurrent neural network (SRNN) in Section 3. SRNNs combine the gated activation mechanism of RNNs with the stochastic states of SSMs, and are formed by stacking a RNN and a nonlinear SSM. The state transitions of the SSM are nonlinear and are parameterized by a neural network that also depends on the corresponding RNN hidden state. The SSM can therefore utilize long-term information captured by the RNN.

We use recent advances in variational inference to efficiently approximate the intractable posterior distribution over the latent states with an inference network [21, 26]. The form of our variational approximation is inspired by the independence properties of the true posterior distribution over the latent states of the model, and allows us to improve inference by conveniently using the information coming from the whole sequence at each time step. The posterior distribution over the latent states of the SRNN is highly non-stationary while we are learning the parameters of the model. To further improve the variational approximation, we show that we can construct the inference network so that it only needs to learn how to compute the mean of the variational approximation at each time step given the mean of the predictive prior distribution.

In Section 4 we test the performances of SRNN on speech and polyphonic music modeling tasks. SRNN improves the state of the art results on the Blizzard and TIMIT speech data sets by a large margin, and performs comparably to competing models on polyphonic music modeling. Finally, other models that extend RNNs by adding stochastic units will be reviewed and compared to SRNN in Section 5.

Footnote: Ulrich Paquet is now at Google DeepMind.

Figure 1: Graphical models to generate $x_{1:T}$ with a recurrent neural network (RNN) and a state space model (SSM): (a) RNN, (b) SSM. Diamond-shaped units are used for deterministic states, while circles are used for stochastic ones. For sequence generation, like in a language model, one can use $u_t = x_{t-1}$.

2 Recurrent Neural Networks and State Space Models

Recurrent neural networks and state space models are widely used to model temporal sequences of vectors $x_{1:T} = (x_1, x_2, \ldots, x_T)$ that possibly depend on inputs $u_{1:T} = (u_1, u_2, \ldots, u_T)$. Both models rest on the assumption that the sequence $x_{1:t}$ of observations up to time $t$ can be summarized by a latent state $d_t$ or $z_t$, which is deterministically determined ($d_t$ in a RNN) or treated as a random variable which is averaged away ($z_t$ in a SSM). The difference in treatment of the latent state has traditionally led to vastly different models: RNNs recursively compute $d_t = f(d_{t-1}, u_t)$ using a parameterized nonlinear function $f$, like a LSTM cell or a GRU.
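The RNN recursion $d_t = f(d_{t-1}, u_t)$ can be illustrated with a minimal sketch. It assumes a plain tanh cell with fixed random weights in place of an LSTM or GRU; the names `make_tanh_cell` and `deterministic_states` are hypothetical and only serve the illustration.

```python
import math
import random


def make_tanh_cell(state_dim, input_dim, seed=0):
    """Return f(d_prev, u) = tanh(W d_prev + U u) with fixed random weights.

    Assumption: a hypothetical stand-in for a gated cell such as a GRU.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)] for _ in range(state_dim)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(state_dim)]

    def f(d_prev, u):
        return [
            math.tanh(
                sum(w * d for w, d in zip(W[i], d_prev))
                + sum(v * x for v, x in zip(U[i], u))
            )
            for i in range(state_dim)
        ]

    return f


def deterministic_states(f, u_seq, d0):
    """Unroll d_t = f(d_{t-1}, u_t) for t = 1..T and return [d_1, ..., d_T]."""
    d_prev, path = d0, []
    for u_t in u_seq:
        d_prev = f(d_prev, u_t)
        path.append(d_prev)
    return path


f = make_tanh_cell(state_dim=3, input_dim=2)
u_seq = [[0.1, -0.2], [0.4, 0.0], [-0.3, 0.5]]
d_path = deterministic_states(f, u_seq, d0=[0.0, 0.0, 0.0])
# Re-running with the same inputs gives the same path: no sampling is involved.
same_again = deterministic_states(f, u_seq, d0=[0.0, 0.0, 0.0])
```

Because the path is a deterministic function of the inputs and the initial state, nothing has to be averaged over, which is the contrast with the stochastic state of a SSM.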
The RNN observation probabilities $p(x_t \mid d_t)$ are equally modeled with nonlinear functions. SSMs, like linear Gaussian or hidden Markov models, explicitly model uncertainty in the latent process through $z_{1:T}$. Parameter inference in a SSM requires $z_{1:T}$ to be averaged out, and hence $p(z_t \mid z_{t-1}, u_t)$ and $p(x_t \mid z_t)$ are often restricted to the exponential family of distributions to make many existing approximate inference algorithms applicable. On the other hand, averaging a function over the deterministic path $d_{1:T}$ in a RNN is a trivial operation. The striking similarity in factorization between these models is illustrated in Figures 1a and 1b.

Can we combine the best of both worlds, and make the stochastic state transitions of SSMs nonlinear whilst keeping the gated activation mechanism of RNNs? Below, we show that a more expressive model can be created by stacking a SSM on top of a RNN, and that by keeping them layered, the functional form of the true posterior distribution over $z_{1:T}$ guides the design of a backwards-recursive structured variational approximation.

3 Stochastic Recurrent Neural Networks

We define a SRNN as a generative model $p_\theta$ by temporally interlocking a SSM with a RNN, as illustrated in Figure 2a. The joint probability of a single sequence and its latent states, assuming knowledge of the starting states $z_0 = 0$ and $d_0 = 0$, and inputs $u_{1:T}$, factorizes as

$$p_\theta(x_{1:T}, z_{1:T}, d_{1:T} \mid u_{1:T}, z_0, d_0) = p_{\theta_x}(x_{1:T} \mid z_{1:T}, d_{1:T}) \, p_{\theta_z}(z_{1:T} \mid d_{1:T}, z_0) \, p_{\theta_d}(d_{1:T} \mid u_{1:T}, d_0) = \prod_{t=1}^{T} p_{\theta_x}(x_t \mid z_t, d_t) \, p_{\theta_z}(z_t \mid z_{t-1}, d_t) \, p_{\theta_d}(d_t \mid d_{t-1}, u_t). \quad (1)$$

Figure 2: A SRNN as a generative model $p_\theta$ for a sequence $x_{1:T}$: (a) generative model $p_\theta$, (b) inference network $q_\phi$. Posterior inference of $z_{1:T}$ and $d_{1:T}$ is done through an inference network $q_\phi$, which uses a backwards-recurrent state $a_t$ to approximate the nonlinear dependence of $z_t$ on future observations $x_{t:T}$ and states $d_{t:T}$; see Equation (7).

The SSM and RNN are further tied with skip-connections from $d_t$ to $x_t$. The joint density in (1) is parameterized by $\theta = \{\theta_x, \theta_z, \theta_d\}$, which will be adapted together with parameters $\phi$ of a so-called "inference network" $q_\phi$ to best model $N$ independently observed data sequences $\{x^i_{1:T_i}\}_{i=1}^{N}$ that are described by the log marginal likelihood or evidence

$$\mathcal{L}(\theta) = \log p_\theta\big(\{x^i_{1:T_i}\} \mid \{u^i_{1:T_i}, z^i_0, d^i_0\}_{i=1}^{N}\big) = \sum_i \log p_\theta(x^i_{1:T_i} \mid u^i_{1:T_i}, z^i_0, d^i_0) = \sum_i \mathcal{L}_i(\theta). \quad (2)$$

Throughout the paper, we omit superscript $i$ when only one sequence is referred to, or when it is clear from the context. In each log likelihood term $\mathcal{L}_i(\theta)$ in (2), the latent states $z_{1:T}$ and $d_{1:T}$ were averaged out of (1). Integrating out $d_{1:T}$ is done by simply substituting its deterministically obtained value, but $z_{1:T}$ requires more care, and we return to it in Section 3.2. Following Figure 2a, the states $d_{1:T}$ are determined from $d_0$ and $u_{1:T}$ through the recursion $d_t = f_{\theta_d}(d_{t-1}, u_t)$. In our implementation $f_{\theta_d}$ is a GRU network with parameters $\theta_d$. For later convenience we denote the value of $d_{1:T}$, as computed by application of $f_{\theta_d}$, by $\tilde{d}_{1:T}$. Therefore $p_{\theta_d}(d_t \mid d_{t-1}, u_t) = \delta(d_t - \tilde{d}_t)$, i.e. $d_{1:T}$ follows a delta distribution centered at $\tilde{d}_{1:T}$.

Unlike the VRNN [8], $z_t$ directly depends on $z_{t-1}$, as it does in a SSM, via $p_{\theta_z}(z_t \mid z_{t-1}, d_t)$.
This split makes a clear separation between the deterministic and stochastic parts of $p_\theta$; the RNN remains entirely deterministic and its recurrent units do not depend on noisy samples of $z_t$, while the prior over $z_t$ follows the Markov structure of SSMs. The split allows us to later mimic the structure of the posterior distribution over $z_{1:T}$ and $d_{1:T}$ in its approximation $q_\phi$. We let the prior transition distribution $p_{\theta_z}(z_t \mid z_{t-1}, d_t) = \mathcal{N}(z_t; \mu_t^{(p)}, v_t^{(p)})$ be a Gaussian with a diagonal covariance matrix, whose mean and log-variance are parameterized by neural networks that depend on $z_{t-1}$ and $d_t$,

$$\mu_t^{(p)} = \mathrm{NN}_1^{(p)}(z_{t-1}, d_t), \qquad \log v_t^{(p)} = \mathrm{NN}_2^{(p)}(z_{t-1}, d_t), \quad (3)$$

where NN denotes a neural network. Parameters $\theta_z$ denote all weights of $\mathrm{NN}_1^{(p)}$ and $\mathrm{NN}_2^{(p)}$, which are two-layer feed-forward networks in our implementation. Similarly, the parameters of the emission distribution $p_{\theta_x}(x_t \mid z_t, d_t)$ depend on $z_t$ and $d_t$ through a similar neural network that is parameterized by $\theta_x$.

3.1 Variational inference for the SRNN

The stochastic variables $z_{1:T}$ of the nonlinear SSM cannot be analytically integrated out to obtain $\mathcal{L}(\theta)$ in (2). Instead of maximizing $\mathcal{L}$ with respect to $\theta$, we maximize a variational evidence lower bound (ELBO) $\mathcal{F}(\theta, \phi) = \sum_i \mathcal{F}_i(\theta, \phi) \le \mathcal{L}(\theta)$ with respect to both $\theta$ and the variational parameters $\phi$ [18]. The ELBO is a sum of lower bounds $\mathcal{F}_i(\theta, \phi) \le \mathcal{L}_i(\theta)$, one for each sequence $i$,

$$\mathcal{F}_i(\theta, \phi) = \iint q_\phi(z_{1:T}, d_{1:T} \mid x_{1:T}, A) \, \log \frac{p_\theta(x_{1:T}, z_{1:T}, d_{1:T} \mid A)}{q_\phi(z_{1:T}, d_{1:T} \mid x_{1:T}, A)} \, dz_{1:T} \, dd_{1:T}, \quad (4)$$

where $A = \{u_{1:T}, z_0, d_0\}$ is a notational shorthand. Each sequence's approximation $q_\phi$ shares parameters $\phi$ with all others, to form the auto-encoding variational Bayes inference network or variational auto encoder (VAE) [21, 26] shown in Figure 2b. Maximizing $\mathcal{F}(\theta, \phi)$, which we call "training" the neural network architecture with parameters $\theta$ and $\phi$, is done by stochastic gradient ascent, and in doing so, both the posterior and its approximation $q_\phi$ change simultaneously. All the intractable expectations in (4) would typically be approximated by sampling, using the reparameterization trick [21, 26] or control variates [24] to obtain low-variance estimators of its gradients. We use the reparameterization trick in our implementation. Iteratively maximizing $\mathcal{F}$ over $\theta$ and $\phi$ separately would yield an expectation maximization-type algorithm, which has formed a backbone of statistical modeling for many decades [9]. The tightness of the bound depends on how well we can approximate the $i = 1, \ldots, N$ factors $p_\theta(z^i_{1:T_i}, d^i_{1:T_i} \mid x^i_{1:T_i}, A^i)$ that constitute the true posterior over all latent variables with their corresponding factors $q_\phi(z^i_{1:T_i}, d^i_{1:T_i} \mid x^i_{1:T_i}, A^i)$. In what follows, we show how $q_\phi$ could be judiciously structured to match the posterior factors.

We add initial structure to $q_\phi$ by noticing that the prior $p_{\theta_d}(d_{1:T} \mid u_{1:T}, d_0)$ in the generative model is a delta function over $d_{1:T}$, and so is the posterior $p_\theta(d_{1:T} \mid x_{1:T}, u_{1:T}, d_0)$. Consequently, we let the inference network use exactly the same deterministic state setting $d_{1:T}$ as that of the generative model, and we decompose it as

$$q_\phi(z_{1:T}, d_{1:T} \mid x_{1:T}, u_{1:T}, z_0, d_0) = q_\phi(z_{1:T} \mid d_{1:T}, x_{1:T}, z_0) \, \underbrace{q(d_{1:T} \mid x_{1:T}, u_{1:T}, d_0)}_{= \, p_{\theta_d}(d_{1:T} \mid u_{1:T}, d_0)}. \quad (5)$$

This choice exactly approximates one delta-function by itself, and simplifies the ELBO by letting them cancel out. By further taking the outer average in (4), one obtains

$$\mathcal{F}_i(\theta, \phi) = \mathbb{E}_{q_\phi}\big[\log p_\theta(x_{1:T} \mid z_{1:T}, \tilde{d}_{1:T})\big] - \mathrm{KL}\big(q_\phi(z_{1:T} \mid \tilde{d}_{1:T}, x_{1:T}, z_0) \,\|\, p_\theta(z_{1:T} \mid \tilde{d}_{1:T}, z_0)\big), \quad (6)$$

which still depends on $\theta_d$, $u_{1:T}$ and $d_0$ via $\tilde{d}_{1:T}$. The first term is an expected log likelihood under $q_\phi(z_{1:T} \mid \tilde{d}_{1:T}, x_{1:T}, z_0)$, while KL denotes the Kullback-Leibler divergence between two distributions. Having stated the second factor in (5), we now turn our attention to parameterizing the first factor in (5) to resemble its posterior equivalent, by exploiting the temporal structure of $p_\theta$.

3.2 Exploiting the temporal structure

The true posterior distribution of the stochastic states $z_{1:T}$, given both the data and the deterministic states $d_{1:T}$, factorizes as $p_\theta(z_{1:T} \mid d_{1:T}, x_{1:T}, u_{1:T}, z_0) = \prod_t p_\theta(z_t \mid z_{t-1}, d_{t:T}, x_{t:T})$. This can be verified by considering the conditional independence properties of the graphical model in Figure 2a using d-separation [14]. This shows that, knowing $z_{t-1}$, the posterior distribution of $z_t$ does not depend on the past outputs and deterministic states, but only on the present and future ones; this was also noted in [22]. Instead of factorizing $q_\phi$ as a mean-field approximation across time steps, we keep the structured form of the posterior factors, including $z_t$'s dependence on $z_{t-1}$, in the variational approximation

$$q_\phi(z_{1:T} \mid \tilde{d}_{1:T}, x_{1:T}, z_0) = \prod_t q_\phi(z_t \mid z_{t-1}, \tilde{d}_{t:T}, x_{t:T}) = \prod_t q_{\phi_z}\big(z_t \mid z_{t-1}, a_t = g_{\phi_a}(a_{t+1}, [\tilde{d}_t, x_t])\big), \quad (7)$$

where $[\tilde{d}_t, x_t]$ is the concatenation of the vectors $\tilde{d}_t$ and $x_t$. The graphical model for the inference network is shown in Figure 2b. Apart from the direct dependence of the posterior approximation at time $t$ on both $\tilde{d}_{t:T}$ and $x_{t:T}$, the distribution also depends on $\tilde{d}_{1:t-1}$ and $x_{1:t-1}$ through $z_{t-1}$. We mimic each posterior factor's nonlinear long-term dependence on $\tilde{d}_{t:T}$ and $x_{t:T}$ through a backwards-recurrent function $g_{\phi_a}$, shown in (7), which we will return to in greater detail in Section 3.3.
The inference network in Figure 2b is therefore parameterized by $\phi = \{\phi_z, \phi_a\}$ and $\theta_d$.

In (7) all time steps are taken into account when constructing the variational approximation at time $t$; this can therefore be seen as a smoothing problem. In our experiments we also consider filtering, where only the information up to time $t$ is used to define $q_\phi(z_t \mid z_{t-1}, \tilde{d}_t, x_t)$. As the parameters $\phi$ are shared across time steps, we can easily handle sequences with variable length in both cases.

As both the generative model and inference network factorize over time steps in (1) and (7), the ELBO in (6) separates as a sum over the time steps

$$\mathcal{F}_i(\theta, \phi) = \sum_t \mathbb{E}_{q^*_\phi(z_{t-1})}\Big[ \mathbb{E}_{q_\phi(z_t \mid z_{t-1}, \tilde{d}_{t:T}, x_{t:T})}\big[\log p_\theta(x_t \mid z_t, \tilde{d}_t)\big] - \mathrm{KL}\big(q_\phi(z_t \mid z_{t-1}, \tilde{d}_{t:T}, x_{t:T}) \,\|\, p_\theta(z_t \mid z_{t-1}, \tilde{d}_t)\big) \Big], \quad (8)$$

where $q^*_\phi(z_{t-1})$ denotes the marginal distribution of $z_{t-1}$ in the variational approximation to the posterior $q_\phi(z_{1:t-1} \mid \tilde{d}_{1:T}, x_{1:T}, z_0)$, given by

$$q^*_\phi(z_{t-1}) = \int q_\phi(z_{1:t-1} \mid \tilde{d}_{1:T}, x_{1:T}, z_0) \, dz_{1:t-2} = \mathbb{E}_{q^*_\phi(z_{t-2})}\big[ q_\phi(z_{t-1} \mid z_{t-2}, \tilde{d}_{t-1:T}, x_{t-1:T}) \big]. \quad (9)$$

We can interpret (9) as having a VAE at each time step, with the VAE being conditioned on the past through the stochastic variable $z_{t-1}$. To compute (8), the dependence on $z_{t-1}$ needs to be integrated out, using our posterior knowledge at time $t-1$ which is given by $q^*_\phi(z_{t-1})$. We approximate the outer expectation in (8) using a Monte Carlo estimate, as samples from $q^*_\phi(z_{t-1})$ can be efficiently obtained by ancestral sampling. The sequential formulation of the inference model in (7) allows such samples to be drawn and reused, as given a sample $z^{(s)}_{t-2}$ from $q^*_\phi(z_{t-2})$, a sample $z^{(s)}_{t-1}$ from $q_\phi(z_{t-1} \mid z^{(s)}_{t-2}, \tilde{d}_{t-1:T}, x_{t-1:T})$ will be distributed according to $q^*_\phi(z_{t-1})$.

3.3 Parameterization of the inference network

The variational distribution $q_\phi(z_t \mid z_{t-1}, \tilde{d}_{t:T}, x_{t:T})$ needs to approximate the dependence of the true posterior $p_\theta(z_t \mid z_{t-1}, d_{t:T}, x_{t:T})$ on $d_{t:T}$ and $x_{t:T}$, and as alluded to in (7), this is done by running a RNN with inputs $\tilde{d}_{t:T}$ and $x_{t:T}$ backwards in time. Specifically, we initialize the hidden state of the backwards-recursive RNN in Figure 2b as $a_{T+1} = 0$, and recursively compute $a_t = g_{\phi_a}(a_{t+1}, [\tilde{d}_t, x_t])$. The function $g_{\phi_a}$ represents a recurrent neural network with, for example, LSTM or GRU units. Each sequence's variational approximation factorizes over time with $q_\phi(z_{1:T} \mid \tilde{d}_{1:T}, x_{1:T}, z_0) = \prod_t q_{\phi_z}(z_t \mid z_{t-1}, a_t)$, as shown in (7).
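The backwards recursion for $a_t$ can be sketched as follows. This is a minimal illustration under stated assumptions: a plain tanh cell with fixed random weights stands in for the GRU or LSTM, and the names `make_backward_cell` and `backward_states` are hypothetical.

```python
import math
import random


def make_backward_cell(state_dim, input_dim, seed=1):
    """Return g(a_next, inp) = tanh(W a_next + U inp) with fixed random weights.

    Assumption: a hypothetical stand-in for the backward GRU or LSTM.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)] for _ in range(state_dim)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(state_dim)]

    def g(a_next, inp):
        return [
            math.tanh(
                sum(w * a for w, a in zip(W[i], a_next))
                + sum(v * x for v, x in zip(U[i], inp))
            )
            for i in range(state_dim)
        ]

    return g


def backward_states(g, d_seq, x_seq, state_dim):
    """Compute a_t = g(a_{t+1}, [d_t, x_t]) for t = T..1, starting from a_{T+1} = 0."""
    a_next = [0.0] * state_dim
    a_seq = [None] * len(d_seq)
    for t in reversed(range(len(d_seq))):
        # List concatenation plays the role of the vector concatenation [d_t, x_t].
        a_next = g(a_next, d_seq[t] + x_seq[t])
        a_seq[t] = a_next
    return a_seq


d_seq = [[0.2, -0.1], [0.0, 0.3], [0.5, 0.1]]
x_seq = [[1.0], [-1.0], [0.5]]
g = make_backward_cell(state_dim=2, input_dim=3)
a_seq = backward_states(g, d_seq, x_seq, state_dim=2)

# Changing only the first observation leaves every later a_t untouched,
# because a_t summarises the present and the future, not the past.
x_changed = [[9.0]] + x_seq[1:]
a_changed = backward_states(g, d_seq, x_changed, state_dim=2)
```

In this sketch the dependence of $z_t$ on earlier time steps is not carried by $a_t$; it enters only through $z_{t-1}$, matching the factorization in (7).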
We let $q_{\phi_z}(z_t \mid z_{t-1}, a_t)$ be a Gaussian with diagonal covariance, whose mean and log-variance are parameterized with $\phi_z$ as

$$\mu_t^{(q)} = \mathrm{NN}_1^{(q)}(z_{t-1}, a_t), \qquad \log v_t^{(q)} = \mathrm{NN}_2^{(q)}(z_{t-1}, a_t). \quad (10)$$

Instead of smoothing, we can also do filtering by using a neural network to approximate the dependence of the true posterior $p_\theta(z_t \mid z_{t-1}, d_t, x_t)$ on $d_t$ and $x_t$, through for instance $a_t = \mathrm{NN}^{(a)}(\tilde{d}_t, x_t)$.

Improving the posterior approximation. In our experiments we found that during training, the parameterization introduced in (10) can lead to small values of the KL term $\mathrm{KL}(q_\phi(z_t \mid z_{t-1}, a_t) \,\|\, p_\theta(z_t \mid z_{t-1}, \tilde{d}_t))$ in the ELBO in (8). This happens when $g_\phi$ in the inference network does not rely on the information propagated back from future outputs in $a_t$, but it is mostly using the hidden state $\tilde{d}_t$ to imitate the behavior of the prior. The inference network could therefore get stuck by trying to optimize the ELBO through sampling from the prior of the model, making the variational approximation to the posterior useless. To overcome this issue, we directly include some knowledge of the predictive prior dynamics in the parameterization of the inference network, using our approximation of the posterior distribution $q^*_\phi(z_{t-1})$ over the previous latent states. In the spirit of sequential Monte Carlo methods [11], we improve the parameterization of $q_\phi(z_t \mid z_{t-1}, a_t)$ by using $q^*_\phi(z_{t-1})$ from (9). As we are constructing the variational distribution sequentially, we approximate the predictive prior mean, i.e. our "best guess" on the prior dynamics of $z_t$, as

$$\hat{\mu}_t^{(p)} = \int \mathrm{NN}_1^{(p)}(z_{t-1}, \tilde{d}_t) \, p(z_{t-1} \mid x_{1:T}) \, dz_{t-1} \approx \int \mathrm{NN}_1^{(p)}(z_{t-1}, \tilde{d}_t) \, q^*_\phi(z_{t-1}) \, dz_{t-1}, \quad (11)$$

where we used the parameterization of the prior distribution in (3). We estimate the integral required to compute $\hat{\mu}_t^{(p)}$ by reusing the samples that were needed for the Monte Carlo estimate of the ELBO in (8). This predictive prior mean can then be used in the parameterization of the mean of the variational approximation $q_\phi(z_t \mid z_{t-1}, a_t)$,

$$\mu_t^{(q)} = \hat{\mu}_t^{(p)} + \mathrm{NN}_1^{(q)}(z_{t-1}, a_t), \quad (12)$$

and we refer to this parameterization as Res$_q$ in the results in Section 4. Rather than directly learning $\mu_t^{(q)}$, we learn the residual between $\hat{\mu}_t^{(p)}$ and $\mu_t^{(q)}$. It is straightforward to show that with this parameterization the KL-term in (8) will not depend on $\hat{\mu}_t^{(p)}$, but only on $\mathrm{NN}_1^{(q)}(z_{t-1}, a_t)$. Learning the residual improves inference, making it seemingly easier for the inference network to track changes in the generative model while the model is trained, as it will only have to learn how to "correct" the predictive prior dynamics by using the information coming from $\tilde{d}_{t:T}$ and $x_{t:T}$. We did not see any improvement in results by parameterizing $\log v_t^{(q)}$ in a similar way. The inference procedure of SRNN with Res$_q$ parameterization for one sequence is summarized in Algorithm 1.

Algorithm 1: Inference of SRNN with Res$_q$ parameterization from (12).
  inputs: $\tilde{d}_{1:T}$ and $a_{1:T}$
  initialize $z_0$
  for $t = 1$ to $T$ do
    $\hat{\mu}_t^{(p)} = \mathrm{NN}_1^{(p)}(z_{t-1}, \tilde{d}_t)$
    $\mu_t^{(q)} = \hat{\mu}_t^{(p)} + \mathrm{NN}_1^{(q)}(z_{t-1}, a_t)$
    $\log v_t^{(q)} = \mathrm{NN}_2^{(q)}(z_{t-1}, a_t)$
    $z_t \sim \mathcal{N}(z_t; \mu_t^{(q)}, v_t^{(q)})$
  end for

4 Results

In this section the SRNN is evaluated on the modeling of speech and polyphonic music data, as they have been shown to be difficult to model without a good representation of the uncertainty in the latent states [3, 8, 12, 13, 16]. We test SRNN on the Blizzard [19] and TIMIT raw audio data sets (Table 1) used in [8]. The preprocessing of the data sets and the testing performance measures are identical to those reported in [8]. Blizzard is a dataset of 300 hours of English, spoken by a single female speaker. TIMIT is a dataset of 6300 English sentences read by 630 speakers. As done in [8], for Blizzard we report the average log-likelihood for half-second sequences and for TIMIT we report the average log likelihood per sequence for the test set sequences. Note that the sequences in the TIMIT test set are on average 3.1 s long, and therefore 6 times longer than those in Blizzard.
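The loop of Algorithm 1 in Section 3.3 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the three networks are replaced by hypothetical hand-written functions, a single sample is used for the predictive prior mean, and sampling uses the reparameterization $z_t = \mu_t^{(q)} + \sqrt{v_t^{(q)}}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$.

```python
import math
import random


def srnn_resq_inference(d_seq, a_seq, z0, nn_p_mean, nn_q_res, nn_q_logvar, rng):
    """Sample z_1..z_T following Algorithm 1 with the Res_q mean from (12)."""
    z_prev, z_path, prior_means = z0, [], []
    for d_t, a_t in zip(d_seq, a_seq):
        mu_p = nn_p_mean(z_prev, d_t)              # predictive prior mean (one sample)
        residual = nn_q_res(z_prev, a_t)           # learned correction to the prior mean
        mu_q = [m + r for m, r in zip(mu_p, residual)]
        log_v_q = nn_q_logvar(z_prev, a_t)
        # Reparameterized draw from the diagonal Gaussian N(mu_q, v_q).
        z_t = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
               for m, lv in zip(mu_q, log_v_q)]
        z_path.append(z_t)
        prior_means.append(mu_p)
        z_prev = z_t
    return z_path, prior_means


# Hypothetical stand-ins for NN_1^(p), NN_1^(q) and NN_2^(q).
def nn_p_mean(z_prev, d_t):
    return [0.9 * z + 0.1 * d for z, d in zip(z_prev, d_t)]


def nn_q_res(z_prev, a_t):
    return [0.0 for _ in z_prev]       # zero residual: posterior mean equals prior mean


def nn_q_logvar(z_prev, a_t):
    return [-100.0 for _ in z_prev]    # negligible variance, to make the effect visible


d_seq = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
a_seq = [[0.3, 0.3], [0.2, 0.1], [0.0, 0.4]]
z_path, prior_means = srnn_resq_inference(
    d_seq, a_seq, [0.0, 0.0], nn_p_mean, nn_q_res, nn_q_logvar, random.Random(0)
)
```

With a zero residual and negligible variance the samples track the predictive prior mean, which shows what the residual network has to learn: only the correction on top of the prior dynamics.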
For the raw audio datasets we use a fully factorized Gaussian output distribution. Additionally, we test SRNN for modeling sequences of polyphonic music (Table 2), using the four data sets of MIDI songs introduced in [4]. Each data set contains more than 7 hours of polyphonic music of varying complexity: folk tunes (Nottingham data set), the four-part chorales by J. S. Bach (JSB chorales), orchestral music (MuseData) and classical piano music (Piano-midi.de). For polyphonic music we use a Bernoulli output distribution to model the binary sequences of piano notes.

All models were implemented using Theano [2], Lasagne [10] and Parmesan (footnote 2). Training using a NVIDIA Titan X GPU took around 1.5 hours for TIMIT, 18 hours for Blizzard, less than 15 minutes for the JSB chorales and Piano-midi.de data sets, and around 30 minutes for the Nottingham and MuseData data sets. To reduce the computational requirements we use only 1 sample to approximate all the intractable expectations in the ELBO (notice that the KL term can be computed analytically). Further implementation and experimental details can be found in the Supplementary Material.

Blizzard and TIMIT. Table 1 compares the average log-likelihood per test sequence of SRNN to the results from [8]. For RNNs and VRNNs the authors of [8] test two different output distributions, namely a Gaussian distribution (Gauss) and a Gaussian Mixture Model (GMM). VRNN-I differs from the VRNN in that the prior over the latent variables is independent across time steps, and it is therefore similar to STORN [3]. For SRNN we compare the smoothing and filtering performance (denoted as smooth and filt in Table 1), both with the residual term in (12) and without it in (10) (denoted as Res$_q$ if present). We prefer to only report the more conservative evidence lower bound for SRNN, as the approximation of the log-likelihood using standard importance sampling is known to be difficult to compute accurately in the sequential setting [11]. We see from Table 1 that SRNN outperforms all the competing methods for speech modeling.
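The analytic KL term mentioned above has a closed form when both distributions are diagonal Gaussians parameterized by a mean and a log-variance, as in (3) and (10). The sketch below writes it out; the function name `kl_diag_gaussians` is hypothetical, and the example values are chosen only for illustration.

```python
import math


def kl_diag_gaussians(mu_q, log_v_q, mu_p, log_v_p):
    """KL(q || p) for diagonal Gaussians given means and log-variances.

    Per dimension: 0.5 * (log v_p - log v_q + (v_q + (mu_q - mu_p)^2) / v_p - 1).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_v_q, mu_p, log_v_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


# Identical distributions have zero divergence.
kl_same = kl_diag_gaussians([0.3, -1.0], [0.1, 0.2], [0.3, -1.0], [0.1, 0.2])

# Unit variances and a mean shift of 1 give 0.5 * 1^2 = 0.5.
kl_unit_shift = kl_diag_gaussians([1.0], [0.0], [0.0], [0.0])

# With the residual mean of (12), mu_q = mu_p + r, only r enters the KL term.
residual = [0.5, -0.25]


def kl_with_residual(mu_p):
    mu_q = [m + r for m, r in zip(mu_p, residual)]
    return kl_diag_gaussians(mu_q, [0.0, 0.0], mu_p, [0.0, 0.0])


kl_a = kl_with_residual([0.0, 0.0])
kl_b = kl_with_residual([2.0, -4.0])
```

The last two values agree because the mean enters only through the difference $\mu_t^{(q)} - \mu_t^{(p)}$, which is why the KL term under the Res$_q$ parameterization depends on the residual network output and not on the predictive prior mean.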
As the test sequences in TIMIT are on average more than 6 times longer than the ones for Blizzard, the results obtained with SRNN for TIMIT are in line with those obtained for Blizzard. The VRNN, which performs well when the voice of the single speaker from Blizzard is modeled, seems to encounter difficulties when modeling the 630 speakers in the TIMIT data set. As expected, for SRNN the variational approximation that is obtained when future information is also used (smoothing) is better than the one obtained by filtering. Learning the residual between the prior mean and the mean of the variational approximation, given in (12), further improves the performance in 3 out of 4 cases.

Table 1: Average log-likelihood per sequence on the test sets. For TIMIT the average test set length is 3.1 s, while the Blizzard sequences are all 0.5 s long. The non-SRNN results are reported as in [8]. Smooth: $g_{\phi_a}$ is a GRU running backwards; filt: $g_{\phi_a}$ is a feed-forward network; Res$_q$: parameterization with residual in (12).
Columns: Models, Blizzard, TIMIT.
Rows: SRNN (smooth+Res$_q$), SRNN (smooth), SRNN (filt+Res$_q$), SRNN (filt), VRNN-GMM, VRNN-Gauss, VRNN-I-Gauss, RNN-GMM, RNN-Gauss.

Figure 3: Visualization of the average KL term and reconstructions of the output mean and log-variance for two examples from the Blizzard test set. Rows: Avg. KL(q‖p), raw signal, reconstructed $\mu$, reconstructed $\log \sigma^2$. Columns: Example 1, Example 2. Horizontal axis: seconds.

Table 2: Average log-likelihood on the test sets. The TSBN results are from [13], NASMC from [16], STORN from [3], RNN-NADE and RNN from [4].
Columns: Models, Nottingham, JSB chorales, MuseData, Piano-midi.de.
Rows: SRNN (smooth+Res$_q$), TSBN, NASMC, STORN, RNN-NADE, RNN.

In the first two lines of Figure 3 we plot two raw signals from the Blizzard test set and the average KL term between the variational approximation and the prior distribution. We see that the KL term increases whenever there is a transition in the raw audio signal, meaning that the inference network is using the information coming from the output symbols to improve inference. Finally, the reconstructions of the output mean and log-variance in the last two lines of Figure 3 look consistent with the original signal.

Polyphonic music. Table 2 compares the average log-likelihood on the test sets obtained with SRNN and the models introduced in [3, 4, 13, 16].

Footnote 2: https://github.com/casperkaae/parmesan. The code for SRNN will be made available online.
As done for the speech data, we prefer to report the more conservative estimate of the ELBO in Table 2, rather than approximating the log-likelihood with importance sampling as some of the other methods do. We see that SRNN performs comparably to other state of the art methods in all four data sets. We report the results using smoothing and learning the residual between the mean of the predictive prior and the one of the variational approximation, but the performances using filtering and learning directly the mean of the variational approximation are now similar. We believe that this is due to the small amount of data and the fact that modeling MIDI music is much simpler than modeling raw speech signals.

5 Related work

A number of works have extended RNNs with stochastic units to model motion capture, speech and music data [3, 8, 12, 13, 16]. The performances of these models are highly dependent on how the dependence among stochastic units is modeled over time, on the type of interaction between stochastic units and deterministic ones, and on the procedure that is used to evaluate the typically intractable log likelihood. Figure 4 highlights how SRNN differs from some of these works.

Figure 4: Generative models of $x_{1:T}$ that are related to SRNN: (a) STORN, (b) VRNN, (c) Deep Kalman Filter.

In STORN [3] (Figure 4a) and DRAW [15] the stochastic units at each time step have an isotropic Gaussian prior and are independent between time steps. The stochastic units are used as an input to the deterministic units in a RNN. As in our work, the reparameterization trick [21, 26] is used to optimize an ELBO.

The authors of the VRNN [8] (Figure 4b) note that it is beneficial to add information coming from the past states to the prior over latent variables $z_t$. The VRNN lets the prior $p_{\theta_z}(z_t \mid d_t)$ over the stochastic units depend on the deterministic units $d_t$, which in turn depend on both the deterministic and the stochastic units at the previous time step through the recursion $d_t = f(d_{t-1}, z_{t-1}, u_t)$. The SRNN differs by clearly separating the deterministic and stochastic part, as shown in Figure 2a. The separation of deterministic and stochastic units allows us to improve the posterior approximation by doing smoothing, as the stochastic units still depend on each other when we condition on $d_{1:T}$. In the VRNN, on the other hand, the stochastic units are conditionally independent given the states $d_{1:T}$. Because the inference and generative networks in the VRNN share the deterministic units, the variational approximation would not improve by making it dependent on the future through $a_t$, when calculated with a backward GRU, as we do in our model. Unlike STORN, DRAW and VRNN, the SRNN separates the "noisy" stochastic units from the deterministic ones, forming an entire layer of interconnected stochastic units.
We found in practice that this gave better performance and was easier to train. The works by [1, 22] (Figure 4c) show that it is possible to improve inference in SSMs by using ideas from VAEs, similar to what is done in the stochastic part (the top layer) of SRNN. Towards the periphery of related works, [16] approximates the log likelihood of a SSM with sequential Monte Carlo, by learning flexible proposal distributions parameterized by deep networks, while [13] uses a recurrent model with discrete stochastic units that is optimized using the NVIL algorithm [23].

6 Conclusion

This work has shown how to extend the modeling capabilities of recurrent neural networks by combining them with nonlinear state space models. Inspired by the independence properties of the intractable true posterior distribution over the latent states, we designed an inference network in a principled way. The variational approximation for the stochastic layer was improved by using the information coming from the whole sequence and by using the Res$_q$ parameterization to help the inference network to track the non-stationary posterior. SRNN achieves state of the art performances on the Blizzard and TIMIT speech data sets, and performs comparably to competing methods for polyphonic music modeling.

Acknowledgements

We thank Casper Kaae Sønderby and Lars Maaløe for many fruitful discussions, and NVIDIA Corporation for the donation of TITAN X and Tesla K40 GPUs. Marco Fraccaro is supported by Microsoft Research through its PhD Scholarship Programme.

References

[1] E. Archer, I. M. Park, L. Buesing, J. Cunningham, and L. Paninski. Black box variational inference for state space models. arXiv preprint.
[2] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio. Theano: new features and speed improvements. arXiv preprint.
[3] J. Bayer and C. Osendorfer. Learning stochastic recurrent networks. arXiv preprint.
[4] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint.
[5] S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio. Generating sentences from a continuous space. arXiv preprint.
[6] K. Cho, B. Van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP.
[7] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint.
[8] J. Chung, K. Kastner, L. Dinh, K. Goel, A. C. Courville, and Y. Bengio. A recurrent latent variable model for sequential data. In NIPS.
[9] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1).
[10] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, E. Battenberg, and A. van den Oord. Lasagne: First release.
[11] A. Doucet, N. de Freitas, and N. Gordon. An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science.
[12] O. Fabius and J. R. van Amersfoort. Variational recurrent auto-encoders. arXiv preprint.
[13] Z. Gan, C. Li, R. Henao, D. E. Carlson, and L. Carin. Deep temporal sigmoid belief networks for sequence modeling. In NIPS.
[14] D. Geiger, T. Verma, and J. Pearl. Identifying independence in Bayesian networks. Networks, 20.
[15] K. Gregor, I. Danihelka, A. Graves, and D. Wierstra. DRAW: A recurrent neural network for image generation. In ICML.
[16] S. Gu, Z. Ghahramani, and R. E. Turner. Neural adaptive sequential Monte Carlo. In NIPS.
[17] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), Nov.
[18] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2).
[19] S. King and V. Karaiskos. The Blizzard challenge. In The Ninth Annual Blizzard Challenge.
[20] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint.
[21] D. Kingma and M. Welling. Auto-encoding variational Bayes. In ICLR.
[22] R. G. Krishnan, U. Shalit, and D. Sontag. Deep Kalman filters. arXiv preprint.
[23] A. Mnih and K. Gregor. Neural variational inference and learning in belief networks. arXiv preprint.
[24] J. W. Paisley, D. M. Blei, and M. I. Jordan. Variational Bayesian inference with stochastic search. In ICML.
[25] T. Raiko, H. Valpola, M. Harva, and J. Karhunen. Building blocks for variational Bayesian learning of latent variable models. Journal of Machine Learning Research, 8.
[26] D. J. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In ICML.
[27] S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11(2):305-45.
[28] C. K. Sønderby, T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther. How to train deep variational autoencoders and probabilistic ladder networks. arXiv preprint.

A Experimental setup

A.1 Blizzard and TIMIT

The sampling rate is 16 kHz and the raw audio signal is normalized using the global mean and standard deviation of the training set. We split the raw audio signals in chunks of 2 seconds. The waveforms are then divided into non-overlapping vectors of size 200. The RNN thus runs for 160 steps (footnote 3). The model is trained to predict the next vector ($x_t$) given the current one ($u_t$). During training we use backpropagation through time (BPTT) for 0.5 seconds, i.e. we have 4 updates for each 2 seconds of audio. For the first 0.5 second we initialize hidden units with zeros and for the subsequent 3 chunks we use the previous hidden states as initialization.

For Blizzard we split the data using 90% for training, 5% for validation and 5% for testing. For testing we report the average log-likelihood per 0.5 s sequences. For TIMIT we use the predefined test set for testing and split the rest of the data into 95% for training and 5% for validation. The training and testing setup are identical to the ones for Blizzard. For TIMIT the test sequences have variable length and are on average 3.1 s, i.e. more than 6 times longer than Blizzard.

We model the output using a fully factorized Gaussian distribution for $p_{\theta_x}(x_t \mid z_t, d_t)$. The deterministic RNNs use GRUs [7], with 2048 units for Blizzard and 1024 units for TIMIT. In both cases, $z_t$ is a 256-dimensional vector.
All the neural networks have 2 layers, with 1024 units for Blizzard and 512 for TIMIT, and use leaky rectified nonlinearities with leakiness 1/3, clipped at ±3. In both the generative and inference models we share a neural network to extract features from the raw audio signal. The sizes of the models were chosen to roughly match the number of parameters used in [8].

In all experiments it was fundamental to gradually introduce the KL term in the ELBO, as shown in [5, 28, 25]. We therefore multiply the KL term by a temperature β, i.e., βKL, and linearly increase β from 0.2 to 1 in the beginning of training (for Blizzard we increase it by [...] after each update, while for TIMIT by [...]). For both data sets we used the ADAM optimizer [20]. For Blizzard we use a learning rate of [...] and a batch size of 128; for TIMIT they are [...] and 64, respectively.

A.2 Polyphonic music

We use the same model architecture as in Section 4, except for the output Bernoulli variables used to model the active notes. We reduced the number of parameters in the model to 300 deterministic hidden units for the GRU networks, and 100 stochastic units whose distributions are parameterized with neural networks with 1 layer of 500 units.

Footnote 3: 2 s × 16 kHz / 200 = 160.
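The KL warm-up and the clipped leaky rectifier from Section A.1 can be sketched as follows. This is a minimal illustration, not the authors' code: the per-update increment is not recoverable from this text, so `HYPOTHETICAL_INCREMENT` is a placeholder; applying the ±3 clipping to the activation output is one reading of "clipped at ±3"; and `annealed_objective` assumes the usual ELBO form of a log-likelihood term minus a KL term.

```python
def kl_weight(update, increment, beta_start=0.2, beta_end=1.0):
    """Temperature beta after `update` parameter updates, capped at beta_end."""
    return min(beta_end, beta_start + increment * update)


def clipped_leaky_rectify(x, leakiness=1.0 / 3.0, clip=3.0):
    """Leaky rectifier with slope `leakiness` for x < 0.

    Assumption: the output is clipped to [-clip, clip].
    """
    y = x if x >= 0 else leakiness * x
    return max(-clip, min(clip, y))


def annealed_objective(log_likelihood, kl, beta):
    """Warm-up objective: log-likelihood term minus the beta-weighted KL term."""
    return log_likelihood - beta * kl


# Placeholder only; NOT a value taken from the paper.
HYPOTHETICAL_INCREMENT = 1e-4

for update in (0, 4000, 8000, 20000):
    beta = kl_weight(update, HYPOTHETICAL_INCREMENT)
    print(update, round(beta, 4))
```

With this schedule β starts at 0.2, grows linearly with the number of updates, and stays at 1 once the cap is reached, so the objective becomes the unweighted ELBO for the rest of training.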


More information

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19

m = 41 members n = 27 (nonfounders), f = 14 (founders) 8 markers from chromosome 19 Sequenial Imporance Sampling (SIS) AKA Paricle Filering, Sequenial Impuaion (Kong, Liu, Wong, 994) For many problems, sampling direcly from he arge disribuion is difficul or impossible. One reason possible

More information

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

RL Lecture 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 RL Lecure 7: Eligibiliy Traces R. S. Suon and A. G. Baro: Reinforcemen Learning: An Inroducion 1 N-sep TD Predicion Idea: Look farher ino he fuure when you do TD backup (1, 2, 3,, n seps) R. S. Suon and

More information

Probabilistic Robotics

Probabilistic Robotics Probabilisic Roboics Bayes Filer Implemenaions Gaussian filers Bayes Filer Reminder Predicion bel p u bel d Correcion bel η p z bel Gaussians : ~ π e p N p - Univariae / / : ~ μ μ μ e p Ν p d π Mulivariae

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

A unit root test based on smooth transitions and nonlinear adjustment

A unit root test based on smooth transitions and nonlinear adjustment MPRA Munich Personal RePEc Archive A uni roo es based on smooh ransiions and nonlinear adjusmen Aycan Hepsag Isanbul Universiy 5 Ocober 2017 Online a hps://mpra.ub.uni-muenchen.de/81788/ MPRA Paper No.

More information

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006 2.160 Sysem Idenificaion, Esimaion, and Learning Lecure Noes No. 8 March 6, 2006 4.9 Eended Kalman Filer In many pracical problems, he process dynamics are nonlinear. w Process Dynamics v y u Model (Linearized)

More information

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015 Explaining Toal Facor Produciviy Ulrich Kohli Universiy of Geneva December 2015 Needed: A Theory of Toal Facor Produciviy Edward C. Presco (1998) 2 1. Inroducion Toal Facor Produciviy (TFP) has become

More information