
Machine Learning (1997). (c) 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Factorial Hidden Markov Models

ZOUBIN GHAHRAMANI    zoubin@cs.toronto.edu
Department of Computer Science, University of Toronto, Toronto, ON M5S 3H5, Canada

MICHAEL I. JORDAN    jordan@psyche.mit.edu
Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Editor: Padhraic Smyth

Abstract. Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a single discrete variable, the hidden state. We discuss a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. We describe an exact algorithm for inferring the posterior probabilities of the hidden state variables given the observations, and relate it to the forward-backward algorithm for HMMs and to algorithms for more general graphical models. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or variational methods. Within the variational framework, we present a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model. Empirical comparisons suggest that these approximations are efficient and provide accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach's chorales and show that factorial HMMs can capture statistical structure in this data set which an unconstrained HMM cannot.

Keywords: Hidden Markov models, time series, EM algorithm, graphical models, Bayesian networks, mean field theory.

1. Introduction

Due to its flexibility and to the simplicity and efficiency of its parameter estimation algorithm, the hidden Markov model (HMM) has emerged as one of the basic statistical tools for modeling discrete time series, finding widespread application in the areas of speech recognition (Rabiner & Juang, 1986) and computational molecular biology (Krogh, Brown, Mian, Sjolander, & Haussler, 1994). An HMM is essentially a mixture model, encoding information about the history of a time series in the value of a single multinomial variable (the hidden state), which can take on one of K discrete values. This multinomial assumption supports an efficient parameter estimation algorithm, the Baum-Welch algorithm, which considers each of the K settings of the hidden state at each time step. However, the multinomial assumption also severely limits the representational capacity of HMMs. For example, to represent 30 bits of information about the history of a time sequence, an HMM would need K = 2^30 distinct states. On the other hand, an HMM with a distributed state representation could achieve the same task with 30 binary state variables (Williams & Hinton, 1991).

This paper addresses the problem of constructing efficient learning algorithms for hidden Markov models with distributed state representations.

The need for distributed state representations in HMMs can be motivated in two ways. First, such representations let the model automatically decompose the state space into features that decouple the dynamics of the process that generated the data. Second, distributed state representations simplify the task of modeling time series that are known a priori to be generated from an interaction of multiple, loosely-coupled processes. For example, a speech signal generated by the superposition of multiple simultaneous speakers can potentially be modeled with such an architecture.

Williams and Hinton (1991) first formulated the problem of learning in HMMs with distributed state representations and proposed a solution based on deterministic Boltzmann learning. The approach presented in this paper is similar to Williams and Hinton's in that it can also be viewed from the framework of statistical mechanics and mean field theory. However, our learning algorithm is quite different in that it makes use of the special structure of HMMs with a distributed state representation, resulting in a significantly more efficient learning procedure. Anticipating the results in Section 3, this learning algorithm obviates the need for the two-phase procedure of Boltzmann machines, has an exact M step, and makes use of the forward-backward algorithm from classical HMMs as a subroutine. A different approach comes from Saul and Jordan (1995), who derived a set of rules for computing the gradients required for learning in HMMs with distributed state spaces. However, their methods can only be applied to a limited class of architectures.

Hidden Markov models with distributed state representations are a particular class of probabilistic graphical model (Pearl, 1988; Lauritzen & Spiegelhalter, 1988), which represent probability distributions as graphs in which the nodes correspond to random variables and the links represent conditional independence relations. The relation between hidden Markov models and graphical models has recently been reviewed in Smyth, Heckerman and Jordan (1997). Although exact probability propagation algorithms exist for general graphical models (Jensen, Lauritzen, & Olesen, 1990), these algorithms are intractable for densely-connected models such as the ones we consider in this paper. One approach to dealing with this issue is to utilize stochastic sampling methods (Kanazawa et al., 1995). Another approach, which provides the basis for algorithms described in the current paper, is to make use of variational methods (cf. Saul, Jaakkola, & Jordan, 1996).

In the following section we define the probabilistic model for factorial HMMs and in Section 3 we present algorithms for inference and learning. In Section 4 we describe empirical results comparing exact and approximate algorithms for learning on the basis of time complexity and model quality. We also apply factorial HMMs to a real time series data set consisting of the melody lines from a collection of chorales by J. S. Bach. We discuss several generalizations of the probabilistic model in Section 5, and we conclude in Section 6. Where necessary, details of derivations are provided in the appendixes.

Figure 1. (a) A directed acyclic graph (DAG) specifying conditional independence relations for a hidden Markov model. Each node is conditionally independent from its non-descendants given its parents. (b) A DAG representing the conditional independence relations in a factorial HMM with M = 3 underlying Markov chains.

2. The probabilistic model

We begin by describing the hidden Markov model, in which a sequence of observations {Y_t}, where t = 1, ..., T, is modeled by specifying a probabilistic relation between the observations and a sequence of hidden states {S_t}, and a Markov transition structure linking the hidden states. The model assumes two sets of conditional independence relations: that Y_t is independent of all other observations and states given S_t, and that S_t is independent of S_1, ..., S_{t-2} given S_{t-1} (the first-order Markov property). Using these independence relations, the joint probability for the sequence of states and observations can be factored as

    P(\{S_t, Y_t\}) = P(S_1) P(Y_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1}) P(Y_t \mid S_t).    (1)

The conditional independencies specified by equation (1) can be expressed graphically in the form of Figure 1(a). The state is a single multinomial random variable that can take one of K discrete values, S_t \in \{1, ..., K\}. The state transition probabilities, P(S_t \mid S_{t-1}), are specified by a K x K transition matrix. If the observations are discrete symbols taking on one of D values, the observation probabilities P(Y_t \mid S_t) can be fully specified as a K x D observation matrix. For a continuous observation vector, P(Y_t \mid S_t) can be modeled in many different forms, such as a Gaussian, a mixture of Gaussians, or even a neural network.

In the present paper, we generalize the HMM state representation by letting the state be represented by a collection of state variables

    S_t = ( S_t^{(1)}, ..., S_t^{(m)}, ..., S_t^{(M)} ),    (2)

each of which can take on K^{(m)} values. We refer to these models as factorial hidden Markov models, as the state space consists of the cross product of these state variables. For simplicity, we will assume that K^{(m)} = K, for all m, although all the results we present can be trivially generalized to the case of differing K^{(m)}. Given that the state space of this factorial HMM consists of all K^M combinations of the S_t^{(m)} variables, placing no constraints on the state transition structure would result in a K^M x K^M transition matrix. Such an unconstrained system is uninteresting for several reasons: it is equivalent to an HMM with K^M states; it is unlikely to discover any interesting structure in the K state variables, as all variables are allowed to interact arbitrarily; and both the time complexity and sample complexity of the estimation algorithm are exponential in M. We therefore focus on factorial HMMs in which the underlying state transitions are constrained. A natural structure to consider is one in which each state variable evolves according to its own dynamics, and is a priori uncoupled from the other state variables:

    P(S_t \mid S_{t-1}) = \prod_{m=1}^{M} P(S_t^{(m)} \mid S_{t-1}^{(m)}).    (3)

A graphical representation for this model is presented in Figure 1(b). The transition structure for this system can be represented as M distinct K x K matrices. Generalizations that allow coupling between the state variables are briefly discussed in Section 5.

As shown in Figure 1(b), in a factorial HMM the observation at time step t can depend on all the state variables at that time step. For continuous observations, one simple form for this dependence is linear Gaussian; that is, the observation Y_t is a Gaussian random vector whose mean is a linear function of the state variables. We represent the state variables as K x 1 vectors, where each of the K discrete values corresponds to a 1 in one position and 0 elsewhere. The resulting probability density for a D x 1 observation vector Y_t is

    P(Y_t \mid S_t) = |C|^{-1/2} (2\pi)^{-D/2} \exp\left\{ -\tfrac{1}{2} (Y_t - \mu_t)' C^{-1} (Y_t - \mu_t) \right\},    (4a)

where

    \mu_t = \sum_{m=1}^{M} W^{(m)} S_t^{(m)}.    (4b)

Each W^{(m)} matrix is a D x K matrix whose columns are the contributions to the means for each of the settings of S_t^{(m)}, C is the D x D covariance matrix, ' denotes matrix transpose, and | . | is the matrix determinant operator.
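To make the generative process in equations (3)-(4b) concrete, the following sketch samples a state and observation sequence from a factorial HMM with the linear-Gaussian output model. It is an illustrative implementation rather than code from the paper; the argument names and the transition-matrix indexing convention (column j is the previous state) are assumptions chosen to mirror the notation above.

```python
import numpy as np

def sample_fhmm(priors, transitions, W, C, T, rng=np.random.default_rng(0)):
    """Sample {S_t, Y_t} from a factorial HMM with M chains of K states each.

    priors:      list of M arrays of shape (K,),   pi^(m)
    transitions: list of M arrays of shape (K, K), transitions[m][k, j] = P(S_t = k | S_{t-1} = j)
    W:           list of M arrays of shape (D, K), output weight matrices
    C:           (D, D) output covariance matrix
    """
    M, D, K = len(W), W[0].shape[0], W[0].shape[1]
    states = np.zeros((T, M), dtype=int)
    Y = np.zeros((T, D))
    for t in range(T):
        for m in range(M):
            p = priors[m] if t == 0 else transitions[m][:, states[t - 1, m]]
            states[t, m] = rng.choice(K, p=p)
        # mean is the sum of one column chosen from each W^(m), equation (4b)
        mu = sum(W[m][:, states[t, m]] for m in range(M))
        Y[t] = rng.multivariate_normal(mu, C)
    return states, Y
```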

One way to understand the observation model in equations (4a) and (4b) is to consider the marginal distribution for Y_t, obtained by summing over the possible states. There are K settings for each of the M state variables, and thus there are K^M possible mean vectors obtained by forming sums of M columns where one column is chosen from each of the W^{(m)} matrices. The resulting marginal density of Y_t is thus a Gaussian mixture model, with K^M Gaussian mixture components each having a constant covariance matrix C. This static mixture model, without inclusion of the time index and the Markov dynamics, is a factorial parameterization of the standard mixture of Gaussians model that has interest in its own right (Zemel, 1993; Hinton & Zemel, 1994; Ghahramani, 1995). The model we are considering in the current paper extends this model by allowing Markov dynamics in the discrete state variables underlying the mixture. Unless otherwise stated, we will assume the Gaussian observation model throughout the paper.

The hidden state variables at one time step, although marginally independent, become conditionally dependent given the observation sequence. This can be determined by applying the semantics of directed graphs, in particular the d-separation criterion (Pearl, 1988), to the graphical model in Figure 1(b). Consider the Gaussian model in equations (4a)-(4b). Given an observation vector Y_t, the posterior probability of each of the settings of the hidden state variables is proportional to the probability of Y_t under a Gaussian with mean \mu_t. Since \mu_t is a function of all the state variables, the probability of a setting of one of the state variables will depend on the setting of the other state variables. This dependency effectively couples all of the hidden state variables for the purposes of calculating posterior probabilities and makes exact inference intractable for the factorial HMM.

3. Inference and learning

The inference problem in a probabilistic graphical model consists of computing the probabilities of the hidden variables given the observations. In the context of speech recognition, for example, the observations may be acoustic vectors and the goal of inference may be to compute the probability for a particular word or sequence of phonemes (the hidden state). This problem can be solved efficiently via the forward-backward algorithm (Rabiner & Juang, 1986), which can be shown to be a special case of the Jensen, Lauritzen, and Olesen (1990) algorithm for probability propagation in more general graphical models (Smyth et al., 1997). In some cases, rather than a probability distribution over hidden states it is desirable to infer the single most probable hidden state sequence. This can be achieved via the Viterbi (1967) algorithm, a form of dynamic programming that is very closely related to the forward-backward algorithm and also has analogues in the graphical model literature (Dawid, 1992).

The learning problem for probabilistic models consists of two components: learning the structure of the model and learning its parameters. Structure learning is a topic of current research in both the graphical model and machine learning communities (e.g. Heckerman, 1995; Stolcke & Omohundro, 1993). In the current paper we deal exclusively with the problem of learning the parameters for a given structure.
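For reference, a minimal sketch of the scaled forward-backward recursions for an ordinary HMM (the algorithm referred to above) is given below; the algorithms in the following sections either call such a routine as a subroutine (Section 3.5) or generalize it. The scaling scheme and variable names are one common choice, not necessarily those of the papers cited, and the transition-matrix indexing matches the sampling sketch above.

```python
import numpy as np

def forward_backward(pi, P, B):
    """Posterior state marginals for a single HMM.

    pi: (K,) initial state distribution
    P:  (K, K) transition matrix, P[k, j] = P(S_t = k | S_{t-1} = j)
    B:  (T, K) observation likelihoods, B[t, k] = P(Y_t | S_t = k)
    Returns gamma of shape (T, K) with gamma[t, k] = P(S_t = k | Y_1..Y_T).
    """
    T, K = B.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    scale = np.zeros(T)

    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):                       # forward pass with rescaling
        alpha[t] = B[t] * (P @ alpha[t - 1])
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):              # backward pass, reusing the scales
        beta[t] = P.T @ (B[t + 1] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```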

3.1. The EM algorithm

The parameters of a factorial HMM can be estimated via the expectation maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977), which in the case of classical HMMs is known as the Baum-Welch algorithm (Baum, Petrie, Soules, & Weiss, 1970). This procedure iterates between a step that fixes the current parameters and computes posterior probabilities over the hidden states (the E step) and a step that uses these probabilities to maximize the expected log likelihood of the observations as a function of the parameters (the M step). Since the E step of EM is exactly the inference problem as described above, we subsume the discussion of both inference and learning problems into our description of the EM algorithm for factorial HMMs.

The EM algorithm follows from the definition of the expected log likelihood of the complete (observed and hidden) data:

    Q(\phi^{new} \mid \phi) = E\left[ \log P(\{S_t, Y_t\} \mid \phi^{new}) \,\middle|\, \phi, \{Y_t\} \right],    (5)

where Q is a function of the parameters \phi^{new}, given the current parameter estimate \phi and the observation sequence {Y_t}. For the factorial HMM the parameters of the model are \phi = \{W^{(m)}, \pi^{(m)}, P^{(m)}, C\}, where \pi^{(m)} = P(S_1^{(m)}) and P^{(m)} = P(S_t^{(m)} \mid S_{t-1}^{(m)}).

The E step consists of computing Q. By expanding (5) using equations (1)-(4b), we find that Q can be expressed as a function of three types of expectations over the hidden state variables: \langle S_t^{(m)} \rangle, \langle S_t^{(m)} S_t^{(n)'} \rangle, and \langle S_{t-1}^{(m)} S_t^{(m)'} \rangle, where \langle \cdot \rangle has been used to abbreviate E\{\cdot \mid \phi, \{Y_t\}\}. In the HMM notation of Rabiner and Juang (1986), \langle S_t \rangle corresponds to \gamma_t, the vector of state occupation probabilities, \langle S_{t-1} S_t' \rangle corresponds to \xi_t, the K x K matrix of state occupation probabilities at two consecutive time steps, and \langle S_t^{(m)} S_t^{(n)'} \rangle has no analogue when there is only a single underlying Markov model.

The M step uses these expectations to maximize Q as a function of \phi^{new}. Using Jensen's inequality, Baum, Petrie, Soules & Weiss (1970) showed that each iteration of the E and M steps increases the likelihood, P(\{Y_t\} \mid \phi), until convergence to a (local) optimum. As in hidden Markov models, the exact M step for factorial HMMs is simple and tractable. In particular, the M step for the parameters of the output model described in equations (4a)-(4b) can be found by solving a weighted linear regression problem. Similarly, the M steps for the priors, \pi^{(m)}, and state transition matrices, P^{(m)}, are identical to the ones used in the Baum-Welch algorithm. The details of the M step are given in Appendix A. We now turn to the substantially more difficult problem of computing the expectations required for the E step.
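To illustrate why this M step stays tractable, the sketch below solves the weighted linear regression for the output parameters from the three expectations named above, using a concatenated state vector. It is a plausible reading of the standard least-squares solution under the stated assumptions; the paper's exact expressions are in Appendix A, which is not included in this excerpt, so treat the precise normalization as an assumption.

```python
import numpy as np

def m_step_output(Y, ES, ESS):
    """Regression-flavored M step for the output weights and covariance.

    Y:   (T, D) observations
    ES:  (T, M*K) concatenated expectations <S_t^(m)> for all M chains
    ESS: (T, M*K, M*K) expectations <S_t S_t'> over the concatenated state
    Returns W_all of shape (D, M*K) (the M weight matrices side by side)
    and C of shape (D, D); a sketch, not a verbatim reproduction of Appendix A.
    """
    T, D = Y.shape
    A = ESS.sum(axis=0)                       # sum_t <S_t S_t'>
    Bmat = Y.T @ ES                           # sum_t Y_t <S_t>'
    W_all = Bmat @ np.linalg.pinv(A)          # weighted linear regression solution
    C = (Y.T @ Y - W_all @ Bmat.T) / T        # residual covariance of the fit
    return W_all, C
```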

3.2. Exact inference

Unfortunately, the exact E step for factorial HMMs is computationally intractable. This fact can best be shown by making reference to standard algorithms for probabilistic inference in graphical models (Lauritzen & Spiegelhalter, 1988), although it can also be derived readily from direct application of Bayes rule. Consider the computations that are required for calculating posterior probabilities for the factorial HMM shown in Figure 1(b) within the framework of the Lauritzen and Spiegelhalter algorithm. Moralizing and triangulating the graphical structure for the factorial HMM results in a junction tree (in fact a chain) with T(M + 1) - M cliques of size M + 1. The resulting probability propagation algorithm has time complexity O(T M K^{M+1}) for a single observation sequence of length T. We present a forward-backward type recursion that implements the exact E step in Appendix B. The naive exact algorithm, which consists of translating the factorial HMM into an equivalent HMM with K^M states and using the forward-backward algorithm, has time complexity O(T K^{2M}). Like other models with multiple densely-connected hidden variables, this exponential time complexity makes exact learning and inference intractable.

Thus, although the Markov property can be used to obtain forward-backward-like factorizations of the expectations across time steps, the sum over all possible configurations of the other hidden state variables within each time step is unavoidable. This intractability is due inherently to the cooperative nature of the model: for the Gaussian output model, for example, the settings of all the state variables at one time step cooperate in determining the mean of the observation vector.

3.3. Inference using Gibbs sampling

Rather than computing the exact posterior probabilities, one can approximate them using a Monte Carlo sampling procedure, and thereby avoid the sum over exponentially many state patterns at some cost in accuracy. Although there are many possible sampling schemes (for a review see Neal, 1993), here we present one of the simplest: Gibbs sampling (Geman & Geman, 1984). For a given observation sequence {Y_t}, this procedure starts with a random setting of the hidden states {S_t}. At each step of the sampling process, each state vector is updated stochastically according to its probability distribution conditioned on the setting of all the other state vectors. The graphical model is again useful here, as each node is conditionally independent of all other nodes given its Markov blanket, defined as the set of children, parents, and parents of the children of a node. To sample from a typical state variable S_t^{(m)} we only need to examine the states of a few neighboring nodes: S_t^{(m)} is sampled from

    P(S_t^{(m)} \mid \{S_t^{(n)} : n \neq m\}, S_{t-1}^{(m)}, S_{t+1}^{(m)}, Y_t)
        \propto P(S_t^{(m)} \mid S_{t-1}^{(m)}) \, P(S_{t+1}^{(m)} \mid S_t^{(m)}) \, P(Y_t \mid S_t^{(1)}, \ldots, S_t^{(m)}, \ldots, S_t^{(M)}).

Sampling once from each of the T M hidden variables in the model results in a new sample of the hidden state of the model and requires O(T M K) operations. The sequence of overall states resulting from each pass of Gibbs sampling defines a Markov chain over the state space of the model. Assuming that all probabilities are bounded away from zero, this Markov chain is guaranteed to converge to the posterior probabilities of the states given the observations (Geman & Geman, 1984). Thus, after some suitable time, samples from the Markov chain can be taken as approximate samples from the posterior probabilities. The first and second-order statistics needed to estimate \langle S_t^{(m)} \rangle, \langle S_t^{(m)} S_t^{(n)'} \rangle and \langle S_{t-1}^{(m)} S_t^{(m)'} \rangle are collected using the states visited and the probabilities estimated during this sampling process, and are used in the approximate E step of EM.
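A single Gibbs sweep over the hidden variables might look as follows. This is a sketch under the conventions of the earlier sampling code (same parameter layout, Gaussian output model); the function and variable names are assumptions for illustration, and the conditional combines exactly the three factors given above.

```python
import numpy as np

def gibbs_sweep(states, Y, priors, transitions, W, Cinv, rng):
    """One pass of Gibbs sampling over all T*M hidden state variables.

    states: (T, M) current state indices, updated in place.
    Cinv:   (D, D) inverse of the output covariance C.
    The conditional for S_t^(m) combines its incoming and outgoing transition
    terms with the Gaussian likelihood of Y_t, all other chains held fixed.
    """
    T, M = states.shape
    K = W[0].shape[1]
    for t in range(T):
        for m in range(M):
            # contribution of the other chains to the observation mean
            mu_rest = sum(W[l][:, states[t, l]] for l in range(M) if l != m)
            logp = np.zeros(K)
            for k in range(K):
                logp[k] = np.log(priors[m][k]) if t == 0 else \
                          np.log(transitions[m][k, states[t - 1, m]])
                if t + 1 < T:
                    logp[k] += np.log(transitions[m][states[t + 1, m], k])
                diff = Y[t] - (mu_rest + W[m][:, k])
                logp[k] += -0.5 * diff @ Cinv @ diff
            p = np.exp(logp - logp.max())
            states[t, m] = rng.choice(K, p=p / p.sum())
    return states
```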

3.4. Completely factorized variational inference

There also exists a second approximation of the posterior probability of the hidden states that is both tractable and deterministic. The basic idea is to approximate the posterior distribution over the hidden variables P(\{S_t\} \mid \{Y_t\}) by a tractable distribution Q(\{S_t\}). This approximation provides a lower bound on the log likelihood that can be used to obtain an efficient learning algorithm. The argument can be formalized following the reasoning of Saul, Jaakkola, and Jordan (1996). Any distribution over the hidden variables Q(\{S_t\}) can be used to define a lower bound on the log likelihood:

    \log P(\{Y_t\}) = \log \sum_{\{S_t\}} P(\{S_t, Y_t\})
                   = \log \sum_{\{S_t\}} Q(\{S_t\}) \frac{P(\{S_t, Y_t\})}{Q(\{S_t\})}
                   \geq \sum_{\{S_t\}} Q(\{S_t\}) \log \frac{P(\{S_t, Y_t\})}{Q(\{S_t\})},

where we have made use of Jensen's inequality in the last step. The difference between the left-hand side and the right-hand side of this inequality is given by the Kullback-Leibler divergence (Cover & Thomas, 1991):

    KL(Q \| P) = \sum_{\{S_t\}} Q(\{S_t\}) \log \frac{Q(\{S_t\})}{P(\{S_t\} \mid \{Y_t\})}.    (6)

The complexity of exact inference in the approximation given by Q is determined by its conditional independence relations, not by its parameters. Thus, we can choose Q to have a tractable structure: a graphical representation that eliminates some of the dependencies in P. Given this structure, we are free to vary the parameters of Q so as to obtain the tightest possible bound by minimizing (6). We will refer to the general strategy of using a parameterized approximating distribution as a variational approximation and refer to the free parameters of the distribution as variational parameters.

Figure 2. (a) The completely factorized variational approximation assumes that all the state variables are independent (conditional on the observation sequence). (b) The structured variational approximation assumes that the state variables retain their Markov structure within each chain, but are independent across chains.

To illustrate, consider the simplest variational approximation, in which the state variables are assumed independent given the observations (Figure 2(a)). This distribution can be written as

    Q(\{S_t\} \mid \theta) = \prod_{t=1}^{T} \prod_{m=1}^{M} Q(S_t^{(m)} \mid \theta_t^{(m)}).    (7)

The variational parameters, \theta = \{\theta_t^{(m)}\}, are the means of the state variables, where, as before, a state variable S_t^{(m)} is represented as a K-dimensional vector with a 1 in the k-th position and 0 elsewhere, if the m-th Markov chain is in state k at time t. The elements of the vector \theta_t^{(m)} therefore define the state occupation probabilities for the multinomial variable S_t^{(m)} under the distribution Q:

    Q(S_t^{(m)} \mid \theta_t^{(m)}) = \prod_{k=1}^{K} \left( \theta_{t,k}^{(m)} \right)^{S_{t,k}^{(m)}},
    where  S_{t,k}^{(m)} \in \{0, 1\}  and  \sum_{k=1}^{K} S_{t,k}^{(m)} = 1.    (8)

This completely factorized approximation is often used in statistical physics, where it provides the basis for simple yet powerful mean field approximations to statistical mechanical systems (Parisi, 1988).

To make the bound as tight as possible we vary \theta separately for each observation sequence so as to minimize the KL divergence. Taking the derivatives of (6) with respect to \theta_t^{(m)} and setting them to zero, we obtain the set of fixed point equations (see Appendix C) defined by

    \theta_t^{(m), new} = \varphi\left\{ W^{(m)'} C^{-1} \tilde{Y}_t^{(m)} - \tfrac{1}{2} \Delta^{(m)} + (\log P^{(m)}) \theta_{t-1}^{(m)} + (\log P^{(m)})' \theta_{t+1}^{(m)} \right\},    (9a)

where \tilde{Y}_t^{(m)} is the residual error in Y_t given the predictions from all the state variables not including m:

    \tilde{Y}_t^{(m)} \equiv Y_t - \sum_{\ell \neq m} W^{(\ell)} \theta_t^{(\ell)},    (9b)

\Delta^{(m)} is the vector of diagonal elements of W^{(m)'} C^{-1} W^{(m)}, and \varphi\{\cdot\} is the softmax operator, which maps a vector A into a vector B of the same size, with elements

    B_i = \frac{\exp\{A_i\}}{\sum_j \exp\{A_j\}},    (10)

and \log P^{(m)} denotes the elementwise logarithm of the transition matrix P^{(m)}.

The first term of (9a) is the projection of the error in reconstructing the observation onto the weights of state vector m: the more a particular setting of a state vector can reduce this error, the larger its associated variational parameter. The second term arises from the fact that the second order correlation \langle S_t^{(m)} S_t^{(m)'} \rangle evaluated under the variational distribution is a diagonal matrix composed of the elements of \theta_t^{(m)}. The last two terms introduce dependencies forward and backward in time. Therefore, although the posterior distribution over the hidden variables is approximated with a completely factorized distribution, the fixed point equations couple the parameters associated with each node with the parameters of its Markov blanket. In this sense, the fixed point equations propagate information along the same pathways as those defining the exact algorithms for probability propagation.

The following may provide an intuitive interpretation of the approximation being made by this distribution. Given a particular observation sequence, the hidden state variables for the M Markov chains at time step t are stochastically coupled. This stochastic coupling is approximated by a system in which the hidden variables are uncorrelated but have coupled means. The variational or "mean-field" equations solve for the deterministic coupling of the means that best approximates the stochastically coupled system.

Each hidden state vector is updated in turn using (9a), with a time complexity of O(T M K^2) per iteration. Convergence is determined by monitoring the KL divergence in the variational distribution between successive time steps; in practice convergence is very rapid (about 2 to 10 iterations of (9a)). Once the fixed point equations have converged, the expectations required for the E step can be obtained as a simple function of the parameters (equations (C.6)-(C.8) in Appendix C).
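One sweep of the completely factorized fixed point equations (9a)-(9b) can be written compactly as below. This sketch follows the update order and the softmax of equation (10); how the boundary terms at t = 1 and t = T are handled (using the log prior, and dropping the missing neighbor) is an assumption about details that the paper defers to Appendix C.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def mean_field_sweep(theta, Y, W, Cinv, logP, logpi):
    """One update sweep of the fixed point equations (9a)-(9b).

    theta: (T, M, K) variational means, updated in place.
    W:     list of M (D, K) weight matrices; Cinv: (D, D) inverse covariance.
    logP:  list of M (K, K) elementwise-log transition matrices,
           logP[m][k, j] = log P(S_t = k | S_{t-1} = j).
    logpi: list of M (K,) log priors, used at the t = 1 boundary (an assumption).
    """
    T, M, K = theta.shape
    delta = [np.diag(W[m].T @ Cinv @ W[m]) for m in range(M)]
    for t in range(T):
        for m in range(M):
            # residual error: observation minus the other chains' predictions (9b)
            Y_res = Y[t] - sum(W[l] @ theta[t, l] for l in range(M) if l != m)
            a = W[m].T @ Cinv @ Y_res - 0.5 * delta[m]
            # backward and forward coupling terms from (9a)
            a = a + (logP[m] @ theta[t - 1, m] if t > 0 else logpi[m])
            if t + 1 < T:
                a = a + logP[m].T @ theta[t + 1, m]
            theta[t, m] = softmax(a)
    return theta
```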

3.5. Structured variational inference

The approximation presented in the previous section factors the posterior probability such that all the state variables are statistically independent. In contrast to this rather extreme factorization, there exists a third approximation that is both tractable and preserves much of the probabilistic structure of the original system. In this scheme, the factorial HMM is approximated by M uncoupled HMMs as shown in Figure 2(b). Within each HMM, efficient and exact inference is implemented via the forward-backward algorithm. The approach of exploiting such tractable substructures was first suggested in the machine learning literature by Saul and Jordan (1996).

Note that the arguments presented in the previous section did not hinge on the form of the approximating distribution. Therefore, more structured variational approximations can be obtained by using more structured variational distributions Q. Each such Q provides a lower bound on the log likelihood and can be used to obtain a learning algorithm. We write the structured variational approximation as

    Q(\{S_t\} \mid \theta) = \frac{1}{Z_Q} \prod_{m=1}^{M} Q(S_1^{(m)} \mid \theta) \prod_{t=2}^{T} Q(S_t^{(m)} \mid S_{t-1}^{(m)}, \theta),    (11a)

where Z_Q is a normalization constant ensuring that Q integrates to one and

    Q(S_1^{(m)} \mid \theta) = \prod_{k=1}^{K} \left( h_{1,k}^{(m)} \, \pi_k^{(m)} \right)^{S_{1,k}^{(m)}},    (11b)

    Q(S_t^{(m)} \mid S_{t-1}^{(m)}, \theta) = \prod_{k=1}^{K} \left( h_{t,k}^{(m)} \sum_{j=1}^{K} P_{k,j}^{(m)} S_{t-1,j}^{(m)} \right)^{S_{t,k}^{(m)}}
        = \prod_{k=1}^{K} \left( h_{t,k}^{(m)} \prod_{j=1}^{K} \left( P_{k,j}^{(m)} \right)^{S_{t-1,j}^{(m)}} \right)^{S_{t,k}^{(m)}},    (11c)

where the last equality follows from the fact that S_{t-1}^{(m)} is a vector with a 1 in one position and 0 elsewhere. The parameters of this distribution are \theta = \{\pi^{(m)}, P^{(m)}, h_t^{(m)}\}: the original priors and state transition matrices of the factorial HMM and a time-varying bias for each state variable. Comparing equations (11a)-(11c) to equation (1), we can see that the K x 1 vector h_t^{(m)} plays the role of the probability of an observation (P(Y_t \mid S_t) in (1)) for each of the K settings of S_t^{(m)}. For example, Q(S_{1,j}^{(m)} = 1 \mid \theta) = h_{1,j}^{(m)} P(S_{1,j}^{(m)} = 1) corresponds to having an observation at time t = 1 that under state S_{1,j}^{(m)} = 1 has probability h_{1,j}^{(m)}.

Intuitively, this approximation uncouples the M Markov chains and attaches to each state variable a distinct fictitious observation. The probability of this fictitious observation can be varied so as to minimize the KL divergence between Q and P. Applying the same arguments as before, we obtain a set of fixed point equations for h_t^{(m)} that minimize KL(Q \| P), as detailed in Appendix D:

    h_t^{(m), new} = \exp\left\{ W^{(m)'} C^{-1} \tilde{Y}_t^{(m)} - \tfrac{1}{2} \Delta^{(m)} \right\},    (12a)

where \Delta^{(m)} is defined as before, and where we redefine the residual error to be

    \tilde{Y}_t^{(m)} \equiv Y_t - \sum_{\ell \neq m} W^{(\ell)} \langle S_t^{(\ell)} \rangle.    (12b)

The parameter h_t^{(m)} obtained from these fixed point equations is the observation probability associated with state variable S_t^{(m)} in hidden Markov model m. Using these probabilities, the forward-backward algorithm is used to compute a new set of expectations for \langle S_t^{(m)} \rangle, which are fed back into (12a) and (12b). This algorithm is therefore used as a subroutine in the minimization of the KL divergence. Note the similarity between equations (12a)-(12b) and equations (9a)-(9b) for the completely factorized system. In the completely factorized system, since \langle S_t^{(m)} \rangle = \theta_t^{(m)}, the fixed point equations can be written explicitly in terms of the variational parameters. In the structured approximation, the dependence of \langle S_t^{(m)} \rangle on h_t^{(m)} is computed via the forward-backward algorithm. Note also that (12a) does not contain terms involving the prior, \pi^{(m)}, or transition matrix, P^{(m)}. These terms have cancelled by our choice of approximation.
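Putting the pieces together, the structured E step alternates between the bias update (12a)-(12b) and exact per-chain forward-backward passes. The sketch below assumes the forward_backward routine given earlier (same parameter conventions) and replaces the KL-based convergence check with a simple iteration cap; both are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def structured_e_step(Y, priors, transitions, W, Cinv, n_sweeps=10):
    """Structured variational E step: iterate (12a)-(12b) with forward-backward.

    Returns ES with ES[t, m, k] approximating <S_t^(m)_k> under Q.
    Uses forward_backward(pi, P, B) sketched earlier, where B[t, k] is the
    fictitious observation probability h_t^(m)_k for chain m.
    """
    T, D = Y.shape
    M, K = len(W), W[0].shape[1]
    ES = np.full((T, M, K), 1.0 / K)                       # uniform initial expectations
    delta = [np.diag(W[m].T @ Cinv @ W[m]) for m in range(M)]
    for _ in range(n_sweeps):
        for m in range(M):
            # residual errors and fictitious observation probabilities, (12a)-(12b)
            Y_res = Y - sum(ES[:, l, :] @ W[l].T for l in range(M) if l != m)
            h = np.exp(Y_res @ Cinv @ W[m] - 0.5 * delta[m])   # shape (T, K)
            # exact inference in chain m given its fictitious observations
            ES[:, m, :] = forward_backward(priors[m], transitions[m], h)
    return ES
```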

3.6. Choice of approximation

The theory of the EM algorithm as presented in Dempster et al. (1977) assumes the use of an exact E step. For models in which the exact E step is intractable, one must instead use an approximation like those we have just described. The choice among these approximations must take into account several theoretical and practical issues.

Monte Carlo approximations based on Markov chains, such as Gibbs sampling, offer the theoretical assurance that the sampling procedure will converge to the correct posterior distribution in the limit. Although this means that one can come arbitrarily close to the exact E step, in practice convergence can be slow (especially for multimodal distributions) and it is often very difficult to determine how close one is to convergence. However, when sampling is used for the E step of EM, there is a time tradeoff between the number of samples used and the number of EM iterations. It seems wasteful to wait until convergence early on in learning, when the posterior distribution from which samples are drawn is far from the posterior given the optimal parameters. In practice we have found that even approximate E steps using very few Gibbs samples (e.g. around ten samples of each hidden variable) tend to increase the true likelihood.

Variational approximations offer the theoretical assurance that a lower bound on the likelihood is being maximized. Both the minimization of the KL divergence in the E step and the parameter update in the M step are guaranteed not to decrease this lower bound, and therefore convergence can be defined in terms of the bound. An alternative view given by Neal and Hinton (1993) describes EM in terms of the negative free energy, F, which is a function of the parameters, \phi, the observations, Y, and a posterior probability distribution over the hidden variables, Q(S):

    F(Q, \phi) = E_Q\{\log P(Y, S \mid \phi)\} - E_Q\{\log Q(S)\},

where E_Q denotes expectation over S using the distribution Q(S). The exact E step in EM maximizes F with respect to Q given \phi. The variational E steps used here maximize F with respect to Q given \phi, subject to the constraint that Q is of a particular tractable form. Given this view, it seems clear that the structured approximation is preferable to the completely factorized approximation since it places fewer constraints on Q, at no cost in tractability.

4. Experimental results

To investigate learning and inference in factorial HMMs we conducted two experiments. The first experiment compared the different approximate and exact methods of inference on the basis of computation time and the likelihood of the model obtained from synthetic data. The second experiment sought to determine whether the decomposition of the state space in factorial HMMs presents any advantage in modeling a real time series data set that we might assume to have complex internal structure: Bach's chorale melodies.

4.1. Experiment 1: Performance and timing benchmarks

Using data generated from a factorial HMM structure with M underlying Markov models with K states each, we compared the time per EM iteration and the training and test set likelihoods of five models: an HMM trained using the Baum-Welch algorithm; a factorial HMM trained with exact inference for the E step, using a straightforward application of the forward-backward algorithm rather than the more efficient algorithm outlined in Appendix B; a factorial HMM trained using Gibbs sampling for the E step with the number of samples fixed at 10 samples per variable; a factorial HMM trained using the completely factorized variational approximation; and a factorial HMM trained using the structured variational approximation. All factorial HMMs consisted of M underlying Markov models with K states each, whereas the HMM had K^M states.

The data were generated from a factorial HMM structure with M state variables, each of which could take on K discrete values. All of the parameters of this model, except for the output covariance matrix, were sampled from a uniform [0, 1] distribution and appropriately normalized to satisfy the sum-to-one constraints of the transition matrices and priors. The covariance matrix was set to a multiple of the identity matrix, C = 0.0025 I. The training and test sets consisted of 20 sequences of length 20, where the observable was a four-dimensional vector. For each randomly sampled set of parameters, a separate training set and test set were generated and each algorithm was run once.

Fifteen sets of parameters were generated for each of the four problem sizes. Algorithms were run for a maximum of 100 iterations of EM or until convergence, defined as the iteration k at which the log likelihood L(k), or approximate log likelihood if an approximate algorithm was used, satisfied L(k) - L(k-1) < 10^{-5} (L(k-1) - L(2)). At the end of learning, the log likelihoods on the training and test set were computed for all models using the exact algorithm. Also included in the comparison was the log likelihood of the training and test sets under the true model that generated the data. The test set log likelihood for N observation sequences is defined as \sum_{n=1}^{N} \log P(Y_1^{(n)}, \ldots, Y_T^{(n)} \mid \hat{\phi}), where \hat{\phi} is obtained by maximizing the log likelihood over a training set that is disjoint from the test set. This provides a measure of how well the model generalizes to a novel observation sequence from the same distribution as the training data.

Results averaged over 15 runs for each algorithm on each of the four problem sizes (a total of 300 runs) are presented in Table 1. Even for the smallest problem size (M = 3 and K = 2), the standard HMM with K^M states suffers from overfitting: the test set log likelihood is significantly worse than the training set log likelihood. As expected, this overfitting problem becomes worse as the size of the state space increases; it is particularly serious for M = 5 and K = 3.

For the factorial HMMs, the log likelihoods for each of the three approximate EM algorithms were compared to the exact algorithm. Gibbs sampling appeared to have the poorest performance: for each of the three smaller size problems its log likelihood was significantly worse than that of the exact algorithm on both the training sets and test sets (p < 0.05). This may be due to insufficient sampling. However, we will soon see that running the Gibbs sampler for more than 10 samples, while potentially improving performance, makes it substantially slower than the variational methods. Surprisingly, the Gibbs sampler appears to do quite well on the largest size problem, although the differences to the other methods are not statistically significant.

The performance of the completely factorized variational approximation was not statistically significantly different from that of the exact algorithm on either the training set or the test set for any of the problem sizes. The performance of the structured variational approximation was not statistically different from that of the exact method on three of the four problem sizes, and appeared to be better on one of the problem sizes (M = 5, K = 2; p < 0.05). Although this result may be a fluke arising from random variability, there is another more interesting (and speculative) explanation. The exact EM algorithm implements unconstrained maximization of F, as defined in Section 3.6, while the variational methods maximize F subject to a constrained distribution. These constraints could presumably act as regularizers, reducing overfitting.

There was a large amount of variability in the final log likelihoods for the models learned by all the algorithms. We subtracted the log likelihood of the true generative model from that of each trained model to eliminate the main effect of the randomly sampled generative model and to reduce the variability due to training and test sets. One important remaining source of variance was the random seed used in each training run, which determined the initial parameters and the samples used in the Gibbs algorithm.

Table 1. Comparison of the factorial HMM on four problems of varying size. The negative log likelihood for the training and test set, plus or minus one standard deviation, is shown for each problem size and algorithm, measured in bits per observation (log likelihood in bits divided by N T) relative to the log likelihood under the true generative model for that data set. True is the true generative model (the log likelihood per symbol is defined to be zero for this model by our measure); HMM is the hidden Markov model with K^M states; Exact is the factorial HMM trained using an exact E step; Gibbs is the factorial HMM trained using Gibbs sampling; CFVA is the factorial HMM trained using the completely factorized variational approximation; SVA is the factorial HMM trained using the structured variational approximation.

[Table 1 layout: columns M, K, Algorithm, Training Set, Test Set; for each of the four problem sizes (M, K) = (3, 2), (3, 3), (5, 2), (5, 3), one row each for True, HMM, Exact, Gibbs, CFVA, and SVA.]

Figure 3. Learning curves for five runs of each of the four learning algorithms for factorial HMMs: (a) exact; (b) completely factorized variational approximation; (c) structured variational approximation; and (d) Gibbs sampling. A single training set sampled from the M = 3, K = 2 problem size was used for all these runs. The solid lines show the negative log likelihood per observation (in bits) relative to the true model that generated the data, calculated using the exact algorithm. The circles denote the point at which the convergence criterion was met and the run ended. For the three approximate algorithms, the dashed lines show an approximate negative log likelihood.

All algorithms appeared to be very sensitive to this random seed, suggesting that different runs on each training set found different local maxima or plateaus of the likelihood (Figure 3). Some of this variability could be eliminated by explicitly adding a regularization term, which can be viewed as a prior on the parameters in maximum a posteriori parameter estimation. Alternatively, Bayesian (or ensemble) methods could be used to average out this variability by integrating over the parameter space.

The timing comparisons confirm the fact that both the standard HMM and the exact E step factorial HMM are extremely slow for models with large state spaces (Figure 4).

Figure 4. Time per iteration of EM on a Silicon Graphics R4400 processor running Matlab, for the HMM, the exact factorial HMM, and the Gibbs sampling, completely factorized variational, and structured variational factorial HMMs, as a function of the size of the state space (K^M).

Gibbs sampling was slower than the variational methods even when limited to ten samples of each hidden variable per iteration of EM. Since one pass of the variational fixed point equations has the same time complexity as one pass of Gibbs sampling, and since the variational fixed point equations were found to converge very quickly, these experiments suggest that Gibbs sampling is not as competitive time-wise as the variational methods. The time per iteration for the variational methods scaled well to large state spaces.

4.2. Experiment 2: Bach chorales

Musical pieces naturally exhibit complex structure at many different time scales. Furthermore, one can imagine that to represent the "state" of the musical piece at any given time it would be necessary to specify a conjunction of many different features. For these reasons, we chose to test whether a factorial HMM would provide an advantage over a regular HMM in modeling a collection of musical pieces.

The data set consisted of discrete event sequences encoding the melody lines of J. S. Bach's chorales, obtained from the UCI Repository for Machine Learning Databases (Merz & Murphy, 1996) and originally discussed in Conklin and Witten (1995). Each event in the sequence was represented by six attributes, described in Table 2. Sixty-six chorales, with 40 or more events each, were divided into a training set (30 chorales) and a test set (36 chorales). Using the first set, hidden Markov models with state space ranging from 2 to 100 states were trained until convergence. Factorial HMMs of varying sizes (K ranging from 2 to 6; M ranging from 2 to 9) were also trained on the same data.

Table 2. Attributes in the Bach chorale data set. The key signature and time signature attributes were constant over the duration of the chorale. All attributes were treated as real numbers and modeled as linear-Gaussian observations (4a).

    Attribute   Description                  Representation
    pitch       pitch of the event           int [0, 127]
    keysig      key signature                int [-7, 7]
    timesig     time signature               int (1/16 notes)
    fermata     event under fermata?         binary
    st          start time of event          int (1/16 notes)
    dur         duration of event            int (1/16 notes)

To approximate the E step for factorial HMMs we used the structured variational approximation. This choice was motivated by three considerations. First, for the size of state space we wished to explore, the exact algorithms were prohibitively slow. Second, the Gibbs sampling algorithm did not appear to present any advantages in speed or performance and required some heuristic method for determining the number of samples. Third, theoretical arguments suggest that the structured approximation should in general be superior to the completely factorized variational approximation, since more of the dependencies of the original model are preserved.

The test set log likelihoods for the HMMs, shown in Figure 5(a), exhibited the typical U-shaped curve demonstrating a trade-off between bias and variance (Geman, Bienenstock, & Doursat, 1992). HMMs with fewer than 10 states did not predict well, while HMMs with more than 40 states overfit the training data and therefore provided a poor model of the test data. Out of these runs, the highest test set log likelihood per observation was obtained by an HMM with 30 hidden states.

The factorial HMM provides a more satisfactory model of the chorales from three points of view. First, the time complexity is such that it is possible to consider models with significantly larger state spaces; in particular, we fit models with up to 1000 states. Second, given the componential parametrization of the factorial HMM, these large state spaces do not require excessively large numbers of parameters relative to the number of data points. In particular, we saw no evidence of overfitting even for the largest factorial HMM, as seen in Figures 5(c) & (d). Finally, this approach resulted in significantly better predictors; the test set likelihood for the best factorial HMM was an order of magnitude larger than the test set likelihood for the best HMM, as Figure 5(d) reveals.

While the factorial HMM is clearly a better predictor than a single HMM, it should be acknowledged that neither approach produces models that are easily interpretable from a musicological point of view. The situation is reminiscent of that in speech recognition, where HMMs have proved their value as predictive models of the speech signal without necessarily being viewed as causal generative models of speech. A factorial HMM is clearly an impoverished representation of musical structure, but its promising performance as a predictor provides hope that it could serve as a step on the way toward increasingly structured statistical models for music and other complex multivariate time series.

Figure 5. Test set log likelihood per event of the Bach chorale data set as a function of number of states for (a) HMMs, and factorial HMMs with (b) K = 2, (c) K = 3, and (d) K = 4 (heavy dashed line) and K = 5 (solid line). Each symbol represents a single run; the lines indicate the mean performances. The thin dashed line in (b)-(d) indicates the log likelihood per observation of the best run in (a). The factorial HMMs were trained using the structured approximation. For both methods the true likelihood was computed using the exact algorithm.

5. Generalizations of the model

In this section, we describe four variations and generalizations of the factorial HMM.

5.1. Discrete observables

The probabilistic model presented in this paper has assumed real-valued Gaussian observations. One of the advantages arising from this assumption is that the conditional density of a D-dimensional observation, P(Y_t \mid S_t^{(1)}, \ldots, S_t^{(M)}), can be compactly specified through M mean matrices of dimension D x K, and one D x D covariance matrix. Furthermore, the M step for such a model reduces to a set of weighted least squares equations.

The model can be generalized to handle discrete observations in several ways. Consider a single D-valued discrete observation Y_t. In analogy to traditional HMMs, the output probabilities could be modeled using a matrix. However, in the case of a factorial HMM, this matrix would have D x K^M entries for each combination of the state variables and observation. Thus the compactness of the representation would be entirely lost. Standard methods from graphical models suggest approximating such large matrices with "noisy-or" (Pearl, 1988) or "sigmoid" (Neal, 1992) models of interaction. For example, in the softmax model, which generalizes the sigmoid model to D > 2, P(Y_t \mid S_t^{(1)}, \ldots, S_t^{(M)}) is multinomial with mean proportional to \exp\{\sum_m W^{(m)} S_t^{(m)}\}. Like the Gaussian model, this specification is again compact, using M matrices of size D x K. (As in the linear-Gaussian model, the W^{(m)} are overparametrized since they can each model the overall mean of Y_t, as shown in Appendix A.) While the nonlinearity induced by the softmax function makes both the E step and M step of the algorithm more difficult, iterative numerical methods can be used in the M step, whereas Gibbs sampling and variational methods can again be used in the E step (see Neal, 1992; Hinton et al., 1995; and Saul et al., 1996, for discussions of different approaches to learning in sigmoid networks).
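As a concrete reading of the softmax output model just described, the following sketch computes the multinomial output distribution over D symbols for one time step given a setting of the state variables. The function and argument names are assumptions for illustration; only the functional form (mean proportional to exp of the summed weight columns) comes from the text above.

```python
import numpy as np

def softmax_observation_probs(W, state_indices):
    """Multinomial output distribution over D symbols for one time step.

    W: list of M arrays of shape (D, K), one weight matrix per chain.
    state_indices: length-M sequence giving the setting of each S_t^(m).
    Returns a length-D probability vector proportional to
    exp{ sum_m W^(m) S_t^(m) }, as in the softmax model above.
    """
    a = sum(W[m][:, state_indices[m]] for m in range(len(W)))
    e = np.exp(a - a.max())        # subtract max for numerical stability
    return e / e.sum()
```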

5.2. Introducing couplings

The architecture for factorial HMMs presented in Section 2 assumes that the underlying Markov chains interact only through the observations. This constraint can be relaxed by introducing couplings between the hidden state variables (cf. Saul & Jordan, 1997). For example, if S_t^{(m)} depends on S_{t-1}^{(m)} and S_{t-1}^{(m-1)}, equation (3) is replaced by the following factorization:

    P(S_t \mid S_{t-1}) = P(S_t^{(1)} \mid S_{t-1}^{(1)}) \prod_{m=2}^{M} P(S_t^{(m)} \mid S_{t-1}^{(m)}, S_{t-1}^{(m-1)}).    (13)

Similar exact, variational, and Gibbs sampling procedures can be defined for this architecture. However, note that these couplings must be introduced with caution, as they may result in an exponential growth in parameters. For example, the above factorization requires transition matrices of size K^2 x K.

Rather than specifying these higher-order couplings through probability transition matrices, one can introduce second-order interaction terms in the energy (log probability) function. Such terms effectively couple the chains without the number of parameters incurred by a full probability transition matrix. In the graphical model formalism these correspond to symmetric undirected links, making the model a chain graph. While the Jensen, Lauritzen and Olesen (1990) algorithm can still be used to propagate information exactly in chain graphs, such undirected links cause the normalization constant of the probability distribution, the partition function, to depend on the coupling parameters. As in Boltzmann machines (Hinton & Sejnowski, 1986), both a clamped and an unclamped phase are therefore required for learning, where the goal of the unclamped phase is to compute the derivative of the partition function with respect to the parameters (Neal, 1992).

5.3. Conditioning on inputs

Like the hidden Markov model, the factorial HMM provides a model of the unconditional density of the observation sequences. In certain problem domains, some of the observations can be better thought of as inputs or explanatory variables, and the others as outputs or response variables. The goal, in these cases, is to model the conditional density of the output sequence given the input sequence. In machine learning terminology, unconditional density estimation is unsupervised while conditional density estimation is supervised. Several algorithms for learning in hidden Markov models that are conditioned on inputs have been recently presented in the literature (Cacciatore & Nowlan, 1994; Bengio & Frasconi, 1995; Meila & Jordan, 1996). Given a sequence of input vectors {X_t}, the probabilistic model for an input-conditioned factorial HMM is

    P(\{S_t, Y_t\} \mid \{X_t\}) = \left[ \prod_{m=1}^{M} P(S_1^{(m)} \mid X_1) \right] P(Y_1 \mid S_1, X_1)
        \prod_{t=2}^{T} \left[ \prod_{m=1}^{M} P(S_t^{(m)} \mid S_{t-1}^{(m)}, X_t) \right] P(Y_t \mid S_t, X_t).    (14)

The model depends on the specification of P(Y_t \mid S_t, X_t) and P(S_t^{(m)} \mid S_{t-1}^{(m)}, X_t), which are conditioned both on a discrete state variable and on a (possibly continuous) input vector. The approach used in Bengio and Frasconi's Input Output HMMs (IOHMMs) suggests modeling P(S_t^{(m)} \mid S_{t-1}^{(m)}, X_t) as K separate neural networks, one for each setting of S_{t-1}^{(m)}. This decomposition ensures that a valid probability transition matrix is defined at each point in input space if a sum-to-one constraint (e.g., a softmax nonlinearity) is used in the output of these networks.

Using the decomposition of each conditional probability into K networks, inference in input-conditioned factorial HMMs is a straightforward generalization of the algorithms we have presented for factorial HMMs. The exact forward-backward algorithm in Appendix B can be adapted by using the appropriate conditional probabilities. Similarly, the Gibbs sampling procedure is no more complex when conditioned on inputs. Finally, the completely factorized and structured approximations can also be generalized readily if the approximating distribution has a dependence on the input similar to the model's. If the probability transition structure P(S_t^{(m)} \mid S_{t-1}^{(m)}, X_t) is not decomposed as above, but has a complex dependence on the previous state variable and input, inference may become considerably more complex.

Depending on the form of the input conditioning, the Maximization step of learning may also change considerably. In general, if the output and transition probabilities are modeled as neural networks, the M step can no longer be solved exactly and a gradient-based generalized EM algorithm must be used. For log-linear models, the M step can be solved using an inner loop of iteratively reweighted least-squares (McCullagh & Nelder, 1989).

5.4. Hidden Markov decision trees

An interesting generalization of factorial HMMs results if one conditions on an input X_t and orders the M state variables such that S_t^{(m)} depends on S_t^{(n)} for n < m (Figure 6). The resulting architecture can be seen as a probabilistic decision tree with Markovian dynamics linking the decision variables.

Figure 6. The hidden Markov decision tree.

Consider how this probabilistic model would generate data at the first time step, t = 1. Given input X_1, the top node S_1^{(1)} can take on K values. This stochastically partitions X space into K decision regions. The next node down the hierarchy, S_1^{(2)}, subdivides each of these regions into K subregions, and so on. The output Y_1 is generated from the input X_1 and the K-way decisions at each of the M hidden nodes. At the next time step, a similar procedure is used to generate data from the model, except that now each decision in the tree is dependent on the decision taken at that node in the previous time step. Thus, the "hierarchical mixture of experts" architecture (Jordan & Jacobs, 1994) is generalized to include Markovian dynamics for the decisions. Hidden Markov decision trees provide a useful starting point for modeling time series with both temporal and spatial structure at multiple resolutions. We explore this generalization of factorial HMMs in Jordan, Ghahramani, and Saul (1997).

6. Conclusion

In this paper we have examined the problem of learning for a class of generalized hidden Markov models with distributed state representations. This generalization provides both a richer modeling tool and a method for incorporating prior structural information about the state variables underlying the dynamics of the system generating the data. Although exact inference in this class of models is generally intractable, we provided a structured variational approximation that can be computed tractably. This approximation forms the basis of the Expectation step in an EM algorithm for learning the parameters of the model. Empirical comparisons to several other approximations and to the exact algorithm show that this approximation is both efficient to compute and accurate. Finally, we have shown that factorial HMMs can capture statistical structure in a real time series data set, the melody lines of Bach's chorales, which an unconstrained HMM cannot.


More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

Multi-scale 2D acoustic full waveform inversion with high frequency impulsive source

Multi-scale 2D acoustic full waveform inversion with high frequency impulsive source Muli-scale D acousic full waveform inversion wih high frequency impulsive source Vladimir N Zubov*, Universiy of Calgary, Calgary AB vzubov@ucalgaryca and Michael P Lamoureux, Universiy of Calgary, Calgary

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

hen found from Bayes rule. Specically, he prior disribuion is given by p( ) = N( ; ^ ; r ) (.3) where r is he prior variance (we add on he random drif

hen found from Bayes rule. Specically, he prior disribuion is given by p( ) = N( ; ^ ; r ) (.3) where r is he prior variance (we add on he random drif Chaper Kalman Filers. Inroducion We describe Bayesian Learning for sequenial esimaion of parameers (eg. means, AR coeciens). The updae procedures are known as Kalman Filers. We show how Dynamic Linear

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Probabilisic reasoning over ime So far, we ve mosly deal wih episodic environmens Excepions: games wih muliple moves, planning In paricular, he Bayesian neworks we ve seen so far describe

More information

An EM based training algorithm for recurrent neural networks

An EM based training algorithm for recurrent neural networks An EM based raining algorihm for recurren neural neworks Jan Unkelbach, Sun Yi, and Jürgen Schmidhuber IDSIA,Galleria 2, 6928 Manno, Swizerland {jan.unkelbach,yi,juergen}@idsia.ch hp://www.idsia.ch Absrac.

More information

Testing for a Single Factor Model in the Multivariate State Space Framework

Testing for a Single Factor Model in the Multivariate State Space Framework esing for a Single Facor Model in he Mulivariae Sae Space Framework Chen C.-Y. M. Chiba and M. Kobayashi Inernaional Graduae School of Social Sciences Yokohama Naional Universiy Japan Faculy of Economics

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

Variational Learning for Switching State-Space Models

Variational Learning for Switching State-Space Models LEER Communicaed by Volker resp Variaional Learning for Swiching Sae-Space odels Zoubin Ghahramani Geoffrey E. Hinon Gasby Compuaional euroscience Uni, Universiy College London, London WC 3R, U.K. We inroduce

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

Západočeská Univerzita v Plzni, Czech Republic and Groupe ESIEE Paris, France

Západočeská Univerzita v Plzni, Czech Republic and Groupe ESIEE Paris, France ADAPTIVE SIGNAL PROCESSING USING MAXIMUM ENTROPY ON THE MEAN METHOD AND MONTE CARLO ANALYSIS Pavla Holejšovsá, Ing. *), Z. Peroua, Ing. **), J.-F. Bercher, Prof. Assis. ***) Západočesá Univerzia v Plzni,

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Air Traffic Forecast Empirical Research Based on the MCMC Method

Air Traffic Forecast Empirical Research Based on the MCMC Method Compuer and Informaion Science; Vol. 5, No. 5; 0 ISSN 93-8989 E-ISSN 93-8997 Published by Canadian Cener of Science and Educaion Air Traffic Forecas Empirical Research Based on he MCMC Mehod Jian-bo Wang,

More information

Time series model fitting via Kalman smoothing and EM estimation in TimeModels.jl

Time series model fitting via Kalman smoothing and EM estimation in TimeModels.jl Time series model fiing via Kalman smoohing and EM esimaion in TimeModels.jl Gord Sephen Las updaed: January 206 Conens Inroducion 2. Moivaion and Acknowledgemens....................... 2.2 Noaion......................................

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin

ACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model

More information

A variational radial basis function approximation for diffusion processes.

A variational radial basis function approximation for diffusion processes. A variaional radial basis funcion approximaion for diffusion processes. Michail D. Vreas, Dan Cornford and Yuan Shen {vreasm, d.cornford, y.shen}@ason.ac.uk Ason Universiy, Birmingham, UK hp://www.ncrg.ason.ac.uk

More information

References are appeared in the last slide. Last update: (1393/08/19)

References are appeared in the last slide. Last update: (1393/08/19) SYSEM IDEIFICAIO Ali Karimpour Associae Professor Ferdowsi Universi of Mashhad References are appeared in he las slide. Las updae: 0..204 393/08/9 Lecure 5 lecure 5 Parameer Esimaion Mehods opics o be

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H.

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H. ACE 56 Fall 5 Lecure 8: The Simple Linear Regression Model: R, Reporing he Resuls and Predicion by Professor Sco H. Irwin Required Readings: Griffihs, Hill and Judge. "Explaining Variaion in he Dependen

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

Temporal probability models

Temporal probability models Temporal probabiliy models CS194-10 Fall 2011 Lecure 25 CS194-10 Fall 2011 Lecure 25 1 Ouline Hidden variables Inerence: ilering, predicion, smoohing Hidden Markov models Kalman ilers (a brie menion) Dynamic

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

2.3 SCHRÖDINGER AND HEISENBERG REPRESENTATIONS

2.3 SCHRÖDINGER AND HEISENBERG REPRESENTATIONS Andrei Tokmakoff, MIT Deparmen of Chemisry, 2/22/2007 2-17 2.3 SCHRÖDINGER AND HEISENBERG REPRESENTATIONS The mahemaical formulaion of he dynamics of a quanum sysem is no unique. So far we have described

More information

Isolated-word speech recognition using hidden Markov models

Isolated-word speech recognition using hidden Markov models Isolaed-word speech recogniion using hidden Markov models Håkon Sandsmark December 18, 21 1 Inroducion Speech recogniion is a challenging problem on which much work has been done he las decades. Some of

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks -

Deep Learning: Theory, Techniques & Applications - Recurrent Neural Networks - Deep Learning: Theory, Techniques & Applicaions - Recurren Neural Neworks - Prof. Maeo Maeucci maeo.maeucci@polimi.i Deparmen of Elecronics, Informaion and Bioengineering Arificial Inelligence and Roboics

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Spike-count autocorrelations in time. Supplemenary Figure 1 Spike-coun auocorrelaions in ime. Normalized auocorrelaion marices are shown for each area in a daase. The marix shows he mean correlaion of he spike coun in each ime bin wih he spike

More information

A Specification Test for Linear Dynamic Stochastic General Equilibrium Models

A Specification Test for Linear Dynamic Stochastic General Equilibrium Models Journal of Saisical and Economeric Mehods, vol.1, no.2, 2012, 65-70 ISSN: 2241-0384 (prin), 2241-0376 (online) Scienpress Ld, 2012 A Specificaion Tes for Linear Dynamic Sochasic General Equilibrium Models

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

Y. Xiang, Learning Bayesian Networks 1

Y. Xiang, Learning Bayesian Networks 1 Learning Bayesian Neworks Objecives Acquisiion of BNs Technical conex of BN learning Crierion of sound srucure learning BN srucure learning in 2 seps BN CPT esimaion Reference R.E. Neapolian: Learning

More information

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006 2.160 Sysem Idenificaion, Esimaion, and Learning Lecure Noes No. 8 March 6, 2006 4.9 Eended Kalman Filer In many pracical problems, he process dynamics are nonlinear. w Process Dynamics v y u Model (Linearized)

More information

A Variational Learning Algorithm for the Abstract Hidden Markov Model

A Variational Learning Algorithm for the Abstract Hidden Markov Model A Variaional Learning Algorihm for he Absrac Hidden Markov Model Jeff Johns and Sridhar Mahadevan Compuer Science Deparmen Universiy of Massachuses Amhers, MA 01003 {johns, mahadeva}@cs.umass.edu Absrac

More information

Temporal probability models. Chapter 15, Sections 1 5 1

Temporal probability models. Chapter 15, Sections 1 5 1 Temporal probabiliy models Chaper 15, Secions 1 5 Chaper 15, Secions 1 5 1 Ouline Time and uncerainy Inerence: ilering, predicion, smoohing Hidden Markov models Kalman ilers (a brie menion) Dynamic Bayesian

More information

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model

Retrieval Models. Boolean and Vector Space Retrieval Models. Common Preprocessing Steps. Boolean Model. Boolean Retrieval Model 1 Boolean and Vecor Space Rerieval Models Many slides in his secion are adaped from Prof. Joydeep Ghosh (UT ECE) who in urn adaped hem from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong) Rerieval

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j =

12: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME. Σ j = 1: AUTOREGRESSIVE AND MOVING AVERAGE PROCESSES IN DISCRETE TIME Moving Averages Recall ha a whie noise process is a series { } = having variance σ. The whie noise process has specral densiy f (λ) = of

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

Hidden Markov Models. Adapted from. Dr Catherine Sweeney-Reed s slides

Hidden Markov Models. Adapted from. Dr Catherine Sweeney-Reed s slides Hidden Markov Models Adaped from Dr Caherine Sweeney-Reed s slides Summary Inroducion Descripion Cenral in HMM modelling Exensions Demonsraion Specificaion of an HMM Descripion N - number of saes Q = {q

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

Product Integration. Richard D. Gill. Mathematical Institute, University of Utrecht, Netherlands EURANDOM, Eindhoven, Netherlands August 9, 2001

Product Integration. Richard D. Gill. Mathematical Institute, University of Utrecht, Netherlands EURANDOM, Eindhoven, Netherlands August 9, 2001 Produc Inegraion Richard D. Gill Mahemaical Insiue, Universiy of Urech, Neherlands EURANDOM, Eindhoven, Neherlands Augus 9, 21 Absrac This is a brief survey of produc-inegraion for biosaisicians. 1 Produc-Inegraion

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

α=1 α=2 α=

α=1 α=2 α= To be published in: Advances in Neural Informaion Processing Sysems 9, M C Mozer, M I Jordan, T Pesche (eds.), MIT Press, Cambridge, MA Auhor for correspondence: Peer Sollich Online learning from nie raining

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013 IMPLICI AND INVERSE FUNCION HEOREMS PAUL SCHRIMPF 1 OCOBER 25, 213 UNIVERSIY OF BRIISH COLUMBIA ECONOMICS 526 We have exensively sudied how o solve sysems of linear equaions. We know how o check wheher

More information

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1.

Robotics I. April 11, The kinematics of a 3R spatial robot is specified by the Denavit-Hartenberg parameters in Tab. 1. Roboics I April 11, 017 Exercise 1 he kinemaics of a 3R spaial robo is specified by he Denavi-Harenberg parameers in ab 1 i α i d i a i θ i 1 π/ L 1 0 1 0 0 L 3 0 0 L 3 3 able 1: able of DH parameers of

More information

1.1. Example: Polynomial Curve Fitting 4 1. INTRODUCTION

1.1. Example: Polynomial Curve Fitting 4 1. INTRODUCTION 4. INTRODUCTION Figure.2 Plo of a raining daa se of N = poins, shown as blue circles, each comprising an observaion of he inpu variable along wih he corresponding arge variable. The green curve shows he

More information

. Now define y j = log x j, and solve the iteration.

. Now define y j = log x j, and solve the iteration. Problem 1: (Disribued Resource Allocaion (ALOHA!)) (Adaped from M& U, Problem 5.11) In his problem, we sudy a simple disribued proocol for allocaing agens o shared resources, wherein agens conend for resources

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

Inferring Dynamic Dependency with Applications to Link Analysis

Inferring Dynamic Dependency with Applications to Link Analysis Inferring Dynamic Dependency wih Applicaions o Link Analysis Michael R. Siracusa Massachuses Insiue of Technology 77 Massachuses Ave. Cambridge, MA 239 John W. Fisher III Massachuses Insiue of Technology

More information

Failure of the work-hamiltonian connection for free energy calculations. Abstract

Failure of the work-hamiltonian connection for free energy calculations. Abstract Failure of he work-hamilonian connecion for free energy calculaions Jose M. G. Vilar 1 and J. Miguel Rubi 1 Compuaional Biology Program, Memorial Sloan-Keering Cancer Cener, 175 York Avenue, New York,

More information

SOLUTIONS TO ECE 3084

SOLUTIONS TO ECE 3084 SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar CONROL OF SOCHASIC SYSEMS P.R. Kumar Deparmen of Elecrical and Compuer Engineering, and Coordinaed Science Laboraory, Universiy of Illinois, Urbana-Champaign, USA. Keywords: Markov chains, ransiion probabiliies,

More information