Robust Adaptive Markov Decision Processes in Multivehicle Applications
The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Bertuccelli, L.F., B. Bethke, and J.P. How. "Robust adaptive Markov Decision Processes in multi-vehicle applications." American Control Conference, 2009 (ACC '09). Copyright 2009 IEEE.
Publisher: Institute of Electrical and Electronics Engineers
Version: Final published version
Accessed: Mon Apr 09 08::06 EDT 2018
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
2009 American Control Conference, Hyatt Regency Riverfront, St. Louis, MO, USA, June 10-12, 2009. WeB9.4

Robust Adaptive Markov Decision Processes in Multi-vehicle Applications
Luca F. Bertuccelli, Brett Bethke, and Jonathan P. How
Aerospace Controls Laboratory, Massachusetts Institute of Technology
{lucab, bbethke, jhow}@mit.edu

Abstract — This paper presents a new robust and adaptive framework for Markov Decision Processes that accounts for errors in the transition probabilities. Robust policies are typically found off-line, but can be extremely conservative when implemented in the real system. Adaptive policies, on the other hand, are specifically suited for on-line implementation, but may display undesirable transient performance as the model is updated through learning. A new method that exploits the individual strengths of the two approaches is presented in this paper. This robust and adaptive framework protects the adaptation process from exhibiting worst-case performance during the model updating, and is shown to converge to the true, optimal value function in the limit of a large number of state transition observations. The proposed framework is investigated in simulation and actual flight experiments, and shown to improve transient behavior in the adaptation process and overall mission performance.

I. INTRODUCTION

Many decision processes, such as Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), are modeled as a probabilistic process driven by a known Markov Chain. In practice, however, the true parameters of the Markov Chain are frequently unavailable to the modeler, and many researchers have recently addressed the issue of robust performance in these decision systems [1]-[4]. While many authors have studied the problem of MDPs with uncertain transition probabilities [5]-[7], robust counterparts to these MDPs have been obtained only recently. Robust MDP counterparts have been introduced in the work of Bagnell et al. [8], Nilim [1], and Iyengar [2]. Bagnell presented a robust value iteration algorithm for solving the robust MDPs.
The convergence of robust value iteration was formally proved by Nilim [1] and Iyengar [2]. Both Nilim and Iyengar introduced meaningful uncertainty sets for the transition probabilities that could be efficiently solved by adding an additional, inner optimization on the uncertain transition probabilities. One of the methods for finding a robust policy in [1] was to use scenario-based methods, wherein the performance is optimized for different realizations of the transition probabilities. However, it was recently shown that a scenario-based approach may require an extremely large number of realizations to yield a robust policy [4]. This observation motivated the development of a specific scenario selection process using the first two moments of a Bayesian prior to obtain robust policies using far fewer scenarios [4], [21].

Robust methods find robust policies that hedge against errors in the transition probabilities. However, there are many cases when this type of approach is too conservative. For example, it may be possible to identify the transition probabilities by observing state transitions, obtain improved estimates, and resolve the optimization to find a less conservative policy. Model-based learning of MDPs is closely related to indirect adaptive control [9] in that the transition probabilities are estimated in real-time using a maximum likelihood estimator. At each time step, certainty equivalence is assumed on the transition probabilities, and a new policy is found with the new model estimate [10]. Jaulmes et al. [11], [12] study this problem in an active estimation context using POMDPs. Marbach [13] considers this problem when the transition probabilities depend on a parameter vector. Konda and Tsitsiklis [14] consider the problem of slowly-varying Markov Chains in the context of reinforcement learning. Sato [15] considers this problem and shows asymptotic convergence of the probability estimates, also in the context of dual control. Kumar [16] also considered the adaptation problem. Ford and Moore [17] consider the problem of estimating the parameters of a non-stationary Hidden Markov Model.
This paper demonstrates the need to account for both robust planning and adaptation in MDPs with uncertainty in their transition probabilities. Just as in control [18] or in task assignment problems [19], adaptation alone is generally not sufficient to ensure reliable operation of the overall control system. This paper shows that robustness is critical to mitigating worst-case performance, particularly during the transient periods of the adaptation. This paper contributes a new combined robust and adaptive problem formulation for MDPs with errors in the transition probabilities. The key result of this paper shows that robust and adaptive MDPs can converge to the truly optimal objective in the limit of a large number of observations. We demonstrate the robust component of this approach by using a Bayesian prior, and find the robust policy by using scenario-based methods. We then augment the robust approach with an adaptation scheme that is more effective at incorporating new information in the models. The MDP framework is discussed in Section II, the impact of uncertainty is demonstrated in Section III, and then we present the individual components of robustness and adaptation in
Section IV. The combined robust and adaptive MDP is shown to converge to the true, optimal value function in the limit of a large number of observations. The paper concludes in Section VI with a set of demonstrative numerical simulations and actual flight results on our UAV testbed.

II. MARKOV DECISION PROCESS

A. Problem Formulation

The Markov Decision Process (MDP) framework that we consider in this paper consists of a set of states $x_i \in S$ of cardinality N, a set of control actions $u \in U$ of cardinality M with a corresponding policy $\mu : S \to U$, a transition model given by $A^u_{ij} = \Pr(x_{k+1} = j \mid x_k = i, u_k)$, and a reward model $g(x_i, u)$. The time-additive objective function is defined as

$$J_\mu = g_N(x_N) + \sum_{k=0}^{N-1} \phi^k g_k(x_k, u_k) \qquad (1)$$

where $0 < \phi \le 1$ is an appropriate discount factor. The goal is to find an optimal control policy, $\mu^*$, that maximizes an expected objective given some known transition model $A^u$:

$$J^* = \max_\mu \mathbb{E}\big[ J_\mu(x_0) \big] \qquad (2)$$

In an infinite horizon setting ($N \to \infty$), the solution to Eq. 2 can be found by solving the Bellman Equation

$$J^*(i) = \max_u \Big[ g(i) + \phi \sum_j A^u_{ij} J^*(j) \Big] \qquad (3)$$

The optimal control is found by solving

$$u^*(i) \in \arg\max_{u \in U} \mathbb{E}\big[ J_\mu(x_0) \big] \quad \forall i \in S \qquad (4)$$

The optimal policy can be found in many different ways using Value Iteration or Policy Iteration, while Linear Programming can be used for moderately sized problems [20].

III. MODEL UNCERTAINTY

It has been shown that the value function can be biased in the presence of small errors in the transition probabilities [3], and that the optimal policy $\mu^*$ can be extremely sensitive to small errors in the model parameters. For example, in the context of UAV missions, it has been shown that errors in the state transition matrix $\tilde{A}^u$ can result in increased UAV crashes when implemented in real systems [21]. An example of this suboptimal performance is reflected in Figure 1, which shows two summary plots for a 2-UAV persistent surveillance mission formulated as an MDP [22], averaged over 100 Monte Carlo simulations.
The simulations were performed with modeling errors: the policy was found by using an estimated probability shown on the y-axis ("Modeled"), but implemented on the real system that assumed a nominal probability shown on the x-axis ("Actual"). Figure 1(a) shows the mean number of failed vehicles in the mission. Note that in the region labeled "Risky," the failure rate is increased significantly such that all the vehicles in the mission are lost due to the modeling error. Figure 1(b) shows the penalty in total coverage time when the transition probability is underestimated (in the area denoted as "Risky" in the figure).

[Fig. 1. Impact of modeling error on the overall mission effectiveness: (a) total number of failed vehicles; (b) mean coverage time vs. mismatched fuel flow probabilities.]

In this region, the total coverage time decreases from approximately 40 time steps (out of a 50 time step mission) to only 10 time steps. It is of paramount importance to develop precise mathematical descriptions for these errors and use this information to find robust policies. While there are many methods to describe uncertainty sets [1], [2], our approach relies on a Bayesian description of this uncertainty. This choice is primarily motivated by the need to update estimates of these probabilities in real-time in a computationally tractable manner. This approach assumes a prior Dirichlet distribution on each row of the transition matrix, and recursively updates this distribution with observations. The Dirichlet distribution $f_D$ at time k for a row of the N-dimensional transition model, $p_k = [p_1, p_2, \ldots, p_N]^T$ with positive distribution parameters $\alpha(k) = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$, is defined as

$$f_D(p_k \mid \alpha(k)) = K \prod_{i=1}^{N} p_i^{\alpha_i - 1} = K\, p_1^{\alpha_1 - 1} \cdots p_{N-1}^{\alpha_{N-1} - 1} \Big(1 - \sum_{i=1}^{N-1} p_i\Big)^{\alpha_N - 1}, \qquad \sum_{i=1}^{N} p_i = 1 \qquad (5)$$

where K is a normalizing factor that ensures the probability distribution integrates to unity. Each $p_i$ is the i-th entry of
the m-th row, that is, $p_i = A^u_{mi}$, with $0 \le p_i \le 1$ and $\sum_i p_i = 1$. The primary reason for using the Dirichlet distribution is that the mean $\bar{p}_i$ satisfies the requirements of a probability vector ($0 \le \bar{p}_i \le 1$ and $\sum_i \bar{p}_i = 1$) by construction. Furthermore, the parameters $\alpha_i$ can be interpreted as counts, or times that a particular state transition was observed. This enables computationally tractable updates on the distribution based on new observations. The uncertainty set description for the Dirichlet is known as a credibility region, and can be found by Monte Carlo integration.

IV. ADAPTATION AND ROBUSTNESS

This section discusses individual methods for adapting to changes in the transition probabilities, as well as methods for accounting for robustness in the presence of the transition probability uncertainty.

A. Adaptation

It is well known that the Dirichlet distribution is conjugate to the multinomial distribution, implying a measurement update step that can be expressed in closed form using the previously observed counts $\alpha(k)$. The posterior distribution $f_D(p_{k+1} \mid \alpha(k+1))$ is given in terms of the prior $f_D(p_k \mid \alpha(k))$ as

$$f_D(p_{k+1} \mid \alpha(k+1)) \propto f_D(p_k \mid \alpha(k))\, f_M(\beta(k) \mid p_k) = \prod_{i=1}^{N} p_i^{\alpha_i - 1} p_i^{\beta_i} = \prod_{i=1}^{N} p_i^{\alpha_i + \beta_i - 1}$$

where $f_M(\beta(k) \mid p_k)$ is a multinomial distribution with hyperparameters $\beta(k) = [\beta_1, \ldots, \beta_N]$. Each $\beta_i$ is the total number of transitions observed from state j to a new state i: mathematically, $\beta_i = \sum_j \delta_{j,i}$, where

$$\delta_{j,i} = \begin{cases} 1 & \text{if transition } j \to i \text{ observed} \\ 0 & \text{otherwise} \end{cases}$$

indicates how many times transitions were observed from state j to state i. For the next derivations, we assume that only a single transition can occur per time step, so $\beta_i = \delta_{j,i}$. Upon receipt of the observations $\beta(k)$, the parameters $\alpha(k)$ are updated according to

$$\alpha_i(k+1) = \alpha_i(k) + \delta_{j,i} \qquad (6)$$

and the mean can be found by normalizing these parameters, $\bar{p}_i = \alpha_i / \alpha_0$ with $\alpha_0 = \sum_i \alpha_i$. Our recent work [23] has shown that the mean and variance

$$\bar{p}_i = \alpha_i / \alpha_0 \qquad (7)$$

$$\Sigma_{ii} = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)} \qquad (8)$$

can be equivalently expressed recursively in terms of the previous mean and variance:

$$\bar{p}_i(k+1) = \bar{p}_i(k) + \frac{\Sigma_{ii}(k)}{\bar{p}_i(k)\big(1 - \bar{p}_i(k)\big)}\,\big(\delta_{j,i} - \bar{p}_i(k)\big)$$

$$\Sigma_{ii}(k+1) = \frac{\gamma_{k+1}\,\Sigma_{ii}(k)}{\Sigma_{ii}(k) + \bar{p}_i(k)\big(1 - \bar{p}_i(k)\big)}$$

where $\gamma_{k+1} = \bar{p}_i(k+1)\big(1 - \bar{p}_i(k+1)\big)$.
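As a small self-contained check (the counts below are our own toy numbers, not from the paper), the conjugate count update of Eq. (6) and the mean-variance recursion as reconstructed above can be run side by side; the two forms produce the same posterior moments.

```python
import numpy as np

# Toy 3-state example of one Dirichlet row (illustrative counts only).
alpha = np.array([4.0, 3.0, 3.0])
a0 = alpha.sum()
p = alpha / a0                                       # mean, Eq. (7)
Sigma = alpha * (a0 - alpha) / (a0**2 * (a0 + 1.0))  # variances, Eq. (8)

def count_update(alpha, j):
    """Direct conjugate update, Eq. (6): one observed transition into state j."""
    out = alpha.copy()
    out[j] += 1.0
    return out

def moment_update(p, Sigma, j):
    """Equivalent recursion on the mean and variance for the same observation."""
    delta = np.zeros_like(p)
    delta[j] = 1.0
    gain = Sigma / (p * (1.0 - p))        # elementwise; equals 1/(alpha_0 + 1)
    p_new = p + gain * (delta - p)
    gamma = p_new * (1.0 - p_new)
    Sigma_new = gamma * Sigma / (Sigma + p * (1.0 - p))
    return p_new, Sigma_new

# Observe a transition into state 1 and compare the two forms.
alpha_new = count_update(alpha, 1)
p_direct = alpha_new / alpha_new.sum()
p_rec, Sigma_rec = moment_update(p, Sigma, 1)
print(np.allclose(p_direct, p_rec))       # True: the two forms agree
```

The gain $\Sigma_{ii}/(\bar{p}_i(1-\bar{p}_i)) = 1/(\alpha_0+1)$ is the same for every entry, which is why the recursive mean stays a valid probability vector without renormalization.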
Furthermore, it was shown that these mean-variance recursions, just as their count-based counterparts, can be slow in detecting changes if the model is non-stationary. Hence, a modified set of recursions was derived, and it was shown that the following recursions provide a much more effective change-detection mechanism:

$$\bar{p}_i(k+1) = \bar{p}_i(k) + \frac{\Sigma_{ii}(k)}{\lambda_k\,\bar{p}_i(k)\big(1 - \bar{p}_i(k)\big)}\,\big(\delta_{j,i} - \bar{p}_i(k)\big) \qquad (9)$$

$$\Sigma_{ii}(k+1) = \frac{\gamma_{k+1}\,\Sigma_{ii}(k)}{\lambda_k\,\bar{p}_i(k)\big(1 - \bar{p}_i(k)\big) + \Sigma_{ii}(k)} \qquad (10)$$

The key change was the addition of an effective process noise through the use of a discount factor $0 < \lambda_k \le 1$, and this allowed for a much faster estimator response [23].

B. Robustness

While an adaptation mechanism is useful to account for changes in the transition probabilities, the estimates of the transition probabilities are only guaranteed to converge in the limit of an infinite number of observations. While in practice the estimates do not require an unbounded number of observations, simply replacing the uncertain model $\tilde{A}$ with the best estimate $\hat{A}$ may lead to a biased value function [3] and sensitive policies, especially if the estimator has not yet converged to the true parameter A. For the purposes of this paper, the robust counterpart of Eq. (2) is defined as [1], [2]

$$J_R^* = \min_{\tilde{A} \in \mathcal{A}} \max_\mu \mathbb{E}\big[ J_\mu(x_0) \big] \qquad (11)$$

Like the nominal problem, the objective function is maximized with respect to the control policy; however, for the robust counterpart, the objective is minimized with respect to the uncertainty set $\mathcal{A}$. When the uncertainty model $\mathcal{A}$ is described by a Bayesian prior, scenario-based methods can be used to generate realizations of the transition probability model. This gives rise to a scenario-based robust method which can turn out to be computationally intensive, since the total number of scenarios needs to be large [21].
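As a concrete sketch of the scenario idea (the counts and value function below are our own illustrative choices, not from the paper), realizations of an uncertain transition row can be drawn directly from the Dirichlet prior and fed to the inner minimization of the robust counterpart:

```python
import numpy as np

# Monte Carlo scenario generation from the Dirichlet prior for one
# uncertain row of A^u. Counts are illustrative only.
rng = np.random.default_rng(0)
alpha = np.array([8.0, 1.0, 1.0])             # prior counts for the row
scenarios = rng.dirichlet(alpha, size=1000)   # 1000 realizations of p

J = np.array([10.0, 2.0, 0.0])                # some fixed value function
# Worst-case expected value over the sampled scenarios (the inner
# minimization of Eq. (11), restricted to this single row):
worst = (scenarios @ J).min()
nominal = (alpha / alpha.sum()) @ J
print(worst <= nominal)                       # the worst case is more pessimistic
```

The cost of this approach is visible directly: the worst case is only as reliable as the number of sampled scenarios, which is the computational burden the Dirichlet Sigma Points below are designed to avoid.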
This motivated our work [4] which, given a prior Dirichlet distribution on the transition probabilities, deterministically generates samples of each transition probability row $Y_i$ (so-called Dirichlet Sigma Points) using the first two statistical moments of each row of the transition probability matrix, $\bar{p}$ and $\Sigma$:

$$Y_0 = \bar{p}$$
$$Y_i = \bar{p} + \beta\,(\Sigma^{1/2})_i, \qquad i = 1, \ldots, N$$
$$Y_i = \bar{p} - \beta\,(\Sigma^{1/2})_{i-N}, \qquad i = N+1, \ldots, 2N$$

where $\beta$ is a tuning parameter that depends on the level of desired conservatism, which in turn depends on the size of the credibility region. Here, $(\Sigma^{1/2})_i$ denotes the i-th row of the matrix square root of $\Sigma$. The uncertainty set $\mathcal{A}$ contains the deterministic samples $Y_i$, $i \in \{1, 2, \ldots, 2N\}$.
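A minimal sketch of the construction above, for one row of the transition matrix. The paper does not specify how the matrix square root is computed; an eigendecomposition is one standard choice for a symmetric PSD covariance, and all counts here are illustrative.

```python
import numpy as np

def dirichlet_sigma_points(alpha, beta=3.0):
    """Deterministic scenarios Y_0..Y_2N for one uncertain row.
    beta trades off conservatism against the credibility-region size."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    p = alpha / a0                                    # row mean
    # Full Dirichlet covariance: (diag(p) - p p^T) / (a0 + 1); its diagonal
    # matches the per-entry variances of Eq. (8).
    Sigma = (np.diag(p) - np.outer(p, p)) / (a0 + 1.0)
    # Symmetric matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(Sigma)
    S = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    points = [p]
    points += [p + beta * S[i] for i in range(len(p))]   # Y_1 .. Y_N
    points += [p - beta * S[i] for i in range(len(p))]   # Y_{N+1} .. Y_2N
    return points

points = dirichlet_sigma_points([8.0, 1.0, 1.0], beta=1.0)
```

Because the all-ones vector is in the null space of the covariance, every sigma point sums to one by construction; individual entries can still leave [0, 1] for large beta, in which case a projection back onto the simplex would be needed (a safeguard of ours, not a step stated in the paper).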
V. ROBUST ADAPTATION

There are many choices for replanning efficiently using model-based methods, such as Real Time Dynamic Programming (RTDP) [24], [25]. RTDP assumes that the transition probabilities are unknown, and are continually updated through an agent's actions in the state space. Due to computational considerations, only a single sweep of the value iteration is performed at each measurement update. The result of Gullapalli [25] shows that if each state and action are executed infinitely often, then the (asynchronous) value iteration algorithm converges to the true value function. An alternative strategy is to perform synchronous value iteration, using a bootstrapping approach where the old policy is used as the initial guess for the new policy [26].

In this section, we consider the full robust replanning problem (see Algorithm 1). The two main steps are an adaptation step, where the Dirichlet distributions (or alternatively, the Dirichlet Sigma Points) for each row and action are updated based on the most recent observations, and a robust replan step. For this paper, we use the Dirichlet Sigma Points to find the robust policy by using scenario-based methods, but we note that the following theoretical results apply to any robust value function. While it is appealing to account for both robustness and adaptation, it is critical to demonstrate that the proposed algorithm in fact converges to the true, optimal solution in the limit. We show this next.

A. Convergence

Gullapalli and Barto [25] showed that in an adaptive (but non-robust) setting, an asynchronous version of the Value Iteration algorithm converges to the optimal value function.
Theorem 1 ([25], Convergence of an adaptive, asynchronous value iteration algorithm): For any finite state, finite action MDP with an infinite-horizon discounted performance measure, an indirect adaptive asynchronous value iteration algorithm converges to the optimal value function with probability one if: 1) the conditions for convergence of the non-adaptive algorithm are met; 2) in the limit, every action is executed from every state infinitely often; 3) the estimates of the state transition probabilities remain bounded and converge in the limit to their true values with probability one.

Proof: See [25].

Using the framework of the above theorem, the robust counterpart to this theorem is stated next.

Theorem 2 (Convergence of a robust adaptive, asynchronous value iteration algorithm): For any finite state, finite action MDP with an infinite-horizon discounted performance measure, the robust, indirect adaptive asynchronous value iteration algorithm

$$J_{k+1}(i) = \begin{cases} \min_{\mu} \max_{A_k \in \mathcal{A}_k} \mathbb{E}[J_\mu] & \text{if } i \in B_k \subseteq S \\ J_k(i) & \text{otherwise} \end{cases} \qquad (14)$$

converges to the optimal value function with probability one if the conditions of Theorem 1 are satisfied, and the uncertainty set $\mathcal{A}_k$ converges to the singleton $\hat{A}_k$; in other words, $\lim_{k \to \infty} \mathcal{A}_k = \{\hat{A}_k\}$. Here $B_k$ denotes the subset of states that are updated at each time step.

Algorithm 1: Robust Replanning
  Initialize uncertainty model: for example, Dirichlet distribution parameters $\alpha_i$
  while not finished do
    Using a statistically efficient estimator, update estimates of the transition probabilities (for each row and action); for example, using the discounted estimator of Eq. 9:
      $\bar{p}_i(k+1) = \bar{p}_i(k) + \frac{\Sigma_{ii}(k)}{\lambda_k\,\bar{p}_i(k)(1 - \bar{p}_i(k))}\,(\delta_{j,i} - \bar{p}_i(k))$
      $\Sigma_{ii}(k+1) = \frac{\gamma_{k+1}\,\Sigma_{ii}(k)}{\lambda_k\,\bar{p}_i(k)(1 - \bar{p}_i(k)) + \Sigma_{ii}(k)}$
    For each uncertain row of the transition probability matrix, find the robust policy using robust DP:
      $\min_{A} \max_{\mu} \mathbb{E}[J_\mu] \qquad (12)$
    For example, update the Dirichlet Sigma Points (for each row and action),
      $Y_0 = \bar{p}$
      $Y_i = \bar{p} + \beta\,(\Sigma^{1/2})_i, \quad i = 1, \ldots, N \qquad (13)$
      $Y_i = \bar{p} - \beta\,(\Sigma^{1/2})_{i-N}, \quad i = N+1, \ldots, 2N$
    and find the new robust policy $\min_{A \in Y} \max_{\mu} \mathbb{E}[J_\mu]$
    Return
  end while

Proof: The key difference between this theorem and Theorem 1 is the maximization over the uncertainty set $\mathcal{A}_k$.
However, as additional observations are incurred, and by virtue of the convergent, unbiased estimator, the size of the uncertainty set will decrease to the singleton unbiased estimate $\hat{A}_k$. Furthermore, the robust operator given by $T \doteq \min_\mu \max_{A_k \in \mathcal{A}_k}$ is a contraction mapping [1], [2]. Using both of these arguments, and since this unbiased estimate will in turn converge to the true value of the transition probability, the robust adaptive asynchronous value iteration algorithm will converge to the true, optimal solution.

Corollary 3 (Convergence of synchronous version): The synchronous version of the robust, adaptive MDP will converge to the true, optimal value function.

Proof: In the event that an entire sweep of the state space occurs at each value iteration, the uncertainty set $\mathcal{A}_k$ will still converge to the singleton $\{\hat{A}_k\}$.

Remark (Convergence of robust adaptation with Dirichlet Sigma Points): For the Dirichlet Sigma Points, the discounted estimator of Eq. 9 converges in the limit of a large number of observations (with an appropriate choice of $\lambda_k$), and the covariance $\Sigma$ is eventually driven to 0; each of the Dirichlet Sigma Points will then collapse to the singleton, the unbiased estimate of the true transition probabilities. This means that the model will have converged, and that the robust solution will in fact have converged to the optimal value function.
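The convergence mechanism above can be illustrated with a small numerical sketch. Everything below (the chain, rewards, counts, and two-scenario uncertainty set) is our own toy construction, not the paper's testbed: as simulated observations accumulate, the Dirichlet variances shrink, the scenario set tightens around the estimate, and robust value iteration approaches the value function of the true model.

```python
import numpy as np

# Toy single-action, 2-state chain; all numbers are illustrative only.
PHI = 0.9
g = np.array([1.0, 0.0])                      # reward g(i)
A_true = np.array([[0.8, 0.2], [0.3, 0.7]])   # "unknown" true model

def robust_vi(scenarios, iters=500):
    """Value iteration with a worst-case (min over scenarios) backup."""
    J = np.zeros(2)
    for _ in range(iters):
        J = np.array([g[i] + PHI * min(A[i] @ J for A in scenarios)
                      for i in range(2)])
    return J

# Accumulate transition counts in a Dirichlet model for each row.
rng = np.random.default_rng(1)
alpha = np.ones((2, 2))
for i in range(2):
    draws = rng.choice(2, size=20000, p=A_true[i])
    alpha[i] += np.bincount(draws, minlength=2)

a0 = alpha.sum(axis=1, keepdims=True)
p = alpha / a0
sig = np.sqrt(p * (1.0 - p) / (a0 + 1.0))     # per-entry std. dev., cf. Eq. (8)

def renorm(A):
    """Project a perturbed model back onto valid probability rows."""
    A = np.clip(A, 1e-9, None)
    return A / A.sum(axis=1, keepdims=True)

# Simple two-scenario uncertainty set (pessimistic / optimistic rows).
BETA = 3.0
scenarios = [renorm(p - BETA * sig), renorm(p + BETA * sig)]

J_robust = robust_vi(scenarios)
J_nominal = robust_vi([A_true])
print(abs(J_robust - J_nominal).max())        # gap shrinks as data accumulate
```

With few observations the scenario set is wide and the robust value function is noticeably pessimistic; after many observations the set has nearly collapsed and the gap to the true-model value function is small, which is the content of Theorem 2. Adding more actions would simply wrap an outer max over actions around the inner min over scenarios.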
VI. NUMERICAL RESULTS

This section presents actual flight demonstrations of the proposed robust and adaptive algorithm on a persistent surveillance mission in the RAVEN testbed [22]. The UAVs are initially located at a base location, which is separated by some (possibly large) distance from the surveillance location. The objective of the problem is to maintain a specified number r of requested UAVs over the surveillance location at all times. The base location is denoted by $Y_b$, the surveillance location is denoted by $Y_s$, and a discretized set of intermediate locations is denoted by $\{Y_0, \ldots, Y_s\}$. Vehicles can move between adjacent locations at a rate of one unit per time step. The UAVs have a specified maximum fuel capacity $F_{max}$, and we assume that the rate $\dot{F}_{burn}$ at which they burn fuel may vary randomly during the mission: the probability of nominal fuel flow is given by $p_{nom}$. This uncertainty in the fuel flow may be attributed to aggressive maneuvering that may be required for short time periods, for example. Thus, the total flight time each vehicle achieves on a given flight is a random variable, and this uncertainty must be accounted for in the problem. If a vehicle runs out of fuel while in flight, it crashes and is lost. The vehicles can refuel (at a rate $\dot{F}_{refuel}$) by returning to the base location.

In this section, the adaptive replanning was implemented by explicitly accounting for the uncertainty in the probability of nominal fuel flow, $p_{nom}$. The replanning architecture updates both the mean and variance of the fuel flow transition probability, which is then passed to the online MDP solver, which computes the robust policy. This robust policy is then passed to the policy executor, which implements the control decision on the system. The Dirichlet Sigma Points were formed using the updated mean and variance,

$$Y_0 = \hat{p}_{nom}, \qquad Y_1 = \hat{p}_{nom} + \beta\,\sigma_p, \qquad Y_2 = \hat{p}_{nom} - \beta\,\sigma_p$$

and used to find the robust policy. Using the results from the earlier sections, appropriate choices of $\beta$ could range from 1 to 5, where $\beta = 3$ corresponds to a 99% certainty region for the Dirichlet (in this case, the Beta density).
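A minimal sketch of this scalar sigma-point construction. The numbers are illustrative placeholders, not values from the flight experiments, and the [0, 1] clipping is a safeguard of ours rather than a step stated in the paper.

```python
# Scalar (Beta-density) version of the Dirichlet Sigma Points for the
# single uncertain parameter p_nom. Illustrative values only.
p_hat, sigma_p, beta = 0.85, 0.05, 4.0
Y = [p_hat, p_hat + beta * sigma_p, p_hat - beta * sigma_p]
Y = [min(1.0, max(0.0, y)) for y in Y]   # keep each scenario a valid probability
# For this problem a lower probability of nominal fuel flow is the
# conservative case, so the robust plan uses the pessimistic scenario:
p_robust = min(Y)
print(p_robust)
```

This is exactly why, as noted next, the robust solution for the scalar problem reduces to planning with $\hat{p}_{nom} - \beta\sigma_p$ in place of $\hat{p}_{nom}$.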
For this scalar problem, the robust solution of the MDP corresponds to using a value of $\hat{p}_{nom} - \beta\sigma_p$ in place of the nominal probability estimate $\hat{p}_{nom}$. Flight experiments were performed for a case when the probability estimate $\hat{p}_{nom}$ was varied in mid-mission, and three different replanning strategies were compared:

- Adaptive only: The first replan strategy involved only an adaptive strategy, with $\lambda = 0.8$, using only the estimate $\hat{p}_{nom}$.
- Robust replan, undiscounted adaptation: This replan strategy used the undiscounted mean-variance estimator ($\lambda = 1$), and set $\beta = 4$ for the Dirichlet Sigma Points.
- Robust replan, discounted adaptation: This replan strategy used the discounted mean-variance estimator ($\lambda = 0.8$), and set $\beta = 4$ for the Dirichlet Sigma Points.

[Fig. 2. Experimental results showing vehicle trajectories (red and blue), and the probability estimate used in the planning (black): (a) fast adaptation (λ = 0.8) with no robustness (β = 0); (b) high robustness (β = 4) but slow adaptation (λ = 1).]

In all cases, the vehicle takes off from base, travels through 2 intermediate areas, and then reaches the surveillance location. In the nominal fuel flow setting (losing 1 unit of fuel per time step), the vehicle can safely remain at the surveillance region for 4 time steps, but in the off-nominal fuel flow setting (losing 2 units), the vehicle can remain on surveillance for only 1 time step. The main results are shown in Figure 2, where the transition in $p_{nom}$ occurred at t = 7 time steps. At this point in time, one of the vehicles is just completing the surveillance and is initiating the return to base to refuel, as the second vehicle is heading to the surveillance area. The key to a successful mission, in the sense of avoiding vehicle crashes, is to ensure that the change is detected sufficiently quickly, and that the planner maintains some level of cautiousness in this estimate by embedding robustness. The successful mission will detect this change rapidly, and leave the UAVs on target for a shorter time.
The result of Figure 2(a) ignores any uncertainty in the estimate but has fast adaptation (since it uses the factor λ = 0.8). However, by not embedding the uncertainty, the estimator, while detecting the change in $p_{nom}$ quickly, nonetheless allocates the second vehicle to remain at the surveillance region. Consequently, one of the vehicles runs out of fuel and crashes. At the second cycle of the mission, the second vehicle remains at the surveillance area for only 1 time step.

The result of Figure 2(b) accounts for uncertainty in the estimate but has slow adaptation (since it uses the factor λ = 1). However, while embedding the uncertainty, the replanning is not done quickly, and for this reason, different from the adaptive, non-robust example, one of the vehicles runs out of fuel and crashes. At the second cycle of the mission, the second vehicle remains at the surveillance area for only 1 time step.

[Fig. 3. Fast adaptation (λ = 0.8) with robustness (β = 4).]

Figure 3 shows the robustness and adaptation acting together to cautiously allocate the vehicles, while responding quickly to changes in $p_{nom}$. The second vehicle is allocated to perform surveillance for only 2 time steps (instead of 3), and safely returns to base with no fuel remaining. At the second cycle, both vehicles only stay at the surveillance area for 1 time step. Hence, the robustness and adaptation together have been able to recover mission efficiency by bringing in their relative strengths: the robustness by accounting for uncertainty in the probability, and the adaptation by quickly responding to the changes in the probability.

VII. CONCLUSIONS

This paper has presented a combined robust and adaptive framework that accounts for errors in the transition probabilities. This framework is shown to converge to the true, optimal value function in the limit of a large number of observations. The proposed framework has been verified both in simulation and actual flight experiments, and shown to improve transient behavior in the adaptation process and overall mission performance. Our current work is addressing a more active learning mechanism for the transition probabilities, through the use of exploratory actions specifically taken to reduce the uncertainty in the transition probabilities. Our future work will consider the problem of decentralization of the robust adaptive framework across multiple vehicles, specifically addressing the issues of model consensus in a multi-agent system, and the impact of any disagreement on the robust solution.

ACKNOWLEDGEMENTS

Research supported by AFOSR grant FA

REFERENCES

[1] A. Nilim and L. El Ghaoui, "Robust Solutions to Markov Decision Problems with Uncertain Transition Matrices," Operations Research, vol. 53, no. 5, 2005.
[2] G. Iyengar, "Robust Dynamic Programming," Math. Oper. Res., vol. 30, no. 2, pp. 257-280, 2005.
[3] S. Mannor, D. Simester, P. Sun, and J. Tsitsiklis, "Bias and Variance Approximation in Value Function Estimates," Management Science, vol. 52, no. 2, 2007.
[4] L. F. Bertuccelli and J. P. How, "Robust Decision-Making for Uncertain Markov Decision Processes Using Sigma Point Sampling," IEEE American Control Conference, 2008.
[5] D. E. Brown and C. C. White, "Methods for reasoning with imprecise probabilities in intelligent decision systems," IEEE Conference on Systems, Man and Cybernetics, 1990.
[6] J. K. Satia and R. E. Lave, "Markovian Decision Processes with Uncertain Transition Probabilities," Operations Research, vol. 21, no. 3, 1973.
[7] C. C. White and H. K. Eldeib, "Markov Decision Processes with Imprecise Transition Probabilities," Operations Research, vol. 42, no. 4, 1994.
[8] A. Bagnell, A. Ng, and J. Schneider, "Solving Uncertain Markov Decision Processes," NIPS, 2001.
[9] K. J. Astrom and B. Wittenmark, Adaptive Control. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1994.
[10] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press, 1998.
[11] R. Jaulmes, J. Pineau, and D. Precup, "Active Learning in Partially Observable Markov Decision Processes," European Conference on Machine Learning (ECML), 2005.
[12] R. Jaulmes, J. Pineau, and D. Precup, "Learning in Non-Stationary Partially Observable Markov Decision Processes," ECML Workshop on Reinforcement Learning in Non-Stationary Environments, 2005.
[13] P. Marbach, Simulation-based methods for Markov Decision Processes. PhD thesis, MIT, 1998.
[14] V. Konda and J. Tsitsiklis, "Linear stochastic approximation driven by slowly varying Markov chains," Systems and Control Letters, vol. 50, 2003.
[15] M. Sato, K. Abe, and H. Takeda, "Learning Control of Finite Markov Chains with Unknown Transition Probabilities," IEEE Trans. on Automatic Control, vol. AC-27, no. 2, 1982.
[16] P. R. Kumar and W. Lin, "Simultaneous Identification and Adaptive Control of Unknown Systems over Finite Parameter Sets," IEEE Trans. on Automatic Control, vol. AC-28, no. 1, 1983.
[17] J. Ford and J. Moore, "Adaptive Estimation of HMM Transition Probabilities," IEEE Transactions on Signal Processing, vol. 46, no. 5, 1998.
[18] P. A. Ioannou and J. Sun, Robust Adaptive Control. Prentice-Hall, 1996.
[19] M. Alighanbari and J. P. How, "A Robust Approach to the UAV Task Assignment Problem," International Journal of Robust and Nonlinear Control, vol. 18, no. 2, 2008.
[20] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
[21] L. F. Bertuccelli, Robust Decision-Making with Model Uncertainty in Aerospace Systems. PhD thesis, MIT, 2008.
[22] B. Bethke, J. How, and J. Vian, "Group Health Management of UAV Teams With Applications to Persistent Surveillance," IEEE American Control Conference, 2008.
[23] L. F. Bertuccelli and J. P. How, "Estimation of Non-Stationary Markov Chain Transition Models," IEEE Conference on Decision and Control, 2008.
[24] A. Barto, S. Bradtke, and S. Singh, "Learning to Act using Real-Time Dynamic Programming," Artificial Intelligence, vol. 72, pp. 81-138, 1993.
[25] V. Gullapalli and A. Barto, "Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms," Advances in NIPS, 1994.
[26] B. Bethke, L. Bertuccelli, and J. P. How, "Experimental Demonstration of MDP-Based Planning with Model Uncertainty," AIAA Guidance, Navigation and Control Conference, Aug. 2008.
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationReport on Image warping
Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Adjusted Control Lmts for U Charts Copyrght 207 by Taylor Enterprses, Inc., All Rghts Reserved. Adjusted Control Lmts for U Charts Dr. Wayne A. Taylor Abstract: U charts are used
More informationCS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements
CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before
More informationTracking with Kalman Filter
Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,
More informationMATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationFeature Selection: Part 1
CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?
More information8. Modelling Uncertainty
8. Modellng Uncertanty. Introducton. Generatng Values From Known Probablty Dstrbutons. Monte Carlo Smulaton 4. Chance Constraned Models 5 5. Markov Processes and Transton Probabltes 6 6. Stochastc Optmzaton
More informationMMA and GCMMA two methods for nonlinear optimization
MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More informationComparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method
Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method
More informationA PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS
HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,
More informationSimultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals
Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,
More informationComputation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models
Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,
More informationOutline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1]
DYNAMIC SHORTEST PATH SEARCH AND SYNCHRONIZED TASK SWITCHING Jay Wagenpfel, Adran Trachte 2 Outlne Shortest Communcaton Path Searchng Bellmann Ford algorthm Algorthm for dynamc case Modfcatons to our algorthm
More informationCS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016
CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationPsychology 282 Lecture #24 Outline Regression Diagnostics: Outliers
Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.
More informationCHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE
CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng
More informationParametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010
Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton
More informationChapter Newton s Method
Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationResource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud
Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationThe Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction
ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also
More informationNUMERICAL DIFFERENTIATION
NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the
More informationCSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography
CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve
More informationTime-Varying Systems and Computations Lecture 6
Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy
More informationA linear imaging system with white additive Gaussian noise on the observed data is modeled as follows:
Supplementary Note Mathematcal bacground A lnear magng system wth whte addtve Gaussan nose on the observed data s modeled as follows: X = R ϕ V + G, () where X R are the expermental, two-dmensonal proecton
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationStructure and Drive Paul A. Jensen Copyright July 20, 2003
Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.
More informationANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)
Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of
More informationLecture 14: Bandits with Budget Constraints
IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More information1 Convex Optimization
Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,
More informationExpectation Maximization Mixture Models HMMs
-755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood
More informationPortfolios with Trading Constraints and Payout Restrictions
Portfolos wth Tradng Constrants and Payout Restrctons John R. Brge Northwestern Unversty (ont wor wth Chrs Donohue Xaodong Xu and Gongyun Zhao) 1 General Problem (Very) long-term nvestor (eample: unversty
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationProbability Theory (revisited)
Probablty Theory (revsted) Summary Probablty v.s. plausblty Random varables Smulaton of Random Experments Challenge The alarm of a shop rang. Soon afterwards, a man was seen runnng n the street, persecuted
More informationCopyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor
Taylor Enterprses, Inc. Control Lmts for P Charts Copyrght 2017 by Taylor Enterprses, Inc., All Rghts Reserved. Control Lmts for P Charts Dr. Wayne A. Taylor Abstract: P charts are used for count data
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors
Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationDETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH
Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata TC XVII IMEKO World Congress Metrology n the 3rd Mllennum June 7, 3,
More informationU.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017
U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationAppendix B. The Finite Difference Scheme
140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton
More informationDUE: WEDS FEB 21ST 2018
HOMEWORK # 1: FINITE DIFFERENCES IN ONE DIMENSION DUE: WEDS FEB 21ST 2018 1. Theory Beam bendng s a classcal engneerng analyss. The tradtonal soluton technque makes smplfyng assumptons such as a constant
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationReinforcement learning
Renforcement learnng Nathanel Daw Gatsby Computatonal Neuroscence Unt daw @ gatsby.ucl.ac.uk http://www.gatsby.ucl.ac.uk/~daw Mostly adapted from Andrew Moore s tutorals, copyrght 2002, 2004 by Andrew
More informationTransfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system
Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng
More informationLecture 4. Instructor: Haipeng Luo
Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would
More informationCOS 521: Advanced Algorithms Game Theory and Linear Programming
COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton
More informationEstimation: Part 2. Chapter GREG estimation
Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the
More informationAdditional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty
Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,
More informationA New Evolutionary Computation Based Approach for Learning Bayesian Network
Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 4026 4030 Advanced n Control Engneerng and Informaton Scence A New Evolutonary Computaton Based Approach for Learnng Bayesan Network Yungang
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationConvergence of random processes
DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationRyan (2009)- regulating a concentrated industry (cement) Firms play Cournot in the stage. Make lumpy investment decisions
1 Motvaton Next we consder dynamc games where the choce varables are contnuous and/or dscrete. Example 1: Ryan (2009)- regulatng a concentrated ndustry (cement) Frms play Cournot n the stage Make lumpy
More informationAn Integrated Asset Allocation and Path Planning Method to to Search for a Moving Target in in a Dynamic Environment
An Integrated Asset Allocaton and Path Plannng Method to to Search for a Movng Target n n a Dynamc Envronment Woosun An Mansha Mshra Chulwoo Park Prof. Krshna R. Pattpat Dept. of Electrcal and Computer
More informationk t+1 + c t A t k t, t=0
Macro II (UC3M, MA/PhD Econ) Professor: Matthas Kredler Fnal Exam 6 May 208 You have 50 mnutes to complete the exam There are 80 ponts n total The exam has 4 pages If somethng n the queston s unclear,
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationMotion Perception Under Uncertainty. Hongjing Lu Department of Psychology University of Hong Kong
Moton Percepton Under Uncertanty Hongjng Lu Department of Psychology Unversty of Hong Kong Outlne Uncertanty n moton stmulus Correspondence problem Qualtatve fttng usng deal observer models Based on sgnal
More informationHongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k)
ISSN 1749-3889 (prnt), 1749-3897 (onlne) Internatonal Journal of Nonlnear Scence Vol.17(2014) No.2,pp.188-192 Modfed Block Jacob-Davdson Method for Solvng Large Sparse Egenproblems Hongy Mao, College of
More informationMAXIMUM A POSTERIORI TRANSDUCTION
MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,
More informationHidden Markov Models
Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More informationA Hybrid Variational Iteration Method for Blasius Equation
Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 1 (June 2015), pp. 223-229 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) A Hybrd Varatonal Iteraton Method
More informationSimulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests
Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationGrover s Algorithm + Quantum Zeno Effect + Vaidman
Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More information