Simultaneous estimation of rewards and dynamics from noisy expert demonstrations

Michael Herman 1,2, Tobias Gindele 1, Jörg Wagner 1, Felix Schmitt 1, and Wolfram Burgard 2
1 - Robert Bosch GmbH, Stuttgart - Germany
2 - University of Freiburg - Department of Computer Science, 79110 Freiburg - Germany

Abstract. Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from demonstrations of an expert. Current approaches typically require the system dynamics to be known, or additional demonstrations of state transitions to be available, to solve the inverse problem accurately. If these assumptions are not satisfied, heuristics can be used to compensate for the lack of a model of the system dynamics. However, heuristics can add bias to the solution. To overcome this, we present a gradient-based approach which simultaneously estimates rewards, dynamics, and the parameterizable stochastic policy of an expert from demonstrations, where the stochastic policy is a function of optimal Q-values.

1 Introduction
The growing number of autonomous systems requires efficient methods to adapt these systems to new environments and tasks. Learning from demonstration offers methods to parameterize desired behavior and can be split into two subfields: Behavioral Cloning and Inverse Reinforcement Learning (IRL). Behavioral Cloning estimates a policy from demonstrations and therefore mimics the expert directly. However, pretrained policies can become inappropriate, especially if the environment or its dynamics change. Therefore, IRL [1] has been introduced, which describes the problem of recovering a reward function from demonstrations, since the reward function encodes the expert's goal.

Approaches have been proposed which solve the IRL problem under various assumptions, e.g. [2, 3, 4, 5]. The cited approaches require the true system dynamics to be known, and inaccurate transition models can bias the reward estimate. Since the system dynamics are often unknown, model-free IRL algorithms have been proposed, such as in [6, 7, 8]. Typically, these approaches require access to additional observations of transitions. If these cannot be obtained, the approaches tend to generalize incorrectly because of the heuristics they must rely on.
Often, experts are unable to produce optimal demonstrations. As a consequence, IRL approaches are needed that can deal with stochastic behavior. In [9, 10, 11], stochastic policies of maximum (causal) entropy are trained under the constraint of matching feature expectations. This causes the stochastic policy to be a Boltzmann distribution over soft Q-values. However, if the expert's stochastic policy follows a different type of distribution, these approaches can be inappropriate.

Our contribution is to generalize IRL to the case of unknown dynamics and unknown stochastic policies. We propose an approach that simultaneously optimizes rewards, dynamics, and the expert's stochastic policy by maximizing the posterior probability of the demonstrations. Even though many transitions have never been observed, they influenced the expert's policy and can therefore, to some degree, be inferred from demonstrations. The expert's stochastic policy is modeled as a parametric function of optimal Q-values, which assumes that the expert is able to correctly estimate the value of different actions but is unable to choose them appropriately. We provide a gradient-based solution and evaluate our approach on a synthetic gridworld satellite navigation task.

2 Fundamentals
An MDP is a tuple $M = \{S, A, P(s' \mid s,a), \gamma, P(s_0), R\}$, where $S$ is the state space with states $s \in S$, $A$ is the action space with actions $a \in A$, $P(s' \mid s,a)$ is the probability of a transition to $s'$ when action $a$ is applied in state $s$, $\gamma \in [0, 1)$ is a discount factor, $P(s_0)$ is a start state probability distribution, and $R: S \times A \to \mathbb{R}$ is a reward function which assigns a real-valued reward for picking action $a$ in state $s$. Often, this reward is expressed as a linear function $R(s,a) = \theta^\top f(s,a)$ of state- and action-dependent features $f: S \times A \to \mathbb{R}^d$ with feature weights $\theta$. The goal of an MDP is to find an optimal policy $\pi^*(s) \in A$, which specifies state-dependent actions such that its execution maximizes the expected, discounted, accumulated reward $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \mid s_0 = s, \pi\right]$. The optimal value function can be computed by value iteration, which repeatedly applies Eq. (1) to an arbitrary initial Q-function:

$$Q(s,a) = R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a) \max_{a'} Q(s',a') \qquad (1)$$

After convergence, the optimal policy chooses the action with the largest Q-value: $\pi^*(s) = \arg\max_a Q(s,a)$.

3 Simultaneous Estimation of Rewards, Dynamics, and Stochastic Policy (SERD-SP)
We propose an approach, called Simultaneous Estimation of Rewards, Dynamics, and Stochastic Policy (SERD-SP), for problems where neither the rewards, the dynamics, nor the expert's stochastic policy $\pi(s,a) = P(a \mid s)$ is known. Since the expert's estimate of the transition model may differ from the real one, we introduce independent models for the two.
Additionally, we assume that there exists a parameterizable stochastic mapping $\pi = g(q)$ from optimal Q-values to the stochastic policy of the expert. Then, the problem can be formalized as follows.

Determine: the expert's reward function $R(s,a)$; the expert's estimate of the dynamics $P_{T_A}(s' \mid s,a)$; the real dynamics $P_T(s' \mid s,a)$; and the stochastic policy mapping $g(q)$.

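Given: the MDP $M \setminus \{R, P_T(s' \mid s,a), P_{T_A}(s' \mid s,a)\}$ without rewards and dynamics, and demonstrations $D = \{\tau_1, \tau_2, \ldots, \tau_N\}$ with trajectories $\tau = \{(s_0^\tau, a_0^\tau), (s_1^\tau, a_1^\tau), \ldots, (s_{T_\tau}^\tau, a_{T_\tau}^\tau)\}$ of an expert acting in $M$ based on a policy that depends on $R(s,a)$, $P_{T_A}(s' \mid s,a)$, and $g(q)$.

A set of parameters of the rewards, dynamics, and the stochastic policy is introduced, which should be estimated from the given demonstrations $D$: $\theta_R$, the feature weights of the reward function $R(s,a)$; $\theta_{T_A}$, the parameters of the expert's transition model $P_{T_A}$; $\theta_T$, the parameters of the real transition model $P_T$; and $\theta_P$, the parameters of the expert's stochastic policy mapping $g(q)$.

We propose to maximize the posterior probability of the demonstrations with respect to the parameters $\theta = (\theta_R, \theta_{T_A}, \theta_T, \theta_P)$. Assuming independent trajectories, the likelihood of the demonstrations in $D$ can be expressed as

$$P(D \mid M, \theta) = \prod_{\tau \in D} P(s_0^\tau) \prod_{t=0}^{T_\tau} \pi_\theta(s_t^\tau, a_t^\tau)\, P_T(s_{t+1}^\tau \mid s_t^\tau, a_t^\tau). \qquad (2)$$

Note that the policy $\pi_\theta(s,a)$ depends on the parameters $\theta_R$, $\theta_{T_A}$, and $\theta_P$, whereas the transition model $P_T(s' \mid s,a)$ depends only on $\theta_T$. The maximum a posteriori estimator of the parameters can then be formulated as

$$\theta^* = \arg\max_\theta \; \log P(D \mid M, \theta) + \log P(\theta). \qquad (3)$$

We propose a gradient-based method to optimize the parameters according to Eq. (3), with $L_\theta(D) = \log P(D \mid M, \theta) + \log P(\theta)$:

$$\nabla_\theta L_\theta(D) = \sum_{\tau \in D} \sum_{t=0}^{T_\tau} \left[ \nabla_\theta \log \pi_\theta(s_t^\tau, a_t^\tau) + \nabla_\theta \log P_T(s_{t+1}^\tau \mid s_t^\tau, a_t^\tau) \right] + \nabla_\theta \log P(\theta). \qquad (4)$$

Since the system dynamics and the prior are problem-dependent, the following derivations focus on the partial derivative $\frac{\partial}{\partial \theta} \log \pi_\theta(s_t^\tau, a_t^\tau)$. This requires the stochastic policy mapping $\pi = g(q)$ of the expert to be specified. As an example, we derive the gradient for a Boltzmann policy with (inverse) temperature $\theta_P$:

$$\pi_\theta(s,a) = g(Q_\theta)(s,a) = \frac{\exp\left(\theta_P\, Q_\theta(s,a)\right)}{\sum_{a' \in A} \exp\left(\theta_P\, Q_\theta(s,a')\right)}.$$

The partial derivative of the log policy then results in:

$$\frac{\partial}{\partial \theta_i} \log \pi_\theta(s,a) = \begin{cases} \theta_P \left( \dfrac{\partial Q_\theta(s,a)}{\partial \theta_i} - \mathbb{E}_{\pi_\theta(s,a')}\left[ \dfrac{\partial Q_\theta(s,a')}{\partial \theta_i} \right] \right) & \text{if } \theta_i \neq \theta_P \\[2ex] Q_\theta(s,a) - \mathbb{E}_{\pi_\theta(s,a')}\left[ Q_\theta(s,a') \right] & \text{if } \theta_i = \theta_P \end{cases} \qquad (5)$$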
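For a single state, the Boltzmann mapping and Eq. (5) reduce to a few lines. The sketch below is plain Python with hypothetical function names; it assumes that $Q_\theta(s,\cdot)$ is given as a list and that the Q-gradient $\partial Q_\theta(s,\cdot)/\partial \theta_i$ for every parameter other than $\theta_P$ is given as a nested list (computing it is the subject of Eq. (6)). It also assumes, as in the setting above, that $Q_\theta$ does not depend on $\theta_P$.

```python
import math
from typing import List, Sequence, Tuple


def boltzmann_policy(q_row: Sequence[float], theta_p: float) -> List[float]:
    """pi(s, .) proportional to exp(theta_P * Q(s, .)), shifted by the max for stability."""
    scaled = [theta_p * q for q in q_row]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    z = sum(weights)
    return [w / z for w in weights]


def grad_log_policy(q_row: Sequence[float],
                    dq_row: Sequence[Sequence[float]],
                    theta_p: float,
                    a: int) -> Tuple[List[float], float]:
    """Eq. (5) for one state s and the demonstrated action a (hypothetical helper).

    q_row[b]     -- Q(s, b) for every action b
    dq_row[b][i] -- dQ(s, b)/d theta_i for every parameter theta_i other than theta_P
    Returns (d log pi(s, a)/d theta_i for those parameters, d log pi(s, a)/d theta_P).
    """
    pi = boltzmann_policy(q_row, theta_p)
    n_actions, n_params = len(q_row), len(dq_row[0])
    # Case theta_i != theta_P: theta_P * (dQ(s, a) - E_pi[dQ(s, a')])
    expected_dq = [sum(pi[b] * dq_row[b][i] for b in range(n_actions))
                   for i in range(n_params)]
    grad_other = [theta_p * (dq_row[a][i] - expected_dq[i]) for i in range(n_params)]
    # Case theta_i == theta_P: Q(s, a) - E_pi[Q(s, a')]
    expected_q = sum(p * q for p, q in zip(pi, q_row))
    return grad_other, q_row[a] - expected_q
```

Summing these per-state terms over all pairs $(s_t^\tau, a_t^\tau)$ in $D$ gives the policy part of Eq. (4). A central finite-difference check on a Q-function that is linear in some made-up parameters is a cheap way to validate such an implementation.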

The gradient of the policy depends on the gradient of the state-action value function $Q_\theta(s,a)$. Since we assume that the expert chooses actions based on an optimal, greedy value function, the derivative of the Q-function from Eq. (1) has to be computed. Strictly, this yields a subderivative, since the max function is not differentiable everywhere. Nevertheless, for the sake of simplicity, we call it the Q-gradient:

$$\frac{\partial Q_\theta(s,a)}{\partial \theta} = \frac{\partial \theta_R^\top}{\partial \theta} f(s,a) + \gamma \sum_{s' \in S} \frac{\partial P_{T_A}(s' \mid s,a)}{\partial \theta} V_\theta(s') + \gamma \sum_{s' \in S} P_{T_A}(s' \mid s,a) \frac{\partial Q_\theta(s', \pi^*(s'))}{\partial \theta} \qquad (6)$$

where $V_\theta(s') = \max_{a'} Q_\theta(s',a')$ and $\pi^*(s')$ is the greedy action in $s'$. Eq. (6) shares similarities with the approach of Neu and Szepesvári [3]. For a fixed greedy policy, it is a linear equation system in the unknown Q-gradients and can be solved directly. Alternatively, since it is a fixed-point equation (a $\gamma$-contraction, as in policy evaluation), repeatedly applying Eq. (6) to an arbitrary initial Q-gradient converges to the true one. Especially in large state and action spaces, this Q-gradient iteration can require fewer computations than directly solving the linear equation system. Algorithm 1 summarizes the proposed algorithm.

Algorithm 1 SERD algorithm
Require: MDP $M \setminus \{R, P_T, P_{T_A}, g(q)\}$, demonstrations $D$, initial $\theta_0$, step size $\alpha: \mathbb{N}^+ \to \mathbb{R}^+$, $t \leftarrow 0$
while not sufficiently converged do
  Q_θ ← QIteration(M, θ_t)  ▷ Eq. (1)
  π_θ ← DerivePolicy(M, Q_θ)  ▷ Eq. (5)
  ∂Q_θ ← ComputeQGradient(M, Q_θ, π_θ, θ_t)  ▷ Eq. (6)
  ∇L_θ(D) ← ComputeGradient(M, D, ∂Q_θ)  ▷ Eq. (4)
  θ_{t+1} ← θ_t + α(t) ∇L_θ(D)
  t ← t + 1
end while

4 Evaluation
We evaluate the proposed approach in a satellite gridworld navigation task, which is illustrated in Fig. 1. The motion dynamics are stochastic and differ between the forest and open terrain. The action space allows the agent to choose from five different actions: moving in one of four directions (north, east, south, or west) or remaining in the current state. Possible successor states are the four neighboring ones or the current one. On open terrain (depicted in light gray in Fig. 1(c)), the agent successfully executes the desired motion with probability 0.8 and slips to the right or to the left with probability 0.1 each. In the forest (depicted in dark gray in Fig. 1(c)), successful motions only occur with probability 0.3. The remaining successor states have a probability of … Staying in a state is always successful in both forest and open terrain.
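The open-terrain part of these dynamics also makes a convenient test bed for the Q-gradient of Eq. (6). The sketch below (plain Python; all names hypothetical) builds the slip model, runs value iteration (Eq. (1)), and then iterates Eq. (6) for the reward weights $\theta_R$ only. Because $P_{T_A}$ does not depend on $\theta_R$, the $\partial P_{T_A}/\partial\theta$ term of Eq. (6) drops out. Assumptions not taken from the paper: a small grid without forest cells (the forest's split of the remaining probability is not specified above), moves off the grid leave the agent in place, hand-picked features, and a fixed number of iterations instead of a convergence test.

```python
from typing import List, Tuple

# Action order: north, east, south, west, stay (as in the evaluation task).
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1), (0, 0)]
Model = List[List[List[Tuple[int, float]]]]  # P[s][a] -> [(s', prob), ...]


def slip_dynamics(rows: int, cols: int, p_success: float = 0.8) -> Model:
    """Open-terrain model: intended move with p_success, otherwise a slip to the
    left or right of the intended direction with equal probability; 'stay'
    always succeeds. Assumption: a move off the grid leaves the agent in place."""
    def target(r: int, c: int, k: int) -> int:
        nr, nc = r + MOVES[k][0], c + MOVES[k][1]
        if not (0 <= nr < rows and 0 <= nc < cols):
            nr, nc = r, c
        return nr * cols + nc

    slip = (1.0 - p_success) / 2.0
    P: Model = []
    for r in range(rows):
        for c in range(cols):
            # (k - 1) % 4 and (k + 1) % 4 are the directions left/right of k.
            row = [[(target(r, c, k), p_success),
                    (target(r, c, (k - 1) % 4), slip),
                    (target(r, c, (k + 1) % 4), slip)] for k in range(4)]
            row.append([(target(r, c, 4), 1.0)])
            P.append(row)
    return P


def q_and_q_gradient(P: Model, feats: List[List[List[float]]],
                     theta: List[float], gamma: float, iters: int = 1000):
    """Q* by value iteration (Eq. (1)) and dQ*/d theta_R by iterating Eq. (6).

    Reward R(s, a) = theta . f(s, a); P does not depend on theta_R, so only the
    recursive term of Eq. (6) remains besides f(s, a)."""
    S, A, d = len(P), len(P[0]), len(theta)
    R = [[sum(t * f for t, f in zip(theta, feats[s][a])) for a in range(A)]
         for s in range(S)]
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        V = [max(q) for q in Q]
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
              for a in range(A)] for s in range(S)]
    greedy = [max(range(A), key=Q[s].__getitem__) for s in range(S)]
    dQ = [[[0.0] * d for _ in range(A)] for _ in range(S)]  # arbitrary start
    for _ in range(iters):
        dV = [dQ[s][greedy[s]] for s in range(S)]  # dQ(s', pi*(s'))
        dQ = [[[feats[s][a][i] + gamma * sum(p * dV[s2][i] for s2, p in P[s][a])
                for i in range(d)] for a in range(A)] for s in range(S)]
    return Q, dQ
```

Comparing the returned Q-gradient with central finite differences of $Q^*$ is a quick sanity check; it only holds away from ties in the greedy policy, where the max in Eq. (1) is not differentiable.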
Due to this definition of the motion dynamics, the agent has to trade off between shortcuts through the forest, which are less likely to be successful, and longer paths on open terrain.

Fig. 1: (a) Environment, Map data: Google. (b) Discretized state space (Goal: green. Initial states: red.). (c) Forest states are indicated in dark gray and open terrain in light gray. (d) Reward. (e) Value function. (f) Expected state frequency.

The reward is a function of two features, which are weighted by $\theta_R = (6, 6)$. The first feature encodes the normalized gray-scale value $\in [0, 1]$ of the image, while the second one is a goal indicator $\in \{0, 1\}$. The discount factor is $\gamma = 0.99$, and the temperature of the Boltzmann policy was set to $\theta_P = 2$. We compute the optimal Q-function and sample trajectories from the resulting stochastic policy to obtain expert demonstrations. We assume that the expert knows the true transition model; therefore, the parameters of the transition models $\theta_{T_A}$ and $\theta_T$ are identical. The system dynamics are modeled as energies of Boltzmann distributions. Since there are 4 motion actions for each of the two terrain types (forest and open terrain), as well as one staying action, 9 models are trained with 5 possible outcomes each, resulting in 45 parameters. An m-estimator with a uniform prior is used to estimate the dynamics from demonstrations before applying SERD-SP or alternative IRL approaches. The feature weights are initialized randomly. We use Gaussian priors for the feature weights and the policy parameter; the prior of the dynamics favors high entropies.

We optimize all parameters for various sizes of demonstration sets with SERD-SP and compare it to Maximum Discounted Causal Entropy IRL [11] (MDCE IRL) and Relative Entropy IRL [6] (REIRL). The additional samples needed by REIRL are drawn from the m-estimated transition model. Fig. 2 summarizes the results. The median log likelihood of demonstrations from the true model under the learned models, shown in Fig. 2(a), indicates that SERD-SP outperforms the other algorithms while being sample efficient. This result is understandable, as the comparative approaches model different types of stochastic policies. In addition, Fig. 2(b) illustrates that SERD-SP further optimizes the initially m-estimated dynamics, which results in more accurate models.

Fig. 2: (a) Log likelihood of the demonstrations: median with quartiles of the log likelihood of demonstrations drawn from the true model under the estimated model. (b) KL divergence of the transition model: average Kullback-Leibler divergence between the estimated dynamics and the true ones. Both panels are plotted over the number of demonstrations |D|; the legend lists SERD-SP, MDCE IRL, and REIRL.

5 Conclusion
In this paper, we presented a gradient-based solution for the simultaneous estimation of rewards, dynamics, and the expert's stochastic policy. We assume that the expert is able to compute an optimal Q-function but executes suboptimal actions. This stochasticity is modeled by a parameterizable function of optimal Q-values. The evaluation shows improved performance compared to traditional IRL methods, with more accurate policies and dynamics. Future work could elaborate on different types of stochastic policies and on the case where the agent's estimate of the dynamics differs from the true one.

References
[1] Andrew Y. Ng and Stuart J. Russell. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
[2] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, New York, NY, USA. ACM.
[3] Gergely Neu and Csaba Szepesvári. Apprenticeship learning using inverse reinforcement learning and gradient methods. In UAI 2007, Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence, Vancouver, BC, Canada, July 19-22, 2007.
[4] Deepak Ramachandran and Eyal Amir. Bayesian Inverse Reinforcement Learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence.
[5] Constantin A. Rothkopf and Christos Dimitrakakis. Preference elicitation and inverse reinforcement learning. In ECML/PKDD (3), volume 6913 of Lecture Notes in Computer Science. Springer, 2011.
[6] Abdeslam Boularias, Jens Kober, and Jan Peters. Relative entropy inverse reinforcement learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), 2011.
[7] Edouard Klein, Matthieu Geist, Bilal Piot, and Olivier Pietquin. Inverse Reinforcement Learning through Structured Classification. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe (NV, USA), December 2012.
[8] Edouard Klein, Bilal Piot, Matthieu Geist, and Olivier Pietquin. A cascaded supervised learning approach to inverse reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2013), Prague (Czech Republic), September 2013.
[9] Brian D. Ziebart, Andrew Maas, J. Andrew (Drew) Bagnell, and Anind Dey. Maximum entropy inverse reinforcement learning. In Proceedings of AAAI 2008, July 2008.
[10] Brian D. Ziebart, J. Andrew Bagnell, and Anind K. Dey. Modeling interaction via the principle of maximum causal entropy. In Proceedings of the International Conference on Machine Learning, 2010.
[11] Michael Bloem and Nicholas Bambos. Infinite time horizon maximum causal entropy inverse reinforcement learning. In 53rd IEEE Conference on Decision and Control, CDC 2014, Los Angeles, CA, USA, December 15-17, 2014.

More information

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus ESI 34 tmospherc Dnmcs I Lesson 1 Vectors nd Vector lculus Reference: Schum s Outlne Seres: Mthemtcl Hndbook of Formuls nd Tbles Suggested Redng: Mrtn Secton 1 OORDINTE SYSTEMS n orthonorml coordnte sstem

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm

More information

Sequences of Intuitionistic Fuzzy Soft G-Modules

Sequences of Intuitionistic Fuzzy Soft G-Modules Interntonl Mthemtcl Forum, Vol 13, 2018, no 12, 537-546 HIKARI Ltd, wwwm-hkrcom https://doorg/1012988/mf201881058 Sequences of Intutonstc Fuzzy Soft G-Modules Velyev Kemle nd Huseynov Afq Bku Stte Unversty,

More information

Exploiting Structure in Probability Distributions Irit Gat-Viks

Exploiting Structure in Probability Distributions Irit Gat-Viks Explotng Structure n rolty Dstrutons Irt Gt-Vks Bsed on presentton nd lecture notes of Nr Fredmn, Herew Unversty Generl References: D. Koller nd N. Fredmn, prolstc grphcl models erl, rolstc Resonng n Intellgent

More information

An Ising model on 2-D image

An Ising model on 2-D image School o Coputer Scence Approte Inerence: Loopy Bele Propgton nd vrnts Prolstc Grphcl Models 0-708 Lecture 4, ov 7, 007 Receptor A Knse C Gene G Receptor B Knse D Knse E 3 4 5 TF F 6 Gene H 7 8 Hetunndn

More information

CHAPTER - 7. Firefly Algorithm based Strategic Bidding to Maximize Profit of IPPs in Competitive Electricity Market

CHAPTER - 7. Firefly Algorithm based Strategic Bidding to Maximize Profit of IPPs in Competitive Electricity Market CHAPTER - 7 Frefly Algorthm sed Strtegc Bddng to Mxmze Proft of IPPs n Compettve Electrcty Mrket 7. Introducton The renovton of electrc power systems plys mjor role on economc nd relle operton of power

More information

Two Activation Function Wavelet Network for the Identification of Functions with High Nonlinearity

Two Activation Function Wavelet Network for the Identification of Functions with High Nonlinearity Interntonl Journl of Engneerng & Computer Scence IJECS-IJENS Vol:1 No:04 81 Two Actvton Functon Wvelet Network for the Identfcton of Functons wth Hgh Nonlnerty Wsm Khld Abdulkder Abstrct-- The ntegrton

More information

A Reinforcement Learning System with Chaotic Neural Networks-Based Adaptive Hierarchical Memory Structure for Autonomous Robots

A Reinforcement Learning System with Chaotic Neural Networks-Based Adaptive Hierarchical Memory Structure for Autonomous Robots Interntonl Conference on Control, Automton nd ystems 008 Oct. 4-7, 008 n COEX, eoul, Kore A Renforcement ernng ystem wth Chotc Neurl Networs-Bsed Adptve Herrchcl Memory tructure for Autonomous Robots Msno

More information

Utility function estimation: The entropy approach

Utility function estimation: The entropy approach Physc A 387 (28) 3862 3867 www.elsever.com/locte/phys Utlty functon estmton: The entropy pproch Andre Donso,, A. Hetor Res b,c, Lus Coelho Unversty of Evor, Center of Busness Studes, CEFAGE-UE, Lrgo Colegs,

More information

Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service

Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service Dynmc Power Mngement n Moble Multmed System wth Gurnteed Qulty-of-Servce Abstrct In ths pper we ddress the problem of dynmc power mngement n dstrbuted multmed system wth requred qulty of servce (QoS).

More information

CIS587 - Artificial Intelligence. Uncertainty CIS587 - AI. KB for medical diagnosis. Example.

CIS587 - Artificial Intelligence. Uncertainty CIS587 - AI. KB for medical diagnosis. Example. CIS587 - rtfcl Intellgence Uncertnty K for medcl dgnoss. Exmple. We wnt to uld K system for the dgnoss of pneumon. rolem descrpton: Dsese: pneumon tent symptoms fndngs, l tests: Fever, Cough, leness, WC

More information

CHOVER-TYPE LAWS OF THE ITERATED LOGARITHM FOR WEIGHTED SUMS OF ρ -MIXING SEQUENCES

CHOVER-TYPE LAWS OF THE ITERATED LOGARITHM FOR WEIGHTED SUMS OF ρ -MIXING SEQUENCES CHOVER-TYPE LAWS OF THE ITERATED LOGARITHM FOR WEIGHTED SUMS OF ρ -MIXING SEQUENCES GUANG-HUI CAI Receved 24 September 2004; Revsed 3 My 2005; Accepted 3 My 2005 To derve Bum-Ktz-type result, we estblsh

More information

Review of linear algebra. Nuno Vasconcelos UCSD

Review of linear algebra. Nuno Vasconcelos UCSD Revew of lner lgebr Nuno Vsconcelos UCSD Vector spces Defnton: vector spce s set H where ddton nd sclr multplcton re defned nd stsf: ) +( + ) (+ )+ 5) λ H 2) + + H 6) 3) H, + 7) λ(λ ) (λλ ) 4) H, - + 8)

More information

GAUSS ELIMINATION. Consider the following system of algebraic linear equations

GAUSS ELIMINATION. Consider the following system of algebraic linear equations Numercl Anlyss for Engneers Germn Jordnn Unversty GAUSS ELIMINATION Consder the followng system of lgebrc lner equtons To solve the bove system usng clsscl methods, equton () s subtrcted from equton ()

More information

Bi-level models for OD matrix estimation

Bi-level models for OD matrix estimation TNK084 Trffc Theory seres Vol.4, number. My 2008 B-level models for OD mtrx estmton Hn Zhng, Quyng Meng Abstrct- Ths pper ntroduces two types of O/D mtrx estmton model: ME2 nd Grdent. ME2 s mxmum-entropy

More information

Numerical Solution of Fredholm Integral Equations of the Second Kind by using 2-Point Explicit Group Successive Over-Relaxation Iterative Method

Numerical Solution of Fredholm Integral Equations of the Second Kind by using 2-Point Explicit Group Successive Over-Relaxation Iterative Method ITERATIOAL JOURAL OF APPLIED MATHEMATICS AD IFORMATICS Volume 9, 5 umercl Soluton of Fredholm Integrl Equtons of the Second Knd by usng -Pont Eplct Group Successve Over-Relton Itertve Method Mohn Sundrm

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Statistics 423 Midterm Examination Winter 2009

Statistics 423 Midterm Examination Winter 2009 Sttstcs 43 Mdterm Exmnton Wnter 009 Nme: e-ml: 1. Plese prnt your nme nd e-ml ddress n the bove spces.. Do not turn ths pge untl nstructed to do so. 3. Ths s closed book exmnton. You my hve your hnd clcultor

More information

Electrochemical Thermodynamics. Interfaces and Energy Conversion

Electrochemical Thermodynamics. Interfaces and Energy Conversion CHE465/865, 2006-3, Lecture 6, 18 th Sep., 2006 Electrochemcl Thermodynmcs Interfces nd Energy Converson Where does the energy contrbuton F zϕ dn come from? Frst lw of thermodynmcs (conservton of energy):

More information

INTRODUCTION TO COMPLEX NUMBERS

INTRODUCTION TO COMPLEX NUMBERS INTRODUCTION TO COMPLEX NUMBERS The numers -4, -3, -, -1, 0, 1,, 3, 4 represent the negtve nd postve rel numers termed ntegers. As one frst lerns n mddle school they cn e thought of s unt dstnce spced

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

An Introduction to Support Vector Machines

An Introduction to Support Vector Machines An Introducton to Support Vector Mchnes Wht s good Decson Boundry? Consder two-clss, lnerly seprble clssfcton problem Clss How to fnd the lne (or hyperplne n n-dmensons, n>)? Any de? Clss Per Lug Mrtell

More information

Reinforcement learning II

Reinforcement learning II CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic

More information

Haddow s Experiment:

Haddow s Experiment: schemtc drwng of Hddow's expermentl set-up movng pston non-contctng moton sensor bems of sprng steel poston vres to djust frequences blocks of sold steel shker Hddow s Experment: terr frm Theoretcl nd

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ M/G//GD/ / System! Pollcze-Khnchn (PK) Equton L q 2 2 λ σ s 2( + ρ ρ! Stedy-stte probbltes! π 0 ρ! Fndng L, q, ) 2 2 M/M/R/GD/K/K System! Drw the trnston dgrm! Derve the stedy-stte probbltes:! Fnd L,L

More information

Altitude Estimation for 3-D Tracking with Two 2-D Radars

Altitude Estimation for 3-D Tracking with Two 2-D Radars th Interntonl Conference on Informton Fuson Chcgo Illnos USA July -8 Alttude Estmton for -D Trckng wth Two -D Rdrs Yothn Rkvongth Jfeng Ru Sv Svnnthn nd Soontorn Orntr Deprtment of Electrcl Engneerng Unversty

More information

Quiz: Experimental Physics Lab-I

Quiz: Experimental Physics Lab-I Mxmum Mrks: 18 Totl tme llowed: 35 mn Quz: Expermentl Physcs Lb-I Nme: Roll no: Attempt ll questons. 1. In n experment, bll of mss 100 g s dropped from heght of 65 cm nto the snd contner, the mpct s clled

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Decentralized Stochastic Planning with Anonymity in Interactions

Decentralized Stochastic Planning with Anonymity in Interactions Decentrlzed Stochstc Plnnng wth Anonymty n Interctons Prdeep Vrnthm, Yossr Adulys, Ptrc Jllet School of Informton Systems, Sngpore ngement Unversty Sngpore-IT Allnce for Reserch nd Technology (SART), sschusetts

More information

The Number of Rows which Equal Certain Row

The Number of Rows which Equal Certain Row Interntonl Journl of Algebr, Vol 5, 011, no 30, 1481-1488 he Number of Rows whch Equl Certn Row Ahmd Hbl Deprtment of mthemtcs Fcult of Scences Dmscus unverst Dmscus, Sr hblhmd1@gmlcom Abstrct Let be X

More information

Lecture 36. Finite Element Methods

Lecture 36. Finite Element Methods CE 60: Numercl Methods Lecture 36 Fnte Element Methods Course Coordntor: Dr. Suresh A. Krth, Assocte Professor, Deprtment of Cvl Engneerng, IIT Guwht. In the lst clss, we dscussed on the ppromte methods

More information

Advanced Machine Learning. An Ising model on 2-D image

Advanced Machine Learning. An Ising model on 2-D image Advnced Mchne Lernng Vrtonl Inference Erc ng Lecture 12, August 12, 2009 Redng: Erc ng Erc ng @ CMU, 2006-2009 1 An Isng model on 2-D mge odes encode hdden nformton ptchdentty. They receve locl nformton

More information

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics Michael Herman Tobias Gindele Jörg Wagner Felix Schmitt Wolfram Burgard Robert Bosch GmbH D-70442 Stuttgart, Germany

More information

Reproducing Kernel Hilbert Space for. Penalized Regression Multi-Predictors: Case in Longitudinal Data

Reproducing Kernel Hilbert Space for. Penalized Regression Multi-Predictors: Case in Longitudinal Data Interntonl Journl of Mthemtcl Anlyss Vol. 8, 04, no. 40, 95-96 HIKARI Ltd, www.m-hkr.com http://dx.do.org/0.988/jm.04.47 Reproducng Kernel Hlbert Spce for Penlzed Regresson Mult-Predctors: Cse n Longudnl

More information

Proactive and Reactive Coordination of Non-dedicated Agent Teams Operating in Uncertain Environments

Proactive and Reactive Coordination of Non-dedicated Agent Teams Operating in Uncertain Environments Proctve nd Rectve Coordnton of Non-dedcted Agent Tems Opertng n Uncertn Envronments Prtee Agrwl, Prdeep Vrknthm School of Informton Systems, Sngpore Mngement Unversty, Sngpore 188065 prtee.2013@phds.smu.edu.sg,

More information

State Estimation in TPN and PPN Guidance Laws by Using Unscented and Extended Kalman Filters

State Estimation in TPN and PPN Guidance Laws by Using Unscented and Extended Kalman Filters Stte Estmton n PN nd PPN Gudnce Lws by Usng Unscented nd Extended Klmn Flters S.H. oospour*, S. oospour**, mostf.sdollh*** Fculty of Electrcl nd Computer Engneerng, Unversty of brz, brz, Irn, *s.h.moospour@gml.com

More information

Manuel Pulido and Takemasa Miyoshi UMI IFAECI (CNRS-CONICET-UBA) Department of Atmospheric and.

Manuel Pulido and Takemasa Miyoshi UMI IFAECI (CNRS-CONICET-UBA) Department of Atmospheric and. Eplortory worshop DADA October 5-8 Buenos Ares Argentn Prmeter estmton usng EnKF Jun Ruz In colborton wth Mnuel Puldo nd Tems Myosh jruz@cm.cen.ub.r UMI IFAECI (CNRS-CONICET-UBA) Deprtment o Atmospherc

More information

INTERPOLATION(1) ELM1222 Numerical Analysis. ELM1222 Numerical Analysis Dr Muharrem Mercimek

INTERPOLATION(1) ELM1222 Numerical Analysis. ELM1222 Numerical Analysis Dr Muharrem Mercimek ELM Numercl Anlss Dr Muhrrem Mercmek INTEPOLATION ELM Numercl Anlss Some of the contents re dopted from Lurene V. Fusett, Appled Numercl Anlss usng MATLAB. Prentce Hll Inc., 999 ELM Numercl Anlss Dr Muhrrem

More information

Audio De-noising Analysis Using Diagonal and Non-Diagonal Estimation Techniques

Audio De-noising Analysis Using Diagonal and Non-Diagonal Estimation Techniques Audo De-nosng Anlyss Usng Dgonl nd Non-Dgonl Estmton Technques Sugt R. Pwr 1, Vshl U. Gdero 2, nd Rhul N. Jdhv 3 1 AISSMS, IOIT, Pune, Ind Eml: sugtpwr@gml.com 2 Govt Polytechnque, Pune, Ind Eml: vshl.gdero@gml.com

More information

Soft Set Theoretic Approach for Dimensionality Reduction 1

Soft Set Theoretic Approach for Dimensionality Reduction 1 Interntonl Journl of Dtbse Theory nd pplcton Vol No June 00 Soft Set Theoretc pproch for Dmensonlty Reducton Tutut Herwn Rozd Ghzl Mustf Mt Ders Deprtment of Mthemtcs Educton nversts hmd Dhln Yogykrt Indones

More information

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order Hndw Publshng Corporton Interntonl Journl of Dfferentl Equtons Volume 0, Artcle ID 7703, pges do:055/0/7703 Reserch Artcle On the Upper Bounds of Egenvlues for Clss of Systems of Ordnry Dfferentl Equtons

More information