Module 6: Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning, University of Waterloo
1 Module 6: Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning, University of Waterloo
2 Markov Decision Process Definition. Set of states: $S$. Set of actions (i.e., decisions): $A$. Transition model: $\Pr(s_t \mid s_{t-1}, a_{t-1})$. Reward model (i.e., utility): $R(s_t, a_t)$. Discount factor: $0 \le \gamma \le 1$. Horizon (i.e., # of time steps): $h$. Goal: find an optimal policy $\pi^*$.
3 Finite Horizon. Policy evaluation:
$V^\pi_h(s) = \sum_{t=0}^{h} \gamma^t \sum_{s'} \Pr(S_t = s' \mid S_0 = s, \pi)\, R(s', \pi_t(s'))$
Recursive form (dynamic programming):
$V^\pi_0(s) = R(s, \pi_0(s))$
$V^\pi_t(s) = R(s, \pi_t(s)) + \gamma \sum_{s'} \Pr(s' \mid s, \pi_t(s))\, V^\pi_{t-1}(s')$
4 Optimal Policy $\pi^*$: Finite Horizon. $V^{\pi^*}_h(s) \ge V^\pi_h(s) \quad \forall \pi, s$. Optimal value function $V^*$ (shorthand for $V^{\pi^*}$):
$V^*_0(s) = \max_a R(s, a)$
$V^*_t(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V^*_{t-1}(s') \right]$ (Bellman's equation)
5 Value Iteration Algorithm.
valueIteration(MDP):
  $V^*_0(s) \leftarrow \max_a R(s, a) \quad \forall s$
  For $t = 1$ to $h$ do
    $V^*_t(s) \leftarrow \max_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V^*_{t-1}(s') \right] \quad \forall s$
  Return $V^*$
Optimal policy $\pi^*$:
  $t = 0$: $\pi^*_0(s) \leftarrow \mathrm{argmax}_a\, R(s, a) \quad \forall s$
  $t > 0$: $\pi^*_t(s) \leftarrow \mathrm{argmax}_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V^*_{t-1}(s') \right] \quad \forall s$
NB: $\pi^*$ is non-stationary (i.e., time dependent).
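The tabular algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code; the 2-state MDP, its state and action names, and all rewards and probabilities below are made up for the example.

```python
# Finite-horizon value iteration (sketch of the algorithm above).
# R[s][a] is the reward; P[s][a][s2] is Pr(s2 | s, a).

def value_iteration(S, A, R, P, gamma, h):
    V = {s: max(R[s][a] for a in A) for s in S}                # V*_0
    policy = [{s: max(A, key=lambda a: R[s][a]) for s in S}]   # pi*_0
    for t in range(1, h + 1):
        Q = {s: {a: R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in S)
                 for a in A} for s in S}
        V = {s: max(Q[s][a] for a in A) for s in S}            # V*_t
        policy.append({s: max(A, key=lambda a: Q[s][a]) for s in S})
    return V, policy  # policy[t] is the decision rule with t steps to go

# Made-up 2-state example
S, A = ['s0', 's1'], ['stay', 'go']
R = {'s0': {'stay': 0.0, 'go': 1.0}, 's1': {'stay': 2.0, 'go': 0.0}}
P = {'s0': {'stay': {'s0': 1.0, 's1': 0.0}, 'go': {'s0': 0.2, 's1': 0.8}},
     's1': {'stay': {'s0': 0.0, 's1': 1.0}, 'go': {'s0': 1.0, 's1': 0.0}}}
V, pol = value_iteration(S, A, R, P, gamma=0.9, h=10)
```

Note that the returned policy is non-stationary, exactly as the NB on the slide says: `pol` holds one decision rule per remaining time step.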
6 Matrix form: Value Iteration.
$R^a$: $|S| \times 1$ column vector of rewards for action $a$
$V^*_t$: $|S| \times 1$ column vector of state values
$T^a$: $|S| \times |S|$ matrix of transition probabilities for action $a$
valueIteration(MDP):
  $V^*_0 \leftarrow \max_a R^a$
  For $t = 1$ to $h$ do
    $V^*_t \leftarrow \max_a \left[ R^a + \gamma T^a V^*_{t-1} \right]$
  Return $V^*$
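The matrix form makes each sweep a single vectorized update. A possible NumPy sketch, using the same made-up 2-state MDP as the illustration, with `T[a]` the $|S| \times |S|$ matrix for action $a$ and the max taken elementwise over actions:

```python
import numpy as np

# Matrix-form value iteration (made-up example MDP).
gamma, h = 0.9, 10
Ra = np.array([[0.0, 2.0],                      # rewards under action 0
               [1.0, 0.0]])                     # rewards under action 1
T = np.array([[[1.0, 0.0], [0.0, 1.0]],         # T^a for action 0
              [[0.2, 0.8], [1.0, 0.0]]])        # T^a for action 1
V = Ra.max(axis=0)                              # V*_0 = max_a R^a
for _ in range(h):
    V = (Ra + gamma * (T @ V)).max(axis=0)      # V*_t = max_a (R^a + gamma T^a V*_{t-1})
```

`T @ V` exploits NumPy's stacked-matrix multiplication: it multiplies every $T^a$ by $V$ at once, so the whole Bellman backup is two array operations.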
7 Infinite Horizon. Let $h \to \infty$. Then $V^\pi_h \to V^\pi_\infty$ and $V^\pi_{h-1} \to V^\pi_\infty$.
Policy evaluation: $V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s'} \Pr(s' \mid s, \pi(s))\, V^\pi(s')$
Bellman's equation: $V^*(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V^*(s') \right]$
8 Policy evaluation. Linear system of equations:
$V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s'} \Pr(s' \mid s, \pi(s))\, V^\pi(s')$
Matrix form:
$R$: $|S| \times 1$ column vector of state rewards for $\pi$
$V$: $|S| \times 1$ column vector of state values for $\pi$
$T$: $|S| \times |S|$ matrix of transition probabilities for $\pi$
$V = R + \gamma T V$
9 Solving linear equations. Linear system: $V = R + \gamma T V$.
Gaussian elimination: $(I - \gamma T) V = R$
Compute inverse: $V = (I - \gamma T)^{-1} R$
Iterative methods: value iteration (a.k.a. Richardson iteration): repeat $V \leftarrow R + \gamma T V$
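Both routes are easy to compare numerically. A sketch with a made-up fixed policy (its rewards $R$ and transition matrix $T$ are invented for illustration): solve $(I - \gamma T)V = R$ directly, then run the Richardson iteration and check the two answers agree.

```python
import numpy as np

# Policy evaluation two ways, for a made-up fixed policy.
gamma = 0.9
R = np.array([0.0, 2.0])                 # rewards under the policy
T = np.array([[0.2, 0.8],                # transition matrix under the policy
              [0.0, 1.0]])

# Direct solve: (I - gamma T) V = R
V_direct = np.linalg.solve(np.eye(2) - gamma * T, R)

# Richardson iteration: repeat V <- R + gamma T V
V = np.zeros(2)
for _ in range(500):
    V = R + gamma * T @ V
```

The direct solve costs $O(|S|^3)$ once; each Richardson sweep costs $O(|S|^2)$ and, as the next slides show, the error shrinks by a factor $\gamma$ per sweep.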
10 Contraction. Let $H(V) \equiv R + \gamma T V$ be the policy evaluation operator.
Lemma 1: $H$ is a contraction mapping: $\|H(V) - H(\tilde V)\|_\infty \le \gamma \|V - \tilde V\|_\infty$.
Proof:
$\|H(V) - H(\tilde V)\|_\infty = \|R + \gamma T V - R - \gamma T \tilde V\|_\infty$ (by definition)
$= \gamma \|T (V - \tilde V)\|_\infty$ (simplification)
$\le \gamma \|T\|_\infty \|V - \tilde V\|_\infty$ (since $\|AB\| \le \|A\| \|B\|$)
$= \gamma \|V - \tilde V\|_\infty$ (since $\|T\|_\infty = \max_s \sum_{s'} T(s, s') = 1$)
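Lemma 1 is easy to sanity-check empirically: apply $H$ to random pairs of value vectors and record the worst observed contraction ratio. The $R$ and $T$ below are the same made-up policy used earlier.

```python
import numpy as np

# Empirical check of Lemma 1: the contraction ratio never exceeds gamma.
rng = np.random.default_rng(0)
gamma = 0.9
R = np.array([0.0, 2.0])
T = np.array([[0.2, 0.8], [0.0, 1.0]])
H = lambda V: R + gamma * T @ V

worst = 0.0
for _ in range(100):
    V, W = rng.normal(size=2), rng.normal(size=2)
    lhs = np.max(np.abs(H(V) - H(W)))        # ||H(V) - H(W)||_inf
    rhs = np.max(np.abs(V - W))              # ||V - W||_inf
    worst = max(worst, lhs / rhs)
```

Because $H(V) - H(W) = \gamma T (V - W)$ and the rows of $T$ sum to 1, the ratio is bounded by $\gamma$ for every pair, which the random trials confirm.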
11 Convergence. Theorem 2: Policy evaluation converges to $V^\pi$ for any initial estimate $V$:
$\lim_{n \to \infty} H^{(n)}(V) = V^\pi \quad \forall V$
Proof: By definition $V^\pi = H^{(\infty)}(0)$, but policy evaluation computes $H^{(n)}(V)$ for any initial $V$. By Lemma 1, $\|H^{(n)}(V) - H^{(n)}(0)\|_\infty \le \gamma^n \|V - 0\|_\infty$. Hence, as $n \to \infty$, $\|H^{(n)}(V) - H^{(n)}(0)\|_\infty \to 0$ and $H^{(\infty)}(V) = V^\pi \quad \forall V$.
12 Approximate Policy Evaluation. In practice, we cannot perform an infinite number of iterations. Suppose that we perform value iteration for $k$ steps and $\|H^{(k)}(V) - H^{(k-1)}(V)\|_\infty = \varepsilon$: how far is $H^{(k)}(V)$ from $V^\pi$?
13 Approximate Policy Evaluation. Theorem 3: If $\|H^{(k)}(V) - H^{(k-1)}(V)\|_\infty \le \varepsilon$, then $\|V^\pi - H^{(k)}(V)\|_\infty \le \frac{\varepsilon}{1 - \gamma}$.
Proof:
$\|V^\pi - H^{(k)}(V)\|_\infty = \|H^{(\infty)}(V) - H^{(k)}(V)\|_\infty$ (by Theorem 2)
$= \left\| \sum_{t=1}^{\infty} \left[ H^{(t+k)}(V) - H^{(t+k-1)}(V) \right] \right\|_\infty$ (telescoping sum)
$\le \sum_{t=1}^{\infty} \|H^{(t+k)}(V) - H^{(t+k-1)}(V)\|_\infty$ (since $\|A + B\| \le \|A\| + \|B\|$)
$\le \sum_{t=1}^{\infty} \gamma^t \varepsilon = \frac{\gamma \varepsilon}{1 - \gamma} \le \frac{\varepsilon}{1 - \gamma}$ (by Lemma 1, applied $t$ times to each term)
14 Optimal Value Function. Non-linear system of equations:
$V^*(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V^*(s') \right]$
Matrix form:
$R^a$: $|S| \times 1$ column vector of rewards for action $a$
$V^*$: $|S| \times 1$ column vector of optimal values
$T^a$: $|S| \times |S|$ matrix of transition probabilities for action $a$
$V^* = \max_a \left[ R^a + \gamma T^a V^* \right]$
15 Contraction. Let $H^*(V) \equiv \max_a \left[ R^a + \gamma T^a V \right]$ be the operator in value iteration.
Lemma 3: $H^*$ is a contraction mapping: $\|H^*(V) - H^*(\tilde V)\|_\infty \le \gamma \|V - \tilde V\|_\infty$.
Proof: Without loss of generality, let $H^*(V)(s) \ge H^*(\tilde V)(s)$, and let
$a^*_s = \mathrm{argmax}_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V(s') \right]$
16 Contraction. Proof continued: Then
$0 \le H^*(V)(s) - H^*(\tilde V)(s)$ (by assumption)
$\le R(s, a^*_s) + \gamma \sum_{s'} \Pr(s' \mid s, a^*_s)\, V(s') - R(s, a^*_s) - \gamma \sum_{s'} \Pr(s' \mid s, a^*_s)\, \tilde V(s')$ (by definition of $a^*_s$, since $H^*(\tilde V)(s)$ maximizes over $a$)
$= \gamma \sum_{s'} \Pr(s' \mid s, a^*_s) \left[ V(s') - \tilde V(s') \right]$
$\le \gamma \sum_{s'} \Pr(s' \mid s, a^*_s)\, \|V - \tilde V\|_\infty$ (max-norm upper bound)
$= \gamma \|V - \tilde V\|_\infty$ (since $\sum_{s'} \Pr(s' \mid s, a^*_s) = 1$)
Repeat the same argument for $H^*(\tilde V)(s) \ge H^*(V)(s)$, and for each $s$.
17 Convergence. Theorem 4: Value iteration converges to $V^*$ for any initial estimate $V$:
$\lim_{n \to \infty} H^{*(n)}(V) = V^* \quad \forall V$
Proof: By definition $V^* = H^{*(\infty)}(0)$, but value iteration computes $H^{*(n)}(V)$ for some initial $V$. By Lemma 3, $\|H^{*(n)}(V) - H^{*(n)}(0)\|_\infty \le \gamma^n \|V - 0\|_\infty$. Hence, as $n \to \infty$, $\|H^{*(n)}(V) - H^{*(n)}(0)\|_\infty \to 0$ and $H^{*(\infty)}(V) = V^* \quad \forall V$.
18 Value Iteration. Even when the horizon is infinite, we perform finitely many iterations. Stop when $\|V_n - V_{n-1}\|_\infty \le \varepsilon$.
valueIteration(MDP):
  $V_0 \leftarrow \max_a R^a$; $n \leftarrow 0$
  Repeat
    $n \leftarrow n + 1$
    $V_n \leftarrow \max_a \left[ R^a + \gamma T^a V_{n-1} \right]$
  Until $\|V_n - V_{n-1}\|_\infty \le \varepsilon$
  Return $V_n$
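The stopping rule above turns the finite-horizon loop into a while-loop. A sketch on the same made-up 2-state MDP used in the earlier illustrations:

```python
import numpy as np

# Infinite-horizon value iteration with the epsilon stopping rule.
gamma, eps = 0.9, 1e-6
Ra = np.array([[0.0, 2.0], [1.0, 0.0]])          # made-up rewards per action
T = np.array([[[1.0, 0.0], [0.0, 1.0]],          # made-up T^a per action
              [[0.2, 0.8], [1.0, 0.0]]])

V = Ra.max(axis=0)                                # V_0 = max_a R^a
n = 0
while True:
    n += 1
    V_new = (Ra + gamma * (T @ V)).max(axis=0)    # V_n
    if np.max(np.abs(V_new - V)) <= eps:          # ||V_n - V_{n-1}|| <= eps
        V = V_new
        break
    V = V_new
```

On this example the fixed point has $V^*(s_1) = 2/(1-\gamma) = 20$ (collect reward 2 forever), so the loop should land within roughly $\varepsilon/(1-\gamma)$ of that value.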
19 Induced Policy. Since $\|V_n - V_{n-1}\|_\infty \le \varepsilon$, by Theorem 4 we know that $\|V_n - V^*\|_\infty \le \frac{\varepsilon}{1 - \gamma}$. But how good is the stationary policy $\pi_n(s)$ extracted based on $V_n$?
$\pi_n(s) = \mathrm{argmax}_a \left[ R(s, a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V_n(s') \right]$
How far is $V^{\pi_n}$ from $V^*$?
20 Induced Policy. Theorem 5: $\|V^{\pi_n} - V^*\|_\infty \le \frac{2\varepsilon}{1 - \gamma}$.
Proof:
$\|V^{\pi_n} - V^*\|_\infty = \|V^{\pi_n} - V_n + V_n - V^*\|_\infty$
$\le \|V^{\pi_n} - V_n\|_\infty + \|V_n - V^*\|_\infty$ (since $\|A + B\| \le \|A\| + \|B\|$)
$= \|H^{\pi_n(\infty)}(V_n) - V_n\|_\infty + \|V_n - H^{*(\infty)}(V_n)\|_\infty$
$\le \frac{\varepsilon}{1 - \gamma} + \frac{\varepsilon}{1 - \gamma} = \frac{2\varepsilon}{1 - \gamma}$ (by Theorems 2 and 4, noting $H^{\pi_n}(V_n) = H^*(V_n)$ so each one-step residual is at most $\varepsilon$, and applying the argument of Theorem 3 to each term)
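Theorem 5 can be verified numerically on the same made-up 2-state MDP: run value iteration to tolerance $\varepsilon$, extract the greedy stationary policy, evaluate it exactly by solving the linear system from slide 9, and compare against a near-exact $V^*$.

```python
import numpy as np

# Numeric check of Theorem 5 (made-up example MDP).
gamma, eps = 0.9, 1e-4
Ra = np.array([[0.0, 2.0], [1.0, 0.0]])
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.2, 0.8], [1.0, 0.0]]])

V = Ra.max(axis=0)
while True:                                       # iterate to tolerance eps
    V_new = (Ra + gamma * (T @ V)).max(axis=0)
    done = np.max(np.abs(V_new - V)) <= eps
    V = V_new
    if done:
        break

Q = Ra + gamma * (T @ V)                          # action values at V_n
pi = Q.argmax(axis=0)                             # greedy stationary policy pi_n

# Exact evaluation of pi_n: solve (I - gamma T^pi) V = R^pi
S = len(V)
T_pi = np.array([T[pi[s], s] for s in range(S)])
R_pi = np.array([Ra[pi[s], s] for s in range(S)])
V_pi = np.linalg.solve(np.eye(S) - gamma * T_pi, R_pi)

V_star = V.copy()                                 # near-exact V*: iterate much further
for _ in range(2000):
    V_star = (Ra + gamma * (T @ V_star)).max(axis=0)
gap = np.max(np.abs(V_pi - V_star))               # should be <= 2 eps / (1 - gamma)
```

On small examples the gap is usually far below the worst-case bound, because the greedy policy often becomes exactly optimal well before $V_n$ converges.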
21 Summary. Value iteration: a simple dynamic programming algorithm. Complexity: $O(n |A| |S|^2)$, where $n$ is the number of iterations. Can we optimize the policy directly instead of optimizing the value function and then inducing a policy? Yes: by policy iteration.