Module 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

1 Module 6 Value Iteration CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo

2 Markov Decision Process
Definition:
  Set of states: S
  Set of actions (i.e., decisions): A
  Transition model: Pr(s_t | s_{t-1}, a_{t-1})
  Reward model (i.e., utility): R(s_t, a_t)
  Discount factor: 0 ≤ γ ≤ 1
  Horizon (i.e., # of time steps): h
Goal: find an optimal policy π*
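
To make the notation concrete, here is a minimal sketch (not from the slides) of one way to store such an MDP with NumPy arrays. The container name MDP, the (|A|, |S|, |S|) transition layout, and the tiny two-state example are illustrative assumptions reused by the later sketches.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MDP:
    T: np.ndarray    # transition model, shape (|A|, |S|, |S|): T[a, s, s2] = Pr(s2 | s, a)
    R: np.ndarray    # reward model, shape (|S|, |A|): R[s, a]
    gamma: float     # discount factor, 0 <= gamma <= 1
    horizon: int     # horizon h (number of time steps)

# Tiny 2-state, 2-action example reused by the later sketches.
example = MDP(
    T=np.array([[[0.9, 0.1],
                 [0.2, 0.8]],
                [[0.5, 0.5],
                 [0.4, 0.6]]]),
    R=np.array([[1.0, 0.0],
                [0.0, 2.0]]),
    gamma=0.9,
    horizon=10,
)
```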

3 Finite Horizon
Policy evaluation:
  V_h^π(s) = Σ_{t=0}^{h} γ^t Σ_{s'} Pr(S_t = s' | S_0 = s, π) R(s', π_t(s'))
Recursive form (dynamic programming):
  V_0^π(s) = R(s, π_0(s))
  V_t^π(s) = R(s, π_t(s)) + γ Σ_{s'} Pr(s' | s, π_t(s)) V_{t-1}^π(s')
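
The recursion above translates directly into code. A sketch, assuming the illustrative MDP container from the previous block and a non-stationary policy given as a list of length h+1, where policy[t][s] is the action π_t(s) in the recursion; the function name is mine.

```python
import numpy as np

def evaluate_policy_finite(mdp, policy):
    """Finite-horizon policy evaluation by the dynamic-programming recursion."""
    n_states = mdp.R.shape[0]
    # Base case: V_0^pi(s) = R(s, pi_0(s))
    V = np.array([mdp.R[s, policy[0][s]] for s in range(n_states)])
    # Recursion: V_t^pi(s) = R(s, pi_t(s)) + gamma * sum_s' Pr(s'|s, pi_t(s)) * V_{t-1}^pi(s')
    for t in range(1, mdp.horizon + 1):
        a = policy[t]
        V = np.array([mdp.R[s, a[s]] + mdp.gamma * mdp.T[a[s], s] @ V
                      for s in range(n_states)])
    return V  # V_h^pi
```

For instance, evaluate_policy_finite(example, [np.zeros(2, dtype=int)] * (example.horizon + 1)) would evaluate the policy that always takes action 0 in the tiny example above.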

4 Optimal Policy π* (Finite Horizon)
  V_h^{π*}(s) ≥ V_h^π(s)  ∀π, ∀s
Optimal value function V* (shorthand for V^{π*}):
  V_0*(s) = max_a R(s, a)
  V_t*(s) = max_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V_{t-1}*(s') ]   (Bellman's equation)

5 Value Iteration Algorithm
valueIteration(MDP)
  V_0(s) ← max_a R(s, a)  ∀s
  For t = 1 to h do
    V_t(s) ← max_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V_{t-1}(s') ]  ∀s
  Return V_h
Optimal policy π*:
  t = 0:  π_0(s) ← argmax_a R(s, a)  ∀s
  t > 0:  π_t(s) ← argmax_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V_{t-1}(s') ]  ∀s
NB: π* is non-stationary (i.e., time dependent)
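
A loop-based sketch of the algorithm and the policy extraction above, again assuming the illustrative MDP container; it returns V_h together with the non-stationary policy π_0, ..., π_h.

```python
import numpy as np

def value_iteration_finite(mdp):
    n_actions, n_states, _ = mdp.T.shape
    V = mdp.R.max(axis=1)                  # V_0(s) = max_a R(s, a)
    policy = [mdp.R.argmax(axis=1)]        # pi_0(s) = argmax_a R(s, a)
    for t in range(1, mdp.horizon + 1):
        V_new = np.empty(n_states)
        pi_t = np.empty(n_states, dtype=int)
        for s in range(n_states):
            # q[a] = R(s, a) + gamma * sum_s' Pr(s'|s, a) * V_{t-1}(s')
            q = np.array([mdp.R[s, a] + mdp.gamma * mdp.T[a, s] @ V
                          for a in range(n_actions)])
            V_new[s], pi_t[s] = q.max(), q.argmax()
        V = V_new
        policy.append(pi_t)
    return V, policy
```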

6 Matrix Form: Value Iteration
  R^a: |S| × 1 column vector of rewards for action a
  V_t: |S| × 1 column vector of state values
  T^a: |S| × |S| matrix of transition probabilities for action a
valueIteration(MDP)
  V_0 ← max_a R^a
  For t = 1 to h do
    V_t ← max_a [ R^a + γ T^a V_{t-1} ]
  Return V_h
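
With the (|A|, |S|, |S|) layout assumed earlier, the matrix-form backup becomes a single vectorized step: R^a is a column of R, T^a is a slice of T, and the max is taken componentwise over actions. A sketch with names of my own choosing:

```python
import numpy as np

def bellman_backup(mdp, V):
    # Q[s, a] = R[s, a] + gamma * (T^a V)[s]; the max over actions is componentwise
    Q = mdp.R + mdp.gamma * np.einsum('asn,n->sa', mdp.T, V)
    return Q.max(axis=1)

def value_iteration_matrix(mdp):
    V = mdp.R.max(axis=1)              # V_0 = max_a R^a
    for _ in range(mdp.horizon):
        V = bellman_backup(mdp, V)     # V_t = max_a (R^a + gamma * T^a * V_{t-1})
    return V
```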

7 Infinite Horizon
Let h → ∞. Then V_h^π → V^π and V_{h-1}^π → V^π
Policy evaluation:
  V^π(s) = R(s, π(s)) + γ Σ_{s'} Pr(s' | s, π(s)) V^π(s')
Bellman's equation:
  V*(s) = max_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V*(s') ]

8 Policy Evaluation
Linear system of equations:
  V^π(s) = R(s, π(s)) + γ Σ_{s'} Pr(s' | s, π(s)) V^π(s')
Matrix form:
  R: |S| × 1 column vector of state rewards for π
  V: |S| × 1 column vector of state values for π
  T: |S| × |S| matrix of transition probabilities for π
  V = R + γ T V
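
For a stationary policy stored as an integer array pi with pi[s] the chosen action, the vector R and matrix T of this slide can be sliced out of the full model. A sketch under the same layout assumptions as before:

```python
import numpy as np

def policy_matrices(mdp, pi):
    """Return R_pi (|S| vector) and T_pi (|S| x |S| matrix) for stationary policy pi."""
    states = np.arange(mdp.R.shape[0])
    R_pi = mdp.R[states, pi]        # R_pi[s]     = R(s, pi(s))
    T_pi = mdp.T[pi, states, :]     # T_pi[s, s'] = Pr(s' | s, pi(s))
    return R_pi, T_pi
```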

9 Solving Linear Equations
Linear system: V = R + γ T V
  Gaussian elimination: (I − γT) V = R
  Compute inverse: V = (I − γT)^{-1} R
Iterative methods:
  Value iteration (a.k.a. Richardson iteration): repeat V ← R + γ T V
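
Two of these solution methods, sketched with NumPy: the direct solve corresponds to Gaussian elimination on (I − γT)V = R, and the loop is Richardson/value iteration. Function names and tolerances are my own.

```python
import numpy as np

def evaluate_policy_direct(R_pi, T_pi, gamma):
    """Solve (I - gamma*T) V = R exactly."""
    return np.linalg.solve(np.eye(len(R_pi)) - gamma * T_pi, R_pi)

def evaluate_policy_richardson(R_pi, T_pi, gamma, tol=1e-10, max_iter=100_000):
    """Repeat V <- R + gamma*T*V until successive iterates are within tol."""
    V = np.zeros_like(R_pi)
    for _ in range(max_iter):
        V_new = R_pi + gamma * T_pi @ V
        if np.max(np.abs(V_new - V)) <= tol:
            break
        V = V_new
    return V_new
```

For example, evaluate_policy_direct(*policy_matrices(example, np.array([0, 1])), example.gamma) evaluates the policy that takes action 0 in state 0 and action 1 in state 1 of the tiny example MDP.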

10 Contraction
Let H(V) ≐ R + γ T V be the policy evaluation operator.
Lemma 1: H is a contraction mapping:
  ||H(V) − H(V')||_∞ ≤ γ ||V − V'||_∞
Proof:
  ||H(V) − H(V')||_∞ = ||R + γTV − R − γTV'||_∞   (by definition)
  = γ ||T(V − V')||_∞   (simplification)
  ≤ γ ||T||_∞ ||V − V'||_∞   (since ||AB|| ≤ ||A|| ||B||)
  = γ ||V − V'||_∞   (since max_s Σ_{s'} T(s, s') = 1)
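
A quick numerical illustration of Lemma 1 (not a proof): for a random row-stochastic T and two arbitrary value vectors, one application of H shrinks their max-norm distance by at least a factor γ. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 0.9
T = rng.random((n, n)); T /= T.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
R = rng.random(n)
H = lambda V: R + gamma * T @ V                              # policy evaluation operator

V1, V2 = rng.normal(size=n), rng.normal(size=n)
lhs = np.max(np.abs(H(V1) - H(V2)))          # ||H(V1) - H(V2)||_inf
rhs = gamma * np.max(np.abs(V1 - V2))        # gamma * ||V1 - V2||_inf
assert lhs <= rhs + 1e-12                    # Lemma 1 holds on this instance
```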

11 Convergence
Theorem 2: Policy evaluation converges to V^π for any initial estimate V:
  lim_{n→∞} H^{(n)}(V) = V^π  ∀V
Proof:
  By definition V^π = H^{(∞)}(0), but policy evaluation computes H^{(∞)}(V) for an arbitrary initial V.
  By Lemma 1, ||H^{(n)}(V) − H^{(n)}(V')||_∞ ≤ γ^n ||V − V'||_∞
  Hence, as n → ∞, ||H^{(n)}(V) − H^{(n)}(0)||_∞ → 0, and so H^{(∞)}(V) = V^π  ∀V

12 Approximate Policy Evaluation
In practice, we can't perform an infinite number of iterations.
Suppose that we perform value iteration for k steps and ||H^{(k)}(V) − H^{(k-1)}(V)||_∞ = ε; how far is H^{(k)}(V) from V^π?

13 Approximate Policy Evaluation
Theorem 3: If ||H^{(k)}(V) − H^{(k-1)}(V)||_∞ ≤ ε, then ||V^π − H^{(k)}(V)||_∞ ≤ ε / (1 − γ)
Proof:
  ||V^π − H^{(k)}(V)||_∞ = ||H^{(∞)}(V) − H^{(k)}(V)||_∞   (by Theorem 2)
  = ||Σ_{t=1}^{∞} (H^{(t+k)}(V) − H^{(t+k-1)}(V))||_∞   (telescoping sum)
  ≤ Σ_{t=1}^{∞} ||H^{(t+k)}(V) − H^{(t+k-1)}(V)||_∞   (since ||A + B|| ≤ ||A|| + ||B||)
  ≤ Σ_{t=1}^{∞} γ^t ε = εγ / (1 − γ) ≤ ε / (1 − γ)   (by Lemma 1)
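
A small numerical illustration of Theorem 3 on a random policy-evaluation problem: iterate H until successive estimates differ by at most ε, then compare against the exact fixed point. Purely a sketch; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, gamma, eps = 4, 0.95, 1e-3
T = rng.random((n, n)); T /= T.sum(axis=1, keepdims=True)   # row-stochastic
R = rng.random(n)

V = np.zeros(n)
while True:
    V_new = R + gamma * T @ V                    # V_new = H^{(k)}(0) after k steps
    if np.max(np.abs(V_new - V)) <= eps:         # ||H^k V - H^{k-1} V||_inf <= eps
        break
    V = V_new

V_pi = np.linalg.solve(np.eye(n) - gamma * T, R)            # exact V^pi
assert np.max(np.abs(V_pi - V_new)) <= eps / (1 - gamma)    # Theorem 3 bound
```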

14 Optimal Value Function
Non-linear system of equations:
  V*(s) = max_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V*(s') ]
Matrix form:
  R^a: |S| × 1 column vector of rewards for action a
  V*: |S| × 1 column vector of optimal values
  T^a: |S| × |S| matrix of transition probabilities for action a
  V* = max_a [ R^a + γ T^a V* ]

15 Contraction
Let H*(V) ≐ max_a [ R^a + γ T^a V ] be the operator in value iteration.
Lemma 3: H* is a contraction mapping:
  ||H*(V) − H*(V')||_∞ ≤ γ ||V − V'||_∞
Proof: without loss of generality, let H*(V)(s) ≥ H*(V')(s), and
let a_s* = argmax_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V(s') ]

16 Contraction (proof continued)
Then
  0 ≤ H*(V)(s) − H*(V')(s)   (by assumption)
  ≤ R(s, a_s*) + γ Σ_{s'} Pr(s' | s, a_s*) V(s') − R(s, a_s*) − γ Σ_{s'} Pr(s' | s, a_s*) V'(s')   (by definition of a_s*)
  = γ Σ_{s'} Pr(s' | s, a_s*) (V(s') − V'(s'))
  ≤ γ Σ_{s'} Pr(s' | s, a_s*) ||V − V'||_∞   (max-norm upper bound)
  = γ ||V − V'||_∞   (since Σ_{s'} Pr(s' | s, a_s*) = 1)
Repeat the same argument for H*(V')(s) ≥ H*(V)(s), and for each s.

17 Convergence
Theorem 4: Value iteration converges to V* for any initial estimate V:
  lim_{n→∞} H*^{(n)}(V) = V*  ∀V
Proof:
  By definition V* = H*^{(∞)}(0), but value iteration computes H*^{(∞)}(V) for some initial V.
  By Lemma 3, ||H*^{(n)}(V) − H*^{(n)}(V')||_∞ ≤ γ^n ||V − V'||_∞
  Hence, as n → ∞, ||H*^{(n)}(V) − H*^{(n)}(0)||_∞ → 0, and so H*^{(∞)}(V) = V*  ∀V

18 Value Iteration
Even when the horizon is infinite, perform finitely many iterations.
Stop when ||V_n − V_{n-1}||_∞ ≤ ε
valueIteration(MDP)
  V_0 ← max_a R^a; n ← 0
  Repeat
    n ← n + 1
    V_n ← max_a [ R^a + γ T^a V_{n-1} ]
  Until ||V_n − V_{n-1}||_∞ ≤ ε
  Return V_n
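
A sketch of this stopping rule with the illustrative MDP layout used throughout (the horizon field is simply ignored here); it assumes γ < 1 so the loop terminates.

```python
import numpy as np

def value_iteration(mdp, eps=1e-6):
    """Infinite-horizon value iteration; stops when ||V_n - V_{n-1}||_inf <= eps."""
    V = mdp.R.max(axis=1)                                      # V_0 = max_a R^a
    while True:
        Q = mdp.R + mdp.gamma * np.einsum('asn,n->sa', mdp.T, V)
        V_new = Q.max(axis=1)                                  # V_n = max_a (R^a + gamma T^a V_{n-1})
        if np.max(np.abs(V_new - V)) <= eps:
            return V_new
        V = V_new
```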

19 Induced Policy
Since ||V_n − V_{n-1}||_∞ ≤ ε, by Theorem 4 we know that ||V_n − V*||_∞ ≤ ε / (1 − γ).
But how good is the stationary policy π_n(s) extracted from V_n?
  π_n(s) = argmax_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V_n(s') ]
How far is V^{π_n} from V*?
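
Extracting the induced stationary policy is one argmax per state; a sketch matching the expression above, again with my own names and the assumed array layout.

```python
import numpy as np

def induced_policy(mdp, V_n):
    # pi_n(s) = argmax_a [ R(s, a) + gamma * sum_s' Pr(s'|s, a) * V_n(s') ]
    Q = mdp.R + mdp.gamma * np.einsum('asn,n->sa', mdp.T, V_n)
    return Q.argmax(axis=1)
```

For instance, induced_policy(example, value_iteration(example)) returns the greedy policy for the tiny example MDP sketched earlier.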

20 Induced Policy
Theorem 5: ||V^{π_n} − V*||_∞ ≤ 2ε / (1 − γ)
Proof:
  ||V^{π_n} − V*||_∞ = ||V^{π_n} − V_n + V_n − V*||_∞
  ≤ ||V^{π_n} − V_n||_∞ + ||V_n − V*||_∞   (since ||A + B|| ≤ ||A|| + ||B||)
  = ||H_{π_n}^{(∞)}(V_n) − V_n||_∞ + ||V_n − H*^{(∞)}(V_n)||_∞   (by Theorems 2 and 4)
  ≤ ε / (1 − γ) + ε / (1 − γ)
  = 2ε / (1 − γ)

21 Summary
Value iteration: a simple dynamic programming algorithm
Complexity: O(n |A| |S|^2), where n is the number of iterations
Can we optimize the policy directly, instead of optimizing the value function and then inducing a policy?
Yes: by policy iteration
