RECURSION EQUATION FOR
|
|
- Jocelin Jennings
- 5 years ago
- Views:
Transcription
1 Math 46 Lecture 8 Infinite Horizon discounted reward problem From the last lecture: The value function of policy u for the infinite horizon problem with discount factor a and initial state i is W i, u E n 0 n r X n, u X n X 0 i. The optimal value function is W i max u W i, u. u* is an optimal policy iff each initial state i, W i, u W i. RECURSION EQUATION FOR W i, u. W i, u r i, u i j T u i i, j W j, u. W i W i max a A r i, a j T a i, j W j. DYNAMIC PROGRAMMING EQUATION FOR. Let u* be the policy such that u i the a which gives the max value for W i. Then u* is an optimal and W i, u W i.
2 Approximation procedure for W i : Suppose the states are S i 0, i, i,.... Let w be the column vector of values values W(i): w w 0, w, w,... T W i 0, W i, W i,... T (superscript T means transpose). Let w 0 0, 0, 0,... T. Define w n by: w n i max a r i,a j T a i,j w n i. This sequence converges to W i (by Lecture 8). This w n i determines a maximizing action u i = a. Continue the approximations until this action at one stage is the same as in the next. Check to see if the policy u i is an optimal policy. u i is optimal iff W i, u W i iff W i, u satisfies the dynamic programming equation for W i iff W i,u max a A r i,a j T a i,j W j,u. If not optimal, continue the approximation procedure. If the dynamic programming equation does hold for all states i, then u i is the optimal policy u* and W i W i, u is the exact value of W i.
3 Now compute W i, u from u via matrix algebra. Suppose the state space is S = {0,,,...}. LEMMA. Suppose u is the action policy. Let T u i, j be the transition matrix for going from state i to state j according to u. Let r u r 0, u 0, r, u, r, u,... T be the rewards when following policy u. Let X x 0, x, x,... T W 0, u, W, u, W, u,... T be the expected values. Then X is the solution to the matrix equation I T u X r u Proof. Since X i W i, u, the equation W i,u r i,u i j T u i i,j W j,u for all i, becomes,
4 Now compute W i, u from u via matrix algebra. Suppose the state space is S = {0,,,...}. LEMMA. Suppose u is the action policy. Let T u i, j be the transition matrix for going from state i to state j according to u. Let r u r 0, u 0, r, u, r, u,... T be the rewards when following policy u. Let X x 0, x, x,... T W 0, u, W, u, W, u,... T be the expected values. Then X is the solution to the matrix equation I T u X r u Proof. Since X i W i, u, the equation W i,u r i,u i j T u i i,j W j,u for all i, becomes, X i r i,u i j T u i i,j X j and hence..., X i..., T..., r i,u i j T u i i,j X j,... T. Note that j T u i i,j X j is the matrix product T u X. Hence X r u T u X. Hence I T u X r u.
5 `A Markov decision problem has two states i=, and two actions a = c, d. The discount is a =.9. The rewards r i, a are r, c 5, r, c, r, d 4, r, d 3. The transition matrices for actions a = c, d are T c / /3 / /3 w() w() T d /4 /3 3/4 /3 Which state does c favor? Which state does d favor? Find an optimal policy u i. Solution. w n i max a c,d r i, a j T a i, j w n i w n i max r i,c.9 j T c i,j w n i, r i,d.9 j T d i,j w n i w n?? w n?? w() w() w 0 w 0?? w?? with action u?? w?? with action u??
6 `A Markov decision problem has two states i=, and two actions a = c, d. The discount is a =.9. The rewards r i, a are r, c 5, r, c, r, d 4, r, d 3. The transition matrices for actions a = c, d are T c / /3 / /3 w() w() T d /4 /3 3/4 /3 Find an optimal policy u i. Solution. w n i max a r i,a j T a i,j w n i w n i max r i,c.9 j T c i,j w n i, r i,d.9 j T d i,j w n i w n max 5.9 w n w n, w n 3 4 w n w n max.9 3 w n 3 w n, w n 3 w n w 0 w 0 0 w() w()
7 w max , max 5, 4 5, with action u c w max.9 0 0, max, 3 3, with action u d w max , max 5, 4 5, with action u c w max.9 0 0, max, 3 3, with action u d w n max 5.9 w n w n, w n 3 4 w n w n max.9 3 w n 3 w n, w n 3 w n c d w??... max 8.6, 7.5 with action u?? w??... max 5.9, 6.3 with action u??
8 w max , max 5.9 4, max 8.6, with action u c. w max , max.9 3 3, max 5.9, with action u d. T c T d / / w() /4 3/4 w() /3 /3 w() /3 /3 w() For both w and w, u and u are the same. Hence we test to see if u c, u d is an optimal policy.???? For this u, T u.????
9 w max , max 5.9 4, max 8.6, with action u c. w max , max.9 3 3, max 5.9, with action u d. T c T d / / w() /4 3/4 w() /3 /3 w() /3 /3 w() For both w and w, u and u are the same. Hence we test to see if u c, u d is an optimal policy. / / For this u, T u. /3 /3 The first row of T u i i, j is from T c since u c The second row is from since u d. T d
10 The rewards r i, a are r, c 5, r, c, r, d 4, r, d 3. The policy is u c, u d. Thus r u r, u, r, u T =??
11 r u r, u, r, u T r, c, r, d T 5, 3 T. The equation I T u X r u becomes.9.9 x which is the system y 3 0 x 9 0 y 5 x 9y x 4 0 y 3 3x 4y 30 With solution x W, u, y W, u. Now check if W i, u satisfies the dynamic programming equation for W i : W i, u max a A r i, a j T a i, j W j, u. If so, then W i W i, u and u is optimal. For i = we get W, u max a A r, a.9 j T a, j W j, u iff max 5.9, 39.4 max 39.4, 37.8 Likewise for i = iff true. 670,4.9 4, iff
12 Suppose you have invested in a stock. When should you sell it? To make money, you must sell if for more than the amount originally paid. When the stock is selling at a high price, should you sell or wait for a higher price? For the case of the unpredictable stock market the problem is intractable. We solve this problem for Markov chains with known transition probabilities. S = {0,,,... } be the set of states for a Markov chain. T i, j is the probability of going from i to j assuming you choose to continue. r i is the reward to state i. This reward is collected iff you stop at that state. At each state i, we chose an action ai{s, c}. Choose a = s if you wish to stop and collect the state s reward. Thus r i, s r i. Choose a = c if you wish to continue. You skip the state s reward. Thus r i, c 0.
13 You continue from state to state according to the transition probabilities until you stop at a state and collect its reward. A solution to a stopping problem is a policy u i which determines, for each state i, if you should stop ( u i s) and collect the reward r i, s r i or if you should continue ( u i c) and skip the reward r i, c 0 (hoping for getting better). Let W i be the expected reward if you start at i and act optimally. Clearly W i r i. W i r i if you stop at i instead of continuing on to other states.
14 ` For each state i decide if we should stop and collect the reward, or continue. The bracketed numbers [ r(i)] are the rewards. Put the correct action u(i) in the space between ( ) s. At the bottom of each node list the expected value W i if you start at i and act optimally. Convention: unlabeled arrows have probability. You must stop sometime, infinite loops are not allowed. []( ) /3 []( ) [3]( ) /3 [0]( ) /3 /3 /3 [0]( ) / 3/4 [5]( ) /4 / [6]( )
15 []( ) [6]( ) / / [6]( ) 3[5]( ) []( ) [4]( ) / [0]( ) / [0]( ) / [0]( ) RULES FOR INITIAL KNOWN VALUES. What should you do in each case? v If r i is the maximum reward. v If r i is > all rewards reachable from i. v v If i is a dead-end state (you can't leave). If i is in an irreducible class. v If all next states j have known values W j. /
Motivation for introducing probabilities
for introducing probabilities Reaching the goals is often not sufficient: it is important that the expected costs do not outweigh the benefit of reaching the goals. 1 Objective: maximize benefits - costs.
More information1 Markov decision processes
2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe
More informationSection Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018
Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections
More informationElementary maths for GMT
Elementary maths for GMT Linear Algebra Part 2: Matrices, Elimination and Determinant m n matrices The system of m linear equations in n variables x 1, x 2,, x n a 11 x 1 + a 12 x 2 + + a 1n x n = b 1
More informationMATH Mathematics for Agriculture II
MATH 10240 Mathematics for Agriculture II Academic year 2018 2019 UCD School of Mathematics and Statistics Contents Chapter 1. Linear Algebra 1 1. Introduction to Matrices 1 2. Matrix Multiplication 3
More informationOptimal Stopping Problems
2.997 Decision Making in Large Scale Systems March 3 MIT, Spring 2004 Handout #9 Lecture Note 5 Optimal Stopping Problems In the last lecture, we have analyzed the behavior of T D(λ) for approximating
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationDecision Theory: Q-Learning
Decision Theory: Q-Learning CPSC 322 Decision Theory 5 Textbook 12.5 Decision Theory: Q-Learning CPSC 322 Decision Theory 5, Slide 1 Lecture Overview 1 Recap 2 Asynchronous Value Iteration 3 Q-Learning
More informationMarkov Decision Processes
Markov Decision Processes Lecture notes for the course Games on Graphs B. Srivathsan Chennai Mathematical Institute, India 1 Markov Chains We will define Markov chains in a manner that will be useful to
More informationDecision Theory: Markov Decision Processes
Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies
More informationMarkov decision processes
CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only
More informationRandom Times and Their Properties
Chapter 6 Random Times and Their Properties Section 6.1 recalls the definition of a filtration (a growing collection of σ-fields) and of stopping times (basically, measurable random times). Section 6.2
More informationECE-517: Reinforcement Learning in Artificial Intelligence. Lecture 4: Discrete-Time Markov Chains
ECE-517: Reinforcement Learning in Artificial Intelligence Lecture 4: Discrete-Time Markov Chains September 1, 215 Dr. Itamar Arel College of Engineering Department of Electrical Engineering & Computer
More informationLecture notes for Analysis of Algorithms : Markov decision processes
Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with
More informationMATH 56A SPRING 2008 STOCHASTIC PROCESSES
MATH 56A SPRING 008 STOCHASTIC PROCESSES KIYOSHI IGUSA Contents 4. Optimal Stopping Time 95 4.1. Definitions 95 4.. The basic problem 95 4.3. Solutions to basic problem 97 4.4. Cost functions 101 4.5.
More informationMarkov Chains and Related Matters
Markov Chains and Related Matters 2 :9 3 4 : The four nodes are called states. The numbers on the arrows are called transition probabilities. For example if we are in state, there is a probability of going
More information1 Gambler s Ruin Problem
1 Gambler s Ruin Problem Consider a gambler who starts with an initial fortune of $1 and then on each successive gamble either wins $1 or loses $1 independent of the past with probabilities p and q = 1
More informationFinite Math - J-term Section Systems of Linear Equations in Two Variables Example 1. Solve the system
Finite Math - J-term 07 Lecture Notes - //07 Homework Section 4. - 9, 0, 5, 6, 9, 0,, 4, 6, 0, 50, 5, 54, 55, 56, 6, 65 Section 4. - Systems of Linear Equations in Two Variables Example. Solve the system
More informationClassification of Countable State Markov Chains
Classification of Countable State Markov Chains Friday, March 21, 2014 2:01 PM How can we determine whether a communication class in a countable state Markov chain is: transient null recurrent positive
More informationMarkov decision processes and interval Markov chains: exploiting the connection
Markov decision processes and interval Markov chains: exploiting the connection Mingmei Teo Supervisors: Prof. Nigel Bean, Dr Joshua Ross University of Adelaide July 10, 2013 Intervals and interval arithmetic
More informationUncertainty Per Krusell & D. Krueger Lecture Notes Chapter 6
1 Uncertainty Per Krusell & D. Krueger Lecture Notes Chapter 6 1 A Two-Period Example Suppose the economy lasts only two periods, t =0, 1. The uncertainty arises in the income (wage) of period 1. Not that
More informationPortfolio Optimization
Statistical Techniques in Robotics (16-831, F12) Lecture#12 (Monday October 8) Portfolio Optimization Lecturer: Drew Bagnell Scribe: Ji Zhang 1 1 Portfolio Optimization - No Regret Portfolio We want to
More informationCLASS 12 ALGEBRA OF MATRICES
CLASS 12 ALGEBRA OF MATRICES Deepak Sir 9811291604 SHRI SAI MASTERS TUITION CENTER CLASS 12 A matrix is an ordered rectangular array of numbers or functions. The numbers or functions are called the elements
More informationMATH 56A: STOCHASTIC PROCESSES CHAPTER 1
MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter
More informationThe Boundary Problem: Markov Chain Solution
MATH 529 The Boundary Problem: Markov Chain Solution Consider a random walk X that starts at positive height j, and on each independent step, moves upward a units with probability p, moves downward b units
More informationMATH 304 Linear Algebra Lecture 10: Linear independence. Wronskian.
MATH 304 Linear Algebra Lecture 10: Linear independence. Wronskian. Spanning set Let S be a subset of a vector space V. Definition. The span of the set S is the smallest subspace W V that contains S. If
More informationOn Polynomial Cases of the Unichain Classification Problem for Markov Decision Processes
On Polynomial Cases of the Unichain Classification Problem for Markov Decision Processes Eugene A. Feinberg Department of Applied Mathematics and Statistics State University of New York at Stony Brook
More informationReinforcement Learning
1 Reinforcement Learning Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Plan 1 Why reinforcement learning? Where does this theory come from? Markov decision
More informationMarkov Chains Handout for Stat 110
Markov Chains Handout for Stat 0 Prof. Joe Blitzstein (Harvard Statistics Department) Introduction Markov chains were first introduced in 906 by Andrey Markov, with the goal of showing that the Law of
More informationSTOCHASTIC MODELS FOR RELIABILITY, AVAILABILITY, AND MAINTAINABILITY
STOCHASTIC MODELS FOR RELIABILITY, AVAILABILITY, AND MAINTAINABILITY Ph.D. Assistant Professor Industrial and Systems Engineering Auburn University RAM IX Summit November 2 nd 2016 Outline Introduction
More informationSample questions for COMP-424 final exam
Sample questions for COMP-44 final exam Doina Precup These are examples of questions from past exams. They are provided without solutions. However, Doina and the TAs would be happy to answer questions
More informationDiscrete-Time Markov Decision Processes
CHAPTER 6 Discrete-Time Markov Decision Processes 6.0 INTRODUCTION In the previous chapters we saw that in the analysis of many operational systems the concepts of a state of a system and a state transition
More informationMATH 445/545 Homework 1: Due February 11th, 2016
MATH 445/545 Homework 1: Due February 11th, 2016 Answer the following questions Please type your solutions and include the questions and all graphics if needed with the solution 1 A business executive
More information56:171 Operations Research Final Exam December 12, 1994
56:171 Operations Research Final Exam December 12, 1994 Write your name on the first page, and initial the other pages. The response "NOTA " = "None of the above" Answer both parts A & B, and five sections
More informationMathematical Games and Random Walks
Mathematical Games and Random Walks Alexander Engau, Ph.D. Department of Mathematical and Statistical Sciences College of Liberal Arts and Sciences / Intl. College at Beijing University of Colorado Denver
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationReinforcement Learning and Deep Reinforcement Learning
Reinforcement Learning and Deep Reinforcement Learning Ashis Kumer Biswas, Ph.D. ashis.biswas@ucdenver.edu Deep Learning November 5, 2018 1 / 64 Outlines 1 Principles of Reinforcement Learning 2 The Q
More informationIEOR 6711, HMWK 5, Professor Sigman
IEOR 6711, HMWK 5, Professor Sigman 1. Semi-Markov processes: Consider an irreducible positive recurrent discrete-time Markov chain {X n } with transition matrix P (P i,j ), i, j S, and finite state space.
More informationStochastic Shortest Path Problems
Chapter 8 Stochastic Shortest Path Problems 1 In this chapter, we study a stochastic version of the shortest path problem of chapter 2, where only probabilities of transitions along different arcs can
More informationLinear Algebra V = T = ( 4 3 ).
Linear Algebra Vectors A column vector is a list of numbers stored vertically The dimension of a column vector is the number of values in the vector W is a -dimensional column vector and V is a 5-dimensional
More informationProgrammers A B C D Solution:
P a g e Q: A firm has normally distributed forecast of usage with MAD=0 units. It desires a service level, which limits the stock, out to one order cycle per year. Determine Standard Deviation (SD), if
More information3. Replace any row by the sum of that row and a constant multiple of any other row.
Section. Solution of Linear Systems by Gauss-Jordan Method A matrix is an ordered rectangular array of numbers, letters, symbols or algebraic expressions. A matrix with m rows and n columns has size or
More information9 - Markov processes and Burt & Allison 1963 AGEC
This document was generated at 8:37 PM on Saturday, March 17, 2018 Copyright 2018 Richard T. Woodward 9 - Markov processes and Burt & Allison 1963 AGEC 642-2018 I. What is a Markov Chain? A Markov chain
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationToday s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes
Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks
More information1. (3 pts) In MDPs, the values of states are related by the Bellman equation: U(s) = R(s) + γ max a
3 MDP (2 points). (3 pts) In MDPs, the values of states are related by the Bellman equation: U(s) = R(s) + γ max a s P (s s, a)u(s ) where R(s) is the reward associated with being in state s. Suppose now
More informationDynamic stochastic game and macroeconomic equilibrium
Dynamic stochastic game and macroeconomic equilibrium Tianxiao Zheng SAIF 1. Introduction We have studied single agent problems. However, macro-economy consists of a large number of agents including individuals/households,
More information1/2 1/2 1/4 1/4 8 1/2 1/2 1/2 1/2 8 1/2 6 P =
/ 7 8 / / / /4 4 5 / /4 / 8 / 6 P = 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Andrei Andreevich Markov (856 9) In Example. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 P (n) = 0
More informationLinear Algebra: Lecture Notes. Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway
Linear Algebra: Lecture Notes Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway November 6, 23 Contents Systems of Linear Equations 2 Introduction 2 2 Elementary Row
More informationNovember 28 th, Carlos Guestrin 1. Lower dimensional projections
PCA Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University November 28 th, 2007 1 Lower dimensional projections Rather than picking a subset of the features, we can new features that are
More informationMatrix Multiplication
3.2 Matrix Algebra Matrix Multiplication Example Foxboro Stadium has three main concession stands, located behind the south, north and west stands. The top-selling items are peanuts, hot dogs and soda.
More informationMATH 445/545 Test 1 Spring 2016
MATH 445/545 Test Spring 06 Note the problems are separated into two sections a set for all students and an additional set for those taking the course at the 545 level. Please read and follow all of these
More informationA Computational Method for Multidimensional Continuous-choice. Dynamic Problems
A Computational Method for Multidimensional Continuous-choice Dynamic Problems (Preliminary) Xiaolu Zhou School of Economics & Wangyannan Institution for Studies in Economics Xiamen University April 9,
More informationLinear Algebra review Powers of a diagonalizable matrix Spectral decomposition
Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Prof. Tesler Math 283 Fall 2018 Also see the separate version of this with Matlab and R commands. Prof. Tesler Diagonalizing
More informationSome notes on Markov Decision Theory
Some notes on Markov Decision Theory Nikolaos Laoutaris laoutaris@di.uoa.gr January, 2004 1 Markov Decision Theory[1, 2, 3, 4] provides a methodology for the analysis of probabilistic sequential decision
More informationAM 121: Intro to Optimization Models and Methods
AM 121: Intro to Optimization Models and Methods Fall 2017 Lecture 2: Intro to LP, Linear algebra review. Yiling Chen SEAS Lecture 2: Lesson Plan What is an LP? Graphical and algebraic correspondence Problems
More informationDefinition of Value, Q function and the Bellman equations
Definition of Value, Q function and the Bellman equations Alex Binder @SUTD August 16, 2016 1 the definition of value Ever concerned how the definition of a policy value V π (s) (as an expectation of discounted
More informationPower Allocation over Two Identical Gilbert-Elliott Channels
Power Allocation over Two Identical Gilbert-Elliott Channels Junhua Tang School of Electronic Information and Electrical Engineering Shanghai Jiao Tong University, China Email: junhuatang@sjtu.edu.cn Parisa
More informationControl Theory : Course Summary
Control Theory : Course Summary Author: Joshua Volkmann Abstract There are a wide range of problems which involve making decisions over time in the face of uncertainty. Control theory draws from the fields
More informationOR MSc Maths Revision Course
OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision
More informationMatrices BUSINESS MATHEMATICS
Matrices BUSINESS MATHEMATICS 1 CONTENTS Matrices Special matrices Operations with matrices Matrix multipication More operations with matrices Matrix transposition Symmetric matrices Old exam question
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Reinforcement learning Daniel Hennes 4.12.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Reinforcement learning Model based and
More informationEconomics 2010c: Lectures 9-10 Bellman Equation in Continuous Time
Economics 2010c: Lectures 9-10 Bellman Equation in Continuous Time David Laibson 9/30/2014 Outline Lectures 9-10: 9.1 Continuous-time Bellman Equation 9.2 Application: Merton s Problem 9.3 Application:
More informationChapter 16 focused on decision making in the face of uncertainty about one future
9 C H A P T E R Markov Chains Chapter 6 focused on decision making in the face of uncertainty about one future event (learning the true state of nature). However, some decisions need to take into account
More informationCSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming
CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming
More informationThe Mabinogion Sheep Problem
The Mabinogion Sheep Problem Kun Dong Cornell University April 22, 2015 K. Dong (Cornell University) The Mabinogion Sheep Problem April 22, 2015 1 / 18 Introduction (Williams 1991) we are given a herd
More informationLECTURE 3. Last time:
LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate
More informationMS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction
MS&E338 Reinforcement Learning Lecture 1 - April 2, 2018 Introduction Lecturer: Ben Van Roy Scribe: Gabriel Maher 1 Reinforcement Learning Introduction In reinforcement learning (RL) we consider an agent
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationBirgit Rudloff Operations Research and Financial Engineering, Princeton University
TIME CONSISTENT RISK AVERSE DYNAMIC DECISION MODELS: AN ECONOMIC INTERPRETATION Birgit Rudloff Operations Research and Financial Engineering, Princeton University brudloff@princeton.edu Alexandre Street
More informationQ-Learning for Markov Decision Processes*
McGill University ECSE 506: Term Project Q-Learning for Markov Decision Processes* Authors: Khoa Phan khoa.phan@mail.mcgill.ca Sandeep Manjanna sandeep.manjanna@mail.mcgill.ca (*Based on: Convergence of
More informationMath 141:512. Practice Exam 1 (extra credit) Due: February 6, 2019
Math 141:512 Due: February 6, 2019 Practice Exam 1 (extra credit) This is an open book, extra credit practice exam which covers the material that Exam 1 will cover (Sections 1.3, 1.4, 2.1, 2.2, 2.3, 2.4,
More informationMarkov Decision Processes Infinite Horizon Problems
Markov Decision Processes Infinite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld 1 What is a solution to an MDP? MDP Planning Problem: Input: an MDP (S,A,R,T)
More informationMarkov Decision Processes Chapter 17. Mausam
Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationChapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS
Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process
More informationMath 3191 Applied Linear Algebra
Math 191 Applied Linear Algebra Lecture 8: Inverse of a Matrix Stephen Billups University of Colorado at Denver Math 191Applied Linear Algebra p.1/0 Announcements We will not make it to section. tonight,
More informationPhys 201. Matrices and Determinants
Phys 201 Matrices and Determinants 1 1.1 Matrices 1.2 Operations of matrices 1.3 Types of matrices 1.4 Properties of matrices 1.5 Determinants 1.6 Inverse of a 3 3 matrix 2 1.1 Matrices A 2 3 7 =! " 1
More informationArtificial Intelligence & Sequential Decision Problems
Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet
More informationGeneral Boundary Problem: System of Equations Solution
MATH 54 General Boundary Problem: System of Equations Solution Consider again the scenario of a random walk X that starts at positive height j, and on each independent step, moves upward a units with probability
More informationIntroduction to Reinforcement Learning Part 1: Markov Decision Processes
Introduction to Reinforcement Learning Part 1: Markov Decision Processes Rowan McAllister Reinforcement Learning Reading Group 8 April 2015 Note I ve created these slides whilst following Algorithms for
More informationElementary Linear Algebra
Elementary Linear Algebra Linear algebra is the study of; linear sets of equations and their transformation properties. Linear algebra allows the analysis of; rotations in space, least squares fitting,
More information7.5 Operations with Matrices. Copyright Cengage Learning. All rights reserved.
7.5 Operations with Matrices Copyright Cengage Learning. All rights reserved. What You Should Learn Decide whether two matrices are equal. Add and subtract matrices and multiply matrices by scalars. Multiply
More informationLecture 13 Spectral Graph Algorithms
COMS 995-3: Advanced Algorithms March 6, 7 Lecture 3 Spectral Graph Algorithms Instructor: Alex Andoni Scribe: Srikar Varadaraj Introduction Today s topics: Finish proof from last lecture Example of random
More informationPage Points Score Total: 100
Math 1130 Spring 2019 Sample Midterm 3c 4/11/19 Name (Print): Username.#: Lecturer: Rec. Instructor: Rec. Time: This exam contains 10 pages (including this cover page) and 10 problems. Check to see if
More informationI&C 6N. Computational Linear Algebra
I&C 6N Computational Linear Algebra 1 Lecture 1: Scalars and Vectors What is a scalar? Computer representation of a scalar Scalar Equality Scalar Operations Addition and Multiplication What is a vector?
More informationSequential decision making under uncertainty. Department of Computer Science, Czech Technical University in Prague
Sequential decision making under uncertainty Jiří Kléma Department of Computer Science, Czech Technical University in Prague https://cw.fel.cvut.cz/wiki/courses/b4b36zui/prednasky pagenda Previous lecture:
More informationCrowdsourcing & Optimal Budget Allocation in Crowd Labeling
Crowdsourcing & Optimal Budget Allocation in Crowd Labeling Madhav Mohandas, Richard Zhu, Vincent Zhuang May 5, 2016 Table of Contents 1. Intro to Crowdsourcing 2. The Problem 3. Knowledge Gradient Algorithm
More informationMarkov Chains, Random Walks on Graphs, and the Laplacian
Markov Chains, Random Walks on Graphs, and the Laplacian CMPSCI 791BB: Advanced ML Sridhar Mahadevan Random Walks! There is significant interest in the problem of random walks! Markov chain analysis! Computer
More information18.600: Lecture 32 Markov Chains
18.600: Lecture 32 Markov Chains Scott Sheffield MIT Outline Markov chains Examples Ergodicity and stationarity Outline Markov chains Examples Ergodicity and stationarity Markov chains Consider a sequence
More informationSequential Decision Problems
Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted
More informationMathematics 13: Lecture 10
Mathematics 13: Lecture 10 Matrices Dan Sloughter Furman University January 25, 2008 Dan Sloughter (Furman University) Mathematics 13: Lecture 10 January 25, 2008 1 / 19 Matrices Recall: A matrix is a
More informationElements of Reinforcement Learning
Elements of Reinforcement Learning Policy: way learning algorithm behaves (mapping from state to action) Reward function: Mapping of state action pair to reward or cost Value function: long term reward,
More informationCMU Lecture 11: Markov Decision Processes II. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 11: Markov Decision Processes II Teacher: Gianni A. Di Caro RECAP: DEFINING MDPS Markov decision processes: o Set of states S o Start state s 0 o Set of actions A o Transitions P(s s,a)
More informationReview of Basic Concepts in Linear Algebra
Review of Basic Concepts in Linear Algebra Grady B Wright Department of Mathematics Boise State University September 7, 2017 Math 565 Linear Algebra Review September 7, 2017 1 / 40 Numerical Linear Algebra
More informationFrom Satisfiability to Linear Algebra
From Satisfiability to Linear Algebra Fangzhen Lin Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong Technical Report August 2013 1 Introduction
More informationDiscrete-Time Finite-Horizon Optimal ALM Problem with Regime-Switching for DB Pension Plan
Applied Mathematical Sciences, Vol. 10, 2016, no. 33, 1643-1652 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2016.6383 Discrete-Time Finite-Horizon Optimal ALM Problem with Regime-Switching
More informationStochastic processes. MAS275 Probability Modelling. Introduction and Markov chains. Continuous time. Markov property
Chapter 1: and Markov chains Stochastic processes We study stochastic processes, which are families of random variables describing the evolution of a quantity with time. In some situations, we can treat
More informationMethods for Solving Linear Systems Part 2
Methods for Solving Linear Systems Part 2 We have studied the properties of matrices and found out that there are more ways that we can solve Linear Systems. In Section 7.3, we learned that we can use
More information6.231 DYNAMIC PROGRAMMING LECTURE 7 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 7 LECTURE OUTLINE DP for imperfect state info Sufficient statistics Conditional state distribution as a sufficient statistic Finite-state systems Examples 1 REVIEW: IMPERFECT
More information