Parity Objectives in Countable MDPs

1 Parity Objectives in Countable MDPs
Stefan Kiefer, Richard Mayr, Mahsa Shirmohammadi, Dominik Wojtczak
LICS 2017, Reykjavik, 20 June 2017

2 Countable MDPs
[Figure: an example countable MDP; controlled states vs. random states, with bad/odd and good/even priorities]

3 Countable MDPs
There is no almost-surely winning strategy, yet sup_σ Pr^σ(Parity) = 1.
All finite-memory strategies lose almost surely.
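For reference, the objective can be spelled out as follows; this is the standard min-even parity condition, and the run/priority notation below is ours rather than the slides':

```latex
% Parity objective (min-even convention): a run s_0 s_1 s_2 ... is winning
% iff the least priority occurring infinitely often along it is even.
\[
  \mathrm{Parity}
  \;=\;
  \bigl\{\, s_0 s_1 s_2 \cdots \;\bigm|\;
     \min \{\, k \mid \mathrm{pri}(s_i) = k \text{ for infinitely many } i \,\}
     \text{ is even}
  \bigr\}
\]
% The value from the initial state is sup_sigma Pr^sigma(Parity); slide 3
% exhibits an MDP where this sup equals 1, yet no strategy attains it.
```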

4 {1,2,3}-Parity
There exists an almost-surely winning strategy.
All finite-memory strategies lose almost surely.

5 Our Results in the Mostowski Hierarchy

{0,1,2,3}-Parity    {1,2,3,4}-Parity
{0,1,2}-Parity      {1,2,3}-Parity
{0,1}-Parity        {1,2}-Parity
Safety              Reach

ε-optimal MD means: sup_σ Pr^σ(Parity) = sup_{MD σ} Pr^σ(Parity)
optimal MD means: if there exists an optimal strategy σ, then there exists an optimal strategy σ that is MD
Dichotomy between MD and infinite memory, in contrast to finite MDPs.
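To make the distinction concrete, here is a minimal sketch, in our own notation rather than the paper's, of the two strategy classes being compared; the names and the toy strategies are purely illustrative:

```python
# Minimal sketch of strategy classes for a countable MDP (illustrative only).
from typing import Callable, Dict, Tuple

State = int
Action = str
History = Tuple[State, ...]  # the finite play so far

# A general strategy may depend on the entire history played so far.
GeneralStrategy = Callable[[History], Action]

# An MD (memoryless deterministic) strategy depends only on the current
# state, so it is just a (possibly infinite) lookup table.
MDStrategy = Dict[State, Action]

# Not MD: behaviour changes with the history length, i.e. it uses unbounded
# memory -- the kind of strategy the {1,2,3}-parity example provably needs.
def needs_memory(h: History) -> Action:
    return "persist" if len(h) < 2 ** h[-1] else "move_on"

# MD: one fixed choice per state, independent of how we got there.
md_strategy: MDStrategy = {0: "move_on", 1: "persist"}
```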

12 Optimal MD-Strategies
Theorem. Consider a countable-state MDP with a {0,1,2}-parity objective. If there exists an optimal strategy, then there exists an optimal strategy that is MD.
In short: optimal strategies for {0,1,2}-parity may be chosen MD.

13 Optimal MD-Strategies for Co-Büchi
Theorem. Almost-surely winning strategies for co-Büchi may be chosen MD.
Proof idea: Suppose there is an almost-surely winning strategy σ. Focus on the states used by σ; they all have an a.s. winning strategy.
Set a more ambitious goal: Safety (= never see priority 1 again).
But always playing for safety is too greedy.

14 An Optimal MD-Strategy for Co-Büchi

max_σ Pr^σ( never see … or … again )

[Figure: MDP with two kinds of marked bad states (the omitted symbols above) and safety values; a blue region of safe states with a dark blue core]

0. Playing the safest action everywhere is not ok.
1. Fixing the safest action in the blue region is ok.
2. Once we are in dark blue: with probability 1 we stay in blue.
3. The a.s. winning strategy gets us into dark blue a.s.
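To make "the safest action" concrete: the safety value of a state is the greatest fixed point of the usual Bellman operator, and iterating from 1 on the safe states converges to it from above. Below is a minimal sketch on a finite toy MDP of our own invention; the encoding, state names, and actions are assumptions, not the paper's construction:

```python
# Minimal sketch: safety values by value iteration from above, plus the
# induced "safest action". Toy MDP, illustrative only.
from typing import Dict, List, Tuple

# trans[state][action] = list of (successor, probability). A random state can
# be modelled as a controlled state with a single action.
trans: Dict[str, Dict[str, List[Tuple[str, float]]]] = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 1.0)]},
    "s1": {"a": [("s0", 0.5), ("bad", 0.5)]},
    "bad": {"a": [("bad", 1.0)]},
}
SAFE = {"s0", "s1"}  # safety objective: never leave SAFE

def safety_values(iters: int = 200) -> Dict[str, float]:
    # v_n(s) = max probability of staying in SAFE for n steps; it decreases
    # to the true safety value (the greatest fixed point) as n grows.
    v = {s: (1.0 if s in SAFE else 0.0) for s in trans}
    for _ in range(iters):
        v = {s: (max(sum(p * v[t] for t, p in succ)
                     for succ in trans[s].values())
                 if s in SAFE else 0.0)
             for s in trans}
    return v

def safest_action(s: str, v: Dict[str, float]) -> str:
    # An argmax action w.r.t. expected safety value: the action an MD
    # strategy would fix in the "blue region".
    return max(trans[s], key=lambda a: sum(p * v[t] for t, p in trans[s][a]))

v = safety_values()
print(v)                       # {'s0': 1.0, 's1': 0.5, 'bad': 0.0}
print(safest_action("s0", v))  # 'stay'
```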

21 When MD Suffices For Finitely Branching MDPs

ε-optimal MD: {0,1,2,3}-Parity, {1,2,3,4}-Parity, {0,1,2}-Parity, {1,2,3}-Parity
optimal MD: {0,1}-Parity, {1,2}-Parity, Safety, Reach

22 When MD Suffices For Infinitely Branching MDPs

{0,1,2,3}-Parity, {1,2,3,4}-Parity, {0,1,2}-Parity, {1,2,3}-Parity, {0,1}-Parity, {1,2}-Parity
optimal MD: Safety
ε-optimal MD: Reach

23 Context of the Paper
Our work: countable MDPs. Other work: mostly finite MDPs.
Our work: maximizing the probability of parity objectives. Other work: maximizing expected (discounted) total/average reward/cost.
Our work: general countable MDPs. Other work: countable MDPs arising from specific models: recursive MDPs, nondeterministic probabilistic lossy channel systems, VASS-induced MDPs, one-counter MDPs, controlled queueing systems, controlled multitype branching processes.

24 Conditioning a Markov Chain
Condition the chain on the parity event: the conditioned chain Pr′ satisfies Pr′(A) = Pr(A | Parity).
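One standard way to realize the conditioned chain is a Doob h-transform; the notation below (P, P′, h) is ours, and we assume every state satisfies Pr_s(Parity) > 0:

```latex
% Doob h-transform realizing Pr'(A) = Pr(A | Parity).
% P(s,t): original transition probabilities; h(s) = Pr_s(Parity).
\[
  P'(s,t) \;=\; \frac{P(s,t)\,h(t)}{h(s)},
  \qquad\text{where } h(s) = \Pr\nolimits_s(\mathrm{Parity}).
\]
% Since Parity is a tail event, h is harmonic, i.e. h(s) = sum_t P(s,t) h(t),
% so P' is again a stochastic matrix; the chain with transitions P' has law
% Pr'(.) = Pr(. | Parity).
```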

26 Countable Markov Chains
Infinite Markov chains are very different from finite ones.
Gambler's ruin:
[Figure: random walk on the states 0, 1, 2, … with fixed step probabilities on each edge]
Dependence on exact probabilities.
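"Dependence on exact probabilities" can already be seen in the gambler's ruin walk: whether the walk hits 0 almost surely depends on the exact step probability p, not just on the shape of the chain. A minimal sketch, using the classical closed form plus a capped Monte Carlo check; the parameters are our own choices:

```python
# Gambler's ruin on 0, 1, 2, ...: step up with probability p, down with 1-p,
# absorb at 0. Classical fact: from state i, Pr(hit 0) = 1 if p <= 1/2 and
# ((1-p)/p)**i if p > 1/2 -- the answer changes abruptly with the exact p.
import random

def ruin_prob_exact(p: float, i: int) -> float:
    return 1.0 if p <= 0.5 else ((1 - p) / p) ** i

def ruin_prob_mc(p: float, i: int, runs: int = 2000, cap: int = 1000) -> float:
    # Walks that climb past `cap` are treated as having escaped to infinity;
    # this under-approximates ruin slightly when p = 1/2.
    ruined = 0
    for _ in range(runs):
        x = i
        while 0 < x < cap:
            x += 1 if random.random() < p else -1
        ruined += (x == 0)
    return ruined / runs

for p in (0.5, 0.6):
    print(p, ruin_prob_exact(p, 1), ruin_prob_mc(p, 1))
```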
