Chapter 17 Planning Based on Model Checking
1 Lecture slides for Automated Planning: Theory and Practice. Chapter 17: Planning Based on Model Checking. Dana S. Nau, University of Maryland. February 9, 2012
2 Motivation
Actions with multiple possible outcomes
» Action failures, e.g., gripper drops its load
» Exogenous events, e.g., road closed
Nondeterministic systems are like Markov Decision Processes (MDPs), but without probabilities attached to the outcomes
» Useful if accurate probabilities aren't available, or if probability calculations would introduce inaccuracies
[Figure: grasp(c) applied to a stack of blocks; the intended outcome has c in the gripper, an unintended outcome has c dropped]
3 Nondeterministic Systems
Nondeterministic system: a triple Σ = (S, A, γ)
» S = finite set of states
» A = finite set of actions
» γ : S × A → 2^S
Like in the previous chapter, the book doesn't commit to any particular representation
» It only deals with the underlying semantics
» Draw the state-transition graph explicitly
Like in the previous chapter, a policy is a function from states into actions
» π : S → A
» Notation: S_π = {s : (s,a) ∈ π}
In some algorithms, we'll temporarily have nondeterministic policies
» Ambiguous: multiple actions for some states
» π : S → 2^A, or equivalently, π ⊆ S × A
» We'll always make these policies deterministic before the algorithm terminates
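To make the semantics concrete, here is a minimal Python encoding of a nondeterministic system Σ = (S, A, γ) and a policy, using the chapter's robot example. The transition relation below is an assumption reconstructed from the traces on the following slides (the original figure is not included here), so treat it as illustrative rather than definitive.

```python
# Illustrative encoding of a nondeterministic system Sigma = (S, A, gamma).
# State si means "robot r1 is at location li"; the exact transition relation
# is an assumption reconstructed from the example, not taken from a figure.

S = {"s1", "s2", "s3", "s4", "s5"}

# gamma maps (state, action) -> set of possible successor states.
gamma = {
    ("s1", "move(r1,l1,l2)"): {"s2"},
    ("s1", "move(r1,l1,l4)"): {"s1", "s4"},   # assumed nondeterministic: may fail and stay put
    ("s2", "move(r1,l2,l3)"): {"s3", "s5"},   # assumed nondeterministic: may slip to l5
    ("s3", "move(r1,l3,l4)"): {"s4"},
    ("s5", "move(r1,l5,l4)"): {"s4"},
    ("s4", "move(r1,l4,l3)"): {"s3"},
    ("s4", "move(r1,l4,l5)"): {"s5"},
}

# A deterministic policy: a set of (state, action) pairs, one action per state.
pi1 = {("s1", "move(r1,l1,l2)"),
       ("s2", "move(r1,l2,l3)"),
       ("s3", "move(r1,l3,l4)")}

# S_pi = the set of states the policy assigns an action to.
S_pi1 = {s for s, _ in pi1}
```

Here S_pi1 is {s1, s2, s3}; the policy says nothing about s5, which matters once we ask what kind of solution it is.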
4 Example
Robot r1 starts at location l1
Objective is to get r1 to location l4
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π3 = {(s1, move(r1,l1,l4))}
7 Execution Structures
Execution structure for a policy π: the graph of all of π's execution paths
» Notation: Σ_π = (Q, T), where Q ⊆ S and T ⊆ S × S
π1 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4))}
π2 = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π3 = {(s1, move(r1,l1,l4))}
13 Types of Solutions
Weak solution: at least one execution path reaches a goal
Strong solution: every execution path reaches a goal
Strong-cyclic solution: every fair execution path reaches a goal
» Don't stay in a cycle forever if there's a state-transition out of it
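These three notions can be checked directly on the execution structure. The sketch below classifies a policy by exploring the graph of its execution paths; the transition relation is the same illustrative assumption as before (reconstructed from the example, not from a figure):

```python
from collections import deque

# Assumed transition relation for the robot example (illustrative).
gamma = {
    ("s1", "move(r1,l1,l2)"): {"s2"},
    ("s1", "move(r1,l1,l4)"): {"s1", "s4"},   # may fail and stay at l1 (assumption)
    ("s2", "move(r1,l2,l3)"): {"s3", "s5"},   # may slip to l5 (assumption)
    ("s3", "move(r1,l3,l4)"): {"s4"},
    ("s5", "move(r1,l5,l4)"): {"s4"},
    ("s4", "move(r1,l4,l3)"): {"s3"},
    ("s4", "move(r1,l4,l5)"): {"s5"},
}

def succ(pi, s):
    """Possible next states when following policy pi from s (empty if pi is silent)."""
    return gamma.get((s, pi[s]), set()) if s in pi else set()

def reachable(pi, s0):
    """All states in the execution structure of pi starting from s0."""
    seen, frontier = {s0}, deque([s0])
    while frontier:
        for t in succ(pi, frontier.popleft()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def is_weak(pi, s0, goal):
    # At least one execution path reaches a goal.
    return bool(reachable(pi, s0) & goal)

def is_strong_cyclic(pi, s0, goal):
    # Every fair execution path reaches a goal: no dead ends, and from every
    # reachable state the goal is still reachable.
    for s in reachable(pi, s0):
        if s not in goal and (s not in pi or not reachable(pi, s) & goal):
            return False
    return True

def is_strong(pi, s0, goal):
    # Every execution path reaches a goal: strong-cyclic, plus no cycle among
    # non-goal states (a cycle would allow an infinite goal-free path).
    if not is_strong_cyclic(pi, s0, goal):
        return False
    status = {}
    def acyclic(s):
        if s in goal or status.get(s) == "done":
            return True
        if status.get(s) == "active":
            return False                      # found a goal-free cycle
        status[s] = "active"
        ok = all(acyclic(t) for t in succ(pi, s))
        status[s] = "done"
        return ok
    return acyclic(s0)

pi1 = {"s1": "move(r1,l1,l2)", "s2": "move(r1,l2,l3)", "s3": "move(r1,l3,l4)"}
pi2 = dict(pi1, s5="move(r1,l5,l4)")
pi3 = {"s1": "move(r1,l1,l4)"}
```

With start s1 and goal {s4}: π1 is weak only (an unlucky slip to l5 strands it), π2 is strong, and π3 is weak and strong-cyclic but not strong (it can loop at l1 forever, though not on any fair path).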
14 Finding Strong Solutions
Backward breadth-first search:
StrongPreImg(S) = {(s,a) : γ(s,a) ≠ ∅, γ(s,a) ⊆ S}
» all state-action pairs for which all of the successors are in S
PruneStates(π, S) = {(s,a) ∈ π : s ∉ S}
» S is the set of states we've already solved
» keep only the state-action pairs for other states
MkDet(π')
» π' is a policy that may be nondeterministic
» removes some state-action pairs if necessary, to get a deterministic policy
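A direct Python transcription of these three primitives, with policies represented as sets of (state, action) pairs and γ as a dict of (state, action) → successor sets (a sketch, not the book's code):

```python
def strong_preimg(gamma, S):
    """State-action pairs that have successors, all of which lie in S."""
    return {(s, a) for (s, a), succs in gamma.items() if succs and succs <= S}

def prune_states(pi, solved):
    """Keep only the pairs for states we have not already solved."""
    return {(s, a) for (s, a) in pi if s not in solved}

def mk_det(pi):
    """Keep one action per state, making the policy deterministic."""
    chosen = {}
    for s, a in sorted(pi):      # sorted only to make the choice reproducible
        chosen.setdefault(s, a)
    return set(chosen.items())
```

For example, with the example's γ and S = Sg = {s4}, strong_preimg returns exactly the two pairs computed on the next slides: (s3, move(r1,l3,l4)) and (s5, move(r1,l5,l4)).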
15-17 Example (first iteration)
π = failure; π' = ∅; S_π' = ∅; Sg ∪ S_π' = {s4}
π'' ← PreImage = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π ← π' = ∅
π' ← π' ∪ π'' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
18-19 Example (second iteration)
π = ∅
π' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
S_π' = {s3, s5}; Sg ∪ S_π' = {s3, s4, s5}
PreImage = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4)), (s4, move(r1,l4,l3)), (s4, move(r1,l4,l5))}
π'' ← {(s2, move(r1,l2,l3))}
π ← π' = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' ← {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
20-21 Example (third iteration)
π = {(s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
S_π' = {s2, s3, s5}; Sg ∪ S_π' = {s2, s3, s4, s5}
π'' ← {(s1, move(r1,l1,l2))}
π ← π' = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' ← {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
22-23 Example (termination)
π = {(s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π' = {(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
S_π' = {s1, s2, s3, s5}; Sg ∪ S_π' = {s1, s2, s3, s4, s5}
S0 ⊆ Sg ∪ S_π', so return MkDet(π') = π'
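The trace above can be reproduced end to end with a self-contained sketch of the Strong-Plan loop. The preimage computation and pruning are written inline so the block stands alone; as before, the transition relation is an assumption reconstructed from the example:

```python
def strong_plan(gamma, s0, goal):
    """Backward breadth-first search for a strong solution.
    Returns a deterministic policy as a dict, or None on failure."""
    pi, pi_new = None, set()
    while True:
        solved = goal | {s for s, _ in pi_new}        # Sg union S_pi'
        if pi == pi_new or s0 <= solved:
            break
        pi = set(pi_new)
        # StrongPreImg: pairs all of whose successors lie in the solved set,
        # then PruneStates: drop pairs for already-solved states.
        preimg = {(s, a) for (s, a), succs in gamma.items()
                  if succs and succs <= solved}
        pi_new = pi_new | {(s, a) for (s, a) in preimg if s not in solved}
    if not s0 <= goal | {s for s, _ in pi_new}:
        return None
    det = {}
    for s, a in sorted(pi_new):                        # MkDet: one action per state
        det.setdefault(s, a)
    return det

# Assumed transition relation for the robot example (illustrative).
gamma = {
    ("s1", "move(r1,l1,l2)"): {"s2"},
    ("s1", "move(r1,l1,l4)"): {"s1", "s4"},   # assumed: may fail and stay at l1
    ("s2", "move(r1,l2,l3)"): {"s3", "s5"},   # assumed: may slip to l5
    ("s3", "move(r1,l3,l4)"): {"s4"},
    ("s5", "move(r1,l5,l4)"): {"s4"},
    ("s4", "move(r1,l4,l3)"): {"s3"},
    ("s4", "move(r1,l4,l5)"): {"s5"},
}
```

Calling strong_plan(gamma, {"s1"}, {"s4"}) yields exactly the π' computed above: s1 → move(r1,l1,l2), s2 → move(r1,l2,l3), s3 → move(r1,l3,l4), s5 → move(r1,l5,l4).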
24 Finding Weak Solutions
Weak-Plan is just like Strong-Plan except for this:
WeakPreImg(S) = {(s,a) : γ(s,a) ∩ S ≠ ∅}
» at least one successor is in S
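WeakPreImg is a one-line change from StrongPreImg (same sketch conventions as before: γ as a dict of (state, action) → successor sets):

```python
def weak_preimg(gamma, S):
    """State-action pairs with at least one successor in S."""
    return {(s, a) for (s, a), succs in gamma.items() if succs & S}
```

Substituting this for the strong preimage in the Strong-Plan loop gives Weak-Plan.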
25-26 Example
π = failure; π' = ∅; S_π' = ∅; Sg ∪ S_π' = {s4}
π'' ← PreImage = {(s1, move(r1,l1,l4)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
π ← π' = ∅
π' ← π' ∪ π'' = {(s1, move(r1,l1,l4)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
27 Example
π = ∅
π' = {(s1, move(r1,l1,l4)), (s3, move(r1,l3,l4)), (s5, move(r1,l5,l4))}
S_π' = {s1, s3, s5}; Sg ∪ S_π' = {s1, s3, s4, s5}
S0 ⊆ Sg ∪ S_π', so return MkDet(π') = π'
28 Finding Strong-Cyclic Solutions
Begin with a "universal policy" π' that contains all state-action pairs
Repeatedly eliminate state-action pairs that take us to bad states:
» PruneOutgoing removes state-action pairs that may lead outside S ∪ S_π:
PruneOutgoing(π, S) = π − {(s,a) ∈ π : γ(s,a) ⊄ S ∪ S_π}
» PruneUnconnected removes states from which it is impossible to get to Sg:
with π' = ∅, compute the fixpoint of π' ← π ∩ WeakPreImg(Sg ∪ S_π')
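Sketches of the two pruning operators, following the definitions above (γ as a dict of (state, action) → successor sets; this is illustrative, not any particular planner's code):

```python
def prune_outgoing(pi, S, gamma):
    """Drop pairs whose outcomes might fall outside S union S_pi."""
    covered = S | {s for s, _ in pi}
    return {(s, a) for (s, a) in pi if gamma[(s, a)] <= covered}

def prune_unconnected(pi, S, gamma):
    """Keep only the part of pi from which S is reachable:
    the fixpoint of pi' <- pi intersect WeakPreImg(S union S_pi')."""
    pi_new = set()
    while True:
        covered = S | {s for s, _ in pi_new}
        nxt = {(s, a) for (s, a) in pi if gamma[(s, a)] & covered}
        if nxt == pi_new:
            return pi_new
        pi_new = nxt
```

For a policy whose single pair cannot reach the goal through any chain of pairs, prune_unconnected returns the empty set, which is what triggers the failure test in the main loop.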
29 Finding Strong-Cyclic Solutions (continued)
Once the policy stops changing:
» If it's not a solution, return failure
» RemoveNonProgress removes state-action pairs that don't go toward the goal; implement as backward search from the goal
» MkDet makes sure there's only one action for each state
[Figure: state-transition graph for the examples; the goal state is s6, i.e., at(r1,l6)]
30-33 Example 1
Goal: s6, i.e., at(r1,l6)
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
PruneOutgoing(π', Sg) = π'
PruneUnconnected(π', Sg) = π'
RemoveNonProgress(π') removes the pairs that don't make progress toward the goal (as shown in the figure)
MkDet then returns either
{(s1, move(r1,l1,l4)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l6)), (s5, move(r1,l5,l4))}
or
{(s1, move(r1,l1,l2)), (s2, move(r1,l2,l3)), (s3, move(r1,l3,l4)), (s4, move(r1,l4,l6)), (s5, move(r1,l5,l4))}
34-40 Example 2: a state with no applicable actions
π' ← {(s,a) : a is applicable to s}
π ← {(s,a) : a is applicable to s}
First iteration: PruneOutgoing(π', Sg) = π'; PruneUnconnected(π', Sg) removes the disconnected part of the policy (as shown in the figure); π' ← the pruned policy
Second iteration: PruneOutgoing(π', Sg) = π' and PruneUnconnected(π', Sg) = π', so π = π' and the loop exits
RemoveNonProgress(π') = as shown in the figure
MkDet makes no change
41 Planning for Extended Goals
Here, "extended" means temporally extended
» Constraints that apply to some sequence of states
Examples:
» want to move to l3, and then to l5
» want to keep going back and forth between l3 and l5
[Figure: state-transition graph with wait actions]
42 Planning for Extended Goals (continued)
Context: the internal state of the controller
Plan: (C, c0, act, ctxt)
» C = a set of execution contexts
» c0 is the initial context
» act : S × C → A
» ctxt : S × C × S → C (the next context, given the current state, the current context, and the observed next state)
Section 17.3 extends the ideas in Sections 17.1 and 17.2 to deal with extended goals
» We'll skip the details
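As a toy illustration of contexts, here is a plan with two contexts executing the goal "move to l3, and then to l5". Everything in this sketch is assumed for the sake of the example: a deterministic corridor of locations l1..l5, the context names c0/c1, and the helper functions are all hypothetical.

```python
# Contexts: c0 = "heading for l3", c1 = "heading for l5" (assumed names).
# States: si = robot r1 at location li, with deterministic moves (assumption).

act = {                      # act : S x C -> A
    ("s1", "c0"): "move(r1,l1,l2)",
    ("s2", "c0"): "move(r1,l2,l3)",
    ("s3", "c1"): "move(r1,l3,l4)",
    ("s4", "c1"): "move(r1,l4,l5)",
}

def ctxt(s, c, s_next):      # ctxt : S x C x S -> C
    # Switch context once the first subgoal (l3) has been reached.
    return "c1" if s_next == "s3" else c

def step(s, a):
    # Deterministic outcome: move(r1,li,lj) lands at lj (assumption).
    return "s" + a[-2]       # e.g. "move(r1,l4,l5)" -> "s5"

def run(s, c):
    """Execute the controller until no action is defined for (state, context)."""
    trace = [s]
    while (s, c) in act:
        s_next = step(s, act[(s, c)])
        c = ctxt(s, c, s_next)
        s = s_next
        trace.append(s)
    return trace
```

The same state s3 is treated differently in c0 and c1, which is exactly what a memoryless policy π : S → A cannot express.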
More informationLecture 1: March 7, 2018
Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights
More informationCS505: Distributed Systems
Department of Computer Science CS505: Distributed Systems Lecture 10: Consensus Outline Consensus impossibility result Consensus with S Consensus with Ω Consensus Most famous problem in distributed computing
More informationDiscrete Probability and State Estimation
6.01, Fall Semester, 2007 Lecture 12 Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Fall Semester, 2007 Lecture 12 Notes
More informationGame Theory and Algorithms Lecture 2: Nash Equilibria and Examples
Game Theory and Algorithms Lecture 2: Nash Equilibria and Examples February 24, 2011 Summary: We introduce the Nash Equilibrium: an outcome (action profile) which is stable in the sense that no player
More informationAM 121: Intro to Optimization Models and Methods: Fall 2018
AM 11: Intro to Optimization Models and Methods: Fall 018 Lecture 18: Markov Decision Processes Yiling Chen Lesson Plan Markov decision processes Policies and value functions Solving: average reward, discounted
More informationLecture 3: Markov Decision Processes
Lecture 3: Markov Decision Processes Joseph Modayil 1 Markov Processes 2 Markov Reward Processes 3 Markov Decision Processes 4 Extensions to MDPs Markov Processes Introduction Introduction to MDPs Markov
More informationCS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability
CS188: Artificial Intelligence, Fall 2009 Written 2: MDPs, RL, and Probability Due: Thursday 10/15 in 283 Soda Drop Box by 11:59pm (no slip days) Policy: Can be solved in groups (acknowledge collaborators)
More informationComputer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms
Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationCS261: A Second Course in Algorithms Lecture #9: Linear Programming Duality (Part 2)
CS261: A Second Course in Algorithms Lecture #9: Linear Programming Duality (Part 2) Tim Roughgarden February 2, 2016 1 Recap This is our third lecture on linear programming, and the second on linear programming
More informationDecision, Computation and Language
Decision, Computation and Language Non-Deterministic Finite Automata (NFA) Dr. Muhammad S Khan (mskhan@liv.ac.uk) Ashton Building, Room G22 http://www.csc.liv.ac.uk/~khan/comp218 Finite State Automata
More informationGoal specification using temporal logic in presence of non-deterministic actions
Goal specification using temporal logic in presence of non-deterministic actions Chitta Baral Matt Barry Department of Computer Sc. and Engg. Advance Tech Development Lab Arizona State University United
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More informationCS505: Distributed Systems
Cristina Nita-Rotaru CS505: Distributed Systems. Required reading for this topic } Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson for "Impossibility of Distributed with One Faulty Process,
More information