AUTONOMOUS SYSTEMS. Task Planning. Pedro U. Lima M. Isabel Ribeiro Luis Custódio


1 AUTONOMOUS SYSTEMS Task Planning Pedro U. Lima M. Isabel Ribeiro Luis Custódio Institute for Systems and Robotics Instituto Superior Técnico Lisbon, Portugal March 2007 Revised by Pedro U. Lima in November 2015

2 Outline 1. Planning Problem 2. Logic 3. Logic-Based Planning: Situation Calculus, STRIPS 4. Plan Representation and Modeling: Petri Net Task Models 5. Plan Analysis 6. Planning Under Uncertainty 7. Markov Decision Processes (MDP) 8. Dynamic Programming Solution of MDPs 9. Reinforcement Learning Solution of MDPs

3 Planning Planning consists of determining the sequence of actions that enables reaching the goal(s) of an agent. Robot Task Planning consists of determining the appropriate sequence of actions to move a robot from the current world situation to a world situation that satisfies its preferences.

4 Planning Robot Task Planning [Courtesy of JSK Lab U. Tokyo, Japan]

5 Logic Logic can be seen as a language to represent the knowledge about the world and a particular problem to be solved. Syntactic system: an alphabet (set of accepted symbols) plus formation rules (rules establishing how symbols can be aggregated so as to build formulas/sentences) define the LANGUAGE; inference rules are the set of rules that establish how to derive formulas from other formulas.

6 Logic Semantic system: assigns a meaning to the language formulas, linking formulas (language, syntax) to facts in the world (semantics).

7 Logic Syntactic system vs semantic system. Language rules: g + r + e + e + n -> "green" (syntax); the semantics associates a color to the word "green". Arithmetic rules: if x, y are expressions representing numbers, then x > y is a formula over numbers (syntax); the fact is true when the number represented by x is greater than the number represented by y (semantics).

8 Logic Typically, one deals only with the aspects of the world relevant to the problem, through a conceptualization of reality. Objects and their relations are defined. Functions: given a set of objects, a function establishes which object is related to the object(s) in the set and how, e.g., left_room(kitchen). Relations: given a set of objects, a relation establishes whether that set is related in a certain way, e.g., on(laptop, table).

9 Logic The concept of interpretation establishes the link between the language elements and the elements of the conceptualization of reality (objects, functions and relations). Given a formula written in the defined language, its interpretation is designated a proposition. A proposition is true iff it correctly describes the world, based on the adopted conceptualization of reality. A formula is satisfied iff there is an interpretation that associates it with a true proposition.

10 Logic A fact is a true proposition for a given (conceptualized) world state. The initially known facts compose the initial knowledge base. Inference is the process of obtaining new propositions (conclusions) from the knowledge base. To ensure that a reached conclusion is satisfied by the adopted interpretation, only a conclusion satisfied for all the interpretations that satisfy the starting propositions (premises) is accepted. This way, we guarantee that, should the premises be satisfied, so is the conclusion, irrespective of the interpretation. Inference Rule Ex. (Modus Ponens): Premises: IF on(a,b) THEN above(a,b); on(a,b). Conclusion: above(a,b).

11 Logic Propositional logic: facts. Predicate logic: objects, functions and relations; variables; quantifiers.

12 Propositional vs Predicate Logic Example: rooms S1, S2, S3; door P1 connects S1 and S2, door P2 connects S2 and S3; robot R in room S1, box B in room S3. World Model (KB): Propositional logic: robot_inroom_s1, box_inroom_s3, door_p1_connects_rooms_s1_s2, door_p2_connects_rooms_s2_s3. Predicate logic: inroom(<OBJ>, <ROOM>) with <OBJ> <- robot, <ROOM> <- S1 and <OBJ> <- box, <ROOM> <- S3; connects(<DOOR>, <ROOM1>, <ROOM2>) with <DOOR> <- P1, <ROOM1> <- S1, <ROOM2> <- S2 and <DOOR> <- P2, <ROOM1> <- S2, <ROOM2> <- S3.
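
As an illustration of the predicate-logic representation above, the following minimal Python sketch (the names kb, holds and rooms_of are ours, not from the slides) stores the world model as a set of ground literals and answers simple queries:

```python
# Minimal sketch of the predicate-logic world model above (hypothetical helper names).

# Knowledge base: set of ground literals (predicate, arguments...)
kb = {
    ("inroom", "robot", "S1"),
    ("inroom", "box", "S3"),
    ("connects", "P1", "S1", "S2"),
    ("connects", "P2", "S2", "S3"),
}

def holds(predicate, *args):
    """A ground literal is satisfied iff it belongs to the knowledge base."""
    return (predicate, *args) in kb

def rooms_of(obj):
    """Instantiate the variable <ROOM> in inroom(obj, <ROOM>)."""
    return [fact[2] for fact in kb if fact[0] == "inroom" and fact[1] == obj]

print(holds("inroom", "box", "S3"))   # True
print(rooms_of("robot"))              # ['S1']
```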

13 Situation Calculus Logic handles the truth of propositions, not action execution: logic cannot tell which action should be executed; at most it can suggest the possible actions. Time and change are not adequately handled by basic logic (propositional, predicate). Idea: the world state is represented by a proposition set; the set is changed according to received perceptions and executed actions; the world evolution is described by diachronic rules, which express how the world changes (representation of change). Situation Calculus attempts to solve the problems associated with representation and reasoning under change. It is based on predicate logic and describes the world as a sequence of situations, each of which represents a world state.

14 Situation Calculus One situation is generated from another situation by executing an action. An argument is added to each property (represented by a predicate) that may change, denoting the situation where the property is satisfied. Ex: localization(agent, (1,1), S_0), localization(agent, (1,2), S_1). To represent passing from one situation to another, the following function is used: Result(action, situation): A × Σ → Σ. Ex: Result(go_ahead, S_0) = S_1.

15 Situation Calculus Effect Axioms: pre-conditions (to execute the action) ⇒ predicate (whose logical value changes after the action is executed). They state the action effects, describing the change(s) due to the action, e.g.: ∀x ∀s Present(x, s) ∧ Portable(x) ⇒ Hold(x, Result(pickup, s)); ∀x ∀s Hold(x, s) ⇒ ¬Hold(x, Result(release, s)).

16 Situation Calculus Frame Axioms: predicate (logical value in the current situation) ∧ conditions (for no change) ⇒ predicate (in the situation following the action). One needs to state what does not change due to the action execution, e.g.: ∀a ∀x ∀s Hold(x, s) ∧ (a ≠ release) ⇒ Hold(x, Result(a, s)); ∀a ∀x ∀s ¬Hold(x, s) ∧ (a ≠ pickup ∨ ¬(Present(x, s) ∧ Portable(x))) ⇒ ¬Hold(x, Result(a, s)).

17 Situation Calculus Successor State Axioms merge effect and frame axioms: predicate true in the next situation ⇔ [ some action makes it true ∨ (it was true in the previous situation ∧ no action made it false) ], e.g.: ∀a ∀x ∀s Hold(x, Result(a, s)) ⇔ [ (a = pickup ∧ Present(x, s) ∧ Portable(x)) ∨ (Hold(x, s) ∧ a ≠ release) ]; ∀a ∀x ∀s ¬Hold(x, Result(a, s)) ⇔ [ (a = release) ∨ (¬Hold(x, s) ∧ ¬(a = pickup ∧ Present(x, s) ∧ Portable(x))) ].

18 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Effect Axioms: ∀x ∀y ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ⇒ On(x, y, Result(PutOn(x,y), s)); ∀x ∀y ∀w ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ On(x, w, s) ⇒ ClearTop(w, Result(PutOn(x,y), s)).

19 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Frame Axioms: ∀a ∀x ∀y ∀z ∀s On(x, y, s) ∧ (a ≠ PutOn(x, z)) ⇒ On(x, y, Result(a, s)); ∀a ∀x ∀y ∀s ClearTop(y, s) ∧ (a ≠ PutOn(x, y)) ⇒ ClearTop(y, Result(a, s)).

20 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Resulting Successor State Axioms: ∀x ∀y ∀z ∀a ∀s On(x, y, Result(a, s)) ⇔ [ (a = PutOn(x,y) ∧ On(x, z, s) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ Block(x) ∧ (Block(y) ∨ y = M)) ∨ (a ≠ PutOn(x, z) ∧ On(x, y, s)) ]; ∀x ∀y ∀z ∀a ∀s ClearTop(z, Result(a, s)) ⇔ [ (a = PutOn(x,y) ∧ On(x, z, s) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ Block(x) ∧ (Block(y) ∨ y = M)) ∨ (a ≠ PutOn(x, z) ∧ ClearTop(z, s)) ].

21 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Initial State: Block(A) ∧ Block(B) ∧ Block(C) ∧ On(C, M, s_0) ∧ On(B, C, s_0) ∧ On(A, B, s_0) ∧ ClearTop(A, s_0) ∧ ClearTop(M, s_0). Goal State (for some s): Block(A) ∧ Block(B) ∧ Block(C) ∧ On(A, M, s) ∧ On(B, A, s) ∧ On(C, B, s) ∧ ClearTop(C, s) ∧ ClearTop(M, s).

22 Situation Calculus Example (Blocks World) Effect Axioms: E1) ∀x ∀y ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ⇒ On(x, y, Result(PutOn(x,y), s)); E2) ∀x ∀y ∀w ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ On(x, w, s) ⇒ ClearTop(w, Result(PutOn(x,y), s)). Frame Axioms: N1) ∀a ∀x ∀y ∀z ∀s On(x, y, s) ∧ (a ≠ PutOn(x, z)) ⇒ On(x, y, Result(a, s)); N2) ∀a ∀x ∀y ∀s ClearTop(y, s) ∧ (a ≠ PutOn(x, y)) ⇒ ClearTop(y, Result(a, s)). Initial Situation (s_0, stack: A on B on C on the table M): Block(A); Block(B); Block(C); On(C, M, s_0); On(B, C, s_0); On(A, B, s_0); ClearTop(A, s_0); ClearTop(M, s_0). In s_0, axiom E1) is applicable with x=A and y=M: Block(A) ∧ (Block(M) ∨ M = M) ∧ ClearTop(A, s_0) ∧ ClearTop(M, s_0) ⇒ On(A, M, Result(PutOn(A,M), s_0)). If s_1 = Result(PutOn(A,M), s_0), then On(A, M, s_1). In s_0, axiom E2) is applicable with x=A, y=M and w=B, so ClearTop(B, s_1). In s_0, axiom N1) is applicable with w=A, z=M, x=C and y=M, so On(C, M, s_1). In s_0, axiom N1) is applicable with w=A, z=M, x=B and y=C, so On(B, C, s_1). In s_0, axiom N1) is not applicable with w=A, z=M, x=A and y=B (i.e., On(A, B, s_1) is false). In s_0, axiom N2) is applicable with x=A, w=M and y=A, so ClearTop(A, s_1). Situation s_1 (stack: B on C on the table; A on the table): On(A, M, s_1); On(C, M, s_1); On(B, C, s_1); ClearTop(B, s_1); ClearTop(A, s_1); ClearTop(M, s_1).

23 Situation Calculus Example (Blocks World) In s_1 (stack: B on C on the table; A on the table), using the same Effect Axioms E1), E2), Frame Axioms N1), N2) and initial situation s_0 as above: In s_1, axiom E1) is applicable with x=B and y=A: Block(B) ∧ (Block(A) ∨ A = M) ∧ ClearTop(B, s_1) ∧ ClearTop(A, s_1) ⇒ On(B, A, Result(PutOn(B,A), s_1)). If s_2 = Result(PutOn(B,A), s_1), then On(B, A, s_2). In s_1, axiom E2) is applicable with x=B, y=A and w=C, so ClearTop(C, s_2). In s_1, axiom N1) is applicable with w=B, z=A, x=C and y=M, so On(C, M, s_2). In s_1, axiom N1) is applicable with w=B, z=A, x=A and y=M, so On(A, M, s_2). In s_1, axiom N1) is not applicable with w=B, z=A, x=B and y=C (i.e., On(B, C, s_2) is false). In s_1, axiom N2) is applicable with x=B, w=A and y=B, so ClearTop(B, s_2). In s_1, axiom N2) is not applicable with x=B, w=A and y=A (i.e., ClearTop(A, s_2) is false). Situation s_2 (stack: B on A on the table; C on the table): On(A, M, s_2); On(C, M, s_2); On(B, A, s_2); ClearTop(C, s_2); ClearTop(B, s_2); ClearTop(M, s_2).

24 Situation Calculus Example (Blocks World) In s_2 (stack: B on A on the table; C on the table), using the same Effect Axioms E1), E2), Frame Axioms N1), N2) and initial situation s_0 as above: In s_2, axiom E1) is applicable with x=C and y=B: Block(C) ∧ (Block(B) ∨ B = M) ∧ ClearTop(C, s_2) ∧ ClearTop(B, s_2) ⇒ On(C, B, Result(PutOn(C,B), s_2)). If s_3 = Result(PutOn(C,B), s_2), then On(C, B, s_3). In s_2, axiom N1) is applicable with w=C, z=B, x=B and y=A, so On(B, A, s_3). In s_2, axiom N1) is applicable with w=C, z=B, x=A and y=M, so On(A, M, s_3). In s_2, axiom N1) is not applicable with w=C, z=B, x=C and y=M (i.e., On(C, M, s_3) is false). In s_2, axiom N2) is applicable with x=C, w=B and y=C, so ClearTop(C, s_3). In s_2, axiom N2) is not applicable with x=C, w=B and y=B (i.e., ClearTop(B, s_3) is false). Situation s_3 (stack: C on B on A on the table): On(C, B, s_3); On(B, A, s_3); On(A, M, s_3); ClearTop(C, s_3); ClearTop(M, s_3).

25 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (on the table). Action Sequence (plan): [ PutOn(A, M), PutOn(B, A), PutOn(C, B) ]. Final Situation: C on B on A (on the table).
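
A compact way to check the plan above is to simulate the effect of PutOn on a symbolic state. The Python sketch below is our own encoding (with the table called M as in the slides): moving x onto y requires both to be clear, makes On(x, y) true and clears whatever x was previously on.

```python
# Sketch: simulate the blocks-world plan [PutOn(A,M), PutOn(B,A), PutOn(C,B)].
# State: dict "on" mapping each block to what it sits on; the table M is always clear.

def clear(on, x):
    """A block (or the table M) is clear if nothing is on top of it."""
    return x == "M" or x not in on.values()

def put_on(on, x, y):
    """Effect of PutOn(x, y): requires ClearTop(x) and ClearTop(y)."""
    assert clear(on, x) and clear(on, y), f"PutOn({x},{y}) not applicable"
    new_on = dict(on)
    new_on[x] = y
    return new_on

state = {"C": "M", "B": "C", "A": "B"}          # initial situation: A on B on C on the table
plan = [("A", "M"), ("B", "A"), ("C", "B")]
for x, y in plan:
    state = put_on(state, x, y)
print(state)   # {'C': 'B', 'B': 'A', 'A': 'M'} -> C on B on A, as in the final situation
```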

26 Complexity of Planning Problem The problem is intractable in the general case. Simplifying assumptions: the agent knows everything that is relevant for the planning problem; the agent knows how its available actions can change the world state from one state to another; the planning agent is in control of the world: the only state changes are the result of its deliberate actions; the agent's preferred world states are constant during a planning episode. Based on these assumptions, a typical approach is: first formulate the plan, then execute it.

27 Extensions of Planning Problem The real world surrounding the robot does not meet most of the simplifying assumptions, especially in dynamic, uncertain environments. EXTENSIONS: conditional planning: handles uncertainty by enumerating the possible states that may arise after the execution of an action and provides alternative courses of action for each of them; plan monitoring and repair: during plan execution, progress is monitored and, when deviations from the predicted nominal conditions occur, the plan execution halts and a revised plan is created; continual planning: in dynamic environments, one may allow context and/or agent's preference changes, and plan revision is an ongoing process rather than one triggered by failures of the nominal plan; planning is not made in too much detail into the future, and it is interleaved with execution.

28 Basic Planning Problem Formulation A possible formulation of the Planning problem is (LaValle, 1996): 1. A nonempty state space, X, which is a finite or countably infinite set of states. 2. For each state, x ∈ X, a finite action space, U(x). 3. A state transition function, f, which produces a state f(x, u) ∈ X for every x ∈ X and u ∈ U(x). The state transition equation is derived from f as x' = f(x, u). 4. An initial state, x_I ∈ X. 5. A goal set, X_G ⊆ X.

29 Basic Planning Problem Formulation Represent the planning problem as a directed state transition graph: the set of vertices is the state space X; a directed edge from x ∈ X to x' ∈ X exists in the graph if there exists an action u ∈ U(x) such that x' = f(x, u); the initial state and goal set are designated as special vertices in the graph. [Example graph: X1 (obj on the left), X2 (obj in hand), X3 (obj on ground), X4 (obj on the right), connected by actions u1 = pick, u2 = push, u3 = release.]

30 Basic Planning Problem Formulation Based on this formulation, several problem-solving algorithms are available to find a feasible plan (i.e., one that leads from the initial state to one of the goal states, not necessarily optimal). Examples: breadth-first, depth-first, best-first, A*, ... Algorithms to solve Discrete Optimal Planning problems also exist, typically based on Dynamic Programming: find the sequence of actions that leads to the goal set and optimizes some criterion, such as distance traversed or energy spent; costs are associated with actions.
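
For the formulation on slide 28, a feasible (not necessarily optimal) plan can be found with any of the graph searches listed above. The breadth-first sketch below is ours; the tiny problem encoding (states X1-X4 and actions pick/push/release, loosely following the example of slide 29) is illustrative only.

```python
# Sketch: breadth-first search for a feasible plan in the formulation (X, U(x), f, x_I, X_G).
# The problem encoding below is illustrative only.
from collections import deque

def bfs_plan(x_init, goal_set, actions, f):
    """Return a list of actions leading from x_init to some state in goal_set, or None."""
    frontier = deque([x_init])
    parent = {x_init: None}                      # state -> (previous state, action)
    while frontier:
        x = frontier.popleft()
        if x in goal_set:
            plan = []
            while parent[x] is not None:
                x, u = parent[x]
                plan.append(u)
            return list(reversed(plan))
        for u in actions(x):
            x_next = f(x, u)
            if x_next not in parent:
                parent[x_next] = (x, u)
                frontier.append(x_next)
    return None

# Tiny example: reach "obj on ground" (X3) starting from "obj on the left" (X1).
U = {"X1": ["pick", "push"], "X2": ["release"], "X3": ["pick"], "X4": []}
F = {("X1", "pick"): "X2", ("X1", "push"): "X4", ("X2", "release"): "X3", ("X3", "pick"): "X2"}
print(bfs_plan("X1", {"X3"}, lambda x: U[x], lambda x, u: F[(x, u)]))  # ['pick', 'release']
```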

31 Logic-Based Planning ADVANTAGES build compact representations for discrete planning problems, when their regularity allows such compression convenient for producing output that logically explains the steps involved to arrive at some goal DISADVANTAGES difficult to generalize to enable concepts such as modeling uncertainty, unpredictability, sensing errors, and game theory to be incorporated into planning

32 Logic-Based Planning It is possible to convert the logic-based formulation into the graph-based formulation: e.g., the set of literals may be encoded as a binary string by imposing a linear ordering on the instances and predicates, and using 1 for true and 0 for false. This way, even optimal solutions can be found, if we associate costs with actions (e.g., a state x1 is a binary string over the literals obj_on_the_left, obj_on_the_right, obj_in_hand, obj_on_ground).

33 Logic-Based Planning However, the problem dimension may become intractable, even for a small number of predicates and instances: with a constant number k of arguments per predicate, the state space dimension is 2^(|P|·|I|^k), where |P| is the number of predicates and |I| the number of instances per predicate argument. E.g., 4 predicates (|P| = 4) with 1 argument (k = 1), left(<obj>), right(<obj>), inhand(<obj>), ground(<obj>), and 3 objects (|I| = 3), bolt, nut, bin, give 2^(4·3) = 4096 states.

34 Logic-Based Planning A STRIPS-like Planning formulation is (LaValle, 1996): 1. A nonempty set, I, of instances. 2. A nonempty set, P, of predicates, which are binary-valued (partial) functions of one or more instances. Each application of a predicate to a specific set of instances is called a positive literal if the predicate is true or a negative literal if it is false. 3. A nonempty set, O, of operators, each of which has: 1) preconditions, a set of positive and negative literals that must hold for the operator to apply, and 2) effects, a set of positive and negative literals that are the result of applying the operator. 4. An initial set, S, expressed as a set of positive literals. All literals not appearing in S are assumed to be negative. 5. A goal set, G, expressed as a set of both positive and negative literals.

35 Logic-Based Planning STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971) Example: a mobile robot should move a box from room S3 to S2 (door P1 connects rooms S1 and S2, door P2 connects rooms S2 and S3; the robot R starts in S1, the box B is in S3). World Model (KB): inroom(robot, room_s1), inroom(box, room_s3), connects(door_p1, room_s1, room_s2), connects(door_p2, room_s2, room_s3). Goal: inroom(box, room_s2). Plan (Action Sequence): move(robot, room_s1, room_s3), search(box), push(box, room_s3, room_s2, door_p2).

36 Logic-Based Planning STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971) Tasks are specified as well-formed formulas, or wffs (predicate calculus). The planning system attempts to find an action sequence that modifies the world model so as to make the wff TRUE. To generate a plan, the effect of each action is modeled.

37 Logic-Based Planning STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971) An operator (action over the world model) maps world model S_i (a clause set) into world model S_{i+1} (a clause set): if its pre-conditions are satisfied, it adds clauses and removes clauses. Example: {inroom(robot, room_s1), inroom(box, room_s3), connects(door_p1, room_s1, room_s2), connects(door_p2, room_s2, room_s3)} becomes {inroom(robot, room_s2), inroom(box, room_s3), connects(door_p1, room_s1, room_s2), connects(door_p2, room_s2, room_s3)}. Planning loop: 1. Is the goal clause in the current world model? YES: success. NO: 2. Search the operator list for one whose pre-conditions are satisfied and that, when applied to the current model, produces a new world model where the goal is closer to being satisfied. 3. Go to 1.
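
The STRIPS loop above can be sketched with operators carrying precondition, add and delete lists over ground literals. The Python fragment below is our own simplification of the example on slide 35 (only two ground operators, omitting search(box); helper names are ours), and it searches forward over clause sets until the goal clause is contained in the world model.

```python
# Sketch of forward STRIPS-style planning over ground literals (illustrative encoding).
from collections import deque

# Each operator: (name, preconditions, add list, delete list), as sets of ground literals.
operators = [
    ("move(robot,s1,s3)", {"inroom(robot,s1)"}, {"inroom(robot,s3)"}, {"inroom(robot,s1)"}),
    ("push(box,s3,s2)",   {"inroom(robot,s3)", "inroom(box,s3)"},
                          {"inroom(box,s2)", "inroom(robot,s2)"},
                          {"inroom(box,s3)", "inroom(robot,s3)"}),
]
initial = frozenset({"inroom(robot,s1)", "inroom(box,s3)"})
goal = {"inroom(box,s2)"}

def strips_plan(initial, goal, operators):
    """Breadth-first search over clause sets; returns a sequence of operator names."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # goal clauses contained in the world model
            return plan
        for name, pre, add, delete in operators:
            if pre <= state:                     # pre-conditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(strips_plan(initial, goal, operators))  # ['move(robot,s1,s3)', 'push(box,s3,s2)']
```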

38 Logic-Based Planning STRIPS and Situation Calculus. STRIPS OPERATOR move(robot, room_s1, room_s2): Pre-conditions: inroom(robot, room_s1), connects(door_p1, room_s1, room_s2). Effects: Add: inroom(robot, room_s2); Delete: inroom(robot, room_s1). Situation Calculus (successor state axiom for the same action): ∀a ∀s room(s2) ⇒ [ inroom(robot, s2, Result(a, s)) ⇔ (room(s1) ∧ a = move(robot, s1, s2) ∧ inroom(robot, s1, s)) ∨ (inroom(robot, s2, s) ∧ ∀x (room(x) ⇒ a ≠ move(robot, s2, x))) ].

39 Plan Representation and Task Modeling How to represent and determine the right plan? [Figure: behavior-switching finite state machine for a soccer robot, with behaviors Standby, GetClose2Ball, TakeBall2Goal, Score, ClearBall, GoEmptySpot and GoHome, and transition events such as saw_ball, lost_ball, undribbable, unreachable_ball, unreachable_posture, obstacle, success, can_shoot_safely and ShouldIGo.]

40 Plan Representation and Task Modeling To design a plan that meets some specifications, we need a model of the robot task the plan is supposed to carry out. A model enables performance analysis and formal verification (model checking). Robot tasks are discrete event systems (DES): event-driven (not time-driven), with a discrete (not continuous) state space.

41 Plan Representation and Task Modeling [Figure: comparison of a time-driven system with a continuous state space, x(t) evolving with time t, and an event-driven system with a discrete state space, where events e1, e2, e3 occurring at times t1, ..., t4 move the state among x0, x1, x2.]

42 Plan Representation and Task Modeling DES: State Machines / Finite State Automata

43 Petri Nets [Figure: example Petri net with places p1, p2, p3, p4 and transitions t1, t2, t3.]

44 Petri Nets Def.: A Petri net (PN) graph or structure is a weighted bipartite graph (P, T, A, w), where: P = {p_1, p_2, ..., p_n} is the finite set of places; T = {t_1, t_2, ..., t_m} is the finite set of transitions; A ⊆ (P × T) ∪ (T × P) is the set of arcs from places to transitions, (p_i, t_j), and from transitions to places, (t_j, p_i); w: A → {1, 2, 3, ...} is the weight function on the arcs. Set of input places to t_j ∈ T: I(t_j) = {p_i ∈ P : (p_i, t_j) ∈ A}. Set of output places from t_j ∈ T: O(t_j) = {p_i ∈ P : (t_j, p_i) ∈ A}.

45 Petri Nets Def.: A marked Petri net is a five-tuple (P, T, A, w, x), where (P, T, A, w) is a Petri net graph and x is a marking of the set of n places P; x = [x(p_1), x(p_2), ..., x(p_n)] ∈ ℕⁿ is the row vector associated with x. [Figure: marked Petri net with places p1-p4 and transitions t1-t3.]

46 Petri Nets Def. (PN dynamics): The state transition function f: ℕⁿ × T → ℕⁿ of Petri net (P, T, A, w, x) is defined for transition t_j ∈ T iff x(p_i) ≥ w(p_i, t_j) for all p_i ∈ I(t_j) (t_j is then said to be enabled). If f(x, t_j) is defined, the new state is x' = f(x, t_j), where x'(p_i) = x(p_i) − w(p_i, t_j) + w(t_j, p_i), i = 1, ..., n. [Figure: marked Petri net with places p1-p4 and transitions t1-t3.]
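
The definition above translates directly into code. The sketch below is our own minimal encoding (unit arc weights, arc dictionary and place/transition names are illustrative): it checks whether a transition is enabled and computes the new marking x' = f(x, t_j).

```python
# Sketch of Petri net dynamics: enabling test and firing rule from the definition above.

# Arc weights: w[(source, target)] for place->transition and transition->place arcs.
w = {("p1", "t1"): 1, ("t1", "p2"): 1, ("t1", "p3"): 1,
     ("p2", "t2"): 1, ("t2", "p4"): 1,
     ("p3", "t3"): 1, ("t3", "p4"): 1}

def enabled(x, t):
    """t is enabled iff every input place holds at least w(p, t) tokens."""
    return all(x[p] >= k for (p, tt), k in w.items() if tt == t and p in x)

def fire(x, t):
    """New marking: x'(p) = x(p) - w(p, t) + w(t, p)."""
    assert enabled(x, t), f"{t} is not enabled"
    x_new = dict(x)
    for (src, dst), k in w.items():
        if dst == t:                 # input arc p -> t: remove tokens
            x_new[src] -= k
        elif src == t:               # output arc t -> p: add tokens
            x_new[dst] += k
    return x_new

x0 = {"p1": 1, "p2": 0, "p3": 0, "p4": 0}
print(fire(x0, "t1"))                # {'p1': 0, 'p2': 1, 'p3': 1, 'p4': 0}
```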

47-51 Petri Nets [Figures: successive firings of the example Petri net with places p1-p4 and transitions t1-t3, showing how the marking (token distribution) evolves.]

52 Plan Representation and Modeling Petri Net Models of Robotic Tasks (Lima et al, 1998) (Milutinovic, Lima, 2002) (Costelha, Lima, 2012) Places with tokens represent available resources and running primitive actions. The state is distributed over the places with tokens (PN marking). Events are assigned to transitions and represent uncontrolled changes of state (e.g., caused by other agents or simply by the environment dynamics) or controlled decisions to start a primitive action. A transition fires when it is enabled and the labeling event occurs (note: the labeling event may be replaced by input/output arcs to places representing the reading of sensors, for modeling and analysis).

53 Plan Representation and Modeling [Figure: PN model of a multi-task single robot, with places such as standby, following_track, move2post, teardown_pole, back2track, check_if_track, following_resuming_point, look_ahead, look_left and look_right, and events such as track_found, track_not_found, detected_post, reached_post, pole_down, found_interrupt, no_resuming_point and resuming_point.]

54 Plan Representation and Modeling (Lima et al, 1998) A Tool for Robotic Task Design and Distributed Execution; further developments in (Milutinovic, Lima, 2002). [Figure: Petri net with places p1 (standby), p2 (locating_ball), p3 (vision_ready2locate_ball), p4 (robot_ready2move), p5 (moving2ball), p6 (catching_ball) and transitions t1 (start), t2 (new_frame), t3 (ball_located), t4 (ready2catch), t5 (ball_catched).]

55 Petri Nets Def. (Labeled Petri net): A labeled Petri net N is an eight-tuple N = (P, T, A, w, E, l, x_0, X_m) where (P, T, A, w) is a PN graph, E is the event set for transition labeling, l: T → E is the transition labeling function, x_0 ∈ ℕⁿ is the initial state, and X_m ⊆ ℕⁿ is the set of marked states. Def. (Languages generated and marked): L(N) := { l(s) ∈ E* : s ∈ T* and f(x_0, s) is defined }; L_m(N) := { l(s) ∈ L(N) : s ∈ T* and f(x_0, s) ∈ X_m }.

56-60 Plan Representation and Modeling Petri Nets (PN) Language Model For the Petri net N of slide 54 (places p1-p6, transitions t1-t5): E = {s, nf, bl, r2c, bc}, with l(t1) = s (start), l(t2) = nf (new_frame), l(t3) = bl (ball_located), l(t4) = r2c (ready2catch), l(t5) = bc (ball_catched), and X_m containing the initial marking x_0. Starting from x_0, successive firings of t1, t2, t3, ... change the marking (state) and generate the strings ε, s, s nf, s nf bl, ... Generated and marked languages: L(N) = {ε, s, s nf, s nf bl, ...}; L_m(N) ⊆ L(N) contains the strings that lead to a marked state, e.g., ε and s nf bl r2c bc.

61 Plan Representation and Modeling Monitoring algorithms check the value of predicates over world state variables. An event occurrence means that a logical function of the predicates changed from true to false or vice-versa. Examples of events: found_ball: see(ball) changes from false to true; lost_ball: see(ball) changes from true to false; see_ball AND closest_player2ball changes from false to true.

62 Plan Representation and Modeling PN markings represent world states. A plan to carry out a task is the sequence of primitive actions in a sequence of markings (world states). Plans are conditional, as resource places in markings represent logical pre-conditions for the execution of the next primitive action. Example: primitive actions set X = {GetCloseToBall, TakeBallToGoal, Score}. Plan: GetCloseToBall. TakeBallToGoal. Score.

63 Plan Representation and Modeling Event sequences (i.e., strings) are an equivalent representation of plans. A language is the set of all possible plans for a robot. Different language classes are equivalent to the machine types used to represent and execute the task (Finite State Automata, PNs, ...). Of course, larger classes have an increased modeling power (e.g., PN languages vs regular/finite state machine languages). Do not confuse this with modeling elegance: it is more natural to program with a rule-based system than with a state machine, but it is not necessarily more powerful (compare C vs assembly).

64 Plan Representation and Modeling Abstraction Levels in Discrete Event Systems Untimed: event sequences e1, e2, ..., ek, ... and state sequences x0, x1, ..., xk, ... (models: FSA, PN). Timed: a time (duration) is associated to events/transitions, giving x(t) (models: Timed FSA, Timed PN). Stochastic Timed: a stochastic time (duration) is associated to events/transitions, giving x(t) and p(x(t)) (models: STA, SPN).

65 Stochastic DES STOCHASTIC TIMED AUTOMATA (STA) inter-event time is stochastically distributed (typical case: exponential pdf) STOCHASTIC PETRI NET (SPN) inter-event time is stochastically distributed (typical case: exponential pdf) stochastic inter-event time assigned to transitions SPN with exponential timed transitions is equivalent to a Markov Chain

66 Controllable vs Uncontrollable Events in PNs (Costelha and Lima, 2012) Conflict between transitions enabled by different predicates (whose value is not controlled by the robot) models uncertain action effects, e.g., the probability that the robot stops seeing the ball before getting close to it is λ2 / (λ2 + λ3). Conflict between controllable events (associated to commands to start Dribble2Goal or Kick2Goal) is resolved by a random switch: the probability of choosing Dribble2Goal is p5, the probability of choosing Kick2Goal is p7. This is a probabilistic policy.

67 PN Hierarchical Plan Representation (Costelha and Lima, 2012)

68 Generalized Stochastic Petri Net Closed Loop Robot Plan / Environment (Costelha and Lima, 2012) stochastic transitions, immediate transitions

69 Plan Qualitative Analysis (Formal Verification) Qualitative view/models enable answering analysis questions such as: will bad behaviors occur? will unsafe states be avoided? will we attempt to use more resources than those available? Qualitative view/models enable designing supervisors for specifications such as: eliminate substrings corresponding to bad behaviors avoid blocking ensure bounded usage of resources

70 Plan Qualitative Analysis (Formal Verification) Safety properties: for all executions, the system avoids a bad set of events, or a set of bad strings is never generated or marked. E.g., the robot does not enter a room where there are holes in the ground (any sequence including traversing a door leading to the room must be disabled from happening). Blocking properties: deadlocks or livelocks. E.g., a robot that can only move forward enters a corridor with a dead end.

71 Plan Qualitative Analysis Def. (Boundedness): Place p_i ∈ P in PN N with initial state x_0 is said to be k-bounded, or k-safe, if x(p_i) ≤ k for all states x ∈ R(N), i.e., for all reachable states. E.g., the robot cannot be called for a transportation task a second time while it is performing the same task (the place corresponding to the robot performing the transportation task should be 1-bounded, or safe). Def. (Conservation): A PN N with initial state x_0 is said to be conservative with respect to γ = [γ_1, γ_2, ..., γ_n] if Σ_{i=1}^{n} γ_i x(p_i) = constant for all reachable states. E.g., a robot with only one tool can never use two tools simultaneously during the performance of the whole task.
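
For nets with a finite reachability set, k-boundedness of a place can be checked by exhaustively enumerating R(N). The sketch below is our own (the tiny two-place net, the pre/post encoding and the helper names are illustrative): it does a breadth-first enumeration of reachable markings and reports the maximum token count observed in each place.

```python
# Sketch: check k-boundedness by enumerating the reachability set (finite nets only).
from collections import deque

# Simple net given by pre/post incidence dicts: pre[t][p] and post[t][p] are arc weights.
pre  = {"t1": {"p1": 1}, "t2": {"p2": 1}}
post = {"t1": {"p2": 1}, "t2": {"p1": 1}}
places = ("p1", "p2")
x0 = (1, 0)                          # marking of (p1, p2) as a tuple, so it is hashable

def successors(x):
    m = dict(zip(places, x))
    for t in pre:
        if all(m[p] >= k for p, k in pre[t].items()):          # t enabled
            m2 = dict(m)
            for p, k in pre[t].items():
                m2[p] -= k
            for p, k in post[t].items():
                m2[p] += k
            yield tuple(m2[p] for p in places)

reachable, frontier = {x0}, deque([x0])
while frontier:
    for x in successors(frontier.popleft()):
        if x not in reachable:
            reachable.add(x)
            frontier.append(x)

bounds = {p: max(x[i] for x in reachable) for i, p in enumerate(places)}
print(bounds)                        # {'p1': 1, 'p2': 1}: both places are 1-bounded (safe)
```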

72 Plan Qualitative Analysis Def. (Liveness): A PN N with initial state x_0 is said to be live if there always exists some sample path such that any transition can eventually fire from any state reached from x_0. Liveness levels: a transition in a PN may be Dead or L0-live, if the transition can never fire from this state; L1-live, if there is some firing sequence from x_0 such that the transition can fire at least once; L2-live, if the transition can fire at least k times for some given positive integer k; L3-live, if there exists some infinite firing sequence in which the transition appears infinitely often; L4-live, if the transition is L1-live for every possible state reached from x_0 (L4-liveness implies L3, which implies L2, which implies L1). This property is related to the reachability of given states and to the repeatability of system states (e.g., error recovery and returning to the initial state).

73 Plan Qualitative Analysis Def. (Liveness): A PN N with initial state x_0 is said to be live if there always exists some sample path such that any transition can eventually fire from any state reached from x_0. Liveness levels, robot task examples: Dead or L0-live: robot in a deadlock situation. L1-live: after the robot picks an object it will not be able to pick it again later. L2-live: the robot can only perform an action sequence with a finite number of steps (e.g., release as many objects as those it picked before, until its transported bin is empty). L3-live: the robot keeps repeating the same action sequence forever. L4-live: the robot can always return to the same state and repeat the same operation.

74 Plan Quantitative Analysis Stochastic view/models enable answering analysis questions such as: what is the probability of success of a task plan? given a probability of success for the plan, how many steps (actions) will it take to accomplish the task? Stochastic view/models enable designing controllers for specifications such as: given some allowed number of steps for a plan, determine the plan that maximizes the probability of success given some desired probability of success, determine the plan that minimizes the number of required actions, or the accumulated action cost

75 Markov Decision Process (MDP) A Markov Chain is a stochastic process X(t) with discrete state space which satisfies the Markov property: Pr{x_{t+1} = x_j | x_t = x_i, x_{t−1} = x_k, ..., x_0} = Pr{x_{t+1} = x_j | x_t = x_i} = p_ij, where the p_ij are the transition probabilities. Adding actions to a Markov Chain makes the transition probabilities depend on the action taken.

76 Markov Decision Process (MDP) A Markov Chain with transition probabilities dependent on actions (u), rewards r associated to each (state x, action u) pair, and an associated cost/performance function is known as a Markov Decision Process (MDP): Pr{x_{t+1} = x', r_{t+1} = r | x_t, u_t, r_t, x_{t−1}, u_{t−1}, ..., r_1, x_0, u_0} = Pr{x_{t+1} = x', r_{t+1} = r | x_t, u_t}. [Figure: MDP with states "object on the table", "object grasped" and "object on the floor", actions grasp, pickup and release, and transition probabilities such as 1.0 and 0.5; no rewards are included in the diagram.]

77 Planning as Solving MDPs Conflict between transitions enabled by different predicates (whose value is not controlled by the robot) models uncertain action effects, e.g., the probability that the robot stops seeing the ball before getting close to it is λ2 / (λ2 + λ3). Conflict between controllable events (commands to start Dribble2Goal or Kick2Goal) is resolved by a random switch (probabilistic policy): the probability of choosing Dribble2Goal is p5, of choosing Kick2Goal is p7. The GSPN is equivalent to an MDP.

78 STOCHASTIC PETRI NETS PN Stochastic Timed Model Def.: A Stochastic PN is a 6-tuple (P, T, A, w, x, F) where (P, T, A, w, x) is a marked PN, and F: R[x_0] × T → ℝ is a function that associates to each transition t, in each reachable marking x, a random variable. Def.: A Generalized Stochastic PN is a 7-tuple (P, T = T_0 ∪ T_D, A, w, x, F, S) where (P, T, A, w, x) is a marked PN, F: R[x_0] × T_D → ℝ is a function that associates to each timed transition t ∈ T_D, in each reachable marking x, a random variable, each t ∈ T_0 has zero firing time in all reachable x, and S is a set (possibly empty) of elements called random switches, which associate probability distributions to subsets of conflicting immediate transitions.

79 EXPONENTIAL TIMED PETRI NETS For Exponential Timed PNs, in the two previous definitions F: R[x_0] × T_D → ℝ is a function that associates to each transition t_j ∈ T_D, in each reachable marking x, an exponential random variable with rate λ_j(x). The transitions in T_D are known as exponential transitions, and λ_j(x) is referred to as the firing rate of t_j in x.

80 EXPONENTIAL TIMED PETRI NETS Theorem: The marking process of an exponential timed Petri net is a continuous time Markov Chain (CTMC). The state space of the equivalent CTMC is the reachability set R[x_0] of the exponential timed Petri net. The transition rate from state x_i to state x_j ≠ x_i is q_ij = Σ_{t_k ∈ T_ij} λ_k(x_i), where T_ij is the subset of T_D of enabled transitions in x_i such that the firing of any transition in T_ij leaves the CTMC in x_j, and q_ii = −Σ_{j ≠ i} q_ij.

81 GENERALIZED SPN (GSPN) When there is conflict in state x_i, if T_i is the set of enabled transitions in x_i, the probability of firing t_j ∈ T_i is: if T_i is composed of exponential transitions only, λ_j(x_i) / Σ_{t_k ∈ T_i} λ_k(x_i); if T_i includes one single immediate transition, this is the one that will fire; if T_i includes two or more immediate transitions, a probability mass function is specified over them by an element of S. The subset of immediate transitions plus the switching distribution is called a random switch.

82 GSPN AND EQUIVALENT CTMC To ensure the existence of a unique steady-state probability vector (ρ_1, ..., ρ_s) for the marking process of a GSPN with s tangible markings, the following simplifying assumptions are made: 1. The GSPN is bounded, i.e., its reachability set is finite. 2. Firing rates do not depend on time parameters, ensuring that the equivalent MC is homogeneous. 3. The GSPN model is proper and deadlock-free, i.e., the initial marking is reachable with a non-zero probability from any marking in the reachability set, and there is no absorbing marking (this assumption can be lifted).

83 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: GSPN for a pick-and-carry task, with places such as p.ontable(obj), p.grasped(obj), a.observing_table, a.pickingup_obj, a.carrying_obj and a.depositing_obj, exponential transitions with rates λ1, λ2, λ5, and random switches sel_pickup_obj, sel_carry_obj, sel_deposit_obj with probabilities q3 and q4, q3 + q4 = 1.]

84 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: marking graph of the example GSPN, with tangible and vanishing markings connected by transitions t1-t6.]

85 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: embedded Markov chain (EMC) of the example GSPN, with transition probabilities λ1/(λ1+λ2), λ2/(λ1+λ2), q3, q4 and 1 between the tangible and vanishing markings.]

86 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: reduced embedded Markov chain (REMC) over the tangible markings only, with transition probabilities such as q3 λ1/(λ1+λ2), λ2/(λ1+λ2) + q4 λ1/(λ1+λ2) and 1.] MDP view: the random switch probabilities can be manipulated to achieve the optimal decision.

87 GSPN, REMC AND PERFORMANCE MEASURES The PNs of the robot controller and of the world model must be connected in closed loop. The closed-loop PN can be analyzed w.r.t., e.g. (ρ_j is the steady-state probability of marking j): 1. Probability that a particular condition C holds: Pr(C) = Σ_{j ∈ S_1} ρ_j, S_1 = {j ∈ {1, ..., s} : C is satisfied in x_j}. 2. Probability that place p_i has exactly k tokens: Pr(p_i, k) = Σ_{j ∈ S_2} ρ_j, S_2 = {j ∈ {1, ..., s} : x_j(p_i) = k}. 3. Expected number of tokens in a place p_i: ET[p_i] = Σ_{k=1}^{K} k Pr(p_i, k), where K is the maximum number of tokens p_i may contain in any reachable marking.

88 GSPN, REMC AND PERFORMANCE MEASURES 4. Throughput rate of an exponential transition t_j: TR(t_j) = Σ_{i ∈ S_3} ρ_i λ(x_i, t_j) υ_ij, S_3 = {i ∈ {1, ..., s} : t_j enabled in x_i}, where υ_ij is the probability that t_j fires among all enabled transitions in x_i. 5. Throughput rates of immediate transitions can be computed from those of the exponential transitions and from the structure of the model. 6. Mean waiting time in a place p_i: WAIT(p_i) = ET[p_i] / Σ_{t_j ∈ IN(p_i)} TR(t_j) = ET[p_i] / Σ_{t_j ∈ OUT(p_i)} TR(t_j).
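
Once the steady-state probabilities ρ_j of the tangible markings are known, measures 1-3 of slide 87 reduce to simple sums. The sketch below uses made-up markings and ρ values (the place names and numbers are illustrative only, not from the slides):

```python
# Sketch: GSPN performance measures from a steady-state distribution over tangible markings.
# Markings x_j given as dicts place -> tokens; rho[j] is the probability of marking j.
markings = [{"p_idle": 1, "p_busy": 0},
            {"p_idle": 0, "p_busy": 1}]
rho = [0.25, 0.75]                                   # illustrative values only

def prob_condition(cond):
    """1. Probability that a condition over markings holds."""
    return sum(r for x, r in zip(markings, rho) if cond(x))

def prob_tokens(place, k):
    """2. Probability that a place holds exactly k tokens."""
    return prob_condition(lambda x: x[place] == k)

def expected_tokens(place):
    """3. Expected number of tokens in a place."""
    return sum(k * prob_tokens(place, k)
               for k in range(1, max(x[place] for x in markings) + 1))

print(prob_condition(lambda x: x["p_busy"] >= 1))    # 0.75
print(expected_tokens("p_busy"))                     # 0.75
```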

89 Markov Decision Process (MDP) Given: states x, actions u, transition probabilities p(x' | u, x), reinforcement r_t / expected payoff function r(x, u). Wanted: a policy π(x) that maximizes the future expected (discounted) reward.

90 MDP Rewards and Policies A policy (fully observable case) is a map of states onto actions: π: x_t → u_t. Expected discounted cumulative reward / payoff: R_T = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} ], 0 < γ ≤ 1. T = 0: greedy policy; T > 0: finite-horizon case, typically no discount; T = ∞: infinite-horizon case, finite reward if the discount is < 1.

91 Markov Decision Process (MDP) [Figure: agent-environment loop. At each step the agent applies action u_t ∈ U(x_t) to the environment (stochastic state dynamics) and observes the next state x_{t+1} ∈ X and the reinforcement r_{t+1} ∈ ℝ.] Goal: choose the action sequence that maximizes R_T = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} ], 0 < γ ≤ 1; T may go to infinity, as long as γ < 1.

92 MDP Ex.: Recycling robot [Figure: MDP with states Battery High and Battery Low; actions search_trash, wait and recharge_battery; each arc is labeled with a transition probability and an expected reward, e.g., (α, R_search_trash) and (1−α, R_search_trash) for search_trash in the high-battery state, (β, R_search_trash) and (1−β, −3) for search_trash in the low-battery state (the penalty when the robot has to be rescued because its battery is depleted), (1, R_wait) for wait and (1, 0) for recharge_battery.] R_search_trash > R_wait > 0: the rewards are the expected number of cans collected while performing the corresponding tasks.

93 Policies Expected cumulative payoff of policy π: R^π_T(x_t) = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} | x_t, u_{t+τ} = π(x_{t+τ}) ]. Bellman equation for continuous action and state spaces (the policy may be deterministic or probabilistic; the bracketed term is the expected payoff): V^π_T(x) = E{ R^π_T | x_t = x } = ∫ π(x, u) [ ∫ p(x' | u, x) [ r(x, u) + γ V^π_{T−1}(x') ] dx' ] du. Bellman equation for discrete action and state spaces: V^π_T(x) = E{ R^π_T | x_t = x } = Σ_u π(x, u) Σ_{x'} p(x' | u, x) [ r(x, u) + γ V^π_{T−1}(x') ]. V_T(x) = max_π V^π_T(x).

94 Policies cont'd. Expected cumulative payoff of policy π: R^π_T(x_t) = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} | x_t, u_{t+τ} = π(x_t) ]. Optimal policy: π* = argmax_π R^π_T(x_t). 1-step optimal policy: π_0(x) = argmax_u r(x, u). Value function of the 1-step optimal policy: V_0(x) = max_u r(x, u).

95 2-step Policies Optimal policy: π_1(x) = argmax_u [ r(x, u) + γ Σ_{x'} V_0(x') p(x' | u, x) ]. Value function: V_1(x) = max_u [ r(x, u) + γ Σ_{x'} V_0(x') p(x' | u, x) ].

96 T-step Policies Optimal policy: π_T(x) = argmax_u [ r(x, u) + γ Σ_{x'} V_{T−1}(x') p(x' | u, x) ]. Value function: V_T(x) = max_u [ r(x, u) + γ Σ_{x'} V_{T−1}(x') p(x' | u, x) ].

97 Infinite Horizon Optimal value function, infinite horizon (Bellman equation): V(x) = max_u [ r(x, u) + γ Σ_{x'} V(x') p(x' | u, x) ]. Its fixed point yields the optimal policy. Necessary and sufficient condition: the induced policy is optimal iff the value function satisfies the above condition.

98 Value Iteration 1. for all x do V̂(x) ← r_min endfor 2. repeat until convergence: for all x do V̂(x) ← max_u [ r(x, u) + γ Σ_{x'} V̂(x') p(x' | u, x) ] endfor endrepeat 3. π(x) = argmax_u [ r(x, u) + γ Σ_{x'} V̂(x') p(x' | u, x) ]
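
A direct implementation of the value-iteration loop above for a small finite MDP encoded as dictionaries is sketched below; the two-state example (loosely inspired by the recycling robot, but with made-up numbers) and all names are illustrative.

```python
# Sketch: value iteration for a finite MDP.  p[(x, u)] is a dict x' -> probability,
# r[(x, u)] is the expected payoff; states, actions and numbers are illustrative only.

p = {("low", "wait"):     {"low": 1.0},
     ("low", "recharge"): {"high": 1.0},
     ("high", "wait"):    {"high": 1.0},
     ("high", "search"):  {"high": 0.8, "low": 0.2}}
r = {("low", "wait"): 1.0, ("low", "recharge"): 0.0,
     ("high", "wait"): 1.0, ("high", "search"): 3.0}
states = ["low", "high"]
actions = {"low": ["wait", "recharge"], "high": ["wait", "search"]}
gamma = 0.9

def q_value(V, x, u):
    """r(x, u) + gamma * sum_x' p(x'|u, x) V(x')."""
    return r[(x, u)] + gamma * sum(prob * V[x2] for x2, prob in p[(x, u)].items())

V = {x: 0.0 for x in states}                          # step 1: initialize V_hat
for _ in range(1000):                                 # step 2: repeat until convergence
    V_new = {x: max(q_value(V, x, u) for u in actions[x]) for x in states}
    converged = max(abs(V_new[x] - V[x]) for x in states) < 1e-8
    V = V_new
    if converged:
        break

policy = {x: max(actions[x], key=lambda u: q_value(V, x, u)) for x in states}  # step 3
print(V, policy)
```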

99 Value Iteration for Motion Planning

100 Reinforcement Learning The previous (DP) methods to solve MDPs assume full knowledge of p(x' | u, x) and r(x, u). Dynamic Programming (DP): to determine V* for |X| = N, a system of N non-linear equations must be solved. Well-established mathematical method. A complete model of the environment is required (P and R known). Often faces the curse of dimensionality [Bellman, 1957]. Alternative approaches, if we do not know p(x' | u, x) and r(x, u): Monte Carlo: similar to DP, but P and R are unknown; P and R are determined from the average of several trial-and-error trials; inappropriate for a step-by-step incremental approximation of V*. Temporal Differences: knowledge of P and R is not required; step-by-step incremental approximation of V; mathematical analysis more complex; Q-learning.

101 Value Functions State value for policy π: V^π(x) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | x_t = x ]: the expected value of starting in state x and following policy π thereafter. (State, action) value for policy π: Q^π(x, u) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | x_t = x, u_t = u ]: the expected value of starting in state x, carrying out action u, and following policy π thereafter.

102 Value Functions (cont'd) Relation between the state value and the Q function for policy π: Q(x, u) = E{ r_{t+1} + γ V(x_{t+1}) | x_t = x, u_t = u }; V(x) = max_{u'} Q(x, u'); Q(x, u) = E{ r_{t+1} + γ max_{u'} Q(x_{t+1}, u') | x_t = x, u_t = u }. Q is such that its value is the maximum discounted cumulative reward that can be achieved starting from state x and applying action u as the first action.

103 Value Functions (cont'd) Bellman equations for V and Q (discrete action and state spaces, deterministic policy): V^π_T(x) = Σ_{x'} p(x' | u, x) [ r(x, u) + γ V^π_{T−1}(x') ]; Q^π_T(x, u) = Σ_{x'} p(x' | u, x) [ r(x, u) + γ max_{u'} Q^π_{T−1}(x', u') ]; V*_T = max_π V^π_T(x); Q*_T = max_π Q^π_T(x, u). Solutions are unique, and the equations are also satisfied by the optimal functions.

104 Q-Learning - Algorithm Initialize Q(x, u) randomly or arbitrarily. Repeat forever (for each episode or trial): initialize x; repeat (for each step n of the episode): choose action u in x; execute action u and observe r and x'; update Q for the n_xu-th visit to (x, u): Q_{n_xu+1}(x, u) ← Q_{n_xu}(x, u) + α_{n_xu} [ r(x, u) + γ max_{u'} Q_{n_xu}(x', u') − Q_{n_xu}(x, u) ]; x ← x'; until x is final. A constant α allows adaptability to slow environment changes, but it does not guarantee convergence; convergence is only possible with a temporal decay, under given circumstances.
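
A minimal tabular Q-learning loop matching the pseudo-code above is sketched below; the toy environment (a 1-D corridor with a goal on the right), the constant α and all names are our own illustration, not from the slides.

```python
# Sketch: tabular Q-learning with epsilon-greedy action selection on a toy 1-D corridor.
import random

N_CELLS, GOAL = 5, 4                 # states 0..4, goal at the right end
ACTIONS = [-1, +1]                   # move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = {(x, u): 0.0 for x in range(N_CELLS) for u in ACTIONS}

def step(x, u):
    """Environment: deterministic move, reward 1 on reaching the goal, else 0."""
    x_next = min(max(x + u, 0), N_CELLS - 1)
    return x_next, (1.0 if x_next == GOAL else 0.0)

for episode in range(500):
    x = 0                                           # initialize x
    while x != GOAL:                                # repeat for each step of the episode
        # epsilon-greedy choice of the action u in state x
        if random.random() < epsilon:
            u = random.choice(ACTIONS)
        else:
            u = max(ACTIONS, key=lambda a: Q[(x, a)])
        x_next, reward = step(x, u)                 # execute u, observe r and x'
        best_next = max(Q[(x_next, a)] for a in ACTIONS)
        Q[(x, u)] += alpha * (reward + gamma * best_next - Q[(x, u)])
        x = x_next

print({x: max(ACTIONS, key=lambda a: Q[(x, a)]) for x in range(N_CELLS - 1)})  # learned policy
```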

105 Q-Learning Algorithm Convergence Should each pair (x, u) be visited an infinite number of times, with 0 ≤ α_{n_xu} < 1, Σ_{i=1}^{∞} α_{n_xu(i)} = ∞ and Σ_{i=1}^{∞} α_{n_xu(i)}² < ∞, then, for all x, u: Pr{ lim_{n→∞} Q̂_{n_xu}(x, u) = Q(x, u) } = 1.

106 Action Selection: Exploration vs Exploitation Exploration: less promising actions, which may lead to good results, are tested. Exploitation: takes advantage of tested actions which are more promising, i.e., which have a larger Q(x, u). ε-greedy: at each step n, picks the best action so far with probability 1−ε, for small ε, but can also pick, with probability ε, one of the other actions in a uniformly distributed random fashion. Softmax: at each step n, picks the action to be executed according to a Gibbs or Boltzmann distribution: π_n(x, u) = e^{Q_n(x, u)/τ} / Σ_{u' ∈ U(x)} e^{Q_n(x, u')/τ}.
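
Both selection rules above can be coded in a few lines; the sketch below (function and variable names are ours) implements ε-greedy selection and the Boltzmann distribution with temperature τ over the Q values of the current state.

```python
# Sketch: epsilon-greedy and softmax (Boltzmann) action selection over Q(x, u).
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict action -> Q(x, action) for the current state x."""
    if random.random() < epsilon:
        return random.choice(list(q_values))          # explore: uniform over actions
    return max(q_values, key=q_values.get)            # exploit: best action so far

def softmax(q_values, tau=1.0):
    """Pick an action with probability exp(Q(x,u)/tau) / sum_u' exp(Q(x,u')/tau)."""
    actions = list(q_values)
    weights = [math.exp(q_values[u] / tau) for u in actions]
    return random.choices(actions, weights=weights, k=1)[0]

q = {"Dribble2Goal": 0.6, "Kick2Goal": 0.4}            # illustrative Q values
print(epsilon_greedy(q), softmax(q, tau=0.5))
```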

107 Q-Learning an Example [Figure: grid world with goal state G, showing the immediate rewards r(x, u), the optimal values V*(x) (100 at the goal) and the learned Q_n(x, u) values, for α = 1 and a given γ.]

108 Q-Learning an Example [Figure: the same grid world after further learning, showing r(x, u), V*(x) and the updated Q_n(x, u) values.]


More information

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING

More information

DES. 4. Petri Nets. Introduction. Different Classes of Petri Net. Petri net properties. Analysis of Petri net models

DES. 4. Petri Nets. Introduction. Different Classes of Petri Net. Petri net properties. Analysis of Petri net models 4. Petri Nets Introduction Different Classes of Petri Net Petri net properties Analysis of Petri net models 1 Petri Nets C.A Petri, TU Darmstadt, 1962 A mathematical and graphical modeling method. Describe

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Probabilistic Planning. George Konidaris

Probabilistic Planning. George Konidaris Probabilistic Planning George Konidaris gdk@cs.brown.edu Fall 2017 The Planning Problem Finding a sequence of actions to achieve some goal. Plans It s great when a plan just works but the world doesn t

More information

Stochastic Petri Nets. Jonatan Lindén. Modelling SPN GSPN. Performance measures. Almost none of the theory. December 8, 2010

Stochastic Petri Nets. Jonatan Lindén. Modelling SPN GSPN. Performance measures. Almost none of the theory. December 8, 2010 Stochastic Almost none of the theory December 8, 2010 Outline 1 2 Introduction A Petri net (PN) is something like a generalized automata. A Stochastic Petri Net () a stochastic extension to Petri nets,

More information

Introduction to Artificial Intelligence. Logical Agents

Introduction to Artificial Intelligence. Logical Agents Introduction to Artificial Intelligence Logical Agents (Logic, Deduction, Knowledge Representation) Bernhard Beckert UNIVERSITÄT KOBLENZ-LANDAU Winter Term 2004/2005 B. Beckert: KI für IM p.1 Outline Knowledge-based

More information

Machine Learning I Reinforcement Learning

Machine Learning I Reinforcement Learning Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:

More information

Logical agents. Chapter 7. Chapter 7 1

Logical agents. Chapter 7. Chapter 7 1 Logical agents Chapter 7 Chapter 7 Outline Knowledge-based agents Wumpus world Logic in general models and entailment Propositional (Boolean) logic Equivalence, validity, satisfiability Inference rules

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour

More information

Intelligent Agents. First Order Logic. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University. last change: 19.

Intelligent Agents. First Order Logic. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University. last change: 19. Intelligent Agents First Order Logic Ute Schmid Cognitive Systems, Applied Computer Science, Bamberg University last change: 19. Mai 2015 U. Schmid (CogSys) Intelligent Agents last change: 19. Mai 2015

More information

Predicate Logic: Sematics Part 1

Predicate Logic: Sematics Part 1 Predicate Logic: Sematics Part 1 CS402, Spring 2018 Shin Yoo Predicate Calculus Propositional logic is also called sentential logic, i.e. a logical system that deals with whole sentences connected with

More information

Logical Agent & Propositional Logic

Logical Agent & Propositional Logic Logical Agent & Propositional Logic Berlin Chen 2005 References: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 7 2. S. Russell s teaching materials Introduction The representation

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Analysis and Optimization of Discrete Event Systems using Petri Nets

Analysis and Optimization of Discrete Event Systems using Petri Nets Volume 113 No. 11 2017, 1 10 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Analysis and Optimization of Discrete Event Systems using Petri Nets

More information

20/c/applet/more.html Local Beam Search The best K states are selected.

20/c/applet/more.html Local Beam Search The best K states are selected. Have some fun Checker: http://www.cs.caltech.edu/~vhuang/cs 20/c/applet/more.html 1 Local Beam Search Run multiple searches to find the solution The best K states are selected. Like parallel hill climbing

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning 1 Reinforcement Learning Mainly based on Reinforcement Learning An Introduction by Richard Sutton and Andrew Barto Slides are mainly based on the course material provided by the

More information

Decidability: Church-Turing Thesis

Decidability: Church-Turing Thesis Decidability: Church-Turing Thesis While there are a countably infinite number of languages that are described by TMs over some alphabet Σ, there are an uncountably infinite number that are not Are there

More information

Reinforcement Learning: An Introduction

Reinforcement Learning: An Introduction Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is

More information

Logical Agents: Propositional Logic. Chapter 7

Logical Agents: Propositional Logic. Chapter 7 Logical Agents: Propositional Logic Chapter 7 Outline Topics: Knowledge-based agents Example domain: The Wumpus World Logic in general models and entailment Propositional (Boolean) logic Equivalence, validity,

More information

Kecerdasan Buatan M. Ali Fauzi

Kecerdasan Buatan M. Ali Fauzi Kecerdasan Buatan M. Ali Fauzi Artificial Intelligence M. Ali Fauzi Logical Agents M. Ali Fauzi In which we design agents that can form representations of the would, use a process of inference to derive

More information

c 2011 Nisha Somnath

c 2011 Nisha Somnath c 2011 Nisha Somnath HIERARCHICAL SUPERVISORY CONTROL OF COMPLEX PETRI NETS BY NISHA SOMNATH THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Aerospace

More information

A Gentle Introduction to Reinforcement Learning

A Gentle Introduction to Reinforcement Learning A Gentle Introduction to Reinforcement Learning Alexander Jung 2018 1 Introduction and Motivation Consider the cleaning robot Rumba which has to clean the office room B329. In order to keep things simple,

More information

Title: Logical Agents AIMA: Chapter 7 (Sections 7.4 and 7.5)

Title: Logical Agents AIMA: Chapter 7 (Sections 7.4 and 7.5) B.Y. Choueiry 1 Instructor s notes #12 Title: Logical Agents AIMA: Chapter 7 (Sections 7.4 and 7.5) Introduction to Artificial Intelligence CSCE 476-876, Fall 2018 URL: www.cse.unl.edu/ choueiry/f18-476-876

More information

Time and Timed Petri Nets

Time and Timed Petri Nets Time and Timed Petri Nets Serge Haddad LSV ENS Cachan & CNRS & INRIA haddad@lsv.ens-cachan.fr DISC 11, June 9th 2011 1 Time and Petri Nets 2 Timed Models 3 Expressiveness 4 Analysis 1/36 Outline 1 Time

More information

Proof Methods for Propositional Logic

Proof Methods for Propositional Logic Proof Methods for Propositional Logic Logical equivalence Two sentences are logically equivalent iff they are true in the same models: α ß iff α β and β α Russell and Norvig Chapter 7 CS440 Fall 2015 1

More information

TDT4136 Logic and Reasoning Systems

TDT4136 Logic and Reasoning Systems TDT436 Logic and Reasoning Systems Chapter 7 - Logic gents Lester Solbakken solbakke@idi.ntnu.no Norwegian University of Science and Technology 06.09.0 Lester Solbakken TDT436 Logic and Reasoning Systems

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

AI Programming CS S-09 Knowledge Representation

AI Programming CS S-09 Knowledge Representation AI Programming CS662-2013S-09 Knowledge Representation David Galles Department of Computer Science University of San Francisco 09-0: Overview So far, we ve talked about search, which is a means of considering

More information

Some techniques and results in deciding bisimilarity

Some techniques and results in deciding bisimilarity Some techniques and results in deciding bisimilarity Petr Jančar Dept of Computer Science Technical University Ostrava (FEI VŠB-TU) Czech Republic www.cs.vsb.cz/jancar Talk at the Verification Seminar,

More information

ARTIFICIAL INTELLIGENCE. Reinforcement learning

ARTIFICIAL INTELLIGENCE. Reinforcement learning INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Lecture 1: March 7, 2018

Lecture 1: March 7, 2018 Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights

More information

Probabilistic Model Checking and Strategy Synthesis for Robot Navigation

Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Dave Parker University of Birmingham (joint work with Bruno Lacerda, Nick Hawes) AIMS CDT, Oxford, May 2015 Overview Probabilistic

More information

Revised by Hankui Zhuo, March 21, Logical agents. Chapter 7. Chapter 7 1

Revised by Hankui Zhuo, March 21, Logical agents. Chapter 7. Chapter 7 1 Revised by Hankui Zhuo, March, 08 Logical agents Chapter 7 Chapter 7 Outline Wumpus world Logic in general models and entailment Propositional (oolean) logic Equivalence, validity, satisfiability Inference

More information

Logic in AI Chapter 7. Mausam (Based on slides of Dan Weld, Stuart Russell, Subbarao Kambhampati, Dieter Fox, Henry Kautz )

Logic in AI Chapter 7. Mausam (Based on slides of Dan Weld, Stuart Russell, Subbarao Kambhampati, Dieter Fox, Henry Kautz ) Logic in AI Chapter 7 Mausam (Based on slides of Dan Weld, Stuart Russell, Subbarao Kambhampati, Dieter Fox, Henry Kautz ) 2 Knowledge Representation represent knowledge about the world in a manner that

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Reinforcement learning Daniel Hennes 4.12.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Reinforcement learning Model based and

More information

Inference in first-order logic. Production systems.

Inference in first-order logic. Production systems. CS 1571 Introduction to AI Lecture 17 Inference in first-order logic. Production systems. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Sentences in Horn normal form Horn normal form (HNF) in

More information

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Tianyu Wang: tiw161@eng.ucsd.edu Yongxi Lu: yol070@eng.ucsd.edu

More information

Linear-time Temporal Logic

Linear-time Temporal Logic Linear-time Temporal Logic Pedro Cabalar Department of Computer Science University of Corunna, SPAIN cabalar@udc.es 2015/2016 P. Cabalar ( Department Linear oftemporal Computer Logic Science University

More information

Petri Nets (for Planners)

Petri Nets (for Planners) Petri (for Planners) B. Bonet, P. Haslum... from various places... ICAPS 2011 & Motivation Petri (PNs) is formalism for modelling discrete event systems Developed by (and named after) C.A. Petri in 1960s

More information

Reinforcement Learning

Reinforcement Learning 1 Reinforcement Learning Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Plan 1 Why reinforcement learning? Where does this theory come from? Markov decision

More information

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks

More information

Internet Monetization

Internet Monetization Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition

More information

Logical Agents. Santa Clara University

Logical Agents. Santa Clara University Logical Agents Santa Clara University Logical Agents Humans know things Humans use knowledge to make plans Humans do not act completely reflexive, but reason AI: Simple problem-solving agents have knowledge

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Reinforcement Learning. Machine Learning, Fall 2010

Reinforcement Learning. Machine Learning, Fall 2010 Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30

More information

Lecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010

Lecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010 Lecture 25: Learning 4 Victor R. Lesser CMPSCI 683 Fall 2010 Final Exam Information Final EXAM on Th 12/16 at 4:00pm in Lederle Grad Res Ctr Rm A301 2 Hours but obviously you can leave early! Open Book

More information

Decision Problems with TM s. Lecture 31: Halting Problem. Universe of discourse. Semi-decidable. Look at following sets: CSCI 81 Spring, 2012

Decision Problems with TM s. Lecture 31: Halting Problem. Universe of discourse. Semi-decidable. Look at following sets: CSCI 81 Spring, 2012 Decision Problems with TM s Look at following sets: Lecture 31: Halting Problem CSCI 81 Spring, 2012 Kim Bruce A TM = { M,w M is a TM and w L(M)} H TM = { M,w M is a TM which halts on input w} TOTAL TM

More information

Logic. Propositional Logic: Syntax

Logic. Propositional Logic: Syntax Logic Propositional Logic: Syntax Logic is a tool for formalizing reasoning. There are lots of different logics: probabilistic logic: for reasoning about probability temporal logic: for reasoning about

More information

Elements of Reinforcement Learning

Elements of Reinforcement Learning Elements of Reinforcement Learning Policy: way learning algorithm behaves (mapping from state to action) Reward function: Mapping of state action pair to reward or cost Value function: long term reward,

More information

Reinforcement Learning. George Konidaris

Reinforcement Learning. George Konidaris Reinforcement Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom

More information

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University Intelligent Agents Formal Characteristics of Planning Ute Schmid Cognitive Systems, Applied Computer Science, Bamberg University Extensions to the slides for chapter 3 of Dana Nau with contributions by

More information

Inference Methods In Propositional Logic

Inference Methods In Propositional Logic Lecture Notes, Artificial Intelligence ((ENCS434)) University of Birzeit 1 st Semester, 2011 Artificial Intelligence (ENCS434) Inference Methods In Propositional Logic Dr. Mustafa Jarrar University of

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Logical Agent & Propositional Logic

Logical Agent & Propositional Logic Logical Agent & Propositional Logic Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. S. Russell and P. Norvig. Artificial Intelligence:

More information

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig First-Order Logic First-Order Theories Roopsha Samanta Partly based on slides by Aaron Bradley and Isil Dillig Roadmap Review: propositional logic Syntax and semantics of first-order logic (FOL) Semantic

More information

Propositional Logic: Syntax

Propositional Logic: Syntax Logic Logic is a tool for formalizing reasoning. There are lots of different logics: probabilistic logic: for reasoning about probability temporal logic: for reasoning about time (and programs) epistemic

More information