Task Planning AUTONOMOUS SYSTEMS. Pedro U. Lima M. Isabel Ribeiro. Institute for Systems and Robotics Instituto Superior Técnico Lisbon, Portugal
1 AUTONOMOUS SYSTEMS Task Planning Pedro U. Lima M. Isabel Ribeiro Institute for Systems and Robotics Instituto Superior Técnico Lisbon, Portugal March 2007
2 Outline 1. Planning Problem 2. Logic 3. Logic-Based Planning: Situation Calculus, STRIPS 4. Plan Representation and Modeling: Petri Net Task Models 5. Plan Analysis 6. Planning Under Uncertainty 7. Markov Decision Processes (MDP) 8. Dynamic Programming Solution of MDPs 9. Reinforcement Learning Solution of MDPs
3 Planning Planning consists of determining the action sequence that enables reaching the goal(s) of an agent. Robot Task Planning consists of determining the appropriate set of actions to move a robot from the current world state to a world state that satisfies its preferences.
4 Logic
Logic can be seen as a language to represent knowledge about the world and about a particular problem to be solved.
Syntactic System:
Alphabet: the set of accepted symbols
Formation rules: rules establishing how symbols can be aggregated so as to build formulas/sentences
Alphabet + formation rules = LANGUAGE
Inference rules: rules that establish how to derive formulas from other formulas
5 Logic
Semantic System: assigns a meaning to the language formulas.
[Diagram: Language (syntax) — formulas; World (semantics) — facts.]
6 Logic
Syntactic System vs Semantic System
Language rules: g + r + e + e + n → "green"; the semantics associates a color to the word "green".
Arithmetic rules: for x, y expressions representing numbers, x > y is a formula over numbers; the corresponding fact is true when the number represented by x is greater than the number represented by y.
7 Logic
Typically, one deals only with the world issues relevant to the problem, through a conceptualization of reality:
Objects and their relations are defined.
Functions: given a set of objects, a function establishes which object is related to the object(s) in the set and how, e.g., left_room(kitchen)
Relations: given a set of objects, a relation establishes whether that set is related in a certain way, e.g., on(laptop, table)
8 Logic
The concept of interpretation establishes the link between the language elements and the elements of the conceptualization of reality (objects, functions and relations).
Given a formula written in the defined language, its interpretation is designated a proposition.
A proposition is true iff it correctly describes the world, based on the adopted conceptualization of reality.
A formula is satisfied iff there is an interpretation that associates it to a true proposition.
9 Logic
A fact is a true proposition for a given (conceptualized) world state. The initial known facts compose the initial knowledge base. Inference is the process of obtaining new propositions (conclusions) from the knowledge base.
To ensure that a reached conclusion is satisfied by the adopted interpretation, only a conclusion satisfied for all the interpretations that satisfy the starting propositions (premises) is accepted. This way, we guarantee that, should the premises be satisfied, so is the conclusion, irrespective of the interpretation.
Example:
Premises: IF on(a,b) THEN above(a,b); on(a,b)
Conclusion: above(a,b)
10 Logic
Entailment (semantic perspective of inference) means that the truth of a given fact is entailed by the knowledge base or one of its subsets:
KB ⊨ α or Γ ⊨ α
KB (or Γ) are the premises and α is the conclusion.
Derivation (syntactic perspective of inference) is the process of proving new formulas from a set of existing formulas:
F ⊢ f or Γ ⊢ α
α (or f) denotes the formula proved from Γ (or F).
11 Logic
An inference mechanism is sound iff any formula proven/derived from a set of formulas, using that mechanism, is entailed by the set:
IF Γ ⊢ α (syntactic perspective) THEN Γ ⊨ α (semantic perspective)
An inference mechanism is complete iff, for any proposition entailed by a premise set, the formula denoting that proposition is provable/derivable from the premise set, using that mechanism:
IF Γ ⊨ α THEN Γ ⊢ α
12 Logic
Propositional Logic: facts
Predicate Logic: objects, functions and relations; variables; quantifiers
13 Situation Calculus
Logic handles the truth of propositions, not action execution: logic cannot tell which action should be executed; at most it can suggest the possible actions. Time and change are not adequately handled by basic logic (propositional, predicate).
Idea: the world state is represented by a proposition set; the set is changed according to received perceptions and executed actions; the world evolution is described by diachronic rules, which express how the world changes (representation of change).
Situation Calculus attempts to solve the problems associated with representation and reasoning under change. It is based on predicate logic and describes the world as a sequence of situations, each of which represents a world state.
14 Situation Calculus
One situation is generated from another situation by executing an action. An argument is added to each property (represented by a predicate) that may change, denoting the situation where the property is satisfied.
Ex: localization(agent, (1,1), S0), localization(agent, (1,2), S1)
To represent passing from one situation to another, the following function is used:
Result(action, situation) : A × Σ → Σ
Ex: Result(go_ahead, S0) = S1
15 Situation Calculus
Effect Axioms: pre-conditions (to execute the action) ⇒ predicate (whose logical value changes after the action is executed). They state action effects, describing the change(s) due to the action effect(s), e.g.:
∀x ∀s Present(x, s) ∧ Portable(x) ⇒ Hold(x, Result(pickup, s))
∀x ∀s ¬Hold(x, Result(release, s))
16 Situation Calculus
Frame Axioms: predicate (logical value in current situation) ∧ conditions (for no change) ⇒ predicate (in the situation following the action). One needs to state what does not change due to the action execution, e.g.:
∀a ∀x ∀s Hold(x, s) ∧ (a ≠ release) ⇒ Hold(x, Result(a, s))
∀a ∀x ∀s ¬Hold(x, s) ∧ (a ≠ pickup ∨ ¬(Present(x, s) ∧ Portable(x))) ⇒ ¬Hold(x, Result(a, s))
17 Situation Calculus
Successor State Axioms merge effect and frame axioms:
predicate true in the next situation ⇔ [ an action makes it true ∨ (it was true in the previous situation ∧ no action made it false) ]
e.g.:
∀a ∀x ∀s Hold(x, Result(a, s)) ⇔ [ (a = pickup ∧ Present(x, s) ∧ Portable(x)) ∨ (Hold(x, s) ∧ a ≠ release) ]
∀a ∀x ∀s ¬Hold(x, Result(a, s)) ⇔ [ (a = release) ∨ (¬Hold(x, s) ∧ (a ≠ pickup ∨ ¬(Present(x, s) ∧ Portable(x)))) ]
18 Situation Calculus Example (Blocks World)
Initial situation: A on B on C on the table; final situation: C on B on A. Which action sequence leads from one to the other?
Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y).
Effect Axioms:
∀x ∀y ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ⇒ On(x, y, Result(PutOn(x,y), s))
∀x ∀y ∀w ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ On(x, w, s) ⇒ ClearTop(w, Result(PutOn(x,y), s))
19 Situation Calculus Example (Blocks World)
Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y).
Frame Axioms:
∀a ∀x ∀y ∀z ∀s On(x, y, s) ∧ (a ≠ PutOn(x, z)) ⇒ On(x, y, Result(a, s))
∀a ∀x ∀y ∀s ClearTop(y, s) ∧ (a ≠ PutOn(x, y)) ⇒ ClearTop(y, Result(a, s))
20 Situation Calculus Example (Blocks World)
Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y).
Resulting Successor State Axioms:
∀a ∀x ∀y ∀z ∀s On(x, y, Result(a, s)) ⇔ [ (a = PutOn(x, y) ∧ On(x, z, s) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ Block(x) ∧ (Block(y) ∨ y = M)) ∨ (a ≠ PutOn(x, z) ∧ On(x, y, s)) ]
∀a ∀x ∀y ∀z ∀s ClearTop(z, Result(a, s)) ⇔ [ (a = PutOn(x, y) ∧ On(x, z, s) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ Block(x) ∧ (Block(y) ∨ y = M)) ∨ (a ≠ PutOn(x, z) ∧ ClearTop(z, s)) ]
21 Situation Calculus Example (Blocks World)
Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y).
Initial State: Block(A) ∧ Block(B) ∧ Block(C) ∧ On(C, M, s0) ∧ On(B, C, s0) ∧ On(A, B, s0) ∧ ClearTop(A, s0)
Goal State: Block(A) ∧ Block(B) ∧ Block(C) ∧ On(A, M, s) ∧ On(B, A, s) ∧ On(C, B, s) ∧ ClearTop(C, s)
22 Complexity of Planning Problem
The problem is intractable in the general case. Simplifying assumptions:
the agent knows everything that is relevant for the planning problem
the agent knows how its available actions can change the world state from one state to another
the planning agent is in control of the world: the only state changes are the result of its deliberate actions
the agent's preferred world states are constant during a planning episode
Based on these assumptions, a typical approach is: first formulate the plan, then execute it.
23 Extensions of Planning Problem
The real world surrounding the robot does not meet most of the simplifying assumptions, especially in dynamic, uncertain environments.
EXTENSIONS
conditional planning: handles uncertainty by enumerating the possible states that may arise after the execution of an action, and provides alternative courses of action for each of them
plan monitoring and repair: during plan execution, progress is monitored and, when deviations from the predicted nominal conditions occur, the plan execution halts and a revised plan is created
continual planning: in dynamic environments, one may allow changes of context and/or of the agent's preferences, and plan revision is an ongoing process rather than one triggered by failures of the nominal plan. Planning is not made in too much detail into the future, and it is interleaved with execution.
24 Basic Planning Problem Formulation
A possible formulation of the Planning problem is (LaValle, 2006):
1. A nonempty state space, X, which is a finite or countably infinite set of states.
2. For each state, x ∈ X, a finite action space, U(x).
3. A state transition function, f, which produces a state, f(x, u) ∈ X, for every x ∈ X and u ∈ U(x). The state transition equation is derived from f as x' = f(x, u).
4. An initial state, x_I ∈ X.
5. A goal set, X_G ⊆ X.
25 Basic Planning Problem Formulation
It is convenient to represent the planning problem as a directed state transition graph. The set of vertices is the state space, X. A directed edge from x ∈ X to x' ∈ X exists in the graph if there exists an action u ∈ U(x) such that x' = f(x, u). The initial state and goal set are designated as special vertices in the graph.
Based on this formulation, several problem-solving algorithms are available to find a feasible plan (i.e., one that leads from the initial state to one of the goal states, not necessarily optimal). Examples: breadth-first, depth-first, best-first, A*, ...
Algorithms to solve Discrete Optimal Planning problems also exist, typically based on Dynamic Programming. In this case, we want to find the sequence of actions that leads to the goal set and optimizes some criterion, such as distance traversed or energy spent.
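As an illustration of the graph-based formulation, a feasible (not necessarily optimal) plan can be found with a minimal breadth-first search over (X, U, f, x_I, X_G). The toy one-dimensional world at the bottom is a hypothetical stand-in, not an example from the slides.

```python
from collections import deque

def bfs_plan(x_init, goal_set, actions, f):
    """Breadth-first search for a feasible plan in the state transition graph.

    actions(x) -> iterable of actions available in state x (U(x))
    f(x, u)    -> successor state (the state transition function)
    """
    frontier = deque([x_init])
    parent = {x_init: None}  # maps state -> (previous state, action taken)
    while frontier:
        x = frontier.popleft()
        if x in goal_set:
            # Reconstruct the action sequence by walking back to x_init.
            plan = []
            while parent[x] is not None:
                x, u = parent[x]
                plan.append(u)
            return list(reversed(plan))
        for u in actions(x):
            x_next = f(x, u)
            if x_next not in parent:
                parent[x_next] = (x, u)
                frontier.append(x_next)
    return None  # no feasible plan exists

# Toy 1-D world: states 0..4, actions move left (-1) / right (+1),
# initial state 0, goal set {3}.
plan = bfs_plan(0, {3},
                actions=lambda x: [u for u in (-1, 1) if 0 <= x + u <= 4],
                f=lambda x, u: x + u)
```

Because BFS explores states in order of increasing path length, the first goal state reached also yields a shortest plan in number of actions.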
26 Logic-Based Planning
ADVANTAGES:
builds compact representations for discrete planning problems, when their regularity allows such compression
convenient for producing output that logically explains the steps involved to arrive at some goal
DISADVANTAGES:
difficult to generalize to enable concepts such as modeling uncertainty, unpredictability, sensing errors, and game theory to be incorporated into planning
It is possible to convert the logic-based formulation into the graph-based formulation, e.g., the set of literals may be encoded as a binary string by imposing a linear ordering on the instances and predicates, and using 1 for true and 0 for false. This way, even optimal solutions can be found. However, the problem dimension may become intractable, even for a small number of predicates and instances: e.g., for a constant number k of arguments per predicate, the state space dimension is 2^(P·I^k), where P is the number of predicates and I the number of instances per predicate argument.
27 Logic-Based Planning
A STRIPS-like Planning formulation is (LaValle, 2006):
1. A nonempty set, I, of instances.
2. A nonempty set, P, of predicates, which are binary-valued (partial) functions of one or more instances. Each application of a predicate to a specific set of instances is called a positive literal if the predicate is true, or a negative literal if it is false.
3. A nonempty set, O, of operators, each of which has: 1) preconditions, a set of positive and negative literals that must hold for the operator to apply, and 2) effects, a set of positive and negative literals that are the result of applying the operator.
4. An initial set, S, which is expressed as a set of positive literals. All literals not appearing in S are assumed to be negative.
5. A goal set, G, which is expressed as a set of both positive and negative literals.
28 Logic-Based Planning
STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971)
Example: a mobile robot should move a box from room S3 to room S2. Rooms S1 and S2 are connected by door P1, and rooms S2 and S3 by door P2; the robot starts in room S1 and the box is in room S3.
World Model (KB):
inroom(robot, room_s1)
inroom(box, room_s3)
connects(door_p1, room_s1, room_s2)
connects(door_p2, room_s2, room_s3)
Goal: inroom(box, room_s2)
Plan (Action Sequence): move(robot, room_s1, room_s3), search(box), push(box, room_s3, room_s2, door_p2)
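The STRIPS operator mechanics (check preconditions, then delete and add literals) can be sketched in a few lines. The tuple encoding of predicates and the push_box operator below are illustrative assumptions loosely based on the example above, not the original STRIPS encoding.

```python
def applicable(state, op):
    """A STRIPS operator applies when all its positive preconditions hold
    in the state and none of its negative preconditions do."""
    return op["pre_pos"] <= state and not (op["pre_neg"] & state)

def apply_op(state, op):
    """Effects: remove the 'delete' literals, then add the 'add' literals."""
    return (state - op["delete"]) | op["add"]

# Hypothetical operator: push the box (and robot) from room S3 to S2 via door P2.
push_box = {
    "pre_pos": {("inroom", "robot", "s3"), ("inroom", "box", "s3"),
                ("connects", "p2", "s2", "s3")},
    "pre_neg": set(),
    "add":    {("inroom", "box", "s2"), ("inroom", "robot", "s2")},
    "delete": {("inroom", "box", "s3"), ("inroom", "robot", "s3")},
}

# World model as a set of positive literals (closed-world assumption).
state = {("inroom", "robot", "s3"), ("inroom", "box", "s3"),
         ("connects", "p1", "s1", "s2"), ("connects", "p2", "s2", "s3")}
new_state = apply_op(state, push_box) if applicable(state, push_box) else state
```

Note how the closed-world assumption of item 4 above shows up in the code: only positive literals are stored, and absence from the set means false.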
29 Logic-Based Planning
STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971)
Tasks are specified as well-formed formulas (wff) of predicate calculus. The planning system attempts to find an action sequence that modifies the world model so as to make the wff TRUE. To generate a plan, the effect of each action is modeled: an operator (an action over the world model) maps world model Si into world model Si+1 by removing a clause set and adding a clause set, subject to the operator's pre-conditions (a clause set).
1. Is the goal clause satisfied in the current world model? YES: success. NO:
2. Search the operator list for one whose pre-conditions are satisfied and that, when applied to the current world model, produces a new world model where the goal is closer to being satisfied.
3. Go to 1.
30 Logic-Based Planning
STRIPS and Situation Calculus
STRIPS OPERATOR move(robot, room_s1, room_s2)
Pre-conditions: inroom(robot, room_s1), connects(door_p1, room_s1, room_s2)
Effects: Add: inroom(robot, room_s2); Delete: inroom(robot, room_s1)
Situation Calculus:
∀a ∀x ∀s room(s2) ⇒ ( inroom(robot, s2, Result(a, s)) ⇔ [ room(s1) ∧ a = move(robot, s1, s2) ∧ inroom(robot, s1, s) ∨ (inroom(robot, s2, s) ∧ ∀x (room(x) ⇒ a ≠ move(robot, s2, x))) ] )
31 Plan Representation and Modeling
How to model the right behavior?
[Figure: behavior-switching automaton for a soccer robot. Behaviors: Standby, GetClose2Ball, TakeBall2Goal, Score, ClearBall, GoEmptySpot, GoHome. Transitions are triggered by events and conditions such as saw_ball, lost_ball, no_ball, success, obstacle, undribbable, unreachable_ball, unreachable_posture, can_shoot_safely, and the predicate ShouldIGo.]
32 Plan Representation and Modeling
Def.: A Petri net (PN) graph or structure is a weighted bipartite graph (P, T, A, w), where:
P = {p1, p2, ..., pn} is the finite set of places
T = {t1, t2, ..., tm} is the finite set of transitions
A ⊆ (P × T) ∪ (T × P) is the set of arcs from places to transitions (pi, tj) and from transitions to places (tj, pi)
w: A → {1, 2, 3, ...} is the weight function on the arcs
Set of input places to tj ∈ T: I(tj) = {pi ∈ P : (pi, tj) ∈ A}
Set of output places from tj ∈ T: O(tj) = {pi ∈ P : (tj, pi) ∈ A}
33 Plan Representation and Modeling
Def.: A marked Petri net is a five-tuple (P, T, A, w, x), where (P, T, A, w) is a Petri net graph and x is a marking of the set of n places P; x = [x(p1), x(p2), ..., x(pn)] ∈ N^n is the row vector associated with x.
Def. (PN dynamics): The state transition function, f: N^n × T → N^n, of Petri net (P, T, A, w, x), is defined for transition tj ∈ T iff x(pi) ≥ w(pi, tj) for all pi ∈ I(tj) (tj is then said to be enabled). If f(x, tj) is defined, the new state is x' = f(x, tj), where
x'(pi) = x(pi) − w(pi, tj) + w(tj, pi), i = 1, ..., n.
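The enabling condition and the state transition equation above translate directly into code. A minimal sketch, with arc weights stored sparsely in dictionaries; the two-place toy net at the bottom is an assumed example.

```python
def enabled(x, t, w_in, w_out, n):
    """t is enabled iff every input place pi holds at least w(pi, t) tokens."""
    return all(x[i] >= w_in.get((i, t), 0) for i in range(n))

def fire(x, t, w_in, w_out, n):
    """Firing rule: x'(pi) = x(pi) - w(pi, t) + w(t, pi), i = 1..n."""
    return tuple(x[i] - w_in.get((i, t), 0) + w_out.get((t, i), 0)
                 for i in range(n))

# Toy net: 2 places, 1 transition moving a token from p0 to p1 (unit weights).
w_in  = {(0, "t1"): 1}   # w(p0, t1) = 1
w_out = {("t1", 1): 1}   # w(t1, p1) = 1
x0 = (1, 0)              # initial marking: one token in p0
x1 = fire(x0, "t1", w_in, w_out, 2) if enabled(x0, "t1", w_in, w_out, 2) else x0
```

Markings are kept as immutable tuples so they can be used as dictionary keys when enumerating reachable states later on.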
34 Plan Representation and Modeling
Def. (Labeled Petri net): A labeled Petri net N is an eight-tuple N = (P, T, A, w, E, l, x0, Xm), where:
(P, T, A, w) is a PN graph
E is the event set for transition labeling
l: T → E is the transition labeling function
x0 ∈ N^n is the initial state
Xm ⊆ N^n is the set of marked states
Def. (Languages generated and marked):
L(N) := {l(s) ∈ E* : s ∈ T* and f(x0, s) is defined}
Lm(N) := {l(s) ∈ L(N) : s ∈ T* and f(x0, s) ∈ Xm}
35 Plan Representation and Modeling Petri Net Models of Robotic Tasks (Lima et al, 1998) (Milutinovic, Lima, 2002) Places with tokens represent resources available primitive actions running State is distributed over the places with tokens (PN marking) Events assigned to transitions and represent uncontrolled changes of state (e.g., caused by other agents or simply by the environment dynamics) controlled decisions to start a primitive action Transition fires when it is enabled and the labeling event occurs
36 Plan Representation and Modeling PN model of a single robot in a competition (Lima et al, 1998)
37 Plan Representation and Modeling
(Lima et al, 1998) A Tool for Robotic Task Design and Distributed Execution. Further developments in (Milutinovic, Lima, 2002).
[Figure: PN with places p1–p6 (standby, vision_ready2locate_ball, locating_ball, robot_ready2move, moving2ball, catching_ball, plus ball_located and ready2catch) and transitions t1–t5 labeled start, new_frame, ball_located, ready2catch, ball_catched.]
38–42 Plan Representation and Modeling
Petri Nets (PN) Language Model
For the Petri net N of the previous slide, the event set is E = {s, nf, bl, r2c, bc}, with labeling l(t1) = s, l(t2) = nf, l(t3) = bl, l(t4) = r2c, l(t5) = bc; the initial marking x0 and the set of marked states Xm = {x0, ...} are given as marking vectors over p1, ..., p6 (the vectors are omitted in this transcription).
Starting from the marking (or state) x = x0, firing enabled transitions generates, step by step, the strings ε, s, s nf, s nf bl, ..., each of which belongs to L(N).
Generated and Marked Languages:
L(N) = {ε, s, s nf, s nf bl, ...}
Lm(N) = {ε, ..., s nf bl r2c bc} ⊆ L(N)
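Assuming the net is the simple chain suggested by the figure (each transition ti moves one token from pi to pi+1, with the labels above), the generated language L(N) can be enumerated by a search over markings. The chain structure is an assumption made for illustration; the real net in the slides may differ.

```python
# Assumed chain of 6 places; each transition consumes one token from place i
# and produces one in place j, and carries the event label l(t).
labels = {"t1": "s", "t2": "nf", "t3": "bl", "t4": "r2c", "t5": "bc"}
succ = {"t1": (0, 1), "t2": (1, 2), "t3": (2, 3), "t4": (3, 4), "t5": (4, 5)}

def generated_strings(x0, max_len=5):
    """Enumerate l(s) for every firable transition sequence s,
    i.e., the generated language L(N), up to max_len events."""
    strings, frontier = {""}, [(x0, "")]
    while frontier:
        x, word = frontier.pop()
        if word and len(word.split()) >= max_len:
            continue
        for t, (i, j) in succ.items():
            if x[i] >= 1:  # transition t is enabled
                x2 = list(x); x2[i] -= 1; x2[j] += 1
                w2 = (word + " " + labels[t]).strip()
                strings.add(w2)
                frontier.append((tuple(x2), w2))
    return strings

L = generated_strings((1, 0, 0, 0, 0, 0))  # one token in p1
```

Every element of L is a prefix of s nf bl r2c bc, matching the step-by-step generation shown in the slides.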
43 Plan Representation and Modeling
Monitoring algorithms check the value of predicates over world state variables. An event occurrence means that a logical function of the predicates became true or false.
Examples of events:
found_ball: see(ball) = false → see(ball) = true
lost_ball: see(ball) = true → see(ball) = false
(e.g., an event raised when see_ball AND closest_player2ball changes from false to true)
PN markings represent world states. A plan to carry out a task is the sequence of primitive actions in a sequence of markings (world states). Plans are conditional, as resource places in markings represent logical pre-conditions for the execution of the next primitive action.
Example: primitive action set X = {GetCloseToBall, TakeBallToGoal, Score}
Plan: GetCloseToBall . TakeBallToGoal . Score
44 Plan Representation and Modeling Event sequences (i.e., strings) are an equivalent representation of plans A language is the set of all possible plans for a robot Different language classes are equivalent to machine types used to represent and execute the task (Finite State Machine, PN,...) Of course, larger classes have an increased modeling power (e.g., PN languages vs regular/finite state machine languages) Do not confuse this with modeling elegance it is more natural to program with a rule-based system rather than with a state machine, but it is not necessarily more powerful (compare with C vs assembly)
45 Plan Representation and Modeling
Abstraction Levels in Discrete Event Systems
Untimed: event sequences e1, e2, ..., ek, ... — FSA with state sequence x0, x1, ..., xk, ...; PN with marking sequence x0, x1, ..., xk, ...
Timed: time (duration) associated to events/transitions — Timed FSA and Timed PN, with state trajectory x(t)
Stochastic Timed: stochastic time (duration) associated to events/transitions — Stochastic Timed Automata (STA) and Stochastic PN (SPN), with x(t) and probability p(x(t))
46 Plan Qualitative Analysis Qualitative view/models enable answering analysis questions such as: will bad behaviors occur? will unsafe states be avoided? will we attempt to use more resources than those available? Qualitative view/models enable designing supervisors for specifications such as: eliminate substrings corresponding to bad behaviors avoid blocking ensure bounded usage of resources
47 Plan Qualitative Analysis Safety properties For all executions the system avoids a bad set of events or a set of bad strings is never generated or marked. The robot does not exhibit bad behaviours Blocking properties deadlocks or livelocks If the robot FSA blocks, the robot may get trapped into a no-return situation
48 Plan Qualitative Analysis
Def. (Boundedness): Place pi ∈ P in PN N with initial state x0 is said to be k-bounded, or k-safe, if x(pi) ≤ k for all states x ∈ R(N), i.e., for all reachable states. This has to do with stability concerning the usage of resources available for a task (e.g., robots, tools, CPU, memory, ...).
Def. (Conservation): A PN N with initial state x0 is said to be conservative with respect to γ = [γ1, γ2, ..., γn] if
Σ_{i=1}^{n} γi x(pi) = constant
for all reachable states. This has to do with conservation of resources required for a task (e.g., robots, tools, CPU, memory, ...).
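Both properties can be checked by brute-force enumeration of the reachable set R(N), which is feasible only for small (bounded) nets. A minimal sketch; the toy net at the bottom is an assumed example that is 1-bounded and conservative with respect to γ = [1, 1].

```python
from collections import deque

def reachable_markings(x0, transitions, limit=10000):
    """BFS over R(N). Each transition is a (consume, produce) pair of
    per-place token vectors; limit guards against unbounded nets."""
    seen, frontier = {x0}, deque([x0])
    while frontier and len(seen) < limit:
        x = frontier.popleft()
        for consume, produce in transitions:
            if all(x[i] >= consume[i] for i in range(len(x))):  # enabled
                x2 = tuple(x[i] - consume[i] + produce[i] for i in range(len(x)))
                if x2 not in seen:
                    seen.add(x2)
                    frontier.append(x2)
    return seen

def k_bounded(markings, i, k):
    """Place pi is k-bounded iff x(pi) <= k over all reachable markings."""
    return all(x[i] <= k for x in markings)

# Toy net: one token circulating between two places.
R = reachable_markings((1, 0), [((1, 0), (0, 1)), ((0, 1), (1, 0))])
```

Conservation with γ = [1, 1] reduces to checking that the total token count is the same in every reachable marking.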
49 Plan Qualitative Analysis
Def. (Liveness): A PN N with initial state x0 is said to be live if there always exists some sample path such that any transition can eventually fire from any state reached from x0.
Liveness levels — a transition in a PN may be:
Dead or L0-live, if the transition can never fire from this state
L1-live, if there is some firing sequence from x0 such that the transition can fire at least once
L2-live, if the transition can fire at least k times for some given positive integer k
L3-live, if there exists some infinite firing sequence in which the transition appears infinitely often
L4-live, if the transition is L1-live for every possible state reached from x0
This property is related to the reachability of given states, and to the repeatability of system states (e.g., error recovery and returning to the initial state).
50 Plan Quantitative Analysis Stochastic Models
STOCHASTIC TIMED AUTOMATA (STA): an STA with a Poisson clock structure is equivalent to a Markov Chain; transition probabilities can be computed from the STA transition probabilities and from the Poisson process rates for the events.
STOCHASTIC PETRI NET (SPN): an SPN with exponentially timed transitions is equivalent to a Markov Chain; transition probabilities can be computed from the random switch probabilities and from the exponential rates for the events.
51 Plan Quantitative Analysis Stochastic view/models enable answering analysis questions such as: what is the probability of success of a task plan? given a probability of success for the plan, how many steps (actions) will it take to accomplish the task? Stochastic view/models enable designing controllers for specifications such as: given some allowed number of steps for a plan, determine the plan that maximizes the probability of success given some desired probability of success, determine the plan that minimizes the number of required actions, or the accumulated action cost
52 Plan Quantitative Analysis
Markov Property:
Pr{s_{t+1} = s' | s_t, a_t, s_{t-1}, a_{t-1}, ..., s_0, a_0} = Pr{s_{t+1} = s' | s_t, a_t}
Environments modeled by Stochastic Timed Automata satisfy the Markov Property.
MARKOV DECISION PROCESSES (MDP)
[Figure: states "object on the table", "object grasped", "object on the floor", connected by the actions pickup and release.]
The effects of robot actions are uncertain, but environment states are fully observable.
Solutions to MDPs come from: Dynamic Programming, Monte-Carlo, Temporal Differences (e.g., reinforcement learning).
53 Reinforcement Learning (RL)
At each step, the agent observes state s_t ∈ S, executes action a_t ∈ A(s_t), and receives reinforcement r_{t+1} ∈ R while the environment moves to state s_{t+1}.
Goal: choose the action sequence that maximizes
R_t = Σ_{k=0}^{T} γ^k r_{t+k+1}, 0 ≤ γ ≤ 1
T may go to infinity, as long as γ < 1. Rewards and state transitions after an action is executed are stochastic.
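The return R_t defined above can be computed for any finite reward sequence as:

```python
def discounted_return(rewards, gamma):
    """R_t = sum_{k=0}^{T} gamma^k * r_{t+k+1} for a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: small immediate reward, larger reward 3 steps later, gamma = 0.5.
R = discounted_return([1.0, 0.0, 0.0, 10.0], 0.5)  # 1 + 0.5**3 * 10 = 2.25
```

The same sum with γ < 1 stays finite as T → ∞ for any bounded reward sequence, which is why the infinite-horizon case requires γ < 1.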
54 Markov Decision Processes
A RL task satisfying the Markov Property is known as a Markov Decision Process (MDP):
Pr{s_{t+1} = s', r_{t+1} = r | s_t, a_t, r_t, s_{t-1}, a_{t-1}, ..., r_1, s_0, a_0} = Pr{s_{t+1} = s', r_{t+1} = r | s_t, a_t}
Transition probabilities: P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a}
Expected reward: R^a_{ss'} = E{r_{t+1} | s_t = s, a_t = a, s_{t+1} = s'}
55 Ex.: Recycling Robot
States: Battery High, Battery Low. Each arc is labeled with (transition probability, expected reward) for the action taken.
wait: stays in the same state with (1, R_wait)
search_trash from High: stays High with (α, R_search_trash); goes to Low with (1 − α, R_search_trash)
search_trash from Low: stays Low with (β, R_search_trash); with (1 − β, −3) the battery is depleted, the robot has to be rescued, and it is returned to High
recharge_battery from Low: goes to High with (1, 0)
R_search_trash > R_wait > 0: number of cans collected while performing the corresponding tasks.
56 Value Functions
Policy π: S × A → [0, 1], where π(s, a) is the probability of carrying out action a in state s.
State value for policy π (expected value of starting in state s and following policy π thereafter):
V^π(s) = E_π{R_t | s_t = s} = E_π{ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s }
(state, action) value for policy π (expected value of starting in state s, carrying out action a, and following policy π thereafter):
Q^π(s, a) = E_π{R_t | s_t = s, a_t = a} = E_π{ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s, a_t = a }
NOTE: the value of the final state, if any, is always zero.
57 Value Functions
Relation between the state value and the Q function at the optimum: Q* is such that its value is the maximum discounted cumulative reward that can be achieved starting from state s and applying action a as the first action:
Q*(s, a) = E{ r_{t+1} + γ V*(s_{t+1}) | s_t = s, a_t = a }
V*(s) = max_{a'} Q*(s, a')
⇒ Q*(s, a) = E{ r_{t+1} + γ max_{a'} Q*(s_{t+1}, a') | s_t = s, a_t = a }
58 Bellman Equation for V^π and Q^π
V^π(s) = E_π{R_t | s_t = s} = Σ_{a ∈ A(s)} π(s, a) Σ_{s' ∈ S} P^a_{ss'} [ R^a_{ss'} + γ V^π(s') ]
where P^a_{ss'} is the transition probability from s to s' under action a, and R^a_{ss'} is the average reward when moving from s to s' under action a.
This equation expresses a relation between the values of a state and its successors. For finite MDPs, it has a unique solution V^π.
Optimal state value function: V*(s) = max_π V^π(s), ∀s ∈ S
Optimal (state, action) value function: Q*(s, a) = max_π Q^π(s, a), ∀s ∈ S, a ∈ A(s)
Q*(s, a) = E{ r_{t+1} + γ max_{a'} Q*(s_{t+1}, a') | s_t = s, a_t = a }
59 Bellman Equation for V* and Q*
V*(s) = max_{a ∈ A(s)} Q*(s, a) = max_a Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V*(s') ]   (state)
Q*(s, a) = Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ max_{a'} Q*(s', a') ]   ((state, action))
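The Bellman optimality equation for V* leads directly to value iteration, one of the Dynamic Programming solutions discussed next. A minimal sketch; the two-state MDP at the bottom is an assumed example, not from the slides.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Iterate V(s) <- max_a sum_s' P^a_{ss'} * (R^a_{ss'} + gamma * V(s'))
    until the largest change across states falls below tol."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = max(sum(p * (R[(s, a, s2)] + gamma * V[s2])
                        for s2, p in P[(s, a)].items())
                    for a in actions(s))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Assumed two-state chain: from s0, action 'go' reaches s1 with reward 1;
# s1 is absorbing (action 'stay', reward 0), so V*(s1) = 0 and V*(s0) = 1.
P = {("s0", "go"): {"s1": 1.0}, ("s1", "stay"): {"s1": 1.0}}
R = {("s0", "go", "s1"): 1.0, ("s1", "stay", "s1"): 0.0}
V = value_iteration(["s0", "s1"], lambda s: ["go"] if s == "s0" else ["stay"], P, R)
```

Once V* is available, a greedy policy over Q*(s, a) = Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')] is optimal.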
60 Possible Approaches to the Solution of the RL Problem
Dynamic Programming (DP): To determine V* for |S| = N, a system of N non-linear equations must be solved. Well-established mathematical method. A complete model of the environment is required (P^a_{ss'} and R^a_{ss'} known). Often faces the curse of dimensionality [Bellman, 1957].
Monte Carlo: Similar to DP, but with P^a_{ss'} and R^a_{ss'} unknown; they are determined from the average of several trial-and-error runs. Inappropriate for a step-by-step incremental approximation of V*.
Temporal Differences: Knowledge of P^a_{ss'} and R^a_{ss'} is not required. Step-by-step incremental approximation of V*. Mathematical analysis is more complex. Example: Q-learning.
61 Q-Learning
Once V* is known or learned, an apparently obvious solution for the RL problem would be:
π*(s) = argmax_{a ∈ A(s)} E{ r(s, a) + γ V*(δ(s, a)) | s_t = s, a_t = a }, where δ(s, a) is the state transition function
... but r(s, a) and δ(s, a) are unknown in the general case.
However, if we know or learn Q*, a different solution arises:
Q*(s, a) = E{ r_{t+1} + γ V*(s_{t+1}) | s_t = s, a_t = a }
π*(s) = argmax_{a ∈ A(s)} Q*(s, a)
In a stochastic environment, with unknown P^a_{ss'} and R^a_{ss'}, the agent's own experience when interacting with its environment can be used to learn Q* and π*.
62 Q-Learning - Algorithm
Initialize Q(s, a) randomly or arbitrarily.
Repeat forever (for each episode or trial):
Initialize s
Repeat (for each step n of the episode):
Choose action a for s
Execute action a and observe r and s'
Q_{n+1}(s, a) ← Q_n(s, a) + α_n [ r(s, a) + γ max_{a'} Q_n(s', a') − Q_n(s, a) ]
s ← s'
until s is final.
A constant α allows adaptability to slow environment changes, but it does not guarantee convergence; convergence is only possible with a temporal decay of α, under given circumstances.
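The algorithm above can be sketched in tabular form. The environment below (a two-state deterministic chain) and all names are assumptions made for illustration, not the examples from the slides; a constant α is used, which is adequate here because the toy environment is deterministic.

```python
import random

def q_learning(env_step, states, actions, episodes=500, alpha=0.5,
               gamma=0.9, eps=0.1, start="s0", final="goal", seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy.

    env_step(s, a) -> (reward, next_state); the model (P, R) is never used.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = start
        while s != final:
            # epsilon-greedy action choice
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            r, s2 = env_step(s, a)
            # value of the final state is zero, as noted above
            target = r if s2 == final else r + gamma * max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Assumed chain: s0 -right-> s1 -right-> goal (reward 1); 'left' stays put.
def step(s, a):
    if a == "right":
        return (1.0, "goal") if s == "s1" else (0.0, "s1")
    return (0.0, s)

Q = q_learning(step, ["s0", "s1"], ["left", "right"])
```

With γ = 0.9, the learned values should approach Q*(s1, right) = 1 and Q*(s0, right) = 0.9, so the greedy policy moves right from both states.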
63–64 Q-Learning an Example
[Figure: grid world with goal state G, showing the immediate rewards r(s, a), the optimal values V*(s), and the learned values Q_n(s, a) after n learning steps, for α = 1 and a fixed γ.]
65 Q-Learning Another Example
[Figure: learned value estimates in the initial situation and after some learning steps.]
66 Q-Learning Algorithm Convergence
Should each pair (s, a) be visited an infinite number of times, with
0 ≤ α_{n_i(s,a)} < 1,  Σ_{i=1}^{∞} α_{n_i(s,a)} = ∞,  Σ_{i=1}^{∞} α²_{n_i(s,a)} < ∞,
then Pr[ lim_{n→∞} Q̂_n(s, a) = Q*(s, a) ] = 1, ∀s, a.
67 Action Selection: Exploration vs Exploitation
Exploration: less promising actions, which may lead to good results, are tested.
Exploitation: takes advantage of tested actions which are more promising, i.e., which have a larger Q(s, a).
ε-greedy: at each step n, picks the best action so far with probability 1 − ε, for small ε, but can also pick, with probability ε, one of the other actions in a uniformly distributed random fashion.
softmax: at each step n, picks the action to be executed according to a Gibbs or Boltzmann distribution:
π_n(s, a) = e^{Q_n(s,a)/τ} / Σ_{b ∈ A(s)} e^{Q_n(s,b)/τ}
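Both selection rules can be sketched as follows; the Q values and action names in the example are illustrative assumptions.

```python
import math
import random

def epsilon_greedy(Q, s, actions, eps, rng=random):
    """With prob. 1-eps exploit argmax_a Q(s, a); with prob. eps pick an
    action uniformly at random (exploration)."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def softmax_probs(Q, s, actions, tau):
    """Gibbs/Boltzmann distribution: pi(s,a) = exp(Q(s,a)/tau) / sum_b exp(Q(s,b)/tau).
    The temperature tau controls greediness: low tau -> nearly greedy,
    high tau -> nearly uniform."""
    prefs = [math.exp(Q[(s, a)] / tau) for a in actions]
    z = sum(prefs)
    return [p / z for p in prefs]

# Illustrative Q table for one state with two actions.
Q = {("s", "a1"): 1.0, ("s", "a2"): 0.0}
probs = softmax_probs(Q, "s", ["a1", "a2"], tau=1.0)
```

Unlike ε-greedy, softmax ranks the non-greedy actions, giving nearly-as-good actions more probability mass than clearly bad ones.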
More informationCS 7180: Behavioral Modeling and Decisionmaking
CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More informationReinforcement Learning II
Reinforcement Learning II Andrea Bonarini Artificial Intelligence and Robotics Lab Department of Electronics and Information Politecnico di Milano E-mail: bonarini@elet.polimi.it URL:http://www.dei.polimi.it/people/bonarini
More informationCMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING
More informationThe State Explosion Problem
The State Explosion Problem Martin Kot August 16, 2003 1 Introduction One from main approaches to checking correctness of a concurrent system are state space methods. They are suitable for automatic analysis
More informationReinforcement Learning. George Konidaris
Reinforcement Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom
More informationPlanning in Markov Decision Processes
Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Planning in Markov Decision Processes Lecture 3, CMU 10703 Katerina Fragkiadaki Markov Decision Process (MDP) A Markov
More informationReinforcement Learning
Reinforcement Learning 1 Reinforcement Learning Mainly based on Reinforcement Learning An Introduction by Richard Sutton and Andrew Barto Slides are mainly based on the course material provided by the
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationMachine Learning. Reinforcement learning. Hamid Beigy. Sharif University of Technology. Fall 1396
Machine Learning Reinforcement learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1396 1 / 32 Table of contents 1 Introduction
More informationReinforcement Learning. Machine Learning, Fall 2010
Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30
More informationMachine Learning I Reinforcement Learning
Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More informationSequential Decision Problems
Sequential Decision Problems Michael A. Goodrich November 10, 2006 If I make changes to these notes after they are posted and if these changes are important (beyond cosmetic), the changes will highlighted
More informationPlanning Under Uncertainty II
Planning Under Uncertainty II Intelligent Robotics 2014/15 Bruno Lacerda Announcement No class next Monday - 17/11/2014 2 Previous Lecture Approach to cope with uncertainty on outcome of actions Markov
More informationReinforcement Learning. Introduction
Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control
More informationReinforcement Learning. Spring 2018 Defining MDPs, Planning
Reinforcement Learning Spring 2018 Defining MDPs, Planning understandability 0 Slide 10 time You are here Markov Process Where you will go depends only on where you are Markov Process: Information state
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationSequential decision making under uncertainty. Department of Computer Science, Czech Technical University in Prague
Sequential decision making under uncertainty Jiří Kléma Department of Computer Science, Czech Technical University in Prague https://cw.fel.cvut.cz/wiki/courses/b4b36zui/prednasky pagenda Previous lecture:
More informationReinforcement learning an introduction
Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of
More informationSpecification models and their analysis Petri Nets
Specification models and their analysis Petri Nets Kai Lampka December 10, 2010 1 30 Part I Petri Nets Basics Petri Nets Introduction A Petri Net (PN) is a weighted(?), bipartite(?) digraph(?) invented
More informationThe Markov Decision Process (MDP) model
Decision Making in Robots and Autonomous Agents The Markov Decision Process (MDP) model Subramanian Ramamoorthy School of Informatics 25 January, 2013 In the MAB Model We were in a single casino and the
More informationReinforcement Learning
Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha
More informationChapter 7: Eligibility Traces. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
Chapter 7: Eligibility Traces R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Midterm Mean = 77.33 Median = 82 R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
More informationLogic. Introduction to Artificial Intelligence CS/ECE 348 Lecture 11 September 27, 2001
Logic Introduction to Artificial Intelligence CS/ECE 348 Lecture 11 September 27, 2001 Last Lecture Games Cont. α-β pruning Outline Games with chance, e.g. Backgammon Logical Agents and thewumpus World
More informationDiscrete Event Systems Exam
Computer Engineering and Networks Laboratory TEC, NSG, DISCO HS 2016 Prof. L. Thiele, Prof. L. Vanbever, Prof. R. Wattenhofer Discrete Event Systems Exam Friday, 3 rd February 2017, 14:00 16:00. Do not
More informationFinal Exam December 12, 2017
Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes
More informationChapter 7 R&N ICS 271 Fall 2017 Kalev Kask
Set 6: Knowledge Representation: The Propositional Calculus Chapter 7 R&N ICS 271 Fall 2017 Kalev Kask Outline Representing knowledge using logic Agent that reason logically A knowledge based agent Representing
More informationChapter 16 Planning Based on Markov Decision Processes
Lecture slides for Automated Planning: Theory and Practice Chapter 16 Planning Based on Markov Decision Processes Dana S. Nau University of Maryland 12:48 PM February 29, 2012 1 Motivation c a b Until
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationToday s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes
Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks
More informationLecture 3: The Reinforcement Learning Problem
Lecture 3: The Reinforcement Learning Problem Objectives of this lecture: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationMarkov Decision Processes (and a small amount of reinforcement learning)
Markov Decision Processes (and a small amount of reinforcement learning) Slides adapted from: Brian Williams, MIT Manuela Veloso, Andrew Moore, Reid Simmons, & Tom Mitchell, CMU Nicholas Roy 16.4/13 Session
More informationA Gentle Introduction to Reinforcement Learning
A Gentle Introduction to Reinforcement Learning Alexander Jung 2018 1 Introduction and Motivation Consider the cleaning robot Rumba which has to clean the office room B329. In order to keep things simple,
More informationReinforcement Learning and Deep Reinforcement Learning
Reinforcement Learning and Deep Reinforcement Learning Ashis Kumer Biswas, Ph.D. ashis.biswas@ucdenver.edu Deep Learning November 5, 2018 1 / 64 Outlines 1 Principles of Reinforcement Learning 2 The Q
More informationMotivation for introducing probabilities
for introducing probabilities Reaching the goals is often not sufficient: it is important that the expected costs do not outweigh the benefit of reaching the goals. 1 Objective: maximize benefits - costs.
More informationTHE LANGUAGE OF FIRST-ORDER LOGIC (FOL) Sec2 Sec1(1-16)
THE LANGUAGE OF FIRST-ORDER LOGIC (FOL) Sec2 Sec1(1-16) FOL: A language to formulate knowledge Logic is the study of entailment relationslanguages, truth conditions and rules of inference. FOL or Predicate
More informationProf. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be
REINFORCEMENT LEARNING AN INTRODUCTION Prof. Dr. Ann Nowé Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING WHAT IS IT? What is it? Learning from interaction Learning about, from, and while
More informationINF5390 Kunstig intelligens. Logical Agents. Roar Fjellheim
INF5390 Kunstig intelligens Logical Agents Roar Fjellheim Outline Knowledge-based agents The Wumpus world Knowledge representation Logical reasoning Propositional logic Wumpus agent Summary AIMA Chapter
More information7. Propositional Logic. Wolfram Burgard and Bernhard Nebel
Foundations of AI 7. Propositional Logic Rational Thinking, Logic, Resolution Wolfram Burgard and Bernhard Nebel Contents Agents that think rationally The wumpus world Propositional logic: syntax and semantics
More informationPart of this work we present an extension for Language PP and we show how this
Chapter 5 Planning problems Currently, we can find different definitions for Planning. For instance, McCarthy defines planning in [25] as the restricted problem of finding a finite sequence of actions
More informationAI Programming CS S-09 Knowledge Representation
AI Programming CS662-2013S-09 Knowledge Representation David Galles Department of Computer Science University of San Francisco 09-0: Overview So far, we ve talked about search, which is a means of considering
More information20/c/applet/more.html Local Beam Search The best K states are selected.
Have some fun Checker: http://www.cs.caltech.edu/~vhuang/cs 20/c/applet/more.html 1 Local Beam Search Run multiple searches to find the solution The best K states are selected. Like parallel hill climbing
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationFinal Exam December 12, 2017
Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes
More informationAutomata-based Verification - III
CS3172: Advanced Algorithms Automata-based Verification - III Howard Barringer Room KB2.20/22: email: howard.barringer@manchester.ac.uk March 2005 Third Topic Infinite Word Automata Motivation Büchi Automata
More informationIntroduction to Artificial Intelligence. Logical Agents
Introduction to Artificial Intelligence Logical Agents (Logic, Deduction, Knowledge Representation) Bernhard Beckert UNIVERSITÄT KOBLENZ-LANDAU Winter Term 2004/2005 B. Beckert: KI für IM p.1 Outline Knowledge-based
More informationLogical Agents. Santa Clara University
Logical Agents Santa Clara University Logical Agents Humans know things Humans use knowledge to make plans Humans do not act completely reflexive, but reason AI: Simple problem-solving agents have knowledge
More informationKnowledge base (KB) = set of sentences in a formal language Declarative approach to building an agent (or other system):
Logic Knowledge-based agents Inference engine Knowledge base Domain-independent algorithms Domain-specific content Knowledge base (KB) = set of sentences in a formal language Declarative approach to building
More informationCS788 Dialogue Management Systems Lecture #2: Markov Decision Processes
CS788 Dialogue Management Systems Lecture #2: Markov Decision Processes Kee-Eung Kim KAIST EECS Department Computer Science Division Markov Decision Processes (MDPs) A popular model for sequential decision
More informationDecision Theory: Markov Decision Processes
Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies
More informationMarkov Decision Processes Chapter 17. Mausam
Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.
More informationIntelligent Agents. First Order Logic. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University. last change: 19.
Intelligent Agents First Order Logic Ute Schmid Cognitive Systems, Applied Computer Science, Bamberg University last change: 19. Mai 2015 U. Schmid (CogSys) Intelligent Agents last change: 19. Mai 2015
More informationCMU Lecture 11: Markov Decision Processes II. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 11: Markov Decision Processes II Teacher: Gianni A. Di Caro RECAP: DEFINING MDPS Markov decision processes: o Set of states S o Start state s 0 o Set of actions A o Transitions P(s s,a)
More informationFirst-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig
First-Order Logic First-Order Theories Roopsha Samanta Partly based on slides by Aaron Bradley and Isil Dillig Roadmap Review: propositional logic Syntax and semantics of first-order logic (FOL) Semantic
More informationLecture 3: Markov Decision Processes
Lecture 3: Markov Decision Processes Joseph Modayil 1 Markov Processes 2 Markov Reward Processes 3 Markov Decision Processes 4 Extensions to MDPs Markov Processes Introduction Introduction to MDPs Markov
More informationAutomata-based Verification - III
COMP30172: Advanced Algorithms Automata-based Verification - III Howard Barringer Room KB2.20: email: howard.barringer@manchester.ac.uk March 2009 Third Topic Infinite Word Automata Motivation Büchi Automata
More informationReinforcement Learning
Reinforcement Learning Ron Parr CompSci 7 Department of Computer Science Duke University With thanks to Kris Hauser for some content RL Highlights Everybody likes to learn from experience Use ML techniques
More informationReinforcement Learning: An Introduction
Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is
More informationReinforcement Learning Part 2
Reinforcement Learning Part 2 Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ From previous tutorial Reinforcement Learning Exploration No supervision Agent-Reward-Environment
More informationPreference Elicitation for Sequential Decision Problems
Preference Elicitation for Sequential Decision Problems Kevin Regan University of Toronto Introduction 2 Motivation Focus: Computational approaches to sequential decision making under uncertainty These
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 7. Propositional Logic Rational Thinking, Logic, Resolution Wolfram Burgard, Maren Bennewitz, and Marco Ragni Albert-Ludwigs-Universität Freiburg Contents 1 Agents
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 7. Propositional Logic Rational Thinking, Logic, Resolution Joschka Boedecker and Wolfram Burgard and Bernhard Nebel Albert-Ludwigs-Universität Freiburg May 17, 2016
More informationAn Adaptive Clustering Method for Model-free Reinforcement Learning
An Adaptive Clustering Method for Model-free Reinforcement Learning Andreas Matt and Georg Regensburger Institute of Mathematics University of Innsbruck, Austria {andreas.matt, georg.regensburger}@uibk.ac.at
More informationMarkov Networks. l Like Bayes Nets. l Graphical model that describes joint probability distribution using tables (AKA potentials)
Markov Networks l Like Bayes Nets l Graphical model that describes joint probability distribution using tables (AKA potentials) l Nodes are random variables l Labels are outcomes over the variables Markov
More informationArtificial Intelligence Chapter 7: Logical Agents
Artificial Intelligence Chapter 7: Logical Agents Michael Scherger Department of Computer Science Kent State University February 20, 2006 AI: Chapter 7: Logical Agents 1 Contents Knowledge Based Agents
More informationFirst-Order Logic. 1 Syntax. Domain of Discourse. FO Vocabulary. Terms
First-Order Logic 1 Syntax Domain of Discourse The domain of discourse for first order logic is FO structures or models. A FO structure contains Relations Functions Constants (functions of arity 0) FO
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Bart Selman selman@cs.cornell.edu Module: Knowledge, Reasoning, and Planning Part 2 Logical Agents R&N: Chapter 7 1 Illustrative example: Wumpus World (Somewhat
More informationLogic, Knowledge Representation and Bayesian Decision Theory
Logic, Knowledge Representation and Bayesian Decision Theory David Poole University of British Columbia Overview Knowledge representation, logic, decision theory. Belief networks Independent Choice Logic
More informationKecerdasan Buatan M. Ali Fauzi
Kecerdasan Buatan M. Ali Fauzi Artificial Intelligence M. Ali Fauzi Logical Agents M. Ali Fauzi In which we design agents that can form representations of the would, use a process of inference to derive
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationInference in first-order logic. Production systems.
CS 1571 Introduction to AI Lecture 17 Inference in first-order logic. Production systems. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Sentences in Horn normal form Horn normal form (HNF) in
More informationArtificial Intelligence. Propositional logic
Artificial Intelligence Propositional logic Propositional Logic: Syntax Syntax of propositional logic defines allowable sentences Atomic sentences consists of a single proposition symbol Each symbol stands
More informationCOMP219: Artificial Intelligence. Lecture 19: Logic for KR
COMP219: Artificial Intelligence Lecture 19: Logic for KR 1 Overview Last time Expert Systems and Ontologies Today Logic as a knowledge representation scheme Propositional Logic Syntax Semantics Proof
More informationCS 331: Artificial Intelligence Propositional Logic I. Knowledge-based Agents
CS 331: Artificial Intelligence Propositional Logic I 1 Knowledge-based Agents Can represent knowledge And reason with this knowledge How is this different from the knowledge used by problem-specific agents?
More informationKnowledge-based Agents. CS 331: Artificial Intelligence Propositional Logic I. Knowledge-based Agents. Outline. Knowledge-based Agents
Knowledge-based Agents CS 331: Artificial Intelligence Propositional Logic I Can represent knowledge And reason with this knowledge How is this different from the knowledge used by problem-specific agents?
More informationReading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
Reading Response: Due Wednesday R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Another Example Get to the top of the hill as quickly as possible. reward = 1 for each step where
More informationCOMP3702/7702 Artificial Intelligence Week 5: Search in Continuous Space with an Application in Motion Planning " Hanna Kurniawati"
COMP3702/7702 Artificial Intelligence Week 5: Search in Continuous Space with an Application in Motion Planning " Hanna Kurniawati" Last week" Main components of PRM" Collision check for a configuration"
More informationMarkov Networks. l Like Bayes Nets. l Graph model that describes joint probability distribution using tables (AKA potentials)
Markov Networks l Like Bayes Nets l Graph model that describes joint probability distribution using tables (AKA potentials) l Nodes are random variables l Labels are outcomes over the variables Markov
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationOn and Off-Policy Relational Reinforcement Learning
On and Off-Policy Relational Reinforcement Learning Christophe Rodrigues, Pierre Gérard, and Céline Rouveirol LIPN, UMR CNRS 73, Institut Galilée - Université Paris-Nord first.last@lipn.univ-paris13.fr
More informationLinear-time Temporal Logic
Linear-time Temporal Logic Pedro Cabalar Department of Computer Science University of Corunna, SPAIN cabalar@udc.es 2015/2016 P. Cabalar ( Department Linear oftemporal Computer Logic Science University
More informationReinforcement Learning Active Learning
Reinforcement Learning Active Learning Alan Fern * Based in part on slides by Daniel Weld 1 Active Reinforcement Learning So far, we ve assumed agent has a policy We just learned how good it is Now, suppose
More informationIntroduction to Temporal Logic. The purpose of temporal logics is to specify properties of dynamic systems. These can be either
Introduction to Temporal Logic The purpose of temporal logics is to specify properties of dynamic systems. These can be either Desired properites. Often liveness properties like In every infinite run action
More informationCS230: Lecture 9 Deep Reinforcement Learning
CS230: Lecture 9 Deep Reinforcement Learning Kian Katanforoosh Menti code: 21 90 15 Today s outline I. Motivation II. Recycling is good: an introduction to RL III. Deep Q-Learning IV. Application of Deep
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More information