Decentralized Decision Making


1 Decentralized Decision Making in Partially Observable, Uncertain Worlds. Shlomo Zilberstein, Department of Computer Science, University of Massachusetts Amherst. Joint work with Martin Allen, Christopher Amato, Daniel Bernstein, Alan Carlin, Claudia Goldman, Eric Hansen, Akshat Kumar, Marek Petrik, Sven Seuken, Feng Wu, and Xiaojian Wu. IJCAI-11 Workshop on Decision Making in Partially Observable, Uncertain Worlds, Barcelona, Spain, July 18, 2011.

2 Decentralized Decision Making. Challenge: How to achieve intelligent coordination of a group of decision makers in spite of stochasticity and partial observability? Key objective: Develop effective decision-theoretic methods to address the uncertainty about the domain, the outcome of actions, and the knowledge, beliefs, and intentions of the other agents.

3 Problem Characteristics. A group of decision makers or agents interact in a stochastic environment. Each episode involves a sequence of decisions over a finite or infinite horizon. The change in the environment is determined stochastically by the current state and the set of actions taken by the agents. Each decision maker obtains different partial observations of the overall situation. Decision makers have the same objectives.

4 Applications: autonomous rovers for space exploration; protocol design for multiaccess broadcast channels; coordination of mobile robots; decentralized detection and tracking; decentralized detection of hazardous weather events.

5 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

6 Decentralized POMDP. [The slide diagram shows two agents choosing actions a_1, a_2 and receiving local observations o_1, o_2 from the world, which yields a shared reward r.] A generalization of the POMDP involving multiple cooperating decision makers with different observation functions.

7 DEC-POMDPs. A DEC-POMDP is defined by a tuple ⟨S, A_1, A_2, P, R, Ω_1, Ω_2, O⟩, where: S is a finite set of domain states, with initial state s_0; A_1, A_2 are finite action sets; P(s, a_1, a_2, s') is the state transition function; R(s, a_1, a_2) is the reward function; Ω_1, Ω_2 are finite observation sets; O(a_1, a_2, s', o_1, o_2) is the observation function. Straightforward generalization to n agents.
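As an illustration only (not from the talk), the tuple above can be held in a small container such as the following Python sketch; the field names and dict-keyed layout are assumptions.

```python
# A minimal sketch of the two-agent DEC-POMDP tuple defined on this slide.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State, Action, Obs = int, int, int

@dataclass
class DecPOMDP:
    states: List[State]                                      # S
    actions1: List[Action]                                   # A_1
    actions2: List[Action]                                   # A_2
    obs1: List[Obs]                                          # Omega_1
    obs2: List[Obs]                                          # Omega_2
    P: Dict[Tuple[State, Action, Action, State], float]      # P(s' | s, a1, a2)
    R: Dict[Tuple[State, Action, Action], float]             # R(s, a1, a2)
    O: Dict[Tuple[Action, Action, State, Obs, Obs], float]   # O(o1, o2 | a1, a2, s')
    s0: State = 0                                            # initial state
```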

8 Formal Models. [The comparison of formal models shown on this slide did not survive transcription.]

9 Example: Mobile Robot Planning. States: grid cell pairs. Actions: directional moves (shown as arrow icons on the slide). Transitions: noisy. Goal: meet quickly. Observations: the red lines shown in the slide figure.

10 Example: Cooperative Box-Pushing. Goal: push as many boxes as possible to the goal area; the larger box has a higher reward, but requires two agents to move it.

11 Solving DEC-POMDPs. Each agent's behavior is described by a local policy δ_i. A policy can be represented as a mapping from local observation sequences to actions, or from local memory states to actions. Actions can be selected deterministically or stochastically. The goal is to maximize expected reward over a finite horizon or a discounted infinite horizon.
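As a hedged illustration of the two representations named on this slide (not code from the talk), the following Python sketch shows a deterministic history-based policy and a finite-memory policy; all names are placeholders.

```python
# Two local-policy representations: observation histories -> actions,
# and finite memory states -> actions with a node-update rule.
from typing import Dict, Tuple

ObsHistory = Tuple[int, ...]   # sequence of local observations

class HistoryPolicy:
    """Deterministic local policy: maps observation sequences to actions."""
    def __init__(self, table: Dict[ObsHistory, int]):
        self.table = table
    def act(self, history: ObsHistory) -> int:
        return self.table[history]

class MemoryPolicy:
    """Finite-memory local policy: node -> action, plus (node, obs) -> next node."""
    def __init__(self, action_of: Dict[int, int],
                 next_node: Dict[Tuple[int, int], int], start: int = 0):
        self.action_of, self.next_node, self.node = action_of, next_node, start
    def act(self) -> int:
        return self.action_of[self.node]
    def observe(self, obs: int) -> None:
        self.node = self.next_node[(self.node, obs)]
```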

12 Work on Decentralized Decision Making and DEC-POMDPs. Team theory [Marschak 55, Tsitsiklis & Papadimitriou 82]; incorporating dynamics [Witsenhausen 71]; communication strategies [Varaiya & Walrand 78, Xuan et al. 01, Pynadath & Tambe 02]; approximation algorithms [Peshkin et al. 00, Guestrin et al. 01, Nair et al. 03, Emery-Montemerlo et al. 04]; first exact DP algorithm [Hansen et al. 04]; first policy iteration algorithm [Bernstein et al. 05]; many recent exact and approximate DEC-POMDP algorithms.

13 Some Fundamental Questions. Are DEC-POMDPs significantly harder to solve than POMDPs? Why? What features of the problem domain affect the complexity, and how? Is optimal dynamic programming possible? Can dynamic programming be made practical? Is it beneficial to treat communication as a separate type of action? How can we exploit the locality of agent interaction to develop more scalable algorithms?

14 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

15 Previous Complexity Results. Finite horizon: MDPs are P-complete and POMDPs are PSPACE-complete (both assuming T < |S|) [Papadimitriou & Tsitsiklis 87]. Infinite-horizon discounted: MDPs are P-complete [Papadimitriou & Tsitsiklis 87]; POMDPs are undecidable [Madani et al. 99].

16 How Hard are DEC-POMDPs? (Bernstein, Givan, Immerman & Zilberstein, UAI 2000, MOR 2002). The complexity of finite-horizon DEC-POMDPs has been hard to establish. A static version of the problem, where a single set of decisions is made in response to a single set of observations, was shown to be NP-hard [Tsitsiklis and Athans, 1985]. We proved that two-agent finite-horizon DEC-POMDPs are NEXP-hard. But these are worst-case results! Are real-world problems easier?

17 What Features of the Domain Affect the Complexity and How? Factored state spaces (structured domains); independent transitions (IT); independent observations (IO); structured reward function (SR); goal-oriented objectives (GO); degree of observability (partial, full, jointly full); degree and structure of interaction; degree of information sharing and communication.

18 Complexity of Sub-Classes (Goldman & Zilberstein, JAIR 2004). [The slide shows a diagram of complexity results with the following labels:] finite-horizon DEC-MDP: NEXP-complete; with independent observations and transitions (IO & IT): NP-complete; goal-oriented: NEXP-complete; with information sharing: NP-complete; goal-oriented with IO & IT: NP-complete; with a single goal (|G| = 1): P-complete; with |G| > 1 under certain conditions: P-complete.

19 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

20 JESP: First DP Algorithm (Nair, Tambe, Yokoo, Pynadath & Marsella, IJCAI 2003). JESP: Joint Equilibrium-based Search for Policies. Complexity: exponential. Result: only locally optimal solutions.

21 Is Exact DP Possible? The key to solving POMDPs is that they can be viewed as belief-state MDPs [Smallwood & Sondik 73]. It is not as clear how to define a belief-state MDP for a DEC-POMDP. The first exact DP algorithm for finite-horizon DEC-POMDPs uses the notion of a generalized belief state. The algorithm also applies to competitive situations modeled as POSGs.

22 Generalized Belief State. A generalized belief state captures the uncertainty of one agent with respect to the state of the world as well as the policies of the other agents.
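As a hedged illustration (not from the talk), a generalized belief state can be stored as a distribution over pairs of a world state and a candidate policy of the other agent; the dict layout and names below are assumptions.

```python
# A generalized belief: (state, other agent's policy) -> probability.
from typing import Dict, Hashable, Tuple

GeneralizedBelief = Dict[Tuple[Hashable, Hashable], float]

def expected_value(belief: GeneralizedBelief, my_policy, V) -> float:
    # V[(s, my_policy, other_policy)] = value of the joint policy pair from state s
    return sum(p * V[(s, my_policy, other)] for (s, other), p in belief.items())
```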

23 Strategy Elimination. Any finite-horizon DEC-POMDP can be converted to a normal-form game, but the number of strategies is doubly exponential in the horizon length. [The slide shows the resulting m × n payoff matrix with entries (R_11,1, R_11,2) through (R_mn,1, R_mn,2).]

24 A Better Way to Do Elimination (Hansen, Bernstein & Zilberstein, AAAI 2004). We can use dynamic programming to eliminate dominated strategies without first converting to normal form. Pruning a subtree eliminates the set of trees containing it. [The slide figure shows a dominated policy tree being pruned and the trees containing it being eliminated.]
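As a hedged sketch of the pruning test described on this slide (not the authors' code): a policy tree q of agent 1 is dominated, and can be pruned, if some probability distribution over the remaining trees does at least as well for every pairing of a state with a tree of agent 2. V is assumed to be a dict of exact values V[(s, q1, q2)].

```python
import numpy as np
from scipy.optimize import linprog

def is_dominated(q, trees1, trees2, states, V):
    """Feasibility LP: does a distribution over the other trees of agent 1
    achieve at least the value of q for every (state, agent-2 tree) pair?"""
    others = [t for t in trees1 if t != q]
    if not others:
        return False
    rows, rhs = [], []
    for s in states:
        for q2 in trees2:
            rows.append([-V[(s, t, q2)] for t in others])   # -sum_t x_t V(s, t, q2)
            rhs.append(-V[(s, q, q2)])                       #   <= -V(s, q, q2)
    res = linprog(c=np.zeros(len(others)),
                  A_ub=np.array(rows), b_ub=np.array(rhs),
                  A_eq=np.ones((1, len(others))), b_eq=np.array([1.0]),
                  bounds=[(0.0, 1.0)] * len(others), method="highs")
    return res.success   # a feasible distribution exists, so q can be pruned
```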

25 First Exact DP for DEC-POMDPs (Hansen, Bernstein & Zilberstein, AAAI 2004). The algorithm is complete and optimal; its complexity is doubly exponential. Theorem: DP performs iterated elimination of dominated strategies in the normal form of the POSG. Corollary: DP can be used to find an optimal joint policy in a DEC-POMDP.

26 Alternative: Heuristic Search (Szer, Charpillet & Zilberstein, UAI 2005). Perform forward best-first search in the space of joint policies. Take advantage of a known start state distribution. Take advantage of domain-independent heuristics for pruning.

27 The MAA* Algorithm (Szer, Charpillet & Zilberstein, UAI 2005). MAA* is complete and optimal. Main advantage: a significant reduction in memory requirements over the dynamic programming approach.

28 Scaling Up Heuristic Search (Spaan, Oliehoek, and Amato, IJCAI 2011). Problem with MAA*: the number of children of a node is doubly exponential in the node's depth. Basic idea: avoid the full expansion of each node by incrementally generating children only when a child might have a higher heuristic value. Introduce a more memory-efficient representation for heuristic functions. Yields a speedup over the state of the art, allowing optimal solutions over longer horizons.

29 Scaling Up Heuristic Search (Spaan, Oliehoek, and Amato, IJCAI 2011). [The results shown on this slide did not survive transcription.]

30 Memory-Bounded DP (MBDP) (Seuken & Zilberstein, IJCAI 2007). Combining the two approaches: the DP algorithm is a bottom-up approach, while the search operates top-down. The DP step can only eliminate a policy tree if it is dominated for every belief state, but only a small subset of the belief space is actually reachable. Furthermore, the combined approach allows the algorithm to focus on a small subset of joint policies that appear best.

31 Memory-Bounded DP (cont.). [The illustration on this slide did not survive transcription.]

32 The MBDP Algorithm. [The algorithm listing on this slide did not survive transcription; a sketch of its point-based selection step follows below.]
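As a hedged illustration of MBDP's central step as described on the surrounding slides (not the published pseudocode): for each belief sampled by a top-down heuristic, keep the pair of bottom-up policy trees with the highest expected value, so only maxTrees trees per agent are retained at each horizon step. The dict-based belief and the value table V[(s, q1, q2)] are assumptions.

```python
def best_joint_trees_at(belief, trees1, trees2, V):
    # Pick the tree pair maximizing expected value under the sampled belief.
    return max(((q1, q2) for q1 in trees1 for q2 in trees2),
               key=lambda pair: sum(p * V[(s, pair[0], pair[1])]
                                    for s, p in belief.items()))

def select_retained_trees(beliefs, trees1, trees2, V):
    # One retained joint tree per sampled belief; len(beliefs) plays the role of maxTrees.
    selected = [best_joint_trees_at(b, trees1, trees2, V) for b in beliefs]
    return [q1 for q1, _ in selected], [q2 for _, q2 in selected]
```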

33 Generating Good Belief States. MDP heuristic: obtained by solving the corresponding fully observable multiagent MDP. Infinite-horizon heuristic: obtained by solving the corresponding infinite-horizon DEC-POMDP. Random policy heuristic: can augment another heuristic by adding random exploration. Heuristic portfolio: maintain a set of belief states generated by a set of different heuristics. Recursive MBDP.

34 Performance of MBDP. [The results shown on this slide did not survive transcription.]

35 MBDP Successors. Improved MBDP (IMBDP) [Seuken and Zilberstein, UAI 2007]; MBDP with Observation Compression (MBDP-OC) [Carlin and Zilberstein, AAMAS 2008]; Point-Based Incremental Pruning (PBIP) [Dibangoye, Mouaddib, and Chaib-draa, AAMAS 2009]; PBIP with Incremental Policy Generation (PBIP-IPG) [Amato, Dibangoye, and Zilberstein, AAAI 2009]; Constraint-Based Dynamic Programming (CBDP) [Kumar and Zilberstein, AAMAS 2009]; Point-Based Backup for Decentralized POMDPs [Kumar and Zilberstein, AAMAS 2010]; Point-Based Policy Generation (PBPG) [Wu, Zilberstein, and Chen, AAMAS 2010].

36 Key Ideas Behind These Algorithms. Perform search in a reduced policy space. Exact algorithms perform only lossless pruning; approximate algorithms rely on more aggressive pruning. MBDP represents an exponential-size policy with linear space, O(maxTrees × T). The resulting policy is an acyclic finite-state controller.
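As a hedged back-of-the-envelope illustration of that space saving (the numbers below are chosen for illustration, not from the talk): a full depth-T policy tree has one node per observation history, while MBDP keeps only maxTrees subtrees per horizon step.

```python
def full_tree_nodes(num_obs: int, T: int) -> int:
    # Nodes in one full depth-T policy tree: 1 + |O| + ... + |O|^(T-1).
    return sum(num_obs ** t for t in range(T))

def mbdp_stored_trees(max_trees: int, T: int) -> int:
    # MBDP keeps maxTrees subtrees per agent per horizon step.
    return max_trees * T

print(full_tree_nodes(num_obs=2, T=10))      # 1023 nodes in a single full tree
print(mbdp_stored_trees(max_trees=3, T=10))  # 30 retained subtrees
```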

37 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

38 Infinite-Horizon DEC-POMDPs. It is unclear how to define a compact belief state without fixing the policies of the other agents. Value iteration does not generalize to the infinite-horizon case. We can generalize policy iteration for POMDPs [Hansen 98, Poupart & Boutilier 04]. Basic idea: represent local policies using (deterministic or stochastic) finite-state controllers and define a set of controller transformations that guarantee improvement and convergence.

39 Policies as Controllers. A finite-state controller represents each policy: fixed memory, with randomness used to offset memory limitations. Action selection: ψ : Q_i → ΔA_i. Transitions: η : Q_i × A_i × O_i → ΔQ_i. The value of a two-agent joint controller is given by the Bellman equation: V(q_1, q_2, s) = Σ_{a_1,a_2} P(a_1|q_1) P(a_2|q_2) [ R(s, a_1, a_2) + γ Σ_{s'} P(s'|s, a_1, a_2) Σ_{o_1,o_2} O(o_1, o_2|s', a_1, a_2) Σ_{q_1',q_2'} P(q_1'|q_1, a_1, o_1) P(q_2'|q_2, a_2, o_2) V(q_1', q_2', s') ].
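As a hedged sketch (not the authors' implementation), the Bellman equation above can be evaluated by repeated backups over all (q_1, q_2, s) triples; the dict-keyed parameter layout and the fixed number of sweeps (instead of a convergence test) are simplifying assumptions.

```python
def evaluate_joint_controller(S, A1, A2, O1, O2, Q1, Q2, P, R, O,
                              psi1, psi2, eta1, eta2, gamma=0.9, sweeps=200):
    # psi_i[(q, a)] = P(a | q); eta_i[(q, a, o, q')] = P(q' | q, a, o)
    # P[(s, a1, a2, s')], R[(s, a1, a2)], O[(a1, a2, s', o1, o2)]
    V = {(q1, q2, s): 0.0 for q1 in Q1 for q2 in Q2 for s in S}
    for _ in range(sweeps):
        for q1 in Q1:
            for q2 in Q2:
                for s in S:
                    total = 0.0
                    for a1 in A1:
                        for a2 in A2:
                            pa = psi1[(q1, a1)] * psi2[(q2, a2)]
                            if pa == 0.0:
                                continue
                            future = 0.0
                            for s2 in S:
                                for o1 in O1:
                                    for o2 in O2:
                                        w = P[(s, a1, a2, s2)] * O[(a1, a2, s2, o1, o2)]
                                        if w == 0.0:
                                            continue
                                        future += w * sum(
                                            eta1[(q1, a1, o1, n1)] * eta2[(q2, a2, o2, n2)]
                                            * V[(n1, n2, s2)]
                                            for n1 in Q1 for n2 in Q2)
                            total += pa * (R[(s, a1, a2)] + gamma * future)
                    V[(q1, q2, s)] = total   # in-place (Gauss-Seidel style) update
    return V
```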

40 Controller Example. A stochastic controller for one agent: 2 nodes, 2 actions, 2 observations. Parameters: P(a_i | q_i) and P(q_i' | q_i, o_i). [The two-node controller diagram and its parameter values did not survive transcription; an execution sketch follows below.]
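As a hedged sketch of executing such a stochastic controller (illustrative dict layouts, not from the talk): at node q, sample an action from P(a | q); after receiving observation o, sample the next node from P(q' | q, o).

```python
import random

def select_action(q, psi):
    actions, probs = zip(*psi[q].items())        # psi[q][a] = P(a | q)
    return random.choices(actions, weights=probs, k=1)[0]

def next_node(q, obs, eta):
    nodes, probs = zip(*eta[(q, obs)].items())   # eta[(q, o)][q'] = P(q' | q, o)
    return random.choices(nodes, weights=probs, k=1)[0]
```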

41 Finding Optimal Controllers. How can we search the space of possible joint controllers? How do we set the parameters of the controllers to maximize value? Deterministic controllers can use traditional search methods such as BFS or B&B; stochastic controllers pose a continuous optimization problem. Key question: how to best use a limited amount of memory to optimize value?

42 Independent Joint Controllers. The local controller for agent i is defined by the conditional distribution P(a_i, q_i' | q_i, o_i). An independent joint controller is expressed by Π_i P(a_i, q_i' | q_i, o_i). It can be represented as a dynamic Bayes net. [The slide figure shows the DBN over states s, actions a_i, observations o_i, and controller nodes q_i.]

43 Correlated Joint Controllers (Bernstein, Hansen & Zilberstein, IJCAI 2005, JAIR 2009). A correlation device [Q_c, ψ] is a set of nodes and a stochastic state transition function. Joint controller: P(q_c' | q_c) Π_i P(a_i, q_i' | q_i, o_i, q_c). It acts as a shared source of randomness affecting decisions and memory-state updates. Random bits for the correlation device can be determined prior to execution time. [The slide figure shows the DBN extended with correlation-device nodes q_c.]
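As a hedged sketch of that last point (illustrative names, not from the talk): because P(q_c' | q_c) depends on nothing the agents observe, the shared node sequence of the correlation device can be sampled once, offline, and handed to every agent.

```python
import random

def presample_correlation_device(trans, q0, horizon, seed=0):
    # trans[q][q'] = P(q' | q) for the correlation device
    rng = random.Random(seed)
    seq, q = [q0], q0
    for _ in range(horizon - 1):
        nodes, probs = zip(*trans[q].items())
        q = rng.choices(nodes, weights=probs, k=1)[0]
        seq.append(q)
    return seq
```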

44 Exhaustive Backups. Add a node for every possible action and deterministic transition rule. [The slide figure shows the joint controller growing after one backup.] Repeated backups converge to optimality, but lead to very large controllers.

45 Value-Preserving Transformations. A value-preserving transformation changes the joint controller without sacrificing value. Formally, there must exist mappings f_i : Q_i → ΔR_i for each agent i and f_c : Q_c → ΔR_c such that V(s, q, q_c) ≤ Σ_r P(r | q) Σ_{r_c} P(r_c | q_c) V(s, r, r_c) for all states s ∈ S, joint nodes q, and correlation-device nodes q_c ∈ Q_c (where q and r range over joint nodes of the old and new controllers, and the distributions P(r | q) and P(r_c | q_c) are given by the mappings f_i and f_c).

46 Bounded Policy Iteration Algorithm (Bernstein, Hansen & Zilberstein, IJCAI 2005, JAIR 2009). Repeat: 1) evaluate the controller; 2) perform an exhaustive backup; 3) perform value-preserving transformations; until the controller is ε-optimal for all states. Theorem: For any ε, bounded policy iteration returns a joint controller that is ε-optimal for all initial states in a finite number of iterations.

47 Useful Transformations. Controller reductions: shrink the controller without sacrificing value. Bounded dynamic programming updates: increase value while keeping the size fixed. Both can be done using polynomial-size linear programs. They generalize ideas from the POMDP literature, particularly the BPI algorithm [Poupart & Boutilier 03].

48 Controller Reduction. For some node q_i, find a convex combination of nodes in Q_i \ {q_i} that dominates q_i for all states and all nodes of the other controllers; merge q_i into the convex combination by changing transition probabilities. Corresponding linear program: variables ε and P(q̂_i); objective: maximize ε; constraints: for all s ∈ S, q_{-i} ∈ Q_{-i}, q_c ∈ Q_c: V(s, q_i, q_{-i}, q_c) + ε ≤ Σ_{q̂_i} P(q̂_i) V(s, q̂_i, q_{-i}, q_c). Theorem: A controller reduction is a value-preserving transformation.

49 Bounded DP Update. For some node q_i, find better parameters assuming that the old parameters will be used from the second step onwards; the new parameters must yield value at least as high for all states and all nodes of the other controllers. Corresponding linear program: variables ε and P(a_i, q_i' | q_i, o_i, q_c); objective: maximize ε; constraints: for all s ∈ S, q_{-i} ∈ Q_{-i}, q_c ∈ Q_c: V(s, q, q_c) + ε ≤ Σ_a P(a | q, q_c) [ R(s, a) + γ Σ_{s', o, q', q_c'} P(q' | q, a, o, q_c) P(s', o | s, a) P(q_c' | q_c) V(s', q', q_c') ], where a, o, q, and q' range over joint actions, observations, and nodes. Theorem: A bounded DP update is a value-preserving transformation.

50 Modifying the Correlation Device. Both transformations can be applied to the correlation device, with slightly different linear programs to solve; one can think of the correlation device as another agent. Lots of implementation questions: what to use for an initial joint controller? Which transformations to perform? In what order should nodes be chosen for removal or improvement?

51 Decentralized BPI Summary. DEC-BPI finds better and much more compact solutions than exhaustive backups. A larger correlation device tends to lead to higher values on average. Larger local controllers tend to yield higher average values, up to a point. But bounded DP is limited by improving one controller at a time, and the linear program (one-step lookahead) results in local optimality and tends to get stuck.

52 Nonlinear Optimization Approach (Amato, Bernstein & Zilberstein, UAI 2007, JAAMAS 2010). Basic idea: model the problem as a nonlinear program (NLP). Consider node values (as well as controller parameters) as variables. The NLP can take advantage of an initial state distribution when it is given. Improvement and evaluation happen in one step (equivalent to an infinite lookahead). Additional constraints maintain valid values.

53 NLP Representation. Variables: x(q, a) = P(a | q), y(q, a, o, q') = P(q' | q, a, o), z(q, s) = V(q, s), where q, a, o range over joint nodes, actions, and observations. Objective: maximize Σ_s b_0(s) z(q_0, s). Value constraints: for all s ∈ S and q ∈ Q: z(q, s) = Σ_a x(q, a) [ R(s, a) + γ Σ_{s'} P(s' | s, a) Σ_o O(o | s', a) Σ_{q'} y(q, a, o, q') z(q', s') ]. Additional linear constraints ensure that the controllers are independent and that all probabilities sum to 1 and are non-negative.

54 Independence Constraints. Independence constraints guarantee that action selection and controller transition probabilities for each agent depend only on local information. [The action-selection and controller-transition independence constraints shown on the slide did not survive transcription.]

55 Probability Constraints. Probability constraints guarantee that action selection probabilities and controller transition probabilities are non-negative and sum to 1. (Superscript f's represent arbitrary fixed values.) [The constraint formulas shown on the slide did not survive transcription.]

56 Optimality. Theorem: An optimal solution of the NLP results in optimal stochastic controllers for the given size and initial state distribution. Advantages of the NLP approach: efficient policy representation with fixed memory; the NLP represents the optimal policy for the given size; takes advantage of a known start state; easy to implement using off-the-shelf solvers. Limitation: difficult to solve optimally.

57 Adding a Correlation Device. The NLP approach can be extended to include a correlation device. A new variable w(c, c') represents the transition function of the correlation device; action selection and controller transitions depend on the new shared signal. [The extended formulation shown on the slide did not survive transcription.]

58 Comparison of NLP & DEC-BPI (Amato, Bernstein & Zilberstein, UAI 2007, JAAMAS 2010). Used the freely available nonlinear constrained optimization solver "filter" on the NEOS server; the solver guarantees a locally optimal solution. Used 10 random initial controllers for a range of controller sizes. Compared the NLP with DEC-BPI, with and without a small (2-node) correlation device.

59 Results: Broadcast Channel (Amato, Bernstein & Zilberstein, UAI 2007). A simple two-agent networking problem (2 agents, 4 states, 2 actions, 5 observations). [The tables of average quality and average run time over 10 trials did not survive transcription.]

60 Results: Multi-Agent Tiger (Amato, Bernstein & Zilberstein, JAAMAS 2010). A two-agent version of a well-known POMDP benchmark [Nair et al. 03] (2 states, 3 actions, 2 observations). [The slide plots the average quality of various controller sizes using the NLP methods, with and without a 2-node correlation device, and BFS.]

61 Results: Meeting in a Grid (Amato, Bernstein & Zilberstein, JAAMAS 2010). A two-agent domain with 16 states, 5 actions, 2 observations. [The slide plots the average quality of various controller sizes using the NLP methods and DEC-BPI, with and without a 2-node correlation device, and BFS.]

62 Results: Box Pushing (Amato, Bernstein & Zilberstein, JAAMAS 2010). [The slide tabulates values and running times (in seconds) for each controller size using the NLP methods and DEC-BPI, with and without a 2-node correlation device, and BFS; an x indicates that the approach was not able to solve the problem.]

63 NLP Approach Summary. The NLP defines the optimal fixed-size stochastic controller. The approach shows consistent improvement over DEC-BPI using an off-the-shelf locally optimal solver. A small correlation device can have significant benefits. Better performance may be obtained by exploiting the structure of the NLP.

64 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

65 Exploiting the Locality of Interaction. In practical settings that involve many agents, each agent often interacts with a small number of neighboring agents (e.g., firefighting, sensor networks). Algorithms designed to exploit this property include LID-JESP [Nair et al. AAAI 05], SPIDER [Varakantham et al. AAMAS 07], and FANS [Marecki et al. AAMAS 08]. FANS uses FSCs for policy representation; it exploits FSCs for dynamic programming in policy evaluation and heuristic computations, providing significant speedups; it introduces novel heuristics to automatically vary the FSC size across agents; and it performs policy search that exploits the locality of agent interactions.

66 Constraint-Based DP (Kumar & Zilberstein, AAMAS 2009). Model the domain as a Networked Distributed POMDP (ND-POMDP), a restricted class of DEC-POMDPs characterized by a decomposable reward function. CBDP uses point-based dynamic programming (similar to MBDP). CBDP uses constraint network algorithms to improve the efficiency of key steps: computation of the heuristic function; belief sampling using the heuristic function; finding the best joint policy for a particular belief.

67 Results: Sensors Tracking a Target (Kumar & Zilberstein, AAMAS 2009). CBDP provides orders of magnitude of speedup over FANS. [The slide figure shows a sensor-network configuration in which sensors at neighboring locations scan in the N/S/E/W directions.] It provides better solution quality for all test instances, and strong theoretical guarantees on time and space complexity, enhancing scalability: linear complexity in the planning horizon length; linear in the number of agents, which is necessary to solve large realistic problems; and exponential only in a small parameter that depends on the level of interaction among the agents.

68 Sample Results. A 7-agent configuration with 4 actions per agent; two adjacent agents are required to track a target. The graphs show the solution quality (left) and time (right) of our approach (CBDP) compared with the best existing method (FANS). FANS is not scalable beyond horizon 7; CBDP has linear complexity in the horizon, and it provides better solution quality in less time. [The plots show solution quality and time (sec, log scale) versus horizon (2-10) for CBDP and FANS.]

69 New Scalable Approach (Kumar, Zilberstein, and Toussaint, IJCAI 2011). Extend an approach [Toussaint and Storkey, ICML 06] that maps planning under uncertainty (POMDP) problems into probabilistic inference. Characterize general constraints on the interaction graph that facilitate scalable planning. Introduce an efficient algorithm to solve such models using probabilistic inference. Identify a number of existing models with such constraints.

70 Value Factorization. θ = parameters of an agent; factored state space s = (s_1, ..., s_M). Example: consider four agents such that V = V_12 + V_23 + V_34.

71 Existing Models Satisfy VF. Each agent or state variable can participate in multiple value factors. Worst-case complexity is NEXP-complete. TI-DEC-MDP, ND-POMDP, and TD-POMDP satisfy value factorization.

72 Computational Advantages. Applicability: in models that satisfy VF, inference in the EM framework can be done independently in each value factor; smaller value factors enable more efficient inference; planning is no longer exponential but linear in the number of factors. Implementation: distributed planning; efficient implementation using message passing; parallel computation of messages.

73 Planning by Inference. Recasts planning as likelihood maximization in a DBN mixture with a binary reward variable r: P(r = 1 | s, a_1, a_2) ∝ R(s, a_1, a_2). [The slide figure shows the DBN mixture.]
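As a hedged sketch of the reward-to-probability scaling implied by this slide (the shift-and-normalize choice is an assumption): rewards are mapped into [0, 1] so that P(r = 1 | s, a_1, a_2) is proportional to R(s, a_1, a_2) up to an additive shift.

```python
def reward_to_probability(R):
    # R maps (s, a1, a2) -> reward; return P(r = 1 | s, a1, a2) in [0, 1].
    values = list(R.values())
    r_min, r_max = min(values), max(values)
    span = (r_max - r_min) or 1.0   # avoid division by zero for constant rewards
    return {key: (r - r_min) / span for key, r in R.items()}
```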

74 Exploiting the VF Property. Exploit the additive nature of the value function for scalability. The outer mixture simulates the VF property. Each V_f(θ_f, s_f) is evaluated using a time-dependent mixture. Theorem: Maximizing the likelihood of observing the variable r = 1 optimizes the joint policy.

75 The Expectation-Maximization Algorithm. Observed data: r = 1; every other variable is hidden. Use the EM algorithm to maximize the likelihood, implemented using message passing on the VF graph. Example: 3 factors {Ag_1, Ag_2}, {Ag_2, Ag_3}, and {Ag_3, Ag_4}.

76 Properties of the EM Algorithm. Scalability: the μ message requires independent inference in each factor; agents and state variables can be involved in multiple factors, so complex systems can be modeled via simpler interactions; distributed planning via message passing. Complexity: linear in the number of factors, exponential in the number of agents/state variables in a factor. Generality: no additional assumptions (such as TOI) are required, giving a general optimization recipe for models with the VF property. Local optima?

77 Experiments. ND-POMDP domains involving target tracking in sensor networks with imperfect sensing. Multiple targets; limited sensors with batteries. Penalty of -1 per sensor for miscoordination or for recharging the battery; positive reward (+80) per target scanned simultaneously by two adjacent sensors.

78 Comparisons with the NLP Approach (5P Domain). [The results shown on this slide did not survive transcription.]

79 Scalability on Larger Benchmarks. 15-agent and 20-agent domains, internal states = 5. [The results shown on this slide did not survive transcription.]

80 Summary of the EM Approach. Value factorization (VF) facilitates scalability. Several existing weakly coupled models satisfy VF. An EM algorithm can solve models with this property and yield good-quality solutions. Scalability: the E-step decomposes according to value factors, and smaller factors lead to efficient inference. It can be easily implemented using message passing among the agents. Future work: explore techniques for even faster inference and establish better error bounds.

81 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

82 Back to Some Basic Questions. Are DEC-POMDPs significantly harder to solve than POMDPs? Why? What features of the problem domain affect the complexity and how? Is optimal dynamic programming possible? Can dynamic programming be made practical? Is it beneficial to treat communication as a separate type of action? How can we exploit the locality of agent interaction to develop more scalable algorithms?

83 Questions? Additional information: Resource-Bounded Reasoning Lab, University of Massachusetts, Amherst.
