Decentralized Decision Making


1 Decentralized Decision Making in Partially Observable, Uncertain Worlds. Shlomo Zilberstein, Department of Computer Science, University of Massachusetts Amherst. Joint work with Martin Allen, Christopher Amato, Daniel Bernstein, Alan Carlin, Claudia Goldman, Eric Hansen, Akshat Kumar, Marek Petrik, Sven Seuken, Feng Wu, and Xiaojian Wu. IJCAI-11 Workshop on Decision Making in Partially Observable, Uncertain Worlds, Barcelona, Spain, July 18, 2011.

2 Decentralized Decision Making. Challenge: How to achieve intelligent coordination of a group of decision makers in spite of stochasticity and partial observability? Key objective: Develop effective decision-theoretic methods to address the uncertainty about the domain, the outcome of actions, and the knowledge, beliefs, and intentions of the other agents.

3 Problem Characteristics. A group of decision makers or agents interact in a stochastic environment. Each episode involves a sequence of decisions over a finite or infinite horizon. The change in the environment is determined stochastically by the current state and the set of actions taken by the agents. Each decision maker obtains different partial observations of the overall situation. Decision makers have the same objectives.

4 Applications: autonomous rovers for space exploration; protocol design for multiaccess broadcast channels; coordination of mobile robots; decentralized detection and tracking; decentralized detection of hazardous weather events.

5 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

6 Decentralized POMDP. [The slide diagram shows two agents choosing actions a_1, a_2 and receiving local observations o_1, o_2 from the world, which yields a shared reward r.] A generalization of the POMDP involving multiple cooperating decision makers with different observation functions.

7 DEC-POMDPs. A DEC-POMDP is defined by a tuple ⟨S, A_1, A_2, P, R, Ω_1, Ω_2, O⟩, where: S is a finite set of domain states, with initial state s_0; A_1, A_2 are finite action sets; P(s, a_1, a_2, s') is the state transition function; R(s, a_1, a_2) is the reward function; Ω_1, Ω_2 are finite observation sets; O(a_1, a_2, s', o_1, o_2) is the observation function. Straightforward generalization to n agents.
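As an illustration only (not from the talk), the tuple above can be held in a small container such as the following Python sketch; the field names and dict-keyed layout are assumptions.

```python
# A minimal sketch of the two-agent DEC-POMDP tuple defined on this slide.
from dataclasses import dataclass
from typing import Dict, List, Tuple

State, Action, Obs = int, int, int

@dataclass
class DecPOMDP:
    states: List[State]                                      # S
    actions1: List[Action]                                   # A_1
    actions2: List[Action]                                   # A_2
    obs1: List[Obs]                                          # Omega_1
    obs2: List[Obs]                                          # Omega_2
    P: Dict[Tuple[State, Action, Action, State], float]      # P(s' | s, a1, a2)
    R: Dict[Tuple[State, Action, Action], float]             # R(s, a1, a2)
    O: Dict[Tuple[Action, Action, State, Obs, Obs], float]   # O(o1, o2 | a1, a2, s')
    s0: State = 0                                            # initial state
```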

8 Formal Models. [The comparison of formal models shown on this slide did not survive transcription.]

9 Example: Mobile Robot Planning. States: grid cell pairs. Actions: directional moves (shown as arrow icons on the slide). Transitions: noisy. Goal: meet quickly. Observations: the red lines shown in the slide figure.

10 Example: Cooperative Box-Pushing. Goal: push as many boxes as possible to the goal area; the larger box has a higher reward, but requires two agents to move it.

11 Solving DEC-POMDPs. Each agent's behavior is described by a local policy δ_i. A policy can be represented as a mapping from local observation sequences to actions, or from local memory states to actions. Actions can be selected deterministically or stochastically. The goal is to maximize expected reward over a finite horizon or a discounted infinite horizon.
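As a hedged illustration of the two representations named on this slide (not code from the talk), the following Python sketch shows a deterministic history-based policy and a finite-memory policy; all names are placeholders.

```python
# Two local-policy representations: observation histories -> actions,
# and finite memory states -> actions with a node-update rule.
from typing import Dict, Tuple

ObsHistory = Tuple[int, ...]   # sequence of local observations

class HistoryPolicy:
    """Deterministic local policy: maps observation sequences to actions."""
    def __init__(self, table: Dict[ObsHistory, int]):
        self.table = table
    def act(self, history: ObsHistory) -> int:
        return self.table[history]

class MemoryPolicy:
    """Finite-memory local policy: node -> action, plus (node, obs) -> next node."""
    def __init__(self, action_of: Dict[int, int],
                 next_node: Dict[Tuple[int, int], int], start: int = 0):
        self.action_of, self.next_node, self.node = action_of, next_node, start
    def act(self) -> int:
        return self.action_of[self.node]
    def observe(self, obs: int) -> None:
        self.node = self.next_node[(self.node, obs)]
```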

12 Work on Decentralized Decision Making and DEC-POMDPs. Team theory [Marschak 55, Tsitsiklis & Papadimitriou 82]; incorporating dynamics [Witsenhausen 71]; communication strategies [Varaiya & Walrand 78, Xuan et al. 01, Pynadath & Tambe 02]; approximation algorithms [Peshkin et al. 00, Guestrin et al. 01, Nair et al. 03, Emery-Montemerlo et al. 04]; first exact DP algorithm [Hansen et al. 04]; first policy iteration algorithm [Bernstein et al. 05]; many recent exact and approximate DEC-POMDP algorithms.

13 Some Fundamental Questions. Are DEC-POMDPs significantly harder to solve than POMDPs? Why? What features of the problem domain affect the complexity, and how? Is optimal dynamic programming possible? Can dynamic programming be made practical? Is it beneficial to treat communication as a separate type of action? How can we exploit the locality of agent interaction to develop more scalable algorithms?

14 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

15 Previous Complexity Results. Finite horizon: MDPs are P-complete and POMDPs are PSPACE-complete (both assuming T < |S|) [Papadimitriou & Tsitsiklis 87]. Infinite-horizon discounted: MDPs are P-complete [Papadimitriou & Tsitsiklis 87]; POMDPs are undecidable [Madani et al. 99].

16 How Hard are DEC-POMDPs? (Bernstein, Givan, Immerman & Zilberstein, UAI 2000, MOR 2002). The complexity of finite-horizon DEC-POMDPs has been hard to establish. A static version of the problem, where a single set of decisions is made in response to a single set of observations, was shown to be NP-hard [Tsitsiklis and Athans, 1985]. We proved that two-agent finite-horizon DEC-POMDPs are NEXP-hard. But these are worst-case results! Are real-world problems easier?

17 What Features of the Domain Affect the Complexity and How? Factored state spaces (structured domains); independent transitions (IT); independent observations (IO); structured reward function (SR); goal-oriented objectives (GO); degree of observability (partial, full, jointly full); degree and structure of interaction; degree of information sharing and communication.

18 Complexity of Sub-Classes (Goldman & Zilberstein, JAIR 2004). [The slide shows a diagram of complexity results with the following labels:] finite-horizon DEC-MDP: NEXP-complete; with independent observations and transitions (IO & IT): NP-complete; goal-oriented: NEXP-complete; with information sharing: NP-complete; goal-oriented with IO & IT: NP-complete; with a single goal (|G| = 1): P-complete; with |G| > 1 under certain conditions: P-complete.

19 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

20 JESP: First DP Algorithm (Nair, Tambe, Yokoo, Pynadath & Marsella, IJCAI 2003). JESP: Joint Equilibrium-based Search for Policies. Complexity: exponential. Result: only locally optimal solutions.

21 Is Exact DP Possible? The key to solving POMDPs is that they can be viewed as belief-state MDPs [Smallwood & Sondik 73]. It is not as clear how to define a belief-state MDP for a DEC-POMDP. The first exact DP algorithm for finite-horizon DEC-POMDPs uses the notion of a generalized belief state. The algorithm also applies to competitive situations modeled as POSGs.

22 Generalized Belief State. A generalized belief state captures the uncertainty of one agent with respect to the state of the world as well as the policies of the other agents.
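As a hedged illustration (not from the talk), a generalized belief state can be stored as a distribution over pairs of a world state and a candidate policy of the other agent; the dict layout and names below are assumptions.

```python
# A generalized belief: (state, other agent's policy) -> probability.
from typing import Dict, Hashable, Tuple

GeneralizedBelief = Dict[Tuple[Hashable, Hashable], float]

def expected_value(belief: GeneralizedBelief, my_policy, V) -> float:
    # V[(s, my_policy, other_policy)] = value of the joint policy pair from state s
    return sum(p * V[(s, my_policy, other)] for (s, other), p in belief.items())
```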

23 Strategy Elimination. Any finite-horizon DEC-POMDP can be converted to a normal-form game, but the number of strategies is doubly exponential in the horizon length. [The slide shows the resulting m × n payoff matrix with entries (R_11,1, R_11,2) through (R_mn,1, R_mn,2).]

24 A Better Way to Do Elimination (Hansen, Bernstein & Zilberstein, AAAI 2004). We can use dynamic programming to eliminate dominated strategies without first converting to normal form. Pruning a subtree eliminates the set of trees containing it. [The slide figure shows a dominated policy tree being pruned and the trees containing it being eliminated.]
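As a hedged sketch of the pruning test described on this slide (not the authors' code): a policy tree q of agent 1 is dominated, and can be pruned, if some probability distribution over the remaining trees does at least as well for every pairing of a state with a tree of agent 2. V is assumed to be a dict of exact values V[(s, q1, q2)].

```python
import numpy as np
from scipy.optimize import linprog

def is_dominated(q, trees1, trees2, states, V):
    """Feasibility LP: does a distribution over the other trees of agent 1
    achieve at least the value of q for every (state, agent-2 tree) pair?"""
    others = [t for t in trees1 if t != q]
    if not others:
        return False
    rows, rhs = [], []
    for s in states:
        for q2 in trees2:
            rows.append([-V[(s, t, q2)] for t in others])   # -sum_t x_t V(s, t, q2)
            rhs.append(-V[(s, q, q2)])                       #   <= -V(s, q, q2)
    res = linprog(c=np.zeros(len(others)),
                  A_ub=np.array(rows), b_ub=np.array(rhs),
                  A_eq=np.ones((1, len(others))), b_eq=np.array([1.0]),
                  bounds=[(0.0, 1.0)] * len(others), method="highs")
    return res.success   # a feasible distribution exists, so q can be pruned
```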

25 First Exact DP for DEC-POMDPs (Hansen, Bernstein & Zilberstein, AAAI 2004). The algorithm is complete and optimal; its complexity is doubly exponential. Theorem: DP performs iterated elimination of dominated strategies in the normal form of the POSG. Corollary: DP can be used to find an optimal joint policy in a DEC-POMDP.

26 Alternative: Heuristic Search (Szer, Charpillet & Zilberstein, UAI 2005). Perform forward best-first search in the space of joint policies. Take advantage of a known start state distribution. Take advantage of domain-independent heuristics for pruning.

27 The MAA* Algorithm (Szer, Charpillet & Zilberstein, UAI 2005). MAA* is complete and optimal. Main advantage: a significant reduction in memory requirements over the dynamic programming approach.

28 Scaling Up Heuristic Search (Spaan, Oliehoek, and Amato, IJCAI 2011). Problem with MAA*: the number of children of a node is doubly exponential in the node's depth. Basic idea: avoid the full expansion of each node by incrementally generating children only when a child might have a higher heuristic value. Introduce a more memory-efficient representation for heuristic functions. Yields a speedup over the state of the art, allowing optimal solutions over longer horizons.

29 Scaling Up Heuristic Search (Spaan, Oliehoek, and Amato, IJCAI 2011). [The results shown on this slide did not survive transcription.]

30 Memory-Bounded DP (MBDP) (Seuken & Zilberstein, IJCAI 2007). Combining the two approaches: the DP algorithm is a bottom-up approach, while the search operates top-down. The DP step can only eliminate a policy tree if it is dominated for every belief state, but only a small subset of the belief space is actually reachable. Furthermore, the combined approach allows the algorithm to focus on a small subset of joint policies that appear best.

31 Memory-Bounded DP (cont.). [The illustration on this slide did not survive transcription.]

32 The MBDP Algorithm. [The algorithm listing on this slide did not survive transcription; a sketch of its point-based selection step follows below.]
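As a hedged illustration of MBDP's central step as described on the surrounding slides (not the published pseudocode): for each belief sampled by a top-down heuristic, keep the pair of bottom-up policy trees with the highest expected value, so only maxTrees trees per agent are retained at each horizon step. The dict-based belief and the value table V[(s, q1, q2)] are assumptions.

```python
def best_joint_trees_at(belief, trees1, trees2, V):
    # Pick the tree pair maximizing expected value under the sampled belief.
    return max(((q1, q2) for q1 in trees1 for q2 in trees2),
               key=lambda pair: sum(p * V[(s, pair[0], pair[1])]
                                    for s, p in belief.items()))

def select_retained_trees(beliefs, trees1, trees2, V):
    # One retained joint tree per sampled belief; len(beliefs) plays the role of maxTrees.
    selected = [best_joint_trees_at(b, trees1, trees2, V) for b in beliefs]
    return [q1 for q1, _ in selected], [q2 for _, q2 in selected]
```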

33 Generating Good Belief States. MDP heuristic: obtained by solving the corresponding fully observable multiagent MDP. Infinite-horizon heuristic: obtained by solving the corresponding infinite-horizon DEC-POMDP. Random policy heuristic: can augment another heuristic by adding random exploration. Heuristic portfolio: maintain a set of belief states generated by a set of different heuristics. Recursive MBDP.

34 Performance of MBDP. [The results shown on this slide did not survive transcription.]

35 MBDP Successors. Improved MBDP (IMBDP) [Seuken and Zilberstein, UAI 2007]; MBDP with Observation Compression (MBDP-OC) [Carlin and Zilberstein, AAMAS 2008]; Point-Based Incremental Pruning (PBIP) [Dibangoye, Mouaddib, and Chaib-draa, AAMAS 2009]; PBIP with Incremental Policy Generation (PBIP-IPG) [Amato, Dibangoye, and Zilberstein, AAAI 2009]; Constraint-Based Dynamic Programming (CBDP) [Kumar and Zilberstein, AAMAS 2009]; Point-Based Backup for Decentralized POMDPs [Kumar and Zilberstein, AAMAS 2010]; Point-Based Policy Generation (PBPG) [Wu, Zilberstein, and Chen, AAMAS 2010].

36 Key Ideas Behind These Algorithms. Perform search in a reduced policy space. Exact algorithms perform only lossless pruning; approximate algorithms rely on more aggressive pruning. MBDP represents an exponential-size policy with linear space, O(maxTrees × T). The resulting policy is an acyclic finite-state controller.
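As a hedged back-of-the-envelope illustration of that space saving (the numbers below are chosen for illustration, not from the talk): a full depth-T policy tree has one node per observation history, while MBDP keeps only maxTrees subtrees per horizon step.

```python
def full_tree_nodes(num_obs: int, T: int) -> int:
    # Nodes in one full depth-T policy tree: 1 + |O| + ... + |O|^(T-1).
    return sum(num_obs ** t for t in range(T))

def mbdp_stored_trees(max_trees: int, T: int) -> int:
    # MBDP keeps maxTrees subtrees per agent per horizon step.
    return max_trees * T

print(full_tree_nodes(num_obs=2, T=10))      # 1023 nodes in a single full tree
print(mbdp_stored_trees(max_trees=3, T=10))  # 30 retained subtrees
```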

37 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

38 Infinite-Horizon DEC-POMDPs. It is unclear how to define a compact belief state without fixing the policies of the other agents. Value iteration does not generalize to the infinite-horizon case. We can generalize policy iteration for POMDPs [Hansen 98, Poupart & Boutilier 04]. Basic idea: represent local policies using (deterministic or stochastic) finite-state controllers and define a set of controller transformations that guarantee improvement and convergence.

39 Policies as Controllers. A finite-state controller represents each policy: fixed memory, with randomness used to offset memory limitations. Action selection: ψ : Q_i → ΔA_i. Transitions: η : Q_i × A_i × O_i → ΔQ_i. The value of a two-agent joint controller is given by the Bellman equation: V(q_1, q_2, s) = Σ_{a_1,a_2} P(a_1|q_1) P(a_2|q_2) [ R(s, a_1, a_2) + γ Σ_{s'} P(s'|s, a_1, a_2) Σ_{o_1,o_2} O(o_1, o_2|s', a_1, a_2) Σ_{q_1',q_2'} P(q_1'|q_1, a_1, o_1) P(q_2'|q_2, a_2, o_2) V(q_1', q_2', s') ].
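As a hedged sketch (not the authors' implementation), the Bellman equation above can be evaluated by repeated backups over all (q_1, q_2, s) triples; the dict-keyed parameter layout and the fixed number of sweeps (instead of a convergence test) are simplifying assumptions.

```python
def evaluate_joint_controller(S, A1, A2, O1, O2, Q1, Q2, P, R, O,
                              psi1, psi2, eta1, eta2, gamma=0.9, sweeps=200):
    # psi_i[(q, a)] = P(a | q); eta_i[(q, a, o, q')] = P(q' | q, a, o)
    # P[(s, a1, a2, s')], R[(s, a1, a2)], O[(a1, a2, s', o1, o2)]
    V = {(q1, q2, s): 0.0 for q1 in Q1 for q2 in Q2 for s in S}
    for _ in range(sweeps):
        for q1 in Q1:
            for q2 in Q2:
                for s in S:
                    total = 0.0
                    for a1 in A1:
                        for a2 in A2:
                            pa = psi1[(q1, a1)] * psi2[(q2, a2)]
                            if pa == 0.0:
                                continue
                            future = 0.0
                            for s2 in S:
                                for o1 in O1:
                                    for o2 in O2:
                                        w = P[(s, a1, a2, s2)] * O[(a1, a2, s2, o1, o2)]
                                        if w == 0.0:
                                            continue
                                        future += w * sum(
                                            eta1[(q1, a1, o1, n1)] * eta2[(q2, a2, o2, n2)]
                                            * V[(n1, n2, s2)]
                                            for n1 in Q1 for n2 in Q2)
                            total += pa * (R[(s, a1, a2)] + gamma * future)
                    V[(q1, q2, s)] = total   # in-place (Gauss-Seidel style) update
    return V
```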

40 Controller Example. A stochastic controller for one agent: 2 nodes, 2 actions, 2 observations. Parameters: P(a_i | q_i) and P(q_i' | q_i, o_i). [The two-node controller diagram and its parameter values did not survive transcription; an execution sketch follows below.]
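As a hedged sketch of executing such a stochastic controller (illustrative dict layouts, not from the talk): at node q, sample an action from P(a | q); after receiving observation o, sample the next node from P(q' | q, o).

```python
import random

def select_action(q, psi):
    actions, probs = zip(*psi[q].items())        # psi[q][a] = P(a | q)
    return random.choices(actions, weights=probs, k=1)[0]

def next_node(q, obs, eta):
    nodes, probs = zip(*eta[(q, obs)].items())   # eta[(q, o)][q'] = P(q' | q, o)
    return random.choices(nodes, weights=probs, k=1)[0]
```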

41 Finding Optimal Controllers. How can we search the space of possible joint controllers? How do we set the parameters of the controllers to maximize value? Deterministic controllers can use traditional search methods such as BFS or B&B; stochastic controllers pose a continuous optimization problem. Key question: how to best use a limited amount of memory to optimize value?

42 Independent Joint Controllers. The local controller for agent i is defined by the conditional distribution P(a_i, q_i' | q_i, o_i). An independent joint controller is expressed by Π_i P(a_i, q_i' | q_i, o_i). It can be represented as a dynamic Bayes net. [The slide figure shows the DBN over states s, actions a_i, observations o_i, and controller nodes q_i.]

43 Correlated Joint Controllers (Bernstein, Hansen & Zilberstein, IJCAI 2005, JAIR 2009). A correlation device [Q_c, ψ] is a set of nodes and a stochastic state transition function. Joint controller: P(q_c' | q_c) Π_i P(a_i, q_i' | q_i, o_i, q_c). It acts as a shared source of randomness affecting decisions and memory-state updates. Random bits for the correlation device can be determined prior to execution time. [The slide figure shows the DBN extended with correlation-device nodes q_c.]
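As a hedged sketch of that last point (illustrative names, not from the talk): because P(q_c' | q_c) depends on nothing the agents observe, the shared node sequence of the correlation device can be sampled once, offline, and handed to every agent.

```python
import random

def presample_correlation_device(trans, q0, horizon, seed=0):
    # trans[q][q'] = P(q' | q) for the correlation device
    rng = random.Random(seed)
    seq, q = [q0], q0
    for _ in range(horizon - 1):
        nodes, probs = zip(*trans[q].items())
        q = rng.choices(nodes, weights=probs, k=1)[0]
        seq.append(q)
    return seq
```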

44 Exhaustive Backups. Add a node for every possible action and deterministic transition rule. [The slide figure shows the joint controller growing after one backup.] Repeated backups converge to optimality, but lead to very large controllers.

45 Value-Preserving Transformations. A value-preserving transformation changes the joint controller without sacrificing value. Formally, there must exist mappings f_i : Q_i → ΔR_i for each agent i and f_c : Q_c → ΔR_c such that V(s, q, q_c) ≤ Σ_r P(r | q) Σ_{r_c} P(r_c | q_c) V(s, r, r_c) for all states s ∈ S, joint nodes q, and correlation-device nodes q_c ∈ Q_c (where q and r range over joint nodes of the old and new controllers, and the distributions P(r | q) and P(r_c | q_c) are given by the mappings f_i and f_c).

46 Bounded Policy Iteration Algorithm (Bernstein, Hansen & Zilberstein, IJCAI 2005, JAIR 2009). Repeat: 1) evaluate the controller; 2) perform an exhaustive backup; 3) perform value-preserving transformations; until the controller is ε-optimal for all states. Theorem: For any ε, bounded policy iteration returns a joint controller that is ε-optimal for all initial states in a finite number of iterations.

47 Useful Transformations. Controller reductions: shrink the controller without sacrificing value. Bounded dynamic programming updates: increase value while keeping the size fixed. Both can be done using polynomial-size linear programs. They generalize ideas from the POMDP literature, particularly the BPI algorithm [Poupart & Boutilier 03].

48 Controller Reduction. For some node q_i, find a convex combination of nodes in Q_i \ {q_i} that dominates q_i for all states and all nodes of the other controllers; merge q_i into the convex combination by changing transition probabilities. Corresponding linear program: variables ε and P(q̂_i); objective: maximize ε; constraints: for all s ∈ S, q_{-i} ∈ Q_{-i}, q_c ∈ Q_c: V(s, q_i, q_{-i}, q_c) + ε ≤ Σ_{q̂_i} P(q̂_i) V(s, q̂_i, q_{-i}, q_c). Theorem: A controller reduction is a value-preserving transformation.

49 Bounded DP Update. For some node q_i, find better parameters assuming that the old parameters will be used from the second step onwards; the new parameters must yield value at least as high for all states and all nodes of the other controllers. Corresponding linear program: variables ε and P(a_i, q_i' | q_i, o_i, q_c); objective: maximize ε; constraints: for all s ∈ S, q_{-i} ∈ Q_{-i}, q_c ∈ Q_c: V(s, q, q_c) + ε ≤ Σ_a P(a | q, q_c) [ R(s, a) + γ Σ_{s', o, q', q_c'} P(q' | q, a, o, q_c) P(s', o | s, a) P(q_c' | q_c) V(s', q', q_c') ], where a, o, q, and q' range over joint actions, observations, and nodes. Theorem: A bounded DP update is a value-preserving transformation.

50 Modifying the Correlation Device. Both transformations can be applied to the correlation device, with slightly different linear programs to solve; one can think of the correlation device as another agent. Lots of implementation questions: what to use for an initial joint controller? Which transformations to perform? In what order should nodes be chosen for removal or improvement?

51 Decentralized BPI Summary. DEC-BPI finds better and much more compact solutions than exhaustive backups. A larger correlation device tends to lead to higher values on average. Larger local controllers tend to yield higher average values, up to a point. But bounded DP is limited by improving one controller at a time, and the linear program (one-step lookahead) results in local optimality and tends to get stuck.

52 Nonlinear Optimization Approach (Amato, Bernstein & Zilberstein, UAI 2007, JAAMAS 2010). Basic idea: model the problem as a nonlinear program (NLP). Consider node values (as well as controller parameters) as variables. The NLP can take advantage of an initial state distribution when it is given. Improvement and evaluation happen in one step (equivalent to an infinite lookahead). Additional constraints maintain valid values.

53 NLP Representation. Variables: x(q, a) = P(a | q), y(q, a, o, q') = P(q' | q, a, o), z(q, s) = V(q, s), where q, a, o range over joint nodes, actions, and observations. Objective: maximize Σ_s b_0(s) z(q_0, s). Value constraints: for all s ∈ S and q ∈ Q: z(q, s) = Σ_a x(q, a) [ R(s, a) + γ Σ_{s'} P(s' | s, a) Σ_o O(o | s', a) Σ_{q'} y(q, a, o, q') z(q', s') ]. Additional linear constraints ensure that the controllers are independent and that all probabilities sum to 1 and are non-negative.

54 Independence Constraints. Independence constraints guarantee that action selection and controller transition probabilities for each agent depend only on local information. [The action-selection and controller-transition independence constraints shown on the slide did not survive transcription.]

55 Probability Constraints. Probability constraints guarantee that action selection probabilities and controller transition probabilities are non-negative and sum to 1. (Superscript f's represent arbitrary fixed values.) [The constraint formulas shown on the slide did not survive transcription.]

56 Optimality. Theorem: An optimal solution of the NLP results in optimal stochastic controllers for the given size and initial state distribution. Advantages of the NLP approach: efficient policy representation with fixed memory; the NLP represents the optimal policy for the given size; takes advantage of a known start state; easy to implement using off-the-shelf solvers. Limitation: difficult to solve optimally.

57 Adding a Correlation Device. The NLP approach can be extended to include a correlation device. A new variable w(c, c') represents the transition function of the correlation device; action selection and controller transitions depend on the new shared signal. [The extended formulation shown on the slide did not survive transcription.]

58 Comparison of NLP & DEC-BPI (Amato, Bernstein & Zilberstein, UAI 2007, JAAMAS 2010). Used the freely available nonlinear constrained optimization solver "filter" on the NEOS server; the solver guarantees a locally optimal solution. Used 10 random initial controllers for a range of controller sizes. Compared the NLP with DEC-BPI, with and without a small (2-node) correlation device.

59 Results: Broadcast Channel (Amato, Bernstein & Zilberstein, UAI 2007). A simple two-agent networking problem (2 agents, 4 states, 2 actions, 5 observations). [The tables of average quality and average run time over 10 trials did not survive transcription.]

60 Results: Multi-Agent Tiger (Amato, Bernstein & Zilberstein, JAAMAS 2010). A two-agent version of a well-known POMDP benchmark [Nair et al. 03] (2 states, 3 actions, 2 observations). [The slide plots the average quality of various controller sizes using the NLP methods, with and without a 2-node correlation device, and BFS.]

61 Results: Meeting in a Grid (Amato, Bernstein & Zilberstein, JAAMAS 2010). A two-agent domain with 16 states, 5 actions, 2 observations. [The slide plots the average quality of various controller sizes using the NLP methods and DEC-BPI, with and without a 2-node correlation device, and BFS.]

62 Results: Box Pushing (Amato, Bernstein & Zilberstein, JAAMAS 2010). [The slide tabulates values and running times (in seconds) for each controller size using the NLP methods and DEC-BPI, with and without a 2-node correlation device, and BFS; an x indicates that the approach was not able to solve the problem.]

63 NLP Approach Summary. The NLP defines the optimal fixed-size stochastic controller. The approach shows consistent improvement over DEC-BPI using an off-the-shelf locally optimal solver. A small correlation device can have significant benefits. Better performance may be obtained by exploiting the structure of the NLP.

64 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

65 Exploiting the Locality of Interaction. In practical settings that involve many agents, each agent often interacts with a small number of neighboring agents (e.g., firefighting, sensor networks). Algorithms designed to exploit this property include LID-JESP [Nair et al. AAAI 05], SPIDER [Varakantham et al. AAMAS 07], and FANS [Marecki et al. AAMAS 08]. FANS uses FSCs for policy representation; it exploits FSCs for dynamic programming in policy evaluation and heuristic computations, providing significant speedups; it introduces novel heuristics to automatically vary the FSC size across agents; and it performs policy search that exploits the locality of agent interactions.

66 Constraint-Based DP (Kumar & Zilberstein, AAMAS 2009). Model the domain as a Networked Distributed POMDP (ND-POMDP), a restricted class of DEC-POMDPs characterized by a decomposable reward function. CBDP uses point-based dynamic programming (similar to MBDP). CBDP uses constraint network algorithms to improve the efficiency of key steps: computation of the heuristic function; belief sampling using the heuristic function; finding the best joint policy for a particular belief.

67 Results: Sensors Tracking a Target (Kumar & Zilberstein, AAMAS 2009). CBDP provides orders of magnitude of speedup over FANS. [The slide figure shows a sensor-network configuration in which sensors at neighboring locations scan in the N/S/E/W directions.] It provides better solution quality for all test instances, and strong theoretical guarantees on time and space complexity, enhancing scalability: linear complexity in the planning horizon length; linear in the number of agents, which is necessary to solve large realistic problems; and exponential only in a small parameter that depends on the level of interaction among the agents.

68 Sample Results. A 7-agent configuration with 4 actions per agent; two adjacent agents are required to track a target. The graphs show the solution quality (left) and time (right) of our approach (CBDP) compared with the best existing method (FANS). FANS is not scalable beyond horizon 7; CBDP has linear complexity in the horizon, and it provides better solution quality in less time. [The plots show solution quality and time (sec, log scale) versus horizon (2-10) for CBDP and FANS.]

69 New Scalable Approach (Kumar, Zilberstein, and Toussaint, IJCAI 2011). Extend an approach [Toussaint and Storkey, ICML 06] that maps planning under uncertainty (POMDP) problems into probabilistic inference. Characterize general constraints on the interaction graph that facilitate scalable planning. Introduce an efficient algorithm to solve such models using probabilistic inference. Identify a number of existing models with such constraints.

70 Value Factorization. θ = parameters of an agent; factored state space s = (s_1, ..., s_M). Example: consider four agents such that V = V_12 + V_23 + V_34.

71 Existing Models Satisfy VF. Each agent or state variable can participate in multiple value factors. Worst-case complexity is NEXP-complete. TI-DEC-MDP, ND-POMDP, and TD-POMDP satisfy value factorization.

72 Computational Advantages. Applicability: in models that satisfy VF, inference in the EM framework can be done independently in each value factor; smaller value factors enable more efficient inference; planning is no longer exponential but linear in the number of factors. Implementation: distributed planning; efficient implementation using message passing; parallel computation of messages.

73 Planning by Inference. Recasts planning as likelihood maximization in a DBN mixture with a binary reward variable r: P(r = 1 | s, a_1, a_2) ∝ R(s, a_1, a_2). [The slide figure shows the DBN mixture.]
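As a hedged sketch of the reward-to-probability scaling implied by this slide (the shift-and-normalize choice is an assumption): rewards are mapped into [0, 1] so that P(r = 1 | s, a_1, a_2) is proportional to R(s, a_1, a_2) up to an additive shift.

```python
def reward_to_probability(R):
    # R maps (s, a1, a2) -> reward; return P(r = 1 | s, a1, a2) in [0, 1].
    values = list(R.values())
    r_min, r_max = min(values), max(values)
    span = (r_max - r_min) or 1.0   # avoid division by zero for constant rewards
    return {key: (r - r_min) / span for key, r in R.items()}
```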

74 Exploiting the VF Property. Exploit the additive nature of the value function for scalability. The outer mixture simulates the VF property. Each V_f(θ_f, s_f) is evaluated using a time-dependent mixture. Theorem: Maximizing the likelihood of observing the variable r = 1 optimizes the joint policy.

75 The Expectation-Maximization Algorithm. Observed data: r = 1; every other variable is hidden. Use the EM algorithm to maximize the likelihood, implemented using message passing on the VF graph. Example: 3 factors {Ag_1, Ag_2}, {Ag_2, Ag_3}, and {Ag_3, Ag_4}.

76 Properties of the EM Algorithm. Scalability: the μ message requires independent inference in each factor; agents and state variables can be involved in multiple factors, so complex systems can be modeled via simpler interactions; distributed planning via message passing. Complexity: linear in the number of factors, exponential in the number of agents/state variables in a factor. Generality: no additional assumptions (such as TOI) are required, giving a general optimization recipe for models with the VF property. Local optima?

77 Experiments. ND-POMDP domains involving target tracking in sensor networks with imperfect sensing. Multiple targets; limited sensors with batteries. Penalty of -1 per sensor for miscoordination or for recharging the battery; positive reward (+80) per target scanned simultaneously by two adjacent sensors.

78 Comparisons with the NLP Approach (5P Domain). [The results shown on this slide did not survive transcription.]

79 Scalability on Larger Benchmarks. 15-agent and 20-agent domains, internal states = 5. [The results shown on this slide did not survive transcription.]

80 Summary of the EM Approach. Value factorization (VF) facilitates scalability. Several existing weakly coupled models satisfy VF. An EM algorithm can solve models with this property and yield good-quality solutions. Scalability: the E-step decomposes according to value factors, and smaller factors lead to efficient inference. It can be easily implemented using message passing among the agents. Future work: explore techniques for even faster inference and establish better error bounds.

81 Outline: Models for decentralized decision making; Complexity results; Solving finite-horizon DEC-POMDPs; Solving infinite-horizon DEC-POMDPs; Scalability beyond two agents; Conclusion.

82 Back to Some Basic Questions. Are DEC-POMDPs significantly harder to solve than POMDPs? Why? What features of the problem domain affect the complexity and how? Is optimal dynamic programming possible? Can dynamic programming be made practical? Is it beneficial to treat communication as a separate type of action? How can we exploit the locality of agent interaction to develop more scalable algorithms?

83 Questions? Additional information: Resource-Bounded Reasoning Lab, University of Massachusetts, Amherst.
