Introduction to Sequential Teams
1 Introduction to Sequential Teams. Aditya Mahajan, McGill University. Joint work with Ashutosh Nayyar and Demos Teneketzis, University of Michigan. MITACS Workshop on Fusion and Inference in Networks, 2011.
2 Decentralized systems are ubiquitous
3 Real-time quantization. Block diagram: Source → X → Encoder → Y → Receiver → X̂. Objective: choose the transmission and estimation policies to minimize expected total distortion (over a finite or infinite horizon).
4 Multiaccess broadcast. Transmitter 1 and Transmitter 2 share a multiaccess channel. Objective: choose the transmission policies to maximize throughput (over a finite or infinite horizon).
5 Estimating with active sensing. Block diagram: Environment → Sensor → Estimator. Objective: choose the transmission and estimation policies to minimize a weighted average of expected transmission cost and expected total distortion (over a finite or infinite horizon).
6 Systematic design of decentralized systems. Salient features: multi-stage decision problems; multiple decision makers (agents) with decentralized information. Structure of optimal policy: can an agent, or a group of agents, shed available data, or compress available data, without loss of optimality? Search for optimal policies: brute-force search for an optimal policy has complexity that is doubly exponential in the time horizon. How can we search for an optimal policy efficiently?
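To make the complexity remark concrete, here is a minimal counting sketch. The sizes `n_obs` and `n_act` are illustrative assumptions, not from the talk; the point is that when the decision at time t may depend on the entire observation history, the number of deterministic policies grows doubly exponentially with the horizon.

```python
# Hypothetical sizes (assumed for illustration): observations and actions per stage.
n_obs, n_act = 2, 2

def num_policies(horizon):
    """Count deterministic policies when the decision at time t may depend
    on the entire observation history (all length-t observation sequences)."""
    total = 1
    for t in range(1, horizon + 1):
        n_histories = n_obs ** t       # distinct observation histories at time t
        total *= n_act ** n_histories  # decision rules g_t: histories -> actions
    return total

for T in range(1, 5):
    print(T, num_policies(T))
```

Even for binary observations and actions, the count is 4, 64, 16384, and over 10^9 for horizons 1 through 4, which is why structural results that shrink the domain of the decision rules matter.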
7 Outline. A taxonomy of decentralized systems. Overview of centralized stochastic control: Markov decision processes (MDPs); partially observable Markov decision processes (POMDPs); delayed state observation. Design principle for sequential teams: delayed state observation.
8-11 Classification of decentralized systems. By coupling of objectives, decentralized systems split into teams (a common objective; the relevant question is optimality) and games (conflicting objectives; the relevant question is stability/equilibrium). Teams are either static or dynamic; dynamic teams are further classified by information structure (classical vs. non-classical) and by the ordering of decisions (sequential vs. non-sequential).
12 Sequential dynamic teams. We are interested in sequential dynamic teams of agents with non-classical information structures.
13 A bit of history...
15 Overview of centralized stochastic control
16 Centralized stochastic control. A single decision maker with a classical information structure.
17-20 MDP: Structural properties. A system with state X_t evolves under dynamics f_t, driven by the agent's action U_t; the agent observes the state, acts through a decision rule g_t, and incurs per-stage cost c_t(X_t, U_t) (equivalently, collects reward R_t). Structure of optimal policy: choose the current action based only on the current state, U_t = g_t(X_t).
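The MDP structural result can be seen directly in finite-horizon value iteration: the backward recursion produces decision rules indexed only by the current state. A minimal sketch with a made-up transition kernel and cost (all numbers illustrative, not from the talk):

```python
import numpy as np

# Toy MDP: n_x states, n_u actions, horizon T; random P and c for illustration.
n_x, n_u, T = 3, 2, 10
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_x), size=(n_u, n_x))  # P[u, x, :] = Pr(X' | X=x, U=u)
c = rng.random((n_x, n_u))                        # per-stage cost c(x, u)

V = np.zeros(n_x)                                 # terminal value function
for t in reversed(range(T)):
    # Q[x, u] = c(x, u) + E[V(X') | X=x, U=u]
    Q = c + np.stack([P[u] @ V for u in range(n_u)], axis=1)
    g = Q.argmin(axis=1)                          # decision rule g_t: state -> action
    V = Q.min(axis=1)
print("optimal first-stage decision rule g_1(x):", g)
```

The decision rule `g` is an array of size `n_x`: the optimal action is a function of the current state alone, never of the full history.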
21-24 POMDP: Structural properties. The system has state X_t; the agent sees only a noisy observation Y_t and chooses action U_t through a decision rule g_t. Structure of optimal policies: choose the current action based on the current information state π_t = Pr(state of system | all data at agent), i.e., U_t = g_t(π_t).
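The POMDP information state π_t = Pr(X_t | Y_{1:t}, U_{1:t-1}) can be maintained recursively, so the agent never needs to store the raw history. A predict-then-correct sketch (the matrices below are illustrative; action dependence of the dynamics is omitted for brevity):

```python
import numpy as np

P = np.array([[0.9, 0.1],   # P[x, x'] = Pr(X' = x' | X = x)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],   # O[x, y] = Pr(Y = y | X = x)
              [0.3, 0.7]])

def belief_update(pi, y):
    """One recursive step of the information state: predict, then correct."""
    pred = pi @ P            # predict: push the belief through the dynamics
    upd = pred * O[:, y]     # correct: weight by the observation likelihood
    return upd / upd.sum()   # normalize back to a probability vector

pi = np.array([0.5, 0.5])    # prior belief
for y in [0, 0, 1]:          # an example observation sequence
    pi = belief_update(pi, y)
```

Because π_t is updated from (π_{t-1}, U_{t-1}, Y_t) alone, it is a fixed-dimension statistic even as the history grows.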
25-28 An example: delayed observation. The agent observes the state of the system after a delay of d steps, so at time t it knows (X_{1:t-d}, U_{1:t-1}). Original form of control laws: U_t = g_t(X_{1:t-d}, U_{1:t-1}). Structure of optimal policies: choose the control action based on π_t = Pr(X_t | X_{1:t-d}, U_{1:t-1}), which is a function of (X_{t-d}, U_{t-d:t-1}). Simplified form of control laws: U_t = g_t(X_{t-d}, U_{t-d:t-1}).
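For delayed observation, the information state is just a d-step prediction: starting from the last observed state and applying one transition per intermediate action. A sketch with illustrative two-state dynamics (the transition matrices are assumptions, not from the talk):

```python
import numpy as np

# Controlled transition matrices: P[u][x, x'] = Pr(X' = x' | X = x, U = u).
P = {0: np.array([[0.9, 0.1], [0.4, 0.6]]),
     1: np.array([[0.5, 0.5], [0.1, 0.9]])}

def info_state(x_delayed, recent_actions):
    """pi_t = Pr(X_t | X_{t-d}, U_{t-d:t-1}): propagate a point mass on the
    last observed state through d transition steps, one per recent action."""
    pi = np.eye(2)[x_delayed]        # point mass on X_{t-d}
    for u in recent_actions:         # apply U_{t-d}, ..., U_{t-1} in order
        pi = pi @ P[u]
    return pi

pi_t = info_state(x_delayed=0, recent_actions=[1, 0])   # example with d = 2
```

Only (X_{t-d}, U_{t-d:t-1}) enters the computation, which is exactly why the older history can be shed without loss of optimality.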
29-30 Structural policies in stochastic control. Structure of optimal policies: shed irrelevant information; compress relevant information into a compact statistic. Ideally, the data kept at the agent does not grow with time. Implication of the results: they simplify the functional form of the decision rules, simplify the search for optimal decision rules, and are a prerequisite for deriving a dynamic programming decomposition.
31 Extending these ideas to decentralized control.
32-33 Delayed observation of state (two agents). The system state is (X^1_t, X^2_t). Agent 1 observes X^1_t immediately and agent 2's data with a one-sided delay of d steps; symmetrically for agent 2. Original form of control laws: U^1_t = g^1_t([X^1_{1:t}, U^1_{1:t-1}], [X^2_{1:t-d}, U^2_{1:t-d}]). Is the structure suggested by the centralized result correct, i.e., U^1_t = g^1_t([X^1_{1:t}, U^1_{1:t-1}], [X^2_{t-d}, U^2_{t-d}])?
34-36 Let's consider delay d = 2.
At agent 1:
U^1_1 = g^1_1(X^1_1)
U^1_2 = g^1_2(X^1_{1:2}, U^1_1)
U^1_3 = g^1_3(X^1_{1:3}, U^1_{1:2}, X^2_1, U^2_1)
U^1_4 = g^1_4(X^1_{1:4}, U^1_{1:3}, X^2_{1:2}, U^2_{1:2})
At agent 2 (symmetrically):
U^2_1 = g^2_1(X^2_1)
U^2_2 = g^2_2(X^2_{1:2}, U^2_1)
U^2_3 = g^2_3(X^2_{1:3}, U^2_{1:2}, X^1_1, U^1_1)
U^2_4 = g^2_4(X^2_{1:4}, U^2_{1:3}, X^1_{1:2}, U^1_{1:2})
At time 4, agent 1 cannot shed X^2_1, because X^2_1 gives some information about agent 2's action U^2_3, which agent 1 has not observed.
37 How does agent 1 figure out how agent 2 will interpret his (agent 1's) actions?
39-41 Solution approach [Mahajan 2008, 2009; Nayyar, Mahajan, Teneketzis 2008, 2011]: adapt based on common knowledge.
Split the observations into two parts (shown for d = 2):
Common data: C_t = (X^1_{1:t-2}, X^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2})
Local data: L^1_t = (X^1_{t-1}, X^1_t, U^1_{t-1}), and similarly L^2_t.
A three-step approach:
1. Consider a coordinated system.
2. Show that the coordinated system is equivalent to the original system.
3. Simplify the coordinated system.
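The key object in the coordinated system is the prescription: a function from local data to actions, chosen using common data only. With finite sets, prescriptions are just lookup tables, so the coordinator optimizes over a finite set of tables instead of over the agents' private data. A toy sketch (all sets and names are illustrative):

```python
from itertools import product

local_values = [0, 1]   # possible values of an agent's local data (toy set)
actions = [0, 1]        # possible actions (toy set)

# A prescription gamma is a table local-data -> action. Enumerate them all:
# the coordinator's "action space" is this finite set of tables.
prescriptions = [dict(zip(local_values, a))
                 for a in product(actions, repeat=len(local_values))]

def apply_prescription(gamma, local_data):
    """A 'dumb' agent simply evaluates the prescription on its local data."""
    return gamma[local_data]

gamma1 = prescriptions[2]                       # coordinator's pick for agent 1
u1 = apply_prescription(gamma1, local_data=1)   # agent 1's resulting action
```

With 2 local values and 2 actions there are 2^2 = 4 prescriptions per agent per stage; the coordinator chooses among these using common data, and the agents contribute no decision-making of their own.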
42 Original system: each agent i acts on its common data C_t and its local data L^i_t.
43-45 Coordinated system. A coordinator observes only the common data C_t = (X^1_{1:t-2}, X^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2}). Its control actions are function sections (prescriptions) γ^1_t, γ^2_t, where γ^i_t(·) = g^i_t(·, C_t). The agents are "dumb" and simply follow the prescription: U^i_t = γ^i_t(L^i_t) = γ^i_t(X^i_{t-1}, X^i_t, U^i_{t-1}). The two systems are equivalent.
46-51 Structure of the coordination policy: (γ^1_t, γ^2_t) = ψ_t(Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1}), where Z_t = (X^1_{t-1}, X^2_{t-1}, U^1_{t-1}, U^2_{t-1}).
Sufficient statistic:
π_t = Pr(state | all past data)
    = Pr(X^1_t, X^2_t, L^1_t, L^2_t | Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1})
    = Pr(X^1_t, X^2_t, Z_t | Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1})
Structural result: (γ^1_t, γ^2_t) = ψ_t(π_t). Or equivalently, U^i_t = g^i_t(π_t, L^i_t).
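Once prescriptions are fixed, the agents' actions are determined by the state, so the coordinator's sufficient statistic π updates like a POMDP belief. A toy one-step sketch, with a single scalar state standing in for the joint state and made-up dynamics (everything below is an illustration, not the talk's model):

```python
import numpy as np

# Controlled transition matrices: P[u][x, x'] = Pr(X' = x' | X = x, U = u).
P = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),
     1: np.array([[0.6, 0.4], [0.2, 0.8]])}

def coordinator_update(pi, gamma):
    """Propagate the coordinator's belief pi one step when the agents apply
    the prescription gamma (a table state -> action)."""
    new_pi = np.zeros_like(pi)
    for x, p in enumerate(pi):
        u = gamma[x]               # action dictated by the prescription at x
        new_pi += p * P[u][x]      # mix in the corresponding transition row
    return new_pi

pi = np.array([0.5, 0.5])
pi = coordinator_update(pi, gamma={0: 1, 1: 0})
```

Because the update depends on the prescription (the coordinator's "action") and not on any private data, π plays the role of the information state in a centralized POMDP for the coordinator.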
52-54 Further simplification. Recall U^1_t = γ^1_t(X^1_{t-1}, X^1_t, U^1_{t-1}). Define the partially applied prescription γ̂^1_t(·) = γ^1_t(X^1_{t-1}, ·, U^1_{t-1}).
We can show that π_t = Pr(X^1_t, X^2_t, Z_t | Z_{1:t-1}, γ^1_{1:t-1}, γ^2_{1:t-1}) is a function of (Z_{t-1}, γ̂^1_{t-1}, γ̂^2_{t-1}).
Equivalent structural result: U^1_t = g^1_t([X^1_{t-2:t}, U^1_{t-2:t-1}], [X^2_{t-2}, U^2_{t-2}], γ̂^1_{t-1}, γ̂^2_{t-1}).
55 Recap: Solution approach [Mahajan 2008, 2009; Nayyar, Mahajan, Teneketzis 2008, 2011]: adapt based on common knowledge. Split the observations into common data C_t = (X^1_{1:t-2}, X^2_{1:t-2}, U^1_{1:t-2}, U^2_{1:t-2}) and local data L^i_t = (X^i_{t-1}, X^i_t, U^i_{t-1}). A three-step approach: 1. consider a coordinated system; 2. show that the coordinated system is equivalent to the original system; 3. simplify the coordinated system.
56 Applications. Delayed sharing information structures (an open problem for 40 years) [Nayyar, Mahajan, Teneketzis 2011]; real-time communication; feedback communication; multi-user communication; decentralized sequential hypothesis testing; multiaccess broadcast; active sensing; ...
57 Future directions. Randomized decision rules; unknown models.
58 Thank you
More informationRestless Bandit Index Policies for Solving Constrained Sequential Estimation Problems
Restless Bandit Index Policies for Solving Constrained Sequential Estimation Problems Sofía S. Villar Postdoctoral Fellow Basque Center for Applied Mathemathics (BCAM) Lancaster University November 5th,
More informationMarkovian Decision Process (MDP): theory and applications to wireless networks
Markovian Decision Process (MDP): theory and applications to wireless networks Philippe Ciblat Joint work with I. Fawaz, N. Ksairi, C. Le Martret, M. Sarkiss Outline Examples MDP Applications A few examples
More informationTemporal Difference Learning & Policy Iteration
Temporal Difference Learning & Policy Iteration Advanced Topics in Reinforcement Learning Seminar WS 15/16 ±0 ±0 +1 by Tobias Joppen 03.11.2015 Fachbereich Informatik Knowledge Engineering Group Prof.
More informationDecentralized decision making with spatially distributed data
Decentralized decision making with spatially distributed data XuanLong Nguyen Department of Statistics University of Michigan Acknowledgement: Michael Jordan, Martin Wainwright, Ram Rajagopal, Pravin Varaiya
More informationCourse 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016
Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the
More informationDecentralized Sequential Change Detection Using Physical Layer Fusion
Decentralized Sequential Change Detection Using hysical Layer Fusion Leena Zacharias ECE Department Indian Institute of Science Bangalore, India Email: zleena@ece.iisc.ernet.in ajesh Sundaresan ECE Department
More informationDecentralized Sequential Hypothesis Testing. Change Detection
Decentralized Sequential Hypothesis Testing & Change Detection Giorgos Fellouris, Columbia University, NY, USA George V. Moustakides, University of Patras, Greece Outline Sequential hypothesis testing
More informationThe Optimality of Beamforming: A Unified View
The Optimality of Beamforming: A Unified View Sudhir Srinivasa and Syed Ali Jafar Electrical Engineering and Computer Science University of California Irvine, Irvine, CA 92697-2625 Email: sudhirs@uciedu,
More informationPower Controlled FCFS Splitting Algorithm for Wireless Networks
Power Controlled FCFS Splitting Algorithm for Wireless Networks Ashutosh Deepak Gore Abhay Karandikar Department of Electrical Engineering Indian Institute of Technology - Bombay COMNET Workshop, July
More informationArtificial Intelligence & Sequential Decision Problems
Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet
More informationRL 14: POMDPs continued
RL 14: POMDPs continued Michael Herrmann University of Edinburgh, School of Informatics 06/03/2015 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally
More informationDecision Theory: Markov Decision Processes
Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies
More informationOptimizing Memory-Bounded Controllers for Decentralized POMDPs
Optimizing Memory-Bounded Controllers for Decentralized POMDPs Christopher Amato, Daniel S. Bernstein and Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst, MA 01003
More informationSTRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS
STRUCTURE AND OPTIMALITY OF MYOPIC SENSING FOR OPPORTUNISTIC SPECTRUM ACCESS Qing Zhao University of California Davis, CA 95616 qzhao@ece.ucdavis.edu Bhaskar Krishnamachari University of Southern California
More informationEvent-triggered stabilization of linear systems under channel blackouts
Event-triggered stabilization of linear systems under channel blackouts Pavankumar Tallapragada, Massimo Franceschetti & Jorge Cortés Allerton Conference, 30 Sept. 2015 Acknowledgements: National Science
More informationMarkov Decision Processes and Solving Finite Problems. February 8, 2017
Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:
More informationLogic, Knowledge Representation and Bayesian Decision Theory
Logic, Knowledge Representation and Bayesian Decision Theory David Poole University of British Columbia Overview Knowledge representation, logic, decision theory. Belief networks Independent Choice Logic
More information