Sufficient Statistics in Decentralized Decision-Making Problems
Ashutosh Nayyar, University of Southern California, Feb 8, 2015
Decentralized Systems
- Transportation networks
- Communication networks
- Networked control systems
- Sensor networks
- Energy systems
- Social networks
- Markets
- Organizations
- Supply chain systems
Decentralized Decision Problems
I. Static/one-stage: one-shot decisions.
II. Dynamic/multi-stage: dynamic system; decisions made over time.
A. Cooperative/team problem: decision makers (DMs) have a common goal.
B. Non-cooperative/game formulation: DMs have different goals.
In this talk, we focus on cooperative, multi-stage decentralized decision problems: the decentralized stochastic control problem.
Stochastic Control for Decentralized Systems
Key features of a decentralized stochastic control problem:
- Uncertainty: in the system evolution and in the DMs' information.
- Information asymmetry: different decision makers (DMs) have different information.
- Signaling: decisions of one DM affect the information of other DMs.
- Information growth: DMs accumulate information over time.
Key question (sufficient statistics): Can the ever-growing information history available to the DMs be aggregated without compromising performance? In other words, are there sufficient statistics for the controllers?
Overview
1. Centralized results: POMDP, LQG
2. Partial history sharing model of decentralized control
3. Person-by-person method
4. Common information method
5. Common information method for LQG
Centralized Stochastic Control: Model
Partially Observable Markov Decision Problems (POMDPs)
System dynamics: X_{t+1} = f_t(X_t, U_t, W_t), t = 0, 1, ..., T.   (1)
Observation model: Y_t = h_t(X_t, V_t), t = 0, 1, ..., T.   (2)
X = state, U = action/decision, W, V = noise.
X_0, {W_t, V_t, t >= 0} are mutually independent random variables with known distributions.
Cost: sum_{t=0}^{T} l_t(X_t, U_t).
One decision maker (DM) with perfect recall. DM's information at t: I_t := {Y_{0:t}, U_{0:t-1}}.
Centralized Stochastic Control: Strategy Optimization
Control action/decision: U_t = g_t(I_t) = g_t(Y_{0:t}, U_{0:t-1}), t = 0, 1, ..., T.
g_t is the decision rule/control law at t; g := (g_0, g_1, ..., g_T) is the decision/control strategy.
Expected cost of strategy g: J_Cent(g) := E^g [ sum_{t=0}^{T} l_t(X_t, U_t) ].
Optimization: find a g that minimizes J_Cent(g).
Centralized Stochastic Control: Sufficient Statistics
DM's posterior belief on the state at time t: Π_t(x_t) = P^g(x_t | Y_{0:t}, U_{0:t-1}), x_t in X.
Strategy independence of the belief: given the DM's information, the belief at time t does not depend on the strategy!
Sufficient statistic: the belief is a sufficient statistic; optimal decision rules have the form U_t = g_t(Π_t).
LQG result: when the dynamics and observations are linear, the cost quadratic, and the noises Gaussian, optimal decision rules have the form U_t = g_t(Z_t), where Z_t = E^g(X_t | Y_{0:t}, U_{0:t-1}).
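The belief update behind this sufficient statistic is the standard Bayes filter. A minimal sketch for a finite-state POMDP (the transition and observation matrices below are hypothetical, not from the talk): note that the update uses only the realized action and observation, never the decision rule g, which is exactly the strategy-independence property.

```python
import numpy as np

def belief_update(pi, u, y, P, O):
    """One step of the POMDP belief filter: Pi_{t+1} from Pi_t.
    pi : current belief over states, shape (n,)
    u  : index of the action taken
    y  : index of the observation received
    P  : P[u][x, x'] = transition probability x -> x' under action u
    O  : O[x', y]    = probability of observing y in state x'
    """
    pred = pi @ P[u]           # predict through the dynamics
    post = pred * O[:, y]      # reweight by the observation likelihood
    return post / post.sum()   # normalize (Bayes rule)

# Tiny 2-state example with a single action (hypothetical numbers)
P = [np.array([[0.9, 0.1], [0.2, 0.8]])]
O = np.array([[0.8, 0.2], [0.3, 0.7]])
pi1 = belief_update(np.array([0.5, 0.5]), 0, 0, P, O)
```

The strategy g only determines which (u, y) pairs get realized; given them, every strategy produces the same Π_t.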
Decentralized Stochastic Control: Model
Partial history sharing model with N DMs: DM 1, DM 2, ..., DM N.
System dynamics: X_{t+1} = f_t(X_t, U^1_t, U^2_t, ..., U^N_t, W_t), t = 0, 1, ..., T.
U^i_t is DM i's action at t, i = 1, 2, ..., N.
Cost: sum_{t=0}^{T} l_t(X_t, U^1_t, ..., U^N_t).
Information Structure: Data Available at DM i
- Local observation: Y^i_t = h^i_t(X_t, V^i_t).
- Local memory: a subset of past local observations and actions, M^i_t ⊆ {Y^i_{0:t-1}, U^i_{0:t-1}}.
- Shared memory: a subset of past observations and actions of all controllers, C_t ⊆ {Y^1_{0:t-1}, ..., Y^N_{0:t-1}, U^1_{0:t-1}, ..., U^N_{0:t-1}}.
Model: Optimization of Strategies
DM i's control action at time t is a function of its local observation Y^i_t, local memory M^i_t, and shared memory C_t:
U^i_t = g^i_t(Y^i_t, M^i_t, C_t)
g^i_t is DM i's decision rule/control law at time t; g^i := (g^i_0, g^i_1, ..., g^i_T) is DM i's decision/control strategy; g := (g^1, ..., g^N) is the strategy profile of the system.
Expected cost of strategy profile g: J^g_T := E^g [ sum_{t=0}^{T} l_t(X_t, U^1_t, U^2_t, ..., U^N_t) ].
Model: Memory Update Assumptions
Assumption 1: The shared memory is non-decreasing (analogous to the perfect recall assumption in the centralized case):
C_{t+1} = {C_t, Z^1_t, Z^2_t, ..., Z^N_t}
where Z^i_t is DM i's contribution to the shared memory at time t.
Assumption 2: The increment in shared memory from t to t+1 and the local memory at t+1 are fixed functions of the current local memories, observations and control actions, i.e., F^i_t and G^i_t are fixed:
Z^i_t = F^i_t(M^i_t, Y^i_t, U^i_t)
M^i_{t+1} = G^i_t(M^i_t, Y^i_t, U^i_t)
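A concrete instance of Assumption 2 can be sketched in a few lines. Here, as a hypothetical choice, each DM shares its current observation (F returns Y^i_t) and keeps only its last action in local memory (G returns U^i_t); the names F, G and the toy data are illustrative, not part of the model.

```python
def F(m, y, u):
    """Z^i_t: the increment DM i adds to the shared memory (here: its observation)."""
    return y

def G(m, y, u):
    """M^i_{t+1}: DM i's next local memory (here: its last action)."""
    return u

# one time step of the memory updates for two DMs
C = []                              # shared memory C_t
M = {1: None, 2: None}              # local memories M^i_t
Y = {1: "y1_t", 2: "y2_t"}          # current observations
U = {1: "u1_t", 2: "u2_t"}          # current actions
C = C + [F(M[i], Y[i], U[i]) for i in (1, 2)]   # C_{t+1} = {C_t, Z^1_t, Z^2_t}
M = {i: G(M[i], Y[i], U[i]) for i in (1, 2)}    # M^i_{t+1} = G^i_t(M^i_t, Y^i_t, U^i_t)
```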
Example: Delayed Sharing Information Structure
Communication links between the controllers have delay d: Z^i_t = {Y^i_{t-d}, U^i_{t-d}}.
System: X_{t+1} = f_t(X_t, U^1_t, U^2_t, W^0_t). DM 1 holds (M^1_t, C_t) and acts on Y^1_t; DM 2 holds (M^2_t, C_t) and acts on Y^2_t; each contributes Z^1_t, Z^2_t to the shared memory.
C_t = {Y^1_{0:t-d}, U^1_{0:t-d}, Y^2_{0:t-d}, U^2_{0:t-d}}
M^i_t = {Y^i_{t-d+1:t-1}, U^i_{t-d+1:t-1}}
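The split of DM i's history into shared and private parts under a sharing delay is mechanical. A sketch (list indices play the role of time; the function name is ours):

```python
def delayed_sharing_split(Yi, Ui, t, d):
    """Split DM i's history at time t under sharing delay d.
    Yi[k], Ui[k] are DM i's observation/action at time k (0 <= k <= t).
    Returns the part already in C_t (everything up to t-d) and the part
    still private to DM i (observations through t, actions through t-1)."""
    s = max(t - d + 1, 0)                # first time index not yet shared
    shared = (Yi[:s], Ui[:s])            # Y^i_{0:t-d}, U^i_{0:t-d}
    private = (Yi[s:t + 1], Ui[s:t])     # Y^i_{t-d+1:t}, U^i_{t-d+1:t-1}
    return shared, private

shared, private = delayed_sharing_split(list("abcdef"), list("ABCDEF"), t=5, d=2)
```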
Model: Memory Update
[Figure: time ordering of observations, actions and memory updates.] Within each step, DM i observes Y^i_t, acts U^i_t, contributes Z^i_t to the shared memory, and updates its local memory to M^i_{t+1}; the shared memory becomes C_{t+1} = {C_t, Z^1_t, Z^2_t}.
Special Instances of the Model
Decentralized control:
- Delayed sharing information structure (Witsenhausen 1971)
- Delayed state sharing information structure (Aicardi et al. 1987)
- Periodic sharing information structure (Ooi et al. 1997)
- Control sharing information structure (Mahajan 2013)
- Broadcast information structure (Wu-Lall 2010)
Communications:
- Real-time encoding (Witsenhausen 1978, Tatikonda-Mitter 2004)
- Real-time encoding-decoding with noiseless feedback (Walrand-Varaiya 1983)
- Paging and registration in cellular systems (Hajek et al. 2008)
Sensor networks:
- Sequential problems in decentralized detection (Teneketzis-Varaiya 1984, Tsitsiklis 1986, 1993, Teneketzis-Ho 1987, Veeravalli et al. 1993)
- Communication and estimation in remote sensing (Imer-Basar 2010, Lipsa-Martins 2011, Nayyar et al. 2013)
Sufficient Statistics for the Decentralized Problem
First approach: person-by-person optimization, a commonly used method for decentralized problems.
Main idea:
1. Fix the strategies of all DMs except DM 1 to arbitrary choices.
2. Focus on the resulting centralized stochastic control problem of DM 1.
Lemma 1. Suppose that DM 1's centralized problem has a sufficient statistic S^1_t = function(Y^1_t, M^1_t, C_t); that is, there is an optimal strategy of DM 1 of the form U^1_t = g^1_t(S^1_t), t = 0, 1, ..., T, and we get the same statistic irrespective of the choice of the other DMs' strategies. Then S^1_t is a sufficient statistic for DM 1 in the original decentralized problem.
Sufficient Statistics for the Decentralized Problem: Person-by-Person Optimization
A useful method to remove redundant information in many cases:
- Real-time communication problems (Mahajan and Teneketzis 2010; Kaspi and Merhav 2010)
- Decentralized detection (Tenney and Sandell 1981; Veeravalli, Basar and Poor 1993)
- Decentralized LQG control (Lessard and Nayyar 2013; Wu and Lall 2010)
Repeated application can lead to person-by-person optimal solutions (Nash equilibria) in some cases (e.g., static LQG teams).
Limitation: it does not always yield a useful statistic; sometimes the only S_t satisfying the lemma is the entire information itself. Example: decentralized control with communication delay between the controllers.
In Search of a New Approach...
What are the main roadblocks in decentralized problems?
If all DMs have identical information (centralized problem):
- DMs can form identical posterior beliefs on the state.
- Given the strategies (g^1, ..., g^N), each DM can exactly predict the other DMs' actions at the current time.
DMs with different information (our model: partial history sharing):
- DMs have non-identical beliefs on the state.
- Even with fixed strategies (g^1, ..., g^N), a DM cannot exactly predict the other DMs' actions.
In Search of a New Approach: Key Ideas for Our Solution Methodology
1. Common information
- Sharing data among DMs creates common information {C_t}.
- Beliefs based on common information are consistent among DMs.
2. Partial decision rules
- Given fixed strategies (g^1, g^2), DM 1 cannot know U^2_t.
- But for a given realization c_t of the common information, DM 1 knows exactly the mapping from (Y^2_t, M^2_t) to U^2_t:
U^2_t = g^2_t(Y^2_t, M^2_t, c_t),   g^2_t(·, ·, c_t) = γ^2_t(·, ·)
γ^2_t is the partial decision rule for the given realization c_t of the common information.
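In code terms, a partial decision rule is just a partial application of g^2_t at the realized common information; the toy rule below is hypothetical.

```python
from functools import partial

def g2_t(y, m, c):
    """A toy decision rule for DM 2 (hypothetical): act iff y + m exceeds c."""
    return int(y + m > c)

c_t = 3                            # realized common information
gamma2_t = partial(g2_t, c=c_t)    # gamma^2_t(., .) = g^2_t(., ., c_t)

# DM 1 knows c_t, so it knows this entire mapping,
# even without knowing the realization of (Y^2_t, M^2_t)
u2_if = {(y, m): gamma2_t(y, m) for y in (0, 1, 2) for m in (0, 1, 2)}
```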
Common Information Methodology
Step 1: Introduce a problem with a fictitious "coordinator".
- The coordinator's beliefs are based on the common information.
- The coordinator selects partial decision rules (prescriptions).
Step 2: Establish equivalence between the original problem and the problem with the coordinator.
Step 3: Establish sufficient statistics for the problem with the coordinator.
Step 4: Use the equivalence (Step 2) to find sufficient statistics for the original problem.
Common Information Methodology, Step 1: The Fictitious Coordinator
System: X_{t+1} = f_t(X_t, U^1_t, U^2_t, W^0_t). DM 1 (with local memory M^1_t) and DM 2 (with M^2_t) act on Y^1_t, Y^2_t and contribute Z^1_t, Z^2_t; the coordinator holds C_t, with C_{t+1} = {C_t, Z^1_t, Z^2_t}. The fictitious coordinator has perfect recall.
The coordinator selects "prescriptions" Γ^1_t, Γ^2_t:
Γ^i_t = ψ^i_t(C_t), i = 1, 2
ψ_t = (ψ^1_t, ψ^2_t) is the coordination strategy at t.
Common Information Methodology, Step 1: The Fictitious Coordinator
Prescriptions Γ^1_t, Γ^2_t instruct DM 1 and DM 2 how to use their private information: if the private information at controller i is (y^i_t, m^i_t), it takes the action Γ^i_t(y^i_t, m^i_t). That is,
U^i_t = Γ^i_t(Y^i_t, M^i_t).
Common Information Methodology, Step 1: The Coordinator's Problem P_CD
Choose ψ := (ψ_1, ψ_2, ..., ψ_T), ψ_t = (ψ^1_t, ψ^2_t), to minimize
J^ψ_{CD,T} := E^ψ { sum_{t=0}^{T} l_t(X_t, U^1_t, U^2_t) }
where Γ^i_t = ψ^i_t(C_t) and U^i_t = Γ^i_t(Y^i_t, M^i_t).
Common Information Methodology, Step 2: Equivalence of Problems P_D and P_CD
Problem P_D: each DM i acts as U^i_t = g^i_t(Y^i_t, M^i_t, C_t), with strategy profile g = (g^1, g^2) and cost J^g_T = E^g [ sum_{t=0}^{T} l_t(X_t, U^1_t, U^2_t) ].
Problem P_CD: the coordinator with strategy ψ = (ψ^1, ψ^2) selects prescriptions, each DM i acts as U^i_t = Γ^i_t(Y^i_t, M^i_t), and the cost is J^ψ_{CD,T} = E^ψ [ sum_{t=0}^{T} l_t(X_t, U^1_t, U^2_t) ].
Common Information Methodology, Step 2
Lemma 2. Consider any choice of the coordinator's policy ψ = (ψ_1, ψ_2, ..., ψ_T), ψ_t = (ψ^1_t, ψ^2_t). Define g = (g^1, g^2) by
g^i_t(·, ·, C_t) = ψ^i_t(C_t), i = 1, 2, t = 0, 1, ..., T.
Then J^g_T = J^ψ_{CD,T}.
Common Information Methodology, Step 2
Lemma 3. Consider any choice of control strategy g = (g^1, g^2). Define ψ = (ψ^1, ψ^2) by
ψ^i_t(C_t) = g^i_t(·, ·, C_t), i = 1, 2, t = 0, 1, ..., T.
Then J^ψ_{CD,T} = J^g_T.
Common Information Methodology, Step 3: The Coordinator's Problem P_CD
Think of the original system and the DMs together as the coordinator's "environment":
- The coordinator's observations at t are Z^1_t, Z^2_t.
- The coordinator's decisions/actions at t are Γ^1_t, Γ^2_t.
What is the "state" that describes the coordinator's environment? From the coordinator's perspective, the state is
S_t = (X_t, M^1_t, M^2_t, Y^1_t, Y^2_t).
Common Information Methodology, Step 3
There exist functions f̂_t, ĥ^1_t, ĥ^2_t such that
State dynamics: S_{t+1} = f̂_t(S_t, Γ^1_t, Γ^2_t, noise variables)
Observation equations: Z^i_t = ĥ^i_t(S_t, Γ^i_t), i = 1, 2.
The state satisfies a controlled Markov property:
P(S_{t+1}, Z^1_t, Z^2_t | S_{0:t}, Z^1_{0:t-1}, Z^2_{0:t-1}, Γ^1_{0:t}, Γ^2_{0:t}) = P(S_{t+1}, Z^1_t, Z^2_t | S_t, Γ^1_t, Γ^2_t).
There exist functions l̂_t such that for all t = 0, 1, ..., T,
l_t(X_t, U^1_t, U^2_t) = l̂_t(S_t, Γ^1_t, Γ^2_t).
Common Information Methodology, Step 3
The coordinator's problem P_CD can be rewritten as
min_ψ J^ψ_{CD,T} := E^ψ { sum_{t=0}^{T} l̂_t(S_t, Γ^1_t, Γ^2_t) }
subject to
S_{t+1} = f̂_t(S_t, Γ^1_t, Γ^2_t, noise variables)
Z^i_t = ĥ^i_t(S_t, Γ^i_t), i = 1, 2
Γ^i_t = ψ^i_t(C_t, Γ^1_{0:t-1}, Γ^2_{0:t-1}), i = 1, 2.
This is a POMDP (centralized stochastic control)!
Common Information Methodology, Step 3
Centralized problem: sufficient statistic Π_t = P(X_t | Y_{0:t}, U_{0:t-1}); optimal control strategy of the form U_t = g_t(Π_t); sequential decomposition: a dynamic program in terms of Π_t.
Coordinator's problem: sufficient statistic Π^CD_t = P(S_t | C_t, Γ^1_{0:t-1}, Γ^2_{0:t-1}); optimal strategy of the form Γ^1_t = ψ^1_t(Π^CD_t), Γ^2_t = ψ^2_t(Π^CD_t); sequential decomposition: a dynamic program in terms of Π^CD_t.
Common Information Methodology, Step 3: A Dynamic Program for the Coordinator
V_T(π) = inf_{γ^1_T, γ^2_T} E{ l_T(X_T, γ^1_T(Y^1_T, M^1_T), γ^2_T(Y^2_T, M^2_T)) | Π^CD_T = π }
V_t(π) = inf_{γ^1_t, γ^2_t} E{ l_t(X_t, γ^1_t(Y^1_t, M^1_t), γ^2_t(Y^2_t, M^2_t)) + V_{t+1}(η̂_t(π, γ^1_t, γ^2_t, Z^1_t, Z^2_t)) | Π^CD_t = π }
for t = 0, 1, ..., T − 1, where η̂_t is the coordinator's belief update function.
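To make the coordinator's choice of prescriptions concrete, here is a minimal one-stage team of our own (a hypothetical toy, not the model above): X is a fair coin, each DM sees an independent noisy copy of X, and the cost counts how many DMs mismatch X. The coordinator enumerates all prescription pairs (maps from private observation to action) and picks the pair with least expected cost.

```python
import itertools

# X ~ uniform{0,1}; DM i observes Y^i, which equals X with probability p,
# independently across DMs. Cost = number of DMs whose action differs from X.
p = 0.8

def expected_cost(gamma1, gamma2):
    """Expected team cost when DM i uses prescription gamma_i: y -> action."""
    total = 0.0
    for x in (0, 1):
        for y1 in (0, 1):
            for y2 in (0, 1):
                prob = 0.5 * (p if y1 == x else 1 - p) * (p if y2 == x else 1 - p)
                total += prob * ((gamma1[y1] != x) + (gamma2[y2] != x))
    return total

# The coordinator enumerates all prescription pairs; a prescription is
# represented as the tuple (gamma(0), gamma(1)).
prescriptions = list(itertools.product((0, 1), repeat=2))
best = min(itertools.product(prescriptions, prescriptions),
           key=lambda gs: expected_cost(*gs))
# Here each DM simply reporting its observation, gamma(y) = y, is optimal.
```

In the dynamic model, the coordinator would make such a choice at every time t as a function of its belief Π^CD_t; this static example shows only the "select a pair of partial decision rules" step.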
Common Information Methodology, Step 4: From the Coordinator to the Original Problem
The optimal policy for the coordinator's problem P_CD is of the form
Γ^1_t = ψ^1_t(Π^CD_t), Γ^2_t = ψ^2_t(Π^CD_t), where Π^CD_t = P(X_t, M^1_t, M^2_t, Y^1_t, Y^2_t | C_t, Γ^1_{0:t-1}, Γ^2_{0:t-1}).
Theorem (Nayyar, Mahajan, Teneketzis). For the decentralized stochastic control problem with partial history sharing information structure, there exist optimal policies of the DMs of the form
U^1_t = g^1_t(Y^1_t, M^1_t, Π_t)
U^2_t = g^2_t(Y^2_t, M^2_t, Π_t)
Π_t = P(X_t, M^1_t, M^2_t, Y^1_t, Y^2_t | C_t)
for t = 0, 1, ..., T.
Solution to Witsenhausen's Conjecture (1971)
n-step delayed sharing information structure (Witsenhausen 1971):
C_t = {Y^1_{0:t-n}, Y^2_{0:t-n}, U^1_{0:t-n}, U^2_{0:t-n}}
M^i_t = {Y^i_{t-n+1:t-1}, U^i_{t-n+1:t-1}}, i = 1, 2
P^i_t := {Y^i_t, M^i_t} = {Y^i_{t-n+1:t}, U^i_{t-n+1:t-1}}, i = 1, 2
Theorem 4 (Nayyar, Mahajan, Teneketzis, 2011). For the n-step delayed sharing problem there exist optimal policies of the DMs of the form
U^1_t = g^1_t(P^1_t, P(X_t, P^1_t, P^2_t | C_t))
U^2_t = g^2_t(P^2_t, P(X_t, P^1_t, P^2_t | C_t)).
Solution to Witsenhausen's Conjecture (1971): Comparison with the Conjecture
Witsenhausen (1971) conjectured: U^i_t = g^i_t(P^i_t, P(X_{t-n+1} | C_t)), i = 1, 2.
This is only true for one-step delay, n = 1 (Varaiya-Walrand, 1978).
Nayyar, Mahajan, Teneketzis (2011): U^i_t = g^i_t(P^i_t, P(X_t, P^1_t, P^2_t | C_t)), i = 1, 2.
The two results are equivalent only when n = 1.
Summary So Far
Main idea:
- DMs can use "common information" to coordinate how they use their "private information".
- Beliefs based on common information are consistent among all DMs.
- Instead of selecting decisions, the (fictitious) coordinator selects partial decision rules.
Sufficient statistics:
- Common-information-based beliefs on the system state and local information serve as key components of the sufficient statistics.
- The sufficient statistics found by the common information approach cannot be identified by the person-by-person approach.
Common Information Methodology: Special Cases
When C_t = {Y^1_{0:t-1}, Y^2_{0:t-1}, U^1_{0:t-1}, U^2_{0:t-1}} and M^1_t = M^2_t = ∅, the coordinator's methodology is the same as classical dynamic programming.
When C_t = ∅, M^1_t ⊆ {Y^1_{0:t-1}, U^1_{0:t-1}}, and M^2_t ⊆ {Y^2_{0:t-1}, U^2_{0:t-1}}, the coordinator's methodology is the same as the "designer's approach" (Witsenhausen 1973, Mahajan-Teneketzis 2009): sequential optimization of the decision rules as an open-loop control problem.
Decentralized LQG Problems: Linear Partial History Sharing Model
System: X_{t+1} = f_t(X_t, U^1_t, U^2_t, W^0_t); DM i holds (M^i_t, C_t) and acts on Y^i_t.
The dynamics and observation equations are linear, the cost is quadratic, and the noise is Gaussian. Z^i_t and M^i_{t+1} are linear functions of M^i_t, Y^i_t, U^i_t.
In a general decentralized control problem:
- Linear control strategies may not be optimal (example: Witsenhausen, 1968).
- Even if we restrict to linear strategies, we may not have finite-dimensional sufficient statistics (example: Whittle and Rudge, 1974).
39 / 46 Decentralized LQG Problems contd.
Linear partial history sharing model
[Block diagram as on the previous slide]
Restrict to linear strategies: the control action is a superposition of 3 components:
U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i C_t
39 / 46 Decentralized LQG Problems contd.
Linear partial history sharing model
[Block diagram as on the previous slide]
Restrict to linear strategies: the control action is a superposition of 3 components:
U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i C_t
Focus on the case when M_t^i is finite dimensional, perhaps after employing person-by-person optimality arguments.
39 / 46 Decentralized LQG Problems contd.
Linear partial history sharing model
[Block diagram as on the previous slide]
Restrict to linear strategies: the control action is a superposition of 3 components:
U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i C_t
Focus on the case when M_t^i is finite dimensional, perhaps after employing person-by-person optimality arguments.
C_t grows over time: C_t ⊆ C_{t+1}. A finite-dimensional sufficient statistic must compress the common information.
40 / 46 Modified Common Information Approach
Control action is a superposition: U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i C_t.
40 / 46 Modified Common Information Approach
Control action is a superposition: U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i C_t.
Fix the part of the control strategy that uses local information, i.e., G_t^i and H_t^i are fixed.
Optimize the part of the control action that depends on the common information: Γ_t^i = K_t^i C_t.
40 / 46 Modified Common Information Approach
Control action is a superposition: U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i C_t.
Fix the part of the control strategy that uses local information, i.e., G_t^i and H_t^i are fixed.
Optimize the part of the control action that depends on the common information: Γ_t^i = K_t^i C_t.
We introduce a coordinator that selects the optimal Γ_t^i based on the common information C_t.
41 / 46 Modified Common Information Approach contd.
[Block diagram: a coordinator observes C_t and sends Γ_t^1, Γ_t^2 to DM 1 and DM 2; the system evolves as X_{t+1} = f_t(X_t, U_t^1, U_t^2, W_t^0), and the DMs retain memories M_t^1, M_t^2 and share Z_t^1, Z_t^2]
The coordinator's problem is a centralized LQG problem with S_t = (X_t, M_t^1, M_t^2, Y_t^1, Y_t^2) as the new state.
41 / 46 Modified Common Information Approach contd.
[Block diagram: a coordinator observes C_t and sends Γ_t^1, Γ_t^2 to DM 1 and DM 2; the system evolves as X_{t+1} = f_t(X_t, U_t^1, U_t^2, W_t^0), and the DMs retain memories M_t^1, M_t^2 and share Z_t^1, Z_t^2]
The coordinator's problem is a centralized LQG problem with S_t = (X_t, M_t^1, M_t^2, Y_t^1, Y_t^2) as the new state.
Centralized sufficient statistic: E[S_t | C_t, Γ_{1:t-1}^{1,2}] is the coordinator's sufficient statistic.
42 / 46 Decentralized LQG Problem
Linear partial history sharing model
Theorem. There exists an optimal linear strategy of the form U_t^i = G_t^i Y_t^i + H_t^i M_t^i + K_t^i Ŝ_t, where Ŝ_t is the common-information-based estimate of (X_t, M_t^1, M_t^2, Y_t^1, Y_t^2). For given G^i, H^i matrices, Ŝ_t has a linear recursive update equation similar to a Kalman estimator.
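The kind of "linear recursive update similar to a Kalman estimator" referred to in the theorem can be sketched generically. The scalar parameters A, C, Q, R below are placeholders for illustration; the coordinator's actual update matrices depend on the fixed G^i, H^i and the model.

```python
# Generic Kalman-style recursive estimator step: the estimate is
# propagated through fixed linear dynamics and corrected by the newly
# arrived observation. All parameters here are illustrative scalars.
def kalman_step(xhat, P, z, A=0.9, C=1.0, Q=0.1, R=0.2):
    # Predict through the fixed linear dynamics.
    x_pred = A * xhat
    P_pred = A * P * A + Q
    # Correct with the new observation z (newly shared common information).
    K = P_pred * C / (C * P_pred * C + R)
    xhat_new = x_pred + K * (z - C * x_pred)
    P_new = (1.0 - K * C) * P_pred
    return xhat_new, P_new

# Running the recursion on a few shared observations:
xhat, P = 0.0, 1.0
for z in [0.5, 0.4, 0.45]:
    xhat, P = kalman_step(xhat, P, z)
```

The key structural point matches the theorem: the estimate is a fixed-dimension statistic updated linearly at each step, so the growing common information never has to be stored.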
43 / 46 Application to Graph Problems
A graph describes interconnections as well as communication links among subsystems.
Figure: Directed acyclic graph (DAG): an edge i → j means that subsystem i affects subsystem j through its dynamics and that controller i shares its information with controller j.
Each node represents a linear subsystem with a local controller. LQG problem setup.
44 / 46 Application to Graph Problems contd.
Iterated Application of Common Information Approach
Step 1: First find sufficient statistics for the leaf nodes (nodes 4 and 5) using a person-by-person approach.
44 / 46 Application to Graph Problems contd.
Iterated Application of Common Information Approach
Step 1: First find sufficient statistics for the leaf nodes (nodes 4 and 5) using a person-by-person approach.
Step 2: Use the common information of nodes 3 and 5 to refine the statistics.
44 / 46 Application to Graph Problems contd.
Iterated Application of Common Information Approach
Step 1: First find sufficient statistics for the leaf nodes (nodes 4 and 5) using a person-by-person approach.
Step 2: Use the common information of nodes 3 and 5 to refine the statistics.
Step 3: Use the common information of nodes 2, 3, 4 and 5 to further refine the statistics.
44 / 46 Application to Graph Problems contd.
Iterated Application of Common Information Approach
Step 1: First find sufficient statistics for the leaf nodes (nodes 4 and 5) using a person-by-person approach.
Step 2: Use the common information of nodes 3 and 5 to refine the statistics.
Step 3: Use the common information of nodes 2, 3, 4 and 5 to further refine the statistics.
Continue applying the common information approach on bigger and bigger subgraphs.
45 / 46 Application to Graph Problems contd.
Iterated Application of Common Information Approach
Theorem. There is an optimal linear strategy of the form u_t^i = Σ_{j ∈ ancestors(i)} K_t^{ij} z_t^j, where z_t^j is node j's estimate of its ancestors' and descendants' states.
For example, for node 3: u_t^3 = K_t^{31} z_t^1 + K_t^{32} z_t^2 + K_t^{33} z_t^3, where
z_t^1 is node 1's estimate of the states of nodes 1, 3, 5
z_t^2 is node 2's estimate of the states of nodes 2, 3, 4, 5
z_t^3 is node 3's estimate of the states of nodes 1, 2, 3, 4, 5
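The structure of this control law can be sketched in code. The 5-node DAG below (edges 1→3, 2→3, 2→4, 3→5) is an assumed reconstruction of the talk's figure, and the gains K and estimates z are illustrative placeholders, not computed from any model.

```python
# Control law u_t^i = sum over j in ancestors(i) of K_t^{ij} z_t^j,
# where "ancestors" here includes node i itself (matching the K^{33} z^3
# term in the node-3 example). The DAG is an assumed reconstruction of
# the figure; gains and estimates are placeholders.
EDGES = {1: [3], 2: [3, 4], 3: [5], 4: [], 5: []}

def ancestors(i):
    """All nodes with a directed path to i, plus i itself."""
    anc = {i}
    changed = True
    while changed:
        changed = False
        for u, children in EDGES.items():
            if u not in anc and any(v in anc for v in children):
                anc.add(u)
                changed = True
    return anc

def control(i, K, z):
    """u^i as a gain-weighted sum of ancestor estimates."""
    return sum(K[i, j] * z[j] for j in sorted(ancestors(i)))

# Node 3 has ancestors {1, 2, 3}: u^3 = K[3,1] z[1] + K[3,2] z[2] + K[3,3] z[3]
K = {(3, 1): 1.0, (3, 2): 0.5, (3, 3): 2.0}
z = {1: 1.0, 2: 2.0, 3: 0.5}
u3 = control(3, K, z)
```

With these placeholder numbers, each node's controller only ever needs the estimates held along its ancestor chain, which is the decentralization the theorem asserts.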
46 / 46 Concluding Remarks
The common information approach provides a systematic method for finding sufficient statistics in decentralized stochastic control problems.
It finds sufficient statistics that cannot be computed using person-by-person methods alone.
In delayed sharing information structures, the approach yields results for general values of the delay.
Sufficient statistics for decentralized LQG problems.
46 / 46 Concluding Remarks
The common information approach provides a systematic method for finding sufficient statistics in decentralized stochastic control problems.
It finds sufficient statistics that cannot be computed using person-by-person methods alone.
In delayed sharing information structures, the approach yields results for general values of the delay.
Sufficient statistics for decentralized LQG problems.
Optimization of strategies: in POMDP-like formulations, a dynamic-programming-like decomposition for the coordinator; in LQG formulations, an open-loop control problem for finding the best gain matrices for local information.
46 / 46 Concluding Remarks
The common information approach provides a systematic method for finding sufficient statistics in decentralized stochastic control problems.
It finds sufficient statistics that cannot be computed using person-by-person methods alone.
In delayed sharing information structures, the approach yields results for general values of the delay.
Sufficient statistics for decentralized LQG problems.
Optimization of strategies: in POMDP-like formulations, a dynamic-programming-like decomposition for the coordinator; in LQG formulations, an open-loop control problem for finding the best gain matrices for local information.
Common information and dynamic games: in general, a coordinator cannot be used in a non-cooperative setting. The common information approach allows us to recast a game of asymmetric information as a game of symmetric information. Under some conditions, it provides an extension of Markov perfect equilibrium to asymmetric information games.
Coordinating multiple optimization-based controllers: new opportunities and challenges James B. Rawlings and Brett T. Stewart Department of Chemical and Biological Engineering University of Wisconsin Madison
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationarxiv: v1 [cs.sy] 24 May 2013
Convexity of Decentralized Controller Synthesis Laurent Lessard Sanjay Lall arxiv:35.5859v [cs.sy] 4 May 3 Abstract In decentralized control problems, a standard approach is to specify the set of allowable
More informationStochastic Models, Estimation and Control Peter S. Maybeck Volumes 1, 2 & 3 Tables of Contents
Navtech Part #s Volume 1 #1277 Volume 2 #1278 Volume 3 #1279 3 Volume Set #1280 Stochastic Models, Estimation and Control Peter S. Maybeck Volumes 1, 2 & 3 Tables of Contents Volume 1 Preface Contents
More informationCapacity of the Trapdoor Channel with Feedback
Capacity of the Trapdoor Channel with Feedback Haim Permuter, Paul Cuff, Benjamin Van Roy and Tsachy Weissman Abstract We establish that the feedback capacity of the trapdoor channel is the logarithm of
More informationEncoder Decoder Design for Event-Triggered Feedback Control over Bandlimited Channels
Encoder Decoder Design for Event-Triggered Feedback Control over Bandlimited Channels LEI BAO, MIKAEL SKOGLUND AND KARL HENRIK JOHANSSON IR-EE- 26: Stockholm 26 Signal Processing School of Electrical Engineering
More informationPolitical Economy of Institutions and Development: Problem Set 1. Due Date: Thursday, February 23, in class.
Political Economy of Institutions and Development: 14.773 Problem Set 1 Due Date: Thursday, February 23, in class. Answer Questions 1-3. handed in. The other two questions are for practice and are not
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More information