Decentralized Control of Stochastic Systems

Sanjay Lall, Stanford University
CDC-ECC Workshop, December 11, 2005

2. Decentralized Control
[Block diagram: subsystems G_1, ..., G_5 with outputs y_1, ..., y_5 and inputs u_1, ..., u_5, each connected to its own local controller K_1, ..., K_5.]
The control design problem is to find K which is block diagonal, whose diagonal blocks are 5 separate controllers:

    [u_1]   [ K_1  0    0    0    0   ] [y_1]
    [u_2]   [ 0    K_2  0    0    0   ] [y_2]
    [u_3] = [ 0    0    K_3  0    0   ] [y_3]
    [u_4]   [ 0    0    0    K_4  0   ] [y_4]
    [u_5]   [ 0    0    0    0    K_5 ] [y_5]

3. Example: Communicating Controllers
Often we do not want perfectly decentralized control; we may achieve better performance by allowing controllers to communicate, both to receive measurements from other aircraft and to exchange data with other controllers.
[Block diagram: subsystems G_1, G_2, G_3, with delays D on the links between the controllers K_1, K_2, K_3.]
Here we need to design 3 separate controllers.

4. Research Directions
Spacetime models: infinite and finite spatial extent; synthesis procedures via SDP; decentralization structure via information passing.
Structured synthesis: decentralized structure specified in advance; quadratic invariance conditions on solvability; see the talk by Mike Rotkowitz.
Dynamic programming: stochastic finite-state systems; relaxation approach.

5. Finite-State Stochastic Systems
In this talk we look at dynamics specified by Markov Decision Processes: not linear, but finite state. New semidefinite programming relaxations (with Randy Cogill, Mike Rotkowitz, Ben Van Roy). Many applications: decision problems (e.g., task allocation problems for vehicles) and detection problems (e.g., physical sensing, communication).

6. Example: Medium-Access Control
Two transmitters, each with a queue that can hold up to 3 packets. Let p^a_k be the probability that k-1 packets arrive at queue a:

    p^1 = [ 0.7  0.2  0.05  0.05 ]
    p^2 = [ 0.6  0.3  0.075 0.025 ]

At each time step, each transmitter sees how many packets are in its queue and sends some of them; then new packets arrive. Packets are lost when queues overflow, or when there is a collision, i.e., both transmit at the same time.

7. Example: Medium-Access Control
We would like a control policy for each queue, i.e., a function mapping the number of packets in the queue to the number of packets sent.
One possible policy: transmit all packets in the queue. This causes large packet loss due to collisions.
The other extreme: wait until the queue is full. This causes large packet loss due to overflow.
We'd like to find the policy that minimizes the expected number of packets lost per period.

8. Centralized Control
Each transmitter can see how many packets are in the other queue. In this case, we look for a single policy, mapping the pair of queue occupancies to the pair of transmission lengths.
Decentralized Control
Each transmitter can only see the number of packets in its own queue. In this case, we look for two policies, each mapping queue occupancy to transmission length.

9. Medium Access: Centralized Solution
The optimal policy (each entry is the pair of transmission lengths) is

                           Transmitter 2 state
                            0      1      2      3
    Transmitter 1 state 0  (0,0)  (0,1)  (0,2)  (0,3)
                        1  (1,0)  (1,0)  (0,2)  (0,3)
                        2  (2,0)  (2,0)  (2,0)  (0,3)
                        3  (3,0)  (3,0)  (3,0)  (3,0)

Easy to compute via dynamic programming. Each transmitter needs to know the state of the other; we would like a decentralized control law.

10. Markov Decision Processes
The above medium-access control problem is an example of a Markov Decision Process (MDP): n states and m actions, hence m^n possible centralized policies. However, the centralized problem is solvable by linear programming.
Decentralized control is NP-hard, even with just two policies, and even in the non-dynamic case. We would like efficient algorithms, hence we focus on suboptimal solutions.

11. Static Problems
Called classification or hypothesis testing. E.g., a radar system sends out n pulses and measures y reflections; the cost depends on the number of false positives/negatives.
[Figure: the two conditional distributions prob(y | X_1) and prob(y | X_2) plotted against y.]
p(y | X_1) = probability of receiving y reflections given no aircraft present
p(y | X_2) = probability of receiving y reflections given an aircraft present

12. Static Centralized Formulation
x is the state, with prior pdf Prob(x) = p(x)
y = h(x, w) is the measurement; w is sensor noise
Choosing control action u results in cost C(u, x)
Represent sensing by the transition matrix A(y, x) = Prob(y | x)
Problem: find gamma : Y -> U to minimize the expected cost

    sum_{x,y} p(x) A(y, x) C( gamma(y), x )

13. Static Decentralized Formulation
x is the state, with prior pdf Prob(x) = p(x)
y_i = h_i(x, w_i) is the i-th measurement; w_i is the noise at the i-th sensor
Choosing control actions u_1, ..., u_m results in cost C(u_1, ..., u_m, x)
y = (y_1, ..., y_m) is given by the transition matrix A(y, x) = Prob(y | x)
Problem: find gamma_1, ..., gamma_m to minimize the expected cost

    sum_{x,y} p(x) A(y, x) C( gamma_1(y_1), ..., gamma_m(y_m), x )

14. Change of Variables
Represent the policy by a matrix

    K_yu = 1 if gamma(y) = u, and 0 otherwise

This matrix satisfies K1 = 1, that is, K is a stochastic matrix, and K is Boolean. Then the expected cost is

    sum_{x,y} p_x A_yx C( gamma(y), x ) = sum_{x,y,u} p_x A_yx C_ux K_yu

15. Optimization
Let W_yu = sum_x C_ux A_yx p_x, the expected cost of decision u when y is measured. The problem becomes

    minimize    sum_{y,u} W_yu K_yu
    subject to  K1 = 1
                K_yu in {0, 1} for all y, u

There are nm variables K_yu, with both linear and Boolean constraints, but the problem decouples into one easy subproblem per measurement: pick gamma(y) = argmin_u W_yu.
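
As an illustration, a minimal sketch of this decoupled centralized solution, assuming numpy and a small hypothetical problem instance (the matrices p, A, C below are made up for the example):

    import numpy as np

    # Hypothetical data: p = prior over states, A[y, x] = Prob(y | x),
    # C[u, x] = cost of action u in state x.
    p = np.array([0.6, 0.4])
    A = np.array([[0.8, 0.3],
                  [0.2, 0.7]])
    C = np.array([[0.0, 1.0],
                  [1.0, 0.0]])

    # W[y, u] = sum_x C[u, x] A[y, x] p[x], the expected cost of action u given y.
    W = np.einsum('ux,yx,x->yu', C, A, p)

    # The centralized problem decouples: for each measurement y,
    # pick the action with the smallest expected cost.
    gamma = W.argmin(axis=1)        # gamma[y] = optimal action when y is measured
    opt_cost = W.min(axis=1).sum()
    print(gamma, opt_cost)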

16. Linear Programming Relaxation
Another solution: relax the Boolean constraints.
For estimation: interpret K_yu as the belief in the estimate u.
For control: we have a mixed policy K_yu = Prob(u | y).
The relaxation gives the linear program

    minimize    sum_{y,u} W_yu K_yu
    subject to  K1 = 1
                K >= 0

It is easy to show that this gives the optimal cost, and rounding gives the optimal controller.
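
A sketch of this LP relaxation, assuming scipy; W is any nonnegative cost matrix (the one below is made up). Since the relaxation is tight, rounding each row of K to its largest entry recovers an optimal deterministic policy:

    import numpy as np
    from scipy.optimize import linprog

    W = np.array([[0.2, 0.5],
                  [0.6, 0.1]])
    ny, nu = W.shape

    # Variables: K flattened row-major.  Constraints: each row of K sums to 1, K >= 0.
    A_eq = np.kron(np.eye(ny), np.ones((1, nu)))
    b_eq = np.ones(ny)
    res = linprog(W.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))

    K = res.x.reshape(ny, nu)
    gamma = K.argmax(axis=1)        # rounding: pick the largest entry in each row
    print(K, gamma, res.fun)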

17. Decentralization Constraints
Decentralization imposes constraints on K(y_1, ..., y_m, u_1, ..., u_m).
Measurement constraint: controller i can only see measurement i,

    Prob(u_i | y) = Prob(u_i | y_i)

Independence constraint: for mixed policies, controllers i and j use independent sources of randomness to generate u_i and u_j,

    Prob(u | y) = prod_{i=1}^m Prob(u_i | y_i)

18. Decentralization Constraints
We call a controller K decentralized if

    K(y_1, ..., y_m, u_1, ..., u_m) = K^1(y_1, u_1) ... K^m(y_m, u_m)

for some K^1, ..., K^m. If K is a deterministic controller u = gamma(y), this is equivalent to

    gamma(y_1, y_2) = ( gamma_1(y_1), gamma_2(y_2) )

19. The Decentralized Static Control Problem

    minimize    sum_{y,u} W_{y1 y2 u1 u2} K_{y1 y2 u1 u2}
    subject to  K_{y1 y2 u1 u2} = K^1_{y1 u1} K^2_{y2 u2}
                K^i >= 0
                K^i 1 = 1
                K^i_{ab} in {0, 1} for all a, b

This problem is NP-hard. In addition to linear constraints, we have bilinear constraints.

20. Boolean Constraints
One can show the following facts:
Removing the Boolean constraints does not change the optimal value, i.e., we can simplify the constraint set.
There then exists an optimal solution which is Boolean, i.e., using a mixed policy does not achieve better performance than a deterministic policy.

21. Relaxing the Constraints
Now we would like to solve

    minimize    sum_{y,u} W_{y1 y2 u1 u2} K_{y1 y2 u1 u2}
    subject to  K_{y1 y2 u1 u2} = K^1_{y1 u1} K^2_{y2 u2}
                K^i >= 0
                K^i 1 = 1

This is still NP-hard, because solving it would solve the original problem. Removing the bilinear constraint gives the centralized problem; this is a trivial lower bound on the achievable cost.

22. Valid Constraints
For any decentralized controller, we know

    sum_{u1} K_{y1 y2 u1 u2} = K^2_{y2 u2}

The interpretation is that, given measurements y_1 and y_2, averaging over the decisions of controller 1 does not affect the distribution of the decisions of controller 2. This is easy to see, since K^1 is a stochastic matrix and K_{y1 y2 u1 u2} = K^1_{y1 u1} K^2_{y2 u2}.

23. Valid Constraints
Adding these new constraints leaves the optimization unchanged, but it changes the relaxation: dropping the bilinear constraint now gives a tighter lower bound than the centralized problem.

    minimize    sum_{y,u} W_{y1 y2 u1 u2} K_{y1 y2 u1 u2}
    subject to  K_{y1 y2 u1 u2} = K^1_{y1 u1} K^2_{y2 u2}
                sum_{u1} K_{y1 y2 u1 u2} = K^2_{y2 u2}    (new valid constraints)
                sum_{u2} K_{y1 y2 u1 u2} = K^1_{y1 u1}
                K^i >= 0,  K^i 1 = 1

24. Synthesis Procedure
1. Solve the linear program

    minimize    sum_{y,u} W_{y1 y2 u1 u2} K_{y1 y2 u1 u2}
    subject to  sum_{u1} K_{y1 y2 u1 u2} = K^2_{y2 u2}
                sum_{u2} K_{y1 y2 u1 u2} = K^1_{y1 u1}
                K >= 0,  K^i >= 0,  K^i 1 = 1

   The optimal value gives a lower bound on the achievable cost.
2. If K_{y1 y2 u1 u2} = K^1_{y1 u1} K^2_{y2 u2}, then the solution is optimal. If not, use K^1 and K^2 as approximate solutions.
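
A sketch of this lifted LP for a small two-controller instance, assuming cvxpy (the solver choice and the random cost tensor W are assumptions made for the example, not part of the original talk):

    import numpy as np
    import cvxpy as cp

    # Hypothetical data: W[y1, y2, u1, u2] = expected cost of the action pair
    # (u1, u2) when controller 1 measures y1 and controller 2 measures y2.
    nY1, nY2, nU1, nU2 = 2, 2, 2, 2
    rng = np.random.default_rng(0)
    W = rng.random((nY1, nY2, nU1, nU2))

    # Lifted variable K[y1, y2, u1, u2] (one scalar variable per entry),
    # plus the two local policies K1[y1, u1] and K2[y2, u2].
    K = {(y1, y2, u1, u2): cp.Variable(nonneg=True)
         for y1 in range(nY1) for y2 in range(nY2)
         for u1 in range(nU1) for u2 in range(nU2)}
    K1 = cp.Variable((nY1, nU1), nonneg=True)
    K2 = cp.Variable((nY2, nU2), nonneg=True)

    cons = [cp.sum(K1, axis=1) == 1, cp.sum(K2, axis=1) == 1]
    for y1 in range(nY1):
        for y2 in range(nY2):
            # valid constraints: summing out one controller's action
            for u2 in range(nU2):
                cons.append(sum(K[y1, y2, u1, u2] for u1 in range(nU1)) == K2[y2, u2])
            for u1 in range(nU1):
                cons.append(sum(K[y1, y2, u1, u2] for u2 in range(nU2)) == K1[y1, u1])

    cost = sum(W[i] * K[i] for i in K)
    lower_bound = cp.Problem(cp.Minimize(cost), cons).solve()

    # If K factors as K1[y1,u1] * K2[y2,u2] the bound is achieved; otherwise
    # K1 and K2 (rounded to their largest entries) give a feasible policy.
    g1 = K1.value.argmax(axis=1)
    g2 = K2.value.argmax(axis=1)
    print(lower_bound, g1, g2)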

25. Lifting
Lifting is a general approach for constructing primal relaxations; the idea is:
Introduce new variables Y which are polynomial in x; this embeds the problem in a higher-dimensional space.
Write valid constraints in the new variables.
The feasible set of the original problem is the projection of the lifted feasible set.

26. Example: Minimizing a Polynomial
We'd like to find the minimum of f = sum_{k=0}^6 a_k x^k. Pick new variables Y = g(x), where

    g(x) = [ 1    x    x^2  x^3
             x    x^2  x^3  x^4
             x^2  x^3  x^4  x^5
             x^3  x^4  x^5  x^6 ]

and

    C = [ a_0    a_1/2  a_2/2  a_3/2
          a_1/2  0      0      a_4/2
          a_2/2  0      0      a_5/2
          a_3/2  a_4/2  a_5/2  a_6   ]

Then an equivalent problem is

    minimize    trace CY
    subject to  Y >= 0  (positive semidefinite)
                Y_11 = 1
                Y_24 = Y_33
                Y_22 = Y_13
                Y_14 = Y_23
                Y = g(x)

Dropping the constraint Y = g(x) gives an SDP relaxation of the problem.
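
A sketch of this moment relaxation, assuming cvxpy with an SDP-capable solver; the even split of each coefficient a_k across the anti-diagonal i + j = k is an assumption of this sketch (any split satisfying trace(C Y) = f(x) on moment matrices gives the same bound):

    import numpy as np
    import cvxpy as cp

    # f(x) = x^6 + 4x^2 + 1, as used on the next slide; a[k] = coefficient of x^k.
    a = np.array([1.0, 0, 4, 0, 0, 0, 1])

    # Build C so that trace(C Y) = f(x) whenever Y = g(x) g(x)^T, g(x) = (1, x, x^2, x^3).
    C = np.zeros((4, 4))
    for k in range(7):
        idx = [(i, k - i) for i in range(4) if 0 <= k - i <= 3]
        for (i, j) in idx:
            C[i, j] = a[k] / len(idx)

    Y = cp.Variable((4, 4), PSD=True)
    cons = [Y[0, 0] == 1,          # Y_11 = 1
            Y[1, 1] == Y[0, 2],    # both entries represent x^2
            Y[0, 3] == Y[1, 2],    # both entries represent x^3
            Y[1, 3] == Y[2, 2]]    # both entries represent x^4
    bound = cp.Problem(cp.Minimize(cp.trace(C @ Y)), cons).solve()
    print(bound)   # a lower bound on min_x f(x); here approximately 1, attained at x = 0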

27. The Dual SDP Relaxation
The SDP relaxation has a dual, which is also an SDP. For example, suppose f = x^6 + 4x^2 + 1; then the dual SDP relaxation is

    maximize    t
    subject to  [ 1 - t      0        2 + L_2   L_3
                  0          -2 L_2   -L_3      L_1
                  2 + L_2    -L_3     -2 L_1    0
                  L_3        L_1      0         1   ]  >= 0  (positive semidefinite)

This is exactly the condition that f - t be a sum of squares.
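
A sketch of this dual, assuming cvxpy; instead of the parameterization by L_1, L_2, L_3 above, it imposes the coefficient-matching constraints directly on a Gram matrix Q with f(x) - t = z^T Q z, z = (1, x, x^2, x^3):

    import cvxpy as cp

    t = cp.Variable()
    Q = cp.Variable((4, 4), PSD=True)
    cons = [Q[0, 0] == 1 - t,                    # constant term
            2 * Q[0, 1] == 0,                    # coefficient of x
            2 * Q[0, 2] + Q[1, 1] == 4,          # x^2
            2 * Q[0, 3] + 2 * Q[1, 2] == 0,      # x^3
            2 * Q[1, 3] + Q[2, 2] == 0,          # x^4
            2 * Q[2, 3] == 0,                    # x^5
            Q[3, 3] == 1]                        # x^6
    print(cp.Problem(cp.Maximize(t), cons).solve())
    # approximately 1: f - 1 = x^6 + 4x^2 = (x^3)^2 + (2x)^2 is a sum of squares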

28. Lifting for General Polynomial Programs
When minimizing a polynomial, lifting gives an SDP relaxation whose dual is an SOS condition. When solving a general polynomial program with multiple constraints, there is a similar lifting. This gives an SDP whose feasible set is a relaxation of the feasible set of the original problem. The corresponding dual SDP is a Positivstellensatz refutation; solving the dual certifies a lower bound on the original problem.

29. Example: The Cut Polytope
The feasible set of the MAXCUT problem is

    C = { X in S^n : X = v v^T, v in {-1, 1}^n }

A simple relaxation gives an outer approximation to its convex hull. Here n = 11; the set has affine dimension 55.
[Figure: a two-dimensional projection of the cut polytope and its relaxation.]

30. Example
Suppose the sample space is Omega = {f_1, f_2, f_3, f_4} x {g_1, g_2, g_3, g_4}. The unnormalized probabilities of (f, g) in Omega are given by

          g_1  g_2  g_3  g_4
    f_1    1    6    2    0
    f_2    0    1    2    4
    f_3    6    2    0    1
    f_4    4    0    1    2

Player 1 measures f (i.e., Y_1 is the set of horizontal strips) and would like to estimate g (i.e., X_1 is the set of vertical strips). Player 2 measures g and would like to estimate f.

31. Example
Objective: maximize the expected number of correct estimates.

          g_1  g_2  g_3  g_4
    f_1    1    6    2    0
    f_2    0    1    2    4
    f_3    6    2    0    1
    f_4    4    0    1    2

Optimal decision rules:

    K^1 = [ 0 1 0 0        Y_1:       f_1  f_2  f_3  f_4
            0 0 0 1        X^1_est:   g_2  g_4  g_1  g_1
            1 0 0 0
            1 0 0 0 ]

    K^2 = [ 0 0 1 0        Y_2:       g_1  g_2  g_3  g_4
            1 0 0 0        X^2_est:   f_3  f_1  f_2  f_2
            0 1 0 0
            0 1 0 0 ]

The optimal value is 1.1875. These are simply the maximum a-posteriori probability classifiers.
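
A sketch, assuming numpy, that reproduces the MAP rules and the value 1.1875 from the table above (ties may be broken differently than on the slide):

    import numpy as np

    P = np.array([[1, 6, 2, 0],
                  [0, 1, 2, 4],
                  [6, 2, 0, 1],
                  [4, 0, 1, 2]], dtype=float)
    P /= P.sum()

    # Player 1 sees the row f and estimates the column g; MAP picks the most
    # likely column given the row, and vice versa for player 2.
    est1 = P.argmax(axis=1)          # row -> column estimate
    est2 = P.argmax(axis=0)          # column -> row estimate

    expected_correct = sum(P[f, est1[f]] for f in range(4)) + \
                       sum(P[est2[g], g] for g in range(4))
    print(est1, est2, expected_correct)    # 1.1875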

32. Example
Objective: maximize the probability that both estimates are correct.

          g_1  g_2  g_3  g_4
    f_1    1    6    2    0
    f_2    0    1    2    4
    f_3    6    2    0    1
    f_4    4    0    1    2

Optimal decision rules:

    Y_1:       f_1  f_2  f_3  f_4        Y_2:       g_1  g_2  g_3  g_4
    X^1_est:   g_2  g_4  g_1  g_3        X^2_est:   f_3  f_1  f_4  f_2

The relaxation of the lifted problem is tight. The optimal probability that both estimates are correct is 0.5313. MAP estimates are not optimal; they achieve 0.5.
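
Since deterministic policies suffice, the instance is small enough to verify the value 0.5313 by brute force; a sketch assuming numpy:

    import numpy as np
    from itertools import product

    P = np.array([[1, 6, 2, 0],
                  [0, 1, 2, 4],
                  [6, 2, 0, 1],
                  [4, 0, 1, 2]], dtype=float)
    P /= P.sum()

    def p_both_correct(g1, g2):
        # g1 maps rows to column guesses, g2 maps columns to row guesses
        return sum(P[f, g] for f in range(4) for g in range(4)
                   if g1[f] == g and g2[g] == f)

    best = max((p_both_correct(g1, g2), g1, g2)
               for g1 in product(range(4), repeat=4)
               for g2 in product(range(4), repeat=4))
    print(best[0])   # 0.53125, matching the tight relaxation bound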

33. Example
Objective: maximize the probability that at least one estimate is correct.

          g_1  g_2  g_3  g_4
    f_1    1    6    2    0
    f_2    0    1    2    4
    f_3    6    2    0    1
    f_4    4    0    1    2

The relaxation of the lifted problem is not tight; it gives an upper bound of 0.875. The following decision rules (constructed by projection) achieve 0.8438:

    Y_1:       f_1  f_2  f_3  f_4        Y_2:       g_1  g_2  g_3  g_4
    X^1_est:   g_2  g_4  g_1  g_1        X^2_est:   f_1  f_3  f_1  f_4

MAP estimates achieve 0.6875.

34. Heuristic 1: MAP

          g_1   g_2   g_3   g_4
    f_1   1/28  1/14   0     0
    f_2    0    3/28  1/7    0
    f_3    0     0    5/28  3/14
    f_4    0     0     0    1/4

The optimal probability that both estimates are correct is 4/7, achieved by

    Player 1 sees:     f_1  f_2  f_3  f_4        Player 2 sees:     g_1  g_2  g_3  g_4
    Player 1 guesses:  g_1  g_2  g_3  g_4        Player 2 guesses:  f_1  f_2  f_3  f_4

If both just guess the most likely outcome, the achieved probability is 1/4.

35. Heuristic 2: Person-by-Person Optimality
1. Choose arbitrary initial policies gamma_1 and gamma_2.
2. Hold gamma_2 fixed. Choose the best gamma_1 given gamma_2.
3. Hold the new gamma_1 fixed. Choose the best gamma_2 given this gamma_1.
4. Continue until no more improvement.

               u_1 = 0   u_1 = 1
    u_2 = 0     1000      1001
    u_2 = 1     1001        1

The PBPO value is 1000 times the optimal, almost as bad as the worst policy.
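
A minimal sketch of this coordinate-descent loop on the cost table above, assuming numpy; starting from (u_1, u_2) = (0, 0) it stalls immediately:

    import numpy as np

    # Cost table from the slide; by symmetry the row/column orientation does not matter.
    W = np.array([[1000, 1001],
                  [1001, 1]])

    u1, u2 = 0, 0
    while True:
        new_u1 = int(W[:, u2].argmin())      # best response of player 1 to u2
        new_u2 = int(W[new_u1, :].argmin())  # best response of player 2 to the new u1
        if (new_u1, new_u2) == (u1, u2):
            break
        u1, u2 = new_u1, new_u2

    print(W[u1, u2], W.min())   # stuck at cost 1000, while the optimum is 1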

36. Lift and Project Performance Guarantee
We would like to maximize reward; all rewards are nonnegative. Using the above LP relaxation together with a projection procedure, the computed policies always achieve a reward of at least

    (optimal reward) / min{ |U_1|, |U_2| }

This gives an upper bound, a feasible policy, and a performance guarantee.

37. Markov Decision Processes
We will now consider a Markov Decision Process, where X_i(t) is the event that the system is in state i at time t, and A_j(t) is the event that action j is taken at time t. We assume for simplicity that for every stationary policy the chain is irreducible and aperiodic.

    Transition probabilities:  A_ijk = Prob( X_i(t+1) | X_j(t), A_k(t) )
    Mixed policy:              K_jk  = Prob( X_j(t), A_k(t) )   (stationary state-action probability)
    Cost function:             W_jk  = cost of action k in state j

38. Markov Decision Processes
The centralized solution is the linear program

    minimize    sum_{j,k} W_jk K_jk
    subject to  sum_r K_ir = sum_{j,k} A_ijk K_jk   for all i
                K >= 0
                sum_{j,k} K_jk = 1
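
A sketch of this occupation-measure LP, assuming scipy; the MDP below is a small randomly generated instance, made up for the example:

    import numpy as np
    from scipy.optimize import linprog

    # A[i, j, k] = Prob(next state i | current state j, action k)
    # W[j, k]    = cost of taking action k in state j
    # K[j, k]    = stationary probability of being in state j and taking action k
    n, m = 3, 2
    rng = np.random.default_rng(1)
    A = rng.random((n, n, m)); A /= A.sum(axis=0, keepdims=True)
    W = rng.random((n, m))

    # Stationarity: sum_r K[i, r] = sum_{j,k} A[i, j, k] K[j, k] for every state i,
    # plus the normalization sum_{j,k} K[j, k] = 1.
    A_eq = np.zeros((n + 1, n * m))
    for i in range(n):
        for j in range(n):
            for k in range(m):
                A_eq[i, j * m + k] = A[i, j, k] - (1.0 if i == j else 0.0)
    A_eq[n, :] = 1.0
    b_eq = np.zeros(n + 1); b_eq[n] = 1.0

    res = linprog(W.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    K = res.x.reshape(n, m)
    policy = K.argmax(axis=1)    # deterministic policy read off from K (chain assumed irreducible)
    print(res.fun, policy)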

39. Decentralized Markov Decision Processes
Two sets of states X^p = { X^p_1, ..., X^p_n }
Two transition matrices A^p_ijk = Prob( X^p_i(t+1) | X^p_j(t), A^p_k(t) )
Two controllers K^p_jk = Prob( X^p_j(t), A^p_k(t) )
Cost function W_{j1 j2 k1 k2} = cost of actions k_1, k_2 in states j_1, j_2

40. Decentralized Markov Decision Processes

    minimize    sum_{j1,j2,k1,k2} W_{j1 j2 k1 k2} K_{j1 j2 k1 k2}
    subject to  K_{j1 j2 k1 k2} = K^1_{j1 k1} K^2_{j2 k2}
                sum_r K^p_ir = sum_{j,k} A^p_ijk K^p_jk     (1)
                K^p >= 0                                    (2)
                sum_{j,k} K^p_jk = 1                        (3)

Each of constraints (1)-(3) can be multiplied by K^{3-p} to construct a valid constraint in the lifted variable K. The resulting linear program gives a lower bound on the optimal cost.

41. Exact Solution
If the solution K to the lifted linear program has the form

    K_{j1 j2 k1 k2} = K^1_{j1 k1} K^2_{j2 k2}

then the controller is an optimal decentralized controller. This corresponds to the usual rank conditions, as in e.g. MAXCUT.

Projection
If not, we need to project the solution: K defines a pdf on X^1 x X^2 x U^1 x U^2, and we project by constructing the marginal pdf on X^p x U^p.
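
A sketch, assuming numpy, of this exactness test and projection step; K is taken here as a 4-dimensional array holding the lifted joint pmf over (x_1, x_2, u_1, u_2):

    import numpy as np

    def project(K):
        # marginal pmfs on (x1, u1) and (x2, u2)
        K1 = K.sum(axis=(1, 3))
        K2 = K.sum(axis=(0, 2))
        return K1, K2

    def is_exact(K, tol=1e-8):
        K1, K2 = project(K)
        # exact iff K factors as the product of its marginals
        # (cf. the rank conditions in MAXCUT-style relaxations)
        return np.abs(K - np.einsum('au,bv->abuv', K1, K2)).max() < tol

Each local policy Prob(u_p | x_p) is then obtained by normalizing the corresponding marginal along its action index.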

42. Example: Medium-Access Control
Two transmitters, each with a queue that can hold up to 3 packets. Let p^a_k be the probability that k-1 packets arrive at queue a:

    p^1 = [ 0.7  0.2  0.05  0.05 ]
    p^2 = [ 0.6  0.3  0.075 0.025 ]

At each time step, each transmitter sees how many packets are in its queue and sends some of them; then new packets arrive. Packets are lost when queues overflow, or when there is a collision, i.e., both transmit at the same time.

43. Example: Medium Access
This is a decentralized Markov Decision Process, where each MDP has 4 states (the number of packets in the queue) and 4 actions (transmit 0, 1, 2, or 3 packets). State transitions are determined by the arrival probabilities and the actions. The cost is the total number of packets lost: each queue loses all packets sent if there is a collision, and loses packets due to overflows.

44. Example: Medium Access
Optimal policies for each player are

    queue occupancy:  0  1  2  3        queue occupancy:  0  1  2  3
    number sent:      0  0  2  3        number sent:      0  0  0  3

The expected number of packets lost per period is 0.2202; the policy "always transmit" loses 0.3375 per period.
[Figure: simulation over 10000 time steps for the two queues, showing successful transmissions, collisions, and overflows.]

45. Summary
Decentralized control problems are inherently computationally hard. We have developed efficient algorithms to construct suboptimal solutions, upper bounds on achievable performance, and a guaranteed performance ratio.
Much more to do: topology-independent controllers, algorithms that exploit structure, peer-to-peer techniques, and other applications such as robotics and sensor networks.