A DIFFERENTIAL GAMES APPROACH FOR ANALYSIS OF SPACECRAFT POST-DOCKING OPERATIONS


A DIFFERENTIAL GAMES APPROACH FOR ANALYSIS OF SPACECRAFT POST-DOCKING OPERATIONS

By

TAKASHI HIRAMATSU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

© 2012 Takashi Hiramatsu

I dedicate this to everyone who helped me write this manuscript.

ACKNOWLEDGMENTS

My biggest appreciation goes to my advisor Dr. Norman G. Fitz-Coy for his great help and support. Every time I talked to him he motivated me with critical responses and encouraged me whenever I was stuck in the middle of my research. I also thank my committee, Dr. Warren Dixon, Dr. Gloria Wiens, and Dr. William Hager, for their support. Finally, I thank my colleagues in the Space Systems Group and all other friends who directly or indirectly helped me throughout the years I spent at the University of Florida.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
   Spacecraft Rendezvous and Docking
      Cooperative Scenarios
      Noncooperative Scenarios
   Small Satellites
   Game Theoretic Approach
2 MATHEMATICAL BACKGROUND FOR THE APPROACH
   Differential Games and Control Theory
      Minimax Strategy
      Nash Strategy
      Stackelberg Strategy
      Open-Loop Strategies for Two-Person Linear Quadratic Differential Games
   Numerical Methods to Optimal Control Problem
   Bilevel Programming
3 TECHNICAL DESCRIPTION
   Reduction of Stackelberg Differential Games to Optimal Control
   Conversion of Stackelberg Differential Games to Stackelberg Static Games
   Costate Mapping of Stackelberg Differential Games
   Conclusion
4 DYNAMICS OF DOCKED SPACECRAFT
   Formulation of Dynamics
      Relative Motion Dynamics of a Satellite
         Translation
         Rotation
      Dynamics of Two Docked Satellites
   Simulation
      Case I: Nonzero Linear Velocity
      Case II: Nonzero Rotational Velocity
      Case III: Nonzero Linear and Rotational Velocities
   Conclusion
5 LINEAR CONTROLLER DESIGN WITH STACKELBERG STRATEGY
   Post-Docking Study with Linear Quadratic Game
   Simulation and Results
   Conclusion
6 SOLUTIONS TO TWO-PLAYER LINEAR QUADRATIC STACKELBERG GAMES WITH TIME-VARYING STRUCTURE
   Game Based on Additive Errors
      Open-loop Stackelberg Solution
      Closed-loop Stackelberg Solution
   Game Based on Multiplicative Errors
      Open-loop Stackelberg Solution
      Closed-loop Stackelberg Solution
   Simulations and Results
   Conclusion
7 CONCLUSION AND FUTURE WORKS

APPENDIX
A OPTIMALITY CONDITIONS OF TWO-PERSON STACKELBERG DIFFERENTIAL GAMES
   A.1 Fixed Final Time
      A.1.1 Follower's Strategy
         Variation of the augmented cost functional
         Optimality conditions
      A.1.2 Leader's Strategy
         Variation of the augmented cost functional
         Optimality conditions
   A.2 Free Final Time
      A.2.1 Follower's Strategy
         Variation of the augmented cost functional
         Optimality conditions
      A.2.2 Leader's Strategy
         Variation of the augmented cost functional
         Optimality conditions
   A.3 Linear-Quadratic Differential Game
      A.3.1 Fixed Final Time
      A.3.2 Free Final Time
B RISE STABILITY ANALYSIS
   B.1 RISE Feedback Control Development
   B.2 Stability Analysis
C COSTATE ESTIMATION FOR THE TRANSCRIBED STACKELBERG GAMES
   C.1 Transformed Optimality Conditions
   C.2 Discretization of Two-person Stackelberg Differential Games
   C.3 KKT Conditions and Costate Mapping

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 The simulation parameters for Case I
4-2 The simulation parameters for Case II
4-3 The simulation parameters for Case III
5-1 The simulation parameters for the linear quadratic game
6-1 The simulation parameters for the Stackelberg-RISE controller

LIST OF FIGURES

1-1 A design iteration through satellite post-docking analysis
Relationship among optimization problems
Direct and indirect methods
A representation of the position of a satellite with the inertial and the nominal reference frames
A satellite with a body-fixed reference frame F_i
An exaggerated view of two satellites near the nominal orbit
Two satellites initially on the same nominal orbit
Two satellites initially radially aligned
Case I: the interaction forces applied to the SV and the RSO
Case I: the interaction torques applied to the SV and the RSO
Case I: the linear motion of the RSO relative to the SV
Case I: the rotational motion of the RSO relative to the SV
Case II: the interaction forces applied to the SV and the RSO
Case II: the interaction torques applied to the SV and the RSO
Case II: the linear motion of the RSO relative to the SV
Case II: the rotational motion of the RSO relative to the SV
Case III: the interaction forces applied to the SV and the RSO
Case III: the interaction torques applied to the SV and the RSO
Case III: the linear motion of the RSO relative to the SV
Case III: the rotational motion of the RSO relative to the SV
Two rigid bodies on circular orbits
The resultant trajectory
The control force inputs
The control torque inputs
The relationship among the current and the desired orientations
6-2 Two docked satellites approximated as two rigid bodies connected via a torsion spring
f(t) and g(t) as respective weights on the game and arbitrary disturbances
The simulation results for the Stackelberg and RISE controller

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A DIFFERENTIAL GAMES APPROACH FOR ANALYSIS OF SPACECRAFT POST-DOCKING OPERATIONS

Chair: Norman G. Fitz-Coy
Major: Mechanical Engineering

By Takashi Hiramatsu

August 2012

An increase in responsive space assets will contribute to the growing number of spacecraft on orbit and, in turn, to the growing potential for failures. The number of spacecraft that have passed their operational life also keeps increasing. Without proper treatment these satellites become space debris, which could lead to more failures due to collisions with other spacecraft. Thus, there will be a need for effective debris abatement (i.e., repair and/or disposal of failed satellites), which will require autonomous service satellites. Such a tow-truck concept is expected to play a crucial role in sustainable small-satellite utilization in the future.

Current and past investigations in which autonomous docking plays an important role have all considered cooperative interactions between satellites. That is, either the target has the same goals as the service vehicle, or the target is not actuated and passively follows the lead of the service vehicle. Cooperative scenarios are not always guaranteed, so it is imperative to consider docking scenarios with noncooperative targets. Such noncooperative scenarios involve motion that may be resistible or unpredictable (i.e., the target may have lost its control authority and thus its motion becomes resistible, or it may be adversarial and maneuvering to avoid capture).

Maintaining a post-docked state requires a control system that minimizes the effects of the uncertain interactions due to noncooperative behavior. In the robust control sense, such uncertainty needs to be upper-bounded in order to develop effective control strategies.

For that purpose it is important to characterize these uncertain interactions. One approach to approximating this uncertainty is to model the docked state of the spacecraft as a differential game with each spacecraft as a player, where the interactions are the outcome of the gameplay. Differential games are a class of games that describe a dynamical system driven by multiple control inputs with different objectives. Each input that actively affects the behavior of the system (e.g., control inputs, noise, disturbances, and other external inputs) is called a player of the game, and the players cooperate with or compete against one another to achieve their objectives. This manuscript addresses the characterization of the noncooperative behavior expected in satellite post-docking, and the corresponding control law to maintain the docked state, through modeling and solving a two-person Stackelberg differential game.

CHAPTER 1
INTRODUCTION

This dissertation outlines an approach to estimating the noncooperative behavior of a target spacecraft in the post-docking state, the corresponding interaction between the docked spacecraft, and the control strategy required to maintain that state. In this chapter the background of spacecraft docking and the motivation for a study of noncooperative post-docking are presented.

Sputnik 1, the world's first satellite, was launched in 1957. Since then, more than 6,000 satellites have been launched into space, and currently about 3,600 satellites are operational. The advance of space technologies necessitated transportation of astronauts between spacecraft, construction of space stations, etc., which were carried out through spacecraft rendezvous and docking (R&D). Spacecraft rendezvous and docking play important roles in space utilization [1]. Since the first rendezvous of Gemini VI-A in 1965, rendezvous has been used for tasks including transportation of astronauts between spacecraft and construction of a space station. For example, HTV-1 was docked with the International Space Station (ISS) for resupply [2].

1.1 Spacecraft Rendezvous and Docking

There are many space applications involving docking of spacecraft, satellites, and other objects. Many current and past docking scenarios are of a cooperative nature; before two spacecraft are docked they have to rendezvous, which can be achieved if (i) both spacecraft work together to match their motions, or (ii) one of them is stationary or in constant motion such that the other can adjust to match it. Docking with a spacecraft that is tumbling is considered non-cooperative. Thus manned missions are inherently cooperative, while unmanned missions with autonomous rendezvous and docking could be noncooperative.

1.1.1 Cooperative Scenarios

Servicing operations and return missions assume that the targets cooperate in order to be serviced or to dock. Examples include refueling (Orbital Express [3, 4], ConeXpress [5], HTV-1 (KOUNOTORI) [2], and HTV-2 (KOUNOTORI 2) [6]), towing (the Orbital Maneuvering Vehicle (OMV) [7]), and repairing, such as the servicing missions to repair the Hubble Space Telescope [8-11].

1.1.2 Noncooperative Scenarios

Cooperative scenarios are not guaranteed for all future missions, and the likelihood of docking with a noncooperative target is quite high. For example, the motion of the target may be unpredictable or resistible (i.e., the target motion is not fully under control or favorable). Cook [12] defined a non-cooperative target as any spacecraft that is either not designed for docking or is tumbling freely in space. In the future it may be possible to add adversarial targets, which specifically try to avoid docking and rendezvous, although military use of space resources is prohibited by current space law [13].

There are a few important future applications. Collision avoidance of near-Earth asteroids, for example, has drawn attention [14, 15]. Docking maneuvers are involved when sensors are placed on asteroids to track their motion, or when actuating or explosive devices are attached to change the course of their motion. Another example is space debris. Several recent events contributed to the rapid growth of the debris population: China's ASAT operation in 2007 [16], the destruction of USA-193 in 2008 [17], and the collision between Iridium and COSMOS in 2009 [18]. Liou [19] estimated the propagation of space debris and showed that the debris population keeps growing even if no more spacecraft are launched; existing debris will collide with spacecraft or other debris and increase in number. Therefore, both prevention and removal of space debris are important. One motivating example is the case of USA-193 [17], a reconnaissance satellite that became disabled on orbit and was eventually destroyed by a missile. The operation not only was costly but also generated debris, just like the ASAT operation.

If there existed a technology to safely deal with non-cooperative targets (e.g., a space tow truck to capture them and take them to a graveyard orbit), such problems could be prevented with less damage. Several active debris removal technologies have been proposed, including the Remora Remover™ [20] and the micro remover [21, 22].

1.2 Small Satellites

In the past, mission-specific, large monolithic satellites were developed and served most space applications. Recently, however, small satellites have drawn attention for their short development time, low cost, and versatile applications. Constellations of small satellites are expected to take over some of the tasks that have been handled by traditional satellites. However, having more satellites in space increases the risk of more space debris. Small satellites usually have a longer orbital life than traditional satellites due to higher ballistic coefficients and lower orbits [23]. The long orbital life of small satellites, along with the fact that they become debris at the end of their service, will increase the threat. Furthermore, the more spacecraft there are, the more likely failures become. The utilization of small satellites for practical applications requires many satellites by nature, and thus there will be a need for effective debris abatement (i.e., removal of failed satellites for repair and/or disposal), which requires the ability to work with non-cooperative debris. While several works have addressed non-cooperative docking, non-cooperative post-docking has not drawn much attention.

1.3 Game Theoretic Approach

Recent, current, and future activities necessitate the development of autonomous spacecraft rendezvous and docking technology for target spacecraft with non-cooperative characteristics. Dealing with non-cooperative targets also emphasizes the importance of post-docking maintenance, which requires the design of a control system to minimize the effects of uncertain interactions due to the non-cooperative behavior of the target spacecraft.

In the robust control sense such uncertainty needs to be upper-bounded in order to develop effective control strategies; therefore it is important to characterize the non-cooperative behavior and the corresponding interactions. In order to successfully achieve docking and maintain the docked state between two spacecraft, accurate information about their dynamic behaviors is required so that the corresponding interactions can be controlled. In cooperative docking maneuvers, where two fully functional spacecraft work together to dock with each other, this requirement has already been addressed by several efforts. On the other hand, in non-cooperative docking maneuvers and the corresponding post-docking maneuvers it is difficult to analyze the interaction, because one spacecraft will not act in accordance with the other. In this dissertation a differential-game-theoretic approach is employed to estimate the behavior of the noncooperative target spacecraft in the post-docking situation, the corresponding interactions, and the control strategy required to maintain the docked state.

For a specific case where two satellites with known specifications such as mass, size, and power are considered, a dedicated simulation method such as the Monte Carlo method works well. However, if the general behaviors of arbitrary satellites are to be studied, then it is beneficial to know how each parameter of the satellite design specification affects the post-docking behavior. Characterizing the non-cooperative post-docking behaviors as a function of the design parameters allows different post-docking scenarios to be considered. Figure 1-1 shows an iterative design process made possible with game theory (differential game theory is discussed in more detail in Chapter 2). A differential game problem is formulated such that the tow truck (the service vehicle, SV) and the non-cooperative target (the resident space object, RSO) each choose their actuation commands after docking, thereby affecting the interactions between them. With a set of simulation parameters including the specifications of the satellites, the problem is solved to yield the possible interactions and the control actuations required to achieve them.

That information can be used as feedback to redefine the design specifications of the satellites to be built (e.g., if the interactions are kept small but the required control efforts for the SV are too high, the simulation can be adjusted to weigh more heavily toward lowering the control efforts at the cost of higher interactions).

Figure 1-1. A design iteration through satellite post-docking analysis: the differential game is solved for the interactions and the necessary control inputs, and the satellite models (structural strength, actuator specifications, etc.) are redefined when the interactions or the required control inputs are too large.

The analysis is expected to contribute to the establishment of new technology for future space applications. In Chapter 2 the technical background, including game theory, is described. In Chapter 3 approaches to solving the differential game problem are discussed. In Chapter 4 the dynamics of the satellite post-docking are investigated. The solutions to the particular game-based control design problems are presented in Chapter 5 and Chapter 6.
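As a purely illustrative sketch of the iteration in Figure 1-1, and not part of the original design process, the loop below shows how a game solver could be embedded in such a design study. The names (solve_post_docking_game, DesignSpec), the toy surrogate model, and all numerical thresholds are hypothetical placeholders.

from dataclasses import dataclass, replace

@dataclass
class DesignSpec:
    max_interaction: float    # structural limit on the docking-interface force [N]
    max_control: float        # actuator limit on the SV control force [N]
    control_weight: float     # weight on SV control effort in the game cost

def solve_post_docking_game(spec: DesignSpec):
    """Toy surrogate for the post-docking game solver.
    Returns (peak interaction force, peak SV control force)."""
    # heavier control penalties trade control effort for larger interactions (toy model)
    return 50.0 * spec.control_weight, 10.0 / spec.control_weight

def design_iteration(spec: DesignSpec, max_iters: int = 20) -> DesignSpec:
    for _ in range(max_iters):
        interaction, control = solve_post_docking_game(spec)
        if interaction <= spec.max_interaction and control <= spec.max_control:
            return spec                                   # the design closes
        if control > spec.max_control:
            # penalize SV control effort more, accepting larger interactions
            spec = replace(spec, control_weight=1.5 * spec.control_weight)
        else:
            # interactions too large: require a stronger structure/docking interface
            spec = replace(spec, max_interaction=1.2 * spec.max_interaction)
    return spec

print(design_iteration(DesignSpec(max_interaction=40.0, max_control=8.0, control_weight=0.5)))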

CHAPTER 2
MATHEMATICAL BACKGROUND FOR THE APPROACH

2.1 Differential Games and Control Theory

Game theory is the study of conflict among multiple groups or individuals (players) making decisions in competitive situations [24]. Static games are games in which each player makes its decision simultaneously, without knowledge of the decisions of the others. Dynamic games, or sequential games, are an extension of static games in which either the decisions are made in order (e.g., a two-level game where the player designated as the leader chooses its move first and the other player, the follower, then chooses its move based on the leader's action), or the game is played multiple times and the players learn from the results of past games. Dynamic games may be played with rules described by a set of differential equations (e.g., a pursuit-evasion game played by two aircraft subject to their respective equations of motion). Such games are called differential games [25].

Noncooperative differential game theory has been applied to a variety of control problems [26-39]. While zero-sum differential games have been heavily exploited in nonlinear H∞ control theory, nonzero-sum differential games have had limited application in feedback control. In particular, Stackelberg differential games, which are based on a hierarchical relationship between the players, have been utilized in decentralized control systems [30], hierarchical control problems [28, 29, 37], and nonclassical control problems [31]. Differential games, as well as optimal control, are difficult tools to apply because of the challenges associated with determining analytical solutions, with a few exceptions such as the linear quadratic structure. One way to combine optimal control and differential game structures is to formulate a control law composed of terms that feedback linearize the system and additional terms that optimize the residual system. For example, optimal controllers have been developed via feedback linearization under an exact-model-knowledge assumption [40] and via neural networks [41-43].

In [44] an open-loop Stackelberg game-based controller is developed based on the Robust Integral of the Sign of the Error (RISE) [45-47] technique.

In order to design an SV for a space operation that must deal with non-cooperative interactions, it is necessary to study the dynamic behaviors of the vehicles as well as a proper control architecture. Controlling individual satellites that interfere with one another requires game-theoretic consideration. Multiobjective optimization problems, with objectives that may conflict with one another, have been studied in the framework of game theory. The architecture of differential games was first developed by Isaacs [25] and has been applied to various engineering applications. A game-theoretic approach to controller design can handle the optimal control of multiple objects with conflicting objectives; even when the motion of the RSO is unknown and therefore non-cooperative, it may still be possible to analyze the interaction between the SV and the RSO, i.e., how much force is applied to the vehicles or needs to be applied.

The simplest form of a two-person differential game is defined as follows. The system is given by a differential equation

\dot{x} = f(x, u_1, u_2, t),  x(t_0) = x_0                                   (2-1)

with two independent control inputs u_1 and u_2. By convention each control input is assigned to a player, such that u_1 is designed by Player 1 and u_2 is designed by Player 2. Each player chooses its control strategy in such a way that it minimizes the corresponding cost functional

J_1(u_1, u_2) = \Phi_1(t_0, t_f, x_0, x_f) + \int_{t_0}^{t_f} L_1(x, u_1, u_2, t)\, dt    (2-2)

J_2(u_1, u_2) = \Phi_2(t_0, t_f, x_0, x_f) + \int_{t_0}^{t_f} L_2(x, u_1, u_2, t)\, dt    (2-3)
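The elements of Eqs. (2-1)-(2-3) can be collected in a small container; the sketch below does so for a scalar example game. The class, its field names, and the example dynamics and costs are illustrative assumptions, not taken from the dissertation.

from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class TwoPlayerDifferentialGame:
    f: Callable      # system dynamics f(x, u1, u2, t), Eq. (2-1)
    L1: Callable     # Player 1 running cost, integrand of Eq. (2-2)
    L2: Callable     # Player 2 running cost, integrand of Eq. (2-3)
    Phi1: Callable   # Player 1 terminal cost
    Phi2: Callable   # Player 2 terminal cost
    x0: np.ndarray   # initial state x(t0)
    t0: float
    tf: float

# scalar example: the players push the state in opposite directions and have
# opposing terminal objectives (Player 2 wants |x(tf)| large, Player 1 small)
game = TwoPlayerDifferentialGame(
    f=lambda x, u1, u2, t: u1 - u2,
    L1=lambda x, u1, u2, t: 0.5 * u1**2,
    L2=lambda x, u1, u2, t: 0.5 * u2**2,
    Phi1=lambda xf: 0.5 * xf**2,
    Phi2=lambda xf: -0.5 * xf**2,
    x0=np.array([1.0]),
    t0=0.0,
    tf=1.0,
)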

Unlike optimal control problems, this pair of problems is not by itself well-posed. In an optimal control problem the optimal solution, if it exists, guarantees that the cost is minimized while all the constraints are satisfied, whereas in a two-person differential game the minimum costs of Players 1 and 2 cannot, in general, be attained simultaneously; minimization of J_1 often interferes with minimization of J_2 and vice versa. In order to solve for u_1 and u_2, a strategy defining the nature of the equilibrium solution, in other words how the game is played, needs to be imposed. For modeling the non-cooperative interactions, the Minimax, Nash, and Stackelberg strategies are considered.

2.1.1 Minimax Strategy

In the Minimax strategy, by assuming the worst case and trying to minimize the damage, one obtains the safest solution. Minimax considers a zero-sum game (the costs of all players add up to zero) such that

J_1 = -J_2 = J                                                               (2-4)

therefore each control strategy is expressed as

u_1 = \arg\min_{u_1} J_1 = \arg\min_{u_1} J
u_2 = \arg\min_{u_2} J_2 = \arg\max_{u_2} J

and therefore, if the solution exists, it is a saddle-point solution

u_1 = \arg\min_{u_1} \max_{u_2} J

Minimax strategies are for zero-sum differential games, but even when a game is nonzero-sum it is useful to consider Minimax for noncooperative cases, because Player 1 may not know the objective of Player 2; if J_2 is unknown, then by assuming J_2 = -J_1 Player 1 is able to estimate the possible interaction with Player 2.
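A minimal numerical sketch of the saddle-point idea, using an arbitrary static payoff rather than an example from the dissertation: for a zero-sum payoff J(u_1, u_2) with Player 1 minimizing and Player 2 maximizing, the upper value min over u_1 of the max over u_2 and the lower value max over u_2 of the min over u_1 coincide at a saddle point.

import numpy as np

# arbitrary static zero-sum payoff: Player 1 minimizes J, Player 2 maximizes J
J = lambda u1, u2: u1**2 - u2**2 + 0.5 * u1 * u2

u = np.linspace(-2.0, 2.0, 401)
U1, U2 = np.meshgrid(u, u, indexing="ij")     # axis 0 = u1, axis 1 = u2
Jgrid = J(U1, U2)

upper = Jgrid.max(axis=1).min()               # min_{u1} max_{u2} J
lower = Jgrid.min(axis=0).max()               # max_{u2} min_{u1} J
print(upper, lower)                           # both ~0; the saddle point is u1 = u2 = 0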

2.1.2 Nash Strategy

In the Nash strategy each player tries to optimize its own objective without regard for the other; this can, however, result in an equilibrium solution that is not optimal for either individual player (e.g., the Prisoner's Dilemma). In a two-person Nash game each player tries to minimize its cost simultaneously, knowing that the other player does the same. Therefore, given the costs J_1(u_1, u_2) and J_2(u_1, u_2), the Nash strategy {u_{1n}, u_{2n}} should satisfy

J_1(u_{1n}, u_{2n}) \le J_1(u_1, u_{2n})
J_2(u_{1n}, u_{2n}) \le J_2(u_{1n}, u_2)

that is, if one player unilaterally changes its strategy, its cost increases. For zero-sum games the Nash solution therefore coincides with the Minimax solution. For nonzero-sum games the necessary conditions for the existence of a Nash solution were developed in [48]. In general there may be more than one Nash equilibrium, or there may exist no solution at all. For a special structure of the game, the linear quadratic differential game, in which the cost functionals are quadratic in the state and the control and the constraint is a differential equation linear in the state and the control, stronger existence and uniqueness results are available.

2.1.3 Stackelberg Strategy

The Stackelberg strategy assumes that one player (the leader) has an advantage over the other (the follower) in minimizing its cost. In the Stackelberg strategy the leader can enforce its action, and the resulting equilibrium solution is always favorable to the leader. With Player 2 as the leader, the costs associated with the Stackelberg strategy satisfy

J_1(u_{1s}, u_{2s}) \le J_1(u_1, u_{2s})                                     (2-5)
J_2(u_{1s}, u_{2s}) \le J_2(u_1, u_2)                                        (2-6)

where the subscript s denotes the Stackelberg strategy. Equation (2-5) shows that the follower (Player 1) is better off playing the Stackelberg strategy when it knows that the leader (Player 2) plays the Stackelberg strategy.

The Stackelberg solution is always better than or equal to the Nash solution for the leader [49], suggesting that the Stackelberg strategy can characterize both hierarchical and non-hierarchical cases:

J_2(u_{1s}, u_{2s}) \le J_2(u_{1n}, u_{2n})                                  (2-7)

In Stackelberg games the leader chooses its strategy first. It must be noted, however, that the order in which strategies are chosen does not necessarily mean that one player physically acts before the other. The follower then chooses its strategy such that it minimizes its own cost given the leader's strategy. Thus, to the follower, the game is merely an optimization problem in which it minimizes its cost for whatever strategy the leader provides. The optimal control input of the follower is defined by

u_1^*(u_2) = \arg\min_{u_1} J_1(u_1, u_2)                                    (2-8)

that is, the follower's control input is optimal with respect to an arbitrary input of the leader. When the leader plays the Stackelberg strategy, it assumes that the follower also plays the Stackelberg strategy, i.e., that the follower's decision is based on Eq. (2-8). Therefore the leader chooses its strategy by solving the optimal control problem given u_1^*:

u_{2s} = \arg\min_{u_2} J_2(u_1^*(u_2), u_2)                                 (2-9)

Once the Stackelberg solution of the leader is obtained, the Stackelberg solution of the follower is found as

u_{1s} = u_1^*(u_{2s}) = \arg\min_{u_1} J_1(u_1, u_{2s})                     (2-10)

which is based on the follower's assumption that the leader knows the follower follows the leader, and that the leader optimizes its objective accordingly. If Eq. (2-8) can be solved analytically as a function of u_2, then Eq. (2-10) is determined automatically once Eq. (2-9) is solved, allowing u_{1s} and u_{2s} to be found sequentially and separately rather than simultaneously.
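The sequential computation of Eqs. (2-8)-(2-10) can be made concrete with a small static example; the quadratic costs below are illustrative and are not taken from the dissertation. The follower's reaction is computed for an arbitrary leader input, substituted into the leader's cost, and the two strategies then follow one after the other.

import sympy as sp

u1, u2 = sp.symbols("u1 u2", real=True)

# illustrative quadratic costs; Player 2 is the leader
J1 = (u1 - u2)**2 + u1**2            # follower's cost
J2 = (u2 - 1)**2 + 2 * u1**2         # leader's cost

# Eq. (2-8): follower's optimal reaction to an arbitrary leader input
u1_star = sp.solve(sp.diff(J1, u1), u1)[0]          # -> u2/2

# Eq. (2-9): leader minimizes its cost along the follower's reaction curve
u2_s = sp.solve(sp.diff(J2.subs(u1, u1_star), u2), u2)[0]

# Eq. (2-10): the follower's Stackelberg strategy follows by substitution
u1_s = u1_star.subs(u2, u2_s)
print(u2_s, u1_s)                    # 2/3 and 1/3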

The concept was first applied to differential games in [50], and solvability conditions were developed later in [51, 52]. As with the Nash strategy, uniqueness of the solution is not guaranteed in general, except for linear quadratic cases.

2.1.4 Open-Loop Strategies for Two-Person Linear Quadratic Differential Games

The solutions of two-person differential games are found by solving the optimality conditions obtained from the calculus of variations,¹ but the resulting boundary-value problems are so complicated that in many cases they do not admit analytical solutions. Analytical solutions do exist, however, for simple problems. Two-person linear quadratic (LQ) differential games are a class of differential games in which the dynamic constraint of Eq. (2-1) is a set of linear differential equations

\dot{x} = Ax + B_1 u_1 + B_2 u_2,  x(t_0) = x_0                              (2-11)

and the cost functionals of Eqs. (2-2)-(2-3) take the quadratic form

J_1(u_1, u_2) = \frac{1}{2} x_f^T K_{1f} x_f + \frac{1}{2} \int_{t_0}^{t_f} (x^T Q_1 x + u_1^T R_{11} u_1 + u_2^T R_{12} u_2)\, dt    (2-12)

J_2(u_1, u_2) = \frac{1}{2} x_f^T K_{2f} x_f + \frac{1}{2} \int_{t_0}^{t_f} (x^T Q_2 x + u_1^T R_{21} u_1 + u_2^T R_{22} u_2)\, dt    (2-13)

In this case the control strategies are also linear in the state x,

u_1 = -R_{11}^{-1} B_1^T K_1 x                                               (2-14)
u_2 = -R_{22}^{-1} B_2^T K_2 x                                               (2-15)

where K_1 and K_2 are the solutions of Riccati differential equations, which are associated with the optimality conditions of the LQ differential game and vary with the strategy played.

¹ The derivation of the optimality conditions is provided in Appendix A.

For Minimax strategies,

\dot{K}_1 = -K_1 A - A^T K_1 - Q_1 + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{12}^{-1} B_2^T K_1,  K_1(t_f) = K_{1f}    (2-16)

\dot{K}_2 = -K_2 A - A^T K_2 - Q_2 + K_2 B_1 R_{21}^{-1} B_1^T K_2 + K_2 B_2 R_{22}^{-1} B_2^T K_2,  K_2(t_f) = K_{2f}    (2-17)

For Nash strategies,

\dot{K}_1 = -K_1 A - A^T K_1 - Q_1 + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{22}^{-1} B_2^T K_2,  K_1(t_f) = K_{1f}

\dot{K}_2 = -K_2 A - A^T K_2 - Q_2 + K_2 B_2 R_{22}^{-1} B_2^T K_2 + K_2 B_1 R_{11}^{-1} B_1^T K_1,  K_2(t_f) = K_{2f}

For Stackelberg strategies there is an additional matrix differential equation that must be solved:

\dot{K}_1 = -K_1 A + K_1 B_1 R_{11}^{-1} B_1^T K_1 + K_1 B_2 R_{22}^{-1} B_2^T K_2 - A^T K_1 - Q_1,  K_1(t_f) = K_{1f}

\dot{K}_2 = -K_2 A + K_2 B_1 R_{11}^{-1} B_1^T K_1 + K_2 B_2 R_{22}^{-1} B_2^T K_2 - A^T K_2 - Q_2 + Q_1 P,  K_2(t_f) = K_{2f}    (2-18)

\dot{P} = -P A + P B_1 R_{11}^{-1} B_1^T K_1 + P B_2 R_{22}^{-1} B_2^T K_2 + A P - B_1 R_{11}^{-1} R_{21}^T R_{11}^{-1} B_1^T K_1 + B_1 R_{11}^{-1} B_1^T K_2,  P(t_f) = 0
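These coupled matrix equations are integrated backward in time from their terminal conditions. As an illustration of how that is done numerically, the sketch below integrates the coupled Riccati equations of the scalar open-loop Nash game with SciPy; the numerical values are arbitrary, and the Stackelberg set adds the P equation but is handled in exactly the same way.

import numpy as np
from scipy.integrate import solve_ivp

# scalar open-loop Nash LQ game data (illustrative values only)
A, B1, B2 = 0.5, 1.0, 1.0
Q1, Q2 = 1.0, 2.0
R11, R22 = 1.0, 4.0
K1f, K2f, tf = 1.0, 1.0, 2.0

def riccati_rhs(t, K):
    K1, K2 = K
    S1, S2 = B1**2 / R11, B2**2 / R22
    dK1 = -K1 * A - A * K1 - Q1 + K1 * S1 * K1 + K1 * S2 * K2
    dK2 = -K2 * A - A * K2 - Q2 + K2 * S2 * K2 + K2 * S1 * K1
    return [dK1, dK2]

# integrate backward from t_f to t_0 = 0, starting from the terminal conditions
sol = solve_ivp(riccati_rhs, (tf, 0.0), [K1f, K2f], max_step=0.01)
K1_0, K2_0 = sol.y[:, -1]
print(K1_0, K2_0)   # feedback gains at t = 0: u_i(0) = -R_ii^{-1} B_i K_i(0) x(0)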

2.2 Numerical Methods for the Optimal Control Problem

An optimal control problem is defined as follows: find u that minimizes the cost functional

J = \Phi(x(t_0), x(t_f), t_0, t_f) + \phi(x, t) + \int_{t_0}^{t_f} L(x, u, t)\, dt    (2-19)

where \Phi is the terminal constraint and \phi is the path constraint, subject to the dynamic constraint

\dot{x} = f(x, u, t),  x(t_0) = x_0                                          (2-20)

Optimal control can be seen as a one-player differential game, and it therefore shares the same difficulties in finding solutions. A classical approach, the indirect method, is to use the calculus of variations to construct a set of differential equations whose solution is the optimal control strategy. The resulting set of differential equations is a boundary-value problem, and its solution is rarely analytical. Numerically solving a boundary-value problem is often difficult due to the need for an initial guess and the small radius of convergence.

In optimal control, the direct method transcribes the problem into a parameter optimization problem (nonlinear programming). The states and control inputs are discretized and approximated by interpolating polynomial functions, and the cost functional is evaluated by numerical integration. As discussed in [53], there are many ways to perform the transcription, but an optimal control problem and the nonlinear programming problem obtained from it are essentially two different problems, and it is important to verify that the solution of the converted nonlinear programming problem is indeed the solution of the original optimal control problem. This can be checked by comparing the KKT multipliers of the transcribed NLP problem with the costates obtained from the indirect method. The pseudospectral method, also known as the orthogonal collocation method [54], converts an optimal control problem to a nonlinear programming problem by approximating the dynamics with the derivatives of orthogonal interpolating polynomials and the integral part of the cost with Gauss quadrature. For example, the Legendre pseudospectral method (LPM) [55] uses Lagrange interpolation at the Lobatto collocation points, which are the roots of the derivative of the Legendre polynomial, together with the corresponding Lobatto weights to evaluate the numerical integration. It was shown in [56] that the costate approximation with LPM is exact at every collocation point.
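The LGL nodes and weights used by such schemes are easy to generate. The sketch below is an illustration using NumPy's Legendre utilities and the standard LGL weight formula w_i = 2 / [N(N+1) P_N(x_i)^2]; it reproduces the N = 2 values that appear in the example of Section 3.3 and checks the quadrature on a simple integrand.

import numpy as np
from numpy.polynomial import legendre as L

def lgl_points_weights(N):
    """Legendre-Gauss-Lobatto points and quadrature weights on [-1, 1]
    (N + 1 nodes for a degree-N interpolant)."""
    PN = L.Legendre.basis(N)
    interior = PN.deriv().roots()                    # roots of P_N'
    x = np.concatenate(([-1.0], np.sort(interior), [1.0]))
    w = 2.0 / (N * (N + 1) * PN(x) ** 2)             # standard LGL weight formula
    return x, w

x, w = lgl_points_weights(2)
print(x, w)                                          # [-1, 0, 1] and [1/3, 4/3, 1/3]

# Gauss-Lobatto quadrature with N + 1 nodes is exact for polynomials of degree <= 2N - 1
print(np.dot(w, x**2), 2.0 / 3.0)                    # both 2/3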

2.3 Bilevel Programming

Bilevel programming is a class of problems in which two parameter optimization problems are arranged so that some of the constraints of one problem are defined by the solution of the other problem. The relationship between Stackelberg differential games and bilevel programming is analogous to that between optimal control and nonlinear programming. The coupling of multiple optimization problems defines a game, and the hierarchical nature relates Stackelberg games, in particular, to optimistic bilevel programming problems [57]. Unlike single-player parameter optimization problems (e.g., nonlinear programming), there are no well-established techniques for solving bilevel programming problems, due to their complexity. An efficient algorithm for solving linear bilevel programming problems is given in [58]; for nonlinear cases, however, previous works mainly focus on specific problems [59-61]. In order to fully investigate complex nonlinear Stackelberg differential games, it is necessary to have a well-established numerical method. In Section 3.2 an orthogonal collocation approach is shown that transcribes a two-player Stackelberg differential game problem to a 2N-player Stackelberg static game, and in Section 3.3 an example problem is presented. However, this dissertation does not explore bilevel programming further and instead focuses on designing game-theoretic controllers for spacecraft post-docking.
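A toy optimistic bilevel program illustrates the structure; the problem data are arbitrary and not from the dissertation. Because the follower's constrained reaction is available in closed form here, the leader's problem reduces to a one-dimensional search over its own variable.

import numpy as np

# follower (lower level): for a given leader decision y, solve
#     min_x (x - y)^2   subject to   x >= 0.5
def follower_reaction(y):
    return max(y, 0.5)

# leader (upper level): minimize F(x*(y), y), anticipating the follower's reaction
def leader_cost(y):
    x = follower_reaction(y)
    return (x - 1.0)**2 + 0.1 * y**2

ys = np.linspace(0.0, 2.0, 2001)
costs = np.array([leader_cost(y) for y in ys])
y_star = ys[np.argmin(costs)]
print(y_star, follower_reaction(y_star))   # approximately 0.909 and 0.909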

CHAPTER 3
TECHNICAL DESCRIPTION

In this chapter several approaches to solving two-player Stackelberg differential games are discussed. One of them transforms the problem into a static game problem by discretization. The static games considered here are multi-objective optimization problems, which are similar to differential games except that the differential constraints and the cost functionals are replaced by static constraints and cost functions, respectively.

The solution of an optimal control problem is obtained as follows [62]: first, the differential constraints are augmented to the cost functional to form the Hamiltonian, and using the calculus of variations the variations of the Hamiltonian with respect to each of the state and control variables are obtained. These variations yield differential equations called the optimality conditions. The optimality conditions consist of the dynamics of the states (i.e., the system dynamics, or the original differential constraints) and of the costates (i.e., the Lagrange multipliers used to augment the differential constraints to the cost); combined, this set of differential equations is a boundary-value problem. Whether a differential game problem admits an analytical solution depends on the existence of an analytical solution to the boundary-value problem defined by the optimality conditions. Since differential equations do not have analytical solutions in many cases [63], the solution of differential games via the calculus of variations (the indirect method), in which the solution is obtained by solving the differential equations satisfying the optimality conditions, is of limited use; it is difficult to ensure the existence of a solution, especially when the problem is nonlinear, and even when existence is ensured, it is still difficult to solve the boundary-value problem. Although optimal control in general suffers from the same problems as differential games, there is a class of numerical methods that transcribe the optimal control problem into a parameter optimization problem using direct collocation, which can be solved with nonlinear programming.

This so-called direct method guarantees the existence of a solution in exchange for a possible loss of optimality.

Three candidate approaches for solving differential games are considered here. First, the indirect method is used when the problem is simple and takes a form for which the analytical solution is well developed. Second, in some cases a two-person differential game can be converted to an optimal control problem and solved with direct methods. The third choice is inspired by direct methods for optimal control: transcribing differential games to static games using orthogonal collocation (the pseudospectral method).

Since optimal control can be considered a single-player game, there is a relationship between Stackelberg games and optimal control, as shown in Figure 3-1. Stackelberg differential games have a similar structure to Stackelberg static games, as optimal control does to nonlinear programming. A Stackelberg differential game problem can be reduced to an optimal control problem by including the optimality conditions of the follower, and the solution of that optimal control problem provides the solution of the Stackelberg leader. However, it does not provide the solution of the follower; that must be computed separately from the solution of the optimal control problem, and it is difficult to argue that the solutions of the leader and the follower then have the same level of accuracy. The same observation applies to the transition from a bilevel programming problem to a nonlinear programming problem (hence those arrows are drawn with dotted lines in Figure 3-1). Figure 3-2 shows more general relationships among optimization problems. Not all differential games can be reduced to optimal control problems, so the indirect approach to solving differential games does not necessarily go through optimal control. Likewise, not all static games can be reduced to nonlinear programming. Differential games and optimal control can be solved indirectly via the calculus of variations.

Figure 3-1. Stackelberg games are hierarchical optimizations and can be reduced to single optimization problems (Stackelberg differential game to optimal control, bilevel programming to nonlinear programming). Differential games and optimal control can be converted through discretization (direct transcription) to static games and nonlinear programming, respectively.

If successfully transcribed, it is possible to solve optimal control problems and differential game problems as parameter optimization problems, namely nonlinear programming and bilevel programming. Although discretizing differential game problems into static ones has not been as popular as in the optimal control case, researchers have studied this discretization for pursuit-evasion (zero-sum) games. Ehtamo [64] performed both discretization of the optimality conditions to nonlinear programming and direct conversion to bilevel programming, and showed that they lead to the same solution. Horie [65] converted the optimality conditions and solved two coupled nonlinear programming problems combined with a genetic algorithm. Still, to the author's knowledge, no attempt has been made to apply the pseudospectral method to nonzero-sum differential games. These prior efforts inspire an approach to solving the nonzero-sum Stackelberg differential games of post-docked satellites by transcribing them to nonlinear programming and static games (bilevel programming). Due to the structure of Stackelberg differential games, building connections among differential/static games, optimal control, and nonlinear programming should be possible.

Figure 3-2. Differential games and optimal control problems can be solved by indirect methods using the calculus of variations (solving a boundary-value problem), or by direct methods through direct transcription to static games and nonlinear programming (parameter optimization).

Although the effort here is directed only toward Stackelberg games, it could be extended to Nash games in the future.

3.1 Reduction of Stackelberg Differential Games to Optimal Control

Two-person differential games can be posed as two coupled optimal control problems. Let the controls of Player 1 and Player 2 be u_1 and u_2, respectively. Then u_1 solves the optimal control problem

Minimize    J_1 = \Phi_1(x_0, x_f, t_0, t_f) + \int_{t_0}^{t_f} L_1(x, u_1, u_2, t)\, dt
subject to  \dot{x} = f(x, u_1, u_2, t),  x(t_0) = x_0                       (3-1)

and u_2 solves another optimal control problem

Minimize    J_2 = \Phi_2(x_0, x_f, t_0, t_f) + \int_{t_0}^{t_f} L_2(x, u_1, u_2, t)\, dt
subject to  \dot{x} = f(x, u_1, u_2, t),  x(t_0) = x_0                       (3-2)

Suppose u_1 and u_2 are, respectively, the follower and the leader. When a Stackelberg strategy is played, a two-player nonzero-sum differential game is solved as follows [66]: first, the differential constraints are augmented to the follower's cost functional to form the follower's Hamiltonian, from which the optimality conditions of the follower are obtained; the game can then be converted to the optimal control problem of the leader, which can be solved numerically using a collocation method. One characteristic of the Stackelberg strategy in a two-person differential game is that the follower always acts optimally with respect to the leader; therefore, if the leader's control strategy is given, the follower u_1 solves a tracking problem defined by Eq. (3-1), with u_2 treated as a prescribed function of time. The knowledge that the follower tracks the leader can be used as additional constraints in solving for the leader's strategy, as follows. Let the Hamiltonian of the follower be

H_1 = L_1(x, u_1, u_2, t) + \lambda_1^T f(x, u_1, u_2, t)

Then the follower's optimality conditions are given by

\left( \frac{\partial H_1}{\partial u_1} \right)^T = 0                       (3-3)

\dot{\lambda}_1^T = -\frac{\partial H_1}{\partial x},  \lambda_1^T(t_f) = \frac{\partial \Phi_1}{\partial x(t_f)}    (3-4)

Equation (3-3) relates the follower's control u_1 and the costate \lambda_1.

If u_1 can be expressed explicitly in terms of x, \lambda_1, u_2, and t as

u_1 = u_1(x, \lambda_1, u_2, t)                                              (3-5)

then, by replacing u_1 with Eq. (3-5) and appending Eq. (3-4), the two-player differential game problem is reduced to the optimal control problem of the leader u_2:

Minimize    J_2 = \Phi_2(x_0, x_f, t_0, t_f) + \int_{t_0}^{t_f} L_2(x, u_1(x, \lambda_1, u_2, t), u_2, t)\, dt
subject to  \dot{x} = f(x, \lambda_1, u_2, t),  x(t_0) = x_0                 (3-6)
            \dot{\lambda}_1^T = -\frac{\partial H_1}{\partial x},  \lambda_1^T(t_f) = \frac{\partial \Phi_1}{\partial x(t_f)}

Once the optimal control problem of Eq. (3-6) is defined, it can be solved using the existing numerical methods discussed in Section 2.2. There are two issues with this conversion: (i) the follower's control strategy is restricted to be continuous, while the leader can admit discontinuous control inputs; and (ii) the direct method of optimal control may sacrifice optimality for existence of the solution, but due to the conversion only the leader's optimality is sacrificed (even though the Stackelberg leader is supposed to be better at optimizing its cost than the follower). Therefore, except for the case in which Eq. (3-6) can be solved analytically, it is more reasonable to maintain the game structure by transcribing the differential game problem into a static game problem.
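As a short symbolic sketch of this reduction, using the scalar pursuit-evasion game that reappears in Section 3.3 as the follower's problem (the SymPy variable names are this sketch's own), the follower's stationarity condition (3-3) is solved for u_1 and substituted back, leaving the augmented dynamics of the leader's problem (3-6):

import sympy as sp

x, lam1, u1, u2 = sp.symbols("x lambda1 u1 u2", real=True)
cp = sp.symbols("c_p", positive=True)

# follower's data for the scalar game of Section 3.3: xdot = u1 - u2 and
# running cost L1 = u1**2 / (2*c_p) (the terminal cost does not enter H1)
f = u1 - u2
L1 = u1**2 / (2 * cp)
H1 = L1 + lam1 * f                              # follower's Hamiltonian

# Eq. (3-3): stationarity of H1 in u1 gives u1 = u1(x, lambda1, u2, t), i.e. Eq. (3-5)
u1_expr = sp.solve(sp.diff(H1, u1), u1)[0]      # -> -c_p*lambda1

# Eq. (3-4): the follower's costate dynamics (zero here, since H1 does not depend on x)
lam1_dot = -sp.diff(H1, x)

# substituting u1_expr into f gives the augmented dynamics of the leader's
# optimal control problem, Eq. (3-6): xdot = -c_p*lambda1 - u2, lambda1dot = 0
print(u1_expr, sp.simplify(f.subs(u1, u1_expr)), lam1_dot)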

3.2 Conversion of Stackelberg Differential Games to Stackelberg Static Games

In Section 2.2 it was shown that an optimal control problem can be converted to a parameter optimization problem by discretizing the time domain, approximating the states with interpolating functions, and numerically integrating the cost functionals. Differential game problems have the same structure as optimal control problems, as both involve optimization over time subject to dynamic constraints. Thus the same method of transcription can be employed to convert differential game and optimal control problems to static game and nonlinear programming problems, respectively. In this section, in the same manner that an optimal control problem is transcribed to a nonlinear programming problem, a two-person Stackelberg differential game is transcribed to a Stackelberg static programming problem. As discussed in Section 2.3, Stackelberg static games are a subset of bilevel programming problems.

Transcription with the LGL Collocation. In this section a general transcription formulation is developed for two-person Stackelberg differential games using the Legendre-Gauss-Lobatto (LGL) collocation points. The main idea of the numerical approach with collocation points is to approximate x, u_1, and u_2 as polynomials constructed from a finite set of data points, and to find their coefficients such that the approximated functions satisfy the original game problem at each collocation point [54]. The LGL points are defined as -1, 1, and the roots of the derivative of the Nth-order Legendre polynomial on the interval \tau \in [-1, 1] (thus N + 1 points in total). Recall that the general two-player Stackelberg differential game with Player 2 as the leader can be modeled as

Minimize    J_1 = \Phi_1(x(t_0), t_0, x(t_f), t_f) + \int_{t_0}^{t_f} L_1(x(t), u_1(t), u_2(t), t)\, dt
            J_2 = \Phi_2(x(t_0), t_0, x(t_f), t_f) + \int_{t_0}^{t_f} L_2(x(t), u_1(t), u_2(t), t)\, dt
subject to  \dot{x} = f(x(t), u_1(t), u_2(t), t),  x(t_0) = x_0

For convenience, let M = L_1, N = L_2, u = u_1, and v = u_2. The time domain is scaled to \tau \in [-1, 1]:

t = t_0 + \frac{(t_f - t_0)\tau + (t_f - t_0)}{2},  \qquad  dt = \frac{t_f - t_0}{2}\, d\tau

For N + 1 collocation points, the state dynamics becomes N + 1 equality constraints

\dot{x}_i = \frac{t_f - t_0}{2} f(x(\tau_i), u(\tau_i), v(\tau_i), \tau_i),  i = 0, 1, \ldots, N,  \qquad  x(\tau_0) = x_0

and the cost functionals become

J_1 = \Phi_1 + \int_{t_0}^{t_f} L_1(x(t), u_1(t), u_2(t), t)\, dt
    = \Phi_1 + \frac{t_f - t_0}{2} \int_{\tau_0}^{\tau_N} M(x(\tau), u(\tau), v(\tau), \tau)\, d\tau
    = \Phi_1 + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i M_i

and

J_2 = \Phi_2 + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i N_i

where the w_i are the weights associated with Gauss-Lobatto quadrature, which approximates the integral part of the cost functional [67]. Thus the resulting bilevel programming problem is as follows. First, the follower u solves the lower-level problem

Minimize    J_1 = \Phi_1 + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i M_i
subject to  \frac{t_f - t_0}{2} f_i - \dot{x}_i = 0,  i = 0, 1, \ldots, N     (3-7)

However, since Eq. (3-7) also depends on the leader v, which has not yet been determined, the lower-level problem alone does not provide a unique solution for the follower. Instead, the lower-level problem defines an optimal reaction set for the follower, such that u is determined once v is defined (i.e., u = u(v)).

The leader, on the other hand, in solving the upper-level problem

Minimize    J_2 = \Phi_2 + \frac{t_f - t_0}{2} \sum_{i=0}^{N} w_i N_i
subject to  \frac{t_f - t_0}{2} f_i - \dot{x}_i = 0,  i = 0, 1, \ldots, N     (3-8)

takes into consideration the solution of Eq. (3-7). By substituting for u with the follower's information from the lower-level problem, Eq. (3-8) becomes a well-posed parameter optimization problem of the leader.

3.3 Costate Mapping of Stackelberg Differential Games

Direct transcription is applied to an example problem for which the analytical solution exists, in order to examine the validity of the approach for two-person Stackelberg differential games. The comparison between the costates of the Stackelberg differential game problem and the costates (i.e., the KKT multipliers) of the transcribed bilevel programming problem is given in Appendix C.

Example. Consider the nonzero-sum pursuit-evasion game presented by Simaan in [51],

\dot{x} = u_1 - u_2,  x(0) = x_0
J_1 = \frac{1}{2} x_f^2 + \frac{1}{2 c_p} \int_0^1 u_1^2\, dt                (3-9)
J_2 = -\frac{1}{2} x_f^2 + \frac{1}{2 c_e} \int_0^1 u_2^2\, dt

where x, u_1, u_2 \in \mathbb{R}, u_2 is the leader, and c_p > 0, c_e > 0 are known constants (c_p c_e \neq 1). This problem is chosen because the analytical solution exists, so that the solution obtained with a direct method can be compared against it.

The analytical solution to Eq. (3-9) provided in [51] is

u_1 = -\frac{c_p}{c_p - \sigma c_e + 1} x_0,  \qquad  u_2 = -\frac{\sigma c_e}{c_p - \sigma c_e + 1} x_0

and

x_f = \frac{1}{c_p - \sigma c_e + 1} x_0

where \sigma = 1/(1 + c_p).

Now the problem is solved via the Legendre pseudospectral method (LPM). For N = 2, the scaled time domain is

\tau_0 = -1,  \tau_1 = 0,  \tau_2 = 1

with the corresponding Lobatto weights

w_0 = \frac{1}{3},  w_1 = \frac{4}{3},  w_2 = \frac{1}{3}

The state is approximated with a Lagrange polynomial of order 2:

x(\tau) = \tau^2 \left( \frac{1}{2} x_0 - x_1 + \frac{1}{2} x_2 \right) + \tau \left( -\frac{1}{2} x_0 + \frac{1}{2} x_2 \right) + x_1

The derivative \dot{x}(\tau) is then

\frac{dx}{d\tau} = 2\tau \left( \frac{1}{2} x_0 - x_1 + \frac{1}{2} x_2 \right) - \frac{1}{2} x_0 + \frac{1}{2} x_2

Evaluating at each discretization point,

\dot{x}_0 = -\frac{3}{2} x_0 + 2 x_1 - \frac{1}{2} x_2
\dot{x}_1 = -\frac{1}{2} x_0 + \frac{1}{2} x_2
\dot{x}_2 = \frac{1}{2} x_0 - 2 x_1 + \frac{3}{2} x_2

These derivatives are expressed in matrix form as \dot{X} = DX, where

X = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix},  \quad  \dot{X} = \begin{bmatrix} \dot{x}_0 \\ \dot{x}_1 \\ \dot{x}_2 \end{bmatrix},  \quad  D = \begin{bmatrix} -\tfrac{3}{2} & 2 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \tfrac{1}{2} & -2 & \tfrac{3}{2} \end{bmatrix}

The dynamics is discretized at the three collocation points,

\dot{x}_i = \frac{t_f - t_0}{2} f_i,  i = 0, 1, 2,  \qquad \text{or} \qquad  DX = \frac{1}{2} F

which results in three equality constraints

h_0 = \frac{1}{2} f_0 - \dot{x}_0
h_1 = \frac{1}{2} f_1 - \dot{x}_1
h_2 = \frac{1}{2} f_2 - \dot{x}_2
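As a quick numerical cross-check of the closed-form solution quoted above (a sketch with arbitrary values of c_p, c_e, and x_0, not part of the original example), the game can also be solved by exploiting the fact that its open-loop solutions are constant in time, so the follower's reaction and the leader's optimization reduce to scalar problems:

import numpy as np
from scipy.optimize import minimize_scalar

# Eq. (3-9) with constant controls on [0, 1]; the open-loop Stackelberg
# solutions of this particular game are constant, so this restriction is lossless.
x0, cp, ce = 1.0, 2.0, 1.5

def follower_reaction(u2):
    # closed-form minimizer of J1 = 0.5*(x0 + u1 - u2)**2 + u1**2/(2*cp) over u1
    return -cp * (x0 - u2) / (1.0 + cp)

def leader_cost(u2):
    u1 = follower_reaction(u2)
    xf = x0 + u1 - u2
    return -0.5 * xf**2 + u2**2 / (2.0 * ce)

res = minimize_scalar(leader_cost, bounds=(-10.0, 10.0), method="bounded")
u2s = res.x
u1s = follower_reaction(u2s)
print(u1s, u2s, x0 + u1s - u2s)   # compare with u_1, u_2, and x_f from the closed form

For the illustrative values c_p = 2 and c_e = 1.5 this sketch returns u_1 ≈ -0.8, u_2 ≈ -0.2, and x_f ≈ 0.4, in agreement with the closed-form expressions quoted above.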


Multi-Pursuer Single-Evader Differential Games with Limited Observations* American Control Conference (ACC) Washington DC USA June 7-9 Multi-Pursuer Single- Differential Games with Limited Observations* Wei Lin Student Member IEEE Zhihua Qu Fellow IEEE and Marwan A. Simaan Life

More information

Copyrighted Material. 1.1 Large-Scale Interconnected Dynamical Systems

Copyrighted Material. 1.1 Large-Scale Interconnected Dynamical Systems Chapter One Introduction 1.1 Large-Scale Interconnected Dynamical Systems Modern complex dynamical systems 1 are highly interconnected and mutually interdependent, both physically and through a multitude

More information

Game Theory with Information: Introducing the Witsenhausen Intrinsic Model

Game Theory with Information: Introducing the Witsenhausen Intrinsic Model Game Theory with Information: Introducing the Witsenhausen Intrinsic Model Michel De Lara and Benjamin Heymann Cermics, École des Ponts ParisTech France École des Ponts ParisTech March 15, 2017 Information

More information

Bézier Description of Space Trajectories

Bézier Description of Space Trajectories Bézier Description of Space Trajectories Francesco de Dilectis, Daniele Mortari, Texas A&M University, College Station, Texas and Renato Zanetti NASA Jonhson Space Center, Houston, Texas I. Introduction

More information

Game Theory Extra Lecture 1 (BoB)

Game Theory Extra Lecture 1 (BoB) Game Theory 2014 Extra Lecture 1 (BoB) Differential games Tools from optimal control Dynamic programming Hamilton-Jacobi-Bellman-Isaacs equation Zerosum linear quadratic games and H control Baser/Olsder,

More information

Stochastic and Adaptive Optimal Control

Stochastic and Adaptive Optimal Control Stochastic and Adaptive Optimal Control Robert Stengel Optimal Control and Estimation, MAE 546 Princeton University, 2018! Nonlinear systems with random inputs and perfect measurements! Stochastic neighboring-optimal

More information

Hyperbolicity of Systems Describing Value Functions in Differential Games which Model Duopoly Problems. Joanna Zwierzchowska

Hyperbolicity of Systems Describing Value Functions in Differential Games which Model Duopoly Problems. Joanna Zwierzchowska Decision Making in Manufacturing and Services Vol. 9 05 No. pp. 89 00 Hyperbolicity of Systems Describing Value Functions in Differential Games which Model Duopoly Problems Joanna Zwierzchowska Abstract.

More information

Optimal Control based Time Optimal Low Thrust Orbit Raising

Optimal Control based Time Optimal Low Thrust Orbit Raising Optimal Control based Time Optimal Low Thrust Orbit Raising Deepak Gaur 1, M. S. Prasad 2 1 M. Tech. (Avionics), Amity Institute of Space Science and Technology, Amity University, Noida, U.P., India 2

More information

Verified High-Order Optimal Control in Space Flight Dynamics

Verified High-Order Optimal Control in Space Flight Dynamics Verified High-Order Optimal Control in Space Flight Dynamics R. Armellin, P. Di Lizia, F. Bernelli-Zazzera K. Makino and M. Berz Fourth International Workshop on Taylor Methods Boca Raton, December 16

More information

Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems

Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems 1 Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems Mauro Franceschelli, Andrea Gasparri, Alessandro Giua, and Giovanni Ulivi Abstract In this paper the formation stabilization problem

More information

Automatica. A unified framework for the numerical solution of optimal control problems using pseudospectral methods

Automatica. A unified framework for the numerical solution of optimal control problems using pseudospectral methods Automatica 46 (2010) 1843 1851 Contents lists available at ScienceDirect Automatica journal homepage: www.elsevier.com/locate/automatica Brief paper A unified framework for the numerical solution of optimal

More information

Numerical Optimal Control Overview. Moritz Diehl

Numerical Optimal Control Overview. Moritz Diehl Numerical Optimal Control Overview Moritz Diehl Simplified Optimal Control Problem in ODE path constraints h(x, u) 0 initial value x0 states x(t) terminal constraint r(x(t )) 0 controls u(t) 0 t T minimize

More information

Convergence of a Gauss Pseudospectral Method for Optimal Control

Convergence of a Gauss Pseudospectral Method for Optimal Control Convergence of a Gauss Pseudospectral Method for Optimal Control Hongyan Hou William W. Hager Anil V. Rao A convergence theory is presented for approximations of continuous-time optimal control problems

More information

Spacecraft Relative Motion Applications to Pursuit-Evasion Games and Control Using Angles-Only Navigation. Ashish Jagat

Spacecraft Relative Motion Applications to Pursuit-Evasion Games and Control Using Angles-Only Navigation. Ashish Jagat Spacecraft Relative Motion Applications to Pursuit-Evasion Games and Control Using Angles-Only Navigation by Ashish Jagat A dissertation submitted to the Graduate Faculty of Auburn University in partial

More information

Multi-Robotic Systems

Multi-Robotic Systems CHAPTER 9 Multi-Robotic Systems The topic of multi-robotic systems is quite popular now. It is believed that such systems can have the following benefits: Improved performance ( winning by numbers ) Distributed

More information

Semi-Analytical Guidance Algorithm for Fast Retargeting Maneuvers Computation during Planetary Descent and Landing

Semi-Analytical Guidance Algorithm for Fast Retargeting Maneuvers Computation during Planetary Descent and Landing ASTRA 2013 - ESA/ESTEC, Noordwijk, the Netherlands Semi-Analytical Guidance Algorithm for Fast Retargeting Maneuvers Computation during Planetary Descent and Landing Michèle LAVAGNA, Paolo LUNGHI Politecnico

More information

Graph rigidity-based formation control of planar multi-agent systems

Graph rigidity-based formation control of planar multi-agent systems Louisiana State University LSU Digital Commons LSU Doctoral Dissertations Graduate School 213 Graph rigidity-based formation control of planar multi-agent systems Xiaoyu Cai Louisiana State University

More information

Interacting Vehicles: Rules of the Game

Interacting Vehicles: Rules of the Game Chapter 7 Interacting Vehicles: Rules of the Game In previous chapters, we introduced an intelligent control method for autonomous navigation and path planning. The decision system mainly uses local information,

More information

Wheelchair Collision Avoidance: An Application for Differential Games

Wheelchair Collision Avoidance: An Application for Differential Games Wheelchair Collision Avoidance: An Application for Differential Games Pooja Viswanathan December 27, 2006 Abstract This paper discusses the design of intelligent wheelchairs that avoid obstacles using

More information

An hp-adaptive pseudospectral method for solving optimal control problems

An hp-adaptive pseudospectral method for solving optimal control problems OPTIMAL CONTROL APPLICATIONS AND METHODS Optim. Control Appl. Meth. 011; 3:476 50 Published online 6 August 010 in Wiley Online Library (wileyonlinelibrary.com)..957 An hp-adaptive pseudospectral method

More information

Spacecraft Orbit Anomaly Representation Using Thrust-Fourier-Coefficients with Orbit Determination Toolbox

Spacecraft Orbit Anomaly Representation Using Thrust-Fourier-Coefficients with Orbit Determination Toolbox Spacecraft Orbit Anomaly Representation Using Thrust-Fourier-Coefficients with Orbit Determination Toolbox Hyun Chul Ko and Daniel J. Scheeres University of Colorado - Boulder, Boulder, CO, USA ABSTRACT

More information

Dynamic and Adversarial Reachavoid Symbolic Planning

Dynamic and Adversarial Reachavoid Symbolic Planning Dynamic and Adversarial Reachavoid Symbolic Planning Laya Shamgah Advisor: Dr. Karimoddini July 21 st 2017 Thrust 1: Modeling, Analysis and Control of Large-scale Autonomous Vehicles (MACLAV) Sub-trust

More information

Cylindrical Manifolds and Tube Dynamics in the Restricted Three-Body Problem

Cylindrical Manifolds and Tube Dynamics in the Restricted Three-Body Problem C C Dynamical A L T E C S H Cylindrical Manifolds and Tube Dynamics in the Restricted Three-Body Problem Shane D. Ross Control and Dynamical Systems, Caltech www.cds.caltech.edu/ shane/pub/thesis/ April

More information

TRAJECTORY OPTIMIZATION USING PSEUDOSPECTRAL METHODS FOR A MULTIPLE AUTONOMOUS UNDERWATER VEHICLE TARGET TRACKING PROBLEM

TRAJECTORY OPTIMIZATION USING PSEUDOSPECTRAL METHODS FOR A MULTIPLE AUTONOMOUS UNDERWATER VEHICLE TARGET TRACKING PROBLEM TRAJECTORY OPTIMIZATION USING PSEUDOSPECTRAL METHODS FOR A MULTIPLE AUTONOMOUS UNDERWATER VEHICLE TARGET TRACKING PROBLEM By ADAM J. FRANKLIN A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY

More information

Thursday Simulation & Unity

Thursday Simulation & Unity Rigid Bodies Simulation Homework Build a particle system based either on F=ma or procedural simulation Examples: Smoke, Fire, Water, Wind, Leaves, Cloth, Magnets, Flocks, Fish, Insects, Crowds, etc. Simulate

More information

NEAR-OPTIMAL FEEDBACK GUIDANCE FOR AN ACCURATE LUNAR LANDING JOSEPH PARSLEY RAJNISH SHARMA, COMMITTEE CHAIR MICHAEL FREEMAN KEITH WILLIAMS A THESIS

NEAR-OPTIMAL FEEDBACK GUIDANCE FOR AN ACCURATE LUNAR LANDING JOSEPH PARSLEY RAJNISH SHARMA, COMMITTEE CHAIR MICHAEL FREEMAN KEITH WILLIAMS A THESIS NEAR-OPTIMAL FEEDBACK GUIDANCE FOR AN ACCURATE LUNAR LANDING by JOSEPH PARSLEY RAJNISH SHARMA, COMMITTEE CHAIR MICHAEL FREEMAN KEITH WILLIAMS A THESIS Submitted in partial fulfillment of the requirements

More information

USA Space Debris Environment and Operational Updates

USA Space Debris Environment and Operational Updates USA Space Debris Environment and Operational Updates Presentation to the 46 th Session of the Scientific and Technical Subcommittee Committee on the Peaceful Uses of Outer Space United Nations 9-20 February

More information

SAFETY GUIDED DESIGN OF CREW RETURN VEHICLE IN CONCEPT DESIGN PHASE USING STAMP/STPA

SAFETY GUIDED DESIGN OF CREW RETURN VEHICLE IN CONCEPT DESIGN PHASE USING STAMP/STPA SAFETY GUIDED DESIGN OF CREW RETURN VEHICLE IN CONCEPT DESIGN PHASE USING STAMP/STPA Haruka Nakao (1), Masa Katahira (2), Yuko Miyamoto (2), Nancy Leveson (3), (1) Japan Manned Space Systems Corporation,

More information

ELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems

ELEC4631 s Lecture 2: Dynamic Control Systems 7 March Overview of dynamic control systems ELEC4631 s Lecture 2: Dynamic Control Systems 7 March 2011 Overview of dynamic control systems Goals of Controller design Autonomous dynamic systems Linear Multi-input multi-output (MIMO) systems Bat flight

More information

SPACE SITUATIONAL AWARENESS AND SPACE DEBRIS ACTIVITIES IN INDIA

SPACE SITUATIONAL AWARENESS AND SPACE DEBRIS ACTIVITIES IN INDIA SPACE SITUATIONAL AWARENESS AND SPACE DEBRIS ACTIVITIES IN INDIA P Soma, Adjunct Faculty, NIAS Agenda The Growth of Space Objects since 1957 Space Situational Awareness India s Space Assets and SSA Space

More information

USA Space Debris Environment, Operations, and Policy Updates

USA Space Debris Environment, Operations, and Policy Updates USA Space Debris Environment, Operations, and Policy Updates Presentation to the 48 th Session of the Scientific and Technical Subcommittee Committee on the Peaceful Uses of Outer Space United Nations

More information

OPTIMIZING PERIAPSIS-RAISE MANEUVERS USING LOW-THRUST PROPULSION

OPTIMIZING PERIAPSIS-RAISE MANEUVERS USING LOW-THRUST PROPULSION AAS 8-298 OPTIMIZING PERIAPSIS-RAISE MANEUVERS USING LOW-THRUST PROPULSION Brenton J. Duffy and David F. Chichka This study considers the optimal control problem of maximizing the raise in the periapsis

More information

Computational Methods in Optimal Control Lecture 8. hp-collocation

Computational Methods in Optimal Control Lecture 8. hp-collocation Computational Methods in Optimal Control Lecture 8. hp-collocation William W. Hager July 26, 2018 10,000 Yen Prize Problem (google this) Let p be a polynomial of degree at most N and let 1 < τ 1 < τ 2

More information

Legendre Pseudospectral Approximations of Optimal Control Problems

Legendre Pseudospectral Approximations of Optimal Control Problems Legendre Pseudospectral Approximations of Optimal Control Problems I. Michael Ross 1 and Fariba Fahroo 1 Department of Aeronautics and Astronautics, Code AA/Ro, Naval Postgraduate School, Monterey, CA

More information

Feedback Control of Spacecraft Rendezvous Maneuvers using Differential Drag

Feedback Control of Spacecraft Rendezvous Maneuvers using Differential Drag Feedback Control of Spacecraft Rendezvous Maneuvers using Differential Drag D. Pérez 1 and R. Bevilacqua Rensselaer Polytechnic Institute, Troy, New York, 1180 This work presents a feedback control strategy

More information

Complexity Metrics. ICRAT Tutorial on Airborne self separation in air transportation Budapest, Hungary June 1, 2010.

Complexity Metrics. ICRAT Tutorial on Airborne self separation in air transportation Budapest, Hungary June 1, 2010. Complexity Metrics ICRAT Tutorial on Airborne self separation in air transportation Budapest, Hungary June 1, 2010 Outline Introduction and motivation The notion of air traffic complexity Relevant characteristics

More information

FIRST-ORDER SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS III: Autonomous Planar Systems David Levermore Department of Mathematics University of Maryland

FIRST-ORDER SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS III: Autonomous Planar Systems David Levermore Department of Mathematics University of Maryland FIRST-ORDER SYSTEMS OF ORDINARY DIFFERENTIAL EQUATIONS III: Autonomous Planar Systems David Levermore Department of Mathematics University of Maryland 4 May 2012 Because the presentation of this material

More information

Steady State Kalman Filter

Steady State Kalman Filter Steady State Kalman Filter Infinite Horizon LQ Control: ẋ = Ax + Bu R positive definite, Q = Q T 2Q 1 2. (A, B) stabilizable, (A, Q 1 2) detectable. Solve for the positive (semi-) definite P in the ARE:

More information

Min-Max Certainty Equivalence Principle and Differential Games

Min-Max Certainty Equivalence Principle and Differential Games Min-Max Certainty Equivalence Principle and Differential Games Pierre Bernhard and Alain Rapaport INRIA Sophia-Antipolis August 1994 Abstract This paper presents a version of the Certainty Equivalence

More information

Trajectory tracking & Path-following control

Trajectory tracking & Path-following control Cooperative Control of Multiple Robotic Vehicles: Theory and Practice Trajectory tracking & Path-following control EECI Graduate School on Control Supélec, Feb. 21-25, 2011 A word about T Tracking and

More information

Linear Quadratic Zero-Sum Two-Person Differential Games

Linear Quadratic Zero-Sum Two-Person Differential Games Linear Quadratic Zero-Sum Two-Person Differential Games Pierre Bernhard To cite this version: Pierre Bernhard. Linear Quadratic Zero-Sum Two-Person Differential Games. Encyclopaedia of Systems and Control,

More information

Alberto Bressan. Department of Mathematics, Penn State University

Alberto Bressan. Department of Mathematics, Penn State University Non-cooperative Differential Games A Homotopy Approach Alberto Bressan Department of Mathematics, Penn State University 1 Differential Games d dt x(t) = G(x(t), u 1(t), u 2 (t)), x(0) = y, u i (t) U i

More information

SOLVING THE INTERNATIONAL SPACE STATION (ISS) MOTION CONTROL LONG-TERM PLANNING PROBLEM

SOLVING THE INTERNATIONAL SPACE STATION (ISS) MOTION CONTROL LONG-TERM PLANNING PROBLEM SOLVING THE INTERNATIONAL SPACE STATION (ISS) MOTION CONTROL LONG-TERM PLANNING PROBLEM V. N. Zhukov, Dr. E. K. Melnikov, A. I. Smirnov Mission Control Center, Central Research Institute of Machine Building,

More information

USA Space Debris Environment, Operations, and Modeling Updates

USA Space Debris Environment, Operations, and Modeling Updates USA Space Debris Environment, Operations, and Modeling Updates Presentation to the 51 st Session of the Scientific and Technical Subcommittee Committee on the Peaceful Uses of Outer Space United Nations

More information

MODEL-BASED REINFORCEMENT LEARNING FOR ONLINE APPROXIMATE OPTIMAL CONTROL

MODEL-BASED REINFORCEMENT LEARNING FOR ONLINE APPROXIMATE OPTIMAL CONTROL MODEL-BASED REINFORCEMENT LEARNING FOR ONLINE APPROXIMATE OPTIMAL CONTROL By RUSHIKESH LAMBODAR KAMALAPURKAR A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Optimal Guidance Strategy for Low Earth Orbit Based on the Laser Solutions

Optimal Guidance Strategy for Low Earth Orbit Based on the Laser Solutions Optimal Guidance Strategy for Low Earth Orbit Based on the Laser Solutions Ting Jin, Jian Cai Institute of Microelectronics of Chinese Academy of Science, Beijing, China 28 April 2015 Laboratory Research

More information

Second Order Sufficient Conditions for Optimal Control Problems with Non-unique Minimizers

Second Order Sufficient Conditions for Optimal Control Problems with Non-unique Minimizers 2 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 3-July 2, 2 WeA22. Second Order Sufficient Conditions for Optimal Control Problems with Non-unique Minimizers Christos Gavriel

More information

FIBER OPTIC GYRO-BASED ATTITUDE DETERMINATION FOR HIGH- PERFORMANCE TARGET TRACKING

FIBER OPTIC GYRO-BASED ATTITUDE DETERMINATION FOR HIGH- PERFORMANCE TARGET TRACKING FIBER OPTIC GYRO-BASED ATTITUDE DETERMINATION FOR HIGH- PERFORMANCE TARGET TRACKING Elias F. Solorzano University of Toronto (Space Flight Laboratory) Toronto, ON (Canada) August 10 th, 2016 30 th AIAA/USU

More information

EN Applied Optimal Control Lecture 8: Dynamic Programming October 10, 2018

EN Applied Optimal Control Lecture 8: Dynamic Programming October 10, 2018 EN530.603 Applied Optimal Control Lecture 8: Dynamic Programming October 0, 08 Lecturer: Marin Kobilarov Dynamic Programming (DP) is conerned with the computation of an optimal policy, i.e. an optimal

More information

Optimization-Based Control

Optimization-Based Control Optimization-Based Control Richard M. Murray Control and Dynamical Systems California Institute of Technology DRAFT v1.7a, 19 February 2008 c California Institute of Technology All rights reserved. This

More information

A Robust Controller for Scalar Autonomous Optimal Control Problems

A Robust Controller for Scalar Autonomous Optimal Control Problems A Robust Controller for Scalar Autonomous Optimal Control Problems S. H. Lam 1 Department of Mechanical and Aerospace Engineering Princeton University, Princeton, NJ 08544 lam@princeton.edu Abstract Is

More information

Industrial Organization Lecture 3: Game Theory

Industrial Organization Lecture 3: Game Theory Industrial Organization Lecture 3: Game Theory Nicolas Schutz Nicolas Schutz Game Theory 1 / 43 Introduction Why game theory? In the introductory lecture, we defined Industrial Organization as the economics

More information

Deterministic Dynamic Programming

Deterministic Dynamic Programming Deterministic Dynamic Programming 1 Value Function Consider the following optimal control problem in Mayer s form: V (t 0, x 0 ) = inf u U J(t 1, x(t 1 )) (1) subject to ẋ(t) = f(t, x(t), u(t)), x(t 0

More information

Introduction to Game Theory

Introduction to Game Theory COMP323 Introduction to Computational Game Theory Introduction to Game Theory Paul G. Spirakis Department of Computer Science University of Liverpool Paul G. Spirakis (U. Liverpool) Introduction to Game

More information

Subject: Optimal Control Assignment-1 (Related to Lecture notes 1-10)

Subject: Optimal Control Assignment-1 (Related to Lecture notes 1-10) Subject: Optimal Control Assignment- (Related to Lecture notes -). Design a oil mug, shown in fig., to hold as much oil possible. The height and radius of the mug should not be more than 6cm. The mug must

More information

UNIVERSITY OF CALIFORNIA SANTA CRUZ

UNIVERSITY OF CALIFORNIA SANTA CRUZ UNIVERSITY OF CALIFORNIA SANTA CRUZ COMPUTATIONAL OPTIMAL CONTROL OF NONLINEAR SYSTEMS WITH PARAMETER UNCERTAINTY A dissertation submitted in partial satisfaction of the requirements for the degree of

More information

AUTONOMOUS HERDING OF UNCONTROLLED UNCERTAIN AGENTS: A SWITCHED SYSTEMS APPROACH

AUTONOMOUS HERDING OF UNCONTROLLED UNCERTAIN AGENTS: A SWITCHED SYSTEMS APPROACH AUTONOMOUS HERDING OF UNCONTROLLED UNCERTAIN AGENTS: A SWITCHED SYSTEMS APPROACH By RYAN ANDREW LICITRA A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

More information

The Liapunov Method for Determining Stability (DRAFT)

The Liapunov Method for Determining Stability (DRAFT) 44 The Liapunov Method for Determining Stability (DRAFT) 44.1 The Liapunov Method, Naively Developed In the last chapter, we discussed describing trajectories of a 2 2 autonomous system x = F(x) as level

More information

Autonomous Vision Based Detection of Non-stellar Objects Flying in Formation with Camera Point of View

Autonomous Vision Based Detection of Non-stellar Objects Flying in Formation with Camera Point of View Autonomous Vision Based Detection of Non-stellar Objects Flying in Formation with Camera Point of View As.Prof. M. Benn (1), Prof. J. L. Jørgensen () (1) () DTU Space, Elektrovej 37, 4553438, mb@space.dtu.dk,

More information

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way

Designing Information Devices and Systems I Spring 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way EECS 16A Designing Information Devices and Systems I Spring 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate

More information

AP PHYSICS 1 Learning Objectives Arranged Topically

AP PHYSICS 1 Learning Objectives Arranged Topically AP PHYSICS 1 Learning Objectives Arranged Topically with o Big Ideas o Enduring Understandings o Essential Knowledges o Learning Objectives o Science Practices o Correlation to Knight Textbook Chapters

More information

Automatica. Pseudospectral methods for solving infinite-horizon optimal control problems

Automatica. Pseudospectral methods for solving infinite-horizon optimal control problems Automatica 47 (2011) 829 837 Contents lists available at ScienceDirect Automatica journal homepage: www.elsevier.com/locate/automatica Brief paper Pseudospectral methods for solving infinite-horizon optimal

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

1 Trajectory Generation

1 Trajectory Generation CS 685 notes, J. Košecká 1 Trajectory Generation The material for these notes has been adopted from: John J. Craig: Robotics: Mechanics and Control. This example assumes that we have a starting position

More information