arxiv: v1 [cs.sy] 22 Nov 2018

AAS

ROBUST MYOPIC CONTROL FOR SYSTEMS WITH IMPERFECT OBSERVATIONS

Dantong Ge, Melkior Ornik, and Ufuk Topcu

Control of systems operating in unexplored environments is challenging due to lack of complete model knowledge. Additionally, under measurement noises, data collected from onboard sensors are of limited accuracy. This paper considers imperfect state observations in developing a control strategy for systems moving in unknown environments. First, we include hard constraints in the problem for safety concerns. Given the observed states, the robust myopic control approach learns local dynamics, explores all possible trajectories within the observation error bound, and computes the optimal control action using robust optimization. Finally, we validate the method in an OSIRIS-REx-based asteroid landing scenario.

INTRODUCTION

An accurate dynamics model is indispensable in developing control strategies for systems moving in a complex and changing environment. When exploring a new environment, however, prior knowledge of the dynamics model is usually scarce [1]. Consider a small body landing mission. System dynamics are complex due to the object's irregular shape, non-uniform mass distribution, weak gravitational field, as well as environmental perturbations caused by solar pressure and gravity of other celestial bodies [2, 3]. Without long-term close observation, a precise landing dynamics model is usually unavailable before launch [4]. To decrease the impact of model uncertainty on the actual system behavior, the system should update its onboard model and make real-time decisions based on interactions with the environment. Existing methods for data-driven learning [5, 6], control design and performance assessment under unknown dynamics [7, 8] typically require intensive measurement data or significant information about the model, such as pre-designed controllers under different conditions, parameter variation range, or state characteristics.
Nevertheless, in the problem setting studied in this paper, state measurements are limited. Furthermore, we assume that there is no prior knowledge of the environment. In [9], a sampling-based task learning method is proposed for control-affine systems with unknown dynamics. The method attains the optimal control policy through a sequential learning of a global state-value function in the state space and a local action-value function around each state. Although the method is independent of actual models, it cannot be applied in asteroid landings, since exploring the entire state space in a limited time is impractical.

Ph.D. Candidate, Department of Aerospace Engineering, Beijing Institute of Technology, Beijing, China; Visiting Student, Department of Aerospace Engineering and Engineering Mechanics, University of Texas at Austin, Austin, TX
Postdoctoral Fellow, Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX
Assistant Professor, Department of Aerospace Engineering and Engineering Mechanics; Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX

The concept of myopic control was first proposed in [10], where the method learns local dynamics by observing the variation of states under a given control sequence, and uses the estimated dynamics to make online decisions. A goodness function is developed to quantify the system performance with respect to mission objectives, and the control action that yields the maximal value of the goodness function is selected as the piecewise constant control for the next step. The ability of learning dynamics and computing control actions in real-time makes myopic control a good fit to be implemented in our problem, except that, in the method, accurate state observations are assumed to be available at every sampling point. In practice, observation information obtained from sensors is usually of limited precision due to measurement noise or instrument calibration error [11]. Such imperfect state observations may be misleading in decision-making as, for example, the generated control action may guide the vehicle to an unexpected direction.

Another drawback of myopic control is that hard constraints are not guaranteed to be satisfied in the context. Although the method assigns a small value to the goodness function for undesirable system performance so that the corresponding control action is less likely to be selected, it does not exclude the possibility of constraint violation. As hard constraints define the feasible regions in the state space, it is required for safety concern that they are strictly satisfied all the time.

In this paper, we extend the work of myopic control with hard safety constraint guarantees, and propose a robust myopic control strategy in the presence of imperfect observations. Instead of optimizing based on a single observed trajectory, the method considers the set of all possible trajectories within a given observation error bound, and computes the optimal control actions using robust optimization.
We validate the approach in the setting of an on-going asteroid mission OSIRIS-REx with landing requirements [3], and compare the system performance of both nominal myopic control and robust myopic control under the same observation conditions.

PRELIMINARIES

In this section, we give a brief overview of the notation, necessary assumptions, and the general idea of myopic control. We denote the set $\{0, 1, \ldots, m\}$ by $[m]$ and the set of all continuous functions from set $A \subseteq \mathbb{R}^m$ to set $B \subseteq \mathbb{R}^n$ by $C(A, B)$. For a vector $v \in \mathbb{R}^n$, $\|v\|$ denotes its 2-norm. For $x \in \mathbb{R}^n$, $x_i$ with $i \in \{1, 2, \ldots, n\}$ denotes the $i$-th component of $x$. Notation $f|_{[a,b]}$ emphasises that function $f$ is only considered on the interval $[a, b]$. Symbol $\mathbb{N}$ denotes all strictly positive integers.

Consider the system with compact domain $\mathcal{X} \subseteq \mathbb{R}^n$

$$\dot{x} = f(x) + \sum_{i=1}^{m} g_i(x) u_i \tag{1}$$

where functions $f, \{g_i\}_{i \in [m]} \in C(\mathcal{X}, \mathcal{X})$ and control input $u = [u_1, \ldots, u_m]^T \in \mathbb{R}^m$. Let the solution of (1) under control signal $u : [0, \infty) \to \mathbb{R}^m$ and initial state $x_0$ be denoted by $\phi_u(\cdot, x_0)$. We assume that $u$ is piecewise-constant and a unique solution $\phi_u(\cdot, x_0)$ under $u$ exists. We make the following assumptions:

- No prior knowledge of the dynamics is available except it is in the form of (1). Functions $f$ and $\{g_i\}_{i \in [m]}$ are unknown, but are known to be bounded by $M_0$ and are $M_1$-Lipschitz, i.e., for some $M_0 > 0$, $M_1 > 0$,

$$\begin{aligned} &\|f(x)\| \le M_0, \quad \|g_i(x)\| \le M_0, \quad \forall i \in [m], \\ &\|f(x) - f(y)\| \le M_1 \|x - y\|, \quad \forall x, y \in \mathcal{X}, \\ &\|g_i(x) - g_i(y)\| \le M_1 \|x - y\|, \quad \forall x, y \in \mathcal{X},\ i \in [m]. \end{aligned} \tag{2}$$

- The system starts at an initial state $x_0$ and runs only once. All control decisions are made during the run with no repetitions.

For any given control action, a goodness function is defined as an evaluation of how well the system performs according to the control objective

$$(\phi, v) \mapsto G(\phi, v), \quad \phi \in \mathcal{F},\ v \in \mathbb{R}^n \tag{3}$$

where

$$\mathcal{F} = \bigcup_{T \ge 0} C([0, T], \mathcal{X}). \tag{4}$$

The system performance to be evaluated includes the trajectory $\phi$ until time $T$ and the system velocity $v$ at time $T$. The Lipschitz constant $L$ of function $G : \mathcal{F} \times \mathbb{R}^n \to \mathbb{R}$ for all $\phi_1|_{[0,T_1]}, \phi_2|_{[0,T_2]} \in \mathcal{F}$ and $v_1, v_2 \in \mathbb{R}^n$ is given by

$$|G(\phi_1|_{[0,T_1]}, v_1) - G(\phi_2|_{[0,T_2]}, v_2)| \le L\left(d(\phi_1|_{[0,T_1]}, \phi_2|_{[0,T_2]}) + \|v_1 - v_2\|\right), \tag{5}$$

where

$$d(\phi_1|_{[0,T_1]}, \phi_2|_{[0,T_2]}) = |T_1 - T_2| + \max_{t \in [0, \min(T_1, T_2)]} \|\phi_1(t) - \phi_2(t)\|. \tag{6}$$

Generally, the goodness function predicts how different control actions will influence the system behavior in the future, and the one that maximizes the goodness function is regarded as the optimal control action. Thus, the problem to be solved can be addressed as finding a control strategy for the system that always results in the maximum goodness function value, i.e., finding a control signal $u^* : [0, T] \to \mathcal{U}$ such that, for all $t \in [0, T]$, if $x = \phi_{u^*}(t, x_0)$, then

$$G\Big(\phi_{u^*}(\cdot, x_0)|_{[0,t]},\ f(x) + \sum_{i=1}^{m} g_i(x) u^*_i(t)\Big) = \max_{u \in \mathcal{U}} G\Big(\phi_{u^*}(\cdot, x_0)|_{[0,t]},\ f(x) + \sum_{i=1}^{m} g_i(x) u_i\Big). \tag{7}$$

We now give a brief description of myopic control and refer the reader to [10] for details of an approximate solution to (7). The myopic control strategy is built upon sequential and repeating phases of learning local dynamics and computing optimal control actions. First, we design a goodness function $G(\phi, v)$ according to the control objective. In order to learn the dynamics, $m + 1$ affinely independent control inputs are applied on the system for a given short period of time and the state variations are observed.
Assume that the affinely independent control inputs are marked as $u^* + \Delta u^0, \ldots, u^* + \Delta u^m$ with $u^* = 0$ at time $t_0$. By applying $u^* + \Delta u^j$, $j = 0, \ldots, m$, in $[t_0 + j\varepsilon, t_0 + (j+1)\varepsilon]$, where $\varepsilon > 0$ is the length of each application interval, the system state at the end of the time interval is observed and marked as $x_{j+1}$. The corresponding trajectory composed by these states is denoted as $\phi$. The local dynamics are then approximated by

$$v(u, x) = \sum_{j=0}^{m} \lambda_j \frac{x_{j+1} - x_j}{\varepsilon}, \tag{8}$$

where $\lambda_0, \ldots, \lambda_m$ are unique coefficients determined by $\sum_{j=0}^{m} \lambda_j = 1$ and $u = \sum_{j=0}^{m} \lambda_j (u^* + \Delta u^j)$. The optimal control action that maximizes the goodness function is obtained as

$$u^* = \arg\max_{u \in \mathcal{U}} G\big(\phi(\cdot, x_0)|_{[0, t_0 + (m+1)\varepsilon]},\ v(u, x)\big). \tag{9}$$

Such a $u^*$ is then used as the basic control input in the next iteration, with the new $t_0$ given by $t_0 + (m+1)\varepsilon$. For simplicity, we denote the entire control signal derived from this strategy as $u^* : [0, T] \to \mathcal{U}$. The process repeats until task completion.

PROBLEM STATEMENT

In this paper, we improve the basic method for myopic control in [10] by accounting for the existence of hard safety constraints [12] and availability of only imperfect system state observations [13]. We define hard safety constraints as safety-related state constraints that need to be strictly satisfied. By denoting the set of all hazardous states as $\mathcal{B} \subseteq \mathcal{X}$, the states that satisfy safety constraints are given by $x(t) \in \mathcal{X} \setminus \mathcal{B}$. Besides, the observed states attained from sensors often deviate from the true ones due to measurement noise and instrumental error [14]. We consider such imperfect observations in the problem, and use $x^{\mathrm{obs}}$ to distinguish the observed state from the true state $x^{\mathrm{true}}$. We assume that even though state observations are inaccurate, there is a known constant $\bar{\varepsilon} > 0$ such that $\|x^{\mathrm{true}}(t) - x^{\mathrm{obs}}(t)\| \le \bar{\varepsilon}$ for all $t \ge 0$. The problem to be solved in this paper can be described as

Problem 1 Let the initial state $x_0 \in \mathcal{X}$, the unsafe set $\mathcal{B} \subseteq \mathcal{X}$, and the feasible control action set $\mathcal{U} \subseteq \mathbb{R}^m$ be given. Additionally, let $G$ be the goodness function designed from control objective, $T \ge 0$ be the time length of the mission, and $\bar{\varepsilon}$ be the bound of state observation error. Find a control signal $u^* : [0, T] \to \mathcal{U}$ that satisfies the following conditions for all $t \in [0, T]$:
(i) $\phi_{u^*}(t, x_0) \notin \mathcal{B}$,
(ii) $G(\phi_{u^*}(\cdot, x_0)|_{[0,t]}, v(u^*, \phi_{u^*}(t))) = \max_{u \in \mathcal{U}} G(\phi_{u^*}(\cdot, x_0)|_{[0,t]}, v(u, \phi_{u^*}(t)))$.
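The learning step in (8) can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: the function name `learned_velocity`, the array layout, and the use of NumPy are assumptions made here. It solves the two conditions on the coefficients $\lambda_j$ as one linear system and then forms the weighted finite differences.

```python
import numpy as np

def learned_velocity(u, u_star, du, xs, eps):
    """Estimate v(u, x) from one learning cycle, in the spirit of (8).

    u      : candidate control, shape (m,)
    u_star : base control of the cycle, shape (m,)
    du     : test perturbations, shape (m + 1, m); row j is the j-th perturbation
    xs     : observed states x_0, ..., x_{m+1}, shape (m + 2, n)
    eps    : duration for which each test control is applied
    (All names and shapes are illustrative assumptions.)
    """
    test_controls = u_star + du                    # row j is the j-th applied control
    count = test_controls.shape[0]                 # m + 1 test controls
    # Coefficients solve: sum(lambda) = 1 and sum(lambda_j * control_j) = u.
    A = np.vstack([np.ones(count), test_controls.T])
    b = np.concatenate([[1.0], u])
    lam = np.linalg.solve(A, b)                    # unique if controls are affinely independent
    increments = (xs[1:] - xs[:-1]) / eps          # finite-difference velocities per test control
    return lam @ increments
```

For dynamics that are constant over the cycle, this estimate reproduces $f(x) + \sum_i g_i(x) u_i$ exactly; for state-dependent dynamics it is only an approximation whose error is discussed in [10].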
In the remainder of the paper, we first propose a modified myopic control strategy to provide guarantees for hard safety constraints. Then we solve the above problem with imperfect observations by developing a robust myopic control strategy.

MYOPIC CONTROL WITH HARD SAFETY CONSTRAINTS

In myopic control, the goodness function predicts system behavior in the near future based on the learned local dynamics. As a result, it can also foresee if there is any conflict between system safety and mission objective, i.e., whether the system will violate hard safety constraints when it is trying to achieve the goal. In this paper, we estimate system performance under different control actions at the beginning of every learning period. For control inputs that lead to constraint violation, we set their goodness function values as $-\infty$. Otherwise, the goodness function receives the same real value as designed before. We use this method of assigning goodness function values so that by maximizing the goodness function, the control action that causes constraint violation will not be chosen, provided there are any better choices. In the following, we present two possible ways for system performance prediction:

(i) Assume that the dynamics remain the same for a short time interval. We can directly predict system state $\phi^{\mathrm{pred}}(t + \Delta t, x_0)$ under every possible control action for some small $\Delta t$ (e.g. a learning period). We regard a control action as bad if the predicted state belongs to set $\mathcal{B}$. Then we set its goodness function value as $-\infty$, i.e., if $\phi^{\mathrm{pred}}(t + \Delta t, x_0) \in \mathcal{B}$, then $G(\phi_u(\cdot, x_0)|_{[0,t]}, v(u, x)) = -\infty$.

(ii) Assume that the dynamics keep changing, but remain Lipschitz-continuous with a known Lipschitz constant. Then we can compute the set $\Phi^{\mathrm{pred}}(t + \Delta t, x_0)$ of all possible values for $\phi_u(t + \Delta t, x_0)$. We regard a control action as bad if any element in the set of predicted system states from time $t$ to $t + \Delta t$ is inside $\mathcal{B}$, i.e., if, using the given control action, it is possible that the system will violate the state constraint. We then set the goodness function value as $-\infty$, i.e., if $\bigcup_{t' \in [t, t + \Delta t]} \Phi^{\mathrm{pred}}(t', x_0) \cap \mathcal{B} \ne \emptyset$, then $G(\phi_u(\cdot, x_0)|_{[0,t]}, v(u, x)) = -\infty$.

To demonstrate the effect of such goodness function design, we present a simple example where the system is required to avoid an obstacle:

Example 1 Consider an agent moving with the dynamics

$$\dot{x}_1 = x_3, \quad \dot{x}_2 = x_4, \quad \dot{x}_3 = u_1, \quad \dot{x}_4 = u_2. \tag{10}$$

Equation (10) describes the movement of an agent in $\mathbb{R}^2$, where $x_1$ and $x_2$ are the agent's position, and $x_3$ and $x_4$ are its velocity. Note that in the example, we only use the dynamics model to generate state observations and use no knowledge of it in determining control inputs. We further define the obstacle region as $\mathcal{B} = \{x \in \mathbb{R}^4 \mid (x_1 - 50)^2 + x_2^2 \le R^2,\ x_2 \ge 0\}$, with $R$ the obstacle radius, and design the goodness function in the following form

$$G(\phi, v(u, x)) = \begin{cases} -\|r_p - r_f\|^2 - \dfrac{\tau}{\|r_p - r_B\|^2}, & r_p \notin \mathcal{B}, \\ -\infty, & r_p \in \mathcal{B}, \end{cases} \tag{11}$$

where $r_f = [0, 0]^T$ is the target position for $(x_1, x_2)$, $r_B$ denotes the obstacle center, and $r_p$ is the predicted position of $(x_1, x_2)$ under the assumption of (i). The generated trajectory with $\tau = 150$ is shown in Figure 1, where the system successfully avoids the obstacle and reaches the target.
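The goodness design of Example 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names, the constant-acceleration prediction used for option (i), and the radius value appearing in the usage note are hypothetical choices made here.

```python
import math

def predict_position(pos, vel, acc, dt):
    """Position after dt if the current acceleration stays constant (option (i))."""
    return tuple(p + dt * v + 0.5 * dt * dt * a for p, v, a in zip(pos, vel, acc))

def in_obstacle(pos, center, radius):
    """Semicircular obstacle: inside the circle and with x_2 >= 0."""
    return math.dist(pos, center) <= radius and pos[1] >= 0.0

def goodness(pos_pred, target, center, radius, tau):
    """Goodness of a predicted position, following the structure of (11)."""
    if in_obstacle(pos_pred, center, radius):
        return -math.inf                      # hard constraint: never preferred
    d_target = math.dist(pos_pred, target)
    d_obstacle = math.dist(pos_pred, center)
    return -d_target ** 2 - tau / d_obstacle ** 2

def best_action(pos, vel, actions, dt, target, center, radius, tau):
    """Pick the candidate acceleration with the largest goodness value."""
    return max(actions, key=lambda a: goodness(
        predict_position(pos, vel, a, dt), target, center, radius, tau))
```

With a hypothetical radius of 20, an agent at (80, 1) moving toward the obstacle at velocity (-10, 0) would reject the zero-acceleration action, because its predicted position after 2 time units falls inside the obstacle and therefore receives a goodness of negative infinity.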
Having introduced a goodness function design with guaranteed hard constraint satisfaction, we now turn our attention to the existence of imperfect system state observations.

Figure 1. An illustration of the system trajectory, in the $x_1$-$x_2$ plane, in Example 1. The obstacle is denoted by a black dashed semicircle, and the trajectory is given in solid blue.

ROBUST MYOPIC CONTROL

In the presence of imperfect state observations, the myopic control method will learn incorrect local dynamics and generate control actions that potentially lead the system to an unexpected direction. We illustrate this phenomenon in the previous example, by adding 5 different constant observation errors on $x_2$, i.e., $x_2^{\mathrm{obs}} = x_2^{\mathrm{true}} + e$. The obtained trajectories are depicted on the left side of Figure 3. As can be seen, all the actual trajectories collide with the obstacle because the system wrongly believes that it is further away.

To overcome the constraint violation problem under imperfect observations, we propose a robust myopic control strategy that ensures tolerable performance for all possible system trajectories. Assume that the observation error bound $\bar{\varepsilon}$ is known, i.e., the actual system state $x^{\mathrm{true}}(t)$ satisfies $x^{\mathrm{true}}(t) \in [x^{\mathrm{obs}}(t) - \bar{\varepsilon}, x^{\mathrm{obs}}(t) + \bar{\varepsilon}]$ for all $t \in [0, T]$. Notice that there exist multiple possible trajectories consistent with the same set of observations. Using the local dynamics learned from the observed states, we predict possible system performance under different control actions for each trajectory. Analogous to classic robust control algorithms which take bounded modeling errors into account and optimize system performance by solving a min-max problem [15, 16], we design the robust myopic control strategy. As illustrated in Figure 2, for every control input $u$, the strategy finds its worst case, i.e., a trajectory that results in the worst performance when $u$ is applied. It then searches for the control action whose worst performance is the best among others.
Formally, instead of the control input $u^*$ from (9), the robust myopic control strategy finds a control input $u^* \in \mathcal{U}$ at time $t_0 + (m+1)\varepsilon$ such that

$$G(\phi^*, v_{\phi^*}(u^*, x)) = \max_{u \in \mathcal{U}} \min_{\phi \in B(\phi^{\mathrm{obs}}, \bar{\varepsilon})} G(\phi, v_\phi(u, x)), \tag{12}$$

where $v_\phi$ is the approximate system direction defined in (8), given the trajectory $\phi : [0, t_0 + (m+1)\varepsilon] \to \mathbb{R}^n$, $\phi^{\mathrm{obs}}$ is the observed system trajectory, and $B(\phi^{\mathrm{obs}}, \bar{\varepsilon}) = \{\phi : [0, t_0 + (m+1)\varepsilon] \to \mathbb{R}^n \mid \|\phi(t) - \phi^{\mathrm{obs}}(t)\| \le \bar{\varepsilon} \text{ for all } t \in [0, t_0 + (m+1)\varepsilon]\}$. The control input $u^*$ is then applied after being modified by $\Delta u^j$, as described in the preliminary section. We assume that the min in (12) exists, which holds under mild assumptions of continuity of $G$. Alternatively, a nearly-robust optimal control action can be obtained by choosing $\phi$ such that $G(\phi, v_\phi(u, x))$ is close to $\inf_{\phi \in B(\phi^{\mathrm{obs}}, \bar{\varepsilon})} G(\phi, v_\phi(u, x))$. For simplicity, as in the preliminary section, we slightly abuse notation and in the future also use $u^*$ to denote the control signal computed from the robust myopic control method.
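The max-min selection in (12) can be illustrated on a finite sample. The sketch below is an assumption-laden simplification: it replaces the set of all trajectories consistent with the observations by a finite list of sampled trajectories and the control set by a finite list of candidates, and the names `robust_choice` and `slopes`, as well as the numbers, are made up for illustration in the style of Figure 2.

```python
import math

def robust_choice(controls, trajectories, goodness):
    """Return the control whose worst-case goodness over the sampled trajectories is largest."""
    best_u, best_worst = None, -math.inf
    for u in controls:
        # Worst case of this control over all sampled trajectories.
        worst = min(goodness(phi, u) for phi in trajectories)
        if best_u is None or worst > best_worst:
            best_u, best_worst = u, worst
    return best_u, best_worst

# Hypothetical goodness values: one observed and two alternative trajectories.
slopes = {
    "u1": {"observed": 0.4, "alt_a": 0.3, "alt_b": 0.5},
    "u2": {"observed": 0.9, "alt_a": 0.1, "alt_b": 0.6},
    "u3": {"observed": 0.2, "alt_a": 0.25, "alt_b": 0.7},
}
choice, worst = robust_choice(
    ["u1", "u2", "u3"], ["observed", "alt_a", "alt_b"],
    lambda phi, u: slopes[u][phi])
# Nominal myopic control would look at the observed trajectory only.
nominal = max(slopes, key=lambda u: slopes[u]["observed"])
```

In this toy table the nominal rule prefers the control that looks best on the observed trajectory alone, while the robust rule prefers the control whose worst sampled outcome is least bad. A finite sample only approximates the minimum over all consistent trajectories, so it does not by itself carry the guarantee of (12).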


Figure 2. An illustration of the robust myopic control strategy. The observed system trajectory is given in red, and two additional possible system trajectories consistent with known observation error bounds are given in black. At the end of the learning period, we consider each of the three possible controls ($u_1$, $u_2$, and $u_3$), and calculate the predicted system velocity for each of those controls, at each possible trajectory. The goodness function is designed to be proportional to the upwards slope of the predicted velocity vectors. As the worst slope generated by $u_1$ for different possible trajectories is better than the worst slopes of $u_2$ and $u_3$, the robust control algorithm will choose $u_1$ as its control input for the next learning period.

We now provide a theoretical bound for system performance under robust myopic control. We write $\phi^{\mathrm{true}}$ for the true trajectory under control signal $u^*$, and use $v^{\mathrm{true}}(u, x) = f(x) + \sum_{i=1}^{m} g_i(x) u_i$. We want to determine the level of suboptimality of the robust myopic control law, i.e., bound

$$\Big| G\big(\phi^{\mathrm{true}}|_{[0,t]}, v^{\mathrm{true}}(u^*(t), \phi^{\mathrm{true}}(t))\big) - \max_{u \in \mathcal{U}} G\big(\phi^{\mathrm{true}}|_{[0,t]}, v^{\mathrm{true}}(u, \phi^{\mathrm{true}}(t))\big) \Big|. \tag{13}$$

In the following, we derive the theoretical bound based on the results in [10], which only covers the case where the goodness function is real-valued and has a Lipschitz constant in the sense of (5)-(6). In the case where the goodness function can reach $-\infty$, it can be easily shown that the robust control law will always choose a control action that results in real-valued goodness, if such a control action exists. Thus, we have

Theorem 1 Let functions $f$, $\{g_i\}_{i \in [m]}$ in system dynamics be bounded by $M_0$ and be $M_1$-Lipschitz. Assume that the goodness function $G$ is real-valued and has Lipschitz constant $L$, and that observation noises are bounded by $\bar{\varepsilon}$. Then,

$$\Big| G\big(\phi^{\mathrm{true}}|_{[0,t]}, v^{\mathrm{true}}(u^*(t), \phi^{\mathrm{true}}(t))\big) - \max_{u \in \mathcal{U}} G\big(\phi^{\mathrm{true}}|_{[0,t]}, v^{\mathrm{true}}(u, \phi^{\mathrm{true}}(t))\big) \Big| \le \frac{8 L M_0 M_1 (m+1)^3 (4m^{3/2} + \delta)\varepsilon}{\delta} + 2L\bar{\varepsilon}\left(3 + \frac{4}{\varepsilon}\left(1 + \frac{4m}{\delta}\right)^m\right) \tag{14}$$

for all $t = k(m+1)\varepsilon$, $k \in \mathbb{N}$.

We give the proof of Theorem 1 in Appendix.

Remark 1 While Theorem 1 provides bound only for $t = k(m+1)\varepsilon$, i.e., end times of each learning cycle, a similar bound can be obtained for all $t \ge (m+1)\varepsilon$, in analogy to the results in [10]. Such a result is a consequence of Lipschitz continuity of the system dynamics, as well as the bound on the size of test controls $\Delta u^j$.

Remark 2 We note that the bound developed in Theorem 1 is extremely liberal, and can likely be reduced by employing a more careful, and technical, proof procedure. Nonetheless, the resulting
bound shows that parameters $\delta$, $\varepsilon$, and distribution of the observation noises have a joint influence on the expected system performance, and that by reducing observation noise and properly selecting parameters, robust myopic control achieves performance arbitrarily close to optimal.

We now apply the robust myopic control strategy to Example 1. The same five constant observation errors are added on $x_2$, and the results are shown on the right side of Figure 3. In the simulation, we set the observation error bound as 1.5 times larger than the actual errors. The right side of Figure 3 shows that all system trajectories now avoid the obstacle.

Figure 3. Trajectories generated by nominal myopic control (left) and robust myopic control (right) under constant observation errors $e = [0.5, 1.0, 1.5, 2.0, 2.5]$. Blue solid lines refer to the actual trajectories, and red dashed lines refer to the observed trajectories.

SIMULATION

Asteroid Model

In this section, we apply the proposed method to an asteroid landing scenario based on the ongoing OSIRIS-REx asteroid sample return mission [3]. Launched in 2016, the spacecraft is expected to arrive at the asteroid 1999 RQ36 Bennu in late 2018. After performing close-proximity mapping operations for about 12 months, the spacecraft is going to descend towards the surface of the asteroid for sample collection. Currently, a rough asteroid model and its associated gravity field model are available based on ground observations [17]. In the simulation, we use the model of dynamics described in [18] to generate the actual states of the system:

$$\begin{aligned} \dot{r}_x &= v_x, \quad \dot{r}_y = v_y, \quad \dot{r}_z = v_z, \\ \dot{v}_x &= 2\omega v_y + \omega^2 r_x + \frac{\partial V}{\partial r_x} + u_x + p_x, \\ \dot{v}_y &= -2\omega v_x + \omega^2 r_y + \frac{\partial V}{\partial r_y} + u_y + p_y, \\ \dot{v}_z &= \frac{\partial V}{\partial r_z} + u_z + p_z, \end{aligned} \tag{15}$$

where $r = [r_x, r_y, r_z]^T$ is the lander position, $v = [v_x, v_y, v_z]^T$ is the lander velocity, and $u = [u_x, u_y, u_z]^T$ is the control vector corresponding to engine thrust, all given in the body-fixed rotating frame. Additionally, $\omega$ is the asteroid rotating speed, $V$ is the potential field of the asteroid, and $p = [p_x, p_y, p_z]^T$ is the environmental perturbation caused by solar pressure and gravitation of other celestial bodies. For simulation purposes, we simply set $p = 0$ and directly use the potential field model developed in [17]. We omit the details of this model, as it is complex and does not play a large role in the remainder of the paper. We emphasize that, in line with the setting of myopic control, no specific knowledge of the above dynamics model is used in computing control actions; we simply use model (15) to simulate system behavior.

Goodness Function

To achieve high landing accuracy, the distance between the target and the predicted touchdown location should be as small as possible. We denote by $r_f$ the position of the designated landing site, and by $r_p(t')$ the predicted system state at time $t'$. In our prediction, we assume that the vehicle, if its current position, at time $t$, is $r$ and velocity $v$, flies with a constant acceleration $a$ during $[t, t']$, i.e.,

$$r_p(t') = r + (t' - t)v + \frac{1}{2}(t' - t)^2 a, \tag{16}$$

where $a$ is the predicted system acceleration when control input $u$ is applied at time $t$. The landing error is then calculated from $\|r_p(t_{go}) - r_f\|$, where $t_{go}$ is the estimated landing time, i.e., the time at which $r_p$ intersects with the asteroid surface. As we seek to minimize this error, we define the goodness function as

$$G(\phi|_{[0,t]}, v(u, \phi(t))) = -\|r_p(t_{go}) - r_f\|^2. \tag{17}$$

It is possible that the predicted trajectory $r_p$ keeps extending in the space and never intersects with the asteroid surface. In such a case, we set a flight time upper bound $t_{max}$, and define the landing error as the minimum distance between the lander and the asteroid in the prediction. Thus, we amend (17) by

$$G(\phi|_{[0,t]}, v(u, \phi(t))) = \begin{cases} -\|r_p(t_{go}) - r_f\|^2, & t \le t_{go} \le t_{max}, \\ -\min_{t' \in [t, t_{go}]} \|r_p(t') - r_f\|^2, & t_{go} > t_{max}. \end{cases} \tag{18}$$
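The prediction (16) and the goodness values (17)-(18) can be sketched as follows. This is a simplified illustration, not the simulation used in the paper: it assumes a spherical body centered at the origin instead of the asteroid shape model, samples the predicted path on a fixed time grid, and uses names (`landing_goodness`, `radius`, `dt`) chosen here.

```python
import numpy as np

def predict_position(r, v, a, tau):
    """Constant-acceleration prediction (16), tau = t' - t seconds ahead."""
    return r + tau * v + 0.5 * tau ** 2 * a

def landing_goodness(r, v, a, r_f, radius, t_max, dt=0.1):
    """Goodness in the spirit of (17)-(18) for a spherical body of the given radius."""
    taus = np.arange(0.0, t_max + dt, dt)
    path = np.array([predict_position(r, v, a, tau) for tau in taus])
    below = np.linalg.norm(path, axis=1) <= radius
    if below.any():
        # First sampled point at or below the surface approximates the touchdown.
        touchdown = path[np.argmax(below)]
        return -float(np.sum((touchdown - r_f) ** 2))
    # No predicted touchdown before t_max: use the closest approach to the target.
    return -float(np.min(np.sum((path - r_f) ** 2, axis=1)))
```

A myopic controller would evaluate this value for the acceleration predicted under each candidate control and pick the maximizer. The time step `dt` bounds how accurately the touchdown point is located, which is a limitation of the sketch rather than of the method.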
Due to resolution constraints, the current best available map of Bennu can only model the asteroid as largely convex and locally flat. In order to showcase the quality of the proposed landing approach, we add an additional dangerous obstacle on the asteroid surface and place it close to the landing site. The additional obstacle poses a hard safety constraint to the problem, as landing on it is prohibited. We define the hazardous states as $\mathcal{B} = \{r \mid \|r - r_B\|^2 \le R^2\}$, where $r_B$ is the center point of the semi-spherical obstacle and $R$ is its radius. Upon consideration of hard constraints, we modify the goodness function as

$$G(\phi|_{[0,t]}, v(u, \phi(t))) = \begin{cases} -\|r_p(t_{go}) - r_f\|^2 - \dfrac{\tau}{\|r_p(t') - r_B\|^2}, & r_p(t') \notin \mathcal{B},\ t \le t_{go} \le t_{max}, \\ -\min_{s \in [t, t_{go}]} \|r_p(s) - r_f\|^2 - \dfrac{\tau}{\|r_p(t') - r_B\|^2}, & r_p(t') \notin \mathcal{B},\ t_{go} > t_{max}, \\ -\infty, & r_p(t') \in \mathcal{B}. \end{cases} \tag{19}$$

Here $\tau$ is a fixed weighting constant. Note that the prediction of obstacle collision can be conducted at any time $t' \in [t, t_{go}]$. Here we make a short-term collision prediction by setting $t' = t + 2(m+1)\varepsilon$, i.e., twice the length of a learning period.


Results for Perfect Observations

We first assume that the state observations are perfect, and apply myopic control with the goodness function defined in (19). We set the vehicle initial position as $r_0 = [200, 100, 330]^T$ m, where the origin of the coordinate system corresponds to the asteroid center of mass. The vehicle is originally placed about 90 m above the asteroid surface. We require it to land on the surface at $r_f = [-26, 0, 243]^T$ m. The obstacle sphere is centered at $r_B = [14, 23, 250]^T$ m with radius $R$ of 15 m. The asteroid with the additional obstacle, as well as the starting and landing point, are illustrated in Figure 4. In the simulation, the time interval between state observations is $\varepsilon = 2$ s. Since we apply four affinely independent control inputs $u^* + \Delta u^j$, $j = 0, \ldots, 3$, in each iteration, the update cycle of the local dynamics model is $(m+1)\varepsilon = 8$ s. The generated landing trajectory, with a 7 m landing error, is shown in Figure 4.

Figure 4. Landing trajectory and system performance under perfect observations.

Results for Imperfect Observation

In the asteroid landing process, the data collected by sensors is inevitably contaminated due to environment noises and instrumental calibration deviation [19]. In our simulation, we add different position observation errors on the z-axis, and compare system performances under nominal myopic control and robust myopic control.

The nominal myopic control strategy directly uses the wrong state information in learning the dynamics and generating control actions. When the flight path is close to the obstacle in the landing process, using nominal myopic control under imperfect observation may lead to collision. Figure 5 illustrates the actual and observed trajectories under observation error $e = [1, 2, 3, 4]$ m respectively. As the error increases, the risk of intersecting with the obstacle region grows. When $e \ge 3$ m, the lander collides with the obstacle.
On the other hand, the robust myopic control approach assumes that the bound $\bar{\varepsilon}$ of observation error is known by the system in advance. In every iteration, the method generates a control action that ensures system performance even in the worst-case scenario. Here we demonstrate the results of setting $\bar{\varepsilon} = 2e$, where observation error again equals $e = [1, 2, 3, 4]$ m. The obtained trajectories are shown in Figure 6. We can see that, unlike in Figure 5, all the trajectories manage to fly over the obstacle region and reach the target. We reiterate that, if $\bar{\varepsilon}/e \ge 1$ and the goodness function is set to $-\infty$ for a control input whenever it is possible to result in constraint violation, then the system is guaranteed to satisfy hard constraints. However, as the value of $\bar{\varepsilon}/e$ increases, the robust myopic control strategy becomes more conservative and computationally intractable, since a wider range of trajectories will be taken into account in decision-making. Thus, it is naturally desirable for $\bar{\varepsilon}/e$ to be as close to 1 as possible. In practice, the ratio $\bar{\varepsilon}/e$ depends on the quality of our prior knowledge of the sensing uncertainties.

Figure 5. Actual landing trajectories (blue solid line) and observed trajectories (red dashed line) using nominal myopic control. When $e = 3$ m and $e = 4$ m, the corresponding trajectories terminate prematurely due to intersection with the obstacle.

Figure 6. Actual landing trajectories (blue solid line) and observed trajectories (red dashed line) using robust myopic control. All the trajectories end at the designated landing site without collision with the obstacle.

CONCLUSION

In this paper, we extend the work of myopic control by considering hard constraints and imperfect observations in the problem. First, we provide guarantees for hard safety constraint satisfaction in myopic control by introducing $-\infty$ as a possible goodness function value. After that, we develop a robust myopic control strategy to deal with imperfect state observations. Given a fixed observation error bound, the method explores the set of all possible trajectories and finds an optimal control action that leads to the best worst-case performance. We then apply the method in an asteroid landing scenario based on the OSIRIS-REx mission. The result shows that when the system is moving in the vicinity of the hard constraint boundary, large observation errors will lead to constraint violation using the nominal myopic control strategy. However, by considering all possible trajectories in determining the control action, the robust myopic control approach diminishes the risk of violating constraints and thus improves system safety.

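ACKNOWLEDGMENT

This research was funded by grants from the National Science Foundation. The authors thank Srilakshmi Pattabiraman for offering helpful ideas in formulating the initial problem.

APPENDIX: PROOF OF THEOREM 1

Proof. Let $t = k(m+1)\varepsilon$, where $\varepsilon$ is the duration of each test control. In the following, we use $\phi^*$ to denote the worst case trajectory in $B(\phi^{\mathrm{obs}}, \bar{\varepsilon})$ in the sense of (12), where $\bar{\varepsilon}$ is the observation error bound. Again, we abuse notation and denote by $u^*$ both the control law obtained by the robust myopic control, and its particular value at time $t = k(m+1)\varepsilon$; the use is clear from the context. In the remainder of the text, we omit the time variable: $x_j^{\mathrm{true}} = \phi^{\mathrm{true}}(t_0 + j\varepsilon)$, $x_j^* = \phi^*(t_0 + j\varepsilon)$, where $t_0 = (k-1)(m+1)\varepsilon$, and all trajectories are defined on $[0, t]$.

We note that

$$\begin{aligned} \Big| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - \max_{u \in \mathcal{U}} G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u, x^{\mathrm{true}})) \Big| \le{} & \big| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(\tilde{u}, x^{\mathrm{true}})) \big| \\ & + \Big| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(\tilde{u}, x^{\mathrm{true}})) - \max_{u \in \mathcal{U}} G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u, x^{\mathrm{true}})) \Big| \end{aligned} \tag{20}$$

by triangle inequality, where $\tilde{u}$, given by (9), is the myopically optimal control based on learned dynamics for trajectory $\phi^{\mathrm{true}}$. The last summand in (20) would be equal to 0 if $v = v^{\mathrm{true}}$, i.e., if the system perfectly learned the dynamics. However, since $v \ne v^{\mathrm{true}}$, this summand can be bounded from Theorem 13 in [10] by

$$\frac{4 L M_0 M_1 (m+1)^3 (4m^{3/2} + \delta)\varepsilon}{\delta}. \tag{21}$$

Now, based on the triangle inequality,

$$\begin{aligned} \big| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(\tilde{u}, x^{\mathrm{true}})) \big| \le{} & \big| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^*, v(u^*, x^*)) \big| \\ & + \big| G(\phi^*, v(u^*, x^*)) - G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(\tilde{u}, x^{\mathrm{true}})) \big|, \end{aligned} \tag{22}$$

where, through an understandable abuse of notation, $v$ is obtained from learning using the corresponding trajectory: e.g., $v(u^*, x^*) = \sum_{j=0}^{m} \lambda_j (x^*_{j+1} - x^*_j)/\varepsilon$. Define

$$\|\phi^{\mathrm{true}} - \phi^*\| := \max_{x_j^{\mathrm{true}} \in \phi^{\mathrm{true}},\ x_j^* \in \phi^*,\ j \in [m]} \|x_j^{\mathrm{true}} - x_j^*\|. \tag{23}$$

The dynamics learned from $x^*$ approximate true dynamics when $x^{\mathrm{true}}$ and $x^*$ are close. Define $\Delta_j = x_j^* - x_j^{\mathrm{true}}$ for all $j \in [m]$.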
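Since the goodness function $G$ has Lipschitz constant $L$, the first part on the right hand side of (22) can be bounded by

$$\begin{aligned} \big| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^*, v(u^*, x^*)) \big| &\le L\big( \|\phi^{\mathrm{true}} - \phi^*\| + \|v^{\mathrm{true}}(u^*, x^{\mathrm{true}}) - v(u^*, x^*)\| \big) \\ &\le L\big( \max_{j \in [m]} \|x_j^{\mathrm{true}} - x_j^*\| + \|v^{\mathrm{true}}(u^*, x^{\mathrm{true}}) - v(u^*, x^*)\| \big) \\ &\le L\big( \max_{j \in [m]} \|\Delta_j\| + \|v^{\mathrm{true}}(u^*, x^{\mathrm{true}}) - v(u^*, x^*)\| \big). \end{aligned} \tag{24}$$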

13 We now bond v tre (, x tre ) v(, x ) : v tre (, x tre ) v(, x ) = (f(x) m g i i ) λ x j1 x j j = (f(x) m g i i ) λ x tre j1 j1 xtre j j j (f(x) m g i i ) λ x tre j1 xtre j j m λ j1 j j (f(x) m g i i ) λ x tre j1 xtre j j 2 max j [m] j ( ) m 1 4m δ 2M 0 M 1 (m 1) 3 (4m 3 2 δ) δ 2 max j [m] j ( ) m 1 4m, δ (25) where the bond for m λ j together with the dynamics learning error bond are proved in [10]. Combining (24) and (25), we find the bond for the first part on the right hand of (22) as G(φ tre, v tre (, x tre )) G(φ, v(, x )) ( 2LM 0 M 1 (m 1) 3 (4m 3 2 δ) δ L max j [m] j 2 max j [m] j ) m (1 4m δ ) (26) Let s now bond G(φ, v(, x )) G(φ tre, v tre (, x tre )). Applying the optimal control generated from perfect state observations on all possible tre trajectories, there exists a trajectory that receives the minimm goodness fnction vale. Denote this trajectory and the states on it as φ 0 and x0. Then, we have G(φ, v(, x )) G(φ 0, v(, x 0 )). (27) Considering that G(φ 0, v(, x 0 )) G(φ tre, v tre (, x tre )) G(φ 0, v(, x 0 )) G(φ tre, v(, x tre )) G(φ tre, v(, x tre )) G(φ tre, v tre (, x tre )) L(2 max j [m] j v(, x 0 ) v(, x tre ) ) 2LM 0 M 1 (m 1) 3 (4m 3 2 δ) δ, (28) 13

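and

$$
\begin{aligned}
\left\| v(u^*, x^0) - v(u^*, x^{\mathrm{true}}) \right\|
&= \left\| \sum_{j=0}^{m} \lambda_j \frac{x_{j+1}^0 - x_j^0}{\varepsilon} - \sum_{j=0}^{m} \lambda_j \frac{x_{j+1}^{\mathrm{true}} - x_j^{\mathrm{true}}}{\varepsilon} \right\| \\
&= \left\| \sum_{j=0}^{m} \lambda_j \frac{(x_{j+1}^0 - x_{j+1}^{\mathrm{true}}) - (x_j^0 - x_j^{\mathrm{true}})}{\varepsilon} \right\| \\
&\le 2 \max_{j \in [m]} \left\| \Delta_j \right\| \frac{m}{\varepsilon} \left( 1 + \frac{4m}{\delta} \right),
\end{aligned}
\tag{29}
$$

we obtain

$$
\begin{aligned}
&G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\hat{\phi}, v(\hat{u}, \hat{x})) \\
&\quad \le G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^0, v(u^*, x^0)) \\
&\quad \le L \left( 2 \max_{j \in [m]} \left\| \Delta_j \right\| + 2 \max_{j \in [m]} \left\| \Delta_j \right\| \frac{m}{\varepsilon} \left( 1 + \frac{4m}{\delta} \right) \right) + 2 L M_0 M_1 (m+1)^3 \left(4m^{3/2} + \delta\right) \frac{\varepsilon}{\delta}.
\end{aligned}
\tag{30}
$$

On the other hand, as $\hat{\phi}$ is the worst case trajectory under control $\hat{u}$, and $u^*$ is the myopically optimal control for trajectory $\phi^{\mathrm{true}}$, we have

$$
G(\hat{\phi}, v(\hat{u}, \hat{x})) \le G(\phi^{\mathrm{true}}, v(\hat{u}, x^{\mathrm{true}})) \le G(\phi^{\mathrm{true}}, v(u^*, x^{\mathrm{true}})).
\tag{31}
$$

Hence, $G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\hat{\phi}, v(\hat{u}, \hat{x}))$ can be bounded from below:

$$
\begin{aligned}
&G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\hat{\phi}, v(\hat{u}, \hat{x})) \\
&\quad \ge G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^{\mathrm{true}}, v(u^*, x^{\mathrm{true}})) \\
&\quad \ge -\left| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) - G(\phi^{\mathrm{true}}, v(u^*, x^{\mathrm{true}})) \right| \\
&\quad \ge -2 L M_0 M_1 (m+1)^3 \left(4m^{3/2} + \delta\right) \frac{\varepsilon}{\delta}.
\end{aligned}
\tag{32}
$$

From (30) and (32), we obtain

$$
\begin{aligned}
&\left| G(\hat{\phi}, v(\hat{u}, \hat{x})) - G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u^*, x^{\mathrm{true}})) \right| \\
&\quad \le L \left( 2 \max_{j \in [m]} \left\| \Delta_j \right\| + 2 \max_{j \in [m]} \left\| \Delta_j \right\| \frac{m}{\varepsilon} \left( 1 + \frac{4m}{\delta} \right) \right) + 2 L M_0 M_1 (m+1)^3 \left(4m^{3/2} + \delta\right) \frac{\varepsilon}{\delta}.
\end{aligned}
\tag{33}
$$

Combining (21), (22), (26), and (33), we obtain

$$
\begin{aligned}
&\left| G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(\hat{u}, x^{\mathrm{true}})) - \max_{u \in U} G(\phi^{\mathrm{true}}, v^{\mathrm{true}}(u, x^{\mathrm{true}})) \right| \\
&\quad \le 8 L M_0 M_1 (m+1)^3 \left(4m^{3/2} + \delta\right) \frac{\varepsilon}{\delta} + L \left( 3 \max_{j \in [m]} \left\| \Delta_j \right\| + 4 \max_{j \in [m]} \left\| \Delta_j \right\| \frac{m}{\varepsilon} \left( 1 + \frac{4m}{\delta} \right) \right) \\
&\quad \le 8 L M_0 M_1 (m+1)^3 \left(4m^{3/2} + \delta\right) \frac{\varepsilon}{\delta} + 2 L \Delta \left( 3 + \frac{4m}{\varepsilon} \left( 1 + \frac{4m}{\delta} \right) \right).
\end{aligned}
\tag{34}
$$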
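The finite-difference step used in (25) and (29) can be checked numerically. The following Python sketch is illustrative only and is not part of the original derivation: the state values, the sampling step `eps`, and the perturbation size `delta_obs` are hypothetical choices made for the example. It confirms that perturbing every state by at most `delta_obs` changes each difference quotient $(x_{j+1} - x_j)/\varepsilon$ by at most `2 * delta_obs / eps`, which follows directly from the triangle inequality.

```python
import numpy as np


def finite_difference_velocity(x_next, x_curr, eps):
    """Difference quotient (x_{j+1} - x_j) / eps for one pair of states."""
    return (np.asarray(x_next, dtype=float) - np.asarray(x_curr, dtype=float)) / eps


def max_velocity_gap(x_ref, x_pert, eps):
    """Largest norm of the gap between difference quotients of two state sequences."""
    gaps = []
    for j in range(len(x_ref) - 1):
        v_ref = finite_difference_velocity(x_ref[j + 1], x_ref[j], eps)
        v_pert = finite_difference_velocity(x_pert[j + 1], x_pert[j], eps)
        gaps.append(np.linalg.norm(v_ref - v_pert))
    return max(gaps)


# Hypothetical example values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
eps = 0.01          # sampling step between consecutive states
delta_obs = 1e-3    # norm of the perturbation applied to every state

x_ref = rng.normal(size=(6, 3))  # six reference states in R^3
noise = rng.normal(size=x_ref.shape)
# Rescale each row so that every state is perturbed by exactly delta_obs in norm.
noise *= delta_obs / np.linalg.norm(noise, axis=1, keepdims=True)
x_pert = x_ref + noise

gap = max_velocity_gap(x_ref, x_pert, eps)
bound = 2 * delta_obs / eps  # triangle-inequality bound on each difference quotient

print(f"largest gap: {gap:.6f}, bound: {bound:.6f}")
```

The gap grows as `eps` shrinks for a fixed `delta_obs`, which is why the observation-error terms in the final bound scale with $1/\varepsilon$ while the learning-error term scales with $\varepsilon$.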

REFERENCES

[1] R. S. Sutton, A. G. Barto, et al., Reinforcement learning: An introduction. MIT Press,
[2] S. Ulamec and J. Biele, Surface elements and landing strategies for small bodies missions - Philae and beyond, Advances in Space Research, Vol. 44, No. 7, 2009, pp
[3] J. Gal-Edd and A. Cheuvront, The OSIRIS-REx asteroid sample return mission operations design, IEEE Aerospace Conference, 2015, pp
[4] K. Berry, B. Sutter, A. May, K. Williams, B. W. Barbee, M. Beckman, and B. Williams, OSIRIS-REx touch-and-go (TAG) mission design and analysis, 36th Annual AAS Guidance and Control Conference,
[5] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences, Vol. 113, No. 15, 2016, pp
[6] D. J. Hills, A. M. Grütter, and J. J. Hudson, An algorithm for discovering Lagrangians automatically from data, PeerJ Computer Science, Vol. 1, 2015, p. e31.
[7] S. Khadraoui, H. N. Nounou, M. N. Nounou, A. Datta, and S. P. Bhattacharyya, Adaptive Controller Design for Unknown Systems Using Measured Data, Asian Journal of Control, Vol. 18, No. 4, 2016, pp
[8] M. Ahmadi, A. Israel, and U. Topcu, Safety assessment based on physically-viable data-driven models, 56th IEEE Conference on Decision and Control, 2017, pp
[9] A. Faust, P. Ruymgaart, M. Salman, R. Fierro, and L. Tapia, Continuous action reinforcement learning for control-affine systems with unknown dynamics, IEEE/CAA Journal of Automatica Sinica, Vol. 1, No. 3, 2014, pp
[10] M. Ornik, A. Israel, and U. Topcu, Control-oriented learning on the fly, arXiv,
[11] P. Singla, K. Subbarao, and J. L. Junkins, Adaptive output feedback control for spacecraft rendezvous and docking under measurement uncertainty, Journal of Guidance, Control, and Dynamics, Vol. 29, No. 4, 2006, pp
[12] T. Schouwenaars, J. How, and E. Feron, Receding horizon path planning with implicit safety guarantees, American Control Conference, Proceedings of the 2004, Vol. 6, IEEE, 2004, pp
[13] J. Van Den Berg, P. Abbeel, and K. Goldberg, LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information, The International Journal of Robotics Research, Vol. 30, No. 7, 2011, pp
[14] L. Kirkup and R. B. Frenkel, An introduction to uncertainty in measurement: using the GUM (guide to the expression of uncertainty in measurement). Cambridge University Press,
[15] A. Bemporad, F. Borrelli, and M. Morari, Min-max control of constrained uncertain discrete-time linear systems, IEEE Transactions on Automatic Control, Vol. 48, No. 9, 2003, pp
[16] P. J. Campo and M. Morari, Robust model predictive control, American Control Conference, IEEE, 1987, pp
[17] M. C. Nolan, C. Magri, E. S. Howell, L. A. Benner, J. D. Giorgini, C. W. Hergenrother, R. S. Hudson, D. S. Lauretta, J.-L. Margot, S. J. Ostro, and D. J. Scheeres, Shape model and surface properties of the OSIRIS-REx target Asteroid (101955) Bennu from radar and lightcurve observations, Icarus, Vol. 226, No. 1, 2013, pp
[18] R. Furfaro, B. Gaudet, D. R. Wibben, J. Kidd, and J. Simo, Development of non-linear guidance algorithms for asteroids close-proximity operations, AIAA Guidance, Navigation, and Control Conference,
[19] M. Lindner, I. Schiller, A. Kolb, and R. Koch, Time-of-flight sensor calibration for accurate range sensing, Computer Vision and Image Understanding, Vol. 114, No. 12, 2010, pp
