Energy-based Swing-up of the Acrobot and Time-optimal Motion

Ravi N. Banavar
Systems and Control Engineering, Indian Institute of Technology, Bombay, Mumbai-400076, India
Email: banavar@ee.iitb.ac.in
Telephone: (91)-(22) 2576 7888, Fax: (91)-(22) 2572 377

Arun D. Mahindrakar
Systems and Control Engineering, Indian Institute of Technology, Bombay, Mumbai-400076, India
Email: arun@ee.iitb.ac.in

Abstract

We present a control law for the swing-up of an acrobot with torque constraints on the actuator. The strategy works for initial conditions anywhere on the state manifold, and we further guarantee that the system reaches a small neighbourhood of the upward equilibrium position. For a restricted domain of initial conditions, we observe that the control law is bang-bang in nature. This motivates us to verify the time-optimality of the control strategy. Necessary conditions for time-optimality are presented and subsequently verified numerically.

Index Terms

nonholonomic systems, underactuated manipulator, energy-based control, time-optimal.

I. INTRODUCTION

Many control strategies have been presented for the acrobot [1], [2], [3], [4]. Most of them do not account for actuator saturation. Further, the domain of admissible initial conditions for many of them is restrictive. Notions of time-optimality [5], [6], [7] of the acrobot motions have not received much attention either. In this paper we first present a swing-up strategy for the acrobot that brings it to a small region around the upward equilibrium point. The domain of the initial condition for this control strategy is the entire manifold. We then examine this strategy for time-optimality on a restricted domain of initial conditions.

The paper is organised as follows. In Section 2 we formulate the acrobot dynamics in a Hamiltonian framework. The Hamiltonian framework is advantageous in verifying time-optimality.
Section 3 presents the global control law that places the acrobot at a prescribed energy level. Section 4 presents the discussion on time-optimality. Section 5 numerically verifies the time-optimality and Section 6 concludes the paper.

II. HAMILTONIAN FORMULATION OF THE ACROBOT DYNAMICS

[Fig. 1. The Acrobot. Labels in the figure: gravity $g$; link 1 and link 2; actuator; joint angles $q_1$, $q_2$; $m_1, m_2$ = link masses; $l_1, l_2$ = link lengths; $I_1, I_2$ = link moments of inertia; $l_{c1}, l_{c2}$ = centers of mass.]

For our purpose it is advantageous to use the Hamiltonian framework since it results in a constant control vector field [8], and this special structure proves useful in verifying the necessary conditions for the time-optimal problem that we formulate later. We write the equations of motion of the acrobot (schematic shown in Figure 1), defined on the configuration manifold $Q = S^1 \times S^1$, using a Hamiltonian formulation. The configuration space $Q$ is parametrized by the joint angles $(q_1, q_2)$ and the generalized momentum is defined

as $p = D(q)\dot q$, where $D(q)$ is the inertia matrix

\[ D(q) = \begin{bmatrix} c_1 + c_2 + 2c_3\cos q_2 & c_2 + c_3\cos q_2 \\ c_2 + c_3\cos q_2 & c_2 \end{bmatrix}. \]

The inertial parameters are collected in the constants $c_i$, $i = 1,\ldots,5$:

\[ c_1 = m_1 l_{c1}^2 + m_2 l_1^2 + I_1,\quad c_2 = m_2 l_{c2}^2 + I_2,\quad c_3 = m_2 l_1 l_{c2},\quad c_4 = m_1 l_{c1} + m_2 l_1,\quad c_5 = m_2 l_{c2}. \]

With the state vector defined as $z = (z_1, z_2, z_3, z_4) = (q_1 - \pi/2,\ q_2,\ p_1,\ p_2)$, the Hamiltonian system is given by

\[ \dot z = f(z) + b u, \tag{1} \]

where the state space manifold is $M = S^1 \times S^1 \times \mathbb{R}^2$ and the smooth vector fields $f$ and $b$ are given by

\[ f(z) = \begin{bmatrix} D^{-1}\begin{bmatrix} z_3 \\ z_4 \end{bmatrix} \\[4pt] -\dfrac{\partial V}{\partial z_1} \\[4pt] -\dfrac{1}{2}\begin{bmatrix} z_3 & z_4 \end{bmatrix}\dfrac{\partial D^{-1}}{\partial z_2}\begin{bmatrix} z_3 \\ z_4 \end{bmatrix} - \dfrac{\partial V}{\partial z_2} \end{bmatrix}, \qquad b = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \]

with $V = c_4 g\cos z_1 + c_5 g\cos(z_1 + z_2)$ and $u = \tau_2 \in \Omega$, the class of admissible controls defined as

\[ \Omega = \{u \in \mathbb{R} : |u| \le \beta,\ \beta > 0\}. \]

The equilibrium solutions of (1) with the input equal to zero constitute an important class of solutions. The set of equilibrium solutions $z^e$ corresponding to $u = 0$ is given by

\[ \{z^e \in M : z_1^e = k_1\pi,\ z_2^e = k_2\pi,\ z_3^e = z_4^e = 0\},\quad k_1, k_2 = 0, 1. \]

III. A GLOBAL DISCONTINUOUS CONTROL LAW

The control objective is to bring the acrobot to the energy level of the upward equilibrium point. The first objective is therefore to pump in the requisite energy, and the second is to guarantee that the system reaches a small neighbourhood of the upward equilibrium point. The latter fact is proved using Birkhoff's theorem on ω-limit points [9]. Denote the energy of the acrobot at the four equilibrium positions by $E_{dd}, E_{du}, E_{ud}, E_{uu}$, where the first subscript denotes the position of the first link and the second subscript denotes the position of the second link; the subscript $u$ denotes the upright position and the subscript $d$ the downward position of the link. Let $E_{uu} = (c_4 + c_5)g$ and define
\[ \hat E(z) = E(z) - E_{uu}. \]

Theorem 1: Given the torque constraint $|u(t)| \le \beta$, the control law

\[ u = \begin{cases} \delta\ (\delta \text{ is small}) & \text{if } z \in B \\ -\beta\,\mathrm{sign}[\hat E(z)\chi(z)] & \text{otherwise} \end{cases} \tag{2} \]

where $B = \{(\pi, 0, 0, 0),\ (\pi, \pi, 0, 0),\ (0, \pi, 0, 0)\}$ and

\[ \chi(z) = \frac{(c_1 + c_2 + 2c_3\cos z_2)z_4 - (c_2 + c_3\cos z_2)z_3}{c_1 c_2 - c_3^2\cos^2 z_2}, \]

moves the acrobot to the energy level $\hat E(z) = 0$.

Proof: Note that the positive definiteness of the inertia matrix ensures $(c_1 c_2 - c_3^2\cos^2 z_2) > 0\ \forall z_2 \in S^1$, and so $\chi(z)$ is defined for all $z \in M$. Further, $\chi(z) = 0$ when $z_3 = z_4 = 0$, and $\chi(z) \equiv 0$ only at the equilibrium points. Now let us examine the dynamics of $\hat E$, for which we consider a candidate Lyapunov function $V_2 : M \to \mathbb{R}$ defined by

\[ V_2(z) = \tfrac{1}{2}\hat E^2(z). \tag{3} \]

We have

\[ \dot V_2(z) = \hat E(z)\dot{\hat E} \tag{4} \]

and, using the passivity property of the acrobot, (4) becomes

\[ \dot V_2 = \hat E(z)\chi(z)u. \tag{5} \]

Substituting the control law (2) results in

\[ \dot V_2 = \begin{cases} 0 & \text{if } z \in B \\ -\beta\,|\hat E(z)\chi(z)| & \text{otherwise.} \end{cases} \]

Since $V_2$ is non-increasing, the trajectory is bounded and the solution of the closed-loop system remains inside the compact set $\Omega_c = \{z \in M : V_2(z) \le V_2(z(0))\}$. Let $Q = \{z \in \Omega_c : \dot V_2 = 0\}$. The set $Q$ is given by

\[ Q = \{z \in M : \chi(z) = 0\} \cup \{z \in M : \hat E(z) = 0\}. \]

Let $\mathcal{M}$ be the largest invariant set in $Q$. Now we compute $\mathcal{M}$. Suppose $\hat E(z) \neq 0$ and $\chi(z) = 0$. We have two cases.
1) $\chi(z) = 0$ and $z \in B$. In this case the small control $\delta$ perturbs the system out of these equilibria.
2) $\chi(z) = 0$ and $z \notin B$. The dynamics of the system ensure that the system moves out of such points.
Hence $\mathcal{M} = \{z \in Q : \hat E(z) = 0\}$ is the largest invariant set.

Remark 1: Note that the control law given by (2) reduces to

\[ u = -\beta\,\mathrm{sign}[\hat E(z)\chi(z)] \tag{6} \]

if the initial condition satisfies

\[ \hat E(z(0)) > \max\{(E_{dd} - E_{uu}),\ (E_{du} - E_{uu}),\ (E_{ud} - E_{uu})\}. \]

This condition ensures that the acrobot does not get stuck at any of the intermediate equilibrium positions.

In practice we would like to capture the acrobot in a region close to its upward equilibrium position. Let $\hat T > 0$ be the instant at which $\hat E(z) = -\epsilon_1$, where $\epsilon_1 > 0$ is sufficiently small. We let the control $u(t) = 0$, $t > \hat T$, and thus the state $z$ evolves on the set defined by

\[ \Pi = \{z \in M : E(z) = E_{uu} - \epsilon_1\}. \tag{7} \]

Lemma 3.1: $\Pi$ is the one and only non-empty, closed and invariant set in $\Pi$.

Proof: We return to the Lagrangian formulation for this proof. Let $(x_1, x_2, x_3, x_4)$ be the state variables and let $K$ be any arbitrary, non-empty, closed set in $\Pi$. Then $K$ is of the form $K = \{[a_1, a_2] \times [b_1, b_2]\} \subset \Pi$, where $a_1, a_2, b_1, b_2 \in [0, 2\pi)$. Note that once the link angles $(a_p, b_p)$ are specified, the link velocities $(d_p, e_p)$ must satisfy the equation

\[ V(a_p, b_p) + T(b_p, d_p, e_p) = E_{uu} - \epsilon_1 \tag{8} \]

where

\[ V(a_p, b_p) = c_4 g\cos a_p + c_5 g\cos(a_p + b_p), \]
\[ T(b_p, d_p, e_p) = \frac{d_p^2}{2}(c_1 + c_2 + 2c_3\cos b_p) + d_p e_p(c_2 + c_3\cos b_p) + \frac{c_2 e_p^2}{2}. \]

Notice that (8) is quadratic in both $d_p$ and $e_p$. Define

\[ \kappa_1 = c_2/2 > 0, \]
\[ \kappa_2 = d_p(c_2 + c_3\cos b_p), \]
\[ \kappa_3 = E_{uu} - \epsilon_1 - \frac{d_p^2}{2}(c_1 + c_2 + 2c_3\cos b_p) - c_4 g\cos a_p - c_5 g\cos(a_p + b_p), \]

and rewrite the quadratic equation as

\[ -\kappa_3 + \kappa_2 e_p + \kappa_1 e_p^2 = 0. \]

A permissible $d_p$ satisfies $\kappa_2^2 + 4\kappa_1\kappa_3 \ge 0$, and there exist two values of $e_p$ given by

\[ \dot x_2\big|_{(a_p, b_p, d_p)} = \frac{-\kappa_2 \pm \sqrt{\kappa_2^2 + 4\kappa_1\kappa_3}}{2\kappa_1}. \tag{9} \]

Now consider the left-extreme link angles $(a_1, b_1) \in K$ and an admissible $d_1$. Two cases are possible.
$\kappa_3 > 0$. Then one of the solutions is $e_1 < 0$ and the trajectory originating at $(a_1, b_1, d_1, e_1) \in K$ leaves $K$.
$\kappa_3 = 0$. Then one of the solutions is $e_1 = 0$ and the other is $-\kappa_2/\kappa_1$. If $\kappa_2 > 0$ then once again the trajectory originating at $(a_1, b_1, d_1, e_1) \in K$ leaves $K$.
If $\kappa_2 \le 0$, we cannot reach any conclusion and we then consider a pair of link angles $(a_p, b_1)$ such that $a_p > a_1$ and $\kappa_3 > 0$. Note that such an $a_p$ exists since $\kappa_3$ is not a constant function with respect to $a_p$. Then one of the values of $e_p$ is negative and the trajectory originating at $(a_p, b_1, d_p, e_p) \in K$ leaves $K$. Hence the set $K$ is not invariant. But $K$ is any arbitrary, non-empty, closed set in $\Pi$. Hence $\Pi$ is the only invariant set.

Definition 3.1: Let $y^o(t)$ denote an integral curve of a system $\dot y = G(y)$ which is assumed to be defined for all $t < \infty$. A point $\bar y$ is said to be an ω-limit point of $y^o(t)$ if there exists an increasing sequence of values of $t$ such that

\[ t_1 < t_2 < t_3 < \ldots < t_k,\quad \lim_{k\to\infty} t_k = \infty,\quad \lim_{k\to\infty} y^o(t_k) = \bar y. \]

The set $\Omega^o$ of all ω-limit points of $y^o(t)$ is the ω-limit set of $y^o(t)$.

Theorem 2: [G. D. Birkhoff] Suppose $y^o(t)$ is a bounded trajectory. Its ω-limit set $\Omega^o$ is non-empty, closed and invariant under the flow $\phi_t^G$.

Theorem 3: Let $\xi = (a_{ext_i}, b_{ext_i}, 0, 0) \in \Pi$. Then $\xi$ is an ω-limit point of any trajectory that begins in the set $\Pi$.

Proof: Any trajectory that originates in the set $\Pi$ is bounded. From Birkhoff's theorem [9], the ω-limit set of any such bounded trajectory is non-empty, closed and invariant. From Lemma 3.1, the only set that satisfies these properties is $\Pi$. It follows that $\xi$ is an ω-limit point.

Remark 2: Note that for $\epsilon_1 = 0$, the upward equilibrium point is an ω-limit point of any trajectory that originates in $\Pi$.

Once the system is in the neighbourhood of the upward equilibrium point, a linear feedback control can be switched on to balance the acrobot at the desired configuration. In the following section we investigate the time-optimality of the proposed control.

IV. TIME-OPTIMAL CONTROL

Consider the energy-pumping control law as in (6) with a restriction in the domain of the initial condition. The

bang-bang nature of the law motivates us to pose the question: is the control (6) optimal in the sense that it achieves energy-pumping in minimum time? While obtaining necessary and sufficient conditions for this problem is not trivial, we suggest a procedure to check the necessary conditions for time-optimality.

Time-optimal problem: Minimize the performance measure

\[ J(t_f) = t_f \tag{10} \]

for the system (1) with the initial condition

\[ \{z(0) \in M : \hat E(z(0)) > \max\{(E_{dd} - E_{uu}),\ (E_{du} - E_{uu}),\ (E_{ud} - E_{uu})\}\} \]

and the constraint that the final state lies on the surface $S(z(t_f)) = 0$, where $S : M \to \mathbb{R}$ is defined by

\[ S(z) = E(z) - E_{uu} + \epsilon_1. \tag{11} \]

The control belongs to the set $\{u(\cdot) : \mathbb{R} \to \mathbb{R} : |u(t)| \le \beta\}$.

Writing the Hamiltonian for the above problem as

\[ H(z, u, \lambda) = \lambda^T[f(z) + bu] \tag{12} \]

where $\lambda(t) \in \mathbb{R}^4$ is the co-state vector, and denoting the optimal trajectories and control law by the superscript $*$, from Pontryagin's minimum principle (PMP) the optimal control law satisfies

\[ H(z^*, u^*, \lambda^*) \le H(z^*, u, \lambda^*) \]

for all admissible $u$. The above inequality leads to a bang-bang control law of the form

\[ u^* = -\beta\,\mathrm{sign}[\lambda^{*T} b]. \tag{13} \]

The necessary conditions for optimality are:
1) $z^*$ and $\lambda^*$ are the solutions of the canonical equations

\[ \dot z^* = \frac{\partial H}{\partial \lambda}(z^*, u^*, \lambda^*) = f(z^*) + bu^* \tag{14} \]
\[ \dot\lambda^* = -\left[\frac{\partial f}{\partial z} + \frac{\partial b}{\partial z}u^*\right]^T_{(z=z^*)}\lambda^* \tag{15} \]

with the boundary conditions $z^*(0) = z_0$.
2) The variation $\delta z_f$ should be such that it satisfies the transversality condition [10]

\[ \nabla S(z)^T\big|_{(z=z^*(t_f))}\,\delta z_f = 0 \tag{16} \]

and

\[ \lambda^*(t_f)^T\delta z_f = 0. \tag{17} \]

3) The Hamiltonian at the final time $t_f$ is

\[ H(z^*(t_f), u^*(t_f), \lambda^*(t_f)) = -1. \tag{18} \]

Theorem 4: Let $z_b$ be the trajectory generated by the control law given by (6). Suppose the switching times are $t_{s1}, \ldots, t_{sp}$. Let $\Phi(t, t_f)$ be the state-transition matrix of the equation

\[ \dot\psi = -\left(\frac{\partial f(z)}{\partial z}\right)^T_{(z=z_b)}\psi,\quad \psi(t) \in \mathbb{R}^4,\ t \in [0, t_f]. \]

A necessary condition for time-optimality of the trajectory $z_b$ is that the vector $[\alpha_1\ \alpha_2\ \alpha_3\ 1]^T$, where

\[ \alpha_i = \left(\frac{\partial S}{\partial z_i}\Big/\frac{\partial S}{\partial z_4}\right)_{(z=z_b(t_f))},\quad i = 1, 2, 3, \]

is orthogonal to the subspace spanned by the vectors $\{\Phi(t_{si}, t_f)^T b\}$.
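Proof: We first note that $\partial S/\partial z_4 = \chi(z)$ and $\chi(z_b(t_f)) \neq 0$. Therefore $\alpha_i|_{z=z_b(t_f)}$, $i = 1, 2, 3$, are well-defined. Now, equations (16) and (17) can be recast into a single one as

\[ \lambda(t_f)^T Q(z_b(t_f))\begin{bmatrix} \delta z_{1f} \\ \delta z_{2f} \\ \delta z_{3f} \end{bmatrix} = 0 \tag{19} \]

where

\[ Q(z_b(t_f)) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -\alpha_1 & -\alpha_2 & -\alpha_3 \end{bmatrix}. \]

Here (16) has been used to eliminate $\delta z_{4f} = -(\alpha_1\delta z_{1f} + \alpha_2\delta z_{2f} + \alpha_3\delta z_{3f})$. Since $[\delta z_{1f}\ \delta z_{2f}\ \delta z_{3f}]^T$ is arbitrary, its coefficient must be zero. Accordingly, we have $Q^T(z_b(t_f))\lambda(t_f) = 0$, or $\lambda(t_f) \in \mathcal{N}(Q^T(z_b(t_f)))$, where $\mathcal{N}$ denotes the null space. The non-trivial solution of the above equation is of the form

\[ \lambda(t_f) = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ 1 \end{bmatrix}\lambda_4(t_f). \tag{20} \]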

In view of the constant control vector field, the costate equation (15) becomes

\[ \dot\lambda = -\left(\frac{\partial f(z)}{\partial z}\right)^T_{(z=z_b)}\lambda. \tag{21} \]

Denote $A(z_b) = -\left(\frac{\partial f}{\partial z}\right)^T_{(z=z_b)}$. The solution to (21) is of the form

\[ \lambda(t) = \Phi(t, t_f)\lambda(t_f) \tag{22} \]

where $\Phi(t, t_f) \in \mathbb{R}^{4\times 4}$ is the state transition matrix, which is nonsingular and satisfies

\[ \dot\Phi(t, t_f) = A(z_b)\Phi(t, t_f),\quad \Phi(t_f, t_f) = I. \tag{23} \]

Suppose the switching times are $t_{s1}, \ldots, t_{sp}$. Then from the necessary condition (13) we have

\[ b^T\lambda(t_{si}) = b^T\Phi(t_{si}, t_f)\lambda(t_f) = 0 \quad \text{for each } i = 1, \ldots, p \tag{24} \]

or

\[ \langle \Phi(t_{si}, t_f)^T b,\ [\alpha_1\ \alpha_2\ \alpha_3\ 1]^T \rangle = 0 \quad \text{for each } i = 1, \ldots, p. \]

Remark 3: Note that the Hamiltonian at the final time is

\[ \lambda^T(t_f)\big(f(z_b(t_f)) + bu(t_f)\big) = -1. \]

Now $\langle f(z_b(t_f)), \lambda(t_f) \rangle = 0$ since

\[ \begin{bmatrix} 0 & I_{2\times 2} \\ -I_{2\times 2} & 0 \end{bmatrix}\nabla S = f(z), \]

and the condition on the Hamiltonian reduces to

\[ u(t_f)\langle b, \lambda(t_f) \rangle = -1. \]

Hence $\lambda_4(t_f) = -\dfrac{1}{u(t_f)}$, where $u(t_f) = \pm\beta$.

We next present the numerical verification of these results.

V. NUMERICAL VERIFICATION OF PONTRYAGIN'S MINIMUM PRINCIPLE

To verify the necessary condition for the time-optimality of the proposed control law (6), we follow these steps.
1) Fix $\beta$ and the initial condition $z(0)$.
2) Apply the control (6) to the system till the instant $\hat E = -\epsilon_1$. Denote that instant as $t_f$. Store the resulting time histories of the state $z_b$ from $0$ to $t_f$. Define $s(t) = \hat E(z_b)\chi(z_b)$.
3) Compute $\alpha_1, \alpha_2, \alpha_3$.
4) Numerically solve (23) over the interval $[0, t_f]$ to obtain $\Phi(t, t_f)$. The costate is obtained as $\lambda(t) = \Phi(t, t_f)\lambda(t_f)$, $t \in [0, t_f]$.
5) The generated switching function is given by $s_a(t) = \langle b, \lambda(t) \rangle = \lambda_4(t)$.
6) Verify the orthogonality condition (24). This requires that the switching functions $s$ and $s_a$ match. By matching it is meant that they satisfy $s(t_{si}) = s_a(t_{si}) = 0$, $i = 1, \ldots, p$.

A. Simulation results

The acrobot parameters used in the simulations are $l_1 = 1$ m, $l_2 = 2$ m, $m_1 = 1$ kg, $m_2 = 2$ kg, $I_1 = 0.083$ kg m$^2$, $I_2 = 0.667$ kg m$^2$. We let $\beta = 4$ and the initial vector is $z(0) = (\pi, 0, 0.5, 0.5)$.
With these we have one switching ($p = 1$). The control $u(t)$ is switched off when $\hat E = -0.1$. The final time is $t_f = 4.664$ seconds. The costate vector at the final time $t_f$ is given by $\lambda(t_f) = (0.9993, 0.585, 0.58, 0.25)$. We repeat the steps (1-6) outlined in Section V for different initial conditions (see Table I). We have the following observations. Numerical results indicate that the number of switchings $p$ influences the time-optimality property. In particular, it is seen that for $p \le 3$, the control law (6) satisfies the necessary conditions for optimality.

TABLE I
EFFECT OF THE NUMBER OF SWITCHINGS p ON THE MATCHING OF SWITCHING FUNCTIONS

Case   z(0)                 p   s and s_a match?
1      (π, 0, 0.5, 0.5)     1   Yes
2      (π, 0, 0, 1)         3   Yes
3      (π, π, 1, 1)         5   No

[Fig. 2. Energy build-up phase. Panels: joint angles $z_1$, $z_2$ (rad); angular momentum $z_3$, $z_4$ (kg m$^2$ s$^{-1}$); Lyapunov function $V$; energy profile; torque (Nm); switching function $s$.]
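Steps 1) and 2) of the procedure in Section V can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code. It assumes uniform links ($l_{c1} = l_1/2$, $l_{c2} = l_2/2$) and $g = 9.8$ m/s$^2$, neither of which is stated in the text, and it uses a fixed-step RK4 integrator with the control held constant over each step. The names `velocities`, `chi`, `energy`, `control`, `rk4_step` and `swing_up`, the step size and the value of `eps1` are all hypothetical choices made for illustration.

```python
import math

# Link data as listed in Section V-A. lc1, lc2 and g are ASSUMPTIONS
# (uniform links, g = 9.8); eps1 is an illustrative tolerance.
m1, m2, l1, l2 = 1.0, 2.0, 1.0, 2.0
I1, I2 = 0.083, 0.667
lc1, lc2, g = 0.5 * l1, 0.5 * l2, 9.8
beta, eps1 = 4.0, 0.1

c1 = m1 * lc1**2 + m2 * l1**2 + I1
c2 = m2 * lc2**2 + I2
c3 = m2 * l1 * lc2
c4 = m1 * lc1 + m2 * l1
c5 = m2 * lc2
E_uu = (c4 + c5) * g  # energy at the upward equilibrium


def velocities(z):
    """Joint velocities D(q)^{-1} p; the second component is chi(z)."""
    _, z2, z3, z4 = z
    d11 = c1 + c2 + 2.0 * c3 * math.cos(z2)
    d12 = c2 + c3 * math.cos(z2)
    det = c1 * c2 - (c3 * math.cos(z2)) ** 2
    return (c2 * z3 - d12 * z4) / det, (d11 * z4 - d12 * z3) / det


def chi(z):
    return velocities(z)[1]


def energy(z):
    """Total energy E(z) = (1/2) p^T D^{-1} p + V."""
    z1, z2, z3, z4 = z
    v1, v2 = velocities(z)
    potential = c4 * g * math.cos(z1) + c5 * g * math.cos(z1 + z2)
    return 0.5 * (z3 * v1 + z4 * v2) + potential


def f(z, u):
    """Right-hand side of (1), f(z) + b u, with b = (0, 0, 0, 1)."""
    z1, z2, _, _ = z
    v1, v2 = velocities(z)
    dz3 = c4 * g * math.sin(z1) + c5 * g * math.sin(z1 + z2)
    dz4 = -c3 * math.sin(z2) * v1 * (v1 + v2) + c5 * g * math.sin(z1 + z2) + u
    return (v1, v2, dz3, dz4)


def control(z):
    """Energy-pumping law (6): u = -beta * sign(E_hat * chi)."""
    s = (energy(z) - E_uu) * chi(z)
    return -beta * math.copysign(1.0, s) if s != 0.0 else 0.0


def rk4_step(z, u, h):
    """One classical RK4 step with the input u held constant."""
    def shift(a, k, s):
        return tuple(x + s * y for x, y in zip(a, k))
    k1 = f(z, u)
    k2 = f(shift(z, k1, h / 2), u)
    k3 = f(shift(z, k2, h / 2), u)
    k4 = f(shift(z, k3, h), u)
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(z, k1, k2, k3, k4))


def swing_up(z0, h=1e-3, t_max=60.0):
    """Apply (6) until E_hat >= -eps1 or t_max; count control switchings."""
    z, t, switches, u_prev = tuple(z0), 0.0, 0, None
    while t < t_max and energy(z) - E_uu < -eps1:
        u = control(z)
        if u_prev is not None and u != u_prev:
            switches += 1
        z, t, u_prev = rk4_step(z, u, h), t + h, u
    return z, t, switches


# Short illustrative run (2 s only) from a state near the hanging position.
z_end, t_end, n_sw = swing_up((math.pi, 0.0, 0.5, 0.5), t_max=2.0)
print(f"t = {t_end:.3f} s, E_hat = {energy(z_end) - E_uu:.3f}, switchings = {n_sw}")
```

The closed-form expression for $\dot z_4$ uses $\partial D^{-1}/\partial z_2 = -D^{-1}(\partial D/\partial z_2)D^{-1}$, which avoids differentiating the inverse explicitly. Steps 3) to 6) would additionally require the Jacobian $\partial f/\partial z$ along the stored trajectory and a backward integration of (23); they are not covered by this sketch.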

VI. CONCLUSIONS

A global discontinuous control law has been presented for the acrobot. It guarantees that the system reaches a small neighbourhood of the upward equilibrium point. For a domain of initial conditions defined by an energy requirement, necessary conditions for time-optimality of the control law have been presented and checked numerically.

ACKNOWLEDGEMENT

This work was supported by the Department of Science and Technology (DST) as a Sponsored Research & Development project (Sanction No. 1/IFD/28/21-2).

REFERENCES

[1] M. W. Spong, "The swing up control problem for the acrobot," IEEE Control Systems Magazine, vol. 15, pp. 49-55, February 1995.
[2] K. J. Åström and K. Furuta, "Swinging up a pendulum by energy control," in Proceedings of the 13th IFAC World Congress, vol. E, (San Francisco), pp. 37-42, 1996.
[3] A. S. Shiriaev, H. Ludvigsen, O. Egeland, and A. L. Fradkov, "Swinging up of non-affine in control pendulum," in Proceedings of the American Control Conference, (San Diego, California), pp. 4039-4044, 1999.
[4] I. Fantoni, R. Lozano, and M. W. Spong, "Energy based control of the pendubot," IEEE Trans. on Automatic Control, vol. 45, pp. 725-729, April 2000.
[5] K. G. Shin and N. D. McKay, "Minimum-time control of robotic manipulators with geometric path constraints," IEEE Trans. on Automatic Control, vol. 30, pp. 531-541, June 1985.
[6] E. D. Sontag and H. J. Sussmann, "Time-optimal control of manipulators," in Proceedings of IEEE International Conference on Robotics & Automation, pp. 1692-1697, 1986.
[7] G. Sahar and J. M. Hollerbach, "Planning of minimum-time trajectories for robot-arms," in Proceedings of the IEEE International Conference on Robotics and Automation, (St. Louis, Missouri), pp. 751-758, 1985.
[8] Y. Chen and A. A. Desrochers, "A proof of the structure of the minimum-time control law of robotic manipulator using a Hamiltonian formulation," IEEE Trans. on Robotics & Automation, vol. 6, pp. 388-393, June 1990.
[9] A. Isidori, Nonlinear Control Systems. New York: Springer Verlag, 1995.
[10] F. L. Lewis and V. L. Syrmos, Optimal Control. New York: John Wiley & Sons, Inc., 1995.