Econ 85/Chatterjee

Introduction to Continuous-Time Dynamic Optimization: Optimal Control Theory

1 States and Controls

The concept of a state in mathematical modeling typically refers to a specification of the quantities that fully describe, at a particular moment in time, the dynamic system being modeled. For example, we could think of a moving spaceship or a growing economy. The distance of the spaceship from the earth, or the stock of goods and the current level of employment in an economy, would then be the state variables of the corresponding systems. One should typically think of a state variable as a stock variable (like the stock of capital in an economy), i.e., a variable whose level is predetermined at any point in time, and which is constrained to evolve continuously over time. Therefore, a state variable does not respond instantaneously to new information or unanticipated shocks; its response is more gradual, unfolding over time.

The rate of change over time in the value of a state variable may depend on the value of that variable, on time itself, or on other variables which can be controlled at any time by the operator of the system. These other variables are called control variables. For example, the velocity of the spaceship, the levers and dials that operate a piece of machinery, or the rates of consumption and investment in an economy are simple illustrations of control variables. Controls are usually flow variables and can respond instantaneously to new information or shocks. Once values are chosen for the control variables (at each date or point in time), the rates of change in the values of the state variables are determined at any time, and, given the initial values of the state variables, so are all future values.

2 A Simple Control Problem: Pontryagin's Maximum Principle

The object of controlling a system is usually to contribute to a given objective.
For example, a firm could choose its inputs over time to minimize its costs or maximize its value; a household could choose its consumption and investment profiles to maximize its lifetime utility; a central planner could allocate resources across the different sectors of an economy over time to maximize some measure of social welfare; and so on. Let us assume that an economic system is described by one state variable, x, and one control variable, u. In general, both x and u can be vectors, but to fix ideas we will concentrate on the notationally simple scalar case. Both x and u are functions of time, t. We assume that planning starts at time t = 0, and that the state variable has a predetermined value at the beginning of the planning horizon, say x(0) = x_0; at the terminal point t = T (where T may or may not be equal to infinity), it is x(T) = x_T. We assume that u(t) is piecewise continuous and differentiable, with trajectory {u(t)}.
The aim is to find the control u* which moves the system from x_0 to x_T in such a way that some objective function is maximized. In its simplest form, the control problem is:

    Maximize_{u(t)}  J = ∫_0^T V(x, u) dt    (1)

    subject to  ẋ = f(x, u)    (2)
                x(0) = x_0 ;  x(T) = x_T,

where V(x, u) is a strictly concave objective function, which is assumed to be autonomous and therefore independent of t. The constraint (2) is an autonomous differential equation describing the equation of motion for the state variable, and f(x, u) is assumed to be concave. In addition, there may or may not be further inequality constraints on x and u (like non-negative consumption or investment).

In principle, there can be more than one time path (or trajectory) for u that solves the differential equation (2). The problem, however, is to choose the one path that maximizes J and simultaneously satisfies the initial and terminal conditions. In order to compare the trajectories generated by alternative controls, we define a criterion called the Hamiltonian function and, consequently, a Hamiltonian system. Observe that, if the state variable x always obeys the equation of motion, then f(x, u) − ẋ = 0 for all t ∈ [0, T]. Thus, using the notion of Lagrange multipliers, we can form an expression λ(t)[f(x, u) − ẋ] for each value of t, and still get a zero value. λ(t) is called the co-state variable of the problem; we will elaborate on its economic significance a little later. Since summing λ(t)[f(x, u) − ẋ] over t in the period [0, T] would still yield a total value of zero, we can write a new objective functional

    L = J + ∫_0^T λ(t)[f(x, u) − ẋ] dt,   or   L = ∫_0^T [V(x, u) + λf(x, u) − λẋ] dt.    (3)

Let us define the Hamiltonian function as

    H(x, u, λ) ≡ V(x, u) + λ f(x, u).    (4)
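Forming the Hamiltonian in (4) is mechanical once V and f are specified. As a minimal sketch, consider a hypothetical consumption example (not one from these notes): V(x, u) = ln(u) and f(x, u) = rx − u, i.e., wealth x earning interest at an assumed rate r and drawn down by consumption u. The first-order condition in u, derived in the next section, can then be checked numerically:

```python
# Hypothetical example: V(x, u) = ln(u), f(x, u) = r*x - u, and the
# Hamiltonian of equation (4): H(x, u, lam) = V(x, u) + lam * f(x, u).
# The names V, f, H, r are illustrative, not from the notes.
import math

r = 0.05  # assumed interest rate

def V(x, u):
    return math.log(u)           # instantaneous payoff

def f(x, u):
    return r * x - u             # equation of motion: x-dot = f(x, u)

def H(x, u, lam):
    return V(x, u) + lam * f(x, u)   # Hamiltonian, eq. (4)

# Stationarity of H in u requires 1/u - lam = 0, i.e. u* = 1/lam.
# Check with a central finite difference at u*:
lam = 2.0
u_star = 1.0 / lam
eps = 1e-6
dH_du = (H(1.0, u_star + eps, lam) - H(1.0, u_star - eps, lam)) / (2 * eps)
print(abs(dH_du) < 1e-6)   # H is (numerically) stationary in u at u*
```

The check confirms that, for this specification, the Hamiltonian is maximized over u exactly where marginal utility 1/u equals the shadow price λ.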
Hence,

    L = ∫_0^T [H(x, u, λ) − λẋ] dt.    (5.1)

When the last integral in (5.1) is integrated by parts¹, we see that

    ∫_0^T λẋ dt = [λ(T)x(T) − λ(0)x(0)] − ∫_0^T x λ̇ dt,    (5)

which allows us to express L as

    L = ∫_0^T [H + λ̇x] dt − λ(T)x(T) + λ(0)x(0).    (6)

Now, suppose that the control variable changes from {u(t)} to {u(t) + du(t)}, with the result that the state trajectory changes from {x(t)} to {x(t) + dx(t)}. Then the change in the Lagrangian, L, is

    dL = ∫_0^T [ (∂H/∂u) du + (∂H/∂x + λ̇) dx ] dt − λ(T) dx(T).

For a maximum to be obtained, dL = 0. This implies the following necessary conditions, also characterized as Pontryagin's Maximum Principle:

    ∂H/∂u = 0  for all t    (7.1)
    λ̇ = −∂H/∂x  for all t    (7.2)
    ẋ = ∂H/∂λ = f(x, u)    (7.3)
    λ(T) dx(T) = 0, i.e., either λ(T) = 0 or dx(T) = 0.    (7.4)

Condition (7.1) states that the Hamiltonian function is maximized by the choice of the control variable at each point along the optimal trajectory. (7.2) gives the rate of change of the co-state variable, λ. The co-state variable can now be assigned a simple interpretation: it is the shadow price associated with the state variable x: it measures the

¹ See appendix at the end of these notes.
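To see conditions (7.1)-(7.4) at work, here is a small numerical sketch of a hypothetical problem (not from the notes): maximize ∫_0^T (x − u²) dt subject to ẋ = u, x(0) = 0, with x(T) free. Here H = x − u² + λu, so (7.1) gives u* = λ/2, (7.2) gives λ̇ = −1, and (7.4) with x(T) free forces λ(T) = 0, hence λ(t) = T − t and u*(t) = (T − t)/2. The code checks that this candidate control yields a larger objective than a perturbed control:

```python
# Hypothetical problem: maximize J = ∫(x - u²) dt, 0 ≤ t ≤ T,
# subject to x-dot = u, x(0) = 0, x(T) free.
# The Maximum Principle gives u*(t) = (T - t)/2 (see lead-in above).
T, n = 1.0, 100_000
dt = T / n

def J(control):
    """Crude Riemann-sum evaluation of the objective along an Euler path."""
    x, total = 0.0, 0.0
    for i in range(n):
        t = i * dt
        u = control(t)
        total += (x - u * u) * dt
        x += u * dt              # Euler step for x-dot = u
    return total

u_star = lambda t: (T - t) / 2           # candidate from (7.1)-(7.4)
u_pert = lambda t: (T - t) / 2 + 0.1     # an arbitrary perturbed control

print(J(u_star) > J(u_pert))  # the Maximum-Principle control does better
```

The exact value of the optimal objective here is T²·... well, 1/12 for T = 1, and the perturbation lowers it by 0.01, which the discretized comparison reproduces.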
effect of an increment in the state variable on the value of the Hamiltonian. (7.3) follows directly from the definition of the Hamiltonian function, and (7.4) refers to the co-state variable at the terminal date: either λ(T) is zero, or, if the terminal value x(T) is given, then dx(T) = 0.

3 Optimal Control with Discounting: The Infinite-Horizon Problem

In many economic problems, the objective function V(.) would represent such things as profits, net benefits, or lifetime utility, and the time horizon may run from t = 0 to t = ∞. We call such a problem an infinite-horizon problem. Such an objective function has to be maximized after first discounting it to the present. Define β as the rate of discount or the rate of time preference. Then the control problem can be specified as:

    Maximize_{u(t)}  J = ∫_0^∞ V(x, u) e^{−βt} dt    (8)

    subject to  ẋ = f(x, u)    (9)
                x(0) = x_0.

We can now modify the results and specifications in section 2 by writing the (discounted) current-value Hamiltonian function:

    H(x, u, λ) = V(x, u) + λ f(x, u),    (10)

where λ(t)e^{−βt} is the present-value multiplier. The Maximum Principle for this problem can be written as:

    ∂H/∂u = V_u + λ f_u = 0    (10.1)
    λ̇ = −∂H/∂x + βλ = −(V_x + λ f_x) + βλ, or equivalently²    (10.2)
    ẋ = f(x, u)    (10.3)

In addition, we also require the following transversality condition:

    lim_{t→∞} λ(t) x(t) e^{−βt} = 0,    (10.4)

which simply states that the discounted value of the state variable at the end of the time horizon is zero. This is a direct analogue of (7.4).

² Defining the present-value co-state µ(t) = λ(t)e^{−βt} and the present-value Hamiltonian H_p = V(x, u)e^{−βt} + µ f(x, u), condition (7.2) gives µ̇ = −∂H_p/∂x = −(V_x + λ f_x)e^{−βt}, which is equivalent to (10.2).
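A minimal "cake-eating" sketch makes the discounted conditions concrete (a hypothetical example, not from the notes): maximize ∫_0^∞ ln(u) e^{−βt} dt subject to ẋ = −u, x(0) = x_0. The first-order condition gives 1/u = λ; since V_x = f_x = 0, the co-state equation reduces to λ̇ = βλ, so λ(t) = λ_0 e^{βt}, u(t) = βx_0 e^{−βt}, and x(t) = x_0 e^{−βt}. The code verifies the equation of motion and the transversality condition along this path:

```python
# Hypothetical cake-eating problem: V = ln(u), f = -u, discount rate β.
# Closed-form optimal path: u(t) = β x0 e^{-βt}, x(t) = x0 e^{-βt},
# co-state λ(t) = 1/u(t) from the first-order condition.
import math

beta, x0 = 0.05, 1.0

def u(t):   return beta * x0 * math.exp(-beta * t)   # optimal consumption
def x(t):   return x0 * math.exp(-beta * t)          # remaining cake
def lam(t): return 1.0 / u(t)                        # co-state: 1/u = λ

# check x-dot = -u at t = 1 with a central finite difference
eps = 1e-6
xdot = (x(1.0 + eps) - x(1.0 - eps)) / (2 * eps)
print(abs(xdot + u(1.0)) < 1e-8)

# transversality: λ(t) x(t) e^{-βt} = e^{-βt}/β → 0 as t grows
tv = lambda t: lam(t) * x(t) * math.exp(-beta * t)
print(tv(1000.0) < 1e-6)
```

Note that λ(t)x(t) = 1/β is constant here; it is only the discount factor e^{−βt} that drives the transversality expression to zero.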
The problem set out above has a special structure that we can exploit in describing a solution. In the above problem, planning starts at time t = 0. Since no exogenous variables enter (8) or (9), the maximized value of (10) depends only on x(0), the predetermined initial value of the state variable. In other words, the problem is stationary, i.e., it does not change in form with the passage of time. The nice structure of this problem relates to the following property. Suppose that the optimal plan has been followed until a time s > 0, so that x(s) = x*(s). Imagine a new decision-maker who maximizes the discounted objective function ∫_s^∞ V(x, u) e^{−β(t−s)} dt subject to (9), but with the initial value of x now given by x(s) = x*(s). Then the optimal program determined by this new decision-maker will coincide with the continuation, from time s onward, of the optimal program determined at time 0, given x(0). This result is closely related to the notion of dynamic consistency.

Therefore, for any point in time t, the optimal level of the control, u*(t), depends only on the inherited state x(t) and the co-state λ(t). Let us write this functional relationship between the optimal u, λ, and x as

    u* = u(x, λ)    (11)

and assume that u(.) is differentiable. Such functions are called optimal policy functions, or more simply, policy functions. In terms of our necessary conditions above, this policy function can be obtained by solving equation (10.1). The next step in the solution algorithm is to substitute (11) into the differential equations (10.2) and (10.3), which then gives us the core dynamics of the system being modeled:

    λ̇ = −[V_x(x, u(x, λ)) + λ f_x(x, u(x, λ))] + βλ    (10.2′)
    ẋ = f(x, u(x, λ))    (10.3′)

(10.2′)-(10.3′) describe the core dynamics of the system: a two-dimensional system of first-order differential equations in the dynamic variables x and λ.
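The policy-function substitution can be sketched numerically. As an illustration (a hypothetical Ramsey-style model, not from the notes), take V = ln(u) and f(x, u) = x^a − u; the first-order condition 1/u = λ gives the policy u = 1/λ, and substituting it into the λ̇ and ẋ equations yields a two-dimensional first-order system whose steady state solves a·x^{a−1} = β and u = x^a:

```python
# Hypothetical Ramsey-style core dynamics: V = ln(u), f = x**a - u.
# Policy from the first-order condition V_u + λ f_u = 0: 1/u = λ.
# Core dynamics:  λ-dot = -λ a x^{a-1} + βλ,   x-dot = x^a - 1/λ.
a, beta = 0.3, 0.05   # assumed technology and discounting parameters

def policy(x, lam):
    return 1.0 / lam                     # u = u(x, λ), here independent of x

def dynamics(x, lam):
    u = policy(x, lam)
    lam_dot = -lam * a * x ** (a - 1) + beta * lam   # co-state equation
    x_dot = x ** a - u                               # state equation
    return x_dot, lam_dot

# steady state: a x^{a-1} = β pins down x; then λ = 1/x^a from the policy
x_ss = (a / beta) ** (1.0 / (1.0 - a))
lam_ss = 1.0 / x_ss ** a
x_dot, lam_dot = dynamics(x_ss, lam_ss)
print(abs(x_dot) < 1e-9 and abs(lam_dot) < 1e-9)   # both derivatives vanish
```

The same `dynamics` function could be fed to any ODE integrator to trace the trajectories discussed next; here we only verify that the computed steady state is indeed a rest point of the system.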
The solution of this system characterizes the optimal time paths, or trajectories, of the state and co-state variables x and λ, and thereby the optimal time path for the control variable u.
APPENDIX

A1 Integration by Parts

Formula:  ∫_a^b z dy = [yz]_a^b − ∫_a^b y dz.

To integrate (5.1) by parts, let y = x and z = λ. Then, since ∫_0^T λẋ dt = ∫_0^T λ dx = ∫_0^T z dy, we have

    ∫_0^T λẋ dt = [λx]_0^T − ∫_0^T x λ̇ dt = [λ(T)x(T) − λ(0)x(0)] − ∫_0^T x λ̇ dt,

which leads to the result (6).

A2 Solving Linear First-Order Differential Equations (FODE)

A linear FODE can be expressed in the form

    a_1(t) dy/dt + a_0(t) y = b(t),    (A2.1)

where a_1(t), a_0(t), and b(t) depend only on the independent variable t and not on y. Assuming that the functions a_1(t), a_0(t), and b(t) are continuous on an interval and that a_1(t) ≠ 0, we can rewrite (A2.1) in the standard form by dividing throughout by a_1(t):

    dy/dt + P(t) y = Q(t),    (A2.2)

where P(t) = a_0(t)/a_1(t) and Q(t) = b(t)/a_1(t) are continuous functions on the interval. To solve (A2.2), we use an integrating factor, µ(t), which is given by

    µ(t) = e^{∫P(t) dt}.    (A2.3)

Multiply both sides of (A2.2) by µ(t) to get

    µ(t) dy/dt + µ(t)P(t) y = µ(t)Q(t),   or   e^{∫P dt} dy/dt + e^{∫P dt} P(t) y = e^{∫P dt} Q(t).    (A2.4)

Note that the left-hand side can be expressed as d/dt [e^{∫P dt} y]. We then have
    d/dt [e^{∫P dt} y] = e^{∫P dt} Q(t).    (A2.5)

The next step is to integrate both sides and then solve for y. Integrating, we get

    e^{∫P dt} y = ∫ e^{∫P dt} Q(t) dt + C.    (A2.6)

Divide both sides of (A2.6) by e^{∫P dt} to solve for y:

    y = e^{−∫P dt} [ ∫ e^{∫P dt} Q(t) dt + C ].    (A2.7)

(A2.7) is the general solution to (A2.1). Given the value of y at t = 0, and the bounds on the interval, we can determine the constant of integration C and hence obtain the particular solution.
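As a quick illustration of the integrating-factor recipe (a made-up example, not from the notes), take dy/dt + 2y = 6 with y(0) = 0. Here P(t) = 2 and Q(t) = 6, so µ(t) = e^{2t}, and (A2.7) with the initial condition gives y(t) = 3(1 − e^{−2t}). The code checks this closed form against a crude numerical integration:

```python
# Example FODE: dy/dt + 2y = 6, y(0) = 0.
# Integrating factor: µ(t) = e^{2t}; formula (A2.7) yields
# y(t) = 3(1 - e^{-2t}).
import math

def y_exact(t):
    return 3.0 * (1.0 - math.exp(-2.0 * t))

# verify by forward-Euler integration of dy/dt = 6 - 2y on [0, 1]
t, y, dt = 0.0, 0.0, 1e-5
while t < 1.0:
    y += (6.0 - 2.0 * y) * dt
    t += dt
print(abs(y - y_exact(1.0)) < 1e-3)   # numerical and exact solutions agree
```

The integrating-factor solution and the step-by-step integration agree to within the Euler discretization error, confirming the algebra above.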