Differential Games, Distributed Systems, and Impulse Control

Size: px

Start display at page:

Download "Differential Games, Distributed Systems, and Impulse Control"

Shonda Gallagher
6 years ago
Views:

1 Chapter 12 Differential Games, Distributed Systems, and Impulse Control In previous chapters, we were mainly concerned with the optimal control problems formulated in Chapters 3 and 4 and their applications to various functional areas of management and to some problems in economics. These problems were described by a single objective function (or a single decision maker) and a set of ordinary differential equations, called the state equations, defined in a deterministic framework. In this chapter, we deal with generalizations of the (ordinary) deterministic optimal control problems that can be made in one or more of the following directions. Generalizations involving stochastic optimal control problems are discussed in the next chapter. There may be more than one decision maker, each having one s own objective function that each is trying to maximize, subject to a set of differential equations. This extension of the optimal control theory is referred to as the theory of differential games. Section 12.1 contains a brief introduction to differential games along with an application. Another extension replaces the system of ordinary differential equations by a set of partial differential equations. These come under the classification of distributed parameter systems and are treated in Section Finally in Section 12.3, we treat the theory of impulse control. These controls are developed to deal with systems which, in addition to conventional controls, allow a controller to make discrete changes in the state 37

2 12.1. Differential Games 39 where U and V are convex sets in E 1. Consider further the objective function T J(u, v) = S[x(T)] + F(x, u, v, t)dt, (12.2) which player 1 wants to maximize and player 2 wants to minimize. Since the gain of player 1 represents a loss to player 2, such games are appropriately termed zero-sum games. Clearly, we are looking for admissible control trajectories u and v such that J(u, v) J(u, v ) J(u, v ). (12.3) The solution (u, v ) is known as the minimax solution. Here u and v stand for u (t), t [, T], and v (t), t [, T], respectively. The necessary conditions for u and v to satisfy (12.3) are given by an extension of the maximum principle. To obtain these conditions, we form the Hamiltonian H = F + λf (12.4) with the adjoint variable λ satisfying the equation λ = H x, λ(t) = S x [x(t)]. (12.5) The necessary condition for trajectories u and v to be a minimax solution is that for t [, T], H(x (t), u (t), v (t), λ(t), t) = min v V max u U H(x (t), u, v,λ(t), t), (12.6) which can also be stated, with suppression of (t), as H(x, u, v,λ, t) H(x, u, v, λ, t) H(x, u, v, λ, t) (12.7) for u U and v V. Note that (u, v ) is a saddle point of the Hamiltonian function H. Note that if u and v are unconstrained, i.e., when, U = V = E 1, condition (12.6) reduces to the first-order necessary conditions and the second-order conditions are H u = and H v =, (12.8) H uu and H vv. (12.9) We now turn to the treatment of nonzero-sum differential games.

3 Differential Games, Distributed Systems, and Impulse Control Nonzero-Sum Differential Games In this section, let us assume that we have N players where N 2. Let u i U i, i = 1, 2,..., N, represent the control variable for the ith player, where U i is the set of controls from which the ith player can choose. Let the state equation be defined as Let J i, defined by J i = S i [x(t)] + ẋ = f(x, u 1, u 2,..., u N, t). (12.1) T F i (x, u 1, u 2,..., u N, t)dt, (12.11) denote the objective function which the ith player wants to maximize. In this case, a Nash solution is defined by a set of N admissible trajectories which have the property that {u 1, u 2,..., u N }, (12.12) J i (u 1, u 2,..., u N ) = max u i U Ji (u 1, u 2,..., u (i 1), u i,..., u (i+1),..., u N ) i (12.13) for i = 1, 2,..., N. To obtain the necessary conditions for a Nash solution for nonzerosum differential games, we must make a distinction between open-loop and closed-loop controls. Open-Loop Nash Solution The open-loop Nash solution is defined when (12.12) is given as functions of time satisfying (12.13). To obtain the maximum principle type conditions for such solutions to be a Nash solution, let us define the Hamiltonian functions H i = F i + λ i f (12.14) for i = 1, 2,..., N, with λ i satisfying λ i = H i x, λ i (T) = S i x[x(t)]. (12.15) The Nash control u i for the ith player is obtained by maximizing the ith Hamiltonian H i with respect to u i, i.e., u i must satisfy H i (x, u 1,..., u (i 1), u i, u (i+1),..., u N, λ, t) H i (x, u 1,..., u (i 1), u i, u (i+1),..., u N, λ, t), t [, T], (12.16)

4 12.1. Differential Games 311 for all u i U i, i = 1, 2,.., N. Deal, Sethi, and Thompson (1979) formulated and solved an advertising game with two players and obtained the open-loop Nash solution by solving a two-point boundary value problem. In Exercise 12.1, you are asked to formulate their problem. See also Deal (1979). Closed-Loop Nash Solution A closed-loop Nash solution is defined when (12.12) is defined in terms of the state of the system. To avoid confusion, we let u i (x, t) = φ i (x, t), i = 1, 2,..., N. (12.17) For these controls to represent a Nash strategy, we must recognize the dependence of the other players actions on the state variable x. Therefore, we need to replace the adjoint equation (12.15) by λ i N N = Hx i Hu i j φ j x = Hx i Hu i φ j x. (12.18) j j=1 j=1,j i The presence of the summation term in (12.18) makes the necessary condition for the closed-loop solution virtually useless for deriving computational algorithms; see Starr and Ho (1969). It is, however, possible to use a dynamic programming approach for solving extremely simple nonzero-sum games, which require the solution of a partial differential equation. In Exercise 12.2, you are asked to formulate this partial differential equation for N = 2. Note that the troublesome summation term in (12.18) is absent in three important cases: (a) in optimal control problems (N = 1) since H u u x =, (b) in two-person zero-sum games because H 1 = H 2 so that H 1 u 2 u 2 x = H 2 u 2 u 2 x = and H 2 u 1 u 1 x = H 1 u 1 u 1 x =, and (c) in openloop nonzero-sum games because u j x =. It certainly is to be expected, therefore, that the closed-loop and open-loop Nash solutions are going to be different, in general. This can be shown explicitly for the linearquadratic case. We conclude this section by providing an interpretation to the adjoint variable λ i. It is the sensitivity of ith player s profits to a perturbation in the state vector. If the other players are using closed-loop (i.e., feedback) strategies, any perturbation δx in the state vector causes them to revise their controls by the amount φ j xδx. If the ith Hamiltonian H i were extremized with respect to u j, j i, this would not affect the

5 Differential Games, Distributed Systems, and Impulse Control ith player s profit; but since H i / u j for i j, the reactions of the other players to the perturbation influence the ith player s profit, and the ith player must account for this effect in considering variations of the trajectory An Application to the Common-Property Fishery Resources Consider extending the fishery model of Section 1.1 by assuming that there are two producers having unrestricted rights to exploit the fish stock in competition with each other. This gives rise to a nonzero-sum differential game analyzed by Clark (1976). Equation (1.2) is modified by ẋ = g(x) q 1 u 1 x q 2 u 2 x, x() = x, (12.19) where u i (t) represents the rate of fishing effort and q i u i x is the rate of catch for the ith producer, i = 1, 2. The control constraints are the state constraints are u i (t) U i, i = 1, 2, (12.2) x(t), (12.21) and the objective function for the ith producer is the total present value of his profits, namely, J i = (p i q i x c i )u i e ρt dt, i = 1, 2. (12.22) To find the Nash solution for this model, we let x i denote the turnpike (or optimal biomass) level given by (1.12) on the assumption that the ith producer is the sole-owner of the fishery. Let the bionomic equilibrium x i b for producer i be defined by (1.4), i.e., x i b = ci p i qi. (12.23) As shown in Exercise 1.2, x i b < xi. If the other producer is not fishing, then producer i can maintain x i b by making the fishing effort u i b = g(xi b )pi c i ; (12.24)

6 12.1. Differential Games 313 here we have assumed U i to be sufficiently large so that u i b Ui. We also assume that x 1 b < x 2 b, (12.25) which means that producer 1 is more efficient than producer 2, i.e., producer 1 can make a positive profit at any level in the interval (x 1 b, x2 b ], while producer 2 loses money in the same interval, except at x 2 b, where he breaks even. For x > x 2 b, both producers make positive profits. Since U 1 u 1 b by assumption, producer 1 has the capability of driving the fish stock to a level down to at least x 1 b which, by (12.25), is less than x 2 b. This implies that producer 2 cannot operate at a sustained level above x 2 b ; and at a sustained level below x2 b, he cannot make a profit. Hence, his optimal policy is bang-bang: U 2 if x > x 2 u 2 b (x) =, (12.26) if x x 2 b. As far as producer 1 is concerned, he wants to attain his turnpike level x 1 if x 1 x 2 b. If x1 > x 2 b and if x x 1, then from (12.26) producer 2 will fish at his maximum rate until the fish stock is driven to x 2 b. At this level it is optimal for producer 1 to fish at a rate which maintains the fish stock at level x 2 b in order to keep producer 2 from fishing. Thus, the optimal policy for producer 1 can be stated as U 1 if x > x 1 u 1 (x) = ū 1 = g( x1 ) q 1 x if x = x 1, if x 1 < x 2 b, (12.27) 1 if x < x 1 u 1 (x) = U 1 g(x 2 b ) q 1 x 2 b if x > x 2 b if x = x 2 b if x < x 2 b, if x 1 x 2 b. (12.28) The formal proof that policies (12.26)-(12.28) give a Nash solution requires direct verification using the result of Section The Nash solution for this case means that for all feasible paths u 1 and u 2, J 1 (u 1, u 2 ) J 1 (u 1, u 2 ), (12.29)

7 Differential Games, Distributed Systems, and Impulse Control and J 2 (u 1, u 2 ) J 2 (u 1, u 2 ). (12.3) The direct verification involves defining a modified growth function g(x) q 2 U 2 x if x > x 2 g 1 b (x) =, g(x) if x x 2 b, and using the Green s theorem results of Section Since U 2 u 2 b by assumption, we have g 1 (x) for x x 2 b. From (1.12) with g replaced by g 1, it can be shown that the new turnpike level for producer 1 is min( x 1, x 2 b ), which defines the optimal policy (12.27)-(12.28) for producer 1. The optimality of (12.26) for producer 2 follows easily. To interpret the results of the model, suppose that producer 1 originally has sole possession of the fishery, but anticipates a rival entry. Producer 1 will switch from his own optimal sustained yield ū 1 to a more intensive exploitation policy prior to the anticipated entry. We can now guess the results in situations involving N producers. The fishery will see the progressive elimination of inefficient producers as the stock of fish decreases. Only the most efficient producers will survive. If, ultimately, two or more maximally efficient producers exist, the fishery will converge to a classical bionomic equilibrium, with zero sustained economic rent. We have now seen that a Nash competitive solution involving N 2 producers results in the long-run dissipation of economic rents. This conclusion depends on the assumption that producers face an infinitely elastic supply of all factors of production going into the fishing effort, but typically the methods of licensing entrants to regulated fisheries make some attempt also to control the factors of production such as permitting the licensee to operate only a single vessel of specific size. In order to develop a model for licensing of fishermen, we let the control variable v i denote the capital stock of the ith producer and let the concave function f(v i ), with f() =, denote the fishing mortality function, for i = 1, 2,..., N. This requires the replacement of q i u i in the previous model by f(v i ). The extended model becomes nonlinear in control variables. You are asked in Exercise 12.2 to formulate this new model and develop necessary conditions for a closed-loop Nash solution for this model with N producers. The reader is referred to Clark (1976) for further details.

8 12.2. Distributed Parameter Systems 317 t and let S(y) be the value of one unit of x(t, y) at time T. Then the objective function is: max u(t,y) Ω T + { J = T h Q(t)x(t, h)dt + F(t, y, x(t, y), u(t, y))dydt } h where Ω is the set of allowable controls. S(y)x(T, y)dy, (12.35) The Distributed Parameter Maximum Principle We will formulate, without giving proofs, a procedure for solving the problem in (12.31)-(12.35) by a distributed parameter maximum principle, which is analogous to the ordinary one. A more complete treatment of this maximum principle can be found in Sage (1968, Chapter 7), Butkowskiy (1969), Lions (1971), Derzko, Sethi, and Thompson (1984), and Haurie, Sethi, and Hartl (1984). In order to obtain necessary conditions for a maximum, we introduce the Hamiltonian H = F + λf, (12.36) where the spatial adjoint function λ(t, y) satisfies λ t = H x + t [ H x t ] + [ ] H, (12.37) y x y where x t = x/ t and x y = x/ y. The boundary conditions on λ are stated for the Γ 2 part of the boundary of D (see Figure 12.1) as follows: λ(t, h) = Q(t) (12.38) and λ(t, y) = S(y). (12.39) Once again we need a consistency requirement similar to (12.34). It is λ(t, h) = Q(T) = S(h), (12.4) which gives the consistency requirement in the sense that the price and the salvage value of a unit x(t, h) must agree.

9 Differential Games, Distributed Systems, and Impulse Control We let u (t, y) denote the optimal control function. Then the distributed parameter maximum principle requires that H(t, y, x, x t,x y, u, λ) H(t, y, x, x t,x y, u, λ) (12.41) for all (t, y) D and all u Ω. We have stated only a simple form of the distributed parameter maximum principle which is sufficient for the cattle ranching example dealt with in the next section. More general forms of the maximum principle are available in the references cited earlier. Among other things, these general forms allow for the function F in (12.35) to contain arguments such as x/ y, 2 x/ y 2, etc. It is also possible to consider controls on the boundary. In this case v(t) in (12.33) will become a control variable The Cattle Ranching Problem Let t denote time and y denote the age of an animal. Let x(t, y) denote the number of cattle of age y on the ranch at time t. Let h be the age at maturity at which the cattle are slaughtered. Thus, the set [, h] is the set of all possible ages of the cattle. Let u(t, y) be the rate at which y-aged cattle are bought at time t, where we agree that a negative value of u denotes a sale. To develop the dynamics of the process it is easy to see that x(t + t, y) = x(t, y t) + u(t, y) t. (12.42) Subtracting x(t, y) from both sides of (12.42), dividing by t, and taking the limit as t, yields the state equation x t = x + u. (12.43) y The boundary and consistency conditions for x are given in (12.32)- (12.34). Here x (y) denotes the initial distribution of cattle at various ages, and v(t) is an exogenously specified breeding rate. To develop the objective function for the cattle rancher, we let T denote the horizon time. Let P(t, y) be the purchase or sale price of a y-aged animal at time t. Let P(t, h) = Q(t) be the slaughter value at time t and let P(T, y) = S(y) be the salvage value of a y-aged animal at the horizon time T. The functions Q and S represent the proceeds of the cattle ranching business. To obtain the profit function we must subtract the costs of running the ranch from these proceeds. Let C(y)

10 12.3. Impulse Control 325 The objective function of the oil driller is to maximize his profit, which is the difference between the revenue from the oil pumped minus the cost of drilling new wells. Since the cost of pumping the oil is fixed, we ignore it. The objective is to T maximize J = N(t) Pbx(t)dt i=1 Qv(t i ), (12.55) where P is the unit price of oil and Q is the cost of drilling a well having an initial stock of 1. Therefore, P is the total value of the entire stock of oil in a new well. See Case (1979) for a re-interpretation of this problem as The Optimal Painting of a Roadside Inn. In the next section we will discuss a maximum principle for impulse optimal control problems, based on the work by Blaquière (1979). The solution of the oil driller s problem appears in Section The Maximum Principle for Impulse Optimal Control In (2.4) we stated the basic optimal control problem with state variable x and an ordinary control variable u. Now we will add an impulse control variable v Ω v and two associated functions. The first function is G(x, v, t), which represents the profit associated with the impulse control. The second function is g(x, v, t), which represents the instantaneous finite change in the state variable when the impulse control is applied. With this notation we can state the impulse optimal control problem as: T max J = subject to N(T) F(x, u, t)dt + i=1 G[x(t i ), v(t i ), t i ] + S[x(T)] N(T) ẋ(t) = f(x(t), u(t), t) + g[x(t i ), v(t i ), t i ], x() = x, u Ω u, v Ω v. As before we will have i=1 (12.56) x(t + i ) = x(t i) + g[x(t i ), v(t i ), t i ] (12.57)

11 Differential Games, Distributed Systems, and Impulse Control at the times t i at which impulse control is applied, i.e., g[x(t i ), v(t i ), t i ] >. Blaquière (1978) has developed the maximum principle necessary optimality conditions to deal with the problem in (12.56). To state these conditions we first define the ordinary Hamiltonian function H(x, u, λ, t) = F(x, u, t) + λf(x, u, t) (12.58) and the impulse Hamiltonian function H I (x(t), v(t), λ(t + ), t) = G(x(t), v(t), t) + λ(t + )g(x(t), v(t), t). (12.59) We now state the impulse maximum principle. Let u, v (t i ), i = 1,..., N (T), where t i > t i 1 >, be optimal controls with the associated x representing an optimal trajectory for (12.56). Then there exists an adjoint variable λ such that the following conditions hold: (i) ẋ = f(x, u, t), t [, T],, t t i, (ii) x (t + i ) = x (t i ) + g[x (t i ), v (t i ), t i ], (iii) λ = Hx [x, u, λ, t], λ(t) = S x [x (T)], t t i, (iv) λ(t i ) = λ(t + i ) + HI x[x (t i ), v (t i ), t i ], (v) H[x, u, λ, t] H[x, u, λ, t] for all u Ω u, t t i, (vi) H I [x (t i ), v (t i ),λ(t + i ), t i] H I [x (t i ), v,λ(t + i ), t i] v Ω v, (vii) H[x (t + i ), u (t + i ), λ(t+ i ), t i] + H I t [x (t + i ), v (t + i ), λ(t+ i ), t i] = H[x (t i ), u (t i ), λ(t i ), t i ] + H I t [x (t i ), v (t i ), λ(t + i ), t i]. (12.6) If t 1 = then the equality sign in (vii) should be replaced by a sign when i = 1. Clearly (i) and (ii) are equivalent to the state equation in (12.56) with the optimum value substituted for the variables. Note that condition (vii) involves the partial derivative of H I with respect to t. Thus, in autonomous problems, where H I t =, condition (vii) means that the Hamiltonian H is continuous at those times where an impulse control is applied.

12 12.3. Impulse Control Solution of the Oil Driller s Problem We now give a solution to the oil driller s problem under the assumption that T is sufficiently small so that no more than one drilling will be found to be optimal. We restate below the problem in Section for this case when t 1 is the drilling time to be determined: { max J = subject to T } Pbx(t)dt Qv(t 1 ) ẋ(t) = bx(t) + δ(t t 1 )v(t)[1 x(t)], x() = 1, v(t) 1. (12.61) To apply the maximum principle we define the Hamiltonian functions corresponding to (12.58) and (12.59): and H(x, λ) = Pbx + λ( bx) = bx(p λ) (12.62) H I (x(t), v(t), λ(t + )) = Qv(t) + λ(t + )v(t)(1 x(t)). (12.63) The ordinary Hamiltonian (12.62) is without an ordinary control variable because we do not have any ordinary control variable in this problem. We now apply the necessary conditions of (12.6) to the oil driller s problem: ẋ = bx, t [, T], t t 1, (12.64) x[t + 1 ] = x(t 1) + v(t 1 )[1 x(t 1 )], (12.65) λ = b(p λ), λ(t) =, t t 1, (12.66) λ(t 1 ) = λ(t + 1 ) v(t 1)λ(t + 1 ), (12.67) [ Q+λ(t + 1 )(1 x)]v (t i ) [ Q+λ(t + 1 )(1 x)]v for v [, 1], (12.68) bx(t + 1 )[P λ(t+ 1 )] = bx(t 1)[P λ(t 1 )]. (12.69) The solution of (12.66) for t > t 1, where t 1 is the drilling time, is λ(t) = P[1 e b(t t) ], t (t 1, T]. (12.7)

13 Differential Games, Distributed Systems, and Impulse Control From (12.68), the optimal impulse control at t 1 is v (t 1 ) = bang[, 1; Q + λ(t + 1 ){1 x(t 1)}]. (12.71) Note that the optimal impulse control is bang-bang, because the impulse Hamiltonian H I in (12.63) is linear in v. The region in the (t, x)-space in which there is certain to be no drilling is given by the set of all (t, x) for which Q + λ(t + )(1 x) <. (12.72) After the drilling, λ(t + 1 ) = P[1 e b(t t 1) ], which we obtain by using (12.7). Substituting this value into (12.72) gives the condition Q + P[1 e b(t t 1) ](1 x) <. (12.73) The boundary of the no-drilling region is obtained when the < sign in (12.73) is replaced by an = sign. This yields the curve x(t 1 ) = [ ] Q 1 P(1 e b(t t1) := φ(t 1 ), (12.74) ) shown in Figure Note that we did not label the region below the curve represented by the complement of (12.73) with v = 1, since (12.73) uses the value of λ(t), t > t 1, and its complement, therefore, does not Figure 12.4: Boundary of No-Drilling Region

14 12.3. Impulse Control 329 represent the condition Q + λ(t + )(1 x) > prior to drilling. Figure 12.4 is drawn under the assumption that φ() = 1 Q P[1 e bt >, (12.75) ] so that the solution is nontrivial; see Exercises 12.5 and In order for (12.75) to hold, it is clear that Q must be relatively small, and P, b, and/or T must be relatively large. In the contrary situation, φ() holds and by Exercise 12.6, no drilling is optimal. The bang-bang nature of the optimal impulse control in (12.71) means that we have v (t 1 ) = 1. Using this in (12.65) and (12.67), we have From (12.7), x(t + 1 ) = 1 and λ(t 1) =. (12.76) λ(t + 1 ) = P[1 eb(t t 1) ]. Substituting these results into (12.69) and solving we get x(t 1 ) = e b(t t 1) : = ψ(t 1 ). (12.77) The curve x = ψ(t) = e b(t t) is drawn in Figure The part BC of the ψ curve lies in the no drilling region, which is above the φ Figure 12.5: Drilling Time

15 Differential Games, Distributed Systems, and Impulse Control With this notation, the impulse optimal control problem is: { } T max J = (πx u)dt + v(t 1 )[Kx(t 1 ) C] subject to ẋ = bx + gu + δ(t t 1 )v(t)[1 x(t)], x() = 1, (12.79) u U, v 1. In writing (12.79) we have assumed that a fraction θ of a machine with quality x has a quality θx. Furthermore, we note that the solution of the state equation will always satisfy x 1, because of the assumption that u U b/g Application of the Impulse Maximum Principle To apply the maximum principle, we define the Hamiltonian functions: and H(x, u, λ) = πx u + λ( bx + gu) = (π λb)x + ( 1 + λg)u (12.8) H I (x(t), v(t), λ(t + )) = [Kx(t) C + λ(t + 1 )(1 x(t))]v(t), (12.81) where λ is defined below and t 1 is the time at which an impulse control is applied. We can now state the necessary conditions (12.6) of optimality for the machine maintenance and the replacement model: ẋ = bx + gu, t t 1, x() = 1, (12.82) x(t + 1 ) = x(t 1) + v(t 1 )[1 x(t 1 )], (12.83) λ = π + λb, λ(t) =, t t 1, (12.84) λ(t 1 ) = λ(t + 1 ) + [K λ(t+ 1 )]v(t 1), (12.85) (π λb)x + ( 1 + λg)u (π λb)x + ( 1 + λg)u, u [, U], {Kx(t 1 ) C + λ(t + 1 )[1 x(t 1)]}v (t 1 ) {Kx(t 1 ) C + λ(t + 1 )[1 x(t 1)]}v, v [, 1], (12.86) (12.87)

16 Differential Games, Distributed Systems, and Impulse Control From (12.87), the optimal replacement control is also bang-bang: v (t 1 ) = bang[, 1; Kx(t 1 ) C + λ(t + 1 ){1 x(t 1)}]. (12.92) As in Section , we set the argument of (12.92) to zero to obtain the curve φ(t): x(t 1 ) = C λ(t+ 1 ) K λ(t + 1 ) = C π[1 e b(t t1) ]/b K π[1 e b(t t 1) ]/b : = φ(t 1). (12.93) To graph φ(t), we compute and compute the time φ() = C π[1 e bt ]/b K π[1 e bt ]/b, (12.94) ˆt = T + 1 b ln(1 bc π ), (12.95) which makes φ(ˆt) =. For simplicity of analysis, we assume gk 1 < gc < gπ/b and ˆt >. (12.96) In Exercise 12.1, you are asked to show that, < ˆt < t 2 < T. (12.97) We can now graph φ(t) as shown in Figure In plotting Figure 12.8, we have assumed that < φ() < 1. This is certainly the case, if T is not too large so that C < (π/b)(1 e bt ). As in the oil driller s problem, we obtain ψ(t) by using (12.88). From (12.92), we have v (t 1 ) = 1 and, therefore, λ(t 1 ) = K from (12.85) and x(t + 1 ) = 1 from (12.83). Since gk 1 from (12.96), we have 1 + gλ(t 1 ) = 1 + gk and, thus, u (t 1 ) = from (12.9). That is, zero maintenance is optimal on the old machine just before it is replaced. Since t 1 < ˆt < t 2 from (12.97), we have u (t + 1 ) = U from Figure That is, full maintenance is optimal on the new machine at the beginning.

17 12.3. Impulse Control 335 Figure 12.8: Replacement Time t 1 and Maintenance Policy Substituting all these values in (12.88), we have π π b [ ] { 1 e b(t t 1) b π } b [1 e b(t t1) g] U = πx(t 1 ) Kbx(t 1 ), (12.98) which yields x(t 1 ) = U(gπ b) + π(b gu)e b(t t 1) b(π Kb) := ψ(t 1 ). (12.99) The graph of ψ(t) is shown in Figure In this figure, AB represents the replacement curve. The optimal trajectory x (t) is shown by CDEFG under the assumption that t 1 > and t 1 < t 2, where t 2 is the intersection point of curves φ(t) and ψ(t), as shown in Figure Remark Figure 12.9 has been drawn for a choice of the problem parameters such that t 1 = t 2. In this case we have the option of either replacing the machine at t 1 = t 2, or not replacing it as sketched in the figure. In other words, this case is one of indifference between replacing and not replacing the machine. If t 1 is larger than t 2, clearly no machine replacement is needed.

18 Differential Games, Distributed Systems, and Impulse Control Figure 12.9: The Case t 1 = t 2 To complete the solution of the problem, it is easy to compute λ(t) in the interval [, t 1 ], and then use (12.9) to obtain the optimal maintenance policy for the old machine before replacement. Using λ(t 1 ) = K obtained above and the adjoint equation (12.84), we have λ(t) = Ke b(t t1) + π [ ] 1 e b(t t 1), t [, t 1 ]. (12.1) b Using (12.1) in (12.9), we can get u (t), t [, t 1 ]. More specifically, we can get the switching point t 1 by solving 1 + λg =. Thus, t 1 = t [ ] πg b b ln. (12.11) πg bkg If t 1, then the policy of no maintenance is optimal in the interval [, t 1 ]. If t 1 >, the optimal maintenance policy for the old machine is U, t [, t u 1 (t) = ], (12.12), t [t 1, t 1]. In plotting Figures 12.7 and 12.8, we have assumed that t 1 >. With (12.12) and (12.82), one can obtain an expression for x(t 1 ) in terms of t 1. Equating this expression to ψ(t 1 ) obtained in (12.99), one

19 Exercises for Chapter can write a transcendental equation for t 1. In Exercise 12.11, you are asked to complete the solution of the problem by obtaining the equation for t 1. EXERCISES FOR CHAPTER A bilinear quadratic advertising model (Deal, Sethi, and Thompson, 1979): Let x i be the market share of firm i and u i be its advertising rate, i = 1, 2. The state equations are ẋ 1 = b 1 u 1 (1 x 1 x 2 ) + e 1 (u 1 u 2 )(x 1 + x 2 ) a 1 x 1 x 1 () = x 1, ẋ 2 = b 2 u 2 (1 x 1 x 2 ) + e 2 (u 2 u 1 )(x 1 + x 2 ) a 2 x 2 x 2 () = x 2, where b i, e i, and a i are given positive constants. Firm i wants to maximize T J i = w i e ρt x i (T) + (c i x i u 2 i )e ρt dt, where w i, c i, and ρ are positive constants. Derive the necessary conditions for the open-loop Nash solution, and formulate the resulting boundary value problem. In a related paper, Deal (1979) provides a numerical solution to this problem with e 1 = e 2 = Develop the nonlinear model described in the last paragraph on page 314 by rewriting (12.19) and (12.22) for the model. Derive the adjoint equation for λ i for the ith producer, and show that the closed-loop Nash policy for producer i is given by f (v i ) = c i (p i λ i )x Verify that (12.48) is a solution of (12.46) Obtain or verify that (12.5) is the solution of (12.43) with initial condition (12.32) and (12.33) Assume φ() > as in (12.75). Show that Q < P and < ˆt < T.

CHAPTER 12 DIFFERENTIAL GAMES, DISTRIBUTED SYSTEMS,

CHAPTER 12 DIFFERENTIAL GAMES, DISTRIBUTED SYSTEMS, AND IMPULSE CONTROL Chapter 12 p. 1/91 DIFFERENTIAL GAMES, DISTRIBUTED SYSTEMS, AND IMPULSE CONTROL Theory of differential games: There may be more than