Dynamic Programming and Optimal Control
THIRD EDITION

Dimitri P. Bertsekas
Massachusetts Institute of Technology

Selected Theoretical Problem Solutions
Last Updated 10/1/2008

Athena Scientific, Belmont, Mass.
WWW site for book information and orders: http://www.athenasc.com

NOTE

This solution set is meant to be a significant extension of the scope and coverage of the book. It includes solutions to all of the book's exercises marked with the symbol www. The solutions are continuously updated and improved, and additional material, including new problems and their solutions, is being added. Please send comments and suggestions for additions and improvements to the author at dimitrib@mit.edu.

The solutions may be reproduced and distributed for personal or educational uses.

Solutions Vol. I, Chapter 1

www

(a) Given a sequence of matrix multiplications

$$M_1 M_2 \cdots M_k M_{k+1} \cdots M_N,$$

we represent it by the sequence of numbers $\{n_1, \ldots, n_{N+1}\}$, where $n_k \times n_{k+1}$ is the dimension of $M_k$. Let the initial state be $x_0 = \{n_1, \ldots, n_{N+1}\}$. Then choosing the first multiplication to be carried out corresponds to choosing an element from the set $x_0 \setminus \{n_1, n_{N+1}\}$. For instance, choosing $n_2$ corresponds to multiplying $M_1$ and $M_2$, which results in a matrix of dimension $n_1 \times n_3$, and the initial state must be updated to discard $n_2$, the control applied at that stage. Hence at each stage the state represents the dimensions of the matrices resulting from the multiplications done so far. The allowable controls at stage $k$ are $u_k \in x_k \setminus \{n_1, n_{N+1}\}$. The system equation evolves according to

$$x_{k+1} = x_k \setminus \{u_k\}.$$

Note that the control will be applied $N-1$ times; therefore the horizon of this problem is $N-1$. The terminal state is $x_{N-1} = \{n_1, n_{N+1}\}$ and the terminal cost is 0. The cost at stage $k$ is given by the number of multiplications,

$$g_k(x_k, u_k) = n_a n_{u_k} n_b,$$

where, viewing $u_k$ as the index of the chosen dimension (so that $n_{u_k} = u_k$),

$$a = \max\{ i \mid i < u_k,\ n_i \in x_k \}, \qquad b = \min\{ i \mid i > u_k,\ n_i \in x_k \}.$$

The DP algorithm for this problem is given by

$$J_{N-1}(x_{N-1}) = 0,$$

$$J_k(x_k) = \min_{u_k \in x_k \setminus \{n_1, n_{N+1}\}} \Big[ n_a n_{u_k} n_b + J_{k+1}\big( x_k \setminus \{u_k\} \big) \Big], \qquad k = 0, \ldots, N-2.$$

Now consider the given problem, where $N = 3$ and $M_1$ is $2 \times 10$, $M_2$ is $10 \times 5$, $M_3$ is $5 \times 1$. The optimal order is $M_1(M_2 M_3)$, requiring 70 multiplications.
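As a quick check of this result (not part of the original solution), the following Python sketch runs an equivalent interval form of the DP; the function names are ours, and the recursion matches the formulation of part (b) below.

```python
from functools import lru_cache

def matrix_chain(dims):
    """Matrix-chain DP. dims[k-1] x dims[k] is the dimension of M_k, so dims
    plays the role of the sequence {n_1, ..., n_{N+1}} above."""
    @lru_cache(maxsize=None)
    def best(a, b):
        # Minimal multiplication count (and an order) for the product M_a ... M_b.
        if a == b:
            return 0, f"M{a}"
        results = []
        for i in range(a, b):
            # Split as (M_a..M_i)(M_{i+1}..M_b); the final multiplication
            # costs n_a * n_{i+1} * n_{b+1}.
            cl, el = best(a, i)
            cr, er = best(i + 1, b)
            results.append((cl + cr + dims[a - 1] * dims[i] * dims[b],
                            f"({el} {er})"))
        return min(results)
    return best(1, len(dims) - 1)

print(matrix_chain((2, 10, 5, 1)))   # (70, '(M1 (M2 M3))')
```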

(b) In this part we can choose a much simpler state space. Let the state at stage $k$ be $\{a, b\}$, where $a, b \in \{1, \ldots, N\}$ give the indices of the first and the last matrix in the current partial product. There are two possible controls at each stage, which we denote by $L$ and $R$. Note that $L$ can be applied only when $a \neq 1$, and $R$ can be applied only when $b \neq N$. The system equation evolves according to

$$x_{k+1} = \begin{cases} \{a-1, b\} & \text{if } u_k = L, \\ \{a, b+1\} & \text{if } u_k = R, \end{cases} \qquad k = 1, \ldots, N-1.$$

The terminal state is $x_N = \{1, N\}$, with cost 0. The cost at stage $k$ is given by

$$g_k(x_k, u_k) = \begin{cases} n_{a-1} n_a n_{b+1} & \text{if } u_k = L, \\ n_a n_{b+1} n_{b+2} & \text{if } u_k = R, \end{cases} \qquad k = 1, \ldots, N-1.$$

For the initial stage, we can take $x_0$ to be the empty set and $u_0 \in \{1, \ldots, N\}$. The next state will be $x_1 = \{u_0, u_0\}$, and the cost incurred at the initial stage will be 0 for all possible controls.

www

Let $t_1 < t_2 < \cdots < t_{N-1}$ denote the times where $g_1(t) = g_2(t)$. Clearly, it is never optimal to switch functions at any other times. We can therefore divide the problem into $N-1$ stages, where we want to determine for each stage $k$ whether or not to switch activities at time $t_k$. Define

$$x_k = \begin{cases} 0 & \text{if on activity } g_1 \text{ just before time } t_k, \\ 1 & \text{if on activity } g_2 \text{ just before time } t_k, \end{cases} \qquad u_k = \begin{cases} 0 & \text{to continue the current activity,} \\ 1 & \text{to switch between activities.} \end{cases}$$

Then the state at time $t_{k+1}$ is simply $x_{k+1} = (x_k + u_k) \bmod 2$, and the profit for stage $k$ is

$$g_k(x_k, u_k) = \int_{t_k}^{t_{k+1}} g_{1 + x_{k+1}}(t)\, dt - u_k c.$$

The DP algorithm is then

$$J_N(x_N) = 0, \qquad J_k(x_k) = \max_{u_k} \Big[ g_k(x_k, u_k) + J_{k+1}\big( (x_k + u_k) \bmod 2 \big) \Big].$$
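A small numerical sketch of this switching DP (the profit rates $g_1$, $g_2$, the switching cost $c$, and the horizon below are hypothetical; the crossing times have a closed form for this particular pair):

```python
import numpy as np

g1 = lambda t: 1.0 + np.sin(t)
g2 = lambda t: 1.0 + np.cos(t)
c, T = 0.3, 4 * np.pi

# Crossing times of g1 and g2 on [0, T]: sin t = cos t  =>  t = pi/4 + k*pi.
t = [0.0] + [np.pi / 4 + k * np.pi for k in range(4)] + [T]

def stage_profit(k, x_next):
    # Integral of the active rate g_{1+x_next} over [t_k, t_{k+1}] (numerical).
    s = np.linspace(t[k], t[k + 1], 2001)
    g = g1 if x_next == 0 else g2
    return float(np.mean(g(s))) * (t[k + 1] - t[k])

N = len(t) - 1                      # decisions are made at t_1, ..., t_{N-1}
J = {0: 0.0, 1: 0.0}                # J_N = 0
policy = {}
for k in range(N - 1, 0, -1):
    Jk = {}
    for x in (0, 1):
        vals = [stage_profit(k, (x + u) % 2) - u * c + J[(x + u) % 2]
                for u in (0, 1)]
        Jk[x] = max(vals)
        policy[(k, x)] = int(np.argmax(vals))
    J = Jk
print(J, policy)
```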

www

We consider part (b), since part (a) is essentially a special case. We will consider the problem of placing $N-2$ points between the endpoints $A$ and $B$ of the given subarc, and we will show that the polygon of maximal area is obtained when the $N-2$ points are equally spaced on the subarc between $A$ and $B$. Based on geometric considerations, we impose the restriction that the angle between any two successive points is no more than $\pi$.

As the subarc is traversed in the clockwise direction, we number sequentially the encountered points as $x_1, x_2, \ldots, x_N$, where $x_1$ and $x_N$ are the two endpoints $A$ and $B$ of the arc, respectively. For any point $x$ on the subarc, we denote by $\phi$ the angle between $x$ and $x_N$ (measured clockwise), and we denote by $A_k(\phi)$ the maximal area of a polygon with vertices the center of the circle, the points $x$ and $x_N$, and $N-k-1$ additional points on the subarc that lie between $x$ and $x_N$. Without loss of generality, we assume that the radius of the circle is 1, so that the area of the triangle that has as vertices two points on the circle and the center of the circle is $(1/2)\sin u$, where $u$ is the angle corresponding to the center. By viewing as state the angle $\phi_k$ between $x_k$ and $x_N$, and as control the angle $u_k$ between $x_k$ and $x_{k+1}$, we obtain the following DP algorithm:

$$A_k(\phi_k) = \max_{0 \le u_k \le \min\{\phi_k, \pi\}} \Big[ \tfrac{1}{2}\sin u_k + A_{k+1}(\phi_k - u_k) \Big], \qquad k = 1, \ldots, N-2. \tag{1}$$

Once $x_{N-1}$ is chosen, there is no issue of further choice of a point lying between $x_{N-1}$ and $x_N$, so we have

$$A_{N-1}(\phi) = \tfrac{1}{2}\sin\phi, \tag{2}$$

using the formula for the area of the triangle formed by $x_{N-1}$, $x_N$, and the center of the circle. It can be verified by induction that the above algorithm admits the closed-form solution

$$A_k(\phi_k) = \tfrac{1}{2}(N-k)\sin\Big( \frac{\phi_k}{N-k} \Big), \qquad k = 1, \ldots, N-1, \tag{3}$$

and that the optimal choice for $u_k$ is given by

$$u_k = \frac{\phi_k}{N-k}.$$

Indeed, the formula (3) holds for $k = N-1$, by Eq. (2). Assuming that Eq. (3) holds for $k+1$, we have from the DP algorithm (1)

$$A_k(\phi_k) = \max_{0 \le u_k \le \min\{\phi_k, \pi\}} H_k(u_k, \phi_k), \tag{4}$$

where

$$H_k(u_k, \phi_k) = \tfrac{1}{2}\sin u_k + \tfrac{1}{2}(N-k-1)\sin\Big( \frac{\phi_k - u_k}{N-k-1} \Big). \tag{5}$$

It can be verified that, for a fixed $\phi_k$ and in the range $0 \le u_k \le \min\{\phi_k, \pi\}$, the function $H_k(\cdot, \phi_k)$ is concave (its second derivative is negative), and its derivative is 0 only at the point $u_k = \phi_k/(N-k)$, which must therefore be its unique maximum. Substituting this value of $u_k$ in Eqs. (4) and (5), we obtain

$$A_k(\phi_k) = \tfrac{1}{2}\sin\Big( \frac{\phi_k}{N-k} \Big) + \tfrac{1}{2}(N-k-1)\sin\Big( \frac{\phi_k - \phi_k/(N-k)}{N-k-1} \Big) = \tfrac{1}{2}(N-k)\sin\Big( \frac{\phi_k}{N-k} \Big),$$

and the induction is complete. Thus, given an optimally placed point $x_k$ on the subarc with corresponding angle $\phi_k$, the next point $x_{k+1}$ is obtained by advancing clockwise by $\phi_k/(N-k)$. This process, when started at $x_1$ with $\phi_1$ equal to the angle between $x_1$ and $x_N$, yields as the optimal solution an equally spaced placement of the points on the subarc.
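The induction can also be spot-checked numerically: the sketch below (not from the book) evaluates the right-hand side of the DP recursion (1) by grid search over $u_k$, using the closed form (3) for $A_{k+1}$, and confirms that it reproduces the closed form for $A_k$.

```python
import numpy as np

N, phi1 = 6, 0.8 * np.pi

def A_closed(k, phi):
    # Closed form (3): A_k(phi) = (1/2)(N - k) sin(phi / (N - k))
    return 0.5 * (N - k) * np.sin(phi / (N - k))

for k in range(N - 2, 0, -1):
    for phi in np.linspace(0.1, phi1, 5):
        u = np.linspace(0.0, min(phi, np.pi), 20001)   # grid over the control
        dp = np.max(0.5 * np.sin(u) + A_closed(k + 1, phi - u))
        assert abs(dp - A_closed(k, phi)) < 1e-6, (k, phi)
print("closed form (3) matches the DP recursion (1) on the test grid")
```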

1.25 www

(a) Consider the problem with the state equal to the number of free rooms. At state $x \ge 1$, with $y$ customers remaining, if the innkeeper quotes a rate $r_i$, the transition probability is $p_i$ to state $x-1$ (with a reward of $r_i$) and $1 - p_i$ to state $x$ (with a reward of 0). The DP algorithm for this problem starts with the terminal conditions

$$J(x, 0) = J(0, y) = 0, \qquad x \ge 0,\ y \ge 0,$$

and is given by

$$J(x, y) = \max_{i=1,\ldots,m} \Big[ p_i\big( r_i + J(x-1, y-1) \big) + (1 - p_i)\, J(x, y-1) \Big], \qquad x \ge 1.$$

From this equation and the terminal conditions, we can compute sequentially $J(1,1), J(1,2), \ldots, J(1,y)$ up to any desired integer $y$, then calculate $J(2,1), J(2,2), \ldots, J(2,y)$, etc.

We first prove by induction on $y$ that for all $y$ we have

$$J(x, y) \ge J(x-1, y), \qquad x \ge 1.$$

Indeed, this is true for $y = 0$. Assuming it is true for a given $y$, we will prove that $J(x, y+1) \ge J(x-1, y+1)$ for all $x \ge 1$. This relation holds for $x = 1$ since $r_i > 0$. For $x \ge 2$, by using the DP recursion, this relation is written as

$$\max_{i=1,\ldots,m} \Big[ p_i\big( r_i + J(x-1, y) \big) + (1-p_i)\, J(x, y) \Big] \ge \max_{i=1,\ldots,m} \Big[ p_i\big( r_i + J(x-2, y) \big) + (1-p_i)\, J(x-1, y) \Big].$$

By the induction hypothesis, each of the terms on the left-hand side is no less than the corresponding term on the right-hand side, so the above relation holds.

The optimal rate is the one that maximizes in the DP algorithm, or equivalently, the one that maximizes

$$p_i r_i + p_i\big( J(x-1, y-1) - J(x, y-1) \big).$$

The highest rate $r_m$ simultaneously maximizes $p_i r_i$ and minimizes $p_i$. Since $J(x-1, y-1) - J(x, y-1) \le 0$, as proved above, we see that the highest rate simultaneously maximizes $p_i r_i$ and $p_i\big( J(x-1, y-1) - J(x, y-1) \big)$, and so it maximizes their sum.

(b) The algorithm given is the algorithm of Exercise 1.22 applied to the problem of part (a). Clearly, it is optimal to accept an offer of $r_i$ if $r_i$ is larger than the threshold

$$r(x, y) = J(x, y-1) - J(x-1, y-1).$$
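The DP recursion of part (a) is easy to tabulate. In the sketch below (rates and probabilities hypothetical, chosen so that $p_i r_i$ is increasing and $p_i$ decreasing), the highest rate is quoted at every state, as the argument above predicts.

```python
# J(x, y) = max_i [ p_i (r_i + J(x-1, y-1)) + (1 - p_i) J(x, y-1) ]
rates = [100.0, 150.0, 200.0]       # r_1 < r_2 < r_3
probs = [0.8, 0.65, 0.5]            # p_1 > p_2 > p_3, with p_i r_i increasing

X, Y = 5, 10                        # rooms, remaining customers
J = [[0.0] * (Y + 1) for _ in range(X + 1)]   # J(x, 0) = J(0, y) = 0
quote = {}
for y in range(1, Y + 1):
    for x in range(1, X + 1):
        vals = [p * (r + J[x - 1][y - 1]) + (1 - p) * J[x][y - 1]
                for r, p in zip(rates, probs)]
        J[x][y] = max(vals)
        quote[(x, y)] = vals.index(max(vals))
# The highest rate should be optimal everywhere:
print(all(i == len(rates) - 1 for i in quote.values()))   # True
```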

1.26 www

(a) The total net expected profit from the (buy/sell) investment decisions, after transaction costs are deducted, is

$$E\left\{ \sum_{k=0}^{N-1} \big( u_k P_k(x_k) - c\,|u_k| \big) \right\},$$

where

$$u_k = \begin{cases} 1 & \text{if a unit of stock is bought at the $k$th period,} \\ -1 & \text{if a unit of stock is sold at the $k$th period,} \\ 0 & \text{otherwise.} \end{cases}$$

With a policy that maximizes this expression, we simultaneously maximize the expected total worth of the stock held at time $N$ minus the investment costs (including sale revenues). The DP algorithm is given by

$$J_k(x_k) = \max_{u_k \in \{-1, 0, 1\}} \Big[ u_k P_k(x_k) - c\,|u_k| + E\big\{ J_{k+1}(x_{k+1}) \mid x_k \big\} \Big], \qquad J_N(x_N) = 0,$$

where $J_{k+1}(x_{k+1})$ is the optimal expected profit when the stock price is $x_{k+1}$ at time $k+1$. Since $u_k$ does not influence $x_{k+1}$, and hence does not influence $E\{J_{k+1}(x_{k+1}) \mid x_k\}$, a decision $u_k \in \{-1, 0, 1\}$ that maximizes $u_k P_k(x_k) - c\,|u_k|$ at time $k$ is optimal. Since $P_k(x_k)$ is monotonically nonincreasing in $x_k$, it follows that it is optimal to set

$$u_k = \begin{cases} 1 & \text{if } x_k \le \underline{x}_k, \\ -1 & \text{if } x_k \ge \overline{x}_k, \\ 0 & \text{otherwise,} \end{cases}$$

where $\underline{x}_k$ and $\overline{x}_k$ are as in the problem statement. Note that the optimal expected profit $J_k(x_k)$ is given by

$$J_k(x_k) = E\left\{ \sum_{i=k}^{N-1} \max_{u_i \in \{-1,0,1\}} \big[ u_i P_i(x_i) - c\,|u_i| \big] \right\}.$$

(b) Let $n_k$ be the number of units of stock held at time $k$. If $n_k$ is less than $N-k$ (the number of remaining decisions), then the value of $n_k$ should influence the decision at time $k$. We thus take as state the pair $(x_k, n_k)$, and the corresponding DP algorithm takes the form

$$V_k(x_k, n_k) = \begin{cases} \max_{u_k \in \{-1,0,1\}} \Big[ u_k P_k(x_k) - c\,|u_k| + E\big\{ V_{k+1}(x_{k+1}, n_k + u_k) \mid x_k \big\} \Big] & \text{if } n_k \ge 1, \\ \max_{u_k \in \{0,1\}} \Big[ u_k P_k(x_k) - c\,|u_k| + E\big\{ V_{k+1}(x_{k+1}, n_k + u_k) \mid x_k \big\} \Big] & \text{if } n_k = 0, \end{cases}$$

with $V_N(x_N, n_N) = 0$. Note that we have

$$V_k(x_k, n_k) = J_k(x_k), \qquad \text{if } n_k \ge N-k,$$

where $J_k(x_k)$ is given by the formula derived in part (a). Using the above DP algorithm, we can calculate $V_{N-1}(x_{N-1}, n_{N-1})$ for all values of $n_{N-1}$, then calculate $V_{N-2}(x_{N-2}, n_{N-2})$ for all values of $n_{N-2}$, etc.

To show the stated property of the optimal policy, we note that $V_k(x_k, n_k)$ is monotonically nondecreasing in $n_k$, since as $n_k$ decreases, the remaining decisions become more constrained. An optimal policy at time $k$ is to buy if

$$P_k(x_k) - c + E\big\{ V_{k+1}(x_{k+1}, n_k + 1) - V_{k+1}(x_{k+1}, n_k) \mid x_k \big\} \ge 0, \tag{1}$$

and to sell if

$$-P_k(x_k) - c + E\big\{ V_{k+1}(x_{k+1}, n_k - 1) - V_{k+1}(x_{k+1}, n_k) \mid x_k \big\} \ge 0. \tag{2}$$

The expected value in Eq. (1) is nonnegative, which implies that if $x_k \le \underline{x}_k$, implying that $P_k(x_k) - c \ge 0$, then the buying decision is optimal. Similarly, the expected value in Eq. (2) is nonpositive, which implies that if $x_k < \overline{x}_k$, implying that $-P_k(x_k) - c < 0$, then the selling decision cannot be optimal. It is possible that buying at a price greater than $\underline{x}_k$ is optimal, depending on the size of the expected value term in Eq. (1).

(c) Let $m_k$ be the number of allowed purchase decisions at time $k$, i.e., $m$ plus the number of sale decisions up to $k$, minus the number of purchase decisions up to $k$. If $m_k$ is less than $N-k$ (the number of remaining decisions), then the value of $m_k$ should influence the decision at time $k$. We thus take as state the pair $(x_k, m_k)$, and the corresponding DP algorithm takes the form

$$W_k(x_k, m_k) = \begin{cases} \max_{u_k \in \{-1,0,1\}} \Big[ u_k P_k(x_k) - c\,|u_k| + E\big\{ W_{k+1}(x_{k+1}, m_k - u_k) \mid x_k \big\} \Big] & \text{if } m_k \ge 1, \\ \max_{u_k \in \{-1,0\}} \Big[ u_k P_k(x_k) - c\,|u_k| + E\big\{ W_{k+1}(x_{k+1}, m_k - u_k) \mid x_k \big\} \Big] & \text{if } m_k = 0, \end{cases}$$

with $W_N(x_N, m_N) = 0$. From this point the analysis is similar to that of part (b).

(d) The DP algorithm takes the form

$$H_k(x_k, m_k, n_k) = \max_{u_k \in \{-1,0,1\}} \Big[ u_k P_k(x_k) - c\,|u_k| + E\big\{ H_{k+1}(x_{k+1}, m_k - u_k, n_k + u_k) \mid x_k \big\} \Big]$$

if $m_k \ge 1$ and $n_k \ge 1$, and similar formulas apply for the cases where $m_k = 0$ and/or $n_k = 0$ [compare with the DP algorithms of parts (b) and (c)].

(e) Let $r$ be the interest rate, so that $x$ invested dollars at time $k$ will become $(1+r)^{N-k} x$ dollars at time $N$. Once we redefine the expected profit $P_k(x_k)$ to be

$$P_k(x) = E\{ x_N \mid x_k = x \} - (1+r)^{N-k} x,$$

the preceding analysis applies.

Solutions Vol. I, Chapter 2

www

(a) We denote by $P_k$ the OPEN list after having removed $k$ nodes from OPEN (i.e., after having performed $k$ iterations of the algorithm). We also denote by $d_j^k$ the value of $d_j$ at this time. Let $b_k = \min_{j \in P_k} \{d_j^k\}$.

First, we show by induction that $b_0 \le b_1 \le \cdots \le b_k$. Indeed, $b_0 = 0$ and $b_1 = \min_j \{a_{sj}\} \ge 0$, which implies that $b_0 \le b_1$. Next, we assume that $b_0 \le \cdots \le b_k$ for some $k \ge 1$; we shall prove that $b_k \le b_{k+1}$. Let $j_{k+1}$ be the node removed from OPEN during the $(k+1)$st iteration. By assumption, $d_{j_{k+1}}^k = \min_{j \in P_k}\{d_j^k\} = b_k$, and we also have

$$d_i^{k+1} = \min\big\{ d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i} \big\}.$$

We have $P_{k+1} = \big( P_k \setminus \{j_{k+1}\} \big) \cup N_{k+1}$, where $N_{k+1}$ is the set of nodes $i \notin P_k$ satisfying $d_i^{k+1} = d_{j_{k+1}}^k + a_{j_{k+1} i}$. Therefore,

$$\min_{i \in P_{k+1}} \{d_i^{k+1}\} = \min\left[ \min_{i \in P_k \setminus \{j_{k+1}\}} \{d_i^{k+1}\},\ \min_{i \in N_{k+1}} \{d_i^{k+1}\} \right].$$

Clearly,

$$\min_{i \in N_{k+1}} \{d_i^{k+1}\} = \min_{i \in N_{k+1}} \big\{ d_{j_{k+1}}^k + a_{j_{k+1} i} \big\} \ge d_{j_{k+1}}^k.$$

Moreover,

$$\min_{i \in P_k \setminus \{j_{k+1}\}} \{d_i^{k+1}\} = \min_{i \in P_k \setminus \{j_{k+1}\}} \min\big\{ d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i} \big\} \ge \min\Big[ \min_{i \in P_k \setminus \{j_{k+1}\}} \{d_i^k\},\ d_{j_{k+1}}^k \Big] = \min_{i \in P_k} \{d_i^k\} = d_{j_{k+1}}^k,$$

because we remove from OPEN the node with the minimum $d_i^k$. It follows that

$$b_{k+1} = \min_{i \in P_{k+1}} \{d_i^{k+1}\} \ge d_{j_{k+1}}^k = b_k.$$

Now we may prove that once a node exits OPEN, it never re-enters. Indeed, suppose that some node $i$ exits OPEN after the $k^*$th iteration of the algorithm; then $d_i^{k^*-1} = b_{k^*-1}$. If node $i$ re-enters OPEN after the $l$th iteration (with $l > k^*$), then we have $d_i^l < d_i^{l-1}$ and $d_i^l = d_{j_l}^{l-1} + a_{j_l i} \ge d_{j_l}^{l-1} = b_{l-1}$. On the other hand, since $d_i$ is non-increasing, we have $b_{k^*-1} = d_i^{k^*-1} \ge d_i^{l-1} > d_i^l \ge b_{l-1}$. Thus we obtain $b_{k^*-1} > b_{l-1}$, which contradicts the fact that $b_k$ is non-decreasing.

Next, we claim the following: after the $k$th iteration, $d_i^k$ equals the length of the shortest possible path from $s$ to node $i \in P_k$ under the restriction that all intermediate nodes belong to $C_k$ (the set of nodes that have exited OPEN). The proof will be done by induction on $k$. For $k = 1$, we have $C_1 = \{s\}$ and $d_i^1 = a_{si}$, and the claim is obviously true. Next, we assume that the claim is true after iterations $1, \ldots, k$; we shall show that it is also true after iteration $k+1$. The node $j_{k+1}$ removed from OPEN at the $(k+1)$st iteration satisfies $\min_{i \in P_k} \{d_i^k\} = d_{j_{k+1}}^k$. Notice now that all neighbors of the nodes in $C_k$ belong either to $C_k$ or to $P_k$.

It follows that the shortest path from $s$ to $j_{k+1}$ either goes through $C_k$, or it exits $C_k$, then passes through a node $j \in P_k$, and eventually reaches $j_{k+1}$. In the latter case, the length of this path is at least equal to the length of the shortest path from $s$ to $j$ through $C_k$; by the induction hypothesis, this equals $d_j^k$, which is at least $d_{j_{k+1}}^k$. It follows that, for node $j_{k+1}$ exiting the OPEN list, $d_{j_{k+1}}^k$ equals the length of the shortest path from $s$ to $j_{k+1}$. Similarly, all nodes that have exited previously have their current estimate of $d_i$ equal to the corresponding shortest distance from $s$.

Notice now that

$$d_i^{k+1} = \min\big\{ d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i} \big\}.$$

For $i \notin P_k$ and $i \in P_{k+1}$, the only neighbor of $i$ in $C_{k+1} = C_k \cup \{j_{k+1}\}$ is node $j_{k+1}$; for such a node $i$, $d_i^k = \infty$, which leads to $d_i^{k+1} = d_{j_{k+1}}^k + a_{j_{k+1} i}$. For $i \neq j_{k+1}$ and $i \in P_k$, the augmentation of $C_k$ by including $j_{k+1}$ offers one more type of path from $s$ to $i$ through $C_{k+1}$, namely through $j_{k+1}$. Recall that the shortest path from $s$ to $i$ through $C_k$ has length $d_i^k$ (by the induction hypothesis). Thus $d_i^{k+1} = \min\big\{ d_i^k,\ d_{j_{k+1}}^k + a_{j_{k+1} i} \big\}$ is the length of the shortest path from $s$ to $i$ through $C_{k+1}$.

The fact that each node exits OPEN with its current estimate of $d_i$ equal to its shortest distance from $s$ has been proved in the course of the previous inductive argument.

(b) Since each node enters the OPEN list at most once, the algorithm will terminate in at most $N-1$ iterations. Updating the $d_i$'s during an iteration and selecting the node to exit OPEN requires $O(N)$ arithmetic operations (i.e., a constant number of operations per node). Thus, the total number of operations is $O(N^2)$.
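For reference, here is a compact Python sketch of the label-setting method analyzed above (not from the book; it uses a binary-heap OPEN list, a standard variant of the $O(N)$ scan that gives the $O(N^2)$ bound of part (b)):

```python
import heapq

def dijkstra(adj, s):
    """adj[i] = list of (j, a_ij) with a_ij >= 0. Nodes exit OPEN in order of
    nondecreasing label, and each final label is the shortest distance from s."""
    d = {s: 0.0}
    done = set()                    # the set C_k of nodes that have exited OPEN
    open_list = [(0.0, s)]          # OPEN, kept as a priority queue
    while open_list:
        di, i = heapq.heappop(open_list)
        if i in done:
            continue                # stale entry: node has already exited OPEN
        done.add(i)
        for j, a_ij in adj.get(i, []):
            if j not in done and di + a_ij < d.get(j, float("inf")):
                d[j] = di + a_ij
                heapq.heappush(open_list, (d[j], j))
    return d

adj = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.5), (3, 5.0)], 2: [(3, 1.0)]}
print(dijkstra(adj, 0))             # {0: 0.0, 1: 1.0, 2: 2.5, 3: 3.5}
```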
2.6 www

Proposition: If there exists a path from the origin to each node in $T$, the modified version of the label correcting algorithm terminates with UPPER $< \infty$ and yields a shortest path from the origin to each node in $T$. Otherwise the algorithm terminates with UPPER $= \infty$.

Proof: The proof is analogous to the proof of Proposition 3.1. To show that this algorithm terminates, we can use the identical argument in the proof of Proposition 3.1.

Now suppose that for some node $t \in T$, there is no path from $s$ to $t$. Then a node $i$ such that $(i, t)$ is an arc cannot enter the OPEN list, because this would establish that there is a path from $s$ to $i$, and therefore also a path from $s$ to $t$. Thus, $d_t$ is never changed, and UPPER is never reduced from its initial value of $\infty$.

Suppose now that there is a path from $s$ to each node $t \in T$. Then, since there is a finite number of distinct lengths of paths from $s$ to each $t \in T$ that do not contain any cycles, and each cycle has nonnegative length, there is also a shortest path. For an arbitrary $t \in T$, let $(s, j_1, j_2, \ldots, j_k, t)$ be a shortest path, and let $d_t^*$ be the corresponding shortest distance. We will show that the value of UPPER upon termination must be equal to $d^* = \max_{t \in T} d_t^*$. Indeed, each subpath $(s, j_1, \ldots, j_m)$, $m = 1, \ldots, k$, of the shortest path $(s, j_1, \ldots, j_k, t)$ must be a shortest path from $s$ to $j_m$. If the value of UPPER is larger than $d^*$ at termination, the same must be true throughout the algorithm, and therefore UPPER will also be larger than the length of all the paths $(s, j_1, \ldots, j_m)$, $m = 1, \ldots, k$, throughout the algorithm, in view of the nonnegative arc length assumption. On the other hand, if for each $t \in T$ the parent node $j_k$ enters the OPEN list with $d_{j_k}$ equal to the shortest distance from $s$ to $j_k$, then UPPER will be set to $d^*$ in step 2, immediately following the time the last of the nodes $j_k$ is examined by the algorithm in step 2.

It follows that, for some $t \in T$, the associated parent node $j_k$ never enters the OPEN list with $d_{j_k}$ equal to the shortest distance from $s$ to $j_k$. Similarly, and using also the nonnegative arc length assumption, node $j_{k-1}$ never enters the OPEN list with $d_{j_{k-1}}$ equal to the shortest distance from $s$ to $j_{k-1}$. Proceeding backwards, we conclude that $j_1$ never enters the OPEN list with $d_{j_1}$ equal to the shortest distance from $s$ to $j_1$ [which is equal to the length of the arc $(s, j_1)$]. This happens, however, at the first iteration of the algorithm, yielding a contradiction. It follows that at termination, UPPER is equal to $d^*$.

Finally, it can be seen that, upon termination of the algorithm, the path constructed by tracing the parent nodes backward from $t$ to $s$ has length equal to $d_t^*$ for each $t \in T$. Thus the path is a shortest path from $s$ to $t$.

www

(a) We first need to show that $d_i^k$ is the length of the shortest $k$-arc path originating at $i$, for $i \neq t$. For $k = 1$,

$$d_i^1 = \min_j c_{ij},$$

which is the length of the shortest arc out of $i$. Assume that $d_j^{k-1}$ is the length of the shortest $(k-1)$-arc path out of $j$. Then

$$d_i^k = \min_j \big\{ c_{ij} + d_j^{k-1} \big\}.$$

If $d_i^k$ were not the length of the shortest $k$-arc path out of $i$, the initial arc of that path would have to pass through some node $l$ other than the minimizing node $j$ above. Since $d_l^{k-1}$ is the length of the shortest $(k-1)$-arc path out of $l$, the optimality principle gives

$$\text{length of the path through } l = c_{il} + d_l^{k-1} \ge \min_j \big\{ c_{ij} + d_j^{k-1} \big\} = d_i^k,$$

which contradicts the choice of that path. Thus, $d_i^k$ is the length of the shortest $k$-arc path out of $i$.

Since $d_t^k = 0$ for all $k$, once a shortest $k$-arc path out of $i$ reaches $t$, we have $d_i^\kappa = d_i^k$ for all $\kappa \ge k$. But with all arc lengths positive, $d_i^k$ is then just the shortest distance from $i$ to $t$. Clearly, there is some finite $k$ such that the shortest $k$-arc path out of $i$ reaches $t$; if this were not true, the assumption of positive arc lengths would imply that the distance from $i$ to $t$ is infinite. Thus, the algorithm yields the shortest distances in a finite number of steps. Since every arc on a path has length at least $\min_{j,k} c_{jk}$, we can estimate the number of steps $N_i$ as

$$N_i \le \frac{d_{it}}{\min_{j,k} c_{jk}},$$

where $d_{it}$ is the shortest distance from $i$ to $t$.

(b) Let $d_i^k$ be the distance estimate generated using the initial condition $d_i^0 = \infty$, and let $\underline{d}_i^k$ be the estimate generated using the initial condition $\underline{d}_i^0 = 0$. In addition, let $d_i$ be the shortest distance from $i$ to $t$.

Lemma:

$$\underline{d}_i^k \le \underline{d}_i^{k+1} \le d_i \le d_i^{k+1} \le d_i^k, \tag{1}$$

$$\underline{d}_i^k = d_i = d_i^k \quad \text{for } k \text{ sufficiently large.} \tag{2}$$

Proof: Relation (1) follows from the monotonicity property of DP. Note that $d_i^1 \le d_i^0$ and that $\underline{d}_i^1 \ge \underline{d}_i^0$. Equation (2) follows immediately from the convergence of DP (given $d_i^0 = \infty$) and from part (a).

Proposition: For every $k$ there exists a time $T_k$ such that for all $T \ge T_k$,

$$\underline{d}_i^k \le d_i^T \le d_i^k, \qquad i = 1, 2, \ldots, N.$$

Proof: The proof follows by induction. For $k = 0$ the proposition is true, given the positive arc length assumption. Assume it is true for a given $k$. Let $N(i)$ be the set of all nodes adjacent to $i$. For every $j \in N(i)$ there exists a time $T_k^j$ such that

$$\underline{d}_j^k \le d_j^T \le d_j^k, \qquad T \ge T_k^j.$$

Let $\bar{T}$ be the first time $i$ updates its distance estimate after all the estimates $d_j^{T_k^j}$, $j \in N(i)$, have arrived. Let $d_{ij}^{\bar{T}}$ be the estimate of $d_j$ that $i$ has at time $\bar{T}$. Note that this may differ from $d_j^{T_k^j}$, since later estimates from $j$ may have arrived before $\bar{T}$. From the Lemma,

$$\underline{d}_j^k \le d_{ij}^{\bar{T}} \le d_j^k,$$

which, coupled with the monotonicity of DP, implies

$$\underline{d}_i^{k+1} \le d_i^T \le d_i^{k+1}, \qquad T \ge \bar{T}.$$

Since each node never stops transmitting, $\bar{T}$ is finite, and the proposition is proved.

Using the Lemma, we see that there is a finite $k$ such that $\underline{d}_i^\kappa = d_i = d_i^\kappa$ for all $\kappa \ge k$. Thus, from the proposition, there exists a finite time $\bar{T}$ such that $d_i^T = d_i$ for all $T \ge \bar{T}$ and all $i$.
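A minimal implementation of the iteration of part (a) (illustrative; graph data hypothetical):

```python
import math

def k_arc_dp(c, t, num_iters):
    """d_i^k = min_j { c_ij + d_j^{k-1} }, with d_t = 0 and d_i^0 = +inf.
    c is a dict of dicts: c[i][j] = length of arc (i, j) > 0."""
    nodes = set(c) | {j for i in c for j in c[i]}
    d = {i: (0.0 if i == t else math.inf) for i in nodes}
    for _ in range(num_iters):
        d_new = {i: 0.0 if i == t else
                 min((cij + d[j] for j, cij in c.get(i, {}).items()),
                     default=math.inf)
                 for i in nodes}
        if d_new == d:              # converged to the shortest distances
            break
        d = d_new
    return d

c = {1: {2: 1.0, 3: 4.0}, 2: {3: 1.5, 4: 5.0}, 3: {4: 1.0}}
print(sorted(k_arc_dp(c, t=4, num_iters=10).items()))
# [(1, 3.5), (2, 2.5), (3, 1.0), (4, 0.0)]
```

Initializing with $d_i^0 = 0$ instead of $+\infty$ gives the monotonically nondecreasing sequence $\underline{d}_i^k$ used in part (b).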

Solutions Vol. I, Chapter 3

www

This problem is similar to the Brachistochrone Problem (Example 4.2) described in the text. As in that problem, we introduce the system $\dot{x} = u$ and have a fixed terminal state problem [$x(0) = a$ and $x(T) = b$]. Letting

$$g(x, u) = \frac{\sqrt{1 + u^2}}{Cx},$$

the Hamiltonian is

$$H(x, u, p) = g(x, u) + pu.$$

Minimization of the Hamiltonian with respect to $u$ yields

$$p(t) = -\nabla_u g\big( x(t), u(t) \big).$$

Since the Hamiltonian is constant along an optimal trajectory, we have

$$g\big( x(t), u(t) \big) - \nabla_u g\big( x(t), u(t) \big)\, u(t) = \text{constant}.$$

Substituting in the expression for $g$, we have

$$\frac{\sqrt{1+u^2}}{Cx} - \frac{u^2}{Cx\sqrt{1+u^2}} = \frac{1}{Cx\sqrt{1+u^2}} = \text{constant},$$

which simplifies to

$$\big( x(t) \big)^2 \Big( 1 + \big( \dot{x}(t) \big)^2 \Big) = \text{constant} = D.$$

Thus an optimal trajectory satisfies the differential equation

$$\dot{x}(t) = \frac{\sqrt{D - \big( x(t) \big)^2}}{x(t)}.$$

It can be seen through straightforward calculation that the curve

$$\big( x(t) \big)^2 + (t - d)^2 = D$$

satisfies this differential equation, and thus the curve of minimum travel time from $A$ to $B$ is an arc of a circle.

3.9 www

We have the system $\dot{x}(t) = Ax(t) + Bu(t)$, for which we want to minimize the quadratic cost

$$x(T)' Q x(T) + \int_0^T \big( x(t)' Q x(t) + u(t)' R u(t) \big)\, dt.$$

The Hamiltonian here is

$$H(x, u, p) = x'Qx + u'Ru + p'(Ax + Bu),$$

and the adjoint equation is

$$\dot{p}(t) = -A'p(t) - 2Qx(t),$$

with the terminal condition

$$p(T) = 2Qx(T).$$

Minimizing the Hamiltonian with respect to $u$ yields the optimal control

$$u^*(t) = \arg\min_u \big[ x^*(t)'Qx^*(t) + u'Ru + p(t)'\big( Ax^*(t) + Bu \big) \big] = -\tfrac{1}{2} R^{-1} B' p(t).$$

We now hypothesize a linear relation between $x^*(t)$ and $p(t)$:

$$2K(t)\, x^*(t) = p(t), \qquad t \in [0, T],$$

and show that $K(t)$ can be obtained by solving the Riccati equation. Substituting this value of $p(t)$ into the previous equation, we have

$$u^*(t) = -R^{-1} B' K(t)\, x^*(t).$$

By combining this result with the system equation, we have

$$\dot{x}^*(t) = \big( A - BR^{-1}B'K(t) \big) x^*(t). \tag{1}$$

Differentiating $2K(t)x^*(t) = p(t)$ and using the adjoint equation yields

$$2\dot{K}(t)x^*(t) + 2K(t)\dot{x}^*(t) = -2A'K(t)x^*(t) - 2Qx^*(t).$$

Combining with Eq. (1), we have

$$\dot{K}(t)x^*(t) + K(t)\big( A - BR^{-1}B'K(t) \big)x^*(t) = -A'K(t)x^*(t) - Qx^*(t),$$

and we thus see that $K(t)$ should satisfy the Riccati equation

$$\dot{K}(t) = -K(t)A - A'K(t) + K(t)BR^{-1}B'K(t) - Q.$$

From the terminal condition $p(T) = 2Qx(T)$, we have $K(T) = Q$, from which we can solve for $K(t)$ using the Riccati equation. Once we have $K(t)$, we have the optimal control $u^*(t) = -R^{-1}B'K(t)x^*(t)$. By reversing the previous arguments, this control can then be shown to satisfy all the conditions of the Pontryagin Minimum Principle.
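As a numerical companion (not part of the solution), the Riccati equation can be integrated backward from $K(T) = Q$ with a fixed-step RK4 scheme; the system data below are hypothetical.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
T, steps = 5.0, 5000
h = T / steps

def K_dot(K):
    # Riccati ODE: K' = -K A - A' K + K B R^{-1} B' K - Q
    return -K @ A - A.T @ K + K @ B @ np.linalg.solve(R, B.T) @ K - Q

K = Q.copy()                        # terminal condition K(T) = Q
for _ in range(steps):              # RK4 steps backward from t = T to t = 0
    k1 = K_dot(K)
    k2 = K_dot(K - 0.5 * h * k1)
    k3 = K_dot(K - 0.5 * h * k2)
    k4 = K_dot(K - h * k3)
    K -= (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

print("K(0) =\n", K)
print("gain R^{-1} B' K(0) =", np.linalg.solve(R, B.T @ K))  # u*(0) = -gain @ x
```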

Solutions Vol. I, Chapter 4

www

(a) Clearly, the function $J_N$ is continuous. Assume that $J_{k+1}$ is continuous. We have

$$J_k(x) = \min_{u \in \{0,1,\ldots\}} \big\{ cu + L(x+u) + G(x+u) \big\},$$

where

$$G(y) = E_{w_k}\big\{ J_{k+1}(y - w_k) \big\}, \qquad L(y) = E_{w_k}\big\{ p \max(0, w_k - y) + h \max(0, y - w_k) \big\}.$$

Thus, $L$ is continuous. Since $J_{k+1}$ is continuous, $G$ is continuous for bounded $w_k$. Assume that $J_k$ is not continuous. Then there exists an $\hat{x}$ such that, as $y \to \hat{x}$, $J_k(y)$ does not approach $J_k(\hat{x})$. Let

$$u^y = \arg\min_{u \in \{0,1,\ldots\}} \big\{ cu + L(y+u) + G(y+u) \big\}.$$

Since $L$ and $G$ are continuous, the discontinuity of $J_k$ at $\hat{x}$ implies that

$$\lim_{y \to \hat{x}} u^y \neq u^{\hat{x}}.$$

But since $u^y$ is optimal for $y$,

$$\lim_{y \to \hat{x}} \big\{ cu^y + L(y+u^y) + G(y+u^y) \big\} < \lim_{y \to \hat{x}} \big\{ cu^{\hat{x}} + L(y+u^{\hat{x}}) + G(y+u^{\hat{x}}) \big\} = J_k(\hat{x}).$$

This contradicts the optimality of $u^{\hat{x}}$ at $\hat{x}$. Thus, $J_k$ is continuous.

(b) Let

$$Y_k(x) = J_k(x+1) - J_k(x).$$

Clearly $Y_N(x)$ is a nondecreasing function. Assume that $Y_{k+1}(x)$ is nondecreasing. Writing $J_k(x) = cu^x + L(x+u^x) + G(x+u^x)$, we have

$$\begin{aligned} Y_k(x+\delta) - Y_k(x) = {} & c\big( u^{x+\delta+1} - u^{x+\delta} \big) - c\big( u^{x+1} - u^x \big) \\ & + L\big( x+\delta+1+u^{x+\delta+1} \big) - L\big( x+\delta+u^{x+\delta} \big) - \big[ L\big( x+1+u^{x+1} \big) - L\big( x+u^x \big) \big] \\ & + G\big( x+\delta+1+u^{x+\delta+1} \big) - G\big( x+\delta+u^{x+\delta} \big) - \big[ G\big( x+1+u^{x+1} \big) - G\big( x+u^x \big) \big]. \end{aligned}$$

Since $J_k$ is continuous, $u^{y+\delta} = u^y$ for $\delta$ sufficiently small. Thus, with $\delta$ small,

$$\begin{aligned} Y_k(x+\delta) - Y_k(x) = {} & L\big( x+\delta+1+u^{x+1} \big) - L\big( x+\delta+u^x \big) - \big[ L\big( x+1+u^{x+1} \big) - L\big( x+u^x \big) \big] \\ & + G\big( x+\delta+1+u^{x+1} \big) - G\big( x+\delta+u^x \big) - \big[ G\big( x+1+u^{x+1} \big) - G\big( x+u^x \big) \big]. \end{aligned}$$

Now, since the control and penalty costs are linear, the optimal order given a stock of $x$ is at most the optimal order given $x+1$ units of stock, plus one unit. Thus

$$u^{x+1} \le u^x \le u^{x+1} + 1.$$

If $u^x = u^{x+1} + 1$, then $Y_k(x+\delta) - Y_k(x) = 0$, and we have the desired result. Assume that $u^x = u^{x+1}$. Since $L(x)$ is convex, $L(x+1) - L(x)$ is nondecreasing. Using the assumption that $Y_{k+1}(x)$ is nondecreasing, we have

$$Y_k(x+\delta) - Y_k(x) = \underbrace{ L\big( x+\delta+1+u^x \big) - L\big( x+\delta+u^x \big) - \big[ L\big( x+1+u^x \big) - L\big( x+u^x \big) \big] }_{\ge 0} + \underbrace{ E_{w_k}\Big\{ J_{k+1}\big( x+\delta+1+u^x-w_k \big) - J_{k+1}\big( x+\delta+u^x-w_k \big) - \big[ J_{k+1}\big( x+1+u^x-w_k \big) - J_{k+1}\big( x+u^x-w_k \big) \big] \Big\} }_{\ge 0} \ge 0.$$

Thus, $Y_k(x)$ is a nondecreasing function of $x$.

(c) From their definition and a straightforward induction, it can be shown that $J_k(x)$ and $J_k(x, u)$ are bounded below. Furthermore, since $\lim_{x \to -\infty} L_k(x, u) = \infty$, we obtain $\lim_{x \to -\infty} J_k(x, 0) = \infty$. From the definition of $J_k(x, u)$, we have

$$J_k(x, u) = J_k(x+1, u-1) + c, \qquad u \in \{1, 2, \ldots\}. \tag{1}$$

Let $S_k$ be the smallest real number satisfying

$$J_k(S_k, 0) = J_k(S_k+1, 0) + c. \tag{2}$$

We show that $S_k$ is well defined. If no $S_k$ satisfying Eq. (2) exists, we must have either $J_k(x,0) - J_k(x+1,0) > c$ for all $x \in \mathbb{R}$, or $J_k(x,0) - J_k(x+1,0) < c$ for all $x \in \mathbb{R}$, because $J_k$ is continuous. The first possibility contradicts the fact that $J_k(x,0)$ is bounded below (it would imply $J_k(x,0) \to -\infty$ as $x \to \infty$). The second possibility implies that $J_k(x,0) + cx$ is increasing in $x$, so that $\lim_{x \to -\infty}\big( J_k(x,0) + cx \big) < \infty$. However, using the boundedness of $J_{k+1}(x)$ from below, we obtain $\lim_{x \to -\infty}\big( J_k(x,0) + cx \big) = \infty$. The contradiction shows that $S_k$ is well defined.

We now derive the form of an optimal policy $\mu_k(x)$. Fix some $x$ and consider first the case $x \ge S_k$. Using the fact that $J_k(x+1, u) - J_k(x, u)$ is a nondecreasing function of $x$, we have for any $u \in \{0, 1, 2, \ldots\}$

$$J_k(x+1, u) - J_k(x, u) \ge J_k(S_k+1, u) - J_k(S_k, u) = J_k(S_k+1, 0) - J_k(S_k, 0) = -c,$$

where the middle equality uses Eq. (1). Therefore,

$$J_k(x, u+1) = J_k(x+1, u) + c \ge J_k(x, u), \qquad u \in \{0, 1, \ldots\},\ x \ge S_k.$$

This shows that $u = 0$ minimizes $J_k(x, u)$ for all $x \ge S_k$. Now let $x \in [S_k - n, S_k - n + 1)$, $n \in \{1, 2, \ldots\}$. Using Eq. (1), we have

$$J_k(x, n+m) - J_k(x, n) = J_k(x+n, m) - J_k(x+n, 0) \ge 0, \qquad m \in \{0, 1, \ldots\}, \tag{3}$$

since $x + n \ge S_k$. However, if $u < n$, then $x + u < S_k$, and

$$J_k(x+u+1, 0) - J_k(x+u, 0) < J_k(S_k+1, 0) - J_k(S_k, 0) = -c,$$

because $S_k$ is the smallest number satisfying Eq. (2) and the difference $J_k(x+1, 0) - J_k(x, 0)$ is nondecreasing in $x$. Therefore,

$$J_k(x, u+1) = J_k(x+u+1, 0) + (u+1)c < J_k(x+u, 0) + uc = J_k(x, u), \qquad u < n. \tag{4}$$

Inequalities (3) and (4) show that $u = n$ minimizes $J_k(x, u)$ whenever $x \in [S_k - n, S_k - n + 1)$.
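The multi-threshold structure just derived, $\mu_k(x) = n$ for $x \in [S_k - n, S_k - n + 1)$ (roughly, order up to $S_k$), can be observed numerically. The sketch below uses hypothetical cost data and a truncated integer grid:

```python
import numpy as np

c, p, h = 1.0, 3.0, 1.0                  # order, shortage, holding costs (p > c)
w_vals = np.array([0, 1, 2, 3])          # demand support
w_probs = np.array([0.2, 0.4, 0.3, 0.1])
xs = np.arange(-10, 21)                  # integer stock levels (truncated grid)
U = 12                                   # order cap, large enough to be inactive

def L(y):
    return float(np.sum(w_probs * (p * np.maximum(w_vals - y, 0)
                                   + h * np.maximum(y - w_vals, 0))))

def G(y, J_next):
    # E[J_{k+1}(y - w)]; y - w is clipped to the grid (a truncation artifact)
    idx = np.clip(y - w_vals - xs[0], 0, len(xs) - 1)
    return float(np.sum(w_probs * J_next[idx]))

J = np.zeros(len(xs))                    # J_N = 0
for k in range(3):                       # a few backward DP steps
    J_new = np.zeros(len(xs))
    policy = np.zeros(len(xs), dtype=int)
    for ix, x in enumerate(xs):
        vals = [c * u + L(x + u) + G(x + u, J) for u in range(U + 1)]
        policy[ix] = int(np.argmin(vals))
        J_new[ix] = vals[policy[ix]]
    J = J_new
# Below the threshold, x + u*(x) is constant (order up to S_k):
print({int(x): int(u) for x, u in zip(xs, policy)})
```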

www

Let the state $x_k$ be defined as

$$x_k = \begin{cases} T & \text{if the selection has already terminated,} \\ 1 & \text{if the $k$th object observed has rank 1 among the first $k$,} \\ 0 & \text{if the $k$th object observed has rank greater than 1.} \end{cases}$$

The system evolves according to

$$x_{k+1} = \begin{cases} T & \text{if } u_k = \text{stop or } x_k = T, \\ w_k & \text{if } u_k = \text{continue.} \end{cases}$$

The reward per stage is

$$g_k(x_k, u_k, w_k) = \begin{cases} \dfrac{k}{N} & \text{if } x_k = 1 \text{ and } u_k = \text{stop,} \\ 0 & \text{otherwise,} \end{cases} \qquad g_N(x_N) = \begin{cases} 1 & \text{if } x_N = 1, \\ 0 & \text{otherwise} \end{cases}$$

(if we stop at a rank-1 object at stage $k$, the probability that it is best overall is $k/N$). Note that if termination is selected at stage $k$ and $x_k \neq 1$, the probability of success is 0; thus, if $x_k = 0$ it is always optimal to continue. To complete the model we have to determine $P(w_k \mid x_k, u_k) = P(w_k)$ when the control is $u_k = \text{continue}$. At stage $k$ we have already observed $k$ objects, and since we know nothing else about them, the new object can, with equal probability, fall into any of the $k+1$ rank positions relative to the objects already observed. Thus,

$$P(w_k = 1) = \frac{1}{k+1}, \qquad P(w_k = 0) = \frac{k}{k+1}.$$

Proposition: If $k \in S_N = \Big\{ i \;\Big|\; \dfrac{1}{i} + \dfrac{1}{i+1} + \cdots + \dfrac{1}{N-1} \le 1 \Big\}$, then

$$J_k(0) = \frac{k}{N}\Big( \frac{1}{k} + \cdots + \frac{1}{N-1} \Big), \qquad J_k(1) = \frac{k}{N}.$$

Proof: For $k = N-1$,

$$J_{N-1}(0) = \max\Big[ \underbrace{0}_{\text{stop}},\ \underbrace{E\{g_N(w_{N-1})\}}_{\text{continue}} \Big] = \frac{1}{N}, \qquad J_{N-1}(1) = \max\Big[ \underbrace{\tfrac{N-1}{N}}_{\text{stop}},\ \underbrace{\tfrac{1}{N}}_{\text{continue}} \Big] = \frac{N-1}{N},$$

so $\mu_{N-1}^*(0) = \text{continue}$ and $\mu_{N-1}^*(1) = \text{stop}$. Note that $N-1 \in S_N$ for all $N$.

Assume the proposition is true for $J_{k+1}(x_{k+1})$. Then

$$E\big\{ J_{k+1}(w_k) \big\} = \frac{1}{k+1} \cdot \frac{k+1}{N} + \frac{k}{k+1} \cdot \frac{k+1}{N}\Big( \frac{1}{k+1} + \cdots + \frac{1}{N-1} \Big) = \frac{k}{N}\Big( \frac{1}{k} + \cdots + \frac{1}{N-1} \Big).$$

Now,

$$J_k(0) = \max\Big[ \underbrace{0}_{\text{stop}},\ \underbrace{E\{J_{k+1}(w_k)\}}_{\text{continue}} \Big], \qquad J_k(1) = \max\Big[ \underbrace{\tfrac{k}{N}}_{\text{stop}},\ \underbrace{E\{J_{k+1}(w_k)\}}_{\text{continue}} \Big].$$

Clearly

$$J_k(0) = \frac{k}{N}\Big( \frac{1}{k} + \cdots + \frac{1}{N-1} \Big),$$

and $\mu_k^*(0) = \text{continue}$. If $k \in S_N$, then $\tfrac{1}{k} + \cdots + \tfrac{1}{N-1} \le 1$, so

$$J_k(1) = \frac{k}{N},$$

and $\mu_k^*(1) = \text{stop}$. Q.E.D.

Proposition: If $k \notin S_N$, then

$$J_k(0) = J_k(1) = \frac{\delta-1}{N}\Big( \frac{1}{\delta-1} + \cdots + \frac{1}{N-1} \Big),$$

where $\delta$ is the minimum element of $S_N$.

Proof: For $k = \delta - 1$,

$$J_{\delta-1}(0) = \frac{1}{\delta} \cdot \frac{\delta}{N} + \frac{\delta-1}{\delta} \cdot \frac{\delta}{N}\Big( \frac{1}{\delta} + \cdots + \frac{1}{N-1} \Big) = \frac{\delta-1}{N}\Big( \frac{1}{\delta-1} + \cdots + \frac{1}{N-1} \Big),$$

$$J_{\delta-1}(1) = \max\left[ \frac{\delta-1}{N},\ \frac{\delta-1}{N}\Big( \frac{1}{\delta-1} + \cdots + \frac{1}{N-1} \Big) \right] = \frac{\delta-1}{N}\Big( \frac{1}{\delta-1} + \cdots + \frac{1}{N-1} \Big),$$

since $\delta - 1 \notin S_N$ implies $\tfrac{1}{\delta-1} + \cdots + \tfrac{1}{N-1} > 1$; thus $\mu_{\delta-1}^*(0) = \mu_{\delta-1}^*(1) = \text{continue}$.

Assume the proposition is true for $J_k(x_k)$. Then

$$J_{k-1}(0) = \frac{1}{k} J_k(1) + \frac{k-1}{k} J_k(0) = J_k(0),$$

and $\mu_{k-1}^*(0) = \text{continue}$;

$$J_{k-1}(1) = \max\left[ \frac{k-1}{N},\ \frac{1}{k}J_k(1) + \frac{k-1}{k}J_k(0) \right] = \max\left[ \frac{k-1}{N},\ \frac{\delta-1}{N}\Big( \frac{1}{\delta-1} + \cdots + \frac{1}{N-1} \Big) \right] = J_k(0),$$

and $\mu_{k-1}^*(1) = \text{continue}$. Q.E.D.

Thus the optimal policy is to continue until the $\delta$th object, where $\delta$ is the minimum integer such that

$$\frac{1}{\delta} + \cdots + \frac{1}{N-1} \le 1,$$

and then stop the first time an object is observed that has rank 1, i.e., is best among those observed so far.
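A quick Monte Carlo check of this rule (not part of the solution): the simulated success probability should match $J_1 = \frac{\delta-1}{N}\big( \frac{1}{\delta-1} + \cdots + \frac{1}{N-1} \big)$.

```python
import random

N = 50
delta = next(k for k in range(1, N + 1)
             if sum(1.0 / i for i in range(k, N)) <= 1)
J1 = (delta - 1) / N * sum(1.0 / i for i in range(delta - 1, N))

wins, trials = 0, 200_000
for _ in range(trials):
    ranks = list(range(N))              # 0 denotes the best object
    random.shuffle(ranks)
    best_seen = min(ranks[:delta - 1], default=N)   # seen before stopping is allowed
    chosen = next((r for r in ranks[delta - 1:] if r < best_seen), ranks[-1])
    wins += (chosen == 0)
print(delta, J1, wins / trials)         # the last two numbers should agree closely
```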

www

(a) In order that $A_k x + B_k u + w \in X$ for all $w \in W_k$, it is sufficient that $A_k x + B_k u$ belong to some ellipsoid $\tilde{X}$ such that the vector sum of $\tilde{X}$ and $W_k$ is contained in $X$. The ellipsoid

$$\tilde{X} = \{ z \mid z' F z \le 1 \},$$

where, for some scalar $\beta \in (0, 1)$,

$$F^{-1} = (1 - \beta)\big( \Psi^{-1} - \beta^{-1} D_k^{-1} \big),$$

has this property [based on the hint, and assuming that $F^{-1}$ is well defined as a positive definite matrix]. Thus, it is sufficient that $x$ and $u$ are such that

$$(A_k x + B_k u)' F (A_k x + B_k u) \le 1. \tag{1}$$

In order that, for a given $x$, there exists $u$ with $u' R_k u \le 1$ such that Eq. (1) is satisfied, as well as

$$x' \Xi x \le 1,$$

it is sufficient that $x$ be such that

$$\min_{u \in \mathbb{R}^m} \big[ x' \Xi x + u' R_k u + (A_k x + B_k u)' F (A_k x + B_k u) \big] \le 1, \tag{2}$$

or, by carrying out explicitly the quadratic minimization above,

$$x' K x \le 1,$$

where

$$K = A_k' \big( F^{-1} + B_k R_k^{-1} B_k' \big)^{-1} A_k + \Xi.$$

The control law

$$\mu(x) = -\big( R_k + B_k' F B_k \big)^{-1} B_k' F A_k x$$

attains the minimum in Eq. (2) for all $x$, so it achieves reachability.

(b) Follows by iterative application of the results of part (a), starting with $k = N-1$ and proceeding backwards.

(c) Follows from the arguments of part (a).

Solutions Vol. I, Chapter 5

www

Define

$$y_N = x_N, \qquad y_k = x_k + A_k^{-1} w_k + A_k^{-1} A_{k+1}^{-1} w_{k+1} + \cdots + A_k^{-1} A_{k+1}^{-1} \cdots A_{N-1}^{-1} w_{N-1}.$$

Then

$$y_k = x_k + A_k^{-1}(w_k - x_{k+1}) + A_k^{-1} y_{k+1} = x_k + A_k^{-1}(-A_k x_k - B_k u_k) + A_k^{-1} y_{k+1} = -A_k^{-1} B_k u_k + A_k^{-1} y_{k+1},$$

and

$$y_{k+1} = A_k y_k + B_k u_k.$$

Now, the cost function is the expected value of

$$x_N' Q x_N + \sum_{k=0}^{N-1} u_k' R_k u_k = y_0' K_0 y_0 + \sum_{k=0}^{N-1} \big( y_{k+1}' K_{k+1} y_{k+1} - y_k' K_k y_k + u_k' R_k u_k \big),$$

where the matrices $K_k$, $P_k$, $L_k$ are generated by the Riccati-type recursion (with $K_N = Q$)

$$P_k = B_k' K_{k+1} B_k + R_k, \qquad L_k = -P_k^{-1} B_k' K_{k+1} A_k, \qquad K_k = A_k' K_{k+1} A_k - A_k' K_{k+1} B_k P_k^{-1} B_k' K_{k+1} A_k.$$

We have

$$\begin{aligned} y_{k+1}' K_{k+1} y_{k+1} - y_k' K_k y_k + u_k' R_k u_k &= (A_k y_k + B_k u_k)' K_{k+1} (A_k y_k + B_k u_k) - y_k' K_k y_k + u_k' R_k u_k \\ &= y_k' \big[ A_k' K_{k+1} B_k P_k^{-1} B_k' K_{k+1} A_k \big] y_k + 2 y_k' A_k' K_{k+1} B_k u_k + u_k' P_k u_k \\ &= y_k' L_k' P_k L_k y_k - 2 y_k' L_k' P_k u_k + u_k' P_k u_k = (u_k - L_k y_k)' P_k (u_k - L_k y_k). \end{aligned}$$

Thus, the cost function can be written as

$$E\left\{ y_0' K_0 y_0 + \sum_{k=0}^{N-1} (u_k - L_k y_k)' P_k (u_k - L_k y_k) \right\}.$$

The problem now is to find $\mu_k^*(I_k)$, $k = 0, 1, \ldots, N-1$, that minimize over the admissible control laws $\mu_k(I_k)$, $k = 0, 1, \ldots, N-1$, the cost function

$$E\left\{ y_0' K_0 y_0 + \sum_{k=0}^{N-1} \big( \mu_k(I_k) - L_k y_k \big)' P_k \big( \mu_k(I_k) - L_k y_k \big) \right\}.$$

We do this minimization by first minimizing over $\mu_{N-1}$, then over $\mu_{N-2}$, etc. The minimization over $\mu_{N-1}$ involves just the last term in the sum and can be written as

$$\min_{\mu_{N-1}} E\Big\{ \big( \mu_{N-1}(I_{N-1}) - L_{N-1} y_{N-1} \big)' P_{N-1} \big( \mu_{N-1}(I_{N-1}) - L_{N-1} y_{N-1} \big) \Big\} = E\Big\{ \min_{u_{N-1}} E\big\{ (u_{N-1} - L_{N-1} y_{N-1})' P_{N-1} (u_{N-1} - L_{N-1} y_{N-1}) \mid I_{N-1} \big\} \Big\}.$$

Thus this minimization yields the optimal control law for the last stage:

$$\mu_{N-1}^*(I_{N-1}) = L_{N-1} E\{ y_{N-1} \mid I_{N-1} \}.$$

[Recall here that, generically, $E\{z \mid I\}$ minimizes over $u$ the expression $E_z\big\{ (u - z)' P (u - z) \mid I \big\}$ for any random variable $z$, any conditioning variable $I$, and any positive semidefinite matrix $P$.]

The minimization over $\mu_{N-2}$ involves

$$E\Big\{ \big( \mu_{N-2}(I_{N-2}) - L_{N-2} y_{N-2} \big)' P_{N-2} \big( \mu_{N-2}(I_{N-2}) - L_{N-2} y_{N-2} \big) \Big\} + E\Big\{ \big( E\{y_{N-1} \mid I_{N-1}\} - y_{N-1} \big)' L_{N-1}' P_{N-1} L_{N-1} \big( E\{y_{N-1} \mid I_{N-1}\} - y_{N-1} \big) \Big\}.$$

However, as in Lemma 5.2.1, the term $E\{y_{N-1} \mid I_{N-1}\} - y_{N-1}$ does not depend on any of the controls (it is a function of $x_0, w_0, \ldots, w_{N-2}, v_0, \ldots, v_{N-1}$). Thus the minimization over $\mu_{N-2}$ involves just the first term above, and yields similarly as before

$$\mu_{N-2}^*(I_{N-2}) = L_{N-2} E\{ y_{N-2} \mid I_{N-2} \}.$$

Proceeding similarly, we prove that for all $k$,

$$\mu_k^*(I_k) = L_k E\{ y_k \mid I_k \}.$$

Note: The preceding proof can be used to provide a quick proof of the separation theorem for linear-quadratic problems in the case where $x_0, w_0, \ldots, w_{N-1}, v_0, \ldots, v_{N-1}$ are independent. If the cost function is

$$E\left\{ x_N' Q_N x_N + \sum_{k=0}^{N-1} \big( x_k' Q_k x_k + u_k' R_k u_k \big) \right\},$$

the preceding calculation can be modified to show that the cost function can be written as

$$E\left\{ x_0' K_0 x_0 + \sum_{k=0}^{N-1} \Big( (u_k - L_k x_k)' P_k (u_k - L_k x_k) + w_k' K_{k+1} w_k \Big) \right\}.$$

By repeating the preceding proof, we then obtain the optimal control law as

$$\mu_k^*(I_k) = L_k E\{ x_k \mid I_k \}.$$

5.3 www

The control at time $k$ is $(u_k, \alpha_k)$, where $\alpha_k$ is a variable taking the value 1 (if the next measurement, at time $k+1$, is of type 1) or 2 (if the next measurement is of type 2). The cost functional is

$$E\left\{ x_N' Q x_N + \sum_{k=0}^{N-1} \big( x_k' Q x_k + u_k' R u_k \big) + \sum_{k=0}^{N-1} g_{\alpha_k} \right\}.$$

We apply the DP algorithm for $N = 2$. Using the Riccati equation, we have

$$J_1(I_1) = J_1(z_0, z_1, u_0, \alpha_0) = E_{x_1}\big\{ x_1'(A'QA + Q)x_1 \mid I_1 \big\} + E_{w_1}\big\{ w_1' Q w_1 \big\} + \min_{u_1}\Big[ u_1'(B'QB + R)u_1 + 2 E\{x_1 \mid I_1\}' A'QB\, u_1 \Big] + \min[g_1, g_2].$$

So

$$\mu_1^*(I_1) = -(B'QB + R)^{-1} B'QA\, E\{x_1 \mid I_1\}, \qquad \alpha_1^*(I_1) = \begin{cases} 1 & \text{if } g_1 \le g_2, \\ 2 & \text{otherwise.} \end{cases}$$

Note that the measurement selected at $k = 1$ does not depend on $I_1$. This is intuitively clear, since the measurement $z_2$ will not be used by the controller, so its selection should be based on the measurement cost alone, and not on the quality of the estimate. The situation is different once more than one stage is considered. Using a simple modification of the analysis in Section 5.2 of the text, we have

$$J_0(I_0) = J_0(z_0) = \min_{u_0} E_{x_0, w_0}\Big\{ x_0' Q x_0 + u_0' R u_0 + (A x_0 + B u_0 + w_0)' K_1 (A x_0 + B u_0 + w_0) \mid z_0 \Big\} + \min_{\alpha_0}\Big[ E_{z_1}\Big\{ E_{x_1}\big\{ \big( x_1 - E\{x_1 \mid I_1\} \big)' P_1 \big( x_1 - E\{x_1 \mid I_1\} \big) \mid I_1 \big\} \mid z_0, u_0, \alpha_0 \Big\} + g_{\alpha_0} \Big] + E_{w_1}\big\{ w_1' Q w_1 \big\} + \min[g_1, g_2].$$

Note that the minimization in the second bracket is carried out only with respect to $\alpha_0$ and not $u_0$. The reason is that the quantity in the second bracket is the covariance of the estimation error (weighted by $P_1$) and, as shown in the text, it does not depend on $u_0$. Because all stochastic variables are Gaussian, the quantity in the second bracket does not depend on $z_0$ either. (The weighted error covariance produced by the Kalman filter is precomputable, and depends only on the system and measurement matrices and the noise covariances, but not on the measurements received.) In fact,

$$E_{z_1}\Big\{ E_{x_1}\big\{ \big( x_1 - E\{x_1 \mid I_1\} \big)' P_1 \big( x_1 - E\{x_1 \mid I_1\} \big) \mid I_1 \big\} \mid z_0, u_0, \alpha_0 \Big\} = \begin{cases} \mathrm{Tr}\big( P_1 \Sigma_1^1 \big) & \text{if } \alpha_0 = 1, \\ \mathrm{Tr}\big( P_1 \Sigma_1^2 \big) & \text{if } \alpha_0 = 2, \end{cases}$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $\Sigma_1^1$ (respectively, $\Sigma_1^2$) denotes the error covariance of the Kalman filter estimate if a measurement of type 1 (respectively, type 2) is taken at $k = 0$. Thus, at time $k = 0$ the optimal measurement choice does not depend on $z_0$: it is of type 1 if

$$\mathrm{Tr}\big( P_1 \Sigma_1^1 \big) + g_1 \le \mathrm{Tr}\big( P_1 \Sigma_1^2 \big) + g_2,$$

and of type 2 otherwise.

5.7 www

(a) We have

$$p_{k+1}^j = P(x_{k+1} = j \mid z_0, \ldots, z_{k+1}, u_0, \ldots, u_k) = P(x_{k+1} = j \mid I_{k+1}) = \frac{ P(x_{k+1} = j, z_{k+1} \mid I_k, u_k) }{ P(z_{k+1} \mid I_k, u_k) } = \frac{ \sum_{i=1}^n P(x_k = i)\, P(x_{k+1} = j \mid x_k = i, u_k)\, P(z_{k+1} \mid u_k, x_{k+1} = j) }{ \sum_{s=1}^n \sum_{i=1}^n P(x_k = i)\, P(x_{k+1} = s \mid x_k = i, u_k)\, P(z_{k+1} \mid u_k, x_{k+1} = s) } = \frac{ \sum_{i=1}^n p_k^i\, p_{ij}(u_k)\, r_j(u_k, z_{k+1}) }{ \sum_{s=1}^n \sum_{i=1}^n p_k^i\, p_{is}(u_k)\, r_s(u_k, z_{k+1}) }.$$

Rewriting $p_{k+1}^j$ in vector form, we have

$$p_{k+1}^j = \frac{ r_j(u_k, z_{k+1}) \big[ P(u_k)' P_k \big]_j }{ \sum_{s=1}^n r_s(u_k, z_{k+1}) \big[ P(u_k)' P_k \big]_s }, \qquad j = 1, \ldots, n.$$

Therefore,

$$P_{k+1} = \frac{ \big[ r(u_k, z_{k+1}) \big] * \big[ P(u_k)' P_k \big] }{ r(u_k, z_{k+1})' P(u_k)' P_k },$$

where $*$ denotes the componentwise product of two vectors.

(b) The DP algorithm for this system is

$$J_{N-1}(P_{N-1}) = \min_u \left[ \sum_{i=1}^n p_{N-1}^i \sum_{j=1}^n p_{ij}(u)\, g_{N-1}(i, u, j) \right] = \min_u \left[ \sum_{i=1}^n p_{N-1}^i \big[ G_{N-1}(u) \big]_i \right] = \min_u \big[ P_{N-1}' G_{N-1}(u) \big],$$

where $G_{N-1}(u)$ is the vector with components $\big[ G_{N-1}(u) \big]_i = \sum_{j=1}^n p_{ij}(u)\, g_{N-1}(i, u, j)$, and

$$J_k(P_k) = \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q P(z_{k+1} = \theta \mid I_k, u)\, J_{k+1}\big( P_{k+1} \mid P_k, u, \theta \big) \right] = \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q r(u,\theta)' P(u)' P_k\; J_{k+1}\!\left( \frac{ [r(u,\theta)] * [P(u)'P_k] }{ r(u,\theta)' P(u)' P_k } \right) \right],$$

where $J_{k+1}\big( P_{k+1} \mid P_k, u, \theta \big)$ denotes the cost-to-go evaluated at the belief resulting from $P_k$ under control $u$ and observation $\theta$.

(c) For $k = N-1$,

$$J_{N-1}(\lambda P_{N-1}) = \min_u \big[ \lambda P_{N-1}' G_{N-1}(u) \big] = \lambda \min_u \big[ P_{N-1}' G_{N-1}(u) \big] = \lambda J_{N-1}(P_{N-1}).$$

Now assume that $J_k(\lambda P_k) = \lambda J_k(P_k)$. Then

$$J_{k-1}(\lambda P_{k-1}) = \min_u \left[ \lambda P_{k-1}' G_{k-1}(u) + \sum_{\theta=1}^q r(u,\theta)' P(u)' \lambda P_{k-1}\; J_k\big( P_k \mid P_{k-1}, u, \theta \big) \right] = \lambda \min_u \left[ P_{k-1}' G_{k-1}(u) + \sum_{\theta=1}^q r(u,\theta)' P(u)' P_{k-1}\; J_k\big( P_k \mid P_{k-1}, u, \theta \big) \right] = \lambda J_{k-1}(P_{k-1}),$$

where we use the fact that the belief generated from $P_{k-1}$, $u$, $\theta$ is unchanged when $P_{k-1}$ is scaled by $\lambda$, because the normalization in the belief update cancels the scaling. Q.E.D.

For any $u$ and $\theta$, $r(u,\theta)' P(u)' P_k$ is a scalar. Therefore, letting $\lambda = r(u,\theta)' P(u)' P_k$, we have

$$J_k(P_k) = \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q r(u,\theta)' P(u)' P_k\; J_{k+1}\!\left( \frac{ [r(u,\theta)] * [P(u)'P_k] }{ r(u,\theta)' P(u)' P_k } \right) \right] = \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q J_{k+1}\big( [r(u,\theta)] * [P(u)'P_k] \big) \right].$$

(d) For $k = N-1$, we have $J_{N-1}(P_{N-1}) = \min_u \big[ P_{N-1}' G_{N-1}(u) \big]$, and so $J_{N-1}(P_{N-1})$ has the desired form

$$J_{N-1}(P_{N-1}) = \min\big[ P_{N-1}' \alpha_{N-1}^1, \ldots, P_{N-1}' \alpha_{N-1}^m \big],$$

where $\alpha_{N-1}^j = G_{N-1}(u^j)$ and $u^j$ is the $j$th element of the control constraint set. Assume that

$$J_{k+1}(P_{k+1}) = \min\big[ P_{k+1}' \alpha_{k+1}^1, \ldots, P_{k+1}' \alpha_{k+1}^{m_{k+1}} \big].$$

Then, using the expression from part (c) for $J_k(P_k)$,

$$\begin{aligned} J_k(P_k) &= \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q J_{k+1}\big( [r(u,\theta)] * [P(u)'P_k] \big) \right] \\ &= \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q \min_{m=1,\ldots,m_{k+1}} \big( [r(u,\theta)] * [P(u)'P_k] \big)' \alpha_{k+1}^m \right] \\ &= \min_u \left[ P_k' G_k(u) + \sum_{\theta=1}^q \min_{m=1,\ldots,m_{k+1}} P_k'\, P(u)\big( r(u,\theta) * \alpha_{k+1}^m \big) \right] \\ &= \min\big[ P_k' \alpha_k^1, \ldots, P_k' \alpha_k^{m_k} \big], \end{aligned}$$

where $\alpha_k^1, \ldots, \alpha_k^{m_k}$ are all possible vectors of the form

$$G_k(u) + \sum_{\theta=1}^q P(u)\big( r(u,\theta) * \alpha_{k+1}^{m_{u,\theta}} \big),$$

as $u$ ranges over the finite set of controls, $\theta$ ranges over the set of observation vector indexes $\{1, \ldots, q\}$, and $m_{u,\theta}$ ranges over the set of indexes $\{1, \ldots, m_{k+1}\}$. The induction is thus complete.

For a quick way to understand the preceding proof, based on polyhedral concavity notions, note that the conclusion is equivalent to asserting that $J_k(P_k)$ is a positively homogeneous, concave polyhedral function. The preceding induction argument amounts to showing that the DP formula of part (c) preserves the positively homogeneous, concave polyhedral property of $J_{k+1}(P_{k+1})$. This is indeed evident from the formula, since taking minima and nonnegative weighted sums of positively homogeneous, concave polyhedral functions results in a positively homogeneous, concave polyhedral function.
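The recursion of part (d) is easy to implement exactly for small problems by enumerating the vectors $\alpha_k^m$. A sketch follows (model data hypothetical; a practical implementation would also prune dominated vectors, which is omitted here):

```python
import itertools
import numpy as np

n, q = 2, 2                                   # states, observations
U = [0, 1]                                    # controls
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # P(u)[i, j] = p_ij(u)
     1: np.array([[0.5, 0.5], [0.3, 0.7]])}
r = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # r[u][j, theta] = r_j(u, theta)
     1: np.array([[0.6, 0.4], [0.4, 0.6]])}
g = {0: np.array([1.0, 4.0]),                 # [G(u)]_i = sum_j p_ij(u) g(i, u, j)
     1: np.array([2.0, 2.5])}

alphas = [np.zeros(n)]                        # terminal stage: J_N = 0
for k in range(3):                            # a few backward DP steps
    new_alphas = []
    for u in U:
        # choose one alpha_{k+1}^{m_theta} for each observation theta
        for choice in itertools.product(range(len(alphas)), repeat=q):
            a = g[u].copy()
            for theta in range(q):
                a += P[u] @ (r[u][:, theta] * alphas[choice[theta]])
            new_alphas.append(a)
    alphas = new_alphas

p = np.array([0.4, 0.6])                      # a belief vector P_k
print(min(float(p @ a) for a in alphas))      # J_k(P_k) = min_m P_k' alpha_k^m
```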

Solutions Vol. I, Chapter 6

www

First, we notice that $\alpha$-$\beta$ pruning is applicable only for arcs that point to right children, so that at least one sequence of moves (starting from the current position and ending at a terminal position, that is, one with no children) has already been considered. Furthermore, due to the depth-first search, the score at the ancestor positions has been derived without taking into account the positions that can be reached from the current point. Suppose now that $\alpha$-pruning applies at a position with Black to play. Then, if the current position were reached (due to a move by White), Black could respond in such a way that the final position would be worse (for White) than it would have been if the current position were not reached. What $\alpha$-pruning saves is the search for even worse positions (emanating from the current position). The reason is that White will never play so that Black reaches the current position, because White certainly has a better alternative. A similar argument applies to $\beta$-pruning.

A second approach: Let us suppose that it is WHITE's turn to move. We shall prove that a $\beta$-cutoff occurring at the $n$th position will not affect the backed-up score. We have, from the definition of $\beta$,

$$\beta = \min\{ \text{TBS of all ancestors of } n \text{ where BLACK has the move} \}.$$

For a cutoff to occur: $\mathrm{TBS}(n) > \beta$. Observe first of all that $\beta = \mathrm{TBS}(n_1)$ for some ancestor $n_1$ where BLACK has the move. Then there exists a path $n_1, n_2, \ldots, n_k, n$. Since it is WHITE's move at $n$, we have that $\mathrm{TBS}(n) = \max\{ \mathrm{TBS}(n), \mathrm{BS}(n_i) \} > \beta$, where the $n_i$ are the descendants of $n$. Consider now the position $n_k$. Then $\mathrm{TBS}(n_k)$ will either remain unchanged or will increase to a value greater than $\beta$ as a result of the exploration of node $n$. Proceeding similarly, we conclude that $\mathrm{TBS}(n_2)$ will either remain the same or change to a value greater than $\beta$. Finally, at node $n_1$ we have that $\mathrm{TBS}(n_1)$ will not change, since it is BLACK's turn to move there, and BLACK will choose the move with minimum score. Thus the backed-up score and the choice of the next move are unaffected by the $\beta$-cutoff. A similar argument holds for $\alpha$-pruning.
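For concreteness, here is a standard $\alpha$-$\beta$ implementation on a small game tree (illustrative; not from the book). The pruned subtree cannot affect the backed-up score, in line with the argument above.

```python
def alphabeta(node, white_to_move, alpha=float("-inf"), beta=float("inf")):
    """node is a terminal score (number) or a list of child nodes.
    White maximizes, Black minimizes."""
    if not isinstance(node, list):
        return node
    if white_to_move:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if best >= beta:        # beta cutoff: Black will avoid this position
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if best <= alpha:           # alpha cutoff: White will avoid this position
            break
    return best

tree = [[3, 5], [2, [8, 9]], [0, 1]]    # the subtree [8, 9] is never explored
print(alphabeta(tree, True))            # 3
```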

Solutions Vol. I, Chapter 7

www

A threshold policy is specified by a threshold integer $m$ and has the form: process the orders if and only if their number exceeds $m$. The cost function corresponding to the threshold policy specified by $m$ will be denoted by $J_m$. By Prop. 3.1(c), this cost function is the unique solution of the system of equations

$$J_m(i) = \begin{cases} K + \alpha(1-p)J_m(0) + \alpha p J_m(1) & \text{if } i > m, \\ ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1) & \text{if } i \le m. \end{cases} \tag{1}$$

Thus, for all $i \le m$, we have

$$J_m(i) = \frac{ci + \alpha p J_m(i+1)}{1 - \alpha(1-p)}, \qquad J_m(i-1) = \frac{c(i-1) + \alpha p J_m(i)}{1 - \alpha(1-p)}.$$

From these two equations it follows that, for all $i \le m$,

$$J_m(i) \le J_m(i+1) \;\Longrightarrow\; J_m(i-1) < J_m(i). \tag{2}$$

Denote now

$$\gamma = K + \alpha(1-p)J_m(0) + \alpha p J_m(1).$$

Consider the policy iteration algorithm, and a policy $\bar{\mu}$ that is the successor policy to the threshold policy corresponding to $m$. This policy has the form: process the orders if and only if

$$K + \alpha(1-p)J_m(0) + \alpha p J_m(1) \le ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1),$$

or equivalently

$$\gamma \le ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1).$$

In order for this policy to be a threshold policy, we must have, for all $i$,

$$\gamma \le c(i-1) + \alpha(1-p)J_m(i-1) + \alpha p J_m(i) \;\Longrightarrow\; \gamma \le ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1). \tag{3}$$

This relation holds if the function $J_m$ is monotonically nondecreasing, which, from Eqs. (1) and (2), will be true if $J_m(m) \le J_m(m+1) = \gamma$.

Let us assume that the opposite case holds, where $\gamma < J_m(m)$. For $i > m$, we have $J_m(i) = \gamma$, so that

$$ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1) = ci + \alpha\gamma. \tag{4}$$

We also have

$$J_m(m) = \frac{cm + \alpha p \gamma}{1 - \alpha(1-p)},$$

from which, together with the hypothesis $J_m(m) > \gamma$, we obtain

$$cm + \alpha\gamma > \gamma. \tag{5}$$

Thus, from Eqs. (4) and (5) we have

$$ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1) > \gamma, \qquad \text{for all } i > m, \tag{6}$$

so that Eq. (3) is satisfied for all $i > m$. For $i \le m$, we have $ci + \alpha(1-p)J_m(i) + \alpha p J_m(i+1) = J_m(i)$, so that the desired relation (3) takes the form

$$\gamma \le J_m(i-1) \;\Longrightarrow\; \gamma \le J_m(i). \tag{7}$$

To show that this relation holds for all $i \le m$, we argue by contradiction. Suppose that for some $i \le m$ we have $J_m(i) < \gamma \le J_m(i-1)$. Since $J_m(m) > \gamma$, there must then exist some $\bar{i} > i$ such that $J_m(\bar{i}-1) < J_m(\bar{i})$. But then Eq. (2) would imply that $J_m(j-1) < J_m(j)$ for all $j \le \bar{i}$, contradicting the relation $J_m(i) < \gamma \le J_m(i-1)$ assumed earlier. Thus, Eq. (7) holds for all $i \le m$, so that Eq. (3) holds for all $i$. The proof is complete.
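The argument can be illustrated numerically: evaluate a threshold policy by solving the linear system (1), then apply one policy improvement step and observe that the improved policy is again a threshold policy. (Parameters hypothetical; the state space is truncated for this sketch.)

```python
import numpy as np

K, c, p, alpha = 5.0, 1.0, 0.5, 0.9
n_max, m = 15, 6                     # truncation level and current threshold

N = n_max + 1
A, b = np.eye(N), np.zeros(N)
for i in range(N):
    if i > m:                        # process: go to 0 orders (prob 1-p) or 1 (prob p)
        b[i] = K
        A[i, 0] -= alpha * (1 - p)
        A[i, 1] -= alpha * p
    else:                            # don't process: stay at i or move to i+1
        b[i] = c * i
        A[i, i] -= alpha * (1 - p)
        A[i, min(i + 1, n_max)] -= alpha * p
J = np.linalg.solve(A, b)            # J_m, the unique solution of (1)

gamma = K + alpha * (1 - p) * J[0] + alpha * p * J[1]
improved = [bool(gamma <= c * i + alpha * (1 - p) * J[i]
                 + alpha * p * J[min(i + 1, n_max)]) for i in range(N)]
print(improved)   # False ... False True ... True: a threshold policy again
```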

www

Let Assumption 2.1 hold, and let $\pi = \{\mu_0, \mu_1, \ldots\}$ be an admissible policy. Consider also the sets $S_k(i)$ given in the hint, with $S_0(i) = \{i\}$. If $t \in S_n(i)$ for all $\pi$ and $i$, we are done. Otherwise, we must have, for some $\pi$ and $i$, and some $k < n$, $S_k(i) = S_{k+1}(i)$ while $t \notin S_k(i)$. For $j \in S_k(i)$, let $m(j)$ be the smallest integer $m$ such that $j \in S_m(i)$. Consider a stationary policy $\mu$ with $\mu(j) = \mu_{m(j)}(j)$ for all $j \in S_k(i)$. For this policy we have, for all $j \in S_k(i)$,

$$p_{jl}\big( \mu(j) \big) > 0 \;\Longrightarrow\; l \in S_k(i).$$

This implies that the termination state $t$ is not reachable from any state in $S_k(i)$ under the stationary policy $\mu$, which contradicts Assumption 2.1.

Solutions Vol. II, Chapter 1

www

(a) We have

$$\tilde{p}_{ij}(u) = \frac{p_{ij}(u) - m_j}{1 - \sum_{k=1}^n m_k}.$$

Therefore,

$$\sum_{j=1}^n \tilde{p}_{ij}(u) = \frac{\sum_{j=1}^n p_{ij}(u) - \sum_{j=1}^n m_j}{1 - \sum_{k=1}^n m_k} = \frac{1 - \sum_{j=1}^n m_j}{1 - \sum_{k=1}^n m_k} = 1,$$

so the $\tilde{p}_{ij}(u)$ are transition probabilities.

(b) For the modified problem, with the same cost per stage $g(i, u)$, transition probabilities $\tilde{p}_{ij}(u)$, and discount factor $\alpha\big( 1 - \sum_{k=1}^n m_k \big)$, the optimal cost $\tilde{J}$ satisfies

$$\tilde{J}(i) = \min_{u \in U(i)} \left[ g(i, u) + \alpha\Big( 1 - \sum_{k=1}^n m_k \Big) \sum_{j=1}^n \frac{p_{ij}(u) - m_j}{1 - \sum_{k=1}^n m_k}\, \tilde{J}(j) \right] = \min_{u \in U(i)} \left[ g(i, u) + \alpha \sum_{j=1}^n p_{ij}(u)\tilde{J}(j) \right] - \alpha \sum_{j=1}^n m_j \tilde{J}(j).$$

Adding the constant $\dfrac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha}$ to both sides, and noting that

$$-\alpha \sum_{j=1}^n m_j \tilde{J}(j) + \frac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha} = \alpha\, \frac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha},$$

we obtain

$$\tilde{J}(i) + \frac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha} = \min_{u \in U(i)} \left[ g(i, u) + \alpha \sum_{j=1}^n p_{ij}(u)\left( \tilde{J}(j) + \frac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha} \right) \right].$$

Thus the function $\tilde{J}(i) + \dfrac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha}$ satisfies Bellman's equation for the original problem, so by uniqueness of its solution,

$$\tilde{J}(i) + \frac{\alpha \sum_{k=1}^n m_k \tilde{J}(k)}{1 - \alpha} = J^*(i), \qquad \text{for all } i. \qquad \text{Q.E.D.}$$
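The identity can be verified numerically on a small randomly generated problem (the sketch below chooses $m_j \le \min_{i,u} p_{ij}(u)$ so that the modified transition probabilities are nonnegative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, nu, alpha = 4, 3, 0.9
P = rng.dirichlet(np.ones(n), size=(nu, n))    # P[u, i, j] = p_ij(u)
g = rng.uniform(0.0, 1.0, size=(nu, n))        # g[u, i] = g(i, u)
m = 0.5 * P.min(axis=(0, 1))                   # ensures m_j <= p_ij(u) for all i, u

def value_iteration(P, g, a, iters=5000):
    J = np.zeros(P.shape[1])
    for _ in range(iters):
        J = np.min(g + a * P @ J, axis=0)      # Bellman operator
    return J

J_star = value_iteration(P, g, alpha)
P_mod = (P - m) / (1.0 - m.sum())              # modified transition probabilities
a_mod = alpha * (1.0 - m.sum())                # modified discount factor
J_mod = value_iteration(P_mod, g, a_mod)

lhs = J_mod + alpha * (m @ J_mod) / (1.0 - alpha)
print(np.allclose(lhs, J_star, atol=1e-8))     # True
```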

1.7

We show that for any bounded function $J : S \to \mathbb{R}$, we have

$$J \le T(J) \;\Longrightarrow\; T(J) \le F(J), \tag{1}$$

$$J \ge T(J) \;\Longrightarrow\; T(J) \ge F(J). \tag{2}$$

For any $\mu$, define

$$F_\mu(J)(i) = \frac{ g\big(i, \mu(i)\big) + \alpha \sum_{j \neq i} p_{ij}\big(\mu(i)\big) J(j) }{ 1 - \alpha p_{ii}\big(\mu(i)\big) },$$

and note that

$$F_\mu(J)(i) = \frac{ T_\mu(J)(i) - \alpha p_{ii}\big(\mu(i)\big) J(i) }{ 1 - \alpha p_{ii}\big(\mu(i)\big) }. \tag{3}$$

Fix $\epsilon > 0$. If $J \le T(J)$, let $\mu$ be such that $F_\mu(J) \le F(J) + \epsilon e$. Then, using Eq. (3),

$$F(J)(i) + \epsilon \ge F_\mu(J)(i) = \frac{ T_\mu(J)(i) - \alpha p_{ii}(\mu(i)) J(i) }{ 1 - \alpha p_{ii}(\mu(i)) } \ge \frac{ T(J)(i) - \alpha p_{ii}(\mu(i)) T(J)(i) }{ 1 - \alpha p_{ii}(\mu(i)) } = T(J)(i).$$

Since $\epsilon > 0$ is arbitrary, we obtain $F(J)(i) \ge T(J)(i)$.

Similarly, if $J \ge T(J)$, let $\mu$ be such that $T_\mu(J) \le T(J) + \epsilon e$. Then, using Eq. (3),

$$F(J)(i) \le F_\mu(J)(i) = \frac{ T_\mu(J)(i) - \alpha p_{ii}(\mu(i)) J(i) }{ 1 - \alpha p_{ii}(\mu(i)) } \le \frac{ T(J)(i) + \epsilon - \alpha p_{ii}(\mu(i)) T(J)(i) }{ 1 - \alpha p_{ii}(\mu(i)) } \le T(J)(i) + \frac{\epsilon}{1 - \alpha}.$$

Since $\epsilon > 0$ is arbitrary, we obtain $F(J)(i) \le T(J)(i)$.

From (1) and (2) we see that $F$ and $T$ have the same fixed points, so $J^*$ is the unique fixed point of $F$. Using the definition of $F$, it can be seen that for any scalar $r > 0$ we have

$$F(J + re) \le F(J) + \alpha r e, \qquad F(J) - \alpha r e \le F(J - re). \tag{4}$$

Furthermore, $F$ is monotone, that is,

$$J \le J' \;\Longrightarrow\; F(J) \le F(J'). \tag{5}$$

For any bounded function $J$, let $r > 0$ be such that

$$J - re \le J^* \le J + re.$$

Applying $F$ repeatedly to this relation and using Eqs. (4) and (5), we obtain

$$F^k(J) - \alpha^k r e \le J^* \le F^k(J) + \alpha^k r e.$$

Therefore $F^k(J)$ converges to $J^*$. From Eqs. (1), (2), and (5) we see that

$$J \le T(J) \;\Longrightarrow\; T^k(J) \le F^k(J) \le J^*,$$

$$J \ge T(J) \;\Longrightarrow\; T^k(J) \ge F^k(J) \ge J^*.$$

These equations demonstrate the faster convergence property of $F$ over $T$.

As a final result (not explicitly required in the problem statement), we show that for any two bounded functions $J : S \to \mathbb{R}$, $J' : S \to \mathbb{R}$, we have

$$\max_j \big| F(J)(j) - F(J')(j) \big| \le \alpha \max_j \big| J(j) - J'(j) \big|, \tag{6}$$

so $F$ is a contraction mapping with modulus $\alpha$. Indeed, we have

$$F(J)(i) = \min_{u \in U(i)} \left[ \frac{ g(i,u) + \alpha \sum_{j \neq i} p_{ij}(u) J(j) }{ 1 - \alpha p_{ii}(u) } \right] = \min_{u \in U(i)} \left[ \frac{ g(i,u) + \alpha \sum_{j \neq i} p_{ij}(u) J'(j) + \alpha \sum_{j \neq i} p_{ij}(u) \big( J(j) - J'(j) \big) }{ 1 - \alpha p_{ii}(u) } \right] \le F(J')(i) + \alpha \max_j \big| J(j) - J'(j) \big|, \qquad \text{for all } i,$$

where we have used the facts $\sum_{j \neq i} p_{ij}(u) = 1 - p_{ii}(u)$ and

$$\frac{ \alpha\big( 1 - p_{ii}(u) \big) }{ 1 - \alpha p_{ii}(u) } \le \alpha.$$

Thus we have

$$F(J)(i) - F(J')(i) \le \alpha \max_j \big| J(j) - J'(j) \big|, \qquad \text{for all } i.$$

The roles of $J$ and $J'$ may be reversed, so we can also obtain

$$F(J')(i) - F(J)(i) \le \alpha \max_j \big| J(j) - J'(j) \big|, \qquad \text{for all } i.$$

Combining the last two inequalities and taking the maximum over $i$, Eq. (6) follows.
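The faster convergence of $F$ can be seen on a small randomly generated discounted problem (an illustrative sketch, with random data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, nu, a = 5, 3, 0.95
P = rng.dirichlet(np.ones(n), size=(nu, n))    # P[u, i, j] = p_ij(u)
g = rng.uniform(0.0, 1.0, size=(nu, n))

def T(J):
    # T(J)(i) = min_u [ g(i,u) + a * sum_j p_ij(u) J(j) ]
    return np.min(g + a * P @ J, axis=0)

def F(J):
    # F(J)(i) = min_u [ (g(i,u) + a * sum_{j!=i} p_ij(u) J(j)) / (1 - a p_ii(u)) ]
    pii = P[:, np.arange(n), np.arange(n)]
    off = P @ J - pii * J
    return np.min((g + a * off) / (1.0 - a * pii), axis=0)

J_star = np.zeros(n)
for _ in range(5000):
    J_star = T(J_star)

JT = JF = np.zeros(n)    # J = 0 satisfies J <= T(J) here, since g >= 0
for _ in range(30):
    JT, JF = T(JT), F(JF)
print(np.max(np.abs(JT - J_star)), np.max(np.abs(JF - J_star)))
# the F iterates are closer to J*, consistent with T^k(J) <= F^k(J) <= J*
```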

1.9

(a) Since $J, J' \in B(S)$, i.e., they are real-valued, bounded functions on $S$, we know that the infimum and the supremum of their difference are finite. We shall denote

$$m = \min_{x \in S} \big( J(x) - J'(x) \big), \qquad M = \max_{x \in S} \big( J(x) - J'(x) \big).$$

Thus

$$m \le J(x) - J'(x) \le M, \qquad \forall\, x \in S,$$

or

$$J'(x) + m \le J(x) \le J'(x) + M, \qquad \forall\, x \in S.$$

Now we apply the mapping $T$ to the above inequalities. By property (1) we know that $T$ preserves the inequalities. Thus

$$T(J' + me)(x) \le T(J)(x) \le T(J' + Me)(x), \qquad \forall\, x \in S.$$

By property (2) we know that

$$T(J)(x) + \min[a_1 r, a_2 r] \le T(J + re)(x) \le T(J)(x) + \max[a_1 r, a_2 r].$$

If we replace $r$ by $m$ or $M$, we get the inequalities

$$T(J')(x) + \min[a_1 m, a_2 m] \le T(J' + me)(x) \le T(J')(x) + \max[a_1 m, a_2 m]$$

and

$$T(J')(x) + \min[a_1 M, a_2 M] \le T(J' + Me)(x) \le T(J')(x) + \max[a_1 M, a_2 M].$$

Thus

$$T(J')(x) + \min[a_1 m, a_2 m] \le T(J)(x) \le T(J')(x) + \max[a_1 M, a_2 M],$$

so that

$$\big| T(J)(x) - T(J')(x) \big| \le \max\big[ a_1 M,\ a_2 M,\ -a_1 m,\ -a_2 m \big].$$

We also have

$$\max\big[ a_1 M,\ a_2 M,\ -a_1 m,\ -a_2 m \big] \le a_2 \max\big[ |M|, |m| \big] \le a_2 \sup_{x \in S} \big| J(x) - J'(x) \big|.$$

Thus

$$\big| T(J)(x) - T(J')(x) \big| \le a_2 \max_{x \in S} \big| J(x) - J'(x) \big| \quad \Longrightarrow \quad \max_{x \in S} \big| T(J)(x) - T(J')(x) \big| \le a_2 \max_{x \in S} \big| J(x) - J'(x) \big|.$$

Thus $T$ is a contraction mapping, since we know from the statement of the problem that $0 \le a_1 \le a_2 < 1$. Since the set $B(S)$ of bounded real-valued functions is a complete linear space, we conclude that the contraction mapping $T$ has a unique fixed point $J^*$, and $\lim_{k \to \infty} T^k(J)(x) = J^*(x)$.

(b) We shall first prove the lower bounds for $J^*(x)$; the upper bounds follow by a similar argument. Since $J, T(J) \in B(S)$, there exists a $c \in \mathbb{R}$ ($c < \infty$) such that

$$J(x) + c \le T(J)(x). \tag{1}$$

We apply $T$ to both sides of (1), and since $T$ preserves inequalities (by assumption (1)), we have, applying the relation of assumption (2),

$$J(x) + \min[c + a_1 c,\ c + a_2 c] \le T(J)(x) + \min[a_1 c,\ a_2 c] \le T(J + ce)(x) \le T^2(J)(x). \tag{2}$$

Similarly, if we apply $T$ again, we get

$$J(x) + \min_{i \in \{1,2\}}\big[ c + a_i c + a_i^2 c \big] \le T(J)(x) + \min\big[ a_1 c + a_1^2 c,\ a_2 c + a_2^2 c \big] \le T^2(J)(x) + \min\big[ a_1^2 c,\ a_2^2 c \big] \le T\big( T(J) + \min[a_1 c, a_2 c]\,e \big)(x) \le T^3(J)(x).$$

Thus by induction we conclude

$$J(x) + \min\left[ \sum_{m=0}^k a_1^m c,\ \sum_{m=0}^k a_2^m c \right] \le T(J)(x) + \min\left[ \sum_{m=1}^k a_1^m c,\ \sum_{m=1}^k a_2^m c \right] \le \cdots \le T^k(J)(x) + \min\big[ a_1^k c,\ a_2^k c \big] \le T^{k+1}(J)(x). \tag{3}$$

By taking the limit as $k \to \infty$, and noting that the quantities in the minimizations are monotone, and either nonnegative or nonpositive, we conclude that

$$J(x) + \min\left[ \frac{c}{1-a_1},\ \frac{c}{1-a_2} \right] \le T(J)(x) + \min\left[ \frac{a_1 c}{1-a_1},\ \frac{a_2 c}{1-a_2} \right] \le \cdots \le T^k(J)(x) + \min\left[ \frac{a_1^k c}{1-a_1},\ \frac{a_2^k c}{1-a_2} \right] \le T^{k+1}(J)(x) + \min\left[ \frac{a_1^{k+1} c}{1-a_1},\ \frac{a_2^{k+1} c}{1-a_2} \right] \le J^*(x). \tag{4}$$

Finally, we note that

$$\min\big[ a_1^k c,\ a_2^k c \big] \le T^{k+1}(J)(x) - T^k(J)(x),$$

so that

$$\min\big[ a_1^k c,\ a_2^k c \big] \le \inf_{x \in S}\big( T^{k+1}(J)(x) - T^k(J)(x) \big).$$

Let $b_{k+1} = \inf_{x \in S}\big( T^{k+1}(J)(x) - T^k(J)(x) \big)$, so that $\min[a_1^k c,\ a_2^k c] \le b_{k+1}$. From the above relation we infer that

$$\min\left[ \frac{a_1^{k+1} c}{1-a_1},\ \frac{a_2^{k+1} c}{1-a_2} \right] \le \min\left[ \frac{a_1 b_{k+1}}{1-a_1},\ \frac{a_2 b_{k+1}}{1-a_2} \right] = c_{k+1}.$$

Therefore

$$T^k(J)(x) + \min\left[ \frac{a_1^k c}{1-a_1},\ \frac{a_2^k c}{1-a_2} \right] \le T^{k+1}(J)(x) + c_{k+1}.$$

This relationship gives, for $k = 1$,

$$T(J)(x) + \min\left[ \frac{a_1 c}{1-a_1},\ \frac{a_2 c}{1-a_2} \right] \le T^2(J)(x) + c_2.$$


where u is the decision-maker s payoff function over her actions and S is the set of her feasible actions. Seminars on Mathematics for Economics and Finance Topic 3: Optimization - interior optima 1 Session: 11-12 Aug 2015 (Thu/Fri) 10:00am 1:00pm I. Optimization: introduction Decision-makers (e.g. consumers,

More information

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

More information

Reaching a Consensus in a Dynamically Changing Environment A Graphical Approach

Reaching a Consensus in a Dynamically Changing Environment A Graphical Approach Reaching a Consensus in a Dynamically Changing Environment A Graphical Approach M. Cao Yale Univesity A. S. Morse Yale University B. D. O. Anderson Australia National University and National ICT Australia

More information

Optimal Control of an Inventory System with Joint Production and Pricing Decisions

Optimal Control of an Inventory System with Joint Production and Pricing Decisions Optimal Control of an Inventory System with Joint Production and Pricing Decisions Ping Cao, Jingui Xie Abstract In this study, we consider a stochastic inventory system in which the objective of the manufacturer

More information

Solutions Chapter 5. The problem of finding the minimum distance from the origin to a line is written as. min 1 2 kxk2. subject to Ax = b.

Solutions Chapter 5. The problem of finding the minimum distance from the origin to a line is written as. min 1 2 kxk2. subject to Ax = b. Solutions Chapter 5 SECTION 5.1 5.1.4 www Throughout this exercise we will use the fact that strong duality holds for convex quadratic problems with linear constraints (cf. Section 3.4). The problem of

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Networked Sensing, Estimation and Control Systems

Networked Sensing, Estimation and Control Systems Networked Sensing, Estimation and Control Systems Vijay Gupta University of Notre Dame Richard M. Murray California Institute of echnology Ling Shi Hong Kong University of Science and echnology Bruno Sinopoli

More information

The common-line problem in congested transit networks

The common-line problem in congested transit networks The common-line problem in congested transit networks R. Cominetti, J. Correa Abstract We analyze a general (Wardrop) equilibrium model for the common-line problem in transit networks under congestion

More information

Reinforcement Learning and Optimal Control. Chapter 4 Infinite Horizon Reinforcement Learning DRAFT

Reinforcement Learning and Optimal Control. Chapter 4 Infinite Horizon Reinforcement Learning DRAFT Reinforcement Learning and Optimal Control by Dimitri P. Bertsekas Massachusetts Institute of Technology Chapter 4 Infinite Horizon Reinforcement Learning DRAFT This is Chapter 4 of the draft textbook

More information

Advanced Linear Programming: The Exercises

Advanced Linear Programming: The Exercises Advanced Linear Programming: The Exercises The answers are sometimes not written out completely. 1.5 a) min c T x + d T y Ax + By b y = x (1) First reformulation, using z smallest number satisfying x z

More information

2 Sequences, Continuity, and Limits

2 Sequences, Continuity, and Limits 2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs

More information

1 Basics of probability theory

1 Basics of probability theory Examples of Stochastic Optimization Problems In this chapter, we will give examples of three types of stochastic optimization problems, that is, optimal stopping, total expected (discounted) cost problem,

More information

Advanced Calculus: MATH 410 Real Numbers Professor David Levermore 5 December 2010

Advanced Calculus: MATH 410 Real Numbers Professor David Levermore 5 December 2010 Advanced Calculus: MATH 410 Real Numbers Professor David Levermore 5 December 2010 1. Real Number System 1.1. Introduction. Numbers are at the heart of mathematics. By now you must be fairly familiar with

More information

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control

Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control Duality and dynamics in Hamilton-Jacobi theory for fully convex problems of control RTyrrell Rockafellar and Peter R Wolenski Abstract This paper describes some recent results in Hamilton- Jacobi theory

More information

Continuity. Chapter 4

Continuity. Chapter 4 Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of

More information

SMT 2013 Power Round Solutions February 2, 2013

SMT 2013 Power Round Solutions February 2, 2013 Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics

Undergraduate Notes in Mathematics. Arkansas Tech University Department of Mathematics Undergraduate Notes in Mathematics Arkansas Tech University Department of Mathematics An Introductory Single Variable Real Analysis: A Learning Approach through Problem Solving Marcel B. Finan c All Rights

More information

Reinforcement Learning and Optimal Control. Chapter 4 Infinite Horizon Reinforcement Learning DRAFT

Reinforcement Learning and Optimal Control. Chapter 4 Infinite Horizon Reinforcement Learning DRAFT Reinforcement Learning and Optimal Control by Dimitri P. Bertsekas Massachusetts Institute of Technology Chapter 4 Infinite Horizon Reinforcement Learning DRAFT This is Chapter 4 of the draft textbook

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

Existence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity

Existence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity Existence and uniqueness of solutions for a continuous-time opinion dynamics model with state-dependent connectivity Vincent D. Blondel, Julien M. Hendricx and John N. Tsitsilis July 24, 2009 Abstract

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

f(x) f(z) c x z > 0 1

f(x) f(z) c x z > 0 1 INVERSE AND IMPLICIT FUNCTION THEOREMS I use df x for the linear transformation that is the differential of f at x.. INVERSE FUNCTION THEOREM Definition. Suppose S R n is open, a S, and f : S R n is a

More information

Optimal control and estimation

Optimal control and estimation Automatic Control 2 Optimal control and estimation Prof. Alberto Bemporad University of Trento Academic year 2010-2011 Prof. Alberto Bemporad (University of Trento) Automatic Control 2 Academic year 2010-2011

More information

Convex Analysis and Optimization Chapter 4 Solutions

Convex Analysis and Optimization Chapter 4 Solutions Convex Analysis and Optimization Chapter 4 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

MATH4406 Assignment 5

MATH4406 Assignment 5 MATH4406 Assignment 5 Patrick Laub (ID: 42051392) October 7, 2014 1 The machine replacement model 1.1 Real-world motivation Consider the machine to be the entire world. Over time the creator has running

More information

Reaching a Consensus in a Dynamically Changing Environment - Convergence Rates, Measurement Delays and Asynchronous Events

Reaching a Consensus in a Dynamically Changing Environment - Convergence Rates, Measurement Delays and Asynchronous Events Reaching a Consensus in a Dynamically Changing Environment - Convergence Rates, Measurement Delays and Asynchronous Events M. Cao Yale Univesity A. S. Morse Yale University B. D. O. Anderson Australia

More information

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings

Chapter 3. Differentiable Mappings. 1. Differentiable Mappings Chapter 3 Differentiable Mappings 1 Differentiable Mappings Let V and W be two linear spaces over IR A mapping L from V to W is called a linear mapping if L(u + v) = Lu + Lv for all u, v V and L(λv) =

More information

REAL AND COMPLEX ANALYSIS

REAL AND COMPLEX ANALYSIS REAL AND COMPLE ANALYSIS Third Edition Walter Rudin Professor of Mathematics University of Wisconsin, Madison Version 1.1 No rights reserved. Any part of this work can be reproduced or transmitted in any

More information

2016 EF Exam Texas A&M High School Students Contest Solutions October 22, 2016

2016 EF Exam Texas A&M High School Students Contest Solutions October 22, 2016 6 EF Exam Texas A&M High School Students Contest Solutions October, 6. Assume that p and q are real numbers such that the polynomial x + is divisible by x + px + q. Find q. p Answer Solution (without knowledge

More information

Topological properties of Z p and Q p and Euclidean models

Topological properties of Z p and Q p and Euclidean models Topological properties of Z p and Q p and Euclidean models Samuel Trautwein, Esther Röder, Giorgio Barozzi November 3, 20 Topology of Q p vs Topology of R Both R and Q p are normed fields and complete

More information

Linear Programming in Matrix Form

Linear Programming in Matrix Form Linear Programming in Matrix Form Appendix B We first introduce matrix concepts in linear programming by developing a variation of the simplex method called the revised simplex method. This algorithm,

More information

6.231 DYNAMIC PROGRAMMING LECTURE 9 LECTURE OUTLINE

6.231 DYNAMIC PROGRAMMING LECTURE 9 LECTURE OUTLINE 6.231 DYNAMIC PROGRAMMING LECTURE 9 LECTURE OUTLINE Rollout algorithms Policy improvement property Discrete deterministic problems Approximations of rollout algorithms Model Predictive Control (MPC) Discretization

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

MORE ON CONTINUOUS FUNCTIONS AND SETS

MORE ON CONTINUOUS FUNCTIONS AND SETS Chapter 6 MORE ON CONTINUOUS FUNCTIONS AND SETS This chapter can be considered enrichment material containing also several more advanced topics and may be skipped in its entirety. You can proceed directly

More information

Final Exam January 26th, Dynamic Programming & Optimal Control ( ) Solutions. Number of Problems: 4. No calculators allowed.

Final Exam January 26th, Dynamic Programming & Optimal Control ( ) Solutions. Number of Problems: 4. No calculators allowed. Final Exam January 26th, 2017 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D Andrea Solutions Exam Duration: 150 minutes Number of Problems: 4 Permitted aids: One A4 sheet of paper. No

More information

MATH 115 SECOND MIDTERM EXAM

MATH 115 SECOND MIDTERM EXAM MATH 115 SECOND MIDTERM EXAM November 22, 2005 NAME: SOLUTION KEY INSTRUCTOR: SECTION NO: 1. Do not open this exam until you are told to begin. 2. This exam has 10 pages including this cover. There are

More information

Generalized Hypothesis Testing and Maximizing the Success Probability in Financial Markets

Generalized Hypothesis Testing and Maximizing the Success Probability in Financial Markets Generalized Hypothesis Testing and Maximizing the Success Probability in Financial Markets Tim Leung 1, Qingshuo Song 2, and Jie Yang 3 1 Columbia University, New York, USA; leung@ieor.columbia.edu 2 City

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017. Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 017 Nadia S. Larsen 17 November 017. 1. Construction of the product measure The purpose of these notes is to prove the main

More information

Date: July 5, Contents

Date: July 5, Contents 2 Lagrange Multipliers Date: July 5, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 14 2.3. Informative Lagrange Multipliers...........

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

RECOVERY OF NON-LINEAR CONDUCTIVITIES FOR CIRCULAR PLANAR GRAPHS

RECOVERY OF NON-LINEAR CONDUCTIVITIES FOR CIRCULAR PLANAR GRAPHS RECOVERY OF NON-LINEAR CONDUCTIVITIES FOR CIRCULAR PLANAR GRAPHS WILL JOHNSON Abstract. We consider the problem of recovering nonlinear conductances in a circular planar graph. If the graph is critical

More information

Shortest paths with negative lengths

Shortest paths with negative lengths Chapter 8 Shortest paths with negative lengths In this chapter we give a linear-space, nearly linear-time algorithm that, given a directed planar graph G with real positive and negative lengths, but no

More information

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given. HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard

More information

Math212a1413 The Lebesgue integral.

Math212a1413 The Lebesgue integral. Math212a1413 The Lebesgue integral. October 28, 2014 Simple functions. In what follows, (X, F, m) is a space with a σ-field of sets, and m a measure on F. The purpose of today s lecture is to develop the

More information

Mathematics Course 111: Algebra I Part I: Algebraic Structures, Sets and Permutations

Mathematics Course 111: Algebra I Part I: Algebraic Structures, Sets and Permutations Mathematics Course 111: Algebra I Part I: Algebraic Structures, Sets and Permutations D. R. Wilkins Academic Year 1996-7 1 Number Systems and Matrix Algebra Integers The whole numbers 0, ±1, ±2, ±3, ±4,...

More information

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games

Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Stability of Feedback Solutions for Infinite Horizon Noncooperative Differential Games Alberto Bressan ) and Khai T. Nguyen ) *) Department of Mathematics, Penn State University **) Department of Mathematics,

More information

Week 9-10: Recurrence Relations and Generating Functions

Week 9-10: Recurrence Relations and Generating Functions Week 9-10: Recurrence Relations and Generating Functions April 3, 2017 1 Some number sequences An infinite sequence (or just a sequence for short is an ordered array a 0, a 1, a 2,..., a n,... of countably

More information

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities

6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 6.207/14.15: Networks Lectures 4, 5 & 6: Linear Dynamics, Markov Chains, Centralities 1 Outline Outline Dynamical systems. Linear and Non-linear. Convergence. Linear algebra and Lyapunov functions. Markov

More information

Sydney University Mathematical Society Problems Competition Solutions.

Sydney University Mathematical Society Problems Competition Solutions. Sydney University Mathematical Society Problems Competition 005 Solutions 1 Suppose that we look at the set X n of strings of 0 s and 1 s of length n Given a string ɛ = (ɛ 1,, ɛ n ) X n, we are allowed

More information

1 Markov decision processes

1 Markov decision processes 2.997 Decision-Making in Large-Scale Systems February 4 MI, Spring 2004 Handout #1 Lecture Note 1 1 Markov decision processes In this class we will study discrete-time stochastic systems. We can describe

More information

Some Background Math Notes on Limsups, Sets, and Convexity

Some Background Math Notes on Limsups, Sets, and Convexity EE599 STOCHASTIC NETWORK OPTIMIZATION, MICHAEL J. NEELY, FALL 2008 1 Some Background Math Notes on Limsups, Sets, and Convexity I. LIMITS Let f(t) be a real valued function of time. Suppose f(t) converges

More information

Chapter 3 Continuous Functions

Chapter 3 Continuous Functions Continuity is a very important concept in analysis. The tool that we shall use to study continuity will be sequences. There are important results concerning the subsets of the real numbers and the continuity

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i := 2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall.

Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall. .1 Limits of Sequences. CHAPTER.1.0. a) True. If converges, then there is an M > 0 such that M. Choose by Archimedes an N N such that N > M/ε. Then n N implies /n M/n M/N < ε. b) False. = n does not converge,

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

The Multi-Agent Rendezvous Problem - The Asynchronous Case

The Multi-Agent Rendezvous Problem - The Asynchronous Case 43rd IEEE Conference on Decision and Control December 14-17, 2004 Atlantis, Paradise Island, Bahamas WeB03.3 The Multi-Agent Rendezvous Problem - The Asynchronous Case J. Lin and A.S. Morse Yale University

More information

Lecture 2 and 3: Controllability of DT-LTI systems

Lecture 2 and 3: Controllability of DT-LTI systems 1 Lecture 2 and 3: Controllability of DT-LTI systems Spring 2013 - EE 194, Advanced Control (Prof Khan) January 23 (Wed) and 28 (Mon), 2013 I LTI SYSTEMS Recall that continuous-time LTI systems can be

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

Designing Optimal Pre-Announced Markdowns in the Presence of Rational Customers with Multi-unit Demands - Online Appendix

Designing Optimal Pre-Announced Markdowns in the Presence of Rational Customers with Multi-unit Demands - Online Appendix 087/msom070057 Designing Optimal Pre-Announced Markdowns in the Presence of Rational Customers with Multi-unit Demands - Online Appendix Wedad Elmaghraby Altan Gülcü Pınar Keskinocak RH mith chool of Business,

More information

Second Order Optimization Algorithms I

Second Order Optimization Algorithms I Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The

More information

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University

Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University Lecture Notes in Advanced Calculus 1 (80315) Raz Kupferman Institute of Mathematics The Hebrew University February 7, 2007 2 Contents 1 Metric Spaces 1 1.1 Basic definitions...........................

More information

Abstract. 2. We construct several transcendental numbers.

Abstract. 2. We construct several transcendental numbers. Abstract. We prove Liouville s Theorem for the order of approximation by rationals of real algebraic numbers. 2. We construct several transcendental numbers. 3. We define Poissonian Behaviour, and study

More information

ON COST MATRICES WITH TWO AND THREE DISTINCT VALUES OF HAMILTONIAN PATHS AND CYCLES

ON COST MATRICES WITH TWO AND THREE DISTINCT VALUES OF HAMILTONIAN PATHS AND CYCLES ON COST MATRICES WITH TWO AND THREE DISTINCT VALUES OF HAMILTONIAN PATHS AND CYCLES SANTOSH N. KABADI AND ABRAHAM P. PUNNEN Abstract. Polynomially testable characterization of cost matrices associated

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include PUTNAM TRAINING POLYNOMIALS (Last updated: December 11, 2017) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

SHORT INTRODUCTION TO DYNAMIC PROGRAMMING. 1. Example

SHORT INTRODUCTION TO DYNAMIC PROGRAMMING. 1. Example SHORT INTRODUCTION TO DYNAMIC PROGRAMMING. Example We consider different stages (discrete time events) given by k = 0,..., N. Let x k be the amount of money owned by a consumer at the stage k. At each

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

REAL ANALYSIS II: PROBLEM SET 2

REAL ANALYSIS II: PROBLEM SET 2 REAL ANALYSIS II: PROBLEM SET 2 21st Feb, 2016 Exercise 1. State and prove the Inverse Function Theorem. Theorem Inverse Function Theorem). Let f be a continuous one to one function defined on an interval,

More information