Dynamic Programming & Optimal Control: Final Exam Solutions


Final Exam, January 26th, 2012
Dynamic Programming & Optimal Control (151-0563-01)
Prof. R. D'Andrea

Solutions

Exam Duration: 150 minutes
Number of Problems: 4
Permitted aids: One A4 sheet of paper.
Use only the provided sheets for your solutions.

Problem 1 (25%)

Consider the simplified map of Romania shown in Figure 1, represented as a directed graph.¹

Figure 1: Simplified map of Romania (directed graph; cities are nodes, and each edge is labeled with its length; figure not reproduced here).

Figure 2: Straight-line distances to Bucharest

City            Straight-line dist.
Arad            350
Bucharest       0
Craiova         100
Drobeta         150
Fagaras         100
Lugoj           200
Mehadia         175
Oradea          400
Pitesti         50
Rim. Vil.       150
Sibiu           200
Timisoara       325
Zerind          350

Find the shortest path from Arad to Bucharest for the graph given in Figure 1 by applying the A* algorithm. Use the straight-line distances from Figure 2 as heuristics and the best-first method to determine, at each iteration, which node to remove from the OPEN bin; that is, remove node i with d_i = min_{j in OPEN} d_j, where d_i denotes the length of the shortest path from Arad to node i that has been found so far. Solve the problem by populating a table of the following form²:

Iteration | Node exiting OPEN | OPEN | d_Arad | d_Zerind | d_Oradea | d_Timisoara | ... | d_Bucharest
0         | -                 |      |        |          |          |             |     |
1         | Arad              |      |        |          |          |             |     |
...

State the resulting shortest path and its associated cost.

¹ Though not a realistic model, a directed graph was chosen to reduce the computational burden.
² Please use the paper in landscape orientation for the table.

Solution 1

Iter. | Node exiting OPEN | OPEN                             | d_Ar | d_Ze | d_Or | d_Ti | d_Lu | d_Me | d_Dr | d_Cr | d_Si | d_RV | d_Fa | d_Pi | d_Bu
0     | -                 | Arad                             | 0    |      |      |      |      |      |      |      |      |      |      |      |
1     | Arad              | Zerind, Timis., Sibiu            | 0    | 75   |      | 100  |      |      |      |      | 150  |      |      |      | 450
2     | Zerind            | Timis., Sibiu                    | 0    | 75   |      | 100  |      |      |      |      | 150  |      |      |      | 450
3     | Timis.            | Sibiu, Lugoj                     | 0    | 75   |      | 100  | 225  |      |      |      | 150  |      |      |      | 450
4     | Sibiu             | Lugoj, Fagaras, Rim.Vil.         | 0    | 75   |      | 100  | 225  |      |      |      | 150  | 200  | 275  |      | 450
5     | Rim.Vil.          | Lugoj, Fagaras, Craiova, Pitesti | 0    | 75   |      | 100  | 225  |      |      | 250  | 150  | 200  | 275  | 300  | 450
6     | Lugoj             | Fagaras, Craiova, Pitesti        | 0    | 75   |      | 100  | 225  |      |      | 250  | 150  | 200  | 275  | 300  | 450
7     | Craiova           | Fagaras, Pitesti                 | 0    | 75   |      | 100  | 225  |      |      | 250  | 150  | 200  | 275  | 280  | 450
8     | Fagaras           | Pitesti                          | 0    | 75   |      | 100  | 225  |      |      | 250  | 150  | 200  | 275  | 280  | 400
9     | Pitesti           | -                                | 0    | 75   |      | 100  | 225  |      |      | 250  | 150  | 200  | 275  | 280  | 380

Optimal path: Arad → Sibiu → Rim.Vil. → Craiova → Pitesti → Bucharest
Cost: 380

Note: the heuristics prevent some cities (Oradea, Mehadia, Drobeta) from ever entering OPEN, which is why their columns stay empty.
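The iteration can be reproduced with a small label-correcting sketch in the style used in the course: best-first removal from OPEN by the label d_i, with the heuristic entering only through the admission test d_i + a_ij + h_j < UPPER. The edge list below is reconstructed from the label updates in the solution table; edges into Oradea, Mehadia, and Drobeta are omitted (those cities never enter OPEN, and their edge lengths are not recoverable from the table), which is an assumption that does not affect the result here.

```python
# directed edges i -> [(j, length), ...], reconstructed from the solution table;
# edges into Oradea, Mehadia, Drobeta omitted (assumption: heuristics keep them out)
edges = {
    'Arad':           [('Zerind', 75), ('Timisoara', 100), ('Sibiu', 150), ('Bucharest', 450)],
    'Timisoara':      [('Lugoj', 125)],
    'Sibiu':          [('Rimnicu Vilcea', 50), ('Fagaras', 125)],
    'Rimnicu Vilcea': [('Craiova', 50), ('Pitesti', 100)],
    'Craiova':        [('Pitesti', 30)],
    'Fagaras':        [('Bucharest', 125)],
    'Pitesti':        [('Bucharest', 100)],
}
# straight-line distances to Bucharest (Figure 2), used as the heuristic h
h = {'Arad': 350, 'Zerind': 350, 'Timisoara': 325, 'Sibiu': 200, 'Lugoj': 200,
     'Rimnicu Vilcea': 150, 'Craiova': 100, 'Fagaras': 100, 'Pitesti': 50,
     'Bucharest': 0}

def label_correcting(edges, h, origin, dest):
    INF = float('inf')
    d, parent = {origin: 0}, {origin: None}
    OPEN, UPPER = [origin], INF
    while OPEN:
        i = min(OPEN, key=lambda n: d[n])      # best-first: smallest label d_i
        OPEN.remove(i)
        for j, a_ij in edges.get(i, []):
            dj = d[i] + a_ij
            # admit j only if its label improves and it can still beat UPPER
            if dj < d.get(j, INF) and dj + h[j] < UPPER:
                d[j], parent[j] = dj, i
                if j == dest:
                    UPPER = dj                 # better path to destination found
                elif j not in OPEN:
                    OPEN.append(j)
    path, n = [], dest
    while n is not None:                       # walk the parent pointers back
        path.append(n)
        n = parent[n]
    return path[::-1], UPPER

path, cost = label_correcting(edges, h, 'Arad', 'Bucharest')
# path: Arad -> Sibiu -> Rimnicu Vilcea -> Craiova -> Pitesti -> Bucharest, cost 380
```

The removal order produced by this sketch matches the table above exactly, including the early exit of Zerind and Timisoara (removal uses d alone; the heuristic only filters admissions).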

Problem 2 (25%)

Consider the dynamic system

    ẋ(t) = x(t) + u(t)

with initial state x(0) = x_0 and t ∈ [0, T], T ∈ R_+. Use the Minimum Principle to find the optimal input u*(t) that minimizes the cost function

    (1/2) x^2(T) + (1/2) ∫_0^T u^2(t) dt

and the corresponding optimal trajectory x*(t).

Solution 2

Applying the Minimum Principle. The system equation is

    ẋ(t) = x(t) + u(t).    (1)

The Hamiltonian is given by

    H(x(t), u(t), p(t)) = g(x(t), u(t)) + p(t) f(x(t), u(t))
                        = (1/2) u^2(t) + p(t) x(t) + p(t) u(t).

The adjoint equation is

    ṗ(t) = -∂H/∂x (x(t), u(t), p(t)) = -p(t),

with the boundary condition

    p(T) = ∂h/∂x (x(T)) = x(T).

Solving this differential equation leads to

    p(t) = ζ e^(-t),  ζ = x(T) e^T.    (2)

The optimal input u*(t) is obtained by minimizing the Hamiltonian along the optimal trajectory:

    u*(t) = argmin_u { (1/2) u^2(t) + p(t) x(t) + p(t) u(t) }  =>  u*(t) = -p(t).    (3)

Now (1), (2), and (3) give

    ẋ(t) = x(t) - ζ e^(-t),  x(0) = x_0.

The general solution of this inhomogeneous ordinary differential equation is the sum of the homogeneous solution x_h(t) = λ e^t and a particular solution x_p(t) = (1/2) ζ e^(-t), giving

    x(t) = x_h(t) + x_p(t) = λ e^t + (1/2) ζ e^(-t).

Imposing the initial condition x(0) = x_0 and ζ = x(T) e^T gives

    λ = x_0 / (1 + e^(2T))  and  ζ = 2 x_0 e^(2T) / (1 + e^(2T)).

This implies

    u*(t) = -2 x_0 e^(2T) e^(-t) / (1 + e^(2T))

and

    x*(t) = x_0 e^t / (1 + e^(2T)) + x_0 e^(2T) e^(-t) / (1 + e^(2T)).
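As a sanity check, the closed-form expressions can be verified numerically. The sketch below uses illustrative test values x0 = 1.5 and T = 2 (an assumption; any x0 and T > 0 should behave the same) and checks the state equation with u* = -p, the initial condition, and the boundary condition p(T) = x(T).

```python
import math

# illustrative test values (assumption: the check works for any x0 and T > 0)
x0, T = 1.5, 2.0

lam  = x0 / (1 + math.exp(2 * T))                        # λ
zeta = 2 * x0 * math.exp(2 * T) / (1 + math.exp(2 * T))  # ζ = x(T) e^T

def x_star(t):                      # optimal trajectory x*(t)
    return lam * math.exp(t) + 0.5 * zeta * math.exp(-t)

def p(t):                           # adjoint p(t) = ζ e^(-t)
    return zeta * math.exp(-t)

def u_star(t):                      # optimal input u*(t) = -p(t)
    return -p(t)

# the state equation ẋ = x + u holds along x*, u* (central-difference check)
h = 1e-6
for t in (0.0, 0.5 * T, T):
    xdot = (x_star(t + h) - x_star(t - h)) / (2 * h)
    assert abs(xdot - (x_star(t) + u_star(t))) < 1e-6

assert abs(x_star(0) - x0) < 1e-9    # initial condition x(0) = x0
assert abs(p(T) - x_star(T)) < 1e-9  # boundary condition p(T) = x(T)
```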

Problem 3 (25%)

Consider the stochastic shortest path problem shown in Figure 3.

Figure 3: Transition graph of the stochastic shortest path problem (states 0, 1, 2 with transition probabilities p_ij(u); state 0 has a self-loop p_00(u); figure not reproduced here).

The control sets are

    U(0) = {a_0},  U(1) = {a_1, b_1, c_1},  U(2) = {a_2, b_2},

the transition probabilities are

    p_10(a_1) = 1/3   p_11(a_1) = 1/3   p_12(a_1) = 1/3
    p_10(b_1) = 1/3   p_11(b_1) = 2/3   p_12(b_1) = 0
    p_10(c_1) = 0     p_11(c_1) = 2/3   p_12(c_1) = 1/3
    p_20(a_2) = 0     p_21(a_2) = 2/3   p_22(a_2) = 1/3
    p_20(b_2) = 1/3   p_21(b_2) = 1/3   p_22(b_2) = 1/3
    p_00(a_0) = 1,

and the cost function is

    g(i, µ(i)) = 1 for i = 1, 2 and all µ(i) ∈ U(i),  g(0, a_0) = 0.

a) Using policy iteration, perform 2 iterations for the given problem. Start with evaluating the initial policy µ^0(1) = a_1 and µ^0(2) = a_2.

b) Has the policy iteration converged after 2 iterations? If so, please explain why. If it has not converged, please explain the criterion that it would need to fulfill for convergence. Also, if it has not converged, can you comment on the possible outcome of iteration 3?

Solution 3

a) From the given information we can see that node 0 is the cost-free terminal node, so we do not need to consider it.

Iteration 1:

Policy evaluation:

    J^1(1) = 1 + p_11(a_1) J^1(1) + p_12(a_1) J^1(2)
           = 1 + (1/3) J^1(1) + (1/3) J^1(2)
    =>  J^1(1) = 3/2 + (1/2) J^1(2)    (4)

    J^1(2) = 1 + p_21(a_2) J^1(1) + p_22(a_2) J^1(2)
           = 1 + (2/3) J^1(1) + (1/3) J^1(2)
           = 1 + (2/3) (3/2 + (1/2) J^1(2)) + (1/3) J^1(2)    [using (4)]
           = 2 + (2/3) J^1(2)
    =>  J^1(2) = 6,  J^1(1) = 9/2.

Policy improvement:

    µ^1(1) = argmin over u ∈ U(1) of [ 1 + p_11(u) J^1(1) + p_12(u) J^1(2) ]
           = argmin [ 1 + 1/3 · 9/2 + 1/3 · 6,  1 + 2/3 · 9/2 + 0 · 6,  1 + 2/3 · 9/2 + 1/3 · 6 ]
           = argmin [ 9/2, 4, 6 ]
    =>  µ^1(1) = b_1

    µ^1(2) = argmin over u ∈ U(2) of [ 1 + p_21(u) J^1(1) + p_22(u) J^1(2) ]
           = argmin [ 1 + 2/3 · 9/2 + 1/3 · 6,  1 + 1/3 · 9/2 + 1/3 · 6 ]
           = argmin [ 6, 9/2 ]
    =>  µ^1(2) = b_2

Iteration 2:

Policy evaluation:

    J^2(1) = 1 + (2/3) J^2(1) + 0 · J^2(2)  =>  J^2(1) = 3
    J^2(2) = 1 + (1/3) J^2(1) + (1/3) J^2(2) = 2 + (1/3) J^2(2)  =>  J^2(2) = 3

Policy improvement:

    µ^2(1) = argmin [ 1 + 1/3 · 3 + 1/3 · 3,  1 + 2/3 · 3,  1 + 2/3 · 3 + 1/3 · 3 ]
           = argmin [ 3, 3, 4 ]
    =>  µ^2(1) = a_1 or b_1

    µ^2(2) = argmin [ 1 + 2/3 · 3 + 1/3 · 3,  1 + 1/3 · 3 + 1/3 · 3 ] = argmin [ 4, 3 ]
    =>  µ^2(2) = b_2

b) Policy iteration has converged if J^k(i) = J^(k-1)(i) holds for all nodes i at iteration k. This is clearly not the case for k = 2, so policy iteration has not converged yet.

Even though it has not converged, we can still comment on the policy that the iteration will converge to. First, pick µ^2(1) = b_1 for iteration 3. In this case we apply the same combination of inputs as in iteration 2, so the transition probabilities in the policy evaluation step of iteration 3 are the same, and it yields the same costs as iteration 2, i.e., J^2(i) = J^3(i) for all nodes i. Hence policy iteration has converged to the policy µ(1) = b_1, µ(2) = b_2.

If instead we pick µ^2(1) = a_1 for iteration 3, the only thing we can say is that this choice will also eventually lead to convergence; we cannot say after how many iterations or to which policy.

Additional information: in fact, if you do iteration 3 with µ^2(1) = a_1, you would also see that J^2(i) = J^3(i) holds for all nodes i. Therefore, both choices for µ^2(1) are feasible outcomes of the policy iteration.
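The two iterations above can be checked with a short policy-iteration sketch in exact rational arithmetic. The transition probabilities are taken directly from the problem statement; policy evaluation solves the 2x2 linear system J = 1 + P_µ J (over the non-terminal states 1 and 2) by Cramer's rule.

```python
from fractions import Fraction as F

# transition probabilities p_ij(u) for i = 1, 2, from the problem statement
p = {
    (1, 'a1'): {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)},
    (1, 'b1'): {0: F(1, 3), 1: F(2, 3), 2: F(0)},
    (1, 'c1'): {0: F(0),    1: F(2, 3), 2: F(1, 3)},
    (2, 'a2'): {0: F(0),    1: F(2, 3), 2: F(1, 3)},
    (2, 'b2'): {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)},
}
U = {1: ['a1', 'b1', 'c1'], 2: ['a2', 'b2']}  # control sets (state 0 is terminal)

def evaluate(mu):
    """Policy evaluation: solve J = 1 + P_mu J for states 1, 2 (Cramer's rule)."""
    a, b = 1 - p[(1, mu[1])][1], -p[(1, mu[1])][2]
    c, d = -p[(2, mu[2])][1], 1 - p[(2, mu[2])][2]
    det = a * d - b * c
    return {1: (d - b) / det, 2: (a - c) / det}

def improve(J):
    """Policy improvement: greedy policy with respect to the current costs J."""
    q = lambda i, u: 1 + p[(i, u)][1] * J[1] + p[(i, u)][2] * J[2]
    return {i: min(U[i], key=lambda u: q(i, u)) for i in (1, 2)}

mu = {1: 'a1', 2: 'a2'}    # initial policy µ^0
J1 = evaluate(mu)          # {1: 9/2, 2: 6}
mu = improve(J1)           # {1: 'b1', 2: 'b2'}
J2 = evaluate(mu)          # {1: 3, 2: 3}
```

Note that `min` breaks the tie at state 1 in iteration 2 by list order; both a_1 and b_1 attain the minimum of 3, consistent with the solution.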

Problem 4 (25%)

A burglar broke into a house and found N ∈ Z_+ items.³ Let v_i > 0 denote the value and w_i ∈ Z_+ the weight of the i-th item. There is a limit W ∈ Z_+ on the total weight the burglar can carry (the total weight of all stolen items must be less than or equal to W), and he obviously wants to maximize the total value of the items that he can take.

a) Case A: a burglar who did not take the Dynamic Programming and Optimal Control class.
How many different combinations of items does the burglar have to consider to find the combination with the highest value that he can take?

b) Case B: a burglar who did take the Dynamic Programming and Optimal Control class.
b.1) Formulate the burglar's problem by defining the state space, control space, system dynamics, stage cost, and terminal cost.
b.2) State the Dynamic Programming Algorithm that is required to solve the burglar's problem.
b.3) Using the Dynamic Programming Algorithm from b.2, solve the burglar's problem where N = 3, W = 5, v_1 = 6, v_2 = 10, v_3 = 12, w_1 = 1, w_2 = 2, and w_3 = 3.

c) Briefly compare Case A and Case B in terms of computational cost.

³ Z_+ denotes the set of positive integers.

Solution 4

a) 2^N (the cardinality of the power set of the N items).

b)

b.1) Let x_k, k > 1, be the total weight of the items the burglar has decided to take from the set of items {1, ..., k-1}, and let x_1 = 0. Furthermore, let u_k ∈ {0, 1} be the binary decision variable that controls whether item k goes into the burglar's bag or not.

State space: S_k = [0, W] ∩ ({0} ∪ Z_+), k = 1, ..., N+1.
Control space: U_k = {0} if x_k + w_k > W, else U_k = {0, 1}, k = 1, ..., N.
System dynamics: x_(k+1) = x_k + w_k u_k, x_1 = 0, k = 1, ..., N.
Stage cost: g_k(x_k, u_k) = -v_k u_k, k = 1, ..., N (maximizing the total value is written as minimizing the total negative value).
Terminal cost: g_(N+1)(x_(N+1)) = 0 for all x_(N+1) ∈ S_(N+1).

b.2) Applying the Dynamic Programming Algorithm (DPA) gives

    J_(N+1)(x_(N+1)) = g_(N+1)(x_(N+1)) = 0 for all x_(N+1) ∈ S_(N+1),
    J_k(x_k) = min over u_k ∈ U_k of { -v_k u_k + J_(k+1)(x_k + w_k u_k) }, k = N, ..., 1.

b.3) Applying the above DPA for N = 3, W = 5, v_1 = 6, v_2 = 10, v_3 = 12, w_1 = 1, w_2 = 2, and w_3 = 3 gives:

    J_4(x_4) = 0 for all x_4 ∈ {0, 1, 2, 3, 4, 5}

k = 3:
    J_3(0) = min{ -12·0 + J_4(0), -12·1 + J_4(3) } = -12, µ_3(0) = 1
    J_3(1) = min{ -12·0 + J_4(1), -12·1 + J_4(4) } = -12, µ_3(1) = 1
    J_3(2) = min{ -12·0 + J_4(2), -12·1 + J_4(5) } = -12, µ_3(2) = 1
    J_3(3) = min{ -12·0 + J_4(3) } = 0, µ_3(3) = 0
    J_3(4) = min{ -12·0 + J_4(4) } = 0, µ_3(4) = 0
    J_3(5) = min{ -12·0 + J_4(5) } = 0, µ_3(5) = 0

k = 2:
    J_2(0) = min{ -10·0 + J_3(0), -10·1 + J_3(2) } = -22, µ_2(0) = 1
    J_2(1) = min{ -10·0 + J_3(1), -10·1 + J_3(3) } = -12, µ_2(1) = 0
    J_2(2) = min{ -10·0 + J_3(2), -10·1 + J_3(4) } = -12, µ_2(2) = 0
    J_2(3) = min{ -10·0 + J_3(3), -10·1 + J_3(5) } = -10, µ_2(3) = 1
    J_2(4) = min{ -10·0 + J_3(4) } = 0, µ_2(4) = 0
    J_2(5) = min{ -10·0 + J_3(5) } = 0, µ_2(5) = 0

k = 1:
    J_1(0) = min{ -6·0 + J_2(0), -6·1 + J_2(1) } = -22, µ_1(0) = 0.

Therefore, the optimal solution is to leave the first item and take the second and third items, for a total value of 22.
c) Case A has time complexity O(2^N), which is exponential in N, while Case B has time complexity O(W·N), which is linear in N for fixed W (pseudo-polynomial). Therefore, for large N, Case B (the DPA) is computationally much less expensive than Case A (the brute-force approach).
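The DPA of b.2 and b.3 can be sketched in a few lines; `knapsack_dp` is a hypothetical helper name (not from the exam), and the recursion runs backwards over k exactly as in b.2, with a forward pass to recover the decisions µ_k.

```python
# DPA from b.2: maximization of value written as minimization of -v_k u_k
# (knapsack_dp is a hypothetical helper name, not from the exam)
def knapsack_dp(v, w, W):
    N = len(v)
    J = [0] * (W + 1)                      # terminal cost J_{N+1}(x) = 0
    policy = []
    for k in range(N - 1, -1, -1):         # backward recursion k = N, ..., 1
        Jk, mu = [0] * (W + 1), [0] * (W + 1)
        for x in range(W + 1):
            Jk[x] = J[x]                   # u_k = 0: leave item k
            if x + w[k] <= W and -v[k] + J[x + w[k]] < Jk[x]:
                Jk[x], mu[x] = -v[k] + J[x + w[k]], 1   # u_k = 1: take item k
        J = Jk
        policy.insert(0, mu)
    x, take = 0, []                        # forward pass: recover the decisions
    for k in range(N):
        take.append(policy[k][x])
        x += w[k] * take[-1]
    return -J[0], take

value, take = knapsack_dp([6, 10, 12], [1, 2, 3], 5)
# value == 22, take == [0, 1, 1]: leave item 1, take items 2 and 3
```

The inner double loop over k and x makes the O(W·N) cost of Case B explicit, in contrast to enumerating all 2^N subsets in Case A.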