
MATHEMATICS OF OPERATIONS RESEARCH Vol. xx, No. x, Xxxxxxx 200x, pp. xxx-xxx
ISSN 0364-765X EISSN 1526-5471 0x xxxx xxx

informs DOI 10.1287/moor.xxxx.xxxx © 200x INFORMS

A New Complexity Result on Solving the Markov Decision Problem

Yinyu Ye
Department of Management Science and Engineering, Terman 316, Stanford University, Stanford, CA 94305, USA
email: yinyu-ye@stanford.edu  http://www.stanford.edu/~yyye

We present a new complexity result on solving the Markov decision problem (MDP) with n states and a number of actions for each state, a special class of real-number linear programs with the Leontief matrix structure. We prove that, when the discount factor θ is strictly less than 1, the problem can be solved in at most O(n^{1.5}(log(1/(1−θ)) + log n)) classical interior-point method iterations and O(n^4(log(1/(1−θ)) + log n)) arithmetic operations. Our method is a combinatorial interior-point method related to the work of Ye [3] and Vavasis and Ye [26]. To our knowledge, this is the first strongly polynomial-time algorithm for solving the MDP when the discount factor is a constant less than 1.

Key words: Markov Decision Problem; Linear Programming; Complexity
MSC2000 Subject Classification: Primary: 90C40, 90C05; Secondary: 68Q25, 90C39
OR/MS subject classification: Primary: Linear Programming, Algorithm; Secondary: Dynamic Programming, Markov
History: Received: November 4, 2003; revised: October 3, 2004 and December 27, 2004.

1. Introduction. Complexity theory is arguably the foundation stone of the theory of computer algorithms. The goal of the theory is twofold: to develop criteria for measuring the effectiveness of various algorithms (and thus, to be able to compare algorithms using these criteria), and to assess the inherent difficulty of various problems. The term complexity refers to the amount of resources required by a computation. We will focus on one particular resource, namely, the computing time.
In complexity theory, however, one is not interested in the execution time of a program implemented in a particular programming language, running on a particular computer over a particular input. There are too many contingent factors here. Instead, one would like to associate to an algorithm some more intrinsic measure of its time requirements. Roughly speaking, to do so one needs to fix: a notion of input size, a set of basic operations, and a cost for each basic operation. The last two allow one to define the cost of a computation. The selection of a set of basic operations is generally easy. For the algorithms we will consider in this paper, the obvious choice is the set {+, −, ×, /, ≤} of the four arithmetic operations and the comparison. Selecting a notion of input size and a cost for the basic operations is more delicate and depends on the kind of data dealt with by the algorithm. Some kinds can be represented within a fixed amount of computer memory, and some others require a variable amount depending on the data at hand. Examples of the first are fixed-precision floating-point numbers. Any such number is stored in a fixed amount of memory (usually 32 or 64 bits). For this kind of data, the size of an element is usually taken to be 1, and the element is consequently said to have unit size. Examples of the second are integer numbers, which require a number of bits approximately equal to the logarithm of their absolute value. This logarithm is usually referred to as the bit size of the integer. These ideas also apply to rational numbers. Similar considerations apply for the cost of arithmetic operations. The cost of operating on two unit-size numbers is taken to be 1 and, as expected, is called unit cost. In the bit-size case, the cost of operating on two numbers is the product of their bit sizes (for multiplications and divisions) or their maximum (for

additions, subtractions, and comparisons). The consideration of integer or rational data, with their associated bit size and bit cost for the arithmetic operations, is usually referred to as the Turing model of computation (e.g., see [22]). The consideration of idealized reals with unit size and unit cost is today referred to as the BSS model of computation (from Blum, Shub and Smale [2]), which is what we have used in this paper, with exact real arithmetic operations (i.e., ignoring round-off errors). A basic concept related to both the Turing and the BSS models of computation is that of polynomial time. An algorithm A is said to be a polynomial time algorithm if the running time over all instances of the problem is bounded above by a polynomial in the size of the problem. A problem can be solved in polynomial time if there is a polynomial time algorithm solving the problem. The notion of polynomial time is usually taken as the formal counterpart of the more informal notion of efficiency. One should not fully identify polynomial time with efficiency, since high-degree polynomial bounds can hardly mean efficiency. Yet, many basic problems admitting polynomial-time algorithms can actually be efficiently solved. On the other hand, the notion of polynomial time is robust, and it is the ground upon which complexity theory is built. Linear programming (LP) has played a distinguished role in complexity theory. It has a primal-dual form:

Primal: minimize c^T x subject to Ax = b, x ≥ 0, (1)

Dual: maximize b^T y subject to s = c − A^T y ≥ 0, (2)

where A ∈ IR^{m×n} is a given real matrix with rank m, c ∈ IR^n and b ∈ IR^m are given real vectors, and x ∈ IR^n (y ∈ IR^m, s ∈ IR^n) are the unknown real vectors. Vector s is often called the dual slack vector. We denote the LP problem by LP(A, b, c). In one sense LP is a continuous optimization problem, since the goal is to minimize a linear objective function over a convex polyhedron.
But it is also a combinatorial problem, involving the selection of an extreme point among a finite set of possible vertices. An optimal solution of a linear program always lies at a vertex of the feasible polyhedron. Unfortunately, the number of vertices associated with a set of n inequalities in m variables can be exponential in the dimensions, in this case up to n!/(m!(n−m)!). Except for small values of m and n, this number is sufficiently large to prevent examining all possible vertices when searching for an optimal vertex. The LP problem is polynomially solvable under the Turing model of computation, as proved by Khachiyan [] and also by Karmarkar [] and many others. But the problem of whether there is a polynomial-time algorithm for LP under the BSS model of computation remains open. It turns out that two instances of the LP problem with the same (unit) size may result in drastically different performances under an interior-point algorithm. This has led during the past years to research developments relating the complexity of interior-point algorithms to certain condition measures for linear programming ([24, 29, 7, 9]). One particular example is the Vavasis-Ye algorithm, which interleaves small steps with longer layered least-squares (LLS) steps to follow the central path; see [26, 7, 2, 2]. The algorithm, which will be called the layered-step interior point (LIP) algorithm, terminates in a finite number of steps. Furthermore, the total number of iterations depends only on A: the running time is O(n^{3.5} c(A)) iterations, where c(A) is the condition measure of the full-rank constraint matrix A defined in [26]:

c(A) = O(log(χ̄_A) + log n), (3)

where

χ̄_A = max{ ‖(A_B)^{−1} A‖ : A_B ∈ IR^{m×m} is a basic matrix of A }.

This is in contrast to other interior point methods, whose complexity depends on the vectors b and c as well as on the matrix A. This is important because there are many classes of problems in which A is well-behaved but b and c are arbitrary vectors. (‖·‖, without a subscript, indicates the 2-norm throughout this paper.)
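To make the primal-dual pair (1)-(2) concrete, here is a small numeric sketch (the instance (A, b, c) is hypothetical, chosen only for illustration). It enumerates the basic solutions of a tiny standard-form LP, builds the corresponding dual solution, and checks strong duality and complementary slackness:

```python
# A small numeric illustration of the primal-dual pair (1)-(2) and
# complementary slackness, using brute-force basis enumeration.
# The instance (A, b, c) is hypothetical, chosen only for illustration.

from itertools import combinations

# min c^T x  s.t.  Ax = b, x >= 0, with m = 1, n = 3
A = [[1.0, 1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0, 3.0]
m, n = 1, 3

best = None
for basis in combinations(range(n), m):
    # Solve A_B x_B = b; here m = 1, so this is a scalar division.
    j = basis[0]
    if A[0][j] == 0.0:
        continue
    xB = b[0] / A[0][j]
    if xB < 0.0:
        continue  # infeasible vertex
    x = [0.0] * n
    x[j] = xB
    val = sum(c[k] * x[k] for k in range(n))
    if best is None or val < best[0]:
        best = (val, x)

opt_val, x_opt = best
# Dual: maximize b^T y  s.t.  s = c - A^T y >= 0.  With one row and
# positive entries in A, y is the largest value keeping every s_j >= 0.
y = min(c[k] / A[0][k] for k in range(n))
s = [c[k] - A[0][k] * y for k in range(n)]

print(opt_val, b[0] * y)                     # both 1.0 (strong duality)
print([x_opt[k] * s[k] for k in range(n)])   # all zeros (complementarity)
```

The optimal vertex puts all mass on the cheapest column, and the dual slack vanishes exactly on the optimal basis, matching the strict complementarity partition discussed below.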

2. The Markov Decision Problem. In this paper, we present a new complexity result for solving the real-number Markov decision problem (MDP), a special class of real-number linear programs with the Leontief matrix structure due to de Ghellinck [5], D'Epenoux [6] and Manne [4] (see also the recent work by Van Roy [25]):

minimize c_1^T x_1 + ... + c_j^T x_j + ... + c_n^T x_n
subject to (E_1 − θP_1)x_1 + ... + (E_j − θP_j)x_j + ... + (E_n − θP_n)x_n = e,
x_1 ≥ 0, ..., x_j ≥ 0, ..., x_n ≥ 0,

where e is the vector of all ones, E_j is the n×k matrix whose jth-row entries are all ones and whose entries everywhere else are zeros, and P_j is an n×k Markov or column stochastic matrix such that e^T P_j = e^T and P_j ≥ 0, j = 1, ..., n. Here, decision vector x_j ∈ IR^k represents the decision variables associated with the jth state's k actions, and c_j is its cost vector corresponding to the k action variables. The optimal solution to the problem will select one optimal action for every state, which forms an optimal feasible basis or policy. The dual of the problem is given by

maximize e^T y
subject to (E_1 − θP_1)^T y ≤ c_1,
...
(E_j − θP_j)^T y ≤ c_j,
...
(E_n − θP_n)^T y ≤ c_n.

If we sort the decision variables by actions, the MDP can also be written as:

minimize (c^1)^T x^1 + ... + (c^i)^T x^i + ... + (c^k)^T x^k
subject to (I − θP^1)x^1 + ... + (I − θP^i)x^i + ... + (I − θP^k)x^k = e, (4)
x^1 ≥ 0, ..., x^i ≥ 0, ..., x^k ≥ 0.

And its dual (by adding slack variables) is

maximize e^T y
subject to (I − θP^1)^T y + s^1 = c^1,
...
(I − θP^i)^T y + s^i = c^i, (5)
...
(I − θP^k)^T y + s^k = c^k,
s^1 ≥ 0, ..., s^i ≥ 0, ..., s^k ≥ 0.

Here, x^i ∈ IR^n represents the decision variables of all states for action i, I is the n×n identity matrix, and P^i, i = 1, ..., k, is an n×n Markov matrix (e^T P^i = e^T and P^i ≥ 0). Comparing to the LP standard form, we have A = [I − θP^1, ..., I − θP^k] ∈ IR^{n×nk}, b = e ∈ IR^n, and c = (c^1; ...; c^k) ∈ IR^{nk}. In the MDP, θ is the so-called discount factor, such that θ = 1/(1 + r), where r is the interest rate; it is assumed strictly positive in this writing, so that θ < 1.
(When θ = 1, problem (4) is infeasible, and y = e is a certificate for the infeasibility.) The problem is to find the best action for each state so that the total cost is minimized. There are four current effective methods for solving the MDP: the value iteration method, the policy iteration method, regular LP interior-point algorithms, and the Vavasis-Ye algorithm. In terms of the worst-case complexity bound on the number of arithmetic operations, they (without a constant factor) are summarized in the following table for k = 2 (see Littman et al. [3], Mansour and Singh [5] and references therein).

Value-Iteration:  n^2 · c(P_j, c_j, θ)
Policy-Iteration: n^3 · 2^n/n
LP-Algorithms:    n^3 · c(P_j, c_j, θ)
Vavasis-Ye:       n^6 · c'(P_j, θ)
New Method:       n^4 · log(1/(1−θ))
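As a concrete reference point for the value- and policy-iteration entries above, the following minimal sketch runs value iteration on a hypothetical 2-state, 2-action discounted MDP (with column-stochastic P, as in the paper's convention) and checks the result against exact policy evaluation of all four deterministic policies:

```python
# A minimal value-iteration sketch for a discounted MDP, checked against
# exact policy evaluation over all deterministic policies.  The 2-state,
# 2-action instance is hypothetical; P[a] is column stochastic
# (e^T P = e^T), so the next-state distribution of state i under action
# a is column i of P[a].

theta = 0.8  # discount factor, strictly less than 1

# P[a][j][i]: probability of moving to state j from state i under action a
P = [
    [[0.9, 0.2],
     [0.1, 0.8]],
    [[0.5, 0.7],
     [0.5, 0.3]],
]
c = [[2.0, 1.0],   # c[a][i]: cost of action a in state i
     [1.0, 3.0]]
n, k = 2, 2

def bellman(v):
    # One value-iteration sweep: v_i <- min_a (c[a][i] + theta * E[v(next)])
    return [min(c[a][i] + theta * sum(P[a][j][i] * v[j] for j in range(n))
                for a in range(k))
            for i in range(n)]

v = [0.0, 0.0]
for _ in range(2000):
    v = bellman(v)

def evaluate(pi):
    # Exact policy evaluation: solve (I - theta * P_pi^T) v = c_pi
    # by 2x2 Cramer's rule.
    M = [[(1.0 if i == r else 0.0) - theta * P[pi[i]][r][i] for r in range(n)]
         for i in range(n)]
    rhs = [c[pi[i]][i] for i in range(n)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - rhs[1] * M[0][1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

best = min((evaluate((a0, a1)) for a0 in range(k) for a1 in range(k)),
           key=lambda w: sum(w))
print(max(abs(v[i] - best[i]) for i in range(n)))  # ~0: value iteration
                                                   # reaches the optimal values
```

Each value-iteration sweep costs O(kn^2) arithmetic operations, and the contraction factor θ is what drives the condition-dependent bound c(P_j, c_j, θ) in the table.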

where c(P_j, c_j, θ) and c'(P_j, θ) are condition measures of the data (P_j, c_j, θ), j = 1, ..., k. When the data (P_j, c_j, θ) are rational numbers, these condition measures are generally bounded by the total bit size of the input. In this paper we develop a combinatorial interior-point algorithm, for the case when the discount factor θ is strictly less than 1, to solve the MDP problem in at most O(n^{1.5}(log(1/(1−θ)) + log n)) interior-point method iterations and a total of O(n^4(log(1/(1−θ)) + log n)) arithmetic operations. Note that this bound depends only logarithmically on 1/(1−θ). If θ is rational, then log(1/(1−θ)) is bounded above by the bit length of θ. This is because, when θ = q/p, where p and q are positive integers with p ≥ q + 1, we have

log(1/(1−θ)) = log(p) − log(p − q) ≤ log(p) ≤ bit length of θ.

Our method is closely related to the work of Ye [3] and Vavasis and Ye [26]. To our knowledge, this is the first strongly polynomial-time algorithm for solving the MDP when the discount factor (interest rate) is a constant less than 1 (greater than 0).

3. LP Theorems and Interior-Point Algorithms. We first describe a few general LP theorems and a classical predictor-corrector interior-point algorithm.

3.1 Optimality conditions and the central path. The optimality conditions for all optimal solution pairs of LP(A, b, c) may be written as follows:

Ax = b, A^T y + s = c, SXe = 0, x, s ≥ 0, (6)

where X denotes diag(x) and S denotes diag(s), and 0 and e denote the vectors of all 0s and all 1s, respectively. The third equation is often referred to as the complementarity condition. If LP(A, b, c) has an optimal solution pair, then there exists a unique index set B ⊂ {1, ..., n} and N = {1, ..., n} \ B, such that every x satisfying

A_B x_B = b, x_B ≥ 0, x_N = 0,

is an optimal solution for the primal, and every (y, s) satisfying

s_B = c_B − A_B^T y = 0, s_N = c_N − A_N^T y ≥ 0,

is an optimal solution for the dual.
This partition is called the strict complementarity partition, since a strictly complementary pair (x*, y*, s*) exists, meaning

A_B x*_B = b, x*_B > 0, x*_N = 0, and s*_B = c_B − A_B^T y* = 0, s*_N = c_N − A_N^T y* > 0.

Here, for example, subvector x*_B contains all x*_i for i ∈ B ⊂ {1, ..., n}. Consider the following equations:

Ax = b, A^T y + s = c, SXe = µe, x > 0, s > 0. (7)

These equations always have a unique solution for any µ > 0, provided that the primal and dual problems both have interior feasible points (x > 0, s > 0). The solution to these equations, written (x(µ), y(µ), s(µ)), is called the central path point for µ, and the aggregate of all such points, as µ ranges from 0 to ∞, is the central path of the LP problem; see [, 6]. Note that as µ → 0+, central path equation (7) approaches optimality condition (6), and (x(µ), y(µ), s(µ)) approaches an optimal solution. The following is a geometric property of the central path:

Lemma 3.1 Let (x(µ), y(µ), s(µ)) and (x(µ'), y(µ'), s(µ')) be two central path points such that µ' < µ. Then for any i,

s(µ')_i ≤ n·s(µ)_i and x(µ')_i ≤ n·x(µ)_i.

In particular, given any optimal (x*, y*, s*), we have, for any µ > 0 and any i,

s*_i ≤ n·s(µ)_i and x*_i ≤ n·x(µ)_i.

Proof. The lemma was proved in [26]. We reprove the second part here to give the reader an insight into the structure. Since x(µ) − x* ∈ N(A) and s(µ) − s* ∈ R(A^T), we have

(x(µ) − x*)^T (s(µ) − s*) = 0.

Since (x*, y*, s*) is optimal, we have (x*)^T s* = 0. These two equations imply

(x*)^T s(µ) + (s*)^T x(µ) = x(µ)^T s(µ) = nµ.

Dividing both sides by µ and noting x(µ)_i s(µ)_i = µ for each i = 1, ..., n, we have

Σ_{i=1}^n x*_i/x(µ)_i + Σ_{i=1}^n s*_i/s(µ)_i = n.

Because each term in the summation is nonnegative, for any i = 1, ..., n we have

x*_i/x(µ)_i ≤ n and s*_i/s(µ)_i ≤ n.

3.2 Predictor-corrector interior-point algorithm. In the predictor-corrector path-following interior point method of Mizuno et al. [8], one solves (7) approximately to obtain an approximately centered point (x, y, s, µ) such that

η(x, y, s, µ) := ‖SXe/µ − e‖ ≤ η0, (8)

where, say, η0 = 0.2 throughout this paper. Then each iteration decreases µ towards zero, and, for each new value of µ, the approximate solution (x, y, s, µ) of central path equation (7) is updated. For any point (x, y, s) and µ where η(x, y, s, µ) ≤ 1/5, it has been proved that

(3/4)s_i ≤ s(µ)_i ≤ (5/4)s_i and (3/4)x_i ≤ x(µ)_i ≤ (5/4)x_i. (9)

In general, since each diagonal entry of SX is between µ(1 − η0) and µ(1 + η0), we have the two inequalities

√(µ(1 − η0)) ≤ ‖(SX)^{1/2}‖ ≤ √(µ(1 + η0)) and ‖(SX)^{−1/2}‖ ≤ 1/√(µ(1 − η0)), (10)

which will be used frequently in the upcoming analysis. Given a near central path point (x, y, s, µ) such that η(x, y, s, µ) ≤ η0, the newly decreased µ can be calculated by solving two related least-squares problems in a predictor step. Let

D = X S^{−1}, (11)

and let (δȳ, δs̄) be the solution to the weighted least-squares problem

minimize ‖D^{1/2}(δs + s)‖ subject to δs = −A^T δy, or δs ∈ R(A^T),

where R(A^T) denotes the range space of A^T; and let δx̄ be the solution to the related weighted least-squares problem

minimize ‖D^{−1/2}(δx + x)‖ subject to Aδx = 0, or δx ∈ N(A),

where N(A) denotes the null space of A. It is not hard to see that (δx̄, δȳ, δs̄) satisfy the following equations:

Aδx̄ = 0, A^T δȳ + δs̄ = 0, D^{−1/2}δx̄ + D^{1/2}δs̄ = −X^{1/2}S^{1/2}e. (12)

These equations imply that

‖D^{−1/2}δx̄‖² + ‖D^{1/2}δs̄‖² = nµ. (13)

The new iterate point (x̄, ȳ, s̄), defined by

(x̄, ȳ, s̄) = (x, y, s) + α(δx̄, δȳ, δs̄)

for a suitable step size α ∈ (0, 1], is a strictly feasible point. Furthermore, η(x̄, ȳ, s̄, µ(1 − α)) ≤ 2η0. Then, one takes a corrector step and uses Newton's method, starting from (x̄, ȳ, s̄), to generate a new (x, y, s) such that η(x, y, s, µ(1 − α)) ≤ η0. Thus, the new µ is reduced by a factor (1 − α) from µ. One can further show that the step size can be greater than 1/(4√n) in every iteration. Thus, the predictor-corrector interior-point algorithm reduces µ to µ' (< µ) in at most O(√n log(µ/µ')) iterations, while maintaining suitable proximity to the central path. Moreover, the average number of arithmetic operations per iteration is O(n^{2.5}). This eventually results in a polynomial-time algorithm, with O(n^3 L) total arithmetic operations, in the bit-size computation model; see also [23, 8, 2, 9].

3.3 Properties of the Markov Decision Problem. For simplicity, we fix k, the number of actions taken by each state, to 2 in the rest of the paper. Then, the MDP can be rewritten as:

minimize (c^1)^T x^1 + (c^2)^T x^2
subject to (I − θP^1)x^1 + (I − θP^2)x^2 = e, x^1, x^2 ≥ 0. (14)

And its dual is

maximize e^T y
subject to (I − θP^1)^T y + s^1 = c^1,
(I − θP^2)^T y + s^2 = c^2, s^1, s^2 ≥ 0. (15)

Comparing to the LP standard form, we have A = [I − θP^1, I − θP^2] ∈ IR^{n×2n}, b = e ∈ IR^n, and c = (c^1; c^2) ∈ IR^{2n}. Note that any feasible basis A_B of the MDP has the Leontief form A_B = I − θP, where P is an n×n Markov matrix chosen from the columns of [P^1, P^2], and the reverse is also true. (This can be seen from the fact that, otherwise, the basis would have at least one row consisting of all non-positive elements, so that its inner product with x_B (≥ 0) would be non-positive, while the corresponding right-hand side is strictly positive.) Most proofs of the following lemma can also be found in Dantzig [3, 4] and Veinott [27].

Lemma 3.2 The MDP has the following properties:

(i) Both the primal and dual MDPs have interior feasible points if θ < 1.

(ii) The feasible set of the primal MDP is bounded. More precisely, e^T x = n/(1 − θ), where x = (x^1; x^2).

(iii) Let x̂ be a basic feasible solution of the MDP. Then, any basic variable, say x̂_i, has its value x̂_i ≥ 1.

(iv) Let B and N be the optimal partition for the MDP. Then, B contains at least one feasible basis, i.e., |B| ≥ n and |N| ≤ n; and for any j ∈ B there is an optimal solution x* such that x*_j ≥ 1.

(v) Let A_B be any feasible basis and A_N be any submatrix of the remaining columns of the MDP constraint matrix; then

‖(A_B)^{−1} A_N‖ ≤ 2n√n/(1 − θ).

Proof. The proof of this lemma is straightforward, based on the expressions e^T P = e^T and (I − θP)^{−1} = I + θP + θ²P² + ..., and the fact that P^k remains a Markov matrix for k = 1, 2, .... It is easy to verify that

x^1 = (I − θP^1)^{−1} e/2 and x^2 = (I − θP^2)^{−1} e/2

is an interior feasible solution to the primal, and y = −γe is an interior feasible solution to the dual for sufficiently large γ. This proves (i). Left-multiplying by e^T on both sides of the constraints of the primal, we have

e^T Ax = (1 − θ)e^T x = n,

which proves (ii). Since any basic feasible solution has the form

(I − θP)^{−1} e = (I + θP + θ²P² + ...)e ≥ e,

(iii) is proved. Since the MDP has at least one basic optimal solution, the proof of (iv) follows the statement of (iii). The proof of (v) is more demanding. Note that for any feasible basis,

A_B^{−1} = (I − θP)^{−1} = I + θP + θ²P² + ...,

where P is a Markov matrix. Let the ith column of P be P_i. Then we have ‖P_i‖ ≤ e^T P_i = 1, because e^T P_i is the 1-norm of P_i and the 2-norm is less than or equal to the 1-norm. Now, for any vector a ∈ IR^n, we have

‖Pa‖ = ‖Σ_{i=1}^n P_i a_i‖ ≤ Σ_{i=1}^n ‖P_i‖ |a_i| ≤ Σ_{i=1}^n |a_i| = ‖a‖_1 ≤ √n ‖a‖,

which implies that

‖Pa‖ ≤ √n ‖a‖, or ‖P‖ ≤ √n,

for any Markov matrix P. Furthermore, each component of A_N ∈ IR^{n×t} (t ≤ 2n) is between −1 and 1. Let the ith column of A_N be (A_N)_i. Then ‖(A_N)_i‖ ≤ √n, and for any vector d ∈ IR^t we have

‖A_N d‖ ≤ Σ_{i=1}^t ‖(A_N)_i‖ |d_i| ≤ √n ‖d‖_1 ≤ √n · √t ‖d‖ ≤ √(2n) · √n ‖d‖.

Finally, for any vector d ∈ IR^t,

‖A_B^{−1} A_N d‖ = ‖(I + θP + θ²P² + ...)A_N d‖
≤ ‖A_N d‖ + θ‖P A_N d‖ + θ²‖P² A_N d‖ + ...
≤ ‖A_N d‖ + θ√n ‖A_N d‖ + θ²√n ‖A_N d‖ + ...
≤ (√n/(1 − θ)) ‖A_N d‖
≤ (2n√n/(1 − θ)) ‖d‖,

or

‖(A_B)^{−1} A_N‖ ≤ 2n√n/(1 − θ).

Using the lemma, we may safely assume that c ≥ 0, since using c + γe for any number γ is equivalent to using c in the MDP. We also remark that the condition measure χ̄_A mentioned earlier and used in [26, 7, 2] cannot be bounded by the 2n√n of (v) of the lemma, since it is the maximum over all possible (feasible or infeasible) bases. Consider a two-state two-action MDP where

A = ( 1−θ   −θ   1−θ(1−ɛ)   −θ
      0      1   −θɛ         1 ).

Here, for any given θ > 0, ‖(A_B)^{−1} A‖ can be arbitrarily large as ɛ → 0+ when

A_B = ( 1−θ   1−θ(1−ɛ)
        0     −θɛ ).

In fact, all other condition measures ([24, 29, 7, 9]) used in complexity analyses for general LP can be arbitrarily bad for the MDP. In other words, the new complexity result achieved in this paper cannot be derived by simply proving that the MDP is well conditioned under one of these measures.

4. Combinatorial interior-point method for solving the MDP. We now develop a combinatorial interior-point method for solving the MDP. The method has at most n major steps, where each major step identifies and eliminates at least one variable in N, and then the method proceeds to solve the MDP with at least one variable fewer. Each major step uses at most O(n^{0.5}(log(1/(1−θ)) + log n)) predictor-corrector iterations. This elimination process shares the same spirit as the build-down scheme analyzed in [3].

4.1 Complexity to compute a near central-path point. Like any other interior-point algorithm, our method needs a near central-path point pair satisfying (8) to start with. We now analyze the complexity of generating such a point, and show that this can be accomplished in O(n(log(1/(1−θ)) + log n)) classical interior-point algorithm iterations. Let

(x^i)^0 = (I − θP^i)^{−1} e/2, i = 1, 2, and x^0 = ((x^1)^0; (x^2)^0) ∈ IR^{2n}.

Thus, x^0 is an interior feasible point for the MDP. Let

y^0 = −γe and s^0 = ((s^1)^0; (s^2)^0) = (c^1 + γ(1−θ)e; c^2 + γ(1−θ)e),

where γ is chosen sufficiently large such that s^0 > 0 and γ ≥ c^T x^0/n. Denote µ^0 = (x^0)^T s^0/(2n) and consider the potential function

φ(x, s) = 2n log(s^T x) − Σ_{j=1}^{2n} log(s_j x_j).

Let c_j (≥ 0) be the jth coefficient of c = (c^1; c^2). Then, we have

φ(x^0, s^0) = 2n log(c^T x^0 + γ(1−θ)·n/(1−θ)) − Σ_{j=1}^{2n} log(s_j^0 x_j^0)
Let c j ( ) be the jth coefficient of c = (c ; c 2 ). Then, we have φ(x, s ) = 2n log(c T x n 2n + γ( θ) θ ) log(s jx j) j= )

≤ 2n log(c^T x^0 + γn) − Σ_{j=1}^{2n} log(s_j^0/2)   (since x_j^0 ≥ 1/2)
= 2n log(2n) − Σ_{j=1}^{2n} log( 2n(c_j/2 + γ(1−θ)/2) / (c^T x^0 + γn) )
≤ 2n log(2n) − Σ_{j=1}^{2n} log( nγ(1−θ) / (c^T x^0 + γn) )   (since c ≥ 0)
≤ 2n log(2n) − 2n log( nγ(1−θ) / (2γn) )   (since γ ≥ c^T x^0/n)
= 2n log(2n) + 2n log(2/(1−θ)).

Therefore, using the primal-dual potential reduction algorithm, we can generate an approximate central path point (x^0, y^0, s^0) such that

η(x^0, y^0, s^0, µ^0) ≤ η0

in at most O(n(log(1/(1−θ)) + log n)) interior-point algorithm iterations, where each iteration uses O(n^3) arithmetic operations; see [28].

4.2 Eliminating variables in N. Now we present the major step of the method, which identifies and eliminates variables in N. We start with the following lemma on a central path property of the MDP:

Lemma 4.1 For any µ ∈ (0, µ^0], the central path pair of (14) and (15) satisfies

x(µ)_j ≤ n/(1−θ) and s(µ)_j ≥ (1−θ)µ/n for every j = 1, ..., 2n;
x(µ)_j ≥ 1/(2n) and s(µ)_j ≤ 2nµ for every j ∈ B.

Proof. The upper bound on x(µ)_j comes from the fact that the sum of all primal variables is equal to n/(1−θ). The lower bounds on x(µ)_j are direct results of Lemmas 3.1 and 3.2(iv), noting that there are 2n variables in (14). The bounds on s(µ)_j follow from the central path equations of (7).

Define a gap factor

g = 10n²(1+η0) / ((1−θ)(1−η0))   (> 1). (16)

For any given approximate central path point (x, y, s) such that η(x, y, s, µ) ≤ η0, define

J1(µ) = {j : s_j ≤ 8nµ/3} and J3(µ) = {j : s_j ≥ 8nµg/3},

and let J2(µ) be the rest of the indices. Thus, for any j1 ∈ J1(µ) and j3 ∈ J3(µ), we have

s_{j1}/s_{j3} ≤ 1/g < 1. (17)

From Lemma 4.1 and (9), for any j ∈ B, we observe

s_j ≤ (4/3) s(µ)_j ≤ (4/3)·2nµ = 8nµ/3.

Therefore, we have

Lemma 4.2 Let J1(µ) be defined above for (14) and (15) at any 0 < µ ≤ µ^0. Then, every variable of B is in J1(µ), or B ⊆ J1(µ), for any 0 < µ ≤ µ^0, and, thereby, J1(µ) always contains an optimal basis. Moreover, since B ⊆ J1(µ) and g > 1, we must have J3(µ) ⊆ N.
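The separation that Lemma 4.2 exploits can be seen numerically on an LP whose central path is computable by bisection (a hypothetical simplex-constraint instance, not the MDP itself, with an illustrative gap factor g): as µ decreases, every optimal-basis index falls below the 8nµ/3 threshold of J1(µ), while every index of N rises above the 8nµg/3 threshold of J3(µ).

```python
# A sketch of the J1/J3 separation on an LP whose central path is easy
# to compute: min c^T x s.t. e^T x = 1, x >= 0 (not the MDP itself; the
# instance and the gap factor g below are hypothetical).  The dual is
# max y s.t. y + s_i = c_i, s >= 0, and x_i(mu) s_i(mu) = mu gives
# sum_i mu / (c_i - y(mu)) = 1, solvable for y(mu) by bisection.

c = [1.0, 2.0, 3.0]
n = len(c)

def central_path(mu):
    # Bisect on y < min(c): sum_i mu/(c_i - y) is increasing in y.
    lo, hi = min(c) - n * mu - 1.0, min(c) - 1e-15
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(mu / (ci - mid) for ci in c) > 1.0:
            hi = mid
        else:
            lo = mid
    y = 0.5 * (lo + hi)
    s = [ci - y for ci in c]          # dual slacks on the central path
    x = [mu / si for si in s]         # primal point: x_i s_i = mu
    return x, y, s

mu = 1e-6
g = 1000.0  # an illustrative gap factor, g > 1
x, y, s = central_path(mu)

J1 = [j for j in range(n) if s[j] <= 8 * n * mu / 3]
J3 = [j for j in range(n) if s[j] >= 8 * n * mu * g / 3]
print(J1, J3)  # the optimal-basis index 0 lands in J1; the rest in J3
```

Here B = {0} (the cheapest column) and N = {1, 2}; for µ small enough that 8nµg/3 drops below the slack gap of N, the two index sets cleanly sandwich the optimal partition.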

J3(µ) ⊆ N implies that every primal variable in J3(µ) belongs to N, and so it has zero value at any primal optimal solution. Therefore, if J3(µ) is not empty, we can eliminate every primal variable (and dual constraint) in J3(µ) from further consideration. To restore primal feasibility after the elimination, we solve the least-squares problem:

minimize_{δx_1} ‖D_1^{−1/2} δx_1‖ subject to A_1 δx_1 = A_3 x_3.

Here, subscript 1 denotes the subvector or submatrix corresponding to index set J1(µ), and D_1 = X_1 S_1^{−1}. Note that the problem is always feasible, since A_1 contains A_B and B contains at least one optimal basis. Then, we have

A_1(x_1 + δx_1) + A_2 x_2 = A_1 x_1 + A_2 x_2 + A_3 x_3 = b.

The question is whether or not x_1 + δx_1 > 0. We show below that

Lemma 4.3 Not only A_1(x_1 + δx_1) + A_2 x_2 = b and (x_1 + δx_1; x_2) > 0, but also

η((x_1 + δx_1; x_2), y, (s_1; s_2), µ) ≤ 2η0.

That is, they form a near central-path point pair, for the same µ, of (14) and (15) after eliminating all primal variables and dual constraints in J3(µ).

Proof. Let A_B be an optimal basis contained in A_1, and let B be the index set of the basis. Then

A_1 D_1 A_1^T ⪰ A_B D_B A_B^T ≻ 0,

where D_B = X_B S_B^{−1}, U ⪰ V means that matrix U − V is positive semidefinite, and V ≻ 0 means that matrix V is positive definite. Note that

δx_1 = D_1 A_1^T (A_1 D_1 A_1^T)^{−1} A_3 x_3,

so that

‖D_1^{−1/2} δx_1‖ = ‖D_1^{1/2} A_1^T (A_1 D_1 A_1^T)^{−1} A_3 x_3‖
= √((A_3 x_3)^T (A_1 D_1 A_1^T)^{−1} A_3 x_3)
≤ √((A_3 x_3)^T (A_B D_B A_B^T)^{−1} A_3 x_3)
= ‖D_B^{−1/2} A_B^{−1} A_3 x_3‖
= ‖S_B (X_B S_B)^{−1/2} A_B^{−1} A_3 S_3^{−1} X_3 S_3 e‖
≤ ‖S_B‖ ‖(X_B S_B)^{−1/2}‖ ‖A_B^{−1} A_3‖ ‖S_3^{−1}‖ ‖X_3 S_3 e‖
≤ (8nµ/3) · (1/√(µ(1−η0))) · (2n√n/(1−θ)) · (3/(8nµg)) · (µ(1+η0)√n)   (since B ⊆ J1(µ), from (17) and Lemma 3.2(v))
= 2n²√µ (1+η0) / ((1−θ) g √(1−η0))
= √(µ(1−η0))/5.

Thus,

‖X_1^{−1} δx_1‖ = ‖(X_1 S_1)^{−1/2} D_1^{−1/2} δx_1‖ ≤ ‖(X_1 S_1)^{−1/2}‖ ‖D_1^{−1/2} δx_1‖

\[
\le\frac{1}{\sqrt{\mu(1-\eta)}}\cdot\frac{\sqrt{\mu(1-\eta)}}{5}=\frac{1}{5}<1,
\]
which implies
\[
x_1+\delta x_1=X_1(e+X_1^{-1}\delta x_1)>0.
\]
Furthermore,
\[
\begin{aligned}
\left\|\begin{pmatrix}S_1(x_1+\delta x_1)\\ S_2x_2\end{pmatrix}-\mu e\right\|
&=\left\|\begin{pmatrix}S_1x_1-\mu e\\ S_2x_2-\mu e\end{pmatrix}+\begin{pmatrix}S_1\delta x_1\\ 0\end{pmatrix}\right\|\\
&\le\|SXe-\mu e\|+\|S_1\delta x_1\|\\
&\le\eta\mu+\|(S_1X_1)^{1/2}D_1^{-1/2}\delta x_1\|\\
&\le\eta\mu+\|(S_1X_1)^{1/2}\|\,\|D_1^{-1/2}\delta x_1\|\\
&\le\eta\mu+\sqrt{\mu(1+\eta)}\cdot\frac{\sqrt{\mu(1-\eta)}}{5}\\
&\le\eta\mu+\frac{\mu}{5}=2\eta\mu.
\end{aligned}
\]

5. Complexity to Make $J_3(\mu)\neq\emptyset$

The question now is what to do if $J_3(\mu)$ is empty at, say, the initial $\mu^0$. Well, we directly apply the predictor-corrector method described earlier; that is, we compute the predictor step (12) at $(x^0,y^0,s^0)$, where $\eta(x^0,y^0,s^0,\mu^0)\le\eta$. We first take care of the trivial case that $N=\emptyset$.

Lemma 5.1 If $N=\emptyset$, then the solution to (12) is $\delta_s=-s$ and $\delta_x=0$. That is, any feasible solution to (4) is an optimal solution.

Proof. If $N=\emptyset$, we have $-s\in\mathcal{R}(A^T)$. Thus, the minimal value of
\[
\min\ \|D^{1/2}(\delta_s+s)\|\quad\text{subject to}\quad \delta_s=-A^T\delta_y,\ \text{or}\ \delta_s\in\mathcal{R}(A^T),
\]
is $0$, i.e., $\delta_s=-s$. Then, from (12) we have $\delta_x=0$.

Therefore, we may assume that the minimal value above is strictly greater than $0$. Let
\[
\epsilon:=\frac{1}{\sqrt{\mu^0}}\|D^{1/2}(\delta_s+s)\|=\frac{1}{\sqrt{\mu^0}}\|D^{-1/2}\delta_x\| \tag{18}
\]
and
\[
\bar\alpha=\max\Big\{0,\ 1-\frac{\sqrt{n}\,\epsilon}{\eta}\Big\}. \tag{19}
\]
Now take a step along the directions of (12); we compute a new feasible iterate
\[
\bar x=x^0+\bar\alpha\delta_x,\quad \bar y=y^0+\bar\alpha\delta_y,\quad \bar s=s^0+\bar\alpha\delta_s.
\]
If $\bar\alpha<1$, we have the centrality lemma below.
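As an illustration only (all data and names below are made up), the step-size rule (19), with $\eta=0.2$, can be sketched together with a numerical check of the componentwise identity behind the centrality analysis: if $X\delta_s+S\delta_x=-XSe$, then $x(\alpha)_j s(\alpha)_j=(1-\alpha)x_js_j+\alpha^2(\delta_x)_j(\delta_s)_j$.

```python
from math import sqrt

# Illustration only: predictor step size (19) with eta = 0.2, plus a check of
# x(a)_j s(a)_j = (1-a) x_j s_j + a^2 dx_j ds_j whenever X ds + S dx = -XSe.
# The vectors below are made up for the demonstration.

def step_size(n, eps, eta=0.2):
    """alpha_bar = max{0, 1 - sqrt(n) * eps / eta}, cf. (19)."""
    return max(0.0, 1.0 - sqrt(n) * eps / eta)

x = [0.5, 2.0, 1.5]
s = [1.2, 0.3, 0.8]
dx = [0.1, -0.4, 0.2]
# choose ds so that x_j ds_j + s_j dx_j = -x_j s_j holds componentwise:
ds = [(-xj * sj - sj * dxj) / xj for xj, sj, dxj in zip(x, s, dx)]

a = step_size(n=4, eps=0.05)  # 1 - 2*0.05/0.2 = 0.5
for xj, sj, dxj, dsj in zip(x, s, dx, ds):
    lhs = (xj + a * dxj) * (sj + a * dsj)
    rhs = (1 - a) * xj * sj + a * a * dxj * dsj
    assert abs(lhs - rhs) < 1e-12
```

With $\eta=0.2$, any $\epsilon\ge\eta/\sqrt{n}$ forces $\bar\alpha=0$; that degenerate case is handled separately later in the text.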

Lemma 5.2 Let $\bar\alpha<1$ in (19). Then the new iterate $(\bar x,\bar y,\bar s)$ is strictly feasible. Furthermore,
\[
\eta(\bar x,\bar y,\bar s,\mu^0(1-\bar\alpha))\le 2\eta.
\]

Proof. The lemma is clearly true if $\bar\alpha=0$. Thus, we assume that $\bar\alpha>0$, that is,
\[
\frac{\sqrt{n}\,\epsilon}{\eta}<1\quad\text{and}\quad \bar\alpha=1-\frac{\sqrt{n}\,\epsilon}{\eta}>0.
\]
Let $\alpha\in(0,\bar\alpha]$ and let
\[
x(\alpha)=x^0+\alpha\delta_x,\quad y(\alpha)=y^0+\alpha\delta_y,\quad s(\alpha)=s^0+\alpha\delta_s.
\]
Noting that (12) implies that
\[
X\delta_s+S\delta_x=-XSe,
\]
we have
\[
\begin{aligned}
\mu^0(1-\alpha)\,\eta(x(\alpha),y(\alpha),s(\alpha),\mu^0(1-\alpha))
&=\|S(\alpha)X(\alpha)e-\mu^0(1-\alpha)e\|\\
&=\|(1-\alpha)(SXe-\mu^0e)+\alpha^2\,\delta_s\circ\delta_x\|\\
&\le(1-\alpha)\|SXe-\mu^0e\|+\alpha^2\|\delta_s\circ\delta_x\|\\
&\le(1-\alpha)\eta\mu^0+\|(D^{1/2}\delta_s)\circ(D^{-1/2}\delta_x)\| &&(\text{since }\alpha\le 1)\\
&\le(1-\alpha)\eta\mu^0+\|D^{1/2}\delta_s\|\,\|D^{-1/2}\delta_x\|\\
&\le(1-\alpha)\eta\mu^0+\sqrt{n\mu^0}\,\|D^{-1/2}\delta_x\|\\
&=(1-\alpha)\eta\mu^0+\frac{\sqrt{n}\,\epsilon}{\eta}\,\eta\mu^0\\
&=(1-\alpha)\eta\mu^0+(1-\bar\alpha)\eta\mu^0\\
&\le 2\eta(1-\alpha)\mu^0,
\end{aligned}
\]
where $\circ$ denotes the componentwise product.

Now, we argue the feasibility of $(x(\alpha),y(\alpha),s(\alpha))$. Observe that the proximity measure $\|S(\alpha)X(\alpha)e/(\mu^0(1-\alpha))-e\|$ is a continuous function of $\alpha$ as $\alpha$ ranges over $(0,\bar\alpha]$. We have just proved that the proximity measure is bounded by $2\eta=0.4$ for all $\alpha$ in this range; in particular, this proximity is strictly less than $1$. This means that no $x(\alpha)_i$ or $s(\alpha)_i$ for $\alpha$ in this range can be equal to $0$, since $x(0)=x^0>0$ and $s(0)=s^0>0$. By continuity, this implies that $s(\alpha)>0$ and $x(\alpha)>0$ for all $\alpha\in(0,\bar\alpha]$.

The next lemma identifies a variable in $N$ if $\epsilon>0$.

Lemma 5.3 If $\epsilon>0$, then there must be a variable, indexed by $j$, such that $j\in N$ and the central-path value satisfies
\[
s(\mu)_j\ge\frac{\sqrt{1-\eta}\,(1-\theta)\mu^0}{2n^{2.5}}\,\epsilon
\]
for all $\mu\in(0,\mu^0]$.

Proof. Let $s^*$ be any optimal dual slack vector for (5). Since $s^*-s\in\mathcal{R}(A^T)$, we must have
\[
\sqrt{\mu^0}\,\epsilon=\|D^{1/2}(\delta_s+s)\|\le\|D^{1/2}s^*\|
\]

\[
\|D^{1/2}s^*\|=\|(XS)^{-1/2}Xs^*\|\le\frac{\|Xs^*\|}{\sqrt{\mu^0(1-\eta)}}\le\frac{\|X\|\,\|s^*\|}{\sqrt{\mu^0(1-\eta)}}\le\frac{n}{1-\theta}\cdot\frac{\|s^*\|}{\sqrt{\mu^0(1-\eta)}}.
\]
Thus,
\[
\|s^*\|\ge\frac{\sqrt{1-\eta}\,(1-\theta)\mu^0}{n}\,\epsilon.
\]
Since $s^*_j=0$ for every $j\in B$, there is a variable, indexed by $j$, such that $s^*_j\ge\|s^*\|/\sqrt{n}>0$, so that $j\in N$; and, from Lemma 3.1,
\[
s(\mu)_j\ge\frac{s^*_j}{2n}\ge\frac{\sqrt{1-\eta}\,(1-\theta)\mu^0}{2n^{2.5}}\,\epsilon
\]
for all $\mu\in(0,\mu^0]$.

Now consider two cases:
\[
\frac{\sqrt{n}\,\epsilon}{\eta}\ge 1 \tag{20}
\]
and
\[
\frac{\sqrt{n}\,\epsilon}{\eta}<1. \tag{21}
\]
In Case (20), we have $\bar\alpha=0$ and $\epsilon\ge\eta/\sqrt{n}$, so that
\[
s(\mu)_j\ge\frac{\sqrt{1-\eta}\,\eta(1-\theta)\mu^0}{2n^3},
\]
where index $j\in N$ is the one singled out in Lemma 5.3. In this case, we continue to apply the predictor-corrector path-following algorithm, reducing $\mu$ from $\mu^0$. Thus, as soon as
\[
\mu\le\frac{\sqrt{1-\eta}\,\eta(1-\theta)}{8n^4g}\,\mu^0,
\]
we have
\[
s(\mu)_j\ge\frac{\sqrt{1-\eta}\,\eta(1-\theta)\mu^0}{2n^3}\ge 4n\mu g,
\]
and, at any given approximate central-path point $(x,y,s)$ such that $\eta(x,y,s,\mu)\le\eta$,
\[
s_j\ge\frac{4}{5}\,s(\mu)_j\ge\frac{16n\mu g}{5}\ge\frac{8n\mu g}{3}.
\]
That is, $j\in J_3(\mu)$ and it can now be eliminated.

In Case (21), we must have
\[
\bar\alpha=1-\frac{\sqrt{n}\,\epsilon}{\eta}\quad\text{and}\quad s(\mu)_j\ge\frac{\sqrt{1-\eta}\,\eta(1-\theta)(1-\bar\alpha)\mu^0}{2n^3},
\]
where again index $j$ is the one singled out in Lemma 5.3. Note that the first predictor step has reduced $\mu^0$ to $(1-\bar\alpha)\mu^0$. Then, we continue to apply the predictor-corrector algorithm, reducing $\mu$ from $(1-\bar\alpha)\mu^0$. As soon as
\[
\mu\le\frac{\sqrt{1-\eta}\,\eta(1-\theta)}{8n^4g}\,(1-\bar\alpha)\mu^0,
\]
we have again $s(\mu)_j\ge 4n\mu g$, and, at any given approximate central-path point $(x,y,s)$ such that $\eta(x,y,s,\mu)\le\eta$,
\[
s_j\ge\frac{4}{5}\,s(\mu)_j\ge\frac{16n\mu g}{5}\ge\frac{8n\mu g}{3}.
\]
That is, $j\in J_3(\mu)$ and it can now be eliminated.

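The $\mu$-thresholds above determine the work of one major eliminating step. The following back-of-envelope sketch is an illustration only: it assumes the standard predictor-corrector behavior that each iteration cuts $\mu$ by a factor $(1-c/\sqrt{n})$, where the constant $c$ and all names are assumptions, not values from the paper.

```python
from math import sqrt, log, ceil

# Back-of-envelope illustration (not from the paper): per major step, mu must
# fall below roughly sqrt(1-eta)*eta*(1-theta)/(8 n^4 g) times its starting
# value, and each predictor-corrector iteration is assumed to shrink mu by a
# factor (1 - c/sqrt(n)); c and the function names are assumptions.

def gap_factor(n, theta, eta=0.2):
    return 10 * n**2 * (1 + eta) / ((1 - theta) * (1 - eta))

def iterations_per_step(n, theta, eta=0.2, c=0.25):
    g = gap_factor(n, theta, eta)
    shrink = sqrt(1 - eta) * eta * (1 - theta) / (8 * n**4 * g)
    # smallest k with (1 - c/sqrt(n))^k <= shrink:
    return ceil(log(shrink) / log(1 - c / sqrt(n)))
```

Since $\log(1/\text{shrink})=O(\log n+\log\frac{1}{1-\theta})$, the count grows like $\sqrt{n}\,(\log\frac{1}{1-\theta}+\log n)$, matching the per-step bound summarized in the text.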
Note that, in both cases, the number of predictor-corrector algorithm iterations needed to reduce $\mu$ below the specified value, so that $J_3(\mu)$ contains at least one variable in $N$ and that variable can be eliminated from the MDP, is at most $O(\sqrt{n}(\log\frac{1}{1-\theta}+\log n))$. This major step can be repeated, and the number of such steps is no more than $|N|\le n$ (until finally $N=\emptyset$ and the algorithm terminates by Lemma 5.1). To summarize:

Theorem 5.1 The combinatorial interior-point algorithm generates an optimal solution of (4) in at most $n$ major eliminating steps, and each step uses $O(\sqrt{n}(\log\frac{1}{1-\theta}+\log n))$ predictor-corrector interior-point algorithm iterations.

Using the Karmarkar rank-one updating scheme, the average number of arithmetic operations per predictor-corrector interior-point iteration is $O(n^{2.5})$. Thus,

Theorem 5.2 The combinatorial interior-point algorithm generates an optimal solution of (4) in at most $O(n^4(\log\frac{1}{1-\theta}+\log n))$ arithmetic operations.

A similar proof applies to the MDP with $k$ actions for each state, where we have

Corollary 5.1 The combinatorial interior-point algorithm generates an optimal solution of the MDP in at most $(k-1)n$ major eliminating steps, and each step uses $O(\sqrt{nk}(\log\frac{1}{1-\theta}+\log n+\log k))$ predictor-corrector interior-point algorithm iterations, where $n$ is the number of states and $k$ is the number of actions for each state. The total number of arithmetic operations to solve the MDP is bounded by $O((nk)^4(\log\frac{1}{1-\theta}+\log n+\log k))$.

Acknowledgments. This research is supported by NSF grant DMS-0306611. I thank Kahn Mason, Ben Van Roy and Pete Veinott for many insightful discussions on this subject.

References

[1] D. A. Bayer and J. C. Lagarias, The nonlinear geometry of linear programming. II. Legendre transform coordinates and central trajectories, Transactions of the AMS 314 (1989), 527–581.
[2] L. Blum, F. Cucker, M. Shub, and S. Smale, Complexity and Real Computation, Springer-Verlag, 1996.
[3] G. B. Dantzig, Optimal solutions of a dynamic Leontief model with substitution, Econometrica 23 (1955), 295–302.
[4] G. B. Dantzig, A proof of a property of Leontief and Markov matrices, Tech. Report RR-25, Operations Research Center, UC Berkeley, CA, 1962.
[5] G. de Ghellinck, Les problèmes de décisions séquentielles, Cahiers du Centre d'Études de Recherche Opérationnelle 2 (1960), 161–179.
[6] F. d'Epenoux, A probabilistic production and inventory problem, Management Science 10 (1963), 98–108; translation of an article published in Revue Française de Recherche Opérationnelle 14 (1960).
[7] R. M. Freund and J. R. Vera, Some characterizations and properties of the distance to ill-posedness and the condition measure of a conic linear system, Mathematical Programming 86 (1999), 225–260.
[8] C. C. Gonzaga, Path-following methods for linear programming, SIAM Review 34 (1992), 167–224.
[9] J. C. K. Ho and L. Tunçel, Reconciliation of various complexity and condition measures for linear programming problems and a generalization of Tardos' theorem, Foundations of Computational Mathematics: Proceedings of the Smalefest, World Scientific, 2002, pp. 93–147.
[10] N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica 4 (1984), 373–395.
[11] L. G. Khachiyan, A polynomial algorithm in linear programming, Dokl. Akad. Nauk SSSR 244 (1979), 1093–1096; translated in Soviet Math. Dokl. 20, 191–194.
[12] M. Kojima, S. Mizuno, and A. Yoshise, A polynomial-time algorithm for a class of linear complementarity problems, Mathematical Programming 44 (1989), 1–26.
[13] M. L. Littman, T. L. Dean, and L. P. Kaelbling, On the complexity of solving Markov decision problems, Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI'95), 1995, pp. 394–402.
[14] A. S. Manne, Linear programming and sequential decisions, Management Science 6 (1960), 259–267.
[15] Y. Mansour and S. Singh, On the complexity of policy iteration, Proceedings of the 15th International Conference on Uncertainty in AI, 1999, pp. 401–408.
[16] N. Megiddo, Pathways to the optimal set in linear programming, Progress in Mathematical Programming: Interior Point and Related Methods (N. Megiddo, ed.), Springer, New York, 1989, pp. 131–158.

[17] N. Megiddo, S. Mizuno, and T. Tsuchiya, A modified layered-step interior-point algorithm for linear programming, Mathematical Programming 82 (1998), 339–356.
[18] S. Mizuno, M. J. Todd, and Y. Ye, On adaptive-step primal-dual interior-point algorithms for linear programming, Mathematics of Operations Research 18 (1993), 964–981.
[19] R. C. Monteiro and I. Adler, Interior path following primal-dual algorithms. Part I: Linear programming, Mathematical Programming 44 (1989), 27–41.
[20] R. C. Monteiro and T. Tsuchiya, A variant of the Vavasis–Ye layered-step interior-point algorithm for linear programming, SIAM J. Optimization 13 (2003), 1054–1079.
[21] R. C. Monteiro and T. Tsuchiya, A new iteration-complexity bound for the MTY predictor-corrector algorithm, Research Memorandum No. 856, The Institute of Statistical Mathematics, Tokyo, Japan, 2002 (to appear in SIAM J. Optimization).
[22] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Prentice Hall, Englewood Cliffs, New Jersey, 1982.
[23] J. Renegar, A polynomial-time algorithm based on Newton's method for linear programming, Mathematical Programming 40 (1988), 59–94.
[24] J. Renegar, Linear programming, complexity theory, and elementary functional analysis, Mathematical Programming 70 (1995), 279–351.
[25] B. Van Roy, Learning and value function approximation in complex decision processes, Ph.D. Thesis, MIT, 1998.
[26] S. Vavasis and Y. Ye, A primal-dual interior-point method whose running time depends only on the constraint matrix, Mathematical Programming 74 (1996), 79–120.
[27] A. Veinott, Extreme points of Leontief substitution systems, Linear Algebra and Its Applications 1 (1968), 181–194.
[28] Y. Ye, Interior-Point Algorithms: Theory and Analysis, Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons, 1997.
[29] Y. Ye, On the finite convergence of interior-point algorithms for linear programming, Mathematical Programming 57 (1992), 325–335.
[30] Y. Ye, A build-down scheme for linear programming, Mathematical Programming 46 (1990), 61–72.