The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
Yinyu Ye

April 20, 2010; revised November 30, 2010

(Department of Management Science and Engineering, Stanford University, Stanford, CA 94305; E-mail: yinyu-ye@stanford.edu. This researcher is supported in part by NSF Grant GOALI and AFOSR Grant FA.)

Abstract

We prove that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule, is a strongly polynomial-time algorithm for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy-iteration method (including the Simplex method) is superior to that of the only known strongly polynomial-time interior-point algorithm ([28] 2005) for solving this problem. The result is surprising, since the Simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, the Simplex (or simple policy-iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates, and the policy-iteration method was recently shown to be exponential for solving an undiscounted MDP. We also extend the result to solving MDPs with sub-stochastic and transient state transition probability matrices.

1 Introduction of the Markov decision problem and its linear programming formulation

Markov decision problems (MDPs), named after Andrey Markov, provide a mathematical framework for modeling decision-making in situations where outcomes are partly random
and partly under the control of a decision maker. The MDP is one of the most fundamental models for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. Today, it has been used in a variety of areas, including management, economics, bioinformatics, electronic commerce, social networking, and supply chains.

More precisely, an MDP is a discrete-time stochastic control process. At each time step, the process is in some state $i$, and the decision maker may choose any action, say action $j$, that is available in state $i$. The process responds at the next time step by randomly moving into a new state $i'$ and giving the decision maker a corresponding immediate cost $c_j(i, i')$. Let $m$ denote the total number of states. The probability that the process enters $i'$ as its new state is influenced by the chosen state-action $j$. Specifically, it is given by a state transition probability distribution

$$p_j(i, i') \ge 0,\quad i' = 1, \dots, m,\qquad \text{and}\qquad \sum_{i'=1}^{m} p_j(i, i') = 1,\quad i = 1, \dots, m.$$

Thus, the next state $i'$ depends on the current state $i$ and the decision maker's chosen state-action $j$, but is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP possess the Markov property.

The key decision of MDPs is to find a (stationary) policy for the decision maker: a set function $\pi = \{\pi_1, \pi_2, \dots, \pi_m\}$ that specifies the action $\pi_i$ that the decision maker will choose when in state $i$, for $i = 1, \dots, m$. The goal of the problem is to find a (stationary) policy $\pi$ that will minimize some cumulative function of the random costs, typically the expected discounted sum over an infinite horizon:

$$\sum_{t=0}^{\infty} \gamma^t\, c_{\pi_{i_t}}(i_t, i_{t+1}),$$

where $c_{\pi_{i_t}}(i_t, i_{t+1})$ represents the cost, at time $t$, incurred by an individual who is in state $i_t$ and takes action $\pi_{i_t}$. Here $\gamma$ is the discount rate, where $\gamma \ge 0$ and is assumed to be strictly less than 1 in this paper. This MDP problem is called the infinite-horizon discounted Markov decision problem (DMDP), which serves as the core model for MDPs.
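To make the discounted objective concrete, here is a small simulation sketch (a toy two-state instance of our own, not from the paper) that estimates the expected discounted sum above for a fixed policy by Monte Carlo:

```python
import random

# Hypothetical DMDP under a fixed policy: m = 2 states.
# P[i] is the transition distribution from state i under the policy's action;
# C[i][i2] is the immediate cost c_{pi_i}(i, i2); gamma < 1 is the discount rate.
P = [[0.9, 0.1], [0.2, 0.8]]
C = [[1.0, 4.0], [2.0, 0.5]]
gamma = 0.9

def discounted_cost(start, horizon=2000, seed=0):
    """One-run Monte Carlo estimate of sum_t gamma^t c(i_t, i_{t+1})."""
    rng = random.Random(seed)
    i, total, disc = start, 0.0, 1.0
    for _ in range(horizon):
        i2 = rng.choices([0, 1], weights=P[i])[0]  # next state ~ p(i, .)
        total += disc * C[i][i2]
        disc *= gamma
        i = i2
    return total

print(discounted_cost(0))
```

Since every one-step cost here lies in $[0.5, 4]$, any run's discounted total lies between roughly $0.5/(1-\gamma)$ and $4/(1-\gamma)$; averaging many runs approximates the cost-to-go value of the start state.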
Because of the Markov property, there is an optimal stationary policy, or policy for short, for the DMDP, so that it can indeed be written as a function of $i$ only; that is, $\pi$ is independent of time $t$ as described above.

Let $k_i$ be the number of state-actions available in state $i$, $i = 1, \dots, m$, and let

$$A_1 = \{1, 2, \dots, k_1\},\quad A_2 = \{k_1 + 1,\, k_1 + 2,\, \dots,\, k_1 + k_2\},\ \dots$$
or, for $i = 1, 2, \dots, m$ in general,

$$A_i := \left\{ \sum_{s=1}^{i-1} k_s + 1,\ \sum_{s=1}^{i-1} k_s + 2,\ \dots,\ \sum_{s=1}^{i} k_s \right\}.$$

Moreover, let $n = \sum_{i=1}^{m} k_i$, and let the total $n$ state-actions be ordered such that if $j \in A_i$, then state-action $j$ is controlled by state $i$. Note that the cardinality $|A_i| = k_i$.

Suppose we know the state transition probabilities $P$ and the cost function $c$, and we wish to calculate the policy that minimizes the expected discounted cost. Then a policy $\pi$ would be associated with another array indexed by state, the value vector $v \in \mathbb{R}^m$, which contains the cost-to-go values of all states. Furthermore, an optimal policy, $(v^*, \pi^*)$, is a fixed point of the following minimum-cost operator:

$$\pi^*_i := \arg\min_{j \in A_i} \Big\{ \sum_{i'} p_j(i, i')\big(c_j(i, i') + \gamma v^*_{i'}\big) \Big\};\qquad v^*_i := \sum_{i'} p_{\pi^*_i}(i, i')\big(c_{\pi^*_i}(i, i') + \gamma v^*_{i'}\big),\quad i = 1, \dots, m. \quad (1)$$

Let $P_\pi \in \mathbb{R}^{m \times m}$ be the column stochastic matrix corresponding to a policy $\pi$; that is, the $i$th column of $P_\pi$ is the probability distribution $p_{\pi_i}(i, i')$, $i' = 1, \dots, m$. Then the equilibrium condition of (1) can be represented in matrix form:

$$(I - \gamma P_{\pi^*}^T)\, v^* = c_{\pi^*},\qquad (I - \gamma P_{\pi}^T)\, v^* \le c_{\pi},\ \forall \pi, \quad (2)$$

where the $i$th entry of the column vector $c_\pi \in \mathbb{R}^m$ equals $\sum_{i'} p_{\pi_i}(i, i')\, c_{\pi_i}(i, i')$.

Due to d'Epenoux [9] (also see Manne [17] and de Ghellinck [8]), the infinite-horizon discounted MDP can be formulated as a primal linear programming (LP) problem in the standard form

$$\text{minimize } c^T x \quad \text{subject to } Ax = b,\ x \ge 0, \quad (3)$$

with the dual

$$\text{maximize } b^T y \quad \text{subject to } s = c - A^T y \ge 0, \quad (4)$$

where $A \in \mathbb{R}^{m \times n}$ is a given real matrix with rank $m$, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ are given real vectors, $0$ denotes the vector of all zeros, and $x \in \mathbb{R}^n$ and $(y \in \mathbb{R}^m,\ s \in \mathbb{R}^n)$ are the unknown primal and dual variables, respectively. The vector $s$ is often called the dual slack vector. In what follows, LP stands for any of the following: linear program, linear programs, or linear programming, depending on the context. More precisely, the DMDP can be represented by the LP problems (3) and (4) with the following assignments of $(A, b, c)$.
First, the $i$th entry of the column vector $b \in \mathbb{R}^m$ is 1 for all $i$, representing an initial population of one individual in state $i$. Secondly, the $j$th entry of
the column vector $c \in \mathbb{R}^n$ is the (expected) one-time unit cost of action $j$ taken by a state. In particular, if $j \in A_i$, then action $j$ is controlled by state $i$ and

$$c_j = \sum_{i'} p_j(i, i')\, c_j(i, i'). \quad (5)$$

The LP constraint matrix has the form

$$A = E - \gamma P \in \mathbb{R}^{m \times n}. \quad (6)$$

The $j$th column of $P$ is the state transition probability distribution when the $j$th action is taken by a state. More precisely, for each state-action $j \in A_i$, that is, each action $j$ controlled by state $i$,

$$P_{i'j} = p_j(i, i'),\quad i' = 1, \dots, m. \quad (7)$$

Finally, the $i$th element of the $j$th column of $E$ is 1 if action $j$ is controlled by state $i$, and zero everywhere else:

$$E_{ij} = \begin{cases} 1, & \text{if } j \in A_i, \\ 0, & \text{otherwise}, \end{cases}\qquad i = 1, \dots, m,\ j = 1, \dots, n. \quad (8)$$

Let $e$ be the vector of all ones, whose dimension depends on the context. Then we have $b = e$, $e^T P = e^T$ (that is, $P$ is a column stochastic matrix), $e^T E = e^T$, and $e^T A = (1 - \gamma)\, e^T$.

The interpretations of the quantities defining the DMDP primal (3) and the DMDP dual (4) are as follows. $b = e$ means that there is one unit of initial individuals in each state $i$. The $j$th entry, for $j \in A_i$, of the primal variable $x \in \mathbb{R}^n$ is the state-action frequency of action $j$: the expected present value of the number of times an individual is in state $i$ and takes state-action $j$. Thus, solving the DMDP primal entails choosing state-action frequencies that minimize the expected present value, $c^T x$, of the total cost subject to the conservation law $Ax = e$. The conservation law ensures that, for each state $i$, the expected present value of the number of individuals entering state $i$ equals the expected present value of the number of individuals leaving $i$. The DMDP dual variables $y \in \mathbb{R}^m$ exactly represent the expected present cost-to-go values of the $m$ states. Solving the dual entails choosing dual variables $y$, one for each state $i$, together with slack variables $s \in \mathbb{R}^n$, one for each state-action $j$, that maximize $e^T y$ subject to $A^T y + s = c$, $s \ge 0$, or simply $A^T y \le c$.
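The assignments (5)-(8) can be assembled mechanically. The sketch below is our own illustration on a hypothetical instance with $m = 2$ states and $n = 3$ state-actions ($A_1 = \{0, 1\}$, $A_2 = \{2\}$ in zero-based indexing; the names `owner` and `P_cols` are ours, not the paper's); it builds $(A, b, c)$ and checks the identity $e^T A = (1 - \gamma)e^T$ stated above:

```python
# Build the DMDP LP data (A, b, c) of (3)-(8) for a hypothetical instance.
# owner[j] is the state controlling action j; column j of P is P_cols[j];
# cost[j][i2] is the immediate cost c_j(i, i2) per next state i2.
gamma = 0.9
m, n = 2, 3
owner = [0, 0, 1]
P_cols = [[0.5, 0.5], [1.0, 0.0], [0.1, 0.9]]
cost = [[2.0, 1.0], [0.0, 5.0], [1.0, 1.0]]

E = [[1.0 if owner[j] == i else 0.0 for j in range(n)] for i in range(m)]
A = [[E[i][j] - gamma * P_cols[j][i] for j in range(n)] for i in range(m)]
b = [1.0] * m                                             # b = e
c = [sum(P_cols[j][i2] * cost[j][i2] for i2 in range(m))  # expected cost (5)
     for j in range(n)]

# Sanity check from the text: e^T A = (1 - gamma) e^T, since e^T E = e^T P = e^T.
col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]
print(c, col_sums)
```

Every column of $A$ sums to $1 - \gamma$, which is what makes the feasible set bounded (Lemma 1 below).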
It is well known that there exist unique optimal $y^*$ and $s^*$ where, for each state $i$, $y^*_i$ is the minimum expected present cost that an individual in state $i$ and its progeny can incur. A policy $\pi$ of the original DMDP, containing exactly one action in $A_i$ for each state $i$, corresponds to the basic variable indexes of a basic feasible solution (BFS) of the DMDP primal LP formulation. Obviously, we have a total of $\prod_{i=1}^{m} k_i$ different policies. Let the matrix $A_\pi \in \mathbb{R}^{m \times m}$ (resp. $P_\pi$, $E_\pi$) consist of the columns of $A$ (resp. $P$, $E$) with indexes in $\pi$. Then for a policy $\pi$, $E_\pi = I$ (where $I$ is the identity matrix), so that $A_\pi$ has the
Leontief substitution form $A_\pi = I - \gamma P_\pi$. It is also well known that $A_\pi$ is nonsingular, has a nonnegative inverse, and is a feasible basis for the DMDP primal. Let $x^\pi$ be the BFS for a policy $\pi$ in the DMDP primal form, and let $\nu$ contain the rest of the indexes not in $\pi$. Let $x_\pi$ and $x_\nu$ be the sub-vectors of $x$ whose indexes are respectively in $\pi$ and $\nu$. Then the nonbasic variables $x^\pi_\nu = 0$, and the basic variables $x^\pi_\pi$ are the unique solution to $A_\pi x_\pi = e$. The corresponding basic solution of the dual is the vector $y^\pi$ that is the unique solution to $A_\pi^T y = c_\pi$. The basic solution $y^\pi$ of the dual is feasible if also $A_\nu^T y^\pi \le c_\nu$, or $s_\nu \ge 0$. The basic solution pair $x^\pi$ and $y^\pi$ of the DMDP primal and dual is optimal if and only if both are feasible. If policy $\pi$ produces optimal $x^\pi$ and $y^\pi$, then $\pi$ is an optimal policy $\pi^*$ and $y^\pi$ is exactly $v^*$. Note that the constraints $A_\pi^T y^\pi = c_\pi$ and $A^T y^\pi \le c$ describe the same condition as (2) for $v^*$, for each policy $\pi$ or for each state-action $j$.

2 The Markov decision problem methods and their complexities

There are several major events in the development of methods for solving DMDPs. Bellman (1957) [1] developed a successive approximation method, called value-iteration, which computes the optimal total cost function assuming first a one-stage finite horizon, then a two-stage finite horizon, and so on. The total cost functions so computed are guaranteed to converge in the limit to the optimal total cost function. It should be noted that, even prior to Bellman, Shapley (1953) [23] used value-iteration to solve DMDPs in the context of zero-sum two-person stochastic games.

The other best-known method is due to Howard (1960) [12] and is known as policy-iteration; it generates an optimal policy in a finite number of iterations. Policy-iteration alternates between a value-determination phase, in which the current policy is evaluated, and a policy-improvement phase, in which an attempt is made to improve the current policy.
In the policy-improvement phase, the policy-iteration method updates possibly improved actions for every state in one iteration. If the current policy is improved for at most one state in one iteration, then the method is called simple policy-iteration. We will come back to the policy-iteration and simple policy-iteration methods later in terms of the LP formulation.

Since it was discovered in 1960 that the DMDP has an LP formulation, the Simplex method of Dantzig (1947) [5] can be used to solve DMDPs. It turns out that the Simplex method, when applied to solving DMDPs, is the simple policy-iteration method. Other general LP methods, such as the Ellipsoid method and interior-point algorithms, are also capable of solving DMDPs.

As the notion of computational complexity emerged, there were tremendous efforts in analyzing the complexity of the MDP and its solution methods. On the positive side, since
it (with or without discount) can be formulated as a linear program, the MDP can be solved in polynomial time by either the Ellipsoid method (e.g., Khachiyan (1979) [14]) or an interior-point algorithm (e.g., Karmarkar (1984) [13]). Here, polynomial time means that the number of arithmetic operations needed to compute an optimal policy is bounded by a polynomial in the numbers of states and actions and the bit-size of the input data, which are assumed to be rational numbers. Papadimitriou and Tsitsiklis [20] then showed in 1987 that an MDP with deterministic transitions (i.e., each entry of the state transition probability matrices is either 0 or 1) can be solved in strongly polynomial time (i.e., the number of arithmetic operations is bounded by a polynomial in the numbers of states and actions only) as a Minimum-Mean-Cost-Cycle problem. Erickson [7] in 1988 showed that successive approximations suffice, in strongly polynomial time, to either (1) produce an optimal stationary halting policy, or (2) show that no such policy exists, based on the work of Eaves and Veinott [6] and Rothblum [22].

There were also great research interest and progress in the value-iteration and policy-iteration methods for solving the DMDP. Bertsekas [2] in 1987 showed that the value-iteration method converges to the optimal policy in a finite number of iterations. Tseng [24] in 1990 showed that the value-iteration method generates an optimal policy in polynomial time for the DMDP when the discount rate is fixed. Puterman [21] in 1994 showed that the policy-iteration method converges no more slowly than the value-iteration method, so that it is also a polynomial-time algorithm for the DMDP with a fixed discount rate.
This fact was actually known to Veinott (and perhaps others) three decades earlier and used in dynamic programming courses he taught for a number of years, well before Mansour and Singh [18] in 1994 also gave an upper bound, $k^m/m$, on the number of iterations of the policy-iteration method for solving the DMDP when each state has $k$ actions. (Note that the total number of possible policies is $k^m$, so that the result is not much better than that of complete enumeration.) In 2005, Ye [28] developed a strongly polynomial-time combinatorial interior-point algorithm (CIPA) for the DMDP with a fixed discount rate; that is, the number of arithmetic operations is bounded by a polynomial in only the numbers of states and actions.

In terms of the worst-case complexity bound on the number of arithmetic operations, the current best results (within a constant factor) are summarized in the following table, when there are exactly $k$ actions in each of the $m$ states; see Littman et al. [16], Mansour and Singh [18], Ye [28], and references therein.

Value-Iteration: $\dfrac{m^2 k\, L(P,c,\gamma)\, \log(1/(1-\gamma))}{1-\gamma}$

Policy-Iteration: $\min\left\{ \dfrac{m^3 k\, k^m}{m},\ \dfrac{m^3 k\, L(P,c,\gamma)\, \log(1/(1-\gamma))}{1-\gamma} \right\}$

LP-Algorithms: $m^3 k^2\, L(P,c,\gamma)$

CIPA: $m^4 k^4 \log\dfrac{m}{1-\gamma}$

Here, $L(P, c, \gamma)$ is the total bit-size of the DMDP input data in the linear programming form, given that $(P, c, \gamma)$ have only rational entries. As one can see from the table, both the
value-iteration and policy-iteration methods are polynomial-time algorithms if the discount rate $0 \le \gamma < 1$ is fixed. But they are not strongly polynomial, where the running time needs to be a polynomial in $m$ and $k$ only. The proof of polynomial time for the value- and policy-iteration methods is essentially due to the argument that, when the gap between the objective value of the current policy (or BFS) and the optimal one is smaller than $2^{-L(P,c,\gamma)}$, the current policy must be optimal; e.g., see [11]. However, the proof of a strongly polynomial-time algorithm cannot rely on this argument, since $(P, c, \gamma)$ may have irrational entries, so that the bit-size of the data can be infinite.

In practice, the policy-iteration method, including the simple policy-iteration or Simplex method, has been remarkably successful and has proven to be a most effective and widely used method. The number of iterations is typically bounded by $O(km)$. It turns out that the policy-iteration method is actually the Simplex method with block pivots at each iteration; and the Simplex method also remains one of the very few extremely effective methods for solving general LPs; see Bixby [3].

In the past 50 years, many efforts have been made to resolve the worst-case complexity issue of the policy-iteration method and the Simplex method, and to answer the question: are the policy-iteration and Simplex methods strongly polynomial-time algorithms? Unfortunately, so far most results have been negative. Klee and Minty [15] showed in 1972 that the classic Simplex method, with Dantzig's original most-negative-reduced-cost pivoting rule, necessarily takes an exponential number of iterations to solve a carefully designed LP problem. Later, a similar negative result of Melekopoglou and Condon [19] showed that one simple policy-iteration method, where only the action for the state with the smallest index is updated, needs an exponential number of iterations to compute an optimal policy for a specific DMDP problem regardless of discount rates (i.e., even when $\gamma < 1$ is fixed).
Most recently, Fearnley (2010) [10] showed that the policy-iteration method needs an exponential number of iterations for an undiscounted but finite-horizon MDP. Thus, it seems impossible for the policy-iteration method to be a strongly polynomial-time algorithm for solving general MDPs. What about the DMDP with a fixed discount rate? Is there a pivoting rule that makes the Simplex and policy-iteration methods strongly polynomial for the DMDP?

In this paper, we prove that the classic Simplex method, or the simple policy-iteration method, with the most-negative-reduced-cost pivoting rule, is indeed a strongly polynomial-time algorithm for the DMDP with a fixed discount rate $0 \le \gamma < 1$. The number of its iterations is bounded by

$$\frac{m^2 (k-1)}{1-\gamma}\, \log\!\left(\frac{m^2}{1-\gamma}\right),$$

and each iteration uses at most $O(m^2 k)$ arithmetic operations. The result seems surprising, given the earlier negative results mentioned above. Since the policy-iteration method with the all-negative-reduced-cost pivoting rule is at
least as good as the simple policy-iteration method, we prove that it is also a strongly polynomial-time algorithm with the same iteration complexity bound. Therefore, the worst-case operation complexity, $O(m^4 k^2 \log m)$, of the Simplex method is actually superior to that, $O(m^4 k^4 \log m)$, of the combinatorial interior-point algorithm [28] for solving DMDPs when the discount rate is a fixed constant. If the number of actions varies among the states, our worst-case iteration bound would be

$$(n - m)\cdot \frac{m}{1-\gamma}\cdot \log\!\left(\frac{m^2}{1-\gamma}\right),$$

and each iteration uses at most $O(mn)$ arithmetic operations, where $n$ is again the total number of actions. One can see that the worst-case iteration complexity bound is linear in the total number of actions, as is observed in practice.

We remark that, if the discount rate is part of the input, it remains open whether or not the policy-iteration method is polynomial for the MDP, and whether or not there exists a strongly polynomial-time algorithm for the MDP or LP in general.

3 DMDP Properties and the Simplex and policy-iteration methods

We first describe a few general LP and DMDP theorems and the classic Simplex and policy-iteration methods. We will use the LP formulations (3) and (4) for the DMDP and the terminology presented in the Introduction. Recall that, for the DMDP, $b = e \in \mathbb{R}^m$, $A = E - \gamma P \in \mathbb{R}^{m \times n}$, and $c$, $P$ and $E$ are defined in (5), (7) and (8), respectively.

3.1 DMDP Properties

The optimality conditions for all optimal solutions of a general LP may be written as follows:

$$Ax = b,\qquad A^T y + s = c,\qquad s_j x_j = 0,\ j = 1, \dots, n,\qquad x \ge 0,\ s \ge 0,$$

where the third condition is often referred to as the complementarity condition.

Let $\pi$ be the index set of state-actions corresponding to a policy. Then, as we briefly mentioned earlier, $x^\pi$ is a BFS of the DMDP primal, and the basis $A_\pi$ has the form $A_\pi = (I - \gamma P_\pi)$, where $P_\pi$ is a column stochastic matrix, that is, $P_\pi \ge 0$ and $e^T P_\pi = e^T$. In fact, the converse is also true: the index set $\pi$ of basic variables of every BFS of the DMDP primal is a policy for the original DMDP.
In other words, $\pi$ must have exactly one variable or action index in $A_i$ for each state $i$. Thus, we have the following lemma.
Lemma 1 The DMDP primal linear programming formulation has the following properties:

1. There is a one-to-one correspondence between a (stationary) policy of the original DMDP and a basic feasible solution of the DMDP primal.

2. Let $x^\pi$ be a basic feasible solution of the DMDP primal. Then any basic variable, say $x^\pi_i$, has its value bounded by

$$1 \le x^\pi_i \le \frac{m}{1-\gamma}.$$

3. The feasible set of the DMDP primal is bounded. More precisely,

$$e^T x = \frac{m}{1-\gamma}$$

for every feasible $x \ge 0$.

Proof. Let $\pi$ be the basis set of any basic feasible solution for the DMDP primal. Then the first statement can be seen as follows. Consider the coefficients of the $i$th row of $A$. From the structure of (6), (7) and (8), we must have $a_{ij} \le 0$ for all $j \notin A_i$. Thus, if no basic variable is chosen from $A_i$, that is, $\pi \cap A_i = \emptyset$, then

$$1 = \sum_{j \in \pi} a_{ij}\, x^\pi_j = \sum_{j \in \pi,\, j \notin A_i} a_{ij}\, x^\pi_j \le 0,$$

which is a contradiction. Thus, each state must have a state-action in $\pi$. On the other hand, $|\pi| = m$. Therefore, $\pi$ must contain exactly one action index in $A_i$ for each state $i = 1, \dots, m$; that is, $\pi$ is a policy.

The last two statements of the lemma were given in [28], whose proofs were based on Dantzig [4, 5] and Veinott [26].

By the first statement of Lemma 1, in what follows we simply call the basis index set $\pi$ of any BFS of the DMDP primal a policy. For the basis $A_\pi = (I - \gamma P_\pi)$ of any policy $\pi$, the BFS $x^\pi$ and the corresponding dual basic solution can be computed as

$$x^\pi_\pi = (A_\pi)^{-1} e \ge e,\quad x^\pi_\nu = 0,\qquad y^\pi = (A_\pi^T)^{-1} c_\pi,\quad s^\pi_\pi = 0,\quad s^\pi_\nu = c_\nu - A_\nu^T (A_\pi^T)^{-1} c_\pi,$$

where again $\nu$ contains the rest of the action indexes not in $\pi$. Since $x^\pi$ and $s^\pi$ are already complementary, if $s^\pi_\nu \ge 0$, then $\pi$ would be an optimal policy.

We now present the following strict complementarity result for the DMDP.

Lemma 2 Let both linear programs (3) and (4) be feasible. Then there is a unique partition $P^* \subset \{1, 2, \dots, n\}$ and $O^* \subset \{1, 2, \dots, n\}$, with $P^* \cap O^* = \emptyset$ and $P^* \cup O^* = \{1, 2, \dots, n\}$, such that for every optimal solution pair $(x^*, s^*)$,

$$x^*_j = 0\ \ \forall j \in O^*,\qquad \text{and}\qquad s^*_j = 0\ \ \forall j \in P^*,$$
and there is at least one optimal solution pair $(x^*, s^*)$ that is strictly complementary,

$$x^*_j > 0\ \ \forall j \in P^*,\qquad \text{and}\qquad s^*_j > 0\ \ \forall j \in O^*,$$

for the DMDP linear program. In particular, every optimal policy $\pi^* \subset P^*$, so that $|P^*| \ge m$ and $|O^*| \le n - m$.

Proof. The strict complementarity result for general LP is well known, where we call $P^*$ the optimal (super) basic variable set and $O^*$ the optimal non-basic variable set. The cardinality result follows from the fact that there is always an optimal basic feasible solution, i.e., an optimal policy, whose basic variables (optimal state-action frequencies) are all strictly positive by Lemma 1, so that their indexes must all belong to $P^*$.

The interpretation of Lemma 2 is as follows: since there may exist multiple optimal policies $\pi^*$ for a DMDP, $P^*$ contains those state-actions each of which appears in at least one optimal policy, and $O^*$ contains the remaining state-actions, none of which appears in any optimal policy. Let us call each state-action in $O^*$ a non-optimal state-action, or simply a non-optimal action. Then, any DMDP has no more than $n - m$ non-optimal actions. Note that, although there may be multiple optimal policies for a DMDP, the optimal dual basic feasible solution $(y^*, s^*)$ is unique and invariant among the multiple optimal policies. Thus, if $j$ is a non-optimal action, its optimal dual slack value $s^*_j$ must be strictly greater than 0, and the converse is also true by the lemma.

3.2 The Simplex and policy-iteration methods

Let $\pi$ be a policy and let $\nu$ contain the remaining indexes of the non-basic variables. Then we can rewrite (3) as

$$\text{minimize } c_\pi^T x_\pi + c_\nu^T x_\nu \quad \text{subject to } A_\pi x_\pi + A_\nu x_\nu = e,\quad x = (x_\pi;\, x_\nu) \ge 0, \quad (9)$$

with its dual

$$\text{maximize } e^T y \quad \text{subject to } A_\pi^T y + s_\pi = c_\pi,\quad A_\nu^T y + s_\nu = c_\nu,\quad s = (s_\pi;\, s_\nu) \ge 0. \quad (10)$$

The (primal) Simplex method rewrites (9) into the equivalent problem

$$\text{minimize } (\bar c_\nu)^T x_\nu + c_\pi^T (A_\pi)^{-1} e \quad \text{subject to } A_\pi x_\pi + A_\nu x_\nu = e,\quad x = (x_\pi;\, x_\nu) \ge 0, \quad (11)$$
where $\bar c$ is called the reduced cost vector:

$$\bar c_\pi = 0\qquad \text{and}\qquad \bar c_\nu = c_\nu - A_\nu^T y^\pi,\quad y^\pi = (A_\pi^T)^{-1} c_\pi.$$

Note that the fixed quantity $c_\pi^T (A_\pi)^{-1} e = c^T x^\pi$ in the objective function of (11) is the objective value of the current policy $\pi$ for (9). In fact, problem (9) and its equivalent form (11) share exactly the same objective value for every feasible solution $x$.

The Simplex method. If $\bar c \ge 0$, the current policy is optimal. Otherwise, let $0 < \Delta = -\min(\bar c)$ with $j^+ = \arg\min(\bar c)$, that is, $-\bar c_{j^+} = \Delta > 0$. Then we must have $j^+ \notin \pi$, since $\bar c_j = 0$ for all $j \in \pi$. Let $j^+ \in A_i$, that is, let $j^+$ be a state-action controlled by state $i$. Then, the classic Simplex method (Dantzig 1947) takes $x_{j^+}$ as the incoming basic variable to replace the old one, $x_{\pi_i}$, and the method repeats with the new policy denoted by $\pi^+$, where $\pi_i \in A_i$ is replaced by $j^+ \in A_i$. The method breaks ties arbitrarily, and it updates exactly one state-action in one iteration; that is, it only updates the state with the most negative reduced cost. This is the classic Simplex, or simple policy-iteration, method that uses the most-negative-reduced-cost updating or pivoting rule.

The policy-iteration method. The original policy-iteration method (Howard 1960 [12]) updates every state that has a negative reduced cost. For each state $i$, let $\Delta_i = -\min_{j \in A_i}(\bar c_j)$ with $j_i^+ = \arg\min_{j \in A_i}(\bar c_j)$. Then, for every state $i$ such that $\Delta_i > 0$, let $j_i^+ \in A_i$ replace $\pi_i \in A_i$ in the current policy $\pi$. The method repeats with the new policy denoted by $\pi^+$, where possibly multiple $\pi_i \in A_i$ are replaced by $j_i^+ \in A_i$. The method also breaks ties in each state arbitrarily.

Therefore, both methods generate a sequence of policies, denoted by $\pi^0, \pi^1, \dots, \pi^t, \dots$, starting from any initial policy $\pi^0$. We comment that the Simplex and policy-iteration methods with the greedy or most-negative-reduced-cost updating rule are special versions of generic policy improvement.
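Both methods above can be sketched compactly. The code below is our own illustration on a tiny hypothetical instance (two states; actions 0 and 1 belong to state 0, action 2 to state 1; the helper names `owner`, `P_cols`, `solve`, and `reduced_costs` are ours). Policy evaluation solves $A_\pi^T y = c_\pi$ directly, and the two loops differ only in how many improvable states they swap per iteration:

```python
gamma = 0.9
m, n = 2, 3
owner = [0, 0, 1]                                   # owner[j]: state of action j
P_cols = [[0.5, 0.5], [1.0, 0.0], [0.1, 0.9]]       # column j of P is p_j(i, .)
c = [1.5, 0.0, 1.0]                                 # expected one-time costs (5)

def a_col(j):
    """Column j of A = E - gamma*P."""
    return [(1.0 if owner[j] == i else 0.0) - gamma * P_cols[j][i]
            for i in range(m)]

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    k = len(rhs)
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def reduced_costs(pi):
    """Dual y^pi from A_pi^T y = c_pi, then cbar_j = c_j - a_j^T y."""
    y = solve([a_col(pi[i]) for i in range(m)], [c[pi[i]] for i in range(m)])
    cbar = [c[j] - sum(aij * yi for aij, yi in zip(a_col(j), y))
            for j in range(n)]
    return cbar, y

def simplex(pi):
    """Simple policy-iteration: swap only the most-negative-reduced-cost action."""
    while True:
        cbar, y = reduced_costs(pi)
        j_plus = min(range(n), key=lambda j: cbar[j])
        if cbar[j_plus] >= -1e-12:
            return pi, y
        pi[owner[j_plus]] = j_plus                  # one pivot per iteration

def policy_iteration(pi):
    """Howard's rule: swap in the best action of *every* improvable state."""
    while True:
        cbar, y = reduced_costs(pi)
        best = [min((j for j in range(n) if owner[j] == i),
                    key=lambda j: cbar[j]) for i in range(m)]
        if all(cbar[best[i]] >= -1e-12 for i in range(m)):
            return pi, y
        pi = best

print(simplex([0, 2]), policy_iteration([0, 2]))
```

On this instance both loops terminate at the same optimal policy; the Simplex variant pivots once per iteration, while Howard's rule may change several states at once.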
In what follows, the most-negative-reduced-cost pivoting rule is used as the default for the Simplex and policy-iteration methods, unless otherwise stated.

4 Proof of strong polynomiality

We first prove our strongly polynomial-time result for the Simplex method. For the improvement of the new policy $\pi^+$ over any policy $\pi$ of the Simplex method, we have
Lemma 3 Let $z^*$ be the optimal objective value of (9). Then, in any iteration of the Simplex method from the current policy $\pi$ to the new policy $\pi^+$,

$$c^T x^{\pi^+} - z^* \le \left(1 - \frac{1-\gamma}{m}\right)\left(c^T x^{\pi} - z^*\right).$$

Moreover,

$$z^* \ge c^T x^{\pi} - \frac{m\,\Delta}{1-\gamma}.$$

Therefore, the Simplex method generates a sequence of policies $\pi^0, \pi^1, \dots, \pi^t, \dots$ such that

$$c^T x^{\pi^t} - z^* \le \left(1 - \frac{1-\gamma}{m}\right)^{t}\left(c^T x^{\pi^0} - z^*\right).$$

Proof. From problem (11), we see that the objective function value of any feasible $x$ satisfies

$$c^T x = c^T x^{\pi} + \bar c^T x \ge c^T x^{\pi} - \Delta\, e^T x = c^T x^{\pi} - \frac{m\,\Delta}{1-\gamma},$$

where the inequality follows from $\bar c \ge -\Delta e$, by the most-negative-reduced-cost pivoting rule adopted in the method, and the last equality is based on the third statement of Lemma 1. In particular, the optimal objective value satisfies

$$z^* = c^T x^* \ge c^T x^{\pi} - \frac{m\,\Delta}{1-\gamma},$$

which proves the first inequality of the lemma.

Since at the new policy $\pi^+$ the value of the new basic variable $x^{\pi^+}_{j^+}$ is greater than or equal to 1, by the second statement of Lemma 1, the objective value of the new policy for problem (11) is decreased by at least $\Delta$. Thus, for problem (9),

$$c^T x^{\pi} - c^T x^{\pi^+} = \Delta\, x^{\pi^+}_{j^+} \ge \Delta \ge \frac{1-\gamma}{m}\left(c^T x^{\pi} - z^*\right),$$

or

$$c^T x^{\pi^+} - z^* \le \left(1 - \frac{1-\gamma}{m}\right)\left(c^T x^{\pi} - z^*\right),$$

which proves the second inequality. Replacing $\pi$ by $\pi^t$ and using the above inequality for all $t = 0, 1, \dots$, we have

$$c^T x^{\pi^{t+1}} - z^* \le \left(1 - \frac{1-\gamma}{m}\right)\left(c^T x^{\pi^t} - z^*\right),$$

which leads to the third desired inequality by induction.

We now present the following key technical lemma.
Lemma 4

1. If a policy $\pi$ is not optimal, then there is a state-action $j \in \pi \cap O^*$ (i.e., a non-optimal state-action $j$ in the current policy) such that

$$s^*_j \ge \frac{1-\gamma}{m^2}\left(c^T x^{\pi} - z^*\right),$$

where $O^*$, together with $P^*$, is the strict complementarity partition stated in Lemma 2, and $s^*$ is the optimal dual slack vector of (10).

2. For any sequence of policies $\pi^0, \pi^1, \dots, \pi^t, \dots$ generated by the Simplex method where $\pi^0$ is not optimal, let $j^0 \in \pi^0 \cap O^*$ be the state-action index identified above in the initial policy $\pi^0$. Then, if $j^0 \in \pi^t$, we must have

$$x^{\pi^t}_{j^0} \le \frac{m^2}{1-\gamma}\cdot \frac{c^T x^{\pi^t} - z^*}{c^T x^{\pi^0} - z^*},\qquad t \ge 1.$$

Proof. Since all non-basic variables of $x^{\pi}$ have zero values,

$$c^T x^{\pi} - z^* = c^T x^{\pi} - e^T y^* = (s^*)^T x^{\pi} = \sum_{j \in \pi} s^*_j\, x^{\pi}_j.$$

Since the number of terms in the sum is $m$, there must be a state-action $j \in \pi$ such that

$$s^*_j\, x^{\pi}_j \ge \frac{1}{m}\left(c^T x^{\pi} - z^*\right).$$

Then, from Lemma 1, $x^{\pi}_j \le \frac{m}{1-\gamma}$, so that

$$s^*_j \ge \frac{1-\gamma}{m^2}\left(c^T x^{\pi} - z^*\right) > 0,$$

which also implies $j \in O^*$ by Lemma 2.

Now, suppose the initial policy $\pi^0$ is not optimal, and let $j^0 \in \pi^0 \cap O^*$ be the index identified at policy $\pi^0$ such that the above inequality holds, that is,

$$s^*_{j^0} \ge \frac{1-\gamma}{m^2}\left(c^T x^{\pi^0} - z^*\right).$$

Then, for any policy $\pi^t$ generated by the Simplex method, if $j^0 \in \pi^t$, we must have

$$c^T x^{\pi^t} - z^* = (s^*)^T x^{\pi^t} \ge s^*_{j^0}\, x^{\pi^t}_{j^0},$$

so that

$$x^{\pi^t}_{j^0} \le \frac{c^T x^{\pi^t} - z^*}{s^*_{j^0}} \le \frac{m^2}{1-\gamma}\cdot \frac{c^T x^{\pi^t} - z^*}{c^T x^{\pi^0} - z^*}.$$
These lemmas lead to our key result:

Theorem 1 Let $\pi^0$ be any given non-optimal policy. Then there is a state-action $j^0 \in \pi^0 \cap O^*$, i.e., a non-optimal action $j^0$ in policy $\pi^0$, that never appears in any of the policies generated by the Simplex method after

$$T := \frac{m}{1-\gamma}\, \log\!\left(\frac{m^2}{1-\gamma}\right)$$

iterations starting from $\pi^0$.

Proof. From Lemma 3, after $t$ iterations of the Simplex method, we have

$$c^T x^{\pi^t} - z^* \le \left(1 - \frac{1-\gamma}{m}\right)^{t}\left(c^T x^{\pi^0} - z^*\right).$$

Therefore, after $t \ge T + 1$ iterations from the initial policy $\pi^0$, $j^0 \in \pi^t$ implies, by Lemma 4,

$$x^{\pi^t}_{j^0} \le \frac{m^2}{1-\gamma}\cdot \frac{c^T x^{\pi^t} - z^*}{c^T x^{\pi^0} - z^*} \le \frac{m^2}{1-\gamma}\left(1 - \frac{1-\gamma}{m}\right)^{t} < 1.$$

The last inequality above comes from the fact that $\log(1-x) \le -x$ for all $x < 1$, so that

$$\log\frac{m^2}{1-\gamma} + t\,\log\!\left(1 - \frac{1-\gamma}{m}\right) \le \log\frac{m^2}{1-\gamma} - t\,\frac{1-\gamma}{m} < 0$$

if $t \ge 1 + T \ge 1 + \frac{m}{1-\gamma}\log\frac{m^2}{1-\gamma}$. But $x^{\pi^t}_{j^0} < 1$ contradicts Lemma 1, which states that every basic variable value must be greater than or equal to 1. Thus, $j^0 \notin \pi^t$ for all $t \ge T + 1$.

The event described in Theorem 1 can be viewed as a crossover event of Vavasis and Ye [25, 28]: a state-action, although we do not know which one it is, was in the initial policy but never stays in or returns to the policies after a certain number of iterations of the Simplex or simple policy-iteration method.

We now repeat the same proof for policy $\pi^{T+1}$, if it is not optimal yet, in the policy sequence generated by the Simplex method. Since policy $\pi^{T+1}$ is not optimal, there must be a non-optimal state-action $j^1 \in \pi^{T+1} \cap O^*$ with $j^1 \ne j^0$ (because of Theorem 1) that never stays in or returns to the policies generated by the Simplex method after $2T$ iterations starting from $\pi^0$. Again, we can repeat this process for policy $\pi^{2T+1}$ if it is not optimal yet, and so on. In each of these cycles of $T$ Simplex iterations, at least one new non-optimal state-action is eliminated from appearing in any of the future policies generated by the Simplex method. However, we have at most $|O^*|$ such non-optimal state-actions to eliminate, where $|O^*| \le n - m$ by Lemma 2.
Hence, the Simplex method can repeat such a cycle at most $n - m$ times, and we reach our main conclusion:
Theorem 2 The Simplex, or simple policy-iteration, method with the most-negative-reduced-cost pivoting rule of Dantzig for solving the discounted Markov decision problem with a fixed discount rate is a strongly polynomial-time algorithm. Starting from any policy, the method terminates in at most

$$(n - m)\cdot \frac{m}{1-\gamma}\cdot \log\!\left(\frac{m^2}{1-\gamma}\right)$$

iterations, where each iteration uses $O(mn)$ arithmetic operations.

The arithmetic operations count is well known for the Simplex method: it uses $O(m^2)$ arithmetic operations to update the inverse $(A_{\pi^t})^{-1}$ of the basis of the current policy $\pi^t$ and the dual basic solution $y^{\pi^t}$, as well as $O(mn)$ arithmetic operations to calculate the reduced cost vector and then choose the incoming basic variable.

We now turn our attention to the policy-iteration method, and we have the following corollary:

Corollary 1 The original policy-iteration method of Howard for solving the discounted Markov decision problem with a fixed discount rate is a strongly polynomial-time algorithm. Starting from any policy, it terminates in at most

$$(n - m)\cdot \frac{m}{1-\gamma}\cdot \log\!\left(\frac{m^2}{1-\gamma}\right)$$

iterations.

Proof. First, Lemmas 1 and 2 hold since they are independent of which method is being used. Secondly, Lemma 3 still holds for the policy-iteration method, since at any policy $\pi$ the incoming basic variable $j^+ = \arg\min(\bar c)$ (that is, $-\bar c_{j^+} = \Delta = -\min(\bar c)$) of the Simplex method is always one of the incoming basic variables of the policy-iteration method. Thirdly, the facts established by Lemma 4 are also independent of how the policy sequence is generated, as long as the state-action with the most negative reduced cost is included in the next policy, so they hold for the policy-iteration method as well. Thus, we can conclude that there is a state-action $j^0 \in \pi^0 \cap O^*$, i.e., a non-optimal state-action $j^0$ in the initial non-optimal policy $\pi^0$, that never stays in or returns to the policies generated by the policy-iteration method after $T$ iterations. Thus, Theorem 1 also holds for the policy-iteration method, which proves the corollary.
Note that, for the policy-iteration method, each iteration could use up to O(m²n) arithmetic operations.

5 Extensions and Remarks

Our result can be extended to other undiscounted MDPs where every basic feasible matrix of (9) exhibits the Leontief substitution form:

A_π = I − P, for some nonnegative square matrix P with P ≥ 0 and its spectral radius ρ(P) ≤ γ for a fixed γ < 1.

This includes MDPs with sub-stochastic matrices and transient cases; see
Veinott [27]. Note that the inverse of (I − P) has the expansion form

(I − P)^{−1} = I + P + P² + ⋯,

and

‖(I − P)^{−1} e‖₂ ≤ ‖e‖₂ (1 + γ + γ² + ⋯) = ‖e‖₂ / (1 − γ).

Thus, each basic variable value is still between 1 and m/(1 − γ), so that Lemma 1 is true with an inequality (actually stronger for our proof): eᵀx ≤ m/(1 − γ), for every feasible solution x. Consequently, Lemmas 2, 3, and 4 all hold, which leads to the following corollary.

Corollary 2 Let every feasible basis of an MDP have the form I − P where P ≥ 0, with a spectral radius less than or equal to a fixed γ < 1. Then, the Simplex and policy-iteration methods are strongly polynomial-time algorithms. Starting from any policy, each of them terminates in at most

(n − m) · (m/(1 − γ)) · log( m²/(1 − γ) )

iterations.

One observation from our worst-case analyses is that there is no iteration-count difference between the Simplex method and the policy-iteration method that makes block pivots in each iteration, as long as the most-negative-reduced-cost pivoting rule is adopted. However, each iteration of the Simplex method is more efficient than that of the policy-iteration method.

Finally, we remark that the pivoting rule seems to make the difference. As we mentioned earlier, for the DMDP with a fixed discount rate, the simplex or simple policy-iteration method with the smallest-index pivoting rule (a rule popularly used against cycling in the presence of degeneracy) was shown to be exponential. This is in contrast to the method that uses the most-negative-reduced-cost pivoting rule, which is proven to be strongly polynomial in this paper. On the other hand, the most-negative-reduced-cost pivoting rule is exponential for solving some other LP problems. Thus, searching for suitable pivoting rules for solving different LP problems is essential, and one cannot rule out the Simplex method simply because the behavior of one pivoting rule on one problem is shown to be exponential.
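The Leontief-form bounds used in this extension are easy to sanity-check numerically. The sketch below (a hypothetical example of my own, not from the paper) builds a nonnegative P whose row sums all equal γ, so that ρ(P) ≤ γ < 1, and verifies both the Neumann expansion of (I − P)^{−1} and the bound eᵀx ≤ m/(1 − γ) for the basic solution x of (I − P)ᵀx = e.

```python
import numpy as np

rng = np.random.default_rng(0)
m, gamma = 6, 0.8

# Nonnegative P scaled so every row sums to gamma; then rho(P) <= gamma < 1.
P = rng.random((m, m))
P *= gamma / P.sum(axis=1, keepdims=True)

e = np.ones(m)
inv = np.linalg.inv(np.eye(m) - P)

# Neumann expansion: (I - P)^{-1} = I + P + P^2 + ... (truncated here;
# the tail is O(gamma^200), far below floating-point tolerance).
series = sum(np.linalg.matrix_power(P, k) for k in range(200))
assert np.allclose(inv, series)

# Basic feasible solution for this basis: (I - P)^T x = e.
x = np.linalg.solve((np.eye(m) - P).T, e)
assert np.all(x >= 1.0 - 1e-9)          # each basic variable is at least 1
assert e @ x <= m / (1 - gamma) + 1e-9  # e^T x <= m / (1 - gamma)
```

Because (I − P)^{−1} = I + P + P² + ⋯ is entrywise at least I, every basic variable is at least 1, and because each row sum of P^k is at most γ^k, the total eᵀx is at most m(1 + γ + γ² + ⋯) = m/(1 − γ), which is the inequality form of Lemma 1 invoked above.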
Further possible research directions may answer the questions: can the Simplex method or the policy-iteration method be strongly polynomial for solving the MDP regardless of discount rates? Or, is there any strongly polynomial-time algorithm for solving the MDP regardless of discount rates?
Acknowledgments. I thank Pete Veinott and four anonymous Referees for many insightful discussions and suggestions on this subject, which have greatly improved the presentation of the paper.

References

[1] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.

[2] D. P. Bertsekas. Dynamic Programming, Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, New Jersey, 1987.

[3] R. E. Bixby, Progress in linear programming, ORSA Journal on Computing 6:1 (1994).

[4] G. B. Dantzig, Optimal solutions of a dynamic Leontief model with substitution, Econometrica 23 (1955).

[5] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.

[6] B. C. Eaves and A. F. Veinott, Maximum-Stopping-Value Policies in Finite Markov Population Decision Chains, manuscript, Stanford University.

[7] R. E. Erickson, Optimality of Stationary Halting Policies and Finite Termination of Successive Approximations, Mathematics of Operations Research 13:1 (1988).

[8] G. de Ghellinck, Les Problèmes de Décisions Séquentielles, Cahiers du Centre d'Études de Recherche Opérationnelle 2 (1960).

[9] F. d'Epenoux, A Probabilistic Production and Inventory Problem, Management Science 10 (1963); translation of an article published in Revue Française de Recherche Opérationnelle 14 (1960).

[10] J. Fearnley, Exponential lower bounds for policy iteration, arXiv preprint, v1 (March 2010).

[11] M. Grötschel, L. Lovász and A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer, Berlin, 1988.

[12] R. A. Howard, Dynamic Programming and Markov Processes. MIT Press, Cambridge, Massachusetts, 1960.

[13] N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica 4 (1984), 373–395.

[14] L. G. Khachiyan, A polynomial algorithm in linear programming, Dokl. Akad. Nauk SSSR 244 (1979); translated in Soviet Mathematics Doklady 20 (1979).

[15] V. Klee and G. J. Minty, How good is the Simplex method?, in O. Shisha, editor, Inequalities III, Academic Press, New York, NY, 1972.
[16] M. L. Littman, T. L. Dean and L. P. Kaelbling, On the complexity of solving Markov decision problems, Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI '95), 1995.

[17] A. S. Manne, Linear programming and sequential decisions, Management Science 6 (1960).

[18] Y. Mansour and S. Singh, On the complexity of policy iteration, Proceedings of the 15th International Conference on Uncertainty in AI, 1999.

[19] M. Melekopoglou and A. Condon, On the complexity of the policy improvement algorithm for Markov Decision Processes, INFORMS Journal on Computing 6:2 (1994).

[20] C. H. Papadimitriou and J. N. Tsitsiklis, The complexity of Markov decision processes, Mathematics of Operations Research 12:3 (1987).

[21] M. L. Puterman, Markov Decision Processes, John Wiley & Sons, New York, 1994.

[22] U. Rothblum, Multiplicative Markov Decision Chains, Doctoral Dissertation, Department of Operations Research, Stanford University, Stanford.

[23] L. S. Shapley, Stochastic Games, Proceedings of the National Academy of Sciences USA 39:10 (1953).

[24] P. Tseng, Solving H-horizon, stationary Markov decision problems in time proportional to log(H), Operations Research Letters 9:5 (1990).

[25] S. Vavasis and Y. Ye, A primal-dual interior-point method whose running time depends only on the constraint matrix, Mathematical Programming 74 (1996).

[26] A. Veinott, Extreme points of Leontief substitution systems, Linear Algebra and its Applications 1 (1968).

[27] A. Veinott, Discrete dynamic programming with sensitive discount optimality criteria, The Annals of Mathematical Statistics 40:5 (1969).

[28] Y. Ye, A new complexity result on solving the Markov decision problem, Mathematics of Operations Research 30:3 (2005).
More informationA Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay
A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer
More informationComplex Quadratic Optimization and Semidefinite Programming
Coplex Quadratic Optiization and Seidefinite Prograing Shuzhong Zhang Yongwei Huang August 4 Abstract In this paper we study the approxiation algoriths for a class of discrete quadratic optiization probles
More informationChapter 6 1-D Continuous Groups
Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:
More information