The simplex method is strongly polynomial for deterministic Markov decision processes
Ian Post    Yinyu Ye

May 31, 2013

Abstract

We prove that the simplex method with the highest-gain/most-negative-reduced-cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if each action has a distinct discount factor. Previously the simplex method was known to run in polynomial time only for discounted MDPs where the discount was bounded away from 1 [Ye11]. Unlike in the discounted case, the algorithm does not greedily converge to the optimum, and we require a more complex measure of progress. We identify a set of layers in which the values of primal variables must lie and show that the simplex method always makes progress optimizing one layer, and when the upper layer is updated the algorithm makes a substantial amount of progress. In the case of nonuniform discounts, we define a polynomial number of milestone policies, and we prove that, while the objective function may not improve substantially overall, the value of at least one dual variable is always making progress toward some milestone, and the algorithm will reach the next milestone in a polynomial number of steps.

1 Introduction

Markov decision processes (MDPs) are a powerful tool for modeling repeated decision making in stochastic, dynamic environments. An MDP consists of a set of states and a set of actions that one may perform in each state. Based on the agent's actions it receives rewards and affects the future evolution of the process, and the agent attempts to maximize its reward over time (see Section 2 for a formal definition). MDPs are widely used in machine learning, robotics and control, operations research, economics, and related fields. See the books [Put94] and [Ber96] for a thorough overview. Solving MDPs is also an important problem theoretically.
Optimizing an MDP can be formulated as a linear program (LP), and although these LPs possess extra structure that can be exploited by algorithms like Howard's policy iteration method [How60], they lie just beyond the point at which our ability to solve LPs in strongly polynomial time ends (and are a natural target for extending this ability), and they have proven to be hard in general for algorithms previously thought to be quite powerful, such as randomized simplex pivoting rules [FHZ11].

(Author footnotes: Ian Post, Department of Combinatorics and Optimization, University of Waterloo; research done while at Stanford University; ian@ianpost.org; research supported by an NSF grant. We also acknowledge financial support from grant #FA from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA). Yinyu Ye, Department of Management Science and Engineering, Stanford University; yinyu-ye@stanford.edu; research supported in part from grant #FA from the U.S. Air Force Office of Scientific Research (AFOSR).)

In practice [LDK95] MDPs are solved using policy iteration, which may be viewed as a parallel version of the simplex method with multiple simultaneous pivots, or value iteration [Bel57], an inexact approximation to policy iteration that is faster per iteration. If the discount factor γ, which determines the effective time horizon (see Section 2), is small, it has long been known that policy and value iteration will find an ɛ-approximation to the optimum [Bel57]. It is also well known that value iteration may be exponential, but policy iteration resisted worst-case analysis for many years. It was conjectured to be strongly polynomial, but except for highly restricted examples [Mad02] only exponential time bounds were known [MS99]. Building on results for parity games [Fri09], Fearnley recently gave an exponential lower bound [Fea10]. Friedmann, Hansen, and Zwick extended Fearnley's techniques to achieve sub-exponential lower bounds for randomized simplex pivoting rules [FHZ11] using MDPs, and Friedmann gave an exponential lower bound for MDPs using the least-entered pivoting rule [Fri11]. Melekopoglou and Condon proved several other simplex pivoting rules are exponential [MC94]. On the positive side, Ye designed a specialized interior-point method that is strongly polynomial in everything except the discount factor [Ye05]. Ye later proved that for discounted MDPs with n states and m actions, the simplex method with the most-negative-reduced-cost pivoting rule and, by extension, policy iteration, runs in time O(nm/(1 − γ) · log(n/(1 − γ))) on discounted MDPs, which is polynomial for fixed γ [Ye11].
Hansen, Miltersen, and Zwick improved the policy iteration bound to O(m/(1 − γ) · log(n/(1 − γ))) and extended it to both value iteration as well as the strategy iteration algorithm for two-player turn-based stochastic games [HMZ11]. But the performance of policy iteration and simplex-style basis-exchange algorithms on MDPs remains poorly understood. Policy iteration, for instance, is conjectured to run in O(m) iterations on deterministic MDPs, but the best upper bounds are exponential, although a lower bound of Ω(m) is known [HZ10]. Improving our understanding of these algorithms is an important step in designing better ones with polynomial or even strongly polynomial guarantees. Motivated by these questions, we analyze the simplex method with the most-negative-reduced-cost pivoting rule on deterministic MDPs. For a deterministic MDP with n states and m actions, we prove that the simplex method terminates in O(n^3 m^2 log^2 n) iterations regardless of the discount factor, and if each action has a distinct discount factor, then the algorithm runs in O(n^5 m^3 log^2 n) iterations. Our results do not extend to policy iteration, and we leave this as a challenging open question. Deterministic MDPs were previously known to be solvable in strongly polynomial time using specialized methods not applicable to general MDPs (minimum mean cycle algorithms [PT87]) or, in the case of nonuniform discounts, by exploiting the property that the dual LP has only two variables per inequality [HN94]. The fastest known algorithm for uniformly discounted deterministic MDPs runs in time O(mn) [MTZ10]. However, these problems were not known to be solvable in polynomial time with the more generic simplex method. More generally, we believe that our results help shed some light on how algorithms like simplex and policy iteration function on MDPs. Our proof techniques, particularly in the case of nonuniform discounts, may be of independent interest.
For uniformly discounted MDPs, we show that the values of the primal flux variables must lie within one of two intervals or layers of polynomial size, depending on whether an action is on a path or a cycle. Most iterations update variables in the smaller path layer, and we show these converge rapidly to a locally optimal policy for the paths, at which point the algorithm must update the larger cycle layer and make a large amount of progress toward the optimum. Progress takes the form of many small improvements interspersed with a few much larger ones rather than uniform convergence. The nonuniform case is harder, and our measure of progress is unusual and, to the best of our knowledge, novel. We again define a set of intervals in which the values of variables on cycles must fall, and these define a collection of intermediate milestone or checkpoint values for each dual variable (the value of a state in the MDP). Whenever a variable enters a cycle layer, we argue that a corresponding dual variable is making progress toward the layer's milestone and will pass this value after enough updates. When each of these checkpoints has been passed, the algorithm must have reached the optimum. We believe some of these ideas may prove useful in other problems as well. In Section 2 we formally define MDPs and describe a number of well-known properties that we require. In Section 3 we analyze the case of a uniform discount factor, and in Section 4 we extend these results to the nonuniform case.

2 Preliminaries

Many variations and extensions of MDPs have been defined, but we will study the following problem. A Markov decision process consists of a set of n states S and m actions A. Each action a is associated with a single state in which it can be performed, a reward r_a ∈ R for performing the action, and a probability distribution P_a over states to which the process will transition when using action a. We denote by P_{a,s'} the probability of transitioning to state s' when taking action a. There is at least one action usable in each state. Let r be the vector of rewards indexed by actions with entries r_a, A_s ⊆ A be the set of actions performable in state s, and P be the n-by-m matrix with columns P_a and entries P_{a,s'}. We will restrict the distributions P_a to be deterministic for all actions, in which case states may be thought of as nodes in a graph and actions as directed edges.
However, the results in this section apply to MDPs with stochastic transitions as well. At each time step, the MDP starts in some state s and performs an action a admissible in state s, at which point it receives the reward r_a and transitions to a new state according to the probability distribution P_a. We are given a discount factor γ < 1 as part of the input, and our goal is to choose actions to perform so as to maximize the expected discounted reward we accumulate over an infinite time horizon. The discount can be thought of as a stopping probability: at each time step the process ends with probability 1 − γ. Normally, the discount γ is uniform for the entire MDP, but in Section 4 we will allow each action to have a distinct discount γ_a. Due to the Markov property (transitions depend only on the current state and action) there is an optimal strategy that is memoryless and depends only on the current state. Let π be such a policy, a distribution of actions to perform for each state. This defines a Markov chain and a value for each state:

Definition 2.1. Let π be a policy, P^π be the n-by-n matrix where P^π_{s',s} is the probability of transitioning from s to s' using π, and r^π the vector of expected rewards for each state according to the distribution of actions in π. The value vector v^π is indexed by states, and v^π_s is equal to the expected total discounted reward of starting in state s and following policy π. It is defined as

    v^π = Σ_{i≥0} (γ(P^π)^T)^i r^π = (I − γ(P^π)^T)^{−1} r^π,

or equivalently by

    v^π = r^π + γ(P^π)^T v^π.    (1)
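The defining identity (1) can be checked numerically. The sketch below is our illustration, not part of the paper's analysis: it builds a small hypothetical deterministic policy (a 3-cycle with made-up rewards) and solves the linear system of Definition 2.1 for its value vector.

```python
import numpy as np

# Hypothetical 3-state deterministic policy, purely for illustration:
# pi moves state s to state (s + 1) % 3 and earns reward r_pi[s].
n = 3
gamma = 0.9
r_pi = np.array([0.0, 1.0, 2.0])

# Column s of P_pi is the transition distribution out of s (Definition 2.1),
# so P_pi[s_next, s] = 1 when pi moves s to s_next.
P_pi = np.zeros((n, n))
for s in range(n):
    P_pi[(s + 1) % n, s] = 1.0

# v_pi = (I - gamma (P_pi)^T)^{-1} r_pi: the expected discounted reward of
# starting in each state and following pi forever.
v_pi = np.linalg.solve(np.eye(n) - gamma * P_pi.T, r_pi)

# Sanity check of the fixed-point form (1): v = r + gamma P^T v.
assert np.allclose(v_pi, r_pi + gamma * P_pi.T @ v_pi)
```

Solving the n-by-n system directly, rather than summing the series Σ (γ(P^π)^T)^i, matches the closed form in Definition 2.1.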
If policy π is randomized and uses two or more actions in some state s, then the value v^π_s is an average of the values of performing each of the pure actions in s, and one of these is the largest. Therefore we can replace the distribution by a single action and only increase the value of the state. In the remainder of the paper we will restrict ourselves to pure policies in which a single action is taken in each state. In addition to the value vector, a policy π also has an associated flux vector x^π that will play a critical role in our analysis. It acts as a kind of discounted flow. Suppose we start with a single unit of mass on every state and then run the Markov chain. At each time step we remove a 1 − γ fraction of the mass on each state and redistribute the remaining mass according to the policy π. Summing over all time steps, the total amount of mass that passes through each action is its flux. More formally,

Definition 2.2. Let π be a policy and P^π the n-by-n transition matrix for π formed by the columns P_a for actions in π. The flux vector x^π is indexed by actions. If action a is not in π then x^π_a = 0, and if π uses a in state s, then x^π_a = z_s, where

    z = Σ_{i≥0} (γP^π)^i 1 = (I − γP^π)^{−1} 1,    (2)

and 1 is the all-ones vector of dimension n.

The flux is the total discounted number of times we use each action if we start the MDP in all n states and run the Markov chain P^π, discounting by γ each iteration. Note that if a ∈ π then x^π_a ≥ 1, since the initial flux placed on a state always passes through its action. Further note that each bit of flux can be traced back to one of the initial units of mass placed on each state, although the vector x^π sums flux from all states. This will be important in Section 4.
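Definition 2.2 likewise lends itself to a quick numerical check. The following sketch (ours; the 4-state instance, with a 3-cycle plus one path state, is hypothetical) computes the flux via (2) and confirms that every used action carries at least one unit of flux and that the total flux is n/(1 − γ).

```python
import numpy as np

# Flux vector of Definition 2.2 on a hypothetical 4-state policy:
# states 0 -> 1 -> 2 -> 0 form a cycle, and state 3 is a path into it.
n = 4
gamma = 0.9
P_pi = np.zeros((n, n))   # P_pi[s_next, s] = 1 when pi moves s to s_next
for s, s_next in [(0, 1), (1, 2), (2, 0), (3, 0)]:
    P_pi[s_next, s] = 1.0

# z = (I - gamma P_pi)^{-1} 1 from (2); the flux on the action that pi
# uses in state s is z[s].
z = np.linalg.solve(np.eye(n) - gamma * P_pi, np.ones(n))

# Each used action carries at least 1 unit of flux, the path action at
# state 3 carries exactly its one initial unit, and the total flux is
# n/(1 - gamma).
assert np.all(z >= 1.0 - 1e-12)
assert np.isclose(z[3], 1.0)
assert np.isclose(z.sum(), n / (1 - gamma))
```

The gap between the path flux (exactly 1 here) and the cycle fluxes (on the order of 1/(1 − γ)) is the two-layer structure exploited in Section 3.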
Solving the MDP can be formulated as the following primal/dual pair of LPs, in which the flux and value vectors correspond to primal and (possibly infeasible) dual solutions:

Primal:
    maximize   Σ_a r_a x_a
    subject to Σ_{a∈A_s} x_a = 1 + γ Σ_a P_{a,s} x_a   for all s ∈ S,    (3)
               x ≥ 0

Dual:
    minimize   Σ_s v_s
    subject to v_s ≥ r_a + γ Σ_{s'} P_{a,s'} v_{s'}   for all s ∈ S, a ∈ A_s.    (4)

The constraint matrix of (3) is equal to M − γP, where M_{s,a} = 1 if action a can be used in state s and 0 otherwise. The dual value LP (4) is often defined as the primal, as it is perhaps more intuitive, and (3) is rarely considered. However, our analysis centers on the flux variables, and algorithms that manipulate policies can more naturally be seen as moving through the polytope of (3), since vertices of the polytope represent policies:

Lemma 2.3. The LP (3) is non-degenerate, and there is a bijection between vertices of the polytope and policies of the MDP.

Proof. Policies have exactly n nonzero variables, and solving for the flux vector in (2) is identical to solving for a basis in the polytope, so policies map to bases. Write the constraints in (3) in the standard matrix form Ax = b. The vector b is 1, and A = M − γP. In a row of A the only positive entries are on the actions usable in state s, so if Ax = b, then x must have a nonzero entry for every state, i.e., a choice of action for every state. Bases of the LP have n variables, so they must include only one action per state. Finally, as shown above, x^π_a ≥ 1 for all a in a policy/basis, so the LP is not degenerate, and bases correspond to vertices.

By Lemma 2.3, the simplex method applied to (3) corresponds to a simple, single-switch version of policy iteration: we start with an arbitrary policy, and in each iteration we change a single action that improves the value of some state. Since the LP is not degenerate, the simplex method will find the optimal policy with no cycling. We will use Dantzig's most-negative-reduced-cost pivoting rule to choose the action switched. Since (3) is written as a maximization problem, we will refer to reduced costs as gains and always choose the highest-gain action to switch/pivot. For MDPs, the gains have a simple interpretation:

Definition 2.4. The gain (or reduced cost) of an action a for state s with respect to a policy π is denoted r̄^π_a and is the improvement in the value of s if s uses action a once and then follows π for all time. Formally,

    r̄^π_a = (r_a + γP_a^T v^π) − v^π_s,

or, in vector form,

    r̄^π = r − (M − γP)^T v^π.    (5)

We denote the optimal policy by π*, and the optimal flux, values, and gains by x*, v*, and r̄*. The following are basic properties of the simplex method, and we prove them for completeness.

Lemma 2.5. Let π and π' be any policies. The gains satisfy the following properties: (r̄^π)^T x^{π'} = r^T x^{π'} − r^T x^π = 1^T v^{π'} − 1^T v^π, r̄^π_a = 0 for all a ∈ π, and r̄*_a ≤ 0 for all a.

Proof. From the definition of the gains,

    (r̄^π)^T x^{π'} = (r − (M − γP)^T v^π)^T x^{π'} = r^T x^{π'} − (v^π)^T (M − γP) x^{π'} = r^T x^{π'} − (v^π)^T 1,

using that (M − γP) is the constraint matrix of (3). From the definitions of the value and flux vectors,

    r^T x^π = (r^π)^T (I − γP^π)^{−1} 1 = (v^π)^T 1,

where r^π is the reward vector restricted to the indices in π.
Combining these two gives the first result. For the second result, if a is in π and used in state s, then v^π_s = r_a + γP_a^T v^π, so r̄^π_a = 0. Finally, if r̄*_a > 0 for some a, then consider the policy π' that is identical to π* but uses a. Then (r̄*)^T x^{π'} > 0, and the first identity proves that π* is not optimal.

A key property of the simplex method on MDPs that we will employ repeatedly is that not only is the overall objective improving, but also the values of all states are monotone non-decreasing, and there exists a single policy, which we denote by π*, that maximizes the values of all states:

Lemma 2.6. Let π and π' be policies appearing in an execution of the simplex method with π' being used after π. Then v^{π'} ≥ v^π. Further, let π* be the policy when simplex terminates, and π be any other policy. Then v* ≥ v^π.
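As an aside, the single-switch scheme described after Lemma 2.3 can be sketched in code. Everything below (the helper names solve_values and simplex_highest_gain, and the tiny 2-state instance) is our hypothetical illustration of Dantzig's highest-gain rule on a deterministic MDP, not code from the paper.

```python
import numpy as np

def solve_values(policy, rewards, next_state, gamma, n):
    """Value vector of Definition 2.1 for a deterministic policy;
    policy[s] is the action used in state s."""
    P = np.zeros((n, n))
    r = np.zeros(n)
    for s in range(n):
        a = policy[s]
        P[next_state[a], s] = 1.0   # column s holds the transition out of s
        r[s] = rewards[a]
    return np.linalg.solve(np.eye(n) - gamma * P.T, r)

def simplex_highest_gain(m, rewards, next_state, src, gamma, n, policy):
    """Single-switch simplex with Dantzig's rule: repeatedly switch to
    the action of highest gain (Definition 2.4) until none is positive."""
    while True:
        v = solve_values(policy, rewards, next_state, gamma, n)
        gains = [rewards[a] + gamma * v[next_state[a]] - v[src[a]]
                 for a in range(m)]
        best = max(range(m), key=lambda a: gains[a])
        if gains[best] <= 1e-10:
            return policy, v
        policy[src[best]] = best    # pivot: change exactly one action

# Hypothetical 2-state instance: each state has a "stay" and a "switch" action.
n, gamma = 2, 0.9
src        = [0, 0, 1, 1]     # state in which each action is usable
next_state = [0, 1, 1, 0]     # deterministic successor of each action
rewards    = [0.0, 1.0, 2.0, 0.0]
policy, v = simplex_highest_gain(4, rewards, next_state, src, gamma, n, [0, 3])
```

On this instance the method pivots twice and stops with state 0 moving to state 1 and state 1 staying put; one can also observe along the way that both entries of v never decrease, which is exactly the content of Lemma 2.6.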
Proof. Suppose π and π' are subsequent policies. The gains of all actions in π' with respect to π are equal to r^{π'} − (I − γ(P^{π'})^T) v^π, all of which are nonnegative. Therefore

    0 ≤ (I − γ(P^{π'})^T)^{−1} (r^{π'} − (I − γ(P^{π'})^T) v^π) = v^{π'} − v^π,

using that (I − γ(P^{π'})^T)^{−1} = Σ_{i≥0} (γ(P^{π'})^T)^i ≥ 0. By induction, this holds if π and π' occur further apart. Performing a similar calculation using the gains r̄*, which are nonpositive, shows that v* − v^π ≥ 0 for any policy π.

3 Uniform discounts

As a warmup before delving into our analysis of deterministic MDPs, we briefly review the analysis of [Ye11] for stochastic MDPs with a fixed discount. Consider the flux vector in Definition 2.2. One unit of flux is added to each state, and every step it is discounted by a factor of γ, for a total of n(1 + γ + γ^2 + ⋯) = n/(1 − γ) flux overall. If π is the current policy and Δ is the highest gain, then, by Lemma 2.5, the farthest π* can be from π is if all n/(1 − γ) units of flux in π* are on the action with gain Δ, so r^T x* − r^T x^π ≤ nΔ/(1 − γ). If we pivot on this action, at least 1 unit of flux is placed on the new action, increasing the objective by at least Δ. Thus we have reduced the gap to π* by a 1 − (1 − γ)/n fraction, which is substantial if 1/(1 − γ) is polynomial. Now consider r^T x* − r^T x^π = −(r̄*)^T x^π. All the terms −r̄*_a x^π_a are nonnegative, and for some action a in π we have −r̄*_a x^π_a ≥ −(r̄*)^T x^π/n. The flux x^π_a is at most n/(1 − γ), so −r̄*_a ≥ −(r̄*)^T x^π/(n^2/(1 − γ)). But for any policy π' that includes a, −(r̄*)^T x^{π'} ≥ −r̄*_a x^{π'}_a ≥ −r̄*_a, so after r^T x* − r^T x^π has shrunk by a factor of n^2/(1 − γ), action a cannot appear in any future policy, and this occurs after

    log_{1/(1−(1−γ)/n)} (n^2/(1 − γ)) = O( n/(1 − γ) · log(n/(1 − γ)) )

steps. See [Ye11] for the details. The above result hinged on the fact that the sizes of all nonzero fluxes lay within the interval [1, n/(1 − γ)], which was assumed to be polynomial but gives a weak bound if γ is very close to 1. However, consider a policy for a deterministic MDP.
It can be seen as a graph with a node for each state and a single directed edge leaving each state representing the action, so the graph consists of one or more directed cycles and directed paths leading to these cycles. Starting on a path, the MDP uses each path action once before reaching a cycle, so the flux on paths must be small. Flux on the cycles may be substantially larger, but since the MDP revisits each action after at most n steps, the flux on cycle actions varies by at most a factor of n.

Lemma 3.1. Let π be a policy with flux vector x^π and a an action in π. If a is on a path in π then 1 ≤ x^π_a ≤ n, and if a is on a cycle then 1/(1 − γ) ≤ x^π_a ≤ n/(1 − γ). The total flux on paths is at most n^2, and the total flux on cycles is at most n/(1 − γ).

Proof. All actions have at least 1 flux. If a is on a path, then starting from any state we can only use a once and never return, contributing flux at most 1 per state, so x^π_a ≤ n. Summing over all path actions, the total flux is at most n^2. If a is on a cycle, each state on the cycle contributes a total of 1/(1 − γ) flux to the cycle. By symmetry this flux is distributed evenly among the actions on the cycle, so x^π_a ≥ 1/(1 − γ). The total flux in the MDP is n/(1 − γ), so x^π_a ≤ n/(1 − γ).

The overall range of fluxes is large, but all values must lie within one of two polynomial-size layers. We will prove that simplex can essentially optimize each layer separately. If a cycle is not updated, then not much progress is made toward the optimum, but we make a substantial amount of progress in optimizing the paths for the current cycles. When the paths are optimal the algorithm is forced to update a cycle, at which point we make a substantial amount of progress toward the optimum but reset all progress on the paths. First we analyze progress on the paths:

Lemma 3.2. Suppose the simplex method pivots from π to π', which does not create a new cycle. Let π'' be the final policy such that the cycles in π'' are a subset of those in π (i.e., the final policy before a new cycle is created). Then r^T(x^{π''} − x^{π'}) ≤ (1 − 1/n^2) r^T(x^{π''} − x^π).

Proof. Let Δ = max_a r̄^π_a be the highest gain. Consider (r̄^π)^T x^{π''}. Since the cycles in π'' are contained in π, r̄^π_a = 0 for any action a on a cycle in π'', and by Lemma 3.1, π'' has at most n^2 units of flux on paths, so

    (r̄^π)^T x^{π''} = r^T(x^{π''} − x^π) ≤ n^2 Δ.

Policy π' has at least 1 unit of flux on the action with gain Δ, so

    r^T(x^{π''} − x^{π'}) ≤ r^T(x^{π''} − x^π) − Δ ≤ (1 − 1/n^2) r^T(x^{π''} − x^π).

Due to the polynomial contraction in the lemma above, not too many iterations can pass before a new cycle is formed.

Lemma 3.3. Let π be a policy. After O(n^2 log n) iterations starting from π, either the algorithm finishes, a new cycle is created, a cycle is broken, or some action in π never appears in a policy again until a new cycle is created.

Proof. Let π be the policy in some iteration, π'' the last policy before a new cycle is created, and π' an arbitrary policy occurring between π and π'' in the algorithm. Policy π differs from π'' in actions on paths and possibly in cycles that exist in π but have been broken in π''. By Lemma 2.5,

    (r̄^π)^T x^{π''} = r^T(x^{π''} − x^π) = 1^T(v^{π''} − v^π) = −(r̄^{π''})^T x^π.

We divide the analysis into two cases. First suppose that there exists an action a used in state s on a path in π such that −r̄^{π''}_a x^π_a ≥ (r̄^π)^T x^{π''}/n (note that (r̄^π)^T x^{π''} ≥ 0). Since a is on a path, x^π_a ≤ n, which implies −r̄^{π''}_a ≥ (r̄^π)^T x^{π''}/n^2.
Now if policy π' uses action a, then

    (r̄^{π'})^T x^{π''} = 1^T(v^{π''} − v^{π'}) ≥ v^{π''}_s − v^{π'}_s = v^{π''}_s − (r_a + γP_a^T v^{π'}) ≥ v^{π''}_s − (r_a + γP_a^T v^{π''}) = −r̄^{π''}_a ≥ (r̄^π)^T x^{π''}/n^2,

using that the values of all states are monotone increasing. In the second case there is no action a on a path in π satisfying −r̄^{π''}_a x^π_a ≥ (r̄^π)^T x^{π''}/n. The remaining portion of −(r̄^{π''})^T x^π is due to cycles, so there must be some cycle C consisting of actions {a_1, ..., a_k} used in states {s_1, ..., s_k} such that Σ_{a∈C} −r̄^{π''}_a x^π_a ≥ (r̄^π)^T x^{π''}/n. All flux in C first enters C either from a path ending at C or from the initial unit of flux placed on some state in C. If y_s ≥ 1 units of flux first enter C at state s in policy π, then that flux earns y_s (v^π_s − v^{π''}_s) reward with respect to the gains r̄^{π''}, so

    Σ_{a∈C} −r̄^{π''}_a x^π_a = Σ_{s∈C} y_s (v^{π''}_s − v^π_s).

Moreover, each term v^{π''}_s − v^π_s is nonnegative, since the values of all states are nondecreasing. Now note that Σ_{s∈C} (v^{π''}_s − v^π_s) = Σ_{a∈C} −r̄^{π''}_a/(1 − γ), and at most n units of flux enter each state from outside. Therefore

    n Σ_{a∈C} −r̄^{π''}_a/(1 − γ) ≥ Σ_{a∈C} −r̄^{π''}_a x^π_a,

implying n^2 Σ_{a∈C} −r̄^{π''}_a/(1 − γ) ≥ (r̄^π)^T x^{π''}.
As long as cycle C is intact, each a ∈ C has at least 1/(1 − γ) flux from the states in C (Lemma 3.1), so if C is in policy π' then

    (r̄^{π'})^T x^{π''} = 1^T(v^{π''} − v^{π'}) ≥ Σ_{s∈C} (v^{π''}_s − v^{π'}_s) = Σ_{a∈C} −r̄^{π''}_a/(1 − γ) ≥ (r̄^π)^T x^{π''}/n^2.    (6)

Now if log_{n^2/(n^2−1)} n^2 iterations occur between π and π', Lemma 3.2 implies

    (r̄^{π'})^T x^{π''} < (1 − 1/n^2)^{log_{n^2/(n^2−1)} n^2} (r̄^π)^T x^{π''} = (r̄^π)^T x^{π''}/n^2.

In the first case action a cannot appear in π', and in the second case cycle C must be broken in π'. This takes log_{n^2/(n^2−1)} n^2 = O(n^2 log n) iterations if no new cycle interrupts the process.

Lemma 3.4. Either the algorithm finishes or a new cycle is created after O(n^2 m log n) iterations.

Proof. Let π_0 be a policy after a new cycle is created, and consider the policies π_1, π_2, ..., each separated by O(n^2 log n) iterations. If no new cycle is created, then by Lemma 3.3 each of these policies π_i has either broken another cycle in π_0 or contains an action that cannot appear in π_j for all j > i. There are at most n cycles in π_0 and at most m actions that can be eliminated, so after (m + n) · O(n^2 log n) = O(n^2 m log n) iterations, the algorithm must terminate or create a new cycle.

When a new cycle is formed, the algorithm makes a substantial amount of progress toward the optimum but also resets the path optimality above.

Lemma 3.5. Let π and π' be subsequent policies such that π' creates a new cycle. Then r^T(x* − x^{π'}) ≤ (1 − 1/n) r^T(x* − x^π).

Proof. Let Δ = max_a r̄^π_a and a = argmax_a r̄^π_a. There is a total of at most n/(1 − γ) flux in the MDP, so r^T x* − r^T x^π = (r̄^π)^T x* ≤ nΔ/(1 − γ). By Lemma 3.1, pivoting on a and creating a cycle will result in at least 1/(1 − γ) flux through a. Therefore r^T x^{π'} ≥ r^T x^π + Δ/(1 − γ), so

    r^T(x* − x^{π'}) ≤ r^T(x* − x^π) − Δ/(1 − γ) ≤ (1 − 1/n) r^T(x* − x^π).

Lemma 3.6. Let π be a policy. Starting from π, after O(n log n) iterations in which a new cycle is created, some action in π is either eliminated from cycles for the remainder of the algorithm or entirely eliminated from policies for the remainder of the algorithm.

Proof. Consider a policy π with respect to the optimal gains r̄*.
There is an action a such that −r̄*_a x^π_a ≥ −(r̄*)^T x^π/n. If a is on a path in π, then 1 ≤ x^π_a ≤ n, so −r̄*_a ≥ −(r̄*)^T x^π/n^2, and if a is on a cycle, then 1/(1 − γ) ≤ x^π_a ≤ n/(1 − γ), so −r̄*_a/(1 − γ) ≥ −(r̄*)^T x^π/n^2. Since r̄* are the gains for the optimal policy, r̄*_a ≤ 0 for all a. Therefore if π' is any policy containing a, then −(r̄*)^T x^{π'} ≥ −r̄*_a x^{π'}_a ≥ −r̄*_a, and if π' is any policy containing a on a cycle, then −(r̄*)^T x^{π'} ≥ −r̄*_a x^{π'}_a ≥ −r̄*_a/(1 − γ). Now by Lemma 3.5, if there are more than log_{n/(n−1)} n^2 = O(n log n) new cycles created between policies π and π', then

    −(r̄*)^T x^{π'} < (1 − 1/n)^{log_{n/(n−1)} n^2} (−(r̄*)^T x^π) = −(r̄*)^T x^π/n^2.
Therefore if π contained a on a path, then a cannot appear in any policy after π' for the remainder of the algorithm, and if π contained a on a cycle, then a cannot appear in a cycle after π' (but may appear on a path) for the remainder of the algorithm.

Theorem 3.7. The simplex method converges in at most O(n^3 m^2 log^2 n) iterations on deterministic MDPs with a uniform discount using the highest-gain pivoting rule.

Proof. Consider the policies π_0, π_1, π_2, ..., where O(n log n) new cycles have been created between π_i and π_{i+1}. By Lemma 3.6, each π_i contains an action that is either eliminated entirely in π_j for j > i or eliminated from cycles. Each action can be eliminated from cycles and from paths, so after 2m such rounds of O(n log n) new cycles the algorithm has converged. By Lemma 3.4 cycles are created every O(n^2 m log n) iterations, for a total of O(n^3 m^2 log^2 n) iterations.

4 Varying Discounts

In this section we allow each action a to have a distinct discount γ_a. This significantly complicates the proof of convergence since the total flux is no longer fixed. When updating a cycle we can no longer bound the distance to the optimum based solely on the maximum gain, since the optimal policy may employ actions with smaller gains than the current policy but substantially more flux. We are able to exhibit a set of layers in which the flux on cycles must lie based on the discounts of the actions, and we will show that when a cycle is created in a particular layer we make progress toward the optimum value for the updated state, assuming that it lies within that layer. These layers will define a set of bounds whose values we must surpass, which serve as milestones or checkpoints to the optimum. When we update a cycle we cannot claim that the overall objective increases substantially but only that the values of individual states make progress toward one of these milestone values. When the values of all states have surpassed each of these intermediate milestones the algorithm will terminate. We first define some notation.
Recall that to calculate flux we place one unit of mass in each state and then run the Markov chain, so all flux traces back to some state, but x^π aggregates all of it together. Because we will be concerned with analyzing the values of individual states in this section, it will be useful to separate out the flux originating in a particular state. Consider the following alternate LP:

    maximize   r^T x
    subject to Σ_{a∈A_s} x_a = 1 + Σ_a γ_a P_{a,s} x_a,
               Σ_{a∈A_{s'}} x_a = Σ_a γ_a P_{a,s'} x_a   for all s' ≠ s,    (7)
               x ≥ 0

The LP (7) is identical to (3), except that initial flux is only added to state s rather than all states, and the dual of (7) matches (4) if the objective in (4) is changed to minimize only v_s. Feasible solutions in (7) measure only flux originating in s and contributing to v_s. For a state s and policy π we use the notation x^{π,s} to denote the corresponding vertex in (7). Note that x^π = Σ_s x^{π,s}. The following lemma is analogous to Lemma 2.5 and has an identical proof:

Lemma 4.1. For a state s and for policies π and π',

    (r̄^π)^T x^{π',s} = r^T x^{π',s} − r^T x^{π,s} = v^{π'}_s − v^π_s.
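The per-state flux x^{π,s} and the identity x^π = Σ_s x^{π,s} can be illustrated concretely. The sketch below is ours (the 3-cycle instance and the per-action discounts are hypothetical); it solves the flux system of (7) for each source state under nonuniform discounts and also checks the first identity of the upcoming Lemma 4.3.

```python
import numpy as np

# Per-state flux of LP (7) on a hypothetical 3-state cycle with
# nonuniform, per-action discounts gamma_a (all values illustrative).
n = 3
gamma_a = np.array([0.9, 0.8, 0.7])   # discount of the action used in each state
P_pi = np.zeros((n, n))               # P_pi[s_next, s] = 1 when pi moves s to s_next
for s in range(n):
    P_pi[(s + 1) % n, s] = 1.0

# With per-action discounts, mass leaving state s is scaled by gamma_a[s].
D = P_pi @ np.diag(gamma_a)

# x^{pi,s}: flux when the single initial unit of mass is placed on s alone.
x_from = {s: np.linalg.solve(np.eye(n) - D, np.eye(n)[s]) for s in range(n)}

# The aggregate flux of Definition 2.2 is the sum of the per-state fluxes.
x_total = np.linalg.solve(np.eye(n) - D, np.ones(n))
assert np.allclose(sum(x_from.values()), x_total)

# First identity of Lemma 4.3: the flux from s through its own action is
# 1/(1 - gamma_C), where gamma_C is the product of discounts on the cycle.
gamma_C = np.prod(gamma_a)
assert np.allclose([x_from[s][s] for s in range(n)], 1.0 / (1.0 - gamma_C))
```

Separating the flux by source state in this way is exactly what makes the milestone argument of this section possible: each v_s can be tracked through its own flux vector.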
We now define the intervals in which the flux must lie. As in Section 3, flux on paths is in [1, n]. Let C be a cycle in some policy, and γ_C = Π_{a∈C} γ_a be the total discount of C. We will prove that the smallest discount in C determines the rough order of magnitude of the flux through C.

Definition 4.2. Let C be a cycle and a an action in C. Then the discount of a dominates the discount of C if γ_a ≤ γ_{a'} for all a' ∈ C.

Lemma 4.3. Let π be a policy containing the cycle C with discount dominated by γ_a and total discount γ_C. Let s be a state on C, a_s the action used in s, and a' an arbitrary action in C. Then x^{π,s}_{a_s} = 1/(1 − γ_C), γ_C/(1 − γ_C) ≤ x^{π,s}_{a'} ≤ 1/(1 − γ_C), and 1/(n(1 − γ_a)) ≤ 1/(1 − γ_C) ≤ 1/(1 − γ_a).

Proof. For the first equality, all flux originates at s, so the flux through a_s either just originated in s or came around the cycle from s, implying x^{π,s}_{a_s} = 1 + γ_C x^{π,s}_{a_s}. An analogous equation holds for all other actions a' on C, but now the initial flow from s may have been discounted by at most γ_C before reaching a', giving γ_C/(1 − γ_C) ≤ x^{π,s}_{a'} ≤ 1/(1 − γ_C). The upper bound in the final inequality, 1/(1 − γ_C) ≤ 1/(1 − γ_a), holds since γ_a is a factor of γ_C (γ_a dominates the discount of C), so γ_C ≤ γ_a. For the lower bound, let l = 1 − γ_a. Then

    γ_C ≥ γ_a^n = (1 − l)^n ≥ 1 − nl = 1 − n(1 − γ_a),

implying 1/(1 − γ_C) ≥ 1/(n(1 − γ_a)).

Flux on paths still falls in [1, n], so the algorithm behaves the same on paths as it did in the uniform case:

Lemma 4.4. Either the algorithm finishes or a new cycle is created after O(n^2 m log n) iterations.

Proof. This is identical to the proof of Lemma 3.4, which depends on Lemmas 3.2 and 3.3. Lemma 3.2 holds for nonuniform discounts, and Lemma 3.3 holds after adjusting Equation (6) as follows:

    (r̄^{π'})^T x^{π''} ≥ Σ_{s∈C} (v^{π''}_s − v^{π'}_s) ≥ Σ_{a∈C} −r̄^{π''}_a/(1 − γ_C) ≥ (r̄^π)^T x^{π''}/n^2,

using that Σ_{a∈C} −r̄^{π''}_a · n/(1 − γ_C) ≥ Σ_{a∈C} −r̄^{π''}_a x^π_a ≥ (r̄^π)^T x^{π''}/n and Lemma 4.3.

Now suppose the simplex method updates the action for state s in policy π and creates a cycle dominated by γ_a. Again, v_s may not improve much, since there may be a cycle with discount much larger than γ_a.
However, in any policy π' where s is on a cycle dominated by γ_a and s uses some action a', we have 1/(n(1 − γ_a)) ≤ x^{π',s}_{a'} ≤ 1/(1 − γ_a), which allows us to argue that v_s has made progress toward the highest value achievable when it is on a cycle dominated by γ_a, and after enough such progress has been made, v_s will beat this value and never again appear on any cycle dominated by γ_a. The optimal value achievable for each state on a cycle dominated by each γ_a serves as the above-mentioned milestone. Since all cycles are dominated by some γ_a, there are m milestones per state.

Lemma 4.5. Suppose the simplex method moves from π to π' by updating the action for state s, creating a new cycle C with discount dominated by γ_a for some a in π'. Let π'' be the final policy used by the simplex method in which s is in a cycle dominated by γ_a. Then v^{π''}_s − v^{π'}_s ≤ (1 − 1/n^2)(v^{π''}_s − v^π_s).
Proof. Let $\Delta = \max_a r^{\pi}_a$ be the value of the highest gain with respect to $\pi$. Any cycle contains at most $n$ actions, each of which has gain at most $\Delta$ in $r^{\pi}$, so if $s$ is on a cycle dominated by $\gamma_a$ in $\pi''$ then by Lemmas 4.3 and 4.1, $v^{\pi''}_s - v^{\pi}_s \le n\Delta/(1-\gamma_a)$, and since $\pi'$ creates a cycle dominated by $\gamma_a$, by the same lemmas $v^{\pi'}_s \ge v^{\pi}_s + \Delta/(n(1-\gamma_a))$. Combining the two,
$$v^{\pi''}_s - v^{\pi'}_s = (v^{\pi''}_s - v^{\pi}_s) - (v^{\pi'}_s - v^{\pi}_s) \le (v^{\pi''}_s - v^{\pi}_s) - \frac{\Delta}{n(1-\gamma_a)} \le \left(1 - \frac{1}{n^2}\right)(v^{\pi''}_s - v^{\pi}_s).$$

The following lemma is the crux of our analysis and allows us to eliminate actions when we get close to a milestone value. This occurs because the positive gains must shrink or else the algorithm would surpass the milestone, and as the positive gains shrink they can no longer balance larger negative gains, forcing such actions out of the cycle.

Lemma 4.6. Suppose policy $\pi$ contains a cycle $C$ with discount dominated by $\gamma_a$ and $s$ is a state in $C$. There is some action $a'$ in $C$ (depending on $s$) such that after $O(n^2 \log n)$ iterations that change the action for $s$ and create a cycle with discount dominated by $\gamma_a$, action $a'$ will never again appear in a cycle dominated by $\gamma_a$.

Proof. Let $\pi$ be a policy containing a cycle $C$ with discount dominated by $\gamma_a$ and $s$ a state in $C$. Let $\pi'$ be another policy where $s$ is on a cycle dominated by $\gamma_a$ after at least $1 + \log_{n^2/(n^2-1)} n^5 = O(n^2 \log n)$ iterations that create such a cycle by changing the action for $s$, and $\pi''$ the final policy used by the algorithm in which $s$ is on a cycle dominated by $\gamma_a$. Consider the policy $\hat{\pi}$ in the iteration immediately preceding $\pi'$. By Lemma 4.5 and the choice of $\pi'$,
$$v^{\pi''}_s - v^{\hat{\pi}}_s \le \left(1 - \frac{1}{n^2}\right)^{\log_{n^2/(n^2-1)} n^5} (v^{\pi''}_s - v^{\pi}_s) = \frac{1}{n^5}(v^{\pi''}_s - v^{\pi}_s),$$
or equivalently $v^{\pi''}_s - v^{\pi}_s \ge n^5 (v^{\pi''}_s - v^{\hat{\pi}}_s)$, implying
$$v^{\pi}_s - v^{\hat{\pi}}_s = -(v^{\pi''}_s - v^{\pi}_s) + (v^{\pi''}_s - v^{\hat{\pi}}_s) \le (-n^5 + 1)(v^{\pi''}_s - v^{\hat{\pi}}_s). \qquad (8)$$

Since the gap $v^{\pi}_s - v^{\hat{\pi}}_s$ is large and negative, there must be highly negative gains in $r^{\hat{\pi}}$. By Lemma 4.1, $v^{\pi}_s - v^{\hat{\pi}}_s = (r^{\hat{\pi}})^T x^{\pi,s}$. Let $r^{\hat{\pi}}_{a'} = \min_{a'' \in C} r^{\hat{\pi}}_{a''}$ and $s'$ be the state using $a'$. By Lemma 4.3, $x^{\pi,s}_{a''} \le 1/(1-\gamma_a)$ for each action $a''$, and $C$ has at most $n$ states, so applying Equation (8),
$$r^{\hat{\pi}}_{a'} \le \frac{1-\gamma_a}{n}\left(v^{\pi}_s - v^{\hat{\pi}}_s\right) \le \frac{1-\gamma_a}{n}(-n^5 + 1)(v^{\pi''}_s - v^{\hat{\pi}}_s). \qquad (9)$$

The positive entries in $r^{\hat{\pi}}$ must all be small, since there is only a small increase in the value of $s$. Let $\Delta = \max r^{\hat{\pi}}$. The algorithm pivots on the highest gain, and by assumption it updates the action for $s$ and creates a cycle dominated by $\gamma_a$. By Lemma 4.3, the new action is used at least $1/(n(1-\gamma_a))$ times by flux from $s$, since it is the first action in the cycle, so
$$\frac{\Delta}{n(1-\gamma_a)} \le v^{\pi'}_s - v^{\hat{\pi}}_s \le v^{\pi''}_s - v^{\hat{\pi}}_s. \qquad (10)$$

We prove that the highly negative $r^{\hat{\pi}}_{a'}$ cannot coexist with only small positive gains bounded by $\Delta$. Consider any policy in which $s'$ is on a cycle $C'$ containing $a'$ (but not necessarily containing $s$) with discount $\gamma_{C'}$ dominated by $\gamma_a$. By Lemma 4.3, there is at least $1/(1-\gamma_{C'}) \ge 1/(n(1-\gamma_a))$ flux from $s'$ going through $a'$, and in the rest of the cycle there are at most $n-1$ other actions with at most $1/(1-\gamma_{C'}) \le 1/(1-\gamma_a)$ flux. The highest gain with respect to $\hat{\pi}$ is $\Delta$, so the value of $v_{s'}$ relative to $r^{\hat{\pi}}$ is at most
$$\frac{r^{\hat{\pi}}_{a'}}{n(1-\gamma_a)} + \frac{n\Delta}{1-\gamma_a} \le \left(-n^3 + \frac{1}{n^2}\right)(v^{\pi''}_s - v^{\hat{\pi}}_s) + n^2 (v^{\pi''}_s - v^{\hat{\pi}}_s) \le \left(-n^3 + \frac{1}{n^2} + n^2\right)(v^{\pi''}_s - v^{\hat{\pi}}_s) < 0$$
using Equations (9) and (10). But $v^{\hat{\pi}}_{s'} = 0$ relative to $r^{\hat{\pi}}$, and it only increases in future iterations, so $a'$ cannot appear again in a cycle dominated by $\gamma_a$.

Lemma 4.7. For any action $a$, there are at most $O(n^3 m \log n)$ iterations that create a cycle with discount dominated by $\gamma_a$.

Proof. After $O(n^3 \log n)$ iterations that create a cycle dominated by $\gamma_a$, some state must have been updated in $O(n^2 \log n)$ of those iterations, so by Lemma 4.6 some action will never appear again in a cycle dominated by $\gamma_a$. After $m$ repetitions of this process all actions have been eliminated.

Theorem 4.8. Simplex terminates in at most $O(n^5 m^3 \log^2 n)$ iterations on deterministic MDPs with nonuniform discounts using the highest gain pivoting rule.

Proof. There are $O(m)$ possible discounts $\gamma_a$ that can dominate a cycle, and by Lemma 4.7 there are at most $O(n^3 m \log n)$ iterations creating a cycle dominated by any particular $\gamma_a$, for a total of $O(n^3 m^2 \log n)$ iterations that create a cycle. By Lemma 4.4 a new cycle is created every $O(n^2 m \log n)$ iterations, for a total of $O(n^5 m^3 \log^2 n)$ iterations overall.

5 Open problems

A difficult but natural next step would be to try to extend these techniques to handle policy iteration on deterministic MDPs. The main problem encountered is that the multiple simultaneous pivots used in policy iteration can interfere with each other in such a way that the algorithm effectively pivots on the smallest improving switch rather than the largest. See [HZ10] for such an example. Another challenging open question is to design a strongly polynomial algorithm for general MDPs.
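The single-switch dynamics discussed in this paper are easy to experiment with. Below is a minimal illustrative sketch of ours, not code from the paper, and all identifiers are made up: the simplex method with the highest-gain pivoting rule on a deterministic discounted MDP, where each action is a (state, next state, reward, discount) tuple, policy values are computed by fixed-point iteration, and each step switches the single action of highest positive reduced cost.

```python
# Illustrative sketch (not from the paper): the simplex method with the
# highest-gain pivoting rule on a deterministic discounted MDP.
# An action is a tuple (state, next_state, reward, discount); a policy
# chooses one action index per state.

def policy_values(policy, actions, n, tol=1e-12):
    """Solve v[s] = r + gamma * v[t] along the chosen actions by
    fixed-point iteration (converges since every discount is < 1)."""
    v = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            _, t, r, g = actions[policy[s]]
            new = r + g * v[t]
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v

def simplex_highest_gain(actions, n):
    """Repeatedly switch the single action of highest positive gain
    (reduced cost) until no improving action remains."""
    # start from an arbitrary policy: the first action listed per state
    policy = {}
    for i, (s, _, _, _) in enumerate(actions):
        policy.setdefault(s, i)
    while True:
        v = policy_values(policy, actions, n)
        best, best_gain = None, 1e-9    # small threshold avoids numerical cycling
        for i, (s, t, r, g) in enumerate(actions):
            gain = r + g * v[t] - v[s]  # reduced cost of action i
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:                # no improving switch: optimal policy
            return policy, v
        policy[actions[best][0]] = best  # the single-switch pivot

# Toy 2-state instance: state 1 loops on itself with reward 2, so
# v(1) = 2/(1 - 0.9) = 20; state 0 is best off jumping to state 1.
ACTIONS = [
    (0, 0, 1.0, 0.9),  # stay at 0, reward 1  -> value 10 if kept
    (0, 1, 0.0, 0.9),  # jump to 1            -> value 0.9 * 20 = 18
    (1, 1, 2.0, 0.9),  # stay at 1, reward 2  -> value 20
    (1, 0, 0.0, 0.9),  # jump to 0
]
POLICY, V = simplex_highest_gain(ACTIONS, 2)
```

On this toy instance the method pivots exactly once: state 0 abandons its self-loop (value $1/(1-0.9) = 10$) for the action joining state 1's cycle, after which no action has positive gain.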
Finally, we believe the technique of dividing variable values into polynomially sized layers may be helpful for entirely different problems.

Acknowledgments. The authors would like to thank Kazuhisa Makino for pointing out an error in Lemma 3.3.

References

[Bel57] Richard E. Bellman. Dynamic Programming. Princeton University Press, 1957.

[Ber96] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1996.
[Fea10] John Fearnley. Exponential lower bounds for policy iteration. In Automata, Languages and Programming, volume 6199 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2010.

[FHZ11] Oliver Friedmann, Thomas Dueholm Hansen, and Uri Zwick. Subexponential lower bounds for randomized pivoting rules for the simplex algorithm. In Proc. 43rd Symposium on Theory of Computing, STOC '11. ACM, 2011.

[Fri09] Oliver Friedmann. An exponential lower bound for the parity game strategy improvement algorithm as we know it. In Proc. 24th Logic In Computer Science, LICS '09, 2009.

[Fri11] Oliver Friedmann. A subexponential lower bound for Zadeh's pivoting rule for solving linear programs and games. In Integer Programming and Combinatorial Optimization, volume 6655 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2011.

[HMZ11] Thomas Dueholm Hansen, Peter Bro Miltersen, and Uri Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. In ICS, 2011.

[HN94] Dorit S. Hochbaum and Joseph (Seffi) Naor. Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM Journal on Computing, 23:1179, 1994.

[How60] Ronald Howard. Dynamic Programming and Markov Decision Processes. MIT Press, Cambridge, 1960.

[HZ10] Thomas Hansen and Uri Zwick. Lower bounds for Howard's algorithm for finding minimum mean-cost cycles. In Otfried Cheong, Kyung-Yong Chwa, and Kunsoo Park, editors, Algorithms and Computation, volume 6506 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2010.

[LDK95] Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling. On the complexity of solving Markov decision problems. In Proc. 11th Uncertainty in Artificial Intelligence, UAI '95, 1995.

[Mad02] Omid Madani. On policy iteration as a Newton's method and polynomial policy iteration algorithms. In Proc. 18th National Conference on Artificial Intelligence, 2002.

[MC94] Mary Melekopoglou and Anne Condon. On the complexity of the policy improvement algorithm for Markov decision processes. ORSA Journal on Computing, 6(2), 1994.

[MS99] Yishay Mansour and Satinder Singh. On the complexity of policy iteration. In Proc. 15th Uncertainty in Artificial Intelligence, UAI '99, 1999.

[MTZ10] Omid Madani, Mikkel Thorup, and Uri Zwick. Discounted deterministic Markov decision processes and discounted all-pairs shortest paths. ACM Transactions on Algorithms (TALG), 6(2):33:1-33:25, 2010.

[PT87] Christos Papadimitriou and John N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), August 1987.

[Put94] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, NY, USA, 1994.

[Ye05] Yinyu Ye. A new complexity result on solving the Markov decision problem. Mathematics of Operations Research, 30(3), August 2005.

[Ye11] Yinyu Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4), November 2011.
More informationPerformance Evaluation
Performance Evaluation 95 (206) 40 Content lit available at ScienceDirect Performance Evaluation journal homepage: www.elevier.com/locate/peva Optimal cheduling in call center with a callback option Benjamin
More informationMath 273 Solutions to Review Problems for Exam 1
Math 7 Solution to Review Problem for Exam True or Fale? Circle ONE anwer for each Hint: For effective tudy, explain why if true and give a counterexample if fale (a) T or F : If a b and b c, then a c
More informationThe Impact of Imperfect Scheduling on Cross-Layer Rate. Control in Multihop Wireless Networks
The mpact of mperfect Scheduling on Cro-Layer Rate Control in Multihop Wirele Network Xiaojun Lin and Ne B. Shroff Center for Wirele Sytem and Application (CWSA) School of Electrical and Computer Engineering,
More informationONLINE APPENDIX: TESTABLE IMPLICATIONS OF TRANSLATION INVARIANCE AND HOMOTHETICITY: VARIATIONAL, MAXMIN, CARA AND CRRA PREFERENCES
ONLINE APPENDIX: TESTABLE IMPLICATIONS OF TRANSLATION INVARIANCE AND HOMOTHETICITY: VARIATIONAL, MAXMIN, CARA AND CRRA PREFERENCES CHRISTOPHER P. CHAMBERS, FEDERICO ECHENIQUE, AND KOTA SAITO In thi online
More information