Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Topics in Probability Theory and Stochastic Processes
Steven R. Dunbar

Worst Case and Average Case Behavior of the Simplex Algorithm

Rating: Mathematicians Only: prolonged scenes of intense rigor.
Question of the Day

1. What would be the implications of knowing that the worst-case running time of the Simplex Algorithm is exponential in the problem size?
2. What would knowing the average running time of the Simplex Algorithm tell us about actual practice?

Key Concepts

Vocabulary
Mathematical Ideas

The Klee-Minty Example

In 1972, Klee and Minty [3] considered the linear optimization problem:

    max y_n
    subject to  0 ≤ y_1 ≤ 1,
                ε y_{j−1} ≤ y_j ≤ 1 − ε y_{j−1},   j = 2, …, n,
                y_j ≥ 0,   j = 1, …, n,

where ε ∈ (0, 1/2). The change of variables

    x_1 = y_1,   x_j = (y_j − ε y_{j−1})/ε^{j−1},   j = 2, …, n

transforms the problem to

    max Σ_{j=1}^{n} x_j
    subject to  2 Σ_{k=1}^{j−1} x_k + x_j ≤ (1/ε)^{j−1},   1 ≤ j ≤ n,
                x_j ≥ 0,   1 ≤ j ≤ n,

which is now in canonical form.

Proof. Note that the change of variables

    x_1 = y_1,   x_j = (y_j − ε y_{j−1})/ε^{j−1}

is equivalent to the inverse change of variables

    y_1 = x_1,   y_j = ε^{j−1} Σ_{k=1}^{j} x_k,   1 ≤ j ≤ n.

This may be seen by writing each linear change of variables in matrix form and verifying that the corresponding matrices are inverses of each other. Alternatively, substitute the expression for y_j in terms of the x_k into the expression for x_j in terms of the y_j. Obtain

    x_j = ( ε^{j−1} Σ_{k=1}^{j} x_k − ε · ε^{j−2} Σ_{k=1}^{j−1} x_k ) / ε^{j−1} = x_j,

verifying the inverse relation. Then the objective function max y_n becomes max ε^{n−1} Σ_{j=1}^{n} x_j, which is equivalent to the rescaled max Σ_{j=1}^{n} x_j. The constraint ε y_{j−1} ≤ y_j ≤ 1 − ε y_{j−1} becomes

    ε · ε^{j−2} Σ_{k=1}^{j−1} x_k ≤ ε^{j−1} Σ_{k=1}^{j} x_k ≤ 1 − ε · ε^{j−2} Σ_{k=1}^{j−1} x_k.

This can be rewritten as

    0 ≤ ε^{j−1} x_j ≤ 1 − 2 ε^{j−1} Σ_{k=1}^{j−1} x_k.

Since all variables are required to be non-negative, the inequality 0 ≤ ε^{j−1} x_j is automatic, so the constraint becomes

    2 Σ_{k=1}^{j−1} x_k + x_j ≤ (1/ε)^{j−1}.

This proof is the solution of Problem 8.5, page 427 of Bazaraa, Jarvis, Sherali [1].

If we pivot using Dantzig's rule, in which the most negative element in the vector of modified costs determines the entering variable, then the Simplex Algorithm will go through 2^n − 1 pivots on this example. The fairly straightforward but tedious proof is in [1]. The idea is that for this canonical problem the Simplex Algorithm systematically traces through all 2^n vertices of a feasible set which is a distorted cube: first the 2^{n−1} vertices on the face in the y_n = 0 plane, then subsequently the 2^{n−1} vertices on the opposite face.

Following [2], another formulation of the Klee-Minty example, due to Chvátal, is achieved with the change of variables

    x_1 = y_1,   x_j = (y_j − ε y_{j−1})/ε^{2(j−1)},   j = 2, …, n.

When ε = 1/10 the problem becomes

    max Σ_{j=1}^{n} 10^{n−j} x_j
    subject to  2 Σ_{k=1}^{j−1} 10^{j−k} x_k + x_j ≤ 100^{j−1},   1 ≤ j ≤ n,
                x_j ≥ 0,   1 ≤ j ≤ n.

For example, the n = 3 case is

    max 100 x_1 + 10 x_2 + x_3
    subject to  x_1 ≤ 1,
                20 x_1 + x_2 ≤ 100,
                200 x_1 + 20 x_2 + x_3 ≤ 10,000,
                x_1, x_2, x_3 ≥ 0,

which is quite similar to the first linear optimization problem up to a change of scale. This formulation makes the distortion of the unit cube more evident. The initial basic feasible solution is (0, 0, 0, 1, 100, 10,000).

According to Bazaraa, Jarvis and Sherali [1], in 1973 Jeroslow showed the existence of problem classes that take an exponential number of pivots with the maximum-improvement entering criterion as well. According to Karloff [2], Chvátal proved in 1978 that the Simplex Algorithm requires exponentially many pivots in the worst case if Bland's pivoting selection rule is used. The root of the problem is the local viewpoint of the Simplex Algorithm, whereby decisions are based on the local combinatorial structure of the polytope and the motion is restricted to an edge path. According to Spielman and Teng [5], almost all existing pivot rules are known to have exponential worst-case complexity.

Empirical Results

Karloff [2] claims that in practice the number of pivots for the Simplex Algorithm seems to be between 4m and 6m for Phase I and Phase II together. Rarely does either phase require more than 10m pivots. As n grows, the number of pivots seems to grow slowly, perhaps logarithmically with n.
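The exponential pivot count for the Klee-Minty example above can be checked numerically. The following is a minimal sketch (not taken from the sources cited here) of a dense-tableau Simplex Algorithm using Dantzig's entering rule, run on the ε = 1/10 formulation with n = 3; the function name, tolerances, and the assumption that the problem is bounded are my own choices.

```python
import numpy as np

def dantzig_simplex_pivots(A, b, c):
    """Count pivots of the Simplex Algorithm with Dantzig's rule
    (most negative reduced cost enters) on max c.x s.t. Ax <= b, x >= 0,
    starting from the all-slack basic feasible solution.
    Assumes the problem is bounded (true for Klee-Minty)."""
    m, n = A.shape
    # Tableau columns: [original variables | slack variables | RHS]
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[-1, :n] = -c                  # objective row for maximization
    pivots = 0
    while True:
        j = int(np.argmin(T[-1, :-1]))
        if T[-1, j] >= -1e-9:       # no negative reduced cost: optimal
            return pivots
        col = T[:m, j]
        ratios = np.full(m, np.inf)
        mask = col > 1e-9
        ratios[mask] = T[:m, -1][mask] / col[mask]
        i = int(np.argmin(ratios))  # minimum-ratio leaving row
        T[i] /= T[i, j]
        for r in range(m + 1):
            if r != i:
                T[r] -= T[r, j] * T[i]
        pivots += 1

# Klee-Minty with epsilon = 1/10, n = 3
A = np.array([[1.0, 0, 0], [20, 1, 0], [200, 20, 1]])
b = np.array([1.0, 100, 10000])
c = np.array([100.0, 10, 1])
print(dantzig_simplex_pivots(A, b, c))   # 2^3 - 1 = 7 pivots
```

Tracing the tableau by hand confirms that the algorithm visits all 8 vertices of the distorted cube, reaching the optimum (0, 0, 10,000) after 7 pivots.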
Bazaraa, Jarvis and Sherali [1] claim that the Simplex Algorithm has been empirically observed to take roughly 3m/2 iterations, and seldom more than 3m iterations. By regression analysis, the performance of the Simplex Algorithm is described by a formula of the form K m^{2.5} n d^{0.33}, where d is the density of the matrix A. The Mathworld.com "Simplex Method" article says the simplex method "is very efficient in practice, generally taking 2m to 3m iterations at most (where m is the number of equality constraints)", citing references.

All these results point to the observation that, practically speaking, not only does the Simplex Algorithm run in polynomial time, it runs in a small multiple of the number of constraints! The question now naturally arises whether there is an explanation for this practical effective running time, in spite of the fact that exceptional examples actually take a number of pivots exponential in the number of variables.

A Markov Chain Model of the Simplex Algorithm

Consider the following linear optimization problem in standard form:

    min c x
    subject to A x = b, x ≥ 0.

Here A is an m × n matrix of real numbers, c = (c_1, …, c_n) and b = (b_1, …, b_m) are constant vectors of real numbers, and x = (x_1, …, x_n) is the vector of non-negative values that is to be determined to minimize the linear objective c x = Σ_{j=1}^{n} c_j x_j. We know from the elementary theory of linear optimization that the minimum occurs at an extreme point of the feasible region A x = b, x ≥ 0. Under the natural assumptions that n > m and that the matrix A is of full rank, we can find feasible extreme-point solutions of the constraint set by setting n − m of the entries in x to 0. Note that there can be as many as

    N = C(n, m) = n! / ( m! (n − m)! )

possible extreme points.
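To get a feeling for how quickly the count of possible extreme points grows, here is a short sketch using the standard-library binomial coefficient; the particular values of m and n are arbitrary illustrative choices.

```python
import math

# Number of possible basic solutions N = C(n, m) for a few problem sizes;
# the (m, n) pairs are arbitrary illustrative values.
for m, n in [(10, 20), (25, 50), (50, 100)]:
    print(m, n, math.comb(n, m))
```

Already at m = 50, n = 100 the count C(100, 50) exceeds 10^29, so any argument that the Simplex Algorithm is fast in practice cannot rest on the number of extreme points being small.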
A Heuristic Markov Chain Model

Here is a simple Markov chain model for how the Simplex Algorithm moves from extreme point to extreme point on the feasible region. Suppose that if at some time the algorithm is at the i-th best extreme point, then after the next pivot operation of the Simplex Algorithm the resulting extreme point is equally likely to be any of the remaining i − 1 best. This is a probability model that covers all pivoting rules. That is, we model the transition from extreme point to subsequent extreme point as a Markov chain for which

    P_{11} = 1   and   P_{ij} = 1/(i − 1),   j = 1, …, i − 1;  1 < i ≤ N,

and naturally P_{ij} = 0 for all other indices. Let T_i denote the number of transitions needed to go from state i to state 1. Obtain a recursive formula for E[T_i] by conditioning on the initial transition:

    E[T_i] = 1 + (1/(i − 1)) Σ_{j=1}^{i−1} E[T_j].

Note that E[T_1] = 0. Then recursively,

    E[T_2] = 1 + (0) = 1,
    E[T_3] = 1 + (1/2)(0 + 1) = 1 + 1/2,
    E[T_4] = 1 + (1/3)(0 + 1 + 1 + 1/2) = 1 + 1/2 + 1/3,
    E[T_5] = 1 + (1/4)(0 + 1 + 1 + 1/2 + 1 + 1/2 + 1/3) = 1 + 1/2 + 1/3 + 1/4.

Then we can inductively guess and prove that

    E[T_i] = Σ_{k=1}^{i−1} 1/k = H_{i−1},

where H_n = Σ_{k=1}^{n} 1/k is the n-th harmonic number.

Proof. Use the strong induction hypothesis that E[T_j] = H_{j−1} for all j < i.
Then

    E[T_i] = 1 + (1/(i − 1)) Σ_{j=1}^{i−1} E[T_j]
           = 1 + (1/(i − 1)) Σ_{j=1}^{i−1} H_{j−1}
           = 1 + (1/(i − 1)) Σ_{j=1}^{i−2} H_j
           = 1 + (1/(i − 1)) Σ_{j=1}^{i−2} Σ_{k=1}^{j} 1/k
           = 1 + (1/(i − 1)) Σ_{k=1}^{i−2} (i − 1 − k)/k
           = 1 + (1/(i − 1)) [ (i − 1) H_{i−2} − (i − 2) ]
           = 1 + H_{i−2} − (i − 2)/(i − 1)
           = H_{i−2} + 1/(i − 1)
           = H_{i−1}.

From well-known inequalities bounding H_i (Mathworld.com, "Harmonic Number") we observe that

    log(i − 1) + γ + 1/(2i) < E[T_i] < log(i − 1) + γ + 1/(2(i − 1)).
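The recursion for E[T_i] and the harmonic-number formula can be checked in exact rational arithmetic; the function names in this sketch are my own.

```python
from fractions import Fraction

def expected_transitions(N):
    """E[T_i] from the recursion E[T_i] = 1 + (1/(i-1)) * sum_{j<i} E[T_j],
    with E[T_1] = 0, computed exactly with Fractions."""
    E = {1: Fraction(0)}
    for i in range(2, N + 1):
        E[i] = 1 + sum(E[j] for j in range(1, i)) / (i - 1)
    return E

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact Fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

E = expected_transitions(12)
print(E[5])                                               # 25/12 = 1 + 1/2 + 1/3 + 1/4
print(all(E[i] == harmonic(i - 1) for i in range(1, 13)))  # True
```

The exact agreement E[T_i] = H_{i−1} for every i tested illustrates the induction argument above.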
From that we can deduce that E[T_i] ≈ log(i) in general. Therefore

    E[T_N] = H_{N−1} ≈ log(N) ≈ m [1 + log(n/m − 1)]

by Stirling's approximation applied to C(n, m) (see below for the derivation), as an asymptotic bound on the expected number of pivot operations required. Note that this asymptotic result is roughly consistent with Karloff's observation above.

Approximate Distribution of the Number of Pivots

We can obtain an approximate distribution for T_N for large N using the Central Limit Theorem. Continue the assumption that if at some time the algorithm is at the i-th best extreme point, then after the next pivot operation of the Simplex Algorithm the resulting extreme point is equally likely to be any of the remaining i − 1 best. Let

    I_j = 1 if the Simplex Algorithm ever enters state j, and I_j = 0 otherwise,

be an indicator random variable. Then T_N = Σ_{j=1}^{N−1} I_j.

Proposition. I_1, …, I_{N−1} are independent and P[I_j = 1] = 1/j for 1 ≤ j ≤ N − 1.

Proof. Let j be fixed. Given I_{j+1}, …, I_{N−1}, let k = min{ i : i > j, I_i = 1 } indicate the lowest-numbered state greater than j that is entered. That is, we know that the process enters state k, and that the next state entered after k is one of the states 1, 2, …, j. Hence, as the next state entered from state k is equally likely to be any of the lower-numbered states 1, 2, …, k − 1, we see that

    P[I_j = 1 | I_{j+1}, …, I_{N−1}] = (1/(k − 1)) / (j/(k − 1)) = 1/j.

Hence P[I_j = 1] = 1/j, and independence follows since the preceding conditional probability does not depend on I_{j+1}, …, I_{N−1}.

Corollary.

1. E[I_j] = 1/j,
2. E[T_N] = Σ_{j=1}^{N−1} 1/j = H_{N−1},
3. Var[T_N] = Σ_{j=1}^{N−1} (1/j)(1 − 1/j),
4. For N large, T_N has approximately a normal distribution with mean log(N) and variance log(N).

Proof.

1. Automatic, since P[I_j = 1] = 1/j.
2. Automatic from the representation T_N = Σ_{j=1}^{N−1} I_j.
3. Automatic from the representation T_N = Σ_{j=1}^{N−1} I_j and the independence of the I_j.
4. Follows from the Central Limit Theorem (basic version, with independent random variables having finite variance) and the standard fact that log(N) < Σ_{j=1}^{N−1} 1/j < 1 + log(N − 1).

Note that if n, m and n − m are all large, then by Stirling's approximation we have

    N = C(n, m) ≈ n^{n + 1/2} / ( (n − m)^{(n−m) + 1/2} m^{m + 1/2} √(2π) ).

Therefore

    log(N) ≈ (m(n/m) + 1/2) log(m(n/m)) − (m(n/m − 1) + 1/2) log(m(n/m − 1)) − (m + 1/2) log(m),

or, dropping lower-order terms,

    log(N) ≈ m [ (n/m) log( (n/m)/(n/m − 1) ) + log(n/m − 1) ].

Now since lim_{x→∞} x log(x/(x − 1)) = 1, it follows that

    log(N) ≈ m [1 + log(n/m − 1)].
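Both conclusions can be illustrated with a short Monte Carlo sketch: simulate the Markov chain model and compare the sample mean of T_N with log N + γ, then compare the exact log C(n, m) (via math.lgamma) with the asymptotic m[1 + log(n/m − 1)]. The seed, sample size, and the particular (m, n) pair are arbitrary choices of mine.

```python
import math
import random

def pivots_to_optimum(N, rng):
    """One realization of T_N: starting from state N, each pivot moves
    to a state chosen uniformly among the i - 1 better ones, until state 1."""
    i, steps = N, 0
    while i > 1:
        i = rng.randrange(1, i)      # uniform on {1, ..., i-1}
        steps += 1
    return steps

rng = random.Random(7)
N = 10**6
sample = [pivots_to_optimum(N, rng) for _ in range(4000)]
mean = sum(sample) / len(sample)
gamma = 0.5772156649
print(mean, math.log(N) + gamma)     # both near 14.4

# Accuracy of log N ~ m [1 + log(n/m - 1)] for N = C(n, m):
m_, n_ = 50, 200
exact = math.lgamma(n_ + 1) - math.lgamma(m_ + 1) - math.lgamma(n_ - m_ + 1)
approx = m_ * (1 + math.log(n_ / m_ - 1))
print(exact, approx)                 # same leading order
```

Even for N = 10^6 possible extreme points, the model predicts only about 14 pivots on average, which is the heart of the average-case explanation sketched in this section.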
Sources

The information on the Klee-Minty example and the empirical observations of the number of pivots in practice is adapted from Bazaraa, Jarvis and Sherali [1] and Karloff [2]. The Markov chain model of the Simplex Algorithm is slightly adapted from Ross's Introduction to Probability Models [4].

Problems to Work for Understanding

Reading Suggestion:

References

[1] Mokhtar S. Bazaraa, John J. Jarvis, and Hanif D. Sherali. Linear Programming and Network Flows. Wiley-Interscience, 3rd edition, 2005.

[2] Howard Karloff. Linear Programming. Birkhäuser, 1991.

[3] Victor Klee and G. J. Minty. How good is the simplex algorithm? In O. Shisha, editor, Inequalities III, pages 159-175. Academic Press, 1972.

[4] Sheldon M. Ross. Introduction to Probability Models. Academic Press, 9th edition, 2006.

[5] Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis: An attempt to explain the behavior of algorithms in practice. Communications of the ACM, 52(10):77-84, October 2009.
Outside Readings and Links:

Steve Dunbar's Home Page, http://www.math.unl.edu/~sdunbar
Email to Steve Dunbar, sdunbar at unl dot edu