On the von Neumann and Frank-Wolfe Algorithms with Away Steps


Javier Peña    Daniel Rodríguez    Negar Soheili

July 16, 2015

Abstract

The von Neumann algorithm is a simple coordinate-descent algorithm to determine whether the origin belongs to a polytope generated by a finite set of points. When the origin is in the interior of the polytope, the algorithm generates a sequence of points in the polytope that converges linearly to zero. The algorithm's rate of convergence depends on the radius of the largest ball around the origin contained in the polytope. We show that under the weaker condition that the origin is in the polytope, possibly on its boundary, a variant of the von Neumann algorithm that includes away steps generates a sequence of points in the polytope that converges linearly to zero. The new algorithm's rate of convergence depends on a certain geometric parameter of the polytope that extends the above radius but is always positive. Our linear convergence result and geometric insights also extend to a variant of the Frank-Wolfe algorithm with away steps for minimizing a strongly convex function over a polytope.

Tepper School of Business, Carnegie Mellon University, USA, jfp@andrew.cmu.edu
Department of Mathematical Sciences, Carnegie Mellon University, USA, drod@cmu.edu
College of Business Administration, University of Illinois at Chicago, USA, nazad@uic.edu

1 Introduction

Assume A = [a_1 ... a_n] ∈ R^{m×n} with ‖a_i‖ = 1, i = 1,...,n. The von Neumann algorithm, communicated by von Neumann to Dantzig in the late 1940s, is a simple algorithm to solve the feasibility problem: Is 0 ∈ conv(A) = conv{a_1,...,a_n}? More precisely, the algorithm finds an approximate solution to the problem

    Ax = 0,  x ∈ Δ_{n-1} = {x ∈ R^n_+ : ‖x‖_1 = 1}.   (1)

The algorithm starts from an arbitrary point x_0 ∈ Δ_{n-1}. At the k-th iteration the algorithm updates the current trial solution x_k ∈ Δ_{n-1} as follows. First, it finds the column a_j of A that forms the widest angle with y_k := Ax_k. If this angle is acute, i.e., A^T y_k > 0, then the algorithm halts as the vector y_k separates the origin from conv(A). Otherwise the algorithm chooses x_{k+1} ∈ Δ_{n-1} so that Ax_{k+1} is the minimum-norm convex combination of Ax_k and a_j. Let e_j ∈ Δ_{n-1} denote the n-dimensional vector with j-th component equal to one and all other components equal to zero. To ease notation, we shall write ‖·‖ for ‖·‖_2 throughout the paper.

Von Neumann Algorithm

1. pick x_0 ∈ Δ_{n-1}; put y_0 := Ax_0; k := 0.
2. for k = 0, 1, 2, ...
       if A^T y_k > 0 then HALT: 0 ∉ conv(A)
       j := argmin_{i=1,...,n} ⟨a_i, y_k⟩;
       θ_k := argmin_{θ ∈ [0,1]} ‖y_k + θ(a_j − y_k)‖;
       x_{k+1} := (1 − θ_k) x_k + θ_k e_j;
       y_{k+1} := (1 − θ_k) y_k + θ_k a_j;
   end for

The von Neumann algorithm can be seen as a kind of coordinate-descent method for finding a solution to (1): at each iteration the algorithm judiciously selects a coordinate j and increases the weight of the j-th component of x_k while decreasing all of the others via a line-search step. Like other currently popular coordinate-descent and first-order methods for convex optimization, the main attractive features of the von Neumann algorithm are its simplicity and low computational cost per iteration. Another attractive feature is its convergence rate. Epelman and Freund [6] showed that the speed of convergence of the von Neumann algorithm can be characterized in terms of the following condition measure of the matrix A:

    ρ(A) := max_{y ∈ R^m: ‖y‖=1} min_{i=1,...,n} ⟨a_i, y⟩.   (2)
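As a concrete illustration, here is a minimal NumPy sketch of the algorithm above, assuming the columns of A have unit norm as stated; the function name von_neumann, the iteration cap, and the tolerance tol are illustrative choices rather than prescriptions.

```python
import numpy as np

def von_neumann(A, max_iter=1000, tol=1e-12):
    """Von Neumann algorithm for the feasibility problem Ax = 0, x in the simplex.

    A: m-by-n array whose columns a_i are assumed to have unit Euclidean norm.
    Returns (x, y, status) with y = A @ x; status is 'separated' if a
    certificate A^T y > 0 of 0 not in conv(A) was found.
    """
    m, n = A.shape
    x = np.zeros(n)
    x[0] = 1.0                           # start at a vertex of the simplex
    y = A @ x
    for _ in range(max_iter):
        g = A.T @ y                      # g[i] = <a_i, y>
        if np.all(g > 0):
            return x, y, 'separated'     # y separates the origin from conv(A)
        j = int(np.argmin(g))            # column forming the widest angle with y
        d = A[:, j] - y
        # exact line search for min over theta in [0,1] of ||y + theta*d||
        denom = d @ d
        theta = 0.0 if denom <= tol else np.clip(-(d @ y) / denom, 0.0, 1.0)
        x *= (1.0 - theta)
        x[j] += theta
        y = (1.0 - theta) * y + theta * A[:, j]
    return x, y, 'max_iter'
```

Each iteration costs essentially one matrix-vector product with A^T plus vector operations, which reflects the low per-iteration cost mentioned above.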

The condition measure ρ(A) was introduced by Goffin [8] and later independently studied by Cheung and Cucker [3]. The latter set of authors showed that ρ(A) is also a certain distance to ill-posedness in the spirit introduced and developed by Renegar [15, 16]. Observe that ρ(A) > 0 if and only if 0 ∉ conv(A), and ρ(A) < 0 if and only if 0 ∈ int(conv(A)). When ρ(A) > 0, this condition measure is closely related to the concept of margin in binary classification [19] and to the minimum enclosing ball problem in computational geometry [5]. The quantity ρ(A) also has the following geometric interpretation. If ρ(A) > 0 then

    ρ(A) = min{‖y‖ : y ∈ conv(A)},   (3)

and if ρ(A) ≤ 0 then

    −ρ(A) = max{r : ‖y‖ ≤ r ⇒ y ∈ conv(A)}.   (4)

In particular, |ρ(A)| = dist(0, ∂ conv(A)).

Epelman and Freund [6] showed the following properties of the von Neumann algorithm. When ρ(A) < 0 the algorithm generates iterates x_k ∈ Δ_{n-1}, k = 1, 2,... such that

    ‖Ax_k‖ ≤ (1 − ρ(A)²)^{k/2} ‖Ax_0‖.   (5)

On the other hand, the iterates x_k ∈ Δ_{n-1} also satisfy ‖Ax_k‖² ≤ 1/k as long as the algorithm has not halted. In particular, if ρ(A) > 0 then by (3) the algorithm must halt with a certificate of infeasibility A^T y_k > 0 for 0 ∉ conv(A) in at most 1/ρ(A)² iterations. The latter bound is identical to a classical convergence bound for the perceptron algorithm [2, 14]. This is not a coincidence as there is a nice duality between the perceptron and the von Neumann algorithms [13, 17].

We show that a variant of the von Neumann algorithm with away steps has the following stronger convergence properties. When 0 ∈ conv(A), possibly on its boundary, the algorithm generates a sequence x_k ∈ Δ_{n-1} satisfying

    ‖Ax_k‖ ≤ (1 − w(A)²/16)^{k/2} ‖Ax_0‖.   (6)

The quantity w(A) is a kind of relative width of conv(A) that is at least as large as ρ(A). However, unlike ρ(A), the relative width w(A) is positive for any non-zero matrix A ∈ R^{m×n} provided 0 ∈ conv(A). When ρ(A) > 0, or equivalently 0 ∉ conv(A), the von Neumann algorithm with away steps finds a certificate of infeasibility A^T y_k > 0 for 0 ∉ conv(A) in at most 8/ρ(A)² iterations.
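To get a feel for these bounds, the following small script computes the iteration counts they imply for a target accuracy ε; the values ρ(A) = 0.1 and w(A) = 0.1 are illustrative placeholders, not data from the paper.

```python
import math

def iters_linear(rate, eps):
    """Smallest k with rate**(k/2) <= eps, i.e. k >= 2*ln(1/eps)/ln(1/rate)."""
    return math.ceil(2 * math.log(1 / eps) / math.log(1 / rate))

def iters_sublinear(eps):
    """Smallest k with sqrt(1/k) <= eps, i.e. k >= 1/eps**2."""
    return math.ceil(1 / eps ** 2)

eps = 1e-3
print(iters_linear(1 - 0.1 ** 2, eps))        # bound (5) with rho(A) = 0.1
print(iters_linear(1 - 0.1 ** 2 / 16, eps))   # bound (6) with w(A) = 0.1
print(iters_sublinear(eps))                   # the ||Ax_k||^2 <= 1/k bound
```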

We show that a linear convergence result similar to (6) also holds for a version of the Frank-Wolfe algorithm with away steps for minimizing a strongly convex function with a Lipschitz gradient over a polytope. These linear convergence results are in the same spirit as the results established in [9, 10, 11] as well as some linear convergence results for the randomized Kaczmarz algorithm [18] and for the methods of randomized coordinate descent and iterated projections [12]. Our main contributions are the succinct and transparent proofs of these linear convergence results, which highlight the role of the relative width w(A) and a closely related restricted width ϱ(A). Our presentation unveils a deep connection between problem conditioning, as encompassed by the quantities w(A) and ϱ(A), and the behavior of the von Neumann and Frank-Wolfe algorithms with away steps. We also provide some lower bounds on w(A) and ϱ(A) in terms of certain radii quantities that naturally extend ρ(A). We note that the linear convergence results in [11] are stated in terms of a certain pyramidal width whose geometric intuition and properties appear to be less understood than those of w(A) and ϱ(A). We also note that during the review process of this manuscript we became aware of the related and independent work of Beck and Shtern [1]. In contrast to our geometric approach, the approach followed by Beck and Shtern is primarily founded on convex duality.

The rest of the paper is organized as follows. In Section 2 we describe a von Neumann Algorithm with Away Steps and establish its main convergence result in terms of the relative width w(A). Section 3 extends our main result to the more general problem of minimizing a quadratic function over the polytope conv(A). Section 4 presents the same ideas for more general strongly convex functions with Lipschitz gradient. Finally, Section 5 discusses some properties of the relative and restricted widths.

2 Von Neumann Algorithm with Away Steps

Throughout this section we assume A = [a_1 ... a_n] ∈ R^{m×n} with ‖a_i‖ = 1, i = 1,...,n. We next consider a variant of the von Neumann Algorithm that includes so-called away steps. To that end, at each iteration, in addition to a regular step the algorithm considers an alternative away step. Each of these away steps identifies an index l such that the l-th component of x_k is positive and decreases the weight of the l-th component of x_k. The algorithm needs to keep track of the support, that is, the set of positive entries of a vector.

Given x ∈ R^n_+, let the support of x be defined as S(x) := {i ∈ {1,...,n} : x_i > 0}.

Von Neumann Algorithm with Away Steps

1. pick x_0 ∈ Δ_{n-1}; put y_0 := Ax_0; k := 0.
2. for k = 0, 1, 2, ...
       if A^T y_k > 0 then HALT: 0 ∉ conv(A)
       j := argmin_{i=1,...,n} ⟨a_i, y_k⟩;  l := argmax_{i ∈ S(x_k)} ⟨a_i, y_k⟩;
       if ⟨y_k − a_j, y_k⟩ > ⟨a_l − y_k, y_k⟩ then (regular step)
           a := a_j − y_k;  u := e_j − x_k;  θ_max := 1
       else (away step)
           a := y_k − a_l;  u := x_k − e_l;  θ_max := (x_k)_l / (1 − (x_k)_l)
       endif
       θ_k := argmin_{θ ∈ [0,θ_max]} ‖y_k + θa‖;
       y_{k+1} := y_k + θ_k a;  x_{k+1} := x_k + θ_k u;
   end for

Define the relative width w(A) of conv(A) as

    w(A) := min_{x ≥ 0, Ax ≠ 0} max_{l,j} { ⟨Ax, a_l − a_j⟩/‖Ax‖ : l ∈ S(x), j ∈ {1,...,n} }.   (7)

It is easy to show that w(A) ≥ ρ(A) when 0 ∈ conv(A). In Section 5 below we discuss some properties of w(A). In particular, we will formally prove the intuitively clear property that w(A) > 0 for any nonzero matrix A ∈ R^{m×n} such that 0 ∈ conv(A).

We are now ready to state the main properties of the von Neumann algorithm with away steps.

Theorem 1 Assume x_0 ∈ Δ_{n-1} is one of the extreme points of Δ_{n-1}.

(a) If 0 ∈ conv(A) then the iterates x_k ∈ Δ_{n-1}, y_k = Ax_k, k = 0, 1,... generated by the von Neumann Algorithm with Away Steps satisfy

    ‖y_k‖ ≤ (1 − w(A)²/16)^{k/2} ‖y_0‖.

(b) The iterates x_k ∈ Δ_{n-1}, y_k = Ax_k, k = 1,... generated by the von Neumann Algorithm with Away Steps also satisfy ‖y_k‖² ≤ 8/k as long as the algorithm has not halted. In particular, if 0 ∉ conv(A) then the von Neumann Algorithm with Away Steps finds a certificate of infeasibility A^T y_k > 0 for 0 ∉ conv(A) in at most 8/ρ(A)² iterations.
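A minimal NumPy sketch of the away-step variant above, again assuming unit-norm columns, may help make the step selection and the step-size caps concrete; von_neumann_away, the iteration cap, and tol are illustrative names and parameters.

```python
import numpy as np

def von_neumann_away(A, max_iter=1000, tol=1e-12):
    """Von Neumann algorithm with away steps for Ax = 0, x in the simplex."""
    m, n = A.shape
    x = np.zeros(n)
    x[0] = 1.0                                  # start at an extreme point of the simplex
    y = A @ x
    for _ in range(max_iter):
        g = A.T @ y                             # g[i] = <a_i, y>
        if np.all(g > 0):
            return x, y, 'separated'            # certificate that 0 is not in conv(A)
        S = np.flatnonzero(x > tol)             # support of x
        j = int(np.argmin(g))                   # best regular (toward) atom
        l = int(S[np.argmax(g[S])])             # best away atom within the support
        regular = (y @ y - g[j]) > (g[l] - y @ y)   # compare <y - a_j, y> with <a_l - y, y>
        if regular:
            a, theta_max = A[:, j] - y, 1.0
        else:
            a = y - A[:, l]
            theta_max = x[l] / (1.0 - x[l]) if x[l] < 1.0 else np.inf
        denom = a @ a
        theta = 0.0 if denom <= tol else min(max(-(a @ y) / denom, 0.0), theta_max)
        if regular:                             # x <- x + theta*(e_j - x)
            x *= (1.0 - theta)
            x[j] += theta
        else:                                   # x <- x + theta*(x - e_l)
            x *= (1.0 + theta)
            x[l] -= theta
        y = y + theta * a
        x = np.maximum(x, 0.0)                  # guard against tiny negative round-off
    return x, y, 'max_iter'
```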

The crux of the proof of Theorem 1 is the following elementary lemma.

Lemma 1 Assume a, y ∈ R^m satisfy ⟨a, y⟩ < 0. Then

    min_{θ ≥ 0} ‖y + θa‖² = ‖y‖² − ⟨a, y⟩²/‖a‖²,

and the minimum is attained at θ = −⟨a, y⟩/‖a‖².

Proof of Theorem 1:

(a) The algorithm generates y_{k+1} by solving a problem of the form

    ‖y_{k+1}‖ = min_{θ ∈ [0,θ_max]} ‖y_k + θa‖

where a = a_j − y_k or a = y_k − a_l, and

    −⟨a, y_k⟩ ≥ (1/2)⟨a_l − a_j, y_k⟩ ≥ (1/2) w(A) ‖y_k‖.

If θ_k < θ_max then Lemma 1 applied to y := y_k yields

    ‖y_{k+1}‖² = ‖y_k‖² − ⟨a, y_k⟩²/‖a‖² ≤ (1 − w(A)²/16) ‖y_k‖²,

where the last step also uses ‖a‖ ≤ 2, which holds because a_j, a_l, and y_k all lie in the unit ball. Thus each time the algorithm performs an iterate with θ_k < θ_max, the value of ‖y_k‖² decreases at least by the factor 1 − w(A)²/16. To conclude, it suffices to show that after N iterations the number of iterates where θ_k < θ_max is at least N/2. To that end, we apply the following argument from [11]: observe that when θ_k = θ_max we have |S(x_{k+1})| < |S(x_k)|. On the other hand, when θ_k < θ_max we have |S(x_{k+1})| ≤ |S(x_k)| + 1. Since |S(x_0)| = 1 and |S(x)| ≥ 1 for every x ∈ Δ_{n-1}, after any number of iterates there must have been at least as many iterates with θ_k < θ_max as there have been iterates with θ_k = θ_max. Hence after N iterations, the number of iterates with θ_k < θ_max is at least N/2.

(b) Proceed as above but note that if the algorithm does not halt at the k-th iterate then ⟨a_j, y_k⟩ ≤ 0 and hence

    −⟨a, y_k⟩ ≥ ⟨y_k − a_j, y_k⟩ ≥ ‖y_k‖².

Thus each time the algorithm performs an iterate with θ_k < θ_max, we have

    ‖y_{k+1}‖² ≤ ‖y_k‖² − ⟨a, y_k⟩²/‖a‖² ≤ ‖y_k‖² − ‖y_k‖⁴/4.

It follows by induction that if the algorithm has not halted after k iterations then we must have ‖y_k‖² ≤ 8/k. If 0 ∉ conv(A) then ρ(A) = min{‖y‖ : y ∈ conv(A)} > 0 and so the algorithm must halt with a certificate of infeasibility A^T y_k > 0 for 0 ∉ conv(A) after at most 8/ρ(A)² iterations.
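The closed-form line search of Lemma 1, which drives the decrease estimates in the proof above, is easy to check numerically; the following small script compares it against a grid search on a random instance satisfying the hypothesis ⟨a, y⟩ < 0.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(5)
a = rng.standard_normal(5)
if a @ y >= 0:
    a = -a                                      # enforce the hypothesis <a, y> < 0

theta_star = -(a @ y) / (a @ a)                 # minimizer from Lemma 1
closed_form = y @ y - (a @ y) ** 2 / (a @ a)    # minimum value of ||y + theta*a||^2

thetas = np.linspace(0.0, 2.0 * theta_star, 10001)
grid_min = np.min(np.sum((y[None, :] + thetas[:, None] * a[None, :]) ** 2, axis=1))

print(abs(closed_form - grid_min) < 1e-6)       # True up to grid resolution
```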

3 Frank-Wolfe Algorithm with Away Steps for Quadratic Functions

Throughout this section assume A = [a_1 ... a_n] ∈ R^{m×n} is a nonzero matrix, and f(y) = (1/2)⟨y, Qy⟩ + ⟨b, y⟩ for a symmetric positive definite matrix Q ∈ R^{m×m} and b ∈ R^m. Consider the problem

    min_{y ∈ conv(A)} f(y)  ⇔  min_{x ∈ Δ_{n-1}} f(Ax).   (8)

Problem (1) can be seen as a special case of (8) when Q = I and b = 0. The von Neumann Algorithm can also be seen as a special case of the Frank-Wolfe Algorithm [7] for (8). This section extends the ideas and results from Section 2 to the following variant of the Frank-Wolfe algorithm with away steps. We note that this variant can be traced back to Wolfe [20] as discussed by Guélat and Marcotte [9].

Frank-Wolfe Algorithm with Away Steps

1. pick x_0 ∈ Δ_{n-1}; put y_0 := Ax_0; k := 0.
2. for k = 0, 1, 2, ...
       j := argmin_{i=1,...,n} ⟨a_i, ∇f(y_k)⟩;  l := argmax_{i ∈ S(x_k)} ⟨a_i, ∇f(y_k)⟩;
       if ⟨y_k − a_j, ∇f(y_k)⟩ > ⟨a_l − y_k, ∇f(y_k)⟩ then (regular step)
           a := a_j − y_k;  u := e_j − x_k;  θ_max := 1
       else (away step)
           a := y_k − a_l;  u := x_k − e_l;  θ_max := (x_k)_l / (1 − (x_k)_l)
       endif
       θ_k := argmin_{θ ∈ [0,θ_max]} f(y_k + θa);
       y_{k+1} := y_k + θ_k a;  x_{k+1} := x_k + θ_k u;
   end for

Observe that the computation of θ_k in the second-to-last step reduces to minimizing a one-dimensional convex quadratic function over the interval [0, θ_max].
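The following is a minimal NumPy sketch of this variant for the quadratic objective above, with the one-dimensional line search solved in closed form and clipped to [0, θ_max]; frank_wolfe_away_quadratic and tol are illustrative names, not prescriptions.

```python
import numpy as np

def frank_wolfe_away_quadratic(A, Q, b, max_iter=1000, tol=1e-12):
    """Frank-Wolfe with away steps for min f(Ax), f(y) = 0.5*<y,Qy> + <b,y>, x in simplex."""
    m, n = A.shape
    x = np.zeros(n)
    x[0] = 1.0
    y = A @ x
    for _ in range(max_iter):
        grad = Q @ y + b                        # gradient of f at y
        g = A.T @ grad                          # g[i] = <a_i, grad>
        S = np.flatnonzero(x > tol)
        j = int(np.argmin(g))
        l = int(S[np.argmax(g[S])])
        regular = (y @ grad - g[j]) > (g[l] - y @ grad)
        if regular:
            a, theta_max = A[:, j] - y, 1.0
        else:
            a = y - A[:, l]
            theta_max = x[l] / (1.0 - x[l]) if x[l] < 1.0 else np.inf
        # exact line search: minimize the 1-D quadratic theta -> f(y + theta*a) on [0, theta_max]
        curvature = a @ (Q @ a)
        slope = a @ grad
        theta = 0.0 if curvature <= tol else min(max(-slope / curvature, 0.0), theta_max)
        if regular:
            x *= (1.0 - theta)
            x[j] += theta
        else:
            x *= (1.0 + theta)
            x[l] -= theta
        y = y + theta * a
    return x, y
```

With Q = I and b = 0 this reduces, up to the stopping test, to the von Neumann variant of Section 2.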

We next present a general version of Theorem 1 for the above Frank-Wolfe Algorithm with Away Steps. The linear convergence result depends on a certain restricted width and diameter defined as follows. For x ≥ 0 with Ax ≠ 0 let

    ϱ(A, x) := sup{ λ > 0 : ∃ u, v ∈ Δ_{n-1}, S(u) ⊆ S(x), Au − Av = λ Ax/‖Ax‖ }.

Define the restricted width ϱ(A) and diameter d(A) of conv(A) as follows:

    ϱ(A) := min_x { ϱ(A, x) : x ≥ 0, Ax ≠ 0 },   (9)

and

    d(A) := max_{u,x ∈ Δ_{n-1}} ‖Ax − Au‖.   (10)

It is immediate from (7) and (9) that w(A) ≥ ϱ(A) for all nonzero A ∈ R^{m×n}. Furthermore, the restricted width ϱ(A) can be seen as an extension of the radius ρ(A) defined in (2). Indeed, when 0 ∈ int(conv(A)), we have span(A) = R^m. Hence (4) can alternatively be written as

    −ρ(A) = min_{x ≥ 0: Ax ≠ 0} max{ λ : ∃ v ∈ Δ_{n-1}, Av = −λ Ax/‖Ax‖ }.

This implies that ϱ(A, x) ≥ −ρ(A) + ‖Ax‖/‖x‖_1 for all x ≥ 0 with Ax ≠ 0. Hence the following inequality readily follows: ϱ(A) ≥ −ρ(A) = |ρ(A)|. Section 5 presents a stronger lower bound on ϱ(A) in terms of certain variants of ρ(A). In particular, we will show that ϱ(A) > 0, and consequently w(A) > 0, for any nonzero matrix A ∈ R^{m×n} such that 0 ∈ conv(A).

The linear convergence property of the von Neumann algorithm with away steps, as stated in Theorem 1(a), extends as follows.

Theorem 2 Assume x* ∈ Δ_{n-1} is a minimizer of (8). Let y* = Ax* and Ā := Q^{1/2}[a_1 − y* ... a_n − y*]. If x_0 ∈ Δ_{n-1} is one of the extreme points of Δ_{n-1} then the iterates x_k ∈ Δ_{n-1}, y_k = Ax_k generated by the Frank-Wolfe Algorithm with Away Steps satisfy

    f(y_k) − f(y*) ≤ (1 − ϱ(Ā)²/(4 d(Ā)²))^{k/2} (f(y_0) − f(y*)).   (11)
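The diameter d(A) and the inner maximization in the definition (7) of w(A) are straightforward to evaluate numerically; the sketch below is a minimal illustration under the definitions above. Since the maximum in (10) is attained at vertices of the simplex, d(A) is simply the largest pairwise distance between columns of A.

```python
import numpy as np

def diameter(A):
    """d(A) = max over u, x in the simplex of ||Ax - Au||, attained at pairs of columns."""
    diffs = A[:, :, None] - A[:, None, :]          # a_i - a_j for all pairs (i, j)
    return float(np.sqrt((diffs ** 2).sum(axis=0)).max())

def relative_width_at(A, x, tol=1e-12):
    """Inner maximum of (7) at a given x >= 0 with Ax != 0:
    max over l in S(x) and j in {1,...,n} of <Ax, a_l - a_j> / ||Ax||."""
    y = A @ x
    norm_y = np.linalg.norm(y)
    assert norm_y > tol, "Ax must be nonzero"
    g = A.T @ y                                    # g[i] = <a_i, Ax>
    S = np.flatnonzero(x > tol)
    return float((g[S].max() - g.min()) / norm_y)
```

Computing w(A) itself additionally requires minimizing over all admissible x, which is less immediate; the snippet only evaluates the inner maximum at a given x.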

The proof of Theorem 2 relies on the following two lemmas. The first one is similar to Lemma 1 and also follows via a straightforward calculation.

Lemma 2 Assume f is as above and a, y ∈ R^m satisfy ⟨a, ∇f(y)⟩ < 0. Then

    min_{θ ≥ 0} f(y + θa) = f(y) − ⟨a, ∇f(y)⟩²/(2⟨a, Qa⟩),

and the minimum is attained at θ = −⟨a, ∇f(y)⟩/⟨a, Qa⟩.

Lemma 3 Assume f, A, y*, Ā are as in Theorem 2 above. Then for all x ∈ Δ_{n-1}

    max_{l ∈ S(x), j=1,...,n} ⟨∇f(Ax), a_l − a_j⟩ ≥ ϱ(Ā) √(2(f(Ax) − f(y*))).

Proof: Let y := Ax ∈ conv(A). Assume y ≠ y* as otherwise there is nothing to show. For ease of notation put ‖y − y*‖_Q := √(⟨y − y*, Q(y − y*)⟩). It readily follows that

    f(y) + ⟨∇f(y), y* − y⟩ + (1/2)‖y − y*‖²_Q = f(y*),

so

    f(y) − f(y*) = ⟨∇f(y), y − y*⟩ − (1/2)‖y − y*‖²_Q ≤ ⟨∇f(y), y − y*⟩²/(2‖y − y*‖²_Q),

where the last step follows from the inequality a² + b² − 2ab ≥ 0. Thus

    ⟨∇f(y), y − y*⟩/‖y − y*‖_Q ≥ √(2(f(y) − f(y*))).   (12)

On the other hand, by the definition of ϱ(Ā) there exist u, v ∈ Δ_{n-1} with S(u) ⊆ S(x) and λ ≥ ϱ(Ā) such that Āu − Āv = λ Āx/‖Āx‖. Since Āx = Q^{1/2}(Ax − y*) = Q^{1/2}(y − y*), the latter equation can be rewritten as

    Au − Av = λ (y − y*)/‖y − y*‖_Q.   (13)

Putting (12) and (13) together we get

    ⟨∇f(y), Au − Av⟩ = λ ⟨∇f(y), y − y*⟩/‖y − y*‖_Q ≥ ϱ(Ā) √(2(f(y) − f(y*))).

To finish, observe that

    max_{l ∈ S(x), j=1,...,n} ⟨∇f(Ax), a_l − a_j⟩ ≥ ⟨∇f(y), Au − Av⟩ ≥ ϱ(Ā) √(2(f(Ax) − f(y*))).

Proof of Theorem 2: This is a modification of the proof of Theorem 1(a). At iteration k the algorithm yields y_{k+1} such that

    f(y_{k+1}) = min_{θ ∈ [0,θ_max]} f(y_k + θa)

where a = a_j − y_k or a = y_k − a_l, and

    −⟨∇f(y_k), a⟩ ≥ (1/2)⟨∇f(y_k), a_l − a_j⟩ ≥ (1/2) ϱ(Ā) √(2(f(y_k) − f(y*))).

The second inequality above follows from Lemma 3. If θ_k < θ_max then Lemma 2 applied to y := y_k yields

    f(y_{k+1}) = f(y_k) − ⟨a, ∇f(y_k)⟩²/(2⟨a, Qa⟩) ≤ f(y_k) − (ϱ(Ā)²/(4 d(Ā)²))(f(y_k) − f(y*)).

That is,

    f(y_{k+1}) − f(y*) ≤ (1 − ϱ(Ā)²/(4 d(Ā)²))(f(y_k) − f(y*)).

Then proceeding as in the last part of the proof of Theorem 1(a) we obtain (11).

In the special case when Q = I, b = 0, 0 ∈ conv(A), and all columns of A have norm one, we have d(A) ≤ 2 and the minimizer y* of (8) is 0. Thus Theorem 2 yields a weaker version of Theorem 1(a) with w(A) replaced with ϱ(A) ≤ w(A). Conversely, a closer look at the proof of Theorem 2 reveals that the convergence bound (11) can be sharpened as follows: replace ϱ(Ā) with w_f(A) ≥ ϱ(Ā), where w_f(A) is the following extension of w(A):

    w_f(A) := min_{x ∈ Δ_{n-1}: Ax ≠ y*} max_{l,j} { ⟨∇f(Ax), a_l − a_j⟩/√(2(f(Ax) − f(y*))) : l ∈ S(x), j ∈ {1,...,n} }.

We have the following related conjecture concerning w(A) and ϱ(A).

Conjecture 1 If A ∈ R^{m×n} is non-zero and 0 ∈ conv(A) then ϱ(A) = w(A).

The next result shows that the ratio ϱ(Ā)/d(Ā) in (11) can be bounded below in terms of a product of the ratio of the smallest to largest eigenvalue of Q and a second factor that depends only on conv(Ã) for Ã := [a_1 − y* ... a_n − y*]. We omit the proof as it is a straightforward matrix algebra calculation.

Proposition 1 Assume x* ∈ Δ_{n-1} is a minimizer of (8). Let y* = Ax*, Ã := [a_1 − y* ... a_n − y*], and Ā := Q^{1/2}Ã. Let µ, L be respectively the smallest and largest eigenvalues of Q. Then ϱ(Ā) ≥ √µ ϱ(Ã) and d(Ā) ≤ √L d(Ã) = √L d(A). In particular

    ϱ(Ā)/d(Ā) ≥ √(µ/L) · ϱ(Ã)/d(Ã) = √(µ/L) · ϱ(Ã)/d(A).

As we discuss in the next section, this result readily extends to the more general problem in which f is a strongly convex function with Lipschitz gradient.

4 Frank-Wolfe Algorithm with Away Steps for Strongly Convex Functions with Lipschitz Gradient

We next consider a more general version of the problem (8) where f is µ-strongly convex and ∇f is L-Lipschitz.

Theorem 3 Assume f is µ-strongly convex and ∇f is L-Lipschitz. Assume x* ∈ Δ_{n-1} is a minimizer of (8). If x_0 ∈ Δ_{n-1} is one of the extreme points of Δ_{n-1} then the iterates x_k ∈ Δ_{n-1}, y_k = Ax_k generated by the Frank-Wolfe Algorithm with Away Steps satisfy

    f(y_k) − f(y*) ≤ (1 − w_f(A)²/(4 L d(A)²))^{k/2} (f(y_0) − f(y*)),   (14)

where

    w_f(A) := min_{x ∈ Δ_{n-1}: Ax ≠ y*} max_{l,j} { ⟨∇f(Ax), a_l − a_j⟩/√(2(f(Ax) − f(y*))) : l ∈ S(x), j ∈ {1,...,n} }.

Furthermore, the above parameter w_f(A) satisfies w_f(A) ≥ √µ ϱ(Ã) for Ã := [a_1 − y* ... a_n − y*].
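For a general strongly convex f, the closed-form step of the quadratic case can be replaced by any one-dimensional minimization over [0, θ_max]. The sketch below adapts the earlier code using SciPy's bounded scalar minimizer; f and grad_f are user-supplied callables, and the finite cap on θ_max for away steps is an implementation convenience rather than part of the algorithm's description.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def frank_wolfe_away_general(A, f, grad_f, max_iter=1000, tol=1e-12):
    """Frank-Wolfe with away steps for min f(Ax) over x in the simplex,
    for a smooth strongly convex f given by callables f(y) and grad_f(y)."""
    m, n = A.shape
    x = np.zeros(n)
    x[0] = 1.0
    y = A @ x
    for _ in range(max_iter):
        grad = grad_f(y)
        g = A.T @ grad
        S = np.flatnonzero(x > tol)
        j = int(np.argmin(g))
        l = int(S[np.argmax(g[S])])
        regular = (y @ grad - g[j]) > (g[l] - y @ grad)
        if regular:
            a, theta_max = A[:, j] - y, 1.0
        else:
            a = y - A[:, l]
            theta_max = x[l] / (1.0 - x[l]) if x[l] < 1.0 else 1e6  # finite cap for the 1-D solver
        # line search over [0, theta_max] for the one-dimensional function theta -> f(y + theta*a)
        res = minimize_scalar(lambda t: f(y + t * a), bounds=(0.0, theta_max), method='bounded')
        theta = float(res.x)
        if regular:
            x *= (1.0 - theta)
            x[j] += theta
        else:
            x *= (1.0 + theta)
            x[l] -= theta
        y = y + theta * a
    return x, y
```

For example, f = lambda y: 0.5 * y @ Q @ y + b @ y and grad_f = lambda y: Q @ y + b recover the quadratic setting of Section 3.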

Proof: Since f is convex and ∇f is L-Lipschitz, we have

    f(y) ≤ f(y_k) + ⟨∇f(y_k), y − y_k⟩ + (L/2)‖y − y_k‖².

Hence proceeding as in Theorem 2, it follows that if θ_k < θ_max then for either a = a_j − y_k or a = y_k − a_l we have

    f(y_{k+1}) ≤ f(y_k) − ⟨∇f(y_k), a⟩²/(2L‖a‖²)
              ≤ f(y_k) − ⟨∇f(y_k), a_l − a_j⟩²/(8L‖a‖²)
              ≤ f(y_k) − (w_f(A)²/(4 L d(A)²))(f(y_k) − f(y*)).

Therefore, again as in the proof of Theorem 2, it follows that

    f(y_k) − f(y*) ≤ (1 − w_f(A)²/(4 L d(A)²))^{k/2} (f(y_0) − f(y*)).

We next show the bound w_f(A) ≥ √µ ϱ(Ã). Since f is µ-strongly convex,

    f(y) + ⟨∇f(y), y* − y⟩ + (µ/2)‖y − y*‖² ≤ f(y*).

Thus, the inequality a² + b² − 2ab ≥ 0 yields

    f(y) − f(y*) ≤ ⟨∇f(y), y − y*⟩ − (µ/2)‖y − y*‖² ≤ ⟨∇f(y), y − y*⟩²/(2µ‖y − y*‖²).

Hence from the construction of ϱ(Ã) we get

    max_{l ∈ S(x), j} ⟨∇f(y), a_l − a_j⟩ ≥ ϱ(Ã) ⟨∇f(y), y − y*⟩/‖y − y*‖ ≥ ϱ(Ã) √(2µ(f(y) − f(y*))).

Dividing by √(2(f(y) − f(y*))) and taking the minimum over x ∈ Δ_{n-1} yields w_f(A) ≥ √µ ϱ(Ã).

Observe that, in a nice analogy to the bound in Proposition 1, we readily get the following lower bound on the ratio w_f(A)/(√L d(A)) appearing in (14):

    w_f(A)/(√L d(A)) ≥ √(µ/L) · ϱ(Ã)/d(A).

5 Some properties of the restricted width

Throughout this section assume A ∈ R^{m×n} is a nonzero matrix. As we noted in Section 3 above, when 0 ∈ int(conv(A)) it follows that ϱ(A) ≥ |ρ(A)|. Our next result establishes a stronger lower bound on ϱ(A) in terms of some quantities that generalize ρ(A) to the case when 0 ∈ conv(A). To that end, we recall some terminology and results from [4]. Assume A = [a_1 ... a_n] ∈ R^{m×n} is a non-zero matrix.

Then there exists a unique partition B ∪ N = {1,...,n} such that both

    A_B x_B = 0, x_B > 0   and   A_N^T y > 0, A_B^T y = 0

are feasible. In particular, B ≠ ∅ if and only if 0 ∈ conv(A). Also, if a_i = 0 then i ∈ B. The above canonical partition (B, N) allows us to refine the quantity ρ(A) defined by (2) as follows. Let L := span(A_B) and L^⊥ := {v ∈ R^m : ⟨v, y⟩ = 0 for all y ∈ L}. By convention, L = {0} and L^⊥ = R^m when B = ∅. If L ≠ {0}, let ρ_B(A) be defined as

    ρ_B(A) := max_{y ∈ L, ‖y‖=1} min_{i ∈ B} ⟨a_i, y⟩.

Observe that if B ≠ ∅, then L = {0} only when a_i = 0 for all i ∈ B. If N ≠ ∅, let ρ_N(A) be defined as

    ρ_N(A) := max_{y ∈ L^⊥, ‖y‖=1} min_{i ∈ N} ⟨a_i, y⟩.

When L ≠ {0}, it can be shown [4] that ρ_B(A) < 0. Likewise, when N ≠ ∅ it can be shown that ρ_N(A) > 0. In particular, the latter implies that

    ρ_N(A) = max_{y ∈ L^⊥, ‖y‖=1} min_{i ∈ N} ⟨a_i, y⟩ = max_{y ∈ L^⊥, ‖y‖≤1} min_{i ∈ N} ⟨a'_i, y⟩,   (15)

where a'_i is the orthogonal projection of a_i onto L^⊥. Let A'_N denote the matrix obtained by projecting each of the columns of A_N onto L^⊥. From (15) and Lagrangian duality it follows that

    ρ_N(A) = min{‖y‖ : y ∈ conv(A'_N)}.   (16)

Similarly, it can be shown that if L ≠ {0} then

    −ρ_B(A) = max{r : y ∈ L, ‖y‖ ≤ r ⇒ y ∈ conv(A_B)}.   (17)

Observe that (16) and (17) nicely extend (3) and (4). Indeed, (16) is identical to (3) when B = ∅. Likewise, (17) is identical to (4) when N = ∅. Furthermore, (16) and (17) imply that ρ_N(A) = dist(0, conv(A'_N)) and |ρ_B(A)| = dist_L(0, ∂ conv(A_B)), thereby extending the fact that |ρ(A)| = dist(0, ∂ conv(A)).

The next result shows that ϱ(A) can be bounded below in terms of ρ_B(A) and ρ_N(A). In particular, it shows that ϱ(A) > 0 whenever A ≠ 0 and 0 ∈ conv(A).

Theorem 4 Assume A = [a_1 ... a_n] ∈ R^{m×n} is a nonzero matrix.

(a) If N = ∅ then L ≠ {0} and ϱ(A) ≥ |ρ_B(A)|.

(b) If B = ∅ then ϱ(Ā) ≥ ρ_N(A) for Ā := [A 0].

(c) If B ≠ ∅ and L = {0} then ϱ(A) ≥ ρ_N(A).

(d) If N ≠ ∅ and L ≠ {0} then

    ϱ(A) ≥ |ρ_B(A)| ρ_N(A) / √(‖A‖² + ρ_N(A)²),

where ‖A‖ := max_{i=1,...,n} ‖a_i‖.
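The canonical partition (B, N) used above can be computed by solving a sequence of small linear feasibility problems: an index i belongs to B exactly when there is some x ≥ 0 with Ax = 0 and x_i > 0, and the normalization x_i ≥ 1 is possible by rescaling. The following is an illustrative sketch using scipy.optimize.linprog; the function name canonical_partition is a hypothetical choice.

```python
import numpy as np
from scipy.optimize import linprog

def canonical_partition(A):
    """Return (B, N): i is in B iff there exists x >= 0 with Ax = 0 and x_i > 0."""
    m, n = A.shape
    B = []
    for i in range(n):
        bounds = [(0, None)] * n
        bounds[i] = (1, None)            # force x_i >= 1 (harmless by rescaling)
        res = linprog(c=np.zeros(n), A_eq=A, b_eq=np.zeros(m),
                      bounds=bounds, method='highs')
        if res.status == 0:              # feasibility problem has a solution
            B.append(i)
    N = [i for i in range(n) if i not in B]
    return B, N
```

With B in hand, L = span(A_B), and the quantities ρ_B(A) and ρ_N(A) can in turn be approximated from their characterizations (16) and (17).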

Proof:

(a) Assume x ≥ 0 is such that y := Ax ≠ 0. In this case y ∈ span(A_B) = L. Hence L ≠ {0} and by (17) there exist v ∈ Δ_{n-1} and r ≥ |ρ_B(A)| such that Av = −r Ax/‖Ax‖. Thus for u := x/‖x‖_1 we have u, v ∈ Δ_{n-1}, S(u) ⊆ S(x) and

    Au − Av = (r + ‖Ax‖/‖x‖_1) Ax/‖Ax‖.

It follows that ϱ(A, x) ≥ r + ‖Ax‖/‖x‖_1 > |ρ_B(A)|.

(b) Assume x̄ := (x, t) ≥ 0 is such that y := Āx̄ = Ax ≠ 0. From (16) it follows that ‖Ax‖/‖x‖_1 ≥ ρ_N(A). Thus for u := (x/‖x‖_1, 0), v := e_{n+1} we have u, v ∈ Δ_n, S(u) ⊆ S(x̄) and

    Āu − Āv = (‖Ax‖/‖x‖_1) Ax/‖Ax‖.

It follows that ϱ(Ā, x̄) ≥ ‖Ax‖/‖x‖_1 ≥ ρ_N(A).

(c) Since B ≠ ∅ and L = {0}, it follows that A_B = 0 and the columns of A_N are precisely the non-zero columns of A. Thus from part (b) we get ϱ([A_N 0]) ≥ ρ_N(A). To finish, observe that ϱ(A) = ϱ([A_N 0]) because A_B = 0.

(d) Assume x ≥ 0 with ‖x‖_1 = 1 is such that y := Ax ≠ 0 (the normalization is without loss of generality since ϱ(A, x) only depends on S(x) and Ax/‖Ax‖). Let L := span(A_B) and decompose y = y_L + y^⊥ where y^⊥ = A'_N x_N ∈ L^⊥ and y_L = A_B x_B + (A_N − A'_N)x_N ∈ L. Put r := ‖y^⊥‖/‖y‖ ∈ [0, 1]. Assume r > 0, as otherwise y = y_L ∈ span(A_B) and the statement holds with the better bound ϱ(A) ≥ |ρ_B(A)| by proceeding exactly as in part (a). Since r > 0, we have x_N ≠ 0. Put r_N := ‖y^⊥‖/‖x_N‖_1. From (16) it follows that r_N ≥ ρ_N(A). Next, put

    v := (1/‖x_N‖_1)((A_N − A'_N)x_N − y_L).

Observe that ‖v‖ ≤ max_{i ∈ N} ‖a_i − a'_i‖ + ‖y_L‖/‖x_N‖_1 ≤ ‖A‖ + r_N √(1 − r²)/r and v ∈ L. Hence by (17) there exists x'_B ≥ 0, ‖x'_B‖_1 = 1 such that A_B x'_B = c v, where

    c := |ρ_B(A)| r / (r‖A‖ + r_N √(1 − r²)) ∈ (0, 1).

Taking x'_N := (c/‖x_N‖_1) x_N we get

    A_N x'_N − A_B x'_B = (c/‖x_N‖_1)(y^⊥ + y_L) = (|ρ_B(A)| r_N / (r‖A‖ + r_N √(1 − r²))) y/‖y‖.

Thus letting u := (1 − c)x + (0, x'_N) and v̄ := (x'_B, 0) we get u, v̄ ∈ Δ_{n-1}, S(u) ⊆ S(x) and

    Au − Av̄ = ((1 − c)‖Ax‖ + |ρ_B(A)| r_N/(r‖A‖ + r_N √(1 − r²))) Ax/‖Ax‖.   (18)

Next, observe that

    (1 − c)‖Ax‖ + |ρ_B(A)| r_N/(r‖A‖ + r_N √(1 − r²))
        ≥ |ρ_B(A)| r_N/(r‖A‖ + r_N √(1 − r²))
        ≥ |ρ_B(A)| r_N/√(‖A‖² + r_N²)
        ≥ |ρ_B(A)| ρ_N(A)/√(‖A‖² + ρ_N(A)²).   (19)

The first inequality above follows because c ∈ (0, 1), the second one follows from max_{r ∈ [0,1]}(r‖A‖ + r_N √(1 − r²)) = √(‖A‖² + r_N²), and the third one follows from r_N ≥ ρ_N(A). Putting (18) and (19) together we get

    ϱ(A, x) ≥ |ρ_B(A)| ρ_N(A)/√(‖A‖² + ρ_N(A)²).

References

[1] A. Beck and S. Shtern. Linearly convergent away-step conditional gradient for non-strongly convex functions. Technical report, Faculty of Industrial Engineering and Management, Technion, 2015.

[2] H. D. Block. The perceptron: A model for brain functioning. Reviews of Modern Physics, 34:123–135, 1962.

[3] D. Cheung and F. Cucker. A new condition number for linear programming. Math. Prog., 91:163–174, 2001.

[4] D. Cheung, F. Cucker, and J. Peña. On strata of degenerate polyhedral cones I: Condition and distance to strata. Eur. J. Oper. Res., 198:23–28, 2009.

[5] K. Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63, 2010.

[6] M. Epelman and R. M. Freund. Condition number complexity of an elementary algorithm for computing a reliable solution of a conic linear system. Math. Program., 88(3):451–485, 2000.

[7] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3:95–110, 1956.

[8] J. Goffin. The relaxation method for solving systems of linear inequalities. Math. Oper. Res., 5:388–414, 1980.

[9] J. Guélat and P. Marcotte. Some comments on Wolfe's 'away step'. Math. Program., 35:110–119, 1986.

[10] M. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, volume 28 of JMLR Proceedings, pages 427–435, 2013.

[11] S. Lacoste-Julien and M. Jaggi. An affine invariant linear convergence analysis for Frank-Wolfe algorithms. In Advances in Neural Information Processing Systems (NIPS), 2013.

[12] D. Leventhal and A. Lewis. Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res., 35:641–654, 2010.

[13] D. Li and T. Terlaky. The duality between the perceptron algorithm and the von Neumann algorithm. In Modeling and Optimization: Theory and Applications (MOPTA) Conference, 2013.

[14] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615–622, 1962.

[15] J. Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM J. on Optim., 5:506–524, 1995.

[16] J. Renegar. Linear programming, complexity theory and elementary functional analysis. Math. Program., 70:279–351, 1995.

[17] N. Soheili and J. Peña. A primal-dual smooth perceptron-von Neumann algorithm. In Discrete Geometry and Optimization. Springer, 2013.

[18] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15:262–278, 2009.

[19] V. Vapnik. Statistical Learning Theory. Wiley, 1998.

[20] P. Wolfe. Convergence theory in nonlinear programming. In Integer and Nonlinear Programming. North-Holland, Amsterdam, 1970.
