An incremental mirror descent subgradient algorithm with random sweeping and proximal step

Radu Ioan Boţ, Axel Böhm

September 7, 2017

Radu Ioan Boţ: Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, e-mail: radu.bot@univie.ac.at. Research partially supported by FWF (Austrian Science Fund), project I 2419-N32.

Axel Böhm: Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, e-mail: axel.boehm@univie.ac.at. Research supported by the doctoral programme Vienna Graduate School on Computational Optimization (VGSCO), FWF (Austrian Science Fund), project W 1260.

Abstract. We investigate the convergence properties of incremental mirror descent type subgradient algorithms for minimizing the sum of convex functions. In each step we only evaluate the subgradient of a single component function and mirror it back to the feasible domain, which makes iterations very cheap to compute. The analysis is made for a randomized selection of the component functions, which yields the deterministic algorithm as a special case. Under supplementary differentiability assumptions on the function which induces the mirror map we are also able to deal with the presence of another term in the objective function, which is evaluated via a proximal type step. In both cases we derive convergence rates of $O(1/\sqrt{k})$ in expectation for the $k$th best objective function value and illustrate our theoretical findings by numerical experiments in positron emission tomography and machine learning.

Keywords. nonsmooth convex minimization; incremental mirror descent algorithm; global rate of convergence; random sweeping

AMS subject classification. 90C25, 90C90, 90C06

1 Introduction

We consider the problem of minimizing the sum of nonsmooth convex functions

$$\min_{x \in C} \sum_{i=1}^m f_i(x), \qquad (1)$$

where $C \subseteq \mathbb{R}^n$ is a nonempty, convex and closed set and, for every $i = 1, \dots, m$, the so-called component functions $f_i : \mathbb{R}^n \to \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ are assumed to be proper and convex and will be evaluated via their respective subgradients. Implicitly we will assume that $m$ is large and that it is therefore very costly to evaluate all component functions in each iteration. Consequently, we will examine algorithms which only use the subgradient of a single component function in each iteration. These so-called incremental algorithms, see [4, 8], have been applied to large-scale problems arising in tomography [3], generalized assignment problems [8] or machine learning [12]. We refer also to [7] for a slightly different approach where, in the spirit of incremental algorithms, only the gradient of one of the component functions is evaluated in each step, but gradients at old iterates are used for the other components. Both subgradient algorithms and incremental

methods usually require decreasing stepsizes in order to converge, which makes them slow near an optimal solution. However, they provide in a very small number of iterations an optimal value of low accuracy and possess a rate of convergence which is almost independent of the dimension of the problem.

When solving optimization problems of type (1) one might want to capture in the formulation of the iterative scheme the geometry of the feasible set $C$. This can be done by a so-called mirror map, which mirrors each iterate back onto the feasible set. The Bregman distance associated with the function that induces the mirror map plays an essential role in the convergence analysis and in the formulation of convergence rates results (see [1, 2]). So-called mirror descent algorithms were first discussed in [9] and more recently in [2, 10, 13] in a very general framework, and in [11, 12] from a statistical learning point of view. The mirror map can be viewed as a generalization of the ordinary orthogonal projection onto $C$ in Hilbert spaces (see Example 2.4), but it also allows for a more careful consideration of the problem's structure, as is the case when the objective function is subdifferentiable only on the relative interior of the feasible set. In such a setting one can design a mirror map which maps not onto the entire feasible set but only onto a subset of it where the objective function is subdifferentiable (see Example 2.5).

The basic concepts in the formulation of mirror descent algorithms are recalled in Section 2. We also provide some illustrating examples, which present special cases, as the general setting might not be immediately intuitive. In Section 3 we formulate an incremental mirror descent subgradient algorithm with random sweeping of the component functions, which we show to have a convergence rate of $O(1/\sqrt{k})$ in expectation for the $k$th best objective function value. In Section 4 we ask additionally for differentiability of the function which induces the mirror map and are then able to add another nonsmooth convex function to the objective, which is evaluated in the iterative scheme by a proximal type step. For the resulting algorithm we show a similar convergence result. In the last section we illustrate the theoretical findings by numerical experiments in positron emission tomography and machine learning.

2 Elements of convex analysis and the mirror descent algorithm

Throughout the paper we assume that $\mathbb{R}^n$ is endowed with the Euclidean inner product $\langle \cdot, \cdot\rangle$ and corresponding norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$. For a nonempty convex set $C \subseteq \mathbb{R}^n$ we denote by $\operatorname{ri} C$ its relative interior, which is the interior of $C$ relative to its affine hull. For a convex function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ we denote by $\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$ its effective domain and say that $f$ is proper if $f > -\infty$ and $\operatorname{dom} f \neq \emptyset$. The subdifferential of $f$ at $x \in \mathbb{R}^n$ is defined as $\partial f(x) := \{p \in \mathbb{R}^n : f(y) \ge f(x) + \langle p, y - x\rangle \ \forall y \in \mathbb{R}^n\}$, for $f(x) \in \mathbb{R}$, and as $\partial f(x) := \emptyset$, otherwise. We will write $f'(x)$ for an arbitrary subgradient, i.e. an element of the subdifferential $\partial f(x)$.

Problem 2.1. Consider the optimization problem

$$\min_{x \in C} f(x), \qquad (2)$$

where $C \subseteq \mathbb{R}^n$ is a nonempty, convex and closed set, $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper and convex function, and $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper, lower semicontinuous and $\sigma$-strongly convex function such that $C = \operatorname{dom} H$ and $\operatorname{im} \nabla H^* \subseteq \operatorname{int}(\operatorname{dom} f)$.

We say that $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ is $\sigma$-strongly convex for $\sigma > 0$ if for every $x_1, x_2 \in \mathbb{R}^n$ and every $\lambda \in [0, 1]$ it holds

$$\frac{\sigma}{2}\lambda(1-\lambda)\|x_1 - x_2\|^2 + H(\lambda x_1 + (1-\lambda)x_2) \le \lambda H(x_1) + (1-\lambda)H(x_2).$$

It is well known that, when $H$ is proper, lower semicontinuous and $\sigma$-strongly convex, then its conjugate function $H^* : \mathbb{R}^n \to \mathbb{R}$, $H^*(y) = \sup_{x \in \mathbb{R}^n}\{\langle y, x\rangle - H(x)\}$, is Fréchet differentiable (thus it has

full domain) and its gradient $\nabla H^*$ is $\frac{1}{\sigma}$-Lipschitz continuous or, equivalently, $H^*$ is Fréchet differentiable and its gradient $\nabla H^*$ is $\sigma$-cocoercive, which means that for every $y_1, y_2 \in \mathbb{R}^n$ it holds

$$\sigma\|\nabla H^*(y_1) - \nabla H^*(y_2)\|^2 \le \langle y_1 - y_2, \nabla H^*(y_1) - \nabla H^*(y_2)\rangle.$$

Recall that $\operatorname{im}\nabla H^* := \{\nabla H^*(y) : y \in \mathbb{R}^n\}$.

The following mirror descent algorithm has been introduced in [10] under the name dual averaging.

Algorithm 2.2. Consider for some initial values $x_0 \in \operatorname{int}(\operatorname{dom} f)$, $y_0 \in \mathbb{R}^n$ and a sequence of positive stepsizes $(t_k)_{k \ge 0}$ the following iterative scheme:

$$(\forall k \ge 0) \quad \begin{cases} y_{k+1} = y_k - t_k f'(x_k) \\ x_{k+1} = \nabla H^*(y_{k+1}). \end{cases}$$

We notice that, since the sequence $(x_k)_{k \ge 0}$ is contained in the interior of the effective domain of $f$, the algorithm is well-defined. The assumptions concerning the function $H$, which induces the mirror map $\nabla H^*$, are not consistent in the literature. Sometimes $H$ is assumed to be a Legendre function as in [1], or strongly convex and differentiable as in [2, 3]. In the following section we will only assume that $H$ is proper, lower semicontinuous and strongly convex.

Example 2.3. For $H = \frac{1}{2}\|\cdot\|^2$ we have that $H^* = \frac{1}{2}\|\cdot\|^2$ and thus $\nabla H^*$ is the identity operator on $\mathbb{R}^n$. Consequently, Algorithm 2.2 reduces to the classical subgradient method:

$$(\forall k \ge 0) \quad x_{k+1} = x_k - t_k f'(x_k).$$

Example 2.4. For $C \subseteq \mathbb{R}^n$ a nonempty, convex and closed set, take $H(x) = \frac{1}{2}\|x\|^2$, for $x \in C$, and $H(x) = +\infty$, otherwise. Then $\nabla H^* = P_C$, where $P_C$ denotes the orthogonal projection onto $C$. In this setting, Algorithm 2.2 becomes:

$$(\forall k \ge 0) \quad \begin{cases} y_{k+1} = y_k - t_k f'(x_k) \\ x_{k+1} = P_C(y_{k+1}). \end{cases}$$

This iterative scheme is similar to, but different from, the well-known subgradient projection algorithm, which reads:

$$(\forall k \ge 0) \quad \begin{cases} y_{k+1} = x_k - t_k f'(x_k) \\ x_{k+1} = P_C(y_{k+1}). \end{cases}$$

Example 2.5. When considering numerical experiments in positron emission tomography, one often minimizes over the unit simplex $\Delta := \{x = (x_1, \dots, x_n)^T \in \mathbb{R}^n : \sum_{j=1}^n x_j = 1, \ x \ge 0\}$. An appropriate choice for the function $H$ is $H(x) = \sum_{j=1}^n x_j\log(x_j)$ for $x \in \Delta$, where $0\log 0 := 0$, and $H(x) = +\infty$, if $x \notin \Delta$. In this case $\nabla H^*$ is given for every $y \in \mathbb{R}^n$ by

$$\nabla H^*(y) = \frac{1}{\sum_{i=1}^n \exp(y_i)}\big(\exp(y_1), \dots, \exp(y_n)\big)^T,$$

and maps into the relative interior of $\Delta$.
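To make Algorithm 2.2 and the entropic mirror map of Example 2.5 concrete, here is a minimal Python sketch; it is not taken from the paper, and the function names, the toy linear objective and the stepsize choice are our own illustrative assumptions.

```python
import numpy as np

def entropic_mirror_map(y):
    """nabla H*(y) for H(x) = sum_j x_j log x_j on the unit simplex (Example 2.5):
    a numerically stabilized softmax, which maps into the relative interior of the simplex."""
    z = np.exp(y - np.max(y))          # subtracting max(y) does not change the quotient
    return z / np.sum(z)

def dual_averaging(subgrad, y0, stepsizes, mirror_map):
    """Algorithm 2.2 (dual averaging): y_{k+1} = y_k - t_k f'(x_k), x_{k+1} = nabla H*(y_{k+1})."""
    y = np.asarray(y0, dtype=float)
    x = mirror_map(y)
    iterates = [x]
    for t in stepsizes:
        y = y - t * subgrad(x)
        x = mirror_map(y)
        iterates.append(x)
    return iterates

# toy usage: minimize <c, x> over the simplex; the iterates concentrate on argmin(c)
c = np.array([3.0, 1.0, 2.0])
xs = dual_averaging(lambda x: c, np.zeros(3),
                    [1.0 / np.sqrt(k + 1) for k in range(500)], entropic_mirror_map)
```

Passing the identity instead of entropic_mirror_map recovers the classical subgradient method of Example 2.3.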

The following result will play an important role in the convergence analysis that we will carry out in the next sections.

Lemma 2.6. Let $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ be a proper, lower semicontinuous and $\sigma$-strongly convex function, for $\sigma > 0$, $x \in \mathbb{R}^n$ and $y \in \partial H(x)$. Then it holds

$$H(x) + \langle y, z - x\rangle + \frac{\sigma}{2}\|z - x\|^2 \le H(z) \quad \forall z \in \mathbb{R}^n.$$

Proof. The function $\widetilde{H} := H - \frac{\sigma}{2}\|\cdot\|^2$ is convex and $y - \sigma x \in \partial\widetilde{H}(x)$. Thus

$$\widetilde{H}(x) + \langle y - \sigma x, z - x\rangle \le \widetilde{H}(z) \quad \forall z \in \mathbb{R}^n$$

or, equivalently,

$$H(x) - \frac{\sigma}{2}\|x\|^2 + \langle y - \sigma x, z - x\rangle \le H(z) - \frac{\sigma}{2}\|z\|^2 \quad \forall z \in \mathbb{R}^n.$$

Rearranging the terms leads to the desired conclusion.

3 A stochastic incremental mirror descent algorithm

In this section we will address the following optimization problem.

Problem 3.1. Consider the optimization problem

$$\min_{x \in C} \sum_{i=1}^m f_i(x), \qquad (3)$$

where $C \subseteq \mathbb{R}^n$ is a nonempty, convex and closed set, for every $i = 1, \dots, m$, the functions $f_i : \mathbb{R}^n \to \overline{\mathbb{R}}$ are proper and convex, and $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper, lower semicontinuous and $\sigma$-strongly convex function such that $C = \operatorname{dom} H$ and $\operatorname{im}\nabla H^* \subseteq \bigcap_{i=1}^m \operatorname{int}(\operatorname{dom} f_i)$.

In this section we apply the dual averaging approach described in Algorithm 2.2 to the optimization problem (3) by only using the subgradient of a single component function at a time. This incremental approach (see also [4, 8]) is similar to, but slightly different from, the extension suggested in [2]. Furthermore, we introduce a stochastic sweeping of the component functions. For a similar strategy, but for the random selection of coordinates, we refer to [5].

Algorithm 3.2. Consider for some initial values $x_0 \in \bigcap_{i=1}^m \operatorname{int}(\operatorname{dom} f_i)$, $y_{m,-1} \in \mathbb{R}^n$ and a sequence of positive stepsizes $(t_k)_{k \ge 0}$ the following iterative scheme:

$$(\forall k \ge 0) \quad \begin{cases} \psi_{0,k} = x_k, \quad y_{0,k} = y_{m,k-1} \\ \text{for } i = 1, \dots, m: \\ \quad y_{i,k} = y_{i-1,k} - \varepsilon_{i,k}\dfrac{t_k}{p_i} f_i'(\psi_{i-1,k}) \\ \quad \psi_{i,k} = \nabla H^*(y_{i,k}) \\ x_{k+1} = \psi_{m,k}, \end{cases}$$

where $\varepsilon_{i,k}$ is a $\{0,1\}$-valued random variable for every $i = 1, \dots, m$ and $k \ge 0$, such that $\varepsilon_{i,k}$ is independent of $\psi_{i-1,k}$ and $P(\varepsilon_{i,k} = 1) = p_i$ for every $i = 1, \dots, m$ and $k \ge 0$.

One can notice that in the above iterative scheme $y_{i,k} \in \partial H(\psi_{i,k})$ for every $i = 1, \dots, m$ and $k \ge 0$.
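A hedged Python sketch of Algorithm 3.2 (our own prototype, not the authors' implementation, reusing the helpers sketched above) shows how cheap one outer iteration is; setting all $p_i = 1$ gives the deterministic incremental version, while $m = 1$ with $p_1 = 1$ gives the non-incremental one.

```python
import numpy as np

def stochastic_incremental_md(subgrads, probs, x0, y0, stepsizes, mirror_map, rng=None):
    """A minimal sketch of Algorithm 3.2.  subgrads[i](x) returns a subgradient of f_i at x,
    probs[i] is the activation probability p_i, and mirror_map plays the role of nabla H*."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    iterates = [x]
    for t in stepsizes:                        # outer iteration k
        psi = x                                # psi_{0,k} = x_k; y carries y_{0,k} = y_{m,k-1}
        for g_i, p_i in zip(subgrads, probs):
            if rng.random() < p_i:             # epsilon_{i,k} = 1 with probability p_i
                y = y - (t / p_i) * g_i(psi)
            psi = mirror_map(y)                # psi_{i,k} = nabla H*(y_{i,k})
        x = psi                                # x_{k+1} = psi_{m,k}
        iterates.append(x)
    return iterates
```

Note that the mirror map is evaluated $m$ times per outer iteration even when a component is inactive, which is exactly the effect discussed in Section 5.1; an implementation could of course skip the evaluation whenever $y$ did not change.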

In the convergence analysis of Algorithm 3.2 we will make use of the following Bregman-distance-like function associated to the proper and convex function $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ and defined as

$$d_H : \mathbb{R}^n \times \operatorname{dom}\partial H \times \mathbb{R}^n \to \mathbb{R}, \quad d_H(x, y, z) := H(x) - H(y) - \langle z, x - y\rangle. \qquad (4)$$

We notice that $d_H(x, y, z) \ge 0$ for every $(x, y) \in \mathbb{R}^n \times \operatorname{dom}\partial H$ and every $z \in \partial H(y)$, due to the subgradient inequality.

The function $d_H$ is an extension of the Bregman distance (see [1, 2, 3]), which is associated to a proper and convex function $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ fulfilling $\operatorname{dom}\nabla H := \{x \in \mathbb{R}^n : H \text{ is differentiable at } x\} \neq \emptyset$ and defined as

$$D_H : \mathbb{R}^n \times \operatorname{dom}\nabla H \to \mathbb{R}, \quad D_H(x, y) = H(x) - H(y) - \langle \nabla H(y), x - y\rangle. \qquad (5)$$

Theorem 3.3. In the setting of Problem 3.1, assume that the functions $f_i$ are $L_{f_i}$-Lipschitz continuous on $\operatorname{im}\nabla H^*$ for $i = 1, \dots, m$. Let $(x_k)_{k \ge 0}$ be a sequence generated by Algorithm 3.2. Then for every $N \ge 1$ and every $y \in \mathbb{R}^n$ it holds

$$E\Big[\min_{0 \le k \le N}\sum_{i=1}^m f_i(x_k)\Big] - \sum_{i=1}^m f_i(y) \le \frac{d_H(y, x_0, y_{0,0}) + \frac{1}{\sigma}\Big(\sum_{i=1}^m \frac{L_{f_i}^2}{p_i} + \big(\sum_{i=1}^m L_{f_i}\big)^2\Big)\sum_{k=0}^N t_k^2}{\sum_{k=0}^N t_k}.$$

Proof. Let $y \in \bigcap_{i=1}^m \operatorname{dom} f_i \cap \operatorname{dom} H$ be fixed; for $y$ outside this set the conclusion follows automatically. For every $i = 1, \dots, m$ and every $k \ge 0$ it holds

$$d_H(y, \psi_{i,k}, y_{i,k}) = H(y) - H(\psi_{i,k}) - \langle y_{i,k}, y - \psi_{i,k}\rangle = H(y) - H(\psi_{i,k}) - \Big\langle y_{i-1,k} - \varepsilon_{i,k}\tfrac{t_k}{p_i}f_i'(\psi_{i-1,k}),\, y - \psi_{i,k}\Big\rangle.$$

Rearranging the terms, this yields for every $i = 1, \dots, m$ and every $k \ge 0$

$$d_H(y, \psi_{i,k}, y_{i,k}) = d_H(y, \psi_{i-1,k}, y_{i-1,k}) - d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k}) + \varepsilon_{i,k}\tfrac{t_k}{p_i}\langle f_i'(\psi_{i-1,k}), y - \psi_{i-1,k}\rangle + \varepsilon_{i,k}\tfrac{t_k}{p_i}\langle f_i'(\psi_{i-1,k}), \psi_{i-1,k} - \psi_{i,k}\rangle$$

$$\le d_H(y, \psi_{i-1,k}, y_{i-1,k}) - d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k}) + \varepsilon_{i,k}\tfrac{t_k}{p_i}\big(f_i(y) - f_i(\psi_{i-1,k})\big) + \varepsilon_{i,k}\tfrac{t_k}{p_i}\|f_i'(\psi_{i-1,k})\|\,\|\psi_{i-1,k} - \psi_{i,k}\|.$$

From here we get, by Young's inequality, for every $i = 1, \dots, m$ and every $k \ge 0$

$$d_H(y, \psi_{i,k}, y_{i,k}) \le d_H(y, \psi_{i-1,k}, y_{i-1,k}) - d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k}) + \varepsilon_{i,k}\tfrac{t_k}{p_i}\big(f_i(y) - f_i(\psi_{i-1,k})\big) + \frac{t_k^2\varepsilon_{i,k}}{\sigma p_i^2}\|f_i'(\psi_{i-1,k})\|^2 + \frac{\sigma}{4}\|\psi_{i-1,k} - \psi_{i,k}\|^2,$$

which, by using that $H$ is $\sigma$-strongly convex and Lemma 2.6, so that $\tfrac{\sigma}{4}\|\psi_{i-1,k} - \psi_{i,k}\|^2 \le \tfrac{1}{2}d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})$, yields

$$d_H(y, \psi_{i,k}, y_{i,k}) \le d_H(y, \psi_{i-1,k}, y_{i-1,k}) + \varepsilon_{i,k}\tfrac{t_k}{p_i}\big(f_i(y) - f_i(\psi_{i-1,k})\big) + \frac{t_k^2\varepsilon_{i,k}}{\sigma p_i^2}\|f_i'(\psi_{i-1,k})\|^2 - \frac{1}{2}d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k}).$$

Using the fact that $f_i$ is $L_{f_i}$-Lipschitz continuous on $\operatorname{im}\nabla H^*$, it follows that $\|f_i'(\psi_{i-1,k})\| \le L_{f_i}$ for every $i = 1, \dots, m$ and every $k \ge 0$, thus

$$d_H(y, \psi_{i,k}, y_{i,k}) \le d_H(y, \psi_{i-1,k}, y_{i-1,k}) + \varepsilon_{i,k}\tfrac{t_k}{p_i}\big(f_i(y) - f_i(\psi_{i-1,k})\big) + \frac{t_k^2\varepsilon_{i,k}}{\sigma p_i^2}L_{f_i}^2 - \frac{1}{2}d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k}).$$

Since all the involved functions are measurable, we can take the expected value on both sides of the above inequality and get, due to the assumed independence of $\varepsilon_{i,k}$ and $\psi_{i-1,k}$, for every $i = 1, \dots, m$ and every $k \ge 0$

$$E\big[d_H(y, \psi_{i,k}, y_{i,k})\big] \le E\big[d_H(y, \psi_{i-1,k}, y_{i-1,k})\big] + \tfrac{t_k}{p_i}E\big[f_i(y) - f_i(\psi_{i-1,k})\big]E[\varepsilon_{i,k}] + \frac{t_k^2 L_{f_i}^2}{\sigma p_i^2}E[\varepsilon_{i,k}] - \frac{1}{2}E\big[d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})\big].$$

Since $E[\varepsilon_{i,k}] = p_i$, we get for every $i = 1, \dots, m$ and every $k \ge 0$

$$E\big[d_H(y, \psi_{i,k}, y_{i,k})\big] \le E\big[d_H(y, \psi_{i-1,k}, y_{i-1,k})\big] + t_k E\big[f_i(y) - f_i(\psi_{i-1,k})\big] + \frac{t_k^2 L_{f_i}^2}{\sigma p_i} - \frac{1}{2}E\big[d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})\big].$$

Summing these inequalities for $i = 1, \dots, m$ yields for every $k \ge 0$

$$E\big[d_H(y, \psi_{m,k}, y_{m,k})\big] \le E\big[d_H(y, x_k, y_{0,k})\big] + t_k\sum_{i=1}^m E\big[f_i(y) - f_i(\psi_{i-1,k})\big] + \frac{t_k^2}{\sigma}\sum_{i=1}^m \frac{L_{f_i}^2}{p_i} - \frac{1}{2}\sum_{i=1}^m E\big[d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})\big]$$

or, equivalently,

$$E\big[d_H(y, \psi_{m,k}, y_{m,k})\big] \le E\big[d_H(y, x_k, y_{0,k})\big] + t_k\sum_{i=1}^m E\big[f_i(y) - f_i(x_k) + f_i(x_k) - f_i(\psi_{i-1,k})\big] + \frac{t_k^2}{\sigma}\sum_{i=1}^m \frac{L_{f_i}^2}{p_i} - \frac{1}{2}\sum_{i=1}^m E\big[d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})\big].$$

Thus, for every $k \ge 0$,

$$E\big[d_H(y, \psi_{m,k}, y_{m,k})\big] \le E\big[d_H(y, x_k, y_{0,k})\big] + t_k\Big(\sum_{i=1}^m f_i(y) - E\Big[\sum_{i=1}^m f_i(x_k)\Big]\Big) + \frac{t_k^2}{\sigma}\sum_{i=1}^m \frac{L_{f_i}^2}{p_i} + t_k E\Big[\sum_{i=1}^m\big(f_i(x_k) - f_i(\psi_{i-1,k})\big)\Big] - \frac{1}{2}\sum_{i=1}^m E\big[d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})\big]. \qquad (6)$$

On the other hand, by using the Lipschitz continuity of $\nabla H^*$, it yields for every $k \ge 0$

$$\sum_{i=1}^m\big(f_i(x_k) - f_i(\psi_{i-1,k})\big) = \sum_{i=1}^m\sum_{j=1}^{i-1}\big(f_i(\psi_{j-1,k}) - f_i(\psi_{j,k})\big) \le \sum_{i=1}^m\sum_{j=1}^{i-1} L_{f_i}\|\psi_{j-1,k} - \psi_{j,k}\| \le \sum_{i=1}^m\Big(\sum_{l=1}^m L_{f_l}\Big)\|\psi_{i-1,k} - \psi_{i,k}\|$$

$$= \sum_{i=1}^m\Big(\sum_{l=1}^m L_{f_l}\Big)\|\nabla H^*(y_{i-1,k}) - \nabla H^*(y_{i,k})\| \le \frac{1}{\sigma}\sum_{i=1}^m\Big(\sum_{l=1}^m L_{f_l}\Big)\|y_{i-1,k} - y_{i,k}\| = \frac{1}{\sigma}\sum_{i=1}^m\Big(\sum_{l=1}^m L_{f_l}\Big)\varepsilon_{i,k}\frac{t_k}{p_i}\|f_i'(\psi_{i-1,k})\| \le \frac{t_k}{\sigma}\sum_{i=1}^m\frac{\varepsilon_{i,k}}{p_i}\Big(\sum_{l=1}^m L_{f_l}\Big)L_{f_i}.$$

Therefore, for every $k \ge 0$,

$$E\Big[\sum_{i=1}^m\big(f_i(x_k) - f_i(\psi_{i-1,k})\big)\Big] \le \frac{t_k}{\sigma}\Big(\sum_{l=1}^m L_{f_l}\Big)\sum_{i=1}^m L_{f_i} = \frac{t_k}{\sigma}\Big(\sum_{i=1}^m L_{f_i}\Big)^2. \qquad (7)$$

Combining (6) and (7) gives for every $k \ge 0$

$$E\big[d_H(y, \psi_{m,k}, y_{m,k})\big] \le E\big[d_H(y, x_k, y_{0,k})\big] + t_k\Big(\sum_{i=1}^m f_i(y) - E\Big[\sum_{i=1}^m f_i(x_k)\Big]\Big) + \frac{t_k^2}{\sigma}\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} - \frac{1}{2}\sum_{i=1}^m E\big[d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k})\big] + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m L_{f_i}\Big)^2. \qquad (8)$$

Since $\psi_{m,k} = x_{k+1}$ and $y_{m,k} = y_{0,k+1}$, and since $d_H(\psi_{i,k}, \psi_{i-1,k}, y_{i-1,k}) \ge 0$, we get for every $k \ge 0$ that

$$E\big[d_H(y, x_{k+1}, y_{0,k+1})\big] \le E\big[d_H(y, x_k, y_{0,k})\big] + t_k\Big(\sum_{i=1}^m f_i(y) - E\Big[\sum_{i=1}^m f_i(x_k)\Big]\Big) + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big).$$

By summing up this inequality from $k = 0$ to $N$, where $N \ge 1$, we get

$$\sum_{k=0}^N t_k\Big(E\Big[\sum_{i=1}^m f_i(x_k)\Big] - \sum_{i=1}^m f_i(y)\Big) + E\big[d_H(y, x_{N+1}, y_{0,N+1})\big] \le d_H(y, x_0, y_{0,0}) + \frac{1}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big)\sum_{k=0}^N t_k^2.$$

Since $d_H(y, x_{N+1}, y_{0,N+1}) \ge 0$, as $y_{0,N+1} \in \partial H(x_{N+1})$, we get

$$E\Big[\min_{0 \le k \le N}\sum_{i=1}^m f_i(x_k)\Big] - \sum_{i=1}^m f_i(y) \le \frac{d_H(y, x_0, y_{0,0}) + \frac{1}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \big(\sum_{i=1}^m L_{f_i}\big)^2\Big)\sum_{k=0}^N t_k^2}{\sum_{k=0}^N t_k}$$

and this finishes the proof.

Remark 3.4. The set from which the variable $y$ is chosen in the previous theorem might seem to be restrictive; however, we would like to recall that in many applications $\operatorname{dom} H$ is the set of feasible solutions of the optimization problem (3). Since $\operatorname{im}\nabla H^* = \operatorname{dom}\partial H := \{x \in \mathbb{R}^n : \partial H(x) \neq \emptyset\} \subseteq \operatorname{dom} H$, the inequality in Theorem 3.3 is in particular fulfilled for every $y \in \operatorname{im}\nabla H^*$.

The optimal stepsize choice, which we provide in the following corollary, is a consequence of [2, Proposition 4.1], which states that the function

$$z \mapsto \frac{c + \frac{1}{2\sigma}z^T D z}{\langle b, z\rangle},$$

where $c > 0$, $b \in \mathbb{R}^d_{++} := \{z = (z_1, \dots, z_d)^T \in \mathbb{R}^d : z_i > 0, \ i = 1, \dots, d\}$ and $D \in \mathbb{R}^{d\times d}$ is a symmetric positive definite matrix, attains its minimum on $\mathbb{R}^d_{++}$ at

$$z^* = \sqrt{\frac{2c\sigma}{b^T D^{-1} b}}\, D^{-1}b,$$

and this provides as optimal objective value $\sqrt{\dfrac{2c}{\sigma\, b^T D^{-1} b}}$.

Corollary 3.5. In the setting of Problem 3.1, assume that the functions $f_i$ are $L_{f_i}$-Lipschitz continuous on $\operatorname{im}\nabla H^*$ for $i = 1, \dots, m$. Let $x^* \in \operatorname{dom} H$ be an optimal solution of (3) and $(x_k)_{k \ge 0}$ be a sequence generated by Algorithm 3.2 with the constant optimal stepsize

$$t_k := \sqrt{\frac{\sigma\, d_H(x^*, x_0, y_{0,0})}{\Big(\big(\sum_{i=1}^m L_{f_i}\big)^2 + \sum_{i=1}^m\frac{L_{f_i}^2}{p_i}\Big)(N+1)}} \quad \text{for } k = 0, \dots, N.$$

Then for every $N \ge 1$ it holds

$$E\Big[\min_{0 \le k \le N}\sum_{i=1}^m f_i(x_k)\Big] - \sum_{i=1}^m f_i(x^*) \le 2\sqrt{\frac{d_H(x^*, x_0, y_{0,0})\Big(\big(\sum_{i=1}^m L_{f_i}\big)^2 + \sum_{i=1}^m\frac{L_{f_i}^2}{p_i}\Big)}{\sigma(N+1)}}.$$
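The computation behind such constant-stepsize choices reduces to minimizing $t \mapsto \frac{c + s(N+1)t^2}{(N+1)t}$ over $t > 0$, where $c$ stands for the initial Bregman-type distance and $s$ for the accumulated Lipschitz/probability constant. The following small helper (hypothetical names; the roles of $c$ and $s$ are only sketched) carries out exactly this elementary calculation.

```python
import math

def optimal_constant_stepsize(c, s, N):
    """Minimize phi(t) = (c + s*(N+1)*t**2) / ((N+1)*t) over t > 0.
    Setting phi'(t) = 0 gives t* = sqrt(c / (s*(N+1))) and phi(t*) = 2*sqrt(c*s / (N+1))."""
    t_star = math.sqrt(c / (s * (N + 1)))
    phi_star = 2.0 * math.sqrt(c * s / (N + 1))
    return t_star, phi_star
```

With the constants as reconstructed above, Corollary 3.5 corresponds to $c = d_H(x^*, x_0, y_{0,0})$ and $s = \frac{1}{\sigma}\big(\big(\sum_i L_{f_i}\big)^2 + \sum_i L_{f_i}^2/p_i\big)$.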

Remark 3.6. In the last step of the proof of Theorem 3.3 one could have chosen to use the inequality

$$E\Big[\sum_{i=1}^m f_i\Big(\frac{\sum_{k=0}^N t_k x_k}{\sum_{k=0}^N t_k}\Big)\Big] - \sum_{i=1}^m f_i(y) \le \frac{1}{\sum_{k=0}^N t_k}\sum_{k=0}^N t_k\Big(E\Big[\sum_{i=1}^m f_i(x_k)\Big] - \sum_{i=1}^m f_i(y)\Big),$$

given by the convexity of $\sum_{i=1}^m f_i$, in order to prove convergence of the function values for the ergodic sequence $\bar{x}_k := \frac{\sum_{j=0}^k t_j x_j}{\sum_{j=0}^k t_j}$ for all $k \ge 0$. This would lead for every $N \ge 1$ and every $y \in \mathbb{R}^n$ to

$$E\Big[\sum_{i=1}^m f_i(\bar{x}_N)\Big] - \sum_{i=1}^m f_i(y) \le \frac{d_H(y, x_0, y_{0,0}) + \frac{1}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \big(\sum_{i=1}^m L_{f_i}\big)^2\Big)\sum_{k=0}^N t_k^2}{\sum_{k=0}^N t_k}$$

and, for the optimal stepsize choice from Corollary 3.5, to

$$E\Big[\sum_{i=1}^m f_i(\bar{x}_N)\Big] - \sum_{i=1}^m f_i(x^*) \le 2\sqrt{\frac{d_H(x^*, x_0, y_{0,0})\Big(\big(\sum_{i=1}^m L_{f_i}\big)^2 + \sum_{i=1}^m\frac{L_{f_i}^2}{p_i}\Big)}{\sigma(N+1)}}.$$

This might be beneficial, as it does not require the computation of objective function values, which are, by our implicit assumption of $m$ being large, expensive to compute.

4 A stochastic incremental mirror descent algorithm with Bregman proximal step

In this section we add another nonsmooth convex function to the objective of the optimization problem (3) and provide an extension of Algorithm 3.2 which evaluates the new summand by a proximal type step. This, however, asks for a supplementary differentiability assumption on the function inducing the mirror map.

Problem 4.1. Consider the optimization problem

$$\min_{x \in C} \sum_{i=1}^m f_i(x) + g(x), \qquad (9)$$

where $C \subseteq \mathbb{R}^n$ is a nonempty, convex and closed set, for every $i = 1, \dots, m$, the functions $f_i : \mathbb{R}^n \to \overline{\mathbb{R}}$ are proper and convex, $g : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper, convex and lower semicontinuous function, and $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper, $\sigma$-strongly convex and lower semicontinuous function such that $C = \operatorname{dom} H$, $H$ is continuously differentiable on $\operatorname{int}(\operatorname{dom} H)$, $\operatorname{im}\nabla H^* \subseteq \bigcap_{i=1}^m \operatorname{int}(\operatorname{dom} f_i) \cap \operatorname{int}(\operatorname{dom} H)$ and $\operatorname{int}(\operatorname{dom} H) \cap \operatorname{dom} g \neq \emptyset$.

For a proper, convex and lower semicontinuous function $h : \mathbb{R}^n \to \overline{\mathbb{R}}$ we define its Bregman proximal operator with respect to the proper, $\sigma$-strongly convex and lower semicontinuous function $H : \mathbb{R}^n \to \overline{\mathbb{R}}$ as

$$\operatorname{prox}^H_h : \operatorname{dom}\nabla H \to \mathbb{R}^n, \quad \operatorname{prox}^H_h(x) := \arg\min_{u \in \mathbb{R}^n}\big\{h(u) + D_H(u, x)\big\}.$$

Due to the strong convexity of $H$, the Bregman proximal operator is well-defined. For $H = \frac{1}{2}\|\cdot\|^2$ it coincides with the classical proximal operator.
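For $H = \frac{1}{2}\|\cdot\|^2$ the proximal operator is available in closed form for many regularizers. The following sketches are standard formulas, not specific to the paper, with our own function names; the first one covers the squared 2-norm that appears as regularizer in Section 5.2, the second the 1-norm for comparison.

```python
import numpy as np

def prox_sq_l2(x, t, lam):
    """prox_{t*g} for g(u) = lam*||u||_2^2 and H = (1/2)||.||^2:
    argmin_u { t*lam*||u||^2 + (1/2)||u - x||^2 } = x / (1 + 2*t*lam)."""
    return x / (1.0 + 2.0 * t * lam)

def prox_l1(x, t, lam):
    """prox_{t*g} for g(u) = lam*||u||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)
```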

We are now in the position to formulate the iterative scheme we would like to propose for solving (9). In case $g = 0$, this algorithm is exactly the incremental version of the iterative method from [2], which was actually suggested by the two authors in that paper.

Algorithm 4.2. Consider for some initial value $x_0 \in \operatorname{im}\nabla H^*$ and a sequence of positive stepsizes $(t_k)_{k \ge 0}$ the following iterative scheme:

$$(\forall k \ge 0) \quad \begin{cases} \psi_{0,k} = x_k \\ \text{for } i = 1, \dots, m: \\ \quad \psi_{i,k} = \nabla H^*\Big(\nabla H(\psi_{i-1,k}) - \varepsilon_{i,k}\dfrac{t_k}{p_i}f_i'(\psi_{i-1,k})\Big) \\ x_{k+1} = \operatorname{prox}^H_{t_k g}(\psi_{m,k}), \end{cases}$$

where $\varepsilon_{i,k}$ is a $\{0,1\}$-valued random variable for every $i = 1, \dots, m$ and $k \ge 0$, such that $\varepsilon_{i,k}$ is independent of $\psi_{i-1,k}$ and $P(\varepsilon_{i,k} = 1) = p_i$ for every $i = 1, \dots, m$ and $k \ge 0$.

Lemma 4.3. In the setting of Problem 4.1, Algorithm 4.2 is well-defined.

Proof. As $\operatorname{im}\nabla H^* \subseteq \bigcap_{i=1}^m \operatorname{int}(\operatorname{dom} f_i)$, it follows immediately for every $i = 1, \dots, m$ and every $k \ge 0$ that $\psi_{i,k} \in \operatorname{int}(\operatorname{dom} f_i)$, thus a subgradient of $f_i$ at $\psi_{i,k}$ exists. In what follows we prove that this is the case also for $\psi_{0,k}$, for every $k \ge 0$. To this aim it is enough to show that $x_k \in \operatorname{im}\nabla H^*$ for every $k \ge 0$. For $k = 0$ this statement is true by the choice of the initial value. For every $k \ge 0$ we have that

$$0 \in \partial\big(t_k g + H - \langle\nabla H(\psi_{m,k}), \cdot\rangle\big)(x_{k+1}),$$

which, according to $\operatorname{int}(\operatorname{dom} H) \cap \operatorname{dom} g \neq \emptyset$, is equivalent to

$$0 \in t_k\,\partial g(x_{k+1}) + \partial H(x_{k+1}) - \nabla H(\psi_{m,k}).$$

Thus $x_{k+1} \in \operatorname{dom}\partial H = \operatorname{im}\nabla H^*$ for every $k \ge 0$ and this concludes the proof.

Example 4.4. Consider the case when $m = 1$, $\varepsilon_{1,k} = 1$ for every $k \ge 0$ and $H(x) = \frac{1}{2}\|x\|^2$ for $x \in C$, while $H(x) = +\infty$ for $x \notin C$, where $C \subseteq \mathbb{R}^n$ is a nonempty, convex and closed set. In this setting, $\nabla H^*$ is equal to the orthogonal projection $P_C$ onto the set $C$. Algorithm 4.2 yields the following iterative scheme, which basically minimizes the sum $f_1 + g$ over the set $C$:

$$(\forall k \ge 0) \quad x_{k+1} = \operatorname{prox}^H_{t_k g}\big(P_C(x_k - t_k f_1'(x_k))\big). \qquad (10)$$

The difficulty in Example 4.4, assuming that it is reasonably possible to project onto the set $C$, lies in evaluating $\operatorname{prox}^H_{t_k g}$ for every $k \ge 0$, as this is itself a constrained optimization problem,

$$\operatorname{prox}^H_{t_k g}(x) = \arg\min_{u \in C}\Big\{t_k g(u) + \frac{1}{2}\|x - u\|^2\Big\}.$$

When $C = \mathbb{R}^n$, the iterative scheme (10) becomes the proximal subgradient algorithm investigated in [6].
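The following Python sketch mirrors Algorithm 4.2 as we read it; it is our own prototype, and grad_H, mirror_map and prox_g are user-supplied stand-ins for $\nabla H$, $\nabla H^*$ and $\operatorname{prox}^H_{t g}$.

```python
import numpy as np

def stochastic_incremental_md_prox(subgrads, probs, prox_g, x0, stepsizes,
                                   grad_H, mirror_map, rng=None):
    """A minimal sketch of Algorithm 4.2: one randomly swept incremental pass through the
    component functions via mirror steps, followed by a Bregman proximal step on g.
    prox_g(x, t) should return prox^H_{t g}(x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for t in stepsizes:
        psi = x                                            # psi_{0,k} = x_k
        for g_i, p_i in zip(subgrads, probs):
            step = (t / p_i) * g_i(psi) if rng.random() < p_i else 0.0
            psi = mirror_map(grad_H(psi) - step)           # psi_{i,k}
        x = prox_g(psi, t)                                 # x_{k+1} = prox^H_{t_k g}(psi_{m,k})
        iterates.append(x)
    return iterates
```

With grad_H and mirror_map both equal to the identity and prox_g taken from the sketch above, this reproduces the setting of Example 4.4 with $C = \mathbb{R}^n$.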

Theorem 4.5. In the setting of Problem 4.1, assume that the functions $f_i$ are $L_{f_i}$-Lipschitz continuous on $\operatorname{im}\nabla H^*$ for $i = 1, \dots, m$. Let $(x_k)_{k \ge 0}$ be a sequence generated by Algorithm 4.2. Then for every $N \ge 1$ and every $y \in \mathbb{R}^n$ it holds

$$E\Big[\min_{0 \le k \le N}\Big(\sum_{i=1}^m f_i + g\Big)(x_k)\Big] - \Big(\sum_{i=1}^m f_i + g\Big)(y) \le \frac{\sigma D_H(y, x_0) + \Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 3\big(\sum_{i=1}^m L_{f_i}\big)^2\Big)\sum_{k=0}^{N-1}t_k^2}{\sigma\sum_{k=0}^{N-1}t_k}.$$

Proof. Let $y \in \bigcap_{i=1}^m \operatorname{dom} f_i \cap \operatorname{dom} g \cap \operatorname{dom} H$ be fixed; for $y$ outside this set the conclusion follows automatically. Since $H$ is continuously differentiable on $\operatorname{int}(\operatorname{dom} H)$ and all iterates belong to $\operatorname{im}\nabla H^*$, the first part of the proof of Theorem 3.3 can be repeated with $d_H(\cdot, \psi_{i,k}, \nabla H(\psi_{i,k})) = D_H(\cdot, \psi_{i,k})$, and we obtain instead of (8), for every $k \ge 0$,

$$E\big[D_H(y, \psi_{m,k})\big] \le E\big[D_H(y, x_k)\big] + t_k\Big(\sum_{i=1}^m f_i(y) - E\Big[\sum_{i=1}^m f_i(x_k)\Big]\Big) + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big) - \frac{1}{2}\sum_{i=1}^m E\big[D_H(\psi_{i,k}, \psi_{i-1,k})\big]. \qquad (11)$$

As pointed out in the proof of Lemma 4.3, for every $k \ge 0$ we have $\nabla H(\psi_{m,k}) - \nabla H(x_{k+1}) \in t_k\,\partial g(x_{k+1})$, thus

$$t_k\big(g(y) - g(x_{k+1})\big) \ge \langle\nabla H(\psi_{m,k}) - \nabla H(x_{k+1}), y - x_{k+1}\rangle.$$

The three point identity leads to

$$t_k\big(g(y) - g(x_{k+1})\big) \ge D_H(y, x_{k+1}) + D_H(x_{k+1}, \psi_{m,k}) - D_H(y, \psi_{m,k})$$

or, equivalently,

$$t_k\big(g(x_{k+1}) - g(y)\big) + D_H(y, x_{k+1}) \le D_H(y, \psi_{m,k}) - D_H(x_{k+1}, \psi_{m,k}) \qquad (12)$$

for every $k \ge 0$. Since the involved functions are measurable, we can take the expected value on both sides of (12) and combine it with (11), which gives for every $k \ge 0$

$$t_k E\big[g(x_{k+1}) - g(y)\big] + t_k\Big(E\Big[\sum_{i=1}^m f_i(x_k)\Big] - \sum_{i=1}^m f_i(y)\Big) + E\big[D_H(y, x_{k+1})\big] \le E\big[D_H(y, x_k)\big] + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big) - E\big[D_H(x_{k+1}, \psi_{m,k})\big] - \frac{1}{2}\sum_{i=1}^m E\big[D_H(\psi_{i,k}, \psi_{i-1,k})\big].$$

By adding and subtracting $E\big[\sum_{i=1}^m f_i(x_{k+1})\big]$ and by using afterwards the Lipschitz continuity of the functions $f_i$, we get for every $k \ge 0$

$$t_k E\Big[\Big(\sum_{i=1}^m f_i + g\Big)(x_{k+1}) - \Big(\sum_{i=1}^m f_i + g\Big)(y)\Big] + E\big[D_H(y, x_{k+1})\big] \le E\big[D_H(y, x_k)\big] + t_k\sum_{i=1}^m L_{f_i}\,E\|x_k - x_{k+1}\| + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + \Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big) - E\big[D_H(x_{k+1}, \psi_{m,k})\big] - \frac{1}{2}\sum_{i=1}^m E\big[D_H(\psi_{i,k}, \psi_{i-1,k})\big].$$

By the triangle inequality, $\|x_k - x_{k+1}\| \le \|x_k - \psi_{m,k}\| + \|\psi_{m,k} - x_{k+1}\|$, and Young's inequality together with the strong convexity of $H$ gives

$$t_k\sum_{i=1}^m L_{f_i}\,\|\psi_{m,k} - x_{k+1}\| \le \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m L_{f_i}\Big)^2 + \frac{\sigma}{4}\|\psi_{m,k} - x_{k+1}\|^2 \le \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m L_{f_i}\Big)^2 + \frac{1}{2}D_H(x_{k+1}, \psi_{m,k}),$$

so that, for every $k \ge 0$,

$$t_k E\Big[\Big(\sum_{i=1}^m f_i + g\Big)(x_{k+1}) - \Big(\sum_{i=1}^m f_i + g\Big)(y)\Big] + E\big[D_H(y, x_{k+1})\big] \le E\big[D_H(y, x_k)\big] + t_k\sum_{i=1}^m L_{f_i}\,E\|x_k - \psi_{m,k}\| + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 2\Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big) - \frac{1}{2}\sum_{i=1}^m E\big[D_H(\psi_{i,k}, \psi_{i-1,k})\big].$$

Since $\|x_k - \psi_{m,k}\| = \big\|\sum_{i=1}^m(\psi_{i-1,k} - \psi_{i,k})\big\| \le \sum_{i=1}^m\|\psi_{i-1,k} - \psi_{i,k}\|$ and, by the Lipschitz continuity of $\nabla H^*$,

$$\|\psi_{i-1,k} - \psi_{i,k}\| \le \frac{1}{\sigma}\,\varepsilon_{i,k}\frac{t_k}{p_i}\|f_i'(\psi_{i-1,k})\| \le \frac{t_k}{\sigma}\frac{\varepsilon_{i,k}}{p_i}L_{f_i},$$

we obtain, after taking expectations,

$$t_k\sum_{i=1}^m L_{f_i}\,E\|x_k - \psi_{m,k}\| \le \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m L_{f_i}\Big)\sum_{i=1}^m L_{f_i}\,E\Big[\frac{\varepsilon_{i,k}}{p_i}\Big] = \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m L_{f_i}\Big)^2,$$

and thus, dropping the remaining nonpositive Bregman terms, for every $k \ge 0$

$$t_k E\Big[\Big(\sum_{i=1}^m f_i + g\Big)(x_{k+1}) - \Big(\sum_{i=1}^m f_i + g\Big)(y)\Big] + E\big[D_H(y, x_{k+1})\big] \le E\big[D_H(y, x_k)\big] + \frac{t_k^2}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 3\Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big).$$

Summing up this inequality from $k = 0$ to $N-1$, for $N \ge 1$, we get

$$\sum_{k=0}^{N-1}t_k E\Big[\Big(\sum_{i=1}^m f_i + g\Big)(x_{k+1}) - \Big(\sum_{i=1}^m f_i + g\Big)(y)\Big] + E\big[D_H(y, x_N)\big] \le D_H(y, x_0) + \frac{1}{\sigma}\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 3\Big(\sum_{i=1}^m L_{f_i}\Big)^2\Big)\sum_{k=0}^{N-1}t_k^2.$$

Since $D_H(y, x_N) \ge 0$ and $\big(\sum_{i=1}^m f_i + g\big)(x_{k+1}) \ge \min_{0 \le j \le N}\big(\sum_{i=1}^m f_i + g\big)(x_j)$ for every $0 \le k \le N-1$, this shows that

$$E\Big[\min_{0 \le k \le N}\Big(\sum_{i=1}^m f_i + g\Big)(x_k)\Big] - \Big(\sum_{i=1}^m f_i + g\Big)(y) \le \frac{\sigma D_H(y, x_0) + \Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 3\big(\sum_{i=1}^m L_{f_i}\big)^2\Big)\sum_{k=0}^{N-1}t_k^2}{\sigma\sum_{k=0}^{N-1}t_k}$$

and therefore finishes the proof.

The following result is again a consequence of [2, Proposition 4.1].

Corollary 4.6. In the setting of Problem 4.1, assume that the functions $f_i$ are $L_{f_i}$-Lipschitz continuous on $\operatorname{im}\nabla H^*$ for $i = 1, \dots, m$. Let $x^* \in \operatorname{dom} H$ be an optimal solution of (9) and $(x_k)_{k \ge 0}$ be a sequence generated by Algorithm 4.2 with the constant optimal stepsize

$$t_k := \sqrt{\frac{\sigma\, D_H(x^*, x_0)}{\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 3\big(\sum_{i=1}^m L_{f_i}\big)^2\Big)N}} \quad \text{for } k = 0, \dots, N-1.$$

Then for every $N \ge 1$ it holds

$$E\Big[\min_{0 \le k \le N}\Big(\sum_{i=1}^m f_i + g\Big)(x_k)\Big] - \Big(\sum_{i=1}^m f_i + g\Big)(x^*) \le 2\sqrt{\frac{D_H(x^*, x_0)\Big(\sum_{i=1}^m\frac{L_{f_i}^2}{p_i} + 3\big(\sum_{i=1}^m L_{f_i}\big)^2\Big)}{\sigma N}}.$$

Remark 4.7. The same considerations as in Remark 3.6 about ergodic convergence are applicable also for the rates provided in Theorem 4.5 and Corollary 4.6.

5 Applications

In the numerical experiments carried out in this section we compare three versions of the provided algorithms. First, the non-incremental version, which takes full subgradient steps with respect to the sum of all component functions instead of every single one individually; this can be viewed as a special case of the given algorithms when $m = 1$ and $\varepsilon_{1,k} = 1$ for all $k \ge 0$. Secondly, we discuss the deterministic (non-stochastic) incremental version, which uses the subgradient of every single component function in every iteration and thus corresponds to the case $\varepsilon_{i,k} = 1$ for every $i = 1, \dots, m$ and every $k \ge 0$. Lastly, we apply the algorithms as intended, by evaluating the subgradients of the respective component functions incrementally with probabilities different from 1.

5.1 Tomography

This application can be found in [3] and arises in the reconstruction of images in positron emission tomography (PET). We consider the problem

$$\min_{x \in \Delta}\ -\sum_{i=1}^m y_i\log\Big(\sum_{j=1}^n r_{ij}x_j\Big), \qquad (13)$$

where $\Delta := \{x \in \mathbb{R}^n : \sum_{j=1}^n x_j = 1, \ x \ge 0\}$, $r_{ij}$ denotes for $i = 1, \dots, m$ and $j = 1, \dots, n$ the entry of the matrix $R \in \mathbb{R}^{m\times n}$ in the $i$-th row and $j$-th column, all of which are assumed to be strictly positive, and $y_i$ denotes for $i = 1, \dots, m$ the positive number of photons measured in the $i$-th bin. As discussed in Example 2.5, this can be incorporated into our framework with the mirror map induced by $H(x) = \sum_{j=1}^n x_j\log(x_j)$ for $x \in \Delta$ and $H(x) = +\infty$, otherwise. As initial value we use the all ones vector divided by the dimension $n$.

We also want to point out that a similar example given in [2], in which a convex function is minimized over the unit simplex, somehow does not match the assumptions made throughout that paper, as the interior of $\Delta$ is empty and the function $H$ can therefore not be continuously differentiable in a classical sense. However, with the setting of Section 3 we are able to tackle this problem.

The bad performance of the deterministic incremental version of Algorithm 3.2, see Figure 1, can be explained by the fact that many more evaluations of the mirror map are needed, which increases the overall computation time dramatically. The stochastic version, however, performs rather well after evaluating merely roughly a fifth of the total number of component functions, see Table 1.

Table 1: Results for the optimization problem (13) for two problem sizes ($n = 10^4$, $m = 3n$, $p_i = 0.003$ and $n = 10^3$, $m = 6n$, $p_i = 0.006$), reporting for the non-incremental (NI), the deterministic incremental (DI) and the stochastic incremental (SI) version of Algorithm 3.2 the decrease in objective function value [%], the number of outer loops and the number of subgradient evaluations.
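As an illustration only, the tomography objective (13) can be plugged into the Algorithm 3.2 sketch from Section 3; the data R and y below are synthetic random stand-ins (not the data used in the paper), and the activation probabilities and stepsizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50                                   # synthetic sizes, not the paper's
R = rng.uniform(0.1, 1.0, size=(m, n))           # strictly positive entries r_ij
y = rng.poisson(30.0, size=m) + 1.0              # positive photon counts

def make_pet_subgrad(i):
    """Gradient of f_i(x) = -y_i * log(<r_i, x>) on the relative interior of the simplex."""
    r_i = R[i]
    return lambda x: -y[i] * r_i / (r_i @ x)

subgrads = [make_pet_subgrad(i) for i in range(m)]
probs = [0.2] * m                                # illustrative activation probabilities p_i
x0 = np.ones(n) / n                              # all ones vector divided by n, as in the paper
stepsizes = [1.0 / np.sqrt(k + 1) for k in range(100)]

# reuses the hypothetical helpers sketched earlier; nabla H*(0) equals the uniform vector x0
xs = stochastic_incremental_md(subgrads, probs, x0, np.zeros(n),
                               stepsizes, entropic_mirror_map, rng)
```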

Figure 1: Results for the optimization problem (13) for the non-incremental, the deterministic incremental and the stochastic incremental version of Algorithm 3.2. The plot shows the best objective value found so far, $f_N^{\mathrm{best}} := \min_{0 \le k \le N} f(x_k)$, relative to the starting value $f(x_0)$, as a function of computation time in seconds, i.e. $x_N$ is the last iterate computed before a given point in time.

5.2 Support Vector Machines

We deal with the classic machine learning problem of binary classification based on the well-known MNIST dataset, which contains 28 by 28 images of handwritten digits on a grey-scale pixel map. For each of the digits the dataset comprises around 6000 training images and roughly 1000 test images. In line with [12], we train a classifier to distinguish the digits 6 and 7 by solving the optimization problem

$$\min_{w \in \mathbb{R}^{784}}\ \sum_{i=1}^m\max\{0, 1 - y_i\langle w, x_i\rangle\} + \lambda\|w\|^2, \qquad (14)$$

where, for $i = 1, \dots, m$, $x_i \in \{0, 1, \dots, 255\}^{784}$ denotes the $i$-th training image and $y_i \in \{-1, 1\}$ its label. The 2-norm serves as a regularization term and $\lambda > 0$ balances the two objectives of minimizing the classification error and reducing the 2-norm of the classifier $w$. To incorporate this problem into our framework we set $H = \frac{1}{2}\|\cdot\|^2$, which leaves us with the identity as mirror map, as this problem is unconstrained. The results comparing the three versions of Algorithm 4.2 discussed at the beginning of this section are illustrated in Figure 2. As initial value we simply use the all ones vector.

All three versions show classical first-order behaviour, giving a fast decrease in objective function value first but then slowing down dramatically. More information about the performance can be seen in Table 2. All three algorithms result in a significant decrease of the objective function value after being run for only 4 seconds each. However, from a machine learning point of view, only the misclassification rate is of actual importance. In both regards, the stochastic incremental version clearly trumps the other two implementations. It is also interesting to note that it needs only a small fraction of the number of subgradient evaluations in comparison to the full non-incremental algorithm.
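Again purely as an illustration, the hinge-loss components of (14) and the proximal step on the regularizer can be combined with the Algorithm 4.2 sketch from Section 4; the arrays below are random data with MNIST-like shapes, not the actual MNIST images, and the values of $\lambda$, the probabilities and the stepsizes are arbitrary.

```python
import numpy as np

def make_hinge_subgrad(x_i, y_i):
    """A subgradient of w -> max{0, 1 - y_i*<w, x_i>}."""
    return lambda w: -y_i * x_i if 1.0 - y_i * (w @ x_i) > 0.0 else np.zeros_like(w)

rng = np.random.default_rng(1)
m, d, lam = 1000, 784, 0.01                      # MNIST-like shapes only
X = rng.random((m, d))
labels = rng.choice([-1.0, 1.0], size=m)

subgrads = [make_hinge_subgrad(X[i], labels[i]) for i in range(m)]
identity = lambda v: v                           # nabla H = nabla H* = id for H = (1/2)||.||^2
w = stochastic_incremental_md_prox(              # reuses the hypothetical helpers sketched earlier
    subgrads, probs=[0.1] * m,
    prox_g=lambda v, t: prox_sq_l2(v, t, lam),   # prox of t * lam * ||.||^2
    x0=np.ones(d), stepsizes=[0.01 / np.sqrt(k + 1) for k in range(50)],
    grad_H=identity, mirror_map=identity, rng=rng)[-1]
```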

Figure 2: Numerical results for the optimization problem (14) with $\lambda = 0.01$, comparing the objective function value of the non-incremental, the deterministic incremental and the stochastic incremental version. The plot shows $\min_{0 \le k \le N} f(x_k)$ as a function of computation time in seconds, i.e. $x_N$ is the last iterate computed before a given point in time.

Table 2: Numerical results for the optimization problem (14), where NI denotes the non-incremental, DI the deterministic incremental and SI the stochastic incremental version of Algorithm 4.2. For the regularization parameters $\lambda = 0.01$ and $\lambda = 0.001$ the table reports, for each version, the decrease in objective function value [%], the number of outer loops, the number of subgradient evaluations and the misclassification rate [%]. The computations for the different regularization parameters $\lambda$ show similar performances of the algorithms, but a lower misclassification rate for the lower value.

6 Conclusion

In this paper we present two algorithms for solving nonsmooth convex optimization problems in which the objective function is a sum of many functions that are evaluated by their respective subgradients, under the implicit presence of a constraint set which is dealt with by a so-called mirror map. By allowing for a random selection of the component function to be evaluated in each iteration, the proposed methods become suitable even for very large-scale problems. We prove a convergence order of $O(1/\sqrt{k})$ in expectation for the $k$th best objective function value, which is standard for subgradient methods. However, even for the case where all the component functions are differentiable it is not clear whether better theoretical estimates can be achieved, due to the need of using diminishing stepsizes in order to obtain convergence of incremental algorithms. Future work could comprise the investigation of different stepsizes, such as constant or dynamic stepsizes as in [8]. Another possible extension would be to use different selection procedures, such as random subsets of fixed size. Our framework, however, does not provide the right setting for such a batch approach, as it would leave $\varepsilon_{i,k}$ and $\varepsilon_{j,k}$ dependent.

References

[1] Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Mathematics of Operations Research, 42(2):330-348, 2017.

[2] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167-175, 2003.

[3] Aharon Ben-Tal, Tamar Margalit, and Arkadi Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal on Optimization, 12(1):79-108, 2001.

[4] Dimitri P. Bertsekas. Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Suvrit Sra, Sebastian Nowozin, Stephen J. Wright (Eds.), Optimization for Machine Learning, Neural Information Processing Series, MIT Press, Massachusetts, 85-120, 2011.

[5] Patrick L. Combettes and Jean-Christophe Pesquet. Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM Journal on Optimization, 25(2):1221-1248, 2015.

[6] José Yunier Bello Cruz. On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions. Set-Valued and Variational Analysis, 25(2):245-263, 2017.

[7] Mert Gurbuzbalaban, Asuman Ozdaglar, and Pablo A. Parrilo. On the convergence rate of incremental aggregated gradient algorithms. SIAM Journal on Optimization, 27(2):1035-1048, 2017.

[8] Angelia Nedić and Dimitri P. Bertsekas. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 12(1):109-138, 2001.

[9] Arkadii S. Nemirovskii and David B. Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons Ltd, New Jersey, 1983.

[10] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221-259, 2009.

[11] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107-194, 2012.

[12] Lin Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11:2543-2596, 2010.

[13] Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, and Peter Glynn. Mirror descent in non-convex stochastic programming. arXiv preprint, 2017.


More information

The linear sampling method and the MUSIC algorithm

The linear sampling method and the MUSIC algorithm INSTITUTE OF PHYSICS PUBLISHING INVERSE PROBLEMS Inverse Probles 17 (2001) 591 595 www.iop.org/journals/ip PII: S0266-5611(01)16989-3 The linear sapling ethod and the MUSIC algorith Margaret Cheney Departent

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Lecture 9: Multi Kernel SVM

Lecture 9: Multi Kernel SVM Lecture 9: Multi Kernel SVM Stéphane Canu stephane.canu@litislab.eu Sao Paulo 204 April 6, 204 Roadap Tuning the kernel: MKL The ultiple kernel proble Sparse kernel achines for regression: SVR SipleMKL:

More information

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

THE WEIGHTING METHOD AND MULTIOBJECTIVE PROGRAMMING UNDER NEW CONCEPTS OF GENERALIZED (, )-INVEXITY

THE WEIGHTING METHOD AND MULTIOBJECTIVE PROGRAMMING UNDER NEW CONCEPTS OF GENERALIZED (, )-INVEXITY U.P.B. Sci. Bull., Series A, Vol. 80, Iss. 2, 2018 ISSN 1223-7027 THE WEIGHTING METHOD AND MULTIOBJECTIVE PROGRAMMING UNDER NEW CONCEPTS OF GENERALIZED (, )-INVEXITY Tadeusz ANTCZAK 1, Manuel ARANA-JIMÉNEZ

More information

VC Dimension and Sauer s Lemma

VC Dimension and Sauer s Lemma CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions

More information

Feedforward Networks

Feedforward Networks Feedforward Networks Gradient Descent Learning and Backpropagation Christian Jacob CPSC 433 Christian Jacob Dept.of Coputer Science,University of Calgary CPSC 433 - Feedforward Networks 2 Adaptive "Prograing"

More information

Linearized Alternating Direction Method with Gaussian Back Substitution for Separable Convex Programming

Linearized Alternating Direction Method with Gaussian Back Substitution for Separable Convex Programming Linearized Alternating Direction Method with Gaussian Back Substitution for Separable Convex Prograing Bingsheng He and Xiaoing Yuan October 8, 0 [First version: January 3, 0] Abstract Recently, we have

More information

Data-Driven Imaging in Anisotropic Media

Data-Driven Imaging in Anisotropic Media 18 th World Conference on Non destructive Testing, 16- April 1, Durban, South Africa Data-Driven Iaging in Anisotropic Media Arno VOLKER 1 and Alan HUNTER 1 TNO Stieltjesweg 1, 6 AD, Delft, The Netherlands

More information

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1.

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1. Notes on Coplexity Theory Last updated: October, 2005 Jonathan Katz Handout 7 1 More on Randoized Coplexity Classes Reinder: so far we have seen RP,coRP, and BPP. We introduce two ore tie-bounded randoized

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information