A Deterministic Rescaled Perceptron Algorithm


Javier Peña    Negar Soheili

June 25, 2013

Abstract

The perceptron algorithm is a simple iterative procedure for finding a point in a convex cone F. At each iteration, the algorithm only involves a query to a separation oracle for F and a simple update on a trial solution. The perceptron algorithm is guaranteed to find a point in F after O(1/τ_F²) iterations, where τ_F is the width of the cone F. We propose a version of the perceptron algorithm that includes a periodic rescaling of the ambient space. In contrast to the classical version, our rescaled version finds a point in F in O(m⁵ log(1/τ_F)) perceptron updates. This result is inspired by and strengthens previous work on randomized rescaling of the perceptron algorithm by Dunagan and Vempala [Math. Program. 114 (2006), pp. 101–114] and by Belloni, Freund, and Vempala [Math. Oper. Res. 34 (2009), pp. 621–641]. In particular, our algorithm and its complexity analysis are simpler and shorter. Furthermore, our algorithm does not require randomization or deep separation oracles.

1 Introduction

The relaxation method, introduced in the classical articles of Agmon [1], and Motzkin and Schoenberg [16], is a conceptual algorithmic scheme for solving the feasibility problem

    y ∈ F.    (1)

Here F ⊆ R^m is assumed to be an open convex set with an available separation oracle: given a test point y ∈ R^m, the oracle either certifies that y ∈ F or else it finds a hyperplane separating y from F, that is, u ∈ R^m and b ∈ R such that ⟨u, y⟩ ≤ b and ⟨u, v⟩ > b for all v ∈ F. The relaxation method starts with an arbitrary initial trial solution. At each iteration, the algorithm queries the separation oracle for F at the current trial solution y. If y ∈ F then the algorithm terminates. Otherwise, the algorithm generates a new trial point y⁺ = y + ηu for some step length η > 0, where u ∈ R^m and b ∈ R determine a hyperplane separating y from F as above.
Tepper School of Business, Carnegie Mellon University, USA, jfp@andrew.cmu.edu Tepper School of Business, Carnegie Mellon University, USA, nsoheili@andrew.cmu.edu
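In code, the relaxation scheme just described can be sketched as follows. The callable oracle interface, the fixed step length `eta`, and the iteration cap `max_iters` are illustrative assumptions of this sketch, not part of the method's specification:

```python
# Sketch of the relaxation method driven by a separation oracle.
# oracle(y) returns None when y is in F; otherwise it returns a pair
# (u, b) describing a separating hyperplane with <u, y> <= b and
# <u, v> > b for all v in F.

def relaxation_method(oracle, y0, eta=1.0, max_iters=1000):
    y = list(y0)
    for _ in range(max_iters):
        sep = oracle(y)
        if sep is None:               # y is in F: done
            return y
        u, _b = sep
        # step toward F along the normal of the separating hyperplane
        y = [yi + eta * ui for yi, ui in zip(y, u)]
    return None                       # update budget exhausted
```

For instance, for the open half-space F = {y ∈ R² : y_1 > 1} a valid oracle always answers with the normal vector (1, 0) when queried outside F.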

The perceptron algorithm can be seen as a particular type of relaxation method for problem (1). It applies to the case when F is the interior of a convex cone. It usually starts at the origin as the initial trial solution, and each update is of the form y⁺ = y + u/‖u‖. The perceptron algorithm was originally proposed by Rosenblatt [19] for the polyhedral feasibility problem A^T y > 0. As noted by Belloni, Freund, and Vempala [6], the algorithm readily extends to the more general problem (1) when F is the interior of a convex cone, as described above. Furthermore, Belloni et al. [6, Lemma 3.2] showed that the classical perceptron iteration bound of Block [8] and Novikoff [17] also holds in general: the perceptron algorithm finds a solution to (1) in at most O(1/τ_F²) perceptron updates, where τ_F is the width of the cone F:

    τ_F := sup_{‖y‖=1} { r ∈ R₊ : B(y, r) ⊆ F }.    (2)

Here B(y, r) denotes the Euclidean ball of radius r centered at y, that is, B(y, r) = {u ∈ R^m : ‖u − y‖ ≤ r}. Similar results also hold for the relaxation method, as established by Goffin [14].

Since their emergence in the fifties, both the perceptron algorithm and the relaxation method have played major roles in machine learning and in optimization. The perceptron algorithm has attractive properties concerning noise tolerance [9]. It is also closely related to large-margin classification [12] and to the highly popular and computationally effective Pegasos algorithm [20] for training support-vector machines. There are also numerous papers in the optimization literature related to various versions and variants of the relaxation method [2, 3, 4, 5, 10]. A major drawback of both the perceptron algorithm and the relaxation method is their lack of theoretical efficiency in the standard bit model of computation [15]. In particular, when F = {y : A^T y > 0} with A ∈ Z^{m×n}, the perceptron algorithm may have exponential worst-case bit-model complexity because τ_F can be exponentially small in the bit-length representation of A.
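As a concrete illustration, here is a minimal sketch of the classical perceptron for the polyhedral problem A^T y > 0. With unit-norm columns, the update y⁺ = y + u/‖u‖ amounts to adding a violated column a_j; the `max_updates` budget is this sketch's stand-in for the O(1/τ_F²) bound:

```python
import numpy as np

def perceptron_cone(A, max_updates=10000):
    """Classical perceptron for the feasibility problem A^T y > 0.

    Columns of A are assumed to have unit norm. Returns y with
    A^T y > 0, or None if the update budget runs out.
    """
    m, n = A.shape
    y = np.zeros(m)
    for _ in range(max_updates):
        violated = A.T @ y <= 0
        if not violated.any():
            return y                     # A^T y > 0: feasible point found
        j = int(np.argmax(violated))     # index of some violated column
        y = y + A[:, j]                  # perceptron update y+ = y + a_j
    return None
```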
Our main contribution is a variant of the perceptron algorithm that solves (1) in O(m⁵ log(1/τ_F)) perceptron updates. In particular, when F = {y : A^T y > 0} with A ∈ Z^{m×n}, our algorithm is polynomial in the bit-length representation of A. Aside from its theoretical merits, given the close connection between the perceptron algorithm and first-order methods [21], our algorithm provides a solid foundation for potential speed-ups in the convergence of the widely popular first-order methods for large-scale convex optimization. Some results of a similar nature have recently been obtained by Gilpin et al. [13] and by O'Donoghue and Candès [18]. Our algorithm is based on a periodic rescaling of the space R^m in the same spirit as in previous work by Dunagan and Vempala [11], and by Belloni, Freund, and Vempala [6]. In contrast to the rescaling procedure in [11, 6], which is randomized and relies on a deep separation oracle, our rescaling procedure is deterministic and relies only on a separation oracle. The algorithm performs at most O(m log(1/τ_F)) rescaling steps and at most O(m⁴) perceptron updates between rescaling steps. When F = {y ∈ R^m : A^T y > 0} for A ∈ R^{m×n}, a simplified version of the algorithm has iteration bound O(m²n² log(1/τ_F)). A smooth version of this algorithm, along the lines developed by Soheili and

Peña [21], in turn has the improved iteration bound O(mn√(m log(n)) log(1/τ_F)).

Our rescaled perceptron algorithm consists of an outer loop with two main phases. The first one is a perceptron phase and the second one is a rescaling phase. The perceptron phase applies a restricted number of perceptron updates. If this phase does not find a feasible solution, then it finds a unit vector d ∈ R^m such that

    F ⊆ { y ∈ R^m : 0 ≤ ⟨d, y⟩ ≤ ‖y‖/√(6m) }.

This inclusion means that the feasible cone F is nearly perpendicular to d. The second phase of the outer loop, namely the rescaling phase, stretches R^m along d and is guaranteed to enlarge the volume of the set {y ∈ F : ‖y‖ = 1} by a constant factor. This in turn implies that the algorithm must halt in at most O(m log(1/τ_F)) outer iterations.

2 Polyhedral case

For ease of exposition, we first consider the case F = {y ∈ R^m : A^T y > 0} for A ∈ R^{m×n}.

Assumption 1
(i) The space R^m is endowed with the canonical dot inner product ⟨u, v⟩ := u^T v.
(ii) A = [a_1 ··· a_n], where ‖a_i‖ = 1 for i = 1, ..., n.
(iii) The problem A^T y > 0 is feasible. In particular, τ_F > 0.

For j = 1, ..., n let e_j ∈ R^n denote the vector with jth component equal to one and all other components equal to zero.

Rescaled Perceptron Algorithm

1. let B := I; Ã := A; N := 6mn²
2. (Perceptron Phase)
   x_0 := 0 ∈ R^n; y_0 := 0 ∈ R^m
   for k = 0, 1, ..., N
       if Ã^T y_k > 0 then HALT and output By_k
       else
           let j ∈ {1, ..., n} be such that ã_j^T y_k ≤ 0
           x_{k+1} := x_k + e_j
           y_{k+1} := y_k + ã_j
       end if
   end for
3. (Rescaling Phase)
   j := argmax_{i=1,...,n} ⟨e_i, x_N⟩
   B := B(I − ½ ã_j ã_j^T); Ã := (I − ½ ã_j ã_j^T) Ã
   normalize the columns of Ã
4. Go back to Step 2.
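The pseudocode above can be rendered in code as follows. This is a minimal, unoptimized sketch: the outer-loop cap `max_outer` is an illustrative safeguard only, and the matrix I − ½ ã_j ã_j^T used in the rescaling phase is the inverse of the stretch I + ã_j ã_j^T for a unit vector ã_j:

```python
import numpy as np

def rescaled_perceptron(A, max_outer=50):
    """Sketch of the rescaled perceptron algorithm for A^T y > 0."""
    A = A / np.linalg.norm(A, axis=0)   # unit columns (Assumption 1)
    m, n = A.shape
    B = np.eye(m)
    At = A.copy()
    N = 6 * m * n * n
    for _ in range(max_outer):
        # Perceptron phase: at most N + 1 updates
        x = np.zeros(n)
        y = np.zeros(m)
        for _ in range(N + 1):
            prods = At.T @ y
            if np.all(prods > 0):
                return B @ y            # By solves the original system
            j = int(np.argmin(prods))   # a column with a_j^T y <= 0
            x[j] += 1.0
            y = y + At[:, j]
        # Rescaling phase: stretch along the most frequently used column
        a = At[:, int(np.argmax(x))]
        R = np.eye(m) - 0.5 * np.outer(a, a)
        B = B @ R
        At = R @ At
        At = At / np.linalg.norm(At, axis=0)   # renormalize columns
    return None
```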

The rescaled perceptron algorithm changes the initial constraint matrix A to a new matrix Ã = B^T A. Thus when Ã^T y > 0, the non-zero vector By returned by the algorithm solves A^T y > 0. Now we can state a special version of our main theorem.

Theorem 1 Assume A ∈ R^{m×n} satisfies Assumption 1. Then the rescaled perceptron algorithm terminates with a solution to A^T y > 0 after at most

    (1/log(1.5)) (m−1) ( log(1/τ_F) + ½ log(π) ) = O( m log(1/τ_F) )

rescaling steps. Since the algorithm performs O(mn²) perceptron updates between rescaling steps, the algorithm terminates after at most

    O( m²n² log(1/τ_F) )

perceptron updates.

The key ingredients in the proof of Theorem 1 are the three lemmas below. The first of these lemmas states that if the perceptron phase does not solve Ã^T y > 0, then the rescaling phase identifies a column ã_j of Ã that is nearly perpendicular to the feasible cone {y : Ã^T y ≥ 0}. The second lemma in turn implies that the rescaling phase increases the volume of this cone by a constant factor. The third lemma states that the volume of the initial feasible cone F = {y : A^T y ≥ 0} is bounded below by a factor of τ_F^{m−1}.

Lemma 1 If the perceptron phase in the rescaled perceptron algorithm does not find a solution to Ã^T y > 0, then the vector ã_j in the first step of the rescaling phase satisfies

    {y : Ã^T y ≥ 0} ⊆ { y : 0 ≤ ã_j^T y ≤ ‖y‖/√(6m) }.    (3)

Proof: Observe that at each iteration of the perceptron phase we have

    ‖y_{k+1}‖² = ‖y_k‖² + 2 ã_j^T y_k + 1 ≤ ‖y_k‖² + 1.

Hence ‖y_k‖ ≤ √k. Also, throughout the perceptron phase x_k ≥ 0, y_k = Ãx_k, and ‖x_{k+1}‖₁ = ‖x_k‖₁ + 1. Thus if the perceptron phase does not find a solution to Ã^T y > 0, then the last iterates y_N and x_N satisfy x_N ≥ 0, ‖x_N‖₁ = N = 6mn², and ‖y_N‖ = ‖Ãx_N‖ ≤ √N = n√(6m). In particular, the index j in the first step of the rescaling phase satisfies ⟨e_j, x_N⟩ ≥ ‖x_N‖₁/n = 6mn. Next observe that if Ã^T y ≥ 0 then

    0 ≤ 6mn · ã_j^T y ≤ ⟨e_j, x_N⟩ · ã_j^T y ≤ x_N^T Ã^T y = ⟨Ãx_N, y⟩ ≤ ‖Ãx_N‖ ‖y‖ ≤ n√(6m) ‖y‖.

So (3) follows.

The following two lemmas rely on geometric arguments concerning the unit sphere S^{m−1} := {u ∈ R^m : ‖u‖ = 1}. Given a measurable set C ⊆ S^{m−1}, let Vol(C) denote its volume in S^{m−1}.

We rely on the following construction proposed by Betke [7]. Given a ∈ S^{m−1} and α > 1, let Ψ_{a,α} : S^{m−1} → S^{m−1} denote the transformation

    u ↦ (I + (α−1) a a^T) u / ‖(I + (α−1) a a^T) u‖ = ( u + (α−1)(a^T u) a ) / √( 1 + (α²−1)(a^T u)² ).

This transformation stretches the sphere in the direction a. The magnitude of the stretch is determined by α.

Lemma 2 Assume a ∈ S^{m−1}, 0 < δ < 1, and α > 1. If C ⊆ {y ∈ S^{m−1} : 0 ≤ a^T y ≤ δ} is a measurable set, then

    Vol( Ψ_{a,α}(C) ) ≥ ( α / (1 + δ²(α²−1))^{m/2} ) Vol(C).    (4)

In particular, if δ = 1/√(6m) and α = 2 then

    Vol( Ψ_{a,α}(C) ) ≥ 1.5 Vol(C).    (5)

Proof: Without loss of generality assume a = e_m. Also, for ease of notation, we shall write Ψ as shorthand for Ψ_{a,α}. Under these assumptions, for y = (ȳ, y_m) ∈ S^{m−1} we have

    Ψ(ȳ, y_m) = (ȳ, α y_m) / √( α² + (1 − α²) ‖ȳ‖² ).

To calculate the volume of C and of Ψ(C), consider the differentiable map Φ : B^{m−1} → R^m defined by Φ(v̄) = (v̄, √(1 − ‖v̄‖²)), which maps the unit ball B^{m−1} := {v̄ ∈ R^{m−1} : ‖v̄‖ ≤ 1} to the surface of the hemisphere {(ȳ, y_m) ∈ S^{m−1} : y_m ≥ 0} containing the set C. The volume of C is

    Vol(C) = ∫_{Φ^{−1}(C)} ‖Φ′‖ dv̄,

where ‖Φ′‖ denotes the volume of the (m−1)-dimensional parallelepiped spanned by the vectors ∂Φ/∂v_1, ..., ∂Φ/∂v_{m−1}. Likewise, the volume of Ψ(C) is

    Vol(Ψ(C)) = ∫_{Φ^{−1}(C)} ‖(Ψ ∘ Φ)′‖ dv̄.

Hence to prove (4) it suffices to show that

    ‖(Ψ ∘ Φ)′(v̄)‖ / ‖Φ′(v̄)‖ ≥ α / (1 + δ²(α²−1))^{m/2}  for all v̄ ∈ Φ^{−1}(C).    (6)

Some straightforward calculations show that for all v̄ ∈ int(B^{m−1})

    ‖(Ψ ∘ Φ)′(v̄)‖ = α / ( (α² + (1−α²)‖v̄‖²)^{m/2} √(1 − ‖v̄‖²) )  and  ‖Φ′(v̄)‖ = 1/√(1 − ‖v̄‖²).

Hence for all v̄ ∈ int(B^{m−1})

    ‖(Ψ ∘ Φ)′(v̄)‖ / ‖Φ′(v̄)‖ = α / (α² + (1−α²)‖v̄‖²)^{m/2}.

To obtain (6), observe that if v̄ ∈ Φ^{−1}(C) then 0 ≤ √(1 − ‖v̄‖²) ≤ δ and thus

    α² + (1 − α²)‖v̄‖² = 1 + (α² − 1)(1 − ‖v̄‖²) ≤ 1 + δ²(α² − 1).

If δ = 1/√(6m) and α = 2 then

    α / (1 + δ²(α²−1))^{m/2} = 2 / (1 + 1/(2m))^{m/2} ≥ 2 exp(−0.25) ≥ 1.5.

Thus (5) follows from (4).

Lemma 3 Assume F ⊆ R^m is a closed convex cone. Then

    Vol(F ∩ S^{m−1}) ≥ ½ (τ_F/√π)^{m−1} Vol(S^{m−1}).    (7)

Proof: From the definition of the cone width it follows that B(z, τ_F) ⊆ F for some z with ‖z‖ = 1. Therefore z + v ∈ F for all v ∈ R^m such that ‖v‖ ≤ τ_F and ⟨z, v⟩ = 0. This implies that F ∩ S^{m−1} contains a spherical cap of S^{m−1} with base radius τ_F. Hence

    Vol(F ∩ S^{m−1}) ≥ τ_F^{m−1} Vol(B^{m−1}).

The bound (7) now follows from the facts Vol(B^{m−1}) = π^{(m−1)/2}/Γ((m+1)/2), Vol(S^{m−1}) = 2π^{m/2}/Γ(m/2), and Γ((m+1)/2) ≤ π^{(m−2)/2} Γ(m/2).

Proof of Theorem 1: Let F̃ := {y ∈ R^m : Ã^T y ≥ 0}. Observe that the rescaling phase rescales F̃ to (I + ã_j ã_j^T)F̃. Therefore, Lemma 1 and Lemma 2 imply that after each rescaling phase the quantity Vol(F̃ ∩ S^{m−1}) increases by a factor of 1.5 or more. Since the set F̃ ∩ S^{m−1} is always contained in a hemisphere, we conclude that the number of rescaling steps before the algorithm halts cannot be larger than

    (1/log(1.5)) · log( (Vol(S^{m−1})/2) / Vol(F ∩ S^{m−1}) ).

To finish, apply Lemma 3.

3 General case

The gist of the algorithm for the general case of a convex cone is the same as that of the polyhedral case presented above. We just need a bit of extra work to identify a suitable direction for the rescaling phase. To do so, we maintain a collection of 2m index sets S_j, j = ±1, ±2, ..., ±m. This collection of sets helps us determine a subset of update steps that align with each other. The sum of these steps in turn defines the appropriate direction for rescaling.

Assumption 2

(i) The space R^m is endowed with the canonical dot inner product ⟨·, ·⟩.
(ii) F ⊆ R^m is the non-empty interior of a convex cone. In particular, τ_F > 0.
(iii) There is an available separation oracle for the cone F: given y ∈ R^m, the oracle either determines that y ∈ F or else it finds a non-zero vector u ∈ F* := {u : ⟨u, v⟩ > 0 for all v ∈ F} such that ⟨u, y⟩ ≤ 0.

For j = 1, ..., m let e_j ∈ R^m denote the vector with jth component equal to one and all other components equal to zero.

Observe that for a non-singular matrix B ∈ R^{m×m}, we have (B^{−1}F)* = B^T F*. Thus a separation oracle for F̃ := B^{−1}F is readily available provided one for F is: given y ∈ R^m, apply the separation oracle for F to the point By. If By ∈ F then y ∈ B^{−1}F = F̃. If By ∉ F, then let u ∈ F* be a non-zero vector such that ⟨u, By⟩ ≤ 0. Thus ⟨B^T u, y⟩ = ⟨u, By⟩ ≤ 0 with B^T u ∈ (B^{−1}F)* = F̃*. Consequently, throughout the algorithm below we assume that a separation oracle for the rescaled cone F̃ is available.

General Rescaled Perceptron Algorithm

1. let B := I; F̃ := F; N := 24m⁴
2. for j = ±1, ±2, ..., ±m
       S_j := ∅
   end for
3. (Perceptron Phase)
   u_0 := 0 ∈ R^m; y_0 := 0 ∈ R^m
   for k = 0, 1, ..., N
       if y_k ∈ F̃ then HALT and output By_k
       else
           let u_k ∈ F̃* be such that ⟨u_k, y_k⟩ ≤ 0 and ‖u_k‖ = 1
           y_{k+1} := y_k + u_k
           j := argmax_{i=1,...,m} |⟨e_i, u_k⟩|
           if ⟨e_j, u_k⟩ > 0 then S_j := S_j ∪ {k} else S_{−j} := S_{−j} ∪ {k} end if
       end if
   end for
4. (Rescaling Phase)
   i := argmax_{j=±1,...,±m} |S_j|
   d := ( Σ_{k∈S_i} u_k ) / ‖ Σ_{k∈S_i} u_k ‖
   B := B(I − ½ dd^T); F̃ := (I + dd^T)F̃
5. Go back to Step 2.

The general rescaled perceptron algorithm changes the initial cone F to F̃ = B^{−1}F. Thus when y ∈ F̃, we have By ∈ F. Notice that although the above algorithm implicitly performs this transformation, its steps do not involve inverting any matrices or solving any systems of equations. Now we can state the general version of our main theorem.

Theorem 2 Assume F ⊆ R^m is such that Assumption 2 holds. Then the general rescaled perceptron algorithm terminates with a solution to y ∈ F after at most

    (1/log(1.5)) (m−1) ( log(1/τ_F) + ½ log(π) ) = O( m log(1/τ_F) )

rescaling steps. Since the algorithm performs O(m⁴) perceptron updates between rescaling steps, the algorithm terminates after at most

    O( m⁵ log(1/τ_F) )

perceptron updates.

The proof of Theorem 2 is almost identical to the proof of Theorem 1. All we need is the following analog of Lemma 1.

Lemma 4 If the perceptron phase in the general rescaled perceptron algorithm does not find a solution to y ∈ F̃, then the vector d in the rescaling phase satisfies

    F̃ ⊆ { y : 0 ≤ ⟨d, y⟩ ≤ ‖y‖/√(6m) }.    (8)

Proof: Proceeding as in the proof of Lemma 1, it is easy to see that if the perceptron phase does not find a solution to y ∈ F̃, then the last iterate y_N = Σ_{k=0}^{N−1} u_k satisfies ‖y_N‖ ≤ √N = √(24m⁴). Since {e_1, ..., e_m} is an orthonormal basis and each u_k satisfies ‖u_k‖ = 1, we have |⟨e_j, u_k⟩| ≥ 1/√m for j = argmax_{i=1,...,m} |⟨e_i, u_k⟩|. Furthermore, since Σ_{j=±1,...,±m} |S_j| = N = 24m⁴, it follows that the set S_i in the rescaling phase must have at least 12m³ elements. Thus

    ‖ Σ_{k∈S_i} u_k ‖ ≥ | Σ_{k∈S_i} ⟨e_{|i|}, u_k⟩ | ≥ |S_i|/√m ≥ 12m^{5/2}.    (9)

On the other hand, for all y ∈ F̃ we have

    0 ≤ ⟨ Σ_{k∈S_i} u_k, y ⟩ ≤ ⟨ Σ_{k=0}^{N−1} u_k, y ⟩ = ⟨y_N, y⟩ ≤ ‖y_N‖ ‖y‖ ≤ √(24m⁴) ‖y‖.    (10)

Putting (9) and (10) together, it follows that for all y ∈ F̃

    0 ≤ ⟨d, y⟩ = ⟨ Σ_{k∈S_i} u_k, y ⟩ / ‖ Σ_{k∈S_i} u_k ‖ ≤ √(24m⁴) ‖y‖ / (12m^{5/2}) = ‖y‖/√(6m).

Hence (8) holds.
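A minimal sketch of the general algorithm follows. It assumes a user-supplied separation oracle for the original cone F (returning None for feasibility, or a unit dual vector), routes queries through B as explained above, and tracks the index sets S_j to build the rescaling direction d; the closure-based oracle interface and the `max_outer` safeguard are illustrative assumptions, not from the paper:

```python
import numpy as np

def general_rescaled_perceptron(oracle, m, max_outer=20):
    """oracle(z): None if z lies in F, else a unit u in F* with <u, z> <= 0."""
    B = np.eye(m)
    N = 24 * m**4
    for _ in range(max_outer):
        # index sets S_j, j = +-1, ..., +-m, keyed by (sign, coordinate)
        S = {(s, i): [] for s in (+1, -1) for i in range(m)}
        y = np.zeros(m)
        for _ in range(N + 1):
            u = oracle(B @ y)        # query the oracle for F at By
            if u is None:
                return B @ y         # By lies in the original cone F
            u = B.T @ u              # dual vector for the rescaled cone
            u = u / np.linalg.norm(u)
            y = y + u
            i = int(np.argmax(np.abs(u)))
            S[(1 if u[i] > 0 else -1, i)].append(u)
        # Rescaling phase: d is the normalized sum over the largest S_j
        key = max(S, key=lambda k: len(S[k]))
        d = np.sum(S[key], axis=0)
        d = d / np.linalg.norm(d)
        B = B @ (np.eye(m) - 0.5 * np.outer(d, d))
    return None
```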

4 Smooth version for the polyhedral case

Consider again the case when F = {y ∈ R^m : A^T y > 0}, where A ∈ R^{m×n}. We next show that in this case the perceptron phase can be substituted by a smooth perceptron phase by relying on the machinery developed by Soheili and Peña [21]. This leads to an algorithm with a substantially improved convergence rate but whose work per main iteration is roughly comparable to that in the rescaled perceptron algorithm. Suppose A satisfies Assumption 1. For μ > 0, let x_μ : R^m → R^n be defined by

    x_μ(y) := e^{−A^T y/μ} / ‖ e^{−A^T y/μ} ‖₁.

In this expression, e^{−A^T y/μ} is shorthand for the n-dimensional vector with components e^{−a_j^T y/μ}, j = 1, ..., n. Let 1 ∈ R^n denote the n-dimensional vector of all ones. Consider the following smooth version of the rescaled perceptron algorithm.

Smooth Rescaled Perceptron Algorithm

1. let B := I; Ã := A; N := 7n√(m log(n))
2. (Smooth Perceptron Phase)
   y_0 := Ã1/n; μ_0 := 1; x_0 := x_{μ_0}(y_0)
   for k = 0, 1, 2, ..., N
       if Ã^T y_k > 0 then HALT and output By_k
       else
           θ_k := 2/(k+3)
           y_{k+1} := (1 − θ_k)(y_k + θ_k Ãx_k) + θ_k² Ãx_{μ_k}(y_k)
           μ_{k+1} := (1 − θ_k)μ_k
           x_{k+1} := (1 − θ_k)x_k + θ_k x_{μ_{k+1}}(y_{k+1})
       end if
   end for
3. (Rescaling Phase)
   j := argmax_{i=1,...,n} ⟨e_i, x_N⟩
   B := B(I − ½ ã_j ã_j^T); Ã := (I − ½ ã_j ã_j^T) Ã
   normalize the columns of Ã
4. Go back to Step 2.

Theorem 3 Assume A ∈ R^{m×n} satisfies Assumption 1. Then the smooth rescaled perceptron algorithm terminates with a solution to A^T y > 0 after at most

    (1/log(1.5)) (m−1) ( log(1/τ_F) + ½ log(π) ) = O( m log(1/τ_F) )

rescaling steps. Since the algorithm performs O(n√(m log(n))) perceptron updates between rescaling steps, the algorithm terminates after at most

    O( mn √(m log(n)) log(1/τ_F) )

perceptron updates.

Proof: This proof is a modification of the proof of Theorem 1. It suffices to show that if the smooth perceptron phase in the rescaled perceptron algorithm does not find a solution to Ã^T y > 0, then the vector ã_j in the first step of the rescaling phase satisfies

    {y : Ã^T y ≥ 0} ⊆ { y : 0 ≤ ã_j^T y ≤ ‖y‖/√(6m) }.    (11)

Indeed, from [21, Lemma 4.1] it follows that if the smooth perceptron phase does not find a solution to Ã^T y > 0, then

    ‖Ãx_N‖ ≤ 8 log(n)/(N+1)² ≤ 8/(49mn²) ≤ 1/(6mn²).

Since x_N ≥ 0 and ‖x_N‖₁ = 1, the index j in the rescaling phase satisfies ⟨e_j, x_N⟩ ≥ 1/n. Therefore, if Ã^T y ≥ 0 then

    0 ≤ (1/n) · ã_j^T y ≤ ⟨e_j, x_N⟩ · ã_j^T y ≤ x_N^T Ã^T y ≤ ‖Ãx_N‖ ‖y‖ ≤ ‖y‖/(6mn²).

So (11) follows.

References

[1] S. Agmon. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3):382–392, 1954.

[2] E. Amaldi, P. Belotti, and R. Hauser. A randomized algorithm for the MAX FS problem. In IPCO, pages 249–264, 2005.

[3] E. Amaldi and R. Hauser. Boundedness theorems for the relaxation method. Math. Oper. Res., 30(4), 2005.

[4] H. H. Bauschke and J. M. Borwein. Legendre functions and the method of random Bregman projections. J. Convex Anal., 4:27–67, 1997.

[5] H. H. Bauschke, J. M. Borwein, and A. Lewis. The method of cyclic projections for closed convex sets in Hilbert space. Contemporary Math., 204:1–38, 1997.

[6] A. Belloni, R. Freund, and S. Vempala. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res., 34(3):621–641, 2009.

[7] U. Betke. Relaxation, new combinatorial and polynomial algorithms for the linear feasibility problem. Discrete & Computational Geometry, 32:317–338, 2004.

[8] H. D. Block. The perceptron: A model for brain functioning. Reviews of Modern Physics, 34:123–135, 1962.

[9] A. Blum, A. Frieze, R. Kannan, and S. Vempala. A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica, 22(1-2):35–52, 1998.

[10] S. Chubanov. A strongly polynomial algorithm for linear systems having a binary solution. Math. Program., 134(2):533–570, 2012.

[11] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 114(1):101–114, 2006.

[12] Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.

[13] A. Gilpin, J. Peña, and T. Sandholm. First-order algorithm with O(ln(1/ε)) convergence for ε-equilibrium in two-person zero-sum games. Math. Program., 133:279–298, 2012.

[14] J. Goffin. The relaxation method for solving systems of linear inequalities. Math. Oper. Res., 5(3):388–414, 1980.

[15] J. Goffin. On the non-polynomiality of the relaxation method for systems of linear inequalities. Math. Program., 22:93–103, 1982.

[16] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3):393–404, 1954.

[17] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615–622, 1962.

[18] B. O'Donoghue and E. J. Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, to appear.

[19] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.

[20] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: primal estimated sub-gradient solver for SVM. Math. Program., 127(1):3–30, 2011.

[21] N. Soheili and J. Peña. A smooth perceptron algorithm. SIAM Journal on Optimization, 22(2):728–737, 2012.
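As an implementation note for Section 4: the smoothed weighting x_μ(y) is a softmax-style probability vector over the columns of A. A minimal sketch follows; the max-subtraction shift is a standard numerical-stability detail assumed here, not part of the paper:

```python
import numpy as np

def x_mu(A, y, mu):
    """x_mu(y) = exp(-A^T y / mu) / || exp(-A^T y / mu) ||_1.

    Subtracting z.max() from the exponents leaves the ratio unchanged
    and avoids overflow when mu is small.
    """
    z = -(A.T @ y) / mu
    z = z - z.max()
    w = np.exp(z)
    return w / w.sum()
```

The vector x_μ(y) puts most weight on the most violated columns (smallest a_j^T y) and tends to the plain perceptron phase's argmin choice as μ → 0.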


More information

Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent

Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order

More information

COMS 4771 Introduction to Machine Learning. Nakul Verma

COMS 4771 Introduction to Machine Learning. Nakul Verma COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative

More information

Some Sieving Algorithms for Lattice Problems

Some Sieving Algorithms for Lattice Problems Foundations of Software Technology and Theoretical Computer Science (Bangalore) 2008. Editors: R. Hariharan, M. Mukund, V. Vinay; pp - Some Sieving Algorithms for Lattice Problems V. Arvind and Pushkar

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

Machine Learning. Linear Models. Fabio Vandin October 10, 2017

Machine Learning. Linear Models. Fabio Vandin October 10, 2017 Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w

More information

A Magiv CV Theory for Large-Margin Classifiers

A Magiv CV Theory for Large-Margin Classifiers A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector

More information

From the Zonotope Construction to the Minkowski Addition of Convex Polytopes

From the Zonotope Construction to the Minkowski Addition of Convex Polytopes From the Zonotope Construction to the Minkowski Addition of Convex Polytopes Komei Fukuda School of Computer Science, McGill University, Montreal, Canada Abstract A zonotope is the Minkowski addition of

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

The Perceptron Algorithm

The Perceptron Algorithm The Perceptron Algorithm Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 Outline The Perceptron Algorithm Perceptron Mistake Bound Variants of Perceptron 2 Where are we? The Perceptron

More information

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3

More information

Perceptron Mistake Bounds

Perceptron Mistake Bounds Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce

More information

Perceptron (Theory) + Linear Regression

Perceptron (Theory) + Linear Regression 10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A

More information

Solving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels

Solving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels Solving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels Y.C. Hon and R. Schaback April 9, Abstract This paper solves the Laplace equation u = on domains Ω R 3 by meshless collocation

More information

Learning Optimal Commitment to Overcome Insecurity

Learning Optimal Commitment to Overcome Insecurity Learning Optimal Commitment to Overcome Insecurity Avrim Blum Carnegie Mellon University avrim@cs.cmu.edu Nika Haghtalab Carnegie Mellon University nika@cmu.edu Ariel D. Procaccia Carnegie Mellon University

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Lecture notes on the ellipsoid algorithm

Lecture notes on the ellipsoid algorithm Massachusetts Institute of Technology Handout 1 18.433: Combinatorial Optimization May 14th, 007 Michel X. Goemans Lecture notes on the ellipsoid algorithm The simplex algorithm was the first algorithm

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

Theory and Internet Protocols

Theory and Internet Protocols Game Lecture 2: Linear Programming and Zero Sum Nash Equilibrium Xiaotie Deng AIMS Lab Department of Computer Science Shanghai Jiaotong University September 26, 2016 1 2 3 4 Standard Form (P) Outline

More information

Multilayer Perceptron

Multilayer Perceptron Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

More information

Sharp Generalization Error Bounds for Randomly-projected Classifiers

Sharp Generalization Error Bounds for Randomly-projected Classifiers Sharp Generalization Error Bounds for Randomly-projected Classifiers R.J. Durrant and A. Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/ axk

More information

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization

More information

Preliminaries. Definition: The Euclidean dot product between two vectors is the expression. i=1

Preliminaries. Definition: The Euclidean dot product between two vectors is the expression. i=1 90 8 80 7 70 6 60 0 8/7/ Preliminaries Preliminaries Linear models and the perceptron algorithm Chapters, T x + b < 0 T x + b > 0 Definition: The Euclidean dot product beteen to vectors is the expression

More information

1 Learning Linear Separators

1 Learning Linear Separators 10-601 Machine Learning Maria-Florina Balcan Spring 2015 Plan: Perceptron algorithm for learning linear separators. 1 Learning Linear Separators Here we can think of examples as being from {0, 1} n or

More information

Accelerating Stochastic Optimization

Accelerating Stochastic Optimization Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

Support Vector and Kernel Methods

Support Vector and Kernel Methods SIGIR 2003 Tutorial Support Vector and Kernel Methods Thorsten Joachims Cornell University Computer Science Department tj@cs.cornell.edu http://www.joachims.org 0 Linear Classifiers Rules of the Form:

More information

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Perceptron. Subhransu Maji. CMPSCI 689: Machine Learning. 3 February February 2015

Perceptron. Subhransu Maji. CMPSCI 689: Machine Learning. 3 February February 2015 Perceptron Subhransu Maji CMPSCI 689: Machine Learning 3 February 2015 5 February 2015 So far in the class Decision trees Inductive bias: use a combination of small number of features Nearest neighbor

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Lecture notes for quantum semidefinite programming (SDP) solvers

Lecture notes for quantum semidefinite programming (SDP) solvers CMSC 657, Intro to Quantum Information Processing Lecture on November 15 and 0, 018 Fall 018, University of Maryland Prepared by Tongyang Li, Xiaodi Wu Lecture notes for quantum semidefinite programming

More information

Lecture 15: October 15

Lecture 15: October 15 10-725: Optimization Fall 2012 Lecturer: Barnabas Poczos Lecture 15: October 15 Scribes: Christian Kroer, Fanyi Xiao Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Information-Theoretic Limits of Matrix Completion

Information-Theoretic Limits of Matrix Completion Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016 U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest

More information

Fundamental Domains for Integer Programs with Symmetries

Fundamental Domains for Integer Programs with Symmetries Fundamental Domains for Integer Programs with Symmetries Eric J. Friedman Cornell University, Ithaca, NY 14850, ejf27@cornell.edu, WWW home page: http://www.people.cornell.edu/pages/ejf27/ Abstract. We

More information

9 Classification. 9.1 Linear Classifiers

9 Classification. 9.1 Linear Classifiers 9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive

More information

CS675: Convex and Combinatorial Optimization Spring 2018 The Ellipsoid Algorithm. Instructor: Shaddin Dughmi

CS675: Convex and Combinatorial Optimization Spring 2018 The Ellipsoid Algorithm. Instructor: Shaddin Dughmi CS675: Convex and Combinatorial Optimization Spring 2018 The Ellipsoid Algorithm Instructor: Shaddin Dughmi History and Basics Originally developed in the mid 70s by Iudin, Nemirovski, and Shor for use

More information

Empirical Risk Minimization

Empirical Risk Minimization Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

LINEAR PROGRAMMING III

LINEAR PROGRAMMING III LINEAR PROGRAMMING III ellipsoid algorithm combinatorial optimization matrix games open problems Lecture slides by Kevin Wayne Last updated on 7/25/17 11:09 AM LINEAR PROGRAMMING III ellipsoid algorithm

More information

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training

More information

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization

A Sparsity Preserving Stochastic Gradient Method for Composite Optimization A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite

More information

Covering an ellipsoid with equal balls

Covering an ellipsoid with equal balls Journal of Combinatorial Theory, Series A 113 (2006) 1667 1676 www.elsevier.com/locate/jcta Covering an ellipsoid with equal balls Ilya Dumer College of Engineering, University of California, Riverside,

More information

10. Ellipsoid method

10. Ellipsoid method 10. Ellipsoid method EE236C (Spring 2008-09) ellipsoid method convergence proof inequality constraints 10 1 Ellipsoid method history developed by Shor, Nemirovski, Yudin in 1970s used in 1979 by Khachian

More information

9.2 Support Vector Machines 159

9.2 Support Vector Machines 159 9.2 Support Vector Machines 159 9.2.3 Kernel Methods We have all the tools together now to make an exciting step. Let us summarize our findings. We are interested in regularized estimation problems of

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Support Vector Machine

Support Vector Machine Andrea Passerini passerini@disi.unitn.it Machine Learning Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

More information

Learning Optimal Commitment to Overcome Insecurity

Learning Optimal Commitment to Overcome Insecurity Learning Optimal Commitment to Overcome Insecurity Avrim Blum Carnegie Mellon University avrim@cs.cmu.edu Nika Haghtalab Carnegie Mellon University nika@cmu.edu Ariel D. Procaccia Carnegie Mellon University

More information

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and

More information

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex bodies is hard Navin Goyal Microsoft Research India navingo@microsoft.com Luis Rademacher Georgia Tech lrademac@cc.gatech.edu Abstract We show that learning a convex body in R d, given

More information

Online Learning Summer School Copenhagen 2015 Lecture 1

Online Learning Summer School Copenhagen 2015 Lecture 1 Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm

Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Zhixiang Chen (chen@cs.panam.edu) Department of Computer Science, University of Texas-Pan American, 1201 West University

More information

CSC Linear Programming and Combinatorial Optimization Lecture 8: Ellipsoid Algorithm

CSC Linear Programming and Combinatorial Optimization Lecture 8: Ellipsoid Algorithm CSC2411 - Linear Programming and Combinatorial Optimization Lecture 8: Ellipsoid Algorithm Notes taken by Shizhong Li March 15, 2005 Summary: In the spring of 1979, the Soviet mathematician L.G.Khachian

More information

Private Empirical Risk Minimization, Revisited

Private Empirical Risk Minimization, Revisited Private Empirical Risk Minimization, Revisited Raef Bassily Adam Smith Abhradeep Thakurta April 10, 2014 Abstract In this paper, we initiate a systematic investigation of differentially private algorithms

More information

Efficient Learning of Linear Perceptrons

Efficient Learning of Linear Perceptrons Efficient Learning of Linear Perceptrons Shai Ben-David Department of Computer Science Technion Haifa 32000, Israel shai~cs.technion.ac.il Hans Ulrich Simon Fakultat fur Mathematik Ruhr Universitat Bochum

More information

Probabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013

Probabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013 School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two

More information

ON THE MINIMUM VOLUME COVERING ELLIPSOID OF ELLIPSOIDS

ON THE MINIMUM VOLUME COVERING ELLIPSOID OF ELLIPSOIDS ON THE MINIMUM VOLUME COVERING ELLIPSOID OF ELLIPSOIDS E. ALPER YILDIRIM Abstract. Let S denote the convex hull of m full-dimensional ellipsoids in R n. Given ɛ > 0 and δ > 0, we study the problems of

More information