A Deterministic Rescaled Perceptron Algorithm
Javier Peña    Negar Soheili

June 5, 2013

Abstract

The perceptron algorithm is a simple iterative procedure for finding a point in a convex cone F. At each iteration, the algorithm only involves a query to a separation oracle for F and a simple update on a trial solution. The perceptron algorithm is guaranteed to find a point in F after O(1/τ_F²) iterations, where τ_F is the width of the cone F. We propose a version of the perceptron algorithm that includes a periodic rescaling of the ambient space. In contrast to the classical version, our rescaled version finds a point in F in O(m⁵ log(1/τ_F)) perceptron updates. This result is inspired by and strengthens the previous work on randomized rescaling of the perceptron algorithm by Dunagan and Vempala [Math. Program. 114(1), 101–114] and by Belloni, Freund, and Vempala [Math. Oper. Res. 34(3), 621–641]. In particular, our algorithm and its complexity analysis are simpler and shorter. Furthermore, our algorithm does not require randomization or deep separation oracles.

1 Introduction

The relaxation method, introduced in the classical articles of Agmon [1], and Motzkin and Schoenberg [16], is a conceptual algorithmic scheme for solving the feasibility problem

y ∈ F.    (1)

Here F ⊆ R^m is assumed to be an open convex set with an available separation oracle: given a test point y ∈ R^m, the oracle either certifies that y ∈ F or else it finds a hyperplane separating y from F, that is, u ∈ R^m and b ∈ R such that ⟨u, y⟩ ≤ b and ⟨u, v⟩ > b for all v ∈ F. The relaxation method starts with an arbitrary initial trial solution. At each iteration, the algorithm queries the separation oracle for F at the current trial solution y. If y ∈ F then the algorithm terminates. Otherwise, the algorithm generates a new trial point y⁺ = y + ηu for some step length η > 0, where u ∈ R^m and b ∈ R determine a hyperplane separating y from F as above.
Tepper School of Business, Carnegie Mellon University, USA, jfp@andrew.cmu.edu
Tepper School of Business, Carnegie Mellon University, USA, nsoheili@andrew.cmu.edu
The perceptron algorithm can be seen as a particular type of relaxation method for the problem (1). It applies to the case when F is the interior of a convex cone. It usually starts at the origin as the initial trial solution, and each update is of the form y⁺ = y + u/‖u‖. The perceptron algorithm was originally proposed by Rosenblatt [19] for the polyhedral feasibility problem A^T y > 0. As noted by Belloni, Freund, and Vempala [6], the algorithm readily extends to the more general problem (1) when F is the interior of a convex cone, as described above. Furthermore, Belloni et al. [6, Lemma 3.2] showed that the classical perceptron iteration bound of Block [8] and Novikoff [17] also holds in general: the perceptron algorithm finds a solution to (1) in at most O(1/τ_F²) perceptron updates, where τ_F is the width of the cone F:

τ_F := sup_{‖y‖=1} { r ∈ R₊ : B(y, r) ⊆ F }.    (2)

Here B(y, r) denotes the Euclidean ball of radius r centered at y, that is, B(y, r) = {u ∈ R^m : ‖u − y‖ ≤ r}. Similar results also hold for the relaxation method, as established by Goffin [14].

Since their emergence in the fifties, both the perceptron algorithm and the relaxation method have played major roles in machine learning and in optimization. The perceptron algorithm has attractive properties concerning noise tolerance [9]. It is also closely related to large-margin classification [12] and to the highly popular and computationally effective Pegasos algorithm [20] for training support vector machines. There are also numerous papers in the optimization literature related to various versions and variants of the relaxation method [2, 3, 4, 5, 10]. A major drawback of both the perceptron algorithm and the relaxation method is their lack of theoretical efficiency in the standard bit model of computation [15]. In particular, when F = {y : A^T y > 0} with A ∈ Z^{m×n}, the perceptron algorithm may have exponential worst-case bit-model complexity because τ_F can be exponentially small in the bit-length representation of A.
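For concreteness, the classical perceptron update y⁺ = y + u/‖u‖ specialized to F = {y : A^T y > 0} can be sketched in a few lines of pure Python. This is our illustrative sketch, not the paper's code; the separation oracle simply scans for a violated column, and the two-column instance at the end is hypothetical.

```python
import math

def perceptron(A, max_iters=10000):
    """Classical perceptron for A^T y > 0.

    A is a list of n unit columns a_1, ..., a_n (each a list of m floats).
    Repeatedly picks a violated column (a_j^T y <= 0) and sets y <- y + a_j.
    Returns a feasible y, or None if max_iters is exhausted.
    """
    m = len(A[0])
    y = [0.0] * m
    for _ in range(max_iters):
        violated = None
        for a in A:
            if sum(ai * yi for ai, yi in zip(a, y)) <= 0:
                violated = a
                break
        if violated is None:
            return y  # A^T y > 0: feasible point found
        y = [yi + ai for yi, ai in zip(y, violated)]
    return None

# Hypothetical instance: two unit columns in R^2 spanning a wide feasible cone.
s = 1 / math.sqrt(2)
A = [[1.0, 0.0], [s, s]]
y = perceptron(A)
print(y is not None and all(sum(ai * yi for ai, yi in zip(a, y)) > 0 for a in A))
```

The iteration count of this plain version degrades as 1/τ_F² when the feasible cone is narrow, which is exactly what the rescaling introduced below repairs.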
Our main contribution is a variant of the perceptron algorithm that solves (1) in O(m⁵ log(1/τ_F)) perceptron updates. In particular, when F = {y : A^T y > 0} with A ∈ Z^{m×n}, our algorithm is polynomial in the bit-length representation of A. Aside from its theoretical merits, given the close connection between the perceptron algorithm and first-order methods [21], our algorithm provides a solid foundation for potential speed-ups in the convergence of the widely popular first-order methods for large-scale convex optimization. Some results of a similar nature have been recently obtained by Gilpin et al. [13] and by O'Donoghue and Candès [18]. Our algorithm is based on a periodic rescaling of the space R^m in the same spirit as in previous work by Dunagan and Vempala [11], and by Belloni, Freund, and Vempala [6]. In contrast to the rescaling procedure in [11, 6], which is randomized and relies on a deep separation oracle, our rescaling procedure is deterministic and relies only on a separation oracle. The algorithm performs at most O(m log(1/τ_F)) rescaling steps and at most O(m⁴) perceptron updates between rescaling steps. When F = {y ∈ R^m : A^T y > 0} for A ∈ R^{m×n}, a simplified version of the algorithm has iteration bound O(m²n² log(1/τ_F)). A smooth version of this algorithm, along the lines developed by Soheili and
Peña [21], in turn has the improved iteration bound O(mn√(m log(n)) log(1/τ_F)).

Our rescaled perceptron algorithm consists of an outer loop with two main phases. The first one is a perceptron phase and the second one is a rescaling phase. The perceptron phase applies a restricted number of perceptron updates. If this phase does not find a feasible solution, then it finds a unit vector d ∈ R^m such that

F ⊆ { y ∈ R^m : 0 ≤ ⟨d, y⟩ ≤ ‖y‖/√(6m) }.

This inclusion means that the feasible cone F is nearly perpendicular to d. The second phase of the outer loop, namely the rescaling phase, stretches R^m along d and is guaranteed to enlarge the volume of the set {y ∈ F : ‖y‖ = 1} by a constant factor. This in turn implies that the algorithm must halt in at most O(m log(1/τ_F)) outer iterations.

2 Polyhedral case

For ease of exposition, we first consider the case F = {y ∈ R^m : A^T y > 0} for A ∈ R^{m×n}.

Assumption 1

i) The space R^m is endowed with the canonical dot inner product ⟨u, v⟩ := u^T v.

ii) A = [a_1 ··· a_n] where ‖a_i‖ = 1 for i = 1, ..., n.

iii) The problem A^T y > 0 is feasible. In particular, τ_F > 0.

For j = 1, ..., n let e_j ∈ R^n denote the vector with jth component equal to one and all other components equal to zero.

Rescaled Perceptron Algorithm

1. let B := I; Ã := A; N := 6mn².

2. (Perceptron Phase)
   x_0 := 0 ∈ R^n; y_0 := 0 ∈ R^m;
   for k = 0, 1, ..., N−1
     if Ã^T y_k > 0 then HALT and output By_k
     else
       let j ∈ {1, ..., n} be such that ã_j^T y_k ≤ 0
       x_{k+1} := x_k + e_j
       y_{k+1} := y_k + ã_j
     end if
   end for

3. (Rescaling Phase)
   j := argmax_{i=1,...,n} ⟨e_i, x_N⟩
   B := B(I − ½ ã_j ã_j^T); Ã := (I − ½ ã_j ã_j^T)Ã
   normalize the columns of Ã

4. Go back to Step 2.
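A pure-Python sketch of the outer loop above may help fix ideas. It is our illustrative reading of the listing rather than a faithful implementation: we take the phase length as N = 6mn² and the rescaling matrix as I − ½ ã_j ã_j^T, and the small instance at the end is hypothetical.

```python
import math

def mat_vec(M, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in M]

def rescaled_perceptron(A, max_outer=50):
    """Sketch of the rescaled perceptron algorithm for A^T y > 0.

    A is a list of n unit-norm columns in R^m.  Each outer iteration runs a
    perceptron phase of N = 6*m*n^2 updates; if the phase fails, the space is
    rescaled along the most frequently used column of the current matrix.
    """
    m, n = len(A[0]), len(A)
    N = 6 * m * n * n
    B = [[float(i == j) for j in range(m)] for i in range(m)]  # B := I
    At = [col[:] for col in A]                                 # Ã := A
    for _ in range(max_outer):
        # Perceptron phase
        x = [0.0] * n
        y = [0.0] * m
        for _ in range(N):
            j = next((j for j, a in enumerate(At)
                      if sum(ai * yi for ai, yi in zip(a, y)) <= 0), None)
            if j is None:
                # Ã^T y > 0 and Ã = B^T A, so By solves A^T y > 0
                return mat_vec(B, y)
            x[j] += 1.0
            y = [yi + ai for yi, ai in zip(y, At[j])]
        # Rescaling phase: pick the column maximizing <e_i, x_N>
        a = At[max(range(n), key=lambda i: x[i])]

        def apply_M(v):  # multiply by M = I - (1/2) a a^T (M is symmetric)
            c = 0.5 * sum(ai * vi for ai, vi in zip(a, v))
            return [vi - c * ai for vi, ai in zip(v, a)]

        B = [apply_M(row) for row in B]        # B := B M
        At = [apply_M(col) for col in At]      # Ã := M Ã
        for col in At:                         # normalize the columns of Ã
            norm = math.sqrt(sum(c * c for c in col))
            for i in range(m):
                col[i] /= norm
    return None

# Hypothetical instance with unit columns in R^2.
A = [[1.0, 0.0], [0.8, 0.6]]
y = rescaled_perceptron(A)
print(all(sum(ai * yi for ai, yi in zip(a, y)) > 0 for a in A))
```

Since M = I − ½aa^T is symmetric, both B := BM (applied row-wise) and Ã := MÃ (applied column-wise) reuse the same rank-one update, so no matrix is ever inverted.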
The rescaled perceptron algorithm changes the initial constraint matrix A to a new matrix Ã = B^T A. Thus when Ã^T y > 0, the non-zero vector By returned by the algorithm solves A^T y > 0. Now we can state a special version of our main theorem.

Theorem 1 Assume A ∈ R^{m×n} satisfies Assumption 1. Then the rescaled perceptron algorithm terminates with a solution to A^T y > 0 after at most

(1/log(1.5)) (m−1) (log(1/τ_F) + (1/2) log(2π)) = O(m log(1/τ_F))

rescaling steps. Since the algorithm performs O(mn²) perceptron updates between rescaling steps, the algorithm terminates after at most

O(m²n² log(1/τ_F))

perceptron updates.

The key ingredients in the proof of Theorem 1 are the three lemmas below. The first of these lemmas states that if the perceptron phase does not solve Ã^T y > 0, then the rescaling phase identifies a column ã_j of Ã that is nearly perpendicular to the feasible cone {y : Ã^T y ≥ 0}. The second lemma in turn implies that the rescaling phase increases the volume of this cone by a constant factor. The third lemma states that the volume of the closure of the initial feasible cone, {y : A^T y ≥ 0}, is bounded below by a factor of τ_F^{m−1}.

Lemma 1 If the perceptron phase in the rescaled perceptron algorithm does not find a solution to Ã^T y > 0 then the vector ã_j in the first step of the rescaling phase satisfies

{y : Ã^T y ≥ 0} ⊆ { y : 0 ≤ ã_j^T y ≤ ‖y‖/√(6m) }.    (3)

Proof: Observe that at each iteration of the perceptron phase we have

‖y_{k+1}‖² = ‖y_k‖² + 2 ã_j^T y_k + 1 ≤ ‖y_k‖² + 1.

Hence ‖y_k‖ ≤ √k. Also, throughout the perceptron phase x_k ≥ 0, y_k = Ã x_k, and ‖x_{k+1}‖₁ = ‖x_k‖₁ + 1. Thus if the perceptron phase does not find a solution to Ã^T y > 0 then the last iterates y_N and x_N satisfy x_N ≥ 0, ‖x_N‖₁ = N = 6mn², and ‖y_N‖ = ‖Ã x_N‖ ≤ √N = n√(6m). In particular, the index j in the first step of the rescaling phase satisfies ⟨e_j, x_N⟩ ≥ ‖x_N‖₁/n = 6mn. Next observe that if Ã^T y ≥ 0 then

0 ≤ 6mn · ã_j^T y ≤ ⟨e_j, x_N⟩ ã_j^T y ≤ x_N^T Ã^T y = ⟨Ã x_N, y⟩ ≤ ‖Ã x_N‖ ‖y‖ ≤ n√(6m) ‖y‖.

So (3) follows.

The following two lemmas rely on geometric arguments concerning the unit sphere S^{m−1} := {u ∈ R^m : ‖u‖ = 1}.
Given a measurable set C ⊆ S^{m−1}, let Vol(C) denote its volume in S^{m−1}.
We rely on the following construction proposed by Betke [7]. Given a ∈ S^{m−1} and α > 1, let Ψ_{a,α} : S^{m−1} → S^{m−1} denote the transformation

Ψ_{a,α}(u) = (I + (α−1) a a^T) u / ‖(I + (α−1) a a^T) u‖ = (u + (α−1)(a^T u) a) / √(1 + (α²−1)(a^T u)²).

This transformation stretches the sphere in the direction a. The magnitude of the stretch is determined by α.

Lemma 2 Assume a ∈ S^{m−1}, 0 < δ < 1, and α > 1. If C ⊆ {y ∈ S^{m−1} : 0 ≤ a^T y ≤ δ} is a measurable set, then

Vol(Ψ_{a,α}(C)) ≥ [ α / (1 + δ²(α²−1))^{m/2} ] Vol(C).    (4)

In particular, if δ = 1/√(6m) and α = 2 then

Vol(Ψ_{a,α}(C)) ≥ 1.5 Vol(C).    (5)

Proof: Without loss of generality assume a = e_m. Also, for ease of notation we shall write Ψ as shorthand for Ψ_{a,α}. Under these assumptions, for y = (ȳ, y_m) ∈ S^{m−1} we have

Ψ(ȳ, y_m) = (ȳ, α y_m) / √(α² − (α²−1)‖ȳ‖²).

To calculate the volume of C and of Ψ(C), consider the differentiable map Φ : B^{m−1} → R^m defined by Φ(v̄) = (v̄, √(1 − ‖v̄‖²)) that maps the unit ball B^{m−1} := {v̄ ∈ R^{m−1} : ‖v̄‖ ≤ 1} to the surface of the hemisphere {(ȳ, y_m) ∈ S^{m−1} : y_m ≥ 0} containing the set C. The volume of C is

Vol(C) = ∫_{Φ^{−1}(C)} ‖Φ′‖ dv̄,

where ‖Φ′‖ denotes the volume of the (m−1)-dimensional parallelepiped spanned by the vectors ∂Φ/∂v̄_1, ..., ∂Φ/∂v̄_{m−1}. Likewise, the volume of Ψ(C) is

Vol(Ψ(C)) = ∫_{Φ^{−1}(C)} ‖(Ψ ∘ Φ)′‖ dv̄.

Hence to prove (4) it suffices to show that

‖(Ψ ∘ Φ)′(v̄)‖ / ‖Φ′(v̄)‖ ≥ α / (1 + δ²(α²−1))^{m/2}  for all v̄ ∈ Φ^{−1}(C).    (6)

Some straightforward calculations show that for all v̄ ∈ int(B^{m−1})

‖(Ψ ∘ Φ)′(v̄)‖ = α / [ (α² − (α²−1)‖v̄‖²)^{m/2} √(1 − ‖v̄‖²) ]  and  ‖Φ′(v̄)‖ = 1 / √(1 − ‖v̄‖²).

Hence for all v̄ ∈ int(B^{m−1})

‖(Ψ ∘ Φ)′(v̄)‖ / ‖Φ′(v̄)‖ = α / (α² − (α²−1)‖v̄‖²)^{m/2}.
To obtain (6), observe that if v̄ ∈ Φ^{−1}(C) then 0 ≤ 1 − ‖v̄‖² ≤ δ² and thus

α² − (α²−1)‖v̄‖² = 1 + (α²−1)(1 − ‖v̄‖²) ≤ 1 + δ²(α²−1).

If δ = 1/√(6m) and α = 2 then

α / (1 + δ²(α²−1))^{m/2} = 2 / (1 + 1/(2m))^{m/2} ≥ 2 exp(−1/4) ≥ 1.5.

Thus (5) follows from (4).

Lemma 3 Assume F ⊆ R^m is a closed convex cone. Then

Vol(F ∩ S^{m−1}) ≥ (1/2) (τ_F/√(2π))^{m−1} Vol(S^{m−1}).    (7)

Proof: From the definition of the cone width it follows that B(z, τ_F) ⊆ F for some z with ‖z‖ = 1. Therefore z + v ∈ F for all v ∈ R^m such that ‖v‖ ≤ τ_F and ⟨z, v⟩ = 0. This implies that F ∩ S^{m−1} contains a spherical cap of S^{m−1} with base radius τ_F/√(1 + τ_F²) ≥ τ_F/√2. Hence

Vol(F ∩ S^{m−1}) ≥ (τ_F/√2)^{m−1} Vol(B^{m−1}).

The bound (7) now follows from the facts Vol(B^{m−1}) = π^{(m−1)/2}/Γ((m+1)/2), Vol(S^{m−1}) = 2π^{m/2}/Γ(m/2), and Γ((m+1)/2) ≤ √(m/2) Γ(m/2).

Proof of Theorem 1: Let F′ := {y ∈ R^m : Ã^T y ≥ 0}. Observe that the rescaling phase rescales F′ to (I + ã_j ã_j^T)F′. Therefore, Lemma 1 and Lemma 2 imply that after each rescaling phase the quantity Vol(F′ ∩ S^{m−1}) increases by a factor of 1.5 or more. Since the set F′ ∩ S^{m−1} is always contained in a hemisphere, we conclude that the number of rescaling steps before the algorithm halts cannot be larger than

log( Vol(S^{m−1}) / (2 Vol(F ∩ S^{m−1})) ) / log(1.5).

To finish, apply Lemma 3.

3 General case

The gist of the algorithm for the general case of a convex cone is the same as that of the polyhedral case presented above. We just need a bit of extra work to identify a suitable direction for the rescaling phase. To do so, we maintain a collection of 2m index sets S_j, j = ±1, ±2, ..., ±m. This collection of sets helps us determine a subset of update steps that align with each other. The sum of these steps in turn defines the appropriate direction for rescaling.

Assumption 2
i) The space R^m is endowed with the canonical dot inner product ⟨·, ·⟩.

ii) F ⊆ R^m is the non-empty interior of a convex cone. In particular, τ_F > 0.

iii) There is an available separation oracle for the cone F: Given y ∈ R^m, the oracle either determines that y ∈ F or else it finds a non-zero vector u ∈ F* := {u : ⟨u, v⟩ > 0 for all v ∈ F} such that ⟨u, y⟩ ≤ 0.

For j = 1, ..., m let e_j ∈ R^m denote the vector with jth component equal to one and all other components equal to zero. Observe that for a non-singular matrix B ∈ R^{m×m}, we have (B^{−1}F)* = B^T F*. Thus a separation oracle for F′ := B^{−1}F is readily available provided one for F is: Given y ∈ R^m, apply the separation oracle for F to the point By. If By ∈ F then y ∈ B^{−1}F = F′. If By ∉ F, then let u ∈ F* be a non-zero vector such that ⟨u, By⟩ ≤ 0. Thus ⟨B^T u, y⟩ = ⟨u, By⟩ ≤ 0 with B^T u ∈ B^T F* = (B^{−1}F)* = F′*. Consequently, throughout the algorithm below we assume that a separation oracle for the rescaled cone F′ is available.

General Rescaled Perceptron Algorithm

1. let B := I; F′ := F; N := 24m⁴.

2. for j = ±1, ±2, ..., ±m
     S_j := ∅
   end for

3. (Perceptron Phase)
   y_0 := 0 ∈ R^m;
   for k = 0, 1, ..., N−1
     if y_k ∈ F′ then HALT and output By_k
     else
       let u_k ∈ F′* be such that ⟨u_k, y_k⟩ ≤ 0 and ‖u_k‖ = 1
       y_{k+1} := y_k + u_k
       j := argmax_{i=1,...,m} |⟨e_i, u_k⟩|
       if ⟨e_j, u_k⟩ > 0 then S_j := S_j ∪ {k} else S_{−j} := S_{−j} ∪ {k} end if
     end if
   end for

4. (Rescaling Phase)
   i := argmax_{j=±1,...,±m} |S_j|
   d := Σ_{k∈S_i} u_k / ‖Σ_{k∈S_i} u_k‖
   B := B(I − ½ d d^T); F′ := (I + d d^T)F′

5. Go back to Step 2.

The general rescaled perceptron algorithm changes the initial cone F to F′ = B^{−1}F. Thus when y ∈ F′, we have By ∈ F. Notice that although the above algorithm implicitly performs this transformation, its steps do not involve inverting any matrices or solving any system of equations. Now we can state the general version of our main theorem.
Theorem 2 Assume F ⊆ R^m is such that Assumption 2 holds. Then the general rescaled perceptron algorithm terminates with a solution to y ∈ F after at most

(1/log(1.5)) (m−1) (log(1/τ_F) + (1/2) log(2π)) = O(m log(1/τ_F))

rescaling steps. Since the algorithm performs O(m⁴) perceptron updates between rescaling steps, the algorithm terminates after at most

O(m⁵ log(1/τ_F))

perceptron updates.

The proof of Theorem 2 is almost identical to the proof of Theorem 1. All we need is the following analog of Lemma 1.

Lemma 4 If the perceptron phase in the general rescaled perceptron algorithm does not find a solution to y ∈ F′ then the vector d in the rescaling phase satisfies

F′ ⊆ { y : 0 ≤ ⟨d, y⟩ ≤ ‖y‖/√(6m) }.    (8)

Proof: Proceeding as in the proof of Lemma 1, it is easy to see that if the perceptron phase does not find a solution to y ∈ F′ then the last iterate y_N = Σ_{k=0}^{N−1} u_k satisfies ‖y_N‖ ≤ √N with N = 24m⁴. Since {e_1, ..., e_m} is an orthonormal basis and each u_k satisfies ‖u_k‖ = 1, we have |⟨e_j, u_k⟩| ≥ 1/√m for j = argmax_{i=1,...,m} |⟨e_i, u_k⟩|. Furthermore, since Σ_{j=±1,...,±m} |S_j| = N = 24m⁴, it follows that the set S_i in the rescaling phase must have at least 12m³ elements. Thus

‖ Σ_{k∈S_i} u_k ‖ ≥ | ⟨e_{|i|}, Σ_{k∈S_i} u_k⟩ | = | Σ_{k∈S_i} ⟨e_{|i|}, u_k⟩ | ≥ |S_i|/√m ≥ 12m^{5/2}.    (9)

On the other hand, for all y ∈ F′ we have

0 ≤ ⟨ Σ_{k∈S_i} u_k, y ⟩ ≤ ⟨ Σ_{k=0}^{N−1} u_k, y ⟩ = ⟨y_N, y⟩ ≤ ‖y_N‖ ‖y‖ ≤ √24 m² ‖y‖.    (10)

Putting (9) and (10) together, it follows that for all y ∈ F′

0 ≤ ⟨d, y⟩ = ⟨ Σ_{k∈S_i} u_k, y ⟩ / ‖ Σ_{k∈S_i} u_k ‖ ≤ √24 m² ‖y‖ / (12m^{5/2}) = ‖y‖/√(6m).

Hence (8) holds.
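The oracle pullback described at the start of this section (query the oracle for F at By, and return B^T u as the separating vector for F′ = B^{−1}F) is the reason the general algorithm never inverts a matrix. A minimal sketch of that pullback, with a hypothetical orthant cone standing in for F and a fixed diagonal B:

```python
def mat_vec(M, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in M]

def transformed_oracle(oracle, B, Bt):
    """Build a separation oracle for F' = B^{-1}F from one for F.

    oracle(z) returns None if z is in F, else a non-zero u in F* with
    <u, z> <= 0.  The pulled-back vector B^T u then separates y from F',
    since <B^T u, y> = <u, By> <= 0 and B^T u lies in (B^{-1}F)* = B^T F*.
    """
    def oracle_prime(y):
        u = oracle(mat_vec(B, y))
        return None if u is None else mat_vec(Bt, u)
    return oracle_prime

# Hypothetical cone F = interior of the positive orthant in R^2.
def orthant_oracle(z):
    for i, zi in enumerate(z):
        if zi <= 0:
            return [1.0 if k == i else 0.0 for k in range(len(z))]  # e_i in F*
    return None

B = [[2.0, 0.0], [0.0, 1.0]]
Bt = [[2.0, 0.0], [0.0, 1.0]]  # B is symmetric here, so B^T = B
op = transformed_oracle(orthant_oracle, B, Bt)
print(op([1.0, 1.0]) is None, op([-1.0, 1.0]))
```

In the algorithm itself B is a product of factors I − ½dd^T, so B and B^T are applied factor by factor; the inverse map never needs to be formed explicitly.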
4 Smooth version for the polyhedral case

Consider again the case when F = {y ∈ R^m : A^T y > 0}, where A ∈ R^{m×n}. We next show that in this case the perceptron phase can be substituted by a smooth perceptron phase by relying on the machinery developed by Soheili and Peña [21]. This leads to an algorithm with a substantially improved convergence rate but whose work per main iteration is roughly comparable to that in the rescaled perceptron algorithm. Suppose A satisfies Assumption 1. For µ > 0 let x_µ : R^m → R^n be defined by

x_µ(y) = e^{A^T y/µ} / ‖e^{A^T y/µ}‖₁.

In this expression e^{A^T y/µ} is shorthand for the n-dimensional vector

e^{A^T y/µ} := (e^{a_1^T y/µ}, ..., e^{a_n^T y/µ})^T.

Let 1 ∈ R^n denote the n-dimensional vector of all ones. Consider the following smooth version of the rescaled perceptron algorithm.

Smooth Rescaled Perceptron Algorithm

1. let B := I; Ã := A; N := 7n√(2m log(n)).

2. (Smooth Perceptron Phase)
   y_0 := Ã1/n; µ_0 := 1; x_0 := x_{µ_0}(y_0);
   for k = 0, 1, 2, ..., N−1
     if Ã^T y_k > 0 then HALT and output By_k
     else
       θ_k := 2/(k+3);
       y_{k+1} := (1−θ_k)(y_k + θ_k Ã x_k) + θ_k² Ã x_{µ_k}(y_k);
       µ_{k+1} := (1−θ_k)µ_k;
       x_{k+1} := (1−θ_k)x_k + θ_k x_{µ_{k+1}}(y_{k+1});
     end if
   end for

3. (Rescaling Phase)
   j := argmax_{i=1,...,n} ⟨e_i, x_N⟩
   B := B(I − ½ ã_j ã_j^T); Ã := (I − ½ ã_j ã_j^T)Ã
   normalize the columns of Ã

4. Go back to Step 2.

Theorem 3 Assume A ∈ R^{m×n} satisfies Assumption 1. Then the smooth rescaled perceptron algorithm terminates with a solution to A^T y > 0 after at
most

(1/log(1.5)) (m−1) (log(1/τ_F) + (1/2) log(2π)) = O(m log(1/τ_F))

rescaling steps. Since the algorithm performs O(n√(m log(n))) perceptron updates between rescaling steps, the algorithm terminates after at most

O(mn√(m log(n)) log(1/τ_F))

perceptron updates.

Proof: This proof is a modification of the proof of Theorem 1. It suffices to show that if the smooth perceptron phase in the rescaled perceptron algorithm does not find a solution to Ã^T y > 0 then the vector ã_j in the first step of the rescaling phase satisfies

{y : Ã^T y ≥ 0} ⊆ { y : 0 ≤ ã_j^T y ≤ ‖y‖/√(6m) }.    (11)

Indeed, from [21, Lemma 4.1] it follows that if the smooth perceptron phase does not find a solution to Ã^T y > 0, then

‖Ã x_N‖² ≤ 8 log(n)/(N+1)² ≤ 8/(49·2mn²) ≤ 1/(6mn²).

Since x_N ≥ 0 and ‖x_N‖₁ = 1, the index j in the rescaling phase satisfies ⟨e_j, x_N⟩ ≥ 1/n. Therefore, if Ã^T y ≥ 0 then

0 ≤ (1/n) ã_j^T y ≤ ⟨e_j, x_N⟩ ã_j^T y ≤ x_N^T Ã^T y = ⟨Ã x_N, y⟩ ≤ ‖Ã x_N‖ ‖y‖ ≤ ‖y‖/(n√(6m)).

So (11) follows.

References

[1] S. Agmon. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3):382–392, 1954.

[2] E. Amaldi, P. Belotti, and R. Hauser. A randomized algorithm for the MAX FS problem. In IPCO, pages 249–264, 2005.

[3] E. Amaldi and R. Hauser. Boundedness theorems for the relaxation method. Math. Oper. Res., 30(4):939–955, 2005.

[4] H. H. Bauschke and J. M. Borwein. Legendre functions and the method of random Bregman projections. J. Convex Anal., 4:27–67, 1997.

[5] H. H. Bauschke, J. M. Borwein, and A. Lewis. The method of cyclic projections for closed convex sets in Hilbert space. Contemporary Mathematics, 204:1–38, 1997.

[6] A. Belloni, R. Freund, and S. Vempala. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res., 34(3):621–641, 2009.
[7] U. Betke. Relaxation, new combinatorial and polynomial algorithms for the linear feasibility problem. Discrete & Computational Geometry, 32:317–338, 2004.

[8] H. D. Block. The perceptron: A model for brain functioning. Reviews of Modern Physics, 34:123–135, 1962.

[9] A. Blum, A. Frieze, R. Kannan, and S. Vempala. A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica, 22(1-2):35–52, 1998.

[10] S. Chubanov. A strongly polynomial algorithm for linear systems having a binary solution. Math. Program., 134:533–570, 2012.

[11] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 114(1):101–114, 2006.

[12] Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37:277–296, 1999.

[13] A. Gilpin, J. Peña, and T. Sandholm. First-order algorithm with O(ln(1/ε)) convergence for ε-equilibrium in two-person zero-sum games. Math. Program., 133:279–298, 2012.

[14] J. Goffin. The relaxation method for solving systems of linear inequalities. Math. Oper. Res., 5:388–414, 1980.

[15] J. Goffin. On the non-polynomiality of the relaxation method for systems of linear inequalities. Math. Program., 22:93–103, 1982.

[16] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3):393–404, 1954.

[17] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615–622, 1962.

[18] B. O'Donoghue and E. J. Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, to appear.

[19] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.

[20] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: primal estimated sub-gradient solver for SVM. Math. Program., 127:3–30, 2011.

[21] N. Soheili and J. Peña.
A smooth perceptron algorithm. SIAM Journal on Optimization, 22(2):728–737, 2012.
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationPerceptron Mistake Bounds
Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce
More informationPerceptron (Theory) + Linear Regression
10601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Perceptron (Theory) Linear Regression Matt Gormley Lecture 6 Feb. 5, 2018 1 Q&A
More informationSolving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels
Solving the 3D Laplace Equation by Meshless Collocation via Harmonic Kernels Y.C. Hon and R. Schaback April 9, Abstract This paper solves the Laplace equation u = on domains Ω R 3 by meshless collocation
More informationLearning Optimal Commitment to Overcome Insecurity
Learning Optimal Commitment to Overcome Insecurity Avrim Blum Carnegie Mellon University avrim@cs.cmu.edu Nika Haghtalab Carnegie Mellon University nika@cmu.edu Ariel D. Procaccia Carnegie Mellon University
More informationOnline Learning, Mistake Bounds, Perceptron Algorithm
Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which
More informationLecture notes on the ellipsoid algorithm
Massachusetts Institute of Technology Handout 1 18.433: Combinatorial Optimization May 14th, 007 Michel X. Goemans Lecture notes on the ellipsoid algorithm The simplex algorithm was the first algorithm
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationTheory and Internet Protocols
Game Lecture 2: Linear Programming and Zero Sum Nash Equilibrium Xiaotie Deng AIMS Lab Department of Computer Science Shanghai Jiaotong University September 26, 2016 1 2 3 4 Standard Form (P) Outline
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationSharp Generalization Error Bounds for Randomly-projected Classifiers
Sharp Generalization Error Bounds for Randomly-projected Classifiers R.J. Durrant and A. Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/ axk
More informationSparse Optimization Lecture: Dual Certificate in l 1 Minimization
Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization
More informationPreliminaries. Definition: The Euclidean dot product between two vectors is the expression. i=1
90 8 80 7 70 6 60 0 8/7/ Preliminaries Preliminaries Linear models and the perceptron algorithm Chapters, T x + b < 0 T x + b > 0 Definition: The Euclidean dot product beteen to vectors is the expression
More information1 Learning Linear Separators
10-601 Machine Learning Maria-Florina Balcan Spring 2015 Plan: Perceptron algorithm for learning linear separators. 1 Learning Linear Separators Here we can think of examples as being from {0, 1} n or
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationOnline Convex Optimization
Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed
More informationSupport Vector and Kernel Methods
SIGIR 2003 Tutorial Support Vector and Kernel Methods Thorsten Joachims Cornell University Computer Science Department tj@cs.cornell.edu http://www.joachims.org 0 Linear Classifiers Rules of the Form:
More informationMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationPerceptron. Subhransu Maji. CMPSCI 689: Machine Learning. 3 February February 2015
Perceptron Subhransu Maji CMPSCI 689: Machine Learning 3 February 2015 5 February 2015 So far in the class Decision trees Inductive bias: use a combination of small number of features Nearest neighbor
More informationChapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.
Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationLecture notes for quantum semidefinite programming (SDP) solvers
CMSC 657, Intro to Quantum Information Processing Lecture on November 15 and 0, 018 Fall 018, University of Maryland Prepared by Tongyang Li, Xiaodi Wu Lecture notes for quantum semidefinite programming
More informationLecture 15: October 15
10-725: Optimization Fall 2012 Lecturer: Barnabas Poczos Lecture 15: October 15 Scribes: Christian Kroer, Fanyi Xiao Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationInformation-Theoretic Limits of Matrix Completion
Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We
More informationU.C. Berkeley CS294: Spectral Methods and Expanders Handout 11 Luca Trevisan February 29, 2016
U.C. Berkeley CS294: Spectral Methods and Expanders Handout Luca Trevisan February 29, 206 Lecture : ARV In which we introduce semi-definite programming and a semi-definite programming relaxation of sparsest
More informationFundamental Domains for Integer Programs with Symmetries
Fundamental Domains for Integer Programs with Symmetries Eric J. Friedman Cornell University, Ithaca, NY 14850, ejf27@cornell.edu, WWW home page: http://www.people.cornell.edu/pages/ejf27/ Abstract. We
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationCS675: Convex and Combinatorial Optimization Spring 2018 The Ellipsoid Algorithm. Instructor: Shaddin Dughmi
CS675: Convex and Combinatorial Optimization Spring 2018 The Ellipsoid Algorithm Instructor: Shaddin Dughmi History and Basics Originally developed in the mid 70s by Iudin, Nemirovski, and Shor for use
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationLINEAR PROGRAMMING III
LINEAR PROGRAMMING III ellipsoid algorithm combinatorial optimization matrix games open problems Lecture slides by Kevin Wayne Last updated on 7/25/17 11:09 AM LINEAR PROGRAMMING III ellipsoid algorithm
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationA Sparsity Preserving Stochastic Gradient Method for Composite Optimization
A Sparsity Preserving Stochastic Gradient Method for Composite Optimization Qihang Lin Xi Chen Javier Peña April 3, 11 Abstract We propose new stochastic gradient algorithms for solving convex composite
More informationCovering an ellipsoid with equal balls
Journal of Combinatorial Theory, Series A 113 (2006) 1667 1676 www.elsevier.com/locate/jcta Covering an ellipsoid with equal balls Ilya Dumer College of Engineering, University of California, Riverside,
More information10. Ellipsoid method
10. Ellipsoid method EE236C (Spring 2008-09) ellipsoid method convergence proof inequality constraints 10 1 Ellipsoid method history developed by Shor, Nemirovski, Yudin in 1970s used in 1979 by Khachian
More information9.2 Support Vector Machines 159
9.2 Support Vector Machines 159 9.2.3 Kernel Methods We have all the tools together now to make an exciting step. Let us summarize our findings. We are interested in regularized estimation problems of
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationSupport Vector Machine
Andrea Passerini passerini@disi.unitn.it Machine Learning Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
More informationLearning Optimal Commitment to Overcome Insecurity
Learning Optimal Commitment to Overcome Insecurity Avrim Blum Carnegie Mellon University avrim@cs.cmu.edu Nika Haghtalab Carnegie Mellon University nika@cmu.edu Ariel D. Procaccia Carnegie Mellon University
More informationConditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint
Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationLearning convex bodies is hard
Learning convex bodies is hard Navin Goyal Microsoft Research India navingo@microsoft.com Luis Rademacher Georgia Tech lrademac@cc.gatech.edu Abstract We show that learning a convex body in R d, given
More informationOnline Learning Summer School Copenhagen 2015 Lecture 1
Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture
More informationSubdifferential representation of convex functions: refinements and applications
Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential
More informationSome Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm
Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Zhixiang Chen (chen@cs.panam.edu) Department of Computer Science, University of Texas-Pan American, 1201 West University
More informationCSC Linear Programming and Combinatorial Optimization Lecture 8: Ellipsoid Algorithm
CSC2411 - Linear Programming and Combinatorial Optimization Lecture 8: Ellipsoid Algorithm Notes taken by Shizhong Li March 15, 2005 Summary: In the spring of 1979, the Soviet mathematician L.G.Khachian
More informationPrivate Empirical Risk Minimization, Revisited
Private Empirical Risk Minimization, Revisited Raef Bassily Adam Smith Abhradeep Thakurta April 10, 2014 Abstract In this paper, we initiate a systematic investigation of differentially private algorithms
More informationEfficient Learning of Linear Perceptrons
Efficient Learning of Linear Perceptrons Shai Ben-David Department of Computer Science Technion Haifa 32000, Israel shai~cs.technion.ac.il Hans Ulrich Simon Fakultat fur Mathematik Ruhr Universitat Bochum
More informationProbabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013
School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two
More informationON THE MINIMUM VOLUME COVERING ELLIPSOID OF ELLIPSOIDS
ON THE MINIMUM VOLUME COVERING ELLIPSOID OF ELLIPSOIDS E. ALPER YILDIRIM Abstract. Let S denote the convex hull of m full-dimensional ellipsoids in R n. Given ɛ > 0 and δ > 0, we study the problems of
More information