Sublinear Optimization for Machine Learning


Sublinear Optimization for Machine Learning

Kenneth L. Clarkson, IBM Almaden Research Center
Elad Hazan, Technion - Israel Institute of Technology
David P. Woodruff, IBM Almaden Research Center

In this paper we describe and analyze sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and L2-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of novel sampling techniques and a new multiplicative update algorithm. We give lower bounds which show the running times of many of our algorithms to be nearly best possible in the unit-cost RAM model.

1. INTRODUCTION

Linear classification is a fundamental problem of machine learning, in which positive and negative examples of a concept are represented in Euclidean space by their feature vectors, and we seek to find a hyperplane separating the two classes of vectors. The Perceptron Algorithm for linear classification is one of the oldest algorithms studied in machine learning [Novikoff 1963; Minsky and Papert 1988]. It can be used to efficiently give a good approximate solution, if one exists, and has nice noise-stability properties which allow it to be used as a subroutine in many applications, such as learning with noise [Bylander 1994; Blum et al. 1998], boosting [Servedio 1999], and more general optimization [Dunagan and Vempala 2004]. In addition, it is extremely simple to implement: the algorithm starts with an arbitrary hyperplane, and iteratively finds a vector on which it errs, and moves in the direction of this vector by adding a multiple of it to the normal vector of the current hyperplane.

The standard implementation of the Perceptron Algorithm must iteratively find a bad vector which is classified incorrectly, that is, for which the inner product with the current normal vector has an incorrect sign. Our new algorithm is similar to the Perceptron Algorithm, in that it maintains a hyperplane and modifies it iteratively, according to the examples seen. However, instead of explicitly finding a bad vector, we run another dual learning algorithm to learn the most adversarial distribution over the vectors, and use that distribution to generate an expected bad vector. Moreover, we do not compute the inner products with the current normal vector exactly, but instead estimate them using a fast sampling-based scheme. Thus our update to the hyperplane uses a vector whose badness is determined quickly, but very crudely. We show that despite this, an approximate solution is still obtained in about the same number of iterations as the standard perceptron. So our algorithm is faster; notably, it can be executed in time sublinear in the size of the input data, and still have good output, with high probability. (Here we must make some reasonable assumptions about the way in which the data is stored, as discussed below.)

Part of this work was done while E. Hazan was at IBM Almaden Research Center. He is currently supported by Israel Science Foundation grant 810/11.

Fig. 1. Our results; the parameters are defined in the relevant sections.
- Linear classification: previous time Õ(ε^{-2} M) [Novikoff 1963]; time here Õ(ε^{-2}(n + d)) (§2.1); lower bound Ω(ε^{-2}(n + d)) (§6.1).
- Minimum enclosing ball (MEB): previous time Õ(ε^{-1/2} M) [Saha and Vishwanathan 2009]; time here Õ(ε^{-2} n + ε^{-1} d) (§3.1); lower bound Ω(ε^{-2} n + ε^{-1} d) (§6.2).
- QP in the simplex: previous time O(ε^{-1} M) [Frank and Wolfe 1956]; time here Õ(ε^{-2} n + ε^{-1} d) (§3.3).
- Las Vegas versions: additive O(M) (Cor. 2.11); lower bound Ω(M) (§6.4).
- Kernelized MEB and QP: extra factors of O(s^4) or O(q) (§5).

This technique applies more generally than to the perceptron: we also obtain sublinear-time approximation algorithms for the related problems of finding an approximate Minimum Enclosing Ball (MEB) of a set of points, and training a Support Vector Machine (SVM), in the hard margin or L2-SVM formulations. We give lower bounds that imply that our algorithms for classification are best possible, up to polylogarithmic factors, in the unit-cost RAM model, while our bounds for MEB are best possible up to an Õ(ε^{-1}) factor. For most of these bounds, we give a family of inputs such that a single coordinate, randomly planted over a large collection of input vector coordinates, determines the output to such a degree that all coordinates in the collection must be examined for even a 2/3 probability of success. Our approach can be extended to give algorithms for the kernelized versions of these problems, for some popular kernels including the Gaussian and polynomial, and also easily gives Las Vegas results, where the output guarantees always hold, and only the running time is probabilistic.¹

Our main results are given in Figure 1, using the following notation: all the problems we consider have an n × d matrix A as input, with M nonzero entries, and with each row of A having Euclidean length no more than one. The parameter ε > 0 is the additive error; for MEB, this can be a relative error, after a simple O(M) preprocessing step. We use the asymptotic notation Õ(f) = O(f · polylog(n/ε)). The parameter σ is the margin of the problem instance, explained below. The parameters s and q determine the standard deviation of a Gaussian kernel, and the degree of a polynomial kernel, respectively. The time bounds given for our algorithms, except the Las Vegas ones, are under the assumption of constant error probability; for output guarantees that hold with probability 1 − δ, our bounds should be multiplied by log(n/δ). The time bounds also require the assumption that the input data is stored in such a way that a given entry A_{i,j} can be recovered in constant time. This can be done by, for example, keeping each row A_i of A as a hash table. (Simply keeping the entries of the row in sorted order by column number is also sufficient, incurring an O(log d) overhead in running time for binary search.)

¹For MEB and the kernelized versions, we assume that the Euclidean norms of the relevant input vectors are known. Even with the addition of this linear-time step, all our algorithms improve on prior bounds, with the exception of MEB when M = o(ε^{-3/2}(n + d)).
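To illustrate the storage assumption, here is a minimal sketch (not from the paper) of the hash-table row layout in Python; the class and method names are ours.

class SparseRows:
    """Rows of an n x d matrix kept as dicts, giving O(1) expected entry lookup."""
    def __init__(self, rows):
        # rows: one dict per example A_i, mapping column index -> nonzero value
        self.rows = rows

    def entry(self, i, j):
        # Return A_{i,j}; columns absent from the dict are zeros.
        return self.rows[i].get(j, 0.0)

# Example: a 2 x 5 matrix with three nonzero entries.
A = SparseRows([{0: 0.6, 3: 0.8}, {1: 1.0}])
assert A.entry(0, 3) == 0.8 and A.entry(1, 4) == 0.0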

Formal Description: Classification. In the linear classification problem, the learner is given a set of n labeled examples in the form of d-dimensional vectors, comprising the input matrix A. The labels comprise a vector y ∈ {+1, −1}^n. The goal is to find a separating hyperplane, that is, a normal vector x in the unit Euclidean ball B such that for all i, y(i) A_i x ≥ 0; here y(i) denotes the i-th coordinate of y. As mentioned, we will assume throughout that A_i ∈ B for all i ∈ [n], where generally [m] denotes the set of integers {1, 2, ..., m}. As is standard, we may assume that the labels y(i) are all 1, by taking A_i ← −A_i for any i with y(i) = −1. The approximation version of linear classification is to find a vector x_ε ∈ B that is an ε-approximate solution, that is,

    min_i A_i x_ε ≥ max_{x ∈ B} min_i A_i x − ε.    (1)

The optimum for this formulation is obtained when ||x|| = 1, except when no separating hyperplane exists, and then the optimum x is the zero vector. Note that min_i A_i x = min_{p ∈ Δ} p^T A x, where Δ ⊆ R^n is the unit simplex {p ∈ R^n : p_i ≥ 0, Σ_i p_i = 1}. Thus we can regard the optimum as the outcome of a game to determine p^T A x, between a minimizer choosing p ∈ Δ, and a maximizer choosing x ∈ B, yielding

    σ ≡ max_{x ∈ B} min_{p ∈ Δ} p^T A x,

where this optimum σ is called the margin. From standard duality results, σ is also the optimum of the dual problem min_{p ∈ Δ} max_{x ∈ B} p^T A x, and the optimum vectors p and x are the same for both problems.

The classical Perceptron algorithm returns an ε-approximate solution to this problem in 1/ε² iterations², and total time O(ε^{-2} M). For given δ ∈ (0, 1), our new algorithm takes O(ε^{-2}(n + d)(log n) log(n/δ)) time to return an ε-approximate solution with probability at least 1 − δ. Further, we show this is optimal in the unit-cost RAM model, up to poly-logarithmic factors.

Formal Description: Minimum Enclosing Ball (MEB). The MEB problem is to find the smallest Euclidean ball in R^d containing the rows of A. It is a special case of quadratic programming (QP) in the unit simplex, namely, to find min_{p ∈ Δ} p^T b + p^T A A^T p, where b is an n-vector. This relationship, and the generalization of our MEB algorithm to QP in the simplex, is discussed in §3.3; for more general background on QP in the simplex, and related problems, see for example [Clarkson 2008].

1.1. Related work

Perhaps the most closely related work is that of [Grigoriadis and Khachiyan 1995], who showed how to approximately solve a zero-sum game up to additive precision ε in time Õ(ε^{-2}(n + d)), where the game matrix is n × d. This problem is analogous to ours, and our algorithm is similar in structure to theirs, but where we minimize over p ∈ Δ and maximize over x ∈ B, their optimization has not only p but also x in a unit simplex. Their algorithm (and ours) relies on sampling based on x and p, to estimate inner products x^T v or p^T w for vectors v and w that are rows or columns of A. For a vector p ∈ Δ, this estimation is easily done by returning w(i) with probability p(i).

²Strictly speaking, this is true only for ε equal to the margin, denoted σ and defined below. Yet a slight modification of the perceptron gives this running time for any small enough ε > 0.
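As a concrete illustration of the ℓ1-sampling estimator just described (return w(i) with probability p(i)), here is a minimal Python sketch; the function name is ours.

import random

def l1_sample_dot(p, w, rng=random):
    """One-coordinate unbiased estimate of p^T w, for p in the unit simplex.

    Pick index i with probability p(i) and return w(i); the expectation over
    the choice of i is exactly sum_i p(i)*w(i), and the estimate is bounded
    in magnitude by max_i |w(i)|.
    """
    i = rng.choices(range(len(p)), weights=p, k=1)[0]
    return w[i]

# Averaging independent estimates concentrates around p^T w:
p = [0.5, 0.25, 0.25]
w = [0.2, -0.4, 0.8]
est = sum(l1_sample_dot(p, w) for _ in range(20000)) / 20000
# est is close to 0.5*0.2 + 0.25*(-0.4) + 0.25*0.8 = 0.2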

For vectors x ∈ B, however, the natural estimation technique is to pick i with probability x(i)²/||x||², and return v(i) ||x||²/x(i). The estimator from this ℓ2 sample is less well-behaved, since it is unbounded, and can have a high variance. While ℓ2 sampling has been used in streaming applications [Monemizadeh and Woodruff 2010], it has not previously found applications in optimization due to this high variance problem. Indeed, it might seem surprising that sublinearity is at all possible, given that the correct classifier might be determined by very few examples, as shown in Figure 2. It thus seems necessary to go over all examples at least once, instead of looking at noisy estimates based on sampling.

Fig. 2. The optimum x* is determined by the vectors near the horizontal axis.

However, as we show, in our setting there is a version of the fundamental Multiplicative Weights (MW) technique that can cope with unbounded updates, and for which the variance of ℓ2-sampling is manageable. In our version of MW, the multiplier associated with a value z is quadratic in z, in contrast to the more standard multiplier that is exponential in z; while the latter is a fundamental building block in approximate optimization algorithms, as discussed in [Plotkin et al. 1991], in our setting such exponential updates can lead to a prohibitively large number of iterations. We analyze MW from the perspective of on-line optimization, and show that our version of MW has low expected regret given only that the random updates have the variance bounds provable for ℓ2 sampling. We also use another technique from on-line optimization, a gradient descent variant which is better suited for the ball.

For the special case of zero-sum games in which the entries are all non-negative (this is equivalent to packing and covering linear programs), [Koufogiannakis and Young 2007] give a sublinear-time algorithm which returns a relative approximation in time Õ(ε^{-2}(n + d)). Our lower bounds show that a similar relative approximation bound for sublinear algorithms is impossible for general classification, and hence general linear programming.

2. LINEAR CLASSIFICATION AND THE PERCEPTRON

Before our algorithm, some reminders and further notation: Δ ⊆ R^n is the unit simplex {p ∈ R^n : p_i ≥ 0, Σ_i p_i = 1}, B ⊆ R^d is the Euclidean unit ball, and the unsubscripted ||x|| denotes the Euclidean norm ||x||_2. The n-vector all of whose entries are one is denoted by 1_n. The i-th row of the input matrix A is denoted A_i, although a vector is a column vector unless otherwise indicated. The i-th coordinate of vector v is denoted v(i). For a vector v, we let v² denote the vector whose coordinates have v²(i) = v(i)² for all i.
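The following is a minimal Python sketch (names ours) of the ℓ2-sampling estimator described above: it is unbiased for v^T x, its second moment is ||v||² ||x||² (at most one when both vectors lie in the unit ball), but a single estimate is unbounded.

import random

def l2_sample_dot(x, v, rng=random):
    """Unbiased estimate of v^T x via l2-sampling.

    Pick j with probability x(j)^2 / ||x||^2 and return v(j) * ||x||^2 / x(j).
    Then E[estimate] = v^T x and E[estimate^2] = ||v||^2 * ||x||^2.
    """
    norm_sq = sum(xj * xj for xj in x)
    if norm_sq == 0.0:
        return 0.0                      # x = 0, so v^T x = 0 exactly
    j = rng.choices(range(len(x)), weights=[xj * xj for xj in x], k=1)[0]
    return v[j] * norm_sq / x[j]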

2.1. The Sublinear Perceptron

Our sublinear perceptron algorithm is given as Algorithm 1. The algorithm maintains a vector w_t ∈ R^n, with nonnegative coordinates, and also p_t ∈ Δ, which is w_t scaled to have unit ℓ1 norm. A vector y_t ∈ R^d is maintained also, and x_t, which is y_t scaled to have Euclidean norm no larger than one. These normalizations are done on line 4. In lines 5 and 6, the algorithm is updating y_t by adding a row of A randomly chosen using p_t. This is a randomized version of Online Gradient Descent (OGD); due to the random choice of i_t, A_{i_t} is an unbiased estimator of p_t^T A, which is the gradient of p_t^T A y with respect to y. In lines 7 through 12, the algorithm is updating w_t using a column j_t of A randomly chosen based on x_t, and also using the value x_t(j_t). This is a version of the Multiplicative Weights (MW) technique for online optimization in the unit simplex, where v_t is an unbiased estimator of A x_t, the gradient of p^T A x_t with respect to p. Actually, v_t is not unbiased, after the clip operation: for z, V ∈ R, clip(z, V) ≡ min{V, max{−V, z}}, and our analysis is helped by clipping the entries of v_t; we show that the resulting slight bias is not harmful.

As discussed in §1.1, the sampling used to choose j_t (and update p_t) is ℓ2-sampling, and that for i_t, ℓ1-sampling. These techniques, which can be regarded as special cases of an ℓ_p-sampling technique, for p ∈ [1, 2], yield unbiased estimators of vector dot products. It is important for us also that ℓ2-sampling has a variance bound here; in particular, for each relevant i and t,

    E[v_t(i)²] ≤ ||A_i||² ||x_t||² ≤ 1.    (2)

Algorithm 1 Sublinear Perceptron
1: Input: ε > 0, A ∈ R^{n×d} with A_i ∈ B for i ∈ [n].
2: Let T ← 200² ε^{-2} log n, y_1 ← 0, w_1 ← 1_n, η ← √((log n)/T).
3: for t = 1 to T do
4:    p_t ← w_t/||w_t||_1, x_t ← y_t/max{1, ||y_t||}.
5:    Choose i_t ∈ [n] by i_t ← i with probability p_t(i).
6:    y_{t+1} ← y_t + (1/√(2T)) A_{i_t}
7:    Choose j_t ∈ [d] by j_t ← j with probability x_t(j)²/||x_t||².
8:    for i ∈ [n] do
9:       ṽ_t(i) ← A_i(j_t) ||x_t||²/x_t(j_t)
10:      v_t(i) ← clip(ṽ_t(i), 1/η)
11:      w_{t+1}(i) ← w_t(i)(1 − η v_t(i) + η² v_t(i)²)
12:   end for
13: end for
14: return x̄ = (1/T) Σ_t x_t

First we note the running time.

THEOREM 2.1. The sublinear perceptron takes O(ε^{-2} log n) iterations, with a total running time of O(ε^{-2}(n + d) log n).

PROOF. The algorithm iterates T = O(ε^{-2} log n) times. Each iteration requires:
(1) One ℓ2 sample per iterate, which takes O(d) time using known data structures.

(2) Sampling i_t ∼ p_t, which takes O(n) time.
(3) The update of x_t and p_t, which takes O(n + d) time.
The total running time is O(ε^{-2}(n + d) log n).

Next we analyze the output quality. The proof uses new tools from regret minimization and sampling that are the building blocks of most of our upper bound results. Let us first state the MW algorithm used in all our algorithms.

Definition 2.2 (MW algorithm). Consider a sequence of vectors q_1, ..., q_T ∈ R^n. The Multiplicative Weights (MW) algorithm is as follows. Let w_1 ← 1_n, and for t ≥ 1, 0 < η ∈ R, and all i ∈ [n],

    p_t ← w_t/||w_t||_1,    (3)
    w_{t+1}(i) ← w_t(i)(1 − η q_t(i) + η² q_t(i)²).    (4)

The following is a key lemma, which proves a novel bound on the regret of the MW algorithm above, suitable for the case where the losses are random variables with bounded variance. As opposed to previous multiplicative-updates algorithms, this is the only MW algorithm we are familiar with that does not require an upper bound on the losses/payoffs. The proof is deferred to after the main theorem and its proof.

LEMMA 2.3 (VARIANCE MW LEMMA). The MW algorithm satisfies (recall that v² denotes the vector with v²(i) = v(i)²)

    Σ_t p_t^T q_t ≤ min_{i ∈ [n]} Σ_t max{q_t(i), −1/η} + (log n)/η + η Σ_t p_t^T q_t².

The following three lemmas give concentration bounds on our random variables from their expectations. The first two are based on standard martingale analysis, and the last is a simple Markov application.

LEMMA 2.4. For η ≤ √((log n)/T), with probability at least 1 − O(1/n),

    max_{i ∈ [n]} Σ_t [v_t(i) − A_i x_t] ≤ 4ηT.

LEMMA 2.5. For η ≤ √((log n)/T), with probability at least 1 − O(1/n), it holds that

    | Σ_t A_{i_t} x_t − Σ_t p_t^T v_t | ≤ 10ηT.

LEMMA 2.6. With probability at least 1 − 1/4, it holds that Σ_t p_t^T v_t² ≤ 8T.

THEOREM 2.7 (MAIN THEOREM). With probability 1/2, the sublinear perceptron returns a solution x̄ that is an ε-approximation.

PROOF. First we use the regret bounds for lazy gradient descent to lower bound Σ_t A_{i_t} x_t, next we get an upper bound for that quantity using Lemma 2.3, and then we combine the two. By definition, A_i x* ≥ σ for all i ∈ [n], and so, using the bound of Lemma A.2,

    T σ ≤ max_{x ∈ B} Σ_t A_{i_t} x ≤ Σ_t A_{i_t} x_t + √(2T),    (5)

or rearranging,

    Σ_t A_{i_t} x_t ≥ T σ − √(2T).    (6)

Now we turn to the MW part of our algorithm. By the Variance MW Lemma 2.3, and using the clipping of v_t(i),

    Σ_t p_t^T v_t ≤ min_{i ∈ [n]} Σ_t v_t(i) + (log n)/η + η Σ_t p_t^T v_t².

By Lemma 2.4 above, with high probability, for any i ∈ [n], Σ_t v_t(i) ≤ Σ_t A_i x_t + 4ηT, so that with high probability

    Σ_t p_t^T v_t ≤ min_{i ∈ [n]} Σ_t A_i x_t + (log n)/η + η Σ_t p_t^T v_t² + 4Tη.    (7)

Combining (6) and (7) we get

    min_{i ∈ [n]} Σ_t A_i x_t ≥ −(log n)/η − η Σ_t p_t^T v_t² − 4Tη + T σ − √(2T) + Σ_t p_t^T v_t − Σ_t A_{i_t} x_t.

By Lemmas 2.5 and 2.6 we have, with probability at least 3/4 − O(1/n) ≥ 1/2,

    min_{i ∈ [n]} Σ_t A_i x_t ≥ −(log n)/η − 8ηT − 4Tη + T σ − √(2T) − 10ηT ≥ T σ − (log n)/η − O(ηT).

Dividing through by T, and using our choice of η and T, we have min_i A_i x̄ ≥ σ − ε/2 with probability at least 1/2, as claimed.

PROOF OF LEMMA 2.3. We first show an upper bound on log ||w_{T+1}||_1, then a lower bound, and then relate the two. From (4) and (3) we have

    ||w_{t+1}||_1 = Σ_{i ∈ [n]} w_{t+1}(i)
                 = Σ_{i ∈ [n]} p_t(i) ||w_t||_1 (1 − η q_t(i) + η² q_t(i)²)
                 = ||w_t||_1 (1 − η p_t^T q_t + η² p_t^T q_t²).

This implies by induction on t, and using 1 + z ≤ exp(z) for z ∈ R, that

    log ||w_{T+1}||_1 = log n + Σ_t log(1 − η p_t^T q_t + η² p_t^T q_t²) ≤ log n + Σ_t [−η p_t^T q_t + η² p_t^T q_t²].    (8)

Now for the lower bound. From (4) we have by induction on t that

    w_{T+1}(i) = Π_t (1 − η q_t(i) + η² q_t(i)²),

and so

    log ||w_{T+1}||_1 = log Σ_{i ∈ [n]} Π_t (1 − η q_t(i) + η² q_t(i)²)
                      ≥ log max_{i ∈ [n]} Π_t (1 − η q_t(i) + η² q_t(i)²)
                      = max_{i ∈ [n]} Σ_t log(1 − η q_t(i) + η² q_t(i)²)
                      ≥ max_{i ∈ [n]} Σ_t min{−η q_t(i), 1},

where the last inequality uses the fact that 1 + z + z² ≥ exp(min{z, 1}) for all z ∈ R. Putting this together with the upper bound (8), we have

    max_{i ∈ [n]} Σ_t min{−η q_t(i), 1} ≤ log n − η Σ_t p_t^T q_t + η² Σ_t p_t^T q_t².

Changing sides,

    η Σ_t p_t^T q_t ≤ −max_{i ∈ [n]} Σ_t min{−η q_t(i), 1} + log n + η² Σ_t p_t^T q_t²
                    = min_{i ∈ [n]} Σ_t max{η q_t(i), −1} + log n + η² Σ_t p_t^T q_t²,

and the lemma follows, dividing through by η.

COROLLARY 2.8 (DUAL SOLUTION). The vector p̄ ≡ Σ_t e_{i_t}/T is, with probability 1/2, an O(ε)-approximate dual solution.

PROOF. Observing in (5) that the middle expression max_{x ∈ B} Σ_t A_{i_t} x is equal to T max_{x ∈ B} p̄^T A x, we have

    T max_{x ∈ B} p̄^T A x ≤ Σ_t A_{i_t} x_t + √(2T),

or changing sides,

    Σ_t A_{i_t} x_t ≥ T max_{x ∈ B} p̄^T A x − √(2T).

Recall from (7) that with high probability,

    Σ_t p_t^T v_t ≤ min_{i ∈ [n]} Σ_t A_i x_t + (log n)/η + η Σ_t p_t^T v_t² + 4Tη.

Following the proof of the main theorem, we combine both inequalities and use Lemmas 2.5 and 2.6, so that with probability at least 1/2:

    T max_{x ∈ B} p̄^T A x ≤ min_{i ∈ [n]} Σ_t A_i x_t + (log n)/η + η Σ_t p_t^T v_t² + O(Tη) + √(2T) + Σ_t p_t^T v_t − Σ_t A_{i_t} x_t
                          ≤ T σ + O(√(T log n)).

Dividing through by T we have, with probability at least 1/2, that max_{x ∈ B} p̄^T A x ≤ σ + O(ε) for our choice of T and η.
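Putting the pieces together, the following is a compact, unoptimized Python/NumPy sketch of Algorithm 1. It uses dense arrays and recomputes quantities that an efficient implementation would maintain incrementally with the O(1)-access row structures discussed in Section 1, so it is for illustration only; the iteration count and step sizes follow the pseudocode above.

import numpy as np

def sublinear_perceptron(A, eps, rng=np.random.default_rng(0)):
    """Sketch of Algorithm 1: returns an approximate max-margin direction x_bar."""
    n, d = A.shape
    T = int(np.ceil(200**2 * np.log(n) / eps**2))   # iterations, as on line 2
    eta = np.sqrt(np.log(n) / T)
    w = np.ones(n)                                  # MW weights over examples (dual)
    y = np.zeros(d)                                 # unnormalized primal iterate
    x_sum = np.zeros(d)
    for _ in range(T):
        p = w / w.sum()
        x = y / max(1.0, np.linalg.norm(y))
        x_sum += x
        # OGD step: sample a row i_t ~ p_t and move toward it (lines 5-6).
        i_t = rng.choice(n, p=p)
        y = y + A[i_t] / np.sqrt(2 * T)
        # MW step from one l2-sampled coordinate j_t (lines 7-11).
        norm_sq = float(x @ x)
        if norm_sq > 0:
            j_t = rng.choice(d, p=(x * x) / norm_sq)
            v = A[:, j_t] * norm_sq / x[j_t]        # crude estimates of A_i x_t
        else:
            v = np.zeros(n)
        v = np.clip(v, -1.0 / eta, 1.0 / eta)
        w = w * (1.0 - eta * v + (eta * v) ** 2)
    return x_sum / T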

2.2. High Success Probability and Las Vegas

Given two vectors u, v ∈ B, we have seen that a single ℓ2-sample is an unbiased estimator of their inner product with variance at most one. Averaging 1/ε² such samples reduces the variance to ε², which reduces the standard deviation to ε. Repeating O(log(1/δ)) such estimates, and taking the median, gives an estimator denoted X_{ε,δ}, which satisfies, via a Chernoff bound:

    Pr[ |X_{ε,δ} − v^T u| > ε ] ≤ δ.

As an immediate corollary of this fact we obtain:

COROLLARY 2.9. There exists a randomized algorithm that, with probability 1 − δ, successfully determines whether a given hyperplane with normal vector x ∈ B, together with an instance of linear classification and parameter σ > 0, is an ε-approximate solution. The algorithm runs in time O(d + n ε^{-2} log(n/δ)).

PROOF. Let δ' = δ/n. Generate the random variable X_{ε,δ'} for each inner-product pair x, A_i, and return true if and only if X_{ε,δ'} ≥ σ − ε for each pair. By the observation above and taking a union bound over all n inner products, with probability 1 − δ the estimate X_{ε,δ'} was ε-accurate for all inner-product pairs, and hence the algorithm returned a correct answer. The running time includes preprocessing of x in O(d) time, and n inner-product estimates, for a total of O(d + n ε^{-2} log(n/δ)).

Hence, we can amplify the success probability of Algorithm 1 to 1 − δ for any δ > 0, albeit incurring additional poly-log factors in running time:

COROLLARY 2.10 (HIGH PROBABILITY). There exists a randomized algorithm that with probability 1 − δ returns an ε-approximate solution to the linear classification problem, and runs in expected time O((n + d) ε^{-2} log(n/δ)).

PROOF. Run Algorithm 1 log(1/δ) times to generate that many candidate solutions. By Theorem 2.7, at least one candidate solution is an ε-approximate solution with probability at least 1 − 2^{−log(1/δ)} = 1 − δ. For each candidate solution apply the verification procedure above with success probability 1 − δ/log(1/δ), so that all verifications are correct, again with probability at least 1 − δ. Hence, both events hold with probability at least 1 − 2δ. The result follows after adjusting constants. The worst-case running time comes to O((n + d) ε^{-2} log(n/δ) log(1/δ)). However, we can generate the candidate solutions and verify them one at a time, rather than all at once. The expected number of candidates we need to generate is constant.

It is also possible to obtain an algorithm that never errs:

COROLLARY 2.11 (LAS VEGAS VERSION). After O(ε^{-2} log n) iterations, the sublinear perceptron returns a solution that with probability 1/2 can be verified in O(M) time to be ε-approximate. Thus with expected O(1) repetitions, and a total of expected O(M + ε^{-2}(n + d) log n) work, a verified ε-approximate solution can be found.

PROOF. We have, for any x ∈ B and p ∈ Δ,

    min_i A_i x ≤ σ ≤ max_{x' ∈ B} p^T A x',

and so if

    min_i A_i x ≥ max_{x' ∈ B} p^T A x' − ε,    (9)

then x is an ε-approximate solution, and x will pass this test if it and p are (ε/2)-approximate solutions, and the same for p. Thus, running the algorithm for a constant factor more iterations, so that with probability 1/2, x̄ and p̄ are both (ε/2)-approximate solutions, it can be verified that both are ε-approximate solutions.

2.3. Further Optimizations

The regret of OGD as given in Lemma A.2 is smaller than that of the dual strategy of random MW. We can take advantage of this and improve the running time slightly, by replacing line 6 of the sublinear algorithm with the line shown below.

6': With probability 1/log T, let y_{t+1} ← y_t + (1/√(2T)) A_{i_t} (else do nothing).

This has the effect of increasing the regret of the primal online algorithm by a log n factor, which does not hurt the number of iterations required to converge, since the overall regret is dominated by that of the MW algorithm. Since the primal solution x_t is not updated in every iteration, we improve the running time slightly to O(ε^{-2} log n · (n + d/(log(1/ε) + log log n))). We use this technique to greater effect for the MEB problem, where it is discussed in more detail.

2.4. Implications in the PAC model

Consider the separable case of hyperplane learning, in which there exists a hyperplane classifying all data points correctly. It is well known that the concept class of hyperplanes in d dimensions with margin σ has effective dimension at most min{d, 1/σ²} + 1. Consider the case in which the margin is significant, i.e. σ ≥ 1/√d. PAC learning theory implies that the number of examples needed to attain generalization error δ is O(1/(σ²δ)). Using the method of online-to-batch conversion (see [Cesa-Bianchi et al. 2004]), and applying the online gradient descent algorithm, it is possible to obtain δ generalization error in time O(d/(σ²δ)), by going over the data once and performing a gradient step on each example.

Our algorithm improves upon this running time bound as follows: we use the sublinear perceptron to compute a σ/2-approximation to the best hyperplane over the training data, where the number of examples is taken to be n = O(1/(σ²δ)) (in order to obtain δ generalization error). As shown previously, the total running time amounts to Õ(σ^{-2}(1/(σ²δ) + d)) = Õ(1/(σ⁴δ) + d/σ²). This improves upon standard methods by a factor of Õ(σ²d), which is always an improvement by our initial assumption on σ and d.
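Returning to the estimator X_{ε,δ} used in Section 2.2, the following is a minimal Python sketch of the median-of-means construction (function names and the constant in the repetition count are ours, chosen for illustration).

import math, random, statistics

def _l2_sample(x, v, rng):
    # One l2-sample of v^T x, as in the sketch in Section 2.
    norm_sq = sum(t * t for t in x)
    if norm_sq == 0.0:
        return 0.0
    j = rng.choices(range(len(x)), weights=[t * t for t in x], k=1)[0]
    return v[j] * norm_sq / x[j]

def estimate_dot(x, v, eps, delta, rng=random):
    """Median-of-means estimator X_{eps,delta} of v^T x.

    Averages about 1/eps^2 l2-samples so the standard deviation is about eps,
    repeats O(log(1/delta)) times, and returns the median, so that
    Pr[|estimate - v^T x| > eps] <= delta by a Chernoff bound.
    """
    reps = max(1, math.ceil(8 * math.log(1.0 / delta)))
    k = max(1, math.ceil(1.0 / eps ** 2))
    means = [sum(_l2_sample(x, v, rng) for _ in range(k)) / k for _ in range(reps)]
    return statistics.median(means)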

3. STRONGLY CONVEX PROBLEMS: MEB AND SVM

3.1. Minimum Enclosing Ball

In the Minimum Enclosing Ball problem the input consists of a matrix A ∈ R^{n×d}. The rows are interpreted as vectors and the problem is to find a vector x ∈ R^d such that

    x* ∈ argmin_{x ∈ R^d} max_{i ∈ [n]} ||x − A_i||².

We further assume for this problem that all vectors A_i have Euclidean norm at most one. Denote by σ ≡ max_{i ∈ [n]} ||x* − A_i||² the squared radius of the optimal ball, and we say that a solution is ε-approximate if the ball it generates has squared radius at most σ + ε. As in the case of linear classification, to obtain tight running time bounds we use a primal-dual approach; the algorithm is given below. (This is a conceptual version of the algorithm: in the analysis of the running time, we use the fact that we can batch together the updates for w_t over the iterations for which x_t does not change.)

Algorithm 2 Sublinear Primal-Dual MEB
1: Input: ε > 0, A ∈ R^{n×d} with A_i ∈ B for i ∈ [n] and ||A_i|| known.
2: Let T ← Θ(ε^{-2} log n), y_1 ← 0, w_1 ← 1_n, η ← √((log n)/T), α ← (log T)/√(T log n).
3: for t = 1 to T do
4:    p_t ← w_t/||w_t||_1
5:    Choose i_t ∈ [n] by i_t ← i with probability p_t(i).
6:    With probability α, update y_{t+1} ← y_t + A_{i_t}, x_{t+1} ← y_{t+1}/t (else do nothing).
7:    Choose j_t ∈ [d] by j_t ← j with probability x_t(j)²/||x_t||².
8:    for i ∈ [n] do
9:       ṽ_t(i) ← −2 A_i(j_t) ||x_t||²/x_t(j_t) + ||A_i||² + ||x_t||².
10:      v_t(i) ← clip(ṽ_t(i), 1/η).
11:      w_{t+1}(i) ← w_t(i)(1 + η v_t(i) + η² v_t(i)²).
12:   end for
13: end for
14: return x̄ = (1/T) Σ_t x_t

THEOREM 3.1. Algorithm 2 runs in O(ε^{-2} log n) iterations, with a total expected running time of

    Õ(ε^{-2} n + ε^{-1} d),

and with probability 1/2, returns an ε-approximate solution.

PROOF. Except for the running time analysis, the proof of this theorem is very similar to that of Theorem 2.7, where we take advantage of a tighter regret bound for strictly convex loss functions in the case of MEB, for which the OGD algorithm with a learning rate of 1/t is known to obtain a tighter regret bound of O(log T) instead of O(√T). For presentation, we use asymptotic notation rather than computing the exact constants (as done for the linear classification problem).

Let f_t(x) = ||x − A_{i_t}||². Notice that argmin_{x ∈ B} Σ_{τ=1}^{t} f_τ(x) = (1/t) Σ_{τ=1}^{t} A_{i_τ}. By Lemma A.5, applied with f_t(x) = ||x − A_{i_t}||², with gradient bound G ≤ 4 and strong-convexity parameter H = 2, and with x* being the solution to the

instance, we have

    E_{c_t}[ Σ_t ||x_t − A_{i_t}||² ] ≤ E_{c_t}[ Σ_t ||x* − A_{i_t}||² ] + (4/α) log T ≤ T σ + (4/α) log T,    (10)

where σ is the squared MEB radius. Here the expectation is taken only over the random coin tosses for updating x_t, denoted c_t, and holds for any outcome of the indices i_t sampled from p_t and the coordinates j_t used for the ℓ2 sampling.

Now we turn to the MW part of our algorithm. By the Variance MW Lemma 2.3, using the clipping of v_t(i), and reversing inequalities to account for the change of sign, we have

    Σ_t p_t^T v_t ≥ max_{i ∈ [n]} Σ_t v_t(i) − O((log n)/η + η Σ_t p_t^T v_t²).

Using Lemmas B.4 and B.5, with high probability,

    ∀ i ∈ [n]: Σ_t v_t(i) ≥ Σ_t ||x_t − A_i||² − O(ηT),   and   | Σ_t ||x_t − A_{i_t}||² − Σ_t p_t^T v_t | = O(ηT).

Plugging these two facts into the previous inequality we have, w.h.p.,

    Σ_t ||x_t − A_{i_t}||² ≥ max_{i ∈ [n]} Σ_t ||x_t − A_i||² − O((log n)/η + η Σ_t p_t^T v_t² + Tη).

This holds w.h.p. over the random choices of {i_t, j_t}, and irrespective of the coin tosses {c_t}. Hence, we can take expectations w.r.t. {c_t}, and obtain

    E_{c_t}[ Σ_t ||x_t − A_{i_t}||² ] ≥ E_{c_t}[ max_{i ∈ [n]} Σ_t ||x_t − A_i||² ] − O((log n)/η + η Σ_t p_t^T v_t² + Tη).    (11)

Combining with equation (10), we obtain that w.h.p. over the random variables {i_t, j_t},

    T σ + (4/α) log T ≥ E_{c_t}[ max_{i ∈ [n]} Σ_t ||x_t − A_i||² ] − O((log n)/η + η Σ_t p_t^T v_t² + Tη).

Rearranging and using Lemma B.8, we have with probability at least 1/2,

    E_{c_t}[ max_{i ∈ [n]} Σ_t ||x_t − A_i||² ] ≤ O(T σ + (log T)/α + (log n)/η + Tη).

Dividing through by T and applying Jensen's inequality, we have

    E[ max_{j ∈ [n]} ||x̄ − A_j||² ] ≤ (1/T) E[ max_{i ∈ [n]} Σ_t ||x_t − A_i||² ] ≤ O(σ + (log T)/(Tα) + (log n)/(Tη) + η).

Optimizing over the values of α, η, and T, this implies that the expected error is O(ε), and so using Markov's inequality, x̄ is an O(ε)-approximate solution with probability at least 1/2.

Running time. The algorithm above consists of T = O(ε^{-2} log n) iterations. Naively, this would result in the same running time as for linear classification. Yet notice that x_t changes only an expected αT times, and only then do we perform an O(d) operation. The expected number of iterations in which x_t changes is αT ≤ 16 ε^{-1} log T, and so the running time is O(ε^{-1}(log T) d + ε^{-2}(log n) n) = Õ(ε^{-2} n + ε^{-1} d).

The following corollary is a direct analogue of Corollary 2.8.

COROLLARY 3.2 (DUAL SOLUTION). The vector p̄ ≡ Σ_t e_{i_t}/T is, with probability 1/2, an O(ε)-approximate dual solution.
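The following is an unoptimized Python/NumPy sketch of Algorithm 2, mainly to illustrate the lazy primal update: the center x_t is refreshed only with probability α per iteration, so only an expected αT iterations pay the O(d) cost. The batching of the w-updates between refreshes, which the running-time analysis relies on, is omitted here, and the constants are illustrative.

import numpy as np

def sublinear_meb(A, eps, rng=np.random.default_rng(0)):
    """Sketch of Algorithm 2 (sublinear primal-dual MEB)."""
    n, d = A.shape
    sq_norms = (A * A).sum(axis=1)                 # ||A_i||^2, assumed known
    T = int(np.ceil(np.log(n) / eps ** 2))
    eta = np.sqrt(np.log(n) / T)
    alpha = np.log(T) / np.sqrt(T * np.log(n))
    w = np.ones(n)
    y = np.zeros(d)
    x = np.zeros(d)
    x_sum = np.zeros(d)
    for t in range(1, T + 1):
        p = w / w.sum()
        i_t = rng.choice(n, p=p)
        if rng.random() < alpha:                   # lazy primal (OGD) update
            y = y + A[i_t]
            x = y / t
        x_sum += x
        # MW update from one l2-sampled coordinate: v_t(i) estimates ||x_t - A_i||^2.
        norm_sq = float(x @ x)
        if norm_sq > 0:
            j_t = rng.choice(d, p=(x * x) / norm_sq)
            v = -2.0 * A[:, j_t] * norm_sq / x[j_t] + sq_norms + norm_sq
        else:
            v = sq_norms.copy()                    # x_t = 0, distances are exactly ||A_i||^2
        v = np.clip(v, -1.0 / eta, 1.0 / eta)
        w = w * (1.0 + eta * v + (eta * v) ** 2)
    return x_sum / T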

13 Sublinear Optimization for Machine Learning 0:13 Running time. The algorithm above consists of T = O( log n ε ) iterations. Naively, this woul result in the same running time as for linear classification. Yet notice that x t changes only an expecte αt times, an only then o we perform an O() operation. The expecte number of iterations in which x t changes is αt 16ε 1 log T, an so the running time is O(ε 1 (log T ) + log n ε n)) = Õ(ε n + ε 1 ). The following Corollary is a irect analogue of Corollary.8. COROLLARY 3. (DUAL SOLUTION). 1/, an O(ε)-approximate ual solution. The vector p t e i t /T is, with probability 3.. High Success Probability an Las Vegas As for linear classification, we can amplify the success probability of Algorithm to 1 δ for any δ > 0 albeit incurring aitional poly-log factors in running time. COROLLARY 3.3 (MEB HIGH PROBABILITY). There exists a ranomize algorithm that with probability 1 δ returns an ε-approximate solution to the MEB problem, an runs in expecte time Õ( n ε log n εδ + ε log 1 ε ). There is also a ranomize algorithm that returns an ε-approximate solution in Õ(M + n ε + ε ) time. PROOF. We can estimate the istance between two points in B in O(ε log(1/δ)) time, with error at most ε an failure probability at most δ, using the ot prouct estimator escribe in.. Therefore we can estimate the maximum istance of a given point to every input point in O(nε log(n/δ)) time, with error at most ε an failure probability at most δ. This istance is σ ε, where σ is the optimal raius attainable, w.p. 1 δ. Because Algorithm yiels an ε-ual solution with probability 1/, we can use this solution to verify that the raius of any possible solution to the farthest point is at least σ ε. So, to obtain a solution as escribe in the lemma statement, run Algorithm, an verify that it yiels an ε-approximation, using this approximate ual solution; with probability 1/, this gives a verifie ε-approximation. Keep trying until this succees, in an expecte trials. For a Las Vegas algorithm, we simply apply the same scheme, but verify the istances exactly Convex Quaratic Programming in the Simplex We can exten our approach to problems of the form min p p b + p AA p, (1) where b R n, A R n, an is, as usual, the unit simplex in R n. As is well known, an as we partially review below, this problem inclues the MEB problem, margin estimation as for har margin support vector machines, the L -SVM variant of support vector machines, the problem of fining the shortest vector in a polytope, an others. Applying v x = v v + x x v x 0 with v A p, we have max p Ax x = p AA p, (13) x R

Since

    min_{p ∈ Δ} p^T ( b + 2 A x − 1_n ||x||² ) = min_i ( b(i) + 2 A_i x − ||x||² ),    (16)

with equality when p_î = 0 if î is not a minimizer, the dual can also be expressed as

    max_{x ∈ R^d} min_i ( b(i) + 2 A_i x − ||x||² ).    (17)

By the two relations (13) and (16) used to derive the dual problem from the primal, we have immediately the weak duality condition that the objective function value of the dual (17) is always no more than the objective function value of the primal (12). The strong duality condition, that the two problems take the same optimal value, also holds here; indeed, the optimum x* also solves (13), and the optimal p* also solves (16).

To generalize Algorithm 2, we make v_t an unbiased estimator of b + 2 A x_t − 1_n ||x_t||², and set x_{t+1} to be the maximizer of

    Σ_{τ ∈ [t]} ( b(i_τ) + 2 A_{i_τ} x − ||x||² ),

namely, as with MEB, y_{t+1} ← Σ_{τ ∈ [t]} A_{i_τ}, and x_{t+1} ← y_{t+1}/t. (We also make some sign changes to account for the max-min formulation here, versus the min-max formulation used for MEB above.) This allows the use of Lemma A.4 for essentially the same analysis as for MEB; the gradient bound G and Hessian bound H are both at most 4, again assuming that all A_i ∈ B.

MEB. When b(i) ≡ −||A_i||², we have

    max_{x ∈ R^d} min_i ( b(i) + 2 A_i x − ||x||² ) = −min_{x ∈ R^d} max_i ( ||A_i||² − 2 A_i x + ||x||² ) = −min_{x ∈ R^d} max_i ||x − A_i||²,

the negative of the objective function for the MEB problem.

Margin Estimation. When b ≡ 0 in the primal problem (12), that problem is one of finding the shortest vector in the polytope {A^T p : p ∈ Δ}. Considering this case of the dual problem (17): for any given x ∈ R^d with min_i A_i x ≤ 0, the value of β ∈ R such that βx maximizes min_i ( 2 A_i (βx) − ||βx||² ) is β = 0. On the other hand, if x is such that min_i A_i x > 0, the maximizing value is β = min_i A_i x/||x||², so that the solution of (17) also maximizes min_i (A_i x)²/||x||². The latter is the square of the margin σ, which as before is the minimum distance of the points A_i to the hyperplane that is normal to x and passes through the origin.

Adapting Algorithm 2 for margin estimation, and with the slight changes needed for its analysis, we have that there is an algorithm taking Õ(n/ε² + d/ε) time that finds x ∈ R^d such that, for all i ∈ [n],

    2 A_i x − ||x||² ≥ σ² − ε.

When σ² ≤ ε, we don't appear to gain any useful information. However, when σ² > ε, we have min_{i ∈ [n]} A_i x > 0, and so, by appropriate scaling of x, we have x̂ such that

    σ̂² ≡ min_{i ∈ [n]} (A_i x̂)²/||x̂||² ≥ min_{i ∈ [n]} ( 2 A_i x̂ − ||x̂||² ) ≥ σ² − ε,

and so σ̂ ≥ σ − ε/σ. That is, letting ε' ≡ ε/σ, if ε' ≤ σ, there is an algorithm taking Õ(n/(ε'σ)² + d/(ε'σ)) time that finds a solution x̂ with σ̂ ≥ σ − ε'.
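The following small NumPy check (not from the paper; the instance and names are ours) illustrates the weak duality relation between (12) and (17) for the MEB instantiation b(i) = −||A_i||²: for any p in the simplex and the corresponding x = A^T p, the dual objective never exceeds the primal one, and its negative is the squared radius of the ball centered at x.

import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 3
A = rng.normal(size=(n, d))
A /= np.maximum(1.0, np.linalg.norm(A, axis=1, keepdims=True))  # rows in the unit ball
b = -(A * A).sum(axis=1)                  # MEB instance: b(i) = -||A_i||^2

p = rng.random(n); p /= p.sum()           # an arbitrary point of the simplex
x = A.T @ p                               # the maximizer in (13) for this p

primal = p @ b + p @ (A @ A.T) @ p        # objective of (12)
dual = np.min(b + 2.0 * A @ x - x @ x)    # objective of (17) at this x

assert dual <= primal + 1e-9              # weak duality
# For MEB, -dual is exactly max_i ||x - A_i||^2 at the candidate center x.
assert np.isclose(-dual, np.max(((x - A) ** 2).sum(axis=1)))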

15 Sublinear Optimization for Machine Learning 0:15 an so ˆσ σ ɛ/σ. That is, letting ɛ ɛ σ, if ɛ σ, there is an algorithm taking Õ(n/(ɛσ) + /ɛ σ) time that fins a solution ˆx with ˆσ σ ɛ. 4. A GENERIC SUBLINEAR PRIMAL-DUAL ALGORITHM We note that our technique above can be applie more broaly to any constraine optimization problem for which low-regret algorithms exist an low-variance sampling can be applie efficiently; that is, consier the general problem with optimum σ: max min x K i c i (x) = σ. (18) Suppose that for the set K an cost functions c i (x), there exists an iterative low regret algorithm, enote LRA, with regret R(T ) = o(t ). Let T ε (LRA) be the smallest T such that R(T ) T ε. We enote by x t+1 LRA(x t, c) an invocation of this algorithm, when at state x t K an the cost function c is observe. Let Sample(x, c) be a proceure that returns an unbiase estimate of c(x) with variance at most one, that runs in constant time. Further assume c i (x) 1 for all x K, i [n]. Algorithm 3 Generic Sublinear Primal-Dual Algorithm 1: Let T max{t ε (LRA), log n ε }, x 1 LRA(initial), w 1 1 n, η log n T. : for t = 1 to T o 3: for i [n] o 4: Let v t (i) Sample(x t, c i ) 5: v t (i) clip(ṽ t (i), 1/η) 6: w t+1 (i) w t (i)(1 ηv t (i) + η v t (i) ) 7: en for 8: p t wt w t 1, 9: Choose i t [n] by i t i with probability p t (i). 10: x t LRA(x t 1, c it ) 11: en for 1: return x = 1 T t x t Applying the techniques of section we can obtain the following generic lemma. LEMMA 4.1. The generic sublinear primal-ual algorithm returns a solution x that with probability at least 1 is an ε-approximate solution in max{t ε(lra), log n ε } iterations. PROOF. First we use the regret bouns for LRA to lower boun c i t (x t ), next we get an upper boun for that quantity using the Weak Regret Lemma, an then we combine the two in expectation. By efinition, c i (x ) σ for all i [n], an so, using the LRA regret guarantee, T σ max x B c it (x) c it (x t ) + R(T ), (19)

or rearranging,

    Σ_t c_{i_t}(x_t) ≥ T σ − R(T).    (20)

Now we turn to the MW part of our algorithm. By the Variance MW Lemma 2.3, and using the clipping of v_t(i),

    Σ_t p_t^T v_t ≤ min_{i ∈ [n]} Σ_t v_t(i) + (log n)/η + η Σ_t p_t^T v_t².

Using Lemma B.4 and Lemma B.5, since the procedure Sample is unbiased and has variance at most one, with high probability:

    ∀ i ∈ [n]: Σ_t v_t(i) ≤ Σ_t c_i(x_t) + O(ηT),   and   | Σ_t c_{i_t}(x_t) − Σ_t p_t^T v_t | = O(ηT).

Plugging these two facts into the previous inequality we have, w.h.p.,

    Σ_t c_{i_t}(x_t) ≤ min_{i ∈ [n]} Σ_t c_i(x_t) + O((log n)/η + η Σ_t p_t^T v_t² + ηT).    (21)

Combining (20) and (21) we get, w.h.p.,

    min_{i ∈ [n]} Σ_t c_i(x_t) ≥ T σ − O((log n)/η + ηT + η Σ_t p_t^T v_t²) − R(T).

And via Lemma B.8 we have, with probability at least 1/2,

    min_{i ∈ [n]} Σ_t c_i(x_t) ≥ T σ − O((log n)/η + ηT) − R(T).

Dividing through by T, and using our choice of η, we have min_i c_i(x̄) ≥ σ − ε/2 with probability at least 1/2, as claimed.

High-probability results can be obtained using the same technique as for linear classification.

4.1. More applications

The generic algorithm above can be used to derive the result of [Grigoriadis and Khachiyan 1995] on sublinear approximation of zero-sum games with payoffs/losses bounded by one (up to poly-logarithmic factors in running time). A zero-sum game can be cast as the following min-max optimization problem:

    max_{x ∈ Δ_d} min_{i ∈ [n]} A_i x.

That is, the constraints are inner products with the rows of the game matrix. This is exactly the same as the linear classification problem, but the vectors x are taken from the convex set K which is the simplex — the set of all mixed strategies of the column player.

A low-regret algorithm for the simplex is the multiplicative weights algorithm, which attains regret R(T) ≤ O(√(T log d)). The procedure Sample(x, A_i) to estimate the inner product A_i x is much simpler than the one used for linear classification: we sample from the distribution x and return A_i(j) with probability x(j). This has the correct expectation, and variance bounded by one (in fact, the random variable is always bounded by one). Lemma 4.1 then implies:

COROLLARY 4.2. The sublinear primal-dual algorithm applied to zero-sum games returns a solution x̄ that with probability at least 1/2 is an ε-approximate solution, in O(ε^{-2} log n) iterations and total time Õ((n + d)/ε²).

Essentially any constrained optimization problem which has convex or linear constraints, and is over a simple convex body such as the ball or simplex, can be approximated in sublinear time using our method.
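As an illustration of the generic scheme specialized to zero-sum games, here is a minimal Python/NumPy sketch (constants and names are ours): the LRA is standard multiplicative weights over the column player's simplex, and Sample(x, A_i) returns the single entry A_i(j_t) for j_t drawn from x.

import numpy as np

def sublinear_zero_sum(A, eps, rng=np.random.default_rng(0)):
    """Sketch of Algorithm 3 for a zero-sum game with entries in [-1, 1]."""
    n, d = A.shape
    T = int(np.ceil(np.log(n + d) / eps ** 2))
    eta = np.sqrt(np.log(n) / T)            # step size of the dual MW over rows
    eta_x = np.sqrt(np.log(d) / T)          # step size of the LRA (MW over columns)
    w = np.ones(n)                          # weights over the n constraints (rows)
    u = np.ones(d)                          # weights defining x_t in the simplex
    x_sum = np.zeros(d)
    for _ in range(T):
        x = u / u.sum()
        x_sum += x
        p = w / w.sum()
        # Sample(x, A_i): one entry per row, using a shared column j_t ~ x.
        j_t = rng.choice(d, p=x)
        v = np.clip(A[:, j_t], -1.0 / eta, 1.0 / eta)   # already bounded by one here
        w = w * (1.0 - eta * v + (eta * v) ** 2)
        # LRA step: the column player observes the row i_t ~ p_t and does MW.
        i_t = rng.choice(n, p=p)
        u = u * np.exp(eta_x * A[i_t])
    return x_sum / T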

5. KERNELIZING THE SUBLINEAR ALGORITHMS

An important generalization of linear classifiers is that of kernel-based linear predictors (see e.g. [Schölkopf and Smola 2003]). Let Ψ : R^d → H be a mapping of feature vectors into a reproducing kernel Hilbert space H. In this setting, we seek a non-linear classifier given by h in the unit ball of H so as to maximize the margin:

    σ ≡ max_h min_{i ∈ [n]} ⟨h, Ψ(A_i)⟩.

The kernels of interest are those for which we can compute inner products of the form k(x, y) = ⟨Ψ(x), Ψ(y)⟩ efficiently. One popular kernel is the polynomial kernel, for which the corresponding Hilbert space is the set of polynomials over R^d of degree q. The mapping Ψ for this kernel is given by

    Ψ(x)_S = Π_{i ∈ S} x_i,   for S ⊆ [d], |S| ≤ q.

That is, all monomials of degree at most q. The kernel function in this case is given by k(x, y) = (x^T y)^q. Another useful kernel is the Gaussian kernel, k(x, y) = exp(−||x − y||²/(2s²)), where s is a parameter. The mapping here is defined by the kernel function (see [Schölkopf and Smola 2003] for more details).

The kernel version of Algorithm 1 is shown as Algorithm 4. Note that x_t and y_t are members of H, and not maintained explicitly, but rather are implicitly represented by the values i_τ. (And thus ||y_t|| is the norm of H, not R^d.) Also, Ψ(A_i) is not computed. The needed kernel product ⟨x_t, Ψ(A_i)⟩ is estimated by the procedure Kernel-ℓ2-Sampling, using the implicit representations and specific properties of the kernel being used. In the regular sublinear algorithm, this inner product could be sufficiently well approximated in O(1) time via ℓ2-sampling. As we show below, for many interesting kernels the time for Kernel-ℓ2-Sampling is not much longer.

For the analog of Theorem 2.7 to apply, we need the expectation of the estimates v_t(i) to be correct, with variance O(1). By Lemma C.1, it is enough if the estimates v_t(i) have an additive bias of O(ε). Hence, we define the procedure Kernel-ℓ2-Sampling to obtain such a not-too-biased estimator with variance at most one; first we show how to implement Kernel-ℓ2-Sampling, assuming that there is an estimator k̃(·,·) of the kernel k(·,·) such that E[k̃(x, y)] = k(x, y) and Var(k̃(x, y)) ≤ 1, and then we show how to implement such kernel estimators.

Algorithm 4 Sublinear Kernel Perceptron
1: Input: ε > 0, A ∈ R^{n×d} with A_i ∈ B for i ∈ [n].
2: Let T ← 200² ε^{-2} log n, y_1 ← 0, w_1 ← 1_n, η ← √((log n)/T).
3: for t = 1 to T do
4:    p_t ← w_t/||w_t||_1, x_t ← y_t/max{1, ||y_t||}.
5:    Choose i_t ∈ [n] by i_t ← i with probability p_t(i).
6:    y_{t+1} ← Σ_{τ ∈ [t]} Ψ(A_{i_τ})/√(2T).
7:    for i ∈ [n] do
8:       ṽ_t(i) ← Kernel-ℓ2-Sampling(x_t, Ψ(A_i))   (estimating ⟨x_t, Ψ(A_i)⟩)
9:       v_t(i) ← clip(ṽ_t(i), 1/η).
10:      w_{t+1}(i) ← w_t(i)(1 − η v_t(i) + η² v_t(i)²).
11:   end for
12: end for
13: return x̄ = (1/T) Σ_t x_t

5.1. Implementing Kernel-ℓ2-Sampling

Estimating ||y_t||. A key step in Kernel-ℓ2-Sampling is the estimation of ||y_t||, which readily reduces to estimating

    Y_t ≡ 2T ||y_t||²/t² = (1/t²) Σ_{τ,τ' ∈ [t]} k(A_{i_τ}, A_{i_τ'}),

that is, the mean of the summands. Since we use max{1, ||y_t||}, we need not be concerned with small ||y_t||, and it is enough that the additive bias in our estimate of Y_t be at most ε/T ≤ ε·(2T/t²) for t ∈ [T], implying a bias for ||y_t|| of no more than ε. Since we need 1/||y_t|| in the algorithm, it is not enough for estimates of Y_t just to be good in mean and variance; we will find an estimator whose error bounds hold with high probability.

Our estimate Ỹ_t of Y_t can first be considered assuming we only need to make an estimate for a single value of t. Let N_Y ≡ (8/3) log(1/δ) T²/ε². To estimate Y_t, we compute, for each τ, τ' ∈ [t], n_t ≡ N_Y/t² independent estimates X_{τ,τ',m} ≡ clip(k̃(A_{i_τ}, A_{i_τ'}), T/ε), for m ∈ [n_t], and our estimate is Ỹ_t ≡ Σ_{τ,τ' ∈ [t]} Σ_{m ∈ [n_t]} X_{τ,τ',m}/N_Y.

LEMMA 5.1. With probability at least 1 − δ, |Y_t − Ỹ_t| ≤ ε/T.

PROOF. We apply Bernstein's inequality (as in (31)) to the N_Y random variables X_{τ,τ',m} − E[X_{τ,τ',m}], which have mean zero, variance at most one, and are at most T/ε in magnitude. Bernstein's inequality implies, using Var[X_{τ,τ',m}] ≤ 1,

    log Prob{ Σ_{τ,τ' ∈ [t]} Σ_{m ∈ [n_t]} ( X_{τ,τ',m} − E[X_{τ,τ',m}] ) > α } ≤ −α²/( 2 N_Y + (2/3)(T/ε) α ),

and putting α ≡ N_Y ε/T gives

    log Prob{ Ỹ_t − E[Ỹ_t] > ε/T } ≤ −N_Y²(ε/T)²/( 2 N_Y + (2/3) N_Y ) = −(3/8) N_Y (ε/T)² = −log(1/δ).

Similar reasoning for −X_{τ,τ',m}, and the union bound, imply the lemma, after adjusting δ by a constant factor.

To compute Ỹ_t for t = 1, ..., T, we can save some work by reusing estimates from one t to the next. Now let N_Y ≡ (8/3) log(1/δ) T²/ε². Compute Ỹ_1 as above for t = 1, and let Ŷ_1 ≡ Ỹ_1. For t > 1, let n_t ≡ N_Y/t², and let

    Ŷ_t ≡ Σ_{m ∈ [n_t]} X_{t,t,m}/n_t + Σ_{τ ∈ [t−1]} Σ_{m ∈ [n_t]} ( X_{t,τ,m} + X_{τ,t,m} )/n_t,

and return Ỹ_t ≡ Σ_{τ ∈ [t]} Ŷ_τ/t². Since for each τ and τ', the expected total contribution of all X_{τ,τ',m} terms to Ỹ_t is k(A_{i_τ}, A_{i_τ'})/t², we have E[Ỹ_t] = Y_t. Moreover, the number of instances of X_{τ,τ',m} averaged to compute Ỹ_t is always at least as large as the number used for the above batch version; it follows that the total variance of Ỹ_t is non-increasing in t, and therefore Lemma 5.1 holds also for the Ỹ_t computed stepwise. Since the number of calls to k̃(·,·) is Σ_{t ∈ [T]} O(t·n_t) = O(N_Y log T), we have the following lemma.

LEMMA 5.2. The values Ỹ_t (t²/(2T)) ≈ ||y_t||², for t ∈ [T], can be estimated with O(log(1/(εδ)) T²/ε²) calls to k̃(·,·), so that with probability at least 1 − δ, |Ỹ_t (t²/(2T)) − ||y_t||²| ≤ ε. The values ||y_t||, t ∈ [T], can be computed exactly with T² calls to the exact kernel k(·,·).

PROOF. This follows from the discussion above, applying the union bound over t ∈ [T], and adjusting constants. The claim for exact computation is straightforward.

Given this procedure for estimating ||y_t||, we can describe Kernel-ℓ2-Sampling. Since x_{t+1} = y_{t+1}/max{1, ||y_{t+1}||}, we have

    ⟨x_{t+1}, Ψ(A_i)⟩ = (1/max{1, ||y_{t+1}||}) Σ_{τ ∈ [t]} ⟨Ψ(A_{i_τ}), Ψ(A_i)⟩/√(2T)
                      = (1/max{1, ||y_{t+1}||}) (1/√(2T)) Σ_{τ ∈ [t]} k(A_{i_τ}, A_i),    (22)

so that the main remaining step is to estimate Σ_{τ ∈ [t]} k(A_{i_τ}, A_i), for i ∈ [n]. Here we simply call k̃(A_{i_τ}, A_i) for each τ. We save time, at the cost of O(n) space, by saving the value of the sum for each i ∈ [n], and updating it for the next t with the n calls k̃(A_{i_t}, A_i).

LEMMA 5.3. Let L_k denote the expected time needed for one call to k̃(·,·), and T_k denote the time needed for one call to k(·,·). Except for estimating ||y_t||, Kernel-ℓ2-Sampling can be computed in n L_k expected time per iteration t. The resulting estimate has expectation within additive ε of ⟨x_t, Ψ(A_i)⟩, and variance at most one. Thus Algorithm 4 runs in time

    Õ( L_k(n + d)/ε² + min{ T_k/ε⁴, L_k/ε⁶ } ),

and produces a solution with properties as in Algorithm 1.

PROOF. For Kernel-ℓ2-Sampling it remains only to show that its variance is at most one, given that each k̃(·,·) has variance at most one. We observe from (22) that t

independent estimates k̃(·,·) are added together, and scaled by a value that is at most 1/√(2T). Since the variance of the sum is at most t, and the variance is scaled by a value no more than 1/(2T), the variance of Kernel-ℓ2-Sampling is at most one. The only bias in the estimate is due to the estimation of ||y_t||, which gives a relative error of ε. For our kernels, ||Ψ(v)|| ≤ 1 if v ∈ B, so the additive error of Kernel-ℓ2-Sampling is O(ε). The analysis of Algorithm 4 then follows as for the un-kernelized perceptron; we neglect the time needed for preprocessing for the calls to k̃(·,·), as it is dominated by other terms for the kernels we consider, and this is likely in general.

5.2. Implementing the Kernel Estimators

Using the lemma above we can derive corollaries for the Gaussian and polynomial kernels. More general kernels can be handled via the technique of [Cesa-Bianchi et al. 2010].

Polynomial kernels. For the polynomial kernel of degree q, estimating a single kernel product k̃(x, y), for x ≡ A_i and y ≡ A_j with norms at most one, takes O(q) time, as follows. Recall that for the polynomial kernel, k(x, y) = (x^T y)^q. To estimate this kernel we take the product of q independent ℓ2-samples, yielding k̃(x, y). Notice that the expectation of this estimator is exactly equal to the product of the expectations, E[k̃(x, y)] = (x^T y)^q. The second moment of this estimator is equal to the product of the second moments of the individual samples, so that Var(k̃(x, y)) ≤ (||x||² ||y||²)^q ≤ 1. Of course, calculating the inner product exactly takes O(d + log q) time. We obtain:

COROLLARY 5.4. For the polynomial degree-q kernel, Algorithm 4 runs in time

    Õ( q(n + d)/ε² + min{ (d + log q)/ε⁴, q/ε⁶ } ).

Gaussian kernels. To estimate the Gaussian kernel function, we assume that ||x|| and ||y|| are known and no more than s/√2; thus to estimate

    k(x, y) = exp(−||x − y||²/(2s²)) = exp(−(||x||² + ||y||²)/(2s²)) · exp(x^T y/s²),

we need to estimate exp(x^T y/s²). To estimate exp(γ X) = Σ_{i ≥ 0} γ^i X^i/i! for a random X and a parameter γ > 0, we pick an index i with probability e^{−γ} γ^i/i! (that is, i has a Poisson distribution) and return exp(γ) times the product of i independent estimates of X. In our case we take X to be the average of c ℓ2-samples of x^T y, and hence E[X] = x^T y and E[X²] ≤ (x^T y)² + ||x||² ||y||²/c. The expectation of our kernel estimator is thus

    E[k̃(x, y)] = E[ e^γ Π_{j=1}^{i} X_j ] = e^γ Σ_{i ≥ 0} (e^{−γ} γ^i/i!) E[X]^i = Σ_{i ≥ 0} γ^i (x^T y)^i/i! = exp(γ x^T y).

The second moment of this estimator is bounded by

    E[k̃(x, y)²] = e^{2γ} Σ_{i ≥ 0} (e^{−γ} γ^i/i!) E[X²]^i = e^γ exp(γ E[X²]),

which is a constant for the parameter choices below, given the assumed bounds on ||x|| and ||y||. Hence, we take γ = c = 1/s². This gives a correct estimator in terms of expectation, and constant variance. The variance can further be made smaller than one by taking the average of a constant number of estimators of the above type. As for evaluation time, the expected size of the index i is γ = 1/s²; thus we require in expectation γ·c = 1/s⁴ ℓ2-samples. We obtain:

COROLLARY 5.5. For the Gaussian kernel with parameter s, Algorithm 4 runs in time

    Õ( (n + d)/(s⁴ ε²) + min{ d/ε⁴, 1/(s⁴ ε⁶) } ).

5.3. Kernelizing the MEB and strictly convex problems

Analogously to Algorithm 4, we can define the kernel version of strongly convex problems, including MEB. The kernelized version of MEB is particularly efficient, since, as in Algorithm 2, the norm ||y_t|| is never required. This means that the procedure Kernel-ℓ2-Sampling can be computed in time O(n L_k) per iteration, for a total running time of O(L_k(ε^{-2} n + ε^{-1} d)).

6. LOWER BOUNDS

All of our lower bounds are information-theoretic, meaning that any successful algorithm must read at least some number of entries of the input matrix A. Clearly this also lower bounds the time complexity of the algorithm in the unit-cost RAM model. Some of our arguments use the following meta-theorem. Consider a p × q matrix A, where p is an even integer. Consider the following random process. Let W ≥ q, let a ≡ 1 − 1/W, and let e_j denote the j-th standard q-dimensional unit vector. For each i ∈ [p/2], choose a random j ∈ [q] uniformly, and set A_{i+p/2} ≡ A_i ≡ a e_j + b(1_q − e_j), where b is chosen so that ||A_i|| = 1. We say that such an A is a YES instance. With probability 1/2, transform A into a NO instance as follows: choose a random i* ∈ [p/2] uniformly, and if A_{i*} = a e_j + b(1_q − e_j) for a particular j ∈ [q], set A_{i*+p/2} ≡ −a e_j + b(1_q − e_j).

Suppose there is a randomized algorithm reading at most s positions of A which distinguishes YES and NO instances with probability 2/3, where the probability is over the algorithm's coin tosses and this distribution µ on YES and NO instances. By averaging, this implies a deterministic algorithm Alg reading at most s positions of A and distinguishing YES and NO instances with probability 2/3, where the probability is taken only over µ. We show the following meta-theorem with a standard argument.

THEOREM 6.1 (META-THEOREM). For any such algorithm Alg, s = Ω(pq).

This meta-theorem follows from the following folklore fact:

FACT 6.2. Consider the following random process. Initialize a length-r array A to an array of r zeros. With probability 1/2, choose a random position i ∈ [r] and set A[i] = 1. With the remaining probability 1/2, leave A as the all-zero array. Then any algorithm which determines whether A is the all-zero array with probability 2/3 must read Ω(r) entries of A.

Let us prove Theorem 6.1 using this fact:

PROOF. Consider the matrix B ∈ R^{(p/2)×q} which is defined by subtracting the bottom half of the matrix from the top half, that is, B_{i,j} = A_{i,j} − A_{i+p/2,j}. Then B is the all-zeros matrix, except that with probability 1/2, there is one entry whose value is roughly two, and whose location is random and distributed uniformly. An algorithm distinguishing between YES and NO instances of A in particular distinguishes between the two cases for B, which cannot be done without reading a linear number of entries.

In the proofs of Theorem 6.3, Corollary 6.4, and Theorem 6.6, it will be more convenient to use M as an upper bound on the number of non-zero entries of A rather than the exact number of non-zero entries. However, it should be understood that these the-


More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

Admin BACKPROPAGATION. Neural network. Neural network 11/3/16. Assignment 7. Assignment 8 Goals today. David Kauchak CS158 Fall 2016

Admin BACKPROPAGATION. Neural network. Neural network 11/3/16. Assignment 7. Assignment 8 Goals today. David Kauchak CS158 Fall 2016 Amin Assignment 7 Assignment 8 Goals toay BACKPROPAGATION Davi Kauchak CS58 Fall 206 Neural network Neural network inputs inputs some inputs are provie/ entere Iniviual perceptrons/ neurons Neural network

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Homework 2 Solutions EM, Mixture Models, PCA, Dualitys

Homework 2 Solutions EM, Mixture Models, PCA, Dualitys Homewor Solutions EM, Mixture Moels, PCA, Dualitys CMU 0-75: Machine Learning Fall 05 http://www.cs.cmu.eu/~bapoczos/classes/ml075_05fall/ OUT: Oct 5, 05 DUE: Oct 9, 05, 0:0 AM An EM algorithm for a Mixture

More information

Pure Further Mathematics 1. Revision Notes

Pure Further Mathematics 1. Revision Notes Pure Further Mathematics Revision Notes June 20 2 FP JUNE 20 SDB Further Pure Complex Numbers... 3 Definitions an arithmetical operations... 3 Complex conjugate... 3 Properties... 3 Complex number plane,

More information

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners

Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners Lower Bouns for Local Monotonicity Reconstruction from Transitive-Closure Spanners Arnab Bhattacharyya Elena Grigorescu Mahav Jha Kyomin Jung Sofya Raskhonikova Davi P. Wooruff Abstract Given a irecte

More information

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics

UC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics UC Berkeley Department of Electrical Engineering an Computer Science Department of Statistics EECS 8B / STAT 4B Avance Topics in Statistical Learning Theory Solutions 3 Spring 9 Solution 3. For parti,

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Calculus and optimization

Calculus and optimization Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

Homework 2 EM, Mixture Models, PCA, Dualitys

Homework 2 EM, Mixture Models, PCA, Dualitys Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

Lecture 2: Correlated Topic Model

Lecture 2: Correlated Topic Model Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables

More information

Database-friendly Random Projections

Database-friendly Random Projections Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional

More information

Approximate Constraint Satisfaction Requires Large LP Relaxations

Approximate Constraint Satisfaction Requires Large LP Relaxations Approximate Constraint Satisfaction Requires Large LP Relaxations oah Fleming April 19, 2018 Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoi

More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

On combinatorial approaches to compressed sensing

On combinatorial approaches to compressed sensing On combinatorial approaches to compresse sensing Abolreza Abolhosseini Moghaam an Hayer Raha Department of Electrical an Computer Engineering, Michigan State University, East Lansing, MI, U.S. Emails:{abolhos,raha}@msu.eu

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

Math 1271 Solutions for Fall 2005 Final Exam

Math 1271 Solutions for Fall 2005 Final Exam Math 7 Solutions for Fall 5 Final Eam ) Since the equation + y = e y cannot be rearrange algebraically in orer to write y as an eplicit function of, we must instea ifferentiate this relation implicitly

More information

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

State observers and recursive filters in classical feedback control theory

State observers and recursive filters in classical feedback control theory State observers an recursive filters in classical feeback control theory State-feeback control example: secon-orer system Consier the riven secon-orer system q q q u x q x q x x x x Here u coul represent

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

A. Exclusive KL View of the MLE

A. Exclusive KL View of the MLE A. Exclusive KL View of the MLE Lets assume a change-of-variable moel p Z z on the ranom variable Z R m, such as the one use in Dinh et al. 2017: z 0 p 0 z 0 an z = ψz 0, where ψ is an invertible function

More information

Unit vectors with non-negative inner products

Unit vectors with non-negative inner products Unit vectors with non-negative inner proucts Bos, A.; Seiel, J.J. Publishe: 01/01/1980 Document Version Publisher s PDF, also known as Version of Recor (inclues final page, issue an volume numbers) Please

More information

On colour-blind distinguishing colour pallets in regular graphs

On colour-blind distinguishing colour pallets in regular graphs J Comb Optim (2014 28:348 357 DOI 10.1007/s10878-012-9556-x On colour-blin istinguishing colour pallets in regular graphs Jakub Przybyło Publishe online: 25 October 2012 The Author(s 2012. This article

More information

Beating CountSketch for Heavy Hitters in Insertion Streams

Beating CountSketch for Heavy Hitters in Insertion Streams Beating CountSketch for eavy itters in Insertion Streams ABSTRACT Vlaimir Braverman Johns opkins University Baltimore, MD, USA vova@cs.jhu.eu Nikita Ivkin Johns opkins University Baltimore, MD, USA nivkin1@jhu.eu

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides Reference 1: Transformations of Graphs an En Behavior of Polynomial Graphs Transformations of graphs aitive constant constant on the outsie g(x) = + c Make graph of g by aing c to the y-values on the graph

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University

More information

Algorithms and matching lower bounds for approximately-convex optimization

Algorithms and matching lower bounds for approximately-convex optimization Algorithms an matching lower bouns for approximately-convex optimization Yuanzhi Li Department of Computer Science Princeton University Princeton, NJ, 08450 yuanzhil@cs.princeton.eu Anrej Risteski Department

More information

APPPHYS 217 Thursday 8 April 2010

APPPHYS 217 Thursday 8 April 2010 APPPHYS 7 Thursay 8 April A&M example 6: The ouble integrator Consier the motion of a point particle in D with the applie force as a control input This is simply Newton s equation F ma with F u : t q q

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

Capacity Analysis of MIMO Systems with Unknown Channel State Information

Capacity Analysis of MIMO Systems with Unknown Channel State Information Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

Ramsey numbers of some bipartite graphs versus complete graphs

Ramsey numbers of some bipartite graphs versus complete graphs Ramsey numbers of some bipartite graphs versus complete graphs Tao Jiang, Michael Salerno Miami University, Oxfor, OH 45056, USA Abstract. The Ramsey number r(h, K n ) is the smallest positive integer

More information

Inverse Theory Course: LTU Kiruna. Day 1

Inverse Theory Course: LTU Kiruna. Day 1 Inverse Theory Course: LTU Kiruna. Day Hugh Pumphrey March 6, 0 Preamble These are the notes for the course Inverse Theory to be taught at LuleåTekniska Universitet, Kiruna in February 00. They are not

More information

Probabilistic Analysis of Power Assignments

Probabilistic Analysis of Power Assignments Probabilistic Analysis of Power Assignments Maurits e Graaf 1,2 an Boo Manthey 1 1 University of Twente, Department of Applie Mathematics, Enschee, Netherlans m.egraaf/b.manthey@utwente.nl 2 Thales Neerlan

More information

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction

FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number

More information

15.1 Upper bound via Sudakov minorization

15.1 Upper bound via Sudakov minorization ECE598: Information-theoretic methos in high-imensional statistics Spring 206 Lecture 5: Suakov, Maurey, an uality of metric entropy Lecturer: Yihong Wu Scribe: Aolin Xu, Mar 7, 206 [E. Mar 24] In this

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization JMLR: Workshop an Conference Proceeings vol 30 013) 1 On the Complexity of Banit an Derivative-Free Stochastic Convex Optimization Oha Shamir Microsoft Research an the Weizmann Institute of Science oha.shamir@weizmann.ac.il

More information

Binary Discrimination Methods for High Dimensional Data with a. Geometric Representation

Binary Discrimination Methods for High Dimensional Data with a. Geometric Representation Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

A Randomized Approximate Nearest Neighbors Algorithm - a short version

A Randomized Approximate Nearest Neighbors Algorithm - a short version We present a ranomize algorithm for the approximate nearest neighbor problem in - imensional Eucliean space. Given N points {x } in R, the algorithm attempts to fin k nearest neighbors for each of x, where

More information

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS

TIME-DELAY ESTIMATION USING FARROW-BASED FRACTIONAL-DELAY FIR FILTERS: FILTER APPROXIMATION VS. ESTIMATION ERRORS TIME-DEAY ESTIMATION USING FARROW-BASED FRACTIONA-DEAY FIR FITERS: FITER APPROXIMATION VS. ESTIMATION ERRORS Mattias Olsson, Håkan Johansson, an Per öwenborg Div. of Electronic Systems, Dept. of Electrical

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

arxiv: v1 [cs.lg] 22 Mar 2014

arxiv: v1 [cs.lg] 22 Mar 2014 CUR lgorithm with Incomplete Matrix Observation Rong Jin an Shenghuo Zhu Dept. of Computer Science an Engineering, Michigan State University, rongjin@msu.eu NEC Laboratories merica, Inc., zsh@nec-labs.com

More information

Discrete Mathematics

Discrete Mathematics Discrete Mathematics 309 (009) 86 869 Contents lists available at ScienceDirect Discrete Mathematics journal homepage: wwwelseviercom/locate/isc Profile vectors in the lattice of subspaces Dániel Gerbner

More information

arxiv: v4 [cs.ds] 7 Mar 2014

arxiv: v4 [cs.ds] 7 Mar 2014 Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning

More information

How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

Vectors in two dimensions

Vectors in two dimensions Vectors in two imensions Until now, we have been working in one imension only The main reason for this is to become familiar with the main physical ieas like Newton s secon law, without the aitional complication

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information