An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 18 (2017) 1-11; Published 5/17

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

Ohad Shamir
Department of Computer Science and Applied Mathematics
Weizmann Institute of Science, Rehovot 7610001, Israel
ohad.shamir@weizmann.ac.il

Editor: Alexander Rakhlin

Abstract

We consider the closely related problems of bandit convex optimization with two-point feedback, and zero-order stochastic convex optimization with two function evaluations per round. We provide a simple algorithm and analysis which is optimal for convex Lipschitz functions. This improves on Duchi et al. (2015), which only provides an optimal result for smooth functions; moreover, the algorithm and analysis are simpler, and readily extend to non-Euclidean problems. The algorithm is based on a small but surprisingly powerful modification of the gradient estimator.

Keywords: zero-order optimization, bandit optimization, stochastic optimization, gradient estimator

1. Introduction

We consider the problem of bandit convex optimization with two-point feedback (Agarwal et al., 2010). This problem can be defined as a repeated game between a learner and an adversary as follows: At each round $t$, the adversary picks a convex function $f_t$ on $\mathbb{R}^d$, which is not revealed to the learner. The learner then chooses a point $w_t$ from some known and closed convex set $\mathcal{W} \subseteq \mathbb{R}^d$, and suffers a loss $f_t(w_t)$. As feedback, the learner may choose two points $w_t', w_t'' \in \mathcal{W}$ and receive $f_t(w_t'), f_t(w_t'')$.¹ The learner's goal is to minimize the average regret, defined as
$$\frac{1}{T}\sum_{t=1}^{T} f_t(w_t) \;-\; \min_{w\in\mathcal{W}} \frac{1}{T}\sum_{t=1}^{T} f_t(w).$$
In this paper, we focus on obtaining bounds on the expected average regret (with respect to the learner's randomness).

A closely-related and easier setting is zero-order stochastic convex optimization. In this setting, our goal is to approximately solve $\min_{w\in\mathcal{W}} F(w)$, where $F(w) = \mathbb{E}_\xi[f(w;\xi)]$, given limited access to $\{f(\cdot;\xi_t)\}$ where $\xi_t$ are i.i.d. instantiations. Specifically, we assume that each $f(\cdot;\xi_t)$ is not directly observed, but rather can be queried at two points. This models situations where computing gradients directly is complicated or infeasible. It is well-known (Cesa-Bianchi et al., 2004) that given an algorithm with expected average regret $R_T$ in the bandit optimization setting above, if we feed it with the functions $f_t(w) = f(w;\xi_t)$, then the average $\bar{w} = \frac{1}{T}\sum_{t=1}^{T} w_t$ of the points generated satisfies the following bound on the expected optimization error: $\mathbb{E}[F(\bar{w})] - \min_{w\in\mathcal{W}} F(w) \le R_T$. Thus, an algorithm for bandit optimization can be converted to an algorithm for zero-order stochastic optimization with similar guarantees.

¹ This is slightly different than the model of Agarwal et al. (2010), where the learner only chooses $w_t', w_t''$ and the loss is $\frac{1}{2}(f_t(w_t') + f_t(w_t''))$. However, our results and analysis can be easily translated to their setting, and the model we discuss translates more directly to the zero-order stochastic optimization considered later.

© 2017 Ohad Shamir. License: CC-BY 4.0.

The bandit optimization setting with two-point feedback was proposed and studied in Agarwal et al. (2010). Independently, Nesterov (2011) considered two-point methods for stochastic optimization. Both papers are based on randomized gradient estimates which are then fed into standard first-order algorithms (e.g. gradient descent, or more generally mirror descent). However, the regret/error guarantees in both papers were suboptimal in terms of the dependence on the dimension. Recently, Duchi et al. (2015) considered a similar approach for the stochastic optimization setting, attaining an optimal error guarantee when $f(\cdot;\xi)$ is a smooth function (differentiable and with Lipschitz-continuous gradients). Related results in the smooth case were also obtained by Ghadimi and Lan (2013). However, to tackle the general case, where $f(\cdot;\xi)$ may be non-smooth, Duchi et al. (2015) resorted to a non-trivial smoothing scheme and a significantly more involved analysis. The resulting bounds have additional factors (logarithmic in the dimension) compared to the guarantees in the smooth case. Moreover, an analysis is only provided for Euclidean problems (where the domain $\mathcal{W}$ and the Lipschitz parameter of $f_t$ scale with the $L_2$ norm). In this note, we present and analyze a simple algorithm with the following properties:

- For Euclidean problems, it is optimal up to constants for both smooth and nonsmooth functions.
This closes the gap between the smooth and non-smooth Euclidean problems in this setting.

- The algorithm and analysis are readily applicable to non-Euclidean problems. We give an example for the 1-norm, with the resulting bound optimal up to logarithmic factors.

- The algorithm and analysis are simpler than those proposed in Duchi et al. (2015). They apply equally to the bandit and zero-order optimization settings, and can be readily extended using standard techniques, e.g. improved bounds for strongly-convex functions; regret/error bounds holding with high probability rather than just in expectation; and improved bounds if allowed $k > 2$ observations per round instead of just two (Hazan et al., 2007; Shalev-Shwartz, 2007; Agarwal et al., 2010).

Like previous algorithms, our algorithm is based on a random gradient estimator, which given a function $f$ and point $w$, queries $f$ at two random locations close to $w$, and computes a random vector whose expectation is a gradient of a smoothed version of $f$. The papers Nesterov (2011); Duchi et al. (2015); Ghadimi and Lan (2013) essentially use the estimator
which queries at $w$ and $w + \delta u$ (where $u$ is a random unit vector and $\delta > 0$ is a small parameter), and returns
$$\frac{d}{\delta}\left(f(w + \delta u) - f(w)\right) u. \tag{1}$$
The intuition is readily seen in the one-dimensional ($d = 1$) case, where the expectation of this expression equals
$$\frac{1}{2\delta}\left(f(w + \delta) - f(w - \delta)\right), \tag{2}$$
which indeed approximates the derivative of $f$ (assuming $f$ is differentiable) at $w$, if $\delta$ is small enough. In contrast, our algorithm uses a slightly different estimator (also used in Agarwal et al., 2010), which queries at $w - \delta u$, $w + \delta u$, and returns
$$\frac{d}{2\delta}\left(f(w + \delta u) - f(w - \delta u)\right) u. \tag{3}$$
Again, the intuition is readily seen in the case $d = 1$, where the expectation of this expression also equals Eq. (2). When $\delta$ is sufficiently small and $f$ is differentiable at $w$, both estimators compute a good approximation of the true gradient $\nabla f(w)$. However, when $f$ is not differentiable, the variance of the estimator in Eq. (1) can be quadratic in the dimension $d$, as pointed out by Duchi et al. (2015): For example, for $f(w) = \|w\|$ and $w = 0$, the second moment equals
$$\mathbb{E}\left\|\frac{d}{\delta}\left(f(\delta u) - f(0)\right)u\right\|^2 = d^2\,\mathbb{E}\|u\|^4 = d^2.$$
Since the performance of the algorithm crucially depends on the second moment of the gradient estimate, this leads to a highly sub-optimal guarantee. In Duchi et al. (2015), this was handled by adding an additional random perturbation and using a more involved analysis. Surprisingly, it turns out that the slightly different estimator in Eq. (3) does not suffer from this problem, and its second moment is essentially linear in the dimension. We note that in this work, we assume that $u$ is a random unit vector, similar to previous works. However, our results can be readily extended to other distributions, such as uniform in the Euclidean unit ball, or a Gaussian distribution.

2. Algorithm and Main Results

We consider the algorithm described in Figure 1, which performs standard mirror descent using a randomized gradient estimator $g_t$ of a (smoothed) version of $f_t$ at point $w_t$. Following Duchi et al.
(2015), we assume that one can indeed query $f_t$ at any point $w_t \pm \delta_t u_t$ as specified in the algorithm.² The analysis of the algorithm is presented in the following theorem:

² This may require us to query at a distance $\delta_t$ outside $\mathcal{W}$. If we must query within $\mathcal{W}$, then a standard technique (see Agarwal et al., 2010) is to simply run the algorithm on a slightly smaller set $(1-\epsilon)\mathcal{W}$, where $\epsilon > 0$ is sufficiently large so that $w_t \pm \delta_t u_t$ must be in $\mathcal{W}$. Since the formal guarantee in Thm. 1 holds for arbitrarily small $\delta_t$, and each $f_t$ is Lipschitz, we can generally take $\delta_t$ (and hence $\epsilon$) sufficiently small so that the additional regret/error incurred is arbitrarily small.
Algorithm 1: Two-Point Bandit Convex Optimization Algorithm

    Input: Step size $\eta$, function $r : \mathcal{W} \to \mathbb{R}$, exploration parameters $\delta_t > 0$
    Initialize $\theta_1 = 0$
    for $t = 1, \dots, T$ do
        Predict $w_t = \arg\max_{w\in\mathcal{W}} \langle \theta_t, w\rangle - r(w)$
        Sample $u_t$ uniformly from the Euclidean unit sphere $\{u : \|u\|_2 = 1\}$
        Query $f_t(w_t + \delta_t u_t)$ and $f_t(w_t - \delta_t u_t)$
        Set $g_t = \frac{d}{2\delta_t}\left(f_t(w_t + \delta_t u_t) - f_t(w_t - \delta_t u_t)\right) u_t$
        Update $\theta_{t+1} = \theta_t - \eta\, g_t$
    end for

Theorem 1 Assume the following conditions hold:

1. $r$ is 1-strongly convex with respect to a norm $\|\cdot\|$, and $\sup_{w\in\mathcal{W}} r(w) \le R^2$ for some $R < \infty$.
2. Each $f_t$ is convex and $G_2$-Lipschitz with respect to the 2-norm $\|\cdot\|_2$.
3. The dual norm $\|\cdot\|_*$ of $\|\cdot\|$ is such that $d\sqrt{\mathbb{E}_{u_t}\|u_t\|_*^4} \le p_d$ for some $p_d < \infty$.

If $\eta = R/(G_2\sqrt{p_d T})$, and each $\delta_t$ is chosen such that $\delta_t \le R\sqrt{p_d/T}$, then the sequence $w_1, \dots, w_T$ generated by the algorithm satisfies the following for any $T$ and $w^* \in \mathcal{W}$:
$$\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T} f_t(w_t) - \frac{1}{T}\sum_{t=1}^{T} f_t(w^*)\right] \;\le\; c\,\sqrt{\frac{p_d}{T}}\, G_2 R,$$
where $c$ is some numerical constant.

We note that condition 1 is standard in the analysis of the mirror-descent method (see the specific corollaries below), whereas conditions 2 and 3 are needed to ensure that the variance of our gradient estimator is controlled. As mentioned earlier, the bound on the average regret which appears in Thm. 1 immediately implies a similar bound on the error in a stochastic optimization setting, for the average point $\bar{w} = \frac{1}{T}\sum_{t=1}^{T} w_t$. We note that the result is robust to the choice of $\eta$, and is the same up to constants as long as $\eta = \Theta(R/(G_2\sqrt{p_d T}))$. Also, the constant $c$, while always strictly positive, shrinks as $\delta_t \to 0$ (see the proof below for details).

As a first application of the theorem, let us consider the case where $\|\cdot\|$ is the Euclidean norm. In this case, we can take $r(w) = \frac{1}{2}\|w\|_2^2$, and the algorithm reduces to a standard variant of online gradient descent, defined as $\theta_{t+1} = \theta_t - \eta g_t$ and $w_t = \arg\min_{w\in\mathcal{W}}\|w - \theta_t\|_2$. In this case, we get the following corollary:

Corollary 2 Suppose $f_t$ for all $t$ is $G_2$-Lipschitz with respect to the Euclidean norm, and $\mathcal{W} \subseteq \{w : \|w\|_2 \le R\}$. Then using $\|\cdot\| = \|\cdot\|_2$ and $r(w) = \frac{1}{2}\|w\|_2^2$, it holds for some constant $c$ and any $w^* \in \mathcal{W}$ that
$$\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T} f_t(w_t) - \frac{1}{T}\sum_{t=1}^{T} f_t(w^*)\right] \;\le\; c\, G_2 R\sqrt{\frac{d}{T}}.$$
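To make the Euclidean case concrete, here is a minimal runnable sketch of the algorithm in the setting of Corollary 2 (online gradient descent with the two-point estimator). The toy objective $f_t(w) = \|w - c\|_2$, the ball radius, and all parameter values below are illustrative assumptions, not from the paper:

```python
import numpy as np

# Sketch of the algorithm in the Euclidean setting of Corollary 2.
# Toy objective f_t(w) = ||w - c||_2 (1-Lipschitz); domain W = Euclidean ball.
rng = np.random.default_rng(0)
d, T, R, G2 = 5, 20000, 1.0, 1.0
c = np.full(d, 0.5 / np.sqrt(d))        # minimizer, inside the ball W (assumed)
f = lambda w: np.linalg.norm(w - c)

eta = R / (G2 * np.sqrt(d * T))         # step size from Theorem 1 (p_d = d here)
delta = 1e-3                            # small exploration radius (<= R*sqrt(d/T))

def predict(theta):
    # w_t = argmin_{||w||_2 <= R} ||w - theta||_2  (projection onto the ball)
    nrm = np.linalg.norm(theta)
    return theta if nrm <= R else theta * (R / nrm)

theta = np.zeros(d)
avg_regret = 0.0
for t in range(T):
    w = predict(theta)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)              # uniform direction on the unit sphere
    # Two-point gradient estimator, Eq. (3):
    g = (d / (2 * delta)) * (f(w + delta * u) - f(w - delta * u)) * u
    theta -= eta * g                    # gradient-descent update
    avg_regret += (f(w) - f(c)) / T

print(f"average regret: {avg_regret:.3f}")
```

With the step size of Theorem 1 the average regret should decay at the rate $G_2 R\sqrt{d/T}$; here it comes out far below the trivial value $f(0) - f(c) = 0.5$ obtained by never moving.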
The proof is immediately obtained from Thm. 1, noting that $p_d = d$ in our case. This bound matches (up to constants) the lower bound in Duchi et al. (2015), hence closing the gap between upper and lower bounds in this setting.

As a second application, let us consider the case where $\|\cdot\|$ is the 1-norm $\|\cdot\|_1$, the domain $\mathcal{W}$ is the simplex in $\mathbb{R}^d$, $d > 1$ (although our result easily extends to any subset of the 1-norm unit ball), and we use a standard entropic regularizer:

Corollary 3 Suppose $f_t$ for all $t$ is $G_1$-Lipschitz with respect to the 1-norm. Then using $\|\cdot\| = \|\cdot\|_1$ and $r(w) = \sum_{i=1}^{d} w_i \log(w_i)$, it holds for some constant $c$ and any $w^* \in \mathcal{W}$ that
$$\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T} f_t(w_t) - \frac{1}{T}\sum_{t=1}^{T} f_t(w^*)\right] \;\le\; c\, G_1 \log(d)\sqrt{\frac{d}{T}}.$$
This bound matches (this time up to a factor polylogarithmic in $d$) the lower bound in Duchi et al. (2015) for this setting.

Proof The function $r$ is 1-strongly convex with respect to the 1-norm (see for instance Shalev-Shwartz, 2012, Example 2.5), and has absolute value at most $\log(d)$ on the simplex. Also, if $f_t$ is $G_1$-Lipschitz with respect to the 1-norm, then it is $\sqrt{d}\,G_1$-Lipschitz with respect to the Euclidean norm. Finally, to satisfy condition 3 in Thm. 1, we upper bound $d\sqrt{\mathbb{E}\|u_t\|_\infty^4}$ using the following lemma, whose proof is given in the appendix:

Lemma 4 If $u$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$, $d > 1$, then $d\sqrt{\mathbb{E}\|u\|_\infty^4} \le c\log(d)$, where $c$ is a positive numerical constant independent of $d$.

Plugging these observations into Thm. 1 leads to the desired result. ∎

Finally, we make two additional remarks on possible extensions and improvements to Thm. 1.

Remark 5 (Querying at $k > 2$ points) If the algorithm is allowed to query $f_t$ at $k > 2$ points, then it can be modified to attain an improved regret bound, by computing $k/2$ independent estimates of $g_t$ at every round (using a freshly sampled $u_t$ each time), and using their average. This leads to a new gradient estimator $\tilde{g}_t$, which satisfies $\mathbb{E}\|\tilde{g}_t\|_*^2 \le \frac{2}{k}\mathbb{E}\|g_t\|_*^2 + \|\mathbb{E}[g_t]\|_*^2$. Based on the proof of Thm. 1, it is easily verified that this leads to an average expected regret bound of $c\,G_2 R\sqrt{(1 + p_d/k)/T}$ for some numerical constant $c$.
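Returning to the setting of Corollary 3: the prediction step $w_t = \arg\max_{w\in\mathcal{W}}\langle\theta_t, w\rangle - r(w)$ has a closed form there. With the entropic regularizer over the simplex, the maximizer is the softmax of $\theta_t$ (a standard fact; the sketch below is illustrative, not code from the paper):

```python
import numpy as np

def entropic_predict(theta):
    """Prediction step for r(w) = sum_i w_i log(w_i) over the simplex:
    argmax_w <theta, w> - r(w) is the softmax of theta."""
    z = np.exp(theta - np.max(theta))   # shift by max(theta) for numerical stability
    return z / z.sum()

w = entropic_predict(np.array([0.0, 0.0, np.log(2.0)]))
# components proportional to (1, 1, 2), i.e. w = [0.25, 0.25, 0.5]
```

The Lagrangian condition $\theta_i - \log(w_i) - 1 - \lambda = 0$ gives $w_i \propto \exp(\theta_i)$, so the whole algorithm in this geometry is exponentiated-gradient-style multiplicative updates.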
Remark 6 (Non-Euclidean Geometries) When considering norms other than the Euclidean norm, it is tempting to conjecture that our algorithm and analysis can be improved by sampling $u_t$ from a distribution adapted to the geometry of that norm (not necessarily the Euclidean ball), and assuming $f_t$ is Lipschitz w.r.t. the dual norm. However, adapting the proof (and in particular getting appropriate versions of Lemma 8 and Lemma 9) does not appear straightforward, and the potential performance improvement is currently unclear.
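Before turning to the proof, the variance gap between the estimators (1) and (3) can be checked numerically at the introduction's worst-case example $f(w) = \|w\|_2$, $w = 0$ (an illustrative sketch, not code from the paper):

```python
import numpy as np

# Second moments of the two gradient estimators at the non-smooth point
# f(w) = ||w||_2, w = 0, discussed in the introduction.
rng = np.random.default_rng(0)
d, delta, n = 200, 1e-3, 10000

v = rng.standard_normal((n, d))
u = v / np.linalg.norm(v, axis=1, keepdims=True)   # uniform on the unit sphere
f = lambda w: np.linalg.norm(w, axis=-1)
w = np.zeros(d)

# One-sided estimator (1): f(delta*u) - f(0) = delta for every sample, so the
# squared norm is exactly d^2 -- quadratic in the dimension.
g1 = (d / delta) * (f(w + delta * u) - f(w))[:, None] * u
m1 = np.mean(np.sum(g1 ** 2, axis=1))

# Symmetric estimator (3): f(delta*u) - f(-delta*u) = 0, so the estimator
# vanishes at this worst-case point for estimator (1).
g3 = (d / (2 * delta)) * (f(w + delta * u) - f(w - delta * u))[:, None] * u
m3 = np.mean(np.sum(g3 ** 2, axis=1))
```

Here the second moment of estimator (1) is exactly $d^2$ while estimator (3) vanishes; at generic differentiable points both estimators approximate the gradient of the smoothed function, and Lemma 10 below shows the second moment of (3) is at most linear in $d$ in general.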
3. Proof of Theorem 1

As discussed in the introduction, the key to getting improved results compared to previous papers is the use of a slightly different random gradient estimator, which turns out to have significantly less variance. The formal proof relies on a few simple lemmas listed below. The key lemma is Lemma 10, which establishes the improved variance behavior.

Lemma 7 For any $w^* \in \mathcal{W}$, it holds that
$$\sum_{t=1}^{T}\langle g_t, w_t - w^*\rangle \;\le\; \frac{R^2}{\eta} + \eta\sum_{t=1}^{T}\|g_t\|_*^2.$$
This lemma is the canonical result on the convergence of online mirror descent, and the proof is standard (see e.g. Shalev-Shwartz, 2012).

Lemma 8 Define the function $\hat{f}_t(w) = \mathbb{E}_{u_t}[f_t(w + \delta_t u_t)]$ over $\mathcal{W}$, where $u_t$ is a vector picked uniformly at random from the Euclidean unit sphere. Then the function is convex, Lipschitz with constant $G_2$, satisfies
$$\sup_{w\in\mathcal{W}}\left|\hat{f}_t(w) - f_t(w)\right| \;\le\; \delta_t G_2,$$
and is differentiable with the following gradient:
$$\nabla\hat{f}_t(w) = \mathbb{E}_{u_t}\left[\frac{d}{\delta_t} f_t(w + \delta_t u_t)\, u_t\right].$$

Proof The fact that the function is convex and Lipschitz is immediate from its definition and the assumptions in the theorem. The inequality follows from $u_t$ being a unit vector and $f_t$ being assumed $G_2$-Lipschitz with respect to the 2-norm. The differentiability property follows from Lemma 2.1 in Flaxman et al. (2005). ∎

Lemma 9 For any function $g$ which is $L$-Lipschitz with respect to the 2-norm, it holds that if $u$ is uniformly distributed on the Euclidean unit sphere, then
$$\mathbb{E}\left[\left(g(u) - \mathbb{E}g(u)\right)^4\right] \;\le\; \frac{c L^4}{d^2}$$
for some numerical constant $c$.

Proof A standard result on the concentration of Lipschitz functions on the Euclidean unit sphere implies that
$$\Pr\left(|g(u) - \mathbb{E}g(u)| > t\right) \;\le\; 2\exp\left(-c_1 d t^2 / L^2\right)$$
for some numerical constant $c_1 > 0$ (see the proof of Proposition 1.10 and Corollary 1.6 in Ledoux, 2005). Therefore,
$$\mathbb{E}\left[\left(g(u) - \mathbb{E}g(u)\right)^4\right] = \int_{0}^{\infty}\Pr\left(\left(g(u) - \mathbb{E}g(u)\right)^4 > t\right)dt = \int_{0}^{\infty}\Pr\left(|g(u) - \mathbb{E}g(u)| > t^{1/4}\right)dt \le \int_{0}^{\infty} 2\exp\left(-\frac{c_1 d\sqrt{t}}{L^2}\right)dt = 4\left(\frac{L^2}{c_1 d}\right)^2,$$
where in the last step we used the fact that $\int_{0}^{\infty}\exp(-x)\,x\,dx = 1$. The expression above equals $cL^4/d^2$ for some numerical constant $c$. ∎

Lemma 10 It holds that $\mathbb{E}[g_t \mid w_t] = \nabla\hat{f}_t(w_t)$ (where $\hat{f}_t(\cdot)$ is as defined in Lemma 8), and $\mathbb{E}[\|g_t\|_*^2 \mid w_t] \le c\, p_d\, G_2^2$ for some numerical constant $c$.

Proof For simplicity of notation, we drop the $t$ subscript. Since $u$ has a symmetric distribution around the origin,
$$\mathbb{E}[g \mid w] = \mathbb{E}_u\left[\frac{d}{2\delta}\left(f(w+\delta u) - f(w-\delta u)\right)u\right] = \mathbb{E}_u\left[\frac{d}{2\delta}f(w+\delta u)\,u\right] - \mathbb{E}_u\left[\frac{d}{2\delta}f(w-\delta u)\,u\right]$$
$$= \mathbb{E}_u\left[\frac{d}{2\delta}f(w+\delta u)\,u\right] + \mathbb{E}_u\left[\frac{d}{2\delta}f(w+\delta u)\,u\right] = \mathbb{E}_u\left[\frac{d}{\delta}f(w+\delta u)\,u\right],$$
which equals $\nabla\hat{f}(w)$ by Lemma 8.

As to the second part of the lemma, we have the following, where $\alpha$ is an arbitrary parameter and where we use the elementary inequality $(a-b)^2 \le 2(a^2 + b^2)$:
$$\mathbb{E}\left[\|g\|_*^2 \mid w\right] = \mathbb{E}_u\left[\left\|\frac{d}{2\delta}\left(f(w+\delta u) - f(w-\delta u)\right)u\right\|_*^2\right] = \frac{d^2}{4\delta^2}\,\mathbb{E}_u\left[\|u\|_*^2\left(\left(f(w+\delta u)-\alpha\right) - \left(f(w-\delta u)-\alpha\right)\right)^2\right]$$
$$\le \frac{d^2}{2\delta^2}\,\mathbb{E}_u\left[\|u\|_*^2\left(\left(f(w+\delta u)-\alpha\right)^2 + \left(f(w-\delta u)-\alpha\right)^2\right)\right] = \frac{d^2}{2\delta^2}\left(\mathbb{E}_u\left[\|u\|_*^2\left(f(w+\delta u)-\alpha\right)^2\right] + \mathbb{E}_u\left[\|u\|_*^2\left(f(w-\delta u)-\alpha\right)^2\right]\right).$$
Again using the symmetric distribution of $u$, this equals
$$\frac{d^2}{2\delta^2}\left(\mathbb{E}_u\left[\|u\|_*^2\left(f(w+\delta u)-\alpha\right)^2\right] + \mathbb{E}_u\left[\|u\|_*^2\left(f(w+\delta u)-\alpha\right)^2\right]\right) = \frac{d^2}{\delta^2}\,\mathbb{E}_u\left[\|u\|_*^2\left(f(w+\delta u)-\alpha\right)^2\right].$$
Applying Cauchy-Schwarz and using the condition $d\sqrt{\mathbb{E}_u\|u\|_*^4} \le p_d$ stated in the theorem, we get the upper bound
$$\frac{d^2}{\delta^2}\sqrt{\mathbb{E}_u\|u\|_*^4}\,\sqrt{\mathbb{E}_u\left[\left(f(w+\delta u)-\alpha\right)^4\right]} \;\le\; \frac{p_d\, d}{\delta^2}\sqrt{\mathbb{E}_u\left[\left(f(w+\delta u)-\alpha\right)^4\right]}.$$
In particular, taking $\alpha = \mathbb{E}_u[f(w+\delta u)]$ and using Lemma 9 (noting that $u \mapsto f(w+\delta u)$ is $G_2\delta$-Lipschitz in terms of the 2-norm), this is at most
$$\frac{p_d\, d}{\delta^2}\cdot\frac{\sqrt{c}\,(G_2\delta)^2}{d} = \sqrt{c}\, p_d\, G_2^2,$$
as required. ∎

We are now ready to prove the theorem. Taking expectations on both sides of the inequality in Lemma 7, we have
$$\mathbb{E}\left[\sum_{t=1}^{T}\langle g_t, w_t - w^*\rangle\right] \le \frac{R^2}{\eta} + \eta\sum_{t=1}^{T}\mathbb{E}\|g_t\|_*^2 = \frac{R^2}{\eta} + \eta\sum_{t=1}^{T}\mathbb{E}\left[\mathbb{E}\left[\|g_t\|_*^2 \mid w_t\right]\right]. \tag{4}$$
Using Lemma 10, the right hand side is at most $\frac{R^2}{\eta} + \eta\, c\, p_d\, G_2^2\, T$. The left hand side of Eq. (4), by Lemma 10 and convexity of $\hat{f}_t$, equals
$$\sum_{t=1}^{T}\mathbb{E}\left[\left\langle \mathbb{E}[g_t \mid w_t],\, w_t - w^*\right\rangle\right] = \sum_{t=1}^{T}\mathbb{E}\left[\left\langle \nabla\hat{f}_t(w_t),\, w_t - w^*\right\rangle\right] \ge \sum_{t=1}^{T}\mathbb{E}\left[\hat{f}_t(w_t) - \hat{f}_t(w^*)\right].$$
By Lemma 8, this is at least
$$\mathbb{E}\left[\sum_{t=1}^{T}\left(f_t(w_t) - f_t(w^*)\right)\right] - 2G_2\sum_{t=1}^{T}\delta_t.$$
Combining these inequalities and plugging back into Eq. (4), we get
$$\mathbb{E}\left[\sum_{t=1}^{T}\left(f_t(w_t) - f_t(w^*)\right)\right] \le 2G_2\sum_{t=1}^{T}\delta_t + \frac{R^2}{\eta} + c\, p_d\, G_2^2\, \eta\, T.$$
Choosing $\eta = R/(G_2\sqrt{p_d T})$, and any $\delta_t \le R\sqrt{p_d/T}$, we get
$$\mathbb{E}\left[\sum_{t=1}^{T}\left(f_t(w_t) - f_t(w^*)\right)\right] \le (c+3)\sqrt{p_d T}\, G_2 R.$$
Dividing both sides by $T$, the result follows. ∎

Acknowledgments

This research was supported in part by an Israel Science Foundation grant 425/13 and an FP7 Marie Curie CIG grant. We thank the anonymous reviewers for several helpful comments.

Appendix A. Proof of Lemma 4

We note that the distribution of $\|u\|_\infty^4$ is identical to that of $\|n\|_\infty^4/\|n\|_2^4$, where $n \sim N(0, I_d)$ is a standard Gaussian random vector. Moreover, by a standard concentration bound on the norm of Gaussian random vectors (e.g. Corollary 2.3 in Barvinok, 2005, with $\epsilon = 1/2$):
$$\max\left\{\Pr\left(\|n\|_2 \le \tfrac{1}{2}\sqrt{d}\right),\ \Pr\left(\|n\|_2 \ge 2\sqrt{d}\right)\right\} \;\le\; \exp\left(-\frac{d}{16}\right).$$
Finally, for any value of $n$, we always have $\|n\|_\infty \le \|n\|_2$, since the Euclidean norm is always larger than the infinity norm. Combining these observations, and using $\mathbb{1}_A$ for the indicator function of the event $A$, we have
$$\mathbb{E}\|u\|_\infty^4 = \mathbb{E}\left[\frac{\|n\|_\infty^4}{\|n\|_2^4}\right] = \mathbb{E}\left[\frac{\|n\|_\infty^4}{\|n\|_2^4}\,\mathbb{1}_{\|n\|_2 \le \sqrt{d}/2}\right] + \mathbb{E}\left[\frac{\|n\|_\infty^4}{\|n\|_2^4}\,\mathbb{1}_{\|n\|_2 > \sqrt{d}/2}\right] \le \exp\left(-\frac{d}{16}\right) + \frac{16}{d^2}\,\mathbb{E}\|n\|_\infty^4. \tag{5}$$
Thus, it remains to upper bound $\mathbb{E}\|n\|_\infty^4$, where $n$ is a standard Gaussian random vector. Letting $n = (n_1, \dots, n_d)$, and noting that $n_1, \dots, n_d$ are independent and identically
distributed standard Gaussian random variables, we have for any scalar $z \ge 0$ that
$$\Pr\left(\|n\|_\infty \le z\right) = \prod_{i=1}^{d}\Pr\left(|n_i| \le z\right) = \left(1 - \Pr\left(|n_1| > z\right)\right)^d \overset{(1)}{\ge} 1 - d\,\Pr\left(|n_1| > z\right) \overset{(2)}{\ge} 1 - 2d\exp\left(-z^2/2\right),$$
where $(1)$ is Bernoulli's inequality, and $(2)$ uses a standard tail bound for a Gaussian random variable. In particular, the above implies that $\Pr(\|n\|_\infty > z) \le 2d\exp(-z^2/2)$. Therefore, for an arbitrary positive scalar $r$,
$$\mathbb{E}\|n\|_\infty^4 = \int_{0}^{\infty}\Pr\left(\|n\|_\infty^4 > z\right)dz \;\le\; r + \int_{r}^{\infty}\Pr\left(\|n\|_\infty > z^{1/4}\right)dz \;\le\; r + \int_{r}^{\infty} 2d\exp\left(-\frac{\sqrt{z}}{2}\right)dz = r + 8d\left(\sqrt{r} + 2\right)\exp\left(-\frac{\sqrt{r}}{2}\right).$$
In particular, plugging $r = 4\log^2(2d)$ (so that $\exp(-\sqrt{r}/2) = 1/(2d)$, using $d > 1$), we get
$$\mathbb{E}\|n\|_\infty^4 \;\le\; 4\log^2(2d) + 8\log(2d) + 8 \;\le\; 8\left(1 + \log(2d) + \log^2(2d)\right).$$
Plugging this back into Eq. (5), we get that
$$\mathbb{E}\|u\|_\infty^4 \;\le\; \exp\left(-\frac{d}{16}\right) + \frac{128\left(1 + \log(2d) + \log^2(2d)\right)}{d^2},$$
which can be shown to be at most $c'\log^2(2d)/d^2$ for all $d > 1$, where $c'$ is a numerical constant. In particular, this means that $d\sqrt{\mathbb{E}\|u\|_\infty^4} \le \sqrt{c'}\log(2d) \le c\log(d)$ for a numerical constant $c$, as required. ∎

References

A. Agarwal, O. Dekel, and L. Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Conference on Learning Theory (COLT), 2010.

A. Barvinok. Measure concentration lecture notes. ~barvinok/total710.pdf, 2005.

N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050-2057, 2004.

J. Duchi, M. Jordan, M. Wainwright, and A. Wibisono. Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788-2806, May 2015.
A. Flaxman, A. Kalai, and B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2005.

S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341-2368, 2013.

E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.

M. Ledoux. The concentration of measure phenomenon, volume 89. American Mathematical Society, 2005.

Y. Nesterov. Random gradient-free minimization of convex functions. Technical Report 2011/16, ECORE, 2011.

S. Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, The Hebrew University, 2007.

S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2), 2012.
Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
More informationPolynomial Inclusion Functions
Polynomial Inclusion Functions E. e Weert, E. van Kampen, Q. P. Chu, an J. A. Muler Delft University of Technology, Faculty of Aerospace Engineering, Control an Simulation Division E.eWeert@TUDelft.nl
More informationarxiv: v2 [cs.ds] 11 May 2016
Optimizing Star-Convex Functions Jasper C.H. Lee Paul Valiant arxiv:5.04466v2 [cs.ds] May 206 Department of Computer Science Brown University {jasperchlee,paul_valiant}@brown.eu May 3, 206 Abstract We
More informationII. First variation of functionals
II. First variation of functionals The erivative of a function being zero is a necessary conition for the etremum of that function in orinary calculus. Let us now tackle the question of the equivalent
More informationGLOBAL SOLUTIONS FOR 2D COUPLED BURGERS-COMPLEX-GINZBURG-LANDAU EQUATIONS
Electronic Journal of Differential Equations, Vol. 015 015), No. 99, pp. 1 14. ISSN: 107-6691. URL: http://eje.math.txstate.eu or http://eje.math.unt.eu ftp eje.math.txstate.eu GLOBAL SOLUTIONS FOR D COUPLED
More informationHow to Minimize Maximum Regret in Repeated Decision-Making
How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:
More informationSublinear Optimization for Machine Learning
Sublinear Optimization for Machine Learning Kenneth L. Clarkson, IBM Almaen Research Center Ela Hazan, Technion - Israel Institute of technology Davi P. Wooruff, IBM Almaen Research Center In this paper
More informationIntroduction to Machine Learning
How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression
More informationLectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs
Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent
More informationCalculus of Variations
Calculus of Variations Lagrangian formalism is the main tool of theoretical classical mechanics. Calculus of Variations is a part of Mathematics which Lagrangian formalism is base on. In this section,
More informationIPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy
IPA Derivatives for Make-to-Stock Prouction-Inventory Systems With Backorers Uner the (Rr) Policy Yihong Fan a Benamin Melame b Yao Zhao c Yorai Wari Abstract This paper aresses Infinitesimal Perturbation
More information2886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 5, MAY 2015
886 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 61, NO 5, MAY 015 Simultaneously Structure Moels With Application to Sparse an Low-Rank Matrices Samet Oymak, Stuent Member, IEEE, Amin Jalali, Stuent Member,
More informationAdaptive Gain-Scheduled H Control of Linear Parameter-Varying Systems with Time-Delayed Elements
Aaptive Gain-Scheule H Control of Linear Parameter-Varying Systems with ime-delaye Elements Yoshihiko Miyasato he Institute of Statistical Mathematics 4-6-7 Minami-Azabu, Minato-ku, okyo 6-8569, Japan
More informationA Sketch of Menshikov s Theorem
A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p
More informationMath 342 Partial Differential Equations «Viktor Grigoryan
Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite
More informationExpected Value of Partial Perfect Information
Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo
More informationTRAJECTORY TRACKING FOR FULLY ACTUATED MECHANICAL SYSTEMS
TRAJECTORY TRACKING FOR FULLY ACTUATED MECHANICAL SYSTEMS Francesco Bullo Richar M. Murray Control an Dynamical Systems California Institute of Technology Pasaena, CA 91125 Fax : + 1-818-796-8914 email
More informationResistant Polynomials and Stronger Lower Bounds for Depth-Three Arithmetical Formulas
Resistant Polynomials an Stronger Lower Bouns for Depth-Three Arithmetical Formulas Maurice J. Jansen University at Buffalo Kenneth W.Regan University at Buffalo Abstract We erive quaratic lower bouns
More informationLecture 10: October 30, 2017
Information an Coing Theory Autumn 2017 Lecturer: Mahur Tulsiani Lecture 10: October 30, 2017 1 I-Projections an applications In this lecture, we will talk more about fining the istribution in a set Π
More informationCalculus and optimization
Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function
More informationThe chromatic number of graph powers
Combinatorics, Probability an Computing (19XX) 00, 000 000. c 19XX Cambrige University Press Printe in the Unite Kingom The chromatic number of graph powers N O G A A L O N 1 an B O J A N M O H A R 1 Department
More informationORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS. Gianluca Crippa
Manuscript submitte to AIMS Journals Volume X, Number 0X, XX 200X Website: http://aimsciences.org pp. X XX ORDINARY DIFFERENTIAL EQUATIONS AND SINGULAR INTEGRALS Gianluca Crippa Departement Mathematik
More informationarxiv: v4 [cs.ds] 7 Mar 2014
Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning
More informationUC Berkeley Department of Electrical Engineering and Computer Science Department of Statistics
UC Berkeley Department of Electrical Engineering an Computer Science Department of Statistics EECS 8B / STAT 4B Avance Topics in Statistical Learning Theory Solutions 3 Spring 9 Solution 3. For parti,
More informationREAL ANALYSIS I HOMEWORK 5
REAL ANALYSIS I HOMEWORK 5 CİHAN BAHRAN The questions are from Stein an Shakarchi s text, Chapter 3. 1. Suppose ϕ is an integrable function on R with R ϕ(x)x = 1. Let K δ(x) = δ ϕ(x/δ), δ > 0. (a) Prove
More informationSwitching Time Optimization in Discretized Hybrid Dynamical Systems
Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set
More informationBinary Discrimination Methods for High Dimensional Data with a. Geometric Representation
Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas
More informationConcentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection and System Identification
Concentration of Measure Inequalities for Compressive Toeplitz Matrices with Applications to Detection an System Ientification Borhan M Sananaji, Tyrone L Vincent, an Michael B Wakin Abstract In this paper,
More informationON THE OPTIMALITY SYSTEM FOR A 1 D EULER FLOW PROBLEM
ON THE OPTIMALITY SYSTEM FOR A D EULER FLOW PROBLEM Eugene M. Cliff Matthias Heinkenschloss y Ajit R. Shenoy z Interisciplinary Center for Applie Mathematics Virginia Tech Blacksburg, Virginia 46 Abstract
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More informationExponential asymptotic property of a parallel repairable system with warm standby under common-cause failure
J. Math. Anal. Appl. 341 (28) 457 466 www.elsevier.com/locate/jmaa Exponential asymptotic property of a parallel repairable system with warm stanby uner common-cause failure Zifei Shen, Xiaoxiao Hu, Weifeng
More informationMonte Carlo Methods with Reduced Error
Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for
More informationWUCHEN LI AND STANLEY OSHER
CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability
More informationNew bounds on Simonyi s conjecture
New bouns on Simonyi s conjecture Daniel Soltész soltesz@math.bme.hu Department of Computer Science an Information Theory, Buapest University of Technology an Economics arxiv:1510.07597v1 [math.co] 6 Oct
More informationCharacterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations
Characterizing Real-Value Multivariate Complex Polynomials an Their Symmetric Tensor Representations Bo JIANG Zhening LI Shuzhong ZHANG December 31, 2014 Abstract In this paper we stuy multivariate polynomial
More informationMARKO NEDELJKOV, DANIJELA RAJTER-ĆIRIĆ
GENERALIZED UNIFORMLY CONTINUOUS SEMIGROUPS AND SEMILINEAR HYPERBOLIC SYSTEMS WITH REGULARIZED DERIVATIVES MARKO NEDELJKOV, DANIJELA RAJTER-ĆIRIĆ Abstract. We aopt the theory of uniformly continuous operator
More informationLinear and quadratic approximation
Linear an quaratic approximation November 11, 2013 Definition: Suppose f is a function that is ifferentiable on an interval I containing the point a. The linear approximation to f at a is the linear function
More informationProof of SPNs as Mixture of Trees
A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a
More informationStep 1. Analytic Properties of the Riemann zeta function [2 lectures]
Step. Analytic Properties of the Riemann zeta function [2 lectures] The Riemann zeta function is the infinite sum of terms /, n. For each n, the / is a continuous function of s, i.e. lim s s 0 n = s n,
More informationThermal conductivity of graded composites: Numerical simulations and an effective medium approximation
JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University
More informationMinimax rates for memory-bounded sparse linear regression
JMLR: Workshop an Conference Proceeings vol 40:1 4, 015 Minimax rates for memory-boune sparse linear regression Jacob Steinhart Stanfor University, Department of Computer Science John Duchi Stanfor University,
More informationThe Subtree Size Profile of Plane-oriented Recursive Trees
The Subtree Size Profile of Plane-oriente Recursive Trees Michael FUCHS Department of Applie Mathematics National Chiao Tung University Hsinchu, 3, Taiwan Email: mfuchs@math.nctu.eu.tw Abstract In this
More informationMulti-View Clustering via Canonical Correlation Analysis
Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms
More informationOn the Surprising Behavior of Distance Metrics in High Dimensional Space
On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com
More informationHyperbolic Moment Equations Using Quadrature-Based Projection Methods
Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the
More informationLOCAL WELL-POSEDNESS OF NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES
LOCAL WELL-POSEDNESS OF NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES ÁRPÁD BÉNYI AND KASSO A. OKOUDJOU Abstract. By using tools of time-frequency analysis, we obtain some improve local well-poseness
More informationAn extension of Alexandrov s theorem on second derivatives of convex functions
Avances in Mathematics 228 (211 2258 2267 www.elsevier.com/locate/aim An extension of Alexanrov s theorem on secon erivatives of convex functions Joseph H.G. Fu 1 Department of Mathematics, University
More informationGeneralized Tractability for Multivariate Problems
Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,
More informationPerturbation Analysis and Optimization of Stochastic Flow Networks
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. XX, NO. Y, MMM 2004 1 Perturbation Analysis an Optimization of Stochastic Flow Networks Gang Sun, Christos G. Cassanras, Yorai Wari, Christos G. Panayiotou,
More information