arxiv: v2 [cs.ds] 11 May 2016

Size: px
Start display at page:

Download "arxiv: v2 [cs.ds] 11 May 2016"

Transcription

1 Optimizing Star-Convex Functions Jasper C.H. Lee Paul Valiant arxiv: v2 [cs.ds] May 206 Department of Computer Science Brown University May 3, 206 Abstract We introuce a polynomial time algorithm for optimizing the class of star-convex functions, uner no Lipschitz or other smoothness assumptions whatsoever, an no restrictions except exponential bouneness on a region about the origin, an Lebesgue measurability. The algorithm s performance is polynomial in the requeste number of igits of accuracy an the imension of the search omain. This contrasts with the previous best known algorithm of Nesterov an Polyak which has exponential epenence on the number of igits of accuracy, but only n ω epenence on the imension n (where ω is the matrix multiplication exponent), an which further requires Lipschitz secon ifferentiability of the function [3]. Star-convex functions constitute a rich class of functions generalizing convex functions, incluing, for example: for any convex (or star-convex) ( ) functions f, g, with global minima f(0) = g(0) = 0, their power mean h(x) = f(x) p +g(x) p /p 2 is star-convex, for any real p, efining powers via limits as appropriate. Star-convex functions arise as loss functions in non-convex machine learning contexts, incluing, for ata points X i, parameter ( ) /p, vector θ, an any real exponent p, the loss function h θ,x (ˆθ) = i (ˆθ θ) X i p significantly generalizing the well-stuie convex case where p. Further, for any function g > 0 on the surface of the unit sphere (incluing iscontinuous, or pathological functions that ( have) ifferent x behavior at rational vs. irrational angles), the star-convex function h(x) = x 2 g x 2 extens the arbitrary behavior of g to the whole space. Despite a long history of successful graient-base optimization algorithms, star-convex optimization is a uniquely challenging regime because ) graients an/or subgraients often o not exist; an 2) even in cases when graients exist, there are star-convex functions for which graients provably provie no information about the location of the global optimum. We thus bypass the usual approach of relying on graient oracles an introuce a new ranomize cutting plane algorithm that relies only on function evaluations. Our algorithm essentially looks for structure at all scales, since, unlike with convex functions, star-convex functions o not necessarily isplay simpler behavior on smaller length scales. Thus, while our cutting plane algorithm refines a feasible region of exponentially ecreasing volume by iteratively removing cuts, unlike for the stanar convex case, the structure to efficiently iscover such cuts may not be foun within the feasible region: our novel star-convex cutting plane approach iscovers cuts by sampling the function exponentially far outsie the feasible region. We emphasize that the class of star-convex functions we consier is as unrestricte as possible: the class of Lebesgue measurable star-convex functions has theoretical appeal, introucing to the omain of polynomial-time algorithms a huge class with many interesting pathologies. We view our results as a step forwar in unerstaning the scope of optimization techniques beyon the garen of convex optimization an local graient-base methos.

2 Introuction Optimization is one of the most influential ieas in computer science, central to many rapily eveloping areas within computer science, an also one of the primary exports to other fiels, incluing operations research, economics an finance, bioinformatics, an many esign problems in engineering. Convex optimization, in particular, has prouce many general an robust algorithmic frameworks that have each become funamental tools in many ifferent areas: linear programming has become a general moeling tool, its simple structure powering many algorithms an reuctions; semiefinite programming is an area whose scope is rapily expaning, yieling many of the best known approximation algorithms for optimizing constraint satisfaction problems [2]; convex optimization generalizes both of these an has introuce powerful optimization techniques incluing interior point an cutting plane methos. Our eveloping unerstaning of optimization has also le to new algorithmic esign principles, which in turn leas to new insights into optimization. Recent progress in algorithmic graph theory has benefite enormously from the optimization perspective, as many recent results on max flow/min cut [0, 6, 8, 3, 5], bipartite matching [0], an Laplacian solvers [20, 3, 7] have mae breakthroughs that stem from eveloping a eeper unerstaning of the characteristics of convex optimization techniques in the context of graph theory. These successes motivate the quest for a eeper an broaer unerstaning of optimization techniques: ) to what egree can convex optimization techniques be extene to non-convex functions; 2) can we evelop general new tools for tackling non-convex optimization problems; an 3) can new techniques from non-convex optimization yiel new insights into convex optimization? As a partial answer to the first question, graient escent perhaps the most natural optimization approach has ha enormous success recently in a variety of practically-motivate non-convex settings, sometimes with provable guarantees. The metho an its variants are the e facto stanar for training (eep) neural networks, a hot topic in high imensional non-convex optimization, with many recent practical results (e.g. [7, 6]). Graient escent can be thought of as a greey algorithm, which repeately chooses the most attractive irection from the local lanscape. The efficacy of graient escent algorithms relies on local assumptions about the function: for example, if the first erivative is Lipschitz (slowly varying), then one can take large ownhill steps in the graient irection without worrying that the function will change to going uphill along this irection. Thus the convergence of graient escent algorithms typically epens on a Lipschitz parameter (or other smoothness measure), an conveniently oes not epen on the imension of the search space. Many of these algorithms converge to within ɛ of a local optimum in time poly(/ɛ), with aitional polynomial epenence on the Lipschitz constant or other appropriate smoothness guarantee [2, 5]. In cases where all local optima are global optima, then one has global convergence guarantees. While one intuitively expects graient escent algorithms to always converge to a local minimum, in this paper we stuy the optimization of a natural generalization of convex functions where, espite the global optimum being the only stationary point/local minimum for every function in this class, provably no variant of graient escent converges in polynomial time. The class of star-convex functions, which we efine below, inclues many functions of both practical an theoretical interest, both generalizing common families of convex functions to wier parameter regimes, an introucing new pathologies not foun in the convex case. We show, essentially, how to make a graient-base cutting plane algorithm robust to many new pathologies, incluing lack of graients or subgraients, long narrow riges an rapi oscillation in irections transverse to the global minimum, an, in fact, almost arbitrary iscontinuities. This challenging setting gives new insights into what funamentally enables cutting plane algorithms to work, which we view as progress towars answering questions 2 an 3 above.

3 (a) g efine on the unit circle (b) Linear extension of g to f Figure : An example star-convex function f efine by linearly extrapolating an arbitrary positive function g efine on the unit circle (in white).. Star-convex functions This paper focuses on the optimization of star-convex functions, a particular class of (typically) nonconvex functions that inclues convex functions as a special case. We efine these functions as follows, base on the efinition in Nesterov an Polyak [3]. Definition. (Star-convex functions). A function f : R n R is star-convex if there is a global minimum x R n such that for all α [0, ] an x R n, f(αx + ( α)x) αf(x ) + ( α)f(x) We call x the star center of f. For convenience, in the rest of the paper we shall refer to the minimum function value as f. See Appenix A for a variety of basic constructions an examples of star-convex functions. Intuitively, if we visualize the objective function as a lanscape, star-convexity means that the global optimum is visible from every point there are no riges on the way to the global optimum, but there coul be many riges in transverse irections. Since the global optimum is always visible in a ownhill irection from every point, graient escent methos woul seem to be effective. Counterintuitively, these methos all fail in general. In Section.3 we emonstrate that stanar graient escent an cutting plane methos fail in the absence of Lipschitz guarantees. One broa class of star-convex functions that gives a goo sense of the scope of this efinition is constructe by the following process: ) pick an arbitrary positive function g(θ) on the unit circle (see Figure a) which may be iscontinuous, rapily oscillating, or otherwise baly behave; an 2) linearly exten this function to a function f on the entire plane, about the value f(0) = 0 (see Figure b). The name star-convex comes from the fact that each sublevel set (the set of x for which f(x) < c, for some c) is star-shape. At any point at which a graient or subgraient exists, the star center (global optimum) lies in the halfspace opposite the (sub)graient. However, even for the simple example of Figure, graients o not exist for angles θ at which g is iscontinuous, an subgraients o not exist at most angles, incluing those angles where g is a local maxima with respect to θ. Further, even for ifferentiable star-convex functions, graients can be misleaing, since a rapily oscillating g implies that graients typically point in the transverse irection, nearly orthogonal to the irection of the star center. While it is often stanar to esign algorithms assuming one has access to an oracle that returns both function values an graients, we instea only assume access to the function value: even in the case when graients exist, it is unclear whether they are algorithmically helpful in our setting. (See Section.3 for etails.) Inee, we pose this as an open problem: uner what assumptions (short of 2

4 the Lipschitz guarantees on the secon erivative of Nesterov an Polyak [3]), can one meaningfully use a graient oracle to optimize star-convex functions? In this paper, we show that, assuming only Lebesgue measurability an an exponential boun on the function value within a large ball, our algorithm optimizes a star-convex function in polylog(/ɛ) time where ɛ is the esire accuracy in function value. Theorem.2. (Informal) Given evaluation oracle access to a Lebesgue measurable star-convex function f : R n R an an error parameter ɛ, with the guarantee that x is within raius R of the origin, Algorithm returns an estimate f 0 of the minimum function value such that f 0 f ɛ with high probability in time poly(n, log ɛ, log R). In previous work, Nesterov an Polyak [3] introuce a new aaptation of Newton s metho to fin local minima in functions that are twice-ifferentiable, with Lipschitz continuity of their secon erivatives. For the class of Lipschitz-twice-ifferentiable star-convex functions, they show that their algorithm converges to within ɛ of the global optimum in time O( ɛ ), using star convexity to lower boun the amount of progress in each of their optimization steps. Our results are stronger in two significant senses: ) as oppose to assuming Lipschitz twiceifferentiability, we make no continuity assumptions whatsoever, optimizing over an essentially arbitrary measurable function within the class of star-convex functions; 2) while the algorithm of [3] requires exponential time to estimate the optimum to k igits of accuracy, our algorithm is polynomial in the requeste number of igits of accuracy. The reaer may note that the complexity of our algorithm oes epen polynomially on the number of imensions n of the search space, whereas the Nesterov-Polyak algorithm uses a number of calls to a Hessian oracle that is inepenent of the imension n. However, our epenency on the imension n is necessary in the absence of Lipschitz guarantees, even for convex optimization []. We further point out an important istinction between star-convex optimization an star-shape optimization. A star-shape function is efine similarly to a star-convex function, except the star center is allowe to be locate away from the global minimum. Certain NP-har optimization problems, incluing max-clique can be rephrase in terms of star-shape optimization [], an the problem is in general impossible without continuity guarantees, for the global minimum may be hien along a single ray from the star center that is iscontinuous from the rest of the function. In this sense, our star-convex optimization algorithm narrowly avois solving an NP-har optimization problem..2 Applications of Star-Convex Functions We begin with the following concrete example: the function ( x p + y p ) /p is convex for p, yet is star-convex for all real p, positive or negative, taking limits as appropriate for p = 0. We generalize this example significantly below. Empirical risk minimization is a central technique in machine learning [9], where, given training ata x i with labels y i, we seek a hypothesis h so as to minimize m m L(h(x i ), y i ), i= where L is a loss function that escribes the penalty for mispreiction, on input our preiction h(x i ) an the true answer y i. We take the hypothesis h to be a linear function (this setting inclues kernel methos that preprocess x i first an then apply a linear function). If the loss function L is convex, then fining the optimal hypothesis h is a convex optimization problem, which is wiely use in practice: for exponent p, we let L(ŷ i, y i ) = ŷ i y i p an thus aim to optimize the convex function arg min h m h x i y i p. () i= 3

5 In this paper we raw attention to the interesting regime of Equation where p <. This regime is not accessible with stanar techniques. It is straightforwar to verify, however, that the (/p)th power of the objective function of Equation is star-convex, an thus the main result of our paper yiels an efficient optimization algorithm. Slightly more generally: Corollary.3. For a loss function L(ŷ i, y i ) such that there exists a real number p for which L(, ) p is a star-convex function of its first argument, the following empirical risk minimization problem can be optimize to accuracy ɛ in time poly(n, log /ɛ) by inputting its (/p)th power to the main algorithm of this paper: arg min h m m L(h(x i ), y i ), i= (We note that negative p is a somewhat peculiar regime in the above corollary, where running a minimization algorithm on the (/p)th power of the loss woul actually yiel a global maximum instea of minimum; we inclue this case here for completeness sake.) The regime where p (0, ) has the structure that, when one is far from the right answer, small changes to the hypothesis o not significantly affect performance, but the closer one gets to the true parameters, the richer the lanscape becomes. One of many interesting settings with this behavior is biological evolution, where fit creatures have a rich lanscape to traverse, while rastically unfit creatures fail to survive. A paper by one of the authors propose star-convex optimization as a regime where evolutionary algorithms might be unexpectely successful. This current work fins an affirmative answer to an open question raise there by showing the first polynomial time algorithm for optimizing this general class of functions. (The previous results by Nesterov an Polyak [3] fail to apply to this setting because the objective function has regions that behave for some parameter regimes like x, which is not Lipschitz.) Corollary.4 (Extening Theorem 4.5 of [8]). There is a single mutation algorithm uner which for any p > 0, efining the loss function L(ŷ, y) = ŷ y p, the class of constant-egree polynomials from R n R m with boune coefficients is evolvable with respect to all istributions over the raius r ball..3 Nee for Novel Algorithmic Techniques Before we outline our approach to optimizing star-convex functions, we emonstrate the nee for novel algorithmic approaches by explaining how a wie variety of stanar techniques fail on this challenging class of functions. We start by explaining how simple star-convex functions such as ( x + y ) 2 confoun graient escent, cutting plane an relate approaches, an en with a pathological example with information-theoretic guarantees about its harness to optimize, even given arbitrarily accurate first-orer (graient) oracle access. Consier the star-convex function ( x + y ) 2, which can be thought of essentially as x + y. This function has unique global minimum (an star center) at the origin. Graients of this function go to infinity as either x or y goes to 0, thus the function essentially has eep canyons along both the x an y axes. Intuitively, this function is challenging to optimize because, espite the origin lying in a ownhill irection from every point, graient escent will typically converge to one of the axes first, at which point the graient has near-infinite magnitue in the irection of the nearest axis, an this irection is thus near-orthogonal to the irection of the origin. In short, many variants of graient escent will have the following behavior: rapily jump towars one of the two axes, an then fail to make significant further progress as the search point oscillates aroun that axis. This is in part a reflection of the fact that the graient of ( x + y ) 2 is not Lipschitz, an in fact varies arbitrarily rapily as (x, y) converges towars either axis; since these axes are canyons in the search 4

6 space, typical algorithms will spen a isproportionate amount of their time stumbling aroun this baly-behave region. There are many variants of graient escent constructe to tackle optimization settings that are challenging for vanilla graient escent. One recent promising line of work concerns normalize graient escent [5], which is esigne to work well espite regions where the magnitue of the graient is unusually small or large. However, the analysis techniques rely on the strict local quasi-convexity property, which says that there is a global constant ɛ such that the ownhill halfspace inuce by any graient contains a ball of raius ɛ about the global optimum; this property oes not apply even to the simple function ( x + y ) 2, where halfspaces cut arbitrarily close to the origin, the closer one gets to either the x or y axis. As an alternative to graient escent methos, one might consier a graient-base cutting plane algorithm. Cutting plane methos work by iteratively cutting a large search space into ever-smaller regions until they converge to a tiny region guarantee to contain the global optimum. Cutting plane algorithms, even for convex optimization, are known to typically lea to ill-conitione behavior, where one imension of the search region becomes much smaller than the overall iameter of the region, leaing to ba numerical properties, ifficulty fining an manipulating graients, etc. These problems become even worse in our star-convex setting, even for the simple function ( x + y ) 2 : as the search space converges aroun one of the axes, the graients become even more strongly oriente towars that axis, yieling cuts that make the alreay thin region even thinner, instea of cutting in a more prouctive transverse irection. Inee, up to numerical precision, the compute graients may be foun to point exactly towars the nearest axis, making cuts in the transverse irection impossible, an thus halting the overall progress of the algorithm inefinitely. Having seen simple examples showing how stanar techniques cannot optimize star-convex functions, we now present a more sophisticate an pathological example which proves that, even with access to a first-orer oracle an the assumptions of infinite ifferentiability an bouneness on a region aroun the origin, no eterministic algorithm can efficiently optimize star-convex functions without Lipschitz guarantees, motivating the somewhat unusual ranomize flavor of the algorithm of this paper. In particular, we show that, for any eterministic polynomial time algorithm, there is an infinitely ifferentiable star-convex function on R 2 with a unique global minimum near the origin such that for all querie points ) the returne function value is always an 2) the returne graient epens only on the y-coorinate. The existence of such a function implies the inability of the algorithm to fin the x-coorinate of the star center. Example.5. Given a eterministic polynomial time optimization algorithm A, we construct a starconvex function f A that algorithm A fails to optimize. We assume that A will only query the function insie some isk of raius R about the origin, for some (possibly exponentially large) R. Now, we first choose a y-coorinate y 0 (that A cannot specify exactly, for example an irrational number) for the star center; we shall choose x 0 later. Then, we simulate running the algorithm A, assuming that in response to any query (x, y) that A makes to the function/graient oracle, the return value is f A (x, y) = an the graient f A (x, y) is 0 in the x irection an /(y y 0 ) in the y irection. Since A is a polynomial time algorithm, there must only be polynomially many queries before A terminates. Therefore, we may choose x 0 for the star center such that ) (x 0, y 0 ) is not exponentially close to collinear with any pair of query points, an 2) the x coorinate of the star center, x 0, is chosen to be significantly far from the x coorinates of all query points. We thus construct a star-convex function f A subject to the constraints that ) it passes through all the points simulate above, with corresponing graients, 2) it has star center at (x 0, y 0 ), an 3) it is infinitely ifferentiable, with each erivative being at most exponentially large in the egree of the erivative an the running time of A. (Since no pair of query points are close to collinear with the 5

7 star center, stanar interpolation lets us construct f A by first efining a smooth function g A on the unit circle an linearly extening g A to the entire plane, except for an exponentially small region about (x 0, y 0 ) that we make smooth instea of conically shape.) Since the function an graient oracles return zero information about the x-coorinate of the star center, algorithm A clearly has a hopeless task optimizing this function. In summary, graient information is very har to use effectively in star-convex optimization, even for smooth functions. Our algorithm, outline below, instea only queries the function value not because we object to graients, but rather because they o not seem to help..4 Our Approach Our overall approach is via the ellipsoi metho, which repeately refines an ellipsoial region containing the star center (global optimum) by iteratively computing cuts that contain the star center, while significantly reucing the overall volume of the ellipsoi. As mentione in Section.3, even for smooth functions with access to a graient oracle, the cutting planes inuce by the graients may yiel no significant progress in some irections. Fining useful cutting planes in the star-convex setting requires three novelties see the introuction to Section 4 for complete etails. First, the star-convex function may be iscontinuous, without even subgraients efine at most points, so we instea rely on a sampling process involving the blurre logarithm of the objective function. The blurre logarithm mitigates both the (potentially) exponential range of the function, an the (potentially) arbitrary iscontinuities in the omain. The blurre logarithm, furthermore, is ifferentiable, an sampling results let us estimate an use its erivatives in our algorithm. This technique is similar to that of ranomize smoothing [4]. Secon, the negative graient of the blurre logarithm might point away from the star center (global optimum) espite all graients of the unblurre function (when they exist) pointing towars the star center because of the way blurring interacts with sharp wege-shape valleys. Aressing this requires an averaging technique that repeately samples Gaussians-in-Gaussians, until it etects sufficient conitions to conclue that the graient may be use as a cut irection. Thir an finally, the usual cutting plane criterion that the volume of the feasible region ecreases exponentially is no longer sufficient in the star-convex setting, as an ellipsoi in two coorinates (x, y) might get repeately cut in the y irection without any progress restricting the range of x. This is provably not a concern in convex optimization. Our algorithm tackles this issue by locking axes of the ellipsoi smaller than some threshol τ, an seeking cuts orthogonal to these axes; this orthogonal signal may be hien by much larger erivatives in other irections, an hence requires new techniques to expose. The counterintuitive approach is that, in orer to expose structure within an exponentially thin ellipsoi imension we must search exponentially far outsie the ellipsoi in this imension..5 Stochastic Optimization Our approach extens easily to the stochastic optimization setting. Here, the objective function is the expectation over a istribution of star-convex functions which share the same star center an optimum value. For example, in the context of empirical risk minimization with a star-convex loss function L where we assume there is a true hypothesis h such that each labele example is correctly preicte by h, then for each example i, the loss L(ĥ(x i), y i ) consiere as a function of ĥ is star-convex, with global optimum 0 at ĥ = h. An the overall objective function is the expectation of this quantity, over ranom examples: E i [L(ĥ(x i), y i )], which is thus also star-convex with global optimum 0 at ĥ = h. The optimization algorithm for this new stochastic setting is actually ientical to the general starconvex optimization algorithm. Since star-convex functions can take arbitrarily unrelate values on nearby rays from the star center, our algorithm is alreay fully equippe to eal with the situation, an 6

8 aing explicitly unrelate values to the moel by stochastically sampling from a family of functions L i makes the problem no harer. The algorithm, analysis, an convergence are unchange..6 Outline In Section 2 we introuce the oracle moel by which we moel access to star-convex functions. Because we aress such a large class of potentially baly-behave functions (incluing, for example, starconvex functions which, in polar coorinates, behave very ifferently epening on whether the angle is rational or irrational), we take great care to specify the input an output requirements of our optimization algorithm appropriately, in the spirit of Lovász [9]. In Section 3 we present the overall star-convex optimization algorithm, a cutting-plane metho that at each stage tracks an ellipsoi that contains the star center (global optimum). The heart of the algorithm consists of fining appropriate cutting planes at each step, which we present in Section 4. 2 Problem Statement an Overview Our aim here is to iscuss algorithms for optimizing star-convex functions in the most general setting possible, an thus we must be careful about how the functions are specifie to the algorithms. In particular, we o not assume continuity, so functions with one behavior on the rational points an a separate behavior on irrational points are possible. Therefore, it is essential that our algorithms have access to the function values of f at inputs beyon the usual rational points expressible via stanar computer number representations. As motivation, see Example 2. which escribes a star-convex function that has value on essentially all rational points an algorithm is likely to query, espite having a rather ifferent lanscape on the irrational points, leaing to a global minimum value of 0. Example 2.. We present the following example of a class of star-convex functions f in 2-imensions parameterize by integers (i, j) such that f i,j has global optimum at (/ 2 + i, / 3 + j). The class has the property that, if i an j are chosen ranomly from an exponential range, then, except with exponentially small probability, any (probabilistic) polynomial time algorithm that accesses f only at rational points will learn nothing about the location of the global optimum. It emonstrates the nee for access beyon the rationals. We efine a star-convex function f i,j parameterize by integers (i, j) that has a unique global optimum at (/ 2 + i, / 3 + j). We evaluate f(x, y) via three cases:. If the ray from (/ 2+i, / 3+j) to (x, y) passes through no rational points, then let f(x, y) = (x, y) (/ 2 + i, / 3 + j) Otherwise, if the ray from (/ 2 + i, / 3 + j) to (x, y) passes through a rational point (p, q), then: (a) If p = i an q = j then let f(x, y) = (x, y) (/ 2 + i, / 3 + j) 2 as above. (b) Else, let f(x, y) = (x, y) (/ 2 + i, / 3 + j) 2 (p, q) (/ 2 + i, / 3 + j) 2 We note that since / 2 an / 3 are linearly inepenent over the rationals, no ray from (/ 2 + i, / 3 + j) passes through more than one rational point. Hence the above three cases give a complete efinition of f i,j. The function f is star-convex since each of the three cases above apply on the entirety of any ray from the global optimum, an these cases each efine a linear function on the ray. The erivative of 7

9 f along any of these rays is at most /( / 2), since ( / 2) is the closest that a point outsie the integer square at (i, j) can get to the global optimum (/ 2 + i, / 3 + j); thus the function is linearly boune in any finite region. Further, f evaluate at any rational point (p, q) will fall into Case 2, an thus, unless ( p, q ) = (i, j), the function f will return. Thus, no algorithm that queries f only at rational points can efficiently optimize this class of functions, because unless the algorithm exactly guesses the pair (i, j) which was rawn from an exponentially large range no information is gaine (the value of f will always be otherwise). Since irectly querying function values of general star-convex functions at rational points is so limiting, we instea introuce the notion of a weak sampling evaluation oracle for a star-convex function, aapting the efinition of a weak evaluation oracle by Lovász [9]. Definition 2.2. A weak sampling evaluation oracle for a function f : R n R takes as inputs a point x Q n, a positive efinite covariance matrix Σ, an an error parameter ɛ. The oracle first chooses a ranom point y N (x, Σ), an returns a value r such that f(y) r < ɛ. The Gaussian sampling in Definition 2.2 can be equivalently change to choosing a ranom point in a ball of esire (small) raius, since any Gaussian istribution can be approximate to arbitrary precision (in the total variation sense) as the convolution of itself with a small enough ball, an vice versa. Because inputs an outputs to the oracle must be expressible in a polynomial number of igits, we consier well-guarantee star-convex functions (in analogy with Lovász [9]), where the raius boun R an function boun B below shoul be interprete as huge numbers (with polynomial numbers of igits), such as 0 00 an respectively. Numerical accuracy issues in our analysis are analogous to those for the stanar ellipsoi algorithm an we o not iscuss them further. Definition 2.3. A weak sampling evaluation oracle for a function f is calle well-guarantee if the oracle comes with two bouns, R an B, such that ) the global minimum of f is within istance R of the origin, an 2) within istance 0nR of the origin, f(x) B. Such sampling gets aroun the obstacles of Examples.5 an 2. because even if the evaluation oracle is querie at a preictable rational point, the value it returns will represent the function evaluate at an unpreictable, (typically) irrational point nearby. At this point, our notion of oracle access may seem unnatural, in part because it allows access to the function at irrational points that are not computationally expressible. However, there are two natural an wiely use justifications for this approach, of ifferent flavors. First, the oracle represents a computational moel of a mathematical abstraction (the unerlying star-convex function), where even for mathematically pathological functions, we can actually in many cases implement simple coe to emulate oracle access in the manner escribe above. For example, it is in fact easy to implement a weak sampling evaluation oracle for the function of Example 2., where rays from the star center through rational points behave ifferently from the other rays. Since the rational points have Lebesgue measure 0, the complexities introuce by case 2 of Example 2. happen with probability 0, an thus we can write coe to simulate a weak sampling evaluation oracle by only implementing the simpler case, evaluating f(x, y) = (x, y) (/ 2 + i, / 3 + j) 2 to the requeste accuracy ±ɛ. Secon, setting oracle implementation issues asie, the moel cleanly separates accessing a potentially pathological function, from computing properties of it, in this case, optimizing it. The more pathological a function is, the more surprising it is that any efficient automate technique can extract structure from it. Thus, in some sense, the unrealistic regime of pathological functions, for which we o not even know how to write coe emulating oracle access, yiels the most surprising regime for the success of the algorithmic results of this paper. This regime is the most insightful from a theory perspective almost exactly because it is the most unnatural an counterintuitive from a practical perspective. 8

10 Having establishe weak sampling evaluation oracles as input to our algorithms, we must now take the same care to efine the form of their outputs. The output of a traitional (convex) optimization problem is an x value such that f(x) is close to the global minimum. For star-convex functions, accesse via a weak sampling evaluation oracle, one can instea ask for a small spherical region in which f(x) remains close to its global optimum given an input point x within istance τ of the star 4B center, star convexity an bouneness of f imply that f(x) must be within value τ 0nR R τ of the global optimum (see the proof of Lemma 3.2). However, such an output requirement is too much to ask of a star-convex optimization algorithm: see Example B. for an example of a star-convex function for which there is a large region in which the istributions of f(x) in all small (Gaussian) balls are inistinguishable from each other except with negligible probability, say That is, no algorithm can hope to istinguish a small ball aroun the global optimum from a small ball anywhere else in this region, except by using close to 0 00 function evaluations. Instea, since on this entire region, the function is close to the global optimum except with 0 00 probability, the most natural option is for an optimization algorithm to return any portion of this region, namely a region where the function value is within ɛ of optimal except with probability This is how we efine the output of a star-convex optimization problem: in analogy to the input convention, we ask optimization algorithms to return a Gaussian region, specifie by a mean an covariance, on which the function is near-optimal on all but a δ fraction of its points. (See Example B. for etails.) Definition 2.4. The weak star-convex optimization problem for a Lebesgue measurable star-convex function f, parameterize by δ, ɛ, F, is as follows. Given a well-guarantee weak sampling evaluation oracle for f, return with probability at least F a Gaussian G such that P r[f(x) f + ɛ : x G] δ. We now formally state our main result. Theorem 2.5. Algorithm optimizes (in the sense of Definition 2.4) any Lebesgue measurable starconvex function f in time poly(n, /δ, log ɛ, log F, log R, log B). Observe that we require only Lebesgue measurability of our objective function, an we make no further continuity or ifferentiability assumptions. The measurability is necessary to ensure that probabilities an expectations regaring the function are well-efine, an is essentially the weakest assumption possible for any probabilistic algorithm. The minimality in our assumptions contrasts that of the work by Nesterov an Polyak, which assumes Lipschitz continuity in the secon erivative [3]. It is mathematically interesting that the carinality of the set of star-convex functions which we optimize, even in the 2-imensional case, equals the huge quantity 2 R, while the carinality of the entire set of continuous functions which is NP-har to optimize, an strictly contains most stanar optimization settings is only R. 3 Our Optimization Approach In this section, we escribe our general strategy of optimizing star-convex functions by an aaptation of the ellipsoi algorithm. Our algorithm, like the stanar ellipsoi algorithm, looks for the global optimum insie an ellipsoi whose volume ecreases by a fixe ratio for each in a series of iterations, via algorithmically iscovere cuts that remove portions of the ellipsoi that are iscovere to lie in the uphill irection from the center of the ellipsoi. In Section 4 we will explore the properties of star-convex functions that will enable us to prouce such cuts. However, it is crucial that, unlike for stanar convex optimization, such cuts are not enough. Consier, for example, the possibility that for a star-convex function f(x, y), a cutting plane oracle only ever returns cuts in the x-irection, an 9

11 never reuces the size of the ellipsoi in the y irection, a situation which provably cannot occur in the stanar convex setting. Thus, when constructing a cutting plane, our algorithm efines an exponentially small threshol τ (efine in Definition 4.4), such that whenever a semi-principal axis of the ellipsoi is smaller than τ, we guarantee a cut orthogonal to this axis. In this section, we make use of the following characterization of our cutting plane algorithm, Algorithm 2, introuce an analyze in Section 4: Proposition (See Proposition 4.4). With negligible probability of failure, given an ellipsoi containing the star center, centere at the origin with all semi-principal axes longer than τ scale to, Algorithm 2 either a) returns a Gaussian region G such that P[f(x) f + ɛ : x G] δ, or b) returns a irection, restricte to the subspace spanne by those ellipsoi semi-principal axes longer than τ, such that when normalize to a unit vector ˆ, the cut {x : x ˆ 3n } contains the global minimum. Note that the above preconition on the ellipsoi input is satisfie by applying the appropriate affine transformation to the current ellipsoi before supplying it to Algorithm 2. Also note that in case a) above, the returne Gaussian region is a solution to the overall optimization problem. Given these guarantees about the behavior of the cutting plane algorithm, Algorithm 2, we now introuce our overall optimization algorithm, base on the ellipsoi metho. For the following algorithm, we efine the number of iterations m = 6(n+) [n log R + τ (n ) log 2 accoring to a stanar analysis of the multiplicative volume ecrease of the ellipsoi metho state in Lemma C.2 in Appenix C. Algorithm (Ellipsoi metho) Input: A raius R ball centere at the origin which is guarantee to contain the global minimum. Output: Either a) A Gaussian G such that P[f(x) f + ɛ : x G] δ or b) An ellipsoi E such that all function values in E are at most f + ɛ.. Let ellipsoi E be the input raius R ball. 2. For i [, m + ] 2a. If all the axes of E i are shorter than τ, then Return E i an Halt. 2b. Otherwise, execute Algorithm 2 with ellipsoi E i. 2c. If it returns a Gaussian, then Return this Gaussian an Halt. 2. Otherwise, it returns a cut irection. Apply an affine transformation such that E i becomes the unit ball centere at the origin an points in the negative x irection. Use the construction in Lemma C. (see appenix) to construct a new ellipsoi E i+ that inclues the intersection of ellipsoi E i with the cut. 2e. If any semi-principal axis of the ellipsoi E i+ is larger than 3nR then apply Lemma C.3, an if the center of the ellipsoi has istance > R from the origin then apply Lemma C.4, to yiel an ellipsoi of smaller volume, containing the entire intersection of E i+ with the ball of raius R, that now has all semi-principal axes smaller than 3nR, an has center in the ball of raius R. 3n ] The analysis of Algorithm is a straightforwar aaptation of stanar techniques for analyzing the ellipsoi algorithm, bouning the ecrease in volume of the feasible ellipsoi at each step, until either the algorithm returns an explicit Gaussian as the solution (in Step 2c), or terminates because the ellipsoi is containe in an exponentially small ball (in Step 2a). See Appenix C for full etails. 0

12 Lemma 3.. Algorithm halts within m + iterations either through Step 2a or Step 2c. The following lemma shows that, if the star center (global optimum) is foun to lie within a ball of exponentially small raius τ, then this entire ball has function value sufficiently close to the optimum that any point within the ball can be returne. Lemma 3.2. Given an ellipsoi E that a) has its center within R of the origin, b) is containe within a ball of raius τ, an c) contains the star center x, then f(x) [f, f + ɛ] for all x E. Proof. For any x E, let y be the intersection of the ray x x with the sphere of raius 0nR about the origin. The bouneness of f implies that f(y) B, an that f(x ) B. Thus star-convexity yiels that f(x) f + 2B x x 2 x y 2 f 2τ + 2B 0nR R τ < f + ɛ since τ ɛ. Since the ellipsoi returne in Step 2a will always satisfy the preconitions of Lemma 3.2, the entire ellipsoi has function value within ɛ accuracy of the global minimum. In practice, we can just return this region. However, if we o wish to conform to our problem statement (Definition 2.4), we may simply return a Gaussian ball of sufficiently small raius an centere at the ellipsoi center, such that there is only negligible probability of sampling a point more than τ from its center. In summary, Lemmas 3. an 3.2 imply that Algorithm optimizes a star-convex function in time poly(n, /δ, log ɛ, log F, log R, log B), proving Theorem Computing Cuts for Star-Convex Functions The general goal of this section is to explain how to compute a single cut, in the context of our aapte ellipsoi algorithm explaine in Section 3. Namely, given (weak sampling, in the sense of Definition 2.2) access to a star-convex function f, an a bouning ellipsoi in which we know the global optimum lies, we present an algorithm (Algorithm 2) that will either a) return a cut passing close to the center of the ellipsoi an containing the global optimum, or b) irectly return the answer to the overall optimization problem. There are several obstacles to this kin of algorithm that we must overcome. Star-convex functions may be very iscontinuous, without any graients or even subgraients efine at most points. Furthermore, arbitrarily small changes in the input value x may prouce exponentially large changes in the output of f, for any x that is not alreay exponentially close to the global optimum. This means that stanar techniques to even approximate the function s shape o not remotely suffice in the context of star-convex functions. Finally, consiering again the example of linearly extening an arbitrary function of the unit sphere surface (see Figure or item 3 in Appenix A), unlike regular convex functions, star-convex functions o not become locally flat along thin imensions of increasingly thin ellipsois. Thus, the ellipsoi algorithm in general might repeately cut certain imensions while neglecting others. The algorithm in this section (Algorithm 2) employs three key strategies that we intuitively explain now.. We o not work irectly with the star-convex function f, but instea work with a blurre version of its logarithm (efine in Definition 4.), which has a) well-efine erivatives, that b) we can estimate efficiently. Given a baly-behave (measurable) function f, an the pf of a Gaussian, g, blurring f by the Gaussian prouces the convolution f g, which has well-behave erivatives since erivatives commute with convolution: namely, letting D stan for a erivative or secon erivative, we have D(f g) = f (Dg). Since the erivatives of a Gaussian pf g are each boune, we can estimate erivatives of blurre versions of iscontinuous functions by sampling. Sampling bouns imply that if we have a ranom variable boune to a range of size r, then

13 we can estimate its mean to within accuracy ±ɛ by taking the average of O(( r ɛ )2 ) samples. The logarithm function (after appropriate translation so that its inputs are positive), maps a starconvex function f to a much smaller range, which enables accurate sampling (in polynomial time). Therefore, we can efficiently estimate erivatives of the blurre logarithm of f. 2. While intuitively, the negative graient of (the blurre logarithm of) f points towars the global minimum, this signal might be overpowere by a confouning term in a ifferent irection (see Lemma 4.5), making the negative graient point away from the global optimum in some cases. To combat this, our algorithm repeately estimates the graient at points sample from a istribution aroun the ellipsoi center, an for each graient, estimates the confouning terms, returning the corresponing graient only once the confouning terms are foun to be small. Lemma 4.8 shows that the confouning terms are small in expectation, so Markov s inequality shows that our strategy will rapily succee. 3. The ellipsoi algorithm in general might repeately cut certain imensions while neglecting others, an our algorithm must actively combat this. If some axes of the ellipsoi ever become exponentially small, then we lock them, meaning that we eman a cut orthogonal to those imensions, thus maintaining a global lower boun on the length of any axis of the ellipsoi. Combine with the stanar ellipsoi algorithm guarantee that the volume of the ellipsoi ecreases exponentially, this implies that the iameter of the ellipsoi can be mae exponentially small in polynomial time, letting us conclue our optimization. Fining cuts uner this new locking guarantee, however, requires a new algorithmic technique. We must take avantage of the fact that the ellipsoi is exponentially small along a certain irection b to somehow gain the new ability to prouce a cut orthogonal to b. Convex functions become increasingly well-behave on increasingly narrow regions, however, star-convex functions crucially o not (see item 3 in Section A). Thus, as the thickness of the ellipsoi in irection b gets smaller, the structural properties of the function insie our ellipsoi o not give us any new powers. Strangely, we can take avantage of one new parameter regime that at first appears useless: relative to the thinness of the ellipsoi, we have exponentially much space in irection b outsie of the ellipsoi. To implement the intuition of Step above, we efine L z (x), a truncate an translate logarithm of the star-convex function f, which maps the potentially exponentially large range of f to the (polynomial-size) range [log ɛ, log 2B], where ɛ is efine below (Definition 4.4), an is slightly smaller than our function accuracy boun ɛ. In the below efinition, z intuitively represents our estimate of the global optimum function value, an will recor, essentially, the smallest function evaluation seen so far (see Algorithm 2). Definition 4.. Given an objective function f with boun f(x) B when x R an an offset value z f, we efine the truncate logarithmic version of f to be log ɛ f(x) z ɛ L z (x) = log 2B f(x) z 2B log(f(x) z) otherwise While mapping to a small range, L z (x) nonetheless gives us a precise view of small changes in the function as we converge to the optimum. The next result shows that, if we blur L z (x) by rawing x from a Gaussian istribution, then not only can we efficiently estimate the expecte value of the blurre logarithm of f, we can also estimate the erivatives of this expectation with respect to changing either the mean or the variance of the Gaussian. 2

14 For an arbitrary boune (measurable) function h, the erivative of its expecte value over a Gaussian of with σ with respect to either a) moving the center of the Gaussian or b) changing its with σ, is boune by O( σ ). Thus we normalize the estimates below in terms of the prouct of the Gaussian with an the erivative, instea of estimating the erivative alone. Proposition 4.2. Let N (µ, Σ) be a Gaussian with iagonal covariance matrix Σ consisting of elements σ 2, σ2 2,..., σ2 n. For an error boun κ > 0 an a probability of error δ > 0, we can estimate each of the following functions to within error κ with probability at least δ using poly(n, κ, log δ, log 2B/ɛ ) samples: ) the expectation E[L z (x) : x N (µ, Σ)]; 2) the (scale) erivative σ µ E[L z (x) : x N (µ, Σ)]; an 3) the erivative with respect to scaling σ σ E[L z (x) : x N (µ, Σ)]. A fortiori, these erivatives exist. Proof. Chernoff bouns yiel the first claim, since L z [log ɛ log 2B/ɛ, log 2B], so O(( κ ) 2 log δ ) samples suffice. For the secon claim, we note that the expectation can be rewritten as the integral over R n of L z times the probability ensity function of the normal istribution: E[L z (x) : x N (µ, Σ)] = (2π) n/2 Σ /2 R L n z (x) e (x µ)t Σ (x µ)/2 x. Thus the erivative of this expression with respect to the first coorinate µ can be expresse by taking the erivative of the probability ensity function insie the integral, which ens up scaling it by the vector (x µ )/σ 2. Thus σ µ E[L z (x) : x N (µ, Σ)] = E[ x µ σ L z (x) : x N (µ, Σ)]. The multiplier x µ σ is effectively boune, because for real numbers c, the probability P[ > c : x N (µ, Σ)] vanishes faster than exp( c n ). x µ σ Thus we can pick c = poly(n, log κ, log 2B/ɛ ) such that replacing the expression in the expectation, x µ σ L z (x), by this expression clampe between ±c will change the expectation by less than κ 2. We can thus estimate the expectation of this clampe quantity to within error κ 2 with probability at least δ via some sample of size poly(n, κ, log δ, log 2B/ɛ ), as esire. The analysis for the thir claim (the erivative with respect to σ ) is analogous: the erivative of the Gaussian probability ensity function with respect to σ (as in the case of a univariate Gaussian of variance σ 2) scales the probability ensity function by x µ 2 /σ 3, an after scaling by σ, this expression measures the square of the number of stanar eviations from the mean, an as above, the prouct x µ 2 σ 2 the expectation by more than κ 2 L z (x) can be clampe to some polynomially-boune interval [ c, c] without changing. The conclusion is analogous to the previous case. As mentione in the previous section, our guiing aim for the ellipsoi metho is to prevent any axis of the ellipsoi from getting too small. Therefore, in orer to treat axes ifferently epening on their length, we shall ientify our basis as the unit vectors along the axes of the current ellipsoi an istinguish between axes that are smaller than τ versus at least τ, where τ is an exponentially small threshol for thinness, efine below in Definition 4.4. Definition 4.3. Given an ellipsoi, consier an orthonormal basis parallel to its axes. Each semiprincipal axis of the ellipsoi whose length is less than τ, we call a thin imension, an the rest are non-thin imensions. Given a vector µ, we ecompose it into µ = µ + µ where µ is non-zero only in the non-thin imensions, an µ is non-zero only in the thin imensions. Similarly, given the ientity matrix I, we ecompose it into I = I + I. We apply a scaling to the non-thin imensions so as to scale the non-thin semi-principal axes of the ellipsoi to unit vectors (making the ellipsoi a unit ball in the non-thin imensions). We keep the thin imensions as they are. We present again Proposition 4.4, escribing the guarantees on our cutting plane algorithm require by Algorithm, an then state the cutting plane algorithm, Algorithm 2. Below, we make use of 3

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012 CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

Linear First-Order Equations

Linear First-Order Equations 5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

Algorithms and matching lower bounds for approximately-convex optimization

Algorithms and matching lower bounds for approximately-convex optimization Algorithms an matching lower bouns for approximately-convex optimization Yuanzhi Li Department of Computer Science Princeton University Princeton, NJ, 08450 yuanzhil@cs.princeton.eu Anrej Risteski Department

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013

Survey Sampling. 1 Design-based Inference. Kosuke Imai Department of Politics, Princeton University. February 19, 2013 Survey Sampling Kosuke Imai Department of Politics, Princeton University February 19, 2013 Survey sampling is one of the most commonly use ata collection methos for social scientists. We begin by escribing

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

Differentiation ( , 9.5)

Differentiation ( , 9.5) Chapter 2 Differentiation (8.1 8.3, 9.5) 2.1 Rate of Change (8.2.1 5) Recall that the equation of a straight line can be written as y = mx + c, where m is the slope or graient of the line, an c is the

More information

MA 2232 Lecture 08 - Review of Log and Exponential Functions and Exponential Growth

MA 2232 Lecture 08 - Review of Log and Exponential Functions and Exponential Growth MA 2232 Lecture 08 - Review of Log an Exponential Functions an Exponential Growth Friay, February 2, 2018. Objectives: Review log an exponential functions, their erivative an integration formulas. Exponential

More information

Introduction to the Vlasov-Poisson system

Introduction to the Vlasov-Poisson system Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Calculus and optimization

Calculus and optimization Calculus an optimization These notes essentially correspon to mathematical appenix 2 in the text. 1 Functions of a single variable Now that we have e ne functions we turn our attention to calculus. A function

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

Optimization of Geometries by Energy Minimization

Optimization of Geometries by Energy Minimization Optimization of Geometries by Energy Minimization by Tracy P. Hamilton Department of Chemistry University of Alabama at Birmingham Birmingham, AL 3594-140 hamilton@uab.eu Copyright Tracy P. Hamilton, 1997.

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation

Thermal conductivity of graded composites: Numerical simulations and an effective medium approximation JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Technical Report TTI-TR-2008-5 Multi-View Clustering via Canonical Correlation Analysis Kamalika Chauhuri UC San Diego Sham M. Kakae Toyota Technological Institute at Chicago ABSTRACT Clustering ata in

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION

LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische

More information

Vectors in two dimensions

Vectors in two dimensions Vectors in two imensions Until now, we have been working in one imension only The main reason for this is to become familiar with the main physical ieas like Newton s secon law, without the aitional complication

More information

Exam 2 Review Solutions

Exam 2 Review Solutions Exam Review Solutions 1. True or False, an explain: (a) There exists a function f with continuous secon partial erivatives such that f x (x, y) = x + y f y = x y False. If the function has continuous secon

More information

The Exact Form and General Integrating Factors

The Exact Form and General Integrating Factors 7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

A Second Time Dimension, Hidden in Plain Sight

A Second Time Dimension, Hidden in Plain Sight A Secon Time Dimension, Hien in Plain Sight Brett A Collins. In this paper I postulate the existence of a secon time imension, making five imensions, three space imensions an two time imensions. I will

More information

We G Model Reduction Approaches for Solution of Wave Equations for Multiple Frequencies

We G Model Reduction Approaches for Solution of Wave Equations for Multiple Frequencies We G15 5 Moel Reuction Approaches for Solution of Wave Equations for Multiple Frequencies M.Y. Zaslavsky (Schlumberger-Doll Research Center), R.F. Remis* (Delft University) & V.L. Druskin (Schlumberger-Doll

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

A Sketch of Menshikov s Theorem

A Sketch of Menshikov s Theorem A Sketch of Menshikov s Theorem Thomas Bao March 14, 2010 Abstract Let Λ be an infinite, locally finite oriente multi-graph with C Λ finite an strongly connecte, an let p

More information

Necessary and Sufficient Conditions for Sketched Subspace Clustering

Necessary and Sufficient Conditions for Sketched Subspace Clustering Necessary an Sufficient Conitions for Sketche Subspace Clustering Daniel Pimentel-Alarcón, Laura Balzano 2, Robert Nowak University of Wisconsin-Maison, 2 University of Michigan-Ann Arbor Abstract This

More information

In the usual geometric derivation of Bragg s Law one assumes that crystalline

In the usual geometric derivation of Bragg s Law one assumes that crystalline Diffraction Principles In the usual geometric erivation of ragg s Law one assumes that crystalline arrays of atoms iffract X-rays just as the regularly etche lines of a grating iffract light. While this

More information

Final Exam Study Guide and Practice Problems Solutions

Final Exam Study Guide and Practice Problems Solutions Final Exam Stuy Guie an Practice Problems Solutions Note: These problems are just some of the types of problems that might appear on the exam. However, to fully prepare for the exam, in aition to making

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

6 Wave equation in spherical polar coordinates

6 Wave equation in spherical polar coordinates 6 Wave equation in spherical polar coorinates We now look at solving problems involving the Laplacian in spherical polar coorinates. The angular epenence of the solutions will be escribe by spherical harmonics.

More information

Monte Carlo Methods with Reduced Error

Monte Carlo Methods with Reduced Error Monte Carlo Methos with Reuce Error As has been shown, the probable error in Monte Carlo algorithms when no information about the smoothness of the function is use is Dξ r N = c N. It is important for

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

Acute sets in Euclidean spaces

Acute sets in Euclidean spaces Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of

More information

Least-Squares Regression on Sparse Spaces

Least-Squares Regression on Sparse Spaces Least-Squares Regression on Sparse Spaces Yuri Grinberg, Mahi Milani Far, Joelle Pineau School of Computer Science McGill University Montreal, Canaa {ygrinb,mmilan1,jpineau}@cs.mcgill.ca 1 Introuction

More information

Introduction to variational calculus: Lecture notes 1

Introduction to variational calculus: Lecture notes 1 October 10, 2006 Introuction to variational calculus: Lecture notes 1 Ewin Langmann Mathematical Physics, KTH Physics, AlbaNova, SE-106 91 Stockholm, Sween Abstract I give an informal summary of variational

More information

Calculus Class Notes for the Combined Calculus and Physics Course Semester I

Calculus Class Notes for the Combined Calculus and Physics Course Semester I Calculus Class Notes for the Combine Calculus an Physics Course Semester I Kelly Black December 14, 2001 Support provie by the National Science Founation - NSF-DUE-9752485 1 Section 0 2 Contents 1 Average

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

Bohr Model of the Hydrogen Atom

Bohr Model of the Hydrogen Atom Class 2 page 1 Bohr Moel of the Hyrogen Atom The Bohr Moel of the hyrogen atom assumes that the atom consists of one electron orbiting a positively charge nucleus. Although it oes NOT o a goo job of escribing

More information

Iterated Point-Line Configurations Grow Doubly-Exponentially

Iterated Point-Line Configurations Grow Doubly-Exponentially Iterate Point-Line Configurations Grow Doubly-Exponentially Joshua Cooper an Mark Walters July 9, 008 Abstract Begin with a set of four points in the real plane in general position. A to this collection

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

Topic 7: Convergence of Random Variables

Topic 7: Convergence of Random Variables Topic 7: Convergence of Ranom Variables Course 003, 2016 Page 0 The Inference Problem So far, our starting point has been a given probability space (S, F, P). We now look at how to generate information

More information

4.2 First Differentiation Rules; Leibniz Notation

4.2 First Differentiation Rules; Leibniz Notation .. FIRST DIFFERENTIATION RULES; LEIBNIZ NOTATION 307. First Differentiation Rules; Leibniz Notation In this section we erive rules which let us quickly compute the erivative function f (x) for any polynomial

More information

Math 1271 Solutions for Fall 2005 Final Exam

Math 1271 Solutions for Fall 2005 Final Exam Math 7 Solutions for Fall 5 Final Eam ) Since the equation + y = e y cannot be rearrange algebraically in orer to write y as an eplicit function of, we must instea ifferentiate this relation implicitly

More information

Function Spaces. 1 Hilbert Spaces

Function Spaces. 1 Hilbert Spaces Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assume structure

More information

Lecture 6: Calculus. In Song Kim. September 7, 2011

Lecture 6: Calculus. In Song Kim. September 7, 2011 Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear

More information

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides

Make graph of g by adding c to the y-values. on the graph of f by c. multiplying the y-values. even-degree polynomial. graph goes up on both sides Reference 1: Transformations of Graphs an En Behavior of Polynomial Graphs Transformations of graphs aitive constant constant on the outsie g(x) = + c Make graph of g by aing c to the y-values on the graph

More information

Differentiability, Computing Derivatives, Trig Review. Goals:

Differentiability, Computing Derivatives, Trig Review. Goals: Secants vs. Derivatives - Unit #3 : Goals: Differentiability, Computing Derivatives, Trig Review Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an

More information

Prep 1. Oregon State University PH 213 Spring Term Suggested finish date: Monday, April 9

Prep 1. Oregon State University PH 213 Spring Term Suggested finish date: Monday, April 9 Oregon State University PH 213 Spring Term 2018 Prep 1 Suggeste finish ate: Monay, April 9 The formats (type, length, scope) of these Prep problems have been purposely create to closely parallel those

More information

and from it produce the action integral whose variation we set to zero:

and from it produce the action integral whose variation we set to zero: Lagrange Multipliers Monay, 6 September 01 Sometimes it is convenient to use reunant coorinates, an to effect the variation of the action consistent with the constraints via the metho of Lagrange unetermine

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

Polynomial Inclusion Functions

Polynomial Inclusion Functions Polynomial Inclusion Functions E. e Weert, E. van Kampen, Q. P. Chu, an J. A. Muler Delft University of Technology, Faculty of Aerospace Engineering, Control an Simulation Division E.eWeert@TUDelft.nl

More information

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations

Diophantine Approximations: Examining the Farey Process and its Method on Producing Best Approximations Diophantine Approximations: Examining the Farey Process an its Metho on Proucing Best Approximations Kelly Bowen Introuction When a person hears the phrase irrational number, one oes not think of anything

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

Dot trajectories in the superposition of random screens: analysis and synthesis

Dot trajectories in the superposition of random screens: analysis and synthesis 1472 J. Opt. Soc. Am. A/ Vol. 21, No. 8/ August 2004 Isaac Amiror Dot trajectories in the superposition of ranom screens: analysis an synthesis Isaac Amiror Laboratoire e Systèmes Périphériques, Ecole

More information

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems

Construction of the Electronic Radial Wave Functions and Probability Distributions of Hydrogen-like Systems Construction of the Electronic Raial Wave Functions an Probability Distributions of Hyrogen-like Systems Thomas S. Kuntzleman, Department of Chemistry Spring Arbor University, Spring Arbor MI 498 tkuntzle@arbor.eu

More information

Solutions to Math 41 Second Exam November 4, 2010

Solutions to Math 41 Second Exam November 4, 2010 Solutions to Math 41 Secon Exam November 4, 2010 1. (13 points) Differentiate, using the metho of your choice. (a) p(t) = ln(sec t + tan t) + log 2 (2 + t) (4 points) Using the rule for the erivative of

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes

Leaving Randomness to Nature: d-dimensional Product Codes through the lens of Generalized-LDPC codes Leaving Ranomness to Nature: -Dimensional Prouct Coes through the lens of Generalize-LDPC coes Tavor Baharav, Kannan Ramchanran Dept. of Electrical Engineering an Computer Sciences, U.C. Berkeley {tavorb,

More information

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization JMLR: Workshop an Conference Proceeings vol 30 013) 1 On the Complexity of Banit an Derivative-Free Stochastic Convex Optimization Oha Shamir Microsoft Research an the Weizmann Institute of Science oha.shamir@weizmann.ac.il

More information

u!i = a T u = 0. Then S satisfies

u!i = a T u = 0. Then S satisfies Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace

More information

CONTROL CHARTS FOR VARIABLES

CONTROL CHARTS FOR VARIABLES UNIT CONTOL CHATS FO VAIABLES Structure.1 Introuction Objectives. Control Chart Technique.3 Control Charts for Variables.4 Control Chart for Mean(-Chart).5 ange Chart (-Chart).6 Stanar Deviation Chart

More information

Differentiability, Computing Derivatives, Trig Review

Differentiability, Computing Derivatives, Trig Review Unit #3 : Differentiability, Computing Derivatives, Trig Review Goals: Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an original function Compute

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13) Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

arxiv: v4 [cs.ds] 7 Mar 2014

arxiv: v4 [cs.ds] 7 Mar 2014 Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim

QF101: Quantitative Finance September 5, Week 3: Derivatives. Facilitator: Christopher Ting AY 2017/2018. f ( x + ) f(x) f(x) = lim QF101: Quantitative Finance September 5, 2017 Week 3: Derivatives Facilitator: Christopher Ting AY 2017/2018 I recoil with ismay an horror at this lamentable plague of functions which o not have erivatives.

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information