
CLASSIFICATION WITH A REJECT OPTION USING A HINGE LOSS

PETER L. BARTLETT AND MARTEN H. WEGKAMP

Abstract. We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function $\phi$, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss (the $\phi$-risk) also minimizes the risk. We also study the rate at which the $\phi$-risk approaches its minimum value. We show that fast rates are possible when the conditional probability $P(Y = 1 \mid X)$ is unlikely to be close to certain critical values.

Key words and phrases: Bayes classifiers; classification; convex surrogate loss; empirical risk minimization; hinge loss; large margin classifiers; margin condition; reject option; support vector machines.

MSC 2000: Primary 62C05; secondary 62G05, 62G08.

1. Introduction

The aim of binary classification is to classify observations that take values in an arbitrary feature space $\mathcal{X}$ into one of two classes, labelled $-1$ or $+1$. Since an observation $X$ does not fully determine its label $Y$, we construct a classifier $t : \mathcal{X} \to \{-1, +1\}$ that represents our guess $t(X)$ of the label $Y$ of a future observation $X$. The rule with the smallest probability of error $P\{t(X) \ne Y\}$ is the Bayes rule

(1) $$t^*(x) := \begin{cases} -1 & \text{if } P\{Y = -1 \mid X = x\} \ge P\{Y = +1 \mid X = x\} \\ +1 & \text{otherwise.} \end{cases}$$

Observations $x$ for which the conditional probability

(2) $$\eta(x) := P\{Y = +1 \mid X = x\}$$

is close to $1/2$ are the most difficult to classify. In the extreme case where $\eta(x) = 1/2$, we may just as well toss a coin to make a decision. While it is our aim to classify the majority of future observations in an automatic way, it is often appropriate to instead report a warning for those observations that are hard to classify (the ones having conditional probability $\eta(x)$ near the value $1/2$).

This motivates the introduction of a reject option for classifiers, by allowing for a third decision, $\circledR$ (reject), expressing doubt. In case the reject option is invoked, no decision is made. Although such classifiers are valuable in practice, few theoretical results are available; see [10] and [7] for references. In this note, we assume that the cost of making a wrong decision is $1$ and the cost of utilizing the reject option is $d > 0$. Given a classifier with reject option $t : \mathcal{X} \to \{-1, 1, \circledR\}$, the appropriate risk function is

(3) $$L_d(t) := d\, P\{t(X) = \circledR\} + P\{t(X) \ne Y,\ t(X) \ne \circledR\}.$$

It is easy to see that the rule minimizing this risk assigns $-1$, $1$ or $\circledR$ depending on which of $\eta(x)$, $1 - \eta(x)$ or $d$ is smallest. According to this rule, which we refer to as the Bayes rule, we need never reject if $d \ge 1/2$. For this reason we restrict ourselves to the cases $0 \le d \le 1/2$, and the Bayes rule with reject option is then

(4) $$t_d^*(x) := \begin{cases} -1 & \text{if } \eta(x) < d \\ +1 & \text{if } \eta(x) > 1-d \\ \circledR & \text{otherwise,} \end{cases}$$

with risk

(5) $$L_d^* := L_d(t_d^*) = E \min\{\eta(X),\, 1 - \eta(X),\, d\}.$$

The case $d = 1/2$ reduces to the classical situation without the reject option, and (4) coincides with (1). Herbei and Wegkamp [7] study plug-in rules that replace the regression function $\eta(x)$ by an estimate $\hat\eta(x)$ in the formula for $t_d^*(x)$ above. They show that the rate of convergence of the risk (3) to the Bayes risk $L_d^*$ depends on how well $\hat\eta(x)$ estimates $\eta(x)$ and on the behavior of $\eta(x)$ near the values $d$ and $1-d$. This condition on $\eta(x)$ nicely generalizes Tsybakov's noise condition (cf. [13]) from the classical setting $d = 1/2$ to our more general framework $0 \le d \le 1/2$. The same paper considers classifiers $\hat t$ that minimize the empirical counterpart

(6) $$\hat L_d(t) := \frac{d}{n} \sum_{i=1}^n I\{t(X_i) = \circledR\} + \frac{1}{n} \sum_{i=1}^n I\{t(X_i) \ne Y_i,\ t(X_i) \ne \circledR\}$$

of the risk $L_d(t)$ over a class $\mathcal{T}$ of classifiers $t : \mathcal{X} \to \{-1, +1, \circledR\}$ with reject option, based on a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ of independent copies of the pair $(X, Y)$.
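To make the rule (4) and the empirical risk (6) concrete, here is a minimal numpy sketch; the encoding of the reject decision $\circledR$ as 0 and all function names are our own illustrative choices, not notation from the paper.

```python
import numpy as np

REJECT = 0  # encode the reject decision R as 0; the two labels are -1 and +1

def bayes_rule_with_reject(eta, d):
    """Rule (4): predict -1 if eta < d, +1 if eta > 1 - d, reject otherwise."""
    eta = np.asarray(eta)
    out = np.full(eta.shape, REJECT)
    out[eta < d] = -1
    out[eta > 1 - d] = +1
    return out

def empirical_risk(pred, y, d):
    """Empirical risk (6): cost d for each rejection, cost 1 for each error."""
    pred, y = np.asarray(pred), np.asarray(y)
    rejects = pred == REJECT
    errors = (~rejects) & (pred != y)
    return d * rejects.mean() + errors.mean()

# toy usage: with eta known exactly, the rule attains the Bayes risk (5)
eta = np.array([0.05, 0.3, 0.5, 0.7, 0.95])
print(bayes_rule_with_reject(eta, d=0.2))   # [-1  0  0  0  1]
# Bayes risk (5) for the uniform distribution on these five atoms:
print(np.minimum.reduce([eta, 1 - eta, np.full_like(eta, 0.2)]).mean())
```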

For such rules $\hat t$, [7] establish oracle inequalities for the excess risk of the form

$$L_d(\hat t) - L_d^* \le C_0 \inf_{t \in \mathcal{T}} \left[ L_d(t) - L_d^* \right] + \Delta_n$$

for some constant $C_0 > 1$, where the remainder $\Delta_n$ depends on the sample size $n$, the complexity of the class $\mathcal{T}$ and the behavior of $\eta(x)$ near the values $d$ and $1-d$. Their findings are in line with the recent theoretical developments for standard binary classification without the reject option ($d = 1/2$); see e.g. [3], [4], [9].

Despite the attractive theoretical properties of this method, it is often hard to implement. This paper addresses this pitfall by considering a convex surrogate for the loss function, akin to the hinge loss that is used in SVMs. The next section introduces a piecewise linear loss function $\phi_d(x)$ that generalizes the hinge loss function $\max\{0, 1-x\}$ in that it allows for the reject option, and $\phi_d(x) = \max\{0, 1-x\}$ for $d = 1/2$. We prove that the Bayes classifier (4) is a transformed version of the minimizer of the risk associated with this new loss, and that the excess risk $L_d - L_d^*$ can be bounded by $2d$ times the excess risk based on the piecewise linear loss $\phi_d$. Thus classifiers with small excess $\phi_d$-risk automatically have small excess classification risk, providing theoretical justification for the more computationally appealing method. In Section 3, we illustrate the computational convenience of the new loss, showing that the SVM classifier with reject option can be obtained by solving a quadratic program. Finally, in Section 4, we show that fast rates (for instance, faster than $n^{-1/2}$) for the SVM classifier with reject option are possible under the same noise conditions on $\eta(x)$ used in [7]. As a side effect, for the standard SVM (the special case $d = 1/2$), our results imply fast rates without an assumption that $\eta(x)$ is unlikely to be near $0$ and $1$, a technical condition that has been imposed in the literature for that case [5], [12].

2. Generalized hinge loss

We can associate any real-valued function $f$ with a classifier with reject option using the transformation

(7) $$t_{f,\delta}(x) = \begin{cases} -1 & \text{if } f(x) < -\delta \\ \circledR & \text{if } |f(x)| \le \delta \\ +1 & \text{if } f(x) > \delta, \end{cases}$$

where $0 < \delta < 1$ is an arbitrary positive number. We recommend the value $\delta = 1/2$, but we postpone the discussion of the choice of $\delta$ until after Theorem 2.
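In code, the transformation (7) is a simple thresholding of $f$ at $\pm\delta$; a sketch, again with the reject decision encoded as 0 (a hypothetical convention, not the paper's):

```python
import numpy as np

def t_f_delta(f_values, delta):
    """Transformation (7): -1 below -delta, +1 above delta, reject in between."""
    f = np.asarray(f_values)
    out = np.zeros(f.shape)            # 0 encodes the reject decision R
    out[f < -delta] = -1
    out[f > delta] = +1
    return out

print(t_f_delta([-0.9, -0.2, 0.0, 0.4, 1.3], delta=0.5))  # [-1.  0.  0.  0.  1.]
```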

According to (3), the risk of $t_{f,\delta}$ then equals

$$L_d(t_{f,\delta}) = P\{t_{f,\delta}(X) \ne Y,\ t_{f,\delta}(X) \ne \circledR\} + d\, P\{t_{f,\delta}(X) = \circledR\}$$
$$= P\{f(X) < -\delta,\ Y = +1\} + P\{f(X) > \delta,\ Y = -1\} + d\, P\{-\delta \le f(X) \le \delta\}$$
$$= P\{Y f(X) < -\delta\} + d\, P\{|Y f(X)| \le \delta\}.$$

For example, the Bayes classifier $t_d^*(x)$ defined in (4) corresponds to the function

(8) $$f_d^*(x) = \begin{cases} -1 & \text{if } \eta(x) < d \\ 0 & \text{if } d \le \eta(x) \le 1-d \\ 1 & \text{if } \eta(x) > 1-d. \end{cases}$$

Although the function $f_d^*(x)$ is not unique, the classifier $t_d^*(x)$ is the unique minimizer of $L_d(t_{f,\delta})$ over all measurable $f : \mathcal{X} \to \mathbb{R}$. We see that

(9) $$L_d(t_{f,\delta}) = L_{d,\delta}(f) := E\, \ell_{d,\delta}(Y f(X))$$

for the discontinuous loss

$$\ell_{d,\delta}(\alpha) = \begin{cases} 1 & \text{if } \alpha < -\delta \\ d & \text{if } |\alpha| \le \delta \\ 0 & \text{otherwise.} \end{cases}$$

The choice $(\delta, d) = (0, 1/2)$ corresponds to the classical case $L(t_f) = P\{Y f(X) < 0\}$. All other choices $\delta > 0$ lead to classification that allows for the reject option, with minimal risk $L_{d,\delta}(f_d^*) = E\, \ell_{d,\delta}(Y f_d^*(X)) = L_d^*$.

Instead of the discontinuous loss $\ell_{d,\delta}$, we consider a convex surrogate. Define the piecewise linear function

$$\phi_d(\alpha) = \begin{cases} 1 - a\alpha & \text{if } \alpha < 0 \\ 1 - \alpha & \text{if } 0 \le \alpha < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $a = (1-d)/d$. The next result states that the minimizer of the expectation of the discrete loss $\ell_{d,\delta}$ and of the convex loss $\phi_d$ is the same: $f_d^*$ defined in (8) minimizes both $E\, \ell_{d,\delta}(Y f(X))$ and $E\, \phi_d(Y f(X))$, and the minimal risks are related via the equality $L_{d,\delta}(f_d^*) = d\, L_\phi(f_d^*)$.

Proposition 1. The minimizer of the risk $L_\phi(f) = E\, \phi_d(Y f(X))$ over all measurable $f : \mathcal{X} \to \mathbb{R}$ is the Bayes rule (8). Furthermore, $d\, L_\phi(f_d^*) = L_{d,\delta}(f_d^*)$.
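Both losses are one-liners to implement. The sketch below also spot-checks the pointwise domination $\phi_d \ge \ell_{d,\delta}$ (valid for $\delta \le 1-d$) that is noted right after Proposition 1; this is a numerical illustration of ours, not part of the paper.

```python
import numpy as np

def phi_d(alpha, d):
    """Generalized hinge loss: 1 - a*alpha for alpha < 0 (a = (1-d)/d),
    1 - alpha on [0, 1), and 0 for alpha >= 1."""
    a = (1 - d) / d
    alpha = np.asarray(alpha, dtype=float)
    return np.where(alpha < 0, 1 - a * alpha, np.maximum(0.0, 1 - alpha))

def ell_d_delta(alpha, d, delta):
    """Discontinuous loss: 1 for alpha < -delta, d for |alpha| <= delta, else 0."""
    alpha = np.asarray(alpha, dtype=float)
    return np.where(alpha < -delta, 1.0, np.where(np.abs(alpha) <= delta, d, 0.0))

# spot-check phi_d >= ell_{d,delta} whenever delta <= 1 - d
alphas = np.linspace(-3, 3, 1201)
d, delta = 0.2, 0.5
assert np.all(phi_d(alphas, d) >= ell_d_delta(alphas, d, delta) - 1e-12)
```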

Proof. Note that

$$L_\phi(f) = E\left[\eta(X)\, \phi_d(f(X))\right] + E\left[(1 - \eta(X))\, \phi_d(-f(X))\right].$$

Hence, for

(10) $$r_{\eta,\phi}(\alpha) = \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha),$$

it suffices to show that

(11) $$\alpha^* = \begin{cases} -1 & \text{if } \eta < 1/(1+a) \\ 0 & \text{if } 1/(1+a) \le \eta \le a/(1+a) \\ 1 & \text{if } \eta > a/(1+a) \end{cases}$$

minimizes $r_{\eta,\phi}(\alpha)$; note that $1/(1+a) = d$ and $a/(1+a) = 1-d$. The function $r_{\eta,\phi}(\alpha)$ can be written as

$$r_{\eta,\phi}(\alpha) = \begin{cases} \eta - a\eta\alpha & \text{if } \alpha \le -1 \\ 1 + \alpha(1 - (1+a)\eta) & \text{if } -1 \le \alpha \le 0 \\ 1 + \alpha(-\eta + a(1-\eta)) & \text{if } 0 \le \alpha \le 1 \\ \alpha a(1-\eta) + (1-\eta) & \text{if } \alpha \ge 1, \end{cases}$$

and it is now a simple exercise to verify that $\alpha^*$ indeed minimizes $r_{\eta,\phi}(\alpha)$. Finally, since $L_\phi(f) = E\, r_{\eta(X),\phi}(f(X))$ and

$$\inf_\alpha \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right] = \eta\, \phi_d(\alpha^*) + (1-\eta)\, \phi_d(-\alpha^*) = \frac{\eta}{d}\, 1\{\eta < d\} + 1\{d \le \eta \le 1-d\} + \frac{1-\eta}{d}\, 1\{\eta > 1-d\},$$

we find that $d\, L_\phi(f_d^*) = E\left[\min(\eta(X),\, 1 - \eta(X),\, d)\right] = L_d^*$, and the second claim follows as well.

We see that $\phi_d(\alpha) \ge \ell_{d,\delta}(\alpha)$ for all $\alpha \in \mathbb{R}$ as long as $0 \le \delta \le 1-d$. Since this pointwise relation is preserved under taking expected values, we immediately obtain $L_d(t_{f,\delta}) \le L_\phi(f)$. The following comparison theorem shows that a relation like this holds not only for the risks, but for the excess risks as well.

Theorem 2. Let $0 < d \le 1/2$ be fixed. For all $0 < \delta \le 1/2$, we have

(12) $$L_d(t_{f,\delta}) - L_d^* \le \frac{d}{\delta} \left( L_\phi(f) - L_\phi^* \right),$$

where $L_\phi^* = L_\phi(f_d^*)$. For $1/2 \le \delta \le 1-d$, we have

(13) $$L_d(t_{f,\delta}) - L_d^* \le L_\phi(f) - L_\phi^*.$$
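Proposition 1 lends itself to a quick numerical sanity check: minimize $r_{\eta,\phi}$ over a fine grid of $\alpha$ and compare with the claimed minimizer (11). A rough sketch of ours (grid resolution and tolerance are arbitrary choices):

```python
import numpy as np

d = 0.3
a = (1 - d) / d

def phi_d(alpha):
    return np.where(alpha < 0, 1 - a * alpha, np.maximum(0.0, 1 - alpha))

def r(eta, alpha):
    return eta * phi_d(alpha) + (1 - eta) * phi_d(-alpha)

alphas = np.linspace(-2, 2, 4001)
for eta in np.linspace(0.01, 0.99, 99):
    best = alphas[np.argmin(r(eta, alphas))]
    # claimed minimizer (11): -1 below 1/(1+a) = d, 0 on [d, 1-d], 1 above 1-d
    claimed = -1.0 if eta < d else (1.0 if eta > 1 - d else 0.0)
    assert np.isclose(r(eta, best), r(eta, claimed), atol=5e-3)
```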

Finally, for $(\delta, d) = (0, 1/2)$, we have

(14) $$L(t_f) - L^* \le L_\phi(f) - L_\phi^*,$$

where $L(t_f) := P\{Y f(X) < 0\}$, $L^* := E \min(\eta(X),\, 1 - \eta(X))$ and $\phi(x) = \max\{0, 1-x\}$.

Remark. The optimal multiplicative constant ($d/\delta$ or $1$, depending on the value of $\delta$) in front of the $\phi_d$-excess risk is achieved at $\delta = 1/2$. For this choice, Theorem 2 states that

$$L_d(t_{f,1/2}) - L_d^* \le 2d \left( L_\phi(f) - L_\phi^* \right).$$

For all $d \le \delta \le 1-d$, the multiplicative constant in front of the $\phi_d$-excess risk does not exceed $1$. The choice $\delta = 1/2$, with the smallest constant $2d \le 1$, is right in the middle of the interval $[d, 1-d]$. The choice $\delta = 1-d$ corresponds to the largest value of $\delta$ for which the piecewise constant function $\ell_{d,\delta}(\alpha)$ is still majorized by the convex surrogate $\phi_d(\alpha)$. For $\delta = d$ we will reject less frequently than for $\delta = 1-d$, and $\delta = 1/2$ can be seen as a compromise between these two extreme cases. Inequality (14) is due to Zhang [14].

Before we prove the theorem, we need an intermediate result. We define the functions

$$\xi(\eta) = \eta\, 1\{\eta < d\} + d\, 1\{d \le \eta \le 1-d\} + (1-\eta)\, 1\{\eta > 1-d\}$$

and

$$H(\eta) = \inf_\alpha \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right] = \frac{\eta}{d}\, 1\{\eta < d\} + 1\{d \le \eta \le 1-d\} + \frac{1-\eta}{d}\, 1\{\eta > 1-d\}.$$

We suppress their dependence on $d$ in our notation. Their expectations are $L_d^* = E\, \xi(\eta(X))$ and $L_\phi^* = E\, H(\eta(X))$, respectively. Furthermore, we define

$$H_{-1}(\eta) = \inf_{\alpha < -\delta} \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right],$$
$$H_{\circledR}(\eta) = \inf_{|\alpha| \le \delta} \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right],$$
$$H_{+1}(\eta) = \inf_{\alpha > \delta} \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right];$$

$$\xi_{-1}(\eta) = \eta - \xi(\eta), \qquad \xi_{\circledR}(\eta) = d - \xi(\eta), \qquad \xi_{+1}(\eta) = (1-\eta) - \xi(\eta).$$
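The closed form of $H$ can likewise be verified against a brute-force minimization over $\alpha$; again only an illustrative sketch of ours:

```python
import numpy as np

d = 0.25
a = (1 - d) / d
phi = lambda t: np.where(t < 0, 1 - a * t, np.maximum(0.0, 1 - t))

def H_closed(eta):
    """H(eta) = min(eta/d, 1, (1 - eta)/d), i.e. the indicator form in the text."""
    return np.minimum.reduce([eta / d, np.ones_like(eta), (1 - eta) / d])

etas = np.linspace(0.0, 1.0, 201)
alphas = np.linspace(-3, 3, 6001)
H_grid = np.array([np.min(eta * phi(alphas) + (1 - eta) * phi(-alphas))
                   for eta in etas])
assert np.allclose(H_grid, H_closed(etas), atol=5e-3)
```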

Proposition 3. Let $0 < d \le 1/2$. If $0 < \delta \le 1/2$, then, for $b \in \{-1, 1, \circledR\}$,

$$\xi_b(\eta) \le \frac{d}{\delta} \left\{ H_b(\eta) - H(\eta) \right\}.$$

If $d \le \delta \le 1-d$, then, for $b \in \{-1, 1, \circledR\}$,

$$\xi_b(\eta) \le H_b(\eta) - H(\eta).$$

If $(\delta, d) = (0, 1/2)$, then, for $b \in \{-1, 1, \circledR\}$,

$$\xi_b(\eta) \le H_b(\eta) - H(\eta).$$

Proof. First we compute

$$\inf_{\alpha \le -1} r_\phi(\alpha) = \frac{\eta}{d},$$
$$\inf_{-1 \le \alpha \le -\delta} r_\phi(\alpha) = \frac{\eta}{d}\, 1\{\eta < d\} + \Big(1 - \delta + \frac{\delta\eta}{d}\Big) 1\{\eta \ge d\},$$
$$\inf_{-\delta \le \alpha \le 0} r_\phi(\alpha) = \Big(1 - \delta + \frac{\delta\eta}{d}\Big) 1\{\eta < d\} + 1\{\eta \ge d\},$$
$$\inf_{0 \le \alpha \le \delta} r_\phi(\alpha) = 1\{\eta \le 1-d\} + \Big(1 - \delta + \frac{\delta(1-\eta)}{d}\Big) 1\{\eta > 1-d\},$$
$$\inf_{\delta \le \alpha \le 1} r_\phi(\alpha) = \Big(1 + \frac{\delta(1-d-\eta)}{d}\Big) 1\{\eta \le 1-d\} + \frac{1-\eta}{d}\, 1\{\eta > 1-d\},$$
$$\inf_{\alpha \ge 1} r_\phi(\alpha) = \frac{1-\eta}{d}.$$

It is now easy to verify that

$$H_{-1}(\eta) = \inf_{\alpha < -\delta} \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right] = \frac{\eta}{d}\, 1\{\eta < d\} + \Big(1 - \delta + \frac{\delta\eta}{d}\Big) 1\{\eta \ge d\},$$

so that

$$H_{-1}(\eta) - H(\eta) = 0 \cdot 1\{\eta < d\} + \frac{\delta(\eta - d)}{d}\, 1\{d \le \eta \le 1-d\} + \Big(1 - \delta + \frac{\delta\eta}{d} - \frac{1-\eta}{d}\Big) 1\{\eta > 1-d\}.$$

On the other hand,

$$\xi_{-1}(\eta) = \eta - \xi(\eta) = 0 \cdot 1\{\eta < d\} + (\eta - d)\, 1\{d \le \eta \le 1-d\} + (2\eta - 1)\, 1\{\eta > 1-d\},$$

and we see that $\frac{\delta}{d}\, \xi_{-1}(\eta) \le H_{-1}(\eta) - H(\eta)$ for all $0 < \delta \le 1$. Next, we compute

$$H_{\circledR}(\eta) = \inf_{|\alpha| \le \delta} \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right] = \Big(1 - \delta + \frac{\delta\eta}{d}\Big) 1\{\eta < d\} + 1\{d \le \eta \le 1-d\} + \Big(1 - \delta + \frac{\delta(1-\eta)}{d}\Big) 1\{\eta > 1-d\}$$

and

$$H_{\circledR}(\eta) - H(\eta) = (1-\delta)\Big(1 - \frac{\eta}{d}\Big) 1\{\eta < d\} + (1-\delta)\Big(1 - \frac{1-\eta}{d}\Big) 1\{\eta > 1-d\}.$$

Since

$$\xi_{\circledR}(\eta) = d - \xi(\eta) = (d - \eta)\, 1\{\eta < d\} + 0 \cdot 1\{d \le \eta \le 1-d\} + (d - 1 + \eta)\, 1\{\eta > 1-d\},$$

we find that $\frac{\delta}{d}\, \xi_{\circledR}(\eta) \le H_{\circledR}(\eta) - H(\eta)$ provided $0 < \delta \le 1/2$. Finally, we find that

$$H_{+1}(\eta) = \inf_{\alpha > \delta} \left[ \eta\, \phi_d(\alpha) + (1-\eta)\, \phi_d(-\alpha) \right] = \Big(1 + \frac{\delta(1-d-\eta)}{d}\Big) 1\{\eta \le 1-d\} + \frac{1-\eta}{d}\, 1\{\eta > 1-d\}$$

and consequently

$$H_{+1}(\eta) - H(\eta) = \Big(1 + \frac{\delta(1-d-\eta)}{d} - \frac{\eta}{d}\Big) 1\{\eta < d\} + \frac{\delta(1-d-\eta)}{d}\, 1\{d \le \eta \le 1-d\} + 0 \cdot 1\{\eta > 1-d\}.$$

Now,

$$\xi_{+1}(\eta) = (1-\eta) - \xi(\eta) = (1 - 2\eta)\, 1\{\eta < d\} + (1 - d - \eta)\, 1\{d \le \eta \le 1-d\} + 0 \cdot 1\{\eta > 1-d\},$$

and we find that

$$\frac{\delta}{d}\, \xi_{+1}(\eta) \le H_{+1}(\eta) - H(\eta)$$

provided $0 < \delta \le 1$.

We now verify the second claim of Proposition 3. Assume that $d \le \delta \le 1-d$. First we consider the case $\eta \le d$. Then

$\xi_{-1}(\eta) \le H_{-1}(\eta) - H(\eta)$ holds trivially.
$\xi_{\circledR}(\eta) \le H_{\circledR}(\eta) - H(\eta)$ reduces to $d(d - \eta) \le (1-\delta)(d - \eta)$. As $\eta \le d$, we need that $d \le 1-\delta$, that is, $\delta \le 1-d$.
$\xi_{+1}(\eta) \le H_{+1}(\eta) - H(\eta)$ reduces to $\eta(1 - 2d) \le \delta(1 - d - \eta)$. As $\eta \le d$, it suffices that $d(1-2d) \le \delta(1-2d)$, equivalently, $\delta \ge d$.

Next, if $d \le \eta \le 1-d$, we see that

$\xi_{-1}(\eta) \le H_{-1}(\eta) - H(\eta)$ reduces to $\eta - d \le \frac{\delta}{d}(\eta - d)$, which holds as $\delta \ge d$.
$\xi_{\circledR}(\eta) \le H_{\circledR}(\eta) - H(\eta)$ holds trivially.
$\xi_{+1}(\eta) \le H_{+1}(\eta) - H(\eta)$ reduces to $1 - d - \eta \le \frac{\delta}{d}(1 - d - \eta)$, which holds as $\delta \ge d$.

Finally, if $\eta \ge 1-d$, we find that

$\xi_{-1}(\eta) \le H_{-1}(\eta) - H(\eta)$ reduces to $(1 - 2d)(1 - \eta) \le \delta(\eta - d)$. For $\eta \ge 1-d$ this holds provided $d(1-2d) \le \delta(1-2d)$, that is, $\delta \ge d$.
$\xi_{\circledR}(\eta) \le H_{\circledR}(\eta) - H(\eta)$ reduces to $d(d - 1 + \eta) \le (1-\delta)(d - 1 + \eta)$, which holds as $\delta \le 1-d$.
$\xi_{+1}(\eta) \le H_{+1}(\eta) - H(\eta)$ holds trivially.

This concludes the proof of the second claim, since $d \le \delta \le 1-d$. The last claim, for the case $(\delta, d) = (0, 1/2)$, follows from the preceding calculations as well.

Proof of Theorem 2. Assume $0 < \delta \le 1/2$ and $0 < d \le 1/2$. Define $\psi(x) = x\delta/d$. By linearity of $\psi$, we have for any measurable function $f$,

$$\psi\big(L_d(t_{f,\delta}) - L_d^*\big) = P\big[1\{f < -\delta\}\, \psi(\xi_{-1}(\eta))\big] + P\big[1\{-\delta \le f \le \delta\}\, \psi(\xi_{\circledR}(\eta))\big] + P\big[1\{f > \delta\}\, \psi(\xi_{+1}(\eta))\big],$$

where $P$ is the probability measure of $X$ and $Pf = \int f \, dP$. Invoke now Proposition 3 to deduce

$$\psi\big(L_d(t_{f,\delta}) - L_d^*\big) \le P\big[1\{f < -\delta\}\, (H_{-1}(\eta) - H(\eta))\big] + P\big[1\{-\delta \le f \le \delta\}\, (H_{\circledR}(\eta) - H(\eta))\big] + P\big[1\{f > \delta\}\, (H_{+1}(\eta) - H(\eta))\big],$$

and conclude the proof by observing that the term on the right of the previous inequality is bounded by $L_\phi(f) - L_\phi^*$.
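Because both sides of (12) have closed forms when $X$ has finitely many atoms, Theorem 2 can be tested directly. The following sketch of ours does so for a random discrete distribution; the particular distribution and the random functions $f$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta = 0.2, 0.4                      # 0 < delta <= 1/2
a = (1 - d) / d
phi = lambda t: np.where(t < 0, 1 - a * t, np.maximum(0.0, 1 - t))

m = 50
p = rng.dirichlet(np.ones(m))            # P(X = x_i)
eta = rng.uniform(0, 1, m)               # eta(x_i)
L_star = np.sum(p * np.minimum.reduce([eta, 1 - eta, np.full(m, d)]))
L_phi_star = L_star / d                  # Proposition 1

for _ in range(200):
    f = rng.uniform(-2, 2, m)            # an arbitrary measurable f
    L_f = np.sum(p * (eta * (f < -delta) + (1 - eta) * (f > delta)
                      + d * (np.abs(f) <= delta)))
    L_phi_f = np.sum(p * (eta * phi(f) + (1 - eta) * phi(-f)))
    # inequality (12): excess classification risk <= (d/delta) * excess phi-risk
    assert L_f - L_star <= (d / delta) * (L_phi_f - L_phi_star) + 1e-12
```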

For the case $(\delta, d) = (0, 1/2)$, and for the cases $(\delta, d)$ with $d \le \delta \le 1-d$ and $0 < d \le 1/2$, take $\psi(x) = x$.

3. SVM classifiers with reject option

In this section, we consider the SVM classifier with reject option, and show that it can be obtained by solving a quadratic program. Let $k : \mathcal{X}^2 \to \mathbb{R}$ be the kernel of a reproducing kernel Hilbert space (RKHS) $H$, and let $\|f\|$ be the norm of $f$ in $H$. The SVM classifier with reject option is the minimizer of the sum of the empirical $\phi_d$-risk and a regularization term that is proportional to the squared RKHS norm. The following theorem shows that this classifier is the solution to a quadratic program, that is, it is the minimizer of a quadratic criterion on a subset of Euclidean space defined by linear inequalities. Thus, the classifier can be found efficiently using general-purpose algorithms.

Theorem 4. For any $x_1, \ldots, x_n \in \mathcal{X}$ and $y_1, \ldots, y_n \in \{-1, 1\}$, let $\hat f \in H$ be the minimizer of the regularized risk functional

$$f \mapsto \frac{1}{n} \sum_{i=1}^n \phi_d(y_i f(x_i)) + \lambda \|f\|^2,$$

where $\lambda > 0$. Then we can represent $\hat f$ as the finite sum $\hat f(x) = \sum_{i=1}^n \alpha_i^* k(x_i, x)$, where $(\alpha_1^*, \ldots, \alpha_n^*)$ is the solution to the following quadratic program:

$$\min_{\alpha_i, \xi_i, \gamma_i}\ \frac{1}{n} \sum_{i=1}^n \big( \xi_i + (a-1)\gamma_i \big) + \lambda \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j)$$
$$\text{s.t. } \xi_i \ge 0, \quad \gamma_i \ge 0, \quad \xi_i \ge 1 - y_i \sum_{j=1}^n \alpha_j k(x_i, x_j), \quad \gamma_i \ge -y_i \sum_{j=1}^n \alpha_j k(x_i, x_j) \quad \text{for } i = 1, \ldots, n.$$

Proof. The fact that $\hat f$ can be represented as a finite sum over the kernel basis functions is a standard argument (see [8, 6]). It follows from Pythagoras' theorem: the squared RKHS norm can be split into the squared norm of the component in the space spanned by the kernel basis functions $x \mapsto k(x_i, x)$ and that of the component in the orthogonal subspace. Since the cost function depends on $f$ only at the points $x_i$, and the reproducing property $f(x_i) = \langle k(x_i, \cdot), f \rangle$ shows that these values depend only on the component of $f$ in the space spanned by the kernel basis functions, the orthogonal subspace contributes only to the squared norm term and not to the cost term. Thus, a minimizing $\hat f$ can be represented in terms of the solution $\alpha^*$ to the minimization

$$\min_{\alpha_1, \ldots, \alpha_n}\ \frac{1}{n} \sum_{i=1}^n \phi_d\Big( y_i \sum_{j=1}^n \alpha_j k(x_i, x_j) \Big) + \lambda \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j).$$

But then it is easy to see that we can decompose $\phi_d$ as $\phi_d(\beta) = \max\{0, 1-\beta\} + (a-1)\max\{0, -\beta\}$. Defining $\xi_i = \max\{0, 1 - y_i f(x_i)\}$ and $\gamma_i = \max\{0, -y_i f(x_i)\}$ gives the QP.
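For illustration, the quadratic program of Theorem 4 transcribes directly into a general-purpose solver. The sketch below uses cvxpy with a Gaussian kernel; the kernel choice, the jitter term, and all parameter values are our own assumptions, not prescriptions of the paper.

```python
import numpy as np
import cvxpy as cp

def svm_with_reject(X, y, d=0.2, lam=0.1, bandwidth=1.0):
    """Solve the quadratic program of Theorem 4; returns alpha, with
    f(x_i) = (K @ alpha)[i] to be thresholded via the transformation (7)."""
    n = len(y)
    a = (1 - d) / d
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))       # Gaussian kernel Gram matrix
    K = 0.5 * (K + K.T) + 1e-8 * np.eye(n)       # symmetrize; jitter keeps it PSD

    alpha, xi, gamma = cp.Variable(n), cp.Variable(n), cp.Variable(n)
    margins = cp.multiply(y, K @ alpha)          # y_i * f(x_i)
    objective = cp.Minimize(cp.sum(xi + (a - 1) * gamma) / n
                            + lam * cp.quad_form(alpha, K))
    constraints = [xi >= 0, gamma >= 0,
                   xi >= 1 - margins,            # xi_i >= 1 - y_i f(x_i)
                   gamma >= -margins]            # gamma_i >= -y_i f(x_i)
    cp.Problem(objective, constraints).solve()
    return alpha.value, K
```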

4. Tsybakov's noise condition, Bernstein classes, and fast rates

In this section, we consider methods that choose the function $\hat f$ from some class $F$ so as to minimize the empirical $\phi_d$-risk,

$$\hat L_\phi(f) := \frac{1}{n} \sum_{i=1}^n \phi_d(Y_i f(X_i)).$$

For instance, to analyze the SVM classifier with reject option, we could consider classes $F_n = \{f \in H : \|f\| \le c_n\}$ for some sequence of constants $c_n$. We are interested in bounds on the excess $\phi_d$-risk, that is, the difference between the $\phi_d$-risk of $\hat f$ and the minimal $\phi_d$-risk over all measurable functions, of the form

$$E L_\phi(\hat f) - L_\phi^* \le 2 \inf_{f \in F} \left( L_\phi(f) - L_\phi^* \right) + \epsilon_n.$$

Such bounds can be combined with an assumption on the rate of decrease of the approximation error $\inf_{f \in F_n} L_\phi(f) - L_\phi^*$ for a sequence of classes $F_n$ used by a method of sieves, and thus provide bounds on the rate of convergence of the risk to the Bayes risk. For many binary classification methods (including empirical risk minimization, plug-in estimates, and minimization of the sample average of a suitable convex loss), the estimation error term $\epsilon_n$ approaches zero at a faster rate when the conditional probability $\eta(X)$ is unlikely to be close to the critical value of $1/2$; see [13, 2, 5, 12, 1]. For plug-in rules, [7] showed an analogous result for classification with a reject option, where the corresponding condition concerns the probability that $\eta(X)$ is close to the critical values of $d$ and $1-d$. In this section, we prove a bound on the excess $\phi_d$-risk of $\hat f$ that converges rapidly when a condition of this kind applies.

We begin with a precise statement of the condition. For $d = 1/2$, it is equivalent to Tsybakov's margin condition [13].

Definition 5. We say that $\eta$ satisfies the margin condition at $d$ with exponent $\alpha \ge 0$ if there is an $A \ge 1$ such that for all $t > 0$,

$$P\{|\eta(X) - d| \le t\} \le A t^\alpha \quad \text{and} \quad P\{|\eta(X) - (1-d)| \le t\} \le A t^\alpha.$$

The reason that conditions of this kind allow fast rates is related to the variance of the excess $\phi_d$-loss,

$$g_f(x, y) = \phi_d(y f(x)) - \phi_d(y f_d^*(x)),$$

where $f_d^*$ minimizes the $\phi_d$-risk. Notice that the expectation of $g_f$ is precisely the excess risk of $f$: $E g_f(X, Y) = L_\phi(f) - L_\phi^*$. We will show that when $\eta$ satisfies the margin condition at $d$ with exponent $\alpha$, the variance of each $g_f$ is bounded in terms of its expectation, and thus approaches zero as the $\phi_d$-risk of $f$ approaches the minimal value. Classes for which this occurs are called Bernstein classes.

Definition 6. We say that $G \subseteq L_2(P)$ is a $(\beta, B)$-Bernstein class with respect to the probability measure $P$ ($0 < \beta \le 1$, $B \ge 1$) if every $g \in G$ satisfies $P g^2 \le B (P g)^\beta$. We say that $G$ has a Bernstein exponent $\beta$ with respect to $P$ if there exists a constant $B$ for which $G$ is a $(\beta, B)$-Bernstein class.

Lemma 7. If $\eta$ satisfies the margin condition at $d$ with exponent $\alpha$, then for any class $F$ of measurable, uniformly bounded functions, the class $G = \{g_f : f \in F\}$ has a Bernstein exponent $\beta = \alpha/(1+\alpha)$.

The result relies on the following two lemmas. The first shows that the excess $\phi_d$-risk is at least linear in a certain pseudo-norm of the difference between $f$ and $f_d^*$. It is similar to the $L_1(P)$ norm, but it penalizes $f$ less for large excursions that have little impact on the $\phi_d$-risk. For example, if $\eta(x) = 1$, then the conditional $\phi_d$-risk is zero even if $f(x)$ takes a large positive value.
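Definition 5 is easy to exhibit in a toy model: if $X \sim \mathrm{Uniform}(0,1)$ and $\eta(x) = x$, then $P\{|\eta(X) - d| \le t\} \le 2t$, so the margin condition holds at every $d$ with $\alpha = 1$ and $A = 2$. A quick Monte Carlo confirmation (a sketch of ours under exactly these assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 0.2
eta = rng.uniform(0, 1, 1_000_000)     # X ~ Uniform(0,1) with eta(x) = x
for t in [0.01, 0.05, 0.1]:
    p_d = np.mean(np.abs(eta - d) <= t)
    p_1d = np.mean(np.abs(eta - (1 - d)) <= t)
    # both probabilities are <= 2t up to Monte Carlo error: A = 2, alpha = 1
    assert max(p_d, p_1d) <= 2 * t + 0.005
```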

For $\eta \in [0, 1]$, define

$$\rho_\eta(f, f^*) = \begin{cases} \eta\, |f - f^*| & \text{if } \eta < d \text{ and } f < -1, \\ (1-\eta)\, |f - f^*| & \text{if } \eta > 1-d \text{ and } f > 1, \\ |f - f^*| & \text{otherwise,} \end{cases}$$

and recall the definition of the conditional $\phi_d$-risk in (10).

Lemma 8. For $\eta \in [0, 1]$,

$$r_{\eta,\phi}(f) - r_{\eta,\phi}(f^*) \ge \frac{|\eta - d| \wedge |\eta - (1-d)|}{d}\, \rho_\eta(f, f^*).$$

Proof. Since $r_{\eta,\phi}$ is convex, $r_{\eta,\phi}(f) \ge r_{\eta,\phi}(f^*) + g(f - f^*)$ for any $g$ in the subgradient of $r_{\eta,\phi}$ at $f^*$. In our case, $r_{\eta,\phi}$ is piecewise linear, with four pieces, whose slopes are

$$-\frac{(1-d)\eta}{d} \text{ on } (-\infty, -1], \quad \frac{d - \eta}{d} \text{ on } [-1, 0], \quad \frac{(1-d) - \eta}{d} \text{ on } [0, 1], \quad \frac{(1-d)(1-\eta)}{d} \text{ on } [1, \infty).$$

Thus, we have

$$r_{\eta,\phi}(f) - r_{\eta,\phi}(f^*) \ge \begin{cases} \frac{(1-d)\eta}{d}\, |f - f^*| & \text{if } \eta < d \text{ and } f < -1, \\ \frac{d - \eta}{d}\, |f - f^*| & \text{if } \eta < d \text{ and } f > -1, \\ \frac{(\eta - d) \wedge ((1-d) - \eta)}{d}\, |f - f^*| & \text{if } d \le \eta \le 1-d, \\ \frac{\eta - (1-d)}{d}\, |f - f^*| & \text{if } \eta > 1-d \text{ and } f < 1, \\ \frac{(1-d)(1-\eta)}{d}\, |f - f^*| & \text{if } \eta > 1-d \text{ and } f > 1 \end{cases}$$

$$= \begin{cases} \frac{1-d}{d}\, \rho_\eta(f, f^*) & \text{if } \eta < d \text{ and } f < -1, \\ \frac{d - \eta}{d}\, \rho_\eta(f, f^*) & \text{if } \eta < d \text{ and } f > -1, \\ \frac{(\eta - d) \wedge ((1-d) - \eta)}{d}\, \rho_\eta(f, f^*) & \text{if } d \le \eta \le 1-d, \\ \frac{\eta - (1-d)}{d}\, \rho_\eta(f, f^*) & \text{if } \eta > 1-d \text{ and } f < 1, \\ \frac{1-d}{d}\, \rho_\eta(f, f^*) & \text{if } \eta > 1-d \text{ and } f > 1 \end{cases}$$

$$\ge \frac{|\eta - d| \wedge |\eta - (1-d)|}{d}\, \rho_\eta(f, f^*).$$

We shall also use the following inequalities.

Lemma 9. For $\eta \in [0, 1]$ and $\|f\|_\infty \le B$,

$$\rho_\eta(f, f^*) \le |f - f^*|,$$

and

$$\eta\, \big(\phi_d(f) - \phi_d(f^*)\big)^2 + (1-\eta)\, \big(\phi_d(-f) - \phi_d(-f^*)\big)^2 \le a^2 (B+1)\, \rho_\eta(f, f^*).$$

Proof. The first inequality is immediate from the definition of $\rho_\eta$. To see the second, use the fact that $\phi_d$ is flat to the right of $1$ to notice that

$$\eta\, \big(\phi_d(f) - \phi_d(f^*)\big)^2 + (1-\eta)\, \big(\phi_d(-f) - \phi_d(-f^*)\big)^2 = \begin{cases} \eta\, \big(\phi_d(f) - \phi_d(f^*)\big)^2 & \text{if } \eta < d \text{ and } f < -1, \\ (1-\eta)\, \big(\phi_d(-f) - \phi_d(-f^*)\big)^2 & \text{if } \eta > 1-d \text{ and } f > 1. \end{cases}$$

Since $\phi_d$ has Lipschitz constant $a = (1-d)/d$, this implies

$$\eta\, \big(\phi_d(f) - \phi_d(f^*)\big)^2 + (1-\eta)\, \big(\phi_d(-f) - \phi_d(-f^*)\big)^2 \le \begin{cases} \eta\, a^2 (f - f^*)^2 & \text{if } \eta < d \text{ and } f < -1, \\ (1-\eta)\, a^2 (f - f^*)^2 & \text{if } \eta > 1-d \text{ and } f > 1, \\ a^2 (f - f^*)^2 & \text{otherwise} \end{cases} \le a^2 (B+1)\, \rho_\eta(f, f^*).$$

Proof of Lemma 7. By Lemma 8, we have

$$L_\phi(f) - L_\phi^* \ge \frac{1}{d}\, P\Big[ \rho_\eta(f, f^*) \big( |\eta - (1-d)|\, I_{E_-} + |\eta - d|\, I_{E_+} \big) \Big]$$

with

$$E_- = \{ |\eta - (1-d)| \le |\eta - d| \}, \qquad E_+ = \{ |\eta - (1-d)| > |\eta - d| \}.$$

Using the assumption on $\eta$, there is an $A \ge 1$ such that for all $t > 0$, $P\{|\eta(X) - d| \le t\} \le A t^\alpha$ and $P\{|\eta(X) - (1-d)| \le t\} \le A t^\alpha$. Thus, for any set $E$,

$$P\big[\rho_\eta(f, f^*)\, |\eta - d|\, I_E\big] \ge t\, P\big[\rho_\eta(f, f^*)\, I\{|\eta - d| \ge t\}\, I_E\big] = t\, \Big( P\big[\rho_\eta(f, f^*)\, I_E\big] - P\big[\rho_\eta(f, f^*)\, I\{|\eta - d| < t\}\, I_E\big] \Big) \ge t\, \Big( P\big[\rho_\eta(f, f^*)\, I_E\big] - (B+1) A t^\alpha \Big),$$

where $B$ is such that $\|f\|_\infty \le B$. Similarly,

$$P\big[\rho_\eta(f, f^*)\, |\eta - (1-d)|\, I_E\big] \ge t\, \Big( P\big[\rho_\eta(f, f^*)\, I_E\big] - (B+1) A t^\alpha \Big),$$

and we obtain

$$L_\phi(f) - L_\phi^* \ge \frac{t}{d} \Big( P\big[\rho_\eta(f, f^*)\, I_{E_+ \cup E_-}\big] - 2(B+1) A t^\alpha \Big) = \frac{t}{d} \Big( P \rho_\eta(f, f^*) - 2(B+1) A t^\alpha \Big).$$

Choose

$$t = \left( \frac{P \rho_\eta(f, f^*)}{4(B+1)A} \right)^{1/\alpha}$$

in the expression above, and we obtain

$$L_\phi(f) - L_\phi^* \ge \frac{1}{2d\, \big(4(B+1)A\big)^{1/\alpha}}\, \big( P \rho_\eta(f, f^*) \big)^{(1+\alpha)/\alpha},$$

and so, recalling $E g_f(X, Y) = L_\phi(f) - L_\phi^*$,

(15) $$P \rho_\eta(f, f^*) \le \Big\{ 2d\, \big(4(B+1)A\big)^{1/\alpha} \Big\}^{\alpha/(\alpha+1)} \big\{ E g_f(X, Y) \big\}^{\alpha/(1+\alpha)}.$$

In addition, by Lemma 9,

$$E\{g_f(X, Y)\}^2 = E\, E\big[\{g_f(X, Y)\}^2 \mid X\big] = P\Big[ \eta\, \big(\phi_d(f) - \phi_d(f^*)\big)^2 + (1-\eta)\, \big(\phi_d(-f) - \phi_d(-f^*)\big)^2 \Big] \le a^2 (B+1)\, P \rho_\eta(f, f^*).$$

Combining these two inequalities shows that

$$E\{g_f(X, Y)\}^2 \le a^2 (B+1) \Big\{ 2d\, \big(4(B+1)A\big)^{1/\alpha} \Big\}^{\alpha/(\alpha+1)} \big\{ E g_f(X, Y) \big\}^{\alpha/(1+\alpha)}.$$

Remark. Specialized to the case $(\delta, d) = (0, 1/2)$, we note that Lemma 7 removes unnecessary technical restrictions on $\eta(x)$ near $0$ and $1$ that were imposed in [5] and [12].
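Lemma 7 can be illustrated empirically. In the uniform toy model above, $\alpha = 1$, so $\beta = \alpha/(1+\alpha) = 1/2$, and the ratio $P g_f^2 / (P g_f)^{1/2}$ should stay bounded over bounded choices of $f$. A sketch of ours (the family of test functions is an arbitrary choice; expectations are computed by numerical integration in $x$):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 0.2
a = (1 - d) / d
phi = lambda t: np.where(t < 0, 1 - a * t, np.maximum(0.0, 1 - t))

x = np.linspace(0, 1, 10_001)                    # eta(x) = x, X ~ Uniform(0,1)
eta = x
f_star = np.where(eta < d, -1.0, np.where(eta > 1 - d, 1.0, 0.0))

ratios = []
for _ in range(500):
    f = rng.uniform(-2, 2) * np.sin(rng.uniform(1, 10) * np.pi * x)  # bounded f
    dp = phi(f) - phi(f_star)                    # excess loss when Y = +1
    dm = phi(-f) - phi(-f_star)                  # excess loss when Y = -1
    Pg = np.mean(eta * dp + (1 - eta) * dm)      # E g_f = excess phi_d-risk
    Pg2 = np.mean(eta * dp**2 + (1 - eta) * dm**2)
    if Pg > 1e-6:
        ratios.append(Pg2 / Pg**0.5)             # beta = alpha/(1+alpha) = 1/2
print(max(ratios))                               # stays bounded, as Lemma 7 predicts
```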

Lemma 7 provides the main ingredient for establishing fast rates for minimizers $\hat f$ of the empirical risk

$$\hat L_\phi(f) := \frac{1}{n} \sum_{i=1}^n \phi_d(Y_i f(X_i)).$$

Theorem 10. If $\eta$ satisfies the margin condition at $d$ with exponent $\alpha$, $F$ is a countable class of functions $f : \mathcal{X} \to \mathbb{R}$ satisfying $\|f\|_\infty \le B$, and $F$ satisfies

$$\log N(\varepsilon, L_\infty, F) \le C \varepsilon^{-p}$$

for all $\varepsilon > 0$ and some $0 \le p \le 2$, then there exists a constant $C'$, independent of $n$, such that

$$E L_\phi(\hat f) - L_\phi^* \le 2 \left( L_\phi(\tilde f) - L_\phi^* \right) + C' n^{-(1+\alpha)/(2+p+\alpha+p\alpha)},$$

where $\tilde f = \arg\min_{f \in F} L_\phi(f)$.

Proof. We use the notation

$$P g_f = E g_f(X, Y) \quad \text{and} \quad P_n g_f = \frac{1}{n} \sum_{i=1}^n g_f(X_i, Y_i).$$

By definition of $\hat f$, we have

$$L_\phi(\hat f) - L_\phi^* = P g_{\hat f} = 2 P_n g_{\hat f} + (P - 2P_n) g_{\hat f} \le 2 \inf_{f \in F} P_n g_f + \sup_{f \in F} (P - 2P_n) g_f.$$

Taking expected values on both sides yields

$$E L_\phi(\hat f) - L_\phi^* \le 2 \inf_{f \in F} \left( L_\phi(f) - L_\phi^* \right) + E\Big[ \sup_{f \in F} (P - 2P_n) g_f \Big].$$

Since $|g_f - g_{f'}| \le a |f - f'|$, it follows that

$$E\Big[ \sup_{f \in F} (P - 2P_n) g_f \Big] \le 4\varepsilon_n + 2B\, P\Big\{ \sup_{f \in F_n} (P - 2P_n) g_f \ge \varepsilon_n \Big\},$$

where $F_n$ is an $\varepsilon_n$-minimal covering net of $F$ with

$$\varepsilon_n = A' n^{-(1+\alpha)/(2+p+\alpha+p\alpha)}.$$

The union bound and Bernstein's exponential inequality for the tail probability of sums of bounded random variables, in conjunction with Lemma 7, yield

$$P\Big\{ \sup_{f \in F_n} (P - 2P_n) g_f \ge \varepsilon_n \Big\} \le |F_n| \max_{f \in F_n} \exp\left( -\frac{n}{8} \cdot \frac{(\varepsilon_n + P g_f)^2}{P g_f^2 + B(\varepsilon_n + P g_f)/6} \right) \le \exp\left( C \varepsilon_n^{-p} - c\, n \varepsilon_n^{2-\beta} \right)$$

with $0 \le \beta = \alpha/(1+\alpha) \le 1$ and some $c > 0$ independent of $n$. Conclude the proof by noting that

$$\exp\left( C \varepsilon_n^{-p} - c\, n \varepsilon_n^{2-\beta} \right) \le \exp\left( -\frac{c}{2}\, n \varepsilon_n^{2-\beta} \right),$$

by choosing the constant $A'$ in $\varepsilon_n$ such that $C \varepsilon_n^{-p} \le \frac{c}{2}\, n \varepsilon_n^{2-\beta}$ and $\exp(-\frac{c}{2}\, n \varepsilon_n^{2-\beta}) = o(\varepsilon_n)$.

Remark. Consider for simplicity the case where $F$ is finite ($p = 0$). Then, if the margin condition holds for $\alpha = +\infty$, we obtain rates of convergence $\log|F| / n$. If $\alpha = 0$, we in fact impose no restriction on $\eta(X)$ at all, and the rate equals $(\log|F| / n)^{1/2}$. The constant $2$ in front of the minimal excess risk on the right could be made closer to $1$, at the expense of increasing $C'$.
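The rate exponent in Theorem 10 is easy to tabulate; the sketch below reproduces the two special cases in the remark above ($n^{-1/2}$ at $\alpha = 0$ and, as $\alpha \to \infty$ with $p = 0$, the fast rate $n^{-1}$):

```python
def rate_exponent(alpha, p):
    """Exponent r in the O(n**-r) estimation term of Theorem 10."""
    return (1 + alpha) / (2 + p + alpha + p * alpha)

for alpha in [0.0, 1.0, 4.0, 1e9]:              # p = 0: finite classes
    print(alpha, rate_exponent(alpha, p=0.0))   # 0.5, 0.667, 0.833, ~1.0
```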

References

1. J.-Y. Audibert and A. B. Tsybakov. Fast convergence rates for plug-in classifiers under margin conditions. Annals of Statistics. To appear.
2. P. L. Bartlett, M. I. Jordan and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138-156, 2006.
3. S. Boucheron, O. Bousquet and G. Lugosi. Introduction to statistical learning theory. In Advanced Lectures in Machine Learning (O. Bousquet, U. von Luxburg, and G. Rätsch, editors), Springer, New York, 2004.
4. S. Boucheron, O. Bousquet and G. Lugosi. Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics, 9:323-375, 2005.
5. G. Blanchard, O. Bousquet and P. Massart. Statistical performance of support vector machines. Manuscript, blanchard/publi.
6. D. Cox and F. O'Sullivan. Asymptotic analysis of penalized likelihood and related estimators. Annals of Statistics, 18:1676-1695, 1990.
7. R. Herbei and M. H. Wegkamp. Classification with reject option. Canadian Journal of Statistics. To appear.
8. G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82-95, 1971.
9. P. Massart. St. Flour Lecture Notes. Manuscript, massart/stf2003_massart.pdf.
10. B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.
11. I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Los Alamos National Laboratory Technical Report LA-UR.
12. B. Tarigan and S. A. van de Geer. Adaptivity of support vector machines with l1 penalty. Technical Report MI, University of Leiden.
13. A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135-166, 2004.
14. T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56-85, 2004.

Computer Science Division and Department of Statistics, University of California, Berkeley. E-mail address: bartlett@cs.berkeley.edu

Department of Statistics, Florida State University, Tallahassee. E-mail address: wegkamp@stat.fsu.edu
