PETER L. BARTLETT AND MARTEN H. WEGKAMP
CLASSIFICATION WITH A REJECT OPTION USING A HINGE LOSS

PETER L. BARTLETT AND MARTEN H. WEGKAMP

Abstract. We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss (the φ-risk) also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y = 1 | X) is unlikely to be close to certain critical values.

Key words and phrases: Bayes classifiers; classification; convex surrogate loss; empirical risk minimization; hinge loss; large margin classifiers; margin condition; reject option; support vector machines.

MSC 2000: Primary 62C05; secondary 62G05, 62G08.

1. Introduction

The aim of binary classification is to classify observations that take values in an arbitrary feature space X into one of two classes, labelled −1 or +1. Since an observation X does not fully determine its label Y, we construct a classifier t : X → {−1, +1} that represents our guess t(X) of the label Y of a future observation X. The rule with the smallest probability of error P{t(X) ≠ Y} is the Bayes rule

(1)  t*(x) := −1 if P{Y = −1 | X = x} ≥ P{Y = +1 | X = x}, and +1 otherwise.

Observations x for which the conditional probability

(2)  η(x) := P{Y = +1 | X = x}

is close to 1/2 are the most difficult to classify. In the extreme case where η(x) = 1/2, we may just as well toss a coin to make a decision.
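As a concrete illustration (a sketch of ours, not code from the paper), the rule (1) and its conditional probability of error can be written directly in terms of η(x):

```python
# Illustration only: the Bayes rule (1), as a function of the conditional
# probability eta(x) = P{Y = +1 | X = x}; the tie eta(x) = 1/2 goes to -1,
# as in (1).
def bayes_rule(eta_x):
    return -1 if eta_x <= 0.5 else +1

# The conditional probability of error of the Bayes rule at x is
# min(eta(x), 1 - eta(x)), which is largest when eta(x) is near 1/2.
def conditional_error(eta_x):
    return min(eta_x, 1.0 - eta_x)
```

No other rule can have smaller conditional error at x, which is why observations with η(x) near 1/2 are the hard ones.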
While it is our aim to classify the majority of future observations in an automatic way, it is often appropriate to instead report a warning for those observations that are hard to classify (the ones having conditional probability η(x) near the value 1/2). This motivates the introduction of a reject option for classifiers, by allowing for a third decision, R (reject), expressing doubt. In case the reject option is invoked, no decision is made. Although such classifiers are valuable in practice, few theoretical results are available; see [10] and [7] for references.

In this note, we assume that the cost of making a wrong decision is 1 and the cost of utilizing the reject option is d > 0. Given a classifier with reject option t : X → {−1, +1, R}, the appropriate risk function is

(3)  L_d(t) := d P{t(X) = R} + P{t(X) ≠ Y, t(X) ≠ R}.

It is easy to see that the rule minimizing this risk assigns −1, +1 or R depending on which of η(x), 1 − η(x) or d is smallest. According to this rule, which we refer to as the Bayes rule, we need never reject if d ≥ 1/2. For this reason we restrict ourselves to the cases 0 ≤ d ≤ 1/2, and the Bayes rule with reject option is then

(4)  t*(x) := −1 if η(x) < d;  +1 if η(x) > 1 − d;  R otherwise,

with risk

(5)  L* := L_d(t*) = E min{η(X), 1 − η(X), d}.

The case d = 1/2 reduces to the classical situation without the reject option, and (4) coincides with (1). Herbei and Wegkamp [7] study plug-in rules that replace the regression function η(x) by an estimate η̂(x) in the formula for t*(x) above. They show that the rate of convergence of the risk (3) to the Bayes risk L* depends on how well η̂(x) estimates η(x) and on the behavior of η(x) near the values d and 1 − d. This condition on η(x) nicely generalizes Tsybakov's noise condition (cf. [13]) from the classical setting d = 1/2 to our more general framework 0 ≤ d ≤ 1/2. The same paper considers classifiers t̂ that minimize the empirical counterpart

(6)  L̂_d(t) := (d/n) Σ_{i=1}^n I{t(X_i) = R} + (1/n) Σ_{i=1}^n I{t(X_i) ≠ Y_i, t(X_i) ≠ R}

of the risk L_d(t) over a class T of classifiers t : X → {−1, +1, R} with reject option, based on a sample (X_1, Y_1), ..., (X_n, Y_n) of independent copies of the pair (X, Y). For such rules t̂,
[7] establish oracle inequalities for the excess risk, of the form

L_d(t̂) − L* ≤ C_0 inf_{t∈T} [L_d(t) − L*] + ∆_n

for some constant C_0 > 1, where the remainder ∆_n depends on the sample size n, the complexity of the class T and the behavior of η(x) near the values d and 1 − d. Their findings are in line with the recent theoretical developments of standard binary classification without the reject option (d = 1/2); see e.g. [3], [4], [9]. Despite the attractive theoretical properties of this method, it is often hard to implement. This paper addresses this pitfall by considering a convex surrogate for the loss function, akin to the hinge loss that is used in SVMs.

The next section introduces a piecewise linear loss function φ_d(x) that generalizes the hinge loss function max{0, 1 − x}, in that it allows for the reject option and φ_d(x) = max{0, 1 − x} for d = 1/2. We prove that the Bayes classifier (4) is a transformed version of the minimizer of the risk associated with this new loss, and that the excess risk L_d − L* can be bounded by 2d times the excess risk based on the piecewise linear loss φ_d. Thus classifiers with small excess φ_d-risk automatically have small excess classification risk, providing theoretical justification of the more computationally appealing method. In Section 3, we illustrate the computational convenience of the new loss, showing that the SVM classifier with reject option can be obtained by solving a quadratic program. Finally, in Section 4, we show that fast rates (for instance, faster than n^{−1/2}) of the SVM classifier with reject option are possible under the same noise conditions on η(x) used in [7]. As a side effect, for the standard SVM (the special case of d = 1/2), our results imply fast rates without an assumption that η(x) is unlikely to be near 0 and 1, a technical condition that has been imposed in the literature for that case [5], [12].

2. Generalized hinge loss

We can associate with any real-valued function f a classifier with reject option, using the transformation

(7)  t_{f,δ}(x) := −1 if f(x) < −δ;  R if |f(x)| ≤ δ;  +1 if f(x) > δ,

where 0 < δ ≤ 1 − d is an arbitrary positive number. We recommend the value δ = 1/2, but we postpone the discussion of the choice of δ until after Theorem 2. According to (3), the risk
of t_{f,δ} then equals

L_d(t_{f,δ}) = P{t_{f,δ}(X) ≠ Y, t_{f,δ}(X) ≠ R} + d P{t_{f,δ}(X) = R}
             = P{f(X) < −δ, Y = 1} + P{f(X) > δ, Y = −1} + d P{−δ ≤ f(X) ≤ δ}
             = P{Y f(X) < −δ} + d P{|Y f(X)| ≤ δ}.

For example, the Bayes classifier t*(x) defined in (4) corresponds to the function

(8)  f*(x) := −1 if η(x) < d;  0 if d ≤ η(x) ≤ 1 − d;  +1 if η(x) > 1 − d.

Although the function f*(x) is not unique, the classifier t*(x) is the unique minimizer of L_d(t_{f,δ}) over all measurable f : X → R. We see that

(9)  L_d(t_{f,δ}) = L_{d,δ}(f) := E ℓ_{d,δ}(Y f(X))

for the discontinuous loss

ℓ_{d,δ}(α) := 1 if α < −δ;  d if |α| ≤ δ;  0 otherwise.

The choice (δ, d) = (0, 1/2) corresponds to the classical case L_d(t_{f,δ}) = P{Y f(X) < 0}. All other choices δ > 0 lead to classification that allows for the reject option, with minimal risk inf_f L_{d,δ}(f) = E ℓ_{d,δ}(Y f*(X)) = L*.

Instead of the discontinuous loss ℓ_{d,δ}, we consider a convex surrogate loss. Define the piecewise linear function

φ_d(α) := 1 − aα if α < 0;  1 − α if 0 ≤ α < 1;  0 otherwise,

where a = (1 − d)/d ≥ 1. The next result states that the minimizer of the expectation of the discrete loss ℓ_{d,δ} and of the convex loss φ_d remains the same: f* defined in (8) minimizes both E ℓ_{d,δ}(Y f(X)) and E φ_d(Y f(X)), and the minimal risks are related via the equality L_{d,δ}(f*) = d L_φ(f*).

Proposition 1. The minimizer of the risk L_φ(f) := E φ_d(Y f(X)) over all measurable f : X → R is the Bayes rule (8). Furthermore, d L_φ(f*) = L_{d,δ}(f*).
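The loss φ_d and Proposition 1 are easy to check numerically. The following sketch (ours, not the paper's code) minimizes the conditional φ_d-risk η φ_d(α) + (1 − η) φ_d(−α) over a grid of α and confirms that f*(η) attains the minimum, and that d = 1/2 recovers the usual hinge loss:

```python
# Sketch (not from the paper): the generalized hinge loss phi_d, the target
# f* of (8), and a grid search confirming Proposition 1 pointwise in eta.
def phi(alpha, d):
    a = (1.0 - d) / d                 # a = (1 - d)/d >= 1
    if alpha < 0.0:
        return 1.0 - a * alpha
    if alpha < 1.0:
        return 1.0 - alpha
    return 0.0

def f_star(eta, d):
    # Bayes target (8): -1, 0 or +1 according to eta relative to d, 1 - d.
    return -1.0 if eta < d else (1.0 if eta > 1.0 - d else 0.0)

def cond_risk(alpha, eta, d):
    # Conditional phi_d-risk: eta * phi_d(alpha) + (1 - eta) * phi_d(-alpha).
    return eta * phi(alpha, d) + (1.0 - eta) * phi(-alpha, d)

# d = 1/2 recovers the hinge loss max(0, 1 - alpha).
assert phi(-2.0, 0.5) == max(0.0, 1.0 - (-2.0))

# f*(eta) attains the minimal conditional risk on a fine grid of alphas.
d = 0.2
grid = [i / 1000.0 - 2.0 for i in range(4001)]    # alphas in [-2, 2]
for eta in (0.05, 0.2, 0.5, 0.9):
    best = min(cond_risk(al, eta, d) for al in grid)
    assert cond_risk(f_star(eta, d), eta, d) <= best + 1e-9
```

The minimal conditional risk found this way equals min{η, 1 − η, d}/d, in line with the identity d L_φ(f*) = L* of Proposition 1.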
Proof. Note that

L_φ(f) = E[η(X) φ_d(f(X))] + E[(1 − η(X)) φ_d(−f(X))].

Hence, for

(10)  r_{η,φ}(α) := η φ_d(α) + (1 − η) φ_d(−α),

it suffices to show that

(11)  α*(η) := −1 if η < 1/(1 + a);  0 if 1/(1 + a) ≤ η ≤ a/(1 + a);  +1 if η > a/(1 + a)

minimizes r_{η,φ}(α). The function r_{η,φ}(α) can be written as

r_{η,φ}(α) = η − aηα               if α ≤ −1,
           = 1 + α(1 − (1 + a)η)    if −1 ≤ α ≤ 0,
           = 1 + α(a(1 − η) − η)    if 0 ≤ α ≤ 1,
           = αa(1 − η) + (1 − η)    if α ≥ 1,

and it is now a simple exercise to verify that α* indeed minimizes r_{η,φ}(α). Finally, since L_φ(f) = E r_{η(X),φ}(f(X)) and

inf_α [η φ_d(α) + (1 − η) φ_d(−α)] = η φ_d(α*) + (1 − η) φ_d(−α*)
  = (η/d) 1{η ≤ d} + 1{d ≤ η ≤ 1 − d} + ((1 − η)/d) 1{η ≥ 1 − d},

we find that

d L_φ(f*) = E[min{η(X), 1 − η(X), d}] = L*,

and the second claim follows as well. □

We see that φ_d(α) ≥ ℓ_{d,δ}(α) for all α ∈ R as long as 0 < δ ≤ 1 − d. Since this pointwise relation remains preserved under taking expected values, we immediately obtain L_d(t_{f,δ}) ≤ L_φ(f). The following comparison theorem shows that a relation like this holds not only for the risks, but for the excess risks as well.

Theorem 2. Let 0 < d ≤ 1/2 be fixed. For all 0 < δ ≤ 1/2, we have

(12)  L_d(t_{f,δ}) − L* ≤ (d/δ) (L_φ(f) − L_φ*),

where L_φ* := L_φ(f*). For 1/2 ≤ δ ≤ 1 − d, we have

(13)  L_d(t_{f,δ}) − L* ≤ L_φ(f) − L_φ*.
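The comparison inequality (12) can be illustrated on a toy distribution (all numbers below are our own, made up for illustration): a four-point feature space, an arbitrary score function f, and δ = 1/2, for which (12) gives the constant d/δ = 2d.

```python
# Sketch (our own check, not from the paper): verify the comparison (12),
# L_d(t_{f,delta}) - L* <= (d/delta) * (L_phi(f) - L_phi*), on a finite X.
d, delta = 0.2, 0.5
a = (1 - d) / d

def phi(alpha):                        # generalized hinge loss phi_d
    return 1 - a * alpha if alpha < 0 else (1 - alpha if alpha < 1 else 0.0)

# Toy distribution: P(X = x_k) and eta(x_k) = P(Y = +1 | X = x_k).
px  = [0.25, 0.25, 0.25, 0.25]
eta = [0.10, 0.40, 0.60, 0.95]
f   = [-0.8, 0.10, -0.20, 2.0]         # an arbitrary, non-optimal score

def reject_risk(t):                    # risk (3) of a {-1, +1, 'R'}-valued rule
    r = 0.0
    for p, e, lab in zip(px, eta, t):
        r += p * d if lab == 'R' else p * (1 - e if lab == 1 else e)
    return r

t_f = [(-1 if v < -delta else (1 if v > delta else 'R')) for v in f]
L_star     = sum(p * min(e, 1 - e, d) for p, e in zip(px, eta))
L_phi      = sum(p * (e * phi(v) + (1 - e) * phi(-v))
                 for p, e, v in zip(px, eta, f))
L_phi_star = L_star / d                # Proposition 1: L* = d * L_phi(f*)
excess_cls = reject_risk(t_f) - L_star
excess_phi = L_phi - L_phi_star
assert excess_cls <= (d / delta) * excess_phi + 1e-12
```

Here f happens to induce the Bayes decisions, so the classification excess risk is zero while the φ_d-excess risk is positive; the bound (12) can be loose, but is never reversed.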
Finally, for (δ, d) = (0, 1/2), we have

(14)  L(t_f) − L* ≤ L_φ(f) − L_φ*,

where L(t_f) := P{Y f(X) < 0}, L* := E min{η(X), 1 − η(X)} and φ(x) = max{0, 1 − x}.

Remark. The optimal multiplicative constant (d/δ or 1, depending on the value of δ) in front of the φ_d-excess risk is achieved at δ = 1/2. For this choice, Theorem 2 states that

L_d(t_{f,1/2}) − L* ≤ 2d (L_φ(f) − L_φ*).

For all d ≤ δ ≤ 1 − d, the multiplicative constant in front of the φ_d-excess risk does not exceed 1. The choice δ = 1/2, with the smallest constant 2d ≤ 1, is right in the middle of the interval [d, 1 − d]. The choice δ = 1 − d corresponds to the largest value of δ for which the piecewise constant function ℓ_{d,δ}(α) is still majorized by the convex surrogate φ_d(α). For δ = d we will reject less frequently than for δ = 1 − d, and δ = 1/2 can be seen as a compromise between these two extreme cases. Inequality (14) is due to Zhang [14].

Before we prove the theorem, we need an intermediate result. We define the functions

ξ(η) := η 1{η ≤ d} + d 1{d ≤ η ≤ 1 − d} + (1 − η) 1{η ≥ 1 − d}

and

H(η) := inf_α [η φ_d(α) + (1 − η) φ_d(−α)] = (η/d) 1{η ≤ d} + 1{d ≤ η ≤ 1 − d} + ((1 − η)/d) 1{η ≥ 1 − d}.

We suppress their dependence on d in our notation. Their expectations are L* = E ξ(η(X)) and L_φ* = E H(η(X)), respectively. Furthermore, we define

H_{−1}(η) := inf_{α < −δ} [η φ_d(α) + (1 − η) φ_d(−α)],
H_R(η) := inf_{|α| ≤ δ} [η φ_d(α) + (1 − η) φ_d(−α)],
H_1(η) := inf_{α > δ} [η φ_d(α) + (1 − η) φ_d(−α)];

ξ_{−1}(η) := η − ξ(η),  ξ_R(η) := d − ξ(η),  ξ_1(η) := (1 − η) − ξ(η).
Proposition 3. Let 0 < d ≤ 1/2. If 0 < δ ≤ 1/2, then, for b ∈ {−1, +1, R},

(δ/d) ξ_b(η) ≤ H_b(η) − H(η).

If 1/2 ≤ δ ≤ 1 − d, then, for b ∈ {−1, +1, R},

ξ_b(η) ≤ H_b(η) − H(η).

If (δ, d) = (0, 1/2), then, for b ∈ {−1, +1, R},

ξ_b(η) ≤ H_b(η) − H(η).

Proof. Writing r_φ(α) for r_{η,φ}(α), we first compute

inf_{α ≤ −1} r_φ(α) = η/d,
inf_{−1 ≤ α ≤ −δ} r_φ(α) = (η/d) 1{η ≤ d} + (1 − δ + (δ/d)η) 1{η ≥ d},
inf_{−δ ≤ α ≤ 0} r_φ(α) = 1{η ≥ d} + (1 − δ + (δ/d)η) 1{η ≤ d},
inf_{0 ≤ α ≤ δ} r_φ(α) = 1{η ≤ 1 − d} + (1 − δ + (δ/d)(1 − η)) 1{η ≥ 1 − d},
inf_{δ ≤ α ≤ 1} r_φ(α) = ((1 − η)/d) 1{η ≥ 1 − d} + (1 − δ + (δ/d)(1 − η)) 1{η ≤ 1 − d},
inf_{α ≥ 1} r_φ(α) = (1 − η)/d.

It is now easy to verify that

H_{−1}(η) = inf_{α < −δ} [η φ_d(α) + (1 − η) φ_d(−α)] = (η/d) 1{η ≤ d} + (1 − δ + (δ/d)η) 1{η ≥ d},

so that

H_{−1}(η) − H(η) = 0·1{η ≤ d} + ((δ/d)η − δ) 1{d ≤ η ≤ 1 − d} + (1 − δ + ((1 + δ)η − 1)/d) 1{η ≥ 1 − d}.

On the other hand,

ξ_{−1}(η) = η − ξ(η) = 0·1{η ≤ d} + (η − d) 1{d ≤ η ≤ 1 − d} + (2η − 1) 1{η ≥ 1 − d},

and we see that (δ/d) ξ_{−1}(η) ≤ H_{−1}(η) − H(η) for all 0 < δ ≤ 1 − d. Next, we compute

H_R(η) = inf_{|α| ≤ δ} [η φ_d(α) + (1 − η) φ_d(−α)]
       = (1 − δ + (δ/d)η) 1{η ≤ d} + 1{d ≤ η ≤ 1 − d} + (1 − δ + (δ/d)(1 − η)) 1{η ≥ 1 − d}

and

H_R(η) − H(η) = (1 − δ)(1 − η/d) 1{η ≤ d} + 0·1{d ≤ η ≤ 1 − d} + (1 − δ)(1 − (1 − η)/d) 1{η ≥ 1 − d}.

Since

ξ_R(η) = d − ξ(η) = (d − η) 1{η ≤ d} + 0·1{d ≤ η ≤ 1 − d} + (d − 1 + η) 1{η ≥ 1 − d},

we find that (δ/d) ξ_R(η) ≤ H_R(η) − H(η) provided 0 < δ ≤ 1/2. Finally, we find that

H_1(η) = inf_{α > δ} [η φ_d(α) + (1 − η) φ_d(−α)] = (1 − δ + (δ/d)(1 − η)) 1{η ≤ 1 − d} + ((1 − η)/d) 1{η ≥ 1 − d},

and consequently

H_1(η) − H(η) = (1 − δ + (δ − (1 + δ)η)/d) 1{η ≤ d} + (δ/d)(1 − d − η) 1{d ≤ η ≤ 1 − d} + 0·1{η ≥ 1 − d}.

Now,

ξ_1(η) = (1 − η) − ξ(η) = (1 − 2η) 1{η ≤ d} + (1 − d − η) 1{d ≤ η ≤ 1 − d} + 0·1{η ≥ 1 − d},

and we find that (δ/d) ξ_1(η) ≤ H_1(η) − H(η) provided 0 < δ ≤ 1 − d.

We now verify the second claim of Proposition 3. Assume that 1/2 ≤ δ ≤ 1 − d. First we consider the case η ≤ d. Then

ξ_{−1}(η) ≤ H_{−1}(η) − H(η) holds trivially.
ξ_R(η) ≤ H_R(η) − H(η) ⟺ d − η ≤ (1 − δ)(d − η)/d. As η ≤ d, we need d ≤ 1 − δ, that is, δ ≤ 1 − d.
ξ_1(η) ≤ H_1(η) − H(η) ⟺ (1 + δ − 2d)η ≤ δ(1 − d). As η ≤ d, we need (1 + δ − 2d)d ≤ δ(1 − d), equivalently, δ ≥ d.

Next, if d ≤ η ≤ 1 − d, we see that

ξ_{−1}(η) ≤ H_{−1}(η) − H(η) ⟺ η − d ≤ (δ/d)(η − d), which holds as δ ≥ d.
ξ_R(η) ≤ H_R(η) − H(η) holds trivially.
ξ_1(η) ≤ H_1(η) − H(η) ⟺ 1 − d − η ≤ (δ/d)(1 − d − η), which holds as δ ≥ d.

Finally, if η ≥ 1 − d, we find that

ξ_{−1}(η) ≤ H_{−1}(η) − H(η) ⟺ (1 + δ − 2d)η ≥ 1 + dδ − 2d. For η ≥ 1 − d this holds provided (1 + δ − 2d)(1 − d) ≥ 1 + dδ − 2d, that is, δ ≥ d.
ξ_R(η) ≤ H_R(η) − H(η) ⟺ d − 1 + η ≤ (1 − δ)(η − 1 + d)/d, that is, δ ≤ 1 − d.
ξ_1(η) ≤ H_1(η) − H(η) holds trivially.

This concludes the proof of the second claim, since d ≤ 1/2 ≤ δ ≤ 1 − d. The last claim, for the case (δ, d) = (0, 1/2), follows as well from the preceding calculations. □

Proof of Theorem 2. Assume 0 < δ ≤ 1/2 and 0 < d ≤ 1/2. Define ψ(x) = xδ/d. By linearity of ψ, we have for any measurable function f,

ψ(L_d(t_{f,δ}) − L*) = P[ψ(ξ_{−1}(η)) I{f < −δ}] + P[ψ(ξ_R(η)) I{−δ ≤ f ≤ δ}] + P[ψ(ξ_1(η)) I{f > δ}],

where P is the probability measure of X and Pf = ∫ f dP. Invoke now Proposition 3 to deduce

ψ(L_d(t_{f,δ}) − L*) ≤ P[(H_{−1}(η) − H(η)) I{f < −δ}] + P[(H_R(η) − H(η)) I{−δ ≤ f ≤ δ}] + P[(H_1(η) − H(η)) I{f > δ}],

and conclude the proof by observing that the term on the right of the previous inequality is at most L_φ(f) − L_φ*, since H_b(η(x)) ≤ r_{η(x),φ}(f(x)) on the corresponding event.
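The inequalities of Proposition 3 can also be verified numerically. The sketch below (ours, not the paper's) computes H(η) and the constrained infima H_b(η) by grid search and checks the first claim, (δ/d)ξ_b(η) ≤ H_b(η) − H(η), for a value δ ≤ 1/2; the grid minima can only overestimate the constrained infima, so the check errs on the safe side.

```python
# Sketch (our own numerical check): Proposition 3 on a grid of eta values,
# comparing (delta/d) * xi_b(eta) with H_b(eta) - H(eta) for b in {-1, R, +1}.
d, delta = 0.2, 0.4                     # a case with 0 < delta <= 1/2
a = (1 - d) / d

def phi(al):
    return 1 - a * al if al < 0 else (1 - al if al < 1 else 0.0)

def r(al, eta):                          # conditional phi_d-risk at alpha
    return eta * phi(al) + (1 - eta) * phi(-al)

alphas = [i / 200.0 - 2.0 for i in range(801)]   # grid on [-2, 2]

def H(eta, region=lambda al: True):
    return min(r(al, eta) for al in alphas if region(al))

for k in range(1, 20):
    eta = k / 20.0
    xi = min(eta, 1 - eta, d)            # xi(eta)
    h = H(eta)                           # unconstrained infimum H(eta)
    checks = [
        (eta - xi,       lambda al: al < -delta),            # b = -1
        (d - xi,         lambda al: -delta <= al <= delta),  # b = R
        ((1 - eta) - xi, lambda al: al > delta),             # b = +1
    ]
    for xi_b, region in checks:
        assert (delta / d) * xi_b <= H(eta, region) - h + 1e-6
```

The unconstrained grid minimum also recovers the closed form H(η) = min{η, 1 − η, d}/d derived above.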
For the case (δ, d) = (0, 1/2), and for the case (δ, d) with 1/2 ≤ δ ≤ 1 − d and 0 < d ≤ 1/2, take ψ(x) = x. □

3. SVM classifiers with reject option

In this section, we consider the SVM classifier with reject option, and show that it can be obtained by solving a quadratic program. Let k : X × X → R be the kernel of a reproducing kernel Hilbert space (RKHS) H, and let ‖f‖ be the norm of f in H. The SVM classifier with reject option is the minimizer of the sum of the empirical φ_d-risk and a regularization term that is proportional to the squared RKHS norm. The following theorem shows that this classifier is the solution to a quadratic program, that is, it is the minimizer of a quadratic criterion on a subset of Euclidean space defined by linear inequalities. Thus, the classifier can be found efficiently using general-purpose algorithms.

Theorem 4. For any x_1, ..., x_n ∈ X and y_1, ..., y_n ∈ {−1, 1}, let f̂ ∈ H be the minimizer of the regularized risk functional

(1/n) Σ_{i=1}^n φ_d(y_i f(x_i)) + λ‖f‖²,

where λ > 0. Then we can represent f̂ as the finite sum f̂(x) = Σ_{i=1}^n α_i* k(x_i, x), where (α_1*, ..., α_n*) is the solution to the following quadratic program:

min_{α_i, ξ_i, γ_i}  (1/n) Σ_{i=1}^n (ξ_i + (a − 1) γ_i) + λ Σ_{i,j} α_i α_j k(x_i, x_j)

s.t.  ξ_i ≥ 0,  γ_i ≥ 0,
      ξ_i ≥ 1 − y_i Σ_{j=1}^n α_j k(x_i, x_j),
      γ_i ≥ −y_i Σ_{j=1}^n α_j k(x_i, x_j),   for i = 1, ..., n,

where a = (1 − d)/d.

Proof. The fact that f̂ can be represented as a finite sum over the kernel basis functions is a standard argument; see [8, 6]. It follows from Pythagoras' theorem: the squared RKHS norm can be split into the squared norm of the component in the space spanned by the kernel basis functions x ↦ k(x_i, x) and that of the component in the orthogonal subspace. Since the cost
function depends on f only at the points x_i, and the reproducing property f(x_i) = ⟨k(x_i, ·), f⟩ shows that these values depend only on the component of f in the space spanned by the kernel basis functions, the orthogonal subspace contributes only to the squared norm term and not to the cost term. Thus, a minimizing f̂ can be represented in terms of the solution α to the minimization

min_{α_1,...,α_n}  (1/n) Σ_{i=1}^n φ_d(y_i Σ_{j=1}^n α_j k(x_i, x_j)) + λ Σ_{i,j} α_i α_j k(x_i, x_j).

But then it is easy to see that we can decompose φ_d as φ_d(β) = max{0, 1 − β} + (a − 1) max{0, −β}. Defining ξ_i = max{0, 1 − y_i f(x_i)} and γ_i = max{0, −y_i f(x_i)} gives the QP. □

4. Tsybakov's noise condition, Bernstein classes, and fast rates

In this section, we consider methods that choose the function f̂ from some class F so as to minimize the empirical φ_d-risk,

L̂_φ(f) := (1/n) Σ_{i=1}^n φ_d(Y_i f(X_i)).

For instance, to analyze the SVM classifier with reject option, we could consider classes F_n = {f ∈ H : ‖f‖ ≤ c_n} for some sequence of constants c_n. We are interested in bounds on the excess φ_d-risk, that is, the difference between the φ_d-risk of f̂ and the minimal φ_d-risk over all measurable functions, of the form

E L_φ(f̂) − L_φ* ≤ 2 (inf_{f∈F} L_φ(f) − L_φ*) + ε_n.

Such bounds can be combined with an assumption on the rate of decrease of the approximation error inf_{f∈F_n} L_φ(f) − L_φ* for a sequence of classes F_n used by a method of sieves, and thus provide bounds on the rate of convergence of risk to Bayes risk. For many binary classification methods (including empirical risk minimization, plug-in estimates, and minimization of the sample average of a suitable convex loss), the estimation error term ε_n approaches zero at a faster rate when the conditional probability η(X) is unlikely to be close to the critical value of 1/2; see [13, 2, 5, 12, 1]. For plug-in rules, [7] showed an analogous result for classification with a reject option, where the corresponding condition concerns the probability that η(X) is close to the critical values of d and 1 − d. In this
section, we prove a bound on the excess φ_d-risk of f̂ that converges rapidly when a condition of this kind applies. We begin with a precise statement of the condition. For d = 1/2, it is equivalent to Tsybakov's margin condition [13].

Definition 5. We say that η satisfies the margin condition at d with exponent α ≥ 0 if there is an A ≥ 1 such that for all t > 0,

P{|η(X) − d| ≤ t} ≤ A t^α   and   P{|η(X) − (1 − d)| ≤ t} ≤ A t^α.

The reason that conditions of this kind allow fast rates is related to the variance of the excess φ_d-loss,

g_f(x, y) := φ_d(y f(x)) − φ_d(y f*(x)),

where f* minimizes the φ_d-risk. Notice that the expectation of g_f is precisely the excess risk of f, E g_f(X, Y) = L_φ(f) − L_φ*. We will show that when η satisfies the margin condition at d with exponent α, the variance of each g_f is bounded in terms of its expectation, and thus approaches zero as the φ_d-risk of f approaches the minimal value. Classes for which this occurs are called Bernstein classes.

Definition 6. We say that G ⊆ L_2(P) is a (β, B)-Bernstein class with respect to the probability measure P (0 < β ≤ 1, B ≥ 1) if every g ∈ G satisfies Pg² ≤ B(Pg)^β. We say that G has a Bernstein exponent β with respect to P if there exists a constant B for which G is a (β, B)-Bernstein class.

Lemma 7. If η satisfies the margin condition at d with exponent α, then for any class F of measurable, uniformly bounded functions, the class G = {g_f : f ∈ F} has a Bernstein exponent β = α/(1 + α).

The result relies on the following two lemmas. The first shows that the excess φ_d-risk is at least linear in a certain pseudo-norm of the difference between f and f*. It is similar to the L_1(P) norm, but it penalizes f less for large excursions that have little impact on the φ_d-risk. For example, if η(x) = 1, then the conditional φ_d-risk is zero even if f(x) takes a large positive value. For η ∈ [0, 1], define

ρ_η(f, f*) := η|f − f*| if η < d and f < −1;  (1 − η)|f − f*| if η > 1 − d and f > 1;  |f − f*| otherwise,
and recall the definition of the conditional φ_d-risk in (10).

Lemma 8. For η ∈ [0, 1],

r_{η,φ}(f) − r_{η,φ}(f*) ≥ min{|η − d|, |η − (1 − d)|} ρ_η(f, f*).

Proof. Since r_{η,φ} is convex, r_{η,φ}(f) ≥ r_{η,φ}(f*) + g(f − f*) for any g in the subgradient of r_{η,φ} at f*. In our case, r_{η,φ} is piecewise linear, with four pieces, whose slopes are

−aη on (−∞, −1],  (d − η)/d on [−1, 0],  (1 − d − η)/d on [0, 1],  a(1 − η) on [1, ∞).

Thus, we have

r_{η,φ}(f) − r_{η,φ}(f*) ≥ aη|f − f*|                         if η < d and f < −1,
                          ((d − η)/d)|f − f*|                 if η < d and f > −1,
                          (min{η − d, 1 − d − η}/d)|f − f*|   if d ≤ η ≤ 1 − d,
                          ((η − (1 − d))/d)|f − f*|           if η > 1 − d and f < 1,
                          a(1 − η)|f − f*|                    if η > 1 − d and f > 1

                        = ((1 − d)/d) ρ_η(f, f*)              if η < d and f < −1,
                          ((d − η)/d) ρ_η(f, f*)              if η < d and f > −1,
                          (min{η − d, 1 − d − η}/d) ρ_η(f, f*) if d ≤ η ≤ 1 − d,
                          ((η − (1 − d))/d) ρ_η(f, f*)        if η > 1 − d and f < 1,
                          ((1 − d)/d) ρ_η(f, f*)              if η > 1 − d and f > 1

                        ≥ min{|η − d|, |η − (1 − d)|} ρ_η(f, f*). □

We shall also use the following inequalities.

Lemma 9. For η ∈ [0, 1], ρ_η(f, f*) ≤ |f − f*|, and, if ‖f‖_∞ ≤ B,

η (φ_d(f) − φ_d(f*))² + (1 − η)(φ_d(−f) − φ_d(−f*))² ≤ a²(B + 1) ρ_η(f, f*).
Proof. The first inequality is immediate from the definition of ρ_η. To see the second, use the fact that φ_d is flat to the right of 1 to notice that

η (φ_d(f) − φ_d(f*))² + (1 − η)(φ_d(−f) − φ_d(−f*))²
  = η (φ_d(f) − φ_d(f*))²          if η < d and f < −1,
  = (1 − η)(φ_d(−f) − φ_d(−f*))²   if η > 1 − d and f > 1.

Since φ_d has Lipschitz constant a = (1 − d)/d, this implies

η (φ_d(f) − φ_d(f*))² + (1 − η)(φ_d(−f) − φ_d(−f*))²
  ≤ η a²(f − f*)²        if η < d and f < −1,
    (1 − η) a²(f − f*)²  if η > 1 − d and f > 1,
    a²(f − f*)²          otherwise
  ≤ a²(B + 1) ρ_η(f, f*). □

Proof of Lemma 7. By Lemma 8, we have

L_φ(f) − L_φ* ≥ P[ρ_η(f, f*)(|η − (1 − d)| I_{E_−} + |η − d| I_{E_+})],

with E_− = {|η − (1 − d)| ≤ |η − d|} and E_+ = {|η − (1 − d)| > |η − d|}. Using the assumption on η, there is an A ≥ 1 such that for all t > 0, P{|η(X) − d| ≤ t} ≤ A t^α and P{|η(X) − (1 − d)| ≤ t} ≤ A t^α. Thus, for any set E,

P[ρ_η(f, f*)|η − d| I_E] ≥ t P[ρ_η(f, f*) I{|η − d| ≥ t} I_E]
  = t P[ρ_η(f, f*) I_E] − t P[ρ_η(f, f*) I{|η − d| < t} I_E]
  ≥ t {P[ρ_η(f, f*) I_E] − (B + 1) A t^α},

where B is such that ‖f‖_∞ ≤ B, so that ρ_η(f, f*) ≤ |f − f*| ≤ B + 1. Similarly,

P[ρ_η(f, f*)|η − (1 − d)| I_E] ≥ t {P[ρ_η(f, f*) I_E] − (B + 1) A t^α},

and we obtain

L_φ(f) − L_φ* ≥ t (P[ρ_η(f, f*) I_{E_− ∪ E_+}] − 2(B + 1) A t^α) = t (P ρ_η(f, f*) − 2(B + 1) A t^α).

Choose

t = (P ρ_η(f, f*) / (4(B + 1)A))^{1/α}

in the expression above, and we obtain

(15)  E g_f(X, Y) = L_φ(f) − L_φ* ≥ (1/2)(4(B + 1)A)^{−1/α} (P ρ_η(f, f*))^{(1+α)/α},

and so

P ρ_η(f, f*) ≤ {2(4(B + 1)A)^{1/α}}^{α/(α+1)} {E g_f(X, Y)}^{α/(1+α)}.

In addition, by Lemma 9,

E{g_f(X, Y)}² = E E[{g_f(X, Y)}² | X]
  = P[η (φ_d(f) − φ_d(f*))² + (1 − η)(φ_d(−f) − φ_d(−f*))²]
  ≤ a²(B + 1) P ρ_η(f, f*).

Combining these two inequalities shows that

E{g_f(X, Y)}² ≤ a²(B + 1) {2(4(B + 1)A)^{1/α}}^{α/(α+1)} (E g_f(X, Y))^{α/(1+α)}. □
Remark. Specialized to the case (δ, d) = (0, 1/2), we note that Lemma 7 removes unnecessary technical restrictions on η(X) near 0 and 1 imposed in [5] and [12].

Lemma 7 provides the main ingredient for establishing fast rates of minimizers f̂ of the empirical risk

L̂_φ(f) := (1/n) Σ_{i=1}^n φ_d(Y_i f(X_i)).

Theorem 10. If η satisfies the margin condition at d with exponent α, F is a countable class of functions f : X → R satisfying ‖f‖_∞ ≤ B, and F satisfies

log N(ε, L_∞, F) ≤ C ε^{−p}

for all ε > 0 and some 0 ≤ p < 2, then there exists a constant C′, independent of n, such that

E L_φ(f̂) − L_φ* ≤ 2 (inf_{f∈F} L_φ(f) − L_φ*) + C′ n^{−(1+α)/(2+p+α+pα)},

where f̂ = argmin_{f∈F} L̂_φ(f).
Proof. We use the notation Pg_f = E g_f(X, Y) and P_n g_f = (1/n) Σ_{i=1}^n g_f(X_i, Y_i). By definition of f̂, we have

L_φ(f̂) − L_φ* = Pg_{f̂} = 2P_n g_{f̂} + (P − 2P_n)g_{f̂} ≤ 2 inf_{f∈F} P_n g_f + sup_{f∈F} (P − 2P_n)g_f.

Taking expected values on both sides yields

E L_φ(f̂) − L_φ* ≤ 2 (inf_{f∈F} L_φ(f) − L_φ*) + E[sup_{f∈F} (P − 2P_n)g_f].

Since |g_f − g_{f′}| can be bounded in terms of ‖f − f′‖_∞, it follows that

E[sup_{f∈F} (P − 2P_n)g_f] ≤ 4ε_n + 2B P{sup_{f∈F_n} (P − 2P_n)g_f ≥ ε_n},

where F_n is an ε_n-minimal covering net of F with

ε_n = A n^{−(1+α)/(2+p+α+pα)}.

The union bound and Bernstein's exponential inequality for the tail probability of sums of bounded random variables, in conjunction with Lemma 7, yield (noting that (P − 2P_n)g_f ≥ ε_n is equivalent to (P − P_n)g_f ≥ (ε_n + Pg_f)/2)

P{sup_{f∈F_n} (P − 2P_n)g_f ≥ ε_n} ≤ |F_n| max_{f∈F_n} exp(−(n/8) (ε_n + Pg_f)² / (Pg_f² + B(ε_n + Pg_f)/6))
  ≤ exp(C ε_n^{−p} − c n ε_n^{2−β})

with 0 ≤ β = α/(1 + α) < 1 and some c > 0 independent of n. Conclude the proof by noting that

exp(C ε_n^{−p} − c n ε_n^{2−β}) ≤ exp(−(c/2) n ε_n^{2−β}),

by choosing the constant A in ε_n such that C ε_n^{−p} ≤ (c/2) n ε_n^{2−β}, and that exp(−(c/2) n ε_n^{2−β}) = o(ε_n).

Remark. Consider for simplicity the case where F is finite (p = 0). Then, if the margin condition holds for α = +∞, we obtain rates of convergence log|F|/n. If α = 0, we in fact impose no restriction on η(X) at all, and the rate equals (log|F|/n)^{1/2}.
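The rate in Theorem 10 is driven by the exponent (1 + α)/(2 + p + α + pα); the following tiny sketch (ours) tabulates it for a few values:

```python
# Sketch: the estimation-error exponent of Theorem 10, i.e. epsilon_n of
# order n^{-(1+alpha)/(2+p+alpha+p*alpha)}, evaluated for some (p, alpha).
def rate_exponent(p, alpha):
    return (1 + alpha) / (2 + p + alpha + p * alpha)

assert rate_exponent(0, 0) == 0.5                 # no margin condition: n^{-1/2}
assert abs(rate_exponent(0, 1) - 2 / 3) < 1e-12
assert abs(rate_exponent(0, 1e9) - 1.0) < 1e-6    # alpha -> infinity: near n^{-1}
```

For a finite class (p = 0) the exponent interpolates between 1/2 (α = 0, no assumption on η) and 1 (α = +∞), matching the remark above.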
The constant 2 in front of the minimal excess risk on the right could be made closer to 1, at the expense of increasing C′.

References

1. J.-Y. Audibert and A. B. Tsybakov. Fast convergence rates for plug-in classifiers under margin conditions. Annals of Statistics. To appear.
2. P. L. Bartlett, M. I. Jordan and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138-156, 2006.
3. S. Boucheron, O. Bousquet and G. Lugosi. Introduction to statistical learning theory. In Advanced Lectures in Machine Learning (O. Bousquet, U. von Luxburg, and G. Rätsch, editors), Springer, New York, 2004.
4. S. Boucheron, O. Bousquet and G. Lugosi. Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics, 9:323-375, 2005.
5. G. Blanchard, O. Bousquet and P. Massart. Statistical performance of support vector machines. Manuscript, blanchard/publi.
6. D. Cox and F. O'Sullivan. Asymptotic analysis of penalized likelihood and related estimators. Annals of Statistics, 18:1676-1695, 1990.
7. R. Herbei and M. H. Wegkamp. Classification with reject option. Canadian Journal of Statistics. To appear.
8. G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Applic., 33:82-95, 1971.
9. P. Massart. St. Flour Lecture Notes. Manuscript, massart/stf2003_massart.pdf.
10. B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.
11. I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Los Alamos National Laboratory Technical Report LA-UR.
12. B. Tarigan and S. A. van de Geer. Adaptivity of support vector machines with l_1 penalty. Technical Report MI, University of Leiden.
13. A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135-166, 2004.
14. T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56-85, 2004.

Computer Science Division and Department of Statistics, University of California, Berkeley
E-mail address: bartlett@cs.berkeley.edu

Department of Statistics, Florida State University, Tallahassee
E-mail address: wegkamp@stat.fsu.edu
MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the
More informationJUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson
JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises
More informationAdaptive Sampling Under Low Noise Conditions 1
Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università
More informationLecture 2: Correlated Topic Model
Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables
More information1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.
Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency
More informationAll s Well That Ends Well: Supplementary Proofs
All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee
More informationInfluence of weight initialization on multilayer perceptron performance
Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -
More informationLATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION
The Annals of Statistics 1997, Vol. 25, No. 6, 2313 2327 LATTICE-BASED D-OPTIMUM DESIGN FOR FOURIER REGRESSION By Eva Riccomagno, 1 Rainer Schwabe 2 an Henry P. Wynn 1 University of Warwick, Technische
More informationLecture 10 February 23
EECS 281B / STAT 241B: Advanced Topics in Statistical LearningSpring 2009 Lecture 10 February 23 Lecturer: Martin Wainwright Scribe: Dave Golland Note: These lecture notes are still rough, and have only
More informationThe Exact Form and General Integrating Factors
7 The Exact Form an General Integrating Factors In the previous chapters, we ve seen how separable an linear ifferential equations can be solve using methos for converting them to forms that can be easily
More informationCalculus of Variations
16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More information1 Heisenberg Representation
1 Heisenberg Representation What we have been ealing with so far is calle the Schröinger representation. In this representation, operators are constants an all the time epenence is carrie by the states.
More informationSurvival exponents for fractional Brownian motion with multivariate time
Survival exponents for fractional Brownian motion with multivariate time G Molchan Institute of Earthquae Preiction heory an Mathematical Geophysics Russian Acaemy of Science 84/3 Profsoyuznaya st 7997
More informationSeparation of Variables
Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More informationNOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,
NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which
More informationA note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz
A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationSINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
Communications on Stochastic Analysis Vol. 2, No. 2 (28) 289-36 Serials Publications www.serialspublications.com SINGULAR PERTURBATION AND STATIONARY SOLUTIONS OF PARABOLIC EQUATIONS IN GAUSS-SOBOLEV SPACES
More informationSparseness Versus Estimating Conditional Probabilities: Some Asymptotic Results
Sparseness Versus Estimating Conditional Probabilities: Some Asymptotic Results Peter L. Bartlett 1 and Ambuj Tewari 2 1 Division of Computer Science and Department of Statistics University of California,
More informationAcute sets in Euclidean spaces
Acute sets in Eucliean spaces Viktor Harangi April, 011 Abstract A finite set H in R is calle an acute set if any angle etermine by three points of H is acute. We examine the maximal carinality α() of
More informationSome Examples. Uniform motion. Poisson processes on the real line
Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.
More informationA LIMIT THEOREM FOR RANDOM FIELDS WITH A SINGULARITY IN THE SPECTRUM
Teor Imov r. ta Matem. Statist. Theor. Probability an Math. Statist. Vip. 81, 1 No. 81, 1, Pages 147 158 S 94-911)816- Article electronically publishe on January, 11 UDC 519.1 A LIMIT THEOREM FOR RANDOM
More informationA new proof of the sharpness of the phase transition for Bernoulli percolation on Z d
A new proof of the sharpness of the phase transition for Bernoulli percolation on Z Hugo Duminil-Copin an Vincent Tassion October 8, 205 Abstract We provie a new proof of the sharpness of the phase transition
More informationEntanglement is not very useful for estimating multiple phases
PHYSICAL REVIEW A 70, 032310 (2004) Entanglement is not very useful for estimating multiple phases Manuel A. Ballester* Department of Mathematics, University of Utrecht, Box 80010, 3508 TA Utrecht, The
More informationSolutions to Math 41 Second Exam November 4, 2010
Solutions to Math 41 Secon Exam November 4, 2010 1. (13 points) Differentiate, using the metho of your choice. (a) p(t) = ln(sec t + tan t) + log 2 (2 + t) (4 points) Using the rule for the erivative of
More informationFLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS. 1. Introduction
FLUCTUATIONS IN THE NUMBER OF POINTS ON SMOOTH PLANE CURVES OVER FINITE FIELDS ALINA BUCUR, CHANTAL DAVID, BROOKE FEIGON, MATILDE LALÍN 1 Introuction In this note, we stuy the fluctuations in the number
More informationLogarithmic spurious regressions
Logarithmic spurious regressions Robert M. e Jong Michigan State University February 5, 22 Abstract Spurious regressions, i.e. regressions in which an integrate process is regresse on another integrate
More informationOnline Appendix for Trade Policy under Monopolistic Competition with Firm Selection
Online Appenix for Trae Policy uner Monopolistic Competition with Firm Selection Kyle Bagwell Stanfor University an NBER Seung Hoon Lee Georgia Institute of Technology September 6, 2018 In this Online
More informationarxiv: v4 [cs.ds] 7 Mar 2014
Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning
More informationConsistency of Nearest Neighbor Methods
E0 370 Statistical Learning Theory Lecture 16 Oct 25, 2011 Consistency of Nearest Neighbor Methods Lecturer: Shivani Agarwal Scribe: Arun Rajkumar 1 Introduction In this lecture we return to the study
More informationTwo formulas for the Euler ϕ-function
Two formulas for the Euler ϕ-function Robert Frieman A multiplication formula for ϕ(n) The first formula we want to prove is the following: Theorem 1. If n 1 an n 2 are relatively prime positive integers,
More informationIntroduction to the Vlasov-Poisson system
Introuction to the Vlasov-Poisson system Simone Calogero 1 The Vlasov equation Consier a particle with mass m > 0. Let x(t) R 3 enote the position of the particle at time t R an v(t) = ẋ(t) = x(t)/t its
More informationLecture 6: Calculus. In Song Kim. September 7, 2011
Lecture 6: Calculus In Song Kim September 7, 20 Introuction to Differential Calculus In our previous lecture we came up with several ways to analyze functions. We saw previously that the slope of a linear
More informationTutorial on Maximum Likelyhood Estimation: Parametric Density Estimation
Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing
More informationImplicit Differentiation
Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,
More informationarxiv: v4 [math.pr] 27 Jul 2016
The Asymptotic Distribution of the Determinant of a Ranom Correlation Matrix arxiv:309768v4 mathpr] 7 Jul 06 AM Hanea a, & GF Nane b a Centre of xcellence for Biosecurity Risk Analysis, University of Melbourne,
More informationChaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena
Chaos, Solitons an Fractals (7 64 73 Contents lists available at ScienceDirect Chaos, Solitons an Fractals onlinear Science, an onequilibrium an Complex Phenomena journal homepage: www.elsevier.com/locate/chaos
More informationA Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential
Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix
More informationThe total derivative. Chapter Lagrangian and Eulerian approaches
Chapter 5 The total erivative 51 Lagrangian an Eulerian approaches The representation of a flui through scalar or vector fiels means that each physical quantity uner consieration is escribe as a function
More informationDiscrete Operators in Canonical Domains
Discrete Operators in Canonical Domains VLADIMIR VASILYEV Belgoro National Research University Chair of Differential Equations Stuencheskaya 14/1, 308007 Belgoro RUSSIA vlaimir.b.vasilyev@gmail.com Abstract:
More informationON THE OPTIMAL CONVERGENCE RATE OF UNIVERSAL AND NON-UNIVERSAL ALGORITHMS FOR MULTIVARIATE INTEGRATION AND APPROXIMATION
ON THE OPTIMAL CONVERGENCE RATE OF UNIVERSAL AN NON-UNIVERSAL ALGORITHMS FOR MULTIVARIATE INTEGRATION AN APPROXIMATION MICHAEL GRIEBEL AN HENRYK WOŹNIAKOWSKI Abstract. We stuy the optimal rate of convergence
More informationHITTING TIMES FOR RANDOM WALKS WITH RESTARTS
HITTING TIMES FOR RANDOM WALKS WITH RESTARTS SVANTE JANSON AND YUVAL PERES Abstract. The time it takes a ranom walker in a lattice to reach the origin from another vertex x, has infinite mean. If the walker
More informationA talk on Oracle inequalities and regularization. by Sara van de Geer
A talk on Oracle inequalities and regularization by Sara van de Geer Workshop Regularization in Statistics Banff International Regularization Station September 6-11, 2003 Aim: to compare l 1 and other
More informationWhy Bernstein Polynomials Are Better: Fuzzy-Inspired Justification
Why Bernstein Polynomials Are Better: Fuzzy-Inspire Justification Jaime Nava 1, Olga Kosheleva 2, an Vlaik Kreinovich 3 1,3 Department of Computer Science 2 Department of Teacher Eucation University of
More informationSHARP BOUNDS FOR EXPECTATIONS OF SPACINGS FROM DECREASING DENSITY AND FAILURE RATE FAMILIES
APPLICATIONES MATHEMATICAE 3,4 (24), pp. 369 395 Katarzyna Danielak (Warszawa) Tomasz Rychlik (Toruń) SHARP BOUNDS FOR EXPECTATIONS OF SPACINGS FROM DECREASING DENSITY AND FAILURE RATE FAMILIES Abstract.
More informationThe Generalized Incompressible Navier-Stokes Equations in Besov Spaces
Dynamics of PDE, Vol1, No4, 381-400, 2004 The Generalize Incompressible Navier-Stokes Equations in Besov Spaces Jiahong Wu Communicate by Charles Li, receive July 21, 2004 Abstract This paper is concerne
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationChapter 6: Energy-Momentum Tensors
49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.
More informationINVERSE PROBLEM OF A HYPERBOLIC EQUATION WITH AN INTEGRAL OVERDETERMINATION CONDITION
Electronic Journal of Differential Equations, Vol. 216 (216), No. 138, pp. 1 7. ISSN: 172-6691. URL: http://eje.math.txstate.eu or http://eje.math.unt.eu INVERSE PROBLEM OF A HYPERBOLIC EQUATION WITH AN
More informationLecture 2 Lagrangian formulation of classical mechanics Mechanics
Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,
More informationA Weak First Digit Law for a Class of Sequences
International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of
More informationHigh-Dimensional p-norms
High-Dimensional p-norms Gérar Biau an Davi M. Mason Abstract Let X = X 1,...,X be a R -value ranom vector with i.i.. components, an let X p = j=1 X j p 1/p be its p-norm, for p > 0. The impact of letting
More informationLie symmetry and Mei conservation law of continuum system
Chin. Phys. B Vol. 20, No. 2 20 020 Lie symmetry an Mei conservation law of continuum system Shi Shen-Yang an Fu Jing-Li Department of Physics, Zhejiang Sci-Tech University, Hangzhou 3008, China Receive
More informationRobust Low Rank Kernel Embeddings of Multivariate Distributions
Robust Low Rank Kernel Embeings of Multivariate Distributions Le Song, Bo Dai College of Computing, Georgia Institute of Technology lsong@cc.gatech.eu, boai@gatech.eu Abstract Kernel embeing of istributions
More informationPhysics 505 Electricity and Magnetism Fall 2003 Prof. G. Raithel. Problem Set 3. 2 (x x ) 2 + (y y ) 2 + (z + z ) 2
Physics 505 Electricity an Magnetism Fall 003 Prof. G. Raithel Problem Set 3 Problem.7 5 Points a): Green s function: Using cartesian coorinates x = (x, y, z), it is G(x, x ) = 1 (x x ) + (y y ) + (z z
More informationConvergence rates of moment-sum-of-squares hierarchies for optimal control problems
Convergence rates of moment-sum-of-squares hierarchies for optimal control problems Milan Kora 1, Diier Henrion 2,3,4, Colin N. Jones 1 Draft of September 8, 2016 Abstract We stuy the convergence rate
More informationθ x = f ( x,t) could be written as
9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)
More informationarxiv: v1 [math.ap] 6 Jul 2017
Local an global time ecay for parabolic equations with super linear first orer terms arxiv:177.1761v1 [math.ap] 6 Jul 17 Martina Magliocca an Alessio Porretta ABSTRACT. We stuy a class of parabolic equations
More informationParameter estimation: A new approach to weighting a priori information
Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a
More informationThe effect of dissipation on solutions of the complex KdV equation
Mathematics an Computers in Simulation 69 (25) 589 599 The effect of issipation on solutions of the complex KV equation Jiahong Wu a,, Juan-Ming Yuan a,b a Department of Mathematics, Oklahoma State University,
More informationFURTHER BOUNDS FOR THE ESTIMATION ERROR VARIANCE OF A CONTINUOUS STREAM WITH STATIONARY VARIOGRAM
FURTHER BOUNDS FOR THE ESTIMATION ERROR VARIANCE OF A CONTINUOUS STREAM WITH STATIONARY VARIOGRAM N. S. BARNETT, S. S. DRAGOMIR, AND I. S. GOMM Abstract. In this paper we establish an upper boun for the
More information