Classification Methods with Reject Option Based on Convex Risk Minimization


Journal of Machine Learning Research 11 (2010) Submitted 2/09; Revised 11/09; Published 1/10

Classification Methods with Reject Option Based on Convex Risk Minimization

Ming Yuan
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA

Marten Wegkamp
Department of Statistics, Florida State University, Tallahassee, FL 32306, USA

Editor: Bin Yu

Abstract

In this paper, we investigate the problem of binary classification with a reject option, in which one can withhold the decision of classifying an observation at a cost lower than that of misclassification. Since the natural loss function is non-convex, so that empirical risk minimization easily becomes infeasible, the paper proposes minimizing convex risks based on surrogate convex loss functions. A necessary and sufficient condition for infinite sample consistency (both risks share the same minimizer) is provided. Moreover, we show that the excess risk can be bounded through the excess surrogate risk under appropriate conditions. These bounds can be tightened by a generalized margin condition. The impact of the results is illustrated on several commonly used surrogate loss functions.

Keywords: classification, convex surrogate loss, empirical risk minimization, generalized margin condition, reject option

1. Introduction

In binary classification, one observes independent realizations (X_1, Y_1), ..., (X_n, Y_n) of the random pair (X, Y), where X ∈ 𝒳 and Y ∈ 𝒴 = {−1, 1}. The goal is to learn from these training data a classification rule g: 𝒳 → 𝒴 that classifies an observation X into one of the two classes. It is recognized that in many applications the consequences of misclassification can be substantial. In such situations, a less specific response that reserves the right of not making a decision, sometimes referred to as a reject option (see, e.g., Herbei and Wegkamp, 2006), may be preferable to risking misclassification. This, for example, is typical in medical studies where screening for a certain disease can be done based on relatively inexpensive clinical measures. If the classification based on these measurements is satisfactory, nothing further needs to be done. But in the event that there are ambiguities, it would be more desirable to take the rejection option and seek more expensive studies to identify a subject's disease status. Similar approaches are often adopted in DNA sequencing or genotyping applications, where the rejection option is commonly referred to as a "no-call". Similar problems have attracted much attention in various application fields and have also received an increasing amount of interest more recently in the machine learning literature. See Ripley (1996) and Bartlett and Wegkamp (2008) and references therein.

© 2010 Ming Yuan and Marten Wegkamp.

To accommodate the reject option, we now seek a classification rule g: 𝒳 → 𝒴̃, where 𝒴̃ = {−1, 1, 0} is an augmented response space and g(X) = 0 indicates that no definitive classification will be made for X, that is, the reject option is taken. To measure the performance of a classification rule, we employ the following loss function, which generalizes the usual 0-1 loss to account for the reject option:

    ℓ[g(X), Y] = 1 if g(X) ≠ Y and g(X) ≠ 0;
                 d if g(X) = 0;
                 0 if g(X) = Y.

In other words, an ambiguous response (g(X) = 0) incurs a loss of d, whereas misclassification incurs a loss of 1. Note that d is necessarily smaller than 1/2: otherwise, rather than taking the rejection option with a loss of d, we could always flip a fair coin to randomly assign ±1 as the value of g(X), which incurs an average loss of 1/2. For this reason, we shall assume that d < 1/2 in what follows. For any classification rule g: 𝒳 → 𝒴̃, the risk function is then given by R(g) = E(ℓ[g(X), Y]), where the expectation is taken over the joint distribution of X and Y. It is not hard to show that the optimal classification rule g* := argmin R(g) is given by (see, e.g., Bartlett and Wegkamp, 2008)

    g*(X) = 1 if η(X) > 1 − d;
            0 if d ≤ η(X) ≤ 1 − d;
            −1 if η(X) < d,

where η(X) = P(Y = 1 | X). The corresponding risk is

    R* := inf R(g) = R(g*) = E(min{η(X), 1 − η(X), d}).

Thus, the performance of any classification rule g: 𝒳 → 𝒴̃ can be measured by the excess risk ΔR(g) := R(g) − R*. Appealing to the general empirical risk minimization strategy, one could attempt to derive a classification rule from the training data by minimizing the empirical risk

    R_n(g) = (1/n) Σ_{i=1}^n ℓ[g(X_i), Y_i].

Similar to the usual 0-1 loss, however, ℓ is not convex in g, and direct minimization of R_n is typically an NP-hard problem. A common remedy is to consider a surrogate convex loss function.
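Before turning to surrogates, the quantities defined so far are simple to compute. Here is a minimal NumPy sketch (ours, not the paper's; all names are hypothetical) of the loss ℓ and the plug-in rule g*, together with a Monte Carlo check of the identity R* = E(min{η(X), 1 − η(X), d}):

    import numpy as np

    def reject_loss(g, y, d):
        # loss l[g, y]: 0 if correct, d if rejected (g == 0), 1 if misclassified
        g, y = np.asarray(g), np.asarray(y)
        return np.where(g == 0, d, np.where(g == y, 0.0, 1.0))

    def bayes_rule(eta, d):
        # g*(x): predict 1 if eta > 1 - d, reject if d <= eta <= 1 - d, else -1
        eta = np.asarray(eta)
        return np.where(eta > 1 - d, 1, np.where(eta < d, -1, 0))

    # Monte Carlo check that R(g*) matches E[min(eta, 1 - eta, d)]
    rng = np.random.default_rng(0)
    d = 0.2
    eta = rng.uniform(size=100_000)                 # eta(X) for random draws of X
    y = np.where(rng.uniform(size=eta.size) < eta, 1, -1)
    g = bayes_rule(eta, d)
    print(reject_loss(g, y, d).mean())              # both print ~0.16 for uniform eta
    print(np.minimum(np.minimum(eta, 1 - eta), d).mean())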

To this end, let φ: ℝ → ℝ be a convex function. Denote by Q(f) = E[φ(Y f(X))] the corresponding risk for a discriminant function f: 𝒳 → ℝ. Let f̂_n be the minimizer of

    Q_n(f) = (1/n) Σ_{i=1}^n φ(Y_i f(X_i))

over a certain functional space 𝓕 consisting of functions that map from 𝒳 to ℝ. f̂_n can be conveniently converted to a classification rule C(f̂_n; δ) as follows:

    C(f(X); δ) = 1 if f(X) > δ;
                 0 if |f(X)| ≤ δ;
                 −1 if f(X) < −δ,

where δ > 0 is a parameter that, as we shall see, plays a critical role in determining the performance of C(f̂_n, δ).
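The truncation rule C(f; δ) itself is a one-liner; a companion sketch in the same vein (the name truncate is ours):

    import numpy as np

    def truncate(f, delta):
        # C(f; delta): predict sign(f) when |f| > delta, otherwise reject (0)
        f = np.asarray(f)
        return np.where(f > delta, 1, np.where(f < -delta, -1, 0))

    print(truncate([-1.2, 0.3, 0.9], delta=0.5))    # [-1  0  1]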

In this paper, we investigate the statistical properties of this general convex risk minimization technique. To what extent C(f̂_n, δ) mimics the optimal classification rule g* plays a critical role in the success of this technique. Let f*_φ be the minimizer of Q(f). We shall assume throughout the paper that f*_φ is uniquely defined. Typically f*_φ reflects the limiting behavior of f̂_n when 𝓕 is rich enough and there are infinitely many training data. Therefore the first question is whether or not f*_φ can be used to recover the optimal rule g*. A surrogate loss function φ that satisfies this property is often called infinite sample consistent (see, e.g., Zhang, 2004) or classification calibrated (see, e.g., Bartlett, Jordan and McAuliffe, 2006). A second question concerns the relationship between the excess risk ΔR[C(f, δ)] and the excess φ-risk ΔQ(f) = Q(f) − inf Q(f): can we find an increasing function ρ: ℝ → ℝ such that, for all f,

    ΔR[C(f, δ)] ≤ ρ(ΔQ(f))?    (1)

Clearly the infinite sample consistency of φ implies that ρ(0) = 0. Such a bound on the excess risk provides a useful tool for bounding the excess risk of f̂_n. In particular, (1) indicates that

    ΔR[C(f̂_n, δ)] ≤ ρ(ΔQ(f̂_n)) = ρ[ΔQ(f̄) + (ΔQ(f̂_n) − ΔQ(f̄))],

where f̄ = argmin_{f∈𝓕} Q(f). The first term ΔQ(f̄) on the right-hand side exhibits the approximation error of the functional class 𝓕, whereas the second term ΔQ(f̂_n) − ΔQ(f̄) is the estimation error. In the case when there is no reject option, these problems have been well investigated in recent years (Lin, 2002; Zhang, 2004; Bartlett, Jordan and McAuliffe, 2006). In this paper, we establish similar results when there is a reject option. The most significant difference between the two situations, with or without the reject option, is the role of δ. As we shall see, for some loss functions, such as the least squares, exponential or logistic losses, a good choice of δ yields classifiers that are infinite sample consistent. For other loss functions, however, such as the hinge loss, no matter how δ is chosen, the classification rule C(f, δ) cannot be infinite sample consistent.

The remainder of the paper is organized as follows. We first examine in Section 2 the infinite sample consistency for classification with a reject option. After establishing a general result, we consider its implication for several commonly used loss functions. In Section 3, we establish bounds on the excess risk in the form of (1), followed by applications to the popular loss functions. We also show that under an additional assumption on the behavior of η(X) near d and 1 − d as in Herbei and Wegkamp (2006), generalizing the condition in the case of d = 1/2 of Mammen and Tsybakov (1999) and Tsybakov (2004), the bound (1) can be tightened considerably. Section 4 discusses rates of convergence of the empirical risk minimizer f̂_n that minimizes the empirical risk Q_n(f) over a bounded class 𝓕. Section 5 considers an extension to asymmetric loss, where one type of misclassification may be more costly than the other. All proofs are relegated to Section 6.

2. Infinite Sample Consistency

We first give a general result on the infinite sample consistency of the classification rule C(f*_φ, δ).

Theorem 1 Assume that φ is convex. Then the classification rule C(f*_φ, δ) for some δ > 0 is infinite sample consistent, that is, C(f*_φ, δ) = g*, if and only if φ′(δ) and φ′(−δ) both exist, φ′(δ) < 0, and

    φ′(δ) / (φ′(−δ) + φ′(δ)) = d.    (2)

When there is no reject option, it is known that the necessary and sufficient condition for infinite sample consistency is that φ is differentiable at 0 and φ′(0) < 0 (see, e.g., Bartlett, Jordan and McAuliffe, 2006). As indicated by Theorem 1, the differentiability of φ at ±δ plays a more prominent role in the general case when there is a reject option. From Theorem 1 it is also evident that the infinite sample consistency depends on both φ and the choice of the thresholding parameter δ. Observe that for any δ_1 < δ_2,

    φ′(−δ_2) ≤ φ′(−δ_1) ≤ φ′(δ_1) ≤ φ′(δ_2),

which implies that the left-hand side of (2) is a decreasing function of δ. If φ is strictly convex, then it is strictly decreasing, and therefore there is at most one value of δ that satisfies (2). In other words, for strictly convex φ, there is at most one thresholding parameter δ such that C(f*_φ, δ) = g*. On the other hand, if φ is twice differentiable with φ′(0) < 0 and φ′(z) → 0 as z → +∞, then for any d < 1/2 there always exists a δ > 0 such that (2) holds. This is because the left-hand side of (2) is a decreasing function of δ, which approaches its supremum 1/2 as δ → 0 and tends to 0 as δ increases. Moreover, the twice differentiability of φ ensures that the left-hand side of (2) is also a continuous function of δ. The following is therefore a direct consequence of Theorem 1:

Corollary 2 If φ is strictly convex, then either there is a unique δ > 0 such that C(f*_φ, δ) is infinite sample consistent, or C(f*_φ, δ) is not infinite sample consistent for any δ > 0. In addition to convexity, if φ is twice differentiable with φ′(0) < 0 and φ′(z) → 0 as z → +∞, then there always exists a δ > 0 such that C(f*_φ, δ) is infinite sample consistent.

Theorem 1 provides a general guideline on how to choose δ for common choices of convex losses. Below we look at several concrete examples.

2.1 Least Squares Loss

We first examine the least squares loss φ(z) = (1 − z)². Observe that

    φ′(δ) / (φ′(−δ) + φ′(δ)) = (1 − δ)/2.

All conditions of Theorem 1 are met if and only if δ = 1 − 2d.

Corollary 3 For the least squares loss, C(f*_φ, 1 − 2d) = g*.
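Condition (2) also lends itself to numerical solution: its left-hand side decreases in δ, so bisection finds the calibrated threshold whenever one exists (cf. Corollary 2). A sketch (ours; solve_delta is a hypothetical helper), checked against the closed form δ = 1 − 2d for least squares:

    def solve_delta(phi_prime, d, hi=50.0, tol=1e-10):
        # Bisection for phi'(delta) / (phi'(-delta) + phi'(delta)) = d.
        # The left-hand side decreases from its supremum 1/2 as delta grows,
        # so for 0 < d < 1/2 a root exists for smooth losses (Corollary 2).
        G = lambda t: phi_prime(t) / (phi_prime(-t) + phi_prime(t))
        lo = 0.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if G(mid) > d else (lo, mid)
        return 0.5 * (lo + hi)

    ls_prime = lambda z: -2.0 * (1.0 - z)              # least squares phi(z) = (1 - z)^2
    print(solve_delta(ls_prime, d=0.2), 1 - 2 * 0.2)   # both ~0.6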

2.2 Exponential Loss

The exponential loss, φ(z) = exp(−z), is connected with boosting (Friedman, Hastie and Tibshirani, 2000). Because

    φ′(δ) / (φ′(−δ) + φ′(δ)) = 1 / (1 + exp(2δ)),

all conditions of Theorem 1 are met if and only if δ = (1/2) log((1 − d)/d).

Corollary 4 For the exponential loss, C(f*_φ, (1/2) log((1 − d)/d)) = g*.

2.3 Logistic Loss

Logistic regression employs the loss φ(z) = ln(1 + exp(−z)). Similar to before,

    φ′(δ) / (φ′(−δ) + φ′(δ)) = 1 / (1 + exp(δ)),

which shows that all conditions of Theorem 1 are met if δ = log((1 − d)/d).

Corollary 5 For the logistic loss, C(f*_φ, log((1 − d)/d)) = g*.

2.4 Square Hinge Loss

The square hinge loss, φ(z) = [(1 − z)_+]², is another popular choice, for which

    φ′(δ) / (φ′(−δ) + φ′(δ)) = (1 − δ)/2

for 0 < δ < 1. Similar to the least squares loss, we have the following corollary.

Corollary 6 For the square hinge loss, C(f*_φ, 1 − 2d) = g*.
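The closed-form thresholds of Corollaries 3-6 are immediate to evaluate; for instance, at d = 0.2 (a quick check, ours, which also agrees with the generic bisection sketch above):

    import math

    d = 0.2
    print(1 - 2 * d)                       # least squares and square hinge: 0.6
    print(0.5 * math.log((1 - d) / d))     # exponential: ~0.693
    print(math.log((1 - d) / d))           # logistic: ~1.386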

2.5 Distance Weighted Discrimination

Marron, Todd and Ahn (2007) recently introduced the so-called distance weighted discrimination method, where the following loss function (see, e.g., Bartlett, Jordan and McAuliffe, 2006) is used:

    φ(z) = 1/z if z ≥ γ;  (1/γ)(2 − z/γ) if z < γ,    (3)

where γ > 0 is a constant. It is not hard to see that φ is convex. Moreover,

    φ′(z) = −1/z² if z ≥ γ;  −1/γ² if z < γ.

Thus,

    φ′(δ) / (φ′(−δ) + φ′(δ)) = 1/2 if δ < γ;  (1/δ²) / (1/δ² + 1/γ²) if δ > γ.

In other words, we have the following result for the distance weighted discrimination loss.

Corollary 7 For the loss (3), C(f*_φ, [(1 − d)/d]^{1/2} γ) = g*.

2.6 Hinge Loss

The popular support vector machine employs the hinge loss, φ(z) = (1 − z)_+. The hinge loss is differentiable everywhere except at 1. Therefore

    φ′(δ) / (φ′(−δ) + φ′(δ)) = 1/2 if 0 < δ < 1;  0 if δ > 1.

Because 0 < d < 1/2, there does not exist a δ such that all conditions of Theorem 1 are met. As a matter of fact, for any δ > 0, C(f*_φ, δ) ≠ g*. Motivated by this observation, Bartlett and Wegkamp (2008) introduced the following modification of the hinge loss:

    φ(z) = 1 − az if z ≤ 0;  1 − z if 0 < z ≤ 1;  0 if z > 1,    (4)

where a > 1. Note that with this modification,

    φ′(δ) / (φ′(−δ) + φ′(δ)) = 1/(a + 1) if 0 < δ < 1;  0 if δ > 1.

Therefore, we have the following corollary.

Corollary 8 For the modified hinge loss (4) and any δ < 1, if a = (1 − d)/d, then C(f*_φ, δ) = g*.

It is interesting to note that for the examples considered previously, a specific choice of δ is needed to ensure infinite sample consistency, whereas for the modified hinge loss a whole range of choices of δ serves the same purpose. However, as we shall see in the next section, different choices of δ for the modified hinge loss may result in slightly different bounds on the excess risk, with δ = 1/2 appearing to be preferable in that it yields the smallest upper bound on the excess risk.
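Corollary 8 is easy to probe numerically: with a = (1 − d)/d, the minimizer of Q_η(z) = ηφ(z) + (1 − η)φ(−z) over a grid lands at −1, 0 or 1 according to whether η < d, d < η < 1 − d or η > 1 − d, so any threshold 0 < δ < 1 recovers g*. A sketch (ours):

    import numpy as np

    def mod_hinge(z, a):
        # modified hinge (4): 1 - a z (z <= 0), 1 - z (0 < z <= 1), 0 (z > 1)
        return np.where(z <= 0, 1 - a * z, np.where(z <= 1, 1 - z, 0.0))

    d = 0.2
    a = (1 - d) / d
    z = np.linspace(-2, 2, 4001)
    for eta in [0.1, 0.5, 0.9]:            # below d, inside the reject band, above 1 - d
        q = eta * mod_hinge(z, a) + (1 - eta) * mod_hinge(-z, a)
        print(eta, z[np.argmin(q)])        # minimizer ~ -1, 0, 1 respectively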

3. Excess Risk

We now turn to the excess risk ΔR[C(f, δ)] and show how it can be bounded through the excess φ-risk ΔQ(f) := Q(f) − Q(f*_φ). Recall that the infinite sample consistency established in the previous section means that ΔQ(f) = 0 implies ΔR(C(f, δ)) = 0. For brevity, we shall assume implicitly throughout this section that δ is chosen in accordance with Theorem 1 to ensure infinite sample consistency. Write

    Q_{η(X)}(z) = η(X)φ(z) + (1 − η(X))φ(−z).

By definition,

    Q_{η(X)}(f*_φ(X)) = inf_z Q_{η(X)}(z).

Denote

    ΔQ_η(f) = Q_η(f) − Q_η(f*_φ),

where we suppress the dependence of η, f and f*_φ on X for brevity.

Theorem 9 Assume that φ is convex, φ′(δ) and φ′(−δ) both exist, φ′(δ) < 0, and (2) holds. In addition, suppose that there exist constants C > 0 and s ≥ 1 such that

    |d − η|^s ≤ C^s ΔQ_η(−δ);    |(1 − d) − η|^s ≤ C^s ΔQ_η(δ).

Then

    ΔR[C(f, δ)] ≤ 2C [ΔQ(f)]^{1/s}.    (5)

It is immediate from Theorem 9 that ΔQ(f̂_n) → 0 in probability implies ΔR(f̂_n) → 0 in probability. In other words, consistency in terms of the φ-risk implies consistency in terms of the loss ℓ. It is worth noting that the constant in the upper bound can be tightened under stronger conditions.

Theorem 10 In addition to the assumptions of Theorem 9, assume that

    (2η − 1)_+^s ≤ C^s ΔQ_η(−δ);    (1 − 2η)_+^s ≤ C^s ΔQ_η(δ).

Then ΔR[C(f, δ)] ≤ C [ΔQ(f)]^{1/s}.

We can improve the bounds even further by the following margin condition. Assume that for some α ≥ 0 and A ≥ 1,

    P{|η(X) − z| ≤ t} ≤ A t^α    (6)

for all 0 ≤ t < d, at z = d and z = 1 − d. This assumption was introduced in Herbei and Wegkamp (2006) and generalizes the margin condition of Mammen and Tsybakov (1999) and Tsybakov (2004). It is always met for α = 0 and A = 1. The other extreme is α → +∞, the case where η(X) stays away from d and 1 − d with probability one.
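For intuition, the hypothesis of Theorem 9 can be checked directly in the least squares case worked out in Section 3.1 below, where ΔQ_η(−δ) = 4(η − d)² and ΔQ_η(δ) = 4(1 − d − η)² make the choice s = 2, C = 1/2 exact. A numerical sketch (ours):

    import numpy as np

    d, delta, s, C = 0.2, 0.6, 2, 0.5          # delta = 1 - 2d for least squares
    phi = lambda z: (1 - z) ** 2
    Q = lambda eta, z: eta * phi(z) + (1 - eta) * phi(-z)
    dQ = lambda eta, z: Q(eta, z) - Q(eta, 2 * eta - 1)   # excess over f*_phi = 2 eta - 1

    eta = np.linspace(0.001, 0.999, 999)
    assert np.all(np.abs(d - eta) ** s <= C ** s * dQ(eta, -delta) + 1e-12)
    assert np.all(np.abs(1 - d - eta) ** s <= C ** s * dQ(eta, delta) + 1e-12)
    print("Theorem 9 hypothesis holds with s = 2, C = 1/2")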

Theorem 11 In addition to the assumptions of Theorem 9, assume that (6) holds for some α ≥ 0 and A ≥ 1. Then, for some K depending on A and α,

    ΔR[C(f, δ)] ≤ K [ΔQ(f)]^{1/(s + β − βs)},    (7)

where β = α/(1 + α).

In case α = 0, the exponent 1/(s + β − βs) on the right-hand side of (7) is 1/s, and the situation is as in Theorem 9. For α → +∞, the bound (7) improves upon the one in Theorem 9, as the exponent 1/(s + β − βs) converges to 1. We now examine the consequences of Theorems 9, 10 and 11 for several common loss functions.

3.1 Least Squares

Note that for the least squares loss,

    ΔQ_η(f) = (2η − 1 − f)².

Simple algebraic manipulations show that

    ΔQ_η(−δ) = 4(η − d)²;    ΔQ_η(δ) = 4(1 − d − η)².

Therefore, by Theorems 9 and 11:

Corollary 12 For the least squares loss,

    ΔR[C(f, 1 − 2d)] ≤ [ΔQ(f)]^{1/2}.

Furthermore, if the margin condition (6) holds, then

    ΔR[C(f, 1 − 2d)] ≤ K [ΔQ(f)]^{(1+α)/(2+α)}

for some constant K > 0.

3.2 Exponential Loss

An application of a Taylor expansion yields (see, e.g., Zhang, 2004)

    ΔQ_η(f) ≥ 2(η − 1/(1 + exp(−2f)))².

Therefore,

    ΔQ_η(−δ) ≥ 2(η − d)²;    ΔQ_η(δ) ≥ 2(1 − d − η)².
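The displayed lower bound is easy to verify on a grid (a numerical sanity check, ours, not a proof):

    import numpy as np

    eta = np.linspace(0.01, 0.99, 99)[:, None]
    f = np.linspace(-5, 5, 201)[None, :]
    Q = eta * np.exp(-f) + (1 - eta) * np.exp(f)      # exponential-loss Q_eta(f)
    Qmin = 2 * np.sqrt(eta * (1 - eta))               # attained at f = (1/2) log(eta/(1-eta))
    lower = 2 * (eta - 1 / (1 + np.exp(-2 * f))) ** 2
    print(np.all(Q - Qmin >= lower - 1e-12))          # True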

Therefore, by Theorems 9 and 11:

Corollary 13 For the exponential loss,

    ΔR[C(f, (1/2) log((1 − d)/d))] ≤ √2 [ΔQ(f)]^{1/2}.

Furthermore, if the margin condition (6) holds, then

    ΔR[C(f, (1/2) log((1 − d)/d))] ≤ K [ΔQ(f)]^{(1+α)/(2+α)}

for some constant K > 0.

3.3 Logistic Loss

Similar to the exponential loss, an application of a Taylor expansion yields

    ΔQ_η(f) ≥ 2(η − 1/(1 + exp(−f)))².

Therefore,

    ΔQ_η(−δ) ≥ 2(η − d)²;    ΔQ_η(δ) ≥ 2(1 − d − η)².

Therefore, by Theorems 9 and 11:

Corollary 14 For the logistic loss,

    ΔR[C(f, log((1 − d)/d))] ≤ √2 [ΔQ(f)]^{1/2}.

Furthermore, if the margin condition (6) holds, then

    ΔR[C(f, log((1 − d)/d))] ≤ K [ΔQ(f)]^{(1+α)/(2+α)}

for some constant K > 0.

3.4 Square Hinge Loss

Simple algebraic derivation shows

    ΔQ_η(f) = (2η − 1 − f)² − η[(f − 1)_+]² − (1 − η)[(−f − 1)_+]².

Therefore,

    ΔQ_η(−δ) = 4(η − d)²;    ΔQ_η(δ) = 4(1 − d − η)².

By Theorems 9 and 11:

Corollary 15 For the square hinge loss,

    ΔR[C(f, 1 − 2d)] ≤ [ΔQ(f)]^{1/2}.

Furthermore, if the margin condition (6) holds, then

    ΔR[C(f, 1 − 2d)] ≤ K [ΔQ(f)]^{(1+α)/(2+α)}

for some constant K > 0.

3.5 Distance Weighted Discrimination

Observe that

    Q_η(z) = η/z + (1 − η)(2γ + z)/γ²      if z ≥ γ;
             (2γ + (1 − 2η)z)/γ²           if |z| < γ;
             η(2γ − z)/γ² + (1 − η)/(−z)   if z ≤ −γ.

Hence

    inf_z Q_η(z) = (2/γ)( [η(1 − η)]^{1/2} + min{η, 1 − η} )

and

    f*_φ = (η/(1 − η))^{1/2} γ if η > 1/2;  any value in [−γ, γ] if η = 1/2;  −((1 − η)/η)^{1/2} γ if η < 1/2.

Recall that δ = ((1 − d)/d)^{1/2} γ ≥ γ. Then, since min{η, 1 − η} ≤ 1 − η,

    ΔQ_η(δ) ≥ η/δ + (1 − η)δ/γ² − 2[η(1 − η)]^{1/2}/γ
            = ( (η/δ)^{1/2} − ((1 − η)δ)^{1/2}/γ )²
            = (δ/γ²) ( (ηγ²/δ²)^{1/2} − (1 − η)^{1/2} )².

Observe that γ²/δ² = d/(1 − d). Thus,

    ΔQ_η(δ) ≥ (δ/(γ²(1 − d))) ( (ηd)^{1/2} − ((1 − η)(1 − d))^{1/2} )²
            ≥ (δ/(γ²(1 − d))) ( ηd − (1 − η)(1 − d) )²
            = (1 − d − η)² / (γ(1 − d)^{1/2} d^{1/2}),

where we used (ηd)^{1/2} + ((1 − η)(1 − d))^{1/2} ≤ 1 (by the Cauchy-Schwarz inequality) and ηd − (1 − η)(1 − d) = η − (1 − d). Thus,

    (1 − d − η)² ≤ γ(1 − d)^{1/2} d^{1/2} ΔQ_η(δ).

Similarly,

    (η − d)² ≤ γ(1 − d)^{1/2} d^{1/2} ΔQ_η(−δ).

From Theorems 9 and 11, we conclude:

Corollary 16 For the distance weighted discrimination loss,

    ΔR[C(f, ((1 − d)/d)^{1/2} γ)] ≤ 2γ^{1/2}(1 − d)^{1/4} d^{1/4} [ΔQ(f)]^{1/2}.

Furthermore, if the margin condition (6) holds, then

    ΔR[C(f, ((1 − d)/d)^{1/2} γ)] ≤ K [ΔQ(f)]^{(1+α)/(2+α)}

for some constant K > 0.

3.6 Hinge Loss with Rejection Option

As shown by Bartlett and Wegkamp (2008), for the modified hinge loss (4) with a = (1 − d)/d,

    argmin_z Q_η(z) = −1 if η < d;  0 if d < η < 1 − d;  1 if η > 1 − d.

Simple algebraic manipulations lead to

    ΔQ_η(−δ) = (1 − δ)(d − η)/d                 if η ≤ d;
               (η − d)δ/d                       if d < η < 1 − d;
               1 − (1 − η)/d + (η − d)δ/d       if η ≥ 1 − d,

and

    ΔQ_η(δ) = 1 − η/d + (1 − d − η)δ/d          if η ≤ d;
              (1 − d − η)δ/d                    if d < η < 1 − d;
              (δ − 1)(1 − d − η)/d              if η ≥ 1 − d.

Therefore,

    min{δ, 1 − δ} |η − d| ≤ d ΔQ_η(−δ);    min{δ, 1 − δ} |(1 − d) − η| ≤ d ΔQ_η(δ).

Furthermore,

    min{δ, 1 − δ} (2η − 1)_+ ≤ d ΔQ_η(−δ);    min{δ, 1 − δ} (1 − 2η)_+ ≤ d ΔQ_η(δ).

From Theorems 10 and 11, we conclude:

Corollary 17 For the modified hinge loss and any δ < 1,

    ΔR[C(f, δ)] ≤ d ΔQ(f) / min{δ, 1 − δ}.    (8)

Furthermore, if the margin condition (6) holds, then

    ΔR[C(f, δ)] ≤ K ΔQ(f)    (9)

for some constant K > 0.

Notice that the corollary also suggests that δ = 1/2 yields the best constant in the upper bound. A similar result has also been established recently by Bartlett and Wegkamp (2008). It is also interesting to see that (8) cannot be improved further by the generalized margin condition (6), as the bounds (8) and (9) only differ by a constant.

4. Rates of Convergence for Empirical Risk Minimizers

In this section we briefly review the possible rates of convergence for minimizers f̂_n of the empirical risk Q_n(f) = (1/n) Σ_{i=1}^n φ(Y_i f(X_i)) over a convex class of discriminant functions 𝓕, and show the implications of the excess risk bounds obtained in the previous section. The analysis of the generalized hinge loss is complicated and is treated in detail in Wegkamp (2007) and Bartlett and Wegkamp (2008). The other loss functions φ considered in this paper have in common that the modulus of convexity of Q,

    δ(ε) = inf{ [Q(f) + Q(g)]/2 − Q((f + g)/2) : E[(f − g)²(X)] ≥ ε² },

satisfies δ(ε) ≥ cε² for some c > 0, and that, for some L < ∞, |φ(x) − φ(x′)| ≤ L|x − x′| for all x, x′ ∈ ℝ. We have the following result, which imposes a restriction on the 1/n-covering number N_n = N(1/n, L_∞, 𝓕), the cardinality of the smallest set of closed balls with radius 1/n in L_∞ needed to cover 𝓕.

Theorem 18 Assume that ‖f‖_∞ ≤ B for all f ∈ 𝓕, and let 0 < γ < 1. With probability at least 1 − γ,

    ΔQ(f̂_n) ≤ inf_{f∈𝓕} ΔQ(f) + 3L/n + (8L²/c + 2LB/3) log(n N_n / γ)/n.

Together with the excess risk bounds from Theorems 9 and 11, we have:

Corollary 19 Under the assumptions of Theorems 9 and 18, we have, with probability at least 1 − γ,

    ΔR(C(f̂_n, δ)) ≤ 2C { inf_{f∈𝓕} ΔQ(f) + 3L/n + (8L²/c + 2LB/3) log(n N_n / γ)/n }^{1/s}.

Furthermore, if the generalized margin condition (6) holds, then with probability at least 1 − γ,

    ΔR(C(f̂_n, δ)) ≤ K { inf_{f∈𝓕} ΔQ(f) + 3L/n + (8L²/c + 2LB/3) log(n N_n / γ)/n }^{1/(s + β − βs)}

for some constant K > 0.

In the special case where 𝓕 consists of linear combinations

    f_λ(x) = Σ_{j=1}^M λ_j f_j(x)

of simple discriminant functions (decision stumps) f_1, ..., f_M with Σ_{j=1}^M |λ_j| ≤ B and ‖f_j‖_∞ ≤ 1, we obtain the rate (M log n / n)^{1/(s + β − βs)}. We can view B as a tuning parameter here, and if the functions f_j are nearly orthogonal in the sense that

    max_{1 ≤ i ≠ j ≤ M} | E[f_i(X) f_j(X)] − E[f_i(X)] E[f_j(X)] | ≤ c/‖λ_0‖_0

for some small c > 0, a small modification of Theorem 1 in Wegkamp (2007) shows that we also adapt to the unknown sparsity of the minimizer λ_0 of Q(f_λ) over λ, in that the rate becomes (‖λ_0‖_0 log n / n)^{1/(s + β − βs)} for suitably chosen B = B(n).
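As a toy end-to-end illustration of this section's program (minimize the empirical φ-risk over a bounded linear class, then threshold at the δ prescribed by Theorem 1), consider the logistic loss with δ = log((1 − d)/d) from Corollary 5. The sketch below is ours; the data-generating model, step size and iteration count are arbitrary choices, not the paper's experiment:

    import numpy as np

    rng = np.random.default_rng(1)
    d = 0.2
    delta = np.log((1 - d) / d)            # calibrated threshold for the logistic loss

    # Toy data: X in R^2 with eta(x) = 1 / (1 + exp(-3 x_1)), so the reject
    # band of g* straddles the boundary x_1 = 0.
    n = 2000
    X = rng.normal(size=(n, 2))
    eta = 1 / (1 + np.exp(-3 * X[:, 0]))
    y = np.where(rng.uniform(size=n) < eta, 1.0, -1.0)

    # Gradient descent on Q_n(f) = (1/n) sum_i log(1 + exp(-y_i <w, x_i>)).
    w = np.zeros(2)
    for _ in range(500):
        m = y * (X @ w)
        grad = -(X * (y / (1 + np.exp(m)))[:, None]).mean(axis=0)
        w -= 0.5 * grad

    f = X @ w
    g = np.where(f > delta, 1, np.where(f < -delta, -1, 0))
    print("reject rate:", (g == 0).mean())
    print("error on classified points:", (g[g != 0] != y[g != 0]).mean())

Here f̂_n approaches f*_φ(x) = log(η(x)/(1 − η(x))) = 3x_1, and the classifier abstains exactly on the band |f̂_n| ≤ δ.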

5. Asymmetric Loss

We have focused thus far on the case where misclassifying from one class to the other, either g(X) = 1 while Y = −1 or g(X) = −1 while Y = 1, is assigned the same loss. In many applications, however, one type of misclassification may incur a heavier loss than the other. Such situations naturally arise in risk management or medical diagnosis. To this end, the following loss function can be adopted in place of ℓ:

    ℓ_θ[g(X), Y] = 1 if g(X) = −1 and Y = 1;
                   θ if g(X) = 1 and Y = −1;
                   d if g(X) = 0;
                   0 if g(X) = Y.

We shall assume that θ ≤ 1 without loss of generality. It can be shown that the rejection option is only viable if d < θ/(1 + θ) (see, e.g., Herbei and Wegkamp, 2006), which we shall assume throughout the section. When this holds, the corresponding Bayes rule is given by (see, e.g., Herbei and Wegkamp, 2006)

    g*_θ(X) = 1 if η(X) > 1 − d/θ;
              0 if d ≤ η(X) ≤ 1 − d/θ;
              −1 if η(X) < d.

Instead of C(f̂_n, δ), an asymmetrically truncated classification rule C(f̂_n; δ_1, δ_2) can be used for our purpose here, where

    C(f(X); δ_1, δ_2) = 1 if f(X) > δ_1;
                        0 if −δ_2 ≤ f(X) ≤ δ_1;
                        −1 if f(X) < −δ_2.

The behavior of the asymmetrically truncated classification rule C(f̂_n; δ_1, δ_2) can be studied in a similar fashion as before. In particular, we have the following result in parallel to Theorems 1 and 9.

Theorem 20 Assume that φ is convex. Then C(f*_φ, δ_1, δ_2) for some δ_1, δ_2 > 0 is infinite sample consistent, that is, C(f*_φ, δ_1, δ_2) = g*_θ, if and only if φ′(±δ_1) and φ′(±δ_2) exist; φ′(δ_1), φ′(δ_2) < 0; and

    φ′(δ_1) / (φ′(−δ_1) + φ′(δ_1)) = d/θ;    φ′(δ_2) / (φ′(−δ_2) + φ′(δ_2)) = d.

Furthermore, if C(f*_φ, δ_1, δ_2) is infinite sample consistent and

    |θ(1 − η) − d|^s ≤ C^s ΔQ_η(δ_1);    |η − d|^s ≤ C^s ΔQ_η(−δ_2),

then

    ΔR_θ[C(f, δ_1, δ_2)] ≤ 2C [ΔQ(f)]^{1/s},

where ΔR_θ(g) = R_θ(g) − R_θ(g*_θ) and R_θ(g) = E[ℓ_θ(g(X), Y)].

Theorem 20 can be proved in the same fashion as Theorems 1 and 9 and is therefore omitted for brevity.
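For illustration, the asymmetric rule is the obvious two-threshold variant of the symmetric truncation. For the least squares loss, the two equations in Theorem 20 give δ_1 = 1 − 2d/θ and δ_2 = 1 − 2d, and plugging in f*_φ = 2η − 1 recovers g*_θ. A sketch (ours):

    import numpy as np

    def truncate_asym(f, delta1, delta2):
        # C(f; delta1, delta2): 1 if f > delta1, -1 if f < -delta2, else reject
        f = np.asarray(f)
        return np.where(f > delta1, 1, np.where(f < -delta2, -1, 0))

    d, theta = 0.1, 0.5                            # requires d < theta / (1 + theta)
    delta1, delta2 = 1 - 2 * d / theta, 1 - 2 * d  # least squares thresholds
    eta = np.linspace(0.015, 0.985, 98)
    g = truncate_asym(2 * eta - 1, delta1, delta2) # f*_phi = 2 eta - 1 for least squares
    g_star = np.where(eta > 1 - d / theta, 1, np.where(eta < d, -1, 0))
    assert np.all(g == g_star)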

6. Proofs

Proof of Theorem 1. We first show the "if" part. Recall that

    Q(f) = E[φ(Y f(X))] = E(E[φ(Y f(X)) | X]) = E[η(X)φ(f(X)) + (1 − η(X))φ(−f(X))].

With slight abuse of notation, write

    Q_{η(X)}(f(X)) = η(X)φ(f(X)) + (1 − η(X))φ(−f(X)).

Then f*_φ(X) minimizes Q_{η(X)}(·). We now proceed by separately considering three different scenarios: (a) η(X) < d; (b) η(X) > 1 − d; and (c) d < η(X) < 1 − d. For brevity, we shall suppress the dependence of η and f*_φ on X in the remainder of the proof when no confusion can occur.

First consider the case when η < d. Recall that φ′(−δ) ≤ φ′(δ) < 0 and φ′(δ)/(φ′(−δ) + φ′(δ)) = d. Therefore

    ηφ′(−δ) − (1 − η)φ′(δ) > 0.

By the convexity of φ, for any z > 0,

    φ(z − δ) − φ(−δ) ≥ φ′(−δ) z;    φ(−z + δ) − φ(δ) ≥ −φ′(δ) z.

Hence

    Q_η(z − δ) − Q_η(−δ) ≥ [ηφ′(−δ) − (1 − η)φ′(δ)] z > 0,

which implies that f*_φ ≤ −δ. It now suffices to show that f*_φ ≠ −δ. By the definition of φ′(−δ) and φ′(δ), for any ε > 0 there exists a ζ > 0 such that for any 0 < z < ζ,

    φ(−z − δ) − φ(−δ) ≤ [−φ′(−δ) + ε] z;    φ(z + δ) − φ(δ) ≤ [φ′(δ) + ε] z.

Therefore, for any 0 < z < ζ,

    Q_η(−z − δ) − Q_η(−δ) = η[φ(−z − δ) − φ(−δ)] + (1 − η)[φ(z + δ) − φ(δ)]
                          ≤ η[−φ′(−δ) + ε] z + (1 − η)[φ′(δ) + ε] z
                          = ([(1 − η)φ′(δ) − ηφ′(−δ)] + ε) z.

Recall that (1 − η)φ′(δ) − ηφ′(−δ) < 0. By choosing ε small enough, we can ensure that (1 − η)φ′(δ) − ηφ′(−δ) + ε remains negative. Hence Q_η(−z − δ) < Q_η(−δ),

which implies that f*_φ ≠ −δ, and therefore f*_φ < −δ. Now consider the case when η > 1 − d. Observe that Q_η(z) = Q_{1−η}(−z). From the previous discussion,

    f*_φ = argmin_z Q_η(z) = −argmin_z Q_{1−η}(z) > δ.

At last, consider the case when d < η < 1 − d. Observe that in this case,

    ηφ′(δ) − (1 − η)φ′(−δ) > 0;    ηφ′(−δ) − (1 − η)φ′(δ) < 0.

Hence, for any z > 0,

    Q_η(z + δ) − Q_η(δ) ≥ [ηφ′(δ) − (1 − η)φ′(−δ)] z > 0,

which implies that f*_φ ≤ δ. Similarly,

    Q_η(−z − δ) − Q_η(−δ) ≥ [−ηφ′(−δ) + (1 − η)φ′(δ)] z > 0,

which implies that f*_φ ≥ −δ. In summary, f*_φ ∈ [−δ, δ].

We now consider the "only if" part. Let [a_−, b_−] and [a_+, b_+] be the subdifferentials of φ at −δ and δ, respectively. We need to show that a_− = b_−, a_+ = b_+ and a_+/(a_+ + a_−) = d. We begin by showing that b_+ ≤ 0. Assume the contrary. The infinite sample consistency implies that for any η > 1 − d, f*_φ > δ. Because b_+ > 0, we have φ(f*_φ) > φ(δ). Together with the fact that Q_η(f*_φ) < Q_η(δ), this implies that φ(−f*_φ) < φ(−δ), and subsequently a_− > 0. The convexity of φ also implies that a_− ≤ b_− ≤ a_+ ≤ b_+. Because

    φ(f*_φ) − φ(δ) ≥ b_+ (f*_φ − δ);    φ(−δ) − φ(−f*_φ) ≤ a_− (f*_φ − δ),

we have, for η sufficiently close to 1,

    Q_η(f*_φ) − Q_η(δ) ≥ (η b_+ − (1 − η) a_−)(f*_φ − δ) > 0.

This contradiction shows that b_+ ≤ 0. Given that a_− ≤ b_− ≤ a_+ ≤ b_+ ≤ 0, we have

    b_+/(a_− + b_+) ≤ a_+/(b_− + a_+),

with equality only if a_− = b_− and a_+ = b_+. It therefore suffices to show that b_+/(a_− + b_+) ≥ d and a_+/(b_− + a_+) ≤ d. Assume the contrary. First consider the case when b_+/(a_− + b_+) < d. Let η be such that b_+/(a_− + b_+) < η < d. By definition, for any f < −δ,

    φ(f) − φ(−δ) ≥ a_−(f + δ);    φ(−f) − φ(δ) ≥ −b_+(f + δ).

Hence

    Q_η(f) − Q_η(−δ) ≥ [ηa_− − (1 − η)b_+](f + δ) > 0,

which implies that argmin_z Q_η(z) ≥ −δ. This contradicts the infinite sample consistency, since η < d requires f*_φ < −δ. Therefore b_+/(a_− + b_+) ≥ d. Next we deal with the case a_+/(b_− + a_+) > d. Let η be such that a_+/(b_− + a_+) > η > d. Following a similar argument as before, one can show that Q_{1−η}(f) − Q_{1−η}(δ) > 0 for any f < δ, which implies that argmin_z Q_{1−η}(z) ≥ δ. This again contradicts the infinite sample consistency, because 1 − η < 1 − d. Therefore a_+/(b_− + a_+) ≤ d. The proof is now concluded. ∎

Proof of Theorem 9. Recall that Q_η(f) = ηφ(f) + (1 − η)φ(−f). Similarly, write

    R_η[C(f, δ)] = η ℓ(C(f, δ), 1) + (1 − η) ℓ(C(f, δ), −1).

Also write ΔQ_η(f) = Q_η(f) − inf Q_η(f) and ΔR_η(f) = R_η(f) − inf R_η(f). It suffices to show that

    ΔR_η[C(f, δ)] ≤ 2C [ΔQ_η(f)]^{1/s}.    (10)

The theorem can be deduced from (10) by Jensen's inequality:

    ΔR[C(f, δ)] = E[ ΔR_{η(X)}[C(f(X), δ)] ]
               ≤ 2C E([ΔQ_{η(X)}(f(X))]^{1/s})
               ≤ 2C (E[ΔQ_{η(X)}(f(X))])^{1/s} = 2C [ΔQ(f)]^{1/s}.

To show (10), we consider separately the different combinations of values of η and f. For brevity, we shall suppress their dependence on X in what follows.

Case 1. η < d and f < −δ. As shown before, in this case f*_φ(X) < −δ. Thus

    ΔR_η[C(f, δ)] = 0 ≤ 2C [ΔQ_η(f)]^{1/s}.

Case 2. η < d and |f| ≤ δ. Observe that

    Q_η(f) − Q_η(−δ) ≥ [ηφ′(−δ) − (1 − η)φ′(δ)](f + δ) = −[φ′(−δ) + φ′(δ)](d − η)(f + δ) ≥ 0.

Together with the fact that C^s ΔQ_η(−δ) ≥ (d − η)^s, we have

    ΔQ_η(f) ≥ ΔQ_η(−δ) ≥ C^{−s}(d − η)^s.

Note that ΔR_η[C(f, δ)] = d − η. We conclude that

    ΔR_η[C(f, δ)] ≤ C [ΔQ_η(f)]^{1/s}.

Case 3. η < d and f > δ. Observe that

    Q_η(f) − Q_η(δ) ≥ [ηφ′(δ) − (1 − η)φ′(−δ)](f − δ) = −[φ′(−δ) + φ′(δ)](1 − d − η)(f − δ) ≥ 0.

Together with the facts that d < 1/2 and C^s ΔQ_η(δ) ≥ |1 − d − η|^s, we have

    ΔQ_η(f) ≥ ΔQ_η(δ) ≥ C^{−s}|1 − d − η|^s ≥ (2C)^{−s}|1 − 2η|^s.

Note that ΔR_η[C(f, δ)] = 1 − 2η. Therefore

    ΔR_η[C(f, δ)] ≤ 2C [ΔQ_η(f)]^{1/s}.

Case 4. d < η < 1 − d and f < −δ. Following a similar argument as before,

    Q_η(f) − Q_η(−δ) ≥ −[φ′(−δ) + φ′(δ)](η − d)(−f − δ) ≥ 0.

Therefore

    ΔQ_η(f) ≥ ΔQ_η(−δ) ≥ C^{−s}(η − d)^s,

which, together with the fact that ΔR_η[C(f, δ)] = η − d, implies that

    ΔR_η[C(f, δ)] ≤ C [ΔQ_η(f)]^{1/s}.

Case 5. d < η < 1 − d and |f| ≤ δ. In this case,

    ΔR_η[C(f, δ)] = 0 ≤ 2C [ΔQ_η(f)]^{1/s}.

Case 6. d < η < 1 − d and f > δ. Observe that

    Q_η(f) − Q_η(δ) ≥ −[φ′(−δ) + φ′(δ)](1 − d − η)(f − δ) ≥ 0.

Hence

    ΔQ_η(f) ≥ ΔQ_η(δ) ≥ C^{−s}(1 − d − η)^s,

which, together with the fact that ΔR_η[C(f, δ)] = 1 − d − η, implies that

    ΔR_η[C(f, δ)] ≤ C [ΔQ_η(f)]^{1/s}.

Case 7. η > 1 − d. Observe that ΔR_η[C(f, δ)] = ΔR_{1−η}[C(−f, δ)] and ΔQ_η(f) = ΔQ_{1−η}(−f). Because 1 − η < d, from Cases 1, 2 and 3 we have

    ΔR_η[C(f, δ)] = ΔR_{1−η}[C(−f, δ)] ≤ 2C [ΔQ_{1−η}(−f)]^{1/s} = 2C [ΔQ_η(f)]^{1/s}.

The proof is therefore complete. ∎

Proof of Theorem 10. The proof follows from the same argument as that of Theorem 9. The only difference occurs in Case 3, where under the current assumptions

    ΔR_η(f) = 1 − 2η ≤ C [ΔQ_η(δ)]^{1/s}. ∎

Proof of Theorem 11. The last part of the proof is based on the proof of Theorem 3 in Bartlett, Jordan and McAuliffe (2006). Let g = C(f, δ) be the classification rule with reject option based on f: 𝒳 → ℝ, and set g* = C(f*_φ, δ). We have shown above that, under the assumptions of Theorem 9,

    |η − d| 1{g ≠ g*}(1{g = −1} + 1{g* = −1}) ≤ 2C [ΔQ_η(f)]^{1/s};
    |1 − d − η| 1{g ≠ g*}(1{g = 1} + 1{g* = 1}) ≤ 2C [ΔQ_η(f)]^{1/s}.

Moreover, Lemma 1 in Herbei and Wegkamp (2006) states that

    ΔR(g) = E[ |η(X) − d| 1{g(X) ≠ g*(X)}(1{g(X) = −1} + 1{g*(X) = −1}) ]
          + E[ |1 − d − η(X)| 1{g(X) ≠ g*(X)}(1{g(X) = 1} + 1{g*(X) = 1}) ].    (11)

Hence, for any ε > 0,

    ΔR(g) = E[ |η(X) − d| 1{|η(X) − d| ≤ ε} 1{g(X) ≠ g*(X)}(1{g(X) = −1} + 1{g*(X) = −1}) ]
          + E[ |η(X) − d| 1{|η(X) − d| > ε} 1{g(X) ≠ g*(X)}(1{g(X) = −1} + 1{g*(X) = −1}) ]
          + E[ |1 − d − η(X)| 1{|1 − d − η(X)| ≤ ε} 1{g(X) ≠ g*(X)}(1{g(X) = 1} + 1{g*(X) = 1}) ]
          + E[ |1 − d − η(X)| 1{|1 − d − η(X)| > ε} 1{g(X) ≠ g*(X)}(1{g(X) = 1} + 1{g*(X) = 1}) ]
          ≤ 2ε P{g*(X) ≠ g(X)} + ε^{1−s}(2C)^s ΔQ(f),

where we used (11) and the inequality |x| 1{|x| ≥ ε} ≤ |x|^r ε^{1−r} for r ≥ 1. Using the bound

    P{g(X) ≠ g*(X)} ≤ [(8A)^{1/α} ΔR(g)]^β

from the proof of Lemma 4 of Herbei and Wegkamp (2006), and choosing ε = c[ΔR(g)]^{1−β} with c = [(8A)^{1/α}]^{−β}/4, readily gives the desired claim with K = 4Cc^{1−s}. ∎

Proof of Theorem 18. Recall that f̄ ∈ 𝓕 minimizes Q(f) over f ∈ 𝓕. Let h(Y f(X)) = φ(Y f(X)) − φ(Y f̄(X)). Since

    [Q(f) + Q(f̄)]/2 ≥ Q((f + f̄)/2) + c E[(f − f̄)²(X)] ≥ Q(f̄) + c E[(f − f̄)²(X)],

we have

    E[h²(Y f(X))] ≤ L² E[(f − f̄)²(X)] ≤ (L²/(2c)) {Q(f) − Q(f̄)} = (L²/(2c)) E[h(Y f(X))];

see, for example, Bartlett, Jordan and McAuliffe (2006). Since f̂_n minimizes Q_n(f), we have

    Q(f̂_n) − Q(f̄) = P h(Y f̂_n(X)) = P_n h(Y f̂_n(X)) + (P − P_n) h(Y f̂_n(X))
                   ≤ P_n h(Y f̄(X)) + (P − P_n) h(Y f̂_n(X))
                   ≤ sup_{f∈𝓕} (P − P_n) h(Y f(X)),

where P h(Y f(X)) = E[h(Y f(X))] and P_n h(Y f(X)) = (1/n) Σ_{i=1}^n h(Y_i f(X_i)) for any f ∈ 𝓕. Next we observe that

    sup_{f∈𝓕} (P − P_n) h(Y f(X)) ≤ 3L/n + max_{f∈𝓕_n} (P − P_n) h(Y f(X)),

where 𝓕_n is a minimal 1/n-net of 𝓕. By Bernstein's inequality, we get

    P{ max_{f∈𝓕_n} (P − P_n) h(Y f(X)) ≥ t }
      ≤ N_n exp( − n{t + P h(Y f(X))}²/8 / ( P h²(Y f(X)) + (2LB){t + P h(Y f(X))}/6 ) )
      ≤ N_n exp( − n t (8L²/c + 2LB/3)^{−1} ),

and the conclusion follows easily. ∎

Acknowledgments

The authors gratefully acknowledge that part of this research was done while they were visiting the Isaac Newton Institute for Mathematical Sciences (Statistical Theory and Methods for Complex, High-Dimensional Data Programme) at Cambridge University during Spring 2008. The research of Ming Yuan was supported in part by NSF grants DMS-MPSA and DMS. The research of Marten Wegkamp was supported in part by NSF grant DMS.

References

P.L. Bartlett, M.I. Jordan and J. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138-156, 2006.

P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823-1840, 2008.

J. Friedman, T. Hastie and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337-407, 2000.

R. Herbei and M.H. Wegkamp. Classification with reject option. Canadian Journal of Statistics, 34(4):709-721, 2006.

Y. Lin. Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery, 6:259-275, 2002.

E. Mammen and A.B. Tsybakov. Smooth discrimination analysis. Annals of Statistics, 27(6):1808-1829, 1999.

J.S. Marron, M. Todd and J. Ahn. Distance weighted discrimination. Journal of the American Statistical Association, 102:1267-1271, 2007.

B.D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.

A.B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135-166, 2004.

M.H. Wegkamp. Lasso type classifiers with a reject option. Electronic Journal of Statistics, 1:155-168, 2007.

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56-134, 2004.


More information

Integration Review. May 11, 2013

Integration Review. May 11, 2013 Integration Review May 11, 2013 Goals: Review the funamental theorem of calculus. Review u-substitution. Review integration by parts. Do lots of integration eamples. 1 Funamental Theorem of Calculus In

More information

4.2 First Differentiation Rules; Leibniz Notation

4.2 First Differentiation Rules; Leibniz Notation .. FIRST DIFFERENTIATION RULES; LEIBNIZ NOTATION 307. First Differentiation Rules; Leibniz Notation In this section we erive rules which let us quickly compute the erivative function f (x) for any polynomial

More information

Linear Regression with Limited Observation

Linear Regression with Limited Observation Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear

More information

Assignment 1. g i (x 1,..., x n ) dx i = 0. i=1

Assignment 1. g i (x 1,..., x n ) dx i = 0. i=1 Assignment 1 Golstein 1.4 The equations of motion for the rolling isk are special cases of general linear ifferential equations of constraint of the form g i (x 1,..., x n x i = 0. i=1 A constraint conition

More information

Fall 2016: Calculus I Final

Fall 2016: Calculus I Final Answer the questions in the spaces provie on the question sheets. If you run out of room for an answer, continue on the back of the page. NO calculators or other electronic evices, books or notes are allowe

More information

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors

Math Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

A. Incorrect! The letter t does not appear in the expression of the given integral

A. Incorrect! The letter t does not appear in the expression of the given integral AP Physics C - Problem Drill 1: The Funamental Theorem of Calculus Question No. 1 of 1 Instruction: (1) Rea the problem statement an answer choices carefully () Work the problems on paper as neee (3) Question

More information

Schrödinger s equation.

Schrödinger s equation. Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of

More information

Students need encouragement. So if a student gets an answer right, tell them it was a lucky guess. That way, they develop a good, lucky feeling.

Students need encouragement. So if a student gets an answer right, tell them it was a lucky guess. That way, they develop a good, lucky feeling. Chapter 8 Analytic Functions Stuents nee encouragement. So if a stuent gets an answer right, tell them it was a lucky guess. That way, they evelop a goo, lucky feeling. 1 8.1 Complex Derivatives -Jack

More information

Energy behaviour of the Boris method for charged-particle dynamics

Energy behaviour of the Boris method for charged-particle dynamics Version of 25 April 218 Energy behaviour of the Boris metho for charge-particle ynamics Ernst Hairer 1, Christian Lubich 2 Abstract The Boris algorithm is a wiely use numerical integrator for the motion

More information

Learning with Rejection

Learning with Rejection Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 12 EFFICIENT LEARNING So far, our focus has been on moels of learning an basic algorithms for those moels. We have not place much emphasis on how to learn quickly.

More information

Equilibrium in Queues Under Unknown Service Times and Service Value

Equilibrium in Queues Under Unknown Service Times and Service Value University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University

More information

A FURTHER REFINEMENT OF MORDELL S BOUND ON EXPONENTIAL SUMS

A FURTHER REFINEMENT OF MORDELL S BOUND ON EXPONENTIAL SUMS A FURTHER REFINEMENT OF MORDELL S BOUND ON EXPONENTIAL SUMS TODD COCHRANE, JEREMY COFFELT, AND CHRISTOPHER PINNER 1. Introuction For a prime p, integer Laurent polynomial (1.1) f(x) = a 1 x k 1 + + a r

More information

Final Exam Study Guide and Practice Problems Solutions

Final Exam Study Guide and Practice Problems Solutions Final Exam Stuy Guie an Practice Problems Solutions Note: These problems are just some of the types of problems that might appear on the exam. However, to fully prepare for the exam, in aition to making

More information

Exam 2 Review Solutions

Exam 2 Review Solutions Exam Review Solutions 1. True or False, an explain: (a) There exists a function f with continuous secon partial erivatives such that f x (x, y) = x + y f y = x y False. If the function has continuous secon

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information