Classification Methods with Reject Option Based on Convex Risk Minimization
Journal of Machine Learning Research 11 (2010) Submitted 2/09; Revised 11/09; Published 1/10

Classification Methods with Reject Option Based on Convex Risk Minimization

Ming Yuan
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Marten Wegkamp
Department of Statistics, Florida State University, Tallahassee, FL 32306, USA

Editor: Bin Yu

Abstract

In this paper, we investigate the problem of binary classification with a reject option, in which one can withhold the decision of classifying an observation at a cost lower than that of misclassification. Since the natural loss function is non-convex, so that empirical risk minimization easily becomes infeasible, the paper proposes minimizing convex risks based on surrogate convex loss functions. A necessary and sufficient condition for infinite sample consistency (both risks share the same minimizer) is provided. Moreover, we show that the excess risk can be bounded through the excess surrogate risk under appropriate conditions. These bounds can be tightened by a generalized margin condition. The impact of the results is illustrated on several commonly used surrogate loss functions.

Keywords: classification, convex surrogate loss, empirical risk minimization, generalized margin condition, reject option

1. Introduction

In binary classification, one observes independent realizations $(X_1, Y_1), \ldots, (X_n, Y_n)$ of the random pair $(X, Y)$, where $X \in \mathcal{X}$ and $Y \in \mathcal{Y} = \{-1, 1\}$. The goal is to learn from these training data a classification rule $g: \mathcal{X} \to \mathcal{Y}$ that assigns an observation $X$ to one of the two classes. It is recognized that in many applications the consequences of misclassification can be substantial. In such situations, a less specific response that reserves the right of not making a decision, sometimes referred to as a reject option (see, e.g., Herbei and Wegkamp, 2006), may be preferable to risking misclassification.
This, for example, is typical in medical studies where screening for a certain disease can be done based on relatively inexpensive clinical measurements. If the classification based on these measurements is satisfactory, nothing further needs to be done. But in the event that there are ambiguities, it would be more desirable to take the reject option and seek more expensive studies to identify a subject's disease status. Similar approaches are often adopted in DNA sequencing or genotyping applications, where the reject option is commonly referred to as a no-call. Such problems have attracted much attention in various application fields and have also received increasing

© 2010 Ming Yuan and Marten Wegkamp.
amounts of interest more recently in the machine learning literature. See Ripley (1996) and Bartlett and Wegkamp (2008) and references therein.

To accommodate the reject option, we now seek a classification rule $g: \mathcal{X} \to \tilde{\mathcal{Y}}$, where $\tilde{\mathcal{Y}} = \{-1, 0, 1\}$ is an augmented response space and $g(X) = 0$ indicates that no definitive classification will be made for $X$, that is, that the reject option is taken. To measure the performance of a classification rule, we employ the following loss function, which generalizes the usual 0-1 loss to account for the reject option:

$$\ell[g(X), Y] = \begin{cases} 1 & \text{if } g(X) \ne Y \text{ and } g(X) \ne 0 \\ d & \text{if } g(X) = 0 \\ 0 & \text{if } g(X) = Y. \end{cases}$$

In other words, an ambiguous response ($g(X) = 0$) incurs a loss of $d$ whereas misclassification incurs a loss of 1. Note that $d$ is necessarily smaller than $1/2$. Otherwise, rather than taking the reject option at a loss of $d$, we could always flip a fair coin to randomly assign $\pm 1$ as the value of $g(X)$, which incurs an average loss of $1/2$. For this reason, we shall assume that $d < 1/2$ in what follows.

For any classification rule $g: \mathcal{X} \to \tilde{\mathcal{Y}}$, the risk function is then given by $R(g) = \mathbb{E}(\ell[g(X), Y])$, where the expectation is taken over the joint distribution of $X$ and $Y$. It is not hard to show that the optimal classification rule $g^* := \arg\min R(g)$ is given by (see, e.g., Bartlett and Wegkamp, 2008)

$$g^*(X) = \begin{cases} 1 & \text{if } \eta(X) > 1 - d \\ 0 & \text{if } d \le \eta(X) \le 1 - d \\ -1 & \text{if } \eta(X) < d, \end{cases}$$

where $\eta(X) = P(Y = 1 \mid X)$. The corresponding risk is

$$R^* := \inf R(g) = R(g^*) = \mathbb{E}(\min\{\eta(X), 1 - \eta(X), d\}).$$

Thus, the performance of any classification rule $g: \mathcal{X} \to \tilde{\mathcal{Y}}$ can be measured by the excess risk $\Delta R(g) := R(g) - R^*$.

Appealing to the general empirical risk minimization strategy, one could attempt to derive a classification rule from the training data by minimizing the empirical risk

$$R_n(g) = \frac{1}{n}\sum_{i=1}^n \ell[g(X_i), Y_i].$$

Similar to the usual 0-1 loss, however, $\ell$ is not convex in $g$, and direct minimization of $R_n$ is typically an NP-hard problem. A common remedy is to consider a surrogate convex loss function. To this end, let $\phi: \mathbb{R} \to \mathbb{R}$ be a convex function.
Denote by $Q(f) = \mathbb{E}[\phi(Y f(X))]$ the corresponding risk for a discriminant function $f: \mathcal{X} \to \mathbb{R}$. Let $\hat f_n$ be the minimizer of

$$Q_n(f) = \frac{1}{n}\sum_{i=1}^n \phi(Y_i f(X_i))$$
over a certain function space $\mathcal{F}$ consisting of functions that map from $\mathcal{X}$ to $\mathbb{R}$. $\hat f_n$ can be conveniently converted to a classification rule $C(\hat f_n; \delta)$ as follows:

$$C(f(X); \delta) = \begin{cases} 1 & \text{if } f(X) > \delta \\ 0 & \text{if } |f(X)| \le \delta \\ -1 & \text{if } f(X) < -\delta, \end{cases}$$

where $\delta > 0$ is a parameter that, as we shall see, plays a critical role in determining the performance of $C(\hat f_n; \delta)$.

In this paper, we investigate the statistical properties of this general convex risk minimization technique. To what extent $C(\hat f_n, \delta)$ mimics the optimal classification rule $g^*$ plays a critical role in the success of this technique. Let $f^*_\phi$ be the minimizer of $Q(f)$. We shall assume throughout the paper that $f^*_\phi$ is uniquely defined. Typically $f^*_\phi$ reflects the limiting behavior of $\hat f_n$ when $\mathcal{F}$ is rich enough and there are infinitely many training data. Therefore the first question is whether or not $f^*_\phi$ can be used to recover the optimal rule $g^*$. A surrogate loss function $\phi$ that satisfies this property is often called infinite sample consistent (see, e.g., Zhang, 2004) or classification calibrated (see, e.g., Bartlett, Jordan and McAuliffe, 2006). A second question concerns the relationship between the excess risk $\Delta R[C(f,\delta)]$ and the excess $\phi$-risk $\Delta Q(f) = Q(f) - \inf Q(f)$: can we find an increasing function $\rho: \mathbb{R} \to \mathbb{R}$ such that, for all $f$,

$$\Delta R[C(f,\delta)] \le \rho(\Delta Q(f))? \quad (1)$$

Clearly the infinite sample consistency of $\phi$ implies that $\rho(0) = 0$. Such a bound on the excess risk provides a useful tool for bounding the excess risk of $\hat f_n$. In particular, (1) indicates that

$$\Delta R[C(\hat f_n, \delta)] \le \rho(\Delta Q(\hat f_n)) = \rho\left[\Delta Q(\bar f) + \left(\Delta Q(\hat f_n) - \Delta Q(\bar f)\right)\right],$$

where $\bar f = \arg\min_{f \in \mathcal{F}} Q(f)$. The first term $\Delta Q(\bar f)$ on the right-hand side exhibits the approximation error of the function class $\mathcal{F}$, whereas the second term $\Delta Q(\hat f_n) - \Delta Q(\bar f)$ is the estimation error. In the case when there is no reject option, these problems have been well investigated in recent years (Lin, 2002; Zhang, 2004; Bartlett, Jordan and McAuliffe, 2006).
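The conversion from a discriminant value to a decision in $\{-1, 0, 1\}$ is a simple thresholding step; a minimal sketch (illustrative names):

```python
def threshold_rule(f_value, delta):
    """The rule C(f; delta): convert a discriminant value f(x) into {-1, 0, 1}.

    delta > 0 is the rejection band half-width; |f(x)| <= delta means reject.
    """
    assert delta > 0
    if f_value > delta:
        return 1
    if f_value < -delta:
        return -1
    return 0
```

The whole question studied below is how to pick `delta` for a given surrogate loss so that this thresholded rule agrees with the Bayes rule $g^*$ in the large-sample limit.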
In this paper, we establish similar results when there is a reject option. The most significant difference between the two situations, with or without the reject option, is the role of $\delta$. As we shall see, for some loss functions, such as the least squares, exponential or logistic losses, a good choice of $\delta$ yields classifiers that are infinite sample consistent. For other loss functions, however, such as the hinge loss, no matter how $\delta$ is chosen, the classification rule $C(f,\delta)$ cannot be infinite sample consistent.

The remainder of the paper is organized as follows. We first examine in Section 2 infinite sample consistency for classification with a reject option. After establishing a general result, we consider its implications for several commonly used loss functions. In Section 3, we establish bounds on the excess risk of the form (1), followed by applications to the popular loss functions. We also show that under an additional assumption on the behavior of $\eta(X)$ near $d$ and $1-d$, as in Herbei and Wegkamp (2006), generalizing the condition for the case $d = 1/2$ of Mammen and Tsybakov (1999) and Tsybakov (2004), the bound (1) can be tightened considerably. Section 4 discusses rates of convergence of the empirical risk minimizer $\hat f_n$ that minimizes the empirical risk $Q_n(f)$ over a bounded class $\mathcal{F}$. Section 5 considers an extension to asymmetric loss, where one type of misclassification may be more costly than the other. All proofs are relegated to Section 6.
2. Infinite Sample Consistency

We first give a general result on the infinite sample consistency of the classification rule $C(f^*_\phi, \delta)$.

Theorem 1 Assume that $\phi$ is convex. Then the classification rule $C(f^*_\phi, \delta)$ for some $\delta > 0$ is infinite sample consistent, that is, $C(f^*_\phi, \delta) = g^*$, if and only if $\phi'(\delta)$ and $\phi'(-\delta)$ both exist, $\phi'(\delta) < 0$, and

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = d. \quad (2)$$

When there is no reject option, it is known that the necessary and sufficient condition for infinite sample consistency is that $\phi$ is differentiable at 0 and $\phi'(0) < 0$ (see, e.g., Bartlett, Jordan and McAuliffe, 2006). As indicated by Theorem 1, the differentiability of $\phi$ at $\pm\delta$ plays a more prominent role in the general case with a reject option. From Theorem 1 it is also evident that infinite sample consistency depends on both $\phi$ and the choice of the thresholding parameter $\delta$. Observe that for any $\delta_1 < \delta_2$,

$$\phi'(-\delta_2) \le \phi'(-\delta_1) \le \phi'(\delta_1) \le \phi'(\delta_2),$$

which implies that the left-hand side of (2) is a decreasing function of $\delta$. If $\phi$ is strictly convex, then it is strictly decreasing, and therefore there is at most one value of $\delta$ that satisfies (2). In other words, for strictly convex $\phi$, there is at most one thresholding parameter $\delta$ such that $C(f^*_\phi, \delta) = g^*$. On the other hand, if $\phi$ is twice differentiable with $\phi'(0) < 0$ and $\phi'(z) \to 0$ as $z \to +\infty$, then for any $d < 1/2$ there always exists a $\delta > 0$ such that (2) holds. This is because the left-hand side of (2) is a decreasing function of $\delta$ that approaches its supremum $1/2$ as $\delta \to 0$ and tends to 0 as $\delta$ increases. Moreover, the twice differentiability of $\phi$ ensures that the left-hand side of (2) is also a continuous function of $\delta$. The following is therefore a direct consequence of Theorem 1:

Corollary 2 If $\phi$ is strictly convex, then either there is a unique $\delta > 0$ such that $C(f^*_\phi, \delta)$ is infinite sample consistent, or $C(f^*_\phi, \delta)$ is not infinite sample consistent for any $\delta > 0$.
In addition to convexity, if $\phi$ is twice differentiable with $\phi'(0) < 0$ and $\phi'(z) \to 0$ as $z \to +\infty$, then there always exists a $\delta > 0$ such that $C(f^*_\phi, \delta)$ is infinite sample consistent.

Theorem 1 provides a general guideline on how to choose $\delta$ for common choices of convex losses. Below we look at several concrete examples.

2.1 Least Squares Loss

We first examine the least squares loss $\phi(z) = (1-z)^2$. Observe that

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \frac{1-\delta}{2}.$$

All conditions of Theorem 1 are met if and only if $\delta = 1 - 2d$.

Corollary 3 For the least squares loss, $C(f^*_\phi, 1-2d) = g^*$.
2.2 Exponential Loss

The exponential loss, $\phi(z) = \exp(-z)$, is connected with boosting (Friedman, Hastie and Tibshirani, 2000). Because

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \frac{1}{1 + \exp(2\delta)},$$

all conditions of Theorem 1 are met if and only if $\delta = \frac{1}{2}\log\left(\frac{1-d}{d}\right)$.

Corollary 4 For the exponential loss, $C\left(f^*_\phi, \frac{1}{2}\log\left(\frac{1-d}{d}\right)\right) = g^*$.

2.3 Logistic Loss

Logistic regression employs the loss $\phi(z) = \ln(1 + \exp(-z))$. Similar to before,

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \frac{1}{1 + \exp(\delta)},$$

which shows that all conditions of Theorem 1 are met if and only if $\delta = \log\left(\frac{1-d}{d}\right)$.

Corollary 5 For the logistic loss, $C\left(f^*_\phi, \log\left(\frac{1-d}{d}\right)\right) = g^*$.

2.4 Squared Hinge Loss

The squared hinge loss, $\phi(z) = [(1-z)_+]^2$, is another popular choice, for which

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \frac{1-\delta}{2}$$

for $0 < \delta < 1$. Similar to the least squares loss, we have the following corollary.

Corollary 6 For the squared hinge loss, $C(f^*_\phi, 1-2d) = g^*$.
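The closed-form thresholds of Corollaries 3-5 can be checked numerically against condition (2); the sketch below (Python for illustration; the helper names are ours) evaluates the left-hand side of (2) at the prescribed $\delta$ for each loss and confirms that it equals $d$:

```python
import math

def delta_least_squares(d):   # Corollary 3: delta = 1 - 2d
    return 1 - 2 * d

def delta_exponential(d):     # Corollary 4: delta = (1/2) log((1-d)/d)
    return 0.5 * math.log((1 - d) / d)

def delta_logistic(d):        # Corollary 5: delta = log((1-d)/d)
    return math.log((1 - d) / d)

def calibration_ratio(phi_prime, delta):
    """Left-hand side of condition (2): phi'(delta) / (phi'(delta) + phi'(-delta))."""
    return phi_prime(delta) / (phi_prime(delta) + phi_prime(-delta))

# derivatives phi'(z) of the three losses
ls = lambda z: -2 * (1 - z)              # least squares
ex = lambda z: -math.exp(-z)             # exponential
lo = lambda z: -1.0 / (1.0 + math.exp(z))  # logistic

d = 0.2
for phi_prime, delta in [(ls, delta_least_squares(d)),
                         (ex, delta_exponential(d)),
                         (lo, delta_logistic(d))]:
    assert abs(calibration_ratio(phi_prime, delta) - d) < 1e-12
```

Changing `d` to any value in $(0, 1/2)$ leaves the assertions true, which is exactly the content of condition (2) for these three losses.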
2.5 Distance Weighted Discrimination

Marron, Todd and Ahn (2007) recently introduced the so-called distance weighted discrimination method, which uses the following loss function (see, e.g., Bartlett, Jordan and McAuliffe, 2006):

$$\phi(z) = \begin{cases} 1/z & \text{if } z \ge \gamma \\ \dfrac{1}{\gamma}\left(2 - \dfrac{z}{\gamma}\right) & \text{if } z < \gamma, \end{cases} \quad (3)$$

where $\gamma > 0$ is a constant. It is not hard to see that $\phi$ is convex. Moreover,

$$\phi'(z) = \begin{cases} -1/z^2 & \text{if } z \ge \gamma \\ -1/\gamma^2 & \text{if } z < \gamma. \end{cases}$$

Thus,

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \begin{cases} 1/2 & \text{if } \delta < \gamma \\ \dfrac{1/\delta^2}{1/\delta^2 + 1/\gamma^2} & \text{if } \delta > \gamma. \end{cases}$$

In other words, we have the following result for the distance weighted discrimination loss.

Corollary 7 For the loss (3), $C\left(f^*_\phi, [(1-d)/d]^{1/2}\gamma\right) = g^*$.

2.6 Hinge Loss

The popular support vector machine employs the hinge loss, $\phi(z) = (1-z)_+$. The hinge loss is differentiable everywhere except at 1. Therefore

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \begin{cases} 1/2 & \text{if } 0 < \delta < 1 \\ 0 & \text{if } \delta > 1. \end{cases}$$

Because $0 < d < 1/2$, there does not exist a $\delta$ such that all conditions of Theorem 1 are met. As a matter of fact, for any $\delta > 0$, $C(f^*_\phi, \delta) \ne g^*$. Motivated by this observation, Bartlett and Wegkamp (2008) introduced the following modification of the hinge loss:

$$\phi(z) = \begin{cases} 1 - az & \text{if } z \le 0 \\ 1 - z & \text{if } 0 < z \le 1 \\ 0 & \text{if } z > 1, \end{cases} \quad (4)$$

where $a > 1$. Note that with this modification,

$$\frac{\phi'(\delta)}{\phi'(\delta) + \phi'(-\delta)} = \begin{cases} 1/(a+1) & \text{if } 0 < \delta < 1 \\ 0 & \text{if } \delta > 1. \end{cases}$$

Therefore, we have the following corollary.

Corollary 8 For the modified hinge loss (4) and any $0 < \delta < 1$, if $a = (1-d)/d$, then $C(f^*_\phi, \delta) = g^*$.

It is interesting to note that for the examples considered previously a specific choice of $\delta$ is needed to ensure infinite sample consistency, whereas for the modified hinge loss a whole range of choices of $\delta$ serves the same purpose. However, as we shall see in the next section, different choices of $\delta$ for the modified hinge loss may result in slightly different bounds on the excess risk, with $\delta = 1/2$ appearing preferable in that it yields the smallest upper bound on the excess risk.
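Corollary 8 can be illustrated numerically: with $a = (1-d)/d$, the population minimizer of $Q_\eta(z) = \eta\phi(z) + (1-\eta)\phi(-z)$ sits at $-1$, $0$ or $1$ exactly as the Bayes rule $g^*$ prescribes. A minimal sketch (illustrative names; a brute-force grid search stands in for the exact minimization):

```python
def modified_hinge(z, a):
    """Generalized hinge loss of Bartlett and Wegkamp (2008), Equation (4)."""
    if z <= 0:
        return 1 - a * z
    if z <= 1:
        return 1 - z
    return 0.0

def pop_minimizer(eta, d, grid_step=1e-3):
    """Grid-search the minimizer of Q_eta(z) = eta*phi(z) + (1-eta)*phi(-z) on [-2, 2]."""
    a = (1 - d) / d
    zs = [i * grid_step for i in range(-2000, 2001)]
    return min(zs, key=lambda z: eta * modified_hinge(z, a)
                                  + (1 - eta) * modified_hinge(-z, a))

d = 0.2
assert pop_minimizer(0.05, d) == -1.0      # eta < d: minimizer at -1
assert abs(pop_minimizer(0.5, d)) < 1e-9   # d < eta < 1-d: minimizer at 0
assert pop_minimizer(0.95, d) == 1.0       # eta > 1-d: minimizer at +1
```

Thresholding at any $0 < \delta < 1$ therefore reproduces $g^*$, which is the content of Corollary 8.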
3. Excess Risk

We now turn to the excess risk $\Delta R[C(f,\delta)]$ and show how it can be bounded through the excess $\phi$-risk $\Delta Q(f) := Q(f) - Q(f^*_\phi)$. Recall that the infinite sample consistency established in the previous section means that $\Delta Q(f) = 0$ implies $\Delta R(C(f,\delta)) = 0$. For brevity, we shall assume implicitly throughout this section that $\delta$ is chosen in accordance with Theorem 1 to ensure infinite sample consistency. Write

$$Q_{\eta(X)}(z) = \eta(X)\phi(z) + (1-\eta(X))\phi(-z).$$

By definition,

$$Q_{\eta(X)}(f^*_\phi(X)) = \inf_z Q_{\eta(X)}(z).$$

Denote

$$\Delta Q_\eta(f) = Q_\eta(f) - Q_\eta(f^*_\phi),$$

where we suppress the dependence of $\eta$, $f$ and $f^*_\phi$ on $X$ for brevity.

Theorem 9 Assume that $\phi$ is convex, $\phi'(\delta)$ and $\phi'(-\delta)$ both exist, $\phi'(\delta) < 0$, and (2) holds. In addition, suppose that there exist constants $C > 0$ and $s \ge 1$ such that

$$|d - \eta|^s \le C^s \Delta Q_\eta(-\delta); \qquad |1 - d - \eta|^s \le C^s \Delta Q_\eta(\delta).$$

Then

$$\Delta R[C(f,\delta)] \le 2C[\Delta Q(f)]^{1/s}. \quad (5)$$

It is immediate from Theorem 9 that $\Delta Q(\hat f_n) \to_p 0$ implies $\Delta R(C(\hat f_n, \delta)) \to_p 0$. In other words, consistency in terms of the $\phi$-risk implies consistency in terms of the loss $\ell$. It is worth noting that the constant in the upper bound can be tightened under stronger conditions.

Theorem 10 In addition to the assumptions of Theorem 9, assume that

$$(2\eta - 1)_+^s \le C^s \Delta Q_\eta(-\delta); \qquad (1 - 2\eta)_+^s \le C^s \Delta Q_\eta(\delta).$$

Then $\Delta R[C(f,\delta)] \le C[\Delta Q(f)]^{1/s}$.

We can improve the bounds even further using the following margin condition. Assume that for some $\alpha \ge 0$ and $A \ge 1$,

$$P\{|\eta(X) - z| \le t\} \le A t^\alpha \quad (6)$$

for all $0 \le t < d$, at $z = d$ and $z = 1 - d$. This assumption was introduced in Herbei and Wegkamp (2006) and generalizes the margin condition of Mammen and Tsybakov (1999) and Tsybakov (2004). It is always met for $\alpha = 0$ and $A = 1$. The other extreme is $\alpha \to +\infty$, the case where $\eta(X)$ stays away from $d$ and $1 - d$ with probability one.
Theorem 11 In addition to the assumptions of Theorem 9, assume that (6) holds for some $\alpha \ge 0$ and $A \ge 1$. Then, for some $K$ depending on $A$ and $\alpha$,

$$\Delta R[C(f,\delta)] \le K[\Delta Q(f)]^{1/(s + \beta - \beta s)}, \quad (7)$$

where $\beta = \alpha/(1+\alpha)$.

In case $\alpha = 0$, the exponent $1/(s + \beta - \beta s)$ on the right-hand side of (7) reduces to $1/s$, and the situation is as in Theorem 9. As $\alpha \to +\infty$, the bound (7) improves upon the one in Theorem 9, as the exponent $1/(s + \beta - \beta s)$ converges to 1. We now examine the consequences of Theorems 9, 10 and 11 for several common loss functions.

3.1 Least Squares

Note that for the least squares loss

$$\Delta Q_\eta(f) = (2\eta - 1 - f)^2.$$

Simple algebraic manipulations show that

$$\Delta Q_\eta(-\delta) = 4(d - \eta)^2; \qquad \Delta Q_\eta(\delta) = 4(1 - d - \eta)^2.$$

Therefore, by Theorems 9 and 11,

Corollary 12 For the least squares loss,

$$\Delta R[C(f, 1 - 2d)] \le [\Delta Q(f)]^{1/2}.$$

Furthermore, if the margin condition (6) holds, then for some constant $K > 0$,

$$\Delta R[C(f, 1 - 2d)] \le K[\Delta Q(f)]^{(1+\alpha)/(2+\alpha)}.$$

3.2 Exponential Loss

An application of Taylor expansion yields (see, e.g., Zhang, 2004)

$$\Delta Q_\eta(f) \ge \left(\eta - \frac{1}{1 + \exp(-2f)}\right)^2.$$

Therefore,

$$\Delta Q_\eta(-\delta) \ge (\eta - d)^2; \qquad \Delta Q_\eta(\delta) \ge (1 - d - \eta)^2.$$

Therefore, by Theorems 9 and 11,
Corollary 13 For the exponential loss,

$$\Delta R\left[C\left(f, \tfrac{1}{2}\log\tfrac{1-d}{d}\right)\right] \le 2[\Delta Q(f)]^{1/2}.$$

Furthermore, if the margin condition (6) holds, then

$$\Delta R\left[C\left(f, \tfrac{1}{2}\log\tfrac{1-d}{d}\right)\right] \le K[\Delta Q(f)]^{(1+\alpha)/(2+\alpha)}$$

for some constant $K > 0$.

3.3 Logistic Loss

Similar to the exponential loss, an application of Taylor expansion yields

$$\Delta Q_\eta(f) \ge \left(\eta - \frac{1}{1 + \exp(-f)}\right)^2.$$

Therefore,

$$\Delta Q_\eta(-\delta) \ge (\eta - d)^2; \qquad \Delta Q_\eta(\delta) \ge (1 - d - \eta)^2.$$

Therefore, by Theorems 9 and 11,

Corollary 14 For the logistic loss,

$$\Delta R\left[C\left(f, \log\tfrac{1-d}{d}\right)\right] \le 2[\Delta Q(f)]^{1/2}.$$

Furthermore, if the margin condition (6) holds, then

$$\Delta R\left[C\left(f, \log\tfrac{1-d}{d}\right)\right] \le K[\Delta Q(f)]^{(1+\alpha)/(2+\alpha)}$$

for some constant $K > 0$.

3.4 Squared Hinge Loss

Simple algebraic derivation shows

$$\Delta Q_\eta(f) = (2\eta - 1 - f)^2 - \eta[(f-1)_+]^2 - (1-\eta)[(f+1)_-]^2.$$

Therefore,

$$\Delta Q_\eta(-\delta) = 4(d - \eta)^2; \qquad \Delta Q_\eta(\delta) = 4(1 - d - \eta)^2.$$

By Theorems 9 and 11,

Corollary 15 For the squared hinge loss,

$$\Delta R[C(f, 1 - 2d)] \le [\Delta Q(f)]^{1/2}.$$

Furthermore, if the margin condition (6) holds, then

$$\Delta R[C(f, 1 - 2d)] \le K[\Delta Q(f)]^{(1+\alpha)/(2+\alpha)}$$

for some constant $K > 0$.
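The proof of Theorem 9 actually establishes the bound of Corollary 12 conditionally on $X$, that is, $\Delta R_\eta[C(f, 1-2d)] \le [\Delta Q_\eta(f)]^{1/2}$ pointwise for the least squares loss. This pointwise form can be checked by direct simulation; a sketch (illustrative names):

```python
import random

def excess_risks_ls(eta, f, d):
    """Conditional excess risks for the least squares loss with delta = 1 - 2d.

    Returns (excess 0-1-d risk of C(f, delta), excess phi-risk (2*eta - 1 - f)^2).
    """
    delta = 1 - 2 * d
    g = 1 if f > delta else (-1 if f < -delta else 0)  # decision of C(f; delta)
    risk = {1: 1 - eta, -1: eta, 0: d}[g]
    excess_R = risk - min(eta, 1 - eta, d)
    excess_Q = (2 * eta - 1 - f) ** 2
    return excess_R, excess_Q

random.seed(0)
d = 0.2
for _ in range(10000):
    eta, f = random.random(), random.uniform(-2, 2)
    eR, eQ = excess_risks_ls(eta, f, d)
    assert eR <= eQ ** 0.5 + 1e-12  # Corollary 12, conditional form
```

Taking expectations over $X$ and applying Jensen's inequality recovers the unconditional bound $\Delta R \le [\Delta Q(f)]^{1/2}$.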
3.5 Distance Weighted Discrimination

Observe that

$$Q_\eta(z) = \begin{cases} \dfrac{\eta}{z} + \dfrac{1-\eta}{\gamma}\left(2 + \dfrac{z}{\gamma}\right) & \text{if } z \ge \gamma \\[4pt] \dfrac{2}{\gamma} + \dfrac{(1 - 2\eta)z}{\gamma^2} & \text{if } |z| < \gamma \\[4pt] \dfrac{\eta}{\gamma}\left(2 - \dfrac{z}{\gamma}\right) - \dfrac{1-\eta}{z} & \text{if } z \le -\gamma. \end{cases}$$

Hence

$$\inf_z Q_\eta(z) = \frac{2}{\gamma}\left(\sqrt{\eta(1-\eta)} + \min\{\eta, 1-\eta\}\right)$$

and

$$f^*_\phi = \begin{cases} (\eta/(1-\eta))^{1/2}\gamma & \text{if } \eta > 1/2 \\ \text{any value in } [-\gamma, \gamma] & \text{if } \eta = 1/2 \\ -((1-\eta)/\eta)^{1/2}\gamma & \text{if } \eta < 1/2. \end{cases}$$

Recall that $\delta = ((1-d)/d)^{1/2}\gamma \ge \gamma$. Then, for $\eta \le 1/2$,

$$\Delta Q_\eta(\delta) \ge \frac{\eta}{\delta} + \frac{(1-\eta)\delta}{\gamma^2} - \frac{2\sqrt{\eta(1-\eta)}}{\gamma} = \left(\left(\frac{\eta}{\delta}\right)^{1/2} - \left(\frac{(1-\eta)\delta}{\gamma^2}\right)^{1/2}\right)^2 = \frac{1}{\gamma\sqrt{d(1-d)}}\left[(\eta d)^{1/2} - ((1-\eta)(1-d))^{1/2}\right]^2.$$

Observe that, by the Cauchy-Schwarz inequality, $\left[(\eta d)^{1/2} + ((1-\eta)(1-d))^{1/2}\right]^2 \le 1$. Thus,

$$\left[(\eta d)^{1/2} - ((1-\eta)(1-d))^{1/2}\right]^2 \ge \left(\eta d - (1-\eta)(1-d)\right)^2 = (1 - d - \eta)^2,$$

so that

$$(1 - d - \eta)^2 \le \gamma(1-d)^{1/2} d^{1/2}\, \Delta Q_\eta(\delta).$$

Similarly,

$$(\eta - d)^2 \le \gamma(1-d)^{1/2} d^{1/2}\, \Delta Q_\eta(-\delta).$$

From Theorems 9 and 11, we conclude that

Corollary 16 For the distance weighted discrimination loss,

$$\Delta R\left[C(f, ((1-d)/d)^{1/2}\gamma)\right] \le 2\gamma^{1/2}(1-d)^{1/4} d^{1/4}[\Delta Q(f)]^{1/2}.$$

Furthermore, if the margin condition (6) holds, then

$$\Delta R\left[C(f, ((1-d)/d)^{1/2}\gamma)\right] \le K[\Delta Q(f)]^{(1+\alpha)/(2+\alpha)}$$

for some constant $K > 0$.
3.6 Hinge Loss with Reject Option

As shown by Bartlett and Wegkamp (2008), for the modified hinge loss (4) with $a = (1-d)/d$,

$$\arg\min_z Q_\eta(z) = \begin{cases} -1 & \text{if } \eta < d \\ 0 & \text{if } d < \eta < 1 - d \\ 1 & \text{if } \eta > 1 - d. \end{cases}$$

Simple algebraic manipulations lead to

$$\Delta Q_\eta(-\delta) = \begin{cases} (1-\delta)(d-\eta)/d & \text{if } \eta \le d \\ (\eta - d)\delta/d & \text{if } d < \eta \le 1 - d \\ (\eta - (1-d))/d + (\eta - d)\delta/d & \text{if } \eta > 1 - d, \end{cases}$$

and

$$\Delta Q_\eta(\delta) = \begin{cases} (d-\eta)/d + (1 - d - \eta)\delta/d & \text{if } \eta \le d \\ (1 - d - \eta)\delta/d & \text{if } d < \eta \le 1 - d \\ (1-\delta)(\eta - (1-d))/d & \text{if } \eta > 1 - d. \end{cases}$$

Therefore,

$$\frac{\min\{\delta, 1-\delta\}}{d}|d - \eta| \le \Delta Q_\eta(-\delta); \qquad \frac{\min\{\delta, 1-\delta\}}{d}|1 - d - \eta| \le \Delta Q_\eta(\delta).$$

Furthermore,

$$\frac{\min\{\delta, 1-\delta\}}{d}(2\eta - 1)_+ \le \Delta Q_\eta(-\delta); \qquad \frac{\min\{\delta, 1-\delta\}}{d}(1 - 2\eta)_+ \le \Delta Q_\eta(\delta).$$

From Theorems 10 and 11, we conclude that

Corollary 17 For the modified hinge loss and any $0 < \delta < 1$,

$$\Delta R[C(f,\delta)] \le \frac{d}{\min\{\delta, 1-\delta\}}\,\Delta Q(f). \quad (8)$$

Furthermore, if the margin condition (6) holds, then

$$\Delta R[C(f,\delta)] \le K\,\Delta Q(f) \quad (9)$$

for some constant $K > 0$.

Notice that the corollary also suggests that $\delta = 1/2$ yields the best constant in the upper bound. A similar result was recently established by Bartlett and Wegkamp (2008). It is also interesting to see that (8) cannot be materially improved by the generalized margin condition (6), as the bounds (8) and (9) differ only by a constant.
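The linear ($s = 1$) bound (8) can likewise be verified pointwise for the modified hinge loss; a sketch assuming $\delta = 1/2$ (illustrative names; since $Q_\eta$ is piecewise linear, its population minimum is found by checking the kinks $-1$, $0$, $1$):

```python
import random

def hinge_check(eta, f, d, delta=0.5):
    """Pointwise check of Corollary 17 for the modified hinge loss with a = (1-d)/d."""
    a = (1 - d) / d
    phi = lambda z: 1 - a * z if z <= 0 else (1 - z if z <= 1 else 0.0)
    Q = lambda z: eta * phi(z) + (1 - eta) * phi(-z)
    Q_star = min(Q(-1), Q(0), Q(1))       # minimum is attained at a kink
    g = 1 if f > delta else (-1 if f < -delta else 0)
    risk = {1: 1 - eta, -1: eta, 0: d}[g]
    excess_R = risk - min(eta, 1 - eta, d)
    excess_Q = Q(f) - Q_star
    # bound (8): excess_R <= d / min(delta, 1 - delta) * excess_Q
    return excess_R <= d / min(delta, 1 - delta) * excess_Q + 1e-12

random.seed(1)
assert all(hinge_check(random.random(), random.uniform(-2, 2), 0.2)
           for _ in range(10000))
```

Note that at $\eta = 1/2$, $f$ just above $\delta$, the bound is essentially attained, which is why (8) cannot be improved beyond a constant even under the margin condition.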
4. Rates of Convergence for Empirical Risk Minimizers

In this section we briefly review the possible rates of convergence for minimizers of the empirical risk $Q_n(f) = (1/n)\sum_{i=1}^n \phi(Y_i f(X_i))$ over a convex class of discriminant functions $\mathcal{F}$, and show the implications of the excess risk bounds obtained in the previous section. The analysis of the generalized hinge loss is complicated and is treated in detail in Wegkamp (2007) and Bartlett and Wegkamp (2008). The other loss functions $\phi$ considered in this paper have in common that the modulus of convexity of $Q$,

$$\delta(\varepsilon) = \inf\left\{\frac{Q(f) + Q(g)}{2} - Q\left(\frac{f+g}{2}\right) : \mathbb{E}[(f-g)^2(X)] \ge \varepsilon\right\},$$

satisfies $\delta(\varepsilon) \ge c\varepsilon$ for some $c > 0$, and that, for some $L < \infty$, $|\phi(x) - \phi(x')| \le L|x - x'|$ for all $x, x' \in \mathbb{R}$. We have the following result, which imposes a restriction on the $1/n$-covering number $N_n = N(1/n, L_\infty, \mathcal{F})$, the cardinality of the smallest set of closed balls with radius $1/n$ in $L_\infty$ needed to cover $\mathcal{F}$.

Theorem 18 Assume that $\|f\|_\infty \le B$ for all $f \in \mathcal{F}$ and let $0 < \gamma < 1$. With probability at least $1 - \gamma$,

$$Q(\hat f_n) \le \inf_{f \in \mathcal{F}} Q(f) + \frac{3L}{n} + 8\left(\frac{L^2}{2c} + \frac{LB}{6}\right)\frac{\log(n N_n/\gamma)}{n}.$$

Together with the excess risk bounds from Theorems 9 and 11, we have

Corollary 19 Under the assumptions of Theorems 9 and 18, we have, with probability at least $1 - \gamma$,

$$\Delta R(C(\hat f_n, \delta)) \le 2C\left\{\inf_{f \in \mathcal{F}} \Delta Q(f) + \frac{3L}{n} + 8\left(\frac{L^2}{2c} + \frac{LB}{3}\right)\frac{\log(n N_n/\gamma)}{n}\right\}^{1/s}.$$

Furthermore, if the generalized margin condition (6) holds, then with probability at least $1 - \gamma$,

$$\Delta R(C(\hat f_n, \delta)) \le K\left\{\inf_{f \in \mathcal{F}} \Delta Q(f) + \frac{3L}{n} + 8\left(\frac{L^2}{2c} + \frac{LB}{3}\right)\frac{\log(n N_n/\gamma)}{n}\right\}^{1/(s+\beta-\beta s)}$$

for some constant $K > 0$.

In the special case where $\mathcal{F}$ consists of linear combinations $f_\lambda(x) = \sum_{j=1}^M \lambda_j f_j(x)$ of simple discriminant functions (decision stumps) $f_1, \ldots, f_M$ with $\sum_{j=1}^M |\lambda_j| \le B$ and $\|f_j\|_\infty \le 1$, we obtain the rate $(M \log n / n)^{1/(s + \beta - \beta s)}$.
We can view $B$ as a tuning parameter here, and if the functions $f_j$ are nearly orthogonal in the sense that

$$\max_{1 \le i \ne j \le M}\left|\mathbb{E}[f_i(X) f_j(X)] - \mathbb{E}[f_i(X)]\,\mathbb{E}[f_j(X)]\right| \le \frac{c}{\|\lambda_0\|_0}$$

for some small $c > 0$, a small modification of Theorem 1 in Wegkamp (2007) shows that we also adapt to the unknown sparsity of the minimizer $\lambda_0$ of $Q(f_\lambda)$ over $\lambda$, in that the rate becomes $(\|\lambda_0\|_0 \log n / n)^{1/(s + \beta - \beta s)}$ for suitably chosen $B = B(n)$.
5. Asymmetric Loss

We have focused thus far on the case where misclassifying from one class to the other, either $g(X) = -1$ while $Y = 1$ or $g(X) = 1$ while $Y = -1$, is assigned the same loss. In many applications, however, one type of misclassification may incur a heavier loss than the other. Such situations naturally arise in risk management or medical diagnosis. To this end, the following loss function can be adopted in place of $\ell$:

$$\ell_\theta[g(X), Y] = \begin{cases} 1 & \text{if } g(X) = -1 \text{ and } Y = 1 \\ \theta & \text{if } g(X) = 1 \text{ and } Y = -1 \\ d & \text{if } g(X) = 0 \\ 0 & \text{if } g(X) = Y. \end{cases}$$

We shall assume that $\theta \le 1$ without loss of generality. It can be shown that the reject option is only viable if $d < \theta/(1+\theta)$ (see, e.g., Herbei and Wegkamp, 2006), which we shall assume throughout the section. When this holds, the corresponding Bayes rule is given by (see, e.g., Herbei and Wegkamp, 2006)

$$g^*_\theta(X) = \begin{cases} 1 & \text{if } \eta(X) > 1 - d/\theta \\ 0 & \text{if } d \le \eta(X) \le 1 - d/\theta \\ -1 & \text{if } \eta(X) < d. \end{cases}$$

Instead of $C(\hat f_n, \delta)$, an asymmetrically truncated classification rule, $C(\hat f_n; \delta_1, \delta_2)$, can be used for our purpose here, where

$$C(f(X); \delta_1, \delta_2) = \begin{cases} 1 & \text{if } f(X) > \delta_1 \\ 0 & \text{if } -\delta_2 \le f(X) \le \delta_1 \\ -1 & \text{if } f(X) < -\delta_2. \end{cases}$$

The behavior of the asymmetrically truncated classification rule $C(\hat f_n; \delta_1, \delta_2)$ can be studied in a similar fashion as before. In particular, we have the following result in parallel to Theorems 1 and 9.

Theorem 20 Assume that $\phi$ is convex. Then $C(f^*_\phi, \delta_1, \delta_2)$ for some $\delta_1, \delta_2 > 0$ is infinite sample consistent, that is, $C(f^*_\phi, \delta_1, \delta_2) = g^*_\theta$, if and only if $\phi'(\pm\delta_1)$ and $\phi'(\pm\delta_2)$ exist, $\phi'(\delta_1), \phi'(\delta_2) < 0$, and

$$\frac{\phi'(\delta_1)}{\phi'(\delta_1) + \phi'(-\delta_1)} = \frac{d}{\theta}; \qquad \frac{\phi'(\delta_2)}{\phi'(\delta_2) + \phi'(-\delta_2)} = d.$$

Furthermore, if $C(f^*_\phi, \delta_1, \delta_2)$ is infinite sample consistent and

$$|d - \theta(1-\eta)|^s \le C^s \Delta Q_\eta(\delta_1); \qquad |d - \eta|^s \le C^s \Delta Q_\eta(-\delta_2),$$

then

$$\Delta R_\theta[C(f, \delta_1, \delta_2)] \le 2C[\Delta Q(f)]^{1/s},$$

where $\Delta R_\theta(g) = R_\theta(g) - R_\theta(g^*_\theta)$ and $R_\theta(g) = \mathbb{E}[\ell_\theta(g(X), Y)]$. Theorem 20 can be proved in the same fashion as Theorems 1 and 9 and is therefore omitted for brevity.
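The asymmetric rule and the corresponding Bayes rule $g^*_\theta$ translate directly into code; a minimal sketch (illustrative names):

```python
def asymmetric_rule(f_value, delta1, delta2):
    """Asymmetrically truncated rule C(f; delta1, delta2) from Section 5."""
    assert delta1 > 0 and delta2 > 0
    if f_value > delta1:
        return 1
    if f_value < -delta2:
        return -1
    return 0  # reject whenever -delta2 <= f(x) <= delta1

def bayes_rule_asymmetric(eta, d, theta):
    """Bayes rule g*_theta; the reject option is viable only if d < theta/(1+theta)."""
    assert d < theta / (1 + theta)
    if eta > 1 - d / theta:
        return 1
    if eta < d:
        return -1
    return 0
```

When $\theta < 1$ (classifying a true $-1$ as $+1$ is cheaper than the reverse error), the reject band $[d, 1 - d/\theta]$ is asymmetric in $\eta$, and correspondingly two thresholds $\delta_1 \ne \delta_2$ are generally needed on the discriminant scale.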
14 YUAN AND WEGKAMP 6. Proofs Proof of Theorem 1. We first show the if part. Recall that Q( f) = E[φ(Y f(x))] With slight abuse of notation, write = E(E[φ(Y f(x)) X]) = E[η(X)φ( f(x))+(1 η(x))φ( f(x))]. Q η(x) ( f(x)) = η(x)φ( f(x))+(1 η(x))φ( f(x)). Then f φ (X) minimizes Q η(x)( ). We now procee by separately consiering three ifferent scenarios: (a) η(x) < ; (b) η(x) > 1 ; an (c) < η(x) < 1. For brevity, we shall abbreviate the epenence of η an f φ on X in the reminer of the proof when no confusion occurs. First consier the case when η <. Recall that φ ( δ) < φ (δ) < 0, an Therefore, By the convexity of φ, for any z > 0, φ (δ) φ ( δ)+φ (δ) =. ηφ ( δ) (1 η)φ (δ) > 0. φ(z δ) φ( δ) φ( z+δ) φ(δ) φ ( δ)z; φ (δ)z. Hence Q η (z δ) Q η ( δ) [ ηφ ( δ) (1 η)φ (δ) ] z > 0, which implies that f φ δ. It now suffices to show that f φ δ. By the efinition of φ ( δ) an φ (δ), for any ε > 0, there exists a ζ > 0 such that for any 0 < z < ζ, Therefore for any 0 < z < ζ, φ( z δ) φ( δ) z φ(z+δ) φ(δ) z φ ( δ) ε; φ (δ)+ε. Q η ( z δ) Q η ( δ) = η[φ( z δ) φ( δ)]+(1 η)[φ(z+δ) φ(δ)] η [ φ ( δ) ε ] z+(1 η) [ φ (δ)+ε ] z = ([ (1 η)φ (δ) ηφ ( δ) ] + ε ) z. Recall that (1 η)φ (δ) ηφ ( δ)<0. By setting ε small enough, we can ensure that (1 η)φ (δ) ηφ ( δ)+ε remains negative. Hence Q η ( z δ) < Q η ( δ), 14
which implies that $f^*_\phi < -\delta$.

Now consider the case $\eta > 1 - d$. Observe that $Q_\eta(z) = Q_{1-\eta}(-z)$. From the previous discussion,

$$f^*_\phi = \arg\min_z Q_\eta(z) = -\arg\min_z Q_{1-\eta}(z) > \delta.$$

At last, consider the case $d < \eta < 1 - d$. Observe that in this case,

$$\eta\phi'(\delta) - (1-\eta)\phi'(-\delta) > 0; \qquad \eta\phi'(-\delta) - (1-\eta)\phi'(\delta) < 0.$$

Hence for any $z > 0$,

$$Q_\eta(z + \delta) - Q_\eta(\delta) \ge \left[\eta\phi'(\delta) - (1-\eta)\phi'(-\delta)\right] z > 0,$$

which implies that $f^*_\phi \le \delta$. Similarly,

$$Q_\eta(-z - \delta) - Q_\eta(-\delta) \ge -\left[\eta\phi'(-\delta) - (1-\eta)\phi'(\delta)\right] z > 0,$$

which implies that $f^*_\phi \ge -\delta$. In summary, $f^*_\phi \in [-\delta, \delta]$.

We now consider the "only if" part. Let $[a_-, b_-]$ and $[a_+, b_+]$ be the subdifferentials of $\phi$ at $-\delta$ and $\delta$, respectively. We need to show that $a_- = b_-$, $a_+ = b_+$ and $a_+/(a_+ + a_-) = d$.

We begin by showing that $b_+ \le 0$. Assume the contrary. The infinite sample consistency implies that for any $\eta > 1 - d$, $f^*_\phi > \delta$. Because $b_+ > 0$, we have $\phi(f^*_\phi) > \phi(\delta)$. Together with the fact that $Q_\eta(f^*_\phi) \le Q_\eta(\delta)$, this implies that $\phi(-f^*_\phi) < \phi(-\delta)$. Consequently, we have $a_- > 0$. The convexity of $\phi$ also gives $a_- \le b_- \le a_+ \le b_+$. Because

$$\phi(f^*_\phi) - \phi(\delta) \ge b_+(f^*_\phi - \delta); \qquad \phi(-\delta) - \phi(-f^*_\phi) \le a_-(f^*_\phi - \delta),$$

we have, for $\eta$ close enough to 1,

$$Q_\eta(f^*_\phi) - Q_\eta(\delta) \ge \left(\eta b_+ - (1-\eta) a_-\right)(f^*_\phi - \delta) > 0.$$

This contradiction shows that $b_+ \le 0$. Given that $a_- \le b_- \le a_+ \le b_+ \le 0$, we have $|a_-| \ge |b_-| \ge |a_+| \ge |b_+|$, which implies that

$$\frac{b_+}{a_- + b_+} \le \frac{a_+}{b_- + a_+}.$$

It suffices to show that

$$\frac{b_+}{a_- + b_+} \ge d \qquad \text{and} \qquad \frac{a_+}{b_- + a_+} \le d,$$

for then both ratios equal $d$, which forces $a_- = b_-$ and $a_+ = b_+$. Assume the contrary. First consider the case $b_+/(a_- + b_+) < d$. Let $\eta$ be such that $b_+/(a_- + b_+) < \eta < d$. By definition, for any $f < -\delta$,

$$\phi(f) - \phi(-\delta) \ge a_-(f + \delta); \qquad \phi(-f) - \phi(\delta) \ge -b_+(f + \delta).$$

Hence

$$Q_\eta(f) - Q_\eta(-\delta) \ge \left[\eta a_- - (1-\eta) b_+\right](f + \delta) > 0,$$
which implies that $\arg\min_z Q_\eta(z) \ge -\delta$. This contradicts the infinite sample consistency, which requires $f^*_\phi < -\delta$ whenever $\eta < d$. Therefore, $b_+/(a_- + b_+) \ge d$. Next we deal with the case $a_+/(b_- + a_+) > d$. Let $\eta$ be such that $a_+/(b_- + a_+) > \eta > d$. Following a similar argument as before, applied to $Q_{1-\eta}(z) = Q_\eta(-z)$, one can show that $\arg\min_z Q_{1-\eta}(z) > \delta$. This again contradicts infinite sample consistency because $d < 1 - \eta < 1 - d$. Therefore, $a_+/(b_- + a_+) \le d$. The proof is now concluded.

Proof of Theorem 9. Recall that $Q_\eta(f) = \eta\phi(f) + (1-\eta)\phi(-f)$. Similarly, write

$$R_\eta[C(f,\delta)] = \eta\,\ell(C(f,\delta), 1) + (1-\eta)\,\ell(C(f,\delta), -1).$$

Also write $\Delta Q_\eta(f) = Q_\eta(f) - \inf Q_\eta(f)$ and $\Delta R_\eta(f) = R_\eta(f) - \inf R_\eta(f)$. It suffices to show that

$$\Delta R_\eta[C(f,\delta)] \le 2C[\Delta Q_\eta(f)]^{1/s}. \quad (10)$$

The theorem can be deduced from (10) by Jensen's inequality:

$$\Delta R[C(f,\delta)] = \mathbb{E}\left[\Delta R_{\eta(X)}[C(f(X),\delta)]\right] \le 2C\,\mathbb{E}\left[\Delta Q_{\eta(X)}(f(X))^{1/s}\right] \le 2C\left(\mathbb{E}\left[\Delta Q_{\eta(X)}(f(X))\right]\right)^{1/s} = 2C[\Delta Q(f)]^{1/s}.$$

To show (10), we consider separately the different combinations of values of $\eta$ and $f$. For brevity, we shall suppress their dependence on $X$ in what follows.

Case 1. $\eta < d$ and $f < -\delta$. As shown before, in this case $f^*_\phi(X) < -\delta$. Thus,

$$\Delta R_\eta[C(f,\delta)] = 0 \le 2C[\Delta Q_\eta(f)]^{1/s}.$$

Case 2. $\eta < d$ and $|f| \le \delta$. Observe that

$$\Delta Q_\eta(f) - \Delta Q_\eta(-\delta) \ge \left[\eta\phi'(-\delta) - (1-\eta)\phi'(\delta)\right](f + \delta) = -\frac{\phi'(\delta)}{d}(d - \eta)(f + \delta) \ge 0.$$

Together with the fact that $C^s \Delta Q_\eta(-\delta) \ge |d - \eta|^s$, we have

$$\Delta Q_\eta(f) \ge \Delta Q_\eta(-\delta) \ge C^{-s}|d - \eta|^s.$$

Note that $\Delta R_\eta[C(f,\delta)] = d - \eta$. We have

$$\Delta R_\eta[C(f,\delta)] \le C[\Delta Q_\eta(f)]^{1/s} \le 2C[\Delta Q_\eta(f)]^{1/s}.$$
Case 3. $\eta < d$ and $f > \delta$. Observe that

$$\Delta Q_\eta(f) - \Delta Q_\eta(\delta) \ge \left[\eta\phi'(\delta) - (1-\eta)\phi'(-\delta)\right](f - \delta) = -\frac{\phi'(\delta)}{d}(1 - d - \eta)(f - \delta) \ge 0.$$

Together with the facts that $d < 1/2$ and $C^s \Delta Q_\eta(\delta) \ge |1 - d - \eta|^s$, we have

$$\Delta Q_\eta(f) \ge \Delta Q_\eta(\delta) \ge C^{-s}|1 - d - \eta|^s \ge (2C)^{-s}|1 - 2\eta|^s.$$

Note that $\Delta R_\eta[C(f,\delta)] = 1 - 2\eta$. Therefore,

$$\Delta R_\eta[C(f,\delta)] \le 2C[\Delta Q_\eta(f)]^{1/s}.$$

Case 4. $d \le \eta \le 1 - d$ and $f < -\delta$. Following a similar argument as before,

$$\Delta Q_\eta(f) - \Delta Q_\eta(-\delta) \ge -\frac{\phi'(\delta)}{d}(d - \eta)(f + \delta) \ge 0.$$

Therefore,

$$\Delta Q_\eta(f) \ge \Delta Q_\eta(-\delta) \ge C^{-s}|d - \eta|^s,$$

which, together with the fact that $\Delta R_\eta[C(f,\delta)] = \eta - d$, implies that

$$\Delta R_\eta[C(f,\delta)] \le 2C[\Delta Q_\eta(f)]^{1/s}.$$

Case 5. $d \le \eta \le 1 - d$ and $|f| \le \delta$. In this case,

$$\Delta R_\eta[C(f,\delta)] = 0 \le 2C[\Delta Q_\eta(f)]^{1/s}.$$

Case 6. $d \le \eta \le 1 - d$ and $f > \delta$. Observe that

$$\Delta Q_\eta(f) - \Delta Q_\eta(\delta) \ge -\frac{\phi'(\delta)}{d}(1 - d - \eta)(f - \delta) \ge 0.$$

Hence

$$\Delta Q_\eta(f) \ge \Delta Q_\eta(\delta) \ge C^{-s}|1 - d - \eta|^s,$$

which, together with the fact that $\Delta R_\eta[C(f,\delta)] = 1 - \eta - d$, implies that

$$\Delta R_\eta[C(f,\delta)] \le 2C[\Delta Q_\eta(f)]^{1/s}.$$

Case 7. $\eta > 1 - d$. Observe that $R_\eta[C(f,\delta)] = R_{1-\eta}[C(-f,\delta)]$ and $Q_\eta(f) = Q_{1-\eta}(-f)$. Because $1 - \eta < d$, from Cases 1, 2 and 3, we have

$$\Delta R_\eta[C(f,\delta)] = \Delta R_{1-\eta}[C(-f,\delta)] \le 2C[\Delta Q_{1-\eta}(-f)]^{1/s} = 2C[\Delta Q_\eta(f)]^{1/s}.$$

The proof is therefore complete.
Proof of Theorem 10. The proof follows the same argument as that of Theorem 9. The only difference occurs in Case 3, where under the current assumptions

$$\Delta R_\eta(f) = 1 - 2\eta \le C[\Delta Q_\eta(\delta)]^{1/s} \le C[\Delta Q_\eta(f)]^{1/s}.$$

Proof of Theorem 11. The last part of the proof is based on the proof of Theorem 3 in Bartlett, Jordan and McAuliffe (2006). Let $g = C(f,\delta)$ be the classification rule with reject option based on $f: \mathcal{X} \to \mathbb{R}$, and set $g^* = C(f^*_\phi, \delta)$. We have shown above that, under the assumptions of Theorem 9,

$$|\eta - d|\,1\{g \ne g^*\}\left(1\{g = -1\} + 1\{g^* = -1\}\right) \le 2C[\Delta Q_\eta(f)]^{1/s};$$
$$|1 - d - \eta|\,1\{g \ne g^*\}\left(1\{g = 1\} + 1\{g^* = 1\}\right) \le 2C[\Delta Q_\eta(f)]^{1/s}.$$

Moreover, Lemma 1 in Herbei and Wegkamp (2006) states that

$$\Delta R(g) = \mathbb{E}\left[|\eta(X) - (1-d)|\,1\{g(X) \ne g^*(X)\}\left(1\{g(X) = 1\} + 1\{g^*(X) = 1\}\right)\right] + \mathbb{E}\left[|\eta(X) - d|\,1\{g(X) \ne g^*(X)\}\left(1\{g(X) = -1\} + 1\{g^*(X) = -1\}\right)\right]. \quad (11)$$

Hence, for any $\varepsilon > 0$, splitting each term in (11) according to whether the margin $|\eta(X) - (1-d)|$ (respectively $|\eta(X) - d|$) is at most $\varepsilon$ or exceeds $\varepsilon$, and using the inequality $|x|\,1\{|x| > \varepsilon\} \le |x|^s \varepsilon^{1-s}$ for $s \ge 1$ together with the two displays above, we obtain

$$\Delta R(g) \le 2\varepsilon P\{g(X) \ne g^*(X)\} + \varepsilon^{1-s}(2C)^s \Delta Q(f).$$

Using the bound

$$P\{g(X) \ne g^*(X)\} \le \left[(8A)^{1/\alpha}\Delta R(g)\right]^\beta$$

from the proof of Lemma 4 of Herbei and Wegkamp (2006), and choosing $\varepsilon = c[\Delta R(g)]^{1-\beta}$ with $c = [(8A)^{1/\alpha}]^{-\beta}/4$, readily gives the desired claim (7) with $K = [2c^{1-s}(2C)^s]^{1/(s+\beta-\beta s)}$.

Proof of Theorem 18. Recall that $\bar f \in \mathcal{F}$ minimizes $Q(f)$ over $f \in \mathcal{F}$. Let $h(Y f(X)) = \phi(Y f(X)) - \phi(Y \bar f(X))$. Since

$$\frac{Q(f) + Q(\bar f)}{2} \ge Q\left(\frac{f + \bar f}{2}\right) + c\,\mathbb{E}\left[(f - \bar f)^2(X)\right] \ge Q(\bar f) + c\,\mathbb{E}\left[(f - \bar f)^2(X)\right],$$

we have

$$\mathbb{E}\left[h^2(Y f(X))\right] \le L^2\,\mathbb{E}\left[(f - \bar f)^2(X)\right] \le \frac{L^2}{2c}\{Q(f) - Q(\bar f)\} = \frac{L^2}{2c}\,\mathbb{E}[h(Y f(X))],$$
see, for example, Bartlett, Jordan and McAuliffe (2006). Since $\hat f_n$ minimizes $Q_n(f)$, we have

$$Q(\hat f_n) - Q(\bar f) = P h(Y \hat f_n(X)) = P_n h(Y \hat f_n(X)) + (P - P_n) h(Y \hat f_n(X)) \le P_n h(Y \bar f(X)) + (P - P_n) h(Y \hat f_n(X)) \le \sup_{f \in \mathcal{F}} (P - P_n) h(Y f(X)),$$

where $P h(Y f(X)) = \mathbb{E}[h(Y f(X))]$ and $P_n h(Y f(X)) = (1/n)\sum_{i=1}^n h(Y_i f(X_i))$ for any $f \in \mathcal{F}$. Next we observe that

$$\sup_{f \in \mathcal{F}} (P - P_n) h(Y f(X)) \le \frac{3L}{n} + \max_{f \in \mathcal{F}_n} (P - P_n) h(Y f(X)),$$

where $\mathcal{F}_n$ is a minimal $1/n$-net of $\mathcal{F}$. By Bernstein's inequality, we get

$$P\left\{\sup_{f \in \mathcal{F}_n} (P - P_n) h(Y f(X)) \ge t + P h(Y f(X))\right\} \le N_n \exp\left[-\frac{n\{t + P h(Y f(X))\}^2/8}{P h^2(Y f(X)) + (LB)\{t + P h(Y f(X))\}/6}\right] \le N_n \exp\left[-n t\left(8\left(\frac{L^2}{2c} + \frac{LB}{3}\right)\right)^{-1}\right],$$

and the conclusion follows easily.

Acknowledgments

The authors gratefully acknowledge the fact that part of the research was done while the authors were visiting the Isaac Newton Institute for Mathematical Sciences (Statistical Theory and Methods for Complex, High-Dimensional Data Programme) at Cambridge University during Spring 2008. The research of Ming Yuan was supported in part by NSF grants DMS-MPSA and DMS. The research of Marten Wegkamp was supported in part by NSF Grant DMS.

References

P.L. Bartlett, M.I. Jordan and J. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101:138-156, 2006.

P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823-1840, 2008.

J. Friedman, T. Hastie and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337-407, 2000.

R. Herbei and M.H. Wegkamp. Classification with reject option. Canadian Journal of Statistics, 34(4):709-721, 2006.

Y. Lin. Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery, 6:259-275, 2002.
E. Mammen and A.B. Tsybakov. Smooth discrimination analysis. Annals of Statistics, 27(6):1808-1829, 1999.

J.S. Marron, M. Todd and J. Ahn. Distance weighted discrimination. Journal of the American Statistical Association, 102:1267-1271, 2007.

B.D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.

A.B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135-166, 2004.

M.H. Wegkamp. Lasso type classifiers with a reject option. Electronic Journal of Statistics, 1:155-168, 2007.

T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56-134, 2004.