
Journal of the Operations Research Society of Japan
2008, Vol. 51, No. 2, 191-201

QUADRATIC AND CONVEX MINIMAX CLASSIFICATION PROBLEMS

Tomonari Kitahara   Shinji Mizuno   Kazuhide Nakata
Tokyo Institute of Technology

(Received May 18, 2007; Revised October 16, 2007)

Abstract  When there are two classes whose mean vectors and covariance matrices are known, Lanckriet et al. [7] consider the Linear Minimax Classification (LMC) problem and propose a method for solving it. In this paper we first discuss the Quadratic Minimax Classification (QMC) problem, which is a generalization of LMC. We show that QMC can be transformed into a parametric Semidefinite Programming (SDP) problem. We further define the Convex Minimax Classification (CMC) problem. Although both problems are generalizations of LMC, we prove that their solutions can be obtained by solving LMC.

Keywords: Optimization, minimax classification, semidefinite programming, convexity

1. Introduction

Classification is a widely used technique in real life, and its practical importance has attracted a large amount of research. This research includes, for example, classification based on the Mahalanobis distance, neural networks, and the Support Vector Machine (SVM).

Recently, Lanckriet et al. [7] proposed a minimax approach to binary classification. Their approach requires only the mean vectors and covariance matrices of the two classes; no assumptions are made about the probability distributions. In general, the performance of a classifier is affected by the unknown distributions. To guard against poor performance, Lanckriet et al. [7] propose to use the linear classifier that minimizes the worst-case misclassification probability over all possible probability distributions with the given mean vectors and covariance matrices. In this paper, we call this problem the Linear Minimax Classification (LMC) problem. Lanckriet et al. [7] show that LMC can be formulated as a Second Order Cone Programming (SOCP) problem. The size of this problem depends only on the dimension and is independent of the number of data points, so the problem is easily solved by an interior point method. Their algorithm is called the Minimax Probability Machine (MPM). They demonstrate that a classifier obtained by MPM works well on practical problems.

Because of its theoretical and practical importance, much research related to MPM has been carried out. Huang et al. [4] propose the Biased-MPM, which takes the relative importance of each class into account. In [5], the same authors further generalize the Biased-MPM and develop the Minimum Error MPM, which yields a tighter worst-case accuracy. Hoi et al. [3] and Kitahara et al. [6] both extend MPM to multi-class classification. Hoi et al. [3] adopt a classical approach built on binary classifiers. In contrast, Kitahara et al. [6] formulate a problem of finding linear functions whose worst-case misclassification probability is small.

In most cases, classification methods are first developed for linear classification and are then extended to nonlinear classification functions to improve performance. For example, SVM was first proposed for linear classifiers, and the kernel method was developed afterwards to handle nonlinear classification functions.

As for MPM, there has been little research in this direction. Cooke [2] analyzes quadratic classifiers that minimize the worst-case misclassification probability, but his analysis is rather restrictive because of the assumption that the covariance matrices are diagonal. In this paper, we formally address the problem of using quadratic classifiers under the sole assumption that the covariance matrices are positive definite. Practically, this assumption is acceptable. We call the problem the Quadratic Minimax Classification (QMC) problem. We show that QMC can be formulated as a parametric Semidefinite Programming (SDP) problem, and the parametric SDP problem can be solved by an interior point method. However, in our preliminary computational experiments we observed that the solutions of various examples of QMC are always linear. Motivated by these facts, we conduct a theoretical analysis of QMC.

In our analysis, we define another generalization of LMC. In both linear and quadratic classification, we essentially classify samples using a set defined by a classifier and its complementary set. Noticing this fact, we consider the problem of finding a convex set which minimizes the worst-case misclassification probability. We refer to this problem as the Convex Minimax Classification (CMC) problem. As opposed to LMC or QMC, so far we have not found any computational method for solving CMC directly. In this paper, we prove that an optimal solution of LMC is an optimal solution of CMC. Moreover, from this fact we show that an optimal solution of LMC is also an optimal solution of QMC.

The rest of this paper is organized as follows. In Section 2, we review LMC following Lanckriet et al. [7]. In Section 3, we define QMC and give an SDP representation of it. In Section 4, we introduce CMC. Relations between the three problems are investigated in Section 5. Finally, in Section 6, we conclude the paper.

Throughout this paper, we use the following notation. Let $\mathbb{R}^n$ be the $n$-dimensional Euclidean space and $S^n$ be the set of $n$-dimensional symmetric matrices. $S^n_+$ and $S^n_{++}$ represent the sets of $n$-dimensional symmetric positive semidefinite and positive definite matrices, respectively. For $X \in S^n$, $X \succeq O$ means that $X$ is positive semidefinite.

2. The Linear Minimax Classification Problem

In this section, we review the Linear Minimax Classification (LMC) problem of Lanckriet et al. [7]. Let $x \in \mathbb{R}^n$ be a sample which belongs to one of two classes, namely Class 1 or Class 2. Suppose that we know the mean vector $\mu_i \in \mathbb{R}^n$ and the covariance matrix $\Sigma_i \in S^n_{++}$ of each class $i \in \{1, 2\}$. In this paper, we assume that the covariance matrices are positive definite. This assumption is valid when a regularization term is added to each covariance matrix. No further assumptions are made with respect to the probability distributions. We remark that the settings stated in this paragraph are kept throughout this paper.

We determine a linear classifier $l(z) = a^T z + b$, where $a \in \mathbb{R}^n \setminus \{0\}$, $b \in \mathbb{R}$, and $z \in \mathbb{R}^n$. For any given sample $x$, if $a^T x + b < 0$ then it is classified as Class 1, if $a^T x + b > 0$ then it is classified as Class 2, and if $a^T x + b = 0$ then it can be classified as either Class 1 or Class 2. For the classifier $l(z) = a^T z + b$, the worst-case probability of misclassifying a Class 1 sample as Class 2 is expressed as

\[
\sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{a^T x + b \ge 0\},
\]

where the supremum is taken over all probability distributions having the mean vector $\mu_1$ and the covariance matrix $\Sigma_1$.

Considering the other case similarly, the worst-case misclassification probability $\alpha_l$ of the classifier $l(z) = a^T z + b$ is given by

\[
\alpha_l = \max\Bigl\{ \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{a^T x + b \ge 0\},\ \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{a^T x + b \le 0\} \Bigr\}. \tag{1}
\]

Definition 2.1 (LMC)  We refer to the problem of finding a linear function $l(z) = a^T z + b$ which minimizes the worst-case misclassification probability defined by (1) as the Linear Minimax Classification (LMC) problem. We represent the optimal value of LMC as $\alpha_1$, that is,

\[
\alpha_1 = \min_{l} \alpha_l. \tag{2}
\]

From (1), it is easy to see that LMC can be expressed as

\[
\begin{array}{ll}
\min & \alpha \\
\text{subject to} & \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{a^T x + b \ge 0\} \le \alpha, \\
& \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{a^T x + b \le 0\} \le \alpha,
\end{array} \tag{3}
\]

where $\alpha \in [0, 1]$, $a \in \mathbb{R}^n \setminus \{0\}$, and $b \in \mathbb{R}$ are variables. Lanckriet et al. [7] prove that if $\mu_1 \ne \mu_2$ then $\alpha_1 < 1$ and $a \ne 0$ at any optimal solution $(\alpha, a, b)$ of (3). In the following, we assume $\mu_1 \ne \mu_2$. Lanckriet et al. [7] show that (3) reduces to the following problem:

\[
\begin{array}{ll}
\min & \|\Sigma_1^{1/2} a\| + \|\Sigma_2^{1/2} a\| \\
\text{subject to} & (\mu_2 - \mu_1)^T a = 1,
\end{array} \tag{4}
\]

where $\Sigma_i^{1/2}$ ($i = 1, 2$) is a symmetric positive definite matrix which satisfies $(\Sigma_i^{1/2})^2 = \Sigma_i$. The problem (4) is a Second Order Cone Programming (SOCP) problem and can be solved efficiently by an interior point method. The algorithm is called the Minimax Probability Machine (MPM). In [7], it is demonstrated that the performance of MPM is competitive with the Support Vector Machine (SVM).
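As a concrete illustration (not from the paper) of how (4) can be solved with an off-the-shelf conic solver, the following is a minimal CVXPY sketch. The names mu1, mu2, Sigma1, Sigma2 are placeholders for the given moments; the recovery of the threshold $b$ and of the worst-case error $\alpha_1 = 1/(1+\kappa^2)$, with $\kappa$ the reciprocal of the optimal value of (4), follows the analysis in Lanckriet et al. [7].

```python
# Minimal sketch (not from the paper): solving the SOCP (4) with CVXPY.
import numpy as np
import cvxpy as cp

def sqrtm_sym(S):
    """Symmetric positive definite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def mpm_linear(mu1, mu2, Sigma1, Sigma2):
    n = mu1.shape[0]
    S1, S2 = sqrtm_sym(Sigma1), sqrtm_sym(Sigma2)
    a = cp.Variable(n)
    # Problem (4): minimize ||Sigma1^{1/2} a|| + ||Sigma2^{1/2} a||
    #              subject to (mu2 - mu1)^T a = 1.
    prob = cp.Problem(cp.Minimize(cp.norm(S1 @ a, 2) + cp.norm(S2 @ a, 2)),
                      [(mu2 - mu1) @ a == 1])
    opt_val = prob.solve()
    a_opt = a.value
    kappa = 1.0 / opt_val
    # Offset b and worst-case error alpha_1 chosen as in Lanckriet et al. [7].
    b_opt = -(a_opt @ mu1) - kappa * np.sqrt(a_opt @ Sigma1 @ a_opt)
    alpha1 = 1.0 / (1.0 + kappa ** 2)
    return a_opt, b_opt, alpha1
```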

3. The Quadratic Minimax Classification Problem

In this section, we define the Quadratic Minimax Classification (QMC) problem and present an SDP representation of it. Under the same settings as in Section 2, we consider classifying a sample using a quadratic function of the form $q(z) = z^T A z + b^T z + c$ ($(A, b, c) \in S^n \times \mathbb{R}^n \times \mathbb{R}$), in analogy with linear classification. Then the worst-case misclassification probability $\alpha_q$ of $q(z)$ is given by

\[
\alpha_q = \max\Bigl\{ \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A x + b^T x + c \ge 0\},\ \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x^T A x + b^T x + c \le 0\} \Bigr\}. \tag{5}
\]

Definition 3.1 (QMC)  We call the problem of finding a quadratic function $q(z) = z^T A z + b^T z + c$ which minimizes the worst-case misclassification probability defined by (5) the Quadratic Minimax Classification (QMC) problem. The optimal value of QMC is represented as $\alpha_2$, that is,

\[
\alpha_2 = \min_{q} \alpha_q. \tag{6}
\]

Since a linear function is a special quadratic function, we have

\[
\alpha_2 \le \alpha_1. \tag{7}
\]

QMC can be formulated as

\[
\begin{array}{ll}
\min & \alpha \\
\text{subject to} & \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A x + b^T x + c \ge 0\} \le \alpha, \\
& \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x^T A x + b^T x + c \le 0\} \le \alpha,
\end{array} \tag{8}
\]

where $\alpha \in [0, 1]$, $A \in S^n$, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$ are variables. The next lemma gives an SDP representation of QMC.

Lemma 3.1  Suppose that $\mu_1 \ne \mu_2$. Then the optimal value of (8) is less than 1. Furthermore, the problem (8) is equivalent to the following parametric semidefinite programming problem:

\[
\begin{array}{ll}
\min & \alpha \\
\text{subject to} & \mathrm{tr}((\Sigma_i + \mu_i \mu_i^T) P_i) + \mu_i^T q_i + r_i \le \tau_i \alpha \quad (i = 1, 2), \\[4pt]
& \begin{bmatrix} P_i & q_i/2 \\ q_i^T/2 & r_i \end{bmatrix} \succeq O \quad (i = 1, 2), \\[4pt]
& \begin{bmatrix} P_1 - A & (q_1 - b)/2 \\ (q_1 - b)^T/2 & r_1 - \tau_1 - c \end{bmatrix} \succeq O, \\[4pt]
& \begin{bmatrix} P_2 + A & (q_2 + b)/2 \\ (q_2 + b)^T/2 & r_2 - \tau_2 + c \end{bmatrix} \succeq O, \\[4pt]
& \tau_i > 0 \quad (i = 1, 2),
\end{array} \tag{9}
\]

where $\alpha$, $(P_i, q_i, r_i, \tau_i) \in S^n \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}$ ($i = 1, 2$), and $(A, b, c) \in S^n \times \mathbb{R}^n \times \mathbb{R}$ are variables.

The following lemma is needed in the proof of Lemma 3.1.

Lemma 3.2  The value $\sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A x + b^T x + c \ge 0\}$ is the optimal value of the following semidefinite programming problem:

\[
\begin{array}{ll}
\min & \mathrm{tr}((\Sigma_1 + \mu_1 \mu_1^T) P_1') + \mu_1^T q_1' + r_1' \\[4pt]
\text{subject to} & \begin{bmatrix} P_1' & q_1'/2 \\ (q_1')^T/2 & r_1' \end{bmatrix} \succeq O, \\[4pt]
& \begin{bmatrix} P_1' - \tau_1' A & (q_1' - \tau_1' b)/2 \\ (q_1' - \tau_1' b)^T/2 & r_1' - 1 - \tau_1' c \end{bmatrix} \succeq O, \\[4pt]
& \tau_1' \ge 0,
\end{array} \tag{10}
\]

where $(P_1', q_1', r_1', \tau_1') \in S^n \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}$ are variables.

Proof.  The proof is given in Vandenberghe et al. [10]; here we give a sketch. Define the set $D = \{x \in \mathbb{R}^n : x^T A x + b^T x + c \ge 0\}$. From the constraints of (10), for any $x \in \mathbb{R}^n$ we have

\[
x^T P_1' x + q_1'^T x + r_1' \ge 1 + \tau_1' (x^T A x + b^T x + c), \qquad
x^T P_1' x + q_1'^T x + r_1' \ge 0.
\]

Thus it follows that

\[
x^T P_1' x + q_1'^T x + r_1' \ge \delta_D(x) =
\begin{cases} 1, & x \in D, \\ 0, & x \notin D. \end{cases}
\]

Therefore, for $x \sim (\mu_1, \Sigma_1)$, we have

\[
\mathrm{tr}((\Sigma_1 + \mu_1 \mu_1^T) P_1') + \mu_1^T q_1' + r_1'
= \mathrm{E}[x^T P_1' x + q_1'^T x + r_1']
\ge \mathrm{E}[\delta_D(x)] = \Pr\{x \in D\}.
\]

The above relation implies that the optimal value of (10) is an upper bound for the value $\sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A x + b^T x + c \ge 0\}$. We can also show that the optimal value of the dual problem of (10) gives a lower bound. Notice that (10) is strictly feasible. From these facts and semidefinite programming duality, the statement follows.
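Before turning to the proof of Lemma 3.1, the following minimal CVXPY sketch (not code from the paper) shows how the bound of Lemma 3.2, i.e. problem (10), might be computed numerically. The function and argument names are illustrative, and an SDP-capable solver such as SCS is assumed to be installed.

```python
# Minimal sketch (not from the paper): the Chebyshev-bound SDP (10) in CVXPY.
# Computes sup Pr{ x^T A x + b^T x + c >= 0 } over all distributions with
# mean mu and covariance Sigma; all names are illustrative placeholders.
import numpy as np
import cvxpy as cp

def worst_case_prob(A, b, c, mu, Sigma):
    n = mu.shape[0]
    P = cp.Variable((n, n), symmetric=True)
    q = cp.Variable(n)
    r = cp.Variable()
    tau = cp.Variable(nonneg=True)

    def block(M, v, s):
        # Assemble the (n+1) x (n+1) matrix [[M, v/2], [v^T/2, s]].
        col = cp.reshape(v, (n, 1)) / 2
        return cp.bmat([[M, col], [col.T, cp.reshape(s, (1, 1))]])

    constraints = [block(P, q, r) >> 0,
                   block(P - tau * A, q - tau * b, r - 1 - tau * c) >> 0]
    objective = cp.Minimize(cp.trace((Sigma + np.outer(mu, mu)) @ P) + mu @ q + r)
    prob = cp.Problem(objective, constraints)
    return prob.solve()   # optimal value = worst-case probability
```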

Proof of Lemma 3.1.  As mentioned in Section 2, when $\mu_1 \ne \mu_2$ we have $\alpha_1 < 1$, which implies $\alpha_2 < 1$. To show the second statement, we claim that when $\alpha < 1$, the first constraint of (8) is equivalent to the existence of $(P_1, q_1, r_1, \tau_1)$ satisfying the constraints of (9). From Lemma 3.2, the constraint is equivalent to the existence of $(P_1', q_1', r_1', \tau_1') \in S^n \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}$ satisfying the following system:

\[
\mathrm{tr}((\Sigma_1 + \mu_1 \mu_1^T) P_1') + \mu_1^T q_1' + r_1' \le \alpha, \tag{11}
\]
\[
\begin{bmatrix} P_1' & q_1'/2 \\ (q_1')^T/2 & r_1' \end{bmatrix} \succeq O, \qquad
\begin{bmatrix} P_1' - \tau_1' A & (q_1' - \tau_1' b)/2 \\ (q_1' - \tau_1' b)^T/2 & r_1' - 1 - \tau_1' c \end{bmatrix} \succeq O, \qquad
\tau_1' \ge 0.
\]

If $\tau_1' = 0$, then we have

\[
\begin{bmatrix} P_1' & q_1'/2 \\ (q_1')^T/2 & r_1' - 1 \end{bmatrix} \succeq O,
\]

which leads to

\[
\mathrm{tr}\left( \begin{bmatrix} P_1' & q_1'/2 \\ (q_1')^T/2 & r_1' - 1 \end{bmatrix}
\begin{bmatrix} \Sigma_1 + \mu_1 \mu_1^T & \mu_1 \\ \mu_1^T & 1 \end{bmatrix} \right) \ge 0,
\]

or equivalently

\[
\mathrm{tr}((\Sigma_1 + \mu_1 \mu_1^T) P_1') + (q_1')^T \mu_1 + r_1' \ge 1,
\]

which contradicts (11). Thus we have $\tau_1' \ne 0$. Then $(P_1, q_1, r_1, \tau_1) = \frac{1}{\tau_1'}(P_1', q_1', r_1', 1)$ satisfies the constraints of (9). Similarly, we can show that the second constraint of (8) is equivalent to the existence of $(P_2, q_2, r_2, \tau_2)$ satisfying the constraints of (9). Thus we have established that when $\alpha < 1$, the constraints of (8) are equivalent to those of (9), which completes the proof.

If we fix $\alpha$, the constraints of (9) are semidefinite constraints, and the resulting feasibility problem is an SDP problem. Furthermore, the constraints of (9) are monotonic in $\alpha$: if they are infeasible for some $\alpha$, then they are infeasible for every $\alpha' \le \alpha$. Exploiting these facts, we can compute the optimal value of (9) by a line search on $\alpha$, for example a bisection method. In our preliminary computational experiments, we observed that the solutions of various examples of QMC are always linear. In Section 5, we prove that this property holds in general.
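The line search mentioned above can be organized as a simple bisection. The sketch below is illustrative only and is not part of the paper; qmc_feasible is a hypothetical user-supplied routine that decides feasibility of the constraints of (9) for a fixed $\alpha$, for example by solving the corresponding SDP feasibility problem with an interior point solver.

```python
# Minimal sketch (assumption: qmc_feasible(alpha) is a hypothetical routine that
# returns True iff the constraints of (9) are feasible for the given alpha).
def qmc_bisection(qmc_feasible, lo=0.0, hi=1.0, tol=1e-6):
    """Bisection on alpha, exploiting the monotonicity of the constraints of (9):
    if (9) is infeasible for some alpha, it is infeasible for every smaller alpha."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if qmc_feasible(mid):
            hi = mid      # feasible: the optimal value is at most mid
        else:
            lo = mid      # infeasible: the optimal value exceeds mid
    return hi
```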

4. The Convex Minimax Classification Problem

In this section, we generalize LMC by noticing that the classification rule is characterized by a set. We define an open half space $S = \{z \in \mathbb{R}^n : a^T z + b < 0\}$ for a linear classifier $l(z) = a^T z + b$. Let $\mathrm{ext}(S)$ and $\mathrm{bd}(S)$ denote the exterior and the boundary of $S$, respectively. Then the worst-case misclassification probability (1) is equivalent to

\[
\alpha_l = \max\Bigl\{ \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x \in \mathrm{ext}(S) \cup \mathrm{bd}(S)\},\ \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x \in S \cup \mathrm{bd}(S)\} \Bigr\}. \tag{12}
\]

The LMC problem is to find an open half space $S$ which minimizes $\alpha_l$ defined by (12).

Now we consider a classification rule using a general set which is not restricted to a half space. Suppose we have a set $S \subset \mathbb{R}^n$. For any sample $x \in \mathbb{R}^n$, if $x \in S$ then it is classified as Class 1, if $x \in \mathrm{ext}(S)$ then it is classified as Class 2, and if $x \in \mathrm{bd}(S)$ then it can be classified as either Class 1 or Class 2. As in the former sections, the worst-case misclassification probability $\alpha_S$ of $S$ is expressed as

\[
\alpha_S = \max\Bigl\{ \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x \in \mathrm{ext}(S) \cup \mathrm{bd}(S)\},\ \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x \in S \cup \mathrm{bd}(S)\} \Bigr\}. \tag{13}
\]

It is possible to take $S$ to be a completely general set, but due to this generality the problem is intractable. Therefore, as a first step, in this paper we focus on the case where $S$ is an open convex set.

Definition 4.1 (CMC)  We call the problem of finding an open convex set $S$ which minimizes the worst-case misclassification probability defined by (13) the Convex Minimax Classification (CMC) problem. The optimal value of CMC is denoted by $\alpha_3$, that is,

\[
\alpha_3 = \min_{S:\ \text{open convex}} \alpha_S. \tag{14}
\]

It is easy to see that $\alpha_3 \le \alpha_1$. As in the former sections, CMC can be expressed as

\[
\begin{array}{ll}
\min & \alpha \\
\text{subject to} & \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x \in \mathrm{ext}(S) \cup \mathrm{bd}(S)\} \le \alpha, \\
& \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x \in S \cup \mathrm{bd}(S)\} \le \alpha, \\
& S \subset \mathbb{R}^n:\ \text{open, convex}.
\end{array} \tag{15}
\]

In contrast to LMC and QMC, we have not found a solvable optimization formulation for CMC.

5. Relations between the Three Problems

In this section, we investigate the relations between LMC, QMC, and CMC. We state our main results as the following two theorems.

Theorem 5.1  Suppose that $\mu_1 \ne \mu_2$. Then any optimal solution of LMC is an optimal solution of QMC. In short, we have $\alpha_1 = \alpha_2$.

Theorem 5.2  Suppose that $\mu_1 \ne \mu_2$ and let $a^T z + b$ be an optimal solution of LMC. Then the half space

\[
H = \{z \in \mathbb{R}^n : a^T z + b < 0\} \tag{16}
\]

is an optimal solution of CMC, that is, $\alpha_1 = \alpha_3$.

Remember that both QMC and CMC are generalizations of LMC, so there is a possibility that we could obtain a better classifier than the optimal one of LMC. It is interesting to see that the above two theorems exclude this possibility. We also remark that Cooke [2] shows the same result as Theorem 5.1 under the assumption that the covariance matrices are diagonal. As he admits, this is a restrictive assumption. In this section, we prove the property under the sole assumption that the covariance matrices are positive definite.

First we prove Theorem 5.2. Our proof relies heavily on the next classical result.

Lemma 5.1 (Marshall and Olkin [8])  Let $(\mu, \Sigma) \in \mathbb{R}^n \times S^n_{++}$ be given. Then for any convex set $S \subset \mathbb{R}^n$, it holds that

\[
\sup_{x \sim (\mu, \Sigma)} \Pr\{x \in S\} = \frac{1}{1 + d^2}, \qquad \text{where } d^2 = \inf_{y \in S} (y - \mu)^T \Sigma^{-1} (y - \mu).
\]

The result was rediscovered by Popescu [9] from an optimization perspective. Intuitively speaking, the lemma states that the supremum of the probability is determined by a kind of distance from $\mu$ to $S$. We remark that bounds of this type, so-called Chebyshev bounds, have regained interest with the recent development of interior point methods for semidefinite programming problems [10], because in some cases the bounds can be computed efficiently via semidefinite programming formulations, as Lemma 3.2 shows.
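As a small illustration of Lemma 5.1 (not part of the paper), take $S$ to be the half space $\{y \in \mathbb{R}^n : a^T y + b \ge 0\}$, i.e. the event that a Class 1 sample is misclassified by the linear classifier of Section 2. A short computation shows that the infimum in the lemma is then $d^2 = \max(0, -(a^T\mu + b))^2 / (a^T \Sigma a)$, so the worst-case probability has a closed form. The snippet below assumes NumPy arrays as inputs; all names and numbers are illustrative.

```python
# Small numeric illustration (not from the paper): Lemma 5.1 for the half space
# S = {y : a^T y + b >= 0}, with d^2 = max(0, -(a^T mu + b))^2 / (a^T Sigma a).
import numpy as np

def worst_case_halfspace_prob(a, b, mu, Sigma):
    margin = -(a @ mu + b)              # positive when mu is strictly on the class-1 side
    if margin <= 0:
        return 1.0                      # mu lies in S, so the bound is trivial
    d2 = margin ** 2 / (a @ Sigma @ a)
    return 1.0 / (1.0 + d2)             # Marshall-Olkin bound 1/(1 + d^2)

# Example with illustrative numbers:
mu = np.array([1.0, 0.0]); Sigma = np.eye(2)
a = np.array([-1.0, 0.0]); b = 0.0      # classifier a^T x + b, Class 1 when < 0
print(worst_case_halfspace_prob(a, b, mu, Sigma))   # 0.5, since d^2 = 1
```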

Proof of Theorem 5.2.  We prove the theorem by showing that for any open convex set $S$ there exists a half space $H$ such that $\alpha_H \le \alpha_S$, where $\alpha_S$ and $\alpha_H$ are defined by (13). As $S$ is convex, its closure $\mathrm{cl}(S) = S \cup \mathrm{bd}(S)$ is also convex. Thus, from Lemma 5.1, we have

\[
\sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x \in \mathrm{cl}(S)\} = \frac{1}{1 + d^2},
\]

where

\[
d^2 = \inf_{y \in \mathrm{cl}(S)} (y - \mu_2)^T \Sigma_2^{-1} (y - \mu_2),
\]

or equivalently

\[
d = \inf_{y \in \mathrm{cl}(S)} \|\Sigma_2^{-1/2} y - \Sigma_2^{-1/2} \mu_2\|. \tag{17}
\]

Here $\Sigma_2^{-1/2}$ is a symmetric positive definite matrix satisfying $(\Sigma_2^{-1/2})^2 = \Sigma_2^{-1}$. The minimization problem on the right-hand side of (17) can be seen as a projection problem. Let

\[
T = \{z \in \mathbb{R}^n : z = \Sigma_2^{-1/2} w,\ w \in \mathrm{cl}(S)\}
\]

and define the problem

\[
\inf_{z \in T} \|z - \Sigma_2^{-1/2} \mu_2\|. \tag{18}
\]

Figures 1 and 2 show the essence of our proof.

[Figure 1: The original space.  Figure 2: The transformed space.]

In the transformed space shown in Figure 2, there is a unique projection of $\Sigma_2^{-1/2} \mu_2$ onto $T$, which is denoted by $z^*$. Let $H'$ be the set defined by

\[
H' = \{z \in \mathbb{R}^n : (\Sigma_2^{-1/2} \mu_2 - z^*)^T (z - z^*) < 0\}.
\]

Then, by considering the tangent at $z^*$, it is easy to see that

\[
T \subset \mathrm{cl}(H') \tag{19}
\]

and

\[
\inf_{z \in T} \|z - \Sigma_2^{-1/2} \mu_2\| = \inf_{z \in \mathrm{cl}(H')} \|z - \Sigma_2^{-1/2} \mu_2\| = \|z^* - \Sigma_2^{-1/2} \mu_2\|. \tag{20}
\]

Back in the original space, for $y^* = \Sigma_2^{1/2} z^*$, let $H$ be the set defined by

\[
H = \{y \in \mathbb{R}^n : (\Sigma_2^{-1} (\mu_2 - y^*))^T (y - y^*) < 0\}.
\]

Then (19) implies $S \subset H$, which is equivalent to $\mathrm{ext}(H) \cup \mathrm{bd}(H) \subset \mathrm{ext}(S) \cup \mathrm{bd}(S)$. Thus we have

\[
\sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x \in \mathrm{ext}(H) \cup \mathrm{bd}(H)\} \le \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x \in \mathrm{ext}(S) \cup \mathrm{bd}(S)\}. \tag{21}
\]

Note that (20) shows

\[
\inf_{y \in \mathrm{cl}(S)} \|\Sigma_2^{-1/2} y - \Sigma_2^{-1/2} \mu_2\| = \inf_{y \in \mathrm{cl}(H)} \|\Sigma_2^{-1/2} y - \Sigma_2^{-1/2} \mu_2\|,
\]

which means

\[
\sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x \in \mathrm{cl}(H)\} = \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x \in \mathrm{cl}(S)\} \tag{22}
\]

by Lemma 5.1. Combining (21) with (22), we conclude that $\alpha_H \le \alpha_S$.

Let $a^T z + b$ be an optimal solution of LMC and define the half space $H$ according to (16). Obviously, $H$ is a feasible solution of CMC, and we have now shown that its objective value is the same as the optimal value of CMC, that is, $\alpha_1 = \alpha_H = \alpha_3$, which shows that $H$ is an optimal solution of CMC.

Since we do not assume convexity in QMC, Theorem 5.1 is not directly derived from Theorem 5.2. But the next lemma essentially says that we can restrict attention to convex quadratic classifiers in QMC.

Lemma 5.2  For any quadratic classifier $q(z) = z^T A z + b^T z + c$, there exists a convex quadratic classifier $f(z) = z^T A' z + b'^T z + c'$ ($(A', b', c') \in S^n_+ \times \mathbb{R}^n \times \mathbb{R}$) such that $\alpha_f \le \alpha_q$.

Proof.  When $\alpha_q = 1$, where $\alpha_q$ is defined by (5), any convex quadratic classifier satisfies the property of the lemma. Therefore, in the following we assume that $\alpha_q < 1$. From Lemma 3.2, the worst-case misclassification probability of Class 1 can be computed as the optimal value of the following Semidefinite Programming (SDP) problem:

\[
\begin{array}{ll}
\min & \mathrm{tr}((\Sigma_1 + \mu_1 \mu_1^T) P) + \mu_1^T q + r \\[4pt]
\text{subject to} & \begin{bmatrix} P & q/2 \\ q^T/2 & r \end{bmatrix} \succeq O, \\[4pt]
& \begin{bmatrix} P - \tau A & (q - \tau b)/2 \\ (q - \tau b)^T/2 & r - 1 - \tau c \end{bmatrix} \succeq O, \\[4pt]
& \tau \ge 0.
\end{array}
\]

We denote this problem by $\mathrm{P}(A, b, c)$. Notice that, by the assumption $\alpha_q < 1$, the matrix

\[
\hat{A} = \begin{bmatrix} A & b/2 \\ b^T/2 & c \end{bmatrix}
\]

is neither positive semidefinite nor negative semidefinite. Indeed, if $\hat{A}$ were positive semidefinite, then for any $x \in \mathbb{R}^n$ we would have $x^T A x + b^T x + c \ge 0$, which implies

\[
\sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A x + b^T x + c \ge 0\} = 1.
\]

Thus we would have $\alpha_q = 1$ from (5), which is a contradiction. Similarly, we can derive a contradiction in the case where $\hat{A}$ is negative semidefinite. Note also that we can add the condition

\[
\mathrm{tr}((\Sigma_1 + \mu_1 \mu_1^T) P) + \mu_1^T q + r \le 1
\]

to the problem. From these facts and the assumption that $\Sigma_1$ is positive definite, we can show that the feasible solutions of $\mathrm{P}(A, b, c)$ are bounded. Therefore, the set of optimal solutions of $\mathrm{P}(A, b, c)$ is nonempty. Let $(P^*, q^*, r^*, \tau^*)$ be one of them. Now, for $(A', b', c') = (P^*, q^*, r^* - 1)$, consider the new classifier $z^T A' z + b'^T z + c'$. Define

\[
p_1 = \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A x + b^T x + c \ge 0\}, \qquad
p_2 = \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x^T A x + b^T x + c \le 0\}, \tag{23}
\]

and also

\[
p_3 = \sup_{x \sim (\mu_1, \Sigma_1)} \Pr\{x^T A' x + b'^T x + c' \ge 0\}, \qquad
p_4 = \sup_{x \sim (\mu_2, \Sigma_2)} \Pr\{x^T A' x + b'^T x + c' \le 0\}. \tag{24}
\]

We prove the lemma by showing $p_3 \le p_1$ and $p_4 \le p_2$. Note that $p_3$ is the optimal value of $\mathrm{P}(A', b', c')$ and that $(P^*, q^*, r^*, 1)$ is a feasible solution of $\mathrm{P}(A', b', c')$ with objective value $p_1$. Hence we have $p_3 \le p_1$.

Next, the second constraint in $\mathrm{P}(A, b, c)$ indicates that

\[
z^T A' z + b'^T z + c' \ge \tau^* (z^T A z + b^T z + c)
\]

for any $z \in \mathbb{R}^n$. Remark that, from the assumption $\alpha_q < 1$, we have $p_1 < 1$, which implies $\tau^* > 0$. Then we have

\[
\{z \in \mathbb{R}^n : z^T A z + b^T z + c > 0\} \subset \{z \in \mathbb{R}^n : z^T A' z + b'^T z + c' > 0\},
\]

which is equivalent to

\[
\{z \in \mathbb{R}^n : z^T A' z + b'^T z + c' \le 0\} \subset \{z \in \mathbb{R}^n : z^T A z + b^T z + c \le 0\}.
\]

This shows $p_4 \le p_2$.

Proof of Theorem 5.1.  From Theorem 5.2 and Lemma 5.2, we have $\alpha_1 = \alpha_3 \le \alpha_2$. Since $\alpha_2 \le \alpha_1$ from (7), we have $\alpha_1 = \alpha_2$. By a logic similar to the one appearing in the last paragraph of the proof of Theorem 5.2, we can show that any optimal solution of LMC is an optimal solution of QMC.

6. Conclusion

In this paper, we introduced QMC (the Quadratic Minimax Classification problem) and CMC (the Convex Minimax Classification problem) by generalizing LMC (the Linear Minimax Classification problem) proposed by Lanckriet et al. [7]. We showed that QMC can be transformed into a parametric SDP (Semidefinite Programming) problem, which is easily solved by an interior point method. In our preliminary computational experiments, we observed that all the solutions of QMC are linear, which suggested that QMC has the same solution as LMC in spite of its generality. We succeeded in proving this result theoretically. We also showed that the solution of CMC is obtained from the solution of LMC, although we do not have any direct method for solving CMC.

Acknowledgement

The authors are grateful to two anonymous referees, who provided very useful comments on the paper. This research is supported in part by Grant-in-Aid for Scientific Research (A) 1601033 and Young Scientists (B) 1771016 of the Japan Society for the Promotion of Science.

References

[1] S. Boyd and L. Vandenberghe: Convex Optimization (Cambridge University Press, 2004).

[2] T. Cooke: A lower bound on the performance of the quadratic discriminant function. Journal of Multivariate Analysis, 89 (2004), 371-383.

[3] C.H. Hoi and M.R. Lyu: Robust face recognition using minimax probability machine. Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (IEEE, 2004), 1175-1178.

[4] K. Huang, H. Yang, I. King, and M.R. Lyu: Learning classifiers from imbalanced data based on biased minimax probability machine. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, 2004), 558-563.

[5] K. Huang, H. Yang, I. King, M.R. Lyu, and L. Chan: The minimum error minimax probability machine. Journal of Machine Learning Research, 5 (2004), 1253-1286.

[6] T. Kitahara, S. Mizuno, and K. Nakata: An extension of a minimax approach to multiple classification. Journal of the Operations Research Society of Japan, 50 (2007), 123-136.

[7] G.R.G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M.I. Jordan: A robust minimax approach to classification. Journal of Machine Learning Research, 3 (2002), 555-582.

[8] A. Marshall and I. Olkin: Multivariate Chebyshev inequalities. Annals of Mathematical Statistics, 31 (1960), 1001-1014.

[9] I. Popescu: Applications of Optimization in Probability, Finance and Revenue Management (Ph.D. thesis, Applied Mathematics Department and Operations Research Center, Massachusetts Institute of Technology, 1999).

[10] L. Vandenberghe, S. Boyd, and K. Comanor: Generalized Chebyshev bounds via semidefinite programming. SIAM Review, 49 (2007), 52-64.

Tomonari Kitahara
Department of Industrial Engineering and Management
Tokyo Institute of Technology
2-12-1 Ohokayama, Meguro, Tokyo 152-8552, Japan
E-mail: kitahara.t.ab@m.titech.ac.jp