Efficient Approach to Pattern Recognition Based on Minimization of Misclassification Probability
American Journal of Theoretical and Applied Statistics 2015; 5(2-1). Published online November 9, 2015 (doi: 10.11648/j.ajtas.s)

Efficient Approach to Pattern Recognition Based on Minimization of Misclassification Probability

Nicholas A. Nechval (Department of Mathematics, Baltic International Academy, Riga, Latvia) and Konstantin N. Nechval (Department of Applied Mathematics, Transport and Telecommunication Institute, Riga, Latvia)

To cite this article: Nicholas A. Nechval, Konstantin N. Nechval. Efficient Approach to Pattern Recognition Based on Minimization of Misclassification Probability. American Journal of Theoretical and Applied Statistics. Special Issue: Novel Ideas for Efficient Optimization of Statistical Decisions and Predictive Inferences under Parametric Uncertainty of Underlying Models with Applications. Vol. 5, No. 2-1, 2016.

Abstract: In this paper, an efficient approach to pattern recognition (classification) is suggested. It is based on minimization of the misclassification probability and uses a transition from the high-dimensional problem (dimension p ≥ 2) to the one-dimensional problem (dimension p = 1), both in the case of two classes and in the case of several classes, with the classes separated as much as possible. The misclassification probability, which is known as the error rate, is also used to judge the ability of various pattern recognition (classification) procedures to predict group membership. The approach does not require the arbitrary selection of priors as in the Bayesian classifier, and it represents a novel pattern recognition (classification) procedure that allows one to take into account cases which are not adequate for Fisher's classification rule (i.e., the distributions of the classes are not multivariate normal, or their covariance matrices are different, or there are strong multi-nonlinearities).
Moreover, it also allows one to classify a set of multivariate observations in which every observation belongs to the same unknown class. For cases which are adequate for Fisher's classification rule, the proposed approach gives results similar to those of Fisher's rule. For illustration, practical examples are given.

Keywords: Pattern Recognition, Classification, Misclassification Probability, Minimization

1. Introduction

The aim of pattern recognition is to classify data (patterns) based on either a priori knowledge or on statistical information extracted from the patterns [1, 2]. It provides solutions to problems ranging from speech recognition, face recognition and classification of handwritten characters to medical diagnosis. Application areas of pattern recognition include bioinformatics, document classification, image analysis, data mining, industrial automation, biometric recognition, remote sensing, handwritten text analysis, medical diagnosis, speech recognition, statistics, diagnostics, computer science, biology and many more.

Fisher's linear discriminant rule (FLDR) is the most widely used classification rule [3-9 and references therein]. Among the reasons for this are its simplicity and the absence of strict assumptions. In its original form, as proposed by Fisher, the method assumes equality of the population covariance matrices but does not explicitly require multivariate normality. However, optimal classification performance of Fisher's discriminant function can only be expected when multivariate normality is present as well, since only good discrimination can ensure good allocation.
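The optimality caveat above can be made concrete. For two univariate normal class densities, the error-rate-minimizing cutoff coincides with Fisher's midpoint (μ1 + μ2)/2 only when the variances are equal; in general it is the root of f1(h) = f2(h) that lies between the two means. A minimal sketch in plain Python (the function names are ours, assuming fitted parameters with μ1 < μ2):

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of the normal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def misclassification_probability(h, mu1, sigma1, mu2, sigma2):
    """Error rate of cutoff h: P(X >= h | class 1) + P(X < h | class 2),
    for class densities normal(mu1, sigma1) and normal(mu2, sigma2), mu1 < mu2."""
    return (1.0 - norm_cdf(h, mu1, sigma1)) + norm_cdf(h, mu2, sigma2)

def optimal_threshold(mu1, sigma1, mu2, sigma2):
    """Cutoff minimizing the misclassification probability.

    Setting the derivative to zero gives f1(h) = f2(h); taking logs turns
    this into a quadratic in h.  With equal variances it degenerates to the
    midpoint (mu1 + mu2) / 2."""
    if math.isclose(sigma1, sigma2):
        return 0.5 * (mu1 + mu2)
    a = 1.0 / sigma1**2 - 1.0 / sigma2**2
    b = -2.0 * (mu1 / sigma1**2 - mu2 / sigma2**2)
    c = mu1**2 / sigma1**2 - mu2**2 / sigma2**2 - 2.0 * math.log(sigma2 / sigma1)
    disc = math.sqrt(b * b - 4.0 * a * c)
    # Of the two crossing points of f1 and f2, the minimizer is the one
    # between the means (assumes the densities cross there, i.e. the
    # classes are not pathologically overlapped).
    return next(r for r in ((-b - disc) / (2 * a), (-b + disc) / (2 * a))
                if mu1 < r < mu2)
```

For example, with class densities N(0, 1) and N(4, 2²) the midpoint 2.0 is not optimal: the minimizing cutoff is pulled toward the narrower class, and its error rate is strictly smaller than the midpoint's.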
In practice, we often need to analyze input data samples which are not adequate for Fisher's classification rule: the distributions of the groups (classes, populations) are not multivariate normal, or their covariance matrices are different, or there are strong multi-nonlinearities. In this paper, an efficient approach to pattern recognition (classification) is proposed. It is based on minimization of the misclassification probability. The approach does not require the arbitrary selection of priors as in the Bayesian classifier, and it represents a novel procedure that allows one to analyze input data samples which are not adequate for Fisher's pattern classification rule. For the
cases which are adequate for Fisher's classification rule, the proposed approach gives results similar to those of Fisher's rule. Moreover, it also allows one to classify a set of multivariate observations in which every observation belongs to the same class. The approach uses a transition from the high-dimensional problem (dimension p ≥ 2) to the one-dimensional problem (dimension p = 1), both in the case of two classes and in the case of several classes, with the classes separated as much as possible. The misclassification probability, which is known as the error rate, is also used to judge the ability of various pattern recognition (classification) procedures to predict group membership.

2. Approach to Pattern Recognition

2.1. Approach to Pattern Recognition (Classification) in the Case of Two Classes

Let

Y_1 = (y_1(1), ..., y_1(n_1)),  Y_2 = (y_2(1), ..., y_2(n_2))   (1)

be samples of observed vectors of attributes of objects from two different classes C_1 and C_2, respectively. In this case, the proposed approach to pattern recognition (classification) is as follows.

Step 1 (Transition from the high-dimensional problem (dimension p ≥ 2) to the one-dimensional problem (dimension p = 1)). At this step, the transition from the high-dimensional problem to the one-dimensional problem is carried out by a suitable transformation of the multivariate observations y = [y_1, ..., y_p]' to univariate observations X, with the classes separated as much as possible, in order to obtain an input object allocation that is optimal in the sense of minimizing, on average, the number of incorrect assignments. Then the separation threshold h, which minimizes the misclassification probability of the input observation,

P_mis = ∫_{R_2} f_1(x) dx + ∫_{R_1} f_2(x) dx,   (2)

is determined (see Fig. 1), where f_j(x) represents the probability density function (pdf) of a transformed observation X = X(y) of y from class C_j, j ∈ {1, 2}, and R_1 = {x: x < h}, R_2 = {x: x ≥ h} are the allocation regions of classes C_1 and C_2.

Figure 1. Misclassification probability.

Step 2 (Pattern recognition (classification) via the separation threshold h). At this step, pattern recognition (classification) of the observation y is carried out as follows:

assign y to C_1 if X(y) < h;  assign y to C_2 if X(y) > h.   (3)

Remark 1. The recognition (classification) rule (3) can be rewritten as follows: assign y to the class C_j, j ∈ {1, 2}, for which f_j(x) is largest.

2.2. Approach to Pattern Recognition (Classification) in the Case of Several Classes

Let

Y_1 = (y_1(1), ..., y_1(n_1)), ..., Y_m = (y_m(1), ..., y_m(n_m))   (4)

be samples of observed vectors of objects from several different classes C_1, ..., C_m, respectively. In this case, the proposed approach to pattern recognition (classification) is as follows.

Step 1 (Transition from the high-dimensional problem (dimension p ≥ 2) to the one-dimensional problem (dimension p = 1)). At this step, a transition from the p-dimensional problem to a g-dimensional problem is first carried out by a suitable transformation of the multivariate observations y = (y_1, ..., y_p)' to the multivariate observations Z = (Z_1, ..., Z_g)', where g must be no bigger [2] than

g = min(m − 1, p).   (5)

If g > 1, a transition from the g-dimensional problem to the one-dimensional problem is carried out by a suitable transformation of the multivariate observations Z = (Z_1, ..., Z_g)' to univariate observations X, with the classes separated as much as possible, in order to obtain an input object allocation that is optimal in the sense of minimizing, on average, the number of incorrect assignments. Then the separation threshold h_kl, which minimizes the misclassification probability of the input observation for classes C_k and C_l,

P_mis = ∫_{R_l} f_k(x) dx + ∫_{R_k} f_l(x) dx,  k, l ∈ {1, ..., m}, k ≠ l,   (6)

is determined (pairwise), where f_k(x) represents the pdf of a transformed observation X(y) from class C_k.

Step 2 (Pattern recognition (classification) via the separation thresholds h_kl; k, l ∈ {1, ..., m}, k ≠ l). At this step, pattern recognition (classification) of the observation y is
carried out as follows:

assign y to class C_k if X(y) < h_kl for all l ≠ k.   (7)

Remark 2. The recognition (classification) rule (7) can be rewritten as follows:

assign y to class C_k if f_k(x) > f_l(x) for all l ≠ k.   (8)

3. Practical Examples

3.1. Example 1

Suppose we wish to classify some product (input vector y = [y_1, y_2]') into one of two classes of quality (C_1 and C_2) of this product. The data samples of observed vectors of attributes of product quality from the two different classes C_1 and C_2, respectively, are given in Table 1.

Table 1. Product quality attributes data (for each class: index i and quality attributes y_1(i), y_2(i)).

A linear discriminant procedure will not successfully separate the two classes.

Step 1. For the transition from the high-dimensional problem (p = 2) to the one-dimensional problem (p = 1), the transformations y = [y_1, y_2]' → Z = [Z_1, Z_2, Z_3]' → X are used, where

Z_1 = (y_1 − a)², Z_2 = (y_2 − b)², Z_3 = (y_1 − a)(y_2 − b), with a = Σ_i y_1(i)/n = 6.98, b = Σ_i y_2(i)/n = 5.3,   (9)

X = U'Z = [5.74, 8.99, .089] Z.   (10)

Using the Anderson-Darling goodness-of-fit test for normality (significance level α = 0.05), it was found that the transformed samples are adequately described by the normal densities

f_1(x) = (1/(√(2π) σ_1)) exp(−(x − μ_1)²/(2σ_1²)), with μ_1 = 4.48, σ_1 = 9.85,   (11)-(12)

f_2(x) = (1/(√(2π) σ_2)) exp(−(x − μ_2)²/(2σ_2²)), with μ_2 = 09.39.   (13)-(14)

It follows from (2) that the minimizing separation threshold is

h = 38.49,   (15)

while Fisher's threshold is

h_Fisher = (μ_1 + μ_2)/2 = 6.739.   (16)

Indexes. Thus, the index of relative efficiency of the Fisher approach as compared with the proposed approach is

I_rel.eff(h_Fisher, h) = P_mis(h) / P_mis(h_Fisher),   (18)

and the index of reduction percentage in the misclassification probability for the proposed approach as compared with the Fisher approach is given by

I_red.per(h, h_Fisher) = (1 − I_rel.eff(h_Fisher, h)) × 100% = 6.8%.   (19)

Figure 2. Pictorial representation of the data of Table 1, which are not adequate for Fisher's classification rule.
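The transformation used in Step 1 of this example is easy to reproduce in outline. In the sketch below (Python/NumPy), synthetic ring-shaped data stand in for Table 1, whose values are garbled in this copy, and a Fisher direction computed in Z-space stands in for the printed vector U; the point is that the quadratic features (y_1 − a)², (y_2 − b)², (y_1 − a)(y_2 − b) turn two classes with almost total linear overlap into two cleanly separated univariate samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80

# Stand-in for Table 1: class 1 is an inner cluster, class 2 a surrounding
# ring, so every linear projection of (y1, y2) overlaps almost totally.
r1, r2 = rng.normal(1.0, 0.3, n), rng.normal(4.0, 0.3, n)
t1, t2 = rng.uniform(0.0, 2.0 * np.pi, (2, n))
Y1 = np.c_[r1 * np.cos(t1), r1 * np.sin(t1)]
Y2 = np.c_[r2 * np.cos(t2), r2 * np.sin(t2)]

# Quadratic feature map of Example 1, centred at the pooled means (a, b):
# Z = [(y1 - a)^2, (y2 - b)^2, (y1 - a)(y2 - b)].
a, b = np.vstack([Y1, Y2]).mean(axis=0)

def to_z(Y):
    u, v = Y[:, 0] - a, Y[:, 1] - b
    return np.c_[u * u, v * v, u * v]

Z1, Z2 = to_z(Y1), to_z(Y2)

# One projecting direction U in Z-space (here the Fisher direction from the
# pooled covariance; the paper's numeric U is not legible in this copy).
S = ((Z1 - Z1.mean(0)).T @ (Z1 - Z1.mean(0))
     + (Z2 - Z2.mean(0)).T @ (Z2 - Z2.mean(0))) / (2 * n - 2)
U = np.linalg.solve(S, Z1.mean(0) - Z2.mean(0))

# Univariate observations X = U'Z: the two projected samples barely overlap.
X1, X2 = Z1 @ U, Z2 @ U
```

In the plane no straight line separates the two samples, yet a single threshold on X now classifies almost every point correctly, which is exactly the effect the transformation y → Z → X is designed to achieve.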
A pictorial representation of the above data, which are not adequate for Fisher's classification rule, is given in Fig. 2. If the points are projected in any direction onto a straight line, there will be almost total overlap.

3.2. Example 2

This example is adapted from a study [10] concerned with the detection of hemophilia A carriers. To construct a procedure for detecting potential hemophilia A carriers, blood samples were assayed for two groups of women, with measurements on the two variables y_1 = log10(AHF activity)
and y_2 = log10(AHF antigen) recorded. ("AHF" denotes antihemophilic factor.) The first group of n_1 women was selected from known hemophilia A carriers; this group was called the obligatory carriers. The second group of n_2 women was selected from a population of women who did not carry the hemophilia gene; this group was called the normal group. The pairs of observations (y_1, y_2) for the two groups are given in Table 2 and plotted in Fig. 3, together with estimated contours containing 50% and 95% of the probability for bivariate normal distributions centered at the two group means.

Table 2. Hemophilia data (for each group: index i and observations y_1(i) = log10(AHF activity), y_2(i) = log10(AHF antigen)).

Step 1. For the transition from the high-dimensional problem (p = 2) to the one-dimensional problem (p = 1), the following transformation is used: y = [y_1, y_2]' → X = U'y, where

U = S⁻¹(ȳ_1 − ȳ_2),   (20)

S = [Σ_{j=1,2} Σ_{i=1}^{n_j} (y_j(i) − ȳ_j)(y_j(i) − ȳ_j)'] / (n_1 + n_2 − 2),   (21)

ȳ_j = Σ_{i=1}^{n_j} y_j(i) / n_j,  j = 1, 2.   (22)

Using the Anderson-Darling goodness-of-fit test for normality (significance level α = 0.05), it was found that the projected observations in each group are adequately described by the normal densities

f_1(x) = (1/(√(2π) σ_1)) exp(−(x − μ_1)²/(2σ_1²)),   (23)-(24)

f_2(x) = (1/(√(2π) σ_2)) exp(−(x − μ_2)²/(2σ_2²)),   (25)-(26)

with parameters (μ_1, σ_1) and (μ_2, σ_2), respectively, estimated from the projected samples.   (27)

Theorem 1. If

f_j(x), j = 1, 2,

are the probability density functions of normal distributions with the parameters (μ_1, σ_1) and (μ_2, σ_2), respectively, where μ_1 < μ_2, then the necessary and sufficient conditions for

P_mis(h) = ∫_h^∞ f_1(x) dx + ∫_{−∞}^h f_2(x) dx   (28)

to have a unique minimum are: (i) the necessary condition for h to be a minimum point of (28) is given by

f_1(h) = f_2(h);   (29)

(ii) the sufficient condition for h to be a minimum point of (28) is given by

μ_1 < h < μ_2.   (30)

Figure 3. Scatter plots of [log10(AHF activity), log10(AHF antigen)] for the normal group and obligatory hemophilia A carriers.
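The computations of this example can be sketched end to end: form the pooled covariance S and the discriminant vector U = S⁻¹(ȳ_1 − ȳ_2), project both samples, fit a normal density to each projected sample, and compare Fisher's midpoint threshold with the threshold minimizing the misclassification probability. In the sketch below, synthetic bivariate groups with unequal covariances stand in for the hemophilia measurements (which are not reproduced in this copy), a grid search stands in for solving f_1(h) = f_2(h) exactly, and all names are ours:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_mis(h, mu1, s1, mu2, s2):
    """Misclassification probability of cutoff h (mu1 < mu2 assumed)."""
    return (1.0 - Phi((h - mu1) / s1)) + Phi((h - mu2) / s2)

rng = np.random.default_rng(1)
# Stand-in for Table 2: two bivariate normal groups with unequal scatter.
Y1 = rng.multivariate_normal([-0.30, 0.02], [[0.020, 0.010], [0.010, 0.025]], 23)
Y2 = rng.multivariate_normal([-0.13, -0.08], [[0.008, 0.004], [0.004, 0.008]], 29)

# Pooled covariance S and discriminant vector U = S^{-1}(mean1 - mean2).
m1, m2 = Y1.mean(0), Y2.mean(0)
S = ((Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)) / (len(Y1) + len(Y2) - 2)
U = np.linalg.solve(S, m1 - m2)

X1, X2 = Y1 @ U, Y2 @ U                  # univariate projections X = U'y
if X1.mean() > X2.mean():                # orient so that mu1 < mu2
    U, X1, X2 = -U, -X1, -X2
mu1, s1 = X1.mean(), X1.std(ddof=1)
mu2, s2 = X2.mean(), X2.std(ddof=1)

# Fisher's midpoint threshold versus the minimizing threshold (grid search
# over the interval (mu1, mu2), where the theorem locates the minimum).
h_fisher = 0.5 * (mu1 + mu2)
grid = np.linspace(mu1, mu2, 20001)
h_opt = grid[np.argmin([p_mis(h, mu1, s1, mu2, s2) for h in grid])]

# Index of relative efficiency of the Fisher threshold (<= 1 by construction).
I_rel = p_mis(h_opt, mu1, s1, mu2, s2) / p_mis(h_fisher, mu1, s1, mu2, s2)
```

A new observation y is then assigned to the first group when U'y < h_opt and to the second otherwise; with equal projected variances h_opt collapses to h_fisher and I_rel to 1.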
Proof. The proof, being straightforward, is omitted here.

Corollary 1. If σ_1 = σ_2, then the separation threshold h is determined as

h = (μ_1 + μ_2)/2,   (31)

i.e., in this case we deal with Fisher's separation threshold.

It follows from (28) that the minimizing threshold h and its misclassification probability P_mis(h) are obtained for these data (32), while Fisher's threshold is

h_Fisher = (μ_1 + μ_2)/2 = 53.60.   (33)

Indexes. Thus, the index of relative efficiency of the Fisher approach as compared with the proposed approach is

I_rel.eff(h_Fisher, h) = P_mis(h) / P_mis(h_Fisher),   (34)

and the index of reduction percentage in the misclassification probability for the proposed approach as compared with the Fisher approach is given by

I_red.per(h, h_Fisher) = (1 − I_rel.eff(h_Fisher, h)) × 100% = 33.5%.   (35)

Step 2 (Pattern recognition (classification) via the separation threshold h). At this step, pattern recognition (classification) of an observation y is carried out as follows:

assign y to C_1 if X(y) = U'y < h;  assign y to C_2 if X(y) = U'y > h.   (36)

For instance, suppose measurements of AHF activity and AHF antigen are obtained for a woman who may be a hemophilia A carrier. Should this woman be classified as C_1 (obligatory carrier) or C_2 (normal)? Using Fisher's classification rule, we obtain

X(y) = U'y = 88.69 < h_Fisher.   (37)

Using the proposed approach based on minimization of the misclassification probability, we obtain

X(y) = U'y = 88.69 < h.   (38)

Applying either (37) or (38), we classify the woman as C_1, an obligatory carrier. Thus, Fisher's approach and the proposed one give the same result in this case. It will be noted, however, that for one of the observations of Table 2 one obtains X(y) > h_Fisher, so that Fisher's classification rule gives an incorrect classification in that case.

4. Conclusion

The approach proposed in this paper represents an innovative pattern recognition (classification) procedure based on minimization of the misclassification probability. It allows one to take into account cases which are not adequate for Fisher's classification rule.
Moreover, the procedure also allows one to classify a set of multivariate observations in which every observation belongs to the same unknown class. This approach has been motivated by classification problems that appear in various application areas of pattern recognition.

References

[1] R. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, 1936.

[2] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis. Academic Press, 1979.

[3] N. A. Nechval, K. N. Nechval, and M. Purgailis, "Statistical pattern recognition principles," in International Encyclopedia of Statistical Science, Miodrag Lovric, Ed. Berlin, Heidelberg: Springer-Verlag, 2011.

[4] N. A. Nechval, K. N. Nechval, M. Purgailis, V. F. Strelchonok, G. Berzins, and M. Moldovan, "New approach to pattern recognition via comparison of maximum separations," Computer Modelling and New Technologies, vol. 15, 2011.

[5] N. A. Nechval, K. N. Nechval, V. Danovich, and G. Berzins, "Distance-based approaches to pattern recognition via embedding," in Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering 2014, 2-4 July 2014, London, U.K.

[6] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.

[7] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge: Cambridge University Press, 2004.

[8] A. C. Rencher, Methods of Multivariate Analysis, 2nd ed. John Wiley & Sons, 2002.

[9] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed. Singapore: Elsevier Ltd., 2006.

[10] B. N. Bouma et al., "Evaluation of the detection rate of hemophilia carriers," Statistical Methods for Clinical Decision Making, vol. 7, 1975.
More informationClustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation
Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide
More informationA NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1
A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 Jinglin Zhou Hong Wang, Donghua Zhou Department of Automation, Tsinghua University, Beijing 100084, P. R. China Control Systems Centre,
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationProbability Models for Bayesian Recognition
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian
More informationCLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS
CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS EECS 833, March 006 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@gs.u.edu 864-093 Overheads and resources available at http://people.u.edu/~gbohling/eecs833
More informationp(x ω i 0.4 ω 2 ω
p(x ω i ).4 ω.3.. 9 3 4 5 x FIGURE.. Hypothetical class-conditional probability density functions show the probability density of measuring a particular feature value x given the pattern is in category
More informationConsistent Bivariate Distribution
A Characterization of the Normal Conditional Distributions MATSUNO 79 Therefore, the function ( ) = G( : a/(1 b2)) = N(0, a/(1 b2)) is a solu- tion for the integral equation (10). The constant times of
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationAlgorithms for Classification: The Basic Methods
Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationOverriding the Experts: A Stacking Method For Combining Marginal Classifiers
From: FLAIRS-00 Proceedings. Copyright 2000, AAAI (www.aaai.org). All rights reserved. Overriding the Experts: A Stacking ethod For Combining arginal Classifiers ark D. Happel and Peter ock Department
More informationMulti-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria
Multi-Class Linear Dimension Reduction by Weighted Pairwise Fisher Criteria M. Loog 1,R.P.W.Duin 2,andR.Haeb-Umbach 3 1 Image Sciences Institute University Medical Center Utrecht P.O. Box 85500 3508 GA
More informationThree-group ROC predictive analysis for ordinal outcomes
Three-group ROC predictive analysis for ordinal outcomes Tahani Coolen-Maturi Durham University Business School Durham University, UK tahani.maturi@durham.ac.uk June 26, 2016 Abstract Measuring the accuracy
More informationLecture 4 Discriminant Analysis, k-nearest Neighbors
Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationA Novel PCA-Based Bayes Classifier and Face Analysis
A Novel PCA-Based Bayes Classifier and Face Analysis Zhong Jin 1,2, Franck Davoine 3,ZhenLou 2, and Jingyu Yang 2 1 Centre de Visió per Computador, Universitat Autònoma de Barcelona, Barcelona, Spain zhong.in@cvc.uab.es
More informationISSN X On misclassification probabilities of linear and quadratic classifiers
Afrika Statistika Vol. 111, 016, pages 943 953. DOI: http://dx.doi.org/10.1699/as/016.943.85 Afrika Statistika ISSN 316-090X On misclassification probabilities of linear and quadratic classifiers Olusola
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More information12 Discriminant Analysis
12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into
More informationLearning Kernel Parameters by using Class Separability Measure
Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg
More informationMultivariate Bayesian Linear Regression MLAI Lecture 11
Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationThe Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space Robert Jenssen, Deniz Erdogmus 2, Jose Principe 2, Torbjørn Eltoft Department of Physics, University of Tromsø, Norway
More informationFrom dummy regression to prior probabilities in PLS-DA
JOURNAL OF CHEMOMETRICS J. Chemometrics (2007) Published online in Wiley InterScience (www.interscience.wiley.com).1061 From dummy regression to prior probabilities in PLS-DA Ulf G. Indahl 1,3, Harald
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationLecture 3: Pattern Classification. Pattern classification
EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and
More informationBayesian Inference for the Multivariate Normal
Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision
More informationClassification via kernel regression based on univariate product density estimators
Classification via kernel regression based on univariate product density estimators Bezza Hafidi 1, Abdelkarim Merbouha 2, and Abdallah Mkhadri 1 1 Department of Mathematics, Cadi Ayyad University, BP
More informationStatistical and Learning Techniques in Computer Vision Lecture 1: Random Variables Jens Rittscher and Chuck Stewart
Statistical and Learning Techniques in Computer Vision Lecture 1: Random Variables Jens Rittscher and Chuck Stewart 1 Motivation Imaging is a stochastic process: If we take all the different sources of
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationOn Improving the k-means Algorithm to Classify Unclassified Patterns
On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,
More informationUnsupervised Learning Methods
Structural Health Monitoring Using Statistical Pattern Recognition Unsupervised Learning Methods Keith Worden and Graeme Manson Presented by Keith Worden The Structural Health Monitoring Process 1. Operational
More information