Fuzzy Support Vector Machines for Automatic Infant Cry Recognition

Sandra E. Barajas-Montiel and Carlos A. Reyes-García
Instituto Nacional de Astrofisica Optica y Electronica, Luis Enrique Erro #1, Tonantzintla, Puebla, Mexico
{sandybarajas, kargaxxi}@inaoep.mx

Abstract. Crying is the only means of communication newborn babies have to express their needs. Several studies have shown that infant cry can be a valuable tool for determining an infant's emotional and physiological state. With the aim of putting this crying information to use, in this paper we present the use of Fuzzy Support Vector Machines (FSVM) for two different infant cry recognition tasks. In the first one, to identify pathologies, we classify Normal, Deaf, and Asphyxia infant cries. The second task is to identify Pain cries, Hunger cries, and No-Pain-No-Hunger cries, which are those that do not belong to either of the first two classes. We show that FSVM perform better than conventional SVM, reaching a correct classification accuracy of up to 90%.

1 Introduction

Support Vector Machines (SVM) are an increasingly popular classification technique developed by Vapnik and his group at AT&T Bell Laboratories [1]. Experimental results indicate that SVM can achieve a generalization performance that is greater than or equal to that of other classifiers, while requiring significantly less training data to achieve good results [2]. An SVM is a binary classifier which makes its decisions by constructing a linear decision boundary, or hyperplane, that optimally separates the two classes. In conventional SVM classification, an n-class problem is converted into n two-class problems. In each resulting binary problem a decision function f(x) is constructed to separate one class from the others; when f(x) > 0 for class i, x is classified into that class.
In this kind of n-class pattern recognition problem, the decision function is sometimes positive for more than one class, and sometimes negative for every class; in both cases the datum is unclassifiable. To overcome this problem, Fuzzy Support Vector Machines (FSVM) were proposed [3]. Truncated polyhedral pyramidal membership functions are defined [4] on the basis of the decision functions obtained by training the SVMs, and these resolve the unclassifiable regions.

This paper presents two different 3-class infant cry recognition tasks. In the first one, to identify pathologies, we classify infant cries from Normal, Deaf, and Asphyxiating babies. The second task is to identify Pain cries, Hunger cries, and cries which do not belong to either of these two classes and which constitute the so-called No-Pain-No-Hunger class.

D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNCIS 345, pp , Springer-Verlag Berlin Heidelberg 2006

2 The Automatic Infant Cry Recognition Process

The Automatic Infant Cry Recognition (AICR) process is very similar to the speech recognition process. Basically, AICR can be divided into two stages: signal processing and pattern classification. In the signal processing phase, the cry signal is first normalized and cleaned, and then it is analyzed to extract its most important features as a function of time. In AICR, as in any pattern recognition problem, the goal is, given an input pattern, to obtain at the end of the recognition process the class to which that pattern belongs.

2.1 Signal Processing Phase

The acoustical analysis of the raw cry waveform provides the information needed for its recognition. At the same time, it discards unwanted information such as background noise and channel distortion [5]. In this phase, measured data are transformed into pattern data. There are several techniques for analyzing cry wave signals; for the experiments described here we used Mel Frequency Cepstral Coefficients (MFCC). MFCC are modelled on the perceptual characteristics of the ear. They can be obtained as filtered signals through different frequency scales. The Mel spectrum operates on the basis of selective weighting of the frequencies in the power spectrum: higher frequencies are weighted on a logarithmic scale, whereas lower frequencies are weighted on a linear scale. In this way, MFCC aim to emulate the filtering properties of the ear, which is much more sensitive to some frequencies than to others [6], [7]. These coefficients are calculated over short time frames of the signal. Only the first M cepstral coefficients are taken as features. The spectral shape is modelled by the first coefficients, and the precision of this model depends on the number of coefficients taken.
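To make the linear-versus-logarithmic weighting concrete, the following Python sketch uses one commonly cited Hz-to-mel conversion, mel = 2595 log10(1 + f/700). This mapping is an assumption made for illustration only: the paper does not state which mel formula was used, and the function names, the 8000 Hz upper limit, and the band count are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mels using the 2595*log10(1 + f/700) variant (assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Edges in Hz of n_bands bands that are equally wide on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


if __name__ == "__main__":
    # Hypothetical analysis range and band count, chosen only for the demo.
    edges = mel_band_edges(0.0, 8000.0, 8)
    for low, high in zip(edges, edges[1:]):
        print(f"{low:8.1f} Hz .. {high:8.1f} Hz  (width {high - low:7.1f} Hz)")
```

Bands that are equally wide in mels become progressively wider in Hz, so low frequencies are resolved finely and high frequencies coarsely, which is the behaviour the paragraph above attributes to the ear.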
The set of values for n features may be represented by a vector in an n-dimensional space; each such vector is then taken as a pattern.

2.2 Pattern Recognition Phase

In this phase we determine the class or category to which each cry pattern belongs. The set of feature vectors is divided into two subsets: the training set and the test set. First, the training set is used to teach the classifier how to distinguish the different cry types. Then, the test set is used to determine how well the classifier assigns the corresponding class to a pattern by means of the classification rule generated during training.

3 Binary and N-Class Support Vector Machines

The binary support vector classifier uses a discriminant function $f : X \subseteq \mathbb{R}^n \to \mathbb{R}$ of the following form:

$$f(x) = \langle \alpha, k_S(x) \rangle + b. \tag{1}$$

Here $k_S(x) = [k(x, s_1), \ldots, k(x, s_d)]^T$ is the vector of evaluations of kernel functions centered at the support vectors $S = \{s_1, \ldots, s_d\}$, $s_i \in \mathbb{R}^n$, which are usually a subset of the training data; $\alpha \in \mathbb{R}^d$ is a weight vector and $b \in \mathbb{R}$ is a bias. The binary classification rule $q : X \to Y = \{1, 2\}$ is defined as

$$q(x) = \begin{cases} 1 & \text{for } f(x) > 0, \\ 2 & \text{for } f(x) \le 0. \end{cases} \tag{2}$$

The n-class generalization involves a set of discriminant functions $f_y : X \subseteq \mathbb{R}^n \to \mathbb{R}$, $y \in Y = \{1, 2, \ldots, c\}$, defined as

$$f_y(x) = \langle \alpha_y, k_S(x) \rangle + b_y, \quad y \in Y. \tag{3}$$

Let the matrix $A = [\alpha_1, \ldots, \alpha_c]$ be composed of all weight vectors and let $b = [b_1, \ldots, b_c]^T$ be the vector of all biases. The multi-class classification rule $q : X \to Y = \{1, 2, \ldots, c\}$ is defined as

$$q(x) = \arg\max_{y \in Y} f_y(x). \tag{4}$$

In this formulation, however, unclassifiable regions remain where some $f_y(x)$ have the same value. Reference [3] proposes Fuzzy Support Vector Machines for the conventional one-to-(n-1) formulation to resolve unclassifiable regions.

4 Fuzzy Support Vector Machines

In this section we present the Fuzzy Support Vector Machines (FSVM) proposed in [3]. FSVM were introduced in order to decide on unclassifiable regions. In an n-class problem, for class i, one-dimensional membership functions $m_{ij}(x)$ are defined on the directions orthogonal to the optimal separating hyperplanes $f_j(x) = 0$ as follows.

For $i = j$:

$$m_{ii}(x) = \begin{cases} 1 & \text{for } f_i(x) > 1, \\ f_i(x) & \text{otherwise.} \end{cases} \tag{5}$$

For $i \ne j$:

$$m_{ij}(x) = \begin{cases} 1 & \text{for } f_j(x) < -1, \\ -f_j(x) & \text{otherwise.} \end{cases} \tag{6}$$

The class i membership function of x is defined using the minimum operator over $m_{ij}(x)$ ($j = 1, \ldots, n$):

$$m_i(x) = \min_{j = 1, \ldots, n} m_{ij}(x). \tag{7}$$

The datum x is classified into the class

$$\arg\max_{i = 1, \ldots, n} m_i(x). \tag{8}$$

In realizing the fuzzy pattern classification, it is not necessary to implement the membership functions $m_i(x)$ given by (7). The classification procedure is as follows.

1. For x, if $f_i(x) > 0$ is satisfied for only one class, the input is classified into that class. Otherwise, go to Step 2.
2. If $f_i(x) > 0$ is satisfied for more than one class $i$ ($i = i_1, \ldots, i_l$, $l > 1$), classify the datum into the class with the maximum $f_i(x)$ ($i \in \{i_1, \ldots, i_l\}$). Otherwise, go to Step 3.
3. If $f_i(x) \le 0$ is satisfied for all classes, classify the datum into the class with the minimum absolute value of $f_i(x)$.

5 Implementation

For the present experiments we worked with a corpus of infant cry patterns labeled with information such as the infant's age and the reason for the cry. The infant cries were recorded directly by medical doctors. After filtering and normalizing, each signal was divided into segments of one second duration. Then, acoustic features were extracted in the form of Mel Frequency Cepstral Coefficients (MFCC) using the freeware program Praat v4.0.8 [9]. Every one-second sample is divided into frames of 50 milliseconds, and from each frame we extract 16 coefficients. This procedure generates vectors of 304 coefficients per sample. In order to reduce the dimension of the sample vectors we apply Principal Component Analysis (PCA).

Our corpus is composed of 209 samples of pain cries, 759 samples of hunger cries, and 659 samples representing the no-pain-no-hunger class; this last set includes the sleepy and uncomfortable types. For the classification of pathological cries we had a corpus of 1627 samples from normal babies, 879 samples from deaf babies, and 340 samples from asphyxiating babies.
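As an illustration of how the decision rule of Section 4 might be coded, the following Python sketch implements both the membership-based rule of Eqs. (5)-(8) and the three-step procedure that avoids computing the memberships. It is a minimal sketch, not the authors' implementation: it assumes the one-against-rest decision values $f_1(x), \ldots, f_n(x)$ have already been computed by trained SVMs, the function names are hypothetical, and classes are indexed from 0.

```python
from typing import Sequence


def membership(f: Sequence[float], i: int) -> float:
    """Class-i membership m_i(x): the minimum over j of m_ij(x), Eqs. (5)-(7)."""
    degrees = []
    for j, fj in enumerate(f):
        if j == i:
            degrees.append(1.0 if fj > 1.0 else fj)     # Eq. (5)
        else:
            degrees.append(1.0 if fj < -1.0 else -fj)   # Eq. (6)
    return min(degrees)


def classify_by_membership(f: Sequence[float]) -> int:
    """Eq. (8): index of the class with the largest membership."""
    m = [membership(f, i) for i in range(len(f))]
    return max(range(len(f)), key=lambda i: m[i])


def classify_by_procedure(f: Sequence[float]) -> int:
    """Three-step procedure that does not evaluate the memberships."""
    positive = [i for i, fi in enumerate(f) if fi > 0]
    if len(positive) == 1:            # Step 1: exactly one positive output
        return positive[0]
    if len(positive) > 1:             # Step 2: several positive outputs
        return max(positive, key=lambda i: f[i])
    # Step 3: no positive output, take the smallest absolute value
    return min(range(len(f)), key=lambda i: abs(f[i]))


if __name__ == "__main__":
    # Hypothetical decision values for a 3-class problem.
    for f in [(0.5, -0.3, -2.0), (0.4, 0.7, -1.5), (-0.2, -0.5, -1.5)]:
        print(f, classify_by_membership(f), classify_by_procedure(f))
```

The second and third example inputs lie in regions that a conventional one-against-rest SVM leaves unclassifiable (two positive outputs, and no positive output); both functions still return a class for them. Ties between equal decision values are broken here by taking the first class, which is a choice of this sketch.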
All the parameter values of the classifier were set heuristically after several experiments. During the experiments, 10-fold cross validation was used to evaluate the performance and reliability of the classifier.

6 Experiments and Preliminary Results

Two different classification tasks were performed. In the first one, to identify pathologies, we worked with cry samples belonging to the Normal-Deaf-Asphyxia (N-D-A) classes. In the second classification task we had a corpus formed by samples of normal babies to identify the Pain-Hunger-No-Pain-No-Hunger (P-H-NPNH) classes. The results of the experiments are shown in Table 1 and Table 2, respectively. Each table shows the percentage of correct classification using SVM and FSVM. In each classification task a different number of principal components (PC) was used; the numbers of PC tested were 2, 3, 10, 16, and 50. The experiments show that the best results are achieved when 10 PC and FSVM are used.

Table 1. Results of Normal-Deaf-Asphyxia (N-D-A) infant cry classification with Support Vector Machines and Fuzzy Support Vector Machines

Problem (N-D-A), % Classification Accuracy
        PCA2    PCA3    PCA10   PCA16   PCA50
SVM
FSVM

Table 2. Results of Pain-Hunger-No-Pain-No-Hunger (P-H-NPNH) infant cry classification with Support Vector Machines and Fuzzy Support Vector Machines

Problem (P-H-NPNH), % Classification Accuracy
        PCA2    PCA3    PCA10   PCA16   PCA50
SVM
FSVM

When working with pathological and normal cry samples, the maximum correct classification obtained was %. The most poorly classified class was asphyxia, perhaps because of the smaller number of samples available. In the second classification task (P-H-NPNH) the best classification score was %, and the class that was hardest to identify was the hunger class. One reason might be that this class presents characteristics similar to those of the uncomfortable cries.

7 Conclusions

In this paper we present the automatic classification of infant cry by means of Fuzzy Support Vector Machines. Fuzzy Support Vector Machines were introduced to resolve the unclassifiable regions that remain with conventional Support Vector Machines. We worked with two different 3-class problems to classify infant cry: in the first one, to identify pathologies, the cry samples were divided into the Normal, Deaf, and Asphyxia classes (N-D-A); in the second one we used samples only of normal babies, labeled to identify the Pain, Hunger, and No-Pain-No-Hunger classes (P-H-NPNH).
In particular, for the kind of problems we explore in this work, Fuzzy Support Vector Machines showed an improvement in classification performance over conventional SVM. We obtained the best correct classification in both tasks when using 10 principal components. In the N-D-A problem we obtained % correct classification for SVM and % for FSVM, an average improvement of 0.21% in classification accuracy. In the P-H-NPNH problem the correct classification percentage obtained by SVM was 97.79%, and % by FSVM, an average improvement of 0.03% between the models. The infant cry classification results obtained so far with FSVM are very encouraging. We think that with a larger number of samples we could generalize our results better and move closer to a robust system that is applicable to real life and to other pathologies related to the central nervous system. The collection of more samples will also allow us to include a larger number of normal cry classes, and perhaps to deal with the identification of deafness levels as well.

Acknowledgments

This work is part of a project that is being financed by CONACYT-Mexico (C ).

References

1. Cortes, C., Vapnik, V.: Support Vector Networks. Machine Learning, Vol. 20 (1995)
2. Wan, V., Campbell, W.M.: Support Vector Machines for Speaker Verification and Identification. IEEE International Workshop on Neural Networks for Signal Processing, Sydney, Australia (2000)
3. Inoue, T., Abe, S.: Fuzzy Support Vector Machines for Pattern Classification. Proceedings of International Joint Conference on Neural Networks (IJCNN 01), Vol. 2 (2001)
4. Abe, S.: Pattern Classification: Neuro-Fuzzy Methods and their Comparison. Springer-Verlag, London (2001)
5. Levinson, S.E., Roe, D.B.: A Perspective on Speech Recognition. IEEE Communications Magazine (1990)
6. Orosco, J., Reyes, C.A.: Mel-Frequency Cepstrum Coefficients Extraction from Infant Cry for Classification of Normal and Pathological Cry with Feed-Forward Neural Networks. Proc. International Joint Conference on Neural Networks, Portland, Oregon, USA (2003)
7. Reyes, O., Reyes, C.A.: Clasificación de Llanto de Bebés para Identificación de Hipoacusia y Asfixia por medio de Redes Neuronales. Proc. of the II Congreso Internacional de Informática y Computación de la ANIEI, Zacatecas, México (2003)
8. Vojtech, F., Vaclav, H.: Statistical Pattern Recognition Toolbox. Czech Technical University, Prague (1999)
9. Boersma, P., Weenink, D.: Praat v 4.0.8: A System for Doing Phonetics by Computer. Institute of Phonetic Sciences of the University of Amsterdam (2002)
