CHAPTER 3 FEATURE EXTRACTION USING GENETIC ALGORITHM BASED PRINCIPAL COMPONENT ANALYSIS


Hortense Todd
5 years ago

3.1 INTRODUCTION

Cardiac beat classification is a key process in the detection of myocardial ischemic episodes in the electrocardiographic signal. Myocardial ischemia is caused by insufficient blood flow to the muscle tissue of the heart. This reduced blood supply may be due to narrowing of the coronary arteries, obstruction by a thrombus or, less commonly, diffuse narrowing of arterioles and other small vessels within the heart. Ischemia is one of the leading causes of death in modern societies and, as a consequence, its early diagnosis and treatment are of great importance.

In the ECG signal, ischemia is expressed as slow dynamic changes of the ST segment and/or the T wave. Long duration electrocardiography, such as Holter recordings or continuous ECG monitoring in the coronary care unit, is a simple and noninvasive method to observe such alterations. The development of suitable automated analysis techniques can make this type of ECG recording very effective in supporting the physician's diagnosis and in guiding patient management in clinics and clinical applications. Accurate ischemic episode detection in the recorded ECG is based on the correct classification of the ischemic cardiac beats. Several techniques have been proposed for ischemic beat classification, which evaluate the ST segment changes and the T-wave alterations with different methodologies.

3.2 PREPROCESSING OF ECG

The main aim of ECG signal preprocessing is to prepare a compact description of the ST-T complex, composed of the ST segment and the T wave, for input to the classification methodology with the minimum loss of information. ECG recordings that are used for the diagnosis of ischemic episodes are still affected by noise, which significantly degrades the diagnostic accuracy. Better handling of noisy ECGs can improve the accuracy of the diagnostic methods and increase their application in everyday practice. There are three types of noise in the ECG signal:

(a) Power line interference (A/C interference),
(b) Electromyographic contamination (EMG noise), and
(c) Baseline wandering (BW).

A/C interference contaminates the ECG signal with mains frequency interference, which is sometimes phase-shifted with respect to the mains voltage (50 or 60 Hz). EMG noise is correlated with muscle contraction and overlaps with the frequency spectrum of the ECG signal. Consequently, removing the EMG noise also alters the original ECG signal. Finally, baseline wandering is caused by respiration and motion artifacts and is generally a low frequency noise.

3.3 FEATURE EXTRACTION USING GENETIC PRINCIPAL COMPONENT ANALYSIS

This section describes the feature extraction process from the beat signals extracted from the electrocardiograms. Two methods are used for feature extraction, namely PCA and GPCA. The main goal of this

work is to develop algorithms to automatically detect ischemia episodes. For this purpose, features based on ST segment deviation, T wave and QRS complex morphology changes were extracted.

3.3.1 Feature Extraction Using Principal Component Analysis

Principal Components Analysis (PCA) is an exploratory multivariate statistical technique for simplifying complex data sets. The PCA transformation is selected as the tool for reducing the dimensionality of the extracted ST-T samples. The PCA decomposition is optimum in terms of second order statistics, in the sense that it permits an optimal reconstruction of the original data in the mean-square error sense (subject to the dimensionality constraint). The PCA transformation describes the original vectors (ST-T complexes) according to the directions of maximum variance in the training set. This information is obtained by analyzing the data covariance matrix. The orthogonal eigenvectors of the covariance matrix are selected as basis functions for the signal projection operation. The corresponding eigenvalues represent the average dispersion of the projection of the input vectors onto the corresponding eigenvectors (basis functions). The numerical value of each eigenvalue quantifies the amount of variance that is accounted for by projecting the signal onto the corresponding eigenvector. Accordingly, it represents the contribution of the eigenvector's analysis direction to the signal reconstruction in the mean squared error sense.

For the analysis of the ECG signal, the eigenvalues after the fifth have very small numerical values. Thus, for the representation of the ST-T complex, the first five PCA coefficients were used, characterizing about 97.9% of the signal energy. A small performance improvement was observed when using the first five PCA coefficients instead of four. The five principal components extracted from the corresponding ST-T complex are assigned to each QRS fiducial point.
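As an illustration of the decomposition described above, the following Python sketch computes the covariance eigenvectors of a set of ST-T vectors, orders them by eigenvalue, and measures the fraction of energy captured by the first r components. NumPy, the function names `pca_fit`, `pca_project` and `energy_fraction`, and the 1/N covariance normalisation are assumptions made for illustration; they are not taken from the implementation used in this work.

```python
import numpy as np

def pca_fit(X):
    """X has one ST-T vector per row. Returns mean, eigenvalues, eigenvectors.

    Eigenvalues are sorted in descending order; eigenvectors are the
    matching columns. The 1/N normalisation is an assumption.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / X.shape[0]            # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
    return mu, eigvals[order], eigvecs[:, order]

def pca_project(X, mu, eigvecs, r):
    """Project mean-centred vectors onto the first r eigenvectors."""
    return (X - mu) @ eigvecs[:, :r]

def energy_fraction(eigvals, r):
    """Fraction of total variance carried by the first r eigenvalues."""
    return float(np.sum(eigvals[:r]) / np.sum(eigvals))
```

With real ST-T data, `energy_fraction(eigvals, 5)` would be the quantity compared against the figure of about 97.9% quoted above.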
The first principal component (PC) and, to a lesser extent, the second represent the dominant low-frequency component of the ST-T complex; the third, fourth, and fifth contain more high-frequency energy. In the time series representation of the PCs, the ischemic episodes appear as peaks. A straightforward way to detect ischemic beats from the PCA representation is to use the PCA coefficients of a single beat as the input vector. This approach clearly accounts only for local information. A better approach is therefore needed, one that also extracts morphological information from the ST-T episodes so as to distinguish artifacts and to detect even weak ST episodes. Such an approach should take into account the information from a sequence of beats instead of a single beat.

Given n observations on m variables, the goal of PCA is to reduce the dimensionality of the data matrix by finding r new variables, where r is less than m. Principal components project high dimensional data into the subspace spanned by the eigenvectors with the r largest eigenvalues while remaining mutually uncorrelated and orthogonal. Each principal component is a linear combination of the original variables.

The principal components are obtained for a vector set X represented by an N × M matrix, where N is the number of segments and M is the dimension of the vectors that constitute the vector set. The PCA algorithm is as follows:

a. Obtain the mean vector:

$$\mu_x = \frac{1}{N} \sum_{i=0}^{N-1} x_i$$

b. Obtain the covariance matrix:

$$C_x = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu_x)(x_i - \mu_x)^T$$

c. Obtain the eigenvectors and eigenvalues:

$$C_x e = \lambda e$$

where e is an eigenvector and λ is the corresponding eigenvalue.

d. After creating the eigenspace we can proceed to recognition. Given a new beat of an individual, the signals are concatenated in the same way as for training, the mean vector is subtracted, and the result is projected into the eigenspace:

$$\omega_k = e_k^T (x - \mu_x), \quad k = 1, \ldots, M$$

where x is the new beat vector and μ_x is the mean vector. These calculated values together form a vector $\Omega^T = [\omega_1, \omega_2, \ldots, \omega_M]$. Ω is then used to establish which of the pre-defined classes best describes the new signal. The simplest way is to determine the class k that minimizes the Euclidean distance

$$\epsilon_k = \lVert \Omega - \Omega_k \rVert^2$$

where Ω_k is a vector describing the k-th signal class. A signal is classified as belonging to a certain class when the minimum ε_k (i.e. the maximum matching score) is below a chosen threshold.

Choosing components and forming a feature vector: The experiments yield 2040 components, corresponding to the dimensionality of the input sequence. Components that contribute significantly to the total energy of the signal are selected. The selected components together must constitute about 99% of the total energy of the signal. This procedure decreases the data dimensionality without significant loss of information.

There are at least three proposed ways to eliminate eigenvectors. The first is the elimination, mentioned above, of the eigenvectors with the smallest eigenvalues. This can be accomplished by discarding the last 60% of the total number of eigenvectors.
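Step d and the 99% energy criterion can be sketched as follows. This is a minimal illustration under stated assumptions: NumPy is assumed, the names `project`, `nearest_class` and `n_components_for_energy` are hypothetical, eigenvectors are assumed to be stored as matrix columns sorted by descending eigenvalue, and the class vectors Ω_k are assumed to be supplied by the caller (for example as class means of projected training beats).

```python
import numpy as np

def project(beat, mu, eigvecs):
    """omega_k = e_k^T (x - mu) for every eigenvector column e_k."""
    return eigvecs.T @ (beat - mu)

def nearest_class(omega, class_vectors, threshold):
    """Return (label, distance) for the class with the smallest squared
    Euclidean distance; label is None when that distance is not below
    the threshold."""
    dists = {k: float(np.sum((omega - v) ** 2)) for k, v in class_vectors.items()}
    best = min(dists, key=dists.get)
    label = best if dists[best] < threshold else None
    return label, dists[best]

def n_components_for_energy(eigvals, energy=0.99):
    """Smallest number of leading eigenvalues whose cumulative share of
    the total reaches `energy` (eigvals sorted in descending order)."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    count = int(np.searchsorted(ratios, energy)) + 1
    return min(count, len(eigvals))
```

For instance, with eigenvalues 6, 3 and 1 the cumulative energy ratios are 0.6, 0.9 and 1.0, so an energy target of 0.85 keeps two components and a target of 0.95 keeps all three.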

The second way is to use the minimum number of eigenvectors that guarantees that the energy E is greater than a threshold. A typical threshold is 0.9 (90% of the total energy). If we define E_i as the energy of the i-th eigenvector, it is the ratio of the sum of all eigenvalues up to and including i over the sum of all the eigenvalues:

$$E_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{k} \lambda_j}$$

where k is the total number of eigenvectors.

The third variation depends upon the stretching dimension. The stretch for the i-th eigenvector is the ratio of that eigenvalue over the largest eigenvalue (λ_1):

$$S_i = \lambda_i / \lambda_1$$

In our proposed method, a Genetic Algorithm (GA) is used to select the best eigenvectors.

3.3.2 Genetic Algorithm Approach

The Genetic Algorithm is an adaptive heuristic method for global optimization search, and it simulates the behaviour of the evolution process in nature. It maps the search space into a genetic space. That is, every candidate solution is encoded into a vector called a chromosome. One element of the vector represents a gene. All of the chromosomes make up a population and are evaluated according to the fitness function. A fitness value is used to measure the fitness of a chromosome. The initial population in the genetic process is created randomly. GA then uses three operators to produce the next generation from the current generation: reproduction, crossover, and mutation. GA eliminates the chromosomes of low fitness and keeps the ones of high fitness. This whole process is repeated, and more

chromosomes of high fitness move to the next generation, until a good chromosome (individual) is found. The main objective of the genetic feature selection stage is to reduce the dimensionality of the problem before the supervised inductive learning process. Among the many wrapper algorithms used, the Genetic Algorithm (GA), which solves optimization problems using the methods of evolution, specifically survival of the fittest, has proved to be a promising one. GA evaluates each individual's fitness as well as the quality of the solution. The fitter individuals are more eligible to enter the next generation of the population. After the required number of generations, the final population with the fittest chromosomes emerges, giving the solution. The process of selection, crossover and mutation continues for a fixed number of generations or until a termination condition is satisfied. Genetic algorithms have been used for selecting the optimal subspace in which the projected data gives higher recognition accuracy.

3.3.3 Genetic Principal Component Analysis

The input data is transformed to a higher dimension using a nonlinear transfer function (polynomial function), and GA is used to select the optimal subset of the non-linear principal components, with the fitness function taken as the recognition performance. As explained in the previous section, there are three possible ways to eliminate eigenvectors. Here, the GA is used to select the best eigenvectors for PCA. In general, the M eigenvectors having the highest eigenvalues are selected. The main drawback of general PCA is that we cannot expect an equal contribution of principal components from each class, and the principal components are selected based only on the highest eigenvalues. In the proposed method, we choose only F eigenvectors for each class, so the reduced feature set contains S = NC × F features, where NC is the number of classes. In this case, there are two

classes: ischemic and non-ischemic. The basic idea here is that, instead of choosing the eigenvectors with the highest eigenvalues from the entire eigenspace, we choose the best eigenvectors for each class based on Euclidean distance. For each class the principal components are selected using GA as discussed below.

Initially, the eigenspace is grouped according to the number of classes (NC). Then n eigenvectors are selected at random from each class. The index of each eigenvector is used to construct one chromosome. In the same way, N chromosomes are generated (N = 10). For example, consider a chromosome in which each integer represents one eigenvector: the first value, 980, stands for the 980th eigenvector from the first class, and the 24 in the second row represents the 24th eigenvector from the second class. The total length of the chromosome is equal to the number of principal components required. Here, we kept the size as 600. For each chromosome, the Euclidean distance within the class (W) and between the classes (B) is calculated. The fitness value is calculated as:

$$f(x) = B / W$$

The chromosome which has the minimum fitness value (Gmin) is stored as the best eigenvector set. Then the genetic operators are applied to search for the optimum set.

Reproduction (selection): The selection process selects chromosomes from the mating pool directed by the survival of the fittest concept of natural genetic systems. In the proportional selection strategy

adopted in this article, a chromosome is assigned a number of copies, proportional to its fitness in the population, that go into the mating pool for further genetic operations. Roulette wheel selection is one common technique that implements the proportional selection strategy.

Crossover: Crossover is a probabilistic process that exchanges information between two parent chromosomes to generate two child chromosomes. In this work, single point crossover with a fixed crossover probability of p_c = 0.6 is used. For chromosomes of length l, a random integer, called the crossover point, is generated in the range [1, l-1]. The portions of the chromosomes lying to the right of the crossover point are exchanged to produce two offspring.

Mutation: Each chromosome undergoes mutation with a fixed probability p_m. For a binary representation of chromosomes, a bit position (or gene) is mutated by simply flipping its value. Since we are considering real numbers, a random position is chosen in the chromosome and replaced by a random number between 0 and 9.

The new population is generated after the genetic operators are applied. The current best eigenvector set (Lmin) is selected from the new population and compared with the global one. If the global set has a lower fitness value than the local one, the next iteration continues with the old population. Otherwise, the current population is used for the next iteration. This process is repeated for k iterations. Figure 3.1 shows a flow chart for genetic PCA based feature extraction.

The algorithm is given as:

1. Construct the initial population (p1) with random eigenvectors
2. Calculate the fitness value f(x) = B / W
3. Find out the global minimum (Gmin)

4. For i = 1 to k do
   a. Perform reproduction
   b. Apply the crossover operator between each parent
   c. Perform mutation and get the new population (p2)
   d. Calculate the local minimum (Lmin)
   e. If Gmin > Lmin then
      i. Gmin = Lmin
      ii. p1 = p2
5. Repeat

Figure 3.1 A flow chart for genetic PCA based feature extraction
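The steps above can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this work: the function names are hypothetical, the fitness B / W is passed in as a callable (`fitness`) and is assumed to be non-negative, the mutation probability default `pm=0.05` is a placeholder because no value is given in the text, and mutation here replaces a gene with a random eigenvector index instead of a digit between 0 and 9. The loop follows the text literally in keeping the chromosome with the minimum fitness value.

```python
import random

def roulette_select(population, scores, rng):
    """Proportional selection: copies are drawn with probability
    proportional to fitness (scores must be non-negative)."""
    if sum(scores) <= 0:
        return [list(c) for c in population]
    picks = rng.choices(population, weights=scores, k=len(population))
    return [list(c) for c in picks]

def single_point_crossover(a, b, pc, rng):
    """Swap the parts right of a random cut in [1, l-1] with probability pc."""
    if len(a) > 1 and rng.random() < pc:
        cut = rng.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chrom, pm, n_eigvecs, rng):
    """With probability pm, replace one random gene by a random index."""
    if rng.random() < pm:
        chrom[rng.randrange(len(chrom))] = rng.randrange(n_eigvecs)
    return chrom

def gpca_search(fitness, n_eigvecs, chrom_len, pop_size=10,
                pc=0.6, pm=0.05, iters=50, seed=0):
    rng = random.Random(seed)
    # Step 1: initial population p1 of random eigenvector indices
    p1 = [[rng.randrange(n_eigvecs) for _ in range(chrom_len)]
          for _ in range(pop_size)]
    scores = [fitness(c) for c in p1]                  # step 2
    g = min(range(pop_size), key=scores.__getitem__)   # step 3
    g_min, g_best = scores[g], list(p1[g])
    for _ in range(iters):                             # step 4
        mating = roulette_select(p1, scores, rng)      # 4a
        p2 = []
        for i in range(0, pop_size - 1, 2):            # 4b and 4c
            c1, c2 = single_point_crossover(mating[i], mating[i + 1], pc, rng)
            p2 += [mutate(c1, pm, n_eigvecs, rng), mutate(c2, pm, n_eigvecs, rng)]
        if len(p2) < pop_size:                         # odd population size
            p2.append(mutate(mating[-1], pm, n_eigvecs, rng))
        s2 = [fitness(c) for c in p2]
        l = min(range(pop_size), key=s2.__getitem__)   # 4d
        if g_min > s2[l]:                              # 4e
            g_min, g_best = s2[l], list(p2[l])
            p1, scores = p2, s2
    return g_best, g_min
```

A call such as `gpca_search(my_fitness, n_eigvecs=2040, chrom_len=600)` would mirror the sizes quoted in the text, where `my_fitness` is a user-supplied function that evaluates B / W for the eigenvectors indexed by a chromosome.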

3.4 GENETIC PCA FOR ISCHEMIC BEATS CLASSIFICATION

Electrocardiography is a significant tool in analyzing the condition of the heart. The ECG is the record of the variation of bioelectric potential with respect to time as the human heart beats. It provides valuable information about the functional characteristics of the heart and cardiovascular system. Myocardial ischemia is one of the diseases with the highest incidence rate in the industrialized countries. Prolonged, severe or repeated ischemic episodes can provoke irreversible damage to the cardiac tissue. ECG analysis is not the most accurate method that exists to detect ischemic events. We propose an improved version of PCA for feature extraction. Here, the Genetic Algorithm (GA) is combined with PCA to extract more relevant features. A Back Propagation Neural Network (BPN) is used to classify the beats as either ischemic or non-ischemic, using the features from the GPCA.

Figure 3.2 Block diagram for GPCA based ischemic beats classification

The classifier employed in this work is a three-layer Back Propagation Neural Network. The BPN optimizes the net for correct responses to the training input data set. More than one hidden layer may be beneficial for some applications, but one hidden layer is sufficient if enough hidden neurons are used. Initially, the extracted features are normalized to the range [0, 1]. That is, each value in the feature set is divided by the maximum value in the set.

Figure 3.3 A three-layer back propagation network (input neurons, weights W_ih, hidden neurons with output S_1, weights W_ho, output neuron with output S_2)

These normalized values are assigned to the input neurons. The number of hidden neurons is equal to the number of input neurons, and there is only one output neuron. Figure 3.3 shows a three-layer back propagation network for classification. Initial weights are assigned randomly in the range [-0.5, 0.5]. The output from each hidden neuron is calculated using the sigmoid function,

$$S_1 = \frac{1}{1 + e^{-\lambda x}}$$

where λ = 1 and $x = \sum_i w_{ih} k_i$, where w_ih is the weight assigned between the input and hidden layers, and k is the input value. The output from the output layer is calculated using the sigmoid function,

$$S_2 = \frac{1}{1 + e^{-\lambda x}}$$

where λ = 1 and $x = \sum_i w_{ho} S_i$, where w_ho is the weight assigned between the hidden and output layers, and S_i is the output value from the hidden neurons. S_2 is subtracted from the desired output. Using this error (d) value, the weight change is calculated as:

$$\mathrm{delta} = d \cdot S_2 \cdot (1 - S_2)$$


and the weights assigned between the input and hidden layers and between the hidden and output layers are updated as:

$$W_{ho} = W_{ho} + (n \cdot \mathrm{delta} \cdot S_1)$$

$$W_{ih} = W_{ih} + (n \cdot \mathrm{delta} \cdot k)$$

where n is the learning rate and k is the input value. The outputs of the hidden and output neurons are then calculated again, the error (d) value is checked, and the weights are updated. This procedure is repeated until the network output is equal to the desired output. The network is trained to produce an output value of 1.0 for ischemic beats and 0.1 for non-ischemic beats. The classification performance is validated using the ten-fold validation method, and the results were analyzed using ROC analysis. Figure 3.4 shows a flow chart for the three-layer back propagation neural network classifier.

Figure 3.4 A flow chart for back propagation neural network classifier
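A minimal NumPy sketch of this training procedure is given below, under several assumptions that are not from the text: the function names `train_bpn` and `predict`, the learning rate, the number of epochs, the fixed epoch count used as a stopping rule, and the absence of bias terms. The output-layer update matches the formula above. For the input-to-hidden weights the sketch uses the standard chain-rule form of back propagation, which scales the output delta by W_ho and by the hidden sigmoid derivative S_1 (1 - S_1), rather than applying the output delta directly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bpn(X, y, lr=0.5, epochs=2000, seed=0):
    """X: normalised features in [0, 1], one beat per row.
    y: targets (1.0 ischemic, 0.1 non-ischemic). Returns (W_ih, W_ho)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    n_hidden = n_in                       # as many hidden as input neurons
    W_ih = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
    W_ho = rng.uniform(-0.5, 0.5, size=n_hidden)
    for _ in range(epochs):
        for k, target in zip(X, y):
            S1 = sigmoid(k @ W_ih)        # hidden layer outputs
            S2 = sigmoid(S1 @ W_ho)       # single output neuron
            d = target - S2               # error
            delta = d * S2 * (1.0 - S2)
            # Standard back propagation for the hidden layer (assumption)
            delta_h = delta * W_ho * S1 * (1.0 - S1)
            W_ho += lr * delta * S1
            W_ih += lr * np.outer(k, delta_h)
    return W_ih, W_ho

def predict(X, W_ih, W_ho):
    """Network output for each row of X."""
    return sigmoid(sigmoid(X @ W_ih) @ W_ho)
```

Before training, raw features could be scaled as described in the text with `X = X / X.max()`.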

3.5 RESULTS AND DISCUSSION

The European ST-T Database is used for evaluation of our proposed algorithm. This database consists of 90 annotated excerpts of ambulatory ECG recordings from 79 subjects. The subjects were 70 men aged 30 to 84, and 8 women aged 55 to 71. The database includes 367 episodes of ST segment change, and 401 episodes of T-wave change. Each record is two hours in duration and contains two signals, each sampled at 250 samples per second with 12-bit resolution over a nominal 20 millivolt input range. Two cardiologists worked independently to annotate each record beat-by-beat and for changes in ST segment and T-wave morphology, rhythm, and signal quality. ST segment and T-wave changes were identified in both leads (using predefined criteria which were applied uniformly in all cases), and their onsets, extrema and ends were annotated. Annotations made by the two cardiologists were compared, disagreements were resolved by the coordinating group in Pisa, and the reference annotation files were prepared; altogether, these files contain 802,866 annotations. Over half (48 of 90 complete records, and reference annotation files for all records) of this database is freely available from PhysioNet.

In this work, full length ECG signals from 17 patients were used. Each signal was translated into 120 samples, giving a total of 2040 beats for short duration analysis, and 20,400 beats were extracted for long duration analysis. This dimensionality is reduced by GPCA as discussed in the earlier section. Figure 3.5 shows the comparison of sensitivity at each fold for our proposed method and the existing feature extraction methods. As shown in the figure, GPCA gives consistent and improved results.

Figure 3.5 Comparison of sensitivity with feature extraction methods (sensitivity in percent for PCA, Fuzzy and GPCA)

Figure 3.6 shows the Az value of the existing and the proposed methods for automated ischemic beat classification. The area under the ROC curve is an important criterion for evaluating diagnostic performance. It is usually referred to as the Az index: the Az value of a ROC curve is simply the area under that curve. The value of Az is 1.0 when the diagnostic detection has perfect performance, which means that the TP rate is 100% and the FP rate is 0%. Table 3.1 gives the performance analysis of ischemic beat classification in terms of sensitivity and Az value.

Figure 3.6 Comparison of Az value with feature extraction methods (Az value for PCA, Fuzzy and GPCA)

Table 3.1 Performance analysis of ischemic beat classification

Methods                         Sensitivity   Az value
Principal Component Analysis    80%           0.78
Fuzzy Approach                  81%           0.80
Genetic based PCA               92%           0.90

The Receiver Operating Characteristic (ROC) curve is one of the performance measures for classification. ROC curves measure predictive utility by showing the trade-off between the true-positive rate and the false-positive rate inherent in selecting specific thresholds on which predictions might be based. The area under this curve represents the probability that, given a positive case and a negative case, the classifier output will be higher for the positive case, and it does not depend on the choice of decision threshold. Figure 3.7 shows the ROC curves for comparison of classification performances for the proposed method.

Figure 3.7 ROC curve analysis of ischemic beat classification
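The probabilistic reading of the area under the ROC curve can be made concrete with a short Python sketch. The function name `auc_pairwise` and the convention of counting ties as one half are assumptions for illustration; the code simply estimates the probability that a positive case scores higher than a negative case by checking every positive/negative pair.

```python
def auc_pairwise(scores_pos, scores_neg):
    """Estimate the area under the ROC curve as the fraction of
    (positive, negative) pairs in which the positive case has the
    higher classifier output; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, with positive scores 0.9 and 0.4 and negative scores 0.5 and 0.1, three of the four pairs are ordered correctly, so the estimate is 0.75.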

The ROC curve conveniently displays diagnostic accuracy expressed in terms of sensitivity (or true-positive rate) against (1 - specificity) (or false-positive rate) at all possible threshold values. The performance of each test is characterized in terms of its ability to identify true positives while rejecting false positives, with the following definitions:

False Positive Fraction (FPF) = FP / (TN + FP)
True Positive Fraction (TPF) = TP / (TP + FN)
True Negative Fraction (TNF) = TN / (TN + FP)
False Negative Fraction (FNF) = FN / (TP + FN)

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative test results, respectively. Note that because every actual positive results in either a true positive or a false negative, while every actual negative results in either a true negative or a false positive, TPF is the ratio of true positives (actually positive and reported positive) to actual positives, and TNF is the ratio of true negatives to actual negatives. Two other quantities of interest for performance characterization are defined in terms of the above quantities, as follows:

Sensitivity = TPF
Specificity = TNF = 1.0 - FPF

Choosing a value of the threshold c defines an operating point, at which the test has a particular combination of sensitivity and specificity. A plot of TPF versus FPF for all possible operating points is the ROC curve for test X, which makes explicit the trade-off between sensitivity and specificity for the test. Both TPF and FPF range from 0 to 1, so the ROC curve is often plotted within a unit square. The results showed that our proposed GPCA method extracts more relevant features than the linear PCA and other methods.
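The definitions above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, a beat is assumed to be reported positive when its score is at least the threshold c, and both classes are assumed to be present so that no denominator is zero.

```python
def confusion_counts(scores, labels, c):
    """Count (TP, FP, TN, FN) when a score >= c is reported positive.
    labels: 1 for actual positive, 0 for actual negative."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        positive = s >= c
        if y == 1 and positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def roc_fractions(tp, fp, tn, fn):
    """FPF, TPF, TNF and FNF as defined in the text."""
    return {
        "FPF": fp / (tn + fp),
        "TPF": tp / (tp + fn),
        "TNF": tn / (tn + fp),
        "FNF": fn / (tp + fn),
    }

def roc_points(scores, labels):
    """(FPF, TPF) operating points, one per distinct score used as the
    threshold, starting from the point (0, 0)."""
    points = [(0.0, 0.0)]
    for c in sorted(set(scores), reverse=True):
        f = roc_fractions(*confusion_counts(scores, labels, c))
        points.append((f["FPF"], f["TPF"]))
    return points
```

Sensitivity is then `roc_fractions(...)["TPF"]` and specificity is `roc_fractions(...)["TNF"]`, which equals 1 minus the FPF.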

Table 3.2 The value of sensitivity at each fold for different extraction methods (columns: Fold, PCA (%), Fuzzy (%), GPCA (%))

Table 3.2 shows the sensitivity at each fold. Here the ten-fold validation method has been applied to analyze the performance against linear PCA. The sensitivity is higher than that of the previously described algorithms, and the Az value is also better than that of the other methods. Table 3.3 shows the performance of GPCA in terms of detection rate. The average sensitivity, specificity and classification accuracy obtained by the evolved BPNNs were approximately 91%, % and 90.24% respectively.
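The ten-fold validation procedure can be sketched as below. This is one common way to build the folds and is not necessarily how the folds were formed in this work: the function names, the shuffling seed and the interleaved assignment of beats to folds are assumptions for illustration.

```python
import random

def fold_indices(n_beats, n_folds=10, seed=0):
    """Shuffle beat indices and deal them into n_folds disjoint folds."""
    idx = list(range(n_beats))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def train_test_splits(n_beats, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = fold_indices(n_beats, n_folds, seed)
    for f in range(n_folds):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test
```

With the 2040 beats used for short duration analysis, each of the ten test folds would hold 204 beats, and the sensitivity would be computed once per fold on its test beats.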

Table 3.3 Performance of GPCA for detection rate (columns: Record number, No. of normal beats, No. of abnormal beats, TP, FP, TN, FN, Detection rate (%), FP rate (%), Se (%), Sp (%), Acc (%); one row per record and a Total row)

Table 3.4 shows the performance analysis of accuracy at each fold. The average testing and training accuracies obtained were 93.58% and 90.14% respectively. The current approach is able to clarify the type of each detected episode (different types of ST segment vs. T-wave changes) with high rates of sensitivity, specificity and accuracy. Table 3.5 shows the performance analysis of GPCA for long duration ECGs.

Table 3.4 Performance analysis of accuracy at each fold (columns: Fold, No. of training beats (Normal, Abnormal), No. of testing beats (Normal, Abnormal), Training accuracy, Testing accuracy; one row per fold, with Total and Average value rows)

Table 3.5 Performance analysis of GPCA for long duration ECGs (columns: Total no. of beats, No. of normal beats, No. of abnormal beats, TP, FP, TN, FN, Detection rate (%), FP rate (%), Se (%), Sp (%), Accuracy (%))

3.6 SUMMARY OF CONTRIBUTION

In this work, an enhanced version of PCA for ischemia detection has been proposed. The Genetic Algorithm (GA) is combined with PCA to select more relevant principal components from the feature vector set of the ECG signals. Initially, the features are extracted from the ECG signals as eigenvectors and eigenvalues. Because the number of samples is large, the dimensionality of this vector space is reduced with the proposed Genetic based Principal Component Analysis (GPCA). The extracted features are fed into a three-layer BPN to classify the beats as ischemic or non-ischemic. The results showed that the proposed GPCA method extracts more relevant features than linear PCA, and that it is also applicable to long duration ECG analysis.
