Computational paradigms for the measurement signals processing. Methodologies for the development of classification algorithms.


Kelley Hart
6 years ago

1 Computational paradigms for the measurement signals processing. Methodologies for the development of classification algorithms. January 5, 25

2 Outline
- Methodologies for the development of classification algorithms
- The binary classification problem
- The ROC curve theory
- Principal Component Analysis (PCA)
- The volcanic ash granulometry classification problem
- Falls and ADL classification problem


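3 Design of the granulometry classifier
Methodology: a piezoelectric transducer converts volcanic ash impacts into a voltage signal.
Modeling: the ash hits the piezoelectric transducer (7BB-35-3L by Murata) with a force F(t) described by the Hertz law; the peak force F_p depends on the ash particle radius R_A, its velocity v_a and the constants k_P and k_A.
Output voltage: V_out is an exponentially damped sinusoid at frequency f_n with amplitude
$A = \frac{g_{33} F_p d}{S}$
where g_33 = 22e-3 Vm/N, d = 0.23e-3 m, S = 4.55e-4 m^2.
[Figure: volcanic ash impacting the piezoelectric transducer; the label 45 appears in the drawing.]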
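As a rough numerical illustration of the amplitude relation, the sketch below evaluates A = g33 * F_p * d / S with the transducer constants listed on the slide. It assumes that this is the intended reading of the amplitude formula and that d is 0.23e-3 m; the function name and the peak force value are hypothetical and chosen only for illustration.

```python
# Transducer constants as listed on the slide.
G33 = 22e-3   # piezoelectric voltage constant, V*m/N
D = 0.23e-3   # thickness, m (assumed reading of ".23e-3")
S = 4.55e-4   # area, m^2


def output_amplitude(peak_force):
    """Amplitude A (volts) for a given peak impact force F_p (newtons)."""
    return G33 * peak_force * D / S


# Hypothetical peak force, not taken from the slides.
amplitude = output_amplitude(0.01)
print(f"A = {amplitude:.3e} V")
```

Under this model the amplitude is linear in the peak force, which is what allows the peak of the output signal to be related to the impact.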

4 Experimental evidence The peak of the output signal can be related to the ash size

5 Granulometry classification
Looking at the distributions of the output-voltage peaks, the three granulometry classes (small, medium, big) are well separated, although a partial overlap is observed. We can treat the classification problem as two separate binary classification problems, with thresholds Th1 and Th2.
[Figure: histograms (Frequency vs. Output Voltage (V)) of the small, medium and big classes, with the thresholds Th1 and Th2.]
How to fix the optimal threshold values?
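The two-threshold idea can be sketched as a pair of chained binary decisions. This is only an illustration: it assumes that larger ash produces a larger peak voltage (so the classes are ordered small, medium, big along the voltage axis), and the function name and threshold values are hypothetical.

```python
def classify_granulometry(peak_voltage, th1, th2):
    """Classify a peak output voltage using two thresholds, th1 < th2."""
    if th1 >= th2:
        raise ValueError("th1 must be smaller than th2")
    if peak_voltage < th1:      # first binary problem: small vs. medium
        return "small"
    if peak_voltage < th2:      # second binary problem: medium vs. big
        return "medium"
    return "big"


# Hypothetical thresholds (volts), for illustration only.
TH1, TH2 = 0.5, 1.5
for peak in (0.2, 0.9, 2.3):
    print(peak, classify_granulometry(peak, TH1, TH2))
```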

6 The ROC curve theory


7 The binary classification problem
Let us consider a two-class prediction problem (binary classification) with classes A and B separated by a threshold s.
Example: x (mg/ml) is the concentration of a substance in the blood.
- lower values of x: physiological state (A)
- higher values of x: possible pathology (B)
Target: to fix a value s (threshold) such that
- x < s: probably belonging to A
- x > s: probably belonging to B


10 The binary classification problem
Let us consider a two-class prediction problem (binary classification) with classes A and B separated by a threshold s.
How to fix the optimal threshold value?
- ROC curves theory
- Principal Component Analysis (PCA)

11 ROC curves theory. The binary classification problem
Let us consider a two-class prediction problem (binary classification) with a Negative and a Positive population distributed along x.
For every possible cut-off point or criterion value (s) adopted to discriminate between the two populations, there will be four possible outcomes:
- some instances correctly classified as positive (TP = True Positive)
- some instances wrongly classified as negative (FN = False Negative)
- some instances correctly classified as negative (TN = True Negative)
- some instances wrongly classified as positive (FP = False Positive)
[Figure: frequency distributions of the Negative and Positive populations along x, with the threshold s.]


12 ROC curves theory. The binary classification problem
Let us consider a two-class prediction problem (binary classification). Given a value s of the discrimination threshold, there are four possible outcomes: True Negative, True Positive, False Positive, False Negative.
Confusion matrix (Error/Contingency table):
              Predicted n    Predicted p
Actual N          TN             FP
Actual P          FN             TP
$\text{sensitivity} = \text{recall} = \frac{TP}{TP + FN}$
$\text{specificity} = \frac{TN}{TN + FP}$
Sensitivity and specificity are inversely related: changing s raises one and lowers the other.
$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\text{precision} = \frac{TP}{TP + FP}$
...and others.
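A minimal sketch of how the four outcomes and the derived metrics can be computed for one threshold value. It assumes the convention used earlier (x > s is predicted positive); the function names and the toy data are hypothetical.

```python
def confusion_counts(values, labels, s):
    """Return (TP, FN, TN, FP), predicting positive when x > s."""
    tp = fn = tn = fp = 0
    for x, is_positive in zip(values, labels):
        predicted_positive = x > s
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and precision from the counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Precision is undefined when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Hypothetical toy data: four negatives followed by four positives.
values = [0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 0.9, 1.1]
labels = [False, False, False, False, True, True, True, True]

counts = confusion_counts(values, labels, s=0.65)
print(counts, metrics(*counts))
```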

13 ROC curves theory. The binary classification problem
Example 1: the ideal binary classification problem.
There is at least one value of the discrimination threshold that gives
$\text{sensitivity} = \frac{TP}{TP + FN} = 1, \quad \text{specificity} = \frac{TN}{TN + FP} = 1$
[Figure: frequency distributions of the two classes along x, and Sensitivity and Specificity as functions of the threshold.]

14 ROC curves theory. The binary classification problem
Example 2: the worst binary classification problem. The two classes are perfectly overlapped.
[Figure: frequency distribution of the two classes along x, and Sensitivity and Specificity as functions of the threshold s.]

15 ROC curves theory. The binary classification problem
Example 3: the real binary classification problem.
By moving the threshold value s from left to right, sensitivity decreases while specificity increases: there is a trade-off between Sensitivity and Specificity.
- Sometimes we need maximum sensitivity
- Sometimes we need maximum specificity
[Figure: partially overlapping class distributions along x with the regions TN, FN, TP and FP, and Sensitivity and Specificity as functions of the threshold s.]

16 ROC curves theory. The binary classification problem
Example 3 (continued): moving the threshold value s from left to right.
[Figure: class distributions along x with the regions TN, FN, TP and FP, and Sensitivity and Specificity as functions of the threshold s.]


17 ROC curves theory. ROC Curves: definition
$TPR = \frac{TP}{TP + FN} = \text{Sensitivity}$
$FPR = \frac{FP}{TN + FP} = 1 - \text{Specificity}$
In signal detection theory, a Receiver Operating Characteristic (ROC) curve is a graphical plot of the True Positive Rate (Sensitivity) vs. the False Positive Rate (1 - Specificity) for a binary classifier system as its discrimination threshold is varied. An ROC curve is a two-dimensional depiction of classifier performance.
[Figure: ROC curve and no-discrimination line; axes: False Positive Rate (1 - Specificity) and True Positive Rate (Sensitivity).]
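The definition can be illustrated by sweeping the threshold over the observed values and recording one (FPR, TPR) point per threshold. This is a sketch on hypothetical toy data, assuming that x > s is predicted positive; `roc_points` is an illustrative name.

```python
def roc_points(values, labels):
    """Return (FPR, TPR) pairs for thresholds swept from high to low."""
    n_pos = sum(1 for is_pos in labels if is_pos)
    n_neg = len(labels) - n_pos
    # Start at the largest value (nothing predicted positive) and end
    # at -infinity (everything predicted positive).
    thresholds = sorted(set(values), reverse=True) + [float("-inf")]
    points = []
    for s in thresholds:
        tp = sum(1 for x, is_pos in zip(values, labels) if is_pos and x > s)
        fp = sum(1 for x, is_pos in zip(values, labels) if not is_pos and x > s)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy data: four negatives followed by four positives.
values = [0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 0.9, 1.1]
labels = [False, False, False, False, True, True, True, True]

points = roc_points(values, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always runs from (0, 0) to (1, 1); the closer it passes to the top-left corner, the better the two classes are separated.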

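18 ROC curves theory. ROC Curves: definition
$TPR = \frac{TP}{TP + FN} = \text{Sensitivity}$
$FPR = \frac{FP}{TN + FP} = 1 - \text{Specificity}$
[Figure: class distributions along x, Sensitivity and Specificity as functions of the threshold, and the resulting ROC curve with the no-discrimination line; axes: False Positive Rate (1 - Specificity) and True Positive Rate (Sensitivity).]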

19 ROC curves theory. ROC Curves: Area Under Curve
To compare classifiers we may want to reduce ROC performance to a single scalar value representing expected performance: the Area Under the Curve (AUC).
The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
[Figure: distributions of classes A and B along x, and the corresponding ROC curve with its AUC and the no-discrimination line; axes: False Positive Rate (1 - Specificity) and True Positive Rate (Sensitivity).]
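The equivalence between the area and the ranking probability can be checked numerically. The sketch below computes the AUC in both ways on hypothetical toy data; the function names are illustrative, and the ROC points are the ones obtained for this data by sweeping the threshold with x > s predicted positive.

```python
def auc_by_ranking(values, labels):
    """Fraction of (positive, negative) pairs ranked correctly (ties count 0.5)."""
    positives = [x for x, is_pos in zip(values, labels) if is_pos]
    negatives = [x for x, is_pos in zip(values, labels) if not is_pos]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_by_trapezoids(points):
    """Area under a ROC curve given as (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data and its ROC points.
values = [0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 0.9, 1.1]
labels = [False, False, False, False, True, True, True, True]
points = [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5), (0.0, 0.75), (0.25, 0.75),
          (0.25, 1.0), (0.5, 1.0), (0.75, 1.0), (1.0, 1.0)]

print(auc_by_ranking(values, labels), auc_by_trapezoids(points))
```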

20 ROC curves theory. ROC Curves: Area Under Curve
[Figure: class distributions along x, Sensitivity and Specificity as functions of the threshold s, and the corresponding ROC curve with its AUC and the no-discrimination line.]

21 ROC curves theory. ROC Curves: Area Under Curve
[Figure: a second example with class distributions along x, Sensitivity and Specificity as functions of the threshold s, and the corresponding ROC curve with its AUC and the no-discrimination line.]


22 ROC curves theory. ROC Curves: Area Under Curve
It is possible for a high-AUC classifier to perform worse in a specific region of ROC space than a low-AUC classifier. Classifier B has greater area and therefore better average performance. Classifier B is generally better than A except at FP rate > 0.6, where A has a slight advantage.
T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (2006)

23 ROC curves for the volcanic ash granulometries classification
[Figure: histograms (Frequency vs. Output Voltage (V)) of the small, medium and big classes with the thresholds Th1 and Th2.]
[Figure: ROC curve for small-medium granulometries, compared with the worst case; axes: False Positive Rate (1 - Specificity) and True Positive Rate (Sensitivity).]
[Figure: ROC curve for medium-large granulometries, compared with the worst case; axes: False Positive Rate (1 - Specificity) and True Positive Rate (Sensitivity).]

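24 ROC curves theory. ROC Curves: Optimal Threshold
How to choose the optimal cutoff?
1) Minimum distance d between the ideal point (0, 1) and the generic point on the ROC curve:
$d = \sqrt{(1 - TPR)^2 + FPR^2} = \sqrt{(1 - \text{Sensitivity})^2 + (1 - \text{Specificity})^2}$
This gives a good compromise between Sensitivity and Specificity.
2) Youden index J: the maximum vertical distance between the no-discrimination line (TPR = FPR) and the generic point (x, y) on the ROC curve:
$J = \max(TPR - FPR) = \max(\text{Sensitivity} + \text{Specificity} - 1)$
[Figure: ROC curve with the no-discrimination line, the distance d and the index J; axes: False Positive Rate (1 - Specificity) and True Positive Rate (Sensitivity).]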
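Both criteria can be sketched as a search over candidate thresholds. The example below is a simplification under stated assumptions: candidates are restricted to the observed values, x > s is predicted positive, and the data and function names are hypothetical.

```python
import math


def rates(values, labels, s):
    """Return (FPR, TPR) for threshold s, predicting positive when x > s."""
    n_pos = sum(1 for is_pos in labels if is_pos)
    n_neg = len(labels) - n_pos
    tp = sum(1 for x, is_pos in zip(values, labels) if is_pos and x > s)
    fp = sum(1 for x, is_pos in zip(values, labels) if not is_pos and x > s)
    return fp / n_neg, tp / n_pos


def best_thresholds(values, labels):
    """Return (threshold of minimum distance d, threshold of maximum J)."""
    candidates = sorted(set(values))

    def distance(s):
        fpr, tpr = rates(values, labels, s)
        return math.hypot(1.0 - tpr, fpr)   # distance from the point (0, 1)

    def youden(s):
        fpr, tpr = rates(values, labels, s)
        return tpr - fpr                    # height above the line TPR = FPR

    return min(candidates, key=distance), max(candidates, key=youden)


# Hypothetical toy data: four negatives followed by four positives.
values = [0.2, 0.4, 0.5, 0.9, 0.6, 0.8, 1.0, 1.1]
labels = [False, False, False, False, True, True, True, True]

s_distance, s_youden = best_thresholds(values, labels)
print(s_distance, s_youden)
```

On this data the two criteria select the same threshold; in general they can differ, since one measures a Euclidean distance and the other a vertical one.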

25 ROC curves theory. ROC Curves: Optimal Threshold
How to choose the optimal cutoff?
3) Maximum Accuracy
Confusion matrix (Error/Contingency table):
              Predicted n    Predicted p
Actual N          TN             FP
Actual P          FN             TP
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
[Figure: Accuracy as a function of the threshold s.]
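The maximum-accuracy criterion can be sketched in the same way: evaluate the accuracy at each candidate threshold and keep the best one. The data and function names are hypothetical, candidates are restricted to the observed values, and x > s is assumed to be predicted positive.

```python
def accuracy(values, labels, s):
    """Fraction of instances classified correctly when x > s means positive."""
    correct = sum(1 for x, is_pos in zip(values, labels) if (x > s) == is_pos)
    return correct / len(values)


def max_accuracy_threshold(values, labels):
    """Return (threshold, accuracy) for the most accurate candidate."""
    candidates = sorted(set(values))
    best = max(candidates, key=lambda s: accuracy(values, labels, s))
    return best, accuracy(values, labels, best)


# Hypothetical toy data: four negatives followed by four positives.
values = [0.2, 0.4, 0.5, 0.9, 0.6, 0.8, 1.0, 1.1]
labels = [False, False, False, False, True, True, True, True]

best_s, best_acc = max_accuracy_threshold(values, labels)
print(best_s, best_acc)
```

Note that accuracy depends on the class proportions, so with strongly unbalanced classes this criterion may favour the majority class.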

26 The Principal Component Analysis (PCA)
What is PCA? It is a way of identifying patterns in data (when graphical representation is not available), and expressing the data in such a way as to highlight their similarities and differences.
[Figure: scatter plot of data set 1 and data set 2 in the (x, y) plane, and scatter plot of Feature-1 vs. Feature-2 vs. Feature-3; legend: FF, FB, FL, SI, SDR, SDL, SUR, SUL, LD.]

27 The Principal Component Analysis (PCA) What is PCA? PCA is a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than (or equal to) the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the preceding components. Principal components are guaranteed to be independent if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.

28 PCA: Applications
Exploratory data analysis: PCA is used for making 2- or 3-dimensional plots of the data for visual examination and interpretation
- Detection of outliers
- Identification of clusters
Data preprocessing, dimensionality reduction: data is often described by more variables than necessary for building the best model. Specific techniques exist for selecting a good subset of variables, and PCA is one of them. Reducing the number of variables generally leads to loss of information; PCA makes this loss minimal.
Data compression, data reconstruction: PCA is a (lossy) data compression technique. The table describing the data with the first k principal components is smaller than the original data table.
Depending on the field of application, PCA is also known as:
- discrete Karhunen-Loève transform (KLT) in signal processing
- Hotelling transform in multivariate quality control
- proper orthogonal decomposition (POD) in mechanical engineering
- singular value decomposition (SVD) of X and eigenvalue decomposition (EVD) of X^T X in linear algebra
- factor analysis, Eckart-Young theorem or Schmidt-Mirsky theorem in psychometrics
- empirical orthogonal functions (EOF) in meteorological science
- empirical eigenfunction decomposition, empirical component analysis, quasiharmonic modes
- spectral decomposition in noise and vibration
- empirical modal analysis in structural dynamics

29 PCA Step by Step (1/6)
Step 1: Get some data
Step 2: Subtract the mean
Example: standardization of x into z, with the associated matrices C and R.
This produces a data set whose mean is zero.
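A small sketch of the centering step with NumPy, on hypothetical data. Standardization is shown as well, assuming it means dividing each centered variable by its sample standard deviation, which is the usual definition.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
data = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
])

# Step 2: subtract the mean of each variable (column).
centered = data - data.mean(axis=0)

# Standardization also divides by the sample standard deviation, which
# makes the result independent of the scale of each variable.
standardized = centered / data.std(axis=0, ddof=1)

print(centered.mean(axis=0))              # numerically zero per variable
print(standardized.std(axis=0, ddof=1))   # unit spread per variable
```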

30 PCA Step by Step (2/6)
Step 3: Calculate the covariance matrix
Covariance is a measure of how much the dimensions vary from the mean with respect to each other. If the covariance is > 0, the variables increase together.
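The covariance matrix of mean-centered data can be written out explicitly and compared with NumPy's `np.cov`. The data below is hypothetical, with observations in rows and variables in columns.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
data = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
])

centered = data - data.mean(axis=0)
n = data.shape[0]

# Sample covariance matrix: entry (i, j) averages the products of the
# deviations of variables i and j from their means (divisor n - 1).
cov = centered.T @ centered / (n - 1)

print(cov)
print(np.allclose(cov, np.cov(data, rowvar=False)))
```

The off-diagonal entry is positive for this data, so the two variables tend to increase together.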

31 PCA Step by Step (3/6)
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
These eigenvectors are both unit eigenvectors, i.e. their lengths are both 1. This is very important for PCA.
What do they mean?
- The data has a strong pattern. As expected from the covariance matrix, the two variables do increase together.
- The eigenvectors appear as perpendicular diagonal dotted lines on the plot. More importantly, they provide us with information about the patterns in the data.
- One of the eigenvectors goes through the middle of the points, like drawing a line of best fit. That eigenvector shows how the two variables are related along that line.
- The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line, but are off to the side of it by some amount.
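A sketch of this step with NumPy on hypothetical data. It uses `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix and returns unit-length eigenvectors as columns, with eigenvalues in ascending order.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
data = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
])
cov = np.cov(data, rowvar=False)

# Step 4: eigen-decomposition of the (symmetric) covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Each column is a unit eigenvector, and the columns are perpendicular.
print(np.linalg.norm(eigenvectors, axis=0))
print(eigenvectors[:, 0] @ eigenvectors[:, 1])

# A scaled eigenvector is still an eigenvector with the same eigenvalue.
scaled = 3.0 * eigenvectors[:, 1]
print(np.allclose(cov @ scaled, eigenvalues[1] * scaled))
```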

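32 Eigenvectors and Eigenvalues
A scaled eigenvector is still an eigenvector.
[Figure: examples of vectors in the (x, y) plane that are (Yes) and are not (No) eigenvectors, with the corresponding eigenvalues.]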

33 PCA Step by Step (4/6)
So, by this process of taking the eigenvectors of the covariance matrix, we have been able to extract lines that characterise the data. The rest of the steps involve transforming the data so that it is expressed in terms of these lines.
Step 5: Choosing components and forming a feature vector
- Order the eigenvectors by eigenvalue, highest to lowest.
- The eigenvector with the highest eigenvalue is the principal component of the data set.
- Now, if you like, you can decide to ignore the components of lesser significance and form a feature vector from the remaining eigenvectors.
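A sketch of the ordering and selection step with NumPy on hypothetical data. The number of retained components `k` and the use of the explained-variance ratio as a guide are illustrative choices, not something prescribed by the slides.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
data = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
])
cov = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 5: order the eigenvectors by eigenvalue, highest to lowest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance carried by each component.
explained = eigenvalues / eigenvalues.sum()

# Keep the k most significant eigenvectors as columns of the feature vector.
k = 1
feature_vector = eigenvectors[:, :k]

print(explained)
print(feature_vector)
```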

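34 PCA Step by Step (5/6)
In our example we have two choices: keep both eigenvectors, or keep only the most significant one.
Step 6: Deriving the new data set
Take the transpose of the feature vector and multiply it on the left of the original data set, transposed:
$\text{Final data} = (\text{feature vector})^T \times (\text{mean-adjusted data})^T$
- $(\text{feature vector})^T$ is the matrix with the eigenvectors transposed (the eigenvectors are now in the rows), with the most significant eigenvector at the top.
- $(\text{mean-adjusted data})^T$ is the mean-adjusted data transposed.
It will give us the original data solely in terms of the vectors we choose.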
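The projection can be sketched with NumPy as a single matrix product. The data is hypothetical and only the most significant eigenvector is kept; variable names are illustrative.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
data = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
])
centered = data - data.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))

# Order by eigenvalue (highest first) and keep the top eigenvector.
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:1]]        # shape (2, 1)

# Step 6: eigenvectors in rows, times the mean-adjusted data transposed.
final_data = feature_vector.T @ centered.T         # shape (1, 5)

print(final_data)
# The variance along the first component equals the largest eigenvalue.
print(final_data.var(ddof=1), eigenvalues.max())
```

Each column of the result is one observation expressed in the coordinates of the chosen eigenvectors.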

35 PCA Step by Step (6/6)


36 PCA based classification algorithm
Example of classification of events (Activities of Daily Living, ADL).
Features = sets of data we want to classify
PCA = Principal Component Analysis
INPUT: Features -> PCA -> OUTPUT: Uncorrelated variables
Before PCA, the Step up events cannot be classified because they overlap with another kind of event (Step down).
After PCA, these events can be correctly classified in the space of principal components.
[Figure: Step up event before PCA (axes: Feature (forward fall) and Feature 4 (fall from bowed)) and after PCA (axes: principal components, including Principal component 4); legend: forward fall 1, forward fall 2, forward fall 3, fall from bowed, backward fall, lateral fall, step up, step down, sitting.]

37 PCA based classification algorithm
[Figure: Backward fall events in the space of principal components (axes include Principal Component 2 and Principal Component 8).]
[Figure: Step Down Event in the space of principal components.]

38 PCA based classification algorithm
[Figure: Fall from bowed, Fainting fall and Lateral fall events in the space of principal components (axes include Principal component 2, Principal component 3 and Principal component 9).]
