Introduction to Statistical Inference


Structural Health Monitoring Using Statistical Pattern Recognition
Introduction to Statistical Inference
Presented by Charles R. Farrar, Ph.D., P.E.

Outline
- Introduce statistical decision making for structural health monitoring (SHM)
- Supervised vs. unsupervised learning
- Group classification, regression, outlier (novelty) detection
- Hypothesis testing
- Neyman-Pearson theorem
- Matched filters
- Receiver operating characteristic curves
- Bayes risk formulation

#5 stats Inf

Pattern Recognition vs. First Principles
The appropriate approach depends on the amount of data available and on the strength of the model:
- High amount of data, low strength of model: pattern recognition
- Low amount of data, low strength of model: Prayer? Voodoo? More research?
- High strength of model (low or high amount of data): first principles
Note: First-principles models of complex damage mechanisms tend to be weak.

Statistical Model Building for Feature Discrimination
- Supervised learning: data are available from both the undamaged and the damaged system.
- Unsupervised learning: data are available only from the undamaged system.
Three general types of statistical models are used for structural health monitoring:
- Group classification (supervised, discrete)
- Regression analysis (supervised, continuous)
- Identification of outliers, also known as novelty detection (unsupervised)
Statistical models are used to answer five questions regarding the damage state of the system.

Statistical Model Building (cont.)
1. Is the system damaged?
   - Group classification problem for supervised learning
   - Identification of outliers for unsupervised learning
2. Where is the damage located?
   - Group classification or regression analysis problem for supervised learning
   - Identification of outliers for unsupervised learning (in special cases)
3. What type of damage is present?
   - Can only be answered in a supervised learning mode
   - Group classification
4. What is the extent of damage?
   - Can only be answered in a supervised learning mode
   - Group classification or regression analysis
5. What is the remaining useful life of the structure? (Prognosis)
   - Can only be answered in a supervised learning mode
   - Regression analysis

Statistical Model Building (cont.)
Statistical models are also used to avoid incorrect diagnosis of damage:
- False positives (Type I errors, the primary motivator for economic SHM concerns): damage is indicated when none is present.
- False negatives (Type II errors, the primary motivator for life-safety SHM concerns): damage is not identified when it is present.
To date, the rotating machinery industry has made the most use of statistical methods, in the form of trend analysis or statistical process control.
Statistical modeling can also employ various forms of data cleansing, normalization, compression, and fusion.

Background
Early on (1980s-1990s), statistical modeling was, in general, applied sparingly to structural health monitoring studies.
Statistical modeling is necessary to distinguish the changes in features caused by damage from changes caused by non-damaging events. Changes in features can result from non-damage events such as:
- test-to-test variability
- unit-to-unit variability
- environmental and operational variability
- data collection and reduction variability
Statistical procedures can combine various types of data.

Probabilistic Decision Making
Also known as hypothesis testing, statistical inference, decision theory, detection theory, pattern recognition, or classification.
The goal is to decide when an event has occurred (existence of damage) and then to obtain more information about the event (location, type, and extent of damage).
The rest of this lecture is based on detection theory, which has its roots in radar and sonar detection problems. The material follows S. M. Kay, Fundamentals of Statistical Signal Processing, Vol. II: Detection Theory, Prentice Hall, 1998. It is also covered in most introductory statistics books that discuss hypothesis testing.

Applicability of Statistical Models

Statistical model     | Existence | Location          | Type | Extent
Outlier detection     | X         | X (in some cases) |      |
Group classification  | X         | X (discrete)      | X    | X (discrete)
Regression            | X         | X (continuous)    |      | X (continuous)

Example of Outlier Detection
[Figure: Medicare dollars received versus number of patients seen, with four annotated regions.]
- Sees few patients, receives lots of dollars. Conclusion: possible fraud.
- Sees lots of patients, receives lots of dollars. Conclusion: very industrious doctor.
- Sees few patients, receives few dollars. Conclusion: doctor in semi-retirement.
- Sees lots of patients, receives few dollars. Conclusion: doesn't know how to fill out paperwork.
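The outlier-detection idea in the doctor example can be sketched with a Mahalanobis squared distance from a baseline population. This is only an illustration under stated assumptions: the synthetic "patients vs. dollars" data, the function names, and the 0.001 false-alarm level are all hypothetical, and the threshold formula assumes the baseline features are approximately bivariate Gaussian.

```python
import math
import random


def fit_baseline(points):
    """Estimate the mean and covariance of 2-D baseline features."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    s_yy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return (mean_x, mean_y), (s_xx, s_xy, s_yy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from the baseline."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy ** 2
    # Closed-form inverse of the 2x2 covariance matrix.
    return (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det


def chi2_threshold_2d(alpha):
    """Threshold exceeded with probability alpha by a chi-square variable
    with 2 degrees of freedom, for which P(D > t) = exp(-t / 2)."""
    return -2.0 * math.log(alpha)


# Hypothetical baseline: dollars received grow with patients seen.
rng = random.Random(0)
baseline = []
for _ in range(500):
    patients = rng.gauss(100.0, 20.0)
    dollars = 50.0 * patients + rng.gauss(0.0, 500.0)
    baseline.append((patients, dollars))

mean, cov = fit_baseline(baseline)
threshold = chi2_threshold_2d(0.001)

typical = (100.0, 5000.0)   # many patients, proportionate dollars
suspect = (20.0, 9000.0)    # few patients, lots of dollars

for name, point in (("typical", typical), ("suspect", suspect)):
    d2 = mahalanobis_sq(point, mean, cov)
    print(name, round(d2, 1), "outlier" if d2 > threshold else "inlier")
```

Only baseline data are used to fit the model, which is what makes this an unsupervised approach: the suspect point is flagged because it is far from the baseline, not because examples of fraud were available.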

Defining the Detection Problem
[Figure: beam instrumented with a strain gage under random excitation from a shaker, shown undamaged and damaged (permanent deformation), with time histories and associated PDFs for three initial strain levels.]

Defining the Detection Problem
Consider the beam for two cases:
- Undamaged: the strain gage measures the N-point Gaussian response $w[n]$, $n = 0, 1, \ldots, N-1$, with zero mean ($\mu = 0$) and variance $\sigma^2$.
- Damaged: the strain gage measures the N-point Gaussian response $x[n] = w[n] + A$, $n = 0, 1, \ldots, N-1$, with $A$ corresponding to the initial strain ($A = 300$ in this case); the variance is still $\sigma^2$.
Hypothesis statement:
- $H_0$: signal with zero mean, $x[n] = w[n]$; the system is undamaged.
- $H_1$: signal with a DC offset caused by damage, $x[n] = w[n] + 300$.
A logical choice is to compute the mean of each sample (designated $T$ for this example) and compare it to a threshold $\gamma$, perhaps $\gamma = 150$ in this case.

Defining the Detection Problem
As $N$ increases, the estimates of $T$ under the two hypotheses become better separated. We can quantify this with a metric called the deflection coefficient, $d^2$, which increases with the difference between the means for the undamaged and damaged cases and also increases as the variance of each PDF decreases:

$$d^2 = \frac{\left(E(T;H_1) - E(T;H_0)\right)^2}{\mathrm{var}(T;H_0)}$$

For this case, with $\sigma^2 = 19{,}040$ and $N = 4096$: $E(T;H_0) = 0$, $E(T;H_1) = A = 300$, and $\mathrm{var}(T;H_0) = \sigma^2/N \approx 4.65$, so $d^2 = NA^2/\sigma^2 \approx 1.9 \times 10^4$.
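The effect of record length on the separation of the sample means can be checked with a short simulation. The sketch below assumes an offset of 300 and a noise variance of 19,040 (values taken to be consistent with the beam example), and uses hypothetical function names; it compares an empirical deflection coefficient with the closed form $NA^2/\sigma^2$.

```python
import random
import statistics


def sample_mean(n_points, offset, sigma, rng):
    """Test statistic T: mean of n_points Gaussian samples with a DC offset."""
    return sum(rng.gauss(offset, sigma) for _ in range(n_points)) / n_points


def deflection_sq(mean_h0, mean_h1, var_h0):
    """Deflection coefficient d^2 = (E[T;H1] - E[T;H0])^2 / var(T;H0)."""
    return (mean_h1 - mean_h0) ** 2 / var_h0


A = 300.0                 # assumed DC offset caused by damage
SIGMA = 19040.0 ** 0.5    # assumed noise standard deviation
TRIALS = 1000             # number of simulated records per hypothesis

rng = random.Random(0)
results = []
for n_points in (1, 16, 256):
    t_h0 = [sample_mean(n_points, 0.0, SIGMA, rng) for _ in range(TRIALS)]
    t_h1 = [sample_mean(n_points, A, SIGMA, rng) for _ in range(TRIALS)]
    empirical = deflection_sq(
        statistics.mean(t_h0), statistics.mean(t_h1), statistics.variance(t_h0)
    )
    theoretical = n_points * A ** 2 / SIGMA ** 2
    results.append((n_points, empirical, theoretical))
    print(n_points, round(empirical, 1), round(theoretical, 1))
```

Each row shows the record length followed by the simulated and theoretical values of $d^2$; both grow in proportion to the record length because averaging reduces the variance of $T$.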

Defining the Detection Problem
The performance of any detector depends on the difference between the PDFs associated with features corresponding to the different conditions. This difference depends on the means and variances of the two PDFs. For this example it can be quantified as $A^2/\sigma^2$, a quantity known as the signal-to-noise ratio (SNR).
- We can improve detection by increasing the SNR.
- Detection is also improved by increasing the record length, $N$, which effectively reduces the noise by averaging (i.e., it reduces the variance of the estimate of the mean, $T$, remembering that $\mathrm{var}(T) = \sigma^2/N$).
Our goal is to find the optimal detector.

2-Class (or Binary) Hypothesis Test
In the previous example, we were attempting to make assumptions about the damage state of the structure based on observed data. These assumptions are called hypotheses and are designated:
- $H_0$ = undamaged (null hypothesis)
- $H_1$ = damaged (alternative hypothesis)
If the distributions of features corresponding to the undamaged and damaged cases overlap, then there is the possibility of misclassification:
- Type I error (false positive): conclude that damage is present when it is not (reject $H_0$ when $H_0$ is true). Minimize this type of error when economic considerations are driving the SHM system deployment.
- Type II error (false negative): conclude that there is no damage when it is present (accept $H_0$ when $H_1$ is true). Minimize this type of error when life-safety considerations are driving the SHM system deployment.

Decision Errors
By shifting the decision boundary, we can influence the probability of Type I vs. Type II errors (the shaded regions in the figure). However, we cannot reduce both types of errors simultaneously. Instead, we design an optimal detector by constraining the Type I error ($P(H_1;H_0)$, the probability of false alarm, $P_{FA}$) to a value $\alpha$, and then minimizing the Type II error ($P(H_0;H_1)$). This is equivalent to maximizing $1 - P(H_0;H_1) = P(H_1;H_1)$, the probability of detection, $P_D$.

Neyman-Pearson Theorem
$P_D$ is maximized for a given $P_{FA} = \alpha$ (false positive, Type I error) when $H_1$ is selected if

$$L(\mathbf{x}) = \frac{p(\mathbf{x};H_1)}{p(\mathbf{x};H_0)} > \gamma$$

$L(\mathbf{x})$ is referred to as the likelihood ratio. For a given $P_{FA} = \alpha$, the threshold $\gamma$ is found from

$$P_{FA} = \int_{\{\mathbf{x}\,:\,L(\mathbf{x}) > \gamma\}} p(\mathbf{x};H_0)\,d\mathbf{x} = \alpha$$
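The trade-off between the two error types can be seen by sweeping the decision boundary in a small Monte Carlo experiment. This is a rough sketch for a single Gaussian sample; the offset (300), the noise standard deviation (138), the thresholds, and the function name are illustrative assumptions.

```python
import random


def error_rates(threshold, offset, sigma, trials, rng):
    """Estimate Type I and Type II error rates for the rule 'decide H1 if x > threshold'."""
    # Type I: noise-only sample exceeds the threshold (false alarm).
    false_alarms = sum(rng.gauss(0.0, sigma) > threshold for _ in range(trials))
    # Type II: sample with the damage offset stays at or below the threshold (miss).
    misses = sum(rng.gauss(offset, sigma) <= threshold for _ in range(trials))
    return false_alarms / trials, misses / trials


rng = random.Random(1)
thresholds = (0.0, 150.0, 300.0, 450.0)
rates = [(g,) + error_rates(g, 300.0, 138.0, 20000, rng) for g in thresholds]

for threshold, p_type1, p_type2 in rates:
    print(f"threshold={threshold:6.1f}  Type I={p_type1:.4f}  Type II={p_type2:.4f}")
```

Raising the threshold lowers the false-alarm rate and raises the miss rate, which is why the Neyman-Pearson approach fixes one of the two and optimizes the other.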

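Example: Likelihood Ratio
Apply the NP test to our beam data with offset $A = 300$ ($\sigma^2 = 19{,}040$) for $P_{FA} = 10^{-3}$. We will try to correctly classify just a single data sample $x[0]$. Decide $H_1$ (beam is damaged) if

$$L(x) = \frac{p(x[0];H_1)}{p(x[0];H_0)} = \frac{\exp\left[-\dfrac{(x[0]-300)^2}{2(19{,}040)}\right]}{\exp\left[-\dfrac{x^2[0]}{2(19{,}040)}\right]} > \gamma$$

or

$$\exp\left[\frac{300\,x[0] - 45{,}000}{19{,}040}\right] > \gamma$$

Example: Probability of Detection
Now we need to determine $\gamma$ based on the selected false alarm constraint. Taking the natural log of both sides of the last inequality gives

$$\frac{300\,x[0] - 45{,}000}{19{,}040} > \ln\gamma \quad\Longrightarrow\quad x[0] > \frac{19{,}040\,\ln\gamma + 45{,}000}{300}$$

By setting $\gamma' = (19{,}040\,\ln\gamma + 45{,}000)/300$, we will decide that the structure is damaged ($H_1$) if $x[0] > \gamma'$. Solve for $\gamma'$ (or $\gamma$) using the specified false alarm value:

$$P_{FA} = P(x[0] > \gamma';H_0) = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi(19{,}040)}} \exp\left[-\frac{t^2}{2(19{,}040)}\right] dt = 10^{-3}$$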
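Solving the false-alarm integral for the threshold amounts to inverting a Gaussian tail probability. The sketch below does this with Python's standard `statistics.NormalDist`; the numeric values (offset 300, variance 19,040, false-alarm probability 0.001) are assumptions chosen to match the example, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

A = 300.0          # assumed DC offset under H1
SIGMA2 = 19040.0   # assumed noise variance
P_FA = 1e-3        # assumed false-alarm constraint


def sample_threshold(p_fa, sigma2):
    """Threshold g' on x[0] such that P(x[0] > g'; H0) = p_fa for zero-mean Gaussian noise."""
    return NormalDist(0.0, math.sqrt(sigma2)).inv_cdf(1.0 - p_fa)


def likelihood_threshold(gamma_prime, offset, sigma2):
    """Likelihood-ratio threshold g from g' = (sigma^2 ln g + A^2 / 2) / A."""
    return math.exp((offset * gamma_prime - offset ** 2 / 2.0) / sigma2)


gamma_prime = sample_threshold(P_FA, SIGMA2)
gamma = likelihood_threshold(gamma_prime, A, SIGMA2)
print("threshold on x[0]:", gamma_prime)
print("threshold on L(x):", gamma)
```

Because the likelihood ratio is a monotonically increasing function of $x[0]$ for a positive offset, thresholding $x[0]$ at $\gamma'$ is the same test as thresholding $L(x)$ at $\gamma$.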

Example: Probability of Detection
The probability of detection is

$$P_D = P(x[0] > \gamma';H_1) = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi(19{,}040)}} \exp\left[-\frac{(t-300)^2}{2(19{,}040)}\right] dt$$

The performance of a detector is characterized by a receiver operating characteristic (ROC) curve: a plot of the probability of detection vs. the probability of false alarm, with each point corresponding to a given threshold.

Receiver Operating Characteristic Curves
[Figure: ROC curve, probability of detection versus probability of false alarm.]
- Each point on the ROC curve corresponds to a specific threshold (the threshold values are not evident from the plot).
- The diagonal line represents a random classifier.
- The closer the ROC plot is to the upper-left corner, the higher the overall classification accuracy.
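For a single Gaussian sample with a DC offset, both probabilities have closed forms, so an ROC curve can be traced by sweeping the threshold. The sketch below assumes an offset of 300 and a noise variance of 19,040 to match the example; the function names and the threshold grid are hypothetical choices.

```python
from statistics import NormalDist


def roc_point(threshold, offset, sigma):
    """Return (P_FA, P_D) for the rule 'decide H1 if x[0] > threshold'."""
    p_fa = 1.0 - NormalDist(0.0, sigma).cdf(threshold)
    p_d = 1.0 - NormalDist(offset, sigma).cdf(threshold)
    return p_fa, p_d


def roc_curve(offset, sigma, n_points=41):
    """Sweep the threshold from well below the H0 mean to well above the H1 mean."""
    low = -4.0 * sigma
    high = offset + 4.0 * sigma
    step = (high - low) / (n_points - 1)
    return [roc_point(low + i * step, offset, sigma) for i in range(n_points)]


A = 300.0               # assumed DC offset under H1
SIGMA = 19040.0 ** 0.5  # assumed noise standard deviation

curve = roc_curve(A, SIGMA)

# Operating point for a false-alarm constraint of 0.001 on a single sample.
gamma_prime = NormalDist(0.0, SIGMA).inv_cdf(1.0 - 1e-3)
p_fa_single, p_d_single = roc_point(gamma_prime, A, SIGMA)
print("P_FA:", p_fa_single, "P_D:", p_d_single)
```

With only one sample and this much noise, the detection probability at such a strict false-alarm constraint is low, which is the motivation for averaging over longer records.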

ROC Example: Damaged Telescope Drive Mechanism
[Figure: log score versus instance for the undamaged and damaged cases, with the corresponding receiver operating characteristic curve (true positive rate versus false positive rate).]

ROC Example: Damaged Telescope Drive Mechanism
[Figure: Mahalanobis squared distance results, shown as log score versus instance for the undamaged and damaged cases, with the corresponding receiver operating characteristic curve (true positive rate versus false positive rate).]

Matched Filter
Suppose we want to detect a known deterministic signal (e.g., a sine wave) corrupted by Gaussian noise. We can make use of the Neyman-Pearson criterion to design an optimal detector for that known signal.
- $s[n]$, $n = 0, 1, \ldots, N-1$, is the known deterministic signal we want to detect.
- $w[n]$, $n = 0, 1, \ldots, N-1$, is the Gaussian noise.
Then our hypothesis statement is:
- $H_0$: $x[n] = w[n]$, $n = 0, 1, \ldots, N-1$ (signal is not present)
- $H_1$: $x[n] = s[n] + w[n]$, $n = 0, 1, \ldots, N-1$ (signal is present)
The Neyman-Pearson theorem tells us to decide $H_1$ based on the likelihood ratio

$$L(\mathbf{x}) = \frac{p(\mathbf{x};H_1)}{p(\mathbf{x};H_0)} > \gamma$$

Matched Filter
After some derivation, we find that we should choose $H_1$ if

$$\sum_{n=0}^{N-1} x[n]\,s[n] > \sigma^2 \ln\gamma + \frac{1}{2}\sum_{n=0}^{N-1} s^2[n]$$

Because we know $s[n]$, we can define a new threshold

$$\gamma' = \sigma^2 \ln\gamma + \frac{1}{2}\sum_{n=0}^{N-1} s^2[n]$$

Now we decide $H_1$ if our new test statistic $T(\mathbf{x})$ is greater than $\gamma'$:

$$T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n]\,s[n] > \gamma'$$

We will interpret the test statistic by relating the correlation process to the effect of a filter on the data. Let $x[n]$ be the input to a finite impulse response filter $h[n]$, $n = 0, 1, \ldots, N-1$.

Matched Filter
The output of the filter is

$$y[n] = \sum_{k=0}^{n} h[n-k]\,x[k] \quad \text{for } n \ge 0$$

Now let $h[n]$ be a flipped-around version of the signal we want to detect:

$$h[n] = s[N-1-n], \quad n = 0, 1, \ldots, N-1$$

Then

$$y[n] = \sum_{k=0}^{n} s[N-1-(n-k)]\,x[k], \qquad y[N-1] = \sum_{k=0}^{N-1} s[k]\,x[k] \quad \text{for } n = N-1$$

With a change in summation variables, this gives us the NP detector.

Matched Filter Example
[Figure: short tone ping at infinite SNR and at a finite SNR, each shown with its matched filter result.]
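The equivalence between the correlator and the filter with a time-reversed impulse response can be checked numerically. In the sketch below the sinusoidal signal, record length, noise level, and function names are hypothetical. The threshold uses the standard result that, for white Gaussian noise, the statistic under the noise-only hypothesis is zero-mean Gaussian with variance equal to the noise variance times the signal energy.

```python
import math
import random
from statistics import NormalDist


def correlator(x, s):
    """Test statistic T(x) = sum over n of x[n] * s[n]."""
    return sum(x_n * s_n for x_n, s_n in zip(x, s))


def fir_output(x, h, n):
    """FIR filter output y[n] = sum_{k=0}^{n} h[n-k] * x[k]."""
    return sum(h[n - k] * x[k] for k in range(n + 1))


def matched_filter_threshold(s, sigma, p_fa):
    """Threshold on T(x) for a given false-alarm probability in white Gaussian noise."""
    energy = sum(s_n ** 2 for s_n in s)
    return NormalDist(0.0, sigma * math.sqrt(energy)).inv_cdf(1.0 - p_fa)


N = 256        # assumed record length
SIGMA = 1.0    # assumed noise standard deviation
signal = [math.sin(2.0 * math.pi * 0.1 * n) for n in range(N)]
h = signal[::-1]  # h[n] = s[N-1-n]: the signal flipped in time

rng = random.Random(0)
noise = [rng.gauss(0.0, SIGMA) for _ in range(N)]
x_h1 = [s_n + w_n for s_n, w_n in zip(signal, noise)]

t_h1 = correlator(x_h1, signal)
y_last = fir_output(x_h1, h, N - 1)  # filter output sampled at n = N-1
threshold = matched_filter_threshold(signal, SIGMA, 1e-3)

print("correlator:", t_h1, "filter output:", y_last)
print("decide H1" if t_h1 > threshold else "decide H0")
```

Sampling the filter output at the last index reproduces the correlator value, so the detector can be implemented either way.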

Conditional Probability and Bayes Theorem
$P(Y|X)$ is the probability of $Y$ given $X$. Bayes theorem states

$$P(Y|X) = \frac{P(X|Y)\,P(Y)}{P(X)}$$

Bayes Risk Approach
Define $C_{ij}$ as the cost of choosing $H_i$ when $H_j$ is true:
- $C_{10}$: cost of deciding the system is damaged when it is not
- $C_{01}$: cost of deciding the system is not damaged when it is
- Often $C_{00} = C_{11} = 0$: no cost for a correct decision
- For many SHM cases, $C_{01} > C_{10}$
Develop a decision rule based on minimizing the Bayes risk, $R$:

$$R = E(C) = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\,P(H_i|H_j)\,P(H_j)$$

Assuming $C_{10} > C_{00}$ and $C_{01} > C_{11}$, the detector that minimizes the Bayes risk selects $H_1$ if

$$\frac{p(\mathbf{x}|H_1)}{p(\mathbf{x}|H_0)} > \frac{(C_{10} - C_{00})\,P(H_0)}{(C_{01} - C_{11})\,P(H_1)} = \gamma$$
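The Bayes risk threshold can be turned into a threshold on a single sample for the Gaussian DC-offset case, using the same log-likelihood algebra as in the Neyman-Pearson example. The sketch below is illustrative only: the costs, priors, offset (300), variance (19,040), and function names are assumptions.

```python
import math


def bayes_likelihood_threshold(c00, c01, c10, c11, p_h0, p_h1):
    """Likelihood-ratio threshold that minimizes the Bayes risk.

    c_ij is the cost of choosing H_i when H_j is true; requires c10 > c00 and c01 > c11.
    """
    return ((c10 - c00) * p_h0) / ((c01 - c11) * p_h1)


def sample_threshold(gamma, offset, sigma2):
    """Threshold on x[0] for a DC offset in Gaussian noise: x[0] > sigma^2 ln(gamma) / A + A / 2."""
    return sigma2 * math.log(gamma) / offset + offset / 2.0


A = 300.0          # assumed DC offset under H1
SIGMA2 = 19040.0   # assumed noise variance

# Equal error costs and equal priors: threshold sits midway between the two means.
gamma_equal = bayes_likelihood_threshold(0.0, 1.0, 1.0, 0.0, 0.5, 0.5)
x_equal = sample_threshold(gamma_equal, A, SIGMA2)

# A missed detection costs ten times more than a false alarm.
gamma_safety = bayes_likelihood_threshold(0.0, 10.0, 1.0, 0.0, 0.5, 0.5)
x_safety = sample_threshold(gamma_safety, A, SIGMA2)

# Damage is rare a priori (1 in 100), with equal error costs.
gamma_rare = bayes_likelihood_threshold(0.0, 1.0, 1.0, 0.0, 0.99, 0.01)
x_rare = sample_threshold(gamma_rare, A, SIGMA2)

print(x_equal, x_safety, x_rare)
```

A higher cost for missed damage pulls the threshold down, so damage is declared more readily, while a low prior probability of damage pushes it up.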

Classifying the Detection Problem
Detection problems can be arranged in a grid according to the characteristics of the damaged and undamaged response features:
- Damaged response feature characteristics: deterministic and known (e.g., DC offset); deterministic and unknown; random with known PDF; random with unknown PDF.
- Undamaged response feature characteristics: deterministic; Gaussian with known PDF; non-Gaussian with known PDF.
Cells of the grid are marked "Example" and "Typical SHM Problems".

Challenges for Probabilistic Decision Making
- Analytical approaches to defining threshold levels. One approach has been shown, but it requires knowledge of the probability density functions for both undamaged and damaged features.
- Balancing the trade-offs between false-positive and false-negative indications of damage.
- Obtaining data from the damaged system.
- Data normalization.
- Updating statistical models as new data become available.
- Managing the large volumes of data that will be produced by an on-line monitoring system.
- Learning how others do it (e.g., credit card fraud detection).
