Introduction to Statistical Inference

Structural Health Monitoring Using Statistical Pattern Recognition
Introduction to Statistical Inference
Presented by Charles R. Farrar, Ph.D., P.E.

Outline
- Introduce statistical decision making for structural health monitoring
- Supervised vs. unsupervised learning
- Group classification, regression, outlier (novelty) detection
- Hypothesis testing
- Neyman-Pearson theorem
- Matched filters
- Receiver operating characteristic curves
- Bayes risk formulation

Pattern Recognition vs. First Principles
The appropriate modeling approach depends on the amount of data available and the strength of the first-principles model:
- Little data, weak model: prayer? voodoo? more research?
- Abundant data, weak model: pattern recognition
- Strong model (little or abundant data): first principles
Note: first-principles models of complex damage mechanisms tend to be weak.

Statistical Model Building for Feature Discrimination
Supervised learning: data are available from the undamaged and the damaged system.
Unsupervised learning: data are available only from the undamaged system.
Three general types of statistical models for structural health monitoring:
- Group classification (supervised, discrete)
- Regression analysis (supervised, continuous)
- Identification of outliers, also known as novelty detection (unsupervised)
Statistical models are used to answer five questions regarding the damage state of the system.

Statistical Model Building (cont.)
1. Is the system damaged?
   - Group classification problem for supervised learning
   - Identification of outliers for unsupervised learning
2. Where is the damage located?
   - Group classification or regression analysis problem for supervised learning
   - Identification of outliers for unsupervised learning (in special cases)
3. What type of damage is present?
   - Can only be answered in a supervised learning mode: group classification
4. What is the extent of damage?
   - Can only be answered in a supervised learning mode: group classification or regression analysis
5. What is the remaining useful life of the structure? (Prognosis)
   - Can only be answered in a supervised learning mode: regression analysis

Statistical Model Building (cont.)
Statistical models are also used to avoid incorrect diagnoses of damage:
- False positives (Type I errors, the primary concern when economic considerations motivate the SHM deployment): damage is indicated when none is present.
- False negatives (Type II errors, the primary concern when life-safety considerations motivate the SHM deployment): damage is not identified when it is present.
To date, the rotating machinery industry has made the most use of statistical methods, typically through trend analysis or statistical process control. Statistical modeling can also employ various forms of data cleansing, normalization, compression, and fusion.

Background
Early on (1980s-1990s) statistical modeling was, in general, applied sparingly in structural health monitoring studies. Statistical modeling is necessary to distinguish changes in features caused by damage from changes caused by non-damaging events, such as:
- test-to-test variability
- unit-to-unit variability
- environmental and operational variability
- data collection and reduction variability
Statistical procedures can also combine various types of data.

Probabilistic Decision Making
Also known as hypothesis testing, statistical inference, decision theory, detection theory, pattern recognition, or classification. The goal is to decide when an event has occurred (existence of damage) and then to extract more information about the event (location, type, and extent of damage). The rest of this lecture is based on detection theory, which has its roots in radar and sonar detection problems. From S. M. Kay, Fundamentals of Statistical Signal Processing, Vol. II: Detection Theory, Prentice Hall, 1998. The material is also covered in most introductory statistics books that discuss hypothesis testing.

Applicability of Statistical Models

Statistical model      | Existence | Location          | Type | Extent
Outlier detection      | X         | X (in some cases) | -    | -
Group classification   | X         | X (discrete)      | X    | X (discrete)
Regression             | X         | X (continuous)    | -    | X (continuous)

Example of Outlier Detection
Plot Medicare dollars received against the number of patients seen; each quadrant suggests a different conclusion (a numerical sketch follows this list):
- Sees few patients, receives few dollars. Conclusion: doctor in semi-retirement.
- Sees lots of patients, receives few dollars. Conclusion: doesn't know how to fill out the paperwork.
- Sees lots of patients, receives lots of dollars. Conclusion: very industrious doctor.
- Sees few patients, receives lots of dollars. Conclusion: possible fraud.
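To make the outlier idea concrete, here is a minimal Python sketch (not from the slides; the population parameters and observations are invented for illustration) that scores (patients seen, dollars received) pairs by their Mahalanobis squared distance from a baseline population, the same statistic used in the telescope example later in this lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Baseline population of (patients seen, Medicare dollars received);
# all values are invented for illustration.
baseline = rng.multivariate_normal(mean=[100.0, 50.0],
                                   cov=[[400.0, 180.0], [180.0, 100.0]],
                                   size=500)

mu = baseline.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

def mahalanobis_sq(obs):
    """Squared Mahalanobis distance of one observation from the baseline."""
    d = obs - mu
    return float(d @ cov_inv @ d)

# Few patients but lots of dollars: large distance, flagged as an outlier.
print(mahalanobis_sq(np.array([30.0, 80.0])))
# A typical doctor: small distance, consistent with the baseline.
print(mahalanobis_sq(np.array([105.0, 52.0])))
```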

Defining the Detection Problem
A beam instrumented with a strain gage is driven by random excitation from a shaker, first undamaged and then damaged (permanent deformation).
[Figure: strain time histories and associated PDFs for the undamaged and damaged beam at increasing initial strain offsets.]

Defining the Detection Problem
Consider the beam for two cases:
- The strain gage measures the N-point Gaussian undamaged response w[n], n = 0, 1, ..., N-1, with zero mean (μ = 0) and variance σ².
- The strain gage measures the N-point Gaussian damaged response x[n] = w[n] + A, n = 0, 1, ..., N-1, with A corresponding to the initial strain (A = 3 in this case); the variance is still σ².
Hypothesis statement:
- H0: signal with zero mean, x[n] = w[n], system is undamaged
- H1: signal with DC offset caused by damage, x[n] = w[n] + 3
The logical choice is to compute the mean (designated T for this example) of each sample record and compare it to a threshold γ (perhaps γ = 1.5 in this case, midway between the two means of T).

Defining the Detection Problem
As N increases, the estimates of T become better separated. We can quantify a metric called the deflection coefficient, d², that increases with the difference in the means of T for the undamaged and damaged cases, and that also increases as the variance of each PDF decreases:

d² = (E(T; H1) − E(T; H0))² / var(T; H0)

For this case E(T; H0) = 0, E(T; H1) = A = 3, and var(T; H0) = σ²/N, so d² = N A²/σ², which grows in direct proportion to the record length.
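This setup can be simulated directly. In the sketch below, the offset A = 3 comes from the slides, while the noise variance σ² = 1 and record length N = 1000 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 1000       # record length (assumed for illustration)
A = 3.0        # DC offset caused by damage (from the slides)
sigma2 = 1.0   # noise variance (assumed for illustration)

w = rng.normal(0.0, np.sqrt(sigma2), N)   # undamaged: x[n] = w[n]
x = w + A                                 # damaged:   x[n] = w[n] + A

T_h0 = w.mean()   # E(T; H0) = 0,  var(T; H0) = sigma2 / N
T_h1 = x.mean()   # E(T; H1) = A

# Deflection coefficient d^2 = (E(T;H1) - E(T;H0))^2 / var(T;H0) = N*A^2/sigma2;
# it grows with N, so the two estimates of T separate as the record lengthens.
d2 = A**2 / (sigma2 / N)
print(T_h0, T_h1, d2)
```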

Defining the Detection Problem
The performance of any detector depends on the difference between the PDFs associated with the features for the different conditions. This difference depends on the means and variances of the two PDFs; for this example it can be quantified as A²/σ². This quantity is known as the signal-to-noise ratio (SNR). We can improve detection by increasing the SNR. Detection is also improved by increasing the record length N, which effectively reduces the noise by averaging (i.e., reducing the variance of the estimate of the mean, T, remembering that var(T) = σ²/N). Our goal is to find the optimal detector.

Two-Class (or Binary) Hypothesis Test
In the previous example we were attempting to make assumptions about the damage state of the structure based on observed data. These assumptions are called hypotheses and are designated:
- H0 = undamaged (null hypothesis)
- H1 = damaged (alternative hypothesis)
If the distributions of features corresponding to the undamaged and damaged cases overlap, then there is the possibility of misclassification (illustrated in the sketch below):
- Type I error: false positive; conclude that damage is present when it is not (reject H0 when H0 is true). Minimize this type of error when economic considerations drive the SHM system deployment.
- Type II error: false negative; conclude there is no damage when it is present (accept H0 when H1 is true). Minimize this type of error when life-safety considerations drive the SHM system deployment.
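The trade-off between the two error types can be tabulated for this Gaussian mean-shift problem; the offset and noise standard deviation below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

A, sigma = 3.0, 1.5   # illustrative offset and noise standard deviation

# Sliding the decision boundary gamma lowers one error only by raising the other.
for gamma in np.linspace(0.5, 2.5, 5):
    p_type1 = norm.sf(gamma, loc=0.0, scale=sigma)   # false positive: x > gamma under H0
    p_type2 = norm.cdf(gamma, loc=A, scale=sigma)    # false negative: x < gamma under H1
    print(f"gamma={gamma:.2f}  P(Type I)={p_type1:.4f}  P(Type II)={p_type2:.4f}")
```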

Decision Errors
By shifting the decision boundary we can influence the probability of Type I vs. Type II errors (the shaded regions under the overlapping PDFs). However, we cannot reduce both types of error simultaneously. Instead, we will design an optimal detector by constraining the Type I error (P(H1; H0), the probability of false alarm, PFA) to a value α and then minimizing the Type II error (P(H0; H1)). This is equivalent to maximizing 1 − P(H0; H1) = P(H1; H1), the probability of detection, PD.

Neyman-Pearson Theorem
PD is maximized for a given PFA = α (false positive, Type I error) when H1 is selected if

L(x) = p(x; H1) / p(x; H0) > γ

L(x) is referred to as the likelihood ratio. For a given PFA = α, the threshold γ is found from

PFA = ∫_{x : L(x) > γ} p(x; H0) dx = α
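A sketch of the Neyman-Pearson test for the single-sample Gaussian case (A, σ, and α are illustrative assumptions): the likelihood ratio L(x) is compared with the threshold γ that satisfies the false-alarm constraint.

```python
import numpy as np
from scipy.stats import norm

A, sigma, alpha = 3.0, 1.5, 1e-3   # illustrative values

def likelihood_ratio(x):
    """L(x) = p(x; H1) / p(x; H0) for a single sample."""
    return norm.pdf(x, loc=A, scale=sigma) / norm.pdf(x, loc=0.0, scale=sigma)

# For A > 0, L(x) is monotone increasing in x, so L(x) > gamma is equivalent
# to x > gamma_prime with gamma_prime = sigma * Q^{-1}(alpha);
# norm.isf is the inverse of the Gaussian tail (Q) function.
gamma_prime = sigma * norm.isf(alpha)
gamma = likelihood_ratio(gamma_prime)   # equivalent likelihood-ratio threshold

x0 = 4.0                                 # one observed sample
print(likelihood_ratio(x0) > gamma)      # same decision as x0 > gamma_prime
```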

Examples: Likelihood Ratio
Apply the NP test to our beam data with offset A = 3 (noise variance σ²) for a PFA = 10⁻³. We will try to correctly classify just a single data sample x[0]. Decide H1 (beam is damaged) if

L(x) = p(x[0]; H1) / p(x[0]; H0) = [1/√(2πσ²) exp(−(x[0] − 3)²/(2σ²))] / [1/√(2πσ²) exp(−x[0]²/(2σ²))] > γ

or, expanding the exponents,

exp((3x[0] − 4.5)/σ²) > γ

Example: Probability of Detection
Now we need to determine γ based on the selected false-alarm constraint. Taking the logarithm of the last equation and solving for x[0]:

x[0] > (σ²/3) ln γ + 3/2

By setting γ' = (σ²/3) ln γ + 3/2, we will decide that the structure is damaged (H1) if x[0] > γ'. Solve for γ' (or γ) using the specified false-alarm value:

PFA = P(x[0] > γ'; H0) = ∫_{γ'}^{∞} 1/√(2πσ²) exp(−t²/(2σ²)) dt = 10⁻³
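Numerically, the threshold follows from the false-alarm constraint alone; a sketch in which A and PFA come from the slides and the noise standard deviation is assumed:

```python
from scipy.stats import norm

A, sigma, P_FA = 3.0, 1.5, 1e-3   # A and P_FA from the slides; sigma assumed

# Solve P(x[0] > gamma'; H0) = P_FA for gamma' (Q-function inverse),
# then evaluate the detection probability under H1.
gamma_prime = sigma * norm.isf(P_FA)
P_D = norm.sf(gamma_prime, loc=A, scale=sigma)
print(gamma_prime, P_D)
```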

Example: Probability of Detection
The probability of detection is

PD = P(x[0] > γ'; H1) = ∫_{γ'}^{∞} 1/√(2πσ²) exp(−(t − 3)²/(2σ²)) dt

The performance of a detector is characterized by a receiver operating characteristic (ROC) curve: a plot of the probability of detection vs. the probability of false alarm as the threshold is varied.

Receiver Operating Characteristic Curves
[Figure: probability of detection vs. probability of false alarm.]
Each point on the ROC curve corresponds to a specific threshold (the threshold values are not evident from the plot). The diagonal line represents a random classifier. The closer the ROC plot is to the upper-left corner, the higher the overall classification accuracy.
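An ROC curve is traced by sweeping the threshold and recording the resulting (PFA, PD) pairs, as in this sketch (same illustrative parameters as above):

```python
import numpy as np
from scipy.stats import norm

A, sigma = 3.0, 1.5                       # illustrative values
thresholds = np.linspace(-5.0, 10.0, 200)

P_FA = norm.sf(thresholds, loc=0.0, scale=sigma)   # one (P_FA, P_D) point per threshold
P_D = norm.sf(thresholds, loc=A, scale=sigma)

# P_FA == P_D would be the diagonal (a random classifier); the larger
# A/sigma is, the closer the curve bends toward the upper-left corner.
for pfa, pd in zip(P_FA[::40], P_D[::40]):
    print(f"P_FA={pfa:.3f}  P_D={pd:.3f}")
```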

ROC Example: Damaged Telescope Drive Mechanism
[Figure: Mahalanobis squared distance (log score) for undamaged and damaged instances, and the resulting receiver operating characteristic curve (true positive rate vs. false positive rate).]

Matched Filter
Suppose we want to detect a known deterministic signal (e.g., a sine wave) corrupted by Gaussian noise. We can make use of the Neyman-Pearson criterion to design an optimal detector for that known signal.
- s[n], n = 0, 1, ..., N-1, is the known deterministic signal we want to detect.
- w[n], n = 0, 1, ..., N-1, is the Gaussian noise.

Matched Filter
Then our hypothesis statement is:
- H0: x[n] = w[n], n = 0, 1, ..., N-1 (signal is not present)
- H1: x[n] = s[n] + w[n], n = 0, 1, ..., N-1 (signal is present)
The Neyman-Pearson theorem tells us to decide H1 based on the likelihood ratio

L(x) = p(x; H1) / p(x; H0) > γ

Matched Filter
After some derivation we find that we should choose H1 if

Σ_{n=0}^{N−1} x[n] s[n] > σ² ln γ + (1/2) Σ_{n=0}^{N−1} s²[n]

Because we know s[n], we can define a new threshold

γ' = σ² ln γ + (1/2) Σ_{n=0}^{N−1} s²[n]

Matched Filter
Now we decide H1 if our new test statistic T(x) is greater than γ':

T(x) = Σ_{n=0}^{N−1} x[n] s[n]

We will interpret the test statistic by relating the correlation process to the effect of a filter on the data. Let x[n] be the input to a finite impulse response filter h[n], n = 0, 1, ..., N−1.
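A sketch of the resulting replica-correlator detector; the signal, noise variance, and likelihood-ratio threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 64
s = np.sin(2 * np.pi * 0.1 * np.arange(N))   # known deterministic signal (assumed)
sigma2 = 0.5                                 # noise variance (assumed)
gamma = 2.0                                  # likelihood-ratio threshold (assumed)

x = s + rng.normal(0.0, np.sqrt(sigma2), N)  # H1 case: signal buried in noise

T = np.dot(x, s)                                           # T(x) = sum x[n] s[n]
gamma_prime = sigma2 * np.log(gamma) + 0.5 * np.dot(s, s)  # threshold gamma'
print(T, gamma_prime, T > gamma_prime)       # decide H1 if T(x) > gamma'
```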

Matched Filter
The output of the filter is

y[n] = Σ_{k=0}^{n} h[n − k] x[k], for 0 ≤ n ≤ N−1

Now let h[n] be a flipped-around version of the signal we want to detect:

h[n] = s[N − 1 − n], n = 0, 1, ..., N−1

y[n] = Σ_{k=0}^{n} s[N − 1 − n + k] x[k], so that y[N − 1] = Σ_{k=0}^{N−1} s[k] x[k]

With a change in the summation variable, this gives us the NP detector.

Matched Filter Example
[Figure: a short tone ping at infinite SNR and at low SNR, each shown with the corresponding matched filter result λ(x, τ).]
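The filtering interpretation can be checked numerically, a sketch: with h[n] = s[N−1−n], sampling the convolution output at n = N−1 reproduces the correlation statistic.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 64
s = np.sin(2 * np.pi * 0.1 * np.arange(N))   # known signal (assumed)
x = s + rng.normal(0.0, 0.7, N)              # noisy observation

h = s[::-1]             # matched-filter impulse response h[n] = s[N-1-n]
y = np.convolve(x, h)   # FIR filter output y[n]

T = np.dot(x, s)                  # correlator statistic T(x)
print(np.isclose(y[N - 1], T))    # filter output at n = N-1 equals T(x)
```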

Conditional Probability and Bayes' Theorem
P(Y|X) is the probability of Y given X. Bayes' theorem:

P(Y|X) = P(X|Y) P(Y) / P(X)

Bayes Risk Approach
Define Cij as the cost for choosing Hi when Hj is true:
- C10: cost for deciding the system is damaged when it is not
- C01: cost for deciding the system is not damaged when it is
Often C00 = C11 = 0: there is no cost for a correct decision. For many SHM cases, C01 > C10. Develop a decision rule based on minimizing the Bayes risk, R:

R = E(C) = Σ_i Σ_j Cij P(Hi | Hj) P(Hj)

Assuming C10 > C00 and C01 > C11, the detector that minimizes the Bayes risk is to select H1 if

p(x; H1) / p(x; H0) > (C10 − C00) P(H0) / [(C01 − C11) P(H1)]
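A sketch of the resulting Bayes-risk detector; the costs, priors, and Gaussian parameters are illustrative assumptions (with C00 = C11 = 0 the threshold reduces to C10 P(H0) / (C01 P(H1))):

```python
from scipy.stats import norm

A, sigma = 3.0, 1.5        # illustrative Gaussian mean-shift problem
C10, C01 = 1.0, 10.0       # false-alarm cost and missed-damage cost; C01 > C10
P_H0, P_H1 = 0.95, 0.05    # assumed prior probabilities of each state

threshold = (C10 * P_H0) / (C01 * P_H1)   # compare L(x) with this value

def decide_damaged(x):
    """Select H1 when the likelihood ratio exceeds the Bayes-risk threshold."""
    L = norm.pdf(x, loc=A, scale=sigma) / norm.pdf(x, loc=0.0, scale=sigma)
    return L > threshold

print(decide_damaged(1.0), decide_damaged(2.5))
```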

Classifying the Detection Problem
Detection problems can be classified by the feature characteristics of the damaged response (deterministic and known, e.g., a DC offset; deterministic and unknown; random with known PDF; random with unknown PDF) and of the undamaged response (deterministic; Gaussian with known PDF; non-Gaussian with known PDF). The DC-offset example above occupies the simplest cell (deterministic and known damaged response, Gaussian undamaged response with known PDF); typical SHM problems fall in the harder cells, with unknown signal characteristics and imperfectly known PDFs.

Challenges for Probabilistic Decision Making
- Analytical approaches to defining threshold levels: one approach has been shown, but it requires knowledge of the probability density functions for both undamaged and damaged features.
- Balancing trade-offs between false-positive and false-negative indications of damage.
- Obtaining data from the damaged system.
- Data normalization.
- Updating statistical models as new data become available.
- Managing the large volumes of data that will be produced by an on-line monitoring system.
- Learning how others do it (e.g., credit card fraud detection).