Basic Statistical Tools


Structural Health Monitoring Using Statistical Pattern Recognition

Basic Statistical Tools

Presented by Charles R. Farrar, Ph.D., P.E.
Los Alamos Dynamics Structural Dynamics and Mechanical Vibration Consultants

Overview

Provide a brief statistics background to help with further discussions of statistical pattern recognition applied to Structural Health Monitoring.

- Probability Density Function
- Cumulative Distribution Function
- Statistical Moments
- Density Estimation
- Confidence Limits
- Central Limit Theorem
- Multivariate Statistics
- Multivariate Analysis
- Curse of Dimensionality
- Assessment of Normality
- Data Reduction/Compression

Probability Density Functions

First, we need to define the Probability Density Function, $f(x)$, which is used to quantify the probability distribution. A Probability Density Function describes the probability density over the sample space of a continuous random variable, X. The probability that X lies between a and b is given by:

    $P(a \le X \le b) = \int_a^b f(x)\,dx$

Some properties:

    $f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$

[Figure: probability density function f(x) plotted against X, with the interval from a to b marked]

Probability Density Functions

Some common probability density functions:

Gaussian or Normal Distribution

    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$

Standard normal distribution: $\mu = 0$, $\sigma = 1$.

Rayleigh Distribution

    $f(x) = \frac{x}{b^2} \exp\left(-\frac{x^2}{2b^2}\right), \quad x \ge 0$

Describes the distance a particle travels per unit time when subjected to velocity components described by normal distributions.
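The statement that P(a ≤ X ≤ b) is the area under the density between a and b can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `normal_pdf` and `prob_between` are illustrative, and the trapezoidal rule is just one convenient way to approximate the integral.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density f(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def prob_between(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under pdf using the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        total += pdf(a + k * h)
    return total * h

# Area under the standard normal density between -1 and 1.
p = prob_between(normal_pdf, -1.0, 1.0)

# Closed form for the same probability, available for the normal case via erf.
exact = math.erf(1.0 / math.sqrt(2.0))

# Integrating over a wide interval should recover a total area close to one.
total_area = prob_between(normal_pdf, -8.0, 8.0)

print(p, exact, total_area)
```

The same `prob_between` helper works for any density function passed in, for example a Rayleigh density on an interval of non-negative values.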

Probability Density Functions

Log-Normal Distribution

    $f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0$

where $\mu = E(\ln x)$ and $\sigma$ is the standard deviation of $\ln x$.

Used when values of the random variable are known to be positive, e.g. fatigue life.

[Figure: log-normal densities for several values of σ with a common median]

Weibull Distribution

    $f(x) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^{k}\right), \quad x \ge 0$

written here with shape parameter $k$ and scale parameter $\lambda$. Models failure of materials.

[Figure: Weibull densities for two parameter values]

Cumulative Distribution Function

The Cumulative Distribution Function, $F(x)$, defines the probability that a random variable X is less than or equal to some value x:

    $F(x) = P(X \le x)$

If $F(x)$ has a first derivative, the following relationship between the cumulative distribution function and the probability density function holds:

    $f(x) = \frac{dF(x)}{dx}, \qquad F(x) = \int_{-\infty}^{x} f(\xi)\,d\xi$
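The derivative relationship between the cumulative distribution function and the density can be illustrated with the Weibull distribution, whose CDF has a closed form. This is a minimal Python sketch under stated assumptions: the shape/scale parameterization (`k`, `lam`), the parameter values, and the function names are chosen here for illustration and are not taken from the slides.

```python
import math

def weibull_cdf(x, k, lam):
    """Weibull CDF F(x) = 1 - exp(-(x/lam)^k) for x > 0 (shape k, scale lam)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** k))

def weibull_pdf(x, k, lam):
    """Weibull density f(x) = (k/lam) (x/lam)^(k-1) exp(-(x/lam)^k) for x > 0."""
    if x <= 0.0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def numerical_derivative(func, x, h=1e-5):
    """Central-difference approximation of d func / dx at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

k, lam = 2.0, 1.5   # assumed example values
x0 = 1.0

# The slope of the CDF at x0 should match the density at x0.
slope = numerical_derivative(lambda x: weibull_cdf(x, k, lam), x0)
density = weibull_pdf(x0, k, lam)
print(slope, density)
```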

Cumulative Distribution Function

Some properties of $F(x)$:

    $F(-\infty) = 0, \qquad F(\infty) = 1, \qquad 0 \le F(x) \le 1$

[Figure: a bimodal PDF and the corresponding bimodal CDF]

Statistical Moments

We will be interested in calculating the average behavior of a random variable (in our case some damage-sensitive feature) and estimating how frequently significant deviations from the average occur.

The mean value, $\mu$ (a.k.a. expected value, $E(x)$), provides a measure of this average behavior. The mean value is defined by the following formula for a continuous random variable:

    $\mu = E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx$        ($\mu$ is a parameter in a model)

    $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$        ($\hat{\mu}$ is estimated from discrete data)

Statistical Moments

The mean value can be thought of, in terms of a mechanics analogy, as the first moment of the density function (for such calculations one typically divides by the total area, but for PDFs that area is one):

    $\mu = \frac{\int_{-\infty}^{\infty} x\,f(x)\,dx}{\int_{-\infty}^{\infty} f(x)\,dx} = \int_{-\infty}^{\infty} x\,f(x)\,dx$

[Figure: probability density function f(x) plotted against X, showing a strip of width dx at position x]

Statistical Moments

The mean value provides a measure of the random variable's central tendency. There are other measures of a random variable's central tendency:

- Mode (or modal value) is the most probable value of a random variable (it corresponds to the highest point in the probability density function).
- Median is the value of a random variable for which values above and below it are equally probable.

For a unimodal, symmetric probability density function (e.g. the Gaussian or normal density function), the mean, the mode, and the median are identical.

Statistical Moments

Variance, $\sigma^2$, provides a first-order measure of the dispersion of the random variable about the mean. The variance is defined as:

    $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$        ($\sigma^2$ is a parameter in a model)

    $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$        ($\hat{\sigma}^2$ is estimated from discrete data)

The Standard Deviation, $\sigma$, and the Coefficient of Variation, $C$, also provide a measure of the dispersion from the mean:

    $\sigma = \sqrt{\sigma^2}, \qquad C = \frac{\sigma}{\mu}$

Statistical Moments

Variance can be thought of as the second moment of the probability density function about the mean:

    $\sigma^2 = \frac{\int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx}{\int_{-\infty}^{\infty} f(x)\,dx}$

[Figure: probability density function f(x) plotted against X, showing a strip of width dx at distance x − μ from the mean]
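The discrete-data estimates of the mean, variance, standard deviation, and coefficient of variation can be sketched in a few lines of Python. The data values and helper names below are made up for illustration; the variance here divides by n, matching the discrete-data formula above.

```python
import math

def mean(xs):
    """Sample mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def variance(xs):
    """Variance estimate normalized by n: (1/n) * sum of (x_i - mean)^2."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical feature values, chosen so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu_hat = mean(data)                 # 40 / 8 = 5.0
var_hat = variance(data)            # 32 / 8 = 4.0
sigma_hat = math.sqrt(var_hat)      # 2.0
coeff_of_variation = sigma_hat / mu_hat   # 0.4

print(mu_hat, var_hat, sigma_hat, coeff_of_variation)
```

The coefficient of variation is dimensionless, which makes it convenient for comparing the scatter of features measured in different units.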

Statistical Moments

We need to consider the difference between the variance of the sample and the variance of the population. When estimating the variance of the population from a sample, divide the previous equation for discrete data by (N − 1) instead of N (for large samples the difference is negligible).

The mean and standard deviation are used for standard data normalization:

    $Z = \frac{X - \mu}{\sigma}$

This normalized random variable Z will have zero mean and a variance of 1.

Biased vs Unbiased Estimators

An estimate of a statistic is said to be unbiased if the mean of the statistics obtained from individual samples is equal to the statistic for the entire population.

Biased vs Unbiased Estimates

Biased estimate of variance:

    $\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2$

Unbiased estimate of variance:

    $\hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2$

Biased vs Unbiased Estimates

    % Biased vs Unbiased Estimators
    % An estimate of a statistic is said to be unbiased if the mean of the
    % statistics obtained from individual samples is equal to the statistic for
    % the entire population.
    %
    % Let's begin by looking at an N-point uniformly distributed random signal.
    N = 10000;   % signal length (assumed here; the original value was lost)
    n = 10;      % number of points in each sample
    m = N/n;     % number of samples
    x = rand(1,N);
    plot(x)
    % Now calculate the variance of the entire population, noting that by
    % default Matlab calculates an unbiased estimate of the variance (the
    % argument 1 in the var command provides a biased variance estimate, where
    % the variance is normalized by the number of samples, n).
    vpop = var(x,1)
    % Next, break the signal into m intervals and calculate the variance for
    % each of these samples using the biased variance estimate (argument 1
    % specified, i.e. normalize by n).
    vint = zeros(1,m);
    for i = 1:m
        j = i-1;
        vint(1,i) = var(x(1,((j*n)+1):(n*i)),1);
    end
    % Calculate the mean of the sample variances.
    mvsample = mean(vint)
    % Because the mean of the sample variances is not equal to the variance of
    % the entire population, the variance estimate normalized by n is said to
    % be a biased estimate of the population variance.
    %
    % Now calculate the unbiased estimate of the sample variance (normalize by
    % n-1, where n is the sample size; this is the Matlab default) and show
    % that the mean of these variance estimates equals the variance of the
    % population.
    vbar = zeros(1,m);
    for i = 1:m
        j = i-1;
        vbar(i) = var(x(1,((j*n)+1):(n*i)));
    end
    mvbar = mean(vbar)
    %
    % Finally, note that the difference between the biased estimate and
    % the unbiased estimate becomes small as the sample size becomes large.

Results:

- Variance of the full population of a uniformly distributed random variable between 0 and 1: 0.0835
- Mean of biased variance estimates from 10-point samples of the same random variable: 0.075
- Mean of unbiased variance estimates from 10-point samples of the same random variable: 0.0834
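The size of the bias seen in this experiment can be worked out directly. The derivation below is a short sketch assuming independent, identically distributed samples of size n; the value n = 10 is an assumption that matches the roughly 0.9 ratio between the mean biased estimate and the population variance reported above.

```latex
% Split the sum of squares about the sample mean \bar{X} into a sum about
% the true mean \mu minus a correction term:
\sum_{i=1}^{n} (X_i - \bar{X})^2
  = \sum_{i=1}^{n} (X_i - \mu)^2 - n\,(\bar{X} - \mu)^2

% Take expectations, using E[(X_i - \mu)^2] = \sigma^2 and
% E[(\bar{X} - \mu)^2] = \sigma^2 / n:
E\!\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]
  = n\sigma^2 - \sigma^2 = (n - 1)\,\sigma^2

% Hence the estimate normalized by n is biased low, and the estimate
% normalized by n - 1 is unbiased:
E\!\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]
  = \frac{n-1}{n}\,\sigma^2 ,
\qquad
E\!\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]
  = \sigma^2

% Uniform distribution on (0, 1): \sigma^2 = 1/12 \approx 0.0833.
% With n = 10 the biased estimate has expected value
\frac{9}{10} \cdot \frac{1}{12} = 0.075
```

The factor (n − 1)/n tends to one as n grows, which is why the distinction matters only for small samples.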

Statistical Moments

The third moment about the mean provides a measure of the density function's skewness:

    $E[(X - \mu)^3] = \int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx, \qquad E[(X - \mu)^3] \approx \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^3$

Dimensionless form:

    $\frac{E[(X - \mu)^3]}{\sigma^3}$

Skewness = 0 for symmetric distributions.

[Figure: density functions f(x) with positive skewness and with negative skewness]

Statistical Moments

The fourth statistical moment about the mean provides a measure of the relative distribution of area under the density function between the central portion of the distribution and the tails of the distribution:

    $E[(X - \mu)^4] = \int_{-\infty}^{\infty} (x - \mu)^4 f(x)\,dx, \qquad E[(X - \mu)^4] \approx \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^4$

Kurtosis is a dimensionless form of the fourth moment; for a normal distribution it is equal to 3:

    $\frac{E[(X - \mu)^4]}{\sigma^4}$

Note that two random variables can have the same variance but different kurtosis.
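The dimensionless skewness and kurtosis can be estimated from data by dividing the third and fourth central moments by the appropriate power of the second. The sketch below is illustrative only: the helper names, the small symmetric data set, and the seeded Gaussian sample are assumptions made for the example.

```python
import random

def central_moment(xs, order):
    """(1/N) * sum of (x_i - mean)^order."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** order for x in xs) / len(xs)

def skewness(xs):
    """Third central moment divided by sigma^3."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Fourth central moment divided by sigma^4 (equals 3 for a normal density)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

# A small symmetric data set: its skewness is exactly zero.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(skewness(symmetric), kurtosis(symmetric))

# A large pseudo-random Gaussian sample: skewness near 0, kurtosis near 3.
rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
print(skewness(gaussian), kurtosis(gaussian))
```

For the symmetric set the second and fourth central moments are 2 and 6.8, so its kurtosis is 6.8 / 4 = 1.7, well below the Gaussian value of 3.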

Statistical Moments

[Figure: two normal density functions f(x) with the same mean]

Two normal distributions with the same mean:

- Fourth moments about the mean are different.
- The non-dimensional fourth moment, or kurtosis, will be the same for each distribution and equal to 3.

Density Estimation

Density estimation is the process of estimating an unknown probability density function based on data sampled from the corresponding random process.

There are two general approaches to density estimation:

- Parametric density estimation, where we assume the form of the density function (e.g. normal distribution) a priori.
- Non-parametric density estimation, where we let the data define the density function.
  - Origins in the 1950s.
  - A method to free discriminant analysis from rigid distribution assumptions.

Parametric Density Estimation

- The person doing the data interrogation chooses a density form a priori.
- The probability density function will depend on unknown parameters.
- For the case where $f(x)$ is assumed to be a normal distribution, parametric density estimation reduces to finding the mean, $\mu$, and standard deviation, $\sigma$.
- Estimate $\mu$ and $\sigma$ from the observed data.
- This density estimation procedure is powerful if the assumed form of the density is correct.

Nonparametric Density Estimation

Allows the observed data to choose the form of the density.

- Histogram density estimator.
- Naive density estimator.
- Kernel density estimator.

All these methods have fit parameters that must be specified, and these parameters influence the shape and smoothness of the estimated density function. There is no unique solution.

Histogram Estimator

Oldest and most widely used form of density estimate.

    $\hat{f}(x) = \frac{1}{nh} \left(\text{number of } X_i \text{ in the same bin as } x\right)$

- Choose an origin, $x_0$, and a bin width, $h$.
- Bins are defined as $[x_0 + mh,\; x_0 + (m+1)h)$ for positive and negative integer values of m.
- Can use variable bin widths.
- The choice of bin width controls the amount of smoothing in the estimate.
- For a given bin width, the choice of the origin can have a significant effect on the appearance of the density estimate.

Histogram Estimator

[Figure: histograms of the same data with two different numbers of bins]
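The histogram estimator described above can be sketched directly from its definition: count the observations that share a bin with x, then divide by n times the bin width. This Python sketch uses made-up data and a hypothetical function name; it also shows how moving the origin changes the estimate at the same point.

```python
import math

def histogram_density(data, x, origin, h):
    """Histogram estimate: (number of observations in the same bin as x) / (n * h)."""
    target_bin = math.floor((x - origin) / h)
    count = sum(1 for xi in data if math.floor((xi - origin) / h) == target_bin)
    return count / (len(data) * h)

# Hypothetical observations.
data = [0.1, 0.4, 0.45, 0.9, 1.2, 1.3, 1.35, 2.6]

# Origin 0.0, bin width 0.5: x = 0.3 falls in [0.0, 0.5) with three observations.
f_a = histogram_density(data, 0.3, origin=0.0, h=0.5)    # 3 / (8 * 0.5) = 0.75

# Same bin width, shifted origin: x = 0.3 now falls in [0.25, 0.75) with two.
f_b = histogram_density(data, 0.3, origin=0.25, h=0.5)   # 2 / (8 * 0.5) = 0.5

# The estimate integrates to one: sum (density * bin width) over the occupied range.
area = sum(histogram_density(data, (m + 0.5) * 0.5, 0.0, 0.5) * 0.5 for m in range(6))

print(f_a, f_b, area)
```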

Naive Estimator

Now construct a histogram where each point is the center of a sampling interval, by placing a box of width $2h$ and height $(2nh)^{-1}$ on each observation and summing the results:

    $\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\, w\!\left(\frac{x - X_i}{h}\right)$

    $w(x) = \begin{cases} 1/2, & |x| < 1 \\ 0, & \text{otherwise} \end{cases}$

Naive Estimator

[Figure: observations marked along the X axis, with a box of width 2h and height (2nh)^{-1} shown at right]

Place the box shown at right on each observation and sum the boxes to obtain the density estimate.
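A minimal Python sketch of the naive estimator follows, assuming the standard box weight w(u) = 1/2 for |u| < 1 and zero otherwise, so that every observation within a distance h of x contributes a box of height 1/(2nh). The data and the function name are illustrative.

```python
def naive_density(data, x, h):
    """Naive estimate: (1/n) * sum over i of (1/h) * w((x - X_i) / h)."""
    def w(u):
        # Box weight: 1/2 inside (-1, 1), zero outside.
        return 0.5 if abs(u) < 1.0 else 0.0
    return sum(w((x - xi) / h) for xi in data) / (len(data) * h)

# Hypothetical observations.
data = [1.0, 1.2, 2.0, 3.5]

# With h = 0.5, the observations 1.0 and 1.2 lie within h of x = 1.1.
estimate = naive_density(data, 1.1, h=0.5)   # (0.5 + 0.5) / (4 * 0.5) = 0.5

# Far from every observation the estimate is zero.
far_away = naive_density(data, 10.0, h=0.5)

print(estimate, far_away)
```

Unlike the histogram, this estimate needs no origin, but it is still a step function that jumps wherever x is exactly h away from an observation.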

Kernel Density Estimator

Generalize the naive estimator by replacing the weight function, $w$, with a kernel function:

    $\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)$

- $K(\cdot)$ is a kernel function.
  - Integrates to 1.
  - Symmetric and nonnegative.
  - A Gaussian pdf kernel is often used.
- The density estimate is a sum of hills over the observations $X_i$.
- Most common estimator after the histogram.
- Has trouble with data from long-tailed distributions.

Kernel Density Estimator

[Figure: kernel density estimate of a data set]

h is the window width, and the shape of the estimated density function will be a function of this width.
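The kernel estimator can be sketched with a Gaussian kernel in a few lines. The following Python example is illustrative rather than definitive: the data, the window widths, and the function names are assumptions. It checks that the estimate integrates to one and shows how the window width h changes the height of the estimate.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(data, x, h):
    """Kernel estimate: (1 / (n h)) * sum over i of K((x - X_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

# Hypothetical observations.
data = [-1.0, 0.0, 0.2, 1.5]

# The estimate is itself a density: a Riemann sum over a wide grid is close to one.
step = 0.01
area = sum(kde(data, -6.0 + step * k, 0.4) for k in range(1201)) * step

# A narrow window gives a tall, bumpy estimate; a wide window gives a flat one.
peak_narrow = kde(data, 0.1, h=0.2)
peak_wide = kde(data, 0.1, h=1.0)

print(area, peak_narrow, peak_wide)
```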

3-Story Test Structure

[Figure: the 3-story test structure]

Kernel Density Estimator Applied to 4 DOF Structure

[Figure: kernel density estimates, each annotated with its skewness, for four cases: the linear system and the system with an impact nonlinearity, each at two excitation amplitudes (V RMS)]

Central Limit Theorem

The Central Limit Theorem states that the distribution of a sum of random variables tends to the normal distribution as the number of random variables increases, regardless of these random variables' individual distributions.

Example: realizations of the sum of N random variables drawn from a binomial distribution.

[Figure: histograms of the sum for three increasing values of N]

Confidence Intervals

For a random variable, X, whose probability distribution is defined by some probability density function $f(x)$, the confidence intervals define a range of values that, with high probability (not certainty), will contain a realization of X.

Notation:

    $\mathrm{CONF}_{\gamma}(X_1 \le X \le X_2)$, where $\gamma$ is the confidence level (e.g. 95%, 99%)

For the normal distribution, some confidence levels can be related to the mean, $\mu$, and standard deviation, $\sigma$:

    68.27%:  $\mu - \sigma \le X \le \mu + \sigma$
    95.45%:  $\mu - 2\sigma \le X \le \mu + 2\sigma$
    99.73%:  $\mu - 3\sigma \le X \le \mu + 3\sigma$
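The percentages quoted for one, two, and three standard deviations follow from the normal CDF, which can be written with the error function. A minimal Python sketch, with an illustrative function name:

```python
import math

def normal_coverage(k):
    """Probability that a normal random variable lies within k standard deviations
    of its mean: P(mu - k*sigma <= X <= mu + k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100.0 * normal_coverage(k):.2f}%")
```

The result does not depend on the particular values of the mean and standard deviation, only on the number of standard deviations k.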

Confidence Intervals

Confidence intervals are related to the portion of area under the probability density function.

One-sided confidence interval: there are cases where we are interested in identifying outliers on only one side of a distribution.

[Figure: two normal densities, one with a two-sided and one with a one-sided interval, each enclosing 95% of the area under the pdf]

Multivariate Statistics

For many cases the damage-sensitive feature will be a function of several random variables. We will have to employ the theory of jointly distributed random variables to analyze these cases.

We will examine the bivariate case (the damage feature is a function of two random variables: X, Y).

The bivariate joint probability density function, $f_{XY}(x,y)$, quantifies the probability that $(x,y)$ occurs jointly. Similar to the univariate pdf:

    $f_{XY}(x,y) \ge 0, \qquad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$

[Figure: bivariate normal density function]

Multivariate Statistics

Bivariate Cumulative Distribution Function:

    $F_{XY}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(u,v)\,dv\,du = P(X \le x,\; Y \le y)$

    $f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\,\partial y}$

Marginal Density Functions:

    $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$

[Figure: joint density over the X-Y plane with the marginal density f_X(x)]

Multivariate Statistics

Moments of jointly distributed random variables: $E[X^n Y^m]$ is the moment of order n + m:

    $E[X^n Y^m] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^n y^m f_{XY}(x,y)\,dy\,dx$

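Covariance of Jointly Distributed Random Variables

Covariance of jointly distributed random variables, $\sigma_{XY}$:

- Second moment of X and Y about the centroid.
- Measures the deviation of X and Y together from the centroid.

    $\sigma_{XY} = E[(X - \mu_x)(Y - \mu_y)] = E[XY] - \mu_x \mu_y = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_x)(y - \mu_y)\, f_{XY}(x,y)\,dx\,dy$

[Figure: joint density over the X-Y plane with the centroid (μ_x, μ_y) marked]

Covariance for N-Dimensional Distributions

For n-dimensional distributions we define the covariance matrix, $[\Sigma]$, as:

    $[\Sigma] = E\left[(\{x\} - \{\mu\})(\{x\} - \{\mu\})^T\right] = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix}$

where $\{x\}$ and $\{\mu\}$ are n × 1 vectors and $[\Sigma]$ is n × n.

Note: $[\Sigma]$ is symmetric and positive semi-definite (positive-definite when no variable is an exact linear combination of the others).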
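The covariance matrix can be estimated from data by averaging the outer product of the deviations from the mean vector. The Python sketch below uses plain lists and made-up two-dimensional samples; the function name and the normalization by n (rather than n − 1) are choices made for this illustration.

```python
def covariance_matrix(samples):
    """Return (mean vector, covariance matrix) with entries
    sigma_ij = (1/n) * sum over samples of (x_i - mu_i) * (x_j - mu_j)."""
    n = len(samples)
    d = len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [
        [sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n for j in range(d)]
        for i in range(d)
    ]
    return mu, cov

# Hypothetical (x, y) feature pairs that rise together.
samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]

mu, cov = covariance_matrix(samples)
print(mu)    # mean vector, close to [2.5, 5.0]
print(cov)   # diagonal: variances; off-diagonal: covariance sigma_XY
```

The off-diagonal entries are equal, reflecting the symmetry of the covariance matrix, and the positive value indicates that X and Y deviate from the centroid in the same direction.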

Parametric Density Estimation in High Dimensions

n-dimensional Multivariate Normal Distribution (mean vector $\{\mu\}$, covariance matrix $[\Sigma]$):

    $f_X(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\{x\} - \{\mu\})^T [\Sigma]^{-1} (\{x\} - \{\mu\})\right)$

- $|\Sigma|$ is the determinant of the covariance matrix.
- n parameters in $\{\mu\}$ must be estimated.
- $(1/2)\,n(n+1)$ parameters must be estimated in $[\Sigma]$.

Mahalanobis Distance

The Mahalanobis Distance, $d$, is a normalized measure of the distance between a multivariate random variable and the mean of the distribution.

For the bivariate case:

    $d^2 = \begin{bmatrix} x - \mu_x & y - \mu_y \end{bmatrix} \begin{bmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{YX} & \sigma_{YY} \end{bmatrix}^{-1} \begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix}$

For the general n-dimensional case:

    $d^2 = (\{x\} - \{\mu\})^T [\Sigma]^{-1} (\{x\} - \{\mu\})$

It is analogous to the Z statistic for univariate distributions.
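For the bivariate case the Mahalanobis distance can be computed by inverting the 2 × 2 covariance matrix explicitly. The Python sketch below is illustrative: the function name and the covariance values are assumptions. It shows that points at the same Euclidean distance from the mean can have very different Mahalanobis distances.

```python
import math

def mahalanobis_2d(point, mu, cov):
    """Mahalanobis distance sqrt((p - mu)^T cov^{-1} (p - mu)) for two variables."""
    dx = point[0] - mu[0]
    dy = point[1] - mu[1]
    (sxx, sxy), (syx, syy) = cov
    det = sxx * syy - sxy * syx
    # Entries of the inverse of a 2 x 2 matrix.
    ixx, ixy, iyx, iyy = syy / det, -sxy / det, -syx / det, sxx / det
    d_squared = dx * (ixx * dx + ixy * dy) + dy * (iyx * dx + iyy * dy)
    return math.sqrt(d_squared)

mu = (0.0, 0.0)

# Uncorrelated case with unequal variances: each axis is scaled by its own sigma.
cov_diag = [[4.0, 0.0], [0.0, 1.0]]
d_along_x = mahalanobis_2d((2.0, 0.0), mu, cov_diag)   # 2 / sigma_x = 1.0
d_along_y = mahalanobis_2d((0.0, 2.0), mu, cov_diag)   # 2 / sigma_y = 2.0

# Correlated case: a point along the correlation direction is "closer"
# than a point across it, although both are equally far in Euclidean terms.
cov_corr = [[1.0, 0.8], [0.8, 1.0]]
d_with = mahalanobis_2d((1.0, 1.0), mu, cov_corr)
d_against = mahalanobis_2d((1.0, -1.0), mu, cov_corr)

print(d_along_x, d_along_y, d_with, d_against)
```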

Curse of Dimensionality

Univariate normal (1 dimension):

- Mean: $\mu$ (1 parameter)
- Variance: $\sigma^2$ (1 parameter)

Multivariate normal (n dimensions):

- Mean vector: $\mu$ (n parameters)
- Covariance matrix: $\Sigma$ ($n(n+1)/2$ parameters)

The number of parameters to be estimated grows rapidly with dimension (quadratically in n for the normal model): the Curse of Dimensionality.

The Curse of Dimensionality

Place realizations of a normally distributed random variable in 1, 2, and 3 dimensions into bins.

Note that as the dimension increases, data falls into proportionally fewer bins.

[Figure: bin counts for the 1-D, 2-D, and 3-D cases, plotted against bin number]
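Both effects can be sketched numerically: the parameter count of a multivariate normal model, and the fraction of bins that a fixed number of realizations manages to occupy as the dimension grows. In this Python sketch the number of realizations (1000), the 10 bins per axis, the range [-3, 3), and the seed are all assumptions made for illustration, not the values used on the slide.

```python
import random

def normal_parameter_count(n):
    """Parameters of an n-dimensional normal: n means plus n(n+1)/2 covariance terms."""
    return n + n * (n + 1) // 2

def occupied_fraction(dim, n_points=1000, bins_per_axis=10, lo=-3.0, hi=3.0, seed=0):
    """Fraction of bins that contain at least one standard-normal realization."""
    rng = random.Random(seed)
    width = (hi - lo) / bins_per_axis
    occupied = set()
    for _ in range(n_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Realizations outside the binned range are simply skipped.
        if all(lo <= c < hi for c in point):
            index = tuple(min(int((c - lo) // width), bins_per_axis - 1) for c in point)
            occupied.add(index)
    return len(occupied) / bins_per_axis ** dim

for n in (1, 2, 3, 10):
    print(n, "dimensions:", normal_parameter_count(n), "parameters")

fractions = [occupied_fraction(dim) for dim in (1, 2, 3)]
print(fractions)
```

With the same number of realizations, the number of bins grows as a power of the dimension, so a shrinking share of them receives any data at all.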

Assessment of Normality

For SHM problems, the distributions of data and extracted features are often assumed to be Gaussian. Sometimes the effectiveness of this assumption needs to be evaluated and validated.

Normality tests include:

- Normal probability plot
- Skewness & kurtosis test
- Chi-square goodness-of-fit test
- Kolmogorov-Smirnov goodness-of-fit test
- Bera-Jarque hypothesis test
- Lilliefors hypothesis test

8 DOF Mass-Spring System

[Figure: the 8 DOF mass-spring system, with masses labeled m1 through m8]

List of time series employed in this study:

    Case  Description             Input level          Data sets per input  Total data sets
    0     No bumper               3, 4, 5, 6, 7 Volts  15 sets              75 sets
    1     Bumper between m1-m2    3, 4, 5, 6, 7 Volts  5 sets               25 sets
    2     Bumper between m5-m6    3, 4, 5, 6, 7 Volts  5 sets               25 sets
    3     Bumper between m7-m8    4, 5, 6, 7 Volts     5 sets               20 sets

Normal Probability Plot

The purpose of a normal probability plot is to graphically assess whether the data could come from a normal distribution. If the data are normal, the plot will be a straight line. Other distribution types will introduce curvature in the plot.

[Figure: normal probability plots (probability vs. time series data) for (a) the case without a bumper and (b) the case with a bumper]

Skewness & Kurtosis

All normal distributions have a kurtosis value of 3.0 and a skewness value of 0.0. Therefore, calculating the skewness and kurtosis values can indicate whether the data are consistent with a normal distribution.

[Figure: normal probability plots (probability vs. time series data) for (a) the case without a bumper, annotated Skewness=-.439, Kurtosis=3.736, and (b) the case with a bumper, annotated Skewness=.365, Kurtosis=4.7]

Testing Validity of Assumed Distribution

The previous analyses, the normal probability plot and the estimation of skewness & kurtosis, are very easy and convenient, but do not provide principled procedures.

There are more statistically rigorous tests for verifying the validity of the assumed distribution. These tests are called goodness-of-fit tests. The chi-square and Kolmogorov-Smirnov methods are the two most common ones.

Chi-square Test for Distribution

- Build a histogram from the n data points (with m bins, chosen by judgment); $n_i$ is the observed count in bin i.
- From the assumed theoretical distribution, obtain the expected count $\hat{n}_i$ in each bin.
- Compute the errors:

    $\sum_{i=1}^{m} \frac{(n_i - \hat{n}_i)^2}{\hat{n}_i}$

- The relative errors approach the chi-square distribution with $f = m - 1 - k$ degrees of freedom as the sample size increases without bound, where k is the number of distribution parameters estimated from the data.
- The assumed distribution is substantiated by the data with $(1 - \alpha)$ confidence if the inequality holds:

    $\sum_{i=1}^{m} \frac{(n_i - \hat{n}_i)^2}{\hat{n}_i} < c_{1-\alpha,\,f}$

  where $c_{1-\alpha,\,f}$ is the value of the chi-square distribution with f degrees of freedom at cumulative probability $(1 - \alpha)$, and $\alpha$ is usually taken to be a small percentage.

Note that more than one distribution can satisfy this test.

[Figure: histogram of the data (number of samples) with the assumed theoretical distribution overlaid, and the chi-square distribution with its (1 − α) confidence interval]
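The chi-square statistic itself is a one-line sum once observed and expected bin counts are available. The Python sketch below uses made-up counts and hypothetical names. To stay within the standard library, the critical value is computed only for the special case of 2 degrees of freedom, where the chi-square distribution is exponential and the critical value has the closed form −2 ln(α); for other degrees of freedom a table or a statistics library would be needed.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum over bins of (n_i - n_hat_i)^2 / n_hat_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_critical_2dof(alpha):
    """Critical value for exactly 2 degrees of freedom: P(chi2 > c) = exp(-c/2) = alpha."""
    return -2.0 * math.log(alpha)

# Hypothetical counts in m = 5 bins (both sum to 100 data points).
observed = [18, 22, 30, 20, 10]
expected = [20, 20, 25, 25, 10]

m = len(observed)
k = 2                      # e.g. mean and standard deviation estimated from the data
dof = m - 1 - k            # f = 5 - 1 - 2 = 2

stat = chi_square_statistic(observed, expected)   # 0.2 + 0.2 + 1.0 + 1.0 + 0.0 = 2.4
critical = chi_square_critical_2dof(0.05)         # about 5.99

print(dof, stat, critical, stat < critical)
```

In this made-up case the statistic falls below the critical value, so the assumed distribution would not be rejected at the 5% level.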

Kolmogorov-Smirnov Test for Distribution

The K-S test compares the empirical and theoretical cumulative distribution functions (CDF):

    $D_n = \max_x \left| F(x) - S_n(x) \right|$

where $F(x)$ is the theoretical CDF and $S_n(x)$ is the empirical CDF of the n data points.

[Figure: cumulative probability vs. data, showing the theoretical and empirical CDFs and the maximum difference D_n]

Compare $D_n$ with the critical value $D_{cr}$ defined at the $(1 - \alpha)$ confidence level. (The critical value is obtained from a K-S test table.)

If $D_n$ is less than the critical value $D_{cr}$, the assumed distribution is accepted.

Data Reduction/Compression

Find a projection of the feature space X,

    $Y = f(X) = T^T X$

such that the dimension of the projected features is less than that of the original features.

- Principal Component Analysis (PCA)
- Projection Pursuit Analysis (PPA)
- Independent Component Analysis (ICA)
- Factor Analysis
- Multidimensional Scaling
- Clustering
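The K-S statistic from the Kolmogorov-Smirnov slide above can be computed by sorting the data and comparing the theoretical CDF with the empirical step function on both sides of each step. This is a minimal Python sketch with illustrative names and data; the comparison against a critical value from a K-S table is left out.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Theoretical CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    """D_n = max |F(x) - S_n(x)|, where S_n is the empirical CDF of the data."""
    xs = sorted(data)
    n = len(xs)
    d_max = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # S_n jumps from i/n to (i+1)/n at x, so check both sides of the step.
        d_max = max(d_max, abs(f - i / n), abs((i + 1) / n - f))
    return d_max

# Hypothetical data compared against a uniform(0, 1) CDF, F(x) = x.
d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: x)     # largest gap: |1 - 0.7| = 0.3

# A single observation at the median of a standard normal.
d_single = ks_statistic([0.0], normal_cdf)                  # |0.5 - 0| = 0.5

print(d_uniform, d_single)
```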

Principal Component Analysis

Principal component analysis finds an orthogonal projection of the original data (red) onto a lower-dimensional space (blue line) such that the variance of the projected points (green) is maximized.

[Figure: original data points (red), the projection line (blue), and the projected points (green)]

Principal Component Analysis (PCA)

PCA projects the original variables into uncorrelated orthogonal variables:

    $Y = T^T (X - \mu)$

[Figure: the original feature space and the transformed space]

The PCA transformation matrix T is obtained by solving the eigenvalue problem of the data's covariance matrix:

    $\Sigma T = T \Lambda$

- T: eigenvector (principal component) matrix
- Λ: eigenvalue matrix containing the variance information

Principal Component Process

PCA provides the optimal linear projection of D-dimensional data into an M-dimensional subspace, where M < D, such that the variance of the projected data is maximized.

- Find the mean vector and covariance matrix for the multidimensional data.
- Find the M eigenvectors of the covariance matrix corresponding to the M largest eigenvalues.
- Use those eigenvectors to define the linear combinations of the original data.

PCA Example: A Bridge Column Test

[Figure: a bridge column with its dimensions, and the sensor placement]
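The three steps of the principal component process can be sketched for two-dimensional data, where the eigenvalue problem of the 2 × 2 covariance matrix has a closed-form solution. This Python sketch is a simplified illustration with made-up data and hypothetical function names; real feature sets with more dimensions would need a numerical eigenvalue solver.

```python
import math

def pca_2d(samples):
    """Return (mean, eigenvalues, first eigenvector) of the 2 x 2 covariance matrix."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    a = sum((s[0] - mx) ** 2 for s in samples) / n               # sigma_XX
    c = sum((s[1] - my) ** 2 for s in samples) / n               # sigma_YY
    b = sum((s[0] - mx) * (s[1] - my) for s in samples) / n      # sigma_XY
    # Eigenvalues of [[a, b], [b, c]] from the characteristic equation.
    half_sum = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lam1, lam2 = half_sum + radius, half_sum - radius
    # Eigenvector for the largest eigenvalue.
    if abs(b) > 1e-15:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (lam1, lam2), (vx / norm, vy / norm)

# Hypothetical data scattered closely around the line y = x.
samples = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]

mean, (lam1, lam2), t1 = pca_2d(samples)

# Project each centered point onto the first principal direction (M = 1).
projected = [t1[0] * (x - mean[0]) + t1[1] * (y - mean[1]) for x, y in samples]
projected_variance = sum(p * p for p in projected) / len(projected)

explained = lam1 / (lam1 + lam2)
print(t1, lam1, lam2, explained, projected_variance)
```

The variance of the projected data equals the largest eigenvalue, and the ratio `explained` is the share of total variance retained by the first principal component.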

PCA Example: A Bridge Column Test (cont)

The first principal component contains about 30% of the total information.

[Figure: variance (%) carried by each eigenvalue of the covariance matrix, and X-bar control charts over the samples at different damage levels, with upper control limit (UCL), center line (CL), and lower control limit (LCL)]

References

Probability Density Function, Cumulative Distribution Function, Statistical Moments

- Any introductory probability and statistics book.
- P. H. Wirsching, T. L. Paez, K. Ortiz, Random Vibrations: Theory and Practice, John Wiley, 1995. (Our notation follows this reference.)
- M. R. Spiegel, L. J. Stephens, Schaum's Outlines: Statistics, 3rd ed., McGraw-Hill, 1998. Basic probability reference.

Density Estimation, Curse of Dimensionality

- B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall.
- D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley, 1992.

References (cont)

Confidence Limits

- E. Kreyszig, Advanced Engineering Mathematics, 8th ed., John Wiley, 1999.

Central Limit Theorem, Assessment of Normality

- P. H. Wirsching, T. L. Paez, K. Ortiz, Random Vibrations: Theory and Practice, John Wiley, 1995.
- A. H-S. Ang and W. H. Tang, Probability Concepts in Engineering Planning and Design: Vol. 1, Basic Principles, 1975.

Multivariate Analysis, Data Reduction & Compression

- W. R. Dillon and M. Goldstein, Multivariate Analysis: Methods and Applications, 1984.
- D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley, 1992.
- C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.