Autoregressive Gaussian processes for structural damage detection


R. Fuentes 1,2, E. J. Cross 1, A. Halfpenny 2, R. J. Barthorpe 1, K. Worden 1

1 University of Sheffield, Dynamics Research Group, Department of Mechanical Engineering, Mappin Street, Sheffield, S1 3JD, UK
ramon.fuentes@sheffield.ac.uk
2 HBM-nCode United Kingdom, Advanced Manufacturing Park, Catcliffe, Rotherham, South Yorkshire, S60 5WG

Abstract

This paper presents an application of damage detection from nonlinear time series through the use of a Gaussian Process autoregressive model. Gaussian Processes are powerful nonparametric models that can be used for advanced nonlinear regression. Autoregressive models, on the other hand, have been examined extensively for fault detection on mechanical systems. However, when a mechanical system behaves nonlinearly in its baseline condition, a linear autoregressive model might not be able to distinguish between a baseline system response and a response from the system with damage. This paper is intended as a brief introduction to both data-driven damage detection and Gaussian Processes, and concludes with an example damage detection application to a simulated signal as well as a short review of Gaussian Processes for time series models.

1 Introduction

This paper explores the use of Gaussian Process Regression within a nonlinear autoregressive framework for structural damage detection in mechanical systems. Gaussian Processes are a powerful machine learning tool that can be used for both regression and classification tasks. Structural damage detection for mechanical systems, on the other hand, has been the topic of a growing number of academic publications in the last few years. Gaussian Processes have been studied recently for SHM applications in [1] for monitoring of landing gear loads, and in [2] for monitoring of aircraft loads.
The next section provides a summarised literature review on the subject and an overview of the general concept and challenges involved. The key idea in performing structural damage detection on a mechanical system is to use available data from a structure in operation in order to diagnose the presence of damage. It has been identified [3] that a statistical pattern recognition approach is often desirable in order to identify the presence of damage. When using a machine learning approach to the problem, the general damage identification procedure is divided into two phases. The first is the training phase, which involves using a data set from the healthy structure in order to tune the model. If example data is available from damaged cases, this can also be used in the training phase. The second phase is the prediction phase, which involves using the trained model to infer the health state of the structure. Gaussian Process Regression (GPR) is a relatively new tool for performing nonlinear regression in a Bayesian way. What this means exactly will be explained in more detail in a later section, but what it

PROCEEDINGS OF ISMA2014 INCLUDING USD2014
means in practice is that, given a set of inputs, the model predicts a probability density for the outputs. Hence, all predictions come with a measure of uncertainty, and it has been demonstrated that this measure of uncertainty in other Bayesian models can be exploited for structural damage identification [4]. The Gaussian process is also a non-parametric model, which in practice means that it does not rely on parameters in order to make predictions, but instead uses conditional probability relationships to estimate the density of the outputs given the training inputs and outputs. The details of this will be covered in Section 3. The layout of this paper is as follows: Section 2 provides a brief background on Structural Health Monitoring (SHM) and discusses some of the challenges involved as well as some common and simple models. The reader familiar with SHM may safely skip this. Section 3 provides a conceptually-oriented introduction to Gaussian Process Regression, while Section 4 shows the application of a GPR autoregressive model to damage detection from a nonlinear signal.

2 Structural damage detection background

Data-driven techniques for SHM involve inferring the presence of damage from data collected via transducers on the structure. These transducers could measure strain, displacements or accelerations, as well as any other environmental variables that may affect the characteristics of structures. There is an increasing trend for using acceleration measurements in vibration-based SHM, due to the practicality of using an accelerometer as a transducer. The recent developments in MEMS (microelectromechanical systems) technology not only allow acceleration measurements to be taken in a more cost-effective manner, but they also allow the development of embedded systems which can run algorithms online so that SHM can be performed in real time for a structure in operation.
The basic premise for damage detection is that damage will change the dynamic characteristics of a structure, and detecting those changes will lead to detecting damage. In the SHM literature, Condition Monitoring (CM) refers to damage diagnosis applied to rotating machinery. This is now a mature subject and it is safe to say that it has made the transition from university research into industrial applications [5-7]. One of the defining aspects of CM is that rotating machinery operates mostly in a stationary environment, where the loading amplitudes and frequencies remain constant. The class of problems that CM deals with includes the diagnosis of faulty bearings from shifts in the harmonic frequencies of the response spectrum, or the diagnosis of worn or missing gear teeth from changes in noise signature. The more recent developments in SHM in various industries, including civil, aerospace, automotive, power generation and offshore, are concerned with damage detection, diagnosis and prognosis in structures where the operating conditions are not necessarily stationary, the input excitations are not necessarily known, the structures might not behave in a linear fashion and their dynamics might also change according to the environment they operate in. These conditions call for algorithms that are robust to these changes. This introduction will provide a brief review of these methods. Good accounts and literature reviews can be found in [3,8,9]. A well accepted hierarchical structure for damage identification is the Rytter scale [10], which splits the problem into four levels:
1) Detection: Is damage present?
2) Localisation: What is the physical location of the damage?
3) Severity: What is the extent of the damage?
4) Prognosis: What is the remaining useful life of the structure?

DAMAGE DETECTION AND STRUCTURAL HEALTH MONITORING
It will typically be possible to use machine learning and statistical tools to solve Levels 1 to 3 from analysis of data collected from the structure. Prognosis will require some physics-based model of the system and a model for the evolution of damage after it has been diagnosed, which will typically require data from previous similar cases. The damage detection problem (Level 1 diagnosis) can be solved by statistical and machine learning tools, using data from the undamaged structure alone. This is an unsupervised learning problem, as the mathematical model is built using training data only from the undamaged structure. Supervised learning algorithms, on the other hand, use samples from both the damaged and undamaged cases in order to fit a model to the data, and this will typically require labels for each class of damage. If this kind of information is available, either from data from previous cases of damage, Finite Element modelling of the various damage scenarios or experimental testing, then, in principle, supervised learning algorithms can be used to solve the localisation and severity problems. The issue is that it is not always possible, cost effective or straightforward to measure all the possible damage scenarios. This would mean damaging the structure which, for most applications, is prohibitive. In some cases, data collected from actual failures could be available, but this is rare and might not necessarily capture the required parameters. The approach of using physics-based models to generate the damaged cases is also restrictive. Their accuracy can be affected by the lack of known input excitations which, for many real-world structures, are hard to measure or predict. Structural nonlinearities and the general difficulties in creating physics-based models to characterise the structural dynamics of complex structural assemblies are also a limiting factor.
A thorough and recent account that compares data-driven against physics-based models can be found in [11]. These limitations make it more practical, in the majority of cases, to use unsupervised learning algorithms and thus solve only a Level 1 problem. Data pre-processing and feature extraction steps are required before using any statistical or pattern recognition algorithms. Significant pre-processing is required to make sure the data is clean of anomalies inherent in any data acquisition system. Feature extraction refers to transforming the raw data collected by the acquisition system, which will normally be discrete time domain signals, into a more useful format, or features. These features are mathematical transformations of the raw data, typically into a lower dimensional space. It is obviously desirable to construct features that are sensitive to damage. However, sensitivity to damage in a feature will also mean that the feature will be sensitive to changes in the environment. There is an inherent trade-off between sensitivity to damage and sensitivity to environmental changes in any particular feature [12], where these changes refer to varying input loads, changes in temperature, humidity, or mass in the structure. When dealing with variable loading conditions, it is possible to select features that are normalised against these loads. If loads can be measured, a frequency response function, for example, could be a good feature vector. However, the input loads will not always be available (in particular, for structures in operation). This can sometimes be solved by assuming a statistical distribution for the input loading. If this can be approximated as Gaussian, it is possible to use the standard result from linear systems theory that the output of a linear system that is excited by a Gaussian excitation will also be Gaussian distributed. This has allowed for various methods that can deal with this uncertainty in the input excitation.
The effects of environmental variability also have a significant impact on the generalisation capabilities of any novelty detection algorithm. There are numerous techniques available for normalising damage sensitive features against environmental variability. These include Singular Value Decomposition (SVD) [13], factor analysis [14] and auto-associative neural networks [15]. Principal Component Analysis (PCA) has also been suggested as a means of selecting damage sensitive features that are insensitive to environmental variations [16]. More recently, cointegration has been suggested [17,18] for SHM applications and successfully applied to the removal of environmental trends from data measured on a bridge [19]. It is a method of projecting out components that correspond to long term trends in the data, which are characteristic of environmental variables such as temperature, humidity and ice build-up.

2.1 Time series models: autoregressive processes

There are numerous methods for modelling a time series, but arguably the simplest such model is an Auto Regressive (AR) process. This defines any point in the signal as a weighted sum of the previous measured points:

$$y_t = \sum_{i=1}^{p} a_i y_{t-i} \tag{1}$$

where $y_t$ is the $t$-th point of $y$, the signal being modelled, and $a_i$ is the $i$-th AR coefficient. If the system has measured inputs, one can add a moving average term to model the signal as a weighted sum of the previous terms plus the influence of the previous inputs:

$$y_t = \sum_{i=1}^{p} a_i y_{t-i} + \sum_{j=1}^{q} b_j x_{t-j} \tag{2}$$

where $x_t$ is the measured input into the system and the term $b_j$ is the $j$-th Moving Average (MA) coefficient. An AR model of order $p$ with an MA model of order $q$ is usually denoted as ARMA($p$, $q$). It is interesting to note that the discrete time representation of the SDOF linear oscillator is in fact an ARMA model:

$$m\ddot{y} + c\dot{y} + ky = x(t) \tag{3}$$

It is possible to show that the second order differential equation for the harmonic oscillator above can be described in discrete time form as

$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + b_1 x_{t-1} \tag{4}$$

This is an ARMA(2,1) model and there is a direct correspondence between the coefficients and the mass and stiffness of the system. The purpose of showing this here is to highlight that a change in the physical parameters of the system would imply a change in the AR coefficients. There are two damage sensitive features that can be extracted using such a model: the coefficients and the signal reconstruction error. The AR coefficients are damage sensitive features, and can be used in further pattern recognition algorithms to detect and classify damage. Although the AR parameters are useful damage sensitive features, there still remains the question of determining the order of such a model. Fitting a model with too high an order will result in a poor ability for the model to generalise; the model might start fitting the measurement noise. If the model order is too low, it will not completely capture the dynamics contained in the signal.
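The two AR-based features described above (the coefficients and the reconstruction error) can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: it assumes an ordinary least squares fit, and the helper names `lag_matrix`, `fit_ar`, `ar_residuals` and `rms`, as well as the sinusoidal test signals, are hypothetical choices made for the example.

```python
import numpy as np


def lag_matrix(y, p):
    """Rows are [y[t-1], ..., y[t-p]] for t = p, ..., len(y) - 1."""
    y = np.asarray(y, dtype=float)
    return np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])


def fit_ar(y, p):
    """Estimate the AR(p) coefficients by ordinary least squares."""
    targets = np.asarray(y, dtype=float)[p:]
    coeffs, *_ = np.linalg.lstsq(lag_matrix(y, p), targets, rcond=None)
    return coeffs


def ar_residuals(y, coeffs):
    """One-step-ahead prediction errors of a fitted AR model on signal y."""
    p = len(coeffs)
    return np.asarray(y, dtype=float)[p:] - lag_matrix(y, p) @ coeffs


def rms(x):
    """Root mean square value of an array."""
    return float(np.sqrt(np.mean(np.square(x))))


# Hypothetical baseline: a noise-free sinusoid, which obeys the exact AR(2)
# recursion y[t] = 2 cos(omega) y[t-1] - y[t-2].
t = np.arange(400)
omega = 0.3
baseline = np.sin(omega * t)
coeffs = fit_ar(baseline, p=2)

# Hypothetical changed condition: a small shift in frequency.
changed = np.sin(0.35 * t)
rms_baseline = rms(ar_residuals(baseline, coeffs))
rms_changed = rms(ar_residuals(changed, coeffs))
print(coeffs, rms_baseline, rms_changed)
```

For this linear toy signal the fitted coefficients recover the recursion exactly and the residual grows once the frequency shifts, which is the behaviour a residual-based feature relies on.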
The right choice of model order and its influence on damage detection has been studied in [20]. The study used three dominant techniques: the Akaike information criterion [21], the partial autocorrelation function and the root-mean-square error. It concluded that they do not all converge to the same solution for model order, but that using the three of them together generally provides a robust guideline that can be used to establish model orders in SHM applications. The AR parameters are useful damage sensitive features on their own, but it is possible, for example, to use a Mahalanobis squared distance-based outlier analysis to determine when a combination of AR

parameters deviates significantly from the normal condition. A different approach can also be taken using AR parameters. This would involve reconstructing every point in the signal using the AR parameters together with the previous signal values, and then taking the residual in this prediction. This error will increase if there are any changes such as damage (or changes in operational and environmental conditions), which also makes it a useful damage sensitive feature. This method will be compared against the autoregressive Gaussian Process model in the last section.

3 Gaussian process regression background

The aim of this section is to introduce the ideas behind GP regression at a conceptual level. It is aimed at readers not familiar with concepts such as Bayesian and nonparametric models. It is outside the scope of this paper to fully discuss the details behind GP regression; for a full discussion of the subject the reader is referred to [22]. Readers familiar with Bayesian methods, nonparametric models and Gaussian Processes may skip to the next section.

3.1 Parametric methods

The main idea behind any regression model is to try to explain a set of output values, given a set of input values. In a parametric model a parametrised function is used to map inputs to outputs. In linear regression, for example, the function is $f(x) = mx + c$, where there are two parameters, the slope $m$ and the intercept $c$. There needs to be a systematic way of arriving at the parameters that best fit the data. This task is called parameter estimation. The next issue is whether the model is even the right choice for the data, which means that a model selection stage is also required. It is outside the scope of this paper to discuss parameter estimation and model selection in detail, but the reader is referred to [23] for further reading. It is necessary to have a means of quantifying the uncertainty of predictions made by a model.
The reason for this is that there will always be errors in the modelling process. To begin with, the training data may be corrupted by noise, for example from the transducers, or there could be bias due to the model not capturing the physical process completely. Even if there is good confidence that the chosen model can approximate the physical process, there will be uncertainty about the parameters for that model. If the parameters for a model were chosen using training data within a certain region, there will be high confidence that the parameters fit the data within that region. Any predictions made outside that region will have a higher uncertainty and there needs to be a systematic way to address this. Bayesian methods are particularly suited to these types of problems.

3.2 Bayesian linear regression

The term Bayesian comes from the use of Bayes' rule of probability

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{5}$$

where $P(A \mid B)$ denotes the conditional probability of $A$ given $B$. What Bayes' rule is doing is essentially reducing the uncertainty of $A$ given information about $B$. In order to do so it uses the original uncertainty about $A$, $P(A)$, which is called a prior since it encodes prior information about it. $P(B \mid A)$ is called the

likelihood, and is the inverse conditional probability. $P(B)$ is called the marginal likelihood. Using these terms, equation (5) can be rewritten:

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}} \tag{6}$$

where the posterior is our updated uncertainty about $A$. This is a powerful idea that has led to the development of a host of models that take into account prior information or degrees of belief about something, and then update it when new evidence becomes available. To see how this is useful for regression, consider the simple linear regression model:

$$f(\mathbf{x}) = \mathbf{x}^T \mathbf{w}, \qquad y = f(\mathbf{x}) + \varepsilon \tag{7}$$

where $f$ is the function of interest, $\mathbf{x}$ are the inputs to the model, and $y$ are the output observations. The observations differ from the actual function values through noise $\varepsilon$, which could be considered as white Gaussian with zero mean and a variance of $\sigma_n^2$. Lastly, $\mathbf{w}$ are the weights of the model, which are the parameters that need to be estimated. This model could be solved using a least squares method, which would solve for the weights and the noise. One could now use Bayes' theorem to derive a posterior distribution over the weights:

$$p(\mathbf{w} \mid \mathbf{y}, X) = \frac{p(\mathbf{y} \mid X, \mathbf{w})\,p(\mathbf{w})}{p(\mathbf{y} \mid X)} \tag{8}$$

which readily gives a distribution over the parameters of the model. The prior in this case is $p(\mathbf{w})$, which is any distribution that reflects one's initial knowledge about the model parameters before one attempts to identify them. The likelihood term is $p(\mathbf{y} \mid X, \mathbf{w})$, a probability density that reflects the probability of a given measured data point given a set of parameter values. This term is important as it is often chosen as an objective function in optimisation routines [23]. Lastly, the marginal likelihood term $p(\mathbf{y} \mid X)$, also called the model evidence, provides a normalising constant and represents the probability of the data given a specific model, and so it is often useful for comparing and selecting different competing models. The discussion on Bayesian linear regression can only go to a limited depth here, but the reader is encouraged to consult [23] for a more thorough explanation of the subject.
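The posterior over the weights has a closed form when both the prior and the noise are Gaussian. The sketch below illustrates that standard result under assumptions that are not taken from the paper: a zero-mean isotropic Gaussian prior with variance `prior_var`, a known noise variance `noise_var`, and a synthetic straight-line data set. The function names `blr_posterior` and `blr_predict` are hypothetical.

```python
import numpy as np


def blr_posterior(X, y, noise_var, prior_var):
    """Gaussian posterior over the weights of y = X w + noise.

    Assumes the prior w ~ N(0, prior_var * I) and a known noise variance.
    """
    n_features = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(n_features) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov


def blr_predict(X_star, mean, cov, noise_var):
    """Predictive mean and variance of noisy observations at the rows of X_star."""
    pred_mean = X_star @ mean
    pred_var = np.sum(X_star @ cov * X_star, axis=1) + noise_var
    return pred_mean, pred_var


# Synthetic data from a hypothetical line with slope 2 and intercept 0.5.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
X = np.column_stack([x, np.ones_like(x)])  # columns: input, constant term
y = 2.0 * x + 0.5 + 0.1 * rng.standard_normal(x.size)

w_mean, w_cov = blr_posterior(X, y, noise_var=0.01, prior_var=10.0)

# Predict inside the training region (x = 0) and far outside it (x = 5).
X_star = np.array([[0.0, 1.0], [5.0, 1.0]])
pred_mean, pred_var = blr_predict(X_star, w_mean, w_cov, noise_var=0.01)
print(w_mean, pred_var)
```

The predictive variance is larger at the point far from the training inputs, which is the behaviour described at the end of Section 3.1.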
3.3 Nonparametric methods - Gaussian processes

There is another class of machine learning algorithms called nonparametric models. The name comes from the fact that no parameters are used to make predictions. Instead, the training data itself is used to define the relationships between inputs and outputs. One major advantage of this setup is that the type of model does not need to be specified exactly, but is determined from the data. This makes this class of models naturally more flexible and able to model more complex data. It also (partially) removes the model selection step, while providing an uncertainty estimate when making predictions. Gaussian Process Regression is a type of nonparametric model, which has been shown to be equivalent, under certain conditions, to other models such as Bayesian linear regression, Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and spline models. This section covers the basics of what a GP regression model consists of, how one can perform training and predictions using this model, some of the computational issues associated with it, and some recent solutions to these issues.

A Gaussian Process is a generalisation of the Gaussian distribution. Whereas the Gaussian distribution defines a distribution over a finite set of random variables, a Gaussian Process defines a distribution over functions. A key property of this definition, which makes Gaussian Processes usable in practice, is that any finite subset of points from a GP will also be Gaussian distributed. This means that it is possible to use the properties and identities for Gaussian distributions to make predictions for a GP. It is worth starting the discussion with the kernel, which is at the heart of how many nonparametric models retain the relationship between inputs and outputs. A kernel (also called the covariance function) encodes the relationship between the output values as a function of the inputs. It also encodes prior information about the process that is being modelled, since the choice of covariance function will essentially determine how smooth this process is. A popular choice of covariance function [22] is the squared exponential:

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) \tag{9}$$

which is a function of any two points $x$ and $x'$ of the input space (although note how it defines a covariance between points in the outputs). The squared-exponential kernel is shown in Figure 1. There are two hyperparameters for this covariance function. The length scale $l$ controls roughly how far in the input space one has to travel before there is a significant change in direction. The term $\sigma_f^2$ could be thought of as controlling the overall magnitudes of the outputs.

Figure 1: Squared-exponential kernel

A Gaussian Process is defined by its covariance function as well as its mean function. Formally, this is written as

$$f(x) \sim \mathcal{GP}\left(m(x), k(x, x')\right) \tag{10}$$

which states that $f(x)$ comes from a Gaussian Process with a mean of $m(x)$ and a covariance function (or kernel) $k(x, x')$, where $x$ and $x'$ are any two points in the input space. For most applications, the mean is typically set to zero, which means that the GP is fully specified by the covariance function [22]. The evaluation of the underlying function for a finite sample of inputs is therefore done by evaluating the covariance function for these inputs, which will result in a covariance matrix. It is possible to sample from a GP, for example, by creating a random vector from a Gaussian distribution whose covariance matrix is obtained by evaluating the covariance function at the inputs $X$:

$$\mathbf{f} \sim \mathcal{N}\left(\mathbf{0}, K(X, X)\right) \tag{11}$$

Figure 2 a) shows some samples from a GP with a squared-exponential prior (using a lengthscale of 2 and a variance of 1). One could generate many functions, but the usefulness of GPs for regression comes from the fact that one could generate samples from functions that pass close to the observed training data points. This is done in GPs using a conditional probability framework. If one uses the standard conditional probability relationships for Gaussians, it is possible to arrive at the predictive distribution, where the GP is conditioned on the training points so that the mean and the variance of the distribution are:

$$\bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} \mathbf{y} \tag{12}$$

$$\operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*) \tag{13}$$

Note the notation $X_*$, which is taken to mean a test input point, not a training input point. Equation (12) defines the mean of the predictions, while equation (13) defines their variance. Figure 2 b) shows the mean and the variance of a GP predictive distribution, with the observations used for the conditioning. Note that this predictive distribution assumes noisy observations through the addition of the term $\sigma_n^2 I$, which is the noise variance multiplied by the identity matrix. On a basic level the GP regression model is trained by simply including the training points in the covariance matrix.
However, the covariance function still makes use of hyperparameters, so typically an optimisation routine is required to tune these hyperparameters so that the GP fits the data correctly. A gradient-based optimiser is normally used with the model likelihood as an objective function [22].

3.4 Practicalities of Gaussian process regression

It is worth having a brief discussion about some of the practicalities of implementing a GPR model. The first one is the data size problem, which matters in particular from an engineering perspective and even more so in a time series analysis context, where thousands or millions of data points may need to be analysed. The problem is the matrix inversion required to evaluate the mean and variance of the predictive distribution, whose cost grows cubically with the number of training points. The covariance matrix contains an evaluation of the covariance function for all possible pairs of training inputs, and therefore this matrix can potentially be very big. Various methods have been suggested to address this, mostly based on approaches which try to retain informative training points [22]. A good review of recent methods for Gaussian processes on large data sets is [24]. One method that stands out is the Fully Independent Training Conditional (FITC) method [25], which picks a number of inducing points and computes the covariance matrix for the inducing points only. This provides a low rank plus diagonal approximation to the covariance matrix, which is then easier to invert. The issue with this approach is that a bad selection of inducing points will result in poor predictions, and the effect can be severe if one is not careful. One approach that deals with this problem is presented in [26], where a variational approach is used to learn the appropriate inducing points from the data. This method was used in the examples presented in the next section.
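The predictive mean and variance of a zero-mean GP with a squared-exponential kernel can be sketched in a few lines. The code below is an illustration only: it uses a full Cholesky factorisation rather than the inducing-point approximations discussed above, the hyperparameters are fixed by hand instead of being optimised, and the function names `sq_exp_kernel` and `gp_predict` and the sine-wave data are hypothetical.

```python
import numpy as np


def sq_exp_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating point round-off.
    sq_dist = np.maximum(sq_dist, 0.0)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gp_predict(X, y, X_star, noise_var, lengthscale=1.0, signal_var=1.0):
    """Predictive mean and pointwise variance of a zero-mean GP at X_star."""
    K = sq_exp_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    K_s = sq_exp_kernel(X, X_star, lengthscale, signal_var)
    K_ss = sq_exp_kernel(X_star, X_star, lengthscale, signal_var)
    # Solve with a Cholesky factor instead of forming the inverse explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var


# Hypothetical training data: a sine wave sampled at 20 points.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X).ravel()

# One test point between training inputs and one far away from all of them.
X_star = np.array([[2.5], [20.0]])
mean, var = gp_predict(X, y, X_star, noise_var=1e-4)
print(mean, var)
```

Far from the training data the prediction reverts to the zero prior mean and the variance returns to the prior signal variance, while close to the data the variance collapses.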

Figure 2: a) Samples from a Gaussian Process prior; b) mean function for a Gaussian Process conditioned on some observations (axes: $f$ against $X$). The mean of the predictive distribution is shown as the continuous blue line, while the two standard deviations region is shown by the shaded region.

4 Gaussian process nonlinear autoregressive model for fault detection

This section discusses the use of a Gaussian Process nonlinear autoregressive model for structural damage detection. This is demonstrated using an artificial example of a nonlinear time series with simulated damage. The idea of an autoregressive model is to predict a signal value at a discrete time point using the previous signal values. These are lagged versions of the signal. If the function used to predict a signal based on its lagged values is linear, then it is a standard linear Auto Regressive (AR) model as described in equation (1). The idea behind using an AR model for SHM is to use it to characterise the baseline condition of the mechanical system by means of the residual errors between the model predictions and the actual observed values. Any fault in the system will be evidenced as a change in the response, and therefore an increase in the residuals. For systems that exhibit a linear response in the baseline condition, a linear AR model might be sufficient to characterise this. However, if the baseline response is inherently nonlinear, the residuals from the linear model will fail to distinguish between the baseline and the damaged condition. For this reason, a nonlinear function is needed that can predict points in a time series based on some lagged values. In this paper, the use of Gaussian Processes is explored as the nonlinear autoregressive function.
There are in fact several ways this could be done using a Gaussian Process, so a brief review will be presented of some of the developments on modelling time series and dynamical systems using GPs. The general autoregressive model is formally written as:

$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}) + \varepsilon_t \tag{14}$$

where $y_t$ is the signal measured at a discrete time index $t$, $p$ is the number of lags being considered and $\varepsilon_t$ is the process noise, which could come from sensor noise or other uncertainties, and is assumed to be modelled as white Gaussian. In this case, the function $f$ in the demonstration will be a GP, where the lagged signal values are used as inputs and the outputs are all the points in the time series.

4.1 Review of time series and dynamical models using Gaussian process regression

There are two general strategies for modelling the dynamics contained in a measured time series. In the case of the autoregressive framework, the model is implicitly time dependent, since a point in time depends on previous points in time, so the inputs to the model are the lagged signals and the output is the full signal. The alternative is to explicitly establish the dependency between time and the signal, so that the input to the model is time, and the output is the signal. There are advantages and disadvantages to both approaches. In the autoregressive formulation, the performance of the model will be strongly dependent on the sample rate chosen to gather data. In the case of measuring a mechanical system, the engineer needs to have enough knowledge about the dynamics of the system to be able to choose a sampling rate that captures the relevant dynamics of the SHM problem. Also, in the autoregressive formulation it is hard to make contact with the physics of the problem, since it is not a parametric model. In the case of the continuous time-dependent formulation, it is possible to interpolate and extrapolate to areas of the time series where data is missing. This may be useful for modelling problems such as financial forecasting, but not necessarily for SHM. The continuous time formulation will require a periodic covariance function, and this places a penalty on the level of complexity of the dynamics that can be modelled.
In general, the autoregressive formulation will be able to capture more complex dynamics. However, this comes at a computational penalty since the dimension of the input grows with the number of lags used. A very recent detailed review of these issues, as well as a very thorough comparison of different methods for modelling dynamics in time series using Gaussian Processes, is presented in [27]. A good review of continuous time Gaussian Processes can be found in [28]. Gaussian Processes are intimately linked to Support Vector Machines (SVMs) [23], which are another class of nonparametric models. A study has been presented in [29] which makes use of SVMs in an autoregressive framework to show their usefulness in a damage detection application. The example shown later in this paper will be based on that used in [29]. Possibly the biggest issue with implementing GPs within an autoregressive framework is the fact that the predictive distribution from equations (12) and (13) takes into account noisy observations, but the inputs are assumed to be noise free. This is clearly a problem for an autoregressive formulation, since a point that is an output at time $t$ will eventually become an input to the model at time $t+1$. So, all of the observations will also be inputs. This issue is one of uncertainty propagation from one time point to another, since in the ideal case one would propagate not only a point forward in time, but the point as well as its uncertainty: the whole probability distribution. This is non-trivial for linear dynamical systems, but there are well established and successful algorithms that do this efficiently, such as the Kalman filter [30].
In fact the Kalman filter belongs to a class of estimation algorithms called Bayesian filters [31], which address the general issue of estimating a time series from noisy observations using a model of the process, where this model can be parametric, such as a physical model or neural network, or nonparametric, such as a Gaussian Process. Unlike the linear case, where the uncertainty can be propagated in closed form when it can be approximated as Gaussian distributed, Bayesian filters that use nonlinear models rely on approximations to the uncertainty propagation problem. A good example of such an application using Gaussian Processes can be found in [32], where it is applied to tracking a blimp which has complex flight dynamics. Other interesting formulations using Gaussian Processes for dynamical systems are [27,33,34]. An approximation to the uncertainty propagation for an autoregressive Gaussian Process model is presented in [35,36], with applications to multiple-step-ahead forecasting of a nonlinear time series.
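The autoregressive formulation of equation (14) amounts to building a matrix of lagged signal values and regressing the next sample on each row. The sketch below illustrates one-step-ahead prediction with a GP mean function under several simplifying assumptions that are not taken from the paper: the inputs are treated as noise free (no uncertainty propagation), the hyperparameters are fixed by hand, a full covariance matrix is used rather than inducing points, and the toy signal `np.sin(0.2 * t) ** 3` with an added sinusoid is a hypothetical stand-in for the signals studied later. The function names are also hypothetical.

```python
import numpy as np


def embed(y, n_lags):
    """Return (inputs, targets); each input row is [y[t-1], ..., y[t-n_lags]]."""
    y = np.asarray(y, dtype=float)
    inputs = np.column_stack(
        [y[n_lags - i:len(y) - i] for i in range(1, n_lags + 1)]
    )
    return inputs, y[n_lags:]


def sq_exp(A, B, lengthscale, signal_var):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return signal_var * np.exp(-0.5 * d2 / lengthscale ** 2)


def one_step_residuals(train, test, n_lags=5, lengthscale=1.0,
                       signal_var=1.0, noise_var=1e-4):
    """Residuals of one-step-ahead GP mean predictions on the test signal."""
    X, targets = embed(train, n_lags)
    X_new, targets_new = embed(test, n_lags)
    K = sq_exp(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    weights = np.linalg.solve(K, targets)
    mean = sq_exp(X_new, X, lengthscale, signal_var) @ weights
    return targets_new - mean


def rms(x):
    """Root mean square value of an array."""
    return float(np.sqrt(np.mean(np.square(x))))


# Hypothetical nonlinear baseline and a changed signal with an extra component.
t = np.arange(600)
baseline = np.sin(0.2 * t) ** 3
changed = baseline + 0.1 * np.sin(0.9 * t)

# Train on the first 300 baseline points, then score unseen data.
rms_baseline = rms(one_step_residuals(baseline[:300], baseline[300:]))
rms_changed = rms(one_step_residuals(baseline[:300], changed[300:]))
print(rms_baseline, rms_changed)
```

Because every observed output is later reused as an input, a fuller treatment would also propagate the predictive uncertainty forward, as discussed in the review above; this sketch deliberately omits that step.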

DAMAGE DETECTION AND STRUCTURAL HEALTH MONITORING

Damage detection example

An example of a damage detection application using a simple autoregressive Gaussian Process is presented here. The problem is the same as in [29], and consists of a nonlinear time series:

[Equation (15)]

In contrast with [29], no noise is added to the process, and only one damage case is considered here. The powers on the first two sinusoidal terms make the time series nonlinear, and thus difficult for a linear model to characterise. It is useful here because it highlights the difference in predictions between the linear and the GP autoregressive models. The simulated damage is the addition of a signal:

[Equation (16)]

This is to demonstrate the ability of the autoregressive GP model to characterise the baseline nonlinear signal and detect a change in it, in contrast with the linear AR model, which cannot differentiate between the two. The baseline and the damage signals are both generated for the same set of time points. A Gaussian Process model with a squared-exponential covariance function was used as a prior, and trained using the first 600 points. This tests two aspects: the ability of the model to generalise (to make correct predictions for points not present in the training set) and its ability to detect a change in the signal. Only five lags were used as the model input, which is the same number of lags as in [29]. In addition, the approximation described in [26] for large datasets, which learns the optimum inducing points, was used. The number of inducing points used for this example was 50.

The autoregressive GP model provides a much better fit to the data than its linear AR counterpart, as is clear from Figure 3. As a measure of fitness, the Normalised Mean Squared Error (NMSE) can be used, which compares each measured data point with the corresponding predicted data point. An NMSE of less than one is typically considered a very good fit.
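As a hedged illustration, the NMSE can be computed as shown below. The sketch assumes a normalisation that is common in the SHM literature: the mean squared error divided by the variance of the measured data and multiplied by 100, so that a model which always predicts the mean of the data scores 100. The function name `nmse` and this exact form are assumptions, and are not necessarily the definition behind the values reported here.

```python
import numpy as np

def nmse(y, y_pred):
    """Normalised mean squared error (assumed form: 100 * MSE / variance of y)."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean((y - y_pred) ** 2) / np.var(y)

# Illustrative data: a perfect prediction and a mean-only prediction.
y_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
score_perfect = nmse(y_demo, y_demo)
score_mean_only = nmse(y_demo, np.full_like(y_demo, y_demo.mean()))
```

Under this assumed normalisation the score does not depend on the scale of the signal, which makes fits to different signals easier to compare.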
The autoregressive GP model achieved an NMSE of 0.93 for the testing set, which included the training set plus 600 more points of previously unseen data (from the baseline condition). The linear AR model achieved a considerably higher NMSE for the testing set, which indicates a very poor fit. The fits for both models are presented in Figure 3, where it is visually evident that the autoregressive GP fits the signal far better than the linear AR model. The residuals are a good indicator of the models' performance, and are shown in Figure 4 for both models. The residuals of the GP model also indicate a good ability to make predictions on unseen data, since the model was only trained using data from zero to six hundred seconds. It is clear from the residuals of the linear autoregressive model that it cannot distinguish between the baseline signal and the damaged one, while the residuals from the autoregressive GP clearly separate the two cases.
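The workflow described in this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used to produce the results: the baseline and damage signals are hypothetical stand-ins (equations (15) and (16) are not reproduced), the GP hyperparameters are fixed by hand rather than learned, and an exact GP is used in place of the sparse approximation of [26]. Only the number of lags (five) and the number of training points (600) follow the text.

```python
import numpy as np

def lag_matrix(x, n_lags):
    """Inputs are the n_lags previous values; the target is the next value."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    return X, x[n_lags:]

def se_kernel(A, B, variance, lengthscale):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, variance=1.0, lengthscale=1.0, noise=1e-4):
    """Predictive mean and variance of exact GP regression with a constant mean."""
    offset = y_train.mean()
    K = se_kernel(X_train, X_train, variance, lengthscale) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - offset))
    K_s = se_kernel(X_test, X_train, variance, lengthscale)
    v = np.linalg.solve(L, K_s.T)
    return offset + K_s @ alpha, variance - np.sum(v**2, axis=0) + noise

def _design(X):
    """Design matrix with an intercept column for the linear AR model."""
    return np.column_stack([np.ones(len(X)), X])

def ar_predict(X_train, y_train, X_test):
    """One-step-ahead predictions of a linear AR model fitted by least squares."""
    coeffs, *_ = np.linalg.lstsq(_design(X_train), y_train, rcond=None)
    return _design(X_test) @ coeffs

# Hypothetical stand-in signals (NOT equations (15) and (16) of the paper).
t = np.arange(1200.0)
baseline = (np.sin(2 * np.pi * t / 60) ** 3
            + np.sin(2 * np.pi * t / 23) ** 2
            + np.sin(2 * np.pi * t / 11))
damaged = baseline + 0.3 * np.sin(2 * np.pi * t / 7)

n_lags, n_train = 5, 600
X_base, y_base = lag_matrix(baseline, n_lags)
X_dam, y_dam = lag_matrix(damaged, n_lags)
# Train only on input/target pairs that lie within the first 600 baseline points.
X_train, y_train = X_base[:n_train - n_lags], y_base[:n_train - n_lags]

gp_mean_base, gp_var_base = gp_predict(X_train, y_train, X_base)
gp_mean_dam, _ = gp_predict(X_train, y_train, X_dam)

# One-step-ahead residuals for both models on both signals.
res_gp_base = y_base - gp_mean_base
res_gp_dam = y_dam - gp_mean_dam
res_ar_base = y_base - ar_predict(X_train, y_train, X_base)
res_ar_dam = y_dam - ar_predict(X_train, y_train, X_dam)
```

The four residual series play the role of the quantities plotted in Figure 4. In practice the hyperparameters would be learned from the training data, for instance by maximising the marginal likelihood, and the behaviour of this sketch depends on the hand-picked values and on the stand-in signals.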


Figure 3: Comparison between predictions of a) the GP autoregressive model and b) the linear autoregressive model (axes: amplitude against time in seconds). The predictions are shown for a subset of data within the training set. The continuous line shows the actual signal, while the dashed lines show the one-step-ahead model predictions. Note that no confidence intervals are plotted for the GP predictions.

Figure 4: Comparison of residuals between a) the autoregressive GP model and b) the linear AR model, for the undamaged and damaged cases (axes: residual against time in seconds).

4.3 Conclusions and further work

From the results of the example presented above it can be concluded that Gaussian Process regression is a viable method to use within an autoregressive framework in the context of damage detection. It can characterise a signal from a dynamical system when the baseline condition is not linear, and the residuals produced by the model can clearly be used as damage-sensitive features, provided an appropriate threshold is set [37]. The example also demonstrates that recently developed methods for GP regression on large datasets are sufficiently accurate in their approximations for this type of problem.

There are several aspects that need to be examined further. Although the autoregressive GP can characterise a nonlinear baseline condition, and a change to this can be detected through the residuals, the question remains of how to select the correct number of lags for the autoregression. This choice depends largely on the specific application, but a systematic way of selecting the number of lags, and the sensitivity to this choice for SHM applications, needs to be addressed. This class of model should also be robust to changing environmental and operational conditions, provided sufficient example data is present in the training set. Further investigation with real-world datasets is needed to validate these conjectures. Gaussian Processes have only just started to appear in the SHM literature, and there are many application areas. However, in order for them to be a successful methodology in engineering, there needs to be a better understanding of their physical interpretation.

Acknowledgements

The authors would like to acknowledge Dr James Hensman for some interesting discussions as well as guidance with respect to Gaussian Processes and the Python GPy package. Acknowledgements also go to the UK Technology Strategy Board for funding part of this project.

References

[1] E. J. Cross, P. Sartor, and P. Southern, Prediction of Landing Gear Loads from Flight Test Data using Gaussian Process Regression, in International Workshop in Structural Health Monitoring, (2013).
[2] R. Fuentes, E. J. Cross, A. Halfpenny, K. Worden, and R. J. Barthorpe, Aircraft Parametric Structural Load Monitoring Using Gaussian Process Regression, in 7th European Workshop on Structural Health Monitoring, (2014).
[3] K. Worden and C. R. Farrar, Structural Health Monitoring: A Machine Learning Perspective. John Wiley & Sons, (2013).
[4] R. Fuentes, A. Halfpenny, E. J. Cross, K. Worden, and R. J. Barthorpe, An Approach to Fault Detection Using a Unified Linear Gaussian Framework, in International Workshop in Structural Health Monitoring, (2013).
[5] R. B. Randall, State of the Art in Monitoring Rotating Machinery Part 1, Journal of Sound and Vibration, vol. 38, no. 3, (2004).
[6] R. B. Randall, State of the Art in Monitoring Rotating Machinery Part 2, Journal of Sound and Vibration, vol. 38, no. 5, (2004).
[7] R. B. Randall, Vibration Based Condition Monitoring - Industrial, Aerospace and Automotive. John Wiley & Sons Ltd, (2011).
[8] S. Doebling, C. R. Farrar, M. B. Prime, and D. Shevitz, Damage Identification and Health Monitoring of Structural and Mechanical Systems from Changes in their Vibration Characteristics: A Literature Review. Los Alamos National Laboratory Report LA MS, (1996).

[9] H. Sohn, C. R. Farrar, and F. M. Hemez, A Review of Structural Health Monitoring Literature, Los Alamos National Laboratory Report LA MS.
[10] A. Rytter, Vibration Based Inspection of Civil Engineering Structures, Ph. D. Dissertation, Aalborg University, (1993).
[11] R. J. Barthorpe, On Model and Data-based Approaches to Structural Health Monitoring, Ph. D. Dissertation, The University of Sheffield, (2010).
[12] K. Worden, C. R. Farrar, G. Manson, and G. Park, The fundamental axioms of structural health monitoring, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 463, (2007).
[13] C. Ruotolo and C. Surace, Damage Detection Using Singular Value Decomposition, in DAMAS 97: Structural Damage Assessment using Advanced Signal Processing Procedures, (1997).
[14] J. Kullaa, Is Temperature measurement essential in SHM?, in International Workshop in Structural Health Monitoring, (2003).
[15] H. Sohn, K. Worden, and C. R. Farrar, Statistical Damage Classification Under Changing Environmental and Operational Conditions, Journal of Intelligent Material Systems and Structures, vol. 13.
[16] G. Manson, Identifying damage sensitive, environment insensitive features for damage detection, in 3rd International Conference on Identification in Engineering Systems, (2002).
[17] E. J. Cross, K. Worden, and Q. Chen, Cointegration: a novel approach for the removal of environmental trends in structural health monitoring data, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[18] E. J. Cross, On Structural Health Monitoring in Changing Environmental and Operational Conditions, Ph. D. Thesis, The University of Sheffield, (2012).
[19] E. J. Cross, K. Y. Koo, J. M. W. Brownjohn, and K. Worden, Long-term monitoring and data analysis of the Tamar Bridge, Mechanical Systems and Signal Processing, vol. 35, no. 1-2, (Feb. 2013).
[20] E. Figueiredo, J. Figueiras, G. Park, C. R. Farrar, and K. Worden, Influence of the autoregressive model order on damage detection, Computer-Aided Civil and Infrastructure Engineering, vol. 26, (2011).
[21] H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, vol. 19, no. 6, (Dec. 1974).
[22] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine learning. Cambridge, Massachusetts: The MIT Press, (2006).
[23] C. M. Bishop, Pattern Recognition and Machine Learning, vol. 4, (2006).
[24] J. Hensman, N. Fusi, and N. Lawrence, Gaussian Processes for Big Data, Proceedings of UAI 29, (2013).

[25] A. Naish-Guzman and S. Holden, The Generalized FITC Approximation, in Advances in Neural Information Processing Systems 20 (NIPS), (2007).
[26] M. K. Titsias, Variational Learning of Inducing Variables in Sparse Gaussian Processes, in Twelfth International Conference on Artificial Intelligence and Statistics, (2010).
[27] R. D. Turner, Gaussian Processes for State Space Models and Change Point Detection, Ph. D. Thesis, University of Cambridge, (2011).
[28] S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain, Gaussian processes for time-series modelling, Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, vol. 371, no. 1984, (2013).
[29] L. Bornn, C. R. Farrar, G. Park, and K. Farinholt, Structural Health Monitoring With Autoregressive Support Vector Machines, Journal of Vibration and Acoustics.
[30] R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Transactions of the ASME-Journal of Basic Engineering, vol. 82, no. Series D, (1960).
[31] S. Särkkä, Bayesian filtering and smoothing. Cambridge University Press, (2013).
[32] J. Ko, D. J. Klein, D. Fox, and D. Haehnel, GP-UKF: Unscented Kalman Filters with Gaussian Process prediction and observation models, in IEEE International Conference on Intelligent Robots and Systems, (2007).
[33] R. Turner, M. P. Deisenroth, and C. E. Rasmussen, State-Space Inference and Learning with Gaussian Processes, in International Conference on Artificial Intelligence and Statistics, (2010).
[34] A. C. Damianou, M. K. Titsias, and N. D. Lawrence, Variational Gaussian Process Dynamical Systems, in Advances in Neural Information Processing Systems, (2011).
[35] A. Girard, C. E. Rasmussen, J. Quinonero-Candela, and R. Murray-Smith, Gaussian Process Priors With Uncertain Inputs: Application to Multiple-Step Ahead Time Series Forecasting, in Advances in Neural Information Processing Systems 15 (NIPS), (2002).
[36] A. Girard, Approximate Methods for Propagation of Uncertainty with Gaussian Process Models, Ph. D. Thesis, University of Glasgow, (2004).
[37] K. Worden, G. Manson, and N. R. J. Fieller, Damage Detection Using Outlier Analysis, Journal of Sound and Vibration, vol. 229, no. 3, (Jan. 2000).

PROCEEDINGS OF ISMA2014 INCLUDING USD2014
