Gear Health Monitoring and Prognosis

Matej Gašperin, Pavle Boškoski, Đani Juričić
Department of Systems and Control, Jožef Stefan Institute, Ljubljana, Slovenia
matej.gasperin@ijs.si

Abstract — Many industrial set-ups nowadays rely on different types of gears to transmit rotational torque. A gear usually operates in a closed environment, immersed in lubricant (oil), so direct monitoring of its condition is not possible. We have constructed a pilot test bed in order to develop and test methods for gear health monitoring and for predicting the time of failure. In this paper we present a novel approach to the estimation of gear failure based on measured vibration signals. Features are first extracted from these signals and their evolution over time is then monitored. The Expectation-Maximization algorithm is used to estimate the parameters of a discrete linear state-space model of the feature time series. The time to reach the safety alarm threshold is determined by making predictions with the estimated linear model.

Keywords — condition monitoring; gear systems; signal processing; Expectation-Maximization

I. INTRODUCTION

An open problem in industry is that, according to estimates, over 95% of installed drives belong to the older generation with no embedded diagnostic functionality. This means that most of them are poorly monitored or not monitored at all. As many such machines will still be operating for some time, an upgrade with a low-cost intelligent condition monitoring module would significantly improve surveillance, reduce maintenance costs, decrease failure rates and increase equipment availability. Indeed, timely localization of the root cause would allow for more efficient proactive maintenance as well as less costly and better planned fault accommodation. The goal of this work is to construct a reliable machine fault prognostic system to forecast damage propagation in rotary machinery.
For this purpose, an experimental test bed has been constructed, consisting of a motor-generator pair connected by a single-stage gearbox (cf. Figure 1). Gear health can be assessed by processing vibration signals by means of the Hilbert transform (HT) [1], [2]. Components of the Hilbert spectrum can be used as fault indicators (or features). A feature is a real number calculated every ten minutes from a five-second-long signal acquisition session. The sequence of features can be viewed as a time series, i.e. a realization of a stochastic process. The objective of this work is to extend feature extraction with the prediction of the feature time series in order to estimate the time of occurrence of the safety alarm. Several techniques are available for time-series prediction [3], among them neural networks and neuro-fuzzy systems. In this paper we assume that the time series can be viewed as the output of a second-order linear, discrete-time, stochastic dynamic system. Because of the progression of faults in gears, the system parameters are expected to change in time. For online estimation of the model parameters an Expectation-Maximization (EM) algorithm is used. The EM algorithm is a well-established method which arose in the mathematical statistics community [4] but has found wide engineering application in different areas. One such application is the estimation of state-space model parameters in the presence of hidden variables (unmeasured states) [5]. The outline of the paper is as follows. Section II presents the experimental setup used for this study, the experimental protocol and the feature extraction. Section III describes the stochastic time-series properties and prediction, along with the EM algorithm for model parameter estimation. Results of prediction using the data from the test bed are presented in Section IV.
II. THE EXPERIMENTAL SET-UP

The experimental test bed consists of a motor-generator pair with a single-stage gearbox (Fig. 1). The motor is a standard DC motor powered through a Simoreg DC drive. A generator is used as a brake; the generated power is fed back into the system, thus producing the braking torque.

Fig. 1. The test bed

The test bed is equipped with 8 accelerometers. The mounting position and sensitivity axis of each accelerometer are shown in Figure 2.
Fig. 2. Vibration sensor placement scheme

A. The Experimental Protocol

The test run was performed at a constant torque of 82.5 Nm and a constant speed of 990 rpm. This speed generates a gear-mesh frequency f_gm = 396 Hz, an input-shaft rotational speed f_i = 16.5 Hz, and an output-shaft rotational speed f_o = 24.75 Hz. The signals were sampled with a sampling frequency f_s = 8 kHz. Each acquisition session lasted 5 seconds and was repeated every 10 minutes, as illustrated in Figure 3.

Fig. 3. The concept of signal acquisition

In order to speed up the experiment, the contact surface between the gears was decreased to 1/3 of the original surface. In this manner the fault evolution horizon is made shorter. The displacement is shown in Fig. 4. The overall experiment ran for 65 hours. At the end both gears were heavily damaged, i.e. spalling can be seen on all teeth of both gears, and on some teeth it progressed even into plastic deformation, as shown in Fig. 4.

Fig. 4. Output gear (a) and input gear (b) at the end of the experiment (notice the heavily pitted teeth)

B. Feature Extraction

Signals from all 8 sensors were acquired simultaneously. The signals acquired at each acquisition session were analyzed using envelope analysis [2]. From each sensor s_i ∈ {s_1, ..., s_8}, at each acquisition session k, a feature vector [y_{s_i,1}(k), y_{s_i,2}(k), ..., y_{s_i,m}(k)] was derived, where m is the total number of extracted features. Each element of the feature set represents the amplitude of a specific spectral component of the envelope spectrum for the particular sensor s_i.

Fig. 5. Feature extraction procedure

III. STOCHASTIC TIME SERIES PREDICTION

A. State-Space Model of the Time Series

The state-space representation is a very general model that can describe a whole set of different models.
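Before turning to the model, the envelope-analysis feature extraction of Section II-B can be sketched as follows: compute the Hilbert envelope of the raw vibration signal, take its spectrum, and read off the amplitude at a characteristic frequency (here the gear-mesh frequency of 396 Hz). This is a minimal illustration with a synthetic test signal, not the code used in the study; the function name is our own.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_feature(signal, fs, target_freq):
    """Amplitude of the envelope spectrum at target_freq (Hz)."""
    env = np.abs(hilbert(signal))               # Hilbert envelope of the vibration
    env = env - env.mean()                      # remove the DC component
    spec = np.abs(np.fft.rfft(env)) / len(env)  # one-sided spectrum, gives a/2 per sinusoid
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return spec[np.argmin(np.abs(freqs - target_freq))]

# Synthetic check: a 2 kHz carrier amplitude-modulated at the gear-mesh
# frequency of 396 Hz, sampled as in the experiment (8 kHz, 5 s sessions).
fs, T = 8000, 5.0
t = np.arange(int(fs * T)) / fs
x = (1.0 + 0.5 * np.sin(2 * np.pi * 396 * t)) * np.sin(2 * np.pi * 2000 * t)
print(envelope_feature(x, fs, 396.0))  # ≈ 0.25 (half the 0.5 modulation amplitude)
```

As the modulation grows with gear damage, this spectral amplitude grows with it, which is what makes it usable as a fault indicator.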
In our case we assume that the condition of the machine is a dynamic process influenced by random tribological inputs which occur due to impacts between moving surfaces. The condition can be viewed as a random process, which can be described by a state-space model:

x_{t+1} = f(x_t, u_t, w_t, Θ), y_t = g(x_t, u_t, e_t, Θ)   (1)

where y_t is the measured output, u_t the measured system input, x_t the unmeasured system state, and w_t and e_t denote the process and measurement noise, respectively. Θ denotes a vector of static model parameters. For practical use, the expression (1) can in many cases be simplified and transformed into linear form. w_t and e_t are
assumed to be mutually independent white noise processes with covariances Q and R, respectively. Additionally, where no distinct measurable inputs are present, u_t is considered to be zero. The resulting model is described as

x_{t+1} = A x_t + w_t, y_t = C x_t + e_t   (2)

We assume that the system starts with an initial state vector x_0 with mean µ_0 and covariance matrix Σ_0. The observed system output data, indexed by time (y_t), represent the time series we wish to analyze or predict.

B. EM Algorithm for Dynamic State-Space System Estimation

Expectation-Maximization is applied as an iterative method to estimate a vector of unknown parameters (Θ), given measurement data (Y = [y_1, y_2, ..., y_n]). In other words, we wish to find the set of parameters Θ such that the data likelihood function p(Y|Θ) is maximized. This estimate of Θ is known as the maximum likelihood (ML) estimate. The log-likelihood function of the parameters is defined as

L(Θ) = ln p(Y, X|Θ)   (3)

where X = [x_1, x_2, ..., x_n]. Since ln(x) is a strictly increasing function, the value of Θ that maximizes p(Y, X|Θ) also maximizes L(Θ). The EM algorithm is an iterative procedure for maximizing L(Θ); after the k-th iteration we obtain the estimate of Θ, denoted by Θ_k. The advantage of the algorithm is that it can also operate when the system states (x_t) are not known. Because the output depends on the unobserved system states, direct maximization is not possible. The EM algorithm therefore alternates between two steps: the E-step computes the expected complete-data log-likelihood over the system states, and the M-step maximizes this expectation with respect to the parameters:

Θ_{k+1} = arg max_Θ E_{X|Y,Θ_k} {ln p(Y, X|Θ)}   (4)

1) E-step: Given an estimate of the parameter values (Θ_k = {A, Q, C, R, x_0, Σ_0}), the Rauch-Tung-Striebel (RTS) smoother provides an optimal estimate of the unobserved state sequence (x_t) of the state-space model (2). In dynamical systems with hidden states the E-step corresponds directly to solving the smoothing problem.
In other words, for any time t we would like to compute

p(x_t | y_{1:T})   (5)

where y_{1:T} = [y_1, y_2, ..., y_T], T > t. The RTS smoother procedure is as follows. First, the Kalman filter is applied to the observed data in a forward manner (t = 1, 2, ..., n), starting with the initial estimate of the state mean and covariance (x_0, Σ_0). The recursive smoother is then applied in a backward manner (t = n, n-1, ..., 1), taking the filtered state estimate at time n as the initial condition. A summary of the algorithm is given in Table I.

TABLE I. RTS SMOOTHER ALGORITHM

Forward filter
Initialization: x_{0|0} = x_0, P_{0|0} = Σ_0
Computation, for t = 0, 1, ..., n-1:
  x_{t+1|t} = A x_{t|t}
  P_{t+1|t} = A P_{t|t} A^T + Q
  K_t = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}
  x_{t+1|t+1} = x_{t+1|t} + K_t (y_{t+1} - C x_{t+1|t})
  P_{t+1|t+1} = P_{t+1|t} - K_t C P_{t+1|t}

Backward smoother
Computation, for t = n-1, n-2, ..., 1:
  J_t = P_{t|t} A^T P_{t+1|t}^{-1}
  x_{t|n} = x_{t|t} + J_t (x_{t+1|n} - x_{t+1|t})
  P_{t|n} = P_{t|t} + J_t (P_{t+1|n} - P_{t+1|t}) J_t^T

2) M-step: As stated above, the vector of unknown system parameters for the state-space model is

Θ = {A, Q, C, R, x_0, Σ_0}   (6)

If the system states could be observed in addition to the system outputs, the joint pdf could be written as p(Y, X|Θ) = p(Y|X, Θ) p(X|Θ). Assuming a Gaussian distribution of the initial state and ignoring constants, the complete-data log-likelihood can be written as

-2 ln L(Θ) = ln|Σ_0| + (x_0 - µ_0)^T Σ_0^{-1} (x_0 - µ_0)
  + n ln|Q| + Σ_{t=1}^{n} (x_t - A x_{t-1})^T Q^{-1} (x_t - A x_{t-1})
  + n ln|R| + Σ_{t=1}^{n} (y_t - C x_t)^T R^{-1} (y_t - C x_t)   (7)

where x_0 ~ N(µ_0, Σ_0). Taking the expected value of this expression with respect to the current parameter estimate Θ_k and the complete observed data Y_n gives

l(Θ|Θ_k) = E{-2 ln L(Θ) | Y_n, Θ_k}   (8)

Using the results of the RTS smoother equations:

E(x_t x_t^T) = x_t^n (x_t^n)^T + P_t^n   (9)
E(x_t x_{t-1}^T) = x_t^n (x_{t-1}^n)^T + P_{t,t-1}^n   (10)
E(x_{t-1} x_{t-1}^T) = x_{t-1}^n (x_{t-1}^n)^T + P_{t-1}^n   (11)
E(x_t) = x_t^n   (12)
and taking the expectation of expression (7) yields

l(Θ|Θ_k) = ln|Σ_0| + tr[Σ_0^{-1} (P_0^n + (x_0^n - µ_0)(x_0^n - µ_0)^T)]
  + n ln|Q| + tr[Q^{-1} {S_{11} - S_{10} A^T - A S_{10}^T + A S_{00} A^T}]
  + n ln|R| + tr[R^{-1} {S_{22} - S_{21} C^T - C S_{21}^T + C S_{11} C^T}]   (13)

where

S_{11} = Σ_{t=1}^{n} x_t^n (x_t^n)^T + P_t^n   (14)
S_{10} = Σ_{t=1}^{n} x_t^n (x_{t-1}^n)^T + P_{t,t-1}^n   (15)
S_{00} = Σ_{t=1}^{n} x_{t-1}^n (x_{t-1}^n)^T + P_{t-1}^n   (16)
S_{22} = Σ_{t=1}^{n} y_t y_t^T   (17)
S_{21} = Σ_{t=1}^{n} y_t (x_t^n)^T   (18)

Our goal is to maximize the likelihood, i.e. to minimize l(Θ|Θ_k), with respect to the parameters A, C, Q, R, x_0, Σ_0. Setting the derivatives of Eq. (13) with respect to all parameters to zero, we obtain the following result:

A = S_{10} S_{00}^{-1}   (19)
Q = n^{-1} (S_{11} - S_{10} S_{00}^{-1} S_{10}^T)   (20)
C = S_{21} S_{11}^{-1}   (21)
R = n^{-1} (S_{22} - S_{21} S_{11}^{-1} S_{21}^T)   (22)
x_0 = x_0^n   (23)
Σ_0 = P_0^n   (24)

The E- and M-steps are iterated until the increase of the likelihood between consecutive iterations falls below a selected threshold.

3) Time series prediction: With known model parameters, predicting the future values of the time series is straightforward. We start at the time of the last measurement with the filtered estimate of the state distribution (x_{n|n}, P_{n|n}), where n is the time index of the last measured sample. At every future time step t > n, the mean and variance of the output y_t can be calculated analytically.

IV. EXPERIMENTAL RESULTS

The algorithm has been used to predict the first passage time in the setup described in Section II. Each feature is represented by a time series of 390 samples. The goal is to use the algorithm for online prognosis, which is done in the following way. At time t, we estimate the parameters of the underlying stochastic model (Eq. 2) based on the data window y_{(t-N_w+1):t}, where N_w is the window length. We then use the model to predict the future behavior, i.e. the time T at which y(t) first crosses the alarm value. The idea behind the approach is that as the condition of the machine changes over time, so will the values of the model parameters.
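The E-step (Table I) and the closed-form M-step updates (19)-(24) can be sketched as follows. This is a simplified illustration, not the authors' implementation: for brevity the lag-one smoothed covariance P^n_{t,t-1} of Eq. (15) is omitted from the statistic S_10, and the matrices in the quick check are invented.

```python
import numpy as np

def e_step(y, A, C, Q, R, x0, P0):
    """Forward Kalman filter + backward RTS smoother (Table I)."""
    n, dx = len(y), A.shape[0]
    xp = np.zeros((n, dx)); Pp = np.zeros((n, dx, dx))  # predicted x_{t|t-1}, P_{t|t-1}
    xf = np.zeros((n, dx)); Pf = np.zeros((n, dx, dx))  # filtered  x_{t|t},   P_{t|t}
    x, P = x0, P0
    for t in range(n):                                  # forward pass
        xp[t], Pp[t] = A @ x, A @ P @ A.T + Q
        K = Pp[t] @ C.T @ np.linalg.inv(C @ Pp[t] @ C.T + R)
        xf[t] = xp[t] + K @ (y[t] - C @ xp[t])
        Pf[t] = Pp[t] - K @ C @ Pp[t]
        x, P = xf[t], Pf[t]
    xs, Ps = xf.copy(), Pf.copy()                       # backward pass
    for t in range(n - 2, -1, -1):
        J = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs, Ps

def m_step(y, xs, Ps, n):
    """Closed-form updates (19)-(22); the P^n_{t,t-1} term of (15) is omitted."""
    S11 = sum(np.outer(xs[t], xs[t]) + Ps[t] for t in range(1, n))
    S10 = sum(np.outer(xs[t], xs[t - 1]) for t in range(1, n))
    S00 = sum(np.outer(xs[t - 1], xs[t - 1]) + Ps[t - 1] for t in range(1, n))
    S22 = sum(np.outer(y[t], y[t]) for t in range(1, n))
    S21 = sum(np.outer(y[t], xs[t]) for t in range(1, n))
    A = S10 @ np.linalg.inv(S00)
    Q = (S11 - A @ S10.T) / (n - 1)
    C = S21 @ np.linalg.inv(S11)
    R = (S22 - C @ S21.T) / (n - 1)
    return A, Q, C, R

# Quick check on data simulated from model (2) with invented matrices.
rng = np.random.default_rng(0)
A0 = np.array([[0.95, 0.10], [0.00, 0.90]]); C0 = np.array([[1.0, 0.0]])
Q0 = 0.01 * np.eye(2); R0 = np.array([[0.1]])
x = np.zeros(2); y = np.zeros((100, 1))
for t in range(100):
    x = A0 @ x + rng.multivariate_normal(np.zeros(2), Q0)
    y[t] = C0 @ x + rng.normal(0.0, np.sqrt(R0[0, 0]))
xs, Ps = e_step(y, A0, C0, Q0, R0, np.zeros(2), np.eye(2))
A1, Q1, C1, R1 = m_step(y, xs, Ps, 100)
print(A1.shape, C1.shape)  # (2, 2) (1, 2)
```

In a full EM loop the two functions are alternated, with the output of m_step fed back into e_step, until the likelihood increase drops below the convergence threshold.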
The model parameters determine the trend in the feature value, while the noise covariance parameters account for the varying noise component (which increases in the times close to failure).

A. Model Structure

The underlying model is assumed to be of the following form:

x_1(t+1) = a_11 x_1(t) + a_12 x_2(t) + w_1(t)
x_2(t+1) = a_22 x_2(t) + w_2(t)
y(t) = c_1 x_1(t) + e(t)   (25)

or, in matrix state-space form,

x_{t+1} = A x_t + w_t, y_t = C x_t + e_t   (26)

B. Online Tracking of Model Parameters

Whenever a new measurement is obtained, the algorithm estimates the model parameters using the last N_w samples. The initial parameters of the model are set to the following values:

A = [1 1; 0 1], Q = [1 0; 0 1], C = [1 0], R = [1], x_0 = [0 0]^T, Σ_0 = [1 0; 0 1]   (27)

Convergence of the EM procedure is declared when the relative increase of the log-likelihood function is less than 10^{-4}. Figure 6 shows the measured time series, the first eigenvalue of the estimated matrix A and the diagonal elements of the estimated noise covariance matrix Q. Up to 40 hours of operation the feature values are relatively small and no trend is present; accordingly, the estimated system is stable (the eigenvalue is less than 1). However, when the feature value starts to increase, the system eigenvalue becomes greater than one.
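The stability argument above is easy to reproduce numerically: the largest eigenvalue magnitude of the estimated A matrix indicates whether the modeled feature is drifting. A small illustration with invented matrices (not the estimates from the experiment):

```python
import numpy as np

# Invented examples of an estimated transition matrix A before and after
# the fault starts to develop (upper triangular, so the eigenvalues are
# simply the diagonal entries).
A_stable   = np.array([[0.90, 0.30],
                       [0.00, 0.80]])   # healthy phase: |eig| < 1, no trend
A_unstable = np.array([[1.00, 1.00],
                       [0.00, 1.05]])   # degrading phase: |eig| > 1, feature grows

def max_eig(A):
    """Largest eigenvalue magnitude of A."""
    return max(abs(np.linalg.eigvals(A)))

print(max_eig(A_stable))    # ≈ 0.9
print(max_eig(A_unstable))  # ≈ 1.05
```

Tracking this quantity over the sliding estimation window gives a simple scalar indicator of the onset of a trend in the feature.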
Fig. 6. Values of the estimated parameters: measured feature time series, first eigenvalue of matrix A, and diagonal elements of Q over the 65-hour experiment

The increase in the system noise parameters corresponds to the stronger noise component present in the signal, which can be clearly seen in the time series (between 40 and 65 hours).

C. Prediction of the First Passage Time

The goal of the prediction is to determine the time at which the feature will exceed a certain value. We approach this in the following way. At a selected time t, the model parameters are estimated using the windowed data y_{(t-N_w+1):t}. Based on the estimated model, a Monte Carlo simulation is performed. For every run of the MC simulation, the time at which the predicted feature value exceeds the threshold is determined. The procedure is repeated at different times, and for each of them the simulation is run a large number of times; this is necessary because the relatively large noise covariances cause a high dispersion of the results. The final result of the Monte Carlo analysis is a distribution of the times at which the feature value exceeded the selected threshold, for an array of different prediction times. From the full time series, the actual time at which this occurred on the machine has been determined. The results for a feature computed from the 8th sensor (cf. Fig. 2) are shown in Figure 7.

Fig. 7. Results of the MC analysis using the 8th sensor feature: (a) measured feature, (b) predicted first passage time

It can be seen that the first estimates of the first passage time are made around 20 hours before the feature actually reaches the alarm value.
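The Monte Carlo first-passage procedure described above can be sketched as follows; the model matrices, threshold and initial state are illustrative values, not the estimates obtained from the experiment.

```python
import numpy as np

def first_passage_mc(A, C, Q, R, x_last, threshold, horizon, runs, seed=0):
    """Empirical distribution of the first time step at which the simulated
    feature crosses the alarm threshold, starting from the last filtered state."""
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(runs):
        x = x_last.copy()
        for t in range(1, horizon + 1):
            x = A @ x + rng.multivariate_normal(np.zeros(len(x)), Q)
            y = float(C @ x) + rng.normal(0.0, np.sqrt(R))
            if y >= threshold:
                times.append(t)
                break
    return np.array(times)

# Illustrative values: a slightly unstable A so the feature trends upward.
A = np.array([[1.0, 1.0], [0.0, 1.02]])
C = np.array([1.0, 0.0])
Q = 1e-4 * np.eye(2)
R = 0.05                        # measurement-noise variance
t_hit = first_passage_mc(A, C, Q, R, np.array([1.0, 0.1]), 10.0, 500, 100)
print(t_hit.min(), t_hit.mean(), t_hit.max())
```

The spread of `t_hit` is exactly the dispersion shown by the boxplots in Fig. 7(b); the noisier the estimated model, the wider this distribution.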
The estimation variance gradually decreases; in this case, 15 hours before the true first passage the predictions settle within ±5 hours of the true value. To evaluate the performance of our algorithm, we compared its results to those obtained using simple linear regression. For this, we used the time series of the third sensor's feature values. The results are shown in Figure 8, where black boxplots represent the results of the EM-based algorithm and grey boxplots those of linear regression. In both cases the simulation was repeated the same number of times.

Fig. 8. Comparison of the MC simulation results on the third sensor data for the EM-based algorithm (black) and linear regression (grey): (a) measured feature with safety alarm value, (b) predicted first passage time

It can be seen that the EM-based algorithm provides a more accurate prediction of the first passage time. These results confirm our hypothesis that the complex dynamics of the feature value, coupled with a strong noise component, require more advanced models in order to predict the future time
series evolution.

V. CONCLUSION

It has been shown that a second-order linear stochastic state-space model is sufficient to model the dynamic behavior of the vibration feature value in the presented setup. The model parameters have been estimated using the Expectation-Maximization algorithm, which iteratively maximizes the model likelihood function with respect to the hidden states and the unknown parameters. The results show that the model obtained in this way can effectively predict the future behavior of the feature and can therefore be used to predict the time of the safety alarm. From the current experiments we conclude that, using this method, an accurate prediction can be made 15 to 20 hours in advance. This offers the machine operators or maintenance staff a reasonable amount of time to replace the gear system without causing unnecessary production downtime. At this stage, our algorithm has only been tested under constant load. In real applications it is common that the load as well as the rotational speed change in time. The next step in the development of the procedure is to modify the algorithm so that it includes these quantities as measured system inputs.

REFERENCES

[1] R. Rubini and U. Meneghetti, "Application of the envelope and wavelet transform analyses for the diagnosis of incipient faults in ball bearings," Mechanical Systems and Signal Processing, vol. 15, pp. 287-302, 2001.
[2] D. Ho and R. B. Randall, "Optimisation of bearing diagnostic techniques using simulated and actual bearing fault signals," Mechanical Systems and Signal Processing, vol. 14, pp. 763-788, 2000.
[3] W. Q. Wang, M. F. Golnaraghi, and F. Ismail, "Prognosis of machine health condition using neuro-fuzzy systems," Mechanical Systems and Signal Processing, vol. 18, pp. 813-831, 2003.
[4] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977.
[5] S. Gibson and B. Ninness, "Robust maximum-likelihood estimation of multivariable dynamic systems," Automatica, vol. 41, pp. 1667-1682, 2005.