Bayesian and Monte Carlo change-point detection

ROMAN CMEJLA, PAVEL SOVKA, MIROSLAV SRUPL, JAN UHLIR
Department of Circuit Theory
Czech Technical University in Prague
Technická 2, 166 27 Prague 6
CZECH REPUBLIC

Abstract: - The contribution presents an analysis and comparison of the recursive (sliding window) Bayesian autoregressive normalized change-point detector (RBCDN) and the reversible jump Markov chain Monte Carlo method (RJMCMC) when they are used for the localization of signal changes (change-point detection). The choice of priors and the parameter settings for the RJMCMC and the RBCDN are discussed. The performance of both algorithms is evaluated, and their accuracy and some illustrative examples with synthetic and real signals are presented.

Key-Words: - Monte Carlo method, Bayesian change-point detector, multiple changes, signal segmentation

1 Introduction
Change-point detection and its application to signal segmentation have been studied intensively for many years, and various methods have been developed. For example, robust and reliable algorithms using recursive identification [1], [11], as well as likelihood and Bayesian approaches, have been designed for the batch or sequential detection [2], [3], [5] of multiple change-points. The most reliable and noise-resistant methods are based on the Monte Carlo method [4], [6], [12]. This contribution is focused on the analysis and comparison of the modified Bayesian autoregressive change-point detector (BCD) [10] with the RJMCMC method [6]. The classical Bayesian autoregressive change-point detector [2] assumes piecewise constant parameters of an AR signal and one change-point in one signal segment. The condition requiring only one change in one signal segment is too limiting for the analysis of real signals such as speech, music or biological signals; the segmentation of these signals requires the detection of multiple change-points. Therefore the recursive BCD algorithm based on the sequential Bayesian approach has been suggested [8].
An effective implementation of the Bayesian detector is based on growing-window recursions over the change-point position and the new data. A robust implementation of a recursive growing-window algorithm for the detection of one change-point is suggested in [2], [7]. The modification of this algorithm suggested in [10], based on a sliding-window algorithm using the normalization of the BCD by the data-dependent Bayesian evidence, is compared here to the slightly modified RJMCMC described in [6]. The Bayesian detector was chosen for its effective implementation, made possible by the removal of nuisance parameters through marginalization, and because it does not require knowledge of the final signal length.

2 Methods used
Firstly, a brief definition of the autoregressive signal model with one change will be given. Then the RBCDN algorithm will be described, including its parameter setting. Finally, the RJMCMC for multiple change-point detection will be discussed, including the choice of priors.

2.1 Signal model and BCD definition
The signal model for one change-point used throughout this text consists of two parts: the left part generated by an AR model with M_1 parameters b_1 and the right part generated by another AR model with M_2 parameters b_2 [2]

d[n] = \sum_{k=1}^{M_1} b_{1,k}\, d[n-k] + e[n], \quad n \le m,
d[n] = \sum_{k=1}^{M_2} b_{2,k}\, d[n-k] + e[n], \quad n > m, \qquad n = 1, \ldots, N.   (1)

In matrix form: d = G b + e. The matrix G has the Jordan form and depends on the unknown change-point index m = 1, ..., N. Let us summarize the priors for the BCD as follows. The excitation process e[n] of the AR model is assumed to be a stationary white Gaussian process with zero mean and variance \sigma^2. Noninformative priors were chosen for the AR parameters b, the change-point position m and the standard deviation \sigma. More specifically, the uniform prior was assigned to m and b, and the Jeffreys prior to \sigma. Another assumption is the constant value of the evidence p(d | M) used in the denominator of the Bayes rule
p(\Theta | d, M) = \frac{p(d | \Theta, M)\, p(\Theta | M)}{p(d | M)},   (2)

where M stands for the signal model and \Theta for the model parameters (b_1, \sigma, and b_2 in this case). Under the above-mentioned assumptions the marginalization process applied to the Bayes rule leads to the posterior probability (the classical BCD formula) [2]

p(m | d, M) \propto \Delta^{-1/2} \left[ D - g^T \Phi g \right]^{-(N-M)/2}, \qquad M = M_1 + M_2,   (3)

containing only the single desired parameter m. The other nuisance parameters (b, \sigma) of the model M are eliminated by the marginalization process and need not be estimated. The evidence p(d | M) is assumed to be constant, which is why it does not appear in (3). The matrix \Phi = (G^T G)^{-1} is the inverse correlation matrix, D = d^T d is the signal energy, g = G^T d is the cross-correlation vector, and \Delta = \det(G^T G) stands for the determinant of the correlation matrix. The change-point position m is then uniquely determined by the maximum of the posterior p(m | d, M) (MAP). When the signal contains more than one change, model (3) is not valid; using posterior (3) in this case still leads to the localization of only a single change-point. Thus the assumption of only one change in the given signal is very limiting in practice, since real signals contain more than one change even for short lengths. Another disadvantage of posterior (3) lies in the great computational costs and numerical instabilities when the signal is not short (several hundred samples or more). The latter disadvantage can be overcome by the recursive evaluation of posterior (3) as suggested in [2], [7], and by using the logarithm of (3). The former problem is more difficult to solve. One simple solution consists in the repeated use of posterior (3) on segmented signal with the MAP step omitted. The problem arising if this approach is applied lies in the fact that the data (and possibly the model M) are then no longer constant.
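As an illustration, the evaluation of posterior (3) can be sketched directly from the definitions above. The following is a minimal, non-recursive sketch (the paper's recursive and log-domain implementation from [2], [7] is omitted); the function name and the brute-force construction of G are choices of this sketch, not the authors' implementation.

```python
import numpy as np

def bcd_log_posterior(d, m, M1, M2):
    """Log of the unnormalized BCD posterior (3) for a candidate change-point m:
    -0.5*log det(G^T G) - 0.5*(N - M)*log(D - g^T Phi g),
    with G the two-block (Jordan-form) regression matrix of model (1)."""
    N, M = len(d), M1 + M2
    G = np.zeros((N, M))
    for n in range(N):
        order, col0 = (M1, 0) if n <= m else (M2, M1)
        for k in range(1, order + 1):
            if n - k >= 0:
                G[n, col0 + k - 1] = d[n - k]
    GtG = G.T @ G                     # correlation matrix
    g = G.T @ d                       # cross-correlation vector g
    D = d @ d                         # signal energy D
    Phi = np.linalg.inv(GtG)          # inverse correlation matrix
    _, logdet = np.linalg.slogdet(GtG)
    return -0.5 * logdet - 0.5 * (N - M) * np.log(D - g @ Phi @ g)
```

Scanning m over the admissible range and taking the argmax yields the MAP change-point estimate described in the text.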
The next paragraph describes the solution of this problem in more detail.

2.2 RBCDN definition
When the BCD is to be used repeatedly for multiple change-point detection, a normalization is required to make different signal segments comparable. The impossibility of comparing posteriors (3) for two different signal segments follows from the differences between the signal models M corresponding to these two segments, as mentioned in the preceding paragraph. Thus the normalization by the evidence corresponding to the given signal segment is required:

\tilde{p}(m | d, M) = \frac{\Delta^{-1/2}\,[D - g^T \Phi g]^{-(N-M)/2}}{\tilde{\Delta}^{-1/2}\,[\tilde{D} - \tilde{g}^T \tilde{\Phi} \tilde{g}]^{-(N-\tilde{M})/2}}.   (4)

The denominator represents the data-dependent Bayesian evidence (slightly modified from [10]), where \tilde{\Phi}, \tilde{D}, \tilde{g}, and \tilde{\Delta} are defined similarly as in (3) but for the whole signal segment without any division into left and right parts. This evidence is evaluated for any new model \tilde{M} given by a new data segment. The sequential evaluation of formula (4) leads to the two-step RBCDN algorithm. The first step is the initialization of d^T d, G^T d, G^T G, \det(G^T G), and (G^T G)^{-1}, giving \tilde{p}_1(m+1 | d, M), which represents the central value of posterior (4) computed for a given signal segment. The second step consists of sliding-window updates of all the above functions for a new sample, followed by the removal of the old sample and the position update, giving \tilde{p}_l(m+1 | d_{new}, M), l = 2, 3, .... Details of the RBCDN algorithm and notes on its implementation are given in [10]. The described approach overcomes the problems arising from the need for a finite signal segment and for only one change-point in a given signal.

2.3 RJMCMC description
The RJMCMC (Reversible Jump Markov Chain Monte Carlo) method [4] serves here as a reference for comparisons with the proposed RBCDN method. The RJMCMC enables the detection of multiple changes in an analyzed signal, so no segmentation is needed, but the signal length must be restricted to a finite length L (similarly as for the BCD). The RJMCMC concept followed here is slightly modified from [6].
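Before turning to the RJMCMC details, the sliding-window normalization (4) of Section 2.2 can be illustrated with a minimal sketch. For simplicity this sketch recomputes the matrices for every window instead of using the paper's recursive updates, evaluates the posterior only at the window centre, and assumes equal left/right model orders M; all function names and those simplifications are choices of this sketch.

```python
import numpy as np

def _fit_terms(w, G):
    """Shared terms of (3)/(4): log det(G^T G) and log of D - g^T Phi g."""
    GtG = G.T @ G
    g = G.T @ w
    Phi = np.linalg.inv(GtG)
    _, logdet = np.linalg.slogdet(GtG)
    return logdet, np.log(w @ w - g @ Phi @ g)

def _regressor(w, M, split=None):
    """AR regression matrix; with a split index it has the two-block form of (1)."""
    L = len(w)
    G = np.zeros((L, M if split is None else 2 * M))
    for n in range(L):
        off = 0 if (split is None or n <= split) else M
        for k in range(1, M + 1):
            if n - k >= 0:
                G[n, off + k - 1] = w[n - k]
    return G

def rbcdn_scan(d, M, L):
    """For each window of length L, evaluate the log of (4) at the window centre:
    the one-change posterior (3) normalized by the no-change evidence of the
    same window.  The change-point estimate is argmax + L//2."""
    m = L // 2
    out = []
    for start in range(len(d) - L + 1):
        w = d[start:start + L]
        ld2, lr2 = _fit_terms(w, _regressor(w, M, split=m))   # change at centre
        ld1, lr1 = _fit_terms(w, _regressor(w, M))            # single AR model
        num = -0.5 * ld2 - 0.5 * (L - 2 * M) * lr2
        den = -0.5 * ld1 - 0.5 * (L - M) * lr1
        out.append(num - den)
    return np.array(out)
```

Sliding the window over the signal gives a detection function whose peaks mark candidate change-points, in the spirit of the RBCDN outputs shown later in Fig. 2.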
The modifications were made with the aim of matching the priors used for the RBCDN with the priors used for the RJMCMC. Let the vector of change positions be s = [s_1, s_2, ..., s_k], where the parameter k is the number of changes in the data, with a truncated Poisson distribution. The hyper-parameters of this distribution are the intensity \lambda and the maximum number of changes k_{max}. In order to simplify the likelihood evaluation (gamma functions), the possible change-point positions are restricted to even numbers only. To overcome problems with the detection of short segments, the prior p(s | k) is chosen to be zero for segment lengths shorter than or equal to \delta; on the nonzero region the prior p(s | k) is supposed to be uniform. The AR parameters b and the variance \sigma^2 of the excitation process have the flat normal and the flat scalar inverted Wishart distribution, respectively. The hyper-parameters of the normal and inverted Wishart distributions are chosen so as to match the RBCDN priors and the RJMCMC priors as closely as possible. The likelihood for the given data and the desired parameters can be written easily. Multiplying the likelihood by the parameter priors and marginalizing \sigma^2, b_1, b_2 analytically, the unnormalized posterior for the number of changes k and their positions s, which is here of primary interest, is obtained. Using RJMCMC, an ergodic Markov chain is constructed whose equilibrium distribution equals this posterior. The detected change-points are determined by the MAP estimate constructed from the samples obtained by repeatedly running the chain (iterations). The transition kernel of the chain is equipped with the "update of change-point positions" and "birth/death of change-point" move types. Assuming that the previous state of the Markov chain is (k, s), the proposed state is constructed as follows. The update move: one change-point s_i from s is chosen and its new position is proposed with the uniform distribution over the even numbers between s_{i-1} + \delta and s_{i+1} - \delta. The birth move: a new change-point s^* is proposed from the uniform distribution over the even numbers from the set \bigcup_{i=0}^{k} \{ s_i + \delta, ..., s_{i+1} - \delta \}, where s_0 = 0 and s_{k+1} = L. The death move: one of the change-points s_i from s is chosen with the uniform distribution and deleted from s.

2.4 Notes on RBCDN and RJMCMC
The prior information used in the RBCDN derivation, especially the assumption of one change in one segment, leads to a very simple algorithm. Unfortunately, this algorithm is not as robust and noise-resistant as the RJMCMC method, whose priors and underlying ideas are more sophisticated (see the preceding paragraph). The length of the window used for the RBCDN is a very limiting feature of this algorithm, as will be shown later.
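As an illustration of the birth/death/update machinery described in Section 2.3, the following is a heavily simplified sketch: instead of the paper's marginalized AR likelihood, it targets a piecewise-constant-mean Gaussian model with known unit noise variance, uses a prior p(k, s) proportional to (\lambda/N)^k / k!, ignores the even-position restriction, and picks the three move types uniformly. All names, the target model and the acceptance-ratio bookkeeping are assumptions of this sketch, not the authors' sampler.

```python
import numpy as np

def log_post(d, s, lam):
    """Unnormalized log posterior of the simplified target: piecewise-constant
    mean, unit noise variance, prior p(k,s) ~ (lam/N)^k / k!."""
    N, k = len(d), len(s)
    bounds = [0] + list(s) + [N]
    lp = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = d[a:b]
        lp -= 0.5 * ((seg - seg.mean()) ** 2).sum()
    if k:
        lp += k * np.log(lam / N) - np.log(np.arange(1, k + 1)).sum()
    return lp

def admissible(s, N, delta):
    """Birth candidates: positions at least delta from existing points and edges."""
    taken = [0] + list(s) + [N]
    return [p for p in range(delta, N - delta)
            if all(abs(p - t) >= delta for t in taken)]

def rjmcmc(d, lam=0.1, delta=10, iters=3000, seed=0):
    rng = np.random.default_rng(seed)
    N, s = len(d), []
    lp = log_post(d, s, lam)
    samples = []
    for _ in range(iters):
        k = len(s)
        move = ("update", "birth", "death")[rng.integers(3)]
        if move == "update" and k:
            i = int(rng.integers(k))
            lo = (s[i - 1] if i else 0) + delta
            hi = (s[i + 1] if i < k - 1 else N) - delta
            if hi > lo:
                s2 = sorted(s[:i] + [int(rng.integers(lo, hi))] + s[i + 1:])
                lp2 = log_post(d, s2, lam)
                if np.log(rng.random()) < lp2 - lp:      # symmetric proposal
                    s, lp = s2, lp2
        elif move == "birth":
            cand = admissible(s, N, delta)
            if cand:
                s2 = sorted(s + [int(rng.choice(cand))])
                lp2 = log_post(d, s2, lam)
                # proposal ratio: birth chose 1 of |cand|, reverse death 1 of k+1
                corr = np.log(len(cand)) - np.log(k + 1)
                if np.log(rng.random()) < lp2 - lp + corr:
                    s, lp = s2, lp2
        elif move == "death" and k:
            i = int(rng.integers(k))
            s2 = s[:i] + s[i + 1:]
            lp2 = log_post(d, s2, lam)
            corr = np.log(k) - np.log(len(admissible(s2, N, delta)))
            if np.log(rng.random()) < lp2 - lp + corr:
                s, lp = s2, lp2
        samples.append(list(s))
    return samples
```

Because the positions live in a discrete space, the dimension-changing birth/death moves need only the proposal-probability correction shown in the comments (no Jacobian term); the MAP-style summaries of Section 3.2 would then be built from the returned samples after a burn-in period.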
On the other hand, the length of the analyzed signal is not limited for the RBCDN as it is for the RJMCMC method. The RBCDN also offers on-line analysis with a small delay determined by the window length used.

3 Experiments and results
The behaviour of the algorithms described above was tested in experiments with synthetic and real signals using Monte Carlo simulations. AR signals with various parameters (model orders, coefficients, numbers of change-points and their positions) were generated using white noise passed through all-pole filters. The results were evaluated using histograms and accumulated histograms (each created from repeated realizations of AR signals).

3.1 Cepstral distances
The degree of a change is given by the cepstral distance (in dB) defined by

d = 4.34 \sqrt{ (c_0 - c'_0)^2 + 2 \sum_{m=1}^{M} (c_m - c'_m)^2 } \; [dB],   (5)

where the coefficients c_m describe the left signal part and c'_m the right signal part. The distance d includes changes in spectrum shape, given by the coefficients c_m and c'_m, m = 1, 2, ..., and also changes in signal energy, given by the coefficients c_0 and c'_0. When c_0 and c'_0 are omitted, a distance d reflecting only spectral changes is obtained.

3.2 RJMCMC parameter setting
The setup for the RJMCMC method in the performed experiments is as follows. The number of iterations for the Markov chain and the burn-in period are the same in all experiments. The probability of choosing the respective move types is 0.5 for the update move, 0.25 for the birth move, and 0.25 for the death move (for k = 1, ..., k_{max} - 1) in all experiments. For k = 0 and k = k_{max} the probability of the choice of all available move types is uniform. The parameter \delta is fixed for all experiments. The other parameters, the AR model order M and the intensity of changes \lambda, are chosen separately for the experiments with synthetic signals and for the experiments concerning the segmentation of a violin signal (where \lambda = 4). The AR model order for the violin signal is adopted from the RBCDN method, as this method was able to detect all changes with this order.
The AR model order for the synthetic signal is its actual order. The choice of the hyper-parameter \lambda ensures that the density of changes remains approximately the same.

Notes on the RJMCMC outputs used in the figures: the histogram of the posterior obtained from the RJMCMC cannot be used for plots because of its great dimensionality, so some simplification is needed to obtain readable figures. The procedure used for the plots (Fig. 2 at the bottom and Fig. 5 in the middle) is as follows. Firstly, the histogram of the marginal distribution of k is computed; the index of its maximum gives the estimated number of changes. Secondly, the histogram of s for this number of changes is constructed. Finally, as this histogram of s is multi-dimensional, only the sum across the marginal histograms of the components of s is shown.
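The cepstral distance (5) used to grade the changes can be computed directly. A minimal sketch follows; the function name and the convention that index 0 of the input arrays holds the energy terms c_0, c'_0 are choices of this sketch.

```python
import numpy as np

def cepstral_distance(c_left, c_right, include_energy=True):
    """Cepstral distance (5) in dB between the cepstra of the left and right
    signal parts (index 0 holds the energy terms c_0, c'_0):
        d = 4.34 * sqrt((c_0 - c'_0)^2 + 2 * sum_{m>=1} (c_m - c'_m)^2).
    With include_energy=False only the spectral-shape change is measured."""
    diff = np.asarray(c_left, float) - np.asarray(c_right, float)
    s = 2.0 * np.sum(diff[1:] ** 2)
    if include_energy:
        s += diff[0] ** 2
    return 4.34 * np.sqrt(s)
```

The factor 4.34 is 10/ln 10, which converts the natural-log cepstral difference to decibels.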
3.3 Results and illustrative examples
Some illustrative examples of analyses are given below. Fig. 1 illustrates the synthetic signal and its spectrogram together with the filter characteristics. The signal is composed of four power-normalized parts generated using four different coefficient sets. The corresponding decreasing cepstral distances d, determining the degrees of the changes, are given below the waveform. The inspection of the spectrogram shows a very small last change (d = 2 dB). This change is almost inaudible, which corresponds with the fact that cepstral distances below 2 dB are inaudible for acoustical signals. Fig. 2 shows typical shapes of the RBCDN output \tilde{p}_l(m+1 | d_{new}, M) and the RJMCMC output (for the definition see the preceding paragraph).

Fig. 1 Synthetic AR signal with 3 changes. From top to bottom: normalized frequency responses of the all-pole filters and pole diagram, waveform with borders given by the change-points (including cepstral distances in dB), and spectrogram.

Fig. 2 Results of change-point detection. From top to bottom: posterior \tilde{p}_l(m+1 | d_{new}, M) of the RBCDN for two window lengths, and the RJMCMC output.

The inspection of Fig. 2 shows two basic features of the algorithms used. First, the noisy character of the RBCDN output for a short window can be seen. Second, the RJMCMC output is more focused than the RBCDN output, enabling more precise change-point localization. The latter conclusion is verified by the histograms in Fig. 3: the variance of the RBCDN histograms (1st and 2nd histogram) is greater than the variance of the RJMCMC histogram (3rd histogram).

Fig. 3 Histograms of change-point detection for the RBCDN and the RJMCMC, evaluated using repeated AR process realizations.
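The histogram-based accuracy evaluation behind Fig. 3 and the tables can be sketched as follows. To stay self-contained, this sketch uses a toy variance-ratio detector as a stand-in for the RBCDN/RJMCMC (the detector, all parameter values and function names are assumptions of this sketch), and reports the mean deviation and standard deviation of the detected positions over R realizations.

```python
import numpy as np

def detect_change(d, L):
    """Toy stand-in detector (not the paper's RBCDN/RJMCMC): the argmax of the
    absolute log-ratio of right/left window variances locates a variance change."""
    N = len(d)
    scores = np.full(N, -np.inf)
    for m in range(L, N - L):
        scores[m] = abs(np.log(d[m:m + L].var() / d[m - L:m].var()))
    return int(np.argmax(scores))

def mc_accuracy(true_m=100, N=200, R=100, L=30, seed=0):
    """Monte Carlo evaluation in the spirit of Fig. 3 and Tab. 1/2: detect the
    change in R independent realizations, return mean deviation and SD."""
    rng = np.random.default_rng(seed)
    est = np.array([detect_change(np.concatenate([
        rng.standard_normal(true_m),
        3.0 * rng.standard_normal(N - true_m)]), L)
        for _ in range(R)])
    dev = est - true_m
    return dev.mean(), dev.std()
```

A histogram of `est` over the R realizations corresponds to one panel of Fig. 3, and the returned pair corresponds to one row of Tab. 1 or Tab. 2.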
When the distance between changes is shorter than the window length, the RBCDN detects only one change. Similarly, changes very close to each other are not separated and localized by the RJMCMC. The preceding conclusions about the precision of change-point localization can also be seen from Tab. 1 and Tab. 2, where the mean deviations and standard deviations for the RBCDN and the RJMCMC are given. Tab. 1 summarizes the RBCDN mean deviation and standard deviation (in samples) of change-point localization for three changes differing in their levels (1st column) and for various window lengths (2nd column). While abrupt changes can be localized relatively precisely for all window lengths (bottom of the 3rd and 4th columns), the localization of weak changes requires a longer window (see the 3rd and 4th columns for 2 and 3.5 dB). Tab. 2 shows the RJMCMC mean deviation and standard deviation for various numbers of iterations. It can be concluded that the errors of the RJMCMC are lower than the errors of the RBCDN, especially for weak changes. The choice of the number of iterations given in paragraph 3.2 is verified by the fact that the differences between the MAP
estimates based on different numbers of iterations were negligible (less than 5 samples for the violin signal, and less than 6 samples for the synthetic signals). Thus the chosen number of iterations seems to be sufficient for the described experiments. It was also found that the setting of \lambda has little impact on the final result: the differences between the RJMCMC results obtained on the violin signal for two different values of \lambda were negligible (less than 5 samples).

Tab. 1 Mean deviation and standard deviation (SD) of change-point localization for the RBCDN, listed for the cepstral distances d_cep [dB] of the three changes and for various window lengths.

Tab. 2 Mean deviation and standard deviation of change-point localization for the RJMCMC for various numbers of iterations.

Examples of the segmentation of a real violin signal are given in Figs. 4 and 5. While the RBCDN detects all changes between tones, including the weak change (see the 3rd change, representing a halftone), the RJMCMC omits this change. Thus the RBCDN seems to be suitable for the separation of tones, while the RJMCMC also detects changes bordering short transient regions (see the 3rd and 8th changes in Fig. 5).

Fig. 4 Change-point localization of the violin signal using the RBCDN (waveform, RBCDN output, spectrogram with detected change-points).

Fig. 5 Change-point localization of the violin signal using the RJMCMC (waveform, RJMCMC output, spectrogram with detected change-points).

4 Conclusion
The performance of the sliding-window change-point detection algorithm based on the normalization of the probability density function by the Bayesian evidence (RBCDN) was compared with the RJMCMC method. The RBCDN and RJMCMC behaviour was illustrated by experiments with synthetic and real signals.
Further research will be focused on an automatic model order selection using the Bayesian evidence, and the optimization of the method for automatic signal segmentation.
Acknowledgement: The theoretical part of this work has been supported by the research program "Research in the Area of Information Technologies and Communications" MSM 212300014 of the Czech Technical University in Prague, while the experimental part, including the evaluation of results, has been supported by the grant GA 102/02/0124 "Voice Technologies for Support of Information Society".

References:
[1] F. Gustafsson, Adaptive Filtering and Change Detection. J. Wiley, New York, 2000.
[2] J. J. K. Ó Ruanaidh and W. J. Fitzgerald, Numerical Bayesian Methods Applied to Signal Processing. Springer-Verlag, New York, 1996.
[3] A. Procházka, J. Uhlíř, P. J. W. Rayner, N. G. Kingsbury (eds.), Signal Analysis and Prediction. Birkhauser, Boston, 1998.
[4] P. J. Green, "Reversible jump Markov chain Monte Carlo computation and Bayesian model determination", Biometrika, vol. 82, pp. 711-732, 1995.
[5] J.-Y. Tourneret, M. Doisy, and M. Lavielle, "Bayesian off-line detection of multiple change-points corrupted by multiplicative noise: application to SAR image edge detection", Signal Processing, vol. 83, pp. 1871-1887, 2003.
[6] E. Punskaya, C. Andrieu, A. Doucet, and W. J. Fitzgerald, "Bayesian curve fitting using MCMC with applications to signal segmentation", IEEE Trans. on Signal Processing, vol. 50, pp. 747-758, Mar. 2002.
[7] J. J. K. Ó Ruanaidh, W. J. Fitzgerald and K. J. Pope, "Recursive Bayesian location of a discontinuity in time series", in Proc. International Conference on Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994.
[8] S. J. Godsill and P. J. W. Rayner, Digital Audio Restoration. Springer-Verlag, New York, 1998.
[9] S. M. Kay and S. L. Marple, "Spectrum Analysis - A Modern Perspective", Proceedings of the IEEE, vol. 69, pp. 1380-1419, Nov. 1981.
[10] R. Cmejla and P. Sovka, "Audio Signal Segmentation Using Recursive Bayesian Change-point Detectors", WSEAS Transactions on Computers, vol. 3, no. 4, 2004, pp. 87-9.
[11] P. Anderson, "Adaptive forgetting in recursive identification through multiple models", International Journal of Control, vol. 42, pp. 1175-1193, 1985.
[12] W. R. Gilks, S. Richardson, D. J. Spiegelhalter, Markov Chain Monte Carlo in Practice, Chapman and Hall, 1996.