KNN Particle Filters for Dynamic Hybrid Bayesian Networks

H. D. Chen and K. C. Chang
Dept. of Systems Engineering and Operations Research
George Mason University
MS 4A6, 4400 University Dr., Fairfax, VA 22030
chang@gmu.edu

Abstract - In state estimation of dynamic systems, sequential Monte Carlo methods, also known as particle filters, have been introduced to deal with practical problems involving nonlinear, non-Gaussian situations. They can handle any type of probability distribution, nonlinearity, and non-stationarity, although they usually suffer the major drawbacks of sample degeneracy and inefficiency in high-dimensional cases. In this paper, we show how to exploit the structure of partially dynamic hybrid Bayesian networks (PD-HBN) to reduce sample depletion and increase the efficiency of particle filtering, by combining the well-known KNN majority voting strategy with concepts from evolutionary algorithms. Essentially, the novel method re-samples part of the variables and randomly combines them with the existing samples of the other variables to produce new particles. As new observations become available, the algorithm allows the particles to incorporate the latest information, so that the top K fittest particles according to a proposed objective rule are kept for re-sampling. Simulations show that this new approach has superior estimation/classification performance compared to other related algorithms.

Keywords: Dynamic Bayesian Networks, Hybrid Bayesian Networks, Particle Filters.

1 Introduction

Bayesian networks represent a probability distribution using a graphical model in the form of a directed acyclic graph (DAG). Every node in the graph corresponds to a random variable in the domain and is annotated with a conditional probability distribution (CPD) defining the conditional distribution of the variable given its parents. Purely discrete networks are the most popular in practice. However, discrete networks are inadequate for the many practical problems that involve continuous attributes as well as discrete ones. A hybrid Bayesian network (HBN) contains both discrete and continuous nodes [1] and is general in the sense that it allows arbitrary relationships between any random variables in the network.

Dynamic systems in practical applications usually involve HBN models. Examples include vehicle navigation, target tracking, parameter estimation, and system identification and/or classification. This paper focuses specifically on HBN dynamic systems that can be described by a partially dynamic HBN model (PD-HBN), in which not all nodes in the network are dynamic nodes (see Figure 1).

Figure 1. A Systematic Model of a PD-HBN.

The hidden system state $S_k$ (including both dynamic and static nodes) evolves over time, with initial distribution $p(S_0)$, as an indirectly or partially observed first-order Markov process according to the conditional probability density $p(S_k \mid S_{k-1})$. The observations $\{e_k\}$ are conditionally independent given the state and are distributed according to the probability density $p(e_k \mid S_k)$. In general, the PD-HBN can be described by the following dynamic equations:

$$S_k = \psi_k(S_{k-1}, U_{k-1}) \qquad (1)$$
$$e_k = h_k(S_k, W_k) \qquad (2)$$

where $U_k$ denotes the process noise vector that drives the dynamic system through the state transition function $\psi_k(\cdot)$, and $W_k$ is the measurement noise vector corrupting the observations of the state through the function $h_k(\cdot)$. The state transition density $p(S_k \mid S_{k-1})$ is fully specified by $\psi_k(\cdot)$ and the process noise distribution $p(U_k)$, whereas $h_k(\cdot)$ and the observation noise distribution $p(W_k)$ fully specify the observation likelihood $p(e_k \mid S_k)$.
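To make the generic model in equations (1) and (2) concrete, the following minimal Python sketch shows the simulation interface they define. The particular transition function `psi`, observation function `h`, and noise distributions here are illustrative placeholders, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(s_prev, u):
    """State transition S_k = psi(S_{k-1}, U_{k-1}); placeholder linear dynamics."""
    return 0.9 * s_prev + u

def h(s, w):
    """Observation e_k = h(S_k, W_k); placeholder additive-noise measurement."""
    return s + w

s = rng.normal(0.0, 1.0, size=3)        # draw S_0 from the prior p(S_0)
states, observations = [s.copy()], []
for k in range(1, 20):
    u = rng.normal(0.0, 0.1, size=3)    # process noise U_{k-1} ~ p(U)
    w = rng.normal(0.0, 0.2, size=3)    # measurement noise W_k ~ p(W)
    s = psi(s, u)                       # eq. (1): propagate the hidden state
    states.append(s.copy())
    observations.append(h(s, w))        # eq. (2): emit the observation e_k
```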

One of the most fundamental issues for Bayesian networks is the problem of probabilistic inference. Given a Bayesian network that represents a joint probability distribution over the variables involved, inference is the computation of the probability distribution over a set of random variables given a second set of evidence variables. For example, in Figure 1, the posterior density $p(S_k \mid E_k)$ of the state given all the observations $E_k \triangleq \{e_1, e_2, \ldots, e_k\}$ constitutes the complete solution to the sequential probabilistic inference problem. The optimal method to recursively update the posterior probability density as new observations arrive is the recursive Bayesian estimation algorithm [2]. First, one computes the prior $p(S_k \mid E_{k-1})$ based on the dynamic model:

$$p(S_k \mid E_{k-1}) = \int p(S_{k-1} \mid E_{k-1})\, p(S_k \mid S_{k-1})\, dS_{k-1} \qquad (3)$$

One then incorporates the latest noisy measurement via the observation likelihood to compute the updated posterior:

$$p(S_k \mid E_k) = \frac{p(S_k \mid E_{k-1})\, p(e_k \mid S_k)}{\int p(S_k \mid E_{k-1})\, p(e_k \mid S_k)\, dS_k} \qquad (4)$$

Although this is the optimal recursive solution, it is usually tractable only for linear Gaussian systems, in which case the closed-form recursive solution is the well-known Kalman filter. For most general real-world (nonlinear, non-Gaussian) systems, however, the multi-dimensional integrals are intractable and approximate solutions must be used [1]. One approach is to discretize the continuous variables by partitioning their domain into a finite number of subsets [4]. However, this simple approach is often problematic and may lead to unacceptable performance in both computation and accuracy [5].

Recently, a popular solution strategy for the general filtering problem has been sequential Monte Carlo methods, also known as particle filters (PF) [4], which allow a complete representation of the posterior distribution of the states, so that any statistical estimate, such as the mean or variance, can be easily computed. They can deal with any nonlinearity and/or distribution. However, particle filters rely on importance sampling and, as a result, require the design of proposal distributions that approximate the posterior distribution reasonably well. In general, it is hard to design such an importance distribution, and without special correction strategies, particle depletion is unavoidable in many situations. This is particularly the case for PD-HBNs due to their hybrid nature. To overcome this problem, several techniques have been proposed in the literature, such as pruning and enrichment to throw out bad and boost good particles [9], directed enrichment [4], and mutation (kernel smoothing) [7].

In this paper, we focus on the inference problem for PD-HBNs using a novel re-sampling approach based on a K-nearest-neighbors (KNN) decision strategy. KNN is a non-parametric classification method that is simple but effective in many cases. KNN is also a memory-based model defined by a set of examples for which the outcomes are known (i.e., the examples are labeled). Each example consists of a data case having a set of independent values labeled by a set of dependent outcomes. The independent and dependent variables can be either continuous or categorical. A majority vote among the data records in the neighborhood is then used to decide the classification of the test data, with or without distance-based weighting.
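As a sanity check on the recursion in (3) and (4), the sketch below implements it numerically on a 1-D grid (a crude discretization of the integrals). The Gaussian transition and likelihood used here are illustrative stand-ins, not the paper's PD-HBN model.

```python
import numpy as np

grid = np.linspace(-10.0, 10.0, 401)     # discretized 1-D state space
ds = grid[1] - grid[0]

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Illustrative model: S_k = 0.9 S_{k-1} + U,  e_k = S_k + W.
trans = gauss(grid[:, None], 0.9 * grid[None, :], 0.1)  # p(s_k | s_{k-1})
posterior = gauss(grid, 0.0, 1.0)                        # p(S_0)

for e_k in [0.3, 0.5, 0.9]:                  # incoming evidence e_1, e_2, e_3
    prior = trans @ posterior * ds           # eq. (3): prediction step
    lik = gauss(e_k, grid, 0.2)              # observation likelihood p(e_k | s_k)
    posterior = prior * lik
    posterior /= posterior.sum() * ds        # eq. (4): update and normalize
    print(f"posterior mean = {np.sum(grid * posterior) * ds:.3f}")
```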
2 K-Nearest-Neighbors Decision

KNN is a case-based learning method that keeps all the training data for decision making. To employ the KNN decision strategy, one needs to define a metric for measuring the distance between the test data and the training samples. One of the most popular choices is the Euclidean distance. For instance, consider the task of classifying a new test object (2-dimensional data) among $m$ classes, each with $n$ training samples. There are then $mn$ corresponding Euclidean distances.

To apply KNN one also needs to choose an appropriate value of K, and the success of classification depends heavily on this choice. In fact, K can be regarded as one of the most important parameters of the model, strongly influencing the quality of predictions. In general, K should be set large enough to minimize the probability of misclassification, yet small enough that the K nearest points remain close to the test data. There are many ways of choosing K; one simple idea is to run the algorithm many times with different K values and choose the one with the best performance.
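The suggestion of trying several K values and keeping the best performer amounts to simple validation. A minimal sketch of this procedure, using synthetic 2-D data since the paper specifies none, is:

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose K-nearest-neighbor majority vote is correct."""
    correct = 0
    for x, y in zip(test_x, test_y):
        d2 = np.sum((train_x - x) ** 2, axis=1)   # squared Euclidean distances
        votes = train_y[np.argsort(d2)[:k]]       # labels of the K closest points
        if np.bincount(votes).argmax() == y:
            correct += 1
    return correct / len(test_y)

# Two illustrative 2-D Gaussian classes.
train_x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
train_y = np.repeat([0, 1], 100)
test_x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
test_y = np.repeat([0, 1], 50)

best_k = max(range(1, 31, 2),
             key=lambda k: knn_accuracy(train_x, train_y, test_x, test_y, k))
print("best K:", best_k)
```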

Figure 2. Illustrative diagram of the KNN algorithm (2-D data; the 10-nearest-neighbor outcome is a circle, while the 20-nearest-neighbor outcome is a square).

Figure 2 shows a possible distribution of 2-D data. The task is to estimate the outcome for the test data based on a selected number of its nearest neighbors; in other words, we want to know whether the test data should be classified as one of the depicted marks. Consider first the outcome of KNN based on the 10 nearest neighbors. It is clear that in this example KNN classifies the test data into the circle group, since circles form the majority among the 10 selected neighbors. If the number of nearest neighbors is increased to 20, KNN instead reports a square, the majority among the 20 neighbors shown in the figure. Since KNN predictions rest on the intuitive assumption that objects close in distance are likely similar, it also makes good sense to discriminate among the K nearest neighbors by distance when making predictions.
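That closing remark can be realized as distance-weighted voting. Inverse-distance weights, used below, are one common choice; this is an assumption, as the paper does not commit to a particular weighting scheme.

```python
import numpy as np

def knn_predict(train_x, train_y, x, k, weighted=True):
    """Classify x by its K nearest training points, optionally weighting votes
    by inverse distance so that closer neighbors count more."""
    d2 = np.sum((train_x - x) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]
    weights = 1.0 / (np.sqrt(d2[nearest]) + 1e-12) if weighted else np.ones(k)
    scores = np.zeros(train_y.max() + 1)
    for label, w in zip(train_y[nearest], weights):
        scores[label] += w                 # accumulate (weighted) votes per class
    return scores.argmax()
```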
3 KNN-PF Re-sampling Algorithm

In this paper, we focus on the inference problem for PD-HBNs [7], which contain both discrete and continuous variables. The hybrid model considered is general in the sense that it allows arbitrary relationships between any random variables, as long as they can be expressed in a certain form. In equation (4), the posterior probability distribution constitutes the complete solution to the sequential estimation problem. However, in many applications, such as target tracking and identification, a closed-form solution is not available. In a PD-HBN, one may express the unobserved state as a set $S \triangleq \{F, D, T\}$, where $F$ and $D$ represent the static and dynamic features of the PD-HBN, and $T$ denotes the target node of interest. For simplicity, we express the dynamic model (1)-(2) as a combined function:

$$\mathrm{PDBN}_k[F_{k-1}, D_{k-1}, T] \triangleq h_k\{\psi_k(F_{k-1}, D_{k-1}, T, U_{k-1}), W_k\} \qquad (5)$$

Assume that the target node $T$ has $n$ discrete state values and that the evidence space has $d$ dimensions, i.e., $e_k = [e_k^1, e_k^2, \ldots, e_k^d]$.

In traditional Monte Carlo simulation and particle filtering, a set of weighted particles (importance samples), drawn from a proposal distribution, is used to represent the state distribution and is propagated over the dynamic steps. Typically, after a few iterations, one normalized importance weight (or the sum of a few of them) approaches 1, while the remaining weights approach zero. A large number of samples are thus effectively removed from the sample set because their weights become numerically insignificant. This is particularly the case when static discrete variables are present in the model. Many mitigation strategies have been proposed to avoid this degeneration or depletion of samples. Besides the pruning and enrichment techniques, most methods are based on estimating the posterior distributions, such as minimum variance sampling [12], sampling-importance re-sampling [13], and residual re-sampling [14].

In this paper, we propose a novel method, termed the K-nearest-neighbor particle filter (KNN-PF), to mitigate this degeneration problem. Instead of considering posterior distributions, we select particles based on KNN decision results. At each iteration, new particles are constructed by randomly combining the re-sampled particles of the dynamic nodes $D$ with the existing samples of the static nodes $F$ and $T$. The idea is to randomly permute and cross over between dynamic and static particles in such a way that the particles corresponding to the static variables remain relatively stable, so that particle depletion is minimized. The detailed algorithm is as follows (a code sketch of the complete loop appears at the end of this section).

Step 1. Set $k = 1$ and generate $N$ particles by sampling from the prior of the Bayesian network. Assuming a uniform distribution for the static discrete target node, there will be $N/n$ particles for each target class.

Step 2. Make the KNN decision and select the K best particles based on their squared Euclidean distances to the currently observed evidence.

Step 3. If $k$ is the last level of the PD-HBN, stop; otherwise set $k = k + 1$ and go to Step 4.

Step 4. Extrapolate the dynamic features of each of the K surviving particles to the next time slice.

Step 5. Generate a new set of $N$ particles, in which the dynamic features newly generated in Step 4 are replicated and randomly combined with the static features of the previous time slice.

Step 6. Of the $N$ particles newly generated in Step 5, replace the dynamic features of the K particles corresponding to the ones chosen in Step 2 with their extrapolated values obtained in Step 4. Go to Step 2.

In the above algorithm, Step 5 introduces a concept similar to an evolutionary algorithm, and Step 6 guarantees that the previously selected good particles are extended to the next dynamic step. This novel method re-samples the dynamic variables and keeps the samples of the static variables unchanged. As a new set of noisy or incomplete observations becomes available, the algorithm allows the particle filter to incorporate the latest observations, so that the dynamic variables are updated accordingly. Additionally, in Step 5, the static variables may be partially modified in order to achieve better diversity, especially for a large-dimensional state space. The strategy for re-sampling the static nodes in the PD-HBN is discussed later.

Next, we present the squared Euclidean measure and the KNN strategy used in the KNN-PF re-sampling algorithm. Consider two sets of random variables $e_k$ and $y_k$ drawn from the same distribution $p(e_k \mid S_k)$; obviously $E[y_k - e_k]$ is zero. By the law of large numbers, the sample mean of the difference converges almost surely to this expectation. Moreover, if the variance of $p(e_k \mid S_k)$ is bounded, the difference $(y_k - e_k)$ is asymptotically multivariate zero-mean Gaussian by the central limit theorem. The probability distribution may therefore be expressed as

$$p(y_k - e_k \mid S_k) = \frac{1}{(2\pi)^{d/2}\,|2\Sigma|^{1/2}} \exp\left(-\frac{1}{4}(y_k - e_k)^T \Sigma^{-1} (y_k - e_k)\right) \qquad (6)$$

The new algorithm uses the KNN strategy to select the particles whose states $S$ are as close as possible to the true state $S'$, and then makes the classification decision by majority vote. The majority group is selected from those particles with higher likelihood values in (6), or, equivalently, smaller distances $(y_k - e_k)^T \Sigma^{-1} (y_k - e_k)$, which is the objective function of our KNN decision strategy. With multiple independent samples over time, the joint density function may be written as

$$p(y - e \mid S') = C \exp\left(-\frac{1}{4} \sum_k (y_k - e_k)^T \Sigma^{-1} (y_k - e_k)\right) \qquad (7)$$

where the constant factor $C$ can be derived directly from (6). More specifically, suppose there are a total of $N$ particles, each containing a sample in the $d$-dimensional observation space at the $k$-th dynamic step:

$$x_k[1] = [x_1^1(k), x_1^2(k), \ldots, x_1^d(k)]$$
$$x_k[2] = [x_2^1(k), x_2^2(k), \ldots, x_2^d(k)] \qquad (8)$$
$$\vdots$$
$$x_k[N] = [x_N^1(k), x_N^2(k), \ldots, x_N^d(k)]$$

where the observations are based on $x_k[i] = h_k(S', W_k)$ and $y_k \triangleq \{x_k[1], x_k[2], \ldots, x_k[N]\}$. From equation (7), we may recursively accumulate the squared Euclidean distance for each particle:

$$R_i^2(k) = R_i^2(k-1) + (x_k[i] - e_k)^T \Sigma^{-1} (x_k[i] - e_k) \qquad (9)$$

where $\Sigma$ denotes the covariance of the evidence space. Now we sort the distances (9) in ascending order. Suppose $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ is the index set corresponding to $R_{\lambda_1}^2 \le R_{\lambda_2}^2 \le \cdots \le R_{\lambda_N}^2$. We then pick the top K minimal distances and locate the corresponding dynamic features. Namely,

$$Z_k^{q(1:K,k)} = \arg\min_{i \in D}\, \mathrm{sort}_{1:K,i}\left\{ R_i^2(k) \right\} \qquad (10)$$

where $q(1:K, k) \triangleq \{\lambda_1, \lambda_2, \ldots, \lambda_K\}$ represents the index set of the K nearest neighbors at the $k$-th dynamic step.

In the KNN-PF, we start with a fixed number of particles obtained from the prior distribution. We then choose the top K (say, 20) particles $Z_k^{q(1:K,k)}$ for decision making (voting), and also use them for dynamic propagation. The idea is that, assuming enough samples of the static variables to start with, we only need to modify the samples of the dynamic nodes. For example, with $10^4$ particles, the dynamic part of the top 20 particles will be duplicated and merged with the static particles randomly (1 for 500, and therefore 50 for each target class). This is essentially re-sampling without the static nodes; it has the flavor of the random mutation and crossover of an evolutionary algorithm. Thus, we have

$$D_k^o[i] = Z_k^{q(l,k)}, \qquad l = \mathrm{mod}(i, K) \qquad (11)$$

On the other hand, to ensure diversity, we may also choose to re-sample the static nodes partially or wholly. Suppose that we update the corresponding static features $F$ every $c$-th periodic interval. Dynamically, the static nodes can be expressed as

$$F_k^o[i] = \begin{cases} F_k[i], & \mathrm{mod}(i, c) \neq 0 \\ F_k^*[i], & \mathrm{mod}(i, c) = 0 \end{cases} \qquad (12)$$

where $F^*$ denotes the re-sampled static nodes. Meanwhile, in order to guarantee that the top K particles survive to the next time step, we keep and propagate those K particles forward, namely

$$x_{k+1}[i] = \begin{cases} \mathrm{PDBN}_k[F_k^o[i], D_k[i], T], & i \in q(1:K, k) \\ \mathrm{PDBN}_k[F_k^o[i], Z_k^{q(l,k)}, T], & i \notin q(1:K, k) \end{cases} \qquad (13)$$

This novel method re-samples the dynamic variables and mixes them with the existing samples of the static variables, which has proven to be an effective way of dealing with sample depletion for the static variables. Note that the static variables can also be re-sampled at a different rate to improve performance.
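The six steps and equations (9)-(13) condense into the sketch below. The PD-HBN itself is abstracted behind placeholder functions (`sample_prior`, `extrapolate_dynamic`, `simulate_observation`), since the concrete network is not specified in the text; only the re-sampling logic follows the algorithm, and the optional partial re-sampling of static nodes in eq. (12) is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, d = 10_000, 20, 6                      # particles, neighbors, evidence dims
Sigma_inv = np.eye(d)                        # assumed diagonal covariance, cf. eq. (14)

def sample_prior(n):
    """Placeholder: draw static features F, dynamic features D, and target T."""
    return (rng.normal(size=(n, 4)),         # F
            rng.normal(size=(n, 2)),         # D
            rng.integers(0, 10, n))          # T (10 target classes assumed)

def extrapolate_dynamic(D):
    """Placeholder for Step 4: propagate dynamic features one time slice."""
    return 0.9 * D + rng.normal(0.0, 0.1, D.shape)

def simulate_observation(F, D, T):
    """Placeholder for PDBN_k[.] in eq. (5): map a particle into evidence space."""
    return np.concatenate([F, D])[:d] + 0.01 * T

F, D, T = sample_prior(N)                    # Step 1: sample from the prior
R2 = np.zeros(N)                             # accumulated distances, eq. (9)

for e_k in rng.normal(size=(15, d)):         # stream of observed evidence
    x = np.array([simulate_observation(F[i], D[i], T[i]) for i in range(N)])
    diff = x - e_k
    R2 += np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)  # eq. (9)
    top = np.argsort(R2)[:K]                 # Step 2 / eq. (10): K best particles
    vote = np.bincount(T[top]).argmax()      # KNN majority vote on the target node
    print("current target decision:", vote)

    D_top = extrapolate_dynamic(D[top])      # Step 4: extrapolate survivors
    D = D_top[rng.integers(0, K, N)]         # Step 5 / eq. (11): random recombination
    D[top] = D_top                           # Step 6 / eq. (13): survivors keep theirs
```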
4 Numerical Results

In this section we present results from a set of experiments that test the efficacy and tradeoffs of our model and the classification algorithms. We designed the hybrid dynamic network shown in Figure 3 for testing. In this example there are two discrete variables, SDA and SDB, with eight and ten state values respectively; the remaining variables are continuous. Assuming conditional linear Gaussian (CLG) distributions, we have conditional Gaussians between continuous and discrete nodes, and linear Gaussians among continuous nodes. The target node of interest is SDB, and the observable evidence nodes are COA, COB, COC, COD, COE, and COF. Figure 3 shows the corresponding PD-HBN over two time slices, where DA and DB are the dynamic nodes and CFA is a dynamic feature that changes over time. The static nodes SDA and SDB, as well as the static features CFB, CFC, CFD, and CFE, remain unchanged over time.
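Since Figure 3 itself is not reproduced in this transcript, the snippet below merely encodes the node roles the text describes; any arcs between these nodes would be hypothetical and are therefore left out.

```python
# Node roles of the test network as described in the text (Figure 3 not shown).
nodes = {
    "SDA": {"type": "discrete", "states": 8,  "dynamic": False},
    "SDB": {"type": "discrete", "states": 10, "dynamic": False},  # target node
    "CFA": {"type": "continuous", "dynamic": True},                # dynamic feature
    "CFB": {"type": "continuous", "dynamic": False},
    "CFC": {"type": "continuous", "dynamic": False},
    "CFD": {"type": "continuous", "dynamic": False},
    "CFE": {"type": "continuous", "dynamic": False},
    "DA":  {"type": "continuous", "dynamic": True},                # dynamic node
    "DB":  {"type": "continuous", "dynamic": True},                # dynamic node
}
evidence = ["COA", "COB", "COC", "COD", "COE", "COF"]  # observed leaf nodes
```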

This Bayes net can easily be extended to a multi-slice PD-HBN by introducing additional copies of the dynamic nodes and the proper transitional relationships between them [7].

Figure 3. A Two-step PD-HBN Example.

To evaluate the proposed KNN particle filter based inference algorithms, we conducted extensive simulations to examine the computation and performance tradeoffs. The two dynamic nodes, DA and DB, are modeled as two sinusoidal waveforms, $x = A \sin(2\pi f_a t + \phi_a) + n_a$ and $y = B \sin(2\pi f_b t + \phi_b) + n_b$, where the additive noises are Gaussian with $n_a \sim N(100, 0.2)$ and $n_b \sim N(30, 0.1)$, respectively. The parameters are $A = 20$, $B = 10$, $f_a = 0.6$, and $f_b = 1$, and the sampling interval is assumed to be 0.5. Assuming that the initial phases are uniformly distributed, i.e., $\phi_a \sim U(0, 2\pi)$ and $\phi_b \sim U(0, 2\pi)$, the probability density functions (pdf) of the 6-D evidence given the 10 different target types are shown in Figure 4. In each subplot, every curve is the pdf corresponding to a particular target type. These pdf curves correspond to the Bayesian network in its initial state, i.e., time slice $t = 1$. As can be seen from the figure, some of them can be approximated by a single Gaussian, but most require a Gaussian mixture with multiple terms.

Figure 4. The pdf Curves of the Observation Nodes.
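The dynamic-node model used in these experiments is easy to reproduce. The sketch below generates DA/DB trajectories with the stated parameters; the noise terms $N(100, 0.2)$ and $N(30, 0.1)$ are taken verbatim from the text and read here as (mean, variance), which is an interpretive assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, f_a, f_b, dt = 20.0, 10.0, 0.6, 1.0, 0.5   # parameters from the text
steps = 20

phi_a = rng.uniform(0.0, 2 * np.pi)               # initial phases ~ U(0, 2*pi)
phi_b = rng.uniform(0.0, 2 * np.pi)
t = np.arange(steps) * dt                         # sampling interval of 0.5

# Additive Gaussian noises n_a ~ N(100, 0.2) and n_b ~ N(30, 0.1) as stated.
DA = A * np.sin(2 * np.pi * f_a * t + phi_a) + rng.normal(100.0, np.sqrt(0.2), steps)
DB = B * np.sin(2 * np.pi * f_b * t + phi_b) + rng.normal(30.0, np.sqrt(0.1), steps)
```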

Since the KNN-PF keeps the surviving particles as the dynamic slices move forward, a precise covariance of the evidence variables in equation (9) is not critical. We may only need to estimate the diagonal covariance to construct the matrix

$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_d^2 \end{bmatrix} \qquad (14)$$

Figure 5 shows the average probabilities of correct detection (Pc) over 1,000 randomly generated evidence sets using the KNN-PF algorithm under three different scenarios. K is set to 20, and the number of particles is either $10^4$ or $10^5$. Additionally, the static nodes are partially re-sampled with the parameter $c = 100$, meaning that 1% (of $10^4$) of the static particles are re-sampled. The figure shows that the two cases (with or without re-sampling of static nodes) are very close to each other after the fifth dynamic time step.

Figure 5. Probabilities of Correct Detection.

As a by-product of the KNN-PF inference, the dynamic nodes DA and DB may also be estimated. The estimates are taken from the DA/DB values of the best particle at each dynamic step. Alongside the detection performance in Figure 5, we have plotted the mean square error (MSE) of the DA/DB estimates based on the best particle in the KNN-PF algorithm. As seen in Figure 6, the estimation performance is clearly better with more particles.

Figure 6. DA/DB MSE via the KNN-PF.

Next, we compare the KNN-PF with other related algorithms, namely the traditional PF and the likelihood-weighting (LW) [15] algorithm. The traditional PF relies on importance sampling and, as a result, requires the design of proposal distributions that approximate the posterior distribution reasonably well; without special treatment, sample depletion is unavoidable, particularly for the static variables. One of the main differences between KNN-PF and PF/LW is that KNN-PF simulates the particles all the way down to the leaf/evidence nodes, whereas PF/LW only simulates down to the parent nodes of the observed evidence. Between PF and LW, the difference is that instead of re-sampling at every time slice based on the importance function learned so far, as in PF, LW generates the samples globally based on the prior distribution of the entire N-step Bayes net and uses accumulated evidence likelihoods to weight each sample. Note that we use LW here merely as a benchmark; this type of batch processing is not realistic in practice.

Figure 7 shows the detection performance comparison of the traditional PF, LW, and KNN-PF. In this experiment, the initial DA/DB values are assumed known, and the K value is set to 20 in the KNN-PF. Note that the traditional PF peaks at around 87% correct detection due to sample depletion. LW reaches about 89% using $10^5$ particles, and with 10 times more samples it improves to 97%. With $10^5$ particles, KNN-PF outperforms both PF and LW significantly; after the 15th time slice it even outperforms LW with $10^6$ particles, converging to almost 100% by the end of the simulation.

Figure 7. Performance Comparison of Different Algorithms.

5 Conclusions

We presented the KNN-PF algorithm, an efficient and effective new solution to sequential inference problems for PD-HBNs as well as other nonlinear, non-Gaussian dynamic estimation problems. The algorithm partitions the particles into dynamic and static variables and randomly recombines the re-sampled particles. It allows the particles to incorporate the latest observations, so that the target variables are updated by the surviving top K fittest particles. It was shown that this algorithm outperforms standard particle filters and global likelihood-weighting techniques. Furthermore, the KNN-PF mitigates the effects of sample depletion by keeping the particles of the static nodes unchanged. In summary, this novel method allows us to exploit the architecture of a dynamic system and has the potential to perform efficient and accurate inference on large, complex real-world systems.

References

[1] K. G. Olesen, "Causal Probabilistic Networks with both Discrete and Continuous Variables," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, pp. 275-279, 1993.

[2] B. D. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, 1979.

[3] D. Heckerman, A. Mamdani, and M. P. Wellman, "Real-world applications of Bayesian networks," Communications of the ACM, vol. 38, no. 3, pp. 24-68, March 1995.

[4] A. V. Kozlov and D. Koller, "Nonuniform Dynamic Discretization in Hybrid Networks," Proceedings of the 13th Uncertainty in AI Conference, pp. 314-325, 1997.

[5] K. C. Chang and W. Sun, "Comparing probabilistic inference for mixed Bayesian networks," Proc. SPIE Conference, 2003.

[6] M. K. Pitt and N. Shephard, "Filtering via simulation: Auxiliary particle filters," Journal of the American Statistical Association, 94(446):590-599, 1999.

[7] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A Tutorial on Particle Filters for On-line Nonlinear/Non-Gaussian Bayesian Tracking," IEEE Transactions on Signal Processing, no. 50, pp. 174-188, 2002.

[8] Arnaud Doucet, Nando de Freitas, and Neil Gordon, Sequential Monte Carlo Methods in Practice, Springer, 2001.

[9] J. L. Zhang and J. S. Liu, "A new sequential importance sampling method and its application to the two-dimensional hydrophobic-hydrophilic model," Journal of Chemical Physics, vol. 117, no. 7, pp. 3492-3498, August 2002.

[10] U. Kjaerulff, "A Computational Scheme for Reasoning in Dynamic Probabilistic Networks," Proceedings of the 8th UAI Conference, 1992.

[11] X. Boyen and D. Koller, "Tractable Inference for Complex Stochastic Processes," Proceedings of the 14th UAI Conference, Seattle, 1998.

[12] A. J. Bayes, "A Minimum Variance Sampling Technique for Simulation Models," Journal of the ACM, vol. 19, issue 4, pp. 734-741, October 1972.

[13] N. Gordon, D. Salmond, and A. F. M. Smith, "Novel approach to nonlinear and non-Gaussian Bayesian state estimation," IEE Proceedings-F, vol. 140, pp. 107-113, 1993.

[14] S. Hong, M. Bolić, and P. M. Djurić, "An Efficient Fixed-Point Implementation of Residual Resampling Scheme for High-Speed Particle Filters," IEEE Signal Processing Letters, vol. 11, no. 5, pp. 482-485, May 2004.

[15] R. Fung and K. C. Chang, "Weighing and integrating evidence for stochastic simulation in Bayesian networks," in Uncertainty in Artificial Intelligence 5, pp. 209-219, New York, NY: Elsevier Science Publishing Company, Inc., 1989.

[16] U. N. Lerner and R. Parr, "Inference in Hybrid Networks: Theoretical Limits and Practical Algorithms," Proceedings of the 17th UAI Conference, Seattle, 2001.

[17] M. Takikawa, B. D'Ambrosio, and E. Wright, "Real-Time Inference with Large-Scale Temporal Bayes Nets," Proceedings of the 18th UAI Conference, 2002.