BAYESIAN NETWORK REASONING WITH UNCERTAIN EVIDENCES

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Vol. 18, No. 5 (2010) 539-564
World Scientific Publishing Company
DOI: 10.1142/S0218488510006696

YUN PENG
University of Maryland Baltimore County, Computer Science and Electrical Engineering, 1000 Hilltop Circle, Baltimore, MD 21250, USA
ypeng@umbc.edu

SHENYONG ZHANG
University of Science and Technology of China, Hefei, Anhui 230026, China
University of Maryland Baltimore County, Computer Science and Electrical Engineering, 1000 Hilltop Circle, Baltimore, MD 21250, USA

RONG PAN
NexTag Inc., San Mateo, CA 94402, USA

Received 3 October 2009
Revised 2 June 2010

This paper investigates the problem of belief update in Bayesian networks (BN) with uncertain evidence. Two types of uncertain evidences are identified: virtual evidence (reflecting the uncertainty one has about a reported observation) and soft evidence (reflecting the uncertainty of an event one observes). Each of the two types of evidence has its own characteristics and obeys a belief update rule that is different from hard evidence, and different from each other. The particular emphasis is on belief update with multiple uncertain evidences. Efficient algorithms for BN reasoning with consistent and inconsistent uncertain evidences are developed, and their convergence is analyzed. These algorithms can be seen as combining the techniques of traditional BN reasoning, Pearl's virtual evidence method, Jeffrey's rule, and the iterative proportional fitting procedure.

Keywords: Bayesian networks; belief update; probabilistic reasoning; uncertain evidence.

1. Introduction

This paper considers the problem of probabilistic reasoning with uncertain evidences. A regular evidence, called hard evidence in the literature, is an observation of a random variable, say $X$, having a particular value (or being in a particular state), say $a$, represented as an instantiation $X = a$. However, it is not always possible to observe the value a variable has in a particular case, or to have complete trust in a claimed observation, which brings uncertainty to the evidences.

This paper focuses on two types of uncertain evidences. The first type, called soft evidence as suggested by others [9], can be interpreted as evidence of uncertainty, and is represented as a probability distribution of one or more variables. The second type, called virtual evidence, can be interpreted as evidence with uncertainty, and is represented as a likelihood ratio [6]. These two types of evidences reflect different kinds of uncertainty, and each obeys a belief update rule that is different from hard evidence, and different from each other.

Based on an in-depth examination of these two types of uncertain evidences, we have developed efficient algorithms for belief update in Bayesian networks (BN) with such evidences. We focus on BN because of its popularity in intelligent systems and its time and space efficiency in representing and reasoning with probabilistic information [5]. However, many theoretical results we obtained hold for belief update of joint distributions that are not represented by BNs.

Related existing work can be found in Refs. 6, 9, 3, 20 and 2. Pearl was among the first to raise the issue of uncertain evidence and proposed the virtual evidence method [6]. However, as can be seen in Sec. 3, this method is not directly applicable to the situation in which multiple soft evidences are presented. Chan and Darwiche provided a thorough analysis that connects Pearl's virtual evidence method and Jeffrey's rule for both general joint distributions and BNs [3]. They also showed that a soft evidence can be converted into a virtual evidence and, as a result, belief update with a single soft evidence can be carried out by Pearl's virtual evidence method for both BNs and joint distributions. They argued that multiple uncertain evidences should not be allowed for belief update at the same time. Vomlel, on the other hand, argued that multiple uncertain evidences, even if they are inconsistent with each other, should be allowed, and developed an algorithm, named GEMA, for such purpose [2]. However, GEMA was devised for general joint distributions, not for BNs. Valtorta et al. proposed to extend the iterative proportional fitting procedure (IPFP) for BN belief update with multiple consistent soft evidences [9].

Our research extends these works in a number of significant ways. The results presented in this paper can be summarized as follows.

(1) We formally established the equivalence of Jeffrey's rule, I-projection (a central operation of IPFP), and the virtual evidence method, when dealing with a single uncertain evidence. We also established that Pearl's virtual evidence method works for multiple virtual evidences but not for multiple soft evidences.
(2) We, for the first time, proved that I-projection and IPFP, which are known to minimize the I-divergence (or Kullback-Leibler distance), also minimize the total variation between the source and the projected distributions.
(3) We developed BN-IPFP, an efficient algorithm that combines Pearl's virtual evidence method and IPFP for BN belief update with multiple consistent soft evidences, and proved its convergence.
(4) We developed SMOOTH, an algorithm for belief update with inconsistent soft evidences, and proved its convergence for the case of two evidences. SMOOTH can be easily incorporated into BN-IPFP for BN update with inconsistent evidences.

The rest of the paper is organized as follows. Section 2 provides technical preliminaries with brief introductions to Jeffrey's rule, I-projection, and IPFP. Section 3 analyzes the two types of uncertain evidences. Section 4 develops two versions of algorithm BN-IPFP. Section 5 discusses issues related to inconsistent evidences and develops algorithm SMOOTH. Section 6 concludes with a discussion on evidential reasoning in which different types of evidences (hard, virtual, and soft) are given either sequentially or at the same time, followed by the directions of future research.

For presentational clarity, proofs of theorems of our own (Theorems 2, 4, 6, 7) are given in the Appendix. We restate some theorems of others that are of immediate relevance to this work; for their proofs the reader is referred to the original publications.

A number of computer experiments with artificial data were conducted to validate our results and to compare the performance of different methods. All experiments were run on an Intel Core 2 CPU of 2.40 GHz with 2.0G maximum memory for the JVM (Java Virtual Machine). The Netica Java API (see footnote a) and its junction tree based inference engine were used for standard BN inference.

2. Preliminaries

Throughout this paper, we use upper-case $X = (X_1, X_2, \dots, X_n)$ for the set of all random variables of interest and $X_i$ for individual random variables; lower-case $x_i$ and $x$ denote particular and arbitrary instantiation(s) of the respective variable(s); and bold upper-case $\mathbf{X}_i$, $\mathbf{X}$ denote the sets of all possible instantiations. $Y, Y^1, Y^2 \subseteq X$ are for subsets of $X$, and $y$ and $\mathbf{Y}$ for their instantiations similarly. Upper-case $P, Q, R, S, T$ are reserved for probability distributions; $P(X)$ indicates a joint distribution; and $Q(Y)$ denotes the marginal distribution of $Q(X)$ over a subset of variables $Y$. Bold upper-case $\mathbf{P}, \mathbf{Q}, \mathbf{R}, \mathbf{S}, \mathbf{T}$ are reserved for sets of distributions. In particular, $\mathbf{P}_{R(Y)} = \{ P(X) \mid P(Y) = R(Y) \}$ denotes the set of all distributions over $X$ whose marginals over $Y \subseteq X$ equal $R(Y)$.

2.1. Jeffrey's rule and I-projection

How to update a distribution $P(X)$ by another lower-dimensional distribution $R(Y)$, $Y \subseteq X$, has been debated for a long time in the mathematics and philosophy communities [2,6,3]. One of the difficulties stems from the fact that Bayes' rule cannot directly apply here because $R(y)$, although acting as a condition for the update, is itself not an event. One approach, proposed by R. Jeffrey [2], is based on two principles: the new, posterior distribution $Q(X)$ should 1) satisfy $R(Y)$ (i.e., $Q(Y) = R(Y)$) and 2) keep the conditional distribution of $X$, given $Y \subseteq X$, unchanged (i.e., $Q(X \setminus Y \mid Y) = P(X \setminus Y \mid Y)$).
The second principle, known as probability kinematics, has the effect of keeping the change in the update to a minimum. Then for a given $R(Y)$ and $Z \subseteq X \setminus Y$, we can compute the probabilities

$$Q(z) = \sum_{y \in \mathbf{Y}} P(z \mid y)\, R(y) = \sum_{y \in \mathbf{Y}} R(y)\, \frac{P(z, y)}{P(y)} \qquad (1)$$

where $y \in \mathbf{Y}$ indicates that the summation is over all instantiations of $Y$. Equation (1) is known as Jeffrey's rule [2] or J-conditioning. From (1), let $Z = X \setminus Y$; then, for any $x$ (with $y$ the part of $x$ that instantiates $Y$), we have the updated distribution

$$Q(x) = \begin{cases} P(x)\, \dfrac{R(y)}{P(y)} & \text{if } P(y) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

Two functions have been used widely to measure the distance or difference between two distributions over $X$. Their definitions are given below.

Definition 1 [20]. The I-divergence (also known as Kullback-Leibler distance and relative entropy) between $P(X)$ and $Q(X)$ is given by

$$I(P \,\|\, Q) = \begin{cases} \displaystyle\sum_{x:\, P(x) > 0} P(x) \log \frac{P(x)}{Q(x)} & \text{if } P \ll Q \\ +\infty & \text{otherwise} \end{cases} \qquad (3)$$

where $P \ll Q$, denoting that $P$ is dominated by $Q$, holds if $\{ x \mid P(x) > 0 \} \subseteq \{ x \mid Q(x) > 0 \}$.

Note that $I(P \,\|\, Q) \ge 0$ for all $P$ and $Q$, and the equality holds only if $P = Q$. Also note that in general $I(P \,\|\, Q) \ne I(Q \,\|\, P)$, so I-divergence is not a true distance metric.

Definition 2. The total variation between $P(X)$ and $Q(X)$ is defined as

$$\delta(P, Q) = \sum_{x \in \mathbf{X}} | P(x) - Q(x) | \qquad (4)$$

Now we define I-projection, one of the central concepts for our work.

Definition 3. $Q(x)$ is said to be an I-projection (see footnote b) of $P(x)$ on a convex set of distributions $\mathbf{S}$ if

$$I(Q \,\|\, P) = \min_{\tilde{Q} \in \mathbf{S}} I(\tilde{Q} \,\|\, P) \qquad (5)$$

It has been shown that, because of the convexity of $\mathbf{S}$, the I-projection is unique [6]. We are particularly interested in I-projections on $\mathbf{P}_{R(Y)}$, the set of distributions whose marginals over $Y$ equal $R(Y)$. $\mathbf{P}_{R(Y)}$ is known to be convex, and the I-projection of $P(x)$ on $\mathbf{P}_{R(Y)}$ can be calculated by [20]

$$Q(x) = \begin{cases} P(x)\, \dfrac{R(y)}{P(y)} & \text{if } P(y) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

Note that (6) is exactly the same as (2). This proves the following theorem.

Theorem 1. Let $Q(X)$ be the distribution resulting from updating $P(X)$ by $R(Y)$, $Y \subseteq X$, using Jeffrey's rule of (2). Then $Q(X)$ is the I-projection of $P(X)$ on $\mathbf{P}_{R(Y)}$.

a Netica: Bayesian network tool from Norsys Software Corp. http://www.norsys.com/
b The I-projection defined here is also called $I_1$-projection in the literature. Since I-divergence is not symmetric, another projection, namely the $I_2$-projection $Q'$ on $\mathbf{Q}$, is defined that minimizes the I-divergence $I(P \,\|\, Q')$. Unlike the $I_1$-projection, the $I_2$-projection in general is not unique. In this paper, all I-projections refer to $I_1$-projections.
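Equations (2) to (4) can be checked on a small table. The following is a minimal sketch, assuming a joint distribution stored as a Python dictionary from state tuples to probabilities; the two-variable joint `P`, the evidence `R`, and all function names are hypothetical illustrations, not taken from the paper.

```python
from math import log

def marginal(joint, idx):
    """Marginal of `joint` (dict: state tuple -> prob) over the positions in `idx`."""
    out = {}
    for x, p in joint.items():
        y = tuple(x[i] for i in idx)
        out[y] = out.get(y, 0.0) + p
    return out

def jeffrey_update(joint, idx, r):
    """Eq. (2): Q(x) = P(x) * R(y) / P(y), and 0 where P(y) = 0."""
    p_y = marginal(joint, idx)
    q = {}
    for x, p in joint.items():
        y = tuple(x[i] for i in idx)
        q[x] = p * r[y] / p_y[y] if p_y[y] > 0 else 0.0
    return q

def i_divergence(p, q):
    """Eq. (3): I(P || Q); infinite unless P is dominated by Q."""
    total = 0.0
    for x, px in p.items():
        if px > 0:
            if q[x] == 0:
                return float("inf")
            total += px * log(px / q[x])
    return total

def total_variation(p, q):
    """Eq. (4): sum over x of |P(x) - Q(x)|."""
    return sum(abs(p[x] - q[x]) for x in p)

# Hypothetical joint over two binary variables (X1, X2); the evidence is on Y = (X2,).
P = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}
R = {(1,): 0.7, (0,): 0.3}
Q = jeffrey_update(P, [1], R)

print("Q(X2)     =", marginal(Q, [1]))      # matches R, as principle 1) requires
print("I(Q || P) =", i_divergence(Q, P))
print("delta     =", total_variation(P, Q))
```

In this toy case the total variation between `P` and `Q` equals the total variation between the marginals `P(X2)` and `R(X2)`, which no distribution with marginal `R` can beat; this is the behaviour Theorem 2 describes.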

Next we show that I-projection by (6) minimizes not only the I-divergence but also the total variation.

Theorem 2. Let $Q(X)$ be the I-projection of $P(x)$ on $\mathbf{P}_{R(Y)}$. Then

$$\delta(P, Q) = \min_{\tilde{Q} \in \mathbf{P}_{R(Y)}} \delta(P, \tilde{Q}).$$

2.2. IPFP

For a single constraint $R(Y)$, the I-projection of $P(X)$ on $\mathbf{P}_{R(Y)}$ finds a distribution that satisfies this constraint and is closest to $P(X)$ (in terms of I-divergence), provided $R(Y) \ll P(Y)$. The iterative proportional fitting procedure (IPFP) extends this idea to modify $P(X)$ with multiple constraints by repeatedly projecting the distribution resulting from the previous iteration onto $\mathbf{P}_{R(Y^j)}$ of the next constraint $R(Y^j)$. This procedure is formally defined as follows.

Definition 4 [20]. Let $\mathbf{R} = (R(Y^1), \dots, R(Y^m))$ be a set of constraints and $Q_0(X)$ the initial distribution. Then for $k = 1, 2, \dots$, $j = 1 + (k - 1) \bmod m$, and $R(Y^j) \ll Q_{k-1}(Y^j)$ for all $k, j$, IPFP is defined by

$$Q_k(x) = \begin{cases} Q_{k-1}(x)\, \dfrac{R(y^j)}{Q_{k-1}(y^j)} & \text{if } Q_{k-1}(y^j) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

In (7), $m$ is the number of constraints, $k$ is the iteration index, and $j$ determines the constraint used at step $k$. For clarity, in the rest of this paper, we write (7) as

$$Q_k(x) = Q_{k-1}(x)\, \frac{R(y^j)}{Q_{k-1}(y^j)} \qquad (7\text{-}1)$$

with the understanding that $Q_k(x) = 0$ when $Q_{k-1}(y^j) = 0$.

IPFP first appeared in the literature in Ref. 3, and shortly after was used as a procedure to estimate cell frequencies in contingency tables under some marginal constraints [8]. IPFP was extended in Refs. 1 and 5 to also allow conditional distributions as constraints (conditional IPFP, or C-IPFP). The convergence of IPFP was studied in Refs. 7, 10, and 17 with proofs under different conditions; the convergence of C-IPFP can be found in Ref. 5. For our purpose, we cite a result from Ref. 20 in the theorem below, which is based on the I-divergence geometry developed in Ref. 7.

Theorem 3. Let $\mathbf{R} = (R(Y^1), \dots, R(Y^m))$ be a set of constraints. If $\mathbf{S} = \bigcap_{j=1}^{m} \mathbf{P}_{R(Y^j)} \ne \emptyset$, then IPFP of (7) converges and the converging distribution $Q^*(X)$ is the I-projection of $Q_0(X)$ on $\mathbf{S}$.
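The iteration of Definition 4 can be sketched directly on an explicit joint table. This is a minimal illustration, assuming dictionary-based tables; the initial distribution `Q0`, the two marginal constraints, the stopping tolerance, and the function names are hypothetical choices and not part of the paper.

```python
def marginal(joint, idx):
    """Marginal of `joint` (dict: state tuple -> prob) over the positions in `idx`."""
    out = {}
    for x, p in joint.items():
        y = tuple(x[i] for i in idx)
        out[y] = out.get(y, 0.0) + p
    return out

def max_violation(q, constraints):
    """Largest absolute gap between a marginal of q and its constraint."""
    return max(abs(marginal(q, idx)[y] - r[y])
               for idx, r in constraints for y in r)

def ipfp(q0, constraints, tol=1e-10, max_iter=10_000):
    """Eq. (7): cycle through the constraints, rescaling by R(y^j) / Q_{k-1}(y^j)."""
    q = dict(q0)
    m = len(constraints)
    for k in range(1, max_iter + 1):
        idx, r = constraints[(k - 1) % m]   # j = 1 + (k - 1) mod m, zero-based here
        q_y = marginal(q, idx)
        new_q = {}
        for x, p in q.items():
            y = tuple(x[i] for i in idx)
            new_q[x] = p * r[y] / q_y[y] if q_y[y] > 0 else 0.0
        q = new_q
        # Test for convergence once per full cycle over the constraints.
        if k % m == 0 and max_violation(q, constraints) < tol:
            return q, k
    return q, max_iter

# Hypothetical initial distribution over (X1, X2) and two consistent constraints.
Q0 = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}
constraints = [([0], {(1,): 0.6, (0,): 0.4}),
               ([1], {(1,): 0.7, (0,): 0.3})]
Q_star, steps = ipfp(Q0, constraints)
print(steps, marginal(Q_star, [0]), marginal(Q_star, [1]))
```

Each step only rescales rows or columns of the table, so the odds ratio of the 2 x 2 table is the same in `Q_star` as in `Q0`; this is one way to see that IPFP changes the initial distribution no more than the constraints require.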
If $\mathbf{S} \ne \emptyset$, these constraints are said to be consistent with each other, and each distribution in $\mathbf{S}$ satisfies all constraints in $\mathbf{R}$. Therefore, at convergence, $Q^*(X)$, as the I-projection on $\mathbf{S}$, has the minimum I-divergence among those that satisfy all constraints in $\mathbf{R}$. We show in the next two theorems that IPFP also minimizes the total variation.

Theorem 4. Consider an initial distribution $Q_0(X)$ and a set of consistent constraints $\mathbf{R} = (R(Y^1), \dots, R(Y^m))$. Let $Q^*(X)$ be the converging distribution when applying IPFP on $Q_0(X)$ using constraints in $\mathbf{R}$, let $Y = Y^1 \cup Y^2 \cup \dots \cup Y^m$, and let $Q^*(Y)$ be the converging distribution when applying IPFP on $Q_0(Y)$ using constraints in $\mathbf{R}$. Then

$$Q^*(x) = Q_0(x)\, \frac{Q^*(y)}{Q_0(y)}. \qquad (8)$$

Comparing Theorem 4 and (7-1), IPFP on $Q_0(X)$ with $m$ constraints is equivalent to modifying $Q_0(X)$ by a single constraint $Q^*(Y)$. That is, $Q^*(X)$ is the I-projection of $Q_0(X)$ on $\mathbf{P}_{Q^*(Y)}$. This, together with Theorem 2, leads to the following theorem.

Theorem 5. Let $Q^*(X)$ be the converging distribution using IPFP with an initial distribution $Q_0(X)$ and a set of constraints $\mathbf{R} = (R(Y^1), \dots, R(Y^m))$ with $\mathbf{S} = \bigcap_{j=1}^{m} \mathbf{P}_{R(Y^j)} \ne \emptyset$. Then

$$\delta(P, Q^*) = \min_{\tilde{Q} \in \mathbf{S}} \delta(P, \tilde{Q}).$$

To the best of our knowledge, Theorems 2, 4, and 5 are original results which have not been reported in the literature before.

IPFP bears a great resemblance to another family of procedures known as alternating projection, which finds a point in the intersection of several convex sets by a sequence of projections onto these sets. Alternating projection has been widely used as an optimization method in areas of sampling theory, signal processing, and neural networks. A comprehensive review of this method can be found in Ref. 4. The difference from IPFP is that alternating projection is primarily for Euclidean spaces and tends to minimize the square distances, while IPFP is for probability spaces and minimizes I-divergence (and the total variation, by our result in Theorem 5) but not the square distances [10]. Several IPFP-based algorithms we will discuss, especially those for inconsistent evidences, can find their counterparts in alternating projection procedures.

3. Uncertain Evidences

Evidences presented for belief update may be uncertain for various reasons.
A reported observation may not be totally trusted due to errors or noise in the observation or reporting process; it may be biased due to the observer's preference; it may not hold when the time or location is different. Among all types of uncertain evidences, this paper concentrates on two of them, named virtual evidence and soft evidence.

3.1. Virtual evidences

Pearl [6] proposed the virtual evidence method to deal with BN belief update when one is uncertain about a claim of a hard evidence (i.e., an event), say, $X = a$. Suppose we believe with probability $p$ that this claim is actually due to the occurrence of $X = a$; then the probability that it is not occurring is $1 - p$. The virtual evidence method requires that this uncertainty information be given as a likelihood ratio $L(X) = p : (1 - p)$, not necessarily as the specific probabilities. To reason with virtual evidence in a BN, Pearl's method extends the given BN by creating a binary virtual node $U$, with state $u$ standing for the event that $X = a$ is claimed to have occurred. The virtual node $U$ has $X$ as its only parent, and its conditional probability table (CPT) satisfies $P(u \mid X = a) : P(u \mid X \ne a) = L(X)$. Then the belief update (with the claimed observation and the uncertainty about this claim in the form of the likelihood ratio $L$) can be done by instantiating $U$ to $u$ (i.e., treating $u$ as a hard evidence). Many BN engines accept a likelihood ratio as input for the update without explicitly introducing the virtual node.

This method is generalized in Ref. 3 to any arbitrary set of mutually exclusive and exhaustive events and the associated likelihood ratio, and from BNs to any joint distributions. Under this generalization, virtual evidence on $Y \subseteq X$ is represented as a likelihood ratio

$$L(Y) = P(ob(y_{(1)}) \mid y_{(1)}) : P(ob(y_{(2)}) \mid y_{(2)}) : \dots : P(ob(y_{(s)}) \mid y_{(s)}),$$

where $y_{(1)}, y_{(2)}, \dots, y_{(s)} \in \mathbf{Y}$ are all instantiations of $Y$, $ob(y_{(i)})$ denotes the event that we observed $Y = y_{(i)}$ to be true, and $P(ob(y_{(i)}) \mid y_{(i)})$ is interpreted as the probability that we observe $Y = y_{(i)}$ if $Y$ is indeed in state $y_{(i)}$.

3.2. Soft evidences

Soft evidence, named by Valtorta [9], is given as a distribution $R(Y)$, $Y \subseteq X$. This kind of evidence can be seen in many places. For example, one may not be able to observe the precise state of a variable for a given case but may know its distribution. Also, sometimes it is more important to know the distribution of a variable than its precise state at a given moment. When two BNs (or some other data and knowledge sources of probabilistic or statistical nature) interact with each other, the information exchanged between them is often in the form of probability distributions of shared variables. For a given soft evidence, say $R(X)$, even though we are uncertain about the specific state $X$ is in, we are certain about its distribution.
In other words, $R(X)$ is a true (and certain) observation, and this distribution should be preserved in the updated joint distribution $Q^*$ (i.e., $Q^*(X) = R(X)$). In this sense, soft evidences should be treated the same as hard evidence. In fact, a hard evidence, say $X = a$, is a special case of soft evidence ($R(X = a) = 1$, $R(X = b) = 0$ for all states $b \ne a$).

As suggested in Ref. 3, Jeffrey's rule of (2) is a natural choice for updating a joint distribution $P(X)$ by a soft evidence $R(Y)$, $Y \subseteq X$, because the updated distribution preserves $R(Y)$ while making minimum changes to the original distribution. However, Jeffrey's rule cannot directly apply when the joint distribution is represented as a BN. This can be overcome by converting a soft evidence to a virtual evidence, as suggested by Ref. 3. Consider a distribution $P(X)$ and a soft evidence $R(Y)$, $Y \subseteq X$. All possible instantiations of $Y$, $y_{(1)}, y_{(2)}, \dots, y_{(l)} \in \mathbf{Y}$, form a mutually exclusive and exhaustive set of events. $R(Y)$ then can be converted to a virtual evidence with the likelihood ratio

$$L(Y) = \frac{R(y_{(1)})}{P(y_{(1)})} : \frac{R(y_{(2)})}{P(y_{(2)})} : \dots : \frac{R(y_{(l)})}{P(y_{(l)})} \qquad (9)$$

As shown in Theorem 5 of Ref. 3, when this virtual evidence is applied to $P(X)$, the new distribution is exactly the same as the one obtained by applying $R(Y)$ using Jeffrey's rule of (2).

3.3. Multiple uncertain evidences

Like hard evidences, multiple uncertain evidences can arrive at the same time or in a sequence. There is no problem for belief update by multiple virtual evidences, because what is required is that the updated distribution preserves the given likelihoods. Update can be done by simply treating each virtual evidence as a hard evidence on the virtual node and instantiating that node to true. Note that, since a virtual node $U$ is independent of all other virtual nodes given the parent of $U$ (i.e., they are d-separated), the likelihood ratio reflected on $U$ will not be affected by the belief update operations with other virtual (and hard) evidences.

However, this is not the case when updating by two soft evidences $se_1 = R(Y^1)$ and $se_2 = R(Y^2)$. To satisfy both $se_1$ and $se_2$, the updated distribution $Q$ is required to have its marginals $Q(Y^1) = R(Y^1)$ and $Q(Y^2) = R(Y^2)$. Update cannot be done by first converting $se_1$ and $se_2$ to two virtual evidences and then applying the virtual evidence method with these two virtual evidences. This is because, after applying the first evidence, there is no way to hold $Q(Y^1) = R(Y^1)$ when the second evidence is applied. Furthermore, as can be seen in the example below, when the soft evidences are presented in different orders or all together, different update results will be generated. This problem, known as the commutativity of iterated revisions, has been viewed as a problem for Jeffrey's rule [3,22].

Example 1. As depicted in Fig. 1, we are given a BN of four binary variables $A$, $B$, $C$, and $D$ and two soft evidences $se_1$: $R(B) = (0.7, 0.3)$ and $se_2$: $R(C) = (0.3, 0.7)$. To convert them to virtual evidences, we first compute from the BN the marginals $P(B) = (0.44, 0.56)$ and $P(C) = (0.45, 0.55)$, then compute the likelihood ratios by (9) as $L(B) = 0.7/0.44 : 0.3/0.56 = 1.5909 : 0.5357$ and similarly $L(C) = 0.6667 : 1.2727$.

    P(A):         A = 1    A = 0
                  0.4      0.6

    P(B | A):     A    B = 1    B = 0
                  1    0.20     0.80
                  0    0.60     0.40

    P(C | A):     A    C = 1    C = 0
                  1    0.60     0.40
                  0    0.35     0.65

    P(D | B, C):  B    C    D = 1    D = 0
                  1    1    0.10     0.90
                  1    0    0.85     0.15
                  0    1    0.45     0.55
                  0    0    0.70     0.30

Fig. 1. An example BN of 4 variables.
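The marginals and likelihood ratios quoted in Example 1 can be recomputed by brute-force enumeration. The sketch below assumes the CPT entries as read off Fig. 1, with states listed in the order 1, 0 and with edges A to B, A to C, and B, C to D; the variable and function names are hypothetical.

```python
from itertools import product

# CPTs as read off Fig. 1 (an assumption about the figure's layout).
P_A = {1: 0.4, 0: 0.6}
P_B_A = {1: {1: 0.20, 0: 0.80}, 0: {1: 0.60, 0: 0.40}}   # P_B_A[a][b]
P_C_A = {1: {1: 0.60, 0: 0.40}, 0: {1: 0.35, 0: 0.65}}   # P_C_A[a][c]
P_D_BC = {(1, 1): {1: 0.10, 0: 0.90}, (1, 0): {1: 0.85, 0: 0.15},
          (0, 1): {1: 0.45, 0: 0.55}, (0, 0): {1: 0.70, 0: 0.30}}

# Full joint P(A, B, C, D) from the chain rule of the BN; keys are (a, b, c, d).
joint = {(a, b, c, d): P_A[a] * P_B_A[a][b] * P_C_A[a][c] * P_D_BC[(b, c)][d]
         for a, b, c, d in product((1, 0), repeat=4)}

def marginal(q, idx):
    """Marginal of `q` over the positions in `idx`."""
    out = {}
    for x, p in q.items():
        y = tuple(x[i] for i in idx)
        out[y] = out.get(y, 0.0) + p
    return out

def likelihood_ratio(r, p_y):
    """Eq. (9): one term R(y) / P(y) per instantiation y."""
    return {y: r[y] / p_y[y] for y in r}

P_B = marginal(joint, [1])
P_C = marginal(joint, [2])
L_B = likelihood_ratio({(1,): 0.7, (0,): 0.3}, P_B)
L_C = likelihood_ratio({(1,): 0.3, (0,): 0.7}, P_C)

print("P(B) =", P_B, " L(B) =", L_B)
print("P(C) =", P_C, " L(C) =", L_C)
```

Enumeration is only practical for a network this small; for a larger BN the marginals would come from the inference engine instead.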

As can be seen in rows 2 and 3 of Table 1 below, when the two virtual evidences are applied separately, the updated beliefs satisfy the corresponding $se_1$ and $se_2$ (beliefs on $B = 1$ and $C = 1$ are updated to 0.7 and 0.3, respectively). Rows 4 and 5 show the update results when these two virtual evidences are applied together and in a sequence, respectively. It is not surprising that the results are the same since, as mentioned earlier, belief update with multiple virtual evidences is equivalent to belief update with multiple hard evidences on the virtual evidence nodes. Let $U_1$ and $U_2$ be the two virtual evidence nodes. It can be verified that $P(u_1 \mid B = 1, u_2) : P(u_1 \mid B = 0, u_2) = L(B)$ and $P(u_2 \mid C = 1, u_1) : P(u_2 \mid C = 0, u_1) = L(C)$, i.e., the likelihood ratios are preserved when the other evidence is presented. However, as can be seen in rows 4 and 5, neither of these two soft evidences is satisfied by the resulting distributions.

To deal with this problem, one may suggest that, before applying $se_2$, we first recalculate a new likelihood ratio $L'(C)$ for $se_2$ using the distribution updated by $se_1$ (row 2). By (9), we have $L'(C) = 0.3/0.425 : 0.7/0.575 = 0.7059 : 1.2174$. Row 6 shows the update result, where $se_2$ is satisfied but the belief on $B = 1$ is moved away from what is required by $se_1$ (from 0.700 to 0.710).

Table 1. Belief update on BN of Example 1.

    Evidences               Belief on B = 1    Belief on C = 1
    1. original             0.440              0.450
    2. using L(B)           0.700              0.425
    3. using L(C)           0.455              0.300
    4. L(B) and L(C)        0.712              0.279
    5. L(B) then L(C)       0.712              0.279
    6. L(B) then L'(C)      0.710              0.300

Some argued, based on the "all things considered" interpretation of soft evidence, that belief update with such evidences should not be commutative [3]. In contrast, we argue that soft evidences are true observations of distributions of some events, and as such, they all should be preserved in the updated posterior distribution; also that, if one or more such distributions exist, the one with the minimum I-divergence to the original distribution can be found by IPFP, using these evidences as constraints.
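The rows of Table 1 can be reproduced on the marginal table P(B, C) alone, because both virtual nodes attach only to B and C. The sketch below assumes the CPT entries as read off Fig. 1; the helper names `marg`, `ratio`, and `update` are hypothetical.

```python
from itertools import product

# CPTs as read off Fig. 1 (states listed as 1, 0); D plays no role for beliefs on B and C.
P_A = {1: 0.4, 0: 0.6}
P_B_A = {1: {1: 0.20, 0: 0.80}, 0: {1: 0.60, 0: 0.40}}   # P_B_A[a][b]
P_C_A = {1: {1: 0.60, 0: 0.40}, 0: {1: 0.35, 0: 0.65}}   # P_C_A[a][c]
P_BC = {(b, c): sum(P_A[a] * P_B_A[a][b] * P_C_A[a][c] for a in (1, 0))
        for b, c in product((1, 0), repeat=2)}

def marg(q, i):
    """Marginal of the (B, C) table over position i (0 for B, 1 for C)."""
    out = {1: 0.0, 0: 0.0}
    for y, p in q.items():
        out[y[i]] += p
    return out

def ratio(r, m):
    """Eq. (9) for a single binary variable."""
    return {s: r[s] / m[s] for s in r}

def update(q, i, L):
    """Virtual-evidence update: weight each entry by L on position i, then renormalise."""
    w = {y: p * L[y[i]] for y, p in q.items()}
    z = sum(w.values())
    return {y: v / z for y, v in w.items()}

R_B, R_C = {1: 0.7, 0: 0.3}, {1: 0.3, 0: 0.7}
L_B = ratio(R_B, marg(P_BC, 0))
L_C = ratio(R_C, marg(P_BC, 1))

row2 = update(P_BC, 0, L_B)                          # using L(B)
row3 = update(P_BC, 1, L_C)                          # using L(C)
row5 = update(row2, 1, L_C)                          # L(B) then L(C); same as both together
row6 = update(row2, 1, ratio(R_C, marg(row2, 1)))    # L(B) then recomputed L'(C)

for name, q in [("2", row2), ("3", row3), ("4/5", row5), ("6", row6)]:
    print(name, round(marg(q, 0)[1], 3), round(marg(q, 1)[1], 3))
```

Row 6 is a single IPFP step on the constraint R(C) after a step on R(B); repeating the two steps in turn is what drives both marginals to their targets.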
However, IPFP works on full joint distributions, and thus is not directly applicable to belief update in BNs. In the next section, we develop algorithm BN-IPFP for BN belief update with multiple soft evidences. This algorithm first converts all soft evidences to virtual evidence form and then iterates in IPFP style to update the BN until it settles down to a distribution that satisfies all given soft evidences.

Another issue that arises with multiple soft evidences is that these evidences may not be consistent with each other, i.e., there is no distribution that satisfies all given evidences. This problem is dealt with in Sec. 5.

4. BN-IPFP

The problem is stated as follows. We are given a BN on variables $X = (X_1, X_2, \dots, X_n)$ with the joint probability $P(X) = \prod_i P(X_i \mid \pi_i)$, where $P(X_i \mid \pi_i)$ is the CPT for variable $X_i$, and a set of soft evidences $\mathbf{R} = (R(Y^1), \dots, R(Y^m))$ where $Y^1, Y^2, \dots, Y^m \subseteq X$. Suppose the constraints in $\mathbf{R}$ 1) are consistent, and 2) satisfy the dominance condition: for all $R(Y^j) \in \mathbf{R}$, $R(Y^j) \ll P(Y^j)$, in the sense of Definition 1 and as required for the projection in Sec. 2.2. Then the belief update of the given BN by $\mathbf{R}$ is to find $Q^*(X)$ which 1) satisfies all evidences in $\mathbf{R}$; and 2) has minimum I-divergence to $P(X)$.

For small BNs, one can explicitly generate the full joint distribution $P(X)$ from the given BN and then apply IPFP using the soft evidences in $\mathbf{R}$ as constraints to update the distribution. This, however, is infeasible for large BNs, because the distribution would be prohibitively large and IPFP would be computationally extremely expensive, as it needs to literally modify each entry of the joint distribution table in each iteration.

To address this problem, Valtorta, Kim and Vomlel have devised a variation of the Junction-Tree (JT) algorithm based on IPFP [9] that utilizes the interdependencies captured in the BN structure. One version of this algorithm works in the situation where all variables in each $Y^j$ are contained in one clique $C_j$ in the JT. Then the belief update goes iteratively over the evidences in cycle: in each iteration, $Q(C_j)$ is updated by the corresponding $R(Y^j)$ and then the change of $Q(C_j)$ is propagated to the rest of the JT by the regular JT method. The general situation, where a soft evidence may involve variables in more than one clique, is dealt with by another version called the Big Clique algorithm. In this algorithm, when constructing the JT, all soft evidence nodes (i.e., those variables that are involved in any of the soft evidences) are fully connected with each other by additional undirected edges. After triangulation, all soft evidence nodes appear in a single clique (the Big Clique). The belief update is done by first updating the big clique using all evidences in $\mathbf{R}$ by running IPFP to convergence and then propagating the resulting distribution of this clique to the rest of the JT. The Big Clique algorithm becomes time and space inefficient when the size of the big clique itself becomes large.
Both versions are shown to converge, and the converging joint distribution satisfies all evidences in $\mathbf{R}$, provided these constraints are consistent with each other.

One limitation of these JT-based belief update algorithms is that they cannot be easily adopted by those using inference mechanisms other than JT. Also, they require incorporating IPFP operations into the JT procedure, causing re-coding of the existing JT inference engine. The authors of Ref. 9 mentioned the possibility of implementing the first version of their algorithm as a wrapper around the Hugin shell or other JT engines, but no suggestion of how this can be done was given.

To address these issues, we propose two new algorithms for BN inference with multiple soft evidences. Both algorithms utilize IPFP, although in quite different ways. The first algorithm combines the idea of IPFP and the encoding of soft evidence by virtual evidence of (9). The second algorithm is based on Theorem 4; it is similar to the Big Clique algorithm but it decouples the IPFP from JT (or any specific BN inference engine). These two algorithms are presented in the next two subsections.

4.1. BN-IPFP-1

As shown earlier, although a single soft evidence can be applied to BN belief update by first converting it to a virtual evidence, this approach does not work with multiple evidences. As can be seen in Example 1 at the end of the last section, after updating by $se_2$, the distribution no longer satisfies $se_1$. What is needed is a method that can convert the soft evidences in $\mathbf{R}$ to one or more likelihood ratios which, when applied as virtual evidences to the BN, preserve the marginal distributions specified in every soft evidence. Algorithm BN-IPFP-1 presented below accomplishes this by combining the idea of IPFP and the virtual evidence method.

Roughly speaking, this algorithm goes as follows. Like IPFP, it is an iterative process, starting with $Q_0(X) = P(X)$, and one soft evidence $R(Y^j)$ is considered at each iteration. If the marginal $Q_{k-1}(Y^j)$ of the current distribution equals $R(Y^j)$, then it does nothing; otherwise, a new virtual evidence (in the form of a likelihood ratio) is created based on the current $Q_{k-1}(Y^j)$ and $R(Y^j)$ according to (9) and applied to modify $Q_{k-1}(Y^j)$. The algorithm is given below.

Algorithm BN-IPFP-1. Consider a BN with prior distribution $P(X)$, and a set of $m$ consistent soft evidences $\mathbf{R} = (R(Y^1), \dots, R(Y^m))$. We use the following iterative procedure for belief update:

1. $Q_0(X) = P(X)$; $k = 1$;
2. Repeat the following until convergence:
   2.1 $j = 1 + (k - 1) \bmod m$; $l = 1 + \lfloor (k - 1)/m \rfloor$;
   2.2 construct virtual evidence with likelihood ratio
       $$L_{j,l}(Y^j) = \frac{R(y^j_{(1)})}{Q_{k-1}(y^j_{(1)})} : \frac{R(y^j_{(2)})}{Q_{k-1}(y^j_{(2)})} : \dots : \frac{R(y^j_{(s)})}{Q_{k-1}(y^j_{(s)})}$$
       where $y^j_{(1)}, y^j_{(2)}, \dots, y^j_{(s)} \in \mathbf{Y}^j$ are the state configurations of $Y^j$;
   2.3 obtain $Q_k(X)$ by updating $Q_{k-1}(X)$ with $L_{j,l}(Y^j)$ using Pearl's virtual evidence method;
   2.4 $k = k + 1$;

The core of this algorithm is Step 2.2, which adds a new virtual evidence with likelihood ratio $L_{j,l}(Y^j)$, where the second subscript, $l$, is the number of virtual evidences created for $R(Y^j)$, incremented every $m$ iterations. Note that the sequence of likelihood ratios for each $R(Y^j)$ can be accumulated as a single one, $L_j(Y^j) = \prod_l L_{j,l}(Y^j)$.

4.2. BN-IPFP-2

BN-IPFP-1 may become expensive when the given BN is large because it computes the marginal $Q_{k-1}(Y^j)$ (Step 2.2) and updates the beliefs of the entire BN (Step 2.3) in each iteration.
Algorithm BN-IPFP-2 avoids repeated BN computation by first constructing a single virtual evidence node from the marginal P(Y), where Y contains all variables in all of the given soft evidences, and then updating the BN by this virtual evidence.
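The idea just described can be sketched on an explicit joint table before the formal statement of the algorithm. This is a minimal illustration, not the authors' implementation: the table `p_x` stands in for BN inference, the helper `ipfp` and all numbers are hypothetical, and the final comparison against IPFP run directly on the full joint is included only to show that both routes reach the same distribution in this toy case.

```python
import numpy as np

def ipfp(q, constraints, tol=1e-12, max_iter=10_000):
    """Standard IPFP on a joint table; constraints are (axes, target) pairs."""
    q = q.copy()
    for _ in range(max_iter):
        prev = q
        for axes, target in constraints:
            other = tuple(i for i in range(q.ndim) if i not in axes)
            marg = q.sum(axis=other, keepdims=True)
            shape = [q.shape[i] if i in axes else 1 for i in range(q.ndim)]
            q = q * (np.reshape(target, shape) / marg)   # Q(x) R(y) / Q(y)
        if np.abs(q - prev).sum() < tol:
            break
    return q

rng = np.random.default_rng(0)
p_x = rng.random((2, 2, 2)) + 0.1        # hypothetical prior over binary A, B, C
p_x /= p_x.sum()
r_b = np.array([0.7, 0.3])               # hypothetical soft evidence R(B)
r_c = np.array([0.4, 0.6])               # hypothetical soft evidence R(C)

# Marginalize to Y = {B, C} (any BN inference method would do in practice).
p_y = p_x.sum(axis=0)
# Run IPFP on the small table P(Y) only, giving Q*(Y).
q_y = ipfp(p_y, [((0,), r_b), ((1,), r_c)])
# Build one likelihood ratio L(Y) from Q*(Y) and P(Y).
lik = q_y / p_y
# Apply L(Y) once as a virtual evidence on the full joint.
q_x = p_x * lik[None, :, :]
q_x /= q_x.sum()

# Reference: IPFP applied directly to the full joint P(X).
q_ref = ipfp(p_x, [((1,), r_b), ((2,), r_c)])
```

The iterative work here is confined to the 2 x 2 table over Y; the full joint is touched only once, when `lik` is applied.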

Algorithm BN-IPFP-2. Consider a BN with prior distribution $P(X)$ and a set of m consistent soft evidences $R = (R(Y^1), R(Y^2), \ldots, R(Y^m))$. Let $Y = Y^1 \cup Y^2 \cup \cdots \cup Y^m$. We use the following procedure for belief update:

1. Use any BN inference method to obtain $P(Y)$ from $P(X)$.
2. Update $P(Y)$ by IPFP using $R = (R(Y^1), R(Y^2), \ldots, R(Y^m))$ as constraints until it converges to $Q^*(Y)$.
3. Construct a virtual evidence with likelihood ratio $L(Y)$ computed from $Q^*(Y)$ and $P(Y)$ by (9).
4. Apply $L(Y)$ as a single virtual evidence to update $P(X)$.

The convergence and correctness of both BN-IPFP algorithms are established in Theorem 6 below.

Theorem 6. If the soft evidences in $R = (R(Y^1), R(Y^2), \ldots, R(Y^m))$ are consistent with each other and $P(Y^j) \ll R(Y^j)$ for all $R(Y^j) \in R$, then both algorithms BN-IPFP-1 and BN-IPFP-2 converge to the same distribution, which is the I-projection of $P(X)$ on $S = \bigcap_{j=1}^{m} P_{R(Y^j)}$.

Fig. 2. Running results of Example 1 with BN-IPFP-1 and 2. (a) BN-IPFP-1. (b) BN-IPFP-2.

Figure 2 shows the running results of BN-IPFP-1 and 2 for the example BN given in Fig. 1. The two virtual evidence nodes VE0 and VE1 in Fig. 2(a) are generated by BN-IPFP-1 for the two soft evidences R(B) and R(C); the virtual evidence VE in Fig. 2(b) is created from R(B) and R(C) according to BN-IPFP-2. Both algorithms converge in 4 iterations to the same distribution, which satisfies both constraints R(B) and R(C). The final combined likelihood ratios at convergence are L*(B) = (1.0 : 0.354) and L*(C) = (0.578 : 1.0) for BN-IPFP-1, and L*(B, C) = (0.578 : 1.0 : 0.205 : 0.354) for BN-IPFP-2.

4.3. Time and space performance

The iterations of BN-IPFP-1, BN-IPFP-2 and the Big Clique algorithm all converge to the same distribution. At each iteration, the Big Clique algorithm updates beliefs of the joint

probabilities of the big clique C, BN-IPFP-2 updates the joint distribution of Y, and BN-IPFP-1 updates the beliefs of the whole BN, i.e., all variables in X. Clearly, $Y \subseteq C \subseteq X$. However, the time complexity for one iteration is $O(2^{|C|})$ for Big Clique and $O(2^{|Y|})$ for IPFP, because both require modifying a joint distribution table. On the other hand, the time complexity of BN-IPFP-1 is equal to the complexity of the BN inference algorithm it uses for belief update, which is often more efficient than modifying the joint distribution. For example, if we use JT, the time complexity for one iteration of BN-IPFP-1 is exponential in the size of the largest clique in the JT of the original BN, which may be smaller than |C| and |Y|, especially for sparse BNs.

Both Big Clique and BN-IPFP-2 are space inefficient: they need exponential space for the joint potential of C and the joint distribution of Y, respectively. In contrast, BN-IPFP-1 only needs additional space for the virtual evidences, which is $O(\sum_{j=1}^{m} 2^{|Y^j|})$. BN-IPFP-2 is thus more suitable for problems with a large BN but a small number of soft evidence variables, and BN-IPFP-1 is more efficient when the number of soft evidence variables is large. Also, both BN-IPFP-1 and 2 have the advantage that users do not have to stick to junction tree and modify the JT-related procedures in the inference engine. They can be easily implemented as wrappers on any BN inference engine.

To empirically evaluate our algorithms and to get a sense of how expensive these two algorithms may be, we have conducted some experiments with artificially constructed BNs of different sizes and with different constraint sets. The reported memory consumption does not include the memory used by the JT-based inference engine of Netica, but the reported running time is the total running time.

Experiment 4-1 compares the algorithms' performance with varying numbers of soft evidences. It used a BN of 15 variables and three sets of 2, 4, and 8 soft evidences, respectively. One half of these evidences involved 2 variables, and the other half involved 1 variable. The experiment results are given in Table 2.
It can be seen that, when the number of evidences increases, both the time and the memory consumption of BN-IPFP-1 increase at much slower rates than those of BN-IPFP-2.

Table 2. Experiment 4-1: performance with different numbers of soft evidences.

  # of        # Iterations            Exec. Time              Memory
  evidences   BN-IPFP-1  BN-IPFP-2    BN-IPFP-1  BN-IPFP-2    BN-IPFP-1  BN-IPFP-2
  2           24         4            0.57s      0.62s        590,736    468,532
  4           79         23           0.63s      0.83s        726,896    696,960
  8           95         7            0.7s       5.34s        926,896    2,544,536

Experiment 4-2 compares the algorithms' performance with BNs of different sizes. Four BNs of 30, 60, 120, and 240 binary variables were used, each of which was updated by the same set of 4 soft evidences involving a total of 6 variables. For each algorithm, the experimental runs for the four BNs all converged after the same number of iterations (43 for BN-IPFP-1 and 4 for BN-IPFP-2).

Table 3. Experiment 4-2: performance of BN with different size.

  Size of   # of Iterations         Exec. Time                   Memory
  BN        BN-IPFP-1  BN-IPFP-2    BN-IPFP-1  BN-IPFP-2         BN-IPFP-1  BN-IPFP-2
  30        43         4            0.58s      0.67s (0.64s)     72,848     69,042
  60        43         4            0.7s       0.69s (0.66s)     723,944    69,424
  120       43         4            .7s        0.72s (0.66s)     726,904    69,46
  240       43         4            03.s       3.3s (0.72s)      726,800    696,842

From Table 3 we can see that when the number of soft evidences is fixed, the running time of BN-IPFP-2 increases slowly with the network size. In particular, the time for IPFP on P(Y) (the time in parentheses) increases only slightly. This is because computing the single constraint Q*(Y) (Step 2) is the most expensive step in BN-IPFP-2 and Y is fixed. On the other hand, the execution time of BN-IPFP-1 increases at a much faster pace (roughly exponentially), because each iteration requires updating the entire BN. These experimental results confirm our theoretical analysis of the proposed algorithms.

5. Inconsistent Soft Evidences

A set of soft evidences or constraints $R = (R(Y^1), R(Y^2), \ldots, R(Y^m))$ is said to be inconsistent if $S = \bigcap_{j=1}^{m} P_{R(Y^j)} = \emptyset$. Since there does not exist a distribution that satisfies all constraints in R, IPFP or methods based on IPFP, such as those we developed in the previous section, will not converge. Instead, the update will go into cycles around several distributions (Ref. 2), and the specific distributions it cycles around may be different, depending on the order in which the constraints are presented (Ref. 4).

Several approaches to this problem based on IPFP have been suggested in the literature. A simple approach is to first run IPFP until it goes into a cycle of $Q^*_1(X), Q^*_2(X), \ldots, Q^*_m(X)$, each of which satisfies one of the given constraints, and then take the average of these distributions, $Q^*(X) = \sum_{j=1}^{m} Q^*_j(X) / m$, as the solution. Several disadvantages can be seen in this simple approach. The result may be different when the constraints are presented in different orders; there is not much we can say about $Q^*(X)$ except that it is somewhere in the middle of $Q^*_1(X), Q^*_2(X), \ldots, Q^*_m(X)$.
Moreover, this approach is hard to apply to BN because it operates on full joint distributions.

Another approach modifies the IPFP of (7-1) as follows (Ref. 20):

\[ Q_k(x) = (1 - \alpha_k)\, Q_{k-1}(x) + \alpha_k\, Q_{k-1}(x) \frac{R(y^j)}{Q_{k-1}(y^j)} \qquad (10) \]

where $0 < \alpha_k < 1$. This approach will be referred to as SR-IPFP, as it can be seen to be analogous to the serial relaxation method of alternating projection that can be used to find an approximate solution when the solution set S is empty (see Eq. (38) of Ref. 4). This method converges with constant $\alpha_k = \alpha$ when R is consistent; it converges when R is inconsistent if $\alpha_k$ gradually decreases toward 0. To allow each constraint to take effect, $\alpha_k$ needs to start with a value very close to 1 and to decrease very slowly.

However, if the decreasing rate is too small, the convergence will take too many iterations; on the other hand, if the rate is too big, the process will be biased in favor of earlier constraints.

A more principled method, named GEMA (Generalized EM Algorithm), was proposed in Ref. 2. GEMA assigns a weight $w_j$ to each constraint $R(Y^j) \in R$, with $\sum_{j=1}^{m} w_j = 1$, which can be understood as the credibility one has for the evidence. The update is again an iterative process, and it takes two steps in each iteration. As an example, consider the k-th iteration, which starts with $Q_{k-1}(X)$. In Step 1, it first uses (7-1) to compute the I-projections of $Q_{k-1}(X)$ to $P_{R(Y^j)}$ for each $R(Y^j)$, denoted $\tilde{Q}_{k,j}(X)$, and then takes a weighted sum of these I-projections to obtain a distribution $\tilde{Q}_k(X) = \sum_{j=1}^{m} w_j \tilde{Q}_{k,j}(X)$. In Step 2, GEMA first computes the marginals $\tilde{R}(Y^j) = \tilde{Q}_k(Y^j)$, then performs m steps of the standard IPFP on $Q_{k-1}(X)$ using these $\tilde{R}(Y^j)$ as constraints to obtain $Q_k(X)$. Note that these new constraints are consistent with each other since they are marginals of the same distribution $\tilde{Q}_k(X)$. It has been shown that GEMA converges to a distribution which has a minimum I-aggregate $\Psi$, the weighted sum of I-divergences to all of the original constraints in R:

\[ \Psi\big(Q(X) \mid R(Y^1), \ldots, R(Y^m)\big) = \sum_{j=1}^{m} w_j\, I\big(R(Y^j) \,\|\, Q(Y^j)\big). \qquad (11) \]

GEMA can be seen as analogous to a parallel method of alternating projection that can be used to find an approximate solution when the solution set is empty (see Eq. (35) of Ref. 4). Our experiments (see Subsection 5.3) show that the time performance of GEMA is very sensitive to the data. For some combinations of $Q_0(X) = P(X)$ and R, it converges within a few hundred iterations, but for other combinations of similar size, millions of iterations are needed.

5.1. Algorithm SMOOTH

One thing that GEMA and SR-IPFP of (10) have in common is that both of them only modify the joint distribution $Q_{k-1}(X)$ while keeping the constraints unchanged through the iterations.
Alternatively, one can make the modification bi-directional: at each iteration, not only are the joint distributions pulled closer to the constraints, but the constraints are also pulled towards the joint distributions. By doing so, the inconsistency among the constraints is gradually reduced or smoothed, which may lead to a faster convergence. Based on this idea we developed our new method SMOOTH.

The procedure of SMOOTH consists of two phases. Phase 1 performs the standard IPFP using all of the original constraints in R. It stops when the process converges (for consistent constraints) or starts to cycle (for inconsistent constraints). Phase 2, executed only when a cycle is detected at the end of Phase 1, differs from Phase 1 in that at each iteration, not only is the current distribution $Q_{k-1}(X)$ modified by the chosen constraint $R(Y^j)$, but $R(Y^j)$ itself is also modified by $Q_{k-1}(X)$. Specifically, we denote the modified constraints as $R_l(Y^j)$, with $R_0(Y^j) = R(Y^j)$ and $l = 1 + \lfloor (k-1)/m \rfloor$. At iteration k, first the constraint is modified by

\[ R_l(Y^j) = \alpha R_{l-1}(Y^j) + (1 - \alpha)\, Q_{k-1}(Y^j) \qquad (12) \]

where $\alpha \in (0, 1)$ is the smooth factor, which controls the speed of smoothing. From (12) we can see that the modified constraint $R_l(Y^j)$ is a mixture of the previous constraint $R_{l-1}(Y^j)$ and the marginal of the current distribution $Q_{k-1}(X)$. Since $Q_{k-1}(X)$ has been modified by all other constraints, (12) has the effect of pulling $P_{R_l(Y^j)}$ closer to $P_{R_l(Y^i)}$, $i \neq j$, thus reducing or smoothing the inconsistency among the constraints. To ensure that the smoothing is unbiased, $\alpha$ should be chosen very close to 1. Then $Q_{k-1}$ is modified by the new constraint by

\[ Q_k(x) = Q_{k-1}(x) \frac{R_l(y^j)}{Q_{k-1}(y^j)} \qquad (13) \]

By (12), (13) can be rewritten as

\[ \begin{aligned} Q_k(x) &= Q_{k-1}(x) \frac{R_l(y^j)}{Q_{k-1}(y^j)} = Q_{k-1}(x) \frac{\alpha R_{l-1}(y^j) + (1 - \alpha)\, Q_{k-1}(y^j)}{Q_{k-1}(y^j)} \\ &= \alpha\, Q_{k-1}(x) \frac{R_{l-1}(y^j)}{Q_{k-1}(y^j)} + (1 - \alpha)\, Q_{k-1}(x) \end{aligned} \qquad (13\text{-}1) \]

Equation (13-1) is very similar to (10) for SR-IPFP. The difference is that (10) always uses the original constraints, while in (13-1) a changed constraint is used at each iteration. It is this difference that makes SMOOTH converge with constant $\alpha$ when the constraints are inconsistent. The algorithm SMOOTH is given below.

Algorithm SMOOTH. Consider an initial distribution $P(X)$ and a set of soft evidences $R = (R(Y^1), R(Y^2), \ldots, R(Y^m))$. SMOOTH consists of the following two phases:

Phase 1: do the standard IPFP using all constraints in R until it converges or goes into cycles; if convergence is reached then exit;

Phase 2:
1. for j = 1 to m, $R_0(y^j) = R(y^j)$;
2. $k = 1$;
3. repeat the following until convergence:
   3.1 $j = 1 + (k-1) \bmod m$; $l = 1 + \lfloor (k-1)/m \rfloor$;
   3.2 $R_l(y^j) = \alpha R_{l-1}(y^j) + (1 - \alpha)\, Q_{k-1}(y^j)$;
   3.3 $Q_k(x) = Q_{k-1}(x)\, R_l(y^j) / Q_{k-1}(y^j)$;
   3.4 $k = k + 1$;
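The two phases listed above can be sketched on an explicit joint table as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: cycle detection in Phase 1 is replaced by a fixed sweep budget, convergence in Phase 2 is tested on the change of Q over one full sweep, and the prior `p`, the two constraints (whose marginals on B disagree, so they are inconsistent), and all function names are hypothetical.

```python
import numpy as np

def marginal(q, axes):
    """Marginal of joint table q over `axes`, kept broadcastable against q."""
    other = tuple(i for i in range(q.ndim) if i not in axes)
    return q.sum(axis=other, keepdims=True)

def ipfp_sweeps(q, constraints, sweeps):
    """Phase 1 stand-in: a fixed number of standard IPFP sweeps."""
    for _ in range(sweeps):
        for axes, r in constraints:
            q = q * (r / marginal(q, axes))
    return q

def smooth_phase2(q, constraints, alpha=0.99, tol=1e-10, max_steps=200_000):
    """Phase 2 of SMOOTH: smooth the constraint, then project onto it."""
    m = len(constraints)
    r = [target.copy() for _, target in constraints]   # Step 1: R_0 = R
    q_sweep = q
    for k in range(1, max_steps + 1):
        j = (k - 1) % m                                # Step 3.1 (0-based j)
        axes = constraints[j][0]
        q_marg = marginal(q, axes)
        r[j] = alpha * r[j] + (1 - alpha) * q_marg     # Step 3.2, Eq. (12)
        q = q * (r[j] / q_marg)                        # Step 3.3, Eq. (13)
        if k % m == 0:                                 # check once per sweep
            if np.abs(q - q_sweep).sum() < tol:
                break
            q_sweep = q
    return q, r

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2)) + 0.5                        # hypothetical prior over A, B, C
p /= p.sum()
# Hypothetical constraints stored in broadcastable shape.
r_ab = np.array([[0.35, 0.15], [0.35, 0.15]]).reshape(2, 2, 1)  # R(A,B), B-marginal (0.7, 0.3)
r_bc = np.array([[0.10, 0.30], [0.40, 0.20]]).reshape(1, 2, 2)  # R(B,C), B-marginal (0.4, 0.6)
constraints = [((0, 1), r_ab), ((1, 2), r_bc)]

q1 = ipfp_sweeps(p, constraints, sweeps=50)            # Phase 1 (cycles, never settles)
q_star, r_smooth = smooth_phase2(q1, constraints)      # Phase 2
```

In this toy case the smoothed constraints in `r_smooth` end up agreeing on the marginal of B, and `q_star` matches both of them, which is the behavior the two-constraint setting is meant to exhibit.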

Note that SMOOTH is exactly the same as the standard IPFP except that in Phase 2 it uses the modified constraints, not the original ones, to update the current $Q_{k-1}$. This makes SMOOTH directly applicable to BN belief update in the BN-IPFP style. The only thing that needs to be changed when applying SMOOTH to a BN is to replace the operation of I-projection (Step 3.3 in Phase 2) by the virtual evidence method of BN-IPFP-1 (Steps 2.2 and 2.3) of Sec. 4. Next we investigate the convergence of SMOOTH.

5.2. Convergence and performance of SMOOTH

According to the algorithm, when the set of constraints is consistent, SMOOTH reduces to the standard IPFP and converges in Phase 1. Next we discuss what happens when the constraints are not consistent.

Figure 3 shows an example involving four constraints (m = 4), where $S_j(X) = P_{R(Y^j)}(X)$ is the set of all distributions whose marginal on $Y^j$ equals $R(Y^j)$. At the end of Phase 1, a cycle (solid lines) is formed through $Q_{0,1}, Q_{0,2}, Q_{0,3}, Q_{0,4}$. In the first iteration of Phase 2, the constraint $R_0(Y^1) = R(Y^1)$ is modified to $R_1(Y^1)$ by (12). This changes $S_{0,1}(X)$ to $S_{1,1}(X)$, which is closer to $Q_{0,4}$ than $S_{0,1}(X)$. As the process continues, the sets $S_{l,j}(X)$ move closer to each other, and the cycles (dotted lines) formed by the resulting distributions become smaller and smaller until they merge into a single distribution.

Fig. 3. Example showing the convergence of SMOOTH.

We formally establish the convergence of SMOOTH for m = 2 in the next theorem.

Theorem 7. For an initial distribution $P(X)$, two inconsistent soft evidences $R(Y^1)$, $R(Y^2)$, and $\alpha \in (0, 1)$, Phase 2 of SMOOTH converges.

Experiments show that Phase 2 of SMOOTH converges for m > 2, and that when $\alpha \to 1$ the converging distribution $Q^*$ minimizes the sum of distances, in both I-divergence and total variation, to all constraints in R. We leave this general claim as a conjecture.

The time performance of SMOOTH, like that of all IPFP-based methods, depends on the number of iterations it takes to reach convergence. Experiments show that SMOOTH moves towards the convergence point fairly fast at the beginning, even with $\alpha$ very close to 1. However, it slows down drastically at the end, forming a long and flat tail (see Fig. 4, where 90% of the time is spent bringing the flat tail to the convergence point). As discussed before, keeping $\alpha$ large at the beginning ensures that the information in the original constraints is not lost too soon by smoothing before it gets a chance to be absorbed. When the process gets closer to the convergence point, we can afford to use a smaller $\alpha$, since most of the information in the original constraints that can be absorbed has already been absorbed. By (12), a smaller $\alpha$ pulls the constraints toward the current $Q_{k-1}$ faster, leading to a faster convergence at the end.

We have experimented with a number of schedules for reducing $\alpha$. The one that performed best is the sigmoid function:

\[ \alpha_k = \frac{\exp(A - k/B)}{1 + \exp(A - k/B)} \qquad (14) \]

where k is the iteration step of Phase 2. It can be seen from (14) that with a large positive A, $\alpha_k$ is close to 1 at the beginning (when k is small) and close to 0 when k becomes very large, and that $\alpha_k$ decreases very slowly at the two ends but fast in the middle. Parameter A controls how long $\alpha_k$ remains large (longer for larger A), and B controls how fast $\alpha_k$ decreases in the middle (faster for smaller B). If the desired initial value $\alpha_0$ is given, then A can be determined by $\alpha_0 = 1/(1 + \exp(-A))$. For example, to have $\alpha_0 \approx 0.99$, we set A = 4.595. We call SMOOTH using (14) to reduce $\alpha_k$ Accelerated SMOOTH (A-SMOOTH for short). Replacing $\alpha$ by $\alpha_k$ in (13-1), when $k \to \infty$ we have $\alpha_k \to 0$ and hence $Q_k(x) \to Q_{k-1}(x)$; therefore, A-SMOOTH converges.

5.3. Experiments

To empirically validate algorithm SMOOTH and to get a sense of how well it performs in comparison to the existing methods, we have conducted computer experiments with different initial distributions and different constraints.
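As a preliminary to these experiments, the sigmoid schedule (14) used by A-SMOOTH can be tabulated directly. The sketch below assumes the form of (14) given above; the function names are hypothetical, and the defaults A = 4.595 and B = 50 are the settings reported for A-SMOOTH in the experiments.

```python
import math

def alpha_schedule(k, A=4.595, B=50.0):
    """Smooth factor alpha_k of Eq. (14) at Phase 2 step k."""
    z = A - k / B
    return math.exp(z) / (1.0 + math.exp(z))

def a_for_initial_alpha(alpha0):
    """Solve alpha_0 = 1 / (1 + exp(-A)) for A."""
    return math.log(alpha0 / (1.0 - alpha0))

# A few points on the schedule: flat near 1, steep in the middle, flat near 0.
samples = {k: alpha_schedule(k) for k in (0, 100, 230, 400, 1000)}
```

With these settings the schedule crosses 0.5 at k = A * B, roughly step 230, so larger A delays the drop and smaller B makes it steeper.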
The algorithms compared in the experiments include: (1) GEMA, (2) SR-IPFP, (3) SMOOTH, (4) A-SMOOTH. For SR-IPFP, we use $\alpha_k = 1/(1 + k)$ in (10), which is the fastest schedule for reducing $\alpha_k$ suggested by the authors (Ref. 20). For SMOOTH we set $\alpha = 0.99$ in Phase 2, and for A-SMOOTH we set A = 4.595 and B = 50.

We use the number of I-projections instead of the number of iterations to measure the time performance of an algorithm, because an iteration may involve different numbers of I-projections for different algorithms. For example, the number of I-projections in one iteration is m for our SMOOTH and 2m for GEMA (m for each of the two steps). In all our experiments, convergence is considered reached if, at iteration $k = l \cdot m$, the sum of total variations between consecutive distributions over the next m I-projections is within the given error bound.

Experiment 5-1 uses the data taken from Ref. 2, involving three variables $X_1, X_2, X_3$. The initial joint distribution JPD1 is a uniform distribution of the three variables. Three constraints, each a distribution of two variables, are generated according to the scheme in Table 4. These constraints are consistent with each other when $\varepsilon = 4/20$ (called CONS0) and inconsistent when $\varepsilon = 3/20$ (called CONS1).

Table 4. Constraint generator.

  P_j(X_j, X_{j+1}), j = 1, 2:
                 X_{j+1} = 0    X_{j+1} = 1
    X_j = 0      1/2 - ε        ε
    X_j = 1      ε              1/2 - ε

  P_3(X_3, X_1):
                 X_1 = 0        X_1 = 1
    X_3 = 0      ε              1/2 - ε
    X_3 = 1      1/2 - ε        ε

The experiment results for the consistent constraints CONS0 are given in Table 5. All three algorithms converged to the same I-projection on $S = P_{R(y_1)}(x) \cap P_{R(y_2)}(x) \cap P_{R(y_3)}(x)$. SMOOTH is significantly faster than the other two. This is because for consistent constraints SMOOTH reduces to the standard IPFP (only Phase 1 is executed).

Table 5. Experiment 5-1 results for CONS0 (ε = 4/20).

  Algorithm        GEMA        SR-IPFP     SMOOTH
  # projections    64          3507        84
  I-divergence     0.045386    0.045386    0.045386

Experiment 5-2 compares the performance with the inconsistent constraints CONS1, in which every two constraints are consistent with each other, but together they are inconsistent with the third one. Besides JPD1, another initial joint distribution, JPD2, is also used. The experiment results are given in Table 6, where for the two versions of SMOOTH the numbers of I-projections for both phases are given. It can be seen from the I-divergences of the converging distributions to the initial distributions and from the I-aggregates that GEMA, SMOOTH, and A-SMOOTH converge to distributions that are very close to each other, with A-SMOOTH significantly faster than the others (SR-IPFP was stopped when the time limit of 10 million I-projections was reached before convergence).

Table 6. Experiment 5-2 results for inconsistent CONS1 (ε = 3/20).

  Algorithm   JPD      # projections    I-divergence    I-aggregate
  GEMA        JPD-1    7,744,446        0.450243        0.0036769
              JPD-2    9,064,080        0.7979040       0.0572799
  SR-IPFP     JPD-1    >10,000,000      0.37048603      0.0046839
              JPD-2    >10,000,000      0.7027029       0.05742945
  SMOOTH      JPD-1    77+3825          0.4503774       0.0036772
              JPD-2    29+4899          0.7306584       0.0572920
  A-SMOOTH    JPD-1    77+375           0.450389        0.00367227
              JPD-2    29+402           0.7439294       0.05729532

We plot the I-aggregates of all four algorithms for JPD1 in Fig. 4. The plot starts at the 78th I-projection, which is the beginning of Phase 2 of SMOOTH and A-SMOOTH, and ends at the 4200th I-projection. It is clear that the I-aggregate decreases fastest for A-SMOOTH, followed by SMOOTH, with SR-IPFP the slowest.

Fig. 4. Plot of I-aggregates of the four algorithms.

Experiment 5-3. To see that GEMA is data sensitive, we generated another set of 3 constraints (CONS2), each of which also involves two of the three variables $X_1, X_2, X_3$. Unlike CONS1 shown in Table 4, CONS2 is pair-wise inconsistent. The results using CONS2 against JPD1 and JPD2 are given in Table 7. It can be seen from Tables 6 and 7 that GEMA is very slow for three of the four combinations of JPDs and constraints but very fast (780 I-projections) for one combination (JPD1 + CONS2). Similar phenomena have also been observed in some of our other experiments. On the other hand, both versions of SMOOTH have uniform performance for all combinations.

Table 7. Experiment 5-3 result for CONS2.

  Algorithm    GEMA         SR-IPFP        SMOOTH
  JPD1         780          >10,000,000    54+3405
  JPD2         2,400,542    >10,000,000    26+3933

Experiment 5-4 tests the scalability of these algorithms with larger JPDs of 8 and 15 variables. The results shown in Tables 8 and 9 are consistent with those reported earlier for the smaller JPD. For these experiments we did not run SR-IPFP because it took too much time to reach a point that was close to convergence.

Table 8. Result for JPD of 8 variables and 4 inconsistent constraints.

  Algorithm    # projections    I-divergence    I-aggregate
  GEMA         92               0.93720845      0.02345286
  SMOOTH       48+5000          0.94365473      0.0234722
  A-SMOOTH     48+568           0.94366720      0.0234724

Table 9. Result for JPD of 15 variables and 4 inconsistent constraints.

  Algorithm    # projections    I-divergence    I-aggregate
  GEMA         67               0.4597234       0.0340749
  SMOOTH       736+5460         0.4597849       0.03408528
  A-SMOOTH     736+584          0.45989650      0.0340846

Finally, we conducted an experiment to compare the performance of belief updates on full joint distributions and on BNs. The experiment reported in Table 10 used a BN of 14 binary variables and 4 inconsistent constraints involving a total of 7 distinct variables. Both GEMA and SMOOTH were run on the full joint distribution (of $10^4$ entries) generated from this BN. The SMOOTH version of BN-IPFP-1 was run directly on the BN. As can be seen in Table 10, belief updates on the full JPD are several orders of magnitude slower than those on the BN. When these constraints were modified to be consistent, the convergence time for the standard IPFP on the full JPD was 27 seconds, while the time for BN-IPFP-1 was only 0.323 second.

Table 10. Result for inconsistent constraints: Full JPD vs BN.

  Algorithms                # projections    Time
  Full JPD using GEMA       784              459s
  Full JPD using SMOOTH     2769             887s
  SMOOTH on BN-IPFP-1       380              0.656s

Recall (Table 6) that GEMA took more than 7 million I-projections to converge in Experiment 5-2 to modify the belief in a tiny JPD of only three variables. We applied the SMOOTH version of BN-IPFP-1 to the same task after first converting the original JPD to a BN of three nodes, and, much to our surprise, it took only 102 I-projections to converge. Although anecdotal, these results clearly demonstrate significant computational advantages of using BNs to represent joint distributions and the practical value of BN-based belief update methods such as the algorithms we developed in this work.

6. Conclusions

In this paper we presented our results on Bayesian network belief update with uncertain evidences. We defined two types of uncertain evidences.
The virtual evidence, given as a likelihood ratio, represents the uncertainty one has for an observation, and it requires the

likelihood ratio be preserved in the updated BN. The soft evidence, given as a distribution over one or more variables, represents the uncertainty of an event one is observing, and it requires this distribution be preserved in the updated BN. After establishing the close relations between Pearl's virtual evidence method, Jeffrey's rule, and the I-projection, we developed efficient algorithms for BN belief updates with multiple soft evidences. One advantage of BN-IPFP-1, in contrast to some existing methods, is that it can easily work with any BN inference engine. BN-IPFP-2 can provide efficient computation when the number of variables involved in the soft evidences is small. Algorithm SMOOTH was developed by modifying the standard IPFP to support belief update with inconsistent evidences. The convergence of these algorithms was analyzed, and experiments of limited scale were conducted to validate these algorithms and to demonstrate their effectiveness. In addition, we formally established, for the first time, that Equation (6), which is used to compute the I-projection in IPFP, minimizes not only the I-divergence but also the total variation between the source and the projected distributions.

BN belief update may be subject to multiple evidences of different types (hard, virtual, and soft), and these evidences may arrive at the same time or at different times. Our BN-IPFP-1 is flexible enough to support such inference. When all evidences arrive at the same time, or hard and virtual evidences arrive before soft evidences, one can first update the beliefs with the given hard and virtual evidences using the conventional BN inference methods and then apply BN-IPFP-1 on the updated BN. A hard or virtual evidence arriving after soft evidences have been absorbed will change the beliefs in the BN. If this change causes $Q(Y^j) \neq R(Y^j)$ for any soft evidence $R(Y^j)$ (i.e., $L_{j,l}(Y^j) \neq 1 : 1 : \cdots : 1$ in Step 2.2 of BN-IPFP-1), then BN-IPFP-1 is activated and the iterations are renewed until convergence.
As mentioned earlier, one can use a virtual evidence to represent the doubt one has about a hard evidence; this can also be applied when one is in doubt about a soft evidence. Recall that in our approach, a soft evidence $R(Y^j)$ is first converted into a virtual evidence with a virtual node U. If our doubt about $R(Y^j)$ can be represented as a likelihood L(U), then we can create another virtual node V with U as its only parent and its CPT determined by L(U). Then instantiating V to true will apply $R(Y^j)$ with the uncertainty of L(U) to the BN.

We are continuing our research effort in this fruitful area along several directions. Our proof of convergence of SMOOTH is only done for the case of two inconsistent constraints; we are actively working on generalizing it to an arbitrary number of constraints. Our experiments show that SMOOTH has a uniform time performance, while GEMA is data sensitive and sometimes converges much faster than SMOOTH. We are examining the factors that may cause these performance differences, hoping to find a way to use some of the findings to improve efficiency. We realized that GEMA, although originally devised for general joint distributions, may be adapted to BNs. We are working on developing a BN version of the GEMA algorithm.

In this work, we considered the constraints $R(y^j)$ as soft evidences that modify the current beliefs. These low-dimensional distributions can also be pieces of new knowledge which are more up-to-date, more accurate, or more location specific, and absorbing these into a larger distribution is a process of knowledge integration or knowledge-base update. In the