Accepted for the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96) August 1-3, 1996, Portland, Oregon, USA

Computational complexity reduction for BN2O networks using similarity of states

Alexander V. Kozlov
Department of Applied Physics, Stanford University, Stanford, CA

Jaswinder Pal Singh
Department of Computer Science, Princeton University, Princeton, NJ
e-mail: jps@cs.princeton.edu

Abstract

Although probabilistic inference in a general Bayesian belief network is an NP-hard problem, computation time for inference can be reduced in most practical cases by exploiting domain knowledge and by making approximations in the knowledge representation. In this paper we introduce the property of similarity of states and a new method for approximate knowledge representation and inference which is based on this property. We define two or more states of a node to be similar when the ratio of their probabilities, the likelihood ratio, does not depend on the instantiations of the other nodes in the network. We show that the similarity of states exposes redundancies in the joint probability distribution which can be exploited to reduce the computation time of probabilistic inference in networks with multiple similar states, and that the computational complexity in networks with exponentially many similar states might be polynomial. We demonstrate our ideas on the example of a BN2O network (a two-layer network often used in diagnostic problems) by reducing it to a very close network with multiple similar states. We show that the answers to practical queries converge very fast to the answers obtained with the original network. The maximum error is as low as 5% for models that require only 10% of the computation time needed by the original BN2O model.

1 INTRODUCTION

A Bayesian belief network is a directed acyclic graph (DAG) whose nodes represent random variables and whose edges represent dependences between the random variables. Belief networks are used for knowledge representation in diagnostic and forecasting software systems. Belief networks allow the user to answer queries about the probabilities of the states of one or several nodes, called query nodes, conditioned on other nodes, called evidence nodes. The process of finding these conditional probabilities is called probabilistic inference. Probabilistic inference is NP-hard for a general network with an arbitrary structure [Cooper, 1990]. Furthermore, even approximating inference in a general belief network is NP-hard [Dagum and Luby, 1993]. However, knowledge of the problem domain can help to reduce the computation time of probabilistic inference. For example, Heckerman showed that probabilistic inference is linear in two-layer networks with noisy-OR interaction between nodes (BN2O networks) for negative evidence about nodes [Heckerman, 1989], and Heckerman and Breese showed that the noisy-OR interaction between nodes can be further simplified to reduce the number of parents per node, which reduces computational complexity for networks with special structure [Heckerman and Breese, 1994]. Thus, the computational complexity of probabilistic inference can be managed in special cases. In this paper we propose a new way of simplifying probabilistic inference in belief networks based on the property of similarity of states. Two or more states of a node are similar when the likelihood ratio does not depend on the instantiations of the other nodes in the network. The probability of one of these states determines the probabilities of all states similar to it. If a model contains states that are almost similar, we can force the states to be similar and make probabilistic inference less computationally expensive. If we can make exponentially many states similar, the resulting computational complexity of probabilistic inference in the new model is polynomial in the size of the network.
We call the new method state space aggregation since we explicitly aggregate states into groups and then perform probabilistic inference with them as a group. We demonstrate the new method on two examples of BN2O networks: one is a randomly generated BN2O network and the other is a BN2O network with noisy-OR coefficients randomly chosen from the practically important CPCS medical diagnostic network (the original CPCS network was constructed out of a Computer-based Patient Case Simulation database by R. Parker and R. Miller [Parker and Miller, 1987]).¹ We first convert a BN2O network to a cluster tree. A general form of the cluster tree for the BN2O network is a "Naive" Bayesian classifier with one large node representing all nodes in the first layer and many children representing nodes in the second layer. Although the resulting cluster node has exponentially many states, we can aggregate most of these states into a group of similar states. We then perform probabilistic inference with these states as a group. We show that the resulting model provides a good estimate of the probabilistic inference results as compared to the original BN2O model and is better than the state space abstraction method [Wellman and Liu, 1994] used for approximate inference.

The paper is organized as follows. In Section 2 we clarify the notations and conventions we use throughout the paper. In Section 3 we introduce BN2O networks and review the approaches that make probabilistic inference in them tractable. In Section 4 we define the property of similarity and develop our approach, based on the modification of the original network to make a large subset of states similar. In Section 5 we demonstrate our ideas on the example of our randomly generated BN2O networks. In Section 6 we discuss the relation of the current technique to previous work. Finally, we conclude in Section 7.

¹The CPCS network is actually not quite a two-layer network. While several BN2O networks are similar to CPCS and are used in practice, they are proprietary and were not accessible for this paper.

2 NOTATIONS

In this paper, we use a special notation for a set of simple nodes in the original network. While we denote an individual simple node by a small letter x, with a possible subscript when further distinction is required, we denote a set of nodes by a capital letter X, again with a possible subscript. We call a set of simple nodes a cluster node, since we can always represent a set of nodes as a single node which takes values from an expanded set of values: the set obtained by taking a direct product of the sets of values of the original nodes. A superscript of X, if it appears, denotes a particular state of the cluster node (i.e. a particular combination of states of the simple nodes in the set). We denote the number of simple nodes in the cluster node as N(X) and the size of the state space as |X|. In the networks considered in this paper (BN2O networks) all individual nodes are binary, i.e. can take only the two values false and true. For the binary nodes, we assume that if x is in the state false, then the variable x is zero, and if x is in the state true, then the variable x is one. We use the short form p(\bar{x}) to denote p(x = false) and p(x) to denote p(x = true). For the cluster node, we use the short form p(X^s) to denote p(X = X^s). Before we introduce the similarity of states property, let us consider BN2O networks further.
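As a concrete illustration of the cluster-node notation (this sketch is our addition; the node names are hypothetical), the state space of a cluster node over binary simple nodes is the direct product of the simple nodes' state sets:

```python
from itertools import product

# A cluster node X over binary simple nodes takes values from the direct
# product of the simple nodes' state sets, so N(X) nodes give |X| = 2^N(X).
simple_nodes = ["x1", "x2", "x3"]
cluster_states = list(product([False, True], repeat=len(simple_nodes)))

N_X = len(simple_nodes)        # N(X) = 3
size_X = len(cluster_states)   # |X| = 2**3 = 8
print(N_X, size_X, cluster_states[5])   # one state X^s: (True, False, True)
```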
3 BN2O NETWORKS

A BN2O network is a two-layer network consisting of binary nodes with a noisy-OR dependence between the nodes in the first and the second layers. Let us take a medical diagnostic network as an example. In this network, the nodes in the first layer are diseases and the nodes in the second layer are symptoms (called findings in the medical literature). The nodes in the first layer (diseases) have one or several children in the second layer (findings). The noisy-OR interaction between the diseases and the findings describes a causal independence assumption, i.e. that the ability of any single disease to cause a given symptom does not depend on the presence of the other diseases [Pearl, 1988].

A noisy-OR dependence between a finding node x_f and its n parent disease nodes x_{d_j} can be characterized by (n + 1) real numbers from the interval [0, 1]: a leak and n coefficients. The leak, which we denote Leak(f), is the probability of the finding node in the absence of any of the diseases described by the network. A coefficient, which we denote c_j, describes the ability of a disease d_j to cause an increase in the probability p(f) of the finding f. More precisely, the probability of the finding being absent, p(\bar{f}), is multiplied by (1 - c_j) each time a parent x_{d_j} of the node x_f changes its state from false to true. We can write the noisy-OR conditional probability in a closed form:

    p(\bar{f} | x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = [1 - Leak(f)] \prod_j [1 - c_j x_{d_j}],    (1)

    p(f | x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = 1 - [1 - Leak(f)] \prod_j [1 - c_j x_{d_j}],    (2)

where we use our convention that x_{d_j} is zero if the node x_{d_j} is in the state false and one if it is in the state true. Trivially, if c_j is zero, the state of the parent does not affect the probability of the child, and if c_j is one, the true state of the parent forces the child to be true with probability one. An extension of the noisy-OR interaction to multiply-valued nodes is possible and is called noisy-MAX [Pradhan et al., 1994]. In the noisy-MAX interaction, relations identical to (1) and (2) hold for the combined probability of the first k states of the child node. BN2O networks have been an object of study for a long time due to their potential applicability and due to the existence of a compact form for the noisy-OR conditional probabilities. Knowledge acquisition for a BN2O network is also simplified, since an expert has to assess only a small number of parameters, linear in the number of parents, to completely characterize the dependence.
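A minimal Python sketch of Eqs. (1) and (2) (our addition; the leak and coefficient values are illustrative, not from the paper):

```python
import numpy as np

def p_finding_true(parents, coeffs, leak):
    """p(f | x_d1..x_dn) for binary parent states (0/1), per Eqs. (1)-(2)."""
    parents = np.asarray(parents)
    p_absent = (1.0 - leak) * np.prod(1.0 - coeffs * parents)  # Eq. (1)
    return 1.0 - p_absent                                      # Eq. (2)

coeffs = np.array([0.9, 0.3, 0.0])   # c_j: per-disease strengths
print(p_finding_true([0, 0, 0], coeffs, leak=0.02))  # only the leak: 0.02
print(p_finding_true([1, 1, 0], coeffs, leak=0.02))  # 1 - 0.98*0.1*0.7
```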

Also, probabilistic inference with BN2O networks has polynomial complexity in some special cases. For example, for negative findings the nodes in the first layer remain conditionally independent, and the disease probabilities can be obtained by summation of the probability distribution for the disease nodes:

    p(x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = \prod_i [1 - Leak(f_i)] \prod_j [1 - c_{ij} x_{d_j}] p(x_{d_j}),    (3)

where the first product is over the negatively instantiated findings. The computational complexity of probabilistic inference with negative evidence is linear in the size of the network, since the above probability distribution (3) is easily decomposable into factors of the form \prod_i [1 - c_{ik} x_{d_k}] p(x_{d_k}). The new probabilities of the disease d_k after instantiation are:

    p^{(1)}(\bar{d}_k) = p(\bar{d}_k) / { p(\bar{d}_k) + \prod_i [1 - c_{ik}] p(d_k) },

    p^{(1)}(d_k) = \prod_i [1 - c_{ik}] p(d_k) / { p(\bar{d}_k) + \prod_i [1 - c_{ik}] p(d_k) }.

Inference for positive evidence about nodes is more involved. We have to evaluate sums over:

    p(x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = \prod_i { 1 - [1 - Leak(f_i)] \prod_j [1 - c_{ij} x_{d_j}] } p(x_{d_j}),

where the first product is over the positively instantiated findings. The probability of each of the diseases cannot be taken out of the summation easily for this sum, and we have to evaluate it by expanding the outermost product in the above expression into a sum of products:

    p(x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = \sum_{X_E} (-1)^{\sum_i x_{f_i}} \prod_{i: x_{f_i} = 1} [1 - Leak(f_i)] \prod_j [1 - c_{ij} x_{d_j}] p(x_{d_j}),    (4)

where we sum over all possible instantiations of the cluster node X_E consisting of the evidence nodes. Each of the terms in the sum (4) has a structure identical to (3), and its contribution to the disease probabilities can be computed in linear time. However, the total number of terms in the sum is exponential in the number of evidence nodes, and the computational complexity of probabilistic inference with positive evidence is exponential in the number of evidence nodes [Heckerman, 1989]. The same complexity results can be obtained by considering topological transformations of networks with noisy-OR interactions [Heckerman and Breese, 1994]. The transformation can reduce the number of parents for a node in a network in special cases and is computationally equivalent to the above decomposition of the joint probability sums (4).

Beyond these simplifications, probabilistic inference in a general BN2O network (as well as in the practical QMR-DT network [D'Ambrosio, 1994]) is computationally intractable for general positive evidence. In the practical QMR-DT network, users have to apply heuristic search or stochastic simulation methods to obtain approximate results. These methods are unpredictable and sometimes fail to produce a satisfactory error bound on the result in time-critical situations. In this case, we need to simplify the original model so that it deterministically produces a satisfactory answer in a known, fixed amount of time. Let us now introduce the similarity of states, which we will use later to construct a model with polynomial complexity that is very close to the BN2O model.

4 SIMILARITY OF STATES

In this section, we introduce the similarity of states property, which we use later for our model reduction. The similarity of states is a form of independence in the network. It exposes a redundancy in the joint probability distribution and therefore can be used to make probabilistic inference faster.

Definition 4.1: The states X^s and X^{s'} of a node X are similar with respect to a node X_j if the ratio of the probabilities p(X^s)/p(X^{s'}) is invariant with respect to any instantiation of the node X_j:

    p^{(k)}(X^s) / p^{(k)}(X^{s'}) = p(X^s) / p(X^{s'}) = \mathrm{const}    (5)

for all instantiations X_j^k of the node X_j. We call two states of a node simply similar if the two states are similar with respect to all the other nodes.
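A small sketch of Definition 4.1 in Python (our addition; the joint distribution below is a made-up toy table): two states of X are similar with respect to X_j exactly when their probability ratio survives every instantiation of X_j.

```python
import numpy as np

# Toy joint distribution D[q, p] = p(X_j^q, X^p) (made-up numbers).
D = np.array([[0.10, 0.20, 0.12],
              [0.15, 0.30, 0.13]])

def similar(D, p1, p2, tol=1e-9):
    """Check that the ratio p(X^p1)/p(X^p2) is invariant across rows."""
    ratios = []
    for q in range(D.shape[0]):        # positive instantiation X_j = X_j^q
        row = D[q] / D[q].sum()        # renormalized p(X | X_j = X_j^q)
        ratios.append(row[p1] / row[p2])
    return np.ptp(ratios) < tol        # all ratios equal -> similar

print(similar(D, 0, 1))   # True:  column 0 is proportional to column 1
print(similar(D, 0, 2))   # False: the ratio changes between the rows
```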
The probability of one of the similar states determines the probabilities of the others. Let us take an example of car diagnosis. Given that a car doesn't start, the fact that the fuel tank is full increases the probability that one of the spark plugs doesn't work. However, we can treat the probabilities that each one of the spark plugs failed as similar: unless we look under the hood, the probability that one of the spark plugs has failed determines the probabilities of the failure of any other spark plug, and the likelihood ratio of the spark plug failure probabilities stays the same. This independence information was brilliantly used for the construction of similarity networks [Heckerman, 1990]. A similarity network is a construction consisting of a similarity graph and a collection of local knowledge maps corresponding to each edge in the similarity graph. Similarity networks were developed to simplify the construction of large and complex belief networks. They are a result of recognizing "specific forms of conditional independence" and developing special representations for them that simplify the knowledge acquisition. To build a similarity network, we first pick a distinguished node representing the hypotheses to be chosen from (for example, in a medical diagnosis problem, the hypotheses are the diseases). A node in a similarity graph is a hypothesis, and an edge indicates two hypotheses that are likely to be confused by an expert. For each such pair of hypotheses, we build a local knowledge map. A local knowledge map is a belief network for distinguishing between these two hypotheses. By focusing on constructing local knowledge maps, a person can concentrate on one manageable portion of the modeling task at a time.

Our goal, on the other hand, is to simplify probabilistic inference in complex belief networks. We do this by identifying redundancies in the joint probability distribution. The redundancy considered in this paper is the similarity of states and is related to the same "specific forms of conditional independence" as the similarity networks developed earlier. The local knowledge maps constructed by a knowledge engineer might, in fact, be used for identifying similar states: a local knowledge map would contain all nodes with respect to which the pair of hypotheses is not similar. In our definition, the concept of similarity of states is more general and can be applied to any node in the network, not just the distinguished node representing the hypotheses to be chosen from. We demonstrate how this concept can be applied for model reduction in the case of BN2O networks. The following theorem shows the redundancy in the joint probability distribution introduced by the similarity of states:

Theorem 4.2: The states X^s and X^{s'} of a node X are similar with respect to a node X_j if and only if the columns of the conditional probability matrix (P_j)_{qp} = p(X_j^q | X^p) corresponding to these two states s and s' are identical, if and only if the columns of the probability distribution matrix (D_j)_{qp} = p(X_j^q, X^p) corresponding to the states s and s' are linearly dependent.

The proof of the theorem is easy, and follows from the decomposition of the probability distribution for the two nodes into a product (D_j)_{qp} = p(X_j^q | X^p) p(X^p). In this form, any positive instantiation of the node X_j in the state X_j^k can be represented as removing all rows except k from the matrix D_j. After the remaining row is normalized, the probability p(X^p) can be read off from this row. Any negative instantiation of the node X_j in the state X_j^k can be represented as removing the row k from the matrix D_j. After the remaining probability distribution is normalized, the probability p(X^p) can be obtained by summing all numbers in the p-th column. The proof is obtained by considering all possible instantiations of X_j. Thus, the similarity of states uncovers a redundancy in the joint probability distribution. In linear algebra terms, two or several columns of the matrix representing the joint probability distribution are linearly dependent if the corresponding states are similar. If the columns are close to linearly dependent, we can approximate the joint probability distribution to make the states similar in order to simplify probabilistic inference. The theorem shows how we can introduce the similarity of states via conditional probabilities: we aggregate some of the states with almost identical conditional probability matrix columns and force them to be similar by assigning the same column to every one of these states. Although for a general joint probability distribution the computational complexity of probabilistic inference is linear in the total number of states per any given node, the computational complexity of probabilistic inference with similar states is linear only in the total number of states that are not similar. By constructing models with exponentially many similar states we can reduce computational complexity from exponential to polynomial in some networks.
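To make this enforcement step concrete, here is a small sketch (our addition; all numbers are hypothetical) that assigns one shared conditional-probability column, chosen as the prior-weighted average, to a group of almost-identical columns, which makes the grouped states exactly similar while preserving the marginal of X_j:

```python
import numpy as np

P = np.array([[0.70, 0.68, 0.20],          # P[q, p] = p(X_j^q | X^p)
              [0.30, 0.32, 0.80]])
prior = np.array([0.5, 0.3, 0.2])          # p(X^p)
group = [0, 1]                             # almost-identical columns

w = prior[group] / prior[group].sum()
shared = P[:, group] @ w                   # prior-weighted average column
P_forced = P.copy()
P_forced[:, group] = shared[:, None]       # now columns 0 and 1 coincide

# The marginal p(X_j) is preserved by the prior-weighted choice:
print(P @ prior, P_forced @ prior)         # both give the same p(X_j)
```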
In the next section we show that the precision of the reduced model as compared to the original model can be quite satisfactory in these cases.

5 EXAMPLE OF STATE SPACE AGGREGATION

We demonstrate the application of our state space aggregation method on the example of BN2O networks. We assume that a BN2O network has n_1 binary nodes in the first layer and n_2 binary nodes in the second layer, and that every node x_{d_i} in the first layer is connected to every node x_{f_j} in the second layer.² First, we describe our procedure and then compare the results for our reduced model to the results for a full BN2O network.

²Although sparse interconnection reduces the applicability of this method compared to the methods based on topological decomposition [D'Ambrosio, 1994; Heckerman and Breese, 1994], we will show that state space aggregation produces satisfactory results even for sparse BN2O networks. A combination of the topological method and methods based on state space aggregation is definitely possible but not considered in this paper.

5.1 FORMALISM

We proceed by combining all nodes in the first layer into one large cluster node X_D representing all possible diseases and their combinations. Node X_D has an exponential number of states, 2^{n_1}, and we hope that some of these states can be made similar. We therefore partition the 2^{n_1} states into two subsets: one is the subset of N_b base states, which we denote as S_b, and the other is the subset of N_s = |X_D| - N_b similar states, which we denote as S_s. We will exploit different strategies for choosing the subset of states which we force to be similar (see subsections 5.2 and 5.3). According to the definition of similar states (5), the contribution to the disease probabilities from the set of similar states S_s is a constant factor times the combined probability of the similar states p(\tilde{X}_D). The posterior probability of a disease is then computed as:

    p^{(1)}(d_i) = \sum_{X_D^s \in S_b} p^{(1)}(X_D^s) x_{d_i} + \alpha(d_i) p^{(1)}(\tilde{X}_D),

where the first term is the contribution to the disease probability from the base states and the second is the contribution to the disease probability from the similar states. The coefficients \alpha(d_i) can be obtained from the prior probabilities:

    \alpha(d_i) = [ p(d_i) - \sum_{s \in S_b} p(X_D^s) x_{d_i} ] / [ 1 - \sum_{s \in S_b} p(X_D^s) ],

and are computed in linear time given the conditionally independent first-layer nodes in the original network. The conditional probabilities for the base states match the conditional probabilities of the corresponding states in the original BN2O model. The conditional probability for the similar states, which has to be identical for every similar state, is chosen to preserve the prior probabilities of the findings:

    p(f_j | \tilde{X}_D) = [ p(f_j) - \sum_{s \in S_b} p(f_j | X_D^s) p(X_D^s) ] / [ 1 - \sum_{s \in S_b} p(X_D^s) ].    (6)

The last equation is an application of Bayes' rule for a finding node x_{f_j} and the aggregated similar state. As we can see, the computation to transform the model to a reduced model involves simple summations over the base states. The computational requirements in this state aggregation model are thus the same as in the state space abstraction model [Wellman and Liu, 1994], in which the answer to a query is inferred by summing over the base states only and completely ignoring the rest of the states. Our model accounts for some of the ignored probability mass via the coefficients \alpha(d_i). Let us now see how the state space aggregation model helps to increase the precision of probabilistic inference in BN2O networks.

5.2 RANDOMLY GENERATED BN2O NETWORKS

To demonstrate how the state aggregation model can help improve the precision of the model and reduce the computation time of probabilistic inference, we first generated a BN2O network with randomly chosen noisy-OR coefficients drawn from a Beta(2, 4) distribution (with the expected value \langle c_{ij} \rangle = 1/3). If a large number of nodes in the first layer are in the state true, we expect that the probability of any finding is close to one. Thus, we make similar all states of the cluster node X_D in which the number of diseases present is larger than some fixed number d_max:

    X_D^s \in S_s if \sum_{i=1}^{N(X_D)} x_{d_i} > d_max.

The number of base states in this case is polynomial in the number of first-layer nodes:

    N_b = \sum_{d=0}^{d_max} \binom{n_1}{d} \sim n_1^{d_max}

for d_max > 1, and the reduction of the original BN2O model to the model with the aggregated similar states has polynomial complexity. First, we analyzed the maximum absolute error for queries about the probability of each of the diseases (nodes in the first layer) given different instantiations with different numbers of positive findings (nodes in the second layer). The results of the simulations for the BN2O network consisting of 18 nodes in the first layer and 18 nodes in the second layer are presented in Figures 1 and 2 (for the state space abstraction and the state space aggregation models, respectively). The error in the state space aggregation model is much smaller (about an order of magnitude for high d_max) than the error in the state space abstraction model, in which the set S_s is completely disregarded. Also, the error for the instantiations with a large number of findings present (the region where probabilistic inference is computationally very expensive) is almost independent of the number of diseases present. Since the errors introduced by the instantiations of different findings can be considered independent, we expect the maximum error to be O(\sqrt{N(X_E)}), i.e. to grow as the square root of the number of instantiated nodes.
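The following Python sketch (our addition; the toy model's sizes, priors, leak, and coefficients are all made up) carries out the reduction for the d_max rule: it enumerates the base states, computes the coefficients \alpha(d_j), and applies Eq. (6) to obtain the shared conditional probability of the aggregated state:

```python
import numpy as np
from itertools import product

# Toy BN2O model: n1 diseases, 3 findings, uniform priors and coefficients.
n1, d_max = 4, 1
prior = np.full(n1, 0.1)                    # p(d_j)
leak = 0.01
c = np.full((3, n1), 0.3)                   # c[i, j] for 3 findings

def p_state(s):                             # prior of a cluster state of X_D
    s = np.asarray(s)
    return np.prod(np.where(s == 1, prior, 1.0 - prior))

# Base states: at most d_max diseases present; the rest are aggregated.
base = [s for s in product([0, 1], repeat=n1) if sum(s) <= d_max]
p_base = np.array([p_state(s) for s in base])
p_similar = 1.0 - p_base.sum()              # mass of the aggregated state

# alpha(d_j): the share of disease j's prior living in the similar states.
in_base = np.array([[s[j] for j in range(n1)] for s in base])
alpha = (prior - p_base @ in_base) / p_similar

# Eq. (6): one shared conditional probability for the aggregated state,
# chosen so that the prior probability of each finding is preserved.
pf_given_s = np.array([[1 - (1 - leak) * np.prod(1 - c[i] * np.asarray(s))
                        for s in base] for i in range(c.shape[0])])
pf_prior = np.array([
    sum(p_state(s) * (1 - (1 - leak) * np.prod(1 - c[i] * np.asarray(s)))
        for s in product([0, 1], repeat=n1))
    for i in range(c.shape[0])])
pf_given_similar = (pf_prior - pf_given_s @ p_base) / p_similar
print(alpha, pf_given_similar)
```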
For our network, the maximum absolute error is less than 0.005 for d_max > 5 over all possible instantiations of the nodes in the second layer.

Second, we analyzed the behavior of the maximum relative error in the above network. The relative error, as opposed to the absolute error, might be more important for some problems. For example, the probability of a life-threatening disease being 10^{-3} is substantially better than the probability of it being 10^{-2}, and a relative error of 10 shows this more clearly than the corresponding absolute error. Figure 3 shows the maximum relative errors for the above models over all possible instantiations of the second-layer nodes. The error in the state space aggregation method is about an order of magnitude lower than in the state space abstraction method for high d_max. Our method gives superior precision, as it partially accounts for the states completely ignored in the state space abstraction method. The error decreases together with the combined prior probability of the similar states (i.e. the probability before any instantiation), which is shown on the same plot. The maximum relative error is less than 0.01 for d_max > 6 over all possible instantiations of the nodes in the second layer.

[Figure 1: The maximum absolute error of the answer to a query about a disease probability for the state space abstraction method adopted from [Wellman and Liu, 1994]. The maximum error was found by an exhaustive search over all possible positive finding instantiations. The computational complexity of probabilistic inference is O(n_1^{d_max}). Axes: max diseases, num instantiations; vertical axis: max absolute error.]

[Figure 2: The maximum absolute error of the answer to a query about a disease probability for the reduction based on state space aggregation. The maximum error was found by an exhaustive search over all possible positive finding instantiations. The computational complexity of probabilistic inference is still O(n_1^{d_max}), as in the state space abstraction method. Axes: max diseases, num instantiations; vertical axis: max absolute error.]

5.3 CPCS-LIKE NETWORKS

Although the above generated networks do not have the structure that real practical networks have, the state space aggregation method can be extended to practical problems given some insight into the problem domain. For example, the rule for selecting the base states above can be formulated in the domain language: if the number of diseases present is large, then with a high probability a patient has any imaginable finding present. Thus, the states are almost similar already, and we do not change the conditional probabilities much by forcing them to be similar. We can also argue that the cases with more than a certain number of diseases present occur rarely in practice and are not important for diagnosis (have a low combined probability mass). Although the validity of these specific rules might be arguable in the medical domain, rules of this type can definitely lead to state space aggregation and simplification of probabilistic inference. The similarity of states is present already in many practical networks. For instance, Figure 4 shows that the majority of the noisy-OR coefficients in the CPCS network are concentrated close to round numbers like 0, 0.2, 0.5, 0.8, or 1, since further precision is not necessary for the diagnostic problem at hand. Besides, our study of the noisy-OR distributions in this network shows that in many cases the coefficients are exactly equal, implying a corresponding redundancy in the joint probability distribution. Identification of these similar states, however, is best done by a domain expert.

To study the effect of structure on the precision of the state space aggregation and to demonstrate another rule for choosing the set of base states, we built a BN2O network with coefficients drawn randomly from the set of real noisy-OR coefficients in the CPCS network. The presence of noisy-OR coefficients that are close to zero or one (which constitute about 50% of the total number of noisy-OR coefficients in the CPCS network) makes the state aggregation more complex and requires a better algorithm for the selection of similar states. Consider the states with the number of diseases present equal to d_max, as in the previous subsection. A subset of d_max diseases might no longer cause a finding if the coefficients for this subset are close to zero. The conditional probability of the finding is then no longer close to one, and including this state in the set of similar states and altering the corresponding conditional probability of the finding can affect the accuracy of the network.

To cope with this situation, we had to modify the base state selection algorithm for the CPCS-like BN2O network. We consider a state of the cluster node X_D to be a base state if the conditional probability of any finding being absent given this state is bigger than a fixed parameter. The results for this base state selection policy are given in Table 1.

Table 1: Errors in the CPCS-like BN2O network

    N_b/|X_D|    max abs error    max rel error
    ...%         4.4 x 10^{-2}    1....
    ...%         6.2 x 10^{-3}    5.4 x 10^{-...}
    ...%         5.8 x 10^{-4}    1.0 x 10^{-...}
    ...%         1.3 x 10^{-4}    1.7 x 10^{-3}
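A sketch of this selection policy as we read it (our addition; the threshold and the coefficient values are made up): a state stays a base state when some finding would still be absent with probability above the threshold, i.e. the state is not one in which every finding is nearly certain.

```python
import numpy as np
from itertools import product

def select_base_states(c, leak, threshold=0.05):
    """Base states: cluster states with some finding-absent prob > threshold."""
    n_findings, n1 = c.shape
    base = []
    for s in product([0, 1], repeat=n1):
        # p(finding i absent | state s) per Eq. (1), for all findings at once
        p_absent = (1 - leak) * np.prod(1 - c * np.asarray(s), axis=1)
        if p_absent.max() > threshold:      # some finding not near-certain
            base.append(s)
    return base

rng = np.random.default_rng(1)
c = rng.choice([0.0, 0.2, 0.5, 0.8, 1.0], size=(3, 5))  # CPCS-like values
print(len(select_base_states(c, leak=0.01)))
```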
For a relative error of 5% we need to account exactly for only 10% of the total number of states, thus reducing the computation time of the diagnosis ten times. These simple examples show that a large state space of a node can be managed by having many similar states in practical problems, and thus the large sizes of the cliques in the join tree can be managed by introducing similarity between states. Given that the state spaces of the join tree nodes can be very large, we are likely to find exponentially many states that can be aggregated into groups, especially if we have some insight into the underlying problem.

[Figure 3: The combined prior probability of the similar states p(\tilde{X}_D) and the maximum relative errors |\Delta p^{(1)}(d_i) / p^{(1)}(d_i)| of the posterior disease probabilities over all possible queries as a function of d_max. All three curves (state space aggregation error, prior probability of similar states, state space abstraction error) have the same asymptotic behavior. The error in the state space aggregation method is smaller since it partially accounts for the probability mass that is completely ignored in the state space abstraction method.]

[Figure 4: The distribution of the noisy-OR coefficients in the practical CPCS network (x-axis: value of the coefficient; y-axis: count). Most coefficients are close to round numbers (0, 0.2, 0.5, 0.8, or 1). If a coefficient is close to zero, the state of the parent node only slightly affects the probability of the child. If a coefficient is close to one, the true state of the parent causes the child to be in the state true also.]

6 RELATION TO PREVIOUS WORK

Since BN2O networks are practically important, a few approximate algorithms have been developed specifically for this type of network. The Quickscore algorithm uses the noisy-OR properties described in Section 3 to rearrange the summation of the joint probability distribution [Heckerman, 1989], making probabilistic inference exponential in the number of positively instantiated nodes rather than in the number of nodes in the first layer, as given by direct triangulation. The TopN algorithm [Henrion, 1991] tries to bound the (ratios of) posterior probabilities for the N most likely diseases by searching in a subspace of the full probability distribution for the first-layer nodes. Stochastic simulation methods [Henrion, 1988] have been specifically extended to sample the joint probability distribution of BN2O networks.

The approach taken in this paper differs from the previous ones in that we reduce the complexity of probabilistic inference by making approximations in the knowledge representation, not by making approximations in the inference procedure. The reduced and full models take the same amount of space for their representation (the number of coefficients needed to completely specify the dependence is exactly the same), but the reduced model produces results of almost the same quality in a polynomial amount of time. On the other hand, our approach is close in spirit to the previously developed TopN and state space abstraction algorithms in that it tries to account for the major probability mass of the joint probability distribution exactly, while making approximations about the rest of the probability mass. Our method is directly related to the general approach to complexity reduction proposed earlier that uses sensitivities instead of conditional probabilities [Kozlov and Singh, 1995], and in fact was first derived in terms of sensitivities. In the previous work we suggested reducing the computational complexity of probabilistic inference for general networks by reducing the rank of the sensitivity matrices by averaging out the columns of the sensitivity matrix. It can be shown that assigning the same value to conditional probabilities without changing the prior probabilities of nodes is equivalent to averaging out sensitivity matrix elements over a subset of states. In the case of BN2O networks this averaging reduces to identifying the similar subset of the cluster node X_D and assigning the same conditional probability to all of these states. However, the methods based on sensitivities are likely to result in a larger class of complexity reduction methods, particularly for multiply-valued nodes, where the analysis in terms of traditional conditional probabilities is complicated.
7 SUMMARY AND FUTURE WORK

We define the property of similarity of states and use it for model reduction. Two states of a node are similar if the ratio between the probabilities of the two states remains constant after any instantiation of other nodes in the network. We show that the similarity of states property can be exploited to perform probabilistic inference more efficiently. The computational complexity of probabilistic inference in networks with similar states is determined by the total number of non-similar states instead of the total number of states, and might be polynomial in the size of the network if exponentially many states are similar. We show a relation between the similarity of states property and the redundancies in the joint probability distribution: the states are similar if and only if the corresponding columns in the joint probability distribution are linearly dependent. We find a generic way of identifying similarity of states and enforcing the similarity property, through conditional probabilities, on states that we want to make similar. Thus, we can reduce the computation time of probabilistic inference by enforcing the similarity of states in a model. The accuracy of the reduced model is determined by how similar the states are in the original problem already. We show that BN2O models can be readily reduced to a model with exponentially many similar states, and that the reduced model produces results very close to the original model for all queries of practical importance.

The proposed method of complexity reduction is related to the earlier developed TopN [Henrion, 1991] and state space abstraction [Wellman and Liu, 1994] methods. As in the above methods, we also try to account for the major probability mass in the joint probability distribution exactly, but make some approximations about the unaccounted-for probability mass. When the accounted-for probability mass is substantial, all methods produce almost exact results. However, our method produces superior accuracy, as it estimates the contribution from the rest of the probability mass, and it performs better on real networks. The model reduction described in this paper can be readily extended to any other network represented as a cluster tree (a singly-connected Markov network of cluster nodes). The cluster nodes will have exponentially many states, and many of these states are likely to be almost similar. The method can also be extended by building several groups of similar states per cluster node, thus improving the accuracy without much computation overhead. In this paper we have shown a successful application on two BN2O networks: one randomly generated and the other built based on the CPCS medical diagnostic network. For the network we studied, the error can be as little as 5% for the reduced problem while requiring only 10% of the computation time needed by the original problem. Further applications of the new approach are of course necessary, and we are actively pursuing the application to practical belief networks and expert systems.

Acknowledgments

We thank Randy Miller and the University of Pittsburgh for supplying the CPCS network. We also thank Daphne Koller, Malcolm Pradhan, and Lise Getoor for reading the manuscript and valuable comments, John Hennessy for his support and guidance, and ARPA for financial support under contract no. N...C...

References

Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405.

Dagum, P. and Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141-153.

D'Ambrosio, B. (1994). Symbolic probabilistic inference in large BN2O networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 128-135. Morgan Kaufmann.

Heckerman, D. and Breese, J. S. (1994). A new look at causal independence. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 286-292. Morgan Kaufmann.

Heckerman, D. E. (1989). A tractable inference algorithm for diagnosing multiple diseases. In Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence, pages 174-181.

Heckerman, D. E. (1990). Probabilistic similarity networks. Networks, 20:607-636.

Henrion, M. (1988). Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In Proceedings of the Second Conference on Uncertainty in Artificial Intelligence, pages 149-163.

Henrion, M. (1991). Search-based methods to bound diagnostic probabilities in very large belief nets. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 142-150.

Kozlov, A. V. and Singh, J. P. (1995). Sensitivities: an alternative to conditional probabilities for Bayesian belief networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 376-385. Morgan Kaufmann.

Parker, R. C. and Miller, R. A. (1987). Using causal knowledge to create simulated patient cases: the CPCS project as an extension of Internist-1. In Proceedings of the 11th Annual Symposium on Computer Applications in Medical Care, pages 473-480. IEEE Computer Society Press.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pradhan, M., Provan, G., Middleton, B., and Henrion, M. (1994). Knowledge engineering for large belief networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 484-490. Morgan Kaufmann.

Wellman, M. P. and Liu, C.-L. (1994). State-space abstraction for anytime evaluation of probabilistic networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 567-574. Morgan Kaufmann.


More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

A New Evolutionary Computation Based Approach for Learning Bayesian Network

A New Evolutionary Computation Based Approach for Learning Bayesian Network Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 4026 4030 Advanced n Control Engneerng and Informaton Scence A New Evolutonary Computaton Based Approach for Learnng Bayesan Network Yungang

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1 C/CS/Phy9 Problem Set 3 Solutons Out: Oct, 8 Suppose you have two qubts n some arbtrary entangled state ψ You apply the teleportaton protocol to each of the qubts separately What s the resultng state obtaned

More information

THE SUMMATION NOTATION Ʃ

THE SUMMATION NOTATION Ʃ Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry Workshop: Approxmatng energes and wave functons Quantum aspects of physcal chemstry http://quantum.bu.edu/pltl/6/6.pdf Last updated Thursday, November 7, 25 7:9:5-5: Copyrght 25 Dan Dll (dan@bu.edu) Department

More information

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson

More information

Bayesian belief networks

Bayesian belief networks CS 1571 Introducton to I Lecture 24 ayesan belef networks los Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square CS 1571 Intro to I dmnstraton Homework assgnment 10 s out and due next week Fnal exam: December

More information

Density matrix. c α (t)φ α (q)

Density matrix. c α (t)φ α (q) Densty matrx Note: ths s supplementary materal. I strongly recommend that you read t for your own nterest. I beleve t wll help wth understandng the quantum ensembles, but t s not necessary to know t n

More information

Uncertainty and auto-correlation in. Measurement

Uncertainty and auto-correlation in. Measurement Uncertanty and auto-correlaton n arxv:1707.03276v2 [physcs.data-an] 30 Dec 2017 Measurement Markus Schebl Federal Offce of Metrology and Surveyng (BEV), 1160 Venna, Austra E-mal: markus.schebl@bev.gv.at

More information

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7 Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Snce h( q^; q) = hq ~ and h( p^ ; p) = hp, one can wrte ~ h hq hp = hq ~hp ~ (7) the uncertanty relaton for an arbtrary state. The states that mnmze t

Snce h( q^; q) = hq ~ and h( p^ ; p) = hp, one can wrte ~ h hq hp = hq ~hp ~ (7) the uncertanty relaton for an arbtrary state. The states that mnmze t 8.5: Many-body phenomena n condensed matter and atomc physcs Last moded: September, 003 Lecture. Squeezed States In ths lecture we shall contnue the dscusson of coherent states, focusng on ther propertes

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Journal of Universal Computer Science, vol. 1, no. 7 (1995), submitted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Springer Pub. Co.

Journal of Universal Computer Science, vol. 1, no. 7 (1995), submitted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Springer Pub. Co. Journal of Unversal Computer Scence, vol. 1, no. 7 (1995), 469-483 submtted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Sprnger Pub. Co. Round-o error propagaton n the soluton of the heat equaton by

More information

Dynamical Systems and Information Theory

Dynamical Systems and Information Theory Dynamcal Systems and Informaton Theory Informaton Theory Lecture 4 Let s consder systems that evolve wth tme x F ( x, x, x,... That s, systems that can be descrbed as the evoluton of a set of state varables

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

System in Weibull Distribution

System in Weibull Distribution Internatonal Matheatcal Foru 4 9 no. 9 94-95 Relablty Equvalence Factors of a Seres-Parallel Syste n Webull Dstrbuton M. A. El-Dacese Matheatcs Departent Faculty of Scence Tanta Unversty Tanta Egypt eldacese@yahoo.co

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13]

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13] Algorthms Lecture 11: Tal Inequaltes [Fa 13] If you hold a cat by the tal you learn thngs you cannot learn any other way. Mark Twan 11 Tal Inequaltes The smple recursve structure of skp lsts made t relatvely

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Low Complexity Soft-Input Soft-Output Hamming Decoder

Low Complexity Soft-Input Soft-Output Hamming Decoder Low Complexty Soft-Input Soft-Output Hammng Der Benjamn Müller, Martn Holters, Udo Zölzer Helmut Schmdt Unversty Unversty of the Federal Armed Forces Department of Sgnal Processng and Communcatons Holstenhofweg

More information

Population element: 1 2 N. 1.1 Sampling with Replacement: Hansen-Hurwitz Estimator(HH)

Population element: 1 2 N. 1.1 Sampling with Replacement: Hansen-Hurwitz Estimator(HH) Chapter 1 Samplng wth Unequal Probabltes Notaton: Populaton element: 1 2 N varable of nterest Y : y1 y2 y N Let s be a sample of elements drawn by a gven samplng method. In other words, s s a subset of

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information