Accepted for the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96) August 1-3, 1996, Portland, Oregon, USA
Computational complexity reduction for BN2O networks using similarity of states

Alexander V. Kozlov
Department of Applied Physics
Stanford University
Stanford, CA
phone: (415) e-mail:

Jaswinder Pal Singh
Department of Computer Science
Princeton University
Princeton, NJ
phone: (609) e-mail: jps@cs.princeton.edu

Abstract

Although probabilistic inference in a general Bayesian belief network is an NP-hard problem, computation time for inference can be reduced in most practical cases by exploiting domain knowledge and by making approximations in the knowledge representation. In this paper we introduce the property of similarity of states and a new method for approximate knowledge representation and inference which is based on this property. We define two or more states of a node to be similar when the ratio of their probabilities, the likelihood ratio, does not depend on the instantiations of the other nodes in the network. We show that the similarity of states exposes redundancies in the joint probability distribution which can be exploited to reduce the computation time of probabilistic inference in networks with multiple similar states, and that the computational complexity in networks with exponentially many similar states might be polynomial. We demonstrate our ideas on the example of a BN2O network, a two-layer network often used in diagnostic problems, by reducing it to a very close network with multiple similar states. We show that the answers to practical queries converge very fast to the answers obtained with the original network. The maximum error is as low as 5% for models that require only 10% of the computation time needed by the original BN2O model.

1 INTRODUCTION

A Bayesian belief network is a directed acyclic graph (DAG) whose nodes represent random variables and whose edges represent dependencies between the random variables. Belief networks are used for knowledge representation in diagnostic and forecasting software systems. Belief networks allow the user to answer queries about the probabilities of the states of one or several nodes, called query nodes, conditioned on other nodes, called evidence nodes. The process of finding these conditioned probabilities is called probabilistic inference.

Probabilistic inference is NP-hard for a general network with an arbitrary structure [Cooper, 1990]. Furthermore, even approximating inference in a general belief network is NP-hard [Dagum and Luby, 1993]. However, knowledge of the problem domain can help to reduce the computation time of probabilistic inference. For example, Heckerman showed that probabilistic inference is linear in two-layer networks with noisy-OR interaction between nodes (BN2O networks) for negative evidence about nodes [Heckerman, 1989], and Heckerman and Breese showed that the noisy-OR interaction between nodes can be further simplified to reduce the number of parents per node, which reduces computational complexity for networks with special structure [Heckerman and Breese, 1994]. Thus, the computational complexity of probabilistic inference can be managed in special cases.

In this paper we propose a new way of simplifying probabilistic inference in belief networks based on the property of similarity of states. Two or more states of a node are similar when the likelihood ratio does not depend on the instantiations of the other nodes in the network. The probability of one of these states determines the probabilities of all states similar to it. If a model contains states that are almost similar, we can force the states to be similar and make probabilistic inference less computationally expensive. If we can make exponentially many states similar, the resulting computational complexity of probabilistic inference in the new model is polynomial in the size of the network. We call the new method state space aggregation since we explicitly aggregate states into groups and perform probabilistic inference with them as a group.
We demonstrate the new method on two examples of BN2O networks: one is a randomly generated BN2O network and the other is a BN2O network with noisy-OR coefficients randomly chosen from the practically important CPCS medical diagnostic network (the original CPCS network was constructed out of a
Computer-based Patient Case Simulation database by R. Parker and R. Miller [Parker and Miller, 1987]).¹ We first convert a BN2O network to a cluster tree. A general form of the cluster tree for the BN2O network is a "Naive" Bayesian classifier with one large node representing all nodes in the first layer and many children representing nodes in the second layer. Although the resulting cluster node has exponentially many states, we can aggregate most of these states into a group of similar states. We then make probabilistic inference with these states as a group. We show that the resulting model provides a good estimate of the probabilistic inference results as compared to the original BN2O model and is better than the state abstraction method [Wellman and Liu, 1994] used for approximate inference.

The paper is organized as follows. In Section 2 we clarify the notation and conventions we use throughout the paper. In Section 3 we introduce BN2O networks and review the approaches to make probabilistic inference in them tractable. In Section 4 we define the property of similarity and develop our approach, based on modifying the original network to make a large subset of states similar. In Section 5 we demonstrate our ideas on the example of our randomly generated BN2O networks. In Section 6 we discuss the relation of the current technique to previous work. Finally, we conclude in Section 7.

2 NOTATION

In this paper, we use a special notation for a set of simple nodes in the original network. Thus, while we denote an individual simple node by a small letter x, with a possible subscript when further distinction is required, we denote a set of nodes by a capital letter X, again with a possible subscript. We call a set of simple nodes a cluster node, since we can always represent a set of nodes as a single node which takes values from an expanded set of values, a set obtained by taking a direct product of the sets of values of the original nodes. A superscript of X, if it appears, denotes a particular state of the cluster node (i.e.
a particular combination of states of the simple nodes in the set). We denote the number of simple nodes in the cluster node X as N(X) and the size of its state space as |X|. In the networks considered in this paper (BN2O networks) all individual nodes are binary, i.e. can take only two values, false and true. For the binary nodes, we assume that if x is in the state false, then the variable x is zero, and if x is in the state true, then the variable x is one. We use the short form p(x̄) to denote p(x = false) and p(x) to denote p(x = true). For the cluster node, we use the short form p(X^s) to denote p(X = X^s). Before we introduce the similarity of states property, let us consider BN2O networks further.

¹ The CPCS network is actually not quite a two-layer network. While several BN2O networks are similar to CPCS and are used in practice, they are proprietary and were not accessible for this paper.

3 BN2O NETWORKS

A BN2O network is a two-layer network consisting of binary nodes with a noisy-OR dependence between the nodes in the first and the second layers. Let us take a medical diagnostic network as an example. In this network, the nodes in the first layer are diseases and the nodes in the second layer are symptoms (called findings in the medical literature). The nodes in the first layer (diseases) have one or several children in the second layer (findings). The noisy-OR interaction between the diseases and findings describes a causal independence assumption, i.e. that the ability of any single disease to cause a given symptom does not depend on the presence of the other diseases [Pearl, 1988].

A noisy-OR dependence between a finding node x_f and its n parent disease nodes x_dj can be characterized by (n + 1) real numbers from the interval [0, 1]: a leak and n coefficients. The leak, which we denote Leak(f), is the probability of the finding in the absence of any of the diseases described by the network. A coefficient, which we denote c_j, describes the ability of a disease d_j to cause an increase in the probability p(f) of the finding f.
More precisely, the probability of the finding being absent, p(f̄), is multiplied by (1 - c_j) each time a parent x_dj of the node x_f changes its state from false to true. We can write the noisy-OR conditional probability in a closed form:

    p(f̄ | x_d1, x_d2, ..., x_dn) = [1 - Leak(f)] ∏_j [1 - c_j x_dj],    (1)

    p(f | x_d1, x_d2, ..., x_dn) = 1 - [1 - Leak(f)] ∏_j [1 - c_j x_dj],    (2)

where we used our convention that x_dj is zero if the node x_dj is in the state false and one if it is in the state true. Trivially, if c_j is zero, the state of the parent does not affect the probability of the child, and if c_j is one, the true state of the parent forces the child to be true with probability one. An extension of the noisy-OR interaction to multiply-valued nodes is possible and is called noisy-MAX [Pradhan et al., 1994]. In the noisy-MAX interaction, relations identical to (1) and (2) hold for the combined probability of the first k states of the child node.

BN2O networks have been an object of study for a long time due to their potential applicability and due to the existence of a compact form for the noisy-OR conditional probabilities. Knowledge acquisition for a BN2O network is also simplified, since an expert has to assess only a small number of parameters, linear in the number of parents, to completely characterize the
dependence. Also, probabilistic inference with BN2O networks has polynomial complexity in some special cases. For example, for negative findings the nodes in the first layer remain conditionally independent, and the disease probabilities can be obtained by the summation of the probability distribution for the disease nodes:

    p(x_d1, x_d2, ..., x_dn) = ∏ [1 - Leak(f)] ∏_j [1 - c_j x_dj] p(x_dj),    (3)

where the first product is over the negatively instantiated findings. The computational complexity of probabilistic inference with negative evidence is linear in the size of the network, since the above probability distribution (3) is easily decomposable into factors of the form [1 - c_k x_dk] p(x_dk). The new probabilities of the disease d_k after instantiation are:

    p^(1)(d̄_k) = p(d̄_k) / { p(d̄_k) + ∏ [1 - c_k] p(d_k) },

    p^(1)(d_k) = ∏ [1 - c_k] p(d_k) / { p(d̄_k) + ∏ [1 - c_k] p(d_k) },

where the products are over the negatively instantiated findings. Inference for positive evidence about nodes is more involved. We have to evaluate the sums over:

    p(x_d1, x_d2, ..., x_dn) = ∏ { 1 - [1 - Leak(f)] ∏_j [1 - c_j x_dj] } p(x_dj),

where the first product is over the positively instantiated findings. The probability of each of the diseases cannot be taken out of the summation easily for this sum, and we have to evaluate it by expanding the outermost product in the above expression into a sum of products:

    p(x_d1, x_d2, ..., x_dn) = Σ_{X_E} (-1)^{Σ_f x_f} ∏_{f: x_f = 1} [1 - Leak(f)] ∏_j [1 - c_j x_dj] p(x_dj),    (4)

where we sum over all possible instantiations of the cluster node X_E consisting of the evidence nodes. Each of the terms in the sum (4) has a structure identical to (3), and its contribution to the disease probabilities can be computed in linear time. However, the total number of terms in the sum is exponential in the number of evidence nodes, and the computational complexity of probabilistic inference with positive evidence is exponential in the number of evidence nodes [Heckerman, 1989]. The same complexity results can be obtained by considering topological transformations of networks with noisy-OR interactions [Heckerman and Breese, 1994].
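The noisy-OR closed form (1)-(2) and the factorization behind the linear-time computations can be checked on a toy network. The following sketch (not the authors' code; the priors and coefficients are made up for illustration) computes the marginal probability of a single finding both by brute-force enumeration over all disease configurations and by the linear-time factored form, and confirms they agree:

```python
from itertools import product

p_d = [0.1, 0.2, 0.3]             # disease priors (made up)
leak = 0.05                       # leak for one finding (made up)
c = [0.6, 0.4, 0.7]               # noisy-OR coefficients c_j (made up)

def p_f_absent(d):
    """p(f = false | d) = (1 - leak) * prod_j (1 - c_j)^{d_j}, i.e. eq. (1)."""
    prob = 1.0 - leak
    for dj, cj in zip(d, c):
        if dj:
            prob *= 1.0 - cj
    return prob

def prior(d):
    """p(d) for conditionally independent disease nodes."""
    out = 1.0
    for dj, pj in zip(d, p_d):
        out *= pj if dj else 1.0 - pj
    return out

# Brute force: p(f = true) = sum_d p(f = true | d) p(d), exponential in n.
brute = sum((1.0 - p_f_absent(d)) * prior(d)
            for d in product([0, 1], repeat=len(p_d)))

# Factored form: because eq. (1) is a product over parents, the sum over d
# decomposes into per-disease factors, linear in the number of diseases:
# p(f = true) = 1 - (1 - leak) * prod_j [ (1 - p_j) + (1 - c_j) p_j ].
fact = 1.0 - leak
for pj, cj in zip(p_d, c):
    fact *= (1.0 - pj) + (1.0 - cj) * pj
quick = 1.0 - fact

assert abs(brute - quick) < 1e-12
```

The same decomposition is what breaks down for several positive findings, forcing the inclusion-exclusion expansion (4) with its exponentially many terms.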
The transformation can reduce the number of parents for a node in a network in special cases and is computationally equivalent to the above decomposition of the joint probability sums (4).

Besides these simplifications, probabilistic inference in a general BN2O network (as well as in the practical QMR-DT network [D'Ambrosio, 1994]) is computationally intractable for general positive evidence. In the practical QMR-DT network, users have to apply heuristic search or stochastic simulation methods to obtain approximate results. These methods are unpredictable and sometimes fail to produce a satisfactory error bound on the result in time-critical situations. In this case, we need to simplify the original model so that it deterministically produces a satisfactory answer in a known, fixed amount of time. Let us now introduce similarity of states, which we will use later to construct a model with polynomial complexity that is very close to the BN2O model.

4 SIMILARITY OF STATES

In this section, we introduce the similarity of states property, which we use later for our model reduction. The similarity of states is a form of independence in the network. It exposes a redundancy in the joint probability distribution and therefore can be used to make probabilistic inference faster.

Definition 4.1: The states X^s and X^s' of a node X are similar with respect to a node X_j if the ratio of the probabilities p(X^s)/p(X^s') is invariant with respect to any instantiation of the node X_j:

    p^(k)(X^s) / p^(k)(X^s') = p(X^s) / p(X^s') = const    (5)

for all instantiations X_j^k of the node X_j. We call two states of a node simply similar if the two states are similar with respect to all the other nodes.

The probability of one of the similar states determines the probabilities of the others. Let us take an example of car diagnosis. Given that a car doesn't start, the fact that the fuel tank is full increases the probability that one of the spark plugs doesn't work. However, we can treat the probabilities that each one of the spark plugs failed as similar.
Unless we look under the hood, the probability that one of the spark plugs has failed determines the probabilities of the failure of any other spark plug: the likelihood ratio of the spark plug failure probabilities stays the same.

This independence information was brilliantly used for the construction of similarity networks [Heckerman, 1990]. A similarity network is a construction consisting of a similarity graph and a collection of local knowledge maps corresponding to each edge in the similarity graph. Similarity networks were developed to simplify the construction of large and complex belief networks. They are a result of recognizing "specific forms of conditional independence" and developing special representations for them that simplify the knowledge acquisition. To build a similarity network, we first pick a distinguished node representing
the hypotheses to be chosen from (for example, in a medical diagnosis problem, the hypotheses are the diseases). A node in a similarity graph is a hypothesis, and an edge indicates two hypotheses that are likely to be confused by an expert. For each such pair of hypotheses, we build a local knowledge map. A local knowledge map is a belief network for distinguishing between these two hypotheses. By focusing on constructing local knowledge maps, a person can concentrate on one manageable portion of the modeling task at a time.

Our goal, on the other hand, is to simplify probabilistic inference in complex belief networks. We do this by identifying redundancies in the joint probability distribution. The redundancy considered in this paper is the similarity of states and is related to the same "specific forms of conditional independence" as the similarity networks developed earlier. The local knowledge maps constructed by a knowledge engineer might, in fact, be used for identifying similar states: a local knowledge map would contain all nodes with respect to which the pair of hypotheses is not similar. In our definition, the concept of similarity of states is more general and can be applied to any node in the network, not just the distinguished node representing the hypotheses to be chosen from. We demonstrate how this concept can be applied for model reduction in the case of BN2O networks.

The following theorem shows the redundancy in the joint probability distribution introduced by the similarity of states:

Theorem 4.2: The states X^s and X^s' of a node X are similar with respect to a node X_j iff the columns of the conditional probability matrix (P_j)_qp = p(X_j^q | X^p) corresponding to these two states s and s' are identical, iff the columns of the probability distribution matrix (D_j)_qp = p(X_j^q, X^p) corresponding to the states s and s' are linearly dependent.

The proof of the theorem is easy, and follows from the decomposition of the probability distribution for the two nodes into the product (D_j)_qp = p(X_j^q | X^p) p(X^p).
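Both characterizations in Theorem 4.2 can be checked numerically. A small sketch with made-up numbers (not from the paper): a conditional probability matrix is built with two identical columns, and the corresponding columns of the joint distribution matrix are verified to be proportional, i.e. linearly dependent:

```python
import itertools

p_X = [0.5, 0.3, 0.2]          # prior over three states of X (made up)
P = [[0.7, 0.7, 0.1],          # (P_j)_qp = p(X_j^q | X^p); rows: states of X_j
     [0.3, 0.3, 0.9]]          # columns 0 and 1 are identical by construction

def joint_col(s):
    """Column s of (D_j)_qp = p(X_j^q | X^p) p(X^p)."""
    return [row[s] * p_X[s] for row in P]

def proportional(u, v, tol=1e-12):
    """Two vectors are linearly dependent iff all their 2x2 minors vanish."""
    return all(abs(u[i] * v[j] - u[j] * v[i]) < tol
               for i, j in itertools.combinations(range(len(u)), 2))

same_cond = all(abs(P[q][0] - P[q][1]) < 1e-12 for q in range(len(P)))
dep_joint = proportional(joint_col(0), joint_col(1))
assert same_cond and dep_joint                      # states 0 and 1 are similar
assert not proportional(joint_col(0), joint_col(2)) # state 2 is not similar to them
```

Forcing near-similar states to be exactly similar amounts to overwriting their near-identical conditional columns with one shared column, which is the operation used for the model reduction below.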
In this form, any positive instantiation of the node X_j in the state X_j^k can be represented as removing all rows except k from the matrix D_j. After the remaining row is normalized, the probability p(X^p) can be read off from this row. Any negative instantiation of the node X_j in the state X_j^k can be represented as removing the row k from the matrix D_j. After the remaining probability distribution is normalized, the probability p(X^p) can be obtained by summing all numbers in the p-th column. The proof is obtained by considering all possible instantiations of X_j.

Thus, the similarity of states uncovers a redundancy in the joint probability distribution. In linear algebra terms, two or several columns of the matrix representing the joint probability distribution are linearly dependent iff the corresponding states are similar. If the columns are close to linearly dependent, we can approximate the joint probability distribution to make the states similar in order to simplify probabilistic inference. The theorem shows how we can introduce the similarity of states via conditional probabilities: we aggregate some of the states with almost identical conditional probability matrix columns and force them to be similar by assigning the same column to every one of these states.

Although for a general joint probability distribution the computational complexity of probabilistic inference is linear in the total number of states per any given node, the computational complexity of probabilistic inference with similar states is linear only in the total number of states that are not similar. By constructing models with exponentially many similar states we can reduce computational complexity from exponential to polynomial in some networks. In the next section we show that the precision of the reduced model as compared to the original model can be quite satisfactory in these cases.

5 EXAMPLE OF STATE SPACE AGGREGATION

We demonstrate the application of our state space aggregation method on the example of BN2O networks.
We assume that a BN2O network has n_1 binary nodes in the first layer and n_2 binary nodes in the second layer, and that every node x_di in the first layer is connected to every node x_fj in the second layer.² First, we describe our procedure and then compare the results for our reduced model to the results of a full BN2O network.

5.1 FORMALISM

We proceed by combining all nodes in the first layer into one large cluster node X_D representing all possible diseases and their combinations. Node X_D has an exponential number of states, 2^n1, and we hope that some of these states can be made similar. We therefore partition the 2^n1 states into two subsets: one is the subset of N_b base states, which we denote S_b, and the other is the subset of Ñ = |X_D| - N_b similar states, which we denote S̃. We will exploit different strategies for choosing the subset of states which we force to be similar (see subsections 5.2 and 5.3). According to the definition of similar states (5), the contribution to the disease probabilities from the set

² Although sparse interconnection reduces the applicability of this method compared to the methods based on topological decomposition [D'Ambrosio, 1994; Heckerman and Breese, 1994], we will show that state space aggregation produces satisfactory results even for sparse BN2O networks. A combination of the topological method and methods based on state space aggregation is definitely possible but not considered in this paper.
of similar states S̃ is a constant factor times the combined probability of the similar states, p(X̃_D). The posterior probability of a disease is then computed as:

    p^(1)(d_i) = Σ_{X_D^s ∈ S_b} p^(1)(X_D^s) x_di + α(d_i) p^(1)(X̃_D),

where the first term is the contribution to the disease probability from the base states and the second is the contribution to the disease probability from the similar states. The coefficients α(d_i) can be obtained from the prior probabilities:

    α(d_i) = [ p(d_i) - Σ_{s ∈ S_b} p(X_D^s) x_di ] / [ 1 - Σ_{s ∈ S_b} p(X_D^s) ],

and are computed in linear time given the conditionally independent first-layer nodes in the original network. The conditional probabilities for the base states match the conditional probabilities of the corresponding states in the original BN2O model. The conditional probability for the similar states, which has to be identical for every similar state, is chosen to preserve the prior probabilities of the findings:

    p(f_j | X̃_D) = [ p(f_j) - Σ_{s ∈ S_b} p(f_j | X_D^s) p(X_D^s) ] / [ 1 - Σ_{s ∈ S_b} p(X_D^s) ].    (6)

The last equation is an application of Bayes' rule for a finding node x_fj and the aggregated similar state. As we can see, the computation to transform the model to a reduced model involves simple summations over the base states. The computational requirements in this state aggregation model are thus the same as in the state space abstraction model [Wellman and Liu, 1994], in which the answer to a query is inferred by summing over the base states only and completely ignoring the rest of the states. Our model accounts for some of the ignored probability mass via the coefficients α(d_i). Let us now see how the state space aggregation model helps to increase the precision of probabilistic inference in BN2O networks.

5.2 RANDOMLY GENERATED BN2O NETWORKS

To demonstrate how the state aggregation model can help improve the precision of the model and reduce the computation time of probabilistic inference, we first generated a BN2O network with noisy-OR coefficients drawn from a Beta(2, 4) distribution (with the expected value ⟨c_j⟩ = 1/3).
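The reduction of Section 5.1 amounts to a few sums over the base states. The following sketch runs it on a made-up three-disease cluster node (priors and noisy-OR parameters are invented; this is an illustration of the formulas, not the authors' implementation), and checks that the α coefficients preserve the disease priors and that eq. (6) yields a valid shared conditional:

```python
from itertools import product

p_d = [0.1, 0.2, 0.15]            # disease priors (made up)
n1 = len(p_d)

def prior(s):
    """p(X_D = s) for conditionally independent first-layer nodes."""
    out = 1.0
    for bit, p in zip(s, p_d):
        out *= p if bit else 1.0 - p
    return out

states = list(product([0, 1], repeat=n1))
S_b = [s for s in states if sum(s) <= 1]          # base states: at most 1 disease
mass_b = sum(prior(s) for s in S_b)

# alpha(d_i) = [p(d_i) - sum_{s in S_b} p(s) x_di] / [1 - sum_{s in S_b} p(s)]
alpha = [(p_d[i] - sum(prior(s) * s[i] for s in S_b)) / (1.0 - mass_b)
         for i in range(n1)]

# The disease priors are preserved by construction:
for i in range(n1):
    recon = sum(prior(s) * s[i] for s in S_b) + alpha[i] * (1.0 - mass_b)
    assert abs(recon - p_d[i]) < 1e-12

# Eq. (6): the shared conditional for the aggregate similar state, chosen to
# preserve the prior probability of the finding.
leak, c = 0.02, [0.5, 0.3, 0.8]   # made-up noisy-OR parameters for one finding
def p_f_given(s):
    q = 1.0 - leak
    for bit, cj in zip(s, c):
        if bit:
            q *= 1.0 - cj
    return 1.0 - q

p_f = sum(p_f_given(s) * prior(s) for s in states)
p_f_agg = (p_f - sum(p_f_given(s) * prior(s) for s in S_b)) / (1.0 - mass_b)
assert 0.0 <= p_f_agg <= 1.0      # a probability-mass-weighted average
```

Note that p_f_agg is simply the average of p(f | X_D^s) over the aggregated states, weighted by their priors, which is why the finding priors are preserved exactly.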
If a large number of nodes in the first layer are in the state true, we expect that the probability of any finding is close to one. Thus, we make similar all states of the cluster node X_D in which the number of diseases present is larger than some fixed number d_max:

    X_D^s ∈ S̃   iff   Σ_i x_di > d_max.

The number of base states in this case is polynomial in the number of the first-layer nodes:

    N_b = Σ_{i=0}^{d_max} (n_1 choose i) ≈ n_1^d_max

for d_max > 1, and the reduction of the original BN2O model to the model with the aggregated similar states has polynomial complexity.

First, we analyzed the maximum absolute error for the queries about the probability of each of the diseases (nodes in the first layer) given different instantiations with different numbers of positive findings (nodes in the second layer). The results of the simulations for the BN2O network consisting of 18 nodes in the first layer and 18 nodes in the second layer are presented in Figures 1 and 2 (for the state space abstraction and the state space aggregation models, respectively). The error in the state space aggregation model is much smaller (about an order of magnitude for high d_max) than the error in the state space abstraction model, where the set S̃ is completely disregarded. Also, the error for the instantiations with a large number of findings present (the region where probabilistic inference is computationally very expensive) is almost independent of the number of diseases present. Since the errors introduced by the instantiations of different findings can be considered independent, we expect the maximum error to be O(√N(X_E)), i.e. to grow as the square root of the number of instantiated nodes. For our network, the maximum absolute error is less than 0.005 for d_max > 5 over all possible instantiations of the nodes in the second layer.

Second, we analyzed the behavior of the maximum relative error in the above network. The relative error, as opposed to the absolute error, might be more important for some problems.
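The polynomial growth of the number of base states is easy to verify numerically. A quick check (an illustration, not code from the paper) for the n_1 = 18 network used in the experiments:

```python
from math import comb

def n_base(n1, d_max):
    """N_b = sum_{i=0}^{d_max} C(n1, i): states with at most d_max diseases."""
    return sum(comb(n1, i) for i in range(d_max + 1))

n1 = 18
for d_max in (2, 5, 8):
    nb = n_base(n1, d_max)
    # fraction of the 2^18 = 262144 cluster states kept as base states
    print(d_max, nb, nb / 2 ** n1)
```

For d_max = 2 only 172 of the 262144 states are treated exactly; the rest are aggregated into a single similar group.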
For example, the probability of a life-threatening disease being 10^-3 is substantially better than the probability of it being 10^-2, and a relative error of 10 shows this more clearly than the corresponding absolute error. Figure 3 shows the maximum relative errors for the above models over all possible instantiations of the second-layer nodes. The error in the state space aggregation method is about an order of magnitude lower than in the state space abstraction method for high d_max. Our method gives superior precision as it partially accounts for the states completely ignored in the state space abstraction method. The error decreases with the combined prior probability of the similar states (i.e. before any instantiation), which is shown on the same plot. The maximum relative error is less than 0.01 for d_max > 6 over all possible instantiations of the nodes in the second layer.

5.3 CPCS-LIKE NETWORKS

Although the above generated networks do not have the structure that real practical networks have, the state space aggregation method can be extended to practical problems given some insight into the problem domain. For example, the rule for selecting the base states above can be formulated in the domain language: if the number of diseases present is large, then with a high probability a patient has any imaginable finding present. Thus, the states are almost similar already, and we do not change the conditional probabilities much by forcing them to be similar. We can also argue that the cases with more than a certain number of diseases present occur rarely in practice and are not important for diagnosis (they have a low combined probability mass). Although the validity of these specific rules might be arguable in the medical domain, rules of this type can definitely lead to state space aggregation and simplification of probabilistic inference.

Figure 1: The maximum absolute error of the answer to a query about a disease probability for the state space abstraction method adopted from [Wellman and Liu, 1994], plotted against the maximum number of diseases and the number of instantiations. The maximum error was found by an exhaustive search over all possible positive finding instantiations. The computational complexity of probabilistic inference is O(n_1^d_max).

Figure 2: The maximum absolute error of the answer to a query about a disease probability for the reduction based on state space aggregation, plotted against the maximum number of diseases and the number of instantiations. The maximum error was found by an exhaustive search over all possible positive finding instantiations. The computational complexity of probabilistic inference is still O(n_1^d_max), as in the state space abstraction method.

The similarity of states is present already in many practical networks. For instance, Figure 4 shows that the majority of the noisy-OR coefficients in the CPCS network are concentrated close to round numbers like 0, 0.2, 0.5, 0.8, or 1, since further precision is not necessary for the diagnostic problem at hand. Besides, our study of the noisy-OR distributions in this network shows that in many cases the coefficients are exactly equal, implying the appropriate redundancy in the joint probability distribution.
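Exactly equal coefficients can be found mechanically: diseases whose noisy-OR coefficient vectors coincide enter eq. (1) interchangeably, so cluster states that differ only by swapping such diseases are natural candidates for aggregation. A small sketch with invented coefficients (the disease names and values are hypothetical, not taken from CPCS):

```python
from collections import defaultdict

coeffs = {                     # disease -> (c_j for each finding); made up
    "d1": (0.2, 0.5, 0.0),
    "d2": (0.2, 0.5, 0.0),     # identical to d1
    "d3": (0.8, 0.0, 0.5),
    "d4": (0.2, 0.5, 0.0),     # identical to d1 and d2
}

# Group diseases by their full coefficient vector; groups of size > 1 are
# interchangeable in the noisy-OR closed form (1).
groups = defaultdict(list)
for d, cs in coeffs.items():
    groups[cs].append(d)
interchangeable = [ds for ds in groups.values() if len(ds) > 1]
print(interchangeable)         # [['d1', 'd2', 'd4']]
```

Such groups expose exact similarity; coefficients that merely cluster near the same round value suggest states that are almost similar and can be forced to be similar with a small approximation error.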
Identification of these similar states, however, is best done by a domain expert. To study the effect of structure on the precision of the state space aggregation and to demonstrate another rule for choosing the set of base states, we built a BN2O network with coefficients drawn randomly from the set of real noisy-OR coefficients in the CPCS network. The presence of noisy-OR coefficients that are close to zero or one (which constitute about 50% of the total number of noisy-OR coefficients in the CPCS network) makes the state aggregation more complex and requires a better algorithm for the selection of similar states. Consider the states with the number of diseases present equal to d_max, as in the previous subsection. A subset of d_max diseases might no longer cause a finding if the coefficients for this subset are close to zero. The conditional probability of the finding is then no longer close to one, and including this state in the set of similar states and altering the corresponding conditional probability of the finding can affect the accuracy of the network.

Table 1: Errors in the CPCS-like BN2O network

    N_b/|X_D|    max abs error    max rel error
    …%           4.4·10^-2        1.…
    10%          6.2·10^-3        5.4·10^-2
    …%           5.8·10^-4        1.0·10^-…
    …%           1.3·10^-4        1.7·10^-3

To cope with this situation, we had to modify the base state selection algorithm for the CPCS-like BN2O network. We consider a state of the cluster node X_D to be a base state if the conditional probability of any finding given this state is bigger than a fixed parameter. The results for this base state selection policy are given in Table 1. For a relative error of 5% we need to account exactly for only 10% of the total number of states, thus reducing the computation time of the diagnosis ten times.

These simple examples show that a large state space of a node can be managed by having many similar states in practical problems, and thus the large sizes of the cliques in the join tree can be managed by introducing similarity between states. Given that the state spaces of the join tree nodes can be very large, we are likely to find exponentially many states that can be aggregated
Figure 3: The combined prior probability of the similar states p(X̃_D) and the maximum relative errors of the posterior disease probabilities p^(1)(d_i) over all possible queries, as a function of d_max. All three curves (state space aggregation error, prior probability of similar states, state space abstraction error) have the same asymptotic behavior. The error in the state space aggregation method is smaller since it partially accounts for the probability mass that is completely ignored in the state space abstraction method.

Figure 4: The distribution (count versus value of the coefficient) of the noisy-OR coefficients in the practical CPCS network. Most coefficients are close to round numbers (0, 0.2, 0.5, 0.8, or 1). If a coefficient is close to zero, the state of the parent node only slightly affects the probability of the child. If a coefficient is close to one, the true state of the parent causes the child to be in the state true also.

into groups, especially if we have some insight into the underlying problem.

6 RELATION TO PREVIOUS WORK

Since BN2O networks are practically important, a few approximate algorithms have been developed specifically for this type of network. Quickscore uses the noisy-OR properties described in Section 3 to rearrange the summation of the joint probability distribution [Heckerman, 1989], making probabilistic inference exponential in the number of positively instantiated nodes rather than in the number of nodes in the first layer, as given by direct triangulation. The TopN algorithm [Henrion, 1991] tries to bound the (ratios of) posterior probabilities for the N most likely diseases by searching in a subspace of the full probability distribution for the first-layer nodes. Stochastic simulation methods [Henrion, 1988] have been specifically extended to sample the joint probability distribution of BN2O networks. The approach taken in this paper differs from the previous ones in that we reduce the complexity of probabilistic inference by making approximations in the knowledge representation, not by making approximations in the inference procedure.
The reduced and full models take the same amount of space for their representation (the number of coefficients needed to completely specify the dependence is exactly the same), but the reduced model produces results of almost the same quality in a polynomial amount of time. On the other hand, our approach is close in spirit to the previously developed TopN and state space abstraction algorithms in that it tries to account for the major probability mass of the joint probability distribution exactly, while making approximations about the rest of the probability mass.

Our method is directly related to the earlier proposed general approach to complexity reduction using sensitivities instead of conditional probabilities [Kozlov and Singh, 1995], and in fact was first derived in terms of sensitivities. In the previous work we suggested reducing the computational complexity of probabilistic inference for general networks by reducing the rank of the sensitivity matrices, averaging out the columns of the sensitivity matrix. It can be shown that assigning the same value to conditional probabilities without changing the prior probabilities of nodes is equivalent to averaging out sensitivity matrix elements over a subset of states. In the case of BN2O networks this averaging reduces to identifying the similar subset of the cluster node X_D and assigning the same conditional probability to all these states. However, the methods based on sensitivities are likely to result in a larger class of complexity reduction methods, particularly for multiply-valued nodes, where the analysis in terms of traditional conditional probabilities is complicated.

7 SUMMARY AND FUTURE WORK

We define the property of similarity of states and use it for model reduction. Two states of a node are similar
if the ratio between the probabilities of the two states remains constant after any instantiation of other nodes in the network. We show that the similarity of states property can be exploited to perform probabilistic inference more efficiently. The computational complexity of probabilistic inference in networks with similar states is determined by the total number of non-similar states instead of the total number of states, and might be polynomial in the size of the network if exponentially many states are similar.

We show a relation between the similarity of states property and the redundancies in the joint probability distribution: the states are similar if and only if the corresponding columns in the joint probability distribution are linearly dependent. We find a generic way of identifying similarity of states and of enforcing, through conditional probabilities, the similarity property on states that we want to make similar. Thus, we can reduce the computation time of probabilistic inference by enforcing the similarity of states in a model. The accuracy of the reduced model is determined by how similar the states already are in the original problem. We show that BN2O models can be readily reduced to a model with exponentially many similar states, and that the reduced model produces results very close to the original model for all queries of practical importance.

The proposed method of complexity reduction is related to the earlier developed TopN [Henrion, 1991] and state space abstraction [Wellman and Liu, 1994] methods. As in the above methods, we also try to account for the major probability mass in the joint probability distribution exactly, but make some approximations about the unaccounted-for probability mass. When the accounted-for probability mass is substantial, all methods produce almost exact results. However, our method produces superior accuracy as it estimates the contribution from the rest of the probability mass, and it performs better on real networks.
The model reduction described in this paper can be readily extended to any other network represented as a cluster tree (a singly-connected Markov network of cluster nodes). The cluster nodes will have exponentially many states, and many of these states are likely to be almost similar. The method can also be extended by building several groups of similar states per cluster node, thus improving the accuracy without much computational overhead. In this paper we have shown a successful application on two BN2O networks: one randomly generated and the other built from the CPCS medical diagnostic network. For the networks we studied, the error can be as little as 5% for the reduced problem while requiring only 10% of the computation time needed by the original problem. Further applications of the new approach are of course necessary, and we are actively pursuing the application to practical belief networks and expert systems.

Acknowledgments

We thank Randy Miller and the University of Pittsburgh for supplying the CPCS network. We also thank Daphne Koller, Malcolm Pradhan, and Lise Getoor for reading the manuscript and for valuable comments, John Hennessy for his support and guidance, and ARPA for financial support under contract no. N C.

References

Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393–405.

Dagum, P. and Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141–153.

D'Ambrosio, B. (1994). Symbolic probabilistic inference in large BN2O networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 128–135.

Heckerman, D. and Breese, J. S. (1994). A new look at causal independence. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 286–292.

Heckerman, D. E. (1989). A tractable inference algorithm for diagnosing multiple diseases. In Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence, pages 174–181.

Heckerman, D. E. (1990).
Probabilistic similarity networks. Networks, 20:607–636.

Henrion, M. (1988). Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In Proceedings of the Second Conference on Uncertainty in Artificial Intelligence, pages 149–163.

Henrion, M. (1991). Search-based methods to bound diagnostic probabilities in very large belief nets. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 142–150.

Kozlov, A. V. and Singh, J. P. (1995). Sensitivities: an alternative to conditional probabilities for Bayesian belief networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 376–385.

Parker, R. C. and Miller, R. A. (1987). Using causal knowledge to create simulated patient cases: the CPCS project as an extension of Internist-1. In Proceedings of the 11th Annual Symposium on Computer Applications in Medical Care, pages 473–480. IEEE Computer Society Press.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pradhan, M., Provan, G., Middleton, B., and Henrion, M. (1994). Knowledge engineering for large belief networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 484–490.

Wellman, M. P. and Liu, C.-L. (1994). State-space abstraction for anytime evaluation of probabilistic networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 567–574.
More informationFormulas for the Determinant
page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use
More informationStatistics II Final Exam 26/6/18
Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the
More information