Accepted for the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96) August 1-3, 1996, Portland, Oregon, USA

Computational complexity reduction for BN2O networks using similarity of states

Alexander V. Kozlov
Department of Applied Physics, Stanford University, Stanford, CA

Jaswinder Pal Singh
Department of Computer Science, Princeton University, Princeton, NJ
e-mail: jps@cs.princeton.edu

Abstract

Although probabilistic inference in a general Bayesian belief network is an NP-hard problem, computation time for inference can be reduced in most practical cases by exploiting domain knowledge and by making approximations in the knowledge representation. In this paper we introduce the property of similarity of states and a new method for approximate knowledge representation and inference which is based on this property. We define two or more states of a node to be similar when the ratio of their probabilities, the likelihood ratio, does not depend on the instantiations of the other nodes in the network. We show that the similarity of states exposes redundancies in the joint probability distribution which can be exploited to reduce the computation time of probabilistic inference in networks with multiple similar states, and that the computational complexity in networks with exponentially many similar states might be polynomial. We demonstrate our ideas on the example of a BN2O network (a two-layer network often used in diagnostic problems) by reducing it to a very close network with multiple similar states. We show that the answers to practical queries converge very fast to the answers obtained with the original network. The maximum error is as low as 5% for models that require only 10% of the computation time needed by the original BN2O model.

1 INTRODUCTION

A Bayesian belief network is a directed acyclic graph (DAG) whose nodes represent random variables and whose edges represent dependences between the random variables. Belief networks are used for knowledge representation in diagnostic and forecasting software systems. Belief networks allow the user to answer queries about the probabilities of the states of one or several nodes, called query nodes, conditioned on other nodes, called evidence nodes. The process of finding these conditional probabilities is called probabilistic inference. Probabilistic inference is NP-hard for a general network with an arbitrary structure [Cooper, 1990]. Furthermore, even approximating inference in a general belief network is NP-hard [Dagum and Luby, 1993]. However, knowledge of the problem domain can help to reduce the computation time of probabilistic inference. For example, Heckerman showed that probabilistic inference is linear in two-layer networks with noisy-OR interaction between nodes (BN2O networks) for negative evidence about nodes [Heckerman, 1989], and Heckerman and Breese showed that the noisy-OR interaction between nodes can be further simplified to reduce the number of parents per node, which reduces computational complexity for networks with special structure [Heckerman and Breese, 1994]. Thus, the computational complexity of probabilistic inference can be managed in special cases. In this paper we propose a new way of simplifying probabilistic inference in belief networks based on the property of similarity of states. Two or more states of a node are similar when the likelihood ratio does not depend on the instantiations of the other nodes in the network. The probability of one of these states determines the probabilities of all states similar to it. If a model contains states that are almost similar, we can force the states to be similar and make probabilistic inference less computationally expensive. If we can make exponentially many states similar, the resulting computational complexity of probabilistic inference in the new model is polynomial in the size of the network.
We call the new method state space aggregation since we explicitly aggregate states into groups and then perform probabilistic inference with them as a group. We demonstrate the new method on two examples of BN2O networks: one is a randomly generated BN2O network and the other is a BN2O network with noisy-OR coefficients randomly chosen from the practically important CPCS medical diagnostic network (the original CPCS network was constructed out of a Computer-based Patient Case Simulation database by R. Parker and R. Miller [Parker and Miller, 1987]).¹ We first convert a BN2O network to a cluster tree. A general form of the cluster tree for the BN2O network is a "Naive" Bayesian classifier with one large node representing all nodes in the first layer and many children representing nodes in the second layer. Although the resulting cluster node has exponentially many states, we can aggregate most of these states into a group of similar states. We then perform probabilistic inference with these states as a group. We show that the resulting model provides a good estimate of the probabilistic inference results as compared to the original BN2O model and is better than the state space abstraction method [Wellman and Liu, 1994] used for approximate inference.

The paper is organized as follows. In Section 2 we clarify the notations and conventions we use throughout the paper. In Section 3 we introduce BN2O networks and review the approaches that make probabilistic inference in them tractable. In Section 4 we define the property of similarity and develop our approach, based on the modification of the original network to make a large subset of states similar. In Section 5 we demonstrate our ideas on the example of our randomly generated BN2O networks. In Section 6 we discuss the relation of the current technique to previous work. Finally, we conclude in Section 7.

¹The CPCS network is actually not quite a two-layer network. While several BN2O networks are similar to CPCS and are used in practice, they are proprietary and were not accessible for this paper.

2 NOTATIONS

In this paper, we use a special notation for a set of simple nodes in the original network. While we denote an individual simple node by a small letter x, with a possible subscript when further distinction is required, we denote a set of nodes by a capital letter X, again with a possible subscript. We call a set of simple nodes a cluster node, since we can always represent a set of nodes as a single node which takes values from an expanded set of values: the set obtained by taking a direct product of the sets of values of the original nodes. A superscript of X, if it appears, denotes a particular state of the cluster node (i.e. a particular combination of states of the simple nodes in the set). We denote the number of simple nodes in the cluster node as N(X) and the size of the state space as |X|. In the networks considered in this paper (BN2O networks) all individual nodes are binary, i.e. can take only the two values false and true. For the binary nodes, we assume that if x is in the state false, then the variable x is zero, and if x is in the state true, then the variable x is one. We use the short form p(\bar{x}) to denote p(x = false) and p(x) to denote p(x = true). For the cluster node, we use the short form p(X^s) to denote p(X = X^s). Before we introduce the similarity of states property, let us consider BN2O networks further.
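As a concrete illustration of the cluster-node notation (this sketch is our addition; the node names are hypothetical), the state space of a cluster node over binary simple nodes is the direct product of the simple nodes' state sets:

```python
from itertools import product

# A cluster node X over binary simple nodes takes values from the direct
# product of the simple nodes' state sets, so N(X) nodes give |X| = 2^N(X).
simple_nodes = ["x1", "x2", "x3"]
cluster_states = list(product([False, True], repeat=len(simple_nodes)))

N_X = len(simple_nodes)        # N(X) = 3
size_X = len(cluster_states)   # |X| = 2**3 = 8
print(N_X, size_X, cluster_states[5])   # one state X^s: (True, False, True)
```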
3 BN2O NETWORKS

A BN2O network is a two-layer network consisting of binary nodes with a noisy-OR dependence between the nodes in the first and the second layers. Let us take a medical diagnostic network as an example. In this network, the nodes in the first layer are diseases and the nodes in the second layer are symptoms (called findings in the medical literature). The nodes in the first layer (diseases) have one or several children in the second layer (findings). The noisy-OR interaction between the diseases and the findings describes a causal independence assumption, i.e. that the ability of any single disease to cause a given symptom does not depend on the presence of the other diseases [Pearl, 1988].

A noisy-OR dependence between a finding node x_f and its n parent disease nodes x_{d_j} can be characterized by (n + 1) real numbers from the interval [0, 1]: a leak and n coefficients. The leak, which we denote Leak(f), is the probability of the finding node in the absence of any of the diseases described by the network. A coefficient, which we denote c_j, describes the ability of a disease d_j to cause an increase in the probability p(f) of the finding f. More precisely, the probability of the finding being absent, p(\bar{f}), is multiplied by (1 - c_j) each time a parent x_{d_j} of the node x_f changes its state from false to true. We can write the noisy-OR conditional probability in a closed form:

    p(\bar{f} | x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = [1 - Leak(f)] \prod_j [1 - c_j x_{d_j}],    (1)

    p(f | x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = 1 - [1 - Leak(f)] \prod_j [1 - c_j x_{d_j}],    (2)

where we use our convention that x_{d_j} is zero if the node x_{d_j} is in the state false and one if it is in the state true. Trivially, if c_j is zero, the state of the parent does not affect the probability of the child, and if c_j is one, the true state of the parent forces the child to be true with probability one. An extension of the noisy-OR interaction to multiply-valued nodes is possible and is called noisy-MAX [Pradhan et al., 1994]. In the noisy-MAX interaction, relations identical to (1) and (2) hold for the combined probability of the first k states of the child node. BN2O networks have been an object of study for a long time due to their potential applicability and due to the existence of a compact form for the noisy-OR conditional probabilities. Knowledge acquisition for a BN2O network is also simplified, since an expert has to assess only a small number of parameters, linear in the number of parents, to completely characterize the dependence.
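A minimal Python sketch of Eqs. (1) and (2) (our addition; the leak and coefficient values are illustrative, not from the paper):

```python
import numpy as np

def p_finding_true(parents, coeffs, leak):
    """p(f | x_d1..x_dn) for binary parent states (0/1), per Eqs. (1)-(2)."""
    parents = np.asarray(parents)
    p_absent = (1.0 - leak) * np.prod(1.0 - coeffs * parents)  # Eq. (1)
    return 1.0 - p_absent                                      # Eq. (2)

coeffs = np.array([0.9, 0.3, 0.0])   # c_j: per-disease strengths
print(p_finding_true([0, 0, 0], coeffs, leak=0.02))  # only the leak: 0.02
print(p_finding_true([1, 1, 0], coeffs, leak=0.02))  # 1 - 0.98*0.1*0.7
```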

Also, probabilistic inference with BN2O networks has polynomial complexity in some special cases. For example, for negative findings the nodes in the first layer remain conditionally independent, and the disease probabilities can be obtained by summation of the probability distribution for the disease nodes:

    p(x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = \prod_i [1 - Leak(f_i)] \prod_j [1 - c_{ij} x_{d_j}] p(x_{d_j}),    (3)

where the first product is over the negatively instantiated findings. The computational complexity of probabilistic inference with negative evidence is linear in the size of the network, since the above probability distribution (3) is easily decomposable into factors of the form \prod_i [1 - c_{ik} x_{d_k}] p(x_{d_k}). The new probabilities of the disease d_k after instantiation are:

    p^{(1)}(\bar{d}_k) = p(\bar{d}_k) / { p(\bar{d}_k) + \prod_i [1 - c_{ik}] p(d_k) },

    p^{(1)}(d_k) = \prod_i [1 - c_{ik}] p(d_k) / { p(\bar{d}_k) + \prod_i [1 - c_{ik}] p(d_k) }.

Inference for positive evidence about nodes is more involved. We have to evaluate sums over:

    p(x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = \prod_i { 1 - [1 - Leak(f_i)] \prod_j [1 - c_{ij} x_{d_j}] } p(x_{d_j}),

where the first product is over the positively instantiated findings. The probability of each of the diseases cannot be taken out of the summation easily for this sum, and we have to evaluate it by expanding the outermost product in the above expression into a sum of products:

    p(x_{d_1}, x_{d_2}, \ldots, x_{d_n}) = \sum_{X_E} (-1)^{\sum_i x_{f_i}} \prod_{i: x_{f_i} = 1} [1 - Leak(f_i)] \prod_j [1 - c_{ij} x_{d_j}] p(x_{d_j}),    (4)

where we sum over all possible instantiations of the cluster node X_E consisting of the evidence nodes. Each of the terms in the sum (4) has a structure identical to (3), and its contribution to the disease probabilities can be computed in linear time. However, the total number of terms in the sum is exponential in the number of evidence nodes, and the computational complexity of probabilistic inference with positive evidence is exponential in the number of evidence nodes [Heckerman, 1989]. The same complexity results can be obtained by considering topological transformations of networks with noisy-OR interactions [Heckerman and Breese, 1994]. The transformation can reduce the number of parents for a node in a network in special cases and is computationally equivalent to the above decomposition of the joint probability sums (4).

Beyond these simplifications, probabilistic inference in a general BN2O network (as well as in the practical QMR-DT network [D'Ambrosio, 1994]) is computationally intractable for general positive evidence. In the practical QMR-DT network, users have to apply heuristic search or stochastic simulation methods to obtain approximate results. These methods are unpredictable and sometimes fail to produce a satisfactory error bound on the result in time-critical situations. In this case, we need to simplify the original model so that it deterministically produces a satisfactory answer in a known, fixed amount of time. Let us now introduce the similarity of states, which we will use later to construct a model with polynomial complexity that is very close to the BN2O model.

4 SIMILARITY OF STATES

In this section, we introduce the similarity of states property, which we use later for our model reduction. The similarity of states is a form of independence in the network. It exposes a redundancy in the joint probability distribution and therefore can be used to make probabilistic inference faster.

Definition 4.1: The states X^s and X^{s'} of a node X are similar with respect to a node X_j if the ratio of the probabilities p(X^s)/p(X^{s'}) is invariant with respect to any instantiation of the node X_j:

    p^{(k)}(X^s) / p^{(k)}(X^{s'}) = p(X^s) / p(X^{s'}) = \mathrm{const}    (5)

for all instantiations X_j^k of the node X_j. We call two states of a node simply similar if the two states are similar with respect to all the other nodes.
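A small sketch of Definition 4.1 in Python (our addition; the joint distribution below is a made-up toy table): two states of X are similar with respect to X_j exactly when their probability ratio survives every instantiation of X_j.

```python
import numpy as np

# Toy joint distribution D[q, p] = p(X_j^q, X^p) (made-up numbers).
D = np.array([[0.10, 0.20, 0.12],
              [0.15, 0.30, 0.13]])

def similar(D, p1, p2, tol=1e-9):
    """Check that the ratio p(X^p1)/p(X^p2) is invariant across rows."""
    ratios = []
    for q in range(D.shape[0]):        # positive instantiation X_j = X_j^q
        row = D[q] / D[q].sum()        # renormalized p(X | X_j = X_j^q)
        ratios.append(row[p1] / row[p2])
    return np.ptp(ratios) < tol        # all ratios equal -> similar

print(similar(D, 0, 1))   # True:  column 0 is proportional to column 1
print(similar(D, 0, 2))   # False: the ratio changes between the rows
```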
The probability of one of the similar states determines the probabilities of the others. Let us take an example of car diagnosis. Given that a car doesn't start, the fact that the fuel tank is full increases the probability that one of the spark plugs doesn't work. However, we can treat the probabilities that each one of the spark plugs failed as similar: unless we look under the hood, the probability that one of the spark plugs has failed determines the probabilities of the failure of any other spark plug, and the likelihood ratio of the spark plug failure probabilities stays the same. This independence information was brilliantly used for the construction of similarity networks [Heckerman, 1990]. A similarity network is a construction consisting of a similarity graph and a collection of local knowledge maps corresponding to each edge in the similarity graph. Similarity networks were developed to simplify the construction of large and complex belief networks. They are a result of recognizing "specific forms of conditional independence" and developing special representations for them that simplify the knowledge acquisition. To build a similarity network, we first pick a distinguished node representing the hypotheses to be chosen from (for example, in a medical diagnosis problem, the hypotheses are the diseases). A node in a similarity graph is a hypothesis, and an edge indicates two hypotheses that are likely to be confused by an expert. For each such pair of hypotheses, we build a local knowledge map. A local knowledge map is a belief network for distinguishing between these two hypotheses. By focusing on constructing local knowledge maps, a person can concentrate on one manageable portion of the modeling task at a time.

Our goal, on the other hand, is to simplify probabilistic inference in complex belief networks. We do this by identifying redundancies in the joint probability distribution. The redundancy considered in this paper is the similarity of states and is related to the same "specific forms of conditional independence" as the similarity networks developed earlier. The local knowledge maps constructed by a knowledge engineer might, in fact, be used for identifying similar states: a local knowledge map would contain all nodes with respect to which the pair of hypotheses is not similar. In our definition, the concept of similarity of states is more general and can be applied to any node in the network, not just the distinguished node representing the hypotheses to be chosen from. We demonstrate how this concept can be applied for model reduction in the case of BN2O networks. The following theorem shows the redundancy in the joint probability distribution introduced by the similarity of states:

Theorem 4.2: The states X^s and X^{s'} of a node X are similar with respect to a node X_j if and only if the columns of the conditional probability matrix (P_j)_{qp} = p(X_j^q | X^p) corresponding to these two states s and s' are identical, if and only if the columns of the probability distribution matrix (D_j)_{qp} = p(X_j^q, X^p) corresponding to the states s and s' are linearly dependent.

The proof of the theorem is easy, and follows from the decomposition of the probability distribution for the two nodes into a product (D_j)_{qp} = p(X_j^q | X^p) p(X^p). In this form, any positive instantiation of the node X_j in the state X_j^k can be represented as removing all rows except k from the matrix D_j. After the remaining row is normalized, the probability p(X^p) can be read off from this row. Any negative instantiation of the node X_j in the state X_j^k can be represented as removing the row k from the matrix D_j. After the remaining probability distribution is normalized, the probability p(X^p) can be obtained by summing all numbers in the p-th column. The proof is obtained by considering all possible instantiations of X_j. Thus, the similarity of states uncovers a redundancy in the joint probability distribution. In linear algebra terms, two or several columns of the matrix representing the joint probability distribution are linearly dependent if the corresponding states are similar. If the columns are close to linearly dependent, we can approximate the joint probability distribution to make the states similar in order to simplify probabilistic inference. The theorem shows how we can introduce the similarity of states via conditional probabilities: we aggregate some of the states with almost identical conditional probability matrix columns and force them to be similar by assigning the same column to every one of these states. Although for a general joint probability distribution the computational complexity of probabilistic inference is linear in the total number of states per any given node, the computational complexity of probabilistic inference with similar states is linear only in the total number of states that are not similar. By constructing models with exponentially many similar states we can reduce computational complexity from exponential to polynomial in some networks.
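To make this enforcement step concrete, here is a small sketch (our addition; all numbers are hypothetical) that assigns one shared conditional-probability column, chosen as the prior-weighted average, to a group of almost-identical columns, which makes the grouped states exactly similar while preserving the marginal of X_j:

```python
import numpy as np

P = np.array([[0.70, 0.68, 0.20],          # P[q, p] = p(X_j^q | X^p)
              [0.30, 0.32, 0.80]])
prior = np.array([0.5, 0.3, 0.2])          # p(X^p)
group = [0, 1]                             # almost-identical columns

w = prior[group] / prior[group].sum()
shared = P[:, group] @ w                   # prior-weighted average column
P_forced = P.copy()
P_forced[:, group] = shared[:, None]       # now columns 0 and 1 coincide

# The marginal p(X_j) is preserved by the prior-weighted choice:
print(P @ prior, P_forced @ prior)         # both give the same p(X_j)
```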
In the next section we show that the precision of the reduced model as compared to the original model can be quite satisfactory in these cases.

5 EXAMPLE OF STATE SPACE AGGREGATION

We demonstrate the application of our state space aggregation method on the example of BN2O networks. We assume that a BN2O network has n_1 binary nodes in the first layer and n_2 binary nodes in the second layer, and that every node x_{d_i} in the first layer is connected to every node x_{f_j} in the second layer.² First, we describe our procedure and then compare the results for our reduced model to the results for a full BN2O network.

²Although sparse interconnection reduces the applicability of this method compared to the methods based on topological decomposition [D'Ambrosio, 1994; Heckerman and Breese, 1994], we will show that state space aggregation produces satisfactory results even for sparse BN2O networks. A combination of the topological method and methods based on state space aggregation is definitely possible but not considered in this paper.

5.1 FORMALISM

We proceed by combining all nodes in the first layer into one large cluster node X_D representing all possible diseases and their combinations. Node X_D has an exponential number of states, 2^{n_1}, and we hope that some of these states can be made similar. We therefore partition the 2^{n_1} states into two subsets: one is the subset of N_b base states, which we denote as S_b, and the other is the subset of N_s = |X_D| - N_b similar states, which we denote as S_s. We will exploit different strategies for choosing the subset of states which we force to be similar (see subsections 5.2 and 5.3). According to the definition of similar states (5), the contribution to the disease probabilities from the set of similar states S_s is a constant factor times the combined probability of the similar states p(\tilde{X}_D). The posterior probability of a disease is then computed as:

    p^{(1)}(d_i) = \sum_{X_D^s \in S_b} p^{(1)}(X_D^s) x_{d_i} + \alpha(d_i) p^{(1)}(\tilde{X}_D),

where the first term is the contribution to the disease probability from the base states and the second is the contribution to the disease probability from the similar states. The coefficients \alpha(d_i) can be obtained from the prior probabilities:

    \alpha(d_i) = [ p(d_i) - \sum_{s \in S_b} p(X_D^s) x_{d_i} ] / [ 1 - \sum_{s \in S_b} p(X_D^s) ],

and are computed in linear time given the conditionally independent first-layer nodes in the original network. The conditional probabilities for the base states match the conditional probabilities of the corresponding states in the original BN2O model. The conditional probability for the similar states, which has to be identical for every similar state, is chosen to preserve the prior probabilities of the findings:

    p(f_j | \tilde{X}_D) = [ p(f_j) - \sum_{s \in S_b} p(f_j | X_D^s) p(X_D^s) ] / [ 1 - \sum_{s \in S_b} p(X_D^s) ].    (6)

The last equation is an application of Bayes' rule for a finding node x_{f_j} and the aggregated similar state. As we can see, the computation to transform the model to a reduced model involves simple summations over the base states. The computational requirements in this state aggregation model are thus the same as in the state space abstraction model [Wellman and Liu, 1994], in which the answer to a query is inferred by summing over the base states only and completely ignoring the rest of the states. Our model accounts for some of the ignored probability mass via the coefficients \alpha(d_i). Let us now see how the state space aggregation model helps to increase the precision of probabilistic inference in BN2O networks.

5.2 RANDOMLY GENERATED BN2O NETWORKS

To demonstrate how the state aggregation model can help improve the precision of the model and reduce the computation time of probabilistic inference, we first generated a BN2O network with randomly chosen noisy-OR coefficients drawn from a Beta(2, 4) distribution (with the expected value \langle c_{ij} \rangle = 1/3). If a large number of nodes in the first layer are in the state true, we expect that the probability of any finding is close to one. Thus, we make similar all states of the cluster node X_D in which the number of diseases present is larger than some fixed number d_max:

    X_D^s \in S_s if \sum_{i=1}^{N(X_D)} x_{d_i} > d_max.

The number of base states in this case is polynomial in the number of first-layer nodes:

    N_b = \sum_{d=0}^{d_max} \binom{n_1}{d} \sim n_1^{d_max}

for d_max > 1, and the reduction of the original BN2O model to the model with the aggregated similar states has polynomial complexity. First, we analyzed the maximum absolute error for queries about the probability of each of the diseases (nodes in the first layer) given different instantiations with different numbers of positive findings (nodes in the second layer). The results of the simulations for the BN2O network consisting of 18 nodes in the first layer and 18 nodes in the second layer are presented in Figures 1 and 2 (for the state space abstraction and the state space aggregation models, respectively). The error in the state space aggregation model is much smaller (about an order of magnitude for high d_max) than the error in the state space abstraction model, in which the set S_s is completely disregarded. Also, the error for the instantiations with a large number of findings present (the region where probabilistic inference is computationally very expensive) is almost independent of the number of diseases present. Since the errors introduced by the instantiations of different findings can be considered independent, we expect the maximum error to be O(\sqrt{N(X_E)}), i.e. to grow as the square root of the number of instantiated nodes.
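The following Python sketch (our addition; the toy model's sizes, priors, leak, and coefficients are all made up) carries out the reduction for the d_max rule: it enumerates the base states, computes the coefficients \alpha(d_j), and applies Eq. (6) to obtain the shared conditional probability of the aggregated state:

```python
import numpy as np
from itertools import product

# Toy BN2O model: n1 diseases, 3 findings, uniform priors and coefficients.
n1, d_max = 4, 1
prior = np.full(n1, 0.1)                    # p(d_j)
leak = 0.01
c = np.full((3, n1), 0.3)                   # c[i, j] for 3 findings

def p_state(s):                             # prior of a cluster state of X_D
    s = np.asarray(s)
    return np.prod(np.where(s == 1, prior, 1.0 - prior))

# Base states: at most d_max diseases present; the rest are aggregated.
base = [s for s in product([0, 1], repeat=n1) if sum(s) <= d_max]
p_base = np.array([p_state(s) for s in base])
p_similar = 1.0 - p_base.sum()              # mass of the aggregated state

# alpha(d_j): the share of disease j's prior living in the similar states.
in_base = np.array([[s[j] for j in range(n1)] for s in base])
alpha = (prior - p_base @ in_base) / p_similar

# Eq. (6): one shared conditional probability for the aggregated state,
# chosen so that the prior probability of each finding is preserved.
pf_given_s = np.array([[1 - (1 - leak) * np.prod(1 - c[i] * np.asarray(s))
                        for s in base] for i in range(c.shape[0])])
pf_prior = np.array([
    sum(p_state(s) * (1 - (1 - leak) * np.prod(1 - c[i] * np.asarray(s)))
        for s in product([0, 1], repeat=n1))
    for i in range(c.shape[0])])
pf_given_similar = (pf_prior - pf_given_s @ p_base) / p_similar
print(alpha, pf_given_similar)
```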
For our network, the maximum absolute error is less than 0.005 for d_max > 5 over all possible instantiations of the nodes in the second layer.

Second, we analyzed the behavior of the maximum relative error in the above network. The relative error, as opposed to the absolute error, might be more important for some problems. For example, the probability of a life-threatening disease being 10^{-3} is substantially better than the probability of it being 10^{-2}, and a relative error of 10 shows this more clearly than the corresponding absolute error. Figure 3 shows the maximum relative errors for the above models over all possible instantiations of the second-layer nodes. The error in the state space aggregation method is about an order of magnitude lower than in the state space abstraction method for high d_max. Our method gives superior precision, as it partially accounts for the states completely ignored in the state space abstraction method. The error decreases together with the combined prior probability of the similar states (i.e. the probability before any instantiation), which is shown on the same plot. The maximum relative error is less than 0.01 for d_max > 6 over all possible instantiations of the nodes in the second layer.

[Figure 1: The maximum absolute error of the answer to a query about a disease probability for the state space abstraction method adopted from [Wellman and Liu, 1994]. The maximum error was found by an exhaustive search over all possible positive finding instantiations. The computational complexity of probabilistic inference is O(n_1^{d_max}). Axes: max diseases, num instantiations; vertical axis: max absolute error.]

[Figure 2: The maximum absolute error of the answer to a query about a disease probability for the reduction based on state space aggregation. The maximum error was found by an exhaustive search over all possible positive finding instantiations. The computational complexity of probabilistic inference is still O(n_1^{d_max}), as in the state space abstraction method. Axes: max diseases, num instantiations; vertical axis: max absolute error.]

5.3 CPCS-LIKE NETWORKS

Although the above generated networks do not have the structure that real practical networks have, the state space aggregation method can be extended to practical problems given some insight into the problem domain. For example, the rule for selecting the base states above can be formulated in the domain language: if the number of diseases present is large, then with a high probability a patient has any imaginable finding present. Thus, the states are almost similar already, and we do not change the conditional probabilities much by forcing them to be similar. We can also argue that the cases with more than a certain number of diseases present occur rarely in practice and are not important for diagnosis (have a low combined probability mass). Although the validity of these specific rules might be arguable in the medical domain, rules of this type can definitely lead to state space aggregation and simplification of probabilistic inference. The similarity of states is present already in many practical networks. For instance, Figure 4 shows that the majority of the noisy-OR coefficients in the CPCS network are concentrated close to round numbers like 0, 0.2, 0.5, 0.8, or 1, since further precision is not necessary for the diagnostic problem at hand. Besides, our study of the noisy-OR distributions in this network shows that in many cases the coefficients are exactly equal, implying a corresponding redundancy in the joint probability distribution. Identification of these similar states, however, is best done by a domain expert.

To study the effect of structure on the precision of the state space aggregation and to demonstrate another rule for choosing the set of base states, we built a BN2O network with coefficients drawn randomly from the set of real noisy-OR coefficients in the CPCS network. The presence of noisy-OR coefficients that are close to zero or one (which constitute about 50% of the total number of noisy-OR coefficients in the CPCS network) makes the state aggregation more complex and requires a better algorithm for the selection of similar states. Consider the states with the number of diseases present equal to d_max, as in the previous subsection. A subset of d_max diseases might no longer cause a finding if the coefficients for this subset are close to zero. The conditional probability of the finding is then no longer close to one, and including this state in the set of similar states and altering the corresponding conditional probability of the finding can affect the accuracy of the network.

To cope with this situation, we had to modify the base state selection algorithm for the CPCS-like BN2O network. We consider a state of the cluster node X_D to be a base state if the conditional probability of any finding being absent given this state is bigger than a fixed parameter. The results for this base state selection policy are given in Table 1.

Table 1: Errors in the CPCS-like BN2O network

    N_b/|X_D|    max abs error    max rel error
    ...%         4.4 x 10^{-2}    1....
    ...%         6.2 x 10^{-3}    5.4 x 10^{-...}
    ...%         5.8 x 10^{-4}    1.0 x 10^{-...}
    ...%         1.3 x 10^{-4}    1.7 x 10^{-3}
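A sketch of this selection policy as we read it (our addition; the threshold and the coefficient values are made up): a state stays a base state when some finding would still be absent with probability above the threshold, i.e. the state is not one in which every finding is nearly certain.

```python
import numpy as np
from itertools import product

def select_base_states(c, leak, threshold=0.05):
    """Base states: cluster states with some finding-absent prob > threshold."""
    n_findings, n1 = c.shape
    base = []
    for s in product([0, 1], repeat=n1):
        # p(finding i absent | state s) per Eq. (1), for all findings at once
        p_absent = (1 - leak) * np.prod(1 - c * np.asarray(s), axis=1)
        if p_absent.max() > threshold:      # some finding not near-certain
            base.append(s)
    return base

rng = np.random.default_rng(1)
c = rng.choice([0.0, 0.2, 0.5, 0.8, 1.0], size=(3, 5))  # CPCS-like values
print(len(select_base_states(c, leak=0.01)))
```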
For a relative error of 5% we need to account exactly for only 10% of the total number of states, thus reducing the computation time of the diagnosis ten times. These simple examples show that a large state space of a node can be managed by having many similar states in practical problems, and thus the large sizes of the cliques in the join tree can be managed by introducing similarity between states. Given that the state spaces of the join tree nodes can be very large, we are likely to find exponentially many states that can be aggregated into groups, especially if we have some insight into the underlying problem.

[Figure 3: The combined prior probability of the similar states p(\tilde{X}_D) and the maximum relative errors |\Delta p^{(1)}(d_i) / p^{(1)}(d_i)| of the posterior disease probabilities over all possible queries as a function of d_max. All three curves (state space aggregation error, prior probability of similar states, state space abstraction error) have the same asymptotic behavior. The error in the state space aggregation method is smaller since it partially accounts for the probability mass that is completely ignored in the state space abstraction method.]

[Figure 4: The distribution of the noisy-OR coefficients in the practical CPCS network (x-axis: value of the coefficient; y-axis: count). Most coefficients are close to round numbers (0, 0.2, 0.5, 0.8, or 1). If a coefficient is close to zero, the state of the parent node only slightly affects the probability of the child. If a coefficient is close to one, the true state of the parent causes the child to be in the state true also.]

6 RELATION TO PREVIOUS WORK

Since BN2O networks are practically important, a few approximate algorithms have been developed specifically for this type of network. The Quickscore algorithm uses the noisy-OR properties described in Section 3 to rearrange the summation of the joint probability distribution [Heckerman, 1989], making probabilistic inference exponential in the number of positively instantiated nodes rather than in the number of nodes in the first layer, as given by direct triangulation. The TopN algorithm [Henrion, 1991] tries to bound the (ratios of) posterior probabilities for the N most likely diseases by searching in a subspace of the full probability distribution for the first-layer nodes. Stochastic simulation methods [Henrion, 1988] have been specifically extended to sample the joint probability distribution of BN2O networks.

The approach taken in this paper differs from the previous ones in that we reduce the complexity of probabilistic inference by making approximations in the knowledge representation, not by making approximations in the inference procedure. The reduced and full models take the same amount of space for their representation (the number of coefficients needed to completely specify the dependence is exactly the same), but the reduced model produces results of almost the same quality in a polynomial amount of time. On the other hand, our approach is close in spirit to the previously developed TopN and state space abstraction algorithms in that it tries to account for the major probability mass of the joint probability distribution exactly, while making approximations about the rest of the probability mass. Our method is directly related to the general approach to complexity reduction proposed earlier that uses sensitivities instead of conditional probabilities [Kozlov and Singh, 1995], and in fact was first derived in terms of sensitivities. In the previous work we suggested reducing the computational complexity of probabilistic inference for general networks by reducing the rank of the sensitivity matrices by averaging out the columns of the sensitivity matrix. It can be shown that assigning the same value to conditional probabilities without changing the prior probabilities of nodes is equivalent to averaging out sensitivity matrix elements over a subset of states. In the case of BN2O networks this averaging reduces to identifying the similar subset of the cluster node X_D and assigning the same conditional probability to all of these states. However, the methods based on sensitivities are likely to result in a larger class of complexity reduction methods, particularly for multiply-valued nodes, where the analysis in terms of traditional conditional probabilities is complicated.
7 SUMMARY AND FUTURE WORK

We define the property of similarity of states and use it for model reduction. Two states of a node are similar if the ratio between the probabilities of the two states remains constant after any instantiation of other nodes in the network. We show that the similarity of states property can be exploited to perform probabilistic inference more efficiently. The computational complexity of probabilistic inference in networks with similar states is determined by the total number of non-similar states instead of the total number of states, and might be polynomial in the size of the network if exponentially many states are similar. We show a relation between the similarity of states property and the redundancies in the joint probability distribution: the states are similar if and only if the corresponding columns in the joint probability distribution are linearly dependent. We find a generic way of identifying similarity of states and enforcing the similarity property, through conditional probabilities, on states that we want to make similar. Thus, we can reduce the computation time of probabilistic inference by enforcing the similarity of states in a model. The accuracy of the reduced model is determined by how similar the states are in the original problem already. We show that BN2O models can be readily reduced to a model with exponentially many similar states, and that the reduced model produces results very close to the original model for all queries of practical importance.

The proposed method of complexity reduction is related to the earlier developed TopN [Henrion, 1991] and state space abstraction [Wellman and Liu, 1994] methods. As in the above methods, we also try to account for the major probability mass in the joint probability distribution exactly, but make some approximations about the unaccounted-for probability mass. When the accounted-for probability mass is substantial, all methods produce almost exact results. However, our method produces superior accuracy, as it estimates the contribution from the rest of the probability mass, and it performs better on real networks. The model reduction described in this paper can be readily extended to any other network represented as a cluster tree (a singly-connected Markov network of cluster nodes). The cluster nodes will have exponentially many states, and many of these states are likely to be almost similar. The method can also be extended by building several groups of similar states per cluster node, thus improving the accuracy without much computation overhead. In this paper we have shown a successful application on two BN2O networks: one randomly generated and the other built based on the CPCS medical diagnostic network. For the network we studied, the error can be as little as 5% for the reduced problem while requiring only 10% of the computation time needed by the original problem. Further applications of the new approach are of course necessary, and we are actively pursuing the application to practical belief networks and expert systems.

Acknowledgments

We thank Randy Miller and the University of Pittsburgh for supplying the CPCS network. We also thank Daphne Koller, Malcolm Pradhan, and Lise Getoor for reading the manuscript and valuable comments, John Hennessy for his support and guidance, and ARPA for financial support under contract no. N...C...

References

Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405.

Dagum, P. and Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141-153.

D'Ambrosio, B. (1994). Symbolic probabilistic inference in large BN2O networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 128-135. Morgan Kaufmann.

Heckerman, D. and Breese, J. S. (1994). A new look at causal independence. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 286-292. Morgan Kaufmann.

Heckerman, D. E. (1989). A tractable inference algorithm for diagnosing multiple diseases. In Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence, pages 174-181.

Heckerman, D. E. (1990). Probabilistic similarity networks. Networks, 20:607-636.

Henrion, M. (1988). Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In Proceedings of the Second Conference on Uncertainty in Artificial Intelligence, pages 149-163.

Henrion, M. (1991). Search-based methods to bound diagnostic probabilities in very large belief nets. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 142-150.

Kozlov, A. V. and Singh, J. P. (1995). Sensitivities: an alternative to conditional probabilities for Bayesian belief networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 376-385. Morgan Kaufmann.

Parker, R. C. and Miller, R. A. (1987). Using causal knowledge to create simulated patient cases: the CPCS project as an extension of Internist-1. In Proceedings of the 11th Annual Symposium on Computer Applications in Medical Care, pages 473-480. IEEE Computer Society Press.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Pradhan, M., Provan, G., Middleton, B., and Henrion, M. (1994). Knowledge engineering for large belief networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 484-490. Morgan Kaufmann.

Wellman, M. P. and Liu, C.-L. (1994). State-space abstraction for anytime evaluation of probabilistic networks. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 567-574. Morgan Kaufmann.


More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

A New Evolutionary Computation Based Approach for Learning Bayesian Network

A New Evolutionary Computation Based Approach for Learning Bayesian Network Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 4026 4030 Advanced n Control Engneerng and Informaton Scence A New Evolutonary Computaton Based Approach for Learnng Bayesan Network Yungang

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1 C/CS/Phy9 Problem Set 3 Solutons Out: Oct, 8 Suppose you have two qubts n some arbtrary entangled state ψ You apply the teleportaton protocol to each of the qubts separately What s the resultng state obtaned

More information

THE SUMMATION NOTATION Ʃ

THE SUMMATION NOTATION Ʃ Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016

CS : Algorithms and Uncertainty Lecture 17 Date: October 26, 2016 CS 29-128: Algorthms and Uncertanty Lecture 17 Date: October 26, 2016 Instructor: Nkhl Bansal Scrbe: Mchael Denns 1 Introducton In ths lecture we wll be lookng nto the secretary problem, and an nterestng

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry

Workshop: Approximating energies and wave functions Quantum aspects of physical chemistry Workshop: Approxmatng energes and wave functons Quantum aspects of physcal chemstry http://quantum.bu.edu/pltl/6/6.pdf Last updated Thursday, November 7, 25 7:9:5-5: Copyrght 25 Dan Dll (dan@bu.edu) Department

More information

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek

Discussion of Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients Ed Stanek Dscusson of Extensons of the Gauss-arkov Theorem to the Case of Stochastc Regresson Coeffcents Ed Stanek Introducton Pfeffermann (984 dscusses extensons to the Gauss-arkov Theorem n settngs where regresson

More information

Bayesian belief networks

Bayesian belief networks CS 1571 Introducton to I Lecture 24 ayesan belef networks los Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square CS 1571 Intro to I dmnstraton Homework assgnment 10 s out and due next week Fnal exam: December

More information

Density matrix. c α (t)φ α (q)

Density matrix. c α (t)φ α (q) Densty matrx Note: ths s supplementary materal. I strongly recommend that you read t for your own nterest. I beleve t wll help wth understandng the quantum ensembles, but t s not necessary to know t n

More information

Uncertainty and auto-correlation in. Measurement

Uncertainty and auto-correlation in. Measurement Uncertanty and auto-correlaton n arxv:1707.03276v2 [physcs.data-an] 30 Dec 2017 Measurement Markus Schebl Federal Offce of Metrology and Surveyng (BEV), 1160 Venna, Austra E-mal: markus.schebl@bev.gv.at

More information

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7 Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Snce h( q^; q) = hq ~ and h( p^ ; p) = hp, one can wrte ~ h hq hp = hq ~hp ~ (7) the uncertanty relaton for an arbtrary state. The states that mnmze t

Snce h( q^; q) = hq ~ and h( p^ ; p) = hp, one can wrte ~ h hq hp = hq ~hp ~ (7) the uncertanty relaton for an arbtrary state. The states that mnmze t 8.5: Many-body phenomena n condensed matter and atomc physcs Last moded: September, 003 Lecture. Squeezed States In ths lecture we shall contnue the dscusson of coherent states, focusng on ther propertes

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Journal of Universal Computer Science, vol. 1, no. 7 (1995), submitted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Springer Pub. Co.

Journal of Universal Computer Science, vol. 1, no. 7 (1995), submitted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Springer Pub. Co. Journal of Unversal Computer Scence, vol. 1, no. 7 (1995), 469-483 submtted: 15/12/94, accepted: 26/6/95, appeared: 28/7/95 Sprnger Pub. Co. Round-o error propagaton n the soluton of the heat equaton by

More information

Dynamical Systems and Information Theory

Dynamical Systems and Information Theory Dynamcal Systems and Informaton Theory Informaton Theory Lecture 4 Let s consder systems that evolve wth tme x F ( x, x, x,... That s, systems that can be descrbed as the evoluton of a set of state varables

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

System in Weibull Distribution

System in Weibull Distribution Internatonal Matheatcal Foru 4 9 no. 9 94-95 Relablty Equvalence Factors of a Seres-Parallel Syste n Webull Dstrbuton M. A. El-Dacese Matheatcs Departent Faculty of Scence Tanta Unversty Tanta Egypt eldacese@yahoo.co

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13]

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13] Algorthms Lecture 11: Tal Inequaltes [Fa 13] If you hold a cat by the tal you learn thngs you cannot learn any other way. Mark Twan 11 Tal Inequaltes The smple recursve structure of skp lsts made t relatvely

More information

Note on EM-training of IBM-model 1

Note on EM-training of IBM-model 1 Note on EM-tranng of IBM-model INF58 Language Technologcal Applcatons, Fall The sldes on ths subject (nf58 6.pdf) ncludng the example seem nsuffcent to gve a good grasp of what s gong on. Hence here are

More information

Low Complexity Soft-Input Soft-Output Hamming Decoder

Low Complexity Soft-Input Soft-Output Hamming Decoder Low Complexty Soft-Input Soft-Output Hammng Der Benjamn Müller, Martn Holters, Udo Zölzer Helmut Schmdt Unversty Unversty of the Federal Armed Forces Department of Sgnal Processng and Communcatons Holstenhofweg

More information

Population element: 1 2 N. 1.1 Sampling with Replacement: Hansen-Hurwitz Estimator(HH)

Population element: 1 2 N. 1.1 Sampling with Replacement: Hansen-Hurwitz Estimator(HH) Chapter 1 Samplng wth Unequal Probabltes Notaton: Populaton element: 1 2 N varable of nterest Y : y1 y2 y N Let s be a sample of elements drawn by a gven samplng method. In other words, s s a subset of

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information