Uncertainty processing in FEL-Expert Lecture notes


Marek Obitko

1 Introduction

This text describes uncertainty processing in the FEL-Expert system and is intended as lecture notes introducing the system. A description of the theory in Czech is given in [4]. A user manual, also in Czech, is provided in [7]; it is the authoritative source on the current handling of the various input values.

There is no official definition of the term expert system. However, it can be said that expert systems are computer programs simulating the decision activity of an expert solving complex tasks, using properly coded, explicitly expressed special knowledge taken from the expert, with the goal of achieving, in a selected problem area, a quality of decision at the level of that expert (Feigenbaum, cited in [4]).

Expert systems can be divided into three types [4]: diagnostic, planning and hybrid. The task of a diagnostic expert system is to derive a diagnosis from data retrieved from the real world; this type is described in detail below. The task of a planning expert system is to find a plan (optimal, if possible) consisting of operators (steps) which lead from a known start state to a known goal state and satisfy known restrictions. A planning system has to use its knowledge base both to limit the number of steps generated from the current state and to evaluate the solutions found. Hybrid expert systems have a combined architecture: they use the concepts of both diagnostic and planning systems. Examples of hybrid systems are intelligent tutoring systems (diagnosis of a student's knowledge plus planning of the next teaching steps) and monitoring systems (when the diagnostic part detects a fault, the planning part plans the repair).

A diagnostic expert system interprets real-world data with the goal of selecting the most fitting goal hypothesis from a set of possible goal hypotheses. The inner current model is updated from the retrieved data by a fixed inference engine using the knowledge base. The typical architecture of a diagnostic expert system is shown in figure 1. In a dialog mode the control mechanism selects questions put to the user and manages the updating of the current model. This is repeated until

Figure 1: Block schema of a diagnostic expert system (the user, measuring devices and a database supply the data of the particular problem case; the control mechanism with the explanation system updates the current model using the general knowledge of the problem stored in the knowledge base)

the system selects a diagnosis for the consulted case. The process of consultation and updating is described in detail below.

The knowledge base describing the problem area solved by a diagnostic expert system is often expressed in the form of if-then rules and is separated from the control mechanism (see figure 1). The first expert systems (for example DENDRAL, MYCIN, PROSPECTOR; see [4] for details) were each built as a whole around a single problem. From such systems the core (the control and inference mechanism) can be extracted and reused for other problems simply by exchanging the knowledge base. A problem-independent empty expert system is called a shell. An expert system shell can be used for developing and testing a knowledge base and can thus form an expert system for any area. It should be stressed again that the quality of any expert system depends strongly on its knowledge base; the inference engine (control mechanism) is also important, but the knowledge base is critical.

2 The FEL-Expert System

FEL-Expert [5, 2, 4, 6] is the name of a family of problem-independent expert system shells. In all versions the FEL-Expert system uses a probabilistic pseudobayesian approach to handling uncertainty both in the data and in the knowledge base. This approach is adapted from the PROSPECTOR

system [1]. This section describes the uncertainty handling in the FEL-Expert system.

2.1 Inference Net

The fundamental knowledge base of a rule-based diagnostic expert system is expressed by rules. Rules for our expert system have the following form:

if presumption E
then conclusion H with probability P(H|E)
else conclusion H with probability P(H|¬E)

These rules are interpreted as follows: if the presumption (evidence) E certainly holds, then accept the conclusion H with measure (probability) P(H|E); if the negation of E certainly holds, then accept the conclusion H with measure P(H|¬E). E and H are statements (hypotheses). P(H|E) (the probability of H when E holds) and P(H|¬E) are subjective conditional probabilities. P(H|E) is called the sufficiency measure of the rule, while P(H|¬E) is called the necessity measure of the rule. These measures express uncertainty in the knowledge base. One rule can be expressed graphically as shown in figure 2.

Figure 2: One rule: if E then H (E → H)

It should be noted that probability in this context (and in the rest of these notes) has a different meaning than in the classical statistical, frequentist sense. Probability is used here to express uncertain knowledge; it is subjective and therefore need not be objective in the statistical sense, simply because in the cases solved by expert systems we are not able to obtain such objective characteristics. If we had such objective probabilities, we would not have to use an expert system and its heuristics. Nevertheless, classical statistical methods are often used to manipulate these probabilities. Some publications therefore use the term pseudoprobability instead of probability.

The set of rules introduced above can be represented by a directed graph in which every hypothesis (statement) is represented by one node (vertex) and each rule by one edge; see figure 3. This graph is called an inference net. Three types of nodes can be found in an inference net:

1. Top (goal) nodes are nodes from which no directed edge leads. These nodes represent so-called goal hypotheses (usually the possible results of the diagnosis).

Figure 3: Example of an inference net formed by rules of the form shown in figure 2 (goal nodes H1, H2, H3; leaf nodes E1, E2, E3, E4)

2. Leaf nodes are nodes to which no directed edge leads. These nodes represent so-called leaf statements. The validity of these statements must always be obtained from an observation of the real world.

3. Inner nodes are the remaining nodes, which are neither goal nor leaf nodes. They represent intermediate statements or hypotheses.

Every node is either askable or not askable. In the first case (askable node) the user can be asked about the validity of the corresponding statement.(1) In the second case (not askable node) it makes no sense to ask the user about the corresponding statement.

Each node N has an assigned parameter: the a priori probability P(N) (a pseudoprobability, i.e. a subjective probability) of the corresponding hypothesis or statement. The a priori probability of a hypothesis expresses a measure of the validity of the hypothesis before any observation of the real world. The inference net is thus a weighted directed graph: every node is weighted by its a priori probability, while every edge is weighted by the two measures P(H|E) and P(H|¬E). During the consultation the probabilities of the hypotheses are updated (recomputed) by applying the Bayes rule (see below); all assigned numerical values are used in this process.

The nodes described above are called bayesian nodes, after the name of the formula used for recomputing their probabilities (see below). Besides this, the FEL-Expert system can also express knowledge in the form of logical nodes. Using logical nodes it is possible to express the logical functions and, or and not. For computing the probabilities of logical nodes, formulas from the area

(1) Each leaf node must be askable.

of fuzzy logic are used (see below). The corresponding rules are

if E_1 and E_2 then H with P(E_1 & E_2)
if E_1 or E_2 then H with P(E_1 ∨ E_2)
if not E then H with P(¬E)

In the first two cases an unlimited number of presumptions (E_1, E_2, E_3, ...) is allowed. The probabilities of the conclusions (H) are computed only from the probabilities of the presumptions, so edges leading to logical nodes carry no additional numerical values, unlike edges leading to bayesian nodes.

Bayesian and logical nodes and the rules between them form the basic inference net. In addition it is possible to use context or priority links to partially control the process of consultation. The system also offers the possibility of using taxonomy nets, which can express shallow knowledge above the inference net.

2.2 Uncertainty Handling

The FEL-Expert system can handle uncertainty both in the knowledge base and in the data from the real world. Uncertainty in the knowledge base is expressed by the necessity and sufficiency measures attached to the rules. Uncertainty in the data expresses the user's uncertainty in answers to the questions put by the expert system (a response can be unsure; the handling of the various responses is described in section 2.4). In this section I will derive and describe uncertainty handling using the subjective Bayesian (pseudobayesian) approach [1], which is used by the FEL-Expert system.

2.2.1 Subjective Bayesian Updating

Suppose we have a rule if E, then H. Let us begin with the simplified problem of updating the probability of H given its prior value and given that E is observed to be true. By the Bayes(2) rule we have

P(H_k|E) = P(E|H_k) P(H_k) / Σ_{i=1}^{n} P(E|H_i) P(H_i) = P(E|H_k) P(H_k) / P(E)   (1)

For our purposes we will write the Bayes rule for H and for the negation of H:

P(H|E) = P(E|H) P(H) / P(E)   (2)

P(¬H|E) = P(E|¬H) P(¬H) / P(E)   (3)

(2) Thomas Bayes, 18th century
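Equation 1 can be checked with a small numerical sketch. The priors and likelihoods below are invented for illustration only; they are not taken from any FEL-Expert knowledge base.

```python
# Bayes' rule (equation 1): posterior over n competing hypotheses H_1..H_n
# after observing evidence E.

def bayes_posterior(priors, likelihoods):
    """priors[k] = P(H_k); likelihoods[k] = P(E|H_k)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_e = sum(joint)                    # P(E) = sum_i P(E|H_i) P(H_i)
    return [j / p_e for j in joint]     # P(H_k|E) = P(E|H_k) P(H_k) / P(E)

priors = [0.5, 0.3, 0.2]
likelihoods = [0.9, 0.4, 0.1]           # E strongly supports H_1
posterior = bayes_posterior(priors, likelihoods)
print(posterior)                        # sums to 1; H_1 is strengthened by E
```

Note that the denominator P(E) only normalizes; the ranking of the hypotheses is determined by the products P(E|H_k) P(H_k).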

By dividing equations 2 and 3 we obtain [1]

P(H|E) / P(¬H|E) = [P(E|H) / P(E|¬H)] · [P(H) / P(¬H)]   (4)

Each of the three terms in equation 4 has a traditional interpretation. We define the prior odds on H to be

O(H) = P(H) / P(¬H) = P(H) / (1 − P(H))   (5)

and the posterior odds to be

O(H|E) = P(H|E) / P(¬H|E) = P(H|E) / (1 − P(H|E))   (6)

The likelihood ratio is defined by

λ = P(E|H) / P(E|¬H)   (7)

so we can rewrite equation 4 as the odds-likelihood formulation of the Bayes rule

O(H|E) = λ O(H)   (8)

This equation tells us how to update the odds on H given the observation of E. We assume that a human expert has given us the rule and has provided the likelihood ratio λ to indicate the strength of the rule. A high value of λ (λ ≫ 1) represents, roughly speaking, the fact that E is sufficient for H, since the observation that E is true will transform indifferent prior odds on H into heavy posterior odds in favor of H. Notice that the underlying probabilities can be recovered from the odds by the simple formula

P = O / (O + 1)   (9)

so the odds and the probabilities carry exactly the same information. Suppose now that we wish to update the odds on H given that E is observed to be false. As in equation 8 we write

O(H|¬E) = λ̄ O(H)   (10)

where we define λ̄ by

λ̄ = P(¬E|H) / P(¬E|¬H) = (1 − P(E|H)) / (1 − P(E|¬H))   (11)

Notice that λ̄ must also be provided by the human expert; it cannot be derived from λ. A low value of λ̄ (0 ≤ λ̄ < 1) represents, roughly speaking,

the fact that E is necessary for H, since by equation 10 the observation that E is false will transform indifferent prior odds on H into odds heavily against H.

Curiously, although λ and λ̄ must be provided separately by the expert, they are not completely independent of each other. In particular, equations 7 and 11 yield

λ̄ = (1 − λ P(E|¬H)) / (1 − P(E|¬H))   (12)

so that, if we exclude the extreme cases of P(E|¬H) being either 0 or 1, we see that λ > 1 implies λ̄ < 1, and λ < 1 implies λ̄ > 1. Further, λ = 1 if and only if λ̄ = 1. This means that if the expert gives a rule such that the presence of E enhances the odds on H (i.e., λ > 1), he should also tell us that the absence of E diminishes the odds on H (i.e., λ̄ < 1). This mathematical requirement may violate intuition. An expert will often say that the presence of E enhances the odds on H, but the absence of E has no significance; in other words, that λ > 1 but λ̄ = 1. An approach to resolving this inconsistency is discussed below.

Also note that knowledge of both λ and λ̄ is equivalent to knowledge of both P(E|H) and P(E|¬H). It follows from equations 7 and 11 that

P(E|H) = λ (1 − λ̄) / (λ − λ̄)   (13)

and

P(E|¬H) = (1 − λ̄) / (λ − λ̄)   (14)

Thus, whether the expert should be asked to provide λ and λ̄, or P(E|H) and P(E|¬H), or some other equivalent information, is a psychological rather than a mathematical question. FEL-Expert expects P(H|E) and P(H|¬E) to provide this information.

2.2.2 Uncertain Evidence and Prior Probabilities

Having seen how to update the probability of a hypothesis when the evidence is known to be either certainly true or certainly false, let us now consider how updating should proceed when the user of the system is uncertain. We begin by assuming that when a user says "I am 70% certain that E is true", he means that P(E|relevant observation) = 0.7. We designate by E' the relevant observation that he makes, and simply write P(E|E') for the user's response. We now need to obtain an expression for P(H|E').
Formally,

P(H|E') = P(H,E|E') + P(H,¬E|E') =

= P(H|E,E') P(E|E') + P(H|¬E,E') P(¬E|E')   (15)

We make the reasonable assumption that if we know E to be true (or false), then the observations E' relevant to E provide no further information about H. With this assumption, equation 15 becomes

P(H|E') = P(H|E) P(E|E') + P(H|¬E) P(¬E|E')   (16)

Here P(H|E) and P(H|¬E) are obtained directly from the Bayes rule (equations 8 and 10). If the user is certain that E is true, then P(H|E') = P(H|E); if the user is certain that E is false, then P(H|E') = P(H|¬E). In general, equation 16 gives P(H|E') as a linear interpolation between these two extreme cases. In particular, note that if P(E|E') = P(E) then P(H|E') = P(H). This has the simple interpretation that if the evidence E is no better than the a priori knowledge, then applying the rule leaves the probability of H unchanged.

In the pure Bayesian formulation, equation 16 is the solution to the updating question. In practice, however, there are significant difficulties in using this formulation in an inference net. These difficulties result from a combination of the classical Bayesian dilemma over prior probabilities and the use of subjective probabilities. To see the difficulty, consider again a typical pair of nodes E and H embedded in an inference net. It is apparent from equations 8 and 10 that the updating procedure depends on the availability of the prior odds O(H). Thus, although we have not emphasized the point until now, the expert must be depended upon to provide the prior odds as well as λ and λ̄ when the inference rule is given. On the other hand, recall our earlier observation that E also acts as a hypothesis to be resolved by the nodes below it in the net. Thus, the expert must also provide prior odds on E. If all these quantities were specified consistently, the situation would be as represented in figure 4: the straight line plotted is simply equation 16, and shows the interpolation noted above.
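The updating machinery so far (equations 5 to 11 and the pure Bayesian interpolation of equation 16) can be sketched as follows. All numbers are illustrative; in FEL-Expert the knowledge base supplies the priors and the rule measures.

```python
# Subjective Bayesian updating of one rule E -> H.

def odds(p):
    return p / (1.0 - p)                  # O = P / (1 - P), equation 5

def prob(o):
    return o / (o + 1.0)                  # P = O / (O + 1), equation 9

def update_certain(p_h, lam, lam_bar, e_is_true):
    """P(H|E) or P(H|not E) for categorical evidence (equations 8 and 10)."""
    factor = lam if e_is_true else lam_bar
    return prob(factor * odds(p_h))

def update_uncertain(p_h, lam, lam_bar, p_e_given_obs):
    """Pure Bayesian linear interpolation for uncertain evidence (equation 16)."""
    p_h_e = update_certain(p_h, lam, lam_bar, True)        # P(H|E)
    p_h_not_e = update_certain(p_h, lam, lam_bar, False)   # P(H|not E)
    return p_h_e * p_e_given_obs + p_h_not_e * (1.0 - p_e_given_obs)

p_h, lam, lam_bar = 0.1, 4.0, 0.5
print(update_certain(p_h, lam, lam_bar, True))    # E surely true: odds rise
print(update_uncertain(p_h, lam, lam_bar, 0.7))   # user is 70% sure of E
```

Since λ > 1 and λ̄ < 1 here, a certain E raises P(H) while a certain ¬E lowers it, and an uncertain answer lands between the two extremes.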
In particular, note that if the user asserts that P(E|E') = P(E), then the updated probability is P(H|E') = P(H). In other words, if the user provides no new evidence, the probability of H remains unchanged. In the practical case, unfortunately, the subjectively obtained prior probabilities are virtually certain to be inconsistent, and the situation becomes as shown in figure 5. Note that P(E), the prior probability provided by the expert, differs from P_c(E), the probability consistent with P(H). Here, if the user provides no new evidence, i.e. if P(E|E') = P(E), then the formal Bayesian updating will substantially change the probability of H from its prior value P(H). Furthermore, for the case shown in figure 5, if the user asserts that E is true with a probability P(E|E') lying in the interval between P(E) and P_c(E), then the updated probability P(H|E')

Figure 4: Idealized updating of P(H|E') (the updated probability of H rises linearly from P(H|¬E) at P(E|E') = 0, through P(H) at P(E|E') = P(E), to P(H|E) at P(E|E') = 1)

will be less than P(H). Thus we have an example of a rule intended to increase the probability of H if E is found to be true, but which turns out to have the opposite effect. This type of error can be compounded as probabilities are propagated through the net.

Several measures can be taken to correct the unfortunate effects of priors that are inconsistent with inference rules. Since the problem can be thought of as one of overspecification, one approach would be to relax the specification of whichever quantities are subjectively least certain. For example, if the subjective specification of P(E) were least certain (in the expert's opinion), we might set P(E) = P_c(E). This approach leads to difficulties because the pair of nodes E and H under consideration is embedded in a larger net (establishing the prior probability of one node to be consistent with the prior probability of another node would make that other node inconsistent with its own predecessors in the inference net). Prior probabilities can therefore not be forced into consistency on the basis of the local structure of the inference net; apparently a more global process (perhaps a relaxation) would be required.

A second alternative for achieving consistency is to adjust the linear interpolation function shown in figure 5. There are several possibilities, one of which is illustrated in figure 6: the linear function is broken into a piecewise linear function at the coordinates of the prior probabilities, forcing consistent updating of the probability of

Figure 5: Inconsistent priors (the expert-supplied prior P(E) differs from P_c(E), the value consistent with P(H))

H given E. This is the approach used by the FEL-Expert system. Other possible interpolations are discussed in [1]. The analytical expression of figure 6 is

P(H|E') = P(H|¬E) + [P(H) − P(H|¬E)] · P(E|E') / P(E)   for 0 ≤ P(E|E') ≤ P(E)

P(H|E') = P(H) + [P(H|E) − P(H)] · [P(E|E') − P(E)] / (1 − P(E))   for P(E) ≤ P(E|E') ≤ 1   (17)

2.2.3 The Use of Multiple Evidence

We turn now to the more general updating problem in which several rules of the form E_1 → H, E_2 → H, ..., E_n → H all concern the same hypothesis H. Since most nodes in actual inference nets have several incoming arcs (edges), this is the case of greatest practical interest. In order to gain some insight into how multiple evidence should be used to update H when the evidence is uncertain and the priors are inconsistent, let us first consider briefly how updating would formally proceed in simpler cases. Suppose the i-th inference rule has associated with it the usual two quantities λ_i and λ̄_i. As a first simple case, how should H be updated when all the E_i have been observed to be certainly true? This case is analogous to the

Figure 6: Consistent interpolation functions (piecewise linear, broken at the point (P(E), P(H)))

case summarized by equation 8. Under the assumption that the pieces of evidence are conditionally independent, i.e., that

P(E_1, ..., E_n|H) = Π_{i=1}^{n} P(E_i|H)   (18)

and that

P(E_1, ..., E_n|¬H) = Π_{i=1}^{n} P(E_i|¬H),   (19)

it is not difficult to reach an analogous answer. Specifically, the odds on H are updated by the expression

O(H|E_1, ..., E_n) = O(H) Π_{i=1}^{n} λ_i   (20)

where

λ_i = P(E_i|H) / P(E_i|¬H).   (21)

Similarly, if all the evidence is observed to be certainly false, we can under the conditional independence assumptions again factor the joint likelihood ratio to obtain

O(H|¬E_1, ..., ¬E_n) = O(H) Π_{i=1}^{n} λ̄_i.   (22)

Now let us consider the general case of uncertain evidence and inconsistent prior probabilities. We already know that the posterior odds O(H|E'_i) given a single observation E'_i can be computed using an updating function like the one shown in figure 6. We can therefore define, for a single inference rule, an effective likelihood ratio λ'_i by

λ'_i = O(H|E'_i) / O(H).   (23)

By now assuming that the E_i are independent, we obtain for the general case an expression similar to the simple updating formulas of equations 20 and 22:

O(H|E'_1, ..., E'_n) = O(H) Π_{i=1}^{n} λ'_i.   (24)

To use this expression in an inference net system, we simply store with each node its prior odds (or probability), and with each incoming arc an effective likelihood ratio λ'_i. Whenever a piece of evidence provided by the user causes P(E_i|E'_i) to be updated, a new effective likelihood ratio is computed using equation 23. This procedure has the following consequences [1]:

If no evidence is obtained for a rule, it retains an initial effective likelihood ratio of unity, since the prior and posterior odds are the same.

The order in which evidence is obtained and rules are applied does not affect the final posterior probabilities.

The same rule can be used repeatedly, with the same or different values for the probability of the evidence. In particular, if a user changes his mind and modifies an earlier assertion, the new assertion will correctly undo any effects of the earlier statements.

2.2.4 Logical Nodes

It is often necessary to express a statement as a logical combination of partial statements (presumptions). Because we usually do not know anything about the statistical dependencies between the presumptions, the probability of logical nodes is computed using the model taken from Zadeh's theory of fuzzy sets. FEL-Expert uses the following formulas:

P(E_1 & E_2) = min{P(E_1), P(E_2)}   (25)

P(E_1 ∨ E_2) = max{P(E_1), P(E_2)}   (26)

P(¬E) = 1 − P(E)   (27)

2.3 Consultation

Consultation in a diagnostic expert system is the process of updating the model of the real world with the goal of obtaining a diagnosis in the form of sorted goal hypotheses. The consulting system asks the user and updates the model from the answers. A consultation in dialog form repeats the following two phases:

Selection of an askable node and querying the user: the node is selected using the strategy described below and the corresponding question is put to the user.

Propagation of the information: the answer from the user is used to update the probabilities in the inference net (the information is propagated from the askable node to the goal hypotheses).

The first phase uses so-called backward chaining (chaining from goal to leaf, using a stack); the second phase uses so-called forward chaining (from leaf to goal, using a queue). These two phases are repeated until the system has all the information needed for examining all goal hypotheses. The diagnosis at the end of the consultation is given by the order of the sorted goal hypotheses.

2.3.1 Selection of an Askable Node

The selection starts from one goal node. The system selects the not yet examined goal hypothesis with the highest probability and starts to examine it; this hypothesis is then examined until its examination is complete. After that, another unexamined goal with the highest current probability is selected, and so on until all goals are examined.

The backward chaining starts from the currently examined goal hypothesis. All direct presumptions of the hypothesis are scored by a special scoring function, which is computed from the parameters in the inference net and from the current model of the world. The not yet examined node N with the highest score is selected for examination.
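Before going into the details of the selection strategy, the propagation step of section 2.2 can be summarized in code. This is a hedged sketch combining the piecewise interpolation of equation 17 with the effective likelihood ratios of equations 23 and 24; the function names and numbers are illustrative, not FEL-Expert's actual interface.

```python
def interpolate(p_h, p_h_e, p_h_not_e, p_e, p_e_obs):
    """Consistent piecewise-linear updating of P(H|E'), equation 17.
    p_e is the expert's prior P(E); p_e_obs is the current P(E|E')."""
    if p_e_obs <= p_e:
        # segment from (0, P(H|not E)) to (P(E), P(H))
        return p_h_not_e + (p_h - p_h_not_e) * p_e_obs / p_e
    # segment from (P(E), P(H)) to (1, P(H|E))
    return p_h + (p_h_e - p_h) * (p_e_obs - p_e) / (1.0 - p_e)

def combine(p_h, posteriors):
    """Posterior of H from several rules, via equations 23 and 24."""
    o_h = p_h / (1.0 - p_h)               # prior odds O(H)
    o = o_h
    for p in posteriors:                  # each p = P(H|E'_i) from interpolate()
        o *= (p / (1.0 - p)) / o_h        # effective lambda'_i, equation 23
    return o / (o + 1.0)                  # odds back to probability

# If the observation is no better than the prior, H keeps its prior value:
p = interpolate(p_h=0.2, p_h_e=0.6, p_h_not_e=0.05, p_e=0.3, p_e_obs=0.3)
print(p)  # equals P(H) = 0.2 (up to rounding)
```

Because an unchanged parent contributes λ'_i = 1, rules for which no evidence has been obtained leave the product in `combine` untouched, which is exactly the first consequence listed above.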
Now one of the following situations may occur:

The selected node N is askable, is not linked (see section 2.5 for an explanation of links) and has not been answered. Then the corresponding question is put to the user.

The selected node N has already been answered, or is not askable and is linked neither by a priority nor by a context link.(3) Then the node N is treated as a momentarily examined intermediate hypothesis: all its presumptions are scored and the selection continues by the same procedure.

The selected node N has not been answered and is linked either by a priority or by a context link. In this case the node linking the node N is examined first.

2.3.2 Propagation of the Information

After the question attached to the askable node is answered, the retrieved information is propagated through the inference net to the goal hypotheses. The probability is propagated (using the formulas above) from the answered node to its direct successors in the inference net; from each changed node the probability is then again propagated to its direct successors. This is repeated until the goal hypotheses are reached (i.e. until a node has no successors).

2.3.3 Scoring Function

The selection of the best askable node is a process of searching the inference net from the top by depth-first search (DFS). A heuristic function, the so-called scoring function, is used in this search. FEL-Expert uses the same scoring function as the PROSPECTOR system. This function f_p evaluates each presumption E_i of the given conclusion H by the formula [2]

f_p(E_i) = |log λ_i| · P(E_i|E') + |log λ̄_i| · (1 − P(E_i|E'))   (28)

In every step of the selection process, the node E_i with the highest value of the scoring function f_p(E_i) is selected. This holds for bayesian nodes.(4)
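A minimal sketch of this selection heuristic follows. The absolute-log weighting is my reading of formula 28 (the source rendering is ambiguous), so treat the exact form as an assumption; the rule strengths and current probabilities below are invented.

```python
from math import log

def f_p(lam, lam_bar, p_e_current):
    """Scoring heuristic in the spirit of formula 28: a presumption scores
    high when its rule is strong (lambda, lambda-bar far from 1) relative
    to how much of its probability mass is still undecided."""
    return abs(log(lam)) * p_e_current + abs(log(lam_bar)) * (1.0 - p_e_current)

# presumptions of H: name -> (lambda, lambda-bar, current P(E_i|E'))
presumptions = {"E1": (4.0, 0.5, 0.3), "E2": (1.2, 0.9, 0.5)}
best = max(presumptions, key=lambda n: f_p(*presumptions[n]))
print(best)  # the strong rule E1 wins
```

Note that a rule with λ = λ̄ = 1 scores zero: asking about evidence that cannot move the conclusion is never worthwhile, which matches the intent of the heuristic.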
For logical nodes the next node is selected by the following strategy:

and node: the predecessor with the lowest current probability is selected;

or node: the predecessor with the highest current probability is selected;

not node: the node has only one predecessor, which is selected.

Formula 28 takes into account only the current model of the solved problem and cannot incorporate additional information such as the price

(3) By "linked by a context link" we mean that the context specified by the context link is not satisfied.

(4) FEL-Expert 3.2 also offers special quantity nodes, the so-called S and Q nodes. The scoring function has to be different for this kind of node; see [2] for details.

or the time of getting the answer, or the importance of the question assigned by an expert. Other scoring functions taking such additional information into account are possible; some of them are discussed in [2].

2.4 Handling of the Data for Inference

FEL-Expert can handle various user responses and other inputs. The model described in the previous section works only with pseudoprobabilities. This is not sufficient for real-life applications, where we need inputs in other forms. For example, we need qualitative responses such as yes, no and I don't know, and the possible values between these responses, as well as quantitative responses such as the values 15.42 or 58 °C, or numerical intervals. These response types are described in this section.

2.4.1 Certainty Factor

The certainty factor provides the possibility of qualitative responses. The idea is that a person can answer not only yes or no, but also probably yes, I think so, surely not, I don't know and so on. For our purposes these answers can be ordered on a numerical axis whose outer values correspond to the categorical answers surely yes and surely no (figure 7). FEL-Expert uses the number −5 for surely no and +5 for surely yes; 0 then means I don't know. This number is called the certainty factor R, where R ∈ ⟨−5, 5⟩.

Figure 7: Conversion from a certainty factor R ∈ ⟨−5, 5⟩ to a probability (piecewise linear through the points (−5, 0), (0, P(E)) and (5, 1))

A response in the form of a certainty factor has to be translated to a pseudoprobability that is then used for inference. This translation is made by linear interpolation between three points (see figure 7). The first point, certainty factor −5, means surely no and so corresponds to probability 0. Similarly, the certainty factor 5 corresponds to probability 1. Certainty

factor 0 means don't know, which is the case where the a priori probability has to be used. The pseudoprobability is then computed from the formula

P(E|E') = P(E) + (1 − P(E)) · R/5   for R ≥ 0
P(E|E') = P(E) · (1 + R/5)   for R < 0   (29)

2.4.2 Numerical Value

Computing the probability from a single numerical value x is based on the following idea: the expert can provide the sufficiency measure (or a subjective probability) for some important values. If the user enters one of these values, the inference engine can use the corresponding value from the expert; other values are computed by linear interpolation between the values provided by the expert.

FEL-Expert 3.2 used so-called S-nodes to handle this type of input. Each S-node has assigned important values x_i of an observed quantity. From each S-node a special rule must lead, with attached probabilities P(H|E, x_i) representing the measures corresponding to the values specified in the S-node. In this way the value of P(H|E, x) is approximated and used to compute the probability of the successor node from the user's input (see [2] for details). In some cases the expert can state the dependency explicitly. This case cannot be handled by FEL-Expert 3.2 directly; however, it can be transformed into the previous approximation by choosing important values and the corresponding probabilities. In FEL-Expert 4.0 the concept of translators was introduced, so S-nodes with special rules are no longer used. Consult the user manual [7] for details.

2.4.3 Other Values

See the user manual [7] for details.

2.5 Links

The basic inference net described above specifies the relations between hypotheses, whether goal hypotheses, evidences or inner hypotheses. It is sufficient to describe the relations, but not the order of node examination or examination under specific conditions. To specify this, FEL-Expert uses priority and context links.

2.5.1 Priority Links

Priority links specify an unconditional order of examination.
A priority link says that some node must be examined before some other node: before investigating the node E_1, investigate the node E_2.
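Looking back at section 2.4, the two input conversions (equation 29 and the S-node style interpolation) can be sketched as follows. The function names are mine, not FEL-Expert's, and the example values are invented.

```python
def cf_to_probability(r, p_prior):
    """Certainty factor R in <-5, 5> to a pseudoprobability (equation 29)."""
    if not -5 <= r <= 5:
        raise ValueError("certainty factor must lie in <-5, 5>")
    if r >= 0:
        return p_prior + (1.0 - p_prior) * r / 5.0   # don't know .. surely yes
    return p_prior * (1.0 + r / 5.0)                 # surely no .. don't know

def value_to_probability(x, points):
    """S-node style numerical input: linear interpolation between
    expert-supplied (value, probability) pairs, sorted by value."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return p0 + (p1 - p0) * (x - x0) / (x1 - x0)

print(cf_to_probability(0, 0.4))    # R = 0 returns the a priori probability
print(value_to_probability(38.5, [(36.0, 0.0), (38.0, 0.5), (40.0, 1.0)]))
```

Both conversions are piecewise linear; in particular `cf_to_probability` reproduces the three anchor points of figure 7 exactly (R = −5 maps to 0, R = 0 to P(E), R = 5 to 1).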

This is important, for example, when using external programs: before calling an external program the system must know the values of the nodes that are used as its parameters. This can be ensured by priority links.

2.5.2 Context Links

In some cases we do not want to examine certain hypotheses. For example, it makes no sense to ask whether a brother of a patient is genetically handicapped if we have not established that the patient has a brother [6]. To specify such conditional examination, context links can be used. A context link specifies that some node is examined only if the probability of some other node lies in a given range (the context): investigate the node E_1 only if the probability of the node E_2 is in a given range. If the second node has not yet been examined, the system examines it first (tries to satisfy the context) and then checks again whether the first node should be examined.

2.6 Taxonomies

Taxonomies [6, 5] are used to specify shallow knowledge over the inference net. They are important especially for large knowledge bases. Taxonomies are used for attention focusing and for specifying hierarchies among the hypotheses in the inference net. In the first case they can eliminate the examination of irrelevant hypotheses; in the second case they can eliminate the examination of statements whose answers we can estimate from other statements. The taxonomies form an additional net (or nets) over the inference net and should be consistent with it (the consistency check is made by finding cycles in the IIC graph, the inference and inheritance compound graph).

Both the taxonomy for attention focusing and the taxonomy for specifying hierarchical dependencies have the form of a tree whose nodes are the taxonomy classes. Edges represent dependencies between the taxonomy classes (the hierarchy). The subclasses of a given class are the classes on the paths from that class to the leaves of the tree; the superclasses of a class are the classes on the path from that class to the root of the tree.
To each node in this graph a list of inference net nodes is assigned (the list can be empty); in this way a taxonomy specifies dependencies between groups of hypotheses (inference net nodes).

2.6.1 Attention Focusing

If we have any additional information about a consulted problem case, we can use it to focus attention on some hypotheses by means of taxonomies. This can reduce

18 a number of investigated hypotheses and so reduce a number of questions put to the user and a time needed for consultation. First step in using this taxonomy type is to find groups of hypotheses which are applicable to some kind of problem. For example when building a knowledge base for diagnosis of a diseases of animals we can find certain diseases which are specific only for a certain animal species. Knowing the animal before the consultation starts would probably eliminate many diseases which are not possible for this animal. In large knowledge bases these groups often form hierarchical structures (such as hierarchy (taxonomy) of animal species in our example). This hierarchical structure can be expressed by a taxonomy graph. This also brings possibility to select a set of hypotheses groups often the user is not sure about the specific problem case, but can select a group of problem cases (such as birds in our example instead of eagle). The taxonomy class is selected before the consultation. If no class is selected then the consultation runs in a standard way. If any class is selected then the expert system investigates only those goal hypotheses which belong to the selected class and its subclasses the investigated hypotheses are then determined by the union of these classes. There can be more than one taxonomy of this type. In this case if in more taxonomies their classes are selected, then the investigated hypotheses are determined by the intersection of hypotheses sets from each taxonomy. In this way the attention can be focused on a small subset of all possible goal hypotheses in a knowledge base. This type of taxonomy is called hypothesis [6] Hierarchical Dependencies This type of taxonomy is used during the consultation (unlike the previous one, which can be said to be used before the consultation). 
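The attention-focusing computation described above (union over a selected class and its subclasses, intersection across taxonomies) can be sketched as follows. The data structures, class names, and disease names are hypothetical, not FEL-Expert code:

```python
# A minimal sketch of attention focusing via taxonomies; the example
# taxonomy of animal species and the attached hypotheses are made up.

taxonomy = {
    # child classes of each taxonomy class (tree structure)
    "children": {"animal": ["mammal", "bird"], "bird": ["eagle", "crow"]},
    # hypotheses (inference net nodes) attached to each class; may be empty
    "hypotheses": {"mammal": {"rabies"}, "bird": {"avian_flu"},
                   "eagle": set(), "crow": {"crow_pox"}},
}

def subclasses(tax, cls):
    """Yield cls and all its subclasses (classes on paths to the leaves)."""
    yield cls
    for child in tax["children"].get(cls, []):
        yield from subclasses(tax, child)

def focused_hypotheses(selected):
    """Intersect, over all taxonomies with a selected class, the union of
    hypotheses attached to that class and its subclasses."""
    result = None
    for tax, cls in selected:
        hyps = set()
        for c in subclasses(tax, cls):
            hyps |= tax["hypotheses"].get(c, set())
        result = hyps if result is None else result & hyps
    return result

# Selecting "bird" restricts the consultation to bird-specific hypotheses.
print(sorted(focused_hypotheses([(taxonomy, "bird")])))  # ['avian_flu', 'crow_pox']
```

With several taxonomies, passing more (taxonomy, class) pairs to `focused_hypotheses` narrows the result further via intersection, exactly as described for multiple selected classes.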
In many cases, when the user answers a question with a categorical yes or no, we can reuse this answer for other questions corresponding to other hypotheses in the knowledge base. For example, if the user answers certainly no to the question "is the animal a mammal?", then we can conclude that the answer to the question "is the animal a cat?" will be the same. Conversely, a categorically positive answer to the question "is the animal a crow?" implies the same answer to the question "is the animal a bird?". Note that an answer cannot be propagated when it is not categorical (when the user is not sure). As can be seen, this type of taxonomy can also reduce the number of investigated hypotheses; here the elimination is done by propagating categorical answers through the taxonomy net, which represents hierarchical dependencies between groups of hypotheses. The categorical
answer yes can be propagated to the superclasses of a taxonomy class in this taxonomy net (if something categorically holds for a class, it also holds for its superclasses). Similarly, the categorical answer no can be propagated to the subclasses of a taxonomy class (if something certainly does not hold for a class, it does not hold for its subclasses either). This type of taxonomy is called hidden [6].

In FEL-Expert the kind of propagation of categorical answers can be selected for each taxonomy: the first type propagates only positive answers, the second only negative answers, and the third propagates in both directions (both positive and negative answers).

References

[1] Duda, R. O., Hart, P. E., Nilsson, N. J.: Subjective Bayesian Methods for Rule-Based Inference Systems, in Shafer, G., Pearl, J. (eds.): Readings in Uncertain Reasoning, Morgan Kaufmann Publishers, 1990

[2] Mařík, V.: Využití metod umělé inteligence pro řešení diagnostických úloh (Usage of Artificial Intelligence Methods for Solving Diagnostic Tasks), dissertation thesis, FEL ČVUT, 1988

[3] Mařík, V., Štěpánková, O., Lažanský, J.: Umělá inteligence 1 (Artificial Intelligence 1), Academia, Praha, 1993

[4] Mařík, V., Štěpánková, O., Lažanský, J.: Umělá inteligence 2 (Artificial Intelligence 2), Academia, Praha, 1997

[5] Mařík, V., Vlček, T., Kouba, Z., Lažanský, J., Lhotská, L., Štěpánková, O.: Expert System FEL-EXPERT Version 3.5, Description and User's Manual, Technical Report TR-PRG-IEDS-6/92, FAW Linz-Hagenberg, Praha-Wien, 1992

[6] Mařík, V., Vlček, T., Lhotská, L., Wagner, R., Retschitzegger, W.: Exploitation of External Programs and Taxonomies in Rule-Based Expert Systems, Technical Report TR-PRG-IEDS-12/93, FAW Linz-Hagenberg, Praha-Wien, 1993

[7] Obitko, M.: Uživatelská příručka k systému FEL-Expert 4.0 (FEL-Expert 4.0 User Manual), 1999, felex/uman/uman.pdf
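Appendix: the propagation of categorical answers described in the Hierarchical Dependencies subsection can be sketched as follows. The parent map and the class names are hypothetical; this is an illustrative sketch, not the FEL-Expert implementation:

```python
# A minimal sketch of propagating categorical answers through a taxonomy
# net: "yes" propagates to superclasses, "no" propagates to subclasses.

# Hypothetical taxonomy tree, stored as a child -> parent map.
parent = {"cat": "mammal", "crow": "bird", "mammal": "animal", "bird": "animal"}

# Derive the inverse parent -> children map for downward propagation.
children = {}
for child, par in parent.items():
    children.setdefault(par, []).append(child)

def propagate(cls, answer, answers):
    """Record a categorical answer for `cls` and propagate it:
    'yes' to all superclasses, 'no' to all subclasses."""
    answers[cls] = answer
    if answer == "yes":
        # If something categorically holds for a class,
        # it also holds for its superclasses.
        if cls in parent:
            propagate(parent[cls], "yes", answers)
    elif answer == "no":
        # If something certainly does not hold for a class,
        # it does not hold for its subclasses either.
        for sub in children.get(cls, []):
            propagate(sub, "no", answers)
    return answers

print(propagate("crow", "yes", {}))   # {'crow': 'yes', 'bird': 'yes', 'animal': 'yes'}
print(propagate("mammal", "no", {}))  # {'mammal': 'no', 'cat': 'no'}
```

A non-categorical (uncertain) answer would simply not be propagated, matching the restriction stated in the text.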


Class Note #20. In today s class, the following four concepts were introduced: decision

Class Note #20. In today s class, the following four concepts were introduced: decision Class Note #20 Date: 03/29/2006 [Overall Information] In today s class, the following four concepts were introduced: decision version of a problem, formal language, P and NP. We also discussed the relationship

More information

Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule

Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule Alan Mackworth UBC CS 322 Uncertainty 2 March 13, 2013 Textbook 6.1.3 Lecture Overview Recap: Probability & Possible World Semantics

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

IN THIS paper we investigate the diagnosability of stochastic

IN THIS paper we investigate the diagnosability of stochastic 476 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 50, NO 4, APRIL 2005 Diagnosability of Stochastic Discrete-Event Systems David Thorsley and Demosthenis Teneketzis, Fellow, IEEE Abstract We investigate

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

CS 5522: Artificial Intelligence II

CS 5522: Artificial Intelligence II CS 5522: Artificial Intelligence II Bayes Nets Instructor: Alan Ritter Ohio State University [These slides were adapted from CS188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu.]

More information

Uncertainty and knowledge. Uncertainty and knowledge. Reasoning with uncertainty. Notes

Uncertainty and knowledge. Uncertainty and knowledge. Reasoning with uncertainty. Notes Approximate reasoning Uncertainty and knowledge Introduction All knowledge representation formalism and problem solving mechanisms that we have seen until now are based on the following assumptions: All

More information

Introduction to Machine Learning

Introduction to Machine Learning Uncertainty Introduction to Machine Learning CS4375 --- Fall 2018 a Bayesian Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell Most real-world problems deal with

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Massachusetts Institute of Technology

Massachusetts Institute of Technology Massachusetts Institute of Technology 6.034 Articial Intelligence Solutions # Final96 6034 Item # 33 Problem 1 Rules Step Ready to fire Selected Rule Assertion Added 1 R1 R3 R7 R1 Fuzzy is a mammal 2 R5

More information

Math 016 Lessons Wimayra LUY

Math 016 Lessons Wimayra LUY Math 016 Lessons Wimayra LUY wluy@ccp.edu MATH 016 Lessons LESSON 1 Natural Numbers The set of natural numbers is given by N = {0, 1, 2, 3, 4...}. Natural numbers are used for two main reasons: 1. counting,

More information

P (E) = P (A 1 )P (A 2 )... P (A n ).

P (E) = P (A 1 )P (A 2 )... P (A n ). Lecture 9: Conditional probability II: breaking complex events into smaller events, methods to solve probability problems, Bayes rule, law of total probability, Bayes theorem Discrete Structures II (Summer

More information

Toward Computing Conflict-Based Diagnoses in Probabilistic Logic Programming

Toward Computing Conflict-Based Diagnoses in Probabilistic Logic Programming Toward Computing Conflict-Based Diagnoses in Probabilistic Logic Programming Arjen Hommersom 1,2 and Marcos L.P. Bueno 2 1 Open University of the Netherlands 2 Radboud University Nijmegen, The Netherlands

More information

Pei Wang( 王培 ) Temple University, Philadelphia, USA

Pei Wang( 王培 ) Temple University, Philadelphia, USA Pei Wang( 王培 ) Temple University, Philadelphia, USA Artificial General Intelligence (AGI): a small research community in AI that believes Intelligence is a general-purpose capability Intelligence should

More information

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II. Copyright Richard J. Povinelli rev 1.0, 10/1//2001 Page 1 Probabilistic Reasoning Systems Dr. Richard J. Povinelli Objectives You should be able to apply belief networks to model a problem with uncertainty.

More information