Learning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge


1 Learning Terminological Naïve Bayesian Classifiers Under Different Assumptions on Missing Knowledge
Pasquale Minervini, Claudia d'Amato, Nicola Fanizzi
Department of Computer Science, University of Bari
URSW 2011, Bonn, October 23, 2011

2 Contents
1. Introduction & Motivation

3 Introduction & Motivations
Uncertainty is inherently present in real-world knowledge.
In the SW context, difficulties arise when modeling real-world domains using only purely logical formalisms.
Several approaches for coping with uncertain knowledge have been proposed (probabilistic, fuzzy, ...); usually, probabilistic information is assumed to be available and the CWA is adopted.
Exploiting an already populated ontology, a method capturing probabilistic information could be of help; here the OWA has to be taken into account.

4 Paper Contributions
Proposal of a terminological naïve Bayesian classifier for predicting class-membership probabilistically:
- it is a naïve Bayesian network modeling the conditional dependencies between a learned set of (complex) Description Logic concepts and a target concept;
- it deals with the incomplete knowledge due to the OWA by considering different ignorance models: Missing Completely At Random (MCAR), Missing At Random (MAR), and Informatively Missing (IM).

5 Knowledge Base Representation
Assumption: resources, concepts and relationships are defined in terms of a representation that can be mapped to some DL language (with the standard model-theoretic semantics).
K = ⟨T, A⟩: the TBox T is a set of concept definitions; the ABox A contains extensional assertions on concepts and roles, e.g. C(a) and R(a, b).
The set of individuals (resources) occurring in A is denoted Ind(A).

6 Basics of Bayesian Networks
A Bayesian network (BN) is a DAG G representing the conditional dependencies in a set of random variables:
- each vertex in G corresponds to a random variable X_i;
- each edge in G indicates a direct influence relation between the two connected random variables;
- a set of conditional probability distributions Θ_G is associated with the vertices;
- each vertex X_i in G is conditionally independent, given its parents, of any subset S ⊆ Nd(X_i) of vertices that are not descendants of X_i.

7 Basics of Bayesian Networks (cont.)
The joint probability distribution over a set of random variables {X_1, ..., X_n} factorizes as
$$\Pr(X_1, \dots, X_n) = \prod_{i=1}^{n} \Pr(X_i \mid \mathrm{parents}(X_i))$$
Given a BN, inference queries can be evaluated by marginalization.
To decrease the inference complexity, the naïve Bayes network is often considered: it assumes that the presence (or absence) of a particular feature (random variable) is unrelated to the presence (or absence) of any other feature, given the class variable.
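As a concrete illustration (not part of the original slides), here is a minimal Python sketch of the factorization and of inference by marginalization on a tiny chain C → R → W; all CPD values are invented:

```python
# Illustrative sketch of Pr(X1..Xn) = prod_i Pr(Xi | parents(Xi))
# on a tiny chain C -> R -> W (all numbers invented).

pr_c = {True: 0.5, False: 0.5}     # Pr(C)
pr_r = {True: 0.8, False: 0.1}     # Pr(R = True | C = c)
pr_w = {True: 0.9, False: 0.2}     # Pr(W = True | R = r)

def joint(c, r, w):
    p = pr_c[c]
    p *= pr_r[c] if r else (1.0 - pr_r[c])
    p *= pr_w[r] if w else (1.0 - pr_w[r])
    return p

# Marginal inference: sum the joint over the unobserved variables.
pr_w_true = sum(joint(c, r, True) for c in (True, False) for r in (True, False))
print(pr_w_true)
```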

8 Terminological Naïve Bayesian Network: Definition
A Terminological Bayesian Network (TBN) N_K, w.r.t. a DL KB K, is defined as a pair ⟨G, Θ_G⟩, where:
- G = ⟨V, E⟩ is a directed acyclic graph, in which V = {F_1, ..., F_n, C} is a set of vertices, each F_i representing a (possibly complex) DL concept defined over K and C representing a target concept, and E ⊆ V × V is a set of edges modeling the dependence relations between the elements of V;
- Θ_G is a set of conditional probability distributions (CPDs), one for each vertex in V, representing the conditional probability of the feature concept given its parents in the graph.
In the case of a terminological naïve Bayesian network, E = {⟨C, F_i⟩ | i ∈ {1, ..., n}}, i.e. for all i, j ∈ {1, ..., n} with i ≠ j, F_i is independent of F_j given the value of the target concept.

9 Terminological Naïve Bayesian Network: Example
Given:
- a set of DL feature concepts F = {Female, HasChild := ∃hasChild.Person} (variable names are used instead of the complex feature concepts);
- a target concept Father;
the terminological naïve Bayesian network has an edge from Father to each feature concept.
[Figure: Father with arrows to Female and HasChild; Female is annotated with Pr(Female | Father) and Pr(Female | ¬Father), HasChild with Pr(HasChild | Father) and Pr(HasChild | ¬Father).]

10 The Learning Problem
Given:
- a target concept C;
- a DL KB K = ⟨T, A⟩;
- the sets of positive, negative and neutral examples for C, denoted Ind⁺_C(A), Ind⁻_C(A) and Ind⁰_C(A), so that:
  ∀a ∈ Ind⁺_C(A): K ⊨ C(a);  ∀a ∈ Ind⁻_C(A): K ⊨ ¬C(a);  ∀a ∈ Ind⁰_C(A): K ⊭ C(a) and K ⊭ ¬C(a);
- an ignorance model;
- a scoring function score for a TBN N_K w.r.t. Ind_C(A).
Find: a network N*_K maximizing the scoring function:
$$N^{*}_{K} \leftarrow \arg\max_{N_K} \mathrm{score}(N_K, \mathrm{Ind}_C(A))$$

11 TBN: the Learning Algorithm
function learn(K, Ind_C(A))
    {the TBN is initialized as containing only the target concept node}
    N*_K ← ⟨G, Θ_G⟩, where G = ⟨V ← {C}, E ← ∅⟩
    repeat
        N_K ← N*_K
        {a new network is created, having one more node and different parameters than the previous one}
        ⟨c, N'_K, s⟩ ← extend(N_K, Ind_C(A))
        N*_K ← N'_K
    until stopping criterion
        {possible stopping conditions: a) improvement in score below a threshold; b) reaching a maximum number of nodes}
    return N*_K

12 TBN: the Learning Algorithm (cont.)
function extend(N_K, Ind_C(A))
    Concept ← Start; Best ← ∅
    repeat
        Concepts ← ∅
        for c' ∈ {c' ∈ ρ_cl(Concept) : |c'| ≤ min(|c| + d, maxlen)} do
            V ← V ∪ {c'}
            N'_K ← optimalNetwork(V, Ind_C(A))
            s ← score(N'_K, Ind_C(A))
            Concepts ← Concepts ∪ {⟨c', N'_K, s⟩}
        end for
        Best ← argmax over ⟨c, N'_K, s⟩ ∈ Concepts ∪ {Best} of s
        Concept ← c such that ⟨c, N'_K, s⟩ = Best
    until stopping criterion on Best
        {possible stopping conditions: a) exceeding a maximum number of iterations; b) exceeding a maximum number of refinement steps}
    return Best
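To make the search concrete, the following is a schematic Python rendering of the extend step (a sketch only: refine, fit_parameters and score are assumed hooks standing in for ρ_cl, optimalNetwork and the chosen scoring function; it is not the authors' implementation):

```python
# Schematic greedy feature search (a sketch under assumed hooks, not the
# authors' code): refine(concept) plays the role of the refinement operator
# rho_cl, fit_parameters of optimalNetwork, and score of the scoring function.

def extend(features, examples, refine, fit_parameters, score, max_steps=10):
    """Greedily search for one more feature concept to add to the network."""
    concept = "Thing"                  # start from the top concept
    best = None                        # best (concept, network, score) so far
    for _ in range(max_steps):
        candidates = []
        for c in refine(concept):      # downward refinements of the current concept
            net = fit_parameters(features + [c], examples)
            candidates.append((c, net, score(net, examples)))
        if not candidates:
            break
        step_best = max(candidates, key=lambda t: t[2])
        if best is not None and step_best[2] <= best[2]:
            break                      # no score improvement: stop refining
        best = step_best
        concept = best[0]              # keep refining the current best concept
    return best
```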

13 Learning a Naïve TBN: Example
[Figure: four snapshots of the greedy search. The initial network contains only the target Father. Candidate first features (Male, Female, ∃hasChild.⊤, ∃hasParent.⊤, Person, ∃married.⊤, ...) are scored and Female is added. Candidate second features (Person, Mammal, ∃hasChild.Person, ∃hasParent.Person, ∃hasSibling.⊤, ∃married.⊤, ...) are then scored and ∃hasChild.Person is added, yielding a network with edges from Father to Female and ∃hasChild.Person.]

14 The Ignorance Models
To learn the TBN, different assumptions (ignorance models) on the nature of the missing information are considered, given an ideal KB K' having additional knowledge. The event "the membership of a in C is missing" is K ⊭ C(a) ∧ K ⊭ ¬C(a):
- MCAR (Missing Completely At Random): the probability that the membership is missing is independent of any kind of (additional) knowledge:
$$\Pr(K \not\models C(a) \wedge K \not\models \neg C(a) \mid K') = \Pr(K \not\models C(a) \wedge K \not\models \neg C(a))$$
- MAR (Missing At Random): the probability that the membership is missing depends only on K and not on the additional knowledge:
$$\Pr(K \not\models C(a) \wedge K \not\models \neg C(a) \mid K') = \Pr(K \not\models C(a) \wedge K \not\models \neg C(a) \mid K)$$
- NMAR (Not Missing At Random, or IM, Informatively Missing): the probability that the membership is missing may change when additional knowledge is available:
$$\Pr(K \not\models C(a) \wedge K \not\models \neg C(a) \mid K') \neq \Pr(K \not\models C(a) \wedge K \not\models \neg C(a) \mid K)$$
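The three mechanisms can be illustrated with a small simulation (purely illustrative, not from the paper): a label is hidden with a probability that is constant (MCAR), depends on an observed feature (MAR), or depends on the hidden label itself (NMAR/IM). All names and rates below are invented:

```python
# Illustrative simulation (not from the paper) of the three missingness
# mechanisms over a Boolean class label y and a fully observed feature x.
import random

def mask_label(y, x, mechanism, p=0.3):
    """Return None (missing) or y, under an assumed missingness mechanism."""
    if mechanism == "MCAR":        # missingness independent of everything
        drop = random.random() < p
    elif mechanism == "MAR":       # missingness depends only on observed x
        drop = random.random() < (p if x else p / 2)
    else:                          # NMAR/IM: depends on the hidden y itself
        drop = random.random() < (p if y else p / 3)
    return None if drop else y

sample = [(bool(random.getrandbits(1)), bool(random.getrandbits(1)))
          for _ in range(5)]
print([mask_label(y, x, "MAR") for y, x in sample])
```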

15 TBN under the MCAR Assumption
Only positive and negative examples are considered.
Parameters are estimated by means of the frequency distribution.
The score is computed as the log-likelihood on training data:
$$L(N_K \mid \mathrm{Ind}_C(A)) = \sum_{a \in \mathrm{Ind}^{+}_{C}(A)} \log \Pr(C(a) \mid N_K) + \sum_{a \in \mathrm{Ind}^{-}_{C}(A)} \log \Pr(\neg C(a) \mid N_K)$$
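A direct Python sketch of this score (the predict interface is an assumption, not the paper's API):

```python
# Sketch of the MCAR log-likelihood score. Assumed interface:
# predict(net, a) returns Pr(C(a) | net) in [0, 1].
import math

def mcar_log_likelihood(net, positives, negatives, predict, eps=1e-12):
    ll = 0.0
    for a in positives:
        ll += math.log(predict(net, a) + eps)        # log Pr(C(a) | N_K)
    for a in negatives:
        ll += math.log(1.0 - predict(net, a) + eps)  # log Pr(~C(a) | N_K)
    return ll
```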

16 TBN under the MAR Assumption
Positive, negative and neutral examples are considered.
The EM algorithm is adopted for parameter estimation.
The score is computed as the log-likelihood on training data, also taking the neutral examples into account:
$$L(N_K \mid \mathrm{Ind}_C(A)) = \sum_{a \in \mathrm{Ind}^{+}_{C}(A)} \log \Pr(C(a) \mid N_K) + \sum_{a \in \mathrm{Ind}^{-}_{C}(A)} \log \Pr(\neg C(a) \mid N_K) + \sum_{a \in \mathrm{Ind}^{0}_{C}(A)} \sum_{C' \in \{C, \neg C\}} \Pr(C' \mid N_K) \log \Pr(C'(a) \mid N_K)$$
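A compact EM sketch for this setting (a simplification assuming Boolean, fully observed features and missing target memberships only; it is illustrative, not the authors' code):

```python
# EM sketch for a naive Bayes model with missing class labels.
# `data` is a list of (feature_dict, label) pairs; label is True, False,
# or None for the neutral examples.

def em_naive_bayes(data, features, n_iters=20, eps=1e-9):
    prior = {True: 0.5, False: 0.5}
    cpt = {f: {True: 0.5, False: 0.5} for f in features}  # Pr(f=True | c)

    def posterior(x):
        score = {}
        for c in (True, False):
            p = prior[c]
            for f in features:
                pf = cpt[f][c]
                p *= pf if x[f] else (1.0 - pf)
            score[c] = p
        return score[True] / (score[True] + score[False] + eps)

    for _ in range(n_iters):
        # E-step: soft label for neutrals, hard label for positives/negatives.
        w = [(x, 1.0 if y is True else 0.0 if y is False else posterior(x))
             for x, y in data]
        # M-step: re-estimate prior and CPTs from expected counts.
        n_pos = sum(wi for _, wi in w)
        n_neg = len(w) - n_pos
        prior = {True: n_pos / len(w), False: n_neg / len(w)}
        for f in features:
            cpt[f] = {
                True: sum(wi for x, wi in w if x[f]) / (n_pos + eps),
                False: sum(1.0 - wi for x, wi in w if x[f]) / (n_neg + eps),
            }
    return prior, cpt
```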

17 TBN under the NMAR Assumption
Positive and negative examples are considered; for the neutral examples, all the possible fillings are considered.
Robust Bayesian Estimation (RBE) is adopted to learn the conditional probability distributions: probability intervals are determined instead of single probability values.
Score: as for MCAR, considering the mean value of the probability intervals.
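The interval idea can be sketched as follows (a simplification of the extreme-completion principle behind RBE; k, n and m stand for observed successes, observed cases and missing cases):

```python
# Sketch of interval parameter estimation in the spirit of Robust Bayesian
# Estimation, for a single Boolean parameter: the two extreme completions
# of the m missing cases give the bounds.

def robust_interval(k, n, m):
    """Lower bound: all m missing cases count against; upper: all count for."""
    low = k / (n + m)
    high = (k + m) / (n + m)
    return low, high

print(robust_interval(k=7, n=10, m=4))  # (0.5, 0.7857...)
```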

18 Classifying Individuals: Example
Given:
- the feature concepts F = {Female, HasChild};
- the target concept Father;
- the naïve TBN of the previous example;
- the DL KB K;
- an individual a such that K ⊨ HasChild(a), while the membership of a in Female is not known.
The probability that a is an instance of Father is given by (the unknown feature Female is marginalized out):
$$\Pr(\mathrm{Father}(a)) = \frac{\Pr(\mathrm{Father})\,\Pr(\mathrm{HasChild} \mid \mathrm{Father})}{\sum_{\mathrm{Father}' \in \{\mathrm{Father}, \neg\mathrm{Father}\}} \Pr(\mathrm{Father}')\,\Pr(\mathrm{HasChild} \mid \mathrm{Father}')}$$
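A worked Python version of this computation (the CPT values are invented; note how the unknown Female feature simply drops out of the product):

```python
# Naive Bayes posterior for the Father example (illustrative numbers).
# Only HasChild(a) is known, so Female is marginalized out.

prior = {True: 0.3, False: 0.7}            # Pr(Father), Pr(~Father)
pr_haschild = {True: 0.95, False: 0.35}    # Pr(HasChild = True | Father = c)

num = prior[True] * pr_haschild[True]
den = sum(prior[c] * pr_haschild[c] for c in (True, False))
print(num / den)                           # Pr(Father(a) | HasChild(a))
```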

19 Classifying Individuals using RBE: Example
A naïve TBN using Robust Bayesian Estimation for inferring posterior probability intervals under the NMAR assumption is such that the conditional probability tables contain probability intervals (defined by a lower and an upper bound) instead of single probability values, e.g. an interval for Pr(Female | Father), Pr(Female | ¬Father), Pr(HasChild | Father) and Pr(HasChild | ¬Father).
Inference on a being an instance of Father, given that K ⊨ HasChild(a), yields the interval [Pr_low(Father | HasChild), Pr_high(Father | HasChild)], where (abbreviating Fa = Father, HC = HasChild):
$$\underline{\Pr}(\mathrm{Fa}(a)) = \underline{\Pr}(\mathrm{Fa} \mid \mathrm{HC}) = \frac{\underline{\Pr}(\mathrm{HC} \mid \mathrm{Fa})\,\underline{\Pr}(\mathrm{Fa})}{\underline{\Pr}(\mathrm{HC} \mid \mathrm{Fa})\,\underline{\Pr}(\mathrm{Fa}) + \overline{\Pr}(\mathrm{HC} \mid \neg\mathrm{Fa})\,\overline{\Pr}(\neg\mathrm{Fa})}$$
$$\overline{\Pr}(\mathrm{Fa}(a)) = \overline{\Pr}(\mathrm{Fa} \mid \mathrm{HC}) = \frac{\overline{\Pr}(\mathrm{HC} \mid \mathrm{Fa})\,\overline{\Pr}(\mathrm{Fa})}{\overline{\Pr}(\mathrm{HC} \mid \mathrm{Fa})\,\overline{\Pr}(\mathrm{Fa}) + \underline{\Pr}(\mathrm{HC} \mid \neg\mathrm{Fa})\,\underline{\Pr}(\neg\mathrm{Fa})}$$
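A sketch of the corresponding interval computation (all interval endpoints invented for illustration):

```python
# Interval posterior from interval-valued CPTs, following the bound
# formulas above (all numbers invented).

prior = {True: (0.25, 0.35), False: (0.65, 0.75)}   # (low, high) for Pr(Fa)
pr_hc = {True: (0.90, 1.00), False: (0.30, 0.45)}   # Pr(HC = True | Fa = c)

def posterior_interval():
    low = (pr_hc[True][0] * prior[True][0]) / (
        pr_hc[True][0] * prior[True][0] + pr_hc[False][1] * prior[False][1])
    high = (pr_hc[True][1] * prior[True][1]) / (
        pr_hc[True][1] * prior[True][1] + pr_hc[False][0] * prior[False][0])
    return low, high

print(posterior_interval())  # interval for Pr(Father(a) | HasChild(a))
```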

20 Conclusions & Future Work
Conclusions: proposed a ML method based on the naïve Bayes assumption for estimating the probability that a generic individual belongs to a certain target concept, given its membership in an induced set of (complex) DL concepts, together with ignorance models for handling incomplete knowledge.
Future work: experimenting with the method; finding optimizations of the proposed method.

21 The End
That's all! Questions?
