Target Classification Using Knowledge-Based Probabilistic Model

14th International Conference on Information Fusion, Chicago, Illinois, USA, July 5-8, 2011

Target Classification Using Knowledge-Based Probabilistic Model

Wenyin Tang 1, K. Z. Mao 1,*, Lee Onn Mak 2, Gee Wah Ng 2, Zhaoyang Sun 1, Ji Hua Ang 2 and Godfrey Lim 2
1 School of EEE, Nanyang Technological University, Singapore
2 DSO National Laboratories, Singapore
* Corresponding author, ekzmao@ntu.edu.sg

Abstract: In past decades, pattern classification has been intensively explored in machine learning. With the in-depth exploration of machine learning in various applications, new challenges arise, which require researchers to move from data-driven to domain-driven models by integrating domain knowledge, and from static to dynamic models that adapt to a changing environment. This paper proposes an intelligent classification system with the following features to address these challenges. Firstly, the system integrates both data association and classification modules: the contextual information extracted from input data is saved as learnt knowledge, which is then combined with the given expert knowledge in classification. The experimental study shows that this learning process helps to reduce the ambiguity of classification. Secondly, the proposed classifier, i.e. the knowledge-based naive Bayes, classifies incoming data based on both expert knowledge and learnt knowledge. Thirdly, a soft-decision mechanism is adopted in the classification algorithm, which can effectively handle overlapping data.

Keywords: target classification, data association, soft decision.

I. INTRODUCTION

As a highly active area of machine learning, pattern classification has been extensively explored [1], [2]. Successful classification algorithms such as naive Bayes, k-nearest neighbor, artificial neural networks, support vector machines, etc., have been applied in various fields including biomedical engineering, image processing, data mining and defense technology. Although classification algorithms more or less emulate the way humans classify perceptual objects or patterns, most of them focus on modeling single functions of human intelligence under certain assumptions. Moreover, most classifiers are designed in a data-driven manner, and the classification performance largely relies on the quality of the given training data. This is a non-trivial problem, because in many applications a good training data set is not readily available.

With the in-depth exploration of machine learning methods in various applications, new challenges arise, which require researchers to move (i) from data-driven to domain-driven models by understanding and integrating domain knowledge [3]; and (ii) from static to dynamic models capable of adapting to a changing environment (an issue referred to as situation awareness in information fusion [4]). These new challenges motivate us to design a target classification system that can make good use of domain knowledge and learn from the environment at the same time.

The proposed target classification system is an intelligent system that integrates both data association and classification modules. The input data is first transferred to data association, where the data is associated with the historical clusters in the entries list. Novel data that is excluded from all existing clusters (based on a predefined threshold) is recorded and input into the classification module, while non-novel data is associated with its nearest cluster. From the entries list, contextual information is extracted from recent historical data and saved as learnt knowledge for further use.
From the experimental study, we find that integrating this contextual information with the given expert knowledge helps to reduce the ambiguity of classification and therefore improves the overall classification performance.

The proposed classification algorithm is knowledge-driven, starting from the prior knowledge stored in the known target library (KTL). The KTL contains expert knowledge of target characteristics, obtained either from a long period of data collection or provided by manufacturers [5]. This kind of model is extremely useful when prior expert knowledge, rather than training data, is available. Classical models include prototype models, rule/boundary models, and hybrid models (see, for example, chapter 2 of [6]). In this paper, we propose a probabilistic model, called the knowledge-based naive Bayes model, which is capable of integrating both prior knowledge and knowledge learnt from incoming data.

Our system employs a soft decision mechanism. The principle behind it, based on studies in cognitive psychology, is that identifying alternative hypotheses improves human problem-solving skills more than fixing on a single hypothesis [7]. In classification problems, our previous work [5] found that soft decision is also important for dealing with overlapping data, which is pervasive across different applications.

The rest of this paper is organized as follows. Section II introduces the proposed knowledge-based target classification system. The experimental study is presented in Section III. Section IV concludes the paper.

II. KNOWLEDGE-BASED TARGET CLASSIFICATION SYSTEM

A. System Overview

Figure 1 shows the flowchart of the proposed knowledge-based target classification system, where the raw data collected from multiple sensors is aligned to form a track, which is then transferred to the data association module.

Figure 1. Knowledge-based target classification system: sensors provide raw data, which is aligned into tracks, passed to data association (maintaining the entries list, EL) and then to classification using the known target library (KTL) and learnt knowledge (LK), producing the output data and predictions.

The data association module recruits new samples into the entries list (EL) incrementally: a novel track that has not been seen before is recorded, and its predictive result is retrieved based on the knowledge stored in the known target library (KTL) and the learnt knowledge (LK); a non-novel track is instead used to update the nearest existing entry. The KTL contains target/class characteristics obtained either from a long period of data collection or provided by manufacturers [5]. The characteristics are represented over a common set of attributes, and each class is described by a conjunction of specific attribute values or value ranges. In the feature space, these characteristics define a specific region for each of the known target classes. Figure 2 shows a snapshot of a KTL with two classes represented by two continuous attributes {X_1, X_2} (left panel), and the corresponding class regions in the feature space spanned by X_1 and X_2 (right panel).

Figure 2. Snapshot of a known target library (left) and the corresponding classes defined in the feature space (right).

Figure 2 shows a simple example of a known target library. A practical known target library is much more complicated, containing a large number of classes and attributes, as well as many overlaps between different classes. In the experimental study, the KTL contains 166 target classes (as will be shown in Figure 8). In the entries list, the newly generated inputs are used to update the learnt knowledge (LK) as contextual information for further classification; this point is detailed in Section II-C1. The overall performance of the system mainly depends on the accuracy of the classification models embedded in it. Next, we focus on knowledge-based classification models.

B. Related Classification Models

1) Production Rule-based Classifier: The production rule-based classifier uses a simple representation in which each rule consists of a conjunction of constraints on the attributes. For the example in Figure 2, two rules can be generated from the known target library:

Rule 1: IF 0.1 ≤ X_1 ≤ ... AND 0.4 ≤ X_2 ≤ ... THEN Class ω_1
Rule 2: IF 0.4 ≤ X_1 ≤ ... AND 0.6 ≤ X_2 ≤ ... THEN Class ω_2

If an incoming data sample x_i satisfies all the constraints of Rule k, it is classified to the corresponding class ω_k. In the feature space shown in Figure 2, each rule defines a region bounded by a hyper-rectangle, or box. Any data located inside a box can be identified directly. Production rules interpret the concepts of target classes clearly, but the method has several intrinsic limitations. Firstly, production rules are too strict to tolerate noisy data: data that floats out of its defined box due to noise may go unrecognized, even though it may still be very close to the box. Secondly, for data located in the overlapping region of two or more classes, production rules are unable to tell which class is more likely to be the true target class.
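To make the rule-matching step concrete, the following minimal sketch (not the authors' implementation) encodes a KTL as per-class attribute value ranges and returns every class whose box contains an input; the KTL layout, function names and numeric ranges are illustrative assumptions only.

```python
# Illustrative sketch: a production rule-based classifier built from a known
# target library (KTL). Assumed KTL format: each class maps to one (low, high)
# value range per attribute.

from typing import Dict, List, Tuple

KTL = Dict[str, List[Tuple[float, float]]]  # class name -> value range per attribute

def rule_based_classify(x: List[float], ktl: KTL) -> List[str]:
    """Return every class whose hyper-rectangle (box) contains x.

    The result may be empty (x falls outside all boxes, e.g. due to noise) or
    contain several classes (x lies in an overlapping region).
    """
    matches = []
    for cls, ranges in ktl.items():
        if all(lo <= xj <= hi for xj, (lo, hi) in zip(x, ranges)):
            matches.append(cls)
    return matches

# Hypothetical two-class KTL over attributes X1, X2 (values for illustration only)
ktl = {
    "target1": [(0.1, 0.5), (0.4, 0.7)],
    "target2": [(0.4, 0.8), (0.6, 0.9)],
}
print(rule_based_classify([0.45, 0.65], ktl))  # both classes: point lies in the overlap
print(rule_based_classify([0.05, 0.50], ktl))  # empty list: slightly outside every box
```

An empty result corresponds to the first limitation above (noisy data falling outside every box), and a multi-class result to the second (an overlap that the rules cannot resolve).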
2) Distance-based Classifier: A more flexible approach is to classify a data sample based on its distances to the different classes. The data is assigned to the nearest class, or to several classes if they are at the same distance from the data. In this way a data sample is assigned to its most likely class even if it lies outside the defined box. There are two options for calculating distances: i) distance to the class centroid (prototype model):

$$d_c(x_i,\omega_k) = \sqrt{\sum_{j=1}^{m} (x_{ij} - \mu_{kj})^2} \qquad (1)$$

where x_ij is the value of the j-th attribute of input x_i and μ_kj is the center point of the value range V_jk of class ω_k on attribute X_j; and ii) distance to the class boundary (boundary model):

$$d_b(x_i,\omega_k) = \begin{cases} 0, & \text{if } x_i \text{ is inside class } \omega_k \\ \sqrt{\sum_{j=1}^{m} \left[\min(x_{ij} - V_{kj})\right]^2}, & \text{otherwise} \end{cases} \qquad (2)$$

where min(x_ij − V_kj) denotes the minimum distance from x_ij to the value range V_kj.
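The two distance measures can be sketched as follows; this is an illustrative reading of Eqns (1) and (2) under the same assumed KTL layout as the previous sketch, not the authors' code.

```python
# Illustrative sketch of the two distance-based classifiers of Section II-B2.
# KTL layout (per-class (low, high) ranges per attribute) is an assumption.

import math
from typing import Dict, List, Tuple

KTL = Dict[str, List[Tuple[float, float]]]

def centroid_distance(x: List[float], ranges: List[Tuple[float, float]]) -> float:
    """Eqn (1): Euclidean distance from x to the centre of the class box."""
    return math.sqrt(sum((xj - (lo + hi) / 2.0) ** 2
                         for xj, (lo, hi) in zip(x, ranges)))

def boundary_distance(x: List[float], ranges: List[Tuple[float, float]]) -> float:
    """Eqn (2): zero inside the box, otherwise distance to the nearest boundary."""
    sq = 0.0
    for xj, (lo, hi) in zip(x, ranges):
        if xj < lo:
            sq += (lo - xj) ** 2
        elif xj > hi:
            sq += (xj - hi) ** 2
        # contribution is zero when xj lies inside [lo, hi]
    return math.sqrt(sq)

def nearest_class(x: List[float], ktl: KTL, dist) -> List[str]:
    """Assign x to the class(es) at minimum distance (ties are all kept)."""
    dists = {cls: dist(x, ranges) for cls, ranges in ktl.items()}
    dmin = min(dists.values())
    return [cls for cls, d in dists.items() if d == dmin]

# Usage with either measure, e.g. nearest_class(x, ktl, boundary_distance)
```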

Figure 3. Centroid distance (left) and boundary distance (right) to target classes.

In the centroid distance-based classifier, a class is represented by the center point (shown as + in Figure 3) of a hyper-rectangle, or box, in the feature space. As a result, the boundary information is lost. Some classes, say class 1, may have a bigger spread on one attribute than other classes, say class 2. Data that belongs to class 1 may then be misclassified to a nearby class, as shown in Figure 4.

Figure 4. The problem of centroid distance: the data will be misclassified to class 2 because it is nearer to the centroid of class 2.

The boundary distance-based classifier will not misclassify data lying inside the class boundary as in Figure 4; however, it is more sensitive to irrelevant attributes. Consider, for example, data represented by two relevant attributes X_1 and X_2 and an irrelevant attribute X_3. When the unknown data perfectly satisfies the constraints of class k on X_1 and X_2, i.e. has zero boundary distance on these two dimensions, the final boundary distance of the data to class ω_k depends entirely on the boundary distance on attribute X_3, which may be highly inaccurate. Secondly, the boundary distance-based method cannot differentiate unknown data lying in the overlapping region of two or more classes, as shown in Figure 5.

Figure 5. The problem of boundary distance: which class is more likely to be the true target class if the boundary distances are the same?

C. A Knowledge-Based Probabilistic Model for Classification

A probabilistic model for classification involves approximating an unknown target function f: X → Y by estimating the posterior probability P(Y|X), where Y is a random variable representing the class label and X is a vector containing n features or attributes, i.e. X = [X_1, X_2, ..., X_n]. Based on Bayes rule, the posterior probability P(Y = ω_k | X = x_i), or P(ω_k | x_i), can be written as:

$$P(\omega_k \mid x_i) = \frac{P(x_i \mid \omega_k)\,P(\omega_k)}{\sum_{k'} P(x_i \mid \omega_{k'})\,P(\omega_{k'})} \qquad (3)$$

For the input x_i, applying MAP (maximum a posteriori) estimation yields:

$$y_i = \omega_k \quad \text{if } P(\omega_k \mid x_i) > P(\omega_l \mid x_i) \text{ for all } l \neq k. \qquad (4)$$

Bayes rule is the basis for various Bayes models; the difference among them lies in how P(X|ω_k) (the conditional probabilities) and P(ω_k) (the priors) are estimated. It is well acknowledged that directly estimating P(X|ω_k) in the n-dimensional feature space is not always practical, even for a small n, because accurate probability estimation requires a large number of training samples to maintain statistical significance [1]. Naive Bayes [8], [9] simplifies the estimation of P(X|Y) by making a conditional independence assumption, which reduces the number of parameters from the original O(2^n) to O(n) [1]. Under the assumption of conditional attribute independence, naive Bayes has:

$$P(X \mid \omega_k) = P(X_1, X_2, \ldots, X_n \mid \omega_k) = \prod_{j=1}^{n} P(X_j \mid \omega_k) \qquad (5)$$

Substituting Eq. (5) into Eqs. (3) and (4) yields the following NB classifier:

$$y_i = \arg\max_{\omega_k} P(\omega_k) \prod_{j=1}^{n} P(x_{ij} \mid \omega_k) \qquad (6)$$

In the literature, the choice of probability estimation method usually depends on the attribute types, i.e. whether the attributes are discrete, continuous or mixed. For discrete-valued training data, the probabilities are estimated by maximum likelihood estimation (MLE), i.e. by counting. In the case of continuous inputs, either the attributes are discretized [10], or P(x_ij | ω_k) is estimated on the continuous feature X_j directly.
The parametric approach assumes that, for each possible class ω_k, the distribution of each continuous attribute X_j is Gaussian, and estimates the parameters of each of these Gaussians from the training data. Nonparametric methods such as the Parzen window can also be used to estimate the conditional probability of the input X given Y [2]. Due to the independence assumption, the NB classifier is simple and can be trained very efficiently. Despite its simplicity, the naive Bayes classifier can often outperform more sophisticated classification methods [9], [11], [12]. The aforementioned probability estimation methods work in a supervised setting where a set of training data is provided.
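For concreteness, a minimal sketch of such a supervised Gaussian naive Bayes baseline (Eqns (3)-(6) with Gaussian class-conditionals fitted by maximum likelihood) is given below. It is a generic textbook construction, not part of the proposed system, and the class and method names are our own.

```python
# Minimal sketch of a standard supervised Gaussian naive Bayes classifier:
# each P(X_j | w_k) is a Gaussian whose mean and variance are fitted from
# labelled training data; the MAP decision of Eqn (6) is taken in log space.

import numpy as np

class GaussianNB:
    def fit(self, X: np.ndarray, y: np.ndarray) -> "GaussianNB":
        self.classes_ = np.unique(y)
        self.mu_, self.var_, self.prior_ = {}, {}, {}
        for k in self.classes_:
            Xk = X[y == k]
            self.mu_[k] = Xk.mean(axis=0)
            self.var_[k] = Xk.var(axis=0) + 1e-9   # avoid zero variance
            self.prior_[k] = len(Xk) / len(X)      # MLE prior: class frequency
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # MAP decision (Eqn 6), computed in log space for numerical stability
        scores = []
        for k in self.classes_:
            log_lik = -0.5 * (np.log(2 * np.pi * self.var_[k])
                              + (X - self.mu_[k]) ** 2 / self.var_[k]).sum(axis=1)
            scores.append(np.log(self.prior_[k]) + log_lik)
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]
```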

In some applications, expert knowledge of target characteristics may be available instead of training data. Next, a new knowledge-based naive Bayes classifier using the known target library is introduced.

1) Knowledge-Based Naive Bayes (NB) Classifier: The knowledge-based NB classifier is concerned with estimating the conditional probabilities P(X_j | ω_k) and the prior probabilities P(ω_k) from the given expert knowledge. First and foremost, we need to represent the knowledge as probability distributions. Given the class descriptions in the KTL, the conditional probability P(X_j | ω_k) can be defined as:

$$P(X_j \mid \omega_k) = \eta_{jk}\, f(X_j \mid \omega_k) \qquad (7)$$

where η_jk is a normalization factor ensuring $\int_{-\infty}^{+\infty} P(X_j \mid \omega_k)\,dX_j = 1$ and f(X_j | ω_k) is a representation function, which can be defined as in Eqn (8):

$$f(X_j \mid \omega_k) = \begin{cases} 1, & X_j \in V_{jk} \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

where V_jk is the value range of class k on attribute X_j. The conditional probability of X given ω_k is then estimated by:

$$\hat{P}(X \mid \omega_k) = \prod_{j=1}^{n} \eta_{jk}\, f(X_j \mid \omega_k) \qquad (9)$$

If the function f(·) is defined as in Eqn (8), the knowledge-based NB classifier is equivalent to the production rule-based classifier of Section II-B1, and the problems of the rule-based model are inherited by the probabilistic model. To deal with these problems, an ideal function f(·) should generalize well and provide a probability measure for any unknown input. Therefore, we modify the 0-1 function of Eqn (8) by attaching symmetrical right and left tails to it. The modified function is given in Eqn (10) and shown in Figure 6(b):

$$f(X_j \mid \omega_k) = \begin{cases} 1, & X_j \in V_{jk} \\ \exp\!\left(-\dfrac{(X_j - \min(V_{jk}))^2}{2(\alpha\, r_{jk})^2}\right), & X_j < \min(V_{jk}) \\ \exp\!\left(-\dfrac{(X_j - \max(V_{jk}))^2}{2(\alpha\, r_{jk})^2}\right), & X_j > \max(V_{jk}) \end{cases} \qquad (10)$$

where max(V_jk) and min(V_jk) are the maximum and minimum values of V_jk, r_jk is the radius of V_jk (i.e. r_jk = (max(V_jk) − min(V_jk))/2), and α is a parameter controlling the decreasing rate of the tails. The setting of α depends on the degree of matching between the real distribution and the given region of the target class. In our experimental study, we set α = 0.1.

Figure 6. Two knowledge representation functions on an attribute value range from 0.4 to 0.6: (a) the 0-1 function of Eqn (8), equivalent to the rule-based classifier; (b) the improved function of Eqn (10).
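A minimal sketch of the representation function of Eqn (10) and the resulting per-class conditional score of Eqn (9) is given below; the normalization factors η_jk of Eqn (7) are omitted for brevity, the value ranges are assumed non-degenerate, and all names are our own rather than the authors' code.

```python
# Sketch of the knowledge representation function of Eqn (10): a flat top over
# the KTL value range V_jk with symmetric Gaussian tails controlled by alpha
# (the paper uses alpha = 0.1).

import math

def f_knowledge(xj: float, v_lo: float, v_hi: float, alpha: float = 0.1) -> float:
    """Unnormalised 'probability' of attribute value xj given class k.

    Assumes a non-degenerate range (v_hi > v_lo).
    """
    r = (v_hi - v_lo) / 2.0            # radius r_jk of the value range V_jk
    if v_lo <= xj <= v_hi:             # inside the expert-defined range
        return 1.0
    if xj < v_lo:                      # left tail
        return math.exp(-((xj - v_lo) ** 2) / (2.0 * (alpha * r) ** 2))
    return math.exp(-((xj - v_hi) ** 2) / (2.0 * (alpha * r) ** 2))  # right tail

def conditional_score(x, ranges, alpha: float = 0.1) -> float:
    """Eqn (9) up to the normalising factors eta_jk: product over attributes."""
    score = 1.0
    for xj, (lo, hi) in zip(x, ranges):
        score *= f_knowledge(xj, lo, hi, alpha)
    return score
```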

For the prior probability estimation, traditional NB classifiers usually calculate P(ω_k) from the training set D by maximum likelihood estimation, i.e. $\hat{P}(\omega_k) = \frac{|D\{Y=\omega_k\}|}{|D|}$, where |·| counts the number of samples in a set. For the knowledge-based NB classifier, where a training set is unavailable, we estimate the prior probability from the recent historical testing set H, using its predicted class labels as pseudo-feedback. The prior probability can therefore be defined as the percentage of the historical testing data classified to ω_k, i.e.

$$\hat{P}(\omega_k, x_i \mid H = \{x_{i-h}, \ldots, x_{i-1}\}) = \frac{|H\{Y=\omega_k\}|}{|H|} \qquad (11)$$

The idea is illustrated in Figure 7: if the recent input data are continuously predicted as class 1, for example, then a current input located in the overlapping region of class 1 and class 2 should be more likely to belong to class 1.

Figure 7. Considering the prior knowledge, the unknown data is more likely to belong to class 1.

To avoid domination by the class that appears first (for example, if the first test data is classified to ω_1, then P̂(ω_1) = 1 and P̂(ω_2) = P̂(ω_3) = ... = P̂(ω_K) = 0, and all future unknown data would be classified to ω_1 because the prior probabilities of the other classes are zero), we use a smoothed estimate based on a Dirichlet prior over the prior probability, assuming equal priors on each class ω_k. The prior probability for the i-th unknown data x_i is then calculated from the h most recent historical data, as below:

$$\hat{P}(\omega_k, x_i \mid H = \{x_{i-h}, \ldots, x_{i-1}\}) = \frac{|H\{Y=\omega_k\}| + l}{|H| + lK} \qquad (12)$$

where K is the number of classes and l is a parameter determining the strength of the equal-prior assumption relative to the recent history H. In the experimental study, we find that the length of the recent history H used for prior probability estimation should not be too large, say hundreds. We suggest setting h < 100, as further discussed in Section III-B, and l = 1.

2) Soft Decision Mechanism: In cognitive psychology, one tip for problem solving is to identify alternative hypotheses rather than fixing on one hypothesis [7]. Our system implements this idea by plugging in a soft decision mechanism. Soft decision is very important for data with overlaps between different classes, which is common in practice. When classifying an input that falls into the overlap region of different classes, a traditional classifier making a crisp decision will inevitably cause a high rate of misclassification due to the large uncertainty. In this case, a soft decision is believed to be better, as it lets experts evaluate reasonable alternatives. In our previous work [5], we explored soft decision in the context of neural networks and showed it to be effective in dealing with overlapping data.

The algorithm for soft decision is quite simple. For any input x_i, the output classes are sorted in descending order of their posterior probabilities, i.e. ω_(1), ω_(2), ..., ω_(K) with P̂(ω_(1)|x_i) ≥ P̂(ω_(2)|x_i) ≥ ... ≥ P̂(ω_(K)|x_i), where ω_(k) denotes the k-th ranked class. We select the top k classes by setting a cutoff point between ω_(k) and ω_(k+1) if:

$$\hat{P}(\omega_{(k)} \mid x_i) - \hat{P}(\omega_{(k+1)} \mid x_i) \geq \varepsilon \qquad (13)$$

where ε is a small positive value that controls the softness of the decision. The smaller the ε, the softer the decision.
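The prior of Eqn (12) and the cutoff of Eqn (13) can be sketched as below; this follows a literal reading of the reconstructed equations (cutting the ranked list at the first posterior gap of at least ε), and the history container, parameter defaults and example numbers are assumptions for illustration.

```python
# Sketch of the smoothed prior of Eqn (12), estimated from the h most recent
# predicted labels (pseudo-feedback), and the soft-decision cutoff of Eqn (13).

from collections import Counter, deque

def smoothed_priors(history, classes, l: float = 1.0):
    """Eqn (12): P(w_k) = (|H{Y=w_k}| + l) / (|H| + l*K) over the recent history H."""
    counts = Counter(history)
    K = len(classes)
    return {k: (counts.get(k, 0) + l) / (len(history) + l * K) for k in classes}

def soft_decision(posteriors, eps: float = 0.05):
    """Eqn (13): keep top-ranked classes until the posterior gap reaches eps."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected = [ranked[0][0]]
    for (_, pk), (cn, pn) in zip(ranked, ranked[1:]):
        if pk - pn >= eps:            # gap large enough: cut the ranked list here
            break
        selected.append(cn)
    return selected

# Usage: keep at most h recent predictions, e.g. h = 20 (the paper suggests h < 100)
history = deque(maxlen=20)
priors = smoothed_priors(history, classes=["target1", "target2"])
labels = soft_decision({"t1": 0.42, "t2": 0.40, "t3": 0.18})
print(priors)   # equal priors (0.5 each) while the history is still empty
print(labels)   # ['t1', 't2']: their posteriors differ by less than eps
```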
III. EXPERIMENTAL STUDY

A. Data and Evaluation Methods

1) Data: To test the target classification system developed in this paper, a simulated data set called KTL166 and three real-world data sets from the UCI machine learning repository [13] are used in the experiments. The simulated data includes two parts: 1) a KTL with 166 classes represented by 2 continuous attributes, and 2) sequentially incoming test data, as summarized in Table I. Figure 8 shows the KTL of the simulated data with 166 classes in the feature space spanned by the two attributes X_1 and X_2. From this figure, we can observe that the class distributions defined in a KTL in an application domain can be very complicated, with different classes overlapping severely.

Table I. Simulated data summarization: (1) the given known target library (KTL), described by the number of classes, number of features and type of features (continuous); (2) the sequential test inputs, described by the number of inputs, number of features and number of classes.

Figure 8. Known target library of the DSO artificial data in the feature space spanned by attributes X_1 and X_2.

For the three UCI data sets summarized in Table II, a KTL is not available beforehand; the method that artificially generates a KTL for each data set is introduced in detail in Section III-C. Compared with the simulated data, the number of attributes in these data sets is higher.

Table II. Statistics of the three real-world data sets from the UCI machine learning repository (ecoli, segment and vowel): number of classes, number of samples and number of attributes per data set.

2) Evaluation of Soft Decision: The classification accuracy of soft decision is evaluated from two aspects, i.e. the error rate and the average number of output class labels. As shown in Table III, for example, if the ground-truth class of the input is ω_1, there are three levels of prediction quality:

excellent - only one prediction class is output and it hits the target.
good - more than one class is output and the true class is among them.

poor - a wrong decision is made.

Table III. Excellent, good and poor predictions under the soft decision framework (ground-truth class: ω_1):

Prediction | Error rate | No. of outputs | Evaluation
ω_1 | 0 | 1 | excellent
ω_1, ω_2 | 0 | 2 | good
ω_2 or ω_2, ω_3 | 1 | 1 or 2 | poor

Obviously, this evaluation criterion implies that it is better to output multiple predictions than a single but incorrect one.

B. Results on Simulated Data

The overall system, with both the data association and classification modules, has been tested on the simulated data summarized in Table I using the knowledge-based NB, the production rule-based classifier and the distance-based classifiers. The experimental results are summarized in Figure 9, where the x-axis is the average number of predicted class labels assigned to each input and the y-axis is the error rate of online association and classification. Each curve or point shows the results of one method under different parameter settings.

Figure 9. Comparison of rule-based, distance-based and probabilistic methods for association and classification.

For the knowledge-based NB, we vary the length of history used to estimate the prior probability over h = 0 (i.e. equal priors), 2, 5, 10, 20, 50 and 100. The error curve shows that for h = 2, 5, 10 the performance of the knowledge-based NB improves compared with equal priors (h = 0), with a smaller number of predictions and similar error rates. This implies that introducing prior probabilities estimated from incoming data helps to reduce the ambiguity of classification. However, as h continues to increase, the error rate increases as a trade-off for the lower number of predictions. Compared with the other methods, the knowledge-based NB shows the best performance.

C. Results on UCI Data

In this section, our method is further tested using the three real-world data sets from the UCI repository [13] summarized in Table II. We generate a KTL for each data set artificially. For each data set D, we first calculate the attribute value range of each class, i.e. V_jk = [a, b], where a = min{X_j | Y = ω_k} and b = max{X_j | Y = ω_k}, and generate the perfect KTL = {V_jk | j = 1,...,n; k = 1,...,K}. Then we generate a KTL with incomplete coverage of the data, i.e. KTL^λ, where λ represents the degree of knowledge completeness. One reason for this setting is that there may be a gap between the given expert knowledge and the incoming data due to noise. Another important reason is that class descriptions, being highly abstracted, tend to describe the typical samples of a class. A reasonable setting is therefore to give each class value ranges that cover only the central part of the data from that class. As a result, KTL^λ contains a set of value ranges shrunk toward their mean values. Specifically, KTL^λ = {V^λ_jk | j = 1,...,n; k = 1,...,K}, where

$$V^{\lambda}_{jk} = \left[\,a + \tfrac{1-\lambda}{2}(b-a),\; b - \tfrac{1-\lambda}{2}(b-a)\,\right], \quad 0 < \lambda \leq 1,$$

so that λ = 1 recovers the perfect KTL.

In classification, KTL^λ is used to construct the knowledge-based NB, the rule-based classifier and the distance-based classifiers, and the whole sample set is used as test data. The experimental results are shown in Tables IV, V and VI. From these results, we observe the following:

When the KTL completely and correctly covers the class distributions of the true data, all classifiers except the centroid distance-based classifier perform very well, with a zero error rate and a small number of predictions. Notice that the centroid distance-based classifier is not affected by the degree of knowledge completeness λ.
But the error rate of the centroid distance-based classifier is clearly higher than that of the other methods, even when λ = 1.

With decreasing completeness of knowledge, i.e. decreasing λ, the performance of the production rule-based classifier degrades dramatically, especially when the dimensionality of the feature space (the number of attributes) is high. As is well known, samples distributed in a high-dimensional space lie close to the class boundary [14]. Therefore, when the value ranges in the KTL shrink toward the centers, most samples fall outside the class region boxes and cannot be recognized by the production rule-based classifier. Our proposed knowledge-based NB has reasoning capacity for data located outside the class region box, which is why it keeps relatively lower error rates as λ decreases, compared with the other three methods.

When the knowledge completeness is too low, the knowledge-based NB outputs a larger number of predicted class labels. This indicates that, with less and less knowledge, the knowledge-based NB has lower confidence in its decision making.
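A minimal sketch of the KTL^λ construction described in Section III-C, under the convention used above that λ = 1 keeps the full observed range, is given below; the function and variable names are our own.

```python
# Sketch of the KTL-lambda construction: per class and attribute, take the
# observed value range [a, b] and shrink it symmetrically towards its centre.
# lambda = 1 keeps the full range (complete knowledge); smaller lambda gives
# narrower ranges.

import numpy as np

def build_ktl(X: np.ndarray, y: np.ndarray, lam: float = 1.0):
    """Return {class: [(low, high) per attribute]} shrunk by completeness lam."""
    assert 0.0 < lam <= 1.0
    ktl = {}
    shrink = (1.0 - lam) / 2.0
    for k in np.unique(y):
        Xk = X[y == k]
        a, b = Xk.min(axis=0), Xk.max(axis=0)   # perfect per-attribute ranges
        lo = a + shrink * (b - a)               # move both ends towards the centre
        hi = b - shrink * (b - a)
        ktl[k] = list(zip(lo.tolist(), hi.tolist()))
    return ktl
```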

Table IV. Experimental results of the knowledge-based NB (K-NB), production rule-based classifier, boundary distance-based classifier and centroid distance-based classifier on the ecoli data: error rate and number of predictions per method for different values of the knowledge completeness parameter λ.

Table V. Experimental results of the same four classifiers on the segment data: error rate and number of predictions per method for different values of λ.

Table VI. Experimental results of the same four classifiers on the vowel data: error rate and number of predictions per method for different values of λ.

IV. CONCLUSIONS

This paper proposed a knowledge-based target classification system. Three features differentiate our method from traditional methods. Firstly, our system automatically learns contextual information from the input data; this learnt knowledge is saved and then combined with the given expert knowledge in classification. Secondly, the classification algorithm proposed here is knowledge-driven, which is extremely useful in applications where expert knowledge is available instead of training data. Thirdly, the classification algorithm embeds a soft-decision mechanism, which is able to deal with overlapping data. The experimental results show that our system with the proposed knowledge-based naive Bayes outperforms other classical classification models, including production rule, boundary and prototype models, and that it performs better than these classical models when the provided knowledge is incomplete with respect to the true distributions.

REFERENCES

[1] T. M. Mitchell, Machine Learning. McGraw-Hill, Inc.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. John Wiley & Sons, Inc.
[3] T. G. Dietterich and P. Langley, Machine Learning for Cognitive Networks: Technology Assessment and Research Challenges (book chapter in Cognitive Networks). John Wiley & Sons, Ltd.
[4] E. Bosse, J. Roy, and S. Wark, Concepts, Models, and Tools for Information Fusion. Norwood, MA, USA: Artech House, Inc.
[5] W. Tang, K. Z. Mao, L. O. Mak, and G. W. Ng, Classification for overlapping classes using optimized overlapping region detection and soft decision, in Proceedings of the 13th International Conference on Information Fusion, Edinburgh, United Kingdom, July 2010.
[6] R. Sun, The Cambridge Handbook of Computational Psychology. New York, NY, USA: Cambridge University Press.
[7] R. R. Hunt and H. C. Ellis, Fundamentals of Cognitive Psychology, Seventh Edition. New York, NY, USA: McGraw-Hill, Inc.
[8] P. Langley, W. Iba, and K. Thompson, An analysis of Bayesian classifiers, in Proceedings of the Tenth National Conference on Artificial Intelligence. MIT Press, 1992.
[9] P. Domingos and M. Pazzani, On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, vol. 29.
[10] Y. Yang and G. I. Webb, A comparative study of discretization methods for naive-Bayes classifiers, in Proceedings of PKAW 2002, The 2002 Pacific Rim Knowledge Acquisition Workshop, Tokyo, Japan, 2002.
[11] N. Friedman, D. Geiger, and M. Goldszmidt, Bayesian network classifiers, Machine Learning, vol. 29.

[12] F. Zheng and G. I. Webb, A comparative study of semi-naive Bayes methods in classification learning, in Proceedings of the 4th Australasian Data Mining Conference (AusDM05), 2005.
[13] P. M. Murphy and D. W. Aha, UCI repository of machine learning databases, Irvine, CA: University of California.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer.


More information

ECE 5984: Introduction to Machine Learning

ECE 5984: Introduction to Machine Learning ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm

More information

Algorithms for Classification: The Basic Methods

Algorithms for Classification: The Basic Methods Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.

More information

Logistic Regression and Boosting for Labeled Bags of Instances

Logistic Regression and Boosting for Labeled Bags of Instances Logistic Regression and Boosting for Labeled Bags of Instances Xin Xu and Eibe Frank Department of Computer Science University of Waikato Hamilton, New Zealand {xx5, eibe}@cs.waikato.ac.nz Abstract. In

More information

ECE662: Pattern Recognition and Decision Making Processes: HW TWO

ECE662: Pattern Recognition and Decision Making Processes: HW TWO ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are

More information

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }

More information

Perception: objects in the environment

Perception: objects in the environment Zsolt Vizi, Ph.D. 2018 Self-driving cars Sensor fusion: one categorization Type 1: low-level/raw data fusion combining several sources of raw data to produce new data that is expected to be more informative

More information

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition Data Mining Classification: Basic Concepts and Techniques Lecture Notes for Chapter 3 by Tan, Steinbach, Karpatne, Kumar 1 Classification: Definition Given a collection of records (training set ) Each

More information

Bayesian Learning. Bayesian Learning Criteria

Bayesian Learning. Bayesian Learning Criteria Bayesian Learning In Bayesian learning, we are interested in the probability of a hypothesis h given the dataset D. By Bayes theorem: P (h D) = P (D h)p (h) P (D) Other useful formulas to remember are:

More information

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with

More information