Handling imprecise and uncertain class labels in classification and clustering


Handling imprecise and uncertain class labels in classification and clustering
Thierry Denœux
Université de Technologie de Compiègne, HEUDIASYC (UMR CNRS 6599)
COST Action IC 0702, Working group C, Mallorca, March 16, 2009

Classification and clustering: classical framework

We consider a collection L of n objects. Each object is assumed to belong to one of K groups (classes). Each object is described by:
- an attribute vector x ∈ R^p (attribute data), or
- its similarity to all other objects (proximity data).

The class membership of objects may be:
- completely known, described by class labels (supervised learning);
- completely unknown (unsupervised learning);
- known for some objects, and unknown for others (semi-supervised learning).

Classification and clustering: problems

Problem 1: predict the class membership of objects drawn from the same population as L (classification).
Problem 2: estimate parameters of the population from which L is drawn (mixture model estimation).
Problem 3: determine the class membership of objects in L (clustering).

            supervised   unsupervised   semi-supervised
Problem 1       x                            x
Problem 2       x             x              x
Problem 3                     x              x

Motivations

- In real situations, we may have only partial knowledge of class labels: an intermediate situation between supervised and unsupervised learning → partially supervised learning.
- The class membership of objects can usually be predicted only with some remaining uncertainty: the outputs of classification and clustering algorithms should reflect this uncertainty.
- The theory of belief functions is suitable for representing uncertain and imprecise class information:
  - as input to classification and mixture model estimation algorithms;
  - as output of classification and clustering algorithms.

Outline

1. Theory of belief functions
2. Principle; Implementation
3. Problem statement; Method; Simulation results
4. Problem; Evidential c-means

Mass function

Let X be a variable taking values in a finite set Ω (the frame of discernment).

A mass function is a mapping m: 2^Ω → [0, 1] such that Σ_{A⊆Ω} m(A) = 1. Every A ⊆ Ω such that m(A) > 0 is a focal set of m.

Interpretation: m represents
- an item of evidence regarding the value of X;
- a state of knowledge (belief state) induced by this item of evidence.

Full notation

m^Ω_{Ag,t}{X}[EC] denotes the mass function
- representing the beliefs of agent Ag;
- at time t;
- regarding variable X;
- expressed on the frame Ω;
- based on the evidential corpus EC.

Special cases

m may be seen as:
- a family of weighted sets {(A_i, m(A_i)), i = 1, ..., r};
- a generalized probability distribution (masses are distributed on 2^Ω instead of Ω).

Special cases:
- r = 1: categorical mass function (→ set). We denote by m_A the categorical mass function with focal set A.
- |A_i| = 1 for all i = 1, ..., r: Bayesian mass function (→ probability distribution).
- A_1 ⊂ ... ⊂ A_r: consonant mass function (→ possibility distribution).

Belief and plausibility functions

Belief function: bel(A) = Σ_{∅≠B⊆A} m(B), for all A ⊆ Ω (degree of belief, or support, in the hypothesis "X ∈ A").

Plausibility function: pl(A) = Σ_{B∩A≠∅} m(B), for all A ⊆ Ω (upper bound on the degree of belief that could be assigned to A after taking into account new information).

We have bel ≤ pl.
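The two definitions above can be sketched directly in code: a mass function is represented as a dictionary from focal sets (frozensets) to masses. The frame and the example mass function are illustrative assumptions, not values from the slides.

```python
# A mass function as a dict from focal sets (frozensets) to masses summing to 1.

def bel(m, A):
    """bel(A) = sum of m(B) over non-empty B included in A."""
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    """pl(A) = sum of m(B) over B intersecting A."""
    return sum(v for B, v in m.items() if B & A)

omega = frozenset({"w1", "w2", "w3"})
m = {frozenset({"w1"}): 0.5, frozenset({"w1", "w2"}): 0.3, omega: 0.2}
A = frozenset({"w1", "w2"})
# bel(A) = 0.5 + 0.3 = 0.8 and pl(A) = 0.5 + 0.3 + 0.2 = 1.0, so bel(A) <= pl(A)
```

The inequality bel ≤ pl holds by construction, since any non-empty B ⊆ A also intersects A.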

Relations between m, bel and pl

bel(A) = pl(Ω) − pl(Ā), for all A ⊆ Ω;

m(A) = Σ_{∅≠B⊆A} (−1)^{|A\B|} bel(B)  if A ≠ ∅,
m(∅) = 1 − bel(Ω).

m, bel and pl are thus three equivalent representations of the same piece of information.

Dempster's rule

Definition (Dempster's rule of combination): for all A ⊆ Ω,
(m_1 ⊕ m_2)(A) = Σ_{B∩C=A} m_1(B) m_2(C).

Properties:
- Commutativity, associativity.
- Neutral element: the vacuous mass function m_Ω such that m_Ω(Ω) = 1 (represents total ignorance).
- (m_1 ⊕ m_2)(∅) ≥ 0: degree of conflict.
- Justified axiomatically.
- Other rules exist (disjunctive rule, cautious rule, etc.).
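The unnormalized (conjunctive) combination above is a double loop over the focal sets of the two operands. A minimal sketch, with an illustrative frame and example masses of my own choosing:

```python
# Sketch of the conjunctive rule used in the TBM: mass functions are dicts
# from focal sets (frozensets) to masses; the combined mass on the empty
# frozenset is the degree of conflict.

def conjunctive_combine(m1, m2):
    """(m1 (+) m2)(A) = sum over B, C with B & C == A of m1(B) * m2(C)."""
    result = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            result[a] = result.get(a, 0.0) + mb * mc
    return result

omega = frozenset({"w1", "w2", "w3"})
m1 = {frozenset({"w1", "w2"}): 0.7, omega: 0.3}
m2 = {frozenset({"w2"}): 0.6, omega: 0.4}
m12 = conjunctive_combine(m1, m2)
# m12({w2}) = 0.7*0.6 + 0.3*0.6 = 0.6; here no intersection is empty,
# so the degree of conflict m12(emptyset) is 0.
```

Commutativity and associativity follow from those of set intersection and multiplication, so the n items of evidence of a learning set can be folded in any order.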

Discounting

Discounting allows us to take into account meta-knowledge about the reliability of a source of information. Let
- m be a mass function provided by a source of information,
- α ∈ [0, 1] be the plausibility that the source is not reliable.

Discounting m with discount rate α yields the mass function
^α m = (1 − α) m + α m_Ω.

Properties: ^0 m = m and ^1 m = m_Ω.
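Discounting is a one-line convex combination with the vacuous mass function; a sketch on the same dict representation (the example frame and mass are illustrative assumptions):

```python
# alpha-discounting: shrink every mass by (1 - alpha) and move the rest to Omega.

def discount(m, alpha, omega):
    """Return (1 - alpha) * m + alpha * m_Omega."""
    out = {A: (1 - alpha) * v for A, v in m.items()}
    out[omega] = out.get(omega, 0.0) + alpha
    return out

omega = frozenset({"w1", "w2"})
m = {frozenset({"w1"}): 0.8, omega: 0.2}
# discount(m, 0.0, omega) leaves m unchanged; discount(m, 1.0, omega)
# puts all the mass on Omega (the vacuous mass function).
```

With α = 0.5, the mass on {w1} drops to 0.4 and the mass on Ω rises to 0.6: the less reliable the source, the closer its discounted belief state is to total ignorance.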

Pignistic transformation

Assume that our knowledge about X is represented by a mass function m, and we have to choose one element of Ω. Several strategies:
1. Select the element with greatest plausibility.
2. Select the element with greatest pignistic probability (m assumed to be normal):
   BetP(ω) = Σ_{A⊆Ω: ω∈A} m(A) / |A|.
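The pignistic transformation spreads each focal set's mass uniformly over its elements; a short sketch on the dict representation (the example mass function is an illustrative assumption):

```python
# BetP(w) = sum over focal sets A containing w of m(A) / |A|,
# for a normal mass function (m(emptyset) = 0).

def betp(m):
    p = {}
    for A, v in m.items():
        for w in A:
            p[w] = p.get(w, 0.0) + v / len(A)
    return p

m = {frozenset({"w1"}): 0.5,
     frozenset({"w1", "w2"}): 0.3,
     frozenset({"w1", "w2", "w3"}): 0.2}
p = betp(m)
# BetP(w1) = 0.5 + 0.3/2 + 0.2/3, and the BetP values sum to 1
```

Because every focal set's mass is fully redistributed, BetP is a genuine probability distribution on Ω, which is what makes it usable for decision making.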


Problem

Let Ω denote the set of classes, and L the learning set
L = {e_i = (x_i, m_i), i = 1, ..., n},
where x_i is the attribute vector for object o_i, and m_i = m^Ω{y_i} is a mass function regarding the class y_i of object o_i.

Special cases:
- m_i({ω_k}) = 1: precise labeling;
- m_i(A) = 1 for some A ⊆ Ω: imprecise (set-valued) labeling;
- m_i is a Bayesian mass function: probabilistic labeling;
- m_i is a consonant mass function: possibilistic labeling, etc.

Problem: build a mass function m^Ω{y}[x, L] regarding the class y of a new object o described by x.

Solution (Denœux, 1995)

Each example e_i = (x_i, m_i) in L is an item of evidence regarding y. The reliability of this evidence decreases with the distance between x and x_i, so m_i is discounted with a discount rate α_i = 1 − φ(d(x, x_i)), where φ is a decreasing function from R^+ to [0, 1]:
m{y}[x, e_i] = ^{α_i} m_i.

The n mass functions are then combined conjunctively:
m{y}[x, L] = m{y}[x, e_1] ⊕ ... ⊕ m{y}[x, e_n].

Implementation

- Take into account only the k nearest neighbors of x in L → evidential k-NN rule (generalizes the voting k-NN rule).
- Definition of φ: for instance, φ(d) = β exp(−γ d²).
- Determination of the hyperparameters β and γ: heuristically, or by minimizing an error function (Zouhal and Denoeux, 1997).
- Summarize L by r prototypes learnt by minimizing an error function → RBF-like neural network approach (Denoeux, 2000).
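For crisp labels, the evidential k-NN rule above reduces to a few lines: each neighbor contributes a mass function that keeps φ(d(x, x_i)) = β exp(−γ d²) on its class and sends the rest to Ω, and the k contributions are combined conjunctively. This is a minimal sketch; the toy data and the values of β and γ are illustrative assumptions.

```python
import numpy as np

def evidential_knn(x, X, y, K, k=3, beta=0.95, gamma=0.5):
    """Combined (unnormalized) mass over singletons, Omega and the empty set
    for query x; the class with maximal plausibility is the prediction."""
    d = np.linalg.norm(X - np.asarray(x, float), axis=1)
    omega = frozenset(range(K))
    m = {omega: 1.0}                               # start from the vacuous bba
    for i in np.argsort(d)[:k]:
        phi = beta * np.exp(-gamma * d[i] ** 2)
        mi = {frozenset({int(y[i])}): phi, omega: 1.0 - phi}
        new = {}                                   # conjunctive combination
        for a, va in m.items():
            for b, vb in mi.items():
                c = a & b
                new[c] = new.get(c, 0.0) + va * vb
        m = new
    return m

X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
y = np.array([0, 0, 1])
m = evidential_knn([0.05, 0.0], X, y, K=2, k=3)
pred = max(range(2), key=lambda w: sum(v for a, v in m.items() if w in a))
# the two nearby class-0 neighbors dominate, so pred is class 0
```

Conflicting neighbors produce mass on the empty set, which can be read as a measure of how atypical the query point is.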

EEG data

500 EEG signals encoded as 64-D patterns, 50% negative (delta waves), labeled by 5 experts.

[Figure: A/D converter output over time for six example signals, each annotated with the five experts' binary labels.]

Results on EEG data (Denoeux and Zouhal, 2001)

K = 2 classes, d = 64; data labeled by 5 experts. Possibilistic labels computed from the distribution of expert labels using a probability-possibility transformation. n = 200 learning patterns, 300 test patterns.

k    k-NN   w_k-NN   TBM (crisp labels)   TBM (uncert. labels)
9    0.30   0.30     0.31                 0.27
11   0.29   0.30     0.29                 0.26
13   0.31   0.30     0.31                 0.26


Mixture model

The feature vectors and class labels are assumed to be drawn from a joint probability distribution:
P(Y = k) = π_k, k = 1, ..., K,
f(x | Y = k) = f(x; θ_k), k = 1, ..., K.

Let ψ = (π_1, ..., π_K, θ_1, ..., θ_K).

Data

We consider a realization of an iid random sample of size n from (X, Y):
(x_1, y_1), ..., (x_n, y_n).
The class labels are assumed to be imperfectly observed and partially specified by mass functions; the learning set has the form
L = {(x_1, m_1), ..., (x_n, m_n)}.

Problem 2: estimate ψ using L.

Remark: this problem encompasses supervised, unsupervised and semi-supervised learning as special cases.

Generalized likelihood criterion (Côme et al., 2009)

Approach: ψ̂ = arg max_ψ pl^Ψ(ψ | L).

Theorem: the logarithm of the conditional plausibility of ψ given L is
ln pl^Ψ(ψ | L) = Σ_{i=1}^{n} ln ( Σ_{k=1}^{K} pl_ik π_k f(x_i; θ_k) ) + ν,
where pl_ik = pl(y_i = k) and ν is a constant.

Generalized EM algorithm (Côme et al., 2009)

- An EM algorithm (with guaranteed convergence) can be derived to maximize the previous criterion.
- This algorithm becomes identical to the classical EM algorithm in the case of completely unsupervised or semi-supervised data.
- The complexity of this algorithm is identical to that of the classical EM algorithm.
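A minimal sketch of this generalized EM for a 1-D Gaussian mixture: the only change with respect to classical EM is that the E-step posteriors are weighted by the label plausibilities pl_ik (vacuous labels pl_ik = 1 recover unsupervised EM). The toy data, initialization and iteration count are assumptions of this sketch, not from the paper.

```python
import numpy as np

def e2m(x, pl, K, n_iter=50):
    """Generalized EM for a 1-D Gaussian mixture with soft labels pl (n x K)."""
    n = len(x)
    pi = np.full(K, 1.0 / K)
    mu = np.linspace(x.min(), x.max(), K)
    sig = np.full(K, x.std())
    for _ in range(n_iter):
        # E-step: t_ik proportional to pl_ik * pi_k * f(x_i; theta_k)
        f = np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        t = pl * pi * f
        t /= t.sum(axis=1, keepdims=True)
        # M-step: classical mixture updates with responsibilities t
        nk = t.sum(axis=0)
        pi = nk / n
        mu = (t * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((t * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return pi, mu, sig

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
pl = np.ones((200, 2))        # vacuous labels: reduces to unsupervised EM
pi, mu, sig = e2m(x, pl, K=2)
```

Setting a row of pl to a one-hot vector forces that example's responsibility onto a single component, which is how supervised and semi-supervised data enter the same loop.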

Experimental settings

Simulated and real data sets. Each example i was assumed to be labelled by an expert who provides his/her most likely label ŷ_i and a measure of doubt p_i. This information is represented by a simple mass function:
m_i({ŷ_i}) = 1 − p_i,  m_i(Ω) = p_i.

Simulations: p_i drawn randomly from a Beta distribution; the true label was changed to another label, chosen at random, with probability p_i.
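The simulation protocol above can be sketched in a few lines. The number of classes, sample size, Beta parameters and seed are illustrative assumptions.

```python
import numpy as np

# Simulate noisy expert labels: doubt p_i ~ Beta, label flipped with
# probability p_i, and a simple soft label m_i({y_hat_i}) = 1 - p_i,
# m_i(Omega) = p_i for each example.

rng = np.random.default_rng(0)
K, n = 3, 8
y_true = rng.integers(K, size=n)
p = rng.beta(2, 5, size=n)                       # expert's measure of doubt
flip = rng.random(n) < p
y_hat = np.where(flip, (y_true + rng.integers(1, K, size=n)) % K, y_true)
soft_labels = [{frozenset({int(y_hat[i])}): 1.0 - p[i],
                frozenset(range(K)): p[i]} for i in range(n)]
```

Each soft label is a simple mass function, so the resulting learning set plugs directly into the generalized likelihood criterion.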

Results: Iris data

[Figure: results on the Iris data (not transcribed).]

Results: Wine data

[Figure: results on the Wine data (not transcribed).]


Credal partition

n objects are described by attribute vectors x_1, ..., x_n. Assumption: each object belongs to one of K classes in Ω = {ω_1, ..., ω_K}.

Goal: express our beliefs regarding the class membership of objects, in the form of mass functions m_1, ..., m_n on Ω. The resulting structure, a credal partition, generalizes hard, fuzzy and possibilistic partitions.

Example of a credal partition:

A              m_1(A)   m_2(A)   m_3(A)   m_4(A)   m_5(A)
∅              0        0        0        0        0
{ω_1}          0        0        0        0.2      0
{ω_2}          0        1        0        0.4      0
{ω_1, ω_2}     0.7      0        0        0        0
{ω_3}          0        0        0.2      0.4      0
{ω_1, ω_3}     0        0        0.5      0        0
{ω_2, ω_3}     0        0        0        0        0
Ω              0.3      0        0.3      0        1

Special cases

- Each m_i is a certain bba → crisp partition of Ω.
- Each m_i is a Bayesian bba → fuzzy partition of Ω: u_ik = m_i({ω_k}) for all i, k.
- Each m_i is a consonant bba → possibilistic partition of Ω: u_ik = pl_i({ω_k}).

Algorithms

- EVCLUS (Denoeux and Masson, 2004): proximity (possibly non-metric) data; multidimensional scaling approach.
- Evidential c-means (ECM) (Masson and Denoeux, 2008): attribute data; alternating optimization of an FCM-like cost function.

Basic ideas

Let v_k be the prototype associated with class ω_k (k = 1, ..., K), and let A_j be a non-empty subset of Ω (a set of classes). We associate with A_j a prototype v̄_j, defined as the center of mass of the v_k for all ω_k ∈ A_j.

Basic ideas:
- For each non-empty A_j ⊆ Ω, m_ij = m_i(A_j) should be high if x_i is close to v̄_j.
- The distance to the empty set is defined as a fixed value δ.
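The subset prototypes can be built in one comprehension: for every non-empty subset of classes, average the prototypes of its members. The class prototypes below are toy values chosen for illustration.

```python
import numpy as np
from itertools import combinations

# Prototype of a set of classes A_j = center of mass of the v_k, omega_k in A_j.
v = {0: np.array([0.0, 0.0]),
     1: np.array([4.0, 0.0]),
     2: np.array([2.0, 3.0])}
subsets = [frozenset(s) for r in range(1, len(v) + 1)
           for s in combinations(v, r)]
vbar = {A: np.mean([v[k] for k in A], axis=0) for A in subsets}
# e.g. the prototype of {omega_0, omega_1} is the midpoint [2, 0],
# and the prototype of Omega is the centroid [2, 1].
```

In ECM, the distance d_ij of x_i to v̄_j then drives the mass m_ij, so a point midway between two class prototypes naturally receives mass on the pair rather than on either singleton.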

Optimization problem

Minimize
J_ECM(M, V) = Σ_{i=1}^{n} Σ_{j: ∅≠A_j⊆Ω} |A_j|^α m_ij^β d_ij² + Σ_{i=1}^{n} δ² m_i∅^β,

subject to
Σ_{j: ∅≠A_j⊆Ω} m_ij + m_i∅ = 1, i = 1, ..., n.

Butterfly dataset

[Figure: scatter plot of the 12 points of the butterfly dataset in the (x_1, x_2) plane.]

Butterfly dataset: results

[Figure: bbas m({ω_1}), m({ω_2}), m(Ω) and m(∅) as a function of point number.]

4-class data set

[Figure: scatter plot of the data in the (x_1, x_2) plane.]

4-class data set: hard credal partition

[Figure: each point labeled by the subset of classes with maximal mass (e.g. 1, 12, 134, 1234, ...).]

4-class data set: lower approximation

[Figure: lower approximations of the four classes, labeled 1^L, 2^L, 3^L, 4^L.]

4-class data set: upper approximation

[Figure: upper approximations of the four classes, labeled 1^U, 2^U, 3^U, 4^U.]

Brain data

Magnetic resonance imaging of a pathological brain, with 2 sets of acquisition parameters. Image 1 shows normal tissue (bright) and ventricles + cerebrospinal fluid (dark). Image 2 shows the pathology (bright).

[Figure: (a) image 1, (b) image 2.]

Brain data: results in gray-level space

[Figure: credal partition in the plane of grey levels in image 1 vs image 2, with clusters labeled 1, 2, 3 and intermediate regions 12, 13, 23.]

Brain data: lower and upper approximations

[Figure: (a), (b), (c).]

Conclusion

The theory of belief functions extends both set theory and probability theory:
- It allows for the representation of imprecision and uncertainty.
- It is more general than possibility theory.

Belief functions may be used to represent imprecise and/or uncertain knowledge of class labels → soft labels. Many classification and clustering algorithms can be adapted to:
- handle such class labels (partially supervised learning);
- generate them from data (credal partition).

References I (cf. http://www.hds.utc.fr/~tdenoeux)

T. Denœux. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, 25(5):804-813, 1995.

L. M. Zouhal and T. Denœux. An evidence-theoretic k-NN rule with parameter optimization. IEEE Transactions on Systems, Man and Cybernetics C, 28(2):263-271, 1998.

References II

T. Denœux. A neural network classifier based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics A, 30(2):131-150, 2000.

T. Denœux and M.-H. Masson. EVCLUS: evidential clustering of proximity data. IEEE Transactions on Systems, Man and Cybernetics B, 34(1):95-109, 2004.

References III

T. Denœux and P. Smets. Classification using belief functions: the relationship between the case-based and model-based approaches. IEEE Transactions on Systems, Man and Cybernetics B, 36(6):1395-1406, 2006.

M.-H. Masson and T. Denœux. ECM: an evidential version of the fuzzy c-means algorithm. Pattern Recognition, 41(4):1384-1397, 2008.

References IV

E. Côme, L. Oukhellou, T. Denœux and P. Aknin. Learning from partially supervised data using mixture models and belief functions. Pattern Recognition, 42(3):334-348, 2009.