HCOC: hierarchical classifier with overlapping class groups

1 HCOC: hierarchical classifier with overlapping class groups
  Igor T Podolak
  Group of Machine Learning Methods GMUM
  Theoretical Foundations of Machine Learning, Będlewo, th February
  igorpodolak@ujedupl

2 Outline
  classification problem
    predefined class hierarchy
    model building
  Hierarchical classifier with overlapping class groups
    model's architecture
    weak classifiers
    cluster weights and evaluation
    convergence
  HCOC
    overlapping of class groups
    weak classifiers training vs hierarchy
    fusion of training methods
    evaluation methods
    cluster competence learning

3 Problem statement
  1 high number of classes
    1.1 right model architecture
    1.2 unbalanced number of class examples
    1.3 divide the problem into simpler ones?
  2 what is a hierarchical classification?
    2.1 predefined class hierarchy
    2.2 map natural class groups to the model architecture
  3 solve by splitting the output classification space
    3.1 hierarchically group examples from similar classes
    3.2 hypothesis: if examples from classes A and B are frequently mistaken for each other, then the classes are probably similar
      3.2.1 define the similarity of classes by the frequency of incorrect classifications
    3.3 find the class groups using weak classifiers (hierarchically)
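The hypothesis in 3.2 and 3.2.1 can be illustrated with a minimal sketch. It assumes integer class labels and one plausible normalisation (mutual mistakes divided by the number of examples of both classes); the function name `class_similarity` and this normalisation are illustrative choices, not taken from the slides.

```python
def class_similarity(true_labels, predicted_labels, num_classes):
    """Return a symmetric matrix sim[a][b]: the fraction of examples of
    classes a and b that were mistaken for the other class.
    The normalisation is an assumption made for this sketch."""
    # confusion[t][p] counts examples of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        confusion[t][p] += 1
    counts = [sum(row) for row in confusion]

    sim = [[0.0] * num_classes for _ in range(num_classes)]
    for a in range(num_classes):
        for b in range(num_classes):
            if a == b:
                continue
            total = counts[a] + counts[b]
            if total:
                sim[a][b] = (confusion[a][b] + confusion[b][a]) / total
    return sim


if __name__ == "__main__":
    true = [0, 0, 1, 1, 2, 2]
    pred = [1, 0, 0, 1, 2, 2]
    for row in class_similarity(true, pred, 3):
        print(row)
```

In this toy run classes 0 and 1 are confused with each other and class 2 never is, so only the pair (0, 1) gets a non-zero similarity.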

4 Problem tasks to solve
  4 HCOC: fusion of supervised training in nodes and unsupervised cluster building
    4.1 supervised training returns class probability vectors
      4.1.1 hypothesis: similar classification vectors = examples hard to differentiate = classes are similar
    4.2 clustering in the classifier's activation space recovers classification errors
    4.3 a classifier trained in supervised mode might be weak

5 classifier tree root

6 each node is a separate classifier
  $P(C = A \mid x)$
  the classifier Cl in a node returns a class probability vector
  similar activations represent similar classes, thus we may split them into subproblems

7 some classes are classified similarly
  [Figure: classes A, B, E, G, J, X]
  similarly classified classes are grouped together into clusters
  grouping makes it possible to recover some classification errors later
  clusters may overlap

8 classifiers are weak
  [Figure: classifier tree with class groups]
  a K-class classifier is at least weak if the probability that the activation for the true class is at least 1/K
  a K-class classifier Cl is weak iff $E[Cl_i(x) \mid \mathrm{true}(x) = i]$ for the true class is higher than $\alpha(k)$, where $\alpha(k) = \min\{\alpha : \ldots\}$ (the defining inequality is not legible in this copy)
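The expectation in the weakness condition can be estimated on a labelled sample, as in the following sketch. Since the exact form of α(k) is not available here, the threshold is passed in as a parameter and defaults to 1/K, which is an assumption made only for illustration; the function names are hypothetical.

```python
def mean_true_class_activation(activations, true_labels, num_classes):
    """Estimate E[Cl_i(x) | true(x) = i] for every class i.
    activations[n] is the probability vector returned for example n."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for vector, label in zip(activations, true_labels):
        sums[label] += vector[label]
        counts[label] += 1
    # None marks classes with no examples in the sample
    return [s / c if c else None for s, c in zip(sums, counts)]


def is_weak(activations, true_labels, num_classes, threshold=None):
    """True if every observed class has mean true-class activation above
    the threshold. The default 1/K stands in for alpha(k) (assumption)."""
    if threshold is None:
        threshold = 1.0 / num_classes
    means = mean_true_class_activation(activations, true_labels, num_classes)
    return all(m > threshold for m in means if m is not None)


if __name__ == "__main__":
    acts = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
    labels = [0, 0, 1, 2]
    print(mean_true_class_activation(acts, labels, 3))
    print(is_weak(acts, labels, 3))
```

Raising the threshold makes the test stricter, which is one way to read the later remark that the weakness definition allows the weakness of node classifiers to be controlled.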

9 cluster weights are computed separately for each given input vector
  [Figure: classifier tree with class groups]
  $$w_l(x) = \frac{\sum_{k=1}^{K} f_{kl}\, Cl_k(x)}{\sum_{l'=1}^{L} \sum_{k'=1}^{K} f_{k'l'}\, Cl_{k'}(x)}, \qquad f_{kl} = \begin{cases} 1 & C_k \in Q_l \\ 0 & C_k \notin Q_l \end{cases}$$
  $w_l(x)$ corresponds to softmax, therefore a model that predicts a cluster is a classifier
  different competence measures
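The cluster weight formula can be sketched directly: the weight of cluster Q_l is the summed activation of its member classes, normalised over all clusters. Representing the clusters as sets of class indices (instead of the indicator matrix f_kl) and the function name `cluster_weights` are choices made for this sketch.

```python
def cluster_weights(activation, clusters):
    """Compute w_l(x) for one input.
    activation[k] is Cl_k(x); clusters[l] is the set of class indices in Q_l,
    which plays the role of the indicator f_kl."""
    raw = [sum(activation[k] for k in members) for members in clusters]
    total = sum(raw)
    if total == 0:
        raise ValueError("all cluster activations are zero")
    return [r / total for r in raw]


if __name__ == "__main__":
    activation = [0.5, 0.3, 0.2]   # root classifier output for one x
    clusters = [{0, 1}, {1, 2}]    # class 1 belongs to both clusters
    print(cluster_weights(activation, clusters))
```

Because clusters may overlap, the activation of a shared class is counted once per cluster it belongs to; the normalisation still makes the weights sum to 1.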

10 clustering methods
  [Figure: classifier tree with class groups]
  $$w_l(x) = \frac{\sum_{k=1}^{K} f_{kl}\, Cl_k(x)}{\sum_{l'=1}^{L} \sum_{k'=1}^{K} f_{k'l'}\, Cl_{k'}(x)}$$
  SAHN based, Bayesian, GNG based
  Bayesian: join classes using the error matrix
  GNG: build clusters online, simultaneously with classifier training
  control diversity of clusters and descendant classifiers

11 clusters overlap
  [Figure: classifier tree with overlapping class groups]
  individual classes may belong to several clusters
  which clusters overlap follows from the classifier's inability to solve the actual problem: an architecture corresponding to the problem is being built
  cluster overlap increases HCOC accuracy
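How overlapping class groups can arise from a class similarity matrix is shown in the sketch below. It is not one of the clustering methods named in the slides (SAHN, Bayesian, GNG); it is a deliberately simple threshold rule, chosen only to show one class ending up in several groups. The function name and the threshold parameter are hypothetical.

```python
def overlapping_groups(similarity, threshold):
    """Build one candidate group per class: the class itself plus every class
    whose similarity to it reaches the threshold. Duplicate groups and groups
    strictly contained in another group are dropped.
    A simple illustrative rule, not the method used in HCOC."""
    n = len(similarity)
    groups = []
    for seed in range(n):
        members = {seed}
        members.update(c for c in range(n)
                       if c != seed and similarity[seed][c] >= threshold)
        group = frozenset(members)
        if group not in groups:
            groups.append(group)
    return [g for g in groups if not any(g < other for other in groups)]


if __name__ == "__main__":
    # symmetric similarities: 0~1, 0~2 and 1~3 are similar pairs
    sim = [
        [0.0, 0.4, 0.3, 0.0],
        [0.4, 0.0, 0.0, 0.3],
        [0.3, 0.0, 0.0, 0.0],
        [0.0, 0.3, 0.0, 0.0],
    ]
    for group in overlapping_groups(sim, 0.2):
        print(sorted(group))
```

In this toy matrix classes 0 and 1 fall into both resulting groups, so an error that sends an example of class 0 into either group can still be corrected one level down.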

12 independence of HCOC base classifiers
  [Figure: classifier tree with overlapping class groups]
  each Cl solves its own subproblem
  subtrees may be built independently, in parallel

13 classification on different levels
  [Figure: classifier tree with overlapping class groups, several levels]

14 HCOC convergence of training
  [Figure: classifier tree with overlapping class groups]
  let HCOC be a two-level model with loss $l(x, t, h(x)) = (t - h(x))^2$
  the HCOC risk is lower than the risk of the root classifier $Cl_0$, provided that classes are spread independently between clusters and an expression in $p_i$, $f_{kl}$ and $m_{ik}$ is maximised and higher than a second expression in $p_i$ and $(m_{ik})^2$ (neither expression is legible in this copy)
  HCOC is built recursively: the above statement strengthens with each level added

15 weakness property of base classifiers
  [Figure: classifier tree with overlapping class groups]
  more clusters give better results
  the proposed weakness definition makes it possible to control the weakness of node classifiers

16 it is possible to build several simple classifiers independently
  [Figure: classifier tree with overlapping class groups, subtrees expanded]

17 complete classifier
  [Figure: complete classifier tree with overlapping class groups]

18 evaluation of HCOC
  [Figure: classifier tree with overlapping class groups]
  $$P(C_j \mid x) = y_j(x) = \sum_{l=1}^{L} w_l(x)\, y^l_j(x)$$
  where $y^l_j(x)$ is the return value of the descendant classifier with competence $w_l(x)$ for the given $x$
  geometric mean in overlaps
  possible methods: All-subtrees, Single-path, Restricted and α-restricted
  All-subtrees: evaluate all paths
  Single-path: select only the path with the highest competence $w_l$
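The All-subtrees and Single-path evaluations can be compared on a two-level model with the sketch below. It assumes each descendant classifier returns a vector over all K classes, with zeros for classes outside its cluster; that padding convention and the function names are assumptions of this sketch, and the geometric mean in overlaps is not modelled.

```python
def all_subtrees(weights, child_outputs):
    """y_j(x) = sum over clusters l of w_l(x) * y^l_j(x)."""
    num_classes = len(child_outputs[0])
    return [sum(w * y[j] for w, y in zip(weights, child_outputs))
            for j in range(num_classes)]


def single_path(weights, child_outputs):
    """Follow only the cluster with the highest competence w_l(x)."""
    best = max(range(len(weights)), key=lambda l: weights[l])
    return list(child_outputs[best])


if __name__ == "__main__":
    weights = [0.6, 0.4]            # w_l(x) for two clusters
    child_outputs = [
        [0.7, 0.3, 0.0],            # child of cluster {0, 1}
        [0.0, 0.5, 0.5],            # child of cluster {1, 2}
    ]
    print(all_subtrees(weights, child_outputs))
    print(single_path(weights, child_outputs))
```

Here class 1 lies in the overlap, so under All-subtrees it collects evidence from both children, while Single-path ignores the second child entirely.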

19 Restricted and α-restricted approaches
  [Figure: classifier tree with overlapping class groups]
  only some clusters will have competence higher than the a priori class probability $p_i$
  evaluating the others is equivalent to adding some noise
  Restricted: use only clusters where at least one class has activation higher than its a priori probability $p_i$
  α-restricted: use only clusters where $Cl_k(x) > \alpha(k)$ for a class $C_k$ in the cluster (the weakness condition is being used)
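Both selection rules can be sketched as filters over clusters. The α-restricted rule is read here as "at least one class in the cluster has activation above α", by analogy with the Restricted rule; that reading, the function names, and passing α as a plain number are assumptions of this sketch.

```python
def restricted_clusters(activation, clusters, priors):
    """Indices of clusters where at least one member class has activation
    higher than its a priori probability."""
    return [l for l, members in enumerate(clusters)
            if any(activation[k] > priors[k] for k in members)]


def alpha_restricted_clusters(activation, clusters, alpha):
    """Indices of clusters where at least one member class has activation
    higher than alpha (assumed reading of the alpha-restricted rule)."""
    return [l for l, members in enumerate(clusters)
            if any(activation[k] > alpha for k in members)]


if __name__ == "__main__":
    activation = [0.5, 0.3, 0.1, 0.1]      # root classifier output for one x
    clusters = [{0, 1}, {2, 3}]
    priors = [0.25, 0.25, 0.25, 0.25]
    print(restricted_clusters(activation, clusters, priors))
    print(alpha_restricted_clusters(activation, clusters, 0.4))
```

Only the selected clusters would then be evaluated and combined, so the second cluster in this example contributes nothing instead of contributing noise.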

20 Restricted and α-restricted
  [Figure: classifier tree with overlapping class groups, selected paths]
  both methods use only paths which carry correct information with high probability, i.e. the most correct information
  both have higher accuracy

22 HCOC properties
  1 fusion of supervised and unsupervised training
    1.1 possible solution for a high number of output classes
  2 a split corresponds to the complexity of the subproblem at a node
    2.1 subproblems overlap, hence improvement of accuracy
    2.2 problem split through unsupervised clustering of class outputs
    2.3 clustering control results in different resulting subproblems
    2.4 different clustering methods
      2.4.1 parallel training
  3 weak classifiers in nodes
    3.1 probabilistic measure of classifier weakness
    3.2 provides for simple weakness control

23 HCOC properties
  4 classifier competence computed separately for each input vector classified
  5 different methods of evaluation
    5.1 simple reduction of unimportant information (noise)
    5.2 evaluation is related to classifier weakness
  6 HCOC properties
    6.1 classifier risk is minimised with new layers being added
    6.2 control of diversity

24 tfml 201?

25 GMUM
