Information theoretical approach for domain ontology exploration in large Earth observation image archives

Information theoretical approach for domain ontology exploration in large Earth observation image archives Mihai Datcu, Mariana Ciucu DLR Oberpfaffenhofen IGARSS 2004

KIM - Knowledge driven Information Mining in remote sensing image archives Semantic Definition Objectives: a prototype system of a next generation architecture to help the users to gather relevant information rapidly, manage and add value to the huge amounts of historical and newly acquired satellite data-sets Image Information Mode Learn Mode Label Information Name Description Label_1 Pan Mode Reset Similar labels View Order Gallery Store A posteriori Map Models contribution Model1 Model2 Search Search parameters Set label Set a label Probability Separability Coverage Set collections Set Satellites and/or Sensors Collection_1 Satellite_1 Collection_2 Satellite_2 Collection_3 Collection_4 Sensor_1 Collection_5 Sensor_2 Date Area Upper bound(lat/lon) From: yyyy/mm/dd Lower bound(lat/lon) To: yyyy/mm/dd Textual annotation Stored Queries Set a stored query Details Query_1 Query_2 Apply Query_3 Result Data: ERS, Landsat, Ikonos, MERIS, E-SAR, DEDALUS Search Store Query Parameters Confirm Selection Evaluators and users: ESRIN, EUSC, NERSC, DLR Output: prototype system

The paradox We have to design systems to enable people to analyse TeraBytes of data! People have normally trouble in caching more than 7 items at a time

HIERARCHY OF INFORMATION REPRESENTATION un- supervised clustering un- supervised clustering

Clustering and coding Morse alphabet E I A _ O _

Semantic coding texture spectral

Image semantics

Advanced communication Image s I k Cluster s? 1j Model p(? I ) p(l? ) Semantic labels CITY VILAGE ROAD? k Model 2 RIVER FOREST

Data-Information-Knowledge -applicable to a number of fields Geology Hydrology Domain Ontology -knowledge acquisition and sharing within user communities Water Drainage Semantics -multi - type data handling Mountain River Glacier Sea Grassland Agriculture Labels -knowledge communication Spectral Texture Spectral SAR Features Images

Semantic grouping - method Supervised Bayesian classification: p ( L ω i µ i ) > p( L ω ), µ Inference from data to aggregated labels: p( Λ τ D) = p( Λ ) τ p( L Λτ ) p( L p( L ) D) p( L Λτ ) Interactive learning the probabilistic link : p (L Λ, ψ ) = ψ ψ = ψ, K, ψ } τ p( ψ ) = Γ( c) = ( c 1)! p( ψ T) = Dir ( ψ α) p(l Λ τ ) = α α { 1 c

Controlling the Dynamic Indexing Problem In the system there may be labels with different names and with the same information content (the same meaning) Table LABEL from DataBase Labels L1 L2 pω L 11 ) ( 1 pω L 12 ) ( 1 pω ( L 11 2) pω ( L 12 2) p( ij 1 ω L) p( ij 2 ω L) Signal classes ω 11 ω 12 ω ij

Controlling the Dynamic Indexing Solution : similarity measure, Kullback-Leibler divergence KL( L 1 L 2 ) = i j p ( ω L ) ij 1 ( ω L ) p ( ) ij 1 *log q ωij L2 p q ( ωij L1 ) = p1 ( ωi L1 )* p2( ω j L1 ) ( ω L ) = q ( ω L )* q ( ω L ) ij 2 1 i 2 2 j 2 Example The initial label for two classfiles 1 4 7 2 5 8 3 6 9 The labels from database for the same classfiles After Kullback-Leibler procedure the sorted labels list is : 2, 1, 6, 4, 5, 9,7, 8 3

Domain Ontology Search Knowledge Graph Hierarchical structure Union of trees Graph more abstract concept domain Sub-domain Aggregated label aggregation algorithm For the same label we can have many realisations, with the same features ( like L2 ) L1 L2 L3 L4 L5 feature feature feature feature feature classes classes classes classes classes image image image image image classes lavel features lavel classes lavel image lavel more concrete concept When a user is training a new label on an image, the system may try to identify the other labels closest to it ( KL between 0 and 1) and give the user the possibility to choose an old label or to define a new one

Domain Ontology Search Problem Solution User needs to learn from another user We can predict what the user wants similarity measure ->Kullback-Leibler divergence between two ontologies r 1 P(k 1 r 1 ) P(k 2 r 1 ) P(k n r 1 ) k 1 k 2 k n r2 KL' ( r n 1, r2 ) = KL( r1, r2 ) + p( ki r1 )* KL' ( ki, pi ) i= 1 Where, r1 root node of sub-trees for one of users r2 root node of sub-trees for other user k i child nodes of r 1 p i child nodes of r 2 P(p 1 r 2 ) P(p 2 r 2 ) P(p n r 2 ) p 1 p 2 p n

Domain Ontology Search Examples 11178 hd_cloud6 md_cloud the entropy from the root node = hd_cloud6 is 7232032711267443 the entropy from the root node = md_cloud is 875741191905023 KL (hd_cloud6,md_cloud) -----------------------------07136991789787241 11190 ac_acqua_veloce hd_water2 the entropy from the root node = ac_acqua_veloce is 8134289966218503 the entropy from the root node = hd_water2 is 61539319369981 KL (ac_acqua_veloce,hd_water2) -----------------------------06401859742548874

Conclusions The system helps users by cumulative learning incremental acquisition of knowledge building on knowledge acquired in earlier steps take advantage of what was learned before share knowledge IGARSS 2004