Adaptive Multi-Modal Sensing of General Concealed Targets


1 Adaptive Multi-Modal Sensing of General Concealed Targets
Lawrence Carin, Balaji Krishnapuram, David Williams, Xuejun Liao and Ya Xue
Department of Electrical & Computer Engineering, Duke University, Durham, NC
2 Outline
- Review of semi-supervised statistical classifiers and graph-based prior
- Extension of semi-supervised classifiers to a multi-sensor setting: Bayesian co-training
- Active multi-sensor sensing
  - Selection of those members of the unlabeled set for which acquisition of labels would be most informative
  - Selection of those members of the unlabeled set for which new sensors should be deployed and new data acquired
- Concept drift: adaptive sensing when the statistics of the unlabeled data change or drift from those of the original labeled data
- Future work
3 Nomenclature
- Labeled data: the set of $N_L$ feature vectors $x_n$ for which the associated label is known, denoted $l_n \in \{0, 1\}$ for the binary case, thereby yielding the set $D_L = \{x_n, l_n\}_{n=1,\ldots,N_L}$.
- Unlabeled data: the set of $N_U$ feature vectors for which the associated labels are unknown, yielding the set $D_U = \{x_n\}_{n=N_L+1,\ldots,N_L+N_U}$. This is the data to be classified.
- Supervised algorithm: classifier that is designed using $D_L$ and tested on $D_U$.
- Semi-supervised algorithm: classifier that is designed using $D_L$ and $D_U$. Used to estimate labels of $D_U$.
4 Motivation: FOPEN
[Figure: FOPEN scene with markers distinguishing targets from clutter]
- We typically have far more unlabeled data than labeled examples ($N_U \gg N_L$)
- Seek labels for the most informative feature vectors in $D_U$
- Typically, classification is performed on isolated unlabeled examples, one at a time
- We wish to classify members of $D_U$ using all information from $D_U$ and $D_L$ simultaneously
5 Motivation: SAR-Based Mine Detection
- Tremendous amount of unlabeled data, very limited labeled data
- Classification far easier when placed in context of entire image, vis-à-vis image chips
6 Motivation: UXO Detection
- Tremendous amount of unlabeled data, very limited labeled data
- Classification far easier when placed in context of entire image, vis-à-vis image chips
7 Logistic Link Function Design and a Data-Dependent Prior
- Assume labels $l_n \in \{0, 1\}$, and define the kernel-based function $y(x \mid w) = \sum_{n=1}^{N_b} w_n K(x, b_n) + w_0 = \Phi^T(x)\, w$
- The probability that $x$ is associated with $l_n = 1$ is expressed as $p(l_n = 1 \mid x, w) = \sigma[y(x \mid w)] = \exp[\Phi^T(x) w] / [1 + \exp(\Phi^T(x) w)]$
- For the $N_L$ labeled examples, we wish to maximize the log-likelihood $\ell(w \mid D_L) = \sum_{n=1}^{N_L} \log p(l_n \mid x_n, w)$, subject to a prior $p(w \mid D_L, D_U)$ on the weights, with the prior dependent on all data (labeled and unlabeled), via a graph-based construct
- The classifier weights $w_{MAP}$ are therefore set at $w_{MAP} = \arg\max_w [\ell(w \mid D_L) + \log p(w \mid D_L, D_U)]$
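As a rough illustration of the kernel-based logistic model on this slide, the following Python sketch evaluates $\Phi(x)$, $p(l = 1 \mid x, w)$ and the labeled-data log-likelihood. The Gaussian kernel, its width, the basis vectors and the weight values are all assumptions chosen for the example; the slides do not specify them.

```python
import numpy as np

def rbf_kernel(x, b, width=1.0):
    """Gaussian kernel K(x, b); the kernel choice and width are assumptions."""
    diff = np.asarray(x, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.dot(diff, diff) / width**2)

def design_vector(x, basis, width=1.0):
    """Phi(x) = [1, K(x, b_1), ..., K(x, b_Nb)]; the leading 1 carries the bias w_0."""
    return np.array([1.0] + [rbf_kernel(x, b, width) for b in basis])

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def prob_label_one(x, w, basis, width=1.0):
    """p(l = 1 | x, w) = sigma(Phi(x)^T w)."""
    return sigmoid(design_vector(x, basis, width) @ w)

def log_likelihood(X, labels, w, basis, width=1.0):
    """l(w) = sum_n log p(l_n | x_n, w) for labels in {0, 1}."""
    p = np.array([prob_label_one(x, w, basis, width) for x in X])
    labels = np.asarray(labels, dtype=float)
    return float(np.sum(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

# Hypothetical basis vectors b_n and made-up weights [w_0, w_1, w_2].
basis = np.array([[0.0, 0.0], [1.0, 1.0]])
w = np.array([0.0, 2.0, -2.0])
p_near_b1 = prob_label_one([0.0, 0.0], w, basis)  # close to the positively weighted basis
p_near_b2 = prob_label_one([1.0, 1.0], w, basis)  # close to the negatively weighted basis
```

With these made-up weights, a point near the first basis vector receives a probability above 0.5 and a point near the second receives one below 0.5.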
8 Graph-Based Design of Data-Dependent Prior
- Use a kernel $k(x_i, x_j)$ to define the similarity of $x_i$ and $x_j$. Note: this kernel is used to define the graph, and need not be the same as that employed in the classifier
- $W_{ij} = k(x_i, x_j)$ is large when the two vectors are similar, e.g., the radial basis function $W_{ij} = k(x_i, x_j) = \exp[-\|x_i - x_j\|^2 / \sigma^2]$
- Let the vectors $x_i$ constitute the nodes of the graph and let $f(x_i) = \Phi^T(x_i)\, w$ be a function on the graph
- We seek to minimize the energy function $En(f) = \frac{1}{2} \sum_{i,j} W_{ij} [f(x_i) - f(x_j)]^2$, so that large $W_{ij}$ implies $f(x_i) \approx f(x_j)$
- Defining $f = \{f(x_1), f(x_2), \ldots, f(x_{N_L+N_U})\}$, it is easy to show that $En(f) = f^T \Delta f$, where $\Delta = D - W$ is the combinatorial Laplacian, the matrix $W$ is defined by $W_{ij}$, and $D$ is a diagonal matrix, the $i$th element of which is $d_i = \sum_j W_{ij}$
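The identity $En(f) = f^T \Delta f$ can be checked numerically. The sketch below builds the affinity matrix with a radial basis function (the width value and the random data are assumptions for illustration), forms $\Delta = D - W$, and evaluates the energy both as the pairwise sum and as the quadratic form.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """W_ij = exp(-||x_i - x_j||^2 / sigma^2); sigma is an assumed width."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / sigma**2)

def combinatorial_laplacian(W):
    """Delta = D - W, with D the diagonal matrix of row sums d_i = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

def energy_pairwise(W, f):
    """En(f) = 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * diff**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))   # six synthetic graph nodes
f = rng.normal(size=6)        # an arbitrary function on the graph
W = rbf_affinity(X)
lap = combinatorial_laplacian(W)
energy_quadratic = f @ lap @ f
```

The two evaluations agree to numerical precision, and a constant function has zero energy, which is why the prior favors functions that change little between similar nodes.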
9 Graph-Based Design of Data-Dependent Prior - 2
- Using $En(f) = f^T \Delta f$, finding $f$ that minimizes $En(f)$ corresponds to a MAP estimate of $f$ from the Gaussian random field density function $p(f) = Z_\beta^{-1} \exp[-\beta\, En(f)] = Z_\beta^{-1} \exp[-\beta f^T \Delta f]$
- This gives us a prior on $f$, which is defined through $f(x_i) = \Phi^T(x_i)\, w$, and therefore $f = A w$ with $A = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_{N_L+N_U})]^T$
- We therefore have a prior on our model weights $w$, also represented by a Gaussian random field: $p(w \mid x_1, x_2, \ldots, x_{N_L+N_U}) \propto \exp[-\beta w^T \tilde{A} w]$ with $\tilde{A} = A^T \Delta A$, i.e., a zero-mean Gaussian whose covariance is proportional to $(\beta \tilde{A})^{-1}$
- We now have a prior on the weights $w$ applied to the labeled data $D_L$, accounting for the inter-relationships between all data $D_L$ and $D_U$
- The MAP solution for $w$, $w_{MAP} = \arg\max_w [\ell(w \mid D_L) + \log p(w \mid D_L, D_U)]$, is solved for via an EM algorithm
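To make the role of the data-dependent prior concrete, here is a minimal sketch of the MAP objective $\ell(w \mid D_L) - \beta w^T \tilde{A} w$. It is not the EM algorithm mentioned on the slide: $\beta$ is held fixed and plain Newton iterations are used instead. The linear features $\Phi(x) = [1, x]$, the one-dimensional synthetic data, and the value of $\beta$ are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def graph_prior_matrix(A, W):
    """A_tilde = A^T (D - W) A, built from labeled and unlabeled rows of A."""
    laplacian = np.diag(W.sum(axis=1)) - W
    return A.T @ laplacian @ A

def map_weights(A_lab, labels, A_tilde, beta=1.0, iters=25):
    """Newton ascent on l(w) - beta * w^T A_tilde w (a stand-in for the slides' EM)."""
    w = np.zeros(A_lab.shape[1])
    for _ in range(iters):
        p = sigmoid(A_lab @ w)
        grad = A_lab.T @ (labels - p) - 2.0 * beta * A_tilde @ w
        neg_hess = A_lab.T @ (A_lab * (p * (1.0 - p))[:, None]) + 2.0 * beta * A_tilde
        w = w + np.linalg.solve(neg_hess, grad)
    return w

# Synthetic 1-D data: two labeled points followed by six unlabeled points in two clusters.
x_lab = np.array([-2.0, 2.0])
labels = np.array([0.0, 1.0])
x_unl = np.array([-2.3, -1.8, -1.6, 1.7, 2.1, 2.4])
x_all = np.concatenate([x_lab, x_unl])

A = np.column_stack([np.ones_like(x_all), x_all])        # assumed features Phi(x) = [1, x]
W = np.exp(-(x_all[:, None] - x_all[None, :]) ** 2)      # RBF graph affinities, unit width
A_tilde = graph_prior_matrix(A, W)
w_map = map_weights(A[:2], labels, A_tilde)
p_unlabeled = sigmoid(A[2:] @ w_map)                     # estimates for the unlabeled points
```

Only the two labeled rows enter the likelihood, while all eight points shape $\tilde{A}$; without the prior term the two labeled points would be perfectly separable and the slope would grow without bound.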
10 Graph-Based Design of Data-Dependent Prior: Intuition
- The Gaussian field prior on $f$ essentially prefers functions which vary smoothly across the graph, as opposed to functions which vary rapidly
- In our case we prefer to have the posterior probability of belonging to class +1 vary smoothly across the neighboring vertices of the graph
11 [Figure panels: decision surface based on labeled data (supervised); decision surface based on labeled & unlabeled data (semi-supervised)]
12 UXO Sensing JPG V
14 Extension of Graph to Multiple Sensors
[Figure: graph for feature vectors from Sensor One; graph for feature vectors from Sensor Two; items for which features are available from both sensors]
- Assume $M$ sensors are available, and $S_n$ represents the subset of sensors deployed for item $n$
- For item $n$ we have features $\{x_n^{(m)},\ m \in S_n\}$
- Build a graph-based prior for feature vectors from each of the individual sensor types
- How do we connect features from multiple sensors when available for a given item $n$?
- To simplify the subsequent discussion, we assume that we have only two sensors
15 Bayesian Co-Training
[Figure: graph for feature vectors from Sensor One; graph for feature vectors from Sensor Two; items for which features are available from both sensors]
- To connect multiple feature vectors (graph nodes) for a given item, we impose a statistical prior favoring that the multiple feature vectors yield decision statistics that agree
- Let $f_1(x_n^{(1)}) = \Phi_1^T(x_n^{(1)})\, w_1$ represent the decision function for the $n$th item, with data from Sensor One; $f_2(x_n^{(2)}) = \Phi_2^T(x_n^{(2)})\, w_2$ is defined similarly for Sensor Two
- Let $D_B$ represent those elements for which data is available from both sensors. We seek parameters $w_1$ and $w_2$ that satisfy $\min_{w_1, w_2} \sum_{n \in D_B} \{\sigma[f_1(x_n^{(1)})] - \sigma[f_2(x_n^{(2)})]\}^2$, which is handled through the quadratic criterion $\min_{w_1, w_2} \sum_{n \in D_B} [f_1(x_n^{(1)}) - f_2(x_n^{(2)})]^2 = \min_{w_1, w_2} w^T C w$, where $w$ stacks $w_1$ and $w_2$
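The quadratic form $w^T C w$ follows from stacking the two weight vectors. The slide does not write out $C$; the sketch below assumes the natural construction $C = G^T G$ with rows $G_n = [\Phi_1^T(x_n^{(1)}),\ -\Phi_2^T(x_n^{(2)})]$ for $n \in D_B$, and checks it against the sum of squared disagreements on synthetic feature matrices.

```python
import numpy as np

def cotraining_matrix(Phi1, Phi2):
    """C = G^T G with G = [Phi1, -Phi2].

    Row n of Phi1 / Phi2 holds Phi_1(x_n^(1)) / Phi_2(x_n^(2)) for an item n in D_B.
    This explicit form of C is an assumption consistent with the quadratic criterion.
    """
    G = np.hstack([Phi1, -Phi2])
    return G.T @ G

rng = np.random.default_rng(0)
Phi1 = rng.normal(size=(5, 3))   # five items seen by both sensors, 3 features from Sensor One
Phi2 = rng.normal(size=(5, 4))   # the same five items, 4 features from Sensor Two
w1 = rng.normal(size=3)
w2 = rng.normal(size=4)

C = cotraining_matrix(Phi1, Phi2)
w = np.concatenate([w1, w2])
disagreement = np.sum((Phi1 @ w1 - Phi2 @ w2) ** 2)  # sum_n [f_1 - f_2]^2
quadratic = w @ C @ w                                 # w^T C w
```

Because $C$ is a Gram matrix it is positive semi-definite, so the corresponding Gaussian prior term is well defined.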
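16 Multi-Sensor Graph-Based Prior
- The condition $\min_{w_1, w_2} w^T C w$ may be expressed in terms of a Gaussian random field prior, the likelihood of which we wish to maximize
- The cumulative graph-based multi-sensor prior on the model weights is $\log p(w_1, w_2 \mid \lambda_B, \lambda_1, \lambda_2) = \log p(w \mid \lambda_B, \lambda_1, \lambda_2) = -[\lambda_B w^T C w + \lambda_1 w_1^T \tilde{A}_1 w_1 + \lambda_2 w_2^T \tilde{A}_2 w_2] + K$
  - $\lambda_B, \lambda_1, \lambda_2$: hyperparameters that control the relative importance of the terms
  - $\lambda_B w^T C w$: co-training prior based on multiple views of the same item
  - $\lambda_1 w_1^T \tilde{A}_1 w_1$: smoothness prior within Sensor One weights
  - $\lambda_2 w_2^T \tilde{A}_2 w_2$: smoothness prior within Sensor Two weights
- A Gamma hyperprior $p(\lambda_B, \lambda_1, \lambda_2)$ is used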
17 Total Likelihood to be Maximized
- $p(w \mid D_L, D_U) \propto \prod_{n=1}^{N_L} \{\sigma[\Phi^T(x_n) w]\}^{l_n} \{1 - \sigma[\Phi^T(x_n) w]\}^{1 - l_n} \times \int p(w_1, w_2 \mid \lambda_B, \lambda_1, \lambda_2)\, p(\lambda_B, \lambda_1, \lambda_2)\, d^3\lambda$
  - First factor: driven by labeled data from Sensor One, Sensor Two or both
  - Second factor: graph-based prior based on labeled and unlabeled data from Sensors One and Two
- We solve for the weights in a maximum-likelihood sense, via an efficient EM algorithm with $\lambda_B, \lambda_1, \lambda_2$ serving as the hidden variables
- Once the weights $w$ are so determined, the probability that example $x$ is associated with label $l_n$ is expressed as $p(l_n \mid x, w_{ML}) = \{\sigma[\Phi^T(x) w_{ML}]\}^{l_n} \{1 - \sigma[\Phi^T(x) w_{ML}]\}^{1 - l_n}$
18 Features of Bayesian Semi-Supervised Co-Training
- Almost all previous fusion algorithms have assumed that all sensors are applied on each item of interest
- Using Bayesian co-training, a subset of sensors may be deployed on any given item
- Placed within a semi-supervised setting, whereby context and changing statistics are accounted for by utilizing the unlabeled data
[Figure: data from Sensor 1 and Sensor 2, split into labeled data and unlabeled data]
19 Semi-Supervised Multi-Sensor Processing Example Results: WAAMD Hyperspectral & SAR Data
[Figure: labeled and unlabeled data from two sensors, hyperspectral and X-band SAR]
- NVESD collected data from Yuma Proving Ground, several different environments
- Simple feature extraction performed on hyperspectral & SAR data
- Labeled examples selected randomly; classification performance presented for remaining unlabeled examples
20 N L =386 N U =469
21 N L =66 N U =477
22 Discussion
- We have demonstrated integration of AHI (hyperspectral) and Veridian (SAR) data, and improved performance with the semi-supervised classifier, when $N_U \gg N_L$
- Question: In this example, is the SAR boosting the hyperspectral performance, vice versa, or both?
- We have two coupled graphs, one each for the SAR and hyperspectral, with these linked via the co-training prior
- We can use the sub-classifier associated with each of these individual graphs to examine performance with or without the other sensor (e.g., SAR alone vis-à-vis the SAR classifier performance when also using information from the hyperspectral sensor)
23 Hyperspectral Alone vis-à-vis Hyperspectral Using SAR Information
[Figure: ROC curves (probability of detection vs. probability of false alarm), AHI #B62R rFlw & Veridian #B3R48rFvv, tested on unlabeled AHI data. Curves: supervised (AHI only); semi-supervised (AHI only); semi-supervised (AHI and SAR)]
24 SAR Alone vis-à-vis SAR Using Hyperspectral Information
[Figure: ROC curves (probability of detection vs. probability of false alarm), AHI #B62R rFlw & Veridian #B3R48rFvv, tested on unlabeled SAR data. Curves: supervised (SAR only); semi-supervised (SAR only); semi-supervised (AHI and SAR)]
26 Active Learning: Adaptive Multi-Modality Sensing
[Figure: data from Sensor 1 and Sensor 2, split into labeled data and unlabeled data]
- Q1: Which of the unlabeled data (from Sensor 1, Sensor 2, or both) would be most informative if the associated label could be determined (via personnel or auxiliary sensor)?
- Q2: For those examples for which only one sensor was deployed, which would be most informative if the other sensor was deployed to fill in missing data?
- A: Theory of optimal experiments
27 Type 1 Active Learning: Labeling Unlabeled Data
- The graph-based prior does not change with the addition of new labeled examples
- We assume that the hyperparameters $\lambda_B$, $\lambda_1$, and $\lambda_2$ do not change with the addition of one new labeled example
- The statistics of the model weights are approximated (Laplace approximation) as $p(w \mid D_L, D_U) \approx N(w \mid \hat{w}, H^{-1})$, where the precision matrix (Hessian) is expressed as $H = \nabla^2 [-\log p(w \mid D_L, D_U)]$
- To within an additive constant, the entropy of a Gaussian process is $-\frac{1}{2} \log |H|$
28 Type 1 Active Learning: Labeling Unlabeled Data
- Expected decrease in entropy on $w$ when the label is acquired for $x_*$: $I(w; l_*) = H(w) - E\{H(w \mid l_*)\} = \frac{1}{2} \log[1 + p_*(1 - p_*)\, x_*^T H^{-1} x_*]$, where $H(\cdot)$ on the left denotes entropy and $H^{-1}$ is the inverse Hessian
  - $p_* = p(l_* = 1 \mid x_*, \hat{w})$
  - $x_*^T H^{-1} x_*$: error bars in our model with regard to sample $x_*$ (logistic regression)
- Where do we acquire labels?
  - Those $x_*$ for which the classifier is least certain, i.e., $p(l_* = 1 \mid x_*, \hat{w}) \approx 0.5$
  - Those $x_*$ for which the logistic-regression model has the largest error bars
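The selection rule on this slide can be sketched in a few lines of Python. The data, the weight vector `w_hat`, and the identity prior precision below are placeholders assumed for illustration; in the slides the prior precision would come from the graph-based prior and `w_hat` from the fitted classifier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_precision(X_lab, w_hat, prior_precision):
    """H = prior precision + sum_n p_n (1 - p_n) x_n x_n^T over the labeled data."""
    p = sigmoid(X_lab @ w_hat)
    return prior_precision + X_lab.T @ (X_lab * (p * (1.0 - p))[:, None])

def expected_information_gain(x_star, w_hat, H):
    """I(w; l*) = 1/2 log[1 + p*(1 - p*) x*^T H^{-1} x*]."""
    p_star = sigmoid(x_star @ w_hat)
    error_bar = x_star @ np.linalg.solve(H, x_star)
    return 0.5 * np.log(1.0 + p_star * (1.0 - p_star) * error_bar)

def select_query(X_unl, w_hat, H):
    """Index of the unlabeled example whose label is expected to be most informative."""
    gains = np.array([expected_information_gain(x, w_hat, H) for x in X_unl])
    return int(np.argmax(gains)), gains

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(8, 3))       # synthetic labeled feature vectors
X_unl = rng.normal(size=(20, 3))      # synthetic unlabeled candidates
w_hat = np.array([1.0, -0.5, 0.25])   # stand-in for a fitted weight vector
H = laplace_precision(X_lab, w_hat, prior_precision=np.eye(3))
best, gains = select_query(X_unl, w_hat, H)
```

The closed form agrees with the direct entropy change $\frac{1}{2}(\log|H + p_*(1-p_*) x_* x_*^T| - \log|H|)$ by the matrix determinant lemma, so no refitting is needed to score a candidate.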
29 Type 2 Active Learning: Deploying Sensors to Fill In Missing Data
- To simplify the discussion, assume we have two sensors, $S_1$ and $S_2$
- Let $x_i^{(1)}$ be the feature vector measured by $S_1$ for the $i$th item (target/non-target), with $x_j^{(2)}$ defined similarly
- Using a Laplace approximation, we have $p(w \mid D_L, D_U) \approx N(w \mid \hat{w}, H_{LU}^{-1})$ with $H_{LU} = \lambda_1 \tilde{A}_1 + \lambda_2 \tilde{A}_2 + \lambda_B C + \sum_{i=1}^{L_1} \sigma(w_1^T x_i^{(1)})\, \sigma(-w_1^T x_i^{(1)})\, x_i^{(1)} x_i^{(1)T} + \sum_{i=1}^{L_2} \sigma(w_2^T x_i^{(2)})\, \sigma(-w_2^T x_i^{(2)})\, x_i^{(2)} x_i^{(2)T}$
  - $\lambda_1 \tilde{A}_1 + \lambda_2 \tilde{A}_2$: graphs from individual sensors
  - $\lambda_B C$: co-training graph
  - The two sums: labeled data from sensors $S_1$ & $S_2$
30 Type 2 Active Learning
- $H_{LU} = \lambda_1 \tilde{A}_1 + \lambda_2 \tilde{A}_2 + \lambda_B C + \sum_{i=1}^{L_1} \sigma(w_1^T x_i^{(1)})\, \sigma(-w_1^T x_i^{(1)})\, x_i^{(1)} x_i^{(1)T} + \sum_{i=1}^{L_2} \sigma(w_2^T x_i^{(2)})\, \sigma(-w_2^T x_i^{(2)})\, x_i^{(2)} x_i^{(2)T}$
- We only deploy sensors to add data to the unlabeled data, and therefore only $\lambda_1 \tilde{A}_1 + \lambda_2 \tilde{A}_2 + \lambda_B C$ changes with the addition of the new data (i.e., we improve the quality of the graph-based prior)
- We desire the expected change in the determinant of $H_{LU}$, but to make this computationally tractable we actually compute $E\{H_{LU}\}$
- Use Gaussian-mixture models, based on all data, to estimate the needed density functions $p(x^{(1)} \mid x^{(2)})$ and $p(x^{(2)} \mid x^{(1)})$
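The fill-in step needs the density of the missing sensor's features given the observed ones. The slides use Gaussian-mixture models; the sketch below is a deliberate simplification to a single joint Gaussian (a one-component mixture), for which the conditional is available in closed form. The mean, covariance and observed value are made-up numbers.

```python
import numpy as np

def conditional_gaussian(mean, cov, d1, x1):
    """Mean and covariance of p(x2 | x1) when [x1, x2] is jointly Gaussian.

    d1 is the length of x1. A single Gaussian is assumed here in place of
    the Gaussian-mixture models used in the slides.
    """
    mu1, mu2 = mean[:d1], mean[d1:]
    S11, S12 = cov[:d1, :d1], cov[:d1, d1:]
    S21, S22 = cov[d1:, :d1], cov[d1:, d1:]
    gain = S21 @ np.linalg.inv(S11)
    cond_mean = mu2 + gain @ (x1 - mu1)
    cond_cov = S22 - gain @ S12
    return cond_mean, cond_cov

# One feature per sensor, correlation 0.8 between the two sensors (illustrative values).
mean = np.zeros(2)
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
cond_mean, cond_cov = conditional_gaussian(mean, cov, 1, np.array([1.0]))
```

Samples drawn from such a conditional could then be used to average the precision matrix over the not-yet-measured features, which is one way to read the expectation $E\{\cdot\}$ on the slide.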
31 Active Selection of Labeled Examples from Unlabeled Data
32 Deployment of Sensor A to Fill In Missing Data from Sensor B
- Consider WAAMD Data, 2 Potential Fill Ins
[Figure: Sensor A and Sensor B coverage, marking the regions where data is potentially filled in]
33 [Figure: ROC curves (Pd vs. Pfa), AHI #B62R rFlw & Veridian #B3R48rFvv, data missing from each sensor, labeled data. Curves: active querying; random querying. Annotation: 65]
34 [Figure: ROC curves (Pd vs. Pfa), AHI #B62R rFlw & Veridian #B3R48rFvv, data missing from each sensor, labeled data. Curves: active querying; random querying. Annotation: 97]
35 [Figure: ROC curves (Pd vs. Pfa), AHI #B62R rFlw & Veridian #B3R48rFvv, data missing from each sensor, labeled data. Curves: active querying; random querying. Annotation: 29]
37 What is Concept Drift?
[Figure: data from Sensor 1 and Sensor 2, split into labeled data and unlabeled data]
- A fundamental assumption in statistical learning algorithms is that all data are characterized by the same underlying statistics. For $M$ sensors and label $l_n$: $p(x^{(1)}, x^{(2)}, \ldots, x^{(M)} \mid l_n)$
- In sensing problems, background and/or sensor conditions may change, and therefore the underlying statistics may differ between the labeled and unlabeled data: Concept Drift
- Can we design algorithms that adapt as the underlying concepts change, such that we can still utilize all available data?
38 Concept Drift
- Assume we have unlabeled data from the environment of interest $E$: $D_U = \{x_n^{(m)},\ m \in S_n\}_{n=1,\ldots,N_U}$, for which we seek to estimate the unknown associated labels $l_n$
- In addition, assume we have labeled data from a related but different environment $\hat{E}$: $\hat{D}_L = \{(\hat{x}_n^{(m)}, \hat{l}_n),\ m \in S_n\}_{n=N_U+1,\ldots,N_U+N_L}$, where $(\hat{x}_n^{(m)}, \hat{l}_n)$ are data and associated label from the previous environment (sensor $m$)
[Figure: scatter plots of Feature 2 vs. Feature 1 for Environment $\hat{E}$ and Environment $E$]
39 Concept Drift
[Figure: scatter plots of Feature 2 vs. Feature 1 for Environment $\hat{E}$ and Environment $E$]
- Problem: We have labeled landmine/FOPEN/underground-structure data from one environment, which we'd like to apply to a new but related environment
- Define the probabilities $p(l_n \mid \hat{l}_n, D_U, \hat{D}_L)$, for which we impose a Dirichlet conjugate prior, which allows us to incorporate prior knowledge for $p(l \mid \hat{l})$
- In subsequent discussion we consider a single sensor ($M = 1$) and binary labels ($l = 0, 1$) to simplify the discussion
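40 Concept Drift - 2
[Figure: scatter plots of Feature 2 vs. Feature 1 for Environment $\hat{E}$ and Environment $E$]
- $\log p(w \mid D_U, \hat{D}_L) = \sum_{n=1}^{N_L} \{\hat{l}_n \log[\sigma(w^T \hat{x}_n)\, \mu_n + \sigma(-w^T \hat{x}_n)(1 - \mu_n)] + (1 - \hat{l}_n) \log[\sigma(w^T \hat{x}_n)\, \nu_n + \sigma(-w^T \hat{x}_n)(1 - \nu_n)]\} - \lambda w^T \tilde{A} w + K$
  - $\mu_n$ and $\nu_n$: values of $p(l_n \mid \hat{l}_n, D_U, \hat{D}_L)$, one for each value of the old-environment label $\hat{l}_n$
  - $\lambda w^T \tilde{A} w$: graph-based smoothness prior on unlabeled data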
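The data term of this objective can be written as a short function. The sketch assumes the reading $\mu_n = p(l_n = 1 \mid \hat{l}_n = 1)$ and $\nu_n = p(l_n = 1 \mid \hat{l}_n = 0)$, which the transcript does not state explicitly, and it omits the graph-prior term and the constant $K$. All data are synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def drift_log_likelihood(w, X_hat, l_hat, mu, nu):
    """Data term of log p(w | D_U, D_hat_L) for binary labels.

    Assumed reading: mu_n = p(l_n = 1 | l_hat_n = 1), nu_n = p(l_n = 1 | l_hat_n = 0).
    The graph-based prior term and the constant K are left out.
    """
    s = sigmoid(X_hat @ w)                        # sigma(w^T x_hat_n); 1 - s = sigma(-w^T x_hat_n)
    term_one = s * mu + (1.0 - s) * (1.0 - mu)    # used where l_hat_n = 1
    term_zero = s * nu + (1.0 - s) * (1.0 - nu)   # used where l_hat_n = 0
    return float(np.sum(l_hat * np.log(term_one) + (1.0 - l_hat) * np.log(term_zero)))

def plain_log_likelihood(w, X, labels):
    """Standard logistic-regression log-likelihood, for comparison."""
    s = sigmoid(X @ w)
    return float(np.sum(labels * np.log(s) + (1.0 - labels) * np.log(1.0 - s)))

rng = np.random.default_rng(2)
X_hat = rng.normal(size=(10, 2))                  # synthetic old-environment features
l_hat = (rng.random(10) > 0.5).astype(float)      # synthetic old-environment labels
w = np.array([0.7, -0.3])

no_drift = drift_log_likelihood(w, X_hat, l_hat, mu=np.ones(10), nu=np.zeros(10))
uninformative = drift_log_likelihood(w, X_hat, l_hat, mu=np.full(10, 0.5), nu=np.full(10, 0.5))
```

Under this reading, $\mu_n = 1$ and $\nu_n = 0$ recover ordinary logistic regression on the old labels, while $\mu_n = \nu_n = 0.5$ makes the old labels carry no information about $w$.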
41 Concept Drift - 3
[Figure: scatter plots of Feature 2 vs. Feature 1 for Environment $\hat{E}$ and Environment $E$]
- Weights determined as before using an EM algorithm, with the parameters $\mu_n$ and $\nu_n$ playing the role of hidden variables (Dirichlet prior employed)
- Can again do active learning: of the unlabeled examples, which would be most informative if their labels could be acquired?
- Solved as before within a Laplace approximation (Hessian, etc.)
42 Illustrative Toy Example
[Figure: original labeled data and data after concept drift; only the initial data are labeled]
43 Illustrative Toy Example - 1
[Figure: first active labeling & classifier refinement]
44 Illustrative Toy Example - 2
[Figure: second active labeling & classifier refinement]
45 Illustrative Toy Example - 3
[Figure: fourth active labeling & classifier refinement]
46 Example on Real Data: UXO Detection
- Tremendous amount of unlabeled data, very limited labeled data
- Classification far easier when placed in context of entire image, vis-à-vis image chips
47 Results on JPGVV and Badlands Data
- EMI and magnetometer sensors
- Labeled data: JPGVV (6 UXO + 88 clutter)
- Unlabeled data: Badlands (57 UXO clutter)
- Kernel: direct kernel (weights applied to feature components)
- UXO features = [log($m_p$), log($m_z$), depth]. Each feature is normalized to zero mean and unit variance
- Five items actively selected for labeling from Badlands data
48 [Figure: ROC curves (probability of UXO detection vs. number of excavations); test data includes actively labeled data. Curves: classifier trained on labeled JPGV data and five actively selected items from Badlands (accounts for drift in statistics); classifier trained on labeled JPGV data only. Legend: logitpushactive (C=e5), 5 primary data; logitactive, 5 primary data; logit]
49 Future Work
- The active learning for labeling and acquisition of new data ("fill in") is thus far myopic. We believe this can be extended to non-myopic active learning.
- We now have multiple actions one may take: (i) perform labeling of unlabeled data or (ii) deploy a given sensor to fill in missing data on a given target. These will now be integrated into a general sensor and HUMINT management structure, accounting for deployment costs.
- We need to extend the multi-sensor fill-in active learning to the concept-drift framework.
- Have performed ML (EM algorithm) estimation of model parameters. Now employing ensemble/variational techniques to extract the full posterior on model parameters (initial work discussed during Workshop; slides available).
- Deploy algorithms on actual hardware (robots), in collaboration with Quantum Magnetics, Inc.