Adaptive Multi-Modal Sensing of General Concealed Targets

Adaptive Multi-Modal Sensing of General Concealed Targets
Lawrence Carin, Balaji Krishnapuram, David Williams, Xuejun Liao and Ya Xue
Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708-0291
lcarin@ee.duke.edu

Outline
- Review of semi-supervised statistical classifiers and graph-based prior
- Extension of semi-supervised classifiers to a multi-sensor setting: Bayesian co-training
- Active multi-sensor sensing
  - Selection of those members of the unlabeled set for which acquisition of labels would be most informative
  - Selection of those members of the unlabeled set for which new sensors should be deployed and new data acquired
- Concept drift: adaptive sensing when the statistics of the unlabeled data change or drift from those of the original labeled data
- Future work

Nomenclature
Labeled data: set of N_L feature vectors x_n for which the associated label is known, denoted l_n ∈ {0,1} for the binary case, thereby yielding the set D_L = {x_n, l_n}, n = 1, …, N_L.
Unlabeled data: the set of N_U feature vectors for which the associated labels are unknown, yielding the set D_U = {x_n}, n = N_L + 1, …, N_L + N_U. This is the data to be classified.
Supervised algorithm: classifier that is designed using D_L and tested on D_U.
Semi-supervised algorithm: classifier that is designed using D_L and D_U; used to estimate the labels of D_U.

Motivation: FOPEN
[Figure: FOPEN imagery with targets and clutter marked.]
- We typically have far more unlabeled data than labeled examples (N_U >> N_L)
- Seek labels for the most-informative feature vectors in D_U
- Typically classification is performed on isolated unlabeled examples, one at a time
- We wish to classify the members of D_U using all information from D_U and D_L simultaneously

Motivation: SAR-Based Mine Detection
- Tremendous amount of unlabeled data, very limited labeled data
- Classification far easier when placed in the context of the entire image, vis-à-vis image chips

Motivation: UXO Detection
- Tremendous amount of unlabeled data, very limited labeled data
- Classification far easier when placed in the context of the entire image, vis-à-vis image chips

Logistic Link Function Design and a Data-Dependent Prior
Assume labels l_n ∈ {0,1}, and define the kernel-based function
  y(x; w) = Σ_{b=1}^{N} w_b K(x, x_b) + w_0 = Φ^T(x) w
The probability that x is associated with l_n = 1 is expressed as
  p(l_n = 1 | x, w) = σ[y(x; w)] = exp[Φ^T(x) w] / {1 + exp[Φ^T(x) w]}
For the N_L labeled examples, we wish to maximize the log-likelihood
  l(w) = log p(D_L | w) = Σ_{n=1}^{N_L} log p(l_n | x_n, w)
subject to a prior p(w | D_L, D_U) on the weights, with the prior dependent on all data (labeled and unlabeled), via a graph-based construct. The classifier weights w_MAP are therefore set at
  w_MAP = arg max_w [ l(w; D_L) + log p(w | D_L, D_U) ]
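As a concrete sketch of the kernel-based link just described, the following minimal Python builds Φ(x) from an RBF kernel plus a bias term and evaluates p(l = 1 | x, w) = σ[Φ^T(x) w]. The function names and the single kernel width are illustrative assumptions, not the authors' code:

```python
import numpy as np

def rbf_kernel(X, B, sigma=1.0):
    """Kernel matrix K[i, j] = exp(-||X[i] - B[j]||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def basis(X, B, sigma=1.0):
    """Phi(x) = [1, K(x, b_1), ..., K(x, b_N)]: bias plus kernel expansion."""
    K = rbf_kernel(X, B, sigma)
    return np.hstack([np.ones((X.shape[0], 1)), K])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_prob(X, B, w, sigma=1.0):
    """p(l = 1 | x, w) = sigma(Phi^T(x) w) for each row of X."""
    return sigmoid(basis(X, B, sigma) @ w)
```

With w = 0 the classifier is maximally uncertain, returning probability 0.5 everywhere; driving the bias weight large pushes the prediction toward 1.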

Graph-Based Design of Data-Dependent Prior
Use a kernel k(x_i, x_j) to define the similarity of x_i and x_j. Note: this kernel is used to define the graph, and need not be the same as that employed in the classifier.
W_ij = k(x_i, x_j) is large when the two vectors are similar, e.g., the radial basis function
  W_ij = k(x_i, x_j) = exp[-||x_i - x_j||^2 / σ^2]
Let the vectors x_i constitute the nodes of the graph, and let f(x_i) = Φ^T(x_i) w be a function on the graph. We seek to minimize the energy function
  En(f) = (1/2) Σ_{i,j} W_ij [f(x_i) - f(x_j)]^2
Large W_ij ⇒ f(x_i) ≈ f(x_j).
Defining f = [f(x_1), f(x_2), …, f(x_{N_L+N_U})]^T, it is easy to show that En(f) = f^T Δ f, where Δ is the combinatorial Laplacian Δ = D - W; the matrix W is defined by W_ij, and D is a diagonal matrix, the ith element of which is d_i = Σ_j W_ij.
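The Laplacian and the energy function above can be sketched directly; this is a hedged illustration (function names and the RBF width are assumptions):

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Combinatorial Laplacian Delta = D - W for an RBF similarity graph:
    W_ij = exp(-||x_i - x_j||^2 / sigma^2), d_i = sum_j W_ij."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)            # no self-edges
    D = np.diag(W.sum(axis=1))
    return D - W

def energy(f, Lap):
    """En(f) = f^T Delta f = (1/2) sum_ij W_ij (f_i - f_j)^2."""
    return float(f @ Lap @ f)
```

A function that is constant on the graph has zero energy, while a function that disagrees across a strong edge pays a penalty proportional to W_ij.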

Graph-Based Design of Data-Dependent Prior - 2
Using En(f) = f^T Δ f, finding the f that minimizes En(f) corresponds to a MAP estimate of f from the Gaussian random field density function
  p(f) = Z_β^{-1} exp[-β En(f)] = Z_β^{-1} exp[-β f^T Δ f]
This gives us a prior on f, which is defined through f(x_i) = Φ^T(x_i) w, and therefore f = A w with
  A = [Φ(x_1), Φ(x_2), …, Φ(x_{N_U+N_L})]^T
We therefore have a prior on our model weights w, also represented by a Gaussian random field:
  p(w | x_1, x_2, …, x_{N_U+N_L}) = N(0, (β Ã)^{-1}), with Ã = A^T Δ A
We now have a prior on the weights w applied to the labeled data D_L, accounting for the inter-relationships between all data D_L and D_U. The MAP solution for w,
  w_MAP = arg max_w [ l(w; D_L) + log p(w | D_L, D_U) ]
is solved for via an EM algorithm.
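In code, the induced weight prior follows by pushing the field prior through f = A w; a minimal sketch, assuming the Laplacian Δ and the basis matrix A are already in hand (function names are mine):

```python
import numpy as np

def weight_prior_precision(A, Lap, beta=1.0):
    """Precision of the induced Gaussian prior on the weights:
    p(w | all data) = N(0, (beta * A_tilde)^{-1}), A_tilde = A^T Delta A."""
    return beta * (A.T @ Lap @ A)

def log_weight_prior(w, A, Lap, beta=1.0):
    """Unnormalized log p(w) = -beta * w^T A_tilde w, evaluated via f = A w."""
    f = A @ w
    return -beta * float(f @ Lap @ f)
```

Weight vectors inducing smooth functions on the graph (e.g., a constant f) incur no penalty, while rapidly varying ones are exponentially suppressed.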

Graph-Based Design of Data-Dependent Prior: Intuition
The Gaussian field prior on f essentially prefers functions which vary smoothly across the graph, as opposed to functions which vary rapidly. In our case we prefer to have the posterior probability of belonging to class +1 vary smoothly across the neighboring vertices of the graph.

[Figure: decision surface based on labeled data only (supervised) vs. decision surface based on labeled & unlabeled data (semi-supervised).]

UXO Sensing JPG V


Extension of Graph to Multiple Sensors
[Figure: graph for feature vectors from Sensor One; graph for feature vectors from Sensor Two; items for which features are available from both sensors.]
- Assume M sensors are available, and S_n represents the subset of sensors deployed for item n
- For item n we have features {x_n^(m), m ∈ S_n}
- Build a graph-based prior for the feature vectors from each of the individual sensor types
- How do we connect features from multiple sensors when available for a given item n?
- To simplify the subsequent discussion, we assume that we have only two sensors

Bayesian Co-Training
[Figure: graphs for Sensor One and Sensor Two, with items for which features are available from both sensors.]
To connect multiple feature vectors (graph nodes) for a given item, we impose a statistical prior favoring that the multiple feature vectors yield decision statistics that agree.
Let f_1(x_n^(1)) = Φ_1^T(x_n^(1)) w_1 represent the decision function for the nth item, with data from Sensor One; f_2(x_n^(2)) = Φ_2^T(x_n^(2)) w_2 is defined similarly for Sensor Two.
Let D_B represent those elements for which data are available from both sensors; we seek parameters w_1 and w_2 that satisfy
  min_{w_1,w_2} Σ_{n∈D_B} {σ[f_1(x_n^(1))] - σ[f_2(x_n^(2))]}^2 ≈ min_{w_1,w_2} Σ_{n∈D_B} [f_1(x_n^(1)) - f_2(x_n^(2))]^2 = min_{w_1,w_2} w^T C w
with w = [w_1; w_2].
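The agreement penalty on the co-observed items is a quadratic form in the stacked weights w = [w_1; w_2]; a sketch (matrix and function names are assumptions) of building C and checking the identity:

```python
import numpy as np

def cotraining_coupling(Phi1, Phi2):
    """C such that w^T C w = sum_{n in D_B} [f1(x_n) - f2(x_n)]^2,
    where f1 = Phi1 @ w1, f2 = Phi2 @ w2, and w = [w1; w2].
    Rows of Phi1/Phi2 are the basis expansions of the co-observed items."""
    G = np.hstack([Phi1, -Phi2])   # [f1 - f2] = G @ w
    return G.T @ G

def disagreement(w1, w2, Phi1, Phi2):
    """Direct evaluation of the agreement penalty, for verification."""
    return float(((Phi1 @ w1 - Phi2 @ w2) ** 2).sum())
```

Because C = G^T G with G = [Phi1, -Phi2], the penalty is automatically positive semi-definite, as required of a Gaussian-prior precision term.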

Multi-Sensor Graph-Based Prior
The condition min_{w_1,w_2} w^T C w may be expressed in terms of a Gaussian random field prior, the likelihood of which we wish to maximize. The cumulative graph-based multi-sensor prior on the model weights is
  -log p(w_1, w_2 | λ_B, λ_1, λ_2) = λ_B w^T C w + λ_1 w_1^T Ã_1 w_1 + λ_2 w_2^T Ã_2 w_2 + K
where the hyper-parameters (λ_B, λ_1, λ_2) control the relative importance of the terms: the co-training prior based on multiple views of the same item, the smoothness prior within the Sensor One weights, and the smoothness prior within the Sensor Two weights.
A Gamma hyper-prior is used for p(λ_B, λ_1, λ_2).

Total Likelihood to be Maximized
  p(w | D_L, D_U) ∝ Π_{n=1}^{N_L} {σ[Φ^T(x_n) w]}^{l_n} {1 - σ[Φ^T(x_n) w]}^{1-l_n} × ∫ p(w_1, w_2 | λ_B, λ_1, λ_2) p(λ_B, λ_1, λ_2) d^3λ
The first factor is driven by the labeled data from Sensor One, Sensor Two, or both; the second is the graph-based prior based on the labeled and unlabeled data from Sensors One and Two.
We solve for the weights in a maximum-likelihood sense, via an efficient EM algorithm with λ_B, λ_1, λ_2 serving as the hidden variables.
Once the weights w_ML are so determined, the probability that example x is associated with label l_n is expressed as
  p(l_n | x, w_ML) = {σ[Φ^T(x) w_ML]}^{l_n} {1 - σ[Φ^T(x) w_ML]}^{1-l_n}
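The labeled-data factor of the total likelihood is a standard Bernoulli/logistic term; a minimal sketch of evaluating it and the resulting predictive probability (hypothetical names, not the authors' implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_labeled_likelihood(Phi_L, labels, w):
    """log prod_n sigma(phi_n^T w)^{l_n} (1 - sigma(phi_n^T w))^{1 - l_n},
    with one row of Phi_L per labeled example."""
    p1 = sigmoid(Phi_L @ w)
    return float(np.sum(labels * np.log(p1) + (1 - labels) * np.log(1 - p1)))

def predictive(phi_x, w_ml):
    """p(l = 1 | x, w_ML) once the EM-estimated weights are in hand."""
    return sigmoid(phi_x @ w_ml)
```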

Features of Bayesian Semi-Supervised Co-Training
- Almost all previous fusion algorithms have assumed that all sensors are applied to each item of interest
- Using Bayesian co-training, a subset of sensors may be deployed on any given item
- Placed within a semi-supervised setting, whereby context and changing statistics are accounted for by utilizing the unlabeled data
[Figure: Sensor 1 and Sensor 2 graphs, labeled and unlabeled data.]

Semi-Supervised Multi-Sensor Processing Example Results: WAAMD Hyperspectral & SAR Data
[Figure: Sensor 1 (hyperspectral) and Sensor 2 (X-band SAR) graphs, labeled and unlabeled data.]
- NVESD collected data from Yuma Proving Ground, several different environments
- Simple feature extraction performed on the hyperspectral & SAR data
- Labeled examples selected randomly; classification performance presented for the remaining unlabeled examples

[Figure: classification results; N_L = 386, N_U = 469.]

[Figure: classification results; N_L = 66, N_U = 477.]

Discussion
We have demonstrated integration of AHI (hyperspectral) and Veridian (SAR) data, and improved performance with the semi-supervised classifier when N_U >> N_L.
Question: in this example, is the SAR boosting the hyperspectral performance, vice-versa, or both?
We have two coupled graphs, one each for the SAR and hyperspectral data, linked via the co-training prior. We can use the sub-classifier associated with each of these individual graphs to examine performance with or without the other sensor (e.g., SAR alone vis-à-vis the SAR classifier performance when also using information from the hyperspectral sensor).

Hyper-Spectral Alone vis-à-vis Hyper-Spectral Using SAR Information
[Figure: ROC curves (probability of detection vs. probability of false alarm) for AHI #B62R2344952946rFlw & Veridian #B3R48rFvv; test on unlabeled AHI data; curves for supervised (AHI only), semi-supervised (AHI only), and semi-supervised (AHI and SAR).]

SAR Alone vis-à-vis SAR Using Hyper-Spectral Information
[Figure: ROC curves (probability of detection vs. probability of false alarm) for AHI #B62R2344952946rFlw & Veridian #B3R48rFvv; test on unlabeled SAR data; curves for supervised (SAR only), semi-supervised (SAR only), and semi-supervised (AHI and SAR).]


Active Learning: Adaptive Multi-Modality Sensing
[Figure: Sensor 1 and Sensor 2 graphs, labeled and unlabeled data.]
Q1: Which of the unlabeled data (from Sensor 1, Sensor 2, or both) would be most informative if the associated label could be determined (via personnel or an auxiliary sensor)?
Q2: For those examples for which only one sensor was deployed, which would be most informative if the other sensor was deployed to fill in the missing data?
A: Theory of optimal experiments

Type 1 Active Learning: Labeling Unlabeled Data
The graph-based prior does not change with the addition of new labeled examples. We assume that the hyper-parameters λ_B, λ_1, and λ_2 do not change with the addition of one new labeled example.
The statistics of the model weights are approximated (Laplace approximation) as
  p(w | D_L, D_U) ≈ N(w | ŵ, H^{-1})
where the precision matrix (Hessian) is expressed as
  H = -∇²_w log p(w | D_L, D_U)
To within an additive constant, the entropy of a Gaussian process is -(1/2) log |H|.

Type 1 Active Learning: Labeling Unlabeled Data - 2
Expected decrease in entropy of w when the label is acquired for x*:
  I(w; l*) = H(w) - E{H(w | l*)} = (1/2) log[1 + p*(1 - p*) x*^T H^{-1} x*]
where p* = p(l* = 1 | x*, ŵ) is the logistic-regression output, and x*^T H^{-1} x* represents the error bars of our model with regard to sample x*.
Where do we acquire labels?
- Those x* for which the classifier is least certain, i.e., p(l* = 1 | x*, ŵ) ≈ 0.5
- Those x* for which the logistic-regression model has the largest error bars
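The selection rule above can be sketched in a few lines: `info_gain` evaluates the entropy-decrease formula, and `select_query` picks the unlabeled example that maximizes it (the function names are illustrative, not the authors' code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def info_gain(x_star, w_hat, H):
    """Expected entropy decrease for labeling x_star:
    I = 0.5 * log(1 + p*(1 - p*) * x^T H^{-1} x)."""
    p = sigmoid(x_star @ w_hat)
    v = x_star @ np.linalg.solve(H, x_star)   # error bars: x^T H^{-1} x
    return 0.5 * np.log1p(p * (1.0 - p) * v)

def select_query(X_u, w_hat, H):
    """Index of the unlabeled example with the largest expected gain."""
    gains = [info_gain(x, w_hat, H) for x in X_u]
    return int(np.argmax(gains))
```

Both rules on the slide fall out of the formula: p near 0.5 maximizes p(1 - p), and large error bars maximize x^T H^{-1} x.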

Type 2 Active Learning: Deploying Sensors to Fill In Missing Data
To simplify the discussion, assume we have two sensors, S_1 and S_2. Let x_i^(1) be the feature vector measured by S_1 for the ith item (target/non-target), with x_j^(2) defined similarly.
Using a Laplace approximation, we have
  p(w | D_L, D_U) ≈ N(w | ŵ, H^{-1})
where the Hessian H is a sum of three contributions: the graphs from the individual sensors (λ_1 Ã_1 + λ_2 Ã_2), the co-training graph (λ_B Ã_B), and terms from the labeled data from sensors S_1 and S_2.

Type 2 Active Learning - 2
  H = λ_1 Ã_1 + λ_2 Ã_2 + λ_B Ã_B + Σ_{i=1}^{L_1} σ(w_1^T x_i^(1)) σ(-w_1^T x_i^(1)) x_i^(1) x_i^(1)T + Σ_{i=1}^{L_2} σ(w_2^T x_i^(2)) σ(-w_2^T x_i^(2)) x_i^(2) x_i^(2)T
We only deploy sensors to add data to the unlabeled set, and therefore only λ_1 Ã_1 + λ_2 Ã_2 + λ_B Ã_B changes with the addition of the new data (i.e., we improve the quality of the graph-based prior).
We desire the expected change in the determinant of H, but to make this computationally tractable we actually compute E{H}.
Gaussian-mixture models, based on all data, are used to estimate the needed density functions p(x^(1) | x^(2)) and p(x^(2) | x^(1)).
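Scoring how a candidate measurement changes |H| need not require a fresh determinant per candidate: for a rank-one change H → H + x x^T, the matrix determinant lemma gives log|H + x x^T| - log|H| = log(1 + x^T H^{-1} x). This identity is offered as a plausible computational building block for the determinant-based criterion on the slide, not as the authors' exact procedure:

```python
import numpy as np

def logdet_gain(H, x):
    """Change in log-determinant from a rank-one update of the precision:
    log|H + x x^T| - log|H| = log(1 + x^T H^{-1} x)
    (matrix determinant lemma)."""
    return float(np.log1p(x @ np.linalg.solve(H, x)))
```

One linear solve per candidate replaces two full determinant evaluations, which matters when ranking many potential fill-ins.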

Active Selection of Labeled Examples from Unlabeled Data

Deployment of Sensor A to Fill In Missing Data from Sensor B
Consider WAAMD data, 2 potential fill-ins.
[Figure: Sensor A and Sensor B coverage, with the regions where data may potentially be filled in.]

AHI #B62R23439272535rFlw & Veridian #B3R48rFvv
[Figure: ROC (Pd vs. Pfa); 97 data missing from each sensor, labeled data; active querying vs. random querying, 65 queries.]

[Figure: ROC (Pd vs. Pfa); 97 data missing from each sensor, labeled data; active querying vs. random querying, 97 queries.]

[Figure: ROC (Pd vs. Pfa); 97 data missing from each sensor, labeled data; active querying vs. random querying, 29 queries.]


What is Concept Drift?
A fundamental assumption in statistical learning algorithms is that all data are characterized by the same underlying statistics; for M sensors and label l_n: p(x^(1), x^(2), …, x^(M) | l_n).
In sensing problems, background and/or sensor conditions may change, and therefore changes may manifest in the underlying statistics of the labeled and unlabeled data: concept drift.
Can we design algorithms that adapt as the underlying concepts change, such that we can still utilize all available data?

Concept Drift
Assume we have unlabeled data from the environment of interest, E:
  D_U = {x_n^(m), m ∈ S_n}, n = 1, …, N_U
for which we seek to estimate the unknown associated labels l_n.
In addition, assume we have labeled data from a related but different environment, Ê:
  D̂_L = {(x̂_n^(m), l̂_n), m ∈ S_n}, n = N_U + 1, …, N_U + N_L
where (x̂_n^(m), l̂_n) are data and associated label from the previous environment (sensor m).
[Figure: feature-space scatter plots (Feature 1 vs. Feature 2) for environment Ê and environment E.]

Concept Drift
[Figure: feature-space scatter plots (Feature 1 vs. Feature 2) for environment Ê and environment E.]
Problem: we have labeled landmine/FOPEN/underground-structure data from one environment, which we'd like to apply to a new but related environment.
Define the probabilities p(l_n | l̂_n, D_U, D̂_L), for which we impose a Dirichlet conjugate prior, which allows us to incorporate prior knowledge for p(l | l̂).
In the subsequent discussion we consider a single sensor (M = 1) and binary labels (l ∈ {0,1}) to simplify the discussion.

Concept Drift 2
[Figure: feature-space scatter plots (Feature 1 vs. Feature 2) for environment Ê and environment E.]
With the drift probabilities µ_n = p(l̂_n = 1 | l_n = 1, D_U, D̂_L) and ν_n = p(l̂_n = 1 | l_n = 0, D_U, D̂_L), the objective for the weights is
  log p(w | D_U, D̂_L) = Σ_{n=1}^{N_L} { l̂_n log[σ(w^T x̂_n) µ_n + σ(-w^T x̂_n)(1 - µ_n)] + (1 - l̂_n) log[σ(w^T x̂_n) ν_n + σ(-w^T x̂_n)(1 - ν_n)] } - λ w^T Ã w + K
where -λ w^T Ã w is the graph-based smoothness prior on the unlabeled data.
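The labeled-data term of the drift objective mixes the classifier output σ(w^T x̂) with the drift probabilities µ_n, ν_n; a hedged sketch, in which the array names and vectorized form are my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def drift_log_likelihood(X_hat, l_hat, w, mu, nu):
    """sum_n  l_hat_n  * log[s_n mu_n + (1 - s_n)(1 - mu_n)]
         + (1-l_hat_n) * log[s_n nu_n + (1 - s_n)(1 - nu_n)],
    with s_n = sigma(w^T x_hat_n); note sigma(-z) = 1 - sigma(z)."""
    s = sigmoid(X_hat @ w)
    term1 = np.log(s * mu + (1.0 - s) * (1.0 - mu))
    term0 = np.log(s * nu + (1.0 - s) * (1.0 - nu))
    return float(np.sum(l_hat * term1 + (1 - l_hat) * term0))
```

As a sanity check, with µ = 1 and ν = 0 (no drift at all) the expression collapses to the ordinary logistic log-likelihood.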

Concept Drift 3
[Figure: feature-space scatter plots (Feature 1 vs. Feature 2) for environment Ê and environment E.]
The weights are determined as before using an EM algorithm, with the parameters µ_n and ν_n playing the role of hidden variables (Dirichlet prior employed).
We can again do active learning: of the unlabeled examples, which would be most informative if their label could be acquired? Solved as before within a Laplace approximation (Hessian, etc.).

Illustrative Toy Example
[Figure: logit-select active, initial iteration. Original labeled data; concept drift moves the subsequent data. Only the initial data are labeled.]

Illustrative Toy Example - 1
[Figure: logit-select active, iteration 1. First active labeling & classifier refinement.]

Illustrative Toy Example - 2
[Figure: logit-select active, iteration 2. Second active labeling & classifier refinement.]

Illustrative Toy Example - 3
[Figure: logit-select active, iteration 4. Fourth active labeling & classifier refinement.]

Example on Real Data: UXO Detection
- Tremendous amount of unlabeled data, very limited labeled data
- Classification far easier when placed in the context of the entire image, vis-à-vis image chips

Results on JPG-V and Badlands Data
- EMI and magnetometer sensors
- Labeled data: JPG-V (6 UXO + 88 clutter)
- Unlabeled data: Badlands (57 UXO + 435 clutter)
- Kernel: direct kernel (weights applied to feature components)
- UXO features = [log(M_p), log(M_z), depth]; each feature is normalized to zero mean and unit variance
- Five items actively selected for labeling from the Badlands data

[Figure: ROC (probability of UXO detection vs. number of excavations), accounting for drift in statistics; test data includes the actively labeled data. Training on labeled JPG-V plus five actively selected items from Badlands is compared against training on labeled JPG-V data only; curves: logit-push-active (C=1e-5, 5 primary data), logit-active (5 primary data), logit.]

Future Work
- The active learning for labeling and acquisition of new data ("fill in") is thus far myopic. We believe this can be extended to non-myopic active learning.
- We now have multiple actions one may take: (i) perform labeling of unlabeled data, or (ii) deploy a given sensor to fill in missing data on a given target. These will now be integrated into a general sensor and HUMINT management structure, accounting for deployment costs.
- We need to extend the multi-sensor fill-in active learning to the concept-drift framework.
- We have performed ML (EM-algorithm) estimation of the model parameters. We are now employing ensemble/variational techniques to extract the full posterior on the model parameters (initial work discussed during the Workshop; slides available).
- Deploy the algorithms on actual hardware (robots), in collaboration with Quantum Magnetics, Inc.