Bayesian Multiple Target Localization

Bayesian Multiple Target Localization. Purnima Rajan (Johns Hopkins), Weidong Han (Princeton), Raphael Sznitman (Bern), Peter Frazier (Cornell, Uber), Bruno Jedynak (Johns Hopkins, Portland State). Thursday, July 9, 2015, International Conference on Machine Learning (ICML), Lille, France.

Here's the program: 1. Study a Bayesian active learning problem. 2. Use our results to find faces in images like these:

This is our Bayesian active learning problem: localize targets by counting. There are k targets in some set Ω = [0, 1]. Their (unknown) locations are θ_1, ..., θ_k ∈ Ω, drawn i.i.d. from a prior density. For n = 1, ..., N: we choose A_n ⊆ Ω and observe X_n, whose distribution depends on the number of targets in A_n. Goal: choose the questions to reduce uncertainty in the target locations, as measured by the expected entropy of the posterior.
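As a concrete illustration, here is a minimal simulation sketch of this setup. It assumes a uniform prior on [0, 1] and a hypothetical additive-Gaussian noise model for the answers; the framework itself allows any likelihood for X_n given the count in A_n.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 3                                  # number of targets
theta = rng.uniform(0, 1, size=k)      # unknown locations, drawn i.i.d. from the (uniform) prior

def ask(A, sigma=0.5):
    """Answer one question: how many targets lie in A (a union of intervals)?
    Here the answer is the true count plus Gaussian noise; this is just one
    possible choice of the observation model."""
    z = sum(any(lo <= t < hi for lo, hi in A) for t in theta)
    return z + sigma * rng.normal()

x1 = ask([(0.0, 0.5)])                 # first question: the left half of [0, 1]
print(x1)
```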

The dyadic policy. The dyadic policy constructs A_n recursively: we partition Ω into 2^n regions, and A_n is the union of half of these regions. To obtain the partition for A_{n+1}, we take the partition for A_n and split each region into two sub-regions with equal mass under the prior (but not necessarily equal size, if the prior is not uniform). [Figure: a prior density f_0(u) on [0, 1] and the questions A_1 (n = 1), A_2 (n = 2), A_3 (n = 3) built by successive equal-mass splits.]
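The recursive construction can be written directly in terms of the prior's quantile function. The sketch below assumes the prior CDF is invertible and, for concreteness, takes A_n to be the union of the odd-indexed cells of the level-n equal-mass partition (the slide only requires "half of the regions").

```python
import numpy as np

def dyadic_question(n, prior_ppf=lambda q: q):
    """Return the dyadic question A_n as a list of intervals.

    Level n partitions Omega into 2**n cells of equal prior mass using the
    prior's quantile function (the identity quantile corresponds to a uniform
    prior on [0, 1]); A_n is the union of every second cell."""
    edges = prior_ppf(np.linspace(0.0, 1.0, 2 ** n + 1))
    cells = list(zip(edges[:-1], edges[1:]))
    return cells[1::2]                 # one valid choice of "half of the regions"

# Under a uniform prior: A_1 = [(0.5, 1.0)], A_2 = [(0.25, 0.5), (0.75, 1.0)], ...
print(dyadic_question(2))
```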

The dyadic policy can be easily parallelized. The dyadic policy's questions A_n do not depend on the answers X_n: it is non-adaptive. This makes it easy to parallelize. We compare it with OPT, the value of an optimal adaptive policy.

The dyadic policy is a constant factor approximation to the optimal adaptive policy. Theorem: DYADIC / OPT ≥ D_k / C_k. DYADIC = H(p_0) − E_DYADIC[H(p_N)] is the expected entropy reduction under the dyadic policy. OPT = H(p_0) − inf_π E_π[H(p_N)] is the expected entropy reduction under an optimal adaptive policy. C_k is the channel capacity,

C_k = sup_q { H( Σ_{z=0}^k q(z) f(·|z) ) − Σ_{z=0}^k q(z) H( f(·|z) ) },

and D_k is this explicit expression,

D_k = H( (1/2^k) Σ_{z=0}^k (k choose z) f(·|z) ) − (1/2^k) Σ_{z=0}^k (k choose z) H( f(·|z) ),

where f(·|z) denotes the distribution of the answer when the queried region contains z targets, and q ranges over distributions on {0, ..., k}.
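To make C_k and D_k concrete, here is a small numerical sketch (not from the paper): it evaluates D_k exactly and approximates the capacity C_k with the Blahut-Arimoto algorithm, for a made-up three-outcome noise channel f(·|z).

```python
import numpy as np
from scipy.stats import binom

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def D_k(channel, k):
    """D_k = I(Z; X) with Z ~ Bin(k, 1/2) and X | Z = z distributed as channel[z]."""
    q = binom.pmf(np.arange(k + 1), k, 0.5)
    row_H = np.array([entropy_bits(row) for row in channel])
    return entropy_bits(q @ channel) - q @ row_H

def C_k(channel, iters=1000):
    """Channel capacity sup_q I(Z; X), approximated by Blahut-Arimoto iterations."""
    m = channel.shape[0]
    q = np.full(m, 1.0 / m)
    for _ in range(iters):
        post = q[:, None] * channel                     # unnormalized P(Z = z | X = x)
        post /= post.sum(axis=0, keepdims=True)
        w = np.exp(np.sum(channel * np.log(post + 1e-300), axis=1))
        q = w / w.sum()                                 # updated input distribution
    row_H = np.array([entropy_bits(row) for row in channel])
    return entropy_bits(q @ channel) - q @ row_H

# Hypothetical channel for k = 2 targets: the observed count equals the true
# count with high probability and is off by one otherwise (rows are f(.|z)).
channel = np.array([[0.9, 0.1, 0.0],
                    [0.1, 0.8, 0.1],
                    [0.0, 0.1, 0.9]])
print(D_k(channel, 2), C_k(channel), D_k(channel, 2) / C_k(channel))
```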

The dyadic policy is a 2-approximation to OPT in the noise-free case. When observations are noise-free, D_k = H(Bin(k, 1/2)) and C_k = log(k + 1), implying

DYADIC / OPT ≥ H(Bin(k, 1/2)) / log(k + 1) ≥ 1/2,

since the entropy of a Bin(k, 1/2) count is at least half of log(k + 1). If k = 1 (there is a single target), then DYADIC / OPT = 1 and the dyadic policy is optimal among adaptive policies. Additional result, not in the paper: in the noise-free case, for k ≥ 1, the dyadic policy is optimal among non-adaptive policies.
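A quick numerical check of this bound (a sketch, not from the paper): the ratio H(Bin(k, 1/2)) / log2(k + 1) equals 1 at k = 1 and stays above 1/2 as k grows.

```python
import numpy as np
from scipy.stats import binom

for k in [1, 2, 4, 8, 16, 32, 64, 128]:
    pmf = binom.pmf(np.arange(k + 1), k, 0.5)
    ratio = -(pmf * np.log2(pmf)).sum() / np.log2(k + 1)   # D_k / C_k in the noise-free case
    print(k, round(ratio, 3))
```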

Localizing k objects simultaneously is much faster than localizing them one at a time. [Figure: number of questions per object versus k (from 1 to 128), for Benchmark 1, Benchmark 2, the dyadic policy, and a lower bound on OPT.] The plot shows (number of questions needed to reduce the total entropy by k × 20 bits) / k. Benchmark 1 localizes targets one at a time using bisection. Benchmark 2 is sequential bifurcation (Bettonvil and Kleijnen 1997). The green and blue lines come from our bounds.

Proof Sketch. Theorem: DYADIC / OPT ≥ D_k / C_k. Expected entropy reduction of the optimal policy: OPT ≤ C_k · N. Expected entropy reduction of the dyadic policy: DYADIC = D_k · N. The answers to the dyadic questions are independent, so the entropy reduction under the dyadic policy is the sum of the entropy reductions due to the individual questions; and the number of targets in A_n is Binomial(k, 1/2), which lets us compute the entropy reduction from each question.
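A small Monte Carlo check of the key fact used here (a sketch, assuming a uniform prior and the odd-indexed-cell choice of A_n from above): the counts Z_n of targets falling in the dyadic questions are independent Binomial(k, 1/2) random variables.

```python
import numpy as np

rng = np.random.default_rng(0)
k, N, reps = 3, 4, 200_000
theta = rng.uniform(size=(reps, k))                       # targets drawn from the uniform prior

# A target at location u lies in A_n iff the n-th binary digit of u is 1
# (this matches taking the odd-indexed equal-mass cells as A_n).
in_A = ((theta[..., None] * 2 ** np.arange(1, N + 1)).astype(int) % 2).astype(bool)
Z = in_A.sum(axis=1)                                      # shape (reps, N): count per question

print(np.bincount(Z[:, 0], minlength=k + 1) / reps)       # close to the Bin(3, 1/2) pmf
print(np.corrcoef(Z.T).round(2))                          # close to the identity: answers decouple
```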

The dyadic policy can be used to localize multiple object instances in computer vision. Let's build a tool that can find faces in images like this:

The standard approach applies a high-accuracy classifier at each pixel in the image. We have access to a high-accuracy classifier: its input is the image and a pixel within the image; its output is whether or not there is a face centered at that pixel. We use a boosted collection of 500 stumps (the features are windowed sums of oriented gradients). Standard practice is to run the high-accuracy classifier at each pixel within the image to find all of the faces. But this is slow.

We use the dyadic policy to localize multiple object instances. Instead, we do this: (Screening) Use the dyadic policy, together with a fast low-accuracy classifier, to compute a posterior distribution on object instance locations. (Refinement) Using the posterior to prioritize pixels, run a slow high-accuracy classifier on a (hopefully small) number of pixels to confirm the object instance locations. Fewer calls to the high-accuracy classifier vs. the standard approach makes it faster.

Screening Step 1: Run a fast low-accuracy classifier at each pixel in the image. Our low-accuracy classifier is a boosted collection of 50 stumps, i.e., 10% of the size of the high-accuracy classifier.

Screening Step 2: Compute the answers to the dyadic questions. X_n is the sum of the low-accuracy responses over the region A_n. [Figure: the regions A_n, n = 1, ..., 8, over the image.] X_n can be computed quickly using an integral image.

Screening Step 3: Compute the posterior. Under other policies, computing the posterior is slow (it requires MCMC). Under the dyadic policy, computing the posterior is fast. Let C = ∩_{n=1}^N B_n, where each B_n is either A_n or Ω \ A_n. The expected number of targets in C under the posterior is

E[number of targets in C | X_{1:N}] = k ∏_{n=1}^N (e_n / k)^{s_n} (1 − e_n / k)^{1 − s_n},

where s_n = 1{B_n = A_n} and

e_n = E[number of targets in A_n | X_n] = Σ_{j=0}^k j · P(X_n = x_n | Z_n = j) P(Z_n = j) / P(X_n = x_n).
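A minimal sketch of this computation, assuming the user supplies the answer likelihood P(X_n = x | Z_n = j) of the low-accuracy responses; the function and argument names are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import binom

def expected_counts(x, likelihood, k):
    """Posterior expected number of targets in every cell of the depth-N dyadic partition.

    x          : the observed answers x_1, ..., x_N to the dyadic questions
    likelihood : likelihood(x_n, j) = P(X_n = x_n | Z_n = j), the assumed noise model
    k          : the number of targets
    """
    N = len(x)
    prior_z = binom.pmf(np.arange(k + 1), k, 0.5)       # Z_n ~ Bin(k, 1/2) under the dyadic policy
    e = np.empty(N)
    for n, xn in enumerate(x):
        lik = np.array([likelihood(xn, j) for j in range(k + 1)])
        post_z = lik * prior_z / (lik @ prior_z)
        e[n] = post_z @ np.arange(k + 1)                # e_n = E[# targets in A_n | X_n]
    counts = np.empty(2 ** N)
    for i in range(2 ** N):                             # cell i of the level-N partition
        s = [(i >> (N - 1 - n)) & 1 for n in range(N)]  # s_n = 1 iff cell i lies inside A_n
        counts[i] = k * np.prod([e[n] / k if s[n] else 1 - e[n] / k for n in range(N)])
    return counts                                       # the counts sum to k
```

Dividing a cell's expected count by the number of pixels it contains is one simple way to obtain per-pixel scores for the ranking used in the refinement step.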

Refinement. Following screening, we do refinement, using one of these algorithms. Posterior Rank (PR): rank pixels by the marginal probability of containing a target, and apply the high-accuracy classifier in this order. Iterated Posterior Rank (IPR): like Posterior Rank, but when a target is found, we mask the target and recompute the posterior and the pixel order. Masking is done without recomputing the answers to the dyadic questions:

E[number of new targets in C | X_{1:N}] = k ∏_{n=1}^N (e′_n / k)^{s_n} (1 − e′_n / k)^{1 − s_n},

where e_n has been replaced by e′_n, the expected number of new targets in A_n given X_n and the locations of the discovered targets.
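A rough sketch of the refinement loop; `pixel_scores`, `strong_classifier`, and `recompute_scores` are hypothetical placeholders (the posterior marginals, the slow high-accuracy classifier, and the e′_n-based posterior update), not names from the paper.

```python
def iterated_posterior_rank(pixel_scores, strong_classifier, recompute_scores, max_calls):
    """Iterated Posterior Rank (IPR), sketched: repeatedly run the slow
    high-accuracy classifier at the unvisited pixel with the highest posterior
    score; when a target is found, mask it and recompute the scores without
    re-asking the dyadic questions."""
    scores = list(pixel_scores)
    visited, found = set(), []
    for _ in range(max_calls):
        candidates = [i for i in range(len(scores)) if i not in visited]
        if not candidates:
            break
        p = max(candidates, key=lambda i: scores[i])
        visited.add(p)
        if strong_classifier(p):                 # the expensive call we are trying to minimize
            found.append(p)
            scores = list(recompute_scores(found))
    return found
```

Posterior Rank (PR) is the special case in which the scores are never recomputed.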

Example: Seinfeld, Iterated Posterior Rank Iteration 1

Example: Seinfeld, Iterated Posterior Rank Iteration 2

Example: Seinfeld, Iterated Posterior Rank Iteration 3

Example: Seinfeld, Iterated Posterior Rank Iteration 4

The dyadic policy reduces the number of times we call the slow high-accuracy classifier on real images. Results for 35 images from the MIT+CMU corpus. The dataset contains 130 faces (3.7 faces per image on average) and 1.2 × 10^7 pixels. We found all the faces by running the high-accuracy classifier on only 3% of the pixels using posterior rank; a naive approach would run it on more than 50% of the pixels.

The dyadic policy reduces the number of times we call the slow high-accuracy classifier on simulated images. [Figure: log number of calls to the oracle versus log image size (8x8 up to 1024x1024), for k = 3.] IR = iterated rank (a baseline that visits pixels in arbitrary order). (I)PR = (iterated) posterior rank (our algorithms). EP = entropy pursuit.

Future Work. Practical improvements for computer vision: unknown k (easy); integration of better weak / strong classifiers (easy); likelihoods that depend on the size of the queried region (hard); separating multiple scales (hard). Other applications: screening for important input factors of computational codes; group testing (our model generalizes the classical model); combinatorial chemistry; heavy hitter detection in network analysis.

Thank You!

Backup

Example: The Doors, Posterior Rank

Simulated Image: Iterated Posterior Rank

Simulated Image: Iterated Posterior Rank

Simulated Image: Iterated Posterior Rank

Simulated Image: Iterated Posterior Rank

Integral Images. Integral(x, y) is the sum of the low-accuracy classifier's responses at all pixels above and to the left of (x, y). Integral(x, y) can be computed iteratively for all x and y by touching each pixel once. Sums over rectangles can then be computed quickly from these values: sum within box = integral(upper left) − integral(upper right) − integral(lower left) + integral(lower right), evaluating the integral image at the four corners of the box.
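A minimal NumPy sketch of the integral image and the four-corner box sum (using row/column array indexing and the usual one-pixel offsets, rather than the slide's x/y description):

```python
import numpy as np

def integral_image(responses):
    """integral[y, x] = sum of the low-accuracy responses over all pixels with row <= y and column <= x."""
    return responses.cumsum(axis=0).cumsum(axis=1)

def box_sum(integral, y0, y1, x0, x1):
    """Sum of responses over the box spanning rows y0..y1 and columns x0..x1 (inclusive)."""
    total = integral[y1, x1]
    if y0 > 0:
        total -= integral[y0 - 1, x1]
    if x0 > 0:
        total -= integral[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += integral[y0 - 1, x0 - 1]
    return total

# Sanity check against a direct sum:
r = np.random.default_rng(1).random((6, 8))
I = integral_image(r)
assert np.isclose(box_sum(I, 2, 4, 3, 7), r[2:5, 3:8].sum())
```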

The dyadic policy reduces the number of times we call the slow high-accuracy classifier on simulated images. [Figure: log number of calls to the oracle versus log image size (8x8 up to 1024x1024), for IR, PR, and IPR, with k = 3 and σ = 0.5.]