Quantifying Fingerprint Evidence using Bayesian Alignment


Quantifying Fingerprint Evidence using Bayesian Alignment
Peter Forbes, joint work with Steffen Lauritzen and Jesper Møller
Department of Statistics, University of Oxford
UCL CSML Lunch Talk, 14 February 2014

History of fingerprints
- Fingerprints have been used to authenticate legal documents in China since 300 BC.
- Scottish missionary Henry Faulds first used fingerprints for forensic identification in 1880.
- Sir Francis Galton established in 1892 that fingerprints are invariant over time, and crudely estimated that
  Pr(two fingerprints are identical) = 1/68 billion.
- Yet crime scene prints are often partial and blurry.

Strength of evidence
- For its entire history, fingerprint evidence has been presented categorically.
- "Of all the methods of identification, fingerprinting alone has proved to be both infallible and feasible" (FBI training manual, 1963).
- "From a statistical viewpoint, the scientific foundation for fingerprint individuality is incredibly weak... there has been much speculation and little data. None of the models has been subjected to testing, which is of course the basic element of the scientific approach" (Stoney, 2001).

Motivation
- Fingerprint evidence is a mainstay of forensic practice: the UK collects 330,000 crime scene prints each year, leading to 34,000 identifications.
- There is a recent push to follow DNA evidence and quantify the uncertainty (Neumann et al., 2012).
- My model attempts this by computing the likelihood ratio between two competing hypotheses:
  $H_p$: A and B originate from the same finger,
  $H_d$: A and B originate from independent fingers.
- Tested on a public NIST-FBI dataset (Garris and McCabe, 2000).

Fingerprints as minutia sets
Figure: An example fingerprint with minutiae labelled.
- Minutiae are points where epidermal ridges end or bifurcate.
- Represent each minutia as $m = (r \in \mathbb{C},\; s \in S^1,\; t \in \{0,1\})$, where $S^1$ is the unit circle embedded in $\mathbb{C}$.
- Most fingers have around 135 minutiae.

Preliminaries
- Define the complex normal distribution on $\mathbb{C}^n$ by
  $Z \sim \mathrm{CN}_n(\mu, \Sigma) \iff \mathrm{Re}(Z - \mu),\, \mathrm{Im}(Z - \mu) \overset{\text{iid}}{\sim} N_n(0, \Sigma/2)$,
  with density $\varphi_n(z; \mu, \Sigma)$.
- Define the root von Mises distribution $\mathrm{rvM}(\kappa)$ on $S^1$ by the property that if $X, Y \overset{\text{iid}}{\sim} \mathrm{rvM}(\kappa)$ then $X\bar{Y}$ has density
  $\exp\{\kappa\, \mathrm{Re}(x\bar{y})\} / \{2\pi I_0(\kappa)\}$.
- A set of points $A \subset V$ forms a marked Poisson point process with rate $\rho$ and mark distribution $g$ (denoted $\mathrm{MPPP}(\rho, g)$) iff
  - the numbers of points in any two disjoint regions are independent,
  - the number of points in $v \subseteq V$ is $\mathrm{Poisson}\bigl(\int_v \rho(r)\,dr\bigr)$,
  - each point carries a mark, and marks are iid with density $g$.
  Conditional on $|A|$, $\Pr(A) \propto \prod_{r \in A} \rho(r)\, g(\mathrm{mark}(r))$.

Model for latent finger
- Like Green and Mardia (2006), we view the observed point sets as partial, distorted copies of a latent true point set.
- Latent minutiae are distributed as $\mathrm{MPPP}(\rho, g)$ on $\mathbb{C}$ with
  $\rho(r) = \rho_0\, \varphi_1(r; 0, \sigma^2)$ and $g(s, t) = p^t (1-p)^{1-t} / (2\pi)$,
  for $\rho_0 > 0$, $\sigma > 0$ and $p \in (0, 1)$.
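To make the latent model concrete, here is a minimal Python sketch of how one could simulate a latent minutia set under these assumptions; the parameter values (rho0 = 135, sigma = 80, p = 0.5) are illustrative choices, not values fitted in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_latent_finger(rho0=135.0, sigma=80.0, p=0.5, rng=rng):
    """Latent minutia set from MPPP(rho, g) with rho(r) = rho0 * phi_1(r; 0, sigma^2)
    and g(s, t) = p^t (1 - p)^(1 - t) / (2*pi).  The intensity integrates to rho0,
    so the number of minutiae is Poisson(rho0); locations are iid CN_1(0, sigma^2),
    orientations are uniform on the unit circle, and types are Bernoulli(p)."""
    n = rng.poisson(rho0)
    re = rng.normal(0.0, sigma / np.sqrt(2), size=n)
    im = rng.normal(0.0, sigma / np.sqrt(2), size=n)
    r = re + 1j * im                                       # locations in the complex plane
    s = np.exp(1j * rng.uniform(0.0, 2 * np.pi, size=n))   # orientations on S^1
    t = rng.binomial(1, p, size=n)                         # minutia type (ending / bifurcation)
    return r, s, t
```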

Model for observed minutia sets
Starting from the latent finger $\{m_l = (r_l, s_l, t_l)\}$:
- Independent binomial thinning ($q_A, q_B \in (0,1)$): each latent minutia is observed in the fingerprint with probability $q_A$ and in the fingermark with probability $q_B$, independently.
- Independent observation errors ($\omega, \kappa > 0$): an observed copy has location $r_a \sim r_l + \mathrm{CN}(0, \omega^2)$ and orientation $s_a$ equal to $s_l$ perturbed by $\mathrm{rvM}(\kappa)$ noise (likewise $r_b, s_b$ for the fingermark).
- Rigid motion ($\tau_A, \tau_B \in \mathbb{C}$, $\psi_A, \psi_B \in S^1$): $r_a \mapsto \psi_A(r_a + \tau_A)$, $s_a \mapsto \psi_A s_a$ for the fingerprint, and $r_b \mapsto \psi_B(r_b + \tau_B)$, $s_b \mapsto \psi_B s_b$ for the fingermark.
The results are the observed minutia sets: the fingerprint $A = \{(r_a, s_a, t_a)\}$ and the fingermark $B = \{(r_b, s_b, t_b)\}$.
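Continuing the sketch above, the observation process could be simulated roughly as follows; numpy's standard von Mises sampler is used as a stand-in for the root von Mises error, and all parameter values are illustrative.

```python
def observe(r_l, s_l, t_l, q=0.8, omega=5.0, kappa=50.0, tau=0j, psi=1 + 0j, rng=rng):
    """One observed minutia set: binomial thinning with probability q, location noise
    CN_1(0, omega^2), orientation noise (von Mises stand-in for rvM(kappa)), then the
    rigid motion r -> psi * (r + tau), s -> psi * s."""
    keep = rng.random(len(r_l)) < q
    m = keep.sum()
    r = r_l[keep] + rng.normal(0, omega / np.sqrt(2), m) + 1j * rng.normal(0, omega / np.sqrt(2), m)
    s = s_l[keep] * np.exp(1j * rng.vonmises(0.0, kappa, m))
    return psi * (r + tau), psi * s, t_l[keep]

# A matched fingerprint/fingermark pair from one latent finger (illustrative parameters)
r_l, s_l, t_l = simulate_latent_finger()
A = observe(r_l, s_l, t_l, q=0.9, psi=np.exp(1j * 0.3), tau=10 + 5j)
B = observe(r_l, s_l, t_l, q=0.4, psi=np.exp(-1j * 0.7), tau=-20 + 3j)
```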

Example fingerprint

Example high-quality fingermark

Example overlaid with a rigid motion

Finding the densities
- We want the likelihood ratio
  $\mathrm{LR} = \dfrac{\Pr(A, B \mid H_p)}{\Pr(A, B \mid H_d)}$.
- First we need the densities of $(A, B)$ under $H_d$ and $H_p$.
- These depend on the constants $\rho_0, p, \omega, \kappa$.
- They also depend on the variables $\theta = (q_A, q_B, \tau_A, \tau_B, \psi, \sigma)$.

Finding the densities
Under $H_d$, $A$ and $B$ come from independent latent fingers. Integrating over the latent minutiae gives
  $\Pr(A, B \mid \theta, H_d) \propto \prod_{(r_a, s_a, t_a) \in A} \rho_A(r_a)\, g(s_a, t_a) \prod_{(r_b, s_b, t_b) \in B} \rho_B(r_b)\, g(s_b, t_b)$,
where $\rho_A$ and $\rho_B$ are given by
  $\rho_A(r_a) = \rho_0\, q_A\, \varphi_1(r_a; \tau_A, \sigma^2 + \omega^2)$,
  $\rho_B(r_b) = \rho_0\, q_B\, \varphi_1(r_b; \tau_B, \sigma^2 + \omega^2)$.
This can be integrated analytically over $\theta$.
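A sketch of the corresponding unnormalised log-density under $H_d$, using the complex-normal density $\varphi_1(r; \mu, \sigma^2) = \exp(-|r - \mu|^2/\sigma^2)/(\pi\sigma^2)$; as on the slide, the MPPP normalising factor is dropped.

```python
import numpy as np

def log_phi1(r, mu, var):
    """Log density of the complex normal CN_1(mu, var): exp(-|r - mu|^2 / var) / (pi * var)."""
    return -np.log(np.pi * var) - np.abs(r - mu) ** 2 / var

def log_g(t, p):
    """Log mark density g(s, t) = p^t (1 - p)^(1 - t) / (2*pi), uniform over orientation."""
    return t * np.log(p) + (1 - t) * np.log(1 - p) - np.log(2 * np.pi)

def logdens_Hd(A, B, theta, rho0, p, omega):
    """Unnormalised log Pr(A, B | theta, H_d): two independent MPPPs with intensities
    rho_X(r) = rho0 * q_X * phi_1(r; tau_X, sigma^2 + omega^2) and mark density g."""
    (rA, sA, tA), (rB, sB, tB) = A, B
    qA, qB, tauA, tauB, psi, sigma = theta
    v = sigma ** 2 + omega ** 2
    la = np.log(rho0 * qA) + log_phi1(rA, tauA, v) + log_g(tA, p)
    lb = np.log(rho0 * qB) + log_phi1(rB, tauB, v) + log_g(tB, p)
    return la.sum() + lb.sum()
```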

Finding the densities
Under $H_p$ we observe $A = M_{10} \cup \{m_a : (m_a, m_b) \in M_{11}\}$ and $B = M_{01} \cup \{m_b : (m_a, m_b) \in M_{11}\}$, where
- $M_{10} \sim \mathrm{MPPP}((1 - q_B)\rho_A(\cdot), g)$ are the points observed in $A$ but not $B$,
- $M_{01} \sim \mathrm{MPPP}((1 - q_A)\rho_B(\cdot), g)$ are the points observed in $B$ but not $A$,
- $M_{11} \sim \mathrm{MPPP}(\rho_{11}, g_{11})$ are the points observed in both $A$ and $B$.
Note $M_{11}$ is an MPPP on $\mathbb{C}^2$. Integrating over the latent true minutia $(r, s, t)$ gives
  $\rho_{11}(r_a, r_b) = \rho_0\, q_A q_B\, \varphi_2\!\left( \begin{pmatrix} r_a \\ r_b \end{pmatrix};\, \begin{pmatrix} \tau_A \\ \tau_B \end{pmatrix},\, \begin{pmatrix} \sigma^2 + \omega^2 & \sigma^2 \bar\psi \\ \sigma^2 \psi & \sigma^2 + \omega^2 \end{pmatrix} \right)$,
  $g_{11}(s_a, t_a, s_b, t_b) = I(t_a = t_b)\, \dfrac{p^{t_a}(1-p)^{1-t_a}}{4\pi^2}\, \dfrac{\exp\{\kappa\, \mathrm{Re}(s_a \bar{s}_b \bar\psi)\}}{I_0(\kappa)}$,
where $\psi = \psi_A \bar\psi_B$ is the rotation between $A$ and $B$.

Finding the densities
The density under $H_p$ is thus
  $\Pr(A, B \mid \theta, H_p) \propto \prod_{(r_a, s_a, t_a) \in M_{10}} (1 - q_B)\, \rho_A(r_a)\, g(s_a, t_a) \prod_{(r_b, s_b, t_b) \in M_{01}} (1 - q_A)\, \rho_B(r_b)\, g(s_b, t_b) \prod_{(m_a, m_b) \in M_{11}} \rho_{11}(r_a, r_b)\, g_{11}(s_a, t_a, s_b, t_b)$,
but we only observe $A = M_{10} \cup \{m_a : (m_a, m_b) \in M_{11}\}$ and $B = M_{01} \cup \{m_b : (m_a, m_b) \in M_{11}\}$!
We therefore treat the matching $\xi$ between $A$ and $B$ as an unknown variable and sum over its possible values.
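For a fixed matching $\xi$, the unnormalised $H_p$ log-density could be evaluated as below; this reuses log_phi1 and log_g from the $H_d$ sketch, represents $\xi$ as a set of (i, j) index pairs, and takes the placement of $\psi$ in the covariance from the reconstruction above, so it should be checked against the paper.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of order zero

def log_phi2(z, mu, Sigma):
    """Log density of the bivariate complex normal CN_2(mu, Sigma):
    exp(-(z - mu)^H Sigma^{-1} (z - mu)) / (pi^2 det(Sigma))."""
    d = z - mu
    quad = np.real(np.conj(d) @ np.linalg.solve(Sigma, d))
    return float(-quad - 2 * np.log(np.pi) - np.log(np.real(np.linalg.det(Sigma))))

def logdens_Hp(A, B, xi, theta, rho0, p, omega, kappa):
    """Unnormalised log Pr(A, B, xi | theta, H_p) for a fixed matching xi, given as a
    collection of index pairs (i, j) meaning A[i] is matched to B[j]."""
    (rA, sA, tA), (rB, sB, tB) = A, B
    qA, qB, tauA, tauB, psi, sigma = theta
    v = sigma ** 2 + omega ** 2
    matchedA, matchedB = {i for i, _ in xi}, {j for _, j in xi}
    # M_10: observed in A only;  M_01: observed in B only
    logp = sum(np.log((1 - qB) * rho0 * qA) + log_phi1(rA[i], tauA, v) + log_g(tA[i], p)
               for i in range(len(rA)) if i not in matchedA)
    logp += sum(np.log((1 - qA) * rho0 * qB) + log_phi1(rB[j], tauB, v) + log_g(tB[j], p)
                for j in range(len(rB)) if j not in matchedB)
    # M_11: matched pairs, jointly complex normal with cross-covariance sigma^2 * psi
    Sigma = np.array([[v, sigma ** 2 * np.conj(psi)], [sigma ** 2 * psi, v]], dtype=complex)
    for i, j in xi:
        if tA[i] != tB[j]:
            return -np.inf                      # matched minutiae must share a type
        logp += np.log(rho0 * qA * qB) + log_phi2(np.array([rA[i], rB[j]]),
                                                  np.array([tauA, tauB]), Sigma)
        logp += (tA[i] * np.log(p) + (1 - tA[i]) * np.log(1 - p) - np.log(4 * np.pi ** 2)
                 + kappa * np.real(sA[i] * np.conj(sB[j]) * np.conj(psi)) - np.log(i0(kappa)))
    return logp
```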

Computing the likelihood ratio
- $\tau_A, \tau_B, \psi, \sigma$ are assigned (improper) flat priors $\pi$, which ensures LR is invariant under similarity transformations.
- $q_A$ has a conjugate Beta prior with hyperparameters $\alpha, \beta$.
- $q_B$ has a flat prior.
- $\rho_0, p, \omega, \kappa, \alpha, \beta$ are constants to be estimated by MLE.
- Find LR by marginalising over $\theta$ and $\xi$:
  $\mathrm{LR} = \dfrac{\sum_\xi \int \Pr(A, B, \xi \mid \theta, H_p)\, \pi(\theta)\, d\theta}{\int \Pr(A, B \mid \theta, H_d)\, \pi(\theta)\, d\theta}$.
- The sum in the numerator contains many terms:
  $\sum_{n=0}^{\min(|A|, |B|)} \dfrac{|A|!\, |B|!}{n!\, (|A|-n)!\, (|B|-n)!} \approx 10^{100}$.
- Use MCMC to approximate the LR.
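A quick way to see the size of the matching space counted by the sum above (the set sizes here are illustrative, not taken from the dataset):

```python
from math import comb, factorial

def n_matchings(nA, nB):
    """Number of possible matchings between minutia sets of sizes nA and nB:
    sum over n of C(nA, n) * C(nB, n) * n!  (choose n points from each set, then pair them)."""
    return sum(comb(nA, n) * comb(nB, n) * factorial(n) for n in range(min(nA, nB) + 1))

# Two prints with around 80 minutiae each already give well over 100 decimal digits:
print(len(str(n_matchings(83, 83))), "decimal digits")
```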

An estimate for LR
Define the joint distribution of $(\theta, \xi, H)$ by
  $\Pr(\theta, \xi, H, A, B) = \begin{cases} p_0\, \Pr(A, B, \xi \mid \theta, H_p)\, \pi(\theta) & \text{if } H = H_p, \\ (1 - p_0)\, \Pr(A, B \mid H_d)\, q(\xi \mid \theta)\, q(\theta) & \text{if } H = H_d, \end{cases}$
where $p_0 \in (0, 1)$ and the densities $q$ are chosen to promote good mixing of our MCMC over the model space.
By sampling from this joint distribution, we can approximate LR by replacing the expectations below with their sample averages:
  $\mathrm{LR} = \dfrac{\Pr(A, B \mid H_p)}{\Pr(A, B \mid H_d)} = \dfrac{1 - p_0}{p_0}\, \dfrac{E_{\theta, \xi, H \mid A, B}\bigl[ I(H = H_p) \bigr]}{E_{\theta, \xi, H \mid A, B}\bigl[ I(H = H_d) \bigr]}$.
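A minimal sketch of the resulting indicator-average estimator; H_samples and p0 are hypothetical names for the recorded model indicators and the prior model weight.

```python
import numpy as np

def lr_from_indicators(H_samples, p0):
    """Naive LR estimate from the MCMC output: LR ~ (1 - p0)/p0 * #(H = H_p) / #(H = H_d),
    where H_samples is a boolean array that is True when the chain sat in the H_p model."""
    H_samples = np.asarray(H_samples, dtype=bool)
    n_p, n_d = H_samples.sum(), (~H_samples).sum()
    if n_p == 0 or n_d == 0:
        raise ValueError("the chain never switched models; the estimate is degenerate")
    return (1 - p0) / p0 * n_p / n_d
```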

Tuning the sampler
- To estimate LR accurately we must switch models often, i.e. we need
  $p_0\, \Pr(A, B, \xi \mid \theta, H_p)\, \pi(\theta) \approx (1 - p_0)\, \Pr(A, B \mid H_d)\, q(\xi \mid \theta)\, q(\theta)$.
- We have severe problems with local modes under $H_p$ due to the high dimensionality of $\xi$.
- Attempting to tune $q(\theta)$ often resulted in tuning to the current local mode, so we use a fixed diffuse distribution.
- We choose $q(\xi \mid \theta)$ to approximate $\Pr(\xi \mid \theta, A, B, H_p)$:
  put some arbitrary ordering on $A$ and let $\xi_\alpha = \{(a, b) \in \xi : a < \alpha\}$, $B_\alpha = \{b \in B : (a, b) \in \xi_\alpha\}$; then
  $q(\xi \mid \theta) = \prod_{\alpha=1}^{|A|} \dfrac{\Pr(\theta, \xi_{\alpha+1}, H_p, A, B)}{\sum_{b \in (B \setminus B_\alpha) \cup \{\emptyset\}} \Pr(\theta, \xi_\alpha \cup \{(\alpha, b)\}, H_p, A, B)}$.

A better estimate for LR
- We want to choose $p_0$ so that
  $p_0\, \Pr(A, B, \xi \mid \theta, H_p)\, \pi(\theta) \approx (1 - p_0)\, \Pr(A, B \mid H_d)\, q(\xi \mid \theta)\, q(\theta)$
  over a large portion of the state space $(\theta, \xi)$.
- This is very difficult, so we tune $p_0$ to ensure good mixing based on our previous samples.
- Letting $l(\theta, \xi, A, B) = \Pr(\theta, \xi, A, B \mid H_p) / \Pr(\theta, \xi, A, B \mid H_d)$,
  $\mathrm{LR} = \dfrac{E_{\theta, \xi \mid A, B}\bigl[ \{1 - p_0 + p_0 / l(\theta, \xi, A, B)\}^{-1} \bigr]}{E_{\theta, \xi \mid A, B}\bigl[ \{p_0 + (1 - p_0)\, l(\theta, \xi, A, B)\}^{-1} \bigr]}$.
- One can show that replacing the expectations with sample averages yields a valid estimator of LR even if we change $p_0$ at each iteration! We set $p_0^n$ adaptively from the last 100 samples:
  $\dfrac{p_0^n}{1 - p_0^n} = \dfrac{\sum_{m=n-100}^{n} \{1 + 1/l(\theta^m, \xi^m, A, B)\}^{-1}}{\sum_{m=n-100}^{n} \{1 + l(\theta^m, \xi^m, A, B)\}^{-1}}$.
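A sketch of this estimator and of the adaptive $p_0$ rule as reconstructed above; the function names are hypothetical, everything is kept on the log scale since $l$ can be astronomically large, and the exact form of the adaptive rule should be checked against the paper.

```python
import numpy as np

def lr_rao_blackwell(log_l, p0):
    """LR estimate from per-sample log ratios log l(theta, xi, A, B):
    LR ~ mean[{1 - p0 + p0/l}^{-1}] / mean[{p0 + (1 - p0) l}^{-1}].
    p0 may vary by iteration (same length as log_l) or be a scalar."""
    log_l, p0 = np.asarray(log_l, float), np.asarray(p0, float)
    num = np.mean(1.0 / (1.0 - p0 + p0 * np.exp(-log_l)))
    den = np.mean(np.exp(-np.logaddexp(np.log(p0), np.log1p(-p0) + log_l)))
    return num / den

def adapt_p0(log_l_recent):
    """Adaptive p0 from recent samples (reconstruction of the slide's rule):
    p0 / (1 - p0) = sum[{1 + 1/l}^{-1}] / sum[{1 + l}^{-1}]."""
    log_l_recent = np.asarray(log_l_recent, float)
    a = np.sum(1.0 / (1.0 + np.exp(-log_l_recent)))        # sum of {1 + 1/l}^{-1}
    b = np.sum(np.exp(-np.logaddexp(0.0, log_l_recent)))   # sum of {1 + l}^{-1}
    return a / (a + b)
```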

Gibbs sampler

Algorithm 1: Gibbs sampler for the joint posterior of $(\theta, \xi, H)$
Require: $\theta^0, \xi^0$ set to some initial value; set $H^0 = H_p$.
for $n = 1, \ldots, N$ do
    if $H^{n-1} = H_p$ then
        $(q_A^n, q_B^n) \leftarrow$ Sample$(q_A, q_B \mid A, B, \tau_A^{n-1}, \tau_B^{n-1}, \sigma^{n-1}, \psi^{n-1}, \xi^{n-1}, H^{n-1})$
        $(\tau_A^n, \tau_B^n) \leftarrow$ Sample$(\tau_A, \tau_B \mid A, B, q_A^n, q_B^n, \sigma^{n-1}, \psi^{n-1}, \xi^{n-1}, H^{n-1})$
        $\sigma^n \leftarrow$ Sample$(\sigma \mid A, B, q_A^n, q_B^n, \tau_A^n, \tau_B^n, \psi^{n-1}, \xi^{n-1}, H^{n-1})$
        $\psi^n \leftarrow$ Sample$(\psi \mid A, B, q_A^n, q_B^n, \tau_A^n, \tau_B^n, \sigma^n, \xi^{n-1}, H^{n-1})$
        $\xi^n \leftarrow \xi^{n-1}$
        for $j = 1, \ldots, n_A$ do        (repeated to reduce autocorrelation)
            $\xi^n \leftarrow$ Sample$(\xi \mid A, B, q_A^n, q_B^n, \tau_A^n, \tau_B^n, \sigma^n, \psi^n, \xi^n, H^{n-1})$
        end for
    else
        $(\theta^n, \xi^n) \leftarrow (\theta^{n-1}, \xi^{n-1})$        (states are not changed under $H_d$)
    end if
    $p_0^n \leftarrow$ adaptive value to increase mixing in $H$
    $H^n \leftarrow$ Sample$(H \mid A, B, q_A^n, q_B^n, \tau_A^n, \tau_B^n, \sigma^n, \psi^n, \xi^n)$
end for

- This is not quite reversible jump: we do not change states under $H_d$.
- This approach provides better model mixing when the proposal distributions $q(\theta, \xi)$ are far from the posterior distributions.

ξ sampler
- A basic Metropolis-Hastings sampler like that of Green and Mardia (2006) has acceptance rates below $10^{-5}$: infeasibly slow.
- Instead we sample $\xi$ directly, reducing the set of adjacent states with an auxiliary variable $\alpha$ that takes values uniformly on $A$.
- There are $|B| + 1$ states adjacent to any $(\alpha, \xi)$, obtained by matching $\alpha$ to each $b \in B$ or leaving $\alpha$ unmatched.
Figure: Matches $\xi$ adjacent to $(\alpha, \xi_0)$: (a) add, (b) swap $b$, (c) remove, (d) swap $a$, (e) swap.
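A sketch of what this direct update could look like, reusing logdens_Hp from the earlier sketch; representing $\xi$ as a set of index pairs and scoring candidates with the $H_p$ joint density are assumptions made for illustration.

```python
import numpy as np

def gibbs_update_xi(xi, alpha, A, B, theta, rho0, p, omega, kappa, rng):
    """Direct Gibbs update of the matching at a single A-minutia alpha: enumerate the
    |B| + 1 adjacent matchings (alpha matched to each b in B, or left unmatched), score
    each with the unnormalised H_p joint density, and sample one in proportion."""
    base = {(i, j) for (i, j) in xi if i != alpha}          # drop alpha's current match
    candidates = [base]                                      # state with alpha unmatched
    for b in range(len(B[0])):
        freed = {(i, j) for (i, j) in base if j != b}        # free b if matched elsewhere
        candidates.append(freed | {(alpha, b)})
    logw = np.array([logdens_Hp(A, B, c, theta, rho0, p, omega, kappa) for c in candidates])
    w = np.exp(logw - logw.max())                            # stabilise before normalising
    k = rng.choice(len(candidates), p=w / w.sum())
    return candidates[k]
```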

NIST-FBI dataset
- NIST-FBI dataset of 258 fingerprint/fingermark pairs.
- All images have their minutiae labelled by expert examiners.
- Split into three subsets (good, bad, and ugly) based on fingermark quality.
Figure: Example fingermarks from Garris and McCabe (2000).

Simulated dataset
- Generated 258 fingerprint/fingermark pairs according to the model assumptions.
- Split into three sets (good, bad, and ugly) in order of decreasing $|B|$.
- Computed all $258 \times 258$ pairwise likelihood ratios.
- The computed LRs are almost entirely determined by $|B|$.

MCMC behavior

MCMC behavior

Results: simulated data
Figure: Histograms of the $\log_{10}$ likelihood ratios for good, bad and ugly simulated fingermarks, with inset ROC curves.

Results: NIST-FBI data
Figure: Histograms of the $\log_{10}$ likelihood ratios for good, bad and ugly NIST-FBI fingermarks, with inset ROC curves.

Better model for latent minutiae
- The intensity used for the latent-finger MPPP is inaccurate, since most minutiae occur in areas of high ridge curvature.
Figure: Minutia density over various fingerprints, from Chen and Jain (2009).

Better distortion model
- The basic model assumes the observed minutiae differ from the true minutiae by a rigid motion plus iid noise.
- In reality, nearby minutiae have spatially correlated distortions.
- We account for this using a smoothing thin-plate spline model, which leads to a Gaussian process for the distortions.
Figure: Example smoothing thin-plate spline, from Chui and Rangarajan (2000).

Conclusion
- We have developed a simple model to quantify the strength of evidence for forensic fingerprints.
- Better latent distributions and distortion models will increase discrimination between true and false matches.
- We must manage the trade-off between model complexity and computational efficiency.
- Computed likelihood ratios should be calibrated against a ground-truth database.

References
Chen, Y. and A. K. Jain (2009). Beyond minutiae: A fingerprint individuality model with pattern, ridge and pore features.
Chui, H. and A. Rangarajan (2000). A new algorithm for non-rigid point matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 44-51.
Garris, M. and R. McCabe (2000). NIST special database 27: Fingerprint minutiae from latent and matching tenprint images. Technical report, NIST, Gaithersburg, MD, USA.
Green, P. J. and K. V. Mardia (2006). Bayesian alignment using hierarchical models, with applications in protein bioinformatics. Biometrika 93(2), 235-254.
Neumann, C., I. W. Evett, and J. E. Skerrett (2012). Quantifying the weight of evidence from a forensic fingerprint comparison: a new paradigm (with discussion). Journal of the Royal Statistical Society: Series A 175(2), 371-415.