Technical Report RM669, Department of Statistics & Probability, Michigan State University

A BAYESIAN ANALYSIS OF HIERARCHICAL MIXTURES WITH APPLICATION TO CLUSTERING FINGERPRINTS

By Sarat C. Dass and Mingfei Li, Michigan State University

Hierarchical mixture models arise naturally for clustering a heterogeneous population of objects where the observations made on each object follow a standard mixture density. Hierarchical mixtures utilize complementary aspects of mixtures at the different levels of the hierarchy: at the first (top) level, the mixture is used to perform clustering of the objects, while at the second level, nested mixture models are used as flexible representations of the distributions of observables from each object. Inference for hierarchical mixtures is more challenging since unknown numbers of mixture components arise at both the first and the second levels of the hierarchy. In this paper, a Bayesian approach based on Reversible Jump Markov Chain Monte Carlo methodology is developed for the inference of all unknown parameters of hierarchical mixtures. Our methodology is then applied to the clustering of fingerprint images and used to assess the variability of quantities which are functions of the second-level mixtures.

1. Hierarchical Mixture Models. Consider an object, $O$, selected at random from a heterogeneous population, $\mathcal{P}$, with $G$ unknown clusters. Let $X \equiv \{x_1, x_2, x_3, \ldots\}$ denote the observables on $O$, where $x_j \equiv (x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(d)})$ is a $d$-variate random vector in $R^d$. A hierarchical mixture model for the distribution of $O$ in the population is
$$ q(\mathbf{x}) \;=\; \sum_{g=1}^{G} \omega_g \prod_{j=1}^{n} q_g(x_j), \tag{1.1} $$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the $n$ observations made on $O$, the $\omega_g$, $g = 1, 2, \ldots, G$, are the $G$ cluster proportions with $\omega_g > 0$ and $\sum_{g=1}^{G} \omega_g = 1$, and $q_g$ is the mixture density for the $g$-th cluster given by
$$ q_g(x) \;=\; \sum_{k=1}^{K_g} p_{kg}\, f(x \mid \theta_{kg}), \tag{1.2} $$
with $f$ denoting a density with respect to Lebesgue measure on $R^d$, $p_{kg}$ denoting the mixing probabilities satisfying (1) $p_{kg} > 0$ and (2) $\sum_{k=1}^{K_g} p_{kg} = 1$, and $\theta_{kg}$ denoting the set of all unknown parameters in $f$. Identifiability of the hierarchical mixture model (1.1) is achieved by imposing the constraints
$$ \omega_1 < \omega_2 < \cdots < \omega_G \quad\text{and}\quad \theta_{1g} \preceq \theta_{2g} \preceq \cdots \preceq \theta_{K_g g}, \tag{1.3} $$

Research supported under NSF DMS grant no
AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35
Keywords and phrases: Model-based clustering, Gaussian mixtures, Bayesian inference, Reversible Jump Markov Chain Monte Carlo methods, Fingerprint individuality

for each $g = 1, 2, \ldots, G$, where $\preceq$ is a partial ordering to be defined later. The set of all unknown parameters in the hierarchical mixture model (1.1) is denoted by $x = (G, \omega, K, p, \theta)$, where $\omega = (\omega_1, \omega_2, \ldots, \omega_G)$, $K = (K_1, K_2, \ldots, K_G)$, $p = (p_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G)$, and $\theta = (\theta_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G)$.

Hierarchical mixture models of the form (1.1) arise naturally when the goal is to cluster a population of objects whose observables each follow a standard mixture distribution. At the first (top, or $G$) level, the mixture is used to perform clustering of the objects, while at the second (or $K_g$) level, nested mixture models (nested within each $g = 1, 2, \ldots, G$ specification) are used as flexible representations of the distributions of observables. The unknown number of mixture components, or mixture complexity, arises at both levels of the hierarchy and is, therefore, more challenging to estimate compared to standard mixtures. Estimating mixture complexity has been the focus of intense research for many years, resulting in various estimation methodologies over a broad application domain. Non-parametric methods were developed in Escobar and West (1995) and Roeder and Wasserman (1997), whereas Ishwaran et al. (2001) and Woo and Sriram (2006) developed methodology for the robust estimation of mixture complexity for count data. In a Bayesian framework, the work of Green and Richardson (1997) demonstrated that the estimation of mixture complexity and of the mixture parameters can be unified if viewed as a problem of model selection. With this view in mind, Green and Richardson (1997) developed a Reversible Jump Markov Chain Monte Carlo (RJMCMC) approach for estimating mixture complexity by exploring a space of models of varying dimensions.

In this paper, we develop a Bayesian framework based on RJMCMC for inference on hierarchical mixture models. One advantage of our approach is that we are able to assess the variability of the cluster estimate, which, in turn, can be used to ascertain the variability of quantities that are functions of the clusters. We present one such example based on clustering a sample of fingerprint images. One quantity of special interest is the probability of a random correspondence (PRC), which measures the extent to which two randomly chosen fingerprints from a population match each other. The RJMCMC methodology developed in this paper provides an estimate of its mean and variance.

In the subsequent text, we assume each $f$ is multivariate normal with mean vector $\mu_{kg} \equiv (\mu_{kg}^{(1)}, \mu_{kg}^{(2)}, \ldots, \mu_{kg}^{(d)}) \in R^d$ and covariance matrix $\Sigma_{kg} \in R^{d \times d}$. Our analysis of fingerprint images in Section 6 revealed that it is adequate to consider diagonal covariance matrices of the form $\Sigma_{kg} = \mathrm{diag}(\sigma_{kg}^{(1)2}, \sigma_{kg}^{(2)2}, \ldots, \sigma_{kg}^{(d)2})$, where $\sigma_{kg}^{(b)2}$ is the variance of the $b$-th component of the multivariate normal distribution. Thus, we have
$$ f(x \mid \theta_{kg}) \;=\; \phi_d(x \mid \mu_{kg}, \sigma_{kg}) \;=\; \prod_{b=1}^{d} \phi_1\big(x^{(b)} \mid \mu_{kg}^{(b)}, \sigma_{kg}^{(b)2}\big), \tag{1.4} $$
where $\phi_1(\cdot \mid \mu, \sigma^2)$ denotes the density of the univariate normal distribution with mean $\mu$ and variance $\sigma^2$, and $\sigma_{kg} \equiv (\sigma_{kg}^{(1)2}, \sigma_{kg}^{(2)2}, \ldots, \sigma_{kg}^{(d)2})$ is the $d$-variate vector of the variances.
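To make the model concrete, the following is a minimal sketch (in Python with NumPy/SciPy, which the paper does not prescribe; all parameter values below are illustrative, not taken from the paper) of evaluating the hierarchical mixture density (1.1) for one object under the diagonal-Gaussian specification (1.4). Log-sum-exp is used for numerical stability.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def log_qg(x, p_g, mu_g, var_g):
    """Log of the second-level mixture density (1.2) for each row of x.

    x     : (n, d) observations from one object
    p_g   : (K_g,) mixing probabilities, summing to 1
    mu_g  : (K_g, d) component means
    var_g : (K_g, d) component variances (diagonal covariances, eq. 1.4)
    """
    # log phi_d(x_j | mu_kg, sigma_kg) for every (j, k) pair
    log_phi = norm.logpdf(x[:, None, :], loc=mu_g[None, :, :],
                          scale=np.sqrt(var_g)[None, :, :]).sum(axis=2)
    return logsumexp(np.log(p_g)[None, :] + log_phi, axis=1)   # shape (n,)

def log_q(x, omega, p, mu, var):
    """Log of the first-level (hierarchical) mixture density (1.1) for one object."""
    log_terms = [np.log(omega[g]) + log_qg(x, p[g], mu[g], var[g]).sum()
                 for g in range(len(omega))]
    return logsumexp(log_terms)

# toy illustration with G = 2 clusters, K_g = 2 components each, d = 2
rng = np.random.default_rng(0)
omega = np.array([0.4, 0.6])
p = [np.array([0.5, 0.5]), np.array([0.3, 0.7])]
mu = [np.array([[0., 0.], [3., 3.]]), np.array([[6., 6.], [9., 9.]])]
var = [np.ones((2, 2)), np.ones((2, 2))]
x = rng.normal(size=(5, 2)) + 3.0      # n = 5 observations from one object
print(log_q(x, omega, p, mu, var))
```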

The second identifiability condition of (1.3) is re-expressed in terms of the first component of the mean vector as
$$ \mu^{(1)}_{1g} < \mu^{(1)}_{2g} < \cdots < \mu^{(1)}_{K_g g}. \tag{1.5} $$
In the subsequent text, the identifiability condition (1.5), based on the first components of $\mu_{kg}$ for $k = 1, 2, \ldots, K_g$, will be re-written using the symbol $\preceq$ as
$$ \mu_{1g} \preceq \mu_{2g} \preceq \cdots \preceq \mu_{K_g g} \tag{1.6} $$
for each $g = 1, 2, \ldots, G$. For $N$ independent objects selected randomly from the population, say $O_i \equiv \{x_{ij},\ j = 1, 2, \ldots, n_i\}$, $i = 1, 2, \ldots, N$, it follows that the joint distribution of the observations is
$$ \prod_{i=1}^{N} q(\mathbf{x}_i) \;=\; \prod_{i=1}^{N} \sum_{g=1}^{G} \omega_g \prod_{j=1}^{n_i} \sum_{k=1}^{K_g} p_{kg}\, \phi_d(x_{ij} \mid \mu_{kg}, \sigma_{kg}). \tag{1.7} $$
Two further pieces of notation are introduced here: $\mu$ and $\sigma$ will respectively denote the collections of all $\{\mu_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G\}$ and $\{\sigma_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G\}$ vectors. Our goal is to make inference about the unknown parameters $x = (G, \omega, K, p, \mu, \sigma)$ based on the joint distribution in (1.7).

2. A Bayesian Framework for Inference. For the subsequent text, we introduce some additional notation. The symbol $I(S)$ denotes the indicator function of the set $S$, that is, $I(S) = 1$ if $S$ is true, and $0$ otherwise. The notation $(A, B, \ldots \mid C, D, \ldots)$ denotes the distribution of the random variables $A, B, \ldots$ conditioned on $C, D, \ldots$, with $\pi(A, B, \ldots \mid C, D, \ldots)$ denoting the specific form of the conditional distribution. The notation $\pi(A, B, \ldots \mid \cdots)$ denotes the distribution of $A, B, \ldots$ given the rest of the parameters. We specify a joint prior distribution on $x$ as follows:

(1) $G$ and $K$: We assume
$$ \pi(G, K) \;=\; \pi(G)\, \pi(K \mid G) \;=\; \pi_0(G) \prod_{g=1}^{G} \pi_0(K_g), \tag{2.1} $$
where $\pi_0$ is the discrete uniform distribution between $G_{\min}$ and $G_{\max}$ (respectively, $K_{\min}$ and $K_{\max}$), both inclusive, for $G$ (respectively, $K_g$).

(2) The first- and second-level mixing proportions: We assume
$$ \pi(\omega, p \mid G, K) \;=\; G!\; D_G(\omega \mid \delta_\omega)\, I(\omega_1 < \omega_2 < \cdots < \omega_G) \prod_{g=1}^{G} D_{K_g}(p_g \mid \delta_p), \tag{2.2} $$
where $D_H(\cdot \mid \delta)$ denotes the $H$-dimensional Dirichlet density with the $H$-component baseline measure $(\delta, \delta, \ldots, \delta)$, $\delta$ is a pre-specified constant, and $p_g \equiv (p_{1g}, p_{2g}, \ldots, p_{K_g g})$. The indicator function in (2.2) arises from the identifiability constraint (1.3) imposed on $\omega$; it follows that $G!$ is the appropriate normalizing constant for this constrained density, which is obtained by integrating out $\omega$ and noting that $D_G(\omega \mid \delta_\omega)$ is invariant under permutations of $\omega$ (that is, $\omega_1, \omega_2, \ldots, \omega_G$ are exchangeable).

(3) The prior on the mean vectors is taken as
$$ \pi(\mu \mid K, G) \;=\; \prod_{g=1}^{G} \Bigg[ K_g!\, \prod_{k=1}^{K_g} \phi_1\big(\mu^{(1)}_{kg} \mid \mu_0, \tau^2\big)\; I\big(\mu^{(1)}_{1g} < \mu^{(1)}_{2g} < \cdots < \mu^{(1)}_{K_g g}\big) \prod_{b=2}^{d} \prod_{k=1}^{K_g} \phi_1\big(\mu^{(b)}_{kg} \mid \mu_0, \tau^2\big) \Bigg]. \tag{2.3} $$
The indicator function appears due to the identifiability constraint (1.3) imposed on $\mu$, with resulting normalizing constant $K_g!$ for each $g = 1, 2, \ldots, G$.

(4) The prior distribution of the variances is taken as
$$ \pi(\sigma \mid K, G) \;=\; \prod_{g=1}^{G} \prod_{k=1}^{K_g} \prod_{b=1}^{d} IG\big(\sigma^{(b)2}_{kg} \mid \alpha_0, \beta_0\big), \tag{2.4} $$
where $IG$ denotes the inverse gamma distribution with prior shape and scale parameters $\alpha_0$ and $\beta_0$, respectively.
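A draw from the priors (2.1)-(2.4) can be simulated directly; the ordering constraints on $\omega$ and on the first mean coordinates are handled by sorting, since the constrained densities are the order-statistic versions of the unconstrained ones. The sketch below uses hypothetical hyper-parameter values and the inverse-gamma parameterization implied by the posterior update (3.32) later in the paper; it is an illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical hyper-parameter values; these must be chosen per application
G_min, G_max, K_min, K_max = 2, 5, 2, 5
delta_omega, delta_p = 1.0, 1.0
mu0, tau2, alpha0, beta0 = 0.0, 100.0, 2.0, 1.0
d = 2

def draw_prior():
    """One draw of x = (G, omega, K, p, mu, sigma) from the priors (2.1)-(2.4)."""
    # (2.1): G and each K_g discrete uniform on their ranges, inclusive
    G = int(rng.integers(G_min, G_max + 1))
    K = rng.integers(K_min, K_max + 1, size=G)
    # (2.2): sorting a Dirichlet draw gives the ordered (constrained) density
    omega = np.sort(rng.dirichlet(np.full(G, delta_omega)))
    p = [rng.dirichlet(np.full(K[g], delta_p)) for g in range(G)]
    # (2.3): normal means; the first coordinate is sorted within each g
    mu = [rng.normal(mu0, np.sqrt(tau2), size=(K[g], d)) for g in range(G)]
    for m in mu:
        m[:, 0] = np.sort(m[:, 0])
    # (2.4): inverse-gamma variances, drawn as reciprocals of gamma variates
    # (assumes the IG(a, b) parameterization with density prop. to s^-(a+1) e^{-1/(b s)})
    var = [1.0 / rng.gamma(alpha0, beta0, size=(K[g], d)) for g in range(G)]
    return G, K, omega, p, mu, var

G, K, omega, p, mu, var = draw_prior()
print(G, K, omega.round(3))
```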

The joint prior distribution on $x = (G, \omega, K, p, \mu, \sigma)$ is thus given by
$$ \pi(x) \;=\; \pi(G, K)\, \pi(\omega, p \mid G, K)\, \pi(\mu \mid G, K)\, \pi(\sigma \mid G, K), \tag{2.5} $$
where the component priors are given by equations (2.1)-(2.4). The prior on $x$ depends on the hyper-parameters $\delta_p$, $\delta_\omega$, $G_{\max}$, $G_{\min}$, $K_{\min}$, $K_{\max}$, $\mu_0$, $\tau^2$, $\alpha_0$ and $\beta_0$, all of which need to be specified for a given application. This will be done in Sections 5 and 6.

The likelihood (1.7) involves several summations within each product term and is simplified by augmenting variables that denote the class labels of the individual observations. Two different class labels are introduced for the two levels of mixtures: the augmented variable $W \equiv (W_1, W_2, \ldots, W_N)$ denotes the class labels for the $G$ sub-populations, that is, $W_i = g$ whenever object $i$, $O_i$, arises from the $g$-th sub-population, and $Z \equiv (Z_1, Z_2, \ldots, Z_N)$ with $Z_i \equiv (Z_{ij},\ j = 1, 2, \ldots, n_i)$, where $Z_{ij} = k$ for $1 \le k \le K_g$ if $x_{ij}$ arises from the $k$-th mixture component $\phi_d(\cdot \mid \mu_{kg}, \sigma_{kg})$. We denote the augmented parameter space by the same symbol $x$ as before, that is, $x = (G, \omega, K, p, \mu, \sigma, W, Z)$. The augmented likelihood is now
$$ l(G, \omega, K, p, \mu, \sigma, W, Z) \;=\; \prod_{i=1}^{N} \prod_{j=1}^{n_i} \prod_{g=1}^{G} \prod_{k=1}^{K_g} \phi_d(x_{ij} \mid \mu_{kg}, \sigma_{kg})^{I(Z_{ij} = k,\, W_i = g)}, \tag{2.6} $$
with priors on $W$ and $Z$ given by
$$ \pi(W, Z \mid G, K, \omega, p) \;=\; \pi(W \mid G, \omega)\, \pi(Z \mid G, K, W, p), \tag{2.7} $$
where $\pi(W \mid G, \omega) = \prod_{i=1}^{N} \prod_{g=1}^{G} \omega_g^{I(W_i = g)}$ and $\pi(Z \mid G, K, W, p) = \prod_{g=1}^{G} \prod_{i : W_i = g} \prod_{j=1}^{n_i} \prod_{k=1}^{K_g} p_{kg}^{I(Z_{ij} = k)}$.
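The quantity the sampler repeatedly evaluates is the augmented likelihood (2.6) together with the label priors (2.7). A minimal sketch of this complete-data log-likelihood, under the same diagonal-Gaussian parameterization as (1.4) and assuming NumPy/SciPy (an implementation choice, not part of the paper):

```python
import numpy as np
from scipy.stats import norm

def complete_data_loglik(X, W, Z, omega, p, mu, var):
    """Log of (2.6) multiplied by the label priors (2.7).

    X : list of (n_i, d) arrays, one per object
    W : (N,) first-level labels, W[i] = g
    Z : list of (n_i,) integer arrays of second-level labels, Z[i][j] = k
    """
    ll = 0.0
    for i, x in enumerate(X):
        g = W[i]
        k = Z[i]
        ll += np.log(omega[g])                 # pi(W | G, omega)
        ll += np.log(p[g][k]).sum()            # pi(Z | G, K, W, p)
        # phi_d(x_ij | mu_kg, sigma_kg) terms of (2.6)
        ll += norm.logpdf(x, loc=mu[g][k], scale=np.sqrt(var[g][k])).sum()
    return ll
```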

Based on the augmented likelihood and the prior distributions, one can write down the posterior distribution, up to a normalizing constant, via Bayes' Theorem. The posterior has the expression
$$ \pi(x \mid \text{data}) \;\propto\; l(G, \omega, K, p, \mu, \sigma, W, Z)\; \pi(W, Z \mid G, K, \omega, p)\; \pi(G, K, \omega, p, \mu, \sigma), \tag{2.8} $$
based on (2.5), (2.6) and (2.7).

3. Posterior Inference. The total number of unknown parameters in the hierarchical mixture model depends on the values of $G$ and $K$. Thus, the posterior in (2.8) can be viewed as a probability distribution on the space of all hierarchical mixture models with varying dimensions. To obtain posterior inference for such a space of models, Green (1995) and Green and Richardson (1997) developed the RJMCMC approach to Bayesian inference. In this paper, we develop an RJMCMC approach to explore the posterior distribution in (2.8) resulting from the hierarchical mixture model specification.

We briefly discuss the most general RJMCMC framework here. Let $x$ and $y$ be elements of the model space with possibly differing dimensions. The RJMCMC approach proposes a move, say $m$, with probability $r_m$. The move $m$ takes $x$ to $y$ via the proposal distribution $q_m(x, y)$. In order to maintain the time-reversibility condition, the proposal is accepted with probability
$$ \alpha(x, y) \;=\; \min\left\{ 1,\; \frac{\pi(y \mid \text{data})\; r_{\bar m}\; q_{\bar m}(y, x)}{\pi(x \mid \text{data})\; r_m\; q_m(x, y)} \right\}; \tag{3.1} $$
in (3.1), $q_{\bar m}(y, x)$ represents the probability of moving from $y$ to $x$ based on the reverse move $\bar m$, and $\pi(x \mid \text{data})$ denotes the posterior distribution of $x$ given the data. It is crucial that the moves $m$ and $\bar m$ be reversible (see Green 1995), meaning that the densities $q_m(x, y)$ and $q_{\bar m}(y, x)$ have the same support with respect to a dominating measure. In case $y$ represents the higher-dimensional model, we can first sample $u$ from a proposal $q_0(x, u)$, with possible dependence on $x$, and then obtain $y$ as a one-to-one function of $(x, u)$. In that case, the proposal density $q_m(x, y)$ in (3.1) is expressed as
$$ q_m(x, y) \;=\; q_0(x, u) \Big/ \left| \det\!\left[ \frac{\partial y}{\partial (x, u)} \right] \right|, \tag{3.2} $$
where $\partial y/\partial(x, u)$ denotes the Jacobian of the transformation from $(x, u)$ to $y$, and $\det[\cdot]$ represents the absolute value of its determinant. If the triplet $(x, u, y)$ involves some discrete components, then the Jacobian of the transformation is obtained from the one-to-one map between the continuous parts of $y$ and $(x, u)$, which can depend on the values realized by the discrete components.
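The acceptance step (3.1) has the same shape for every reversible pair $(m, \bar m)$. The sketch below shows that shape with all densities passed in on the log scale; working on the log scale is an implementation detail not prescribed by the paper, but standard for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(2)

def rjmcmc_accept(log_post_x, log_post_y,
                  log_r_m, log_q_m_xy,        # forward move m: x -> y
                  log_r_mbar, log_q_mbar_yx): # reverse move m-bar: y -> x
    """Return True if the proposed state y is accepted (eq. 3.1)."""
    log_alpha = min(0.0, (log_post_y + log_r_mbar + log_q_mbar_yx)
                        - (log_post_x + log_r_m + log_q_m_xy))
    return np.log(rng.uniform()) < log_alpha
```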

For the inference on hierarchical mixture models, five types of updating steps are considered, with reversible pairs of moves, $(m, \bar m)$, corresponding to moves in spaces of varying dimensions. The steps are:
(1) Update $G$ with $(m, \bar m) = (G\text{-split}, G\text{-merge})$,
(2) Update $K \mid G, \omega, W$ with $(m, \bar m) = (K\text{-split}, K\text{-merge})$,
(3) Update $\omega \mid G, K, W, Z, p, \mu, \sigma$,
(4) Update $(W, Z) \mid G, K, \omega, p, \mu, \sigma$, and
(5) Update $(p, \mu, \sigma) \mid G, K, \omega, W, Z$. (3.3)
Steps 3-5 do not involve jumps between spaces of varying dimensions and can be carried out based on regular Gibbs proposals.

3.1. Update G. To discuss the G-split and G-merge moves, we let $x$ and $y$ denote two different states of the model space, that is,
$$ x = (G, \omega, K, p, \mu, \sigma, W, Z) \quad\text{and}\quad y = (G^*, \omega^*, K^*, p^*, \mu^*, \sigma^*, W^*, Z^*), \tag{3.4} $$
where the $*$'s in (3.4) denote a possibly different setting of the parameters.

3.1.1. The G-merge move. The G-merge move changes the current $G$ to $G - 1$ (that is, $G^* = G - 1$) and is carried out based on the following steps:

STEP 1: First, two of the $G$ components, say $g_1$ and $g_2$ with $g_1 < g_2$, are selected randomly for merging into a new component $g^*$. The first-level mixing proportions are merged as $\omega^*_{g^*} = \omega_{g_1} + \omega_{g_2}$.

STEP 2: The $K$-components in $K$ corresponding to $g_1$ and $g_2$ are, respectively, $K_{g_1}$ and $K_{g_2}$. These are combined to obtain the new $K$-value, $K^*_{g^*}$, in the following way. Writing $K_{g_1} + K_{g_2} = K_t$,
$$ K^*_{g^*} \;=\; \begin{cases} (K_t + 1)/2 & \text{if } K_t \text{ is odd, and} \\ K_t/2 & \text{if } K_t \text{ is even.} \end{cases} \tag{3.5} $$

STEP 3: Next, $(p_{g_1}, \mu_{g_1}, \sigma_{g_1})$ and $(p_{g_2}, \mu_{g_2}, \sigma_{g_2})$ are merged to obtain $(p^*_{g^*}, \mu^*_{g^*}, \sigma^*_{g^*})$ as follows. The identifiability condition (1.6) holds for $g = g_1$ and $g = g_2$, and must be ensured to hold for $g = g^*$ after the merge step. To achieve this, the $K_t$ $\mu$'s are arranged in increasing order,
$$ \mu_{(1)} \preceq \mu_{(2)} \preceq \cdots \preceq \mu_{(K_t - 1)} \preceq \mu_{(K_t)}, \tag{3.6} $$
with associated probability $p_{(j)}$ for $\mu_{(j)}$, $j = 1, 2, \ldots, K_t$. Thus, the $p_{(j)}$ are a rearrangement of the $K_t$ probabilities in $p_{g_1}$ and $p_{g_2}$ according to the partial ordering of $\mu_{g_1}$ and $\mu_{g_2}$ in (3.6). First, the case when $K_t$ is even is considered. Adjacent $\mu$ values in (3.6) are paired,
$$ (\mu_{(1)}, \mu_{(2)}),\; (\mu_{(3)}, \mu_{(4)}),\; \ldots,\; (\mu_{(K_t - 1)}, \mu_{(K_t)}), \tag{3.7} $$
and the corresponding $g^*$ parameters are obtained using the formulas
$$ p^*_{kg^*} = \frac{p_{(2k-1)} + p_{(2k)}}{2}, \quad \mu^*_{kg^*} = \frac{p_{(2k-1)}\, \mu_{(2k-1)} + p_{(2k)}\, \mu_{(2k)}}{p_{(2k-1)} + p_{(2k)}}, \quad\text{and}\quad \sigma^*_{kg^*} = \frac{p_{(2k-1)}\, \sigma_{(2k-1)} + p_{(2k)}\, \sigma_{(2k)}}{p_{(2k-1)} + p_{(2k)}}, \tag{3.8} $$
for $k = 1, 2, \ldots, K^*_{g^*}$.
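For even $K_t$, STEP 2 and STEP 3 reduce to pooling the two sets of second-level components, sorting by the first mean coordinate, pairing adjacent components, and moment-matching each pair as in (3.8). A sketch of that computation (Python/NumPy, diagonal-Gaussian parameters as in (1.4); illustrative code, not the authors' implementation):

```python
import numpy as np

def g_merge_even(p1, mu1, var1, p2, mu2, var2):
    """Merge the second-level parameters of clusters g1 and g2 when K_t is even.

    Components are pooled, sorted by their first mean coordinate (the partial
    ordering (3.6)), paired adjacently (3.7), and combined via (3.8).
    """
    p = np.concatenate([p1, p2])           # K_t pooled mixing probabilities
    mu = np.vstack([mu1, mu2])
    var = np.vstack([var1, var2])
    order = np.argsort(mu[:, 0])           # ordering (3.6)
    p, mu, var = p[order], mu[order], var[order]

    a, b = slice(0, None, 2), slice(1, None, 2)   # positions 2k-1 and 2k
    w = p[a] + p[b]
    p_new = w / 2.0                               # merged weights sum to 1
    mu_new = (p[a, None] * mu[a] + p[b, None] * mu[b]) / w[:, None]
    var_new = (p[a, None] * var[a] + p[b, None] * var[b]) / w[:, None]
    return p_new, mu_new, var_new
```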

STEP 4: To obtain $W^*$ and $Z^*$, objects with $W_i = g_1$ or $W_i = g_2$ are relabelled as $W^*_i = g^*$. For these objects, the allocation to the $K^*_{g^*}$ components is carried out using a Bayes allocation scheme. The probability that observation $x_{ij}$ is assigned to component $k$ is
$$ P\big(Z^*_{ij} = k \mid W^*_i = g^*\big) \;=\; \frac{p^*_{kg^*}\, \phi_d(x_{ij} \mid \mu^*_{kg^*}, \sigma^*_{kg^*})}{\sum_{k'=1}^{K^*_{g^*}} p^*_{k'g^*}\, \phi_d(x_{ij} \mid \mu^*_{k'g^*}, \sigma^*_{k'g^*})} \tag{3.9} $$
for $k = 1, 2, \ldots, K^*_{g^*}$. The allocation probability of all the $x_{ij}$ to the $K^*_{g^*}$ components is the product of the above probabilities, namely,
$$ P(\text{merge-alloc}) \;=\; \prod_{i\,:\, W^*_i = g^*}\; \prod_{j=1}^{n_i} P\big(Z^*_{ij} = k_{ij} \mid W^*_i = g^*\big), \tag{3.10} $$
where $k_{ij}$ are the realized values of $k$ when the allocation is done for each observation $x_{ij}$.

When $K_t$ is odd, an index $i_0$ is selected at random from the set of all odd integers up to $K_t$, namely $\{1, 3, 5, \ldots, K_t\}$. The triplet $(p_{(i_0)}, \mu_{(i_0)}, \sigma_{(i_0)})$ is not merged with any other index, but the new probability is $p^*_{i_0} = p_{(i_0)}/2$. The remaining adjacent indices are merged according to STEP 3 above. For the G-merge step, the proposal density, $q_{\bar m}(x, y)$, is given by
$$ q_{\bar m}(x, y) \;=\; \begin{cases} \binom{G}{2}^{-1} P(\text{merge-alloc}) & \text{if } K_t \text{ is even, and} \\[4pt] \binom{G}{2}^{-1} \dfrac{2}{K_t + 1}\, P(\text{merge-alloc}) & \text{if } K_t \text{ is odd.} \end{cases} \tag{3.11} $$
This completes the G-merge move.

3.1.2. The G-split move. The split move is the reverse of the merge move above and is carried out in the following steps:

STEP 1: A candidate $G$-component for splitting, say $g$, is chosen randomly with probability $1/G$. The split components are denoted by $g_1$ and $g_2$. The first-level mixing probability, $\omega_g$, is split into $\omega^*_{g_1}$ and $\omega^*_{g_2}$ by generating a uniform random variable, $u_0$, on $[0, 1]$ and setting $\omega^*_{g_1} = u_0\, \omega_g$ and $\omega^*_{g_2} = (1 - u_0)\, \omega_g$.

STEP 2: The value of $K_g$ is transformed to $K_t$, where $K_t$ is either $2K_g - 1$ or $2K_g$ with probability $1/2$ each. Once $K_t$ is determined, a pair of indices $(K_{g_1}, K_{g_2})$ is selected randomly from the set of all possible pairs of integers in $\{K_{\min}, K_{\min}+1, \ldots, K_{\max}\}^2$ satisfying $K_{g_1} + K_{g_2} = K_t$. If $M_0$ is the total number of such pairs, then the probability of selecting one such pair is $1/M_0$. The selection of $K_{g_1}$ and $K_{g_2}$ determines the number of second-level components in the $g_1$ and $g_2$ groups.

STEP 3: The aim now is to split each component of the triplet $(p_g, \mu_g, \sigma_g)$ into two parts, $(p_{g_1}, \mu_{g_1}, \sigma_{g_1})$ and $(p_{g_2}, \mu_{g_2}, \sigma_{g_2})$, such that both $\mu_{g_1}$ and $\mu_{g_2}$ satisfy the constraints (1.6) for $g = g_1$ and $g_2$. The case $K_{g_1} + K_{g_2} = 2K_g$ is considered first.

Fig 1. Diagram showing the split of $2p_g$, $\mu_g$ and $\sigma_g$. The partial ordering refers to the ordering of the $\mu^{(1)}_{kg}$'s. The variables $u_{kg}$, $k = 1, 2, \ldots, K_g$, determine how many of the two splits go to component $g_1$ for each $k$. The right arrows represent the sequential split for $\mu_g$ and $\sigma_g$.

A sketch of the split move is best described by the diagram in Figure 1, which introduces the additional variables to be used for performing the split. In Figure 1, $2p_g$ is considered for splitting because the two split components will represent the second-level mixing probabilities of $g_1$ and $g_2$, which together sum to 2. For each $k$, the variable $u_{kg}$ in Figure 1 takes three values, namely 0, 1 and 2, which respectively determine whether the split components of $(2p_{kg}, \mu_{kg}, \sigma_{kg})$ (1) both go to component $g_2$, (2) one goes to component $g_1$ and the other to $g_2$, or (3) both go to $g_1$. The variables $u_{kg}$, $k = 1, 2, \ldots, K_g$, must satisfy several constraints: (1) $\sum_{k=1}^{K_g} u_{kg} = K_{g_1}$, (2) $u_{kg} = 1$ for any $k$ such that $p_{kg} > 0.5$, and (3) $\sum_{k : u_{kg} = h} 2p_{kg} < 1$ for $h = 0, 2$. Restriction (1) means that the number of components that go to $g_1$ must be $K_{g_1}$, which has already been pre-selected. The need for restriction (2) can be seen as follows: if $u_{kg} = 0$ or 2 and $p_{kg} > 0.5$, the total probability $2p_{kg}$ would be assigned wholly to $g_1$ or $g_2$, and the sum of the second-level mixing probabilities for that component would be greater than 1, which is not possible. Restriction (3) is necessary to ensure that the second-level mixing probabilities for both $g_1$ and $g_2$ are non-negative (see equation (3.13)). To generate the vector $u \equiv (u_{1g}, u_{2g}, \ldots, u_{K_g g})$, we consider all combinations $u \in \{0, 1, 2\}^{K_g}$ and reject the ones that do not satisfy the three restrictions. From the total number of remaining admissible combinations, $M_1$ say, we select a vector $u$ randomly with equal probability $1/M_1$.
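Because the candidate set has at most $3^{K_g}$ elements, the admissible combinations (and hence $M_1$) can be obtained by direct enumeration. A sketch of that enumeration, checking the three restrictions above (illustrative Python, not the authors' implementation):

```python
import numpy as np
from itertools import product

def admissible_u(p_g, K_g1):
    """Enumerate all u in {0,1,2}^{K_g} satisfying restrictions (1)-(3)."""
    K_g = len(p_g)
    admissible = []
    for u in product((0, 1, 2), repeat=K_g):
        u = np.array(u)
        if u.sum() != K_g1:                                    # restriction (1)
            continue
        if np.any((p_g > 0.5) & (u != 1)):                     # restriction (2)
            continue
        if any((2 * p_g[u == h]).sum() >= 1 for h in (0, 2)):  # restriction (3)
            continue
        admissible.append(u)
    return admissible

p_g = np.array([0.2, 0.3, 0.5])    # illustrative second-level weights
us = admissible_u(p_g, K_g1=3)     # K_g1 was pre-selected in STEP 2
M_1 = len(us)                      # u is then drawn uniformly from these M_1 vectors
```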

Once $u$ has been generated, a random vector $v \equiv (v_{kg},\ k = 1, 2, \ldots, K_g)$ is generated to split $2p_g$ (see Figure 1). Some notation is in order: let $A_0 = \{k : u_{kg} = 0\}$, $A_1 = \{k : u_{kg} = 1\}$ and $A_2 = \{k : u_{kg} = 2\}$. As in the case of $u$, a few restrictions also need to be placed on the vector $v$. To see what these restrictions are, we denote
$$ p^{(1)}_{kg} = 2 v_{kg}\, p_{kg} \quad\text{and}\quad p^{(2)}_{kg} = 2 (1 - v_{kg})\, p_{kg}, \tag{3.12} $$
for $k = 1, 2, \ldots, K_g$, to be the split components of $2p_{kg}$. Note that, depending on the value of $u_{kg} = 0, 1$ or 2, the split components $p^{(1)}_{kg}$ and $p^{(2)}_{kg}$ are either both assigned to component $g_2$, one to $g_1$ and the other to $g_2$, or both to $g_1$. For the case $u_{kg} = 1$, we will assume that $p^{(1)}_{kg}$ is the split probability that goes to $g_1$ and $p^{(2)}_{kg}$ goes to $g_2$. Note that the mixing probabilities for each of the components $g_1$ and $g_2$ should sum to 1. This implies
$$ \sum_{k \in A_1} p^{(1)}_{kg} + \sum_{k \in A_2} 2p_{kg} = 1 \quad\text{and}\quad \sum_{k \in A_1} p^{(2)}_{kg} + \sum_{k \in A_0} 2p_{kg} = 1, \tag{3.13} $$
for components $g_1$ and $g_2$, respectively. The second equation of (3.13) is redundant given the first, since $\sum_{k \in A_1} p^{(1)}_{kg} + \sum_{k \in A_2} 2p_{kg} + \sum_{k \in A_1} p^{(2)}_{kg} + \sum_{k \in A_0} 2p_{kg} = 2 \sum_{k=1}^{K_g} p_{kg} = 2$. We re-write the first equation as
$$ \sum_{k \in A_1} a_k v_{kg} = 1, \tag{3.14} $$
where $a_k = 2p_{kg} \big/ \big(1 - \sum_{k' \in A_2} 2p_{k'g}\big)$. Equation (3.14) implies that the entries of the vector $v$ are required to satisfy two restrictions: (1) $0 \le v_{kg} \le 1$ for $k = 1, 2, \ldots, K_g$, from (3.12), and (2) equation (3.14) above. In the Appendix, an algorithm is given to generate such a $v$ for which the proposal density can be written down in closed form (see (7.2)).

The next step in the G-split move is to split $\mu_g$ and $\sigma_g$. Each component of $\mu_g = (\mu_{kg},\ k = 1, 2, \ldots, K_g)$ and $\sigma_g = (\sigma_{kg},\ k = 1, 2, \ldots, K_g)$ in Figure 1 is split sequentially, starting from $k = 1$, then $k = 2$, and so on until $k = K_g$. At the $k$-th stage, $\mu_{kg}$ is split into the components $y_{kg}$ and $\tilde y_{kg}$, where $y_{kg}$ is a $d$-dimensional vector consisting of the entries $(y^{(1)}_{kg}, y^{(2)}_{kg}, \ldots, y^{(d)}_{kg})$ and $\tilde y_{kg} = (\tilde y^{(1)}_{kg}, \tilde y^{(2)}_{kg}, \ldots, \tilde y^{(d)}_{kg})$. Similarly, $\sigma_{kg}$ is split into the components $z_{kg} \equiv (z^{(1)}_{kg}, z^{(2)}_{kg}, \ldots, z^{(d)}_{kg})$ and $\tilde z_{kg} \equiv (\tilde z^{(1)}_{kg}, \tilde z^{(2)}_{kg}, \ldots, \tilde z^{(d)}_{kg})$. The collections $\{y_{kg},\ k = 1, 2, \ldots, K_g\}$ and $\{z_{kg},\ k = 1, 2, \ldots, K_g\}$ are denoted by $y$ and $z$, respectively, and represent the additional variables that need to be generated for the split via the proposal distribution $q_0(y, z)$, say. The remaining variables (those with tildes) are obtained by solving the vector equations
$$ \frac{p^{(1)}_{kg}\, y_{kg} + p^{(2)}_{kg}\, \tilde y_{kg}}{p^{(1)}_{kg} + p^{(2)}_{kg}} = \mu_{kg} \quad\text{and}\quad \frac{p^{(1)}_{kg}\, z_{kg} + p^{(2)}_{kg}\, \tilde z_{kg}}{p^{(1)}_{kg} + p^{(2)}_{kg}} = \sigma_{kg} \tag{3.15} $$
componentwise.
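Given the generated quantities $(v_{kg}, y_{kg}, z_{kg})$, the tilde variables follow from (3.12) and (3.15) by direct rearrangement. A sketch of that step for a single component $k$ (vectorized over the $d$ coordinates; illustrative Python, not the authors' code):

```python
import numpy as np

def split_component(p_kg, mu_kg, var_kg, v_kg, y_kg, z_kg):
    """Split (2*p_kg, mu_kg, sigma_kg) into two parts per (3.12) and (3.15).

    Returns (p1, p2, y, y_tilde, z, z_tilde); y and z are the generated values,
    and the tildes are solved from the moment-matching equations (3.15).
    """
    p1 = 2.0 * v_kg * p_kg                     # (3.12)
    p2 = 2.0 * (1.0 - v_kg) * p_kg
    # (3.15): (p1*y + p2*y_tilde)/(p1+p2) = mu  =>  y_tilde = ((p1+p2)*mu - p1*y)/p2
    y_tilde = ((p1 + p2) * mu_kg - p1 * y_kg) / p2
    z_tilde = ((p1 + p2) * var_kg - p1 * z_kg) / p2
    # note: the proposal q_0(y, z) must be chosen so that z_tilde stays
    # non-negative and the ordering (1.6) is preserved (see (3.18) below)
    return p1, p2, y_kg, y_tilde, z_kg, z_tilde
```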

We describe the split move further to see what properties $q_0(y, z)$ should satisfy. While the values of $y_{kg}$ and $\tilde y_{kg}$ (respectively, $z_{kg}$ and $\tilde z_{kg}$) are candidate values for $\mu^*_{k g_1}$ and $\mu^*_{k g_2}$ (respectively, $\sigma^*_{k g_1}$ and $\sigma^*_{k g_2}$), they are not quite so yet, since $\mu_{g_1}$ and $\mu_{g_2}$ must satisfy the constraints (1.6). To achieve this, we introduce two functions operating on $d$-dimensional vectors. The vector-valued min and max functions are defined as follows: for each $s = 1, 2, \ldots, S$, let $a_s$ and $b_s$ denote two $d$-dimensional vectors given by $a_s = (a^{(1)}_s, a^{(2)}_s, \ldots, a^{(d)}_s)$ and $b_s = (b^{(1)}_s, b^{(2)}_s, \ldots, b^{(d)}_s)$. We define
$$ \min(a_s, b_s) = a_s \quad\text{and}\quad \max(a_s, b_s) = b_s \tag{3.16} $$
for each $s = 1, 2, \ldots, S$ if $a_1 \preceq b_1$ (recall that this is, by definition, $a^{(1)}_1 \le b^{(1)}_1$), and vice versa when $b_1 \preceq a_1$. Thus, the maximum and minimum functions above operate on the indices $s \ne 1$, with output depending on the index $s = 1$. In the present case, consider the maximum and minimum functions defined as in (3.16) for each $k = 1, 2, \ldots, K_g$. Here, $S = 3$ with $a_1 = y_{kg}$, $a_2 = z_{kg}$ and $a_3 = p^{(1)}_{kg}$, and $b_1 = \tilde y_{kg}$, $b_2 = \tilde z_{kg}$ and $b_3 = p^{(2)}_{kg}$. If $y_{kg} \preceq \tilde y_{kg}$, then it follows that
$$ \min(y_{kg}, \tilde y_{kg}) = y_{kg}, \quad \max(y_{kg}, \tilde y_{kg}) = \tilde y_{kg}, \quad \min(z_{kg}, \tilde z_{kg}) = z_{kg}, \quad \max(z_{kg}, \tilde z_{kg}) = \tilde z_{kg}, \quad\text{and}\quad \min(p^{(1)}_{kg}, p^{(2)}_{kg}) = p^{(1)}_{kg}, \quad \max(p^{(1)}_{kg}, p^{(2)}_{kg}) = p^{(2)}_{kg}; \tag{3.17} $$
if $\tilde y_{kg} \preceq y_{kg}$, the opposite holds true. To ensure that the constraints (1.6) hold, $y_{kg}$ is generated in such a way that
$$ \max(y_{k-1,g}, \tilde y_{k-1,g}) \;\preceq\; y_{kg},\, \tilde y_{kg} \;\preceq\; \mu_{k+1,g} \tag{3.18} $$
in the sequential procedure for $k = 1, 2, \ldots, K_g$. In (3.18), the maximum function $\max(y_{0g}, \tilde y_{0g})$ for $k = 1$ (respectively, $\mu_{K_g+1, g}$ for $k = K_g$) is defined to be the vector of lower (respectively, upper) bounds for the means. In the application to fingerprint images in Section 6, each image has a fixed size, which implies that the componentwise lower and upper bounds are, respectively, 0 and 500. For $z_{kg}$ and $\tilde z_{kg}$, we require these variables to satisfy the constraints $z_{kg} \ge 0$ and $\tilde z_{kg} \ge 0$ componentwise, since they are candidate values for $\sigma^*_{k g_1}$ and $\sigma^*_{k g_2}$. Thus, the generation of $y$ and $z$ via the proposal distribution $q_0(y, z)$ requires that (3.18), $z_{kg} \ge 0$ and $\tilde z_{kg} \ge 0$ be satisfied. A proposal density that achieves this is discussed in the Appendix.

The values of each triplet $(p_{g_1}, \mu_{g_1}, \sigma_{g_1})$ and $(p_{g_2}, \mu_{g_2}, \sigma_{g_2})$ can now be obtained. A sequential procedure is again adopted. The post-split parameters $p_{g_h}$, $\mu_{g_h}$ and $\sigma_{g_h}$, $h = 1, 2$, are initialized to the empty set. Starting from $k = 1$, the sets are appended as follows. For $h = 0, 2$, define $h_1 = 2$ and $h_2 = 1$ if $h = 0$, and $h_1 = 1$ and $h_2 = 2$ if $h = 2$. For $k = 1, 2, \ldots, K_g$, if $k \in A_h$,
$$ \begin{aligned} p_{g_{h_1}} &= \big(p_{g_{h_1}}, \min(p^{(1)}_{kg}, p^{(2)}_{kg}), \max(p^{(1)}_{kg}, p^{(2)}_{kg})\big), & p_{g_{h_2}} &= p_{g_{h_2}}, \\ \mu_{g_{h_1}} &= \big(\mu_{g_{h_1}}, \min(y_{kg}, \tilde y_{kg}), \max(y_{kg}, \tilde y_{kg})\big), & \mu_{g_{h_2}} &= \mu_{g_{h_2}}, \\ \sigma_{g_{h_1}} &= \big(\sigma_{g_{h_1}}, \min(z_{kg}, \tilde z_{kg}), \max(z_{kg}, \tilde z_{kg})\big), & \sigma_{g_{h_2}} &= \sigma_{g_{h_2}}, \end{aligned} \tag{3.19} $$
and if $k \in A_1$, $p_{g_1} = (p_{g_1}, p^{(1)}_{kg})$, $p_{g_2} = (p_{g_2}, p^{(2)}_{kg})$, $\mu_{g_1} = (\mu_{g_1}, y_{kg})$, $\mu_{g_2} = (\mu_{g_2}, \tilde y_{kg})$, $\sigma_{g_1} = (\sigma_{g_1}, z_{kg})$, and $\sigma_{g_2} = (\sigma_{g_2}, \tilde z_{kg})$.

The above procedure guarantees that the post-split components $\mu_{g_1}$ and $\mu_{g_2}$ satisfy the constraints (1.6). At this point, we can explicitly determine some of the components of $y$ in equation (3.4): we have $G^* = G + 1$, $K^* = K \cup \{K_{g_1}, K_{g_2}\} \setminus \{K_g\}$, $p^* = p \cup \{p_{g_1}, p_{g_2}\} \setminus \{p_g\}$, $\mu^* = \mu \cup \{\mu_{g_1}, \mu_{g_2}\} \setminus \{\mu_g\}$ and $\sigma^* = \sigma \cup \{\sigma_{g_1}, \sigma_{g_2}\} \setminus \{\sigma_g\}$.

When $K_{g_1} + K_{g_2} = 2K_g - 1$, an index $i_0$ is selected from the set $I_0 = \{k : 2p_{kg} < 1\}$ with probability $1/|I_0|$. The component with index $i_0$ is not split, and is assigned a value of $u_{i_0 g}$ of either 0 or 2. For this case, $u = (u_{1g}, u_{2g}, \ldots, u_{K_g - 1, g}, u_{i_0 g})$ is chosen from the product space $\{0, 1, 2\}^{K_g - 1} \times \{0, 2\}$, with $M_1$ denoting the number of admissible combinations satisfying the three restrictions on $u$. After selecting a $u$, we define $p^*_{i_0 g^*} = 2p_{i_0 g}$, $y_{i_0 g} = \tilde y_{i_0 g} = \mu_{i_0 g}$, and $z_{i_0 g} = \tilde z_{i_0 g} = \sigma_{i_0 g}$, where $g^*$ is either $g_1$ or $g_2$ depending on the selected $u_{i_0 g}$. The split procedure above is carried out for the remaining indices $k \ne i_0$.

STEP 4: To complete the G-split proposal, we need to obtain the new first- and second-level labels, $W^*$ and $Z^*$, in $y$ (see (3.4)). All objects with labels $W_i = g$ are split into either $W^*_i = g_1$ or $W^*_i = g_2$ with allocation probabilities obtained as follows. Define $Q_i(g_h) = \prod_{j=1}^{n_i} \sum_{k=1}^{K_{g_h}} p_{k g_h}\, \phi_d(x_{ij} \mid \mu_{k g_h}, \sigma_{k g_h})$ with $h = 1, 2$ for the $i$-th object. The $W$-allocation probabilities for components $g_1$ and $g_2$ are given by
$$ P(W^*_i = g_h) \;=\; \frac{Q_i(g_h)}{Q_i(g_1) + Q_i(g_2)} \tag{3.20} $$
for $h = 1, 2$. Once $W^*_i$ has been determined, the $Z^*_{ij}$'s are determined from the Bayes allocation probabilities (3.9), denoted here by $Q_{ij}(k, g_h)$ for $h = 1, 2$. It follows that the allocation probability for the G-split move is
$$ P(\text{split-alloc}) \;=\; \prod_{h=1,2}\; \prod_{i\,:\, W^*_i = g_{ih}} Q_i(g_{ih}) \prod_{j=1}^{n_i} Q_{ij}(k_{ij}, g_{ih}), \tag{3.21} $$
where $g_{ih}$ is the realized value of $g_h$ for the $i$-th object, and $k_{ij}$ are the realized values of $k$ for the second-level labels $Z^*_{ij}$. Dass and Li (2008) give the proposal density for the G-split move as
$$ q_m(x, y) \;=\; \frac{R_0\; q_0(v)\; q_0(y, z)\; P(\text{split-alloc})}{G\, M_0\, M_1} \Big/ \left| \det\!\left[ \frac{\partial y}{\partial (x, u)} \right] \right|, \tag{3.22} $$
where $R_0 = 1/2$ or $1/(2|I_0|)$ according to whether $K_{g_1} + K_{g_2} = 2K_g$ or $2K_g - 1$ is chosen; in (3.22),
$$ \left| \det\!\left[ \frac{\partial y}{\partial (x, u)} \right] \right| \;=\; 2^{2K_g - 1}\, \omega_g \prod_{k \in A_0 \cup A_2 \cup A_1^c} 2p_{kg} \prod_{k=1}^{K_g} \Big(1 + \frac{p^{(1)}_{kg}}{p^{(2)}_{kg}}\Big)^{2d} \tag{3.23} $$
is the absolute value of the Jacobian of the transformation from $(x, u)$ to $y$, and $A_1^c$ is the set $A_1$ excluding its largest element. The explicit expression (3.23) is derived in the Appendix.
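The label allocation in STEP 4 is the piece of the proposal that involves the data. The sketch below carries out, for one object, the W-allocation (3.20) followed by the within-cluster Z-allocation (3.9), accumulating the log of the realized allocation probabilities that enter (3.21); it assumes diagonal-Gaussian components and is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(3)

def allocate_object(x_i, p_h, mu_h, var_h):
    """Allocate one object (and its observations) between split components g1, g2.

    x_i : (n_i, d) observations of object i
    p_h, mu_h, var_h : length-2 lists of second-level parameters for (g_1, g_2)
    Returns (h, log allocation probability contribution).
    """
    def log_phi(h):
        return norm.logpdf(x_i[:, None, :], mu_h[h][None, :, :],
                           np.sqrt(var_h[h])[None, :, :]).sum(axis=2)

    # log Q_i(g_h) = sum_j log sum_k p_kh * phi_d(x_ij | mu_kh, sigma_kh)
    log_Q = [logsumexp(np.log(p_h[h])[None, :] + log_phi(h), axis=1).sum()
             for h in range(2)]
    log_w = np.array(log_Q) - logsumexp(log_Q)          # (3.20)
    h = rng.choice(2, p=np.exp(log_w))
    log_alloc = log_w[h]

    # (3.9): Bayes allocation of each observation within the chosen g_h
    log_post = np.log(p_h[h])[None, :] + log_phi(h)
    log_post -= logsumexp(log_post, axis=1, keepdims=True)
    for j in range(x_i.shape[0]):
        k = rng.choice(log_post.shape[1], p=np.exp(log_post[j]))
        log_alloc += log_post[j, k]
    return h, log_alloc
```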

Fig 2. The G-split and G-merge proposals as a reversible pair of moves: the G-merge takes $x$ to $y$, and the G-split takes $(y, u)$ back to $x$.

We conclude the G-split and G-merge sections with a note on establishing the reversibility of the two moves. The G-merge proposal move $\bar m$ takes $x$ to $y$ with the proposal density $q_{\bar m}(x, y)$ given in (3.11). However, the acceptance probability in (3.1) also requires the proposal density, $q_m(y, x)$, for moving from $y$ to $x$ based on the reverse move $m$. In order to show that $m$ is precisely the G-split move, we need to show that, given $x$ and $y$, there is a unique $u$ such that $x$ can be obtained from the combination $(y, u)$ via the G-split move (see Figure 2). We demonstrate this in the next paragraph.

For the G-split move, the vector of auxiliary variables is given by $u = (u_0, K_t, u, v, y, z)$. These variables have the same interpretation as in the G-split move discussed earlier. Now we check whether $u$ can be determined from $x$ and $y$. First, the variable $u_0$ can be determined from $u_0 = \omega_{g_1}/\omega_g$. Second, the value of $K_t = K_{g_1} + K_{g_2}$. Note that $K_g$ alone cannot determine $K_t$, since $K_t$ is either $2K_g$ or $2K_g - 1$ with probability $1/2$ each; however, with information on $K_{g_1}$ and $K_{g_2}$, $K_t$ is uniquely determined. Next, to get $u$, we rearrange the components of $\mu_{g_1}$ and $\mu_{g_2}$ in the increasing order (3.6). If $K_t$ is even, $K_g = K_t/2$ adjacent pairs of $\mu$'s are formed, and $u_{kg}$ is assigned the value 0, 1 or 2, since it is known from which component (either $g_1$ or $g_2$) the two means in each of the $k$ pairs came. The case of odd $K_t$ can be handled similarly, since one of the $\mu$ components is not paired, and subsequently $u$ for that component is either 2 or 0 depending on whether that $\mu$ component came from $g_1$ or $g_2$. Once $u$ is obtained, $v$ can be determined in the following way: the components of $p_{g_1}$ and $p_{g_2}$ are arranged according to the increasing order of the $\mu$'s. Suppose the $k$-th pair consists of $\mu_{k'g'} \preceq \mu_{k''g''}$ with corresponding probabilities $p_{k'g'}$ and $p_{k''g''}$, where $k'$, $k''$, $g'$ and $g''$ are the indices of $k$ and $g$ resulting from the ordering. The value of $v_{kg} = p_{k'g'}/(2p_{kg})$, where $p_{kg} = (p_{k'g'} + p_{k''g''})/2$ is the merged probability in the G-merge move. Next, we obtain the values of $y_{kg}$ and $z_{kg}$. In the case $u_{kg} = 1$, $y_{kg}$ (respectively, $\tilde y_{kg}$) equals the $\mu$-value that came from component $g_1$ (respectively, $g_2$) in the pair $(\mu_{k'g'}, \mu_{k''g''})$. In the case when $u_{kg} = 0$ or 2, $y_{kg}$ and $\tilde y_{kg}$ can be determined only up to $\min(y_{kg}, \tilde y_{kg})$ and $\max(y_{kg}, \tilde y_{kg})$. Subsequently, the proposal density

$q_0(y, z)$ depends only on $\min(y_{kg}, \tilde y_{kg})$ and $\max(y_{kg}, \tilde y_{kg})$ for these values of $u$. A similar argument can be made for $z_{kg}$ and $\tilde z_{kg}$.

3.2. Update K. We consider the move types K-split (type $m$) and K-merge (type $\bar m$). The update of $K$ is carried out for fixed $G$ by selecting a component $g$ for which $K_g$ will be updated to either $K_g - 1$ or $K_g + 1$. For the K-merge move, we select two adjacent components for merging, where adjacency is determined by the partial ordering (1.6). The merged mixing probability, mean and variance for the new component, $k^*$, are given by
$$ p^*_{k^* g} = p_{kg} + p_{k+1,g}, \quad \mu^*_{k^* g} = \frac{p_{kg}\, \mu_{kg} + p_{k+1,g}\, \mu_{k+1,g}}{p_{kg} + p_{k+1,g}} \quad\text{and}\quad \sigma^*_{k^* g} = \frac{p_{kg}\, \sigma_{kg} + p_{k+1,g}\, \sigma_{k+1,g}}{p_{kg} + p_{k+1,g}}. \tag{3.24} $$
The observations with $W_i = g$ and $Z_{ij} = k$ or $k+1$ are merged into a newly relabelled bin with $W^*_i = g$ and $Z^*_{ij} = k^*$.

For the K-split move, we first select the component to be split, $k$, which will be split into $k_1$ and $k_2$. A uniform random variable $u_0$ is generated to split $p_{kg}$ into $p^*_{k_1 g}$ and $p^*_{k_2 g}$ in the following way:
$$ p^*_{k_1 g} = u_0\, p_{kg} \quad\text{and}\quad p^*_{k_2 g} = (1 - u_0)\, p_{kg}. \tag{3.25} $$
Next, $\mu_{kg}$ and $\sigma_{kg}$ are split by generating the variables $y_{kg}$ and $z_{kg}$ as in the G-split move, but now for a single $k$ only. As in the G-split move, $y_{kg}$ and $\tilde y_{kg}$ are required to satisfy a similar constraint of the form
$$ \mu_{k-1,g} \;\preceq\; y_{kg},\, \tilde y_{kg} \;\preceq\; \mu_{k+1,g}, \tag{3.26} $$
so that the post-split $\mu$ parameters satisfy (3.26) and, subsequently, (1.6). Once $y_{kg}$ and $z_{kg}$ are generated, the assignments to $\mu^*_{k_h g}$ and $\sigma^*_{k_h g}$, $h = 1, 2$, are done as follows: $\mu^*_{k_1 g} = \min(y_{kg}, \tilde y_{kg})$, $\mu^*_{k_2 g} = \max(y_{kg}, \tilde y_{kg})$, $\sigma^*_{k_1 g} = \min(z_{kg}, \tilde z_{kg})$, $\sigma^*_{k_2 g} = \max(z_{kg}, \tilde z_{kg})$. Observations with $W_i = g$ and observation labels $Z_{ij} = k$ are allocated to component $k_1$ or $k_2$ based on the Bayes allocation probabilities (3.9) with fixed $W_i = g$.
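The parameter part of the K-moves is a single-pair version of the earlier moment matching. A sketch of the K-merge (3.24) and of the weight split (3.25); the split means and variances are then obtained exactly as in the G-split move via (3.15) and (3.26). Illustrative Python, not the authors' code:

```python
import numpy as np

def k_merge(p, mu, var, k):
    """Merge adjacent second-level components k and k+1 of one cluster (3.24)."""
    w = p[k] + p[k + 1]
    mu_star = (p[k] * mu[k] + p[k + 1] * mu[k + 1]) / w
    var_star = (p[k] * var[k] + p[k + 1] * var[k + 1]) / w
    p_new = np.delete(p, k + 1); p_new[k] = w
    mu_new = np.delete(mu, k + 1, axis=0); mu_new[k] = mu_star
    var_new = np.delete(var, k + 1, axis=0); var_new[k] = var_star
    return p_new, mu_new, var_new

def k_split_weights(p, k, u0):
    """Split the k-th mixing probability using a uniform u0 in [0, 1] (3.25)."""
    p_new = np.insert(p, k + 1, (1.0 - u0) * p[k])
    p_new[k] = u0 * p[k]
    return p_new
```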

3.4. Update Other Steps. The updates of the other quantities in steps 3-5 of equation (3.3) can be done via a regular Gibbs sampler, since they do not involve models in spaces of varying dimensions. We give the summary steps here for completeness.

Update $\omega \mid G, K, W, Z, p, \mu, \sigma$: The conditional posterior distribution of $\omega$ given the remaining parameters is given by
$$ \pi(\omega \mid \cdots) \;\propto\; \prod_{g=1}^{G} \omega_g^{\delta_\omega + N_g - 1}\; I(\omega_1 < \omega_2 < \cdots < \omega_G), \tag{3.27} $$
where $N_g = \sum_{i=1}^{N} I(W_i = g)$ is the number of objects with label $W_i = g$. Equation (3.27) is the order-statistic distribution of a Dirichlet with parameters $(\delta_\omega + N_1, \delta_\omega + N_2, \ldots, \delta_\omega + N_G)$ and can easily be simulated from.

Update $(W, Z) \mid G, K, \omega, p, \mu, \sigma$: The conditional posterior distributions of the $(W_i, Z_i)$ are independent of each other. The update of $W_i$, and of $Z_i$ given $W_i$, based on the conditional posterior distribution is the Bayes allocation scheme of (3.20) and (3.9).

Update $(p, \mu, \sigma) \mid G, K, \omega, W, Z$: The conditional posterior distribution of $p = \{p_g,\ g = 1, 2, \ldots, G\}$ is given by
$$ \pi(p \mid \cdots) \;\propto\; \prod_{g=1}^{G} \prod_{k=1}^{K_g} p_{kg}^{\delta_p + N_{kg} - 1}, \tag{3.28} $$
where $N_{kg} = \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(W_i = g, Z_{ij} = k)$ is the number of observations $x_{ij}$ with $W_i = g$ and $Z_{ij} = k$. Thus, each $p_g$ is independently Dirichlet with parameters $(\delta_p + N_{1g}, \delta_p + N_{2g}, \ldots, \delta_p + N_{K_g g})$. The update of $\mu$ is carried out by generating from its conditional posterior distribution $\pi(\mu \mid \cdots)$. The generation scheme for $\mu$ is as follows:
$$ \big(\mu^{(1)}_{1g}, \mu^{(1)}_{2g}, \ldots, \mu^{(1)}_{K_g g}\big) \;\sim\; \prod_{k=1}^{K_g} \phi_1\big(\mu^{(1)}_{kg} \mid \xi^{(1)}_{kg}, \eta^{(1)}_{kg}\big)\; I\big\{\mu^{(1)}_{1g} < \mu^{(1)}_{2g} < \cdots < \mu^{(1)}_{K_g g}\big\} \tag{3.29} $$
independently for each $g = 1, 2, \ldots, G$, and, for the remaining components,
$$ \mu^{(b)}_{kg} \;\sim\; \phi_1\big(\mu^{(b)}_{kg} \mid \xi^{(b)}_{kg}, \eta^{(b)}_{kg}\big) \tag{3.30} $$
independently for each $b \ge 2$, $k = 1, 2, \ldots, K_g$ and $g = 1, 2, \ldots, G$; in (3.29) and (3.30),
$$ \xi^{(b)}_{kg} = \frac{N_{kg}\, \bar x^{(b)}_{kg}/\sigma^{(b)2}_{kg} + \mu_0/\tau^2}{N_{kg}/\sigma^{(b)2}_{kg} + 1/\tau^2} \quad\text{and}\quad \eta^{(b)}_{kg} = \Big(N_{kg}/\sigma^{(b)2}_{kg} + 1/\tau^2\Big)^{-1}, \tag{3.31} $$
where $\bar x^{(b)}_{kg}$ is the average of the $x^{(b)}_{ij}$ over the observations with $W_i = g$ and $Z_{ij} = k$. Equation (3.29) is the distribution of the order statistics from independent normals with different means and variances and can be simulated easily. The variances, $\sigma$, are updated via
$$ \sigma^{(b)2}_{kg} \;\sim\; IG\big(\sigma^{(b)2}_{kg} \mid \alpha^{(b)}_{kg}, \beta^{(b)}_{kg}\big), \tag{3.32} $$
independently of each other, where $\alpha^{(b)}_{kg} = \alpha_0 + N_{kg}/2$ and $\beta^{(b)}_{kg} = \big(1/\beta_0 + \sum_{(ij)} (x^{(b)}_{ij} - \mu^{(b)}_{kg})^2/2\big)^{-1}$, with $\sum_{(ij)}$ denoting the sum over all observations with $W_i = g$ and $Z_{ij} = k$.
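These fixed-dimension updates are standard conjugate draws; the only twist is the ordering constraint on $\omega$, which can be handled by sorting a Dirichlet draw since (3.27) is its order-statistic distribution. A sketch of the $\omega$, $p$ and variance updates (the inverse-gamma parameterization matches the posterior scale in (3.32); illustrative Python, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)

def update_omega(N_g, delta_omega):
    """Draw omega from (3.27): order statistics of Dirichlet(delta+N_1,...,delta+N_G)."""
    return np.sort(rng.dirichlet(delta_omega + np.asarray(N_g, dtype=float)))

def update_p(N_kg, delta_p):
    """Draw p_g from (3.28): Dirichlet(delta_p + N_1g, ..., delta_p + N_{K_g g})."""
    return rng.dirichlet(delta_p + np.asarray(N_kg, dtype=float))

def update_var(x_bin, mu_kg, alpha0, beta0):
    """Draw the variances of one (k, g) bin from (3.32).

    x_bin : (N_kg, d) observations with W_i = g and Z_ij = k
    Uses the IG(a, b) parameterization with density proportional to
    s^-(a+1) exp(-1/(b s)), matching the posterior scale in (3.32).
    """
    N_kg, d = x_bin.shape
    a = alpha0 + N_kg / 2.0
    b = 1.0 / (1.0 / beta0 + ((x_bin - mu_kg) ** 2).sum(axis=0) / 2.0)
    # if T ~ Gamma(shape=a, scale=b) then 1/T ~ IG(a, b) in this parameterization
    return 1.0 / rng.gamma(shape=a, scale=b, size=d)
```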

Fig 3. (a) Density and (b) scatter plots for objects with univariate and bivariate observables, corresponding to d = 1 and d = 2, respectively.

3.5. Update Empty Components. The RJMCMC sampler developed here also incorporates the updating of empty components into the chain. This is done with some modification of the earlier G and K move types. Empty components can arise naturally in the sampler when allocating the observations to the $g$ or $k$ components in both the G-split and K-split moves. In the case of the G-split move, for example, it is possible that no objects are allocated to one of the split $g$ components. Instead of rejecting such a proposal altogether, we incorporate an additional variable, $E_g$, that indicates whether the $g$ component is empty; $E_g = 1$ (respectively, 0) indicates that a component is non-empty (respectively, empty). The introduction of $E_g$ adds two further steps to the RJMCMC algorithm, namely E-Add and E-Remove, which are reversible to each other. In the E-Remove move, an empty $g$ component is selected for removal. The only change in this case is in the sub-population parameters $\omega$, since after the removal of $\omega_g$ the remaining $\omega$ probabilities should sum to 1. We thus have
$$ \omega^*_{g'} = \omega_{g'}/(1 - \omega_g) \tag{3.33} $$
for $g' \ne g$. In the E-Add move type, a uniform random variable $u_0$ on $[0, 1]$ is generated and the probabilities $\omega$ are redistributed to include the empty component $g$ according to
$$ \omega^*_g = u_0 \quad\text{and}\quad \omega^*_{g'} = (1 - u_0)\, \omega_{g'} \;\text{ for } g' \ne g. \tag{3.34} $$
The proposal distributions for the E-Add and E-Remove move types and the associated Jacobians are given in the Appendix. The E-Add and E-Remove reversible move types for the $K$ components are similar and are therefore not discussed further.

4. Convergence Diagnostics. The assessment of convergence of the RJMCMC is carried out based on the methodology of Brooks and Giudici (1998, 2000), who suggest running $I \ge 2$ chains from different starting values and monitoring parameters that maintain the same interpretation across different models. Six quantities are used for monitoring, namely the overall variance, $\hat V$, the within-chain variance, $W_c$, the within-model variance, $W_m$, the within-model within-chain variance, $W_m W_c$, the between-model variance, $B_m$, and the between-model within-chain variance, $B_m W_c$.

Fig 4. Convergence diagnostics for d = 1. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as functions of the iterations. The x-axis unit is 10,000 iterations.

For each monitoring parameter, the corresponding three plots of $\hat V$ and $W_c$, $W_m$ and $W_m W_c$, and $B_m$ and $B_m W_c$ against the number of iterations should be close to each other, indicating that the chains have mixed sufficiently. Our choice of monitoring parameter is the log-likelihood of the hierarchical mixture model.

5. Simulation. Two simulation experiments were carried out for the cases $d = 1$ and $d = 2$, with prior parameter specifications given by $G_{\min} = K_{\min} = 2$, $G_{\max} = K_{\max} = 5$, $\delta_p = \delta_\omega = 1$, $\mu_0 = 7$, $\tau_0 = 20$, $\alpha_0 = 2.04$ and $\beta_0 = 0.5/(\alpha_0 - 1)$. The population of objects was simulated from $G = 3$ groups with population proportions $\omega = (0.2, 0.3, 0.5)$. The nested $K$-components were chosen to be $K_g = 3$ for all $g = 1, 2, 3$. The specifications of $p$, $\mu$ and $\sigma$ are as follows: $p_g = (0.33, 0.33, 0.34)$ for all $g$, $\mu_1 = (6, 4, 2)$, $\mu_2 = (5, 7, 9)$ and $\mu_3 = (14, 17, 20)$. Common variances were assumed: $\sigma_g = (0.5, 0.5, 0.5)$ for $g = 1, 2, 3$. For the second experiment with $d = 2$, we took $\mu_1 = \mathbf{1}\,(6, 4, 2)$, $\mu_2 = \mathbf{1}\,(5, 7, 9)$ and $\mu_3 = \mathbf{1}\,(14, 17, 20)$, where $\mathbf{1} = (1, 1)$, so that each component mean has equal coordinates. All component variances in $\sigma$ were taken to be 0.5. The total number of objects sampled from the population was $N = 100$, with $n_i$, the number of observables from the $i$-th object, drawn iid from a Discrete Uniform distribution on the integers from 20 to 40, both inclusive. The density plot for the 3 components of the hierarchical mixture model in the case $d = 1$, as well as the scatter plot for $d = 2$, based on a sample of observations from the population, are given in Figures 3(a) and (b), respectively.

In both experiments, the RJMCMC algorithm cycles through the 7 updating steps (the 5 steps in (3.3) as well as the 2 steps involving the updating of empty $G$ and $K$ components). We took the probabilities of selecting the various move types to be $r_m = r_{\bar m} = 0.5$ for the moves $(m, \bar m) = (G\text{-split}, G\text{-merge})$ for $G = G_{\min}+1, G_{\min}+2, \ldots, G_{\max}-1$; when $G = G_{\max}$, $r_m = 0 = 1 - r_{\bar m}$, and $r_{\bar m} = 0 = 1 - r_m$ when $G = G_{\min}$. We used the same probabilities for selecting the $K$ move types. The space of hierarchical mixture models with the above specifications consists of $\sum_{G=2}^{5} 4^{G} = 1{,}360$ models, and so monitoring the convergence of the chain becomes important. A total of $I = 3$ chains was chosen for monitoring, with initial estimates of the hierarchical mixture model obtained using the values $G = 2$, 3 and 4;

Fig 5. Convergence diagnostics for d = 2. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as functions of the iterations. The x-axis unit is 10,000 iterations.

Fig 6. Convergence diagnostics for the NIST Fingerprint Database. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as functions of the iterations. The x-axis unit is 20,000 iterations.

Zhu, Dass and Jain (2007) develop an algorithm that fits a hierarchical mixture model based on an agglomerative clustering procedure on the space of standard mixtures, which requires a pre-specified value of $G$ as input; the three choices of $G$ mentioned above were used to obtain the three initial estimates. The RJMCMC algorithm was run for $B = 60{,}000$ iterations and the convergence of the chain was monitored (see Figures 4 and 5). In both experiments, the RJMCMC appears to have converged after 60,000 iterations. The highest posterior probabilities were found to be at the true values of the parameters.

6. An Application: Assessing the Individuality of Fingerprints. Fingerprint individuality refers to the study of the extent of uniqueness of fingerprints. It is the primary measure for assessing the uncertainty involved when individuals are identified based on their fingerprints, and it has been the highlight of many recent court cases. In the case of Daubert v. Merrell Dow Pharmaceuticals (Daubert v. Merrell Dow Pharmaceuticals Inc., 1993), the U.S. Supreme Court ruled that, in order for expert forensic testimony to be allowed in court, it had to be subject to five main criteria of scientific validation, that is, whether (i) the particular technique or methodology has been subjected to statistical hypothesis testing, (ii) its error rates have been established, (iii) standards controlling the technique's operation exist and have been maintained, (iv) it has been peer reviewed, and (v) it has general widespread acceptance (see Pankanti, Prabhakar and Jain 2002, and Zhu et al. 2007).

Fig 7. Posterior distribution of PRC based on 1,000 realizations of the RJMCMC after burn-in.

Fig 8. Illustrating impostor minutiae matching, taken from Pankanti et al. (2002). A total of m = 64 and n = 65 minutiae were detected in the left and right images, respectively, and 25 correspondences (i.e., matches) were found.

Following Daubert, forensic evidence based on fingerprints was first challenged in the 1999 case of U.S. v. Byron C. Mitchell, on the grounds that the fundamental premise for asserting the uniqueness of fingerprints had not been objectively tested and its potential matching error rates were unknown. Subsequently, fingerprint-based identification has been challenged in more than 20 court cases in the United States.

A quantitative measure of fingerprint individuality is given by the probability of a random correspondence (PRC), which is the probability that a random pair of fingerprints in the population will match each other. Mathematically, the PRC is expressed as
$$ PRC(w \mid m, n) \;=\; P(S \ge w \mid m, n), \tag{6.1} $$
where $S$ denotes the number of feature matches, with distribution based on all random pairs of fingerprints from the target population, $w$ is the observed number of matches, and $m$ and $n$, respectively, are the numbers of features in the two fingerprint images. Small (respectively, large) values of the PRC indicate low (respectively, high) uncertainty associated with the identification decision. Here, we focus on a particular type of feature match based on minutiae. Minutiae are locations (i.e., $x_j \in R^2$) on the fingerprint image which correspond to

ridge anomalies (for example, ridge bifurcations and ridge endings) and are used by forensic experts to declare that two fingerprints belong to the same individual if a sufficiently large number of minutiae are found to be common to both prints. Figure 8 shows an example of such a random match between two fingerprints of different individuals (also called an impostor match). The number of matches is determined by the number of minutiae in the right panel that fall within a square of area $4r_0^2$ centered at each minutia in the left panel, where $r_0$ is a small number relative to the size of the fingerprint image. The number of matching minutiae in Figure 8 is 25, but it is not known how likely such a match is between a pair of impostor fingerprints in the population of individuals.

The reliability of the estimated PRC depends on how well the elicited statistical models fit the distribution of minutiae in the population. Thus, candidate statistical models have to meet two important requirements: (i) flexibility, that is, the model can represent minutiae distributions over the entire population, and (ii) associated measures of fingerprint individuality can be easily obtained from these models. Zhu et al. (2007) demonstrated that a mixture of multivariate normals with independent components fits the distribution of minutiae well for each fingerprint. Furthermore, when $m$ and $n$ are large, the distribution of $S$ in (6.1) can be approximated by a Poisson distribution with mean (expected number of matches)
$$ \lambda(q_1, q_2, m, n) \;=\; m\, n\, p(q_1, q_2), \tag{6.2} $$
where $q_h$ represents the normal mixture (see (1.2)) fitted to fingerprint $h$ for $h = 1, 2$, and $p(q_1, q_2)$ is the probability of a match between a pair of random minutiae, one generated from $q_1$ and the other from $q_2$. The analytical expression for $p(q_1, q_2)$ is
$$ p(q_1, q_2) \;=\; 4 r_0^2 \sum_{k=1}^{K_1} \sum_{k'=1}^{K_2} p_{k1}\, p_{k'2} \prod_{b=1}^{2} \phi_1\big(0 \mid \mu^{(b)}_{k1} - \mu^{(b)}_{k'2},\ \sigma^{(b)}_{k1} + \sigma^{(b)}_{k'2}\big), \tag{6.3} $$
where $\phi_1(\cdot \mid \mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$. One drawback of Zhu et al. (2007) is that no statistical model is elicited for the minutiae of a population of fingerprints. The hierarchical mixture model (1.1) is such a population model for minutiae (since $x_j \in R^2$), satisfying both requirements of (i) flexibility and (ii) computational ease mentioned earlier. For a fingerprint pair coming from the subpopulations $g_1$ and $g_2$, we have $q_1 = q_{g_1}$ and $q_2 = q_{g_2}$ in (6.2). Hence, it follows that the mean PRC corresponding to $w$ observed matches in the population is given by
$$ PRC(w \mid m, n) \;=\; \sum_{g_1=1}^{G} \sum_{g_2=1}^{G} \omega_{g_1} \omega_{g_2}\; P\big(S \ge w \mid \lambda(q_{g_1}, q_{g_2}, m, n)\big), \tag{6.4} $$
where $S$ follows a Poisson distribution with mean $\lambda(q_{g_1}, q_{g_2}, m, n)$. The RJMCMC algorithm developed in the previous sections can now be used to obtain the posterior distribution of PRC. As an illustration, we considered 100 fingerprint images from the NIST Special Database 4 as a sample from a population of fingerprints.
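Given one posterior draw of the hierarchical mixture parameters, the mean PRC (6.4) is a finite double sum over subpopulation pairs, with the Poisson survival function supplying $P(S \ge w)$. A sketch of that computation (SciPy for the Poisson tail; the mixture weights in (6.3) are written out explicitly, and the $\sigma$'s are the component variances as in (1.4)); averaging the result over RJMCMC draws yields the posterior distribution of PRC shown in Figure 7:

```python
import numpy as np
from scipy.stats import norm, poisson

def match_prob(p1, mu1, var1, p2, mu2, var2, r0):
    """p(q_1, q_2) of (6.3): probability that two random minutiae, one from each
    2-D mixture, fall within the same 2*r0 x 2*r0 square."""
    total = 0.0
    for k in range(len(p1)):
        for kk in range(len(p2)):
            dens = np.prod(norm.pdf(mu1[k] - mu2[kk], loc=0.0,
                                    scale=np.sqrt(var1[k] + var2[kk])))
            total += p1[k] * p2[kk] * dens
    return 4.0 * r0 ** 2 * total

def mean_prc(omega, p, mu, var, m, n, w, r0):
    """Mean PRC (6.4) under one posterior draw of the hierarchical mixture."""
    G = len(omega)
    prc = 0.0
    for g1 in range(G):
        for g2 in range(G):
            lam = m * n * match_prob(p[g1], mu[g1], var[g1],
                                     p[g2], mu[g2], var[g2], r0)   # (6.2)
            prc += omega[g1] * omega[g2] * poisson.sf(w - 1, lam)  # P(S >= w)
    return prc
```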

A total of $I = 3$ chains were run, with starting values given by the algorithm of Zhu et al. (2007) for the cases $G = 1$, 2 and 3. Figure 6 gives the diagnostic plots of the RJMCMC sampler, which establish convergence after a burn-in of $B = 100{,}000$ runs. The posterior distribution of PRC corresponding to $m = 64$, $n = 65$, $w = 25$ and $r_0 = 15$ pixels, based on 1,000 realizations of the RJMCMC after the burn-in period, is given in Figure 7. The posterior mean and standard deviation in Figure 7 are  and , respectively, and the 95% HPD interval is approximately [0.63, 0.735]. We conclude that if a fingerprint pair were chosen from this population with $m = 64$, $n = 65$ and an observed number of matches $w = 25$, there would be high uncertainty in making a positive identification. Our analysis actually indicates that the fingerprints in Figure 8 represent a typical impostor pair. The 95% HPD set suggests that the PRC can be as high as 0.735; that is, about 3 in every 4 impostor pairs can yield 25 or more matches.

7. Summary and Future Work. We have developed Bayesian methodology for inference on hierarchical mixture models, with application to the study of fingerprint individuality. Our future work will be to derive hierarchical mixture models on the extended feature space consisting of minutiae and other fingerprint features. In this paper, we considered only a two-level hierarchy. The US-VISIT program now requires individuals to submit prints from all 10 fingers. This is the case of a 3-level hierarchical mixture model: at the first (top) level, individuals form clusters based on similar characteristics of their 10 fingers, and the distribution of features in each finger is modelled using standard mixtures. Hierarchical mixture models have potential uses in other areas as well, including the clustering of soil samples (objects) based on soil characteristics, which can be modelled by a mixture or a transformation of mixtures.

Acknowledgment. The authors wish to acknowledge the support of NSF DMS grant no while conducting this research.

References.
[1] Brooks, S. P. and Giudici, P. (1998). Convergence Assessment for Reversible Jump MCMC Simulations. Bayesian Statistics 6, Oxford University Press.
[2] Brooks, S. P. and Giudici, P. (2000). Markov Chain Monte Carlo Convergence Assessment via Two-Way Analysis of Variance. Journal of Computational and Graphical Statistics, 9(2).
[3] Zhu, Y., Dass, S. C., and Jain, A. K. (2007). Statistical Models for Assessing the Individuality of Fingerprints. IEEE Transactions on Information Forensics and Security, 2(3).
[4] Escobar, M. and West, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90(430).
[5] Roeder, K. and Wasserman, L. (1997). Practical Bayesian Density Estimation Using Mixtures of Normals. Journal of the American Statistical Association, 92(439).
[6] Woo, M. J. and Sriram, T. N. (2006). Robust estimation of mixture complexity for count data. Computational Statistics & Data Analysis, 51(9).
[7] Ishwaran, H., James, L. F., and Sun, J. (2001). Bayesian model selection in finite mixtures by marginal density decompositions. Journal of the American Statistical Association, 96(456).
[8] Green, P. and Richardson, S. (1997). On the Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B, 59(4).

[9] Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4).
[10] NIST Special Database 4. NIST: 8-Bit Gray Scale Images of Fingerprint Image Groups (FIGS). Online:
[11] Pankanti, S., Prabhakar, S., and Jain, A. K. (2002). On the Individuality of Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8).
[12] Daubert v. Merrell Dow Pharmaceuticals Inc. (1993). 509 U.S. 579, 113 S. Ct. 2786, 125 L.Ed.2d 469.

Appendix

Generating v. The generation of $v$ is discussed here. Let $|A_1| = T$ denote the cardinality of the set $A_1$, and let $A_1^c$ be the set of all $k$ indices in $A_1$ excluding the largest one. Without loss of generality, we relabel the $T$ ordered indices in $A_1$ as $1, 2, \ldots, T$; it follows that the indices of $A_1^c$ are $1, 2, \ldots, T-1$. The restriction $\sum_{k \in A_1} a_k v_k = 1$ can be rewritten as $v_T = \big(1 - \sum_{k=1}^{T-1} a_k v_k\big)/a_T$. Since $0 \le v_T \le 1$, it follows that the $T-1$ indices in $A_1^c$ must satisfy the inequality $1 - a_T \le \sum_{k=1}^{T-1} a_k v_k \le 1$. Also note that $0 \le \sum_{k=1}^{T-1} a_k v_k \le \sum_{k=1}^{T-1} a_k$, since each $0 \le v_k \le 1$. Combining these inequalities, we get the following restriction on the $T-1$ free parameters of $v$:
$$ \max\big(0,\ 1 - a_T\big) \;\le\; \sum_{k=1}^{T-1} a_k v_k \;\le\; \min\Big(1,\ \sum_{k=1}^{T-1} a_k\Big). \tag{7.1} $$
Let $C = \{(v_1, v_2, \ldots, v_{T-1}) : \text{equation (7.1) is satisfied and } 0 \le v_k \le 1\}$. It follows that $C$ is a convex polyhedron in the unit hypercube $[0, 1]^{T-1}$. The challenge now is to generate $(v_1, v_2, \ldots, v_{T-1})$ from $C$ and be able to write down the proposal density $q_0(v_1, v_2, \ldots, v_{T-1})$ in closed form. To do this, we determine the set of all extreme points of $C$. There are a total of $T$ inequality constraints on $(v_1, v_2, \ldots, v_{T-1})$: (1) the $T-1$ constraints of the form $0 \le v_k \le 1$, and (2) the one constraint of the form $A \le \sum_{k=1}^{T-1} a_k v_k \le B$ given by equation (7.1). Extreme points of a convex polyhedron in $T-1$ dimensions are formed by $T-1$ active equations. We can select $T-1$ candidate active equations from the $T$ constraints above, solve for $(v_1, v_2, \ldots, v_{T-1})$ based on these $T-1$ equations, and then check, using the remaining constraint, whether the solution obtained is admissible. For example, we may select all $v_k = 0$ for $k = 1, 2, \ldots, T-1$ from (1). Plugging $(0, 0, \ldots, 0)$ into the remaining constraint (2), we get $\sum_{k=1}^{T-1} a_k v_k = 0$. So, if $a_T < 1$ (respectively, $a_T \ge 1$), we get $0 < A$ (respectively, $0 \ge A$), giving an inadmissible (respectively, admissible) solution. Another candidate extreme point is formed by selecting the first $T-2$ $v_k$'s to be zero and solving for $v_{T-1}$ from the equation $\sum_{k=1}^{T-1} a_k v_k = B$. The solution here is $(0, 0, \ldots, 0, B/a_{T-1})$ and will be admissible if $0 \le B/a_{T-1} \le 1$, since the inequality that was not used is $0 \le v_{T-1} \le 1$. It is easy to see that there are $\binom{T}{T-1}\, 2^{T-1}$ candidate extreme points to be checked for admissibility: we first select $T-1$ constraints from the $T$, which can be done in $\binom{T}{T-1} = T$ ways, and next, for each of the $T-1$ selected constraints, we can select either its lower or its upper bound as the candidate active equation.
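A sketch of the candidate-extreme-point enumeration described above: each of the $T \cdot 2^{T-1}$ candidates fixes $T-1$ of the $T$ two-sided constraints at one of their bounds, solves for $v$, and is kept only if the remaining constraint holds. The final draw shown below (a Dirichlet-weighted convex combination of the extreme points) is an illustrative choice only; the paper's actual proposal density $q_0(v)$ in (7.2) is not reproduced here.

```python
import numpy as np
from itertools import product

def extreme_points(a):
    """Candidate extreme points of C for coefficients a = (a_1, ..., a_T); assumes T >= 2.

    C lives in [0,1]^(T-1): 0 <= v_k <= 1 and A <= sum_k a_k v_k <= B, eq. (7.1).
    """
    a = np.asarray(a, dtype=float)
    T = len(a)
    A = max(0.0, 1.0 - a[-1])
    B = min(1.0, a[:-1].sum())
    pts, eps = [], 1e-12
    # case 1: all T-1 box constraints active (corners of the hypercube)
    for corner in product((0.0, 1.0), repeat=T - 1):
        if A - eps <= np.dot(a[:-1], corner) <= B + eps:
            pts.append(np.array(corner))
    # case 2: the sum constraint active at A or B, plus T-2 box constraints
    for free in range(T - 1):                       # coordinate solved from the sum
        for fixed in product((0.0, 1.0), repeat=T - 2):
            for bound in (A, B):
                v = np.empty(T - 1)
                v[np.arange(T - 1) != free] = fixed
                s = np.dot(np.delete(a[:-1], free), fixed)
                v[free] = (bound - s) / a[free]
                if -eps <= v[free] <= 1.0 + eps:
                    pts.append(np.clip(v, 0.0, 1.0))
    return np.unique(np.round(pts, 10), axis=0)

def draw_v(a, rng):
    """Illustrative proposal: Dirichlet-weighted convex combination of extreme points."""
    E = extreme_points(a)
    w = rng.dirichlet(np.ones(len(E)))
    v_free = w @ E                                  # lies in C by convexity
    v_T = (1.0 - np.dot(np.asarray(a)[:-1], v_free)) / a[-1]   # recover v_T from (3.14)
    return np.append(v_free, v_T)
```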


More information

Bayesian Nonparametrics: Dirichlet Process

Bayesian Nonparametrics: Dirichlet Process Bayesian Nonparametrics: Dirichlet Process Yee Whye Teh Gatsby Computational Neuroscience Unit, UCL http://www.gatsby.ucl.ac.uk/~ywteh/teaching/npbayes2012 Dirichlet Process Cornerstone of modern Bayesian

More information

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration Bayes Factors, posterior predictives, short intro to RJMCMC Thermodynamic Integration Dave Campbell 2016 Bayesian Statistical Inference P(θ Y ) P(Y θ)π(θ) Once you have posterior samples you can compute

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Different points of view for selecting a latent structure model

Different points of view for selecting a latent structure model Different points of view for selecting a latent structure model Gilles Celeux Inria Saclay-Île-de-France, Université Paris-Sud Latent structure models: two different point of views Density estimation LSM

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection

A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection Silvia Pandolfi Francesco Bartolucci Nial Friel University of Perugia, IT University of Perugia, IT

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Bayesian inference for multivariate skew-normal and skew-t distributions

Bayesian inference for multivariate skew-normal and skew-t distributions Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential

More information

Stochastic Realization of Binary Exchangeable Processes

Stochastic Realization of Binary Exchangeable Processes Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,

More information

On the Evidential Value of Fingerprints

On the Evidential Value of Fingerprints On the Evidential Value of Fingerprints Heeseung Choi, Abhishek Nagar and Anil K. Jain Dept. of Computer Science and Engineering Michigan State University East Lansing, MI, U.S.A. {hschoi,nagarabh,jain}@cse.msu.edu

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Introduction to Markov Chain Monte Carlo & Gibbs Sampling Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Probability Models for Bayesian Recognition

Probability Models for Bayesian Recognition Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Hyperparameter estimation in Dirichlet process mixture models

Hyperparameter estimation in Dirichlet process mixture models Hyperparameter estimation in Dirichlet process mixture models By MIKE WEST Institute of Statistics and Decision Sciences Duke University, Durham NC 27706, USA. SUMMARY In Bayesian density estimation and

More information

ARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis

ARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Marginal parameterizations of discrete models

More information

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 18-16th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Trans-dimensional Markov chain Monte Carlo. Bayesian model for autoregressions.

More information

Bayesian time series classification

Bayesian time series classification Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Gentle Introduction to Infinite Gaussian Mixture Modeling

Gentle Introduction to Infinite Gaussian Mixture Modeling Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

Bayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017

Bayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017 Chalmers April 6, 2017 Bayesian philosophy Bayesian philosophy Bayesian statistics versus classical statistics: War or co-existence? Classical statistics: Models have variables and parameters; these are

More information

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model David B Dahl 1, Qianxing Mo 2 and Marina Vannucci 3 1 Texas A&M University, US 2 Memorial Sloan-Kettering

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Assessing Regime Uncertainty Through Reversible Jump McMC

Assessing Regime Uncertainty Through Reversible Jump McMC Assessing Regime Uncertainty Through Reversible Jump McMC August 14, 2008 1 Introduction Background Research Question 2 The RJMcMC Method McMC RJMcMC Algorithm Dependent Proposals Independent Proposals

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Overview. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda

Overview. DS GA 1002 Probability and Statistics for Data Science.   Carlos Fernandez-Granda Overview DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information