Technical Report RM669, Department of Statistics & Probability, Michigan State University

A BAYESIAN ANALYSIS OF HIERARCHICAL MIXTURES WITH APPLICATION TO CLUSTERING FINGERPRINTS

By Sarat C. Dass and Mingfei Li, Michigan State University

Hierarchical mixture models arise naturally for clustering a heterogeneous population of objects where the observations made on each object follow a standard mixture density. Hierarchical mixtures utilize complementary aspects of mixtures at the different levels of the hierarchy: at the first (top) level, the mixture is used to perform clustering of the objects, while at the second level, nested mixture models are used as flexible representations of the distributions of observables from each object. Inference for hierarchical mixtures is more challenging since unknown numbers of mixture components arise at both the first and the second levels of the hierarchy. In this paper, a Bayesian approach based on Reversible Jump Markov Chain Monte Carlo methodology is developed for the inference of all unknown parameters of hierarchical mixtures. Our methodology is then applied to the clustering of fingerprint images and used to assess the variability of quantities which are functions of the second-level mixtures.

1. Hierarchical Mixture Models. Consider an object, $O$, selected at random from a heterogeneous population, $\mathcal{P}$, with $G$ unknown clusters. Let $X \equiv \{x_1, x_2, x_3, \ldots\}$ denote the observables on $O$, where $x_j \equiv (x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(d)})$ is a $d$-variate random vector in $R^d$. A hierarchical mixture model for the distribution of $O$ in the population is
$$ q(\mathbf{x}) \;=\; \sum_{g=1}^{G} \omega_g \prod_{j=1}^{n} q_g(x_j), \tag{1.1} $$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the $n$ observations made on $O$, the $\omega_g$, $g = 1, 2, \ldots, G$, are the $G$ cluster proportions with $\omega_g > 0$ and $\sum_{g=1}^{G} \omega_g = 1$, and $q_g$ is the mixture density for the $g$-th cluster given by
$$ q_g(x) \;=\; \sum_{k=1}^{K_g} p_{kg}\, f(x \mid \theta_{kg}), \tag{1.2} $$
with $f$ denoting a density with respect to Lebesgue measure on $R^d$, $p_{kg}$ denoting the mixing probabilities satisfying (1) $p_{kg} > 0$ and (2) $\sum_{k=1}^{K_g} p_{kg} = 1$, and $\theta_{kg}$ denoting the set of all unknown parameters in $f$. Identifiability of the hierarchical mixture model (1.1) is achieved by imposing the constraints
$$ \omega_1 < \omega_2 < \cdots < \omega_G \quad\text{and}\quad \theta_{1g} \preceq \theta_{2g} \preceq \cdots \preceq \theta_{K_g g}, \tag{1.3} $$

Research supported under NSF DMS grant no
AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35
Keywords and phrases: Model-based clustering, Gaussian mixtures, Bayesian inference, Reversible Jump Markov Chain Monte Carlo methods, Fingerprint individuality

for each $g = 1, 2, \ldots, G$, where $\preceq$ is a partial ordering to be defined later. The set of all unknown parameters in the hierarchical mixture model (1.1) is denoted by $x = (G, \omega, K, p, \theta)$, where $\omega = (\omega_1, \omega_2, \ldots, \omega_G)$, $K = (K_1, K_2, \ldots, K_G)$, $p = (p_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G)$, and $\theta = (\theta_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G)$.

Hierarchical mixture models of the form (1.1) arise naturally when the goal is to cluster a population of objects whose observables each follow a standard mixture distribution. At the first (top, or $G$) level, the mixture is used to perform clustering of the objects, while at the second (or $K_g$) level, nested mixture models (nested within each $g = 1, 2, \ldots, G$ specification) are used as flexible representations of the distributions of observables. The unknown number of mixture components, or mixture complexity, arises at both levels of the hierarchy and is, therefore, more challenging to estimate compared to standard mixtures. Estimating mixture complexity has been the focus of intense research for many years, resulting in various estimation methodologies over a broad application domain. Non-parametric methods were developed in Escobar and West (1995) and Roeder and Wasserman (1997), whereas Ishwaran et al. (2001) and Woo and Sriram (2006) developed methodology for the robust estimation of mixture complexity for count data. In a Bayesian framework, the work of Green and Richardson (1997) demonstrated that the estimation of mixture complexity and of the mixture parameters can be unified if viewed as a problem of model selection. With this view in mind, Green and Richardson (1997) developed a Reversible Jump Markov Chain Monte Carlo (RJMCMC) approach for estimating mixture complexity by exploring a space of models of varying dimensions.

In this paper, we develop a Bayesian framework based on RJMCMC for inference on hierarchical mixture models. One advantage of our approach is that we are able to assess the variability of the cluster estimate, which, in turn, can be used to ascertain the variability of quantities that are functions of the clusters. We present one such example based on clustering a sample of fingerprint images. One quantity of special interest is the probability of a random correspondence (PRC), which measures the extent to which two randomly chosen fingerprints from a population match each other. The RJMCMC methodology developed in this paper provides an estimate of its mean and variance.

In the subsequent text, we assume each $f$ is multivariate normal with mean vector $\mu_{kg} \equiv (\mu_{kg}^{(1)}, \mu_{kg}^{(2)}, \ldots, \mu_{kg}^{(d)}) \in R^d$ and covariance matrix $\Sigma_{kg} \in R^{d \times d}$. Our analysis of fingerprint images in Section 6 revealed that it is adequate to consider diagonal covariance matrices of the form $\Sigma_{kg} = \mathrm{diag}(\sigma_{kg}^{(1)2}, \sigma_{kg}^{(2)2}, \ldots, \sigma_{kg}^{(d)2})$, where $\sigma_{kg}^{(b)2}$ is the variance of the $b$-th component of the multivariate normal distribution. Thus, we have
$$ f(x \mid \theta_{kg}) \;=\; \phi_d(x \mid \mu_{kg}, \sigma_{kg}) \;=\; \prod_{b=1}^{d} \phi_1\big(x^{(b)} \mid \mu_{kg}^{(b)}, \sigma_{kg}^{(b)2}\big), \tag{1.4} $$
where $\phi_1(\cdot \mid \mu, \sigma^2)$ denotes the density of the univariate normal distribution with mean $\mu$ and variance $\sigma^2$, and $\sigma_{kg} \equiv (\sigma_{kg}^{(1)2}, \sigma_{kg}^{(2)2}, \ldots, \sigma_{kg}^{(d)2})$ is the $d$-variate vector of the variances.
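To make the model concrete, the following is a minimal sketch (in Python with NumPy/SciPy, which the paper does not prescribe; all parameter values below are illustrative, not taken from the paper) of evaluating the hierarchical mixture density (1.1) for one object under the diagonal-Gaussian specification (1.4). Log-sum-exp is used for numerical stability.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def log_qg(x, p_g, mu_g, var_g):
    """Log of the second-level mixture density (1.2) for each row of x.

    x     : (n, d) observations from one object
    p_g   : (K_g,) mixing probabilities, summing to 1
    mu_g  : (K_g, d) component means
    var_g : (K_g, d) component variances (diagonal covariances, eq. 1.4)
    """
    # log phi_d(x_j | mu_kg, sigma_kg) for every (j, k) pair
    log_phi = norm.logpdf(x[:, None, :], loc=mu_g[None, :, :],
                          scale=np.sqrt(var_g)[None, :, :]).sum(axis=2)
    return logsumexp(np.log(p_g)[None, :] + log_phi, axis=1)   # shape (n,)

def log_q(x, omega, p, mu, var):
    """Log of the first-level (hierarchical) mixture density (1.1) for one object."""
    log_terms = [np.log(omega[g]) + log_qg(x, p[g], mu[g], var[g]).sum()
                 for g in range(len(omega))]
    return logsumexp(log_terms)

# toy illustration with G = 2 clusters, K_g = 2 components each, d = 2
rng = np.random.default_rng(0)
omega = np.array([0.4, 0.6])
p = [np.array([0.5, 0.5]), np.array([0.3, 0.7])]
mu = [np.array([[0., 0.], [3., 3.]]), np.array([[6., 6.], [9., 9.]])]
var = [np.ones((2, 2)), np.ones((2, 2))]
x = rng.normal(size=(5, 2)) + 3.0      # n = 5 observations from one object
print(log_q(x, omega, p, mu, var))
```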

The second identifiability condition of (1.3) is re-expressed in terms of the first component of the mean vector as
$$ \mu^{(1)}_{1g} < \mu^{(1)}_{2g} < \cdots < \mu^{(1)}_{K_g g}. \tag{1.5} $$
In the subsequent text, the identifiability condition (1.5), based on the first components of $\mu_{kg}$ for $k = 1, 2, \ldots, K_g$, will be re-written using the symbol $\preceq$ as
$$ \mu_{1g} \preceq \mu_{2g} \preceq \cdots \preceq \mu_{K_g g} \tag{1.6} $$
for each $g = 1, 2, \ldots, G$. For $N$ independent objects selected randomly from the population, say $O_i \equiv \{x_{ij},\ j = 1, 2, \ldots, n_i\}$, $i = 1, 2, \ldots, N$, it follows that the joint distribution of the observations is
$$ \prod_{i=1}^{N} q(\mathbf{x}_i) \;=\; \prod_{i=1}^{N} \sum_{g=1}^{G} \omega_g \prod_{j=1}^{n_i} \sum_{k=1}^{K_g} p_{kg}\, \phi_d(x_{ij} \mid \mu_{kg}, \sigma_{kg}). \tag{1.7} $$
Two further pieces of notation are introduced here: $\mu$ and $\sigma$ will respectively denote the collections of all $\{\mu_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G\}$ and $\{\sigma_{kg},\ k = 1, 2, \ldots, K_g,\ g = 1, 2, \ldots, G\}$ vectors. Our goal is to make inference about the unknown parameters $x = (G, \omega, K, p, \mu, \sigma)$ based on the joint distribution in (1.7).

2. A Bayesian Framework for Inference. For the subsequent text, we introduce some additional notation. The symbol $I(S)$ denotes the indicator function of the set $S$, that is, $I(S) = 1$ if $S$ is true, and $0$ otherwise. The notation $(A, B, \ldots \mid C, D, \ldots)$ denotes the distribution of the random variables $A, B, \ldots$ conditioned on $C, D, \ldots$, with $\pi(A, B, \ldots \mid C, D, \ldots)$ denoting the specific form of the conditional distribution. The notation $\pi(A, B, \ldots \mid \cdots)$ denotes the distribution of $A, B, \ldots$ given the rest of the parameters. We specify a joint prior distribution on $x$ as follows:

(1) $G$ and $K$: We assume
$$ \pi(G, K) \;=\; \pi(G)\, \pi(K \mid G) \;=\; \pi_0(G) \prod_{g=1}^{G} \pi_0(K_g), \tag{2.1} $$
where $\pi_0$ is the discrete uniform distribution between $G_{\min}$ and $G_{\max}$ (respectively, $K_{\min}$ and $K_{\max}$), both inclusive, for $G$ (respectively, $K_g$).

(2) The first- and second-level mixing proportions: We assume
$$ \pi(\omega, p \mid G, K) \;=\; G!\; D_G(\omega \mid \delta_\omega)\, I(\omega_1 < \omega_2 < \cdots < \omega_G) \prod_{g=1}^{G} D_{K_g}(p_g \mid \delta_p), \tag{2.2} $$
where $D_H(\cdot \mid \delta)$ denotes the $H$-dimensional Dirichlet density with the $H$-component baseline measure $(\delta, \delta, \ldots, \delta)$, $\delta$ is a pre-specified constant, and $p_g \equiv (p_{1g}, p_{2g}, \ldots, p_{K_g g})$. The indicator function in (2.2) arises from the identifiability constraint (1.3) imposed on $\omega$; it follows that $G!$ is the appropriate normalizing constant for this constrained density, which is obtained by integrating out $\omega$ and noting that $D_G(\omega \mid \delta_\omega)$ is invariant under permutations of $\omega$ (that is, $\omega_1, \omega_2, \ldots, \omega_G$ are exchangeable).

(3) The prior on the mean vectors is taken as
$$ \pi(\mu \mid K, G) \;=\; \prod_{g=1}^{G} \Bigg[ K_g!\, \prod_{k=1}^{K_g} \phi_1\big(\mu^{(1)}_{kg} \mid \mu_0, \tau^2\big)\; I\big(\mu^{(1)}_{1g} < \mu^{(1)}_{2g} < \cdots < \mu^{(1)}_{K_g g}\big) \prod_{b=2}^{d} \prod_{k=1}^{K_g} \phi_1\big(\mu^{(b)}_{kg} \mid \mu_0, \tau^2\big) \Bigg]. \tag{2.3} $$
The indicator function appears due to the identifiability constraint (1.3) imposed on $\mu$, with resulting normalizing constant $K_g!$ for each $g = 1, 2, \ldots, G$.

(4) The prior distribution of the variances is taken as
$$ \pi(\sigma \mid K, G) \;=\; \prod_{g=1}^{G} \prod_{k=1}^{K_g} \prod_{b=1}^{d} IG\big(\sigma^{(b)2}_{kg} \mid \alpha_0, \beta_0\big), \tag{2.4} $$
where $IG$ denotes the inverse gamma distribution with prior shape and scale parameters $\alpha_0$ and $\beta_0$, respectively.
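A draw from the priors (2.1)-(2.4) can be simulated directly; the ordering constraints on $\omega$ and on the first mean coordinates are handled by sorting, since the constrained densities are the order-statistic versions of the unconstrained ones. The sketch below uses hypothetical hyper-parameter values and the inverse-gamma parameterization implied by the posterior update (3.32) later in the paper; it is an illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical hyper-parameter values; these must be chosen per application
G_min, G_max, K_min, K_max = 2, 5, 2, 5
delta_omega, delta_p = 1.0, 1.0
mu0, tau2, alpha0, beta0 = 0.0, 100.0, 2.0, 1.0
d = 2

def draw_prior():
    """One draw of x = (G, omega, K, p, mu, sigma) from the priors (2.1)-(2.4)."""
    # (2.1): G and each K_g discrete uniform on their ranges, inclusive
    G = int(rng.integers(G_min, G_max + 1))
    K = rng.integers(K_min, K_max + 1, size=G)
    # (2.2): sorting a Dirichlet draw gives the ordered (constrained) density
    omega = np.sort(rng.dirichlet(np.full(G, delta_omega)))
    p = [rng.dirichlet(np.full(K[g], delta_p)) for g in range(G)]
    # (2.3): normal means; the first coordinate is sorted within each g
    mu = [rng.normal(mu0, np.sqrt(tau2), size=(K[g], d)) for g in range(G)]
    for m in mu:
        m[:, 0] = np.sort(m[:, 0])
    # (2.4): inverse-gamma variances, drawn as reciprocals of gamma variates
    # (assumes the IG(a, b) parameterization with density prop. to s^-(a+1) e^{-1/(b s)})
    var = [1.0 / rng.gamma(alpha0, beta0, size=(K[g], d)) for g in range(G)]
    return G, K, omega, p, mu, var

G, K, omega, p, mu, var = draw_prior()
print(G, K, omega.round(3))
```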

The joint prior distribution on $x = (G, \omega, K, p, \mu, \sigma)$ is thus given by
$$ \pi(x) \;=\; \pi(G, K)\, \pi(\omega, p \mid G, K)\, \pi(\mu \mid G, K)\, \pi(\sigma \mid G, K), \tag{2.5} $$
where the component priors are given by equations (2.1)-(2.4). The prior on $x$ depends on the hyper-parameters $\delta_p$, $\delta_\omega$, $G_{\max}$, $G_{\min}$, $K_{\min}$, $K_{\max}$, $\mu_0$, $\tau^2$, $\alpha_0$ and $\beta_0$, all of which need to be specified for a given application. This will be done in Sections 5 and 6.

The likelihood (1.7) involves several summations within each product term and is simplified by augmenting variables that denote the class labels of the individual observations. Two different class labels are introduced for the two levels of mixtures: the augmented variable $W \equiv (W_1, W_2, \ldots, W_N)$ denotes the class labels for the $G$ sub-populations, that is, $W_i = g$ whenever object $i$, $O_i$, arises from the $g$-th sub-population, and $Z \equiv (Z_1, Z_2, \ldots, Z_N)$ with $Z_i \equiv (Z_{ij},\ j = 1, 2, \ldots, n_i)$, where $Z_{ij} = k$ for $1 \le k \le K_g$ if $x_{ij}$ arises from the $k$-th mixture component $\phi_d(\cdot \mid \mu_{kg}, \sigma_{kg})$. We denote the augmented parameter space by the same symbol $x$ as before, that is, $x = (G, \omega, K, p, \mu, \sigma, W, Z)$. The augmented likelihood is now
$$ l(G, \omega, K, p, \mu, \sigma, W, Z) \;=\; \prod_{i=1}^{N} \prod_{j=1}^{n_i} \prod_{g=1}^{G} \prod_{k=1}^{K_g} \phi_d(x_{ij} \mid \mu_{kg}, \sigma_{kg})^{I(Z_{ij} = k,\, W_i = g)}, \tag{2.6} $$
with priors on $W$ and $Z$ given by
$$ \pi(W, Z \mid G, K, \omega, p) \;=\; \pi(W \mid G, \omega)\, \pi(Z \mid G, K, W, p), \tag{2.7} $$
where $\pi(W \mid G, \omega) = \prod_{i=1}^{N} \prod_{g=1}^{G} \omega_g^{I(W_i = g)}$ and $\pi(Z \mid G, K, W, p) = \prod_{g=1}^{G} \prod_{i : W_i = g} \prod_{j=1}^{n_i} \prod_{k=1}^{K_g} p_{kg}^{I(Z_{ij} = k)}$.
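The quantity the sampler repeatedly evaluates is the augmented likelihood (2.6) together with the label priors (2.7). A minimal sketch of this complete-data log-likelihood, under the same diagonal-Gaussian parameterization as (1.4) and assuming NumPy/SciPy (an implementation choice, not part of the paper):

```python
import numpy as np
from scipy.stats import norm

def complete_data_loglik(X, W, Z, omega, p, mu, var):
    """Log of (2.6) multiplied by the label priors (2.7).

    X : list of (n_i, d) arrays, one per object
    W : (N,) first-level labels, W[i] = g
    Z : list of (n_i,) integer arrays of second-level labels, Z[i][j] = k
    """
    ll = 0.0
    for i, x in enumerate(X):
        g = W[i]
        k = Z[i]
        ll += np.log(omega[g])                 # pi(W | G, omega)
        ll += np.log(p[g][k]).sum()            # pi(Z | G, K, W, p)
        # phi_d(x_ij | mu_kg, sigma_kg) terms of (2.6)
        ll += norm.logpdf(x, loc=mu[g][k], scale=np.sqrt(var[g][k])).sum()
    return ll
```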

Based on the augmented likelihood and the prior distributions, one can write down the posterior distribution, up to a normalizing constant, via Bayes' Theorem. The posterior has the expression
$$ \pi(x \mid \text{data}) \;\propto\; l(G, \omega, K, p, \mu, \sigma, W, Z)\; \pi(W, Z \mid G, K, \omega, p)\; \pi(G, K, \omega, p, \mu, \sigma), \tag{2.8} $$
based on (2.5), (2.6) and (2.7).

3. Posterior Inference. The total number of unknown parameters in the hierarchical mixture model depends on the values of $G$ and $K$. Thus, the posterior in (2.8) can be viewed as a probability distribution on the space of all hierarchical mixture models with varying dimensions. To obtain posterior inference for such a space of models, Green (1995) and Green and Richardson (1997) developed the RJMCMC approach to Bayesian inference. In this paper, we develop an RJMCMC approach to explore the posterior distribution in (2.8) resulting from the hierarchical mixture model specification.

We briefly discuss the most general RJMCMC framework here. Let $x$ and $y$ be elements of the model space with possibly differing dimensions. The RJMCMC approach proposes a move, say $m$, with probability $r_m$. The move $m$ takes $x$ to $y$ via the proposal distribution $q_m(x, y)$. In order to maintain the time-reversibility condition, the proposal is accepted with probability
$$ \alpha(x, y) \;=\; \min\left\{ 1,\; \frac{\pi(y \mid \text{data})\; r_{\bar m}\; q_{\bar m}(y, x)}{\pi(x \mid \text{data})\; r_m\; q_m(x, y)} \right\}; \tag{3.1} $$
in (3.1), $q_{\bar m}(y, x)$ represents the probability of moving from $y$ to $x$ based on the reverse move $\bar m$, and $\pi(x \mid \text{data})$ denotes the posterior distribution of $x$ given the data. It is crucial that the moves $m$ and $\bar m$ be reversible (see Green 1995), meaning that the densities $q_m(x, y)$ and $q_{\bar m}(y, x)$ have the same support with respect to a dominating measure. In case $y$ represents the higher-dimensional model, we can first sample $u$ from a proposal $q_0(x, u)$, with possible dependence on $x$, and then obtain $y$ as a one-to-one function of $(x, u)$. In that case, the proposal density $q_m(x, y)$ in (3.1) is expressed as
$$ q_m(x, y) \;=\; q_0(x, u) \Big/ \left| \det\!\left[ \frac{\partial y}{\partial (x, u)} \right] \right|, \tag{3.2} $$
where $\partial y/\partial(x, u)$ denotes the Jacobian of the transformation from $(x, u)$ to $y$, and $\det[\cdot]$ represents the absolute value of its determinant. If the triplet $(x, u, y)$ involves some discrete components, then the Jacobian of the transformation is obtained from the one-to-one map between the continuous parts of $y$ and $(x, u)$, which can depend on the values realized by the discrete components.
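The acceptance step (3.1) has the same shape for every reversible pair $(m, \bar m)$. The sketch below shows that shape with all densities passed in on the log scale; working on the log scale is an implementation detail not prescribed by the paper, but standard for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(2)

def rjmcmc_accept(log_post_x, log_post_y,
                  log_r_m, log_q_m_xy,        # forward move m: x -> y
                  log_r_mbar, log_q_mbar_yx): # reverse move m-bar: y -> x
    """Return True if the proposed state y is accepted (eq. 3.1)."""
    log_alpha = min(0.0, (log_post_y + log_r_mbar + log_q_mbar_yx)
                        - (log_post_x + log_r_m + log_q_m_xy))
    return np.log(rng.uniform()) < log_alpha
```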

For the inference on hierarchical mixture models, five types of updating steps are considered, with reversible pairs of moves, $(m, \bar m)$, corresponding to moves in spaces of varying dimensions. The steps are:
(1) Update $G$ with $(m, \bar m) = (G\text{-split}, G\text{-merge})$,
(2) Update $K \mid G, \omega, W$ with $(m, \bar m) = (K\text{-split}, K\text{-merge})$,
(3) Update $\omega \mid G, K, W, Z, p, \mu, \sigma$,
(4) Update $(W, Z) \mid G, K, \omega, p, \mu, \sigma$, and
(5) Update $(p, \mu, \sigma) \mid G, K, \omega, W, Z$. (3.3)
Steps 3-5 do not involve jumps between spaces of varying dimensions and can be carried out based on regular Gibbs proposals.

3.1. Update G. To discuss the G-split and G-merge moves, we let $x$ and $y$ denote two different states of the model space, that is,
$$ x = (G, \omega, K, p, \mu, \sigma, W, Z) \quad\text{and}\quad y = (G^*, \omega^*, K^*, p^*, \mu^*, \sigma^*, W^*, Z^*), \tag{3.4} $$
where the $*$'s in (3.4) denote a possibly different setting of the parameters.

3.1.1. The G-merge move. The G-merge move changes the current $G$ to $G - 1$ (that is, $G^* = G - 1$) and is carried out based on the following steps:

STEP 1: First, two of the $G$ components, say $g_1$ and $g_2$ with $g_1 < g_2$, are selected randomly for merging into a new component $g^*$. The first-level mixing proportions are merged as $\omega^*_{g^*} = \omega_{g_1} + \omega_{g_2}$.

STEP 2: The $K$-components in $K$ corresponding to $g_1$ and $g_2$ are, respectively, $K_{g_1}$ and $K_{g_2}$. These are combined to obtain the new $K$-value, $K^*_{g^*}$, in the following way. Writing $K_{g_1} + K_{g_2} = K_t$,
$$ K^*_{g^*} \;=\; \begin{cases} (K_t + 1)/2 & \text{if } K_t \text{ is odd, and} \\ K_t/2 & \text{if } K_t \text{ is even.} \end{cases} \tag{3.5} $$

STEP 3: Next, $(p_{g_1}, \mu_{g_1}, \sigma_{g_1})$ and $(p_{g_2}, \mu_{g_2}, \sigma_{g_2})$ are merged to obtain $(p^*_{g^*}, \mu^*_{g^*}, \sigma^*_{g^*})$ as follows. The identifiability condition (1.6) holds for $g = g_1$ and $g = g_2$, and must be ensured to hold for $g = g^*$ after the merge step. To achieve this, the $K_t$ $\mu$'s are arranged in increasing order,
$$ \mu_{(1)} \preceq \mu_{(2)} \preceq \cdots \preceq \mu_{(K_t - 1)} \preceq \mu_{(K_t)}, \tag{3.6} $$
with associated probability $p_{(j)}$ for $\mu_{(j)}$, $j = 1, 2, \ldots, K_t$. Thus, the $p_{(j)}$ are a rearrangement of the $K_t$ probabilities in $p_{g_1}$ and $p_{g_2}$ according to the partial ordering of $\mu_{g_1}$ and $\mu_{g_2}$ in (3.6). First, the case when $K_t$ is even is considered. Adjacent $\mu$ values in (3.6) are paired,
$$ (\mu_{(1)}, \mu_{(2)}),\; (\mu_{(3)}, \mu_{(4)}),\; \ldots,\; (\mu_{(K_t - 1)}, \mu_{(K_t)}), \tag{3.7} $$
and the corresponding $g^*$ parameters are obtained using the formulas
$$ p^*_{kg^*} = \frac{p_{(2k-1)} + p_{(2k)}}{2}, \quad \mu^*_{kg^*} = \frac{p_{(2k-1)}\, \mu_{(2k-1)} + p_{(2k)}\, \mu_{(2k)}}{p_{(2k-1)} + p_{(2k)}}, \quad\text{and}\quad \sigma^*_{kg^*} = \frac{p_{(2k-1)}\, \sigma_{(2k-1)} + p_{(2k)}\, \sigma_{(2k)}}{p_{(2k-1)} + p_{(2k)}}, \tag{3.8} $$
for $k = 1, 2, \ldots, K^*_{g^*}$.
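For even $K_t$, STEP 2 and STEP 3 reduce to pooling the two sets of second-level components, sorting by the first mean coordinate, pairing adjacent components, and moment-matching each pair as in (3.8). A sketch of that computation (Python/NumPy, diagonal-Gaussian parameters as in (1.4); illustrative code, not the authors' implementation):

```python
import numpy as np

def g_merge_even(p1, mu1, var1, p2, mu2, var2):
    """Merge the second-level parameters of clusters g1 and g2 when K_t is even.

    Components are pooled, sorted by their first mean coordinate (the partial
    ordering (3.6)), paired adjacently (3.7), and combined via (3.8).
    """
    p = np.concatenate([p1, p2])           # K_t pooled mixing probabilities
    mu = np.vstack([mu1, mu2])
    var = np.vstack([var1, var2])
    order = np.argsort(mu[:, 0])           # ordering (3.6)
    p, mu, var = p[order], mu[order], var[order]

    a, b = slice(0, None, 2), slice(1, None, 2)   # positions 2k-1 and 2k
    w = p[a] + p[b]
    p_new = w / 2.0                               # merged weights sum to 1
    mu_new = (p[a, None] * mu[a] + p[b, None] * mu[b]) / w[:, None]
    var_new = (p[a, None] * var[a] + p[b, None] * var[b]) / w[:, None]
    return p_new, mu_new, var_new
```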

STEP 4: To obtain $W^*$ and $Z^*$, objects with $W_i = g_1$ or $W_i = g_2$ are relabelled as $W^*_i = g^*$. For these objects, the allocation to the $K^*_{g^*}$ components is carried out using a Bayes allocation scheme. The probability that observation $x_{ij}$ is assigned to component $k$ is
$$ P\big(Z^*_{ij} = k \mid W^*_i = g^*\big) \;=\; \frac{p^*_{kg^*}\, \phi_d(x_{ij} \mid \mu^*_{kg^*}, \sigma^*_{kg^*})}{\sum_{k'=1}^{K^*_{g^*}} p^*_{k'g^*}\, \phi_d(x_{ij} \mid \mu^*_{k'g^*}, \sigma^*_{k'g^*})} \tag{3.9} $$
for $k = 1, 2, \ldots, K^*_{g^*}$. The allocation probability of all the $x_{ij}$ to the $K^*_{g^*}$ components is the product of the above probabilities, namely,
$$ P(\text{merge-alloc}) \;=\; \prod_{i\,:\, W^*_i = g^*}\; \prod_{j=1}^{n_i} P\big(Z^*_{ij} = k_{ij} \mid W^*_i = g^*\big), \tag{3.10} $$
where $k_{ij}$ are the realized values of $k$ when the allocation is done for each observation $x_{ij}$.

When $K_t$ is odd, an index $i_0$ is selected at random from the set of all odd integers up to $K_t$, namely $\{1, 3, 5, \ldots, K_t\}$. The triplet $(p_{(i_0)}, \mu_{(i_0)}, \sigma_{(i_0)})$ is not merged with any other index, but the new probability is $p^*_{i_0} = p_{(i_0)}/2$. The remaining adjacent indices are merged according to STEP 3 above. For the G-merge step, the proposal density, $q_{\bar m}(x, y)$, is given by
$$ q_{\bar m}(x, y) \;=\; \begin{cases} \binom{G}{2}^{-1} P(\text{merge-alloc}) & \text{if } K_t \text{ is even, and} \\[4pt] \binom{G}{2}^{-1} \dfrac{2}{K_t + 1}\, P(\text{merge-alloc}) & \text{if } K_t \text{ is odd.} \end{cases} \tag{3.11} $$
This completes the G-merge move.

3.1.2. The G-split move. The split move is the reverse of the merge move above and is carried out in the following steps:

STEP 1: A candidate $G$-component for splitting, say $g$, is chosen randomly with probability $1/G$. The split components are denoted by $g_1$ and $g_2$. The first-level mixing probability, $\omega_g$, is split into $\omega^*_{g_1}$ and $\omega^*_{g_2}$ by generating a uniform random variable, $u_0$, on $[0, 1]$ and setting $\omega^*_{g_1} = u_0\, \omega_g$ and $\omega^*_{g_2} = (1 - u_0)\, \omega_g$.

STEP 2: The value of $K_g$ is transformed to $K_t$, where $K_t$ is either $2K_g - 1$ or $2K_g$ with probability $1/2$ each. Once $K_t$ is determined, a pair of indices $(K_{g_1}, K_{g_2})$ is selected randomly from the set of all possible pairs of integers in $\{K_{\min}, K_{\min}+1, \ldots, K_{\max}\}^2$ satisfying $K_{g_1} + K_{g_2} = K_t$. If $M_0$ is the total number of such pairs, then the probability of selecting one such pair is $1/M_0$. The selection of $K_{g_1}$ and $K_{g_2}$ determines the number of second-level components in the $g_1$ and $g_2$ groups.

STEP 3: The aim now is to split each component of the triplet $(p_g, \mu_g, \sigma_g)$ into two parts, $(p_{g_1}, \mu_{g_1}, \sigma_{g_1})$ and $(p_{g_2}, \mu_{g_2}, \sigma_{g_2})$, such that both $\mu_{g_1}$ and $\mu_{g_2}$ satisfy the constraints (1.6) for $g = g_1$ and $g_2$. The case $K_{g_1} + K_{g_2} = 2K_g$ is considered first.

Fig 1. Diagram showing the split of $2p_g$, $\mu_g$ and $\sigma_g$. The partial ordering refers to the ordering of the $\mu^{(1)}_{kg}$'s. The variables $u_{kg}$, $k = 1, 2, \ldots, K_g$, determine how many of the two splits go to component $g_1$ for each $k$. The right arrows represent the sequential split for $\mu_g$ and $\sigma_g$.

A sketch of the split move is best described by the diagram in Figure 1, which introduces the additional variables to be used for performing the split. In Figure 1, $2p_g$ is considered for splitting because the two split components will represent the second-level mixing probabilities of $g_1$ and $g_2$, which together sum to 2. For each $k$, the variable $u_{kg}$ in Figure 1 takes three values, namely 0, 1 and 2, which respectively determine whether the split components of $(2p_{kg}, \mu_{kg}, \sigma_{kg})$ (1) both go to component $g_2$, (2) one goes to component $g_1$ and the other to $g_2$, or (3) both go to $g_1$. The variables $u_{kg}$, $k = 1, 2, \ldots, K_g$, must satisfy several constraints: (1) $\sum_{k=1}^{K_g} u_{kg} = K_{g_1}$, (2) $u_{kg} = 1$ for any $k$ such that $p_{kg} > 0.5$, and (3) $\sum_{k : u_{kg} = h} 2p_{kg} < 1$ for $h = 0, 2$. Restriction (1) means that the number of components that go to $g_1$ must be $K_{g_1}$, which has already been pre-selected. The need for restriction (2) can be seen as follows: if $u_{kg} = 0$ or 2 and $p_{kg} > 0.5$, the total probability $2p_{kg}$ would be assigned wholly to $g_1$ or $g_2$, and the sum of the second-level mixing probabilities for that component would be greater than 1, which is not possible. Restriction (3) is necessary to ensure that the second-level mixing probabilities for both $g_1$ and $g_2$ are non-negative (see equation (3.13)). To generate the vector $u \equiv (u_{1g}, u_{2g}, \ldots, u_{K_g g})$, we consider all combinations $u \in \{0, 1, 2\}^{K_g}$ and reject the ones that do not satisfy the three restrictions. From the total number of remaining admissible combinations, $M_1$ say, we select a vector $u$ randomly with equal probability $1/M_1$.
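Because the candidate set has at most $3^{K_g}$ elements, the admissible combinations (and hence $M_1$) can be obtained by direct enumeration. A sketch of that enumeration, checking the three restrictions above (illustrative Python, not the authors' implementation):

```python
import numpy as np
from itertools import product

def admissible_u(p_g, K_g1):
    """Enumerate all u in {0,1,2}^{K_g} satisfying restrictions (1)-(3)."""
    K_g = len(p_g)
    admissible = []
    for u in product((0, 1, 2), repeat=K_g):
        u = np.array(u)
        if u.sum() != K_g1:                                    # restriction (1)
            continue
        if np.any((p_g > 0.5) & (u != 1)):                     # restriction (2)
            continue
        if any((2 * p_g[u == h]).sum() >= 1 for h in (0, 2)):  # restriction (3)
            continue
        admissible.append(u)
    return admissible

p_g = np.array([0.2, 0.3, 0.5])    # illustrative second-level weights
us = admissible_u(p_g, K_g1=3)     # K_g1 was pre-selected in STEP 2
M_1 = len(us)                      # u is then drawn uniformly from these M_1 vectors
```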

Once $u$ has been generated, a random vector $v \equiv (v_{kg},\ k = 1, 2, \ldots, K_g)$ is generated to split $2p_g$ (see Figure 1). Some notation is in order: let $A_0 = \{k : u_{kg} = 0\}$, $A_1 = \{k : u_{kg} = 1\}$ and $A_2 = \{k : u_{kg} = 2\}$. As in the case of $u$, a few restrictions also need to be placed on the vector $v$. To see what these restrictions are, we denote
$$ p^{(1)}_{kg} = 2 v_{kg}\, p_{kg} \quad\text{and}\quad p^{(2)}_{kg} = 2 (1 - v_{kg})\, p_{kg}, \tag{3.12} $$
for $k = 1, 2, \ldots, K_g$, to be the split components of $2p_{kg}$. Note that, depending on the value of $u_{kg} = 0, 1$ or 2, the split components $p^{(1)}_{kg}$ and $p^{(2)}_{kg}$ are either both assigned to component $g_2$, one to $g_1$ and the other to $g_2$, or both to $g_1$. For the case $u_{kg} = 1$, we will assume that $p^{(1)}_{kg}$ is the split probability that goes to $g_1$ and $p^{(2)}_{kg}$ goes to $g_2$. Note that the mixing probabilities for each of the components $g_1$ and $g_2$ should sum to 1. This implies
$$ \sum_{k \in A_1} p^{(1)}_{kg} + \sum_{k \in A_2} 2p_{kg} = 1 \quad\text{and}\quad \sum_{k \in A_1} p^{(2)}_{kg} + \sum_{k \in A_0} 2p_{kg} = 1, \tag{3.13} $$
for components $g_1$ and $g_2$, respectively. The second equation of (3.13) is redundant given the first, since $\sum_{k \in A_1} p^{(1)}_{kg} + \sum_{k \in A_2} 2p_{kg} + \sum_{k \in A_1} p^{(2)}_{kg} + \sum_{k \in A_0} 2p_{kg} = 2 \sum_{k=1}^{K_g} p_{kg} = 2$. We re-write the first equation as
$$ \sum_{k \in A_1} a_k v_{kg} = 1, \tag{3.14} $$
where $a_k = 2p_{kg} \big/ \big(1 - \sum_{k' \in A_2} 2p_{k'g}\big)$. Equation (3.14) implies that the entries of the vector $v$ are required to satisfy two restrictions: (1) $0 \le v_{kg} \le 1$ for $k = 1, 2, \ldots, K_g$, from (3.12), and (2) equation (3.14) above. In the Appendix, an algorithm is given to generate such a $v$ for which the proposal density can be written down in closed form (see (7.2)).

The next step in the G-split move is to split $\mu_g$ and $\sigma_g$. Each component of $\mu_g = (\mu_{kg},\ k = 1, 2, \ldots, K_g)$ and $\sigma_g = (\sigma_{kg},\ k = 1, 2, \ldots, K_g)$ in Figure 1 is split sequentially, starting from $k = 1$, then $k = 2$, and so on until $k = K_g$. At the $k$-th stage, $\mu_{kg}$ is split into the components $y_{kg}$ and $\tilde y_{kg}$, where $y_{kg}$ is a $d$-dimensional vector consisting of the entries $(y^{(1)}_{kg}, y^{(2)}_{kg}, \ldots, y^{(d)}_{kg})$ and $\tilde y_{kg} = (\tilde y^{(1)}_{kg}, \tilde y^{(2)}_{kg}, \ldots, \tilde y^{(d)}_{kg})$. Similarly, $\sigma_{kg}$ is split into the components $z_{kg} \equiv (z^{(1)}_{kg}, z^{(2)}_{kg}, \ldots, z^{(d)}_{kg})$ and $\tilde z_{kg} \equiv (\tilde z^{(1)}_{kg}, \tilde z^{(2)}_{kg}, \ldots, \tilde z^{(d)}_{kg})$. The collections $\{y_{kg},\ k = 1, 2, \ldots, K_g\}$ and $\{z_{kg},\ k = 1, 2, \ldots, K_g\}$ are denoted by $y$ and $z$, respectively, and represent the additional variables that need to be generated for the split via the proposal distribution $q_0(y, z)$, say. The remaining variables (those with tildes) are obtained by solving the vector equations
$$ \frac{p^{(1)}_{kg}\, y_{kg} + p^{(2)}_{kg}\, \tilde y_{kg}}{p^{(1)}_{kg} + p^{(2)}_{kg}} = \mu_{kg} \quad\text{and}\quad \frac{p^{(1)}_{kg}\, z_{kg} + p^{(2)}_{kg}\, \tilde z_{kg}}{p^{(1)}_{kg} + p^{(2)}_{kg}} = \sigma_{kg} \tag{3.15} $$
componentwise.
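Given the generated quantities $(v_{kg}, y_{kg}, z_{kg})$, the tilde variables follow from (3.12) and (3.15) by direct rearrangement. A sketch of that step for a single component $k$ (vectorized over the $d$ coordinates; illustrative Python, not the authors' code):

```python
import numpy as np

def split_component(p_kg, mu_kg, var_kg, v_kg, y_kg, z_kg):
    """Split (2*p_kg, mu_kg, sigma_kg) into two parts per (3.12) and (3.15).

    Returns (p1, p2, y, y_tilde, z, z_tilde); y and z are the generated values,
    and the tildes are solved from the moment-matching equations (3.15).
    """
    p1 = 2.0 * v_kg * p_kg                     # (3.12)
    p2 = 2.0 * (1.0 - v_kg) * p_kg
    # (3.15): (p1*y + p2*y_tilde)/(p1+p2) = mu  =>  y_tilde = ((p1+p2)*mu - p1*y)/p2
    y_tilde = ((p1 + p2) * mu_kg - p1 * y_kg) / p2
    z_tilde = ((p1 + p2) * var_kg - p1 * z_kg) / p2
    # note: the proposal q_0(y, z) must be chosen so that z_tilde stays
    # non-negative and the ordering (1.6) is preserved (see (3.18) below)
    return p1, p2, y_kg, y_tilde, z_kg, z_tilde
```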

We describe the split move further to see what properties $q_0(y, z)$ should satisfy. While the values of $y_{kg}$ and $\tilde y_{kg}$ (respectively, $z_{kg}$ and $\tilde z_{kg}$) are candidate values for $\mu^*_{k g_1}$ and $\mu^*_{k g_2}$ (respectively, $\sigma^*_{k g_1}$ and $\sigma^*_{k g_2}$), they are not quite so yet, since $\mu_{g_1}$ and $\mu_{g_2}$ must satisfy the constraints (1.6). To achieve this, we introduce two functions operating on $d$-dimensional vectors. The vector-valued min and max functions are defined as follows: for each $s = 1, 2, \ldots, S$, let $a_s$ and $b_s$ denote two $d$-dimensional vectors given by $a_s = (a^{(1)}_s, a^{(2)}_s, \ldots, a^{(d)}_s)$ and $b_s = (b^{(1)}_s, b^{(2)}_s, \ldots, b^{(d)}_s)$. We define
$$ \min(a_s, b_s) = a_s \quad\text{and}\quad \max(a_s, b_s) = b_s \tag{3.16} $$
for each $s = 1, 2, \ldots, S$ if $a_1 \preceq b_1$ (recall that this is, by definition, $a^{(1)}_1 \le b^{(1)}_1$), and vice versa when $b_1 \preceq a_1$. Thus, the maximum and minimum functions above operate on the indices $s \ne 1$, with output depending on the index $s = 1$. In the present case, consider the maximum and minimum functions defined as in (3.16) for each $k = 1, 2, \ldots, K_g$. Here, $S = 3$ with $a_1 = y_{kg}$, $a_2 = z_{kg}$ and $a_3 = p^{(1)}_{kg}$, and $b_1 = \tilde y_{kg}$, $b_2 = \tilde z_{kg}$ and $b_3 = p^{(2)}_{kg}$. If $y_{kg} \preceq \tilde y_{kg}$, then it follows that
$$ \min(y_{kg}, \tilde y_{kg}) = y_{kg}, \quad \max(y_{kg}, \tilde y_{kg}) = \tilde y_{kg}, \quad \min(z_{kg}, \tilde z_{kg}) = z_{kg}, \quad \max(z_{kg}, \tilde z_{kg}) = \tilde z_{kg}, \quad\text{and}\quad \min(p^{(1)}_{kg}, p^{(2)}_{kg}) = p^{(1)}_{kg}, \quad \max(p^{(1)}_{kg}, p^{(2)}_{kg}) = p^{(2)}_{kg}; \tag{3.17} $$
if $\tilde y_{kg} \preceq y_{kg}$, the opposite holds true. To ensure that the constraints (1.6) hold, $y_{kg}$ is generated in such a way that
$$ \max(y_{k-1,g}, \tilde y_{k-1,g}) \;\preceq\; y_{kg},\, \tilde y_{kg} \;\preceq\; \mu_{k+1,g} \tag{3.18} $$
in the sequential procedure for $k = 1, 2, \ldots, K_g$. In (3.18), the maximum function $\max(y_{0g}, \tilde y_{0g})$ for $k = 1$ (respectively, $\mu_{K_g+1, g}$ for $k = K_g$) is defined to be the vector of lower (respectively, upper) bounds for the means. In the application to fingerprint images in Section 6, each image has a fixed size, which implies that the componentwise lower and upper bounds are, respectively, 0 and 500. For $z_{kg}$ and $\tilde z_{kg}$, we require these variables to satisfy the constraints $z_{kg} \ge 0$ and $\tilde z_{kg} \ge 0$ componentwise, since they are candidate values for $\sigma^*_{k g_1}$ and $\sigma^*_{k g_2}$. Thus, the generation of $y$ and $z$ via the proposal distribution $q_0(y, z)$ requires that (3.18), $z_{kg} \ge 0$ and $\tilde z_{kg} \ge 0$ be satisfied. A proposal density that achieves this is discussed in the Appendix.

The values of each triplet $(p_{g_1}, \mu_{g_1}, \sigma_{g_1})$ and $(p_{g_2}, \mu_{g_2}, \sigma_{g_2})$ can now be obtained. A sequential procedure is again adopted. The post-split parameters $p_{g_h}$, $\mu_{g_h}$ and $\sigma_{g_h}$, $h = 1, 2$, are initialized to the empty set. Starting from $k = 1$, the sets are appended as follows. For $h = 0, 2$, define $h_1 = 2$ and $h_2 = 1$ if $h = 0$, and $h_1 = 1$ and $h_2 = 2$ if $h = 2$. For $k = 1, 2, \ldots, K_g$, if $k \in A_h$,
$$ \begin{aligned} p_{g_{h_1}} &= \big(p_{g_{h_1}}, \min(p^{(1)}_{kg}, p^{(2)}_{kg}), \max(p^{(1)}_{kg}, p^{(2)}_{kg})\big), & p_{g_{h_2}} &= p_{g_{h_2}}, \\ \mu_{g_{h_1}} &= \big(\mu_{g_{h_1}}, \min(y_{kg}, \tilde y_{kg}), \max(y_{kg}, \tilde y_{kg})\big), & \mu_{g_{h_2}} &= \mu_{g_{h_2}}, \\ \sigma_{g_{h_1}} &= \big(\sigma_{g_{h_1}}, \min(z_{kg}, \tilde z_{kg}), \max(z_{kg}, \tilde z_{kg})\big), & \sigma_{g_{h_2}} &= \sigma_{g_{h_2}}, \end{aligned} \tag{3.19} $$
and if $k \in A_1$, $p_{g_1} = (p_{g_1}, p^{(1)}_{kg})$, $p_{g_2} = (p_{g_2}, p^{(2)}_{kg})$, $\mu_{g_1} = (\mu_{g_1}, y_{kg})$, $\mu_{g_2} = (\mu_{g_2}, \tilde y_{kg})$, $\sigma_{g_1} = (\sigma_{g_1}, z_{kg})$, and $\sigma_{g_2} = (\sigma_{g_2}, \tilde z_{kg})$.

The above procedure guarantees that the post-split components $\mu_{g_1}$ and $\mu_{g_2}$ satisfy the constraints (1.6). At this point, we can explicitly determine some of the components of $y$ in equation (3.4): we have $G^* = G + 1$, $K^* = K \cup \{K_{g_1}, K_{g_2}\} \setminus \{K_g\}$, $p^* = p \cup \{p_{g_1}, p_{g_2}\} \setminus \{p_g\}$, $\mu^* = \mu \cup \{\mu_{g_1}, \mu_{g_2}\} \setminus \{\mu_g\}$ and $\sigma^* = \sigma \cup \{\sigma_{g_1}, \sigma_{g_2}\} \setminus \{\sigma_g\}$.

When $K_{g_1} + K_{g_2} = 2K_g - 1$, an index $i_0$ is selected from the set $I_0 = \{k : 2p_{kg} < 1\}$ with probability $1/|I_0|$. The component with index $i_0$ is not split, and is assigned a value of $u_{i_0 g}$ of either 0 or 2. For this case, $u = (u_{1g}, u_{2g}, \ldots, u_{K_g - 1, g}, u_{i_0 g})$ is chosen from the product space $\{0, 1, 2\}^{K_g - 1} \times \{0, 2\}$, with $M_1$ denoting the number of admissible combinations satisfying the three restrictions on $u$. After selecting a $u$, we define $p^*_{i_0 g^*} = 2p_{i_0 g}$, $y_{i_0 g} = \tilde y_{i_0 g} = \mu_{i_0 g}$, and $z_{i_0 g} = \tilde z_{i_0 g} = \sigma_{i_0 g}$, where $g^*$ is either $g_1$ or $g_2$ depending on the selected $u_{i_0 g}$. The split procedure above is carried out for the remaining indices $k \ne i_0$.

STEP 4: To complete the G-split proposal, we need to obtain the new first- and second-level labels, $W^*$ and $Z^*$, in $y$ (see (3.4)). All objects with labels $W_i = g$ are split into either $W^*_i = g_1$ or $W^*_i = g_2$ with allocation probabilities obtained as follows. Define $Q_i(g_h) = \prod_{j=1}^{n_i} \sum_{k=1}^{K_{g_h}} p_{k g_h}\, \phi_d(x_{ij} \mid \mu_{k g_h}, \sigma_{k g_h})$ with $h = 1, 2$ for the $i$-th object. The $W$-allocation probabilities for components $g_1$ and $g_2$ are given by
$$ P(W^*_i = g_h) \;=\; \frac{Q_i(g_h)}{Q_i(g_1) + Q_i(g_2)} \tag{3.20} $$
for $h = 1, 2$. Once $W^*_i$ has been determined, the $Z^*_{ij}$'s are determined from the Bayes allocation probabilities (3.9), denoted here by $Q_{ij}(k, g_h)$ for $h = 1, 2$. It follows that the allocation probability for the G-split move is
$$ P(\text{split-alloc}) \;=\; \prod_{h=1,2}\; \prod_{i\,:\, W^*_i = g_{ih}} Q_i(g_{ih}) \prod_{j=1}^{n_i} Q_{ij}(k_{ij}, g_{ih}), \tag{3.21} $$
where $g_{ih}$ is the realized value of $g_h$ for the $i$-th object, and $k_{ij}$ are the realized values of $k$ for the second-level labels $Z^*_{ij}$. Dass and Li (2008) give the proposal density for the G-split move as
$$ q_m(x, y) \;=\; \frac{R_0\; q_0(v)\; q_0(y, z)\; P(\text{split-alloc})}{G\, M_0\, M_1} \Big/ \left| \det\!\left[ \frac{\partial y}{\partial (x, u)} \right] \right|, \tag{3.22} $$
where $R_0 = 1/2$ or $1/(2|I_0|)$ according to whether $K_{g_1} + K_{g_2} = 2K_g$ or $2K_g - 1$ is chosen; in (3.22),
$$ \left| \det\!\left[ \frac{\partial y}{\partial (x, u)} \right] \right| \;=\; 2^{2K_g - 1}\, \omega_g \prod_{k \in A_0 \cup A_2 \cup A_1^c} 2p_{kg} \prod_{k=1}^{K_g} \Big(1 + \frac{p^{(1)}_{kg}}{p^{(2)}_{kg}}\Big)^{2d} \tag{3.23} $$
is the absolute value of the Jacobian of the transformation from $(x, u)$ to $y$, and $A_1^c$ is the set $A_1$ excluding its largest element. The explicit expression (3.23) is derived in the Appendix.
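The label allocation in STEP 4 is the piece of the proposal that involves the data. The sketch below carries out, for one object, the W-allocation (3.20) followed by the within-cluster Z-allocation (3.9), accumulating the log of the realized allocation probabilities that enter (3.21); it assumes diagonal-Gaussian components and is an illustration rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(3)

def allocate_object(x_i, p_h, mu_h, var_h):
    """Allocate one object (and its observations) between split components g1, g2.

    x_i : (n_i, d) observations of object i
    p_h, mu_h, var_h : length-2 lists of second-level parameters for (g_1, g_2)
    Returns (h, log allocation probability contribution).
    """
    def log_phi(h):
        return norm.logpdf(x_i[:, None, :], mu_h[h][None, :, :],
                           np.sqrt(var_h[h])[None, :, :]).sum(axis=2)

    # log Q_i(g_h) = sum_j log sum_k p_kh * phi_d(x_ij | mu_kh, sigma_kh)
    log_Q = [logsumexp(np.log(p_h[h])[None, :] + log_phi(h), axis=1).sum()
             for h in range(2)]
    log_w = np.array(log_Q) - logsumexp(log_Q)          # (3.20)
    h = rng.choice(2, p=np.exp(log_w))
    log_alloc = log_w[h]

    # (3.9): Bayes allocation of each observation within the chosen g_h
    log_post = np.log(p_h[h])[None, :] + log_phi(h)
    log_post -= logsumexp(log_post, axis=1, keepdims=True)
    for j in range(x_i.shape[0]):
        k = rng.choice(log_post.shape[1], p=np.exp(log_post[j]))
        log_alloc += log_post[j, k]
    return h, log_alloc
```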

Fig 2. The G-split and G-merge proposals as a reversible pair of moves: the G-merge takes $x$ to $y$, and the G-split takes $(y, u)$ back to $x$.

We conclude the G-split and G-merge sections with a note on establishing the reversibility of the two moves. The G-merge proposal move $\bar m$ takes $x$ to $y$ with the proposal density $q_{\bar m}(x, y)$ given in (3.11). However, the acceptance probability in (3.1) also requires the proposal density, $q_m(y, x)$, for moving from $y$ to $x$ based on the reverse move $m$. In order to show that $m$ is precisely the G-split move, we need to show that, given $x$ and $y$, there is a unique $u$ such that $x$ can be obtained from the combination $(y, u)$ via the G-split move (see Figure 2). We demonstrate this in the next paragraph.

For the G-split move, the vector of auxiliary variables is given by $u = (u_0, K_t, u, v, y, z)$. These variables have the same interpretation as in the G-split move discussed earlier. Now we check whether $u$ can be determined from $x$ and $y$. First, the variable $u_0$ can be determined from $u_0 = \omega_{g_1}/\omega_g$. Second, the value of $K_t = K_{g_1} + K_{g_2}$. Note that $K_g$ alone cannot determine $K_t$, since $K_t$ is either $2K_g$ or $2K_g - 1$ with probability $1/2$ each; however, with information on $K_{g_1}$ and $K_{g_2}$, $K_t$ is uniquely determined. Next, to get $u$, we rearrange the components of $\mu_{g_1}$ and $\mu_{g_2}$ in the increasing order (3.6). If $K_t$ is even, $K_g = K_t/2$ adjacent pairs of $\mu$'s are formed, and $u_{kg}$ is assigned the value 0, 1 or 2, since it is known from which component (either $g_1$ or $g_2$) the two means in each of the $k$ pairs came. The case of odd $K_t$ can be handled similarly, since one of the $\mu$ components is not paired, and subsequently $u$ for that component is either 2 or 0 depending on whether that $\mu$ component came from $g_1$ or $g_2$. Once $u$ is obtained, $v$ can be determined in the following way: the components of $p_{g_1}$ and $p_{g_2}$ are arranged according to the increasing order of the $\mu$'s. Suppose the $k$-th pair consists of $\mu_{k'g'} \preceq \mu_{k''g''}$ with corresponding probabilities $p_{k'g'}$ and $p_{k''g''}$, where $k'$, $k''$, $g'$ and $g''$ are the indices of $k$ and $g$ resulting from the ordering. The value of $v_{kg} = p_{k'g'}/(2p_{kg})$, where $p_{kg} = (p_{k'g'} + p_{k''g''})/2$ is the merged probability in the G-merge move. Next, we obtain the values of $y_{kg}$ and $z_{kg}$. In the case $u_{kg} = 1$, $y_{kg}$ (respectively, $\tilde y_{kg}$) equals the $\mu$-value that came from component $g_1$ (respectively, $g_2$) in the pair $(\mu_{k'g'}, \mu_{k''g''})$. In the case when $u_{kg} = 0$ or 2, $y_{kg}$ and $\tilde y_{kg}$ can be determined only up to $\min(y_{kg}, \tilde y_{kg})$ and $\max(y_{kg}, \tilde y_{kg})$. Subsequently, the proposal density

$q_0(y, z)$ depends only on $\min(y_{kg}, \tilde y_{kg})$ and $\max(y_{kg}, \tilde y_{kg})$ for these values of $u$. A similar argument can be made for $z_{kg}$ and $\tilde z_{kg}$.

3.2. Update K. We consider the move types K-split (type $m$) and K-merge (type $\bar m$). The update of $K$ is carried out for fixed $G$ by selecting a component $g$ for which $K_g$ will be updated to either $K_g - 1$ or $K_g + 1$. For the K-merge move, we select two adjacent components for merging, where adjacency is determined by the partial ordering (1.6). The merged mixing probability, mean and variance for the new component, $k^*$, are given by
$$ p^*_{k^* g} = p_{kg} + p_{k+1,g}, \quad \mu^*_{k^* g} = \frac{p_{kg}\, \mu_{kg} + p_{k+1,g}\, \mu_{k+1,g}}{p_{kg} + p_{k+1,g}} \quad\text{and}\quad \sigma^*_{k^* g} = \frac{p_{kg}\, \sigma_{kg} + p_{k+1,g}\, \sigma_{k+1,g}}{p_{kg} + p_{k+1,g}}. \tag{3.24} $$
The observations with $W_i = g$ and $Z_{ij} = k$ or $k+1$ are merged into a newly relabelled bin with $W^*_i = g$ and $Z^*_{ij} = k^*$.

For the K-split move, we first select the component to be split, $k$, which will be split into $k_1$ and $k_2$. A uniform random variable $u_0$ is generated to split $p_{kg}$ into $p^*_{k_1 g}$ and $p^*_{k_2 g}$ in the following way:
$$ p^*_{k_1 g} = u_0\, p_{kg} \quad\text{and}\quad p^*_{k_2 g} = (1 - u_0)\, p_{kg}. \tag{3.25} $$
Next, $\mu_{kg}$ and $\sigma_{kg}$ are split by generating the variables $y_{kg}$ and $z_{kg}$ as in the G-split move, but now for a single $k$ only. As in the G-split move, $y_{kg}$ and $\tilde y_{kg}$ are required to satisfy a similar constraint of the form
$$ \mu_{k-1,g} \;\preceq\; y_{kg},\, \tilde y_{kg} \;\preceq\; \mu_{k+1,g}, \tag{3.26} $$
so that the post-split $\mu$ parameters satisfy (3.26) and, subsequently, (1.6). Once $y_{kg}$ and $z_{kg}$ are generated, the assignments to $\mu^*_{k_h g}$ and $\sigma^*_{k_h g}$, $h = 1, 2$, are done as follows: $\mu^*_{k_1 g} = \min(y_{kg}, \tilde y_{kg})$, $\mu^*_{k_2 g} = \max(y_{kg}, \tilde y_{kg})$, $\sigma^*_{k_1 g} = \min(z_{kg}, \tilde z_{kg})$, $\sigma^*_{k_2 g} = \max(z_{kg}, \tilde z_{kg})$. Observations with $W_i = g$ and observation labels $Z_{ij} = k$ are allocated to component $k_1$ or $k_2$ based on the Bayes allocation probabilities (3.9) with fixed $W_i = g$.
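The parameter part of the K-moves is a single-pair version of the earlier moment matching. A sketch of the K-merge (3.24) and of the weight split (3.25); the split means and variances are then obtained exactly as in the G-split move via (3.15) and (3.26). Illustrative Python, not the authors' code:

```python
import numpy as np

def k_merge(p, mu, var, k):
    """Merge adjacent second-level components k and k+1 of one cluster (3.24)."""
    w = p[k] + p[k + 1]
    mu_star = (p[k] * mu[k] + p[k + 1] * mu[k + 1]) / w
    var_star = (p[k] * var[k] + p[k + 1] * var[k + 1]) / w
    p_new = np.delete(p, k + 1); p_new[k] = w
    mu_new = np.delete(mu, k + 1, axis=0); mu_new[k] = mu_star
    var_new = np.delete(var, k + 1, axis=0); var_new[k] = var_star
    return p_new, mu_new, var_new

def k_split_weights(p, k, u0):
    """Split the k-th mixing probability using a uniform u0 in [0, 1] (3.25)."""
    p_new = np.insert(p, k + 1, (1.0 - u0) * p[k])
    p_new[k] = u0 * p[k]
    return p_new
```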

3.4. Update Other Steps. The updates of the other quantities in steps 3-5 of equation (3.3) can be done via a regular Gibbs sampler, since they do not involve models in spaces of varying dimensions. We give the summary steps here for completeness.

Update $\omega \mid G, K, W, Z, p, \mu, \sigma$: The conditional posterior distribution of $\omega$ given the remaining parameters is given by
$$ \pi(\omega \mid \cdots) \;\propto\; \prod_{g=1}^{G} \omega_g^{\delta_\omega + N_g - 1}\; I(\omega_1 < \omega_2 < \cdots < \omega_G), \tag{3.27} $$
where $N_g = \sum_{i=1}^{N} I(W_i = g)$ is the number of objects with label $W_i = g$. Equation (3.27) is the order-statistic distribution of a Dirichlet with parameters $(\delta_\omega + N_1, \delta_\omega + N_2, \ldots, \delta_\omega + N_G)$ and can easily be simulated from.

Update $(W, Z) \mid G, K, \omega, p, \mu, \sigma$: The conditional posterior distributions of the $(W_i, Z_i)$ are independent of each other. The update of $W_i$, and of $Z_i$ given $W_i$, based on the conditional posterior distribution is the Bayes allocation scheme of (3.20) and (3.9).

Update $(p, \mu, \sigma) \mid G, K, \omega, W, Z$: The conditional posterior distribution of $p = \{p_g,\ g = 1, 2, \ldots, G\}$ is given by
$$ \pi(p \mid \cdots) \;\propto\; \prod_{g=1}^{G} \prod_{k=1}^{K_g} p_{kg}^{\delta_p + N_{kg} - 1}, \tag{3.28} $$
where $N_{kg} = \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(W_i = g, Z_{ij} = k)$ is the number of observations $x_{ij}$ with $W_i = g$ and $Z_{ij} = k$. Thus, each $p_g$ is independently Dirichlet with parameters $(\delta_p + N_{1g}, \delta_p + N_{2g}, \ldots, \delta_p + N_{K_g g})$. The update of $\mu$ is carried out by generating from its conditional posterior distribution $\pi(\mu \mid \cdots)$. The generation scheme for $\mu$ is as follows:
$$ \big(\mu^{(1)}_{1g}, \mu^{(1)}_{2g}, \ldots, \mu^{(1)}_{K_g g}\big) \;\sim\; \prod_{k=1}^{K_g} \phi_1\big(\mu^{(1)}_{kg} \mid \xi^{(1)}_{kg}, \eta^{(1)}_{kg}\big)\; I\big\{\mu^{(1)}_{1g} < \mu^{(1)}_{2g} < \cdots < \mu^{(1)}_{K_g g}\big\} \tag{3.29} $$
independently for each $g = 1, 2, \ldots, G$, and, for the remaining components,
$$ \mu^{(b)}_{kg} \;\sim\; \phi_1\big(\mu^{(b)}_{kg} \mid \xi^{(b)}_{kg}, \eta^{(b)}_{kg}\big) \tag{3.30} $$
independently for each $b \ge 2$, $k = 1, 2, \ldots, K_g$ and $g = 1, 2, \ldots, G$; in (3.29) and (3.30),
$$ \xi^{(b)}_{kg} = \frac{N_{kg}\, \bar x^{(b)}_{kg}/\sigma^{(b)2}_{kg} + \mu_0/\tau^2}{N_{kg}/\sigma^{(b)2}_{kg} + 1/\tau^2} \quad\text{and}\quad \eta^{(b)}_{kg} = \Big(N_{kg}/\sigma^{(b)2}_{kg} + 1/\tau^2\Big)^{-1}, \tag{3.31} $$
where $\bar x^{(b)}_{kg}$ is the average of the $x^{(b)}_{ij}$ over the observations with $W_i = g$ and $Z_{ij} = k$. Equation (3.29) is the distribution of the order statistics from independent normals with different means and variances and can be simulated easily. The variances, $\sigma$, are updated via
$$ \sigma^{(b)2}_{kg} \;\sim\; IG\big(\sigma^{(b)2}_{kg} \mid \alpha^{(b)}_{kg}, \beta^{(b)}_{kg}\big), \tag{3.32} $$
independently of each other, where $\alpha^{(b)}_{kg} = \alpha_0 + N_{kg}/2$ and $\beta^{(b)}_{kg} = \big(1/\beta_0 + \sum_{(ij)} (x^{(b)}_{ij} - \mu^{(b)}_{kg})^2/2\big)^{-1}$, with $\sum_{(ij)}$ denoting the sum over all observations with $W_i = g$ and $Z_{ij} = k$.
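These fixed-dimension updates are standard conjugate draws; the only twist is the ordering constraint on $\omega$, which can be handled by sorting a Dirichlet draw since (3.27) is its order-statistic distribution. A sketch of the $\omega$, $p$ and variance updates (the inverse-gamma parameterization matches the posterior scale in (3.32); illustrative Python, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)

def update_omega(N_g, delta_omega):
    """Draw omega from (3.27): order statistics of Dirichlet(delta+N_1,...,delta+N_G)."""
    return np.sort(rng.dirichlet(delta_omega + np.asarray(N_g, dtype=float)))

def update_p(N_kg, delta_p):
    """Draw p_g from (3.28): Dirichlet(delta_p + N_1g, ..., delta_p + N_{K_g g})."""
    return rng.dirichlet(delta_p + np.asarray(N_kg, dtype=float))

def update_var(x_bin, mu_kg, alpha0, beta0):
    """Draw the variances of one (k, g) bin from (3.32).

    x_bin : (N_kg, d) observations with W_i = g and Z_ij = k
    Uses the IG(a, b) parameterization with density proportional to
    s^-(a+1) exp(-1/(b s)), matching the posterior scale in (3.32).
    """
    N_kg, d = x_bin.shape
    a = alpha0 + N_kg / 2.0
    b = 1.0 / (1.0 / beta0 + ((x_bin - mu_kg) ** 2).sum(axis=0) / 2.0)
    # if T ~ Gamma(shape=a, scale=b) then 1/T ~ IG(a, b) in this parameterization
    return 1.0 / rng.gamma(shape=a, scale=b, size=d)
```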

Fig 3. (a) Density and (b) scatter plots for objects with univariate and bivariate observables, corresponding to d = 1 and d = 2, respectively.

3.5. Update Empty Components. The RJMCMC sampler developed here also incorporates the updating of empty components into the chain. This is done with some modification of the earlier G and K move types. Empty components can arise naturally in the sampler when allocating the observations to the $g$ or $k$ components in both the G-split and K-split moves. In the case of the G-split move, for example, it is possible that no objects are allocated to one of the split $g$ components. Instead of rejecting such a proposal altogether, we incorporate an additional variable, $E_g$, that indicates whether the $g$ component is empty; $E_g = 1$ (respectively, 0) indicates that a component is non-empty (respectively, empty). The introduction of $E_g$ adds two further steps to the RJMCMC algorithm, namely E-Add and E-Remove, which are reversible to each other. In the E-Remove move, an empty $g$ component is selected for removal. The only change in this case is in the sub-population parameters $\omega$, since after the removal of $\omega_g$ the remaining $\omega$ probabilities should sum to 1. We thus have
$$ \omega^*_{g'} = \omega_{g'}/(1 - \omega_g) \tag{3.33} $$
for $g' \ne g$. In the E-Add move type, a uniform random variable $u_0$ on $[0, 1]$ is generated and the probabilities $\omega$ are redistributed to include the empty component $g$ according to
$$ \omega^*_g = u_0 \quad\text{and}\quad \omega^*_{g'} = (1 - u_0)\, \omega_{g'} \;\text{ for } g' \ne g. \tag{3.34} $$
The proposal distributions for the E-Add and E-Remove move types and the associated Jacobians are given in the Appendix. The E-Add and E-Remove reversible move types for the $K$ components are similar and are therefore not discussed further.

4. Convergence Diagnostics. The assessment of convergence of the RJMCMC is carried out based on the methodology of Brooks and Giudici (1998, 2000), who suggest running $I \ge 2$ chains from different starting values and monitoring parameters that maintain the same interpretation across different models. Six quantities are used for monitoring, namely the overall variance, $\hat V$, the within-chain variance, $W_c$, the within-model variance, $W_m$, the within-model within-chain variance, $W_m W_c$, the between-model variance, $B_m$, and the between-model within-chain variance, $B_m W_c$.

Fig 4. Convergence diagnostics for d = 1. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as functions of the iterations. The x-axis unit is 10,000 iterations.

For each monitoring parameter, the corresponding three plots of $\hat V$ and $W_c$, $W_m$ and $W_m W_c$, and $B_m$ and $B_m W_c$ against the number of iterations should be close to each other, indicating that the chains have mixed sufficiently. Our choice of monitoring parameter is the log-likelihood of the hierarchical mixture model.

5. Simulation. Two simulation experiments were carried out for the cases $d = 1$ and $d = 2$, with prior parameter specifications given by $G_{\min} = K_{\min} = 2$, $G_{\max} = K_{\max} = 5$, $\delta_p = \delta_\omega = 1$, $\mu_0 = 7$, $\tau_0 = 20$, $\alpha_0 = 2.04$ and $\beta_0 = 0.5/(\alpha_0 - 1)$. The population of objects was simulated from $G = 3$ groups with population proportions $\omega = (0.2, 0.3, 0.5)$. The nested $K$-components were chosen to be $K_g = 3$ for all $g = 1, 2, 3$. The specifications of $p$, $\mu$ and $\sigma$ are as follows: $p_g = (0.33, 0.33, 0.34)$ for all $g$, $\mu_1 = (6, 4, 2)$, $\mu_2 = (5, 7, 9)$ and $\mu_3 = (14, 17, 20)$. Common variances were assumed: $\sigma_g = (0.5, 0.5, 0.5)$ for $g = 1, 2, 3$. For the second experiment with $d = 2$, we took $\mu_1 = \mathbf{1}\,(6, 4, 2)$, $\mu_2 = \mathbf{1}\,(5, 7, 9)$ and $\mu_3 = \mathbf{1}\,(14, 17, 20)$, where $\mathbf{1} = (1, 1)$, so that each component mean has equal coordinates. All component variances in $\sigma$ were taken to be 0.5. The total number of objects sampled from the population was $N = 100$, with $n_i$, the number of observables from the $i$-th object, drawn iid from a Discrete Uniform distribution on the integers from 20 to 40, both inclusive. The density plot for the 3 components of the hierarchical mixture model in the case $d = 1$, as well as the scatter plot for $d = 2$, based on a sample of observations from the population, are given in Figures 3(a) and (b), respectively.

In both experiments, the RJMCMC algorithm cycles through the 7 updating steps (the 5 steps in (3.3) as well as the 2 steps involving the updating of empty $G$ and $K$ components). We took the probabilities of selecting the various move types to be $r_m = r_{\bar m} = 0.5$ for the moves $(m, \bar m) = (G\text{-split}, G\text{-merge})$ for $G = G_{\min}+1, G_{\min}+2, \ldots, G_{\max}-1$; when $G = G_{\max}$, $r_m = 0 = 1 - r_{\bar m}$, and $r_{\bar m} = 0 = 1 - r_m$ when $G = G_{\min}$. We used the same probabilities for selecting the $K$ move types. The space of hierarchical mixture models with the above specifications consists of $\sum_{G=2}^{5} 4^{G} = 1{,}360$ models, and so monitoring the convergence of the chain becomes important. A total of $I = 3$ chains was chosen for monitoring, with initial estimates of the hierarchical mixture model obtained using the values $G = 2$, 3 and 4;

Fig 5. Convergence diagnostics for d = 2. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as functions of the iterations. The x-axis unit is 10,000 iterations.

Fig 6. Convergence diagnostics for the NIST Fingerprint Database. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as functions of the iterations. The x-axis unit is 20,000 iterations.

Zhu, Dass and Jain (2007) develop an algorithm that fits a hierarchical mixture model based on an agglomerative clustering procedure on the space of standard mixtures, which requires a pre-specified value of $G$ as input; the three choices of $G$ mentioned above were used to obtain the three initial estimates. The RJMCMC algorithm was run for $B = 60{,}000$ iterations and the convergence of the chain was monitored (see Figures 4 and 5). In both experiments, the RJMCMC appears to have converged after 60,000 iterations. The highest posterior probabilities were found to be at the true values of the parameters.

6. An Application: Assessing the Individuality of Fingerprints. Fingerprint individuality refers to the study of the extent of uniqueness of fingerprints. It is the primary measure for assessing the uncertainty involved when individuals are identified based on their fingerprints, and it has been the highlight of many recent court cases. In the case of Daubert v. Merrell Dow Pharmaceuticals (Daubert v. Merrell Dow Pharmaceuticals Inc., 1993), the U.S. Supreme Court ruled that, in order for expert forensic testimony to be allowed in court, it had to be subject to five main criteria of scientific validation, that is, whether (i) the particular technique or methodology has been subjected to statistical hypothesis testing, (ii) its error rates have been established, (iii) standards controlling the technique's operation exist and have been maintained, (iv) it has been peer reviewed, and (v) it has general widespread acceptance (see Pankanti, Prabhakar and Jain 2002, and Zhu et al. 2007).

Fig 7. Posterior distribution of PRC based on 1,000 realizations of the RJMCMC after burn-in.

Fig 8. Illustrating impostor minutiae matching, taken from Pankanti et al. (2002). A total of m = 64 and n = 65 minutiae were detected in the left and right images, respectively, and 25 correspondences (i.e., matches) were found.

Following Daubert, forensic evidence based on fingerprints was first challenged in the 1999 case of U.S. v. Byron C. Mitchell, on the grounds that the fundamental premise for asserting the uniqueness of fingerprints had not been objectively tested and its potential matching error rates were unknown. Subsequently, fingerprint-based identification has been challenged in more than 20 court cases in the United States.

A quantitative measure of fingerprint individuality is given by the probability of a random correspondence (PRC), which is the probability that a random pair of fingerprints in the population will match each other. Mathematically, the PRC is expressed as
$$ PRC(w \mid m, n) \;=\; P(S \ge w \mid m, n), \tag{6.1} $$
where $S$ denotes the number of feature matches, with distribution based on all random pairs of fingerprints from the target population, $w$ is the observed number of matches, and $m$ and $n$, respectively, are the numbers of features in the two fingerprint images. Small (respectively, large) values of the PRC indicate low (respectively, high) uncertainty associated with the identification decision. Here, we focus on a particular type of feature match based on minutiae. Minutiae are locations (i.e., $x_j \in R^2$) on the fingerprint image which correspond to

ridge anomalies (for example, ridge bifurcations and ridge endings) and are used by forensic experts to declare that two fingerprints belong to the same individual if a sufficiently large number of minutiae are found to be common to both prints. Figure 8 shows an example of such a random match between two fingerprints of different individuals (also called an impostor match). The number of matches is determined by the number of minutiae in the right panel that fall within a square of area $4r_0^2$ centered at each minutia in the left panel, where $r_0$ is a small number relative to the size of the fingerprint image. The number of matching minutiae in Figure 8 is 25, but it is not known how likely such a match is between a pair of impostor fingerprints in the population of individuals.

The reliability of the estimated PRC depends on how well the elicited statistical models fit the distribution of minutiae in the population. Thus, candidate statistical models have to meet two important requirements: (i) flexibility, that is, the model can represent minutiae distributions over the entire population, and (ii) associated measures of fingerprint individuality can be easily obtained from these models. Zhu et al. (2007) demonstrated that a mixture of multivariate normals with independent components fits the distribution of minutiae well for each fingerprint. Furthermore, when $m$ and $n$ are large, the distribution of $S$ in (6.1) can be approximated by a Poisson distribution with mean (expected number of matches)
$$ \lambda(q_1, q_2, m, n) \;=\; m\, n\, p(q_1, q_2), \tag{6.2} $$
where $q_h$ represents the normal mixture (see (1.2)) fitted to fingerprint $h$ for $h = 1, 2$, and $p(q_1, q_2)$ is the probability of a match between a pair of random minutiae, one generated from $q_1$ and the other from $q_2$. The analytical expression for $p(q_1, q_2)$ is
$$ p(q_1, q_2) \;=\; 4 r_0^2 \sum_{k=1}^{K_1} \sum_{k'=1}^{K_2} p_{k1}\, p_{k'2} \prod_{b=1}^{2} \phi_1\big(0 \mid \mu^{(b)}_{k1} - \mu^{(b)}_{k'2},\ \sigma^{(b)}_{k1} + \sigma^{(b)}_{k'2}\big), \tag{6.3} $$
where $\phi_1(\cdot \mid \mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$. One drawback of Zhu et al. (2007) is that no statistical model is elicited for the minutiae of a population of fingerprints. The hierarchical mixture model (1.1) is such a population model for minutiae (since $x_j \in R^2$), satisfying both requirements of (i) flexibility and (ii) computational ease mentioned earlier. For a fingerprint pair coming from the subpopulations $g_1$ and $g_2$, we have $q_1 = q_{g_1}$ and $q_2 = q_{g_2}$ in (6.2). Hence, it follows that the mean PRC corresponding to $w$ observed matches in the population is given by
$$ PRC(w \mid m, n) \;=\; \sum_{g_1=1}^{G} \sum_{g_2=1}^{G} \omega_{g_1} \omega_{g_2}\; P\big(S \ge w \mid \lambda(q_{g_1}, q_{g_2}, m, n)\big), \tag{6.4} $$
where $S$ follows a Poisson distribution with mean $\lambda(q_{g_1}, q_{g_2}, m, n)$. The RJMCMC algorithm developed in the previous sections can now be used to obtain the posterior distribution of PRC. As an illustration, we considered 100 fingerprint images from the NIST Special Database 4 as a sample from a population of fingerprints.
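Given one posterior draw of the hierarchical mixture parameters, the mean PRC (6.4) is a finite double sum over subpopulation pairs, with the Poisson survival function supplying $P(S \ge w)$. A sketch of that computation (SciPy for the Poisson tail; the mixture weights in (6.3) are written out explicitly, and the $\sigma$'s are the component variances as in (1.4)); averaging the result over RJMCMC draws yields the posterior distribution of PRC shown in Figure 7:

```python
import numpy as np
from scipy.stats import norm, poisson

def match_prob(p1, mu1, var1, p2, mu2, var2, r0):
    """p(q_1, q_2) of (6.3): probability that two random minutiae, one from each
    2-D mixture, fall within the same 2*r0 x 2*r0 square."""
    total = 0.0
    for k in range(len(p1)):
        for kk in range(len(p2)):
            dens = np.prod(norm.pdf(mu1[k] - mu2[kk], loc=0.0,
                                    scale=np.sqrt(var1[k] + var2[kk])))
            total += p1[k] * p2[kk] * dens
    return 4.0 * r0 ** 2 * total

def mean_prc(omega, p, mu, var, m, n, w, r0):
    """Mean PRC (6.4) under one posterior draw of the hierarchical mixture."""
    G = len(omega)
    prc = 0.0
    for g1 in range(G):
        for g2 in range(G):
            lam = m * n * match_prob(p[g1], mu[g1], var[g1],
                                     p[g2], mu[g2], var[g2], r0)   # (6.2)
            prc += omega[g1] * omega[g2] * poisson.sf(w - 1, lam)  # P(S >= w)
    return prc
```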

A total of $I = 3$ chains were run, with starting values given by the algorithm of Zhu et al. (2007) for the cases $G = 1$, 2 and 3. Figure 6 gives the diagnostic plots of the RJMCMC sampler, which establish convergence after a burn-in of $B = 100{,}000$ runs. The posterior distribution of PRC corresponding to $m = 64$, $n = 65$, $w = 25$ and $r_0 = 15$ pixels, based on 1,000 realizations of the RJMCMC after the burn-in period, is given in Figure 7. The posterior mean and standard deviation in Figure 7 are  and , respectively, and the 95% HPD interval is approximately [0.63, 0.735]. We conclude that if a fingerprint pair were chosen from this population with $m = 64$, $n = 65$ and an observed number of matches $w = 25$, there would be high uncertainty in making a positive identification. Our analysis actually indicates that the fingerprints in Figure 8 represent a typical impostor pair. The 95% HPD set suggests that the PRC can be as high as 0.735; that is, about 3 in every 4 impostor pairs can yield 25 or more matches.

7. Summary and Future Work. We have developed Bayesian methodology for inference on hierarchical mixture models, with application to the study of fingerprint individuality. Our future work will be to derive hierarchical mixture models on the extended feature space consisting of minutiae and other fingerprint features. In this paper, we considered only a two-level hierarchy. The US-VISIT program now requires individuals to submit prints from all 10 fingers. This is the case of a 3-level hierarchical mixture model: at the first (top) level, individuals form clusters based on similar characteristics of their 10 fingers, and the distribution of features in each finger is modelled using standard mixtures. Hierarchical mixture models have potential uses in other areas as well, including the clustering of soil samples (objects) based on soil characteristics, which can be modelled by a mixture or a transformation of mixtures.

Acknowledgment. The authors wish to acknowledge the support of NSF DMS grant no while conducting this research.

References.
[1] Brooks, S. P. and Giudici, P. (1998). Convergence Assessment for Reversible Jump MCMC Simulations. Bayesian Statistics 6, Oxford University Press.
[2] Brooks, S. P. and Giudici, P. (2000). Markov Chain Monte Carlo Convergence Assessment via Two-Way Analysis of Variance. Journal of Computational and Graphical Statistics, 9(2).
[3] Zhu, Y., Dass, S. C., and Jain, A. K. (2007). Statistical Models for Assessing the Individuality of Fingerprints. IEEE Transactions on Information Forensics and Security, 2(3).
[4] Escobar, M. and West, M. (1995). Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90(430).
[5] Roeder, K. and Wasserman, L. (1997). Practical Bayesian Density Estimation Using Mixtures of Normals. Journal of the American Statistical Association, 92(439).
[6] Woo, M. J. and Sriram, T. N. (2006). Robust estimation of mixture complexity for count data. Computational Statistics & Data Analysis, 51(9).
[7] Ishwaran, H., James, L. F., and Sun, J. (2001). Bayesian model selection in finite mixtures by marginal density decompositions. Journal of the American Statistical Association, 96(456).
[8] Green, P. and Richardson, S. (1997). On the Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B, 59(4).

[9] Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4).
[10] NIST Special Database 4. NIST: 8-Bit Gray Scale Images of Fingerprint Image Groups (FIGS). Online:
[11] Pankanti, S., Prabhakar, S., and Jain, A. K. (2002). On the Individuality of Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8).
[12] Daubert v. Merrell Dow Pharmaceuticals Inc. (1993). 509 U.S. 579, 113 S. Ct. 2786, 125 L.Ed.2d 469.

Appendix

Generating v. The generation of $v$ is discussed here. Let $|A_1| = T$ denote the cardinality of the set $A_1$, and let $A_1^c$ be the set of all $k$ indices in $A_1$ excluding the largest one. Without loss of generality, we relabel the $T$ ordered indices in $A_1$ as $1, 2, \ldots, T$; it follows that the indices of $A_1^c$ are $1, 2, \ldots, T-1$. The restriction $\sum_{k \in A_1} a_k v_k = 1$ can be rewritten as $v_T = \big(1 - \sum_{k=1}^{T-1} a_k v_k\big)/a_T$. Since $0 \le v_T \le 1$, it follows that the $T-1$ indices in $A_1^c$ must satisfy the inequality $1 - a_T \le \sum_{k=1}^{T-1} a_k v_k \le 1$. Also note that $0 \le \sum_{k=1}^{T-1} a_k v_k \le \sum_{k=1}^{T-1} a_k$, since each $0 \le v_k \le 1$. Combining these inequalities, we get the following restriction on the $T-1$ free parameters of $v$:
$$ \max\big(0,\ 1 - a_T\big) \;\le\; \sum_{k=1}^{T-1} a_k v_k \;\le\; \min\Big(1,\ \sum_{k=1}^{T-1} a_k\Big). \tag{7.1} $$
Let $C = \{(v_1, v_2, \ldots, v_{T-1}) : \text{equation (7.1) is satisfied and } 0 \le v_k \le 1\}$. It follows that $C$ is a convex polyhedron in the unit hypercube $[0, 1]^{T-1}$. The challenge now is to generate $(v_1, v_2, \ldots, v_{T-1})$ from $C$ and be able to write down the proposal density $q_0(v_1, v_2, \ldots, v_{T-1})$ in closed form. To do this, we determine the set of all extreme points of $C$. There are a total of $T$ inequality constraints on $(v_1, v_2, \ldots, v_{T-1})$: (1) the $T-1$ constraints of the form $0 \le v_k \le 1$, and (2) the one constraint of the form $A \le \sum_{k=1}^{T-1} a_k v_k \le B$ given by equation (7.1). Extreme points of a convex polyhedron in $T-1$ dimensions are formed by $T-1$ active equations. We can select $T-1$ candidate active equations from the $T$ constraints above, solve for $(v_1, v_2, \ldots, v_{T-1})$ based on these $T-1$ equations, and then check, using the remaining constraint, whether the solution obtained is admissible. For example, we may select all $v_k = 0$ for $k = 1, 2, \ldots, T-1$ from (1). Plugging $(0, 0, \ldots, 0)$ into the remaining constraint (2), we get $\sum_{k=1}^{T-1} a_k v_k = 0$. So, if $a_T < 1$ (respectively, $a_T \ge 1$), we get $0 < A$ (respectively, $0 \ge A$), giving an inadmissible (respectively, admissible) solution. Another candidate extreme point is formed by selecting the first $T-2$ $v_k$'s to be zero and solving for $v_{T-1}$ from the equation $\sum_{k=1}^{T-1} a_k v_k = B$. The solution here is $(0, 0, \ldots, 0, B/a_{T-1})$ and will be admissible if $0 \le B/a_{T-1} \le 1$, since the inequality that was not used is $0 \le v_{T-1} \le 1$. It is easy to see that there are $\binom{T}{T-1}\, 2^{T-1}$ candidate extreme points to be checked for admissibility: we first select $T-1$ constraints from the $T$, which can be done in $\binom{T}{T-1} = T$ ways, and next, for each of the $T-1$ selected constraints, we can select either its lower or its upper bound as the candidate active equation.
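A sketch of the candidate-extreme-point enumeration described above: each of the $T \cdot 2^{T-1}$ candidates fixes $T-1$ of the $T$ two-sided constraints at one of their bounds, solves for $v$, and is kept only if the remaining constraint holds. The final draw shown below (a Dirichlet-weighted convex combination of the extreme points) is an illustrative choice only; the paper's actual proposal density $q_0(v)$ in (7.2) is not reproduced here.

```python
import numpy as np
from itertools import product

def extreme_points(a):
    """Candidate extreme points of C for coefficients a = (a_1, ..., a_T); assumes T >= 2.

    C lives in [0,1]^(T-1): 0 <= v_k <= 1 and A <= sum_k a_k v_k <= B, eq. (7.1).
    """
    a = np.asarray(a, dtype=float)
    T = len(a)
    A = max(0.0, 1.0 - a[-1])
    B = min(1.0, a[:-1].sum())
    pts, eps = [], 1e-12
    # case 1: all T-1 box constraints active (corners of the hypercube)
    for corner in product((0.0, 1.0), repeat=T - 1):
        if A - eps <= np.dot(a[:-1], corner) <= B + eps:
            pts.append(np.array(corner))
    # case 2: the sum constraint active at A or B, plus T-2 box constraints
    for free in range(T - 1):                       # coordinate solved from the sum
        for fixed in product((0.0, 1.0), repeat=T - 2):
            for bound in (A, B):
                v = np.empty(T - 1)
                v[np.arange(T - 1) != free] = fixed
                s = np.dot(np.delete(a[:-1], free), fixed)
                v[free] = (bound - s) / a[free]
                if -eps <= v[free] <= 1.0 + eps:
                    pts.append(np.clip(v, 0.0, 1.0))
    return np.unique(np.round(pts, 10), axis=0)

def draw_v(a, rng):
    """Illustrative proposal: Dirichlet-weighted convex combination of extreme points."""
    E = extreme_points(a)
    w = rng.dirichlet(np.ones(len(E)))
    v_free = w @ E                                  # lies in C by convexity
    v_T = (1.0 - np.dot(np.asarray(a)[:-1], v_free)) / a[-1]   # recover v_T from (3.14)
    return np.append(v_free, v_T)
```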


More information

Bayesian Nonparametrics: Dirichlet Process

Bayesian Nonparametrics: Dirichlet Process Bayesian Nonparametrics: Dirichlet Process Yee Whye Teh Gatsby Computational Neuroscience Unit, UCL http://www.gatsby.ucl.ac.uk/~ywteh/teaching/npbayes2012 Dirichlet Process Cornerstone of modern Bayesian

More information

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration Bayes Factors, posterior predictives, short intro to RJMCMC Thermodynamic Integration Dave Campbell 2016 Bayesian Statistical Inference P(θ Y ) P(Y θ)π(θ) Once you have posterior samples you can compute

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Different points of view for selecting a latent structure model

Different points of view for selecting a latent structure model Different points of view for selecting a latent structure model Gilles Celeux Inria Saclay-Île-de-France, Université Paris-Sud Latent structure models: two different point of views Density estimation LSM

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection

A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection Silvia Pandolfi Francesco Bartolucci Nial Friel University of Perugia, IT University of Perugia, IT

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

Bayesian inference for multivariate skew-normal and skew-t distributions

Bayesian inference for multivariate skew-normal and skew-t distributions Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential

More information

Stochastic Realization of Binary Exchangeable Processes

Stochastic Realization of Binary Exchangeable Processes Stochastic Realization of Binary Exchangeable Processes Lorenzo Finesso and Cecilia Prosdocimi Abstract A discrete time stochastic process is called exchangeable if its n-dimensional distributions are,

More information

On the Evidential Value of Fingerprints

On the Evidential Value of Fingerprints On the Evidential Value of Fingerprints Heeseung Choi, Abhishek Nagar and Anil K. Jain Dept. of Computer Science and Engineering Michigan State University East Lansing, MI, U.S.A. {hschoi,nagarabh,jain}@cse.msu.edu

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Introduction to Markov Chain Monte Carlo & Gibbs Sampling Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Probability Models for Bayesian Recognition

Probability Models for Bayesian Recognition Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Hyperparameter estimation in Dirichlet process mixture models

Hyperparameter estimation in Dirichlet process mixture models Hyperparameter estimation in Dirichlet process mixture models By MIKE WEST Institute of Statistics and Decision Sciences Duke University, Durham NC 27706, USA. SUMMARY In Bayesian density estimation and

More information

ARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis

ARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Marginal parameterizations of discrete models

More information

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 18-16th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Trans-dimensional Markov chain Monte Carlo. Bayesian model for autoregressions.

More information

Bayesian time series classification

Bayesian time series classification Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Gentle Introduction to Infinite Gaussian Mixture Modeling

Gentle Introduction to Infinite Gaussian Mixture Modeling Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

Bayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017

Bayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017 Chalmers April 6, 2017 Bayesian philosophy Bayesian philosophy Bayesian statistics versus classical statistics: War or co-existence? Classical statistics: Models have variables and parameters; these are

More information

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model David B Dahl 1, Qianxing Mo 2 and Marina Vannucci 3 1 Texas A&M University, US 2 Memorial Sloan-Kettering

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Assessing Regime Uncertainty Through Reversible Jump McMC

Assessing Regime Uncertainty Through Reversible Jump McMC Assessing Regime Uncertainty Through Reversible Jump McMC August 14, 2008 1 Introduction Background Research Question 2 The RJMcMC Method McMC RJMcMC Algorithm Dependent Proposals Independent Proposals

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Overview. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda

Overview. DS GA 1002 Probability and Statistics for Data Science.   Carlos Fernandez-Granda Overview DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information