A BAYESIAN ANALYSIS OF HIERARCHICAL MIXTURES WITH APPLICATION TO CLUSTERING FINGERPRINTS. By Sarat C. Dass and Mingfei Li Michigan State University
Technical Report RM669, Department of Statistics & Probability, Michigan State University

Hierarchical mixture models arise naturally for clustering a heterogeneous population of objects where the observations made on each object follow a standard mixture density. Hierarchical mixtures utilize complementary aspects of mixtures at the different levels of the hierarchy: at the first (top) level, the mixture is used to perform clustering of the objects, while at the second level, nested mixture models serve as flexible representations of the distributions of observables from each object. Inference for hierarchical mixtures is more challenging since unknown numbers of mixture components arise at both the first and second levels of the hierarchy. In this paper, a Bayesian approach based on Reversible Jump Markov Chain Monte Carlo methodology is developed for inference on all unknown parameters of hierarchical mixtures. Our methodology is then applied to the clustering of fingerprint images and used to assess the variability of quantities which are functions of the second-level mixtures.

1. Hierarchical Mixture Models. Consider an object, O, selected at random from a heterogeneous population, P, with G unknown clusters. Let X = (x_1, x_2, x_3, \ldots) denote the observables on O, where x_j = (x_j^1, x_j^2, \ldots, x_j^d) is a d-variate random vector in R^d.
A hierarchical mixture model for the distribution of O in the population is

q(\mathbf{x}) = \sum_{g=1}^{G} \omega_g \prod_{j=1}^{n} q_g(x_j),   (1.1)

where \mathbf{x} = (x_1, x_2, \ldots, x_n) are the n observations made on O, and \omega_g, g = 1, 2, \ldots, G, are the G cluster proportions with \omega_g > 0 and \sum_{g=1}^{G} \omega_g = 1. Here, q_g is the mixture density for the g-th cluster, given by

q_g(x) = \sum_{k=1}^{K_g} p_{kg} f(x \mid \theta_{kg}),   (1.2)

with f denoting a density with respect to Lebesgue measure on R^d, p_{kg} denoting the mixing probabilities satisfying (1) p_{kg} > 0 and (2) \sum_{k=1}^{K_g} p_{kg} = 1, and \theta_{kg} denoting the set of all unknown parameters in f. Identifiability of the hierarchical mixture model (1.1) is achieved by imposing the constraints

\omega_1 < \omega_2 < \cdots < \omega_G and \theta_{1g} \preceq \theta_{2g} \preceq \cdots \preceq \theta_{K_g g},   (1.3)

Research supported under NSF DMS grant no.
AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35.
Keywords and phrases: Model-based clustering, Gaussian mixtures, Bayesian inference, Reversible Jump Markov Chain Monte Carlo methods, Fingerprint individuality.
for each g = 1, 2, \ldots, G, where \preceq is a partial ordering to be defined later. The set of all unknown parameters in the hierarchical mixture model (1.1) is denoted by x = (G, \omega, K, p, \theta), where \omega = (\omega_1, \omega_2, \ldots, \omega_G), K = (K_1, K_2, \ldots, K_G), p = (p_{kg}, k = 1, 2, \ldots, K_g, g = 1, 2, \ldots, G), and \theta = (\theta_{kg}, k = 1, 2, \ldots, K_g, g = 1, 2, \ldots, G).

Hierarchical mixture models of the form (1.1) arise naturally when the goal is to cluster a population of objects whose observables each follow a standard mixture distribution. At the first (top, or G) level, the mixture is used to perform clustering of the objects, while at the second (or K_g) level, nested mixture models (one nested within each g = 1, 2, \ldots, G specification) are used as flexible representations of the distributions of observables. The unknown number of mixture components, or mixture complexity, arises at both levels of the hierarchy and is therefore more challenging to estimate compared to standard mixtures. Estimating mixture complexity has been the focus of intense research for many years, resulting in various estimation methodologies over a broad application domain. Non-parametric methods were developed in Escobar and West (1995) and Roeder and Wasserman (1997), whereas Ishwaran et al. and Woo and Sriram (2006) developed methodology for the robust estimation of mixture complexity for count data. In a Bayesian framework, the work of Green and Richardson (1997) demonstrated that both issues of estimating mixture complexity and mixture parameters can be unified if viewed as a problem of model selection. With this view in mind, Green and Richardson (1997) developed a Reversible Jump Markov Chain Monte Carlo (RJMCMC) approach for estimating mixture complexity by exploring the space of models of varying dimensions. In this paper, we develop a Bayesian framework based on RJMCMC for inference on hierarchical mixture models.
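To make the two-level structure concrete, the density (1.1)-(1.2) can be evaluated directly. The following is a minimal sketch for the univariate Gaussian case (d = 1); the function names (`phi1`, `q_g`, `q_hier`) are our own illustration and not part of the paper:

```python
import math

def phi1(x, mu, var):
    # Univariate normal density phi_1(x | mu, sigma^2).
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def q_g(x, p, mu, var):
    # Second-level mixture (1.2): q_g(x) = sum_k p_kg * f(x | theta_kg).
    return sum(pk * phi1(x, m, v) for pk, m, v in zip(p, mu, var))

def q_hier(xs, omega, clusters):
    # First-level mixture (1.1): q(x) = sum_g omega_g * prod_j q_g(x_j).
    # `clusters` is a list of (p, mu, var) triples, one per cluster g.
    total = 0.0
    for w, (p, mu, var) in zip(omega, clusters):
        prod = 1.0
        for x in xs:
            prod *= q_g(x, p, mu, var)
        total += w * prod
    return total
```

Note that the product over j sits inside the sum over g: all n observations on one object share the same (unknown) cluster label.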
One advantage of our approach is that we are able to assess the variability of the cluster estimate, which, in turn, can be used to ascertain the variability of quantities that are functions of the clusters. We present one such example based on clustering a sample of fingerprint images. One quantity that is of special interest is the probability of a random correspondence (PRC), which measures the extent to which two randomly chosen fingerprints from a population match each other. The RJMCMC methodology developed in this paper provides an estimate of its mean and variance.

In the subsequent text, we assume each f is multivariate normal with mean vector \mu = (\mu^1, \mu^2, \ldots, \mu^d) \in R^d and covariance matrix \Sigma. Our analysis of fingerprint images in Section 6 revealed that it is adequate to consider diagonal covariance matrices of the form \Sigma = \mathrm{diag}((\sigma^1)^2, (\sigma^2)^2, \ldots, (\sigma^d)^2), where (\sigma^b)^2 is the variance of the b-th component of the multivariate normal distribution. Thus, we have

f(x \mid \theta) = \phi_d(x \mid \mu, \sigma) = \prod_{b=1}^{d} \phi_1(x^b \mid \mu^b, (\sigma^b)^2),   (1.4)

where \phi_1(\cdot \mid \mu, \sigma^2) denotes the density of the univariate normal distribution with
mean \mu and variance \sigma^2, and \sigma^2 = ((\sigma^1)^2, (\sigma^2)^2, \ldots, (\sigma^d)^2) is the d-variate vector of the variances. The second identifiability condition of (1.3) is re-expressed in terms of the first component of the mean vector as

\mu^1_{1g} < \mu^1_{2g} < \cdots < \mu^1_{K_g g}.   (1.5)

In the subsequent text, the identifiability condition (1.5), based on the first components of \mu_{kg} for k = 1, 2, \ldots, K_g, will be re-written using the symbol \preceq as

\mu_{1g} \preceq \mu_{2g} \preceq \cdots \preceq \mu_{K_g g}   (1.6)

for each g = 1, 2, \ldots, G. For N independent objects selected randomly from the population, say O_i = (x_{ij}, j = 1, 2, \ldots, n_i), i = 1, 2, \ldots, N, it follows that the joint distribution of the observations is

\prod_{i=1}^{N} q(\mathbf{x}_i) = \prod_{i=1}^{N} \sum_{g=1}^{G} \omega_g \prod_{j=1}^{n_i} \sum_{k=1}^{K_g} p_{kg}\, \phi_d(x_{ij} \mid \mu_{kg}, \sigma_{kg}).   (1.7)

Two other notations are introduced here: \mu and \sigma will respectively denote the collections \{\mu_{kg}, k = 1, 2, \ldots, K_g, g = 1, 2, \ldots, G\} and \{\sigma_{kg}, k = 1, 2, \ldots, K_g, g = 1, 2, \ldots, G\}. Our goal is to make inference about the unknown parameters x = (G, \omega, K, p, \mu, \sigma) based on the joint distribution in (1.7).

2. A Bayesian Framework for Inference. For the subsequent text, we introduce some additional notation. The symbol I(S) denotes the indicator function of the set S, that is, I(S) = 1 if S is true, and 0 otherwise. The notation (A, B, \ldots \mid C, D, \ldots) denotes the distribution of the random variables A, B, \ldots conditioned on C, D, \ldots, with \pi(A, B, \ldots \mid C, D, \ldots) denoting the specific form of the conditional distribution. The notation \pi(A, B, \ldots \mid \cdots) denotes the distribution of A, B, \ldots given the rest of the parameters. We specify a joint prior distribution on x as follows:

(1) G and K: We assume

\pi(G, K) = \pi(G)\, \pi(K \mid G) = \pi_0(G) \prod_{g=1}^{G} \pi_0(K_g),   (2.1)

where \pi_0 is the discrete uniform distribution between G_{min} and G_{max} (respectively, K_{min} and K_{max}), both inclusive, for G (respectively, K_g).

(2) The first- and second-level mixing proportions: We assume
\pi(\omega, p \mid G, K) = G!\, D_G(\omega \mid \delta_\omega)\, I(\omega_1 < \omega_2 < \cdots < \omega_G) \prod_{g=1}^{G} D_{K_g}(p_g \mid \delta_p),   (2.2)

where D_H(\cdot \mid \delta) denotes the H-dimensional Dirichlet density with the H-component baseline measure (\delta, \delta, \ldots, \delta), \delta a pre-specified constant, and p_g = (p_{1g}, p_{2g}, \ldots, p_{K_g g}).
The indicator function arises due to the imposed identifiability constraint (1.3) on \omega; it follows that G! is the appropriate normalizing constant for this constrained density, which is obtained by integrating out \omega and noting that D_G(\omega \mid \delta_\omega) is invariant under permutations of \omega, that is, \omega_1, \omega_2, \ldots, \omega_G are exchangeable.

(3) The prior on the mean vectors is taken as

\pi(\mu \mid K, G) = \prod_{g=1}^{G} K_g! \left[ \prod_{k=1}^{K_g} \phi_1(\mu^1_{kg} \mid \mu_0, \tau^2) \right] I(\mu^1_{1g} < \mu^1_{2g} < \cdots < \mu^1_{K_g g}) \prod_{b=2}^{d} \prod_{k=1}^{K_g} \phi_1(\mu^b_{kg} \mid \mu_0, \tau^2).   (2.3)

The indicator function appears due to the identifiability constraint (1.3) imposed on \mu, with resulting normalizing constant K_g! for each g = 1, 2, \ldots, G.

(4) The prior distribution on the variances is taken as

\pi(\sigma \mid K, G) = \prod_{g=1}^{G} \prod_{k=1}^{K_g} \prod_{b=1}^{d} IG((\sigma^b_{kg})^2 \mid \alpha_0, \beta_0),   (2.4)

where IG denotes the inverse gamma distribution with prior shape and scale parameters \alpha_0 and \beta_0, respectively. The joint prior distribution on x = (G, \omega, K, p, \mu, \sigma) is thus given by

\pi(x) = \pi(G, K)\, \pi(\omega, p \mid G, K)\, \pi(\mu \mid G, K)\, \pi(\sigma \mid G, K),   (2.5)

where the component priors are given by equations (2.1)-(2.4). The prior on x depends on the hyper-parameters \delta_p, \delta_\omega, G_{max}, G_{min}, K_{min}, K_{max}, \mu_0, \tau^2, \alpha_0 and \beta_0, all of which need to be specified for a given application. This will be done in Sections 5 and 6.

The likelihood (1.7) involves several summations within each product term and is simplified by augmenting variables that denote the class labels of the individual observations. Two different class labels are introduced for the two levels of mixtures: the augmented variable W = (W_1, W_2, \ldots, W_N) denotes the class labels of the G sub-populations, that is, W_i = g whenever object i, O_i, arises from the g-th sub-population; and Z = (Z_1, Z_2, \ldots, Z_N), with Z_i = (Z_{ij}, j = 1, 2, \ldots, n_i), where Z_{ij} = k for 1 \le k \le K_g if x_{ij} arises from the k-th mixture component \phi_d(\cdot \mid \mu_{kg}, \sigma_{kg}). We denote the augmented parameter space by the same symbol x as before, that is, x = (G, \omega, K, p, \mu, \sigma, W, Z).
The augmented likelihood is now

l(G, \omega, K, p, \mu, \sigma, W, Z) = \prod_{i=1}^{N} \prod_{j=1}^{n_i} \prod_{g=1}^{G} \prod_{k=1}^{K_g} \phi_d(x_{ij} \mid \mu_{kg}, \sigma_{kg})^{I(Z_{ij} = k,\, W_i = g)},   (2.6)

with priors on W and Z given by

\pi(W, Z \mid G, K, \omega, p) = \pi(W \mid G, \omega)\, \pi(Z \mid G, K, W, p),   (2.7)
where \pi(W \mid G, \omega) = \prod_{i=1}^{N} \prod_{g=1}^{G} \omega_g^{I(W_i = g)} and \pi(Z \mid G, K, W, p) = \prod_{g=1}^{G} \prod_{i : W_i = g} \prod_{j=1}^{n_i} \prod_{k=1}^{K_g} p_{kg}^{I(Z_{ij} = k)}. Based on the augmented likelihood and prior distributions, one can write down the posterior distribution up to a normalizing constant via Bayes' Theorem. The posterior has the expression

\pi(x \mid data) \propto l(G, \omega, K, p, \mu, \sigma, W, Z)\, \pi(W, Z \mid G, K, \omega, p)\, \pi(G, K, \omega, p, \mu, \sigma)   (2.8)

based on (2.5), (2.6) and (2.7).

3. Posterior Inference. The total number of unknown parameters in the hierarchical mixture model depends on the values of G and K. Thus, the posterior in (2.8) can be viewed as a probability distribution on the space of all hierarchical mixture models with varying dimensions. To obtain posterior inference for such a space of models, Green (1995) and Green and Richardson (1997) developed RJMCMC for Bayesian inference. In this paper, we develop an RJMCMC approach to explore the posterior distribution in (2.8) resulting from the hierarchical mixture model specification.

We briefly discuss the most general RJMCMC framework here. Let x and y be elements of the model space with possibly differing dimensions. The RJMCMC approach proposes a move, say m, with probability r_m. The move m takes x to y via the proposal distribution q_m(x, y). In order to maintain the time reversibility condition, the proposal is accepted with probability

\alpha(x, y) = \min\left\{ 1, \frac{\pi(y \mid data)\, r_{m'}\, q_{m'}(y, x)}{\pi(x \mid data)\, r_m\, q_m(x, y)} \right\};   (3.1)

in (3.1), q_{m'}(y, x) represents the probability of moving from y to x based on the reverse move m', and \pi(x \mid data) denotes the posterior distribution of x given the data. It is crucial that the moves m and m' be reversible (see Green (1995)), meaning that the densities q_m(x, y) and q_{m'}(y, x) have the same support with respect to a dominating measure. In case y represents the higher-dimensional model, we can first sample u from a proposal q_0(x, u) (with possible dependence on x), and then obtain y as a one-to-one function of (x, u).
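The acceptance rule (3.1) can be sketched in a few lines; computing the ratio on the log scale avoids underflow when the posterior terms are tiny. This is a minimal illustration with hypothetical argument names, not the paper's implementation:

```python
import math

def accept_prob(log_post_y, log_post_x, r_fwd, q_fwd, r_rev, q_rev):
    # Equation (3.1): alpha(x, y) = min{1, [pi(y|data) r_m' q_m'(y, x)]
    #                                     / [pi(x|data) r_m q_m(x, y)]},
    # evaluated on the log scale for numerical stability.
    log_ratio = (log_post_y - log_post_x
                 + math.log(r_rev) + math.log(q_rev)
                 - math.log(r_fwd) - math.log(q_fwd))
    return min(1.0, math.exp(log_ratio))
```

In a sampler, one would draw a uniform variate and accept the proposed state y whenever it falls below this probability.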
In that case, the proposal density q_m(x, y) in (3.1) is expressed as

q_m(x, y) = q_0(x, u) \Big/ \det\left[ \frac{\partial y}{\partial (x, u)} \right],   (3.2)

where \partial y / \partial (x, u) denotes the Jacobian of the transformation from (x, u) to y, and \det[\cdot] represents the absolute value of its determinant. If the triplet (x, u, y) involves some discrete components, then the Jacobian of the transformation is obtained from the one-to-one map of the continuous parts of y and (x, u), which can depend on the values realized by the discrete components.

For the inference on hierarchical mixture models, five types of updating steps are considered, with reversible pairs of moves, (m, m'), corresponding to moves in
spaces of varying dimensions. The steps are:

(1) Update G with (m, m') = (G-split, G-merge),
(2) Update K \mid G, \omega, W with (m, m') = (K-split, K-merge),
(3) Update \omega \mid G, K, W, Z, p, \mu, \sigma,
(4) Update (W, Z) \mid G, K, \omega, p, \mu, \sigma, and
(5) Update (p, \mu, \sigma) \mid G, K, \omega, W, Z.   (3.3)

Steps 3-5 do not involve jumps in spaces of varying dimensions, and can be carried out based on a regular Gibbs proposal.

3.1. Update G. To discuss the G-split and G-merge moves, we let x and y denote two different states of the model space, that is,

x = (G, \omega, K, p, \mu, \sigma, W, Z) and y = (G^*, \omega^*, K^*, p^*, \mu^*, \sigma^*, W^*, Z^*),   (3.4)

where the *'s in (3.4) denote a possibly different setting of the parameters.

3.1.1. The G-merge move. The G-merge move changes the current G to G - 1 (that is, G^* = G - 1) and is carried out based on the following steps:

STEP 1: First, two of the G components, say g_1 and g_2 with g_1 < g_2, are selected randomly for merging into a new component g^*. The first-level mixing proportions are merged as \omega^*_{g^*} = \omega_{g_1} + \omega_{g_2}.

STEP 2: The K-components in K corresponding to g_1 and g_2 are, respectively, K_{g_1} and K_{g_2}. These are combined to obtain the new K-value, K^*_{g^*}, in the following way. Writing K_{g_1} + K_{g_2} = K_t,

K^*_{g^*} = (K_t + 1)/2 if K_t is odd, and K_t/2 if K_t is even.   (3.5)

STEP 3: Next, (p_{g_1}, \mu_{g_1}, \sigma_{g_1}) and (p_{g_2}, \mu_{g_2}, \sigma_{g_2}) are merged to obtain (p^*_{g^*}, \mu^*_{g^*}, \sigma^*_{g^*}) as follows. The identifiability condition (1.6) holds for g = g_1 and g = g_2, and must be ensured to hold for g = g^* after the merge step. To achieve this, the K_t \mu's are arranged in increasing order,

\mu_{(1)} \preceq \mu_{(2)} \preceq \cdots \preceq \mu_{(K_t - 1)} \preceq \mu_{(K_t)},   (3.6)

with associated probability p_{(j)} for \mu_{(j)}, j = 1, 2, \ldots, K_t. Thus, the p_{(j)} are a rearrangement of the K_t probabilities in p_{g_1} and p_{g_2} according to the partial ordering of \mu_{g_1} and \mu_{g_2} in (3.6). First, the case when K_t is even is considered.
Adjacent \mu values in (3.6) are paired as

(\mu_{(1)}, \mu_{(2)}), (\mu_{(3)}, \mu_{(4)}), \ldots, (\mu_{(K_t - 1)}, \mu_{(K_t)}),   (3.7)

and the corresponding g^* parameters are obtained using the formulas

p^*_{kg^*} = \frac{p_{(2k-1)} + p_{(2k)}}{2}, \quad \mu^*_{kg^*} = \frac{p_{(2k-1)} \mu_{(2k-1)} + p_{(2k)} \mu_{(2k)}}{p_{(2k-1)} + p_{(2k)}}, \quad \sigma^*_{kg^*} = \frac{p_{(2k-1)} \sigma_{(2k-1)} + p_{(2k)} \sigma_{(2k)}}{p_{(2k-1)} + p_{(2k)}},   (3.8)
for k = 1, 2, \ldots, K^*_{g^*}.

STEP 4: To obtain W^* and Z^*, objects with W_i = g_1 or W_i = g_2 are relabeled as W^*_i = g^*. For these objects, the allocation to the K^*_{g^*} components is carried out using a Bayes allocation scheme. The probability that observation x_{ij} is assigned to component k is

P(Z^*_{ij} = k \mid W^*_i = g^*) = \frac{p^*_{kg^*}\, \phi_d(x_{ij} \mid \mu^*_{kg^*}, \sigma^*_{kg^*})}{\sum_{k=1}^{K^*_{g^*}} p^*_{kg^*}\, \phi_d(x_{ij} \mid \mu^*_{kg^*}, \sigma^*_{kg^*})}   (3.9)

for k = 1, 2, \ldots, K^*_{g^*}. The allocation probability for all x_{ij} to the K^*_{g^*} components is the product of the above probabilities, namely,

P(merge-alloc) = \prod_{i : W^*_i = g^*} \prod_{j=1}^{n_i} P(Z^*_{ij} = k_{ij} \mid W^*_i = g^*),   (3.10)

where k_{ij} are the realized values of k when the allocation is done for each observation x_{ij}.

When K_t is odd, an index i_0 is selected at random from the set of all odd integers up to K_t, namely, \{1, 3, 5, \ldots, K_t\}. The triplet (p_{(i_0)}, \mu_{(i_0)}, \sigma_{(i_0)}) is not merged with any other index, but the new probability is p^*_{(i_0)} = p_{(i_0)}/2. The remaining adjacent indices are merged according to STEP 3 above. For the G-merge step, the proposal density, q_{m'}(x, y), is given by

q_{m'}(x, y) = \binom{G}{2}^{-1} P(merge-alloc) if K_t is even, and \binom{G}{2}^{-1} \frac{2}{K_t + 1} P(merge-alloc) if K_t is odd.   (3.11)

This completes the G-merge move.

3.1.2. The G-split move. The split move is the reverse of the merge move above and is carried out in the following steps:

STEP 1: A candidate G-component for splitting, say g, is chosen randomly with probability 1/G. The split components are denoted by g_1 and g_2. The first-level mixing probability, \omega_g, is split into \omega_{g_1} and \omega_{g_2} by generating a uniform random variable, u_0, in [0, 1] and setting \omega_{g_1} = u_0 \omega_g and \omega_{g_2} = (1 - u_0) \omega_g.

STEP 2: The value of K_g is transformed to K_t, where K_t is either 2K_g - 1 or 2K_g with probability 1/2 each. Once K_t is determined, a pair of indices (K_{g_1}, K_{g_2}) is selected randomly from the set of all possible pairs of integers in \{K_{min}, K_{min} + 1, \ldots, K_{max}\}^2 satisfying K_{g_1} + K_{g_2} = K_t. If M_0 is the total number of such pairs, then the probability of selecting one such pair is 1/M_0.
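The component-count bookkeeping in (3.5), the pairwise merge in (3.8), and the pair enumeration in STEP 2 above are all elementary arithmetic; the sketch below illustrates them with scalar means and variances. The function names are hypothetical:

```python
def merged_K(K_g1, K_g2):
    # Equation (3.5): K* = (K_t + 1)/2 if K_t is odd, K_t/2 if even,
    # where K_t = K_g1 + K_g2; equivalently, the ceiling of K_t / 2.
    K_t = K_g1 + K_g2
    return (K_t + 1) // 2

def merge_adjacent_pair(p1, mu1, s1, p2, mu2, s2):
    # Equation (3.8): the merged mixing probability is (p1 + p2)/2 (the
    # factor 1/2 reverses the doubling 2 p_g used in the split move);
    # the merged mean and variance are probability-weighted averages.
    p_star = (p1 + p2) / 2.0
    mu_star = (p1 * mu1 + p2 * mu2) / (p1 + p2)
    s_star = (p1 * s1 + p2 * s2) / (p1 + p2)
    return p_star, mu_star, s_star

def candidate_K_pairs(K_t, K_min, K_max):
    # STEP 2 of the G-split: all pairs (K_g1, K_g2) with K_g1 + K_g2 = K_t
    # and both counts in [K_min, K_max]; one of the M_0 pairs is then
    # chosen uniformly at random.
    pairs = [(a, K_t - a) for a in range(K_min, K_max + 1)
             if K_min <= K_t - a <= K_max]
    return pairs, len(pairs)
```

Note the consistency of the first two functions: splitting K_g = 3 can yield K_t = 5 = 2 K_g - 1, and merging counts summing to 5 recovers K* = 3.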
The selection of K_{g_1} and K_{g_2} determines the number of second-level components in the g_1 and g_2 groups.

STEP 3: The aim now is to split each component of the triplet (p_g, \mu_g, \sigma_g) into two parts, (p_{g_1}, \mu_{g_1}, \sigma_{g_1}) and (p_{g_2}, \mu_{g_2}, \sigma_{g_2}), such that both \mu_{g_1} and \mu_{g_2} satisfy the constraints (1.6) for g = g_1 and g_2. The case K_{g_1} + K_{g_2} = 2K_g is considered first.
[Figure 1 here. Diagram showing the split of 2p_g, \mu_g and \sigma_g. The partial ordering refers to the ordering of the \mu^1's. The variables u_{kg}, k = 1, 2, \ldots, K_g, determine how many of the two split parts go to component g_1 for each k. The right arrows represent the sequential split for \mu_g and \sigma_g.]

A sketch of the split move is best described by the diagram in Figure 1, which introduces the additional variables to be used for performing the split. In Figure 1, 2p_g is considered for splitting because the two split components will represent the second-level mixing probabilities of g_1 and g_2, the sum of which together equals 2. For each k, the variable u_{kg} in Figure 1 takes three values, namely 0, 1 and 2, that respectively determine whether the split components of (2p_{kg}, \mu_{kg}, \sigma_{kg}) (1) both go to component g_2, (2) one goes to component g_1 and the other goes to g_2, or (3) both go to g_1. The variables u_{kg}, k = 1, 2, \ldots, K_g, must satisfy several constraints: (1) \sum_{k=1}^{K_g} u_{kg} = K_{g_1}, (2) u_{kg} = 1 for any k such that p_{kg} > 0.5, and (3) \sum_{k : u_{kg} = h} 2p_{kg} < 1 for h = 0, 2. Restriction 1 means that the number of components that go to g_1 must be K_{g_1}, which is already pre-selected. The need for restriction 2 can be seen as follows: if u_{kg} = 0 or 2 and p_{kg} > 0.5, the total probability 2p_{kg} would be assigned entirely to g_1 or g_2, and the sum of the second-level mixing probabilities for that component would exceed 1, which is not possible. Restriction 3 is necessary to ensure that the second-level mixing probabilities for both g_1 and g_2 are non-negative (see the equations below). To generate the vector u = (u_{1g}, u_{2g}, \ldots, u_{K_g g}), we consider all combinations of u \in \{0, 1, 2\}^{K_g} and reject the ones that do not satisfy the three restrictions.
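The rejection step just described can be written as a direct enumeration. The following is a minimal sketch (the function name `admissible_u` is our own), practical only for small K_g since the candidate space has size 3^{K_g}:

```python
from itertools import product

def admissible_u(p_g, K_g1):
    # Enumerate u in {0, 1, 2}^{K_g} satisfying the three restrictions:
    # (1) sum_k u_k = K_g1 (u_k parts of component k go to g_1);
    # (2) u_k = 1 whenever p_k > 0.5;
    # (3) sum of 2*p_k over {k : u_k = h} < 1 for h = 0 and h = 2.
    K_g = len(p_g)
    out = []
    for u in product((0, 1, 2), repeat=K_g):
        if sum(u) != K_g1:
            continue
        if any(uk != 1 and pk > 0.5 for uk, pk in zip(u, p_g)):
            continue
        if any(sum(2 * pk for uk, pk in zip(u, p_g) if uk == h) >= 1
               for h in (0, 2)):
            continue
        out.append(u)
    return out
```

M_1 is then simply the length of the returned list, and u is drawn uniformly from it.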
From the total number of remaining admissible combinations, say M_1, we select a vector u randomly with probability 1/M_1. Once u has been generated, a random vector v = (v_{kg}, k = 1, 2, \ldots, K_g) is generated to split 2p_g (see Figure 1). Some notation is in order: let A_0 = \{k : u_{kg} = 0\}, A_1 = \{k : u_{kg} = 1\} and A_2 = \{k : u_{kg} = 2\}. As in the case of u, a few restrictions also need to be placed on the vector v. To see what these restrictions are, we denote

p^1_{kg} = 2 v_{kg}\, p_{kg} and p^2_{kg} = 2 (1 - v_{kg})\, p_{kg}   (3.12)
for k = 1, 2, \ldots, K_g, to be the split components from 2p_{kg}. Note that depending on the value of u_{kg} = 0, 1, or 2, the split components p^1_{kg} and p^2_{kg} are either both assigned to component g_2, one to g_1 and the other to g_2, or both to g_1. For the case u_{kg} = 1, we will assume that p^1_{kg} is the split probability that goes to g_1 and p^2_{kg} goes to g_2. Note that the mixing probabilities for each of the components g_1 and g_2 should sum to 1. This implies

\sum_{k \in A_1} p^1_{kg} + \sum_{k \in A_2} (p^1_{kg} + p^2_{kg}) = 1 and \sum_{k \in A_1} p^2_{kg} + \sum_{k \in A_0} (p^1_{kg} + p^2_{kg}) = 1   (3.13)

for components g_1 and g_2, respectively. The second equation of (3.13) is redundant given the first, since

\sum_{k \in A_1} p^1_{kg} + \sum_{k \in A_2} 2p_{kg} + \sum_{k \in A_1} p^2_{kg} + \sum_{k \in A_0} 2p_{kg} = 2 \sum_{k=1}^{K_g} p_{kg} = 2.

We re-write the first equation as

\sum_{k \in A_1} a_k v_{kg} = 1,   (3.14)

where a_k = 2p_{kg} / (1 - \sum_{k' \in A_2} 2p_{k'g}). Equation (3.14) implies that the entries of the vector v are required to satisfy two restrictions: (1) 0 \le v_{kg} \le 1 for k = 1, 2, \ldots, K_g, from (3.12), and (2) equation (3.14) above. In the Appendix, an algorithm is given to generate such a v for which the proposal density can be written down in closed form (see (7.2)).

The next step in the G-split move is to split \mu_g and \sigma_g. Each component of \mu_g = (\mu_{kg}, k = 1, 2, \ldots, K_g) and \sigma_g = (\sigma_{kg}, k = 1, 2, \ldots, K_g) in Figure 1 is split sequentially, starting from k = 1, then k = 2, and so on until k = K_g. At the k-th stage, \mu_{kg} is split into the components y_{kg} and \tilde{y}_{kg}, where y_{kg} is a d-dimensional vector consisting of the entries (y^1_{kg}, y^2_{kg}, \ldots, y^d_{kg}), and \tilde{y}_{kg} = (\tilde{y}^1_{kg}, \tilde{y}^2_{kg}, \ldots, \tilde{y}^d_{kg}). Similarly, \sigma_{kg} is split into the components z_{kg} = (z^1_{kg}, z^2_{kg}, \ldots, z^d_{kg}) and \tilde{z}_{kg} = (\tilde{z}^1_{kg}, \tilde{z}^2_{kg}, \ldots, \tilde{z}^d_{kg}). The collections \{y_{kg}, k = 1, 2, \ldots, K_g\} and \{z_{kg}, k = 1, 2, \ldots, K_g\} are denoted by y and z, respectively, and represent the additional variables that need to be generated for the split, via the proposal distribution q_0(y, z), say. The remaining variables (with tildes) are obtained by solving the vector equations

\frac{p^1_{kg}\, y_{kg} + p^2_{kg}\, \tilde{y}_{kg}}{p^1_{kg} + p^2_{kg}} = \mu_{kg} and \frac{p^1_{kg}\, z_{kg} + p^2_{kg}\, \tilde{z}_{kg}}{p^1_{kg} + p^2_{kg}} = \sigma_{kg}   (3.15)

componentwise.
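Solving (3.15) for the tilde variables has a one-line closed form per coordinate; a minimal scalar sketch (hypothetical function name):

```python
def complete_tilde(p1, p2, target, y):
    # Equation (3.15): choose the tilde value so that the p-weighted
    # average of (y, y_tilde) recovers the pre-split value `target`
    # (a coordinate of mu_kg when splitting means, or of sigma_kg
    # when splitting variances).
    return ((p1 + p2) * target - p1 * y) / p2
```

This is what makes the split deterministic given (v, y, z): only one of the two offspring values is sampled, and the other is pinned down by moment matching.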
We describe the split move further to see what properties q_0(y, z) should satisfy. While the values of y_{kg} and \tilde{y}_{kg} (respectively, z_{kg} and \tilde{z}_{kg}) are candidate values for components of \mu_{g_1} and \mu_{g_2} (respectively, \sigma_{g_1} and \sigma_{g_2}), they are not quite so yet, since \mu_{g_1} and \mu_{g_2} must satisfy the constraints (1.6). To achieve this, we introduce two functions operating on d-dimensional vectors. The vector-valued min and max functions are defined as follows: for each s = 1, 2, \ldots, S, let a_s and
b_s denote two d-dimensional vectors given by a_s = (a^1_s, a^2_s, \ldots, a^d_s) and b_s = (b^1_s, b^2_s, \ldots, b^d_s). We define

\min(a_s, b_s) = a_s and \max(a_s, b_s) = b_s   (3.16)

for each s = 1, 2, \ldots, S if a_1 \preceq b_1 (recall that this is, by definition, a^1_1 \le b^1_1), and vice versa when b_1 \preceq a_1. Thus, the maximum and minimum functions above operate on all indices s \ge 1 with output depending only on the index s = 1. In the present case, consider the maximum and minimum functions defined as in (3.16) for each k = 1, 2, \ldots, K_g. Here, S = 3 with a_1 = y_{kg}, a_2 = z_{kg}, a_3 = p^1_{kg}, and b_1 = \tilde{y}_{kg}, b_2 = \tilde{z}_{kg}, b_3 = p^2_{kg}. If y_{kg} \preceq \tilde{y}_{kg}, then it follows that

\min(y_{kg}, \tilde{y}_{kg}) = y_{kg}, \max(y_{kg}, \tilde{y}_{kg}) = \tilde{y}_{kg}, \min(z_{kg}, \tilde{z}_{kg}) = z_{kg}, \max(z_{kg}, \tilde{z}_{kg}) = \tilde{z}_{kg}, and \min(p^1_{kg}, p^2_{kg}) = p^1_{kg}, \max(p^1_{kg}, p^2_{kg}) = p^2_{kg};   (3.17)

if \tilde{y}_{kg} \preceq y_{kg}, then the opposite holds true. To ensure that the constraints (1.6) hold, y_{kg} is generated in a way such that

\max(y_{k-1,g}, \tilde{y}_{k-1,g}) \preceq y_{kg}, \tilde{y}_{kg} \preceq \mu_{k+1,g}   (3.18)

in the sequential procedure for k = 1, 2, \ldots, K_g. In (3.18), the maximum function \max(y_{0g}, \tilde{y}_{0g}) for k = 1 (respectively, \mu_{K_g+1, g} for k = K_g) is defined to be the vector of lower (respectively, upper) bounds for the means. In the application to fingerprint images in Section 6, the image size implies that the componentwise lower and upper bounds are, respectively, 0 and 500. For z_{kg} and \tilde{z}_{kg}, we require these variables to satisfy the constraints z_{kg} \ge 0 and \tilde{z}_{kg} \ge 0 componentwise, since they are candidate values for components of \sigma_{g_1} and \sigma_{g_2}. Thus, the generation of y and z, via the proposal distribution q_0(y, z), requires that (3.18), z_{kg} \ge 0, and \tilde{z}_{kg} \ge 0 be satisfied. A proposal density that achieves this is discussed in the Appendix.

The values of each triplet (p_{g_1}, \mu_{g_1}, \sigma_{g_1}) and (p_{g_2}, \mu_{g_2}, \sigma_{g_2}) can now be obtained. A sequential procedure is again adopted. The post-split parameters p_{g_h}, \mu_{g_h} and \sigma_{g_h}, h = 1, 2, are initialized to the empty set. Starting from k = 1, the sets are appended as follows: for h = 0, 2, define h_1 = 2 and h_2 = 1 if h = 0, and h_1 = 1 and h_2 = 2 if h = 2.
For k = 1, 2, \ldots, K_g, if k \in A_h,

p_{g_{h_1}} = (p_{g_{h_1}}, \min(p^1_{kg}, p^2_{kg}), \max(p^1_{kg}, p^2_{kg})), \quad p_{g_{h_2}} = p_{g_{h_2}},
\mu_{g_{h_1}} = (\mu_{g_{h_1}}, \min(y_{kg}, \tilde{y}_{kg}), \max(y_{kg}, \tilde{y}_{kg})), \quad \mu_{g_{h_2}} = \mu_{g_{h_2}},   (3.19)
\sigma_{g_{h_1}} = (\sigma_{g_{h_1}}, \min(z_{kg}, \tilde{z}_{kg}), \max(z_{kg}, \tilde{z}_{kg})), \quad \sigma_{g_{h_2}} = \sigma_{g_{h_2}},

and if k \in A_1,

p_{g_1} = (p_{g_1}, p^1_{kg}), p_{g_2} = (p_{g_2}, p^2_{kg}), \mu_{g_1} = (\mu_{g_1}, y_{kg}), \mu_{g_2} = (\mu_{g_2}, \tilde{y}_{kg}), \sigma_{g_1} = (\sigma_{g_1}, z_{kg}), and \sigma_{g_2} = (\sigma_{g_2}, \tilde{z}_{kg}).
The above procedure guarantees that the post-split components \mu_{g_1} and \mu_{g_2} satisfy the constraints (1.6). At this point, we can explicitly determine some of the components of y in equation (3.4): we have G^* = G + 1, K^* = K \cup \{K_{g_1}, K_{g_2}\} \setminus \{K_g\}, p^* = p \cup \{p_{g_1}, p_{g_2}\} \setminus \{p_g\}, \mu^* = \mu \cup \{\mu_{g_1}, \mu_{g_2}\} \setminus \{\mu_g\} and \sigma^* = \sigma \cup \{\sigma_{g_1}, \sigma_{g_2}\} \setminus \{\sigma_g\}.

When K_{g_1} + K_{g_2} = 2K_g - 1, an index i_0 is selected from the set I_0 = \{k : 2p_{kg} < 1\} with probability 1/|I_0|. The component with index i_0 is not split, and is assigned a value of u_{i_0 g} of either 0 or 2. For this case, u = (u_{1g}, u_{2g}, \ldots, u_{K_g - 1, g}, u_{i_0 g}) is chosen from the product space \{0, 1, 2\}^{K_g - 1} \times \{0, 2\}, with M_1 denoting the number of admissible combinations satisfying the three restrictions on u. After selecting a u, we define the unsplit probability as 2p_{i_0 g}, with y_{i_0 g} = \tilde{y}_{i_0 g} = \mu_{i_0 g} and z_{i_0 g} = \tilde{z}_{i_0 g} = \sigma_{i_0 g}, where the component is assigned to g_1 or g_2 depending on the selected u_{i_0 g}. The split procedure above is carried out for the remaining indices k \ne i_0.

STEP 4: To complete the G-split proposal, we require the new first- and second-level labels, W^* and Z^*, in y (see (3.4)). All objects with labels W_i = g are split into either W^*_i = g_1 or W^*_i = g_2 with allocation probabilities obtained as follows: define Q_i(g_h) = \prod_{j=1}^{n_i} \sum_{k=1}^{K_{g_h}} p_{k g_h}\, \phi_d(x_{ij} \mid \mu_{k g_h}, \sigma_{k g_h}) with h = 1, 2 for the i-th object. The W-allocation probabilities for components g_1 and g_2 are given by

P(W^*_i = g_h) = Q_i(g_h) / (Q_i(g_1) + Q_i(g_2))   (3.20)

for h = 1, 2. Once W^*_i has been determined, the Z^*_{ij}'s are determined from the Bayes allocation probabilities (3.9), denoted here by Q_{ij}(k, g_h) for h = 1, 2. It follows that the allocation probability for the G-split move is

P(split-alloc) = \prod_{h=1,2} \prod_{i : W^*_i = g_{i_h}} Q_i(g_{i_h}) \prod_{j=1}^{n_i} Q_{ij}(k_{ij}, g_{i_h}),   (3.21)

where g_{i_h} is the realized value of g_h for the i-th object, and k_{ij} are the realized values of k for the second-level labels Z^*_{ij}.
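The two allocation rules, (3.9) for observations and (3.20) for objects, can be sketched together in the univariate case (so that \phi_d reduces to a single normal density). All function names here are hypothetical illustrations:

```python
import math

def phi1(x, mu, var):
    # Univariate normal density (phi_d factorizes over coordinates).
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def component_alloc_probs(x, p, mu, var):
    # Equation (3.9): P(Z = k) proportional to p_k * phi(x | mu_k, var_k),
    # normalized over the K components.
    w = [pk * phi1(x, m, v) for pk, m, v in zip(p, mu, var)]
    s = sum(w)
    return [wk / s for wk in w]

def w_alloc_prob(xs, group1, group2):
    # Equation (3.20): Q_i(g_h) is the product over the object's
    # observations of the group-h mixture density; P(W_i = g_1) is the
    # normalized ratio. Each group is a (p, mu, var) triple.
    def Q(group):
        p, mu, var = group
        out = 1.0
        for x in xs:
            out *= sum(pk * phi1(x, m, v) for pk, m, v in zip(p, mu, var))
        return out
    q1, q2 = Q(group1), Q(group2)
    return q1 / (q1 + q2)
```

For long observation sequences, a practical implementation would accumulate log-likelihoods instead of raw products to avoid underflow.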
Dass and Li (2008) give the proposal density for the G-split move as

q_m(x, y) = \frac{R_0\, q_0(v)\, q_0(y, z)\, P(split\text{-}alloc)}{G\, M_0\, M_1} \Big/ \det\left[ \frac{\partial y}{\partial (x, u)} \right],   (3.22)

where R_0 = 1/2 or 1/(2|I_0|) according to whether K_{g_1} + K_{g_2} = 2K_g or 2K_g - 1 is chosen; in (3.22),

\det\left[ \frac{\partial y}{\partial (x, u)} \right] = 2^{2K_g - 1}\, \omega_g \prod_{k \in A_0 \cup A_2 \cup A_1^c} 2p_{kg} \prod_{k=1}^{K_g} \left( 1 + \frac{p^1_{kg}}{p^2_{kg}} \right)^{2d}   (3.23)

is the absolute value of the Jacobian of the transformation from (x, u) to y, and A_1^c is the set A_1 excluding its largest element. The explicit expression (3.23) is derived in the Appendix.
[Figure 2 here. The G-split and G-merge proposals as a reversible pair of moves: the G-merge move takes x to y, and the G-split move recovers x from (y, u).]

We conclude the G-split and G-merge sections with a note on establishing the reversibility of the two moves. The G-merge proposal move m takes x to y with proposal density given by q_m(x, y) in (3.11). However, the acceptance probability in (3.1) also requires the proposal density, q_{m'}(y, x), of moving from y to x based on the reverse move m'. In order to show that m' is precisely the G-split move, we require to show that, given x and y, there is a unique u such that x can be obtained from the combination (y, u) via the G-split move (see Figure 2). We demonstrate this in the next paragraph.

For the G-split move, the vector of auxiliary variables is u = (u_0, K_t, u, v, y, z). These variables have the same interpretation as in the G-split move discussed earlier. Now we check whether u can be determined from x and y. First, the variable u_0 can be determined from u_0 = \omega_{g_1}/\omega_g. Second, the value of K_t = K_{g_1} + K_{g_2}. Note that K_g alone cannot determine K_t, since K_t is either 2K_g or 2K_g - 1 with probability 1/2 each; however, with information on K_{g_1} and K_{g_2}, K_t is uniquely determined. Next, to get u, we rearrange the components of \mu_{g_1} and \mu_{g_2} in the increasing order (3.6). If K_t is even, K^*_{g^*} = K_t/2 adjacent pairs of \mu's are formed, and u_{kg} is assigned the value 0, 1 or 2, since it is known from which component (either g_1 or g_2) the two means in each of the k pairs came. The case of odd K_t can be handled similarly, since one of the \mu components is not paired, and subsequently u_{kg} for that component is either 2 or 0 depending on whether the \mu component came from g_1 or g_2. Once u is obtained, v can be determined in the following way: the components of p_{g_1} and p_{g_2} are arranged according to the increasing order of the \mu's.
Suppose the k-th pair consists of \mu_{k'g'} \preceq \mu_{k''g''}, with corresponding probabilities p_{k'g'} and p_{k''g''}, where k', k'', g' and g'' are indices of k and g resulting from the ordering. The value of v_{kg} = p_{k'g'} / (2p^*_{kg^*}), where p^*_{kg^*} = (p_{k'g'} + p_{k''g''})/2 is the merged probability in the G-merge move. Next, we obtain the values of y_{kg} and z_{kg}. In the case u_{kg} = 1, y_{kg} (respectively, \tilde{y}_{kg}) equals the \mu-value that came from component g_1 (respectively, g_2) in the pair (\mu_{k'g'}, \mu_{k''g''}). In the case when u_{kg} = 0 or 2, y_{kg} and \tilde{y}_{kg} can be determined only up to \min(y_{kg}, \tilde{y}_{kg}) and \max(y_{kg}, \tilde{y}_{kg}). Subsequently, the proposal density
q_0(y, z) depends only on \min(y_{kg}, \tilde{y}_{kg}) and \max(y_{kg}, \tilde{y}_{kg}) for these values of u_{kg}. A similar argument can be made for z_{kg} and \tilde{z}_{kg}.

3.2. Update K. We consider the move types K-split (type m) and K-merge (type m'). The update of K is carried out for fixed G by selecting a component g for which K_g will be updated to either K_g - 1 or K_g + 1. For the K-merge move, we select two adjacent components for merging, where adjacency is determined by the partial ordering (1.6). The merged mixing probability, mean and variance for the new component, k^*, are given by

p_{k^*g} = p_{kg} + p_{k+1,g}, \quad \mu_{k^*g} = \frac{p_{kg}\, \mu_{kg} + p_{k+1,g}\, \mu_{k+1,g}}{p_{kg} + p_{k+1,g}}, \quad \sigma_{k^*g} = \frac{p_{kg}\, \sigma_{kg} + p_{k+1,g}\, \sigma_{k+1,g}}{p_{kg} + p_{k+1,g}}.   (3.24)

The observations with W_i = g and Z_{ij} = k or k + 1 are merged into a newly relabelled bin with W_i = g and Z_{ij} = k^*. For the K-split move, we first select the component to be split, k, which will be split into k_1 and k_2. A uniform random variable u_0 is generated to split p_{kg} into p_{k_1 g} and p_{k_2 g} as follows:

p_{k_1 g} = u_0\, p_{kg} and p_{k_2 g} = (1 - u_0)\, p_{kg}.   (3.25)

Next, \mu_{kg} and \sigma_{kg} are split by generating the variables y_{kg} and z_{kg} as in the G-split move, but now for a single k only. As in the G-split move, y_{kg} and \tilde{y}_{kg} are required to satisfy a similar constraint of the form

\mu_{k-1,g} \preceq y_{kg}, \tilde{y}_{kg} \preceq \mu_{k+1,g},   (3.26)

so that the post-split \mu parameters satisfy the restriction (3.26), and subsequently (1.6). Once y_{kg} and z_{kg} are generated, the assignments to \mu_{k_h g} and \sigma_{k_h g}, h = 1, 2, are done as follows: \mu_{k_1 g} = \min(y_{kg}, \tilde{y}_{kg}), \mu_{k_2 g} = \max(y_{kg}, \tilde{y}_{kg}), \sigma_{k_1 g} = \min(z_{kg}, \tilde{z}_{kg}), \sigma_{k_2 g} = \max(z_{kg}, \tilde{z}_{kg}). Observations with W_i = g and labels Z_{ij} = k are allocated to component k_1 or k_2 based on the Bayes allocation probabilities given by (3.9) with fixed W_i = g.

3.3. Other Update Steps. The updates of the other quantities in steps 3-5 of equation (3.3) can be done via a regular Gibbs sampler, since they do not involve models in spaces of varying dimensions. We give the summary steps here for completeness.
Update \omega \mid G, K, W, Z, p, \mu, \sigma: The conditional posterior distribution of \omega given the remaining parameters is

\pi(\omega \mid \cdots) \propto \prod_{g=1}^{G} \omega_g^{\delta_\omega + N_g - 1}\, I(\omega_1 < \omega_2 < \cdots < \omega_G),   (3.27)
where N_g = \sum_{i=1}^{N} I(W_i = g) is the number of objects with label W_i = g. Equation (3.27) is the distribution of the order statistics of a Dirichlet with parameters (\delta_\omega + N_1, \delta_\omega + N_2, \ldots, \delta_\omega + N_G) and can easily be simulated from.

Update (W, Z) \mid G, K, \omega, p, \mu, \sigma: The conditional posterior distributions of the (W_i, Z_i) are independent of each other. The updates of W_i and Z_i \mid W_i based on the conditional posterior distribution follow the Bayes allocation scheme of (3.20) and (3.9).

Update (p, \mu, \sigma) \mid G, K, \omega, W, Z: The conditional posterior distribution of p = \{p_g, g = 1, 2, \ldots, G\} is given by

\pi(p \mid \cdots) \propto \prod_{g=1}^{G} \prod_{k=1}^{K_g} p_{kg}^{\delta_p + N_{kg} - 1},   (3.28)

where N_{kg} = \sum_{i=1}^{N} \sum_{j=1}^{n_i} I(W_i = g, Z_{ij} = k) is the number of observations x_{ij} with W_i = g and Z_{ij} = k. Thus, each p_g is independently Dirichlet with parameters (\delta_p + N_{1g}, \delta_p + N_{2g}, \ldots, \delta_p + N_{K_g g}). The update of \mu is carried out by generating from its conditional posterior distribution \pi(\mu \mid \cdots). The generation scheme for \mu is as follows:

(\mu^1_{1g}, \mu^1_{2g}, \ldots, \mu^1_{K_g g}) \sim \prod_{k=1}^{K_g} \phi_1(\mu^1_{kg} \mid \xi^1_{kg}, \eta^1_{kg})\, I\{\mu^1_{1g} < \mu^1_{2g} < \cdots < \mu^1_{K_g g}\}   (3.29)

independently for each g = 1, 2, \ldots, G, and, for the remaining components,

\mu^b_{kg} \sim \phi_1(\mu^b_{kg} \mid \xi^b_{kg}, \eta^b_{kg})   (3.30)

independently for each b \ge 2, k = 1, 2, \ldots, K_g, and g = 1, 2, \ldots, G; in (3.29) and (3.30),

\xi^b_{kg} = \frac{N_{kg}\, \bar{x}^b_{kg} / (\sigma^b_{kg})^2 + \mu_0 / \tau^2}{N_{kg} / (\sigma^b_{kg})^2 + 1/\tau^2}, and \eta^b_{kg} = \left( \frac{N_{kg}}{(\sigma^b_{kg})^2} + \frac{1}{\tau^2} \right)^{-1}.   (3.31)

Equation (3.29) is the distribution of the order statistics of independent normals with different means and variances and can be simulated easily. The variances are updated via

(\sigma^b_{kg})^2 \sim IG((\sigma^b_{kg})^2 \mid \alpha^b_{kg}, \beta^b_{kg}),   (3.32)

independently of each other, where \alpha^b_{kg} = \alpha_0 + N_{kg} and \beta^b_{kg} = (1/\beta_0 + \sum_{ij} (x^b_{ij} - \mu^b_{kg})^2 / 2)^{-1}, with \sum_{ij} denoting the sum over all observations with W_i = g and Z_{ij} = k.
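Two of the Gibbs updates above have simple simulation recipes: the constrained draw (3.27) is obtained by sorting an unconstrained Dirichlet draw, and (3.31) is the standard conjugate normal-mean update. A minimal sketch with hypothetical names (a Dirichlet draw is built by normalizing independent Gamma variates):

```python
import random

def sample_omega(delta_omega, counts, rng=random.Random(0)):
    # Equation (3.27): omega is the vector of order statistics of a
    # Dirichlet(delta + N_1, ..., delta + N_G) draw.
    gams = [rng.gammavariate(delta_omega + n, 1.0) for n in counts]
    total = sum(gams)
    return sorted(g / total for g in gams)

def normal_posterior_params(n, xbar, sigma2, mu0, tau2):
    # Equation (3.31): conjugate update for a normal mean with known
    # variance sigma2 and N(mu0, tau2) prior; returns (xi, eta).
    prec = n / sigma2 + 1.0 / tau2
    xi = (n * xbar / sigma2 + mu0 / tau2) / prec
    return xi, 1.0 / prec
```

With no data (N_{kg} = 0) the update returns the prior (\mu_0, \tau^2), and as N_{kg} grows it concentrates on the component sample mean, as expected.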
[Fig 3. (a) Density and (b) scatter plots for objects with univariate and bivariate observables corresponding to d = 1 and d = 2, respectively.]

Update Empty Components. The RJMCMC sampler developed also incorporates the updating of empty components into the chain. This is done with some modification to the earlier G and K updating move types. Empty components can arise naturally in the sampler when allocating the observations into the $g$ or $k$ components in both the G-split and K-split moves. In the case of the G-split move, for example, it is possible that no objects are allocated to one of the split $g$ components. Instead of rejecting this proposal altogether, we incorporate an additional variable, $E_g$, that indicates whether the $g$-th component is empty; $E_g = 1$ (respectively, 0) indicates that a component is non-empty (respectively, empty). The introduction of $E_g$ incorporates additional steps into the RJMCMC algorithm, namely E-Add and E-Remove, which are reversible with respect to each other. In the E-Remove move, an empty component $g^*$ is selected for removal. The only change in this case is in the subpopulation parameters $\omega$, since after the removal of $\omega_{g^*}$, the remaining $\omega$ probabilities should sum to 1. We thus have
$$\omega'_g = \omega_g / (1 - \omega_{g^*}) \qquad (3.33)$$
for $g \ne g^*$. In the E-Add move type, a uniform random variable $u_0$ in $[0, 1]$ is generated and the probabilities $\omega$ are redistributed to include the empty component $g^*$ according to
$$\omega'_{g^*} = u_0 \qquad \text{and} \qquad \omega'_g = (1 - u_0)\, \omega_g \ \text{ for } g \ne g^*. \qquad (3.34)$$
The proposal distributions for the E-Add and E-Remove move types and the associated Jacobians are given in the Appendix. The E-Add and E-Remove reversible move types for the K components are similar, and therefore not discussed further.

4. Convergence Diagnostics. The assessment of convergence of the RJMCMC is carried out based on the methodology of Brooks and Giudici (1998, 2000).
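The weight redistributions in the E-Remove and E-Add moves, equations (3.33) and (3.34), can be sketched as follows. This is a minimal illustration with names of our own; the ordering constraint on the $\omega$'s is not re-imposed here:

```python
def e_remove(omega, g_star):
    """E-Remove (3.33): delete the empty component g_star and renormalize
    the remaining weights by 1/(1 - omega[g_star]) so they sum to one."""
    w = omega[g_star]
    return [x / (1.0 - w) for i, x in enumerate(omega) if i != g_star]

def e_add(omega, g_star, u0):
    """E-Add (3.34): give the new empty component weight u0 and scale the
    existing weights by (1 - u0) so that everything still sums to one."""
    scaled = [(1.0 - u0) * x for x in omega]
    return scaled[:g_star] + [u0] + scaled[g_star:]

omega = [0.1, 0.2, 0.3, 0.4]
grown = e_add(omega, g_star=1, u0=0.25)
assert abs(sum(grown) - 1.0) < 1e-12
shrunk = e_remove(grown, g_star=1)
assert all(abs(a - b) < 1e-12 for a, b in zip(shrunk, omega))  # moves reverse
```

The final assertion illustrates the reversibility of the pair: applying E-Remove to the component just introduced by E-Add recovers the original weights exactly.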
Brooks and Giudici (1998, 2000) suggest running $I \ge 2$ chains from different starting values and monitoring parameters that maintain the same interpretation across different models. Six quantities are used for monitoring, namely: the overall variance, $\hat V$; the within-chain variance, $W_c$; the within-model variance, $W_m$; the within-model, within-chain variance, $W_m W_c$; the between-model variance, $B_m$; and the between-model
within-chain variance, $B_m W_c$.

[Fig 4. Convergence diagnostics for d = 1. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as a function of the iterations. The x-axis unit is 10,000 iterations.]

For each monitoring parameter, the corresponding three plots of $\hat V$ and $W_c$, $W_m$ and $W_m W_c$, and $B_m$ and $B_m W_c$ against the number of iterations should be close to each other to indicate that the chains have mixed sufficiently. Our choice of the monitoring parameter is the log-likelihood of the hierarchical mixture model (1.1).

5. Simulation. Two simulation experiments were carried out for the cases $d = 1$ and $d = 2$, with prior parameter specifications given by $G_{\min} = K_{\min} = 2$, $G_{\max} = K_{\max} = 5$, $\delta_\pi = \delta_\omega = 1$, $\mu_0 = 7$, $\tau_0 = 20$, $\alpha_0 = 2.04$ and $\beta_0 = 0.5/(\alpha_0 - 1)$. The population of objects was simulated from $G = 3$ groups with population proportions $\omega = (0.2, 0.3, 0.5)$. The nested K-components were chosen to be $K_g = 3$ for all $g = 1, 2, 3$. The specifications of $p$, $\mu$ and $\sigma$ are as follows: $p_g = (0.33, 0.33, 0.34)$ for all $g$, $\mu_1 = (-6, -4, -2)$, $\mu_2 = (5, 7, 9)$ and $\mu_3 = (14, 17, 20)$. Common variances were assumed: $\sigma_g = (0.5, 0.5, 0.5)$ for $g = 1, 2, 3$. For the second experiment with $d = 2$, we took $\mu_1 = \mathbf{1}(-6, -4, -2)$, $\mu_2 = \mathbf{1}(5, 7, 9)$, and $\mu_3 = \mathbf{1}(14, 17, 20)$, where $\mathbf{1} = (1, 1)$. All component variances in $\sigma$ were taken to be 0.5. The total number of objects sampled from the population was $N = 100$, with $n_i$, the number of observables from the $i$-th object, iid from a Discrete Uniform distribution on the integers from 20 to 40, both inclusive. The density plot for the 3 components of the hierarchical mixture model in the case of $d = 1$, as well as the scatter plot for $d = 2$, based on a sample of observations from the population, are given in Figures 3(a) and (b), respectively. In both experiments, the RJMCMC algorithm cycles through 7 updating steps: the 5 steps in (3.3) as well as 2 steps involving updating empty G and K components.
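The simulation design above can be sketched in a few lines. This is a hedged sketch with names of our own; the components of $\mu_1$ are taken as negative so that the within-cluster means increase, consistent with the identifiability constraint (1.3), and $\sigma$ is read as a variance:

```python
import random

rng = random.Random(1)

# Parameter values from the d = 1 experiment (mu[0] signs assumed negative
# so each row is increasing, as required by constraint (1.3)).
omega = [0.2, 0.3, 0.5]                  # G = 3 cluster proportions
p = [[0.33, 0.33, 0.34]] * 3             # nested mixing probabilities
mu = [[-6.0, -4.0, -2.0],
      [5.0, 7.0, 9.0],
      [14.0, 17.0, 20.0]]
sd = 0.5 ** 0.5                          # sigma = 0.5 read as a variance

def draw_categorical(probs, rng):
    """Sample an index with the given probabilities (inverse-CDF method)."""
    u, c = rng.random(), 0.0
    for idx, pr in enumerate(probs):
        c += pr
        if u <= c:
            return idx
    return len(probs) - 1

data = []
for i in range(100):                     # N = 100 objects
    g = draw_categorical(omega, rng)     # cluster label W_i
    n_i = rng.randint(20, 40)            # n_i ~ DiscreteUniform{20, ..., 40}
    x = [rng.gauss(mu[g][draw_categorical(p[g], rng)], sd)
         for _ in range(n_i)]            # observations x_ij
    data.append((g, x))

assert all(20 <= len(x) <= 40 for _, x in data)
```

Each object first receives a cluster label from $\omega$, then its $n_i$ observations are drawn from the nested three-component normal mixture of that cluster.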
We took the probabilities of selecting the various move types to be $r_m = r_{\tilde m} = 0.5$ for the moves $m, \tilde m$ = G-split, G-merge when $G = G_{\min}+1, G_{\min}+2, \ldots, G_{\max}-1$. When $G = G_{\max}$, $r_m = 0 = 1 - r_{\tilde m}$, and $r_{\tilde m} = 0 = 1 - r_m$ when $G = G_{\min}$. We used the same probabilities for selecting the K move types. The space of hierarchical mixture models with the above specifications consists of $4^2 + 4^3 + 4^4 + 4^5 = 1{,}360$ models, and so monitoring convergence of the chain becomes important. A total of $I = 3$ chains were chosen for monitoring, with initial estimates of the hierarchical mixture model obtained using the values $G = 2, 3$ and $4$; Zhu,
Dass and Jain (2007) develop an algorithm that fits a hierarchical mixture model based on an agglomerative clustering procedure on the space of standard mixtures, which requires a pre-specified value of $G$ as input; the three choices of $G$ mentioned were subsequently used to obtain the three initial estimates. The RJMCMC algorithm was run for $B = 60{,}000$ iterations and convergence of the chain was monitored (see Figures 4 and 5). In both experiments, the RJMCMC appears to have converged after 60,000 iterations. Highest posterior probabilities were found at the true values of the parameters.

[Fig 5. Convergence diagnostics for d = 2. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as a function of the iterations. The x-axis unit is 10,000 iterations.]

[Fig 6. Convergence diagnostics for the NIST Fingerprint Database. Panels (a), (b) and (c), respectively, show the plots of ($\hat V$, $W_c$), ($W_m$, $W_m W_c$) and ($B_m$, $B_m W_c$) as a function of the iterations. The x-axis unit is 20,000 iterations.]

6. An Application: Assessing the Individuality of Fingerprints. Fingerprint individuality refers to the study of the extent of uniqueness of fingerprints. It is the primary measure for assessing the uncertainty involved when individuals are identified based on their fingerprints, and has been the highlight of many recent court cases. In the case of Daubert v. Merrell Dow Pharmaceuticals (Daubert v. Merrell Dow Pharmaceuticals Inc., 1993), the U.S. Supreme Court ruled that in order for expert forensic testimony to be allowed in court, it had to be subject to five main criteria of scientific validation, that is, whether (i) the particular technique or methodology has been subjected to statistical hypothesis testing, (ii) its error rates have been established, (iii) standards controlling the technique's operation exist and have been maintained, (iv) it has been peer reviewed, and (v)
it has general widespread acceptance (see Pankanti, Prabhakar and Jain (2002), and Zhu et al. (2007)). Following Daubert, forensic evidence based on fingerprints was first challenged in the 1999 case of U.S. v. Byron C. Mitchell, on the grounds that the fundamental premise for asserting the uniqueness of fingerprints had not been objectively tested and its potential matching error rates were unknown. Subsequently, fingerprint-based identification has been challenged in more than 20 court cases in the United States.

[Fig 7. Posterior distribution of PRC based on 1,000 realizations of the RJMCMC after burn-in.]

[Fig 8. Illustrating impostor minutiae matching, taken from Pankanti et al. (2002). A total of m = 64 and n = 65 minutiae were detected in the left and right images, respectively, and 25 correspondences (i.e., matches) were found.]

A quantitative measure of fingerprint individuality is given by the probability of a random correspondence (PRC), which is the probability that a random pair of fingerprints in the population will match with each other. Mathematically, the PRC is expressed as
$$PRC(w \mid m, n) = P(S \ge w \mid m, n), \qquad (6.1)$$
where $S$ denotes the number of feature matches, with distribution based on all random pairs of fingerprints from the target population, $w$ is the observed number of matches, and $m$ and $n$, respectively, are the numbers of features in the two fingerprint images. Small (respectively, large) values of the PRC indicate low (respectively, high) uncertainty associated with the identification decision. Here, we focus on a particular type of feature match based on minutiae. Minutiae are locations (i.e., $x_j \in R^2$) on the fingerprint image which correspond to
ridge anomalies (for example, ridge bifurcations and ridge endings), and are used by forensic experts to declare that two fingerprints belong to the same individual if a sufficiently large number of minutiae are found to be common to both prints. Figure 8 shows an example of such a random match between two fingerprints of different individuals (also called an impostor match). The number of matches is determined by the number of minutiae in the right panel that fall within a square of area $4 r_0^2$ centered at each minutia in the left panel, where $r_0$ is small relative to the size of the fingerprint image. The number of matching minutiae in Figure 8 is 25, but it is not known how likely such a match is between a pair of impostor fingerprints in the population of individuals.

The reliability of the estimated PRC depends on how well the elicited statistical models fit the distribution of minutiae in the population. Thus, candidate statistical models have to meet two important requirements: (i) flexibility, that is, the model can represent minutiae distributions over the entire population, and (ii) computational ease, that is, associated measures of fingerprint individuality can be easily obtained from these models. Zhu et al. (2007) demonstrated that a mixture of multivariate normals with independent components fits the distribution of minutiae well for each fingerprint. Furthermore, when $m$ and $n$ are large, the distribution of $S$ in (6.1) can be approximated by a Poisson distribution with mean (expected number of matches)
$$\lambda(q_1, q_2, m, n) = m\, n\, p(q_1, q_2), \qquad (6.2)$$
where $q_h$ represents the normal mixture (see (1.2)) fitted to fingerprint $h$, for $h = 1, 2$, and $p(q_1, q_2)$ is the probability of a match between a pair of random minutiae, one generated from $q_1$ and the other from $q_2$. The analytical expression for $p(q_1, q_2)$ is
$$p(q_1, q_2) = 4\, r_0^2 \sum_{k=1}^{K_1} \sum_{k'=1}^{K_2} p_{k1}\, p_{k'2} \prod_{b=1}^{2} \phi_1\Big( \underbrace{\mu^{(b)}_{k1} - \mu^{(b)}_{k'2}}_{\mu},\ \underbrace{\sigma^{(b)}_{k1} + \sigma^{(b)}_{k'2}}_{\sigma^2} \Big), \qquad (6.3)$$
where $\phi_1(\mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$.
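Equations (6.2) and (6.3) can be computed directly for a pair of fitted normal mixtures with independent components. A sketch under assumptions: the names are our own, and the inclusion of the mixing weights $p_{k1}$, $p_{k'2}$ is our reading of (6.3), as they are not fully legible in the extracted formula:

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def match_prob(p1, mu1, s1, p2, mu2, s2, r0):
    """p(q1, q2) of (6.3): 4 r0^2 times the weighted sum, over component
    pairs, of the product over the b = 1, 2 coordinates of normal densities
    evaluated at the mean differences, with the variances summed."""
    total = 0.0
    for k in range(len(p1)):
        for kp in range(len(p2)):
            dens = 1.0
            for b in range(2):
                dens *= normal_pdf(mu1[k][b] - mu2[kp][b],
                                   s1[k][b] + s2[kp][b])
            total += p1[k] * p2[kp] * dens
    return 4.0 * r0 * r0 * total

def expected_matches(p1, mu1, s1, p2, mu2, s2, r0, m, n):
    """lambda(q1, q2, m, n) = m * n * p(q1, q2), equation (6.2)."""
    return m * n * match_prob(p1, mu1, s1, p2, mu2, s2, r0)
```

For instance, with two identical single-component "mixtures" $N(0, I_2)$, the sum collapses to a single term $4 r_0^2 / (4\pi)$, which is a convenient sanity check.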
One drawback of Zhu et al. (2007) is that no statistical model is elicited for the minutiae over a population of fingerprints. The hierarchical mixture model of (1.1) is such a population model on minutiae (since $x_j \in R^2$), satisfying both requirements of (i) flexibility and (ii) computational ease mentioned earlier. For a fingerprint pair coming from subpopulations $g_1$ and $g_2$, we have $q_1 = q_{g_1}$ and $q_2 = q_{g_2}$ in (6.2). Hence, it follows that the mean PRC corresponding to $w$ observed matches in the population is given by
$$PRC(w \mid m, n) = \sum_{g_1=1}^{G} \sum_{g_2=1}^{G} \omega_{g_1}\, \omega_{g_2}\, P\big(S \ge w \mid \lambda(q_{g_1}, q_{g_2}, m, n)\big), \qquad (6.4)$$
where $S$ follows a Poisson distribution with mean $\lambda(q_{g_1}, q_{g_2}, m, n)$. The RJMCMC algorithm developed in the previous section can now be used to obtain the posterior distribution of the PRC. As an illustration, we considered 100 fingerprint images from the NIST Special Database 4 as a sample from a population of fingerprints. A total
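Given the subpopulation weights and the matrix of expected match counts, the mean PRC of (6.4) is a weighted sum of Poisson tail probabilities over all subpopulation pairs. A minimal sketch; the $\lambda$ matrix below is a toy stand-in rather than fitted values:

```python
import math

def poisson_tail(w, lam):
    """P(S >= w) for S ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** s / math.factorial(s) for s in range(w))
    return 1.0 - cdf

def mean_prc(omega, lam, w):
    """Mean PRC of (6.4): sum over subpopulation pairs (g1, g2) of
    omega[g1] * omega[g2] * P(S >= w | lambda(q_g1, q_g2, m, n)),
    where lam[g1][g2] holds the expected match count for the pair."""
    G = len(omega)
    return sum(omega[g1] * omega[g2] * poisson_tail(w, lam[g1][g2])
               for g1 in range(G) for g2 in range(G))

omega = [0.2, 0.3, 0.5]
lam = [[2.0] * 3 for _ in range(3)]   # toy expected-match counts
prc = mean_prc(omega, lam, w=5)
assert 0.0 < prc < 1.0
```

In practice, each RJMCMC realization of $(\omega, p, \mu, \sigma)$ yields one value of the mean PRC, and the collection of such values forms its posterior distribution.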
of $I = 3$ chains were run with starting values given by the algorithm of Zhu et al. (2007) for the cases $G = 1, 2$ and $3$. Figure 6 gives the diagnostic plots of the RJMCMC sampler, which establish convergence after a burn-in of $B = 100{,}000$ runs. The posterior distribution of the PRC corresponding to $m = 64$, $n = 65$, $w = 25$ and $r_0 = 15$ pixels, based on 1,000 realizations of the RJMCMC after the burn-in period, is given in Figure 7. The posterior mean and standard deviation in Figure 7 are and , respectively, and the 95% HPD interval is approximately [0.63, 0.735]. We conclude that if a fingerprint pair is chosen from this population with $m = 64$, $n = 65$ and an observed number of matches $w = 25$, there is high uncertainty in making a positive identification. Our analysis in fact indicates that the fingerprints in Figure 8 represent a typical impostor pair. The 95% HPD set suggests that the PRC can be as high as 0.735; that is, about 3 in every 4 impostor pairs can yield 25 or more matches.

7. Summary and Future Work. We have developed Bayesian methodology for inference on hierarchical mixture models, with application to the study of fingerprint individuality. Our future work will be to derive hierarchical mixture models on the extended feature space consisting of minutiae and other fingerprint features. In this paper, we considered only a two-level hierarchy. The US-VISIT program now requires individuals to submit prints from all 10 fingers. This is the case of a 3-level hierarchical mixture model: at the first (top) level, individuals form clusters based on similar characteristics of their 10 fingers, and the distribution of features in each finger is modelled using standard mixtures. Hierarchical mixture models have potential use in other areas as well, including the clustering of soil samples (objects) based on soil characteristics, which can be modelled by a mixture or a transformation of mixtures.

Acknowledgment.
The authors wish to acknowledge the support of NSF DMS grant no. while conducting this research.

References.
[1] Brooks, S. P. and Giudici, P. (1998). Convergence Assessment for Reversible Jump MCMC Simulations. Bayesian Statistics 6, Oxford University Press.
[2] Brooks, S. P. and Giudici, P. (2000). Markov Chain Monte Carlo Convergence Assessment via Two-Way Analysis of Variance. Journal of Computational and Graphical Statistics, 9, 2.
[3] Zhu, Y., Dass, S. C., and Jain, A. K. (2007). Statistical Models for Assessing the Individuality of Fingerprints. IEEE Transactions on Information Forensics and Security, 2, 3.
[4] Escobar, M. and West, M. Bayesian Density Estimation and Inference Using Mixtures. Journal of the American Statistical Association, 90, 430.
[5] Roeder, K. and Wasserman, L. Practical Bayesian Density Estimation Using Mixtures of Normals. Journal of the American Statistical Association, 92, 439.
[6] Woo, M. J. and Sriram, T. N. Robust estimation of mixture complexity for count data. Computational Statistics & Data Analysis, 51, 9.
[7] Ishwaran, H., James, L. F., and Sun, J. Bayesian model selection in finite mixtures by marginal density decompositions. Journal of the American Statistical Association, 96, 456.
[8] Green, P. and Richardson, S. On the Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B, 59, 4.
[9] Green, P. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 4.
[10] NIST Special Database 4. NIST: 8-Bit Gray Scale Images of Fingerprint Image Groups (FIGS). Online:
[11] Pankanti, S., Prabhakar, S. and Jain, A. K. (2002). On the Individuality of Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 8.
[12] Daubert v. Merrell Dow Pharmaceuticals Inc. (1993). U.S. 579, 113 S. Ct. 2786, 125 L.Ed.2d 469.

Appendix

Generating v. The generation of $v$ is discussed here. Let $|A_1| = T$ denote the cardinality of the set $A_1$, and let $A_1^c$ be all $k$ indices in $A_1$ excluding the largest one. Without loss of generality, we can relabel the ordered indices in $A_1$ as $1, 2, \ldots, T$. It follows that the indices of $A_1^c$ are $1, 2, \ldots, T-1$. The restriction $\sum_{k \in A_1} a_k v_k = 1$ can be rewritten as $v_T = \big(1 - \sum_{k=1}^{T-1} a_k v_k\big)/a_T$. Since $0 \le v_T \le 1$, it follows that the $T-1$ indices in $A_1^c$ must satisfy the inequality $1 - a_T \le \sum_{k=1}^{T-1} a_k v_k \le 1$. Also note that $0 \le \sum_{k=1}^{T-1} a_k v_k \le \sum_{k=1}^{T-1} a_k$, since each $0 \le v_k \le 1$. Combining these inequalities, we get the following restriction on the $T-1$ free parameters of $v$:
$$\max\Big(0,\ 1 - a_T\Big) \le \sum_{k=1}^{T-1} a_k v_k \le \min\Big(1,\ \sum_{k=1}^{T-1} a_k\Big). \qquad (7.1)$$
Let $C = \{ (v_1, v_2, \ldots, v_{T-1}) : \text{equation (7.1) is satisfied and } 0 \le v_k \le 1 \}$. It follows that $C$ is a convex polyhedron in the unit hypercube $[0, 1]^{T-1}$. The challenge now is to generate $(v_1, v_2, \ldots, v_{T-1})$ from $C$ and be able to write down the proposal density $q_0(v_1, v_2, \ldots, v_{T-1})$ in closed form. To do this, we determine the set of all extreme points of $C$. There are a total of $T$ inequality constraints on $(v_1, v_2, \ldots, v_{T-1})$: (1) the $T-1$ constraints of the form $0 \le v_k \le 1$, and (2) one constraint of the form $A \le \sum_{k=1}^{T-1} a_k v_k \le B$ given by equation (7.1). Extreme points of a convex polyhedron in $T-1$ dimensions are formed by $T-1$ active equations.
We can select $T-1$ candidate active equations from the $T$ constraints above, solve for $(v_1, v_2, \ldots, v_{T-1})$ based on the $T-1$ equations, and then check, based on the remaining constraint, whether the solution obtained is admissible. For example, we may select $v_k = 0$ for all $k = 1, 2, \ldots, T-1$ from (1). Plugging $(0, 0, \ldots, 0)$ into the remaining constraint (2), we get $\sum_{k=1}^{T-1} a_k v_k = 0$. So, if $a_T < 1$ (respectively, $a_T \ge 1$), we get $0 < A$ (respectively, $0 \ge A$), giving an inadmissible (respectively, admissible) solution. Another candidate extreme point is formed by selecting the first $T-2$ of the $v_k$'s to be zero and solving for $v_{T-1}$ from the equation $\sum_{k=1}^{T-1} a_k v_k = B$. The solution here is $(0, 0, \ldots, 0, B/a_{T-1})$, and it will be admissible if $0 \le B/a_{T-1} \le 1$, since the inequality that was not used is $0 \le v_{T-1} \le 1$. It is easy to see that there are $\binom{T}{T-1}\, 2^{T-1} = T\, 2^{T-1}$ candidate extreme points to be checked for admissibility: we first select $T-1$ constraints from the $T$, which can be done in $\binom{T}{T-1} = T$ ways, and then, for each of the $T-1$ selected constraints, we can select either the lower or the upper bound as the candidate active equation.
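The candidate-and-check enumeration above can be sketched as follows, testing all $T \cdot 2^{T-1}$ candidates. This is a minimal illustration with names of our own:

```python
import itertools

def extreme_points(a):
    """Enumerate the extreme points of
    C = { v in [0,1]^(T-1) : A <= sum_k a_k v_k <= B },
    where A = max(0, 1 - a[T-1]) and B = min(1, sum(a[0..T-2])),
    by checking all T * 2^(T-1) candidates built from T-1 active constraints."""
    T = len(a)
    A = max(0.0, 1.0 - a[-1])
    B = min(1.0, sum(a[:-1]))
    pts = set()
    # Case 1: all T-1 box constraints active (each v_k fixed at 0 or 1);
    # admissibility is checked against the remaining sum constraint.
    for corner in itertools.product((0.0, 1.0), repeat=T - 1):
        s = sum(ak * vk for ak, vk in zip(a[:-1], corner))
        if A <= s <= B:
            pts.add(corner)
    # Case 2: the sum constraint active (at A or B) plus T-2 box constraints;
    # admissibility is checked against the unused bound 0 <= v_free <= 1.
    for free in range(T - 1):
        others = [k for k in range(T - 1) if k != free]
        for corner in itertools.product((0.0, 1.0), repeat=T - 2):
            fixed = dict(zip(others, corner))
            rest = sum(a[k] * fixed[k] for k in others)
            if a[free] == 0.0:
                continue
            for bound in (A, B):
                v_free = (bound - rest) / a[free]
                if 0.0 <= v_free <= 1.0:
                    v = [0.0] * (T - 1)
                    for k in others:
                        v[k] = fixed[k]
                    v[free] = round(v_free, 12)
                    pts.add(tuple(v))
    return sorted(pts)

# For a = (0.5, 0.5, 0.5): C is { v in [0,1]^2 : v_1 + v_2 >= 1 }, a
# triangle whose extreme points are (0,1), (1,0) and (1,1).
assert extreme_points([0.5, 0.5, 0.5]) == [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
```

With the admissible extreme points in hand, points of $C$ can then be proposed, for example, as random convex combinations, with the proposal density $q_0$ available in closed form.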
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationMore Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction
Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationProbability Models for Bayesian Recognition
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More informationHyperparameter estimation in Dirichlet process mixture models
Hyperparameter estimation in Dirichlet process mixture models By MIKE WEST Institute of Statistics and Decision Sciences Duke University, Durham NC 27706, USA. SUMMARY In Bayesian density estimation and
More informationARTICLE IN PRESS. Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect. Journal of Multivariate Analysis
Journal of Multivariate Analysis ( ) Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Marginal parameterizations of discrete models
More informationAn Introduction to Reversible Jump MCMC for Bayesian Networks, with Application
An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 18-16th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Trans-dimensional Markov chain Monte Carlo. Bayesian model for autoregressions.
More informationBayesian time series classification
Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationEco517 Fall 2004 C. Sims MIDTERM EXAM
Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationSupplementary Note on Bayesian analysis
Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan
More informationGentle Introduction to Infinite Gaussian Mixture Modeling
Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for
More informationF denotes cumulative density. denotes probability density function; (.)
BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationComputer intensive statistical methods
Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample
More informationBayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017
Chalmers April 6, 2017 Bayesian philosophy Bayesian philosophy Bayesian statistics versus classical statistics: War or co-existence? Classical statistics: Models have variables and parameters; these are
More informationSimultaneous inference for multiple testing and clustering via a Dirichlet process mixture model
Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model David B Dahl 1, Qianxing Mo 2 and Marina Vannucci 3 1 Texas A&M University, US 2 Memorial Sloan-Kettering
More informationOn Reparametrization and the Gibbs Sampler
On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationAssessing Regime Uncertainty Through Reversible Jump McMC
Assessing Regime Uncertainty Through Reversible Jump McMC August 14, 2008 1 Introduction Background Research Question 2 The RJMcMC Method McMC RJMcMC Algorithm Dependent Proposals Independent Proposals
More informationVariational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures
17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationThe Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.
Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface
More informationLECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)
LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationOverview. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda
Overview DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing
More informationBayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang
Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January
More informationMarginal Specifications and a Gaussian Copula Estimation
Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required
More informationReminder of some Markov Chain properties:
Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More information