Collaborative Place Models Supplement 2

Berk Kapicioglu (Foursquare Labs, berk.kapicioglu@gmail.com), Robert E. Schapire (Princeton University, schapire@cs.princeton.edu), David S. Rosenberg (YP Mobile Labs, david.davidr@gmail.com), Tony Jebara (Columbia University, jebara@cs.columbia.edu)

1 Bayesian Gaussian Mixture Model (BGMM)

In this section, we provide a formal description of BGMM and describe its generative process. The model is depicted in Figure 1.

[Figure 1: Graphical model representation of BGMM. The geographic coordinates, denoted by $\ell$, are the only observed variables.]

Let $\mathcal{N}$ denote the normal distribution and $\mathrm{IW}$ denote the inverse-Wishart distribution. Let $\mathrm{Dirichlet}_K(\cdot)$ denote a symmetric Dirichlet distribution if its parameter is a scalar and a general Dirichlet distribution if its parameter is a vector. The generative process of BGMM is described in more formal terms below. Further technical details about the distributions we use in our model can be found in the appendix.

1. Draw a place distribution $\theta \sim \mathrm{Dirichlet}_K(\alpha)$.
2. For each place $k \in \{1, \ldots, K\}$,
   (a) Draw a place covariance $\Sigma_k \sim \mathrm{IW}(\Lambda, v)$.
   (b) Draw a place mean $\mu_k \sim \mathcal{N}(\mu_0, \Sigma_k / \kappa)$.
3. For each observation index $n$,
   (a) Draw a place assignment $z_n \sim \mathrm{Categorical}(\theta)$.
   (b) Draw a location $\ell_n \sim \mathcal{N}(\mu_{z_n}, \Sigma_{z_n})$.
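The short Python sketch below simulates this generative process for 2-D locations. It is only an illustration of the steps above, not code from the paper; the function name, the hyperparameter values in the example call, and the use of SciPy's inverse-Wishart sampler (whose scale/degrees-of-freedom parameterization matches the definition in the appendix) are assumptions, and a reasonably recent SciPy is assumed so that a NumPy `Generator` can be passed as `random_state`.

```python
import numpy as np
from scipy.stats import invwishart

def sample_bgmm(K, N, alpha, mu0, kappa, Lambda, v, rng=None):
    """Draw one synthetic dataset from the BGMM generative process (2-D locations)."""
    rng = np.random.default_rng(rng)
    # 1. Place distribution.
    theta = rng.dirichlet(np.full(K, alpha))
    # 2. Place covariances and means.
    Sigma = np.stack([invwishart.rvs(df=v, scale=Lambda, random_state=rng)
                      for _ in range(K)])
    mu = np.stack([rng.multivariate_normal(mu0, Sigma[k] / kappa)
                   for k in range(K)])
    # 3. Place assignments and observed locations.
    z = rng.choice(K, size=N, p=theta)
    ell = np.stack([rng.multivariate_normal(mu[zk], Sigma[zk]) for zk in z])
    return theta, mu, Sigma, z, ell

# Example with illustrative (not paper-specified) hyperparameters.
theta, mu, Sigma, z, ell = sample_bgmm(
    K=5, N=200, alpha=1.0, mu0=np.zeros(2), kappa=1.0, Lambda=np.eye(2), v=4, rng=0)
```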

2 Inference

In this section, we derive our inference algorithm, and we present our derivation in multiple steps. First, we use a strategy popularized by Griffiths and Steyvers [1] and derive a collapsed Gibbs sampler to sample from the posterior distribution of the categorical random variables conditioned on the observed geographic coordinates. Second, we derive the conditional likelihood of the posterior samples, which we use to determine the sampler's convergence. Finally, we derive formulas for approximating the posterior expectations of the non-categorical random variables conditioned on the posterior samples.

2.1 Collapsed Gibbs Sampler

In Lemma 1, we derive the collapsed Gibbs sampler for variable $z$. Given a vector $x$ and an index $i$, let $x_{-i}$ indicate all the entries of the vector excluding the one at index $i$. For Lemma 1, assume $i$ denotes the index of the variable that will be sampled.

Lemma 1. The unnormalized probability of $z_i = k$ conditioned on the observed location data and the remaining categorical variables is
$$p(z_i = k \mid z_{-i}, \ell) \;\propto\; t_{\tilde{v}_k - 1}\!\left(\ell_i \;\middle|\; \tilde{\mu}_k,\ \frac{\tilde{\Lambda}_k(\tilde{\kappa}_k + 1)}{\tilde{\kappa}_k(\tilde{v}_k - 1)}\right) (\alpha + m_{k,-i}).$$
The parameters $\tilde{v}_k$, $\tilde{\mu}_k$, $\tilde{\Lambda}_k$, and $\tilde{\kappa}_k$ are defined in the proof; $t$ denotes the bivariate t-distribution and $m_{k,-i}$ denotes counts, both of which are defined in the appendix.

Proof. We decompose the probability into two components using Bayes' theorem:
$$p(z_i \mid z_{-i}, \ell) = \frac{p(\ell_i \mid z_i, z_{-i}, \ell_{-i})\, p(z_i \mid z_{-i}, \ell_{-i})}{p(\ell_i \mid z_{-i}, \ell_{-i})} = \frac{p(\ell_i \mid z_i, z_{-i}, \ell_{-i})\, p(z_i \mid z_{-i})}{p(\ell_i \mid z_{-i}, \ell_{-i})}$$
$$\propto\; p(\ell_i \mid z_i, z_{-i}, \ell_{-i}) \qquad (1)$$
$$\quad\times\; p(z_i \mid z_{-i}). \qquad (2)$$
In the first part of the derivation, we operate on (1). We augment it with $\mu$ and $\Sigma$:
$$p(\ell_i \mid z_i, z_{-i}, \ell_{-i}) = \int\!\!\int p(\ell_i \mid z_i, z_{-i}, \ell_{-i}, \mu, \Sigma)\, p(\mu, \Sigma \mid z_i, z_{-i}, \ell_{-i})\, d\mu\, d\Sigma$$
$$= \int\!\!\int \mathcal{N}(\ell_i \mid \mu_{z_i}, \Sigma_{z_i}) \qquad (3)$$
$$\quad\times\; p(\mu, \Sigma \mid z_i, z_{-i}, \ell_{-i})\, d\mu\, d\Sigma. \qquad (4)$$

We convert (4) into a more tractable form. Let $M_{k,-i}$ be a set of indices, which we define in the appendix, and let $\ell_{M_{k,-i}}$ denote the subset of observations whose indices are in $M_{k,-i}$. In the derivation below, we treat all variables other than $\mu$ and $\Sigma$ as constants:
$$p(\mu, \Sigma \mid z_i = k, z_{-i}, \ell_{-i}) = p(\mu, \Sigma \mid z_i = k, z_{-i}, \ell_{M_{k,-i}}) = \frac{p(\ell_{M_{k,-i}} \mid \mu, \Sigma, z_i = k, z_{-i})\, p(\mu, \Sigma)}{p(\ell_{M_{k,-i}} \mid z_i = k, z_{-i})}$$
$$\propto \left(\prod_{j \in M_{k,-i}} \mathcal{N}(\ell_j \mid \mu_k, \Sigma_k)\right) \mathcal{N}(\mu_k \mid \mu_0, \Sigma_k / \kappa)\, \mathrm{IW}(\Sigma_k \mid \Lambda, v).$$
Since the normal-inverse-Wishart distribution is the conjugate prior of the multivariate normal distribution, the posterior is also a normal-inverse-Wishart distribution,
$$p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{-i}) = \mathcal{N}(\mu_k \mid \tilde{\mu}_k, \Sigma_k / \tilde{\kappa}_k)\, \mathrm{IW}(\Sigma_k \mid \tilde{\Lambda}_k, \tilde{v}_k), \qquad (5)$$
whose parameters are defined as
$$\tilde{\kappa}_k = \kappa + m_{k,-i}, \qquad \tilde{v}_k = v + m_{k,-i}, \qquad \bar{\ell}_k = \frac{1}{m_{k,-i}} \sum_{j \in M_{k,-i}} \ell_j, \qquad \tilde{\mu}_k = \frac{\kappa \mu_0 + m_{k,-i}\, \bar{\ell}_k}{\tilde{\kappa}_k},$$
$$S_k = \sum_{j \in M_{k,-i}} (\ell_j - \bar{\ell}_k)(\ell_j - \bar{\ell}_k)^T, \qquad \tilde{\Lambda}_k = \Lambda + S_k + \frac{\kappa\, m_{k,-i}}{\kappa + m_{k,-i}} (\bar{\ell}_k - \mu_0)(\bar{\ell}_k - \mu_0)^T.$$
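A minimal NumPy sketch of this conjugate update is given below, assuming 2-D locations and using plain variable names (`mu0`, `kappa`, `Lambda`, `v`) for the prior hyperparameters; it is an illustration of the formulas above, not the paper's implementation.

```python
import numpy as np

def niw_posterior(ell_k, mu0, kappa, Lambda, v):
    """Posterior normal-inverse-Wishart parameters for one place, given the
    locations ell_k currently assigned to it (array of shape (m, 2))."""
    m = ell_k.shape[0]
    if m == 0:
        return mu0, kappa, Lambda, v           # an empty place keeps the prior
    ell_bar = ell_k.mean(axis=0)
    kappa_t = kappa + m
    v_t = v + m
    mu_t = (kappa * mu0 + m * ell_bar) / kappa_t
    diffs = ell_k - ell_bar
    S = diffs.T @ diffs                         # scatter around the cluster mean
    Lambda_t = (Lambda + S
                + (kappa * m / kappa_t) * np.outer(ell_bar - mu0, ell_bar - mu0))
    return mu_t, kappa_t, Lambda_t, v_t
```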

The posterior parameters depicted above are derived based on the conjugacy properties of Gaussian distributions, as described in [2]. We rewrite (1) by combining (3), (4), and (5) to obtain
$$p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i}) = \int\!\!\int \mathcal{N}(\ell_i \mid \mu_k, \Sigma_k)\, \mathcal{N}(\mu_k \mid \tilde{\mu}_k, \Sigma_k / \tilde{\kappa}_k)\, \mathrm{IW}(\Sigma_k \mid \tilde{\Lambda}_k, \tilde{v}_k)\, d\mu\, d\Sigma = t_{\tilde{v}_k - 1}\!\left(\ell_i \;\middle|\; \tilde{\mu}_k,\ \frac{\tilde{\Lambda}_k(\tilde{\kappa}_k + 1)}{\tilde{\kappa}_k(\tilde{v}_k - 1)}\right), \qquad (6)$$
where $t$ is the bivariate t-distribution. (6) is derived by applying the posterior predictive equation for the normal-inverse-Wishart model given in [2].

Now, we move on to the second part of the derivation. We operate on (2) and augment it with $\theta$:
$$p(z_i \mid z_{-i}) = \int p(z_i \mid z_{-i}, \theta)\, p(\theta \mid z_{-i})\, d\theta = \int p(z_i \mid \theta) \qquad (7)$$
$$\quad\times\; p(\theta \mid z_{-i})\, d\theta. \qquad (8)$$
We convert (8) into a more tractable form. As before, let $M_{-i}$ be a set of indices, which we define in the appendix, and let $z_{M_{-i}}$ denote the subset of place assignments whose indices are in $M_{-i}$. In the derivation below, we treat all variables other than $\theta$ as constants:
$$p(\theta \mid z_{-i}) = p(\theta \mid z_{M_{-i}}) = \frac{p(z_{M_{-i}} \mid \theta)\, p(\theta)}{p(z_{M_{-i}})} \propto \left(\prod_{j \in M_{-i}} p(z_j \mid \theta)\right) p(\theta) = \left(\prod_{j \in M_{-i}} \mathrm{Categorical}(z_j \mid \theta)\right) \mathrm{Dirichlet}_K(\theta \mid \alpha)$$
$$\Rightarrow\; p(\theta \mid z_{-i}) = \mathrm{Dirichlet}_K\!\left(\theta \mid \alpha + m_{1,-i}, \ldots, \alpha + m_{K,-i}\right), \qquad (9)$$
where the last step follows because the Dirichlet distribution is the conjugate prior of the categorical distribution. We rewrite (2) by combining (7), (8), and (9):
$$p(z_i = k \mid z_{-i}) = \int \theta_k\, \mathrm{Dirichlet}_K\!\left(\theta \mid \alpha + m_{1,-i}, \ldots, \alpha + m_{K,-i}\right) d\theta = \frac{\alpha + m_{k,-i}}{K\alpha + m - 1}. \qquad (10)$$
The last step follows because it is the expected value of the Dirichlet distribution. Finally, we combine (1), (2), (6), and (10) to obtain the unnormalized probability distribution:
$$p(z_i = k \mid z_{-i}, \ell) \;\propto\; p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})\, p(z_i = k \mid z_{-i}) = t_{\tilde{v}_k - 1}\!\left(\ell_i \;\middle|\; \tilde{\mu}_k,\ \frac{\tilde{\Lambda}_k(\tilde{\kappa}_k + 1)}{\tilde{\kappa}_k(\tilde{v}_k - 1)}\right) \frac{\alpha + m_{k,-i}}{K\alpha + m - 1}.$$
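The sketch below evaluates this assignment probability for a single observation, assuming the per-place posterior parameters have already been computed (for example with a helper like the `niw_posterior` sketch above) and that counts exclude the current index. The bivariate t log-density is written out explicitly to match the definition in the appendix; function and variable names are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import gammaln

def bivariate_t_logpdf(x, mu, Sigma, df):
    """Log-density of the 2-D Student-t distribution t_df(x | mu, Sigma)."""
    d = 2
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)
    return (gammaln((df + d) / 2) - gammaln(df / 2)
            - 0.5 * d * np.log(df * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * (df + d) * np.log1p(maha / df))

def assignment_probs(ell_i, post, counts_minus_i, alpha):
    """Normalized p(z_i = k | z_{-i}, ell) for one observation ell_i.
    `post` holds per-place posterior parameters (mu_t, kappa_t, Lambda_t, v_t)
    computed from the other observations; `counts_minus_i[k]` is m_{k,-i}."""
    K = len(post)
    logp = np.empty(K)
    for k, (mu_t, kappa_t, Lambda_t, v_t) in enumerate(post):
        df = v_t - 1
        scale = Lambda_t * (kappa_t + 1) / (kappa_t * df)   # predictive scale from (6)
        logp[k] = (bivariate_t_logpdf(ell_i, mu_t, scale, df)
                   + np.log(alpha + counts_minus_i[k]))      # Dirichlet term from (10)
    logp -= logp.max()                                        # numerical stability
    p = np.exp(logp)
    return p / p.sum()
```

Working in log space and subtracting the maximum before exponentiating keeps the per-place terms from underflowing when clusters are far from the observation.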

2.2 Likelihoods

In this subsection, we derive the conditional likelihoods of the posterior samples conditioned on the observed geographical coordinates. We use these conditional likelihoods to determine the sampler's convergence. We present the derivations in multiple lemmas and combine them in Lemma 4 at the end of the subsection. Let $\Gamma$ denote the gamma function.

Lemma 2. The marginal probability of the categorical random variable $z$ is
$$p(z) = \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K}\, \frac{\prod_{k=1}^{K} \Gamma(\alpha + m_k)}{\Gamma(K\alpha + m)},$$
where the counts $m_k$ and $m$ are defined in the appendix.

Proof. Below, we will augment the marginal probability with $\theta$, and then factorize it based on the conditional independence assumptions made by our model:
$$p(z) = \int p(z \mid \theta)\, p(\theta)\, d\theta = \int \mathrm{Dirichlet}_K(\theta \mid \alpha) \left(\prod_{j} \mathrm{Categorical}(z_j \mid \theta)\right) d\theta. \qquad (11)$$
Now, we substitute the probabilities in (11) with Dirichlet and categorical distributions, which are defined in more detail in the appendix:
$$p(z) = \int \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha + m_k - 1}\, d\theta = \frac{B(\alpha + m_1, \ldots, \alpha + m_K)}{B(\alpha)} = \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K}\, \frac{\prod_{k=1}^{K} \Gamma(\alpha + m_k)}{\Gamma(K\alpha + m)}.$$
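For concreteness, a small sketch of the Dirichlet-multinomial marginal in Lemma 2, computed in log space from the per-place counts; the function name and example values are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import gammaln

def log_p_z(counts, alpha):
    """log p(z) from Lemma 2, computed from the per-place counts m_k.
    Uses log-gamma throughout to avoid overflow for large counts."""
    counts = np.asarray(counts, dtype=float)
    K, m = counts.size, counts.sum()
    return (gammaln(K * alpha) - K * gammaln(alpha)
            + gammaln(alpha + counts).sum() - gammaln(K * alpha + m))

# Example: 3 places with 4, 1, and 0 assigned observations, alpha = 0.5.
print(log_p_z([4, 1, 0], alpha=0.5))
```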

For our final derivation, let $\Gamma_2$ denote the bivariate gamma function, and let $|\cdot|$ denote the determinant.

Lemma 3. The conditional probability of the observed locations $\ell$ conditioned on $z$ is
$$p(\ell \mid z) = \prod_{k=1}^{K} \frac{1}{\pi^{m_k}}\, \frac{\Gamma_2(\bar{v}_k / 2)}{\Gamma_2(v / 2)}\, \frac{|\Lambda|^{v/2}}{|\bar{\Lambda}_k|^{\bar{v}_k/2}}\, \frac{\kappa}{\bar{\kappa}_k}.$$
The parameters $\bar{v}_k$, $\bar{\Lambda}_k$, and $\bar{\kappa}_k$ are defined in the proof, and the counts $m_k$ are defined in the appendix.

Proof. We will factorize the probability using the conditional independence assumptions made by the model, and then simplify the resulting probabilities by integrating out the means and covariances associated with the place clusters:
$$p(\ell \mid z) = \prod_{k=1}^{K} p(\ell_{M_k} \mid z) = \prod_{k=1}^{K} \int\!\!\int \left(\prod_{j \in M_k} \mathcal{N}(\ell_j \mid \mu_k, \Sigma_k)\right) \mathcal{N}(\mu_k \mid \mu_0, \Sigma_k / \kappa)\, \mathrm{IW}(\Sigma_k \mid \Lambda, v)\, d\mu_k\, d\Sigma_k.$$
We apply the marginal likelihood equation for the normal-inverse-Wishart model from [2], which describes the conjugacy properties of Gaussian distributions, to reformulate the integral into its final form:
$$p(\ell \mid z) = \prod_{k=1}^{K} \frac{1}{\pi^{m_k}}\, \frac{\Gamma_2(\bar{v}_k / 2)}{\Gamma_2(v / 2)}\, \frac{|\Lambda|^{v/2}}{|\bar{\Lambda}_k|^{\bar{v}_k/2}}\, \frac{\kappa}{\bar{\kappa}_k}.$$
The definitions for $\bar{v}_k$, $\bar{\Lambda}_k$, and $\bar{\kappa}_k$ parallel those provided in (5), with counts that do not exclude any index.

Finally, we combine Lemmas 2 and 3 to provide the log-likelihood of the samples $z$ conditioned on the observations $\ell$.

Lemma 4. The log-likelihood of the samples $z$ conditioned on the observations $\ell$ is
$$\log p(z \mid \ell) = \sum_{k=1}^{K} \log \Gamma(\alpha + m_k) + \sum_{k=1}^{K} \left[ \log \Gamma_2\!\left(\frac{\bar{v}_k}{2}\right) - m_k \log \pi - \frac{\bar{v}_k}{2} \log |\bar{\Lambda}_k| - \log \bar{\kappa}_k \right] + C,$$
where $C$ denotes the constant terms.

Proof. The result follows by multiplying the probabilities stated in Lemmas 2 and 3, and applying the logarithm function.
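A hedged sketch of Lemma 4 as a convergence diagnostic follows: it recomputes the full-data posterior parameters for each place and accumulates the non-constant log terms. It assumes `ell` and `z` are NumPy arrays and uses SciPy's `multigammaln` for the bivariate gamma function; names and structure are illustrative rather than the paper's code.

```python
import numpy as np
from scipy.special import gammaln, multigammaln

def log_likelihood(ell, z, K, alpha, mu0, kappa, Lambda, v):
    """log p(z | ell) up to an additive constant (Lemma 4), for monitoring
    the sampler's convergence. ell has shape (m, 2); z holds place indices."""
    z = np.asarray(z)
    total = 0.0
    for k in range(K):
        ell_k = ell[z == k]
        m_k = ell_k.shape[0]
        # Posterior parameters for place k, as in (5) but without excluding any index.
        kappa_k, v_k = kappa + m_k, v + m_k
        if m_k > 0:
            ell_bar = ell_k.mean(axis=0)
            diffs = ell_k - ell_bar
            Lambda_k = (Lambda + diffs.T @ diffs
                        + (kappa * m_k / kappa_k) * np.outer(ell_bar - mu0, ell_bar - mu0))
        else:
            Lambda_k = Lambda
        total += (gammaln(alpha + m_k)                      # Lemma 2 term
                  + multigammaln(v_k / 2, 2)                # bivariate gamma function
                  - m_k * np.log(np.pi)
                  - 0.5 * v_k * np.log(np.linalg.det(Lambda_k))
                  - np.log(kappa_k))                        # Lemma 3 terms
    return total
```

In practice one would evaluate this quantity after each sweep of the collapsed Gibbs sampler and declare convergence once it stabilizes.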

2.3 Parameter estimation

In Subsection 2.1, we described a collapsed Gibbs sampler for sampling the posteriors of the categorical random variables. Below, Lemmas 5 and 6 show how these samples, denoted as $z$, can be used to approximate the posterior expectations of $\theta$, $\mu$, and $\Sigma$.

Lemma 5. The expectation of $\theta_k$ given the observed geographical coordinates and the posterior samples is
$$\mathrm{E}\left[\theta_k \mid z, \ell\right] = \frac{\alpha + m_k}{K\alpha + m}, \qquad (12)$$
where the counts $m_k$ and $m$ are defined in the appendix.

Proof.
$$p(\theta \mid z, \ell) = p(\theta \mid z) = \frac{p(z \mid \theta)\, p(\theta)}{p(z)} = \frac{\left(\prod_j p(z_j \mid \theta)\right) p(\theta)}{p(z)} \propto \mathrm{Dirichlet}_K(\theta \mid \alpha) \prod_j \mathrm{Categorical}(z_j \mid \theta) = \mathrm{Dirichlet}_K(\theta \mid \alpha + m_1, \ldots, \alpha + m_K)$$
$$\Rightarrow\; \mathrm{E}\left[\theta_k \mid z, \ell\right] = \frac{\alpha + m_k}{K\alpha + m}.$$
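As a quick illustration of Lemma 5, the following sketch turns one posterior sample of assignments into the expected place distribution; the function name and example values are assumptions, not from the paper.

```python
import numpy as np

def estimate_theta(z, K, alpha):
    """Posterior expectation of the place distribution theta (Lemma 5),
    computed from one posterior sample z of place assignments."""
    counts = np.bincount(np.asarray(z), minlength=K)   # m_k for each place k
    return (alpha + counts) / (K * alpha + counts.sum())

# Example: 6 observations assigned among K = 4 places, alpha = 1.0.
print(estimate_theta([0, 0, 2, 1, 0, 2], K=4, alpha=1.0))
```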

Lemma 6. The expectations of $\mu_k$ and $\Sigma_k$ given the observed geographical coordinates and the posterior samples are
$$\mathrm{E}\left[\mu_k \mid z, \ell\right] = \bar{\mu}_k \qquad \text{and} \qquad \mathrm{E}\left[\Sigma_k \mid z, \ell\right] = \frac{\bar{\Lambda}_k}{\bar{v}_k - 3}.$$
Parameters $\bar{\mu}_k$, $\bar{\Lambda}_k$, and $\bar{v}_k$ are defined in the proof.

Proof. We factorize the posterior of the place parameters using the conditional independence assumptions made by the model, and apply the conjugacy of the normal-inverse-Wishart prior:
$$p(\mu_k, \Sigma_k \mid z, \ell) = p(\mu_k, \Sigma_k \mid z, \ell_{M_k}) \propto \left(\prod_{j \in M_k} \mathcal{N}(\ell_j \mid \mu_k, \Sigma_k)\right) \mathcal{N}(\mu_k \mid \mu_0, \Sigma_k / \kappa)\, \mathrm{IW}(\Sigma_k \mid \Lambda, v) = \mathcal{N}(\mu_k \mid \bar{\mu}_k, \Sigma_k / \bar{\kappa}_k)\, \mathrm{IW}(\Sigma_k \mid \bar{\Lambda}_k, \bar{v}_k),$$
where $\bar{\mu}_k$, $\bar{\kappa}_k$, $\bar{\Lambda}_k$, and $\bar{v}_k$ are defined as in (5), except that the counts used in their definitions do not exclude the current index. The stated expectations are the means of the resulting normal and bivariate inverse-Wishart distributions:
$$\mathrm{E}\left[\mu_k \mid z, \ell\right] = \bar{\mu}_k, \qquad \mathrm{E}\left[\Sigma_k \mid z, \ell\right] = \frac{\bar{\Lambda}_k}{\bar{v}_k - 3}.$$

3 Appendix

3.1 Miscellaneous notation

Throughout the paper, we use various notations to represent sets of indices and their cardinalities. Vectors $y$ and $z$ denote the component and place assignments in CPM, respectively. Each vector entry is identified by a tuple index $(u, w, n)$, where $u \in \{1, \ldots, U\}$ is a user, $w \in \{1, \ldots, W\}$ is a weekhour, and $n \in \{1, \ldots, N_{u,w}\}$ is an iteration index. For the subsequent notations, we assume that the random variables $y$ and $z$ are already sampled. We refer to a subset of indices using
$$M^{k_0, f_0}_{u_0, w_0} = \{(\dot{u}, \dot{w}, \dot{n}) \mid z_{\dot{u}, \dot{w}, \dot{n}} = k_0,\; y_{\dot{u}, \dot{w}, \dot{n}} = f_0,\; \dot{u} = u_0,\; \dot{w} = w_0\},$$
where $u_0$ denotes the user, $w_0$ denotes the weekhour, $k_0$ denotes the place, and $f_0$ denotes the component. If we want the subset of indices to be unrestricted with respect to a category, we use the placeholder $\cdot$. For example, $M^{\cdot, f_0}_{u_0, w_0}$ has no constraints with respect to places:
$$M^{\cdot, f_0}_{u_0, w_0} = \{(\dot{u}, \dot{w}, \dot{n}) \mid y_{\dot{u}, \dot{w}, \dot{n}} = f_0,\; \dot{u} = u_0,\; \dot{w} = w_0\}.$$

Given a subset of indices denoted by $M$, the lowercase $m = |M|$ denotes its cardinality. For example, given a set of indices $M^{k_0, f_0}_{u_0, w_0}$, its cardinality is $m^{k_0, f_0}_{u_0, w_0} = |M^{k_0, f_0}_{u_0, w_0}|$.

For the collapsed Gibbs sampler, the sets of indices and cardinalities used in the derivations exclude the index that will be sampled. We use $-$ to modify sets or cardinalities for this exclusion. Let $(u, w, n)$ denote the index that will be sampled; then, given an index set $M$, let $M^- = M \setminus \{(u, w, n)\}$ represent the excluding set and let $m^- = |M^-|$ represent the corresponding cardinality. For example,
$$M^{k_0, f_0, -}_{u_0, \cdot} = M^{k_0, f_0}_{u_0, \cdot} \setminus \{(u, w, n)\} \qquad \text{and} \qquad m^{k_0, f_0, -}_{u_0, \cdot} = \left|M^{k_0, f_0, -}_{u_0, \cdot}\right|.$$
In the proof of Lemma 1, parameters $\tilde{v}_k$, $\tilde{\mu}_k$, $\tilde{\Lambda}_k$, and $\tilde{\kappa}_k$ are defined using cardinalities that exclude the current index $(u, w, n)$. Similarly, in the proof of Lemma 6, parameters $\bar{\mu}_k$, $\bar{\Lambda}_k$, and $\bar{v}_k$ are defined like their wiggly versions, but the counts used in their definitions do not exclude the current index.

3.2 Probability distributions

Let $\Gamma_2$ denote the bivariate gamma function, defined as
$$\Gamma_2(a) = \pi^{1/2} \prod_{j=1}^{2} \Gamma\!\left(a + \frac{1 - j}{2}\right).$$
Let $v > 1$ and let $\Lambda \in \mathbb{R}^{2 \times 2}$ be a positive definite scale matrix. The inverse-Wishart distribution, which is the conjugate prior to the multivariate normal distribution, is defined as
$$\mathrm{IW}(\Sigma \mid \Lambda, v) = \frac{|\Lambda|^{v/2}}{2^{v}\, \Gamma_2(v/2)}\, |\Sigma|^{-(v+3)/2} \exp\!\left(-\frac{1}{2} \operatorname{tr}\!\left(\Lambda \Sigma^{-1}\right)\right).$$
Let $\Sigma \in \mathbb{R}^{2 \times 2}$ be a positive definite covariance matrix and let $\mu \in \mathbb{R}^2$ denote a mean vector. The multivariate normal distribution is defined as
$$\mathcal{N}(\ell \mid \mu, \Sigma) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (\ell - \mu)^T \Sigma^{-1} (\ell - \mu)\right).$$
Let $v > 0$ and let $\Sigma \in \mathbb{R}^{2 \times 2}$ be a positive definite scale matrix; then the 2-dimensional t-distribution is defined as
$$t_v(x \mid \mu, \Sigma) = \frac{\Gamma\!\left(\frac{v+2}{2}\right)}{\Gamma\!\left(\frac{v}{2}\right) v \pi\, |\Sigma|^{1/2}} \left(1 + \frac{1}{v} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{-\frac{v+2}{2}}.$$
Let $K > 1$ be the number of categories and let $\alpha = (\alpha_1, \ldots, \alpha_K)$ be the concentration parameters, where $\alpha_k > 0$ for all $k \in \{1, \ldots, K\}$. Then, the K-dimensional Dirichlet distribution, which is the conjugate prior to the categorical distribution, is defined as
$$\mathrm{Dirichlet}_K(x \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} x_k^{\alpha_k - 1}, \qquad \text{where} \qquad B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}.$$
We abuse the Dirichlet notation slightly and use it to define the K-dimensional symmetric Dirichlet distribution as well. Let $\alpha > 0$ be a scalar concentration parameter. Then, the symmetric Dirichlet distribution is defined as $\mathrm{Dirichlet}_K(x \mid \alpha) = \mathrm{Dirichlet}_K(x \mid \alpha_1, \ldots, \alpha_K)$, where $\alpha_k = \alpha$ for all $k \in \{1, \ldots, K\}$.
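To make the index-set and count notation of Subsection 3.1 concrete, here is a small NumPy sketch on toy assignment data; the arrays, the helper name, and the use of `None` as the unrestricted placeholder are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np

# Toy assignment data: one entry per observation, following the (u, w, n) indexing.
users     = np.array([0, 0, 0, 1, 1])   # u
weekhours = np.array([3, 3, 5, 3, 3])   # w
z         = np.array([2, 2, 0, 2, 1])   # place assignments
y         = np.array([0, 1, 0, 0, 0])   # component assignments

def index_set(u0=None, w0=None, k0=None, f0=None):
    """Indices in M^{k0, f0}_{u0, w0}; a None argument plays the role of the
    unrestricted placeholder '.' in the notation above."""
    mask = np.ones(z.size, dtype=bool)
    if u0 is not None: mask &= (users == u0)
    if w0 is not None: mask &= (weekhours == w0)
    if k0 is not None: mask &= (z == k0)
    if f0 is not None: mask &= (y == f0)
    return np.flatnonzero(mask)

M = index_set(u0=0, w0=3, f0=0)     # M^{., f0}_{u0, w0}: unrestricted in places
m = M.size                          # its cardinality m
M_minus = np.setdiff1d(M, [0])      # exclude the index currently being sampled
print(M, m, M_minus)
```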

References

[1] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl. 1):5228-5235, April 2004.

[2] Kevin Murphy. Conjugate Bayesian analysis of the Gaussian distribution. Technical note, October 2007.
