Collaborative Place Models
Supplement

Berk Kapicioglu, Foursquare Labs, berk.kapicioglu@gmail.com
Robert E. Schapire, Princeton University, schapire@cs.princeton.edu
David S. Rosenberg, YP Mobile Labs, david.davidr@gmail.com
Tony Jebara, Columbia University, jebara@cs.columbia.edu

1 Inference

CPM comprises a spatial component, which represents the inferred place clusters, and a temporal component, which represents the inferred place distributions for each weekhour. The model is depicted in Figure 1.

Figure 1: Graphical model representation of CPM. The geographic coordinates, denoted by $\ell$, are the only observed variables. The model assumes that all users share the same coefficients over the component place distributions.

We present the derivation of our inference algorithm in multiple steps. First, we use a strategy popularized by Griffiths and Steyvers [1], and derive a collapsed Gibbs sampler to sample from the posterior distribution of the categorical random variables conditioned on the observed geographic coordinates. Second, we derive the conditional likelihood of the posterior samples, which we use to determine the sampler's convergence. Finally, we derive formulas for approximating the posterior expectations of the non-categorical random variables conditioned on the posterior samples.
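
1.1 Collapsed Gibbs Sampler

In Lemmas 1 and 2, we derive the collapsed Gibbs sampler for variables $z$ and $y$, respectively. Given a vector $x$ and an index $k$, let $x_{-k}$ indicate all the entries of the vector excluding the one at index $k$. For Lemmas 1 and 2, assume $i = (u, w, n)$ denotes the index of the variable that will be sampled.

Lemma 1. The unnormalized probability of $z_i$ conditioned on the observed location data and remaining categorical variables is

\begin{align*}
p(z_i = k \mid y_i = f, z_{-i}, y_{-i}, \ell) \propto t_{\tilde{v}_u^k - 1}\!\left(\ell_i \,\middle|\, \tilde{\mu}_u^k, \frac{\tilde{\Lambda}_u^k (\tilde{p}_u^k + 1)}{\tilde{p}_u^k (\tilde{v}_u^k - 1)}\right) \left(\alpha + \tilde{m}_{u,\cdot}^{k,f}\right).
\end{align*}

The parameters $\tilde{v}_u^k$, $\tilde{\mu}_u^k$, $\tilde{\Lambda}_u^k$, and $\tilde{p}_u^k$ are defined in the proof. $t$ denotes the bivariate t-distribution and $\tilde{m}_{u,\cdot}^{k,f}$ denotes counts, both of which are defined in the appendix.

Proof. We decompose the probability into two components using Bayes' theorem:

\begin{align*}
p(z_i = k \mid y_i = f, z_{-i}, y_{-i}, \ell)
&= \frac{p(\ell_i \mid z_i = k, y_i = f, z_{-i}, y_{-i}, \ell_{-i})\, p(z_i = k \mid y_i = f, z_{-i}, y_{-i}, \ell_{-i})}{p(\ell_i \mid y_i = f, z_{-i}, y_{-i}, \ell_{-i})} \\
&= \frac{p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})\, p(z_i = k \mid y_i = f, z_{-i}, y_{-i})}{p(\ell_i \mid y_i = f, z_{-i}, y_{-i}, \ell_{-i})} \\
&\propto p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i}) \tag{1} \\
&\quad \times p(z_i = k \mid y_i = f, z_{-i}, y_{-i}). \tag{2}
\end{align*}

In the first part of the derivation, we operate on (1). We augment it with $\phi_u^k$ and $\Sigma_u^k$:

\begin{align*}
p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})
&= \iint p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i}, \phi_u^k, \Sigma_u^k)\, p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{-i})\, d\phi_u^k\, d\Sigma_u^k \\
&= \iint p(\ell_i \mid z_i = k, \phi_u^k, \Sigma_u^k)\, p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{-i})\, d\phi_u^k\, d\Sigma_u^k \\
&= \iint \mathcal{N}(\ell_i \mid \phi_u^k, \Sigma_u^k) \tag{3} \\
&\qquad \times p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{-i})\, d\phi_u^k\, d\Sigma_u^k. \tag{4}
\end{align*}

We convert (4) into a more tractable form. Let $\tilde{M}_{u,\cdot}^{k,\cdot}$ be a set of indices, which we define in the appendix, and let $\ell_{\tilde{M}_{u,\cdot}^{k,\cdot}}$ denote the subset of observations whose indices are in $\tilde{M}_{u,\cdot}^{k,\cdot}$. In the derivation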
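below, we treat all variables other than $\phi_u^k$ and $\Sigma_u^k$ as constants:

\begin{align*}
p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{-i})
&= p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{\tilde{M}_{u,\cdot}^{k,\cdot}}) \\
&= \frac{p(\ell_{\tilde{M}_{u,\cdot}^{k,\cdot}} \mid \phi_u^k, \Sigma_u^k, z_i = k, z_{-i})\, p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i})}{p(\ell_{\tilde{M}_{u,\cdot}^{k,\cdot}} \mid z_i = k, z_{-i})} \\
&\propto p(\ell_{\tilde{M}_{u,\cdot}^{k,\cdot}} \mid \phi_u^k, \Sigma_u^k, z_{-i})\, p(\phi_u^k, \Sigma_u^k) \\
&= \Bigg(\prod_{j \in \tilde{M}_{u,\cdot}^{k,\cdot}} p(\ell_j \mid \phi_u^k, \Sigma_u^k, z_j)\Bigg)\, p(\phi_u^k, \Sigma_u^k) \\
&= \Bigg(\prod_{j \in \tilde{M}_{u,\cdot}^{k,\cdot}} \mathcal{N}(\ell_j \mid \phi_u^k, \Sigma_u^k)\Bigg)\, \mathcal{N}\!\left(\phi_u^k \,\middle|\, \mu_u, \frac{\Sigma_u^k}{p_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \Lambda, \nu).
\end{align*}

Since the normal-inverse-Wishart distribution is the conjugate prior of the multivariate normal distribution, the posterior is also a normal-inverse-Wishart distribution,

\begin{align*}
p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{-i}) = \mathcal{N}\!\left(\phi_u^k \,\middle|\, \tilde{\mu}_u^k, \frac{\Sigma_u^k}{\tilde{p}_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \tilde{\Lambda}_u^k, \tilde{v}_u^k), \tag{5}
\end{align*}

whose parameters are defined as

\begin{align*}
\tilde{p}_u^k &= p_u^k + \tilde{m}_{u,\cdot}^{k,\cdot}, \\
\tilde{v}_u^k &= \nu + \tilde{m}_{u,\cdot}^{k,\cdot}, \\
\bar{\ell}_u^k &= \frac{1}{\tilde{m}_{u,\cdot}^{k,\cdot}} \sum_{j \in \tilde{M}_{u,\cdot}^{k,\cdot}} \ell_j, \\
\tilde{\mu}_u^k &= \frac{p_u^k \mu_u + \tilde{m}_{u,\cdot}^{k,\cdot}\, \bar{\ell}_u^k}{p_u^k + \tilde{m}_{u,\cdot}^{k,\cdot}}, \\
\tilde{S}_u^k &= \sum_{j \in \tilde{M}_{u,\cdot}^{k,\cdot}} (\ell_j - \bar{\ell}_u^k)(\ell_j - \bar{\ell}_u^k)^T, \\
\tilde{\Lambda}_u^k &= \Lambda + \tilde{S}_u^k + \frac{p_u^k\, \tilde{m}_{u,\cdot}^{k,\cdot}}{p_u^k + \tilde{m}_{u,\cdot}^{k,\cdot}} (\bar{\ell}_u^k - \mu_u)(\bar{\ell}_u^k - \mu_u)^T.
\end{align*}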
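The posterior parameter updates above can be illustrated with a minimal sketch. It assumes 2-D observations stored as plain `(x, y)` tuples and a 2x2 scale matrix stored as nested tuples; the function name `niw_posterior` and its argument names are hypothetical and chosen only for this illustration.

```python
def niw_posterior(points, mu, p, lam, nu):
    """Normal-inverse-Wishart posterior parameters for one place cluster.

    points : list of (x, y) observations currently assigned to the cluster
    mu     : prior mean as (x, y)
    p      : prior pseudo-count that scales the covariance of the mean
    lam    : prior 2x2 scale matrix as ((a, b), (b, c))
    nu     : prior degrees of freedom
    Returns (mu_post, p_post, lam_post, nu_post).
    """
    m = len(points)
    if m == 0:
        # No assigned observations: the posterior equals the prior.
        return mu, p, lam, nu

    # Sample mean of the assigned observations.
    bar_x = sum(x for x, _ in points) / m
    bar_y = sum(y for _, y in points) / m

    # Scatter matrix around the sample mean.
    sxx = sum((x - bar_x) ** 2 for x, _ in points)
    sxy = sum((x - bar_x) * (y - bar_y) for x, y in points)
    syy = sum((y - bar_y) ** 2 for _, y in points)

    p_post = p + m
    nu_post = nu + m
    mu_post = ((p * mu[0] + m * bar_x) / p_post,
               (p * mu[1] + m * bar_y) / p_post)

    # Rank-one correction for the gap between sample mean and prior mean.
    c = p * m / p_post
    dx, dy = bar_x - mu[0], bar_y - mu[1]
    lam_post = ((lam[0][0] + sxx + c * dx * dx, lam[0][1] + sxy + c * dx * dy),
                (lam[1][0] + sxy + c * dx * dy, lam[1][1] + syy + c * dy * dy))
    return mu_post, p_post, lam_post, nu_post
```

In a collapsed Gibbs sweep, `points` would hold the observations assigned to the place with the currently sampled index removed, which is what the tilde counts express.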
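
The posterior parameters depicted above are derived based on the conjugacy properties of Gaussian distributions, as described in [2]. We rewrite (1) by combining (3), (4), and (5) to obtain

\begin{align*}
p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})
&= \iint \mathcal{N}(\ell_i \mid \phi_u^k, \Sigma_u^k)\, p(\phi_u^k, \Sigma_u^k \mid z_i = k, z_{-i}, \ell_{-i})\, d\phi_u^k\, d\Sigma_u^k \\
&= \iint \mathcal{N}(\ell_i \mid \phi_u^k, \Sigma_u^k)\, \mathcal{N}\!\left(\phi_u^k \,\middle|\, \tilde{\mu}_u^k, \frac{\Sigma_u^k}{\tilde{p}_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \tilde{\Lambda}_u^k, \tilde{v}_u^k)\, d\phi_u^k\, d\Sigma_u^k \\
&= t_{\tilde{v}_u^k - 1}\!\left(\ell_i \,\middle|\, \tilde{\mu}_u^k, \frac{\tilde{\Lambda}_u^k (\tilde{p}_u^k + 1)}{\tilde{p}_u^k (\tilde{v}_u^k - 1)}\right), \tag{6}
\end{align*}

where $t$ is the bivariate t-distribution. (6) is derived by applying Equation 258 from [2].

Now, we move on to the second part of the derivation. We operate on (2) and augment it with $\theta_u^f$:

\begin{align*}
p(z_i = k \mid y_i = f, z_{-i}, y_{-i})
&= \int p(z_i = k \mid y_i = f, z_{-i}, y_{-i}, \theta_u^f)\, p(\theta_u^f \mid y_i = f, z_{-i}, y_{-i})\, d\theta_u^f \\
&= \int p(z_i = k \mid y_i = f, \theta_u^f) \tag{7} \\
&\qquad \times p(\theta_u^f \mid y_i = f, y_{-i}, z_{-i})\, d\theta_u^f. \tag{8}
\end{align*}

We convert (8) into a more tractable form. As before, let $\tilde{M}_{u,\cdot}^{\cdot,f}$ be a set of indices, which we define in the appendix, and let $z_{\tilde{M}_{u,\cdot}^{\cdot,f}}$ denote the subset of place assignments whose indices are in $\tilde{M}_{u,\cdot}^{\cdot,f}$. In the derivation below, we treat all variables other than $\theta_u^f$ as constants:

\begin{align*}
p(\theta_u^f \mid y_i = f, y_{-i}, z_{-i})
&= p(\theta_u^f \mid y_i = f, y_{-i}, z_{\tilde{M}_{u,\cdot}^{\cdot,f}}) \\
&= \frac{p(z_{\tilde{M}_{u,\cdot}^{\cdot,f}} \mid y_i = f, y_{-i}, \theta_u^f)\, p(\theta_u^f \mid y_i = f, y_{-i})}{p(z_{\tilde{M}_{u,\cdot}^{\cdot,f}} \mid y_i = f, y_{-i})} \\
&\propto \Bigg(\prod_{j \in \tilde{M}_{u,\cdot}^{\cdot,f}} p(z_j \mid y_j, \theta_u^f)\Bigg)\, p(\theta_u^f) \\
&= \Bigg(\prod_{j \in \tilde{M}_{u,\cdot}^{\cdot,f}} \mathrm{Categorical}(z_j \mid \theta_u^f)\Bigg)\, \mathrm{Dirichlet}_{K_u}(\theta_u^f \mid \alpha) \\
\Rightarrow\quad p(\theta_u^f \mid y_i = f, y_{-i}, z_{-i})
&= \mathrm{Dirichlet}_{K_u}\!\left(\theta_u^f \,\middle|\, \alpha + \tilde{m}_{u,\cdot}^{1,f}, \ldots, \alpha + \tilde{m}_{u,\cdot}^{K_u,f}\right), \tag{9}
\end{align*}

where the last step follows because the Dirichlet distribution is the conjugate prior of the categorical distribution. We rewrite (2) by combining (7), (8), and (9):

\begin{align*}
p(z_i = k \mid y_i = f, z_{-i}, y_{-i})
&= \int p(z_i = k \mid y_i = f, \theta_u^f)\, p(\theta_u^f \mid y_i = f, y_{-i}, z_{-i})\, d\theta_u^f \\
&= \int \theta_{u,k}^f\, \mathrm{Dirichlet}_{K_u}\!\left(\theta_u^f \,\middle|\, \alpha + \tilde{m}_{u,\cdot}^{1,f}, \ldots, \alpha + \tilde{m}_{u,\cdot}^{K_u,f}\right) d\theta_u^f \\
&= \frac{\alpha + \tilde{m}_{u,\cdot}^{k,f}}{\alpha K_u + \tilde{m}_{u,\cdot}^{\cdot,f}}. \tag{10}
\end{align*}

The last step follows because it is the expected value of the Dirichlet distribution.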
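
The two factors derived above, the bivariate t predictive density and the Dirichlet expectation, are what a sampler multiplies together for each candidate place. The following is a minimal sketch under stated assumptions: matrices are nested 2x2 tuples, and the names `log_t2` and `place_probabilities` are hypothetical helpers, not part of the original derivation. The denominator of the Dirichlet expectation is omitted because it is the same for every place and cancels on normalization.

```python
import math


def log_t2(x, mu, scale, v):
    """Log-density of the bivariate t-distribution t_v(x | mu, scale)."""
    (a, b), (c, d) = scale
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T scale^{-1} (x - mu) via the 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return (math.lgamma(v / 2 + 1) - math.lgamma(v / 2)
            - math.log(v * math.pi) - 0.5 * math.log(det)
            - (v / 2 + 1) * math.log1p(q / v))


def place_probabilities(x, clusters, alpha, counts):
    """Normalized probabilities over places for one observation x.

    clusters : list of (mu_t, p_t, lam_t, v_t), the posterior parameters of
               each place computed with the current observation excluded
    alpha    : symmetric Dirichlet concentration parameter
    counts   : per-place counts for the observation's current component,
               also with the current observation excluded
    """
    weights = []
    for (mu_t, p_t, lam_t, v_t), m in zip(clusters, counts):
        # Predictive scale: lam_t * (p_t + 1) / (p_t * (v_t - 1)).
        s = (p_t + 1) / (p_t * (v_t - 1))
        scale = tuple(tuple(s * e for e in row) for row in lam_t)
        density = math.exp(log_t2(x, mu_t, scale, v_t - 1))
        weights.append(density * (alpha + m))
    total = sum(weights)
    return [w / total for w in weights]
```

A new place assignment would then be drawn from the returned probabilities, for example with `random.choices(range(len(probs)), weights=probs)`.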
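
Finally, we combine (1), (2), (6), and (10) to obtain the unnormalized probability distribution:

\begin{align*}
p(z_i = k \mid y_i = f, z_{-i}, y_{-i}, \ell)
&\propto p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})\, p(z_i = k \mid y_i = f, z_{-i}, y_{-i}) \\
&= t_{\tilde{v}_u^k - 1}\!\left(\ell_i \,\middle|\, \tilde{\mu}_u^k, \frac{\tilde{\Lambda}_u^k (\tilde{p}_u^k + 1)}{\tilde{p}_u^k (\tilde{v}_u^k - 1)}\right) \frac{\alpha + \tilde{m}_{u,\cdot}^{k,f}}{\alpha K_u + \tilde{m}_{u,\cdot}^{\cdot,f}}.
\end{align*}

The denominator $\alpha K_u + \tilde{m}_{u,\cdot}^{\cdot,f}$ does not depend on $k$, so dropping it yields the form stated in the lemma.

Lemma 2. The unnormalized probability of $y_i$ conditioned on the observed location data and remaining categorical variables is

\begin{align*}
p(y_i = f \mid z_i = k, y_{-i}, z_{-i}, \ell) \propto \frac{\alpha + \tilde{m}_{u,\cdot}^{k,f}}{\alpha K_u + \tilde{m}_{u,\cdot}^{\cdot,f}} \left(\beta_{w,f} + \tilde{m}_{\cdot,w}^{\cdot,f}\right),
\end{align*}

where the counts $\tilde{m}_{u,\cdot}^{k,f}$, $\tilde{m}_{u,\cdot}^{\cdot,f}$, and $\tilde{m}_{\cdot,w}^{\cdot,f}$ are defined in the appendix.

Proof. We decompose the probability into two components using Bayes' theorem:

\begin{align*}
p(y_i = f \mid z_i = k, y_{-i}, z_{-i}, \ell)
&= p(y_i = f \mid z_i = k, y_{-i}, z_{-i}) \\
&= \frac{p(z_i = k \mid y_i = f, z_{-i}, y_{-i})\, p(y_i = f \mid z_{-i}, y_{-i})}{p(z_i = k \mid z_{-i}, y_{-i})} \\
&\propto p(z_i = k \mid y_i = f, z_{-i}, y_{-i}) \tag{11} \\
&\quad \times p(y_i = f \mid z_{-i}, y_{-i}). \tag{12}
\end{align*}

Since (11) is equal to (2), we rewrite it using (10):

\begin{align*}
p(z_i = k \mid y_i = f, z_{-i}, y_{-i}) = \frac{\alpha + \tilde{m}_{u,\cdot}^{k,f}}{\alpha K_u + \tilde{m}_{u,\cdot}^{\cdot,f}}. \tag{13}
\end{align*}

We operate on (12) and augment it with $\gamma_w$:

\begin{align*}
p(y_i = f \mid z_{-i}, y_{-i})
&= p(y_i = f \mid y_{-i}) \\
&= \int p(y_i = f \mid y_{-i}, \gamma_w)\, p(\gamma_w \mid y_{-i})\, d\gamma_w \\
&= \int p(y_i = f \mid \gamma_w)\, p(\gamma_w \mid y_{-i})\, d\gamma_w \\
&= \int \gamma_{w,f} \tag{14} \\
&\qquad \times p(\gamma_w \mid y_{-i})\, d\gamma_w. \tag{15}
\end{align*}

We convert (15) into a more tractable form. As before, let $\tilde{M}_{\cdot,w}^{\cdot,\cdot}$ be a set of indices, which we define in the appendix, and let $y_{\tilde{M}_{\cdot,w}^{\cdot,\cdot}}$ denote the subset of component assignments whose indices are in $\tilde{M}_{\cdot,w}^{\cdot,\cdot}$. In the derivation below, we treat all variables other than $\gamma_w$ as constants,

\begin{align*}
p(\gamma_w \mid y_{-i})
&= p(\gamma_w \mid y_{\tilde{M}_{\cdot,w}^{\cdot,\cdot}}) \\
&= \frac{p(y_{\tilde{M}_{\cdot,w}^{\cdot,\cdot}} \mid \gamma_w)\, p(\gamma_w)}{p(y_{\tilde{M}_{\cdot,w}^{\cdot,\cdot}})} \\
&\propto \Bigg(\prod_{j \in \tilde{M}_{\cdot,w}^{\cdot,\cdot}} p(y_j \mid \gamma_w)\Bigg)\, p(\gamma_w) \\
&= \Bigg(\prod_{j \in \tilde{M}_{\cdot,w}^{\cdot,\cdot}} \mathrm{Categorical}(y_j \mid \gamma_w)\Bigg)\, \mathrm{Dirichlet}_F(\gamma_w \mid \beta_w) \\
\Rightarrow\quad p(\gamma_w \mid y_{-i})
&= \mathrm{Dirichlet}_F\!\left(\gamma_w \,\middle|\, \beta_{w,1} + \tilde{m}_{\cdot,w}^{\cdot,1}, \ldots, \beta_{w,F} + \tilde{m}_{\cdot,w}^{\cdot,F}\right), \tag{16}
\end{align*}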
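
where the last step follows because the Dirichlet distribution is the conjugate prior of the categorical distribution. We rewrite (12) by combining (14), (15), and (16):

\begin{align*}
p(y_i = f \mid z_{-i}, y_{-i})
&= \int \gamma_{w,f}\, p(\gamma_w \mid y_{-i})\, d\gamma_w \\
&= \int \gamma_{w,f}\, \mathrm{Dirichlet}_F\!\left(\gamma_w \,\middle|\, \beta_{w,1} + \tilde{m}_{\cdot,w}^{\cdot,1}, \ldots, \beta_{w,F} + \tilde{m}_{\cdot,w}^{\cdot,F}\right) d\gamma_w \\
&= \frac{\beta_{w,f} + \tilde{m}_{\cdot,w}^{\cdot,f}}{\sum_{f'=1}^{F} \left(\beta_{w,f'} + \tilde{m}_{\cdot,w}^{\cdot,f'}\right)}. \tag{17}
\end{align*}

Finally, we combine (11), (12), (13), and (17) to obtain the unnormalized probability distribution:

\begin{align*}
p(y_i = f \mid z_i = k, y_{-i}, z_{-i}, \ell)
&\propto p(z_i = k \mid y_i = f, z_{-i}, y_{-i})\, p(y_i = f \mid z_{-i}, y_{-i}) \\
&= \frac{\alpha + \tilde{m}_{u,\cdot}^{k,f}}{\alpha K_u + \tilde{m}_{u,\cdot}^{\cdot,f}} \cdot \frac{\beta_{w,f} + \tilde{m}_{\cdot,w}^{\cdot,f}}{\sum_{f'=1}^{F} \left(\beta_{w,f'} + \tilde{m}_{\cdot,w}^{\cdot,f'}\right)}.
\end{align*}

The denominator of the second factor does not depend on $f$, so dropping it yields the form stated in the lemma.

1.2 Likelihoods

In this subsection, we derive the conditional likelihoods of the posterior samples conditioned on the observed geographical coordinates. We use these conditional likelihoods to determine the sampler's convergence. We present the derivations in multiple lemmas and combine them in a theorem at the end of the subsection. Let $\Gamma$ denote the gamma function.

Lemma 3. The marginal probability of the categorical random variable $y$ is

\begin{align*}
p(y) = \prod_{w=1}^{W} \frac{\Gamma\!\left(\sum_{f=1}^{F} \beta_{w,f}\right)}{\prod_{f=1}^{F} \Gamma(\beta_{w,f})} \cdot \frac{\prod_{f=1}^{F} \Gamma\!\left(\beta_{w,f} + m_{\cdot,w}^{\cdot,f}\right)}{\Gamma\!\left(\sum_{f=1}^{F} \left(\beta_{w,f} + m_{\cdot,w}^{\cdot,f}\right)\right)},
\end{align*}

where the counts $m_{\cdot,w}^{\cdot,f}$ are defined in the appendix.

Proof. Let $\gamma = (\gamma_1, \ldots, \gamma_W)$ denote the collection of random variables for all weekhours. Below, we will augment the marginal probability with $\gamma$, and then factorize it based on the conditional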
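independence assumptions made by our model:

\begin{align*}
p(y)
&= \int p(y \mid \gamma)\, p(\gamma)\, d\gamma \\
&= \int \Bigg(\prod_{w=1}^{W} \prod_{j \in M_{\cdot,w}^{\cdot,\cdot}} p(y_j \mid \gamma_w)\Bigg) \Bigg(\prod_{w=1}^{W} p(\gamma_w)\Bigg) d\gamma \\
&= \prod_{w=1}^{W} \int p(\gamma_w) \prod_{j \in M_{\cdot,w}^{\cdot,\cdot}} p(y_j \mid \gamma_w)\, d\gamma_w \\
&= \prod_{w=1}^{W} \int \mathrm{Dirichlet}_F(\gamma_w \mid \beta_w) \prod_{j \in M_{\cdot,w}^{\cdot,\cdot}} \mathrm{Categorical}(y_j \mid \gamma_w)\, d\gamma_w. \tag{18}
\end{align*}

Now, we substitute the probabilities in (18) with Dirichlet and categorical distributions, which are defined in more detail in the appendix:

\begin{align*}
p(y)
&= \prod_{w=1}^{W} \int \mathrm{Dirichlet}_F(\gamma_w \mid \beta_w) \prod_{j \in M_{\cdot,w}^{\cdot,\cdot}} \mathrm{Categorical}(y_j \mid \gamma_w)\, d\gamma_w \\
&= \prod_{w=1}^{W} \int \Bigg(\frac{1}{B(\beta_w)} \prod_{f=1}^{F} \gamma_{w,f}^{\beta_{w,f} - 1}\Bigg) \Bigg(\prod_{f=1}^{F} \gamma_{w,f}^{m_{\cdot,w}^{\cdot,f}}\Bigg) d\gamma_w \\
&= \prod_{w=1}^{W} \frac{1}{B(\beta_w)} \int \prod_{f=1}^{F} \gamma_{w,f}^{\beta_{w,f} + m_{\cdot,w}^{\cdot,f} - 1}\, d\gamma_w \\
&= \prod_{w=1}^{W} \frac{B\!\left(\beta_{w,1} + m_{\cdot,w}^{\cdot,1}, \ldots, \beta_{w,F} + m_{\cdot,w}^{\cdot,F}\right)}{B(\beta_w)} \\
&= \prod_{w=1}^{W} \frac{\Gamma\!\left(\sum_{f=1}^{F} \beta_{w,f}\right)}{\prod_{f=1}^{F} \Gamma(\beta_{w,f})} \cdot \frac{\prod_{f=1}^{F} \Gamma\!\left(\beta_{w,f} + m_{\cdot,w}^{\cdot,f}\right)}{\Gamma\!\left(\sum_{f=1}^{F} \left(\beta_{w,f} + m_{\cdot,w}^{\cdot,f}\right)\right)}.
\end{align*}

Lemma 4. The conditional probability of the categorical random variable $z$ conditioned on $y$ is

\begin{align*}
p(z \mid y) = \prod_{u=1}^{U} \prod_{f=1}^{F} \frac{\Gamma(\alpha K_u)}{\Gamma(\alpha)^{K_u}} \cdot \frac{\prod_{k=1}^{K_u} \Gamma\!\left(\alpha + m_{u,\cdot}^{k,f}\right)}{\Gamma\!\left(\alpha K_u + m_{u,\cdot}^{\cdot,f}\right)},
\end{align*}

where the counts $m_{u,\cdot}^{k,f}$ and $m_{u,\cdot}^{\cdot,f}$ are defined in the appendix.

Proof. Let $\theta = \{\theta_u^f \mid u \in \{1, \ldots, U\}, f \in \{1, \ldots, F\}\}$ denote the collection of random variables for all users and components. Below, we will augment the conditional probability with $\theta$, and then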
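factorize it based on the conditional independence assumptions made by our model:

\begin{align*}
p(z \mid y)
&= \int p(z \mid y, \theta)\, p(\theta \mid y)\, d\theta \\
&= \int \Bigg(\prod_{u=1}^{U} \prod_{f=1}^{F} \prod_{j \in M_{u,\cdot}^{\cdot,f}} p(z_j \mid y_j, \theta_u^f)\Bigg) \Bigg(\prod_{u=1}^{U} \prod_{f=1}^{F} p(\theta_u^f)\Bigg) d\theta \\
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \int p(\theta_u^f) \prod_{j \in M_{u,\cdot}^{\cdot,f}} p(z_j \mid y_j, \theta_u^f)\, d\theta_u^f \\
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \int \mathrm{Dirichlet}_{K_u}(\theta_u^f \mid \alpha) \prod_{j \in M_{u,\cdot}^{\cdot,f}} \mathrm{Categorical}(z_j \mid \theta_u^f)\, d\theta_u^f. \tag{19}
\end{align*}

Now, we substitute the probabilities in (19) with Dirichlet and categorical distributions, which are defined in more detail in the appendix:

\begin{align*}
p(z \mid y)
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \int \mathrm{Dirichlet}_{K_u}(\theta_u^f \mid \alpha) \prod_{j \in M_{u,\cdot}^{\cdot,f}} \mathrm{Categorical}(z_j \mid \theta_u^f)\, d\theta_u^f \\
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \int \Bigg(\frac{1}{B(\alpha)} \prod_{k=1}^{K_u} \left(\theta_{u,k}^f\right)^{\alpha - 1}\Bigg) \Bigg(\prod_{k=1}^{K_u} \left(\theta_{u,k}^f\right)^{m_{u,\cdot}^{k,f}}\Bigg) d\theta_u^f \\
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \frac{1}{B(\alpha)} \int \prod_{k=1}^{K_u} \left(\theta_{u,k}^f\right)^{\alpha + m_{u,\cdot}^{k,f} - 1}\, d\theta_u^f \\
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \frac{B\!\left(\alpha + m_{u,\cdot}^{1,f}, \ldots, \alpha + m_{u,\cdot}^{K_u,f}\right)}{B(\alpha)} \\
&= \prod_{u=1}^{U} \prod_{f=1}^{F} \frac{\Gamma(\alpha K_u)}{\Gamma(\alpha)^{K_u}} \cdot \frac{\prod_{k=1}^{K_u} \Gamma\!\left(\alpha + m_{u,\cdot}^{k,f}\right)}{\Gamma\!\left(\alpha K_u + m_{u,\cdot}^{\cdot,f}\right)}.
\end{align*}

For our final derivation, let $\Gamma_2$ denote the bivariate gamma function, and let $|\cdot|$ denote the determinant.

Lemma 5. The conditional probability of the observed locations $\ell$ conditioned on $z$ and $y$ is

\begin{align*}
p(\ell \mid z, y) = \prod_{u=1}^{U} \prod_{k=1}^{K_u} \frac{1}{\pi^{m_{u,\cdot}^{k,\cdot}}} \cdot \frac{\Gamma_2\!\left(\hat{v}_u^k / 2\right)}{\Gamma_2(\nu / 2)} \cdot \frac{|\Lambda|^{\nu / 2}}{|\hat{\Lambda}_u^k|^{\hat{v}_u^k / 2}} \cdot \frac{p_u^k}{\hat{p}_u^k}.
\end{align*}

The parameters $\hat{v}_u^k$, $\hat{\Lambda}_u^k$, and $\hat{p}_u^k$ are defined in the proof, and the counts $m_{u,\cdot}^{k,\cdot}$ are defined in the appendix.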
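
Proof. We will factorize the probability using the conditional independence assumptions made by the model, and then simplify the resulting probabilities by integrating out the means and covariances associated with the place clusters:

\begin{align*}
p(\ell \mid z, y)
&= p(\ell \mid z) \\
&= \prod_{u=1}^{U} \prod_{k=1}^{K_u} p(\ell_{M_{u,\cdot}^{k,\cdot}} \mid z) \\
&= \prod_{u=1}^{U} \prod_{k=1}^{K_u} \iint p(\ell_{M_{u,\cdot}^{k,\cdot}} \mid z, \phi_u^k, \Sigma_u^k)\, p(\phi_u^k, \Sigma_u^k)\, d\phi_u^k\, d\Sigma_u^k \\
&= \prod_{u=1}^{U} \prod_{k=1}^{K_u} \iint \Bigg(\prod_{j \in M_{u,\cdot}^{k,\cdot}} p(\ell_j \mid z_j, \phi_u^k, \Sigma_u^k)\Bigg)\, p(\phi_u^k, \Sigma_u^k)\, d\phi_u^k\, d\Sigma_u^k \\
&= \prod_{u=1}^{U} \prod_{k=1}^{K_u} \iint \mathcal{N}\!\left(\phi_u^k \,\middle|\, \mu_u, \frac{\Sigma_u^k}{p_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \Lambda, \nu) \prod_{j \in M_{u,\cdot}^{k,\cdot}} \mathcal{N}(\ell_j \mid \phi_u^k, \Sigma_u^k)\, d\phi_u^k\, d\Sigma_u^k. \tag{20}
\end{align*}

We apply Equation 266 from [2], which describes the conjugacy properties of Gaussian distributions, to reformulate (20) into its final form:

\begin{align*}
p(\ell \mid z, y)
&= \prod_{u=1}^{U} \prod_{k=1}^{K_u} \iint \mathcal{N}\!\left(\phi_u^k \,\middle|\, \mu_u, \frac{\Sigma_u^k}{p_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \Lambda, \nu) \prod_{j \in M_{u,\cdot}^{k,\cdot}} \mathcal{N}(\ell_j \mid \phi_u^k, \Sigma_u^k)\, d\phi_u^k\, d\Sigma_u^k \\
&= \prod_{u=1}^{U} \prod_{k=1}^{K_u} \frac{1}{\pi^{m_{u,\cdot}^{k,\cdot}}} \cdot \frac{\Gamma_2\!\left(\hat{v}_u^k / 2\right)}{\Gamma_2(\nu / 2)} \cdot \frac{|\Lambda|^{\nu / 2}}{|\hat{\Lambda}_u^k|^{\hat{v}_u^k / 2}} \cdot \frac{p_u^k}{\hat{p}_u^k}.
\end{align*}

The definitions of $\hat{v}_u^k$, $\hat{\Lambda}_u^k$, and $\hat{p}_u^k$ follow those given with (5), computed from counts that do not exclude the current index (see the appendix).

Finally, we combine Lemmas 3, 4, and 5 to provide the log-likelihood of the samples $z$ and $y$ conditioned on the observations $\ell$.

Lemma 6. The log-likelihood of the samples $z$ and $y$ conditioned on the observations $\ell$ is

\begin{align*}
\log p(z, y \mid \ell)
&= \sum_{w=1}^{W} \sum_{f=1}^{F} \log \Gamma\!\left(\beta_{w,f} + m_{\cdot,w}^{\cdot,f}\right) \\
&\quad + \sum_{u=1}^{U} \sum_{f=1}^{F} \Bigg(\sum_{k=1}^{K_u} \log \Gamma\!\left(\alpha + m_{u,\cdot}^{k,f}\right) - \log \Gamma\!\left(\alpha K_u + m_{u,\cdot}^{\cdot,f}\right)\Bigg) \\
&\quad + \sum_{u=1}^{U} \sum_{k=1}^{K_u} \Bigg(\log \Gamma_2\!\left(\frac{\hat{v}_u^k}{2}\right) - \frac{\hat{v}_u^k}{2} \log |\hat{\Lambda}_u^k| - \log \hat{p}_u^k\Bigg) + C,
\end{align*}

where $C$ denotes the constant terms.

Proof. The result follows by multiplying the probabilities stated in Lemmas 3, 4, and 5, and applying the logarithm function. Factors that do not depend on $z$ or $y$, including $p(\ell)$ and the terms whose arguments are fixed totals of the counts, are absorbed into $C$.

1.3 Parameter estimation

In Subsection 1.1, we described a collapsed Gibbs sampler for sampling the posteriors of the categorical random variables. Below, Lemmas 7, 8, and 9 show how these samples, denoted as $y$ and $z$, can be used to approximate the posterior expectations of $\gamma$, $\theta$, $\phi$, and $\Sigma$.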
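
Lemma 7. The expectation of $\gamma$ given the observed geographical coordinates and the posterior samples is

\begin{align*}
\hat{\gamma}_{w,f} = E[\gamma_{w,f} \mid y, z, \ell] = \frac{\beta_{w,f} + m_{\cdot,w}^{\cdot,f}}{\sum_{f'=1}^{F} \left(\beta_{w,f'} + m_{\cdot,w}^{\cdot,f'}\right)},
\end{align*}

where the counts $m_{\cdot,w}^{\cdot,f}$ are defined in the appendix.

Proof.

\begin{align*}
p(\gamma_w \mid y, z, \ell)
&= p(\gamma_w \mid y_{M_{\cdot,w}^{\cdot,\cdot}}) \\
&= \frac{p(y_{M_{\cdot,w}^{\cdot,\cdot}} \mid \gamma_w)\, p(\gamma_w)}{p(y_{M_{\cdot,w}^{\cdot,\cdot}})} \\
&\propto \Bigg(\prod_{j \in M_{\cdot,w}^{\cdot,\cdot}} p(y_j \mid \gamma_w)\Bigg)\, p(\gamma_w) \\
&= \Bigg(\prod_{j \in M_{\cdot,w}^{\cdot,\cdot}} \mathrm{Categorical}(y_j \mid \gamma_w)\Bigg)\, \mathrm{Dirichlet}_F(\gamma_w \mid \beta_w) \\
\Rightarrow\quad p(\gamma_w \mid y, z, \ell)
&= \mathrm{Dirichlet}_F\!\left(\gamma_w \,\middle|\, \beta_{w,1} + m_{\cdot,w}^{\cdot,1}, \ldots, \beta_{w,F} + m_{\cdot,w}^{\cdot,F}\right) \\
\Rightarrow\quad \hat{\gamma}_{w,f} = E[\gamma_{w,f} \mid y, z, \ell]
&= \frac{\beta_{w,f} + m_{\cdot,w}^{\cdot,f}}{\sum_{f'=1}^{F} \left(\beta_{w,f'} + m_{\cdot,w}^{\cdot,f'}\right)}.
\end{align*}

Lemma 8. The expectation of $\theta$ given the observed geographical coordinates and the posterior samples is

\begin{align*}
\hat{\theta}_{u,k}^f = E\!\left[\theta_{u,k}^f \mid y, z, \ell\right] = \frac{\alpha + m_{u,\cdot}^{k,f}}{\alpha K_u + m_{u,\cdot}^{\cdot,f}},
\end{align*}

where the counts $m_{u,\cdot}^{k,f}$ and $m_{u,\cdot}^{\cdot,f}$ are defined in the appendix.

Proof.

\begin{align*}
p(\theta_u^f \mid y, z, \ell)
&= p(\theta_u^f \mid y, z) \\
&= p(\theta_u^f \mid y, z_{M_{u,\cdot}^{\cdot,f}}) \\
&= \frac{p(z_{M_{u,\cdot}^{\cdot,f}} \mid \theta_u^f, y)\, p(\theta_u^f \mid y)}{p(z_{M_{u,\cdot}^{\cdot,f}} \mid y)} \\
&\propto \Bigg(\prod_{j \in M_{u,\cdot}^{\cdot,f}} p(z_j \mid \theta_u^f, y_j)\Bigg)\, p(\theta_u^f) \\
&= \Bigg(\prod_{j \in M_{u,\cdot}^{\cdot,f}} \mathrm{Categorical}(z_j \mid \theta_u^f)\Bigg)\, \mathrm{Dirichlet}_{K_u}(\theta_u^f \mid \alpha) \\
\Rightarrow\quad p(\theta_u^f \mid y, z, \ell)
&= \mathrm{Dirichlet}_{K_u}\!\left(\theta_u^f \,\middle|\, \alpha + m_{u,\cdot}^{1,f}, \ldots, \alpha + m_{u,\cdot}^{K_u,f}\right) \\
\Rightarrow\quad \hat{\theta}_{u,k}^f = E\!\left[\theta_{u,k}^f \mid y, z, \ell\right]
&= \frac{\alpha + m_{u,\cdot}^{k,f}}{\alpha K_u + m_{u,\cdot}^{\cdot,f}}.
\end{align*}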
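
Both expectations above are Dirichlet posterior means, so they reduce to smoothed, normalized counts. A minimal sketch, assuming the counts have already been tallied from one posterior sample; the function names `gamma_hat` and `theta_hat` are hypothetical and used only for illustration.

```python
def gamma_hat(beta_w, counts_w):
    """Posterior mean of the component distribution for one weekhour.

    beta_w   : list of F Dirichlet concentration parameters for the weekhour
    counts_w : list of F counts, the number of observations in this weekhour
               assigned to each component
    """
    total = sum(b + c for b, c in zip(beta_w, counts_w))
    return [(b + c) / total for b, c in zip(beta_w, counts_w)]


def theta_hat(alpha, counts_uf):
    """Posterior mean of one user's place distribution for one component.

    alpha     : symmetric Dirichlet concentration parameter
    counts_uf : list of K_u counts, the number of the user's observations
                with this component assigned to each place
    """
    total = alpha * len(counts_uf) + sum(counts_uf)
    return [(alpha + c) / total for c in counts_uf]
```

For example, `gamma_hat([1, 1], [3, 1])` weights the first component twice as heavily as the second, since the smoothed counts are 4 and 2.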
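
Lemma 9. The expectations of $\phi$ and $\Sigma$ given the observed geographical coordinates and the posterior samples are

\begin{align*}
\hat{\phi}_u^k &= E\!\left[\phi_u^k \mid y, z, \ell\right] = \hat{\mu}_u^k, \\
\hat{\Sigma}_u^k &= E\!\left[\Sigma_u^k \mid y, z, \ell\right] = \frac{\hat{\Lambda}_u^k}{\hat{v}_u^k - 3}.
\end{align*}

Parameters $\hat{\mu}_u^k$, $\hat{\Lambda}_u^k$, and $\hat{v}_u^k$ are defined as in the proof of Lemma 1, using counts that do not exclude the current index.

Proof.

\begin{align*}
p(\phi_u^k, \Sigma_u^k \mid y, z, \ell)
&= p(\phi_u^k, \Sigma_u^k \mid z, \ell_{M_{u,\cdot}^{k,\cdot}}) \\
&= \frac{p(\ell_{M_{u,\cdot}^{k,\cdot}} \mid \phi_u^k, \Sigma_u^k, z)\, p(\phi_u^k, \Sigma_u^k \mid z)}{p(\ell_{M_{u,\cdot}^{k,\cdot}} \mid z)} \\
&\propto \Bigg(\prod_{j \in M_{u,\cdot}^{k,\cdot}} \mathcal{N}(\ell_j \mid \phi_u^k, \Sigma_u^k)\Bigg)\, \mathcal{N}\!\left(\phi_u^k \,\middle|\, \mu_u, \frac{\Sigma_u^k}{p_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \Lambda, \nu) \\
\Rightarrow\quad p(\phi_u^k, \Sigma_u^k \mid y, z, \ell)
&= \mathcal{N}\!\left(\phi_u^k \,\middle|\, \hat{\mu}_u^k, \frac{\Sigma_u^k}{\hat{p}_u^k}\right) \mathrm{IW}(\Sigma_u^k \mid \hat{\Lambda}_u^k, \hat{v}_u^k) \\
\Rightarrow\quad p(\phi_u^k \mid y, z, \ell)
&= t_{\hat{v}_u^k - 1}\!\left(\phi_u^k \,\middle|\, \hat{\mu}_u^k, \frac{\hat{\Lambda}_u^k}{\hat{p}_u^k (\hat{v}_u^k - 1)}\right) \\
\Rightarrow\quad \hat{\phi}_u^k = E\!\left[\phi_u^k \mid y, z, \ell\right]
&= \hat{\mu}_u^k, \\
p(\Sigma_u^k \mid y, z, \ell)
&= \mathrm{IW}(\Sigma_u^k \mid \hat{\Lambda}_u^k, \hat{v}_u^k) \\
\Rightarrow\quad \hat{\Sigma}_u^k = E\!\left[\Sigma_u^k \mid y, z, \ell\right]
&= \frac{\hat{\Lambda}_u^k}{\hat{v}_u^k - 3}.
\end{align*}

2 Appendix

2.1 Miscellaneous notation

Throughout the paper, we use various notations to represent sets of indices and their cardinalities. Vectors $y$ and $z$ denote the component and place assignments in CPM, respectively. Each vector entry is identified by a tuple index $(u, w, n)$, where $u \in \{1, \ldots, U\}$ is a user, $w \in \{1, \ldots, W\}$ is a weekhour, and $n \in \{1, \ldots, N_{u,w}\}$ is an iteration index. For the subsequent notations, we assume that the random variables $y$ and $z$ are already sampled. We refer to a subset of indices using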
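
\begin{align*}
M_{u_0,w_0}^{k_0,f_0} = \{(u, w, n) \mid z_{u,w,n} = k_0,\ y_{u,w,n} = f_0,\ u = u_0,\ w = w_0\},
\end{align*}

where $u_0$ denotes the user, $w_0$ denotes the weekhour, $k_0$ denotes the place, and $f_0$ denotes the component. If we want the subset of indices to be unrestricted with respect to a category, we use the placeholder $\cdot$. For example,

\begin{align*}
M_{u_0,w_0}^{\cdot,f_0} = \{(u, w, n) \mid y_{u,w,n} = f_0,\ u = u_0,\ w = w_0\}
\end{align*}

has no constraints with respect to places.

Given a subset of indices denoted by $M$, the lowercase $m = |M|$ denotes its cardinality. For example, given a set of indices

\begin{align*}
M_{u_0,\cdot}^{\cdot,f_0} = \{(u, w, n) \mid y_{u,w,n} = f_0,\ u = u_0\},
\end{align*}

its cardinality is $m_{u_0,\cdot}^{\cdot,f_0} = |M_{u_0,\cdot}^{\cdot,f_0}|$.

For the collapsed Gibbs sampler, the sets of indices and cardinalities used in the derivations exclude the index that will be sampled. We use a tilde to modify sets or cardinalities for this exclusion. Let $(u, w, n)$ denote the index that will be sampled. Then, given an index set $M$, let $\tilde{M} = M - \{(u, w, n)\}$ represent the excluding set and let $\tilde{m} = |\tilde{M}|$ represent the corresponding cardinality. For example,

\begin{align*}
\tilde{M}_{u_0,\cdot}^{\cdot,f_0} = M_{u_0,\cdot}^{\cdot,f_0} - \{(u, w, n)\}
\quad \text{and} \quad
\tilde{m}_{u_0,\cdot}^{\cdot,f_0} = |\tilde{M}_{u_0,\cdot}^{\cdot,f_0}|.
\end{align*}

In the proof of Lemma 1, parameters $\tilde{v}_u^k$, $\tilde{\mu}_u^k$, $\tilde{\Lambda}_u^k$, and $\tilde{p}_u^k$ are defined using cardinalities that exclude the current index $(u, w, n)$. Similarly, in the proof of Lemma 9, parameters $\hat{\mu}_u^k$, $\hat{\Lambda}_u^k$, and $\hat{v}_u^k$ are defined like their tilde versions, but the counts used in their definitions do not exclude the current index.

We define additional notation to represent the sufficient statistics used by the learning algorithm. Let $i = (u, w, n)$ denote an observation index. Then,

\begin{align*}
S_u^k = \sum_{i \in M_{u,\cdot}^{k,\cdot}} \ell_i
\end{align*}

denotes the sum of the observed coordinates that have been assigned to user $u$ and place $k$. Similarly,

\begin{align*}
P_u^k = \sum_{i \in M_{u,\cdot}^{k,\cdot}} \ell_i \ell_i^T
\end{align*}

denotes the sum of the outer products of the observed coordinates that have been assigned to user $u$ and place $k$.

2.2 Probability distributions

Let $\Gamma_2$ denote a bivariate gamma function, defined as

\begin{align*}
\Gamma_2(a) = \pi^{1/2} \prod_{j=1}^{2} \Gamma\!\left(a + \frac{1 - j}{2}\right).
\end{align*}

Let $\nu > 1$ and let $\Lambda \in \mathbb{R}^{2 \times 2}$ be a positive definite scale matrix. The inverse-Wishart distribution, which is the conjugate prior to the multivariate normal distribution, is defined as

\begin{align*}
\mathrm{IW}(\Sigma \mid \Lambda, \nu) = \frac{|\Lambda|^{\nu/2}}{2^{\nu}\, \Gamma_2(\nu/2)}\, |\Sigma|^{-\frac{\nu + 3}{2}} \exp\!\left(-\frac{1}{2} \operatorname{tr}\!\left(\Lambda \Sigma^{-1}\right)\right).
\end{align*}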
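
The bivariate gamma function and the inverse-Wishart density defined above are easiest to evaluate in log space. The following is a minimal sketch for the 2x2 case only, with matrices as nested tuples; the names `log_gamma2` and `log_inverse_wishart` are hypothetical helpers introduced for this illustration.

```python
import math


def log_gamma2(a):
    """Log of the bivariate gamma function: sqrt(pi) * Gamma(a) * Gamma(a - 1/2)."""
    return 0.5 * math.log(math.pi) + math.lgamma(a) + math.lgamma(a - 0.5)


def log_inverse_wishart(sigma, lam, nu):
    """Log-density of the 2x2 inverse-Wishart distribution IW(sigma | lam, nu).

    sigma : 2x2 positive definite matrix at which the density is evaluated
    lam   : 2x2 positive definite scale matrix
    nu    : degrees of freedom (greater than 1)
    """
    (a, b), (c, d) = sigma
    (e, f), (g, h) = lam
    det_sigma = a * d - b * c
    det_lam = e * h - f * g
    # tr(lam @ sigma^{-1}), using sigma^{-1} = [[d, -b], [-c, a]] / det_sigma.
    trace = (e * d - f * c - g * b + h * a) / det_sigma
    return (0.5 * nu * math.log(det_lam)
            - nu * math.log(2.0)
            - log_gamma2(0.5 * nu)
            - 0.5 * (nu + 3.0) * math.log(det_sigma)
            - 0.5 * trace)
```

The same `log_gamma2` helper covers the bivariate gamma terms that appear in the log-likelihood used to monitor convergence.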
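
Let $\Sigma \in \mathbb{R}^{2 \times 2}$ be a positive definite covariance matrix and let $\mu \in \mathbb{R}^2$ denote a mean vector. The multivariate normal distribution is defined as

\begin{align*}
\mathcal{N}(\ell \mid \mu, \Sigma) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (\ell - \mu)^T \Sigma^{-1} (\ell - \mu)\right).
\end{align*}

Let $v > 1$ and let $\Sigma \in \mathbb{R}^{2 \times 2}$, then the 2-dimensional t-distribution is defined as

\begin{align*}
t_v(x \mid \mu, \Sigma) = \frac{\Gamma(v/2 + 1)}{\Gamma(v/2)\, v \pi\, |\Sigma|^{1/2}} \left(1 + \frac{1}{v} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{-\left(\frac{v}{2} + 1\right)}.
\end{align*}

Let $K > 1$ be the number of categories and let $\alpha = (\alpha_1, \ldots, \alpha_K)$ be the concentration parameters, where $\alpha_k > 0$ for all $k \in \{1, \ldots, K\}$. Then, the $K$-dimensional Dirichlet distribution, which is the conjugate prior to the categorical distribution, is defined as

\begin{align*}
\mathrm{Dirichlet}_K(x \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} x_k^{\alpha_k - 1},
\quad \text{where} \quad
B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}.
\end{align*}

We abuse the Dirichlet notation slightly and use it to define the $K$-dimensional symmetric Dirichlet distribution as well. Let $\alpha > 0$ be a scalar concentration parameter. Then, the symmetric Dirichlet distribution is defined as

\begin{align*}
\mathrm{Dirichlet}_K(x \mid \alpha) = \mathrm{Dirichlet}_K(x \mid \alpha_1, \ldots, \alpha_K),
\end{align*}

where $\alpha_k = \alpha$ for all $k \in \{1, \ldots, K\}$.

References

[1] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1):5228-5235, April 2004.
[2] Kevin Murphy. Conjugate Bayesian analysis of the Gaussian distribution. October 2007.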