Collaborative Place Models Supplement 1


Berk Kapicioglu
Foursquare Labs
berk.kapicioglu@gmail.com

Robert E. Schapire
Princeton University
schapire@cs.princeton.edu

David S. Rosenberg
YP Mobile Labs
david.davidr@gmail.com

Tony Jebara
Columbia University
jebara@cs.columbia.edu

1 Inference

CPM comprises a spatial component, which represents the inferred place clusters, and a temporal component, which represents the inferred place distributions for each weekhour. The model is depicted in Figure 1.

Figure 1: Graphical model representation of CPM. The geographic coordinates, denoted by ℓ, are the only observed variables. The model assumes that all users share the same coefficients over the component place distributions.

We present the derivation of our inference algorithm in multiple steps. First, we use a strategy popularized by Griffiths and Steyvers [1], and derive a collapsed Gibbs sampler to sample from the posterior distribution of the categorical random variables conditioned on the observed geographic coordinates. Second, we derive the conditional likelihood of the posterior samples, which we use to determine the sampler's convergence. Finally, we derive formulas for approximating the posterior expectations of the non-categorical random variables conditioned on the posterior samples.
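As an aside (not part of the original supplement): each sweep of the collapsed Gibbs sampler derived below repeatedly draws a categorical value from an unnormalized conditional, such as those in Lemmas 1 and 2. A minimal log-space draw utility, with all names hypothetical:

```python
import numpy as np

def draw_categorical(log_weights, rng):
    """Draw one index from unnormalized log weights, as a collapsed
    Gibbs sweep does for each z_i and y_i.  Subtracting the maximum
    before exponentiating avoids overflow without changing the
    resulting distribution."""
    w = np.exp(log_weights - np.max(log_weights))
    w /= w.sum()
    return rng.choice(len(w), p=w)
```

For example, `draw_categorical(np.array([0.0, -1e9]), np.random.default_rng(0))` always returns index 0, since the second weight is negligible.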

1.1 Collapsed Gibbs Sampler

In Lemmas 1 and 2, we derive the collapsed Gibbs sampler for variables z and y, respectively. Given a vector x and an index i, let x_{-i} indicate all the entries of the vector excluding the one at index i. For Lemmas 1 and 2, assume i = (u, w, n) denotes the index of the variable that will be sampled.

Lemma 1. The unnormalized probability of z_i conditioned on the observed location data and the remaining categorical variables is

  p(z_i = k | y_i = f, z_{-i}, y_{-i}, ℓ) ∝ t_{ṽ_k^u − 1}( ℓ_i | µ̃_k^u, Λ̃_k^u (κ̃_k^u + 1) / (κ̃_k^u (ṽ_k^u − 1)) ) · (β + m̃_{k,f}).

The parameters ṽ_k^u, µ̃_k^u, κ̃_k^u, and Λ̃_k^u are defined in the proof. t denotes the bivariate t-distribution and m̃_{k,f} denotes counts, both of which are defined in the appendix.

Proof. We decompose the probability into two components using Bayes' theorem:

  p(z_i = k | y_i = f, z_{-i}, y_{-i}, ℓ)
    = p(ℓ_i | z_i = k, y_i = f, z_{-i}, y_{-i}, ℓ_{-i}) p(z_i = k | y_i = f, z_{-i}, y_{-i}) / p(ℓ_i | y_i = f, z_{-i}, y_{-i}, ℓ_{-i})
    ∝ p(ℓ_i | z_i = k, z_{-i}, ℓ_{-i})   (1)
      · p(z_i = k | y_i = f, z_{-i}, y_{-i}).   (2)

In the first part of the derivation, we operate on (1). We augment it with µ_k^u and Σ_k^u:

  p(ℓ_i | z_i = k, z_{-i}, ℓ_{-i})
    = ∫∫ p(ℓ_i | z_i = k, µ_k^u, Σ_k^u) p(µ_k^u, Σ_k^u | z_{-i}, z_i = k, ℓ_{-i}) dµ_k^u dΣ_k^u
    = ∫∫ N(ℓ_i | µ_k^u, Σ_k^u)   (3)
      · p(µ_k^u, Σ_k^u | z_{-i}, z_i = k, ℓ_{-i}) dµ_k^u dΣ_k^u.   (4)

We convert (4) into a more tractable form. Let M̃_{k,·} be a set of indices, which we define in the appendix, and let ℓ_{M̃_{k,·}} denote the subset of observations whose indices are in M̃_{k,·}. In the derivation

below, we treat all variables other than µ_k^u and Σ_k^u as constants:

  p(µ_k^u, Σ_k^u | z_{-i}, z_i = k, ℓ_{-i})
    = p(µ_k^u, Σ_k^u | z_{-i}, z_i = k, ℓ_{M̃_{k,·}}, ℓ_{-M̃_{k,·}})
    ∝ p(ℓ_{M̃_{k,·}} | µ_k^u, Σ_k^u, z_{-i}, z_i = k) p(µ_k^u, Σ_k^u)
    = [ ∏_{j ∈ M̃_{k,·}} N(ℓ_j | µ_k^u, Σ_k^u) ] N(µ_k^u | µ_0^u, Σ_k^u / κ_0) IW(Σ_k^u | Λ_0, ν_0).

Since the normal-inverse-Wishart distribution is the conjugate prior of the multivariate normal distribution, the posterior is also a normal-inverse-Wishart distribution,

  p(µ_k^u, Σ_k^u | z_{-i}, z_i = k, ℓ_{-i}) = N(µ_k^u | µ̃_k^u, Σ_k^u / κ̃_k^u) IW(Σ_k^u | Λ̃_k^u, ṽ_k^u),   (5)

whose parameters are defined as

  κ̃_k^u = κ_0 + m̃_{k,·},
  ṽ_k^u = ν_0 + m̃_{k,·},
  ℓ̄_k^u = (1 / m̃_{k,·}) Σ_{j ∈ M̃_{k,·}} ℓ_j,
  µ̃_k^u = (κ_0 µ_0^u + m̃_{k,·} ℓ̄_k^u) / κ̃_k^u,
  S̃_k^u = Σ_{j ∈ M̃_{k,·}} (ℓ_j − ℓ̄_k^u)(ℓ_j − ℓ̄_k^u)^T,
  Λ̃_k^u = Λ_0 + S̃_k^u + (κ_0 m̃_{k,·} / κ̃_k^u) (ℓ̄_k^u − µ_0^u)(ℓ̄_k^u − µ_0^u)^T.
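As an illustrative aside, the updates in (5) are the standard normal-inverse-Wishart conjugate updates. A minimal sketch (function and variable names are ours, not the paper's), where the rows of X are the observations ℓ_j with j ∈ M̃_{k,·}:

```python
import numpy as np

def niw_posterior(mu0, kappa0, nu0, Lambda0, X):
    """Normal-inverse-Wishart posterior parameters after observing the
    rows of X (shape (m, d)), following the standard conjugate update."""
    m = X.shape[0]
    xbar = X.mean(axis=0)
    kappa_n = kappa0 + m
    nu_n = nu0 + m
    mu_n = (kappa0 * mu0 + m * xbar) / kappa_n
    S = (X - xbar).T @ (X - xbar)          # scatter about the sample mean
    diff = (xbar - mu0).reshape(-1, 1)
    Lambda_n = Lambda0 + S + (kappa0 * m / kappa_n) * (diff @ diff.T)
    return mu_n, kappa_n, nu_n, Lambda_n
```

With a unit prior (µ_0 = 0, κ_0 = 1, ν_0 = 4, Λ_0 = I) and the two points (1, 0) and (0, 1), this yields κ̃ = 3, ṽ = 6, and µ̃ = (1/3, 1/3).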

The posterior parameters depicted above are derived based on the conjugacy properties of Gaussian distributions, as described in [2]. We rewrite (1) by combining (3), (4), and (5) to obtain

  p(ℓ_i | z_i = k, z_{-i}, ℓ_{-i})
    = ∫∫ N(ℓ_i | µ_k^u, Σ_k^u) p(µ_k^u, Σ_k^u | z_{-i}, z_i = k, ℓ_{-i}) dµ_k^u dΣ_k^u
    = ∫∫ N(ℓ_i | µ_k^u, Σ_k^u) N(µ_k^u | µ̃_k^u, Σ_k^u / κ̃_k^u) IW(Σ_k^u | Λ̃_k^u, ṽ_k^u) dµ_k^u dΣ_k^u
    = t_{ṽ_k^u − 1}( ℓ_i | µ̃_k^u, Λ̃_k^u (κ̃_k^u + 1) / (κ̃_k^u (ṽ_k^u − 1)) ),   (6)

where t is the bivariate t-distribution. (6) is derived by applying Equation 58 from [2].

Now, we move onto the second part of the derivation. We operate on (2) and augment it with φ_f^u:

  p(z_i = k | y_i = f, z_{-i}, y_{-i})
    = ∫ p(z_i = k | y_i = f, φ_f^u) p(φ_f^u | y_i = f, y_{-i}, z_{-i}) dφ_f^u
    = ∫ p(z_i = k | y_i = f, φ_f^u)   (7)
      · p(φ_f^u | y_i = f, y_{-i}, z_{-i}) dφ_f^u.   (8)

We convert (8) into a more tractable form. As before, let M̃_{·,f} be a set of indices, which we define in the appendix, and let z_{M̃_{·,f}} denote the subset of place assignments whose indices are in M̃_{·,f}. In the derivation below, we treat all variables other than φ_f^u as a constant:

  p(φ_f^u | y_i = f, y_{-i}, z_{-i})
    ∝ p(z_{M̃_{·,f}} | y_i = f, y_{-i}, φ_f^u) p(φ_f^u)
    = [ ∏_{j ∈ M̃_{·,f}} Categorical(z_j | φ_f^u) ] Dirichlet_{K_u}(φ_f^u | β)
  ⇒ p(φ_f^u | y_i = f, y_{-i}, z_{-i}) = Dirichlet_{K_u}(φ_f^u | β + m̃_{1,f}, …, β + m̃_{K_u,f}),   (9)

where the last step follows because the Dirichlet distribution is the conjugate prior of the categorical distribution. We rewrite (2) by combining (7), (8), and (9):

  p(z_i = k | y_i = f, z_{-i}, y_{-i})
    = ∫ φ_{f,k}^u Dirichlet_{K_u}(φ_f^u | β + m̃_{1,f}, …, β + m̃_{K_u,f}) dφ_f^u
    = (β + m̃_{k,f}) / (K_u β + m̃_{·,f}).   (10)

The last step follows because it is the expected value of the Dirichlet distribution.
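Aside (not from the paper): the t-density appearing in (6) can be evaluated in log space. The helper below is a generic d-dimensional Student-t log-density under our own naming; the bivariate case used here is d = 2.

```python
import numpy as np
from math import lgamma, log, pi

def mvt_logpdf(x, mu, Sigma, df):
    """Log density of a d-dimensional Student-t distribution with df
    degrees of freedom, location mu, and scale matrix Sigma."""
    x, mu, Sigma = np.asarray(x, float), np.asarray(mu, float), np.asarray(Sigma, float)
    d = mu.size
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, x - mu)
    maha = z @ z                              # (x - mu)^T Sigma^{-1} (x - mu)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return (lgamma((df + d) / 2.0) - lgamma(df / 2.0)
            - 0.5 * (d * log(df * pi) + logdet)
            - (df + d) / 2.0 * log(1.0 + maha / df))
```

As a sanity check, the one-dimensional case with df = 1 reduces to the standard Cauchy density, whose value at the origin is 1/π.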

Finally, we combine (1), (2), (6), and (10) to obtain the unnormalized probability distribution:

  p(z_i = k | y_i = f, z_{-i}, y_{-i}, ℓ)
    ∝ p(ℓ_i | z_i = k, z_{-i}, ℓ_{-i}) p(z_i = k | y_i = f, z_{-i}, y_{-i})
    = t_{ṽ_k^u − 1}( ℓ_i | µ̃_k^u, Λ̃_k^u (κ̃_k^u + 1) / (κ̃_k^u (ṽ_k^u − 1)) ) · (β + m̃_{k,f}) / (K_u β + m̃_{·,f}).

Lemma 2. The unnormalized probability of y_i conditioned on the observed location data and the remaining categorical variables is

  p(y_i = f | z_i = k, y_{-i}, z_{-i}, ℓ) ∝ (β + m̃_{k,f}) / (K_u β + m̃_{·,f}) · (α_{w,f} + m̃_{w,f}),

where the counts m̃_{k,f}, m̃_{·,f}, and m̃_{w,f} are defined in the appendix.

Proof. We decompose the probability into two components using Bayes' theorem:

  p(y_i = f | z_i = k, y_{-i}, z_{-i}, ℓ)
    ∝ p(z_i = k | y_i = f, z_{-i}, y_{-i})   (11)
      · p(y_i = f | y_{-i}).   (12)

Since (11) is equal to (2), we rewrite it using (10):

  p(z_i = k | y_i = f, z_{-i}, y_{-i}) = (β + m̃_{k,f}) / (K_u β + m̃_{·,f}).   (13)

We operate on (12) and augment it with θ_w:

  p(y_i = f | y_{-i})
    = ∫ p(y_i = f | θ_w) p(θ_w | y_{-i}) dθ_w
    = ∫ p(y_i = f | θ_w)   (14)
      · p(θ_w | y_{-i}) dθ_w.   (15)

We convert (15) into a more tractable form. As before, let M̃_w be a set of indices, which we define in the appendix, and let y_{M̃_w} denote the subset of component assignments whose indices are in M̃_w. In the derivation below, we treat all variables other than θ_w as a constant:

  p(θ_w | y_{-i})
    ∝ p(y_{M̃_w} | θ_w) p(θ_w)
    = [ ∏_{j ∈ M̃_w} Categorical(y_j | θ_w) ] Dirichlet_F(θ_w | α_w)
  ⇒ p(θ_w | y_{-i}) = Dirichlet_F(θ_w | α_{w,1} + m̃_{w,1}, …, α_{w,F} + m̃_{w,F}),   (16)

where the last step follows because the Dirichlet distribution is the conjugate prior of the categorical distribution. We rewrite (12) by combining (14), (15), and (16):

  p(y_i = f | y_{-i})
    = ∫ θ_{w,f} Dirichlet_F(θ_w | α_{w,1} + m̃_{w,1}, …, α_{w,F} + m̃_{w,F}) dθ_w
    = (α_{w,f} + m̃_{w,f}) / Σ_{f'=1}^F (α_{w,f'} + m̃_{w,f'}).   (17)

Finally, we combine (11), (12), (13), and (17) to obtain the unnormalized probability distribution:

  p(y_i = f | z_i = k, y_{-i}, z_{-i}, ℓ)
    ∝ p(z_i = k | y_i = f, z_{-i}, y_{-i}) p(y_i = f | y_{-i})
    = (β + m̃_{k,f}) / (K_u β + m̃_{·,f}) · (α_{w,f} + m̃_{w,f}) / Σ_{f'=1}^F (α_{w,f'} + m̃_{w,f'}).

1.2 Likelihoods

In this subsection, we derive the conditional likelihoods of the posterior samples conditioned on the observed geographical coordinates. We use these conditional likelihoods to determine the sampler's convergence. We present the derivations in multiple lemmas and combine them in a theorem at the end of the subsection. Let Γ denote the gamma function.

Lemma 3. The marginal probability of the categorical random variable y is

  p(y) = ∏_{w=1}^W [ ∏_{f=1}^F Γ(α_{w,f} + m_{w,f}) / Γ(α_{w,f}) ] · Γ(Σ_{f=1}^F α_{w,f}) / Γ(Σ_{f=1}^F α_{w,f} + m_{w,·}),

where the counts m_{w,f} are defined in the appendix.

Proof. Let θ = (θ_1, …, θ_W) denote the collection of random variables for all weekhours. Below, we will augment the marginal probability with θ, and then factorize it based on the conditional

independence assumptions made by our model:

  p(y) = ∫ p(y | θ) p(θ) dθ
       = ∫ [ ∏_j Categorical(y_j | θ_{w_j}) ] [ ∏_{w=1}^W Dirichlet_F(θ_w | α_w) ] dθ
       = ∏_{w=1}^W ∫ Dirichlet_F(θ_w | α_w) ∏_{j ∈ M_w} Categorical(y_j | θ_w) dθ_w.   (18)

Now, we substitute the probabilities in (18) with the Dirichlet and categorical densities, which are defined in more detail in the appendix:

  p(y) = ∏_{w=1}^W ∫ (1 / B(α_w)) ∏_{f=1}^F θ_{w,f}^{α_{w,f} + m_{w,f} − 1} dθ_w
       = ∏_{w=1}^W B(α_{w,1} + m_{w,1}, …, α_{w,F} + m_{w,F}) / B(α_w)
       = ∏_{w=1}^W [ ∏_{f=1}^F Γ(α_{w,f} + m_{w,f}) / Γ(α_{w,f}) ] · Γ(Σ_{f=1}^F α_{w,f}) / Γ(Σ_{f=1}^F α_{w,f} + m_{w,·}).

Lemma 4. The conditional probability of the categorical random variable z conditioned on y is

  p(z | y) = ∏_{u=1}^U ∏_{f=1}^F [ Γ(K_u β) / Γ(β)^{K_u} ] · [ ∏_{k=1}^{K_u} Γ(β + m_{k,f}) ] / Γ(K_u β + m_{·,f}),

where the counts m_{k,f} and m_{·,f} are defined in the appendix.

Proof. Let φ = { φ_f^u : u ∈ {1,…,U}, f ∈ {1,…,F} } denote the collection of random variables for all users and components. Below, we will augment the conditional probability with φ, and then

factorize it based on the conditional independence assumptions made by our model:

  p(z | y) = ∫ p(z | y, φ) p(φ) dφ
           = ∏_{u=1}^U ∏_{f=1}^F ∫ Dirichlet_{K_u}(φ_f^u | β) ∏_{j ∈ M_{·,f}} Categorical(z_j | φ_f^u) dφ_f^u.   (19)

Now, we substitute the probabilities in (19) with the Dirichlet and categorical densities, which are defined in more detail in the appendix:

  p(z | y) = ∏_{u=1}^U ∏_{f=1}^F ∫ (1 / B(β, …, β)) ∏_{k=1}^{K_u} (φ_{f,k}^u)^{β + m_{k,f} − 1} dφ_f^u
           = ∏_{u=1}^U ∏_{f=1}^F B(β + m_{1,f}, …, β + m_{K_u,f}) / B(β, …, β)
           = ∏_{u=1}^U ∏_{f=1}^F [ Γ(K_u β) / Γ(β)^{K_u} ] · [ ∏_{k=1}^{K_u} Γ(β + m_{k,f}) ] / Γ(K_u β + m_{·,f}).

For our final derivation, let Γ_2 denote the bivariate gamma function, and let |·| denote the determinant.

Lemma 5. The conditional probability of the observed locations ℓ conditioned on z and y is

  p(ℓ | z, y) = ∏_{u=1}^U ∏_{k=1}^{K_u} π^{−m_{k,·}} · (Γ_2(v̄_k^u / 2) / Γ_2(ν_0 / 2)) · (|Λ_0|^{ν_0 / 2} / |Λ̄_k^u|^{v̄_k^u / 2}) · (κ_0 / κ̄_k^u).

The parameters v̄_k^u, κ̄_k^u, and Λ̄_k^u are defined in the proof, and the counts m_{k,·} are defined in the appendix.

Proof. We will factorize the probability using the conditional independence assumptions made by the model, and then simplify the resulting probabilities by integrating out the means and covariances associated with the place clusters:

  p(ℓ | z, y) = p(ℓ | z)
    = ∏_{u=1}^U ∏_{k=1}^{K_u} ∫∫ [ ∏_{j ∈ M_{k,·}} N(ℓ_j | µ_k^u, Σ_k^u) ] N(µ_k^u | µ_0^u, Σ_k^u / κ_0) IW(Σ_k^u | Λ_0, ν_0) dµ_k^u dΣ_k^u.   (20)

We apply Equation 66 from [2], which describes the conjugacy properties of Gaussian distributions, to reformulate (20) into its final form:

  p(ℓ | z, y) = ∏_{u=1}^U ∏_{k=1}^{K_u} π^{−m_{k,·}} (Γ_2(v̄_k^u / 2) / Γ_2(ν_0 / 2)) (|Λ_0|^{ν_0 / 2} / |Λ̄_k^u|^{v̄_k^u / 2}) (κ_0 / κ̄_k^u).

The definitions for v̄_k^u, κ̄_k^u, and Λ̄_k^u are provided in (5), with counts that do not exclude any index.

Finally, we combine Lemmas 3, 4, and 5 to provide the log-likelihood of the samples z and y conditioned on the observations ℓ.

Lemma 6. The log-likelihood of the samples z and y conditioned on the observations ℓ is

  log p(z, y | ℓ) = Σ_{w=1}^W [ Σ_{f=1}^F log Γ(α_{w,f} + m_{w,f}) − log Γ(Σ_{f=1}^F α_{w,f} + m_{w,·}) ]
    + Σ_{u=1}^U Σ_{f=1}^F [ Σ_{k=1}^{K_u} log Γ(β + m_{k,f}) − log Γ(K_u β + m_{·,f}) ]
    + Σ_{u=1}^U Σ_{k=1}^{K_u} [ log Γ_2(v̄_k^u / 2) − m_{k,·} log π − (v̄_k^u / 2) log |Λ̄_k^u| − log κ̄_k^u ] + C,

where C denotes the constant terms.

Proof. The result follows by multiplying the probabilities stated in Lemmas 3, 4, and 5, and applying the logarithm function.

1.3 Parameter estimation

In Subsection 1.1, we described a collapsed Gibbs sampler for sampling the posteriors of the categorical random variables. Below, Lemmas 7, 8, and 9 show how these samples, denoted as y and z, can be used to approximate the posterior expectations of θ, φ, µ, and Σ.
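Aside (not from the paper): the Dirichlet-multinomial terms that Lemmas 3 and 4 contribute to the convergence diagnostic of Lemma 6 can be evaluated with a short helper. A sketch under our own naming:

```python
from math import lgamma

def dirmult_logmarginal(counts, alpha):
    """Log marginal probability of categorical draws with the category
    probabilities integrated out under a Dirichlet(alpha) prior:
    log [ Gamma(sum a) / Gamma(sum a + m) * prod_f Gamma(a_f + m_f) / Gamma(a_f) ]."""
    total = sum(counts)
    a_sum = sum(alpha)
    out = lgamma(a_sum) - lgamma(a_sum + total)
    for c, a in zip(counts, alpha):
        out += lgamma(a + c) - lgamma(a)
    return out
```

For instance, with two categories, a uniform Dirichlet(1, 1) prior, and a single observation, the marginal probability is 1/2, as expected by symmetry.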

Lemma 7. The expectation of θ_{w,f} given the observed geographical coordinates and the posterior samples is

  θ̂_{w,f} = E[θ_{w,f} | y, z, ℓ] = (α_{w,f} + m_{w,f}) / Σ_{f'=1}^F (α_{w,f'} + m_{w,f'}),

where the counts m_{w,f} are defined in the appendix.

Proof.

  p(θ_w | y, z, ℓ) = p(θ_w | y_{M_w})
    ∝ p(y_{M_w} | θ_w) p(θ_w)
    = [ ∏_{j ∈ M_w} Categorical(y_j | θ_w) ] Dirichlet_F(θ_w | α_w)
    ∝ Dirichlet_F(θ_w | α_{w,1} + m_{w,1}, …, α_{w,F} + m_{w,F})
  ⇒ θ̂_{w,f} = E[θ_{w,f} | y, z, ℓ] = (α_{w,f} + m_{w,f}) / Σ_{f'=1}^F (α_{w,f'} + m_{w,f'}).

Lemma 8. The expectation of φ_{f,k}^u given the observed geographical coordinates and the posterior samples is

  φ̂_{f,k}^u = E[φ_{f,k}^u | y, z, ℓ] = (β + m_{k,f}) / (K_u β + m_{·,f}),

where the counts m_{k,f} and m_{·,f} are defined in the appendix.

Proof.

  p(φ_f^u | y, z, ℓ) = p(φ_f^u | z_{M_{·,f}}, y)
    ∝ p(z_{M_{·,f}} | φ_f^u, y) p(φ_f^u)
    = [ ∏_{j ∈ M_{·,f}} Categorical(z_j | φ_f^u) ] Dirichlet_{K_u}(φ_f^u | β)
    ∝ Dirichlet_{K_u}(φ_f^u | β + m_{1,f}, …, β + m_{K_u,f})
  ⇒ φ̂_{f,k}^u = E[φ_{f,k}^u | y, z, ℓ] = (β + m_{k,f}) / (K_u β + m_{·,f}).
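Aside (not from the paper): Lemmas 7 and 8 are both instances of the Dirichlet posterior mean, which is a one-liner. Names below are ours:

```python
import numpy as np

def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of category probabilities under a Dirichlet(alpha)
    prior after observing the given counts:
    (alpha_f + m_f) / sum_f' (alpha_f' + m_f')."""
    a = np.asarray(alpha, float) + np.asarray(counts, float)
    return a / a.sum()
```

With counts (2, 1) and a uniform Dirichlet(1, 1) prior, the estimate is (0.6, 0.4); with no observations it falls back to the normalized prior.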

Lemma 9. The expectations of µ_k^u and Σ_k^u given the observed geographical coordinates and the posterior samples are

  µ̂_k^u = E[µ_k^u | y, z, ℓ] = µ̄_k^u   and   Σ̂_k^u = E[Σ_k^u | y, z, ℓ] = Λ̄_k^u / (v̄_k^u − 3).

Parameters µ̄_k^u, Λ̄_k^u, and v̄_k^u are defined like their counterparts in the proof of Lemma 1, with counts that do not exclude any index.

Proof.

  p(µ_k^u, Σ_k^u | y, z, ℓ) = p(µ_k^u, Σ_k^u | z, ℓ_{M_{k,·}})
    ∝ [ ∏_{j ∈ M_{k,·}} N(ℓ_j | µ_k^u, Σ_k^u) ] N(µ_k^u | µ_0^u, Σ_k^u / κ_0) IW(Σ_k^u | Λ_0, ν_0)
    ∝ N(µ_k^u | µ̄_k^u, Σ_k^u / κ̄_k^u) IW(Σ_k^u | Λ̄_k^u, v̄_k^u).

Marginally, µ_k^u | y, z, ℓ follows the bivariate t-distribution t_{v̄_k^u − 1}( µ̄_k^u, Λ̄_k^u / (κ̄_k^u (v̄_k^u − 1)) ), whose mean is µ̄_k^u, and Σ_k^u | y, z, ℓ follows the inverse-Wishart distribution IW(Λ̄_k^u, v̄_k^u), whose mean for bivariate observations is Λ̄_k^u / (v̄_k^u − 3).

2 Appendix

2.1 Miscellaneous notation

Throughout the paper, we use various notations to represent sets of indices and their cardinalities. Vectors y and z denote the component and place assignments in CPM, respectively. Each vector entry is identified by a tuple index (u, w, n), where u ∈ {1,…,U} is a user, w ∈ {1,…,W} is a weekhour, and n ∈ {1,…,N_{u,w}} is an iteration index. For the subsequent notations, we assume that the random variables y and z are already sampled. We refer to a subset of indices using

  M_{k_0,f_0}^{u_0,w_0} = { (u, w, n) | z_{u,w,n} = k_0, y_{u,w,n} = f_0, u = u_0, w = w_0 },

where u_0 denotes the user, w_0 denotes the weekhour, k_0 denotes the place, and f_0 denotes the component. If we want the subset of indices to be unrestricted with respect to a category, we use the placeholder ·. For example,

  M_{·,f_0}^{u_0,w_0} = { (u, w, n) | y_{u,w,n} = f_0, u = u_0, w = w_0 }

has no constraints with respect to places.

Given a subset of indices denoted by M, the lowercase m = |M| denotes its cardinality. For example, given a set of indices

  M_{·,f_0}^{u_0,·} = { (u, w, n) | y_{u,w,n} = f_0, u = u_0 },

its cardinality is m_{·,f_0}^{u_0,·} = |M_{·,f_0}^{u_0,·}|.

For the collapsed Gibbs sampler, the sets of indices and cardinalities used in the derivations exclude the index that will be sampled. We use ˜ to modify sets or cardinalities for this exclusion. Let (u, w, n) denote the index that will be sampled; then, given an index set M, let M̃ = M \ {(u, w, n)} represent the excluding set and let m̃ = |M̃| represent the corresponding cardinality. For example,

  M̃_{·,f_0}^{u_0,·} = M_{·,f_0}^{u_0,·} \ {(u, w, n)}   and   m̃_{·,f_0}^{u_0,·} = |M̃_{·,f_0}^{u_0,·}|.

In the proof of Lemma 1, parameters ṽ_k^u, µ̃_k^u, κ̃_k^u, and Λ̃_k^u are defined using cardinalities that exclude the current index (u, w, n). Similarly, in the proof of Lemma 9, parameters µ̄_k^u, Λ̄_k^u, and v̄_k^u are defined like their wiggly versions, but the counts used in their definitions do not exclude the current index.

We define additional notation to represent the sufficient statistics used by the learning algorithm. Let i = (u, w, n) denote an observation index. Then,

  S_k^u = Σ_{i ∈ M_{k,·}^{u,·}} ℓ_i

denotes the sum of the observed coordinates that have been assigned to user u and place k. Similarly,

  P_k^u = Σ_{i ∈ M_{k,·}^{u,·}} ℓ_i ℓ_i^T

denotes the sum of the outer products of the observed coordinates that have been assigned to user u and place k.

2.2 Probability distributions

Let Γ_2 denote the bivariate gamma function, defined as

  Γ_2(a) = π^{1/2} ∏_{j=1}^2 Γ(a + (1 − j)/2).

Let ν > 1 and let Λ ∈ R^{2×2} be a positive definite scale matrix. The inverse-Wishart distribution, which is the conjugate prior to the multivariate normal distribution, is defined as

  IW(Σ | Λ, ν) = (|Λ|^{ν/2} / (2^ν Γ_2(ν/2))) |Σ|^{−(ν+3)/2} exp( −(1/2) tr(Λ Σ^{−1}) ).

Let Σ ∈ R^{2×2} be a positive definite covariance matrix and let µ ∈ R^2 denote a mean vector. The multivariate normal distribution is defined as

  N(ℓ | µ, Σ) = (1 / (2π |Σ|^{1/2})) exp( −(1/2)(ℓ − µ)^T Σ^{−1} (ℓ − µ) ).

Let v > 0 and let Σ ∈ R^{2×2}; then the 2-dimensional t-distribution is defined as

  t_v(x | µ, Σ) = (Γ(v/2 + 1) / (Γ(v/2) v π |Σ|^{1/2})) ( 1 + (1/v)(x − µ)^T Σ^{−1} (x − µ) )^{−(v/2 + 1)}.

Let K > 1 be the number of categories and let α = (α_1, …, α_K) be the concentration parameters, where α_k > 0 for all k ∈ {1,…,K}. Then, the K-dimensional Dirichlet distribution, which is the conjugate prior to the categorical distribution, is defined as

  Dirichlet_K(x | α) = (1 / B(α)) ∏_{k=1}^K x_k^{α_k − 1},

where

  B(α) = ∏_{k=1}^K Γ(α_k) / Γ( Σ_{k=1}^K α_k ).

We abuse the Dirichlet notation slightly and use it to define the K-dimensional symmetric Dirichlet distribution as well. Let β > 0 be a scalar concentration parameter. Then, the symmetric Dirichlet distribution is defined as

  Dirichlet_K(x | β) = Dirichlet_K(x | α_1, …, α_K),

where α_k = β for all k ∈ {1,…,K}.

References

[1] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1):5228–5235, April 2004.

[2] Kevin Murphy. Conjugate Bayesian analysis of the Gaussian distribution. October 2007.

Let R be a ositive definite covariance matrix and let µ R denote a mean vector. The multivariate normal distribution is defined as N (` µ, ) ( ) ex (` µ)t (` µ). Let >and let R, then the -dimensional t-distribution is defined as t v (x µ, ) + (x µ)t (x µ) + Let K> be the number of categories and let (,..., K ) be the concentration arameters, where > 0 for all {,...,K}. Then, the K-dimensional Dirichlet distribution, which is the conjugate rior to the categorical distribution, is defined as Dirichlet K (x ) K B ( ) x, where KQ ( ) B ( ) KP. We abuse the Dirichlet notation slightly and use it to define the K-dimensional symmetric Dirichlet distribution as well. Let >0 be a scalar concentration arameter. Then, the symmetric Dirichlet distribution is defined as where References for all {,...,K}. Dirichlet K (x )Dirichlet K (x,..., K ), [] Thomas L. Griffiths and Mar Steyvers. Finding scientific toics. Proceedings of the National Academy of Sciences of the United States of America, 0(Sul ):58 535, Aril 004. [] Kevin Murhy. Conjugate bayesian analysis of the gaussian distribution. October 007.. 3