Bayesian Nonparametric Models for Ranking Data

Size: px

Start display at page:

Download "Bayesian Nonparametric Models for Ranking Data"

Cody Mosley
5 years ago
Views:

1 Bayesian Nonparametric Models for Ranking Data François Caron 1, Yee Whye Teh 1 and Brendan Murphy 2 1 Dept of Statistics, University of Oxford, UK 2 School of Mathematical Sciences, University College Dublin, Ireland BNPSki, January 2014 Caron, Teh & Murphy 1 / 23

2 Ranked Data Rank these 5 movies by preference Rank your 5 favourite movies among these 30 Rank your 5 favourite movies Aim Bayesian nonparametric model to model top-m partial rankings of a potentially infinite number of items. Each item is modelled using a positive rating parameter that is inferred from partial rankings. Develop efficient computational procedure for posterior simulation. Mixture generalizations. Application to preferences in college programmes Caron, Teh & Murphy 2 / 23

3 Preferences for college degree programmes Application for college degree programmes in Ireland in applicants provided top-10 rankings 533 degree programmes available for selection Heterogenity in the data Table: Two samples from the preference data. Rank CAO code College Degree Programme 1 DN002 University College Dublin Medicine 2 GY501 NUI - Galway Medicine 3 CK701 University College Cork Medicine 4 DN006 University College Dublin Physiotherapy 5 TR053 Trinity College Dublin Physiotherapy 6 DN004 University College Dublin Radiotherapy 7 TR007 Trinity College Dublin Clinical Speech 8 FT223 Dublin IT Human Nutrition 9 TR084 Trinity College Dublin Social Work 10 DN007 University College Dublin Social Science Rank CAO code College Degree Programme 1 MI005 Mary Immaculate Limerick Education - Primary Teaching 2 CK301 University College Cork Law 3 CK105 University College Cork European Studies 4 CK107 University College Cork Language - French 5 CK101 University College Cork Arts Caron, Teh & Murphy 3 / 23

4 Parametric Plackett-Luce Model Population of M items X 1,..., X M. To each item X k assign a rating parameter w k > 0. Stage-wise interpretation: Pick first item with probabilities proportional to w k s. Remove first item. Pick second item with probabilities proportional to w k s. Remove second item.... The probability of a ranking (X ρ1,..., X ρm ), with ρ = (ρ 1,..., ρ M ) a permutation, is P (ρ w) = M i=1 w ρi M k=1 w k i 1 j=1 w ρ j [Luce, 1959, Plackett, 1975], [Gormley and Murphy, 2008, Gormley and Murphy, 2009] Caron, Teh & Murphy 4 / 23

5 Nonparametric Plackett-Luce Model Population of items X 1, X 2,... of infinite size. Each item X k assigned a rating parameter w k > 0. Stage-wise interpretation: Pick first item with probabilities proportional to w k s. Remove first item. Pick second item with probabilities proportional to w k s. Remove second item.... The probability of a (finite, partial) ranking X ρ1,..., X ρm P (ρ w) = m i=1 w ρi k=1 w k i 1 j=1 w ρ j is Caron, Teh & Murphy 5 / 23

6 Thurstonian Interpretation For each item X k define an exponential random variate: z k Exp(w k ) Induces a partial permutation ρ such that z ρ1 < < z ρm < A race among items: z k is the time that item X k finished the race. ρ is the order of champion, runner-up, second runner-up... The probability of a partial permutation is: P (ρ w) =P (z ρ1 < z ρ2 < < z ρm < everything else) m w ρi = k=1 w k i 1 j=1 w ρ j i=1 Caron, Teh & Murphy 6 / 23

7 Thurstonian Interpretation w k z k ρ 4 ρ 3 ρ items 15 items 15 ρ ρ ratings time P (ρ w) =P (z ρ1 < z ρ2 < < z ρm < everything else) m w ρi = k=1 w k i 1 j=1 w ρ j i=1 Caron, Teh & Murphy 7 / 23

8 Data Augmentation Reparametrize in terms of inter-arrival durations: Z 1 = z ρ1, Z 2 = z ρ2 z ρ1,... Z m = z ρm z ρm 1 w k z k ρ 4 ρ 3 ρ items 15 items 15 ρ ρ ratings time Z 1 Z 2 Z 3 Z 4 Z 5 Caron, Teh & Murphy 8 / 23

9 Data Augmentation Augmented system with auxiliary variables Z 1,..., Z m : i 1 Z i ρ, w, X Exp w k Joint probability is: P (ρ, Z w, X) = k=1 j=1 m w ρi exp w k i=1 What prior to use for w, X? k=1 Independent Gamma s for w k s do not work. Gamma process. Better: completely random measures. w ρj i 1 j=1 w ρj Z i Caron, Teh & Murphy 9 / 23

10 Completely Random Measures Lévy intensity λ(w). Base distribution H with density h(x). Random atomic measure G = w k δ Xk k=1 Construction: two-dimensional Poisson process N = {w k, X k } with intensity λ(w)h(x): [Kingman, 1967, Lijoi and Prünster, 2010] Caron, Teh & Murphy 10 / 23

11 Completely Random Measures Conditions on Lévy intensity: 0 0 λ(w)dw = infinitely many items. (1 e w )λ(w)dw < finite total w k. k=1 Plackett-Luce sampling without replacement: Pick first item with probabilities proportional to w k s; remove. Pick second item with probabilities proportional to w k s; remove;... Size-biased sampling of the atoms in G. Caron, Teh & Murphy 11 / 23

12 Prior Draws Generalized Gamma process with λ(w) = α Γ(1 σ) w σ 1 e τ w, τ = (a) α = 0.1, σ = (b) α = 1, σ = (c) α = 3, σ = (d) α = 1, σ = 0.1 (e) α = 1, σ = 0.5 (f) α = 1, σ = 0.9 Caron, Teh & Murphy 12 / 23

13 Posterior Characterization Observe L partial rankings Y l = (Y l1,..., Y lml ) for l = 1,..., L. X1,..., X K are the K unique items among observations Associated auxiliary variables Z l1,..., Z lml. Theorem The posterior distribution given partial rankings Y and auxiliary variables Z is a CRM with fixed atoms: G Y, Z = G + K wk δ Xk k=1 where G and w 1,..., w K are mutually independent. The law of G is still CRM with Lévy intensity λ (w) = λ(w)e w( li Z li) while the masses have distributions, P (w k Y, Z) (w k )n ke w k ( li δ likz li ) λ(w k ). Characterization similar to that for normalized random measures. [Prünster, 2002, James, 2002, James et al., 2009] Caron, Teh & Murphy 13 / 23

14 Bayesian Inference via Gibbs Sampling Rating parameters wk of observed items. Latent variables Z li. CRM G containing unobserved items. Marginalize out G, keeping only its total mass w. Easy Gibbs sampler for generalized gamma process class of CRM Z li rest Exponential wk rest Gamma w rest Exponentially tilted stable Caron, Teh & Murphy 14 / 23

15 Nonparametric Plackett-Luce Mixture Model π Partial rankings reflecting the preferences of a heterogenous population. π GEM(θ) c l π Discrete(π) Y l c l, G cl PL(G cl ) c ρ G k C k Gamma(αH + C k, τ + φ) C k G 0 Poisson(φG 0 ) G 0 Gamma(αH, τ ) G k C k G k Gamma(αH, τ ) does not work. Caron, Teh & Murphy 15 / 23 G 0

16 Irish University Programme Applications Irish university applicants. Each applicant ranks their top-10 desired university programmes. Point estimate of the partition ĉ = arg min c (i) {c (1),...,c (N) } k l (δ (i) c k c(i) l where the coclustering matrix ζ is obtained with N ζ kl ) 2 ζ kl = 1 N i=1 δ (i) c k c(i) l c (i), i = 1,..., N are the Monte Carlo samples and δ kl = 1 if k = l, 0 otherwise. Caron, Teh & Murphy 16 / 23

17 Irish University Programme Applications Table: Description of the different clusters. The size of the clusters, the entropy and a cluster description are provided. Cluster Size Entropy Description Cluster Size Entropy Description Social Science/Tourism Engineering Science Teaching/Arts Business/Commerce Art/Music - Dublin Arts Engineering - Dublin Business/Marketing - Dublin Medicine Construction Arts/Religion/Theology CS - outside Dublin Arts/History - Dublin CS - Dublin Galway Arts/Social - outside Dublin Limerick Business/Finance - Dublin Law Arts/Psychology - Dublin Business - Dublin Cork Arts/Bus. - Dublin Comm./Journalism - Dublin Mixed Caron, Teh & Murphy 17 / 23

18 Irish University Programme Applications Table: Cluster 7: Computer Science - outside Dublin Rank Aver. Norm. Weight College Degree Programme Cork IT Computer Applications Limerick IT Software Development University of Limerick Computer Systems Waterford IT Applied Computing Cork IT Software Dev & Comp Net IT Carlow Computer Networking Athlone IT Computer and Software Engineering University College Cork Computer Science Dublin City University Computer Applications University of Limerick Information Technology Table: Cluster 8: Computer Science - Dublin Rank Aver. Norm. Weight College Degree Programme Dublin City University Computer Applications University College Dublin Computer Science NUI - Maynooth Computer Science Dublin IT Computer Science National College of Ireland Software Systems Dublin IT Business Info. Systems Dev Trinity College Dublin Computer Science Dublin IT Applied Sciences/Computing Trinity College Dublin Information & Comm. Tech University College Dublin B.A. (Computer Science) Caron, Teh & Murphy 18 / 23

19 Irish University Programme Applications Table: Cluster 12: Cork Rank Aver. Norm. Weight College Degree Programme University College Cork Arts University College Cork Computer Science University College Cork Commerce University College Cork Business Information Systems Cork IT Computer Applications Cork IT Software Dev & Comp Net University College Cork Finance University College Cork Law University College Cork Accounting University College Cork Biological and Chemical Sciences Caron, Teh & Murphy 19 / 23

20 Irish University Programme Applications Caron, Teh & Murphy 20 / 23

21 Summary A Bayesian nonparametric Plackett-Luce model for partial rankings. Completely random measures and posterior characterization. Easy Gibbs sampling for posterior simulation. Mixture generalization. Future: Dependent general CRM models Caron, Teh & Murphy 21 / 23

22 Bibliography I Caron, F., Teh, Y. W., and Murphy, B. (2013). Bayesian nonparametric Plackett-Luce models for the analysis of preferences for college degree programmes. To appear in Annals of Applied Statistics. Gormley, I. C. and Murphy, T. B. (2008). Exploring voting blocs with the Irish electorate: a mixture modeling approach. Journal of the American Statistical Association, 103(483): Gormley, I. C. and Murphy, T. B. (2009). A grade of membership model for rank data. Bayesian Analysis, 4(2): James, L., Lijoi, A., and Prünster, I. (2009). Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36(1): James, L. F. (2002). Poisson process partition calculus with applications to exchangeable models and bayesian nonparametrics. arxiv preprint math/ Caron, Teh & Murphy 22 / 23

23 Bibliography II Kingman, J. F. C. (1967). Completely random measures. Pacific Journal of Mathematics, 21(1): Lijoi, A. and Prünster, I. (2010). Models beyond the Dirichlet process. In Hjort, N. L., C. Holmes, P. M., and Walker, S. G., editors, Bayesian Nonparametrics. Cambridge University Press. Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. Wiley. Plackett, R. (1975). The analysis of permutations. Journal of the Royal Statistical Society: Series C (Applied Statistics), 24(2): Prünster, I. (2002). Random probability measures derived from increasing additive processes and their application to Bayesian statistics. PhD thesis, University of Pavia. Caron, Teh & Murphy 23 / 23

Bayesian nonparametric models for bipartite graphs

Bayesian nonparametric models for bipartite graphs François Caron Department of Statistics, Oxford Statistics Colloquium, Harvard University November 11, 2013 F. Caron 1 / 27 Bipartite networks Readers/Customers