Preferences in college applications

Similar documents
Bayesian Nonparametric Models for Ranking Data

Bayesian Nonparametrics: Dirichlet Process

Uptake of IGCSE subjects 2012

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts

(Bayesian) Statistics with Rankings

Bayesian Nonparametric Regression for Diabetes Deaths

CSC 2541: Bayesian Methods for Machine Learning

Collapsed Gibbs Sampler for Dirichlet Process Gaussian Mixture Models (DPGMM)

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process

Bayesian non parametric approaches: an introduction

the long tau-path for detecting monotone association in an unspecified subpopulation

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

CS340 Winter 2010: HW3 Out Wed. 2nd February, due Friday 11th February

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

Lecture 16: Mixtures of Generalized Linear Models

NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES

Distribution of GCSE Grades Summer 2016

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Bayesian shrinkage approach in variable selection for mixed

Gentle Introduction to Infinite Gaussian Mixture Modeling

Non-Parametric Bayes

A multivariate multilevel model for the analysis of TIMMS & PIRLS data

Norm Referenced Test (NRT)

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Lesson 5.4: The Normal Distribution, page 251

Bayesian Nonparametrics: Models Based on the Dirichlet Process

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data

Continuous Probability Distributions

Examinations Booklet

Section 4. Test-Level Analyses

Gibbs Sampling in Latent Variable Models #1

Collaborative Place Models Supplement 2

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

A semiparametric Bayesian model for examiner agreement in periodontal research

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Gibbs Sampling in Linear Models #2

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Nonparametric Bayes Density Estimation and Regression with High Dimensional Data

Bayesian Methods for Machine Learning

Independent Samples t tests. Background for Independent Samples t test

Hyperparameter estimation in Dirichlet process mixture models

Table of Contents. Enrollment. Introduction

Bayesian nonparametric models for bipartite graphs

Identify the scale of measurement most appropriate for each of the following variables. (Use A = nominal, B = ordinal, C = interval, D = ratio.

Probabilistic Time Series Classification

Bayesian Nonparametric Inference Methods for Mean Residual Life Functions

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Construction of Dependent Dirichlet Processes based on Poisson Processes

4. Nonlinear regression functions

Présentation des travaux en statistiques bayésiennes

Non-parametric Clustering with Dirichlet Processes

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008

Efficient Bayesian Multivariate Surface Regression

EQ: What is a normal distribution?

Scaling Neighbourhood Methods

Lecture 3. Probability - Part 2. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. October 19, 2016

Another Walkthrough of Variational Bayes. Bevan Jones Machine Learning Reading Group Macquarie University

Lecture 4. Generative Models for Discrete Data - Part 3. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza.

Model Based Clustering of Count Processes Data

Simulating Random Variables

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

An Introduction to Bayesian Machine Learning

Mark Scheme (Results) Summer Pearson Edexcel GCE in Statistics 3 (6691/01)

Acoustic Unit Discovery (AUD) Models. Leda Sarı

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Estimating a Piecewise Growth Model with Longitudinal Data that Contains Individual Mobility across Clusters

Bayesian nonparametrics

A Probabilistic Model for Canonicalizing Named Entity Mentions. Dani Yogatama Yanchuan Sim Noah A. Smith

Infinite latent feature models and the Indian Buffet Process

Time-Sensitive Dirichlet Process Mixture Models

FREQUENCY DISTRIBUTIONS AND PERCENTILES

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

Expectation Maximization

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Infinite Mixtures of Trees

Tables and Figures. This draft, July 2, 2007

ST 740: Multiparameter Inference

Jun Tu. Department of Geography and Anthropology Kennesaw State University

Inference for a Population Proportion

AP Human Geography Syllabus

Outline lecture 6 2(35)

Conditional Independence and Factorization

Robust Bayesian Simple Linear Regression

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 17, 2017

Gibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644:

Mixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes

C) Discuss two factors that are contributing to the rapid geographical shifts in urbanization on a global scale.

CTDL-Positive Stable Frailty Model

Bayesian Estimation with Sparse Grids

A Process over all Stationary Covariance Kernels

CS-E3210 Machine Learning: Basic Principles

NONPARAMETRIC BAYESIAN DENSITY ESTIMATION ON A CIRCLE

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

CPSC 540: Machine Learning

Transcription:

Preferences in college applications A non-parametric Bayesian analysis of top-10 rankings Alnur Ali 1 Thomas Brendan Murphy 2 Marina Meilă 3 Harr Chen 4 1 Microsoft 2 University College Dublin 3 University of Washington 4 Massachusetts Institute of Technology

Outline College Applications Goals Dataset Data Coding Generalized Mallow s models Dirichlet process mixture models Gibbs sampler Findings General properties Overall trends Conclusions

College Applications Irish college applicants apply through a central system administered by the College Applications Office (CAO) Applicants list up to ten degree courses in order of preference Applicants are awarded points on the basis of their Leaving Certificate results; these determine course entry

Goals It has been postulated that a number of factors influence course choices: Institution & Location Degree subject Degree type (Specific vs General) Points Requirement Gender 500 450 points 400 350 Do points requirements influence ranks? 300 1 2 3 4 5 6 7 8 9 10 rank

Dataset We study the cohort of applicants to degree courses from the year 2000 The applications data has the following properties: There were 55737 applicants; They selected from a list of 533 courses; Applicants selected up to 10 courses

Data Coding The data coding (s 1, s 2,, s t ) of π σ is defined by s j + 1 = rank of π 1 (j) in σ after removing π 1 (1 : j 1) Example, if σ = [a b c d] and π = [c a b d] σ π 1 (1) = c s 1 = 2 a b c d π 1 (2) = a s 2 = 0 a b d π 1 (3) = b s 3 = 0 b d π 1 (4) = d s 4 = 0 d Kendall s distance is d Kendall (π, σ) = t 1 j=1 s j

Generalized Mallow s models Mallow s model assumes that P(π σ, θ) = 1 t 1 ψ(θ) exp θ s j (π σ) Can extend Mallow s model to allow for varying precision in ranking P(π σ, θ) = 1 t 1 ψ( θ) exp θ j s j (π σ) Location parameter σ, scale parameters (θ 1,, θ max t 1 ) ψ( θ) is a tractable normalization constant j=1 j=1

Dirichlet process mixture models α p ci πi G0 σc, θc K p Dirichlet(α/K,, α/k) c i Multinomial(p 1,, p K ) σ c, θ c G 0 P 0 (σ, θ; ν, r) π i GM(π i σ c, θ c ) N Prior: conjugate to GM, informative wrt θ DPMM benefits: no need to specify K upfront, identifies both large and small clusters

Gibbs sampler 1 Resample cluster assignments: 11 Draw existing cluster wp N c 1 N+α 1 GM(π σ c, θ c ) or Beta function approximation 12 Draw new cluster wp α (n t)! N+α 1 n! 2 Resample cluster parameters: 21 Draw θ c by slice sampling or a Beta distribution approx 22 Draw σ c stage-wise or by a Beta function approx Beta approx based sampler (Beta-Gibbs) faster than slice based sampler (Slice-Gibbs) (per iteration & overall time to convergence)

General properties of the clusterings The DPMM found 164 clusters Thirty three of these clusters had nine or more members 10 3 clust size 10 2 10 1 0 5 10 15 20 25 30 cluster The clusters were characterized by a number of features Cluster Size Description Male (%) Points Average (SD) 1 4536 CS & Engineering 772 369 (41) 2 4340 Applied Business 485 366 (40) 3 4077 Arts & Social Science 131 384 (42) 4 3898 Engineering (Ex-Dublin) 852 374 (39) 5 3814 Business (Ex-Dublin) 418 394 (32) 6 3106 Cork Based 489 397 (33) 33 9 Teaching (Home Economics) 00 417 (4)

Precision The precision parameters (θ j ) were very high for top rankings rank j 1 2 3 4 5 6 7 8 9 10 5 10 15 20 25 30 cluster The θ j values tended to decrease with j In many cases, the θ j values dropped suddenly after a particular point The central ranking σ for each cluster is of length 533; the θ j values suggested a point to truncate the ranking 4 35 3 25 2 15 1 05 0

Overall trends Subject Subject matter is a key determinant of course choice The courses chosen are similar in subject area Some opt for general degrees (eg Science) and others opt for specific (eg Chemical Engineering) Gender There is quite a difference in the percentage male/female applicants in some clusters Males tend to dominate CS/Engineering clusters Females tend to dominate social science/education clusters Geography There is evidence of the college location influencing choice The sixth largest cluster is dominated by courses from colleges in Cork (CIT and UCC) There is evidence of a mix of subject matter and geography having a joint effect; the fourth largest cluster is dominated by engineering courses outside Dublin

Overall trends Subject Subject matter is a key determinant of course choice The courses chosen are similar in subject area Some opt for general degrees (eg Science) and others opt for specific (eg Chemical Engineering) Gender There is quite a difference in the percentage male/female applicants in some clusters Males tend to dominate CS/Engineering clusters Females tend to dominate social science/education clusters Geography There is evidence of the college location influencing choice The sixth largest cluster is dominated by courses from colleges in Cork (CIT and UCC) There is evidence of a mix of subject matter and geography having a joint effect; the fourth largest cluster is dominated by engineering courses outside Dublin

Overall trends Subject Subject matter is a key determinant of course choice The courses chosen are similar in subject area Some opt for general degrees (eg Science) and others opt for specific (eg Chemical Engineering) Gender There is quite a difference in the percentage male/female applicants in some clusters Males tend to dominate CS/Engineering clusters Females tend to dominate social science/education clusters Geography There is evidence of the college location influencing choice The sixth largest cluster is dominated by courses from colleges in Cork (CIT and UCC) There is evidence of a mix of subject matter and geography having a joint effect; the fourth largest cluster is dominated by engineering courses outside Dublin

Points The points requirements for the courses in the truncated central rankings were not monotonically decreasing in any cluster points 2 4 rank j 6 8 413 10 12 200 5 10 15 20 25 30 cluster This suggests that points requirements are not important when students are ranking courses

Conclusions & Lessons Learned The CAO system appears to be working more effectively than many suggest The clusters revealed in this analysis tend to be cohesive in subject matter The focus of possible improvements to the CAO system might be directed at how points are scored The Generalized Mallows DPMM facilitated discovering small clusters that were missed in previous analyses The model also allowed for the study of precision in rankings within clusters

Questions? Thanks!