Model Based Clustering of Count Processes Data

Tin Lok James Ng, Brendan Murphy
Insight Centre for Data Analytics, School of Mathematics and Statistics
CASI, May 15, 2017

Background: Model Based Clustering
Sample observations arise from a distribution that is a mixture of two or more components. Each component is described by a probability density function and an associated mixing probability.
This framework is commonly used to model $p$-dimensional observations $x_1, \dots, x_n$, e.g. a mixture of $G$ components, each of which is multivariate normal with density $f_g(x \mid \mu_g, \Sigma_g)$, $g = 1, \dots, G$.

Background: Model Based Clustering
The majority of research in model based clustering focuses on sample observations consisting of multivariate data taking values in $\mathbb{R}^p$ with $p \geq 1$ (Fraley and Raftery, 2002; Raftery and Dean, 2006).
More recently, clustering techniques for functional data have been proposed, where each observation is a stochastic process taking values in an infinite dimensional space (Abraham et al., 2003; Giacofci et al., 2013).

Model Based Clustering for Count Processes
In various applications, one observes a collection of count process data, where each count process consists of event times observed over a fixed time interval.
Examples include occurrences of natural disasters at various locations, observed failure times of multiple light bulbs, and temporal sequences of action potentials generated by neurons.
Model based clustering techniques for count process data remain under-developed.

Model Based Clustering for Count Processes
We assume a collection of $N$ count processes $x = \{x_i\}_{i=1}^{N}$ observed on a fixed interval $[0, T) \subset \mathbb{R}$.
Each observation $x_i = \{x_{i,1}, x_{i,2}, \dots, x_{i,n_i}\}$ consists of a set of event times with $x_{i,1} \leq x_{i,2} \leq \dots \leq x_{i,n_i}$ for $i = 1, \dots, N$.
We assume a total of $G$ mixture components and that each observation arises from one of the $G$ components.

Model Specification: Poisson Process
The non-homogeneous Poisson process (NHPP) is a popular model in applications where count process data are encountered.
A temporal NHPP defined on $[0, T)$ is associated with a non-negative, locally integrable intensity function $\lambda(x): [0, T) \to \mathbb{R}^+$.
For any bounded region $B \subseteq [0, T)$, the number of events $N(B)$ follows a Poisson distribution with rate $\Lambda(B) = \int_B \lambda(x)\,dx$.
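As a rough illustration (not part of the talk), the following Python sketch simulates event times from an NHPP on $[0, T)$ by thinning; the example intensity function and the bound lam_max are arbitrary choices for demonstration.

```python
import numpy as np

def simulate_nhpp(intensity, T, lam_max, rng=None):
    """Simulate event times of a non-homogeneous Poisson process on [0, T)
    by thinning: propose from a homogeneous process with rate lam_max,
    where lam_max >= intensity(x) on [0, T), then accept each candidate
    point with probability intensity(x) / lam_max."""
    rng = np.random.default_rng(rng)
    n_prop = rng.poisson(lam_max * T)                  # number of candidate points
    candidates = np.sort(rng.uniform(0.0, T, n_prop))  # candidate event times
    keep = rng.uniform(size=n_prop) < intensity(candidates) / lam_max
    return candidates[keep]

# Example: an arbitrary bimodal intensity on [0, 24) with two daily peaks.
events = simulate_nhpp(lambda x: 5 + 4 * np.sin(2 * np.pi * x / 12) ** 2,
                       T=24.0, lam_max=9.0, rng=1)
```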

Model Specification: Gaussian Process
A random scalar function $g(s): S \to \mathbb{R}$ is said to have a Gaussian process prior if, for any finite collection of points $s_1, \dots, s_N \in S$, the function values $g(s_1), \dots, g(s_N)$ follow a multivariate Gaussian distribution.
The Gaussian process prior is defined by a mean function $m(\cdot): S \to \mathbb{R}$ and a covariance function $k(\cdot, \cdot): S \times S \to \mathbb{R}$, which further depend on hyperparameters $\phi$ (Rasmussen and Williams, 2006).

Model Specification: Gaussian Cox Process
The combination of a Poisson process and a Gaussian process prior is known as a Gaussian Cox process.
It provides an attractive modeling framework for inferring the underlying intensity function, since one only needs to specify the form of the mean and covariance functions.

Model Specification: Mixture of Poisson Processes
We assume each count process observation $x_i$ is governed by one of $G$ underlying normalized intensity functions $\{\lambda_g\}_{g=1}^{G}$, where $\lambda_g(x): [0, T) \to \mathbb{R}^+$ and $\int_0^T \lambda_g(x)\,dx = 1$.
We let $\tau_g$, $g = 1, \dots, G$, be the probability that a count process observation is associated with intensity function $\lambda_g$.
We introduce a scaling factor $\alpha_i$ for each count process observation, so that $x_i \sim \mathrm{NHPP}(\alpha_i \lambda_g)$ with probability $\tau_g$.
We apply the logistic density transform to the intensity functions $\{\lambda_g\}_{g=1}^{G}$:

$$\lambda_g(x) = \frac{\exp(f_g(x))}{\int_0^T \exp(f_g(s))\,ds} \qquad (1)$$

Model Specification: Mixture of Poisson Processes
A zero-mean Gaussian process prior $\mathrm{GP}(0, k(x, x'))$ is assigned to the random functions $f = \{f_g\}_{g=1}^{G}$.
The one-dimensional squared-exponential covariance function is assumed, which has the form

$$k(x, x') = \sigma^2 \exp\!\left(-\frac{1}{2\ell^2}(x - x')^2\right) \qquad (2)$$

with hyperparameters $(\sigma, \ell)$, where $\ell$ determines the smoothness of the covariance function.
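A minimal sketch (not from the talk) of the covariance in (2) and of drawing function values from the zero-mean GP prior on a grid; the grid, hyperparameter values, and jitter term are illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(x1, x2, sigma=1.0, length=1.0):
    """Squared-exponential covariance k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2))."""
    diff = x1[:, None] - x2[None, :]
    return sigma ** 2 * np.exp(-0.5 * (diff / length) ** 2)

# One draw of f ~ GP(0, k) evaluated on a grid of m points, i.e. a sample
# from N(0, K); a small jitter keeps the covariance matrix positive definite.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
K = sq_exp_kernel(t, t, sigma=1.0, length=0.2) + 1e-8 * np.eye(len(t))
f_sample = rng.multivariate_normal(np.zeros(len(t)), K)
```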

Model Specification: Mixture of Poisson Processes
To make inference more tractable, we further discretise the interval $[0, T)$ into $m$ equally spaced sub-intervals $\left[\frac{T(k-1)}{m}, \frac{Tk}{m}\right)$, and let $t_k$, $k = 1, \dots, m$, be the mid-points of the sub-intervals.
We assume that the random functions $\{f_g\}_{g=1}^{G}$, and hence the intensity functions $\{\lambda_g\}_{g=1}^{G}$, are piecewise constant on each sub-interval.
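The discretised version of the logistic density transform (1) can be sketched as follows: given grid values of $f_g$, the piecewise-constant $\lambda_g$ is obtained by exponentiating and normalising so that the integral over $[0, T)$ equals one. The function and variable names are hypothetical.

```python
import numpy as np

def normalised_intensity(f_grid, T):
    """Map grid values f_g(t_1), ..., f_g(t_m) to a piecewise-constant intensity
    lambda_g that integrates to 1 over [0, T): a discretised logistic density
    transform, lambda_k = exp(f_k) / (sum_j exp(f_j) * T / m)."""
    f_grid = np.asarray(f_grid, dtype=float)
    m = f_grid.size
    width = T / m
    w = np.exp(f_grid - f_grid.max())   # stabilised exponentials
    return w / (w.sum() * width)        # heights of the m constant pieces

lam = normalised_intensity([0.2, 1.1, -0.3, 0.5], T=24.0)
# Each entry is the intensity on one sub-interval; sum(lam) * (T / m) is 1.
```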

Inference
Given the observed event times $\{x_i\}$ of $N$ count processes, our objective is to group them into $G$ clusters while simultaneously estimating the intensity functions of the $G$ Poisson processes.
Let $\theta$ denote the set of parameters; the penalised likelihood function is given by

$$L(\theta; x) = \prod_{i=1}^{N}\left(\sum_{g=1}^{G} \tau_g \exp(-\alpha_i) \prod_{j=1}^{n_i} \alpha_i \lambda_g(x_{i,j})\right) \prod_{g=1}^{G} p(f_g) \qquad (3)$$

where $p(\cdot)$ is the density associated with the $\mathrm{GP}(0, k(x, x'))$ prior.
In a non-Bayesian setting, the Gaussian process priors imposed on the random functions $f_g$ can be understood as penalty terms.
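A hedged sketch of evaluating the data part of (3) with piecewise-constant intensities, working in log space; `component_loglik` and `observed_data_loglik` are hypothetical helper names, and the Gaussian process penalty term is omitted here.

```python
import numpy as np
from scipy.special import logsumexp

def component_loglik(x_i, alpha_i, log_lam_g, T):
    """log[ exp(-alpha_i) * prod_j alpha_i * lambda_g(x_{i,j}) ] for one count
    process x_i and one piecewise-constant intensity given by log_lam_g on an
    equally spaced grid of m sub-intervals of [0, T)."""
    m = len(log_lam_g)
    bins = np.minimum((np.asarray(x_i) / T * m).astype(int), m - 1)  # sub-interval of each event
    return -alpha_i + len(x_i) * np.log(alpha_i) + log_lam_g[bins].sum()

def observed_data_loglik(x, alpha, tau, log_lam, T):
    """Unpenalised mixture log-likelihood: sum_i log sum_g tau_g * L_ig.
    The penalty sum_g log p(f_g) would be added separately."""
    ll = 0.0
    for x_i, a_i in zip(x, alpha):
        terms = [np.log(t_g) + component_loglik(x_i, a_i, ll_g, T)
                 for t_g, ll_g in zip(tau, log_lam)]
        ll += logsumexp(terms)
    return ll
```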

EM Algorithm
We let $z = \{z_i\}_{i=1}^{N}$ denote the latent assignment of intensity functions to count processes, where $z_i = (z_{i,1}, \dots, z_{i,G})^T$ and

$$z_{i,g} = \begin{cases} 1 & \text{if count process } i \text{ has intensity function } \alpha_i \lambda_g \\ 0 & \text{otherwise} \end{cases}$$

We can write the complete data likelihood function as

$$L(\theta; x, z) = \prod_{i=1}^{N} \prod_{g=1}^{G} \left(\tau_g \exp(-\alpha_i) \prod_{j=1}^{n_i} \alpha_i \lambda_g(x_{i,j})\right)^{z_{i,g}} \prod_{g=1}^{G} p(f_g) \qquad (4)$$

EM Algorithm
For the E-step, we compute the probabilities of cluster assignment for each count process,

$$\pi_i^g \equiv \Pr(Z_{i,g} = 1 \mid x_i, \theta^{(t)}) = \frac{\tau_g^{(t)} \exp(-\alpha_i^{(t)}) \prod_{j=1}^{n_i} \alpha_i^{(t)} \lambda_g^{(t)}(x_{i,j})}{\sum_{h=1}^{G} \tau_h^{(t)} \exp(-\alpha_i^{(t)}) \prod_{j=1}^{n_i} \alpha_i^{(t)} \lambda_h^{(t)}(x_{i,j})}$$

for $i = 1, \dots, N$ and $g = 1, \dots, G$.
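The E-step can be sketched in a few lines, reusing the hypothetical `component_loglik` helper above and normalising in log space with log-sum-exp for stability:

```python
import numpy as np
from scipy.special import logsumexp

def e_step(x, alpha, tau, log_lam, T):
    """Posterior cluster probabilities pi_i^g, computed in log space.
    Reuses component_loglik() from the earlier sketch."""
    N, G = len(x), len(tau)
    log_resp = np.empty((N, G))
    for i, (x_i, a_i) in enumerate(zip(x, alpha)):
        for g in range(G):
            log_resp[i, g] = np.log(tau[g]) + component_loglik(x_i, a_i, log_lam[g], T)
        log_resp[i] -= logsumexp(log_resp[i])   # normalise over components
    return np.exp(log_resp)                     # N x G matrix of responsibilities
```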

EM Algorithm
For the M-step, we update the model parameters $\{\tau_g\}_{g=1}^{G}$, $\{\alpha_i\}_{i=1}^{N}$ and $\{f_g\}_{g=1}^{G}$ by maximising the expected complete data likelihood function.
The updates for $\{\alpha_i\}_{i=1}^{N}$ and $\{\tau_g\}_{g=1}^{G}$ are straightforward:

$$\hat{\alpha}_i = n_i, \qquad \hat{\tau}_g = \frac{\sum_{i=1}^{N} \pi_i^g}{N}$$

No closed form update is available for $\{f_g\}_{g=1}^{G}$; we employ Newton's method to obtain estimates for $\{f_g\}_{g=1}^{G}$.
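The closed-form part of the M-step is simple; a small sketch follows (the Newton update for the $f_g$ on the discretisation grid is not shown):

```python
import numpy as np

def m_step_simple(x, resp):
    """Closed-form M-step updates: alpha_i is the event count of process i,
    tau_g is the average responsibility of component g. The f_g update has no
    closed form and would be carried out by Newton's method."""
    alpha = np.array([len(x_i) for x_i in x], dtype=float)
    tau = resp.mean(axis=0)
    return alpha, tau
```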

Standard Error Estimates for Intensity Functions
It is often desirable to obtain standard errors and construct confidence intervals for the estimated intensity functions $\{\hat{\lambda}_g\}_{g=1}^{G}$.
For each observation $j$ and each intensity function $\lambda_g$, we apply the jackknife method to obtain the estimate $\hat{\lambda}_g^{(j)}$ with the $j$th count process observation removed from the data.

Standard Error Estimates for Intensity Functions
For $g = 1, \dots, G$ and for $t \in [0, T)$, we let

$$\bar{\lambda}_g(t) = \frac{1}{N} \sum_{j=1}^{N} \hat{\lambda}_g^{(j)}(t)$$

The estimated variance of intensity function $g$ at time $t$ is given by

$$\mathrm{Var}_{\mathrm{jack}}(\hat{\lambda}_g(t)) = \frac{N-1}{N} \sum_{j=1}^{N} \left(\hat{\lambda}_g^{(j)}(t) - \bar{\lambda}_g(t)\right)^2$$

The confidence interval for the estimated intensity function $\hat{\lambda}_g$ is given by

$$\left(\hat{\lambda}_g(t) - c\sqrt{\mathrm{Var}_{\mathrm{jack}}(\hat{\lambda}_g(t))},\; \hat{\lambda}_g(t) + c\sqrt{\mathrm{Var}_{\mathrm{jack}}(\hat{\lambda}_g(t))}\right)$$

for some $c > 0$.
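A small sketch of the jackknife variance and pointwise band on the discretisation grid, assuming the leave-one-out estimates are stacked in an array; the value of $c$ (e.g. 1.96) is an illustrative choice, not one stated in the talk.

```python
import numpy as np

def jackknife_band(lam_hat, lam_jack, c=1.96):
    """Pointwise jackknife variance and confidence band for one estimated
    intensity. lam_hat is the full-data estimate on the m grid points and
    lam_jack is an (N, m) array whose j-th row is the leave-one-out estimate."""
    N = lam_jack.shape[0]
    lam_bar = lam_jack.mean(axis=0)                                  # jackknife mean
    var_jack = (N - 1) / N * ((lam_jack - lam_bar) ** 2).sum(axis=0)
    se = np.sqrt(var_jack)
    return lam_hat - c * se, lam_hat + c * se
```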

Model Selection: Cross Validated Likelihood
We want to determine the number of mixture components $G$ and the hyperparameters $(\sigma, \ell)$.
The cross validated likelihood approach (Smyth, 2000) is used to choose the number of components and the hyperparameters in the mixture model.
For a fixed model, the method works by repeatedly partitioning the data set into two sets: one is used to train the model and obtain parameter estimates by maximising the log-likelihood, and the other is used to test the model by evaluating its log-likelihood.
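A sketch of the cross-validated likelihood loop for one candidate model; `fit_mixture` and `test_loglik` are hypothetical placeholders for the EM fit and the held-out likelihood evaluation, and the half/half split is an assumption rather than the authors' exact scheme.

```python
import numpy as np

def cv_loglik(x, G, hyper, n_splits=5, seed=0):
    """Cross-validated likelihood score for a candidate model with G components
    and hyperparameters hyper, averaged over random train/test splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        train, test = idx[: len(x) // 2], idx[len(x) // 2:]
        params = fit_mixture([x[i] for i in train], G, hyper)     # hypothetical EM fit
        scores.append(test_loglik([x[i] for i in test], params))  # hypothetical held-out log-likelihood
    return float(np.mean(scores))
```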

Application: Washington Bike Sharing Scheme Data Set
Information including the start date and time of a trip, the end date and time of a trip, and the start and end stations is publicly available [1].
For each station in the bike sharing scheme, we consider the end time of a trip as an event.
We fit the model to the data over the period 18 March 2016 through 24 March 2016, with 362 active stations and a total of 62,234 events.
[1] Historical data available at https://www.capitalbikeshare.com/

Application: Washington Bike Sharing Scheme Data Set (figure slides)

References
Abraham, C., Cornillon, P. A., Matzner-Løber, E., and Molinari, N. (2003), Unsupervised curve clustering using B-splines, Scand. J. Statist., 30, 581–595.
Eagle, N. and Pentland, A. (2006), Reality mining: sensing complex social systems, Personal and Ubiquitous Computing, 10, 255–268.
Fraley, C. and Raftery, A. E. (2002), Model-based clustering, discriminant analysis, and density estimation, J. Amer. Statist. Assoc., 97, 611–631.
Giacofci, M., Lambert-Lacroix, S., Marot, G., and Picard, F. (2013), Wavelet-based clustering for mixed-effects functional models in high dimension, Biometrics, 69, 31–40.
Raftery, A. E. and Dean, N. (2006), Variable selection for model-based clustering, J. Amer. Statist. Assoc., 101, 168–178.
Rasmussen, C. E. and Williams, C. K. I. (2006), Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA.
Smyth, P. (2000), Model selection for probabilistic clustering using cross-validated likelihood, Statistics and Computing, 10, 63–72.

Application: Reality Mining Data Set
The Reality Mining dataset was collected in 2004 (Eagle and Pentland, 2006). The goal of the experiment was to explore the capabilities of smart phones in enabling social scientists to investigate human interactions.
75 students or faculty at the MIT Media Laboratory and 25 students at the MIT Sloan business school were the subjects of the experiment, where the times of phone calls and SMS messages were recorded.
For each subject in the study, we treat the time at which an outgoing call was made as an event time.
We focus on the core group of 86 people who made outgoing calls during the period between 24 September 2004 and 8 January 2005 [2].
[2] Data obtained from http://sociograph.blogspot.ie/2011/04/communication-networks-part-2-mit.html

Application: Reality Mining Data Set (figure slide)

Application: Reality Mining Data Set
Figure: Jackknife estimates of standard errors for the Reality Mining data set.