Bayesian non-parametric model to longitudinally predict churn


1 Bayesian non-parametric model to longitudinally predict churn. Bruno Scarpa, Università di Padova. Conference of European Statistics Stakeholders: Methodologists, Producers and Users of European Statistics. Rome, 25 November 2014

2 Churn analysis
A typical problem for many companies with a large customer base is the evaluation of customer loyalty: which customers are most likely to abandon the company? These customers are often described as churners.
The problem is prominent in sectors where customers have ongoing relationships with companies, i.e. service companies: banks, insurance companies, telecommunications services, etc.
Good models are needed for predicting deactivation (churn) by customers, so that appropriate retention actions can be carried out later on. A model is needed not only to fit the data and predict future data, but also, possibly, to suggest marketing actions, e.g. customer retention strategies.

10 Churn analysis
Goal: find for each customer a score of propensity to churn.
Understand which variables affect the customer's decision to churn, and measure these effects; here, understanding effects matters more than predictive accuracy.
Typically, a data mining model is fitted to a random sample of customer-base data. In this work we consider the prediction of churn for the customer base of a telecommunications company.

15 Sources: socio-demographic data; subscription data; usage & network data; call-center data (calls, complaints, billing problems)

16 Longitudinal data
In service companies, data are often collected at different time instants; for example, monthly telephone traffic is considered an important predictor of churn.
These longitudinal data are rarely used in this form to predict churn (typically only some sort of index number is used).
One way to handle this type of data is to treat traffic as functional data, and use tools for analysing the relationship between a functional predictor and a binary response.

20 Goal of analysis
Determine whether patterns of phone traffic (number, duration, value of monthly calls) are related to churn. Here the outcome, churn, is univariate, and the predictor, phone traffic, is longitudinal.
How to characterize a pattern of traffic? The number, duration and value measures of traffic are examples of functional predictors: a random curve that varies over time, space, or some other domain, with observations at every point in the domain, although it may be measured only at a finite set of points.

23 General problem and data structure
Interest: the relationship between the functional predictor f_i and the response z_i (inference and prediction).
The predictor f_i takes value f_i(t) at location t in {1, ..., T}. Data consist of {y_i, z_i}, i = 1, ..., n, with y_i = (y_i1, ..., y_iT)^T, where
- y_ij = error-prone measure of f_i(t_ij) (telephone traffic at month t_ij)
- t_ij = location (or time) of observation j
- n_i = number of observations on subject i
- z_i = response variable (churn)
In addition we may have x_i = static predictor variables (age, sex, ...).
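In array form, the data structure above is simply a traffic matrix, a churn vector and a static-predictor matrix; the sketch below uses fabricated values, with sizes matching the application (3000 customers, 9 months) and an illustrative number of static predictors:

```python
import numpy as np

rng = np.random.default_rng(6)

n, T, p = 3000, 9, 4       # customers, months, static predictors (p is illustrative)
y = rng.poisson(30.0, size=(n, T)).astype(float)  # y_ij: error-prone monthly traffic measures
z = rng.binomial(1, 0.1, size=n)                  # z_i: churn indicator
x = rng.normal(size=(n, p))                       # x_i: static predictors (age, sex, ...)
```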

27 Latent class trajectory model
Group-based trajectory models are used to identify clusters of subjects following similar trajectories over time. While we may not believe that each subject's phone traffic measures exactly follow one of K curves, this may be a very useful summary of the data.

29 Issues
- A good choice of parametric form for the latent trajectory curves is often unclear (prefer a nonparametric form?)
- iid N(0, σ^2) residuals are restrictive and may imply a large number of latent classes
- The number of latent classes is unknown, and BIC criteria may perform poorly

32 Our interest
- Use a semiparametric Bayes joint modelling framework
- For ease of interpretation, group individuals into functional predictor clusters (patterns of traffic), with the number of clusters not specified in advance
- Allow the response (churn) distribution to vary nonparametrically across clusters
- Conduct inferences on changes in churn

36 The data
- 3000 post-paid SIM cards
- number of outgoing calls for 9 consecutive months
- socio-demographic characteristics (sex, age, ...) and contract-related characteristics (services, payment method, ...)
- churn status (active/deactivated) after three months

37 From data to model
- Flexibility to capture irregularities, if present: non-parametric modelling
- Estimate the variability both between functional curves and between output distributions

38 Bayesian non-parametric approach
We follow Bigelow and Dunson (2009), with some modifications:
- we use Gaussian processes as baseline measures (Bigelow and Dunson, 2009, use spline functions)
- in the estimation algorithm we use a nested Metropolis-Hastings step
By-product: functional clustering.

39 Joint model
Joint modelling of {y_i, x_i, z_i}, i = 1, ..., n:
1 specification of a model for each component, y_i and z_i | x_i
2 specification of a joint prior for the parameters of the two models

40 Components of the model
Model for the output (churn), a GLM:
z_i ~ Bin(1, π_i),   π_i = exp(ξ_i) / (1 + exp(ξ_i)),   ξ_i = a_i + x_i^T γ
Model for the trajectory:
y_i(t) = f_i(t) + ε_it,   ε_it ~ N(0, τ^{-1}),   f_i ~ G
Joint model:
θ_i = {f_i, a_i} ~ P
The dependence between the functional predictor f_i in Ω and the response z_i in R is characterised through P, a random probability measure on (R^{T+1}, B).
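The two components can be made concrete with a small numerical sketch; all parameter values below (γ, a_i, and the stand-in curve for f_i) are hypothetical, chosen only to show how ξ_i, π_i and the noisy trajectory fit together:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 9                           # months of observed traffic
tau = 4.0                       # error precision: Var(eps_it) = 1/tau
gamma = np.array([0.8, -0.5])   # hypothetical static-variable effects
x_i = np.array([1.0, 0.3])      # hypothetical static predictors for customer i
a_i = -1.2                      # customer-specific intercept (part of theta_i)

# Trajectory component: y_i(t) = f_i(t) + eps_it, eps_it ~ N(0, 1/tau)
f_i = 10.0 + 2.0 * np.sin(np.arange(1, T + 1) / 2.0)  # stand-in for f_i ~ G
y_i = f_i + rng.normal(0.0, 1.0 / np.sqrt(tau), size=T)

# Output component (GLM): logit(pi_i) = xi_i = a_i + x_i' gamma
xi_i = a_i + x_i @ gamma
pi_i = np.exp(xi_i) / (1.0 + np.exp(xi_i))
z_i = rng.binomial(1, pi_i)     # churn indicator
```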

44 Gaussian process
To simplify modelling we give the function f_i a Gaussian process prior, f_i(t) ~ GP(µ, C), where µ is the mean function and C is the covariance function.
Considering the discrete sequence of times (9 observations in our data), the GP induces a multivariate normal distribution on the observed points of the process, f_1, ..., f_T.

46 Gaussian process
Samples from a GP can take a very wide variety of shapes, with limited sensitivity to the mean function; we allow an unknown, fixed mean, to avoid sensitivity to the scale of the phone traffic (this still allows a very wide variety of trajectory shapes).
The covariance function C controls the types of shapes observed. We use the exponential covariance function, as it allows a wide variety of functional shapes (the squared exponential may overly favour smooth functions):
C(t, t') = (1/κ_1) exp(-|t - t'| / κ_2)
where κ_1 and κ_2 are unknown parameters.
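The exponential covariance is easy to evaluate on the 9-month grid; the sketch below (with illustrative κ values, which in the full model get Gamma priors) builds C and draws trajectories from the induced multivariate normal:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 9
t = np.arange(1, T + 1, dtype=float)
kappa1, kappa2 = 2.0, 3.0       # illustrative values of the unknown parameters

# C(t, t') = (1/kappa1) * exp(-|t - t'| / kappa2)
C = np.exp(-np.abs(t[:, None] - t[None, :]) / kappa2) / kappa1

# On the discrete grid the GP is just a T-variate normal
mu = np.full(T, 10.0)           # unknown fixed mean; a constant here for illustration
paths = rng.multivariate_normal(mu, C, size=5)   # 5 sampled trajectories
```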

50 Dirichlet process joint models
θ_i = {f_i, a_i} ~ P
A natural approach is to let P be unknown, with P ~ DP(αP_0), where DP(αP_0) denotes the Dirichlet process (Ferguson, 1973) with α the precision parameter and P_0 the base probability measure.

53 Dirichlet process joint models
Stick-breaking representation (Sethuraman, 1994):
P = sum_{h=1}^∞ π_h δ_{θ*_h},   θ*_h ~ P_0
where δ_θ is the Dirac probability measure on the atom θ, and
π_h = V_h prod_{l=1}^{h-1} (1 - V_l),   V_h ~ Beta(1, α) iid
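The stick-breaking weights can be simulated directly; a truncated sketch (the truncation level H is chosen for illustration, and a standard normal stands in for the base measure P_0):

```python
import numpy as np

rng = np.random.default_rng(2)

alpha = 1.0    # DP precision parameter
H = 1000       # truncation level (illustrative)

# V_h ~ Beta(1, alpha); pi_h = V_h * prod_{l<h} (1 - V_l)
V = rng.beta(1.0, alpha, size=H)
remaining = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))  # stick left before break h
pi = V * remaining

# Atoms theta*_h ~ P0 (standard normal as a stand-in base measure)
theta_star = rng.normal(size=H)
```

With α = 1 the first few weights already carry almost all the mass, which is why DP mixtures favour a small number of occupied clusters.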

54 Dirichlet process joint models
Base measure: P_0 = GP(µ, C) × N(0, ν^{-1}), i.e.
P_0 = N_{T+1}( [µ, 0]^T, [[C, 0], [0^T, ν^{-1}]] )

56 Prior distributions
The Bayesian specification is completed with priors:
- ν ~ Ga(a_ν, b_ν): precision, response component
- τ ~ Ga(a_τ, b_τ): precision of the error in the predictor component
- κ_1 ~ Ga(a_κ1, b_κ1) and κ_2 ~ Ga(a_κ2, b_κ2): covariance-function parameters for the GP
- γ_l ~ N(γ_0, η_l^{-1}): static variable effects
- η_l ~ Ga(a_η, b_η), l = 1, ..., p: precisions for the static variable effects

57 Dirichlet process joint models
This DP prior induces the following Blackwell and MacQueen (1973) Pólya urn scheme:
(θ_i | θ_1, ..., θ_{i-1}) ~ (α / (α + i - 1)) P_0 + sum_{j=1}^{i-1} (1 / (α + i - 1)) δ_{θ_j}
where δ_θ is the measure concentrated at θ.
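The urn can be simulated sequentially, which also shows the induced clustering (atoms repeat, so ties define groups); again a standard normal stands in for P_0:

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = 1.0
n = 200
theta = np.empty(n)

# theta_1 ~ P0; then theta_i is a fresh draw from P0 w.p. alpha/(alpha+i-1),
# otherwise a copy of a uniformly chosen previous theta_j
theta[0] = rng.normal()
for i in range(1, n):                    # i previous draws exist at this step
    if rng.random() < alpha / (alpha + i):
        theta[i] = rng.normal()          # fresh atom from P0
    else:
        theta[i] = theta[rng.integers(i)]  # tie with an earlier unit

n_clusters = len(np.unique(theta))       # random, roughly alpha * log(n) on average
```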

58 Comments on DP joint models
- Subjects are automatically grouped into an unknown number of functional trajectory clusters.
- Cluster h has functional trajectory f*_h(t) ~ GP(µ_h, C) and response density Bin(1, π_h), with logit(π_h) = a*_h + x^T γ.
- The marginal density of z is a mixture of Bernoullis; within a predictor cluster, the density of z is a single Bernoulli.
- The DP assumes identical clusters in the predictor and the response.

63 Posterior distribution
- Gibbs sampling is straightforward to implement, involving simple steps for sampling from standard distributions, but it is highly computationally intensive.
- P is almost surely discrete: clustering of the sample units (customers) without first specifying the number of groups.
- MCMC algorithm (Pólya urn + Gibbs sampler + Metropolis-Hastings); at each iteration:
1 allocate the units to the groups
2 update the group parameters (nested Metropolis-Hastings)
3 update the hyperparameters of the prior distributions
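The allocate/update loop can be sketched on a deliberately simplified model: a DP mixture of univariate normals with known variance, marginal (Pólya urn) Gibbs allocation and conjugate cluster-mean updates. This is not the paper's algorithm, which adds the GP trajectories, the logistic response and the nested Metropolis-Hastings step; it only illustrates the structure of steps 1-2:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: two well-separated groups
y = np.concatenate([rng.normal(-5, 1, 20), rng.normal(5, 1, 20)])
n = len(y)
alpha, sigma2, tau02 = 1.0, 1.0, 25.0   # DP precision, known variance, base-measure variance

def norm_pdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

c = np.zeros(n, dtype=int)              # start with everyone in one cluster
theta = {0: y.mean()}                   # cluster means (the atoms)

for sweep in range(50):
    # Step 1: re-allocate each unit via the Polya urn full conditionals
    for i in range(n):
        c_i = c[i]
        c[i] = -1
        if not np.any(c == c_i):
            del theta[c_i]              # drop a cluster emptied by removing unit i
        labels = list(theta)
        counts = np.array([np.sum(c == k) for k in labels], dtype=float)
        probs = np.append(
            counts * norm_pdf(y[i], np.array([theta[k] for k in labels]), sigma2),
            alpha * norm_pdf(y[i], 0.0, sigma2 + tau02),  # new cluster, mean integrated out
        )
        pick = rng.choice(len(probs), p=probs / probs.sum())
        if pick < len(labels):
            c[i] = labels[pick]
        else:                           # open a new cluster; draw its mean from the posterior
            new = max(theta, default=-1) + 1
            v = 1.0 / (1.0 / tau02 + 1.0 / sigma2)
            theta[new] = rng.normal(v * y[i] / sigma2, np.sqrt(v))
            c[i] = new
    # Step 2: update each cluster mean from its conjugate normal posterior
    for k in list(theta):
        yk = y[c == k]
        v = 1.0 / (1.0 / tau02 + len(yk) / sigma2)
        theta[k] = rng.normal(v * yk.sum() / sigma2, np.sqrt(v))

n_clusters = len(theta)                 # number of groups is inferred, not fixed
```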

67 Results interpretation
Label switching is a pain: the number and composition of the groups change across algorithm iterations, so the output is not directly usable for clustering. This can be addressed by post-processing (Medvedovic and Sivaganesan, 2002), which adds considerably to the computational burden:
1 obtain a distance matrix between sample units from the posterior output
2 apply hierarchical clustering with complete linkage
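Steps 1-2 above can be sketched as follows, using fabricated MCMC label draws in place of real posterior output (scipy assumed available); note that the co-clustering proportions are invariant to label switching, which is the point of this post-processing:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(5)

# Fabricated posterior output: cluster labels for n units over m MCMC iterations
# (two underlying groups, with labels occasionally permuted to mimic label switching)
n, m = 30, 200
true = np.repeat([0, 1], n // 2)
draws = np.empty((m, n), dtype=int)
for s in range(m):
    lab = true.copy()
    flip = rng.random(n) < 0.05          # small allocation noise
    lab[flip] = 1 - lab[flip]
    if rng.random() < 0.5:               # label switching: group names swapped
        lab = 1 - lab
    draws[s] = lab

# Step 1: pairwise posterior co-clustering proportions -> distance matrix
S = np.mean(draws[:, :, None] == draws[:, None, :], axis=0)
D = 1.0 - S

# Step 2: complete-linkage hierarchical clustering on the distance matrix
Z = linkage(squareform(D, checks=False), method="complete")
groups = fcluster(Z, t=2, criterion="maxclust")
```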

72 Estimated trajectories and probabilities

73 Some clusters

74 The static variables
p1, p2, p3, p4, p5: tariff plan; m1, m2, m3: payment method; e1, e2, e3: age

75 Lift improvement factor
[Figure: lift improvement factor as a function of the fraction of predicted subjects, comparing the proposed model with balanced and unbalanced logistic models, balanced and unbalanced linear models, discriminant analysis, classification trees, MARS, GAM, SVM, random forests, bagging and boosting]



More information

Lecture 3a: Dirichlet processes

Lecture 3a: Dirichlet processes Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics

More information

39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017

39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 Permuted and IROM Department, McCombs School of Business The University of Texas at Austin 39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 1 / 36 Joint work

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Nonparametric Bayes Modeling

Nonparametric Bayes Modeling Nonparametric Bayes Modeling Lecture 6: Advanced Applications of DPMs David Dunson Department of Statistical Science, Duke University Tuesday February 2, 2010 Motivation Functional data analysis Variable

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

A Fully Nonparametric Modeling Approach to. BNP Binary Regression A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation

More information

Bayesian nonparametrics

Bayesian nonparametrics Bayesian nonparametrics 1 Some preliminaries 1.1 de Finetti s theorem We will start our discussion with this foundational theorem. We will assume throughout all variables are defined on the probability

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Image segmentation combining Markov Random Fields and Dirichlet Processes

Image segmentation combining Markov Random Fields and Dirichlet Processes Image segmentation combining Markov Random Fields and Dirichlet Processes Jessica SODJO IMS, Groupe Signal Image, Talence Encadrants : A. Giremus, J.-F. Giovannelli, F. Caron, N. Dobigeon Jessica SODJO

More information

Research Article Spiked Dirichlet Process Priors for Gaussian Process Models

Research Article Spiked Dirichlet Process Priors for Gaussian Process Models Hindawi Publishing Corporation Journal of Probability and Statistics Volume 200, Article ID 20489, 4 pages doi:0.55/200/20489 Research Article Spiked Dirichlet Process Priors for Gaussian Process Models

More information

Dirichlet Processes: Tutorial and Practical Course

Dirichlet Processes: Tutorial and Practical Course Dirichlet Processes: Tutorial and Practical Course (updated) Yee Whye Teh Gatsby Computational Neuroscience Unit University College London August 2007 / MLSS Yee Whye Teh (Gatsby) DP August 2007 / MLSS

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Univariate Spatial Data Hierarchical Modelling for Univariate Spatial Data Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department

More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Nonparametric Bayes regression and classification through mixtures of product kernels

Nonparametric Bayes regression and classification through mixtures of product kernels Nonparametric Bayes regression and classification through mixtures of product kernels David B. Dunson & Abhishek Bhattacharya Department of Statistical Science Box 90251, Duke University Durham, NC 27708-0251,

More information

Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Flexible Regression Modeling using Bayesian Nonparametric Mixtures Flexible Regression Modeling using Bayesian Nonparametric Mixtures Athanasios Kottas Department of Applied Mathematics and Statistics University of California, Santa Cruz Department of Statistics Brigham

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Bayesian Nonparametrics: Dirichlet Process

Bayesian Nonparametrics: Dirichlet Process Bayesian Nonparametrics: Dirichlet Process Yee Whye Teh Gatsby Computational Neuroscience Unit, UCL http://www.gatsby.ucl.ac.uk/~ywteh/teaching/npbayes2012 Dirichlet Process Cornerstone of modern Bayesian

More information

Efficient Bayesian Multivariate Surface Regression

Efficient Bayesian Multivariate Surface Regression Efficient Bayesian Multivariate Surface Regression Feng Li (joint with Mattias Villani) Department of Statistics, Stockholm University October, 211 Outline of the talk 1 Flexible regression models 2 The

More information

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles Jeremy Gaskins Department of Bioinformatics & Biostatistics University of Louisville Joint work with Claudio Fuentes

More information

A general mixed model approach for spatio-temporal regression data

A general mixed model approach for spatio-temporal regression data A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

Normalized kernel-weighted random measures

Normalized kernel-weighted random measures Normalized kernel-weighted random measures Jim Griffin University of Kent 1 August 27 Outline 1 Introduction 2 Ornstein-Uhlenbeck DP 3 Generalisations Bayesian Density Regression We observe data (x 1,

More information

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles Peter Green and John Lau University of Bristol P.J.Green@bristol.ac.uk Isaac Newton Institute, 11 December

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Nonparametric Bayesian modeling for dynamic ordinal regression relationships

Nonparametric Bayesian modeling for dynamic ordinal regression relationships Nonparametric Bayesian modeling for dynamic ordinal regression relationships Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz Joint work with Maria

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Outline. Clustering. Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models

Outline. Clustering. Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models Capturing Unobserved Heterogeneity in the Austrian Labor Market Using Finite Mixtures of Markov Chain Models Collaboration with Rudolf Winter-Ebmer, Department of Economics, Johannes Kepler University

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Non-parametric Clustering with Dirichlet Processes

Non-parametric Clustering with Dirichlet Processes Non-parametric Clustering with Dirichlet Processes Timothy Burns SUNY at Buffalo Mar. 31 2009 T. Burns (SUNY at Buffalo) Non-parametric Clustering with Dirichlet Processes Mar. 31 2009 1 / 24 Introduction

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

Riemann Manifold Methods in Bayesian Statistics

Riemann Manifold Methods in Bayesian Statistics Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Bayesian Nonparametric Regression for Diabetes Deaths

Bayesian Nonparametric Regression for Diabetes Deaths Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,

More information

MULTILEVEL IMPUTATION 1

MULTILEVEL IMPUTATION 1 MULTILEVEL IMPUTATION 1 Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation This document gives technical details of the full conditional distributions used to draw regression

More information

An Alternative Infinite Mixture Of Gaussian Process Experts

An Alternative Infinite Mixture Of Gaussian Process Experts An Alternative Infinite Mixture Of Gaussian Process Experts Edward Meeds and Simon Osindero Department of Computer Science University of Toronto Toronto, M5S 3G4 {ewm,osindero}@cs.toronto.edu Abstract

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information

Dynamic Generalized Linear Models

Dynamic Generalized Linear Models Dynamic Generalized Linear Models Jesse Windle Oct. 24, 2012 Contents 1 Introduction 1 2 Binary Data (Static Case) 2 3 Data Augmentation (de-marginalization) by 4 examples 3 3.1 Example 1: CDF method.............................

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

The Bayes classifier

The Bayes classifier The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

More information

Logistic Regression. Seungjin Choi

Logistic Regression. Seungjin Choi Logistic Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Jeffrey S. Morris University of Texas, MD Anderson Cancer Center Joint wor with Marina Vannucci, Philip J. Brown,

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

A Brief Overview of Nonparametric Bayesian Models

A Brief Overview of Nonparametric Bayesian Models A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine

More information

Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources

Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources th International Conference on Information Fusion Chicago, Illinois, USA, July -8, Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources Priyadip Ray Department of Electrical

More information

A Nonparametric Bayesian Model for Multivariate Ordinal Data

A Nonparametric Bayesian Model for Multivariate Ordinal Data A Nonparametric Bayesian Model for Multivariate Ordinal Data Athanasios Kottas, University of California at Santa Cruz Peter Müller, The University of Texas M. D. Anderson Cancer Center Fernando A. Quintana,

More information

A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models

A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models William Barcella 1, Maria De Iorio 1 and Gianluca Baio 1 1 Department of Statistical Science,

More information

Partial factor modeling: predictor-dependent shrinkage for linear regression

Partial factor modeling: predictor-dependent shrinkage for linear regression modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Index. Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables.

Index. Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables. Index Pagenumbersfollowedbyf indicate figures; pagenumbersfollowedbyt indicate tables. Adaptive rejection metropolis sampling (ARMS), 98 Adaptive shrinkage, 132 Advanced Photo System (APS), 255 Aggregation

More information

Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets

Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets Athanasios Kottas Department of Applied Mathematics and Statistics,

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

A Process over all Stationary Covariance Kernels

A Process over all Stationary Covariance Kernels A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Dirichlet Process Mixtures of Generalized Linear Models

Dirichlet Process Mixtures of Generalized Linear Models Lauren A. Hannah David M. Blei Warren B. Powell Department of Computer Science, Princeton University Department of Operations Research and Financial Engineering, Princeton University Department of Operations

More information

Chapter 2. Data Analysis

Chapter 2. Data Analysis Chapter 2 Data Analysis 2.1. Density Estimation and Survival Analysis The most straightforward application of BNP priors for statistical inference is in density estimation problems. Consider the generic

More information

Chart types and when to use them

Chart types and when to use them APPENDIX A Chart types and when to use them Pie chart Figure illustration of pie chart 2.3 % 4.5 % Browser Usage for April 2012 18.3 % 38.3 % Internet Explorer Firefox Chrome Safari Opera 35.8 % Pie chart

More information

Heriot-Watt University
