Bayesian Nonparametric Regression through Mixture Models


1 Bayesian Nonparametric Regression through Mixture Models. Sara Wade, Bocconi University. Advisor: Sonia Petrone. October 7, 2013.

2 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

3 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

4 Introduction Aim: flexible regression, where both the mean and error distribution may evolve flexibly with x.

5 Introduction Aim: flexible regression, where both the mean and error distribution may evolve flexibly with x. Many approaches exist for flexible regression (e.g. splines, wavelets, Gaussian processes), but they typically assume i.i.d. Gaussian errors, $Y_i = m(x_i) + \epsilon_i$, $\epsilon_i \mid \sigma^2 \overset{iid}{\sim} N(0, \sigma^2)$.

6 Introduction Aim: flexible regression, where both the mean and error distribution may evolve flexibly with x. Many approaches exist for flexible regression (e.g. splines, wavelets, Gaussian processes), but they typically assume i.i.d. Gaussian errors, $Y_i = m(x_i) + \epsilon_i$, $\epsilon_i \mid \sigma^2 \overset{iid}{\sim} N(0, \sigma^2)$. For i.i.d. data, mixtures are a useful tool for flexible density estimation.

7 Introduction Aim: flexible regression, where both the mean and error distribution may evolve flexibly with x. Many approaches exist for flexible regression (e.g. splines, wavelets, Gaussian processes), but they typically assume i.i.d. Gaussian errors, $Y_i = m(x_i) + \epsilon_i$, $\epsilon_i \mid \sigma^2 \overset{iid}{\sim} N(0, \sigma^2)$. For i.i.d. data, mixtures are a useful tool for flexible density estimation. We extend mixture models to conditional density estimation.

8 Mixture Models Given $f_P$, assume the $Y_i$ are i.i.d. with density $f_P(y) = \int K(y \mid \theta)\, dP(\theta)$, for some parametric density $K(y \mid \theta)$.

9 Mixture Models Given $f_P$, assume the $Y_i$ are i.i.d. with density $f_P(y) = \int K(y \mid \theta)\, dP(\theta)$, for some parametric density $K(y \mid \theta)$. In the Bayesian setting, define a prior on $P$, where typically $P$ is discrete a.s., $P = \sum_{j=1}^{J} w_j \delta_{\theta_j}$, so that $f_P(y) = \sum_{j=1}^{J} w_j K(y \mid \theta_j)$.

10 Mixture Models Given $f_P$, assume the $Y_i$ are i.i.d. with density $f_P(y) = \int K(y \mid \theta)\, dP(\theta)$, for some parametric density $K(y \mid \theta)$. In the Bayesian setting, define a prior on $P$, where typically $P$ is discrete a.s., $P = \sum_{j=1}^{J} w_j \delta_{\theta_j}$, so that $f_P(y) = \sum_{j=1}^{J} w_j K(y \mid \theta_j)$. How many components? Fixed $J$: overfitted mixtures (Rousseau and Mengersen, 2011) or post-processing techniques. Random $J$: reversible jump MCMC. $J = \infty$: Dirichlet process (DP) mixtures and developments in BNP.
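To make the $J = \infty$ case concrete, here is a minimal sketch (not from the talk; the Gaussian kernel, $N(0, 2^2)$ base measure, and $\alpha = 1$ are illustrative assumptions) that draws a random density $f_P$ from a DP mixture prior via truncated stick-breaking:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def draw_dp_mixture(alpha=1.0, J=50):
    """Truncated stick-breaking draw of P = sum_j w_j delta_{theta_j}."""
    v = rng.beta(1.0, alpha, size=J)                        # stick-breaking fractions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    theta = rng.normal(0.0, 2.0, size=J)                    # atoms theta_j ~ P_0 = N(0, 2^2)
    return w, theta

w, theta = draw_dp_mixture()
y_grid = np.linspace(-6, 6, 200)
# f_P(y) = sum_j w_j K(y | theta_j), with K(y | theta) = N(y | theta, 1)
f = np.sum(w[:, None] * norm.pdf(y_grid[None, :], loc=theta[:, None], scale=1.0), axis=0)
```

Truncation at a moderate $J$ leaves only negligible unassigned stick mass, so the draw is effectively from the DP mixture prior.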

11 Extending Mixture Models Two main lines of research: Joint approach: Augment the observations to include $X$ and assume that, given $f_P$, the $(Y_i, X_i)$ are i.i.d. with density $f_P(y, x) = \int K(y, x \mid \theta)\, dP(\theta) = \sum_{j=1}^{\infty} w_j K(y, x \mid \theta_j)$. Conditional density estimates are obtained as a by-product through $f_P(y \mid x) = \frac{\sum_{j=1}^{\infty} w_j K(y, x \mid \theta_j)}{\sum_{j'=1}^{\infty} w_{j'} K(x \mid \theta_{j'})}$. Conditional approach: Allow the mixing measure to depend on $x$ and assign a prior so that $\{P_x\}_{x \in \mathcal{X}}$ are dependent. Assume that, given $f_{P_x}$ and $x$, the $Y_i$ are independent with density $f_{P_x}(y \mid x) = \int K(y \mid \theta)\, dP_x(\theta) = \sum_{j=1}^{\infty} w_j(x) K(y \mid \theta_j(x))$.
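As a toy numerical illustration of the joint approach (the product kernel and all parameters below are assumptions for the example, not from the slide), the conditional density estimate is a ratio of two mixture sums once $P$ is truncated to finitely many atoms:

```python
import numpy as np
from scipy.stats import norm

# Toy truncated mixture: 3 components, product kernel K(y,x|theta_j)
w = np.array([0.5, 0.3, 0.2])
mu_y = np.array([-2.0, 0.0, 3.0])
mu_x = np.array([-1.0, 1.0, 2.0])

def f_cond(y, x):
    """f_P(y|x) = sum_j w_j K(y,x|theta_j) / sum_j' w_j' K(x|theta_j')."""
    kx = norm.pdf(x, mu_x, 1.0)               # K(x | theta_j)
    joint = w * norm.pdf(y, mu_y, 1.0) * kx   # w_j K(y, x | theta_j)
    return joint.sum() / (w * kx).sum()

print(f_cond(0.5, 1.0))
```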

12 Mixture Models for Regression Many proposals in the literature fall into these two general categories: 1. Joint approach: Müller, Erkanli, and West (1996), Shahbaba and Neal (2009), Hannah, Blei, and Powell (2011), Park and Dunson (2010), Müller and Quintana (2010), Bhattacharya, Page, and Dunson (2012), Quintana (2011). 2. Conditional approach: a. Covariate-dependent atoms: MacEachern (1999; 2000; 2001), De Iorio et al. (2004), Gelfand, Kottas, and MacEachern (2005), Rodriguez and ter Horst (2008), De la Cruz, Quintana, and Müller (2007), De Iorio et al. (2009), Jara et al. (2010)... Example: $f_{P_x}(y \mid x) = \sum_{j=1}^{\infty} w_j N(y \mid \mu_j(x), \sigma_j^2)$. b. Covariate-dependent weights: Griffin and Steel (2006; 2010), Dunson and Park (2008), Ren et al. (2011), Chung and Dunson (2009), Rodriguez and Dunson (2011)... (also models based on covariate-dependent urn schemes or partitions). Example: $f_{P_x}(y \mid x) = \sum_{j=1}^{\infty} w_j(x) N(y \mid x\beta_j, \sigma_j^2)$.

13 Mixture Models for Regression How do we choose among the large number of proposals for the application at hand?

14 Mixture Models for Regression How do we choose among the large number of proposals for the application at hand? Computational reasons. Frequentist properties: consistency (Hannah, Blei, and Powell, 2011; Norets and Pelenis, 2012; Pati, Dunson, and Tokdar, 2012), but consistency results may hide what happens in finite samples. Convergence rates?

15 Mixture Models for Regression How do we choose among the large number of proposals for the application at hand? Computational reasons. Frequentist properties: consistency (Hannah, Blei, and Powell, 2011; Norets and Pelenis, 2012; Pati, Dunson, and Tokdar, 2012), but consistency results may hide what happens in finite samples. Convergence rates? Our aim is to answer this question from a Bayesian perspective. We do so through a detailed study of predictive performance based on finite samples, identifying potential sources of improvement and developing novel models to improve predictions.

16 Motivating Application Motivating application: a study of Alzheimer's Disease based on neuroimaging data. A definite diagnosis isn't known until autopsy, and current diagnosis is based on clinical information. Aim: show that neuroimages can be used to improve diagnostic accuracy and may be better outcome measures for clinical trials, particularly in early stages of the disease. Prior knowledge, the need for dimension reduction, and non-homogeneity motivate a BNP approach.

17 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

18 EDP (with S. Mongelluzzo and S. Petrone) In many applications, the DP is a prior for a multivariate random probability measure. The DP has many desirable properties (easy elicitation of the parameters $\alpha$, $P_0$; conjugacy; large support). However, when considering the set of probability measures on $\mathbb{R}^k$, a DP prior can be quite restrictive in the sense that the variability is determined by a single parameter, $\alpha$. Enriched conjugate priors (Consonni and Veronese, 2001) have been proposed to address an analogous lack of flexibility of standard conjugate priors in a parametric setting. Solution: construct an enriched DP that is more flexible, yet still conjugate, by extending the idea of enriched conjugate priors to a nonparametric setting.

19 EDP: Definition Sample space: $\Theta \times \Psi$, where $\Theta$, $\Psi$ are complete separable metric spaces. Enriched Dirichlet Process: let $P_0$ be a probability measure on $\Theta \times \Psi$, $\alpha_\theta > 0$, and $\alpha_\psi(\theta) > 0$ for all $\theta \in \Theta$. Assume: 1. $P_\theta \sim \mathrm{DP}(\alpha_\theta P_{0\theta})$. 2. For every $\theta \in \Theta$, $P_{\psi\mid\theta}(\cdot \mid \theta) \sim \mathrm{DP}(\alpha_\psi(\theta) P_{0\psi\mid\theta}(\cdot \mid \theta))$. 3. The $P_{\psi\mid\theta}(\cdot \mid \theta)$, $\theta \in \Theta$, are independent among themselves. 4. $P_\theta$ is independent of $\{P_{\psi\mid\theta}(\cdot \mid \theta)\}_{\theta \in \Theta}$. The law of the random joint measure is obtained from the joint law of the marginal and conditionals through the mapping $(P_\theta, P_{\psi\mid\theta}) \mapsto \int_{(\cdot)} P_{\psi\mid\theta}(\cdot \mid \theta)\, dP_\theta(\theta)$. Many more precision parameters, yet conjugacy and many other desirable properties are maintained.
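A hedged sketch of a (truncated) draw from an EDP on $\Theta \times \Psi$, following the definition above: one DP for the marginal of $\theta$ and an independent DP for $\psi$ within each $\theta$-atom. Standard normal base measures and a constant $\alpha_\psi(\theta) = \alpha_\psi$ are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_break(alpha, J):
    v = rng.beta(1.0, alpha, size=J)
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

def draw_edp(alpha_theta=1.0, alpha_psi=1.0, J=25, L=25):
    w = stick_break(alpha_theta, J)            # marginal weights over theta
    theta = rng.normal(size=J)                 # theta_j ~ P_{0,theta} = N(0,1)
    w_cond = np.array([stick_break(alpha_psi, L) for _ in range(J)])
    psi = rng.normal(size=(J, L))              # psi_{jl} ~ P_{0,psi} = N(0,1)
    return w, theta, w_cond, psi

w, theta, w_cond, psi = draw_edp()
joint_w = w[:, None] * w_cond                  # P(theta_j, psi_{jl}) = w_j * w_{l|j}
print(joint_w.sum())                           # ~1, up to truncation error
```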

20 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

21 Summary DP mixture models are popular tools in BNP, with known properties and an analytically computable urn scheme. However, as p increases, x tends to dominate the posterior of the random partition, negatively affecting prediction. Solution: replace the DP with the EDP to allow local clustering, yet maintain an analytically computable urn scheme.

22 Joint DP mixture model Model (Müller et al. (1996)): $f_P(y, x) = \int N(y, x \mid \mu, \Sigma)\, dP(\mu, \Sigma)$, $P \sim \mathrm{DP}(\alpha P_0)$, where $P_0$ is the conjugate normal-inverse Wishart. Problems: requires sampling the full $(p+1) \times (p+1)$ covariance matrix, and the inverse Wishart is poorly parametrized.

23 Joint DP mixture model Model (Müller et al. (1996)): $f_P(y, x) = \int N(y, x \mid \mu, \Sigma)\, dP(\mu, \Sigma)$, $P \sim \mathrm{DP}(\alpha P_0)$, where $P_0$ is the conjugate normal-inverse Wishart. Problems: requires sampling the full $(p+1) \times (p+1)$ covariance matrix, and the inverse Wishart is poorly parametrized. Model (Shahbaba and Neal (2009)): $f_P(y, x) = \int N(y \mid x\beta, \sigma^2_{y|x}) \prod_{h=1}^{p} N(x_h \mid \mu_h, \sigma^2_{x,h})\, dP(\theta, \psi)$, $P \sim \mathrm{DP}(\alpha\, P_{0\theta} \times P_{0\psi})$, where $\theta = (\beta, \sigma^2_{y|x})$, $\psi = (\mu, \sigma^2_x)$, and $P_{0\theta}$, $P_{0\psi}$ are the conjugate normal-inverse gamma priors. Improvements: 1) eases computations, 2) a more flexible prior for the variances, 3) flexible local correlations between Y and X, and 4) easy incorporation of other types of Y and X.

24 Examining the Partition $P$ is discrete a.s. $\Rightarrow$ a random partition of the subjects. Notation: $\rho_n = (s_1, \ldots, s_n)$ (partition); $(\theta^*_j, \psi^*_j)$ (unique parameters); $x^*_j, y^*_j$ (cluster-specific observations). Partition: $p(\rho_n \mid x_{1:n}, y_{1:n}) \propto \alpha^k \prod_{j=1}^{k} \Gamma(n_j) \prod_{j=1}^{k} \Big[ \prod_{h=1}^{p} g_{x,h}(x^*_{j,h}) \Big] g_y(y^*_j \mid x^*_j)$, where, for example, $g_y(y^*_j \mid x^*_j) = \int \prod_{i: s_i = j} K(y_i \mid x_i, \theta)\, dP_{0\theta}(\theta)$.

25 Examining the Partition $P$ is discrete a.s. $\Rightarrow$ a random partition of the subjects. Notation: $\rho_n = (s_1, \ldots, s_n)$ (partition); $(\theta^*_j, \psi^*_j)$ (unique parameters); $x^*_j, y^*_j$ (cluster-specific observations). Partition: $p(\rho_n \mid x_{1:n}, y_{1:n}) \propto \alpha^k \prod_{j=1}^{k} \Gamma(n_j) \prod_{j=1}^{k} \Big[ \prod_{h=1}^{p} g_{x,h}(x^*_{j,h}) \Big] g_y(y^*_j \mid x^*_j)$, where, for example, $g_y(y^*_j \mid x^*_j) = \int \prod_{i: s_i = j} K(y_i \mid x_i, \theta)\, dP_{0\theta}(\theta)$. As $p$ increases, $x$ dominates the partition structure; as $x$ exhibits increasing departures $\Rightarrow$ a large $k$ and small $n_1, \ldots, n_k$ with high posterior probability.
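The following sketch evaluates this (unnormalized, log) partition score under simplifications that are not on the slide: univariate $x$ and $y$, Gaussian kernels with known unit variance, a $N(0, \tau^2)$ prior on each cluster mean, and $g_y$ treated as a plain location-mixture marginal rather than the regression marginal $g_y(y^*_j \mid x^*_j)$:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_normal

def log_marg(z, tau2=4.0):
    """Conjugate marginal g for one cluster: z* ~ N(0, I + tau2 * 11')."""
    n = len(z)
    cov = np.eye(n) + tau2 * np.ones((n, n))
    return multivariate_normal.logpdf(z, mean=np.zeros(n), cov=cov)

def log_partition_score(s, x, y, alpha=1.0):
    """k*log(alpha) + sum_j [log Gamma(n_j) + log g_x + log g_y]."""
    labels = np.unique(s)
    score = len(labels) * np.log(alpha)
    for j in labels:
        idx = (s == j)
        score += gammaln(idx.sum())            # log Gamma(n_j)
        score += log_marg(x[idx]) + log_marg(y[idx])
    return score

x = np.array([0.1, 0.2, 3.0, 3.1]); y = np.array([1.0, 1.1, -2.0, -2.2])
print(log_partition_score(np.array([0, 0, 1, 1]), x, y))  # two tight clusters
print(log_partition_score(np.array([0, 0, 0, 0]), x, y))  # one big cluster
```

Comparing the two scores shows how the posterior trades off the $\Gamma(n_j)$ reward for large clusters against the fit of the within-cluster marginals.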

26 Examining the Prediction How does this affect prediction? $E[Y_{n+1} \mid y_{1:n}, x_{1:n+1}] = \int \Big( w_{k+1}(x_{n+1})\, x_{n+1}\beta_0 + \sum_{j=1}^{k} w_j(x_{n+1})\, x_{n+1}\beta^*_j \Big)\, dP(\rho_n, \theta, \psi \mid y_{1:n}, x_{1:n+1})$, where $w_j(x_{n+1}) = c^{-1} n_j \prod_{h=1}^{p} N(x_{n+1,h} \mid \mu_{j,h}, \sigma^2_{x,j,h})$ for $j = 1, \ldots, k$, and $w_{k+1}(x_{n+1}) = c^{-1} \alpha\, g_x(x_{n+1})$. As $p$ increases: given the partition, predictive estimates are an average over the large number of components; within a component, predictive estimates are unreliable and variable due to small sample sizes; a rigid measure of similarity in the covariate space.

27 An EDP mixture model for regression (with S. Petrone) Model: $f_P(y, x) = \int N(y \mid x\beta, \sigma^2_{y|x}) \prod_{h=1}^{p} N(x_h \mid \mu_h, \sigma^2_{x,h})\, dP(\theta, \psi)$, $P \sim \mathrm{EDP}(\alpha_\theta, \alpha_\psi(\cdot), P_{0\theta} \times P_{0\psi})$.

28 An EDP mixture model for regression (with S. Petrone) Model: $f_P(y, x) = \int N(y \mid x\beta, \sigma^2_{y|x}) \prod_{h=1}^{p} N(x_h \mid \mu_h, \sigma^2_{x,h})\, dP(\theta, \psi)$, $P \sim \mathrm{EDP}(\alpha_\theta, \alpha_\psi(\cdot), P_{0\theta} \times P_{0\psi})$. This defines a nested partition structure $\rho_n = (\rho_{n,y}, (\rho_{n_{j+},x}))$, where $k$ denotes the number of y-clusters, of sizes $n_{j+}$, and $k_j$ denotes the number of x-clusters within the j-th y-cluster, of sizes $n_{j,l}$. Additional notation: $(\theta^*_j, \psi^*_{j,l})$ (unique parameters); $(y^*_j, x^*_{j,l})$ (cluster-specific observations).

29 Examining the Partition Under the assumption that $\alpha_\psi(\theta) = \alpha_\psi$ for all $\theta \in \Theta$, $p(\rho_n \mid x_{1:n}, y_{1:n}) \propto \alpha_\theta^k \prod_{j=1}^{k} \Big[ \alpha_\psi^{k_j} \frac{\Gamma(\alpha_\psi)\Gamma(n_{j+})}{\Gamma(\alpha_\psi + n_{j+})} \prod_{l=1}^{k_j} \Gamma(n_{j,l}) \Big] \prod_{j=1}^{k} g_y(y^*_j \mid x^*_j) \prod_{j=1}^{k} \prod_{l=1}^{k_j} \prod_{h=1}^{p} g_x(x^*_{j,l,h})$. The likelihood of $y$ determines if a coarser y-partition is present $\Rightarrow$ a smaller $k$ and larger $n_{1+}, \ldots, n_{k+}$ with high posterior probability.
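A short sketch of the nested-partition factor on this slide (the prior part only, dropping the $g_y$ and $g_x$ marginal-likelihood terms), with the cluster structure passed in as lists of x-subcluster sizes:

```python
import numpy as np
from scipy.special import gammaln

def log_edp_partition_factor(y_clusters, alpha_theta=1.0, alpha_psi=1.0):
    """y_clusters[j] holds the x-subcluster sizes n_{j,l} within y-cluster j,
    so n_{j+} is the sum of that list and k_j is its length."""
    score = len(y_clusters) * np.log(alpha_theta)          # alpha_theta^k
    for sizes in y_clusters:
        n_jplus = sum(sizes)
        score += len(sizes) * np.log(alpha_psi)            # alpha_psi^{k_j}
        score += gammaln(alpha_psi) + gammaln(n_jplus)     # Gamma(a)Gamma(n_{j+})
        score -= gammaln(alpha_psi + n_jplus)              # / Gamma(a + n_{j+})
        score += sum(gammaln(n) for n in sizes)            # prod_l Gamma(n_{j,l})
    return score

# e.g. two y-clusters: x-subclusters of sizes (3, 2) and (4,)
print(log_edp_partition_factor([[3, 2], [4]]))
```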

30 Examining the Prediction Prediction: $E[Y_{n+1} \mid y_{1:n}, x_{1:n+1}] = \int \Big( w_{k+1}(x_{n+1})\, x_{n+1}\beta_0 + \sum_{j=1}^{k} w_j(x_{n+1})\, x_{n+1}\beta^*_j \Big)\, dP(\rho_n, \theta, \psi \mid y_{1:n}, x_{1:n+1})$, where $w_j(x_{n+1}) = c^{-1} n_{j+} \Big( \frac{\alpha_x}{\alpha_x + n_{j+}}\, g_x(x_{n+1}) + \sum_{l=1}^{k_j} \frac{n_{j,l}}{\alpha_x + n_{j+}} \prod_{h=1}^{p} N(x_{n+1,h} \mid \mu_{l|j,h}, \sigma^2_{x,l|j,h}) \Big)$ and $w_{k+1}(x_{n+1}) = c^{-1} \alpha_y\, g_x(x_{n+1})$. Given the partition, predictive estimates are an average over a smaller number of components; within a component, predictive estimates are based on larger sample sizes; a flexible measure of similarity in the covariate space.

31 Examining the Prediction Prediction: $E[Y_{n+1} \mid y_{1:n}, x_{1:n+1}] = \int \Big( w_{k+1}(x_{n+1})\, x_{n+1}\beta_0 + \sum_{j=1}^{k} w_j(x_{n+1})\, x_{n+1}\beta^*_j \Big)\, dP(\rho_n, \theta, \psi \mid y_{1:n}, x_{1:n+1})$, where $w_j(x_{n+1}) = c^{-1} n_{j+} \Big( \frac{\alpha_x}{\alpha_x + n_{j+}}\, g_x(x_{n+1}) + \sum_{l=1}^{k_j} \frac{n_{j,l}}{\alpha_x + n_{j+}} \prod_{h=1}^{p} N(x_{n+1,h} \mid \mu_{l|j,h}, \sigma^2_{x,l|j,h}) \Big)$ and $w_{k+1}(x_{n+1}) = c^{-1} \alpha_y\, g_x(x_{n+1})$. Given the partition, predictive estimates are an average over a smaller number of components; within a component, predictive estimates are based on larger sample sizes; a flexible measure of similarity in the covariate space. Summary: more reliable and less variable estimates, yet computations remain relatively simple due to the analytically computable urn scheme.

32 Application to AD data Aim: diagnosis of AD. Data: Y, a binary indicator of CN (1) or AD (0); X, summaries of structural Magnetic Resonance Images (sMRI) (p = 15). Model: $Y_i \mid x_i, \beta_i \overset{ind}{\sim} \mathrm{Bern}(\Phi(x_i \beta_i / \sigma_{y,i}))$, $X_i \mid \mu_i, \sigma^2_{x,i} \overset{ind}{\sim} \prod_{h=1}^{p} N(\mu_{i,h}, \sigma^2_{x,i,h})$, $(\beta_i, \mu_i, \sigma^2_{x,i}) \mid P \overset{iid}{\sim} P$, $P \sim Q$, where Q is a DP or EDP with the same base measure and gamma hyperpriors on the precision parameters.

33 Results for n = 185 Posterior mode of k: 16 (DP) vs. 3 (EDP). [Figure: ICV, BV, LHV, RHV; panels (a) DP, (b) EDP.]

34 Results for n = 185 Posterior mode of k: 16 (DP) vs. 3 (EDP). [Figure: ICV, BV, LHV, RHV; panels (c) DP, (d) EDP.] Prediction for m = 192 subjects: number of subjects correctly classified: 159 (DP) vs. 168 (EDP). Tighter credible intervals.

35 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

36 Introduction Aim: Analyse the role of the partition in prediction for BNP mixtures and highlight an understated problem: huge dimension of the partition space.

37 Introduction Aim: Analyse the role of the partition in prediction for BNP mixtures and highlight an understated problem: huge dimension of the partition space. Assume both Y and X are univariate and continuous.

38 Introduction Aim: Analyse the role of the partition in prediction for BNP mixtures and highlight an understated problem: the huge dimension of the partition space. Assume both Y and X are univariate and continuous. The number of ways to partition the n subjects into k groups is $S_{n,k} = \frac{1}{k!} \sum_{j=0}^{k} (-1)^j \binom{k}{j} (k-j)^n$, and the total number of partitions is $B_n = \sum_{k=1}^{n} S_{n,k}$. Many of these partitions are similar $\Rightarrow$ a large number of partitions will fit the data.
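A quick numerical check of these counts (a sketch, not from the talk), computing $S_{n,k}$ by the alternating-sum formula above and $B_n$ by summation:

```python
from math import comb, factorial

def stirling2(n, k):
    """S_{n,k} = (1/k!) sum_{j=0}^{k} (-1)^j C(k, j) (k - j)^n."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n
               for j in range(k + 1)) // factorial(k)

def bell(n):
    return sum(stirling2(n, k) for k in range(1, n + 1))

print(bell(10))            # 115975 partitions of n = 10 subjects
print(2 ** 9)              # 512 ordered (contiguous) partitions
print(2 ** 9 / bell(10))   # ~0.0044, i.e. the 0.44% quoted below
```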

39 Introducing Covariate Information Covariates typically provide information on the partition structure. Incorporating this covariate information (as in models with $w_j(x)$) will improve posterior inference on the partition.

40 Introducing Covariate Information Covariates typically provide information on the partition structure. Incorporating this covariate information (as in models with $w_j(x)$) will improve posterior inference on the partition. But the posterior is still diluted $\Rightarrow$ this information needs to be strictly enforced.

41 Introducing Covariate Information Covariates typically provide information on the partition structure. Incorporating this covariate information (as in models with $w_j(x)$) will improve posterior inference on the partition. But the posterior is still diluted $\Rightarrow$ this information needs to be strictly enforced. If the aim is estimation of the regression function, desirable partitions satisfy an ordering constraint in the $x_i$ $\Rightarrow$ this reduces the total number of partitions to $2^{n-1}$. Note: if $n = 10$, $B_{10} = 115{,}975$ while $2^9 = 512$, i.e. 0.44% of the total partitions; and if $n = 100$, $2^{99}/B_{100} < 10^{-85}$.

42 Restricted DPM (with S.G. Walker and S. Petrone) Let $x_{(1)}, \ldots, x_{(n)}$ denote the ordered values of $x$, and $s_{(1)}, \ldots, s_{(n)}$ the correspondingly ordered values of $s_1, \ldots, s_n$. Prior for the random partition: $p(\rho_n \mid x) = \frac{\Gamma(\alpha)\Gamma(n+1)}{\Gamma(\alpha+n)} \frac{\alpha^k}{k!} \prod_{j=1}^{k} \frac{1}{n_j}\, 1\{s_{(1)} \leq \cdots \leq s_{(n)}\}$. This introduces covariate dependence, greatly reduces the total number of partitions, improves prediction, and speeds up computations.
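A sketch of this prior in code, representing a partition that satisfies the ordering constraint by its contiguous block sizes $(n_1, \ldots, n_k)$ along the covariate ordering (so the indicator is automatically 1); summing over all $2^{n-1}$ contiguous partitions checks that the prior is properly normalized:

```python
import numpy as np
from scipy.special import gammaln

def log_rdpm_prior(block_sizes, alpha=1.0):
    """log p(rho_n | x) for contiguous blocks of sizes n_1, ..., n_k."""
    n, k = sum(block_sizes), len(block_sizes)
    return (gammaln(alpha) + gammaln(n + 1) - gammaln(alpha + n)
            + k * np.log(alpha) - gammaln(k + 1)           # alpha^k / k!
            - sum(np.log(nj) for nj in block_sizes))       # prod_j 1/n_j

n = 4
total = 0.0
for cuts in range(2 ** (n - 1)):    # each bit pattern = one contiguous partition
    sizes, size = [], 1
    for i in range(n - 1):
        if (cuts >> i) & 1:
            sizes.append(size); size = 1
        else:
            size += 1
    sizes.append(size)
    total += np.exp(log_rdpm_prior(sizes))
print(total)                        # ~1.0: normalized over the 2^(n-1) partitions
```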

43 Simulated Example Data are simulated from: $y_i \mid x_i = \begin{cases} x_i/8 + 5 & \text{if } x_i \leq 6, \\ 2x_i - 12 & \text{if } x_i > 6, \end{cases}$ with design points $x_i \in \{0, 0.25, 0.5, \ldots, 8.75, 9\}$.

44 Simulated Ex.: 3 partitions with highest posterior mass [Figure: the three highest-posterior-mass partitions with their probabilities $p(\rho \mid y, x)$; panels (e) DPM, (f) joint DPM, (g) restricted DPM.] Number of partitions with positive posterior probability: 1,263; 918; and 43, respectively.

45 Simulated Example: Prediction [Figure: predictions at x = (3.3, 5.9, 6.2, 6.3, 7.9, 8.1, 10); panels (h) DPM, (i) joint DPM, (j) restricted DPM.] The empirical $L_2$ prediction error, $\big(\frac{1}{m}\sum_{j=1}^{m}(\hat{y}_{m,j,\mathrm{est}} - \hat{y}_{m,j,\mathrm{true}})^2\big)^{1/2}$, for the three models, respectively, is 2.19, 1.66, and 1.09.

46 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

47 Introduction BNP mixture models with covariate-dependent weights: $f_{P_x}(y \mid x) = \int K(y \mid x, \theta)\, dP_x(\theta)$, with $P_x = \sum_{j=1}^{\infty} w_j(x)\, \delta_{\theta_j}$. Construction of covariate-dependent weights (in the literature): $w_j(x) = v_j(x) \prod_{j' < j} (1 - v_{j'}(x))$, where the $v_j(\cdot)$ are independent stochastic processes.

48 Introduction BNP mixture models with covariate-dependent weights: $f_{P_x}(y \mid x) = \int K(y \mid x, \theta)\, dP_x(\theta)$, with $P_x = \sum_{j=1}^{\infty} w_j(x)\, \delta_{\theta_j}$. Construction of covariate-dependent weights (in the literature): $w_j(x) = v_j(x) \prod_{j' < j} (1 - v_{j'}(x))$, where the $v_j(\cdot)$ are independent stochastic processes. Drawbacks: non-interpretable weight functions, typically exhibiting peculiarities; the choice of functional shapes and hyperparameters; extensions to continuous and discrete covariates are not straightforward.

49 Introduction BNP mixture models with covariate-dependent weights: $f_{P_x}(y \mid x) = \int K(y \mid x, \theta)\, dP_x(\theta)$, with $P_x = \sum_{j=1}^{\infty} w_j(x)\, \delta_{\theta_j}$. Construction of covariate-dependent weights (in the literature): $w_j(x) = v_j(x) \prod_{j' < j} (1 - v_{j'}(x))$, where the $v_j(\cdot)$ are independent stochastic processes. Drawbacks: non-interpretable weight functions, typically exhibiting peculiarities; the choice of functional shapes and hyperparameters; extensions to continuous and discrete covariates are not straightforward. Aim: construct interpretable weights that can handle continuous and discrete covariates, with feasible posterior inference.
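For contrast with what follows, here is the stick-breaking construction criticized above, under the assumption of logistic links $v_j(x) = (1 + e^{-(a_j + b_j x)})^{-1}$ (one common choice; the slide leaves the processes $v_j$ unspecified):

```python
import numpy as np

rng = np.random.default_rng(2)

def dependent_weights(x, a, b):
    """w_j(x) = v_j(x) * prod_{j' < j} (1 - v_{j'}(x))."""
    v = 1.0 / (1.0 + np.exp(-(a + b * x)))       # v_j(x) in (0, 1)
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

a, b = rng.normal(size=10), rng.normal(size=10)
print(dependent_weights(0.5, a, b))   # weights at x = 0.5; note the opaque shapes
```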

50 Normalized weights (with S.G. Walker and I. Antoniano) Solution: since $w_j(x)$ represents the weight assigned to regression model $j$ at covariate $x$, an interpretable construction is obtained through Bayes' theorem by combining $P(A \mid j) = \int_A K(x \mid \psi_j)\, dx$, the probability that an observation from regression model $j$ has a covariate value in the region $A \subset \mathcal{X}$, and $P(j) = w_j$, the prior probability that an observation comes from model $j$: $w_j(x) = \frac{w_j K(x \mid \psi_j)}{\sum_{j'=1}^{\infty} w_{j'} K(x \mid \psi_{j'})}$. Inclusion of continuous and discrete covariates is straightforward via an appropriate definition of $K(x \mid \psi)$.
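A minimal sketch of the resulting weights (toy Gaussian kernels and parameters assumed), showing how mass moves toward components whose covariate kernel sits near $x$:

```python
import numpy as np
from scipy.stats import norm

w = np.array([0.6, 0.3, 0.1])                      # prior model probabilities w_j
mu_x = np.array([-1.0, 0.5, 2.0]); sd_x = np.array([0.5, 1.0, 0.7])

def normalized_weights(x):
    """w_j(x) = w_j K(x | psi_j) / sum_j' w_j' K(x | psi_j')."""
    num = w * norm.pdf(x, mu_x, sd_x)              # w_j K(x | psi_j)
    return num / num.sum()

print(normalized_weights(-1.0))                    # favors component 1
print(normalized_weights(2.0))                     # favors component 3
```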

51 Normalized weights: Computations The likelihood contains an intractable normalizing constant: $f_P(y \mid x) = \sum_{j=1}^{\infty} w_j(x) K(y \mid x, \theta_j)$, where $w_j(x) = \frac{w_j K(x \mid \psi_j)}{\sum_{j'=1}^{\infty} w_{j'} K(x \mid \psi_{j'})}$.

52 Normalized weights: Computations The likelihood contains an intractable normalizing constant: $f_P(y \mid x) = \sum_{j=1}^{\infty} w_j(x) K(y \mid x, \theta_j)$, where $w_j(x) = \frac{w_j K(x \mid \psi_j)}{\sum_{j'=1}^{\infty} w_{j'} K(x \mid \psi_{j'})}$. However, using the geometric series, $\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}$ for $|r| < 1$, the likelihood can be written as $f_P(y \mid x) = \sum_{j=1}^{\infty} w_j K(x \mid \psi_j) K(y \mid x, \theta_j) \sum_{k=0}^{\infty} \Big(1 - \sum_{j'=1}^{\infty} w_{j'} K(x \mid \psi_{j'})\Big)^k$.

53 Normalized weights: Computations The likelihood can be re-written as $f_P(y \mid x) = \sum_{j=1}^{\infty} w_j K(x \mid \psi_j) K(y \mid x, \theta_j) \sum_{k=0}^{\infty} \Big( \sum_{j'=1}^{\infty} w_{j'} (1 - K(x \mid \psi_{j'})) \Big)^k = \sum_{j=1}^{\infty} w_j K(x \mid \psi_j) K(y \mid x, \theta_j) \sum_{k=0}^{\infty} \prod_{l=1}^{k} \Big( \sum_{j_l=1}^{\infty} w_{j_l} (1 - K(x \mid \psi_{j_l})) \Big)$. Introducing latent variables $k, d, \mathbf{D}$, the latent model is $f_P(y, k, d, \mathbf{D} \mid x) = w_d K(x \mid \psi_d) K(y \mid x, \theta_d) \prod_{l=1}^{k} w_{D_l} (1 - K(x \mid \psi_{D_l}))$.
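A numeric check of the geometric-series step (a sketch with assumed toy parameters; the kernels must be bounded by 1, here guaranteed by unit-variance Gaussians, so that $0 < \sum_j w_j K(x \mid \psi_j) < 1$ and the series converges):

```python
import numpy as np
from scipy.stats import norm

w = np.array([0.5, 0.5])
mu_x = np.array([-1.0, 1.0])
beta = np.array([0.5, -0.5])                # toy regression coefficients

def S(x):                                   # S(x) = sum_j w_j K(x | psi_j) < 1 here
    return np.sum(w * norm.pdf(x, mu_x, 1.0))

def f_direct(y, x):                         # ratio form of f_P(y | x)
    num = np.sum(w * norm.pdf(x, mu_x, 1.0) * norm.pdf(y, beta * x, 1.0))
    return num / S(x)

def f_series(y, x, K=2000):                 # truncated geometric-series form
    num = np.sum(w * norm.pdf(x, mu_x, 1.0) * norm.pdf(y, beta * x, 1.0))
    return num * np.sum((1.0 - S(x)) ** np.arange(K))

print(f_direct(0.3, 0.7), f_series(0.3, 0.7))   # the two forms agree
```

This is the identity that makes the latent-variable augmentation above tractable: the intractable normalizing constant is exchanged for the latent count $k$ and labels $d, D_1, \ldots, D_k$.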

54 Outline 1 Introduction 2 Enriched Dirichlet Process 3 EDP Mixtures for Regression 4 Restricted DP Mixtures for Regression 5 Normalized Covariate Dependent Weights 6 Discussion

55 Discussion BNP mixture models for regression are highly flexible and numerous. This thesis aimed to shed light on the advantages and drawbacks of the various models.

56 Discussion BNP mixture models for regression are highly flexible and numerous. This thesis aimed to shed light on the advantages and drawbacks of the various models. 1. Joint models: + easy computations; + flexible and easy to extend to other types of Y and X; − posterior inference is based on the joint; − problematic for large p. Can overcome with the EDP prior (maintains both +'s).

57 Discussion 2. Conditional models with covariate-dependent atoms: + moderate computations (increasing with the flexibility of $\theta_j(x)$); + posterior inference is based on the conditional; − an appropriate definition of $\theta_j(x)$ for mixed x can be challenging; − the choice of $\theta_j(x)$ has strong implications: if inflexible, extremely poor prediction; if overly flexible, prediction may also suffer.

58 Discussion 2. Conditional models with covariate-dependent atoms: + moderate computations (increasing with the flexibility of $\theta_j(x)$); + posterior inference is based on the conditional; − an appropriate definition of $\theta_j(x)$ for mixed x can be challenging; − the choice of $\theta_j(x)$ has strong implications: if inflexible, extremely poor prediction; if overly flexible, prediction may also suffer. 3. Conditional models with covariate-dependent weights: + posterior inference is based on the conditional; + flexible and incorporates covariate proximity in the partition; − difficult computations; − lack of interpretable weights that can accommodate mixed x. Can overcome with normalized weights.

59 Discussion 2. Conditional models with covariate-dependent atoms: + moderate computations (increasing with the flexibility of $\theta_j(x)$); + posterior inference is based on the conditional; − an appropriate definition of $\theta_j(x)$ for mixed x can be challenging; − the choice of $\theta_j(x)$ has strong implications: if inflexible, extremely poor prediction; if overly flexible, prediction may also suffer. 3. Conditional models with covariate-dependent weights: + posterior inference is based on the conditional; + flexible and incorporates covariate proximity in the partition; − difficult computations; − lack of interpretable weights that can accommodate mixed x. Can overcome with normalized weights. − Huge dimension of the partition space: the posterior of $\rho_n$ is very spread out. Can overcome by enforcing a notion of covariate proximity.

60 Future Work Examining models for large p problems. Formalizing model comparison of the various models: consistency, convergence rates, finite sample bounds. AD study: diagnosis based on sMRI and other imaging techniques. Investigating the dynamics of AD biomarkers (jointly + incorporating longitudinal data).

61 References 1. Consonni, G. and Veronese, P. (2001). Conditionally reducible natural exponential families and enriched conjugate priors. Scandinavian Journal of Statistics, 28. 2. Ramamoorthi, R.V. and Sangalli, L. (2005). On a characterization of the Dirichlet distribution. Proceedings of the International Conference on Bayesian Statistics and its Applications, Varanasi, India. 3. Hannah, L., Blei, D. and Powell, W. (2011). Dirichlet process mixtures of generalized linear models. Journal of Machine Learning Research, 12. 4. Müller, P., Erkanli, A. and West, M. (1996). Bayesian curve fitting using multivariate normal mixtures. Biometrika, 83. 5. Shahbaba, B. and Neal, R.M. (2009). Nonlinear models using Dirichlet process mixtures. Journal of Machine Learning Research, 10.

62 References 6. MacEachern, S.N. (2000). Dependent nonparametric processes. Technical Report, Department of Statistics, Ohio State University. 7. Griffin, J.E. and Steel, M.F.J. (2006). Order-based dependent Dirichlet processes. Journal of the American Statistical Association, 101. 8. Dunson, D.B. and Park, J.H. (2008). Kernel stick-breaking processes. Biometrika, 95. 9. Rodriguez, A. and Dunson, D.B. (2011). Nonparametric Bayesian models through probit stick-breaking processes. Bayesian Analysis, 6. 10. Fuentes-Garcia, R., Mena, R.H. and Walker, S.G. (2010). A probability for classification based on the mixture of Dirichlet process model. Journal of Classification, 27.
