Multivariate kernel partition processes


David B. Dunson
Department of Statistical Science, Box 90251, Duke University, Durham, NC
dunson@stat.duke.edu

SUMMARY

This article considers the problem of accounting for unknown multivariate mixture distributions within Bayesian hierarchical models motivated by functional data analysis. Most nonparametric Bayes methods rely on global partitioning, with subjects assigned to a single cluster index for all their random effects. We propose a multivariate kernel partition process (KPP) that instead allows the cluster indices to vary. The KPP is shown to be the driving measure for a multivariate ordered Chinese restaurant process, which allows a highly flexible dependence structure in local clustering. This structure allows the relative locations of the random effects to inform the clustering process, with spatially proximal random effects likely to be assigned the same cluster index. An efficient exact block Gibbs sampler is developed for posterior computation, avoiding the need to truncate the infinite measure. The methods are applied to functional data analysis, illustrating improvements in model fit, parsimony and predictive performance relative to current methods.

Key words: Basis functions; Chinese restaurant process; Dirichlet process; Local clustering; Longitudinal data; Nonparametric Bayes; Random effects; Sparsity.

1. Introduction

Functional data analysis considers a random function as the basic unit of analysis for a subject, with the data consisting of error-prone measurements at a finite number of locations. Some examples of functional data include longitudinal trajectories in a biomarker and brain images. Our particular interest is in sparse functional data measured at unequally-spaced and varying locations for the different subjects. In such settings, a variety of modeling strategies have been proposed, including hierarchical Gaussian processes (Behseta, Kass and Wallstrom, 2005; Kaufman and Sain, 2007), random effects models relying on basis function representations (Thompson and Rosen, 2008; Bigelow and Dunson, 2007), and wavelet-based functional mixed models (Morris and Carroll, 2006). For Gaussian process (GP) models, the choice of mean and covariance function plays a critical role, while for random effects models the choice of basis and parametric form for the random effects distribution is important.

Let $f_i : \mathcal{X} \to \Re$ denote the function for subject $i$, for $i = 1, \ldots, n$. As a more flexible alternative to the GP, one can use a functional Dirichlet process (FDP) in which $f_i \sim P$ and $P \sim \mathrm{DP}(\alpha P_0)$, with $\alpha$ a scalar precision parameter and $P_0$ a base probability measure obeying a GP law. Under this approach, subjects are allocated to functional clusters, with $f_i = \Theta_h$ for all subjects in cluster $h$, where $\Theta_h$ is a realization from a GP. By clustering subjects, one obtains replicates, so that through posterior updating the function estimates become less sensitive to the choice of GP covariance function. However, as noted by Petrone, Guindani and Gelfand (2008), the FDP has the disadvantage of inducing global functional clusters, which does not allow local clustering of functions that differ only in local regions. They proposed a hybrid DP (hDP), which instead allows individual functions to be formed as a patchwork of segments locally selected from a collection of global GP realizations. Nguyen and Gelfand (2008) consider properties of the hDP and propose approximations.

An alternative to relying on generalizations of GP priors is to use a basis representation,

$$f_i(x) = \sum_{j=1}^{p} \theta_{ij} b_j(x), \qquad \theta_i = (\theta_{i1}, \ldots, \theta_{ip})' \sim P, \qquad (1)$$

where $b = \{b_j\}_{j=1}^{p}$ is a collection of basis functions, such as splines or kernels, $\theta_i$ is a random effects vector, and $P$ is the distribution of the random effects. The two primary questions that arise in using (1) are how to choose $b$ and $P$. Addressing these questions is challenging, since the appropriate basis to use is typically not known in advance and one may need to allow varying basis functions for different subjects for sufficient flexibility. In addition, $p$ is often large, so that $P$ has many parameters, with simple parametric choices (e.g., Gaussian with diagonal covariance) providing a poor representation of the data.

A common strategy for accommodating uncertainty in basis function selection is to pre-specify a large number of potential basis functions, and then choose a shrinkage prior distribution for the basis coefficients. For example, the relevance vector machine (Tipping, 2001) specifies a t-prior with zero degrees of freedom, and then obtains maximum a posteriori (MAP) estimates. Unfortunately, this choice of prior does not result in a proper posterior. An alternative is to choose a Cauchy prior, which following West (1987) can be expressed as $\theta_{ij} \sim N(0, \kappa_{ij}^{-1})$, with $\kappa_{ij} \sim \mathrm{gamma}(1/2, 1/2)$. This prior is concentrated at zero with very heavy tails, so that coefficients for unnecessary basis functions are shrunk to zero, while coefficients for important basis functions fall in the tails. To borrow information across coefficients about the degree of shrinkage, let $\kappa_{ij} = \kappa$. This approach can be generalized to the functional data analysis setting, while allowing $P$ to be unknown in (1), by letting

$$P \sim \mathrm{DP}(\alpha P_0), \qquad P_0 = \bigotimes_{j=1}^{p} P_{0j}, \qquad P_{0j} \equiv \mathrm{Cauchy}(\kappa), \qquad (2)$$

where $\bigotimes_{j=1}^{p} P_{0j}$ denotes the product measure. A related specification to (1)-(2), but without the Cauchy shrinkage, was proposed by Ray and Mallick (2006) for wavelet-based functional clustering.
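To make the shrinkage construction concrete, the following minimal numpy sketch (not from the paper; the Gaussian-kernel basis, grid, bandwidth and seed are illustrative assumptions) evaluates the basis representation (1) with coefficients drawn from the Cauchy prior via the West (1987) normal scale mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_basis(x, knots, psi_b=5.0):
    """Gaussian kernel basis b_j(x) = exp{-psi_b (x - z_j)^2}; bandwidth is illustrative."""
    x = np.atleast_1d(x)
    return np.exp(-psi_b * (x[:, None] - knots[None, :]) ** 2)

# Cauchy shrinkage via the scale mixture of normals:
#   theta_j | kappa_j ~ N(0, 1/kappa_j),  kappa_j ~ gamma(1/2, rate 1/2)  =>  theta_j ~ Cauchy.
p = 10
kappa = rng.gamma(shape=0.5, scale=2.0, size=p)        # scale 2 corresponds to rate 1/2
theta = rng.normal(0.0, 1.0 / np.sqrt(kappa))          # heavy-tailed coefficients, most near zero

knots = np.linspace(0.0, 1.0, p)
xgrid = np.linspace(0.0, 1.0, 101)
f = kernel_basis(xgrid, knots) @ theta                 # f(x) = sum_j theta_j b_j(x), as in (1)
```

Drawing many such curves illustrates how the heavy Cauchy tails allow a few coefficients to dominate while the remainder are shrunk toward zero.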

De la Cruz-Mesia, Quintana and Müller (2007) developed a discriminant analysis approach for classification using a dependent Dirichlet process (MacEachern, 1999; De Iorio et al., 2004) for the distribution of random effects underlying longitudinal markers in different groups.

Specification (1)-(2) leads to global functional clustering, with $f_i(x) = b(x)'\Theta_h$ for subjects in cluster $h$, where one allows for differential basis function selection automatically through letting the elements of the basis coefficient vector $\Theta_h = (\Theta_{h1}, \ldots, \Theta_{hp})'$ that are set close to zero vary across the clusters. However, the performance of the approach in accurately and sparsely characterizing complex functional data is critically dependent on global clustering.

Dunson, Xue and Carin (2008) and Dunson (2008) proposed local generalizations of the Dirichlet process, which allow for more flexible borrowing of information through dependent local clustering. In particular, one allows $\theta_{ij} = \theta_{i'j}$ without restricting $\theta_{ij'} = \theta_{i'j'}$ for $j' \neq j$. Both of these approaches assume an exchangeable dependence structure in local clustering, so that the probability of clustering subjects $i$ and $i'$ for the $j$th random effect increases by a constant amount given information that these subjects are clustered for the $j'$th random effect. When random effects are indexed by time, space or predictors, this exchangeability assumption is overly simplistic; instead, one should allow the relative locations of the random effects to inform the local clustering process. The richer dependence structure is needed for accurate interpolations and predictions of $f_i(x)$ across regions of $\mathcal{X}$ for which data are not directly available for subject $i$.

Motivated by this problem, this article proposes a class of kernel partition process (KPP) priors for unknown mixture distributions in Bayesian hierarchical models. There is a rich literature on nonparametric Bayes modeling of random effects distributions using the Dirichlet process (Bush and MacEachern, 1996; Müller and Rosner, 1997; Kleinman and Ibrahim, 1998) and more flexible predictor-dependent global partition processes, such as the kernel stick-breaking process (Dunson and Park, 2008). The implicit assumption of global clustering underlying almost all of the literature on mixture modeling leads to major limitations in terms of sparsely characterizing moderate- to high-dimensional distributions.

In particular, as the dimension increases, it is increasingly unlikely that two subjects are similar with respect to all the elements of their random effects vectors. Hence, global partition models will either allocate subjects to many clusters, leading to an inefficient characterization of the data, or will inappropriately cluster subjects that are similar at most locations, obscuring local differences. In practice, both of these problems occur and one can obtain a simultaneous improvement in goodness of fit and reduction in model complexity using a carefully-specified local partition model. Section 2 proposes the KPP formulation and discusses properties. Section 3 develops an MCMC algorithm for posterior computation. Section 4 considers a simulation example, Section 5 applies the methods to hormone curve data, and Section 6 discusses the results.

2. Kernel Partition Processes

2.1 Preliminaries

Let $\theta_i \sim P$, with $P$ a probability measure over $\{\Omega, \mathcal{B}(\Omega)\}$, where $\Omega$ is a measurable Polish space and $\mathcal{B}(\Omega)$ is the Borel $\sigma$-algebra of subsets of $\Omega$. Our goal is to choose a prior $P \sim \mathcal{P}$, where $\mathcal{P}$ is a probability measure on the space of probability measures over $\{\Omega, \mathcal{B}(\Omega)\}$. If we choose $P \sim \mathrm{DP}(\alpha P_0)$ as in (2), then it follows from Sethuraman (1994) that

$$P = \sum_{\gamma=1}^{\infty} \pi_{\gamma}\, \delta_{\Theta_{\gamma}}, \qquad \Theta_{\gamma} \sim P_0, \qquad (3)$$

where $\gamma \in \{1, \ldots, \infty\}$ is a global cluster index, $\Pr(\gamma = h) = \pi_h = \pi^*_h \prod_{l<h}(1 - \pi^*_l)$, $\pi^*_h \sim \mathrm{beta}(1, \alpha)$, and the unique coefficient vectors, $\Theta_h = (\Theta_{h1}, \ldots, \Theta_{hp})'$ for $h = 1, \ldots, \infty$, are generated independently according to $P_0$. As a local generalization of (3), let

$$P = \sum_{\gamma_1=1}^{\infty} \cdots \sum_{\gamma_p=1}^{\infty} \pi_{\gamma_1 \cdots \gamma_p}\, \delta_{\Theta_{\gamma}}, \qquad \Theta_{\gamma} = (\Theta_{\gamma_1 1}, \ldots, \Theta_{\gamma_p p})', \qquad (4)$$

where $\gamma = (\gamma_1, \ldots, \gamma_p)' \in \{1, \ldots, \infty\}^p$ is a p-dimensional vector of local cluster indices, $\Pr(\gamma_1 = h_1, \ldots, \gamma_p = h_p) = \pi_{h_1 \cdots h_p}$ for $h_j = 1, \ldots, \infty$, $j = 1, \ldots, p$, and $\Theta_h \sim P_0$ independently for $h = 1, \ldots, \infty$.
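The contrast between (3) and (4) can be seen in a small truncated stick-breaking sketch. This is an illustration under assumed settings, not the paper's sampler; in particular, the local indices are drawn independently here purely for illustration, whereas the KPP developed below induces dependence among them.

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(alpha, H):
    """Truncated stick-breaking weights pi_h = V_h prod_{l<h}(1 - V_l), V_h ~ Beta(1, alpha)."""
    V = rng.beta(1.0, alpha, size=H)
    pi = V * np.concatenate(([1.0], np.cumprod(1.0 - V)[:-1]))
    return pi / pi.sum()                 # renormalize the truncation

alpha, H, p, n = 1.0, 50, 10, 5
pi = stick_breaking(alpha, H)
Theta = rng.normal(size=(H, p))          # unique coefficient vectors Theta_h ~ P0 (illustrative P0)

# Global clustering as in (3): one index per subject, shared by all p coefficients.
gamma_global = rng.choice(H, size=n, p=pi)
theta_global = Theta[gamma_global]                     # theta_i = Theta_{gamma_i}

# Local clustering as in (4): a separate index for every coefficient.
gamma_local = rng.choice(H, size=(n, p), p=pi)         # iid indices, for illustration only
theta_local = Theta[gamma_local, np.arange(p)]         # theta_ij = Theta_{gamma_ij, j}
```

In the global draw every coordinate of $\theta_i$ comes from a single row of $\Theta$, while in the local draw each coordinate can borrow a different row.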

The class of priors in (4) is similar to (3) in incorporating independent unique coefficient vectors sampled according to $P_0$. However, in drawing $\theta \sim P$, the DP prior in (3) allocates all the elements of $\theta$ to the same unique coefficient vector, while the local generalization in (4) allows the allocation to vary across the elements of $\theta$. The allowance for local allocation should lead to a greater degree of borrowing of information across subjects and a sparser formulation of high-dimensional random effects distributions.

To clarify, suppose that subjects $i$ and $i'$ have similar random effects in elements $1, \ldots, 5$ but then deviate substantially in the remaining elements. The DP prior in (3) should allocate these subjects to different unique coefficient vectors, so that borrowing of information in estimation of their random effects vectors relies entirely on the base parametric model $P_0$. In contrast, the prior in (4) allows the subjects to be allocated to the same component for elements $1, \ldots, 5$ but different components for the remaining elements. The clustering in elements $1, \ldots, 5$ should substantially improve efficiency in estimating these random effects. In addition, by not forcing allocation to new unique coefficients for each new subject that deviates from previous subjects in any of their random effects, (4) favors allocation of subjects to fewer unique coefficient vectors. If the $n$ subjects in a data set occupy $k \ll n$ components, then we obtain a sparse representation of the data, with the occupied unique coefficient vectors providing a summary of important characteristics of the data.

To complete a specification of (4), it remains to choose a prior for $\pi = \{\pi_{h_1 \cdots h_p},\ h_j = 1, \ldots, \infty,\ j = 1, \ldots, p\}$. Petrone et al. (2008) proposed using a Dirichlet prior in the finite case in which $h_j = 1, \ldots, k$, and then provided theory on the limiting case as $k \to \infty$. In applications, they focused on the finite case and proposed a Gaussian process copula model, which results in substantial computational challenges. We propose a fundamentally different specification, which avoids truncation of the infinite-dimensional prior in (4) and relies on an innovative kernel specification to sparsely and flexibly induce a prior for $\pi$.

2.2 Chinese Restaurants

The proposed kernel partition process (KPP) prior is motivated by a generalization of the Chinese restaurant process (CRP) often discussed in considering clustering properties of the Dirichlet process (Aldous, 1985; Ishwaran and James, 2003). The typical CRP is obtained in marginalizing out the infinitely-many atoms and stick-breaking weights in (3) to obtain the following predictive probability function (PPF),

$$(\theta_{n+1} \mid \tilde{\gamma}^{(n)}, \theta^{(n)}) \sim \left(\frac{\alpha}{\alpha + n}\right) P_0 + \sum_{h=1}^{k^{(n)}} \left(\frac{\tilde{n}^{(n)}_h}{\alpha + n}\right) \delta_{\tilde{\theta}_h}, \qquad n = 1, 2, \ldots, \qquad (5)$$

where $\tilde{\gamma}_i = h$ denotes that customer $i$ is seated at table $h$, $\tilde{\gamma}^{(n)} = (\tilde{\gamma}_1, \ldots, \tilde{\gamma}_n)'$, tables are indexed in the order they are occupied as subjects $i = 1, \ldots, n$ are seated, all customers at table $h$ are served dish $\tilde{\theta}_h \sim P_0$, $\tilde{n}^{(n)}_h = \sum_{i=1}^{n} 1(\tilde{\gamma}_i = h)$ is the number of individuals at table $h$, and $k^{(n)}$ is the number of occupied tables after $n$ customers are seated.

In order for the CRP to be more closely linked to the stick-breaking process in (3), it is convenient to define a modification, which we refer to as the ordered CRP (ocrp). The distinguishing characteristic of the ocrp is that the tables are indexed not by the order in which they are occupied but by the order of the components in the stick-breaking representation. Because the stick-breaking weights are stochastically decreasing, one can modify the metaphor so that the restaurant has infinitely-many tables ordered in terms of desirability, with table $h = 1$ the most desirable. Lemma 1 provides the predictive probability of allocation to table $m$ for individual $i = n+1$ under the ocrp.

Lemma 1. Assuming that $\theta_i \sim P$ for $i = 1, 2, \ldots, n+1$, with $P \sim \mathrm{DP}(\alpha P_0)$ following (3), $\gamma_i = h$ if $\theta_i = \Theta_h$, and $\gamma^{(n)} = (\gamma_1, \ldots, \gamma_n)'$, we have

$$\Pr(\gamma_{n+1} = m \mid \gamma^{(n)}) = \left(\frac{1 + n^{(n)}_m}{\alpha + 1 + \sum_{l=m}^{k^{(n)}} n^{(n)}_l}\right) \prod_{h=1}^{m-1} \left(\frac{\alpha + \sum_{l=h+1}^{k^{(n)}} n^{(n)}_l}{\alpha + 1 + \sum_{l=h}^{k^{(n)}} n^{(n)}_l}\right), \qquad m = 1, \ldots, \infty,$$

where $n^{(n)}_h = \sum_{i=1}^{n} 1(\gamma_i = h)$ and $k^{(n)} = \max\{\gamma^{(n)}\}$.

Teh et al. (2006) proposed a hierarchical DP (HDP) as a prior for a collection of random probability measures, $\{P_l\}_{l=1}^{r}$. The HDP leads to a generalization of the CRP having multiple restaurants with a shared collection of dishes.
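As a numerical illustration of Lemma 1 (a sketch, not code from the paper; the function name and the example table counts are hypothetical), the allocation probabilities can be computed directly from the current table counts.

```python
import numpy as np

def ocrp_probs(counts, alpha, extra=1):
    """Pr(gamma_{n+1} = m | gamma^{(n)}) under the ordered CRP of Lemma 1.

    counts[h] = n_h^{(n)}, the number of previous customers at table h+1 (0-indexed);
    tables are ordered by the stick-breaking index, not by occupancy order.
    `extra` empty tables beyond k^{(n)} are included to show where leftover mass goes."""
    counts = np.asarray(counts, dtype=float)
    counts = np.concatenate([counts, np.zeros(extra)])
    tails = np.concatenate([np.cumsum(counts[::-1])[::-1], [0.0]])   # tails[h] = sum_{l >= h} n_l
    probs = np.empty(len(counts))
    surv = 1.0
    for m in range(len(counts)):
        denom = alpha + 1.0 + tails[m]
        probs[m] = surv * (1.0 + counts[m]) / denom
        surv *= (alpha + tails[m + 1]) / denom        # product term over h < m in Lemma 1
    return probs

print(ocrp_probs([3, 1, 0, 2], alpha=1.0))
```

The probabilities do not sum to one because the remaining mass corresponds to tables beyond those displayed; when $\alpha$ is small, most of the mass concentrates on the low-numbered tables.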

As a simpler hierarchical generalization of the DP that also includes shared atoms (dishes) within each $P_l$, let $P_l = \sum_{h=1}^{\infty} \pi_{lh} \delta_{\Theta_h}$, with $\Theta_h \sim P_0$, $\pi_{lh} = \pi^*_{lh} \prod_{m<h}(1 - \pi^*_{lm})$, and $\pi^*_{lh} \sim \mathrm{beta}(1, \alpha)$ independently for $l = 1, \ldots, r$. In this specification, dish $\Theta_h$ is served at the $h$th most popular table at each of the restaurants, with the seating process at each restaurant proceeding independently via Lemma 1. Dependence arises across the restaurants because of the incorporation of a common set of dishes with the same ordering in terms of popularity. However, neither this hierarchical ocrp nor the Teh et al. (2006) HDP is designed for local partitioning.

2.3 Multivariate Ordered Chinese Restaurant Process

To accommodate model (4), we propose a modification of the hierarchical ocrp in which each household in a city with $r$ restaurants contains $p$ types of individuals. Each restaurant serves dishes $\Theta_h \sim P_0 = \bigotimes_{j=1}^{p} P_{0j}$ to its $h$th most popular table, with individuals of type $j$ at table $h$ dining on $\Theta_{hj}$. Individuals in a household select restaurants separately, with an individual of type $j$ selecting restaurant $l$ with probability proportional to the popularity score of the restaurant, $\lambda_l \sim \mathrm{gamma}(\beta/r, 1)$, multiplied by the closeness of the restaurant to the individual's preference, quantified as $K_{\psi}(z_j, z^*_l)$. Here, $K_{\psi} : \mathcal{Z} \times \mathcal{Z} \to [0, 1]$ is a kernel, $z_j \in \mathcal{Z}$ are features of type $j$ individuals, and $z^*_l$ are features of restaurant $l$. Individuals in a household that select the same restaurant will be seated together when they arrive, with the seating following an ocrp. Theorem 1 shows the resulting multivariate PPF.

Theorem 1. For individual $j$ in household $i$, let $S_{ij} = l$ denote choice of restaurant $l$ and let $\gamma_{ij} = h$ denote seating at table $h$. Letting $S_i = (S_{i1}, \ldots, S_{ip})'$, $S^{(n)} = (S_1, \ldots, S_n)'$, $\gamma_i = (\gamma_{i1}, \ldots, \gamma_{ip})'$, $\gamma^{(n)} = (\gamma_1, \ldots, \gamma_n)'$, and $\theta^{(n)} = (\theta_1, \ldots, \theta_n)'$,

$$(\theta_{n+1} \mid S^{(n)}, \gamma^{(n)}, \theta^{(n)}) \sim \sum_{l_1=1}^{r} \cdots \sum_{l_p=1}^{r} \left\{\prod_{j=1}^{p} \omega_{j l_j}\right\} \prod_{l=1}^{r} \sum_{m=1}^{\infty} \pi^{(n)}_{lm} \prod_{j:\, l_j = l} \left\{1(w^{(n)}_{mj} = 0)\, P_{0j}(\theta_{n+1,j}) + 1(w^{(n)}_{mj} > 0)\, \delta_{\Theta_{mj}}(\theta_{n+1,j})\right\},$$

where $w^{(n)}_{mj} = \sum_{i=1}^{n} 1(\gamma_{ij} = m)$ is the number of individuals of type $j$ at the $m$th table,

$$\omega_{jl} = \frac{\lambda_l K_{\psi}(z_j, z^*_l)}{\sum_{s=1}^{r} \lambda_s K_{\psi}(z_j, z^*_s)}, \qquad l = 1, \ldots, r, \quad j = 1, \ldots, p,$$

$$\pi^{(n)}_{lm} = \left(\frac{1 + n^{(n)}_{lm}}{\alpha + 1 + \sum_{t=m}^{k^{(n)}_l} n^{(n)}_{lt}}\right) \prod_{h=1}^{m-1} \left(\frac{\alpha + \sum_{t=h+1}^{k^{(n)}_l} n^{(n)}_{lt}}{\alpha + 1 + \sum_{t=h}^{k^{(n)}_l} n^{(n)}_{lt}}\right), \qquad m = 1, \ldots, \infty,$$

with $n^{(n)}_{lm} = \sum_{i=1}^{n} 1(m \in \{\gamma_{ij} : S_{ij} = l\})$ the number of households seated at table $m$ in restaurant $l$.

In Theorem 1, $l_1, \ldots, l_p$ index the restaurant allocations for individuals of types $1, \ldots, p$, respectively, $\omega_{jl}$ is the probability that an individual of type $j$ chooses restaurant $l$, and $m$ indexes the table assignment of household $n+1$ within restaurant $l$. If an individual of type $j$ is seated at a table number that has not had an individual of that type previously, they will be given a new dish sampled from $P_{0j}$, with this dish served to all future individuals of type $j$ seated at the same numbered table. It is clear from Theorem 1 that most of the probability will be allocated to the first few tables in each restaurant when $\alpha$ is small, leading to a sparse formulation in which all individuals occupy a few clusters. In addition, when $\beta$ is small, the popularity of the restaurants will vary greatly, leading to a small number of dominant restaurants that attract most of the customers unless the kernel is very tight.

Proposition 1. Under Theorem 1, with $\omega_j = (\omega_{j1}, \ldots, \omega_{jr})'$,

(i) $\Pr(\gamma_{ij} = \gamma_{ij'}) = \omega_j'\omega_{j'} + \dfrac{1 - \omega_j'\omega_{j'}}{1 + 2\alpha} = \rho_{jj'}$, for $j \neq j'$,

(ii) $\Pr(\gamma_{ij} = \gamma_{i'j'}) = \dfrac{\omega_j'\omega_{j'}}{1 + \alpha} + \dfrac{1 - \omega_j'\omega_{j'}}{1 + 2\alpha} = \kappa_{jj'}$, for any $j, j'$ and $i \neq i'$.

Proposition 1(i) shows the probability of allocating two individuals in a household of types $j$ and $j'$ to the same numbered table and hence the same cluster index, while Proposition 1(ii) shows the probability of assigning individuals from two different households to the same cluster index.
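A short sketch (with illustrative settings and function names that are not from the paper) shows how the kernel and the popularity scores combine into the selection probabilities $\omega_{jl}$ and then into the clustering probabilities of Proposition 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def restaurant_weights(z, z_star, lam, psi):
    """omega_{jl} proportional to lambda_l * K_psi(z_j, z*_l), squared-exponential kernel."""
    K = np.exp(-psi * (z[:, None] - z_star[None, :]) ** 2)
    W = lam[None, :] * K
    return W / W.sum(axis=1, keepdims=True)

p = r = 10
z = z_star = np.linspace(0.0, 1.0, p)
beta, psi, alpha = 1.0, 25.0, 1.0                       # illustrative hyperparameters
lam = rng.gamma(beta / r, 1.0, size=r)                  # restaurant popularity scores
omega = restaurant_weights(z, z_star, lam, psi)         # (p, r) selection probabilities

inner = omega @ omega.T                                 # omega_j' omega_j' for every pair (j, j')
rho   = inner + (1.0 - inner) / (1.0 + 2.0 * alpha)     # within-household, Proposition 1(i)
kappa = inner / (1.0 + alpha) + (1.0 - inner) / (1.0 + 2.0 * alpha)   # across households, 1(ii)
```

Plotting a row of `rho` against the locations `z` gives the kind of distance-decaying within-subject dependence illustrated in Figure 1.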

For concreteness, consider a longitudinal data application in which $f_i(x) = b(x)'\theta_i$, with $b_j(x) = \exp\{-\psi_b (x - z_j)^2\}$ a kernel basis function located at time $z_j$ for $j = 1, \ldots, p = r$. The within-subject pairwise dependence structure in the vector of cluster indices $\gamma_i$ is characterized by Proposition 1(i), while the cross-subject dependence is characterized by Proposition 1(ii). Both within- and across-subjects, it is more likely for random effects that are located close together to be allocated to the same cluster index. The $\omega_j'\omega_{j'}$ term is a cross-product of weighted and normalized kernel functions, so induces a highly flexible dependence structure, as is partially illustrated in Figure 1. The hyperparameter $\alpha$ provides a global control on clustering, with small values implying a high probability of being allocated to the same cluster index even for random effects located far apart. Indeed, in the limiting case $\lim_{\alpha \to 0} \rho_{jj'} = \kappa_{jj'} = 1$, and $\theta_i \sim \delta_{\Theta_1}$, so that all subjects are allocated to a single global cluster. A small but non-zero $\alpha$ instead implies a sparse formulation with a combination of short and long range dependence in clustering. Figure 2 provides some representative realizations from the prior, illustrating the impact of local clustering on the functions.

2.4 Driving Measure

The definition of the multivariate ocrp bypasses the explicit specification of a prior for the random measure $P$. However, it can be shown that the multivariate ocrp in Theorem 1 has a driving measure $P \sim \mathrm{KPP}(\alpha, \beta, \psi, P_0)$, with the kernel partition process (KPP) prior specified as in (4) with

$$\pi_{h_1 \cdots h_p} = \sum_{T \in \mathcal{T}} \prod_{l:\, T_l \neq \emptyset} \Pr(\phi_l = \{h_j,\ j \in T_l\} \mid \nu_l) \prod_{j \in T_l} \omega_{jl}, \qquad (6)$$

where $T = (T_1, \ldots, T_r)$ is a partition of the index set $\{1, \ldots, p\}$ into $r$ locations, with $T_l \subset \{1, \ldots, p\}$ the indices at location $l$ for $l = 1, \ldots, r$, $\mathcal{T}$ is the set of all such partitions, $\{h_j,\ j \in T_l\}$ are the cluster indices for random effects allocated to location $l$, $\phi_l$ is a random cluster index specific to location $l$, $\Pr(\phi_l = h) = 1(\#h = 1)\, \nu_{lh}$, $\#h$ is the cardinality of the set $h$, $\nu_{lh} = \nu^*_{lh} \prod_{m<h}(1 - \nu^*_{lm})$, and $\nu^*_{lh} \sim \mathrm{beta}(1, \alpha)$ for $h = 1, \ldots, \infty$.

Letting $\theta_i \sim P$, with $P \sim \mathrm{KPP}(\alpha, \beta, \psi, P_0)$, we obtain the hierarchical model

$$\theta_i = \Theta_{\gamma_i}, \qquad \Theta_{\gamma_i} = (\Theta_{\gamma_{i1} 1}, \ldots, \Theta_{\gamma_{ip} p})', \qquad \Theta_h \sim P_0, \qquad \gamma_{ij} \sim \sum_{l=1}^{r} \omega_{jl}\, \delta_{\phi_{il}}, \qquad \phi_{il} \sim \sum_{h=1}^{\infty} \nu_{lh}\, \delta_h, \qquad (7)$$

with $\gamma_i = (\gamma_{i1}, \ldots, \gamma_{ip})' \in \{1, \ldots, \infty\}^p$ a multivariate cluster index for subject $i$, and $\phi_i = (\phi_{i1}, \ldots, \phi_{ir})'$ a vector of latent cluster indices assigned to locations $1, \ldots, r$. In the restaurant terminology, $\phi_{il}$ denotes the latent table assignment of any individuals from household $i$ choosing restaurant $l$, while $\gamma_{ij}$ denotes the actual table assignment for the $j$th individual from household $i$. Formulation (7) will form the basis for the computational algorithm proposed in Section 3.
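A minimal forward sampler for the hierarchical form (7) is sketched below; the truncation at H stick-breaking components is purely for illustration (the posterior algorithm in Section 3 avoids truncation), and the function names and settings are assumptions rather than part of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_kpp_coefficients(n, omega, alpha, P0_draw, H=100):
    """Forward-simulate theta_1,...,theta_n from the hierarchical form (7) of the KPP."""
    p, r = omega.shape
    V = rng.beta(1.0, alpha, size=(r, H))
    nu = V * np.concatenate([np.ones((r, 1)), np.cumprod(1.0 - V, axis=1)[:, :-1]], axis=1)
    nu /= nu.sum(axis=1, keepdims=True)             # renormalize the truncated sticks
    Theta = P0_draw(H, p)                           # unique coefficient vectors Theta_h ~ P0
    theta = np.empty((n, p))
    for i in range(n):
        phi = np.array([rng.choice(H, p=nu[l]) for l in range(r)])    # phi_il ~ sum_h nu_lh delta_h
        S = np.array([rng.choice(r, p=omega[j]) for j in range(p)])   # restaurant choices S_ij
        gamma = phi[S]                                                # gamma_ij = phi_{i, S_ij}
        theta[i] = Theta[gamma, np.arange(p)]                         # theta_ij = Theta_{gamma_ij, j}
    return theta

p = r = 10
omega = np.full((p, r), 1.0 / r)                    # uniform restaurant choice, for illustration only
theta = sample_kpp_coefficients(5, omega, alpha=1.0,
                                P0_draw=lambda H, q: rng.standard_cauchy((H, q)))
```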

2.5 Some Special Cases

It is interesting to consider some special cases of the KPP. First, suppose $K_{\psi}(z_j, z^*_l) = 1(j = l)$, with $z^*_j = z_j$ for $j = 1, \ldots, p = r$. In this case, there is a distinct restaurant catering to each type of individual, so that all individuals of type $j$ go to restaurant $j$ for $j = 1, \ldots, p$. If $P_0 = \bigotimes_{j=1}^{p} P_{0j}$, then it is straightforward to show that we obtain $\theta_{ij} \sim P_j$ with $P_j \sim \mathrm{DP}(\alpha P_{0j})$ independently for $j = 1, \ldots, p$. Hence, for a particular choice of kernel, the KPP reduces to independent Dirichlet process priors on the distribution for each random effect. These independent DP priors will induce independent clustering of subjects with respect to each element of the random effects vector. By modifying the stick-breaking prior on the distribution of the $\phi_{il}$s in (7), one can induce independent species sampling priors of other types for each $P_j$.

As another extreme case, suppose that $r = 1$ so that there is a single restaurant visited by all individuals. In this case, it is straightforward to show that $\theta_i \sim P$, with $P \sim \mathrm{DP}(\alpha P_0)$. This same global DP prior can also be induced by letting $K_{\psi}(z_j, z^*_l) = 1$ for all $j, l$ and $\beta \to 0$. To illustrate this, note that when the kernel is a constant equal to one, $\omega_j = \omega = (\omega_1, \ldots, \omega_r)'$, with $\omega_l = \lambda_l / \sum_{m=1}^{r} \lambda_m$. Since $\lambda_l \sim \mathrm{gamma}(\beta/r, 1)$, we obtain $\omega \sim \mathrm{Dirichlet}(\beta/r, \ldots, \beta/r)$. As $\beta$ decreases, $\omega$ converges in distribution to $\delta_{v_{\xi}}$, with $v_{\xi}$ an $r \times 1$ vector of zeros with a single one in position $\xi \sim \sum_{l=1}^{r} (1/r)\, \delta_l$. Hence, from (7) it is clear we obtain $\gamma_{i1} = \cdots = \gamma_{ip} = \phi_{i\xi}$ for all $i$, leading to $P \sim \mathrm{DP}(\alpha P_0)$. By replacing the stick-breaking priors on the $\phi_{il}$s in (7), we can induce an arbitrary species sampling prior on $P$.

In addition to independent and global DP priors, we can induce an exchangeable within-subject dependence structure by letting $K_{\psi}(z_j, z^*_l) = 1$ with $\beta > 0$. In this case, it follows from Proposition 1 that

$$\Pr(\gamma_{ij} = \gamma_{ij'}) = \left(\frac{2\alpha}{1 + 2\alpha}\right) \frac{\lambda'\lambda}{(\lambda' 1_r)^2} + \frac{1}{1 + 2\alpha}, \qquad \text{for all } i, j, j',$$

with $1_r$ an $r \times 1$ vector of ones. As $\beta$ decreases, the $\lambda'\lambda/(\lambda' 1_r)^2$ term converges in distribution to $\delta_1$, so that $\Pr(\gamma_{ij} = \gamma_{ij'})$ converges to one.

As a sparse version of the KPP, which has computational advantages in avoiding the introduction of a separate stick-breaking process for each random effect, we propose to let $\nu_{lh} = \nu_h = \nu^*_h \prod_{m<h}(1 - \nu^*_m)$ with $\nu^*_h \sim \mathrm{beta}(1, \alpha)$. Because the seating processes for the different restaurants are then coupled, Theorem 1 can be modified by simply replacing $n^{(n)}_{lm}$ in the expression for $\pi^{(n)}_{lm}$ with $n^{(n)}_m = \sum_{l=1}^{r} n^{(n)}_{lm}$, so that $\pi^{(n)}_{lm} = \pi^{(n)}_m$ for all restaurants. In addition, Proposition 1(i)-(ii) is modified by replacing the $1/(1 + 2\alpha)$ multipliers with $1/(1 + \alpha)$, so that $\Pr(\gamma_{ij} = \gamma_{i'j'}) = 1/(1 + \alpha)$.

Proposition 2. Let $A : S_{ij} = S_{ij'}$ denote allocation of individuals $j$ and $j'$ from household $i$ to the same restaurant, let $B : S_{i'j} = S_{i'j'}$ denote allocation of individuals $j$ and $j'$ from household $i'$ to the same restaurant, and let $\bar{A}$ and $\bar{B}$ denote the complements of $A$ and $B$, respectively. Then, under the sparse KPP,

$$\Pr(\theta_{ij} = \theta_{i'j} \mid \theta_{ij'} = \theta_{i'j'}) = \frac{\tilde{\pi}_1 (2 + \alpha)(3 + \alpha) + 2\tilde{\pi}_2 (3 + \alpha) + \tilde{\pi}_3 (6 + \alpha)}{(2 + \alpha)(3 + \alpha)},$$

where $\tilde{\pi} = (\tilde{\pi}_1, \tilde{\pi}_2, \tilde{\pi}_3)'$ is a probability vector, with $\tilde{\pi}_1 = \Pr(A \cap B) = (\omega_j'\omega_{j'})^2$, $\tilde{\pi}_2 = \Pr\{(A \cap \bar{B}) \cup (\bar{A} \cap B)\} = 2\omega_j'\omega_{j'}\{1 - \omega_j'\omega_{j'}\}$ and $\tilde{\pi}_3 = \Pr(\bar{A} \cap \bar{B}) = \{1 - \omega_j'\omega_{j'}\}^2$.

As $\Pr(\theta_{ij} = \theta_{i'j}) = 1/(1 + \alpha)$, it is clear from Proposition 2 that the probability of clustering subjects $i$ and $i'$ for their $j$th random effect increases given information that these subjects are clustered for their $j'$th random effect. This is an appealing property in allowing borrowing of information in local clustering. The magnitude of the multiplicative increase tends to decrease as the distance between the $j$th and $j'$th random effects increases, though the function can take an extremely wide variety of shapes and is not restricted to be monotone due to the incorporation of the adaptive weights $\lambda$.

3. Posterior Computation

For posterior computation, we propose a Markov chain Monte Carlo (MCMC) algorithm, which is a hybrid of data augmentation, the exact block Gibbs sampler of Papaspiliopoulos (2008), and Metropolis sampling. Papaspiliopoulos (2008) proposed the exact block Gibbs sampler as an efficient approach to posterior computation in Dirichlet process mixture models, modifying the block Gibbs sampler of Ishwaran and James (2001) to avoid truncation approximations. The exact block Gibbs sampler combines characteristics of the retrospective sampler (Papaspiliopoulos and Roberts, 2008) and the slice sampler (Walker, 2007).

For concreteness, we focus on a functional data analysis model with $y_{it} \sim t_{\upsilon}(f_i(x_{it}), \tau)$, where $t_{\upsilon}(\mu, \tau)$ denotes the t-density centered on $\mu$, with $\upsilon$ degrees of freedom and scale parameter $\tau$. In addition, $f_i(x)$ follows (1) with $\theta_i \sim P$ and $P \sim \mathrm{KPP}(\alpha, \beta, \psi, P_0)$, focusing on the sparse formulation and letting $P_0 = \bigotimes_{j=1}^{p} P_{0j}$, where $P_{0j}$ denotes a Cauchy prior centered on zero. The Cauchy prior is an appealing choice for robust shrinkage of the basis coefficients, as small coefficients will tend to be shrunk very close to zero, while larger coefficients fall in the tails (Bhuiyan et al., 2007). To complete a Bayes specification, let $\alpha \sim \mathrm{gamma}(a_{\alpha}, b_{\alpha})$, $\beta \sim \mathrm{gamma}(a_{\beta}, b_{\beta})$, $\psi \sim \mathrm{gamma}(a_{\psi}, b_{\psi})$, $\upsilon \sim \mathrm{gamma}(a_{\upsilon}, b_{\upsilon})$, and $\tau \sim \mathrm{gamma}(a_{\tau}, b_{\tau})$. From West (1987), the t-density can be expressed as a scale mixture of Gaussians, which results in $y_{it} \sim N(b_{it}'\theta_i,\ \tau^{-1}\phi_{it}^{-1})$, with $b_{it} = b(x_{it})$, $\phi_{it} \sim \mathrm{gamma}(\upsilon/2, \upsilon/2)$, $P_{0j} \equiv N(0, \kappa^{-1})$ and $\kappa \sim \mathrm{gamma}(1/2, 1/2)$.
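The scale-mixture representation used above is easy to check by simulation. The sketch below (illustrative, with assumed parameter values and function name) generates t-distributed errors exactly through the latent gamma precisions.

```python
import numpy as np

rng = np.random.default_rng(4)

# t_upsilon(mu, tau) errors via the West (1987) scale mixture of normals:
#   y | phi ~ N(mu, 1/(tau * phi)),  phi ~ gamma(upsilon/2, rate upsilon/2).
def rt_scale_mixture(mu, tau, upsilon, size):
    phi = rng.gamma(upsilon / 2.0, 2.0 / upsilon, size=size)   # scale 2/upsilon corresponds to rate upsilon/2
    return rng.normal(mu, 1.0 / np.sqrt(tau * phi))

y = rt_scale_mixture(mu=0.0, tau=1.0, upsilon=4.0, size=10_000)
```

Conditioning on the latent $\phi_{it}$ turns the t likelihood into a Gaussian one, which is what makes the conjugate updates in the algorithm below available.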

Letting $S_{ij} \in \{1, \ldots, r\}$ denote the location $\gamma_{ij}$ is allocated to, the algorithm proceeds as follows:

1. Update $S_{ij}$ for all $i, j$ from the multinomial conditional posterior, with

$$\Pr(S_{ij} = h \mid -) = \frac{\omega_{jh} \prod_{t=1}^{n_i} N\big(y_{it};\ b_{it}'\Theta_{\gamma_i(S_{ij}=h)},\ \tau^{-1}\phi_{it}^{-1}\big)}{\sum_{l=1}^{r} \omega_{jl} \prod_{t=1}^{n_i} N\big(y_{it};\ b_{it}'\Theta_{\gamma_i(S_{ij}=l)},\ \tau^{-1}\phi_{it}^{-1}\big)}, \qquad h = 1, \ldots, r, \qquad (8)$$

where $\Theta_{\gamma_i(S_{ij}=h)}$ equals $(\Theta_{\gamma_{i1} 1}, \ldots, \Theta_{\gamma_{ip} p})'$ but with the current value of $\gamma_{ij}$ set to $\phi_{ih}$.

2. To update $\lambda_h$, for $h = 1, \ldots, r$, use a data augmentation approach related to Holmes and Held (2006) and Dunson, Pillai and Park (2007). Letting $K_{jh} = K_{\psi}(z_j, z^*_h)$ and $\tilde{K}_{jh} = K_{jh} / (\sum_{l \neq h} \lambda_l K_{jl})$, the conditional likelihood for $\lambda_h$ is

$$L(\lambda_h) = \prod_{j=1}^{p} \left(\frac{\lambda_h \tilde{K}_{jh}}{1 + \lambda_h \tilde{K}_{jh}}\right)^{\sum_i 1(S_{ij} = h)} \left(\frac{1}{1 + \lambda_h \tilde{K}_{jh}}\right)^{\sum_i 1(S_{ij} \neq h)},$$

which can be obtained via $1(S_{ij} = h) = 1(Z^*_{ijh} > 0)$, with $Z^*_{ijh} \sim \mathrm{Poisson}(\lambda_h \xi_{ijh} \tilde{K}_{jh})$ and $\xi_{ijh} \sim \exp(1)$. Update $\{Z^*_{ijh}, \xi_{ijh}\}$ and $\{\lambda_h\}$ in Gibbs steps (a code sketch of this update is given at the end of this section):

(a) Let $Z^*_{ijh} = 0$ if $S_{ij} \neq h$ and otherwise $Z^*_{ijh} \sim \mathrm{Poisson}(\lambda_h \xi_{ijh} \tilde{K}_{jh})\, 1(Z^*_{ijh} > 0)$.

(b) $\xi_{ijh} \sim \mathrm{gamma}(1 + Z^*_{ijh},\ 1 + \lambda_h \tilde{K}_{jh})$.

(c) $\lambda_h \sim \mathrm{gamma}\big(\beta/r + \sum_{i,j} Z^*_{ijh},\ 1 + \sum_{i,j} \xi_{ijh} \tilde{K}_{jh}\big)$.

3. Update $\psi, \beta, \upsilon$ using Metropolis steps.

4. Update $\phi_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, n_i$, and $\tau$ by sampling from gamma full conditionals

$$(\phi_{it} \mid -) \sim \mathrm{gamma}\left(\frac{\upsilon + 1}{2},\ \frac{\upsilon + \tau (y_{it} - b_{it}'\theta_i)^2}{2}\right), \qquad (\tau \mid -) \sim \mathrm{gamma}\left(a_{\tau} + \frac{1}{2}\sum_{i=1}^{n} n_i,\ b_{\tau} + \frac{1}{2}\sum_{i=1}^{n}\sum_{t=1}^{n_i} \phi_{it}(y_{it} - b_{it}'\theta_i)^2\right).$$

5. Implement exact block Gibbs sampler steps:

(a) Sample $u_{ih} \sim \mathrm{uniform}(0, \nu_{\phi_{ih}})$, for $i = 1, \ldots, n$ and $h = 1, \ldots, r$, with $\nu_l = \nu^*_l \prod_{m<l}(1 - \nu^*_m)$.

(b) Sample the stick-breaking random variables,

$$\nu^*_l \sim \mathrm{beta}\left(1 + \sum_{h=1}^{r}\sum_{i=1}^{n} 1(\phi_{ih} = l),\ \alpha + \sum_{h=1}^{r}\sum_{i=1}^{n} 1(\phi_{ih} > l)\right),$$

for $l = 1, \ldots, \phi^*$, with $\phi^*$ the minimum value satisfying $\nu_1 + \cdots + \nu_{\phi^*} > 1 - \min\{u_{ih}\}$.

(c) Update $\Theta_l$, for $l = 1, \ldots, \phi^*$, from $N_p(\hat{\Theta}_l, \Sigma_{\Theta_l})$ with

$$\hat{\Theta}_l = \Sigma_{\Theta_l} \sum_{i=1}^{n}\sum_{t=1}^{n_i} \tau \phi_{it} \Gamma_{il}\, b_{it} (y_{it} - b_{it}'\Gamma_{i(-l)}\theta_i), \qquad \Sigma_{\Theta_l} = \left(\kappa I_p + \sum_{i=1}^{n}\sum_{t=1}^{n_i} \tau \phi_{it} \Gamma_{il}\, b_{it} b_{it}' \Gamma_{il}\right)^{-1},$$

where $I_p$ is the $p \times p$ identity matrix, $\Gamma_{il} = \mathrm{diag}\{1(\gamma_{i1} = l), \ldots, 1(\gamma_{ip} = l)\}$ and $\Gamma_{i(-l)} = \mathrm{diag}\{1(\gamma_{i1} \neq l), \ldots, 1(\gamma_{ip} \neq l)\}$.

(d) Update $\phi_{ih}$, $i = 1, \ldots, n$, $h = 1, \ldots, r$, from the multinomial conditional with

$$\Pr(\phi_{ih} = l \mid -) \propto 1(u_{ih} < \nu_l) \prod_{t=1}^{n_i} N\left(y^{(h)}_{it};\ \sum_{j=1}^{p} 1(S_{ij} = h)\, b_{itj}\, \Theta_{lj},\ \tau^{-1}\phi_{it}^{-1}\right), \qquad l = 1, \ldots, \phi^*,$$

where $y^{(h)}_{it} = y_{it} - \sum_{j=1}^{p} 1(S_{ij} \neq h)\, b_{itj}\, \theta_{ij}$.

6. Update $\alpha$ by sampling from the conditional posterior

$$(\alpha \mid -) \sim \mathrm{gamma}\left(a_{\alpha} + \phi^*,\ b_{\alpha} - \sum_{l=1}^{\phi^*} \log(1 - \nu^*_l)\right).$$

7. Update $\kappa$ by sampling from the conditional posterior

$$(\kappa \mid -) \sim \mathrm{gamma}\left(\frac{1}{2} + \frac{p\phi^*}{2},\ \frac{1}{2} + \frac{1}{2}\sum_{l=1}^{\phi^*}\sum_{j=1}^{p} \Theta_{lj}^2\right).$$

This algorithm is straightforward to program and has exhibited good rates of convergence and mixing in simulated and real data applications we have considered. We also considered using the slice sampler of Walker (2007) in place of the exact block Gibbs sampling steps, but this resulted in considerably slower rates of convergence and mixing.
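As one example of how the steps translate into code, the following sketch implements the data-augmentation update of Step 2 for the popularity scores $\lambda_h$. This is a sketch under assumed inputs, not the authors' implementation; the kernel matrix, dimensions, seed and the rejection sampler for the zero-truncated Poisson are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

def rpois_positive(rate):
    """Poisson(rate) draw conditioned to be > 0 (simple rejection, adequate for a sketch)."""
    while True:
        z = rng.poisson(rate)
        if z > 0:
            return z

def update_lambda(S, K, lam, beta, xi=None):
    """One sweep of Step 2: latent (Z*, xi) augmentation and conjugate gamma update of lambda_h.

    S is the (n, p) matrix of current restaurant allocations S_ij in {0,...,r-1};
    K[j, h] = K_psi(z_j, z*_h); lam is the current (r,) vector of popularity scores."""
    n, p = S.shape
    r = len(lam)
    if xi is None:
        xi = rng.exponential(1.0, size=(n, p, r))        # latent xi_ijh, carried across iterations
    for h in range(r):
        denom = K @ lam - lam[h] * K[:, h]               # sum_{l != h} lambda_l K_{jl}
        Ktil = K[:, h] / denom                           # tilde K_{jh}
        rate = lam[h] * xi[:, :, h] * Ktil[None, :]
        Z = np.zeros((n, p))
        for i, j in zip(*np.where(S == h)):              # (a): Z*_ijh > 0 only where S_ij = h
            Z[i, j] = rpois_positive(rate[i, j])
        xi[:, :, h] = rng.gamma(1.0 + Z, 1.0 / (1.0 + lam[h] * Ktil[None, :]))     # (b)
        lam[h] = rng.gamma(beta / r + Z.sum(),
                           1.0 / (1.0 + (xi[:, :, h] * Ktil[None, :]).sum()))      # (c)
    return lam, xi

n, p, r, beta = 20, 10, 10, 1.0
z = np.linspace(0.0, 1.0, p)
K = np.exp(-25.0 * (z[:, None] - z[None, :]) ** 2)       # K_psi(z_j, z*_h), illustrative bandwidth
lam = rng.gamma(beta / r, 1.0, size=r)
S = rng.integers(0, r, size=(n, p))
lam, xi = update_lambda(S, K, lam, beta)
```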

Adaptations for more complex random effects models, which include fixed and random effects and other complications, are trivial by embedding steps similar to those outlined above in an MCMC algorithm that includes additional steps to update fixed effects and any additional unknowns involved in the more complex model.

4. Simulation Example

To assess the performance of the approach, we start by considering a simulation example. We simulated data for $n = 100$ subjects under the functional data analysis model described in Section 3, with fixed values of $\upsilon$ and $\tau$, $\mathcal{X} = (0, 1)$, $b_1(x) = 1$, $b_{j+1}(x) = \exp\{-5(x - z_j)^2\}$ and $z_j = (j-1)/9$, for $j = 1, \ldots, 9$. In addition, we let $n_i$ equal a fixed constant plus a discrete uniform random variable, for $i = 1, \ldots, n$, and simulated $t_{ij} \sim \mathrm{Uniform}(0, C_i)$, with $C_i = 1$ for the first 50 subjects and $C_i$ equal to a smaller cutoff for the second 50 subjects, so that the second group has no observations in the right-hand portion of $[0, 1]$. We let $\Theta_h = (\Theta_{h1}, \ldots, \Theta_{hp})'$, for $h = 1, 2, 3$, with the elements $\Theta_{hl}$ drawn independently from a mixture placing probability 0.8 on zero and the remaining probability 0.2 on a normal distribution. Then, letting $\theta_{ij} = \Theta_{\gamma_{ij} j}$, for $j = 1, \ldots, p$, we simulated the elements of $\gamma_i = (\gamma_{i1}, \ldots, \gamma_{ip})'$ from a Markov chain, with $\gamma_{i1}$ set to a random element of $\{1, 2, 3\}$, $\gamma_{ij} = \gamma_{i,j-1}$ with probability 0.9 and $\gamma_{ij}$ otherwise set to a random element of $\{1, 2, 3\}$ (a code sketch of this design is given below).

To implement our Bayesian analysis, the approach of Section 3 was applied after choosing gamma priors for each of the parameters, with default hyperparameter values including $a_{\psi} = 5$. These were considered reasonable default values when $\mathcal{X} = (0, 1)$ and the overall variance of $y$ is within a modest factor of one. In the absence of prior knowledge of the scale of $y$, one can standardize the data. In addition, we let $K_{\psi}(z, z') = \exp\{-\psi(z - z')^2\}$ and $z^*_j = z_j$, for $j = 1, \ldots, 9$. Note that these values of $z_j$, $z^*_j$, and $p$ provide a reasonable default for smooth curves, as there are sufficient numbers of equally-spaced kernels to capture a very wide variety of smooth curve shapes. The results are robust to increases in the number of kernels due to the adaptive shrinkage resulting from allowing the data to inform about $\kappa$ and $\beta$.
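The design above can be sketched as follows. This is a reconstruction for illustration only: the per-subject sample sizes, error scale and degrees of freedom, kernel bandwidth, truncation point for the second group and the transition probability are assumed values chosen to mimic the description.

```python
import numpy as np

rng = np.random.default_rng(7)

n, p, n_clust = 100, 10, 3                        # illustrative sizes
z = np.linspace(0.0, 1.0, p - 1)                  # kernel locations for b_2,...,b_p

def basis(x):                                     # b_1(x) = 1, b_{j+1}(x) = exp{-5 (x - z_j)^2}
    x = np.atleast_1d(x)
    return np.hstack([np.ones((len(x), 1)), np.exp(-5.0 * (x[:, None] - z[None, :]) ** 2)])

# sparse unique coefficient vectors and a sticky Markov chain over local cluster indices
Theta = np.where(rng.random((n_clust, p)) < 0.8, 0.0, rng.normal(size=(n_clust, p)))
gamma = np.empty((n, p), dtype=int)
gamma[:, 0] = rng.integers(0, n_clust, size=n)
for j in range(1, p):
    stay = rng.random(n) < 0.9
    gamma[:, j] = np.where(stay, gamma[:, j - 1], rng.integers(0, n_clust, size=n))
theta = Theta[gamma, np.arange(p)]                # theta_ij = Theta_{gamma_ij, j}

# unequally spaced observation times; the second half of subjects observed only on [0, C_i]
C = np.where(np.arange(n) < n // 2, 1.0, 0.5)     # truncation point for the second group (illustrative)
data = []
for i in range(n):
    n_i = 5 + rng.integers(0, 11)                 # number of observations per subject (illustrative)
    t = rng.uniform(0.0, C[i], size=n_i)
    y = basis(t) @ theta[i] + rng.standard_t(4, size=n_i) / np.sqrt(10.0)   # t errors, assumed scale
    data.append((t, y))
```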

The MCMC algorithm was run for several thousand iterations, including a burn-in, with the chain thinned due to storage constraints. Based on examination of trace plots of the parameters and function estimates at a variety of points for different subjects, we observed rapid rates of apparent convergence and mixing. Note that it is important to avoid diagnosing convergence based on examination of the unique coefficient vectors, $\Theta_h$, due to the well-known label switching issue, which is omnipresent in mixture models (Jasra, Holmes and Stephens, 2005). This label switching does not create problems as long as the focus of inference is not on mixture component-specific quantities and one obtains good rates of convergence and mixing in quantities of interest, such as individual-specific function estimates. The focus of this article is on using local partitioning as a highly flexible approach for sparse modeling of unknown random effects distributions underlying functional data and not on clustering as a focus of the analysis.

Figure 3 shows the data, estimated posterior mean curves (solid lines), 95% pointwise credible intervals (dashed lines) and true curves (dotted lines) for subjects 1, ..., 8 (top 8 panels) and subjects 51, ..., 58 (bottom 8 panels). Recall that subjects 1, ..., 50 have observations throughout the [0, 1] interval, while subjects 51, ..., 100 have no observations in the $[C_i, 1]$ interval. For subjects 1, ..., 50 the estimates are very close to the true curves, while there is some deviation beyond the point $C_i$ for some of subjects 51, ..., 100. However, in general, predictions across this interval are quite good, with the true curves enclosed in the credible bounds. The average mean square error across a dense grid of times between 0 and 1 is .56 and the mean width of 95% credible intervals is .96.

Repeating the analysis for the LPP prior of Dunson (2008), the results are very good at locations close to data points. However, interpolations and predictions are not as good as for the KPP. For example, for one of the displayed subjects there is a substantial gap in the observations; across this gap, the 95% credible intervals are much wider in the LPP analysis. In addition, across the $[C_i, 1]$ region for many of subjects 51, ..., 100, the LPP estimates are substantially farther from the truth than the KPP-based estimates. The corresponding mean square error for the LPP is larger.

5. Application to Hormone Curve Data

We consider progesterone curve data in early pregnancy previously analyzed by Dunson (2008) using a functional data analysis model with kernel basis functions and t-distributed measurement errors. That article demonstrated improved performance for a local partition process (LPP) prior on the random effects distribution relative to a DP. Data consisted of daily urinary measurements of PdG, a metabolite of progesterone, starting on the identified day of ovulation and continuing for a number of additional days. There were 65 women, with varying numbers of measurements per woman.

To analyze these data using the KPP prior for the random effects distribution, we implemented the approach used in Section 4, with the same prior specification. As in the simulation example, rates of apparent convergence and mixing were good. In examining plots of the individual curve estimates for each of the 65 women, it is apparent that the estimates fit the data very well. Figure 4 shows the estimated curves and 95% pointwise credible intervals for 6 randomly selected women. There is a small but notable improvement in fit in using the KPP prior for the random effects distribution instead of the LPP prior. The improvement in fit is not attributable to over-fitting, as it is clear that a very sparse representation of the data is obtained in examining the estimated parameters in Table 1. In particular, the small $\alpha$ value suggests that a small number of unique coefficient vectors are sufficient to characterize the data. Indeed, the estimate of $\phi^*$ was small, implying that the basis coefficient vectors for all 65 women are constituted of elements selected from only a few unique coefficient vectors. In addition, the small value of $\beta$ suggests the occurrence of a few dominant locations with much higher $\lambda_h$ values, again leading to a sparse characterization.

A primary motivation for the KPP over the LPP is that the incorporation of information on the relative locations of the basis functions should allow substantial improvements in prediction. Although the LPP is flexible enough to provide a good fit to complex functional data, one may expect diminished predictive performance when there is no observed data available for a subject at times close to the time of interest. To assess predictive performance, we repeated the analysis holding out the last 5 observations for 5 women randomly selected from among the women having a sufficient number of observations. Figure 5 shows the true values of $y_{it}$ for these held-out observations versus the predicted values, with 95% predictive intervals shown with light dotted lines. The correlation between the true and predicted values was .866 and the mean square predictive error was .5. Given that hormone trajectories are difficult to predict more than a few days out, this is very good performance. For the first held-out observation the correlation was very high at .99, while for the second to fifth observations the correlations were .898, .765, .7 and .685, respectively.

For comparison, we repeated the analysis using the LPP prior with the same held-out observations. Figure 6 shows the true values of $y_{it}$ versus the predicted values. The results are clearly not as good as those shown in Figure 5 for the KPP, with a substantial subset of the points moving much further away from the line. The correlation between $y_{it}$ and $\hat{y}_{it}$ diminished to .795, the mean square predictive error increased, and the correlations for observations 1-5 were .99, .85, .58, .5, and .89, respectively. As expected, the LPP has good predictive performance when the subject has data available close to the time of interest, but the performance decays rapidly with an increasing time gap. Repeating the analysis also for a DP prior on the random effects, the mean square predictive error was .9 and the correlation between $y_{it}$ and $\hat{y}_{it}$ was .68, suggesting substantially worse predictive performance than for either the KPP or LPP.

6. Discussion

This article has proposed a new nonparametric Bayes prior for unknown random effects distributions, allowing for flexible local borrowing of information across subjects through dependent local partitioning. The KPP has clear advantages over previous nonparametric Bayes methods based on global partitioning, such as the Dirichlet process. In particular, the KPP favors a sparser representation of the data in allowing subjects to have identical basis coefficients without forcing their functions to be identical. Although functional clustering is a useful exploratory tool, it seems unlikely that functions for two different subjects are exactly identical, though the functions may be locally similar within particular regions. Our goal is not to obtain functional clusters but to use local partitioning as a tool for sparsely characterizing complex functional data, while facilitating borrowing of information in a flexible manner.

In addition to sparse characterization of unknown random effects distributions, there are clear applications of the KPP to multiple changepoint detection and image segmentation. In such settings, it is useful to consider a minor modification of the formulation to replace the vector $\Theta_h = (\Theta_{h1}, \ldots, \Theta_{hp})'$ with a scalar $\Theta_h$, letting $\theta_{ij} = \Theta_{\gamma_{ij}}$, for $j = 1, \ldots, p$. Then, using piecewise constant or linear basis functions, so that $z = (z_1, \ldots, z_p)'$ is a vector of potential knot locations, a changepoint in $f_i$ occurs at $z_j$ if $\gamma_{ij} \neq \gamma_{i,j-1}$. The KPP will automatically borrow information on changepoint locations within- and across-subjects using a more flexible dependence structure than commonly-used Markov models. In addition, computation is straightforward using the proposed MCMC algorithm.
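For the changepoint use just described, the mapping from local cluster indices to changepoint locations is immediate; a tiny sketch with hypothetical inputs:

```python
import numpy as np

def changepoints(gamma_i, knots):
    """Changepoint locations implied by a subject's local cluster indices:
    a changepoint occurs at z_j whenever gamma_ij differs from gamma_{i,j-1}."""
    gamma_i = np.asarray(gamma_i)
    return knots[1:][gamma_i[1:] != gamma_i[:-1]]

knots = np.linspace(0.0, 1.0, 10)
print(changepoints([0, 0, 0, 2, 2, 1, 1, 1, 1, 1], knots))   # -> [0.333..., 0.555...]
```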

Appendix

Proof of Theorem 1. Theorem 1 follows in a straightforward manner from Lemma 1, so it suffices to prove Lemma 1. Letting $k^{(n)} = \max\{\gamma_i\}_{i=1}^{n}$ and $n_h = \sum_{i=1}^{n} 1(\gamma_i = h)$, we have

$$\Pr(\gamma_1, \ldots, \gamma_n) = E\left[\prod_{h=1}^{k^{(n)}} \left\{\pi^*_h \prod_{l<h}(1 - \pi^*_l)\right\}^{n_h}\right] = \prod_{h=1}^{k^{(n)}} E\left\{(\pi^*_h)^{n_h} (1 - \pi^*_h)^{\sum_{l=h+1}^{k^{(n)}} n_l}\right\} = \prod_{h=1}^{k^{(n)}} \frac{\mathrm{Be}\big(n_h + 1,\ \sum_{l=h+1}^{k^{(n)}} n_l + \alpha\big)}{\mathrm{Be}(1, \alpha)},$$

where $\mathrm{Be}(\cdot)$ is the beta function. Letting $k^{(n+1)} = \max\{k^{(n)}, m\}$, $a_h = n_h + 1$, and $b_h = \sum_{l=h+1}^{k^{(n)}} n_l + \alpha$, we have

$$\Pr(\gamma_1, \ldots, \gamma_n, \gamma_{n+1} = m) = \prod_{h=1}^{k^{(n+1)}} \frac{\mathrm{Be}\big(a_h + 1(m = h),\ b_h + 1(m > h)\big)}{\mathrm{Be}(1, \alpha)}.$$

Noting that $\mathrm{Be}(a+1, b)/\mathrm{Be}(a, b) = a/(a+b)$ and $\mathrm{Be}(a, b+1)/\mathrm{Be}(a, b) = b/(a+b)$ from properties of the beta function, Lemma 1 is obtained as the ratio of the previous two expressions,

$$\Pr(\gamma_{n+1} = m \mid \gamma^{(n)}) = \left\{\prod_{h=1}^{m-1} \left(\frac{b_h}{a_h + b_h}\right)\right\} \frac{a_m}{a_m + b_m}, \qquad m = 1, \ldots, \infty.$$

Proof of Proposition 1. Part (i) is the probability that individuals $(i, j)$ and $(i, j')$ choose the same restaurant, which is trivially calculated as $\omega_j'\omega_{j'}$, plus the probability that these individuals choose different restaurants, $1 - \omega_j'\omega_{j'}$, multiplied by the probability of being assigned to the same numbered table at different restaurants,

$$\sum_{m=1}^{\infty} \left\{\left(\frac{1}{1+\alpha}\right) \prod_{t<m}\left(\frac{\alpha}{1+\alpha}\right)\right\}^2 = \left(\frac{1}{1+\alpha}\right)^2 \sum_{m=1}^{\infty} \left\{\left(\frac{\alpha}{1+\alpha}\right)^2\right\}^{m-1} = \frac{1}{1 + 2\alpha},$$

where the last identity is a consequence of the infinite geometric series identity. For part (ii), $\Pr(\gamma_{ij} = \gamma_{i'j'})$ equals the probability that the individuals in households $i$ and $i'$ choose the same restaurant and are seated at the same table, $\omega_j'\omega_{j'}/(1 + \alpha)$, plus the probability that they choose different restaurants and are assigned the same numbered table, $(1 - \omega_j'\omega_{j'})/(1 + 2\alpha)$.
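The matching probability $1/(1 + 2\alpha)$ used above can be verified with a quick Monte Carlo check (a sketch, not part of the paper): two customers seated independently at two restaurants with independent stick-breaking weights share a table number with about this frequency.

```python
import numpy as np

rng = np.random.default_rng(8)

def table_from_sticks(alpha):
    """Draw a table index from a fresh stick-breaking ocrp seating (lazily generated sticks)."""
    m, remaining = 0, 1.0
    u = rng.uniform()
    while True:
        v = rng.beta(1.0, alpha)
        if u < remaining * v:           # u falls in the mass of table m
            return m
        u -= remaining * v
        remaining *= 1.0 - v
        m += 1

alpha, reps = 1.0, 200_000
match = sum(table_from_sticks(alpha) == table_from_sticks(alpha) for _ in range(reps)) / reps
print(match, 1.0 / (1.0 + 2.0 * alpha))   # both approximately 1/3 when alpha = 1
```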

References

Aldous, D.J. (1985). Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XII (Edited by P.L. Hennequin). Springer Lecture Notes in Mathematics, Vol. 7.

Behseta, S., Kass, R.E. and Wallstrom, G.L. (2005). Hierarchical models for assessing variability among functions. Biometrika, 92, 419-434.

Bigelow, J.L. and Dunson, D.B. (2007). Bayesian adaptive regression splines for hierarchical data. Biometrics, 63, 724-732.

Bhuiyan, M.I.H., Ahmad, M.O. and Swamy, M.N.S. (2007). Spatially adaptive wavelet-based method using the Cauchy prior for denoising SAR images. IEEE Transactions on Circuits and Systems for Video Technology, 17.

Bush, C.A. and MacEachern, S.N. (1996). A semiparametric Bayesian model for randomised block designs. Biometrika, 83, 275-285.

De Iorio, M., Müller, P., Rosner, G.L. and MacEachern, S.N. (2004). An ANOVA model for dependent random measures. Journal of the American Statistical Association, 99, 205-215.

De la Cruz-Mesia, R., Quintana, F.A. and Müller, P. (2007). Semiparametric Bayesian classification with longitudinal markers. Applied Statistics, 56, 119-137.

Dunson, D.B. (2008). Nonparametric Bayes local partition models for random effects. Biometrika, to appear.

Dunson, D.B. and Park, J.-H. (2008). Kernel stick-breaking processes. Biometrika, 95, 307-323.

Dunson, D.B., Pillai, N. and Park, J.-H. (2007). Bayesian density regression. Journal of the Royal Statistical Society B, 69, 163-183.

Dunson, D.B., Xue, Y. and Carin, L. (2008). The matrix stick-breaking process: Flexible Bayes meta-analysis. Journal of the American Statistical Association, 103, 317-327.

Holmes, C.C. and Held, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1.

Ishwaran, H. and James, L.F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96, 161-173.

Ishwaran, H. and James, L.F. (2003). Generalized weighted Chinese restaurant processes for species sampling mixture models. Statistica Sinica, 13, 1211-1235.

Jasra, A., Holmes, C.C. and Stephens, D.A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20.

Kaufman, C. and Sain, S. (2007). Bayesian functional ANOVA modeling using Gaussian process prior distributions. Journal of Computational and Graphical Statistics, submitted.

Kleinman, K.P. and Ibrahim, J.G. (1998). A semiparametric Bayesian approach to the random effects model. Biometrics, 54.

MacEachern, S.N. (1999). Dependent nonparametric processes. In ASA Proceedings of the Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association.

Morris, J.S. and Carroll, R.J. (2006). Wavelet-based functional mixed models. Journal of the Royal Statistical Society B, 68.

Müller, P. and Rosner, G.L. (1997). A Bayesian population model with hierarchical mixture priors applied to blood count data. Journal of the American Statistical Association, 92.

Nguyen, X. and Gelfand, A.E. (2008). The Dirichlet labeling process for functional data analysis. Technical Report, Department of Statistical Science, Duke University.

Papaspiliopoulos, O. (2008). A note on posterior sampling from Dirichlet process mixture models. Working Paper, Centre for Research in Statistical Methodology, University of Warwick, Coventry, UK.

Papaspiliopoulos, O. and Roberts, G. (2008). Retrospective MCMC for Dirichlet process hierarchical models. Biometrika, 95.

Petrone, S., Guindani, M. and Gelfand, A.E. (2008). Hybrid Dirichlet processes for functional data. Journal of the Royal Statistical Society B, revision invited.

Ray, S. and Mallick, B. (2006). Functional clustering by Bayesian wavelet methods. Journal of the Royal Statistical Society B, 68, 305-332.

Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639-650.

Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 211-244.

Thompson, W. and Rosen, O. (2008). A Bayesian model for sparse functional data. Biometrics, 64, 54-63.

Walker, S.G. (2007). Sampling the Dirichlet mixture model with slices. Communications in Statistics: Simulation and Computation, 36, 45-54.

West, M. (1987). On scale mixtures of normal distributions. Biometrika, 74, 646-648.

Table 1. Posterior summaries of parameters in the KPP analysis of the hormone curve data.

Parameter   Mean   Median   SD    95% CI
α           .      .8       .6    [.8, .]
β           .      .        .7    [.8, .5]
τ                                 [.7, .7]
ψ           .      .        .9    [.8, 6.56]
υ           .6     .6       .     [., .]
κ                                 [.6, 78.6]
φ*                                [., 9.5]

Fig. 1. Samples of $\Pr(\gamma_{ij} = \gamma_{ij'})$ under the kernel partition process with various choices of the hyperparameters (four panels plotting $\rho$ against time for different settings of $\alpha$, $\beta$ and $\psi$). We let $z^*_j = z_j$, for $j = 1, \ldots, p$, correspond to an equally-spaced grid of times between 0 and 1, with $K(x, z_j) = \exp\{-\psi(x - z_j)^2\}$.

Fig. 2. Samples from the prior for $f_i$ under the functional data model (1) with a KPP prior on the distribution of the basis coefficients across subjects, for different choices of the hyperparameter values (four panels plotting $f_i(t)$ against time $t$ for different settings of $\alpha$, $\beta$ and $\psi$).

Fig. 3. Results for the simulation example. The upper 8 plots are for subjects having observations distributed randomly in [0, 1], while the lower 8 plots are for subjects having observations only in an initial subinterval of [0, 1]. The solid lines are posterior mean curves, dashed lines are 95% pointwise credible intervals, and dotted lines are true curves.

Fig. 4. Log(PdG) data and KPP-based function estimates for 6 randomly selected women. The data points are plotted individually, the posterior means are solid lines, and 95% credible intervals are dashed lines.

Fig. 5. Out-of-sample predictive performance for the KPP (predicted values plotted against the true values of $y_{it}$). The last 5 log(PdG) observations for 5 women randomly selected from those with a sufficient number of observations were held out.

Fig. 6. Out-of-sample predictive performance for the LPP (Dunson, 2008), shown as predicted values plotted against the true values of $y_{it}$. The same observations were held out as in Figure 5.


More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Nonparametric Bayesian Models --Learning/Reasoning in Open Possible Worlds Eric Xing Lecture 7, August 4, 2009 Reading: Eric Xing Eric Xing @ CMU, 2006-2009 Clustering Eric Xing

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles Peter Green and John Lau University of Bristol P.J.Green@bristol.ac.uk Isaac Newton Institute, 11 December

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information

Stochastic Processes, Kernel Regression, Infinite Mixture Models

Stochastic Processes, Kernel Regression, Infinite Mixture Models Stochastic Processes, Kernel Regression, Infinite Mixture Models Gabriel Huang (TA for Simon Lacoste-Julien) IFT 6269 : Probabilistic Graphical Models - Fall 2018 Stochastic Process = Random Function 2

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models

A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Journal of Data Science 8(2010), 43-59 A Nonparametric Approach Using Dirichlet Process for Hierarchical Generalized Linear Mixed Models Jing Wang Louisiana State University Abstract: In this paper, we

More information

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Ian Porteous, Alex Ihler, Padhraic Smyth, Max Welling Department of Computer Science UC Irvine, Irvine CA 92697-3425

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information

Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Flexible Regression Modeling using Bayesian Nonparametric Mixtures Flexible Regression Modeling using Bayesian Nonparametric Mixtures Athanasios Kottas Department of Applied Mathematics and Statistics University of California, Santa Cruz Department of Statistics Brigham

More information

Non-parametric Clustering with Dirichlet Processes

Non-parametric Clustering with Dirichlet Processes Non-parametric Clustering with Dirichlet Processes Timothy Burns SUNY at Buffalo Mar. 31 2009 T. Burns (SUNY at Buffalo) Non-parametric Clustering with Dirichlet Processes Mar. 31 2009 1 / 24 Introduction

More information

A nonparametric Bayesian approach to inference for non-homogeneous. Poisson processes. Athanasios Kottas 1. (REVISED VERSION August 23, 2006)

A nonparametric Bayesian approach to inference for non-homogeneous. Poisson processes. Athanasios Kottas 1. (REVISED VERSION August 23, 2006) A nonparametric Bayesian approach to inference for non-homogeneous Poisson processes Athanasios Kottas 1 Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California,

More information

Bayesian semiparametric modeling and inference with mixtures of symmetric distributions

Bayesian semiparametric modeling and inference with mixtures of symmetric distributions Bayesian semiparametric modeling and inference with mixtures of symmetric distributions Athanasios Kottas 1 and Gilbert W. Fellingham 2 1 Department of Applied Mathematics and Statistics, University of

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Modeling and Predicting Healthcare Claims

Modeling and Predicting Healthcare Claims Bayesian Nonparametric Regression Models for Modeling and Predicting Healthcare Claims Robert Richardson Department of Statistics, Brigham Young University Brian Hartman Department of Statistics, Brigham

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

arxiv: v1 [stat.me] 3 Mar 2013

arxiv: v1 [stat.me] 3 Mar 2013 Anjishnu Banerjee Jared Murray David B. Dunson Statistical Science, Duke University arxiv:133.449v1 [stat.me] 3 Mar 213 Abstract There is increasing interest in broad application areas in defining flexible

More information

Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources

Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources th International Conference on Information Fusion Chicago, Illinois, USA, July -8, Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources Priyadip Ray Department of Electrical

More information

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model

Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model Simultaneous inference for multiple testing and clustering via a Dirichlet process mixture model David B Dahl 1, Qianxing Mo 2 and Marina Vannucci 3 1 Texas A&M University, US 2 Memorial Sloan-Kettering

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Nonparametric Drift Estimation for Stochastic Differential Equations

Nonparametric Drift Estimation for Stochastic Differential Equations Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance

More information

November 2002 STA Random Effects Selection in Linear Mixed Models

November 2002 STA Random Effects Selection in Linear Mixed Models November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu Lecture 16-17: Bayesian Nonparametrics I STAT 6474 Instructor: Hongxiao Zhu Plan for today Why Bayesian Nonparametrics? Dirichlet Distribution and Dirichlet Processes. 2 Parameter and Patterns Reference:

More information

Nonparametric Mixed Membership Models

Nonparametric Mixed Membership Models 5 Nonparametric Mixed Membership Models Daniel Heinz Department of Mathematics and Statistics, Loyola University of Maryland, Baltimore, MD 21210, USA CONTENTS 5.1 Introduction................................................................................

More information

A Nonparametric Bayesian Model for Multivariate Ordinal Data

A Nonparametric Bayesian Model for Multivariate Ordinal Data A Nonparametric Bayesian Model for Multivariate Ordinal Data Athanasios Kottas, University of California at Santa Cruz Peter Müller, The University of Texas M. D. Anderson Cancer Center Fernando A. Quintana,

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Hybrid Dirichlet processes for functional data

Hybrid Dirichlet processes for functional data Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA

More information

STAT 518 Intro Student Presentation

STAT 518 Intro Student Presentation STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Gentle Introduction to Infinite Gaussian Mixture Modeling

Gentle Introduction to Infinite Gaussian Mixture Modeling Gentle Introduction to Infinite Gaussian Mixture Modeling with an application in neuroscience By Frank Wood Rasmussen, NIPS 1999 Neuroscience Application: Spike Sorting Important in neuroscience and for

More information

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet

More information

Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings

More information

Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes

Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes Yee Whye Teh (1), Michael I. Jordan (1,2), Matthew J. Beal (3) and David M. Blei (1) (1) Computer Science Div., (2) Dept. of Statistics

More information

Normalized kernel-weighted random measures

Normalized kernel-weighted random measures Normalized kernel-weighted random measures Jim Griffin University of Kent 1 August 27 Outline 1 Introduction 2 Ornstein-Uhlenbeck DP 3 Generalisations Bayesian Density Regression We observe data (x 1,

More information

Efficient adaptive covariate modelling for extremes

Efficient adaptive covariate modelling for extremes Efficient adaptive covariate modelling for extremes Slides at www.lancs.ac.uk/ jonathan Matthew Jones, David Randell, Emma Ross, Elena Zanini, Philip Jonathan Copyright of Shell December 218 1 / 23 Structural

More information

Dependent hierarchical processes for multi armed bandits

Dependent hierarchical processes for multi armed bandits Dependent hierarchical processes for multi armed bandits Federico Camerlenghi University of Bologna, BIDSA & Collegio Carlo Alberto First Italian meeting on Probability and Mathematical Statistics, Torino

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent

More information

Infinite Latent Feature Models and the Indian Buffet Process

Infinite Latent Feature Models and the Indian Buffet Process Infinite Latent Feature Models and the Indian Buffet Process Thomas L. Griffiths Cognitive and Linguistic Sciences Brown University, Providence RI 292 tom griffiths@brown.edu Zoubin Ghahramani Gatsby Computational

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

A Brief Overview of Nonparametric Bayesian Models

A Brief Overview of Nonparametric Bayesian Models A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Hierarchical Dirichlet Processes with Random Effects

Hierarchical Dirichlet Processes with Random Effects Hierarchical Dirichlet Processes with Random Effects Seyoung Kim Department of Computer Science University of California, Irvine Irvine, CA 92697-34 sykim@ics.uci.edu Padhraic Smyth Department of Computer

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

BAYESIAN NONPARAMETRIC MODELLING WITH THE DIRICHLET PROCESS REGRESSION SMOOTHER

BAYESIAN NONPARAMETRIC MODELLING WITH THE DIRICHLET PROCESS REGRESSION SMOOTHER Statistica Sinica 20 (2010), 1507-1527 BAYESIAN NONPARAMETRIC MODELLING WITH THE DIRICHLET PROCESS REGRESSION SMOOTHER J. E. Griffin and M. F. J. Steel University of Kent and University of Warwick Abstract:

More information

Multivariate Bayes Wavelet Shrinkage and Applications

Multivariate Bayes Wavelet Shrinkage and Applications Journal of Applied Statistics Vol. 32, No. 5, 529 542, July 2005 Multivariate Bayes Wavelet Shrinkage and Applications GABRIEL HUERTA Department of Mathematics and Statistics, University of New Mexico

More information