Bayesian Nonparametric Modelling with the Dirichlet Process Regression Smoother

J. E. Griffin and M. F. J. Steel
University of Kent and University of Warwick

Abstract

In this paper we discuss implementing Bayesian fully nonparametric regression by defining a process prior on distributions which depend on covariates. We consider the problem of centring our process over a class of regression models and propose fully nonparametric regression models with flexible location structures. We also introduce a non-trivial extension of a dependent finite mixture model proposed by Chung and Dunson (2007) to a dependent infinite mixture model and propose a specific prior, the Dirichlet Process Regression Smoother, which allows us to control the smoothness of the process. Computational methods are developed for the models described. Results are presented for simulated and actual data examples.

Keywords: Nonlinear regression; Nonparametric regression; Model centring; Stick-breaking prior

1 Introduction

Nonparametric regression offers the ability to accurately model relationships between covariates and responses that are not well represented by simple parametric models. For example, we may have non-linearity in the mean, heteroscedasticity or changing numbers of modes. Here, we take a Bayesian approach. There is a substantial literature on estimation of a nonlinear mean function (see, e.g., Denison et al., 2002, for a review) with parametric errors. There has been some work on non-linear heteroscedasticity, most notably in Kohn et al., by modelling the standard deviation as a function of covariates with parametric errors.

(Jim Griffin acknowledges research support from The Nuffield Foundation grant NUF-NAL/78.)

This approach was extended to errors drawn from a single unknown distribution in Leslie et al. (2007). In this paper, we will develop fully nonparametric models that allow the whole distribution of the response to change with covariates. The challenge is to define models that make a minimal amount of assumptions but allow effective inference.

Assume that a response, y, observed at a covariate value x can be expressed as y = m(x) + σ(x)ε, where ε has expectation 0 and variance 1, and its distribution may depend on x. The parameters m(x) and σ²(x) are the mean and the variance of the response at covariate value x. In Bayesian inference there has been substantial work on fitting this model when ε follows a parametric distribution with constant variance σ²(x) = σ². Many priors for flexible estimation of the mean function have been proposed. Examples include Gaussian processes (Neal 1998) or basis function regression, such as a spline basis or a wavelet basis. Gaussian process priors have a long history in Bayesian nonparametric analysis, dating back to at least O'Hagan (1978), and have recently seen renewed interest in the machine learning community (e.g. Neal 1998). The basis function approach is described in Denison et al. (2002). Wahba (1978) gives a Bayesian interpretation of spline smoothing, which is further explored in Speckman and Sun (2003). Models that relax the assumption of constant σ²(x) are developed in Kohn et al., and Leslie et al. (2007) extend these models to nonparametric inference about the distribution of ε (which is assumed independent of x).

An alternative approach investigated in the literature is so-called density regression or density smoothing, which defines a nonparametric prior for an unknown distribution (random probability measure) that changes with the covariate value. Much recent work on Bayesian density estimation from a sample y_1, y_2, ..., y_n has been based around extending nonparametric mixture models, which are defined by

y_i ~ k(ψ_i), ψ_i ~ F, F ~ Π, (1)

where k(·) is a probability density function (pdf) parameterised by ψ and the prior Π places support across a wide range of distributions. Many choices for Π are possible, including Normalized Random Measures (James et al. 2005) and the Dirichlet Process (Ferguson 1973). Some other methods are reviewed by Müller and Quintana (2004). In this paper we will concentrate on the stick-breaking class of discrete distributions for which

F =_d ∑_{i=1}^∞ p_i δ_{θ_i}, (2)

where δ_θ denotes a Dirac measure at θ and p_i = V_i ∏_{j<i} (1 − V_j). The sequence V_1, V_2, V_3, ... is independent with V_i ~ Be(a_i, b_i), while θ_1, θ_2, θ_3, ... are i.i.d. from some distribution H.
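The stick-breaking construction in (1)-(2) is straightforward to simulate. The sketch below draws a truncated approximation to a single realisation with a_i = 1 and b_i = M (the Dirichlet process case) and a normal centring distribution H; the truncation level, the value of M and the choice of H are illustrative assumptions, not quantities fixed by the paper.

```python
import numpy as np

def stick_breaking_dp(M, H_sampler, n_atoms=1000, rng=None):
    """Truncated stick-breaking draw F = sum_i p_i * delta_{theta_i} with V_i ~ Be(1, M)."""
    rng = rng or np.random.default_rng(0)
    V = rng.beta(1.0, M, size=n_atoms)                           # stick-breaking fractions
    p = V * np.cumprod(np.concatenate(([1.0], 1.0 - V[:-1])))    # p_i = V_i * prod_{j<i} (1 - V_j)
    theta = H_sampler(n_atoms, rng)                              # i.i.d. atoms from the centring H
    return p, theta

# Example: H = N(0, 1); then sample y_i ~ k(psi_i) with k a N(psi, 0.1^2) kernel, as in (1)
p, theta = stick_breaking_dp(M=5.0, H_sampler=lambda n, rng: rng.normal(0.0, 1.0, n))
rng = np.random.default_rng(1)
psi = rng.choice(theta, size=200, p=p / p.sum())                 # psi_i ~ F (renormalised truncation)
y = rng.normal(psi, 0.1)
```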

The Dirichlet process occurs as a special case if a_i = 1 and b_i = M for all i. If we observe pairs (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) and we wish to estimate the conditional distribution of y given x, then a natural extension of (1) and (2) above defines

y_i ~ k(ψ_i), ψ_i ~ F_{x_i}, F_x =_d ∑_{j=1}^∞ p_j(x) δ_{θ_j(x)}. (3)

Many previously defined processes fall within this class: if p_i(x) = p_i then we have the single-p DDP model (MacEachern, 2000), which has been applied to spatial problems (Gelfand et al., 2005) and quantile regression (Kottas and Krnjajic, 2005). Often it is assumed that θ_i(x) follows a Gaussian process. Such models are flexible but make it hard to control specific features of the modelling. Alternatively, we could assume that θ_i(x) is a given function described by θ_i(x) = K(x; φ_i), parameterised by φ_i, and that k(·) is the pdf of a Normal(0, σ²) distribution; an alternative nonparametric representation is then

y_i ~ N(K(x_i; φ_i), σ²), φ_i i.i.d. ~ F, F ~ Π.

An example in regression analysis is the ANOVA-DDP model of De Iorio et al. (2004), who model replicated data from a series of experiments using combinations of treatment and block levels. A standard parametric approach would be ANOVA. The function K(x; φ_i) mimics the form of the group means in an ANOVA analysis by taking a linear combination of treatment effects, block effects and interactions.

Other authors have considered a regression model for p_i(x). Dunson et al. (2007) define dependent measures by taking each measure to be an average of several unknown, latent distributions; allowing the weights to change with covariates allows the distributions to change. Griffin and Steel (2006) (hereafter denoted by GS), Dunson and Park (2008) and Reich and Fuentes (2007) exploit the stick-breaking construction of random measures. The construction of GS transforms an infinite sequence of independent beta random variables V_1, V_2, V_3, ... and an infinite sequence of i.i.d. random variables θ_1, θ_2, θ_3, ... into a distribution F using (2). GS then define dependent Dirichlet processes by permuting the sequence V_1, V_2, V_3, ... at different covariate values x. Dunson and Park (2008) and Reich and Fuentes (2007) use the stick-breaking construction on a finite sequence of random variables on [0,1]. Dependence can be simply introduced by allowing each element of this sequence to depend on x; for example, V_i(x) = K(x, φ_i) W_i, where 0 < K(x, φ_i) < 1 and W_i is beta distributed. This induces dependence in p_i(x). However, it is usually hard to link the correlation of V_i(x) to the correlation of p_i(x), which makes these models difficult to implement in general. Chung and Dunson (2007) define a specific example of this type of process, the Local Dirichlet process, which defines K(x, φ_i) through a ball of radius φ_i around x. In this case, it is simple to relate the correlation in p_i(x) to the properties of the kernel and the choice of the prior for φ_i. One purpose of this paper is to extend this class of models to the nonparametric case where we have an infinite number of atoms.

The methods developed in GS lead to fairly tractable prior distributions which are centred over a single distribution. A second aim of the present paper is to develop hierarchical extensions to exploit the natural idea of centring over a non-trivial model (in other words, allowing the centring distribution to depend on the covariates). Thus, we allow for two sources of dependence on covariates: dependence of the random probability measure on the covariates and dependence of the centring distribution on the same (or other) covariates. Besides extra flexibility, this also provides a framework for assessing the adequacy of commonly used parametric models, by centring over these models. We will also propose a new density regression prior which allows us to control its smoothness, along with a simple Markov chain Monte Carlo (MCMC) algorithm for inference.

The paper is organised in the following way: Section 2 introduces the idea of centring a dependent nonparametric prior over a parametric model and introduces centred models using infinite mixtures of normal and of uniform distributions. The models indicate the suitability of the parametric model for the observed data. Section 3 introduces a new method for constructing Bayesian nonparametric regression estimators, discussing a particular class in detail: the Dirichlet Process Regression Smoother (DPRS). Section 4 briefly discusses computational methods for these hierarchical models defined using the DPRS, with more details of the implementation in Appendix B. Section 5 illustrates the use of these models in inference problems, and a final section concludes. Without specific mention in the text, proofs are grouped in Appendix A.

2 Centring nonparametric models over parametric models

Nonparametric priors for an unknown distribution are usually centred over a parametric distribution. This means that the prior expectation of the mass that the unknown distribution places on any set is given by a parametric distribution. It will be useful to extend this idea to dependent nonparametric priors and think explicitly about how we can implement centring over parametric regression models. The nonparametric prior will model aspects of the conditional distribution that are not well captured by our parametric centring model and, by centring, we can use prior information elicited for the parametric model directly. If the parameters controlling the departures of the fitted distribution from the centring model are given priors, then we can assess how well our parametric model describes the data. The covariates used in the parametric and nonparametric parts of the model are not required to be the same, but x will generally denote the union of all covariates.

Definition 1 A nonparametric model is centred over a parametric model, with parameters θ, if the prior predictive distribution of the nonparametric model conditional on θ coincides with the parametric model for all covariate values.

In this paper we will concentrate on centring over the standard regression model with normally distributed errors

y_i = g(x_i) + ε_i, ε_i ~ N(0, σ²),

where g(x) is a parametric regression function. Centring then refers to defining a nonparametric prior for the distribution of ε_i whose predictive distribution is a zero mean, normal distribution. An infinite mixture model can be naturally extended to depend on covariates through

ε_i ~ k(ε_i | ψ_i), ψ_i ~ F_{x_i}, F_x ~ Π(H, φ), (4)

where Π(H, φ) is the class of dependent discrete random probability measures for which

F_x =_d ∑_{i=1}^∞ p_i(x) δ_{θ_i},

where the p_i(x) are weights such that ∑_{i=1}^∞ p_i(x) = 1, θ_1, θ_2, θ_3, ... are i.i.d. H and φ are parameters governing the behaviour of p_i(x). This class is relatively tractable and it has formed the basis of several density regression priors. We restrict attention to Dirichlet process-based models since these methods dominate in the literature and our approach follows these ideas naturally. We have assumed that the position θ_i does not depend on x. A similar idea was implemented by GS using the πDDP prior. We consider two possible classes: one which defines k to be normal and a second which takes k to be a uniform distribution.

A natural centred nonparametric model assumes that the regression errors ε_i can be modelled by

ε_i ~ N(µ_i, aσ²), µ_i ~ F_{x_i}, F_x ~ Π(H, φ), H = N(0, (1 − a)σ²),

where 0 < a < 1. This model will be denoted here as Model 1(a) and is centred over y_i ~ N(g(x_i), σ²) with the same prior on σ². Note that dependence on the covariates x enters in two different ways: it is used to define ε_i as deviations of y_i from g(x_i), and the nonparametric distribution of the means µ_i depends on x_i through the dependent discrete random probability measure defined in (4). For all values of x the model reduces to the standard Dirichlet process mixture of normals model. The parameterisation of the model is discussed by Griffin (2006), who pays particular attention to prior choice. Many distributional features, such as multimodality, are more easily controlled by a rather than M. Small values of a suggest that the nonparametric modelling is crucial. A uniform prior distribution on a supports a wide range of departures from a normal distribution.
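A quick way to see the centring property of Model 1(a) is to integrate the mixture over the centring distribution H: if µ ~ N(0, (1 − a)σ²) and ε | µ ~ N(µ, aσ²) then, marginally, ε ~ N(0, σ²) whatever the value of a. The Monte Carlo check below is only a sketch; the values of a and σ are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)
a, sigma = 0.3, 1.5
n = 200_000

# Prior predictive of Model 1(a) at a single covariate value:
# mu ~ H = N(0, (1 - a) * sigma^2), then eps | mu ~ N(mu, a * sigma^2).
mu = rng.normal(0.0, np.sqrt((1 - a) * sigma**2), size=n)
eps = rng.normal(mu, np.sqrt(a * sigma**2))

# The marginal should match the centring model N(0, sigma^2).
print(eps.mean(), eps.std())   # approximately 0 and sigma = 1.5
```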

This type of model can capture non-linear relationships between errors and regressors through the changing weights p_i(x). However, the k-th moment of the distribution F_x, µ_x^{(k)} = ∑_{i=1}^∞ p_i(x) θ_i^k, has correlation across covariate values given by Corr(µ_x^{(k)}, µ_y^{(k)}) = (M + 1) ∑_{i=1}^∞ E[p_i(x) p_i(y)], which does not depend on k, and so the correlations of all moments of the distributions at two covariate levels are the same. This seems unsatisfactory for many applications where we would imagine that the mean is much less correlated than the variance (an assumption that underlies much regression modelling where the error variance is considered constant). This relatively crude correlation structure can lead to undersmoothing of the posterior estimates of the distribution. In particular, the posterior mean E[y | x] will often have the step form typical of piecewise constant models. These problems can be addressed by defining the mean of ε_i using flexible regression ideas such as Gaussian process priors (O'Hagan 1978, Rasmussen and Williams 2006) or basis regression models (Denison et al. 2002). If the mean of the errors has a highly non-linear relationship to the regressor then this relationship can be more accurately and parsimoniously modelled using these ideas. This is denoted by Model 1(b) and written as

ε_i − m(x_i) ~ N(µ_i, aσ_1²), µ_i ~ F_{x_i}, F_x ~ Π(H, φ), H = N(0, (1 − a)σ_1²),

where σ² = σ_1² + σ_2² and m(x) follows a Gaussian process prior: m(x_1), ..., m(x_n) are jointly normally distributed with constant mean and the covariance of m(x_i) and m(x_j) is σ_2² ρ(x_i, x_j), with ρ(x_i, x_j) a proper correlation function. A popular choice of correlation function is the flexible Matérn class (see e.g. Stein, 1999), for which

ρ(x_i, x_j) = (ζ ||x_i − x_j||)^ν K_ν(ζ ||x_i − x_j||) / (2^{ν−1} Γ(ν)),

where K_ν is the modified Bessel function of order ν. The process is q times mean square differentiable if and only if q < ν, and ζ acts as a range parameter. We define b = σ_2²/σ², which can be interpreted as the amount of residual variability explained by the nonparametric Gaussian process estimate of m(x). If we consider the prior predictive with respect to F_x we obtain the centring model y_i ~ N(g(x_i) + m(x_i), σ_1²), whereas if we integrate out both F_x and m(x) with their priors we obtain y_i ~ N(g(x_i), σ²). The model allows the first moment of ε_i to have a different correlation structure from all higher moments. The approach is not restricted to the Gaussian process, and a wavelet or spline basis could be introduced instead.

Models 1(a) and 1(b) illustrate an important advantage of centring over a model: it provides a natural way to distinguish between the parametric dependence on covariates, captured by g(x), and the nonparametric dependence, modelled through F_x and m(x). Thus, by choosing g(x) appropriately, we may find that the nonparametric modelling is less critical. This will be detected by a large value of a and a small value of b, and will allow us to use the model to evaluate interesting parametric specifications.
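For concreteness, the Matérn correlation above can be evaluated directly; the sketch below uses SciPy's modified Bessel function K_ν. The particular values of ν and ζ are illustrative only.

```python
import numpy as np
from scipy.special import kv, gamma

def matern_corr(d, nu, zeta):
    """Matern correlation rho(d) = (zeta*d)^nu * K_nu(zeta*d) / (2^(nu-1) * Gamma(nu))."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    rho = np.ones_like(d)                       # correlation equals 1 at distance 0
    pos = d > 0
    u = zeta * d[pos]
    rho[pos] = (u**nu) * kv(nu, u) / (2**(nu - 1) * gamma(nu))
    return rho

print(matern_corr([0.0, 0.5, 1.0, 2.0], nu=1.5, zeta=2.0))
```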

It is important to note that if we take a linear parametric regression function, say g(x) = x'γ, then the interpretation of γ is non-standard in this model, since E[y_i | F_{x_i}, γ, x_i] is merely distributed around g(x_i) and P(E[y_i | F_{x_i}, γ, x_i] = g(x_i)) = 0 if y_i is a continuous random variable and H is absolutely continuous, which occurs in a large proportion of potential applications. Thus, γ will not have the standard interpretation as the effect of the covariates on the mean of the response. The predictive mean, i.e. E[y_i | γ, x_i], still equals g(x_i), however. The prior uncertainty about this predictive mean will increase as our confidence in the centring model (usually represented by one of the parameters in φ) decreases. One solution to this identifiability problem is to follow Kottas and Gelfand (2001), who fix the median of ε_i, which is often a natural measure of centrality in nonparametric applications, to be 0. If we assume that the error distribution is symmetric and unimodal, then median and mean regression will coincide (if the mean exists). An alternative, wider, class of error distributions, introduced by Kottas and Gelfand (2001) for regression problems, is the class of unimodal densities with median zero (extensions to quantile regression are discussed in Kottas and Krnjajic 2005). Incorporating our dependent discrete random probability measure structure into this model defines Model 2:

ε_i − m(x_i) ~ U(−σ√u_i, σ√u_i), u_i ~ F_{x_i}, F_x ~ Π(H, φ),

which leads to symmetric error distributions, with U(c, d) denoting a Uniform distribution on (c, d). For H we choose a Gamma distribution with shape parameter 3/2 and scale 1/2, corresponding to a normal centring distribution. This model is centred exactly as Model 1(b).
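The Gamma(3/2, 1/2) choice for H is what makes Model 2's centring normal: mixing U(−σ√u, σ√u) kernels over u ~ Ga(3/2, 1/2) (shape/rate parameterisation) returns exactly a N(0, σ²) distribution. A small Monte Carlo sketch of this identity, with an arbitrary illustrative σ:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 500_000

# u ~ Gamma(shape=3/2, rate=1/2); numpy's gamma takes shape and *scale* = 1/rate
u = rng.gamma(shape=1.5, scale=2.0, size=n)

# eps | u ~ Uniform(-sigma*sqrt(u), sigma*sqrt(u)), the scale mixture of uniforms in Model 2
eps = rng.uniform(-sigma * np.sqrt(u), sigma * np.sqrt(u))

# The marginal of eps should be N(0, sigma^2): compare a couple of moments
print(eps.std(), sigma)                                      # ~2.0 vs 2.0
print(np.mean(np.abs(eps)), sigma * np.sqrt(2 / np.pi))      # mean absolute value of a normal
```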

3 A Bayesian Density Smoother

This section develops a measure-valued stochastic process that can be used as a prior for Bayesian nonparametric inference when we want to infer distributions {F_x}_{x ∈ X}, where X is the space of covariates. It will be stationary in the sense that the marginal distribution F_x follows a stick-breaking process with the same parameters at any x ∈ X. The model is developed in the spirit of GS and shares similarities with their order-based dependent Dirichlet process (πDDP) with an arrivals construction. The latter construction defines a stochastic process that is not reversible, and we will define a new process that is reversible in one dimension and easily extends to an isotropic process in higher dimensions, which makes it suitable for our regression purpose. The process is also a non-trivial generalisation of the independently proposed Local Dirichlet Process (Chung and Dunson, 2007) from finite mixture models to infinite mixture models.

The πDDP of GS defines F_x by making the V_i's enter (2) in a different order at each covariate value, so that

F_x = ∑_i p_i(x) δ_{θ_i}, where p_i(x) = V_i ∏_{j: π_j(x) < π_i(x)} (1 − V_j),

and π_1(x), π_2(x), π_3(x), ... is a random permutation of a subset of 1, 2, 3, .... Similar random permutations and choices of subsets will induce similar distributions. By allowing a permutation of a subset rather than a permutation of all natural numbers, we can allow atoms to appear in the distribution at only some points x ∈ X. We can define orderings by allocating a latent parameter τ_i to the i-th atom (V_i, θ_i) and ordering at x according to some distance between x and τ_i. The arrivals construction of GS takes time as a covariate and defines π_i(x) = #{j | τ_i ≤ τ_j < x}, i.e. the ordering is backwards in τ from the largest τ_j smaller than x. If we denote the region in which points are relevant for the ordering at x by U(x), then the arrivals process assumes that U(x) ⊂ U(y) if y > x and that new atoms are placed at the start of the ordering. Clearly, the process is not reversible in time. Intuitively, if the process allows atoms to appear in the ordering then, to define a reversible process, we need atoms to also disappear. A simple method that achieves this effect is to restrict U(x) to be an interval; the arrivals ordering then arises as the special case in which the left-hand end of the interval is minus infinity for all points. If the interval is symmetric around a centre then the process will be reversible by construction. In what follows, we will focus on processes with Dirichlet marginals.

Definition 2 Let {S_j}_{j=1}^∞ be an infinite collection of random subsets of X with associated marks (V_j, θ_j) which are i.i.d. realisations of Be(1, M) and H respectively. We define

F_x = ∑_{i: x ∈ S_i} δ_{θ_i} V_i ∏_{j<i: x ∈ S_j} (1 − V_j),

and then {F_x | x ∈ X} follows a Subset-based Dependent Dirichlet Process, which is represented as S-DDP(M, H, Π), where Π are parameters that determine how to generate the infinite collection of random subsets.

This defines the idea for general spaces. Any atom θ_j only appears in the stick-breaking representation of F_x at points x which belong to the subset S_j of X, and this allows atoms to appear and disappear. In particular, if X ⊂ R^p then we can define S_j to be a closed, convex set of appropriate dimension. However, the construction can be naturally applied to other spaces using a suitable metric. The dependence between the distributions at two locations s and v can be measured using the correlation of F_s(B) and F_v(B) for any measurable set B, which can be represented as follows.

Theorem 1

Corr(F_s, F_v) = 2/(M + 2) E[ ∑_{i=1}^∞ B_i (M/(M + 2))^{∑_{j<i} B_j} (M/(M + 1))^{∑_{j<i} (A_j − B_j)} ], (5)

where A_i = I(s ∈ S_i or v ∈ S_i) and B_i = I(s ∈ S_i and v ∈ S_i).
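The stick-breaking in Definition 2 only multiplies in the sticks of atoms whose subsets contain x. The helper below computes the weights p_i(x) at a single location from the subset-membership indicators; it is a sketch of the construction only (truncated at a finite number of atoms, with arbitrary example inputs).

```python
import numpy as np

def sddp_weights(member, V):
    """Weights p_i(x) = V_i * prod_{j<i, x in S_j} (1 - V_j) for atoms with x in S_i, else 0."""
    member = np.asarray(member, dtype=bool)        # member[i] = I(x in S_i)
    V = np.asarray(V, dtype=float)
    # Remaining stick after the earlier atoms whose subsets contain x
    stick = np.cumprod(np.where(member, 1.0 - V, 1.0))
    prev_stick = np.concatenate(([1.0], stick[:-1]))
    return np.where(member, V * prev_stick, 0.0)

rng = np.random.default_rng(3)
M, n_atoms = 2.0, 2000
V = rng.beta(1.0, M, n_atoms)
member = rng.random(n_atoms) < 0.5                 # toy membership pattern for one location x
p = sddp_weights(member, V)
print(p.sum())   # close to 1: in the full construction infinitely many subsets contain x
```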

If s, v ∈ S_i for all i then the correlation will be 1. If s and v do not both fall in any S_i then the correlation will be 0. Consider a situation where the probability of observing a shared element in the relevant subsequence is constant given two covariate values s and v and equals, say, p_{s,v}. Then we have the following result.

Theorem 2 The correlation between F_s and F_v can be expressed as

Corr(F_s, F_v) = 2(M + 1) p_{s,v} / (2 + M(1 + p_{s,v})),

where, for any k, p_{s,v} = P(s, v ∈ S_k | s ∈ S_k or v ∈ S_k).

This correlation is increasing both in p_{s,v} and in M, the mass parameter of the Dirichlet process at each covariate value. As p_{s,v} tends to the limits of zero and one, the correlation does the same, irrespective of M. As M tends to zero, the Sethuraman representation in (2) is totally dominated by the first element, and thus the correlation tends to p_{s,v}. Finally, as M → ∞ (the Dirichlet process tends to the centring distribution) the correlation tends to 2p_{s,v}/(1 + p_{s,v}), as other elements further down the ordering can also contribute to the correlation. Thus, the correlation is always larger than p_{s,v} if the latter is smaller than one. Note that the correlation between distributions at different values of x will not tend to unity as M tends to infinity, in contrast to the πDDP constructions proposed in GS. This is a consequence of the construction: some points will not be shared by the orderings at s and v, no matter how large M is. The correlation between drawings from F_s and F_v, given by Corr(F_s, F_v)/(M + 1) (see GS), will, however, tend to zero as M → ∞.

To make the result more applicable in regression, we now define a specific, simple method for choosing the subsets in p-dimensional Euclidean space using a ball of random radius r.

Definition 3 Let (C, r, t) be a Poisson process with intensity f(r) defined on R^p × R^+ × R^+ with associated marks (V_j, θ_j) which are i.i.d. realisations of Be(1, M) and H respectively. We define

F_x = ∑_{i: x ∈ B_{r_i}(C_i)} δ_{θ_i} V_i ∏_{j: x ∈ B_{r_j}(C_j), t_j < t_i} (1 − V_j),

where B_r(C) is a ball of radius r around C. Then {F_x | x ∈ X} follows a Ball-based Dependent Dirichlet Process, which is represented as B-DDP(M, H, f), where f is a non-negative function on R^+ (the positive half-line).

The intensity f(r) can be any positive function; however, we will usually take f(r) to be a probability density function. The following argument shows that defining f(r) more generally does not add to the flexibility of the model.
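Theorem 2 gives a closed form linking the subset-sharing probability p_{s,v} to the correlation between the random measures. The snippet below simply evaluates that expression and its two limiting cases, as reconstructed above.

```python
import numpy as np

def corr_bddp(p_sv, M):
    """Theorem 2: Corr(F_s, F_v) = 2(M+1) p / (2 + M(1 + p))."""
    p_sv = np.asarray(p_sv, dtype=float)
    return 2.0 * (M + 1.0) * p_sv / (2.0 + M * (1.0 + p_sv))

p = 0.6
print(corr_bddp(p, M=1e-8))        # ~ p        (M -> 0 limit)
print(corr_bddp(p, M=1e8))         # ~ 2p/(1+p) (M -> infinity limit)
print(corr_bddp(p, M=2.0))         # in between, and always >= p for p < 1
```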

Suppose that (C, r, t) follows the Poisson process above; then writing C_i' = C_i, r_i' = r_i and t_i' = t_i/λ for λ > 0 defines a coupled Poisson process (C', r', t') with intensity λf(r). The ordering of t and t' is the same for the two coupled processes, and the B-DDP only depends on t through its ordering. It follows that the distributions {F_x}_{x ∈ X} defined using (C, r, t) and (C', r', t') will be the same. The definition could easily be extended to ellipsoids around a central point, which would allow the process to exhibit anisotropy.

It is necessary to add the latent variable t for F_x to be a nonparametric prior. The set T(x) = {i | ||x − C_i|| < r_i} indexes the atoms that appear in the stick-breaking representation of F_x. If we were instead to define a Poisson process (C, r) on R^p × R^+ with intensity f(r), then #T(x) would be Poisson distributed with mean ∫ ν(B_r(x)) f(r) dr. This integral can be infinite, but that would have strong implications for the correlation structure. By including t we make T(x) infinite for all choices of f and therefore define a nonparametric process.

To calculate the correlation function, and to relate its properties to the parameters of the distribution of r, it is helpful to consider p_{s,v}. This probability only depends on those centres from the set {C_k | s ∈ S_k or v ∈ S_k} = {C_k | C_k ∈ B_{r_k}(s) ∪ B_{r_k}(v)}.

Theorem 3 If {F_x}_{x ∈ X} follows a B-DDP then

p_{s,v} = ∫ ν(B_r(s) ∩ B_r(v)) f(r) dr / ∫ ν(B_r(s) ∪ B_r(v)) f(r) dr,

where ν(·) denotes the Lebesgue measure in the covariate space X.

So far, our results are valid for a covariate space of any dimension. However, in the sequel we will focus particularly on implementations with a covariate that takes values on the real line. In this case, Theorem 3 leads to a simple expression.

Corollary 1 If {F_x}_{x ∈ X} follows a B-DDP on R then

p_{s,s+u} = (2µ_u − uI) / (4µ − 2µ_u + uI),

where µ = E[r], I = ∫_{u/2}^∞ f(r) dr and µ_u = ∫_{u/2}^∞ r f(r) dr, provided µ exists.

Throughout, we will assume the existence of a nonzero mean for r and define different correlation structures through the choice of f(r). We will focus on two properties of the autocorrelation. The first property is the range, say x_0, which we define as the distance at which the autocorrelation function takes the value ε; this implies that

p_{s,s+x_0} = ε(M + 2) / ((M + 2) + M(1 − ε)).

The second property is mean square differentiability, which is related to the smoothness of the process. In particular, a weakly stationary process on the real line is mean square differentiable of order q if and only if the 2q-th derivative of the autocovariance function evaluated at zero exists and is finite (see for example Stein, 1999).
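Corollary 1 makes the autocorrelation easy to evaluate numerically for a given radius density f(r). The sketch below does this for a Gamma-distributed radius, combining the corollary with the Theorem 2 expression reconstructed earlier; the shape, rate and mass values are illustrative.

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def p_su(u, alpha, beta):
    """Corollary 1 for a Gamma(alpha, rate=beta) radius."""
    rv = gamma_dist(a=alpha, scale=1.0 / beta)
    mu = rv.mean()
    I = rv.sf(u / 2.0)                                   # integral of f(r) over (u/2, infinity)
    # integral of r f(r) over (u/2, inf) = (alpha/beta) * P(Gamma(alpha+1, beta) > u/2)
    mu_u = (alpha / beta) * gamma_dist(a=alpha + 1.0, scale=1.0 / beta).sf(u / 2.0)
    return (2.0 * mu_u - u * I) / (4.0 * mu - 2.0 * mu_u + u * I)

def autocorr(u, alpha, beta, M):
    p = p_su(u, alpha, beta)
    return 2.0 * (M + 1.0) * p / (2.0 + M * (1.0 + p))   # Theorem 2

for u in [0.0, 0.5, 1.0, 2.0]:
    print(u, autocorr(u, alpha=2.0, beta=1.0, M=1.0))
```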

In the case of a Gamma-distributed radius we can derive the following result.

Theorem 4 If {F_x}_{x ∈ X} follows a B-DDP with f(r) = β^α/Γ(α) r^{α−1} exp{−βr}, then F_x is mean square differentiable of order q = 1, 2, ... if and only if α ≥ 2q − 1.

If each radius follows a Gamma distribution then we can choose the shape parameter, α, to control smoothness and the scale parameter, β, to define the range, x_0. A closed form inverse relationship will not be available analytically in general. However, if we choose α = 1, which gives an exponential distribution, then

β = (2/x_0) log((1 + M + ε)/(ε(M + 2))). (6)

[Figure 1: The autocorrelation function for a Gamma distance distribution with range 5, for three values of the shape parameter α (dashed, solid and dotted lines) and three values of M (panels).]

Figure 1 shows the form of the autocorrelation for various smoothness parameters and a range fixed to 5 (with ε = 0.5). Clearly, the mass parameter M, which is critical for the variability of the process, does not have much impact on the shape of the autocorrelation function once the range is fixed. We will concentrate on the Gamma implementation and work with the following class.

Definition 4 Let (C, r, t) be a Poisson process with intensity β^α/Γ(α) r^{α−1} exp{−βr} defined on R^p × R^+ × R^+ with associated marks (V_i, θ_i) which are i.i.d. realisations of Be(1, M) and H. We define

F_x = ∑_{i: x ∈ B_{r_i}(C_i)} V_i ∏_{j: x ∈ B_{r_j}(C_j), t_j < t_i} (1 − V_j) δ_{θ_i},

and then {F_x | x ∈ X} follows a Dirichlet Process Regression Smoother (DPRS), which is represented as DPRS(M, H, α, β).

Typically, we fix α and x_0 and put appropriate priors on M and any other parameters in the model. We use a prior distribution for M which can be elicited by choosing a typical value n_0 for M and a variance parameter η. This prior has density function

p(M) = (n_0^η Γ(2η) / Γ(η)²) M^{η−1} (M + n_0)^{−2η},

and is discussed in more detail in GS.
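Equation (6) gives the rate β of the exponential radius distribution that produces a desired range x_0, i.e. the distance at which the autocorrelation of the process falls to ε. A direct transcription of the formula as reconstructed above; the values of x_0, M and ε are chosen only for illustration.

```python
import numpy as np

def dprs_beta_from_range(x0, M, eps):
    """Equation (6): exponential radius (alpha = 1) giving autocorrelation eps at distance x0."""
    return (2.0 / x0) * np.log((1.0 + M + eps) / (eps * (M + 2.0)))

beta = dprs_beta_from_range(x0=5.0, M=1.0, eps=0.5)
print(beta)
# Sanity check against the earlier autocorr() sketch: autocorr(5.0, alpha=1.0, beta=beta, M=1.0) ~ 0.5
```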

4 Computational method

This section discusses how the nonparametric hierarchical models discussed in Section 2 with a DPRS prior can be fitted to data using a retrospective sampler. Retrospective sampling for Dirichlet process-based hierarchical models was introduced by Papaspiliopoulos and Roberts (2008). Previous attempts to use the stick-breaking representation of the Dirichlet process in MCMC samplers have used truncation (e.g. Muliere and Tardella 1998, Ishwaran and James 2001 for the Dirichlet process, and GS for order-based dependent Dirichlet processes). The retrospective sampler avoids the need to truncate. The method produces a sample from the posterior distribution of all parameters except the unknown distribution; inference about the unknown distribution must use some truncation method. This makes the method comparable to Polya urn based methods, which are reviewed by Neal (2000). Retrospective methods have been used for density regression models by Dunson and Park (2008).

We assume that data (x_1, y_1), ..., (x_n, y_n) have been observed which are hierarchically modelled by

y_i ~ p(y_i | ψ_i), ψ_i | x_i ~ F_{x_i}, F_x ~ DPRS(M, H, α, β).

The DPRS assumes that F_x = ∑_{j=1}^∞ p_j(x) δ_{θ_j}, where θ_1, θ_2, ... are i.i.d. H and the weights p_j(x) are constructed according to the definition of the DPRS. Additional parameters can be added to the sampling kernel or the centring distribution, and these are updated in the standard way for Dirichlet process mixture models. MCMC is more easily implemented by introducing latent variables s_1, s_2, ..., s_n and re-expressing the model as

y_i ~ p(y_i | θ_{s_i}), P(s_i = j) = p_j(x_i), θ_1, θ_2, ... i.i.d. H.

The latent variables s = (s_1, ..., s_n) are often called allocations, since s_i assigns the i-th observation to one of the distinct elements of F_{x_i} (i.e. ψ_i = θ_{s_i}). Defining y = (y_1, ..., y_n), θ = (θ_1, θ_2, ...) and V = (V_1, V_2, ...), the posterior distribution of the parameters is

p(θ, s, t, C, r, V | y) ∝ p(y | θ, s) ∏_{i=1}^n p(s_i | x_i, C, t, r, V) p(V) p(θ) p(C, r, t).

The probability p(s_i = j | x_i, C, t, r, V) is only non-zero if x_i ∈ B_{r_j}(C_j). Let (C^A, r^A, t^A) be the Poisson process (C, r, t) restricted to the set A and let (θ_i^A, V_i^A) be the mark associated with (C_i^A, r_i^A, t_i^A); these marks are collected in the sets θ^A and V^A respectively. Defining the set R = {(C, r, t) | x_i ∈ B_r(C) for some i = 1, ..., n}, with complement R^C, we obtain

p(θ, s, t, C, r, V | y) ∝ p(y | θ^R, s) ∏_{i=1}^n p(s_i | x_i, C^R, t^R, r^R, V^R) p(V^R) p(θ^R) p(C^R, r^R, t^R) p(V^{R^C}) p(θ^{R^C}) p(C^{R^C}, r^{R^C}, t^{R^C}).

This factorisation follows from the independence of Poisson processes on disjoint sets. Therefore we can draw inference using the following restricted posterior distribution:

p(θ^R, s, t^R, C^R, r^R, V^R | y) ∝ p(y | θ^R, s) ∏_{i=1}^n p(s_i | x_i, C^R, t^R, r^R, V^R) p(V^R) p(θ^R) p(C^R, r^R, t^R).

We define a retrospective sampler for this restricted posterior distribution. A method of simulating (C^R, r^R, t^R) that will be useful for our retrospective sampler is: 1) initialise t_1 ~ Ex(∫_R f(r) dC dr) and 2) set t_i = t_{i−1} + x_i, where x_i ~ Ex(∫_R f(r) dC dr). Then (C_1^R, r_1^R), (C_2^R, r_2^R), ... are independent of t_1^R, t_2^R, ... and we take i.i.d. draws from the distribution

p(C_k^R | r_k^R) = U(∪_{i=1}^n B_{r_k^R}(x_i)), p(r_k^R) = ν(∪_{i=1}^n B_{r_k^R}(x_i)) f(r_k^R) / ∫ ν(∪_{i=1}^n B_u(x_i)) f(u) du,

where U(A) represents the pdf of the uniform distribution on the set A and ν(·) the appropriate Lebesgue measure. It is often hard to simulate from this conditional distribution of C_k^R and to calculate the normalising constant of the distribution of r_k^R. It will usually be simpler to use a rejection sampler from the joint distribution of (C, r) conditioned to fall in a simpler set containing R. For example, in one dimension we define d(r_k) = (x_min − r_k, x_max + r_k), where x_min and x_max are the minimum and maximum values of x_1, ..., x_n, and the rejection envelope is

g(C_k^R | r_k^R) = U(d(r_k^R)), g(r_k^R) = (x_max − x_min + 2r_k^R) f(r_k^R) / ∫ (x_max − x_min + 2u) f(u) du.

Any values (C_k^R, r_k^R) are rejected if they do not fall in R. If we use a DPRS where r_k follows a Gamma(α, β) distribution, then we sample r_k^R from the rejection envelope using the mixture distribution

g(r_k^R) = w f_Ga(α, β) + (1 − w) f_Ga(α + 1, β),

where f_Ga(α, β) is the pdf of a Ga(α, β) distribution and w = (x_max − x_min)/(x_max − x_min + 2α/β).

This construction generates the Poisson process underlying the B-DDP ordered in t, and we will use it to retrospectively sample the Poisson process in t (we can think of the DPRS as defined by a marked Poisson process where t^R follows a Poisson process and (C^R, r^R) are the marks). In fact, the posterior distribution only depends on t_1^R, t_2^R, t_3^R, ... through their ordering, and we can simply extend the ideas of Papaspiliopoulos and Roberts (2008) to update the allocations s. The MCMC sampler defined on the posterior distribution parameterised by r^R can have problems mixing.
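The rejection step just described is easy to prototype. The sketch below draws (C, r) pairs from the one-dimensional envelope (r from the two-component Gamma mixture, C uniform on (x_min − r, x_max + r)) and rejects any pair whose ball covers none of the observed covariate values. The function name and the test data are illustrative, not part of the paper's implementation.

```python
import numpy as np

def sample_envelope_pair(x, alpha, beta, rng):
    """One draw of (C, r) from the rejection envelope for a Gamma(alpha, rate=beta) radius."""
    x_min, x_max = x.min(), x.max()
    w = (x_max - x_min) / (x_max - x_min + 2.0 * alpha / beta)
    while True:
        # radius from the mixture w*Ga(alpha, beta) + (1-w)*Ga(alpha+1, beta)
        shape = alpha if rng.random() < w else alpha + 1.0
        r = rng.gamma(shape, 1.0 / beta)
        C = rng.uniform(x_min - r, x_max + r)          # centre uniform on the enlarged interval
        if np.any(np.abs(x - C) < r):                  # accept only if the ball covers some x_i
            return C, r

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0, size=50)                     # toy covariate values
pairs = [sample_envelope_pair(x, alpha=1.0, beta=2.0, rng=rng) for _ in range(5)]
print(pairs)
```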

The sampler can have much better mixing properties by using a reparameterisation from r^R to r̃^R, where we let d_il = |x_i − C_l| and define r̃_k^R = r_k^R − max{d_ik | s_i = k}. Conditioning on r̃^R = (r̃_1^R, r̃_2^R, ...) rather than r^R = (r_1^R, r_2^R, ...) allows each observation to be re-allocated to a distinct element. Initially, we condition on s_{−i} = (s_1, ..., s_{i−1}, s_{i+1}, ..., s_n) and remove s_i from the allocation. Let K_{−i} = max{s_{−i}}, and let

r_k^{(1)} = r̃_k^R + max{d_jk | s_j = k, j = 1, ..., i−1, i+1, ..., n},
r_k^{(2)} = r̃_k^R + max({d_jk | s_j = k, j = 1, ..., i−1, i+1, ..., n} ∪ {|x_i − C_k|}).

The proposal distribution is

q(s_i = k) = c^{−1} p(y_i | θ_k) V_k (1 − V_k)^{A_k} ∏_{l<k} (1 − V_l) f(r_k^{(2)})/f(r_k^{(1)}), for k ≤ K_{−i},
q(s_i = k) = c^{−1} max_{m ≤ K_{−i}} {p(y_i | θ_m)} V_k ∏_{l<k} (1 − V_l), for k > K_{−i},

where A_k = #{m | r_k^{(1)} < d_{m s_m} < r_k^{(2)}, s_m > k} and

c = ∑_{l=1}^{K_{−i}} p(y_i | θ_l) V_l (1 − V_l)^{A_l} ∏_{h<l} (1 − V_h) f(r_l^{(2)})/f(r_l^{(1)}) + max_{l ≤ K_{−i}} {p(y_i | θ_l)} ∏_{h ≤ K_{−i}} (1 − V_h).

If k > K_{−i} we need to generate (θ_{K_{−i}+1}, V_{K_{−i}+1}, C_{K_{−i}+1}, d_{K_{−i}+1}), ..., (θ_k, V_k, C_k, d_k) independently from their prior distribution. A value is generated from this discrete distribution using the standard inversion method (i.e. simulate a uniform random variate U and the proposed value k is such that ∑_{l=1}^{k−1} q(s_i = l) < U ≤ ∑_{l=1}^k q(s_i = l)). Papaspiliopoulos and Roberts (2008) show that the acceptance probability of the proposed value is

α = 1 if k ≤ K_{−i}, and α = min{1, p(y_i | θ_k)/max_{l ≤ K_{−i}} p(y_i | θ_l)} if k > K_{−i}.

The other full conditional distributions of the Gibbs sampler are given in Appendix B.

5 Examples

This section applies the models developed in Section 2, in combination with the DPRS of Section 3, to simulated data and two real datasets: the prestige data (Fox and Suschnigg, 1989) and the electricity data of Yatchew (2003). As a basic model, we take Model 1(a) with regression function g(x) = 0; this model tries to capture the dependence on x exclusively through the Dirichlet process smoother. Model 1(b) is the more sophisticated version of Model 1, where m(x) is modelled through a Gaussian process, as explained in Section 2. Finally, we will also use Model 2, which will always have a Gaussian process specification for m(x). The prior on M is as explained in Section 3, with n_0 = 3 and a common value of η for all examples. On the parameter a in Model 1 we adopt a Uniform prior over (0,1).

The range x_0 of the DPRS is chosen such that the correlation is 0.4 at the median distance between the covariate values. Priors on σ² and on the parameters σ_2² and ζ of the Gaussian process are as in the benchmark prior of Palacios and Steel (2006). Finally, we fix the smoothness parameter ν of the Gaussian process.

5.1 Example 1: Sine wave

We generated observations from the model y_i = sin(πx_i) + ε_i, where the x_i are uniformly distributed on (0, 1) and the errors ε_i are independent and chosen to be heteroscedastic and non-normally distributed. We consider two possible formulations. Error 1 assumes that ε_i follows a t-distribution with zero mean and 2.5 degrees of freedom, and a conditional variance σ²(x) that is quadratic in x, smallest at one point of the covariate range and increasing away from it. Error 2 assumes that the error distribution is a two-component mixture of normals with weights 0.3 and 0.7 and component means 0.3 and −0.3 + 0.6x_i, so that the error distribution is bimodal at x_i = 0 and unimodal (and normal) at x_i = 1. The first error distribution can be represented both as a mixture of normals and as a scale mixture of uniforms, whereas the second error distribution cannot be fitted using a mixture of uniforms. A simulation sketch of these two data-generating processes is given below.

[Figure 2: Example 1 with Error 1: predictive conditional mean of y given x for Model 1(a) for three values of α (dashed, solid and dotted lines). Data points are indicated by dots. The right panel presents a magnified section of the left panel.]

Initially, we assume Error 1. The results for Model 1(a) are illustrated in Figure 2 for three values of α. Smaller values of α lead to rougher processes and the effect of its choice on inference is clearly illustrated; in the sequel we only present results for one value of α. Under Model 1(a), we infer a rough version of the underlying true distribution, as illustrated by the predictive density in Figure 3. The small values of a in Table 1 indicate a lack of normality. The results are similar to those of GS, who find that the estimated conditional mean is often blocky, reflecting the underlying piecewise constant approximation to the changing distributional form. In the more complicated models the conditional location is modelled through a nonparametric regression function (in this case a Gaussian process prior).
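The following sketch simulates data of the kind used in this example. The variance function for Error 1 and the component variances for Error 2 are placeholders (the exact constants are not recoverable from this transcription), so the code illustrates the structure of the two error models rather than reproducing the paper's datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, n)
mean = np.sin(np.pi * x)

# Error 1: heteroscedastic t errors.  ASSUMED variance function: smallest mid-range, growing away from it.
df = 2.5
sd1 = np.sqrt(0.1 + (x - 0.5) ** 2)                                  # placeholder variance function
y1 = mean + sd1 * rng.standard_t(df, n) * np.sqrt((df - 2.0) / df)   # rescale the t draw to unit variance

# Error 2: two-component normal mixture, bimodal at x = 0 and collapsing to one mode at x = 1.
comp = rng.random(n) < 0.3
mu_err = np.where(comp, 0.3, -0.3 + 0.6 * x)
y2 = mean + rng.normal(mu_err, 0.1)                                  # placeholder component s.d. 0.1
```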

Table 1: Example 1 with Error 1: posterior median and 95% credible interval (in brackets) for selected parameters.

         Model 1(a)          Model 1(b)          Model 2
σ²       .7 (.48, .3)        .64 (.46, .96)      .68 (.49, .8)
a        .9 (., .4)          .5 (., .33)         -
b        -                   .75 (.54, .88)      .76 (.53, .9)
ρ        -                   .53 (.3, .96)       .6 (.3, .9)
M        .38 (.4, .95)       .84 (.6, 5.7)       .57 (.46, 3.64)

[Figure 3: Example 1 with Error 1: heatmaps of the posterior predictive density p(y|x) for Models 1(a), 1(b) and 2, and plots of the posterior conditional predictive variance σ²(x) (solid line) together with the true value (dashed line).]

Both Model 1(b) and Model 2 assume a constant prior mean for m(x). Introducing the Gaussian process into Model 1 leads to smaller values of σ_1², since some variability can now be explained by the Gaussian process prior. However, the posterior for a still favours fairly small values, reminding us that even if the conditional mean is better modelled with the Gaussian process, the tails are still highly non-normal (see Figure 4).

[Figure 4: Example 1 with Error 1: the posterior predictive distribution of y at x = 0.4 using Model 1(a) (solid line) and Model 1(b) (dashed line).]

The estimated posterior predictive distributions (as depicted in Figure 3) are now much smoother. Both Model 1(b) and Model 2 lead to large values of b, and thus of σ_2², which better fits the true variability of the mean.

This leads to better estimates of the conditional predictive variance, as illustrated in Figure 3. Clearly a model of this type would struggle with estimation at the extreme values of x, but the main part of the functional form is well recovered. The parameter ρ = ν/ζ used in Table 1 is an alternative range parameter, favoured by Stein (1999, p.5), and indicates that the Gaussian process dependence of m(x) is similar for Model 1(b) and Model 2. The posterior median values of ρ lead to a range of the Gaussian process equal to .89 and .6 for Models 1(b) and 2, respectively.

Results for data generated with the second error structure are shown in Figure 5 and Table 2 (for selected parameters). Model 1(b) is able to infer the bimodal distribution for small values of x and the single mode for large x, as well as the changing variance. Model 2 is not able to capture the bimodality by construction and only captures the changing variability. In both cases the mean is well estimated. Small values of a illustrate the difficulty in capturing the error structure. The large values of b indicate that the centring model (a constant model with mean zero) does a poor job of capturing the mean.

[Figure 5: Example 1 with Error 2, for Models 1(b) and 2: heatmap of the posterior predictive density p(y|x); plot of the posterior of m(x) indicating the median (solid line), 95% credible interval (dashed lines) and data (dots); and the posterior predictive error distribution indicating the 2.5th, 25th, 75th and 97.5th percentiles.]

Table 2: Example 1 with Error 2: posterior median and 95% credible interval (in brackets) for selected parameters.

         Model 1(a)          Model 1(b)          Model 2
σ²       .7 (.4, .66)        .47 (.34, .7)       .54 (.38, .98)
a        . (., .3)           .3 (., .38)         -
b        -                   .84 (.66, .9)       .84 (.65, .94)

5.2 Prestige data

Fox and Suschnigg (1989) and Fox (1997) consider measuring the relationship between income and prestige of occupations using the 1971 Canadian census. The prestige of the jobs was measured through a social survey. We treat income as the response and the prestige measure as a covariate. The data are available to download in the R package car.

[Figure 6: Prestige data: posterior distribution of the conditional mean for Models 1(a), 1(b) and 2, indicating the median (solid line), 95% credible interval (dashed lines) and data (dots).]

Figure 6 shows the fitted conditional mean. In all cases the relationship between income and prestige shows an increasing trend for smaller incomes before flattening out for larger incomes. The results are very similar to those described in Fox (1997). The inference for selected individual parameters is presented in Table 3.

Table 3: Prestige data: posterior median and 95% credible interval (in brackets) for selected parameters.

         Model 1(a)          Model 1(b)          Model 2
σ²       . (4.8, 43.7)       . (4.8, 3.)         . (6., 36.4)
a        . (.3, .3)          .8 (.8, .69)        -
b        -                   .66 (.38, .85)      .69 (.4, .88)

As in the previous example, the Gaussian process structure on m(x) accounts for quite a bit of the variability, rendering the error distribution not too far from normal in Model 1(b), as indicated by the fairly large values of a.

5.3 Scale economies in electricity distribution

Yatchew (2003) considers fitting a cost function for the distribution of electricity. A Cobb-Douglas model is fitted, which assumes that

tc = f(cust) + β_1 wage + β_2 pcap + β_3 PUC + β_4 kwh + β_5 life + β_6 lf + β_7 kmwire + ε,

where tc is the log of total cost per customer, cust is the log of the number of customers, wage is the log wage rate, pcap is the log price of capital, PUC is a dummy variable for public utility commissions, life is the log of the remaining life of distribution assets, lf is the log of the load factor, and kmwire is the log of kilometres of distribution wire per customer. The data consist of 81 municipal distributors in Ontario, Canada, during 1993. We will fit the DPRS model with cust as the covariate to ε, and we will centre the model over two parametric regression models by choosing f(cust) as follows: Parametric 1, γ_1 + γ_2 cust, and Parametric 2, γ_1 + γ_2 cust + γ_3 cust².

Table 4: Electricity data: posterior median and 95% credible interval (in brackets) for selected parameters of Parametric model 1 (linear) and the nonparametric models centred over Parametric model 1.

         Parametric 1          Model 1(a)            Model 1(b)            Model 2
γ_1      .4 (-4.4, 5.)         -.7 (-4.88, 3.)       -.9 (-4.98, 3.9)      -.67 (-4.79, 4.3)
γ_2      -.7 (-.3, -.)         -.7 (-.4, -.)         -. (-., .)            -.9 (-., .)
β_1      .48 (-.5, .6)         .67 (.5, .)           .7 (.7, .3)           .7 (., .53)
β_4      . (-.6, .3)           .7 (-., .5)           .4 (-.4, .)           .6 (-.4, .3)
β_6      .97 (.3, .9)          . (.9, .)             .4 (.4, .)            .9 (.4, .4)
σ²       .7 (.5, .)            . (.4, .36)           .3 (.7, .39)          .7 (.9, .48)
a        -                     .9 (.5, .45)          .75 (.5, .99)         -
b        -                     -                     .4 (., .77)           .55 (., .84)

The results of Yatchew (2003) suggest that a linear f(cust) is not sufficient to explain the effect of the number of customers. The results for selected parameters are shown in Tables 4 and 5 when centring over Parametric 1 and over Parametric 2, respectively. When fitting the two parametric models we see differences in the estimates of the effects of some other covariates: the parameters β_1 and β_6 have larger posterior medians under Parametric 2, while β_4 has a smaller estimate. If we centre our nonparametric models over the linear parametric model then we see the same changes for β_1 and β_6 and a smaller change for β_4. Posterior inference on the regression coefficients is much more similar for all models in Table 5. In particular, the parametric effect of customers is very similar for Parametric 2 and for all the nonparametric models centred over it. The estimated correction to the parametric fit for the effect of customers is shown in Figure 7. For models centred over the linear model, it shows a difference which could be well captured by a quadratic effect, especially for Models 1(b) and 2. Under both centring models, the importance of the nonparametric fitting of the error sharply decreases when a Gaussian process formulation for the regression function is used, as evidenced by the increase in a. Changing to a quadratic centring distribution leads to decreased estimates of b, indicating a more appropriate fit of the parametric part. This is corroborated by the nonparametric correction to this fit, as displayed in Figure 7.

Table 5: Electricity data: posterior median and 95% credible interval (in brackets) for selected parameters of Parametric model 2 (quadratic) and the nonparametric models centred over Parametric model 2.

         Parametric 2          Model 1(a)            Model 1(b)            Model 2
γ_1      .77 (-.53, 6.96)      .78 (-.83, 6.88)      .5 (-.44, 7.56)       .77 (-4., 7.79)
γ_2      -.83 (-.9, -.48)      -.9 (-.4, -.4)        -.9 (-.69, -.3)       -.96 (-.57, -.3)
γ_3      .4 (., .6)            .5 (., .7)            .4 (., .9)            .5 (., .8)
β_1      .83 (., .48)          .79 (.6, .38)         .8 (.4, .43)          .78 (-.3, .4)
β_4      -. (-., .5)           -. (-., .7)           . (-.8, .8)           . (-.8, .9)
β_6      .5 (.38, .9)          .3 (.5, .8)           .3 (.47, .5)          .3 (.48, .59)
σ²       .6 (.3, .9)           .7 (.4, .3)           . (.6, .34)           . (.6, .38)
a        -                     .3 (., .4)            .77 (.4, .99)         -
b        -                     -                     .3 (.8, .75)          .37 (.5, .77)

[Figure 7: Electricity data: posterior mean of the nonparametric component(s) of the model, for Models 1(a), 1(b) and 2 centred over the linear and the quadratic parametric models.]

6 Discussion

This paper shows how ideas from Bayesian nonparametric density estimation and nonparametric estimation of the mean in regression models can be combined to define a range of useful models. We introduce novel approaches to nonparametric modelling by centring over appropriately chosen parametric models. This allows for a more structured approach to Bayesian nonparametrics and can greatly assist in identifying the specific inadequacies of commonly used parametric models. An important aspect of the methodology is the separate modelling of various components, such as important quantiles (like the median) or moments (like the mean), which allows the nonparametric smoothing model to do less work. These ideas can be used in combination with any nonparametric prior that allows distributions to change with covariates.

In this paper we have concentrated on one example, the Dirichlet Process Regression Smoother (DPRS) prior introduced here. The latter is related to the πDDP methods of GS but allows simpler computation (and without truncation) through retrospective methods. The parameters of the DPRS can be chosen to control the smoothness and the scale of the process.

Appendix A: Proofs

Proof of Theorem 1

It is easy to show (see GS) that for any measurable set B,

Corr(F_x(B), F_y(B)) = (M + 1) ∑_{i=1}^∞ E[p_i(x) p_i(y)].

If x ∉ S_i or y ∉ S_i then E[p_i(x) p_i(y)] = 0. Otherwise, let R_i^{(1)} = {j < i | x ∈ S_j and y ∈ S_j}, R_i^{(2)} = {j < i | x ∈ S_j and y ∉ S_j} and R_i^{(3)} = {j < i | x ∉ S_j and y ∈ S_j}. Then

E[p_i(x) p_i(y)] = E[ V_i² ∏_{j ∈ R_i^{(1)}} (1 − V_j)² ∏_{j ∈ R_i^{(2)}} (1 − V_j) ∏_{j ∈ R_i^{(3)}} (1 − V_j) ]
= 2/((M + 1)(M + 2)) E[ (M/(M + 2))^{#R_i^{(1)}} (M/(M + 1))^{#R_i^{(2)}} (M/(M + 1))^{#R_i^{(3)}} ],

and so

Corr(F_x(B), F_y(B)) = 2/(M + 2) E[ ∑_{i=1}^∞ B_i (M/(M + 2))^{∑_{j<i} B_j} (M/(M + 1))^{∑_{j<i} (A_j − B_j)} ].

Proof of Theorem 2

The set {i | x ∈ S_i or y ∈ S_i} must have infinite size since it contains {i | x ∈ S_i}, which has infinite size. Restricting attention to this subsequence, each A_i = 1 and the B_i are independent Bernoulli(p_{s,v}) random variables, so from the expression in Theorem 1

Corr(F_s, F_v) = 2/(M + 2) ∑_{i=1}^∞ E[ B_i (M/(M + 2))^{∑_{j<i} B_j} (M/(M + 1))^{∑_{j<i} (1 − B_j)} ]
= 2/(M + 2) p_{s,v} ∑_{i=1}^∞ ( p_{s,v} M/(M + 2) + (1 − p_{s,v}) M/(M + 1) )^{i−1}
= (2 p_{s,v}/(M + 2)) × (M + 1)(M + 2)/((M + 2) + M p_{s,v})
= 2(M + 1) p_{s,v} / (2 + M(1 + p_{s,v})).

Proof of Theorem 3

Since (C, r, t) follows a Poisson process with intensity f(r) on R^p × R^+ × R^+, p(C_k | s ∈ S_k or v ∈ S_k, r_k) is uniformly distributed on B_{r_k}(s) ∪ B_{r_k}(v) and

p(r_k | s ∈ S_k or v ∈ S_k) = ν(B_{r_k}(s) ∪ B_{r_k}(v)) f(r_k) / ∫ ν(B_r(s) ∪ B_r(v)) f(r) dr,

where ν(·) is Lebesgue measure. Then

p_{s,v} = P(s, v ∈ S_k | s ∈ S_k or v ∈ S_k) = ∫ ∫_{B_{r_k}(s) ∩ B_{r_k}(v)} p(C_k, r_k | s ∈ S_k or v ∈ S_k) dC_k dr_k
= ∫ ν(B_r(s) ∩ B_r(v)) f(r) dr / ∫ ν(B_r(s) ∪ B_r(v)) f(r) dr.

Proof of Theorem 4

The autocorrelation function can be expressed as f(p_{s,s+u}), where f(x) = 2(M + 1)x/(2 + M(1 + x)). By Faà di Bruno's formula,

d^n/du^n f(p_{s,s+u}) = ∑ n!/(m_1! m_2! m_3! ⋯) × d^{m_1+⋯+m_n} f / dp_{s,s+u}^{m_1+⋯+m_n} × ∏_{j: m_j ≥ 1} ( (d^j p_{s,s+u}/du^j) / j! )^{m_j},

where the sum is over non-negative integers m_1, ..., m_n with m_1 + 2m_2 + ⋯ + n m_n = n, and so

lim_{u→0} d^n/du^n f(p_{s,s+u}) = ∑ n!/(m_1! m_2! m_3! ⋯) × lim_{u→0} d^{m_1+⋯+m_n} f / dp_{s,s+u}^{m_1+⋯+m_n} × ∏_{j: m_j ≥ 1} ( (d^j p_{s,s+u}/du^j) / j! )^{m_j}.

Since lim_{u→0} d^{m_1+⋯+m_n} f / dp_{s,s+u}^{m_1+⋯+m_n} is finite and non-zero for all values of n, the degree of differentiability of the autocorrelation function is equal to the degree of differentiability of p_{s,s+u}. We can write p_{s,s+u} = a/(4µ − a) with a = 2µ_u − uI. Now d^k p_{s,s+u}/da^k = 4µ k! (4µ − a)^{−(k+1)}, and lim_{u→0} d^k p_{s,s+u}/da^k = 4µ k! (2µ)^{−(k+1)}, which is finite and non-zero. By a further application of Faà di Bruno's formula,

d^n p_{s,s+u}/du^n = ∑ n!/(m_1! m_2! m_3! ⋯) × d^{m_1+⋯+m_n} p_{s,s+u}/da^{m_1+⋯+m_n} × ∏_{j: m_j ≥ 1} ( (d^j a/du^j) / j! )^{m_j},

and the degree of differentiability is determined by the degree of differentiability of a. If r ~ Ga(α, β) then dµ_u/du = −(u/4) (β^α/Γ(α)) (u/2)^{α−1} exp{−βu/2} and dI/du = −(1/2) (β^α/Γ(α)) (u/2)^{α−1} exp{−βu/2}, and it is easy to show that d^n a/du^n = C_n u^{α−n+1} exp{−βu/2} + ζ, where ζ contains terms with powers of u greater than α − n + 1. If lim_{u→0} u^{α−n+1} exp{−βu/2} is finite then so are the limits of the higher-order terms, and so the limit will be finite iff α − n + 1 ≥ 0, i.e. α ≥ n − 1.

Appendix B: Computational details

As we conduct inference on the basis of the Poisson process restricted to the set R, all quantities (C, r, t, V, θ) should carry a superscript R. To keep the notation manageable, these superscripts are not used explicitly in this Appendix.

Updating the centres. We update each centre C_1, ..., C_K from its full conditional distribution using a Metropolis-Hastings random walk step. A new value C_i' for the i-th centre is proposed from N(C_i, σ_C²), where σ_C² is chosen so that the acceptance rate is approximately 0.25. If there is no x_j such that x_j ∈ (C_i' − r_i, C_i' + r_i), or if there is a value of j with s_j = i for which x_j ∉ (C_i' − r_i, C_i' + r_i), then α(C_i, C_i') = 0. Otherwise, the acceptance probability has the form

α(C_i, C_i') = min{ 1, ∏_{j=1}^n ∏_{h<s_j: C_h'−r_h<x_j<C_h'+r_h} (1 − V_h) / ∏_{j=1}^n ∏_{h<s_j: C_h−r_h<x_j<C_h+r_h} (1 − V_h) },

where C' denotes the set of centres with C_i replaced by C_i'.

Updating the distances. The distances can be updated using a Gibbs step since the full conditional distribution of r̃_k has a simple piecewise form. Recall that d_ik = |x_i − C_k| and let S_k = {j | s_j ≥ k}. We define S_k^{ord} to be a version of S_k whose elements have been ordered to be increasing in d_ik, i.e. if i > j and i, j ∈ S_k^{ord} then d_ik > d_jk. Finally we define d_k = max[{x_min − C_k, C_k − x_max} ...


More information

Bayesian Modeling of Conditional Distributions

Bayesian Modeling of Conditional Distributions Bayesian Modeling of Conditional Distributions John Geweke University of Iowa Indiana University Department of Economics February 27, 2007 Outline Motivation Model description Methods of inference Earnings

More information

Lecture 3a: Dirichlet processes

Lecture 3a: Dirichlet processes Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Hyperparameter estimation in Dirichlet process mixture models

Hyperparameter estimation in Dirichlet process mixture models Hyperparameter estimation in Dirichlet process mixture models By MIKE WEST Institute of Statistics and Decision Sciences Duke University, Durham NC 27706, USA. SUMMARY In Bayesian density estimation and

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Bayesian isotonic density regression

Bayesian isotonic density regression Bayesian isotonic density regression Lianming Wang and David B. Dunson Biostatistics Branch, MD A3-3 National Institute of Environmental Health Sciences U.S. National Institutes of Health P.O. Box 33,

More information

Gaussian Process Regression

Gaussian Process Regression Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process

More information

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

STAT Advanced Bayesian Inference

STAT Advanced Bayesian Inference 1 / 32 STAT 625 - Advanced Bayesian Inference Meng Li Department of Statistics Jan 23, 218 The Dirichlet distribution 2 / 32 θ Dirichlet(a 1,...,a k ) with density p(θ 1,θ 2,...,θ k ) = k j=1 Γ(a j) Γ(

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group Nonparmeteric Bayes & Gaussian Processes Baback Moghaddam baback@jpl.nasa.gov Machine Learning Group Outline Bayesian Inference Hierarchical Models Model Selection Parametric vs. Nonparametric Gaussian

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

Nonparametric Drift Estimation for Stochastic Differential Equations

Nonparametric Drift Estimation for Stochastic Differential Equations Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,

More information

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process 10-708: Probabilistic Graphical Models, Spring 2015 19 : Bayesian Nonparametrics: The Indian Buffet Process Lecturer: Avinava Dubey Scribes: Rishav Das, Adam Brodie, and Hemank Lamba 1 Latent Variable

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

A Fully Nonparametric Modeling Approach to. BNP Binary Regression A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

A Simple Proof of the Stick-Breaking Construction of the Dirichlet Process

A Simple Proof of the Stick-Breaking Construction of the Dirichlet Process A Simple Proof of the Stick-Breaking Construction of the Dirichlet Process John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We give a simple

More information

Lecture 4: Dynamic models

Lecture 4: Dynamic models linear s Lecture 4: s Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu

More information

Modeling and Predicting Healthcare Claims

Modeling and Predicting Healthcare Claims Bayesian Nonparametric Regression Models for Modeling and Predicting Healthcare Claims Robert Richardson Department of Statistics, Brigham Young University Brian Hartman Department of Statistics, Brigham

More information

Bayesian Nonparametric Regression for Diabetes Deaths

Bayesian Nonparametric Regression for Diabetes Deaths Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

CMPS 242: Project Report

CMPS 242: Project Report CMPS 242: Project Report RadhaKrishna Vuppala Univ. of California, Santa Cruz vrk@soe.ucsc.edu Abstract The classification procedures impose certain models on the data and when the assumption match the

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

On the Fisher Bingham Distribution

On the Fisher Bingham Distribution On the Fisher Bingham Distribution BY A. Kume and S.G Walker Institute of Mathematics, Statistics and Actuarial Science, University of Kent Canterbury, CT2 7NF,UK A.Kume@kent.ac.uk and S.G.Walker@kent.ac.uk

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Information geometry for bivariate distribution control

Information geometry for bivariate distribution control Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models

A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models A comparative review of variable selection techniques for covariate dependent Dirichlet process mixture models William Barcella 1, Maria De Iorio 1 and Gianluca Baio 1 1 Department of Statistical Science,

More information

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu Lecture 16-17: Bayesian Nonparametrics I STAT 6474 Instructor: Hongxiao Zhu Plan for today Why Bayesian Nonparametrics? Dirichlet Distribution and Dirichlet Processes. 2 Parameter and Patterns Reference:

More information

Chapter 2. Data Analysis

Chapter 2. Data Analysis Chapter 2 Data Analysis 2.1. Density Estimation and Survival Analysis The most straightforward application of BNP priors for statistical inference is in density estimation problems. Consider the generic

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London

More information

Bayesian Nonparametric Autoregressive Models via Latent Variable Representation

Bayesian Nonparametric Autoregressive Models via Latent Variable Representation Bayesian Nonparametric Autoregressive Models via Latent Variable Representation Maria De Iorio Yale-NUS College Dept of Statistical Science, University College London Collaborators: Lifeng Ye (UCL, London,

More information

Bayesian semiparametric modeling and inference with mixtures of symmetric distributions

Bayesian semiparametric modeling and inference with mixtures of symmetric distributions Bayesian semiparametric modeling and inference with mixtures of symmetric distributions Athanasios Kottas 1 and Gilbert W. Fellingham 2 1 Department of Applied Mathematics and Statistics, University of

More information

A Brief Overview of Nonparametric Bayesian Models

A Brief Overview of Nonparametric Bayesian Models A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine

More information

On the posterior structure of NRMI

On the posterior structure of NRMI On the posterior structure of NRMI Igor Prünster University of Turin, Collegio Carlo Alberto and ICER Joint work with L.F. James and A. Lijoi Isaac Newton Institute, BNR Programme, 8th August 2007 Outline

More information

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation

Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation Ian Porteous, Alex Ihler, Padhraic Smyth, Max Welling Department of Computer Science UC Irvine, Irvine CA 92697-3425

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

processes Sai Xiao, Athanasios Kottas and Bruno Sansó Abstract

processes Sai Xiao, Athanasios Kottas and Bruno Sansó Abstract Nonparametric Bayesian modeling and inference for renewal processes Sai Xiao, Athanasios Kottas and Bruno Sansó Abstract We propose a flexible approach to modeling and inference for renewal processes.

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources

Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources th International Conference on Information Fusion Chicago, Illinois, USA, July -8, Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources Priyadip Ray Department of Electrical

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Dirichlet Process. Yee Whye Teh, University College London

Dirichlet Process. Yee Whye Teh, University College London Dirichlet Process Yee Whye Teh, University College London Related keywords: Bayesian nonparametrics, stochastic processes, clustering, infinite mixture model, Blackwell-MacQueen urn scheme, Chinese restaurant

More information

A Nonparametric Model-based Approach to Inference for Quantile Regression

A Nonparametric Model-based Approach to Inference for Quantile Regression A Nonparametric Model-based Approach to Inference for Quantile Regression Matthew Taddy and Athanasios Kottas Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA

More information

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures 17th Europ. Conf. on Machine Learning, Berlin, Germany, 2006. Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures Shipeng Yu 1,2, Kai Yu 2, Volker Tresp 2, and Hans-Peter

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Hybrid Dirichlet processes for functional data

Hybrid Dirichlet processes for functional data Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Bayesian Nonparametrics: Models Based on the Dirichlet Process

Bayesian Nonparametrics: Models Based on the Dirichlet Process Bayesian Nonparametrics: Models Based on the Dirichlet Process Alessandro Panella Department of Computer Science University of Illinois at Chicago Machine Learning Seminar Series February 18, 2013 Alessandro

More information