Approximate Marginal Posterior for Log Gaussian Cox Processes

Shinichiro Shirota and Alan E. Gelfand

Department of Statistical Science, Duke University, US. ss571@stat.duke.edu; alan@stat.duke.edu.

June 12, 2018

Abstract

The log Gaussian Cox process is a flexible class of Cox processes, whose intensity surface is stochastic, for incorporating complex spatial and temporal structure of point patterns. Straightforward inference based on Markov chain Monte Carlo is computationally heavy because the cost of the inverse or Cholesky decomposition of the high dimensional covariance matrices of the Gaussian latent variables is cubic in their dimension. Furthermore, since the hyperparameters of the Gaussian latent variables are highly correlated with the sampled Gaussian latent processes themselves, standard Markov chain Monte Carlo strategies are inefficient. In this paper, we propose an efficient and scalable computational strategy for spatial log Gaussian Cox processes. The proposed algorithm is based on the pseudo-marginal Markov chain Monte Carlo approach. With it, we propose estimation of an approximate marginal posterior for the parameters together with comprehensive model validation strategies. We provide details for all of the above along with simulation investigations for univariate and multivariate settings and the analysis of a point pattern of tree data exhibiting positive and negative interaction between different species.

Keywords: pseudo-marginal Markov chain Monte Carlo approach; kernel mixture marginalization; multivariate Poisson log normal; log Gaussian Cox processes; importance sampling; Laplace approximation

1 Introduction

There is increasing interest in analyzing spatial point process data. In the literature, the most widely adopted classes of models are nonhomogeneous Poisson processes (NHPP) or, more generally, log Gaussian Cox processes (LGCP) (see Møller

and Waagepetersen (2004) and references therein). The intensity surface of a Cox process is stochastic, so, given the intensity surface, Cox processes are Poisson processes. Among Cox processes, the LGCP is a flexible class of point processes for incorporating complex structure of spatial or temporal point patterns. The process was originally proposed by Møller et al. (1998) and extended to the space-time case by Brix and Diggle (2001). As the name suggests, the intensity function of this process is driven by the exponential of a Gaussian process (GP). Since the LGCP is a type of latent Gaussian model, sampling of the GP is required. However, the likelihood of the LGCP includes an infinite dimensional stochastic integral over the study region, which is analytically intractable. Hence, some approximation is required. One simple strategy is to lay a grid over the study region, approximate the intractable integral with a Riemann sum, and plug this estimator into the likelihood (Møller et al. (1998) and Møller and Waagepetersen (2004)). Then, a standard Markov chain Monte Carlo (MCMC) scheme is available even though conditional sampling of high dimensional Gaussian latent variables is required. The convergence of posterior samples based on this approximated likelihood to the exact posterior distribution is guaranteed by Waagepetersen (2004). Sampling of high dimensional GPs then becomes the main computational task. An alternative approach is the sigmoidal Gaussian Cox process (SGCP, Adams et al. (2009)). This approach utilizes the thinning property of Cox processes: it avoids the grid approximation and obtains exact inference by introducing latent thinned points in addition to the observed points. Although this approach is potentially attractive in the sense of exact inference, it requires larger dimensional GP outputs. Hence, it is computationally infeasible for large datasets, as the authors note (see Adams et al. (2009)). So, more or less, the sampling of high dimensional GPs is a fundamental problem, as is often observed in Gaussian latent process models. Calculating the inverse or Cholesky decomposition of an n-dimensional covariance matrix requires O(n^3) computational time and O(n^2) memory for storage. In the MCMC based Bayesian inference literature, Møller et al. (1998) implement the Metropolis adjusted Langevin algorithm (MALA, Roberts and Rosenthal (1998)). This algorithm achieves a higher asymptotic acceptance rate than the random walk Metropolis-Hastings algorithm (RWMH, e.g., Robert and Casella (2004)) by introducing a transition density induced by the Langevin diffusion of the target distribution. Chakraborty et al. (2011) discuss various ad hoc approaches used by ecologists in the context of presence-only datasets; they implemented the LGCP with the Gaussian predictive process approximation (GPP, Banerjee et al. (2008)). Diggle et al. (2013) survey space-time and multivariate LGCP models and implement manifold MALA (MMALA, Girolami and Calderhead (2013)), which is a manifold extension of MALA that incorporates the geometric structure of the target density

within the Langevin dynamics. Although these algorithms are potentially efficient, they require careful tuning of some parameters in the transition density. Incorporating the target density information also requires further computational time. More recently, Leininger and Gelfand (2016) implemented elliptical slice sampling, proposed by Murray et al. (2010), for sampling the high dimensional Gaussian latent variables of the spatial LGCP. Although this algorithm might be relatively slow, it does not require fine tuning or further computation of target density information. As an alternative stream of Bayesian inference, the integrated nested Laplace approximation (INLA, Rue et al. (2009)) was proposed, which is a highly efficient approximate Bayesian inference scheme for structured latent Gaussian process models. This approach takes a different path from MCMC, i.e., it calculates the marginal posterior distribution of parameters by implementing the Laplace approximation (Tierney and Kadane (1986)). By approximating the Gaussian random field (GRF) with a Gaussian Markov random field (GMRF, Rue and Held (2005)) structure for the precision matrix of the Gaussian prior, it accomplishes computationally efficient sampling of the posterior marginal distributions of parameters and latent Gaussian variables. Although the approach is based on the GMRF approximation, Lindgren et al. (2011) show the connection of the GMRF to the GRF through a stochastic partial differential equation (SPDE). Hence, the parameters of the GRF can be estimated through this connection while utilizing the computationally efficient structure of the GMRF. Illian et al. (2012) implemented INLA for the inference of the LGCP and demonstrated its applicability. More recently, Simpson et al. (2016) implemented INLA with Lindgren et al. (2011) for the inference of the LGCP and discussed its convergence properties. Although the INLA approach has been successfully implemented for latent GP models, its fundamental assumption is that the dimension of the parameter is low (basically 2 to 5, and not exceeding 20, see Rue et al. (2016)). However, in practice, we face situations where the estimation of higher dimensional parameters is our main concern, e.g., when relatively high dimensional covariate information is available (e.g., Waagepetersen et al. (2016)). Recently, Bayesian inference for the marginal posterior from the MCMC perspective, called the pseudo-marginal MCMC approach, was proposed by Andrieu and Roberts (2009). The algorithm simply plugs an unbiased estimator of the marginal likelihood, integrated over the latent variables, into the acceptance ratio instead of the likelihood itself. A remarkable property of the method is that convergence to the exact marginal posterior distribution is guaranteed whenever an unbiased estimator of the marginal likelihood is used. The efficiency of the algorithm depends on the variance of the estimator. Hence, the primary task for this algorithm is to construct an unbiased estimator while keeping its variance as small as possible. A straightforward construction of this estimator is importance sampling.

However, the direct implementation of PM-MCMC for the LGCP faces a serious computational problem: it must implement high dimensional importance sampling to construct the unbiased estimate. Although the accuracy of the inference depends on the resolution of the approximation, a large grid approximation increases the variance of the unbiased estimator. In this paper, we propose a comprehensive Bayesian inference scheme for the LGCP, with estimation of an approximate marginal posterior by the PM-MCMC approach. This computational scheme is composed of two steps. At the first step, we estimate the approximate marginal posterior of the parameters by PM-MCMC. Since direct implementation is difficult due to the high dimensionality of the Gaussian latent variables, we take a grid approximation over the study region and convert the likelihood into a multivariate Poisson log normal (mPLN, Aitchison and Ho (1989)). At the next step, we calculate the marginal posterior of the Gaussian latent variables. Hence, our computational scheme is similar in spirit to INLA. Based on these approximate posterior samples, we suggest two types of model validation strategies for the LGCP. We also investigate our methodology through simulation studies and real data applications.

2 Log Gaussian Cox Processes and Their Bayesian Inference

Let S = {s_1, ..., s_n} be the observed point pattern and D the study region. The likelihood of the LGCP is defined as

L(S \mid \theta) = \exp\left( -\int_D \lambda(u \mid \theta, z(u))\, du \right) \prod_{s \in S} \lambda(s \mid \theta, z(s))    (1)

\log \lambda(s \mid \theta, z(s)) = X(s)'\beta + z(s), \quad z \sim GP(0, C_\zeta(\cdot, \cdot))    (2)

where X(s) is a covariate vector, θ = (β, ζ) is a parameter vector in which β is a coefficient vector and ζ is a hyperparameter vector for the Gaussian covariance function, and z(s) is a Gaussian process at location s. The likelihood contains an infinite dimensional stochastic integral inside the exponential, which is analytically intractable. Hence, in practice, some approximation of this integral is required. The straightforward approach is a finite grid approximation, i.e., \int_D \lambda(u \mid \theta, z(u))\, du \approx \sum_{k=1}^{K} \lambda(u_k \mid \theta, z(u_k)) \Delta_k (Møller et al. (1998) and Illian et al. (2012)). Since the convergence of posterior samples based on the approximated likelihood to the exact posterior distribution of the parameters of interest is guaranteed as K goes to infinity (Waagepetersen (2004)), we can implement MCMC given the approximated likelihood. For this approximation, n + K GP outputs have to be sampled, so we need to calculate the inverse of an (n + K)-dimensional GP covariance matrix within each MCMC iteration.
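To fix ideas, the sketch below evaluates this grid-approximated log-likelihood, folding the observed points into per-cell counts; the equal-area regular grid and the covariate matrix X evaluated at cell centroids are illustrative assumptions rather than the exact implementation used in the paper.

import numpy as np

def lgcp_grid_loglik(counts, cell_area, X, beta, z):
    """Riemann-sum approximation of the LGCP log-likelihood in (1).

    counts: (K,) number of observed points in each grid cell
    cell_area: area of one grid cell (Delta_k, assumed equal across cells)
    X: (K, p) covariates at the cell centroids
    beta: (p,) regression coefficients
    z: (K,) Gaussian latent field at the cell centroids
    """
    log_lam = X @ beta + z                                   # log intensity per cell
    # exp(-integral of lambda) via a Riemann sum, product over points done cell-wise
    return -cell_area * np.sum(np.exp(log_lam)) + np.sum(counts * log_lam)

This function is reused conceptually throughout: given a covariance for z, one MCMC scheme or another must repeatedly evaluate it while updating the latent field.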

Due to this inversion cost, some variations of the Gaussian predictive process (GPP) approach proposed recently in the geostatistics literature, e.g., the nearest-neighbor GP (NNGP, Datta et al. (2015)) and the multi-resolution GP (MRGP, Katzfuss (2016)), are recommended for MCMC based inference of the LGCP. An alternative approach is the sigmoidal Gaussian Cox process (SGCP) of Adams et al. (2009). This approach specifies the intensity as λ(s) = λ* / (1 + exp(−z(s))), where λ* is the maximum of the intensity surface over the study region. By introducing and sampling thinned events {s_thin} as latent variables, this approach avoids the numerical integration inside the exponential. Although this approach might be applicable in some cases, e.g., when the number of points is small or a smooth intensity surface is expected, we still need to sample high dimensional latent variables, i.e., GPs at the observed and thinned locations. Hence, in the discussion below, we take MCMC based on the grid approximated likelihood as the default Bayesian inference strategy for the LGCP. These MCMC based inferences for the LGCP require the calculation of the inverse or Cholesky decomposition of covariance matrices of high dimensional GPs within each MCMC iteration. The computational cost is O(n^3) time and O(n^2) memory for storage. In addition to the computational cost of the inverse calculation for the LGCP, sampling inefficiency between the Gaussian latent variables and the parameters is observed (Filippone and Girolami (2014)). The hyperparameters of the covariance function depend on the sampled Gaussian latent variables; when the sampling of the GPs is inefficient, the sampling of the hyperparameters is also inefficient. Furthermore, the estimates of the coefficients of spatial covariates are also highly correlated with the Gaussian latent variables. To facilitate the sampling of Gaussian latent variables, sophisticated sampling schemes have been developed, e.g., Hamiltonian Monte Carlo (Neal (2010)), the Metropolis adjusted Langevin algorithm (Roberts and Rosenthal (1998), Møller et al. (1998)) and their manifold extensions (Girolami and Calderhead (2013)). However, the performance of these algorithms depends on careful tuning of some parameters, or requires the calculation of a computationally heavy pre-conditioning matrix (Girolami and Calderhead (2013)). In order to avoid detailed tuning of algorithms, Leininger and Gelfand (2016) implemented elliptical slice sampling (Murray et al. (2010)) for the inference of the LGCP. Hence, the main computational burdens for the Bayesian inference of the LGCP are (1) the evaluation of the inverse or Cholesky decomposition of covariance matrices for sampling high dimensional Gaussian latent variables within each MCMC iteration and (2) the inefficiency caused by the correlation between the Gaussian latent processes and the related parameters. In the subsections below, we describe some computational schemes for latent Gaussian process models, especially from the LGCP perspective, and their pros and cons.

2.1 Metropolis Adjusted Langevin Algorithm

The Metropolis adjusted Langevin algorithm (MALA) was originally proposed by Roberts and Rosenthal (1998). This approach is an MH algorithm with a transition density driven by the Langevin diffusion (see Roberts and Tweedie (1996)), i.e.,

z^{*} = z^{(i-1)} + \sigma_n v^{(i-1)} + \frac{\sigma_n^2}{2} \nabla \log \pi_n(z^{(i-1)})    (3)

where the random variables v^{(i-1)} are independent standard normal and \sigma_n^2 is the step variance. In contrast to traditional random walk MH algorithms, the Langevin algorithm utilizes local information about the target density. Since it exploits the structure of the target density, it achieves a higher acceptance rate than random walk MH (Roberts and Rosenthal (1998)). Girolami and Calderhead (2013) proposed manifold MALA (MMALA), which incorporates the information matrix of the target density as a pre-conditioning matrix M, i.e.,

z^{*} = z^{(i-1)} + \sigma_n M^{-1/2} v^{(i-1)} + \frac{\sigma_n^2}{2} M^{-1} \nabla \log \pi_n(z^{(i-1)})    (4)

where M is the expected information matrix. They implemented the algorithm for various applications: logistic regression, stochastic volatility models, log Gaussian Cox processes, and dynamic systems driven by non-linear differential equations. Although MALA and its extensions are efficient algorithms, they require careful tuning of parameters, and calculating the information matrix is computationally costly.
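For concreteness, a minimal sketch of one MALA update under a generic differentiable log target is given below; the user-supplied functions log_target and grad_log_target, and the step size sigma, are assumptions of this sketch, not a specification taken from the paper.

import numpy as np

def mala_step(z, log_target, grad_log_target, sigma, rng):
    """One Metropolis adjusted Langevin update of a latent vector z, as in (3)."""
    prop = z + 0.5 * sigma**2 * grad_log_target(z) + sigma * rng.standard_normal(z.shape)

    # log proposal densities q(a | b), up to a common additive constant
    def log_q(a, b):
        diff = a - (b + 0.5 * sigma**2 * grad_log_target(b))
        return -0.5 * np.sum(diff**2) / sigma**2

    log_alpha = (log_target(prop) - log_target(z)
                 + log_q(z, prop) - log_q(prop, z))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True                                 # accepted
    return z, False                                       # rejected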

2.2 Integrated Nested Laplace Approximation

The integrated nested Laplace approximation (INLA), proposed by Rue et al. (2009), is a highly efficient approximate Bayesian inference scheme for structured latent Gaussian models; a more recent review is Rue et al. (2016). Let y = {y_1, ..., y_n} be the observed dataset. The goal of the algorithm is to estimate the approximate marginal posterior of the Gaussian latent variables, π(z | y). The approach implements the Laplace approximation (Tierney and Kadane (1986) and Barndorff-Nielsen and Cox (1989)) for calculating both the marginal posterior for θ and that for z. The marginal posterior of z is evaluated as

\pi(z \mid y) \approx \sum_j \pi(z \mid \theta_j, y)\, \pi(\theta_j \mid y)\, \Delta_{\theta_j}    (5)

where the sum is over gridded values θ_j with associated area Δ_{θ_j}. The calculation of π(θ | y) is as follows:

\pi(\theta \mid y) \propto \left. \frac{\pi(z, y, \theta)}{\pi_G(z \mid \theta, y)} \right|_{z = z^{*}(\theta)}    (6)

where π_G(z | θ, y) is the Gaussian approximation to the full conditional of z, and z*(θ) is the mode of the full conditional for z given θ (Rue et al. (2009) and Martins et al. (2013)). When π_G(z | θ, y) coincides with the exact full conditional π(z | θ, y), this expression recovers π(θ | y) exactly, for arbitrary z. For the calculation of π(z | θ, y), the Laplace approximation is combined with a Gaussian Markov random field (GMRF, Rue and Held (2005)) approximation for the precision matrix of the Gaussian process to obtain fast computation. In the Gaussian case, a zero entry in the precision matrix for a pair of variables is equivalent to their conditional independence. Due to this property, the computational cost in the spatial GMRF case is O(n^{3/2}) time with O(n log(n)) memory for storage (see Rue and Held (2005)). Illian et al. (2012) and Simpson et al. (2016) implement the INLA approach for the LGCP and investigate its convergence properties by utilizing the theoretical results of Lindgren et al. (2011) on the connection between the GMRF and the GRF.

Although INLA based inference for the LGCP is highly efficient, there are some unsolved problems. Firstly, the INLA approach has to evaluate π(θ_j | y) for each gridded θ_j and then integrate over π(θ | y) to calculate π(z | y). For example, if we take 3 integration points in each dimension, the cost is 3^p to cover all combinations in p dimensions, which is 729 for p = 6 and 59,049 for p = 10. Hence, INLA based inference is available only for low dimensional θ with coarse grids (see Rue et al. (2009), Illian et al. (2012), and Simpson et al. (2016)). In practice, since the number of hyperparameters of the covariance function of a spatial Gaussian process and of the coefficients of the covariates of interest is often not large, e.g., less than 10, this approach still works. However, many point pattern applications involve more complex covariance specifications, e.g., nonstationary, nonseparable space-time, or multivariate spatial structure, as well as a larger number of spatial covariates of interest. Hence, the low dimensionality assumption can be a serious bottleneck for the INLA approach. Furthermore, although some sophisticated variations of the Gaussian approximation have been investigated (Martins et al. (2013)), the Gaussian approximation to π(z | θ, y) might not be accurate enough. For the LGCP, the likelihood is a point process likelihood, approximated by a Poisson distribution on each grid cell; when the number of counts in a grid cell is very small, the Gaussian approximation can be inaccurate.

3 Approximate Marginal Posterior for Log Gaussian Cox Processes

Both the INLA and MALA approaches have limitations, especially for high dimensional Gaussian latent variables or a large number of parameters, e.g., over 10 dimensions. In this paper, we propose a different computational scheme based on the pseudo-marginal MCMC of Andrieu and Roberts (2009). Since this scheme is specific to the LGCP, we assume y = S in the discussion below, i.e., the observed dataset is a point pattern. Our basic strategy is similar to the INLA approach, i.e., we consider efficient and accurate Bayesian inference for an approximate marginal posterior distribution of θ, π(θ | S). This approximate marginal posterior distribution is constructed in a different way from INLA and is estimated within an MCMC framework. After obtaining this approximate marginal posterior, we can calculate the marginal posterior of the Gaussian latent processes as

\pi(z \mid S) \approx \sum_{i=i_0+1}^{I+i_0} \pi(z \mid \theta^{(i)}, S)\, \pi(\theta^{(i)} \mid S)    (7)

where I is the number of approximate marginal posterior samples and i_0 is the end point of the burn-in period. Given the preserved {θ^(i)}, i = i_0 + 1, ..., I + i_0, we can estimate π(z | θ^(i), S) at each θ^(i). Since we can sample from π(z | θ^(i), S) at fixed θ^(i), the calculation of the inverse or Cholesky decomposition of the covariance matrices can be parallelized across the different θ^(i). Hence, the computationally heavy iteration of these calculations through MCMC is not required. Furthermore, since θ^(i) is fixed, z converges fast to π(z | θ^(i), S). The posterior samples of z are obtained by elliptical slice sampling for the LGCP (Leininger and Gelfand (2016)) or MALA (Møller et al. (1998) and Diggle et al. (2013)). This step can be implemented with parallel computation schemes, since we only need to sample z | S, θ^(i) at fixed θ^(i). Hence, the question becomes: how can we accurately estimate π(θ | S)? We consider implementing the pseudo-marginal approach proposed by Andrieu and Roberts (2009) for the LGCP.

3.1 Pseudo-Marginal for Exact MCMC

Andrieu and Roberts (2009) propose the pseudo-marginal approach. This approach enables us to sample the posterior marginal of parameters efficiently when latent variables exist. The key point is to construct an unbiased estimate of the marginal likelihood, with the latent variables integrated out, and to put this estimate into the acceptance ratio,

i.e.,

\alpha = \min\left\{ 1, \frac{\pi(\theta^{*})\, \hat{\pi}(S \mid \theta^{*})\, q(\theta \mid \theta^{*})}{\pi(\theta)\, \hat{\pi}(S \mid \theta)\, q(\theta^{*} \mid \theta)} \right\}    (8)

If u < α, with u drawn from U(0, 1), we preserve θ* and the estimate of π(S | θ*). Surprisingly, convergence to π(θ | S) is guaranteed as long as the estimate is unbiased for π(S | θ). Andrieu and Roberts (2009) verify the uniform ergodicity of the algorithm. The efficiency depends on the variance of the estimate: when it is noisy, the samples produced with the above acceptance ratio will be highly correlated. So, the primary task for the pseudo-marginal approach is to construct an unbiased estimate with small variance. The straightforward approach is importance sampling. By the theory of importance sampling (e.g., Robert and Casella (2004)), a well-chosen importance density yields an unbiased estimate with smaller variance than the direct Monte Carlo estimate. Filippone and Girolami (2014) implemented the pseudo-marginal approach for estimating hyperparameters of GPs. They utilize the Laplace approximation (Tierney and Kadane (1986) and Barndorff-Nielsen and Cox (1989)) and expectation propagation (Minka (2001)) as importance densities to construct the unbiased estimate. For the LGCP, we need to take a grid approximation over the study region. For an accurate implementation, sufficiently fine grids are required to approximate the infinite dimensional stochastic integral. However, importance sampling in high dimensions is not promising. Let B = (B_1, ..., B_M) be M disjoint subregions of D, and let T(S) = (T_1(S), ..., T_M(S)) and δ = (δ_1, ..., δ_M) be the counts and intensities on the subregions B_m for m = 1, ..., M. The basic strategy is (1) divide the study region into the subregions B, (2) take the counts on the subregions as count summary statistics T, and (3) construct a likelihood for the vector of count summary statistics given θ. For the third step, we can utilize the multivariate Poisson log normal (mPLN) kernel function (Aitchison and Ho (1989)). Thus, "approximate" marginal posterior refers to the grid approximation of the study region; the quantity we target is

\tilde{\pi}(\theta \mid S) = \pi(\theta \mid T(S))    (9)

A direct implementation of the pseudo-marginal approach would require a high dimensional grid approximation over the study region D, i.e., integrating out a large M-dimensional vector of Gaussian latent variables. The estimator is given by

\hat{\pi}(T(S) \mid \theta) = \frac{1}{N_{imp}} \sum_{j=1}^{N_{imp}} \frac{\pi(T(S) \mid \delta_j)\, \pi(\delta_j \mid \theta)}{g(\delta_j \mid T(S), \theta)}, \quad \delta_j \sim g(\delta \mid T(S), \theta)    (10)

where N_imp is the number of samples from the importance density. When M is large, obtaining a low variance estimate is computationally demanding because a very large N_imp is needed.
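The generic pseudo-marginal Metropolis-Hastings step in (8) is sketched below; loglik_hat stands for the log of any unbiased estimate such as (10), and the symmetric random-walk proposal on (possibly transformed) parameters is a simplifying assumption of this sketch, so the q terms cancel.

import numpy as np

def pm_mcmc(theta0, log_prior, loglik_hat, n_iter, step, rng):
    """Pseudo-marginal random walk Metropolis.

    loglik_hat(theta) returns the log of an unbiased estimate of pi(T(S) | theta);
    the current estimate is recycled until the next acceptance, as required by PM-MCMC.
    """
    theta = np.asarray(theta0, dtype=float)
    cur_ll = loglik_hat(theta)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        prop_ll = loglik_hat(prop)                         # fresh unbiased estimate
        log_alpha = (log_prior(prop) + prop_ll) - (log_prior(theta) + cur_ll)
        if np.log(rng.uniform()) < log_alpha:              # accept: keep the estimate too
            theta, cur_ll = prop, prop_ll
        draws.append(theta.copy())
    return np.array(draws)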

The straightforward implementation requires a large M because it assumes a homogeneous Poisson process in each grid cell, as in standard MCMC and INLA approaches, with the first and second order moments evaluated at and between representative points of the grid cells. In our setting, on the other hand, we can avoid the high dimensional integration by utilizing the first and second moment equations induced by the general moment formulas for Cox processes. The key point is that even if we keep M relatively low dimensional, we can calculate the exact first and second order moments from the general moment formulas for Cox processes. These moment formulas are based on the integration of the intensity, and of its product with the pair correlation function, over the subregions. So, although we take a grid approximation as in the straightforward implementation, the first and second order moments of T are induced by the exact moments of the LGCP. These exact moments eliminate the biases caused by the grid approximation (homogeneous Poisson on each grid cell) in INLA and MCMC based algorithms.

3.2 Kernel Mixture Marginalization

We consider a kernel mixture marginalization for the density of the summary statistics vector, i.e.,

\pi(T(S) \mid \theta) = \int \prod_{m=1}^{M} P(T_m(S) \mid \delta_m)\, \pi(\delta \mid \theta)\, d\delta    (11)

where π(δ | θ) is the prior distribution of the intensity vector. This mixture representation incorporates the correlation structure of the counts into the intensity distribution; given the intensities, the counts in different cells are independent. From the moment formulas for general Cox processes, we can obtain the first and second moments of the marginal count summary vector given θ, i.e., α_{θ,m} = E[T_m(S) | θ] and β_{θ,mn} = Cov[T_m(S), T_n(S) | θ] for m, n = 1, ..., M:

\alpha_{\theta,m} = \int_{B_m} \lambda_\theta(u)\, du    (12)

\beta_{\theta,mn} = \int_{B_m \cap B_n} \lambda_\theta(u)\, du + \int_{B_m} \int_{B_n} \lambda_\theta(u)\, \lambda_\theta(v)\, \{g_\theta(u, v) - 1\}\, du\, dv    (13)

\lambda_\theta(s) = E_z[\lambda(s \mid \theta, z(s))]    (14)

where g_θ(u, v) is the pair correlation function of the latent process, which can have an anisotropic and nonstationary expression. Since B_m and B_n are disjoint, the first integral in (13) equals the integral of λ_θ over B_m when m = n and is 0 otherwise. λ_θ(s) = E_z[λ(s | θ, z(s))] is the intensity expected with respect to z.

Although the marginal likelihood π(T | θ) is not analytically available, the first and second order moments can be calculated from the above formulas under any latent process assumption. In practice, these moments can be accurately calculated by grid approximation, i.e.,

\hat{\alpha}_{\theta,m} = \sum_{b=1}^{N_{B_m}} \lambda_\theta(u_b)\, \Delta_{B_m}    (15)

\hat{\beta}_{\theta,mn} = \sum_{b=1}^{N_{B_m \cap B_n}} \lambda_\theta(u_b)\, \Delta_{B_m \cap B_n} + \sum_{b=1}^{N_{B_m}} \sum_{b'=1}^{N_{B_n}} \lambda_\theta(u_b)\, \lambda_\theta(v_{b'})\, \{g_\theta(u_b, v_{b'}) - 1\}\, \Delta_{B_m}\, \Delta_{B_n}    (16)

where N_{B_m} and Δ_{B_m} are the number of grid cells and the area of a unit grid cell within B_m; both can differ among the B_m. For simplicity, we assume N_{B_m} = N_B and Δ_{B_m} = Δ_B for all m. Importantly, while B is the M-dimensional collection of disjoint subregions over the study region used to construct T, the moment calculation above is implemented by taking a further, finer grid within each B_m. That is, we take the M-dimensional count summary statistics vector over the M disjoint subregions at the first stage; then we calculate the first and second order moments of T | θ on B directly through the moment formulas above. Although an analytical expression for π(T(S) | θ) is not available, the first and second moments of T(S) | θ are available. We utilize this moment information to construct the unbiased estimator of π(T(S) | θ). In the discussion below, we introduce the mPLN kernel, which is an appropriate kernel function for the LGCP for constructing the unbiased estimator.
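Before turning to that kernel, the sketch below computes the grid-approximated moments (15) and (16) for a stationary LGCP, assuming an exponential covariance so that g_θ(u, v) = exp(σ² e^{−φ||u−v||}) and λ_θ(s) = exp(X(s)β + σ²/2); the regular fine grid, the covariance choice, and the function name lgcp_count_moments are illustrative assumptions of this sketch.

import numpy as np

def lgcp_count_moments(centroids, subregion_id, cell_area, log_lam_bar, sigma2, phi):
    """Grid approximation of E[T_m | theta] and Cov[T_m, T_n | theta], eqs. (15)-(16).

    centroids: (K, 2) fine-grid cell centroids covering the study region
    subregion_id: (K,) integer index in {0, ..., M-1} mapping each fine cell to its B_m
    cell_area: area of one fine grid cell (Delta_B)
    log_lam_bar: (K,) log expected intensity, e.g. X @ beta + 0.5 * sigma2
    sigma2, phi: variance and decay of the exponential covariance
    """
    K = len(log_lam_bar)
    M = int(subregion_id.max()) + 1
    lam = np.exp(log_lam_bar) * cell_area                  # lambda_theta(u_b) * Delta_B
    alpha = np.bincount(subregion_id, weights=lam, minlength=M)   # eq. (15)

    # pair correlation minus one, g(u, v) - 1 = exp(C(u, v)) - 1
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    g_minus_1 = np.exp(sigma2 * np.exp(-phi * d)) - 1.0

    # aggregate lam_b * lam_b' * (g - 1) over fine cells b in B_m, b' in B_n (eq. (16))
    A = np.zeros((M, K))
    A[subregion_id, np.arange(K)] = 1.0                    # subregion indicator matrix
    beta = A @ (np.outer(lam, lam) * g_minus_1) @ A.T
    beta[np.diag_indices(M)] += alpha                      # first (diagonal) term of (13)
    return alpha, beta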

Multivariate Poisson log normal kernel

Using the first and second moments of the count summary statistics, we introduce the mPLN kernel function defined as

\pi(T(S) \mid \theta) = \int \prod_{m=1}^{M} P(T_m(S) \mid \delta_m)\, LN(\delta \mid \mu_\theta, \Sigma_\theta)\, d\delta    (17)

where μ_θ and Σ_θ are the mean vector and covariance matrix of log(δ). In this approach, the latent intensity parameter δ is introduced and the marginal correlation structure is incorporated through the log normal distribution of δ. The marginal means and covariances of T(S) | θ (see Aitchison and Ho (1989)) are

\alpha_{\theta,m} = \exp\left( \mu_{\theta,m} + \frac{\sigma_{\theta,mm}}{2} \right), \quad m = 1, ..., M    (18)

\beta_{\theta,mm} = \alpha_{\theta,m} + \alpha_{\theta,m}^2 \{\exp(\sigma_{\theta,mm}) - 1\}, \quad m = 1, ..., M    (19)

\beta_{\theta,mn} = \alpha_{\theta,m}\, \alpha_{\theta,n} \{\exp(\sigma_{\theta,mn}) - 1\}, \quad m \neq n = 1, ..., M    (20)

where σ_{θ,mn} is the (m, n) element of Σ_θ. Since, through the moment formulas, we can calculate the marginal mean and covariance E[T_m(S) | θ] and Cov[T_m(S), T_n(S) | θ] for m, n = 1, ..., M given θ, we can also calculate μ_{θ,m} and σ_{θ,mn} for m, n = 1, ..., M. Hence, μ_θ and Σ_θ are induced from α_θ and β_θ as

\mu_{\theta,m} = \log(\alpha_{\theta,m}) - \frac{\sigma_{\theta,mm}}{2}    (21)

\sigma_{\theta,mm} = \log\left( 1 + \frac{\beta_{\theta,mm} - \alpha_{\theta,m}}{\alpha_{\theta,m}^2} \right)    (22)

\sigma_{\theta,mn} = \log\left( 1 + \frac{\beta_{\theta,mn}}{\alpha_{\theta,m}\, \alpha_{\theta,n}} \right)    (23)

Since β_{θ,mn} for n ≠ m can be positive or negative, we can incorporate both positive and negative correlation among the counts. However, this specification expresses only overdispersion, because the marginal variance β_{θ,mm} has to be larger than α_{θ,m} to satisfy σ_{θ,mm} > 0. The total number of parameters in μ_θ and Σ_θ is M(M + 3)/2, which is the same number as in α_θ and β_θ. Hence, the matching between (α_θ, β_θ) and (μ_θ, Σ_θ) is one to one. We also note that the λ_θ introduced above is not the intensity including the Gaussian latent variables z, because the Gaussian latent variables are integrated out; this expectation is analytically available since E[exp(z(s))] = exp(σ²/2) (Møller et al. (1998)). The pair correlation function for the LGCP is g_θ(u, v) = exp(C_ζ(u, v)), where C_ζ(u, v) is the covariance function of the GP between u and v, which can be anisotropic and nonstationary. Since the expressions for α_θ and β_θ are exact, the induced moments μ_θ and Σ_θ are also the exact mean and covariance of log δ. The main purpose above is to induce μ_θ and Σ_θ from α_θ and β_θ: we first calculate α_θ and β_θ, which depend on θ, and then transform these values into μ_θ and Σ_θ.

3.3 Algorithm

The algorithm is composed of two main steps. We call the algorithm below the approximate marginal posterior (AMP) approach. If the marginal posterior distribution of z is not of interest, the second step can be skipped.

Estimate π(θ | S)

1. Let i = 1 and set an initial value θ^(0).

2. Generate θ* ~ q(θ* | θ^(i−1)), calculate the moments E[T(S) | θ*] and Cov[T(S) | θ*], and convert these moment vectors into μ_{θ*} and Σ_{θ*}.

3. Calculate the Laplace approximation g(δ | T(S), θ*) of π(T(S) | δ) π(δ | θ*).

4. Estimate π(T(S) | θ*) as

\hat{\pi}(T(S) \mid \theta^{*}) = \frac{1}{N_{imp}} \sum_{j=1}^{N_{imp}} \frac{\pi(T(S) \mid \delta_j)\, \pi(\delta_j \mid \theta^{*})}{g(\delta_j \mid T(S), \theta^{*})}    (24)

where g(δ_j | T(S), θ*) is the Laplace approximation of π(T(S) | δ) π(δ | θ*) evaluated at δ_j, and δ_j ~ g(δ | T(S), θ*) for j = 1, ..., N_imp.

5. Evaluate the acceptance ratio

\alpha = \min\left\{ 1, \frac{\pi(\theta^{*})\, \hat{\pi}(T(S) \mid \theta^{*})\, q(\theta^{(i-1)} \mid \theta^{*})}{\pi(\theta^{(i-1)})\, \hat{\pi}(T(S) \mid \theta^{(i-1)})\, q(\theta^{*} \mid \theta^{(i-1)})} \right\}    (25)

and preserve θ^(i) = θ* and the corresponding estimate of π(T(S) | θ^(i)) if u < α, where u ~ U(0, 1); otherwise set θ^(i) = θ^(i−1) and keep the previous estimate. Return to step 2 with i ← i + 1.

Importantly, although our algorithm estimates π(θ | S) only approximately, in the sense that it targets π(θ | T(S)), pseudo-marginal MCMC enables us to estimate π(θ | T(S)) exactly. Although the posterior variance depends on the dimension of T(S), the algorithm recovers the marginal posterior mode even with moderate dimensional M. This is a difference between the AMP approach and grid approximation based approaches such as INLA and MALA: those algorithms assume a homogeneous Poisson process on each grid cell, and so require high dimensional (sufficiently fine) grids for accurate inference. On the other hand, since our approach calculates the exact moments accurately enough for each subregion (α_θ and β_θ), the marginal posterior modes of the parameters should be contained in the approximate marginal posterior credible intervals. When the estimator is heavy-tailed, it is difficult to accept a move away from a large value of the estimate, and the Markov chain can stop moving for a long time. The efficiency of the algorithm depends on the variance of the estimate; we discuss its construction in more detail later.

Estimate π(z | θ, S) and π(z | S) (optional step)

1. Given θ^(i) for i = i_0 + 1, ..., I + i_0, where i_0 is the end point of the burn-in period and I is the number of preserved approximate marginal posterior samples, estimate π(z | θ^(i), S).

2. Calculate π(z | S) as

\pi(z \mid S) \approx \sum_{i=i_0+1}^{I+i_0} \pi(z \mid \theta^{(i)}, S)\, \pi(\theta^{(i)} \mid S)    (26)
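Returning to the first stage, a minimal sketch of steps 2-4 for a single proposed θ is given below. It assumes the counts T_m given δ_m are Poisson with mean δ_m, takes the Laplace approximation on the scale η = log δ (where the change-of-variables Jacobians cancel in the importance ratio), and uses a plain Newton iteration for the mode; the function names mpln_moment_match and amp_loglik_hat and these implementation details are ours, not the paper's.

import numpy as np
from scipy.stats import multivariate_normal, poisson

def mpln_moment_match(alpha, beta):
    """Convert count moments (alpha, beta) into (mu, Sigma) of log(delta), eqs. (21)-(23)."""
    sig_diag = np.log(1.0 + (np.diag(beta) - alpha) / alpha**2)
    Sigma = np.log(1.0 + beta / np.outer(alpha, alpha))
    np.fill_diagonal(Sigma, sig_diag)
    mu = np.log(alpha) - 0.5 * sig_diag
    return mu, Sigma

def amp_loglik_hat(counts, mu, Sigma, n_imp, rng):
    """Log of the importance-sampling estimate (24) with a Laplace importance density."""
    Q = np.linalg.inv(Sigma)
    eta = mu.copy()
    for _ in range(50):                                    # Newton iterations for the mode
        grad = counts - np.exp(eta) - Q @ (eta - mu)
        hess = -np.diag(np.exp(eta)) - Q
        step = np.linalg.solve(hess, grad)
        eta = eta - step
        if np.max(np.abs(step)) < 1e-8:
            break
    cov_g = np.linalg.inv(-hess)                           # Gaussian importance covariance

    etas = rng.multivariate_normal(eta, cov_g, size=n_imp)
    log_pois = poisson.logpmf(counts, np.exp(etas)).sum(axis=1)
    log_prior = multivariate_normal.logpdf(etas, mean=mu, cov=Sigma)
    log_g = multivariate_normal.logpdf(etas, mean=eta, cov=cov_g)
    log_w = log_pois + log_prior - log_g
    m = log_w.max()                                        # log-sum-exp for stability
    return m + np.log(np.mean(np.exp(log_w - m)))

Given the moment calculation sketched earlier, this estimate can be plugged directly into a pseudo-marginal step such as the pm_mcmc sketch above to complete the first stage.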

There are three important points here. First, since we preserve {θ^(i)}, i = i_0 + 1, ..., I + i_0, through the first step, we can calculate π(z | θ^(i), S) separately for each θ^(i). The computational bottleneck for sampling the LGCP is the repeated Cholesky decomposition or inverse calculation, within the MCMC iterations, of a large covariance matrix that depends on θ. However, since we preserve the approximate marginal posterior samples {θ^(i)} through the first step, we need only a single Cholesky decomposition or inverse calculation of the large covariance matrix for each θ^(i). These calculations can be parallelized without passing information. Second, π(z | θ^(i), S) does not necessarily need an approximation for sampling the GPs, e.g., nearest-neighbor GPs (Datta et al. (2015)) or multi-resolution GPs (Katzfuss (2016)). Since repeated calculation of the inverse or Cholesky decomposition is not required, we can handle relatively large covariance matrices without approximation. Given fixed θ^(i), convergence of π(z | θ^(i), S) is dramatically fast, e.g., with elliptical slice sampling (see Murray et al. (2010) and Leininger and Gelfand (2016)), even without the fine tuning required by Hamiltonian Monte Carlo or MALA. Finally, the AMP approach does not require a grid approximation over θ; this is a serious bottleneck when extending the INLA approach to relatively large dimensional θ, e.g., over 10 dimensions. Although the approximate marginal posterior is still an approximation of π(θ | S), in the sense that it equals π(θ | T(S)), the marginal posterior mode is well estimated, as shown in the simulation studies. The AMP approach is potentially advantageous for larger dimensional θ than the INLA approach.
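For the second stage, a minimal sketch of elliptical slice sampling (Murray et al. (2010)) for z given a preserved θ^(i) is shown below; the Cholesky factor of the prior covariance is computed once per θ^(i), and the particular log-likelihood function supplied is an assumption of this sketch (for example, the earlier lgcp_grid_loglik wrapped in a closure over the gridded counts and covariates).

import numpy as np

def ess_update(z, chol_prior, log_lik, rng):
    """One elliptical slice sampling update of z ~ N(0, Sigma) given the data.

    chol_prior: lower-triangular Cholesky factor of Sigma (computed once per theta)
    log_lik(z): log-likelihood of the point pattern given z (e.g., the gridded LGCP likelihood)
    """
    nu = chol_prior @ rng.standard_normal(z.shape)         # auxiliary draw from the prior
    log_y = log_lik(z) + np.log(rng.uniform())             # slice threshold
    angle = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = angle - 2.0 * np.pi, angle
    while True:
        z_new = z * np.cos(angle) + nu * np.sin(angle)     # point on the ellipse
        if log_lik(z_new) > log_y:
            return z_new
        if angle < 0.0:                                    # shrink the bracket and retry
            lo = angle
        else:
            hi = angle
        angle = rng.uniform(lo, hi)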

3.4 Construction of Unbiased Estimator

The efficiency and accuracy of our algorithm depend on the construction of the unbiased estimator: when its variance is small, the PM-MCMC converges efficiently. By the theory of importance sampling (Robert and Casella (2004)), importance sampling provides an unbiased estimator with smaller variance than the plain Monte Carlo estimator. Hence, we take importance sampling as the default choice for constructing the unbiased estimator. One question is how large the number of importance particles, N_imp, should be. Doucet et al. (2015) suggest that N_imp be chosen so that the variance of the estimator of the log likelihood is 1. An alternative approach is expectation propagation (EP, Minka (2001)). Filippone and Girolami (2014) implemented EP in addition to the Laplace approximation and suggested that EP might be more robust to the choice of N_imp. However, in this paper we take the Laplace approximation as the importance density because of its connection to the INLA discussion; EP is also directly applicable. Although the INLA approach requires fine grids over the study region, some grid cells contain only a small number of points. Since the Laplace approximation for small counts deviates from a Gaussian distribution, a skewness corrected method is considered in the INLA approach (Martins et al. (2013)). On the other hand, the AMP approach can have a relatively large number of counts in each subregion because M is lower dimensional than in the INLA approach. The posterior distribution of log δ given T(S) is then closer to a Gaussian distribution, so the Laplace approximation of the posterior distribution of log δ is a promising option, especially when a large number of points is observed.

3.5 Computational Costs and Tuning Parameters

There are three main computational costs in implementing the AMP algorithm. Firstly, the first and second order moments of T(S) | θ, i.e., α_θ and β_θ, have to be calculated for each proposed θ. In particular, the calculation of β_θ can be time consuming because the number of components is M(M + 1)/2, so the straightforward computational cost of the moment calculation is O(N_B M^2). On the other hand, these moments are deterministic functions of the parameters. For calculating them, modern distributed computational tools, e.g., graphical processing units (GPU), are available. Given θ, each first and second order moment is obtained independently, without passing information. Hence, this computation can be reduced to O(N_B) and need not be a bottleneck. Secondly, we need to generate N_imp samples from the importance density, and so decide N_imp. Doucet et al. (2015) suggest that N_imp be chosen so that the variance of the log likelihood is 1, under the assumption that the additive noise of the log likelihood estimator is Gaussian with variance inversely proportional to the number of samples and independent of the parameter value at which it is evaluated. Since the AMP algorithm does not require a large M, N_imp need not be very large. When M is relatively small and a large number of points is observed, the Laplace approximation of the posterior distribution of log δ is close to a Gaussian distribution; then the importance density is close to the posterior density of log δ, and N_imp is not required to be large. Finally, the inverse and Cholesky decomposition of an M × M matrix are necessary for generating samples from, and evaluating, the importance density. Although our approach can keep M relatively low dimensional, larger M provides more information about π(θ | S). Hence, the main computational cost of the AMP algorithm is O(M^3). In practice, we suggest that M be kept moderate, with N_B adjusted so that N_B × M is close to the number of grid cells needed in the INLA approach or standard MCMC approaches, e.g., MALA, elliptical slice sampling, and Hamiltonian Monte Carlo.

3.6 Some Extensions

Nonstationary

A variety of approaches have been developed for nonstationary spatial processes. One parametric family of nonstationary covariance functions was proposed by Paciorek and Schervish (2006). An alternative direction is the deformation approach proposed by Sampson and Guttorp (1992), which transforms a stationary random field into a nonstationary one by deforming the space; Bayesian extensions have also been proposed (Damian et al. (2001), Schmidt and O'Hagan (2003), Schmidt et al. (2011) and Bornn et al. (2012)). Another approach is kernel convolution (Higdon (1998)). The AMP approach is easily extended to the nonstationary covariance case as long as an analytical expression of the covariance function is available; the pair correlation function, which is the exponential of the covariance function, is then also analytically available. The moment formulas themselves apply generally, even in anisotropic and nonstationary cases. Hence, as long as the pair correlation function is analytically available, the AMP approach extends to the nonstationary case. When an analytical expression is not available, we approximate the pair correlation by a further grid approximation. For example, in the kernel convolution approach the covariance function is expressed as a convolution of kernel functions; by taking a grid approximation of the convolution, we can still implement the AMP approach. Fortunately, the calculation of the moments and of the pair correlation function can be parallelized without passing information. The AMP approach is thus flexibly available for the nonstationary case.

Multivariate

Møller et al. (1998) suggest the multivariate extension of the LGCP. Brix and Møller (2001) consider common latent process and independent process specifications for the bivariate LGCP. Waagepetersen et al. (2016) propose a factor type specification for the common latent process, which is a promising direction for higher dimensional LGCPs; their estimation strategies are based on minimum contrast estimators with respect to the pair correlation function of the multivariate LGCP. Since the Bayesian computational cost for these processes is huge, we rarely find preceding literature on Bayesian inference for the multivariate LGCP except in the bivariate case. For the multivariate extension, specification of a cross covariance function for the LGCP is required (see Genton and Kleiber (2015)). Let L be the dimension of the point pattern. The simplest form of cross covariance function is the separable form

C_{ll'}(s_1, s_2) = \rho(s_1, s_2)\, a_{ll'}, \quad s_1, s_2 \in \mathbb{R}^2,    (27)

for all l, l' = 1, ..., L, where ρ(s_1, s_2) is a valid stationary or nonstationary correlation function and a_{ll'} = cov(Z_l, Z_{l'}) is the nonspatial covariance between variables

l and l'. An alternative choice is the linear model of coregionalization (LMC, see Schmidt and Gelfand (2003) and Banerjee et al. (2014)). It represents a multivariate random field as a linear combination of H < L independent univariate random fields. The resulting cross covariance functions take the form

C_{ll'}(u, v) = \sum_{h=1}^{H} \rho_h(u, v)\, a_{lh}\, a_{l'h}    (28)

where A is the L × H matrix whose (i, j) component is a_{ij}. Then, we can define the l-th Gaussian component inside the intensity function at location s as z_l(s) = \sum_{h=1}^{H} a_{lh} v_h(s) for l = 1, ..., L, where v_1, ..., v_H are mean 0, variance 1 Gaussian processes with spatial correlations ρ_h(·). The cross pair correlation function between the l-th and l'-th components is g_θ^{ll'}(u, v) = exp( \sum_{h=1}^{H} ρ_h(u, v) a_{lh} a_{l'h} ). Furthermore, β_{θ,mn}^{ll'} = Cov[T_m^l, T_n^{l'} | θ], i.e., the covariance between the counts of the l-th and l'-th components on B_m and B_n, is

\beta_{\theta,mn}^{ll'} = 1(l = l') \int_{B_m \cap B_n} \lambda_\theta^l(u)\, du + \int_{B_m} \int_{B_n} \lambda_\theta^l(u)\, \lambda_\theta^{l'}(v)\, \{g_\theta^{ll'}(u, v) - 1\}\, du\, dv,    (29)

where λ_θ^l(·) and λ_θ^{l'}(·) are the intensities of the l-th and l'-th components. Waagepetersen et al. (2016) consider a factor type specification, a special case of the LMC, as the cross covariance specification for the multivariate LGCP. Importantly, the number of parameters would be larger than 20 in their multivariate LGCP; for such approaches, the INLA approach is difficult to implement. The computational cost required for the multivariate extension is O((M L)^3). Although our algorithm does not require a large M, its computational cost would be huge when L is large, e.g., over 30. On the other hand, if we assume independent LGCPs for each component, the computational cost remains O(M^3). As we discussed, there are two main computational tasks: (1) the moment calculation and (2) the Laplace approximation. When we consider convolution based cross covariances (Ver Hoef and Barry (1998), Ver Hoef et al. (2004) and Majumdar and Gelfand (2007)), the pair correlation function is not analytically available; an additional grid approximation of the convolutional expression of the pair correlation function is then required. However, this step mainly concerns the moment calculation part, which can be parallelized without passing information. Hence, our approach is potentially available for these convolution based cross covariance cases, too.
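As a small illustration of the LMC specification in (28), the sketch below evaluates the cross pair correlation matrix g_θ^{ll'}(u, v) for a pair of locations; the exponential choice for the ρ_h and the coregionalization matrix A are illustrative assumptions.

import numpy as np

def lmc_cross_pair_correlation(u, v, A, phis):
    """Cross pair correlation matrix {g^{ll'}(u, v)} under an LMC, eq. (28).

    u, v: two locations, arrays of shape (2,)
    A: (L, H) coregionalization matrix
    phis: (H,) decay parameters of the exponential correlations rho_h
    """
    d = np.linalg.norm(u - v)
    rho = np.exp(-phis * d)                    # rho_h(u, v) for each latent factor h
    cross_cov = (A * rho) @ A.T                # sum_h rho_h * a_{lh} * a_{l'h}
    return np.exp(cross_cov)                   # g^{ll'}(u, v) = exp(C_{ll'}(u, v))

The cross-count covariances in (29) then follow by the same fine-grid aggregation used for the univariate moments, with this matrix replacing the univariate pair correlation.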

Space and time

Space and time LGCP processes, proposed by Brix and Diggle (2001), have arguably attracted more interest than the multivariate LGCP. Shirota and Gelfand (2015) applied separable and nonseparable space and circular time LGCPs to crime event datasets. Let M_s and M_t be the numbers of grid cells for space and time; the computational cost for a nonseparable space-time covariance function is O((M_s M_t)^3). Hence, we need to keep both M_s and M_t small for fast computation. A more practical strategy is to consider a sparse covariance specification with respect to time, which would reduce the computational time.

4 Model Validation for Log Gaussian Cox Processes

Our specification enables us to estimate π(θ | T(S)). Given approximate marginal posterior samples of θ | T(S), we can implement model comparison. Since sampling the posterior marginal of z is an optional step in our algorithm, we propose two model comparison strategies.

p-thinning cross validation

The first is to implement the p-thinning cross validation approach proposed by Leininger and Gelfand (2016). Due to the conditional independence property of Cox processes, we can obtain the posterior predictive intensity surface for the test dataset from the training dataset. For this approach, we sample z from the marginal posterior in the second step of the algorithm. Let p denote the retention probability, i.e., we delete s_i ∈ S with probability 1 − p. This produces a training point pattern S_train and a test point pattern S_test which are independent conditional on λ(s). In particular, S_train has intensity λ_train(s) = pλ(s). We set p = 0.5 and estimate λ_train(s) for s ∈ S_train. Then, we convert the posterior draws of λ_train(s) into λ_test(s) using λ_test(s) = ((1 − p)/p) λ_train(s). Let {Q_k} be a collection of subsets of D. After fitting the model to obtain λ_test, the posterior predictive distribution of N(Q_k) is available. For the choice of {Q_k}, Leininger and Gelfand (2016) suggest drawing random subsets of the same size uniformly over D, i.e., if the area of each Q_k is q|D|, then q gives the relative size of each Q_k. Next, we calculate the predictive residuals in each subset; they argue that making the subsets disjoint is time consuming and unnecessary. Based on p-thinning cross validation, we consider two model performance criteria: (1) predictive interval coverage (PIC) and (2) the ranked probability score (RPS, Gneiting and Raftery (2007)). PIC offers an assessment of model adequacy; RPS enables model comparison. After the model is fitted to S_train, the posterior predictive intensity function can supply posterior predictive point patterns and therefore samples from the posterior predictive distribution of N(Q_k).
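Concretely, the p-thinning split described at the start of this subsection can be implemented as follows; the point pattern is assumed to be stored as an (n, 2) array of coordinates.

import numpy as np

def p_thin(points, p, rng):
    """Split a point pattern into training and test patterns by p-thinning.

    Each point is retained in the training set independently with probability p,
    so S_train and S_test are independent given the intensity, with
    lambda_train = p * lambda and lambda_test = (1 - p) * lambda.
    """
    keep = rng.uniform(size=len(points)) < p
    return points[keep], points[~keep]

# Example with p = 0.5, followed by rescaling a training-intensity draw to the test scale:
# rng = np.random.default_rng(0)
# S_train, S_test = p_thin(S, p=0.5, rng=rng)
# lam_test = (1 - 0.5) / 0.5 * lam_train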

For the i-th posterior sample, i = i_0 + 1, ..., I + i_0, the associated predictive residual is defined as R^pred_(i)(Q_k) = N_test(Q_k) − N^(i)(Q_k), where N_test(Q_k) is the number of points of the test data in Q_k. If the model is adequate, the empirical predictive interval coverage rate, i.e., the proportion of intervals which contain 0, is expected to be roughly the nominal level of coverage; below, we choose 90% nominal coverage. Empirical coverage much less than the nominal level suggests model inadequacy: the predictive intervals are too optimistic. Empirical coverage much above, for example 100%, is also undesirable; it suggests that the model is overfitting, introducing more uncertainty than needed. Gneiting and Raftery (2007) propose the ranked probability score (RPS). This score is derived as a proper scoring rule and provides a criterion for assessing the precision of a predictive distribution. That is, we seek to compare a predictive distribution to an observed count; intuitively, a good model will provide a predictive distribution that is concentrated around the observed count. While the RPS has a challenging formal computational form, it is directly amenable to Monte Carlo integration. In particular, for a given Q_k, we calculate the RPS as

RPS(F, N_{test}(Q_k)) = \frac{1}{I} \sum_{i=i_0+1}^{I+i_0} \left| N^{(i)}(Q_k) - N_{test}(Q_k) \right| - \frac{1}{2I^2} \sum_{i=i_0+1}^{I+i_0} \sum_{i'=i_0+1}^{I+i_0} \left| N^{(i)}(Q_k) - N^{(i')}(Q_k) \right|

Summing over the collection of Q_k gives a model comparison criterion; smaller values of the sum are preferred.

Posterior functional summary statistics

An alternative approach is to compare posterior functional summary statistics. This approach does not require sampling z | S, θ, but it does require simulating a point pattern S^(i) for each θ^(i). Although a direct comparison of counts on subregions between the observed point pattern S_obs and the simulated point patterns S^(i) would be the straightforward approach, it is not available because point patterns from the LGCP depend heavily on the realization of the Gaussian process. Comparing functional summary statistics is therefore a more promising route. Leininger (2014) discusses Bayesian alternatives of the functional summary statistics.
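The Monte Carlo forms of the two criteria translate directly into code; here pred_counts holds posterior predictive counts N^(i)(Q_k) for a single subset Q_k and is assumed to be precomputed from posterior predictive point patterns.

import numpy as np

def rps(pred_counts, observed):
    """Monte Carlo estimate of the ranked probability score for one subset Q_k."""
    pred_counts = np.asarray(pred_counts, dtype=float)
    term1 = np.mean(np.abs(pred_counts - observed))
    term2 = 0.5 * np.mean(np.abs(pred_counts[:, None] - pred_counts[None, :]))
    return term1 - term2

def pic(pred_counts, observed, level=0.90):
    """Whether the central predictive interval at the given level covers the observed count."""
    lo, hi = np.quantile(pred_counts, [(1 - level) / 2, (1 + level) / 2])
    return lo <= observed <= hi

Averaging pic over the collection {Q_k} gives the empirical coverage rate, and summing rps over {Q_k} gives the model comparison criterion described above.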

5 Simulation Studies

In this section, we investigate two simulation examples: (1) a univariate LGCP and (2) a three dimensional LGCP. The study region is defined as D = [0, 1] × [0, 1].

5.1 Example 1: univariate LGCP

In this first example, we consider the univariate LGCP and investigate the influence of the choice of some tuning parameters. The model we assume is

\lambda(s) = \lambda_0 \exp(\beta_1 s_x + \beta_2 s_y + z(s)), \quad z \sim GP(0, C_\zeta)    (30)

where s = (s_x, s_y), C_ζ(s_1, s_2) = σ² exp(−φ ||s_1 − s_2||) and ζ = (σ², φ). The true parameter values are (λ_0, β_1, β_2, σ²) = (400, 3, 3, 1). We set the decay parameter at three different smoothness levels: (1) φ = 1 (smooth), (2) φ = 3 (moderate) and (3) φ = 5 (rough). We discard the first i_0 = 1,000 samples as the burn-in period and preserve the subsequent I = 5,000 samples as posterior samples. Figure 1 plots the univariate LGCP realizations for the three smoothness levels. The number of simulated points in each case is (1) n = 3474, (2) n = 3744 and (3) n = . We set N_imp = . As one benchmark case, we implement the elliptical slice sampling algorithm with the grid approximated likelihood, i.e.,

L(S \mid \theta) = \exp\left( -\sum_{k=1}^{K} \lambda(u_k \mid \theta, z(u_k))\, \Delta_k \right) \prod_{k=1}^{K} \lambda(u_k \mid \theta, z(u_k))^{n_k}    (31)

where n_k is the number of points in the k-th grid cell and \sum_{k=1}^{K} n_k = n. As our prior choice, we assume λ_0 ~ G(2, 0.01), σ² ~ G(2, 1), β_1, β_2 ~ N(0, 100) and a flat prior with sufficiently large support for φ. We discard the first i_0 = 20,000 samples as the burn-in period and preserve the subsequent I = 10,000 samples as posterior samples. For the sampling of parameters, we implement adaptive MCMC in both cases (see, e.g., Andrieu and Thoms (2008)). Tables 1-3 report the estimation results. We consider different settings for (M, N_B): (a) (100, 36), (b) (100, 1), (c) (25, 144) and (d) (25, 4). For (a) and (c), (M, N_B) are tuned so that M × N_B = 60² = 3600; for (b) and (d), so that M × N_B = 10² = 100. Case (a) represents the setting where M is relatively large and N_B is also large enough for accurate moment evaluation. Case (b) corresponds to the setting where M is relatively large but N_B is too small for accurate moment evaluation. Likewise, (c) corresponds to relatively few subregions with fine moment evaluation, and case (d) represents the setting where both approximations are coarse. The results suggest that the true values are recovered even when M is low dimensional, as long as N_B is sufficiently large.
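For reference, a minimal sketch of simulating a point pattern from the model in (30) on a fine grid is given below, drawing the Gaussian field at cell centroids and then Poisson counts per cell; the grid resolution and the dense Cholesky simulation are illustrative simplifications, not the simulation procedure used for the tables.

import numpy as np

def simulate_lgcp(lam0, beta1, beta2, sigma2, phi, n_grid=40, rng=None):
    """Simulate an LGCP point pattern on [0,1]^2 via a gridded Gaussian field."""
    rng = rng or np.random.default_rng()
    xs = (np.arange(n_grid) + 0.5) / n_grid
    gx, gy = np.meshgrid(xs, xs)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    area = 1.0 / n_grid**2

    # exponential covariance C(s1, s2) = sigma2 * exp(-phi * ||s1 - s2||)
    d = np.linalg.norm(cells[:, None, :] - cells[None, :, :], axis=-1)
    cov = sigma2 * np.exp(-phi * d) + 1e-10 * np.eye(len(cells))
    z = np.linalg.cholesky(cov) @ rng.standard_normal(len(cells))

    lam = lam0 * np.exp(beta1 * cells[:, 0] + beta2 * cells[:, 1] + z)
    counts = rng.poisson(lam * area)                        # Poisson counts per cell
    # scatter the points uniformly within their cells
    jitter = (rng.uniform(size=(counts.sum(), 2)) - 0.5) / n_grid
    pts = np.repeat(cells, counts, axis=0) + jitter
    return pts, counts.reshape(n_grid, n_grid)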

Table 1: Estimation results for φ = 1

Parameter        True    Mean    Stdev    95% Int            Inef
Elliptical, K = 2500
  λ_0            400                      [91.46, 296.6]     555
  β_1            3                        [3.113, 4.376]     598
  β_2            3                        [2.483, 3.678]     598
  σ²             1                        [0.454, 1.947]     562
  φ              1                        [0.486, 3.024]     653
  σ²φ            1                        [0.891, 1.574]     402
AMP, M = 100, N_B = 36
  λ_0            400                      [57.67, 559.8]     18.0
  β_1            3                        [1.407, 4.459]     14.1
  β_2            3                        [0.769, 3.878]     23.6
  σ²             1                        [0.300, 4.148]     20.9
  φ              1                        [0.179, 3.089]     15.6
  σ²φ            1                        [0.515, 1.441]     15.8
AMP, M = 100, N_B = 1
  λ_0            400                      [62.32, 528.8]     11.7
  β_1            3                        [1.627, 4.256]     11.6
  β_2            3                        [0.712, 3.551]     13.8
  σ²             1                        [0.296, 4.752]     18.9
  φ              1                        [0.098, 1.944]     9.8
  σ²φ            1                        [0.348, 0.801]     12.1
AMP, M = 25, N_B = 144
  λ_0            400                      [50.45, 554.5]     22.3
  β_1            3                        [1.248, 5.015]     26.2
  β_2            3                        [0.620, 4.344]     14.6
  σ²             1                        [0.330, 4.090]     14.0
  φ              1                        [0.228, 4.532]     26.4
  σ²φ            1                        [0.538, 2.879]     31.5
AMP, M = 25, N_B = 4
  λ_0            400                      [52.01, 575.2]     19.8
  β_1            3                        [1.200, 4.922]     20.8
  β_2            3                        [0.560, 4.092]     17.6
  σ²             1                        [0.329, 4.628]     45.0
  φ              1                        [0.160, 3.598]     38.8
  σ²φ            1                        [0.439, 2.034]


More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

On Gaussian Process Models for High-Dimensional Geostatistical Datasets On Gaussian Process Models for High-Dimensional Geostatistical Datasets Sudipto Banerjee Joint work with Abhirup Datta, Andrew O. Finley and Alan E. Gelfand University of California, Los Angeles, USA May

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA)

Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA) Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA) arxiv:1611.01450v1 [stat.co] 4 Nov 2016 Aliaksandr Hubin Department of Mathematics, University of Oslo and Geir Storvik

More information

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J. Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox fox@physics.otago.ac.nz Richard A. Norton, J. Andrés Christen Topics... Backstory (?) Sampling in linear-gaussian hierarchical

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Environmentrics 00, 1 12 DOI: 10.1002/env.XXXX Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Regina Wu a and Cari G. Kaufman a Summary: Fitting a Bayesian model to spatial

More information

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Christopher Paciorek, Department of Statistics, University

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models

A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models A Note on the comparison of Nearest Neighbor Gaussian Process (NNGP) based models arxiv:1811.03735v1 [math.st] 9 Nov 2018 Lu Zhang UCLA Department of Biostatistics Lu.Zhang@ucla.edu Sudipto Banerjee UCLA

More information

arxiv: v4 [stat.me] 14 Sep 2015

arxiv: v4 [stat.me] 14 Sep 2015 Does non-stationary spatial data always require non-stationary random fields? Geir-Arne Fuglstad 1, Daniel Simpson 1, Finn Lindgren 2, and Håvard Rue 1 1 Department of Mathematical Sciences, NTNU, Norway

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

An Additive Gaussian Process Approximation for Large Spatio-Temporal Data

An Additive Gaussian Process Approximation for Large Spatio-Temporal Data An Additive Gaussian Process Approximation for Large Spatio-Temporal Data arxiv:1801.00319v2 [stat.me] 31 Oct 2018 Pulong Ma Statistical and Applied Mathematical Sciences Institute and Duke University

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Riemann Manifold Methods in Bayesian Statistics

Riemann Manifold Methods in Bayesian Statistics Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant

More information

Computation fundamentals of discrete GMRF representations of continuous domain spatial models

Computation fundamentals of discrete GMRF representations of continuous domain spatial models Computation fundamentals of discrete GMRF representations of continuous domain spatial models Finn Lindgren September 23 2015 v0.2.2 Abstract The fundamental formulas and algorithms for Bayesian spatial

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Inexact approximations for doubly and triply intractable problems

Inexact approximations for doubly and triply intractable problems Inexact approximations for doubly and triply intractable problems March 27th, 2014 Markov random fields Interacting objects Markov random fields (MRFs) are used for modelling (often large numbers of) interacting

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Disease mapping with Gaussian processes

Disease mapping with Gaussian processes EUROHEIS2 Kuopio, Finland 17-18 August 2010 Aki Vehtari (former Helsinki University of Technology) Department of Biomedical Engineering and Computational Science (BECS) Acknowledgments Researchers - Jarno

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

Multimodal Nested Sampling

Multimodal Nested Sampling Multimodal Nested Sampling Farhan Feroz Astrophysics Group, Cavendish Lab, Cambridge Inverse Problems & Cosmology Most obvious example: standard CMB data analysis pipeline But many others: object detection,

More information

Approximate Inference using MCMC

Approximate Inference using MCMC Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov

More information

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

20: Gaussian Processes

20: Gaussian Processes 10-708: Probabilistic Graphical Models 10-708, Spring 2016 20: Gaussian Processes Lecturer: Andrew Gordon Wilson Scribes: Sai Ganesh Bandiatmakuri 1 Discussion about ML Here we discuss an introduction

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

Sub-kilometer-scale space-time stochastic rainfall simulation

Sub-kilometer-scale space-time stochastic rainfall simulation Picture: Huw Alexander Ogilvie Sub-kilometer-scale space-time stochastic rainfall simulation Lionel Benoit (University of Lausanne) Gregoire Mariethoz (University of Lausanne) Denis Allard (INRA Avignon)

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET Investigating posterior contour probabilities using INLA: A case study on recurrence of bladder tumours by Rupali Akerkar PREPRINT STATISTICS NO. 4/2012 NORWEGIAN

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

The Bayesian approach to inverse problems

The Bayesian approach to inverse problems The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Alan Gelfand 1 and Andrew O. Finley 2 1 Department of Statistical Science, Duke University, Durham, North

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota,

More information

An ABC interpretation of the multiple auxiliary variable method

An ABC interpretation of the multiple auxiliary variable method School of Mathematical and Physical Sciences Department of Mathematics and Statistics Preprint MPS-2016-07 27 April 2016 An ABC interpretation of the multiple auxiliary variable method by Dennis Prangle

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Multivariate Gaussian Random Fields with SPDEs

Multivariate Gaussian Random Fields with SPDEs Multivariate Gaussian Random Fields with SPDEs Xiangping Hu Daniel Simpson, Finn Lindgren and Håvard Rue Department of Mathematics, University of Oslo PASI, 214 Outline The Matérn covariance function and

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Adaptive HMC via the Infinite Exponential Family

Adaptive HMC via the Infinite Exponential Family Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family

More information

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel

The Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Introduction to Gaussian Process

Introduction to Gaussian Process Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Andrew O. Finley Department of Forestry & Department of Geography, Michigan State University, Lansing

More information