A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging


1 A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging
Cheng Li, DSAP, National University of Singapore
Joint work with Rajarshi Guhaniyogi (UC Santa Cruz), Terrance D. Savitsky (US Bureau of Labor Statistics), and Sanvesh Srivastava (U Iowa)

2 Outline Background of GP D&C Bayes Method Theory Experiments Future Work

3 Background: Gaussian Process
A Gaussian process is a commonly used tool to model functions (surfaces).
- Let s_1, ..., s_n be n arbitrary distinct locations in a spatial domain D.
- f(·) is the mean function (which we want to model); f_i = f(s_i) (i = 1, ..., n) are the function values at the locations s_i.
- A Gaussian process assumes that f = (f_1, ..., f_n)' jointly follows N(0, C), where (C)_{ij} = C(s_i, s_j), 1 ≤ i, j ≤ n, and C(·,·) is the covariance kernel function.
- Examples of covariance kernel functions: the exponential C(s, s') = τ² exp(−φ‖s − s'‖), the squared exponential C(s, s') = τ² exp(−φ‖s − s'‖²), and the Matérn C(s, s') = τ² 2^{1−ν}/Γ(ν) (φ‖s − s'‖)^ν K_ν(φ‖s − s'‖).

4 Fitting Gaussian Process Regression
Consider fitting a function f with noise: y(s) = f(s) + ε(s).
- Observed locations S = (s_1, ..., s_n); y = (y(s_1), ..., y(s_n))'; f = (f(s_1), ..., f(s_n))'.
- We impose a Gaussian process (GP) prior on f: f ~ N(0, C(S, S)), with (C(S, S))_{ij} = C(s_i, s_j), 1 ≤ i, j ≤ n.
- ε(s) is a white noise process independent of f, with Var(ε(s)) = σ².
- Let f* = f(s*) be the function value at a testing location s*. Then the joint distribution of (y, f*) is
  (y, f*)' ~ N( 0, [ C(S, S) + σ² I_n, C(S, s*); C(s*, S), C(s*, s*) ] ).
- The GP posterior of f* given the observed data y is Gaussian, with
  E[f* | y] = C(s*, S) [C(S, S) + σ² I_n]^{−1} y,
  Var[f* | y] = C(s*, s*) − C(s*, S) [C(S, S) + σ² I_n]^{−1} C(S, s*).
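
Below is a minimal numpy sketch of these kriging equations, assuming the exponential kernel from the previous slide; it illustrates the formulas above and is not the authors' implementation (all function names are ours).

```python
import numpy as np

def exp_kernel(S1, S2, tau2=1.0, phi=1.0):
    """Exponential covariance C(s, s') = tau^2 * exp(-phi * ||s - s'||)."""
    d = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=-1)
    return tau2 * np.exp(-phi * d)

def gp_posterior(S, y, S_star, tau2, phi, sigma2):
    """Posterior mean and covariance of f at test locations S_star."""
    V = exp_kernel(S, S, tau2, phi) + sigma2 * np.eye(len(S))  # C(S,S) + sigma^2 I_n
    k = exp_kernel(S, S_star, tau2, phi)                       # C(S, s*)
    mean = k.T @ np.linalg.solve(V, y)
    cov = exp_kernel(S_star, S_star, tau2, phi) - k.T @ np.linalg.solve(V, k)
    return mean, cov
```

Solving the linear system rather than forming the matrix inverse is standard practice, but the cost is still cubic in n, which is exactly the bottleneck discussed next.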

5 Long-standing Challenge in Fitting GP
- Inverting the matrix C(S, S) + σ² I_n requires O(n³) computational cost and O(n²) storage cost.
- For Bayesian methods, learning the parameters in the model (such as the parameters in the kernel C(·,·) and σ²) often requires inverting the matrix C(S, S) + σ² I_n repeatedly.
- For commonly used kernels, such as the squared exponential and Matérn kernels, the matrix C(S, S) tends to become nearly singular when n grows large, say n ≥ 10⁴.
- This is one of the main motivations driving research on scalable GP methods.

6 Existing Methods to Speed up GP
The general idea is to simplify the structure of C(·,·).
- Low-rank methods: fit the spatial surface f using r chosen basis functions or knots. Fixed-rank kriging (Cressie & Johannesson 2008 JRSSB), predictive process (Banerjee et al. 2008 JRSSB, Finley et al. 2009 CSDA).
- Compactly supported covariance kernels: covariance tapering (Kaufman, Schervish & Nychka 2008 JASA).
- Sparsity in the inverse covariance matrix: composite likelihood (Stein, Chi & Welty 2004 JRSSB), nearest neighbor GP (Datta et al. 2016 JASA, Datta et al. 2016 AOAS).
- Block-structured covariance: fitting GP on sub-regions (Nychka et al. 2015 JCGS, Katzfuss 2017 JASA, Guhaniyogi and Banerjee 2018 JCGS).
- Related methods in machine learning: subset of regressors (Quinonero-Candela & Rasmussen 2005 JMLR); inducing point methods (Wilson & Nickisch 2015 ICML), etc.

7 Problems with Existing Methods
- Low-rank methods are popular for large spatial datasets. In this work, we compare mainly with the modified predictive process (MPP) (Finley et al. 2009 CSDA).
- MPP reduces the computational complexity of GP from O(n³) to O(nr²), where r is the number of knots (chosen basis functions).
- However, a very large n also requires a much larger r (though still far smaller than n) to achieve the same level of accuracy. For example, if n is about a million, then the number of knots needs to be a few thousand.
- Nearest neighbor GP (NNGP, Datta et al. 2016 JASA, Datta et al. 2016 AOAS) works well for rough surfaces but not as well for smooth surfaces.

8 Goals
- We need fast and scalable computational methods that fit GPs to spatial datasets with millions of observations in a reasonable amount of time (e.g. about a day).
- We develop under the Bayesian framework, with posterior samplers that are easy to implement in practice.
- We need theoretical guarantees for the validity of uncertainty quantification.

9 Outline Background of GP D&C Bayes Method Theory Experiments Future Work

10 Divide-and-Conquer Approach for Big Data
General steps:
- Divide the massive dataset of size n into k non-overlapping subsets;
- Perform statistical inference (estimation, posterior sampling, etc.) on each of the k subsets;
- Combine the k subset results into a global result (averaged estimator, posterior distribution, etc.).
Divide-and-conquer is simple and effective: it allows parallel computing and significantly reduces computational costs locally, since each machine only handles data of size m = n/k. A minimal partitioning sketch follows.
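
The dividing step is just a uniform random partition of the data indices; a short numpy sketch (our own illustration, with hypothetical names):

```python
import numpy as np

def random_partition(n, k, seed=0):
    """Uniformly partition indices {0, ..., n-1} into k disjoint, near-equal subsets."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

# Each subset of size m = n/k can then be handled by its own worker in parallel.
subsets = random_partition(n=10_000, k=20)
```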

11 Divide and Conquer Bayes
- Consensus Monte Carlo (CMC, Scott et al. 2016)
- Semiparametric density product (SDP, Neiswanger et al. 2014)
- Wasserstein posterior (WASP, Srivastava et al. 2015), PIE (Li et al. 2017)
- Double-parallel MCMC (DP-MCMC, Xue & Liang 2017)
Basic idea: when the data are all independent,
Posterior ∝ Likelihood × Prior ∝ [∏_{j=1}^k (jth subset likelihood)] × prior
= ∏_{j=1}^k [(jth subset likelihood) × prior^{1/k}]   (CMC, SDP)
= ∏_{j=1}^k [(jth subset likelihood)^k × prior]^{1/k}   (WASP, PIE, DP-MCMC)

12 Divide and Conquer Bayes Idea: Run MCMC in parallel on disjoint subset data, and combine the subset posterior distributions into a global posterior.

13 Divide and Conquer Bayes
PIE (Li, Srivastava, and Dunson 2017): average the subset posterior empirical quantiles. This leads to the Wasserstein-2 barycenter, as an approximation to the full-data posterior.

14 PIE Algorithm in 3 Steps
For a 1-dimensional parameter ξ ∈ R:
- Step 1: Run MCMC on the k subsets on k machines in parallel. Draw posterior samples from each subset posterior; obtain the posterior samples for ξ.
- Step 2: Find the qth empirical quantile of ξ for each of the k subset posterior distributions, where q ∈ (0, 1) ranges over a fine grid on (0, 1).
- Step 3: Average the k subset quantiles.
Output: averaged quantiles for ξ = a combined distribution for ξ, as an approximation to the true posterior of ξ based on the full data, together with posterior credible intervals. For independent data, this method (as well as many similar ones) is asymptotically valid for approximating the true full-data posterior of ξ. A sketch of the combination step follows.
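
Steps 2-3 amount to averaging quantile functions on a grid; a minimal numpy sketch (our illustration, not the authors' code):

```python
import numpy as np

GRID = np.linspace(0.001, 0.999, 999)   # fine grid of quantile levels q in (0, 1)

def pie_combine(subset_samples, grid=GRID):
    """Average the k subset posterior quantile functions over the grid (Steps 2-3).

    subset_samples: list of k 1-d arrays of MCMC draws for a scalar parameter xi.
    Returns the quantile function of the combined (barycenter) posterior on the grid.
    """
    quantiles = np.stack([np.quantile(draws, grid) for draws in subset_samples])
    return quantiles.mean(axis=0)

# A 95% credible interval from the combined posterior:
# lo, hi = np.interp([0.025, 0.975], GRID, pie_combine(samples_by_subset))
```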

15 Wasserstein Barycenter
The Wasserstein-2 (W2) distance between two measures µ and ν:
W2²(µ, ν) = inf { E‖X − Y‖² : law(X) = µ, law(Y) = ν }.
For univariate distribution functions F1 and F2, their W2 distance has an equivalent form:
W2²(F1, F2) = ∫₀¹ [F1⁻¹(q) − F2⁻¹(q)]² dq,
where F1⁻¹, F2⁻¹ are the quantile functions of F1, F2. The Wasserstein-2 barycenter of k distributions µ1, ..., µk is
µ̄ = argmin_{µ ∈ P2} (1/k) ∑_{j=1}^k W2²(µ, µj),
where P2 is the space of all distributions with finite 2nd moment.
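
For two empirical distributions with the same number of atoms, the quantile-function formula reduces to matching sorted samples; a one-line check of this identity (our illustration):

```python
import numpy as np

def w2_squared(x, y):
    """Squared W2 distance between two equal-size empirical distributions:
    the integral of squared quantile differences becomes a mean over sorted pairs."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)
```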

16 Outline Background of GP D&C Bayes Method Theory Experiments Future Work

17 A GP-based Spatial Model
In this work, we consider a generalized GP regression model with spatial predictors:
y(s) = x(s)'β + w(s) + ε(s).
- Observed locations S = {s_1, ..., s_n}; y = (y(s_1), ..., y(s_n))'; w = (w(s_1), ..., w(s_n))'.
- x(s) is a p-dimensional predictor vector at location s; X = (x(s_1), ..., x(s_n))'.
- We impose a Gaussian process (GP) prior on w: w ~ N(0, C_α(S, S)), with (C_α(S, S))_{ij} = C_α(s_i, s_j), 1 ≤ i, j ≤ n.
- ε(s) is a white noise process independent of w and x, with Var(ε(s)) = σ².
- Let s* be a testing location with s* ∉ S. Let w* = w(s*), y* = y(s*), x* = x(s*).
- The prior for β is N(µ_β, Σ_β). Priors are also imposed on α and σ².

18 Posterior Sampling Algorithm
Iterate the following steps:
- Sample β given y, X, α, σ² from N(m_β, V_β), where
V_β = [X' V(α, σ²)⁻¹ X + Σ_β⁻¹]⁻¹,
m_β = V_β [X' V(α, σ²)⁻¹ y + Σ_β⁻¹ µ_β],
V(α, σ²) = C_α(S, S) + σ² I_n.
- Sample (α, σ²) given y, X, β using random walk Metropolis-Hastings.
Then w*, y* can be drawn as
w* | y, X, β, α, σ² ~ N(m*, V*), where
m* = C_α(s*, S) V(α, σ²)⁻¹ (y − Xβ),
V* = C_α(s*, s*) − C_α(s*, S) V(α, σ²)⁻¹ C_α(S, s*),
y* | y, X, x*, w*, β, α, σ² ~ N(x*'β + w*, σ²).
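
A sketch of the conjugate β update in numpy (our illustration; the random-walk update for (α, σ²) is omitted):

```python
import numpy as np

def gibbs_beta(y, X, V, mu_beta, Sigma_beta, rng):
    """Draw beta | y, X, alpha, sigma^2 from N(m_beta, V_beta),
    where V = C_alpha(S, S) + sigma^2 I_n for the current (alpha, sigma^2)."""
    Vinv_X = np.linalg.solve(V, X)                    # V^{-1} X
    Sinv = np.linalg.inv(Sigma_beta)
    V_beta = np.linalg.inv(X.T @ Vinv_X + Sinv)
    m_beta = V_beta @ (X.T @ np.linalg.solve(V, y) + Sinv @ mu_beta)
    return rng.multivariate_normal(m_beta, V_beta)
```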

19 Distributed Kriging (DISK)
What can we do for big n, like n = 10⁶? We propose a D&C Bayesian method called DISK. In short, DISK = (existing D&C Bayes) + (spatially dependent data). We fit the GP-based spatial model on k different subsets of the data, and then combine the results.
Partition scheme: uniform sampling, so that every subset is representative of the full data.
For example, if PIE (posterior interval estimation, Li et al. 2017) is used, then for each parameter in (β, α, σ², w*, y*), the marginal posterior is obtained by the following steps:

20 DISK with PIE
- Partition the locations into k subsets S = ∪_{j=1}^k S_j, with S_j ∩ S_{j'} = ∅. Let n = km, where n is the total sample size and m is the subset size.
- Fit the GP regression model on the jth subset (j = 1, ..., k),
y(s_{ji}) = x(s_{ji})'β + w(s_{ji}) + ε(s_{ji}), i = 1, ..., m,
using the modified subset posterior
π_m(β, α, σ² | y_j, X_j) ∝ {l_j(β, α, σ²)}^k π(β, α, σ²),
l_j(β, α, σ²) = N(y_j | X_j β, V_j(α, σ²)), V_j(α, σ²) = C_α(S_j, S_j) + σ² I_m,
where (y_j, X_j) is the jth subset data at locations S_j.
- Take the qth empirical quantiles of the marginals of π_m(β, α, σ² | y_j, X_j) for j = 1, ..., k, and then average them.
- Perform the same combination for the predictive posteriors of w* and y*.
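
The distinguishing ingredient is the subset likelihood raised to the power k, so each subset posterior carries the full-data amount of information; a sketch with (α, σ²) held fixed (our illustration, not the authors' code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def disk_subset_logpost(beta, y_j, X_j, C_j, sigma2, k, log_prior):
    """Modified subset log-posterior: k times the subset log-likelihood plus the log-prior.
    C_j = C_alpha(S_j, S_j) for the current alpha; (alpha, sigma2) are held fixed here."""
    V_j = C_j + sigma2 * np.eye(len(y_j))              # V_j(alpha, sigma^2)
    loglik = multivariate_normal.logpdf(y_j, mean=X_j @ beta, cov=V_j)
    return k * loglik + log_prior(beta)
```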

21 Outline Background of GP D&C Bayes Method Theory Experiments Future Work

22 Theory
Our theoretical results mainly focus on the posterior of w*, the value of the function w at a testing location s*. We present upper bounds for the Bayes L2 risk, and a Bayesian version of the bias-variance tradeoff. We first show the results for the simplified model y(s) = w(s) + ε(s) without the x(s)'β part, and then generalize to the full model y(s) = x(s)'β + w(s) + ε(s). To ease the technical difficulty, we assume that the finite-dimensional parameters (α, σ²) are fixed (because their estimation is notoriously difficult). Furthermore, we assume that the testing location s* is selected independently from the same distribution that generates the training locations S = {s_1, ..., s_n}.

23 Bayes L2 Risk
The Bayes L2 risk of the DISK posterior is defined as
Bayes L2 risk = E_S E^S E_{s*} [w(s*) − w*(s*)]²,
where E_{s*} is the expectation w.r.t. the sampled testing location, E^S is the DISK posterior expectation given the training locations S, and E_S is the expectation w.r.t. the training locations. This Bayes L2 risk can be decomposed into 3 parts:
Bayes L2 risk = bias² + var_mean + var_DISK,
namely the squared bias, the variance of the subset posterior means (var_mean), and the mean of the subset posterior variances (var_DISK). The two variance terms arise from the law of total variance: var(Y) = var[E(Y | X)] + E[var(Y | X)].

24 Some Notation
- Let P_s be the sampling distribution of s_1, ..., s_n, s*. Let {φ_i}_{i=1}^∞ be an orthonormal basis of L²(P_s).
- The kernel has the series expansion C_α(s, s') = ∑_{i=1}^∞ µ_i φ_i(s) φ_i(s') w.r.t. P_s for any s, s' ∈ D, where µ_1 ≥ µ_2 ≥ ... ≥ 0 are the eigenvalues of C_α.
- The trace of the kernel C_α is defined as tr(C_α) = ∑_{i=1}^∞ µ_i.
- Any f ∈ L²(P_s) can be written as f(s) = ∑_{i=1}^∞ θ_i φ_i(s), where θ_i = ⟨f, φ_i⟩_{L²(P_s)}.
- The reproducing kernel Hilbert space (RKHS) H attached to C_α is the space of all functions f ∈ L²(P_s) such that the H-norm ‖f‖²_H = ∑_{i=1}^∞ θ_i²/µ_i < ∞.

25 Bayes Bias-Variance Tradeoff
More technical assumptions:
- Assumption 1: The true surface w* lies in the RKHS (reproducing kernel Hilbert space) of C_α. (This can be relaxed by using a sieve approximation.)
- Assumption 2: The covariance kernel C_α satisfies tr(C_α) < ∞.
- Assumption 3: There are positive constants ρ and r such that E_{P_s}[φ_i^{2r}(s)] ≤ ρ^{2r} for every i = 1, 2, ..., and var[ε(s)] ≤ σ² < ∞ for all s ∈ D.
We derive upper bounds for the three terms in the Bayes L2 risk, and show that (roughly speaking) for a given total sample size n, as the number of subsets k increases, var_DISK decreases while bias² + var_mean increases, so k trades the two off against each other.

26 Bayes Bias-Variance Tradeoff
The three terms admit explicit upper bounds. Each bound combines a leading term of order σ²‖w*‖²_H/n (or σ²/n) with an infimum over d ∈ N of remainder terms built from tr(C_α) tr_d(C_α), A b(m, d, r) ρ^{2r}, and γ(·) evaluated at points proportional to σ²/n; for example,
bias² ≤ (8σ²/n)‖w*‖²_H + ‖w*‖²_H inf_{d ∈ N} {·},
where the infimum collects the remainder terms. Here N is the set of all positive integers, A is a positive constant,
b(m, d, r) = max{ √(max(r, log d)), max(r, log d) / m^{1/2 − 1/r} },
γ(a) = ∑_{i=1}^∞ µ_i / (µ_i + a) for any a > 0, and tr_d(C_α) = ∑_{i=d+1}^∞ µ_i.

27 Examples of Covariance Kernels
Implications of the previous results for some kernel classes (with r > 4); in each case k may grow polynomially in n, up to logarithmic factors, at an explicit rate depending on r:
- Finite rank kernels: if µ_{d*+1} = µ_{d*+2} = ... = 0 for some integer d*, then the Bayes L2 risk is O(1/n).
- Exponentially decaying kernels: if µ_i ≤ c_µ exp(−c'_µ i^κ) for some constants c_µ > 0, c'_µ > 0, κ > 0 and all i ∈ N, then the Bayes L2 risk is O((log n)^{1/κ}/n).
- Polynomially decaying kernels: if µ_i ≤ c_µ i^{−2ν} for some constant c_µ > 0, an admissible range of ν depending on r, and all i ∈ N, then the Bayes L2 risk decays polynomially in n at a rate determined by ν (slightly slower than minimax; see the next slide).

28 Examples of Covariance Kernels
Some comments:
- The rates for finite rank kernels and exponentially decaying kernels are near minimax optimal, up to logarithmic factors.
- The rate for polynomially decaying kernels is suboptimal, but the difference is small: the minimax optimal rate is O(n^{−2ν/(2ν+1)}), and the gap between the two rates is a small polynomial factor in n.
- The sub-optimality is due to our fixed stochastic approximation (the subset likelihood is raised to the fixed power k). One can possibly attain the optimal rate by using an adaptive power in the subset likelihood.

29 Examples of Covariance Kernels
(Convenient) generalization to the full model y(s) = x(s)'β + w(s) + ε(s):
Extra assumption: x(·) is a deterministic function; the prior on β is normal N(µ_β, Σ_β) and is independent of the prior on w(·), which is GP(0, C_α(·,·)).
Based on this assumption, we can do a change of kernel:
Č_α(s_1, s_2) = Cov{ x(s_1)'β + w(s_1), x(s_2)'β + w(s_2) } = x(s_1)' Σ_β x(s_2) + C_α(s_1, s_2).
Our theoretical results hold for the mean function x(s)'β + w(s) in the full model, as long as all the previous conditions on C_α are satisfied by the new kernel Č_α. Since we only focus on the predictive loss for the mean function x(s)'β + w(s), we do not need any identification condition separating β and w(·).
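
The change of kernel is essentially one line of code; a sketch (our illustration, with hypothetical names, where C_alpha and x are callables):

```python
def kernel_check(C_alpha, x, Sigma_beta):
    """Marginalizing beta ~ N(mu_beta, Sigma_beta) out of x(s)'beta + w(s) gives
    the augmented kernel C_check(s1, s2) = x(s1)' Sigma_beta x(s2) + C_alpha(s1, s2)."""
    return lambda s1, s2: x(s1) @ Sigma_beta @ x(s2) + C_alpha(s1, s2)
```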

30 Outline Background of GP D&C Bayes Method Theory Experiments Future Work

31 Simulation Model 1: Smooth Surface
- D = [−2, 2] × [−2, 2]. f(s) = exp{−(s − 1)²} + exp{−0.8(s + 1)²} − 0.05 sin{8(s + 0.1)} for s ∈ [−2, 2]. The true surface is w*(s) = f(s_1) f(s_2), s = (s_1, s_2) ∈ D.
- The response is y(s) = β + w*(s) + ε, ε ~ N(0, σ²), with β = 1 and σ² = 0.1.
- We use the Matérn kernel with ν = 1/2: C(s, s') = τ² exp(−φ‖s − s'‖) for s, s' ∈ D, with inverse-Gamma priors on σ² and τ² and a uniform prior on φ.
- In the two simulation scenarios, we use n = 10⁴ and n = 10⁶, respectively. The training locations are sampled uniformly over D.
- We use l testing locations, also sampled uniformly over D. The Bayes L2 risks (and other criteria) are estimated by averaging over the l testing locations. All methods are replicated over repeated Monte Carlo simulations. A data-generating sketch follows.
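
A short numpy sketch of this data-generating process (our illustration; β = 1 and σ² = 0.1 as reconstructed above):

```python
import numpy as np

def f(s):
    """1-d test function on [-2, 2] from the slide above."""
    return (np.exp(-(s - 1) ** 2) + np.exp(-0.8 * (s + 1) ** 2)
            - 0.05 * np.sin(8 * (s + 0.1)))

rng = np.random.default_rng(1)
n, beta, sigma2 = 10_000, 1.0, 0.1            # the n = 10^4 scenario
S = rng.uniform(-2.0, 2.0, size=(n, 2))       # training locations, uniform over D
w_star = f(S[:, 0]) * f(S[:, 1])              # true surface w*(s) = f(s1) f(s2)
y = beta + w_star + rng.normal(0.0, np.sqrt(sigma2), n)
```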

32 Methods Compared
- LatticeKrig (Nychka et al. 2015) using the LatticeKrig package in R, with 3 resolutions.
- Locally approximated Gaussian process (laGP, Gramacy and Apley 2015) using the laGP package in R with method set to "nn".
- Full-rank Gaussian process (GP) using the spBayes package in R with the full data.
- Modified predictive process (MPP, Finley et al. 2015) using the spBayes package in R with the full data.
- Nearest neighbor Gaussian process (NNGP, Datta et al. 2016) using the spNNGP package in R, with three choices of the number of nearest neighbors (NN).
- DISK + CMC (Scott et al. 2016) + (full GP or MPP).
- DISK + PIE (Li et al. 2017) + (full GP or MPP).
The first two methods are frequentist; the others are all Bayesian.

33 Simulation Results When n = 10⁴
[Table: prediction MSE averaged over the l testing locations, comparing laGP, LatticeKrig, full GP, MPP (two knot sizes r), NNGP (three neighbor sizes), DISK-CMC(MPP) for each knot size, DISK-PIE(GP), and DISK-PIE(MPP) for each knot size, with k = 10, 20, 30, 40, 50 subsets for the DISK variants.]

34 [Figure: the true surface w* alongside posterior quantile surfaces (2.5%, 50%, 97.5%) from the full GP, DISK (GP) for several k, MPP, and DISK (MPP) for two knot sizes.]

35 Bias-Variance Tradeoff
[Figure: squared bias, variance, and L2 risk versus the number of subsets k, for DISK (GP), DISK (MPP) with two knot sizes, the full GP, and laGP.]
When n = 10⁶, full-data GP, LatticeKrig, MPP, and NNGP are not applicable; only laGP is left to compare with DISK.
[Table: prediction MSE averaged over the l testing locations for laGP and two DISK(MPP) settings of (r, k).]

36 Simulation Results When n = 10⁴
[Table: parametric inference and predictive interval coverage — estimates of β, τ², σ², φ and 95% PI coverage, comparing the truth with laGP, LatticeKrig, full GP, MPP (two knot sizes), NNGP (three neighbor sizes), DISK+PIE (GP), and DISK+PIE (MPP) with two knot sizes.]

37 Simulation Results When n = 10⁶
[Table: parametric inference and predictive interval coverage — estimates of β, τ², σ², φ, MSPE, and 95% PI coverage, comparing the truth with laGP and DISK+PIE (MPP) under two knot sizes r.]
r is the number of knots; k is the number of subsets. Full GP, LatticeKrig, MPP, and NNGP fail to run for n = 10⁶.

38 Simulation Model 2: Rough Surface
- The true surface w*(·) is a rough sample path drawn from the Matérn kernel with ν = 1/2: C(s, s') = τ² exp(−φ‖s − s'‖) for s, s' in the square domain D. We fit the model using the same kernel.
- The response is y(s) = β + w*(s) + ε, ε ~ N(0, σ²), with β = 1, σ² = 0.1, τ² = 1, φ = 9, and inverse-Gamma priors on σ² and τ² and a uniform prior on φ.
- Training sample size n = 10⁴; the training locations are sampled uniformly over D. The l testing locations are also sampled uniformly over D, and the Bayes L2 risks (and other criteria) are estimated by averaging over them.
- We only compare DISK(MPP) (with a single choice of k) and laGP.

39 Results for Simulation 2
[Table: parametric inference and predictive interval coverage, with one fixed k for DISK. The 95% credible intervals from DISK(MPP), for two knot sizes r, cover the true β = 1, σ² = 0.1, τ² = 1, and φ = 9. MSPE, 95% PI coverage, and 95% PI length are reported for laGP and DISK(MPP); DISK(MPP) attains near-nominal coverage with wider predictive intervals.]

40 Sea Surface Temperature Data
- SST on the west coast of mainland U.S.A., Canada, and Alaska, between 40 and 65 degrees north latitude, over a band of west longitudes. The dataset is obtained from the NODC World Ocean Database.
- We choose roughly a million spatial observations in the selected domain; 10⁶ observations are used as training data, and the remainder as testing data.
- Model: y(s) = β_0 + β_1 latitude + w(s) + ε(s), ε(s) ~ N(0, σ²).
- For DISK+CMC(MPP) and DISK+PIE(MPP), we show the results for one choice of the number of knots r and the number of subsets k.

41 SST Data: Predicted Surfaces
[Figure: the observed SST surface ("truth"), the laGP prediction, and the 2.5%, 50%, and 97.5% quantile surfaces from CMC and DISK.]
- The PMSEs of CMC(MPP) and DISK(MPP) are almost the same, and both are comparable to laGP (a frequentist GP method). DISK(MPP) has wider predictive intervals.
- Other Bayesian GP methods do not work at this scale, including MPP and NNGP.

42 SST Data: Numerical Results
Parametric inference and prediction in the SST data. DISK-CMC and DISK-PIE use MPP-based modeling with a fixed number of knots and subsets.
[Table: 95% credible intervals for β_0, β_1, σ², τ² under laGP, DISK-CMC, and DISK-PIE; MSPE, 95% PI coverage, and 95% PI length for the three methods. DISK-PIE attains coverage close to the nominal 95% with wider intervals than laGP, while DISK-CMC undercovers.]

43 Outline Background of GP D&C Bayes Method Theory Experiments Future Work

44 Future Work
- Theory for the finite-dimensional parameters (α, σ²), and for the joint distribution of the parametric and nonparametric parts.
- Theory for the frequentist coverage of DISK posterior credible intervals.
- The choice of partition distribution; the influence of the number of knots in MPP.
- Extension to spatio-temporal processes.
Manuscript at arXiv:1712.09767.

45 Thank you! & Questions?
