Multivariate Output Analysis for Markov Chain Monte Carlo


Dootika Vats
School of Statistics, University of Minnesota

James M. Flegal
Department of Statistics, University of California, Riverside

Galin L. Jones
School of Statistics, University of Minnesota

December 23, 2015

Abstract

Markov chain Monte Carlo (MCMC) produces a correlated sample in order to estimate expectations with respect to a target distribution. A fundamental question is when should sampling stop so that we have good estimates of the desired quantities? The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem. However, the multivariate nature of this Monte Carlo error has largely been ignored in the MCMC literature. We provide conditions for consistently estimating the asymptotic covariance matrix via the multivariate batch means estimator. Based on this result, we present a relative standard deviation fixed-volume sequential stopping rule for terminating simulation. We show that this stopping rule is asymptotically equivalent to terminating when the effective sample size for the MCMC sample is above a desired threshold, giving an intuitive and theoretically justified approach to implementing the proposed method. The finite sample properties of the proposed method are demonstrated in examples.

Research supported by the National Science Foundation. Research supported by the National Institutes of Health and the National Science Foundation.

1 Introduction

Markov chain Monte Carlo (MCMC) algorithms are used to estimate expectations with respect to a probability distribution when independent sampling is difficult. Typically, interest is in estimating a vector of quantities such as means, moments, and quantiles associated with a given probability distribution. However, analysis of the MCMC output routinely focuses on a univariate Markov chain central limit theorem (CLT). Thus standard convergence diagnostics, sequential stopping rules for termination, effective sample size definitions, and confidence intervals are univariate in nature. We present a methodological framework for multivariate analysis of MCMC output.

Let F be a probability distribution with support X and g : X → R^p be an F-integrable function such that θ := E_F g is of interest. If {X_t} is an F-invariant Harris recurrent Markov chain, set {Y_t} = {g(X_t)} and estimate θ with θ_n = n^{-1} ∑_{t=1}^n Y_t, since θ_n → θ with probability 1 as n → ∞. Finite sampling leads to an unknown Monte Carlo error, θ_n − θ, and estimating this error is essential to assessing the quality of estimation. If, for some δ > 0, g has 2 + δ moments under F and {X_t} is geometrically ergodic, an approximate sampling distribution for the Monte Carlo error is available via a Markov chain CLT. That is, there exists a p × p positive definite symmetric matrix Σ such that, as n → ∞,

    √n (θ_n − θ) →d N_p(0, Σ).    (1)

Assessment of the Monte Carlo error rests on estimating Σ. This, however, can be challenging since

    Σ = Var_F(Y_1) + 2 ∑_{k=1}^∞ Cov_F(Y_1, Y_{1+k}).

We consider the multivariate batch means (mBM) estimator of Σ and establish conditions for its strong consistency.

Using the mBM estimator we develop multivariate confidence regions, multivariate sequential termination rules, and a multivariate effective sample size. Thus we provide a methodological framework for multivariate MCMC output analysis.

Univariate output analysis methods are motivated by the univariate CLT

    √n (θ_{n,i} − θ_i) →d N(0, σ_i²)    (2)

as n → ∞, where θ_{n,i} and θ_i are the ith components of θ_n and θ, respectively, and σ_i² is the ith diagonal element of Σ. Flegal and Jones (2010), Hobert et al. (2002), and Jones et al. (2006) propose strongly consistent estimators of σ_i² using spectral variance methods, regenerative simulation, and batch means methods, respectively.

Consistent estimators of the asymptotic variance in independent sampling are used to sequentially terminate simulation when the half-width of the confidence interval for θ_i is below a tolerance level set a priori. In MCMC, this idea was formalized by Jones et al. (2006) via the fixed-width sequential stopping rule. For a desired tolerance ε, simulation is terminated the first time, for all components,

    t_* σ_{n,i} / √n + ε I(n < n*) + n^{-1} ≤ ε,    (3)

where σ_{n,i}² is a strongly consistent estimator of σ_i², t_* is an appropriate t-distribution quantile, and n* is a minimum simulation effort specified by the user.

The fixed-width sequential stopping rule laid the foundation for termination based on the quality of estimation rather than convergence of the Markov chain (for a review of convergence diagnostics see Cowles and Carlin, 1996). Fixed-width procedures are aimed at terminating simulation when the estimation is reliable in the sense that if the procedure is repeated again, the estimates will not be vastly different (Flegal et al., 2008).

Implementing the fixed-width sequential stopping rule at (3) requires careful analysis for choosing ε for each θ_{n,i}.

Since this can be challenging for large p, Flegal and Gong (2015) proposed relative fixed-width sequential stopping rules that terminate simulation relative to attributes of the estimation process. Specifically, they proposed a relative standard deviation fixed-width sequential stopping rule, where the simulation is terminated relative to the uncertainty in the target distribution. That is, if λ_{n,i}² is the sample variance for the ith component of {Y_t}, simulation is terminated the first time, for all components,

    t_* σ_{n,i} / √n + ε λ_{n,i} I(n < n*) + n^{-1} ≤ ε λ_{n,i}.    (4)

Jones et al. (2006) and Flegal and Gong (2015) showed that the confidence intervals created at termination using fixed-width and relative standard deviation fixed-width sequential stopping rules are asymptotically valid, in that the confidence intervals created at termination have the right coverage probability as ε → 0 when σ_{n,i}² → σ_i² with probability 1 as n → ∞. Thus, strong consistency of the estimators of σ_i² is vital for valid inference using sequential termination rules.

However, these termination rules are univariate in that simulation is terminated only when all components of θ_n individually satisfy the termination rule. Clearly, termination is dictated by the component that mixes slowest, and cross-correlations between components are ignored. In addition, corrections for multiple testing result in conservative confidence intervals, leading to delayed termination.

We propose a multivariate approach to termination called the relative standard deviation fixed-volume sequential stopping rule. If Λ_n is the sample covariance matrix of {Y_t} and |·| denotes determinant, then simulation is terminated the first time

    (Volume of Confidence Region)^{1/p} + ε |Λ_n|^{1/(2p)} I(n < n*) + n^{-1} < ε |Λ_n|^{1/(2p)}.

Since the volume of the confidence region is proportional to the square root of the determinant of the estimate of Σ, an interpretation is that the simulation is terminated when the generalized variance of the Monte Carlo error is small relative to the estimated generalized variance with respect to F; that is, the estimate of Σ is small compared to the estimate of Cov_F(Y_1).

This is a multivariate generalization of the relative standard deviation fixed-width sequential stopping rule of Flegal and Gong (2015). We show that the relative standard deviation fixed-volume sequential stopping rule leads to asymptotically valid confidence regions. The bottleneck in demonstrating asymptotic validity is proposing estimators of Σ that are strongly consistent. To this end, we investigate the mBM estimator and provide conditions for strong consistency.

Another common way to terminate simulation is to continue until the effective sample size (ESS) is greater than some W set a priori (see Drummond et al. (2006), Atkinson et al. (2008), and Giordano et al. (2015) for a few examples). ESS is the number of independent and identically distributed samples that would carry the same information as the correlated sample. Terminating using ESS is intuitive since it is comparable to termination for independent sampling. Gong and Flegal (2015) showed that this method of termination is equivalent to terminating according to the relative standard deviation fixed-width stopping rule. However, we are unaware of any formal definition of multivariate ESS (mESS). Since the samples {Y_t} are vector-valued, it is natural to calculate a single ESS for all components simultaneously. In Section 3.1, we introduce a definition of mESS and present a strongly consistent estimator, m̂ESS. We show that terminating according to the relative standard deviation fixed-volume sequential stopping rule is asymptotically equivalent to terminating when

    m̂ESS ≥ W_{p,α,ε},    (5)

where 1 − α represents the confidence level and ε is the precision specified in the relative standard deviation fixed-volume sequential stopping rule. Notice that W_{p,α,ε} is only a function of the dimension of the estimation problem and the specified precision. Thus the user can specify a precision, calculate W_{p,α,ε}, and simulate until (5) is satisfied. This sequential process is the same as terminating when the relative standard deviation fixed-volume sequential stopping rule is satisfied.

Generally, our proposed mESS and relative standard deviation fixed-volume sequential stopping rule methods terminate earlier than simultaneous univariate methods since termination is dictated by the overall Markov chain and not by the component that mixes the slowest. In addition, since multiple testing is no longer an issue, the volume of the confidence region is considerably smaller even in moderate-p problems. Using the inherent multivariate nature of the problem and acknowledging the existence of cross-correlation in estimating effective sample size leads to a more realistic understanding of the estimation process.

1.1 An Illustrative Example

For i = 1, ..., K, let Y_i be a binary response variable and X_i = (x_{i1}, x_{i2}, ..., x_{i5}) be the observed predictors for the ith observation. Assume τ² is known,

    Y_i | X_i, β ~ind Bernoulli( 1 / (1 + e^{−X_i β}) ), and β ~ N_5(0, τ² I_5).    (6)

This simple hierarchical model results in an intractable posterior, F, on R^5. The dataset used is the logit dataset in the mcmc R package. The goal is to estimate the posterior mean of β, E_F β; thus g here is the identity function mapping to R^5. We implement a random walk Metropolis-Hastings algorithm with a multivariate normal proposal distribution N_5(·, 0.35² I_5), where I_5 is the 5 × 5 identity matrix and the 0.35 scaling ensures an optimal acceptance probability as suggested by Roberts et al. (1997). We obtain an MCMC sample of size 10^5 and calculate the Monte Carlo estimate of E_F β. The starting value for β is a random draw from the prior distribution. The covariance matrix Σ is estimated using the mBM estimator described in Section 2. We also implement the univariate batch means (uBM) methods described in Jones et al. (2006) to estimate each σ_i², which captures the autocorrelation in each component while ignoring the cross-correlation. This cross-correlation is often significant, as exhibited in Figure 1, and can only be captured by multivariate methods like mBM.
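As a concrete sketch of this sampler, the code below uses metrop() from the mcmc package, which also ships the logit data (columns y and x1 through x4). The prior variance tau2 = 1 is an illustrative assumption, since the example does not record the value of τ², and scale = 0.35 reflects our reading that the 0.35 scaling multiplies the proposal standard deviation; this is not the authors' exact implementation.

    # Sketch of the sampler for model (6); tau2 = 1 is an assumed value.
    library(mcmc)                       # provides data(logit) and metrop()
    data(logit)
    X <- cbind(1, as.matrix(logit[, c("x1", "x2", "x3", "x4")]))
    y <- logit$y
    tau2 <- 1

    # Log unnormalized posterior: Bernoulli log-likelihood plus N_5(0, tau2 I_5) prior
    lupost <- function(beta) {
      eta <- drop(X %*% beta)
      sum(y * eta - log1p(exp(eta))) - sum(beta^2) / (2 * tau2)
    }

    set.seed(1)
    beta0 <- rnorm(5, 0, sqrt(tau2))    # start from a draw from the prior
    out <- metrop(lupost, initial = beta0, nbatch = 1e5, scale = 0.35)
    theta_n <- colMeans(out$batch)      # Monte Carlo estimate of the posterior mean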

Figure 1: Autocorrelation plot for β_1, cross-correlation plot between β_1 and β_3, and trace plots for β_1 and β_3. The cross-correlation is ignored by univariate methods. Monte Carlo sample size = 10^5.

In Figure 2 we present 90% confidence regions created using mBM and uBM for β_1 and β_3 (for the purposes of this figure, we treat this as a problem in 2 dimensions). The boxes were created using uBM, both uncorrected and corrected with Bonferroni, and the ellipse was constructed using the mBM estimator.

To assess the confidence regions, we verify their coverage probabilities over 1000 independent replications with Monte Carlo sample sizes in {10^4, 10^5, 10^6}. The true posterior mean, (0.5706, ...), is obtained by taking the mean over 10^9 iterations. For each of the 1000 replications, it was noted whether the true posterior mean was present in the confidence region. In addition, the volume of the confidence region to the pth root was also recorded. Table 1 summarizes the results. Note that though the uncorrected univariate methods provide the smallest confidence regions, their coverage probabilities are far from desirable. For a large enough Monte Carlo sample size, mBM produces 90% coverage probabilities with a systematically lower volume than uBM corrected with Bonferroni (uBM-Bonferroni). This conclusion agrees with Figure 2, where, due to the elliptical structure, mBM ignores a large chunk of the area contained in the uBM-Bonferroni confidence region.

Figure 2: Joint 90% confidence region for β_1 and β_3. The ellipse is generated using the mBM estimator, the dotted line using the uncorrected univariate BM estimator, and the dashed line using the univariate BM estimator corrected by Bonferroni. Monte Carlo sample size = 10^5.

This example shows that even simple MCMC problems produce complex dependence structures within and across components of the samples. Ignoring this dependence structure leads to an incomplete understanding of the estimation process. Not only do we gain more information about the Monte Carlo error using multivariate methods, but we also avoid using conservative Bonferroni methods. It is evident that termination rules based on the volume of confidence regions would generally terminate earlier for multivariate methods while extracting more information from the samples.

The rest of the paper is organized as follows. In Section 2 we define the mBM estimator and provide conditions for strong consistency. In Section 3 we formally introduce a general class of relative fixed-volume sequential termination rules. In addition, we introduce a multivariate definition of ESS and relate it to sequential termination rules. In Section 4 we continue our implementation of the Bayesian logistic regression model and consider additional examples. We choose a vector autoregressive process of order 1, where convergence of the process can be manipulated. Specifically, we construct the process in such a way that one component mixes slowly, while the others are fairly well behaved. Such behavior is known to occur often in hierarchical models with priors on the variance components.

Table 1: Volume to the pth (p = 5) root and coverage probabilities for 90% confidence regions constructed using mBM, uncorrected uBM, and uBM corrected with Bonferroni, for Monte Carlo sample sizes n in {10^4, 10^5, 10^6}. Replications = 1000 and standard errors are indicated in parentheses.

The next example is that of a Bayesian lasso where the posterior is in 51 dimensions. We also implement our output analysis methods for a fairly complicated Bayesian dynamic spatial-temporal model. We conclude with a discussion in Section 5.

2 Multivariate Batch Means Estimator

Let n = a_n b_n, where a_n is the number of batches and b_n is the batch size. For k = 0, ..., a_n − 1, define

    Ȳ_k := b_n^{-1} ∑_{t=1}^{b_n} Y_{k b_n + t}.

Then Ȳ_k is the mean vector for batch k and the mBM estimator of Σ is given by

    Σ_n = (b_n / (a_n − 1)) ∑_{k=0}^{a_n − 1} (Ȳ_k − θ_n)(Ȳ_k − θ_n)^T.    (7)

When Y_t is univariate, the batch means estimator has been well studied for MCMC problems (Flegal and Jones, 2010; Jones et al., 2006) and for steady-state simulations (Damerdji, 1991; Glynn and Iglehart, 1990; Glynn and Whitt, 1991). Glynn and Whitt (1991) showed that the batch means estimator cannot be consistent for fixed batch size b_n. Damerdji (1991, 1995), Jones et al. (2006), and Flegal and Jones (2010) established its asymptotic properties, including strong consistency and mean-square consistency, when both the batch size and the number of batches increase with n.
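As a minimal base-R sketch, the estimator (7) can be computed directly from an n × p output matrix; the function and argument names (mbm, Y, b) are ours, and n is truncated to a multiple of the batch size b.

    mbm <- function(Y, b) {
      n <- nrow(Y); a <- floor(n / b); p <- ncol(Y)
      Y <- Y[1:(a * b), , drop = FALSE]      # use a_n = a full batches
      theta_n <- colMeans(Y)
      bm <- matrix(0, a, p)                  # row k holds the batch mean Ybar_k
      for (k in 1:a)
        bm[k, ] <- colMeans(Y[((k - 1) * b + 1):(k * b), , drop = FALSE])
      centered <- sweep(bm, 2, theta_n)      # Ybar_k - theta_n
      (b / (a - 1)) * crossprod(centered)    # the mBM estimator (7)
    }

A batch-size choice compatible with Theorem 1 below is b <- floor(nrow(Y)^(1/2)) for slower-mixing chains, or floor(nrow(Y)^(1/3)) for faster-mixing ones; Remark 4 below additionally requires more batches than dimensions, a > p.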

The multivariate extension in (7) was first introduced by Chen and Seila (1987). For steady-state simulation output, Charnes (1995) and Muñoz and Glynn (2001) studied confidence regions for θ based on the mBM estimator; however, the theoretical properties of the mBM estimator remain unexplored. In Theorem 1, we present conditions for strong consistency of Σ_n in estimating Σ for MCMC, but our results hold for more general processes. Our main assumption on the process is that of a strong invariance principle.

Condition 1. Let ‖·‖ denote the Euclidean norm and {B(t), t ≥ 0} be a p-dimensional multivariate Brownian motion. There exists a nonnegative increasing function γ on the positive integers, a sufficiently rich probability space Ω, an n_0 ∈ N, a p × p lower triangular matrix L, and a finite random variable D such that for almost all ω ∈ Ω and for all n > n_0,

    ‖n(θ_n − θ) − L B(n)‖ < D γ(n) w.p. 1.    (8)

A strong law, CLT, and functional central limit theorem (FCLT) are consequences of Condition 1, which holds for independent processes (Berkes and Philipp, 1979; Einmahl, 1989; Zaitsev, 1998), martingale sequences (Eberlein, 1986), renewal processes (Horvath, 1984), and for φ-mixing and strongly mixing sequences (Dehling and Philipp, 1982; Kuelbs and Philipp, 1980). Using the results of Kuelbs and Philipp (1980), Vats et al. (2015) established Condition 1 with γ(n) = n^{1/2 − λ}, λ > 0, for polynomially ergodic Markov chains.

As in the univariate case, the batch means estimator can be consistent only if the batch size increases with n.

Condition 2. The batch size b_n is an integer sequence such that b_n → ∞ and n/b_n → ∞ as n → ∞, where b_n and n/b_n are monotonically increasing.

Condition 3. There exists a constant c ≥ 1 such that ∑_n (b_n n^{-1})^c < ∞.

In Theorem 1, we establish strong consistency of Σ_n. The proof is given in Appendix A.1.

Theorem 1. Let X be a geometrically ergodic Markov chain with invariant distribution F and let g : X → R^p be an F-measurable function such that E_F ‖g‖^{2+δ} < ∞ for some δ > 0. Then (8) holds with γ(n) = n^{1/2 − λ} for some λ > 0. Let Conditions 2 and 3 hold. If b_n^{-1/2} (log n)^{1/2} n^{1/2 − λ} → 0 as n → ∞, then Σ_n → Σ with probability 1 as n → ∞.

Remark 1. The theorem holds more generally outside of the context of Markov chains for processes that satisfy Condition 1. The general statement of the theorem is provided in Appendix A.1.

Remark 2. The value of λ depends on the mixing time of the Markov chain. For slowly mixing Markov chains λ is closer to 0, and for fast mixing chains λ is closer to 1/2.

Remark 3. It is natural to consider b_n = ⌊n^ν⌋ for 0 < ν < 1. Then ν > 1 − 2λ is required to satisfy b_n^{-1/2} (log n)^{1/2} n^{1/2 − λ} → 0 as n → ∞. Hence, for fast mixing processes (λ close to 1/2) smaller batch sizes suffice, and similarly slow mixing processes require larger batch sizes. This reinforces our intuition that higher correlation calls for larger batch sizes.

Remark 4. Since Σ is nonsingular, Σ_n should be nonsingular, which requires a_n > p. Thus larger Monte Carlo sample sizes will be required for higher dimensional estimation problems.

An immediate consequence of the strong consistency of the mBM estimator is the convergence of its eigenvalues. The proof follows from Theorem 2 in Vats et al. (2015).

Corollary 1. Let λ_1 > λ_2 > ... > λ_p > 0 be the eigenvalues of Σ and let λ̂_1 > λ̂_2 > ... > λ̂_p be the eigenvalues of Σ_n. Then, under the conditions of Theorem 1, λ̂_k → λ_k w.p. 1 as n → ∞ for all 1 ≤ k ≤ p.

3 Termination Rules

We consider multivariate sequential termination rules that lead to asymptotically valid confidence regions. Let F_{1−α, p, a_n−p} denote the 1 − α quantile of a central F distribution with p numerator degrees of freedom and a_n − p denominator degrees of freedom.

A 100(1 − α)% confidence region for θ is the set

    C_α(n) = { θ ∈ R^p : n (θ_n − θ)^T Σ_n^{-1} (θ_n − θ) < (p(a_n − 1) / (a_n − p)) F_{1−α, p, a_n−p} }.

Then C_α(n) forms an ellipsoid in p dimensions oriented along the directions of the eigenvectors of Σ_n. Letting |·| denote determinant, the volume of C_α(n) is

    Vol(C_α(n)) = (2π^{p/2} / (pΓ(p/2))) ( (p(a_n − 1) / (n(a_n − p))) F_{1−α, p, a_n−p} )^{p/2} |Σ_n|^{1/2}.    (9)

Since p is fixed and, as n → ∞, Σ_n → Σ with probability 1, Vol(C_α(n)) → 0 with probability 1 as n → ∞. If ε > 0 and s(n) is a positive real-valued function defined on the positive integers, then a fixed-volume sequential stopping rule terminates the simulation at the random time

    T(ε) = inf { n ≥ 0 : Vol(C_α(n))^{1/p} + s(n) ≤ ε }.    (10)

Glynn and Whitt (1992) showed that terminating at T(ε) yields confidence regions that are asymptotically valid in that, as ε → 0, Pr[θ ∈ C_α(T(ε))] → 1 − α if (a) a FCLT holds, (b) s(n) = o(n^{-1/2}) as n → ∞, and (c) Σ_n → Σ with probability 1 as n → ∞. A FCLT holds under Condition 1, and our Theorem 1 establishes consistency of mBM. Next, (b) will be satisfied if we set s(n) = ε I(n < n*) + n^{-1}, which ensures simulation does not terminate before n* iterations. Also, n* should satisfy a_n > p so that Σ_n is positive definite; recall Remark 4.

The sequential stopping rule (10) can be difficult to implement in practice since the choice of ε depends on the units of θ and has to be carefully chosen for every application. We consider a generalization of (10) which can be used more naturally and which we will show connects nicely to the idea of a multivariate effective sample size. Let K(Y, p) > 0 be an attribute of the estimation process and suppose K_n(Y, p) > 0 is an estimator of K(Y, p); for example, K(Y, p) could be a scalar summary of θ with K_n(Y, p) the corresponding summary of θ_n.
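The pth root of the volume (9) is the quantity that the rules (10) and (12) compare against a threshold; a base-R sketch (our naming), reusing mbm() from Section 2:

    # p-th root of the volume (9) of the confidence region C_alpha(n)
    vol_root <- function(Sigma_n, n, a_n, alpha = 0.10) {
      p <- ncol(Sigma_n)
      Fq <- qf(1 - alpha, p, a_n - p)                      # F quantile in (9)
      c1 <- (2 * pi^(p / 2) / (p * gamma(p / 2)))^(1 / p)
      c1 * sqrt(p * (a_n - 1) * Fq / (n * (a_n - p))) * det(Sigma_n)^(1 / (2 * p))
    }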

Set s(n) = ε K_n(Y, p) I(n < n*) + n^{-1} and define

    T(ε) = inf { n ≥ 0 : Vol(C_α(n))^{1/p} + s(n) ≤ ε K_n(Y, p) }.    (11)

The following result establishes asymptotic validity of this termination rule. The proof is provided in Appendix A.2.

Theorem 2. Suppose a FCLT holds. If K_n(Y, p) → K(Y, p) w.p. 1 and Σ_n → Σ w.p. 1 as n → ∞, then, as ε → 0, T(ε) → ∞ and Pr[θ ∈ C_α(T(ε))] → 1 − α.

Suppose K(Y, p) = |Λ|^{1/(2p)} := |Cov_F(Y_1)|^{1/(2p)} and, if Λ_n is the usual sample covariance matrix for {Y_t}, set K_n(Y, p) = |Λ_n|^{1/(2p)}. Then K_n(Y, p) → K(Y, p) with probability 1 as n → ∞, and T(ε) is the first time the uncertainty in estimation (measured via the volume of the confidence region) is an εth fraction of the uncertainty in the target distribution. The relative standard deviation fixed-volume sequential stopping rule is formalized as terminating at

    T_SD(ε) = inf { n ≥ 0 : Vol(C_α(n))^{1/p} + ε |Λ_n|^{1/(2p)} I(n < n*) + n^{-1} ≤ ε |Λ_n|^{1/(2p)} }.    (12)

Note that Λ_n is positive definite as long as n is larger than p, so |Λ_n|^{1/(2p)} > 0.

3.1 Relation to Effective Sample Size

Kass et al. (1998), Liu (2008), and Robert and Casella (2013) define ESS for the ith component of the process as

    ESS_i = n / ( 1 + 2 ∑_{k=1}^∞ ρ_i(k) ),

where ρ_i(k) = ρ(Y_1^{(i)}, Y_{1+k}^{(i)}) is the lag-k autocorrelation for the ith component of Y. It is challenging to estimate ρ_i(k) consistently, especially for large k.
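Combining (12) with the sketches above gives a one-line termination check; the helper below (our naming) returns TRUE when the relative standard deviation fixed-volume rule would allow simulation to stop.

    # One evaluation of the stopping criterion (12) for an n x p output matrix Y
    stop_fixed_volume <- function(Y, b, n_min, alpha = 0.10, eps = 0.05) {
      n <- nrow(Y); a_n <- floor(n / b); p <- ncol(Y)
      lhs <- vol_root(mbm(Y, b), n, a_n, alpha)    # Vol(C_alpha(n))^{1/p}
      rel <- eps * det(cov(Y))^(1 / (2 * p))       # eps * |Lambda_n|^{1/(2p)}
      lhs + rel * (n < n_min) + 1 / n <= rel
    }

In practice the check would be applied repeatedly along the simulation, for instance at 10% increments of the current chain length, as in the examples of Section 4.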

Alternatively, Gong and Flegal (2015) rewrite the above definition as

    ESS_i = n λ_i² / σ_i²,

where σ_i² is the ith diagonal element of Σ and λ_i² is the ith diagonal element of Λ. Using this formulation, a consistent estimate of ESS_i is obtained by using consistent estimators of λ_i² and σ_i², via the sample variance and univariate batch means estimators, respectively.

The R package coda estimates ESS_i by using the sample variance to estimate λ_i², but estimates σ_i² by determining the spectral density at frequency zero of an approximating autoregressive process; thus, it does not estimate σ_i² directly. The R package mcmcse uses the univariate batch means method to estimate σ_i². The estimate of ESS_i is then

    ÊSS_i = n λ_{n,i}² / σ_{n,i}²,

where λ_{n,i}² is the sample variance and σ_{n,i}² is the batch means estimator. Then ÊSS_i is a consistent estimator of the univariate ESS_i. However, in both of these packages ESS_i is estimated for each component separately, and a conservative estimate of the overall ESS is taken to be the minimum of all the ÊSS_i. This again leads to the situation where the estimate of ESS is dictated by the component that mixes the slowest, while ignoring the behavior of the other components.

A natural definition of multivariate ESS is

    mESS = n ( |Λ| / |Σ| )^{1/p}.    (13)

When p = 1, mESS reduces to the form of ESS presented above. In (13), we replace the concept of variance with generalized variance. For a random vector, the determinant of its covariance matrix is called its generalized variance. This was first introduced by Wilks (1932) as a univariate measure of spread for a multivariate distribution.

Wilks (1932) recommended the use of the pth root of the generalized variance. This was formalized by SenGupta (1987) as the standardized generalized variance in order to compare variability over different dimensions. Due to the strong consistency of mBM established in Theorem 1, a strongly consistent estimator of mESS is

    m̂ESS = n ( |Λ_n| / |Σ_n| )^{1/p}.    (14)

Rearranging the defining inequality in (12) yields that, when n ≥ n*,

    m̂ESS ≥ [ (2π^{p/2} / (pΓ(p/2)))^{1/p} ( (p(a_n − 1) / (a_n − p)) F_{1−α, p, a_n−p} )^{1/2} + n^{-1/2} |Σ_n|^{-1/(2p)} ]² (1/ε²).

Thus, the relative standard deviation fixed-volume sequential stopping rule is equivalent to terminating the first time m̂ESS is larger than a lower bound. This lower bound is a function of n and thus is difficult to determine before starting the simulation. However, as n → ∞, the scaled F distribution converges to a χ² distribution, leading to the following approximation:

    m̂ESS ≥ (2^{2/p} π / (pΓ(p/2))^{2/p}) (χ²_{1−α, p} / ε²).

Thus, in order to determine termination, a user can a priori decide to simulate until W effective samples are obtained. This choice of W will correspond to some α and ε, and thus be an asymptotically valid termination rule. The choice of W should be made keeping in mind that the samples obtained are only approximately from the invariant distribution F, and this does not change in finite time. Thus the choice of W should be conservative to begin with and be bigger for larger p.

Example 1. Suppose p = 5 (as in the logistic regression setting of Section 1.1) and that we want a precision of ε = 0.05 (so the desired computational uncertainty is a 0.05th fraction of the uncertainty in the target distribution) for a 95% confidence region. This requires a minimum m̂ESS of approximately 8605. On the other hand, if we simulate until m̂ESS = 10000, we obtain a precision of ε ≈ 0.0464.
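In base R, the estimator (14) and the asymptotic threshold W_{p,α,ε} are one-liners (our naming; mbm() is the sketch from Section 2), and evaluating min_mess(5) reproduces the roughly 8605 samples of Example 1.

    # Estimated multivariate ESS (14)
    mess_hat <- function(Y, b) {
      p <- ncol(Y)
      nrow(Y) * (det(cov(Y)) / det(mbm(Y, b)))^(1 / p)
    }

    # Minimum mESS threshold W_{p, alpha, eps} from the chi-squared approximation
    min_mess <- function(p, alpha = 0.05, eps = 0.05) {
      2^(2 / p) * pi / (p * gamma(p / 2))^(2 / p) * qchisq(1 - alpha, p) / eps^2
    }
    min_mess(5)    # approximately 8605, as in Example 1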

3.2 Univariate Termination Rules

In this section we formally present the univariate termination rules we will implement in Section 4 as a comparison to the multivariate rules. Recall that for the ith component of θ, θ_{n,i} is the Monte Carlo estimate of θ_i, and σ_i² is the asymptotic variance in the univariate CLT. Let σ_{n,i}² be the univariate batch means estimator of σ_i² and let λ_{n,i}² be the sample variance for the ith component. Common practice is to terminate simulation when all components satisfy a termination rule. We focus on the relative standard deviation fixed-width sequential stopping rules. Due to multiple testing, a Bonferroni correction is often used. We will refer to the uncorrected method as the uBM method and the corrected method as uBM-Bonferroni.

To create 100(1 − α)% univariate uncorrected confidence intervals, the relative standard deviation fixed-width rule terminates the first time

    max_{i=1,...,p} { λ_{n,i}^{-1} ( t_{1−α/2, a_n−1} σ_{n,i} / √n + ε λ_{n,i} I(n < n*) + n^{-1} ) } ≤ ε.

To create 100(1 − α)% univariate corrected confidence intervals, the relative standard deviation fixed-width rule terminates the first time

    max_{i=1,...,p} { λ_{n,i}^{-1} ( t_{1−α/(2p), a_n−1} σ_{n,i} / √n + ε λ_{n,i} I(n < n*) + n^{-1} ) } ≤ ε.
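A base-R sketch of the simultaneous check just displayed (names are ours); sigma2_n can be taken as diag(mbm(Y, b)), since the diagonal of the mBM estimate gives the univariate batch means variances.

    # Simultaneous univariate relative standard deviation fixed-width check;
    # bonferroni = TRUE gives the uBM-Bonferroni variant.
    stop_fixed_width <- function(Y, sigma2_n, a_n, n_min, alpha = 0.10,
                                 eps = 0.05, bonferroni = FALSE) {
      n <- nrow(Y); p <- ncol(Y)
      lambda_n <- apply(Y, 2, sd)                       # componentwise sample sd
      q <- if (bonferroni) 1 - alpha / (2 * p) else 1 - alpha / 2
      hw <- qt(q, a_n - 1) * sqrt(sigma2_n) / sqrt(n)   # half-widths
      max((hw + eps * lambda_n * (n < n_min) + 1 / n) / lambda_n) <= eps
    }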

Remark 5. If it is unclear which features of F are of interest a priori, i.e., g is not known before simulation, one can implement Scheffé's simultaneous confidence intervals. This method can also be used to create univariate confidence intervals for arbitrary contrasts a posteriori. Scheffé's simultaneous intervals are boxes on the coordinate axes that bound the confidence ellipsoids. For all a^T θ with a ∈ R^p,

    a^T θ_n ± ( (p(a_n − 1) / (n(a_n − p))) F_{1−α, p, a_n−p} · a^T Σ_n a )^{1/2}

will have simultaneous coverage probability 1 − α. Since we generally are not interested in all possible linear combinations, Scheffé's intervals are often too conservative. In fact, if the number of confidence intervals is less than p, Scheffé's intervals are more conservative than Bonferroni-corrected intervals.

4 Examples

In each of the following examples we present a target distribution F and a Markov chain with F as its invariant distribution, we specify g, and we are interested in estimating E_F g. We consider the finite sample performance (based on 1000 independent replications) of the relative standard deviation fixed-volume sequential stopping rules. In each case we use 90% confidence regions and various choices of ε, and we specify our choices of n* and b_n.

4.1 Bayesian Logistic Regression

We continue the Bayesian logistic regression example introduced in Section 1.1. Recall that a random walk Metropolis-Hastings algorithm was implemented to sample from the intractable posterior. We prove the chain is geometrically ergodic in Appendix A.3.

Theorem 3. The random walk based Metropolis-Hastings algorithm with invariant distribution given by the posterior from (6) is geometrically ergodic.

As a consequence of Theorem 3 and the fact that F has a moment generating function, a FCLT and thus a CLT holds. This ensures the validity of Theorems 1 and 2. Motivated by the ACF plot in Figure 1, b_n was set to ⌊n^{1/2}⌋, and a minimum simulation effort n* was imposed. For calculating coverage probabilities, we declare the truth to be the posterior mean from the long independent simulation described in Section 1.1. The results are presented in Table 2.

Table 2: Bayesian Logistic: Over 1000 replications, we present termination iterations, effective sample sizes at termination, and coverage probabilities at termination for each corresponding method. Standard errors are in parentheses.

As before, the univariate uncorrected method has less than the desired coverage probability. For ε = 0.02 and 0.01, the coverage probabilities for both the mBM and uBM-Bonferroni regions are at 90%. However, termination for mBM is significantly earlier. This is a consequence of two things. First, accounting for the multivariate nature of the problem allows us to capture more information from a smaller sample; this is reflected in the estimates of ESS. Second, using a Bonferroni correction gives larger confidence regions, leading to delayed termination.

Recall that, due to Theorem 2, as ε decreases to zero the coverage probability of confidence regions created at termination using the relative standard deviation fixed-volume sequential stopping rule converges to the nominal level. This is demonstrated in Figure 3, where we present the coverage probability over 1000 replications as ε decreases. Notice that the increase in coverage probabilities need not be monotonic, due to the randomness of the underlying process.

To illustrate the convergence of the eigenvalues of the mBM estimator, we plot the condition number of the estimate under the spectral norm. That is, we report λ̂_max / λ̂_min over varying Monte Carlo sample sizes and present the plot in Figure 4.
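The condition number reported in Figure 4 is straightforward to compute from a mBM estimate; a small sketch (our naming):

    # Condition number of the mBM estimate under the spectral norm; by
    # Corollary 1 this converges to the condition number of Sigma.
    cond_mbm <- function(Sigma_n) {
      ev <- eigen(Sigma_n, symmetric = TRUE, only.values = TRUE)$values
      max(ev) / min(ev)
    }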

Figure 3: Bayesian Logistic: Plot of coverage probability with confidence bands as ε decreases, at the 90% nominal rate. Replications = 1000.

Figure 4: Bayesian Logistic: Plot of the condition number under the spectral norm, with confidence bands, versus Monte Carlo sample size.

In Figure 4, the condition number for the mBM estimator decreases with an increase in sample size, indicating better conditioning at large simulation sizes.

4.2 Vector Autoregressive Process

Consider the vector autoregressive process of order 1 (VAR(1)). For t = 1, 2, ...,

    Y_t = Φ Y_{t−1} + ε_t,

where Y_t ∈ R^p, Φ is a p × p matrix, and ε_t ~iid N_p(0, Ω), where Ω is a p × p positive definite symmetric matrix. The matrix Φ determines the nature of the autocorrelation. This Markov chain has invariant distribution F = N_p(0, V), where V satisfies vec(V) = (I_{p²} − Φ ⊗ Φ)^{-1} vec(Ω) and ⊗ denotes the Kronecker product, and it is geometrically ergodic when the spectral radius of Φ is less than 1 (Tjøstheim, 1990). Consider the goal of estimating the mean of F, E_F Y = 0, with Ȳ_n. Then

    Σ = (I − Φ)^{-1} V + V (I − Φ^T)^{-1} − V.

Let p = 5, Φ = diag(0.9, 0.5, 0.1, 0.1, 0.1), and let Ω be the AR(1) covariance matrix with autocorrelation 0.9. Since the first eigenvalue of Φ is large, the first component mixes slowest. This component will dictate the primary orientation of the confidence ellipsoids.

We sample the process for 10^5 iterations and in Figure 5 present the autocorrelation (ACF) plots for Y^{(1)} and Y^{(3)} and the cross-correlation (CCF) plot between Y^{(1)} and Y^{(3)}, in addition to the trace plot for Y^{(1)}. Notice that Y^{(1)} has larger significant lags than Y^{(3)}, and there is significant cross-correlation between Y^{(1)} and Y^{(3)}. Figure 6 displays joint confidence regions for Y^{(1)} and Y^{(3)}. Recall that the true mean is (0, 0) and is present in all three regions, but the ellipse produced by mBM has significantly smaller volume than the uBM boxes. The orientation of the ellipse is determined by the cross-correlations witnessed in Figure 5.

We set n* = 1000, b_n = ⌊n^{1/3}⌋, and ε in {0.05, 0.02, 0.01}, and at termination of each method calculate the coverage probabilities and effective sample size. Results are presented in Table 3. Note that as ε decreases, termination time increases and coverage probabilities tend to the 90% nominal level for each method (due to asymptotic validity). Also note that the uncorrected univariate methods produce confidence regions with undesirable coverage probabilities and thus are not of interest. Consider ε = 0.02 in Table 3. Termination for mBM is at 8.8e4 iterations compared to 9.6e5 for uBM-Bonferroni. However, the estimate of multivariate ESS at 8.8e4 iterations is 4.7e4 samples, compared to a univariate ESS of 5.6e4 samples at 9.6e5 iterations.
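This example is convenient because the truth is available in closed form. A base-R sketch of the setup, computing V and Σ from the formulas above and simulating the chain (variable names are ours):

    # VAR(1) example: true stationary covariance V and asymptotic covariance Sigma
    p <- 5
    Phi <- diag(c(0.9, 0.5, 0.1, 0.1, 0.1))
    Omega <- 0.9^abs(outer(1:p, 1:p, "-"))      # AR(1) covariance, autocorrelation 0.9
    V <- matrix(solve(diag(p^2) - kronecker(Phi, Phi), c(Omega)), p, p)
    Sigma <- solve(diag(p) - Phi) %*% V + V %*% solve(diag(p) - t(Phi)) - V

    # Simulate the chain for 1e5 iterations
    set.seed(1)
    n <- 1e5
    Y <- matrix(0, n, p)
    R <- chol(Omega)                            # Omega = R'R, so z'R ~ N_p(0, Omega)
    for (t in 2:n) Y[t, ] <- drop(Phi %*% Y[t - 1, ]) + drop(rnorm(p) %*% R)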

Figure 5: VAR: ACF plots for Y^{(1)} and Y^{(3)} and the CCF plot between Y^{(1)} and Y^{(3)}, in addition to the trace plot for Y^{(1)}. The observed cross-correlation is completely ignored by univariate methods. Monte Carlo sample size = 10^5.

This is because the leading component Y^{(1)} mixes much slower than the other components and defines the behavior of the univariate ESS. A small study presented in Table 4 elaborates on this behavior. Over 100 replications of Monte Carlo sample sizes 10^5 and 10^6, we present the mean estimates of ESS using multivariate and univariate methods. The estimate of ESS for the first component is significantly smaller than for all other components, leading to a conservative univariate estimate of ESS.

4.3 Bayesian Lasso

Let y be a K × 1 response vector and X be a K × q matrix of predictors. We consider the following Bayesian lasso formulation of Park and Casella (2008):

    y | β, σ², τ² ~ N_K(Xβ, σ² I_K);
    β | σ², τ² ~ N_q(0, σ² D_τ), where D_τ = diag(τ_1², τ_2², ..., τ_q²);
    σ² ~ Inverse-Gamma(α, ξ);
    τ_j² ~iid Exponential(λ²/2) for j = 1, ..., q,

where λ, α, and ξ are fixed, and the Inverse-Gamma(a, b) density is proportional to x^{−a−1} e^{−b/x}. We use a deterministic scan Gibbs sampler to draw approximate samples from the posterior; see Khare and Hobert (2013) for a full description of the algorithm. Khare and Hobert (2013) showed that for K ≥ 3, this Gibbs sampler is geometrically ergodic for arbitrary q, X, and λ.

Figure 6: VAR: Joint 90% confidence region for the first two components of Y. The solid ellipse is generated using the mBM estimator, the dotted line using the uncorrected uBM estimator, and the dashed line using the uBM estimator corrected by Bonferroni. Monte Carlo sample size = 10^5.

We fit this model to the cookie dough dataset of Osborne et al. (1984). The data were collected to test the feasibility of near infra-red (NIR) spectroscopy for measuring the composition of biscuit dough pieces. There are 72 observations; the response variable is taken to be the amount of dry flour content measured, and the predictor variables are 25 measurements of spectral data spaced equally between 1100 and 2498 nanometers. Since we are interested in estimating the posterior mean for (β, τ², σ²), p = 51. The data are available in the R package ppls, and the Gibbs sampler is implemented in the function blasso in the R package monomvn. The truth was declared by averaging posterior means from 1000 parallel chains of length 1e6. We set n* = 2e4 and b_n = ⌊n^{1/3}⌋.

Table 3: VAR: Over 1000 replications, we present termination iterations, effective sample sizes at termination, and coverage probabilities at termination for each corresponding method. Standard errors are in parentheses.

Table 4: VAR: Effective sample size (ESS) estimated using the proposed multivariate method and the univariate method of Gong and Flegal (2015) for Monte Carlo sample sizes of n = 1e5 and 1e6, over 100 replications. Standard errors are in parentheses.

In Table 5 we present termination results from 1000 replications. With p = 51, the uncorrected univariate regions produce confidence regions with very low coverage probabilities. The uBM-Bonferroni and mBM methods provide competitive coverage probabilities at termination. However, termination for mBM is significantly earlier than for the univariate methods over all values of ε. For ε = 0.05 and 0.02 we observe zero standard error for termination using mBM, since termination is achieved at the same 10% increment over all 1000 replications. Thus the variability in those estimates is less than 10% of the size of the estimate.

Aside from the conservative multiple testing correction, delayed termination for the univariate methods is also due to conservative univariate estimates of ESS.

Table 5: Bayesian Lasso: Over 1000 replications, we present termination iterations, effective sample sizes at termination, and coverage probabilities at termination for each corresponding method. Standard errors are in parentheses.

Figure 7 shows the ESS using multivariate and univariate methods at termination over 1000 replications for ε = 0.02. The multivariate ESS at termination is more conservative than the univariate ESS at uBM-Bonferroni termination, but the number of iterations to termination is significantly lower for mBM.

4.4 Bayesian Dynamic Spatial-Temporal Model

Gelfand et al. (2005) propose a Bayesian hierarchical model for univariate and multivariate dynamic spatial data, viewing time as discrete and space as continuous. The methods in their paper have been implemented in the R package spBayes. We present a simpler version of the dynamic model as described by Finley et al. (2015).

Let s = 1, 2, ..., N_s index location sites and t = 1, 2, ..., N_t index time-points. Let the observed measurement at location s and time t be denoted by y_t(s). In addition, let x_t(s) be the q × 1 vector of predictors observed at location s and time t, and let β_t be the q × 1 vector of coefficients.

Figure 7: Bayesian Lasso: Plot of termination iteration and ESS for mBM, uBM-Bonferroni, and uBM for ε = 0.02. Replications = 1000.

For t = 1, 2, ..., N_t,

    y_t(s) = x_t(s)^T β_t + u_t(s) + ε_t(s),   ε_t(s) ~ind N(0, τ_t²);    (15)

    β_t = β_{t−1} + η_t,   η_t ~iid N(0, Σ_η);
    u_t(s) = u_{t−1}(s) + w_t(s),   w_t(s) ~ind GP(0, σ_t² ρ(·; φ_t)),    (16)

where GP(0, σ_t² ρ(·; φ_t)) denotes a spatial Gaussian process with covariance function σ_t² ρ(·; φ_t). Here, σ_t² denotes the spatial variance component and ρ(·; φ_t) is the correlation function with exponential decay. Equation (15) is referred to as the measurement equation, and ε_t(s) denotes the measurement error, assumed to be independent of location and time. Equation (16) contains the transition equations, which emulate the Markovian nature of dependence in time. To complete the Bayesian hierarchy, the following priors are assumed:

    β_0 ~ N(m_0, C_0) and u_0(s) ≡ 0;
    τ_t² ~ IG(a_τ, b_τ) and σ_t² ~ IG(a_s, b_s);

    Σ_η ~ IW(a_η, B_η) and φ_t ~ Unif(a_φ, b_φ),

where IW denotes the inverse-Wishart distribution, with density proportional to |Σ_η|^{−(a_η + q + 1)/2} e^{−tr(B_η Σ_η^{-1})/2}, and IG(a, b) is the inverse-gamma distribution, with density proportional to x^{−a−1} e^{−b/x}.

We fit the model to the NETemp dataset in the spBayes package. This dataset contains monthly temperature measurements from 356 weather stations on the east coast of the USA, collected starting in January 2000. The elevation of the weather stations is also available as a covariate. We choose a subset of the data with 10 weather stations for the year 2000 and fit the model with an intercept. The resulting posterior has p = 185 components. A conditional Metropolis-Hastings sampler is described in Gelfand et al. (2005) and implemented in the spDynLM function. Default hyperparameter settings were used. The rate of convergence for this sampler has not been studied; thus we do not know if the sampler is geometrically ergodic.

Our goal is to estimate the posterior expectation of θ = (β_t, u_t(s), σ_t², Σ_η, τ_t², φ_t). For the calculation of coverage probabilities, 1000 parallel runs of 2e6 MCMC samples were averaged and declared as the truth. We set b_n = ⌊n^{1/2}⌋ and n* = 5e4 so that a_n > p, ensuring positive definiteness of Σ_n.

Due to the Markovian transition equations in (16), the β_t and u_t exhibit a significant covariance structure in the posterior distribution. This is evidenced in Figure 8, where for Monte Carlo sample size n = 10^5 we present confidence regions for β_1^{(0)} and β_2^{(0)}, the intercept coefficients for the first and second months, and for u_1(1) and u_2(1), the additive spatial coefficients for the first two weather stations. The thin ellipses indicate that the principal direction of variation is due to the correlation between the components. This significant reduction in volume, along with the conservative Bonferroni correction (p = 185), results in increased delay in termination. For smaller values of ε it was not possible to store the MCMC output in memory on an 8-gigabyte machine using uBM-Bonferroni methods. As a result, in Table 6 the univariate methods could not be implemented for smaller ε values. For ε = 0.10, termination for mBM was at n* = 5e4 for every replication.

Figure 8: Bayesian Spatial: 90% confidence regions for β_1^{(0)} and β_2^{(0)} and for u_1(1) and u_2(1). Monte Carlo sample size = 10^5.

Figure 9: Bayesian Spatial: Plot of ε versus observed coverage probability for the mBM estimator over 1000 replications, with b_n = ⌊n^{1/2}⌋.

At these minimum iterations, the coverage probability for mBM is at 88%, whereas both univariate methods have far lower coverage probabilities: 0.62 for uBM-Bonferroni and lower still for uBM. The coverage probabilities for the uncorrected method are quite small since we are making 185 confidence regions simultaneously. In Figure 9, we illustrate the asymptotic validity of the confidence regions constructed using the relative standard deviation fixed-volume sequential stopping rule by presenting the observed coverage probabilities over 1000 replications for several values of ε.

Table 6: Bayesian Spatial: Over 1000 replications, we present termination iterations, effective sample sizes at termination, and coverage probabilities at termination for each corresponding method at 90% nominal levels. Standard errors are in parentheses.

5 Discussion

Multivariate analysis of MCMC output data has received little attention thus far. Seila (1982) and Chen and Seila (1987) built a framework for multivariate analysis of a Markov chain using regenerative simulation. Since establishing regenerative properties for a Markov chain requires a separate analysis for every problem and will not work well in component-wise Metropolis-Hastings samplers, the application of their work is limited. More recently, Vats et al. (2015) showed strong consistency of the multivariate spectral variance estimator (mSVE) of Σ, which could potentially substitute for the mBM estimator in applications to termination rules. However, outside of toy problems where p is small, the mSVE estimator is computationally demanding compared to the mBM estimator.

To compare these two estimators, we extend the VAR(1) example discussed in Section 4.2 to p = 50. Since in this case we know the true Σ, we assess the relative error in estimation, ‖Σ̂ − Σ‖_F / ‖Σ‖_F, where Σ̂ is either the mBM or mSVE estimator and ‖·‖_F is the Frobenius norm. Results are in Table 7. The mSVE estimator overall has smaller relative error, but as the Monte Carlo sample size increases its computation time is significantly higher than that of the mBM estimator.
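The relative error criterion is simple to compute in base R; a sketch, reusing mbm() and the p = 5 VAR(1) objects defined earlier (the Table 7 study uses p = 50):

    # Relative estimation error under the Frobenius norm, as in Table 7
    rel_error <- function(Sigma_hat, Sigma) {
      norm(Sigma_hat - Sigma, type = "F") / norm(Sigma, type = "F")
    }
    rel_error(mbm(Y, floor(nrow(Y)^(1 / 3))), Sigma)   # mBM error for the VAR(1) run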

Table 7: Relative error and computation time (in seconds) comparison between mSVE and mBM for a VAR(1) model with p = 50 over varying Monte Carlo sample sizes.

Also note that the relative error for both estimators decreases with an increase in Monte Carlo sample size. The mSVE and mBM estimators and the multivariate ESS have been implemented in the mcmcse R package (Flegal et al., 2015).

Through our examples in Section 4, we present a variety of settings where multivariate estimation leads to improved analysis when compared to univariate methods. The univariate methods provide a fundamental framework to build on, but are not equipped for analyzing multivariate output, even when the dimension of the problem is as small as 5. This is because there are two layers of conservative methods required to implement the relative standard deviation fixed-width sequential stopping rule: (i) the need for the termination rule to be satisfied for each component and (ii) a correction to adjust for multiple testing. As demonstrated in our examples, uncorrected methods have less than the desired coverage probabilities even for problems with small p. For large p, uncorrected methods provide confidence regions with coverage probabilities very close to zero. Thus, a correction is essential when using univariate methods.
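For users who would rather not hand-roll the sketches above, the same quantities are available through the mcmcse package; the call pattern below reflects our understanding of its interface and may differ across package versions.

    # mcmcse interface (assumed): mBM covariance estimate, multivariate ESS,
    # and the minimum-ESS threshold W_{p, alpha, eps}
    library(mcmcse)
    Sigma_n <- mcse.multi(Y, method = "bm")$cov   # mBM estimate of Sigma
    multiESS(Y)                                   # estimated multivariate ESS (14)
    minESS(p = 5, alpha = 0.05, eps = 0.05)       # threshold, about 8605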

References

Atkinson, Q. D., Gray, R. D., and Drummond, A. J. (2008). mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Molecular Biology and Evolution, 25(2).

Berkes, I. and Philipp, W. (1979). Approximation theorems for independent and weakly dependent random vectors. The Annals of Probability, 7.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Charnes, J. M. (1995). Analyzing multivariate output. In Proceedings of the 27th Conference on Winter Simulation. IEEE Computer Society.

Chen, D.-F. R. and Seila, A. F. (1987). Multivariate inference in stationary simulation using batch means. In Proceedings of the 19th Conference on Winter Simulation. ACM.

Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91.

Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press.

Damerdji, H. (1991). Strong consistency and other properties of the spectral variance estimator. Management Science, 37.

Damerdji, H. (1994). Strong consistency of the variance estimator in steady-state simulation output analysis. Mathematics of Operations Research, 19.

Damerdji, H. (1995). Mean-square consistency of the variance estimator in steady-state simulation output analysis. Operations Research, 43(2).

Dehling, H. and Philipp, W. (1982). Almost sure invariance principles for weakly dependent vector-valued random variables. The Annals of Probability, 10.

Drummond, A. J., Ho, S. Y., Phillips, M. J., Rambaut, A., et al. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology, 4(5):699.

Eberlein, E. (1986). On strong invariance principles under dependence assumptions. The Annals of Probability, 14.


More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Spatio-temporal precipitation modeling based on time-varying regressions

Spatio-temporal precipitation modeling based on time-varying regressions Spatio-temporal precipitation modeling based on time-varying regressions Oleg Makhnin Department of Mathematics New Mexico Tech Socorro, NM 87801 January 19, 2007 1 Abstract: A time-varying regression

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Univariate Spatial Data Hierarchical Modelling for Univariate Spatial Data Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Output analysis for Markov chain Monte Carlo simulations

Output analysis for Markov chain Monte Carlo simulations Chapter 1 Output analysis for Markov chain Monte Carlo simulations James M. Flegal and Galin L. Jones (October 12, 2009) 1.1 Introduction In obtaining simulation-based results, it is desirable to use estimation

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Andrew O. Finley Department of Forestry & Department of Geography, Michigan State University, Lansing

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 11 Markov Chain Monte Carlo cont. October 6, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university The two stage Gibbs sampler If the conditional distributions are easy to sample

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Andrew O. Finley 1 and Sudipto Banerjee 2 1 Department of Forestry & Department of Geography, Michigan

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Wrapped Gaussian processes: a short review and some new results

Wrapped Gaussian processes: a short review and some new results Wrapped Gaussian processes: a short review and some new results Giovanna Jona Lasinio 1, Gianluca Mastrantonio 2 and Alan Gelfand 3 1-Università Sapienza di Roma 2- Università RomaTRE 3- Duke University

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Riemann Manifold Methods in Bayesian Statistics

Riemann Manifold Methods in Bayesian Statistics Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration

Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration Block Gibbs sampling for Bayesian random effects models with improper priors: Convergence and regeneration Aixin Tan and James P. Hobert Department of Statistics, University of Florida May 4, 009 Abstract

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Bayesian Dynamic Linear Modelling for. Complex Computer Models

Bayesian Dynamic Linear Modelling for. Complex Computer Models Bayesian Dynamic Linear Modelling for Complex Computer Models Fei Liu, Liang Zhang, Mike West Abstract Computer models may have functional outputs. With no loss of generality, we assume that a single computer

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Environmentrics 00, 1 12 DOI: 10.1002/env.XXXX Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Regina Wu a and Cari G. Kaufman a Summary: Fitting a Bayesian model to spatial

More information

Bayesian spatial hierarchical modeling for temperature extremes

Bayesian spatial hierarchical modeling for temperature extremes Bayesian spatial hierarchical modeling for temperature extremes Indriati Bisono Dr. Andrew Robinson Dr. Aloke Phatak Mathematics and Statistics Department The University of Melbourne Maths, Informatics

More information

MALA versus Random Walk Metropolis Dootika Vats June 4, 2017

MALA versus Random Walk Metropolis Dootika Vats June 4, 2017 MALA versus Random Walk Metropolis Dootika Vats June 4, 2017 Introduction My research thus far has predominantly been on output analysis for Markov chain Monte Carlo. The examples on which I have implemented

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

A Fully Nonparametric Modeling Approach to. BNP Binary Regression A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

Nearest Neighbor Gaussian Processes for Large Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Stochastic process for macro

Stochastic process for macro Stochastic process for macro Tianxiao Zheng SAIF 1. Stochastic process The state of a system {X t } evolves probabilistically in time. The joint probability distribution is given by Pr(X t1, t 1 ; X t2,

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

Practical unbiased Monte Carlo for Uncertainty Quantification

Practical unbiased Monte Carlo for Uncertainty Quantification Practical unbiased Monte Carlo for Uncertainty Quantification Sergios Agapiou Department of Statistics, University of Warwick MiR@W day: Uncertainty in Complex Computer Models, 2nd February 2015, University

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

1 Random walks and data

1 Random walks and data Inference, Models and Simulation for Complex Systems CSCI 7-1 Lecture 7 15 September 11 Prof. Aaron Clauset 1 Random walks and data Supposeyou have some time-series data x 1,x,x 3,...,x T and you want

More information

Online Appendix for: A Bounded Model of Time Variation in Trend Inflation, NAIRU and the Phillips Curve

Online Appendix for: A Bounded Model of Time Variation in Trend Inflation, NAIRU and the Phillips Curve Online Appendix for: A Bounded Model of Time Variation in Trend Inflation, NAIRU and the Phillips Curve Joshua CC Chan Australian National University Gary Koop University of Strathclyde Simon M Potter

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

Recall that the AR(p) model is defined by the equation

Recall that the AR(p) model is defined by the equation Estimation of AR models Recall that the AR(p) model is defined by the equation X t = p φ j X t j + ɛ t j=1 where ɛ t are assumed independent and following a N(0, σ 2 ) distribution. Assume p is known and

More information

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and. Prediction in Spatial Models Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Bayesian Computation in Color-Magnitude Diagrams

Bayesian Computation in Color-Magnitude Diagrams Bayesian Computation in Color-Magnitude Diagrams SA, AA, PT, EE, MCMC and ASIS in CMDs Paul Baines Department of Statistics Harvard University October 19, 2009 Overview Motivation and Introduction Modelling

More information

Model Selection for Geostatistical Models

Model Selection for Geostatistical Models Model Selection for Geostatistical Models Richard A. Davis Colorado State University http://www.stat.colostate.edu/~rdavis/lectures Joint work with: Jennifer A. Hoeting, Colorado State University Andrew

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,

More information

Bayesian Dynamic Modeling for Space-time Data in R

Bayesian Dynamic Modeling for Space-time Data in R Bayesian Dynamic Modeling for Space-time Data in R Andrew O. Finley and Sudipto Banerjee September 5, 2014 We make use of several libraries in the following example session, including: ˆ library(fields)

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

Chapter 3 - Temporal processes

Chapter 3 - Temporal processes STK4150 - Intro 1 Chapter 3 - Temporal processes Odd Kolbjørnsen and Geir Storvik January 23 2017 STK4150 - Intro 2 Temporal processes Data collected over time Past, present, future, change Temporal aspect

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information