Multivariate Output Analysis for Markov Chain Monte Carlo
Dootika Vats, School of Statistics, University of Minnesota
James M. Flegal, Department of Statistics, University of California, Riverside
Galin L. Jones, School of Statistics, University of Minnesota

December 23, 2015

Abstract

Markov chain Monte Carlo (MCMC) produces a correlated sample in order to estimate expectations with respect to a target distribution. A fundamental question is when should sampling stop so that we have good estimates of the desired quantities? The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem. However, the multivariate nature of this Monte Carlo error has largely been ignored in the MCMC literature. We provide conditions for consistently estimating the asymptotic covariance matrix via the multivariate batch means estimator. Based on this result, we present a relative standard deviation fixed-volume sequential stopping rule for terminating simulation. We show that this stopping rule is asymptotically equivalent to terminating when the effective sample size for the MCMC sample is above a desired threshold, giving an intuitive and theoretically justified approach to implementing the proposed method. The finite sample properties of the proposed method are demonstrated in examples.

(Research supported by the National Science Foundation. Research supported by the National Institutes of Health and the National Science Foundation.)

1 Introduction

Markov chain Monte Carlo (MCMC) algorithms are used to estimate expectations with respect to a probability distribution when independent sampling is difficult. Typically, interest is in estimating a vector of quantities such as means, moments, and quantiles associated with a given probability distribution. However, analysis of the MCMC output routinely focuses on a univariate Markov chain central limit theorem (CLT). Thus standard convergence diagnostics, sequential stopping rules for termination, effective sample size definitions, and confidence intervals are univariate in nature. We present a methodological framework for multivariate analysis of MCMC output.

Let F be a probability distribution with support X and g : X → R^p be an F-integrable function such that θ := E_F g is of interest. If {X_t} is an F-invariant Harris recurrent Markov chain, set {Y_t} = {g(X_t)} and estimate θ with θ_n = n^{-1} Σ_{t=1}^n Y_t, since θ_n → θ with probability 1 as n → ∞. Finite sampling leads to an unknown Monte Carlo error, θ_n − θ, and estimating this error is essential to assessing the quality of estimation. If, for δ > 0, g has 2 + δ moments under F and {X_t} is geometrically ergodic, an approximate sampling distribution for the Monte Carlo error is available via a Markov chain CLT. That is, there exists a p × p positive definite symmetric matrix, Σ, such that, as n → ∞,

  √n (θ_n − θ) →d N_p(0, Σ).  (1)

Assessment of the Monte Carlo error rests on estimating Σ. This, however, can be challenging since

  Σ = Var_F(Y_1) + 2 Σ_{k=1}^∞ Cov_F(Y_1, Y_{1+k}).

We consider the multivariate batch means (mbm) estimator of Σ and establish conditions for its strong consistency. Using the mbm estimator we develop multivariate confidence regions, multivariate sequential termination rules, and multivariate effective sample size. Thus we provide a methodological framework for multivariate MCMC output analysis.

Univariate output analysis methods are motivated by the univariate CLT

  √n (θ_{n,i} − θ_i) →d N(0, σ_i²),  (2)

as n → ∞, where θ_{n,i} and θ_i are the ith components of θ_n and θ, respectively, and σ_i² is the ith diagonal element of Σ. Flegal and Jones (2010), Hobert et al. (2002), and Jones et al. (2006) propose strongly consistent estimators of σ_i² using spectral variance methods, regenerative simulation, and batch means methods, respectively.

Consistent estimators of the asymptotic variance in independent sampling are used to sequentially terminate simulation when the half-width of the confidence interval for θ_i is below a tolerance level set a priori. In MCMC, this idea was formalized by Jones et al. (2006) via the fixed-width sequential stopping rule. For a desired tolerance of ε, simulation is terminated the first time, for all components,

  t_* σ_{n,i} / √n + ε I(n < n*) + n^{-1} ≤ ε,  (3)

where σ_{n,i}² is a strongly consistent estimator of σ_i², t_* is an appropriate t-distribution quantile, and n* is a minimum simulation effort specified by the user. The fixed-width sequential stopping rule laid the foundation for termination based on quality of estimation rather than convergence of the Markov chain (for a review of convergence diagnostics see Cowles and Carlin, 1996). Fixed-width procedures are aimed at terminating simulation when the estimation is reliable in the sense that if the procedure is repeated again, the estimates will not be vastly different (Flegal et al., 2008).

Implementing the fixed-width sequential stopping rule at (3) requires careful analysis
for choosing ε for each θ_{n,i}. Since this can be challenging for large p, Flegal and Gong (2015) proposed relative fixed-width sequential stopping rules that terminate simulation relative to attributes of the estimation process. Specifically, they proposed a relative standard deviation fixed-width sequential stopping rule, where the simulation is terminated relative to the uncertainty in the target distribution. That is, if λ_{n,i}² is the sample variance for the ith component of {Y_t}, simulation is terminated the first time, for all components,

  t_* σ_{n,i} / √n + ε λ_{n,i} I(n < n*) + n^{-1} ≤ ε λ_{n,i}.  (4)

Jones et al. (2006) and Flegal and Gong (2015) showed that the confidence intervals created at termination using fixed-width and relative standard deviation fixed-width sequential stopping rules are asymptotically valid, in that the confidence intervals created at termination have the right coverage probability as ε → 0 when σ_{n,i}² → σ_i² with probability 1 as n → ∞. Thus, strong consistency of the estimators of σ_i² is vital for valid inference using sequential termination rules.

However, these termination rules are univariate in that simulation is terminated when all components of θ_n individually satisfy the termination rule. Clearly, termination is dictated by the component that mixes slowest, and cross-correlations between components are ignored. In addition, corrections for multiple testing result in conservative confidence intervals, leading to delayed termination.

We propose a multivariate approach to termination called the relative standard deviation fixed-volume sequential stopping rule. If Λ_n is the sample covariance matrix of {Y_t} and |·| denotes determinant, then simulation is terminated the first time

  (Volume of Confidence Region)^{1/p} + ε |Λ_n|^{1/2p} I(n < n*) + n^{-1} < ε |Λ_n|^{1/2p}.
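As a minimal sketch of how the univariate rule (4) would be checked in practice, the following Python helper (our own translation; the paper's computations are done in R, and the function name and quantile choice here are illustrative) evaluates the componentwise inequality:

```python
import numpy as np
from scipy import stats

def rel_sd_fixed_width_done(n, sigma_hat, lambda_hat, eps, n_star, alpha=0.05):
    """Check rule (4) for all components: terminate when, for every i,
    t * sigma_hat[i]/sqrt(n) + eps*lambda_hat[i]*I(n < n_star) + 1/n
        <= eps * lambda_hat[i]."""
    sigma_hat = np.asarray(sigma_hat)    # batch means std. error estimates
    lambda_hat = np.asarray(lambda_hat)  # sample std. deviations
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)  # illustrative quantile choice
    lhs = t * sigma_hat / np.sqrt(n) + eps * lambda_hat * (n < n_star) + 1.0 / n
    return bool(np.all(lhs <= eps * lambda_hat))
```

The indicator term forces the rule to fail before the minimum simulation effort n* is reached, exactly as in (3) and (4).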
Since the volume of the confidence region is proportional to the square root of the determinant of the estimate of Σ, an interpretation is that the simulation is terminated when the generalized variance of the Monte Carlo error is small relative to the estimated generalized variance with respect
to F; that is, the estimate of Σ is small compared to the estimate of Cov_F(Y_1). This is a multivariate generalization of the relative standard deviation fixed-width sequential stopping rule of Flegal and Gong (2015). We show that the relative standard deviation fixed-volume sequential stopping rule leads to asymptotically valid confidence regions. The bottleneck in demonstrating asymptotic validity is proposing estimators of Σ that are strongly consistent. To this end, we investigate the mbm estimator and provide conditions for strong consistency.

Another common way to terminate simulation is to continue until the effective sample size (ESS) is greater than some W, set a priori (see Drummond et al. (2006), Atkinson et al. (2008), and Giordano et al. (2015) for a few examples). ESS is the equivalent number of independent and identically distributed samples contained in the correlated sample. Terminating using ESS is intuitive since it is comparable to termination for independent sampling. Gong and Flegal (2015) showed that this method of termination is equivalent to terminating according to the relative standard deviation fixed-width stopping rule. However, we are unaware of any formal definition of multivariate ESS (mESS). Since the samples {Y_t} are vector-valued, it is common to calculate ESS for all components simultaneously. In Section 3.1, we introduce a definition of mESS and present a strongly consistent estimator, m̂ESS. We show that terminating according to the relative standard deviation fixed-volume sequential stopping rule is asymptotically equivalent to terminating when

  m̂ESS ≥ W_{p,α,ε},  (5)

where 1 − α represents the confidence level and ε is the precision specified in the relative standard deviation fixed-volume sequential stopping rule. Notice that W_{p,α,ε} is only a function of the dimension of the estimation problem and the specified precision. Thus the user can specify a precision, calculate W_{p,α,ε}, and simulate until (5) is satisfied.
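The threshold W_{p,α,ε} and the estimator m̂ESS are both cheap to compute. As a sketch (Python, with our own function names; the closed forms used here are the ones derived in Section 3.1):

```python
import numpy as np
from math import pi, gamma
from scipy import stats

def min_ess(p, alpha, eps):
    """W_{p,alpha,eps} via the chi-squared approximation:
    2^{2/p} * pi / (p * Gamma(p/2))^{2/p} * chi2_{1-alpha, p} / eps^2."""
    return (2 ** (2 / p) * pi / (p * gamma(p / 2)) ** (2 / p)
            * stats.chi2.ppf(1 - alpha, p) / eps ** 2)

def mess_hat(n, Lambda_n, Sigma_n):
    """Estimated multivariate ESS: n * (|Lambda_n| / |Sigma_n|)^{1/p}."""
    p = Lambda_n.shape[0]
    return n * (np.linalg.det(Lambda_n) / np.linalg.det(Sigma_n)) ** (1 / p)
```

A user would fix p, α, and ε, compute W = min_ess(p, alpha, eps) once, and then simulate until mess_hat exceeds W.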
This sequential process is the same as terminating when the relative standard deviation fixed-volume sequential stopping rule is satisfied.
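Putting the pieces together, the fixed-volume check itself can be sketched as follows (Python; a hypothetical helper of ours, using the confidence-region volume formula given later in Section 3 with a_n = n/b_n batches):

```python
import numpy as np
from math import pi, gamma
from scipy import stats

def fixed_volume_done(n, b, Sigma_n, Lambda_n, eps, n_star, alpha=0.10):
    """Relative standard deviation fixed-volume rule: terminate when
    Vol(C_alpha(n))^{1/p} + eps*|Lambda_n|^{1/2p}*I(n < n_star) + 1/n
        <= eps * |Lambda_n|^{1/2p}."""
    p = Sigma_n.shape[0]
    a = n // b                                    # number of batches
    F = stats.f.ppf(1 - alpha, p, a - p)
    vol = (2 * pi ** (p / 2) / (p * gamma(p / 2))          # unit-ball constant
           * (p * (a - 1) / (n * (a - p)) * F) ** (p / 2)  # squared radius^{p/2}
           * np.sqrt(np.linalg.det(Sigma_n)))
    rel = eps * np.linalg.det(Lambda_n) ** (1 / (2 * p))
    return bool(vol ** (1 / p) + rel * (n < n_star) + 1.0 / n <= rel)
```

In a sequential run, Sigma_n would be the multivariate batch means estimate of Section 2 and Lambda_n the sample covariance matrix of the output so far.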
Generally, our proposed mESS and relative standard deviation fixed-volume sequential stopping rule methods terminate earlier than simultaneous univariate methods, since termination is dictated by the overall Markov chain and not by the component that mixes the slowest. In addition, since multiple testing is no longer an issue, the volume of the confidence region is considerably smaller even in moderate-p problems. Using the inherent multivariate nature of the problem and acknowledging the existence of cross-correlation in estimating effective sample size leads to a more realistic understanding of the estimation process.

1.1 An Illustrative Example

For i = 1,…,K, let Y_i be a binary response variable and X_i = (x_{i1}, x_{i2},…, x_{i5}) be the observed predictors for the ith observation. Assume τ² is known,

  Y_i | X_i, β ~ind Bernoulli( 1 / (1 + e^{−X_i β}) ), and β ~ N_5(0, τ² I_5).  (6)

This simple hierarchical model results in an intractable posterior, F, on R^5. The dataset used is the logit dataset in the mcmc R package. The goal is to estimate the posterior mean of β, E_F β. Thus g here is the identity function mapping to R^5. We implement a random walk Metropolis–Hastings algorithm with a multivariate normal proposal distribution N_5(·, 0.35 I_5), where I_5 is the 5 × 5 identity matrix and the 0.35 scaling ensures an optimal acceptance probability as suggested by Roberts et al. (1997). We obtain an MCMC sample of size 10^5 and calculate the Monte Carlo estimate for E_F β. The starting value for β is a random draw from the prior distribution.

The covariance matrix Σ is estimated using the mbm estimator described in Section 2. We also implement the univariate batch means (ubm) methods described in Jones et al. (2006) to estimate σ_i², which capture the autocorrelation in each component while ignoring the cross-correlation. This cross-correlation is often significant, as exhibited in Figure 1, and can only be captured by multivariate methods like mbm. In Figure 2 we
[Figure 1: Autocorrelation plot for β_1, cross-correlation plot between β_1 and β_3, and trace plots for β_1 and β_3. The cross-correlation is ignored by univariate methods. Monte Carlo sample size = 10^5.]

present 90% confidence regions created using mbm and ubm for β_1 and β_3 (for the purposes of this figure, we treat this as a problem in 2 dimensions). The boxes were created using ubm, both uncorrected and corrected with Bonferroni, and the ellipse was constructed using the mbm estimator.

To assess the confidence regions, we verify their coverage probabilities over 1000 independent replications with Monte Carlo sample sizes in {10^4, 10^5, 10^6}. The true posterior mean, (0.5706, …), is obtained by taking the mean over 10^9 iterations. For each of the 1000 replications, it was noted whether the true posterior mean was present in the confidence region. In addition, the volume of the confidence region to the pth root was also observed. Table 1 summarizes the results. Note that though the uncorrected univariate methods provide the smallest confidence regions, their coverage probabilities are far from desirable. For a large enough Monte Carlo sample size, mbm produces 90% coverage probabilities with a systematically lower volume than ubm corrected with Bonferroni (ubm-bonferroni). This conclusion agrees with Figure 2, where, due to the elliptical structure, mbm ignores a large chunk of area
[Figure 2: Joint 90% confidence region for β_1 and β_3. The ellipse is generated using the mbm estimator, the dotted line using the uncorrected univariate BM estimator, and the dashed line using the univariate BM estimator corrected by Bonferroni. Monte Carlo sample size = 10^5.]

that is contained in the ubm-bonferroni confidence region.

This example shows that even simple MCMC problems produce complex dependence structures within and across components of the samples. Ignoring this dependence structure leads to an incomplete understanding of the estimation process. Not only do we gain more information about the Monte Carlo error using multivariate methods, but we avoid using conservative Bonferroni methods. It is evident that termination rules based on the volume of confidence regions would generally terminate earlier for multivariate methods while extracting more information from the samples.

The rest of the paper is organized as follows. In Section 2 we define the mbm estimator and provide conditions for strong consistency. In Section 3 we formally introduce a general class of relative fixed-volume sequential termination rules. In addition, we introduce a multivariate definition of ESS and relate it to sequential termination rules. In Section 4 we continue our implementation of the Bayesian logistic regression model and consider additional examples. We choose a vector autoregressive process of order 1, where convergence of the process can be manipulated. Specifically, we construct the process in such a way that one component mixes slowly, while the others are fairly well behaved. Such behavior is known to occur often in hierarchical models with priors
on the variance components. The next example is that of a Bayesian lasso, where the posterior is in 51 dimensions. We also implement our output analysis methods for a fairly complicated Bayesian dynamic spatial-temporal model. We conclude with a discussion in Section 5.

Table 1: Volume to the pth (p = 5) root and coverage probabilities for 90% confidence regions constructed using mbm, ubm uncorrected, and ubm corrected for Bonferroni. Replications = 1000; standard errors are indicated in parentheses.

  Volume to the pth root
  n     mbm            ubm-bonferroni   ubm
  1e4   … (7.94e-05)   … (9.23e-05)     … (6.48e-05)
  1e5   … (1.20e-05)   … (1.42e-05)     … (1.00e-05)
  1e6   … (1.70e-06)   … (2.30e-06)     … (1.60e-06)

  Coverage Probabilities
  1e4   … (0.0104)     … (0.0099)       … (0.0155)
  1e5   … (0.0103)     … (0.0090)       … (0.0156)
  1e6   … (0.0097)     … (0.0094)       … (0.0153)

2 Multivariate Batch Means Estimator

Let n = a_n b_n, where a_n is the number of batches and b_n is the batch size. For k = 0,…, a_n − 1, define

  Ȳ_k := b_n^{-1} Σ_{t=1}^{b_n} Y_{k b_n + t}.

Then Ȳ_k is the mean vector for batch k, and the mbm estimator of Σ is given by

  Σ_n = (b_n / (a_n − 1)) Σ_{k=0}^{a_n − 1} (Ȳ_k − θ_n)(Ȳ_k − θ_n)^T.  (7)

When Y_t is univariate, the batch means estimator has been well studied for MCMC problems (Flegal and Jones, 2010; Jones et al., 2006) and for steady-state simulations (Damerdji, 1991; Glynn and Iglehart, 1990; Glynn and Whitt, 1991). Glynn and Whitt (1991) showed that the batch means estimator cannot be consistent for fixed batch size b_n. Damerdji (1991, 1995), Jones et al. (2006), and Flegal and Jones (2010) established its asymptotic properties, including strong consistency and mean square consistency,
when both the batch size and number of batches increase with n. The multivariate extension in (7) was first introduced by Chen and Seila (1987). For steady-state simulation output, Charnes (1995) and Muñoz and Glynn (2001) studied confidence regions for θ based on the mbm; however, the theoretical properties of the mbm remain unexplored. In Theorem 1, we present conditions for strong consistency of Σ_n in estimating Σ for MCMC, but our results hold for more general processes. Our main assumption on the process is that of a strong invariance principle.

Condition 1. Let ‖·‖ denote the Euclidean norm and {B(t), t ≥ 0} be a p-dimensional multivariate Brownian motion. There exists a nonnegative increasing function γ on the positive integers, a sufficiently rich probability space Ω, an n_0 ∈ N, a p × p lower triangular matrix L, and a finite random variable D such that, for almost all ω ∈ Ω and for all n > n_0,

  ‖n(θ_n − θ) − L B(n)‖ < D γ(n)  w.p. 1 as n → ∞.  (8)

A strong law, CLT, and functional central limit theorem (FCLT) are consequences of Condition 1, which holds for independent processes (Berkes and Philipp, 1979; Einmahl, 1989; Zaitsev, 1998), martingale sequences (Eberlein, 1986), renewal processes (Horvath, 1984), and for φ-mixing and strongly mixing sequences (Dehling and Philipp, 1982; Kuelbs and Philipp, 1980). Using the results of Kuelbs and Philipp (1980), Vats et al. (2015) established Condition 1 with γ(n) = n^{1/2 − λ}, λ > 0, for polynomially ergodic Markov chains.

As in the univariate case, the batch means estimator can be consistent only if the batch size increases with n.

Condition 2. The batch size b_n is an integer sequence such that b_n → ∞ and n/b_n → ∞ as n → ∞, where b_n and n/b_n are monotonically increasing.

Condition 3. There exists a constant c ≥ 1 such that Σ_n (b_n n^{-1})^c < ∞.

In Theorem 1, we establish strong consistency of Σ_n. The proof is given in Appendix A.1.
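Equation (7) is straightforward to implement. The following Python sketch (our own helper, not taken from an existing package) computes the mbm estimate from an n × p matrix of output:

```python
import numpy as np

def mbm(Y, b):
    """Multivariate batch means estimator (7).
    Y: (n, p) array with rows Y_t = g(X_t); b: batch size b_n."""
    n, p = Y.shape
    a = n // b                       # number of batches a_n
    Y = Y[:a * b]                    # drop an incomplete final batch
    batch_means = Y.reshape(a, b, p).mean(axis=1)   # the vectors Ybar_k
    centered = batch_means - Y.mean(axis=0)          # subtract theta_n
    return b / (a - 1) * centered.T @ centered
```

For independent output the result should be close to the sample covariance; for correlated output its entries are inflated by the auto- and cross-correlations, which is exactly what (7) is designed to capture.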
Theorem 1. Let X be a geometrically ergodic Markov chain with invariant distribution F and let g be an F-measurable real-valued function such that E_F ‖g‖^{2+δ} < ∞. Then (8) holds with γ(n) = n^{1/2 − λ} for some λ > 0. Let Conditions 2 and 3 hold. If b_n^{-1/2} (log n)^{1/2} n^{1/2 − λ} → 0 as n → ∞, then Σ_n → Σ with probability 1 as n → ∞.

Remark 1. The theorem holds more generally outside the context of Markov chains for processes that satisfy Condition 1. The general statement of the theorem is provided in Appendix A.1.

Remark 2. The value of λ depends on the mixing time of the Markov chain. For slowly mixing Markov chains λ is closer to 0, and for fast mixing chains λ is closer to 1/2.

Remark 3. It is natural to consider b_n = ⌊n^ν⌋ for 0 < ν < 1. Then ν > 1 − 2λ is required to satisfy b_n^{-1/2} (log n)^{1/2} n^{1/2 − λ} → 0 as n → ∞. Hence, for a fast mixing process (λ close to 1/2), smaller batch sizes suffice, and similarly, slow mixing processes require larger batch sizes. This reinforces our intuition that higher correlation calls for larger batch sizes.

Remark 4. Since Σ is non-singular, Σ_n should be non-singular, which requires a_n > p. Thus larger Monte Carlo sample sizes will be required for higher dimensional estimation problems.

An immediate consequence of the strong consistency of the mbm estimator is the convergence of its eigenvalues. The proof follows from Theorem 2 in Vats et al. (2015).

Corollary 1. Let λ_1 > λ_2 > … > λ_p > 0 be the eigenvalues of Σ, and let λ̂_1 > λ̂_2 > … > λ̂_p be the eigenvalues of Σ_n. Then, under the conditions of Theorem 1, λ̂_k → λ_k w.p. 1 as n → ∞ for all 1 ≤ k ≤ p.

3 Termination Rules

We consider multivariate sequential termination rules that lead to asymptotically valid confidence regions. Let F_{1−α, p, a_n−p} denote the 1 − α quantile of a central F distribution with p numerator degrees of freedom and a_n − p denominator degrees of freedom. A
100(1 − α)% confidence region for θ is the set

  C_α(n) = { θ ∈ R^p : n (θ_n − θ)^T Σ_n^{-1} (θ_n − θ) < (p(a_n − 1)/(a_n − p)) F_{1−α, p, a_n−p} }.

Then C_α(n) forms an ellipsoid in p dimensions oriented along the directions of the eigenvectors of Σ_n. Letting |·| denote determinant, the volume of C_α(n) is

  Vol(C_α(n)) = (2 π^{p/2} / (p Γ(p/2))) ( (p(a_n − 1) / (n(a_n − p))) F_{1−α, p, a_n−p} )^{p/2} |Σ_n|^{1/2}.  (9)

Since p is fixed and, as n → ∞, Σ_n → Σ with probability 1, Vol(C_α(n)) → 0 with probability 1 as n → ∞. If ε > 0 and s(n) is a positive real-valued function defined on the positive integers, then a fixed-volume sequential stopping rule terminates the simulation at the random time

  T(ε) = inf{ n ≥ 0 : Vol(C_α(n))^{1/p} + s(n) ≤ ε }.  (10)

Glynn and Whitt (1992) showed that terminating at T(ε) yields confidence regions that are asymptotically valid in that, as ε → 0, Pr[θ ∈ C_α(T(ε))] → 1 − α if (a) an FCLT holds, (b) s(n) = o(n^{-1/2}) as n → ∞, and (c) Σ_n → Σ with probability 1 as n → ∞. An FCLT holds under Condition 1, and our Theorem 1 establishes consistency of mbm. Next, (b) will be satisfied if we set s(n) = ε I(n < n*) + n^{-1}, which ensures simulation does not terminate before n* iterations. Also, n* should satisfy a_n > p so that Σ_n is positive definite; recall Remark 4.

The sequential stopping rule (10) can be difficult to implement in practice since the choice of ε depends on the units of θ and has to be carefully chosen for every application. We consider a generalization of (10) which can be used more naturally and which, we will show, connects nicely to the idea of a multivariate effective sample size. Let K(Y, p) > 0 be an attribute of the estimation process and suppose K_n(Y, p) > 0 is an estimator of K(Y, p); for example, take θ = K(Y, p) and θ_n = K_n(Y, p). Set
s(n) = ε K_n(Y, p) I(n < n*) + n^{-1} and define

  T(ε) = inf{ n ≥ 0 : Vol(C_α(n))^{1/p} + s(n) ≤ ε K_n(Y, p) }.  (11)

The following result establishes asymptotic validity of this termination rule. The proof is provided in Appendix A.2.

Theorem 2. Suppose an FCLT holds. If K_n(Y, p) → K(Y, p) w.p. 1 and Σ_n → Σ w.p. 1 as n → ∞, then, as ε → 0, T(ε) → ∞ and Pr[θ ∈ C_α(T(ε))] → 1 − α.

Suppose K(Y, p) = |Λ|^{1/2p} := |Cov_F Y_1|^{1/2p} and, if Λ_n is the usual sample covariance matrix for {Y_t}, set K_n(Y, p) = |Λ_n|^{1/2p}. Then K_n(Y, p) → K(Y, p) with probability 1 as n → ∞, and T(ε) is the first time the uncertainty in estimation (measured via the volume of the confidence region) is an εth fraction of the uncertainty in the target distribution. The relative standard deviation fixed-volume sequential stopping rule is formalized as terminating at

  T_SD(ε) = inf{ n ≥ 0 : Vol(C_α(n))^{1/p} + ε |Λ_n|^{1/2p} I(n < n*) + n^{-1} ≤ ε |Λ_n|^{1/2p} }.  (12)

Note that Λ_n is positive definite as long as n is larger than p, so |Λ_n|^{1/2p} > 0.

3.1 Relation to Effective Sample Size

Kass et al. (1998), Liu (2008), and Robert and Casella (2013) define ESS for the ith component of the process as

  ESS_i = n / ( 1 + 2 Σ_{k=1}^∞ ρ(Y_1^{(i)}, Y_{1+k}^{(i)}) ),

where ρ(Y_1^{(i)}, Y_{1+k}^{(i)}) = ρ_i(k) is the lag-k correlation for the ith component of Y. It is challenging to estimate ρ_i(k) consistently, especially for larger k. Alternatively, Gong
and Flegal (2015) rewrite the above definition as

  ESS_i = n λ_i² / σ_i²,

where σ_i² is the ith diagonal element of Σ and λ_i² is the ith diagonal element of Λ. Using this formulation, a consistent estimate of ESS_i is obtained by using consistent estimators of λ_i² and σ_i², via the sample variance and univariate batch means estimators, respectively. The R package coda estimates ESS_i by using the sample variance to estimate λ_i², but estimates σ_i² by determining the spectral density at frequency zero of an approximating autoregressive process. Thus, it does not estimate σ_i² directly. The R package mcmcse uses the univariate batch means method to estimate σ_i². The estimate of ESS_i is then

  ÊSS_i = n λ̂_{n,i}² / σ_{n,i}²,

where λ̂_{n,i}² is the sample variance and σ_{n,i}² is the batch means estimator. Then ÊSS_i is a consistent estimate of the univariate ESS_i. However, in both of these packages, ESS_i is estimated for each component separately, and a conservative estimate of the overall ESS is taken to be the minimum of all ESS_i. This again leads to the situation where the estimate of ESS is dictated by the component that mixes the slowest, while ignoring the behavior of the other components.

A natural definition of multivariate ESS is

  mESS = n ( |Λ| / |Σ| )^{1/p}.  (13)

When p = 1, mESS reduces to the form of ESS presented above. In (13), we replace the concept of variance with generalized variance. For a random vector, the determinant of its covariance matrix is called its generalized variance. This was first introduced by Wilks (1932) as a univariate measure of spread for a multivariate distribution. Wilks
(1932) recommended the use of the pth root of the generalized variance. This was formalized by SenGupta (1987) as the standardized generalized variance, in order to compare variability over different dimensions. Due to the strong consistency of mbm established in Theorem 1, a strongly consistent estimator of mESS is

  m̂ESS = n ( |Λ_n| / |Σ_n| )^{1/p}.  (14)

Rearranging the defining inequality in (12) yields that, when n ≥ n*,

  m̂ESS ≥ (1/ε²) [ (2 π^{p/2} / (p Γ(p/2)))^{1/p} ( (p(a_n − 1)/(a_n − p)) F_{1−α, p, a_n−p} )^{1/2} + n^{-1/2} |Σ_n|^{-1/2p} ]².

Thus, the relative standard deviation fixed-volume sequential stopping rule is equivalent to terminating the first time m̂ESS is larger than a lower bound. This lower bound is a function of n and thus is difficult to determine before starting the simulation. However, as n → ∞, the scaled F distribution converges to a χ² distribution and the second term vanishes, leading to the approximation

  m̂ESS ≥ (2^{2/p} π / (p Γ(p/2))^{2/p}) (χ²_{1−α, p} / ε²).

Thus, in order to determine termination, a user can a priori decide to simulate until W effective samples are obtained. This choice of W will correspond to some α and ε and thus be an asymptotically valid termination rule. The choice of W should be made keeping in mind that the samples obtained are only approximately from the invariant distribution F, and this does not change in finite time. Thus the choice of W should be conservative to begin with and be bigger for larger p.

Example 1. Suppose p = 5 (as in the logistic regression setting of Section 1.1) and that we want a precision of ε = .05 (so the desired computational uncertainty is a .05th fraction of the uncertainty in the target distribution) for a 95% confidence region. This
requires m̂ESS of approximately 8605 or more. On the other hand, if we simulate until m̂ESS = 10000, we obtain a precision of ε ≈ 0.046.

3.2 Univariate Termination Rules

In this section we formally present the univariate termination rules we will implement in Section 4 as a comparison to the multivariate rules. Recall that, for the ith component of θ, θ_{n,i} is the Monte Carlo estimate for θ_i, and σ_i² is the asymptotic variance in the univariate CLT. Let σ_{n,i}² be the univariate batch means estimator of σ_i² and let λ_{n,i}² be the sample variance for the ith component. Common practice is to terminate simulation when all components satisfy a termination rule. We focus on the relative standard deviation fixed-width sequential stopping rules. Due to multiple testing, a Bonferroni correction is often used. We will refer to the uncorrected method as the ubm method and the corrected method as ubm-bonferroni.

To create 100(1 − α)% univariate uncorrected confidence intervals, the relative standard deviation fixed-width rule terminates the first time

  max_{i=1,…,p} { λ_{n,i}^{-1} ( t_{1−α/2, a_n−1} σ_{n,i} n^{-1/2} + ε λ_{n,i} I(n < n*) + n^{-1} ) } ≤ ε.

To create 100(1 − α)% univariate corrected confidence intervals, the relative standard deviation fixed-width rule terminates the first time

  max_{i=1,…,p} { λ_{n,i}^{-1} ( t_{1−α/(2p), a_n−1} σ_{n,i} n^{-1/2} + ε λ_{n,i} I(n < n*) + n^{-1} ) } ≤ ε.

Remark 5. If it is unclear which features of F are of interest a priori, i.e., g is not known before simulation, one can implement Scheffé's simultaneous confidence intervals. This method can also be used to create univariate confidence intervals for arbitrary contrasts a posteriori. Scheffé's simultaneous intervals are boxes on the coordinate axes that
bound the confidence ellipsoids. For all a^T θ, where a ∈ R^p, the intervals

  a^T θ_n ± ( (a^T Σ_n a) (p(a_n − 1) / (n(a_n − p))) F_{1−α, p, a_n−p} )^{1/2}

will have simultaneous coverage probability 1 − α. Since we generally are not interested in all possible linear combinations, Scheffé's intervals are often too conservative. In fact, if the number of confidence intervals is less than p, Scheffé's intervals are more conservative than Bonferroni corrected intervals.

4 Examples

In each of the following examples we present a target distribution F and a Markov chain with F as its invariant distribution, we specify g, and we are interested in estimating E_F g. We consider the finite sample performance (based on 1000 independent replications) of the relative standard deviation fixed-volume sequential stopping rules. In each case we use 90% confidence regions and various choices of ε, and we specify our choices of n* and b_n.

4.1 Bayesian Logistic Regression

We continue the Bayesian logistic regression example introduced in Section 1.1. Recall that a random walk Metropolis–Hastings algorithm was implemented to sample from the intractable posterior. We prove the chain is geometrically ergodic in Appendix A.3.

Theorem 3. The random walk based Metropolis–Hastings algorithm with invariant distribution given by the posterior from (6) is geometrically ergodic.

As a consequence of Theorem 3 and the fact that F has a moment generating function, an FCLT and thus a CLT holds. This ensures the validity of Theorems 1 and 2. Motivated by the ACF plot in Figure 1, b_n was set to n^{1/2} and a minimum simulation effort n* was imposed. For calculating coverage probabilities, we declare the truth to be the posterior mean from a long independent simulation. The results are presented in Table 2. As before,
Table 2: Bayesian logistic regression: over 1000 replications, we present termination iterations, effective sample size at termination, and coverage probabilities at termination for each corresponding method. Standard errors are in parentheses.

  Termination Iteration
  ε       mbm          ubm-bonferroni   ubm
  0.05    … (196)      … (391)          … (213)
  0.02    … (1158)     … (1880)         … (1036)
  0.01    … (1837)     … (7626)         … (3150)

  Effective Sample Size
  0.05    … (9)        9270 (13)        4643 (7)
  0.02    … (51)       … (65)           … (36)
  0.01    … (110)      … (271)          … (116)

  Coverage Probabilities
  0.05    … (0.0099)   … (0.0091)       … (0.0157)
  0.02    … (0.0097)   … (0.0090)       … (0.0155)
  0.01    … (0.0098)   … (0.0097)       … (0.0155)

the univariate uncorrected method has less than desired coverage probabilities. For ε = 0.02 and 0.01, the coverage probabilities for both the mbm and ubm-bonferroni regions are at 90%. However, termination for mbm is significantly earlier. This is a consequence of two things. First, accounting for the multivariate nature of the problem allows us to capture more information from a smaller sample; this is reflected in the estimates of ESS. Second, using a Bonferroni correction gives larger confidence regions, leading to delayed termination.

Recall that, due to Theorem 2, as ε decreases to zero, the coverage probability of confidence regions created at termination using the relative standard deviation fixed-volume sequential stopping rule converges to the nominal level. This is demonstrated in Figure 3, where we present the coverage probability over 1000 replications as ε decreases. Notice that the increase in coverage probabilities need not be monotonic due to the randomness of the underlying process.

To illustrate the convergence of the eigenvalues of the mbm estimator, we plot the condition number of the estimate under the spectral norm. That is, we report λ̂_max / λ̂_min
[Figure 3: Bayesian logistic regression; plot of coverage probability with confidence bands as ε decreases, at the 90% nominal rate. Replications = 1000.]

[Figure 4: Bayesian logistic regression; plot of the condition number under the spectral norm with confidence bands versus Monte Carlo sample size. Replications = 1000.]

over varying Monte Carlo sample sizes, and present the plot in Figure 4. In Figure 4, the condition number for the mbm estimator decreases with an increase in sample size, indicating better conditioning at large simulation sizes.

4.2 Vector Autoregressive Process

Consider the vector autoregressive process of order 1 (VAR(1)). For t = 1, 2, …,

  Y_t = Φ Y_{t−1} + ε_t,
where Y_t ∈ R^p, Φ is a p × p matrix, and ɛ_t iid ∼ N_p(0, Ω), where Ω is a p × p positive definite symmetric matrix. The matrix Φ determines the nature of the autocorrelation. This Markov chain has invariant distribution F = N_p(0, V), where V satisfies vec(V) = (I_{p²} − Φ ⊗ Φ)^{−1} vec(Ω), with ⊗ denoting the Kronecker product, and is geometrically ergodic when the spectral radius of Φ is less than 1 (Tjøstheim, 1990). Consider the goal of estimating the mean of F, E_F Y = 0, with Ȳ_n. Then

Σ = (I − Φ)^{−1} V + V (I − Φ^T)^{−1} − V.

Let p = 5, Φ = diag(0.9, 0.5, 0.1, 0.1, 0.1), and let Ω be the AR(1) covariance matrix with autocorrelation 0.9. Since the first eigenvalue of Φ is large, the first component mixes slowest. This component will dictate the primary orientation of the confidence ellipsoids.

We sample the process for 10^5 iterations and in Figure 5 present the autocorrelation (ACF) plots for Y^(1) and Y^(3) and the cross-correlation (CCF) plot between Y^(1) and Y^(3), in addition to the trace plot for Y^(1). Notice that Y^(1) has larger significant lags than Y^(3), and there is significant cross-correlation between Y^(1) and Y^(3). Figure 6 displays joint confidence regions for Y^(1) and Y^(3). Recall that the true mean is (0, 0); it is present in all three regions, but the ellipse produced by mbm has significantly smaller volume than the ubm boxes. The orientation of the ellipse is determined by the cross-correlations seen in Figure 5.

We set n = 1000, batch size b_n = ⌊n^{1/3}⌋, and ɛ ∈ {0.05, 0.02, 0.01}, and at termination of each method calculate the coverage probabilities and effective sample size. Results are presented in Table 3. Note that as ɛ decreases, termination time increases and coverage probabilities tend to the 90% nominal level for each method (due to asymptotic validity). Also note that the uncorrected univariate method produces confidence regions with undesirable coverage probabilities and thus is not of interest. Consider ɛ = 0.02 in Table 3.
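As a numerical check on the formulas above, the stationary covariance V and the asymptotic covariance Σ for this VAR(1) setup can be computed directly. This is a sketch in NumPy using the example's stated values; the variable names are ours.

```python
import numpy as np

# Setup from the example: p = 5, Phi = diag(.9, .5, .1, .1, .1),
# Omega an AR(1) covariance matrix with autocorrelation 0.9.
p = 5
phi = np.diag([0.9, 0.5, 0.1, 0.1, 0.1])
idx = np.arange(p)
omega = 0.9 ** np.abs(idx[:, None] - idx[None, :])

# Stationary covariance: vec(V) = (I_{p^2} - Phi (x) Phi)^{-1} vec(Omega)
V = np.linalg.solve(np.eye(p * p) - np.kron(phi, phi),
                    omega.ravel()).reshape(p, p)

# Sanity check: V solves the discrete Lyapunov equation V = Phi V Phi^T + Omega
assert np.allclose(V, phi @ V @ phi.T + omega)

# Asymptotic covariance: Sigma = (I - Phi)^{-1} V + V (I - Phi^T)^{-1} - V
I = np.eye(p)
Sigma = np.linalg.solve(I - phi, V) + V @ np.linalg.inv(I - phi.T) - V
```

Because Φ is diagonal here, the slow first component dominates: its entry of Σ is far larger than the others, which is exactly why the first component dictates the orientation of the confidence ellipsoids.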
Termination for mbm is at 8.8e4 iterations compared to 9.6e5 for ubm-Bonferroni. However, the estimate of multivariate ESS at 8.8e4 iterations is 4.7e4 samples, compared to a univariate ESS of 5.6e4 samples at 9.6e5 iterations. This is because the leading component Y^(1) mixes much more slowly than the other components and defines the behavior of the univariate ESS. A small study presented in Table 4 elaborates on this behavior. Over 100 replications of Monte Carlo sample sizes 10^5 and 10^6, we present the mean estimate of ESS using multivariate and univariate methods. The estimate of ESS for the first component is significantly smaller than for all other components, leading to a conservative univariate estimate of ESS.

Figure 5: VAR: ACF plots for Y^(1) and Y^(3) and the CCF plot between Y^(1) and Y^(3), in addition to the trace plot for Y^(1). The observed cross-correlation is completely ignored by univariate methods. Monte Carlo sample size is 10^5.

4.3 Bayesian Lasso

Let y be a K × 1 response vector and X be a K × q matrix of predictors. We consider the following Bayesian Lasso formulation of Park and Casella (2008):

y | β, σ², τ² ∼ N_K(Xβ, σ² I_K)
β | σ², τ² ∼ N_q(0, σ² D_τ), where D_τ = diag(τ₁², τ₂², ..., τ_q²)
Figure 6: VAR: Joint 90% confidence region for the first two components of Y. The solid ellipse is generated using the mbm estimator, the dotted box using the uncorrected ubm estimator, and the dashed box using the ubm estimator corrected by Bonferroni. Monte Carlo sample size is 10^5.

σ² ∼ Inverse-Gamma(α, ξ)
τ_j² iid ∼ Exponential(λ²/2) for j = 1, ..., q,

where λ, α, and ξ are fixed, and the Inverse-Gamma(a, b) density is proportional to x^{−a−1} e^{−b/x}. We use a deterministic scan Gibbs sampler to draw approximate samples from the posterior; see Khare and Hobert (2013) for a full description of the algorithm. Khare and Hobert (2013) showed that for K ≥ 3, this Gibbs sampler is geometrically ergodic for arbitrary q, X, and λ.

We fit this model to the cookie dough dataset of Osborne et al. (1984). The data were collected to test the feasibility of near infra-red (NIR) spectroscopy for measuring the composition of biscuit dough pieces. There are 72 observations; the response variable is taken as the amount of dry flour content measured, and the predictor variables are 25 measurements of spectral data spaced equally between 1100 and 2498 nanometers. Since we are interested in estimating the posterior mean of (β, τ², σ²), p = 51. The data are available in the R package ppls, and the Gibbs sampler is implemented in the function blasso in the R package monomvn. The truth was declared by averaging posterior means from 1000 parallel chains of length 1e6. We set n = 2e4 and batch size b_n = ⌊n^{1/3}⌋.
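The batch sizes quoted in these examples (b_n = ⌊n^{1/3}⌋ here, ⌊n^{1/2}⌋ later) enter the mbm estimator directly. The following is a minimal NumPy sketch of the standard batch means formula, not the mcmcse implementation; the function name and array layout are ours.

```python
import numpy as np

def mbm(chain, batch_exponent=1 / 3):
    """Multivariate batch means (mBM) estimate of the asymptotic covariance.

    chain: (n, p) array of MCMC output, one row per iteration.
    Batch size is b = floor(n^batch_exponent), matching b_n = n^{1/3} above.
    """
    n, p = chain.shape
    b = int(np.floor(n ** batch_exponent))  # batch size b_n
    a = n // b                              # number of batches a_n
    used = chain[: a * b]                   # drop any incomplete final batch
    batch_means = used.reshape(a, b, p).mean(axis=1)
    centered = batch_means - used.mean(axis=0)
    # Sigma_hat = b/(a-1) * sum_k (Ybar_k - Ybar)(Ybar_k - Ybar)^T
    return b * (centered.T @ centered) / (a - 1)
```

For the Lasso example, storing the Gibbs output as an n × 51 array and calling mbm on it yields a 51 × 51 estimate; note that positive definiteness requires more batches than components (a_n > p).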
Table 3: VAR: Over 1000 replications, we present termination iterations, effective sample size at termination, and coverage probabilities at termination for the mbm, ubm-Bonferroni, and ubm methods at ɛ = 0.05, 0.02, and 0.01. Standard errors are in parentheses.

Table 4: VAR: Effective sample size (ESS) estimated using the proposed multivariate method and the univariate method of Gong and Flegal (2015) for Monte Carlo sample sizes of n = 1e5 and 1e6 over 100 replications. Standard errors are in parentheses.

In Table 5 we present termination results from 1000 replications. With p = 51, the uncorrected univariate method produces confidence regions with very low coverage probabilities. The ubm-Bonferroni and mbm methods provide competitive coverage probabilities at termination. However, termination for mbm is significantly earlier than for the univariate methods over all values of ɛ. For ɛ = 0.05 and 0.02 we observe zero standard error for termination using mbm, since termination is achieved at the same 10% increment over all 1000 replications; thus the variability in those estimates is less than 10% of the size of the estimate. Aside from the conservative multiple testing correction, delayed termination for the univariate methods is also due to conservative univariate estimates of ESS. Figure 7
Table 5: Bayesian Lasso: Over 1000 replications, we present termination iterations, effective sample size at termination, and coverage probabilities at termination for the mbm, ubm-Bonferroni, and ubm methods at ɛ = 0.05, 0.02, and 0.01. Standard errors are in parentheses.

shows the ESS using multivariate and univariate methods at termination over 1000 replications for ɛ = 0.02. The multivariate ESS at termination is more conservative than the univariate ESS at ubm-Bonferroni termination, but the number of iterations to termination is significantly lower for mbm.

4.4 Bayesian Dynamic Spatial-Temporal Model

Gelfand et al. (2005) propose a Bayesian hierarchical model for univariate and multivariate dynamic spatial data, viewing time as discrete and space as continuous. The methods in their paper are implemented in the R package spBayes. We present a simpler version of the dynamic model, as described by Finley et al. (2015). Let s = 1, 2, ..., N_s index location sites and t = 1, 2, ..., N_t index time points. Let the observed measurement at location s and time t be denoted by y_t(s). In addition, let x_t(s) be the q × 1 vector of predictors observed at location s and time t, and let β_t be the
Figure 7: Bayesian Lasso: Plot of termination iteration and ESS for mbm, ubm-Bonferroni, and ubm for ɛ = 0.02. Replications = 1000.

q × 1 vector of coefficients. For t = 1, 2, ..., N_t,

y_t(s) = x_t(s)^T β_t + u_t(s) + ɛ_t(s),   ɛ_t(s) ind ∼ N(0, τ_t²);   (15)
β_t = β_{t−1} + η_t,   η_t iid ∼ N(0, Σ_η);
u_t(s) = u_{t−1}(s) + w_t(s),   w_t(s) ind ∼ GP(0, σ_t² ρ(·; φ_t)),   (16)

where GP(0, σ_t² ρ(·; φ_t)) denotes a spatial Gaussian process with covariance function σ_t² ρ(·; φ_t). Here, σ_t² denotes the spatial variance component and ρ(·; φ_t) is the correlation function with exponential decay. Equation (15) is referred to as the measurement equation, and ɛ_t(s) denotes the measurement error, assumed to be independent over location and time. Equation (16) contains the transition equations, which capture the Markovian nature of the dependence in time. To complete the Bayesian hierarchy, the following priors are assumed:

β_0 ∼ N(m_0, C_0) and u_0(s) ≡ 0;
τ_t² ∼ IG(a_τ, b_τ) and σ_t² ∼ IG(a_s, b_s);
Σ_η ∼ IW(a_η, B_η) and φ_t ∼ Unif(a_φ, b_φ),

where IW denotes the inverse-Wishart distribution with density proportional to |Σ_η|^{−(a_η + q + 1)/2} e^{−tr(B_η Σ_η^{−1})/2}, and IG(a, b) is the inverse-Gamma distribution with density proportional to x^{−a−1} e^{−b/x}.

We fit the model to the NETemp dataset in the spBayes package. This dataset contains monthly temperature measurements from 356 weather stations on the east coast of the USA, collected from January 2000 to December. The elevation of the weather stations is also available as a covariate. We choose a subset of the data with 10 weather stations for the year 2000, and fit the model with an intercept. The resulting posterior has p = 185 components. A conditional Metropolis-Hastings sampler is described in Gelfand et al. (2005) and implemented in the spDynLM function. Default hyperparameter settings were used. The rate of convergence of this sampler has not been studied; thus we do not know whether the sampler is geometrically ergodic.

Our goal is to estimate the posterior expectation of θ = (β_t, u_t(s), σ_t², Σ_η, τ_t², φ_t). For the calculation of coverage probabilities, 1000 parallel runs of length 2e6 were averaged and declared as the truth. We set batch size b_n = ⌊n^{1/2}⌋ and n = 5e4 so that the number of batches satisfies a_n > p, ensuring positive definiteness of Σ_n.

Due to the Markovian transition equations in (16), the β_t and u_t exhibit significant covariance structure in the posterior distribution. This is evident in Figure 8, where, for Monte Carlo sample size n = 10^5, we present confidence regions for β_1^(0) and β_2^(0) (the intercept coefficients for the first and second months) and for u_1(1) and u_2(1) (the additive spatial coefficients for the first two weather stations). The thin ellipses indicate that the principal direction of variation is due to the correlation between the components. This significant reduction in volume, along with the conservative Bonferroni correction (p = 185), results in increased delay in termination.
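The multivariate ESS used throughout combines the sample covariance Λ_n with the mbm estimate Σ_n through their determinants. Assuming the definition ESS = n (det Λ / det Σ)^{1/p}, as implemented in the mcmcse package, a self-contained sketch looks as follows; the function name is ours.

```python
import numpy as np

def multivariate_ess(chain, batch_exponent=1 / 2):
    """Multivariate effective sample size: n * (det(Lambda)/det(Sigma))^{1/p},
    with Lambda the sample covariance and Sigma the mBM estimate
    (batch size b = floor(n^batch_exponent), as in this example)."""
    n, p = chain.shape
    lam = np.cov(chain, rowvar=False)
    b = int(np.floor(n ** batch_exponent))
    a = n // b
    used = chain[: a * b]
    batch_means = used.reshape(a, b, p).mean(axis=1)
    centered = batch_means - used.mean(axis=0)
    sigma = b * (centered.T @ centered) / (a - 1)
    # log-determinants avoid overflow/underflow when p is large (here p = 185)
    ld_lam = np.linalg.slogdet(lam)[1]
    ld_sig = np.linalg.slogdet(sigma)[1]
    return n * np.exp((ld_lam - ld_sig) / p)
```

For a chain with no serial correlation, Σ ≈ Λ and the estimate is close to n; strong positive autocorrelation inflates det Σ relative to det Λ and drives the ESS down.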
For smaller values of ɛ it was not possible to store the MCMC output in memory on an 8-gigabyte machine using the ubm-Bonferroni method. As a result, in Table 6 the univariate methods could not be implemented for smaller ɛ values. For ɛ = 0.10, termination for mbm was at n = 5e4 for every replication. At
Figure 8: Bayesian Spatial: 90% confidence regions for β_1^(0) and β_2^(0), and for u_1(1) and u_2(1). Monte Carlo sample size = 10^5.

Figure 9: Bayesian Spatial: Plot of ɛ versus observed coverage probability for the mbm estimator over 1000 replications with b_n = ⌊n^{1/2}⌋.

these minimum iterations, the coverage probability for mbm is at 88%, whereas both univariate methods have far lower coverage probabilities: 0.62 for ubm-Bonferroni and lower still for ubm. The coverage probabilities for the uncorrected method are quite small since we are making 185 confidence regions simultaneously. In Figure 9, we illustrate the asymptotic validity of the confidence regions constructed using the relative standard deviation fixed-volume sequential stopping rule by presenting the observed coverage probabilities over 1000 replications for several values of ɛ.
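As the abstract notes, the relative standard deviation fixed-volume rule is asymptotically equivalent to terminating once the multivariate ESS exceeds a lower bound determined by p, the confidence level, and ɛ. A sketch of that bound, assuming the form used by the mcmcse package; the caller supplies the chi-squared quantile (e.g. from a table) so that only the standard library is needed.

```python
from math import exp, lgamma, log, pi

def min_ess(p, eps, chi2_quantile):
    """Lower bound on multivariate ESS for termination at precision eps:
        (2^{2/p} * pi / (p * Gamma(p/2))^{2/p}) * chi2_quantile / eps**2,
    where chi2_quantile is the (1 - alpha) quantile of a chi-squared
    distribution with p degrees of freedom.
    """
    # work on the log scale; (p * Gamma(p/2))^{2/p} overflows for large p
    log_w = (2 / p) * log(2) + log(pi) - (2 / p) * (log(p) + lgamma(p / 2))
    return exp(log_w) * chi2_quantile / eps**2
```

For example, with p = 1, α = 0.05, and ɛ = 0.05 (quantile χ²_{0.95,1} ≈ 3.84) this gives roughly 6,100 effective samples; one then simulates until the estimated multivariate ESS exceeds the bound.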
Table 6: Bayesian Spatial: Over 1000 replications, we present termination iteration, effective sample size at termination, and coverage probabilities at termination for the mbm, ubm-Bonferroni, and ubm methods at the 90% nominal level. Standard errors are in parentheses.

5 Discussion

Multivariate analysis of MCMC output data has received little attention thus far. Seila (1982) and Chen and Seila (1987) built a framework for multivariate analysis of a Markov chain using regenerative simulation. Since establishing regenerative properties for a Markov chain requires a separate analysis for every problem, and will not work well for component-wise Metropolis-Hastings samplers, the application of their work is limited. More recently, Vats et al. (2015) showed strong consistency of the multivariate spectral variance (msve) estimators of Σ, which could potentially substitute for the mbm estimator in applications to termination rules. However, outside of toy problems where p is small, the msve estimator is computationally demanding compared to the mbm estimator.

To compare these two estimators, we extend the VAR(1) example discussed in Section 4.2 to p = 50. Since in this case we know the true Σ, we assess the relative error in estimation, ‖Σ̂ − Σ‖_F / ‖Σ‖_F, where Σ̂ is either the mbm or msve estimator and ‖·‖_F is the Frobenius norm. Results are in Table 7. The msve estimator overall has smaller relative error, but as the Monte Carlo sample size increases, its computation
Table 7: Relative error and computation time (in seconds) comparison between the msve and mbm estimators for a VAR(1) model with p = 50, over varying Monte Carlo sample sizes.

time is significantly higher than for the mbm estimator. Also note that the relative error of both estimators decreases as the Monte Carlo sample size increases. The msve and mbm estimators and the multivariate ESS have been implemented in the mcmcse R package (Flegal et al., 2015).

Through our examples in Section 4, we present a variety of settings where multivariate estimation leads to improved analysis when compared to univariate methods. The univariate methods provide a fundamental framework to build on, but are not equipped for analyzing multivariate output, even when the dimension of the problem is as small as 5. This is because two layers of conservative methods are required to implement the relative standard deviation fixed-volume sequential stopping rule with univariate estimators: (i) the termination rule must be satisfied for each component, and (ii) a correction is needed to adjust for multiple testing. As demonstrated in our examples, uncorrected methods have less than the desired coverage probabilities even for problems with small p. For large p, uncorrected methods provide confidence regions with coverage probabilities very close to zero. Thus, a correction is essential when using univariate methods.

References

Atkinson, Q. D., Gray, R. D., and Drummond, A. J. (2008). mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Molecular Biology and Evolution, 25(2).
Berkes, I. and Philipp, W. (1979). Approximation theorems for independent and weakly dependent random vectors. The Annals of Probability, 7.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Charnes, J. M. (1995). Analyzing multivariate output. In Proceedings of the 27th Conference on Winter Simulation. IEEE Computer Society.

Chen, D.-F. R. and Seila, A. F. (1987). Multivariate inference in stationary simulation using batch means. In Proceedings of the 19th Conference on Winter Simulation. ACM.

Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91.

Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press.

Damerdji, H. (1991). Strong consistency and other properties of the spectral variance estimator. Management Science, 37.

Damerdji, H. (1994). Strong consistency of the variance estimator in steady-state simulation output analysis. Mathematics of Operations Research, 19.

Damerdji, H. (1995). Mean-square consistency of the variance estimator in steady-state simulation output analysis. Operations Research, 43(2).

Dehling, H. and Philipp, W. (1982). Almost sure invariance principles for weakly dependent vector-valued random variables. The Annals of Probability, 10.

Drummond, A. J., Ho, S. Y., Phillips, M. J., Rambaut, A., et al. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology, 4(5):699.

Eberlein, E. (1986). On strong invariance principles under dependence assumptions. The Annals of Probability, 14.
Bayesian spatial hierarchical modeling for temperature extremes Indriati Bisono Dr. Andrew Robinson Dr. Aloke Phatak Mathematics and Statistics Department The University of Melbourne Maths, Informatics
More informationMALA versus Random Walk Metropolis Dootika Vats June 4, 2017
MALA versus Random Walk Metropolis Dootika Vats June 4, 2017 Introduction My research thus far has predominantly been on output analysis for Markov chain Monte Carlo. The examples on which I have implemented
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationBayesian Methods in Multilevel Regression
Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design
More informationA Fully Nonparametric Modeling Approach to. BNP Binary Regression
A Fully Nonparametric Modeling Approach to Binary Regression Maria Department of Applied Mathematics and Statistics University of California, Santa Cruz SBIES, April 27-28, 2012 Outline 1 2 3 Simulation
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More informationHierarchical Modeling for Univariate Spatial Data
Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationLong-Run Covariability
Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationStochastic process for macro
Stochastic process for macro Tianxiao Zheng SAIF 1. Stochastic process The state of a system {X t } evolves probabilistically in time. The joint probability distribution is given by Pr(X t1, t 1 ; X t2,
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationA Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data
A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine
More informationPractical unbiased Monte Carlo for Uncertainty Quantification
Practical unbiased Monte Carlo for Uncertainty Quantification Sergios Agapiou Department of Statistics, University of Warwick MiR@W day: Uncertainty in Complex Computer Models, 2nd February 2015, University
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More information1 Random walks and data
Inference, Models and Simulation for Complex Systems CSCI 7-1 Lecture 7 15 September 11 Prof. Aaron Clauset 1 Random walks and data Supposeyou have some time-series data x 1,x,x 3,...,x T and you want
More informationOnline Appendix for: A Bounded Model of Time Variation in Trend Inflation, NAIRU and the Phillips Curve
Online Appendix for: A Bounded Model of Time Variation in Trend Inflation, NAIRU and the Phillips Curve Joshua CC Chan Australian National University Gary Koop University of Strathclyde Simon M Potter
More informationExtreme Value Analysis and Spatial Extremes
Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models
More informationStat 451 Lecture Notes Monte Carlo Integration
Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:
More informationRecall that the AR(p) model is defined by the equation
Estimation of AR models Recall that the AR(p) model is defined by the equation X t = p φ j X t j + ɛ t j=1 where ɛ t are assumed independent and following a N(0, σ 2 ) distribution. Assume p is known and
More informationComparing Non-informative Priors for Estimation and. Prediction in Spatial Models
Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Vigre Semester Report by: Regina Wu Advisor: Cari Kaufman January 31, 2010 1 Introduction Gaussian random fields with specified
More informationIntroduction to Bayesian methods in inverse problems
Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction
More informationKernel adaptive Sequential Monte Carlo
Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationBayesian Computation in Color-Magnitude Diagrams
Bayesian Computation in Color-Magnitude Diagrams SA, AA, PT, EE, MCMC and ASIS in CMDs Paul Baines Department of Statistics Harvard University October 19, 2009 Overview Motivation and Introduction Modelling
More informationModel Selection for Geostatistical Models
Model Selection for Geostatistical Models Richard A. Davis Colorado State University http://www.stat.colostate.edu/~rdavis/lectures Joint work with: Jennifer A. Hoeting, Colorado State University Andrew
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,
More informationBayesian Dynamic Modeling for Space-time Data in R
Bayesian Dynamic Modeling for Space-time Data in R Andrew O. Finley and Sudipto Banerjee September 5, 2014 We make use of several libraries in the following example session, including: ˆ library(fields)
More informationarxiv: v1 [stat.co] 18 Feb 2012
A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania
More informationChapter 3 - Temporal processes
STK4150 - Intro 1 Chapter 3 - Temporal processes Odd Kolbjørnsen and Geir Storvik January 23 2017 STK4150 - Intro 2 Temporal processes Data collected over time Past, present, future, change Temporal aspect
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS
ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More information