Robust Deviance Information Criterion for Latent Variable Models


Yong Li (Renmin University of China), Tao Zeng (Singapore Management University), Jun Yu (Singapore Management University)

February 15, 2014

Abstract

It is shown in this paper that the data augmentation technique undermines the theoretical underpinnings of the deviance information criterion (DIC), a widely used information criterion for Bayesian model comparison, although it facilitates parameter estimation for latent variable models via Markov chain Monte Carlo (MCMC) simulation. Data augmentation makes the likelihood function non-regular and hence invalidates the standard asymptotic arguments. A robust form of DIC, denoted RDIC, is advocated for Bayesian comparison of latent variable models. RDIC is shown to be a good approximation to DIC without data augmentation. While the latter quantity is difficult to compute, the expectation-maximization (EM) algorithm facilitates the computation of RDIC when the MCMC output is available. Moreover, RDIC is robust to nonlinear transformations of latent variables and to distributional representations of model specification. The proposed approach is applied to several popular models in economics and finance. While DIC is very sensitive to nonlinear transformations of latent variables in these models, RDIC is robust to these transformations. As a result, substantial discrepancies are found between DIC and RDIC.

JEL classification: C11, C12, G12

Keywords: AIC; DIC; EM Algorithm; Latent variable models; Markov Chain Monte Carlo

We wish to thank Peter Phillips and David Spiegelhalter for their helpful comments. Yong Li: Hanqing Advanced Institute of Economics and Finance, Renmin University of China, Beijing, P.R. China. Tao Zeng: School of Economics and Sim Kee Boon Institute for Financial Economics, Singapore Management University, 90 Stamford Road, Singapore. Jun Yu: School of Economics and Lee Kong Chian School of Business, Singapore Management University; email for Jun Yu: yujun@smu.edu.sg. Yu thanks the Singapore Ministry of Education for Academic Research Fund under grant number MOE2011-T...

1 Introduction

One of the most important developments in the Bayesian literature in recent years is arguably the deviance information criterion (DIC) of Spiegelhalter et al. (2002).

DIC is a Bayesian version of the well-known Akaike information criterion (AIC) of Akaike (1973). Like AIC, it trades off a measure of model adequacy against a measure of complexity and is concerned with how replicate data predict the observed data. DIC is constructed from the posterior distribution of the log-likelihood, or the deviance, and has several desirable features. First, DIC is simple to calculate when the likelihood function is available in closed form and the posterior distributions of the models are obtained by Markov chain Monte Carlo (MCMC) simulation. Second, it is applicable to a wide range of statistical models. Third, unlike Bayes factors (BFs), it is not subject to the Jeffreys-Lindley paradox.

An important class of models in economics and finance involves latent variables. Latent variables have figured prominently in consumption decisions, investment decisions, labor force participation, the conduct of monetary policy, indices of economic activity, inflation dynamics, and other economic, business and financial activities and decisions. For example, one important class of latent variable models is the class of state space models, in which the state variable is latent. It provides a unified methodology for treating a wide range of problems in time series analysis. Another example can be found in the values of stocks, bonds, options, futures, and derivatives, which are often determined by a small number of factors. Often these factors, such as the level, the slope and the curvature in the term structure of interest rates, are not observed. In macroeconomics, a well-known recent example of latent variable models is the dynamic stochastic general equilibrium (DSGE) model. On the basis of macroeconomic theory, the DSGE model attempts to explain aggregate economic phenomena by taking into account the fact that the economy is affected by some structural innovations. The DSGE model can be solved as a rational expectation system in the percentage deviations of variables from their steady states, which are latent (An and Schorfheide 2007; DeJong and Dave 2007). In microeconometrics, many discrete choice models and panel data models involve unobserved variables in order to capture unobserved heterogeneity across economic entities (Stern).

For latent variable models, Bayesian methods via MCMC simulation have proven to be a powerful alternative to frequentist methods for estimating model parameters. In particular, the data augmentation strategy proposed by Tanner and Wong (1987), which expands the parameter space by treating the latent variables as additional model parameters, has been found very useful for simplifying the MCMC computation of posterior distributions. This simplification is achieved because data augmentation leads to a closed-form expression for the likelihood function.

Comparing alternative latent variable models in the Bayesian paradigm is a daunting and yet important task. The gold standard for Bayesian model comparison is to compute BFs, which basically compare the marginal likelihoods of alternative models (Kass and Raftery 1995). Several interesting developments have been made in recent years for computing the marginal likelihood from the MCMC output; see, for example, Chib (1995) and Chib and Jeliazkov (2001). While these methods are very general and widely applicable, for latent variable models

they are difficult to use because the marginal likelihood may be hard to calculate. In addition, BFs cannot be used under improper priors and are subject to the Jeffreys-Lindley paradox. Given that DIC is simple to calculate from the MCMC output with the data augmentation technique, and that data augmentation is often used for Bayesian parameter estimation, DIC has been widely used for comparing alternative latent variable models; see, for example, Berg et al. (2004) and Huang and Yu (2010).

The first contribution of this paper is that we argue DIC has to be used with care in the context of latent variable models. In particular, we believe DIC, in the way it is commonly implemented in practice, has some conceptual and practical problems. First, DIC requires a concrete focus, which is often not easily identified in practice. If the focus cannot be identified, using DIC violates the likelihood principle; see Gelfand and Trevisani (2002). Second, DIC is not robust to apparently innocuous transformations and distributional representations. This problem is made worse by the data augmentation technique for latent variable models. Data augmentation greatly inflates the number of parameters and hence the effective number of parameters used in DIC is very sensitive to transformations and distributional representations. The details will be explained in Section 3. Finally, DIC requires that the likelihood function have a closed-form expression for it to be computationally operational. For latent variable models, this is achieved by data augmentation and, as a consequence, DIC opens up to possible variations. It is unclear which variation should be used in practice; see Celeux et al. (2006) for further discussion of this problem.

In this paper we argue that although data augmentation leads to a likelihood function in closed form and greatly facilitates parameter estimation, DIC should NOT be calculated based on the new likelihood associated with data augmentation. The reason is that data augmentation makes the likelihood function non-regular and hence invalidates the standard asymptotic arguments. Consequently, it undermines the theoretical underpinnings of DIC. The source of the problem is data augmentation. With data augmentation, a closed-form expression for the likelihood is ensured and it is easy to compute DIC, but the asymptotic justification of DIC is invalid. Without data augmentation, the likelihood function does not have a closed-form expression and hence DIC is much harder to compute for latent variable models, although it is asymptotically justified.

The second contribution of this paper is that we advocate the use of a robust version of DIC, denoted RDIC, for Bayesian comparison of latent variable models. It is shown that RDIC is a good approximation to DIC without data augmentation and hence is theoretically justified. We then show that the expectation-maximization (EM) algorithm facilitates the computation of RDIC for latent variable models when the MCMC output is available. Moreover, RDIC is robust to nonlinear transformations of latent variables and to distributional representations of model specification.

The advantages of the proposed approach are illustrated using two popular models in

economics and finance: a class of dynamic factor models and a class of stochastic volatility models. It is shown that DIC is very sensitive to nonlinear transformations of latent variables in these models, whereas RDIC is robust to these transformations. As a result, substantial discrepancies are found between DIC and RDIC.

The paper is organized as follows. In Section 2, the latent variable models are introduced, and the Bayesian estimation method with data augmentation and the EM algorithm are reviewed. Section 3 reviews DIC, introduces and justifies RDIC for latent variable models, and discusses how to compute RDIC from the MCMC output. Section 4 illustrates the method using models from economics and finance. Section 5 concludes the paper. The Appendix collects the proofs of the theoretical results in the paper.

2 Latent Variable Models, EM Algorithm and MCMC

Let y = (y_1, y_2, ..., y_n) denote the observed variables and z = (z_1, z_2, ..., z_n) the latent variables. The latent variable model is indexed by a set of P parameters, θ = (θ_1, ..., θ_P). Let p(y|θ) be the likelihood function of the observed data (denoted the observed-data likelihood), and p(y, z|θ) be the complete-data likelihood function. The relationship between the two functions is:

p(y|θ) = ∫ p(y, z|θ) dz.   (1)

In many cases, the integral does not have a closed-form solution. Consequently, statistical inferences, such as estimation and model comparison, are difficult to make. In the literature, maximum likelihood (ML) analysis using the EM algorithm and Bayesian analysis using MCMC are two popular approaches for carrying out statistical inference in latent variable models.

2.1 Maximum likelihood via the EM algorithm

The EM algorithm is an iterative numerical method for finding the ML estimate of θ in latent variable models. It has been widely used in applications since Dempster et al. (1977) gave it its name and analyzed its convergence. In this subsection, we briefly review the main idea of the EM algorithm. For more details, one can refer to McLachlan and Krishnan (2008).

Let x = (y, z) be the complete data with density p(x|θ) parameterized by a P-dimensional parameter vector θ ∈ Θ ⊆ R^P. The observed-data log-likelihood L_o(y|θ) = ln p(y|θ) often involves some intractable integral, preventing researchers from directly optimizing L_o(y|θ) with respect to θ. In many cases, however, the complete-data log-likelihood L_c(x|θ) = ln p(x|θ) has a closed-form expression. Instead of maximizing L_o(y|θ) directly, the EM algorithm maximizes Q(θ|θ^(r)), the conditional expectation of the complete-data log-likelihood function L_c(x|θ) given the observed data y and a current fit θ^(r) of the parameter.

Generally, a standard EM algorithm has two steps: the expectation (E) step and the maximization (M) step. The E-step evaluates

Q(θ|θ^(r)) = E_z{ L_c(x|θ) | y, θ^(r) },   (2)

where the expectation is taken with respect to the conditional distribution p(z|y, θ^(r)). The M-step determines a θ^(r+1) that maximizes Q(θ|θ^(r)). Under some mild regularity conditions, the sequence {θ^(r)} obtained from the EM iterations converges to the ML estimate θ̂; see Dempster et al. (1977) and Wu (1983) for details on the convergence properties of {θ^(r)}.

2.2 Bayesian analysis using MCMC

Although the EM algorithm is a reasonable statistical approach for analyzing latent variable models, the numerical optimization in the M-step is often unstable. This numerical problem worsens as the dimension of θ increases. It is well recognized that Bayesian methods using MCMC provide a powerful tool for analyzing latent variable models. However, if the posterior analysis is conducted from the observed-data likelihood, p(y|θ), one would end up with the same problem as in the ML method, since p(y|θ) does not have a closed-form expression. The novelty in the Bayesian methods is to treat the latent variable model as a hierarchical structure of conditional distributions, namely, p(y|z, θ), p(z|θ), and p(θ). In other words, one can use the data augmentation strategy of Tanner and Wong (1987) to expand the parameter space from θ to (θ, z). The advantage of data augmentation is that the Bayesian analysis is now based on the new likelihood function, p(y|θ, z), which often has a closed-form expression. Then the Gibbs sampler and other MCMC samplers can be used to generate random samples from the joint posterior distribution p(θ, z|y). After a sufficiently long burn-in phase, the simulated random samples can be regarded as random observations from the joint distribution, and statistical analysis can be based on these simulated posterior observations. As a by-product of the Bayesian analysis, one also obtains Markov chains for the latent variables z, and hence statistical inference can be made about z. For further details on Bayesian analysis of latent variable models via MCMC, including algorithms, examples and references, see Geweke et al.

From the above discussion, it can be seen that data augmentation is the key technique for Bayesian estimation of latent variable models. Two observations are in order. First, with data augmentation, the parameter space is much bigger. More often than not, the dimension of the space increases as the number of observations increases and is larger than the number of observations. In the latter case, the new likelihood function becomes non-regular. Second, it is difficult to argue that the latent variables can always be treated as model parameters. Model parameters are typically fixed, but the latent variables are often time varying. Consequently, the same treatment of these two types of variables does not seem to be justifiable from the perspective of model selection.
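As a concrete illustration of the E and M steps of Section 2.1, the following sketch runs a few EM iterations for a two-component Gaussian mixture with unit variances, a textbook latent variable model in which z_i is the component indicator. The toy model, the starting values and all function names are illustrative assumptions; this is not the algorithm or code used in the paper.

import numpy as np

def em_gaussian_mixture(y, n_iter=100):
    # Minimal EM for y_i ~ pi N(mu1, 1) + (1 - pi) N(mu2, 1); z_i is the latent component label.
    pi, mu1, mu2 = 0.5, y.min(), y.max()              # crude starting values
    for _ in range(n_iter):
        # E step: w_i = E[z_i | y, current fit] = P(component 1 | y_i)
        d1 = pi * np.exp(-0.5 * (y - mu1) ** 2)
        d2 = (1.0 - pi) * np.exp(-0.5 * (y - mu2) ** 2)
        w = d1 / (d1 + d2)
        # M step: maximize Q(theta | theta_r), which is available in closed form here
        pi = w.mean()
        mu1 = np.sum(w * y) / np.sum(w)
        mu2 = np.sum((1.0 - w) * y) / np.sum(1.0 - w)
    return pi, mu1, mu2

rng = np.random.default_rng(1)
labels = rng.random(500) < 0.3
y = np.where(labels, rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500))
print(em_gaussian_mixture(y))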

3 Bayesian Comparison of Latent Variable Models

3.1 DIC

Spiegelhalter et al. (2002) proposed DIC for Bayesian model comparison. The criterion is based on the deviance D(θ) = −2 ln p(y|θ), and takes the form of

DIC = D̄(θ) + P_D.   (3)

The first term, used as a Bayesian measure of model fit, is defined as the posterior expectation of the deviance, that is,

D̄(θ) = E_{θ|y}[D(θ)] = E_{θ|y}[−2 ln p(y|θ)].

The better the model fits the data, the larger the log-likelihood value and hence the smaller the value of D̄(θ). The second term, used to measure the model complexity and also known as the effective number of parameters, is defined as the difference between the posterior mean of the deviance and the deviance evaluated at the posterior mean of the parameters:

P_D = D̄(θ) − D(θ̄) = 2 ln p(y|θ̄) − 2 ∫ ln p(y|θ) p(θ|y) dθ,   (4)

where θ̄ is the Bayesian estimator, more precisely the posterior mean, of the parameter θ. Here, P_D can be explained as the expected excess of the true over the estimated residual information conditional on the data y. In other words, P_D can be interpreted as the expected reduction in uncertainty due to estimation.

DIC can be rewritten in two equivalent forms:

DIC = D(θ̄) + 2P_D,   (5)

and

DIC = 2D̄(θ) − D(θ̄) = −4E_{θ|y}[ln p(y|θ)] + 2 ln p(y|θ̄).   (6)

DIC defined in Equation (5) bears similarity to the AIC of Akaike (1973) and can be interpreted as a classical plug-in measure of fit plus a measure of complexity. In Equation (3) the Bayesian measure, D̄(θ), is the same as D(θ̄) + P_D, which already includes a penalty term for model complexity and thus could be better thought of as a measure of model adequacy rather than pure goodness of fit.
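When ln p(y|θ) has a closed-form expression, D̄(θ), P_D and DIC in (3)-(4) can be computed directly from the MCMC output. The sketch below does this for an assumed toy model (a normal mean with known unit variance and an essentially flat prior); the helper names and the toy model are illustrative assumptions rather than the paper's code.

import numpy as np
from scipy import stats

def dic_from_draws(y, theta_draws, loglik):
    # D(theta) = -2 log p(y | theta); theta_draws has one posterior draw per row.
    D = np.array([-2.0 * loglik(y, th) for th in theta_draws])
    D_bar = D.mean()                                   # posterior mean deviance
    theta_bar = theta_draws.mean(axis=0)               # posterior mean of theta
    p_D = D_bar + 2.0 * loglik(y, theta_bar)           # D_bar - D(theta_bar)
    return D_bar + p_D, p_D                            # DIC and effective number of parameters

# toy example: y_i ~ N(mu, 1); with a flat prior the posterior of mu is N(ybar, 1/n)
rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=200)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=(5000, 1))
loglik = lambda y, th: stats.norm.logpdf(y, loc=th[0], scale=1.0).sum()
print(dic_from_draws(y, mu_draws, loglik))             # p_D should be close to 1 here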

Remark 3.1 The asymptotic justification of DIC requires that the candidate models nest the true model and that the posterior distribution is approximately normal. These two requirements parallel those in AIC, where the candidate models nest the true model and the ML estimator is asymptotically normally distributed. To see the importance of the asymptotic normality, Spiegelhalter et al. (2002) show that, when the prior is noninformative, P_D is approximately the same as the number of parameters, P. In this case DIC can be explained as a Bayesian version of AIC. However, if the asymptotic normality does not hold, P_D cannot be approximated by P and DIC is not the Bayesian version of AIC. Furthermore, the information-theoretic explanation of DIC requires the asymptotic normality of the Bayesian posterior to hold.

Remark 3.2 If p(y|θ) has a closed-form expression, DIC is trivially computable from the MCMC output. This is in sharp contrast to BFs and some other model selection criteria within the classical framework. The computational tractability, together with the versatility of MCMC and the fact that DIC is incorporated into a Bayesian software package, WinBUGS, allows DIC to enjoy a very wide range of applications. (As of July 8, 2012, Spiegelhalter et al. (2002) had been cited 3,396 times according to Google Scholar and 1,984 times according to the Science Citation Index.) However, if p(y|θ) is not available in closed form, such as in random effects models and state space models, computing DIC may become infeasible, or at least very time consuming.

Remark 3.3 When an information criterion is used for model selection, the degrees of freedom are typically used to measure the model complexity. In the Bayesian framework, the prior information almost always imposes additional restrictions on the parameter space and hence the degrees of freedom may be reduced by the prior information. A useful contribution of DIC is to provide a way to measure the model complexity when the prior information is incorporated; see Brooks (2002).

Remark 3.4 Unlike BFs, which address how the observed data are predicted by the priors, DIC addresses how well the posterior might predict future data generated by the same mechanism that gave rise to the observed data (Spiegelhalter et al. 2002). This predictive perspective for selecting a good model is important to many practical business, economic, and financial decisions.

For latent variable models, depending on whether or not the latent variables are treated as parameters, Celeux et al. (2006) gave different ways to define DIC and classified them into three categories. Based on the observed-data likelihood p(y|θ), the first category of DICs can be

defined as

DIC_1 = −4E_{θ|y}[ln p(y|θ)] + 2 ln p(y|θ̄(y)),
DIC_2 = −4E_{θ|y}[ln p(y|θ)] + 2 ln p(y|θ̂(y)),
DIC_3 = −4E_{θ|y}[ln p(y|θ)] + 2 ln E_{θ|y}[p(y|θ)],

where θ̄(y) and θ̂(y) are the posterior mean and the posterior mode, respectively. Based on the complete-data likelihood p(y, z|θ), the second category of DICs can be defined as

DIC_4 = −4E_{θ,z|y}[ln p(y, z|θ)] + 2E_{z|y}[ln p(y, z | E_{θ|y,z}[θ|y, z])],
DIC_5 = −4E_{θ,z|y}[ln p(y, z|θ)] + 2 ln p(y, ẑ(y) | θ̂(y)),
DIC_6 = −4E_{θ,z|y}[ln p(y, z|θ)] + 2E_{z|y,θ̂(y)}[ln p(y, z | θ̂(y))],

where in DIC_5, z is treated as parameters and ẑ(y) and θ̂(y) are the joint Bayesian estimators, such as the joint maximum a posteriori (MAP) estimators of (z, θ); in DIC_6, θ̂(y) is an estimator of θ based on the posterior distribution p(θ|y). Based on the conditional likelihood p(y|z, θ), the third category of DICs can be defined as

DIC_7 = −4E_{θ,z|y}[ln p(y|z, θ)] + 2 ln p(y | ẑ(y), θ̂(y)),
DIC_8 = −4E_{θ,z|y}[ln p(y|z, θ)] + 2E_{z|y}[ln p(y | z, θ̂(y, z))],

where again, in DIC_7, the latent variable z is treated as parameters so that ẑ(y) and θ̂(y) are the joint Bayesian estimators, such as the joint maximum a posteriori (MAP) estimators of the pair (z, θ); in DIC_8, θ̂(y, z) is an estimator of θ based on p(y, z|θ).

Remark 3.5 To compute DIC_1, DIC_2 and DIC_3, it is generally required that the observed-data likelihood p(y|θ) be available in closed form. However, for latent variable models, such as state space models, including linear Gaussian state space models, the observed-data likelihood p(y|θ) is not available in closed form. (For linear Gaussian state space models, the Kalman filter can be used to obtain the likelihood function numerically for ML estimation; numerically more efficient algorithms have been developed in the recent literature, see, for example, Chan and Jeliazkov (2009).) In this case, computing these DICs from the MCMC output is time consuming or even infeasible, since p(y|θ) has to be computed at each draw from the Markov chain. DIC_2 is particularly hard to compute as it needs the estimator θ̂(y). DIC_4 requires the computation of a posterior expectation for each value of z; consequently, the computational cost is too high in many latent variable models. The definition of DIC_5 is inconsistent in the sense that the first component treats z as latent variables while the second component treats z as parameters. In DIC_6, P_D is not guaranteed to be positive.

Moreover, as argued in Celeux et al. (2006), ẑ(y) is often a terrible estimator, and so are DIC_4, DIC_5 and DIC_6. In DIC_7 the latent variable is regarded as parameters in both components, and DIC_7 is easy to compute. As a result, DIC_7 is the default information criterion for comparing latent variable models and is implemented and reported in WinBUGS, following the suggestion of Spiegelhalter et al. (2002). Examples that use DIC_7 in applications include Berg et al. (2004) and Wang et al. Clearly this way of defining DIC is chosen for computational convenience. In DIC_8, the estimator θ̂(y, z) is very difficult to obtain.

Remark 3.6 From a theoretical viewpoint, DIC_7 has several serious problems. First, due to the data augmentation, the number of latent variables often increases with the sample size in latent variable models, causing the problem of non-regular likelihood-based statistical inference; see Gelman (2003). This invalidates the asymptotic justification of DIC because the standard asymptotic theory derived from a regular likelihood is not applicable to a non-regular likelihood. Second, if the latent variables can be treated as parameters, an incoherent inference problem results. That is, when one model can be rewritten as a distributional representation of another model with latent variables and the same prior is used in the two models, different DIC values can be obtained. A simple example is the Student-t distribution, which can be rewritten as a normal-gamma scale mixture. This is the case in Section 8.2 of Spiegelhalter et al. (2002), where Models 4 and 5 are predictively identical but their DIC values are quite different. The same difficulty also shows up in Model 8 of Berg et al. (2004). Third, when the latent variables are discrete, such as component indicators in Markov switching models, the Bayesian estimator is generally not a discrete value, which can cause some logical problems. Fourth, due to the data augmentation, the dimension of the parameter space becomes larger and hence we expect that DIC_7 is very sensitive to transformations of latent variables.

To illustrate the last problem, we consider a simple transformation of latent variables in the well-known Clark model (Clark 1973), which is given by

Model 1: y_t ~ N(µ, exp(h_t)), h_t ~ N(0, σ²), t = 1, 2, ..., n.   (7)

An equivalent representation of the model is

Model 2: y_t ~ N(µ, σ_t²), σ_t² ~ LN(0, σ²), t = 1, 2, ..., n,   (8)

where LN denotes the log-normal distribution. In Model 2 the latent variable is the volatility σ_t², while in Model 1 the latent variable is the logarithmic volatility h_t = log σ_t². Suppose the parameters of interest are µ and σ². With the same focus, the two models are identical and hence are expected to have the same DIC and P_D. To calculate the P_D component in DIC_7, we simulate 1000 observations from the model with µ = 0 and σ² = 0.5. Vague priors are selected for the two parameters, namely µ ~ N(0, 100) and σ² ~ Γ(0.001, 0.001).
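The simulation design just described (n = 1000 observations, µ = 0, σ² = 0.5) can be generated with a few lines of code. The sketch below is an illustrative Python implementation under these assumed settings, not the authors' code; Model 2 uses exactly the same draws, since σ_t² = exp(h_t).

import numpy as np

def simulate_clark_model1(n=1000, mu=0.0, sigma2=0.5, seed=0):
    # Model 1: y_t ~ N(mu, exp(h_t)), h_t ~ N(0, sigma2), t = 1, ..., n
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, np.sqrt(sigma2), size=n)       # latent log-volatility
    y = rng.normal(mu, np.exp(0.5 * h))                # observed data
    return y, h

y, h = simulate_clark_model1()
sigma2_t = np.exp(h)                                   # the latent variable of Model 2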

We run the Gibbs sampler, discarding the first 40,000 simulated draws from the posterior distributions as burn-in samples. Of the remaining draws, every 10th observation is collected as an effective observation for statistical inference. With the data augmentation, the latent variables, h_t and σ_t², are regarded as parameters, and we find that P_D = ... for Model 1 but P_D = ... for Model 2. The difference is very significant. Given that we have identical models and priors and use the same dataset, the vast difference suggests that DIC_7 and the corresponding P_D are very sensitive to transformations of latent variables.

For latent variable models, DIC_1, DIC_2 and DIC_3 do not suffer from the same theoretical problems as DIC_7. However, computing DIC_1 from the MCMC output is much harder, often infeasible, since p(y|θ) is not available in closed form and computing E_{θ|y}[log p(y|θ)] necessitates numerical calculation of p(y|θ) at each draw from the Markov chain. To summarize the problems with DIC in the context of latent variable models: DIC_7 is trivial to calculate but cannot be theoretically justified, while DIC_1 is theoretically justified but infeasible to compute.

3.2 RDIC

In this section we introduce a robust version of DIC, denoted RDIC, as follows:

RDIC = D(θ̄) + 2 tr{ I(θ̄) V̄_θ } = D(θ̄) + 2 P_D*,   (9)

where

P_D* = tr{ I(θ̄) V̄_θ },   (10)

with tr denoting the trace of a matrix and

I(θ̄) = − ∂² log p(y|θ)/∂θ∂θ' |_{θ=θ̄},   V̄_θ = E[ (θ − θ̄)(θ − θ̄)' | y ].

Interestingly, in Equation (15) on page 590, Spiegelhalter et al. (2002) obtained the expression for P_D* and claimed that P_D* approximates the P_D component in DIC_1. Unfortunately, to the best of our knowledge, P_D* has never been implemented in practice and WinBUGS does not report P_D*. Moreover, the proof of P_D* ≈ P_D was not given in Spiegelhalter et al. (2002), the conditions under which P_D* ≈ P_D holds were not specified, and the order of the approximation remains unknown. To justify the choice of RDIC, we have to establish conditions under which RDIC approximates DIC_1 and P_D* approximates the P_D that corresponds to DIC_1, with a known order of magnitude. We then show how the EM algorithm facilitates the computation of RDIC from the MCMC output for latent variable models.

Let L_n(θ) = log p(θ|y), L_n^(1)(θ) = ∂ log p(θ|y)/∂θ, and L_n^(2)(θ) = ∂² log p(θ|y)/∂θ∂θ'. In this paper, we impose the following regularity conditions.

Assumption 1: There exists a finite sample size n̄ such that, for n > n̄, there is a local maximum of L_n(θ) at θ̂_m, so that L_n^(1)(θ̂_m) = 0 and L_n^(2)(θ̂_m) is a negative definite matrix. Obviously, θ̂_m is the posterior mode and L_n^(2)(θ̂_m)/n = O_p(1).

Assumption 2: Moreover, the largest eigenvalue of −[L_n^(2)(θ̂_m)]^{-1}, denoted σ_n², goes to zero as n → ∞.

Assumption 3: For any ε > 0, there exist an integer n₂ and some δ > 0 such that, for any n > max{n̄, n₂} and θ ∈ H(θ̂_m, δ) = {θ : ‖θ − θ̂_m‖ ≤ δ}, L_n^(2)(θ) exists and satisfies

I_P − A(ε) ≤ L_n^(2)(θ) [L_n^(2)(θ̂_m)]^{-1} ≤ I_P + A(ε),

where I_P is a P × P identity matrix and A(ε) is a P × P positive semi-definite symmetric matrix whose largest eigenvalue goes to zero as ε → 0.

Assumption 4: For any δ > 0, ∫_{Θ − H(θ̂_m, δ)} p(θ|y) dθ → 0 as n → ∞, where Θ is the support of θ.

Assumption 5: Both the first moment and the second moment of p(θ|y) exist.

Assumption 6: For all θ ∈ Θ, the prior of θ is O_p(1).

Assumption 7: The data generating process is stationary and the model is regular, so that the standard maximum likelihood theory can be applied.

Assumption 8: For any given θ_0 ∈ Θ and y from the same data generating process, there exist a positive number c and a function M(y), both of which may depend on θ_0, such that |n^{-1} log p(y|θ)| ≤ M(y) for all θ with θ_0 − c < θ < θ_0 + c, and E_{θ_0}[M(y)] < ∞.

Lemma 3.1 Under Assumptions 1-5, conditional on the observed data y, we have

θ̄ = E(θ|y) = θ̂_m + o_p(n^{-1/2}),
V̄(θ̂_m) = E[ (θ − θ̂_m)(θ − θ̂_m)' | y ] = −[L_n^(2)(θ̂_m)]^{-1} + o_p(n^{-1}).

Remark 3.7 Lemma 3.1 establishes Bayesian large sample theory. Regularity conditions 1-4 have been used in the literature to develop Bayesian large sample theory for stationary and nonstationary dynamic models and for nondynamic models; see, for example, Chen (1985), Kim (1994), Kim (1998) and Geweke (2005). Bayesian large sample theory has also been developed from different sets of regularity conditions in different contexts. For example, Ghosh and Ramamoorthi (2003) developed the asymptotic posterior normality and Lemma 3.1 in the iid case.

Theorem 3.1 Under Assumptions 1-6, it can be shown that

P_D = P_D* + o_p(1),   DIC_1 = RDIC + o_p(1),

where P_D is defined in (4).

Remark 3.8 Theorem 3.1 improves on Equation (15) of Spiegelhalter et al. (2002) in two ways. First, it gives the order of the approximation errors. Second, it specifies the conditions under which P_D* approximates P_D and RDIC approximates DIC_1.

Remark 3.9 As DIC_1 is theoretically justified for latent variable models, Theorem 3.1 justifies RDIC asymptotically, since RDIC and DIC_1 are asymptotically equivalent.

Remark 3.10 RDIC maintains all the good features of DIC_1. For example, informative priors impose restrictions on the parameter space so that the degrees of freedom of the model are reduced. Hence, RDIC can incorporate the prior information when measuring the model complexity. Following Spiegelhalter et al. (2002), we get

I(θ̂_m) = − ∂² log p(θ|y)/∂θ∂θ' |_{θ=θ̂_m} + ∂² log p(θ)/∂θ∂θ' |_{θ=θ̂_m} = −L_n^(2)(θ̂_m) + ∂² log p(θ)/∂θ∂θ' |_{θ=θ̂_m}.

Under Assumptions 1-5, following Lemma 3.1 and the proof of Theorem 3.1, we get

P_D* = tr{ I(θ̂_m) V̄_θ } + o_p(1)
    = tr{ −L_n^(2)(θ̂_m) V̄_θ } + tr{ ∂² log p(θ)/∂θ∂θ' |_{θ=θ̂_m} V̄_θ } + o_p(1)
    = P + tr{ ∂² log p(θ)/∂θ∂θ' |_{θ=θ̂_m} V̄_θ } + o_p(1).   (11)

From (11), it can be seen clearly that the prior information can reduce the model complexity.

Remark 3.11 Like DIC_1, RDIC is justified by the standard Bayesian large sample theory. When the Bayesian large sample theory is not available, RDIC is not justified. Such cases include models in which the number of parameters increases with the sample size, under-identified models, models with an unbounded likelihood, and models with improper posterior distributions. For more details about the standard Bayesian large sample theory, see Gelman (2003) and Geweke (2005). For latent variable models, since the number of latent variables increases with the sample size, the standard Bayesian large sample theory is not applicable if the data augmentation technique is used. As a result, when calculating RDIC, data augmentation should NOT be used.
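As a practical illustration of (10), P_D* can be computed from the MCMC output whenever log p(y|θ) can be evaluated: V̄_θ is estimated by the sample covariance of the posterior draws and I(θ̄) by a numerical Hessian at the posterior mean. The finite-difference scheme and the function names below are illustrative assumptions for a model with a small number of parameters, not the authors' implementation.

import numpy as np

def p_d_star(loglik, theta_draws, eps=1e-5):
    # P_D* = tr{ I(theta_bar) V_bar }, with I(theta_bar) the negative Hessian of
    # log p(y|theta) at the posterior mean and V_bar the posterior covariance matrix.
    theta_bar = theta_draws.mean(axis=0)
    V_bar = np.atleast_2d(np.cov(theta_draws, rowvar=False))
    P = theta_bar.size
    H = np.zeros((P, P))
    for i in range(P):
        for j in range(P):
            # central finite differences for the (i, j) element of the Hessian
            t_pp = theta_bar.copy(); t_pp[i] += eps; t_pp[j] += eps
            t_pm = theta_bar.copy(); t_pm[i] += eps; t_pm[j] -= eps
            t_mp = theta_bar.copy(); t_mp[i] -= eps; t_mp[j] += eps
            t_mm = theta_bar.copy(); t_mm[i] -= eps; t_mm[j] -= eps
            H[i, j] = (loglik(t_pp) - loglik(t_pm) - loglik(t_mp) + loglik(t_mm)) / (4.0 * eps ** 2)
    return float(np.trace((-H) @ V_bar))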

Remark 3.12 Since RDIC is defined from the observed-data likelihood p(y|θ), there is no need to specify a focus, and hence RDIC does not suffer from the incoherent inference problem.

Remark 3.13 For latent variable models, while the number of model parameters P is fixed and usually not large, the number of latent variables increases as the sample size increases. In the definition of RDIC, the latent variables are not regarded as parameters. Consequently, the problem of parameter transformation is less serious. For example, in the Clark model, with the same setting as before, we get P_D* = 1.75 for Model 1 and P_D* = 1.80 for Model 2. There is no significant difference between them. Moreover, these two values are close to 2, which is the actual number of parameters. This is what we expect given that vague priors are used, so that P_D* ≈ P = 2. The difference between P_D* and P arises from simulation error and the prior.

Remark 3.14 An obvious computational advantage of RDIC is that P_D* does not involve inverting a matrix. This advantage is not so important when the latent variable model has only a small number of parameters. However, for high dimensional latent variable models with many parameters, this computational advantage may be important.

We now consider the justification of DIC and RDIC from an information-theoretic perspective. As in AIC, let y_rep = (y_{1,rep}, y_{2,rep}, ..., y_{n,rep}) be independent replicate data generated by the same mechanism that gives rise to the observed data y, i.e., p(y_rep) = p(y). In the literature, the Kullback-Leibler (KL) divergence is used to describe the difference between two distributions:

KL(p(x), q(x)) = ∫ p(x) log [ p(x)/q(x) ] dx.

Hence, from the information-theoretic perspective, when using the fitted model M to predict y_rep, a simple loss function based on the KL divergence can be chosen as

KL( p(y_rep|θ), p(y_rep|θ̄(y)) ) = ∫ log [ p(y_rep|θ) / p(y_rep|θ̄(y)) ] p(y_rep|θ) dy_rep
= ∫ log p(y_rep|θ) p(y_rep|θ) dy_rep − ∫ log p(y_rep|θ̄(y)) p(y_rep|θ) dy_rep.   (12)

Then a posterior loss function is given by

L(y_rep, y) = ∫ KL( p(y_rep|θ), p(y_rep|θ̄(y)) ) p(θ|y) dθ.   (13)

Hence, the criterion for model selection is to choose the model that minimizes

E_y[ L(y_rep, y) ] = ∫ L(y_rep, y) p(y) dy.

Remark 3.15 In Spiegelhalter et al. (2002, page 604), L(y_rep, y) was chosen to be −2 log p(y_rep|θ̄(y)), and they showed that

E_y{ −2 ∫ [ ∫ log p(y_rep|θ̄(y)) p(y_rep|θ) dy_rep ] p(θ|y) dθ } ≈ E_y[DIC_1].

However, their derivation is heuristic in the sense that no rigorous proof is provided. Moreover, we can see from (12) and (13) that ∫ log p(y_rep|θ) p(y_rep|θ) dy_rep is not the same across different models. Hence, it is difficult to justify DIC on the basis of the loss function −2 log p(y_rep|θ̄(y)). A more rigorous justification of DIC is needed.

Consider the predictive distribution p(y_rep|y) = ∫ p(y_rep|θ) p(θ|y) dθ. The KL loss function based on this predictive distribution is

KL( p(y_rep), p(y_rep|y) ) = ∫ log [ p(y_rep) / p(y_rep|y) ] p(y_rep) dy_rep.

Since ∫ log p(y_rep) p(y_rep) dy_rep is the same for all the models, we can choose the loss function as L(y_rep, y) = −2 log p(y_rep|y). We then propose to choose the model that minimizes the following risk function:

E_y E_{y_rep}[ L(y_rep, y) ] = ∫∫ L(y_rep, y) p(y_rep) p(y) dy_rep dy.

The following theorem provides the justification of RDIC and DIC from the information-theoretic viewpoint.

Theorem 3.2 Under Assumptions 1-8, it can be shown that

E_y E_{y_rep}[ L(y_rep, y) ] = E_y[DIC_1] + o(1) = E_y[RDIC] + o(1).

Remark 3.16 According to the proof of Theorem 3.2, P_D = P_D* + o_p(1) = P + o_p(1). Consequently, DIC_1 = RDIC + o_p(1) = −2 log p(y|θ̂) + 2P + o_p(1) = AIC + o_p(1). Namely, both RDIC and DIC_1 can be regarded as Bayesian versions of AIC.

Remark 3.17 Both RDIC and DIC_1 are asymptotically unbiased estimators of the risk function.

Remark 3.18 Like DIC_1, RDIC addresses how well the posterior may predict future data generated by the same mechanism that gives rise to the observed data. This posterior predictive feature could be appealing in many applications.

Remark 3.19 Like AIC, both DIC_1 and RDIC require that the candidate models nest the true model. This is of course a strong assumption. In the iid case, Ando and Tsay (2010) relaxed this assumption and obtained a predictive likelihood information criterion that minimizes the loss function η = E_y E_{y_rep}[ −2 log p(y_rep|y) ]. Their estimator of η combines log p(y_rep|y) evaluated at y_rep = y with a penalty of the form tr{ I^{-1}(θ̂) J(θ̂) }, where I(θ) and J(θ) are the Hessian matrix and the Fisher information matrix. In Ando (2007), another Bayesian predictive information criterion (BPIC) was given, which combines −2 log p(y|θ̂) with a penalty involving tr{ I^{-1}(θ̂) J(θ̂) } and P. Ando (2007) showed that BPIC is an estimator of the loss function E_y E_{y_rep}[ −2 ∫ log p(y_rep|θ) p(θ|y) dθ ]. Like the TIC of Takeuchi (1976), these two information criteria involve the inverse of the Hessian matrix, which is numerically challenging when the dimension of the parameter space is large. This is one of the reasons why TIC has not been widely used in practice. Furthermore, the derivation of these two information criteria requires the data to be iid. For data in economics and finance, this requirement is often too restrictive. In addition, for many latent variable models, the maximum likelihood estimator, the Hessian matrix and the Fisher information matrix are difficult to obtain. How to develop a good information criterion for comparing latent variable models, without assuming that the candidate models nest the true model, will be pursued in future research.

Remark 3.20 For unit root models, Kim (1994) and Kim (1998) showed that the asymptotic normality of the posterior distribution can be established under Assumptions 1-4. Hence, Lemma 3.1 holds true for unit root models. However, to develop Theorem 3.2, the standard maximum likelihood asymptotic theory is required. Hence, Theorem 3.2 may not be applicable to models with a unit root or an explosive root. The topic of comparing non-stationary models will be pursued in future studies. Within the classical framework, Phillips and Ploberger (1996) and Phillips (1996) have proposed model selection criteria for models without latent variables.

Remark 3.21 If the observed-data likelihood function, p(y|θ), does not have a closed-form expression, its second derivative, ∂² log p(y|θ)/∂θ∂θ', and hence RDIC will be difficult to compute. Some general methods such as the Kalman filter and the particle filter can be used for this purpose. In the following section, we show how the EM algorithm may be used to facilitate the computation of the second derivative and of RDIC.

3.3 Computing RDIC by the EM algorithm

The definition of RDIC clearly requires the evaluation of the observed-data likelihood at the posterior mean, p(y|θ̄), as well as the information matrix and the second derivative of the observed-data likelihood function. For most latent variable models, the observed-data likelihood function does not have a closed-form expression. In this section we show how the EM algorithm may be used to evaluate p(y|θ̄), the second derivative of the observed-data likelihood function, and hence RDIC for latent variable models. It is important to point out that, unlike in the EM algorithm, we do not need to numerically optimize any function here. Consequently, our method is not subject to the instability problem found in the M-step.

As argued in Section 2.1, the main idea of the EM algorithm is to replace the observed-data log-likelihood log p(y|θ) with the complete-data log-likelihood log p(y, z|θ). Note that

log p(y, z|θ) = log p(z|y, θ) + log p(y|θ).

For any θ and θ* in Θ, it was shown in Dempster et al. (1977) that

∫ log p(y, z|θ) p(z|y, θ*) dz = ∫ log p(z|y, θ) p(z|y, θ*) dz + log p(y|θ).

Hence, we have the following lemma.

Lemma 3.2 Let H(θ|θ*) = ∫ log p(z|y, θ) p(z|y, θ*) dz, the so-called H function in the EM algorithm. Then

L_o(y|θ) = Q(θ|θ*) − H(θ|θ*),

where the Q function is defined in Equation (2).

Following Lemma 3.2, the Bayesian plug-in model fit, log p(y|θ̄), may be obtained as

log p(y|θ̄) = Q(θ̄|θ̄) − H(θ̄|θ̄).   (14)

It can be seen that even when Q(θ̄|θ̄) is not available in closed form, it is easy to evaluate from the MCMC output because

Q(θ̄|θ̄) = ∫ log p(y, z|θ̄) p(z|y, θ̄) dz ≈ (1/M) Σ_{m=1}^{M} log p(y, z^(m)|θ̄),

where {z^(m), m = 1, 2, ..., M} are random observations drawn from the posterior distribution p(z|y, θ̄). For the second term in (14), if p(z|y, θ̄) is a standard distribution, H(θ̄|θ̄) can be easily evaluated from the MCMC output as

H(θ̄|θ̄) = ∫ log p(z|y, θ̄) p(z|y, θ̄) dz ≈ (1/M) Σ_{m=1}^{M} log p(z^(m)|y, θ̄).
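The two Monte Carlo averages above translate directly into code once draws z^(m) from p(z|y, θ̄) are available. The sketch below assumes user-supplied functions for the complete-data log-likelihood and for the conditional density of z (the standard-distribution case); all names are illustrative assumptions.

import numpy as np

def loglik_at_posterior_mean(z_draws, theta_bar, log_complete, log_cond):
    # log p(y | theta_bar) = Q(theta_bar | theta_bar) - H(theta_bar | theta_bar), Equation (14)
    # log_complete(z, theta): log p(y, z | theta); log_cond(z, theta): log p(z | y, theta)
    Q_bar = np.mean([log_complete(z, theta_bar) for z in z_draws])
    H_bar = np.mean([log_cond(z, theta_bar) for z in z_draws])
    return Q_bar - H_bar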

However, if p(z|y, θ̄) is not a standard distribution, an alternative approach has to be used, depending on the specific model under consideration. We now consider two situations.

First, if the complete data (y_i, z_i) are independent across i ≠ j, and z_i is of low dimension, say 5, then a nonparametric approach may be used to approximate the posterior distribution p(z|y, θ̄). Note that

H(θ̄|θ̄) = ∫ log p(z|y, θ̄) p(z|y, θ̄) dz = Σ_{i=1}^{n} ∫ log p(z_i|y_i, θ̄) p(z_i|y, θ̄) dz_i = Σ_{i=1}^{n} H_i(θ̄|θ̄).

The computation of H_i(θ̄|θ̄) requires an analytic approximation to p(z_i|y_i, θ̄), which can be constructed using a nonparametric method. In particular, MCMC allows one to draw effective samples from p(z_i|y_i, θ̄). Using these random samples, one can then use nonparametric techniques such as kernel-based methods to approximate p(z_i|y_i, θ̄). In a recent study, Ibrahim et al. (2008) suggested using a truncated Hermite expansion to approximate p(z_i|y_i, θ̄). As a simple illustration, we apply this method to the Clark model. When the Gaussian kernel method is used, we get log p(y|θ̄) = ..., RDIC = ... for Model 1 and log p(y|θ̄) = ..., RDIC = 90.4 for Model 2. These two sets of numbers are nearly identical. However, if the latent variables are regarded as parameters, we get DIC_7 = ... for Model 1 and DIC_7 = ... for Model 2. The highly distinctive difference between them suggests that DIC_7 is not a reliable model selection criterion for this model. Note that DIC_1 is not really feasible to compute in this case.

Second, for some latent variable models, the latent variables z follow a multivariate normal distribution and the observed variables y are independent conditional on z. This class of models is referred to as the Gaussian latent variable models in the literature. In economics and finance, many latent variable models belong to this class, including dynamic linear models, dynamic factor models, various forms of stochastic volatility models and credit risk models. In these models, the observed-data likelihood is non-Gaussian but has a Gaussian flavor in the sense that the posterior distribution, p(z|y, θ), may be expressed as

p(z|y, θ) ∝ exp{ −(1/2) z' V(θ) z + Σ_{i=1}^{n} log p(y_i|z_i, θ) }.

Rue et al. (2004) and Rue et al. (2009) showed that this type of posterior distribution can be well approximated by a Gaussian distribution that matches the mode and the curvature at the mode. The resulting approximation is known as the Laplace approximation and can be expressed as

p(z|y, θ) ≈ exp{ −(1/2) z' [ V(θ) + diag(c) ] z },

where c comes from the second-order term in the Taylor expansion of Σ_{i=1}^{n} log p(y_i|z_i) at the mode of p(z|y, θ). The Laplace approximation may be employed to compute H(θ̄|θ̄).
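Returning to the first (nonparametric) situation, the kernel-based approximation of p(z_i|y_i, θ̄) can be implemented with a standard Gaussian kernel density estimator, as in the Clark model illustration above. The following sketch assumes one array of posterior draws per low-dimensional z_i; the function names are illustrative assumptions, and the Hermite-expansion alternative of Ibrahim et al. (2008) is not shown.

import numpy as np
from scipy.stats import gaussian_kde

def H_bar_kernel(z_draws_by_obs):
    # H(theta_bar | theta_bar) = sum_i E[ log p(z_i | y_i, theta_bar) | y ], with each
    # conditional density replaced by a Gaussian kernel density fitted to the draws of z_i.
    H_bar = 0.0
    for draws in z_draws_by_obs:               # draws: shape (M,) or (M, d) for one z_i
        pts = np.atleast_2d(np.asarray(draws, dtype=float).T)   # gaussian_kde expects (d, M)
        kde = gaussian_kde(pts)
        H_bar += float(np.mean(kde.logpdf(pts)))
    return H_bar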

After p(y|θ̄) is obtained, it is easy to obtain D(θ̄). It is important to point out that the numerical evaluation of p(y|θ̄) is needed only once, i.e., at the posterior mean.

To compute P_D*, we have to calculate the second derivative of the observed-data likelihood function in (11). The following two lemmas show how to compute the second derivatives.

Lemma 3.3 Under mild regularity conditions, the observed-data information matrix may be expressed as

I(θ) = − ∂² L_o(y|θ)/∂θ∂θ' = − { ∂² Q(θ|θ*)/∂θ∂θ' + ∂² Q(θ|θ*)/∂θ∂θ*' } |_{θ*=θ}.   (15)

Lemma 3.4 Let S(x|θ) = ∂ L_c(x|θ)/∂θ. Under mild regularity conditions, the observed-data information matrix has an equivalent form:

I(θ) = − ∂² L_o(y|θ)/∂θ∂θ' = E_{z|y,θ}[ − ∂² L_c(x|θ)/∂θ∂θ' ] − Var_{z|y,θ}[ S(x|θ) ]   (16)
     = E_{z|y,θ}[ − ∂² L_c(x|θ)/∂θ∂θ' − S(x|θ) S(x|θ)' ] + E_{z|y,θ}[ S(x|θ) ] E_{z|y,θ}[ S(x|θ) ]',

where all the expectations are taken with respect to the conditional distribution of z given y and θ.

Remark 3.22 Lemma 3.3 and Lemma 3.4 were developed in Oakes (1999) and Louis (1982), respectively, for finding the standard error in the EM algorithm. If the Q function is available, we can use Lemma 3.3 to evaluate the second derivatives. If the Q function does not have an analytic form, we may use Lemma 3.4 to evaluate the second derivatives as follows:

E_{z|y,θ}[ − ∂² L_c(x|θ)/∂θ∂θ' − S(x|θ) S(x|θ)' ] ≈ −(1/M) Σ_{m=1}^{M} [ ∂² L_c(y, z^(m)|θ)/∂θ∂θ' + S(y, z^(m)|θ) S(y, z^(m)|θ)' ],

E_{z|y,θ}[ S(x|θ) ] ≈ (1/M) Σ_{m=1}^{M} S(y, z^(m)|θ),

where {z^(m), m = 1, 2, ..., M} are random observations drawn from the posterior distribution p(z|y, θ).
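The Monte Carlo version of Lemma 3.4 above gives I(θ̄) directly from the posterior draws of z. A minimal sketch follows; score(z, theta) and neg_hess_complete(z, theta) are assumed user-supplied routines returning ∂L_c/∂θ and −∂²L_c/∂θ∂θ' for the model at hand, and all names are illustrative assumptions.

import numpy as np

def observed_information_louis(z_draws, theta_bar, score, neg_hess_complete):
    # I(theta) = E_z[ -d2 L_c/dtheta dtheta' - S S' ] + E_z[S] E_z[S]'  (Equation (16)),
    # with the expectations over p(z | y, theta) replaced by averages over the MCMC draws.
    S = np.array([score(z, theta_bar) for z in z_draws])                    # shape (M, P)
    first = np.mean([neg_hess_complete(z, theta_bar) - np.outer(s, s)
                     for z, s in zip(z_draws, S)], axis=0)
    S_bar = S.mean(axis=0)
    return first + np.outer(S_bar, S_bar)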

4 Examples

We now illustrate the proposed method in two applications. In the first example, while p(y|θ) is not available in closed form, the Kalman filter provides a recursive algorithm to evaluate it. Hence, Q(θ̄|θ̄) and H(θ̄|θ̄) can be calculated in the same manner, facilitating the computation of RDIC, while DIC_1 is much harder to compute. In the second example, p(y|θ) is not available in closed form and the Kalman filter cannot be applied. To compute RDIC, we use the Laplace approximation and the techniques suggested in Section 3.3.

4.1 Comparing high dimensional dynamic factor models

For many countries, there exists a rich array of macroeconomic time series and financial time series. To reduce the dimensionality and to extract the information from the large number of time series, factor analysis has been widely used in the empirical macroeconomic literature and in the empirical finance literature. For example, by extending the static factor models previously developed for cross-sectional data, Geweke (1977) proposed the dynamic factor model for time series data. Many empirical studies, such as Sargent and Sims (1977) and Giannone et al. (2004), have reported evidence that a large fraction of the variance of many macroeconomic series can be explained by a small number of dynamic factors. Stock and Watson (1999) and Stock and Watson (2002) showed that dynamic factors extracted from a large number of predictors can lead to improvements in predicting macroeconomic variables. Not surprisingly, high dimensional dynamic factor models have become a popular tool in a data-rich environment for macroeconomists and policy makers. An excellent review of dynamic factor models is given by Stock and Watson (2010).

Following Bernanke et al. (2005), BBE hereafter, the present paper considers the following fundamental dynamic factor model:

Y_t = F_t L + ε_t,
F_t = F_{t−1} Φ + η_t,

where Y_t is a 1 × N vector of time series variables, F_t a 1 × K vector of unobserved latent factors which contains the information extracted from all the N time series variables, L an N × K factor loading matrix, and Φ the K × K autoregressive parameter matrix of the unobserved latent factors. It is assumed that ε_t ~ N(0, Σ) and η_t ~ N(0, Q). For the purpose of identification, Σ is assumed to be diagonal, and ε_t and η_t are assumed to be independent of each other. Following BBE (2005), we set the first K × K block of the loading matrix L to be the identity matrix.
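Before turning to the data, it may help to see the model as a data-generating process. The sketch below simulates a small panel from the state-space form above, with the first K × K block of the loadings fixed to the identity for identification; the autoregressive matrix, the loading values and the error variances are illustrative assumptions, not estimates from the paper.

import numpy as np

def simulate_dfm(T=500, N=120, K=3, seed=0):
    # Y_t = F_t L + eps_t, F_t = F_{t-1} Phi + eta_t, with Y_t (1 x N) and F_t (1 x K);
    # L is stored here as K x N (the transpose of the N x K loading matrix in the text),
    # with its first K x K block equal to the identity.
    rng = np.random.default_rng(seed)
    Phi = 0.7 * np.eye(K)                                            # factor VAR(1) matrix
    L = np.hstack([np.eye(K), rng.normal(0.0, 0.5, (K, N - K))])     # K x N loadings
    sigma = rng.uniform(0.5, 1.0, N)                                 # sqrt of the diagonal of Sigma
    F = np.zeros((T, K))
    Y = np.zeros((T, N))
    for t in range(T):
        prev = F[t - 1] if t > 0 else np.zeros(K)
        F[t] = prev @ Phi + rng.normal(0.0, 0.3, K)                  # eta_t ~ N(0, Q), Q = 0.09 I_K
        Y[t] = F[t] @ L + rng.normal(0.0, sigma)                     # eps_t ~ N(0, Sigma)
    return Y, F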

In this dynamic factor model, the observed variable Y_t consists of a balanced panel of 120 US monthly macroeconomic time series. These series are initially transformed to induce stationarity. The description of the series and the transformations is provided in BBE (2005). The sample period is from January 1959 to August 2001. Because the data are of high dimension, the analysis of dynamic factor models via frequentist methods is not trivial; see the discussion in Stock and Watson (2011). In the literature, Bayesian inference via MCMC techniques has been popular for analyzing dynamic factor models; see Otrok and Whiteman (1998), Kose et al. (2003), Kose et al. (2008) and BBE (2005). Following BBE (2005), we specify the following prior distributions:

Σ_ii ~ Inverse-Γ(3, 0.001),   L_i ~ N(0, Σ_ii M_0^{-1}),
vec(Φ) | Q ~ N(0, Q ⊗ Ω_0),   Q ~ Inverse-Γ(Q_0, K + 2),

where M_0 is a K × K identity matrix and L_i is the ith (i > K) column of L. The diagonal elements of Q_0 are set to the residual variances of the corresponding one-lag univariate autoregressions, σ̂_i². The diagonal elements of Ω_0 are constructed so that the prior variance of the parameter on the jth variable in the ith equation equals σ̂_i²/σ̂_j².

In this example, we aim to determine the number of factors in the dynamic factor model using model selection criteria. In BBE (2005) model comparison is achieved by graphical methods. Our approach can be regarded as a formal statistical alternative to the graphical methods. It is well documented that the determination of the number of factors in the setting of dynamic factor models is important; see Stock and Watson (2011). As in the previous example, we use DIC_7 and RDIC to compare models with different numbers of factors, namely K = 1, 2 and 3, denoted M_1, M_2 and M_3, respectively. Using the Gibbs sampler, we sample 22,000 random observations from the corresponding posterior distributions. We discard the first 2,000 observations and keep the following 20,000 as the effective sample from the posterior distribution of the parameters.

Following the suggestion of a referee, we also compare the alternative models using the marginal likelihood approach. Unfortunately, the prior distributions of Φ and Q in BBE (2005) depend on the latent variables, which leads to implicit joint prior distributions of L, R, Φ and Q. Consequently, it is difficult to calculate the joint prior density of L, R, Φ and Q. To avoid evaluating the joint prior density, we calculate the marginal likelihood by the harmonic mean method (Newton and Raftery 1994), which only requires the reciprocal of the likelihood at each posterior draw of the parameters.

Based on the 20,000 samples, we compute DIC_7, RDIC, and the marginal likelihood for all three models. The technique in Lemma 3.2 is used to approximate the observed-data likelihood at the posterior mean. Table 1 reports the simple count of the number of parameters including the latent variables, DIC_7 and its P_D component (i.e., when the data augmentation technique is used), the simple count of the number of parameters excluding the latent variables, RDIC with its P_D* and D(θ̄) components (i.e., when the data augmentation technique is not used), and the log marginal likelihood.

Table 1: Model selection results for dynamic factor models

Model                                            M_1    M_2    M_3
Number of parameters (with latent variables)     ...    ...    ...
P_D                                              ...    ...    ...
DIC_7                                            ...    ...    ...
Number of parameters (without latent variables)  ...    ...    ...
P_D*                                             ...    ...    ...
D(θ̄)                                            ...    ...    ...
RDIC                                             ...    ...    ...
Log marginal likelihood                          ...    ...    ...

Several conclusions may be drawn from Table 1. First, DIC_7, RDIC and the marginal likelihood all suggest that M_3 is the best model, followed by Model 2 and then by Model 1. Model 3 has a higher effective number of parameters than the other two models, but the gain in the fit to the data is greater. The conclusion is that at least 3 factors are needed to describe the joint movement of the 120 macroeconomic time series. Second, since some very informative priors have been used, neither P_D nor P_D* is close to the actual number of parameters. While it is cheap to compute RDIC, it is much harder to compute DIC_1. This is because the observed-data likelihood p(y|θ) is not available in closed form and the Kalman filter is used to calculate p(y|θ) numerically, which involves the computation of (1/J) Σ_{j=1}^{J} log p(y|θ^(j)) for J = 20,000. We have to run the Kalman filter 20,000 times, which takes more than 4 hours to compute in Matlab (numerically more efficient algorithms, such as the one proposed by Chan and Jeliazkov (2009), may be used to evaluate log p(y|θ^(j))). In sharp contrast, it took less than 80 seconds to compute RDIC. Obviously, the discrepancy in CPU time increases with J.
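The harmonic mean estimator used above needs only the likelihood value at each posterior draw. A minimal sketch, written in logs for numerical stability, is given below; the function name and the log-sum-exp treatment are illustrative assumptions.

import numpy as np

def log_marglik_harmonic_mean(log_lik_draws):
    # Newton and Raftery (1994): p(y) is estimated by the harmonic mean of p(y | theta_j)
    # over posterior draws, i.e. [ (1/J) sum_j 1 / p(y | theta_j) ]^(-1).
    ll = np.asarray(log_lik_draws, dtype=float)
    m = (-ll).max()
    log_mean_inverse_lik = m + np.log(np.mean(np.exp(-ll - m)))      # log of (1/J) sum_j exp(-ll_j)
    return -log_mean_inverse_lik

# usage sketch: pass log p(y | theta_j) computed (e.g., by the Kalman filter) at each draw
# log_ml = log_marglik_harmonic_mean(log_liks)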

4.2 Comparing stochastic volatility models

Stochastic volatility (SV) models have been found very useful for pricing derivative securities. In discrete-time log-normal SV models, the logarithmic volatility is the state variable, which is often assumed to follow an AR(1) model. The basic log-normal SV model is of the form:

y_t = α + exp(h_t/2) u_t,   u_t ~ N(0, 1),
h_t = µ + φ(h_{t−1} − µ) + v_t,   v_t ~ N(0, τ²),

where t = 1, 2, ..., n, y_t is the continuously compounded return, h_t the unobserved log-volatility, h_0 = µ, and u_t and v_t are independent normal variables for all t. In this paper, we denote this model by M_1. To carry out Bayesian analysis of M_1, following Meyer and Yu (2000), the prior distributions are specified as follows: α ~ N(0, 100), µ ~ N(0, 100), φ ~ Beta(1, 1), 1/τ² ~ Γ(0.001, 0.001).

An alternative specification of M_1 is given by:

y_t = α + σ_t u_t,   u_t ~ N(0, 1),
log σ_t² = µ + φ(log σ_{t−1}² − µ) + v_t,   v_t ~ N(0, τ²),
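For reference, data from the basic log-normal SV model M_1 can be simulated as follows. The parameter values below are illustrative assumptions only; they are not the estimates or the data used in this section.

import numpy as np

def simulate_sv(n=1000, alpha=0.0, mu=-1.0, phi=0.95, tau=0.2, seed=0):
    # y_t = alpha + exp(h_t / 2) u_t,  h_t = mu + phi (h_{t-1} - mu) + v_t,
    # u_t ~ N(0, 1), v_t ~ N(0, tau^2), h_0 = mu
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    prev = mu
    for t in range(n):
        h[t] = mu + phi * (prev - mu) + rng.normal(0.0, tau)
        prev = h[t]
    y = alpha + np.exp(h / 2.0) * rng.normal(0.0, 1.0, n)
    return y, h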


ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

BAYESIAN MODEL CRITICISM

BAYESIAN MODEL CRITICISM Monte via Chib s BAYESIAN MODEL CRITICM Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes

More information

Bayesian Hypothesis Testing in Latent Variable Models

Bayesian Hypothesis Testing in Latent Variable Models Bayesian Hypothesis Testing in Latent Variable Models Yong Li Sun Yat-Sen University Jun Yu Singapore Management University Abstract: Hypothesis testing using Bayes factors (BFs) is known to suffer from

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Cross-sectional space-time modeling using ARNN(p, n) processes

Cross-sectional space-time modeling using ARNN(p, n) processes Cross-sectional space-time modeling using ARNN(p, n) processes W. Polasek K. Kakamu September, 006 Abstract We suggest a new class of cross-sectional space-time models based on local AR models and nearest

More information

High-dimensional Problems in Finance and Economics. Thomas M. Mertens

High-dimensional Problems in Finance and Economics. Thomas M. Mertens High-dimensional Problems in Finance and Economics Thomas M. Mertens NYU Stern Risk Economics Lab April 17, 2012 1 / 78 Motivation Many problems in finance and economics are high dimensional. Dynamic Optimization:

More information

Lecture 13 Fundamentals of Bayesian Inference

Lecture 13 Fundamentals of Bayesian Inference Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up

More information

Gibbs Sampling in Latent Variable Models #1

Gibbs Sampling in Latent Variable Models #1 Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

(I AL BL 2 )z t = (I CL)ζ t, where

(I AL BL 2 )z t = (I CL)ζ t, where ECO 513 Fall 2011 MIDTERM EXAM The exam lasts 90 minutes. Answer all three questions. (1 Consider this model: x t = 1.2x t 1.62x t 2 +.2y t 1.23y t 2 + ε t.7ε t 1.9ν t 1 (1 [ εt y t = 1.4y t 1.62y t 2

More information

Generalized Autoregressive Score Models

Generalized Autoregressive Score Models Generalized Autoregressive Score Models by: Drew Creal, Siem Jan Koopman, André Lucas To capture the dynamic behavior of univariate and multivariate time series processes, we can allow parameters to be

More information

Assessing Regime Uncertainty Through Reversible Jump McMC

Assessing Regime Uncertainty Through Reversible Jump McMC Assessing Regime Uncertainty Through Reversible Jump McMC August 14, 2008 1 Introduction Background Research Question 2 The RJMcMC Method McMC RJMcMC Algorithm Dependent Proposals Independent Proposals

More information

Bayesian Model Comparison:

Bayesian Model Comparison: Bayesian Model Comparison: Modeling Petrobrás log-returns Hedibert Freitas Lopes February 2014 Log price: y t = log p t Time span: 12/29/2000-12/31/2013 (n = 3268 days) LOG PRICE 1 2 3 4 0 500 1000 1500

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

The Metropolis-Hastings Algorithm. June 8, 2012

The Metropolis-Hastings Algorithm. June 8, 2012 The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings

More information

Model comparison. Christopher A. Sims Princeton University October 18, 2016

Model comparison. Christopher A. Sims Princeton University October 18, 2016 ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

The Bayesian approach to inverse problems

The Bayesian approach to inverse problems The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu

More information

One-parameter models

One-parameter models One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked

More information

Bayesian Estimation of DSGE Models

Bayesian Estimation of DSGE Models Bayesian Estimation of DSGE Models Stéphane Adjemian Université du Maine, GAINS & CEPREMAP stephane.adjemian@univ-lemans.fr http://www.dynare.org/stepan June 28, 2011 June 28, 2011 Université du Maine,

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation

Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation Spatial Statistics Chapter 4 Basics of Bayesian Inference and Computation So far we have discussed types of spatial data, some basic modeling frameworks and exploratory techniques. We have not discussed

More information

Nowcasting Norwegian GDP

Nowcasting Norwegian GDP Nowcasting Norwegian GDP Knut Are Aastveit and Tørres Trovik May 13, 2007 Introduction Motivation The last decades of advances in information technology has made it possible to access a huge amount of

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

BEAR 4.2. Introducing Stochastic Volatility, Time Varying Parameters and Time Varying Trends. A. Dieppe R. Legrand B. van Roye ECB.

BEAR 4.2. Introducing Stochastic Volatility, Time Varying Parameters and Time Varying Trends. A. Dieppe R. Legrand B. van Roye ECB. BEAR 4.2 Introducing Stochastic Volatility, Time Varying Parameters and Time Varying Trends A. Dieppe R. Legrand B. van Roye ECB 25 June 2018 The views expressed in this presentation are the authors and

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Research Division Federal Reserve Bank of St. Louis Working Paper Series

Research Division Federal Reserve Bank of St. Louis Working Paper Series Research Division Federal Reserve Bank of St Louis Working Paper Series Kalman Filtering with Truncated Normal State Variables for Bayesian Estimation of Macroeconomic Models Michael Dueker Working Paper

More information

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure

A Robust Approach to Estimating Production Functions: Replication of the ACF procedure A Robust Approach to Estimating Production Functions: Replication of the ACF procedure Kyoo il Kim Michigan State University Yao Luo University of Toronto Yingjun Su IESR, Jinan University August 2018

More information

A Note on Lenk s Correction of the Harmonic Mean Estimator

A Note on Lenk s Correction of the Harmonic Mean Estimator Central European Journal of Economic Modelling and Econometrics Note on Lenk s Correction of the Harmonic Mean Estimator nna Pajor, Jacek Osiewalski Submitted: 5.2.203, ccepted: 30.0.204 bstract The paper

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract Bayesian analysis of a vector autoregressive model with multiple structural breaks Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus Abstract This paper develops a Bayesian approach

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Discussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance

Discussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance Discussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance by Casarin, Grassi, Ravazzolo, Herman K. van Dijk Dimitris Korobilis University of Essex,

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Bayesian Inference: Probit and Linear Probability Models

Bayesian Inference: Probit and Linear Probability Models Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

DSGE Methods. Estimation of DSGE models: Maximum Likelihood & Bayesian. Willi Mutschler, M.Sc.

DSGE Methods. Estimation of DSGE models: Maximum Likelihood & Bayesian. Willi Mutschler, M.Sc. DSGE Methods Estimation of DSGE models: Maximum Likelihood & Bayesian Willi Mutschler, M.Sc. Institute of Econometrics and Economic Statistics University of Münster willi.mutschler@uni-muenster.de Summer

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Variational Scoring of Graphical Model Structures

Variational Scoring of Graphical Model Structures Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational

More information

Point, Interval, and Density Forecast Evaluation of Linear versus Nonlinear DSGE Models

Point, Interval, and Density Forecast Evaluation of Linear versus Nonlinear DSGE Models Point, Interval, and Density Forecast Evaluation of Linear versus Nonlinear DSGE Models Francis X. Diebold Frank Schorfheide Minchul Shin University of Pennsylvania May 4, 2014 1 / 33 Motivation The use

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

DIC: Deviance Information Criterion

DIC: Deviance Information Criterion (((( Welcome Page Latest News DIC: Deviance Information Criterion Contact us/bugs list WinBUGS New WinBUGS examples FAQs DIC GeoBUGS DIC (Deviance Information Criterion) is a Bayesian method for model

More information

Riemann Manifold Methods in Bayesian Statistics

Riemann Manifold Methods in Bayesian Statistics Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes

More information

Departamento de Economía Universidad de Chile

Departamento de Economía Universidad de Chile Departamento de Economía Universidad de Chile GRADUATE COURSE SPATIAL ECONOMETRICS November 14, 16, 17, 20 and 21, 2017 Prof. Henk Folmer University of Groningen Objectives The main objective of the course

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET Investigating posterior contour probabilities using INLA: A case study on recurrence of bladder tumours by Rupali Akerkar PREPRINT STATISTICS NO. 4/2012 NORWEGIAN

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Author's personal copy

Author's personal copy Journal of Econometrics 166 (01) 37 46 Contents lists available at SciVerse ScienceDirect Journal of Econometrics journal homepage: wwwelseviercom/locate/jeconom Bayesian hypothesis testing in latent variable

More information

Bayesian Model Diagnostics and Checking

Bayesian Model Diagnostics and Checking Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

A note on Reversible Jump Markov Chain Monte Carlo

A note on Reversible Jump Markov Chain Monte Carlo A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Switching Regime Estimation

Switching Regime Estimation Switching Regime Estimation Series de Tiempo BIrkbeck March 2013 Martin Sola (FE) Markov Switching models 01/13 1 / 52 The economy (the time series) often behaves very different in periods such as booms

More information

Lecture Notes based on Koop (2003) Bayesian Econometrics

Lecture Notes based on Koop (2003) Bayesian Econometrics Lecture Notes based on Koop (2003) Bayesian Econometrics A.Colin Cameron University of California - Davis November 15, 2005 1. CH.1: Introduction The concepts below are the essential concepts used throughout

More information

BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES

BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES Behaviormetrika Vol.36, No., 2009, 27 48 BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES Yanyan Sheng and Christopher K. Wikle IRT-based models with a general ability and several specific

More information

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 5. Bayesian Computation Historically, the computational "cost" of Bayesian methods greatly limited their application. For instance, by Bayes' Theorem: p(θ y) = p(θ)p(y

More information