Efficient Importance Sampling
1 Efficient Importance Sampling. David N. DeJong, University of Pittsburgh, Spring 2008.
2 Our goal is to calculate integrals of the form
G(Y) = ∫_Θ ϕ(θ; Y) dθ.
Special case (e.g., posterior moment):
G(Y) = ∫_Θ φ(θ; Y) p(θ|Y) dθ.
Scenario: analytical solutions to these integrals are unavailable. We will remedy this problem using numerical approximation methods.
3 (cont.) In the context of achieving likelihood evaluation and filtering in state-space representations, recall that we face the challenge of constructing the filtering density
f(s_t|Y_t) = f(y_t, s_t|Y_{t−1}) / f(y_t|Y_{t−1}) = f(y_t|s_t, Y_{t−1}) f(s_t|Y_{t−1}) / f(y_t|Y_{t−1}),
where
f(s_t|Y_{t−1}) = ∫ f(s_t|s_{t−1}, Y_{t−1}) f(s_{t−1}|Y_{t−1}) ds_{t−1},
and
f(y_t|Y_{t−1}) = ∫ f(y_t|s_t, Y_{t−1}) f(s_t|Y_{t−1}) ds_t.
4 (cont.) To gain intuition, suppose the integral we face is of the form
G(Y) = ∫_Θ φ(θ; Y) p(θ|Y) dθ,
and it is possible to obtain pseudo-random drawings θ_i from p(θ|Y). Then by the law of large numbers,
G(Y)_N = (1/N) ∑_{i=1}^N φ(θ_i; Y)
converges in probability to G(Y). We refer to G(Y)_N as the Monte Carlo estimate of G(Y). As we shall see, from the standpoint of numerical efficiency, this represents a best-case scenario.
5 (cont.) The MC estimate of the standard deviation of G(Y) is given by
σ_N(G(Y)) = [(1/N) ∑_i φ(θ_i; Y)² − G(Y)_N²]^{1/2}.
The numerical standard error associated with G(Y)_N is given by
s.e.(G(Y)_N) = σ_N(G(Y)) / √N.
Thus for N = 10,000, s.e.(G(Y)_N) is 1% of the size of σ_N(G(Y)).
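As a concrete sketch of this best-case calculation (in Python rather than the GAUSS used in the course; the target p = N(0,1) and function of interest φ(θ) = θ² are illustrative choices, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target: p(theta|Y) = N(0,1) and phi(theta) = theta^2,
# so the true value of G(Y) = E[phi(theta)] is 1.
N = 10_000
theta = rng.standard_normal(N)                 # direct draws from p(theta|Y)
phi = theta**2

G_N = phi.mean()                               # Monte Carlo estimate of G(Y)
sigma_N = np.sqrt((phi**2).mean() - G_N**2)    # MC estimate of the std. dev.
nse = sigma_N / np.sqrt(N)                     # numerical standard error
```

With N = 10,000, `nse` is exactly 1% of `sigma_N`, as stated on the slide.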
6 Importance Sampling (Geweke, 1989 Econometrica) If p(θ|Y) is unavailable as a sampler, one remedy is to augment the targeted integrand with an importance sampling distribution g(θ|a):
G(Y) = ∫_Θ [ϕ(θ; Y)/g(θ|a)] g(θ|a) dθ = ∫_Θ [φ(θ; Y) p(θ|Y)/g(θ|a)] g(θ|a) dθ.
Key requirements:
- The support of g(θ|a) must span that of ϕ(θ|Y).
- E[G(Y)] must exist and be finite.
- g(θ|a) must be implementable as a sampler.
7 (cont.)
- MC estimate of G(Y):
G(Y)_N = (1/N) ∑_{i=1}^N ω_i; ω_i = ϕ(θ_i|Y)/g(θ_i|a).
- MC estimate of the standard deviation of ω w.r.t. g(θ|a):
σ_N(ω(θ, Y)) = [(1/N) ∑_i ω_i² − G(Y)_N²]^{1/2}.
- Numerical standard error associated with G(Y)_N:
s.e.(G(Y)_N) = σ_N(ω(θ, Y)) / √N.
- Note: variability in ω translates into increased n.s.e. (numerical inefficiency).
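A minimal Python sketch of the weighted estimator (illustrative choices, not from the slides: the targeted integrand ϕ is the N(0,1) density, so G(Y) = 1, and the sampler g is N(0, 2²), whose support spans that of ϕ):

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(theta):
    # Targeted integrand: the N(0,1) density, so G(Y) = 1 (illustrative).
    return np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)

def g_pdf(theta, a=2.0):
    # Importance sampler g(theta|a) = N(0, a^2).
    return np.exp(-0.5 * (theta / a)**2) / (a * np.sqrt(2 * np.pi))

N = 10_000
theta = 2.0 * rng.standard_normal(N)       # draws from g
omega = phi(theta) / g_pdf(theta)          # importance weights omega_i

G_N = omega.mean()                         # IS estimate of G(Y)
sigma_omega = np.sqrt((omega**2).mean() - G_N**2)
nse = sigma_omega / np.sqrt(N)             # variability in omega inflates this
```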
8 (cont.) For the special case in which the integrand factorizes as ϕ(θ|Y) = φ(θ; Y) p(θ|Y),
G(Y)_N = (1/N) ∑_{i=1}^N φ(θ_i; Y) w_i, w_i = p(θ_i|Y)/g(θ_i|a),
σ_N(G(Y)) = [(1/N) ∑_i φ(θ_i; Y)² w_i − G(Y)_N²]^{1/2}.
9 (cont.) Continuing with the special case, if we lack an integrating constant for either p(θ|Y) or g(θ|a), we can work instead with
G(Y)_N = (1/∑_i w_i) ∑_{i=1}^N φ(θ_i; Y) w_i,
σ_N(G(Y)) = [(1/∑_i w_i) ∑_{i=1}^N φ(θ_i; Y)² w_i − G(Y)_N²]^{1/2}.
The impact of ignoring integrating constants is eliminated by the inclusion of the accumulated weights in the denominators of these expressions.
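A sketch of the self-normalized version in Python (illustrative setup, not from the slides: target kernel of N(0,1) and sampler kernel of N(0, 2²), both deliberately left without their integrating constants; φ(θ) = θ², so the true posterior moment is 1):

```python
import numpy as np

rng = np.random.default_rng(2)

def p_kernel(theta):
    # p(theta|Y) known only up to an integrating constant: kernel of N(0,1).
    return np.exp(-0.5 * theta**2)

def g_kernel(theta, a=2.0):
    # Sampler kernel, also without its integrating constant.
    return np.exp(-0.5 * (theta / a)**2)

N = 10_000
theta = 2.0 * rng.standard_normal(N)        # draws from g = N(0, 4)
w = p_kernel(theta) / g_kernel(theta)        # w_i, up to an unknown constant
phi = theta**2                               # function of interest

# Accumulated weights in the denominators eliminate the unknown constants.
G_N = np.sum(phi * w) / np.sum(w)
sigma_N = np.sqrt(np.sum(phi**2 * w) / np.sum(w) - G_N**2)
```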
10 (cont.) Regardless of whether p(θ|Y) and g(θ|a) are proper p.d.f.s, notice that when p(θ|Y) itself can be used as a sampling density (i.e., p(θ|Y) = g(θ|a)), then w_i = 1 ∀ i, and we revert to the best-case scenario. In general, the accuracy of our approximation will fall short of the best-case scenario. To judge the degree of the shortfall, various metrics are available.
11 (cont.) Two metrics for judging numerical accuracy:
- max_i w_i² / ∑_i w_i² (good practical benchmark: 1%)
- relative numerical efficiency (RNE)
12 (cont.) Motivation for RNE. Issue: how close are we to the best-case scenario?
- Under the best-case scenario, recall that the n.s.e. is given by
s.e.(G(Y)_N) = σ_N(G(Y)) / √N.
- The actual n.s.e. is given by
s.e.(G(Y)_N) = σ_N(ω(θ, Y)) / √N,
where recall
σ_N(ω(θ, Y)) = [(1/N) ∑_i ω_i² − G(Y)_N²]^{1/2}, ω_i = ϕ(θ_i|Y)/g(θ_i|a).
13 (cont.) The idea behind RNE is to compare the actual n.s.e. to an estimate of the optimal (best-case) n.s.e.:
RNE = (ideal n.s.e.)² / (actual n.s.e.)² = [σ_N(G(Y))²/N] / [s.e.(G(Y)_N)]².
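Both accuracy metrics can be sketched in Python for a deliberately mediocre sampler (illustrative setup, not from the slides: target p = N(0,1), φ(θ) = θ², sampler g = N(0.5, 1), all densities fully normalized, so p/g = exp(0.125 − θ/2)):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative setup: target p = N(0,1), phi(theta) = theta^2, sampler g = N(0.5, 1).
N = 10_000
theta = 0.5 + rng.standard_normal(N)             # draws from g
p_over_g = np.exp(0.125 - 0.5 * theta)           # p(theta)/g(theta)
phi = theta**2
omega = phi * p_over_g                           # omega_i for estimating E_p[theta^2] = 1

G_N = omega.mean()

# Metric 1: largest squared weight relative to the total (benchmark: ~1%).
maxsq_totsq = omega.max()**2 / np.sum(omega**2)

# Metric 2: RNE = (ideal n.s.e.)^2 / (actual n.s.e.)^2; the common factor 1/N cancels.
sigma_ideal_sq = np.mean(phi**2 * p_over_g) - G_N**2   # est. variance of phi under p
sigma_omega_sq = np.mean(omega**2) - G_N**2            # variance of omega under g
rne = sigma_ideal_sq / sigma_omega_sq
```

Here the mediocre sampler yields an RNE well below 1: more draws are needed to match the accuracy of direct sampling from p.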
14 (cont.) Rearranging yields
s.e.(G(Y)_N) = σ_N(G(Y)) / √(N · RNE).
Note: relative to the best-case scenario, √(N · RNE) replaces √N in the denominator. Thus the further RNE is from 1, the more draws are required to achieve a given level of accuracy relative to the best-case scenario.
15 Example. Suppose p(θ|Y) ∼ N(µ, Σ); the slide tabulates µ, sqrt(diag(Σ)), and corr(Σ). Statistics of interest: E(sumc(µ)), E(prodc(µ)).
16 (cont.) Using rndseed and N = 10,000, MC estimates (µ̂, sqrt(diag(Σ̂)), n.s.e.(µ̂)): (table displayed on slide.)
17 (cont.) MC estimate of corr(Σ): (table displayed on slide.)
18 (cont.) MC estimates of statistics (mean, std. dev., n.s.e.(mean)) for E(sumc(µ)) and E(prodc(µ)): (table displayed on slide.)
19 (cont.) Exercise: replicate. Hint: to obtain draws from the N(µ, Σ) distribution:
- swish = chol(sig)'; swish is the lower-triangular Cholesky factor of sig (i.e., sig = swish*swish').
- draw = mu + swish*rndn(n,1);
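The same hint translated to Python (the slide's GAUSS `chol` returns the upper factor, hence the transpose there; NumPy's `cholesky` already returns the lower factor; `mu` and `sig` below are illustrative, not the slide's values):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative mu and sig (not the values displayed on the slide).
mu = np.array([1.0, 2.0])
sig = np.array([[1.0, 0.5],
                [0.5, 2.0]])

# numpy's cholesky returns the lower-triangular factor, so no transpose needed.
swish = np.linalg.cholesky(sig)          # sig = swish @ swish.T

N = 100_000
draws = mu + (swish @ rng.standard_normal((2, N))).T   # each row ~ N(mu, sig)
```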
20 (cont.) Suppose instead we seek to obtain estimates using an importance sampling density g(θ|a) ∼ N(µ_I, Σ_I), with
µ_I = µ − sqrt(diag(Σ)), Σ_I = diagrv(Σ, sqrt(diag(Σ))).
21 (cont.) Using rndseed and N = 10,000, MC versus IS estimates (µ̂, sqrt(diag(Σ̂)), n.s.e.(µ̂)): (table displayed on slide.)
22 (cont.) MC versus IS estimates of corr(Σ): (table displayed on slide.)
23 (cont.) MC versus IS estimates of statistics (mean, std. dev., n.s.e.(mean)) for E(sumc(µ)) and E(prodc(µ)): (table displayed on slide.)
24 (cont.) Accuracy diagnostics. Summary statistics on weights (avg, stdev, min, max, maxsq/totsq), RNEs, and 1/RNEs: (table displayed on slide.)
25 (cont.) Plot of weights (all, top 1,000, top 100): (plots displayed on slide.)
26 (cont.) Exercise: replicate.
27 From a programming perspective, two simple approaches to improving efficiency are:
- Increase N (RNEs indicate good rules of thumb for the necessary increases). This brute-force method is often computationally prohibitive.
- Sequential updating. (Can still be expensive, but less brutish.)
28 (cont.) Sequential updating:
- Begin with an initial parameterization a_0 for g(θ|a) (e.g., (µ_0, Σ_0)).
- Calculate θ̂_0; map into a_1.
- Repeat until a_i yields an acceptable level of numerical accuracy.
29 (cont.) Returning to the example, RNEs and 1/RNEs evolve as follows across iterations 0 through 3: (table displayed on slide.)
30 (cont.) Weight plots, iteration 3: (plots displayed on slide.)
31 (cont.) Caveats regarding sequential updating:
- Convergence to the target is not guaranteed.
- Performance can be sensitive to starting values.
- The initial sampler should be sufficiently diffuse to ensure coverage of the appropriate range of the targeted integrand.
32 (cont.) Aside regarding coverage: the multivariate-t density is an attractive sampler relative to the normal density: it has similar location and shape parameters, but thicker tails. Tail thickness is controlled by the degrees-of-freedom parameter υ (smaller υ, thicker tails).
- Parameters of the multivariate-t: (γ, V, υ).
- Mean and second-order moments: γ and V · υ/(υ − 2).
33 (cont.) Algorithm for obtaining drawings µ_i from the multivariate-t(γ, V, υ):
- Obtain s_i from a χ²(υ) distribution: s_i = ∑_{j=1}^υ x_j², x_j ∼ N(0, 1).
- Use s_i to construct the scaling factor σ_i = (s_i/υ)^{−1/2}.
- Obtain µ_i as
µ_i = γ + σ_i V^{1/2} w_i, w_{i,j} ∼ N(0, 1), j = 1, ..., k, V^{1/2} = chol(V^{−1})′.
- Note: use of −w_i yields antithetic acceleration (Geweke, 1988, JoE).
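A Python sketch of these steps with antithetic pairs (using the standard construction with L = chol(V), V = LL′; the slide's chol(V^{−1})′ parameterization differs only in how the matrix square root is formed; γ, V, υ below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_mvt(gamma, V, nu, n, rng):
    """Draw n antithetic pairs from a multivariate-t(gamma, V, nu)."""
    k = len(gamma)
    L = np.linalg.cholesky(V)                  # V = L @ L.T
    w = rng.standard_normal((n, k))
    s = rng.chisquare(nu, size=(n, 1))         # s_i: sum of nu squared N(0,1) draws
    sigma = (s / nu) ** -0.5                   # scaling factor sigma_i = (s_i/nu)^(-1/2)
    plus = gamma + sigma * (w @ L.T)
    minus = gamma - sigma * (w @ L.T)          # antithetic partner: use of -w_i
    return np.vstack([plus, minus])

gamma = np.array([0.0, 1.0])
V = np.array([[1.0, 0.3], [0.3, 1.0]])
nu = 10
draws = draw_mvt(gamma, V, nu, 50_000, rng)
```

By construction the antithetic pairs average to γ exactly, and the second moments approach V · υ/(υ − 2).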
34 (cont.) Exercise: replace the normal densities used as samplers in the exercise above with multivariate-t densities, with γ = µ and V^{−1} = Σ. Experiment with alternative υ's to assess the impact on numerical efficiency.
35 Efficient Importance Sampling (Richard and Zhang, 2007 JoE) Goal: tailor g(θ|a) (via the specification of a) to minimize the n.s.e. associated with the approximation of
G(Y) = ∫_Θ ϕ(θ|Y) dθ.
Write g(θ|a) as
g(θ|a) = k(θ; a)/χ(a), χ(a) = ∫_Θ k(θ; a) dθ.
36 (cont.) Details regarding the tailoring of g(θ|a) are distinct for two special cases:
- g(θ|a) is parametric (e.g., a normal distribution)
- g(θ|a) is piecewise-linear
37 (cont.) When g(θ|a) is fully parametric, the n.s.e. is (approximately) minimized via iterations on
(â_{l+1}, ĉ_{l+1}) = argmin_{a,c} Q_N(a, c; Y | â_l),
Q_N(a, c; Y | â_l) = (1/N) ∑_{i=1}^N d(θ_i^l, a, c, Y)² ω(θ_i^l; Y, â_l),
d(θ_i^l, a, c, Y) = ln ϕ(θ_i^l|Y) − c − ln k(θ_i^l; a),
ω(θ_i^l; Y, â_l) = ϕ(θ_i^l|Y)/g(θ_i^l|â_l).
The term c is a normalizing constant that controls for factors in ϕ and g that do not depend upon θ. Typically, it suffices to set ω(θ_i^l; Y, â_l) = 1 ∀ i.
38 (cont.) When g(θ|a) is piecewise-linear, the parameters a are grid points:
a′ = (a_0, ..., a_R), a_0 < a_1 < ... < a_R.
In this case, the kernel k(θ; a) is given by
ln k_j(θ; a) = α_j + β_j θ, ∀ θ ∈ [a_{j−1}, a_j],
β_j = [ln ϕ(a_j) − ln ϕ(a_{j−1})] / (a_j − a_{j−1}), α_j = ln ϕ(a_j) − β_j a_j.
Optimization is achieved by selecting â as an equal-probability division of the support of ϕ(θ|Y).
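The kernel coefficients above are simple chord slopes and intercepts of ln ϕ between grid points; a Python sketch (illustrative target: the unnormalized kernel of N(0,1)):

```python
import numpy as np

# Piecewise log-linear kernel coefficients for an illustrative target:
# ln phi of an (unnormalized) N(0,1) kernel.
def ln_phi(theta):
    return -0.5 * theta**2

a = np.linspace(-3.0, 3.0, 11)                                # a_0, ..., a_R with R = 10
beta = (ln_phi(a[1:]) - ln_phi(a[:-1])) / (a[1:] - a[:-1])    # beta_j
alpha = ln_phi(a[1:]) - beta * a[1:]                          # alpha_j
```

By construction the chord α_j + β_j θ interpolates ln ϕ exactly at both endpoints of each interval.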
39 (cont.) Given final estimates (â, ĉ), the estimate of G(Y) is given by
G(Y)_N = (1/N) ∑_{i=1}^N ω(θ_i; Y, â).
The n.s.e. is computed as indicated above.
40 EIS, Gaussian Sampler. To simplify notation, denote the targeted integrand as ϕ(θ|Y) ≡ ϕ(θ). We'll take θ as k-dimensional, with elements (x_1, x_2, ..., x_k). With g(θ|a) Gaussian, a consists of the k×1 vector of means µ and the k×k covariance matrix Σ. Since the covariance matrix is symmetric, the number of auxiliary parameters reduces to k + k(k+1)/2. The precision matrix is H = Σ^{−1}.
41 EIS, Gaussian Sampler, cont. Our goal is to choose (µ, H) to optimally approximate ln ϕ(θ) by a Gaussian kernel:
ln ϕ(θ) ≈ −(1/2)(θ − µ)′H(θ − µ) = −(1/2)(θ′Hθ − 2θ′Hµ) + const.
Recall that by optimally, we refer to the weighted-squared-error minimization introduced above.
42 EIS, Gaussian Sampler, cont. The term θ′Hθ can be written as
(x_1 x_2 ... x_k) [h_11 h_12 ... h_1k; h_21 h_22 ... h_2k; ...; h_k1 h_k2 ... h_kk] (x_1, x_2, ..., x_k)′.
43 EIS, Gaussian Sampler, cont. Expanding θ′Hθ, we obtain
θ′Hθ = h_11 x_1² + h_22 x_2² + ... + h_kk x_k²
+ 2h_21 (x_2 x_1) + 2h_31 (x_3 x_1) + ... + 2h_k1 (x_k x_1)
+ 2h_32 (x_3 x_2) + 2h_42 (x_4 x_2) + ... + 2h_k2 (x_k x_2)
+ ... + 2h_{k,k−1} (x_k x_{k−1}).
This expression indicates that the coefficients on the squares, pairwise products, and individual components of θ are in one-to-one correspondence with the elements of µ and H.
44 EIS, Gaussian Sampler, cont. Given this correspondence, the optimization problem amounts to a weighted-least-squares problem involving the regression of ln ϕ(θ) on
1, x_1, ..., x_k, x_1², x_2², ..., x_k², x_1 x_2, x_1 x_3, ..., x_{k−1} x_k.
The number of regressors is K = 1 + k + k(k+1)/2.
45 EIS, Gaussian Sampler, cont. Algorithm
- Specify a_0 and generate
y = (ln ϕ(θ_1), ..., ln ϕ(θ_M))′, w = (ω_1, ..., ω_M)′, ω_i = ϕ(θ_i|Y)/g(θ_i|a_0),
X = (κ_1, ..., κ_M)′, κ_i = (1 ~ θ_i′ ~ vech(θ_i θ_i′)′).
(Note: M << N.)
46 EIS, Gaussian Sampler, cont. Algorithm, cont.
- Construct ỹ = y .* (w.^2), X̃ = X .* (w.^2). (Caution: set w = 1 when using a poor initial sampling density.)
- Estimate β̂ = (X̃′X̃)^{−1} X̃′ỹ.
- Map β̂ into (µ̂, Σ̂). Jointly, these constitute a_1.
- Replacing a_0 above with a_1, repeat until convergence.
47 EIS, Gaussian Sampler, cont. Algorithm, cont. To map β̂ into (µ̂, Σ̂):
- Map elements k+2 through K of β̂ into a symmetric matrix H̃, with j-th diagonal element corresponding to the coefficient associated with the squared value of the j-th element of θ, and (j, l)-th element corresponding to the product of the j-th and l-th elements of θ. I.e., eh = xpnd(b[k+2:rows(b)]);
- Construct Ĥ by multiplying all elements of H̃ by −1, then multiplying the diagonal elements by 2.
- Σ̂ = Ĥ^{−1}.
- µ̂ = Σ̂ β̂[2:k+1].
48 EIS, Gaussian Sampler, cont. Exercise: return to the example outlined above. Using an initial sampler specified with µ_0 = µ + 3·sqrt(diag(Σ)), Σ_0 = 10·diagrv(Σ, sqrt(diag(Σ))), show that the algorithm yields µ̂ = µ, Σ̂ = Σ in one step using M = 50, w = 1. Experiment with alternative initial samplers.
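The regression step and the β̂ → (µ̂, Σ̂) mapping above can be sketched in Python for k = 2 (µ, Σ, and the poor initial sampler are illustrative, not the slide's values). Because ln ϕ is exactly quadratic when the target is Gaussian, one step with w = 1 recovers (µ, Σ) exactly, which is the point of the slide-48 exercise:

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative Gaussian target: ln phi is an exact quadratic kernel of N(mu, Sigma).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
H = np.linalg.inv(Sigma)

def ln_phi(th):
    d = th - mu
    return -0.5 * np.einsum('ij,jk,ik->i', d, H, d)   # up to an additive constant

# Draws from a deliberately poor, diffuse initial sampler a_0; w = 1 throughout.
M = 50
th = mu + 3.0 + 10.0 * rng.standard_normal((M, 2))

# Regressors kappa_i = (1, x1, x2, x1^2, x1*x2, x2^2).
x1, x2 = th[:, 0], th[:, 1]
X = np.column_stack([np.ones(M), x1, x2, x1**2, x1 * x2, x2**2])
beta = np.linalg.lstsq(X, ln_phi(th), rcond=None)[0]

# xpnd-style reshaping of the quadratic coefficients into a symmetric matrix,
# then: multiply all elements by -1, multiply the diagonal by 2.
Htilde = np.array([[beta[3], beta[4]],
                   [beta[4], beta[5]]])
H_hat = -Htilde.copy()
H_hat[np.diag_indices_from(H_hat)] *= 2.0

Sigma_hat = np.linalg.inv(H_hat)
mu_hat = Sigma_hat @ beta[1:3]        # linear coefficients equal H mu
```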
49 EIS, Piecewise-Linear Approximation. Context: effective for use in univariate cases featuring obvious deviations from normality. (The curse of dimensionality renders implementation problematic in high-dimensional cases.)
50 EIS, PW-L Approx., cont. Motivation 1. Mixture of normals, (µ, σ) = (−0.5, 0.25), (0.5, 0.5), equal weights. (Plot displayed on slide.)
51 EIS, PW-L Approx., cont. Approximation via Gaussian samplers (handmade and EIS): RNEs for calculating E(x): 0.61, 0.78. (Plots displayed on slide.)
52 EIS, PW-L Approx., cont. Motivation 2. Mixture of normals, (µ, σ) = (−1.5, 0.25), (1.5, 0.5), equal weights. RNE: 0.37. (Plot displayed on slide.)
53 EIS, PW-L Approx., cont. Motivation 3. Mixture of normals, (µ, σ) = (−1.5, 0.05), (1.5, 0.5), equal weights. RNE: 0.11. (Plot displayed on slide.)
54 EIS, PW-L Approx., cont. Reboot: recall that when g(θ|a) is piecewise-linear, the parameters a are grid points:
a′ = (a_0, ..., a_R), a_0 < a_1 < ... < a_R.
In this case, the kernel k(θ; a) is given by
ln k_j(θ; a) = α_j + β_j θ, ∀ θ ∈ [a_{j−1}, a_j],
β_j = [ln ϕ(a_j) − ln ϕ(a_{j−1})] / (a_j − a_{j−1}), α_j = ln ϕ(a_j) − β_j a_j.
Optimization is achieved by selecting â as an equal-probability division of the support of ϕ(θ|Y).
55 EIS, PW-L Approx., cont. In order to achieve an equal-probability division, and to implement the distribution as a sampler, we must
- construct its associated CDF
- invert the CDF.
56 EIS, PW-L Approx., cont. To gain intuition behind implementation and inversion, consider the CDF associated with Case 3 (displayed on slide). Inversion involves inducing a mapping from the y axis to the x axis; implementation involves mapping drawings obtained from a U[0, 1] distribution onto the x axis.
57 EIS, PW-L Approx., cont. Letting s denote the x-axis variable, the CDF of k can be written as
K_j(s; a) = χ_j(s; a)/χ_R(a), ∀ s ∈ [a_{j−1}, a_j],
χ_j(s; a) = χ_{j−1}(a) + (1/β_j)[k_j(s; a) − k_j(a_{j−1}; a)],
χ_0(a) = 0, χ_j(a) ≡ χ_j(a_j; a).
Note: χ_R(a) is the integrating constant associated with the p.d.f.
58 EIS, PW-L Approx., cont. Inversion/implementation:
s = (1/β_j){ln[k_j(a_{j−1}; a) + β_j (u χ_R(a) − χ_{j−1}(a))] − α_j},
u ∈ ]0, 1[ and χ_{j−1}(a) < u χ_R(a) < χ_j(a).
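The CDF recursion of slide 57 and the inversion formula of slide 58 can be sketched together in Python (illustrative target: the kernel of N(0,1); the grid and sample size are assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative target: the (unnormalized) kernel of N(0,1).
def ln_phi(x):
    return -0.5 * x**2

a = np.linspace(-4.0, 4.0, 81)                 # grid a_0 < ... < a_R
lnp = ln_phi(a)
beta = (lnp[1:] - lnp[:-1]) / (a[1:] - a[:-1])            # beta_j
alpha = lnp[1:] - beta * a[1:]                            # alpha_j
k_right = np.exp(lnp[1:])                                 # k_j(a_j) = phi(a_j)
k_left = np.exp(lnp[:-1])                                 # k_j(a_{j-1}) = phi(a_{j-1})

# chi_j recursion: chi_j = chi_{j-1} + (1/beta_j)[k_j(a_j) - k_j(a_{j-1})].
chi = np.concatenate([[0.0], np.cumsum((k_right - k_left) / beta)])
chi_R = chi[-1]                                           # integrating constant

def sample(u):
    # Find j with chi_{j-1} < u*chi_R < chi_j, then apply the slide-58 inversion.
    t = u * chi_R
    j = np.searchsorted(chi, t) - 1                       # interval index
    return (np.log(k_left[j] + beta[j] * (t - chi[j])) - alpha[j]) / beta[j]

draws = sample(rng.uniform(1e-4, 1 - 1e-4, size=100_000))
```

Since the target here is nearly Gaussian, the draws should reproduce a standard normal closely.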
59 EIS, PW-L Approx., cont. Unrefined (equal spacing over the x range) piecewise approximation for Case 1 (plot displayed on slide). RNE: 0.99.
60 EIS, PW-L Approx., cont. Unrefined (equal spacing over the x range) piecewise approximation for Case 2 (plot displayed on slide). RNE: 0.99.
61 EIS, PW-L Approx., cont. Unrefined (equal spacing over the x range) piecewise approximation for Case 3 (plot displayed on slide). RNE: 0.24.
62 EIS, PW-L Approx., cont. To understand the source of the problem in Case 3, consider again the CDF of the unrefined sampler (displayed on slide). Note: relatively few grid points fall along the steep portions of the CDF.
63 EIS, PW-L Approx., cont. To address the problem: equal-probability division of the range of s. I.e., divide the vertical axis of the CDF into equal portions, then map into s:
u_i = ε + (1 − 2ε)(i/R), i = 1, ..., R − 1,
with ε sufficiently small (typically ε = 10^{−4}) to avoid tail intervals of excessive length.
64 EIS, PW-L Approx., cont. Iterative construction: given the step-l grid â^l, construct the density kernel k and its CDF K as described above. The step-(l+1) grid is then computed as
â_i^{l+1} = K^{−1}(u_i), i = 1, ..., R − 1.
Iterate until (approximate) convergence.
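The refinement loop can be sketched in Python (illustrative target: the kernel of N(0,1); the starting grid, ε, and the reconstructed form of u_i are assumptions of this sketch). Each step rebuilds the piecewise log-linear CDF on the current grid and places the new grid points at equal CDF increments:

```python
import numpy as np

# Illustrative target: the (unnormalized) kernel of N(0,1).
def ln_phi(x):
    return -0.5 * x**2

def refine(a, eps=1e-4):
    # Rebuild the piecewise log-linear kernel and its CDF on grid a,
    # then return the equal-probability grid a^{l+1} = K^{-1}(u_i).
    lnp = ln_phi(a)
    beta = (lnp[1:] - lnp[:-1]) / (a[1:] - a[:-1])
    alpha = lnp[1:] - beta * a[1:]
    k_r, k_l = np.exp(lnp[1:]), np.exp(lnp[:-1])        # kernel at interval endpoints
    chi = np.concatenate([[0.0], np.cumsum((k_r - k_l) / beta)])
    R = len(a) - 1
    u = eps + (1 - 2 * eps) * np.arange(1, R) / R        # u_i, i = 1, ..., R-1
    t = u * chi[-1]
    j = np.searchsorted(chi, t) - 1
    inner = (np.log(k_l[j] + beta[j] * (t - chi[j])) - alpha[j]) / beta[j]
    return np.concatenate([[a[0]], inner, [a[-1]]])      # keep the support endpoints

a = np.linspace(-4.0, 4.0, 41)                           # step-0 grid (equal x spacing)
for _ in range(3):
    a = refine(a)
```

After a few iterations the CDF increments between adjacent grid points are (nearly) equal, concentrating grid points on the steep portions of the CDF.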
65 EIS, PW-L Approx., cont. Refined approximation for Case 3 (plot displayed on slide). RNE: 0.95.
66 EIS, PW-L Approx., cont. Refined CDF (plot displayed on slide).
More informationBayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference
Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationOnline Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access
Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing
More informationBAYESIAN ANALYSIS OF DSGE MODELS - SOME COMMENTS
BAYESIAN ANALYSIS OF DSGE MODELS - SOME COMMENTS MALIN ADOLFSON, JESPER LINDÉ, AND MATTIAS VILLANI Sungbae An and Frank Schorfheide have provided an excellent review of the main elements of Bayesian inference
More informationA Bayesian perspective on GMM and IV
A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationExercises Chapter 4 Statistical Hypothesis Testing
Exercises Chapter 4 Statistical Hypothesis Testing Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 5, 013 Christophe Hurlin (University of Orléans) Advanced Econometrics
More informationWeek 1 Quantitative Analysis of Financial Markets Distributions A
Week 1 Quantitative Analysis of Financial Markets Distributions A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October
More informationANOVA: Analysis of Variance - Part I
ANOVA: Analysis of Variance - Part I The purpose of these notes is to discuss the theory behind the analysis of variance. It is a summary of the definitions and results presented in class with a few exercises.
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte
More informationMultivariate Time Series: VAR(p) Processes and Models
Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with
More informationECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering
ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationSimulation. Where real stuff starts
1 Simulation Where real stuff starts ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2 1. What is a simulation? 3 What is a simulation?
More informationIntroduction to Maximum Likelihood Estimation
Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:
More informationBiostat 2065 Analysis of Incomplete Data
Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies
More informationPOL 681 Lecture Notes: Statistical Interactions
POL 681 Lecture Notes: Statistical Interactions 1 Preliminaries To this point, the linear models we have considered have all been interpreted in terms of additive relationships. That is, the relationship
More informationPractical Bayesian Optimization of Machine Learning. Learning Algorithms
Practical Bayesian Optimization of Machine Learning Algorithms CS 294 University of California, Berkeley Tuesday, April 20, 2016 Motivation Machine Learning Algorithms (MLA s) have hyperparameters that
More informationStatistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart
Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms
More informationSome Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression
Working Paper 2013:9 Department of Statistics Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Ronnie Pingel Working Paper 2013:9 June
More informationIn manycomputationaleconomicapplications, one must compute thede nite n
Chapter 6 Numerical Integration In manycomputationaleconomicapplications, one must compute thede nite n integral of a real-valued function f de ned on some interval I of
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationStudent-t Process as Alternative to Gaussian Processes Discussion
Student-t Process as Alternative to Gaussian Processes Discussion A. Shah, A. G. Wilson and Z. Gharamani Discussion by: R. Henao Duke University June 20, 2014 Contributions The paper is concerned about
More information1 Data Arrays and Decompositions
1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is
More informationcomponent risk analysis
273: Urban Systems Modeling Lec. 3 component risk analysis instructor: Matteo Pozzi 273: Urban Systems Modeling Lec. 3 component reliability outline risk analysis for components uncertain demand and uncertain
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationThe Kalman filter, Nonlinear filtering, and Markov Chain Monte Carlo
NBER Summer Institute Minicourse What s New in Econometrics: Time Series Lecture 5 July 5, 2008 The Kalman filter, Nonlinear filtering, and Markov Chain Monte Carlo Lecture 5, July 2, 2008 Outline. Models
More information(17) (18)
Module 4 : Solving Linear Algebraic Equations Section 3 : Direct Solution Techniques 3 Direct Solution Techniques Methods for solving linear algebraic equations can be categorized as direct and iterative
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More information