Semi-Parametric Importance Sampling for Rare-Event Probability Estimation

Z. I. Botev and P. L'Ecuyer
IMACS Seminar 2011, Borovets, Bulgaria

Outline
- Formulation of the importance sampling problem.
- Background material on currently suggested adaptive importance sampling methods.
- The proposed methodology.
- Numerical example taken from recent work by Asmussen, Blanchet, Juneja, and Rojas-Nandayapa: tail probabilities of sums of correlated lognormals.
- Conclusions.

Problem formulation
The problem is to estimate high-dimensional integrals of the form

  ℓ = ∫ H(x) f(x) dx = E_f[H(X)],

where H : R^d → R and X is a d-dimensional random vector with pdf f. For discrete counting or combinatorial problems we simply replace the integral by a sum. How do we estimate such integrals via Monte Carlo?
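As a concrete illustration of the crude (non importance sampling) Monte Carlo estimator, here is a minimal Python sketch; the toy integrand H and the sampler for f below are hypothetical examples, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_mc(H, sample_f, m=100_000):
    """Estimate ell = E_f[H(X)] by averaging H over m iid draws from f."""
    X = sample_f(m)                                 # shape (m, d)
    Z = H(X)                                        # shape (m,)
    est = Z.mean()
    rel_err = Z.std(ddof=1) / (np.sqrt(m) * est)    # estimated relative error
    return est, rel_err

# Toy example (hypothetical): f = N(0, I_2), H(x) = I{x_1 + x_2 >= 4}.
est, re = crude_mc(lambda X: (X.sum(axis=1) >= 4.0).astype(float),
                   lambda m: rng.standard_normal((m, 2)))
print(est, re)
```

For rare events the hit rate, and hence the relative error, of this crude estimator degrades quickly, which is what motivates importance sampling on the next slide.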

Existing methods
Importance sampling: let g be an importance sampling density such that g(x) = 0 ⇒ H(x) f(x) = 0 for all x. Generate X_1, ..., X_m iid from g; then an unbiased estimator of ℓ is

  ℓ̂ = (1/m) Σ_{k=1}^m Z_k,   with Z_k = H(X_k) f(X_k)/g(X_k).

The minimum-variance importance sampling density is

  π(x) = H(x) f(x)/ℓ,   (H(x) ≥ 0),

which depends on ℓ and is therefore not useful directly.
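A generic sketch of the importance sampling estimator above; the density and sampler arguments are placeholders to be supplied by the user, and the one-dimensional usage example (estimating P(X ≥ 4) for X ~ N(0, 1) with proposal g = N(4, 1)) is only illustrative.

```python
import numpy as np
from scipy import stats

def importance_sampling(H, f_pdf, g_pdf, sample_g, m=100_000, rng=None):
    """Unbiased IS estimator (1/m) sum_k H(X_k) f(X_k)/g(X_k) with X_k iid from g.
    Requires g(x) = 0  =>  H(x) f(x) = 0, so the likelihood ratio is well defined."""
    rng = rng or np.random.default_rng()
    X = sample_g(m, rng)
    Z = H(X) * f_pdf(X) / g_pdf(X)
    est = Z.mean()
    rel_err = Z.std(ddof=1) / (np.sqrt(m) * est)
    return est, rel_err

# Illustrative use: ell = P(X >= 4), X ~ N(0, 1), proposal g = N(4, 1).
est, re = importance_sampling(
    H=lambda x: (x >= 4.0).astype(float),
    f_pdf=lambda x: stats.norm.pdf(x),
    g_pdf=lambda x: stats.norm.pdf(x, loc=4.0),
    sample_g=lambda m, rng: rng.normal(4.0, 1.0, m))
print(est, re)
```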

Existing importance sampling ideas
Existing methods for selecting the density g assume that g is part of a parametric family {g(·; η)}. The objective is to select the parameter η so that g is as close to the optimal density π as possible. The closeness between π and g is measured by the φ-divergence

  ∫ π(x) φ(g(x; η)/π(x)) dx,   (1)

where φ : R_+ → R is twice continuously differentiable, φ(1) = 0, and φ''(x) > 0 for all x > 0.

Variance Minimization method
Variance Minimization (VM) method: equivalent to minimizing the φ-divergence with φ(z) = 1/z. The resulting minimization

  argmin_η ∫ π²(x)/g(x; η) dx

is highly nonlinear. Moreover, the φ-divergence has to be estimated, so we face a noisy nonlinear optimization problem.
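To make the "noisy nonlinear optimization" point concrete, here is a hedged one-dimensional sketch: f = N(0, 1), H(x) = I{x ≥ γ}, and a translation family g(·; η) = N(η, 1); the objective ∫ π²/g is replaced by the (proportional) empirical second moment of the IS weights and handed to a derivative-free optimizer. All of these choices are illustrative, not from the slides.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

gamma = 4.0
rng = np.random.default_rng(1)

def noisy_vm_objective(eta, m=20_000):
    """Noisy estimate of E_g[(H(X) f(X)/g(X; eta))^2], which is proportional
    to the VM objective int pi^2/g (up to the constant factor ell^2)."""
    X = rng.normal(eta, 1.0, m)
    w = stats.norm.pdf(X) / stats.norm.pdf(X, loc=eta)   # likelihood ratio f/g
    Z = (X >= gamma) * w
    return np.mean(Z ** 2)

# Derivative-free search; because the objective is noisy, the result varies from run to run.
res = minimize_scalar(noisy_vm_objective, bounds=(0.0, 8.0), method="bounded")
print(res.x)
```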

Cross-Entropy method
Cross-Entropy (CE) method: equivalent to minimizing the φ-divergence with φ(z) = −ln(z). The minimization of the Kullback–Leibler distance

  argmin_η ∫ π(x) ln(π(x)/g(x; η)) dx

is similar to likelihood maximization, and we thus frequently have analytical solutions to the optimization problem. The Kullback–Leibler distance still has to be estimated.
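For the same illustrative one-dimensional setting, a hedged sketch of a single CE update: since argmin_η KL(π‖g(·; η)) = argmax_η E_f[H(X) ln g(X; η)], the update is a weighted maximum-likelihood step, which for the Gaussian location family N(η, 1) reduces to a weighted sample mean. The multi-level (elite-sample) machinery used for genuinely rare events is omitted here.

```python
import numpy as np
from scipy import stats

def ce_update(eta_old, gamma, m=20_000, rng=None):
    """One cross-entropy update for g(.; eta) = N(eta, 1) and target
    pi(x) proportional to I{x >= gamma} * phi(x): draw from the current g,
    weight by W = H * f/g, and return the weighted mean (the closed-form
    weighted-MLE solution for the Gaussian location family)."""
    rng = rng or np.random.default_rng()
    X = rng.normal(eta_old, 1.0, m)
    W = (X >= gamma) * stats.norm.pdf(X) / stats.norm.pdf(X, loc=eta_old)
    if W.sum() == 0.0:      # event never hit; a real implementation would lower the level
        return eta_old
    return float(np.sum(W * X) / np.sum(W))

rng = np.random.default_rng(2)
eta = 2.0                   # starting value close enough to the event to get hits
for _ in range(5):
    eta = ce_update(eta, gamma=4.0, rng=rng)
print(eta)                  # approaches E[X | X >= 4] under N(0, 1), about 4.2
```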

Shortcomings
Both the Cross-Entropy and Variance Minimization methods suffer from the following shortcomings:
- Their performance is limited by how well a simple and rigid parametric density can approximate the optimal π. Ideally we would like a flexible non-parametric model for the importance sampling density, but this is often impossible.
- The VM method always requires non-linear, non-convex optimization, and the Cross-Entropy method is only as simple as likelihood maximization can be.

MCMC methods for estimation
The Bayesian community has alternatives to parametric importance sampling that use MCMC:
- Chib's method
- Bridge sampling
- Path sampling
- Equi-energy sampling
- The Gelfand–Dey method
The most popular and efficient is Chib's method and its variants. However, even Chib's approach suffers from the following drawbacks.

MCMC problems
- The estimators are biased, because the chain almost never starts in stationarity.
- Empirical and asymptotic error estimates are difficult to compute, because MCMC does not generate iid samples.
- Chib's estimator relies on the output of multiple different chains.
Is there a way to draw on the strengths of both MCMC and importance sampling?

Markov chain importance sampling
Similar to the cross-entropy and variance minimization methods, the MCIS method consists of two stages:
- a Markov Chain (MC) stage, in which we construct an ergodic estimator π̂ of the minimum-variance importance sampling density π;
- an Importance Sampling (IS) stage, in which we use π̂ as an importance sampling density to estimate ℓ.
There are many different ways of constructing a model-free estimator of π (e.g., standard kernel density estimation, which has poor convergence). Here we explore only one way, related to the Gibbs sampler.

MCIS with Gibbs
Suppose that we have used an MCMC sampler to generate the population X_1, ..., X_n, approximately distributed according to π(x), with X_i = (X_{i,1}, ..., X_{i,d}). We can construct a nonparametric estimator of π using the Gibbs transition kernel:

  π̂(y) = (1/n) Σ_{i=1}^n κ(y | X_i),

  κ(y | x) := Π_{j=1}^d π(y_j | y_1, ..., y_{j-1}, x_{j+1}, ..., x_d).
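A generic Python sketch of the estimator π̂ as an n-component mixture of Gibbs kernels. The callback names cond_pdf(j, y_j, z) (full-conditional density of π) and cond_sample(j, z, rng) (full-conditional sampler) are hypothetical placeholders that must be supplied for the problem at hand; the lognormal example later in the talk provides one concrete instance.

```python
import numpy as np

def kappa_pdf(y, x, cond_pdf):
    """Evaluate the systematic-scan Gibbs kernel
    kappa(y | x) = prod_j pi(y_j | y_1, ..., y_{j-1}, x_{j+1}, ..., x_d)."""
    z = np.array(x, dtype=float)
    val = 1.0
    for j in range(len(y)):
        val *= cond_pdf(j, y[j], z)   # density of coordinate j given the current state z
        z[j] = y[j]                   # coordinate j is now updated for the later factors
    return val

def kappa_sample(x, cond_sample, rng):
    """Draw Y ~ kappa(. | x) by one systematic Gibbs sweep starting from x."""
    y = np.array(x, dtype=float)
    for j in range(len(y)):
        y[j] = cond_sample(j, y, rng)
    return y

def pi_hat(y, X_pop, cond_pdf):
    """Mixture estimator pi_hat(y) = (1/n) sum_i kappa(y | X_i); cost O(n d) per evaluation."""
    return np.mean([kappa_pdf(y, x, cond_pdf) for x in X_pop])
```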

MCIS vs importance sampling
The MCIS estimator is

  ℓ̂ = (1/m) Σ_{k=1}^m H(Y_k) f(Y_k)/π̂(Y_k),

where Y_1, ..., Y_m are iid draws from π̂. Advantages and disadvantages of the MCIS approach:
- P(lim_{n→∞} π̂(x) = π(x)) = 1 for all x.
- No need to solve a φ-divergence optimization problem.
- If n is large, then evaluation of π̂ may be costly.
- The MCIS estimator, unlike Chib's, is unbiased, and iid sampling allows for standard estimation error bands.
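Putting the pieces together, a sketch of the IS stage: each Y_k is drawn from π̂ by picking a population point uniformly at random and applying one Gibbs transition to it, and each weight requires an O(n) evaluation of the mixture density, which is the cost mentioned above. The function name mcis_estimate and the callbacks reused from the previous sketch are illustrative, not from the slides.

```python
import numpy as np

def mcis_estimate(H, f_pdf, X_pop, cond_pdf, cond_sample, m=10_000, rng=None):
    """MCIS estimator (1/m) sum_k H(Y_k) f(Y_k)/pi_hat(Y_k) with Y_k iid from pi_hat.
    Relies on kappa_sample and pi_hat from the sketch above."""
    rng = rng or np.random.default_rng()
    n = len(X_pop)
    Z = np.empty(m)
    for k in range(m):
        i = rng.integers(n)                                    # pick a mixture component
        Y = kappa_sample(X_pop[i], cond_sample, rng)           # Y ~ kappa(. | X_i), hence Y ~ pi_hat
        Z[k] = H(Y) * f_pdf(Y) / pi_hat(Y, X_pop, cond_pdf)    # importance weight under pi_hat
    est = Z.mean()
    rel_err = Z.std(ddof=1) / (np.sqrt(m) * est)               # iid draws, so standard error bands apply
    return est, rel_err
```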

Sums of correlated lognormals
Consider the estimation of the rare-event probability

  ℓ = P(e^{X_1} + ... + e^{X_d} ≥ γ) = ∫ f(x) I{S(x) ≥ γ} dx,

where:
- X = (X_1, ..., X_d) and X ~ N(µ, Σ);
- S(x) = e^{x_1} + ... + e^{x_d};
- f is the density of N(µ, Σ) with associated precision matrix Λ = Σ^{-1}. We use the notation Λ = (Λ_{i,j}) and Σ = (Σ_{i,j}).

Conditional densities needed for π̂
We need the conditional densities of the optimal importance sampling pdf π(y) = f(y) I{S(y) ≥ γ}/ℓ:

  π(y_i | y_{-i}) ∝ f(y_i | y_{-i})                                    if Σ_{j≠i} e^{y_j} ≥ γ,
  π(y_i | y_{-i}) ∝ f(y_i | y_{-i}) I{y_i ≥ ln(γ − Σ_{j≠i} e^{y_j})}   if Σ_{j≠i} e^{y_j} < γ,

where, by standard properties of the multivariate normal density, f(y_i | y_{-i}) is a normal density with mean µ_i + Λ_{i,i}^{-1} Σ_{j≠i} Λ_{i,j}(µ_j − y_j) and variance Λ_{i,i}^{-1}.
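A hedged sketch of these full conditionals in code, drawing from the (possibly truncated) normal conditional by inverse-cdf sampling. The function names are illustrative; in the extreme tail, where the truncation point lies far in the right tail of the conditional, a numerically robust truncated-normal sampler would be needed instead of the naive inversion used here.

```python
import numpy as np
from scipy import stats

def lognormal_cond_params(i, y, mu, Lam, gamma):
    """Conditional mean/std of f(y_i | y_{-i}) and the truncation point implied by S(y) >= gamma."""
    rest = np.delete(np.arange(len(mu)), i)
    var = 1.0 / Lam[i, i]
    mean = mu[i] + var * (Lam[i, rest] @ (mu[rest] - y[rest]))
    s_rest = np.exp(y[rest]).sum()
    lo = -np.inf if s_rest >= gamma else np.log(gamma - s_rest)
    return mean, np.sqrt(var), lo

def lognormal_cond_sample(i, y, rng, mu, Lam, gamma):
    """Draw y_i ~ pi(y_i | y_{-i}) by inversion of the (possibly truncated) normal cdf."""
    mean, sd, lo = lognormal_cond_params(i, y, mu, Lam, gamma)
    u = rng.uniform(stats.norm.cdf(lo, mean, sd), 1.0)   # cdf(-inf) = 0 recovers the untruncated case
    return stats.norm.ppf(u, mean, sd)

def lognormal_cond_pdf(i, yi, y, mu, Lam, gamma):
    """Evaluate pi(y_i = yi | y_{-i}): a normal density truncated to [lo, infinity)."""
    mean, sd, lo = lognormal_cond_params(i, y, mu, Lam, gamma)
    if yi < lo:
        return 0.0
    return stats.norm.pdf(yi, mean, sd) / stats.norm.sf(lo, mean, sd)

# Adapters matching the generic callbacks used earlier (mu, Lam, gamma defined elsewhere):
# cond_sample = lambda j, y, rng: lognormal_cond_sample(j, y, rng, mu, Lam, gamma)
# cond_pdf    = lambda j, yj, y:  lognormal_cond_pdf(j, yj, y, mu, Lam, gamma)
```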

Numerical setup
We compare with the importance sampling vanishing-relative-error estimator (ISVE) and the cross-entropy vanishing-relative-error estimator (CEVE) of Asmussen, Blanchet, Juneja, and Rojas-Nandayapa, "Efficient simulation of tail probabilities of sums of correlated lognormals," Ann. Oper. Res. (2009). Both of these estimators decompose the probability ℓ(γ) into two parts:

  ℓ(γ) = P(max_i e^{X_i} ≥ γ) + P(S(X) ≥ γ, max_i e^{X_i} < γ).

The first (dominant) term and the second (residual) term are estimated by two different importance sampling estimators that ensure the strong efficiency of the sum.
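A quick Monte Carlo sanity check of this decomposition, in a deliberately non-rare regime where crude sampling still works; the dimension, covariance, and threshold below are illustrative only and are not the slide's parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
d, gamma_chk = 3, 30.0
Sigma_chk = 0.5 * np.eye(d) + 0.5            # unit variances, common correlation 0.5
X = rng.multivariate_normal(np.zeros(d), Sigma_chk, size=200_000)
E = np.exp(X)

lhs = np.mean(E.sum(axis=1) >= gamma_chk)                                        # ell(gamma)
dominant = np.mean(E.max(axis=1) >= gamma_chk)                                   # P(max_i e^{X_i} >= gamma)
residual = np.mean((E.sum(axis=1) >= gamma_chk) & (E.max(axis=1) < gamma_chk))   # residual term
print(lhs, dominant + residual)   # the two numbers agree exactly on the same sample
```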

Numerical setup I
The first term is asymptotically dominant in the sense that

  lim_{γ→∞} P(S(X) ≥ γ, max_i e^{X_i} < γ) / P(max_i e^{X_i} ≥ γ) = 0.

We compute ℓ for various values of the common correlation coefficient ρ = Σ_{i,j}/√(Σ_{i,i} Σ_{j,j}). Parameters: d = 10, µ_i = i/10, σ_i² = i (i = 1, ..., d), γ = 5×10⁵. We used the MCIS estimator with m = 5×10⁵ and a Markov chain sample of size n = 80 obtained using splitting.
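For reference, a short sketch that builds these problem parameters; the reading µ_i = i/10 of the garbled mean specification is an assumption, and ρ is the quantity varied across the rows of the next table.

```python
import numpy as np

d = 10
mu = np.arange(1, d + 1) / 10.0             # mu_i = i/10 (assumed reading of the slide)
sig2 = np.arange(1, d + 1, dtype=float)     # sigma_i^2 = i
gamma = 5e5
rho = 0.9                                   # common correlation; varied across experiments

Sigma = rho * np.sqrt(np.outer(sig2, sig2)) # Sigma_ij = rho * sigma_i * sigma_j ...
np.fill_diagonal(Sigma, sig2)               # ... with Sigma_ii = sigma_i^2
Lam = np.linalg.inv(Sigma)                  # precision matrix feeding the Gibbs conditionals
```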

Numerical results I

  ρ      MCIS est. ℓ̂        relative error %
                             MCIS     CEVE     ISVE
  0      1.795092×10⁻⁵       0.0092   0.0063   0.0069
  0.4    1.8077×10⁻⁵         0.093    0.17     0.23
  0.7    1.9014×10⁻⁵         0.04     0.65     2.85
  0.9    2.0735×10⁻⁵         0.068    0.63     2.80
  0.93   2.0997×10⁻⁵         0.17     3.5      4.59
  0.95   2.1412×10⁻⁵         0.11     6        8.09
  0.99   2.1882×10⁻⁵         0.29     15       4.35

Numerical setup II
Both the ISVE and CEVE estimators are strongly efficient, so we expect that these estimators will eventually outperform the MCIS estimator. This intuition is confirmed in the next table, which describes 14 cases with increasing values of the threshold γ. We use the same algorithmic and problem parameters, except that ρ = 0.9 in all cases and the threshold depends on the case number c according to the formula γ = 5×10^{c+3}, c = 1, ..., 14.

Numerical results II

  case c   ISVE estimate     relative error %        efficiency
                             MCIS     ISVE           MCIS   ISVE
  1        3.9865×10⁻⁴       0.049    1.6            25     1500
  2        2.0802×10⁻⁵       0.067    4.3            75     10000
  3        6.4385×10⁻⁷       0.077    5.5            64     22000
  4        1.2039×10⁻⁸       0.041    3.0            21     5000
  5        1.3468×10⁻¹⁰      0.034    6.1            12     20000
  6        8.9791×10⁻¹³      0.036    7.6            17     30000
  7        3.5899×10⁻¹⁵      0.043    0.014          27     0.11
  8        8.5302×10⁻¹⁸      0.079    0.029          81     0.50
  9        1.2082×10⁻²⁰      0.025    0.024          7      0.32
  10       1.0148×10⁻²³      0.042    0.00064        28     0.00021
  11       5.0428×10⁻²⁷      0.042    0.00030        27     4.6×10⁻⁵
  12       1.4898×10⁻³⁰      0.024    0.00014        7      1.0×10⁻⁵
  13       2.5961×10⁻³⁴      0.023    3.87×10⁻¹⁵     8.2    7.7×10⁻²⁷
  14       2.6754×10⁻³⁸      0.012    4.22×10⁻¹³     2.6    9.1×10⁻²³

L² properties
Assume that:
- X_1, ..., X_n are iid draws from κ_{t−1}(x | θ), where κ_{t−1} is the (t−1)-step transition density;
- κ is the transition density of the systematic Gibbs sampler;
- ∫∫ κ²(y | x) π(x)/π(y) dy dx < ∞ (warning: this can be difficult to verify);
- all states of the chain can be accessed using a single step of the chain (irreducibility condition).
Then we have the following result for the estimator built from H(Y) f(Y)/π̂(Y).

L² theoretical properties
We have the following bound on the Neyman χ² distance:

  E[(ℓ̂ − ℓ)²/(ℓ̂ ℓ)] = ( ∫ κ_t²(y | θ)/π(y) dy − 1 ) + (1/n) ∫ ( E[κ²(y | X)] − κ_t²(y | θ) )/π(y) dy
                      ≤ V(θ) e^{−rt} + O(1/n),

where the first term is the χ² component, the second is the variance component, V is a positive Lyapunov function, and r > 0 is a constant (the geometric rate of convergence of the Gibbs sampler). Note that the above is NOT the relative error, because the denominator contains ℓ̂ instead of ℓ.

Conclusions
- We have presented a new method for integration that combines MCMC with importance sampling.
- We argue that the method is preferable to methods based purely on MCMC or on importance sampling.
- The method dispenses with the traditional parametric modeling used in the design of importance sampling densities.
- Unlike MCMC-based methods, the proposed method provides unbiased estimators, and estimation of relative error and confidence bands is straightforward.

Some interesting links
- Can we use large deviations theory to sample exactly from the minimum-variance density, instead of using MCMC?
- The idea is related to empirical likelihood and Rao-Blackwellization in classical statistics; essentially, we construct an estimator based on empirical-likelihood arguments.
Thank you for your attention.