
Winter 2019 Math 106 Topics in Applied Mathematics
Data-driven Uncertainty Quantification
Yoonsang Lee (yoonsang.lee@dartmouth.edu)
Lecture 8: Importance Sampling

8.1 Importance Sampling

Importance sampling is a sampling method with reduced variance. Assume that we are interested in evaluating the integral
$$E_p[f] = \int f(x)\,p(x)\,dx,$$
where $p(x)$ is a probability density. Let $g(x)$ be another probability density whose support contains the support of $f(x)p(x)$ and from which it is easy to draw samples. Importance sampling uses the following change of measure:
$$\int f(x)\,p(x)\,dx = \int \frac{f(x)\,p(x)}{g(x)}\,g(x)\,dx.$$

If $\{x_i\}$ is an IID sample from $g(x)$, the integral is approximated by
$$E_p[f] \approx \frac{1}{n}\sum_{i=1}^{n} \frac{f(x_i)\,p(x_i)}{g(x_i)} = \frac{1}{n}\sum_{i=1}^{n} w_i\, f(x_i),$$
where $w_i = p(x_i)/g(x_i)$ is the weight of $x_i$. Note that there is no $g(x_i)$ in the numerator.
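A minimal generic sketch of this estimator in Python (using numpy and scipy); the function name and the example choices (target $p = N(0,1)$, Cauchy proposal, $f(x) = x$) are illustrative assumptions, not part of the lecture:

```python
import numpy as np
from scipy import stats

def importance_sampling(f, p_pdf, g_pdf, g_sample, n=10**5, seed=0):
    """Estimate E_p[f] as (1/n) * sum of w_i * f(x_i), with x_i ~ g and w_i = p(x_i)/g(x_i)."""
    rng = np.random.default_rng(seed)
    x = g_sample(rng, n)
    w = p_pdf(x) / g_pdf(x)        # importance weights
    return np.mean(w * f(x))

# Example: E_p[X] = 0 for the target p = N(0,1), using a Cauchy proposal g.
est = importance_sampling(lambda x: x, stats.norm.pdf, stats.cauchy.pdf,
                          lambda rng, n: rng.standard_cauchy(n))
print(f"estimate {est:+.4f} (exact value 0)")
```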

Example: small tail probabilities (rare events). Among the many applications of importance sampling, estimating a small tail probability is a good example related to rare events. Suppose we want to compute the following probability with a Monte Carlo method:
$$\mu = P(Z > 4.5) = \int_{4.5}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx,$$
where $Z$ is the standard normal random variable. Since we know the analytic form of the density, which is easy to integrate numerically, the probability we are looking for is about $3.39 \times 10^{-6}$, a very small probability.

A programming tip: a fast (vectorized) calculation for counting (code example in Matlab/Python). What sample size would you expect to need to estimate this small probability? The probability is of order $10^{-6}$, which means we can expect only a few values larger than 4.5 out of a million samples. Try it in Matlab/Python.

Importance sampling with a fat-tailed distribution (we used the Cauchy density) increases the accuracy (or, equivalently, decreases the sample size needed to estimate the small probability).
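A minimal Python sketch of this experiment; the sample size, random seed, and the untruncated Cauchy proposal are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10**6

# Plain Monte Carlo: count standard-normal samples exceeding 4.5.
# Since P(Z > 4.5) is of order 1e-6, only a few hits per million draws are expected.
z = rng.standard_normal(n)
mc_est = np.mean(z > 4.5)

# Importance sampling with a fat-tailed Cauchy proposal g:
# (1/n) * sum of 1{x_i > 4.5} * p(x_i) / g(x_i), with x_i drawn from g.
x = rng.standard_cauchy(n)
w = stats.norm.pdf(x) / stats.cauchy.pdf(x)   # weights p(x_i) / g(x_i)
is_est = np.mean((x > 4.5) * w)

exact = stats.norm.sf(4.5)                    # about 3.40e-6
print(f"exact {exact:.3e}  plain MC {mc_est:.3e}  importance sampling {is_est:.3e}")
```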

8.2 Finite Variance

The variance of the importance sampling estimator is (up to the factor $1/n$)
$$E_g\!\left[\frac{f^2(x)\,p^2(x)}{g^2(x)}\right] - \left(E_g\!\left[\frac{f(x)\,p(x)}{g(x)}\right]\right)^2.$$
To have a finite variance, we need
$$\int \frac{f^2(x)\,p^2(x)}{g(x)}\,dx < \infty.$$
In particular, it suffices (for bounded $f$) that the ratio $p(x)/g(x)$ is bounded, which implies that the tails of $g(x)$ must be fatter than those of $p(x)$.
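A short Python sketch of this condition, comparing a proposal with thinner tails than $p$ (unbounded ratio) against a fat-tailed one; the target $p = N(0,1)$, the integrand $f(x) = x^2$, and the proposal parameters are assumptions chosen for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def is_estimate(sample_g, pdf_g, n=5000):
    """One importance sampling estimate of E_p[x^2] for the target p = N(0,1)."""
    x = sample_g(n)
    w = stats.norm.pdf(x) / pdf_g(x)     # weights p(x_i) / g(x_i)
    return np.mean(w * x**2)

# Thin-tailed proposal g = N(0, 0.5^2): p(x)/g(x) is unbounded, so the variance is infinite.
thin = [is_estimate(lambda m: 0.5 * rng.standard_normal(m),
                    lambda x: stats.norm.pdf(x, scale=0.5)) for _ in range(200)]

# Fat-tailed proposal g = Cauchy: p(x)/g(x) is bounded, so the variance is finite.
fat = [is_estimate(rng.standard_cauchy, stats.cauchy.pdf) for _ in range(200)]

print(f"true value 1.0 | spread with thin-tailed g: {np.std(thin):.3f} | with fat-tailed g: {np.std(fat):.3f}")
```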

Theorem. The choice of $g(x)$ that minimizes the variance of the importance sampling estimator is
$$g^*(x) = \frac{|f(x)|\,p(x)}{\int |f(s)|\,p(s)\,ds}.$$

Idea of proof. The variance of the importance sampling estimator is given by
$$E_g\!\left[\frac{f^2(x)\,p^2(x)}{g^2(x)}\right] - \left(E_g\!\left[\frac{f(x)\,p(x)}{g(x)}\right]\right)^2,$$
where the second term is independent of $g(x)$. From Jensen's inequality, the lower bound of the first term is
$$E_g\!\left[\frac{f^2(x)\,p^2(x)}{g^2(x)}\right] \ge \left(E_g\!\left[\frac{|f(x)|\,p(x)}{g(x)}\right]\right)^2 = \left(\int |f(x)|\,p(x)\,dx\right)^2.$$
The lower bound is attained by $g^*(x) = |f(x)|\,p(x)\big/\!\int |f(s)|\,p(s)\,ds$.
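A quick numerical check of this result in Python for a case where the optimal density can be sampled exactly; the target $p = N(0,1)$ and integrand $f(x) = e^{-x}$ are assumptions made for illustration (for this pair, $f(x)p(x) \propto e^{-(x+1)^2/2}$, so $g^*$ is the $N(-1,1)$ density):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 10**4

# Target p = N(0,1) and f(x) = exp(-x) >= 0, so E_p[f] = exp(1/2).
# Here f(x)p(x) is proportional to exp(-(x+1)^2/2), hence g*(x) is the N(-1,1) density.
x = rng.normal(loc=-1.0, scale=1.0, size=n)
terms = np.exp(-x) * stats.norm.pdf(x) / stats.norm.pdf(x, loc=-1.0)

# For this nonnegative f, every term equals the constant exp(1/2): a zero-variance estimator.
print(f"estimate {terms.mean():.6f}, sample std {terms.std():.2e}, exact {np.exp(0.5):.6f}")
```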

An alternative importance sampling estimator with increased stability is the self-normalized estimator
$$\frac{\sum_{i=1}^{n} \frac{f(x_i)\,p(x_i)}{g(x_i)}}{\sum_{i=1}^{n} \frac{p(x_i)}{g(x_i)}}$$
instead of the standard importance sampling estimator
$$\frac{1}{n}\sum_{i=1}^{n} \frac{f(x_i)\,p(x_i)}{g(x_i)}.$$
This is motivated by the convergence
$$\frac{1}{n}\sum_{i=1}^{n} \frac{p(x_i)}{g(x_i)} \to 1 \quad \text{as } n \to \infty.$$
This estimator is biased, but the bias is small and vanishes as $n \to \infty$.
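A minimal comparison of the two estimators in Python; the target $p = N(0,1)$, integrand $f(x) = x^2$, and Cauchy proposal are assumed purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 10**5

# Draw from the proposal g = Cauchy and compute the weights p(x_i)/g(x_i) for p = N(0,1).
x = rng.standard_cauchy(n)
w = stats.norm.pdf(x) / stats.cauchy.pdf(x)
f = x**2                                      # E_p[f] = 1

standard = np.mean(w * f)                     # (1/n) * sum w_i f(x_i)
self_normalized = np.sum(w * f) / np.sum(w)   # sum w_i f(x_i) / sum w_i

print(f"standard {standard:.4f}  self-normalized {self_normalized:.4f}  exact 1.0")
```

The self-normalized form is also convenient when $p(x)$ is known only up to a normalizing constant, since the constant cancels in the ratio.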

Also, the following approach is preferred to achieve a stable importance density $g(x)$ with fat tails:
$$g(x) = \rho\, h(x) + (1-\rho)\, l(x), \qquad 0 < \rho < 1,$$
where $h(x)$ is close to $p(x)$ and $l(x)$ has fat tails.
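A sketch of this mixture construction in Python; the components $h = N(0,1)$ and $l =$ Cauchy, the weight $\rho = 0.9$, and the target $p = h$ are all illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, rho = 10**5, 0.9

# Sample from the mixture g = rho*h + (1-rho)*l with h = N(0,1) and fat-tailed l = Cauchy.
use_h = rng.random(n) < rho
x = np.where(use_h, rng.standard_normal(n), rng.standard_cauchy(n))
g = rho * stats.norm.pdf(x) + (1 - rho) * stats.cauchy.pdf(x)

# Importance weights for the target p (taken here to be h itself for illustration):
# p(x)/g(x) <= p(x)/(rho*h(x)), so the weights are bounded by 1/rho when p = h.
w = stats.norm.pdf(x) / g
print(f"max weight {w.max():.3f}, bound 1/rho = {1/rho:.3f}")
```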

8.3 Sequential Importance Sampling

Example: target tracking (Gordon et al., 1993). We consider a tracking problem where an object (an airplane, a pedestrian, or a ship) is observed through some noisy measurement of its angular position $Z_t$ at time $t$. Of interest are the position $(X_t, Y_t)$ of the object in the plane and its velocity $(\dot X_t, \dot Y_t)$. The model is then discretized as $\dot X_t = X_{t+1} - X_t$, $\dot Y_t = Y_{t+1} - Y_t$, and
$$\dot X_t = \dot X_{t-1} + \tau\,\epsilon^x_t, \qquad \dot Y_t = \dot Y_{t-1} + \tau\,\epsilon^y_t, \qquad Z_t = \arctan(Y_t/X_t) + \eta\,\epsilon^z_t,$$
where $\epsilon^x_t$, $\epsilon^y_t$, and $\epsilon^z_t$ are iid $N(0,1)$ random variables. We are interested in $p_t(\theta_t \mid z_{1:t})$, where $\theta_t = (\tau, \eta, \dot X_t, \dot Y_t)$ and $z_{1:t}$ denotes the vector $(z_1, z_2, \dots, z_t)$.
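A short Python simulation of this model; the noise scales $\tau$ and $\eta$, the initial position and velocity, and the number of time steps are assumed values chosen only to generate synthetic bearings:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100
tau, eta = 0.01, 0.05          # assumed process and observation noise scales

x, y = 1.0, 1.0                # assumed initial position
vx, vy = 0.02, 0.01            # assumed initial velocity
bearings = np.zeros(T)
for t in range(T):
    vx += tau * rng.standard_normal()              # velocity random walk in x
    vy += tau * rng.standard_normal()              # velocity random walk in y
    x, y = x + vx, y + vy                          # position update from the velocity
    bearings[t] = np.arctan(y / x) + eta * rng.standard_normal()   # noisy angle Z_t

print(f"final position ({x:.2f}, {y:.2f}), last bearing {bearings[-1]:.3f} rad")
```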

Importance sampling is useful when a sequence of target distributions $p_t(x)$ (where $t$ is a time index) is available. Suppose an observation $v_t$ becomes available at each time $t = m\,\Delta t$, $m = 1, 2, \dots$, where $\Delta t$ is a time interval. Then we obtain a new posterior density $p_t(x)$ using the new observation $v_t$:
$$p_t(x) = p(x(t) \mid v_t) \propto p(x(t))\, p(v_t \mid x(t)).$$

Drawing a fresh sample from the posterior at each step is costly. Importance sampling is ideally suited to this problem in that the densities $p_t$ and $p_{t+1}$ are defined on the same space. That is, use the prior density as the importance density for the posterior density. The weight is given by
$$w(t) = w(t-1)\,\frac{f_t(x)}{g_t(x)},$$
where $f_t(x)$ is the posterior and $g_t(x)$ is the prior.
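A minimal sequential importance sampling sketch in Python for a simpler one-dimensional state-space model (a random-walk state observed in Gaussian noise, an assumption made to keep the example short); because particles are propagated from the prior, the incremental weight $f_t/g_t$ reduces to the likelihood of the new observation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
T, n = 50, 1000                 # number of time steps and particles (illustrative)
sigma_x, sigma_z = 0.1, 0.5     # assumed process and observation noise levels

# Synthetic data from x_t = x_{t-1} + noise, z_t = x_t + noise.
x_true = np.cumsum(sigma_x * rng.standard_normal(T))
z = x_true + sigma_z * rng.standard_normal(T)

# Sequential importance sampling with the prior (transition density) as the proposal.
particles = np.zeros(n)
logw = np.zeros(n)
for t in range(T):
    particles += sigma_x * rng.standard_normal(n)                   # draw from the prior
    logw += stats.norm.logpdf(z[t], loc=particles, scale=sigma_z)   # multiply by likelihood
w = np.exp(logw - logw.max())
w /= w.sum()
print(f"posterior mean estimate {np.sum(w * particles):+.3f}, true state {x_true[-1]:+.3f}")
```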

Weight degeneracy. Iterating the weight update,
$$w(t) = w(t-1)\,\frac{f_t(x)}{g_t(x)} = w(t-2)\,\frac{f_t(x)\,f_{t-1}(x)}{g_t(x)\,g_{t-1}(x)} = \cdots = w(0)\,\prod_{i} \frac{f_{t-i}(x)}{g_{t-i}(x)}.$$
Equivalently,
$$w(t) = w(0)\,\exp\!\left(\sum_{i} \ln\frac{f_{t-i}(x)}{g_{t-i}(x)}\right).$$
In the special case where $g_{t-i}$ and $f_{t-i}$ are both independent of time, we have approximately
$$w(t) \approx \exp\!\big(-t\,E_g[\ln(g(x)/f(x))]\big).$$
Since $E_g[\ln(g(x)/f(x))]$ is the Kullback-Leibler divergence between $g$ and $f$ and hence nonnegative, the weight degenerates to zero: $w(t) \to 0$ as $t \to \infty$.
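Continuing the same illustrative one-dimensional example, the effective sample size $1/\sum_i w_i^2$ makes the degeneracy visible: it collapses from $n$ toward a handful of particles as $t$ grows (the standard remedy, resampling, is the subject of particle filtering). The model and its parameters below are the same assumed values as in the previous sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T, n = 50, 1000
sigma_x, sigma_z = 0.1, 0.5     # same assumed model as in the previous sketch

x_true = np.cumsum(sigma_x * rng.standard_normal(T))
z = x_true + sigma_z * rng.standard_normal(T)

particles, logw = np.zeros(n), np.zeros(n)
for t in range(T):
    particles += sigma_x * rng.standard_normal(n)
    logw += stats.norm.logpdf(z[t], loc=particles, scale=sigma_z)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    if t % 10 == 0:
        # Effective sample size 1 / sum(w_i^2): close to n for balanced weights,
        # close to 1 when a single particle carries almost all of the weight.
        print(f"t = {t:2d}: effective sample size {1.0 / np.sum(w**2):8.1f} out of {n}")
```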

Homework

1. For the normal-Cauchy Bayes estimator
$$\delta(x) = \frac{\displaystyle\int_{-\infty}^{\infty} \frac{\theta}{1+\theta^2}\, e^{-(x-\theta)^2/2}\, d\theta}{\displaystyle\int_{-\infty}^{\infty} \frac{1}{1+\theta^2}\, e^{-(x-\theta)^2/2}\, d\theta},$$
which is the posterior mean of $\theta$ under a Cauchy prior density and a normal likelihood:
1.1 Use Monte Carlo integration to calculate the integral.
1.2 What is the standard error with a sample size of $n =$ (a) 100, (b) 1000, and (c) 10000?

2. For a Gaussian random variable $X$ with mean 0 and variance $\sigma^2$, prove that
$$E\big[e^{-X^2}\big] = \frac{1}{\sqrt{2\sigma^2 + 1}}.$$

3. Write a one-paragraph summary of particle filtering (at most 5 sentences).