Importance Sampling and Radon-Nikodym Derivatives
Steven R. Dunbar
Sampling with respect to two distributions. Rare Event Simulation

More than one way to evaluate a statistic

A statistic for $X$ with p.d.f. $u(x)$ is
$$A = E_u[F(X)] = \int F(x)\,u(x)\,dx.$$
Suppose $v(x)$ is another probability density such that
$$L(x) = \frac{u(x)}{v(x)}$$
is well-defined. Then
$$A = E_v[F(X)L(X)] = \int F(x)\,\frac{u(x)}{v(x)}\,v(x)\,dx.$$

Likelihood ratio

The likelihood ratio is
$$L(x) = \frac{u(x)}{v(x)},$$
so
$$A = E_u[F(X)] = E_v[F(X)L(X)].$$
To get $A$ we can either take samples with density $u(x)$ and calculate with $F$, or take samples with density $v(x)$ and calculate with $FL$.
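
The two estimators can be compared in a minimal R sketch (the densities and test function below are illustrative assumptions, not from the slides): with $u = N(0,1)$, $v = N(2,1)$, and $F(x) = x^2$, both sample averages estimate $A = E_u[X^2] = 1$.

    # Two ways to estimate A = E_u[F(X)], with illustrative choices of u, v, F
    set.seed(1)
    N  <- 10^5
    Fx <- function(x) x^2
    Lx <- function(x) dnorm(x, 0, 1) / dnorm(x, 2, 1)  # likelihood ratio u/v

    usample <- rnorm(N, 0, 1)
    mean(Fx(usample))                  # u-method: average Fx over u-samples

    vsample <- rnorm(N, 2, 1)
    mean(Fx(vsample) * Lx(vsample))    # v-method: average Fx*Lx over v-samples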

Good choice of likelihood ratio

How should we seek $v(x)$ and the corresponding $L(x)$? Reduction of variance is a good criterion: we want
$$\mathrm{Var}_v[F(X)L(X)] = E_v\left[(F(X)L(X))^2\right] - A^2$$
to be less than
$$\mathrm{Var}_u[F(X)] = E_u\left[(F(X))^2\right] - A^2.$$

Example: Large values for a normal r.v.

Suppose $X \sim N(0,1)$ and we wish to evaluate $P[X > 4]$.

R code for the probability:

    1 - pnorm(4)

The output:

    [1] 3.167124e-05

Sampling

Code for sampling:

    sample <- rnorm(10^5)
    sum(sample > 4)
    sample[which(sample > 4)]

The output:

    [1] 6
    [1] 4.110055 4.970314 4.026803 4.133050 4.014239

Most of the sample doesn't count: "wasted effort". The samples with $X > 4$ are only a little larger than 4.

A better choice of sampling PDF

Draw samples from $N(4,1)$ instead, putting most of the sampling weight near 4. The likelihood ratio is
$$L(x) = \frac{u(x)}{v(x)} = \frac{k\, e^{-x^2/2}}{k\, e^{-(x-4)^2/2}} = e^{4^2/2}\, e^{-4x}.$$

New sampling

The v-method, with the sample drawn according to the new p.d.f.:
$$P_{N(0,1)}[X > 4] = \int_4^\infty L(x)\,v(x)\,dx \approx \frac{1}{N} \sum_{v_k > 4} L(v_k) = \frac{e^{4^2/2}}{N} \sum_{v_k > 4} e^{-4 v_k}$$

New sampling example

Code for sampling:

    vsample <- rnorm(10^5, 4, 1)
    usevsample <- vsample[which(vsample > 4)]
    Pv <- exp(4^2/2) * sum(exp(-4 * usevsample)) / 10^5
    Pv

The output:

    [1] 3.17203e-05

Intuition behind Importance Sampling

About half the samples are counted, but they are counted with a small weight. A lot of small-weight hits give a lower-variance estimator than a few large-weight hits.

Digression: Reduction of Variance

Variance for the naive sampling:
$$P = P_{N(0,1)}[X > 4] = \int \mathbf{1}_{[x>4]}(x)\, \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, dx$$
$$\mathrm{Var}_u[F(X)] = E_u\left[(F(X))^2\right] - P^2 = \int \left(\mathbf{1}_{[x>4]}(x)\right)^2 \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, dx - P^2 = P - P^2 \approx P$$
The variance is the same order as $P$. (The standard deviation will be relatively large compared to $P$.)

Digression: Reduction of Variance

Naive sample variance:

    P <- sum((sample > 4)^2) / 10^5
    P - P^2

Results:

    [1] 5.99964e-05

Digression: Reduction of Variance

Importance sampling with $L(x) = e^{4^2/2} e^{-4x}$. Variance for importance sampling:
$$\mathrm{Var}_v[F(X)L(X)] = E_v\left[(F(X)L(X))^2\right] - P^2 = \int \left(\mathbf{1}_{[x>4]}(x)\, e^{4^2/2} e^{-4x}\right)^2 \frac{e^{-(x-4)^2/2}}{\sqrt{2\pi}}\, dx - P^2$$

Digression: Reduction of Variance

Importance sample variance:

    secmomv <- sum((exp(4^2/2) * exp(-4 * usevsample))^2) / 10^5
    secmomv - Pv^2

Results:

    [1] 4.550243e-09

Digression: Reduction of Variance

What is the best normal distribution $N(b,1)$ for importance sampling of $P = P_{N(0,1)}[X > 4]$? After some calculation with $L(x)$:
$$\mathrm{Var}_v\left[\mathbf{1}_{[X>4]}\, L(X)\right] = e^{b^2}\left(1 - \Phi(b+4)\right) - P^2$$
($\Phi(x)$ is the c.d.f. of the $N(0,1)$ distribution.)
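
As a quick numerical check (a sketch, not from the slides), evaluating this formula at $b = 4$ in R reproduces the sample value above; pnorm(..., lower.tail = FALSE) computes the tail directly and sidesteps the cancellation discussed on the next slide.

    # Closed-form variance at b = 4; compare with the sample value 4.550243e-09
    b <- 4
    P <- pnorm(4, lower.tail = FALSE)                   # P_{N(0,1)}[X > 4]
    exp(b^2) * pnorm(b + 4, lower.tail = FALSE) - P^2   # approximately 4.5e-09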

Digression: Reduction of Variance

Unfortunately, $1 - \Phi(b+4)$ is 0 to precision levels for $3 \le b \le 5$, leading to severe numerical error in $e^{b^2}(1 - \Phi(b+4)) - P^2$. Use standard asymptotic estimates for the normal c.d.f. tail:
$$\frac{1}{\sqrt{2\pi}\,(b+5)} \left(e^{-(b+4)^2/2} - e^{-(b+5)^2/2}\right) \le 1 - \Phi(b+4) \le \frac{1}{\sqrt{2\pi}\,(b+4)}\, e^{-(b+4)^2/2}$$

Digression: Reduction of Variance

Using the estimates with $L(x)$ provides algebraic simplification, which reduces numerical error. Using an $N(b,1)$ with $b \approx 4.1$ gives minimum variance.
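
A sketch of that minimization in R (an illustrative shortcut, not the slides' derivation: it substitutes the upper tail estimate into $e^{b^2}(1 - \Phi(b+4))$, drops the constant $-P^2$ term, and works on the log scale to avoid underflow):

    # log of e^{b^2} (1 - Phi(b+4)) after substituting the upper tail estimate:
    #   log Var ~ b^2 - (b+4)^2/2 - log(sqrt(2*pi) * (b+4))
    logvar <- function(b) b^2 - (b + 4)^2 / 2 - log(sqrt(2 * pi) * (b + 4))
    optimize(logvar, interval = c(3, 5))$minimum   # approximately 4.12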

New Measures from Old

If Q is a probability measure, we can define a new probability measure by
$$dP = L(x)\, dQ$$
provided $L(x) \ge 0$ (almost surely w.r.t. Q) and
$$\int_\Omega L(x)\, dQ = 1.$$
Then
$$E_P[F(X)] = E_Q[F(X)L(X)].$$
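
A quick sanity check in R (a sketch reusing the earlier pair $P = N(0,1)$, $Q = N(4,1)$, $L(x) = e^{4^2/2} e^{-4x}$): the normalization condition holds because $L(x)\,dQ$ is exactly the $N(0,1)$ density.

    # Check that L(x) dQ integrates to 1 for P = N(0,1), Q = N(4,1)
    L <- function(x) exp(4^2 / 2) * exp(-4 * x)
    integrate(function(x) L(x) * dnorm(x, mean = 4, sd = 1),
              lower = -Inf, upper = Inf)   # 1 with small absolute error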

Relation between measures

Ask the reverse question: given measures P and Q, is there an $L(x)$ relating them? A necessary condition is
$$Q[B] = 0 \implies P[B] = 0$$
because
$$P[B] = E_P[\mathbf{1}_B(X)] = E_Q[\mathbf{1}_B(X)L(X)] = 0.$$
Impossible under Q is also impossible under P.

If $Q[B] = 0 \implies P[B] = 0$, we say P is absolutely continuous w.r.t. Q and write $P \ll Q$. Impossible under Q is also impossible under P.

Theorem

The R-N Theorem says the necessary condition of absolute continuity is also sufficient:

Theorem. If P and Q are probability measures on a common σ-algebra $\mathcal{F}$, and P is absolutely continuous w.r.t. Q, then there is a function
$$L(x) = \frac{dP}{dQ}$$
so that
$$E_P[F(X)] = E_Q[F(X)L(X)].$$

Completely Singular

If there is a B with $P[B] = 0$ and $Q[B] = 1$, we say Q is completely singular w.r.t. P. Note that $P[B^C] = 1$ and $Q[B^C] = 0$, so P is also completely singular w.r.t. Q, and we write $P \perp Q$.

Simple Example 1

Let $X \sim U([0,1])$ with probability measure P, and $Y \sim N(0,1)$ with probability measure Q. P is absolutely continuous w.r.t. Q ($Q[B] = 0 \implies P[B] = 0$). Q is not absolutely continuous w.r.t. P ($P[[1,2]] = 0$ but $Q[[1,2]] \approx 0.136$).
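
The failing direction is a one-line check in R:

    # Q = N(0,1) assigns positive mass to [1,2], where P = U([0,1]) assigns none
    pnorm(2) - pnorm(1)   # [1] 0.1359051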

Simple Example 2

Let $X \in \mathbb{R}^2$ with $X \sim N((0,0), I_{2,2})$ under probability measure P, and let $Y = X/\lVert X \rVert$ with $Y \sim U(S^1)$ under probability measure Q. Then $P[S^1] = 0$ and $Q[S^1] = 1$, so $P \perp Q$.

Sophisticated Example

Let $X \sim U([0,1])$ with probability measure P, and let Y have the Cantor distribution on $[0,1]$ with probability measure Q. Q has a continuous c.d.f. (the Devil's staircase), but the p.d.f. does not exist in any easy sense. $P \perp Q$. This example shows that absolute continuity and complete singularity extend the calculus definitions for functions from real analysis.
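
A sketch (not from the slides) that makes the Devil's staircase visible: the Cantor distribution can be sampled through its base-3 expansion $Y = \sum_i 2 B_i 3^{-i}$, with $B_i$ i.i.d. Bernoulli(1/2), truncated at a finite depth.

    # Approximate Cantor-distributed samples via truncated base-3 expansions
    set.seed(1)
    cantor_sample <- function(n, depth = 30) {
      B <- matrix(rbinom(n * depth, 1, 0.5), n, depth)
      as.vector(B %*% (2 / 3^(1:depth)))
    }
    y <- cantor_sample(10^4)
    plot(ecdf(y), main = "Empirical c.d.f.: the Devil's staircase")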

Example of an R-N derivative

The previous sampling example: $X \sim N(0,1)$ with probability measure P and p.d.f. proportional to $e^{-x^2/2}$; $Y \sim N(4,1)$ with probability measure Q and p.d.f. proportional to $e^{-(x-4)^2/2}$. Then
$$\frac{dP}{dQ} = e^{4^2/2}\, e^{-4x}.$$

Bigger example of an R-N derivative

Let $X \in \mathbb{R}^n$ with $X \sim N(0, C_{n,n})$, and $Y \in \mathbb{R}^n$ with $Y \sim N(\mu, C_{n,n})$, where $H = C^{-1}$. Then
$$u(x) = k_n e^{-x^T H x/2}$$
$$v(x) = k_n e^{-(x-\mu)^T H (x-\mu)/2} = k_n e^{-\mu^T H \mu/2}\, e^{\mu^T H x}\, e^{-x^T H x/2}$$

Continuation, Bigger Example

Let $\nu = H\mu$, so that $\mu^T H = \nu^T$ and $\mu = C\nu$. Then
$$\frac{dP}{dQ} = e^{\nu^T \mu/2}\, e^{-\nu^T x}.$$
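
A sketch verifying this identity numerically (the dimension, covariance, mean, and test statistic below are illustrative assumptions; MASS::mvrnorm supplies the multivariate normal samples):

    # Check E_P[F(X)] = E_Q[F(X) L(X)] for P = N(0,C), Q = N(mu,C) in R^2
    library(MASS)   # for mvrnorm
    set.seed(1)
    C  <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
    mu <- c(1, -1)
    nu <- solve(C, mu)                                     # nu = H mu, H = C^{-1}
    L  <- function(x) exp(sum(nu * mu) / 2 - sum(nu * x))  # dP/dQ
    Fx <- function(x) sum(x^2)                             # an arbitrary statistic

    Xp <- mvrnorm(10^5, c(0, 0), C)                # samples from P
    Xq <- mvrnorm(10^5, mu, C)                     # samples from Q
    mean(apply(Xp, 1, Fx))                         # E_P[F(X)], about 3 here
    mean(apply(Xq, 1, function(x) Fx(x) * L(x)))   # should be close to the above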

Deciding the Distribution

We have a single sample point X. Is $X \sim P$ (the null hypothesis), or $X \sim Q$ (the alternative hypothesis)? Test: the criterion is a set B. If $x \in B$, say Q; if $x \in B^C$, say P.

Types of errors

A Type I error is saying Q when the truth is P (rejecting the null and accepting the alternative when the null is true). A Type II error is saying P when the truth is Q (accepting the null and rejecting the alternative when the null is false). Usually we believe Type I errors are worse than Type II errors.

Confidence in the test

The confidence in the procedure is $P[B^C] = 1 - P[B]$, the probability of accepting the null when the null is true. The power of the procedure is $Q[B]$, the probability of rejecting the null when the null is false. If $Q \perp P$, there is a test with 100% confidence and 100% power. Conversely, if there is a test with 100% confidence and 100% power, then $Q \perp P$.

Neyman-Pearson Lemma

A test is efficient if there is no way to increase the confidence without decreasing the power.

Neyman-Pearson Lemma. If B is efficient, then there is an $L_0$ such that
$$B = \left\{ x : \frac{dQ}{dP} > L_0 \right\}.$$
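
A concrete instance (a sketch built on the running pair $P = N(0,1)$, $Q = N(4,1)$): here $dQ/dP = e^{-4^2/2}\, e^{4x}$ is increasing in $x$, so the efficient criterion sets are half-lines $\{x > c\}$, and confidence and power are one-liners in R.

    # Confidence and power of the likelihood ratio test B = {x > c0}
    # for P = N(0,1) versus Q = N(4,1); c0 = 2 is an illustrative threshold
    c0 <- 2
    confidence <- pnorm(c0)                           # P[B^C] = P[X <= c0]
    power      <- pnorm(c0 - 4, lower.tail = FALSE)   # Q[B]   = Q[X > c0]
    c(confidence, power)                              # both about 0.977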